All of lore.kernel.org
* [Qemu-devel] [PATCH v5 0/2] Use ioreq-server API
@ 2014-12-05 10:50 Paul Durrant
  2014-12-05 10:50 ` [Qemu-devel] [PATCH v5 1/2] Add device listener interface Paul Durrant
  2014-12-05 10:50 ` [Qemu-devel] [PATCH v5 2/2] Xen: Use the ioreq-server API when available Paul Durrant
  0 siblings, 2 replies; 20+ messages in thread
From: Paul Durrant @ 2014-12-05 10:50 UTC (permalink / raw)
  To: qemu-devel

This patch series is v5 of what was originally the single patch
"Xen: Use the ioreq-server API when available".

v2 of the series moved the code that added the PCI bus listener
to patch #1 and the remainder of the changes to patch #2. Patch #2
was then re-worked to constrain the #ifdefing to xen_common.h, as
requested by Stefano.

v3 of the series modifies patch #1 to add the listener interface
into the core qdev, rather than the PCI bus code. This change only
requires trivial modification to patch #2, to only act on realize/
unrealize of PCI devices. Patch #2 was also modified at Stefano's
request to remove an extra identity check of memory sections
against the ram region.

v4 of the series replaces the use of a vmstate pre_save callback
with extra code in the existing runstate change notification
callback. It also tidies up some things in xen-hvm.c pointed out
by Stefano and adds his ack to patch #2.

v5 rebases and fixes patch #1 to consistently use device/DEVICE
rather than a mix of that and qdev/QDEV, and also moves the
listener notifications into device_set_realized(). The series
was also re-tested against xen-4.5.0-rc3.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v5 1/2] Add device listener interface
  2014-12-05 10:50 [Qemu-devel] [PATCH v5 0/2] Use ioreq-server API Paul Durrant
@ 2014-12-05 10:50 ` Paul Durrant
  2014-12-05 11:44   ` Paolo Bonzini
  2014-12-05 10:50 ` [Qemu-devel] [PATCH v5 2/2] Xen: Use the ioreq-server API when available Paul Durrant
  1 sibling, 1 reply; 20+ messages in thread
From: Paul Durrant @ 2014-12-05 10:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Crosthwaite, Thomas Huth, Michael S. Tsirkin,
	Markus Armbruster, Christian Borntraeger, Paul Durrant,
	Igor Mammedov, Paolo Bonzini, Andreas Faerber

The Xen ioreq-server API, introduced in Xen 4.5, requires that PCI device
models explicitly register with Xen for config space accesses. This patch
adds a listener interface into qdev-core which can be used by the Xen
interface code to monitor for arrival and departure of PCI devices.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Andreas Faerber <afaerber@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Thomas Huth <thuth@linux.vnet.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
---
 hw/core/qdev.c          |   53 +++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/qdev-core.h  |   10 +++++++++
 include/qemu/typedefs.h |    1 +
 3 files changed, 64 insertions(+)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 35fd00d..76ff9ef 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -189,6 +189,56 @@ int qdev_init(DeviceState *dev)
     return 0;
 }
 
+static QTAILQ_HEAD(device_listeners, DeviceListener) device_listeners
+    = QTAILQ_HEAD_INITIALIZER(device_listeners);
+
+enum ListenerDirection { Forward, Reverse };
+
+#define DEVICE_LISTENER_CALL(_callback, _direction, _args...)     \
+    do {                                                          \
+        DeviceListener *_listener;                                \
+                                                                  \
+        switch (_direction) {                                     \
+        case Forward:                                             \
+            QTAILQ_FOREACH(_listener, &device_listeners, link) {  \
+                if (_listener->_callback) {                       \
+                    _listener->_callback(_listener, ##_args);     \
+                }                                                 \
+            }                                                     \
+            break;                                                \
+        case Reverse:                                             \
+            QTAILQ_FOREACH_REVERSE(_listener, &device_listeners,  \
+                                   device_listeners, link) {      \
+                if (_listener->_callback) {                       \
+                    _listener->_callback(_listener, ##_args);     \
+                }                                                 \
+            }                                                     \
+            break;                                                \
+        default:                                                  \
+            abort();                                              \
+        }                                                         \
+    } while (0)
+
+static int device_listener_add(DeviceState *dev, void *opaque)
+{
+    DEVICE_LISTENER_CALL(realize, Forward, dev);
+
+    return 0;
+}
+
+void device_listener_register(DeviceListener *listener)
+{
+    QTAILQ_INSERT_TAIL(&device_listeners, listener, link);
+
+    qbus_walk_children(sysbus_get_default(), NULL, NULL, device_listener_add,
+                       NULL, NULL);
+}
+
+void device_listener_unregister(DeviceListener *listener)
+{
+    QTAILQ_REMOVE(&device_listeners, listener, link);
+}
+
 static void device_realize(DeviceState *dev, Error **errp)
 {
     DeviceClass *dc = DEVICE_GET_CLASS(dev);
@@ -994,6 +1044,8 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
             goto fail;
         }
 
+        DEVICE_LISTENER_CALL(realize, Forward, dev);
+
         hotplug_ctrl = qdev_get_hotplug_handler(dev);
         if (hotplug_ctrl) {
             hotplug_handler_plug(hotplug_ctrl, dev, &local_err);
@@ -1035,6 +1087,7 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
             dc->unrealize(dev, local_errp);
         }
         dev->pending_deleted_event = true;
+        DEVICE_LISTENER_CALL(unrealize, Reverse, dev);
     }
 
     if (local_err != NULL) {
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 589bbe7..15a226f 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -165,6 +165,12 @@ struct DeviceState {
     int alias_required_for_version;
 };
 
+struct DeviceListener {
+    void (*realize)(DeviceListener *listener, DeviceState *dev);
+    void (*unrealize)(DeviceListener *listener, DeviceState *dev);
+    QTAILQ_ENTRY(DeviceListener) link;
+};
+
 #define TYPE_BUS "bus"
 #define BUS(obj) OBJECT_CHECK(BusState, (obj), TYPE_BUS)
 #define BUS_CLASS(klass) OBJECT_CLASS_CHECK(BusClass, (klass), TYPE_BUS)
@@ -376,4 +382,8 @@ static inline bool qbus_is_hotpluggable(BusState *bus)
 {
    return bus->hotplug_handler;
 }
+
+void device_listener_register(DeviceListener *listener);
+void device_listener_unregister(DeviceListener *listener);
+
 #endif
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 3475177..4bb4938 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -20,6 +20,7 @@ typedef struct Property Property;
 typedef struct PropertyInfo PropertyInfo;
 typedef struct CompatProperty CompatProperty;
 typedef struct DeviceState DeviceState;
+typedef struct DeviceListener DeviceListener;
 typedef struct BusState BusState;
 typedef struct BusClass BusClass;
 
-- 
1.7.10.4


* [Qemu-devel] [PATCH v5 2/2] Xen: Use the ioreq-server API when available
  2014-12-05 10:50 [Qemu-devel] [PATCH v5 0/2] Use ioreq-server API Paul Durrant
  2014-12-05 10:50 ` [Qemu-devel] [PATCH v5 1/2] Add device listener interface Paul Durrant
@ 2014-12-05 10:50 ` Paul Durrant
  2015-01-28 19:32   ` Don Slutz
  1 sibling, 1 reply; 20+ messages in thread
From: Paul Durrant @ 2014-12-05 10:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Olaf Hering, Alexey Kardashevskiy, Stefan Weil,
	Michael Tokarev, Alexander Graf, Paul Durrant, Gerd Hoffmann,
	Stefan Hajnoczi, Paolo Bonzini

The ioreq-server API added to Xen 4.5 offers better security than
the existing Xen/QEMU interface because the shared pages that are
used to pass emulation request/results back and forth are removed
from the guest's memory space before any requests are serviced.
This prevents the guest from mapping these pages (they are in a
well-known location) and attempting to attack QEMU by synthesizing
its own request structures. Hence, this patch modifies configure
to detect whether the API is available, and adds the necessary
code to use the API if it is.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Michael Tokarev <mjt@tls.msk.ru>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Stefan Weil <sw@weilnetz.de>
Cc: Olaf Hering <olaf@aepfle.de>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Alexander Graf <agraf@suse.de>
---
 configure                   |   29 ++++++
 include/hw/xen/xen_common.h |  223 +++++++++++++++++++++++++++++++++++++++++++
 trace-events                |    9 ++
 xen-hvm.c                   |  160 ++++++++++++++++++++++++++-----
 4 files changed, 399 insertions(+), 22 deletions(-)

diff --git a/configure b/configure
index 47048f0..b1f8c2a 100755
--- a/configure
+++ b/configure
@@ -1877,6 +1877,32 @@ int main(void) {
   xc_gnttab_open(NULL, 0);
   xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
   xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
+  xc_hvm_create_ioreq_server(xc, 0, 0, NULL);
+  return 0;
+}
+EOF
+      compile_prog "" "$xen_libs"
+    then
+    xen_ctrl_version=450
+    xen=yes
+
+  elif
+      cat > $TMPC <<EOF &&
+#include <xenctrl.h>
+#include <xenstore.h>
+#include <stdint.h>
+#include <xen/hvm/hvm_info_table.h>
+#if !defined(HVM_MAX_VCPUS)
+# error HVM_MAX_VCPUS not defined
+#endif
+int main(void) {
+  xc_interface *xc;
+  xs_daemon_open();
+  xc = xc_interface_open(0, 0, 0);
+  xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0);
+  xc_gnttab_open(NULL, 0);
+  xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
+  xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
   return 0;
 }
 EOF
@@ -4283,6 +4309,9 @@ if test -n "$sparc_cpu"; then
     echo "Target Sparc Arch $sparc_cpu"
 fi
 echo "xen support       $xen"
+if test "$xen" = "yes" ; then
+  echo "xen ctrl version  $xen_ctrl_version"
+fi
 echo "brlapi support    $brlapi"
 echo "bluez  support    $bluez"
 echo "Documentation     $docs"
diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
index 95612a4..519696f 100644
--- a/include/hw/xen/xen_common.h
+++ b/include/hw/xen/xen_common.h
@@ -16,7 +16,9 @@
 
 #include "hw/hw.h"
 #include "hw/xen/xen.h"
+#include "hw/pci/pci.h"
 #include "qemu/queue.h"
+#include "trace.h"
 
 /*
  * We don't support Xen prior to 3.3.0.
@@ -179,4 +181,225 @@ static inline int xen_get_vmport_regs_pfn(XenXC xc, domid_t dom,
 }
 #endif
 
+/* Xen before 4.5 */
+#if CONFIG_XEN_CTRL_INTERFACE_VERSION < 450
+
+#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
+#define HVM_PARAM_BUFIOREQ_EVTCHN 26
+#endif
+
+#define IOREQ_TYPE_PCI_CONFIG 2
+
+typedef uint32_t ioservid_t;
+
+static inline void xen_map_memory_section(XenXC xc, domid_t dom,
+                                          ioservid_t ioservid,
+                                          MemoryRegionSection *section)
+{
+}
+
+static inline void xen_unmap_memory_section(XenXC xc, domid_t dom,
+                                            ioservid_t ioservid,
+                                            MemoryRegionSection *section)
+{
+}
+
+static inline void xen_map_io_section(XenXC xc, domid_t dom,
+                                      ioservid_t ioservid,
+                                      MemoryRegionSection *section)
+{
+}
+
+static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
+                                        ioservid_t ioservid,
+                                        MemoryRegionSection *section)
+{
+}
+
+static inline void xen_map_pcidev(XenXC xc, domid_t dom,
+                                  ioservid_t ioservid,
+                                  PCIDevice *pci_dev)
+{
+}
+
+static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
+                                    ioservid_t ioservid,
+                                    PCIDevice *pci_dev)
+{
+}
+
+static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
+                                          ioservid_t *ioservid)
+{
+    return 0;
+}
+
+static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
+                                            ioservid_t ioservid)
+{
+}
+
+static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
+                                            ioservid_t ioservid,
+                                            xen_pfn_t *ioreq_pfn,
+                                            xen_pfn_t *bufioreq_pfn,
+                                            evtchn_port_t *bufioreq_evtchn)
+{
+    unsigned long param;
+    int rc;
+
+    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_IOREQ_PFN, &param);
+    if (rc < 0) {
+        fprintf(stderr, "failed to get HVM_PARAM_IOREQ_PFN\n");
+        return -1;
+    }
+
+    *ioreq_pfn = param;
+
+    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_BUFIOREQ_PFN, &param);
+    if (rc < 0) {
+        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_PFN\n");
+        return -1;
+    }
+
+    *bufioreq_pfn = param;
+
+    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_BUFIOREQ_EVTCHN,
+                          &param);
+    if (rc < 0) {
+        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_EVTCHN\n");
+        return -1;
+    }
+
+    *bufioreq_evtchn = param;
+
+    return 0;
+}
+
+static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
+                                             ioservid_t ioservid,
+                                             bool enable)
+{
+    return 0;
+}
+
+/* Xen 4.5 */
+#else
+
+static inline void xen_map_memory_section(XenXC xc, domid_t dom,
+                                          ioservid_t ioservid,
+                                          MemoryRegionSection *section)
+{
+    hwaddr start_addr = section->offset_within_address_space;
+    ram_addr_t size = int128_get64(section->size);
+    hwaddr end_addr = start_addr + size - 1;
+
+    trace_xen_map_mmio_range(ioservid, start_addr, end_addr);
+    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 1,
+                                        start_addr, end_addr);
+}
+
+static inline void xen_unmap_memory_section(XenXC xc, domid_t dom,
+                                            ioservid_t ioservid,
+                                            MemoryRegionSection *section)
+{
+    hwaddr start_addr = section->offset_within_address_space;
+    ram_addr_t size = int128_get64(section->size);
+    hwaddr end_addr = start_addr + size - 1;
+
+    trace_xen_unmap_mmio_range(ioservid, start_addr, end_addr);
+    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 1,
+                                            start_addr, end_addr);
+}
+
+static inline void xen_map_io_section(XenXC xc, domid_t dom,
+                                      ioservid_t ioservid,
+                                      MemoryRegionSection *section)
+{
+    hwaddr start_addr = section->offset_within_address_space;
+    ram_addr_t size = int128_get64(section->size);
+    hwaddr end_addr = start_addr + size - 1;
+
+    trace_xen_map_portio_range(ioservid, start_addr, end_addr);
+    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 0,
+                                        start_addr, end_addr);
+}
+
+static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
+                                        ioservid_t ioservid,
+                                        MemoryRegionSection *section)
+{
+    hwaddr start_addr = section->offset_within_address_space;
+    ram_addr_t size = int128_get64(section->size);
+    hwaddr end_addr = start_addr + size - 1;
+
+    trace_xen_unmap_portio_range(ioservid, start_addr, end_addr);
+    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 0,
+                                            start_addr, end_addr);
+}
+
+static inline void xen_map_pcidev(XenXC xc, domid_t dom,
+                                  ioservid_t ioservid,
+                                  PCIDevice *pci_dev)
+{
+    trace_xen_map_pcidev(ioservid, pci_bus_num(pci_dev->bus),
+                         PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
+    xc_hvm_map_pcidev_to_ioreq_server(xc, dom, ioservid,
+                                      0, pci_bus_num(pci_dev->bus),
+                                      PCI_SLOT(pci_dev->devfn),
+                                      PCI_FUNC(pci_dev->devfn));
+}
+
+static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
+                                    ioservid_t ioservid,
+                                    PCIDevice *pci_dev)
+{
+    trace_xen_unmap_pcidev(ioservid, pci_bus_num(pci_dev->bus),
+                           PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
+    xc_hvm_unmap_pcidev_from_ioreq_server(xc, dom, ioservid,
+                                          0, pci_bus_num(pci_dev->bus),
+                                          PCI_SLOT(pci_dev->devfn),
+                                          PCI_FUNC(pci_dev->devfn));
+}
+
+static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
+                                          ioservid_t *ioservid)
+{
+    int rc = xc_hvm_create_ioreq_server(xc, dom, 1, ioservid);
+
+    if (rc == 0) {
+        trace_xen_ioreq_server_create(*ioservid);
+    }
+
+    return rc;
+}
+
+static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
+                                            ioservid_t ioservid)
+{
+    trace_xen_ioreq_server_destroy(ioservid);
+    xc_hvm_destroy_ioreq_server(xc, dom, ioservid);
+}
+
+static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
+                                            ioservid_t ioservid,
+                                            xen_pfn_t *ioreq_pfn,
+                                            xen_pfn_t *bufioreq_pfn,
+                                            evtchn_port_t *bufioreq_evtchn)
+{
+    return xc_hvm_get_ioreq_server_info(xc, dom, ioservid,
+                                        ioreq_pfn, bufioreq_pfn,
+                                        bufioreq_evtchn);
+}
+
+static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
+                                             ioservid_t ioservid,
+                                             bool enable)
+{
+    trace_xen_ioreq_server_state(ioservid, enable);
+    return xc_hvm_set_ioreq_server_state(xc, dom, ioservid, enable);
+}
+
+#endif
+
 #endif /* QEMU_HW_XEN_COMMON_H */
diff --git a/trace-events b/trace-events
index b5722ea..abd1118 100644
--- a/trace-events
+++ b/trace-events
@@ -897,6 +897,15 @@ pvscsi_tx_rings_num_pages(const char* label, uint32_t num) "Number of %s pages:
 # xen-hvm.c
 xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested: %#lx, size %#lx"
 xen_client_set_memory(uint64_t start_addr, unsigned long size, bool log_dirty) "%#"PRIx64" size %#lx, log_dirty %i"
+xen_ioreq_server_create(uint32_t id) "id: %u"
+xen_ioreq_server_destroy(uint32_t id) "id: %u"
+xen_ioreq_server_state(uint32_t id, bool enable) "id: %u: enable: %i"
+xen_map_mmio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
+xen_unmap_mmio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
+xen_map_portio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
+xen_unmap_portio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
+xen_map_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id: %u bdf: %02x.%02x.%02x"
+xen_unmap_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id: %u bdf: %02x.%02x.%02x"
 
 # xen-mapcache.c
 xen_map_cache(uint64_t phys_addr) "want %#"PRIx64
diff --git a/xen-hvm.c b/xen-hvm.c
index 7548794..31cb3ca 100644
--- a/xen-hvm.c
+++ b/xen-hvm.c
@@ -85,9 +85,6 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t *shared_page, int vcpu)
 }
 #  define FMT_ioreq_size "u"
 #endif
-#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
-#define HVM_PARAM_BUFIOREQ_EVTCHN 26
-#endif
 
 #define BUFFER_IO_MAX_DELAY  100
 
@@ -101,6 +98,7 @@ typedef struct XenPhysmap {
 } XenPhysmap;
 
 typedef struct XenIOState {
+    ioservid_t ioservid;
     shared_iopage_t *shared_page;
     shared_vmport_iopage_t *shared_vmport_page;
     buffered_iopage_t *buffered_io_page;
@@ -117,6 +115,8 @@ typedef struct XenIOState {
 
     struct xs_handle *xenstore;
     MemoryListener memory_listener;
+    MemoryListener io_listener;
+    DeviceListener device_listener;
     QLIST_HEAD(, XenPhysmap) physmap;
     hwaddr free_phys_offset;
     const XenPhysmap *log_for_dirtybit;
@@ -467,12 +467,23 @@ static void xen_set_memory(struct MemoryListener *listener,
     bool log_dirty = memory_region_is_logging(section->mr);
     hvmmem_type_t mem_type;
 
+    if (section->mr == &ram_memory) {
+        return;
+    } else {
+        if (add) {
+            xen_map_memory_section(xen_xc, xen_domid, state->ioservid,
+                                   section);
+        } else {
+            xen_unmap_memory_section(xen_xc, xen_domid, state->ioservid,
+                                     section);
+        }
+    }
+
     if (!memory_region_is_ram(section->mr)) {
         return;
     }
 
-    if (!(section->mr != &ram_memory
-          && ( (log_dirty && add) || (!log_dirty && !add)))) {
+    if (log_dirty != add) {
         return;
     }
 
@@ -515,6 +526,50 @@ static void xen_region_del(MemoryListener *listener,
     memory_region_unref(section->mr);
 }
 
+static void xen_io_add(MemoryListener *listener,
+                       MemoryRegionSection *section)
+{
+    XenIOState *state = container_of(listener, XenIOState, io_listener);
+
+    memory_region_ref(section->mr);
+
+    xen_map_io_section(xen_xc, xen_domid, state->ioservid, section);
+}
+
+static void xen_io_del(MemoryListener *listener,
+                       MemoryRegionSection *section)
+{
+    XenIOState *state = container_of(listener, XenIOState, io_listener);
+
+    xen_unmap_io_section(xen_xc, xen_domid, state->ioservid, section);
+
+    memory_region_unref(section->mr);
+}
+
+static void xen_device_realize(DeviceListener *listener,
+			       DeviceState *dev)
+{
+    XenIOState *state = container_of(listener, XenIOState, device_listener);
+
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+        PCIDevice *pci_dev = PCI_DEVICE(dev);
+
+        xen_map_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
+    }
+}
+
+static void xen_device_unrealize(DeviceListener *listener,
+				 DeviceState *dev)
+{
+    XenIOState *state = container_of(listener, XenIOState, device_listener);
+
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+        PCIDevice *pci_dev = PCI_DEVICE(dev);
+
+        xen_unmap_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
+    }
+}
+
 static void xen_sync_dirty_bitmap(XenIOState *state,
                                   hwaddr start_addr,
                                   ram_addr_t size)
@@ -615,6 +670,17 @@ static MemoryListener xen_memory_listener = {
     .priority = 10,
 };
 
+static MemoryListener xen_io_listener = {
+    .region_add = xen_io_add,
+    .region_del = xen_io_del,
+    .priority = 10,
+};
+
+static DeviceListener xen_device_listener = {
+    .realize = xen_device_realize,
+    .unrealize = xen_device_unrealize,
+};
+
 /* get the ioreq packets from share mem */
 static ioreq_t *cpu_get_ioreq_from_shared_memory(XenIOState *state, int vcpu)
 {
@@ -863,6 +929,27 @@ static void handle_ioreq(XenIOState *state, ioreq_t *req)
         case IOREQ_TYPE_INVALIDATE:
             xen_invalidate_map_cache();
             break;
+        case IOREQ_TYPE_PCI_CONFIG: {
+            uint32_t sbdf = req->addr >> 32;
+            uint32_t val;
+
+            /* Fake a write to port 0xCF8 so that
+             * the config space access will target the
+             * correct device model.
+             */
+            val = (1u << 31) |
+                  ((req->addr & 0x0f00) << 16) |
+                  ((sbdf & 0xffff) << 8) |
+                  (req->addr & 0xfc);
+            do_outp(0xcf8, 4, val);
+
+            /* Now issue the config space access via
+             * port 0xCFC
+             */
+            req->addr = 0xcfc | (req->addr & 0x03);
+            cpu_ioreq_pio(req);
+            break;
+        }
         default:
             hw_error("Invalid ioreq type 0x%x\n", req->type);
     }
@@ -993,9 +1080,15 @@ static void xen_main_loop_prepare(XenIOState *state)
 static void xen_hvm_change_state_handler(void *opaque, int running,
                                          RunState rstate)
 {
+    XenIOState *state = opaque;
+
     if (running) {
-        xen_main_loop_prepare((XenIOState *)opaque);
+        xen_main_loop_prepare(state);
     }
+
+    xen_set_ioreq_server_state(xen_xc, xen_domid,
+                               state->ioservid,
+                               (rstate == RUN_STATE_RUNNING));
 }
 
 static void xen_exit_notifier(Notifier *n, void *data)
@@ -1064,8 +1157,9 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
                  MemoryRegion **ram_memory)
 {
     int i, rc;
-    unsigned long ioreq_pfn;
-    unsigned long bufioreq_evtchn;
+    xen_pfn_t ioreq_pfn;
+    xen_pfn_t bufioreq_pfn;
+    evtchn_port_t bufioreq_evtchn;
     XenIOState *state;
 
     state = g_malloc0(sizeof (XenIOState));
@@ -1082,6 +1176,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
         return -1;
     }
 
+    rc = xen_create_ioreq_server(xen_xc, xen_domid, &state->ioservid);
+    if (rc < 0) {
+        perror("xen: ioreq server create");
+        return -1;
+    }
+
     state->exit.notify = xen_exit_notifier;
     qemu_add_exit_notifier(&state->exit);
 
@@ -1091,8 +1191,18 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
     state->wakeup.notify = xen_wakeup_notifier;
     qemu_register_wakeup_notifier(&state->wakeup);
 
-    xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_IOREQ_PFN, &ioreq_pfn);
+    rc = xen_get_ioreq_server_info(xen_xc, xen_domid, state->ioservid,
+                                   &ioreq_pfn, &bufioreq_pfn,
+                                   &bufioreq_evtchn);
+    if (rc < 0) {
+        hw_error("failed to get ioreq server info: error %d handle=" XC_INTERFACE_FMT,
+                 errno, xen_xc);
+    }
+
     DPRINTF("shared page at pfn %lx\n", ioreq_pfn);
+    DPRINTF("buffered io page at pfn %lx\n", bufioreq_pfn);
+    DPRINTF("buffered io evtchn is %x\n", bufioreq_evtchn);
+
     state->shared_page = xc_map_foreign_range(xen_xc, xen_domid, XC_PAGE_SIZE,
                                               PROT_READ|PROT_WRITE, ioreq_pfn);
     if (state->shared_page == NULL) {
@@ -1114,10 +1224,10 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
         hw_error("get vmport regs pfn returned error %d, rc=%d", errno, rc);
     }
 
-    xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_PFN, &ioreq_pfn);
-    DPRINTF("buffered io page at pfn %lx\n", ioreq_pfn);
-    state->buffered_io_page = xc_map_foreign_range(xen_xc, xen_domid, XC_PAGE_SIZE,
-                                                   PROT_READ|PROT_WRITE, ioreq_pfn);
+    state->buffered_io_page = xc_map_foreign_range(xen_xc, xen_domid,
+                                                   XC_PAGE_SIZE,
+                                                   PROT_READ|PROT_WRITE,
+                                                   bufioreq_pfn);
     if (state->buffered_io_page == NULL) {
         hw_error("map buffered IO page returned error %d", errno);
     }
@@ -1125,6 +1235,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
     /* Note: cpus is empty at this point in init */
     state->cpu_by_vcpu_id = g_malloc0(max_cpus * sizeof(CPUState *));
 
+    rc = xen_set_ioreq_server_state(xen_xc, xen_domid, state->ioservid, true);
+    if (rc < 0) {
+        hw_error("failed to enable ioreq server info: error %d handle=" XC_INTERFACE_FMT,
+                 errno, xen_xc);
+    }
+
     state->ioreq_local_port = g_malloc0(max_cpus * sizeof (evtchn_port_t));
 
     /* FIXME: how about if we overflow the page here? */
@@ -1132,22 +1248,16 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
         rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
                                         xen_vcpu_eport(state->shared_page, i));
         if (rc == -1) {
-            fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
+            fprintf(stderr, "shared evtchn %d bind error %d\n", i, errno);
             return -1;
         }
         state->ioreq_local_port[i] = rc;
     }
 
-    rc = xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_EVTCHN,
-            &bufioreq_evtchn);
-    if (rc < 0) {
-        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_EVTCHN\n");
-        return -1;
-    }
     rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
-            (uint32_t)bufioreq_evtchn);
+                                    bufioreq_evtchn);
     if (rc == -1) {
-        fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
+        fprintf(stderr, "buffered evtchn bind error %d\n", errno);
         return -1;
     }
     state->bufioreq_local_port = rc;
@@ -1163,6 +1273,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
     memory_listener_register(&state->memory_listener, &address_space_memory);
     state->log_for_dirtybit = NULL;
 
+    state->io_listener = xen_io_listener;
+    memory_listener_register(&state->io_listener, &address_space_io);
+
+    state->device_listener = xen_device_listener;
+    device_listener_register(&state->device_listener);
+
     /* Initialize backend core & drivers */
     if (xen_be_init() != 0) {
         fprintf(stderr, "%s: xen backend core setup failed\n", __FUNCTION__);
-- 
1.7.10.4


* Re: [Qemu-devel] [PATCH v5 1/2] Add device listener interface
  2014-12-05 10:50 ` [Qemu-devel] [PATCH v5 1/2] Add device listener interface Paul Durrant
@ 2014-12-05 11:44   ` Paolo Bonzini
  2014-12-08 11:12     ` Paul Durrant
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2014-12-05 11:44 UTC (permalink / raw)
  To: Paul Durrant, qemu-devel
  Cc: Peter Crosthwaite, Thomas Huth, Michael S. Tsirkin,
	Markus Armbruster, Christian Borntraeger, Igor Mammedov,
	Andreas Faerber



On 05/12/2014 11:50, Paul Durrant wrote:
> The Xen ioreq-server API, introduced in Xen 4.5, requires that PCI device
> models explicitly register with Xen for config space accesses. This patch
> adds a listener interface into qdev-core which can be used by the Xen
> interface code to monitor for arrival and departure of PCI devices.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Andreas Faerber <afaerber@suse.de>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: Markus Armbruster <armbru@redhat.com>
> Cc: Thomas Huth <thuth@linux.vnet.ibm.com>
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  hw/core/qdev.c          |   53 +++++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/qdev-core.h  |   10 +++++++++
>  include/qemu/typedefs.h |    1 +
>  3 files changed, 64 insertions(+)
> 
> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> index 35fd00d..76ff9ef 100644
> --- a/hw/core/qdev.c
> +++ b/hw/core/qdev.c
> @@ -189,6 +189,56 @@ int qdev_init(DeviceState *dev)
>      return 0;
>  }
>  
> +static QTAILQ_HEAD(device_listeners, DeviceListener) device_listeners
> +    = QTAILQ_HEAD_INITIALIZER(device_listeners);
> +
> +enum ListenerDirection { Forward, Reverse };
> +
> +#define DEVICE_LISTENER_CALL(_callback, _direction, _args...)     \
> +    do {                                                          \
> +        DeviceListener *_listener;                                \
> +                                                                  \
> +        switch (_direction) {                                     \
> +        case Forward:                                             \
> +            QTAILQ_FOREACH(_listener, &device_listeners, link) {  \
> +                if (_listener->_callback) {                       \
> +                    _listener->_callback(_listener, ##_args);     \
> +                }                                                 \
> +            }                                                     \
> +            break;                                                \
> +        case Reverse:                                             \
> +            QTAILQ_FOREACH_REVERSE(_listener, &device_listeners,  \
> +                                   device_listeners, link) {      \
> +                if (_listener->_callback) {                       \
> +                    _listener->_callback(_listener, ##_args);     \
> +                }                                                 \
> +            }                                                     \
> +            break;                                                \
> +        default:                                                  \
> +            abort();                                              \
> +        }                                                         \
> +    } while (0)
> +
> +static int device_listener_add(DeviceState *dev, void *opaque)
> +{
> +    DEVICE_LISTENER_CALL(realize, Forward, dev);
> +
> +    return 0;
> +}
> +
> +void device_listener_register(DeviceListener *listener)
> +{
> +    QTAILQ_INSERT_TAIL(&device_listeners, listener, link);
> +
> +    qbus_walk_children(sysbus_get_default(), NULL, NULL, device_listener_add,
> +                       NULL, NULL);
> +}
> +
> +void device_listener_unregister(DeviceListener *listener)
> +{
> +    QTAILQ_REMOVE(&device_listeners, listener, link);
> +}
> +
>  static void device_realize(DeviceState *dev, Error **errp)
>  {
>      DeviceClass *dc = DEVICE_GET_CLASS(dev);
> @@ -994,6 +1044,8 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
>              goto fail;
>          }
>  
> +        DEVICE_LISTENER_CALL(realize, Forward, dev);
> +
>          hotplug_ctrl = qdev_get_hotplug_handler(dev);
>          if (hotplug_ctrl) {
>              hotplug_handler_plug(hotplug_ctrl, dev, &local_err);
> @@ -1035,6 +1087,7 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
>              dc->unrealize(dev, local_errp);
>          }
>          dev->pending_deleted_event = true;
> +        DEVICE_LISTENER_CALL(unrealize, Reverse, dev);
>      }
>  
>      if (local_err != NULL) {
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index 589bbe7..15a226f 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -165,6 +165,12 @@ struct DeviceState {
>      int alias_required_for_version;
>  };
>  
> +struct DeviceListener {
> +    void (*realize)(DeviceListener *listener, DeviceState *dev);
> +    void (*unrealize)(DeviceListener *listener, DeviceState *dev);
> +    QTAILQ_ENTRY(DeviceListener) link;
> +};
> +
>  #define TYPE_BUS "bus"
>  #define BUS(obj) OBJECT_CHECK(BusState, (obj), TYPE_BUS)
>  #define BUS_CLASS(klass) OBJECT_CLASS_CHECK(BusClass, (klass), TYPE_BUS)
> @@ -376,4 +382,8 @@ static inline bool qbus_is_hotpluggable(BusState *bus)
>  {
>     return bus->hotplug_handler;
>  }
> +
> +void device_listener_register(DeviceListener *listener);
> +void device_listener_unregister(DeviceListener *listener);
> +
>  #endif
> diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
> index 3475177..4bb4938 100644
> --- a/include/qemu/typedefs.h
> +++ b/include/qemu/typedefs.h
> @@ -20,6 +20,7 @@ typedef struct Property Property;
>  typedef struct PropertyInfo PropertyInfo;
>  typedef struct CompatProperty CompatProperty;
>  typedef struct DeviceState DeviceState;
> +typedef struct DeviceListener DeviceListener;
>  typedef struct BusState BusState;
>  typedef struct BusClass BusClass;
>  
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Thanks!

Paolo


* Re: [Qemu-devel] [PATCH v5 1/2] Add device listener interface
  2014-12-05 11:44   ` Paolo Bonzini
@ 2014-12-08 11:12     ` Paul Durrant
  2014-12-09 17:40       ` Paolo Bonzini
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Durrant @ 2014-12-08 11:12 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: Peter Crosthwaite, Thomas Huth, Michael S. Tsirkin,
	Markus Armbruster, Christian Borntraeger, Igor Mammedov,
	Andreas Faerber

> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: 05 December 2014 11:45
> To: Paul Durrant; qemu-devel@nongnu.org
> Cc: Michael S. Tsirkin; Andreas Faerber"; Peter Crosthwaite; Igor
> Mammedov; Markus Armbruster; Thomas Huth; Christian Borntraeger
> Subject: Re: [PATCH v5 1/2] Add device listener interface
> 
> 
> 
> On 05/12/2014 11:50, Paul Durrant wrote:
> > The Xen ioreq-server API, introduced in Xen 4.5, requires that PCI device
> > models explicitly register with Xen for config space accesses. This patch
> > adds a listener interface into qdev-core which can be used by the Xen
> > interface code to monitor for arrival and departure of PCI devices.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > Cc: Andreas Faerber" <afaerber@suse.de>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
> > Cc: Igor Mammedov <imammedo@redhat.com>
> > Cc: Markus Armbruster <armbru@redhat.com>
> > Cc: Thomas Huth <thuth@linux.vnet.ibm.com>
> > Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> > ---
> >  hw/core/qdev.c          |   53
> +++++++++++++++++++++++++++++++++++++++++++++++
> >  include/hw/qdev-core.h  |   10 +++++++++
> >  include/qemu/typedefs.h |    1 +
> >  3 files changed, 64 insertions(+)
> >
> > diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> > index 35fd00d..76ff9ef 100644
> > --- a/hw/core/qdev.c
> > +++ b/hw/core/qdev.c
> > @@ -189,6 +189,56 @@ int qdev_init(DeviceState *dev)
> >      return 0;
> >  }
> >
> > +static QTAILQ_HEAD(device_listeners, DeviceListener) device_listeners
> > +    = QTAILQ_HEAD_INITIALIZER(device_listeners);
> > +
> > +enum ListenerDirection { Forward, Reverse };
> > +
> > +#define DEVICE_LISTENER_CALL(_callback, _direction, _args...)     \
> > +    do {                                                          \
> > +        DeviceListener *_listener;                                \
> > +                                                                  \
> > +        switch (_direction) {                                     \
> > +        case Forward:                                             \
> > +            QTAILQ_FOREACH(_listener, &device_listeners, link) {  \
> > +                if (_listener->_callback) {                       \
> > +                    _listener->_callback(_listener, ##_args);     \
> > +                }                                                 \
> > +            }                                                     \
> > +            break;                                                \
> > +        case Reverse:                                             \
> > +            QTAILQ_FOREACH_REVERSE(_listener, &device_listeners,  \
> > +                                   device_listeners, link) {      \
> > +                if (_listener->_callback) {                       \
> > +                    _listener->_callback(_listener, ##_args);     \
> > +                }                                                 \
> > +            }                                                     \
> > +            break;                                                \
> > +        default:                                                  \
> > +            abort();                                              \
> > +        }                                                         \
> > +    } while (0)
> > +
> > +static int device_listener_add(DeviceState *dev, void *opaque)
> > +{
> > +    DEVICE_LISTENER_CALL(realize, Forward, dev);
> > +
> > +    return 0;
> > +}
> > +
> > +void device_listener_register(DeviceListener *listener)
> > +{
> > +    QTAILQ_INSERT_TAIL(&device_listeners, listener, link);
> > +
> > +    qbus_walk_children(sysbus_get_default(), NULL, NULL,
> device_listener_add,
> > +                       NULL, NULL);
> > +}
> > +
> > +void device_listener_unregister(DeviceListener *listener)
> > +{
> > +    QTAILQ_REMOVE(&device_listeners, listener, link);
> > +}
> > +
> >  static void device_realize(DeviceState *dev, Error **errp)
> >  {
> >      DeviceClass *dc = DEVICE_GET_CLASS(dev);
> > @@ -994,6 +1044,8 @@ static void device_set_realized(Object *obj, bool
> value, Error **errp)
> >              goto fail;
> >          }
> >
> > +        DEVICE_LISTENER_CALL(realize, Forward, dev);
> > +
> >          hotplug_ctrl = qdev_get_hotplug_handler(dev);
> >          if (hotplug_ctrl) {
> >              hotplug_handler_plug(hotplug_ctrl, dev, &local_err);
> > @@ -1035,6 +1087,7 @@ static void device_set_realized(Object *obj, bool
> value, Error **errp)
> >              dc->unrealize(dev, local_errp);
> >          }
> >          dev->pending_deleted_event = true;
> > +        DEVICE_LISTENER_CALL(unrealize, Reverse, dev);
> >      }
> >
> >      if (local_err != NULL) {
> > diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> > index 589bbe7..15a226f 100644
> > --- a/include/hw/qdev-core.h
> > +++ b/include/hw/qdev-core.h
> > @@ -165,6 +165,12 @@ struct DeviceState {
> >      int alias_required_for_version;
> >  };
> >
> > +struct DeviceListener {
> > +    void (*realize)(DeviceListener *listener, DeviceState *dev);
> > +    void (*unrealize)(DeviceListener *listener, DeviceState *dev);
> > +    QTAILQ_ENTRY(DeviceListener) link;
> > +};
> > +
> >  #define TYPE_BUS "bus"
> >  #define BUS(obj) OBJECT_CHECK(BusState, (obj), TYPE_BUS)
> >  #define BUS_CLASS(klass) OBJECT_CLASS_CHECK(BusClass, (klass),
> TYPE_BUS)
> > @@ -376,4 +382,8 @@ static inline bool qbus_is_hotpluggable(BusState
> *bus)
> >  {
> >     return bus->hotplug_handler;
> >  }
> > +
> > +void device_listener_register(DeviceListener *listener);
> > +void device_listener_unregister(DeviceListener *listener);
> > +
> >  #endif
> > diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
> > index 3475177..4bb4938 100644
> > --- a/include/qemu/typedefs.h
> > +++ b/include/qemu/typedefs.h
> > @@ -20,6 +20,7 @@ typedef struct Property Property;
> >  typedef struct PropertyInfo PropertyInfo;
> >  typedef struct CompatProperty CompatProperty;
> >  typedef struct DeviceState DeviceState;
> > +typedef struct DeviceListener DeviceListener;
> >  typedef struct BusState BusState;
> >  typedef struct BusClass BusClass;
> >
> >
> 
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> 

Great. Thanks for that. All I need now is an ack on this from a maintainer.

  Paul

> Thanks!
> 
> Paolo


* Re: [Qemu-devel] [PATCH v5 1/2] Add device listener interface
  2014-12-08 11:12     ` Paul Durrant
@ 2014-12-09 17:40       ` Paolo Bonzini
  2014-12-09 17:54         ` Andreas Färber
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2014-12-09 17:40 UTC (permalink / raw)
  To: Paul Durrant, qemu-devel
  Cc: Peter Crosthwaite, Thomas Huth, Michael S. Tsirkin,
	Markus Armbruster, Christian Borntraeger, Igor Mammedov,
	Andreas Faerber



On 08/12/2014 12:12, Paul Durrant wrote:
>> -----Original Message-----
>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
>> Sent: 05 December 2014 11:45
>> To: Paul Durrant; qemu-devel@nongnu.org
>> Cc: Michael S. Tsirkin; Andreas Faerber"; Peter Crosthwaite; Igor
>> Mammedov; Markus Armbruster; Thomas Huth; Christian Borntraeger
>> Subject: Re: [PATCH v5 1/2] Add device listener interface
>>
>>
>>
>> On 05/12/2014 11:50, Paul Durrant wrote:
>>> The Xen ioreq-server API, introduced in Xen 4.5, requires that PCI device
>>> models explicitly register with Xen for config space accesses. This patch
>>> adds a listener interface into qdev-core which can be used by the Xen
>>> interface code to monitor for arrival and departure of PCI devices.
>>>
>>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>>> Cc: Michael S. Tsirkin <mst@redhat.com>
>>> Cc: Andreas Faerber" <afaerber@suse.de>
>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>> Cc: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
>>> Cc: Igor Mammedov <imammedo@redhat.com>
>>> Cc: Markus Armbruster <armbru@redhat.com>
>>> Cc: Thomas Huth <thuth@linux.vnet.ibm.com>
>>> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
>>> ---
>>>  hw/core/qdev.c          |   53
>> +++++++++++++++++++++++++++++++++++++++++++++++
>>>  include/hw/qdev-core.h  |   10 +++++++++
>>>  include/qemu/typedefs.h |    1 +
>>>  3 files changed, 64 insertions(+)
>>>
>>> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
>>> index 35fd00d..76ff9ef 100644
>>> --- a/hw/core/qdev.c
>>> +++ b/hw/core/qdev.c
>>> @@ -189,6 +189,56 @@ int qdev_init(DeviceState *dev)
>>>      return 0;
>>>  }
>>>
>>> +static QTAILQ_HEAD(device_listeners, DeviceListener) device_listeners
>>> +    = QTAILQ_HEAD_INITIALIZER(device_listeners);
>>> +
>>> +enum ListenerDirection { Forward, Reverse };
>>> +
>>> +#define DEVICE_LISTENER_CALL(_callback, _direction, _args...)     \
>>> +    do {                                                          \
>>> +        DeviceListener *_listener;                                \
>>> +                                                                  \
>>> +        switch (_direction) {                                     \
>>> +        case Forward:                                             \
>>> +            QTAILQ_FOREACH(_listener, &device_listeners, link) {  \
>>> +                if (_listener->_callback) {                       \
>>> +                    _listener->_callback(_listener, ##_args);     \
>>> +                }                                                 \
>>> +            }                                                     \
>>> +            break;                                                \
>>> +        case Reverse:                                             \
>>> +            QTAILQ_FOREACH_REVERSE(_listener, &device_listeners,  \
>>> +                                   device_listeners, link) {      \
>>> +                if (_listener->_callback) {                       \
>>> +                    _listener->_callback(_listener, ##_args);     \
>>> +                }                                                 \
>>> +            }                                                     \
>>> +            break;                                                \
>>> +        default:                                                  \
>>> +            abort();                                              \
>>> +        }                                                         \
>>> +    } while (0)
>>> +
>>> +static int device_listener_add(DeviceState *dev, void *opaque)
>>> +{
>>> +    DEVICE_LISTENER_CALL(realize, Forward, dev);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +void device_listener_register(DeviceListener *listener)
>>> +{
>>> +    QTAILQ_INSERT_TAIL(&device_listeners, listener, link);
>>> +
>>> +    qbus_walk_children(sysbus_get_default(), NULL, NULL,
>> device_listener_add,
>>> +                       NULL, NULL);
>>> +}
>>> +
>>> +void device_listener_unregister(DeviceListener *listener)
>>> +{
>>> +    QTAILQ_REMOVE(&device_listeners, listener, link);
>>> +}
>>> +
>>>  static void device_realize(DeviceState *dev, Error **errp)
>>>  {
>>>      DeviceClass *dc = DEVICE_GET_CLASS(dev);
>>> @@ -994,6 +1044,8 @@ static void device_set_realized(Object *obj, bool
>> value, Error **errp)
>>>              goto fail;
>>>          }
>>>
>>> +        DEVICE_LISTENER_CALL(realize, Forward, dev);
>>> +
>>>          hotplug_ctrl = qdev_get_hotplug_handler(dev);
>>>          if (hotplug_ctrl) {
>>>              hotplug_handler_plug(hotplug_ctrl, dev, &local_err);
>>> @@ -1035,6 +1087,7 @@ static void device_set_realized(Object *obj, bool
>> value, Error **errp)
>>>              dc->unrealize(dev, local_errp);
>>>          }
>>>          dev->pending_deleted_event = true;
>>> +        DEVICE_LISTENER_CALL(unrealize, Reverse, dev);
>>>      }
>>>
>>>      if (local_err != NULL) {
>>> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
>>> index 589bbe7..15a226f 100644
>>> --- a/include/hw/qdev-core.h
>>> +++ b/include/hw/qdev-core.h
>>> @@ -165,6 +165,12 @@ struct DeviceState {
>>>      int alias_required_for_version;
>>>  };
>>>
>>> +struct DeviceListener {
>>> +    void (*realize)(DeviceListener *listener, DeviceState *dev);
>>> +    void (*unrealize)(DeviceListener *listener, DeviceState *dev);
>>> +    QTAILQ_ENTRY(DeviceListener) link;
>>> +};
>>> +
>>>  #define TYPE_BUS "bus"
>>>  #define BUS(obj) OBJECT_CHECK(BusState, (obj), TYPE_BUS)
>>>  #define BUS_CLASS(klass) OBJECT_CLASS_CHECK(BusClass, (klass),
>> TYPE_BUS)
>>> @@ -376,4 +382,8 @@ static inline bool qbus_is_hotpluggable(BusState
>> *bus)
>>>  {
>>>     return bus->hotplug_handler;
>>>  }
>>> +
>>> +void device_listener_register(DeviceListener *listener);
>>> +void device_listener_unregister(DeviceListener *listener);
>>> +
>>>  #endif
>>> diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
>>> index 3475177..4bb4938 100644
>>> --- a/include/qemu/typedefs.h
>>> +++ b/include/qemu/typedefs.h
>>> @@ -20,6 +20,7 @@ typedef struct Property Property;
>>>  typedef struct PropertyInfo PropertyInfo;
>>>  typedef struct CompatProperty CompatProperty;
>>>  typedef struct DeviceState DeviceState;
>>> +typedef struct DeviceListener DeviceListener;
>>>  typedef struct BusState BusState;
>>>  typedef struct BusClass BusClass;
>>>
>>>
>>
>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>>
> 
> Great. Thanks for that. All I need now is an ack on this from a maintainer.

I think my reviewed-by is pretty close to an Ack.

Stefano, feel free to pick this patch up together with 2/2.

Paolo


* Re: [Qemu-devel] [PATCH v5 1/2] Add device listener interface
  2014-12-09 17:40       ` Paolo Bonzini
@ 2014-12-09 17:54         ` Andreas Färber
  0 siblings, 0 replies; 20+ messages in thread
From: Andreas Färber @ 2014-12-09 17:54 UTC (permalink / raw)
  To: Paul Durrant, qemu-devel, Stefano Stabellini
  Cc: Peter Crosthwaite, Thomas Huth, Michael S. Tsirkin,
	Markus Armbruster, Christian Borntraeger, Igor Mammedov,
	Paolo Bonzini

Am 09.12.2014 um 18:40 schrieb Paolo Bonzini:
> On 08/12/2014 12:12, Paul Durrant wrote:
>>> -----Original Message-----
>>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
>>> Sent: 05 December 2014 11:45
>>> To: Paul Durrant; qemu-devel@nongnu.org
>>> Cc: Michael S. Tsirkin; Andreas Faerber"; Peter Crosthwaite; Igor
>>> Mammedov; Markus Armbruster; Thomas Huth; Christian Borntraeger
>>> Subject: Re: [PATCH v5 1/2] Add device listener interface
>>>
>>>
>>>
>>> On 05/12/2014 11:50, Paul Durrant wrote:
>>>> The Xen ioreq-server API, introduced in Xen 4.5, requires that PCI device
>>>> models explicitly register with Xen for config space accesses. This patch
>>>> adds a listener interface into qdev-core which can be used by the Xen
>>>> interface code to monitor for arrival and departure of PCI devices.
>>>>
>>>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>>>> Cc: Michael S. Tsirkin <mst@redhat.com>
>>>> Cc: Andreas Faerber" <afaerber@suse.de>
>>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>>> Cc: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
>>>> Cc: Igor Mammedov <imammedo@redhat.com>
>>>> Cc: Markus Armbruster <armbru@redhat.com>
>>>> Cc: Thomas Huth <thuth@linux.vnet.ibm.com>
>>>> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
>>>> ---
>>>>  hw/core/qdev.c          |   53
>>> +++++++++++++++++++++++++++++++++++++++++++++++
>>>>  include/hw/qdev-core.h  |   10 +++++++++
>>>>  include/qemu/typedefs.h |    1 +
>>>>  3 files changed, 64 insertions(+)
>>>>
>>>> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
>>>> index 35fd00d..76ff9ef 100644
>>>> --- a/hw/core/qdev.c
>>>> +++ b/hw/core/qdev.c
>>>> @@ -189,6 +189,56 @@ int qdev_init(DeviceState *dev)
>>>>      return 0;
>>>>  }
>>>>
>>>> +static QTAILQ_HEAD(device_listeners, DeviceListener) device_listeners
>>>> +    = QTAILQ_HEAD_INITIALIZER(device_listeners);
>>>> +
>>>> +enum ListenerDirection { Forward, Reverse };
>>>> +
>>>> +#define DEVICE_LISTENER_CALL(_callback, _direction, _args...)     \
>>>> +    do {                                                          \
>>>> +        DeviceListener *_listener;                                \
>>>> +                                                                  \
>>>> +        switch (_direction) {                                     \
>>>> +        case Forward:                                             \
>>>> +            QTAILQ_FOREACH(_listener, &device_listeners, link) {  \
>>>> +                if (_listener->_callback) {                       \
>>>> +                    _listener->_callback(_listener, ##_args);     \
>>>> +                }                                                 \
>>>> +            }                                                     \
>>>> +            break;                                                \
>>>> +        case Reverse:                                             \
>>>> +            QTAILQ_FOREACH_REVERSE(_listener, &device_listeners,  \
>>>> +                                   device_listeners, link) {      \
>>>> +                if (_listener->_callback) {                       \
>>>> +                    _listener->_callback(_listener, ##_args);     \
>>>> +                }                                                 \
>>>> +            }                                                     \
>>>> +            break;                                                \
>>>> +        default:                                                  \
>>>> +            abort();                                              \
>>>> +        }                                                         \
>>>> +    } while (0)
>>>> +
>>>> +static int device_listener_add(DeviceState *dev, void *opaque)
>>>> +{
>>>> +    DEVICE_LISTENER_CALL(realize, Forward, dev);
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +void device_listener_register(DeviceListener *listener)
>>>> +{
>>>> +    QTAILQ_INSERT_TAIL(&device_listeners, listener, link);
>>>> +
>>>> +    qbus_walk_children(sysbus_get_default(), NULL, NULL,
>>> device_listener_add,
>>>> +                       NULL, NULL);
>>>> +}
>>>> +
>>>> +void device_listener_unregister(DeviceListener *listener)
>>>> +{
>>>> +    QTAILQ_REMOVE(&device_listeners, listener, link);
>>>> +}
>>>> +
>>>>  static void device_realize(DeviceState *dev, Error **errp)
>>>>  {
>>>>      DeviceClass *dc = DEVICE_GET_CLASS(dev);
>>>> @@ -994,6 +1044,8 @@ static void device_set_realized(Object *obj, bool
>>> value, Error **errp)
>>>>              goto fail;
>>>>          }
>>>>
>>>> +        DEVICE_LISTENER_CALL(realize, Forward, dev);
>>>> +
>>>>          hotplug_ctrl = qdev_get_hotplug_handler(dev);
>>>>          if (hotplug_ctrl) {
>>>>              hotplug_handler_plug(hotplug_ctrl, dev, &local_err);
>>>> @@ -1035,6 +1087,7 @@ static void device_set_realized(Object *obj, bool
>>> value, Error **errp)
>>>>              dc->unrealize(dev, local_errp);
>>>>          }
>>>>          dev->pending_deleted_event = true;
>>>> +        DEVICE_LISTENER_CALL(unrealize, Reverse, dev);
>>>>      }
>>>>
>>>>      if (local_err != NULL) {
>>>> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
>>>> index 589bbe7..15a226f 100644
>>>> --- a/include/hw/qdev-core.h
>>>> +++ b/include/hw/qdev-core.h
>>>> @@ -165,6 +165,12 @@ struct DeviceState {
>>>>      int alias_required_for_version;
>>>>  };
>>>>
>>>> +struct DeviceListener {
>>>> +    void (*realize)(DeviceListener *listener, DeviceState *dev);
>>>> +    void (*unrealize)(DeviceListener *listener, DeviceState *dev);
>>>> +    QTAILQ_ENTRY(DeviceListener) link;
>>>> +};
>>>> +
>>>>  #define TYPE_BUS "bus"
>>>>  #define BUS(obj) OBJECT_CHECK(BusState, (obj), TYPE_BUS)
>>>>  #define BUS_CLASS(klass) OBJECT_CLASS_CHECK(BusClass, (klass),
>>> TYPE_BUS)
>>>> @@ -376,4 +382,8 @@ static inline bool qbus_is_hotpluggable(BusState
>>> *bus)
>>>>  {
>>>>     return bus->hotplug_handler;
>>>>  }
>>>> +
>>>> +void device_listener_register(DeviceListener *listener);
>>>> +void device_listener_unregister(DeviceListener *listener);
>>>> +
>>>>  #endif
>>>> diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
>>>> index 3475177..4bb4938 100644
>>>> --- a/include/qemu/typedefs.h
>>>> +++ b/include/qemu/typedefs.h
>>>> @@ -20,6 +20,7 @@ typedef struct Property Property;
>>>>  typedef struct PropertyInfo PropertyInfo;
>>>>  typedef struct CompatProperty CompatProperty;
>>>>  typedef struct DeviceState DeviceState;
>>>> +typedef struct DeviceListener DeviceListener;
>>>>  typedef struct BusState BusState;
>>>>  typedef struct BusClass BusClass;
>>>>
>>>>
>>>
>>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>>>
>>
>> Great. Thanks for that. All I need now is an ack on this from a maintainer.
> 
> I think my reviewed-by is pretty close to an Ack.
> 
> Stefano, feel free to pick this patch up together with 2/2.

In that case,

Reviewed-by: Andreas Färber <afaerber@suse.de>

If I should still pick it up, just let me know.

Regards,
Andreas

-- 
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 21284 AG Nürnberg


* Re: [Qemu-devel] [PATCH v5 2/2] Xen: Use the ioreq-server API when available
  2014-12-05 10:50 ` [Qemu-devel] [PATCH v5 2/2] Xen: Use the ioreq-server API when available Paul Durrant
@ 2015-01-28 19:32   ` Don Slutz
  2015-01-29  0:05     ` Don Slutz
  0 siblings, 1 reply; 20+ messages in thread
From: Don Slutz @ 2015-01-28 19:32 UTC (permalink / raw)
  To: Paul Durrant, qemu-devel, Stefano Stabellini
  Cc: Peter Maydell, Olaf Hering, Alexey Kardashevskiy, Stefan Weil,
	Michael Tokarev, Alexander Graf, Gerd Hoffmann, Stefan Hajnoczi,
	Paolo Bonzini

On 12/05/14 05:50, Paul Durrant wrote:
> The ioreq-server API added to Xen 4.5 offers better security than
> the existing Xen/QEMU interface because the shared pages that are
> used to pass emulation request/results back and forth are removed
> from the guest's memory space before any requests are serviced.
> This prevents the guest from mapping these pages (they are in a
> well known location) and attempting to attack QEMU by synthesizing
> its own request structures. Hence, this patch modifies configure
> to detect whether the API is available, and adds the necessary
> code to use the API if it is.

This patch (which is now on xenbits qemu staging) is causing me
issues.

So far I have tracked it back to hvm_select_ioreq_server(),
which selects the "default_ioreq_server".  Since I have only
one QEMU, it is both the "default_ioreq_server" and an enabled
2nd ioreq_server.  I am still investigating why my changes
trigger this.  More below.

This patch causes QEMU to only call xc_evtchn_bind_interdomain()
for the enabled 2nd ioreq_server.  So when (if)
hvm_select_ioreq_server() selects the "default_ioreq_server", the
guest hangs on an I/O.

Using the debug key 'e':

(XEN) [2015-01-28 18:57:07] 'e' pressed -> dumping event-channel info
(XEN) [2015-01-28 18:57:07] Event channel information for domain 0:
(XEN) [2015-01-28 18:57:07] Polling vCPUs: {}
(XEN) [2015-01-28 18:57:07]     port [p/m/s]
(XEN) [2015-01-28 18:57:07]        1 [0/0/0]: s=5 n=0 x=0 v=0
(XEN) [2015-01-28 18:57:07]        2 [0/0/0]: s=6 n=0 x=0
(XEN) [2015-01-28 18:57:07]        3 [0/0/0]: s=6 n=0 x=0
(XEN) [2015-01-28 18:57:07]        4 [0/0/0]: s=5 n=0 x=0 v=1
(XEN) [2015-01-28 18:57:07]        5 [0/0/0]: s=6 n=0 x=0
(XEN) [2015-01-28 18:57:07]        6 [0/0/0]: s=6 n=0 x=0
(XEN) [2015-01-28 18:57:07]        7 [0/0/0]: s=5 n=1 x=0 v=0
(XEN) [2015-01-28 18:57:07]        8 [0/0/0]: s=6 n=1 x=0
(XEN) [2015-01-28 18:57:07]        9 [0/0/0]: s=6 n=1 x=0
(XEN) [2015-01-28 18:57:07]       10 [0/0/0]: s=5 n=1 x=0 v=1
(XEN) [2015-01-28 18:57:07]       11 [0/0/0]: s=6 n=1 x=0
(XEN) [2015-01-28 18:57:07]       12 [0/0/0]: s=6 n=1 x=0
(XEN) [2015-01-28 18:57:07]       13 [0/0/0]: s=5 n=2 x=0 v=0
(XEN) [2015-01-28 18:57:07]       14 [0/0/0]: s=6 n=2 x=0
(XEN) [2015-01-28 18:57:07]       15 [0/0/0]: s=6 n=2 x=0
(XEN) [2015-01-28 18:57:07]       16 [0/0/0]: s=5 n=2 x=0 v=1
(XEN) [2015-01-28 18:57:07]       17 [0/0/0]: s=6 n=2 x=0
(XEN) [2015-01-28 18:57:07]       18 [0/0/0]: s=6 n=2 x=0
(XEN) [2015-01-28 18:57:07]       19 [0/0/0]: s=5 n=3 x=0 v=0
(XEN) [2015-01-28 18:57:07]       20 [0/0/0]: s=6 n=3 x=0
(XEN) [2015-01-28 18:57:07]       21 [0/0/0]: s=6 n=3 x=0
(XEN) [2015-01-28 18:57:07]       22 [0/0/0]: s=5 n=3 x=0 v=1
(XEN) [2015-01-28 18:57:07]       23 [0/0/0]: s=6 n=3 x=0
(XEN) [2015-01-28 18:57:07]       24 [0/0/0]: s=6 n=3 x=0
(XEN) [2015-01-28 18:57:07]       25 [0/0/0]: s=5 n=4 x=0 v=0
(XEN) [2015-01-28 18:57:07]       26 [0/0/0]: s=6 n=4 x=0
(XEN) [2015-01-28 18:57:07]       27 [0/0/0]: s=6 n=4 x=0
(XEN) [2015-01-28 18:57:07]       28 [0/0/0]: s=5 n=4 x=0 v=1
(XEN) [2015-01-28 18:57:07]       29 [0/0/0]: s=6 n=4 x=0
(XEN) [2015-01-28 18:57:07]       30 [0/0/0]: s=6 n=4 x=0
(XEN) [2015-01-28 18:57:07]       31 [0/0/0]: s=5 n=5 x=0 v=0
(XEN) [2015-01-28 18:57:07]       32 [0/0/0]: s=6 n=5 x=0
(XEN) [2015-01-28 18:57:07]       33 [0/0/0]: s=6 n=5 x=0
(XEN) [2015-01-28 18:57:07]       34 [0/0/0]: s=5 n=5 x=0 v=1
(XEN) [2015-01-28 18:57:07]       35 [0/0/0]: s=6 n=5 x=0
(XEN) [2015-01-28 18:57:07]       36 [0/0/0]: s=6 n=5 x=0
(XEN) [2015-01-28 18:57:07]       37 [0/0/0]: s=5 n=6 x=0 v=0
(XEN) [2015-01-28 18:57:07]       38 [0/0/0]: s=6 n=6 x=0
(XEN) [2015-01-28 18:57:07]       39 [0/0/0]: s=6 n=6 x=0
(XEN) [2015-01-28 18:57:07]       40 [0/0/0]: s=5 n=6 x=0 v=1
(XEN) [2015-01-28 18:57:07]       41 [0/0/0]: s=6 n=6 x=0
(XEN) [2015-01-28 18:57:07]       42 [0/0/0]: s=6 n=6 x=0
(XEN) [2015-01-28 18:57:07]       43 [0/0/0]: s=5 n=7 x=0 v=0
(XEN) [2015-01-28 18:57:07]       44 [0/0/0]: s=6 n=7 x=0
(XEN) [2015-01-28 18:57:07]       45 [0/0/0]: s=6 n=7 x=0
(XEN) [2015-01-28 18:57:07]       46 [0/0/0]: s=5 n=7 x=0 v=1
(XEN) [2015-01-28 18:57:07]       47 [0/0/0]: s=6 n=7 x=0
(XEN) [2015-01-28 18:57:07]       48 [0/0/0]: s=6 n=7 x=0
(XEN) [2015-01-28 18:57:07]       49 [0/0/0]: s=3 n=0 x=0 d=0 p=58
(XEN) [2015-01-28 18:57:07]       50 [0/0/0]: s=5 n=0 x=0 v=9
(XEN) [2015-01-28 18:57:07]       51 [0/0/0]: s=4 n=0 x=0 p=9 i=9
(XEN) [2015-01-28 18:57:07]       52 [0/0/0]: s=5 n=0 x=0 v=2
(XEN) [2015-01-28 18:57:07]       53 [0/0/0]: s=4 n=4 x=0 p=16 i=16
(XEN) [2015-01-28 18:57:07]       54 [0/0/0]: s=4 n=0 x=0 p=17 i=17
(XEN) [2015-01-28 18:57:07]       55 [0/0/0]: s=4 n=6 x=0 p=18 i=18
(XEN) [2015-01-28 18:57:07]       56 [0/0/0]: s=4 n=0 x=0 p=8 i=8
(XEN) [2015-01-28 18:57:07]       57 [0/0/0]: s=4 n=0 x=0 p=19 i=19
(XEN) [2015-01-28 18:57:07]       58 [0/0/0]: s=3 n=0 x=0 d=0 p=49
(XEN) [2015-01-28 18:57:07]       59 [0/0/0]: s=5 n=0 x=0 v=3
(XEN) [2015-01-28 18:57:07]       60 [0/0/0]: s=5 n=0 x=0 v=4
(XEN) [2015-01-28 18:57:07]       61 [0/0/0]: s=3 n=0 x=0 d=1 p=1
(XEN) [2015-01-28 18:57:07]       62 [0/0/0]: s=3 n=0 x=0 d=1 p=2
(XEN) [2015-01-28 18:57:07]       63 [0/0/0]: s=3 n=0 x=0 d=1 p=3
(XEN) [2015-01-28 18:57:07]       64 [0/0/0]: s=3 n=0 x=0 d=1 p=5
(XEN) [2015-01-28 18:57:07]       65 [0/0/0]: s=3 n=0 x=0 d=1 p=6
(XEN) [2015-01-28 18:57:07]       66 [0/0/0]: s=3 n=0 x=0 d=1 p=7
(XEN) [2015-01-28 18:57:07]       67 [0/0/0]: s=3 n=0 x=0 d=1 p=8
(XEN) [2015-01-28 18:57:07]       68 [0/0/0]: s=3 n=0 x=0 d=1 p=9
(XEN) [2015-01-28 18:57:07]       69 [0/0/0]: s=3 n=0 x=0 d=1 p=4
(XEN) [2015-01-28 18:57:07] Event channel information for domain 1:
(XEN) [2015-01-28 18:57:07] Polling vCPUs: {}
(XEN) [2015-01-28 18:57:07]     port [p/m/s]
(XEN) [2015-01-28 18:57:07]        1 [0/0/0]: s=3 n=0 x=0 d=0 p=61
(XEN) [2015-01-28 18:57:07]        2 [0/0/0]: s=3 n=0 x=0 d=0 p=62
(XEN) [2015-01-28 18:57:07]        3 [0/0/0]: s=3 n=0 x=1 d=0 p=63
(XEN) [2015-01-28 18:57:07]        4 [0/0/0]: s=3 n=0 x=1 d=0 p=69
(XEN) [2015-01-28 18:57:07]        5 [0/0/0]: s=3 n=1 x=1 d=0 p=64
(XEN) [2015-01-28 18:57:07]        6 [0/0/0]: s=3 n=2 x=1 d=0 p=65
(XEN) [2015-01-28 18:57:07]        7 [0/0/0]: s=3 n=3 x=1 d=0 p=66
(XEN) [2015-01-28 18:57:07]        8 [0/0/0]: s=3 n=4 x=1 d=0 p=67
(XEN) [2015-01-28 18:57:07]        9 [0/0/0]: s=3 n=5 x=1 d=0 p=68
(XEN) [2015-01-28 18:57:07]       10 [0/0/0]: s=2 n=0 x=1 d=0
(XEN) [2015-01-28 18:57:07]       11 [0/0/0]: s=2 n=0 x=1 d=0
(XEN) [2015-01-28 18:57:07]       12 [0/0/0]: s=2 n=1 x=1 d=0
(XEN) [2015-01-28 18:57:07]       13 [0/0/0]: s=2 n=2 x=1 d=0
(XEN) [2015-01-28 18:57:07]       14 [0/0/0]: s=2 n=3 x=1 d=0
(XEN) [2015-01-28 18:57:07]       15 [0/0/0]: s=2 n=4 x=1 d=0
(XEN) [2015-01-28 18:57:07]       16 [0/0/0]: s=2 n=5 x=1 d=0

You can see that domain 1 has only half of its event channels
fully set up.  So when (if) hvm_send_assist_req_to_ioreq_server()
does:

            notify_via_xen_event_channel(d, port);

Nothing happens and you hang in hvm_wait_for_io() forever.
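
For what it's worth, the imbalance is easy to quantify from the 'e'
dump above.  A quick throwaway parser (the state-code meanings below
are my reading of the dump format, not taken from Xen headers, so
double-check them against xen/common/event_channel.c):

```python
import re

# Hypothetical mapping of the s= codes seen in the debug-key 'e' output:
# s=2 unbound, s=3 interdomain, s=4 pirq, s=5 virq, s=6 ipi.
STATE_NAMES = {2: "unbound", 3: "interdomain", 4: "pirq", 5: "virq", 6: "ipi"}
LINE_RE = re.compile(r"^\s*(\d+)\s+\[\d/\d/\d\]:\s+s=(\d+)")

def count_states(dump_lines):
    """Return a {state-name: count} histogram for one domain's dump."""
    counts = {}
    for line in dump_lines:
        m = LINE_RE.match(line)
        if m:
            name = STATE_NAMES.get(int(m.group(2)), "other")
            counts[name] = counts.get(name, 0) + 1
    return counts

# A few sample lines from the domain 1 dump above:
dom1 = [
    "   3 [0/0/0]: s=3 n=0 x=1 d=0 p=63",
    "  10 [0/0/0]: s=2 n=0 x=1 d=0",
    "  11 [0/0/0]: s=2 n=0 x=1 d=0",
]
print(count_states(dom1))
```

Run over the full domain 1 dump this shows 9 interdomain-bound ports
against 7 still-unbound ones, which is the "half set up" state.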


This raises two questions:

1) Does this patch cause extra event channels to be created
   that cannot be used?

2) Should the "default_ioreq_server" be deleted?


I'm not sure of the right way to go.

    -Don Slutz


> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Michael Tokarev <mjt@tls.msk.ru>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Cc: Stefan Weil <sw@weilnetz.de>
> Cc: Olaf Hering <olaf@aepfle.de>
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> Cc: Alexander Graf <agraf@suse.de>
> ---
>  configure                   |   29 ++++++
>  include/hw/xen/xen_common.h |  223 +++++++++++++++++++++++++++++++++++++++++++
>  trace-events                |    9 ++
>  xen-hvm.c                   |  160 ++++++++++++++++++++++++++-----
>  4 files changed, 399 insertions(+), 22 deletions(-)
> 
> diff --git a/configure b/configure
> index 47048f0..b1f8c2a 100755
> --- a/configure
> +++ b/configure
> @@ -1877,6 +1877,32 @@ int main(void) {
>    xc_gnttab_open(NULL, 0);
>    xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
>    xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
> +  xc_hvm_create_ioreq_server(xc, 0, 0, NULL);
> +  return 0;
> +}
> +EOF
> +      compile_prog "" "$xen_libs"
> +    then
> +    xen_ctrl_version=450
> +    xen=yes
> +
> +  elif
> +      cat > $TMPC <<EOF &&
> +#include <xenctrl.h>
> +#include <xenstore.h>
> +#include <stdint.h>
> +#include <xen/hvm/hvm_info_table.h>
> +#if !defined(HVM_MAX_VCPUS)
> +# error HVM_MAX_VCPUS not defined
> +#endif
> +int main(void) {
> +  xc_interface *xc;
> +  xs_daemon_open();
> +  xc = xc_interface_open(0, 0, 0);
> +  xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0);
> +  xc_gnttab_open(NULL, 0);
> +  xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
> +  xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
>    return 0;
>  }
>  EOF
> @@ -4283,6 +4309,9 @@ if test -n "$sparc_cpu"; then
>      echo "Target Sparc Arch $sparc_cpu"
>  fi
>  echo "xen support       $xen"
> +if test "$xen" = "yes" ; then
> +  echo "xen ctrl version  $xen_ctrl_version"
> +fi
>  echo "brlapi support    $brlapi"
>  echo "bluez  support    $bluez"
>  echo "Documentation     $docs"
> diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
> index 95612a4..519696f 100644
> --- a/include/hw/xen/xen_common.h
> +++ b/include/hw/xen/xen_common.h
> @@ -16,7 +16,9 @@
>  
>  #include "hw/hw.h"
>  #include "hw/xen/xen.h"
> +#include "hw/pci/pci.h"
>  #include "qemu/queue.h"
> +#include "trace.h"
>  
>  /*
>   * We don't support Xen prior to 3.3.0.
> @@ -179,4 +181,225 @@ static inline int xen_get_vmport_regs_pfn(XenXC xc, domid_t dom,
>  }
>  #endif
>  
> +/* Xen before 4.5 */
> +#if CONFIG_XEN_CTRL_INTERFACE_VERSION < 450
> +
> +#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
> +#define HVM_PARAM_BUFIOREQ_EVTCHN 26
> +#endif
> +
> +#define IOREQ_TYPE_PCI_CONFIG 2
> +
> +typedef uint32_t ioservid_t;
> +
> +static inline void xen_map_memory_section(XenXC xc, domid_t dom,
> +                                          ioservid_t ioservid,
> +                                          MemoryRegionSection *section)
> +{
> +}
> +
> +static inline void xen_unmap_memory_section(XenXC xc, domid_t dom,
> +                                            ioservid_t ioservid,
> +                                            MemoryRegionSection *section)
> +{
> +}
> +
> +static inline void xen_map_io_section(XenXC xc, domid_t dom,
> +                                      ioservid_t ioservid,
> +                                      MemoryRegionSection *section)
> +{
> +}
> +
> +static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
> +                                        ioservid_t ioservid,
> +                                        MemoryRegionSection *section)
> +{
> +}
> +
> +static inline void xen_map_pcidev(XenXC xc, domid_t dom,
> +                                  ioservid_t ioservid,
> +                                  PCIDevice *pci_dev)
> +{
> +}
> +
> +static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
> +                                    ioservid_t ioservid,
> +                                    PCIDevice *pci_dev)
> +{
> +}
> +
> +static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
> +                                          ioservid_t *ioservid)
> +{
> +    return 0;
> +}
> +
> +static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
> +                                            ioservid_t ioservid)
> +{
> +}
> +
> +static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
> +                                            ioservid_t ioservid,
> +                                            xen_pfn_t *ioreq_pfn,
> +                                            xen_pfn_t *bufioreq_pfn,
> +                                            evtchn_port_t *bufioreq_evtchn)
> +{
> +    unsigned long param;
> +    int rc;
> +
> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_IOREQ_PFN, &param);
> +    if (rc < 0) {
> +        fprintf(stderr, "failed to get HVM_PARAM_IOREQ_PFN\n");
> +        return -1;
> +    }
> +
> +    *ioreq_pfn = param;
> +
> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_BUFIOREQ_PFN, &param);
> +    if (rc < 0) {
> +        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_PFN\n");
> +        return -1;
> +    }
> +
> +    *bufioreq_pfn = param;
> +
> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_BUFIOREQ_EVTCHN,
> +                          &param);
> +    if (rc < 0) {
> +        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_EVTCHN\n");
> +        return -1;
> +    }
> +
> +    *bufioreq_evtchn = param;
> +
> +    return 0;
> +}
> +
> +static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
> +                                             ioservid_t ioservid,
> +                                             bool enable)
> +{
> +    return 0;
> +}
> +
> +/* Xen 4.5 */
> +#else
> +
> +static inline void xen_map_memory_section(XenXC xc, domid_t dom,
> +                                          ioservid_t ioservid,
> +                                          MemoryRegionSection *section)
> +{
> +    hwaddr start_addr = section->offset_within_address_space;
> +    ram_addr_t size = int128_get64(section->size);
> +    hwaddr end_addr = start_addr + size - 1;
> +
> +    trace_xen_map_mmio_range(ioservid, start_addr, end_addr);
> +    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 1,
> +                                        start_addr, end_addr);
> +}
> +
> +static inline void xen_unmap_memory_section(XenXC xc, domid_t dom,
> +                                            ioservid_t ioservid,
> +                                            MemoryRegionSection *section)
> +{
> +    hwaddr start_addr = section->offset_within_address_space;
> +    ram_addr_t size = int128_get64(section->size);
> +    hwaddr end_addr = start_addr + size - 1;
> +
> +    trace_xen_unmap_mmio_range(ioservid, start_addr, end_addr);
> +    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 1,
> +                                            start_addr, end_addr);
> +}
> +
> +static inline void xen_map_io_section(XenXC xc, domid_t dom,
> +                                      ioservid_t ioservid,
> +                                      MemoryRegionSection *section)
> +{
> +    hwaddr start_addr = section->offset_within_address_space;
> +    ram_addr_t size = int128_get64(section->size);
> +    hwaddr end_addr = start_addr + size - 1;
> +
> +    trace_xen_map_portio_range(ioservid, start_addr, end_addr);
> +    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 0,
> +                                        start_addr, end_addr);
> +}
> +
> +static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
> +                                        ioservid_t ioservid,
> +                                        MemoryRegionSection *section)
> +{
> +    hwaddr start_addr = section->offset_within_address_space;
> +    ram_addr_t size = int128_get64(section->size);
> +    hwaddr end_addr = start_addr + size - 1;
> +
> +    trace_xen_unmap_portio_range(ioservid, start_addr, end_addr);
> +    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 0,
> +                                            start_addr, end_addr);
> +}
> +
> +static inline void xen_map_pcidev(XenXC xc, domid_t dom,
> +                                  ioservid_t ioservid,
> +                                  PCIDevice *pci_dev)
> +{
> +    trace_xen_map_pcidev(ioservid, pci_bus_num(pci_dev->bus),
> +                         PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
> +    xc_hvm_map_pcidev_to_ioreq_server(xc, dom, ioservid,
> +                                      0, pci_bus_num(pci_dev->bus),
> +                                      PCI_SLOT(pci_dev->devfn),
> +                                      PCI_FUNC(pci_dev->devfn));
> +}
> +
> +static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
> +                                    ioservid_t ioservid,
> +                                    PCIDevice *pci_dev)
> +{
> +    trace_xen_unmap_pcidev(ioservid, pci_bus_num(pci_dev->bus),
> +                           PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
> +    xc_hvm_unmap_pcidev_from_ioreq_server(xc, dom, ioservid,
> +                                          0, pci_bus_num(pci_dev->bus),
> +                                          PCI_SLOT(pci_dev->devfn),
> +                                          PCI_FUNC(pci_dev->devfn));
> +}
> +
> +static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
> +                                          ioservid_t *ioservid)
> +{
> +    int rc = xc_hvm_create_ioreq_server(xc, dom, 1, ioservid);
> +
> +    if (rc == 0) {
> +        trace_xen_ioreq_server_create(*ioservid);
> +    }
> +
> +    return rc;
> +}
> +
> +static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
> +                                            ioservid_t ioservid)
> +{
> +    trace_xen_ioreq_server_destroy(ioservid);
> +    xc_hvm_destroy_ioreq_server(xc, dom, ioservid);
> +}
> +
> +static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
> +                                            ioservid_t ioservid,
> +                                            xen_pfn_t *ioreq_pfn,
> +                                            xen_pfn_t *bufioreq_pfn,
> +                                            evtchn_port_t *bufioreq_evtchn)
> +{
> +    return xc_hvm_get_ioreq_server_info(xc, dom, ioservid,
> +                                        ioreq_pfn, bufioreq_pfn,
> +                                        bufioreq_evtchn);
> +}
> +
> +static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
> +                                             ioservid_t ioservid,
> +                                             bool enable)
> +{
> +    trace_xen_ioreq_server_state(ioservid, enable);
> +    return xc_hvm_set_ioreq_server_state(xc, dom, ioservid, enable);
> +}
> +
> +#endif
> +
>  #endif /* QEMU_HW_XEN_COMMON_H */
> diff --git a/trace-events b/trace-events
> index b5722ea..abd1118 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -897,6 +897,15 @@ pvscsi_tx_rings_num_pages(const char* label, uint32_t num) "Number of %s pages:
>  # xen-hvm.c
>  xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested: %#lx, size %#lx"
>  xen_client_set_memory(uint64_t start_addr, unsigned long size, bool log_dirty) "%#"PRIx64" size %#lx, log_dirty %i"
> +xen_ioreq_server_create(uint32_t id) "id: %u"
> +xen_ioreq_server_destroy(uint32_t id) "id: %u"
> +xen_ioreq_server_state(uint32_t id, bool enable) "id: %u: enable: %i"
> +xen_map_mmio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
> +xen_unmap_mmio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
> +xen_map_portio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
> +xen_unmap_portio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
> +xen_map_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id: %u bdf: %02x.%02x.%02x"
> +xen_unmap_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id: %u bdf: %02x.%02x.%02x"
>  
>  # xen-mapcache.c
>  xen_map_cache(uint64_t phys_addr) "want %#"PRIx64
> diff --git a/xen-hvm.c b/xen-hvm.c
> index 7548794..31cb3ca 100644
> --- a/xen-hvm.c
> +++ b/xen-hvm.c
> @@ -85,9 +85,6 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t *shared_page, int vcpu)
>  }
>  #  define FMT_ioreq_size "u"
>  #endif
> -#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
> -#define HVM_PARAM_BUFIOREQ_EVTCHN 26
> -#endif
>  
>  #define BUFFER_IO_MAX_DELAY  100
>  
> @@ -101,6 +98,7 @@ typedef struct XenPhysmap {
>  } XenPhysmap;
>  
>  typedef struct XenIOState {
> +    ioservid_t ioservid;
>      shared_iopage_t *shared_page;
>      shared_vmport_iopage_t *shared_vmport_page;
>      buffered_iopage_t *buffered_io_page;
> @@ -117,6 +115,8 @@ typedef struct XenIOState {
>  
>      struct xs_handle *xenstore;
>      MemoryListener memory_listener;
> +    MemoryListener io_listener;
> +    DeviceListener device_listener;
>      QLIST_HEAD(, XenPhysmap) physmap;
>      hwaddr free_phys_offset;
>      const XenPhysmap *log_for_dirtybit;
> @@ -467,12 +467,23 @@ static void xen_set_memory(struct MemoryListener *listener,
>      bool log_dirty = memory_region_is_logging(section->mr);
>      hvmmem_type_t mem_type;
>  
> +    if (section->mr == &ram_memory) {
> +        return;
> +    } else {
> +        if (add) {
> +            xen_map_memory_section(xen_xc, xen_domid, state->ioservid,
> +                                   section);
> +        } else {
> +            xen_unmap_memory_section(xen_xc, xen_domid, state->ioservid,
> +                                     section);
> +        }
> +    }
> +
>      if (!memory_region_is_ram(section->mr)) {
>          return;
>      }
>  
> -    if (!(section->mr != &ram_memory
> -          && ( (log_dirty && add) || (!log_dirty && !add)))) {
> +    if (log_dirty != add) {
>          return;
>      }
>  
> @@ -515,6 +526,50 @@ static void xen_region_del(MemoryListener *listener,
>      memory_region_unref(section->mr);
>  }
>  
> +static void xen_io_add(MemoryListener *listener,
> +                       MemoryRegionSection *section)
> +{
> +    XenIOState *state = container_of(listener, XenIOState, io_listener);
> +
> +    memory_region_ref(section->mr);
> +
> +    xen_map_io_section(xen_xc, xen_domid, state->ioservid, section);
> +}
> +
> +static void xen_io_del(MemoryListener *listener,
> +                       MemoryRegionSection *section)
> +{
> +    XenIOState *state = container_of(listener, XenIOState, io_listener);
> +
> +    xen_unmap_io_section(xen_xc, xen_domid, state->ioservid, section);
> +
> +    memory_region_unref(section->mr);
> +}
> +
> +static void xen_device_realize(DeviceListener *listener,
> +			       DeviceState *dev)
> +{
> +    XenIOState *state = container_of(listener, XenIOState, device_listener);
> +
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
> +        PCIDevice *pci_dev = PCI_DEVICE(dev);
> +
> +        xen_map_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
> +    }
> +}
> +
> +static void xen_device_unrealize(DeviceListener *listener,
> +				 DeviceState *dev)
> +{
> +    XenIOState *state = container_of(listener, XenIOState, device_listener);
> +
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
> +        PCIDevice *pci_dev = PCI_DEVICE(dev);
> +
> +        xen_unmap_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
> +    }
> +}
> +
>  static void xen_sync_dirty_bitmap(XenIOState *state,
>                                    hwaddr start_addr,
>                                    ram_addr_t size)
> @@ -615,6 +670,17 @@ static MemoryListener xen_memory_listener = {
>      .priority = 10,
>  };
>  
> +static MemoryListener xen_io_listener = {
> +    .region_add = xen_io_add,
> +    .region_del = xen_io_del,
> +    .priority = 10,
> +};
> +
> +static DeviceListener xen_device_listener = {
> +    .realize = xen_device_realize,
> +    .unrealize = xen_device_unrealize,
> +};
> +
>  /* get the ioreq packets from share mem */
>  static ioreq_t *cpu_get_ioreq_from_shared_memory(XenIOState *state, int vcpu)
>  {
> @@ -863,6 +929,27 @@ static void handle_ioreq(XenIOState *state, ioreq_t *req)
>          case IOREQ_TYPE_INVALIDATE:
>              xen_invalidate_map_cache();
>              break;
> +        case IOREQ_TYPE_PCI_CONFIG: {
> +            uint32_t sbdf = req->addr >> 32;
> +            uint32_t val;
> +
> +            /* Fake a write to port 0xCF8 so that
> +             * the config space access will target the
> +             * correct device model.
> +             */
> +            val = (1u << 31) |
> +                  ((req->addr & 0x0f00) << 16) |
> +                  ((sbdf & 0xffff) << 8) |
> +                  (req->addr & 0xfc);
> +            do_outp(0xcf8, 4, val);
> +
> +            /* Now issue the config space access via
> +             * port 0xCFC
> +             */
> +            req->addr = 0xcfc | (req->addr & 0x03);
> +            cpu_ioreq_pio(req);
> +            break;
> +        }
>          default:
>              hw_error("Invalid ioreq type 0x%x\n", req->type);
>      }
> @@ -993,9 +1080,15 @@ static void xen_main_loop_prepare(XenIOState *state)
>  static void xen_hvm_change_state_handler(void *opaque, int running,
>                                           RunState rstate)
>  {
> +    XenIOState *state = opaque;
> +
>      if (running) {
> -        xen_main_loop_prepare((XenIOState *)opaque);
> +        xen_main_loop_prepare(state);
>      }
> +
> +    xen_set_ioreq_server_state(xen_xc, xen_domid,
> +                               state->ioservid,
> +                               (rstate == RUN_STATE_RUNNING));
>  }
>  
>  static void xen_exit_notifier(Notifier *n, void *data)
> @@ -1064,8 +1157,9 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>                   MemoryRegion **ram_memory)
>  {
>      int i, rc;
> -    unsigned long ioreq_pfn;
> -    unsigned long bufioreq_evtchn;
> +    xen_pfn_t ioreq_pfn;
> +    xen_pfn_t bufioreq_pfn;
> +    evtchn_port_t bufioreq_evtchn;
>      XenIOState *state;
>  
>      state = g_malloc0(sizeof (XenIOState));
> @@ -1082,6 +1176,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>          return -1;
>      }
>  
> +    rc = xen_create_ioreq_server(xen_xc, xen_domid, &state->ioservid);
> +    if (rc < 0) {
> +        perror("xen: ioreq server create");
> +        return -1;
> +    }
> +
>      state->exit.notify = xen_exit_notifier;
>      qemu_add_exit_notifier(&state->exit);
>  
> @@ -1091,8 +1191,18 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>      state->wakeup.notify = xen_wakeup_notifier;
>      qemu_register_wakeup_notifier(&state->wakeup);
>  
> -    xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_IOREQ_PFN, &ioreq_pfn);
> +    rc = xen_get_ioreq_server_info(xen_xc, xen_domid, state->ioservid,
> +                                   &ioreq_pfn, &bufioreq_pfn,
> +                                   &bufioreq_evtchn);
> +    if (rc < 0) {
> +        hw_error("failed to get ioreq server info: error %d handle=" XC_INTERFACE_FMT,
> +                 errno, xen_xc);
> +    }
> +
>      DPRINTF("shared page at pfn %lx\n", ioreq_pfn);
> +    DPRINTF("buffered io page at pfn %lx\n", bufioreq_pfn);
> +    DPRINTF("buffered io evtchn is %x\n", bufioreq_evtchn);
> +
>      state->shared_page = xc_map_foreign_range(xen_xc, xen_domid, XC_PAGE_SIZE,
>                                                PROT_READ|PROT_WRITE, ioreq_pfn);
>      if (state->shared_page == NULL) {
> @@ -1114,10 +1224,10 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>          hw_error("get vmport regs pfn returned error %d, rc=%d", errno, rc);
>      }
>  
> -    xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_PFN, &ioreq_pfn);
> -    DPRINTF("buffered io page at pfn %lx\n", ioreq_pfn);
> -    state->buffered_io_page = xc_map_foreign_range(xen_xc, xen_domid, XC_PAGE_SIZE,
> -                                                   PROT_READ|PROT_WRITE, ioreq_pfn);
> +    state->buffered_io_page = xc_map_foreign_range(xen_xc, xen_domid,
> +                                                   XC_PAGE_SIZE,
> +                                                   PROT_READ|PROT_WRITE,
> +                                                   bufioreq_pfn);
>      if (state->buffered_io_page == NULL) {
>          hw_error("map buffered IO page returned error %d", errno);
>      }
> @@ -1125,6 +1235,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>      /* Note: cpus is empty at this point in init */
>      state->cpu_by_vcpu_id = g_malloc0(max_cpus * sizeof(CPUState *));
>  
> +    rc = xen_set_ioreq_server_state(xen_xc, xen_domid, state->ioservid, true);
> +    if (rc < 0) {
> +        hw_error("failed to enable ioreq server info: error %d handle=" XC_INTERFACE_FMT,
> +                 errno, xen_xc);
> +    }
> +
>      state->ioreq_local_port = g_malloc0(max_cpus * sizeof (evtchn_port_t));
>  
>      /* FIXME: how about if we overflow the page here? */
> @@ -1132,22 +1248,16 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>          rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
>                                          xen_vcpu_eport(state->shared_page, i));
>          if (rc == -1) {
> -            fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
> +            fprintf(stderr, "shared evtchn %d bind error %d\n", i, errno);
>              return -1;
>          }
>          state->ioreq_local_port[i] = rc;
>      }
>  
> -    rc = xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_EVTCHN,
> -            &bufioreq_evtchn);
> -    if (rc < 0) {
> -        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_EVTCHN\n");
> -        return -1;
> -    }
>      rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
> -            (uint32_t)bufioreq_evtchn);
> +                                    bufioreq_evtchn);
>      if (rc == -1) {
> -        fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
> +        fprintf(stderr, "buffered evtchn bind error %d\n", errno);
>          return -1;
>      }
>      state->bufioreq_local_port = rc;
> @@ -1163,6 +1273,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>      memory_listener_register(&state->memory_listener, &address_space_memory);
>      state->log_for_dirtybit = NULL;
>  
> +    state->io_listener = xen_io_listener;
> +    memory_listener_register(&state->io_listener, &address_space_io);
> +
> +    state->device_listener = xen_device_listener;
> +    device_listener_register(&state->device_listener);
> +
>      /* Initialize backend core & drivers */
>      if (xen_be_init() != 0) {
>          fprintf(stderr, "%s: xen backend core setup failed\n", __FUNCTION__);
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/2] Xen: Use the ioreq-server API when available
  2015-01-28 19:32   ` Don Slutz
@ 2015-01-29  0:05     ` Don Slutz
  2015-01-29  0:57       ` Don Slutz
  0 siblings, 1 reply; 20+ messages in thread
From: Don Slutz @ 2015-01-29  0:05 UTC (permalink / raw)
  To: Don Slutz, Paul Durrant, qemu-devel, Stefano Stabellini
  Cc: Peter Maydell, Olaf Hering, Alexey Kardashevskiy, Stefan Weil,
	Michael Tokarev, Alexander Graf, Gerd Hoffmann, Stefan Hajnoczi,
	Paolo Bonzini

On 01/28/15 14:32, Don Slutz wrote:
> On 12/05/14 05:50, Paul Durrant wrote:
>> The ioreq-server API added to Xen 4.5 offers better security than
>> the existing Xen/QEMU interface because the shared pages that are
>> used to pass emulation request/results back and forth are removed
>> from the guest's memory space before any requests are serviced.
>> This prevents the guest from mapping these pages (they are in a
>> well known location) and attempting to attack QEMU by synthesizing
>> its own request structures. Hence, this patch modifies configure
>> to detect whether the API is available, and adds the necessary
>> code to use the API if it is.
> 
> This patch (which is now on xenbits qemu staging) is causing me
> issues.
> 

I have found the key.

The following will reproduce my issue:

1) xl create -p <config>
2) read one of HVM_PARAM_IOREQ_PFN, HVM_PARAM_BUFIOREQ_PFN, or
   HVM_PARAM_BUFIOREQ_EVTCHN
3) xl unpause new guest

The guest will hang in hvmloader.
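
My understanding of the mechanism, as a toy sketch (the class, method
names and port bookkeeping here are made up for illustration; this is
not Xen's actual code): reading any of those HVM params before the
unpause makes Xen lazily create the legacy "default" ioreq server and
its event channels, but a QEMU built with the ioreq-server API only
binds the ports of the server it explicitly created.  Any request that
hvm_select_ioreq_server() routes to the default server then notifies a
port nobody is listening on:

```python
# Toy model of the hang; hypothetical names, not Xen's real structures.
class Domain:
    def __init__(self):
        self.servers = {}          # server name -> its event-channel ports
        self.bound_ports = set()   # ports QEMU has bound via evtchn

    def get_hvm_param(self, param):
        # Reading IOREQ_PFN / BUFIOREQ_PFN / BUFIOREQ_EVTCHN before
        # unpause lazily instantiates the "default" ioreq server.
        self.servers.setdefault("default", ["p1", "p2"])
        return 0

    def create_ioreq_server(self):
        # What an ioreq-server-aware QEMU does at init.
        self.servers["qemu"] = ["p3", "p4"]
        return "qemu"

    def qemu_bind(self, server):
        self.bound_ports.update(self.servers[server])

    def send_assist_req(self, server):
        # notify_via_xen_event_channel(): the notification only reaches
        # QEMU if the port is bound; otherwise the vCPU waits forever.
        port = self.servers[server][0]
        return port in self.bound_ports  # False -> hang in hvm_wait_for_io()

d = Domain()
d.get_hvm_param("HVM_PARAM_IOREQ_PFN")   # step 2 of the repro
sid = d.create_ioreq_server()            # QEMU with the new API
d.qemu_bind(sid)                         # binds only its own ports
print(d.send_assist_req("qemu"))         # serviced
print(d.send_assist_req("default"))      # never serviced -> guest hangs
```

Which is why the repro needs both the early param read (to create the
default server) and a QEMU that no longer binds the default ports.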

More in thread:


Subject: Re: [Xen-devel] [PATCH] ioreq-server: handle
IOREQ_TYPE_PCI_CONFIG in assist function
References: <1422385589-17316-1-git-send-email-wei.liu2@citrix.com>


    -Don Slutz


> So far I have tracked it back to hvm_select_ioreq_server()
> which selects the "default_ioreq_server".  Since I have only one
> QEMU, it is both the "default_ioreq_server" and an enabled
> 2nd ioreq_server.  I am continuing to investigate why my changes
> are causing this.  More below.
> 
> This patch causes QEMU to only call xc_evtchn_bind_interdomain()
> for the enabled 2nd ioreq_server.  So when (if)
> hvm_select_ioreq_server() selects the "default_ioreq_server", the
> guest hangs on an I/O.
> 
> Using the debug key 'e':
> 
> (XEN) [2015-01-28 18:57:07] 'e' pressed -> dumping event-channel info
> (XEN) [2015-01-28 18:57:07] Event channel information for domain 0:
> (XEN) [2015-01-28 18:57:07] Polling vCPUs: {}
> (XEN) [2015-01-28 18:57:07]     port [p/m/s]
> (XEN) [2015-01-28 18:57:07]        1 [0/0/0]: s=5 n=0 x=0 v=0
> (XEN) [2015-01-28 18:57:07]        2 [0/0/0]: s=6 n=0 x=0
> (XEN) [2015-01-28 18:57:07]        3 [0/0/0]: s=6 n=0 x=0
> (XEN) [2015-01-28 18:57:07]        4 [0/0/0]: s=5 n=0 x=0 v=1
> (XEN) [2015-01-28 18:57:07]        5 [0/0/0]: s=6 n=0 x=0
> (XEN) [2015-01-28 18:57:07]        6 [0/0/0]: s=6 n=0 x=0
> (XEN) [2015-01-28 18:57:07]        7 [0/0/0]: s=5 n=1 x=0 v=0
> (XEN) [2015-01-28 18:57:07]        8 [0/0/0]: s=6 n=1 x=0
> (XEN) [2015-01-28 18:57:07]        9 [0/0/0]: s=6 n=1 x=0
> (XEN) [2015-01-28 18:57:07]       10 [0/0/0]: s=5 n=1 x=0 v=1
> (XEN) [2015-01-28 18:57:07]       11 [0/0/0]: s=6 n=1 x=0
> (XEN) [2015-01-28 18:57:07]       12 [0/0/0]: s=6 n=1 x=0
> (XEN) [2015-01-28 18:57:07]       13 [0/0/0]: s=5 n=2 x=0 v=0
> (XEN) [2015-01-28 18:57:07]       14 [0/0/0]: s=6 n=2 x=0
> (XEN) [2015-01-28 18:57:07]       15 [0/0/0]: s=6 n=2 x=0
> (XEN) [2015-01-28 18:57:07]       16 [0/0/0]: s=5 n=2 x=0 v=1
> (XEN) [2015-01-28 18:57:07]       17 [0/0/0]: s=6 n=2 x=0
> (XEN) [2015-01-28 18:57:07]       18 [0/0/0]: s=6 n=2 x=0
> (XEN) [2015-01-28 18:57:07]       19 [0/0/0]: s=5 n=3 x=0 v=0
> (XEN) [2015-01-28 18:57:07]       20 [0/0/0]: s=6 n=3 x=0
> (XEN) [2015-01-28 18:57:07]       21 [0/0/0]: s=6 n=3 x=0
> (XEN) [2015-01-28 18:57:07]       22 [0/0/0]: s=5 n=3 x=0 v=1
> (XEN) [2015-01-28 18:57:07]       23 [0/0/0]: s=6 n=3 x=0
> (XEN) [2015-01-28 18:57:07]       24 [0/0/0]: s=6 n=3 x=0
> (XEN) [2015-01-28 18:57:07]       25 [0/0/0]: s=5 n=4 x=0 v=0
> (XEN) [2015-01-28 18:57:07]       26 [0/0/0]: s=6 n=4 x=0
> (XEN) [2015-01-28 18:57:07]       27 [0/0/0]: s=6 n=4 x=0
> (XEN) [2015-01-28 18:57:07]       28 [0/0/0]: s=5 n=4 x=0 v=1
> (XEN) [2015-01-28 18:57:07]       29 [0/0/0]: s=6 n=4 x=0
> (XEN) [2015-01-28 18:57:07]       30 [0/0/0]: s=6 n=4 x=0
> (XEN) [2015-01-28 18:57:07]       31 [0/0/0]: s=5 n=5 x=0 v=0
> (XEN) [2015-01-28 18:57:07]       32 [0/0/0]: s=6 n=5 x=0
> (XEN) [2015-01-28 18:57:07]       33 [0/0/0]: s=6 n=5 x=0
> (XEN) [2015-01-28 18:57:07]       34 [0/0/0]: s=5 n=5 x=0 v=1
> (XEN) [2015-01-28 18:57:07]       35 [0/0/0]: s=6 n=5 x=0
> (XEN) [2015-01-28 18:57:07]       36 [0/0/0]: s=6 n=5 x=0
> (XEN) [2015-01-28 18:57:07]       37 [0/0/0]: s=5 n=6 x=0 v=0
> (XEN) [2015-01-28 18:57:07]       38 [0/0/0]: s=6 n=6 x=0
> (XEN) [2015-01-28 18:57:07]       39 [0/0/0]: s=6 n=6 x=0
> (XEN) [2015-01-28 18:57:07]       40 [0/0/0]: s=5 n=6 x=0 v=1
> (XEN) [2015-01-28 18:57:07]       41 [0/0/0]: s=6 n=6 x=0
> (XEN) [2015-01-28 18:57:07]       42 [0/0/0]: s=6 n=6 x=0
> (XEN) [2015-01-28 18:57:07]       43 [0/0/0]: s=5 n=7 x=0 v=0
> (XEN) [2015-01-28 18:57:07]       44 [0/0/0]: s=6 n=7 x=0
> (XEN) [2015-01-28 18:57:07]       45 [0/0/0]: s=6 n=7 x=0
> (XEN) [2015-01-28 18:57:07]       46 [0/0/0]: s=5 n=7 x=0 v=1
> (XEN) [2015-01-28 18:57:07]       47 [0/0/0]: s=6 n=7 x=0
> (XEN) [2015-01-28 18:57:07]       48 [0/0/0]: s=6 n=7 x=0
> (XEN) [2015-01-28 18:57:07]       49 [0/0/0]: s=3 n=0 x=0 d=0 p=58
> (XEN) [2015-01-28 18:57:07]       50 [0/0/0]: s=5 n=0 x=0 v=9
> (XEN) [2015-01-28 18:57:07]       51 [0/0/0]: s=4 n=0 x=0 p=9 i=9
> (XEN) [2015-01-28 18:57:07]       52 [0/0/0]: s=5 n=0 x=0 v=2
> (XEN) [2015-01-28 18:57:07]       53 [0/0/0]: s=4 n=4 x=0 p=16 i=16
> (XEN) [2015-01-28 18:57:07]       54 [0/0/0]: s=4 n=0 x=0 p=17 i=17
> (XEN) [2015-01-28 18:57:07]       55 [0/0/0]: s=4 n=6 x=0 p=18 i=18
> (XEN) [2015-01-28 18:57:07]       56 [0/0/0]: s=4 n=0 x=0 p=8 i=8
> (XEN) [2015-01-28 18:57:07]       57 [0/0/0]: s=4 n=0 x=0 p=19 i=19
> (XEN) [2015-01-28 18:57:07]       58 [0/0/0]: s=3 n=0 x=0 d=0 p=49
> (XEN) [2015-01-28 18:57:07]       59 [0/0/0]: s=5 n=0 x=0 v=3
> (XEN) [2015-01-28 18:57:07]       60 [0/0/0]: s=5 n=0 x=0 v=4
> (XEN) [2015-01-28 18:57:07]       61 [0/0/0]: s=3 n=0 x=0 d=1 p=1
> (XEN) [2015-01-28 18:57:07]       62 [0/0/0]: s=3 n=0 x=0 d=1 p=2
> (XEN) [2015-01-28 18:57:07]       63 [0/0/0]: s=3 n=0 x=0 d=1 p=3
> (XEN) [2015-01-28 18:57:07]       64 [0/0/0]: s=3 n=0 x=0 d=1 p=5
> (XEN) [2015-01-28 18:57:07]       65 [0/0/0]: s=3 n=0 x=0 d=1 p=6
> (XEN) [2015-01-28 18:57:07]       66 [0/0/0]: s=3 n=0 x=0 d=1 p=7
> (XEN) [2015-01-28 18:57:07]       67 [0/0/0]: s=3 n=0 x=0 d=1 p=8
> (XEN) [2015-01-28 18:57:07]       68 [0/0/0]: s=3 n=0 x=0 d=1 p=9
> (XEN) [2015-01-28 18:57:07]       69 [0/0/0]: s=3 n=0 x=0 d=1 p=4
> (XEN) [2015-01-28 18:57:07] Event channel information for domain 1:
> (XEN) [2015-01-28 18:57:07] Polling vCPUs: {}
> (XEN) [2015-01-28 18:57:07]     port [p/m/s]
> (XEN) [2015-01-28 18:57:07]        1 [0/0/0]: s=3 n=0 x=0 d=0 p=61
> (XEN) [2015-01-28 18:57:07]        2 [0/0/0]: s=3 n=0 x=0 d=0 p=62
> (XEN) [2015-01-28 18:57:07]        3 [0/0/0]: s=3 n=0 x=1 d=0 p=63
> (XEN) [2015-01-28 18:57:07]        4 [0/0/0]: s=3 n=0 x=1 d=0 p=69
> (XEN) [2015-01-28 18:57:07]        5 [0/0/0]: s=3 n=1 x=1 d=0 p=64
> (XEN) [2015-01-28 18:57:07]        6 [0/0/0]: s=3 n=2 x=1 d=0 p=65
> (XEN) [2015-01-28 18:57:07]        7 [0/0/0]: s=3 n=3 x=1 d=0 p=66
> (XEN) [2015-01-28 18:57:07]        8 [0/0/0]: s=3 n=4 x=1 d=0 p=67
> (XEN) [2015-01-28 18:57:07]        9 [0/0/0]: s=3 n=5 x=1 d=0 p=68
> (XEN) [2015-01-28 18:57:07]       10 [0/0/0]: s=2 n=0 x=1 d=0
> (XEN) [2015-01-28 18:57:07]       11 [0/0/0]: s=2 n=0 x=1 d=0
> (XEN) [2015-01-28 18:57:07]       12 [0/0/0]: s=2 n=1 x=1 d=0
> (XEN) [2015-01-28 18:57:07]       13 [0/0/0]: s=2 n=2 x=1 d=0
> (XEN) [2015-01-28 18:57:07]       14 [0/0/0]: s=2 n=3 x=1 d=0
> (XEN) [2015-01-28 18:57:07]       15 [0/0/0]: s=2 n=4 x=1 d=0
> (XEN) [2015-01-28 18:57:07]       16 [0/0/0]: s=2 n=5 x=1 d=0
> 
> You can see that domain 1 has only half of its event channels
> fully set up.  So when (if) hvm_send_assist_req_to_ioreq_server()
> does:
> 
>             notify_via_xen_event_channel(d, port);
> 
> Nothing happens and you hang in hvm_wait_for_io() forever.
> 
> 
> This does raise the questions:
> 
> 1) Does this patch cause extra event channels to be created
>    that cannot be used?
> 
> 2) Should the "default_ioreq_server" be deleted?
> 
> 
> Not sure of the right way to go.
> 
>     -Don Slutz
> 
> 
>>
>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>> Cc: Peter Maydell <peter.maydell@linaro.org>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Michael Tokarev <mjt@tls.msk.ru>
>> Cc: Stefan Hajnoczi <stefanha@redhat.com>
>> Cc: Stefan Weil <sw@weilnetz.de>
>> Cc: Olaf Hering <olaf@aepfle.de>
>> Cc: Gerd Hoffmann <kraxel@redhat.com>
>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>> Cc: Alexander Graf <agraf@suse.de>
>> ---
>>  configure                   |   29 ++++++
>>  include/hw/xen/xen_common.h |  223 +++++++++++++++++++++++++++++++++++++++++++
>>  trace-events                |    9 ++
>>  xen-hvm.c                   |  160 ++++++++++++++++++++++++++-----
>>  4 files changed, 399 insertions(+), 22 deletions(-)
>>
>> diff --git a/configure b/configure
>> index 47048f0..b1f8c2a 100755
>> --- a/configure
>> +++ b/configure
>> @@ -1877,6 +1877,32 @@ int main(void) {
>>    xc_gnttab_open(NULL, 0);
>>    xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
>>    xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
>> +  xc_hvm_create_ioreq_server(xc, 0, 0, NULL);
>> +  return 0;
>> +}
>> +EOF
>> +      compile_prog "" "$xen_libs"
>> +    then
>> +    xen_ctrl_version=450
>> +    xen=yes
>> +
>> +  elif
>> +      cat > $TMPC <<EOF &&
>> +#include <xenctrl.h>
>> +#include <xenstore.h>
>> +#include <stdint.h>
>> +#include <xen/hvm/hvm_info_table.h>
>> +#if !defined(HVM_MAX_VCPUS)
>> +# error HVM_MAX_VCPUS not defined
>> +#endif
>> +int main(void) {
>> +  xc_interface *xc;
>> +  xs_daemon_open();
>> +  xc = xc_interface_open(0, 0, 0);
>> +  xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0);
>> +  xc_gnttab_open(NULL, 0);
>> +  xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
>> +  xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
>>    return 0;
>>  }
>>  EOF
>> @@ -4283,6 +4309,9 @@ if test -n "$sparc_cpu"; then
>>      echo "Target Sparc Arch $sparc_cpu"
>>  fi
>>  echo "xen support       $xen"
>> +if test "$xen" = "yes" ; then
>> +  echo "xen ctrl version  $xen_ctrl_version"
>> +fi
>>  echo "brlapi support    $brlapi"
>>  echo "bluez  support    $bluez"
>>  echo "Documentation     $docs"
>> diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
>> index 95612a4..519696f 100644
>> --- a/include/hw/xen/xen_common.h
>> +++ b/include/hw/xen/xen_common.h
>> @@ -16,7 +16,9 @@
>>  
>>  #include "hw/hw.h"
>>  #include "hw/xen/xen.h"
>> +#include "hw/pci/pci.h"
>>  #include "qemu/queue.h"
>> +#include "trace.h"
>>  
>>  /*
>>   * We don't support Xen prior to 3.3.0.
>> @@ -179,4 +181,225 @@ static inline int xen_get_vmport_regs_pfn(XenXC xc, domid_t dom,
>>  }
>>  #endif
>>  
>> +/* Xen before 4.5 */
>> +#if CONFIG_XEN_CTRL_INTERFACE_VERSION < 450
>> +
>> +#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
>> +#define HVM_PARAM_BUFIOREQ_EVTCHN 26
>> +#endif
>> +
>> +#define IOREQ_TYPE_PCI_CONFIG 2
>> +
>> +typedef uint32_t ioservid_t;
>> +
>> +static inline void xen_map_memory_section(XenXC xc, domid_t dom,
>> +                                          ioservid_t ioservid,
>> +                                          MemoryRegionSection *section)
>> +{
>> +}
>> +
>> +static inline void xen_unmap_memory_section(XenXC xc, domid_t dom,
>> +                                            ioservid_t ioservid,
>> +                                            MemoryRegionSection *section)
>> +{
>> +}
>> +
>> +static inline void xen_map_io_section(XenXC xc, domid_t dom,
>> +                                      ioservid_t ioservid,
>> +                                      MemoryRegionSection *section)
>> +{
>> +}
>> +
>> +static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
>> +                                        ioservid_t ioservid,
>> +                                        MemoryRegionSection *section)
>> +{
>> +}
>> +
>> +static inline void xen_map_pcidev(XenXC xc, domid_t dom,
>> +                                  ioservid_t ioservid,
>> +                                  PCIDevice *pci_dev)
>> +{
>> +}
>> +
>> +static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
>> +                                    ioservid_t ioservid,
>> +                                    PCIDevice *pci_dev)
>> +{
>> +}
>> +
>> +static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
>> +                                          ioservid_t *ioservid)
>> +{
>> +    return 0;
>> +}
>> +
>> +static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
>> +                                            ioservid_t ioservid)
>> +{
>> +}
>> +
>> +static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
>> +                                            ioservid_t ioservid,
>> +                                            xen_pfn_t *ioreq_pfn,
>> +                                            xen_pfn_t *bufioreq_pfn,
>> +                                            evtchn_port_t *bufioreq_evtchn)
>> +{
>> +    unsigned long param;
>> +    int rc;
>> +
>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_IOREQ_PFN, &param);
>> +    if (rc < 0) {
>> +        fprintf(stderr, "failed to get HVM_PARAM_IOREQ_PFN\n");
>> +        return -1;
>> +    }
>> +
>> +    *ioreq_pfn = param;
>> +
>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_BUFIOREQ_PFN, &param);
>> +    if (rc < 0) {
>> +        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_PFN\n");
>> +        return -1;
>> +    }
>> +
>> +    *bufioreq_pfn = param;
>> +
>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_BUFIOREQ_EVTCHN,
>> +                          &param);
>> +    if (rc < 0) {
>> +        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_EVTCHN\n");
>> +        return -1;
>> +    }
>> +
>> +    *bufioreq_evtchn = param;
>> +
>> +    return 0;
>> +}
>> +
>> +static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
>> +                                             ioservid_t ioservid,
>> +                                             bool enable)
>> +{
>> +    return 0;
>> +}
>> +
>> +/* Xen 4.5 */
>> +#else
>> +
>> +static inline void xen_map_memory_section(XenXC xc, domid_t dom,
>> +                                          ioservid_t ioservid,
>> +                                          MemoryRegionSection *section)
>> +{
>> +    hwaddr start_addr = section->offset_within_address_space;
>> +    ram_addr_t size = int128_get64(section->size);
>> +    hwaddr end_addr = start_addr + size - 1;
>> +
>> +    trace_xen_map_mmio_range(ioservid, start_addr, end_addr);
>> +    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 1,
>> +                                        start_addr, end_addr);
>> +}
>> +
>> +static inline void xen_unmap_memory_section(XenXC xc, domid_t dom,
>> +                                            ioservid_t ioservid,
>> +                                            MemoryRegionSection *section)
>> +{
>> +    hwaddr start_addr = section->offset_within_address_space;
>> +    ram_addr_t size = int128_get64(section->size);
>> +    hwaddr end_addr = start_addr + size - 1;
>> +
>> +    trace_xen_unmap_mmio_range(ioservid, start_addr, end_addr);
>> +    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 1,
>> +                                            start_addr, end_addr);
>> +}
>> +
>> +static inline void xen_map_io_section(XenXC xc, domid_t dom,
>> +                                      ioservid_t ioservid,
>> +                                      MemoryRegionSection *section)
>> +{
>> +    hwaddr start_addr = section->offset_within_address_space;
>> +    ram_addr_t size = int128_get64(section->size);
>> +    hwaddr end_addr = start_addr + size - 1;
>> +
>> +    trace_xen_map_portio_range(ioservid, start_addr, end_addr);
>> +    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 0,
>> +                                        start_addr, end_addr);
>> +}
>> +
>> +static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
>> +                                        ioservid_t ioservid,
>> +                                        MemoryRegionSection *section)
>> +{
>> +    hwaddr start_addr = section->offset_within_address_space;
>> +    ram_addr_t size = int128_get64(section->size);
>> +    hwaddr end_addr = start_addr + size - 1;
>> +
>> +    trace_xen_unmap_portio_range(ioservid, start_addr, end_addr);
>> +    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 0,
>> +                                            start_addr, end_addr);
>> +}
>> +
>> +static inline void xen_map_pcidev(XenXC xc, domid_t dom,
>> +                                  ioservid_t ioservid,
>> +                                  PCIDevice *pci_dev)
>> +{
>> +    trace_xen_map_pcidev(ioservid, pci_bus_num(pci_dev->bus),
>> +                         PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
>> +    xc_hvm_map_pcidev_to_ioreq_server(xc, dom, ioservid,
>> +                                      0, pci_bus_num(pci_dev->bus),
>> +                                      PCI_SLOT(pci_dev->devfn),
>> +                                      PCI_FUNC(pci_dev->devfn));
>> +}
>> +
>> +static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
>> +                                    ioservid_t ioservid,
>> +                                    PCIDevice *pci_dev)
>> +{
>> +    trace_xen_unmap_pcidev(ioservid, pci_bus_num(pci_dev->bus),
>> +                           PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
>> +    xc_hvm_unmap_pcidev_from_ioreq_server(xc, dom, ioservid,
>> +                                          0, pci_bus_num(pci_dev->bus),
>> +                                          PCI_SLOT(pci_dev->devfn),
>> +                                          PCI_FUNC(pci_dev->devfn));
>> +}
>> +
>> +static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
>> +                                          ioservid_t *ioservid)
>> +{
>> +    int rc = xc_hvm_create_ioreq_server(xc, dom, 1, ioservid);
>> +
>> +    if (rc == 0) {
>> +        trace_xen_ioreq_server_create(*ioservid);
>> +    }
>> +
>> +    return rc;
>> +}
>> +
>> +static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
>> +                                            ioservid_t ioservid)
>> +{
>> +    trace_xen_ioreq_server_destroy(ioservid);
>> +    xc_hvm_destroy_ioreq_server(xc, dom, ioservid);
>> +}
>> +
>> +static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
>> +                                            ioservid_t ioservid,
>> +                                            xen_pfn_t *ioreq_pfn,
>> +                                            xen_pfn_t *bufioreq_pfn,
>> +                                            evtchn_port_t *bufioreq_evtchn)
>> +{
>> +    return xc_hvm_get_ioreq_server_info(xc, dom, ioservid,
>> +                                        ioreq_pfn, bufioreq_pfn,
>> +                                        bufioreq_evtchn);
>> +}
>> +
>> +static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
>> +                                             ioservid_t ioservid,
>> +                                             bool enable)
>> +{
>> +    trace_xen_ioreq_server_state(ioservid, enable);
>> +    return xc_hvm_set_ioreq_server_state(xc, dom, ioservid, enable);
>> +}
>> +
>> +#endif
>> +
>>  #endif /* QEMU_HW_XEN_COMMON_H */
>> diff --git a/trace-events b/trace-events
>> index b5722ea..abd1118 100644
>> --- a/trace-events
>> +++ b/trace-events
>> @@ -897,6 +897,15 @@ pvscsi_tx_rings_num_pages(const char* label, uint32_t num) "Number of %s pages:
>>  # xen-hvm.c
>>  xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested: %#lx, size %#lx"
>>  xen_client_set_memory(uint64_t start_addr, unsigned long size, bool log_dirty) "%#"PRIx64" size %#lx, log_dirty %i"
>> +xen_ioreq_server_create(uint32_t id) "id: %u"
>> +xen_ioreq_server_destroy(uint32_t id) "id: %u"
>> +xen_ioreq_server_state(uint32_t id, bool enable) "id: %u: enable: %i"
>> +xen_map_mmio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>> +xen_unmap_mmio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>> +xen_map_portio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>> +xen_unmap_portio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>> +xen_map_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id: %u bdf: %02x.%02x.%02x"
>> +xen_unmap_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id: %u bdf: %02x.%02x.%02x"
>>  
>>  # xen-mapcache.c
>>  xen_map_cache(uint64_t phys_addr) "want %#"PRIx64
>> diff --git a/xen-hvm.c b/xen-hvm.c
>> index 7548794..31cb3ca 100644
>> --- a/xen-hvm.c
>> +++ b/xen-hvm.c
>> @@ -85,9 +85,6 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t *shared_page, int vcpu)
>>  }
>>  #  define FMT_ioreq_size "u"
>>  #endif
>> -#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
>> -#define HVM_PARAM_BUFIOREQ_EVTCHN 26
>> -#endif
>>  
>>  #define BUFFER_IO_MAX_DELAY  100
>>  
>> @@ -101,6 +98,7 @@ typedef struct XenPhysmap {
>>  } XenPhysmap;
>>  
>>  typedef struct XenIOState {
>> +    ioservid_t ioservid;
>>      shared_iopage_t *shared_page;
>>      shared_vmport_iopage_t *shared_vmport_page;
>>      buffered_iopage_t *buffered_io_page;
>> @@ -117,6 +115,8 @@ typedef struct XenIOState {
>>  
>>      struct xs_handle *xenstore;
>>      MemoryListener memory_listener;
>> +    MemoryListener io_listener;
>> +    DeviceListener device_listener;
>>      QLIST_HEAD(, XenPhysmap) physmap;
>>      hwaddr free_phys_offset;
>>      const XenPhysmap *log_for_dirtybit;
>> @@ -467,12 +467,23 @@ static void xen_set_memory(struct MemoryListener *listener,
>>      bool log_dirty = memory_region_is_logging(section->mr);
>>      hvmmem_type_t mem_type;
>>  
>> +    if (section->mr == &ram_memory) {
>> +        return;
>> +    } else {
>> +        if (add) {
>> +            xen_map_memory_section(xen_xc, xen_domid, state->ioservid,
>> +                                   section);
>> +        } else {
>> +            xen_unmap_memory_section(xen_xc, xen_domid, state->ioservid,
>> +                                     section);
>> +        }
>> +    }
>> +
>>      if (!memory_region_is_ram(section->mr)) {
>>          return;
>>      }
>>  
>> -    if (!(section->mr != &ram_memory
>> -          && ( (log_dirty && add) || (!log_dirty && !add)))) {
>> +    if (log_dirty != add) {
>>          return;
>>      }
>>  
>> @@ -515,6 +526,50 @@ static void xen_region_del(MemoryListener *listener,
>>      memory_region_unref(section->mr);
>>  }
>>  
>> +static void xen_io_add(MemoryListener *listener,
>> +                       MemoryRegionSection *section)
>> +{
>> +    XenIOState *state = container_of(listener, XenIOState, io_listener);
>> +
>> +    memory_region_ref(section->mr);
>> +
>> +    xen_map_io_section(xen_xc, xen_domid, state->ioservid, section);
>> +}
>> +
>> +static void xen_io_del(MemoryListener *listener,
>> +                       MemoryRegionSection *section)
>> +{
>> +    XenIOState *state = container_of(listener, XenIOState, io_listener);
>> +
>> +    xen_unmap_io_section(xen_xc, xen_domid, state->ioservid, section);
>> +
>> +    memory_region_unref(section->mr);
>> +}
>> +
>> +static void xen_device_realize(DeviceListener *listener,
>> +			       DeviceState *dev)
>> +{
>> +    XenIOState *state = container_of(listener, XenIOState, device_listener);
>> +
>> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
>> +        PCIDevice *pci_dev = PCI_DEVICE(dev);
>> +
>> +        xen_map_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
>> +    }
>> +}
>> +
>> +static void xen_device_unrealize(DeviceListener *listener,
>> +				 DeviceState *dev)
>> +{
>> +    XenIOState *state = container_of(listener, XenIOState, device_listener);
>> +
>> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
>> +        PCIDevice *pci_dev = PCI_DEVICE(dev);
>> +
>> +        xen_unmap_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
>> +    }
>> +}
>> +
>>  static void xen_sync_dirty_bitmap(XenIOState *state,
>>                                    hwaddr start_addr,
>>                                    ram_addr_t size)
>> @@ -615,6 +670,17 @@ static MemoryListener xen_memory_listener = {
>>      .priority = 10,
>>  };
>>  
>> +static MemoryListener xen_io_listener = {
>> +    .region_add = xen_io_add,
>> +    .region_del = xen_io_del,
>> +    .priority = 10,
>> +};
>> +
>> +static DeviceListener xen_device_listener = {
>> +    .realize = xen_device_realize,
>> +    .unrealize = xen_device_unrealize,
>> +};
>> +
>>  /* get the ioreq packets from share mem */
>>  static ioreq_t *cpu_get_ioreq_from_shared_memory(XenIOState *state, int vcpu)
>>  {
>> @@ -863,6 +929,27 @@ static void handle_ioreq(XenIOState *state, ioreq_t *req)
>>          case IOREQ_TYPE_INVALIDATE:
>>              xen_invalidate_map_cache();
>>              break;
>> +        case IOREQ_TYPE_PCI_CONFIG: {
>> +            uint32_t sbdf = req->addr >> 32;
>> +            uint32_t val;
>> +
>> +            /* Fake a write to port 0xCF8 so that
>> +             * the config space access will target the
>> +             * correct device model.
>> +             */
>> +            val = (1u << 31) |
>> +                  ((req->addr & 0x0f00) << 16) |
>> +                  ((sbdf & 0xffff) << 8) |
>> +                  (req->addr & 0xfc);
>> +            do_outp(0xcf8, 4, val);
>> +
>> +            /* Now issue the config space access via
>> +             * port 0xCFC
>> +             */
>> +            req->addr = 0xcfc | (req->addr & 0x03);
>> +            cpu_ioreq_pio(req);
>> +            break;
>> +        }
>>          default:
>>              hw_error("Invalid ioreq type 0x%x\n", req->type);
>>      }
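The 0xCF8 value built in the IOREQ_TYPE_PCI_CONFIG case above can be
sanity-checked outside QEMU. Here is a minimal sketch of the same encoding
(illustrative only; the SBDF and offset below are made-up example values,
not taken from the patch):

```python
def cf8_address(req_addr):
    """Mirror the patch's fake 0xCF8 write: req_addr is the 64-bit ioreq
    address, whose upper 32 bits hold the SBDF and whose low bits hold
    the config-space offset."""
    sbdf = (req_addr >> 32) & 0xffffffff
    return ((1 << 31) |                    # enable bit
            ((req_addr & 0x0f00) << 16) |  # extended register bits 11:8
            ((sbdf & 0xffff) << 8) |       # bus/dev/func in bits 23:8
            (req_addr & 0xfc))             # dword-aligned register offset

# Example: bus 0, device 1, function 0 (bdf 0x0008), offset 0x04
req_addr = (0x0008 << 32) | 0x04
assert cf8_address(req_addr) == 0x80000804
# The data access then goes to 0xCFC plus the byte offset within the dword:
assert 0xcfc | (req_addr & 0x03) == 0xcfc
```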
>> @@ -993,9 +1080,15 @@ static void xen_main_loop_prepare(XenIOState *state)
>>  static void xen_hvm_change_state_handler(void *opaque, int running,
>>                                           RunState rstate)
>>  {
>> +    XenIOState *state = opaque;
>> +
>>      if (running) {
>> -        xen_main_loop_prepare((XenIOState *)opaque);
>> +        xen_main_loop_prepare(state);
>>      }
>> +
>> +    xen_set_ioreq_server_state(xen_xc, xen_domid,
>> +                               state->ioservid,
>> +                               (rstate == RUN_STATE_RUNNING));
>>  }
>>  
>>  static void xen_exit_notifier(Notifier *n, void *data)
>> @@ -1064,8 +1157,9 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>                   MemoryRegion **ram_memory)
>>  {
>>      int i, rc;
>> -    unsigned long ioreq_pfn;
>> -    unsigned long bufioreq_evtchn;
>> +    xen_pfn_t ioreq_pfn;
>> +    xen_pfn_t bufioreq_pfn;
>> +    evtchn_port_t bufioreq_evtchn;
>>      XenIOState *state;
>>  
>>      state = g_malloc0(sizeof (XenIOState));
>> @@ -1082,6 +1176,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>          return -1;
>>      }
>>  
>> +    rc = xen_create_ioreq_server(xen_xc, xen_domid, &state->ioservid);
>> +    if (rc < 0) {
>> +        perror("xen: ioreq server create");
>> +        return -1;
>> +    }
>> +
>>      state->exit.notify = xen_exit_notifier;
>>      qemu_add_exit_notifier(&state->exit);
>>  
>> @@ -1091,8 +1191,18 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>      state->wakeup.notify = xen_wakeup_notifier;
>>      qemu_register_wakeup_notifier(&state->wakeup);
>>  
>> -    xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_IOREQ_PFN, &ioreq_pfn);
>> +    rc = xen_get_ioreq_server_info(xen_xc, xen_domid, state->ioservid,
>> +                                   &ioreq_pfn, &bufioreq_pfn,
>> +                                   &bufioreq_evtchn);
>> +    if (rc < 0) {
>> +        hw_error("failed to get ioreq server info: error %d handle=" XC_INTERFACE_FMT,
>> +                 errno, xen_xc);
>> +    }
>> +
>>      DPRINTF("shared page at pfn %lx\n", ioreq_pfn);
>> +    DPRINTF("buffered io page at pfn %lx\n", bufioreq_pfn);
>> +    DPRINTF("buffered io evtchn is %x\n", bufioreq_evtchn);
>> +
>>      state->shared_page = xc_map_foreign_range(xen_xc, xen_domid, XC_PAGE_SIZE,
>>                                                PROT_READ|PROT_WRITE, ioreq_pfn);
>>      if (state->shared_page == NULL) {
>> @@ -1114,10 +1224,10 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>          hw_error("get vmport regs pfn returned error %d, rc=%d", errno, rc);
>>      }
>>  
>> -    xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_PFN, &ioreq_pfn);
>> -    DPRINTF("buffered io page at pfn %lx\n", ioreq_pfn);
>> -    state->buffered_io_page = xc_map_foreign_range(xen_xc, xen_domid, XC_PAGE_SIZE,
>> -                                                   PROT_READ|PROT_WRITE, ioreq_pfn);
>> +    state->buffered_io_page = xc_map_foreign_range(xen_xc, xen_domid,
>> +                                                   XC_PAGE_SIZE,
>> +                                                   PROT_READ|PROT_WRITE,
>> +                                                   bufioreq_pfn);
>>      if (state->buffered_io_page == NULL) {
>>          hw_error("map buffered IO page returned error %d", errno);
>>      }
>> @@ -1125,6 +1235,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>      /* Note: cpus is empty at this point in init */
>>      state->cpu_by_vcpu_id = g_malloc0(max_cpus * sizeof(CPUState *));
>>  
>> +    rc = xen_set_ioreq_server_state(xen_xc, xen_domid, state->ioservid, true);
>> +    if (rc < 0) {
>> +        hw_error("failed to enable ioreq server info: error %d handle=" XC_INTERFACE_FMT,
>> +                 errno, xen_xc);
>> +    }
>> +
>>      state->ioreq_local_port = g_malloc0(max_cpus * sizeof (evtchn_port_t));
>>  
>>      /* FIXME: how about if we overflow the page here? */
>> @@ -1132,22 +1248,16 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>          rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
>>                                          xen_vcpu_eport(state->shared_page, i));
>>          if (rc == -1) {
>> -            fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
>> +            fprintf(stderr, "shared evtchn %d bind error %d\n", i, errno);
>>              return -1;
>>          }
>>          state->ioreq_local_port[i] = rc;
>>      }
>>  
>> -    rc = xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_EVTCHN,
>> -            &bufioreq_evtchn);
>> -    if (rc < 0) {
>> -        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_EVTCHN\n");
>> -        return -1;
>> -    }
>>      rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
>> -            (uint32_t)bufioreq_evtchn);
>> +                                    bufioreq_evtchn);
>>      if (rc == -1) {
>> -        fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
>> +        fprintf(stderr, "buffered evtchn bind error %d\n", errno);
>>          return -1;
>>      }
>>      state->bufioreq_local_port = rc;
>> @@ -1163,6 +1273,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>      memory_listener_register(&state->memory_listener, &address_space_memory);
>>      state->log_for_dirtybit = NULL;
>>  
>> +    state->io_listener = xen_io_listener;
>> +    memory_listener_register(&state->io_listener, &address_space_io);
>> +
>> +    state->device_listener = xen_device_listener;
>> +    device_listener_register(&state->device_listener);
>> +
>>      /* Initialize backend core & drivers */
>>      if (xen_be_init() != 0) {
>>          fprintf(stderr, "%s: xen backend core setup failed\n", __FUNCTION__);
>>
> 


* Re: [Qemu-devel] [PATCH v5 2/2] Xen: Use the ioreq-server API when available
  2015-01-29  0:05     ` Don Slutz
@ 2015-01-29  0:57       ` Don Slutz
  2015-01-29 12:09         ` Paul Durrant
  0 siblings, 1 reply; 20+ messages in thread
From: Don Slutz @ 2015-01-29  0:57 UTC (permalink / raw)
  To: Don Slutz, Paul Durrant, qemu-devel, Stefano Stabellini
  Cc: Peter Maydell, Olaf Hering, Alexey Kardashevskiy, Stefan Weil,
	Michael Tokarev, Alexander Graf, Gerd Hoffmann, Stefan Hajnoczi,
	Paolo Bonzini



On 01/28/15 19:05, Don Slutz wrote:
> On 01/28/15 14:32, Don Slutz wrote:
>> On 12/05/14 05:50, Paul Durrant wrote:
>>> The ioreq-server API added to Xen 4.5 offers better security than
>>> the existing Xen/QEMU interface because the shared pages that are
>>> used to pass emulation request/results back and forth are removed
>>> from the guest's memory space before any requests are serviced.
>>> This prevents the guest from mapping these pages (they are in a
>>> well known location) and attempting to attack QEMU by synthesizing
>>> its own request structures. Hence, this patch modifies configure
>>> to detect whether the API is available, and adds the necessary
>>> code to use the API if it is.
>>
>> This patch (which is now on xenbits qemu staging) is causing me
>> issues.
>>
> 
> I have found the key.
> 
> The following will reproduce my issue:
> 
> 1) xl create -p <config>
> 2) read one of HVM_PARAM_IOREQ_PFN, HVM_PARAM_BUFIOREQ_PFN, or
>    HVM_PARAM_BUFIOREQ_EVTCHN
> 3) xl unpause new guest
> 
> The guest will hang in hvmloader.
> 
> More in thread:
> 
> 
> Subject: Re: [Xen-devel] [PATCH] ioreq-server: handle
> IOREQ_TYPE_PCI_CONFIG in assist function
> References: <1422385589-17316-1-git-send-email-wei.liu2@citrix.com>
> 
> 

Oops, it does not make sense to include what I have found in that thread.

Here is the info I was going to send there:


Using QEMU upstream master (or xenbits qemu staging), you do not have a
default ioreq server, so hvm_select_ioreq_server() returns NULL for
hvmloader's I/O request to:

CPU4  0 (+       0)  HANDLE_PIO [ port = 0x0cfe size = 2 dir = 1 ]

(I added this xentrace to figure out what is happening, and I have
a lot of data about it, if anyone wants it.)
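For anyone sifting through similar traces, a throwaway parser for lines of
this shape (the field layout is assumed from the sample above; adjust the
pattern to your xentrace formatter):

```python
import re

# Matches e.g. "HANDLE_PIO [ port = 0x0cfe size = 2 dir = 1 ]"
HANDLE_PIO_RE = re.compile(
    r"HANDLE_PIO \[ port = (0x[0-9a-f]+) size = (\d+) dir = (\d+) \]")

def parse_handle_pio(line):
    """Extract (port, size, dir) from a formatted HANDLE_PIO trace line;
    dir = 1 appears to be a read in this sample."""
    m = HANDLE_PIO_RE.search(line)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2)), int(m.group(3))

line = "CPU4  0 (+       0)  HANDLE_PIO [ port = 0x0cfe size = 2 dir = 1 ]"
assert parse_handle_pio(line) == (0x0cfe, 2, 1)
```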

To get a guest hang, instead of hvm_complete_assist_req() being called
for some of hvmloader's pci_read() calls, you can do the following:


1) xl create -p <config>
2) read one of HVM_PARAM_IOREQ_PFN, HVM_PARAM_BUFIOREQ_PFN, or
   HVM_PARAM_BUFIOREQ_EVTCHN
3) xl unpause new guest

The guest will hang in hvmloader.

The read of HVM_PARAM_IOREQ_PFN will cause a default ioreq server to
be created and requests directed at the upstream QEMU, which is not
a default ioreq server.  This read also creates the extra event channels
that I see.

I see that hvmop_create_ioreq_server() prevents you from creating
an is_default ioreq_server, so QEMU is not able to do so.

Not sure where we go from here.

   -Don Slutz


>     -Don Slutz
> 
> 
>> So far I have tracked it back to hvm_select_ioreq_server()
>> which selects the "default_ioreq_server".  Since I have only one
>> QEMU, it is both the "default_ioreq_server" and an enabled
>> 2nd ioreq_server.  I am continuing to understand why my changes
>> are causing this.  More below.
>>
>> This patch causes QEMU to only call xc_evtchn_bind_interdomain()
>> for the enabled 2nd ioreq_server.  So when (if)
>> hvm_select_ioreq_server() selects the "default_ioreq_server", the
>> guest hangs on an I/O.
>>
>> Using the debug key 'e':
>>
>> (XEN) [2015-01-28 18:57:07] 'e' pressed -> dumping event-channel info
>> (XEN) [2015-01-28 18:57:07] Event channel information for domain 0:
>> (XEN) [2015-01-28 18:57:07] Polling vCPUs: {}
>> (XEN) [2015-01-28 18:57:07]     port [p/m/s]
>> (XEN) [2015-01-28 18:57:07]        1 [0/0/0]: s=5 n=0 x=0 v=0
>> (XEN) [2015-01-28 18:57:07]        2 [0/0/0]: s=6 n=0 x=0
>> (XEN) [2015-01-28 18:57:07]        3 [0/0/0]: s=6 n=0 x=0
>> (XEN) [2015-01-28 18:57:07]        4 [0/0/0]: s=5 n=0 x=0 v=1
>> (XEN) [2015-01-28 18:57:07]        5 [0/0/0]: s=6 n=0 x=0
>> (XEN) [2015-01-28 18:57:07]        6 [0/0/0]: s=6 n=0 x=0
>> (XEN) [2015-01-28 18:57:07]        7 [0/0/0]: s=5 n=1 x=0 v=0
>> (XEN) [2015-01-28 18:57:07]        8 [0/0/0]: s=6 n=1 x=0
>> (XEN) [2015-01-28 18:57:07]        9 [0/0/0]: s=6 n=1 x=0
>> (XEN) [2015-01-28 18:57:07]       10 [0/0/0]: s=5 n=1 x=0 v=1
>> (XEN) [2015-01-28 18:57:07]       11 [0/0/0]: s=6 n=1 x=0
>> (XEN) [2015-01-28 18:57:07]       12 [0/0/0]: s=6 n=1 x=0
>> (XEN) [2015-01-28 18:57:07]       13 [0/0/0]: s=5 n=2 x=0 v=0
>> (XEN) [2015-01-28 18:57:07]       14 [0/0/0]: s=6 n=2 x=0
>> (XEN) [2015-01-28 18:57:07]       15 [0/0/0]: s=6 n=2 x=0
>> (XEN) [2015-01-28 18:57:07]       16 [0/0/0]: s=5 n=2 x=0 v=1
>> (XEN) [2015-01-28 18:57:07]       17 [0/0/0]: s=6 n=2 x=0
>> (XEN) [2015-01-28 18:57:07]       18 [0/0/0]: s=6 n=2 x=0
>> (XEN) [2015-01-28 18:57:07]       19 [0/0/0]: s=5 n=3 x=0 v=0
>> (XEN) [2015-01-28 18:57:07]       20 [0/0/0]: s=6 n=3 x=0
>> (XEN) [2015-01-28 18:57:07]       21 [0/0/0]: s=6 n=3 x=0
>> (XEN) [2015-01-28 18:57:07]       22 [0/0/0]: s=5 n=3 x=0 v=1
>> (XEN) [2015-01-28 18:57:07]       23 [0/0/0]: s=6 n=3 x=0
>> (XEN) [2015-01-28 18:57:07]       24 [0/0/0]: s=6 n=3 x=0
>> (XEN) [2015-01-28 18:57:07]       25 [0/0/0]: s=5 n=4 x=0 v=0
>> (XEN) [2015-01-28 18:57:07]       26 [0/0/0]: s=6 n=4 x=0
>> (XEN) [2015-01-28 18:57:07]       27 [0/0/0]: s=6 n=4 x=0
>> (XEN) [2015-01-28 18:57:07]       28 [0/0/0]: s=5 n=4 x=0 v=1
>> (XEN) [2015-01-28 18:57:07]       29 [0/0/0]: s=6 n=4 x=0
>> (XEN) [2015-01-28 18:57:07]       30 [0/0/0]: s=6 n=4 x=0
>> (XEN) [2015-01-28 18:57:07]       31 [0/0/0]: s=5 n=5 x=0 v=0
>> (XEN) [2015-01-28 18:57:07]       32 [0/0/0]: s=6 n=5 x=0
>> (XEN) [2015-01-28 18:57:07]       33 [0/0/0]: s=6 n=5 x=0
>> (XEN) [2015-01-28 18:57:07]       34 [0/0/0]: s=5 n=5 x=0 v=1
>> (XEN) [2015-01-28 18:57:07]       35 [0/0/0]: s=6 n=5 x=0
>> (XEN) [2015-01-28 18:57:07]       36 [0/0/0]: s=6 n=5 x=0
>> (XEN) [2015-01-28 18:57:07]       37 [0/0/0]: s=5 n=6 x=0 v=0
>> (XEN) [2015-01-28 18:57:07]       38 [0/0/0]: s=6 n=6 x=0
>> (XEN) [2015-01-28 18:57:07]       39 [0/0/0]: s=6 n=6 x=0
>> (XEN) [2015-01-28 18:57:07]       40 [0/0/0]: s=5 n=6 x=0 v=1
>> (XEN) [2015-01-28 18:57:07]       41 [0/0/0]: s=6 n=6 x=0
>> (XEN) [2015-01-28 18:57:07]       42 [0/0/0]: s=6 n=6 x=0
>> (XEN) [2015-01-28 18:57:07]       43 [0/0/0]: s=5 n=7 x=0 v=0
>> (XEN) [2015-01-28 18:57:07]       44 [0/0/0]: s=6 n=7 x=0
>> (XEN) [2015-01-28 18:57:07]       45 [0/0/0]: s=6 n=7 x=0
>> (XEN) [2015-01-28 18:57:07]       46 [0/0/0]: s=5 n=7 x=0 v=1
>> (XEN) [2015-01-28 18:57:07]       47 [0/0/0]: s=6 n=7 x=0
>> (XEN) [2015-01-28 18:57:07]       48 [0/0/0]: s=6 n=7 x=0
>> (XEN) [2015-01-28 18:57:07]       49 [0/0/0]: s=3 n=0 x=0 d=0 p=58
>> (XEN) [2015-01-28 18:57:07]       50 [0/0/0]: s=5 n=0 x=0 v=9
>> (XEN) [2015-01-28 18:57:07]       51 [0/0/0]: s=4 n=0 x=0 p=9 i=9
>> (XEN) [2015-01-28 18:57:07]       52 [0/0/0]: s=5 n=0 x=0 v=2
>> (XEN) [2015-01-28 18:57:07]       53 [0/0/0]: s=4 n=4 x=0 p=16 i=16
>> (XEN) [2015-01-28 18:57:07]       54 [0/0/0]: s=4 n=0 x=0 p=17 i=17
>> (XEN) [2015-01-28 18:57:07]       55 [0/0/0]: s=4 n=6 x=0 p=18 i=18
>> (XEN) [2015-01-28 18:57:07]       56 [0/0/0]: s=4 n=0 x=0 p=8 i=8
>> (XEN) [2015-01-28 18:57:07]       57 [0/0/0]: s=4 n=0 x=0 p=19 i=19
>> (XEN) [2015-01-28 18:57:07]       58 [0/0/0]: s=3 n=0 x=0 d=0 p=49
>> (XEN) [2015-01-28 18:57:07]       59 [0/0/0]: s=5 n=0 x=0 v=3
>> (XEN) [2015-01-28 18:57:07]       60 [0/0/0]: s=5 n=0 x=0 v=4
>> (XEN) [2015-01-28 18:57:07]       61 [0/0/0]: s=3 n=0 x=0 d=1 p=1
>> (XEN) [2015-01-28 18:57:07]       62 [0/0/0]: s=3 n=0 x=0 d=1 p=2
>> (XEN) [2015-01-28 18:57:07]       63 [0/0/0]: s=3 n=0 x=0 d=1 p=3
>> (XEN) [2015-01-28 18:57:07]       64 [0/0/0]: s=3 n=0 x=0 d=1 p=5
>> (XEN) [2015-01-28 18:57:07]       65 [0/0/0]: s=3 n=0 x=0 d=1 p=6
>> (XEN) [2015-01-28 18:57:07]       66 [0/0/0]: s=3 n=0 x=0 d=1 p=7
>> (XEN) [2015-01-28 18:57:07]       67 [0/0/0]: s=3 n=0 x=0 d=1 p=8
>> (XEN) [2015-01-28 18:57:07]       68 [0/0/0]: s=3 n=0 x=0 d=1 p=9
>> (XEN) [2015-01-28 18:57:07]       69 [0/0/0]: s=3 n=0 x=0 d=1 p=4
>> (XEN) [2015-01-28 18:57:07] Event channel information for domain 1:
>> (XEN) [2015-01-28 18:57:07] Polling vCPUs: {}
>> (XEN) [2015-01-28 18:57:07]     port [p/m/s]
>> (XEN) [2015-01-28 18:57:07]        1 [0/0/0]: s=3 n=0 x=0 d=0 p=61
>> (XEN) [2015-01-28 18:57:07]        2 [0/0/0]: s=3 n=0 x=0 d=0 p=62
>> (XEN) [2015-01-28 18:57:07]        3 [0/0/0]: s=3 n=0 x=1 d=0 p=63
>> (XEN) [2015-01-28 18:57:07]        4 [0/0/0]: s=3 n=0 x=1 d=0 p=69
>> (XEN) [2015-01-28 18:57:07]        5 [0/0/0]: s=3 n=1 x=1 d=0 p=64
>> (XEN) [2015-01-28 18:57:07]        6 [0/0/0]: s=3 n=2 x=1 d=0 p=65
>> (XEN) [2015-01-28 18:57:07]        7 [0/0/0]: s=3 n=3 x=1 d=0 p=66
>> (XEN) [2015-01-28 18:57:07]        8 [0/0/0]: s=3 n=4 x=1 d=0 p=67
>> (XEN) [2015-01-28 18:57:07]        9 [0/0/0]: s=3 n=5 x=1 d=0 p=68
>> (XEN) [2015-01-28 18:57:07]       10 [0/0/0]: s=2 n=0 x=1 d=0
>> (XEN) [2015-01-28 18:57:07]       11 [0/0/0]: s=2 n=0 x=1 d=0
>> (XEN) [2015-01-28 18:57:07]       12 [0/0/0]: s=2 n=1 x=1 d=0
>> (XEN) [2015-01-28 18:57:07]       13 [0/0/0]: s=2 n=2 x=1 d=0
>> (XEN) [2015-01-28 18:57:07]       14 [0/0/0]: s=2 n=3 x=1 d=0
>> (XEN) [2015-01-28 18:57:07]       15 [0/0/0]: s=2 n=4 x=1 d=0
>> (XEN) [2015-01-28 18:57:07]       16 [0/0/0]: s=2 n=5 x=1 d=0
>>
>> You can see that domain 1 has only half of its event channels
>> fully set up.  So when (if) hvm_send_assist_req_to_ioreq_server()
>> does:
>>
>>             notify_via_xen_event_channel(d, port);
>>
>> Nothing happens and you hang in hvm_wait_for_io() forever.
>>
>>
>> This does raise the questions:
>>
>> 1) Does this patch cause extra event channels to be created
>>    that cannot be used?
>>
>> 2) Should the "default_ioreq_server" be deleted?
>>
>>
>> Not sure the right way to go.
>>
>>     -Don Slutz
>>
>>
>>>
>>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>>> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>>> Cc: Peter Maydell <peter.maydell@linaro.org>
>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>> Cc: Michael Tokarev <mjt@tls.msk.ru>
>>> Cc: Stefan Hajnoczi <stefanha@redhat.com>
>>> Cc: Stefan Weil <sw@weilnetz.de>
>>> Cc: Olaf Hering <olaf@aepfle.de>
>>> Cc: Gerd Hoffmann <kraxel@redhat.com>
>>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> Cc: Alexander Graf <agraf@suse.de>
>>> ---
>>>  configure                   |   29 ++++++
>>>  include/hw/xen/xen_common.h |  223 +++++++++++++++++++++++++++++++++++++++++++
>>>  trace-events                |    9 ++
>>>  xen-hvm.c                   |  160 ++++++++++++++++++++++++++-----
>>>  4 files changed, 399 insertions(+), 22 deletions(-)
>>>
>>> diff --git a/configure b/configure
>>> index 47048f0..b1f8c2a 100755
>>> --- a/configure
>>> +++ b/configure
>>> @@ -1877,6 +1877,32 @@ int main(void) {
>>>    xc_gnttab_open(NULL, 0);
>>>    xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
>>>    xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
>>> +  xc_hvm_create_ioreq_server(xc, 0, 0, NULL);
>>> +  return 0;
>>> +}
>>> +EOF
>>> +      compile_prog "" "$xen_libs"
>>> +    then
>>> +    xen_ctrl_version=450
>>> +    xen=yes
>>> +
>>> +  elif
>>> +      cat > $TMPC <<EOF &&
>>> +#include <xenctrl.h>
>>> +#include <xenstore.h>
>>> +#include <stdint.h>
>>> +#include <xen/hvm/hvm_info_table.h>
>>> +#if !defined(HVM_MAX_VCPUS)
>>> +# error HVM_MAX_VCPUS not defined
>>> +#endif
>>> +int main(void) {
>>> +  xc_interface *xc;
>>> +  xs_daemon_open();
>>> +  xc = xc_interface_open(0, 0, 0);
>>> +  xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0);
>>> +  xc_gnttab_open(NULL, 0);
>>> +  xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
>>> +  xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
>>>    return 0;
>>>  }
>>>  EOF
>>> @@ -4283,6 +4309,9 @@ if test -n "$sparc_cpu"; then
>>>      echo "Target Sparc Arch $sparc_cpu"
>>>  fi
>>>  echo "xen support       $xen"
>>> +if test "$xen" = "yes" ; then
>>> +  echo "xen ctrl version  $xen_ctrl_version"
>>> +fi
>>>  echo "brlapi support    $brlapi"
>>>  echo "bluez  support    $bluez"
>>>  echo "Documentation     $docs"
>>> diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
>>> index 95612a4..519696f 100644
>>> --- a/include/hw/xen/xen_common.h
>>> +++ b/include/hw/xen/xen_common.h
>>> @@ -16,7 +16,9 @@
>>>  
>>>  #include "hw/hw.h"
>>>  #include "hw/xen/xen.h"
>>> +#include "hw/pci/pci.h"
>>>  #include "qemu/queue.h"
>>> +#include "trace.h"
>>>  
>>>  /*
>>>   * We don't support Xen prior to 3.3.0.
>>> @@ -179,4 +181,225 @@ static inline int xen_get_vmport_regs_pfn(XenXC xc, domid_t dom,
>>>  }
>>>  #endif
>>>  
>>> +/* Xen before 4.5 */
>>> +#if CONFIG_XEN_CTRL_INTERFACE_VERSION < 450
>>> +
>>> +#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
>>> +#define HVM_PARAM_BUFIOREQ_EVTCHN 26
>>> +#endif
>>> +
>>> +#define IOREQ_TYPE_PCI_CONFIG 2
>>> +
>>> +typedef uint32_t ioservid_t;
>>> +
>>> +static inline void xen_map_memory_section(XenXC xc, domid_t dom,
>>> +                                          ioservid_t ioservid,
>>> +                                          MemoryRegionSection *section)
>>> +{
>>> +}
>>> +
>>> +static inline void xen_unmap_memory_section(XenXC xc, domid_t dom,
>>> +                                            ioservid_t ioservid,
>>> +                                            MemoryRegionSection *section)
>>> +{
>>> +}
>>> +
>>> +static inline void xen_map_io_section(XenXC xc, domid_t dom,
>>> +                                      ioservid_t ioservid,
>>> +                                      MemoryRegionSection *section)
>>> +{
>>> +}
>>> +
>>> +static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
>>> +                                        ioservid_t ioservid,
>>> +                                        MemoryRegionSection *section)
>>> +{
>>> +}
>>> +
>>> +static inline void xen_map_pcidev(XenXC xc, domid_t dom,
>>> +                                  ioservid_t ioservid,
>>> +                                  PCIDevice *pci_dev)
>>> +{
>>> +}
>>> +
>>> +static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
>>> +                                    ioservid_t ioservid,
>>> +                                    PCIDevice *pci_dev)
>>> +{
>>> +}
>>> +
>>> +static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
>>> +                                          ioservid_t *ioservid)
>>> +{
>>> +    return 0;
>>> +}
>>> +
>>> +static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
>>> +                                            ioservid_t ioservid)
>>> +{
>>> +}
>>> +
>>> +static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
>>> +                                            ioservid_t ioservid,
>>> +                                            xen_pfn_t *ioreq_pfn,
>>> +                                            xen_pfn_t *bufioreq_pfn,
>>> +                                            evtchn_port_t *bufioreq_evtchn)
>>> +{
>>> +    unsigned long param;
>>> +    int rc;
>>> +
>>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_IOREQ_PFN, &param);
>>> +    if (rc < 0) {
>>> +        fprintf(stderr, "failed to get HVM_PARAM_IOREQ_PFN\n");
>>> +        return -1;
>>> +    }
>>> +
>>> +    *ioreq_pfn = param;
>>> +
>>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_BUFIOREQ_PFN, &param);
>>> +    if (rc < 0) {
>>> +        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_PFN\n");
>>> +        return -1;
>>> +    }
>>> +
>>> +    *bufioreq_pfn = param;
>>> +
>>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_BUFIOREQ_EVTCHN,
>>> +                          &param);
>>> +    if (rc < 0) {
>>> +        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_EVTCHN\n");
>>> +        return -1;
>>> +    }
>>> +
>>> +    *bufioreq_evtchn = param;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
>>> +                                             ioservid_t ioservid,
>>> +                                             bool enable)
>>> +{
>>> +    return 0;
>>> +}
>>> +
>>> +/* Xen 4.5 */
>>> +#else
>>> +
>>> +static inline void xen_map_memory_section(XenXC xc, domid_t dom,
>>> +                                          ioservid_t ioservid,
>>> +                                          MemoryRegionSection *section)
>>> +{
>>> +    hwaddr start_addr = section->offset_within_address_space;
>>> +    ram_addr_t size = int128_get64(section->size);
>>> +    hwaddr end_addr = start_addr + size - 1;
>>> +
>>> +    trace_xen_map_mmio_range(ioservid, start_addr, end_addr);
>>> +    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 1,
>>> +                                        start_addr, end_addr);
>>> +}
>>> +
>>> +static inline void xen_unmap_memory_section(XenXC xc, domid_t dom,
>>> +                                            ioservid_t ioservid,
>>> +                                            MemoryRegionSection *section)
>>> +{
>>> +    hwaddr start_addr = section->offset_within_address_space;
>>> +    ram_addr_t size = int128_get64(section->size);
>>> +    hwaddr end_addr = start_addr + size - 1;
>>> +
>>> +    trace_xen_unmap_mmio_range(ioservid, start_addr, end_addr);
>>> +    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 1,
>>> +                                            start_addr, end_addr);
>>> +}
>>> +
>>> +static inline void xen_map_io_section(XenXC xc, domid_t dom,
>>> +                                      ioservid_t ioservid,
>>> +                                      MemoryRegionSection *section)
>>> +{
>>> +    hwaddr start_addr = section->offset_within_address_space;
>>> +    ram_addr_t size = int128_get64(section->size);
>>> +    hwaddr end_addr = start_addr + size - 1;
>>> +
>>> +    trace_xen_map_portio_range(ioservid, start_addr, end_addr);
>>> +    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 0,
>>> +                                        start_addr, end_addr);
>>> +}
>>> +
>>> +static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
>>> +                                        ioservid_t ioservid,
>>> +                                        MemoryRegionSection *section)
>>> +{
>>> +    hwaddr start_addr = section->offset_within_address_space;
>>> +    ram_addr_t size = int128_get64(section->size);
>>> +    hwaddr end_addr = start_addr + size - 1;
>>> +
>>> +    trace_xen_unmap_portio_range(ioservid, start_addr, end_addr);
>>> +    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 0,
>>> +                                            start_addr, end_addr);
>>> +}
>>> +
>>> +static inline void xen_map_pcidev(XenXC xc, domid_t dom,
>>> +                                  ioservid_t ioservid,
>>> +                                  PCIDevice *pci_dev)
>>> +{
>>> +    trace_xen_map_pcidev(ioservid, pci_bus_num(pci_dev->bus),
>>> +                         PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
>>> +    xc_hvm_map_pcidev_to_ioreq_server(xc, dom, ioservid,
>>> +                                      0, pci_bus_num(pci_dev->bus),
>>> +                                      PCI_SLOT(pci_dev->devfn),
>>> +                                      PCI_FUNC(pci_dev->devfn));
>>> +}
>>> +
>>> +static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
>>> +                                    ioservid_t ioservid,
>>> +                                    PCIDevice *pci_dev)
>>> +{
>>> +    trace_xen_unmap_pcidev(ioservid, pci_bus_num(pci_dev->bus),
>>> +                           PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
>>> +    xc_hvm_unmap_pcidev_from_ioreq_server(xc, dom, ioservid,
>>> +                                          0, pci_bus_num(pci_dev->bus),
>>> +                                          PCI_SLOT(pci_dev->devfn),
>>> +                                          PCI_FUNC(pci_dev->devfn));
>>> +}
>>> +
>>> +static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
>>> +                                          ioservid_t *ioservid)
>>> +{
>>> +    int rc = xc_hvm_create_ioreq_server(xc, dom, 1, ioservid);
>>> +
>>> +    if (rc == 0) {
>>> +        trace_xen_ioreq_server_create(*ioservid);
>>> +    }
>>> +
>>> +    return rc;
>>> +}
>>> +
>>> +static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
>>> +                                            ioservid_t ioservid)
>>> +{
>>> +    trace_xen_ioreq_server_destroy(ioservid);
>>> +    xc_hvm_destroy_ioreq_server(xc, dom, ioservid);
>>> +}
>>> +
>>> +static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
>>> +                                            ioservid_t ioservid,
>>> +                                            xen_pfn_t *ioreq_pfn,
>>> +                                            xen_pfn_t *bufioreq_pfn,
>>> +                                            evtchn_port_t *bufioreq_evtchn)
>>> +{
>>> +    return xc_hvm_get_ioreq_server_info(xc, dom, ioservid,
>>> +                                        ioreq_pfn, bufioreq_pfn,
>>> +                                        bufioreq_evtchn);
>>> +}
>>> +
>>> +static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
>>> +                                             ioservid_t ioservid,
>>> +                                             bool enable)
>>> +{
>>> +    trace_xen_ioreq_server_state(ioservid, enable);
>>> +    return xc_hvm_set_ioreq_server_state(xc, dom, ioservid, enable);
>>> +}
>>> +
>>> +#endif
>>> +
>>>  #endif /* QEMU_HW_XEN_COMMON_H */
>>> diff --git a/trace-events b/trace-events
>>> index b5722ea..abd1118 100644
>>> --- a/trace-events
>>> +++ b/trace-events
>>> @@ -897,6 +897,15 @@ pvscsi_tx_rings_num_pages(const char* label, uint32_t num) "Number of %s pages:
>>>  # xen-hvm.c
>>>  xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested: %#lx, size %#lx"
>>>  xen_client_set_memory(uint64_t start_addr, unsigned long size, bool log_dirty) "%#"PRIx64" size %#lx, log_dirty %i"
>>> +xen_ioreq_server_create(uint32_t id) "id: %u"
>>> +xen_ioreq_server_destroy(uint32_t id) "id: %u"
>>> +xen_ioreq_server_state(uint32_t id, bool enable) "id: %u: enable: %i"
>>> +xen_map_mmio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>>> +xen_unmap_mmio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>>> +xen_map_portio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>>> +xen_unmap_portio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>>> +xen_map_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id: %u bdf: %02x.%02x.%02x"
>>> +xen_unmap_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id: %u bdf: %02x.%02x.%02x"
>>>  
>>>  # xen-mapcache.c
>>>  xen_map_cache(uint64_t phys_addr) "want %#"PRIx64
>>> diff --git a/xen-hvm.c b/xen-hvm.c
>>> index 7548794..31cb3ca 100644
>>> --- a/xen-hvm.c
>>> +++ b/xen-hvm.c
>>> @@ -85,9 +85,6 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t *shared_page, int vcpu)
>>>  }
>>>  #  define FMT_ioreq_size "u"
>>>  #endif
>>> -#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
>>> -#define HVM_PARAM_BUFIOREQ_EVTCHN 26
>>> -#endif
>>>  
>>>  #define BUFFER_IO_MAX_DELAY  100
>>>  
>>> @@ -101,6 +98,7 @@ typedef struct XenPhysmap {
>>>  } XenPhysmap;
>>>  
>>>  typedef struct XenIOState {
>>> +    ioservid_t ioservid;
>>>      shared_iopage_t *shared_page;
>>>      shared_vmport_iopage_t *shared_vmport_page;
>>>      buffered_iopage_t *buffered_io_page;
>>> @@ -117,6 +115,8 @@ typedef struct XenIOState {
>>>  
>>>      struct xs_handle *xenstore;
>>>      MemoryListener memory_listener;
>>> +    MemoryListener io_listener;
>>> +    DeviceListener device_listener;
>>>      QLIST_HEAD(, XenPhysmap) physmap;
>>>      hwaddr free_phys_offset;
>>>      const XenPhysmap *log_for_dirtybit;
>>> @@ -467,12 +467,23 @@ static void xen_set_memory(struct MemoryListener *listener,
>>>      bool log_dirty = memory_region_is_logging(section->mr);
>>>      hvmmem_type_t mem_type;
>>>  
>>> +    if (section->mr == &ram_memory) {
>>> +        return;
>>> +    } else {
>>> +        if (add) {
>>> +            xen_map_memory_section(xen_xc, xen_domid, state->ioservid,
>>> +                                   section);
>>> +        } else {
>>> +            xen_unmap_memory_section(xen_xc, xen_domid, state->ioservid,
>>> +                                     section);
>>> +        }
>>> +    }
>>> +
>>>      if (!memory_region_is_ram(section->mr)) {
>>>          return;
>>>      }
>>>  
>>> -    if (!(section->mr != &ram_memory
>>> -          && ( (log_dirty && add) || (!log_dirty && !add)))) {
>>> +    if (log_dirty != add) {
>>>          return;
>>>      }
>>>  
>>> @@ -515,6 +526,50 @@ static void xen_region_del(MemoryListener *listener,
>>>      memory_region_unref(section->mr);
>>>  }
>>>  
>>> +static void xen_io_add(MemoryListener *listener,
>>> +                       MemoryRegionSection *section)
>>> +{
>>> +    XenIOState *state = container_of(listener, XenIOState, io_listener);
>>> +
>>> +    memory_region_ref(section->mr);
>>> +
>>> +    xen_map_io_section(xen_xc, xen_domid, state->ioservid, section);
>>> +}
>>> +
>>> +static void xen_io_del(MemoryListener *listener,
>>> +                       MemoryRegionSection *section)
>>> +{
>>> +    XenIOState *state = container_of(listener, XenIOState, io_listener);
>>> +
>>> +    xen_unmap_io_section(xen_xc, xen_domid, state->ioservid, section);
>>> +
>>> +    memory_region_unref(section->mr);
>>> +}
>>> +
>>> +static void xen_device_realize(DeviceListener *listener,
>>> +			       DeviceState *dev)
>>> +{
>>> +    XenIOState *state = container_of(listener, XenIOState, device_listener);
>>> +
>>> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
>>> +        PCIDevice *pci_dev = PCI_DEVICE(dev);
>>> +
>>> +        xen_map_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
>>> +    }
>>> +}
>>> +
>>> +static void xen_device_unrealize(DeviceListener *listener,
>>> +				 DeviceState *dev)
>>> +{
>>> +    XenIOState *state = container_of(listener, XenIOState, device_listener);
>>> +
>>> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
>>> +        PCIDevice *pci_dev = PCI_DEVICE(dev);
>>> +
>>> +        xen_unmap_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
>>> +    }
>>> +}
>>> +
>>>  static void xen_sync_dirty_bitmap(XenIOState *state,
>>>                                    hwaddr start_addr,
>>>                                    ram_addr_t size)
>>> @@ -615,6 +670,17 @@ static MemoryListener xen_memory_listener = {
>>>      .priority = 10,
>>>  };
>>>  
>>> +static MemoryListener xen_io_listener = {
>>> +    .region_add = xen_io_add,
>>> +    .region_del = xen_io_del,
>>> +    .priority = 10,
>>> +};
>>> +
>>> +static DeviceListener xen_device_listener = {
>>> +    .realize = xen_device_realize,
>>> +    .unrealize = xen_device_unrealize,
>>> +};
>>> +
>>>  /* get the ioreq packets from share mem */
>>>  static ioreq_t *cpu_get_ioreq_from_shared_memory(XenIOState *state, int vcpu)
>>>  {
>>> @@ -863,6 +929,27 @@ static void handle_ioreq(XenIOState *state, ioreq_t *req)
>>>          case IOREQ_TYPE_INVALIDATE:
>>>              xen_invalidate_map_cache();
>>>              break;
>>> +        case IOREQ_TYPE_PCI_CONFIG: {
>>> +            uint32_t sbdf = req->addr >> 32;
>>> +            uint32_t val;
>>> +
>>> +            /* Fake a write to port 0xCF8 so that
>>> +             * the config space access will target the
>>> +             * correct device model.
>>> +             */
>>> +            val = (1u << 31) |
>>> +                  ((req->addr & 0x0f00) << 16) |
>>> +                  ((sbdf & 0xffff) << 8) |
>>> +                  (req->addr & 0xfc);
>>> +            do_outp(0xcf8, 4, val);
>>> +
>>> +            /* Now issue the config space access via
>>> +             * port 0xCFC
>>> +             */
>>> +            req->addr = 0xcfc | (req->addr & 0x03);
>>> +            cpu_ioreq_pio(req);
>>> +            break;
>>> +        }
>>>          default:
>>>              hw_error("Invalid ioreq type 0x%x\n", req->type);
>>>      }
>>> @@ -993,9 +1080,15 @@ static void xen_main_loop_prepare(XenIOState *state)
>>>  static void xen_hvm_change_state_handler(void *opaque, int running,
>>>                                           RunState rstate)
>>>  {
>>> +    XenIOState *state = opaque;
>>> +
>>>      if (running) {
>>> -        xen_main_loop_prepare((XenIOState *)opaque);
>>> +        xen_main_loop_prepare(state);
>>>      }
>>> +
>>> +    xen_set_ioreq_server_state(xen_xc, xen_domid,
>>> +                               state->ioservid,
>>> +                               (rstate == RUN_STATE_RUNNING));
>>>  }
>>>  
>>>  static void xen_exit_notifier(Notifier *n, void *data)
>>> @@ -1064,8 +1157,9 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>                   MemoryRegion **ram_memory)
>>>  {
>>>      int i, rc;
>>> -    unsigned long ioreq_pfn;
>>> -    unsigned long bufioreq_evtchn;
>>> +    xen_pfn_t ioreq_pfn;
>>> +    xen_pfn_t bufioreq_pfn;
>>> +    evtchn_port_t bufioreq_evtchn;
>>>      XenIOState *state;
>>>  
>>>      state = g_malloc0(sizeof (XenIOState));
>>> @@ -1082,6 +1176,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>          return -1;
>>>      }
>>>  
>>> +    rc = xen_create_ioreq_server(xen_xc, xen_domid, &state->ioservid);
>>> +    if (rc < 0) {
>>> +        perror("xen: ioreq server create");
>>> +        return -1;
>>> +    }
>>> +
>>>      state->exit.notify = xen_exit_notifier;
>>>      qemu_add_exit_notifier(&state->exit);
>>>  
>>> @@ -1091,8 +1191,18 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>      state->wakeup.notify = xen_wakeup_notifier;
>>>      qemu_register_wakeup_notifier(&state->wakeup);
>>>  
>>> -    xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_IOREQ_PFN, &ioreq_pfn);
>>> +    rc = xen_get_ioreq_server_info(xen_xc, xen_domid, state->ioservid,
>>> +                                   &ioreq_pfn, &bufioreq_pfn,
>>> +                                   &bufioreq_evtchn);
>>> +    if (rc < 0) {
>>> +        hw_error("failed to get ioreq server info: error %d handle=" XC_INTERFACE_FMT,
>>> +                 errno, xen_xc);
>>> +    }
>>> +
>>>      DPRINTF("shared page at pfn %lx\n", ioreq_pfn);
>>> +    DPRINTF("buffered io page at pfn %lx\n", bufioreq_pfn);
>>> +    DPRINTF("buffered io evtchn is %x\n", bufioreq_evtchn);
>>> +
>>>      state->shared_page = xc_map_foreign_range(xen_xc, xen_domid, XC_PAGE_SIZE,
>>>                                                PROT_READ|PROT_WRITE, ioreq_pfn);
>>>      if (state->shared_page == NULL) {
>>> @@ -1114,10 +1224,10 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>          hw_error("get vmport regs pfn returned error %d, rc=%d", errno, rc);
>>>      }
>>>  
>>> -    xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_PFN, &ioreq_pfn);
>>> -    DPRINTF("buffered io page at pfn %lx\n", ioreq_pfn);
>>> -    state->buffered_io_page = xc_map_foreign_range(xen_xc, xen_domid, XC_PAGE_SIZE,
>>> -                                                   PROT_READ|PROT_WRITE, ioreq_pfn);
>>> +    state->buffered_io_page = xc_map_foreign_range(xen_xc, xen_domid,
>>> +                                                   XC_PAGE_SIZE,
>>> +                                                   PROT_READ|PROT_WRITE,
>>> +                                                   bufioreq_pfn);
>>>      if (state->buffered_io_page == NULL) {
>>>          hw_error("map buffered IO page returned error %d", errno);
>>>      }
>>> @@ -1125,6 +1235,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>      /* Note: cpus is empty at this point in init */
>>>      state->cpu_by_vcpu_id = g_malloc0(max_cpus * sizeof(CPUState *));
>>>  
>>> +    rc = xen_set_ioreq_server_state(xen_xc, xen_domid, state->ioservid, true);
>>> +    if (rc < 0) {
>>> +        hw_error("failed to enable ioreq server info: error %d handle=" XC_INTERFACE_FMT,
>>> +                 errno, xen_xc);
>>> +    }
>>> +
>>>      state->ioreq_local_port = g_malloc0(max_cpus * sizeof (evtchn_port_t));
>>>  
>>>      /* FIXME: how about if we overflow the page here? */
>>> @@ -1132,22 +1248,16 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>          rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
>>>                                          xen_vcpu_eport(state->shared_page, i));
>>>          if (rc == -1) {
>>> -            fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
>>> +            fprintf(stderr, "shared evtchn %d bind error %d\n", i, errno);
>>>              return -1;
>>>          }
>>>          state->ioreq_local_port[i] = rc;
>>>      }
>>>  
>>> -    rc = xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_EVTCHN,
>>> -            &bufioreq_evtchn);
>>> -    if (rc < 0) {
>>> -        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_EVTCHN\n");
>>> -        return -1;
>>> -    }
>>>      rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
>>> -            (uint32_t)bufioreq_evtchn);
>>> +                                    bufioreq_evtchn);
>>>      if (rc == -1) {
>>> -        fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
>>> +        fprintf(stderr, "buffered evtchn bind error %d\n", errno);
>>>          return -1;
>>>      }
>>>      state->bufioreq_local_port = rc;
>>> @@ -1163,6 +1273,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>      memory_listener_register(&state->memory_listener, &address_space_memory);
>>>      state->log_for_dirtybit = NULL;
>>>  
>>> +    state->io_listener = xen_io_listener;
>>> +    memory_listener_register(&state->io_listener, &address_space_io);
>>> +
>>> +    state->device_listener = xen_device_listener;
>>> +    device_listener_register(&state->device_listener);
>>> +
>>>      /* Initialize backend core & drivers */
>>>      if (xen_be_init() != 0) {
>>>          fprintf(stderr, "%s: xen backend core setup failed\n", __FUNCTION__);
>>>
>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/2] Xen: Use the ioreq-server API when available
  2015-01-29  0:57       ` Don Slutz
@ 2015-01-29 12:09         ` Paul Durrant
  2015-01-29 19:14           ` Don Slutz
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Durrant @ 2015-01-29 12:09 UTC (permalink / raw)
  To: Don Slutz, qemu-devel, Stefano Stabellini
  Cc: Peter Maydell, Olaf Hering, Alexey Kardashevskiy, Stefan Weil,
	Michael Tokarev, Alexander Graf, Gerd Hoffmann, Stefan Hajnoczi,
	Paolo Bonzini

> -----Original Message-----
> From: Don Slutz [mailto:dslutz@verizon.com]
> Sent: 29 January 2015 00:58
> To: Don Slutz; Paul Durrant; qemu-devel@nongnu.org; Stefano Stabellini
> Cc: Peter Maydell; Olaf Hering; Alexey Kardashevskiy; Stefan Weil; Michael
> Tokarev; Alexander Graf; Gerd Hoffmann; Stefan Hajnoczi; Paolo Bonzini
> Subject: Re: [Qemu-devel] [PATCH v5 2/2] Xen: Use the ioreq-server API
> when available
> 
> 
> 
> On 01/28/15 19:05, Don Slutz wrote:
> > On 01/28/15 14:32, Don Slutz wrote:
> >> On 12/05/14 05:50, Paul Durrant wrote:
> >>> The ioreq-server API added to Xen 4.5 offers better security than
> >>> the existing Xen/QEMU interface because the shared pages that are
> >>> used to pass emulation request/results back and forth are removed
> >>> from the guest's memory space before any requests are serviced.
> >>> This prevents the guest from mapping these pages (they are in a
> >>> well known location) and attempting to attack QEMU by synthesizing
> >>> its own request structures. Hence, this patch modifies configure
> >>> to detect whether the API is available, and adds the necessary
> >>> code to use the API if it is.
> >>
> >> This patch (which is now on xenbits qemu staging) is causing me
> >> issues.
> >>
> >
> > I have found the key.
> >
> > The following will reproduce my issue:
> >
> > 1) xl create -p <config>
> > 2) read one of HVM_PARAM_IOREQ_PFN, HVM_PARAM_BUFIOREQ_PFN, or
> >    HVM_PARAM_BUFIOREQ_EVTCHN
> > 3) xl unpause new guest
> >
> > The guest will hang in hvmloader.
> >
> > More in thread:
> >
> >
> > Subject: Re: [Xen-devel] [PATCH] ioreq-server: handle
> > IOREQ_TYPE_PCI_CONFIG in assist function
> > References: <1422385589-17316-1-git-send-email-wei.liu2@citrix.com>
> >
> >
> 
> Oops, that thread does not make sense as a place to include what I have found.
> 
> Here is the info I was going to send there:
> 
> 
> Using QEMU upstream master (or xenbits qemu staging), you do not have a
> default ioreq server, and so hvm_select_ioreq_server() returns NULL for
> hvmloader's I/O request to:
> 
> CPU4  0 (+       0)  HANDLE_PIO [ port = 0x0cfe size = 2 dir = 1 ]
> 
> (I added this xentrace to figure out what is happening, and I have
> a lot of data about it, if any one wants it.)
> 
> To get a guest hang instead of calling hvm_complete_assist_req()
> for some of hvmloader's pci_read() calls, you can do the following:
> 
> 
> 1) xl create -p <config>
> 2) read one of HVM_PARAM_IOREQ_PFN, HVM_PARAM_BUFIOREQ_PFN, or
>    HVM_PARAM_BUFIOREQ_EVTCHN
> 3) xl unpause new guest
> 
> The guest will hang in hvmloader.
> 
> The read of HVM_PARAM_IOREQ_PFN will cause a default ioreq server to
> be created and directed to the upstream QEMU, which is not a default
> ioreq server.  This read also creates the extra event channels that
> I see.
> 
> I see that hvmop_create_ioreq_server() prevents you from creating
> an is_default ioreq_server, so QEMU is not able to do so.
> 
> Not sure where we go from here.
> 

Given that IIRC you are using a new dedicated IOREQ type, I think there needs to be something that allows an emulator to register for that IOREQ type. How about adding a new type to those defined for HVMOP_map_io_range_to_ioreq_server for your case? (In your case the start and end values in the hypercall would be meaningless, but it could be used to steer hvm_select_ioreq_server() into sending all emulation requests of your new type to QEMU.)
Actually, such a mechanism could also be used to steer IOREQ_TYPE_TIMEOFFSET requests since, with the new QEMU patches, they are going nowhere. Upstream QEMU (as default) used to ignore them anyway, which is why I didn't bother with such a patch to Xen before, but since you now need one maybe you could add that too?

  Paul

>    -Don Slutz
> 
> 
> >     -Don Slutz
> >
> >
> >> So far I have tracked it back to hvm_select_ioreq_server()
> >> which selects the "default_ioreq_server".  Since I have one 1
> >> QEMU, it is both the "default_ioreq_server" and an enabled
> >> 2nd ioreq_server.  I am continuing to understand why my changes
> >> are causing this.  More below.
> >>
> >> This patch causes QEMU to only call xc_evtchn_bind_interdomain()
> >> for the enabled 2nd ioreq_server.  So when (if)
> >> hvm_select_ioreq_server() selects the "default_ioreq_server", the
> >> guest hangs on an I/O.
> >>
> >> Using the debug key 'e':
> >>
> >> (XEN) [2015-01-28 18:57:07] 'e' pressed -> dumping event-channel info
> >> (XEN) [2015-01-28 18:57:07] Event channel information for domain 0:
> >> (XEN) [2015-01-28 18:57:07] Polling vCPUs: {}
> >> (XEN) [2015-01-28 18:57:07]     port [p/m/s]
> >> (XEN) [2015-01-28 18:57:07]        1 [0/0/0]: s=5 n=0 x=0 v=0
> >> (XEN) [2015-01-28 18:57:07]        2 [0/0/0]: s=6 n=0 x=0
> >> (XEN) [2015-01-28 18:57:07]        3 [0/0/0]: s=6 n=0 x=0
> >> (XEN) [2015-01-28 18:57:07]        4 [0/0/0]: s=5 n=0 x=0 v=1
> >> (XEN) [2015-01-28 18:57:07]        5 [0/0/0]: s=6 n=0 x=0
> >> (XEN) [2015-01-28 18:57:07]        6 [0/0/0]: s=6 n=0 x=0
> >> (XEN) [2015-01-28 18:57:07]        7 [0/0/0]: s=5 n=1 x=0 v=0
> >> (XEN) [2015-01-28 18:57:07]        8 [0/0/0]: s=6 n=1 x=0
> >> (XEN) [2015-01-28 18:57:07]        9 [0/0/0]: s=6 n=1 x=0
> >> (XEN) [2015-01-28 18:57:07]       10 [0/0/0]: s=5 n=1 x=0 v=1
> >> (XEN) [2015-01-28 18:57:07]       11 [0/0/0]: s=6 n=1 x=0
> >> (XEN) [2015-01-28 18:57:07]       12 [0/0/0]: s=6 n=1 x=0
> >> (XEN) [2015-01-28 18:57:07]       13 [0/0/0]: s=5 n=2 x=0 v=0
> >> (XEN) [2015-01-28 18:57:07]       14 [0/0/0]: s=6 n=2 x=0
> >> (XEN) [2015-01-28 18:57:07]       15 [0/0/0]: s=6 n=2 x=0
> >> (XEN) [2015-01-28 18:57:07]       16 [0/0/0]: s=5 n=2 x=0 v=1
> >> (XEN) [2015-01-28 18:57:07]       17 [0/0/0]: s=6 n=2 x=0
> >> (XEN) [2015-01-28 18:57:07]       18 [0/0/0]: s=6 n=2 x=0
> >> (XEN) [2015-01-28 18:57:07]       19 [0/0/0]: s=5 n=3 x=0 v=0
> >> (XEN) [2015-01-28 18:57:07]       20 [0/0/0]: s=6 n=3 x=0
> >> (XEN) [2015-01-28 18:57:07]       21 [0/0/0]: s=6 n=3 x=0
> >> (XEN) [2015-01-28 18:57:07]       22 [0/0/0]: s=5 n=3 x=0 v=1
> >> (XEN) [2015-01-28 18:57:07]       23 [0/0/0]: s=6 n=3 x=0
> >> (XEN) [2015-01-28 18:57:07]       24 [0/0/0]: s=6 n=3 x=0
> >> (XEN) [2015-01-28 18:57:07]       25 [0/0/0]: s=5 n=4 x=0 v=0
> >> (XEN) [2015-01-28 18:57:07]       26 [0/0/0]: s=6 n=4 x=0
> >> (XEN) [2015-01-28 18:57:07]       27 [0/0/0]: s=6 n=4 x=0
> >> (XEN) [2015-01-28 18:57:07]       28 [0/0/0]: s=5 n=4 x=0 v=1
> >> (XEN) [2015-01-28 18:57:07]       29 [0/0/0]: s=6 n=4 x=0
> >> (XEN) [2015-01-28 18:57:07]       30 [0/0/0]: s=6 n=4 x=0
> >> (XEN) [2015-01-28 18:57:07]       31 [0/0/0]: s=5 n=5 x=0 v=0
> >> (XEN) [2015-01-28 18:57:07]       32 [0/0/0]: s=6 n=5 x=0
> >> (XEN) [2015-01-28 18:57:07]       33 [0/0/0]: s=6 n=5 x=0
> >> (XEN) [2015-01-28 18:57:07]       34 [0/0/0]: s=5 n=5 x=0 v=1
> >> (XEN) [2015-01-28 18:57:07]       35 [0/0/0]: s=6 n=5 x=0
> >> (XEN) [2015-01-28 18:57:07]       36 [0/0/0]: s=6 n=5 x=0
> >> (XEN) [2015-01-28 18:57:07]       37 [0/0/0]: s=5 n=6 x=0 v=0
> >> (XEN) [2015-01-28 18:57:07]       38 [0/0/0]: s=6 n=6 x=0
> >> (XEN) [2015-01-28 18:57:07]       39 [0/0/0]: s=6 n=6 x=0
> >> (XEN) [2015-01-28 18:57:07]       40 [0/0/0]: s=5 n=6 x=0 v=1
> >> (XEN) [2015-01-28 18:57:07]       41 [0/0/0]: s=6 n=6 x=0
> >> (XEN) [2015-01-28 18:57:07]       42 [0/0/0]: s=6 n=6 x=0
> >> (XEN) [2015-01-28 18:57:07]       43 [0/0/0]: s=5 n=7 x=0 v=0
> >> (XEN) [2015-01-28 18:57:07]       44 [0/0/0]: s=6 n=7 x=0
> >> (XEN) [2015-01-28 18:57:07]       45 [0/0/0]: s=6 n=7 x=0
> >> (XEN) [2015-01-28 18:57:07]       46 [0/0/0]: s=5 n=7 x=0 v=1
> >> (XEN) [2015-01-28 18:57:07]       47 [0/0/0]: s=6 n=7 x=0
> >> (XEN) [2015-01-28 18:57:07]       48 [0/0/0]: s=6 n=7 x=0
> >> (XEN) [2015-01-28 18:57:07]       49 [0/0/0]: s=3 n=0 x=0 d=0 p=58
> >> (XEN) [2015-01-28 18:57:07]       50 [0/0/0]: s=5 n=0 x=0 v=9
> >> (XEN) [2015-01-28 18:57:07]       51 [0/0/0]: s=4 n=0 x=0 p=9 i=9
> >> (XEN) [2015-01-28 18:57:07]       52 [0/0/0]: s=5 n=0 x=0 v=2
> >> (XEN) [2015-01-28 18:57:07]       53 [0/0/0]: s=4 n=4 x=0 p=16 i=16
> >> (XEN) [2015-01-28 18:57:07]       54 [0/0/0]: s=4 n=0 x=0 p=17 i=17
> >> (XEN) [2015-01-28 18:57:07]       55 [0/0/0]: s=4 n=6 x=0 p=18 i=18
> >> (XEN) [2015-01-28 18:57:07]       56 [0/0/0]: s=4 n=0 x=0 p=8 i=8
> >> (XEN) [2015-01-28 18:57:07]       57 [0/0/0]: s=4 n=0 x=0 p=19 i=19
> >> (XEN) [2015-01-28 18:57:07]       58 [0/0/0]: s=3 n=0 x=0 d=0 p=49
> >> (XEN) [2015-01-28 18:57:07]       59 [0/0/0]: s=5 n=0 x=0 v=3
> >> (XEN) [2015-01-28 18:57:07]       60 [0/0/0]: s=5 n=0 x=0 v=4
> >> (XEN) [2015-01-28 18:57:07]       61 [0/0/0]: s=3 n=0 x=0 d=1 p=1
> >> (XEN) [2015-01-28 18:57:07]       62 [0/0/0]: s=3 n=0 x=0 d=1 p=2
> >> (XEN) [2015-01-28 18:57:07]       63 [0/0/0]: s=3 n=0 x=0 d=1 p=3
> >> (XEN) [2015-01-28 18:57:07]       64 [0/0/0]: s=3 n=0 x=0 d=1 p=5
> >> (XEN) [2015-01-28 18:57:07]       65 [0/0/0]: s=3 n=0 x=0 d=1 p=6
> >> (XEN) [2015-01-28 18:57:07]       66 [0/0/0]: s=3 n=0 x=0 d=1 p=7
> >> (XEN) [2015-01-28 18:57:07]       67 [0/0/0]: s=3 n=0 x=0 d=1 p=8
> >> (XEN) [2015-01-28 18:57:07]       68 [0/0/0]: s=3 n=0 x=0 d=1 p=9
> >> (XEN) [2015-01-28 18:57:07]       69 [0/0/0]: s=3 n=0 x=0 d=1 p=4
> >> (XEN) [2015-01-28 18:57:07] Event channel information for domain 1:
> >> (XEN) [2015-01-28 18:57:07] Polling vCPUs: {}
> >> (XEN) [2015-01-28 18:57:07]     port [p/m/s]
> >> (XEN) [2015-01-28 18:57:07]        1 [0/0/0]: s=3 n=0 x=0 d=0 p=61
> >> (XEN) [2015-01-28 18:57:07]        2 [0/0/0]: s=3 n=0 x=0 d=0 p=62
> >> (XEN) [2015-01-28 18:57:07]        3 [0/0/0]: s=3 n=0 x=1 d=0 p=63
> >> (XEN) [2015-01-28 18:57:07]        4 [0/0/0]: s=3 n=0 x=1 d=0 p=69
> >> (XEN) [2015-01-28 18:57:07]        5 [0/0/0]: s=3 n=1 x=1 d=0 p=64
> >> (XEN) [2015-01-28 18:57:07]        6 [0/0/0]: s=3 n=2 x=1 d=0 p=65
> >> (XEN) [2015-01-28 18:57:07]        7 [0/0/0]: s=3 n=3 x=1 d=0 p=66
> >> (XEN) [2015-01-28 18:57:07]        8 [0/0/0]: s=3 n=4 x=1 d=0 p=67
> >> (XEN) [2015-01-28 18:57:07]        9 [0/0/0]: s=3 n=5 x=1 d=0 p=68
> >> (XEN) [2015-01-28 18:57:07]       10 [0/0/0]: s=2 n=0 x=1 d=0
> >> (XEN) [2015-01-28 18:57:07]       11 [0/0/0]: s=2 n=0 x=1 d=0
> >> (XEN) [2015-01-28 18:57:07]       12 [0/0/0]: s=2 n=1 x=1 d=0
> >> (XEN) [2015-01-28 18:57:07]       13 [0/0/0]: s=2 n=2 x=1 d=0
> >> (XEN) [2015-01-28 18:57:07]       14 [0/0/0]: s=2 n=3 x=1 d=0
> >> (XEN) [2015-01-28 18:57:07]       15 [0/0/0]: s=2 n=4 x=1 d=0
> >> (XEN) [2015-01-28 18:57:07]       16 [0/0/0]: s=2 n=5 x=1 d=0
> >>
> >> You can see that domain 1 has only half of it's event channels
> >> fully setup.  So when (if) hvm_send_assist_req_to_ioreq_server()
> >> does:
> >>
> >>             notify_via_xen_event_channel(d, port);
> >>
> >> Nothing happens and you hang in hvm_wait_for_io() forever.
> >>
> >>
> >> This does raise the questions:
> >>
> >> 1) Does this patch causes extra event channels to be created
> >>    that cannot be used?
> >>
> >> 2) Should the "default_ioreq_server" be deleted?
> >>
> >>
> >> Not sure the right way to go.
> >>
> >>     -Don Slutz
> >>
> >>
> >>>
> >>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> >>> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> >>> Cc: Peter Maydell <peter.maydell@linaro.org>
> >>> Cc: Paolo Bonzini <pbonzini@redhat.com>
> >>> Cc: Michael Tokarev <mjt@tls.msk.ru>
> >>> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> >>> Cc: Stefan Weil <sw@weilnetz.de>
> >>> Cc: Olaf Hering <olaf@aepfle.de>
> >>> Cc: Gerd Hoffmann <kraxel@redhat.com>
> >>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>> Cc: Alexander Graf <agraf@suse.de>
> >>> ---
> >>>  configure                   |   29 ++++++
> >>>  include/hw/xen/xen_common.h |  223 +++++++++++++++++++++++++++++++++++++++++++
> >>>  trace-events                |    9 ++
> >>>  xen-hvm.c                   |  160 ++++++++++++++++++++++++++-----
> >>>  4 files changed, 399 insertions(+), 22 deletions(-)
> >>>
> >>> diff --git a/configure b/configure
> >>> index 47048f0..b1f8c2a 100755
> >>> --- a/configure
> >>> +++ b/configure
> >>> @@ -1877,6 +1877,32 @@ int main(void) {
> >>>    xc_gnttab_open(NULL, 0);
> >>>    xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
> >>>    xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
> >>> +  xc_hvm_create_ioreq_server(xc, 0, 0, NULL);
> >>> +  return 0;
> >>> +}
> >>> +EOF
> >>> +      compile_prog "" "$xen_libs"
> >>> +    then
> >>> +    xen_ctrl_version=450
> >>> +    xen=yes
> >>> +
> >>> +  elif
> >>> +      cat > $TMPC <<EOF &&
> >>> +#include <xenctrl.h>
> >>> +#include <xenstore.h>
> >>> +#include <stdint.h>
> >>> +#include <xen/hvm/hvm_info_table.h>
> >>> +#if !defined(HVM_MAX_VCPUS)
> >>> +# error HVM_MAX_VCPUS not defined
> >>> +#endif
> >>> +int main(void) {
> >>> +  xc_interface *xc;
> >>> +  xs_daemon_open();
> >>> +  xc = xc_interface_open(0, 0, 0);
> >>> +  xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0);
> >>> +  xc_gnttab_open(NULL, 0);
> >>> +  xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
> >>> +  xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
> >>>    return 0;
> >>>  }
> >>>  EOF
> >>> @@ -4283,6 +4309,9 @@ if test -n "$sparc_cpu"; then
> >>>      echo "Target Sparc Arch $sparc_cpu"
> >>>  fi
> >>>  echo "xen support       $xen"
> >>> +if test "$xen" = "yes" ; then
> >>> +  echo "xen ctrl version  $xen_ctrl_version"
> >>> +fi
> >>>  echo "brlapi support    $brlapi"
> >>>  echo "bluez  support    $bluez"
> >>>  echo "Documentation     $docs"
> >>> diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
> >>> index 95612a4..519696f 100644
> >>> --- a/include/hw/xen/xen_common.h
> >>> +++ b/include/hw/xen/xen_common.h
> >>> @@ -16,7 +16,9 @@
> >>>
> >>>  #include "hw/hw.h"
> >>>  #include "hw/xen/xen.h"
> >>> +#include "hw/pci/pci.h"
> >>>  #include "qemu/queue.h"
> >>> +#include "trace.h"
> >>>
> >>>  /*
> >>>   * We don't support Xen prior to 3.3.0.
> >>> @@ -179,4 +181,225 @@ static inline int xen_get_vmport_regs_pfn(XenXC xc, domid_t dom,
> >>>  }
> >>>  #endif
> >>>
> >>> +/* Xen before 4.5 */
> >>> +#if CONFIG_XEN_CTRL_INTERFACE_VERSION < 450
> >>> +
> >>> +#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
> >>> +#define HVM_PARAM_BUFIOREQ_EVTCHN 26
> >>> +#endif
> >>> +
> >>> +#define IOREQ_TYPE_PCI_CONFIG 2
> >>> +
> >>> +typedef uint32_t ioservid_t;
> >>> +
> >>> +static inline void xen_map_memory_section(XenXC xc, domid_t dom,
> >>> +                                          ioservid_t ioservid,
> >>> +                                          MemoryRegionSection *section)
> >>> +{
> >>> +}
> >>> +
> >>> +static inline void xen_unmap_memory_section(XenXC xc, domid_t dom,
> >>> +                                            ioservid_t ioservid,
> >>> +                                            MemoryRegionSection *section)
> >>> +{
> >>> +}
> >>> +
> >>> +static inline void xen_map_io_section(XenXC xc, domid_t dom,
> >>> +                                      ioservid_t ioservid,
> >>> +                                      MemoryRegionSection *section)
> >>> +{
> >>> +}
> >>> +
> >>> +static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
> >>> +                                        ioservid_t ioservid,
> >>> +                                        MemoryRegionSection *section)
> >>> +{
> >>> +}
> >>> +
> >>> +static inline void xen_map_pcidev(XenXC xc, domid_t dom,
> >>> +                                  ioservid_t ioservid,
> >>> +                                  PCIDevice *pci_dev)
> >>> +{
> >>> +}
> >>> +
> >>> +static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
> >>> +                                    ioservid_t ioservid,
> >>> +                                    PCIDevice *pci_dev)
> >>> +{
> >>> +}
> >>> +
> >>> +static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
> >>> +                                          ioservid_t *ioservid)
> >>> +{
> >>> +    return 0;
> >>> +}
> >>> +
> >>> +static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
> >>> +                                            ioservid_t ioservid)
> >>> +{
> >>> +}
> >>> +
> >>> +static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
> >>> +                                            ioservid_t ioservid,
> >>> +                                            xen_pfn_t *ioreq_pfn,
> >>> +                                            xen_pfn_t *bufioreq_pfn,
> >>> +                                            evtchn_port_t *bufioreq_evtchn)
> >>> +{
> >>> +    unsigned long param;
> >>> +    int rc;
> >>> +
> >>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_IOREQ_PFN, &param);
> >>> +    if (rc < 0) {
> >>> +        fprintf(stderr, "failed to get HVM_PARAM_IOREQ_PFN\n");
> >>> +        return -1;
> >>> +    }
> >>> +
> >>> +    *ioreq_pfn = param;
> >>> +
> >>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_BUFIOREQ_PFN, &param);
> >>> +    if (rc < 0) {
> >>> +        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_PFN\n");
> >>> +        return -1;
> >>> +    }
> >>> +
> >>> +    *bufioreq_pfn = param;
> >>> +
> >>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_BUFIOREQ_EVTCHN,
> >>> +                          &param);
> >>> +    if (rc < 0) {
> >>> +        fprintf(stderr, "failed to get
> HVM_PARAM_BUFIOREQ_EVTCHN\n");
> >>> +        return -1;
> >>> +    }
> >>> +
> >>> +    *bufioreq_evtchn = param;
> >>> +
> >>> +    return 0;
> >>> +}
> >>> +
> >>> +static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
> >>> +                                             ioservid_t ioservid,
> >>> +                                             bool enable)
> >>> +{
> >>> +    return 0;
> >>> +}
> >>> +
> >>> +/* Xen 4.5 */
> >>> +#else
> >>> +
> >>> +static inline void xen_map_memory_section(XenXC xc, domid_t dom,
> >>> +                                          ioservid_t ioservid,
> >>> +                                          MemoryRegionSection *section)
> >>> +{
> >>> +    hwaddr start_addr = section->offset_within_address_space;
> >>> +    ram_addr_t size = int128_get64(section->size);
> >>> +    hwaddr end_addr = start_addr + size - 1;
> >>> +
> >>> +    trace_xen_map_mmio_range(ioservid, start_addr, end_addr);
> >>> +    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 1,
> >>> +                                        start_addr, end_addr);
> >>> +}
> >>> +
> >>> +static inline void xen_unmap_memory_section(XenXC xc, domid_t dom,
> >>> +                                            ioservid_t ioservid,
> >>> +                                            MemoryRegionSection *section)
> >>> +{
> >>> +    hwaddr start_addr = section->offset_within_address_space;
> >>> +    ram_addr_t size = int128_get64(section->size);
> >>> +    hwaddr end_addr = start_addr + size - 1;
> >>> +
> >>> +    trace_xen_unmap_mmio_range(ioservid, start_addr, end_addr);
> >>> +    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 1,
> >>> +                                            start_addr, end_addr);
> >>> +}
> >>> +
> >>> +static inline void xen_map_io_section(XenXC xc, domid_t dom,
> >>> +                                      ioservid_t ioservid,
> >>> +                                      MemoryRegionSection *section)
> >>> +{
> >>> +    hwaddr start_addr = section->offset_within_address_space;
> >>> +    ram_addr_t size = int128_get64(section->size);
> >>> +    hwaddr end_addr = start_addr + size - 1;
> >>> +
> >>> +    trace_xen_map_portio_range(ioservid, start_addr, end_addr);
> >>> +    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 0,
> >>> +                                        start_addr, end_addr);
> >>> +}
> >>> +
> >>> +static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
> >>> +                                        ioservid_t ioservid,
> >>> +                                        MemoryRegionSection *section)
> >>> +{
> >>> +    hwaddr start_addr = section->offset_within_address_space;
> >>> +    ram_addr_t size = int128_get64(section->size);
> >>> +    hwaddr end_addr = start_addr + size - 1;
> >>> +
> >>> +    trace_xen_unmap_portio_range(ioservid, start_addr, end_addr);
> >>> +    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 0,
> >>> +                                            start_addr, end_addr);
> >>> +}
> >>> +
> >>> +static inline void xen_map_pcidev(XenXC xc, domid_t dom,
> >>> +                                  ioservid_t ioservid,
> >>> +                                  PCIDevice *pci_dev)
> >>> +{
> >>> +    trace_xen_map_pcidev(ioservid, pci_bus_num(pci_dev->bus),
> >>> +                         PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
> >>> +    xc_hvm_map_pcidev_to_ioreq_server(xc, dom, ioservid,
> >>> +                                      0, pci_bus_num(pci_dev->bus),
> >>> +                                      PCI_SLOT(pci_dev->devfn),
> >>> +                                      PCI_FUNC(pci_dev->devfn));
> >>> +}
> >>> +
> >>> +static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
> >>> +                                    ioservid_t ioservid,
> >>> +                                    PCIDevice *pci_dev)
> >>> +{
> >>> +    trace_xen_unmap_pcidev(ioservid, pci_bus_num(pci_dev->bus),
> >>> +                           PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
> >>> +    xc_hvm_unmap_pcidev_from_ioreq_server(xc, dom, ioservid,
> >>> +                                          0, pci_bus_num(pci_dev->bus),
> >>> +                                          PCI_SLOT(pci_dev->devfn),
> >>> +                                          PCI_FUNC(pci_dev->devfn));
> >>> +}
> >>> +
> >>> +static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
> >>> +                                          ioservid_t *ioservid)
> >>> +{
> >>> +    int rc = xc_hvm_create_ioreq_server(xc, dom, 1, ioservid);
> >>> +
> >>> +    if (rc == 0) {
> >>> +        trace_xen_ioreq_server_create(*ioservid);
> >>> +    }
> >>> +
> >>> +    return rc;
> >>> +}
> >>> +
> >>> +static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
> >>> +                                            ioservid_t ioservid)
> >>> +{
> >>> +    trace_xen_ioreq_server_destroy(ioservid);
> >>> +    xc_hvm_destroy_ioreq_server(xc, dom, ioservid);
> >>> +}
> >>> +
> >>> +static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
> >>> +                                            ioservid_t ioservid,
> >>> +                                            xen_pfn_t *ioreq_pfn,
> >>> +                                            xen_pfn_t *bufioreq_pfn,
> >>> +                                            evtchn_port_t *bufioreq_evtchn)
> >>> +{
> >>> +    return xc_hvm_get_ioreq_server_info(xc, dom, ioservid,
> >>> +                                        ioreq_pfn, bufioreq_pfn,
> >>> +                                        bufioreq_evtchn);
> >>> +}
> >>> +
> >>> +static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
> >>> +                                             ioservid_t ioservid,
> >>> +                                             bool enable)
> >>> +{
> >>> +    trace_xen_ioreq_server_state(ioservid, enable);
> >>> +    return xc_hvm_set_ioreq_server_state(xc, dom, ioservid, enable);
> >>> +}
> >>> +
> >>> +#endif
> >>> +
> >>>  #endif /* QEMU_HW_XEN_COMMON_H */
> >>> diff --git a/trace-events b/trace-events
> >>> index b5722ea..abd1118 100644
> >>> --- a/trace-events
> >>> +++ b/trace-events
> >>> @@ -897,6 +897,15 @@ pvscsi_tx_rings_num_pages(const char* label, uint32_t num) "Number of %s pages:
> >>>  # xen-hvm.c
> >>>  xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested: %#lx, size %#lx"
> >>>  xen_client_set_memory(uint64_t start_addr, unsigned long size, bool log_dirty) "%#"PRIx64" size %#lx, log_dirty %i"
> >>> +xen_ioreq_server_create(uint32_t id) "id: %u"
> >>> +xen_ioreq_server_destroy(uint32_t id) "id: %u"
> >>> +xen_ioreq_server_state(uint32_t id, bool enable) "id: %u: enable: %i"
> >>> +xen_map_mmio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
> >>> +xen_unmap_mmio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
> >>> +xen_map_portio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
> >>> +xen_unmap_portio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
> >>> +xen_map_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id: %u bdf: %02x.%02x.%02x"
> >>> +xen_unmap_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id: %u bdf: %02x.%02x.%02x"
> >>>
> >>>  # xen-mapcache.c
> >>>  xen_map_cache(uint64_t phys_addr) "want %#"PRIx64
> >>> diff --git a/xen-hvm.c b/xen-hvm.c
> >>> index 7548794..31cb3ca 100644
> >>> --- a/xen-hvm.c
> >>> +++ b/xen-hvm.c
> >>> @@ -85,9 +85,6 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t *shared_page, int vcpu)
> >>>  }
> >>>  #  define FMT_ioreq_size "u"
> >>>  #endif
> >>> -#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
> >>> -#define HVM_PARAM_BUFIOREQ_EVTCHN 26
> >>> -#endif
> >>>
> >>>  #define BUFFER_IO_MAX_DELAY  100
> >>>
> >>> @@ -101,6 +98,7 @@ typedef struct XenPhysmap {
> >>>  } XenPhysmap;
> >>>
> >>>  typedef struct XenIOState {
> >>> +    ioservid_t ioservid;
> >>>      shared_iopage_t *shared_page;
> >>>      shared_vmport_iopage_t *shared_vmport_page;
> >>>      buffered_iopage_t *buffered_io_page;
> >>> @@ -117,6 +115,8 @@ typedef struct XenIOState {
> >>>
> >>>      struct xs_handle *xenstore;
> >>>      MemoryListener memory_listener;
> >>> +    MemoryListener io_listener;
> >>> +    DeviceListener device_listener;
> >>>      QLIST_HEAD(, XenPhysmap) physmap;
> >>>      hwaddr free_phys_offset;
> >>>      const XenPhysmap *log_for_dirtybit;
> >>> @@ -467,12 +467,23 @@ static void xen_set_memory(struct MemoryListener *listener,
> >>>      bool log_dirty = memory_region_is_logging(section->mr);
> >>>      hvmmem_type_t mem_type;
> >>>
> >>> +    if (section->mr == &ram_memory) {
> >>> +        return;
> >>> +    } else {
> >>> +        if (add) {
> >>> +            xen_map_memory_section(xen_xc, xen_domid, state->ioservid,
> >>> +                                   section);
> >>> +        } else {
> >>> +            xen_unmap_memory_section(xen_xc, xen_domid, state->ioservid,
> >>> +                                     section);
> >>> +        }
> >>> +    }
> >>> +
> >>>      if (!memory_region_is_ram(section->mr)) {
> >>>          return;
> >>>      }
> >>>
> >>> -    if (!(section->mr != &ram_memory
> >>> -          && ( (log_dirty && add) || (!log_dirty && !add)))) {
> >>> +    if (log_dirty != add) {
> >>>          return;
> >>>      }
> >>>
> >>> @@ -515,6 +526,50 @@ static void xen_region_del(MemoryListener *listener,
> >>>      memory_region_unref(section->mr);
> >>>  }
> >>>
> >>> +static void xen_io_add(MemoryListener *listener,
> >>> +                       MemoryRegionSection *section)
> >>> +{
> >>> +    XenIOState *state = container_of(listener, XenIOState, io_listener);
> >>> +
> >>> +    memory_region_ref(section->mr);
> >>> +
> >>> +    xen_map_io_section(xen_xc, xen_domid, state->ioservid, section);
> >>> +}
> >>> +
> >>> +static void xen_io_del(MemoryListener *listener,
> >>> +                       MemoryRegionSection *section)
> >>> +{
> >>> +    XenIOState *state = container_of(listener, XenIOState, io_listener);
> >>> +
> >>> +    xen_unmap_io_section(xen_xc, xen_domid, state->ioservid, section);
> >>> +
> >>> +    memory_region_unref(section->mr);
> >>> +}
> >>> +
> >>> +static void xen_device_realize(DeviceListener *listener,
> >>> +			       DeviceState *dev)
> >>> +{
> >>> +    XenIOState *state = container_of(listener, XenIOState, device_listener);
> >>> +
> >>> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
> >>> +        PCIDevice *pci_dev = PCI_DEVICE(dev);
> >>> +
> >>> +        xen_map_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
> >>> +    }
> >>> +}
> >>> +
> >>> +static void xen_device_unrealize(DeviceListener *listener,
> >>> +				 DeviceState *dev)
> >>> +{
> >>> +    XenIOState *state = container_of(listener, XenIOState, device_listener);
> >>> +
> >>> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
> >>> +        PCIDevice *pci_dev = PCI_DEVICE(dev);
> >>> +
> >>> +        xen_unmap_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
> >>> +    }
> >>> +}
> >>> +
> >>>  static void xen_sync_dirty_bitmap(XenIOState *state,
> >>>                                    hwaddr start_addr,
> >>>                                    ram_addr_t size)
> >>> @@ -615,6 +670,17 @@ static MemoryListener xen_memory_listener = {
> >>>      .priority = 10,
> >>>  };
> >>>
> >>> +static MemoryListener xen_io_listener = {
> >>> +    .region_add = xen_io_add,
> >>> +    .region_del = xen_io_del,
> >>> +    .priority = 10,
> >>> +};
> >>> +
> >>> +static DeviceListener xen_device_listener = {
> >>> +    .realize = xen_device_realize,
> >>> +    .unrealize = xen_device_unrealize,
> >>> +};
> >>> +
> >>>  /* get the ioreq packets from share mem */
> >>>  static ioreq_t *cpu_get_ioreq_from_shared_memory(XenIOState *state, int vcpu)
> >>>  {
> >>> @@ -863,6 +929,27 @@ static void handle_ioreq(XenIOState *state, ioreq_t *req)
> >>>          case IOREQ_TYPE_INVALIDATE:
> >>>              xen_invalidate_map_cache();
> >>>              break;
> >>> +        case IOREQ_TYPE_PCI_CONFIG: {
> >>> +            uint32_t sbdf = req->addr >> 32;
> >>> +            uint32_t val;
> >>> +
> >>> +            /* Fake a write to port 0xCF8 so that
> >>> +             * the config space access will target the
> >>> +             * correct device model.
> >>> +             */
> >>> +            val = (1u << 31) |
> >>> +                  ((req->addr & 0x0f00) << 16) |
> >>> +                  ((sbdf & 0xffff) << 8) |
> >>> +                  (req->addr & 0xfc);
> >>> +            do_outp(0xcf8, 4, val);
> >>> +
> >>> +            /* Now issue the config space access via
> >>> +             * port 0xCFC
> >>> +             */
> >>> +            req->addr = 0xcfc | (req->addr & 0x03);
> >>> +            cpu_ioreq_pio(req);
> >>> +            break;
> >>> +        }
> >>>          default:
> >>>              hw_error("Invalid ioreq type 0x%x\n", req->type);
> >>>      }
> >>> @@ -993,9 +1080,15 @@ static void xen_main_loop_prepare(XenIOState *state)
> >>>  static void xen_hvm_change_state_handler(void *opaque, int running,
> >>>                                           RunState rstate)
> >>>  {
> >>> +    XenIOState *state = opaque;
> >>> +
> >>>      if (running) {
> >>> -        xen_main_loop_prepare((XenIOState *)opaque);
> >>> +        xen_main_loop_prepare(state);
> >>>      }
> >>> +
> >>> +    xen_set_ioreq_server_state(xen_xc, xen_domid,
> >>> +                               state->ioservid,
> >>> +                               (rstate == RUN_STATE_RUNNING));
> >>>  }
> >>>
> >>>  static void xen_exit_notifier(Notifier *n, void *data)
> >>> @@ -1064,8 +1157,9 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
> >>>                   MemoryRegion **ram_memory)
> >>>  {
> >>>      int i, rc;
> >>> -    unsigned long ioreq_pfn;
> >>> -    unsigned long bufioreq_evtchn;
> >>> +    xen_pfn_t ioreq_pfn;
> >>> +    xen_pfn_t bufioreq_pfn;
> >>> +    evtchn_port_t bufioreq_evtchn;
> >>>      XenIOState *state;
> >>>
> >>>      state = g_malloc0(sizeof (XenIOState));
> >>> @@ -1082,6 +1176,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
> >>>          return -1;
> >>>      }
> >>>
> >>> +    rc = xen_create_ioreq_server(xen_xc, xen_domid, &state->ioservid);
> >>> +    if (rc < 0) {
> >>> +        perror("xen: ioreq server create");
> >>> +        return -1;
> >>> +    }
> >>> +
> >>>      state->exit.notify = xen_exit_notifier;
> >>>      qemu_add_exit_notifier(&state->exit);
> >>>
> >>> @@ -1091,8 +1191,18 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
> >>>      state->wakeup.notify = xen_wakeup_notifier;
> >>>      qemu_register_wakeup_notifier(&state->wakeup);
> >>>
> >>> -    xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_IOREQ_PFN, &ioreq_pfn);
> >>> +    rc = xen_get_ioreq_server_info(xen_xc, xen_domid, state->ioservid,
> >>> +                                   &ioreq_pfn, &bufioreq_pfn,
> >>> +                                   &bufioreq_evtchn);
> >>> +    if (rc < 0) {
> >>> +        hw_error("failed to get ioreq server info: error %d handle="
> XC_INTERFACE_FMT,
> >>> +                 errno, xen_xc);
> >>> +    }
> >>> +
> >>>      DPRINTF("shared page at pfn %lx\n", ioreq_pfn);
> >>> +    DPRINTF("buffered io page at pfn %lx\n", bufioreq_pfn);
> >>> +    DPRINTF("buffered io evtchn is %x\n", bufioreq_evtchn);
> >>> +
> >>>      state->shared_page = xc_map_foreign_range(xen_xc, xen_domid, XC_PAGE_SIZE,
> >>>                                                PROT_READ|PROT_WRITE, ioreq_pfn);
> >>>      if (state->shared_page == NULL) {
> >>> @@ -1114,10 +1224,10 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
> >>>          hw_error("get vmport regs pfn returned error %d, rc=%d", errno,
> rc);
> >>>      }
> >>>
> >>> -    xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_PFN, &ioreq_pfn);
> >>> -    DPRINTF("buffered io page at pfn %lx\n", ioreq_pfn);
> >>> -    state->buffered_io_page = xc_map_foreign_range(xen_xc, xen_domid, XC_PAGE_SIZE,
> >>> -                                                   PROT_READ|PROT_WRITE, ioreq_pfn);
> >>> +    state->buffered_io_page = xc_map_foreign_range(xen_xc, xen_domid,
> >>> +                                                   XC_PAGE_SIZE,
> >>> +                                                   PROT_READ|PROT_WRITE,
> >>> +                                                   bufioreq_pfn);
> >>>      if (state->buffered_io_page == NULL) {
> >>>          hw_error("map buffered IO page returned error %d", errno);
> >>>      }
> >>> @@ -1125,6 +1235,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
> >>>      /* Note: cpus is empty at this point in init */
> >>>      state->cpu_by_vcpu_id = g_malloc0(max_cpus * sizeof(CPUState *));
> >>>
> >>> +    rc = xen_set_ioreq_server_state(xen_xc, xen_domid, state->ioservid, true);
> >>> +    if (rc < 0) {
> >>> +        hw_error("failed to enable ioreq server info: error %d handle="
> XC_INTERFACE_FMT,
> >>> +                 errno, xen_xc);
> >>> +    }
> >>> +
> >>>      state->ioreq_local_port = g_malloc0(max_cpus * sizeof (evtchn_port_t));
> >>>
> >>>      /* FIXME: how about if we overflow the page here? */
> >>> @@ -1132,22 +1248,16 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
> >>>          rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
> >>>                                          xen_vcpu_eport(state->shared_page, i));
> >>>          if (rc == -1) {
> >>> -            fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
> >>> +            fprintf(stderr, "shared evtchn %d bind error %d\n", i, errno);
> >>>              return -1;
> >>>          }
> >>>          state->ioreq_local_port[i] = rc;
> >>>      }
> >>>
> >>> -    rc = xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_EVTCHN,
> >>> -            &bufioreq_evtchn);
> >>> -    if (rc < 0) {
> >>> -        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_EVTCHN\n");
> >>> -        return -1;
> >>> -    }
> >>>      rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
> >>> -            (uint32_t)bufioreq_evtchn);
> >>> +                                    bufioreq_evtchn);
> >>>      if (rc == -1) {
> >>> -        fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
> >>> +        fprintf(stderr, "buffered evtchn bind error %d\n", errno);
> >>>          return -1;
> >>>      }
> >>>      state->bufioreq_local_port = rc;
> >>> @@ -1163,6 +1273,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
> >>>      memory_listener_register(&state->memory_listener, &address_space_memory);
> >>>      state->log_for_dirtybit = NULL;
> >>>
> >>> +    state->io_listener = xen_io_listener;
> >>> +    memory_listener_register(&state->io_listener, &address_space_io);
> >>> +
> >>> +    state->device_listener = xen_device_listener;
> >>> +    device_listener_register(&state->device_listener);
> >>> +
> >>>      /* Initialize backend core & drivers */
> >>>      if (xen_be_init() != 0) {
> >>>          fprintf(stderr, "%s: xen backend core setup failed\n", __FUNCTION__);
> >>>
> >>


* Re: [Qemu-devel] [PATCH v5 2/2] Xen: Use the ioreq-server API when available
  2015-01-29 12:09         ` Paul Durrant
@ 2015-01-29 19:14           ` Don Slutz
  2015-01-29 19:41             ` [Qemu-devel] New IOREQ type -- IOREQ_TYPE_VMWARE_PORT Don Slutz
                               ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Don Slutz @ 2015-01-29 19:14 UTC (permalink / raw)
  To: Paul Durrant, Don Slutz, qemu-devel, Stefano Stabellini
  Cc: Peter Maydell, Olaf Hering, Alexey Kardashevskiy, Stefan Weil,
	Michael Tokarev, Alexander Graf, Gerd Hoffmann, Stefan Hajnoczi,
	Paolo Bonzini

On 01/29/15 07:09, Paul Durrant wrote:
>> -----Original Message-----
>> From: Don Slutz [mailto:dslutz@verizon.com]
>> Sent: 29 January 2015 00:58
>> To: Don Slutz; Paul Durrant; qemu-devel@nongnu.org; Stefano Stabellini
>> Cc: Peter Maydell; Olaf Hering; Alexey Kardashevskiy; Stefan Weil; Michael
>> Tokarev; Alexander Graf; Gerd Hoffmann; Stefan Hajnoczi; Paolo Bonzini
>> Subject: Re: [Qemu-devel] [PATCH v5 2/2] Xen: Use the ioreq-server API
>> when available
>>
>>
>>
>> On 01/28/15 19:05, Don Slutz wrote:
>>> On 01/28/15 14:32, Don Slutz wrote:
>>>> On 12/05/14 05:50, Paul Durrant wrote:
>>>>> The ioreq-server API added to Xen 4.5 offers better security than
>>>>> the existing Xen/QEMU interface because the shared pages that are
>>>>> used to pass emulation request/results back and forth are removed
>>>>> from the guest's memory space before any requests are serviced.
>>>>> This prevents the guest from mapping these pages (they are in a
>>>>> well known location) and attempting to attack QEMU by synthesizing
>>>>> its own request structures. Hence, this patch modifies configure
>>>>> to detect whether the API is available, and adds the necessary
>>>>> code to use the API if it is.
>>>>
>>>> This patch (which is now on xenbits qemu staging) is causing me
>>>> issues.
>>>>
>>>
>>> I have found the key.
>>>
>>> The following will reproduce my issue:
>>>
>>> 1) xl create -p <config>
>>> 2) read one of HVM_PARAM_IOREQ_PFN, HVM_PARAM_BUFIOREQ_PFN, or
>>>    HVM_PARAM_BUFIOREQ_EVTCHN
>>> 3) xl unpause new guest
>>>
>>> The guest will hang in hvmloader.
>>>
>>> More in thread:
>>>
>>>
>>> Subject: Re: [Xen-devel] [PATCH] ioreq-server: handle
>>> IOREQ_TYPE_PCI_CONFIG in assist function
>>> References: <1422385589-17316-1-git-send-email-wei.liu2@citrix.com>
>>>
>>>
>>
>> Oops, that thread is not the right place to include what I have found.
>>
>> Here is the info I was going to send there:
>>
>>
>> Using QEMU upstream master (or xenbits qemu staging), you do not have a
>> default ioreq server.  And so hvm_select_ioreq_server() returns NULL for
>> hvmloader's iorequest to:
>>
>> CPU4  0 (+       0)  HANDLE_PIO [ port = 0x0cfe size = 2 dir = 1 ]
>>
>> (I added this xentrace to figure out what is happening, and I have
>> a lot of data about it, if anyone wants it.)
>>
>> To get a guest hang instead of calling hvm_complete_assist_req()
>> for some of hvmloader's pci_read() calls, you can do the following:
>>
>>
>> 1) xl create -p <config>
>> 2) read one of HVM_PARAM_IOREQ_PFN, HVM_PARAM_BUFIOREQ_PFN, or
>>    HVM_PARAM_BUFIOREQ_EVTCHN
>> 3) xl unpause new guest
>>
>> The guest will hang in hvmloader.
>>
>> The read of HVM_PARAM_IOREQ_PFN will cause a default ioreq server to
>> be created and directed to the QEMU upstream that is not a default
>> ioreq server.  This read also creates the extra event channels that
>> I see.
>>
>> I see that hvmop_create_ioreq_server() prevents you from creating
>> an is_default ioreq_server, so QEMU is not able to do so.
>>
>> Not sure where we go from here.
>>
> 
> Given that IIRC you are using a new dedicated IOREQ type, I
> think there needs to be something that allows an emulator to
> register for this IOREQ type. How about adding a new type to
> those defined for HVMOP_map_io_range_to_ioreq_server for your
> case? (In your case the start and end values in the hypercall
> would be meaningless, but it could be used to steer
> hvm_select_ioreq_server() into sending all emulation requests of
> your new type to QEMU.)
> 
> Actually such a mechanism could be used
> to steer IOREQ_TYPE_TIMEOFFSET requests as, with the new QEMU
> patches, they are going nowhere. Upstream QEMU (as default) used
> to ignore them anyway, which is why I didn't bother with such a
> patch to Xen before, but since you now need one maybe you could
> add that too?
>

I am confused by these statements.  They do contain useful information
but do not have any relation to the issue I am talking about.

Here is a longer description:


In a newly cloned xen with the .config:

QEMU_UPSTREAM_REVISION = master
QEMU_UPSTREAM_URL = git://xenbits.xen.org/staging/qemu-upstream-unstable.git
debug = n

And build it:

./configure --prefix=/usr --disable-stubdom
make -j8 rpmball

And run it:

[root@dcs-xen-54 ~]# xl cre -p -V /home/don/aoe-xfg/C63-min-tools.trace.xfg

Read hvm_param(Ioreq_Pfn)

[root@dcs-xen-54 ~]# xl unpause C63-min-tools
[root@dcs-xen-54 ~]# xl list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  2048     8     r-----      35.5
C63-min-tools                                1  8194     1     ------       0.0
[root@dcs-xen-54 ~]# date
Thu Jan 29 12:23:10 EST 2015
[root@dcs-xen-54 ~]# xl list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  2048     8     r-----      66.9
C63-min-tools                                1  8194     1     ------       0.0
[root@dcs-xen-54 ~]# /usr/lib/xen/bin/xenctx 1
cs:eip: 0018:00101583
flags: 00000002 nz
ss:esp: 0020:001ba488
eax: 80000108   ebx: 00000002   ecx: 00000002   edx: 00000cfe
esi: 00000000   edi: 00000000   ebp: 00000000
 ds:     0020    es:     0020    fs:     0020    gs:     0020
Code (instr addr 00101583)
0c 00 00 ec 0f b6 c0 5b c3 90 83 e3 02 8d 93 fc 0c 00 00 66 ed <0f> b7 c0 5b c3 90 8d b4 26 00 00
[root@dcs-xen-54 ~]#


You can see that the guest is still waiting for the inl from 0x00000cfe.




-- I used the tool (From:

Subject: [OPTIONAL][PATCH for-4.5 v7 7/7] Add xen-hvm-param
Date: Thu, 2 Oct 2014 17:30:17 -0400
Message-ID: <1412285417-19180-8-git-send-email-dslutz@verizon.com>
X-Mailer: git-send-email 1.8.4
In-Reply-To: <1412285417-19180-1-git-send-email-dslutz@verizon.com>

) "dist/install/usr/sbin/xen-hvm-param 1" to do the hvm param read (any
way that calls on xc_get_hvm_param(,,HVM_PARAM_IOREQ_PFN,) will cause
the issue).

   -Don Slutz


>   Paul
> 
>>    -Don Slutz
>>
>>
>>>     -Don Slutz
>>>
>>>
>>>> So far I have tracked it back to hvm_select_ioreq_server()
>>>> which selects the "default_ioreq_server".  Since I have only one
>>>> QEMU, it is both the "default_ioreq_server" and an enabled
>>>> 2nd ioreq_server.  I am continuing to understand why my changes
>>>> are causing this.  More below.
>>>>
>>>> This patch causes QEMU to only call xc_evtchn_bind_interdomain()
>>>> for the enabled 2nd ioreq_server.  So when (if)
>>>> hvm_select_ioreq_server() selects the "default_ioreq_server", the
>>>> guest hangs on an I/O.
>>>>
>>>> Using the debug key 'e':
>>>>
>>>> (XEN) [2015-01-28 18:57:07] 'e' pressed -> dumping event-channel info
>>>> (XEN) [2015-01-28 18:57:07] Event channel information for domain 0:
>>>> (XEN) [2015-01-28 18:57:07] Polling vCPUs: {}
>>>> (XEN) [2015-01-28 18:57:07]     port [p/m/s]
>>>> (XEN) [2015-01-28 18:57:07]        1 [0/0/0]: s=5 n=0 x=0 v=0
>>>> (XEN) [2015-01-28 18:57:07]        2 [0/0/0]: s=6 n=0 x=0
>>>> (XEN) [2015-01-28 18:57:07]        3 [0/0/0]: s=6 n=0 x=0
>>>> (XEN) [2015-01-28 18:57:07]        4 [0/0/0]: s=5 n=0 x=0 v=1
>>>> (XEN) [2015-01-28 18:57:07]        5 [0/0/0]: s=6 n=0 x=0
>>>> (XEN) [2015-01-28 18:57:07]        6 [0/0/0]: s=6 n=0 x=0
>>>> (XEN) [2015-01-28 18:57:07]        7 [0/0/0]: s=5 n=1 x=0 v=0
>>>> (XEN) [2015-01-28 18:57:07]        8 [0/0/0]: s=6 n=1 x=0
>>>> (XEN) [2015-01-28 18:57:07]        9 [0/0/0]: s=6 n=1 x=0
>>>> (XEN) [2015-01-28 18:57:07]       10 [0/0/0]: s=5 n=1 x=0 v=1
>>>> (XEN) [2015-01-28 18:57:07]       11 [0/0/0]: s=6 n=1 x=0
>>>> (XEN) [2015-01-28 18:57:07]       12 [0/0/0]: s=6 n=1 x=0
>>>> (XEN) [2015-01-28 18:57:07]       13 [0/0/0]: s=5 n=2 x=0 v=0
>>>> (XEN) [2015-01-28 18:57:07]       14 [0/0/0]: s=6 n=2 x=0
>>>> (XEN) [2015-01-28 18:57:07]       15 [0/0/0]: s=6 n=2 x=0
>>>> (XEN) [2015-01-28 18:57:07]       16 [0/0/0]: s=5 n=2 x=0 v=1
>>>> (XEN) [2015-01-28 18:57:07]       17 [0/0/0]: s=6 n=2 x=0
>>>> (XEN) [2015-01-28 18:57:07]       18 [0/0/0]: s=6 n=2 x=0
>>>> (XEN) [2015-01-28 18:57:07]       19 [0/0/0]: s=5 n=3 x=0 v=0
>>>> (XEN) [2015-01-28 18:57:07]       20 [0/0/0]: s=6 n=3 x=0
>>>> (XEN) [2015-01-28 18:57:07]       21 [0/0/0]: s=6 n=3 x=0
>>>> (XEN) [2015-01-28 18:57:07]       22 [0/0/0]: s=5 n=3 x=0 v=1
>>>> (XEN) [2015-01-28 18:57:07]       23 [0/0/0]: s=6 n=3 x=0
>>>> (XEN) [2015-01-28 18:57:07]       24 [0/0/0]: s=6 n=3 x=0
>>>> (XEN) [2015-01-28 18:57:07]       25 [0/0/0]: s=5 n=4 x=0 v=0
>>>> (XEN) [2015-01-28 18:57:07]       26 [0/0/0]: s=6 n=4 x=0
>>>> (XEN) [2015-01-28 18:57:07]       27 [0/0/0]: s=6 n=4 x=0
>>>> (XEN) [2015-01-28 18:57:07]       28 [0/0/0]: s=5 n=4 x=0 v=1
>>>> (XEN) [2015-01-28 18:57:07]       29 [0/0/0]: s=6 n=4 x=0
>>>> (XEN) [2015-01-28 18:57:07]       30 [0/0/0]: s=6 n=4 x=0
>>>> (XEN) [2015-01-28 18:57:07]       31 [0/0/0]: s=5 n=5 x=0 v=0
>>>> (XEN) [2015-01-28 18:57:07]       32 [0/0/0]: s=6 n=5 x=0
>>>> (XEN) [2015-01-28 18:57:07]       33 [0/0/0]: s=6 n=5 x=0
>>>> (XEN) [2015-01-28 18:57:07]       34 [0/0/0]: s=5 n=5 x=0 v=1
>>>> (XEN) [2015-01-28 18:57:07]       35 [0/0/0]: s=6 n=5 x=0
>>>> (XEN) [2015-01-28 18:57:07]       36 [0/0/0]: s=6 n=5 x=0
>>>> (XEN) [2015-01-28 18:57:07]       37 [0/0/0]: s=5 n=6 x=0 v=0
>>>> (XEN) [2015-01-28 18:57:07]       38 [0/0/0]: s=6 n=6 x=0
>>>> (XEN) [2015-01-28 18:57:07]       39 [0/0/0]: s=6 n=6 x=0
>>>> (XEN) [2015-01-28 18:57:07]       40 [0/0/0]: s=5 n=6 x=0 v=1
>>>> (XEN) [2015-01-28 18:57:07]       41 [0/0/0]: s=6 n=6 x=0
>>>> (XEN) [2015-01-28 18:57:07]       42 [0/0/0]: s=6 n=6 x=0
>>>> (XEN) [2015-01-28 18:57:07]       43 [0/0/0]: s=5 n=7 x=0 v=0
>>>> (XEN) [2015-01-28 18:57:07]       44 [0/0/0]: s=6 n=7 x=0
>>>> (XEN) [2015-01-28 18:57:07]       45 [0/0/0]: s=6 n=7 x=0
>>>> (XEN) [2015-01-28 18:57:07]       46 [0/0/0]: s=5 n=7 x=0 v=1
>>>> (XEN) [2015-01-28 18:57:07]       47 [0/0/0]: s=6 n=7 x=0
>>>> (XEN) [2015-01-28 18:57:07]       48 [0/0/0]: s=6 n=7 x=0
>>>> (XEN) [2015-01-28 18:57:07]       49 [0/0/0]: s=3 n=0 x=0 d=0 p=58
>>>> (XEN) [2015-01-28 18:57:07]       50 [0/0/0]: s=5 n=0 x=0 v=9
>>>> (XEN) [2015-01-28 18:57:07]       51 [0/0/0]: s=4 n=0 x=0 p=9 i=9
>>>> (XEN) [2015-01-28 18:57:07]       52 [0/0/0]: s=5 n=0 x=0 v=2
>>>> (XEN) [2015-01-28 18:57:07]       53 [0/0/0]: s=4 n=4 x=0 p=16 i=16
>>>> (XEN) [2015-01-28 18:57:07]       54 [0/0/0]: s=4 n=0 x=0 p=17 i=17
>>>> (XEN) [2015-01-28 18:57:07]       55 [0/0/0]: s=4 n=6 x=0 p=18 i=18
>>>> (XEN) [2015-01-28 18:57:07]       56 [0/0/0]: s=4 n=0 x=0 p=8 i=8
>>>> (XEN) [2015-01-28 18:57:07]       57 [0/0/0]: s=4 n=0 x=0 p=19 i=19
>>>> (XEN) [2015-01-28 18:57:07]       58 [0/0/0]: s=3 n=0 x=0 d=0 p=49
>>>> (XEN) [2015-01-28 18:57:07]       59 [0/0/0]: s=5 n=0 x=0 v=3
>>>> (XEN) [2015-01-28 18:57:07]       60 [0/0/0]: s=5 n=0 x=0 v=4
>>>> (XEN) [2015-01-28 18:57:07]       61 [0/0/0]: s=3 n=0 x=0 d=1 p=1
>>>> (XEN) [2015-01-28 18:57:07]       62 [0/0/0]: s=3 n=0 x=0 d=1 p=2
>>>> (XEN) [2015-01-28 18:57:07]       63 [0/0/0]: s=3 n=0 x=0 d=1 p=3
>>>> (XEN) [2015-01-28 18:57:07]       64 [0/0/0]: s=3 n=0 x=0 d=1 p=5
>>>> (XEN) [2015-01-28 18:57:07]       65 [0/0/0]: s=3 n=0 x=0 d=1 p=6
>>>> (XEN) [2015-01-28 18:57:07]       66 [0/0/0]: s=3 n=0 x=0 d=1 p=7
>>>> (XEN) [2015-01-28 18:57:07]       67 [0/0/0]: s=3 n=0 x=0 d=1 p=8
>>>> (XEN) [2015-01-28 18:57:07]       68 [0/0/0]: s=3 n=0 x=0 d=1 p=9
>>>> (XEN) [2015-01-28 18:57:07]       69 [0/0/0]: s=3 n=0 x=0 d=1 p=4
>>>> (XEN) [2015-01-28 18:57:07] Event channel information for domain 1:
>>>> (XEN) [2015-01-28 18:57:07] Polling vCPUs: {}
>>>> (XEN) [2015-01-28 18:57:07]     port [p/m/s]
>>>> (XEN) [2015-01-28 18:57:07]        1 [0/0/0]: s=3 n=0 x=0 d=0 p=61
>>>> (XEN) [2015-01-28 18:57:07]        2 [0/0/0]: s=3 n=0 x=0 d=0 p=62
>>>> (XEN) [2015-01-28 18:57:07]        3 [0/0/0]: s=3 n=0 x=1 d=0 p=63
>>>> (XEN) [2015-01-28 18:57:07]        4 [0/0/0]: s=3 n=0 x=1 d=0 p=69
>>>> (XEN) [2015-01-28 18:57:07]        5 [0/0/0]: s=3 n=1 x=1 d=0 p=64
>>>> (XEN) [2015-01-28 18:57:07]        6 [0/0/0]: s=3 n=2 x=1 d=0 p=65
>>>> (XEN) [2015-01-28 18:57:07]        7 [0/0/0]: s=3 n=3 x=1 d=0 p=66
>>>> (XEN) [2015-01-28 18:57:07]        8 [0/0/0]: s=3 n=4 x=1 d=0 p=67
>>>> (XEN) [2015-01-28 18:57:07]        9 [0/0/0]: s=3 n=5 x=1 d=0 p=68
>>>> (XEN) [2015-01-28 18:57:07]       10 [0/0/0]: s=2 n=0 x=1 d=0
>>>> (XEN) [2015-01-28 18:57:07]       11 [0/0/0]: s=2 n=0 x=1 d=0
>>>> (XEN) [2015-01-28 18:57:07]       12 [0/0/0]: s=2 n=1 x=1 d=0
>>>> (XEN) [2015-01-28 18:57:07]       13 [0/0/0]: s=2 n=2 x=1 d=0
>>>> (XEN) [2015-01-28 18:57:07]       14 [0/0/0]: s=2 n=3 x=1 d=0
>>>> (XEN) [2015-01-28 18:57:07]       15 [0/0/0]: s=2 n=4 x=1 d=0
>>>> (XEN) [2015-01-28 18:57:07]       16 [0/0/0]: s=2 n=5 x=1 d=0
>>>>
>>>> You can see that domain 1 has only half of its event channels
>>>> fully setup.  So when (if) hvm_send_assist_req_to_ioreq_server()
>>>> does:
>>>>
>>>>             notify_via_xen_event_channel(d, port);
>>>>
>>>> Nothing happens and you hang in hvm_wait_for_io() forever.
>>>>
>>>>
>>>> This does raise the questions:
>>>>
>>>> 1) Does this patch cause extra event channels to be created
>>>>    that cannot be used?
>>>>
>>>> 2) Should the "default_ioreq_server" be deleted?
>>>>
>>>>
>>>> Not sure the right way to go.
>>>>
>>>>     -Don Slutz
>>>>
>>>>
>>>>>
>>>>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>>>>> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>>>>> Cc: Peter Maydell <peter.maydell@linaro.org>
>>>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>>>> Cc: Michael Tokarev <mjt@tls.msk.ru>
>>>>> Cc: Stefan Hajnoczi <stefanha@redhat.com>
>>>>> Cc: Stefan Weil <sw@weilnetz.de>
>>>>> Cc: Olaf Hering <olaf@aepfle.de>
>>>>> Cc: Gerd Hoffmann <kraxel@redhat.com>
>>>>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>> Cc: Alexander Graf <agraf@suse.de>
>>>>> ---
>>>>>  configure                   |   29 ++++++
>>>>>  include/hw/xen/xen_common.h |  223 +++++++++++++++++++++++++++++++++++++++++++
>>>>>  trace-events                |    9 ++
>>>>>  xen-hvm.c                   |  160 ++++++++++++++++++++++++++-----
>>>>>  4 files changed, 399 insertions(+), 22 deletions(-)
>>>>>
>>>>> diff --git a/configure b/configure
>>>>> index 47048f0..b1f8c2a 100755
>>>>> --- a/configure
>>>>> +++ b/configure
>>>>> @@ -1877,6 +1877,32 @@ int main(void) {
>>>>>    xc_gnttab_open(NULL, 0);
>>>>>    xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
>>>>>    xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
>>>>> +  xc_hvm_create_ioreq_server(xc, 0, 0, NULL);
>>>>> +  return 0;
>>>>> +}
>>>>> +EOF
>>>>> +      compile_prog "" "$xen_libs"
>>>>> +    then
>>>>> +    xen_ctrl_version=450
>>>>> +    xen=yes
>>>>> +
>>>>> +  elif
>>>>> +      cat > $TMPC <<EOF &&
>>>>> +#include <xenctrl.h>
>>>>> +#include <xenstore.h>
>>>>> +#include <stdint.h>
>>>>> +#include <xen/hvm/hvm_info_table.h>
>>>>> +#if !defined(HVM_MAX_VCPUS)
>>>>> +# error HVM_MAX_VCPUS not defined
>>>>> +#endif
>>>>> +int main(void) {
>>>>> +  xc_interface *xc;
>>>>> +  xs_daemon_open();
>>>>> +  xc = xc_interface_open(0, 0, 0);
>>>>> +  xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0);
>>>>> +  xc_gnttab_open(NULL, 0);
>>>>> +  xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
>>>>> +  xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
>>>>>    return 0;
>>>>>  }
>>>>>  EOF
>>>>> @@ -4283,6 +4309,9 @@ if test -n "$sparc_cpu"; then
>>>>>      echo "Target Sparc Arch $sparc_cpu"
>>>>>  fi
>>>>>  echo "xen support       $xen"
>>>>> +if test "$xen" = "yes" ; then
>>>>> +  echo "xen ctrl version  $xen_ctrl_version"
>>>>> +fi
>>>>>  echo "brlapi support    $brlapi"
>>>>>  echo "bluez  support    $bluez"
>>>>>  echo "Documentation     $docs"
>>>>> diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
>>>>> index 95612a4..519696f 100644
>>>>> --- a/include/hw/xen/xen_common.h
>>>>> +++ b/include/hw/xen/xen_common.h
>>>>> @@ -16,7 +16,9 @@
>>>>>
>>>>>  #include "hw/hw.h"
>>>>>  #include "hw/xen/xen.h"
>>>>> +#include "hw/pci/pci.h"
>>>>>  #include "qemu/queue.h"
>>>>> +#include "trace.h"
>>>>>
>>>>>  /*
>>>>>   * We don't support Xen prior to 3.3.0.
>>>>> @@ -179,4 +181,225 @@ static inline int xen_get_vmport_regs_pfn(XenXC xc, domid_t dom,
>>>>>  }
>>>>>  #endif
>>>>>
>>>>> +/* Xen before 4.5 */
>>>>> +#if CONFIG_XEN_CTRL_INTERFACE_VERSION < 450
>>>>> +
>>>>> +#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
>>>>> +#define HVM_PARAM_BUFIOREQ_EVTCHN 26
>>>>> +#endif
>>>>> +
>>>>> +#define IOREQ_TYPE_PCI_CONFIG 2
>>>>> +
>>>>> +typedef uint32_t ioservid_t;
>>>>> +
>>>>> +static inline void xen_map_memory_section(XenXC xc, domid_t dom,
>>>>> +                                          ioservid_t ioservid,
>>>>> +                                          MemoryRegionSection *section)
>>>>> +{
>>>>> +}
>>>>> +
>>>>> +static inline void xen_unmap_memory_section(XenXC xc, domid_t dom,
>>>>> +                                            ioservid_t ioservid,
>>>>> +                                            MemoryRegionSection *section)
>>>>> +{
>>>>> +}
>>>>> +
>>>>> +static inline void xen_map_io_section(XenXC xc, domid_t dom,
>>>>> +                                      ioservid_t ioservid,
>>>>> +                                      MemoryRegionSection *section)
>>>>> +{
>>>>> +}
>>>>> +
>>>>> +static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
>>>>> +                                        ioservid_t ioservid,
>>>>> +                                        MemoryRegionSection *section)
>>>>> +{
>>>>> +}
>>>>> +
>>>>> +static inline void xen_map_pcidev(XenXC xc, domid_t dom,
>>>>> +                                  ioservid_t ioservid,
>>>>> +                                  PCIDevice *pci_dev)
>>>>> +{
>>>>> +}
>>>>> +
>>>>> +static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
>>>>> +                                    ioservid_t ioservid,
>>>>> +                                    PCIDevice *pci_dev)
>>>>> +{
>>>>> +}
>>>>> +
>>>>> +static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
>>>>> +                                          ioservid_t *ioservid)
>>>>> +{
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
>>>>> +                                            ioservid_t ioservid)
>>>>> +{
>>>>> +}
>>>>> +
>>>>> +static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
>>>>> +                                            ioservid_t ioservid,
>>>>> +                                            xen_pfn_t *ioreq_pfn,
>>>>> +                                            xen_pfn_t *bufioreq_pfn,
>>>>> +                                            evtchn_port_t *bufioreq_evtchn)
>>>>> +{
>>>>> +    unsigned long param;
>>>>> +    int rc;
>>>>> +
>>>>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_IOREQ_PFN, &param);
>>>>> +    if (rc < 0) {
>>>>> +        fprintf(stderr, "failed to get HVM_PARAM_IOREQ_PFN\n");
>>>>> +        return -1;
>>>>> +    }
>>>>> +
>>>>> +    *ioreq_pfn = param;
>>>>> +
>>>>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_BUFIOREQ_PFN, &param);
>>>>> +    if (rc < 0) {
>>>>> +        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_PFN\n");
>>>>> +        return -1;
>>>>> +    }
>>>>> +
>>>>> +    *bufioreq_pfn = param;
>>>>> +
>>>>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_BUFIOREQ_EVTCHN,
>>>>> +                          &param);
>>>>> +    if (rc < 0) {
>>>>> +        fprintf(stderr, "failed to get
>> HVM_PARAM_BUFIOREQ_EVTCHN\n");
>>>>> +        return -1;
>>>>> +    }
>>>>> +
>>>>> +    *bufioreq_evtchn = param;
>>>>> +
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
>>>>> +                                             ioservid_t ioservid,
>>>>> +                                             bool enable)
>>>>> +{
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +/* Xen 4.5 */
>>>>> +#else
>>>>> +
>>>>> +static inline void xen_map_memory_section(XenXC xc, domid_t dom,
>>>>> +                                          ioservid_t ioservid,
>>>>> +                                          MemoryRegionSection *section)
>>>>> +{
>>>>> +    hwaddr start_addr = section->offset_within_address_space;
>>>>> +    ram_addr_t size = int128_get64(section->size);
>>>>> +    hwaddr end_addr = start_addr + size - 1;
>>>>> +
>>>>> +    trace_xen_map_mmio_range(ioservid, start_addr, end_addr);
>>>>> +    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 1,
>>>>> +                                        start_addr, end_addr);
>>>>> +}
>>>>> +
>>>>> +static inline void xen_unmap_memory_section(XenXC xc, domid_t dom,
>>>>> +                                            ioservid_t ioservid,
>>>>> +                                            MemoryRegionSection *section)
>>>>> +{
>>>>> +    hwaddr start_addr = section->offset_within_address_space;
>>>>> +    ram_addr_t size = int128_get64(section->size);
>>>>> +    hwaddr end_addr = start_addr + size - 1;
>>>>> +
>>>>> +    trace_xen_unmap_mmio_range(ioservid, start_addr, end_addr);
>>>>> +    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 1,
>>>>> +                                            start_addr, end_addr);
>>>>> +}
>>>>> +
>>>>> +static inline void xen_map_io_section(XenXC xc, domid_t dom,
>>>>> +                                      ioservid_t ioservid,
>>>>> +                                      MemoryRegionSection *section)
>>>>> +{
>>>>> +    hwaddr start_addr = section->offset_within_address_space;
>>>>> +    ram_addr_t size = int128_get64(section->size);
>>>>> +    hwaddr end_addr = start_addr + size - 1;
>>>>> +
>>>>> +    trace_xen_map_portio_range(ioservid, start_addr, end_addr);
>>>>> +    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 0,
>>>>> +                                        start_addr, end_addr);
>>>>> +}
>>>>> +
>>>>> +static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
>>>>> +                                        ioservid_t ioservid,
>>>>> +                                        MemoryRegionSection *section)
>>>>> +{
>>>>> +    hwaddr start_addr = section->offset_within_address_space;
>>>>> +    ram_addr_t size = int128_get64(section->size);
>>>>> +    hwaddr end_addr = start_addr + size - 1;
>>>>> +
>>>>> +    trace_xen_unmap_portio_range(ioservid, start_addr, end_addr);
>>>>> +    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 0,
>>>>> +                                            start_addr, end_addr);
>>>>> +}
>>>>> +
>>>>> +static inline void xen_map_pcidev(XenXC xc, domid_t dom,
>>>>> +                                  ioservid_t ioservid,
>>>>> +                                  PCIDevice *pci_dev)
>>>>> +{
>>>>> +    trace_xen_map_pcidev(ioservid, pci_bus_num(pci_dev->bus),
>>>>> +                         PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
>>>>> +    xc_hvm_map_pcidev_to_ioreq_server(xc, dom, ioservid,
>>>>> +                                      0, pci_bus_num(pci_dev->bus),
>>>>> +                                      PCI_SLOT(pci_dev->devfn),
>>>>> +                                      PCI_FUNC(pci_dev->devfn));
>>>>> +}
>>>>> +
>>>>> +static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
>>>>> +                                    ioservid_t ioservid,
>>>>> +                                    PCIDevice *pci_dev)
>>>>> +{
>>>>> +    trace_xen_unmap_pcidev(ioservid, pci_bus_num(pci_dev->bus),
>>>>> +                           PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
>>>>> +    xc_hvm_unmap_pcidev_from_ioreq_server(xc, dom, ioservid,
>>>>> +                                          0, pci_bus_num(pci_dev->bus),
>>>>> +                                          PCI_SLOT(pci_dev->devfn),
>>>>> +                                          PCI_FUNC(pci_dev->devfn));
>>>>> +}
>>>>> +
>>>>> +static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
>>>>> +                                          ioservid_t *ioservid)
>>>>> +{
>>>>> +    int rc = xc_hvm_create_ioreq_server(xc, dom, 1, ioservid);
>>>>> +
>>>>> +    if (rc == 0) {
>>>>> +        trace_xen_ioreq_server_create(*ioservid);
>>>>> +    }
>>>>> +
>>>>> +    return rc;
>>>>> +}
>>>>> +
>>>>> +static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
>>>>> +                                            ioservid_t ioservid)
>>>>> +{
>>>>> +    trace_xen_ioreq_server_destroy(ioservid);
>>>>> +    xc_hvm_destroy_ioreq_server(xc, dom, ioservid);
>>>>> +}
>>>>> +
>>>>> +static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
>>>>> +                                            ioservid_t ioservid,
>>>>> +                                            xen_pfn_t *ioreq_pfn,
>>>>> +                                            xen_pfn_t *bufioreq_pfn,
>>>>> +                                            evtchn_port_t *bufioreq_evtchn)
>>>>> +{
>>>>> +    return xc_hvm_get_ioreq_server_info(xc, dom, ioservid,
>>>>> +                                        ioreq_pfn, bufioreq_pfn,
>>>>> +                                        bufioreq_evtchn);
>>>>> +}
>>>>> +
>>>>> +static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
>>>>> +                                             ioservid_t ioservid,
>>>>> +                                             bool enable)
>>>>> +{
>>>>> +    trace_xen_ioreq_server_state(ioservid, enable);
>>>>> +    return xc_hvm_set_ioreq_server_state(xc, dom, ioservid, enable);
>>>>> +}
>>>>> +
>>>>> +#endif
>>>>> +
>>>>>  #endif /* QEMU_HW_XEN_COMMON_H */
>>>>> diff --git a/trace-events b/trace-events
>>>>> index b5722ea..abd1118 100644
>>>>> --- a/trace-events
>>>>> +++ b/trace-events
>>>>> @@ -897,6 +897,15 @@ pvscsi_tx_rings_num_pages(const char* label, uint32_t num) "Number of %s pages:
>>>>>  # xen-hvm.c
>>>>>  xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested: %#lx, size %#lx"
>>>>>  xen_client_set_memory(uint64_t start_addr, unsigned long size, bool log_dirty) "%#"PRIx64" size %#lx, log_dirty %i"
>>>>> +xen_ioreq_server_create(uint32_t id) "id: %u"
>>>>> +xen_ioreq_server_destroy(uint32_t id) "id: %u"
>>>>> +xen_ioreq_server_state(uint32_t id, bool enable) "id: %u: enable: %i"
>>>>> +xen_map_mmio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>>>>> +xen_unmap_mmio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>>>>> +xen_map_portio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>>>>> +xen_unmap_portio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>>>>> +xen_map_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id: %u bdf: %02x.%02x.%02x"
>>>>> +xen_unmap_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id: %u bdf: %02x.%02x.%02x"
>>>>>
>>>>>  # xen-mapcache.c
>>>>>  xen_map_cache(uint64_t phys_addr) "want %#"PRIx64
>>>>> diff --git a/xen-hvm.c b/xen-hvm.c
>>>>> index 7548794..31cb3ca 100644
>>>>> --- a/xen-hvm.c
>>>>> +++ b/xen-hvm.c
>>>>> @@ -85,9 +85,6 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t *shared_page, int vcpu)
>>>>>  }
>>>>>  #  define FMT_ioreq_size "u"
>>>>>  #endif
>>>>> -#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
>>>>> -#define HVM_PARAM_BUFIOREQ_EVTCHN 26
>>>>> -#endif
>>>>>
>>>>>  #define BUFFER_IO_MAX_DELAY  100
>>>>>
>>>>> @@ -101,6 +98,7 @@ typedef struct XenPhysmap {
>>>>>  } XenPhysmap;
>>>>>
>>>>>  typedef struct XenIOState {
>>>>> +    ioservid_t ioservid;
>>>>>      shared_iopage_t *shared_page;
>>>>>      shared_vmport_iopage_t *shared_vmport_page;
>>>>>      buffered_iopage_t *buffered_io_page;
>>>>> @@ -117,6 +115,8 @@ typedef struct XenIOState {
>>>>>
>>>>>      struct xs_handle *xenstore;
>>>>>      MemoryListener memory_listener;
>>>>> +    MemoryListener io_listener;
>>>>> +    DeviceListener device_listener;
>>>>>      QLIST_HEAD(, XenPhysmap) physmap;
>>>>>      hwaddr free_phys_offset;
>>>>>      const XenPhysmap *log_for_dirtybit;
>>>>> @@ -467,12 +467,23 @@ static void xen_set_memory(struct MemoryListener *listener,
>>>>>      bool log_dirty = memory_region_is_logging(section->mr);
>>>>>      hvmmem_type_t mem_type;
>>>>>
>>>>> +    if (section->mr == &ram_memory) {
>>>>> +        return;
>>>>> +    } else {
>>>>> +        if (add) {
>>>>> +            xen_map_memory_section(xen_xc, xen_domid, state->ioservid,
>>>>> +                                   section);
>>>>> +        } else {
>>>>> +            xen_unmap_memory_section(xen_xc, xen_domid, state->ioservid,
>>>>> +                                     section);
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>>      if (!memory_region_is_ram(section->mr)) {
>>>>>          return;
>>>>>      }
>>>>>
>>>>> -    if (!(section->mr != &ram_memory
>>>>> -          && ( (log_dirty && add) || (!log_dirty && !add)))) {
>>>>> +    if (log_dirty != add) {
>>>>>          return;
>>>>>      }
>>>>>
>>>>> @@ -515,6 +526,50 @@ static void xen_region_del(MemoryListener
>> *listener,
>>>>>      memory_region_unref(section->mr);
>>>>>  }
>>>>>
>>>>> +static void xen_io_add(MemoryListener *listener,
>>>>> +                       MemoryRegionSection *section)
>>>>> +{
>>>>> +    XenIOState *state = container_of(listener, XenIOState, io_listener);
>>>>> +
>>>>> +    memory_region_ref(section->mr);
>>>>> +
>>>>> +    xen_map_io_section(xen_xc, xen_domid, state->ioservid, section);
>>>>> +}
>>>>> +
>>>>> +static void xen_io_del(MemoryListener *listener,
>>>>> +                       MemoryRegionSection *section)
>>>>> +{
>>>>> +    XenIOState *state = container_of(listener, XenIOState, io_listener);
>>>>> +
>>>>> +    xen_unmap_io_section(xen_xc, xen_domid, state->ioservid,
>> section);
>>>>> +
>>>>> +    memory_region_unref(section->mr);
>>>>> +}
>>>>> +
>>>>> +static void xen_device_realize(DeviceListener *listener,
>>>>> +			       DeviceState *dev)
>>>>> +{
>>>>> +    XenIOState *state = container_of(listener, XenIOState,
>> device_listener);
>>>>> +
>>>>> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
>>>>> +        PCIDevice *pci_dev = PCI_DEVICE(dev);
>>>>> +
>>>>> +        xen_map_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
>>>>> +    }
>>>>> +}
>>>>> +
>>>>> +static void xen_device_unrealize(DeviceListener *listener,
>>>>> +				 DeviceState *dev)
>>>>> +{
>>>>> +    XenIOState *state = container_of(listener, XenIOState,
>> device_listener);
>>>>> +
>>>>> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
>>>>> +        PCIDevice *pci_dev = PCI_DEVICE(dev);
>>>>> +
>>>>> +        xen_unmap_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
>>>>> +    }
>>>>> +}
>>>>> +
>>>>>  static void xen_sync_dirty_bitmap(XenIOState *state,
>>>>>                                    hwaddr start_addr,
>>>>>                                    ram_addr_t size)
>>>>> @@ -615,6 +670,17 @@ static MemoryListener xen_memory_listener =
>> {
>>>>>      .priority = 10,
>>>>>  };
>>>>>
>>>>> +static MemoryListener xen_io_listener = {
>>>>> +    .region_add = xen_io_add,
>>>>> +    .region_del = xen_io_del,
>>>>> +    .priority = 10,
>>>>> +};
>>>>> +
>>>>> +static DeviceListener xen_device_listener = {
>>>>> +    .realize = xen_device_realize,
>>>>> +    .unrealize = xen_device_unrealize,
>>>>> +};
>>>>> +
>>>>>  /* get the ioreq packets from share mem */
>>>>>  static ioreq_t *cpu_get_ioreq_from_shared_memory(XenIOState
>> *state, int vcpu)
>>>>>  {
>>>>> @@ -863,6 +929,27 @@ static void handle_ioreq(XenIOState *state,
>> ioreq_t *req)
>>>>>          case IOREQ_TYPE_INVALIDATE:
>>>>>              xen_invalidate_map_cache();
>>>>>              break;
>>>>> +        case IOREQ_TYPE_PCI_CONFIG: {
>>>>> +            uint32_t sbdf = req->addr >> 32;
>>>>> +            uint32_t val;
>>>>> +
>>>>> +            /* Fake a write to port 0xCF8 so that
>>>>> +             * the config space access will target the
>>>>> +             * correct device model.
>>>>> +             */
>>>>> +            val = (1u << 31) |
>>>>> +                  ((req->addr & 0x0f00) << 16) |
>>>>> +                  ((sbdf & 0xffff) << 8) |
>>>>> +                  (req->addr & 0xfc);
>>>>> +            do_outp(0xcf8, 4, val);
>>>>> +
>>>>> +            /* Now issue the config space access via
>>>>> +             * port 0xCFC
>>>>> +             */
>>>>> +            req->addr = 0xcfc | (req->addr & 0x03);
>>>>> +            cpu_ioreq_pio(req);
>>>>> +            break;
>>>>> +        }
>>>>>          default:
>>>>>              hw_error("Invalid ioreq type 0x%x\n", req->type);
>>>>>      }
>>>>> @@ -993,9 +1080,15 @@ static void
>> xen_main_loop_prepare(XenIOState *state)
>>>>>  static void xen_hvm_change_state_handler(void *opaque, int running,
>>>>>                                           RunState rstate)
>>>>>  {
>>>>> +    XenIOState *state = opaque;
>>>>> +
>>>>>      if (running) {
>>>>> -        xen_main_loop_prepare((XenIOState *)opaque);
>>>>> +        xen_main_loop_prepare(state);
>>>>>      }
>>>>> +
>>>>> +    xen_set_ioreq_server_state(xen_xc, xen_domid,
>>>>> +                               state->ioservid,
>>>>> +                               (rstate == RUN_STATE_RUNNING));
>>>>>  }
>>>>>
>>>>>  static void xen_exit_notifier(Notifier *n, void *data)
>>>>> @@ -1064,8 +1157,9 @@ int xen_hvm_init(ram_addr_t
>> *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>>>                   MemoryRegion **ram_memory)
>>>>>  {
>>>>>      int i, rc;
>>>>> -    unsigned long ioreq_pfn;
>>>>> -    unsigned long bufioreq_evtchn;
>>>>> +    xen_pfn_t ioreq_pfn;
>>>>> +    xen_pfn_t bufioreq_pfn;
>>>>> +    evtchn_port_t bufioreq_evtchn;
>>>>>      XenIOState *state;
>>>>>
>>>>>      state = g_malloc0(sizeof (XenIOState));
>>>>> @@ -1082,6 +1176,12 @@ int xen_hvm_init(ram_addr_t
>> *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>>>          return -1;
>>>>>      }
>>>>>
>>>>> +    rc = xen_create_ioreq_server(xen_xc, xen_domid, &state->ioservid);
>>>>> +    if (rc < 0) {
>>>>> +        perror("xen: ioreq server create");
>>>>> +        return -1;
>>>>> +    }
>>>>> +
>>>>>      state->exit.notify = xen_exit_notifier;
>>>>>      qemu_add_exit_notifier(&state->exit);
>>>>>
>>>>> @@ -1091,8 +1191,18 @@ int xen_hvm_init(ram_addr_t
>> *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>>>      state->wakeup.notify = xen_wakeup_notifier;
>>>>>      qemu_register_wakeup_notifier(&state->wakeup);
>>>>>
>>>>> -    xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_IOREQ_PFN,
>> &ioreq_pfn);
>>>>> +    rc = xen_get_ioreq_server_info(xen_xc, xen_domid, state->ioservid,
>>>>> +                                   &ioreq_pfn, &bufioreq_pfn,
>>>>> +                                   &bufioreq_evtchn);
>>>>> +    if (rc < 0) {
>>>>> +        hw_error("failed to get ioreq server info: error %d handle="
>> XC_INTERFACE_FMT,
>>>>> +                 errno, xen_xc);
>>>>> +    }
>>>>> +
>>>>>      DPRINTF("shared page at pfn %lx\n", ioreq_pfn);
>>>>> +    DPRINTF("buffered io page at pfn %lx\n", bufioreq_pfn);
>>>>> +    DPRINTF("buffered io evtchn is %x\n", bufioreq_evtchn);
>>>>> +
>>>>>      state->shared_page = xc_map_foreign_range(xen_xc, xen_domid,
>> XC_PAGE_SIZE,
>>>>>                                                PROT_READ|PROT_WRITE, ioreq_pfn);
>>>>>      if (state->shared_page == NULL) {
>>>>> @@ -1114,10 +1224,10 @@ int xen_hvm_init(ram_addr_t
>> *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>>>          hw_error("get vmport regs pfn returned error %d, rc=%d", errno,
>> rc);
>>>>>      }
>>>>>
>>>>> -    xc_get_hvm_param(xen_xc, xen_domid,
>> HVM_PARAM_BUFIOREQ_PFN, &ioreq_pfn);
>>>>> -    DPRINTF("buffered io page at pfn %lx\n", ioreq_pfn);
>>>>> -    state->buffered_io_page = xc_map_foreign_range(xen_xc,
>> xen_domid, XC_PAGE_SIZE,
>>>>> -                                                   PROT_READ|PROT_WRITE, ioreq_pfn);
>>>>> +    state->buffered_io_page = xc_map_foreign_range(xen_xc,
>> xen_domid,
>>>>> +                                                   XC_PAGE_SIZE,
>>>>> +                                                   PROT_READ|PROT_WRITE,
>>>>> +                                                   bufioreq_pfn);
>>>>>      if (state->buffered_io_page == NULL) {
>>>>>          hw_error("map buffered IO page returned error %d", errno);
>>>>>      }
>>>>> @@ -1125,6 +1235,12 @@ int xen_hvm_init(ram_addr_t
>> *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>>>      /* Note: cpus is empty at this point in init */
>>>>>      state->cpu_by_vcpu_id = g_malloc0(max_cpus * sizeof(CPUState *));
>>>>>
>>>>> +    rc = xen_set_ioreq_server_state(xen_xc, xen_domid, state->ioservid, true);
>>>>> +    if (rc < 0) {
>>>>> +        hw_error("failed to enable ioreq server info: error %d handle="
>> XC_INTERFACE_FMT,
>>>>> +                 errno, xen_xc);
>>>>> +    }
>>>>> +
>>>>>      state->ioreq_local_port = g_malloc0(max_cpus * sizeof
>> (evtchn_port_t));
>>>>>
>>>>>      /* FIXME: how about if we overflow the page here? */
>>>>> @@ -1132,22 +1248,16 @@ int xen_hvm_init(ram_addr_t
>> *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>>>          rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
>>>>>                                          xen_vcpu_eport(state->shared_page, i));
>>>>>          if (rc == -1) {
>>>>> -            fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
>>>>> +            fprintf(stderr, "shared evtchn %d bind error %d\n", i, errno);
>>>>>              return -1;
>>>>>          }
>>>>>          state->ioreq_local_port[i] = rc;
>>>>>      }
>>>>>
>>>>> -    rc = xc_get_hvm_param(xen_xc, xen_domid,
>> HVM_PARAM_BUFIOREQ_EVTCHN,
>>>>> -            &bufioreq_evtchn);
>>>>> -    if (rc < 0) {
>>>>> -        fprintf(stderr, "failed to get
>> HVM_PARAM_BUFIOREQ_EVTCHN\n");
>>>>> -        return -1;
>>>>> -    }
>>>>>      rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
>>>>> -            (uint32_t)bufioreq_evtchn);
>>>>> +                                    bufioreq_evtchn);
>>>>>      if (rc == -1) {
>>>>> -        fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
>>>>> +        fprintf(stderr, "buffered evtchn bind error %d\n", errno);
>>>>>          return -1;
>>>>>      }
>>>>>      state->bufioreq_local_port = rc;
>>>>> @@ -1163,6 +1273,12 @@ int xen_hvm_init(ram_addr_t
>> *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>>>      memory_listener_register(&state->memory_listener,
>> &address_space_memory);
>>>>>      state->log_for_dirtybit = NULL;
>>>>>
>>>>> +    state->io_listener = xen_io_listener;
>>>>> +    memory_listener_register(&state->io_listener, &address_space_io);
>>>>> +
>>>>> +    state->device_listener = xen_device_listener;
>>>>> +    device_listener_register(&state->device_listener);
>>>>> +
>>>>>      /* Initialize backend core & drivers */
>>>>>      if (xen_be_init() != 0) {
>>>>>          fprintf(stderr, "%s: xen backend core setup failed\n",
>> __FUNCTION__);
>>>>>
>>>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] New IOREQ type -- IOREQ_TYPE_VMWARE_PORT
  2015-01-29 19:14           ` Don Slutz
@ 2015-01-29 19:41             ` Don Slutz
  2015-01-30 10:23               ` Paul Durrant
  2015-01-30 10:23               ` [Qemu-devel] " Paul Durrant
  2015-01-29 19:41             ` Don Slutz
                               ` (2 subsequent siblings)
  3 siblings, 2 replies; 20+ messages in thread
From: Don Slutz @ 2015-01-29 19:41 UTC (permalink / raw)
  To: Paul Durrant, Don Slutz, qemu-devel, Stefano Stabellini
  Cc: Peter Maydell, Olaf Hering, Alexey Kardashevskiy, Stefan Weil,
	Michael Tokarev, Alexander Graf, xen-devel, Gerd Hoffmann,
	Stefan Hajnoczi, Paolo Bonzini

>> On 01/29/15 07:09, Paul Durrant wrote:
...
>> Given that IIRC you are using a new dedicated IOREQ type, I
>> think there needs to be something that allows an emulator to
>> register for this IOREQ type. How about adding a new type to
>> those defined for HVMOP_map_io_range_to_ioreq_server for your
>> case? (In your case the start and end values in the hypercall
>> would be meaningless, but it could be used to steer
>> hvm_select_ioreq_server() into sending all emulation requests of
>> your new type to QEMU.)
>>

This is an interesting idea.  Will need to spend more time on it.


>> Actually, such a mechanism could be used
>> to steer IOREQ_TYPE_TIMEOFFSET requests as, with the new QEMU
>> patches, they are going nowhere. Upstream QEMU (by default) used
>> to ignore them anyway, which is why I didn't bother with such a
>> patch to Xen before, but since you now need one maybe you could
>> add that too?
>>

I think it would not be that hard.  Will look into adding it.


Currently I do not see how hvm_do_resume() works with 2 ioreq servers.
It looks to me like, if a vcpu (e.g. vcpu 0) needs to wait for the
2nd ioreq server, hvm_do_resume() will check the 1st ioreq server
and return as if the ioreq is done.  What am I missing?

   -Don Slutz

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/2] Xen: Use the ioreq-server API when available
  2015-01-29 19:14           ` Don Slutz
                               ` (2 preceding siblings ...)
  2015-01-30  2:40             ` [Qemu-devel] [PATCH v5 2/2] Xen: Use the ioreq-server API when available Don Slutz
@ 2015-01-30  2:40             ` Don Slutz
  3 siblings, 0 replies; 20+ messages in thread
From: Don Slutz @ 2015-01-30  2:40 UTC (permalink / raw)
  To: Paul Durrant, Don Slutz, qemu-devel, Stefano Stabellini
  Cc: Peter Maydell, Olaf Hering, Alexey Kardashevskiy, Stefan Weil,
	Michael Tokarev, Alexander Graf, xen-devel, Gerd Hoffmann,
	Stefan Hajnoczi, Paolo Bonzini

On 01/29/15 14:14, Don Slutz wrote:
> On 01/29/15 07:09, Paul Durrant wrote:
>>> -----Original Message-----
>>> From: Don Slutz [mailto:dslutz@verizon.com]
>>> Sent: 29 January 2015 00:58
>>> To: Don Slutz; Paul Durrant; qemu-devel@nongnu.org; Stefano Stabellini
>>> Cc: Peter Maydell; Olaf Hering; Alexey Kardashevskiy; Stefan Weil; Michael
>>> Tokarev; Alexander Graf; Gerd Hoffmann; Stefan Hajnoczi; Paolo Bonzini
>>> Subject: Re: [Qemu-devel] [PATCH v5 2/2] Xen: Use the ioreq-server API
>>> when available
>>>

...

> 
> You can see that the guest is still waiting for the inl from 0x00000cfe.
> 
> 
> 
...

The one line patch:


From 5269b1fb947f207057ca69e320c79b397db3e8f5 Mon Sep 17 00:00:00 2001
From: Don Slutz <dslutz@verizon.com>
Date: Thu, 29 Jan 2015 21:24:05 -0500
Subject: [PATCH] Prevent hang if read of HVM_PARAM_IOREQ_PFN,
 HVM_PARAM_BUFIOREQ_PFN, HVM_PARAM_BUFIOREQ_EVTCHN is done
 before hvmloader starts.

Signed-off-by: Don Slutz <dslutz@verizon.com>
---
 xen/arch/x86/hvm/hvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index bad410e..7ac4b45 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -993,7 +993,7 @@ static int hvm_create_ioreq_server(struct domain *d,
domid_t domid,
     spin_lock(&d->arch.hvm_domain.ioreq_server.lock);

     rc = -EEXIST;
-    if ( is_default && d->arch.hvm_domain.default_ioreq_server != NULL )
+    if ( is_default && !list_empty(&d->arch.hvm_domain.ioreq_server.list) )
         goto fail2;

     rc = hvm_ioreq_server_init(s, d, domid, is_default, handle_bufioreq,
-- 
1.7.11.7


This does "fix this", but I have no idea if it is the right way to go.

    -Don Slutz

>    -Don Slutz
> 
> 
>>   Paul
>>
>>>    -Don Slutz
>>>
>>>
>>>>     -Don Slutz
>>>>
>>>>
>>>>> So far I have tracked it back to hvm_select_ioreq_server()
>>>>> which selects the "default_ioreq_server".  Since I have only 1
>>>>> QEMU, it is both the "default_ioreq_server" and an enabled
>>>>> 2nd ioreq_server.  I am continuing to investigate why my changes
>>>>> are causing this.  More below.
>>>>>
>>>>> This patch causes QEMU to only call xc_evtchn_bind_interdomain()
>>>>> for the enabled 2nd ioreq_server.  So when (if)
>>>>> hvm_select_ioreq_server() selects the "default_ioreq_server", the
>>>>> guest hangs on an I/O.
>>>>>
>>>>> Using the debug key 'e':
>>>>>
>>>>> (XEN) [2015-01-28 18:57:07] 'e' pressed -> dumping event-channel info
>>>>> (XEN) [2015-01-28 18:57:07] Event channel information for domain 0:
>>>>> (XEN) [2015-01-28 18:57:07] Polling vCPUs: {}
>>>>> (XEN) [2015-01-28 18:57:07]     port [p/m/s]
>>>>> (XEN) [2015-01-28 18:57:07]        1 [0/0/0]: s=5 n=0 x=0 v=0
>>>>> (XEN) [2015-01-28 18:57:07]        2 [0/0/0]: s=6 n=0 x=0
>>>>> (XEN) [2015-01-28 18:57:07]        3 [0/0/0]: s=6 n=0 x=0
>>>>> (XEN) [2015-01-28 18:57:07]        4 [0/0/0]: s=5 n=0 x=0 v=1
>>>>> (XEN) [2015-01-28 18:57:07]        5 [0/0/0]: s=6 n=0 x=0
>>>>> (XEN) [2015-01-28 18:57:07]        6 [0/0/0]: s=6 n=0 x=0
>>>>> (XEN) [2015-01-28 18:57:07]        7 [0/0/0]: s=5 n=1 x=0 v=0
>>>>> (XEN) [2015-01-28 18:57:07]        8 [0/0/0]: s=6 n=1 x=0
>>>>> (XEN) [2015-01-28 18:57:07]        9 [0/0/0]: s=6 n=1 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       10 [0/0/0]: s=5 n=1 x=0 v=1
>>>>> (XEN) [2015-01-28 18:57:07]       11 [0/0/0]: s=6 n=1 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       12 [0/0/0]: s=6 n=1 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       13 [0/0/0]: s=5 n=2 x=0 v=0
>>>>> (XEN) [2015-01-28 18:57:07]       14 [0/0/0]: s=6 n=2 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       15 [0/0/0]: s=6 n=2 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       16 [0/0/0]: s=5 n=2 x=0 v=1
>>>>> (XEN) [2015-01-28 18:57:07]       17 [0/0/0]: s=6 n=2 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       18 [0/0/0]: s=6 n=2 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       19 [0/0/0]: s=5 n=3 x=0 v=0
>>>>> (XEN) [2015-01-28 18:57:07]       20 [0/0/0]: s=6 n=3 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       21 [0/0/0]: s=6 n=3 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       22 [0/0/0]: s=5 n=3 x=0 v=1
>>>>> (XEN) [2015-01-28 18:57:07]       23 [0/0/0]: s=6 n=3 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       24 [0/0/0]: s=6 n=3 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       25 [0/0/0]: s=5 n=4 x=0 v=0
>>>>> (XEN) [2015-01-28 18:57:07]       26 [0/0/0]: s=6 n=4 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       27 [0/0/0]: s=6 n=4 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       28 [0/0/0]: s=5 n=4 x=0 v=1
>>>>> (XEN) [2015-01-28 18:57:07]       29 [0/0/0]: s=6 n=4 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       30 [0/0/0]: s=6 n=4 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       31 [0/0/0]: s=5 n=5 x=0 v=0
>>>>> (XEN) [2015-01-28 18:57:07]       32 [0/0/0]: s=6 n=5 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       33 [0/0/0]: s=6 n=5 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       34 [0/0/0]: s=5 n=5 x=0 v=1
>>>>> (XEN) [2015-01-28 18:57:07]       35 [0/0/0]: s=6 n=5 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       36 [0/0/0]: s=6 n=5 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       37 [0/0/0]: s=5 n=6 x=0 v=0
>>>>> (XEN) [2015-01-28 18:57:07]       38 [0/0/0]: s=6 n=6 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       39 [0/0/0]: s=6 n=6 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       40 [0/0/0]: s=5 n=6 x=0 v=1
>>>>> (XEN) [2015-01-28 18:57:07]       41 [0/0/0]: s=6 n=6 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       42 [0/0/0]: s=6 n=6 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       43 [0/0/0]: s=5 n=7 x=0 v=0
>>>>> (XEN) [2015-01-28 18:57:07]       44 [0/0/0]: s=6 n=7 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       45 [0/0/0]: s=6 n=7 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       46 [0/0/0]: s=5 n=7 x=0 v=1
>>>>> (XEN) [2015-01-28 18:57:07]       47 [0/0/0]: s=6 n=7 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       48 [0/0/0]: s=6 n=7 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       49 [0/0/0]: s=3 n=0 x=0 d=0 p=58
>>>>> (XEN) [2015-01-28 18:57:07]       50 [0/0/0]: s=5 n=0 x=0 v=9
>>>>> (XEN) [2015-01-28 18:57:07]       51 [0/0/0]: s=4 n=0 x=0 p=9 i=9
>>>>> (XEN) [2015-01-28 18:57:07]       52 [0/0/0]: s=5 n=0 x=0 v=2
>>>>> (XEN) [2015-01-28 18:57:07]       53 [0/0/0]: s=4 n=4 x=0 p=16 i=16
>>>>> (XEN) [2015-01-28 18:57:07]       54 [0/0/0]: s=4 n=0 x=0 p=17 i=17
>>>>> (XEN) [2015-01-28 18:57:07]       55 [0/0/0]: s=4 n=6 x=0 p=18 i=18
>>>>> (XEN) [2015-01-28 18:57:07]       56 [0/0/0]: s=4 n=0 x=0 p=8 i=8
>>>>> (XEN) [2015-01-28 18:57:07]       57 [0/0/0]: s=4 n=0 x=0 p=19 i=19
>>>>> (XEN) [2015-01-28 18:57:07]       58 [0/0/0]: s=3 n=0 x=0 d=0 p=49
>>>>> (XEN) [2015-01-28 18:57:07]       59 [0/0/0]: s=5 n=0 x=0 v=3
>>>>> (XEN) [2015-01-28 18:57:07]       60 [0/0/0]: s=5 n=0 x=0 v=4
>>>>> (XEN) [2015-01-28 18:57:07]       61 [0/0/0]: s=3 n=0 x=0 d=1 p=1
>>>>> (XEN) [2015-01-28 18:57:07]       62 [0/0/0]: s=3 n=0 x=0 d=1 p=2
>>>>> (XEN) [2015-01-28 18:57:07]       63 [0/0/0]: s=3 n=0 x=0 d=1 p=3
>>>>> (XEN) [2015-01-28 18:57:07]       64 [0/0/0]: s=3 n=0 x=0 d=1 p=5
>>>>> (XEN) [2015-01-28 18:57:07]       65 [0/0/0]: s=3 n=0 x=0 d=1 p=6
>>>>> (XEN) [2015-01-28 18:57:07]       66 [0/0/0]: s=3 n=0 x=0 d=1 p=7
>>>>> (XEN) [2015-01-28 18:57:07]       67 [0/0/0]: s=3 n=0 x=0 d=1 p=8
>>>>> (XEN) [2015-01-28 18:57:07]       68 [0/0/0]: s=3 n=0 x=0 d=1 p=9
>>>>> (XEN) [2015-01-28 18:57:07]       69 [0/0/0]: s=3 n=0 x=0 d=1 p=4
>>>>> (XEN) [2015-01-28 18:57:07] Event channel information for domain 1:
>>>>> (XEN) [2015-01-28 18:57:07] Polling vCPUs: {}
>>>>> (XEN) [2015-01-28 18:57:07]     port [p/m/s]
>>>>> (XEN) [2015-01-28 18:57:07]        1 [0/0/0]: s=3 n=0 x=0 d=0 p=61
>>>>> (XEN) [2015-01-28 18:57:07]        2 [0/0/0]: s=3 n=0 x=0 d=0 p=62
>>>>> (XEN) [2015-01-28 18:57:07]        3 [0/0/0]: s=3 n=0 x=1 d=0 p=63
>>>>> (XEN) [2015-01-28 18:57:07]        4 [0/0/0]: s=3 n=0 x=1 d=0 p=69
>>>>> (XEN) [2015-01-28 18:57:07]        5 [0/0/0]: s=3 n=1 x=1 d=0 p=64
>>>>> (XEN) [2015-01-28 18:57:07]        6 [0/0/0]: s=3 n=2 x=1 d=0 p=65
>>>>> (XEN) [2015-01-28 18:57:07]        7 [0/0/0]: s=3 n=3 x=1 d=0 p=66
>>>>> (XEN) [2015-01-28 18:57:07]        8 [0/0/0]: s=3 n=4 x=1 d=0 p=67
>>>>> (XEN) [2015-01-28 18:57:07]        9 [0/0/0]: s=3 n=5 x=1 d=0 p=68
>>>>> (XEN) [2015-01-28 18:57:07]       10 [0/0/0]: s=2 n=0 x=1 d=0
>>>>> (XEN) [2015-01-28 18:57:07]       11 [0/0/0]: s=2 n=0 x=1 d=0
>>>>> (XEN) [2015-01-28 18:57:07]       12 [0/0/0]: s=2 n=1 x=1 d=0
>>>>> (XEN) [2015-01-28 18:57:07]       13 [0/0/0]: s=2 n=2 x=1 d=0
>>>>> (XEN) [2015-01-28 18:57:07]       14 [0/0/0]: s=2 n=3 x=1 d=0
>>>>> (XEN) [2015-01-28 18:57:07]       15 [0/0/0]: s=2 n=4 x=1 d=0
>>>>> (XEN) [2015-01-28 18:57:07]       16 [0/0/0]: s=2 n=5 x=1 d=0
>>>>>
>>>>> You can see that domain 1 has only half of its event channels
>>>>> fully set up.  So when (if) hvm_send_assist_req_to_ioreq_server()
>>>>> does:
>>>>>
>>>>>             notify_via_xen_event_channel(d, port);
>>>>>
>>>>> Nothing happens and you hang in hvm_wait_for_io() forever.
>>>>>
>>>>>
>>>>> This does raise the questions:
>>>>>
>>>>> 1) Does this patch causes extra event channels to be created
>>>>>    that cannot be used?
>>>>>
>>>>> 2) Should the "default_ioreq_server" be deleted?
>>>>>
>>>>>
>>>>> Not sure the right way to go.
>>>>>
>>>>>     -Don Slutz
>>>>>
>>>>>
>>>>>>
>>>>>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>>>>>> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>>>>>> Cc: Peter Maydell <peter.maydell@linaro.org>
>>>>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>>>>> Cc: Michael Tokarev <mjt@tls.msk.ru>
>>>>>> Cc: Stefan Hajnoczi <stefanha@redhat.com>
>>>>>> Cc: Stefan Weil <sw@weilnetz.de>
>>>>>> Cc: Olaf Hering <olaf@aepfle.de>
>>>>>> Cc: Gerd Hoffmann <kraxel@redhat.com>
>>>>>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>> Cc: Alexander Graf <agraf@suse.de>
>>>>>> ---
>>>>>>  configure                   |   29 ++++++
>>>>>>  include/hw/xen/xen_common.h |  223
>>> +++++++++++++++++++++++++++++++++++++++++++
>>>>>>  trace-events                |    9 ++
>>>>>>  xen-hvm.c                   |  160 ++++++++++++++++++++++++++-----
>>>>>>  4 files changed, 399 insertions(+), 22 deletions(-)
>>>>>>
>>>>>> diff --git a/configure b/configure
>>>>>> index 47048f0..b1f8c2a 100755
>>>>>> --- a/configure
>>>>>> +++ b/configure
>>>>>> @@ -1877,6 +1877,32 @@ int main(void) {
>>>>>>    xc_gnttab_open(NULL, 0);
>>>>>>    xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
>>>>>>    xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
>>>>>> +  xc_hvm_create_ioreq_server(xc, 0, 0, NULL);
>>>>>> +  return 0;
>>>>>> +}
>>>>>> +EOF
>>>>>> +      compile_prog "" "$xen_libs"
>>>>>> +    then
>>>>>> +    xen_ctrl_version=450
>>>>>> +    xen=yes
>>>>>> +
>>>>>> +  elif
>>>>>> +      cat > $TMPC <<EOF &&
>>>>>> +#include <xenctrl.h>
>>>>>> +#include <xenstore.h>
>>>>>> +#include <stdint.h>
>>>>>> +#include <xen/hvm/hvm_info_table.h>
>>>>>> +#if !defined(HVM_MAX_VCPUS)
>>>>>> +# error HVM_MAX_VCPUS not defined
>>>>>> +#endif
>>>>>> +int main(void) {
>>>>>> +  xc_interface *xc;
>>>>>> +  xs_daemon_open();
>>>>>> +  xc = xc_interface_open(0, 0, 0);
>>>>>> +  xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0);
>>>>>> +  xc_gnttab_open(NULL, 0);
>>>>>> +  xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
>>>>>> +  xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
>>>>>>    return 0;
>>>>>>  }
>>>>>>  EOF
>>>>>> @@ -4283,6 +4309,9 @@ if test -n "$sparc_cpu"; then
>>>>>>      echo "Target Sparc Arch $sparc_cpu"
>>>>>>  fi
>>>>>>  echo "xen support       $xen"
>>>>>> +if test "$xen" = "yes" ; then
>>>>>> +  echo "xen ctrl version  $xen_ctrl_version"
>>>>>> +fi
>>>>>>  echo "brlapi support    $brlapi"
>>>>>>  echo "bluez  support    $bluez"
>>>>>>  echo "Documentation     $docs"
>>>>>> diff --git a/include/hw/xen/xen_common.h
>>> b/include/hw/xen/xen_common.h
>>>>>> index 95612a4..519696f 100644
>>>>>> --- a/include/hw/xen/xen_common.h
>>>>>> +++ b/include/hw/xen/xen_common.h
>>>>>> @@ -16,7 +16,9 @@
>>>>>>
>>>>>>  #include "hw/hw.h"
>>>>>>  #include "hw/xen/xen.h"
>>>>>> +#include "hw/pci/pci.h"
>>>>>>  #include "qemu/queue.h"
>>>>>> +#include "trace.h"
>>>>>>
>>>>>>  /*
>>>>>>   * We don't support Xen prior to 3.3.0.
>>>>>> @@ -179,4 +181,225 @@ static inline int
>>> xen_get_vmport_regs_pfn(XenXC xc, domid_t dom,
>>>>>>  }
>>>>>>  #endif
>>>>>>
>>>>>> +/* Xen before 4.5 */
>>>>>> +#if CONFIG_XEN_CTRL_INTERFACE_VERSION < 450
>>>>>> +
>>>>>> +#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
>>>>>> +#define HVM_PARAM_BUFIOREQ_EVTCHN 26
>>>>>> +#endif
>>>>>> +
>>>>>> +#define IOREQ_TYPE_PCI_CONFIG 2
>>>>>> +
>>>>>> +typedef uint32_t ioservid_t;
>>>>>> +
>>>>>> +static inline void xen_map_memory_section(XenXC xc, domid_t dom,
>>>>>> +                                          ioservid_t ioservid,
>>>>>> +                                          MemoryRegionSection *section)
>>>>>> +{
>>>>>> +}
>>>>>> +
>>>>>> +static inline void xen_unmap_memory_section(XenXC xc, domid_t
>>> dom,
>>>>>> +                                            ioservid_t ioservid,
>>>>>> +                                            MemoryRegionSection *section)
>>>>>> +{
>>>>>> +}
>>>>>> +
>>>>>> +static inline void xen_map_io_section(XenXC xc, domid_t dom,
>>>>>> +                                      ioservid_t ioservid,
>>>>>> +                                      MemoryRegionSection *section)
>>>>>> +{
>>>>>> +}
>>>>>> +
>>>>>> +static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
>>>>>> +                                        ioservid_t ioservid,
>>>>>> +                                        MemoryRegionSection *section)
>>>>>> +{
>>>>>> +}
>>>>>> +
>>>>>> +static inline void xen_map_pcidev(XenXC xc, domid_t dom,
>>>>>> +                                  ioservid_t ioservid,
>>>>>> +                                  PCIDevice *pci_dev)
>>>>>> +{
>>>>>> +}
>>>>>> +
>>>>>> +static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
>>>>>> +                                    ioservid_t ioservid,
>>>>>> +                                    PCIDevice *pci_dev)
>>>>>> +{
>>>>>> +}
>>>>>> +
>>>>>> +static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
>>>>>> +                                          ioservid_t *ioservid)
>>>>>> +{
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
>>>>>> +                                            ioservid_t ioservid)
>>>>>> +{
>>>>>> +}
>>>>>> +
>>>>>> +static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
>>>>>> +                                            ioservid_t ioservid,
>>>>>> +                                            xen_pfn_t *ioreq_pfn,
>>>>>> +                                            xen_pfn_t *bufioreq_pfn,
>>>>>> +                                            evtchn_port_t *bufioreq_evtchn)
>>>>>> +{
>>>>>> +    unsigned long param;
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_IOREQ_PFN,
>>> &param);
>>>>>> +    if (rc < 0) {
>>>>>> +        fprintf(stderr, "failed to get HVM_PARAM_IOREQ_PFN\n");
>>>>>> +        return -1;
>>>>>> +    }
>>>>>> +
>>>>>> +    *ioreq_pfn = param;
>>>>>> +
>>>>>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_BUFIOREQ_PFN,
>>> &param);
>>>>>> +    if (rc < 0) {
>>>>>> +        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_PFN\n");
>>>>>> +        return -1;
>>>>>> +    }
>>>>>> +
>>>>>> +    *bufioreq_pfn = param;
>>>>>> +
>>>>>> +    rc = xc_get_hvm_param(xc, dom,
>>> HVM_PARAM_BUFIOREQ_EVTCHN,
>>>>>> +                          &param);
>>>>>> +    if (rc < 0) {
>>>>>> +        fprintf(stderr, "failed to get
>>> HVM_PARAM_BUFIOREQ_EVTCHN\n");
>>>>>> +        return -1;
>>>>>> +    }
>>>>>> +
>>>>>> +    *bufioreq_evtchn = param;
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
>>>>>> +                                             ioservid_t ioservid,
>>>>>> +                                             bool enable)
>>>>>> +{
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +/* Xen 4.5 */
>>>>>> +#else
>>>>>> +
>>>>>> +static inline void xen_map_memory_section(XenXC xc, domid_t dom,
>>>>>> +                                          ioservid_t ioservid,
>>>>>> +                                          MemoryRegionSection *section)
>>>>>> +{
>>>>>> +    hwaddr start_addr = section->offset_within_address_space;
>>>>>> +    ram_addr_t size = int128_get64(section->size);
>>>>>> +    hwaddr end_addr = start_addr + size - 1;
>>>>>> +
>>>>>> +    trace_xen_map_mmio_range(ioservid, start_addr, end_addr);
>>>>>> +    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 1,
>>>>>> +                                        start_addr, end_addr);
>>>>>> +}
>>>>>> +
>>>>>> +static inline void xen_unmap_memory_section(XenXC xc, domid_t
>>> dom,
>>>>>> +                                            ioservid_t ioservid,
>>>>>> +                                            MemoryRegionSection *section)
>>>>>> +{
>>>>>> +    hwaddr start_addr = section->offset_within_address_space;
>>>>>> +    ram_addr_t size = int128_get64(section->size);
>>>>>> +    hwaddr end_addr = start_addr + size - 1;
>>>>>> +
>>>>>> +    trace_xen_unmap_mmio_range(ioservid, start_addr, end_addr);
>>>>>> +    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 1,
>>>>>> +                                            start_addr, end_addr);
>>>>>> +}
>>>>>> +
>>>>>> +static inline void xen_map_io_section(XenXC xc, domid_t dom,
>>>>>> +                                      ioservid_t ioservid,
>>>>>> +                                      MemoryRegionSection *section)
>>>>>> +{
>>>>>> +    hwaddr start_addr = section->offset_within_address_space;
>>>>>> +    ram_addr_t size = int128_get64(section->size);
>>>>>> +    hwaddr end_addr = start_addr + size - 1;
>>>>>> +
>>>>>> +    trace_xen_map_portio_range(ioservid, start_addr, end_addr);
>>>>>> +    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 0,
>>>>>> +                                        start_addr, end_addr);
>>>>>> +}
>>>>>> +
>>>>>> +static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
>>>>>> +                                        ioservid_t ioservid,
>>>>>> +                                        MemoryRegionSection *section)
>>>>>> +{
>>>>>> +    hwaddr start_addr = section->offset_within_address_space;
>>>>>> +    ram_addr_t size = int128_get64(section->size);
>>>>>> +    hwaddr end_addr = start_addr + size - 1;
>>>>>> +
>>>>>> +    trace_xen_unmap_portio_range(ioservid, start_addr, end_addr);
>>>>>> +    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 0,
>>>>>> +                                            start_addr, end_addr);
>>>>>> +}
>>>>>> +
>>>>>> +static inline void xen_map_pcidev(XenXC xc, domid_t dom,
>>>>>> +                                  ioservid_t ioservid,
>>>>>> +                                  PCIDevice *pci_dev)
>>>>>> +{
>>>>>> +    trace_xen_map_pcidev(ioservid, pci_bus_num(pci_dev->bus),
>>>>>> +                         PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
>>>>>> +    xc_hvm_map_pcidev_to_ioreq_server(xc, dom, ioservid,
>>>>>> +                                      0, pci_bus_num(pci_dev->bus),
>>>>>> +                                      PCI_SLOT(pci_dev->devfn),
>>>>>> +                                      PCI_FUNC(pci_dev->devfn));
>>>>>> +}
>>>>>> +
>>>>>> +static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
>>>>>> +                                    ioservid_t ioservid,
>>>>>> +                                    PCIDevice *pci_dev)
>>>>>> +{
>>>>>> +    trace_xen_unmap_pcidev(ioservid, pci_bus_num(pci_dev->bus),
>>>>>> +                           PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
>>>>>> +    xc_hvm_unmap_pcidev_from_ioreq_server(xc, dom, ioservid,
>>>>>> +                                          0, pci_bus_num(pci_dev->bus),
>>>>>> +                                          PCI_SLOT(pci_dev->devfn),
>>>>>> +                                          PCI_FUNC(pci_dev->devfn));
>>>>>> +}
>>>>>> +
>>>>>> +static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
>>>>>> +                                          ioservid_t *ioservid)
>>>>>> +{
>>>>>> +    int rc = xc_hvm_create_ioreq_server(xc, dom, 1, ioservid);
>>>>>> +
>>>>>> +    if (rc == 0) {
>>>>>> +        trace_xen_ioreq_server_create(*ioservid);
>>>>>> +    }
>>>>>> +
>>>>>> +    return rc;
>>>>>> +}
>>>>>> +
>>>>>> +static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
>>>>>> +                                            ioservid_t ioservid)
>>>>>> +{
>>>>>> +    trace_xen_ioreq_server_destroy(ioservid);
>>>>>> +    xc_hvm_destroy_ioreq_server(xc, dom, ioservid);
>>>>>> +}
>>>>>> +
>>>>>> +static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
>>>>>> +                                            ioservid_t ioservid,
>>>>>> +                                            xen_pfn_t *ioreq_pfn,
>>>>>> +                                            xen_pfn_t *bufioreq_pfn,
>>>>>> +                                            evtchn_port_t *bufioreq_evtchn)
>>>>>> +{
>>>>>> +    return xc_hvm_get_ioreq_server_info(xc, dom, ioservid,
>>>>>> +                                        ioreq_pfn, bufioreq_pfn,
>>>>>> +                                        bufioreq_evtchn);
>>>>>> +}
>>>>>> +
>>>>>> +static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
>>>>>> +                                             ioservid_t ioservid,
>>>>>> +                                             bool enable)
>>>>>> +{
>>>>>> +    trace_xen_ioreq_server_state(ioservid, enable);
>>>>>> +    return xc_hvm_set_ioreq_server_state(xc, dom, ioservid, enable);
>>>>>> +}
>>>>>> +
>>>>>> +#endif
>>>>>> +
>>>>>>  #endif /* QEMU_HW_XEN_COMMON_H */
>>>>>> diff --git a/trace-events b/trace-events
>>>>>> index b5722ea..abd1118 100644
>>>>>> --- a/trace-events
>>>>>> +++ b/trace-events
>>>>>> @@ -897,6 +897,15 @@ pvscsi_tx_rings_num_pages(const char* label,
>>> uint32_t num) "Number of %s pages:
>>>>>>  # xen-hvm.c
>>>>>>  xen_ram_alloc(unsigned long ram_addr, unsigned long size)
>>> "requested: %#lx, size %#lx"
>>>>>>  xen_client_set_memory(uint64_t start_addr, unsigned long size, bool
>>> log_dirty) "%#"PRIx64" size %#lx, log_dirty %i"
>>>>>> +xen_ioreq_server_create(uint32_t id) "id: %u"
>>>>>> +xen_ioreq_server_destroy(uint32_t id) "id: %u"
>>>>>> +xen_ioreq_server_state(uint32_t id, bool enable) "id: %u: enable: %i"
>>>>>> +xen_map_mmio_range(uint32_t id, uint64_t start_addr, uint64_t
>>> end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>>>>>> +xen_unmap_mmio_range(uint32_t id, uint64_t start_addr, uint64_t
>>> end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>>>>>> +xen_map_portio_range(uint32_t id, uint64_t start_addr, uint64_t
>>> end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>>>>>> +xen_unmap_portio_range(uint32_t id, uint64_t start_addr, uint64_t
>>> end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>>>>>> +xen_map_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func)
>>> "id: %u bdf: %02x.%02x.%02x"
>>>>>> +xen_unmap_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func)
>>> "id: %u bdf: %02x.%02x.%02x"
>>>>>>
>>>>>>  # xen-mapcache.c
>>>>>>  xen_map_cache(uint64_t phys_addr) "want %#"PRIx64
>>>>>> diff --git a/xen-hvm.c b/xen-hvm.c
>>>>>> index 7548794..31cb3ca 100644
>>>>>> --- a/xen-hvm.c
>>>>>> +++ b/xen-hvm.c
>>>>>> @@ -85,9 +85,6 @@ static inline ioreq_t
>>> *xen_vcpu_ioreq(shared_iopage_t *shared_page, int vcpu)
>>>>>>  }
>>>>>>  #  define FMT_ioreq_size "u"
>>>>>>  #endif
>>>>>> -#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
>>>>>> -#define HVM_PARAM_BUFIOREQ_EVTCHN 26
>>>>>> -#endif
>>>>>>
>>>>>>  #define BUFFER_IO_MAX_DELAY  100
>>>>>>
>>>>>> @@ -101,6 +98,7 @@ typedef struct XenPhysmap {
>>>>>>  } XenPhysmap;
>>>>>>
>>>>>>  typedef struct XenIOState {
>>>>>> +    ioservid_t ioservid;
>>>>>>      shared_iopage_t *shared_page;
>>>>>>      shared_vmport_iopage_t *shared_vmport_page;
>>>>>>      buffered_iopage_t *buffered_io_page;
>>>>>> @@ -117,6 +115,8 @@ typedef struct XenIOState {
>>>>>>
>>>>>>      struct xs_handle *xenstore;
>>>>>>      MemoryListener memory_listener;
>>>>>> +    MemoryListener io_listener;
>>>>>> +    DeviceListener device_listener;
>>>>>>      QLIST_HEAD(, XenPhysmap) physmap;
>>>>>>      hwaddr free_phys_offset;
>>>>>>      const XenPhysmap *log_for_dirtybit;
>>>>>> @@ -467,12 +467,23 @@ static void xen_set_memory(struct
>>> MemoryListener *listener,
>>>>>>      bool log_dirty = memory_region_is_logging(section->mr);
>>>>>>      hvmmem_type_t mem_type;
>>>>>>
>>>>>> +    if (section->mr == &ram_memory) {
>>>>>> +        return;
>>>>>> +    } else {
>>>>>> +        if (add) {
>>>>>> +            xen_map_memory_section(xen_xc, xen_domid, state->ioservid,
>>>>>> +                                   section);
>>>>>> +        } else {
>>>>>> +            xen_unmap_memory_section(xen_xc, xen_domid, state->ioservid,
>>>>>> +                                     section);
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>>      if (!memory_region_is_ram(section->mr)) {
>>>>>>          return;
>>>>>>      }
>>>>>>
>>>>>> -    if (!(section->mr != &ram_memory
>>>>>> -          && ( (log_dirty && add) || (!log_dirty && !add)))) {
>>>>>> +    if (log_dirty != add) {
>>>>>>          return;
>>>>>>      }
>>>>>>
>>>>>> @@ -515,6 +526,50 @@ static void xen_region_del(MemoryListener
>>> *listener,
>>>>>>      memory_region_unref(section->mr);
>>>>>>  }
>>>>>>
>>>>>> +static void xen_io_add(MemoryListener *listener,
>>>>>> +                       MemoryRegionSection *section)
>>>>>> +{
>>>>>> +    XenIOState *state = container_of(listener, XenIOState, io_listener);
>>>>>> +
>>>>>> +    memory_region_ref(section->mr);
>>>>>> +
>>>>>> +    xen_map_io_section(xen_xc, xen_domid, state->ioservid, section);
>>>>>> +}
>>>>>> +
>>>>>> +static void xen_io_del(MemoryListener *listener,
>>>>>> +                       MemoryRegionSection *section)
>>>>>> +{
>>>>>> +    XenIOState *state = container_of(listener, XenIOState, io_listener);
>>>>>> +
>>>>>> +    xen_unmap_io_section(xen_xc, xen_domid, state->ioservid,
>>> section);
>>>>>> +
>>>>>> +    memory_region_unref(section->mr);
>>>>>> +}
>>>>>> +
>>>>>> +static void xen_device_realize(DeviceListener *listener,
>>>>>> +			       DeviceState *dev)
>>>>>> +{
>>>>>> +    XenIOState *state = container_of(listener, XenIOState,
>>> device_listener);
>>>>>> +
>>>>>> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
>>>>>> +        PCIDevice *pci_dev = PCI_DEVICE(dev);
>>>>>> +
>>>>>> +        xen_map_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static void xen_device_unrealize(DeviceListener *listener,
>>>>>> +				 DeviceState *dev)
>>>>>> +{
>>>>>> +    XenIOState *state = container_of(listener, XenIOState,
>>> device_listener);
>>>>>> +
>>>>>> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
>>>>>> +        PCIDevice *pci_dev = PCI_DEVICE(dev);
>>>>>> +
>>>>>> +        xen_unmap_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>>  static void xen_sync_dirty_bitmap(XenIOState *state,
>>>>>>                                    hwaddr start_addr,
>>>>>>                                    ram_addr_t size)
>>>>>> @@ -615,6 +670,17 @@ static MemoryListener xen_memory_listener =
>>> {
>>>>>>      .priority = 10,
>>>>>>  };
>>>>>>
>>>>>> +static MemoryListener xen_io_listener = {
>>>>>> +    .region_add = xen_io_add,
>>>>>> +    .region_del = xen_io_del,
>>>>>> +    .priority = 10,
>>>>>> +};
>>>>>> +
>>>>>> +static DeviceListener xen_device_listener = {
>>>>>> +    .realize = xen_device_realize,
>>>>>> +    .unrealize = xen_device_unrealize,
>>>>>> +};
>>>>>> +
>>>>>>  /* get the ioreq packets from share mem */
>>>>>>  static ioreq_t *cpu_get_ioreq_from_shared_memory(XenIOState
>>> *state, int vcpu)
>>>>>>  {
>>>>>> @@ -863,6 +929,27 @@ static void handle_ioreq(XenIOState *state,
>>> ioreq_t *req)
>>>>>>          case IOREQ_TYPE_INVALIDATE:
>>>>>>              xen_invalidate_map_cache();
>>>>>>              break;
>>>>>> +        case IOREQ_TYPE_PCI_CONFIG: {
>>>>>> +            uint32_t sbdf = req->addr >> 32;
>>>>>> +            uint32_t val;
>>>>>> +
>>>>>> +            /* Fake a write to port 0xCF8 so that
>>>>>> +             * the config space access will target the
>>>>>> +             * correct device model.
>>>>>> +             */
>>>>>> +            val = (1u << 31) |
>>>>>> +                  ((req->addr & 0x0f00) << 16) |
>>>>>> +                  ((sbdf & 0xffff) << 8) |
>>>>>> +                  (req->addr & 0xfc);
>>>>>> +            do_outp(0xcf8, 4, val);
>>>>>> +
>>>>>> +            /* Now issue the config space access via
>>>>>> +             * port 0xCFC
>>>>>> +             */
>>>>>> +            req->addr = 0xcfc | (req->addr & 0x03);
>>>>>> +            cpu_ioreq_pio(req);
>>>>>> +            break;
>>>>>> +        }
>>>>>>          default:
>>>>>>              hw_error("Invalid ioreq type 0x%x\n", req->type);
>>>>>>      }
>>>>>> @@ -993,9 +1080,15 @@ static void
>>> xen_main_loop_prepare(XenIOState *state)
>>>>>>  static void xen_hvm_change_state_handler(void *opaque, int running,
>>>>>>                                           RunState rstate)
>>>>>>  {
>>>>>> +    XenIOState *state = opaque;
>>>>>> +
>>>>>>      if (running) {
>>>>>> -        xen_main_loop_prepare((XenIOState *)opaque);
>>>>>> +        xen_main_loop_prepare(state);
>>>>>>      }
>>>>>> +
>>>>>> +    xen_set_ioreq_server_state(xen_xc, xen_domid,
>>>>>> +                               state->ioservid,
>>>>>> +                               (rstate == RUN_STATE_RUNNING));
>>>>>>  }
>>>>>>
>>>>>>  static void xen_exit_notifier(Notifier *n, void *data)
>>>>>> @@ -1064,8 +1157,9 @@ int xen_hvm_init(ram_addr_t
>>> *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>>>>                   MemoryRegion **ram_memory)
>>>>>>  {
>>>>>>      int i, rc;
>>>>>> -    unsigned long ioreq_pfn;
>>>>>> -    unsigned long bufioreq_evtchn;
>>>>>> +    xen_pfn_t ioreq_pfn;
>>>>>> +    xen_pfn_t bufioreq_pfn;
>>>>>> +    evtchn_port_t bufioreq_evtchn;
>>>>>>      XenIOState *state;
>>>>>>
>>>>>>      state = g_malloc0(sizeof (XenIOState));
>>>>>> @@ -1082,6 +1176,12 @@ int xen_hvm_init(ram_addr_t
>>> *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>>>>          return -1;
>>>>>>      }
>>>>>>
>>>>>> +    rc = xen_create_ioreq_server(xen_xc, xen_domid, &state->ioservid);
>>>>>> +    if (rc < 0) {
>>>>>> +        perror("xen: ioreq server create");
>>>>>> +        return -1;
>>>>>> +    }
>>>>>> +
>>>>>>      state->exit.notify = xen_exit_notifier;
>>>>>>      qemu_add_exit_notifier(&state->exit);
>>>>>>
>>>>>> @@ -1091,8 +1191,18 @@ int xen_hvm_init(ram_addr_t
>>> *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>>>>      state->wakeup.notify = xen_wakeup_notifier;
>>>>>>      qemu_register_wakeup_notifier(&state->wakeup);
>>>>>>
>>>>>> -    xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_IOREQ_PFN,
>>> &ioreq_pfn);
>>>>>> +    rc = xen_get_ioreq_server_info(xen_xc, xen_domid, state-
>>>> ioservid,
>>>>>> +                                   &ioreq_pfn, &bufioreq_pfn,
>>>>>> +                                   &bufioreq_evtchn);
>>>>>> +    if (rc < 0) {
>>>>>> +        hw_error("failed to get ioreq server info: error %d handle="
>>> XC_INTERFACE_FMT,
>>>>>> +                 errno, xen_xc);
>>>>>> +    }
>>>>>> +
>>>>>>      DPRINTF("shared page at pfn %lx\n", ioreq_pfn);
>>>>>> +    DPRINTF("buffered io page at pfn %lx\n", bufioreq_pfn);
>>>>>> +    DPRINTF("buffered io evtchn is %x\n", bufioreq_evtchn);
>>>>>> +
>>>>>>      state->shared_page = xc_map_foreign_range(xen_xc, xen_domid,
>>> XC_PAGE_SIZE,
>>>>>>                                                PROT_READ|PROT_WRITE, ioreq_pfn);
>>>>>>      if (state->shared_page == NULL) {
>>>>>> @@ -1114,10 +1224,10 @@ int xen_hvm_init(ram_addr_t
>>> *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>>>>          hw_error("get vmport regs pfn returned error %d, rc=%d", errno,
>>> rc);
>>>>>>      }
>>>>>>
>>>>>> -    xc_get_hvm_param(xen_xc, xen_domid,
>>> HVM_PARAM_BUFIOREQ_PFN, &ioreq_pfn);
>>>>>> -    DPRINTF("buffered io page at pfn %lx\n", ioreq_pfn);
>>>>>> -    state->buffered_io_page = xc_map_foreign_range(xen_xc,
>>> xen_domid, XC_PAGE_SIZE,
>>>>>> -                                                   PROT_READ|PROT_WRITE, ioreq_pfn);
>>>>>> +    state->buffered_io_page = xc_map_foreign_range(xen_xc,
>>> xen_domid,
>>>>>> +                                                   XC_PAGE_SIZE,
>>>>>> +                                                   PROT_READ|PROT_WRITE,
>>>>>> +                                                   bufioreq_pfn);
>>>>>>      if (state->buffered_io_page == NULL) {
>>>>>>          hw_error("map buffered IO page returned error %d", errno);
>>>>>>      }
>>>>>> @@ -1125,6 +1235,12 @@ int xen_hvm_init(ram_addr_t
>>> *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>>>>      /* Note: cpus is empty at this point in init */
>>>>>>      state->cpu_by_vcpu_id = g_malloc0(max_cpus * sizeof(CPUState *));
>>>>>>
>>>>>> +    rc = xen_set_ioreq_server_state(xen_xc, xen_domid, state->ioservid, true);
>>>>>> +    if (rc < 0) {
>>>>>> +        hw_error("failed to enable ioreq server info: error %d handle="
>>> XC_INTERFACE_FMT,
>>>>>> +                 errno, xen_xc);
>>>>>> +    }
>>>>>> +
>>>>>>      state->ioreq_local_port = g_malloc0(max_cpus * sizeof
>>> (evtchn_port_t));
>>>>>>
>>>>>>      /* FIXME: how about if we overflow the page here? */
>>>>>> @@ -1132,22 +1248,16 @@ int xen_hvm_init(ram_addr_t
>>> *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>>>>          rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
>>>>>>                                          xen_vcpu_eport(state->shared_page, i));
>>>>>>          if (rc == -1) {
>>>>>> -            fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
>>>>>> +            fprintf(stderr, "shared evtchn %d bind error %d\n", i, errno);
>>>>>>              return -1;
>>>>>>          }
>>>>>>          state->ioreq_local_port[i] = rc;
>>>>>>      }
>>>>>>
>>>>>> -    rc = xc_get_hvm_param(xen_xc, xen_domid,
>>> HVM_PARAM_BUFIOREQ_EVTCHN,
>>>>>> -            &bufioreq_evtchn);
>>>>>> -    if (rc < 0) {
>>>>>> -        fprintf(stderr, "failed to get
>>> HVM_PARAM_BUFIOREQ_EVTCHN\n");
>>>>>> -        return -1;
>>>>>> -    }
>>>>>>      rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
>>>>>> -            (uint32_t)bufioreq_evtchn);
>>>>>> +                                    bufioreq_evtchn);
>>>>>>      if (rc == -1) {
>>>>>> -        fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
>>>>>> +        fprintf(stderr, "buffered evtchn bind error %d\n", errno);
>>>>>>          return -1;
>>>>>>      }
>>>>>>      state->bufioreq_local_port = rc;
>>>>>> @@ -1163,6 +1273,12 @@ int xen_hvm_init(ram_addr_t
>>> *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>>>>      memory_listener_register(&state->memory_listener,
>>> &address_space_memory);
>>>>>>      state->log_for_dirtybit = NULL;
>>>>>>
>>>>>> +    state->io_listener = xen_io_listener;
>>>>>> +    memory_listener_register(&state->io_listener, &address_space_io);
>>>>>> +
>>>>>> +    state->device_listener = xen_device_listener;
>>>>>> +    device_listener_register(&state->device_listener);
>>>>>> +
>>>>>>      /* Initialize backend core & drivers */
>>>>>>      if (xen_be_init() != 0) {
>>>>>>          fprintf(stderr, "%s: xen backend core setup failed\n",
>>> __FUNCTION__);
>>>>>>
>>>>>

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/2] Xen: Use the ioreq-server API when available
  2015-01-29 19:14           ` Don Slutz
  2015-01-29 19:41             ` [Qemu-devel] New IOREQ type -- IOREQ_TYPE_VMWARE_PORT Don Slutz
  2015-01-29 19:41             ` Don Slutz
@ 2015-01-30  2:40             ` Don Slutz
  2015-01-30  2:40             ` Don Slutz
  3 siblings, 0 replies; 20+ messages in thread
From: Don Slutz @ 2015-01-30  2:40 UTC (permalink / raw)
  To: Paul Durrant, Don Slutz, qemu-devel, Stefano Stabellini
  Cc: Peter Maydell, Olaf Hering, Alexey Kardashevskiy, Stefan Weil,
	Michael Tokarev, Alexander Graf, xen-devel, Gerd Hoffmann,
	Stefan Hajnoczi, Paolo Bonzini

On 01/29/15 14:14, Don Slutz wrote:
> On 01/29/15 07:09, Paul Durrant wrote:
>>> -----Original Message-----
>>> From: Don Slutz [mailto:dslutz@verizon.com]
>>> Sent: 29 January 2015 00:58
>>> To: Don Slutz; Paul Durrant; qemu-devel@nongnu.org; Stefano Stabellini
>>> Cc: Peter Maydell; Olaf Hering; Alexey Kardashevskiy; Stefan Weil; Michael
>>> Tokarev; Alexander Graf; Gerd Hoffmann; Stefan Hajnoczi; Paolo Bonzini
>>> Subject: Re: [Qemu-devel] [PATCH v5 2/2] Xen: Use the ioreq-server API
>>> when available
>>>

...

> 
> You can see that the guest is still waiting for the inl from 0x00000cfe.
> 
> 
> 
...

The following one-line patch:


>From 5269b1fb947f207057ca69e320c79b397db3e8f5 Mon Sep 17 00:00:00 2001
From: Don Slutz <dslutz@verizon.com>
Date: Thu, 29 Jan 2015 21:24:05 -0500
Subject: [PATCH] Prevent hang if read of HVM_PARAM_IOREQ_PFN,
 HVM_PARAM_BUFIOREQ_PFN, HVM_PARAM_BUFIOREQ_EVTCHN is done
 before hvmloader starts.

Signed-off-by: Don Slutz <dslutz@verizon.com>
---
 xen/arch/x86/hvm/hvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index bad410e..7ac4b45 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -993,7 +993,7 @@ static int hvm_create_ioreq_server(struct domain *d,
domid_t domid,
     spin_lock(&d->arch.hvm_domain.ioreq_server.lock);

     rc = -EEXIST;
-    if ( is_default && d->arch.hvm_domain.default_ioreq_server != NULL )
+    if ( is_default && !list_empty(&d->arch.hvm_domain.ioreq_server.list) )
         goto fail2;

     rc = hvm_ioreq_server_init(s, d, domid, is_default, handle_bufioreq,
-- 
1.7.11.7


This does "fix" the hang, but I have no idea whether it is the right way to go.

    -Don Slutz

>    -Don Slutz
> 
> 
>>   Paul
>>
>>>    -Don Slutz
>>>
>>>
>>>>     -Don Slutz
>>>>
>>>>
>>>>> So far I have tracked it back to hvm_select_ioreq_server()
>>>>> which selects the "default_ioreq_server".  Since I have one 1
>>>>> QEMU, it is both the "default_ioreq_server" and an enabled
>>>>> 2nd ioreq_server.  I am continuing to understand why my changes
>>>>> are causing this.  More below.
>>>>>
>>>>> This patch causes QEMU to only call xc_evtchn_bind_interdomain()
>>>>> for the enabled 2nd ioreq_server.  So when (if)
>>>>> hvm_select_ioreq_server() selects the "default_ioreq_server", the
>>>>> guest hangs on an I/O.
>>>>>
>>>>> Using the debug key 'e':
>>>>>
>>>>> (XEN) [2015-01-28 18:57:07] 'e' pressed -> dumping event-channel info
>>>>> (XEN) [2015-01-28 18:57:07] Event channel information for domain 0:
>>>>> (XEN) [2015-01-28 18:57:07] Polling vCPUs: {}
>>>>> (XEN) [2015-01-28 18:57:07]     port [p/m/s]
>>>>> (XEN) [2015-01-28 18:57:07]        1 [0/0/0]: s=5 n=0 x=0 v=0
>>>>> (XEN) [2015-01-28 18:57:07]        2 [0/0/0]: s=6 n=0 x=0
>>>>> (XEN) [2015-01-28 18:57:07]        3 [0/0/0]: s=6 n=0 x=0
>>>>> (XEN) [2015-01-28 18:57:07]        4 [0/0/0]: s=5 n=0 x=0 v=1
>>>>> (XEN) [2015-01-28 18:57:07]        5 [0/0/0]: s=6 n=0 x=0
>>>>> (XEN) [2015-01-28 18:57:07]        6 [0/0/0]: s=6 n=0 x=0
>>>>> (XEN) [2015-01-28 18:57:07]        7 [0/0/0]: s=5 n=1 x=0 v=0
>>>>> (XEN) [2015-01-28 18:57:07]        8 [0/0/0]: s=6 n=1 x=0
>>>>> (XEN) [2015-01-28 18:57:07]        9 [0/0/0]: s=6 n=1 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       10 [0/0/0]: s=5 n=1 x=0 v=1
>>>>> (XEN) [2015-01-28 18:57:07]       11 [0/0/0]: s=6 n=1 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       12 [0/0/0]: s=6 n=1 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       13 [0/0/0]: s=5 n=2 x=0 v=0
>>>>> (XEN) [2015-01-28 18:57:07]       14 [0/0/0]: s=6 n=2 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       15 [0/0/0]: s=6 n=2 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       16 [0/0/0]: s=5 n=2 x=0 v=1
>>>>> (XEN) [2015-01-28 18:57:07]       17 [0/0/0]: s=6 n=2 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       18 [0/0/0]: s=6 n=2 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       19 [0/0/0]: s=5 n=3 x=0 v=0
>>>>> (XEN) [2015-01-28 18:57:07]       20 [0/0/0]: s=6 n=3 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       21 [0/0/0]: s=6 n=3 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       22 [0/0/0]: s=5 n=3 x=0 v=1
>>>>> (XEN) [2015-01-28 18:57:07]       23 [0/0/0]: s=6 n=3 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       24 [0/0/0]: s=6 n=3 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       25 [0/0/0]: s=5 n=4 x=0 v=0
>>>>> (XEN) [2015-01-28 18:57:07]       26 [0/0/0]: s=6 n=4 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       27 [0/0/0]: s=6 n=4 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       28 [0/0/0]: s=5 n=4 x=0 v=1
>>>>> (XEN) [2015-01-28 18:57:07]       29 [0/0/0]: s=6 n=4 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       30 [0/0/0]: s=6 n=4 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       31 [0/0/0]: s=5 n=5 x=0 v=0
>>>>> (XEN) [2015-01-28 18:57:07]       32 [0/0/0]: s=6 n=5 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       33 [0/0/0]: s=6 n=5 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       34 [0/0/0]: s=5 n=5 x=0 v=1
>>>>> (XEN) [2015-01-28 18:57:07]       35 [0/0/0]: s=6 n=5 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       36 [0/0/0]: s=6 n=5 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       37 [0/0/0]: s=5 n=6 x=0 v=0
>>>>> (XEN) [2015-01-28 18:57:07]       38 [0/0/0]: s=6 n=6 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       39 [0/0/0]: s=6 n=6 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       40 [0/0/0]: s=5 n=6 x=0 v=1
>>>>> (XEN) [2015-01-28 18:57:07]       41 [0/0/0]: s=6 n=6 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       42 [0/0/0]: s=6 n=6 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       43 [0/0/0]: s=5 n=7 x=0 v=0
>>>>> (XEN) [2015-01-28 18:57:07]       44 [0/0/0]: s=6 n=7 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       45 [0/0/0]: s=6 n=7 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       46 [0/0/0]: s=5 n=7 x=0 v=1
>>>>> (XEN) [2015-01-28 18:57:07]       47 [0/0/0]: s=6 n=7 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       48 [0/0/0]: s=6 n=7 x=0
>>>>> (XEN) [2015-01-28 18:57:07]       49 [0/0/0]: s=3 n=0 x=0 d=0 p=58
>>>>> (XEN) [2015-01-28 18:57:07]       50 [0/0/0]: s=5 n=0 x=0 v=9
>>>>> (XEN) [2015-01-28 18:57:07]       51 [0/0/0]: s=4 n=0 x=0 p=9 i=9
>>>>> (XEN) [2015-01-28 18:57:07]       52 [0/0/0]: s=5 n=0 x=0 v=2
>>>>> (XEN) [2015-01-28 18:57:07]       53 [0/0/0]: s=4 n=4 x=0 p=16 i=16
>>>>> (XEN) [2015-01-28 18:57:07]       54 [0/0/0]: s=4 n=0 x=0 p=17 i=17
>>>>> (XEN) [2015-01-28 18:57:07]       55 [0/0/0]: s=4 n=6 x=0 p=18 i=18
>>>>> (XEN) [2015-01-28 18:57:07]       56 [0/0/0]: s=4 n=0 x=0 p=8 i=8
>>>>> (XEN) [2015-01-28 18:57:07]       57 [0/0/0]: s=4 n=0 x=0 p=19 i=19
>>>>> (XEN) [2015-01-28 18:57:07]       58 [0/0/0]: s=3 n=0 x=0 d=0 p=49
>>>>> (XEN) [2015-01-28 18:57:07]       59 [0/0/0]: s=5 n=0 x=0 v=3
>>>>> (XEN) [2015-01-28 18:57:07]       60 [0/0/0]: s=5 n=0 x=0 v=4
>>>>> (XEN) [2015-01-28 18:57:07]       61 [0/0/0]: s=3 n=0 x=0 d=1 p=1
>>>>> (XEN) [2015-01-28 18:57:07]       62 [0/0/0]: s=3 n=0 x=0 d=1 p=2
>>>>> (XEN) [2015-01-28 18:57:07]       63 [0/0/0]: s=3 n=0 x=0 d=1 p=3
>>>>> (XEN) [2015-01-28 18:57:07]       64 [0/0/0]: s=3 n=0 x=0 d=1 p=5
>>>>> (XEN) [2015-01-28 18:57:07]       65 [0/0/0]: s=3 n=0 x=0 d=1 p=6
>>>>> (XEN) [2015-01-28 18:57:07]       66 [0/0/0]: s=3 n=0 x=0 d=1 p=7
>>>>> (XEN) [2015-01-28 18:57:07]       67 [0/0/0]: s=3 n=0 x=0 d=1 p=8
>>>>> (XEN) [2015-01-28 18:57:07]       68 [0/0/0]: s=3 n=0 x=0 d=1 p=9
>>>>> (XEN) [2015-01-28 18:57:07]       69 [0/0/0]: s=3 n=0 x=0 d=1 p=4
>>>>> (XEN) [2015-01-28 18:57:07] Event channel information for domain 1:
>>>>> (XEN) [2015-01-28 18:57:07] Polling vCPUs: {}
>>>>> (XEN) [2015-01-28 18:57:07]     port [p/m/s]
>>>>> (XEN) [2015-01-28 18:57:07]        1 [0/0/0]: s=3 n=0 x=0 d=0 p=61
>>>>> (XEN) [2015-01-28 18:57:07]        2 [0/0/0]: s=3 n=0 x=0 d=0 p=62
>>>>> (XEN) [2015-01-28 18:57:07]        3 [0/0/0]: s=3 n=0 x=1 d=0 p=63
>>>>> (XEN) [2015-01-28 18:57:07]        4 [0/0/0]: s=3 n=0 x=1 d=0 p=69
>>>>> (XEN) [2015-01-28 18:57:07]        5 [0/0/0]: s=3 n=1 x=1 d=0 p=64
>>>>> (XEN) [2015-01-28 18:57:07]        6 [0/0/0]: s=3 n=2 x=1 d=0 p=65
>>>>> (XEN) [2015-01-28 18:57:07]        7 [0/0/0]: s=3 n=3 x=1 d=0 p=66
>>>>> (XEN) [2015-01-28 18:57:07]        8 [0/0/0]: s=3 n=4 x=1 d=0 p=67
>>>>> (XEN) [2015-01-28 18:57:07]        9 [0/0/0]: s=3 n=5 x=1 d=0 p=68
>>>>> (XEN) [2015-01-28 18:57:07]       10 [0/0/0]: s=2 n=0 x=1 d=0
>>>>> (XEN) [2015-01-28 18:57:07]       11 [0/0/0]: s=2 n=0 x=1 d=0
>>>>> (XEN) [2015-01-28 18:57:07]       12 [0/0/0]: s=2 n=1 x=1 d=0
>>>>> (XEN) [2015-01-28 18:57:07]       13 [0/0/0]: s=2 n=2 x=1 d=0
>>>>> (XEN) [2015-01-28 18:57:07]       14 [0/0/0]: s=2 n=3 x=1 d=0
>>>>> (XEN) [2015-01-28 18:57:07]       15 [0/0/0]: s=2 n=4 x=1 d=0
>>>>> (XEN) [2015-01-28 18:57:07]       16 [0/0/0]: s=2 n=5 x=1 d=0
>>>>>
>>>>> You can see that domain 1 has only half of its event channels
>>>>> fully set up.  So when (if) hvm_send_assist_req_to_ioreq_server()
>>>>> does:
>>>>>
>>>>>             notify_via_xen_event_channel(d, port);
>>>>>
>>>>> Nothing happens and you hang in hvm_wait_for_io() forever.
>>>>>
>>>>>
>>>>> This does raise the questions:
>>>>>
>>>>> 1) Does this patch causes extra event channels to be created
>>>>>    that cannot be used?
>>>>>
>>>>> 2) Should the "default_ioreq_server" be deleted?
>>>>>
>>>>>
>>>>> Not sure the right way to go.
>>>>>
>>>>>     -Don Slutz
>>>>>
>>>>>
>>>>>>
>>>>>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>>>>>> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>>>>>> Cc: Peter Maydell <peter.maydell@linaro.org>
>>>>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>>>>> Cc: Michael Tokarev <mjt@tls.msk.ru>
>>>>>> Cc: Stefan Hajnoczi <stefanha@redhat.com>
>>>>>> Cc: Stefan Weil <sw@weilnetz.de>
>>>>>> Cc: Olaf Hering <olaf@aepfle.de>
>>>>>> Cc: Gerd Hoffmann <kraxel@redhat.com>
>>>>>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>> Cc: Alexander Graf <agraf@suse.de>
>>>>>> ---
>>>>>>  configure                   |   29 ++++++
>>>>>>  include/hw/xen/xen_common.h |  223
>>> +++++++++++++++++++++++++++++++++++++++++++
>>>>>>  trace-events                |    9 ++
>>>>>>  xen-hvm.c                   |  160 ++++++++++++++++++++++++++-----
>>>>>>  4 files changed, 399 insertions(+), 22 deletions(-)
>>>>>>
>>>>>> diff --git a/configure b/configure
>>>>>> index 47048f0..b1f8c2a 100755
>>>>>> --- a/configure
>>>>>> +++ b/configure
>>>>>> @@ -1877,6 +1877,32 @@ int main(void) {
>>>>>>    xc_gnttab_open(NULL, 0);
>>>>>>    xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
>>>>>>    xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
>>>>>> +  xc_hvm_create_ioreq_server(xc, 0, 0, NULL);
>>>>>> +  return 0;
>>>>>> +}
>>>>>> +EOF
>>>>>> +      compile_prog "" "$xen_libs"
>>>>>> +    then
>>>>>> +    xen_ctrl_version=450
>>>>>> +    xen=yes
>>>>>> +
>>>>>> +  elif
>>>>>> +      cat > $TMPC <<EOF &&
>>>>>> +#include <xenctrl.h>
>>>>>> +#include <xenstore.h>
>>>>>> +#include <stdint.h>
>>>>>> +#include <xen/hvm/hvm_info_table.h>
>>>>>> +#if !defined(HVM_MAX_VCPUS)
>>>>>> +# error HVM_MAX_VCPUS not defined
>>>>>> +#endif
>>>>>> +int main(void) {
>>>>>> +  xc_interface *xc;
>>>>>> +  xs_daemon_open();
>>>>>> +  xc = xc_interface_open(0, 0, 0);
>>>>>> +  xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0);
>>>>>> +  xc_gnttab_open(NULL, 0);
>>>>>> +  xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
>>>>>> +  xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
>>>>>>    return 0;
>>>>>>  }
>>>>>>  EOF
>>>>>> @@ -4283,6 +4309,9 @@ if test -n "$sparc_cpu"; then
>>>>>>      echo "Target Sparc Arch $sparc_cpu"
>>>>>>  fi
>>>>>>  echo "xen support       $xen"
>>>>>> +if test "$xen" = "yes" ; then
>>>>>> +  echo "xen ctrl version  $xen_ctrl_version"
>>>>>> +fi
>>>>>>  echo "brlapi support    $brlapi"
>>>>>>  echo "bluez  support    $bluez"
>>>>>>  echo "Documentation     $docs"
>>>>>> diff --git a/include/hw/xen/xen_common.h
>>> b/include/hw/xen/xen_common.h
>>>>>> index 95612a4..519696f 100644
>>>>>> --- a/include/hw/xen/xen_common.h
>>>>>> +++ b/include/hw/xen/xen_common.h
>>>>>> @@ -16,7 +16,9 @@
>>>>>>
>>>>>>  #include "hw/hw.h"
>>>>>>  #include "hw/xen/xen.h"
>>>>>> +#include "hw/pci/pci.h"
>>>>>>  #include "qemu/queue.h"
>>>>>> +#include "trace.h"
>>>>>>
>>>>>>  /*
>>>>>>   * We don't support Xen prior to 3.3.0.
>>>>>> @@ -179,4 +181,225 @@ static inline int
>>> xen_get_vmport_regs_pfn(XenXC xc, domid_t dom,
>>>>>>  }
>>>>>>  #endif
>>>>>>
>>>>>> +/* Xen before 4.5 */
>>>>>> +#if CONFIG_XEN_CTRL_INTERFACE_VERSION < 450
>>>>>> +
>>>>>> +#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
>>>>>> +#define HVM_PARAM_BUFIOREQ_EVTCHN 26
>>>>>> +#endif
>>>>>> +
>>>>>> +#define IOREQ_TYPE_PCI_CONFIG 2
>>>>>> +
>>>>>> +typedef uint32_t ioservid_t;
>>>>>> +
>>>>>> +static inline void xen_map_memory_section(XenXC xc, domid_t dom,
>>>>>> +                                          ioservid_t ioservid,
>>>>>> +                                          MemoryRegionSection *section)
>>>>>> +{
>>>>>> +}
>>>>>> +
>>>>>> +static inline void xen_unmap_memory_section(XenXC xc, domid_t
>>> dom,
>>>>>> +                                            ioservid_t ioservid,
>>>>>> +                                            MemoryRegionSection *section)
>>>>>> +{
>>>>>> +}
>>>>>> +
>>>>>> +static inline void xen_map_io_section(XenXC xc, domid_t dom,
>>>>>> +                                      ioservid_t ioservid,
>>>>>> +                                      MemoryRegionSection *section)
>>>>>> +{
>>>>>> +}
>>>>>> +
>>>>>> +static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
>>>>>> +                                        ioservid_t ioservid,
>>>>>> +                                        MemoryRegionSection *section)
>>>>>> +{
>>>>>> +}
>>>>>> +
>>>>>> +static inline void xen_map_pcidev(XenXC xc, domid_t dom,
>>>>>> +                                  ioservid_t ioservid,
>>>>>> +                                  PCIDevice *pci_dev)
>>>>>> +{
>>>>>> +}
>>>>>> +
>>>>>> +static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
>>>>>> +                                    ioservid_t ioservid,
>>>>>> +                                    PCIDevice *pci_dev)
>>>>>> +{
>>>>>> +}
>>>>>> +
>>>>>> +static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
>>>>>> +                                          ioservid_t *ioservid)
>>>>>> +{
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
>>>>>> +                                            ioservid_t ioservid)
>>>>>> +{
>>>>>> +}
>>>>>> +
>>>>>> +static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
>>>>>> +                                            ioservid_t ioservid,
>>>>>> +                                            xen_pfn_t *ioreq_pfn,
>>>>>> +                                            xen_pfn_t *bufioreq_pfn,
>>>>>> +                                            evtchn_port_t *bufioreq_evtchn)
>>>>>> +{
>>>>>> +    unsigned long param;
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_IOREQ_PFN,
>>> &param);
>>>>>> +    if (rc < 0) {
>>>>>> +        fprintf(stderr, "failed to get HVM_PARAM_IOREQ_PFN\n");
>>>>>> +        return -1;
>>>>>> +    }
>>>>>> +
>>>>>> +    *ioreq_pfn = param;
>>>>>> +
>>>>>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_BUFIOREQ_PFN,
>>> &param);
>>>>>> +    if (rc < 0) {
>>>>>> +        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_PFN\n");
>>>>>> +        return -1;
>>>>>> +    }
>>>>>> +
>>>>>> +    *bufioreq_pfn = param;
>>>>>> +
>>>>>> +    rc = xc_get_hvm_param(xc, dom,
>>> HVM_PARAM_BUFIOREQ_EVTCHN,
>>>>>> +                          &param);
>>>>>> +    if (rc < 0) {
>>>>>> +        fprintf(stderr, "failed to get
>>> HVM_PARAM_BUFIOREQ_EVTCHN\n");
>>>>>> +        return -1;
>>>>>> +    }
>>>>>> +
>>>>>> +    *bufioreq_evtchn = param;
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
>>>>>> +                                             ioservid_t ioservid,
>>>>>> +                                             bool enable)
>>>>>> +{
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +/* Xen 4.5 */
>>>>>> +#else
>>>>>> +
>>>>>> +static inline void xen_map_memory_section(XenXC xc, domid_t dom,
>>>>>> +                                          ioservid_t ioservid,
>>>>>> +                                          MemoryRegionSection *section)
>>>>>> +{
>>>>>> +    hwaddr start_addr = section->offset_within_address_space;
>>>>>> +    ram_addr_t size = int128_get64(section->size);
>>>>>> +    hwaddr end_addr = start_addr + size - 1;
>>>>>> +
>>>>>> +    trace_xen_map_mmio_range(ioservid, start_addr, end_addr);
>>>>>> +    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 1,
>>>>>> +                                        start_addr, end_addr);
>>>>>> +}
>>>>>> +
>>>>>> +static inline void xen_unmap_memory_section(XenXC xc, domid_t
>>> dom,
>>>>>> +                                            ioservid_t ioservid,
>>>>>> +                                            MemoryRegionSection *section)
>>>>>> +{
>>>>>> +    hwaddr start_addr = section->offset_within_address_space;
>>>>>> +    ram_addr_t size = int128_get64(section->size);
>>>>>> +    hwaddr end_addr = start_addr + size - 1;
>>>>>> +
>>>>>> +    trace_xen_unmap_mmio_range(ioservid, start_addr, end_addr);
>>>>>> +    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 1,
>>>>>> +                                            start_addr, end_addr);
>>>>>> +}
>>>>>> +
>>>>>> +static inline void xen_map_io_section(XenXC xc, domid_t dom,
>>>>>> +                                      ioservid_t ioservid,
>>>>>> +                                      MemoryRegionSection *section)
>>>>>> +{
>>>>>> +    hwaddr start_addr = section->offset_within_address_space;
>>>>>> +    ram_addr_t size = int128_get64(section->size);
>>>>>> +    hwaddr end_addr = start_addr + size - 1;
>>>>>> +
>>>>>> +    trace_xen_map_portio_range(ioservid, start_addr, end_addr);
>>>>>> +    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 0,
>>>>>> +                                        start_addr, end_addr);
>>>>>> +}
>>>>>> +
>>>>>> +static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
>>>>>> +                                        ioservid_t ioservid,
>>>>>> +                                        MemoryRegionSection *section)
>>>>>> +{
>>>>>> +    hwaddr start_addr = section->offset_within_address_space;
>>>>>> +    ram_addr_t size = int128_get64(section->size);
>>>>>> +    hwaddr end_addr = start_addr + size - 1;
>>>>>> +
>>>>>> +    trace_xen_unmap_portio_range(ioservid, start_addr, end_addr);
>>>>>> +    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 0,
>>>>>> +                                            start_addr, end_addr);
>>>>>> +}
>>>>>> +
>>>>>> +static inline void xen_map_pcidev(XenXC xc, domid_t dom,
>>>>>> +                                  ioservid_t ioservid,
>>>>>> +                                  PCIDevice *pci_dev)
>>>>>> +{
>>>>>> +    trace_xen_map_pcidev(ioservid, pci_bus_num(pci_dev->bus),
>>>>>> +                         PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
>>>>>> +    xc_hvm_map_pcidev_to_ioreq_server(xc, dom, ioservid,
>>>>>> +                                      0, pci_bus_num(pci_dev->bus),
>>>>>> +                                      PCI_SLOT(pci_dev->devfn),
>>>>>> +                                      PCI_FUNC(pci_dev->devfn));
>>>>>> +}
>>>>>> +
>>>>>> +static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
>>>>>> +                                    ioservid_t ioservid,
>>>>>> +                                    PCIDevice *pci_dev)
>>>>>> +{
>>>>>> +    trace_xen_unmap_pcidev(ioservid, pci_bus_num(pci_dev->bus),
>>>>>> +                           PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
>>>>>> +    xc_hvm_unmap_pcidev_from_ioreq_server(xc, dom, ioservid,
>>>>>> +                                          0, pci_bus_num(pci_dev->bus),
>>>>>> +                                          PCI_SLOT(pci_dev->devfn),
>>>>>> +                                          PCI_FUNC(pci_dev->devfn));
>>>>>> +}
>>>>>> +
>>>>>> +static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
>>>>>> +                                          ioservid_t *ioservid)
>>>>>> +{
>>>>>> +    int rc = xc_hvm_create_ioreq_server(xc, dom, 1, ioservid);
>>>>>> +
>>>>>> +    if (rc == 0) {
>>>>>> +        trace_xen_ioreq_server_create(*ioservid);
>>>>>> +    }
>>>>>> +
>>>>>> +    return rc;
>>>>>> +}
>>>>>> +
>>>>>> +static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
>>>>>> +                                            ioservid_t ioservid)
>>>>>> +{
>>>>>> +    trace_xen_ioreq_server_destroy(ioservid);
>>>>>> +    xc_hvm_destroy_ioreq_server(xc, dom, ioservid);
>>>>>> +}
>>>>>> +
>>>>>> +static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
>>>>>> +                                            ioservid_t ioservid,
>>>>>> +                                            xen_pfn_t *ioreq_pfn,
>>>>>> +                                            xen_pfn_t *bufioreq_pfn,
>>>>>> +                                            evtchn_port_t *bufioreq_evtchn)
>>>>>> +{
>>>>>> +    return xc_hvm_get_ioreq_server_info(xc, dom, ioservid,
>>>>>> +                                        ioreq_pfn, bufioreq_pfn,
>>>>>> +                                        bufioreq_evtchn);
>>>>>> +}
>>>>>> +
>>>>>> +static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
>>>>>> +                                             ioservid_t ioservid,
>>>>>> +                                             bool enable)
>>>>>> +{
>>>>>> +    trace_xen_ioreq_server_state(ioservid, enable);
>>>>>> +    return xc_hvm_set_ioreq_server_state(xc, dom, ioservid, enable);
>>>>>> +}
>>>>>> +
>>>>>> +#endif
>>>>>> +
>>>>>>  #endif /* QEMU_HW_XEN_COMMON_H */
>>>>>> diff --git a/trace-events b/trace-events
>>>>>> index b5722ea..abd1118 100644
>>>>>> --- a/trace-events
>>>>>> +++ b/trace-events
>>>>>> @@ -897,6 +897,15 @@ pvscsi_tx_rings_num_pages(const char* label,
>>> uint32_t num) "Number of %s pages:
>>>>>>  # xen-hvm.c
>>>>>>  xen_ram_alloc(unsigned long ram_addr, unsigned long size)
>>> "requested: %#lx, size %#lx"
>>>>>>  xen_client_set_memory(uint64_t start_addr, unsigned long size, bool
>>> log_dirty) "%#"PRIx64" size %#lx, log_dirty %i"
>>>>>> +xen_ioreq_server_create(uint32_t id) "id: %u"
>>>>>> +xen_ioreq_server_destroy(uint32_t id) "id: %u"
>>>>>> +xen_ioreq_server_state(uint32_t id, bool enable) "id: %u: enable: %i"
>>>>>> +xen_map_mmio_range(uint32_t id, uint64_t start_addr, uint64_t
>>> end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>>>>>> +xen_unmap_mmio_range(uint32_t id, uint64_t start_addr, uint64_t
>>> end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>>>>>> +xen_map_portio_range(uint32_t id, uint64_t start_addr, uint64_t
>>> end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>>>>>> +xen_unmap_portio_range(uint32_t id, uint64_t start_addr, uint64_t
>>> end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>>>>>> +xen_map_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func)
>>> "id: %u bdf: %02x.%02x.%02x"
>>>>>> +xen_unmap_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func)
>>> "id: %u bdf: %02x.%02x.%02x"
>>>>>>
>>>>>>  # xen-mapcache.c
>>>>>>  xen_map_cache(uint64_t phys_addr) "want %#"PRIx64
>>>>>> diff --git a/xen-hvm.c b/xen-hvm.c
>>>>>> index 7548794..31cb3ca 100644
>>>>>> --- a/xen-hvm.c
>>>>>> +++ b/xen-hvm.c
>>>>>> @@ -85,9 +85,6 @@ static inline ioreq_t
>>> *xen_vcpu_ioreq(shared_iopage_t *shared_page, int vcpu)
>>>>>>  }
>>>>>>  #  define FMT_ioreq_size "u"
>>>>>>  #endif
>>>>>> -#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
>>>>>> -#define HVM_PARAM_BUFIOREQ_EVTCHN 26
>>>>>> -#endif
>>>>>>
>>>>>>  #define BUFFER_IO_MAX_DELAY  100
>>>>>>
>>>>>> @@ -101,6 +98,7 @@ typedef struct XenPhysmap {
>>>>>>  } XenPhysmap;
>>>>>>
>>>>>>  typedef struct XenIOState {
>>>>>> +    ioservid_t ioservid;
>>>>>>      shared_iopage_t *shared_page;
>>>>>>      shared_vmport_iopage_t *shared_vmport_page;
>>>>>>      buffered_iopage_t *buffered_io_page;
>>>>>> @@ -117,6 +115,8 @@ typedef struct XenIOState {
>>>>>>
>>>>>>      struct xs_handle *xenstore;
>>>>>>      MemoryListener memory_listener;
>>>>>> +    MemoryListener io_listener;
>>>>>> +    DeviceListener device_listener;
>>>>>>      QLIST_HEAD(, XenPhysmap) physmap;
>>>>>>      hwaddr free_phys_offset;
>>>>>>      const XenPhysmap *log_for_dirtybit;
>>>>>> @@ -467,12 +467,23 @@ static void xen_set_memory(struct
>>> MemoryListener *listener,
>>>>>>      bool log_dirty = memory_region_is_logging(section->mr);
>>>>>>      hvmmem_type_t mem_type;
>>>>>>
>>>>>> +    if (section->mr == &ram_memory) {
>>>>>> +        return;
>>>>>> +    } else {
>>>>>> +        if (add) {
>>>>>> +            xen_map_memory_section(xen_xc, xen_domid, state->ioservid,
>>>>>> +                                   section);
>>>>>> +        } else {
>>>>>> +            xen_unmap_memory_section(xen_xc, xen_domid, state-
>>>> ioservid,
>>>>>> +                                     section);
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>>      if (!memory_region_is_ram(section->mr)) {
>>>>>>          return;
>>>>>>      }
>>>>>>
>>>>>> -    if (!(section->mr != &ram_memory
>>>>>> -          && ( (log_dirty && add) || (!log_dirty && !add)))) {
>>>>>> +    if (log_dirty != add) {
>>>>>>          return;
>>>>>>      }
>>>>>>
>>>>>> @@ -515,6 +526,50 @@ static void xen_region_del(MemoryListener
>>> *listener,
>>>>>>      memory_region_unref(section->mr);
>>>>>>  }
>>>>>>
>>>>>> +static void xen_io_add(MemoryListener *listener,
>>>>>> +                       MemoryRegionSection *section)
>>>>>> +{
>>>>>> +    XenIOState *state = container_of(listener, XenIOState, io_listener);
>>>>>> +
>>>>>> +    memory_region_ref(section->mr);
>>>>>> +
>>>>>> +    xen_map_io_section(xen_xc, xen_domid, state->ioservid, section);
>>>>>> +}
>>>>>> +
>>>>>> +static void xen_io_del(MemoryListener *listener,
>>>>>> +                       MemoryRegionSection *section)
>>>>>> +{
>>>>>> +    XenIOState *state = container_of(listener, XenIOState, io_listener);
>>>>>> +
>>>>>> +    xen_unmap_io_section(xen_xc, xen_domid, state->ioservid,
>>> section);
>>>>>> +
>>>>>> +    memory_region_unref(section->mr);
>>>>>> +}
>>>>>> +
>>>>>> +static void xen_device_realize(DeviceListener *listener,
>>>>>> +			       DeviceState *dev)
>>>>>> +{
>>>>>> +    XenIOState *state = container_of(listener, XenIOState,
>>> device_listener);
>>>>>> +
>>>>>> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
>>>>>> +        PCIDevice *pci_dev = PCI_DEVICE(dev);
>>>>>> +
>>>>>> +        xen_map_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static void xen_device_unrealize(DeviceListener *listener,
>>>>>> +				 DeviceState *dev)
>>>>>> +{
>>>>>> +    XenIOState *state = container_of(listener, XenIOState,
>>> device_listener);
>>>>>> +
>>>>>> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
>>>>>> +        PCIDevice *pci_dev = PCI_DEVICE(dev);
>>>>>> +
>>>>>> +        xen_unmap_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>>  static void xen_sync_dirty_bitmap(XenIOState *state,
>>>>>>                                    hwaddr start_addr,
>>>>>>                                    ram_addr_t size)
>>>>>> @@ -615,6 +670,17 @@ static MemoryListener xen_memory_listener =
>>> {
>>>>>>      .priority = 10,
>>>>>>  };
>>>>>>
>>>>>> +static MemoryListener xen_io_listener = {
>>>>>> +    .region_add = xen_io_add,
>>>>>> +    .region_del = xen_io_del,
>>>>>> +    .priority = 10,
>>>>>> +};
>>>>>> +
>>>>>> +static DeviceListener xen_device_listener = {
>>>>>> +    .realize = xen_device_realize,
>>>>>> +    .unrealize = xen_device_unrealize,
>>>>>> +};
>>>>>> +
>>>>>>  /* get the ioreq packets from share mem */
>>>>>>  static ioreq_t *cpu_get_ioreq_from_shared_memory(XenIOState
>>> *state, int vcpu)
>>>>>>  {
>>>>>> @@ -863,6 +929,27 @@ static void handle_ioreq(XenIOState *state,
>>> ioreq_t *req)
>>>>>>          case IOREQ_TYPE_INVALIDATE:
>>>>>>              xen_invalidate_map_cache();
>>>>>>              break;
>>>>>> +        case IOREQ_TYPE_PCI_CONFIG: {
>>>>>> +            uint32_t sbdf = req->addr >> 32;
>>>>>> +            uint32_t val;
>>>>>> +
>>>>>> +            /* Fake a write to port 0xCF8 so that
>>>>>> +             * the config space access will target the
>>>>>> +             * correct device model.
>>>>>> +             */
>>>>>> +            val = (1u << 31) |
>>>>>> +                  ((req->addr & 0x0f00) << 16) |
>>>>>> +                  ((sbdf & 0xffff) << 8) |
>>>>>> +                  (req->addr & 0xfc);
>>>>>> +            do_outp(0xcf8, 4, val);
>>>>>> +
>>>>>> +            /* Now issue the config space access via
>>>>>> +             * port 0xCFC
>>>>>> +             */
>>>>>> +            req->addr = 0xcfc | (req->addr & 0x03);
>>>>>> +            cpu_ioreq_pio(req);
>>>>>> +            break;
>>>>>> +        }
>>>>>>          default:
>>>>>>              hw_error("Invalid ioreq type 0x%x\n", req->type);
>>>>>>      }
>>>>>> @@ -993,9 +1080,15 @@ static void
>>> xen_main_loop_prepare(XenIOState *state)
>>>>>>  static void xen_hvm_change_state_handler(void *opaque, int running,
>>>>>>                                           RunState rstate)
>>>>>>  {
>>>>>> +    XenIOState *state = opaque;
>>>>>> +
>>>>>>      if (running) {
>>>>>> -        xen_main_loop_prepare((XenIOState *)opaque);
>>>>>> +        xen_main_loop_prepare(state);
>>>>>>      }
>>>>>> +
>>>>>> +    xen_set_ioreq_server_state(xen_xc, xen_domid,
>>>>>> +                               state->ioservid,
>>>>>> +                               (rstate == RUN_STATE_RUNNING));
>>>>>>  }
>>>>>>
>>>>>>  static void xen_exit_notifier(Notifier *n, void *data)
>>>>>> @@ -1064,8 +1157,9 @@ int xen_hvm_init(ram_addr_t
>>> *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>>>>                   MemoryRegion **ram_memory)
>>>>>>  {
>>>>>>      int i, rc;
>>>>>> -    unsigned long ioreq_pfn;
>>>>>> -    unsigned long bufioreq_evtchn;
>>>>>> +    xen_pfn_t ioreq_pfn;
>>>>>> +    xen_pfn_t bufioreq_pfn;
>>>>>> +    evtchn_port_t bufioreq_evtchn;
>>>>>>      XenIOState *state;
>>>>>>
>>>>>>      state = g_malloc0(sizeof (XenIOState));
>>>>>> @@ -1082,6 +1176,12 @@ int xen_hvm_init(ram_addr_t
>>> *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>>>>          return -1;
>>>>>>      }
>>>>>>
>>>>>> +    rc = xen_create_ioreq_server(xen_xc, xen_domid, &state-
>>>> ioservid);
>>>>>> +    if (rc < 0) {
>>>>>> +        perror("xen: ioreq server create");
>>>>>> +        return -1;
>>>>>> +    }
>>>>>> +
>>>>>>      state->exit.notify = xen_exit_notifier;
>>>>>>      qemu_add_exit_notifier(&state->exit);
>>>>>>
>>>>>> @@ -1091,8 +1191,18 @@ int xen_hvm_init(ram_addr_t
>>> *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>>>>      state->wakeup.notify = xen_wakeup_notifier;
>>>>>>      qemu_register_wakeup_notifier(&state->wakeup);
>>>>>>
>>>>>> -    xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_IOREQ_PFN,
>>> &ioreq_pfn);
>>>>>> +    rc = xen_get_ioreq_server_info(xen_xc, xen_domid, state-
>>>> ioservid,
>>>>>> +                                   &ioreq_pfn, &bufioreq_pfn,
>>>>>> +                                   &bufioreq_evtchn);
>>>>>> +    if (rc < 0) {
>>>>>> +        hw_error("failed to get ioreq server info: error %d handle="
>>> XC_INTERFACE_FMT,
>>>>>> +                 errno, xen_xc);
>>>>>> +    }
>>>>>> +
>>>>>>      DPRINTF("shared page at pfn %lx\n", ioreq_pfn);
>>>>>> +    DPRINTF("buffered io page at pfn %lx\n", bufioreq_pfn);
>>>>>> +    DPRINTF("buffered io evtchn is %x\n", bufioreq_evtchn);
>>>>>> +
>>>>>>      state->shared_page = xc_map_foreign_range(xen_xc, xen_domid,
>>> XC_PAGE_SIZE,
>>>>>>                                                PROT_READ|PROT_WRITE, ioreq_pfn);
>>>>>>      if (state->shared_page == NULL) {
>>>>>> @@ -1114,10 +1224,10 @@ int xen_hvm_init(ram_addr_t
>>> *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>>>>          hw_error("get vmport regs pfn returned error %d, rc=%d", errno,
>>> rc);
>>>>>>      }
>>>>>>
>>>>>> -    xc_get_hvm_param(xen_xc, xen_domid,
>>> HVM_PARAM_BUFIOREQ_PFN, &ioreq_pfn);
>>>>>> -    DPRINTF("buffered io page at pfn %lx\n", ioreq_pfn);
>>>>>> -    state->buffered_io_page = xc_map_foreign_range(xen_xc,
>>> xen_domid, XC_PAGE_SIZE,
>>>>>> -                                                   PROT_READ|PROT_WRITE, ioreq_pfn);
>>>>>> +    state->buffered_io_page = xc_map_foreign_range(xen_xc,
>>> xen_domid,
>>>>>> +                                                   XC_PAGE_SIZE,
>>>>>> +                                                   PROT_READ|PROT_WRITE,
>>>>>> +                                                   bufioreq_pfn);
>>>>>>      if (state->buffered_io_page == NULL) {
>>>>>>          hw_error("map buffered IO page returned error %d", errno);
>>>>>>      }
>>>>>> @@ -1125,6 +1235,12 @@ int xen_hvm_init(ram_addr_t
>>> *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>>>>      /* Note: cpus is empty at this point in init */
>>>>>>      state->cpu_by_vcpu_id = g_malloc0(max_cpus * sizeof(CPUState *));
>>>>>>
>>>>>> +    rc = xen_set_ioreq_server_state(xen_xc, xen_domid, state-
>>>> ioservid, true);
>>>>>> +    if (rc < 0) {
>>>>>> +        hw_error("failed to enable ioreq server info: error %d handle="
>>> XC_INTERFACE_FMT,
>>>>>> +                 errno, xen_xc);
>>>>>> +    }
>>>>>> +
>>>>>>      state->ioreq_local_port = g_malloc0(max_cpus * sizeof
>>> (evtchn_port_t));
>>>>>>
>>>>>>      /* FIXME: how about if we overflow the page here? */
>>>>>> @@ -1132,22 +1248,16 @@ int xen_hvm_init(ram_addr_t
>>> *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>>>>          rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
>>>>>>                                          xen_vcpu_eport(state->shared_page, i));
>>>>>>          if (rc == -1) {
>>>>>> -            fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
>>>>>> +            fprintf(stderr, "shared evtchn %d bind error %d\n", i, errno);
>>>>>>              return -1;
>>>>>>          }
>>>>>>          state->ioreq_local_port[i] = rc;
>>>>>>      }
>>>>>>
>>>>>> -    rc = xc_get_hvm_param(xen_xc, xen_domid,
>>> HVM_PARAM_BUFIOREQ_EVTCHN,
>>>>>> -            &bufioreq_evtchn);
>>>>>> -    if (rc < 0) {
>>>>>> -        fprintf(stderr, "failed to get
>>> HVM_PARAM_BUFIOREQ_EVTCHN\n");
>>>>>> -        return -1;
>>>>>> -    }
>>>>>>      rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
>>>>>> -            (uint32_t)bufioreq_evtchn);
>>>>>> +                                    bufioreq_evtchn);
>>>>>>      if (rc == -1) {
>>>>>> -        fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
>>>>>> +        fprintf(stderr, "buffered evtchn bind error %d\n", errno);
>>>>>>          return -1;
>>>>>>      }
>>>>>>      state->bufioreq_local_port = rc;
>>>>>> @@ -1163,6 +1273,12 @@ int xen_hvm_init(ram_addr_t
>>> *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>>>>      memory_listener_register(&state->memory_listener,
>>> &address_space_memory);
>>>>>>      state->log_for_dirtybit = NULL;
>>>>>>
>>>>>> +    state->io_listener = xen_io_listener;
>>>>>> +    memory_listener_register(&state->io_listener, &address_space_io);
>>>>>> +
>>>>>> +    state->device_listener = xen_device_listener;
>>>>>> +    device_listener_register(&state->device_listener);
>>>>>> +
>>>>>>      /* Initialize backend core & drivers */
>>>>>>      if (xen_be_init() != 0) {
>>>>>>          fprintf(stderr, "%s: xen backend core setup failed\n",
>>> __FUNCTION__);
>>>>>>
>>>>>

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] New IOREQ type -- IOREQ_TYPE_VMWARE_PORT
  2015-01-29 19:41             ` [Qemu-devel] New IOREQ type -- IOREQ_TYPE_VMWARE_PORT Don Slutz
  2015-01-30 10:23               ` Paul Durrant
@ 2015-01-30 10:23               ` Paul Durrant
  2015-01-30 18:26                 ` Don Slutz
  2015-01-30 18:26                 ` [Qemu-devel] " Don Slutz
  1 sibling, 2 replies; 20+ messages in thread
From: Paul Durrant @ 2015-01-30 10:23 UTC (permalink / raw)
  To: Don Slutz, qemu-devel, Stefano Stabellini
  Cc: Peter Maydell, Olaf Hering, Alexey Kardashevskiy, Stefan Weil,
	Michael Tokarev, Alexander Graf, xen-devel, Gerd Hoffmann,
	Stefan Hajnoczi, Paolo Bonzini

> -----Original Message-----
> From: Don Slutz [mailto:dslutz@verizon.com]
> Sent: 29 January 2015 19:41
> To: Paul Durrant; Don Slutz; qemu-devel@nongnu.org; Stefano Stabellini
> Cc: Peter Maydell; Olaf Hering; Alexey Kardashevskiy; Stefan Weil; Michael
> Tokarev; Alexander Graf; Gerd Hoffmann; Stefan Hajnoczi; Paolo Bonzini;
> xen-devel
> Subject: New IOREQ type -- IOREQ_TYPE_VMWARE_PORT
> 
> >> On 01/29/15 07:09, Paul Durrant wrote:
> ...
> >> Given that IIRC you are using a new dedicated IOREQ type, I
> >> think there needs to be something that allows an emulator to
> >> register for this IOREQ type. How about adding a new type to
> >> those defined for HVMOP_map_io_range_to_ioreq_server for your
> >> case? (In your case the start and end values in the hypercall
> >> would be meaningless, but it could be used to steer
> >> hvm_select_ioreq_server() into sending all emulation requests of
> >> your new type to QEMU.)
> >>
> 
> This is an interesting idea.  Will need to spend more time on it.
> 
> 
> >> Actually such a mechanism could be used
> >> to steer IOREQ_TYPE_TIMEOFFSET requests as, with the new QEMU
> >> patches, they are going nowhere. Upstream QEMU (by default) used
> >> to ignore them anyway, which is why I didn't bother with such a
> >> patch to Xen before, but since you now need one maybe you could
> >> add that too?
> >>
> 
> I think it would not be that hard.  Will look into adding it.
> 
> 
> Currently I do not see how hvm_do_resume() works with 2 ioreq servers.
> It looks to me like if a vcpu (e.g. vcpu 0) needs to wait for the
> 2nd ioreq server, hvm_do_resume() will check the 1st ioreq server
> and return as if the ioreq is done.  What am I missing?
> 

hvm_do_resume() walks the ioreq server list, looking at the IOREQ state in the shared page of each server in turn. If no IOREQ was sent to a given server then the state will be IOREQ_NONE and hvm_wait_for_io() will return 1 immediately, so the outer loop in hvm_do_resume() will move on to the next server. If a state of IOREQ_READY or IOREQ_INPROCESS is found then the vcpu blocks on the relevant event channel until the state transitions to IORESP_READY. The IOREQ is then completed and the loop moves on to the next server.

Normally an IOREQ is directed at only one server, and IOREQs issued for emulation requests (i.e. when io_state != HVMIO_none) fall into this category, but there is one case of a broadcast IOREQ: the INVALIDATE IOREQ, sent to tell emulators to invalidate any mappings of guest memory they may have cached. That is why hvm_do_resume() has to iterate over all servers.

Does that make sense?

  Paul

>    -Don Slutz

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] New IOREQ type -- IOREQ_TYPE_VMWARE_PORT
  2015-01-30 10:23               ` [Qemu-devel] " Paul Durrant
  2015-01-30 18:26                 ` Don Slutz
@ 2015-01-30 18:26                 ` Don Slutz
  1 sibling, 0 replies; 20+ messages in thread
From: Don Slutz @ 2015-01-30 18:26 UTC (permalink / raw)
  To: Paul Durrant, Don Slutz, qemu-devel, Stefano Stabellini
  Cc: Peter Maydell, Olaf Hering, Alexey Kardashevskiy, Stefan Weil,
	Michael Tokarev, Alexander Graf, xen-devel, Gerd Hoffmann,
	Stefan Hajnoczi, Paolo Bonzini

On 01/30/15 05:23, Paul Durrant wrote:
>> -----Original Message-----
>> From: Don Slutz [mailto:dslutz@verizon.com]
>> Sent: 29 January 2015 19:41
>> To: Paul Durrant; Don Slutz; qemu-devel@nongnu.org; Stefano Stabellini
>> Cc: Peter Maydell; Olaf Hering; Alexey Kardashevskiy; Stefan Weil; Michael
>> Tokarev; Alexander Graf; Gerd Hoffmann; Stefan Hajnoczi; Paolo Bonzini;
>> xen-devel
>> Subject: New IOREQ type -- IOREQ_TYPE_VMWARE_PORT
>>
>>>> On 01/29/15 07:09, Paul Durrant wrote:
>> ...
>>>> Given that IIRC you are using a new dedicated IOREQ type, I
>>>> think there needs to be something that allows an emulator to
>>>> register for this IOREQ type. How about adding a new type to
>>>> those defined for HVMOP_map_io_range_to_ioreq_server for your
>>>> case? (In your case the start and end values in the hypercall
>>>> would be meaningless, but it could be used to steer
>>>> hvm_select_ioreq_server() into sending all emulation requests of
>>>> your new type to QEMU.)
>>>>
>>
>> This is an interesting idea.  Will need to spend more time on it.
>>

This does look very doable.  The only issue I see is that it requires
a QEMU change in order to work.  This would prevent Xen 4.6 from using
QEMU 2.2.0 and vmport (vmware-tools, vmware-mouse).

What makes sense to me is to "invert it", i.e. the default is to send
IOREQ_TYPE_VMWARE_PORT via io_range, and an ioreq server can say stop
sending them.

The reason this is safe so far is that IOREQ_TYPE_VMWARE_PORT can only
be sent if vmport is configured on. In that case QEMU will be started
with vmport=on, which will cause all QEMUs that do not support
IOREQ_TYPE_VMWARE_PORT to crash.

>>
>>>> Actually such a mechanism could be used
>>>> to steer IOREQ_TYPE_TIMEOFFSET requests as, with the new QEMU
>>>> patches, they are going nowhere. Upstream QEMU (by default) used
>>>> to ignore them anyway, which is why I didn't bother with such a
>>>> patch to Xen before, but since you now need one maybe you could
>>>> add that too?
>>>>
>>
>> I think it would not be that hard.  Will look into adding it.
>>
>>
>> Currently I do not see how hvm_do_resume() works with 2 ioreq servers.
>> It looks like to me that if a vpcu (like 0) needs to wait for the
>> 2nd ioreq server, hvm_do_resume() will check the 1st ioreq server
>> and return as if the ioreq is done.  What am I missing?
>>
> 
> hvm_do_resume() walks the ioreq server list looking at the IOREQ state in the shared page of each server in turn. If no IOREQ was sent to that server then then state will be IOREQ_NONE and hvm_wait_for_io() will return 1 immediately so the outer loop in hvm_do_resume() will move on to the next server. If a state of IOREQ_READY or IOREQ_INPROCESS is found then the vcpu blocks on the relevant event channel until the state transitions to IORESP_READY. The IOREQ is then completed and the loop moves on to the next server.
> Normally an IOREQ would only be directed at one server and indeed IOREQs that are issued for emulation requests (i.e. when io_state != HVMIO_none) fall into this category but there is one case of a broadcast IOREQ, which is the INVALIDATE IOREQ (sent to tell emulators to invalidate any mappings of guest memory they may have cached) and that is why hvm_do_resume() has to iterate over all servers.
> 
> Does that make sense?
> 

Thanks for the clear info.  It does make sense.

   -Don Slutz

>   Paul
> 
>>    -Don Slutz
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: New IOREQ type -- IOREQ_TYPE_VMWARE_PORT
  2015-01-30 10:23               ` [Qemu-devel] " Paul Durrant
@ 2015-01-30 18:26                 ` Don Slutz
  2015-01-30 18:26                 ` [Qemu-devel] " Don Slutz
  1 sibling, 0 replies; 20+ messages in thread
From: Don Slutz @ 2015-01-30 18:26 UTC (permalink / raw)
  To: Paul Durrant, Don Slutz, qemu-devel, Stefano Stabellini
  Cc: Peter Maydell, Olaf Hering, Alexey Kardashevskiy, Stefan Weil,
	Michael Tokarev, Alexander Graf, xen-devel, Gerd Hoffmann,
	Stefan Hajnoczi, Paolo Bonzini

On 01/30/15 05:23, Paul Durrant wrote:
>> -----Original Message-----
>> From: Don Slutz [mailto:dslutz@verizon.com]
>> Sent: 29 January 2015 19:41
>> To: Paul Durrant; Don Slutz; qemu-devel@nongnu.org; Stefano Stabellini
>> Cc: Peter Maydell; Olaf Hering; Alexey Kardashevskiy; Stefan Weil; Michael
>> Tokarev; Alexander Graf; Gerd Hoffmann; Stefan Hajnoczi; Paolo Bonzini;
>> xen-devel
>> Subject: New IOREQ type -- IOREQ_TYPE_VMWARE_PORT
>>
>>>> On 01/29/15 07:09, Paul Durrant wrote:
>> ...
>>>> Given that IIRC you are using a new dedicated IOREQ type, I
>>>> think there needs to be something that allows an emulator to
>>>> register for this IOREQ type. How about adding a new type to
>>>> those defined for HVMOP_map_io_range_to_ioreq_server for your
>>>> case? (In your case the start and end values in the hypercall
>>>> would be meaningless but it could be used to steer
>>>> hvm_select_ioreq_server() into sending all emulation requests or
>>>> your new type to QEMU.)
>>>>
>>
>> This is an interesting idea.  Will need to spend more time on it.
>>

This does look very doable.  The only issue I see is that it requires
a QEMU change in order to work.  This would prevent Xen 4.6 from using
QEMU 2.2.0 and vmport (vmware-tools, vmware-mouse).

What makes sense to me is to "invert it", i.e. the default is to send
IOREQ_TYPE_VMWARE_PORT via io_range, and an ioreq server can ask to
stop being sent them.

The reason this is safe so far is that IOREQ_TYPE_VMWARE_PORT can only
be sent if vmport is configured on.  In that case QEMU will be started
with vmport=on, which will cause any QEMU that does not support
IOREQ_TYPE_VMWARE_PORT to crash.
>>
>>>> Actually such a mechanism could be used
>>>> to steer IOREQ_TYPE_TIMEOFFSET requests as, with the new QEMU
>>>> patches, they are going nowhere. Upstream QEMU (as default) used
>>>> to ignore them anyway, which is why I didn't bother with such a
>>>> patch to Xen before, but since you now need one maybe you could
>>>> add that too?
>>>>
>>
>> I think it would not be that hard.  Will look into adding it.
>>
>>
>> Currently I do not see how hvm_do_resume() works with 2 ioreq servers.
>> It looks to me like if a vcpu (like 0) needs to wait for the
>> 2nd ioreq server, hvm_do_resume() will check the 1st ioreq server
>> and return as if the ioreq is done.  What am I missing?
>>
> 
> hvm_do_resume() walks the ioreq server list, looking at the IOREQ state in the shared page of each server in turn. If no IOREQ was sent to that server then the state will be IOREQ_NONE and hvm_wait_for_io() will return 1 immediately, so the outer loop in hvm_do_resume() will move on to the next server. If a state of IOREQ_READY or IOREQ_INPROCESS is found then the vcpu blocks on the relevant event channel until the state transitions to IORESP_READY. The IOREQ is then completed and the loop moves on to the next server.
> Normally an IOREQ would only be directed at one server, and indeed IOREQs that are issued for emulation requests (i.e. when io_state != HVMIO_none) fall into this category, but there is one case of a broadcast IOREQ: the INVALIDATE IOREQ (sent to tell emulators to invalidate any mappings of guest memory they may have cached), and that is why hvm_do_resume() has to iterate over all servers.
> 
> Does that make sense?
> 

Thanks for the clear info.  It does make sense.

   -Don Slutz

>   Paul
> 
>>    -Don Slutz
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2015-01-30 18:26 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-05 10:50 [Qemu-devel] [PATCH v5 0/2] Use ioreq-server API Paul Durrant
2014-12-05 10:50 ` [Qemu-devel] [PATCH v5 1/2] Add device listener interface Paul Durrant
2014-12-05 11:44   ` Paolo Bonzini
2014-12-08 11:12     ` Paul Durrant
2014-12-09 17:40       ` Paolo Bonzini
2014-12-09 17:54         ` Andreas Färber
2014-12-05 10:50 ` [Qemu-devel] [PATCH v5 2/2] Xen: Use the ioreq-server API when available Paul Durrant
2015-01-28 19:32   ` Don Slutz
2015-01-29  0:05     ` Don Slutz
2015-01-29  0:57       ` Don Slutz
2015-01-29 12:09         ` Paul Durrant
2015-01-29 19:14           ` Don Slutz
2015-01-29 19:41             ` [Qemu-devel] New IOREQ type -- IOREQ_TYPE_VMWARE_PORT Don Slutz
2015-01-30 10:23               ` Paul Durrant
2015-01-30 10:23               ` [Qemu-devel] " Paul Durrant
2015-01-30 18:26                 ` Don Slutz
2015-01-30 18:26                 ` [Qemu-devel] " Don Slutz
2015-01-29 19:41             ` Don Slutz
2015-01-30  2:40             ` [Qemu-devel] [PATCH v5 2/2] Xen: Use the ioreq-server API when available Don Slutz
2015-01-30  2:40             ` Don Slutz
