* [Qemu-devel] [PATCH V13 0/9] Xen PCI Passthrough
From: Anthony PERARD @ 2012-06-14 17:01 UTC
  To: QEMU-devel
  Cc: Anthony Liguori, Stefano Stabellini, Jan Kiszka,
	Michael S. Tsirkin, Xen Devel, Anthony PERARD

Hi all,

This patch series introduces the PCI passthrough for Xen.

Changes since the last version:
  - New patch that introduces a new qdev property, pci-host-devaddr.
  => the "export pci_parse_devaddr" patch is no longer useful.

Thanks,


Allen Kay (2):
  Introduce Xen PCI Passthrough, qdevice (1/3)
  Introduce Xen PCI Passthrough, PCI config space helpers (2/3)

Anthony PERARD (6):
  pci_ids: Add INTEL_82599_SFP_VF id.
  configure: Introduce --enable-xen-pci-passthrough.
  Introduce XenHostPCIDevice to access a pci device on the host.
  pci.c: Add opaque argument to pci_for_each_device.
  qdev-properties: Introduce pci-host-devaddr.
  Introduce apic-msidef.h

Jiang Yunhong (1):
  Introduce Xen PCI Passthrough, MSI (3/3)

 configure                |   29 +
 hw/apic-msidef.h         |   30 +
 hw/apic.c                |   11 +-
 hw/i386/Makefile.objs    |    2 +
 hw/pci.c                 |   11 +-
 hw/pci.h                 |    4 +-
 hw/pci_ids.h             |    1 +
 hw/qdev-properties.c     |  107 +++
 hw/qdev.h                |    3 +
 hw/xen-host-pci-device.c |  396 ++++++++++
 hw/xen-host-pci-device.h |   55 ++
 hw/xen_common.h          |    3 +
 hw/xen_platform.c        |    8 +-
 hw/xen_pt.c              |  851 +++++++++++++++++++++
 hw/xen_pt.h              |  301 ++++++++
 hw/xen_pt_config_init.c  | 1869 ++++++++++++++++++++++++++++++++++++++++++++++
 hw/xen_pt_msi.c          |  620 +++++++++++++++
 qemu-common.h            |    7 +
 xen-all.c                |   12 +
 19 files changed, 4301 insertions(+), 19 deletions(-)
 create mode 100644 hw/apic-msidef.h
 create mode 100644 hw/xen-host-pci-device.c
 create mode 100644 hw/xen-host-pci-device.h
 create mode 100644 hw/xen_pt.c
 create mode 100644 hw/xen_pt.h
 create mode 100644 hw/xen_pt_config_init.c
 create mode 100644 hw/xen_pt_msi.c

-- 
Anthony PERARD
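The cover letter mentions a new qdev property, pci-host-devaddr, used to name a host PCI device. As a rough illustration of what such a property has to parse, here is a minimal C sketch that accepts [domain:]bus:slot.function in hex (for example 0000:01:00.0 or 01:00.0) and range-checks each field. HostDevAddr, parse_hex_field() and parse_host_devaddr() are hypothetical names; this is not the code from the qdev-properties patch, which is not part of this excerpt.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical host PCI address; not the type used by the actual patch. */
typedef struct {
    unsigned int domain;    /* 16 bits */
    unsigned int bus;       /* 8 bits */
    unsigned int slot;      /* 5 bits */
    unsigned int function;  /* 3 bits */
} HostDevAddr;

/* Parse one hex field that must end at `sep` and not exceed `max`.
 * Returns a pointer just past the separator, or NULL on error. */
static const char *parse_hex_field(const char *s, char sep,
                                   unsigned long max, unsigned int *out)
{
    char *end;
    unsigned long v;

    /* strtoul() would skip leading whitespace and accept a sign. */
    if (!isxdigit((unsigned char)*s)) {
        return NULL;
    }
    v = strtoul(s, &end, 16);
    if (*end != sep || v > max) {
        return NULL;
    }
    *out = (unsigned int)v;
    return sep ? end + 1 : end;
}

/* Accepts "dddd:bb:ss.f" or "bb:ss.f"; a missing domain means 0. */
static bool parse_host_devaddr(const char *str, HostDevAddr *addr)
{
    const char *p;
    int colons = 0;

    /* Counting colons first makes the optional domain unambiguous. */
    for (p = str; *p; p++) {
        colons += (*p == ':');
    }

    p = str;
    addr->domain = 0;
    if (colons == 2) {
        p = parse_hex_field(p, ':', 0xffff, &addr->domain);
    } else if (colons != 1) {
        return false;
    }
    if (p) {
        p = parse_hex_field(p, ':', 0xff, &addr->bus);
    }
    if (p) {
        p = parse_hex_field(p, '.', 0x1f, &addr->slot);
    }
    if (p) {
        p = parse_hex_field(p, '\0', 0x7, &addr->function);
    }
    return p != NULL;
}
```

Out-of-range fields (a slot above 0x1f, a function above 7) and trailing characters are rejected rather than truncated.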

* [Qemu-devel] [PATCH V13 1/9] pci_ids: Add INTEL_82599_SFP_VF id.
From: Anthony PERARD @ 2012-06-14 17:01 UTC
  To: QEMU-devel
  Cc: Anthony Liguori, Stefano Stabellini, Jan Kiszka,
	Michael S. Tsirkin, Xen Devel, Anthony PERARD

This ID is used by the quirk lookup introduced in the patch titled
"Introduce Xen PCI Passthrough, PCI config space helpers".

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 hw/pci_ids.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/hw/pci_ids.h b/hw/pci_ids.h
index e8235a7..649e6b3 100644
--- a/hw/pci_ids.h
+++ b/hw/pci_ids.h
@@ -118,6 +118,7 @@
 #define PCI_DEVICE_ID_INTEL_82801I_UHCI6 0x2939
 #define PCI_DEVICE_ID_INTEL_82801I_EHCI1 0x293a
 #define PCI_DEVICE_ID_INTEL_82801I_EHCI2 0x293c
+#define PCI_DEVICE_ID_INTEL_82599_SFP_VF 0x10ed
 
 #define PCI_VENDOR_ID_XEN               0x5853
 #define PCI_DEVICE_ID_XEN_PLATFORM      0x0001
-- 
Anthony PERARD
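The new ID is only consumed by the quirk lookup in a later patch of the series, which is not part of this excerpt. A minimal sketch of a lookup keyed on vendor and device ID might look like the following; PCIQuirk, pci_quirks[] and find_quirk() are hypothetical, and the table entry says nothing about what the real quirk does. 0x8086 is Intel's PCI vendor ID.

```c
#include <stddef.h>
#include <stdint.h>

#define PCI_VENDOR_ID_INTEL              0x8086
#define PCI_DEVICE_ID_INTEL_82599_SFP_VF 0x10ed  /* added by this patch */

/* Hypothetical quirk table entry; the real table is not shown here. */
typedef struct {
    uint16_t vendor_id;
    uint16_t device_id;
    const char *note;   /* placeholder text, not the real quirk */
} PCIQuirk;

static const PCIQuirk pci_quirks[] = {
    { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82599_SFP_VF,
      "placeholder: device needs special handling" },
};

/* Linear scan keyed on (vendor, device); returns NULL when no quirk applies. */
static const PCIQuirk *find_quirk(uint16_t vendor, uint16_t device)
{
    size_t i;

    for (i = 0; i < sizeof(pci_quirks) / sizeof(pci_quirks[0]); i++) {
        if (pci_quirks[i].vendor_id == vendor &&
            pci_quirks[i].device_id == device) {
            return &pci_quirks[i];
        }
    }
    return NULL;
}
```

A linear scan is adequate for a table of a handful of entries; naming the ID in pci_ids.h keeps the table readable instead of embedding the bare 0x10ed.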

* [Qemu-devel] [PATCH V13 2/9] configure: Introduce --enable-xen-pci-passthrough.
From: Anthony PERARD @ 2012-06-14 17:01 UTC
  To: QEMU-devel
  Cc: Anthony Liguori, Stefano Stabellini, Jan Kiszka,
	Michael S. Tsirkin, Xen Devel, Anthony PERARD

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 configure |   29 +++++++++++++++++++++++++++++
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/configure b/configure
index c2366ee..d734a03 100755
--- a/configure
+++ b/configure
@@ -137,6 +137,7 @@ vnc_png=""
 vnc_thread="no"
 xen=""
 xen_ctrl_version=""
+xen_pci_passthrough=""
 linux_aio=""
 cap_ng=""
 attr=""
@@ -685,6 +686,10 @@ for opt do
   ;;
   --enable-xen) xen="yes"
   ;;
+  --disable-xen-pci-passthrough) xen_pci_passthrough="no"
+  ;;
+  --enable-xen-pci-passthrough) xen_pci_passthrough="yes"
+  ;;
   --disable-brlapi) brlapi="no"
   ;;
   --enable-brlapi) brlapi="yes"
@@ -1032,6 +1037,8 @@ echo "                           (affects only QEMU, not qemu-img)"
 echo "  --enable-mixemu          enable mixer emulation"
 echo "  --disable-xen            disable xen backend driver support"
 echo "  --enable-xen             enable xen backend driver support"
+echo "  --disable-xen-pci-passthrough"
+echo "  --enable-xen-pci-passthrough"
 echo "  --disable-brlapi         disable BrlAPI"
 echo "  --enable-brlapi          enable BrlAPI"
 echo "  --disable-vnc-tls        disable TLS encryption for VNC server"
@@ -1508,6 +1515,25 @@ EOF
   fi
 fi
 
+if test "$xen_pci_passthrough" != "no"; then
+  if test "$xen" = "yes" && test "$linux" = "yes" &&
+    test "$xen_ctrl_version" -ge 340; then
+    xen_pci_passthrough=yes
+  else
+    if test "$xen_pci_passthrough" = "yes"; then
+      echo "ERROR"
+      echo "ERROR: User requested feature Xen PCI Passthrough"
+      echo "ERROR: but this feature requires /sys from Linux"
+      if test "$xen_ctrl_version" -lt 340; then
+        echo "ERROR: This feature does not work with Xen 3.3"
+      fi
+      echo "ERROR"
+      exit 1;
+    fi
+    xen_pci_passthrough=no
+  fi
+fi
+
 ##########################################
 # pkg-config probe
 
@@ -3699,6 +3725,9 @@ case "$target_arch2" in
     if test "$xen" = "yes" -a "$target_softmmu" = "yes" ; then
       target_phys_bits=64
       echo "CONFIG_XEN=y" >> $config_target_mak
+      if test "$xen_pci_passthrough" = yes; then
+        echo "CONFIG_XEN_PCI_PASSTHROUGH=y" >> "$config_target_mak"
+      fi
     else
       echo "CONFIG_NO_XEN=y" >> $config_target_mak
     fi
-- 
Anthony PERARD
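The configure hunk treats xen_pci_passthrough as a tri-state: empty means auto-detect, "yes" makes missing prerequisites fatal, and "no" skips the check entirely. The same decision can be modelled in C as a sketch; PTRequest, PTResult and resolve_pci_passthrough() are hypothetical names that only make the rules explicit (Xen enabled, a Linux host, and xen_ctrl_version >= 340).

```c
#include <stdbool.h>

/* Hypothetical mirror of the configure variable: "" (auto), "yes" or "no". */
typedef enum { PT_AUTO, PT_YES, PT_NO } PTRequest;

/* Outcome of the check: feature off, feature on, or configure aborts. */
typedef enum { PT_DISABLED, PT_ENABLED, PT_ERROR } PTResult;

static PTResult resolve_pci_passthrough(PTRequest req, bool xen,
                                        bool linux_host, int xen_ctrl_version)
{
    if (req == PT_NO) {
        return PT_DISABLED;             /* --disable-xen-pci-passthrough */
    }
    if (xen && linux_host && xen_ctrl_version >= 340) {
        return PT_ENABLED;              /* all prerequisites met */
    }
    /* Missing prerequisites are only fatal when explicitly requested. */
    return req == PT_YES ? PT_ERROR : PT_DISABLED;
}
```

One difference from the shell version: there, xen_ctrl_version appears to stay empty when Xen support is disabled, so test "$xen_ctrl_version" -lt 340 would print an "integer expression expected" diagnostic in the error path; the C model avoids this by taking an integer.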

* [Qemu-devel] [PATCH V13 3/9] Introduce XenHostPCIDevice to access a pci device on the host.
From: Anthony PERARD @ 2012-06-14 17:01 UTC
  To: QEMU-devel
  Cc: Anthony Liguori, Stefano Stabellini, Jan Kiszka,
	Michael S. Tsirkin, Xen Devel, Anthony PERARD

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
---
 hw/i386/Makefile.objs    |    1 +
 hw/xen-host-pci-device.c |  396 ++++++++++++++++++++++++++++++++++++++++++++++
 hw/xen-host-pci-device.h |   55 +++++++
 3 files changed, 452 insertions(+), 0 deletions(-)
 create mode 100644 hw/xen-host-pci-device.c
 create mode 100644 hw/xen-host-pci-device.h

diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index d43f1df..b719d8e 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -7,6 +7,7 @@ obj-y += debugcon.o multiboot.o
 obj-y += pc_piix.o
 obj-y += pc_sysfw.o
 obj-$(CONFIG_XEN) += xen_platform.o xen_apic.o
+obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen-host-pci-device.o
 obj-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o kvm/i8259.o kvm/ioapic.o kvm/i8254.o
 obj-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 
diff --git a/hw/xen-host-pci-device.c b/hw/xen-host-pci-device.c
new file mode 100644
index 0000000..e7ff680
--- /dev/null
+++ b/hw/xen-host-pci-device.c
@@ -0,0 +1,396 @@
+/*
+ * Copyright (C) 2011       Citrix Ltd.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include "xen-host-pci-device.h"
+
+#define XEN_HOST_PCI_MAX_EXT_CAP \
+    ((PCIE_CONFIG_SPACE_SIZE - PCI_CONFIG_SPACE_SIZE) / (PCI_CAP_SIZEOF + 4))
+
+#ifdef XEN_HOST_PCI_DEVICE_DEBUG
+#  define XEN_HOST_PCI_LOG(f, a...) fprintf(stderr, "%s: " f, __func__, ##a)
+#else
+#  define XEN_HOST_PCI_LOG(f, a...) (void)0
+#endif
+
+/*
+ * from linux/ioport.h
+ * IO resources have these defined flags.
+ */
+#define IORESOURCE_BITS         0x000000ff      /* Bus-specific bits */
+
+#define IORESOURCE_TYPE_BITS    0x00000f00      /* Resource type */
+#define IORESOURCE_IO           0x00000100
+#define IORESOURCE_MEM          0x00000200
+
+#define IORESOURCE_PREFETCH     0x00001000      /* No side effects */
+#define IORESOURCE_MEM_64       0x00100000
+
+static int xen_host_pci_sysfs_path(const XenHostPCIDevice *d,
+                                   const char *name, char *buf, ssize_t size)
+{
+    int rc;
+
+    rc = snprintf(buf, size, "/sys/bus/pci/devices/%04x:%02x:%02x.%d/%s",
+                  d->domain, d->bus, d->dev, d->func, name);
+
+    if (rc >= size || rc < 0) {
+        /* The output is truncated or another error was encountered */
+        return -ENODEV;
+    }
+    return 0;
+}
+
+
+/* This size should be enough to read the first 7 lines of a resource file */
+#define XEN_HOST_PCI_RESSOURCE_BUFFER_SIZE 400
+static int xen_host_pci_get_resource(XenHostPCIDevice *d)
+{
+    int i, rc, fd;
+    char path[PATH_MAX];
+    char buf[XEN_HOST_PCI_RESSOURCE_BUFFER_SIZE];
+    unsigned long long start, end, flags, size;
+    char *endptr, *s;
+    uint8_t type;
+
+    rc = xen_host_pci_sysfs_path(d, "resource", path, sizeof (path));
+    if (rc) {
+        return rc;
+    }
+    fd = open(path, O_RDONLY);
+    if (fd == -1) {
+        XEN_HOST_PCI_LOG("Error: Can't open %s: %s\n", path, strerror(errno));
+        return -errno;
+    }
+
+    do {
+        rc = read(fd, &buf, sizeof (buf) - 1);
+        if (rc < 0 && errno != EINTR) {
+            rc = -errno;
+            goto out;
+        }
+    } while (rc < 0);
+    buf[rc] = 0;
+    rc = 0;
+
+    s = buf;
+    for (i = 0; i < PCI_NUM_REGIONS; i++) {
+        type = 0;
+
+        start = strtoll(s, &endptr, 16);
+        if (*endptr != ' ' || s == endptr) {
+            break;
+        }
+        s = endptr + 1;
+        end = strtoll(s, &endptr, 16);
+        if (*endptr != ' ' || s == endptr) {
+            break;
+        }
+        s = endptr + 1;
+        flags = strtoll(s, &endptr, 16);
+        if (*endptr != '\n' || s == endptr) {
+            break;
+        }
+        s = endptr + 1;
+
+        if (start) {
+            size = end - start + 1;
+        } else {
+            size = 0;
+        }
+
+        if (flags & IORESOURCE_IO) {
+            type |= XEN_HOST_PCI_REGION_TYPE_IO;
+        }
+        if (flags & IORESOURCE_MEM) {
+            type |= XEN_HOST_PCI_REGION_TYPE_MEM;
+        }
+        if (flags & IORESOURCE_PREFETCH) {
+            type |= XEN_HOST_PCI_REGION_TYPE_PREFETCH;
+        }
+        if (flags & IORESOURCE_MEM_64) {
+            type |= XEN_HOST_PCI_REGION_TYPE_MEM_64;
+        }
+
+        if (i < PCI_ROM_SLOT) {
+            d->io_regions[i].base_addr = start;
+            d->io_regions[i].size = size;
+            d->io_regions[i].type = type;
+            d->io_regions[i].bus_flags = flags & IORESOURCE_BITS;
+        } else {
+            d->rom.base_addr = start;
+            d->rom.size = size;
+            d->rom.type = type;
+            d->rom.bus_flags = flags & IORESOURCE_BITS;
+        }
+    }
+    if (i != PCI_NUM_REGIONS) {
+        /* Invalid format or input too short */
+        rc = -ENODEV;
+    }
+
+out:
+    close(fd);
+    return rc;
+}
+
+/* This size should be enough to read a long from a file */
+#define XEN_HOST_PCI_GET_VALUE_BUFFER_SIZE 22
+static int xen_host_pci_get_value(XenHostPCIDevice *d, const char *name,
+                                  unsigned int *pvalue, int base)
+{
+    char path[PATH_MAX];
+    char buf[XEN_HOST_PCI_GET_VALUE_BUFFER_SIZE];
+    int fd, rc;
+    unsigned long value;
+    char *endptr;
+
+    rc = xen_host_pci_sysfs_path(d, name, path, sizeof (path));
+    if (rc) {
+        return rc;
+    }
+    fd = open(path, O_RDONLY);
+    if (fd == -1) {
+        XEN_HOST_PCI_LOG("Error: Can't open %s: %s\n", path, strerror(errno));
+        return -errno;
+    }
+    do {
+        rc = read(fd, &buf, sizeof (buf) - 1);
+        if (rc < 0 && errno != EINTR) {
+            rc = -errno;
+            goto out;
+        }
+    } while (rc < 0);
+    buf[rc] = 0;
+    value = strtol(buf, &endptr, base);
+    if (endptr == buf || *endptr != '\n') {
+        rc = -1;
+    } else if ((value == LONG_MIN || value == LONG_MAX) && errno == ERANGE) {
+        rc = -errno;
+    } else {
+        rc = 0;
+        *pvalue = value;
+    }
+out:
+    close(fd);
+    return rc;
+}
+
+static inline int xen_host_pci_get_hex_value(XenHostPCIDevice *d,
+                                             const char *name,
+                                             unsigned int *pvalue)
+{
+    return xen_host_pci_get_value(d, name, pvalue, 16);
+}
+
+static inline int xen_host_pci_get_dec_value(XenHostPCIDevice *d,
+                                             const char *name,
+                                             unsigned int *pvalue)
+{
+    return xen_host_pci_get_value(d, name, pvalue, 10);
+}
+
+static bool xen_host_pci_dev_is_virtfn(XenHostPCIDevice *d)
+{
+    char path[PATH_MAX];
+    struct stat buf;
+
+    if (xen_host_pci_sysfs_path(d, "physfn", path, sizeof (path))) {
+        return false;
+    }
+    return !stat(path, &buf);
+}
+
+static int xen_host_pci_config_open(XenHostPCIDevice *d)
+{
+    char path[PATH_MAX];
+    int rc;
+
+    rc = xen_host_pci_sysfs_path(d, "config", path, sizeof (path));
+    if (rc) {
+        return rc;
+    }
+    d->config_fd = open(path, O_RDWR);
+    if (d->config_fd < 0) {
+        return -errno;
+    }
+    return 0;
+}
+
+static int xen_host_pci_config_read(XenHostPCIDevice *d,
+                                    int pos, void *buf, int len)
+{
+    int rc;
+
+    do {
+        rc = pread(d->config_fd, buf, len, pos);
+    } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
+    if (rc != len) {
+        return -errno;
+    }
+    return 0;
+}
+
+static int xen_host_pci_config_write(XenHostPCIDevice *d,
+                                     int pos, const void *buf, int len)
+{
+    int rc;
+
+    do {
+        rc = pwrite(d->config_fd, buf, len, pos);
+    } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
+    if (rc != len) {
+        return -errno;
+    }
+    return 0;
+}
+
+
+int xen_host_pci_get_byte(XenHostPCIDevice *d, int pos, uint8_t *p)
+{
+    uint8_t buf;
+    int rc = xen_host_pci_config_read(d, pos, &buf, 1);
+    if (!rc) {
+        *p = buf;
+    }
+    return rc;
+}
+
+int xen_host_pci_get_word(XenHostPCIDevice *d, int pos, uint16_t *p)
+{
+    uint16_t buf;
+    int rc = xen_host_pci_config_read(d, pos, &buf, 2);
+    if (!rc) {
+        *p = le16_to_cpu(buf);
+    }
+    return rc;
+}
+
+int xen_host_pci_get_long(XenHostPCIDevice *d, int pos, uint32_t *p)
+{
+    uint32_t buf;
+    int rc = xen_host_pci_config_read(d, pos, &buf, 4);
+    if (!rc) {
+        *p = le32_to_cpu(buf);
+    }
+    return rc;
+}
+
+int xen_host_pci_get_block(XenHostPCIDevice *d, int pos, uint8_t *buf, int len)
+{
+    return xen_host_pci_config_read(d, pos, buf, len);
+}
+
+int xen_host_pci_set_byte(XenHostPCIDevice *d, int pos, uint8_t data)
+{
+    return xen_host_pci_config_write(d, pos, &data, 1);
+}
+
+int xen_host_pci_set_word(XenHostPCIDevice *d, int pos, uint16_t data)
+{
+    data = cpu_to_le16(data);
+    return xen_host_pci_config_write(d, pos, &data, 2);
+}
+
+int xen_host_pci_set_long(XenHostPCIDevice *d, int pos, uint32_t data)
+{
+    data = cpu_to_le32(data);
+    return xen_host_pci_config_write(d, pos, &data, 4);
+}
+
+int xen_host_pci_set_block(XenHostPCIDevice *d, int pos, uint8_t *buf, int len)
+{
+    return xen_host_pci_config_write(d, pos, buf, len);
+}
+
+int xen_host_pci_find_ext_cap_offset(XenHostPCIDevice *d, uint32_t cap)
+{
+    uint32_t header = 0;
+    int max_cap = XEN_HOST_PCI_MAX_EXT_CAP;
+    int pos = PCI_CONFIG_SPACE_SIZE;
+
+    do {
+        if (xen_host_pci_get_long(d, pos, &header)) {
+            break;
+        }
+        /*
+         * If we have no capabilities, this is indicated by cap ID,
+         * cap version and next pointer all being 0.
+         */
+        if (header == 0) {
+            break;
+        }
+
+        if (PCI_EXT_CAP_ID(header) == cap) {
+            return pos;
+        }
+
+        pos = PCI_EXT_CAP_NEXT(header);
+        if (pos < PCI_CONFIG_SPACE_SIZE) {
+            break;
+        }
+
+        max_cap--;
+    } while (max_cap > 0);
+
+    return -1;
+}
+
+int xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
+                            uint8_t bus, uint8_t dev, uint8_t func)
+{
+    unsigned int v;
+    int rc = 0;
+
+    d->config_fd = -1;
+    d->domain = domain;
+    d->bus = bus;
+    d->dev = dev;
+    d->func = func;
+
+    rc = xen_host_pci_config_open(d);
+    if (rc) {
+        goto error;
+    }
+    rc = xen_host_pci_get_resource(d);
+    if (rc) {
+        goto error;
+    }
+    rc = xen_host_pci_get_hex_value(d, "vendor", &v);
+    if (rc) {
+        goto error;
+    }
+    d->vendor_id = v;
+    rc = xen_host_pci_get_hex_value(d, "device", &v);
+    if (rc) {
+        goto error;
+    }
+    d->device_id = v;
+    rc = xen_host_pci_get_dec_value(d, "irq", &v);
+    if (rc) {
+        goto error;
+    }
+    d->irq = v;
+    d->is_virtfn = xen_host_pci_dev_is_virtfn(d);
+
+    return 0;
+error:
+    if (d->config_fd >= 0) {
+        close(d->config_fd);
+        d->config_fd = -1;
+    }
+    return rc;
+}
+
+void xen_host_pci_device_put(XenHostPCIDevice *d)
+{
+    if (d->config_fd >= 0) {
+        close(d->config_fd);
+        d->config_fd = -1;
+    }
+}
diff --git a/hw/xen-host-pci-device.h b/hw/xen-host-pci-device.h
new file mode 100644
index 0000000..0079dac
--- /dev/null
+++ b/hw/xen-host-pci-device.h
@@ -0,0 +1,55 @@
+#ifndef XEN_HOST_PCI_DEVICE_H
+#define XEN_HOST_PCI_DEVICE_H
+
+#include "pci.h"
+
+enum {
+    XEN_HOST_PCI_REGION_TYPE_IO = 1 << 1,
+    XEN_HOST_PCI_REGION_TYPE_MEM = 1 << 2,
+    XEN_HOST_PCI_REGION_TYPE_PREFETCH = 1 << 3,
+    XEN_HOST_PCI_REGION_TYPE_MEM_64 = 1 << 4,
+};
+
+typedef struct XenHostPCIIORegion {
+    pcibus_t base_addr;
+    pcibus_t size;
+    uint8_t type;
+    uint8_t bus_flags; /* Bus-specific bits */
+} XenHostPCIIORegion;
+
+typedef struct XenHostPCIDevice {
+    uint16_t domain;
+    uint8_t bus;
+    uint8_t dev;
+    uint8_t func;
+
+    uint16_t vendor_id;
+    uint16_t device_id;
+    int irq;
+
+    XenHostPCIIORegion io_regions[PCI_NUM_REGIONS - 1];
+    XenHostPCIIORegion rom;
+
+    bool is_virtfn;
+
+    int config_fd;
+} XenHostPCIDevice;
+
+int xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
+                            uint8_t bus, uint8_t dev, uint8_t func);
+void xen_host_pci_device_put(XenHostPCIDevice *pci_dev);
+
+int xen_host_pci_get_byte(XenHostPCIDevice *d, int pos, uint8_t *p);
+int xen_host_pci_get_word(XenHostPCIDevice *d, int pos, uint16_t *p);
+int xen_host_pci_get_long(XenHostPCIDevice *d, int pos, uint32_t *p);
+int xen_host_pci_get_block(XenHostPCIDevice *d, int pos, uint8_t *buf,
+                           int len);
+int xen_host_pci_set_byte(XenHostPCIDevice *d, int pos, uint8_t data);
+int xen_host_pci_set_word(XenHostPCIDevice *d, int pos, uint16_t data);
+int xen_host_pci_set_long(XenHostPCIDevice *d, int pos, uint32_t data);
+int xen_host_pci_set_block(XenHostPCIDevice *d, int pos, uint8_t *buf,
+                           int len);
+
+int xen_host_pci_find_ext_cap_offset(XenHostPCIDevice *s, uint32_t cap);
+
+#endif /* !XEN_HOST_PCI_DEVICE_H */
-- 
Anthony PERARD
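Each line of a device's sysfs resource file holds three hex fields (start, end and flags), and xen_host_pci_get_resource() walks them with strtoll(). Below is a minimal sketch of parsing one line, assuming single-space separators and a trailing newline, as the patch's own checks expect. It deliberately differs from the patch in two places: it uses strtoull(), because strtoll() saturates at LLONG_MAX for values at or above 2^63, and it clears errno before converting, since the C library only sets errno on failure (the patch's xen_host_pci_get_value() tests for ERANGE without clearing errno first, so a stale ERANGE could misclassify a legitimate LONG_MAX). ResourceLine, parse_field() and parse_resource_line() are hypothetical names; the IORESOURCE_* values are the ones the patch copies from linux/ioport.h.

```c
#include <errno.h>
#include <stdlib.h>

/* Flag bits copied by the patch from linux/ioport.h. */
#define IORESOURCE_IO       0x00000100
#define IORESOURCE_MEM      0x00000200
#define IORESOURCE_PREFETCH 0x00001000
#define IORESOURCE_MEM_64   0x00100000

/* Hypothetical holder for one parsed line of the sysfs resource file. */
typedef struct {
    unsigned long long start;
    unsigned long long size;
    unsigned long long flags;
} ResourceLine;

/* Convert one hex field that must be followed by `term`, then advance *s. */
static int parse_field(const char **s, char term, unsigned long long *out)
{
    char *end;

    errno = 0;  /* strtoull() sets errno only on failure, so clear it first */
    *out = strtoull(*s, &end, 16);
    if (end == *s || *end != term || errno == ERANGE) {
        return -1;
    }
    *s = end + 1;
    return 0;
}

/* Parse "start end flags\n"; returns 0 on success, -1 on malformed input. */
static int parse_resource_line(const char **s, ResourceLine *r)
{
    unsigned long long end;

    if (parse_field(s, ' ', &r->start) || parse_field(s, ' ', &end) ||
        parse_field(s, '\n', &r->flags)) {
        return -1;
    }
    /* Unused regions read as all zeros; report them with size 0. */
    r->size = r->start ? end - r->start + 1 : 0;
    return 0;
}
```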

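xen_host_pci_find_ext_cap_offset() follows the PCI Express extended capability list, which starts at offset 0x100. Each 32-bit header packs the capability ID in bits 15:0, a version in bits 19:16 and the next offset in bits 31:20. The sketch below runs the same walk over an in-memory copy of config space instead of the sysfs config file, so it can be exercised without hardware. find_ext_cap(), load_le32() and store_le32() are hypothetical helpers, and the loop bound mirrors the patch's XEN_HOST_PCI_MAX_EXT_CAP, (4096 - 256) / 8 = 480.

```c
#include <stdint.h>

#define CFG_SPACE_SIZE      256   /* PCI_CONFIG_SPACE_SIZE */
#define CFG_SPACE_EXP_SIZE  4096  /* PCIE_CONFIG_SPACE_SIZE */
#define EXT_CAP_ID(h)       ((h) & 0xffff)
#define EXT_CAP_NEXT(h)     (((h) >> 20) & 0xffc)

/* Config space is little-endian regardless of the host byte order. */
static uint32_t load_le32(const uint8_t *cfg, int pos)
{
    return (uint32_t)cfg[pos] | (uint32_t)cfg[pos + 1] << 8 |
           (uint32_t)cfg[pos + 2] << 16 | (uint32_t)cfg[pos + 3] << 24;
}

/* Helper for building test images of config space. */
static void store_le32(uint8_t *cfg, int pos, uint32_t v)
{
    cfg[pos] = v & 0xff;
    cfg[pos + 1] = (v >> 8) & 0xff;
    cfg[pos + 2] = (v >> 16) & 0xff;
    cfg[pos + 3] = (v >> 24) & 0xff;
}

/* Return the offset of extended capability `cap`, or -1 if absent.
 * `cfg` must hold CFG_SPACE_EXP_SIZE bytes. */
static int find_ext_cap(const uint8_t *cfg, uint16_t cap)
{
    int pos = CFG_SPACE_SIZE;
    /* At most one capability per 8 bytes; the bound also stops a looping list. */
    int ttl = (CFG_SPACE_EXP_SIZE - CFG_SPACE_SIZE) / 8;

    while (ttl-- > 0) {
        uint32_t header = load_le32(cfg, pos);

        if (header == 0) {
            return -1;                  /* no extended capabilities */
        }
        if (EXT_CAP_ID(header) == cap) {
            return pos;
        }
        pos = (int)EXT_CAP_NEXT(header);
        if (pos < CFG_SPACE_SIZE) {
            return -1;                  /* next == 0 ends the list */
        }
    }
    return -1;
}
```

Masking the next pointer with 0xffc keeps it dword-aligned and below 4096, so every read stays inside the buffer even if the device reports garbage.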
^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH V13 3/9] Introduce XenHostPCIDevice to access a pci device on the host.
@ 2012-06-14 17:01   ` Anthony PERARD
  0 siblings, 0 replies; 37+ messages in thread
From: Anthony PERARD @ 2012-06-14 17:01 UTC (permalink / raw)
  To: QEMU-devel
  Cc: Anthony Liguori, Stefano Stabellini, Jan Kiszka,
	Michael S. Tsirkin, Xen Devel, Anthony PERARD

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
---
 hw/i386/Makefile.objs    |    1 +
 hw/xen-host-pci-device.c |  396 ++++++++++++++++++++++++++++++++++++++++++++++
 hw/xen-host-pci-device.h |   55 +++++++
 3 files changed, 452 insertions(+), 0 deletions(-)
 create mode 100644 hw/xen-host-pci-device.c
 create mode 100644 hw/xen-host-pci-device.h

diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index d43f1df..b719d8e 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -7,6 +7,7 @@ obj-y += debugcon.o multiboot.o
 obj-y += pc_piix.o
 obj-y += pc_sysfw.o
 obj-$(CONFIG_XEN) += xen_platform.o xen_apic.o
+obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen-host-pci-device.o
 obj-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o kvm/i8259.o kvm/ioapic.o kvm/i8254.o
 obj-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 
diff --git a/hw/xen-host-pci-device.c b/hw/xen-host-pci-device.c
new file mode 100644
index 0000000..e7ff680
--- /dev/null
+++ b/hw/xen-host-pci-device.c
@@ -0,0 +1,396 @@
+/*
+ * Copyright (C) 2011       Citrix Ltd.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include "xen-host-pci-device.h"
+
+#define XEN_HOST_PCI_MAX_EXT_CAP \
+    ((PCIE_CONFIG_SPACE_SIZE - PCI_CONFIG_SPACE_SIZE) / (PCI_CAP_SIZEOF + 4))
+
+#ifdef XEN_HOST_PCI_DEVICE_DEBUG
+#  define XEN_HOST_PCI_LOG(f, a...) fprintf(stderr, "%s: " f, __func__, ##a)
+#else
+#  define XEN_HOST_PCI_LOG(f, a...) (void)0
+#endif
+
+/*
+ * from linux/ioport.h
+ * IO resources have these defined flags.
+ */
+#define IORESOURCE_BITS         0x000000ff      /* Bus-specific bits */
+
+#define IORESOURCE_TYPE_BITS    0x00000f00      /* Resource type */
+#define IORESOURCE_IO           0x00000100
+#define IORESOURCE_MEM          0x00000200
+
+#define IORESOURCE_PREFETCH     0x00001000      /* No side effects */
+#define IORESOURCE_MEM_64       0x00100000
+
+static int xen_host_pci_sysfs_path(const XenHostPCIDevice *d,
+                                   const char *name, char *buf, ssize_t size)
+{
+    int rc;
+
+    rc = snprintf(buf, size, "/sys/bus/pci/devices/%04x:%02x:%02x.%d/%s",
+                  d->domain, d->bus, d->dev, d->func, name);
+
+    if (rc >= size || rc < 0) {
+        /* The ouput is truncated or an other error is encountered */
+        return -ENODEV;
+    }
+    return 0;
+}
+
+
+/* This size should be enough to read the first 7 lines of a ressource file */
+#define XEN_HOST_PCI_RESSOURCE_BUFFER_SIZE 400
+static int xen_host_pci_get_resource(XenHostPCIDevice *d)
+{
+    int i, rc, fd;
+    char path[PATH_MAX];
+    char buf[XEN_HOST_PCI_RESSOURCE_BUFFER_SIZE];
+    unsigned long long start, end, flags, size;
+    char *endptr, *s;
+    uint8_t type;
+
+    rc = xen_host_pci_sysfs_path(d, "resource", path, sizeof (path));
+    if (rc) {
+        return rc;
+    }
+    fd = open(path, O_RDONLY);
+    if (fd == -1) {
+        XEN_HOST_PCI_LOG("Error: Can't open %s: %s\n", path, strerror(errno));
+        return -errno;
+    }
+
+    do {
+        rc = read(fd, &buf, sizeof (buf) - 1);
+        if (rc < 0 && errno != EINTR) {
+            rc = -errno;
+            goto out;
+        }
+    } while (rc < 0);
+    buf[rc] = 0;
+    rc = 0;
+
+    s = buf;
+    for (i = 0; i < PCI_NUM_REGIONS; i++) {
+        type = 0;
+
+        start = strtoll(s, &endptr, 16);
+        if (*endptr != ' ' || s == endptr) {
+            break;
+        }
+        s = endptr + 1;
+        end = strtoll(s, &endptr, 16);
+        if (*endptr != ' ' || s == endptr) {
+            break;
+        }
+        s = endptr + 1;
+        flags = strtoll(s, &endptr, 16);
+        if (*endptr != '\n' || s == endptr) {
+            break;
+        }
+        s = endptr + 1;
+
+        if (start) {
+            size = end - start + 1;
+        } else {
+            size = 0;
+        }
+
+        if (flags & IORESOURCE_IO) {
+            type |= XEN_HOST_PCI_REGION_TYPE_IO;
+        }
+        if (flags & IORESOURCE_MEM) {
+            type |= XEN_HOST_PCI_REGION_TYPE_MEM;
+        }
+        if (flags & IORESOURCE_PREFETCH) {
+            type |= XEN_HOST_PCI_REGION_TYPE_PREFETCH;
+        }
+        if (flags & IORESOURCE_MEM_64) {
+            type |= XEN_HOST_PCI_REGION_TYPE_MEM_64;
+        }
+
+        if (i < PCI_ROM_SLOT) {
+            d->io_regions[i].base_addr = start;
+            d->io_regions[i].size = size;
+            d->io_regions[i].type = type;
+            d->io_regions[i].bus_flags = flags & IORESOURCE_BITS;
+        } else {
+            d->rom.base_addr = start;
+            d->rom.size = size;
+            d->rom.type = type;
+            d->rom.bus_flags = flags & IORESOURCE_BITS;
+        }
+    }
+    if (i != PCI_NUM_REGIONS) {
+        /* Invalid format or input to short */
+        rc = -ENODEV;
+    }
+
+out:
+    close(fd);
+    return rc;
+}
+
+/* This size should be enough to read a long from a file */
+#define XEN_HOST_PCI_GET_VALUE_BUFFER_SIZE 22
+static int xen_host_pci_get_value(XenHostPCIDevice *d, const char *name,
+                                  unsigned int *pvalue, int base)
+{
+    char path[PATH_MAX];
+    char buf[XEN_HOST_PCI_GET_VALUE_BUFFER_SIZE];
+    int fd, rc;
+    unsigned long value;
+    char *endptr;
+
+    rc = xen_host_pci_sysfs_path(d, name, path, sizeof (path));
+    if (rc) {
+        return rc;
+    }
+    fd = open(path, O_RDONLY);
+    if (fd == -1) {
+        XEN_HOST_PCI_LOG("Error: Can't open %s: %s\n", path, strerror(errno));
+        return -errno;
+    }
+    do {
+        rc = read(fd, &buf, sizeof (buf) - 1);
+        if (rc < 0 && errno != EINTR) {
+            rc = -errno;
+            goto out;
+        }
+    } while (rc < 0);
+    buf[rc] = 0;
+    value = strtol(buf, &endptr, base);
+    if (endptr == buf || *endptr != '\n') {
+        rc = -1;
+    } else if ((value == LONG_MIN || value == LONG_MAX) && errno == ERANGE) {
+        rc = -errno;
+    } else {
+        rc = 0;
+        *pvalue = value;
+    }
+out:
+    close(fd);
+    return rc;
+}
+
+static inline int xen_host_pci_get_hex_value(XenHostPCIDevice *d,
+                                             const char *name,
+                                             unsigned int *pvalue)
+{
+    return xen_host_pci_get_value(d, name, pvalue, 16);
+}
+
+static inline int xen_host_pci_get_dec_value(XenHostPCIDevice *d,
+                                             const char *name,
+                                             unsigned int *pvalue)
+{
+    return xen_host_pci_get_value(d, name, pvalue, 10);
+}
+
+static bool xen_host_pci_dev_is_virtfn(XenHostPCIDevice *d)
+{
+    char path[PATH_MAX];
+    struct stat buf;
+
+    if (xen_host_pci_sysfs_path(d, "physfn", path, sizeof (path))) {
+        return false;
+    }
+    return !stat(path, &buf);
+}
+
+static int xen_host_pci_config_open(XenHostPCIDevice *d)
+{
+    char path[PATH_MAX];
+    int rc;
+
+    rc = xen_host_pci_sysfs_path(d, "config", path, sizeof (path));
+    if (rc) {
+        return rc;
+    }
+    d->config_fd = open(path, O_RDWR);
+    if (d->config_fd < 0) {
+        return -errno;
+    }
+    return 0;
+}
+
+static int xen_host_pci_config_read(XenHostPCIDevice *d,
+                                    int pos, void *buf, int len)
+{
+    int rc;
+
+    do {
+        rc = pread(d->config_fd, buf, len, pos);
+    } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
+    if (rc != len) {
+        return rc < 0 ? -errno : -EIO;
+    }
+    return 0;
+}
+
+static int xen_host_pci_config_write(XenHostPCIDevice *d,
+                                     int pos, const void *buf, int len)
+{
+    int rc;
+
+    do {
+        rc = pwrite(d->config_fd, buf, len, pos);
+    } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
+    if (rc != len) {
+        return rc < 0 ? -errno : -EIO;
+    }
+    return 0;
+}
+
+
+int xen_host_pci_get_byte(XenHostPCIDevice *d, int pos, uint8_t *p)
+{
+    uint8_t buf;
+    int rc = xen_host_pci_config_read(d, pos, &buf, 1);
+    if (!rc) {
+        *p = buf;
+    }
+    return rc;
+}
+
+int xen_host_pci_get_word(XenHostPCIDevice *d, int pos, uint16_t *p)
+{
+    uint16_t buf;
+    int rc = xen_host_pci_config_read(d, pos, &buf, 2);
+    if (!rc) {
+        *p = le16_to_cpu(buf);
+    }
+    return rc;
+}
+
+int xen_host_pci_get_long(XenHostPCIDevice *d, int pos, uint32_t *p)
+{
+    uint32_t buf;
+    int rc = xen_host_pci_config_read(d, pos, &buf, 4);
+    if (!rc) {
+        *p = le32_to_cpu(buf);
+    }
+    return rc;
+}
+
+int xen_host_pci_get_block(XenHostPCIDevice *d, int pos, uint8_t *buf, int len)
+{
+    return xen_host_pci_config_read(d, pos, buf, len);
+}
+
+int xen_host_pci_set_byte(XenHostPCIDevice *d, int pos, uint8_t data)
+{
+    return xen_host_pci_config_write(d, pos, &data, 1);
+}
+
+int xen_host_pci_set_word(XenHostPCIDevice *d, int pos, uint16_t data)
+{
+    data = cpu_to_le16(data);
+    return xen_host_pci_config_write(d, pos, &data, 2);
+}
+
+int xen_host_pci_set_long(XenHostPCIDevice *d, int pos, uint32_t data)
+{
+    data = cpu_to_le32(data);
+    return xen_host_pci_config_write(d, pos, &data, 4);
+}
+
+int xen_host_pci_set_block(XenHostPCIDevice *d, int pos, uint8_t *buf, int len)
+{
+    return xen_host_pci_config_write(d, pos, buf, len);
+}
+
+int xen_host_pci_find_ext_cap_offset(XenHostPCIDevice *d, uint32_t cap)
+{
+    uint32_t header = 0;
+    int max_cap = XEN_HOST_PCI_MAX_EXT_CAP;
+    int pos = PCI_CONFIG_SPACE_SIZE;
+
+    do {
+        if (xen_host_pci_get_long(d, pos, &header)) {
+            break;
+        }
+        /*
+         * If we have no capabilities, this is indicated by cap ID,
+         * cap version and next pointer all being 0.
+         */
+        if (header == 0) {
+            break;
+        }
+
+        if (PCI_EXT_CAP_ID(header) == cap) {
+            return pos;
+        }
+
+        pos = PCI_EXT_CAP_NEXT(header);
+        if (pos < PCI_CONFIG_SPACE_SIZE) {
+            break;
+        }
+
+        max_cap--;
+    } while (max_cap > 0);
+
+    return -1;
+}
+
+int xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
+                            uint8_t bus, uint8_t dev, uint8_t func)
+{
+    unsigned int v;
+    int rc = 0;
+
+    d->config_fd = -1;
+    d->domain = domain;
+    d->bus = bus;
+    d->dev = dev;
+    d->func = func;
+
+    rc = xen_host_pci_config_open(d);
+    if (rc) {
+        goto error;
+    }
+    rc = xen_host_pci_get_resource(d);
+    if (rc) {
+        goto error;
+    }
+    rc = xen_host_pci_get_hex_value(d, "vendor", &v);
+    if (rc) {
+        goto error;
+    }
+    d->vendor_id = v;
+    rc = xen_host_pci_get_hex_value(d, "device", &v);
+    if (rc) {
+        goto error;
+    }
+    d->device_id = v;
+    rc = xen_host_pci_get_dec_value(d, "irq", &v);
+    if (rc) {
+        goto error;
+    }
+    d->irq = v;
+    d->is_virtfn = xen_host_pci_dev_is_virtfn(d);
+
+    return 0;
+error:
+    if (d->config_fd >= 0) {
+        close(d->config_fd);
+        d->config_fd = -1;
+    }
+    return rc;
+}
+
+void xen_host_pci_device_put(XenHostPCIDevice *d)
+{
+    if (d->config_fd >= 0) {
+        close(d->config_fd);
+        d->config_fd = -1;
+    }
+}
diff --git a/hw/xen-host-pci-device.h b/hw/xen-host-pci-device.h
new file mode 100644
index 0000000..0079dac
--- /dev/null
+++ b/hw/xen-host-pci-device.h
@@ -0,0 +1,55 @@
+#ifndef XEN_HOST_PCI_DEVICE_H
+#define XEN_HOST_PCI_DEVICE_H
+
+#include "pci.h"
+
+enum {
+    XEN_HOST_PCI_REGION_TYPE_IO = 1 << 1,
+    XEN_HOST_PCI_REGION_TYPE_MEM = 1 << 2,
+    XEN_HOST_PCI_REGION_TYPE_PREFETCH = 1 << 3,
+    XEN_HOST_PCI_REGION_TYPE_MEM_64 = 1 << 4,
+};
+
+typedef struct XenHostPCIIORegion {
+    pcibus_t base_addr;
+    pcibus_t size;
+    uint8_t type;
+    uint8_t bus_flags; /* Bus-specific bits */
+} XenHostPCIIORegion;
+
+typedef struct XenHostPCIDevice {
+    uint16_t domain;
+    uint8_t bus;
+    uint8_t dev;
+    uint8_t func;
+
+    uint16_t vendor_id;
+    uint16_t device_id;
+    int irq;
+
+    XenHostPCIIORegion io_regions[PCI_NUM_REGIONS - 1];
+    XenHostPCIIORegion rom;
+
+    bool is_virtfn;
+
+    int config_fd;
+} XenHostPCIDevice;
+
+int xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
+                            uint8_t bus, uint8_t dev, uint8_t func);
+void xen_host_pci_device_put(XenHostPCIDevice *pci_dev);
+
+int xen_host_pci_get_byte(XenHostPCIDevice *d, int pos, uint8_t *p);
+int xen_host_pci_get_word(XenHostPCIDevice *d, int pos, uint16_t *p);
+int xen_host_pci_get_long(XenHostPCIDevice *d, int pos, uint32_t *p);
+int xen_host_pci_get_block(XenHostPCIDevice *d, int pos, uint8_t *buf,
+                           int len);
+int xen_host_pci_set_byte(XenHostPCIDevice *d, int pos, uint8_t data);
+int xen_host_pci_set_word(XenHostPCIDevice *d, int pos, uint16_t data);
+int xen_host_pci_set_long(XenHostPCIDevice *d, int pos, uint32_t data);
+int xen_host_pci_set_block(XenHostPCIDevice *d, int pos, uint8_t *buf,
+                           int len);
+
+int xen_host_pci_find_ext_cap_offset(XenHostPCIDevice *s, uint32_t cap);
+
+#endif /* XEN_HOST_PCI_DEVICE_H */
-- 
Anthony PERARD
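
For reference, the extended-capability search in xen_host_pci_find_ext_cap_offset() walks the linked list that PCI Express places in extended configuration space, starting at offset 0x100. The following is a minimal standalone sketch of that walk over an in-memory 4 KiB copy of config space, so it runs without a device. The helper names, the in-memory buffer and the MAX_EXT_CAP bound of 480 entries are assumptions for illustration; the actual value of XEN_HOST_PCI_MAX_EXT_CAP is not shown in this excerpt.

```c
#include <stdint.h>

#define CFG_SPACE_SIZE      256    /* legacy PCI config space */
#define CFG_SPACE_EXP_SIZE  4096   /* PCIe extended config space */
/* Assumed bound: each extended capability occupies at least 8 bytes. */
#define MAX_EXT_CAP ((CFG_SPACE_EXP_SIZE - CFG_SPACE_SIZE) / 8)

/* Header layout per the PCIe spec: ID [15:0], version [19:16], next [31:20]. */
#define EXT_CAP_ID(h)   ((h) & 0xffff)
#define EXT_CAP_NEXT(h) (((h) >> 20) & 0xffc)

/* Read a little-endian dword from an in-memory copy of config space. */
static uint32_t cfg_read_le32(const uint8_t *cfg, int pos)
{
    return (uint32_t)cfg[pos] | (uint32_t)cfg[pos + 1] << 8 |
           (uint32_t)cfg[pos + 2] << 16 | (uint32_t)cfg[pos + 3] << 24;
}

/* Test helper: store a little-endian dword into the buffer. */
static void cfg_write_le32(uint8_t *cfg, int pos, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        cfg[pos + i] = (v >> (8 * i)) & 0xff;
    }
}

/* Return the offset of extended capability 'cap', or -1 if absent. */
static int find_ext_cap_offset(const uint8_t *cfg, uint32_t cap)
{
    int pos = CFG_SPACE_SIZE;
    int ttl = MAX_EXT_CAP;   /* guards against a looping list */

    while (ttl-- > 0) {
        uint32_t header = cfg_read_le32(cfg, pos);
        if (header == 0) {   /* device has no extended capabilities */
            break;
        }
        if (EXT_CAP_ID(header) == cap) {
            return pos;
        }
        pos = EXT_CAP_NEXT(header);
        if (pos < CFG_SPACE_SIZE) {   /* next == 0 terminates the list */
            break;
        }
    }
    return -1;
}
```

With an AER header (ID 0x0001) at 0x100 whose next pointer is 0x140 and an SR-IOV header (ID 0x0010) at 0x140, find_ext_cap_offset(cfg, 0x0010) returns 0x140, and an all-zero buffer returns -1 immediately. The iteration bound matters because the next pointers come from the device and could form a loop.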

^ permalink raw reply related	[flat|nested] 37+ messages in thread
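
xen_host_pci_get_value() reads a short sysfs attribute, such as vendor (hexadecimal, e.g. "0x8086\n") or irq (decimal), and converts it with strtol(). Below is a minimal sketch of only the conversion step, assuming the text is already in a NUL-terminated buffer; parse_sysfs_value() is a hypothetical name, and rejecting negative values is an added assumption. Because strtol() sets errno only on failure, the sketch clears errno before the call so that a stale ERANGE cannot be mistaken for an overflow.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert the text of a sysfs attribute (which ends
 * in '\n') to an unsigned int. Returns 0 on success or a negative errno.
 */
static int parse_sysfs_value(const char *buf, int base, unsigned int *pvalue)
{
    char *endptr;
    long value;

    errno = 0;                       /* strtol() only sets errno on error */
    value = strtol(buf, &endptr, base);
    if (endptr == buf || *endptr != '\n') {
        return -EINVAL;              /* no digits, garbage, or no newline */
    }
    if (errno == ERANGE || value < 0 || (unsigned long)value > UINT_MAX) {
        return -ERANGE;              /* does not fit in an unsigned int */
    }
    *pvalue = (unsigned int)value;
    return 0;
}
```

With this sketch, parse_sysfs_value("0x8086\n", 16, &v) stores 0x8086 in v and returns 0, while input without the trailing newline returns -EINVAL, matching the original's requirement that the value end in '\n'.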

* [Qemu-devel] [PATCH V13 4/9] pci.c: Add opaque argument to pci_for_each_device.
  2012-06-14 17:01 [Qemu-devel] [PATCH V13 0/9] Xen PCI Passthrough Anthony PERARD
                   ` (3 preceding siblings ...)
  2012-06-14 17:01   ` Anthony PERARD
@ 2012-06-14 17:01 ` Anthony PERARD
  2012-06-14 19:38   ` Michael S. Tsirkin
                     ` (3 more replies)
  2012-06-14 17:01 ` Anthony PERARD
                   ` (10 subsequent siblings)
  15 siblings, 4 replies; 37+ messages in thread
From: Anthony PERARD @ 2012-06-14 17:01 UTC (permalink / raw)
  To: QEMU-devel
  Cc: Anthony Liguori, Stefano Stabellini, Jan Kiszka,
	Michael S. Tsirkin, Xen Devel, Anthony PERARD

Make pci_for_each_device() more generic by passing an extra opaque
argument through to the function called for every device.

This will be used by a later patch in this series.
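
The opaque pointer is the usual C substitute for a closure: the caller threads its own state through the iterator into each callback invocation. A minimal self-contained sketch of the pattern follows; Dev, Bus, for_each_device() and count_class() are hypothetical stand-ins, not QEMU's PCIDevice/PCIBus API.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_DEVFN 256   /* devfn is 8 bits: 32 slots x 8 functions */

/* Hypothetical stand-ins for a PCI device and bus. */
typedef struct Dev {
    uint16_t class_id;
} Dev;

typedef struct Bus {
    Dev *devices[NUM_DEVFN];
} Bus;

/* Call fn(bus, dev, opaque) for every populated devfn on the bus. */
static void for_each_device(Bus *bus,
                            void (*fn)(Bus *b, Dev *d, void *opaque),
                            void *opaque)
{
    for (size_t devfn = 0; devfn < NUM_DEVFN; devfn++) {
        if (bus->devices[devfn]) {
            fn(bus, bus->devices[devfn], opaque);
        }
    }
}

/* State passed through the opaque pointer. */
typedef struct ClassCount {
    uint16_t class_id;
    int count;
} ClassCount;

/* Example callback: count devices whose class matches cc->class_id. */
static void count_class(Bus *b, Dev *d, void *opaque)
{
    ClassCount *cc = opaque;

    (void)b;   /* unused here, kept to match the callback signature */
    if (d->class_id == cc->class_id) {
        cc->count++;
    }
}
```

For example, with two devices of class 0x0200 (Ethernet) and one of class 0x0101 (IDE) on the bus, calling for_each_device(&bus, count_class, &cc) with cc.class_id set to 0x0200 leaves cc.count at 2. Callers that need no state, such as unplug_nic() and unplug_disks() in xen_platform.c, pass NULL.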

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 hw/pci.c          |   11 +++++++----
 hw/pci.h          |    4 +++-
 hw/xen_platform.c |    8 ++++----
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 127b7ac..d6537e3 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1127,7 +1127,9 @@ static const pci_class_desc pci_class_descriptions[] =
 };
 
 static void pci_for_each_device_under_bus(PCIBus *bus,
-                                          void (*fn)(PCIBus *b, PCIDevice *d))
+                                          void (*fn)(PCIBus *b, PCIDevice *d,
+                                                     void *opaque),
+                                          void *opaque)
 {
     PCIDevice *d;
     int devfn;
@@ -1135,18 +1137,19 @@ static void pci_for_each_device_under_bus(PCIBus *bus,
     for(devfn = 0; devfn < ARRAY_SIZE(bus->devices); devfn++) {
         d = bus->devices[devfn];
         if (d) {
-            fn(bus, d);
+            fn(bus, d, opaque);
         }
     }
 }
 
 void pci_for_each_device(PCIBus *bus, int bus_num,
-                         void (*fn)(PCIBus *b, PCIDevice *d))
+                         void (*fn)(PCIBus *b, PCIDevice *d, void *opaque),
+                         void *opaque)
 {
     bus = pci_find_bus_nr(bus, bus_num);
 
     if (bus) {
-        pci_for_each_device_under_bus(bus, fn);
+        pci_for_each_device_under_bus(bus, fn, opaque);
     }
 }
 
diff --git a/hw/pci.h b/hw/pci.h
index 7f223c0..95b608c 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -312,7 +312,9 @@ PCIDevice *pci_nic_init(NICInfo *nd, const char *default_model,
 PCIDevice *pci_nic_init_nofail(NICInfo *nd, const char *default_model,
                                const char *default_devaddr);
 int pci_bus_num(PCIBus *s);
-void pci_for_each_device(PCIBus *bus, int bus_num, void (*fn)(PCIBus *bus, PCIDevice *d));
+void pci_for_each_device(PCIBus *bus, int bus_num,
+                         void (*fn)(PCIBus *bus, PCIDevice *d, void *opaque),
+                         void *opaque);
 PCIBus *pci_find_root_bus(int domain);
 int pci_find_domain(const PCIBus *bus);
 PCIDevice *pci_find_device(PCIBus *bus, int bus_num, uint8_t devfn);
diff --git a/hw/xen_platform.c b/hw/xen_platform.c
index 0214f37..c1fe984 100644
--- a/hw/xen_platform.c
+++ b/hw/xen_platform.c
@@ -83,7 +83,7 @@ static void log_writeb(PCIXenPlatformState *s, char val)
 #define UNPLUG_ALL_NICS 2
 #define UNPLUG_AUX_IDE_DISKS 4
 
-static void unplug_nic(PCIBus *b, PCIDevice *d)
+static void unplug_nic(PCIBus *b, PCIDevice *d, void *o)
 {
     if (pci_get_word(d->config + PCI_CLASS_DEVICE) ==
             PCI_CLASS_NETWORK_ETHERNET) {
@@ -96,10 +96,10 @@ static void unplug_nic(PCIBus *b, PCIDevice *d)
 
 static void pci_unplug_nics(PCIBus *bus)
 {
-    pci_for_each_device(bus, 0, unplug_nic);
+    pci_for_each_device(bus, 0, unplug_nic, NULL);
 }
 
-static void unplug_disks(PCIBus *b, PCIDevice *d)
+static void unplug_disks(PCIBus *b, PCIDevice *d, void *o)
 {
     if (pci_get_word(d->config + PCI_CLASS_DEVICE) ==
             PCI_CLASS_STORAGE_IDE) {
@@ -109,7 +109,7 @@ static void unplug_disks(PCIBus *b, PCIDevice *d)
 
 static void pci_unplug_disks(PCIBus *bus)
 {
-    pci_for_each_device(bus, 0, unplug_disks);
+    pci_for_each_device(bus, 0, unplug_disks, NULL);
 }
 
 static void platform_fixed_ioport_writew(void *opaque, uint32_t addr, uint32_t val)
-- 
Anthony PERARD

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH V13 5/9] qdev-properties: Introduce pci-host-devaddr.
  2012-06-14 17:01 [Qemu-devel] [PATCH V13 0/9] Xen PCI Passthrough Anthony PERARD
                   ` (6 preceding siblings ...)
  2012-06-14 17:01 ` [PATCH V13 5/9] qdev-properties: Introduce pci-host-devaddr Anthony PERARD
@ 2012-06-14 17:01 ` Anthony PERARD
  2012-06-14 19:39   ` Michael S. Tsirkin
  2012-06-14 19:39   ` [Qemu-devel] " Michael S. Tsirkin
  2012-06-14 17:01   ` Anthony PERARD
                   ` (7 subsequent siblings)
  15 siblings, 2 replies; 37+ messages in thread
From: Anthony PERARD @ 2012-06-14 17:01 UTC (permalink / raw)
  To: QEMU-devel
  Cc: Anthony Liguori, Stefano Stabellini, Jan Kiszka,
	Michael S. Tsirkin, Xen Devel, Anthony PERARD

This new property will be used to specify the address of a PCI device on
the host, in the form [<domain>:]<bus>:<slot>.<func>.
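
set_pci_host_devaddr() accepts [<domain>:]<bus>:<slot>.<func>, with domain, bus and slot in hexadecimal and func in decimal, and enforces domain <= 0xffff, bus <= 0xff, slot <= 0x1f and func <= 7; get_pci_host_devaddr() prints the canonical "%04x:%02x:%02x.%d" form. A minimal standalone sketch of the same grammar, outside the qdev Visitor machinery, follows; HostAddr, parse_host_devaddr() and format_host_devaddr() are hypothetical names. As an assumption of this sketch, every parsed field is kept in an unsigned long so the range checks see the full value returned by strtoul().

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical mirror of PCIHostDeviceAddress. */
typedef struct HostAddr {
    unsigned int domain, bus, slot, function;
} HostAddr;

/* Parse "[dddd:]bb:ss.f"; returns 0 on success, -1 on malformed input. */
static int parse_host_devaddr(const char *str, HostAddr *addr)
{
    const char *p = str;
    char *e;
    unsigned long dom = 0, bus, slot, func;

    bus = strtoul(p, &e, 16);
    if (e == p || *e != ':') {
        return -1;
    }
    p = e + 1;
    slot = strtoul(p, &e, 16);
    if (e == p) {
        return -1;
    }
    if (*e == ':') {            /* three fields: the first was the domain */
        dom = bus;
        bus = slot;
        p = e + 1;
        slot = strtoul(p, &e, 16);
        if (e == p) {
            return -1;
        }
    }
    if (*e != '.') {
        return -1;
    }
    p = e + 1;
    func = strtoul(p, &e, 10);
    if (e == p || *e != '\0') { /* no digits, or trailing garbage */
        return -1;
    }
    if (dom > 0xffff || bus > 0xff || slot > 0x1f || func > 7) {
        return -1;
    }
    addr->domain = dom;
    addr->bus = bus;
    addr->slot = slot;
    addr->function = func;
    return 0;
}

/* Format back into the canonical "dddd:bb:ss.f" form (12 chars + NUL). */
static void format_host_devaddr(const HostAddr *addr, char buf[13])
{
    snprintf(buf, 13, "%04x:%02x:%02x.%u",
             addr->domain, addr->bus, addr->slot, addr->function);
}
```

Under these rules "03:00.1" parses to domain 0, bus 3, slot 0, function 1 and formats back as "0000:03:00.1", while "00:20.0" (slot above 0x1f) and "00:1f" (missing function) are rejected.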

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
---
 hw/qdev-properties.c |  107 ++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/qdev.h            |    3 +
 qemu-common.h        |    7 +++
 3 files changed, 117 insertions(+), 0 deletions(-)

diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c
index 9ae3187..43e1964 100644
--- a/hw/qdev-properties.c
+++ b/hw/qdev-properties.c
@@ -899,6 +899,113 @@ PropertyInfo qdev_prop_blocksize = {
     .set   = set_blocksize,
 };
 
+/* --- pci host address --- */
+
+static void get_pci_host_devaddr(Object *obj, Visitor *v, void *opaque,
+                                 const char *name, Error **errp)
+{
+    DeviceState *dev = DEVICE(obj);
+    Property *prop = opaque;
+    PCIHostDeviceAddress *addr = qdev_get_prop_ptr(dev, prop);
+    char buffer[] = "xxxx:xx:xx.x";
+    char *p = buffer;
+    int rc = 0;
+
+    rc = snprintf(buffer, sizeof(buffer), "%04x:%02x:%02x.%d",
+                  addr->domain, addr->bus, addr->slot, addr->function);
+    assert(rc == sizeof(buffer) - 1);
+
+    visit_type_str(v, &p, name, errp);
+}
+
+/*
+ * Parse [<domain>:]<bus>:<slot>.<func>
+ *   if <domain> is not supplied, it's assumed to be 0.
+ */
+static void set_pci_host_devaddr(Object *obj, Visitor *v, void *opaque,
+                                 const char *name, Error **errp)
+{
+    DeviceState *dev = DEVICE(obj);
+    Property *prop = opaque;
+    PCIHostDeviceAddress *addr = qdev_get_prop_ptr(dev, prop);
+    Error *local_err = NULL;
+    char *str, *p;
+    char *e;
+    unsigned long val;
+    unsigned long dom = 0, bus = 0;
+    unsigned long slot = 0, func = 0;
+
+    if (dev->state != DEV_STATE_CREATED) {
+        error_set(errp, QERR_PERMISSION_DENIED);
+        return;
+    }
+
+    visit_type_str(v, &str, name, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    p = str;
+    val = strtoul(p, &e, 16);
+    if (e == p || *e != ':') {
+        goto inval;
+    }
+    bus = val;
+
+    p = e + 1;
+    val = strtoul(p, &e, 16);
+    if (e == p) {
+        goto inval;
+    }
+    if (*e == ':') {
+        dom = bus;
+        bus = val;
+        p = e + 1;
+        val = strtoul(p, &e, 16);
+        if (e == p) {
+            goto inval;
+        }
+    }
+    slot = val;
+
+    if (*e != '.') {
+        goto inval;
+    }
+    p = e + 1;
+    val = strtoul(p, &e, 10);
+    if (e == p) {
+        goto inval;
+    }
+    func = val;
+
+    if (dom > 0xffff || bus > 0xff || slot > 0x1f || func > 7) {
+        goto inval;
+    }
+
+    if (*e) {
+        goto inval;
+    }
+
+    addr->domain = dom;
+    addr->bus = bus;
+    addr->slot = slot;
+    addr->function = func;
+
+    g_free(str);
+    return;
+
+inval:
+    error_set_from_qdev_prop_error(errp, EINVAL, dev, prop, str);
+    g_free(str);
+}
+
+PropertyInfo qdev_prop_pci_host_devaddr = {
+    .name = "pci-host-devaddr",
+    .get = get_pci_host_devaddr,
+    .set = set_pci_host_devaddr,
+};
+
 /* --- public helpers --- */
 
 static Property *qdev_prop_walk(Property *props, const char *name)
diff --git a/hw/qdev.h b/hw/qdev.h
index 5386b16..8746f84 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -223,6 +223,7 @@ extern PropertyInfo qdev_prop_netdev;
 extern PropertyInfo qdev_prop_vlan;
 extern PropertyInfo qdev_prop_pci_devfn;
 extern PropertyInfo qdev_prop_blocksize;
+extern PropertyInfo qdev_prop_pci_host_devaddr;
 
 #define DEFINE_PROP(_name, _state, _field, _prop, _type) { \
         .name      = (_name),                                    \
@@ -286,6 +287,8 @@ extern PropertyInfo qdev_prop_blocksize;
                         LostTickPolicy)
 #define DEFINE_PROP_BLOCKSIZE(_n, _s, _f, _d) \
     DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_blocksize, uint16_t)
+#define DEFINE_PROP_PCI_HOST_DEVADDR(_n, _s, _f) \
+    DEFINE_PROP(_n, _s, _f, qdev_prop_pci_host_devaddr, PCIHostDeviceAddress)
 
 #define DEFINE_PROP_END_OF_LIST()               \
     {}
diff --git a/qemu-common.h b/qemu-common.h
index 91e0562..0d6e51c 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -274,6 +274,13 @@ typedef enum LostTickPolicy {
     LOST_TICK_MAX
 } LostTickPolicy;
 
+typedef struct PCIHostDeviceAddress {
+    unsigned int domain;
+    unsigned int bus;
+    unsigned int slot;
+    unsigned int function;
+} PCIHostDeviceAddress;
+
 void tcg_exec_init(unsigned long tb_size);
 bool tcg_enabled(void);
 
-- 
Anthony PERARD

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH V13 6/9] Introduce Xen PCI Passthrough, qdevice (1/3)
  2012-06-14 17:01 [Qemu-devel] [PATCH V13 0/9] Xen PCI Passthrough Anthony PERARD
@ 2012-06-14 17:01   ` Anthony PERARD
  2012-06-14 17:01 ` [PATCH V13 2/9] configure: Introduce --enable-xen-pci-passthrough Anthony PERARD
                     ` (14 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Anthony PERARD @ 2012-06-14 17:01 UTC (permalink / raw)
  To: QEMU-devel
  Cc: Anthony Liguori, Stefano Stabellini, Jan Kiszka,
	Michael S. Tsirkin, Allen Kay, Xen Devel, Guy Zana,
	Anthony PERARD

From: Allen Kay <allen.m.kay@intel.com>

A more complete history can be found here:
git://xenbits.xensource.com/qemu-xen-unstable.git
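
The config-space accessors added in hw/xen_pt.c first call xen_pt_pci_config_access_check(), which accepts a guest access only if it targets the 256-byte legacy config space, is 1, 2 or 4 bytes long, and is naturally aligned to its length. A minimal sketch of that rule follows; check_cfg_access() is a hypothetical name, and this sketch treats offset 0xFF as a valid single-byte access.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_CFG_SIZE 256   /* legacy (non-extended) config space */

/* Hypothetical helper: is a guest config access at addr/len acceptable? */
static bool check_cfg_access(uint32_t addr, int len)
{
    if (addr >= PCI_CFG_SIZE) {
        return false;                 /* offset beyond 0xFF */
    }
    if (len != 1 && len != 2 && len != 4) {
        return false;                 /* only byte, word or dword */
    }
    if (addr & (len - 1)) {
        return false;                 /* must be naturally aligned */
    }
    return true;   /* alignment implies addr + len <= PCI_CFG_SIZE */
}
```

Because 256 is a multiple of every permitted length, an aligned access that starts inside the space can never run past its end, so no separate addr + len check is needed.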

Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Signed-off-by: Guy Zana <guy@neocleus.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 hw/i386/Makefile.objs   |    1 +
 hw/xen_common.h         |    3 +
 hw/xen_pt.c             |  812 +++++++++++++++++++++++++++++++++++++++++++++++
 hw/xen_pt.h             |  248 +++++++++++++++
 hw/xen_pt_config_init.c |   11 +
 xen-all.c               |   12 +
 6 files changed, 1087 insertions(+), 0 deletions(-)
 create mode 100644 hw/xen_pt.c
 create mode 100644 hw/xen_pt.h
 create mode 100644 hw/xen_pt_config_init.c

diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index b719d8e..e361a92 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -8,6 +8,7 @@ obj-y += pc_piix.o
 obj-y += pc_sysfw.o
 obj-$(CONFIG_XEN) += xen_platform.o xen_apic.o
 obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen-host-pci-device.o
+obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen_pt.o xen_pt_config_init.o
 obj-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o kvm/i8259.o kvm/ioapic.o kvm/i8254.o
 obj-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 
diff --git a/hw/xen_common.h b/hw/xen_common.h
index fe7f227..03b0bb1 100644
--- a/hw/xen_common.h
+++ b/hw/xen_common.h
@@ -150,4 +150,7 @@ static inline int xen_xc_hvm_inject_msi(XenXC xen_xc, domid_t dom,
 
 void destroy_hvm_domain(bool reboot);
 
+/* shutdown/destroy current domain because of an error */
+void xen_shutdown_fatal_error(const char *fmt, ...) GCC_FMT_ATTR(1, 2);
+
 #endif /* QEMU_HW_XEN_COMMON_H */
diff --git a/hw/xen_pt.c b/hw/xen_pt.c
new file mode 100644
index 0000000..63a5c80
--- /dev/null
+++ b/hw/xen_pt.c
@@ -0,0 +1,812 @@
+/*
+ * Copyright (c) 2007, Neocleus Corporation.
+ * Copyright (c) 2007, Intel Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Alex Novik <alex@neocleus.com>
+ * Allen Kay <allen.m.kay@intel.com>
+ * Guy Zana <guy@neocleus.com>
+ *
+ * This file implements direct PCI assignment to an HVM guest
+ */
+
+/*
+ * Interrupt Disable policy:
+ *
+ * INTx interrupt:
+ *   Initialize(register_real_device)
+ *     Map INTx(xc_physdev_map_pirq):
+ *       <fail>
+ *         - Set real Interrupt Disable bit to '1'.
+ *         - Set machine_irq and assigned_device->machine_irq to '0'.
+ *         * Don't bind INTx.
+ *
+ *     Bind INTx(xc_domain_bind_pt_pci_irq):
+ *       <fail>
+ *         - Set real Interrupt Disable bit to '1'.
+ *         - Unmap INTx.
+ *         - Decrement xen_pt_mapped_machine_irq[machine_irq]
+ *         - Set assigned_device->machine_irq to '0'.
+ *
+ *   Write to Interrupt Disable bit by guest software(xen_pt_cmd_reg_write)
+ *     Write '0'
+ *       - Set real bit to '0' if assigned_device->machine_irq isn't '0'.
+ *
+ *     Write '1'
+ *       - Set real bit to '1'.
+ */
+
+#include <sys/ioctl.h>
+
+#include "pci.h"
+#include "xen.h"
+#include "xen_backend.h"
+#include "xen_pt.h"
+#include "range.h"
+
+#define XEN_PT_NR_IRQS (256)
+static uint8_t xen_pt_mapped_machine_irq[XEN_PT_NR_IRQS] = {0};
+
+void xen_pt_log(const PCIDevice *d, const char *f, ...)
+{
+    va_list ap;
+
+    va_start(ap, f);
+    if (d) {
+        fprintf(stderr, "[%02x:%02x.%d] ", pci_bus_num(d->bus),
+                PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
+    }
+    vfprintf(stderr, f, ap);
+    va_end(ap);
+}
+
+/* Config Space */
+
+static int xen_pt_pci_config_access_check(PCIDevice *d, uint32_t addr, int len)
+{
+    /* check offset range */
+    if (addr > 0xFF) {
+        XEN_PT_ERR(d, "Failed to access register with offset exceeding 0xFF. "
+                   "(addr: 0x%02x, len: %d)\n", addr, len);
+        return -1;
+    }
+
+    /* check read size */
+    if ((len != 1) && (len != 2) && (len != 4)) {
+        XEN_PT_ERR(d, "Failed to access register with invalid access length. "
+                   "(addr: 0x%02x, len: %d)\n", addr, len);
+        return -1;
+    }
+
+    /* check offset alignment */
+    if (addr & (len - 1)) {
+        XEN_PT_ERR(d, "Failed to access register with invalid access size "
+                   "alignment. (addr: 0x%02x, len: %d)\n", addr, len);
+        return -1;
+    }
+
+    return 0;
+}
+
+int xen_pt_bar_offset_to_index(uint32_t offset)
+{
+    int index = 0;
+
+    /* check Exp ROM BAR */
+    if (offset == PCI_ROM_ADDRESS) {
+        return PCI_ROM_SLOT;
+    }
+
+    /* calculate BAR index */
+    index = (offset - PCI_BASE_ADDRESS_0) >> 2;
+    if (index >= PCI_NUM_REGIONS) {
+        return -1;
+    }
+
+    return index;
+}
+
+static uint32_t xen_pt_pci_read_config(PCIDevice *d, uint32_t addr, int len)
+{
+    XenPCIPassthroughState *s = DO_UPCAST(XenPCIPassthroughState, dev, d);
+    uint32_t val = 0;
+    XenPTRegGroup *reg_grp_entry = NULL;
+    XenPTReg *reg_entry = NULL;
+    int rc = 0;
+    int emul_len = 0;
+    uint32_t find_addr = addr;
+
+    if (xen_pt_pci_config_access_check(d, addr, len)) {
+        goto exit;
+    }
+
+    /* find register group entry */
+    reg_grp_entry = xen_pt_find_reg_grp(s, addr);
+    if (reg_grp_entry) {
+        /* check 0-Hardwired register group */
+        if (reg_grp_entry->reg_grp->grp_type == XEN_PT_GRP_TYPE_HARDWIRED) {
+            /* no need to emulate, just return 0 */
+            val = 0;
+            goto exit;
+        }
+    }
+
+    /* read I/O device register value */
+    rc = xen_host_pci_get_block(&s->real_device, addr, (uint8_t *)&val, len);
+    if (rc < 0) {
+        XEN_PT_ERR(d, "pci_read_block failed. return value: %d.\n", rc);
+        memset(&val, 0xff, len);
+    }
+
+    /* just return the I/O device register value for
+     * passthrough type register group */
+    if (reg_grp_entry == NULL) {
+        goto exit;
+    }
+
+    /* adjust the read value to appropriate CFC-CFF window */
+    val <<= (addr & 3) << 3;
+    emul_len = len;
+
+    /* loop around the guest requested size */
+    while (emul_len > 0) {
+        /* find register entry to be emulated */
+        reg_entry = xen_pt_find_reg(reg_grp_entry, find_addr);
+        if (reg_entry) {
+            XenPTRegInfo *reg = reg_entry->reg;
+            uint32_t real_offset = reg_grp_entry->base_offset + reg->offset;
+            uint32_t valid_mask = 0xFFFFFFFF >> ((4 - emul_len) << 3);
+            uint8_t *ptr_val = NULL;
+
+            valid_mask <<= (find_addr - real_offset) << 3;
+            ptr_val = (uint8_t *)&val + (real_offset & 3);
+
+            /* do emulation based on register size */
+            switch (reg->size) {
+            case 1:
+                if (reg->u.b.read) {
+                    rc = reg->u.b.read(s, reg_entry, ptr_val, valid_mask);
+                }
+                break;
+            case 2:
+                if (reg->u.w.read) {
+                    rc = reg->u.w.read(s, reg_entry,
+                                       (uint16_t *)ptr_val, valid_mask);
+                }
+                break;
+            case 4:
+                if (reg->u.dw.read) {
+                    rc = reg->u.dw.read(s, reg_entry,
+                                        (uint32_t *)ptr_val, valid_mask);
+                }
+                break;
+            }
+
+            if (rc < 0) {
+                xen_shutdown_fatal_error("Internal error: Invalid read "
+                                         "emulation. (%s, rc: %d)\n",
+                                         __func__, rc);
+                return 0;
+            }
+
+            /* calculate next address to find */
+            emul_len -= reg->size;
+            if (emul_len > 0) {
+                find_addr = real_offset + reg->size;
+            }
+        } else {
+            /* nothing to do with passthrough type register,
+             * continue to find next byte */
+            emul_len--;
+            find_addr++;
+        }
+    }
+
+    /* need to shift back before returning them to pci bus emulator */
+    val >>= ((addr & 3) << 3);
+
+exit:
+    XEN_PT_LOG_CONFIG(d, addr, val, len);
+    return val;
+}
+
+static void xen_pt_pci_write_config(PCIDevice *d, uint32_t addr,
+                                    uint32_t val, int len)
+{
+    XenPCIPassthroughState *s = DO_UPCAST(XenPCIPassthroughState, dev, d);
+    int index = 0;
+    XenPTRegGroup *reg_grp_entry = NULL;
+    int rc = 0;
+    uint32_t read_val = 0;
+    int emul_len = 0;
+    XenPTReg *reg_entry = NULL;
+    uint32_t find_addr = addr;
+    XenPTRegInfo *reg = NULL;
+
+    if (xen_pt_pci_config_access_check(d, addr, len)) {
+        return;
+    }
+
+    XEN_PT_LOG_CONFIG(d, addr, val, len);
+
+    /* check unused BAR register */
+    index = xen_pt_bar_offset_to_index(addr);
+    if ((index >= 0) && (val > 0 && val < XEN_PT_BAR_ALLF) &&
+        (s->bases[index].bar_flag == XEN_PT_BAR_FLAG_UNUSED)) {
+        XEN_PT_WARN(d, "Guest attempt to set address to unused Base Address "
+                    "Register. (addr: 0x%02x, len: %d)\n", addr, len);
+    }
+
+    /* find register group entry */
+    reg_grp_entry = xen_pt_find_reg_grp(s, addr);
+    if (reg_grp_entry) {
+        /* check 0-Hardwired register group */
+        if (reg_grp_entry->reg_grp->grp_type == XEN_PT_GRP_TYPE_HARDWIRED) {
+            /* ignore silently */
+            XEN_PT_WARN(d, "Access to 0-Hardwired register. "
+                        "(addr: 0x%02x, len: %d)\n", addr, len);
+            return;
+        }
+    }
+
+    rc = xen_host_pci_get_block(&s->real_device, addr,
+                                (uint8_t *)&read_val, len);
+    if (rc < 0) {
+        XEN_PT_ERR(d, "pci_read_block failed. return value: %d.\n", rc);
+        memset(&read_val, 0xff, len);
+    }
+
+    /* pass directly to the real device for passthrough type register group */
+    if (reg_grp_entry == NULL) {
+        goto out;
+    }
+
+    memory_region_transaction_begin();
+    pci_default_write_config(d, addr, val, len);
+
+    /* adjust the read and write value to appropriate CFC-CFF window */
+    read_val <<= (addr & 3) << 3;
+    val <<= (addr & 3) << 3;
+    emul_len = len;
+
+    /* loop around the guest requested size */
+    while (emul_len > 0) {
+        /* find register entry to be emulated */
+        reg_entry = xen_pt_find_reg(reg_grp_entry, find_addr);
+        if (reg_entry) {
+            reg = reg_entry->reg;
+            uint32_t real_offset = reg_grp_entry->base_offset + reg->offset;
+            uint32_t valid_mask = 0xFFFFFFFF >> ((4 - emul_len) << 3);
+            uint8_t *ptr_val = NULL;
+
+            valid_mask <<= (find_addr - real_offset) << 3;
+            ptr_val = (uint8_t *)&val + (real_offset & 3);
+
+            /* do emulation based on register size */
+            switch (reg->size) {
+            case 1:
+                if (reg->u.b.write) {
+                    rc = reg->u.b.write(s, reg_entry, ptr_val,
+                                        read_val >> ((real_offset & 3) << 3),
+                                        valid_mask);
+                }
+                break;
+            case 2:
+                if (reg->u.w.write) {
+                    rc = reg->u.w.write(s, reg_entry, (uint16_t *)ptr_val,
+                                        (read_val >> ((real_offset & 3) << 3)),
+                                        valid_mask);
+                }
+                break;
+            case 4:
+                if (reg->u.dw.write) {
+                    rc = reg->u.dw.write(s, reg_entry, (uint32_t *)ptr_val,
+                                         (read_val >> ((real_offset & 3) << 3)),
+                                         valid_mask);
+                }
+                break;
+            }
+
+            if (rc < 0) {
+                xen_shutdown_fatal_error("Internal error: Invalid write"
+                                         " emulation. (%s, rc: %d)\n",
+                                         __func__, rc);
+                return;
+            }
+
+            /* calculate next address to find */
+            emul_len -= reg->size;
+            if (emul_len > 0) {
+                find_addr = real_offset + reg->size;
+            }
+        } else {
+            /* nothing to do with passthrough type register,
+             * continue to find next byte */
+            emul_len--;
+            find_addr++;
+        }
+    }
+
+    /* need to shift back before passing them to xen_host_pci_device */
+    val >>= (addr & 3) << 3;
+
+    memory_region_transaction_commit();
+
+out:
+    if (!(reg && reg->no_wb)) {
+        /* unknown regs are passed through */
+        rc = xen_host_pci_set_block(&s->real_device, addr,
+                                    (uint8_t *)&val, len);
+
+        if (rc < 0) {
+            XEN_PT_ERR(d, "pci_write_block failed. return value: %d.\n", rc);
+        }
+    }
+}
+
+/* register regions */
+
+static uint64_t xen_pt_bar_read(void *o, target_phys_addr_t addr,
+                                unsigned size)
+{
+    PCIDevice *d = o;
+    /* if this function is called, that probably means that there is a
+     * misconfiguration of the IOMMU. */
+    XEN_PT_ERR(d, "Should not read BAR through QEMU. @0x"TARGET_FMT_plx"\n",
+               addr);
+    return 0;
+}
+static void xen_pt_bar_write(void *o, target_phys_addr_t addr, uint64_t val,
+                             unsigned size)
+{
+    PCIDevice *d = o;
+    /* Same comment as xen_pt_bar_read function */
+    XEN_PT_ERR(d, "Should not write BAR through QEMU. @0x"TARGET_FMT_plx"\n",
+               addr);
+}
+
+static const MemoryRegionOps ops = {
+    .endianness = DEVICE_NATIVE_ENDIAN,
+    .read = xen_pt_bar_read,
+    .write = xen_pt_bar_write,
+};
+
+static int xen_pt_register_regions(XenPCIPassthroughState *s)
+{
+    int i = 0;
+    XenHostPCIDevice *d = &s->real_device;
+
+    /* Register PIO/MMIO BARs */
+    for (i = 0; i < PCI_ROM_SLOT; i++) {
+        XenHostPCIIORegion *r = &d->io_regions[i];
+        uint8_t type;
+
+        if (r->base_addr == 0 || r->size == 0) {
+            continue;
+        }
+
+        s->bases[i].access.u = r->base_addr;
+
+        if (r->type & XEN_HOST_PCI_REGION_TYPE_IO) {
+            type = PCI_BASE_ADDRESS_SPACE_IO;
+        } else {
+            type = PCI_BASE_ADDRESS_SPACE_MEMORY;
+            if (r->type & XEN_HOST_PCI_REGION_TYPE_PREFETCH) {
+                type |= PCI_BASE_ADDRESS_MEM_PREFETCH;
+            }
+        }
+
+        memory_region_init_io(&s->bar[i], &ops, &s->dev,
+                              "xen-pci-pt-bar", r->size);
+        pci_register_bar(&s->dev, i, type, &s->bar[i]);
+
+        XEN_PT_LOG(&s->dev, "IO region %i registered (size=0x%08"PRIx64
+                   " base_addr=0x%08"PRIx64" type: %#x)\n",
+                   i, r->size, r->base_addr, type);
+    }
+
+    /* Register expansion ROM address */
+    if (d->rom.base_addr && d->rom.size) {
+        uint32_t bar_data = 0;
+
+        /* Re-set BAR reported by OS, otherwise ROM can't be read. */
+        if (xen_host_pci_get_long(d, PCI_ROM_ADDRESS, &bar_data)) {
+            return 0;
+        }
+        if ((bar_data & PCI_ROM_ADDRESS_MASK) == 0) {
+            bar_data |= d->rom.base_addr & PCI_ROM_ADDRESS_MASK;
+            xen_host_pci_set_long(d, PCI_ROM_ADDRESS, bar_data);
+        }
+
+        s->bases[PCI_ROM_SLOT].access.maddr = d->rom.base_addr;
+
+        memory_region_init_rom_device(&s->rom, NULL, NULL,
+                                      "xen-pci-pt-rom", d->rom.size);
+        pci_register_bar(&s->dev, PCI_ROM_SLOT, PCI_BASE_ADDRESS_MEM_PREFETCH,
+                         &s->rom);
+
+        XEN_PT_LOG(&s->dev, "Expansion ROM registered (size=0x%08"PRIx64
+                   " base_addr=0x%08"PRIx64")\n",
+                   d->rom.size, d->rom.base_addr);
+    }
+
+    return 0;
+}
+
+static void xen_pt_unregister_regions(XenPCIPassthroughState *s)
+{
+    XenHostPCIDevice *d = &s->real_device;
+    int i;
+
+    for (i = 0; i < PCI_NUM_REGIONS - 1; i++) {
+        XenHostPCIIORegion *r = &d->io_regions[i];
+
+        if (r->base_addr == 0 || r->size == 0) {
+            continue;
+        }
+
+        memory_region_destroy(&s->bar[i]);
+    }
+    if (d->rom.base_addr && d->rom.size) {
+        memory_region_destroy(&s->rom);
+    }
+}
+
+/* region mapping */
+
+static int xen_pt_bar_from_region(XenPCIPassthroughState *s, MemoryRegion *mr)
+{
+    int i = 0;
+
+    for (i = 0; i < PCI_NUM_REGIONS - 1; i++) {
+        if (mr == &s->bar[i]) {
+            return i;
+        }
+    }
+    if (mr == &s->rom) {
+        return PCI_ROM_SLOT;
+    }
+    return -1;
+}
+
+/*
+ * This function checks if an io_region overlaps an io_region from another
+ * device.  The io_region to check is passed in opaque as a struct
+ * CheckBarArgs (addr, size and type).  It is meant to be used as a
+ * pci_for_each_device() callback and sets arg->rc to true if the region is
+ * overlapped. */
+struct CheckBarArgs {
+    XenPCIPassthroughState *s;
+    pcibus_t addr;
+    pcibus_t size;
+    uint8_t type;
+    bool rc;
+};
+static void xen_pt_check_bar_overlap(PCIBus *bus, PCIDevice *d, void *opaque)
+{
+    struct CheckBarArgs *arg = opaque;
+    XenPCIPassthroughState *s = arg->s;
+    uint8_t type = arg->type;
+    int i;
+
+    if (d->devfn == s->dev.devfn) {
+        return;
+    }
+
+    /* xxx: This ignores bridges. */
+    for (i = 0; i < PCI_NUM_REGIONS; i++) {
+        const PCIIORegion *r = &d->io_regions[i];
+
+        if (!r->size) {
+            continue;
+        }
+        if ((type & PCI_BASE_ADDRESS_SPACE_IO)
+            != (r->type & PCI_BASE_ADDRESS_SPACE_IO)) {
+            continue;
+        }
+
+        if (ranges_overlap(arg->addr, arg->size, r->addr, r->size)) {
+            XEN_PT_WARN(&s->dev,
+                        "Overlapped to device [%02x:%02x.%d] Region: %i"
+                        " (addr: %#"FMT_PCIBUS", len: %#"FMT_PCIBUS")\n",
+                        pci_bus_num(bus), PCI_SLOT(d->devfn),
+                        PCI_FUNC(d->devfn), i, r->addr, r->size);
+            arg->rc = true;
+        }
+    }
+}
+
+static void xen_pt_region_update(XenPCIPassthroughState *s,
+                                 MemoryRegionSection *sec, bool adding)
+{
+    PCIDevice *d = &s->dev;
+    MemoryRegion *mr = sec->mr;
+    int bar = -1;
+    int rc;
+    int op = adding ? DPCI_ADD_MAPPING : DPCI_REMOVE_MAPPING;
+    struct CheckBarArgs args = {
+        .s = s,
+        .addr = sec->offset_within_address_space,
+        .size = sec->size,
+        .rc = false,
+    };
+
+    bar = xen_pt_bar_from_region(s, mr);
+    if (bar == -1) {
+        return;
+    }
+
+    args.type = d->io_regions[bar].type;
+    pci_for_each_device(d->bus, pci_bus_num(d->bus),
+                        xen_pt_check_bar_overlap, &args);
+    if (args.rc) {
+        XEN_PT_WARN(d, "Region: %d (addr: %#"FMT_PCIBUS
+                    ", len: %#"FMT_PCIBUS") is overlapped.\n",
+                    bar, sec->offset_within_address_space, sec->size);
+    }
+
+    if (d->io_regions[bar].type & PCI_BASE_ADDRESS_SPACE_IO) {
+        uint32_t guest_port = sec->offset_within_address_space;
+        uint32_t machine_port = s->bases[bar].access.pio_base;
+        uint32_t size = sec->size;
+        rc = xc_domain_ioport_mapping(xen_xc, xen_domid,
+                                      guest_port, machine_port, size,
+                                      op);
+        if (rc) {
+            XEN_PT_ERR(d, "%s ioport mapping failed! (rc: %i)\n",
+                       adding ? "create new" : "remove old", rc);
+        }
+    } else {
+        pcibus_t guest_addr = sec->offset_within_address_space;
+        pcibus_t machine_addr = s->bases[bar].access.maddr
+            + sec->offset_within_region;
+        pcibus_t size = sec->size;
+        rc = xc_domain_memory_mapping(xen_xc, xen_domid,
+                                      XEN_PFN(guest_addr + XC_PAGE_SIZE - 1),
+                                      XEN_PFN(machine_addr + XC_PAGE_SIZE - 1),
+                                      XEN_PFN(size + XC_PAGE_SIZE - 1),
+                                      op);
+        if (rc) {
+            XEN_PT_ERR(d, "%s mem mapping failed! (rc: %i)\n",
+                       adding ? "create new" : "remove old", rc);
+        }
+    }
+}
+
+static void xen_pt_begin(MemoryListener *l)
+{
+}
+
+static void xen_pt_commit(MemoryListener *l)
+{
+}
+
+static void xen_pt_region_add(MemoryListener *l, MemoryRegionSection *sec)
+{
+    XenPCIPassthroughState *s = container_of(l, XenPCIPassthroughState,
+                                             memory_listener);
+
+    xen_pt_region_update(s, sec, true);
+}
+
+static void xen_pt_region_del(MemoryListener *l, MemoryRegionSection *sec)
+{
+    XenPCIPassthroughState *s = container_of(l, XenPCIPassthroughState,
+                                             memory_listener);
+
+    xen_pt_region_update(s, sec, false);
+}
+
+static void xen_pt_region_nop(MemoryListener *l, MemoryRegionSection *s)
+{
+}
+
+static void xen_pt_log_fns(MemoryListener *l, MemoryRegionSection *s)
+{
+}
+
+static void xen_pt_log_global_fns(MemoryListener *l)
+{
+}
+
+static void xen_pt_eventfd_fns(MemoryListener *l, MemoryRegionSection *s,
+                               bool match_data, uint64_t data, int fd)
+{
+}
+
+static const MemoryListener xen_pt_memory_listener = {
+    .begin = xen_pt_begin,
+    .commit = xen_pt_commit,
+    .region_add = xen_pt_region_add,
+    .region_nop = xen_pt_region_nop,
+    .region_del = xen_pt_region_del,
+    .log_start = xen_pt_log_fns,
+    .log_stop = xen_pt_log_fns,
+    .log_sync = xen_pt_log_fns,
+    .log_global_start = xen_pt_log_global_fns,
+    .log_global_stop = xen_pt_log_global_fns,
+    .eventfd_add = xen_pt_eventfd_fns,
+    .eventfd_del = xen_pt_eventfd_fns,
+    .priority = 10,
+};
+
+/* init */
+
+static int xen_pt_initfn(PCIDevice *d)
+{
+    XenPCIPassthroughState *s = DO_UPCAST(XenPCIPassthroughState, dev, d);
+    int rc = 0;
+    uint8_t machine_irq = 0;
+    int pirq = XEN_PT_UNASSIGNED_PIRQ;
+
+    /* register real device */
+    XEN_PT_LOG(d, "Assigning real physical device %02x:%02x.%d"
+               " to devfn %#x\n",
+               s->hostaddr.bus, s->hostaddr.slot, s->hostaddr.function,
+               s->dev.devfn);
+
+    rc = xen_host_pci_device_get(&s->real_device,
+                                 s->hostaddr.domain, s->hostaddr.bus,
+                                 s->hostaddr.slot, s->hostaddr.function);
+    if (rc) {
+        XEN_PT_ERR(d, "Failed to \"open\" the real pci device. rc: %i\n", rc);
+        return -1;
+    }
+
+    s->is_virtfn = s->real_device.is_virtfn;
+    if (s->is_virtfn) {
+        XEN_PT_LOG(d, "%04x:%02x:%02x.%d is a SR-IOV Virtual Function\n",
+                   s->hostaddr.domain, s->hostaddr.bus,
+                   s->hostaddr.slot, s->hostaddr.function);
+    }
+
+    /* Initialize virtualized PCI configuration (first 256 bytes) */
+    if (xen_host_pci_get_block(&s->real_device, 0, d->config,
+                               PCI_CONFIG_SPACE_SIZE) < 0) {
+        xen_host_pci_device_put(&s->real_device);
+        return -1;
+    }
+
+    s->memory_listener = xen_pt_memory_listener;
+
+    /* Handle real device's MMIO/PIO BARs */
+    xen_pt_register_regions(s);
+
+    /* Bind interrupt */
+    if (!s->dev.config[PCI_INTERRUPT_PIN]) {
+        XEN_PT_LOG(d, "no pin interrupt\n");
+        goto out;
+    }
+
+    machine_irq = s->real_device.irq;
+    rc = xc_physdev_map_pirq(xen_xc, xen_domid, machine_irq, &pirq);
+
+    if (rc < 0) {
+        XEN_PT_ERR(d, "Mapping machine irq %u to pirq %i failed, (rc: %d)\n",
+                   machine_irq, pirq, rc);
+
+        /* Disable PCI INTx assertion (set bit 10 of PCI_COMMAND) */
+        xen_host_pci_set_word(&s->real_device,
+                              PCI_COMMAND,
+                              pci_get_word(s->dev.config + PCI_COMMAND)
+                              | PCI_COMMAND_INTX_DISABLE);
+        machine_irq = 0;
+        s->machine_irq = 0;
+    } else {
+        machine_irq = pirq;
+        s->machine_irq = pirq;
+        xen_pt_mapped_machine_irq[machine_irq]++;
+    }
+
+    /* bind machine_irq to device */
+    if (machine_irq != 0) {
+        uint8_t e_intx = xen_pt_pci_intx(s);
+
+        rc = xc_domain_bind_pt_pci_irq(xen_xc, xen_domid, machine_irq,
+                                       pci_bus_num(d->bus),
+                                       PCI_SLOT(d->devfn),
+                                       e_intx);
+        if (rc < 0) {
+            XEN_PT_ERR(d, "Binding of interrupt %i failed! (rc: %d)\n",
+                       e_intx, rc);
+
+            /* Disable PCI INTx assertion (set bit 10 of PCI_COMMAND) */
+            xen_host_pci_set_word(&s->real_device, PCI_COMMAND,
+                                  pci_get_word(s->dev.config + PCI_COMMAND)
+                                  | PCI_COMMAND_INTX_DISABLE);
+            xen_pt_mapped_machine_irq[machine_irq]--;
+
+            if (xen_pt_mapped_machine_irq[machine_irq] == 0) {
+                rc = xc_physdev_unmap_pirq(xen_xc, xen_domid, machine_irq);
+                if (rc) {
+                    XEN_PT_ERR(d, "Unmapping of machine interrupt %i failed!"
+                               " (rc: %d)\n", machine_irq, rc);
+                }
+            }
+            s->machine_irq = 0;
+        }
+    }
+
+out:
+    memory_listener_register(&s->memory_listener, NULL);
+    XEN_PT_LOG(d, "Real physical device %02x:%02x.%d registered successfully\n",
+               s->hostaddr.bus, s->hostaddr.slot, s->hostaddr.function);
+
+    return 0;
+}
+
+static int xen_pt_unregister_device(PCIDevice *d)
+{
+    XenPCIPassthroughState *s = DO_UPCAST(XenPCIPassthroughState, dev, d);
+    uint8_t machine_irq = s->machine_irq;
+    uint8_t intx = xen_pt_pci_intx(s);
+    int rc;
+
+    if (machine_irq) {
+        rc = xc_domain_unbind_pt_irq(xen_xc, xen_domid, machine_irq,
+                                     PT_IRQ_TYPE_PCI,
+                                     pci_bus_num(d->bus),
+                                     PCI_SLOT(s->dev.devfn),
+                                     intx,
+                                     0 /* isa_irq */);
+        if (rc < 0) {
+            XEN_PT_ERR(d, "unbinding of interrupt INT%c failed."
+                       " (machine irq: %i, rc: %d)"
+                       " But bravely continuing on..\n",
+                       'a' + intx, machine_irq, rc);
+        }
+    }
+
+    if (machine_irq) {
+        xen_pt_mapped_machine_irq[machine_irq]--;
+
+        if (xen_pt_mapped_machine_irq[machine_irq] == 0) {
+            rc = xc_physdev_unmap_pirq(xen_xc, xen_domid, machine_irq);
+
+            if (rc < 0) {
+                XEN_PT_ERR(d, "unmapping of interrupt %i failed. (rc: %d)"
+                           " But bravely continuing on..\n",
+                           machine_irq, rc);
+            }
+        }
+    }
+
+    xen_pt_unregister_regions(s);
+    memory_listener_unregister(&s->memory_listener);
+
+    xen_host_pci_device_put(&s->real_device);
+
+    return 0;
+}
+
+static Property xen_pci_passthrough_properties[] = {
+    DEFINE_PROP_PCI_HOST_DEVADDR("hostaddr", XenPCIPassthroughState, hostaddr),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void xen_pci_passthrough_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+    k->init = xen_pt_initfn;
+    k->exit = xen_pt_unregister_device;
+    k->config_read = xen_pt_pci_read_config;
+    k->config_write = xen_pt_pci_write_config;
+    dc->desc = "Assign a host PCI device with Xen";
+    dc->props = xen_pci_passthrough_properties;
+}
+
+static TypeInfo xen_pci_passthrough_info = {
+    .name = "xen-pci-passthrough",
+    .parent = TYPE_PCI_DEVICE,
+    .instance_size = sizeof(XenPCIPassthroughState),
+    .class_init = xen_pci_passthrough_class_init,
+};
+
+static void xen_pci_passthrough_register_types(void)
+{
+    type_register_static(&xen_pci_passthrough_info);
+}
+
+type_init(xen_pci_passthrough_register_types)
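A minimal standalone sketch of the checks that xen_pt_pci_config_access_check() is meant to perform: the offset must lie within 0x00-0xFF, the width must be 1, 2 or 4 bytes, and the access must be naturally aligned. conf_access_ok() and CONF_SPACE_SIZE are hypothetical names used only for illustration; nothing here comes from QEMU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Size of the conventional (non-extended) PCI configuration space. */
#define CONF_SPACE_SIZE 256u

/* Hypothetical helper: returns true when a config space access of len
 * bytes at offset addr is acceptable. */
static bool conf_access_ok(uint32_t addr, int len)
{
    /* offsets 0x00..0xFF are valid */
    if (addr >= CONF_SPACE_SIZE) {
        return false;
    }
    /* only byte, word and dword accesses */
    if (len != 1 && len != 2 && len != 4) {
        return false;
    }
    /* len is a power of two here, so len - 1 selects the misaligned bits */
    if (addr & (uint32_t)(len - 1)) {
        return false;
    }
    return true;
}
```

Because an accepted access is naturally aligned, an access that starts inside the 256-byte space also ends inside it; for example offset 0xFC with len 4 covers 0xFC-0xFF.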
diff --git a/hw/xen_pt.h b/hw/xen_pt.h
new file mode 100644
index 0000000..36001a7
--- /dev/null
+++ b/hw/xen_pt.h
@@ -0,0 +1,248 @@
+#ifndef XEN_PT_H
+#define XEN_PT_H
+
+#include "qemu-common.h"
+#include "xen_common.h"
+#include "pci.h"
+#include "xen-host-pci-device.h"
+
+void xen_pt_log(const PCIDevice *d, const char *f, ...) GCC_FMT_ATTR(2, 3);
+
+#define XEN_PT_ERR(d, _f, _a...) xen_pt_log(d, "%s: Error: "_f, __func__, ##_a)
+
+#ifdef XEN_PT_LOGGING_ENABLED
+#  define XEN_PT_LOG(d, _f, _a...)  xen_pt_log(d, "%s: " _f, __func__, ##_a)
+#  define XEN_PT_WARN(d, _f, _a...) \
+    xen_pt_log(d, "%s: Warning: "_f, __func__, ##_a)
+#else
+#  define XEN_PT_LOG(d, _f, _a...)
+#  define XEN_PT_WARN(d, _f, _a...)
+#endif
+
+#ifdef XEN_PT_DEBUG_PCI_CONFIG_ACCESS
+#  define XEN_PT_LOG_CONFIG(d, addr, val, len) \
+    xen_pt_log(d, "%s: address=0x%04x val=0x%08x len=%d\n", \
+               __func__, addr, val, len)
+#else
+#  define XEN_PT_LOG_CONFIG(d, addr, val, len)
+#endif
+
+
+/* Helper */
+#define XEN_PFN(x) ((x) >> XC_PAGE_SHIFT)
+
+typedef struct XenPTRegInfo XenPTRegInfo;
+typedef struct XenPTReg XenPTReg;
+
+typedef struct XenPCIPassthroughState XenPCIPassthroughState;
+
+/* function type for config reg */
+typedef int (*xen_pt_conf_reg_init)
+    (XenPCIPassthroughState *, XenPTRegInfo *, uint32_t real_offset,
+     uint32_t *data);
+typedef int (*xen_pt_conf_dword_write)
+    (XenPCIPassthroughState *, XenPTReg *cfg_entry,
+     uint32_t *val, uint32_t dev_value, uint32_t valid_mask);
+typedef int (*xen_pt_conf_word_write)
+    (XenPCIPassthroughState *, XenPTReg *cfg_entry,
+     uint16_t *val, uint16_t dev_value, uint16_t valid_mask);
+typedef int (*xen_pt_conf_byte_write)
+    (XenPCIPassthroughState *, XenPTReg *cfg_entry,
+     uint8_t *val, uint8_t dev_value, uint8_t valid_mask);
+typedef int (*xen_pt_conf_dword_read)
+    (XenPCIPassthroughState *, XenPTReg *cfg_entry,
+     uint32_t *val, uint32_t valid_mask);
+typedef int (*xen_pt_conf_word_read)
+    (XenPCIPassthroughState *, XenPTReg *cfg_entry,
+     uint16_t *val, uint16_t valid_mask);
+typedef int (*xen_pt_conf_byte_read)
+    (XenPCIPassthroughState *, XenPTReg *cfg_entry,
+     uint8_t *val, uint8_t valid_mask);
+
+#define XEN_PT_BAR_ALLF 0xFFFFFFFF
+#define XEN_PT_BAR_UNMAPPED (-1)
+
+
+typedef enum {
+    XEN_PT_GRP_TYPE_HARDWIRED = 0,  /* 0 Hardwired reg group */
+    XEN_PT_GRP_TYPE_EMU,            /* emul reg group */
+} XenPTRegisterGroupType;
+
+typedef enum {
+    XEN_PT_BAR_FLAG_MEM = 0,        /* Memory type BAR */
+    XEN_PT_BAR_FLAG_IO,             /* I/O type BAR */
+    XEN_PT_BAR_FLAG_UPPER,          /* upper 64bit BAR */
+    XEN_PT_BAR_FLAG_UNUSED,         /* unused BAR */
+} XenPTBarFlag;
+
+
+typedef struct XenPTRegion {
+    /* BAR flag */
+    XenPTBarFlag bar_flag;
+    /* Translation of the emulated address */
+    union {
+        uint64_t maddr;
+        uint64_t pio_base;
+        uint64_t u;
+    } access;
+} XenPTRegion;
+
+/* XenPTRegInfo declaration
+ * - only for emulated registers (either some bits or the whole register).
+ * - for passthrough registers that need special behavior (like interacting
+ *   with other components), set emu_mask to all 0 and specify r/w functions.
+ * - do NOT use all Fs for init_val, otherwise the entry won't be registered.
+ */
+
+/* emulated register information */
+struct XenPTRegInfo {
+    uint32_t offset;
+    uint32_t size;
+    uint32_t init_val;
+    /* reg read only field mask (ON:RO/ROS, OFF:other) */
+    uint32_t ro_mask;
+    /* reg emulate field mask (ON:emu, OFF:passthrough) */
+    uint32_t emu_mask;
+    /* no write back allowed */
+    uint32_t no_wb;
+    xen_pt_conf_reg_init init;
+    /* read/write function pointer
+     * for double_word/word/byte size */
+    union {
+        struct {
+            xen_pt_conf_dword_write write;
+            xen_pt_conf_dword_read read;
+        } dw;
+        struct {
+            xen_pt_conf_word_write write;
+            xen_pt_conf_word_read read;
+        } w;
+        struct {
+            xen_pt_conf_byte_write write;
+            xen_pt_conf_byte_read read;
+        } b;
+    } u;
+};
+
+/* emulated register management */
+struct XenPTReg {
+    QLIST_ENTRY(XenPTReg) entries;
+    XenPTRegInfo *reg;
+    uint32_t data; /* emulated value */
+};
+
+typedef struct XenPTRegGroupInfo XenPTRegGroupInfo;
+
+/* emul reg group size initialize method */
+typedef int (*xen_pt_reg_size_init_fn)
+    (XenPCIPassthroughState *, const XenPTRegGroupInfo *,
+     uint32_t base_offset, uint8_t *size);
+
+/* emulated register group information */
+struct XenPTRegGroupInfo {
+    uint8_t grp_id;
+    XenPTRegisterGroupType grp_type;
+    uint8_t grp_size;
+    xen_pt_reg_size_init_fn size_init;
+    XenPTRegInfo *emu_regs;
+};
+
+/* emul register group management table */
+typedef struct XenPTRegGroup {
+    QLIST_ENTRY(XenPTRegGroup) entries;
+    const XenPTRegGroupInfo *reg_grp;
+    uint32_t base_offset;
+    uint8_t size;
+    QLIST_HEAD(, XenPTReg) reg_tbl_list;
+} XenPTRegGroup;
+
+
+#define XEN_PT_UNASSIGNED_PIRQ (-1)
+
+struct XenPCIPassthroughState {
+    PCIDevice dev;
+
+    PCIHostDeviceAddress hostaddr;
+    bool is_virtfn;
+    XenHostPCIDevice real_device;
+    XenPTRegion bases[PCI_NUM_REGIONS]; /* Access regions */
+    QLIST_HEAD(, XenPTRegGroup) reg_grps;
+
+    uint32_t machine_irq;
+
+    MemoryRegion bar[PCI_NUM_REGIONS - 1];
+    MemoryRegion rom;
+
+    MemoryListener memory_listener;
+};
+
+int xen_pt_config_init(XenPCIPassthroughState *s);
+void xen_pt_config_delete(XenPCIPassthroughState *s);
+XenPTRegGroup *xen_pt_find_reg_grp(XenPCIPassthroughState *s, uint32_t address);
+XenPTReg *xen_pt_find_reg(XenPTRegGroup *reg_grp, uint32_t address);
+int xen_pt_bar_offset_to_index(uint32_t offset);
+
+static inline pcibus_t xen_pt_get_emul_size(XenPTBarFlag flag, pcibus_t r_size)
+{
+    /* align resource size (memory type only) */
+    if (flag == XEN_PT_BAR_FLAG_MEM) {
+        return (r_size + XC_PAGE_SIZE - 1) & XC_PAGE_MASK;
+    } else {
+        return r_size;
+    }
+}
+
+/* INTx */
+/* The PCI Local Bus Specification, Rev. 3.0,
+ * Section 6.2.4 Miscellaneous Registers, pp 223
+ * outlines 5 valid values for the interrupt pin (intx).
+ *  0: For devices (or device functions) that don't use an interrupt pin
+ *  1: INTA#
+ *  2: INTB#
+ *  3: INTC#
+ *  4: INTD#
+ *
+ * Xen uses the following 4 values for intx
+ *  0: INTA#
+ *  1: INTB#
+ *  2: INTC#
+ *  3: INTD#
+ *
+ * Since these lists of values differ, xen_pt_pci_intx() uses the following
+ * mapping from hardware to Xen values.
+ * This seems to reflect the current usage within Xen.
+ *
+ * PCI hardware    | Xen | Notes
+ * ----------------+-----+----------------------------------------------------
+ * 0               | 0   | No interrupt
+ * 1               | 0   | INTA#
+ * 2               | 1   | INTB#
+ * 3               | 2   | INTC#
+ * 4               | 3   | INTD#
+ * any other value | 0   | This should never happen, log error message
+ */
+
+static inline uint8_t xen_pt_pci_read_intx(XenPCIPassthroughState *s)
+{
+    uint8_t v = 0;
+    xen_host_pci_get_byte(&s->real_device, PCI_INTERRUPT_PIN, &v);
+    return v;
+}
+
+static inline uint8_t xen_pt_pci_intx(XenPCIPassthroughState *s)
+{
+    uint8_t r_val = xen_pt_pci_read_intx(s);
+
+    XEN_PT_LOG(&s->dev, "intx=%i\n", r_val);
+    if (r_val < 1 || r_val > 4) {
+        XEN_PT_LOG(&s->dev, "Interrupt pin read from hardware is out of range:"
+                   " value=%i, acceptable range is 1 - 4\n", r_val);
+        r_val = 0;
+    } else {
+        r_val -= 1;
+    }
+
+    return r_val;
+}
+
+#endif /* !XEN_PT_H */
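The hardware-to-Xen INTx table in the comment above xen_pt_pci_intx() reduces to a small pure function. The sketch below is hypothetical (pin_to_xen_intx() is not part of the patch) and leaves out the logging; it only shows the value translation from the table.

```c
#include <stdint.h>

/* Hypothetical helper: translate the PCI Interrupt Pin register value
 * (1 = INTA# .. 4 = INTD#) to Xen's 0-based numbering (0 = INTA# ..
 * 3 = INTD#).  0 (no interrupt pin) and out-of-range values map to 0. */
static uint8_t pin_to_xen_intx(uint8_t pin)
{
    if (pin < 1 || pin > 4) {
        return 0;
    }
    return (uint8_t)(pin - 1);
}
```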
diff --git a/hw/xen_pt_config_init.c b/hw/xen_pt_config_init.c
new file mode 100644
index 0000000..64d22e8
--- /dev/null
+++ b/hw/xen_pt_config_init.c
@@ -0,0 +1,11 @@
+#include "xen_pt.h"
+
+XenPTRegGroup *xen_pt_find_reg_grp(XenPCIPassthroughState *s, uint32_t address)
+{
+    return NULL;
+}
+
+XenPTReg *xen_pt_find_reg(XenPTRegGroup *reg_grp, uint32_t address)
+{
+    return NULL;
+}
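xen_pt_get_emul_size() in xen_pt.h rounds memory BAR sizes up to a whole number of Xen pages and returns other BAR sizes unchanged. A standalone sketch, assuming 4 KiB pages (an XC_PAGE_SHIFT of 12, as on x86); emul_size() and the PG_* macros are illustrative names, not part of the patch.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, i.e. XC_PAGE_SHIFT == 12. */
#define PG_SHIFT 12
#define PG_SIZE  (UINT64_C(1) << PG_SHIFT)
#define PG_MASK  (~(PG_SIZE - 1))

/* Round a memory BAR size up to the next page boundary; leave other
 * (I/O) BAR sizes untouched. */
static uint64_t emul_size(int is_mem_bar, uint64_t r_size)
{
    if (is_mem_bar) {
        return (r_size + PG_SIZE - 1) & PG_MASK;
    }
    return r_size;
}
```

A 256-byte memory BAR is therefore emulated as a full 4 KiB page, while a 32-byte I/O BAR keeps its size.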
diff --git a/xen-all.c b/xen-all.c
index b5220cc..59f2323 100644
--- a/xen-all.c
+++ b/xen-all.c
@@ -1191,3 +1191,15 @@ void xen_register_framebuffer(MemoryRegion *mr)
 {
     framebuffer = mr;
 }
+
+void xen_shutdown_fatal_error(const char *fmt, ...)
+{
+    va_list ap;
+
+    va_start(ap, fmt);
+    vfprintf(stderr, fmt, ap);
+    va_end(ap);
+    fprintf(stderr, "Will destroy the domain.\n");
+    /* destroy the domain */
+    qemu_system_shutdown_request();
+}
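In xen_pt_region_update(), XEN_PFN() turns a byte address into a page frame number, and adding XC_PAGE_SIZE - 1 before the shift rounds up. For the size argument this yields the number of pages covering the region; for a page-aligned address it gives the same frame as XEN_PFN(addr). A sketch under the same 4 KiB page assumption, with hypothetical helper names pfn() and pages_for():

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, i.e. XC_PAGE_SHIFT == 12. */
#define PG_SHIFT 12
#define PG_SIZE  (UINT64_C(1) << PG_SHIFT)

/* Page frame number containing byte address x, like XEN_PFN(x). */
static uint64_t pfn(uint64_t x)
{
    return x >> PG_SHIFT;
}

/* Pages needed to cover size bytes: XEN_PFN(size + XC_PAGE_SIZE - 1). */
static uint64_t pages_for(uint64_t size)
{
    return pfn(size + PG_SIZE - 1);
}
```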
-- 
Anthony PERARD

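The config read and write paths in xen_pt.c shift the value into its byte position inside the 4-byte CFC-CFF data window, and build valid_mask to select only the bytes the guest accessed. The helpers below are a hypothetical standalone illustration of that arithmetic; to_dword_window(), from_dword_window() and access_mask() are not QEMU functions.

```c
#include <stdint.h>

/* Move a value read at config offset addr to its byte lane inside the
 * aligned 32-bit window (val <<= (addr & 3) << 3 in the patch). */
static uint32_t to_dword_window(uint32_t val, uint32_t addr)
{
    return val << ((addr & 3) << 3);
}

/* Inverse shift, applied before the value goes back to the caller. */
static uint32_t from_dword_window(uint32_t val, uint32_t addr)
{
    return val >> ((addr & 3) << 3);
}

/* Mask with emul_len (1..4) low bytes set, moved up by byte_off bytes;
 * byte_off plays the role of find_addr - real_offset. */
static uint32_t access_mask(int emul_len, uint32_t byte_off)
{
    uint32_t mask = 0xFFFFFFFFu >> ((4 - emul_len) << 3);
    return mask << (byte_off << 3);
}
```

The patch then reaches the emulated register through a uint8_t pointer into the shifted value, which appears to rely on a little-endian host.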

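xen_pt_bar_offset_to_index() maps a config space offset to a BAR slot: 0x30 is the expansion ROM, and offsets from 0x10 upwards are divided into 4-byte BAR registers. The sketch below is hypothetical (bar_offset_to_index() and its macros are illustrative) and deliberately accepts only the six BAR offsets 0x10-0x24 plus 0x30; the patch compares the index against PCI_NUM_REGIONS, so there offset 0x28 also yields index 6.

```c
#include <stdint.h>

/* Standard type 0 header offsets and slot numbers. */
#define BAR0_OFFSET    0x10u /* PCI_BASE_ADDRESS_0 */
#define ROM_BAR_OFFSET 0x30u /* PCI_ROM_ADDRESS */
#define NUM_BARS       6u    /* BAR0..BAR5 */
#define ROM_SLOT       6     /* PCI_ROM_SLOT */

/* Return the BAR index for a config offset, or -1 if it is not a BAR. */
static int bar_offset_to_index(uint32_t offset)
{
    uint32_t index;

    if (offset == ROM_BAR_OFFSET) {
        return ROM_SLOT;
    }
    /* offsets below 0x10 wrap to a large unsigned value and are rejected */
    index = (offset - BAR0_OFFSET) >> 2;
    if (index >= NUM_BARS) {
        return -1;
    }
    return (int)index;
}
```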
* [PATCH V13 6/9] Introduce Xen PCI Passthrough, qdevice (1/3)
@ 2012-06-14 17:01   ` Anthony PERARD
From: Anthony PERARD @ 2012-06-14 17:01 UTC (permalink / raw)
  To: QEMU-devel
  Cc: Anthony Liguori, Stefano Stabellini, Jan Kiszka,
	Michael S. Tsirkin, Allen Kay, Xen Devel, Guy Zana,
	Anthony PERARD

From: Allen Kay <allen.m.kay@intel.com>

A more complete history can be found here:
git://xenbits.xensource.com/qemu-xen-unstable.git

Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Signed-off-by: Guy Zana <guy@neocleus.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 hw/i386/Makefile.objs   |    1 +
 hw/xen_common.h         |    3 +
 hw/xen_pt.c             |  812 +++++++++++++++++++++++++++++++++++++++++++++++
 hw/xen_pt.h             |  248 +++++++++++++++
 hw/xen_pt_config_init.c |   11 +
 xen-all.c               |   12 +
 6 files changed, 1087 insertions(+), 0 deletions(-)
 create mode 100644 hw/xen_pt.c
 create mode 100644 hw/xen_pt.h
 create mode 100644 hw/xen_pt_config_init.c

diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index b719d8e..e361a92 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -8,6 +8,7 @@ obj-y += pc_piix.o
 obj-y += pc_sysfw.o
 obj-$(CONFIG_XEN) += xen_platform.o xen_apic.o
 obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen-host-pci-device.o
+obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen_pt.o xen_pt_config_init.o
 obj-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o kvm/i8259.o kvm/ioapic.o kvm/i8254.o
 obj-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 
diff --git a/hw/xen_common.h b/hw/xen_common.h
index fe7f227..03b0bb1 100644
--- a/hw/xen_common.h
+++ b/hw/xen_common.h
@@ -150,4 +150,7 @@ static inline int xen_xc_hvm_inject_msi(XenXC xen_xc, domid_t dom,
 
 void destroy_hvm_domain(bool reboot);
 
+/* shutdown/destroy current domain because of an error */
+void xen_shutdown_fatal_error(const char *fmt, ...) GCC_FMT_ATTR(1, 2);
+
 #endif /* QEMU_HW_XEN_COMMON_H */
diff --git a/hw/xen_pt.c b/hw/xen_pt.c
new file mode 100644
index 0000000..63a5c80
--- /dev/null
+++ b/hw/xen_pt.c
@@ -0,0 +1,812 @@
+/*
+ * Copyright (c) 2007, Neocleus Corporation.
+ * Copyright (c) 2007, Intel Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Alex Novik <alex@neocleus.com>
+ * Allen Kay <allen.m.kay@intel.com>
+ * Guy Zana <guy@neocleus.com>
+ *
+ * This file implements direct PCI assignment to an HVM guest
+ */
+
+/*
+ * Interrupt Disable policy:
+ *
+ * INTx interrupt:
+ *   Initialize(register_real_device)
+ *     Map INTx(xc_physdev_map_pirq):
+ *       <fail>
+ *         - Set real Interrupt Disable bit to '1'.
+ *         - Set machine_irq and assigned_device->machine_irq to '0'.
+ *         * Don't bind INTx.
+ *
+ *     Bind INTx(xc_domain_bind_pt_pci_irq):
+ *       <fail>
+ *         - Set real Interrupt Disable bit to '1'.
+ *         - Unmap INTx.
+ *         - Decrement xen_pt_mapped_machine_irq[machine_irq]
+ *         - Set assigned_device->machine_irq to '0'.
+ *
+ *   Write to Interrupt Disable bit by guest software(xen_pt_cmd_reg_write)
+ *     Write '0'
+ *       - Set real bit to '0' if assigned_device->machine_irq isn't '0'.
+ *
+ *     Write '1'
+ *       - Set real bit to '1'.
+ */
+
+#include <sys/ioctl.h>
+
+#include "pci.h"
+#include "xen.h"
+#include "xen_backend.h"
+#include "xen_pt.h"
+#include "range.h"
+
+#define XEN_PT_NR_IRQS (256)
+static uint8_t xen_pt_mapped_machine_irq[XEN_PT_NR_IRQS] = {0};
+
+void xen_pt_log(const PCIDevice *d, const char *f, ...)
+{
+    va_list ap;
+
+    va_start(ap, f);
+    if (d) {
+        fprintf(stderr, "[%02x:%02x.%d] ", pci_bus_num(d->bus),
+                PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
+    }
+    vfprintf(stderr, f, ap);
+    va_end(ap);
+}
+
+/* Config Space */
+
+static int xen_pt_pci_config_access_check(PCIDevice *d, uint32_t addr, int len)
+{
+    /* check offset range */
+    if (addr > 0xFF) {
+        XEN_PT_ERR(d, "Failed to access register with offset exceeding 0xFF. "
+                   "(addr: 0x%02x, len: %d)\n", addr, len);
+        return -1;
+    }
+
+    /* check access size */
+    if ((len != 1) && (len != 2) && (len != 4)) {
+        XEN_PT_ERR(d, "Failed to access register with invalid access length. "
+                   "(addr: 0x%02x, len: %d)\n", addr, len);
+        return -1;
+    }
+
+    /* check offset alignment */
+    if (addr & (len - 1)) {
+        XEN_PT_ERR(d, "Failed to access register with invalid access size "
+                   "alignment. (addr: 0x%02x, len: %d)\n", addr, len);
+        return -1;
+    }
+
+    return 0;
+}
+
+int xen_pt_bar_offset_to_index(uint32_t offset)
+{
+    int index = 0;
+
+    /* check Exp ROM BAR */
+    if (offset == PCI_ROM_ADDRESS) {
+        return PCI_ROM_SLOT;
+    }
+
+    /* calculate BAR index */
+    index = (offset - PCI_BASE_ADDRESS_0) >> 2;
+    if (index >= PCI_NUM_REGIONS) {
+        return -1;
+    }
+
+    return index;
+}
+
+static uint32_t xen_pt_pci_read_config(PCIDevice *d, uint32_t addr, int len)
+{
+    XenPCIPassthroughState *s = DO_UPCAST(XenPCIPassthroughState, dev, d);
+    uint32_t val = 0;
+    XenPTRegGroup *reg_grp_entry = NULL;
+    XenPTReg *reg_entry = NULL;
+    int rc = 0;
+    int emul_len = 0;
+    uint32_t find_addr = addr;
+
+    if (xen_pt_pci_config_access_check(d, addr, len)) {
+        goto exit;
+    }
+
+    /* find register group entry */
+    reg_grp_entry = xen_pt_find_reg_grp(s, addr);
+    if (reg_grp_entry) {
+        /* check 0-Hardwired register group */
+        if (reg_grp_entry->reg_grp->grp_type == XEN_PT_GRP_TYPE_HARDWIRED) {
+            /* no need to emulate, just return 0 */
+            val = 0;
+            goto exit;
+        }
+    }
+
+    /* read I/O device register value */
+    rc = xen_host_pci_get_block(&s->real_device, addr, (uint8_t *)&val, len);
+    if (rc < 0) {
+        XEN_PT_ERR(d, "pci_read_block failed. return value: %d.\n", rc);
+        memset(&val, 0xff, len);
+    }
+
+    /* just return the I/O device register value for
+     * passthrough type register group */
+    if (reg_grp_entry == NULL) {
+        goto exit;
+    }
+
+    /* adjust the read value to appropriate CFC-CFF window */
+    val <<= (addr & 3) << 3;
+    emul_len = len;
+
+    /* loop around the guest requested size */
+    while (emul_len > 0) {
+        /* find register entry to be emulated */
+        reg_entry = xen_pt_find_reg(reg_grp_entry, find_addr);
+        if (reg_entry) {
+            XenPTRegInfo *reg = reg_entry->reg;
+            uint32_t real_offset = reg_grp_entry->base_offset + reg->offset;
+            uint32_t valid_mask = 0xFFFFFFFF >> ((4 - emul_len) << 3);
+            uint8_t *ptr_val = NULL;
+
+            valid_mask <<= (find_addr - real_offset) << 3;
+            ptr_val = (uint8_t *)&val + (real_offset & 3);
+
+            /* do emulation based on register size */
+            switch (reg->size) {
+            case 1:
+                if (reg->u.b.read) {
+                    rc = reg->u.b.read(s, reg_entry, ptr_val, valid_mask);
+                }
+                break;
+            case 2:
+                if (reg->u.w.read) {
+                    rc = reg->u.w.read(s, reg_entry,
+                                       (uint16_t *)ptr_val, valid_mask);
+                }
+                break;
+            case 4:
+                if (reg->u.dw.read) {
+                    rc = reg->u.dw.read(s, reg_entry,
+                                        (uint32_t *)ptr_val, valid_mask);
+                }
+                break;
+            }
+
+            if (rc < 0) {
+                xen_shutdown_fatal_error("Internal error: Invalid read "
+                                         "emulation. (%s, rc: %d)\n",
+                                         __func__, rc);
+                return 0;
+            }
+
+            /* calculate next address to find */
+            emul_len -= reg->size;
+            if (emul_len > 0) {
+                find_addr = real_offset + reg->size;
+            }
+        } else {
+            /* nothing to do with passthrough type register,
+             * continue to find next byte */
+            emul_len--;
+            find_addr++;
+        }
+    }
+
+    /* need to shift back before returning them to pci bus emulator */
+    val >>= ((addr & 3) << 3);
+
+exit:
+    XEN_PT_LOG_CONFIG(d, addr, val, len);
+    return val;
+}
+
+static void xen_pt_pci_write_config(PCIDevice *d, uint32_t addr,
+                                    uint32_t val, int len)
+{
+    XenPCIPassthroughState *s = DO_UPCAST(XenPCIPassthroughState, dev, d);
+    int index = 0;
+    XenPTRegGroup *reg_grp_entry = NULL;
+    int rc = 0;
+    uint32_t read_val = 0;
+    int emul_len = 0;
+    XenPTReg *reg_entry = NULL;
+    uint32_t find_addr = addr;
+    XenPTRegInfo *reg = NULL;
+
+    if (xen_pt_pci_config_access_check(d, addr, len)) {
+        return;
+    }
+
+    XEN_PT_LOG_CONFIG(d, addr, val, len);
+
+    /* check unused BAR register */
+    index = xen_pt_bar_offset_to_index(addr);
+    if ((index >= 0) && (val > 0 && val < XEN_PT_BAR_ALLF) &&
+        (s->bases[index].bar_flag == XEN_PT_BAR_FLAG_UNUSED)) {
+        XEN_PT_WARN(d, "Guest attempted to set address of unused Base "
+                    "Address Register. (addr: 0x%02x, len: %d)\n", addr, len);
+    }
+
+    /* find register group entry */
+    reg_grp_entry = xen_pt_find_reg_grp(s, addr);
+    if (reg_grp_entry) {
+        /* check 0-Hardwired register group */
+        if (reg_grp_entry->reg_grp->grp_type == XEN_PT_GRP_TYPE_HARDWIRED) {
+            /* ignore silently */
+            XEN_PT_WARN(d, "Access to 0-Hardwired register. "
+                        "(addr: 0x%02x, len: %d)\n", addr, len);
+            return;
+        }
+    }
+
+    rc = xen_host_pci_get_block(&s->real_device, addr,
+                                (uint8_t *)&read_val, len);
+    if (rc < 0) {
+        XEN_PT_ERR(d, "pci_read_block failed. return value: %d.\n", rc);
+        memset(&read_val, 0xff, len);
+    }
+
+    /* pass directly to the real device for passthrough type register group */
+    if (reg_grp_entry == NULL) {
+        goto out;
+    }
+
+    memory_region_transaction_begin();
+    pci_default_write_config(d, addr, val, len);
+
+    /* adjust the read and write value to appropriate CFC-CFF window */
+    read_val <<= (addr & 3) << 3;
+    val <<= (addr & 3) << 3;
+    emul_len = len;
+
+    /* loop around the guest requested size */
+    while (emul_len > 0) {
+        /* find register entry to be emulated */
+        reg_entry = xen_pt_find_reg(reg_grp_entry, find_addr);
+        if (reg_entry) {
+            reg = reg_entry->reg;
+            uint32_t real_offset = reg_grp_entry->base_offset + reg->offset;
+            uint32_t valid_mask = 0xFFFFFFFF >> ((4 - emul_len) << 3);
+            uint8_t *ptr_val = NULL;
+
+            valid_mask <<= (find_addr - real_offset) << 3;
+            ptr_val = (uint8_t *)&val + (real_offset & 3);
+
+            /* do emulation based on register size */
+            switch (reg->size) {
+            case 1:
+                if (reg->u.b.write) {
+                    rc = reg->u.b.write(s, reg_entry, ptr_val,
+                                        read_val >> ((real_offset & 3) << 3),
+                                        valid_mask);
+                }
+                break;
+            case 2:
+                if (reg->u.w.write) {
+                    rc = reg->u.w.write(s, reg_entry, (uint16_t *)ptr_val,
+                                        (read_val >> ((real_offset & 3) << 3)),
+                                        valid_mask);
+                }
+                break;
+            case 4:
+                if (reg->u.dw.write) {
+                    rc = reg->u.dw.write(s, reg_entry, (uint32_t *)ptr_val,
+                                         (read_val >> ((real_offset & 3) << 3)),
+                                         valid_mask);
+                }
+                break;
+            }
+
+            if (rc < 0) {
+                xen_shutdown_fatal_error("Internal error: Invalid write"
+                                         " emulation. (%s, rc: %d)\n",
+                                         __func__, rc);
+                return;
+            }
+
+            /* calculate next address to find */
+            emul_len -= reg->size;
+            if (emul_len > 0) {
+                find_addr = real_offset + reg->size;
+            }
+        } else {
+            /* nothing to do with passthrough type register,
+             * continue to find next byte */
+            emul_len--;
+            find_addr++;
+        }
+    }
+
+    /* need to shift back before passing them to xen_host_pci_device */
+    val >>= (addr & 3) << 3;
+
+    memory_region_transaction_commit();
+
+out:
+    if (!(reg && reg->no_wb)) {
+        /* unknown regs are passed through */
+        rc = xen_host_pci_set_block(&s->real_device, addr,
+                                    (uint8_t *)&val, len);
+
+        if (rc < 0) {
+            XEN_PT_ERR(d, "pci_write_block failed. return value: %d.\n", rc);
+        }
+    }
+}
+
+/* register regions */
+
+static uint64_t xen_pt_bar_read(void *o, target_phys_addr_t addr,
+                                unsigned size)
+{
+    PCIDevice *d = o;
+    /* if this function is called, that probably means that there is a
+     * misconfiguration of the IOMMU. */
+    XEN_PT_ERR(d, "Should not read BAR through QEMU. @0x"TARGET_FMT_plx"\n",
+               addr);
+    return 0;
+}
+static void xen_pt_bar_write(void *o, target_phys_addr_t addr, uint64_t val,
+                             unsigned size)
+{
+    PCIDevice *d = o;
+    /* Same comment as xen_pt_bar_read function */
+    XEN_PT_ERR(d, "Should not write BAR through QEMU. @0x"TARGET_FMT_plx"\n",
+               addr);
+}
+
+static const MemoryRegionOps ops = {
+    .endianness = DEVICE_NATIVE_ENDIAN,
+    .read = xen_pt_bar_read,
+    .write = xen_pt_bar_write,
+};
+
+static int xen_pt_register_regions(XenPCIPassthroughState *s)
+{
+    int i = 0;
+    XenHostPCIDevice *d = &s->real_device;
+
+    /* Register PIO/MMIO BARs */
+    for (i = 0; i < PCI_ROM_SLOT; i++) {
+        XenHostPCIIORegion *r = &d->io_regions[i];
+        uint8_t type;
+
+        if (r->base_addr == 0 || r->size == 0) {
+            continue;
+        }
+
+        s->bases[i].access.u = r->base_addr;
+
+        if (r->type & XEN_HOST_PCI_REGION_TYPE_IO) {
+            type = PCI_BASE_ADDRESS_SPACE_IO;
+        } else {
+            type = PCI_BASE_ADDRESS_SPACE_MEMORY;
+            if (r->type & XEN_HOST_PCI_REGION_TYPE_PREFETCH) {
+                type |= PCI_BASE_ADDRESS_MEM_PREFETCH;
+            }
+        }
+
+        memory_region_init_io(&s->bar[i], &ops, &s->dev,
+                              "xen-pci-pt-bar", r->size);
+        pci_register_bar(&s->dev, i, type, &s->bar[i]);
+
+        XEN_PT_LOG(&s->dev, "IO region %i registered (size=0x%08"PRIx64
+                   " base_addr=0x%08"PRIx64" type: %#x)\n",
+                   i, r->size, r->base_addr, type);
+    }
+
+    /* Register expansion ROM address */
+    if (d->rom.base_addr && d->rom.size) {
+        uint32_t bar_data = 0;
+
+        /* Re-set BAR reported by OS, otherwise ROM can't be read. */
+        if (xen_host_pci_get_long(d, PCI_ROM_ADDRESS, &bar_data)) {
+            return 0;
+        }
+        if ((bar_data & PCI_ROM_ADDRESS_MASK) == 0) {
+            bar_data |= d->rom.base_addr & PCI_ROM_ADDRESS_MASK;
+            xen_host_pci_set_long(d, PCI_ROM_ADDRESS, bar_data);
+        }
+
+        s->bases[PCI_ROM_SLOT].access.maddr = d->rom.base_addr;
+
+        memory_region_init_rom_device(&s->rom, NULL, NULL,
+                                      "xen-pci-pt-rom", d->rom.size);
+        pci_register_bar(&s->dev, PCI_ROM_SLOT, PCI_BASE_ADDRESS_MEM_PREFETCH,
+                         &s->rom);
+
+        XEN_PT_LOG(&s->dev, "Expansion ROM registered (size=0x%08"PRIx64
+                   " base_addr=0x%08"PRIx64")\n",
+                   d->rom.size, d->rom.base_addr);
+    }
+
+    return 0;
+}
+
+static void xen_pt_unregister_regions(XenPCIPassthroughState *s)
+{
+    XenHostPCIDevice *d = &s->real_device;
+    int i;
+
+    for (i = 0; i < PCI_NUM_REGIONS - 1; i++) {
+        XenHostPCIIORegion *r = &d->io_regions[i];
+
+        if (r->base_addr == 0 || r->size == 0) {
+            continue;
+        }
+
+        memory_region_destroy(&s->bar[i]);
+    }
+    if (d->rom.base_addr && d->rom.size) {
+        memory_region_destroy(&s->rom);
+    }
+}
+
+/* region mapping */
+
+static int xen_pt_bar_from_region(XenPCIPassthroughState *s, MemoryRegion *mr)
+{
+    int i = 0;
+
+    for (i = 0; i < PCI_NUM_REGIONS - 1; i++) {
+        if (mr == &s->bar[i]) {
+            return i;
+        }
+    }
+    if (mr == &s->rom) {
+        return PCI_ROM_SLOT;
+    }
+    return -1;
+}
+
+/*
+ * xen_pt_check_bar_overlap() is a pci_for_each_device() callback that checks
+ * whether an io_region overlaps an io_region of another device on the bus.
+ * The io_region to check is described by (addr, size, type) in the
+ * CheckBarArgs passed as opaque; arg->rc is set to true if any overlap is
+ * found. */
+struct CheckBarArgs {
+    XenPCIPassthroughState *s;
+    pcibus_t addr;
+    pcibus_t size;
+    uint8_t type;
+    bool rc;
+};
+static void xen_pt_check_bar_overlap(PCIBus *bus, PCIDevice *d, void *opaque)
+{
+    struct CheckBarArgs *arg = opaque;
+    XenPCIPassthroughState *s = arg->s;
+    uint8_t type = arg->type;
+    int i;
+
+    if (d->devfn == s->dev.devfn) {
+        return;
+    }
+
+    /* xxx: This ignores bridges. */
+    for (i = 0; i < PCI_NUM_REGIONS; i++) {
+        const PCIIORegion *r = &d->io_regions[i];
+
+        if (!r->size) {
+            continue;
+        }
+        if ((type & PCI_BASE_ADDRESS_SPACE_IO)
+            != (r->type & PCI_BASE_ADDRESS_SPACE_IO)) {
+            continue;
+        }
+
+        if (ranges_overlap(arg->addr, arg->size, r->addr, r->size)) {
+            XEN_PT_WARN(&s->dev,
+                        "Overlapped to device [%02x:%02x.%d] Region: %i"
+                        " (addr: %#"FMT_PCIBUS", len: %#"FMT_PCIBUS")\n",
+                        pci_bus_num(bus), PCI_SLOT(d->devfn),
+                        PCI_FUNC(d->devfn), i, r->addr, r->size);
+            arg->rc = true;
+        }
+    }
+}
+
+static void xen_pt_region_update(XenPCIPassthroughState *s,
+                                 MemoryRegionSection *sec, bool adding)
+{
+    PCIDevice *d = &s->dev;
+    MemoryRegion *mr = sec->mr;
+    int bar = -1;
+    int rc;
+    int op = adding ? DPCI_ADD_MAPPING : DPCI_REMOVE_MAPPING;
+    struct CheckBarArgs args = {
+        .s = s,
+        .addr = sec->offset_within_address_space,
+        .size = sec->size,
+        .rc = false,
+    };
+
+    bar = xen_pt_bar_from_region(s, mr);
+    if (bar == -1) {
+        return;
+    }
+
+    args.type = d->io_regions[bar].type;
+    pci_for_each_device(d->bus, pci_bus_num(d->bus),
+                        xen_pt_check_bar_overlap, &args);
+    if (args.rc) {
+        XEN_PT_WARN(d, "Region: %d (addr: %#"FMT_PCIBUS
+                    ", len: %#"FMT_PCIBUS") is overlapped.\n",
+                    bar, sec->offset_within_address_space, sec->size);
+    }
+
+    if (d->io_regions[bar].type & PCI_BASE_ADDRESS_SPACE_IO) {
+        uint32_t guest_port = sec->offset_within_address_space;
+        uint32_t machine_port = s->bases[bar].access.pio_base;
+        uint32_t size = sec->size;
+        rc = xc_domain_ioport_mapping(xen_xc, xen_domid,
+                                      guest_port, machine_port, size,
+                                      op);
+        if (rc) {
+            XEN_PT_ERR(d, "%s ioport mapping failed! (rc: %i)\n",
+                       adding ? "create new" : "remove old", rc);
+        }
+    } else {
+        pcibus_t guest_addr = sec->offset_within_address_space;
+        pcibus_t machine_addr = s->bases[bar].access.maddr
+            + sec->offset_within_region;
+        pcibus_t size = sec->size;
+        rc = xc_domain_memory_mapping(xen_xc, xen_domid,
+                                      XEN_PFN(guest_addr + XC_PAGE_SIZE - 1),
+                                      XEN_PFN(machine_addr + XC_PAGE_SIZE - 1),
+                                      XEN_PFN(size + XC_PAGE_SIZE - 1),
+                                      op);
+        if (rc) {
+            XEN_PT_ERR(d, "%s mem mapping failed! (rc: %i)\n",
+                       adding ? "create new" : "remove old", rc);
+        }
+    }
+}
+
+static void xen_pt_begin(MemoryListener *l)
+{
+}
+
+static void xen_pt_commit(MemoryListener *l)
+{
+}
+
+static void xen_pt_region_add(MemoryListener *l, MemoryRegionSection *sec)
+{
+    XenPCIPassthroughState *s = container_of(l, XenPCIPassthroughState,
+                                             memory_listener);
+
+    xen_pt_region_update(s, sec, true);
+}
+
+static void xen_pt_region_del(MemoryListener *l, MemoryRegionSection *sec)
+{
+    XenPCIPassthroughState *s = container_of(l, XenPCIPassthroughState,
+                                             memory_listener);
+
+    xen_pt_region_update(s, sec, false);
+}
+
+static void xen_pt_region_nop(MemoryListener *l, MemoryRegionSection *s)
+{
+}
+
+static void xen_pt_log_fns(MemoryListener *l, MemoryRegionSection *s)
+{
+}
+
+static void xen_pt_log_global_fns(MemoryListener *l)
+{
+}
+
+static void xen_pt_eventfd_fns(MemoryListener *l, MemoryRegionSection *s,
+                               bool match_data, uint64_t data, int fd)
+{
+}
+
+static const MemoryListener xen_pt_memory_listener = {
+    .begin = xen_pt_begin,
+    .commit = xen_pt_commit,
+    .region_add = xen_pt_region_add,
+    .region_nop = xen_pt_region_nop,
+    .region_del = xen_pt_region_del,
+    .log_start = xen_pt_log_fns,
+    .log_stop = xen_pt_log_fns,
+    .log_sync = xen_pt_log_fns,
+    .log_global_start = xen_pt_log_global_fns,
+    .log_global_stop = xen_pt_log_global_fns,
+    .eventfd_add = xen_pt_eventfd_fns,
+    .eventfd_del = xen_pt_eventfd_fns,
+    .priority = 10,
+};
+
+/* init */
+
+static int xen_pt_initfn(PCIDevice *d)
+{
+    XenPCIPassthroughState *s = DO_UPCAST(XenPCIPassthroughState, dev, d);
+    int rc = 0;
+    uint8_t machine_irq = 0;
+    int pirq = XEN_PT_UNASSIGNED_PIRQ;
+
+    /* register real device */
+    XEN_PT_LOG(d, "Assigning real physical device %02x:%02x.%d"
+               " to devfn %#x\n",
+               s->hostaddr.bus, s->hostaddr.slot, s->hostaddr.function,
+               s->dev.devfn);
+
+    rc = xen_host_pci_device_get(&s->real_device,
+                                 s->hostaddr.domain, s->hostaddr.bus,
+                                 s->hostaddr.slot, s->hostaddr.function);
+    if (rc) {
+        XEN_PT_ERR(d, "Failed to \"open\" the real pci device. rc: %i\n", rc);
+        return -1;
+    }
+
+    s->is_virtfn = s->real_device.is_virtfn;
+    if (s->is_virtfn) {
+        XEN_PT_LOG(d, "%04x:%02x:%02x.%d is an SR-IOV Virtual Function\n",
+                   s->hostaddr.domain, s->hostaddr.bus, s->hostaddr.slot,
+                   s->hostaddr.function);
+    }
+
+    /* Initialize virtualized PCI configuration (Extended 256 Bytes) */
+    if (xen_host_pci_get_block(&s->real_device, 0, d->config,
+                               PCI_CONFIG_SPACE_SIZE) == -1) {
+        xen_host_pci_device_put(&s->real_device);
+        return -1;
+    }
+
+    s->memory_listener = xen_pt_memory_listener;
+
+    /* Handle real device's MMIO/PIO BARs */
+    xen_pt_register_regions(s);
+
+    /* Bind interrupt */
+    if (!s->dev.config[PCI_INTERRUPT_PIN]) {
+        XEN_PT_LOG(d, "no pin interrupt\n");
+        goto out;
+    }
+
+    machine_irq = s->real_device.irq;
+    rc = xc_physdev_map_pirq(xen_xc, xen_domid, machine_irq, &pirq);
+
+    if (rc < 0) {
+        XEN_PT_ERR(d, "Mapping machine irq %u to pirq %i failed, (rc: %d)\n",
+                   machine_irq, pirq, rc);
+
+        /* Disable INTx assertion (set bit 10 of the Command register) */
+        xen_host_pci_set_word(&s->real_device,
+                              PCI_COMMAND,
+                              pci_get_word(s->dev.config + PCI_COMMAND)
+                              | PCI_COMMAND_INTX_DISABLE);
+        machine_irq = 0;
+        s->machine_irq = 0;
+    } else {
+        machine_irq = pirq;
+        s->machine_irq = pirq;
+        xen_pt_mapped_machine_irq[machine_irq]++;
+    }
+
+    /* bind machine_irq to device */
+    if (machine_irq != 0) {
+        uint8_t e_intx = xen_pt_pci_intx(s);
+
+        rc = xc_domain_bind_pt_pci_irq(xen_xc, xen_domid, machine_irq,
+                                       pci_bus_num(d->bus),
+                                       PCI_SLOT(d->devfn),
+                                       e_intx);
+        if (rc < 0) {
+            XEN_PT_ERR(d, "Binding of interrupt %i failed! (rc: %d)\n",
+                       e_intx, rc);
+
+            /* Disable INTx assertion (set bit 10 of the Command register) */
+            xen_host_pci_set_word(&s->real_device, PCI_COMMAND,
+                                  pci_get_word(s->dev.config + PCI_COMMAND)
+                                  | PCI_COMMAND_INTX_DISABLE);
+            xen_pt_mapped_machine_irq[machine_irq]--;
+
+            if (xen_pt_mapped_machine_irq[machine_irq] == 0) {
+                rc = xc_physdev_unmap_pirq(xen_xc, xen_domid, machine_irq);
+                if (rc) {
+                    XEN_PT_ERR(d, "Unmapping of machine interrupt %i failed!"
+                               " (rc: %d)\n", machine_irq, rc);
+                }
+            }
+            s->machine_irq = 0;
+        }
+    }
+
+out:
+    memory_listener_register(&s->memory_listener, NULL);
+    XEN_PT_LOG(d, "Real physical device %02x:%02x.%d registered "
+               "successfully!\n",
+               s->hostaddr.bus, s->hostaddr.slot, s->hostaddr.function);
+
+    return 0;
+}
+
+static int xen_pt_unregister_device(PCIDevice *d)
+{
+    XenPCIPassthroughState *s = DO_UPCAST(XenPCIPassthroughState, dev, d);
+    uint8_t machine_irq = s->machine_irq;
+    uint8_t intx = xen_pt_pci_intx(s);
+    int rc;
+
+    if (machine_irq) {
+        rc = xc_domain_unbind_pt_irq(xen_xc, xen_domid, machine_irq,
+                                     PT_IRQ_TYPE_PCI,
+                                     pci_bus_num(d->bus),
+                                     PCI_SLOT(s->dev.devfn),
+                                     intx,
+                                     0 /* isa_irq */);
+        if (rc < 0) {
+            XEN_PT_ERR(d, "unbinding of interrupt INT%c failed."
+                       " (machine irq: %i, rc: %d)"
+                       " But bravely continuing on..\n",
+                       'a' + intx, machine_irq, rc);
+        }
+    }
+
+    if (machine_irq) {
+        xen_pt_mapped_machine_irq[machine_irq]--;
+
+        if (xen_pt_mapped_machine_irq[machine_irq] == 0) {
+            rc = xc_physdev_unmap_pirq(xen_xc, xen_domid, machine_irq);
+
+            if (rc < 0) {
+                XEN_PT_ERR(d, "unmapping of interrupt %i failed. (rc: %d)"
+                           " But bravely continuing on..\n",
+                           machine_irq, rc);
+            }
+        }
+    }
+
+    xen_pt_unregister_regions(s);
+    memory_listener_unregister(&s->memory_listener);
+
+    xen_host_pci_device_put(&s->real_device);
+
+    return 0;
+}
+
+static Property xen_pci_passthrough_properties[] = {
+    DEFINE_PROP_PCI_HOST_DEVADDR("hostaddr", XenPCIPassthroughState, hostaddr),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void xen_pci_passthrough_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+    k->init = xen_pt_initfn;
+    k->exit = xen_pt_unregister_device;
+    k->config_read = xen_pt_pci_read_config;
+    k->config_write = xen_pt_pci_write_config;
+    dc->desc = "Assign a host PCI device with Xen";
+    dc->props = xen_pci_passthrough_properties;
+}
+
+static TypeInfo xen_pci_passthrough_info = {
+    .name = "xen-pci-passthrough",
+    .parent = TYPE_PCI_DEVICE,
+    .instance_size = sizeof(XenPCIPassthroughState),
+    .class_init = xen_pci_passthrough_class_init,
+};
+
+static void xen_pci_passthrough_register_types(void)
+{
+    type_register_static(&xen_pci_passthrough_info);
+}
+
+type_init(xen_pci_passthrough_register_types)
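Several functions in xen_pt.c rely on a small amount of config-space arithmetic. These are xen_pt_pci_config_access_check(), xen_pt_bar_offset_to_index(), and the emulation loops in xen_pt_pci_read_config() and xen_pt_pci_write_config(). The arithmetic covers four things:

- natural alignment of 1-, 2- and 4-byte accesses;
- (offset - 0x10) >> 2 to turn a BAR offset into an index;
- a shift by (addr & 3) * 8 to place a value in its byte lane of the 0xCFC-0xCFF data window;
- valid_mask, which marks the bytes of an emulated register that the access covers.

Below is a standalone sketch of that arithmetic. The demo_* names and constants are hypothetical copies of the QEMU PCI definitions (PCI_BASE_ADDRESS_0 = 0x10, PCI_ROM_ADDRESS = 0x30, PCI_ROM_SLOT = 6). Unlike the patch's `index >= PCI_NUM_REGIONS` test, the sketch also rejects offsets 0x28-0x2B, which would otherwise map to the ROM slot index.

```c
#include <stdint.h>

/* Hypothetical copies of the QEMU PCI constants used by xen_pt.c. */
#define DEMO_BASE_ADDRESS_0 0x10u   /* PCI_BASE_ADDRESS_0 */
#define DEMO_ROM_ADDRESS    0x30u   /* PCI_ROM_ADDRESS (type 0 header) */
#define DEMO_ROM_SLOT       6       /* PCI_ROM_SLOT */

/* Same rules as xen_pt_pci_config_access_check(): offset within the
 * 256-byte config space, length of 1, 2 or 4, naturally aligned.
 * Returns 1 if the access is acceptable, 0 otherwise. */
static int demo_access_ok(uint32_t addr, int len)
{
    if (addr > 0xFF) {
        return 0;
    }
    if (len != 1 && len != 2 && len != 4) {
        return 0;
    }
    if (addr & (uint32_t)(len - 1)) {   /* e.g. a 4-byte access at 0x02 */
        return 0;
    }
    return 1;
}

/* BAR offset -> BAR index: 0x10, 0x14, ..., 0x24 map to 0..5 and the
 * expansion ROM BAR at 0x30 maps to the ROM slot; anything else is -1. */
static int demo_bar_offset_to_index(uint32_t offset)
{
    uint32_t index;

    if (offset == DEMO_ROM_ADDRESS) {
        return DEMO_ROM_SLOT;
    }
    if (offset < DEMO_BASE_ADDRESS_0) {
        return -1;
    }
    index = (offset - DEMO_BASE_ADDRESS_0) >> 2;
    return index < DEMO_ROM_SLOT ? (int)index : -1;
}

/* Move a value read at addr into its byte lane of the 32-bit 0xCFC-0xCFF
 * data window, like "val <<= (addr & 3) << 3" in the patch. */
static uint32_t demo_to_lane(uint32_t val, uint32_t addr)
{
    return val << ((addr & 3) << 3);
}

/* valid_mask as computed in the emulation loops: emul_len bytes (1..4)
 * still to process, shifted to start at byte (find_addr - real_offset)
 * of the emulated register that begins at real_offset. */
static uint32_t demo_valid_mask(int emul_len, uint32_t find_addr,
                                uint32_t real_offset)
{
    uint32_t mask = 0xFFFFFFFFu >> ((4 - emul_len) << 3);

    return mask << ((find_addr - real_offset) << 3);
}
```

For instance, a 1-byte access to the Interrupt Pin register at 0x3D lands in byte lane 1 of the dword at 0x3C, so its value is shifted left by 8. If the emulated register entry covering it started at 0x3C, the corresponding valid_mask would be 0xFF00.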
diff --git a/hw/xen_pt.h b/hw/xen_pt.h
new file mode 100644
index 0000000..36001a7
--- /dev/null
+++ b/hw/xen_pt.h
@@ -0,0 +1,248 @@
+#ifndef XEN_PT_H
+#define XEN_PT_H
+
+#include "qemu-common.h"
+#include "xen_common.h"
+#include "pci.h"
+#include "xen-host-pci-device.h"
+
+void xen_pt_log(const PCIDevice *d, const char *f, ...) GCC_FMT_ATTR(2, 3);
+
+#define XEN_PT_ERR(d, _f, _a...) xen_pt_log(d, "%s: Error: "_f, __func__, ##_a)
+
+#ifdef XEN_PT_LOGGING_ENABLED
+#  define XEN_PT_LOG(d, _f, _a...)  xen_pt_log(d, "%s: " _f, __func__, ##_a)
+#  define XEN_PT_WARN(d, _f, _a...) \
+    xen_pt_log(d, "%s: Warning: "_f, __func__, ##_a)
+#else
+#  define XEN_PT_LOG(d, _f, _a...)
+#  define XEN_PT_WARN(d, _f, _a...)
+#endif
+
+#ifdef XEN_PT_DEBUG_PCI_CONFIG_ACCESS
+#  define XEN_PT_LOG_CONFIG(d, addr, val, len) \
+    xen_pt_log(d, "%s: address=0x%04x val=0x%08x len=%d\n", \
+               __func__, addr, val, len)
+#else
+#  define XEN_PT_LOG_CONFIG(d, addr, val, len)
+#endif
+
+
+/* Helper */
+#define XEN_PFN(x) ((x) >> XC_PAGE_SHIFT)
+
+typedef struct XenPTRegInfo XenPTRegInfo;
+typedef struct XenPTReg XenPTReg;
+
+typedef struct XenPCIPassthroughState XenPCIPassthroughState;
+
+/* function type for config reg */
+typedef int (*xen_pt_conf_reg_init)
+    (XenPCIPassthroughState *, XenPTRegInfo *, uint32_t real_offset,
+     uint32_t *data);
+typedef int (*xen_pt_conf_dword_write)
+    (XenPCIPassthroughState *, XenPTReg *cfg_entry,
+     uint32_t *val, uint32_t dev_value, uint32_t valid_mask);
+typedef int (*xen_pt_conf_word_write)
+    (XenPCIPassthroughState *, XenPTReg *cfg_entry,
+     uint16_t *val, uint16_t dev_value, uint16_t valid_mask);
+typedef int (*xen_pt_conf_byte_write)
+    (XenPCIPassthroughState *, XenPTReg *cfg_entry,
+     uint8_t *val, uint8_t dev_value, uint8_t valid_mask);
+typedef int (*xen_pt_conf_dword_read)
+    (XenPCIPassthroughState *, XenPTReg *cfg_entry,
+     uint32_t *val, uint32_t valid_mask);
+typedef int (*xen_pt_conf_word_read)
+    (XenPCIPassthroughState *, XenPTReg *cfg_entry,
+     uint16_t *val, uint16_t valid_mask);
+typedef int (*xen_pt_conf_byte_read)
+    (XenPCIPassthroughState *, XenPTReg *cfg_entry,
+     uint8_t *val, uint8_t valid_mask);
+
+#define XEN_PT_BAR_ALLF 0xFFFFFFFF
+#define XEN_PT_BAR_UNMAPPED (-1)
+
+
+typedef enum {
+    XEN_PT_GRP_TYPE_HARDWIRED = 0,  /* 0 Hardwired reg group */
+    XEN_PT_GRP_TYPE_EMU,            /* emul reg group */
+} XenPTRegisterGroupType;
+
+typedef enum {
+    XEN_PT_BAR_FLAG_MEM = 0,        /* Memory type BAR */
+    XEN_PT_BAR_FLAG_IO,             /* I/O type BAR */
+    XEN_PT_BAR_FLAG_UPPER,          /* upper 64bit BAR */
+    XEN_PT_BAR_FLAG_UNUSED,         /* unused BAR */
+} XenPTBarFlag;
+
+
+typedef struct XenPTRegion {
+    /* BAR flag */
+    XenPTBarFlag bar_flag;
+    /* Translation of the emulated address */
+    union {
+        uint64_t maddr;
+        uint64_t pio_base;
+        uint64_t u;
+    } access;
+} XenPTRegion;
+
+/* XenPTRegInfo declaration
+ * - only for emulated registers (either some bits or the whole register).
+ * - for passthrough registers that need special behavior (like interacting
+ *   with other components), set emu_mask to all 0 and specify the r/w
+ *   functions properly.
+ * - do NOT use ALL F for init_val, otherwise the tbl will not be registered.
+ */
+
+/* emulated register information */
+struct XenPTRegInfo {
+    uint32_t offset;
+    uint32_t size;
+    uint32_t init_val;
+    /* reg read only field mask (ON:RO/ROS, OFF:other) */
+    uint32_t ro_mask;
+    /* reg emulate field mask (ON:emu, OFF:passthrough) */
+    uint32_t emu_mask;
+    /* no write back allowed */
+    uint32_t no_wb;
+    xen_pt_conf_reg_init init;
+    /* read/write function pointer
+     * for double_word/word/byte size */
+    union {
+        struct {
+            xen_pt_conf_dword_write write;
+            xen_pt_conf_dword_read read;
+        } dw;
+        struct {
+            xen_pt_conf_word_write write;
+            xen_pt_conf_word_read read;
+        } w;
+        struct {
+            xen_pt_conf_byte_write write;
+            xen_pt_conf_byte_read read;
+        } b;
+    } u;
+};
+
+/* emulated register management */
+struct XenPTReg {
+    QLIST_ENTRY(XenPTReg) entries;
+    XenPTRegInfo *reg;
+    uint32_t data; /* emulated value */
+};
+
+typedef struct XenPTRegGroupInfo XenPTRegGroupInfo;
+
+/* emul reg group size initialization method */
+typedef int (*xen_pt_reg_size_init_fn)
+    (XenPCIPassthroughState *, const XenPTRegGroupInfo *,
+     uint32_t base_offset, uint8_t *size);
+
+/* emulated register group information */
+struct XenPTRegGroupInfo {
+    uint8_t grp_id;
+    XenPTRegisterGroupType grp_type;
+    uint8_t grp_size;
+    xen_pt_reg_size_init_fn size_init;
+    XenPTRegInfo *emu_regs;
+};
+
+/* emul register group management table */
+typedef struct XenPTRegGroup {
+    QLIST_ENTRY(XenPTRegGroup) entries;
+    const XenPTRegGroupInfo *reg_grp;
+    uint32_t base_offset;
+    uint8_t size;
+    QLIST_HEAD(, XenPTReg) reg_tbl_list;
+} XenPTRegGroup;
+
+
+#define XEN_PT_UNASSIGNED_PIRQ (-1)
+
+struct XenPCIPassthroughState {
+    PCIDevice dev;
+
+    PCIHostDeviceAddress hostaddr;
+    bool is_virtfn;
+    XenHostPCIDevice real_device;
+    XenPTRegion bases[PCI_NUM_REGIONS]; /* Access regions */
+    QLIST_HEAD(, XenPTRegGroup) reg_grps;
+
+    uint32_t machine_irq;
+
+    MemoryRegion bar[PCI_NUM_REGIONS - 1];
+    MemoryRegion rom;
+
+    MemoryListener memory_listener;
+};
+
+int xen_pt_config_init(XenPCIPassthroughState *s);
+void xen_pt_config_delete(XenPCIPassthroughState *s);
+XenPTRegGroup *xen_pt_find_reg_grp(XenPCIPassthroughState *s, uint32_t address);
+XenPTReg *xen_pt_find_reg(XenPTRegGroup *reg_grp, uint32_t address);
+int xen_pt_bar_offset_to_index(uint32_t offset);
+
+static inline pcibus_t xen_pt_get_emul_size(XenPTBarFlag flag, pcibus_t r_size)
+{
+    /* align resource size (memory type only) */
+    if (flag == XEN_PT_BAR_FLAG_MEM) {
+        return (r_size + XC_PAGE_SIZE - 1) & XC_PAGE_MASK;
+    } else {
+        return r_size;
+    }
+}
+
+/* INTx */
+/* The PCI Local Bus Specification, Rev. 3.0,
+ * Section 6.2.4 Miscellaneous Registers, p. 223
+ * outlines 5 valid values for the interrupt pin (intx).
+ *  0: For devices (or device functions) that don't use an interrupt pin
+ *  1: INTA#
+ *  2: INTB#
+ *  3: INTC#
+ *  4: INTD#
+ *
+ * Xen uses the following 4 values for intx
+ *  0: INTA#
+ *  1: INTB#
+ *  2: INTC#
+ *  3: INTD#
+ *
+ * Observing that these lists of values are not the same, xen_pt_pci_intx()
+ * uses the following mapping from hw to xen values.
+ * This seems to reflect the current usage within Xen.
+ *
+ * PCI hardware    | Xen | Notes
+ * ----------------+-----+----------------------------------------------------
+ * 0               | 0   | No interrupt
+ * 1               | 0   | INTA#
+ * 2               | 1   | INTB#
+ * 3               | 2   | INTC#
+ * 4               | 3   | INTD#
+ * any other value | 0   | This should never happen, log error message
+ */
+
+static inline uint8_t xen_pt_pci_read_intx(XenPCIPassthroughState *s)
+{
+    uint8_t v = 0;
+    xen_host_pci_get_byte(&s->real_device, PCI_INTERRUPT_PIN, &v);
+    return v;
+}
+
+static inline uint8_t xen_pt_pci_intx(XenPCIPassthroughState *s)
+{
+    uint8_t r_val = xen_pt_pci_read_intx(s);
+
+    XEN_PT_LOG(&s->dev, "intx=%i\n", r_val);
+    if (r_val < 1 || r_val > 4) {
+        XEN_PT_LOG(&s->dev, "Interrupt pin read from hardware is out of range:"
+                   " value=%i, acceptable range is 1 - 4\n", r_val);
+        r_val = 0;
+    } else {
+        r_val -= 1;
+    }
+
+    return r_val;
+}
+
+#endif /* !XEN_PT_H */
diff --git a/hw/xen_pt_config_init.c b/hw/xen_pt_config_init.c
new file mode 100644
index 0000000..64d22e8
--- /dev/null
+++ b/hw/xen_pt_config_init.c
@@ -0,0 +1,11 @@
+#include "xen_pt.h"
+
+XenPTRegGroup *xen_pt_find_reg_grp(XenPCIPassthroughState *s, uint32_t address)
+{
+    return NULL;
+}
+
+XenPTReg *xen_pt_find_reg(XenPTRegGroup *reg_grp, uint32_t address)
+{
+    return NULL;
+}
diff --git a/xen-all.c b/xen-all.c
index b5220cc..59f2323 100644
--- a/xen-all.c
+++ b/xen-all.c
@@ -1191,3 +1191,15 @@ void xen_register_framebuffer(MemoryRegion *mr)
 {
     framebuffer = mr;
 }
+
+void xen_shutdown_fatal_error(const char *fmt, ...)
+{
+    va_list ap;
+
+    va_start(ap, fmt);
+    vfprintf(stderr, fmt, ap);
+    va_end(ap);
+    fprintf(stderr, "Will destroy the domain.\n");
+    /* destroy the domain */
+    qemu_system_shutdown_request();
+}
-- 
Anthony PERARD

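The two inline helpers at the end of hw/xen_pt.h above can be exercised on their own. Below is a minimal standalone sketch, not code from this series: the sketch_* names and SKETCH_PAGE_SIZE are hypothetical, and it assumes 4 KiB pages (what XC_PAGE_SIZE is on x86 Xen). It mirrors the page rounding done by xen_pt_get_emul_size() and the Interrupt Pin mapping done by xen_pt_pci_intx(), minus the logging.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as XC_PAGE_SIZE is on x86 Xen. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_PAGE_MASK (~(uint64_t)(SKETCH_PAGE_SIZE - 1))

/* Hypothetical stand-in for the relevant XenPTBarFlag values. */
typedef enum { SKETCH_BAR_MEM, SKETCH_BAR_IO } sketch_bar_flag;

/* Like xen_pt_get_emul_size(): round memory BAR sizes up to a whole
 * page and return I/O BAR sizes unchanged. */
static uint64_t sketch_emul_size(sketch_bar_flag flag, uint64_t r_size)
{
    if (flag == SKETCH_BAR_MEM) {
        return (r_size + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK;
    }
    return r_size;
}

/* Like xen_pt_pci_intx() without the logging: map the hardware
 * Interrupt Pin value (1 = INTA# .. 4 = INTD#) to Xen's 0-based intx;
 * any value outside 1..4 falls back to 0. */
static uint8_t sketch_pin_to_xen_intx(uint8_t pin)
{
    if (pin < 1 || pin > 4) {
        return 0;
    }
    return (uint8_t)(pin - 1);
}
```

Only memory BARs are rounded, presumably because Xen maps guest MMIO with page granularity; I/O port ranges keep their real size.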

* [Qemu-devel] [PATCH V13 7/9] Introduce Xen PCI Passthrough, PCI config space helpers (2/3)
From: Anthony PERARD @ 2012-06-14 17:01 UTC
  To: QEMU-devel
  Cc: Anthony Liguori, Stefano Stabellini, Jan Kiszka,
	Michael S. Tsirkin, Allen Kay, Xen Devel, Guy Zana,
	Anthony PERARD

From: Allen Kay <allen.m.kay@intel.com>

A more complete history can be found here:
git://xenbits.xensource.com/qemu-xen-unstable.git

Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Signed-off-by: Guy Zana <guy@neocleus.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
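Every read and write helper in hw/xen_pt_config_init.c below is built on XEN_PT_MERGE_VALUE(value, data, mask), which takes the bits selected by mask from value and the remaining bits from data. The following is a minimal sketch of the word-sized read and write paths (xen_pt_word_reg_read() and xen_pt_word_reg_write()); the sketch_* names are hypothetical, and the read test uses the Status register's emu_mask of 0x0010 from the Header Type0 table.

```c
#include <assert.h>
#include <stdint.h>

/* Same definition as XEN_PT_MERGE_VALUE in the patch: bits selected by
 * val_mask come from value, all other bits come from data. */
#define MERGE_VALUE(value, data, val_mask) \
    (((value) & (val_mask)) | ((data) & ~(val_mask)))

/* Read path: emulated bits come from the emulated copy, all other bits
 * from the value read from the real device. */
static uint16_t sketch_word_read(uint16_t hw_value, uint16_t emu_data,
                                 uint16_t emu_mask, uint16_t valid_mask)
{
    uint16_t valid_emu_mask = emu_mask & valid_mask;

    return MERGE_VALUE(hw_value, emu_data, (uint16_t)~valid_emu_mask);
}

/* Write path: store the guest's writable emulated bits in the emulated
 * copy, and build the device value from the guest value (pass-through
 * bits) and the current device value (everything else). */
static uint16_t sketch_word_write(uint16_t guest_val, uint16_t dev_value,
                                  uint16_t *emu_data, uint16_t emu_mask,
                                  uint16_t ro_mask, uint16_t valid_mask)
{
    uint16_t writable_mask = emu_mask & ~ro_mask & valid_mask;
    uint16_t throughable_mask = ~emu_mask & valid_mask;

    *emu_data = MERGE_VALUE(guest_val, *emu_data, writable_mask);
    return MERGE_VALUE(guest_val, dev_value, throughable_mask);
}
```

So a bit set in emu_mask is served from the emulated copy on reads; on writes it is stored in that copy (unless ro_mask also covers it) and the guest's value for it is never forwarded to the device.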
 hw/xen_pt.c             |   10 +
 hw/xen_pt.h             |    2 +
 hw/xen_pt_config_init.c | 1387 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 1399 insertions(+), 0 deletions(-)
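
For BARs, xen_pt_bar_reg_write() below marks the flag bits and every bit below the region size as read-only, so the usual sizing sequence (write all 1s, read back) yields the size mask from the emulated copy. A minimal sketch, assuming a full 32-bit access (valid_mask of all ones), a 32-bit memory BAR and an r_size already page-aligned by xen_pt_get_emul_size(); the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MERGE_VALUE(value, data, val_mask) \
    (((value) & (val_mask)) | ((data) & ~(val_mask)))

#define BAR_MEM_RO_MASK  0x0000000Fu  /* as XEN_PT_BAR_MEM_RO_MASK */
#define BAR_MEM_EMU_MASK 0xFFFFFFF0u  /* as XEN_PT_BAR_MEM_EMU_MASK */

/* Guest write to the emulated BAR: flag bits and bits below r_size are
 * read-only, only the remaining address bits are stored. */
static uint32_t sketch_mem_bar_write(uint32_t emu_data, uint32_t guest_val,
                                     uint32_t r_size)
{
    uint32_t ro_mask = BAR_MEM_RO_MASK | (r_size - 1);
    uint32_t writable_mask = BAR_MEM_EMU_MASK & ~ro_mask;

    return MERGE_VALUE(guest_val, emu_data, writable_mask);
}

/* Guest read, as in xen_pt_bar_reg_read(): flag bits come from the host
 * BAR, address bits from the emulated copy. */
static uint32_t sketch_mem_bar_read(uint32_t emu_data, uint32_t host_bar)
{
    return MERGE_VALUE(host_bar, emu_data, ~BAR_MEM_EMU_MASK);
}

/* Size a guest driver derives from the value read after the sizing write. */
static uint32_t sketch_decode_size(uint32_t bar)
{
    return ~(bar & ~BAR_MEM_RO_MASK) + 1;
}
```

For example, with a 0x1000 byte region and a prefetchable host BAR (flag 0x8), writing 0xFFFFFFFF stores 0xFFFFF000, the guest reads back 0xFFFFF008 and decodes a size of 0x1000.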

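The Command register write handler xen_pt_cmd_reg_write() below has one special case: INTx Disable is part of emu_mask (0x0740), yet it is forwarded to the device when the guest sets it, or when the guest clears it while a machine IRQ is bound. A minimal sketch of the value sent to the device, assuming a full 16-bit access; the sketch_* names are hypothetical and the bit values are the standard PCI ones (as in Linux pci_regs.h).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MERGE_VALUE(value, data, val_mask) \
    (((value) & (val_mask)) | ((data) & ~(val_mask)))

#define CMD_MEMORY        0x0002  /* PCI_COMMAND_MEMORY */
#define CMD_INTX_DISABLE  0x0400  /* PCI_COMMAND_INTX_DISABLE */

/* Non-emulated bits pass through from the guest; INTx Disable also passes
 * through if the guest sets it or a machine IRQ is bound. Any other bit
 * keeps the device's current value. */
static uint16_t sketch_cmd_to_device(uint16_t guest_val, uint16_t dev_value,
                                     uint16_t emu_mask, bool is_virtfn,
                                     uint32_t machine_irq)
{
    uint16_t throughable_mask;

    if (is_virtfn) {
        emu_mask |= CMD_MEMORY;  /* a VF's Memory Space bit is emulated */
    }
    throughable_mask = ~emu_mask;
    if ((guest_val & CMD_INTX_DISABLE) || machine_irq) {
        throughable_mask |= CMD_INTX_DISABLE;
    }
    return MERGE_VALUE(guest_val, dev_value, throughable_mask);
}
```

With no machine IRQ bound, a guest clearing INTx Disable therefore leaves the device's bit as it was.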
diff --git a/hw/xen_pt.c b/hw/xen_pt.c
index 63a5c80..92ad0fa 100644
--- a/hw/xen_pt.c
+++ b/hw/xen_pt.c
@@ -673,6 +673,13 @@ static int xen_pt_initfn(PCIDevice *d)
     /* Handle real device's MMIO/PIO BARs */
     xen_pt_register_regions(s);
 
+    /* reinitialize each config register to be emulated */
+    if (xen_pt_config_init(s)) {
+        XEN_PT_ERR(d, "PCI Config space initialisation failed.\n");
+        xen_host_pci_device_put(&s->real_device);
+        return -1;
+    }
+
     /* Bind interrupt */
     if (!s->dev.config[PCI_INTERRUPT_PIN]) {
         XEN_PT_LOG(d, "no pin interrupt\n");
@@ -771,6 +778,9 @@ static int xen_pt_unregister_device(PCIDevice *d)
         }
     }
 
+    /* delete all emulated config registers */
+    xen_pt_config_delete(s);
+
     xen_pt_unregister_regions(s);
     memory_listener_unregister(&s->memory_listener);
 
diff --git a/hw/xen_pt.h b/hw/xen_pt.h
index 36001a7..4b76073 100644
--- a/hw/xen_pt.h
+++ b/hw/xen_pt.h
@@ -62,6 +62,8 @@ typedef int (*xen_pt_conf_byte_read)
 #define XEN_PT_BAR_ALLF 0xFFFFFFFF
 #define XEN_PT_BAR_UNMAPPED (-1)
 
+#define PCI_CAP_MAX 48
+
 
 typedef enum {
     XEN_PT_GRP_TYPE_HARDWIRED = 0,  /* 0 Hardwired reg group */
diff --git a/hw/xen_pt_config_init.c b/hw/xen_pt_config_init.c
index 64d22e8..1d97876 100644
--- a/hw/xen_pt_config_init.c
+++ b/hw/xen_pt_config_init.c
@@ -1,11 +1,1398 @@
+/*
+ * Copyright (c) 2007, Neocleus Corporation.
+ * Copyright (c) 2007, Intel Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Alex Novik <alex@neocleus.com>
+ * Allen Kay <allen.m.kay@intel.com>
+ * Guy Zana <guy@neocleus.com>
+ *
+ * This file implements direct PCI assignment to a HVM guest
+ */
+
+#include "qemu-timer.h"
+#include "xen_backend.h"
 #include "xen_pt.h"
 
+#define XEN_PT_MERGE_VALUE(value, data, val_mask) \
+    (((value) & (val_mask)) | ((data) & ~(val_mask)))
+
+#define XEN_PT_INVALID_REG          0xFFFFFFFF      /* invalid register value */
+
+/* prototype */
+
+static int xen_pt_ptr_reg_init(XenPCIPassthroughState *s, XenPTRegInfo *reg,
+                               uint32_t real_offset, uint32_t *data);
+
+
+/* helper */
+
+/* A return value of 1 means the capability should NOT be exposed to guest. */
+static int xen_pt_hide_dev_cap(const XenHostPCIDevice *d, uint8_t grp_id)
+{
+    switch (grp_id) {
+    case PCI_CAP_ID_EXP:
+        /* The PCI Express Capability Structure of the VF of the Intel 82599
+         * 10GbE Controller looks trivial: the PCI Express Capabilities
+         * Register is 0, so the Capability Version is 0 and
+         * xen_pt_pcie_size_init() would fail. We therefore do not try to
+         * expose this capability to the guest.
+         *
+         * The datasheet is available at
+         * http://download.intel.com/design/network/datashts/82599_datasheet.pdf
+         *
+         * See 'Table 9.7. VF PCIe Configuration Space' of the datasheet
+         * for the layout of the VF configuration space, including this
+         * capability structure.
+         */
+        if (d->vendor_id == PCI_VENDOR_ID_INTEL &&
+            d->device_id == PCI_DEVICE_ID_INTEL_82599_SFP_VF) {
+            return 1;
+        }
+        break;
+    }
+    return 0;
+}
+
+/* find emulated register group entry */
 XenPTRegGroup *xen_pt_find_reg_grp(XenPCIPassthroughState *s, uint32_t address)
 {
+    XenPTRegGroup *entry = NULL;
+
+    /* find register group entry */
+    QLIST_FOREACH(entry, &s->reg_grps, entries) {
+        /* check address */
+        if ((entry->base_offset <= address)
+            && ((entry->base_offset + entry->size) > address)) {
+            return entry;
+        }
+    }
+
+    /* group entry not found */
     return NULL;
 }
 
+/* find emulated register entry */
 XenPTReg *xen_pt_find_reg(XenPTRegGroup *reg_grp, uint32_t address)
 {
+    XenPTReg *reg_entry = NULL;
+    XenPTRegInfo *reg = NULL;
+    uint32_t real_offset = 0;
+
+    /* find register entry */
+    QLIST_FOREACH(reg_entry, &reg_grp->reg_tbl_list, entries) {
+        reg = reg_entry->reg;
+        real_offset = reg_grp->base_offset + reg->offset;
+        /* check address */
+        if ((real_offset <= address)
+            && ((real_offset + reg->size) > address)) {
+            return reg_entry;
+        }
+    }
+
     return NULL;
 }
+
+
+/****************
+ * general register functions
+ */
+
+/* register initialization function */
+
+static int xen_pt_common_reg_init(XenPCIPassthroughState *s,
+                                  XenPTRegInfo *reg, uint32_t real_offset,
+                                  uint32_t *data)
+{
+    *data = reg->init_val;
+    return 0;
+}
+
+/* Read register functions */
+
+static int xen_pt_byte_reg_read(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
+                                uint8_t *value, uint8_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint8_t valid_emu_mask = 0;
+
+    /* emulate byte register */
+    valid_emu_mask = reg->emu_mask & valid_mask;
+    *value = XEN_PT_MERGE_VALUE(*value, cfg_entry->data, ~valid_emu_mask);
+
+    return 0;
+}
+static int xen_pt_word_reg_read(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
+                                uint16_t *value, uint16_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint16_t valid_emu_mask = 0;
+
+    /* emulate word register */
+    valid_emu_mask = reg->emu_mask & valid_mask;
+    *value = XEN_PT_MERGE_VALUE(*value, cfg_entry->data, ~valid_emu_mask);
+
+    return 0;
+}
+static int xen_pt_long_reg_read(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
+                                uint32_t *value, uint32_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint32_t valid_emu_mask = 0;
+
+    /* emulate long register */
+    valid_emu_mask = reg->emu_mask & valid_mask;
+    *value = XEN_PT_MERGE_VALUE(*value, cfg_entry->data, ~valid_emu_mask);
+
+    return 0;
+}
+
+/* Write register functions */
+
+static int xen_pt_byte_reg_write(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
+                                 uint8_t *val, uint8_t dev_value,
+                                 uint8_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint8_t writable_mask = 0;
+    uint8_t throughable_mask = 0;
+
+    /* modify emulated register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask;
+    cfg_entry->data = XEN_PT_MERGE_VALUE(*val, cfg_entry->data, writable_mask);
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *val = XEN_PT_MERGE_VALUE(*val, dev_value, throughable_mask);
+
+    return 0;
+}
+static int xen_pt_word_reg_write(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
+                                 uint16_t *val, uint16_t dev_value,
+                                 uint16_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint16_t writable_mask = 0;
+    uint16_t throughable_mask = 0;
+
+    /* modify emulated register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask;
+    cfg_entry->data = XEN_PT_MERGE_VALUE(*val, cfg_entry->data, writable_mask);
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *val = XEN_PT_MERGE_VALUE(*val, dev_value, throughable_mask);
+
+    return 0;
+}
+static int xen_pt_long_reg_write(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
+                                 uint32_t *val, uint32_t dev_value,
+                                 uint32_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint32_t writable_mask = 0;
+    uint32_t throughable_mask = 0;
+
+    /* modify emulated register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask;
+    cfg_entry->data = XEN_PT_MERGE_VALUE(*val, cfg_entry->data, writable_mask);
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *val = XEN_PT_MERGE_VALUE(*val, dev_value, throughable_mask);
+
+    return 0;
+}
+
+
+/* XenPTRegInfo declaration
+ * - only for emulated registers (either some or all of their bits).
+ * - for passthrough registers needing special behavior (like interacting with
+ *   other components), set emu_mask to all 0 and specify r/w funcs properly.
+ * - do NOT use all Fs for init_val, otherwise the table will not be registered.
+ */
+
+/********************
+ * Header Type0
+ */
+
+static int xen_pt_vendor_reg_init(XenPCIPassthroughState *s,
+                                  XenPTRegInfo *reg, uint32_t real_offset,
+                                  uint32_t *data)
+{
+    *data = s->real_device.vendor_id;
+    return 0;
+}
+static int xen_pt_device_reg_init(XenPCIPassthroughState *s,
+                                  XenPTRegInfo *reg, uint32_t real_offset,
+                                  uint32_t *data)
+{
+    *data = s->real_device.device_id;
+    return 0;
+}
+static int xen_pt_status_reg_init(XenPCIPassthroughState *s,
+                                  XenPTRegInfo *reg, uint32_t real_offset,
+                                  uint32_t *data)
+{
+    XenPTRegGroup *reg_grp_entry = NULL;
+    XenPTReg *reg_entry = NULL;
+    uint32_t reg_field = 0;
+
+    /* find Header register group */
+    reg_grp_entry = xen_pt_find_reg_grp(s, PCI_CAPABILITY_LIST);
+    if (reg_grp_entry) {
+        /* find Capabilities Pointer register */
+        reg_entry = xen_pt_find_reg(reg_grp_entry, PCI_CAPABILITY_LIST);
+        if (reg_entry) {
+            /* check Capabilities Pointer register */
+            if (reg_entry->data) {
+                reg_field |= PCI_STATUS_CAP_LIST;
+            } else {
+                reg_field &= ~PCI_STATUS_CAP_LIST;
+            }
+        } else {
+            xen_shutdown_fatal_error("Internal error: Couldn't find XenPTReg*"
+                                     " for Capabilities Pointer register."
+                                     " (%s)\n", __func__);
+            return -1;
+        }
+    } else {
+        xen_shutdown_fatal_error("Internal error: Couldn't find XenPTRegGroup"
+                                 " for Header. (%s)\n", __func__);
+        return -1;
+    }
+
+    *data = reg_field;
+    return 0;
+}
+static int xen_pt_header_type_reg_init(XenPCIPassthroughState *s,
+                                       XenPTRegInfo *reg, uint32_t real_offset,
+                                       uint32_t *data)
+{
+    /* report the header type with the multi-function bit (0x80) set */
+    *data = reg->init_val | 0x80;
+    return 0;
+}
+
+/* initialize Interrupt Pin register */
+static int xen_pt_irqpin_reg_init(XenPCIPassthroughState *s,
+                                  XenPTRegInfo *reg, uint32_t real_offset,
+                                  uint32_t *data)
+{
+    *data = xen_pt_pci_read_intx(s);
+    return 0;
+}
+
+/* Command register */
+static int xen_pt_cmd_reg_read(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
+                               uint16_t *value, uint16_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint16_t valid_emu_mask = 0;
+    uint16_t emu_mask = reg->emu_mask;
+
+    if (s->is_virtfn) {
+        emu_mask |= PCI_COMMAND_MEMORY;
+    }
+
+    /* emulate word register */
+    valid_emu_mask = emu_mask & valid_mask;
+    *value = XEN_PT_MERGE_VALUE(*value, cfg_entry->data, ~valid_emu_mask);
+
+    return 0;
+}
+static int xen_pt_cmd_reg_write(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
+                                uint16_t *val, uint16_t dev_value,
+                                uint16_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint16_t writable_mask = 0;
+    uint16_t throughable_mask = 0;
+    uint16_t emu_mask = reg->emu_mask;
+
+    if (s->is_virtfn) {
+        emu_mask |= PCI_COMMAND_MEMORY;
+    }
+
+    /* modify emulated register */
+    writable_mask = ~reg->ro_mask & valid_mask;
+    cfg_entry->data = XEN_PT_MERGE_VALUE(*val, cfg_entry->data, writable_mask);
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~emu_mask & valid_mask;
+
+    if (*val & PCI_COMMAND_INTX_DISABLE) {
+        throughable_mask |= PCI_COMMAND_INTX_DISABLE;
+    } else {
+        if (s->machine_irq) {
+            throughable_mask |= PCI_COMMAND_INTX_DISABLE;
+        }
+    }
+
+    *val = XEN_PT_MERGE_VALUE(*val, dev_value, throughable_mask);
+
+    return 0;
+}
+
+/* BAR */
+#define XEN_PT_BAR_MEM_RO_MASK    0x0000000F  /* BAR ReadOnly mask(Memory) */
+#define XEN_PT_BAR_MEM_EMU_MASK   0xFFFFFFF0  /* BAR emul mask(Memory) */
+#define XEN_PT_BAR_IO_RO_MASK     0x00000003  /* BAR ReadOnly mask(I/O) */
+#define XEN_PT_BAR_IO_EMU_MASK    0xFFFFFFFC  /* BAR emul mask(I/O) */
+
+static XenPTBarFlag xen_pt_bar_reg_parse(XenPCIPassthroughState *s,
+                                         XenPTRegInfo *reg)
+{
+    PCIDevice *d = &s->dev;
+    XenPTRegion *region = NULL;
+    PCIIORegion *r;
+    int index = 0;
+
+    /* check 64bit BAR */
+    index = xen_pt_bar_offset_to_index(reg->offset);
+    if ((0 < index) && (index < PCI_ROM_SLOT)) {
+        int type = s->real_device.io_regions[index - 1].type;
+
+        if ((type & XEN_HOST_PCI_REGION_TYPE_MEM)
+            && (type & XEN_HOST_PCI_REGION_TYPE_MEM_64)) {
+            region = &s->bases[index - 1];
+            if (region->bar_flag != XEN_PT_BAR_FLAG_UPPER) {
+                return XEN_PT_BAR_FLAG_UPPER;
+            }
+        }
+    }
+
+    /* check unused BAR */
+    r = &d->io_regions[index];
+    if (r->size == 0) {
+        return XEN_PT_BAR_FLAG_UNUSED;
+    }
+
+    /* for ExpROM BAR */
+    if (index == PCI_ROM_SLOT) {
+        return XEN_PT_BAR_FLAG_MEM;
+    }
+
+    /* check BAR I/O indicator */
+    if (s->real_device.io_regions[index].type & XEN_HOST_PCI_REGION_TYPE_IO) {
+        return XEN_PT_BAR_FLAG_IO;
+    } else {
+        return XEN_PT_BAR_FLAG_MEM;
+    }
+}
+
+static inline uint32_t base_address_with_flags(XenHostPCIIORegion *hr)
+{
+    if (hr->type & XEN_HOST_PCI_REGION_TYPE_IO) {
+        return hr->base_addr | (hr->bus_flags & ~PCI_BASE_ADDRESS_IO_MASK);
+    } else {
+        return hr->base_addr | (hr->bus_flags & ~PCI_BASE_ADDRESS_MEM_MASK);
+    }
+}
+
+static int xen_pt_bar_reg_init(XenPCIPassthroughState *s, XenPTRegInfo *reg,
+                               uint32_t real_offset, uint32_t *data)
+{
+    uint32_t reg_field = 0;
+    int index;
+
+    index = xen_pt_bar_offset_to_index(reg->offset);
+    if (index < 0 || index >= PCI_NUM_REGIONS) {
+        XEN_PT_ERR(&s->dev, "Internal error: Invalid BAR index [%d].\n", index);
+        return -1;
+    }
+
+    /* set BAR flag */
+    s->bases[index].bar_flag = xen_pt_bar_reg_parse(s, reg);
+    if (s->bases[index].bar_flag == XEN_PT_BAR_FLAG_UNUSED) {
+        reg_field = XEN_PT_INVALID_REG;
+    }
+
+    *data = reg_field;
+    return 0;
+}
+static int xen_pt_bar_reg_read(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
+                               uint32_t *value, uint32_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint32_t valid_emu_mask = 0;
+    uint32_t bar_emu_mask = 0;
+    int index;
+
+    /* get BAR index */
+    index = xen_pt_bar_offset_to_index(reg->offset);
+    if (index < 0 || index >= PCI_NUM_REGIONS) {
+        XEN_PT_ERR(&s->dev, "Internal error: Invalid BAR index [%d].\n", index);
+        return -1;
+    }
+
+    /* use fixed-up value from kernel sysfs */
+    *value = base_address_with_flags(&s->real_device.io_regions[index]);
+
+    /* set emulation mask depending on the BAR flag */
+    switch (s->bases[index].bar_flag) {
+    case XEN_PT_BAR_FLAG_MEM:
+        bar_emu_mask = XEN_PT_BAR_MEM_EMU_MASK;
+        break;
+    case XEN_PT_BAR_FLAG_IO:
+        bar_emu_mask = XEN_PT_BAR_IO_EMU_MASK;
+        break;
+    case XEN_PT_BAR_FLAG_UPPER:
+        bar_emu_mask = XEN_PT_BAR_ALLF;
+        break;
+    default:
+        break;
+    }
+
+    /* emulate BAR */
+    valid_emu_mask = bar_emu_mask & valid_mask;
+    *value = XEN_PT_MERGE_VALUE(*value, cfg_entry->data, ~valid_emu_mask);
+
+    return 0;
+}
+static int xen_pt_bar_reg_write(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
+                                uint32_t *val, uint32_t dev_value,
+                                uint32_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    XenPTRegion *base = NULL;
+    PCIDevice *d = &s->dev;
+    const PCIIORegion *r;
+    uint32_t writable_mask = 0;
+    uint32_t throughable_mask = 0;
+    uint32_t bar_emu_mask = 0;
+    uint32_t bar_ro_mask = 0;
+    uint32_t r_size = 0;
+    int index = 0;
+
+    index = xen_pt_bar_offset_to_index(reg->offset);
+    if (index < 0 || index >= PCI_NUM_REGIONS) {
+        XEN_PT_ERR(d, "Internal error: Invalid BAR index [%d].\n", index);
+        return -1;
+    }
+
+    r = &d->io_regions[index];
+    base = &s->bases[index];
+    r_size = xen_pt_get_emul_size(base->bar_flag, r->size);
+
+    /* set emulation and read-only mask values depending on the BAR flag */
+    switch (s->bases[index].bar_flag) {
+    case XEN_PT_BAR_FLAG_MEM:
+        bar_emu_mask = XEN_PT_BAR_MEM_EMU_MASK;
+        bar_ro_mask = XEN_PT_BAR_MEM_RO_MASK | (r_size - 1);
+        break;
+    case XEN_PT_BAR_FLAG_IO:
+        bar_emu_mask = XEN_PT_BAR_IO_EMU_MASK;
+        bar_ro_mask = XEN_PT_BAR_IO_RO_MASK | (r_size - 1);
+        break;
+    case XEN_PT_BAR_FLAG_UPPER:
+        bar_emu_mask = XEN_PT_BAR_ALLF;
+        bar_ro_mask = 0;    /* all upper 32bit are R/W */
+        break;
+    default:
+        break;
+    }
+
+    /* modify emulated register */
+    writable_mask = bar_emu_mask & ~bar_ro_mask & valid_mask;
+    cfg_entry->data = XEN_PT_MERGE_VALUE(*val, cfg_entry->data, writable_mask);
+
+    /* check whether we need to update the virtual region address or not */
+    switch (s->bases[index].bar_flag) {
+    case XEN_PT_BAR_FLAG_MEM:
+        /* nothing to do */
+        break;
+    case XEN_PT_BAR_FLAG_IO:
+        /* nothing to do */
+        break;
+    case XEN_PT_BAR_FLAG_UPPER:
+        if (cfg_entry->data) {
+            if (cfg_entry->data != (XEN_PT_BAR_ALLF & ~bar_ro_mask)) {
+                XEN_PT_WARN(d, "Guest attempted to set a high MMIO "
+                            "Base Address. Ignoring mapping. "
+                            "(offset: 0x%02x, high address: 0x%08x)\n",
+                            reg->offset, cfg_entry->data);
+            }
+        }
+        break;
+    default:
+        break;
+    }
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~bar_emu_mask & valid_mask;
+    *val = XEN_PT_MERGE_VALUE(*val, dev_value, throughable_mask);
+
+    return 0;
+}
+
+/* write Exp ROM BAR */
+static int xen_pt_exp_rom_bar_reg_write(XenPCIPassthroughState *s,
+                                        XenPTReg *cfg_entry, uint32_t *val,
+                                        uint32_t dev_value, uint32_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    XenPTRegion *base = NULL;
+    PCIDevice *d = (PCIDevice *)&s->dev;
+    uint32_t writable_mask = 0;
+    uint32_t throughable_mask = 0;
+    pcibus_t r_size = 0;
+    uint32_t bar_emu_mask = 0;
+    uint32_t bar_ro_mask = 0;
+
+    r_size = d->io_regions[PCI_ROM_SLOT].size;
+    base = &s->bases[PCI_ROM_SLOT];
+    /* align memory type resource size */
+    r_size = xen_pt_get_emul_size(base->bar_flag, r_size);
+
+    /* set emulation mask and read-only mask */
+    bar_emu_mask = reg->emu_mask;
+    bar_ro_mask = (reg->ro_mask | (r_size - 1)) & ~PCI_ROM_ADDRESS_ENABLE;
+
+    /* modify emulated register */
+    writable_mask = ~bar_ro_mask & valid_mask;
+    cfg_entry->data = XEN_PT_MERGE_VALUE(*val, cfg_entry->data, writable_mask);
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~bar_emu_mask & valid_mask;
+    *val = XEN_PT_MERGE_VALUE(*val, dev_value, throughable_mask);
+
+    return 0;
+}
+
+/* Header Type0 reg static information table */
+static XenPTRegInfo xen_pt_emu_reg_header0[] = {
+    /* Vendor ID reg */
+    {
+        .offset     = PCI_VENDOR_ID,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xFFFF,
+        .emu_mask   = 0xFFFF,
+        .init       = xen_pt_vendor_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_word_reg_write,
+    },
+    /* Device ID reg */
+    {
+        .offset     = PCI_DEVICE_ID,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xFFFF,
+        .emu_mask   = 0xFFFF,
+        .init       = xen_pt_device_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_word_reg_write,
+    },
+    /* Command reg */
+    {
+        .offset     = PCI_COMMAND,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xF880,
+        .emu_mask   = 0x0740,
+        .init       = xen_pt_common_reg_init,
+        .u.w.read   = xen_pt_cmd_reg_read,
+        .u.w.write  = xen_pt_cmd_reg_write,
+    },
+    /* Capabilities Pointer reg */
+    {
+        .offset     = PCI_CAPABILITY_LIST,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = xen_pt_ptr_reg_init,
+        .u.b.read   = xen_pt_byte_reg_read,
+        .u.b.write  = xen_pt_byte_reg_write,
+    },
+    /* Status reg */
+    /* initialized from the emulated Cap Ptr value,
+     * so it must be declared after the Cap Ptr reg
+     */
+    {
+        .offset     = PCI_STATUS,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0x06FF,
+        .emu_mask   = 0x0010,
+        .init       = xen_pt_status_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_word_reg_write,
+    },
+    /* Cache Line Size reg */
+    {
+        .offset     = PCI_CACHE_LINE_SIZE,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0x00,
+        .emu_mask   = 0xFF,
+        .init       = xen_pt_common_reg_init,
+        .u.b.read   = xen_pt_byte_reg_read,
+        .u.b.write  = xen_pt_byte_reg_write,
+    },
+    /* Latency Timer reg */
+    {
+        .offset     = PCI_LATENCY_TIMER,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0x00,
+        .emu_mask   = 0xFF,
+        .init       = xen_pt_common_reg_init,
+        .u.b.read   = xen_pt_byte_reg_read,
+        .u.b.write  = xen_pt_byte_reg_write,
+    },
+    /* Header Type reg */
+    {
+        .offset     = PCI_HEADER_TYPE,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0x00,
+        .init       = xen_pt_header_type_reg_init,
+        .u.b.read   = xen_pt_byte_reg_read,
+        .u.b.write  = xen_pt_byte_reg_write,
+    },
+    /* Interrupt Line reg */
+    {
+        .offset     = PCI_INTERRUPT_LINE,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0x00,
+        .emu_mask   = 0xFF,
+        .init       = xen_pt_common_reg_init,
+        .u.b.read   = xen_pt_byte_reg_read,
+        .u.b.write  = xen_pt_byte_reg_write,
+    },
+    /* Interrupt Pin reg */
+    {
+        .offset     = PCI_INTERRUPT_PIN,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = xen_pt_irqpin_reg_init,
+        .u.b.read   = xen_pt_byte_reg_read,
+        .u.b.write  = xen_pt_byte_reg_write,
+    },
+    /* BAR 0 reg */
+    /* BAR masks are decided later, depending on the I/O or MEM type */
+    {
+        .offset     = PCI_BASE_ADDRESS_0,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .init       = xen_pt_bar_reg_init,
+        .u.dw.read  = xen_pt_bar_reg_read,
+        .u.dw.write = xen_pt_bar_reg_write,
+    },
+    /* BAR 1 reg */
+    {
+        .offset     = PCI_BASE_ADDRESS_1,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .init       = xen_pt_bar_reg_init,
+        .u.dw.read  = xen_pt_bar_reg_read,
+        .u.dw.write = xen_pt_bar_reg_write,
+    },
+    /* BAR 2 reg */
+    {
+        .offset     = PCI_BASE_ADDRESS_2,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .init       = xen_pt_bar_reg_init,
+        .u.dw.read  = xen_pt_bar_reg_read,
+        .u.dw.write = xen_pt_bar_reg_write,
+    },
+    /* BAR 3 reg */
+    {
+        .offset     = PCI_BASE_ADDRESS_3,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .init       = xen_pt_bar_reg_init,
+        .u.dw.read  = xen_pt_bar_reg_read,
+        .u.dw.write = xen_pt_bar_reg_write,
+    },
+    /* BAR 4 reg */
+    {
+        .offset     = PCI_BASE_ADDRESS_4,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .init       = xen_pt_bar_reg_init,
+        .u.dw.read  = xen_pt_bar_reg_read,
+        .u.dw.write = xen_pt_bar_reg_write,
+    },
+    /* BAR 5 reg */
+    {
+        .offset     = PCI_BASE_ADDRESS_5,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .init       = xen_pt_bar_reg_init,
+        .u.dw.read  = xen_pt_bar_reg_read,
+        .u.dw.write = xen_pt_bar_reg_write,
+    },
+    /* Expansion ROM BAR reg */
+    {
+        .offset     = PCI_ROM_ADDRESS,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .ro_mask    = 0x000007FE,
+        .emu_mask   = 0xFFFFF800,
+        .init       = xen_pt_bar_reg_init,
+        .u.dw.read  = xen_pt_long_reg_read,
+        .u.dw.write = xen_pt_exp_rom_bar_reg_write,
+    },
+    {
+        .size = 0,
+    },
+};
+
+
+/*********************************
+ * Vital Product Data Capability
+ */
+
+/* Vital Product Data Capability Structure reg static information table */
+static XenPTRegInfo xen_pt_emu_reg_vpd[] = {
+    {
+        .offset     = PCI_CAP_LIST_NEXT,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = xen_pt_ptr_reg_init,
+        .u.b.read   = xen_pt_byte_reg_read,
+        .u.b.write  = xen_pt_byte_reg_write,
+    },
+    {
+        .size = 0,
+    },
+};
+
+
+/**************************************
+ * Vendor Specific Capability
+ */
+
+/* Vendor Specific Capability Structure reg static information table */
+static XenPTRegInfo xen_pt_emu_reg_vendor[] = {
+    {
+        .offset     = PCI_CAP_LIST_NEXT,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = xen_pt_ptr_reg_init,
+        .u.b.read   = xen_pt_byte_reg_read,
+        .u.b.write  = xen_pt_byte_reg_write,
+    },
+    {
+        .size = 0,
+    },
+};
+
+
+/*****************************
+ * PCI Express Capability
+ */
+
+static inline uint8_t get_capability_version(XenPCIPassthroughState *s,
+                                             uint32_t offset)
+{
+    uint8_t flags = pci_get_byte(s->dev.config + offset + PCI_EXP_FLAGS);
+    return flags & PCI_EXP_FLAGS_VERS;
+}
+
+static inline uint8_t get_device_type(XenPCIPassthroughState *s,
+                                      uint32_t offset)
+{
+    uint8_t flags = pci_get_byte(s->dev.config + offset + PCI_EXP_FLAGS);
+    return (flags & PCI_EXP_FLAGS_TYPE) >> 4;
+}
+
+/* initialize Link Control register */
+static int xen_pt_linkctrl_reg_init(XenPCIPassthroughState *s,
+                                    XenPTRegInfo *reg, uint32_t real_offset,
+                                    uint32_t *data)
+{
+    uint8_t cap_ver = get_capability_version(s, real_offset - reg->offset);
+    uint8_t dev_type = get_device_type(s, real_offset - reg->offset);
+
+    /* no need to initialize in case of Root Complex Integrated Endpoint
+     * with cap_ver 1.x
+     */
+    if ((dev_type == PCI_EXP_TYPE_RC_END) && (cap_ver == 1)) {
+        *data = XEN_PT_INVALID_REG;
+    } else {
+        *data = reg->init_val;
+    }
+    return 0;
+}
+/* initialize Device Control 2 register */
+static int xen_pt_devctrl2_reg_init(XenPCIPassthroughState *s,
+                                    XenPTRegInfo *reg, uint32_t real_offset,
+                                    uint32_t *data)
+{
+    uint8_t cap_ver = get_capability_version(s, real_offset - reg->offset);
+
+    /* no need to initialize in case of cap_ver 1.x */
+    if (cap_ver == 1) {
+        *data = XEN_PT_INVALID_REG;
+    } else {
+        *data = reg->init_val;
+    }
+    return 0;
+}
+/* initialize Link Control 2 register */
+static int xen_pt_linkctrl2_reg_init(XenPCIPassthroughState *s,
+                                     XenPTRegInfo *reg, uint32_t real_offset,
+                                     uint32_t *data)
+{
+    uint8_t cap_ver = get_capability_version(s, real_offset - reg->offset);
+    uint32_t reg_field = 0;
+
+    /* no need to initialize in case of cap_ver 1.x */
+    if (cap_ver == 1) {
+        reg_field = XEN_PT_INVALID_REG;
+    } else {
+        /* set Supported Link Speed */
+        uint8_t lnkcap = pci_get_byte(s->dev.config + real_offset - reg->offset
+                                      + PCI_EXP_LNKCAP);
+        reg_field |= PCI_EXP_LNKCAP_SLS & lnkcap;
+    }
+
+    *data = reg_field;
+    return 0;
+}
+
+/* PCI Express Capability Structure reg static information table */
+static XenPTRegInfo xen_pt_emu_reg_pcie[] = {
+    /* Next Pointer reg */
+    {
+        .offset     = PCI_CAP_LIST_NEXT,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = xen_pt_ptr_reg_init,
+        .u.b.read   = xen_pt_byte_reg_read,
+        .u.b.write  = xen_pt_byte_reg_write,
+    },
+    /* Device Capabilities reg */
+    {
+        .offset     = PCI_EXP_DEVCAP,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .ro_mask    = 0x1FFCFFFF,
+        .emu_mask   = 0x10000000,
+        .init       = xen_pt_common_reg_init,
+        .u.dw.read  = xen_pt_long_reg_read,
+        .u.dw.write = xen_pt_long_reg_write,
+    },
+    /* Device Control reg */
+    {
+        .offset     = PCI_EXP_DEVCTL,
+        .size       = 2,
+        .init_val   = 0x2810,
+        .ro_mask    = 0x8400,
+        .emu_mask   = 0xFFFF,
+        .init       = xen_pt_common_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_word_reg_write,
+    },
+    /* Link Control reg */
+    {
+        .offset     = PCI_EXP_LNKCTL,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xFC34,
+        .emu_mask   = 0xFFFF,
+        .init       = xen_pt_linkctrl_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_word_reg_write,
+    },
+    /* Device Control 2 reg */
+    {
+        .offset     = 0x28,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xFFE0,
+        .emu_mask   = 0xFFFF,
+        .init       = xen_pt_devctrl2_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_word_reg_write,
+    },
+    /* Link Control 2 reg */
+    {
+        .offset     = 0x30,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xE040,
+        .emu_mask   = 0xFFFF,
+        .init       = xen_pt_linkctrl2_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_word_reg_write,
+    },
+    {
+        .size = 0,
+    },
+};
+
+
+/*********************************
+ * Power Management Capability
+ */
+
+/* read Power Management Control/Status register */
+static int xen_pt_pmcsr_reg_read(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
+                                 uint16_t *value, uint16_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint16_t valid_emu_mask = reg->emu_mask;
+
+    valid_emu_mask |= PCI_PM_CTRL_STATE_MASK | PCI_PM_CTRL_NO_SOFT_RESET;
+
+    valid_emu_mask = valid_emu_mask & valid_mask;
+    *value = XEN_PT_MERGE_VALUE(*value, cfg_entry->data, ~valid_emu_mask);
+
+    return 0;
+}
+/* write Power Management Control/Status register */
+static int xen_pt_pmcsr_reg_write(XenPCIPassthroughState *s,
+                                  XenPTReg *cfg_entry, uint16_t *val,
+                                  uint16_t dev_value, uint16_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint16_t emu_mask = reg->emu_mask;
+    uint16_t writable_mask = 0;
+    uint16_t throughable_mask = 0;
+
+    emu_mask |= PCI_PM_CTRL_STATE_MASK | PCI_PM_CTRL_NO_SOFT_RESET;
+
+    /* modify emulated register */
+    writable_mask = emu_mask & ~reg->ro_mask & valid_mask;
+    cfg_entry->data = XEN_PT_MERGE_VALUE(*val, cfg_entry->data, writable_mask);
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~emu_mask & valid_mask;
+    *val = XEN_PT_MERGE_VALUE(*val, dev_value, throughable_mask);
+
+    return 0;
+}
+
+/* Power Management Capability reg static information table */
+static XenPTRegInfo xen_pt_emu_reg_pm[] = {
+    /* Next Pointer reg */
+    {
+        .offset     = PCI_CAP_LIST_NEXT,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = xen_pt_ptr_reg_init,
+        .u.b.read   = xen_pt_byte_reg_read,
+        .u.b.write  = xen_pt_byte_reg_write,
+    },
+    /* Power Management Capabilities reg */
+    {
+        .offset     = PCI_CAP_FLAGS,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xFFFF,
+        .emu_mask   = 0xF9C8,
+        .init       = xen_pt_common_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_word_reg_write,
+    },
+    /* PCI Power Management Control/Status reg */
+    {
+        .offset     = PCI_PM_CTRL,
+        .size       = 2,
+        .init_val   = 0x0008,
+        .ro_mask    = 0xE1FC,
+        .emu_mask   = 0x8100,
+        .init       = xen_pt_common_reg_init,
+        .u.w.read   = xen_pt_pmcsr_reg_read,
+        .u.w.write  = xen_pt_pmcsr_reg_write,
+    },
+    {
+        .size = 0,
+    },
+};
+
+
+/****************************
+ * Capabilities
+ */
+
+/* capability structure register group size functions */
+
+static int xen_pt_reg_grp_size_init(XenPCIPassthroughState *s,
+                                    const XenPTRegGroupInfo *grp_reg,
+                                    uint32_t base_offset, uint8_t *size)
+{
+    *size = grp_reg->grp_size;
+    return 0;
+}
+/* get Vendor Specific Capability Structure register group size */
+static int xen_pt_vendor_size_init(XenPCIPassthroughState *s,
+                                   const XenPTRegGroupInfo *grp_reg,
+                                   uint32_t base_offset, uint8_t *size)
+{
+    *size = pci_get_byte(s->dev.config + base_offset + 0x02);
+    return 0;
+}
+/* get PCI Express Capability Structure register group size */
+static int xen_pt_pcie_size_init(XenPCIPassthroughState *s,
+                                 const XenPTRegGroupInfo *grp_reg,
+                                 uint32_t base_offset, uint8_t *size)
+{
+    PCIDevice *d = &s->dev;
+    uint8_t version = get_capability_version(s, base_offset);
+    uint8_t type = get_device_type(s, base_offset);
+    uint8_t pcie_size = 0;
+
+
+    /* calculate size depending on capability version and device/port type */
+    /* in case of PCI Express Base Specification Rev 1.x */
+    if (version == 1) {
+        /* The PCI Express Capabilities, Device Capabilities, and Device
+         * Status/Control registers are required for all PCI Express devices.
+         * The Link Capabilities and Link Status/Control are required for all
+         * Endpoints that are not Root Complex Integrated Endpoints. Endpoints
+         * are not required to implement registers other than those listed
+         * above and terminate the capability structure.
+         */
+        switch (type) {
+        case PCI_EXP_TYPE_ENDPOINT:
+        case PCI_EXP_TYPE_LEG_END:
+            pcie_size = 0x14;
+            break;
+        case PCI_EXP_TYPE_RC_END:
+            /* has no link */
+            pcie_size = 0x0C;
+            break;
+            /* only EndPoint passthrough is supported */
+        case PCI_EXP_TYPE_ROOT_PORT:
+        case PCI_EXP_TYPE_UPSTREAM:
+        case PCI_EXP_TYPE_DOWNSTREAM:
+        case PCI_EXP_TYPE_PCI_BRIDGE:
+        case PCI_EXP_TYPE_PCIE_BRIDGE:
+        case PCI_EXP_TYPE_RC_EC:
+        default:
+            XEN_PT_ERR(d, "Unsupported device/port type %#x.\n", type);
+            return -1;
+        }
+    }
+    /* in case of PCI Express Base Specification Rev 2.0 */
+    else if (version == 2) {
+        switch (type) {
+        case PCI_EXP_TYPE_ENDPOINT:
+        case PCI_EXP_TYPE_LEG_END:
+        case PCI_EXP_TYPE_RC_END:
+            /* For Functions that do not implement the registers,
+             * these spaces must be hardwired to 0b.
+             */
+            pcie_size = 0x3C;
+            break;
+            /* only EndPoint passthrough is supported */
+        case PCI_EXP_TYPE_ROOT_PORT:
+        case PCI_EXP_TYPE_UPSTREAM:
+        case PCI_EXP_TYPE_DOWNSTREAM:
+        case PCI_EXP_TYPE_PCI_BRIDGE:
+        case PCI_EXP_TYPE_PCIE_BRIDGE:
+        case PCI_EXP_TYPE_RC_EC:
+        default:
+            XEN_PT_ERR(d, "Unsupported device/port type %#x.\n", type);
+            return -1;
+        }
+    } else {
+        XEN_PT_ERR(d, "Unsupported capability version %#x.\n", version);
+        return -1;
+    }
+
+    *size = pcie_size;
+    return 0;
+}
+
+static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
+    /* Header Type0 reg group */
+    {
+        .grp_id      = 0xFF,
+        .grp_type    = XEN_PT_GRP_TYPE_EMU,
+        .grp_size    = 0x40,
+        .size_init   = xen_pt_reg_grp_size_init,
+        .emu_regs = xen_pt_emu_reg_header0,
+    },
+    /* PCI PowerManagement Capability reg group */
+    {
+        .grp_id      = PCI_CAP_ID_PM,
+        .grp_type    = XEN_PT_GRP_TYPE_EMU,
+        .grp_size    = PCI_PM_SIZEOF,
+        .size_init   = xen_pt_reg_grp_size_init,
+        .emu_regs = xen_pt_emu_reg_pm,
+    },
+    /* AGP Capability Structure reg group */
+    {
+        .grp_id     = PCI_CAP_ID_AGP,
+        .grp_type   = XEN_PT_GRP_TYPE_HARDWIRED,
+        .grp_size   = 0x30,
+        .size_init  = xen_pt_reg_grp_size_init,
+    },
+    /* Vital Product Data Capability Structure reg group */
+    {
+        .grp_id      = PCI_CAP_ID_VPD,
+        .grp_type    = XEN_PT_GRP_TYPE_EMU,
+        .grp_size    = 0x08,
+        .size_init   = xen_pt_reg_grp_size_init,
+        .emu_regs = xen_pt_emu_reg_vpd,
+    },
+    /* Slot Identification reg group */
+    {
+        .grp_id     = PCI_CAP_ID_SLOTID,
+        .grp_type   = XEN_PT_GRP_TYPE_HARDWIRED,
+        .grp_size   = 0x04,
+        .size_init  = xen_pt_reg_grp_size_init,
+    },
+    /* PCI-X Capabilities List Item reg group */
+    {
+        .grp_id     = PCI_CAP_ID_PCIX,
+        .grp_type   = XEN_PT_GRP_TYPE_HARDWIRED,
+        .grp_size   = 0x18,
+        .size_init  = xen_pt_reg_grp_size_init,
+    },
+    /* Vendor Specific Capability Structure reg group */
+    {
+        .grp_id      = PCI_CAP_ID_VNDR,
+        .grp_type    = XEN_PT_GRP_TYPE_EMU,
+        .grp_size    = 0xFF,
+        .size_init   = xen_pt_vendor_size_init,
+        .emu_regs = xen_pt_emu_reg_vendor,
+    },
+    /* SHPC Capability List Item reg group */
+    {
+        .grp_id     = PCI_CAP_ID_SHPC,
+        .grp_type   = XEN_PT_GRP_TYPE_HARDWIRED,
+        .grp_size   = 0x08,
+        .size_init  = xen_pt_reg_grp_size_init,
+    },
+    /* Subsystem ID and Subsystem Vendor ID Capability List Item reg group */
+    {
+        .grp_id     = PCI_CAP_ID_SSVID,
+        .grp_type   = XEN_PT_GRP_TYPE_HARDWIRED,
+        .grp_size   = 0x08,
+        .size_init  = xen_pt_reg_grp_size_init,
+    },
+    /* AGP 8x Capability Structure reg group */
+    {
+        .grp_id     = PCI_CAP_ID_AGP3,
+        .grp_type   = XEN_PT_GRP_TYPE_HARDWIRED,
+        .grp_size   = 0x30,
+        .size_init  = xen_pt_reg_grp_size_init,
+    },
+    /* PCI Express Capability Structure reg group */
+    {
+        .grp_id      = PCI_CAP_ID_EXP,
+        .grp_type    = XEN_PT_GRP_TYPE_EMU,
+        .grp_size    = 0xFF,
+        .size_init   = xen_pt_pcie_size_init,
+        .emu_regs = xen_pt_emu_reg_pcie,
+    },
+    {
+        .grp_size = 0,
+    },
+};
+
+/* initialize Capabilities Pointer or Next Pointer register */
+static int xen_pt_ptr_reg_init(XenPCIPassthroughState *s,
+                               XenPTRegInfo *reg, uint32_t real_offset,
+                               uint32_t *data)
+{
+    int i;
+    uint8_t *config = s->dev.config;
+    uint32_t reg_field = pci_get_byte(config + real_offset);
+    uint8_t cap_id = 0;
+
+    /* find capability offset */
+    while (reg_field) {
+        for (i = 0; xen_pt_emu_reg_grps[i].grp_size != 0; i++) {
+            if (xen_pt_hide_dev_cap(&s->real_device,
+                                    xen_pt_emu_reg_grps[i].grp_id)) {
+                continue;
+            }
+
+            cap_id = pci_get_byte(config + reg_field + PCI_CAP_LIST_ID);
+            if (xen_pt_emu_reg_grps[i].grp_id == cap_id) {
+                if (xen_pt_emu_reg_grps[i].grp_type == XEN_PT_GRP_TYPE_EMU) {
+                    goto out;
+                }
+                /* ignore the 0 hardwired capability, find next one */
+                break;
+            }
+        }
+
+        /* next capability */
+        reg_field = pci_get_byte(config + reg_field + PCI_CAP_LIST_NEXT);
+    }
+
+out:
+    *data = reg_field;
+    return 0;
+}
+
+
+/*************
+ * Main
+ */
+
+static uint8_t find_cap_offset(XenPCIPassthroughState *s, uint8_t cap)
+{
+    uint8_t id;
+    unsigned max_cap = PCI_CAP_MAX;
+    uint8_t pos = PCI_CAPABILITY_LIST;
+    uint8_t status = 0;
+
+    if (xen_host_pci_get_byte(&s->real_device, PCI_STATUS, &status)) {
+        return 0;
+    }
+    if ((status & PCI_STATUS_CAP_LIST) == 0) {
+        return 0;
+    }
+
+    while (max_cap--) {
+        if (xen_host_pci_get_byte(&s->real_device, pos, &pos)) {
+            break;
+        }
+        if (pos < PCI_CONFIG_HEADER_SIZE) {
+            break;
+        }
+
+        pos &= ~3;
+        if (xen_host_pci_get_byte(&s->real_device,
+                                  pos + PCI_CAP_LIST_ID, &id)) {
+            break;
+        }
+
+        if (id == 0xff) {
+            break;
+        }
+        if (id == cap) {
+            return pos;
+        }
+
+        pos += PCI_CAP_LIST_NEXT;
+    }
+    return 0;
+}
+
+static int xen_pt_config_reg_init(XenPCIPassthroughState *s,
+                                  XenPTRegGroup *reg_grp, XenPTRegInfo *reg)
+{
+    XenPTReg *reg_entry;
+    uint32_t data = 0;
+    int rc = 0;
+
+    reg_entry = g_new0(XenPTReg, 1);
+    reg_entry->reg = reg;
+
+    if (reg->init) {
+        /* initialize emulated register */
+        rc = reg->init(s, reg_entry->reg,
+                       reg_grp->base_offset + reg->offset, &data);
+        if (rc < 0) {
+            g_free(reg_entry);
+            return rc;
+        }
+        if (data == XEN_PT_INVALID_REG) {
+            /* free unused register entry */
+            g_free(reg_entry);
+            return 0;
+        }
+        /* set register value */
+        reg_entry->data = data;
+    }
+    /* add register entry to the list */
+    QLIST_INSERT_HEAD(&reg_grp->reg_tbl_list, reg_entry, entries);
+
+    return 0;
+}
+
+int xen_pt_config_init(XenPCIPassthroughState *s)
+{
+    int i, rc;
+
+    QLIST_INIT(&s->reg_grps);
+
+    for (i = 0; xen_pt_emu_reg_grps[i].grp_size != 0; i++) {
+        uint32_t reg_grp_offset = 0;
+        XenPTRegGroup *reg_grp_entry = NULL;
+
+        if (xen_pt_emu_reg_grps[i].grp_id != 0xFF) {
+            if (xen_pt_hide_dev_cap(&s->real_device,
+                                    xen_pt_emu_reg_grps[i].grp_id)) {
+                continue;
+            }
+
+            reg_grp_offset = find_cap_offset(s, xen_pt_emu_reg_grps[i].grp_id);
+
+            if (!reg_grp_offset) {
+                continue;
+            }
+        }
+
+        reg_grp_entry = g_new0(XenPTRegGroup, 1);
+        QLIST_INIT(&reg_grp_entry->reg_tbl_list);
+        QLIST_INSERT_HEAD(&s->reg_grps, reg_grp_entry, entries);
+
+        reg_grp_entry->base_offset = reg_grp_offset;
+        reg_grp_entry->reg_grp = xen_pt_emu_reg_grps + i;
+        if (xen_pt_emu_reg_grps[i].size_init) {
+            /* get register group size */
+            rc = xen_pt_emu_reg_grps[i].size_init(s, reg_grp_entry->reg_grp,
+                                                  reg_grp_offset,
+                                                  &reg_grp_entry->size);
+            if (rc < 0) {
+                xen_pt_config_delete(s);
+                return rc;
+            }
+        }
+
+        if (xen_pt_emu_reg_grps[i].grp_type == XEN_PT_GRP_TYPE_EMU) {
+            if (xen_pt_emu_reg_grps[i].emu_regs) {
+                int j = 0;
+                XenPTRegInfo *regs = xen_pt_emu_reg_grps[i].emu_regs;
+                /* initialize capability register */
+                for (j = 0; regs->size != 0; j++, regs++) {
+                    /* initialize capability register */
+                    rc = xen_pt_config_reg_init(s, reg_grp_entry, regs);
+                    if (rc < 0) {
+                        xen_pt_config_delete(s);
+                        return rc;
+                    }
+                }
+            }
+        }
+    }
+
+    return 0;
+}
+
+/* delete all emulated registers */
+void xen_pt_config_delete(XenPCIPassthroughState *s)
+{
+    struct XenPTRegGroup *reg_group, *next_grp;
+    struct XenPTReg *reg, *next_reg;
+
+    /* free all register group entries */
+    QLIST_FOREACH_SAFE(reg_group, &s->reg_grps, entries, next_grp) {
+        /* free all register entries */
+        QLIST_FOREACH_SAFE(reg, &reg_group->reg_tbl_list, entries, next_reg) {
+            QLIST_REMOVE(reg, entries);
+            g_free(reg);
+        }
+
+        QLIST_REMOVE(reg_group, entries);
+        g_free(reg_group);
+    }
+}
-- 
Anthony PERARD


* [PATCH V13 7/9] Introduce Xen PCI Passthrough, PCI config space helpers (2/3)
@ 2012-06-14 17:01   ` Anthony PERARD
From: Anthony PERARD @ 2012-06-14 17:01 UTC
  To: QEMU-devel
  Cc: Anthony Liguori, Stefano Stabellini, Jan Kiszka,
	Michael S. Tsirkin, Allen Kay, Xen Devel, Guy Zana,
	Anthony PERARD

From: Allen Kay <allen.m.kay@intel.com>

A more complete history can be found here:
git://xenbits.xensource.com/qemu-xen-unstable.git

Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Signed-off-by: Guy Zana <guy@neocleus.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 hw/xen_pt.c             |   10 +
 hw/xen_pt.h             |    2 +
 hw/xen_pt_config_init.c | 1387 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 1399 insertions(+), 0 deletions(-)

diff --git a/hw/xen_pt.c b/hw/xen_pt.c
index 63a5c80..92ad0fa 100644
--- a/hw/xen_pt.c
+++ b/hw/xen_pt.c
@@ -673,6 +673,13 @@ static int xen_pt_initfn(PCIDevice *d)
     /* Handle real device's MMIO/PIO BARs */
     xen_pt_register_regions(s);
 
+    /* reinitialize each config register to be emulated */
+    if (xen_pt_config_init(s)) {
+        XEN_PT_ERR(d, "PCI Config space initialisation failed.\n");
+        xen_host_pci_device_put(&s->real_device);
+        return -1;
+    }
+
     /* Bind interrupt */
     if (!s->dev.config[PCI_INTERRUPT_PIN]) {
         XEN_PT_LOG(d, "no pin interrupt\n");
@@ -771,6 +778,9 @@ static int xen_pt_unregister_device(PCIDevice *d)
         }
     }
 
+    /* delete all emulated config registers */
+    xen_pt_config_delete(s);
+
     xen_pt_unregister_regions(s);
     memory_listener_unregister(&s->memory_listener);
 
diff --git a/hw/xen_pt.h b/hw/xen_pt.h
index 36001a7..4b76073 100644
--- a/hw/xen_pt.h
+++ b/hw/xen_pt.h
@@ -62,6 +62,8 @@ typedef int (*xen_pt_conf_byte_read)
 #define XEN_PT_BAR_ALLF 0xFFFFFFFF
 #define XEN_PT_BAR_UNMAPPED (-1)
 
+#define PCI_CAP_MAX 48
+
 
 typedef enum {
     XEN_PT_GRP_TYPE_HARDWIRED = 0,  /* 0 Hardwired reg group */
diff --git a/hw/xen_pt_config_init.c b/hw/xen_pt_config_init.c
index 64d22e8..1d97876 100644
--- a/hw/xen_pt_config_init.c
+++ b/hw/xen_pt_config_init.c
@@ -1,11 +1,1398 @@
+/*
+ * Copyright (c) 2007, Neocleus Corporation.
+ * Copyright (c) 2007, Intel Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Alex Novik <alex@neocleus.com>
+ * Allen Kay <allen.m.kay@intel.com>
+ * Guy Zana <guy@neocleus.com>
+ *
+ * This file implements direct PCI assignment to a HVM guest
+ */
+
+#include "qemu-timer.h"
+#include "xen_backend.h"
 #include "xen_pt.h"
 
+#define XEN_PT_MERGE_VALUE(value, data, val_mask) \
+    (((value) & (val_mask)) | ((data) & ~(val_mask)))
+
+#define XEN_PT_INVALID_REG          0xFFFFFFFF      /* invalid register value */
+
+/* prototype */
+
+static int xen_pt_ptr_reg_init(XenPCIPassthroughState *s, XenPTRegInfo *reg,
+                               uint32_t real_offset, uint32_t *data);
+
+
+/* helper */
+
+/* A return value of 1 means the capability should NOT be exposed to guest. */
+static int xen_pt_hide_dev_cap(const XenHostPCIDevice *d, uint8_t grp_id)
+{
+    switch (grp_id) {
+    case PCI_CAP_ID_EXP:
+        /* The PCI Express Capability Structure of the VF of Intel 82599 10GbE
+         * Controller looks trivial, e.g., the PCI Express Capabilities
+         * Register is 0. We should not try to expose it to guest.
+         *
+         * The datasheet is available at
+         * http://download.intel.com/design/network/datashts/82599_datasheet.pdf
+         *
+         * See 'Table 9.7. VF PCIe Configuration Space' of the datasheet, the
+         * PCI Express Capability Structure of the VF of Intel 82599 10GbE
+         * Controller looks trivial, e.g., the PCI Express Capabilities
+         * Register is 0, so the Capability Version is 0 and
+         * xen_pt_pcie_size_init() would fail.
+         */
+        if (d->vendor_id == PCI_VENDOR_ID_INTEL &&
+            d->device_id == PCI_DEVICE_ID_INTEL_82599_SFP_VF) {
+            return 1;
+        }
+        break;
+    }
+    return 0;
+}
+
+/* find emulated register group entry */
 XenPTRegGroup *xen_pt_find_reg_grp(XenPCIPassthroughState *s, uint32_t address)
 {
+    XenPTRegGroup *entry = NULL;
+
+    /* find register group entry */
+    QLIST_FOREACH(entry, &s->reg_grps, entries) {
+        /* check address */
+        if ((entry->base_offset <= address)
+            && ((entry->base_offset + entry->size) > address)) {
+            return entry;
+        }
+    }
+
+    /* group entry not found */
     return NULL;
 }
 
+/* find emulated register entry */
 XenPTReg *xen_pt_find_reg(XenPTRegGroup *reg_grp, uint32_t address)
 {
+    XenPTReg *reg_entry = NULL;
+    XenPTRegInfo *reg = NULL;
+    uint32_t real_offset = 0;
+
+    /* find register entry */
+    QLIST_FOREACH(reg_entry, &reg_grp->reg_tbl_list, entries) {
+        reg = reg_entry->reg;
+        real_offset = reg_grp->base_offset + reg->offset;
+        /* check address */
+        if ((real_offset <= address)
+            && ((real_offset + reg->size) > address)) {
+            return reg_entry;
+        }
+    }
+
     return NULL;
 }
+
+
+/****************
+ * general register functions
+ */
+
+/* register initialization function */
+
+static int xen_pt_common_reg_init(XenPCIPassthroughState *s,
+                                  XenPTRegInfo *reg, uint32_t real_offset,
+                                  uint32_t *data)
+{
+    *data = reg->init_val;
+    return 0;
+}
+
+/* Read register functions */
+
+static int xen_pt_byte_reg_read(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
+                                uint8_t *value, uint8_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint8_t valid_emu_mask = 0;
+
+    /* emulate byte register */
+    valid_emu_mask = reg->emu_mask & valid_mask;
+    *value = XEN_PT_MERGE_VALUE(*value, cfg_entry->data, ~valid_emu_mask);
+
+    return 0;
+}
+static int xen_pt_word_reg_read(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
+                                uint16_t *value, uint16_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint16_t valid_emu_mask = 0;
+
+    /* emulate word register */
+    valid_emu_mask = reg->emu_mask & valid_mask;
+    *value = XEN_PT_MERGE_VALUE(*value, cfg_entry->data, ~valid_emu_mask);
+
+    return 0;
+}
+static int xen_pt_long_reg_read(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
+                                uint32_t *value, uint32_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint32_t valid_emu_mask = 0;
+
+    /* emulate long register */
+    valid_emu_mask = reg->emu_mask & valid_mask;
+    *value = XEN_PT_MERGE_VALUE(*value, cfg_entry->data, ~valid_emu_mask);
+
+    return 0;
+}
+
+/* Write register functions */
+
+static int xen_pt_byte_reg_write(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
+                                 uint8_t *val, uint8_t dev_value,
+                                 uint8_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint8_t writable_mask = 0;
+    uint8_t throughable_mask = 0;
+
+    /* modify emulated register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask;
+    cfg_entry->data = XEN_PT_MERGE_VALUE(*val, cfg_entry->data, writable_mask);
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *val = XEN_PT_MERGE_VALUE(*val, dev_value, throughable_mask);
+
+    return 0;
+}
+static int xen_pt_word_reg_write(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
+                                 uint16_t *val, uint16_t dev_value,
+                                 uint16_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint16_t writable_mask = 0;
+    uint16_t throughable_mask = 0;
+
+    /* modify emulated register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask;
+    cfg_entry->data = XEN_PT_MERGE_VALUE(*val, cfg_entry->data, writable_mask);
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *val = XEN_PT_MERGE_VALUE(*val, dev_value, throughable_mask);
+
+    return 0;
+}
+static int xen_pt_long_reg_write(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
+                                 uint32_t *val, uint32_t dev_value,
+                                 uint32_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint32_t writable_mask = 0;
+    uint32_t throughable_mask = 0;
+
+    /* modify emulated register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask;
+    cfg_entry->data = XEN_PT_MERGE_VALUE(*val, cfg_entry->data, writable_mask);
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *val = XEN_PT_MERGE_VALUE(*val, dev_value, throughable_mask);
+
+    return 0;
+}
+
+
+/* XenPTRegInfo declaration
+ * - only for emulated registers (either some bits or the whole register).
+ * - for passthrough registers that need special behavior (like interacting
+ *   with other components), set emu_mask to all 0 and specify r/w functions.
+ * - do NOT use all F's for init_val, otherwise the entry is not registered.
+ */
+
+/********************
+ * Header Type0
+ */
+
+static int xen_pt_vendor_reg_init(XenPCIPassthroughState *s,
+                                  XenPTRegInfo *reg, uint32_t real_offset,
+                                  uint32_t *data)
+{
+    *data = s->real_device.vendor_id;
+    return 0;
+}
+static int xen_pt_device_reg_init(XenPCIPassthroughState *s,
+                                  XenPTRegInfo *reg, uint32_t real_offset,
+                                  uint32_t *data)
+{
+    *data = s->real_device.device_id;
+    return 0;
+}
+static int xen_pt_status_reg_init(XenPCIPassthroughState *s,
+                                  XenPTRegInfo *reg, uint32_t real_offset,
+                                  uint32_t *data)
+{
+    XenPTRegGroup *reg_grp_entry = NULL;
+    XenPTReg *reg_entry = NULL;
+    uint32_t reg_field = 0;
+
+    /* find Header register group */
+    reg_grp_entry = xen_pt_find_reg_grp(s, PCI_CAPABILITY_LIST);
+    if (reg_grp_entry) {
+        /* find Capabilities Pointer register */
+        reg_entry = xen_pt_find_reg(reg_grp_entry, PCI_CAPABILITY_LIST);
+        if (reg_entry) {
+            /* check Capabilities Pointer register */
+            if (reg_entry->data) {
+                reg_field |= PCI_STATUS_CAP_LIST;
+            } else {
+                reg_field &= ~PCI_STATUS_CAP_LIST;
+            }
+        } else {
+            xen_shutdown_fatal_error("Internal error: Couldn't find XenPTReg*"
+                                     " for Capabilities Pointer register."
+                                     " (%s)\n", __func__);
+            return -1;
+        }
+    } else {
+        xen_shutdown_fatal_error("Internal error: Couldn't find XenPTRegGroup"
+                                 " for Header. (%s)\n", __func__);
+        return -1;
+    }
+
+    *data = reg_field;
+    return 0;
+}
+static int xen_pt_header_type_reg_init(XenPCIPassthroughState *s,
+                                       XenPTRegInfo *reg, uint32_t real_offset,
+                                       uint32_t *data)
+{
+    /* set the multi-function bit of PCI_HEADER_TYPE */
+    *data = reg->init_val | 0x80;
+    return 0;
+}
+
+/* initialize Interrupt Pin register */
+static int xen_pt_irqpin_reg_init(XenPCIPassthroughState *s,
+                                  XenPTRegInfo *reg, uint32_t real_offset,
+                                  uint32_t *data)
+{
+    *data = xen_pt_pci_read_intx(s);
+    return 0;
+}
+
+/* Command register */
+static int xen_pt_cmd_reg_read(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
+                               uint16_t *value, uint16_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint16_t valid_emu_mask = 0;
+    uint16_t emu_mask = reg->emu_mask;
+
+    if (s->is_virtfn) {
+        emu_mask |= PCI_COMMAND_MEMORY;
+    }
+
+    /* emulate word register */
+    valid_emu_mask = emu_mask & valid_mask;
+    *value = XEN_PT_MERGE_VALUE(*value, cfg_entry->data, ~valid_emu_mask);
+
+    return 0;
+}
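The read handler above merges the emulated copy of the register into the value read from the device. A minimal sketch of that merge, assuming `XEN_PT_MERGE_VALUE(value, data, mask)` (defined in xen_pt.h, outside this hunk) keeps the bits of `value` selected by `mask` and takes the remaining bits from `data`; `merge_value` and `emu_word_read` are hypothetical names used only for illustration:

```c
#include <stdint.h>

/* Hypothetical stand-in for XEN_PT_MERGE_VALUE: bits of `value` where
 * `mask` is 1, bits of `data` where `mask` is 0. */
static uint16_t merge_value(uint16_t value, uint16_t data, uint16_t mask)
{
    return (uint16_t)((value & mask) | (data & (uint16_t)~mask));
}

/* Guest read of a 16-bit register: emulated bits (emu_mask, limited to
 * the bytes actually accessed) come from the emulated copy, all other
 * bits come from the value read from the real device. */
static uint16_t emu_word_read(uint16_t dev_value, uint16_t emu_data,
                              uint16_t emu_mask, uint16_t valid_mask)
{
    uint16_t valid_emu_mask = emu_mask & valid_mask;

    return merge_value(dev_value, emu_data, (uint16_t)~valid_emu_mask);
}
```

With the Command register's `emu_mask` of 0x0740, a read returns INTx Disable (0x0400) from the emulated copy; for a virtual function the handler also ORs in `PCI_COMMAND_MEMORY` (0x0002), so the Memory Space bit is emulated as well.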
+
+static int xen_pt_cmd_reg_write(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
+                                uint16_t *val, uint16_t dev_value,
+                                uint16_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint16_t writable_mask = 0;
+    uint16_t throughable_mask = 0;
+    uint16_t emu_mask = reg->emu_mask;
+
+    if (s->is_virtfn) {
+        emu_mask |= PCI_COMMAND_MEMORY;
+    }
+
+    /* update the emulated register */
+    writable_mask = ~reg->ro_mask & valid_mask;
+    cfg_entry->data = XEN_PT_MERGE_VALUE(*val, cfg_entry->data, writable_mask);
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~emu_mask & valid_mask;
+
+    if (*val & PCI_COMMAND_INTX_DISABLE) {
+        throughable_mask |= PCI_COMMAND_INTX_DISABLE;
+    } else {
+        if (s->machine_irq) {
+            throughable_mask |= PCI_COMMAND_INTX_DISABLE;
+        }
+    }
+
+    *val = XEN_PT_MERGE_VALUE(*val, dev_value, throughable_mask);
+
+    return 0;
+}
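The write handler splits one guest write into an update of the emulated copy and a value for the real device, with a special case for INTx Disable. A minimal sketch of that logic, assuming the same merge semantics as `XEN_PT_MERGE_VALUE`; `merge16`, `cmd_write` and the `has_machine_irq` flag (standing in for `s->machine_irq != 0`) are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define CMD_INTX_DISABLE 0x0400 /* value of PCI_COMMAND_INTX_DISABLE */

/* bits of `value` where `mask` is 1, bits of `data` elsewhere */
static uint16_t merge16(uint16_t value, uint16_t data, uint16_t mask)
{
    return (uint16_t)((value & mask) | (data & (uint16_t)~mask));
}

/* Mirror of the Command register write path: updates *emu_data and
 * returns the value that would be written to the real device. */
static uint16_t cmd_write(uint16_t guest_val, uint16_t dev_value,
                          uint16_t *emu_data, uint16_t ro_mask,
                          uint16_t emu_mask, uint16_t valid_mask,
                          bool has_machine_irq)
{
    uint16_t writable_mask = (uint16_t)(~ro_mask & valid_mask);
    uint16_t throughable_mask = (uint16_t)(~emu_mask & valid_mask);

    *emu_data = merge16(guest_val, *emu_data, writable_mask);

    /* INTx Disable is emulated, but the guest's value still reaches the
     * device when the guest sets it, or when a machine IRQ is bound. */
    if ((guest_val & CMD_INTX_DISABLE) || has_machine_irq) {
        throughable_mask |= CMD_INTX_DISABLE;
    }
    return merge16(guest_val, dev_value, throughable_mask);
}
```

The net effect is that a guest clearing INTx Disable only reaches the hardware when a machine IRQ is actually bound; otherwise the device keeps its current setting.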
+
+/* BAR */
+#define XEN_PT_BAR_MEM_RO_MASK    0x0000000F  /* BAR ReadOnly mask(Memory) */
+#define XEN_PT_BAR_MEM_EMU_MASK   0xFFFFFFF0  /* BAR emul mask(Memory) */
+#define XEN_PT_BAR_IO_RO_MASK     0x00000003  /* BAR ReadOnly mask(I/O) */
+#define XEN_PT_BAR_IO_EMU_MASK    0xFFFFFFFC  /* BAR emul mask(I/O) */
+
+static XenPTBarFlag xen_pt_bar_reg_parse(XenPCIPassthroughState *s,
+                                         XenPTRegInfo *reg)
+{
+    PCIDevice *d = &s->dev;
+    XenPTRegion *region = NULL;
+    PCIIORegion *r;
+    int index = 0;
+
+    /* is this BAR the upper half of a 64-bit memory BAR? */
+    index = xen_pt_bar_offset_to_index(reg->offset);
+    if ((0 < index) && (index < PCI_ROM_SLOT)) {
+        int type = s->real_device.io_regions[index - 1].type;
+
+        if ((type & XEN_HOST_PCI_REGION_TYPE_MEM)
+            && (type & XEN_HOST_PCI_REGION_TYPE_MEM_64)) {
+            region = &s->bases[index - 1];
+            if (region->bar_flag != XEN_PT_BAR_FLAG_UPPER) {
+                return XEN_PT_BAR_FLAG_UPPER;
+            }
+        }
+    }
+
+    /* check unused BAR */
+    r = &d->io_regions[index];
+    if (r->size == 0) {
+        return XEN_PT_BAR_FLAG_UNUSED;
+    }
+
+    /* for ExpROM BAR */
+    if (index == PCI_ROM_SLOT) {
+        return XEN_PT_BAR_FLAG_MEM;
+    }
+
+    /* check BAR I/O indicator */
+    if (s->real_device.io_regions[index].type & XEN_HOST_PCI_REGION_TYPE_IO) {
+        return XEN_PT_BAR_FLAG_IO;
+    } else {
+        return XEN_PT_BAR_FLAG_MEM;
+    }
+}
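`xen_pt_bar_reg_parse()` relies on the BARs being initialized in table order: the slot after a 64-bit memory BAR holds its upper 32 bits. A minimal sketch of that classification over all six standard BARs, using a hypothetical `struct host_bar` and `enum bar_kind` in place of `XenHostPCIIORegion` and `XenPTBarFlag` (the Expansion ROM slot is not modelled):

```c
#include <stdbool.h>

enum bar_kind { BAR_UNUSED, BAR_IO, BAR_MEM, BAR_UPPER };

/* Hypothetical per-BAR description of the host device. */
struct host_bar {
    unsigned size;   /* 0 if the BAR is not implemented */
    bool is_io;
    bool is_mem64;
};

/* Classify the six standard BARs in order. A slot following a 64-bit
 * memory BAR is its upper half, unless that BAR was itself classified
 * as an upper half. */
static void classify_bars(const struct host_bar bars[6], enum bar_kind out[6])
{
    for (int i = 0; i < 6; i++) {
        if (i > 0 && !bars[i - 1].is_io && bars[i - 1].is_mem64
            && out[i - 1] != BAR_UPPER) {
            out[i] = BAR_UPPER;
        } else if (bars[i].size == 0) {
            out[i] = BAR_UNUSED;
        } else if (bars[i].is_io) {
            out[i] = BAR_IO;
        } else {
            out[i] = BAR_MEM;
        }
    }
}
```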
+
+static inline uint32_t base_address_with_flags(XenHostPCIIORegion *hr)
+{
+    if (hr->type & XEN_HOST_PCI_REGION_TYPE_IO) {
+        return hr->base_addr | (hr->bus_flags & ~PCI_BASE_ADDRESS_IO_MASK);
+    } else {
+        return hr->base_addr | (hr->bus_flags & ~PCI_BASE_ADDRESS_MEM_MASK);
+    }
+}
+
+static int xen_pt_bar_reg_init(XenPCIPassthroughState *s, XenPTRegInfo *reg,
+                               uint32_t real_offset, uint32_t *data)
+{
+    uint32_t reg_field = 0;
+    int index;
+
+    index = xen_pt_bar_offset_to_index(reg->offset);
+    if (index < 0 || index >= PCI_NUM_REGIONS) {
+        XEN_PT_ERR(&s->dev, "Internal error: Invalid BAR index [%d].\n", index);
+        return -1;
+    }
+
+    /* set BAR flag */
+    s->bases[index].bar_flag = xen_pt_bar_reg_parse(s, reg);
+    if (s->bases[index].bar_flag == XEN_PT_BAR_FLAG_UNUSED) {
+        reg_field = XEN_PT_INVALID_REG;
+    }
+
+    *data = reg_field;
+    return 0;
+}
+
+static int xen_pt_bar_reg_read(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
+                               uint32_t *value, uint32_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint32_t valid_emu_mask = 0;
+    uint32_t bar_emu_mask = 0;
+    int index;
+
+    /* get BAR index */
+    index = xen_pt_bar_offset_to_index(reg->offset);
+    if (index < 0 || index >= PCI_NUM_REGIONS) {
+        XEN_PT_ERR(&s->dev, "Internal error: Invalid BAR index [%d].\n", index);
+        return -1;
+    }
+
+    /* use fixed-up value from kernel sysfs */
+    *value = base_address_with_flags(&s->real_device.io_regions[index]);
+
+    /* set the emulation mask depending on the BAR flag */
+    switch (s->bases[index].bar_flag) {
+    case XEN_PT_BAR_FLAG_MEM:
+        bar_emu_mask = XEN_PT_BAR_MEM_EMU_MASK;
+        break;
+    case XEN_PT_BAR_FLAG_IO:
+        bar_emu_mask = XEN_PT_BAR_IO_EMU_MASK;
+        break;
+    case XEN_PT_BAR_FLAG_UPPER:
+        bar_emu_mask = XEN_PT_BAR_ALLF;
+        break;
+    default:
+        break;
+    }
+
+    /* emulate BAR */
+    valid_emu_mask = bar_emu_mask & valid_mask;
+    *value = XEN_PT_MERGE_VALUE(*value, cfg_entry->data, ~valid_emu_mask);
+
+    return 0;
+}
+
+static int xen_pt_bar_reg_write(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
+                                uint32_t *val, uint32_t dev_value,
+                                uint32_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    XenPTRegion *base = NULL;
+    PCIDevice *d = &s->dev;
+    const PCIIORegion *r;
+    uint32_t writable_mask = 0;
+    uint32_t throughable_mask = 0;
+    uint32_t bar_emu_mask = 0;
+    uint32_t bar_ro_mask = 0;
+    uint32_t r_size = 0;
+    int index = 0;
+
+    index = xen_pt_bar_offset_to_index(reg->offset);
+    if (index < 0 || index >= PCI_NUM_REGIONS) {
+        XEN_PT_ERR(d, "Internal error: Invalid BAR index [%d].\n", index);
+        return -1;
+    }
+
+    r = &d->io_regions[index];
+    base = &s->bases[index];
+    r_size = xen_pt_get_emul_size(base->bar_flag, r->size);
+
+    /* set the emulation and read-only masks depending on the BAR flag */
+    switch (s->bases[index].bar_flag) {
+    case XEN_PT_BAR_FLAG_MEM:
+        bar_emu_mask = XEN_PT_BAR_MEM_EMU_MASK;
+        bar_ro_mask = XEN_PT_BAR_MEM_RO_MASK | (r_size - 1);
+        break;
+    case XEN_PT_BAR_FLAG_IO:
+        bar_emu_mask = XEN_PT_BAR_IO_EMU_MASK;
+        bar_ro_mask = XEN_PT_BAR_IO_RO_MASK | (r_size - 1);
+        break;
+    case XEN_PT_BAR_FLAG_UPPER:
+        bar_emu_mask = XEN_PT_BAR_ALLF;
+        bar_ro_mask = 0;    /* all upper 32 bits are R/W */
+        break;
+    default:
+        break;
+    }
+
+    /* update the emulated register */
+    writable_mask = bar_emu_mask & ~bar_ro_mask & valid_mask;
+    cfg_entry->data = XEN_PT_MERGE_VALUE(*val, cfg_entry->data, writable_mask);
+
+    /* check whether we need to update the virtual region address or not */
+    switch (s->bases[index].bar_flag) {
+    case XEN_PT_BAR_FLAG_MEM:
+        /* nothing to do */
+        break;
+    case XEN_PT_BAR_FLAG_IO:
+        /* nothing to do */
+        break;
+    case XEN_PT_BAR_FLAG_UPPER:
+        if (cfg_entry->data) {
+            if (cfg_entry->data != (XEN_PT_BAR_ALLF & ~bar_ro_mask)) {
+                XEN_PT_WARN(d, "Guest attempted to set high MMIO Base "
+                            "Address; ignoring mapping. "
+                            "(offset: 0x%02x, high address: 0x%08x)\n",
+                            reg->offset, cfg_entry->data);
+            }
+        }
+        break;
+    default:
+        break;
+    }
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~bar_emu_mask & valid_mask;
+    *val = XEN_PT_MERGE_VALUE(*val, dev_value, throughable_mask);
+
+    return 0;
+}
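Folding `r_size - 1` into the read-only mask is what lets a guest size a memory BAR: writing all ones leaves the low address bits at 0 in the emulated copy. A minimal sketch for a 32-bit memory BAR, assuming the size has already been adjusted the way `xen_pt_get_emul_size()` does (not modelled here); `mem_bar_write` is a hypothetical name and the two masks copy the `XEN_PT_BAR_MEM_*` values above:

```c
#include <assert.h>
#include <stdint.h>

#define BAR_MEM_RO_MASK  0x0000000Fu /* as XEN_PT_BAR_MEM_RO_MASK */
#define BAR_MEM_EMU_MASK 0xFFFFFFF0u /* as XEN_PT_BAR_MEM_EMU_MASK */

/* Emulated copy of a 32-bit memory BAR after a guest write. */
static uint32_t mem_bar_write(uint32_t emu_data, uint32_t guest_val,
                              uint32_t region_size)
{
    /* BAR region sizes are powers of two */
    assert(region_size != 0 && (region_size & (region_size - 1)) == 0);

    uint32_t ro_mask = BAR_MEM_RO_MASK | (region_size - 1);
    uint32_t writable_mask = BAR_MEM_EMU_MASK & ~ro_mask;

    return (guest_val & writable_mask) | (emu_data & ~writable_mask);
}
```

For a 4 KiB region, writing 0xFFFFFFFF yields 0xFFFFF000, and the guest recovers the size as `~0xFFFFF000 + 1 = 0x1000`; the flag bits it reads back come from the device, since they are outside the emulation mask.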
+
+/* write Exp ROM BAR */
+static int xen_pt_exp_rom_bar_reg_write(XenPCIPassthroughState *s,
+                                        XenPTReg *cfg_entry, uint32_t *val,
+                                        uint32_t dev_value, uint32_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    XenPTRegion *base = NULL;
+    PCIDevice *d = &s->dev;
+    uint32_t writable_mask = 0;
+    uint32_t throughable_mask = 0;
+    pcibus_t r_size = 0;
+    uint32_t bar_emu_mask = 0;
+    uint32_t bar_ro_mask = 0;
+
+    r_size = d->io_regions[PCI_ROM_SLOT].size;
+    base = &s->bases[PCI_ROM_SLOT];
+    /* align memory type resource size */
+    r_size = xen_pt_get_emul_size(base->bar_flag, r_size);
+
+    /* set emulate mask and read-only mask */
+    bar_emu_mask = reg->emu_mask;
+    bar_ro_mask = (reg->ro_mask | (r_size - 1)) & ~PCI_ROM_ADDRESS_ENABLE;
+
+    /* update the emulated register */
+    writable_mask = ~bar_ro_mask & valid_mask;
+    cfg_entry->data = XEN_PT_MERGE_VALUE(*val, cfg_entry->data, writable_mask);
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~bar_emu_mask & valid_mask;
+    *val = XEN_PT_MERGE_VALUE(*val, dev_value, throughable_mask);
+
+    return 0;
+}
+
+/* Header Type 0 reg static information table */
+static XenPTRegInfo xen_pt_emu_reg_header0[] = {
+    /* Vendor ID reg */
+    {
+        .offset     = PCI_VENDOR_ID,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xFFFF,
+        .emu_mask   = 0xFFFF,
+        .init       = xen_pt_vendor_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_word_reg_write,
+    },
+    /* Device ID reg */
+    {
+        .offset     = PCI_DEVICE_ID,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xFFFF,
+        .emu_mask   = 0xFFFF,
+        .init       = xen_pt_device_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_word_reg_write,
+    },
+    /* Command reg */
+    {
+        .offset     = PCI_COMMAND,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xF880,
+        .emu_mask   = 0x0740,
+        .init       = xen_pt_common_reg_init,
+        .u.w.read   = xen_pt_cmd_reg_read,
+        .u.w.write  = xen_pt_cmd_reg_write,
+    },
+    /* Capabilities Pointer reg */
+    {
+        .offset     = PCI_CAPABILITY_LIST,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = xen_pt_ptr_reg_init,
+        .u.b.read   = xen_pt_byte_reg_read,
+        .u.b.write  = xen_pt_byte_reg_write,
+    },
+    /* Status reg */
+    /* initialized from the emulated Cap Ptr value,
+     * so it must be declared after the Cap Ptr reg
+     */
+    {
+        .offset     = PCI_STATUS,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0x06FF,
+        .emu_mask   = 0x0010,
+        .init       = xen_pt_status_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_word_reg_write,
+    },
+    /* Cache Line Size reg */
+    {
+        .offset     = PCI_CACHE_LINE_SIZE,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0x00,
+        .emu_mask   = 0xFF,
+        .init       = xen_pt_common_reg_init,
+        .u.b.read   = xen_pt_byte_reg_read,
+        .u.b.write  = xen_pt_byte_reg_write,
+    },
+    /* Latency Timer reg */
+    {
+        .offset     = PCI_LATENCY_TIMER,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0x00,
+        .emu_mask   = 0xFF,
+        .init       = xen_pt_common_reg_init,
+        .u.b.read   = xen_pt_byte_reg_read,
+        .u.b.write  = xen_pt_byte_reg_write,
+    },
+    /* Header Type reg */
+    {
+        .offset     = PCI_HEADER_TYPE,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0x00,
+        .init       = xen_pt_header_type_reg_init,
+        .u.b.read   = xen_pt_byte_reg_read,
+        .u.b.write  = xen_pt_byte_reg_write,
+    },
+    /* Interrupt Line reg */
+    {
+        .offset     = PCI_INTERRUPT_LINE,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0x00,
+        .emu_mask   = 0xFF,
+        .init       = xen_pt_common_reg_init,
+        .u.b.read   = xen_pt_byte_reg_read,
+        .u.b.write  = xen_pt_byte_reg_write,
+    },
+    /* Interrupt Pin reg */
+    {
+        .offset     = PCI_INTERRUPT_PIN,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = xen_pt_irqpin_reg_init,
+        .u.b.read   = xen_pt_byte_reg_read,
+        .u.b.write  = xen_pt_byte_reg_write,
+    },
+    /* BAR 0 reg */
+    /* BAR masks are set by the read/write handlers from the IO/MEM type */
+    {
+        .offset     = PCI_BASE_ADDRESS_0,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .init       = xen_pt_bar_reg_init,
+        .u.dw.read  = xen_pt_bar_reg_read,
+        .u.dw.write = xen_pt_bar_reg_write,
+    },
+    /* BAR 1 reg */
+    {
+        .offset     = PCI_BASE_ADDRESS_1,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .init       = xen_pt_bar_reg_init,
+        .u.dw.read  = xen_pt_bar_reg_read,
+        .u.dw.write = xen_pt_bar_reg_write,
+    },
+    /* BAR 2 reg */
+    {
+        .offset     = PCI_BASE_ADDRESS_2,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .init       = xen_pt_bar_reg_init,
+        .u.dw.read  = xen_pt_bar_reg_read,
+        .u.dw.write = xen_pt_bar_reg_write,
+    },
+    /* BAR 3 reg */
+    {
+        .offset     = PCI_BASE_ADDRESS_3,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .init       = xen_pt_bar_reg_init,
+        .u.dw.read  = xen_pt_bar_reg_read,
+        .u.dw.write = xen_pt_bar_reg_write,
+    },
+    /* BAR 4 reg */
+    {
+        .offset     = PCI_BASE_ADDRESS_4,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .init       = xen_pt_bar_reg_init,
+        .u.dw.read  = xen_pt_bar_reg_read,
+        .u.dw.write = xen_pt_bar_reg_write,
+    },
+    /* BAR 5 reg */
+    {
+        .offset     = PCI_BASE_ADDRESS_5,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .init       = xen_pt_bar_reg_init,
+        .u.dw.read  = xen_pt_bar_reg_read,
+        .u.dw.write = xen_pt_bar_reg_write,
+    },
+    /* Expansion ROM BAR reg */
+    {
+        .offset     = PCI_ROM_ADDRESS,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .ro_mask    = 0x000007FE,
+        .emu_mask   = 0xFFFFF800,
+        .init       = xen_pt_bar_reg_init,
+        .u.dw.read  = xen_pt_long_reg_read,
+        .u.dw.write = xen_pt_exp_rom_bar_reg_write,
+    },
+    {
+        .size = 0,
+    },
+};
+
+
+/*********************************
+ * Vital Product Data Capability
+ */
+
+/* Vital Product Data Capability Structure reg static information table */
+static XenPTRegInfo xen_pt_emu_reg_vpd[] = {
+    {
+        .offset     = PCI_CAP_LIST_NEXT,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = xen_pt_ptr_reg_init,
+        .u.b.read   = xen_pt_byte_reg_read,
+        .u.b.write  = xen_pt_byte_reg_write,
+    },
+    {
+        .size = 0,
+    },
+};
+
+
+/**************************************
+ * Vendor Specific Capability
+ */
+
+/* Vendor Specific Capability Structure reg static information table */
+static XenPTRegInfo xen_pt_emu_reg_vendor[] = {
+    {
+        .offset     = PCI_CAP_LIST_NEXT,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = xen_pt_ptr_reg_init,
+        .u.b.read   = xen_pt_byte_reg_read,
+        .u.b.write  = xen_pt_byte_reg_write,
+    },
+    {
+        .size = 0,
+    },
+};
+
+
+/*****************************
+ * PCI Express Capability
+ */
+
+static inline uint8_t get_capability_version(XenPCIPassthroughState *s,
+                                             uint32_t offset)
+{
+    uint8_t flags = pci_get_byte(s->dev.config + offset + PCI_EXP_FLAGS);
+    return flags & PCI_EXP_FLAGS_VERS;
+}
+
+static inline uint8_t get_device_type(XenPCIPassthroughState *s,
+                                      uint32_t offset)
+{
+    uint8_t flags = pci_get_byte(s->dev.config + offset + PCI_EXP_FLAGS);
+    return (flags & PCI_EXP_FLAGS_TYPE) >> 4;
+}
+
+/* initialize Link Control register */
+static int xen_pt_linkctrl_reg_init(XenPCIPassthroughState *s,
+                                    XenPTRegInfo *reg, uint32_t real_offset,
+                                    uint32_t *data)
+{
+    uint8_t cap_ver = get_capability_version(s, real_offset - reg->offset);
+    uint8_t dev_type = get_device_type(s, real_offset - reg->offset);
+
+    /* no need to initialize in case of Root Complex Integrated Endpoint
+     * with cap_ver 1.x
+     */
+    if ((dev_type == PCI_EXP_TYPE_RC_END) && (cap_ver == 1)) {
+        *data = XEN_PT_INVALID_REG;
+        return 0;
+    }
+
+    *data = reg->init_val;
+    return 0;
+}
+
+/* initialize Device Control 2 register */
+static int xen_pt_devctrl2_reg_init(XenPCIPassthroughState *s,
+                                    XenPTRegInfo *reg, uint32_t real_offset,
+                                    uint32_t *data)
+{
+    uint8_t cap_ver = get_capability_version(s, real_offset - reg->offset);
+
+    /* no need to initialize in case of cap_ver 1.x */
+    if (cap_ver == 1) {
+        *data = XEN_PT_INVALID_REG;
+        return 0;
+    }
+
+    *data = reg->init_val;
+    return 0;
+}
+
+/* initialize Link Control 2 register */
+static int xen_pt_linkctrl2_reg_init(XenPCIPassthroughState *s,
+                                     XenPTRegInfo *reg, uint32_t real_offset,
+                                     uint32_t *data)
+{
+    uint8_t cap_ver = get_capability_version(s, real_offset - reg->offset);
+    uint32_t reg_field = 0;
+
+    /* no need to initialize in case of cap_ver 1.x */
+    if (cap_ver == 1) {
+        reg_field = XEN_PT_INVALID_REG;
+    } else {
+        /* set Supported Link Speed */
+        uint8_t lnkcap = pci_get_byte(s->dev.config + real_offset - reg->offset
+                                      + PCI_EXP_LNKCAP);
+        reg_field |= PCI_EXP_LNKCAP_SLS & lnkcap;
+    }
+
+    *data = reg_field;
+    return 0;
+}
+
+/* PCI Express Capability Structure reg static information table */
+static XenPTRegInfo xen_pt_emu_reg_pcie[] = {
+    /* Next Pointer reg */
+    {
+        .offset     = PCI_CAP_LIST_NEXT,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = xen_pt_ptr_reg_init,
+        .u.b.read   = xen_pt_byte_reg_read,
+        .u.b.write  = xen_pt_byte_reg_write,
+    },
+    /* Device Capabilities reg */
+    {
+        .offset     = PCI_EXP_DEVCAP,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .ro_mask    = 0x1FFCFFFF,
+        .emu_mask   = 0x10000000,
+        .init       = xen_pt_common_reg_init,
+        .u.dw.read  = xen_pt_long_reg_read,
+        .u.dw.write = xen_pt_long_reg_write,
+    },
+    /* Device Control reg */
+    {
+        .offset     = PCI_EXP_DEVCTL,
+        .size       = 2,
+        .init_val   = 0x2810,
+        .ro_mask    = 0x8400,
+        .emu_mask   = 0xFFFF,
+        .init       = xen_pt_common_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_word_reg_write,
+    },
+    /* Link Control reg */
+    {
+        .offset     = PCI_EXP_LNKCTL,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xFC34,
+        .emu_mask   = 0xFFFF,
+        .init       = xen_pt_linkctrl_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_word_reg_write,
+    },
+    /* Device Control 2 reg */
+    {
+        .offset     = 0x28,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xFFE0,
+        .emu_mask   = 0xFFFF,
+        .init       = xen_pt_devctrl2_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_word_reg_write,
+    },
+    /* Link Control 2 reg */
+    {
+        .offset     = 0x30,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xE040,
+        .emu_mask   = 0xFFFF,
+        .init       = xen_pt_linkctrl2_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_word_reg_write,
+    },
+    {
+        .size = 0,
+    },
+};
+
+
+/*********************************
+ * Power Management Capability
+ */
+
+/* read Power Management Control/Status register */
+static int xen_pt_pmcsr_reg_read(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
+                                 uint16_t *value, uint16_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint16_t valid_emu_mask = reg->emu_mask;
+
+    valid_emu_mask |= PCI_PM_CTRL_STATE_MASK | PCI_PM_CTRL_NO_SOFT_RESET;
+
+    valid_emu_mask = valid_emu_mask & valid_mask;
+    *value = XEN_PT_MERGE_VALUE(*value, cfg_entry->data, ~valid_emu_mask);
+
+    return 0;
+}
+
+/* write Power Management Control/Status register */
+static int xen_pt_pmcsr_reg_write(XenPCIPassthroughState *s,
+                                  XenPTReg *cfg_entry, uint16_t *val,
+                                  uint16_t dev_value, uint16_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint16_t emu_mask = reg->emu_mask;
+    uint16_t writable_mask = 0;
+    uint16_t throughable_mask = 0;
+
+    emu_mask |= PCI_PM_CTRL_STATE_MASK | PCI_PM_CTRL_NO_SOFT_RESET;
+
+    /* update the emulated register */
+    writable_mask = emu_mask & ~reg->ro_mask & valid_mask;
+    cfg_entry->data = XEN_PT_MERGE_VALUE(*val, cfg_entry->data, writable_mask);
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~emu_mask & valid_mask;
+    *val = XEN_PT_MERGE_VALUE(*val, dev_value, throughable_mask);
+
+    return 0;
+}
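Because the PMCSR handlers always add the PowerState field and No_Soft_Reset to the emulation mask, a guest's power-state change is recorded only in the emulated copy and never reaches the device. A minimal sketch using the masks from the table below (ro 0xE1FC, emu 0x8100); `merge16` and `pmcsr_write` are hypothetical, and the two constants copy the values of `PCI_PM_CTRL_STATE_MASK` and `PCI_PM_CTRL_NO_SOFT_RESET`:

```c
#include <stdint.h>

#define PM_CTRL_STATE_MASK    0x0003 /* value of PCI_PM_CTRL_STATE_MASK */
#define PM_CTRL_NO_SOFT_RESET 0x0008 /* value of PCI_PM_CTRL_NO_SOFT_RESET */

/* bits of `value` where `mask` is 1, bits of `data` elsewhere */
static uint16_t merge16(uint16_t value, uint16_t data, uint16_t mask)
{
    return (uint16_t)((value & mask) | (data & (uint16_t)~mask));
}

/* PMCSR write: updates *emu_data and returns the value for the device. */
static uint16_t pmcsr_write(uint16_t guest_val, uint16_t dev_value,
                            uint16_t *emu_data, uint16_t ro_mask,
                            uint16_t emu_mask, uint16_t valid_mask)
{
    /* the power state and No_Soft_Reset are always emulated */
    emu_mask |= PM_CTRL_STATE_MASK | PM_CTRL_NO_SOFT_RESET;

    uint16_t writable_mask = (uint16_t)(emu_mask & ~ro_mask & valid_mask);
    uint16_t throughable_mask = (uint16_t)(~emu_mask & valid_mask);

    *emu_data = merge16(guest_val, *emu_data, writable_mask);
    return merge16(guest_val, dev_value, throughable_mask);
}
```

Starting from the table's init value 0x0008, a guest request for D3hot (0x0003) leaves the emulated copy at 0x000B while the device keeps 0x0008.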
+
+/* Power Management Capability reg static information table */
+static XenPTRegInfo xen_pt_emu_reg_pm[] = {
+    /* Next Pointer reg */
+    {
+        .offset     = PCI_CAP_LIST_NEXT,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = xen_pt_ptr_reg_init,
+        .u.b.read   = xen_pt_byte_reg_read,
+        .u.b.write  = xen_pt_byte_reg_write,
+    },
+    /* Power Management Capabilities reg */
+    {
+        .offset     = PCI_CAP_FLAGS,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xFFFF,
+        .emu_mask   = 0xF9C8,
+        .init       = xen_pt_common_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_word_reg_write,
+    },
+    /* PCI Power Management Control/Status reg */
+    {
+        .offset     = PCI_PM_CTRL,
+        .size       = 2,
+        .init_val   = 0x0008,
+        .ro_mask    = 0xE1FC,
+        .emu_mask   = 0x8100,
+        .init       = xen_pt_common_reg_init,
+        .u.w.read   = xen_pt_pmcsr_reg_read,
+        .u.w.write  = xen_pt_pmcsr_reg_write,
+    },
+    {
+        .size = 0,
+    },
+};
+
+
+/****************************
+ * Capabilities
+ */
+
+/* capability structure register group size functions */
+
+static int xen_pt_reg_grp_size_init(XenPCIPassthroughState *s,
+                                    const XenPTRegGroupInfo *grp_reg,
+                                    uint32_t base_offset, uint8_t *size)
+{
+    *size = grp_reg->grp_size;
+    return 0;
+}
+
+/* get Vendor Specific Capability Structure register group size */
+static int xen_pt_vendor_size_init(XenPCIPassthroughState *s,
+                                   const XenPTRegGroupInfo *grp_reg,
+                                   uint32_t base_offset, uint8_t *size)
+{
+    /* byte 2 of a Vendor Specific capability holds its length */
+    *size = pci_get_byte(s->dev.config + base_offset + 0x02);
+    return 0;
+}
+
+/* get PCI Express Capability Structure register group size */
+static int xen_pt_pcie_size_init(XenPCIPassthroughState *s,
+                                 const XenPTRegGroupInfo *grp_reg,
+                                 uint32_t base_offset, uint8_t *size)
+{
+    PCIDevice *d = &s->dev;
+    uint8_t version = get_capability_version(s, base_offset);
+    uint8_t type = get_device_type(s, base_offset);
+    uint8_t pcie_size = 0;
+
+    /* calculate size depending on capability version and device/port type */
+    /* in case of PCI Express Base Specification Rev 1.x */
+    if (version == 1) {
+        /* The PCI Express Capabilities, Device Capabilities, and Device
+         * Status/Control registers are required for all PCI Express devices.
+         * The Link Capabilities and Link Status/Control are required for all
+         * Endpoints that are not Root Complex Integrated Endpoints. Endpoints
+         * are not required to implement registers other than those listed
+         * above and terminate the capability structure.
+         */
+        switch (type) {
+        case PCI_EXP_TYPE_ENDPOINT:
+        case PCI_EXP_TYPE_LEG_END:
+            pcie_size = 0x14;
+            break;
+        case PCI_EXP_TYPE_RC_END:
+            /* has no link */
+            pcie_size = 0x0C;
+            break;
+        /* only Endpoint passthrough is supported */
+        case PCI_EXP_TYPE_ROOT_PORT:
+        case PCI_EXP_TYPE_UPSTREAM:
+        case PCI_EXP_TYPE_DOWNSTREAM:
+        case PCI_EXP_TYPE_PCI_BRIDGE:
+        case PCI_EXP_TYPE_PCIE_BRIDGE:
+        case PCI_EXP_TYPE_RC_EC:
+        default:
+            XEN_PT_ERR(d, "Unsupported device/port type %#x.\n", type);
+            return -1;
+        }
+    } else if (version == 2) {
+        /* in case of PCI Express Base Specification Rev 2.0 */
+        switch (type) {
+        case PCI_EXP_TYPE_ENDPOINT:
+        case PCI_EXP_TYPE_LEG_END:
+        case PCI_EXP_TYPE_RC_END:
+            /* For Functions that do not implement the registers,
+             * these spaces must be hardwired to 0b.
+             */
+            pcie_size = 0x3C;
+            break;
+        /* only Endpoint passthrough is supported */
+        case PCI_EXP_TYPE_ROOT_PORT:
+        case PCI_EXP_TYPE_UPSTREAM:
+        case PCI_EXP_TYPE_DOWNSTREAM:
+        case PCI_EXP_TYPE_PCI_BRIDGE:
+        case PCI_EXP_TYPE_PCIE_BRIDGE:
+        case PCI_EXP_TYPE_RC_EC:
+        default:
+            XEN_PT_ERR(d, "Unsupported device/port type %#x.\n", type);
+            return -1;
+        }
+    } else {
+        XEN_PT_ERR(d, "Unsupported capability version %#x.\n", version);
+        return -1;
+    }
+
+    *size = pcie_size;
+    return 0;
+}
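The size rules above depend only on the low byte of the PCI Express Capabilities register. A minimal sketch of the same table, where `pcie_cap_size` is a hypothetical name and the constants copy the values of `PCI_EXP_FLAGS_VERS`, `PCI_EXP_FLAGS_TYPE` and the endpoint `PCI_EXP_TYPE_*` definitions from pci_regs.h:

```c
#include <stdint.h>

#define EXP_FLAGS_VERS    0x0f /* PCI_EXP_FLAGS_VERS */
#define EXP_FLAGS_TYPE    0xf0 /* PCI_EXP_FLAGS_TYPE */
#define EXP_TYPE_ENDPOINT 0x0  /* PCI_EXP_TYPE_ENDPOINT */
#define EXP_TYPE_LEG_END  0x1  /* PCI_EXP_TYPE_LEG_END */
#define EXP_TYPE_RC_END   0x9  /* PCI_EXP_TYPE_RC_END */

/* Size of the emulated PCIe capability, given the low byte of the PCI
 * Express Capabilities register, or -1 if unsupported. */
static int pcie_cap_size(uint8_t flags)
{
    uint8_t version = flags & EXP_FLAGS_VERS;
    uint8_t type = (flags & EXP_FLAGS_TYPE) >> 4;
    int is_endpoint = type == EXP_TYPE_ENDPOINT || type == EXP_TYPE_LEG_END;

    if (version == 1) {
        if (is_endpoint) {
            return 0x14;
        }
        /* Root Complex Integrated Endpoints have no link registers */
        return type == EXP_TYPE_RC_END ? 0x0C : -1;
    }
    if (version == 2) {
        /* unimplemented v2 registers are hardwired to 0 */
        return (is_endpoint || type == EXP_TYPE_RC_END) ? 0x3C : -1;
    }
    return -1;
}
```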
+
+static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
+    /* Header Type0 reg group */
+    {
+        .grp_id      = 0xFF,
+        .grp_type    = XEN_PT_GRP_TYPE_EMU,
+        .grp_size    = 0x40,
+        .size_init   = xen_pt_reg_grp_size_init,
+        .emu_regs = xen_pt_emu_reg_header0,
+    },
+    /* PCI PowerManagement Capability reg group */
+    {
+        .grp_id      = PCI_CAP_ID_PM,
+        .grp_type    = XEN_PT_GRP_TYPE_EMU,
+        .grp_size    = PCI_PM_SIZEOF,
+        .size_init   = xen_pt_reg_grp_size_init,
+        .emu_regs = xen_pt_emu_reg_pm,
+    },
+    /* AGP Capability Structure reg group */
+    {
+        .grp_id     = PCI_CAP_ID_AGP,
+        .grp_type   = XEN_PT_GRP_TYPE_HARDWIRED,
+        .grp_size   = 0x30,
+        .size_init  = xen_pt_reg_grp_size_init,
+    },
+    /* Vital Product Data Capability Structure reg group */
+    {
+        .grp_id      = PCI_CAP_ID_VPD,
+        .grp_type    = XEN_PT_GRP_TYPE_EMU,
+        .grp_size    = 0x08,
+        .size_init   = xen_pt_reg_grp_size_init,
+        .emu_regs = xen_pt_emu_reg_vpd,
+    },
+    /* Slot Identification reg group */
+    {
+        .grp_id     = PCI_CAP_ID_SLOTID,
+        .grp_type   = XEN_PT_GRP_TYPE_HARDWIRED,
+        .grp_size   = 0x04,
+        .size_init  = xen_pt_reg_grp_size_init,
+    },
+    /* PCI-X Capabilities List Item reg group */
+    {
+        .grp_id     = PCI_CAP_ID_PCIX,
+        .grp_type   = XEN_PT_GRP_TYPE_HARDWIRED,
+        .grp_size   = 0x18,
+        .size_init  = xen_pt_reg_grp_size_init,
+    },
+    /* Vendor Specific Capability Structure reg group */
+    {
+        .grp_id      = PCI_CAP_ID_VNDR,
+        .grp_type    = XEN_PT_GRP_TYPE_EMU,
+        .grp_size    = 0xFF,
+        .size_init   = xen_pt_vendor_size_init,
+        .emu_regs = xen_pt_emu_reg_vendor,
+    },
+    /* SHPC Capability List Item reg group */
+    {
+        .grp_id     = PCI_CAP_ID_SHPC,
+        .grp_type   = XEN_PT_GRP_TYPE_HARDWIRED,
+        .grp_size   = 0x08,
+        .size_init  = xen_pt_reg_grp_size_init,
+    },
+    /* Subsystem ID and Subsystem Vendor ID Capability List Item reg group */
+    {
+        .grp_id     = PCI_CAP_ID_SSVID,
+        .grp_type   = XEN_PT_GRP_TYPE_HARDWIRED,
+        .grp_size   = 0x08,
+        .size_init  = xen_pt_reg_grp_size_init,
+    },
+    /* AGP 8x Capability Structure reg group */
+    {
+        .grp_id     = PCI_CAP_ID_AGP3,
+        .grp_type   = XEN_PT_GRP_TYPE_HARDWIRED,
+        .grp_size   = 0x30,
+        .size_init  = xen_pt_reg_grp_size_init,
+    },
+    /* PCI Express Capability Structure reg group */
+    {
+        .grp_id      = PCI_CAP_ID_EXP,
+        .grp_type    = XEN_PT_GRP_TYPE_EMU,
+        .grp_size    = 0xFF,
+        .size_init   = xen_pt_pcie_size_init,
+        .emu_regs = xen_pt_emu_reg_pcie,
+    },
+    {
+        .grp_size = 0,
+    },
+};
+
+/* initialize Capabilities Pointer or Next Pointer register */
+static int xen_pt_ptr_reg_init(XenPCIPassthroughState *s,
+                               XenPTRegInfo *reg, uint32_t real_offset,
+                               uint32_t *data)
+{
+    int i;
+    uint8_t *config = s->dev.config;
+    uint32_t reg_field = pci_get_byte(config + real_offset);
+    uint8_t cap_id = 0;
+
+    /* find capability offset */
+    while (reg_field) {
+        for (i = 0; xen_pt_emu_reg_grps[i].grp_size != 0; i++) {
+            if (xen_pt_hide_dev_cap(&s->real_device,
+                                    xen_pt_emu_reg_grps[i].grp_id)) {
+                continue;
+            }
+
+            cap_id = pci_get_byte(config + reg_field + PCI_CAP_LIST_ID);
+            if (xen_pt_emu_reg_grps[i].grp_id == cap_id) {
+                if (xen_pt_emu_reg_grps[i].grp_type == XEN_PT_GRP_TYPE_EMU) {
+                    goto out;
+                }
+                /* skip a capability hardwired to 0; try the next one */
+                break;
+            }
+        }
+
+        /* next capability */
+        reg_field = pci_get_byte(config + reg_field + PCI_CAP_LIST_NEXT);
+    }
+
+out:
+    *data = reg_field;
+    return 0;
+}
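`xen_pt_ptr_reg_init()` rewrites each pointer so that it points at the next capability with an emulated register group, which hides every other capability from the guest. A minimal sketch over a 256-byte config-space snapshot, where `next_emulated_cap` and the `emu_ids` list (standing in for the `XEN_PT_GRP_TYPE_EMU` entries of `xen_pt_emu_reg_grps`) are hypothetical; the loop bound and the dword alignment are defensive additions that are not in the patch:

```c
#include <stddef.h>
#include <stdint.h>

/* Follow the capability chain starting at `ptr` and return the offset of
 * the first capability whose ID is in emu_ids, or 0 if none remain.
 * Capabilities with other IDs are skipped. */
static uint8_t next_emulated_cap(const uint8_t cfg[256], uint8_t ptr,
                                 const uint8_t *emu_ids, size_t n_ids)
{
    int ttl = 48; /* guard against a looping list (not in the patch) */

    while (ptr != 0 && ttl-- > 0) {
        ptr &= 0xFC;            /* keep cfg[ptr + 1] in bounds */
        for (size_t i = 0; i < n_ids; i++) {
            if (cfg[ptr] == emu_ids[i]) {
                return ptr;     /* Capability ID matches */
            }
        }
        ptr = cfg[ptr + 1];     /* Next Pointer */
    }
    return 0;
}
```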
+
+
+/*************
+ * Main
+ */
+
+static uint8_t find_cap_offset(XenPCIPassthroughState *s, uint8_t cap)
+{
+    uint8_t id;
+    unsigned max_cap = PCI_CAP_MAX;
+    uint8_t pos = PCI_CAPABILITY_LIST;
+    uint8_t status = 0;
+
+    if (xen_host_pci_get_byte(&s->real_device, PCI_STATUS, &status)) {
+        return 0;
+    }
+    if ((status & PCI_STATUS_CAP_LIST) == 0) {
+        return 0;
+    }
+
+    while (max_cap--) {
+        if (xen_host_pci_get_byte(&s->real_device, pos, &pos)) {
+            break;
+        }
+        if (pos < PCI_CONFIG_HEADER_SIZE) {
+            break;
+        }
+
+        pos &= ~3;
+        if (xen_host_pci_get_byte(&s->real_device,
+                                  pos + PCI_CAP_LIST_ID, &id)) {
+            break;
+        }
+
+        if (id == 0xff) {
+            break;
+        }
+        if (id == cap) {
+            return pos;
+        }
+
+        pos += PCI_CAP_LIST_NEXT;
+    }
+    return 0;
+}
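`find_cap_offset()` walks the host device's capability list through `xen_host_pci_get_byte()`. The same walk over an in-memory config-space snapshot can be sketched as follows; `find_cap` is a hypothetical name, the constants copy the values of the named QEMU/Linux definitions, and the loop bound of 48 is an arbitrary stand-in for `PCI_CAP_MAX`, whose value is not shown here:

```c
#include <stdint.h>

#define CFG_STATUS          0x06 /* PCI_STATUS */
#define CFG_STATUS_CAP_LIST 0x10 /* PCI_STATUS_CAP_LIST */
#define CFG_CAP_PTR         0x34 /* PCI_CAPABILITY_LIST */
#define CFG_HEADER_SIZE     0x40 /* PCI_CONFIG_HEADER_SIZE */
#define CAP_WALK_LIMIT      48   /* arbitrary bound standing in for PCI_CAP_MAX */

/* Offset of capability `cap` in a 256-byte config space, or 0. */
static uint8_t find_cap(const uint8_t cfg[256], uint8_t cap)
{
    if (!(cfg[CFG_STATUS] & CFG_STATUS_CAP_LIST)) {
        return 0;               /* device has no capability list */
    }

    uint8_t pos = cfg[CFG_CAP_PTR];
    for (int ttl = CAP_WALK_LIMIT; ttl > 0; ttl--) {
        if (pos < CFG_HEADER_SIZE) {
            break;              /* end of list or bogus pointer */
        }
        pos &= 0xFC;            /* pointers are dword aligned */
        uint8_t id = cfg[pos];
        if (id == 0xFF) {
            break;              /* corrupt list or absent device */
        }
        if (id == cap) {
            return pos;
        }
        pos = cfg[pos + 1];     /* Next Pointer */
    }
    return 0;
}
```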
+
+static int xen_pt_config_reg_init(XenPCIPassthroughState *s,
+                                  XenPTRegGroup *reg_grp, XenPTRegInfo *reg)
+{
+    XenPTReg *reg_entry;
+    uint32_t data = 0;
+    int rc = 0;
+
+    reg_entry = g_new0(XenPTReg, 1);
+    reg_entry->reg = reg;
+
+    if (reg->init) {
+        /* initialize emulate register */
+        rc = reg->init(s, reg_entry->reg,
+                       reg_grp->base_offset + reg->offset, &data);
+        if (rc < 0) {
+            g_free(reg_entry);
+            return rc;
+        }
+        if (data == XEN_PT_INVALID_REG) {
+            /* free unused BAR register entry */
+            g_free(reg_entry);
+            return 0;
+        }
+        /* set register value */
+        reg_entry->data = data;
+    }
+    /* list add register entry */
+    QLIST_INSERT_HEAD(&reg_grp->reg_tbl_list, reg_entry, entries);
+
+    return 0;
+}
+
+int xen_pt_config_init(XenPCIPassthroughState *s)
+{
+    int i, rc;
+
+    QLIST_INIT(&s->reg_grps);
+
+    for (i = 0; xen_pt_emu_reg_grps[i].grp_size != 0; i++) {
+        uint32_t reg_grp_offset = 0;
+        XenPTRegGroup *reg_grp_entry = NULL;
+
+        if (xen_pt_emu_reg_grps[i].grp_id != 0xFF) {
+            if (xen_pt_hide_dev_cap(&s->real_device,
+                                    xen_pt_emu_reg_grps[i].grp_id)) {
+                continue;
+            }
+
+            reg_grp_offset = find_cap_offset(s, xen_pt_emu_reg_grps[i].grp_id);
+
+            if (!reg_grp_offset) {
+                continue;
+            }
+        }
+
+        reg_grp_entry = g_new0(XenPTRegGroup, 1);
+        QLIST_INIT(&reg_grp_entry->reg_tbl_list);
+        QLIST_INSERT_HEAD(&s->reg_grps, reg_grp_entry, entries);
+
+        reg_grp_entry->base_offset = reg_grp_offset;
+        reg_grp_entry->reg_grp = xen_pt_emu_reg_grps + i;
+        if (xen_pt_emu_reg_grps[i].size_init) {
+            /* get register group size */
+            rc = xen_pt_emu_reg_grps[i].size_init(s, reg_grp_entry->reg_grp,
+                                                  reg_grp_offset,
+                                                  &reg_grp_entry->size);
+            if (rc < 0) {
+                xen_pt_config_delete(s);
+                return rc;
+            }
+        }
+
+        if (xen_pt_emu_reg_grps[i].grp_type == XEN_PT_GRP_TYPE_EMU) {
+            if (xen_pt_emu_reg_grps[i].emu_regs) {
+                int j = 0;
+                XenPTRegInfo *regs = xen_pt_emu_reg_grps[i].emu_regs;
+                /* initialize capability register */
+                for (j = 0; regs->size != 0; j++, regs++) {
+                    /* initialize capability register */
+                    rc = xen_pt_config_reg_init(s, reg_grp_entry, regs);
+                    if (rc < 0) {
+                        xen_pt_config_delete(s);
+                        return rc;
+                    }
+                }
+            }
+        }
+    }
+
+    return 0;
+}
+
+/* delete all emulate register */
+void xen_pt_config_delete(XenPCIPassthroughState *s)
+{
+    struct XenPTRegGroup *reg_group, *next_grp;
+    struct XenPTReg *reg, *next_reg;
+
+    /* free all register group entry */
+    QLIST_FOREACH_SAFE(reg_group, &s->reg_grps, entries, next_grp) {
+        /* free all register entry */
+        QLIST_FOREACH_SAFE(reg, &reg_group->reg_tbl_list, entries, next_reg) {
+            QLIST_REMOVE(reg, entries);
+            g_free(reg);
+        }
+
+        QLIST_REMOVE(reg_group, entries);
+        g_free(reg_group);
+    }
+}
-- 
Anthony PERARD

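find_cap_offset() above walks the standard PCI capability linked list. Bit 4 of the Status register (PCI_STATUS_CAP_LIST) says whether a list exists, the byte at offset 0x34 (PCI_CAPABILITY_LIST) points to the first entry, and each entry holds its ID at offset 0 and the next pointer at offset 1. The following is a minimal standalone sketch of the same walk. Instead of calling xen_host_pci_get_byte(), it reads from an in-memory 256-byte copy of config space. cap_offset() and the CFG_* names are hypothetical. The iteration bound of 48 is an assumed safety limit, not necessarily the value of PCI_CAP_MAX.

```c
#include <stdint.h>

/* Standard PCI config-space offsets and bits (PCI Local Bus spec). */
#define CFG_STATUS          0x06
#define CFG_STATUS_CAP_LIST 0x10  /* Status bit 4: capability list present */
#define CFG_CAP_PTR         0x34  /* pointer to the first capability */
#define CFG_HEADER_SIZE     0x40  /* capabilities live after the header */
#define CAP_MAX_ITER        48    /* assumed bound that guards against loops */

/*
 * Return the offset of capability 'cap' in a 256-byte config-space image,
 * or 0 if the capability is absent. Mirrors the walk in find_cap_offset().
 */
static uint8_t cap_offset(const uint8_t cfg[256], uint8_t cap)
{
    unsigned iter = CAP_MAX_ITER;
    uint8_t pos;

    if (!(cfg[CFG_STATUS] & CFG_STATUS_CAP_LIST)) {
        return 0;               /* device has no capability list */
    }

    pos = cfg[CFG_CAP_PTR];
    while (iter--) {
        if (pos < CFG_HEADER_SIZE) {
            break;              /* 0 ends the list; header offsets are invalid */
        }
        pos &= ~3;              /* capabilities are dword aligned */
        if (cfg[pos] == 0xff) {
            break;              /* all-ones read: device absent or broken */
        }
        if (cfg[pos] == cap) {
            return pos;
        }
        pos = cfg[pos + 1];     /* follow the Next pointer */
    }
    return 0;
}
```

For example, suppose the Status capability bit is set, offset 0x34 holds 0x40, an MSI capability (ID 0x05) sits at 0x40, and its Next pointer leads to an MSI-X capability (ID 0x11) at 0x50. In that case cap_offset(cfg, 0x11) returns 0x50, and a lookup for an ID that is not on the list returns 0.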

* [Qemu-devel] [PATCH V13 8/9] Introduce apic-msidef.h
  2012-06-14 17:01 [Qemu-devel] [PATCH V13 0/9] Xen PCI Passthrough Anthony PERARD
                   ` (10 preceding siblings ...)
  2012-06-14 17:01 ` [PATCH V13 8/9] Introduce apic-msidef.h Anthony PERARD
@ 2012-06-14 17:01 ` Anthony PERARD
  2012-06-14 19:40   ` Michael S. Tsirkin
  2012-06-14 19:40   ` [Qemu-devel] " Michael S. Tsirkin
  2012-06-14 17:01 ` [Qemu-devel] [PATCH V13 9/9] Introduce Xen PCI Passthrough, MSI (3/3) Anthony PERARD
                   ` (3 subsequent siblings)
  15 siblings, 2 replies; 37+ messages in thread
From: Anthony PERARD @ 2012-06-14 17:01 UTC (permalink / raw)
  To: QEMU-devel
  Cc: Anthony Liguori, Stefano Stabellini, Jan Kiszka,
	Michael S. Tsirkin, Xen Devel, Anthony PERARD

This patch moves the MSI definitions from apic.c to apic-msidef.h so that
they can also be used by other .c files.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 hw/apic-msidef.h |   30 ++++++++++++++++++++++++++++++
 hw/apic.c        |   11 +----------
 2 files changed, 31 insertions(+), 10 deletions(-)
 create mode 100644 hw/apic-msidef.h

diff --git a/hw/apic-msidef.h b/hw/apic-msidef.h
new file mode 100644
index 0000000..6e2eb71
--- /dev/null
+++ b/hw/apic-msidef.h
@@ -0,0 +1,30 @@
+#ifndef HW_APIC_MSIDEF_H
+#define HW_APIC_MSIDEF_H
+
+/*
+ * Intel APIC constants: from include/asm/msidef.h
+ */
+
+/*
+ * Shifts for MSI data
+ */
+
+#define MSI_DATA_VECTOR_SHIFT           0
+#define  MSI_DATA_VECTOR_MASK           0x000000ff
+
+#define MSI_DATA_DELIVERY_MODE_SHIFT    8
+#define MSI_DATA_LEVEL_SHIFT            14
+#define MSI_DATA_TRIGGER_SHIFT          15
+
+/*
+ * Shift/mask fields for msi address
+ */
+
+#define MSI_ADDR_DEST_MODE_SHIFT        2
+
+#define MSI_ADDR_REDIRECTION_SHIFT      3
+
+#define MSI_ADDR_DEST_ID_SHIFT          12
+#define  MSI_ADDR_DEST_ID_MASK          0x00ffff0
+
+#endif /* HW_APIC_MSIDEF_H */
diff --git a/hw/apic.c b/hw/apic.c
index 5fbf01c..60552df 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -23,19 +23,10 @@
 #include "host-utils.h"
 #include "trace.h"
 #include "pc.h"
+#include "apic-msidef.h"
 
 #define MAX_APIC_WORDS 8
 
-/* Intel APIC constants: from include/asm/msidef.h */
-#define MSI_DATA_VECTOR_SHIFT		0
-#define MSI_DATA_VECTOR_MASK		0x000000ff
-#define MSI_DATA_DELIVERY_MODE_SHIFT	8
-#define MSI_DATA_TRIGGER_SHIFT		15
-#define MSI_DATA_LEVEL_SHIFT		14
-#define MSI_ADDR_DEST_MODE_SHIFT	2
-#define MSI_ADDR_DEST_ID_SHIFT		12
-#define	MSI_ADDR_DEST_ID_MASK		0x00ffff0
-
 #define SYNC_FROM_VAPIC                 0x1
 #define SYNC_TO_VAPIC                   0x2
 #define SYNC_ISR_IRR_TO_VAPIC           0x4
-- 
Anthony PERARD

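The following minimal sketch shows what the constants moved into apic-msidef.h describe by splitting an MSI address/data pair into its fields. The bit positions are copied from the header above. struct msi_fields and msi_decode() are hypothetical names used only for this example. MSI_ADDR_DEST_ID_MASK (0x00ffff0) also covers bits 4-11, but the 12-bit shift discards them, so only address bits 12-19 (the 8-bit destination ID) survive.

```c
#include <stdint.h>

/* Values copied from hw/apic-msidef.h. */
#define MSI_DATA_VECTOR_SHIFT           0
#define MSI_DATA_VECTOR_MASK            0x000000ff
#define MSI_DATA_DELIVERY_MODE_SHIFT    8
#define MSI_DATA_LEVEL_SHIFT            14
#define MSI_DATA_TRIGGER_SHIFT          15
#define MSI_ADDR_DEST_MODE_SHIFT        2
#define MSI_ADDR_REDIRECTION_SHIFT      3
#define MSI_ADDR_DEST_ID_SHIFT          12
#define MSI_ADDR_DEST_ID_MASK           0x00ffff0

/* Hypothetical container for the decoded fields. */
struct msi_fields {
    uint8_t vector;         /* data bits 0-7 */
    uint8_t delivery_mode;  /* data bits 8-10 */
    uint8_t level;          /* data bit 14: 1 = assert */
    uint8_t trigger;        /* data bit 15: 0 = edge, 1 = level */
    uint8_t dest_mode;      /* address bit 2: 0 = physical, 1 = logical */
    uint8_t redirection;    /* address bit 3: redirection hint */
    uint8_t dest_id;        /* address bits 12-19 */
};

static struct msi_fields msi_decode(uint32_t addr, uint32_t data)
{
    struct msi_fields f;

    f.vector = (data & MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
    f.delivery_mode = (data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x7;
    f.level = (data >> MSI_DATA_LEVEL_SHIFT) & 0x1;
    f.trigger = (data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
    f.dest_mode = (addr >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1;
    f.redirection = (addr >> MSI_ADDR_REDIRECTION_SHIFT) & 0x1;
    /* the mask keeps bits 4-19; the shift then drops bits 4-11 */
    f.dest_id = (addr & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
    return f;
}
```

For instance, msi_decode(0xFEE0300C, 0xC041) yields:

- vector 0x41
- destination ID 3
- logical destination mode and the redirection hint set
- level-triggered, with level asserted
- delivery mode 0 (fixed)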

* [Qemu-devel] [PATCH V13 9/9] Introduce Xen PCI Passthrough, MSI (3/3)
  2012-06-14 17:01 [Qemu-devel] [PATCH V13 0/9] Xen PCI Passthrough Anthony PERARD
                   ` (11 preceding siblings ...)
  2012-06-14 17:01 ` [Qemu-devel] " Anthony PERARD
@ 2012-06-14 17:01 ` Anthony PERARD
  2012-06-14 17:01 ` Anthony PERARD
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Anthony PERARD @ 2012-06-14 17:01 UTC (permalink / raw)
  To: QEMU-devel
  Cc: Anthony Liguori, Stefano Stabellini, Jan Kiszka,
	Michael S. Tsirkin, Shan Haitao, Xen Devel, Anthony PERARD

From: Jiang Yunhong <yunhong.jiang@intel.com>

A more complete history can be found here:
git://xenbits.xensource.com/qemu-xen-unstable.git

Signed-off-by: Jiang Yunhong <yunhong.jiang@intel.com>
Signed-off-by: Shan Haitao <haitao.shan@intel.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 hw/i386/Makefile.objs   |    2 +-
 hw/xen_pt.c             |   31 +++-
 hw/xen_pt.h             |   51 ++++
 hw/xen_pt_config_init.c |  471 +++++++++++++++++++++++++++++++++++
 hw/xen_pt_msi.c         |  620 +++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 1173 insertions(+), 2 deletions(-)
 create mode 100644 hw/xen_pt_msi.c

diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index e361a92..d364c37 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -8,7 +8,7 @@ obj-y += pc_piix.o
 obj-y += pc_sysfw.o
 obj-$(CONFIG_XEN) += xen_platform.o xen_apic.o
 obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen-host-pci-device.o
-obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen_pt.o xen_pt_config_init.o
+obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen_pt.o xen_pt_config_init.o xen_pt_msi.o
 obj-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o kvm/i8259.o kvm/ioapic.o kvm/i8254.o
 obj-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 
diff --git a/hw/xen_pt.c b/hw/xen_pt.c
index 92ad0fa..3b6d186 100644
--- a/hw/xen_pt.c
+++ b/hw/xen_pt.c
@@ -36,6 +36,20 @@
  *
  *     Write '1'
  *       - Set real bit to '1'.
+ *
+ * MSI interrupt:
+ *   Initialize MSI register (xen_pt_msi_setup, xen_pt_msi_update)
+ *     Bind MSI (xc_domain_update_msi_irq)
+ *       <fail>
+ *         - Unmap MSI.
+ *         - Set dev->msi->pirq to '-1'.
+ *
+ * MSI-X interrupt:
+ *   Initialize MSI-X register (xen_pt_msix_update_one)
+ *     Bind MSI-X (xc_domain_update_msi_irq)
+ *       <fail>
+ *         - Unmap MSI-X.
+ *         - Set entry->pirq to '-1'.
  */
 
 #include <sys/ioctl.h>
@@ -534,7 +548,15 @@ static void xen_pt_region_update(XenPCIPassthroughState *s,
     };
 
     bar = xen_pt_bar_from_region(s, mr);
-    if (bar == -1) {
+    if (bar == -1 && (!s->msix || &s->msix->mmio != mr)) {
+        return;
+    }
+
+    if (s->msix && &s->msix->mmio == mr) {
+        if (adding) {
+            s->msix->mmio_base_addr = sec->offset_within_address_space;
+            rc = xen_pt_msix_update_remap(s, s->msix->bar_index);
+        }
         return;
     }
 
@@ -764,6 +786,13 @@ static int xen_pt_unregister_device(PCIDevice *d)
         }
     }
 
+    if (s->msi) {
+        xen_pt_msi_disable(s);
+    }
+    if (s->msix) {
+        xen_pt_msix_disable(s);
+    }
+
     if (machine_irq) {
         xen_pt_mapped_machine_irq[machine_irq]--;
 
diff --git a/hw/xen_pt.h b/hw/xen_pt.h
index 4b76073..41904ec 100644
--- a/hw/xen_pt.h
+++ b/hw/xen_pt.h
@@ -160,6 +160,36 @@ typedef struct XenPTRegGroup {
 
 
 #define XEN_PT_UNASSIGNED_PIRQ (-1)
+typedef struct XenPTMSI {
+    uint16_t flags;
+    uint32_t addr_lo;  /* guest message address */
+    uint32_t addr_hi;  /* guest message upper address */
+    uint16_t data;     /* guest message data */
+    uint32_t ctrl_offset; /* saved control offset */
+    int pirq;          /* guest pirq corresponding */
+    bool initialized;  /* when guest MSI is initialized */
+    bool mapped;       /* when pirq is mapped */
+} XenPTMSI;
+
+typedef struct XenPTMSIXEntry {
+    int pirq;
+    uint64_t addr;
+    uint32_t data;
+    uint32_t vector_ctrl;
+    bool updated; /* indicate whether MSI ADDR or DATA is updated */
+} XenPTMSIXEntry;
+typedef struct XenPTMSIX {
+    uint32_t ctrl_offset;
+    bool enabled;
+    int total_entries;
+    int bar_index;
+    uint64_t table_base;
+    uint32_t table_offset_adjust; /* page align mmap */
+    uint64_t mmio_base_addr;
+    MemoryRegion mmio;
+    void *phys_iomem_base;
+    XenPTMSIXEntry msix_entry[0];
+} XenPTMSIX;
 
 struct XenPCIPassthroughState {
     PCIDevice dev;
@@ -172,6 +202,9 @@ struct XenPCIPassthroughState {
 
     uint32_t machine_irq;
 
+    XenPTMSI *msi;
+    XenPTMSIX *msix;
+
     MemoryRegion bar[PCI_NUM_REGIONS - 1];
     MemoryRegion rom;
 
@@ -247,4 +280,22 @@ static inline uint8_t xen_pt_pci_intx(XenPCIPassthroughState *s)
     return r_val;
 }
 
+/* MSI/MSI-X */
+int xen_pt_msi_set_enable(XenPCIPassthroughState *s, bool en);
+int xen_pt_msi_setup(XenPCIPassthroughState *s);
+int xen_pt_msi_update(XenPCIPassthroughState *d);
+void xen_pt_msi_disable(XenPCIPassthroughState *s);
+
+int xen_pt_msix_init(XenPCIPassthroughState *s, uint32_t base);
+void xen_pt_msix_delete(XenPCIPassthroughState *s);
+int xen_pt_msix_update(XenPCIPassthroughState *s);
+int xen_pt_msix_update_remap(XenPCIPassthroughState *s, int bar_index);
+void xen_pt_msix_disable(XenPCIPassthroughState *s);
+
+static inline bool xen_pt_has_msix_mapping(XenPCIPassthroughState *s, int bar)
+{
+    return s->msix && s->msix->bar_index == bar;
+}
+
+
 #endif /* !XEN_PT_H */
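Each register write handler that this patch adds to xen_pt_config_init.c (xen_pt_msgctrl_reg_write(), xen_pt_msgaddr32_reg_write() and the others) splits a guest write with two masks:

- writable_mask = emu_mask & ~ro_mask & valid_mask selects the bits stored in the emulated copy.
- throughable_mask = ~emu_mask & valid_mask selects the bits forwarded to the real device.

A minimal sketch of that pattern follows. It assumes that XEN_PT_MERGE_VALUE(value, data, mask) takes bits from value where mask is 1 and from data elsewhere. That macro is defined in xen_pt.h, outside this excerpt, so it is restated here as MERGE_VALUE. struct emu_reg16 and reg16_write() are hypothetical.

```c
#include <stdint.h>

/* Assumed equivalent of XEN_PT_MERGE_VALUE: bits of 'value' where 'mask'
 * is 1, bits of 'data' where 'mask' is 0. */
#define MERGE_VALUE(value, data, mask) (((value) & (mask)) | ((data) & ~(mask)))

/* Hypothetical stand-in for one 16-bit emulated register. */
struct emu_reg16 {
    uint16_t emu_mask;  /* bits emulated by QEMU rather than the device */
    uint16_t ro_mask;   /* bits the guest is not allowed to change */
    uint16_t data;      /* current emulated value */
};

/* Apply a guest write of 'val' and return the value for the real device. */
static uint16_t reg16_write(struct emu_reg16 *r, uint16_t val,
                            uint16_t dev_value, uint16_t valid_mask)
{
    uint16_t writable_mask = r->emu_mask & ~r->ro_mask & valid_mask;
    uint16_t throughable_mask = ~r->emu_mask & valid_mask;

    /* emulated copy: guest bits where writable, previous value elsewhere */
    r->data = MERGE_VALUE(val, r->data, writable_mask);
    /* device value: guest bits for pass-through bits, device bits elsewhere */
    return MERGE_VALUE(val, dev_value, throughable_mask);
}
```

Consider the MSI Message Control masks from xen_pt_emu_reg_msi (emu_mask 0x007F, ro_mask 0xFF8E). A guest write of 0x0081 against a device value of 0x0080 stores 0x0001 in the emulated register and forwards 0x0080. The enable bit is emulated, so the merge never passes it to the device. This is why xen_pt_msgctrl_reg_write() copies PCI_MSI_FLAGS_ENABLE back into *val explicitly.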
diff --git a/hw/xen_pt_config_init.c b/hw/xen_pt_config_init.c
index 1d97876..00eb3d9 100644
--- a/hw/xen_pt_config_init.c
+++ b/hw/xen_pt_config_init.c
@@ -1022,6 +1022,410 @@ static XenPTRegInfo xen_pt_emu_reg_pm[] = {
 };
 
 
+/********************************
+ * MSI Capability
+ */
+
+/* Helper */
+static bool xen_pt_msgdata_check_type(uint32_t offset, uint16_t flags)
+{
+    /* check the offset whether matches the type or not */
+    bool is_32 = (offset == PCI_MSI_DATA_32) && !(flags & PCI_MSI_FLAGS_64BIT);
+    bool is_64 = (offset == PCI_MSI_DATA_64) &&  (flags & PCI_MSI_FLAGS_64BIT);
+    return is_32 || is_64;
+}
+
+/* Message Control register */
+static int xen_pt_msgctrl_reg_init(XenPCIPassthroughState *s,
+                                   XenPTRegInfo *reg, uint32_t real_offset,
+                                   uint32_t *data)
+{
+    PCIDevice *d = &s->dev;
+    XenPTMSI *msi = s->msi;
+    uint16_t reg_field = 0;
+
+    /* use I/O device register's value as initial value */
+    reg_field = pci_get_word(d->config + real_offset);
+
+    if (reg_field & PCI_MSI_FLAGS_ENABLE) {
+        XEN_PT_LOG(&s->dev, "MSI already enabled, disabling it first\n");
+        xen_host_pci_set_word(&s->real_device, real_offset,
+                              reg_field & ~PCI_MSI_FLAGS_ENABLE);
+    }
+    msi->flags |= reg_field;
+    msi->ctrl_offset = real_offset;
+    msi->initialized = false;
+    msi->mapped = false;
+
+    *data = reg->init_val;
+    return 0;
+}
+static int xen_pt_msgctrl_reg_write(XenPCIPassthroughState *s,
+                                    XenPTReg *cfg_entry, uint16_t *val,
+                                    uint16_t dev_value, uint16_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    XenPTMSI *msi = s->msi;
+    uint16_t writable_mask = 0;
+    uint16_t throughable_mask = 0;
+    uint16_t raw_val;
+
+    /* Currently no support for multi-vector */
+    if (*val & PCI_MSI_FLAGS_QSIZE) {
+        XEN_PT_WARN(&s->dev, "Tries to set more than 1 vector ctrl %x\n", *val);
+    }
+
+    /* modify emulate register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask;
+    cfg_entry->data = XEN_PT_MERGE_VALUE(*val, cfg_entry->data, writable_mask);
+    msi->flags |= cfg_entry->data & ~PCI_MSI_FLAGS_ENABLE;
+
+    /* create value for writing to I/O device register */
+    raw_val = *val;
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *val = XEN_PT_MERGE_VALUE(*val, dev_value, throughable_mask);
+
+    /* update MSI */
+    if (raw_val & PCI_MSI_FLAGS_ENABLE) {
+        /* setup MSI pirq for the first time */
+        if (!msi->initialized) {
+            /* Init physical one */
+            XEN_PT_LOG(&s->dev, "setup MSI\n");
+            if (xen_pt_msi_setup(s)) {
+                /* We do not broadcast the error to the framework code, so
+                 * that MSI errors are contained in MSI emulation code and
+                 * QEMU can go on running.
+                 * Guest MSI would be actually not working.
+                 */
+                *val &= ~PCI_MSI_FLAGS_ENABLE;
+                XEN_PT_WARN(&s->dev, "Cannot map MSI.\n");
+                return 0;
+            }
+            if (xen_pt_msi_update(s)) {
+                *val &= ~PCI_MSI_FLAGS_ENABLE;
+                XEN_PT_WARN(&s->dev, "Cannot bind MSI.\n");
+                return 0;
+            }
+            msi->initialized = true;
+            msi->mapped = true;
+        }
+        msi->flags |= PCI_MSI_FLAGS_ENABLE;
+    } else {
+        msi->flags &= ~PCI_MSI_FLAGS_ENABLE;
+    }
+
+    /* pass through MSI_ENABLE bit */
+    *val &= ~PCI_MSI_FLAGS_ENABLE;
+    *val |= raw_val & PCI_MSI_FLAGS_ENABLE;
+
+    return 0;
+}
+
+/* initialize Message Upper Address register */
+static int xen_pt_msgaddr64_reg_init(XenPCIPassthroughState *s,
+                                     XenPTRegInfo *reg, uint32_t real_offset,
+                                     uint32_t *data)
+{
+    /* no need to initialize in case of 32 bit type */
+    if (!(s->msi->flags & PCI_MSI_FLAGS_64BIT)) {
+        *data = XEN_PT_INVALID_REG;
+    } else {
+        *data = reg->init_val;
+    }
+
+    return 0;
+}
+/* this function will be called twice (for 32 bit and 64 bit type) */
+/* initialize Message Data register */
+static int xen_pt_msgdata_reg_init(XenPCIPassthroughState *s,
+                                   XenPTRegInfo *reg, uint32_t real_offset,
+                                   uint32_t *data)
+{
+    uint32_t flags = s->msi->flags;
+    uint32_t offset = reg->offset;
+
+    /* check the offset whether matches the type or not */
+    if (xen_pt_msgdata_check_type(offset, flags)) {
+        *data = reg->init_val;
+    } else {
+        *data = XEN_PT_INVALID_REG;
+    }
+    return 0;
+}
+
+/* write Message Address register */
+static int xen_pt_msgaddr32_reg_write(XenPCIPassthroughState *s,
+                                      XenPTReg *cfg_entry, uint32_t *val,
+                                      uint32_t dev_value, uint32_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint32_t writable_mask = 0;
+    uint32_t throughable_mask = 0;
+    uint32_t old_addr = cfg_entry->data;
+
+    /* modify emulate register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask;
+    cfg_entry->data = XEN_PT_MERGE_VALUE(*val, cfg_entry->data, writable_mask);
+    s->msi->addr_lo = cfg_entry->data;
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *val = XEN_PT_MERGE_VALUE(*val, dev_value, throughable_mask);
+
+    /* update MSI */
+    if (cfg_entry->data != old_addr) {
+        if (s->msi->mapped) {
+            xen_pt_msi_update(s);
+        }
+    }
+
+    return 0;
+}
+/* write Message Upper Address register */
+static int xen_pt_msgaddr64_reg_write(XenPCIPassthroughState *s,
+                                      XenPTReg *cfg_entry, uint32_t *val,
+                                      uint32_t dev_value, uint32_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint32_t writable_mask = 0;
+    uint32_t throughable_mask = 0;
+    uint32_t old_addr = cfg_entry->data;
+
+    /* check whether the type is 64 bit or not */
+    if (!(s->msi->flags & PCI_MSI_FLAGS_64BIT)) {
+        XEN_PT_ERR(&s->dev,
+                   "Can't write to the upper address without 64 bit support\n");
+        return -1;
+    }
+
+    /* modify emulate register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask;
+    cfg_entry->data = XEN_PT_MERGE_VALUE(*val, cfg_entry->data, writable_mask);
+    /* update the msi_info too */
+    s->msi->addr_hi = cfg_entry->data;
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *val = XEN_PT_MERGE_VALUE(*val, dev_value, throughable_mask);
+
+    /* update MSI */
+    if (cfg_entry->data != old_addr) {
+        if (s->msi->mapped) {
+            xen_pt_msi_update(s);
+        }
+    }
+
+    return 0;
+}
+
+
+/* this function will be called twice (for 32 bit and 64 bit type) */
+/* write Message Data register */
+static int xen_pt_msgdata_reg_write(XenPCIPassthroughState *s,
+                                    XenPTReg *cfg_entry, uint16_t *val,
+                                    uint16_t dev_value, uint16_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    XenPTMSI *msi = s->msi;
+    uint16_t writable_mask = 0;
+    uint16_t throughable_mask = 0;
+    uint16_t old_data = cfg_entry->data;
+    uint32_t offset = reg->offset;
+
+    /* check the offset whether matches the type or not */
+    if (!xen_pt_msgdata_check_type(offset, msi->flags)) {
+        /* exit I/O emulator */
+        XEN_PT_ERR(&s->dev, "the offset does not match the 32/64 bit type!\n");
+        return -1;
+    }
+
+    /* modify emulate register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask;
+    cfg_entry->data = XEN_PT_MERGE_VALUE(*val, cfg_entry->data, writable_mask);
+    /* update the msi_info too */
+    msi->data = cfg_entry->data;
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *val = XEN_PT_MERGE_VALUE(*val, dev_value, throughable_mask);
+
+    /* update MSI */
+    if (cfg_entry->data != old_data) {
+        if (msi->mapped) {
+            xen_pt_msi_update(s);
+        }
+    }
+
+    return 0;
+}
+
+/* MSI Capability Structure reg static information table */
+static XenPTRegInfo xen_pt_emu_reg_msi[] = {
+    /* Next Pointer reg */
+    {
+        .offset     = PCI_CAP_LIST_NEXT,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = xen_pt_ptr_reg_init,
+        .u.b.read   = xen_pt_byte_reg_read,
+        .u.b.write  = xen_pt_byte_reg_write,
+    },
+    /* Message Control reg */
+    {
+        .offset     = PCI_MSI_FLAGS,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xFF8E,
+        .emu_mask   = 0x007F,
+        .init       = xen_pt_msgctrl_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_msgctrl_reg_write,
+    },
+    /* Message Address reg */
+    {
+        .offset     = PCI_MSI_ADDRESS_LO,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .ro_mask    = 0x00000003,
+        .emu_mask   = 0xFFFFFFFF,
+        .no_wb      = 1,
+        .init       = xen_pt_common_reg_init,
+        .u.dw.read  = xen_pt_long_reg_read,
+        .u.dw.write = xen_pt_msgaddr32_reg_write,
+    },
+    /* Message Upper Address reg (if PCI_MSI_FLAGS_64BIT set) */
+    {
+        .offset     = PCI_MSI_ADDRESS_HI,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .ro_mask    = 0x00000000,
+        .emu_mask   = 0xFFFFFFFF,
+        .no_wb      = 1,
+        .init       = xen_pt_msgaddr64_reg_init,
+        .u.dw.read  = xen_pt_long_reg_read,
+        .u.dw.write = xen_pt_msgaddr64_reg_write,
+    },
+    /* Message Data reg (16 bits of data for 32-bit devices) */
+    {
+        .offset     = PCI_MSI_DATA_32,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0x0000,
+        .emu_mask   = 0xFFFF,
+        .no_wb      = 1,
+        .init       = xen_pt_msgdata_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_msgdata_reg_write,
+    },
+    /* Message Data reg (16 bits of data for 64-bit devices) */
+    {
+        .offset     = PCI_MSI_DATA_64,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0x0000,
+        .emu_mask   = 0xFFFF,
+        .no_wb      = 1,
+        .init       = xen_pt_msgdata_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_msgdata_reg_write,
+    },
+    {
+        .size = 0,
+    },
+};
+
+
+/**************************************
+ * MSI-X Capability
+ */
+
+/* Message Control register for MSI-X */
+static int xen_pt_msixctrl_reg_init(XenPCIPassthroughState *s,
+                                    XenPTRegInfo *reg, uint32_t real_offset,
+                                    uint32_t *data)
+{
+    PCIDevice *d = &s->dev;
+    uint16_t reg_field = 0;
+
+    /* use I/O device register's value as initial value */
+    reg_field = pci_get_word(d->config + real_offset);
+
+    if (reg_field & PCI_MSIX_FLAGS_ENABLE) {
+        XEN_PT_LOG(d, "MSIX already enabled, disabling it first\n");
+        xen_host_pci_set_word(&s->real_device, real_offset,
+                              reg_field & ~PCI_MSIX_FLAGS_ENABLE);
+    }
+
+    s->msix->ctrl_offset = real_offset;
+
+    *data = reg->init_val;
+    return 0;
+}
+static int xen_pt_msixctrl_reg_write(XenPCIPassthroughState *s,
+                                     XenPTReg *cfg_entry, uint16_t *val,
+                                     uint16_t dev_value, uint16_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint16_t writable_mask = 0;
+    uint16_t throughable_mask = 0;
+    int debug_msix_enabled_old;
+
+    /* modify emulate register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask;
+    cfg_entry->data = XEN_PT_MERGE_VALUE(*val, cfg_entry->data, writable_mask);
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *val = XEN_PT_MERGE_VALUE(*val, dev_value, throughable_mask);
+
+    /* update MSI-X */
+    if ((*val & PCI_MSIX_FLAGS_ENABLE)
+        && !(*val & PCI_MSIX_FLAGS_MASKALL)) {
+        xen_pt_msix_update(s);
+    }
+
+    debug_msix_enabled_old = s->msix->enabled;
+    s->msix->enabled = !!(*val & PCI_MSIX_FLAGS_ENABLE);
+    if (s->msix->enabled != debug_msix_enabled_old) {
+        XEN_PT_LOG(&s->dev, "%s MSI-X\n",
+                   s->msix->enabled ? "enable" : "disable");
+    }
+
+    return 0;
+}
+
+/* MSI-X Capability Structure reg static information table */
+static XenPTRegInfo xen_pt_emu_reg_msix[] = {
+    /* Next Pointer reg */
+    {
+        .offset     = PCI_CAP_LIST_NEXT,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = xen_pt_ptr_reg_init,
+        .u.b.read   = xen_pt_byte_reg_read,
+        .u.b.write  = xen_pt_byte_reg_write,
+    },
+    /* Message Control reg */
+    {
+        .offset     = PCI_MSI_FLAGS,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0x3FFF,
+        .emu_mask   = 0x0000,
+        .init       = xen_pt_msixctrl_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_msixctrl_reg_write,
+    },
+    {
+        .size = 0,
+    },
+};
+
+
 /****************************
  * Capabilities
  */
@@ -1115,6 +1519,49 @@ static int xen_pt_pcie_size_init(XenPCIPassthroughState *s,
     *size = pcie_size;
     return 0;
 }
+/* get MSI Capability Structure register group size */
+static int xen_pt_msi_size_init(XenPCIPassthroughState *s,
+                                const XenPTRegGroupInfo *grp_reg,
+                                uint32_t base_offset, uint8_t *size)
+{
+    PCIDevice *d = &s->dev;
+    uint16_t msg_ctrl = 0;
+    uint8_t msi_size = 0xa;
+
+    msg_ctrl = pci_get_word(d->config + (base_offset + PCI_MSI_FLAGS));
+
+    /* a 64-bit address adds 4 bytes, per-vector masking adds 10 */
+    if (msg_ctrl & PCI_MSI_FLAGS_64BIT) {
+        msi_size += 4;
+    }
+    if (msg_ctrl & PCI_MSI_FLAGS_MASKBIT) {
+        msi_size += 10;
+    }
+
+    s->msi = g_new0(XenPTMSI, 1);
+    s->msi->pirq = XEN_PT_UNASSIGNED_PIRQ;
+
+    *size = msi_size;
+    return 0;
+}
+/* get MSI-X Capability Structure register group size */
+static int xen_pt_msix_size_init(XenPCIPassthroughState *s,
+                                 const XenPTRegGroupInfo *grp_reg,
+                                 uint32_t base_offset, uint8_t *size)
+{
+    int rc = 0;
+
+    rc = xen_pt_msix_init(s, base_offset);
+
+    if (rc < 0) {
+        XEN_PT_ERR(&s->dev, "Internal error: Invalid xen_pt_msix_init.\n");
+        return rc;
+    }
+
+    *size = grp_reg->grp_size;
+    return 0;
+}
+
 
 static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
     /* Header Type0 reg group */
@@ -1155,6 +1602,14 @@ static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
         .grp_size   = 0x04,
         .size_init  = xen_pt_reg_grp_size_init,
     },
+    /* MSI Capability Structure reg group */
+    {
+        .grp_id      = PCI_CAP_ID_MSI,
+        .grp_type    = XEN_PT_GRP_TYPE_EMU,
+        .grp_size    = 0xFF,
+        .size_init   = xen_pt_msi_size_init,
+        .emu_regs = xen_pt_emu_reg_msi,
+    },
     /* PCI-X Capabilities List Item reg group */
     {
         .grp_id     = PCI_CAP_ID_PCIX,
@@ -1199,6 +1654,14 @@ static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
         .size_init   = xen_pt_pcie_size_init,
         .emu_regs = xen_pt_emu_reg_pcie,
     },
+    /* MSI-X Capability Structure reg group */
+    {
+        .grp_id      = PCI_CAP_ID_MSIX,
+        .grp_type    = XEN_PT_GRP_TYPE_EMU,
+        .grp_size    = 0x0C,
+        .size_init   = xen_pt_msix_size_init,
+        .emu_regs = xen_pt_emu_reg_msix,
+    },
     {
         .grp_size = 0,
     },
@@ -1384,6 +1847,14 @@ void xen_pt_config_delete(XenPCIPassthroughState *s)
     struct XenPTRegGroup *reg_group, *next_grp;
     struct XenPTReg *reg, *next_reg;
 
+    /* free MSI/MSI-X info table */
+    if (s->msix) {
+        xen_pt_msix_delete(s);
+    }
+    if (s->msi) {
+        g_free(s->msi);
+    }
+
     /* free all register group entry */
     QLIST_FOREACH_SAFE(reg_group, &s->reg_grps, entries, next_grp) {
         /* free all register entry */
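The constants in xen_pt_msi_size_init() follow from the layout of the MSI capability:

- The base structure is 10 bytes: ID, Next, Message Control, Message Address and Message Data.
- A set 64-bit flag adds 4 bytes for the Upper Address.
- Support for per-vector masking adds 10 bytes: 2 reserved bytes after Message Data, plus the 32-bit Mask Bits and Pending Bits registers.

A small standalone sketch follows. msi_cap_size() is a hypothetical name. The two flag values are those of PCI_MSI_FLAGS_64BIT and PCI_MSI_FLAGS_MASKBIT in Linux's pci_regs.h.

```c
#include <stdint.h>

/* Message Control bits (same values as PCI_MSI_FLAGS_64BIT/_MASKBIT). */
#define MSI_FLAGS_64BIT    0x0080   /* 64-bit message address capable */
#define MSI_FLAGS_MASKBIT  0x0100   /* per-vector masking capable */

/* Size in bytes of an MSI capability structure, from its Message Control. */
static uint8_t msi_cap_size(uint16_t msg_ctrl)
{
    /* ID(1) + Next(1) + Control(2) + Address(4) + Data(2) */
    uint8_t size = 0xa;

    if (msg_ctrl & MSI_FLAGS_64BIT) {
        size += 4;   /* Message Upper Address */
    }
    if (msg_ctrl & MSI_FLAGS_MASKBIT) {
        size += 10;  /* 2 reserved bytes + Mask Bits (4) + Pending Bits (4) */
    }
    return size;
}
```

The four possible results are 0x0a, 0x0e, 0x14 and 0x18. The last matches a 64-bit capability whose Pending Bits register occupies offsets 0x14-0x17.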
diff --git a/hw/xen_pt_msi.c b/hw/xen_pt_msi.c
new file mode 100644
index 0000000..2299cc7
--- /dev/null
+++ b/hw/xen_pt_msi.c
@@ -0,0 +1,620 @@
+/*
+ * Copyright (c) 2007, Intel Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Jiang Yunhong <yunhong.jiang@intel.com>
+ *
+ * This file implements direct PCI assignment to a HVM guest
+ */
+
+#include <sys/mman.h>
+
+#include "xen_backend.h"
+#include "xen_pt.h"
+#include "apic-msidef.h"
+
+
+#define XEN_PT_AUTO_ASSIGN -1
+
+/* shift count for gflags */
+#define XEN_PT_GFLAGS_SHIFT_DEST_ID        0
+#define XEN_PT_GFLAGS_SHIFT_RH             8
+#define XEN_PT_GFLAGS_SHIFT_DM             9
+#define XEN_PT_GFLAGSSHIFT_DELIV_MODE     12
+#define XEN_PT_GFLAGSSHIFT_TRG_MODE       15
+
+
+/*
+ * Helpers
+ */
+
+static inline uint8_t msi_vector(uint32_t data)
+{
+    return (data & MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
+}
+
+static inline uint8_t msi_dest_id(uint32_t addr)
+{
+    return (addr & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
+}
+
+static inline uint32_t msi_ext_dest_id(uint32_t addr_hi)
+{
+    return addr_hi & 0xffffff00;
+}
+
+static uint32_t msi_gflags(uint32_t data, uint64_t addr)
+{
+    uint32_t result = 0;
+    int rh, dm, dest_id, deliv_mode, trig_mode;
+
+    rh = (addr >> MSI_ADDR_REDIRECTION_SHIFT) & 0x1;
+    dm = (addr >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1;
+    dest_id = msi_dest_id(addr);
+    deliv_mode = (data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x7;
+    trig_mode = (data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
+
+    result = dest_id | (rh << XEN_PT_GFLAGS_SHIFT_RH)
+        | (dm << XEN_PT_GFLAGS_SHIFT_DM)
+        | (deliv_mode << XEN_PT_GFLAGSSHIFT_DELIV_MODE)
+        | (trig_mode << XEN_PT_GFLAGSSHIFT_TRG_MODE);
+
+    return result;
+}
+
+static inline uint64_t msi_addr64(XenPTMSI *msi)
+{
+    return (uint64_t)msi->addr_hi << 32 | msi->addr_lo;
+}
+
+static int msi_msix_enable(XenPCIPassthroughState *s,
+                           uint32_t address,
+                           uint16_t flag,
+                           bool enable)
+{
+    uint16_t val = 0;
+
+    if (!address) {
+        return -1;
+    }
+
+    xen_host_pci_get_word(&s->real_device, address, &val);
+    if (enable) {
+        val |= flag;
+    } else {
+        val &= ~flag;
+    }
+    xen_host_pci_set_word(&s->real_device, address, val);
+    return 0;
+}
+
+static int msi_msix_setup(XenPCIPassthroughState *s,
+                          uint64_t addr,
+                          uint32_t data,
+                          int *ppirq,
+                          bool is_msix,
+                          int msix_entry,
+                          bool is_not_mapped)
+{
+    uint8_t gvec = msi_vector(data);
+    int rc = 0;
+
+    assert((!is_msix && msix_entry == 0) || is_msix);
+
+    if (gvec == 0) {
+        /* if gvec is 0, the guest is asking for a particular pirq that
+         * is passed as dest_id */
+        *ppirq = msi_ext_dest_id(addr >> 32) | msi_dest_id(addr);
+        if (!*ppirq) {
+            /* this probably indicates a misconfiguration of the guest,
+             * try the emulated path */
+            *ppirq = XEN_PT_UNASSIGNED_PIRQ;
+        } else {
+            XEN_PT_LOG(&s->dev, "requested pirq %d for MSI%s"
+                       " (vec: %#x, entry: %#x)\n",
+                       *ppirq, is_msix ? "-X" : "", gvec, msix_entry);
+        }
+    }
+
+    if (is_not_mapped) {
+        uint64_t table_base = 0;
+
+        if (is_msix) {
+            table_base = s->msix->table_base;
+        }
+
+        rc = xc_physdev_map_pirq_msi(xen_xc, xen_domid, XEN_PT_AUTO_ASSIGN,
+                                     ppirq, PCI_DEVFN(s->real_device.dev,
+                                                      s->real_device.func),
+                                     s->real_device.bus,
+                                     msix_entry, table_base);
+        if (rc) {
+            XEN_PT_ERR(&s->dev, "Mapping of MSI%s failed. (rc: %i, vec: %#x,"
+                       " entry %#x)\n",
+                       is_msix ? "-X" : "", rc, gvec, msix_entry);
+            return rc;
+        }
+    }
+
+    return 0;
+}
+static int msi_msix_update(XenPCIPassthroughState *s,
+                           uint64_t addr,
+                           uint32_t data,
+                           int pirq,
+                           bool is_msix,
+                           int msix_entry,
+                           int *old_pirq)
+{
+    PCIDevice *d = &s->dev;
+    uint8_t gvec = msi_vector(data);
+    uint32_t gflags = msi_gflags(data, addr);
+    int rc = 0;
+    uint64_t table_addr = 0;
+
+    XEN_PT_LOG(d, "Updating MSI%s with pirq %d gvec %#x gflags %#x"
+               " (entry: %#x)\n",
+               is_msix ? "-X" : "", pirq, gvec, gflags, msix_entry);
+
+    if (is_msix) {
+        table_addr = s->msix->mmio_base_addr;
+    }
+
+    rc = xc_domain_update_msi_irq(xen_xc, xen_domid, gvec,
+                                  pirq, gflags, table_addr);
+
+    if (rc) {
+        XEN_PT_ERR(d, "Updating of MSI%s failed. (rc: %d)\n",
+                   is_msix ? "-X" : "", rc);
+
+        if (xc_physdev_unmap_pirq(xen_xc, xen_domid, *old_pirq)) {
+            XEN_PT_ERR(d, "Unmapping of MSI%s pirq %d failed.\n",
+                       is_msix ? "-X" : "", *old_pirq);
+        }
+        *old_pirq = XEN_PT_UNASSIGNED_PIRQ;
+    }
+    return rc;
+}
+
+static int msi_msix_disable(XenPCIPassthroughState *s,
+                            uint64_t addr,
+                            uint32_t data,
+                            int pirq,
+                            bool is_msix,
+                            bool is_binded)
+{
+    PCIDevice *d = &s->dev;
+    uint8_t gvec = msi_vector(data);
+    uint32_t gflags = msi_gflags(data, addr);
+    int rc = 0;
+
+    if (pirq == XEN_PT_UNASSIGNED_PIRQ) {
+        return 0;
+    }
+
+    if (is_binded) {
+        XEN_PT_LOG(d, "Unbind MSI%s with pirq %d, gvec %#x\n",
+                   is_msix ? "-X" : "", pirq, gvec);
+        rc = xc_domain_unbind_msi_irq(xen_xc, xen_domid, gvec, pirq, gflags);
+        if (rc) {
+            XEN_PT_ERR(d, "Unbinding of MSI%s failed. (pirq: %d, gvec: %#x)\n",
+                       is_msix ? "-X" : "", pirq, gvec);
+            return rc;
+        }
+    }
+
+    XEN_PT_LOG(d, "Unmap MSI%s pirq %d\n", is_msix ? "-X" : "", pirq);
+    rc = xc_physdev_unmap_pirq(xen_xc, xen_domid, pirq);
+    if (rc) {
+        XEN_PT_ERR(d, "Unmapping of MSI%s pirq %d failed. (rc: %i)\n",
+                   is_msix ? "-X" : "", pirq, rc);
+        return rc;
+    }
+
+    return 0;
+}
+
+/*
+ * MSI virtualization functions
+ */
+
+int xen_pt_msi_set_enable(XenPCIPassthroughState *s, bool enable)
+{
+    XEN_PT_LOG(&s->dev, "%s MSI.\n", enable ? "enabling" : "disabling");
+
+    if (!s->msi) {
+        return -1;
+    }
+
+    return msi_msix_enable(s, s->msi->ctrl_offset, PCI_MSI_FLAGS_ENABLE,
+                           enable);
+}
+
+/* Set up physical MSI, but don't enable it */
+int xen_pt_msi_setup(XenPCIPassthroughState *s)
+{
+    int pirq = XEN_PT_UNASSIGNED_PIRQ;
+    int rc = 0;
+    XenPTMSI *msi = s->msi;
+
+    if (msi->initialized) {
+        XEN_PT_ERR(&s->dev,
+                   "Physical MSI has already been initialized.\n");
+        return -1;
+    }
+
+    rc = msi_msix_setup(s, msi_addr64(msi), msi->data, &pirq, false, 0, true);
+    if (rc) {
+        return rc;
+    }
+
+    if (pirq < 0) {
+        XEN_PT_ERR(&s->dev, "Invalid pirq number: %d.\n", pirq);
+        return -1;
+    }
+
+    msi->pirq = pirq;
+    XEN_PT_LOG(&s->dev, "MSI mapped with pirq %d.\n", pirq);
+
+    return 0;
+}
+
+int xen_pt_msi_update(XenPCIPassthroughState *s)
+{
+    XenPTMSI *msi = s->msi;
+    return msi_msix_update(s, msi_addr64(msi), msi->data, msi->pirq,
+                           false, 0, &msi->pirq);
+}
+
+void xen_pt_msi_disable(XenPCIPassthroughState *s)
+{
+    XenPTMSI *msi = s->msi;
+
+    if (!msi) {
+        return;
+    }
+
+    xen_pt_msi_set_enable(s, false);
+
+    msi_msix_disable(s, msi_addr64(msi), msi->data, msi->pirq, false,
+                     msi->initialized);
+
+    /* clear msi info */
+    msi->flags = 0;
+    msi->mapped = false;
+    msi->pirq = XEN_PT_UNASSIGNED_PIRQ;
+}
+
+/*
+ * MSI-X virtualization functions
+ */
+
+static int msix_set_enable(XenPCIPassthroughState *s, bool enabled)
+{
+    XEN_PT_LOG(&s->dev, "%s MSI-X.\n", enabled ? "enabling" : "disabling");
+
+    if (!s->msix) {
+        return -1;
+    }
+
+    return msi_msix_enable(s, s->msix->ctrl_offset, PCI_MSIX_FLAGS_ENABLE,
+                           enabled);
+}
+
+static int xen_pt_msix_update_one(XenPCIPassthroughState *s, int entry_nr)
+{
+    XenPTMSIXEntry *entry = NULL;
+    int pirq;
+    int rc;
+
+    if (entry_nr < 0 || entry_nr >= s->msix->total_entries) {
+        return -EINVAL;
+    }
+
+    entry = &s->msix->msix_entry[entry_nr];
+
+    if (!entry->updated) {
+        return 0;
+    }
+
+    pirq = entry->pirq;
+
+    rc = msi_msix_setup(s, entry->addr, entry->data, &pirq, true, entry_nr,
+                        entry->pirq == XEN_PT_UNASSIGNED_PIRQ);
+    if (rc) {
+        return rc;
+    }
+    if (entry->pirq == XEN_PT_UNASSIGNED_PIRQ) {
+        entry->pirq = pirq;
+    }
+
+    rc = msi_msix_update(s, entry->addr, entry->data, pirq, true,
+                         entry_nr, &entry->pirq);
+
+    if (!rc) {
+        entry->updated = false;
+    }
+
+    return rc;
+}
+
+int xen_pt_msix_update(XenPCIPassthroughState *s)
+{
+    XenPTMSIX *msix = s->msix;
+    int i;
+
+    for (i = 0; i < msix->total_entries; i++) {
+        xen_pt_msix_update_one(s, i);
+    }
+
+    return 0;
+}
+
+void xen_pt_msix_disable(XenPCIPassthroughState *s)
+{
+    int i = 0;
+
+    msix_set_enable(s, false);
+
+    for (i = 0; i < s->msix->total_entries; i++) {
+        XenPTMSIXEntry *entry = &s->msix->msix_entry[i];
+
+        msi_msix_disable(s, entry->addr, entry->data, entry->pirq, true, true);
+
+        /* clear MSI-X info */
+        entry->pirq = XEN_PT_UNASSIGNED_PIRQ;
+        entry->updated = false;
+    }
+}
+
+int xen_pt_msix_update_remap(XenPCIPassthroughState *s, int bar_index)
+{
+    XenPTMSIXEntry *entry;
+    int i, ret;
+
+    if (!(s->msix && s->msix->bar_index == bar_index)) {
+        return 0;
+    }
+
+    for (i = 0; i < s->msix->total_entries; i++) {
+        entry = &s->msix->msix_entry[i];
+        if (entry->pirq != XEN_PT_UNASSIGNED_PIRQ) {
+            ret = xc_domain_unbind_pt_irq(xen_xc, xen_domid, entry->pirq,
+                                          PT_IRQ_TYPE_MSI, 0, 0, 0, 0);
+            if (ret) {
+                XEN_PT_ERR(&s->dev, "unbind MSI-X entry %d (pirq %d) failed\n",
+                           i, entry->pirq);
+            }
+            entry->updated = true;
+        }
+    }
+    return xen_pt_msix_update(s);
+}
+
+static uint32_t get_entry_value(XenPTMSIXEntry *e, int offset)
+{
+    switch (offset) {
+    case PCI_MSIX_ENTRY_LOWER_ADDR:
+        return e->addr & UINT32_MAX;
+    case PCI_MSIX_ENTRY_UPPER_ADDR:
+        return e->addr >> 32;
+    case PCI_MSIX_ENTRY_DATA:
+        return e->data;
+    case PCI_MSIX_ENTRY_VECTOR_CTRL:
+        return e->vector_ctrl;
+    default:
+        return 0;
+    }
+}
+
+static void set_entry_value(XenPTMSIXEntry *e, int offset, uint32_t val)
+{
+    switch (offset) {
+    case PCI_MSIX_ENTRY_LOWER_ADDR:
+        e->addr = (e->addr & ((uint64_t)UINT32_MAX << 32)) | val;
+        break;
+    case PCI_MSIX_ENTRY_UPPER_ADDR:
+        e->addr = (uint64_t)val << 32 | (e->addr & UINT32_MAX);
+        break;
+    case PCI_MSIX_ENTRY_DATA:
+        e->data = val;
+        break;
+    case PCI_MSIX_ENTRY_VECTOR_CTRL:
+        e->vector_ctrl = val;
+        break;
+    }
+}
+
+static void pci_msix_write(void *opaque, target_phys_addr_t addr,
+                           uint64_t val, unsigned size)
+{
+    XenPCIPassthroughState *s = opaque;
+    XenPTMSIX *msix = s->msix;
+    XenPTMSIXEntry *entry;
+    int entry_nr, offset;
+
+    entry_nr = addr / PCI_MSIX_ENTRY_SIZE;
+    if (entry_nr < 0 || entry_nr >= msix->total_entries) {
+        XEN_PT_ERR(&s->dev, "asked MSI-X entry '%i' invalid!\n", entry_nr);
+        return;
+    }
+    entry = &msix->msix_entry[entry_nr];
+    offset = addr % PCI_MSIX_ENTRY_SIZE;
+
+    if (offset != PCI_MSIX_ENTRY_VECTOR_CTRL) {
+        const volatile uint32_t *vec_ctrl;
+
+        if (get_entry_value(entry, offset) == val) {
+            return;
+        }
+
+        /*
+         * If Xen intercepts the mask bit access, entry->vector_ctrl may not
+         * be up-to-date. Read from hardware directly.
+         */
+        vec_ctrl = s->msix->phys_iomem_base + entry_nr * PCI_MSIX_ENTRY_SIZE
+            + PCI_MSIX_ENTRY_VECTOR_CTRL;
+
+        if (msix->enabled && !(*vec_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT)) {
+            XEN_PT_ERR(&s->dev, "Can't update msix entry %d since MSI-X is"
+                       " already enabled.\n", entry_nr);
+            return;
+        }
+
+        entry->updated = true;
+    }
+
+    set_entry_value(entry, offset, val);
+
+    if (offset == PCI_MSIX_ENTRY_VECTOR_CTRL) {
+        if (msix->enabled && !(val & PCI_MSIX_ENTRY_CTRL_MASKBIT)) {
+            xen_pt_msix_update_one(s, entry_nr);
+        }
+    }
+}
+
+static uint64_t pci_msix_read(void *opaque, target_phys_addr_t addr,
+                              unsigned size)
+{
+    XenPCIPassthroughState *s = opaque;
+    XenPTMSIX *msix = s->msix;
+    int entry_nr, offset;
+
+    entry_nr = addr / PCI_MSIX_ENTRY_SIZE;
+    if (entry_nr < 0) {
+        XEN_PT_ERR(&s->dev, "asked MSI-X entry '%i' invalid!\n", entry_nr);
+        return 0;
+    }
+
+    offset = addr % PCI_MSIX_ENTRY_SIZE;
+
+    if (addr < msix->total_entries * PCI_MSIX_ENTRY_SIZE) {
+        return get_entry_value(&msix->msix_entry[entry_nr], offset);
+    } else {
+        /* Pending Bit Array (PBA) */
+        return *(uint32_t *)(msix->phys_iomem_base + addr);
+    }
+}
+
+static const MemoryRegionOps pci_msix_ops = {
+    .read = pci_msix_read,
+    .write = pci_msix_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+    .valid = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+        .unaligned = false,
+    },
+};
+
+int xen_pt_msix_init(XenPCIPassthroughState *s, uint32_t base)
+{
+    uint8_t id = 0;
+    uint16_t control = 0;
+    uint32_t table_off = 0;
+    int i, total_entries, bar_index;
+    XenHostPCIDevice *hd = &s->real_device;
+    PCIDevice *d = &s->dev;
+    int fd = -1;
+    XenPTMSIX *msix = NULL;
+    int rc = 0;
+
+    rc = xen_host_pci_get_byte(hd, base + PCI_CAP_LIST_ID, &id);
+    if (rc) {
+        return rc;
+    }
+
+    if (id != PCI_CAP_ID_MSIX) {
+        XEN_PT_ERR(d, "Invalid id %#x base %#x\n", id, base);
+        return -1;
+    }
+
+    xen_host_pci_get_word(hd, base + PCI_MSIX_FLAGS, &control);
+    total_entries = control & PCI_MSIX_FLAGS_QSIZE;
+    total_entries += 1;
+
+    s->msix = g_malloc0(sizeof (XenPTMSIX)
+                        + total_entries * sizeof (XenPTMSIXEntry));
+    msix = s->msix;
+
+    msix->total_entries = total_entries;
+    for (i = 0; i < total_entries; i++) {
+        msix->msix_entry[i].pirq = XEN_PT_UNASSIGNED_PIRQ;
+    }
+
+    memory_region_init_io(&msix->mmio, &pci_msix_ops, s, "xen-pci-pt-msix",
+                          (total_entries * PCI_MSIX_ENTRY_SIZE
+                           + XC_PAGE_SIZE - 1)
+                          & XC_PAGE_MASK);
+
+    xen_host_pci_get_long(hd, base + PCI_MSIX_TABLE, &table_off);
+    bar_index = msix->bar_index = table_off & PCI_MSIX_FLAGS_BIRMASK;
+    table_off = table_off & ~PCI_MSIX_FLAGS_BIRMASK;
+    msix->table_base = s->real_device.io_regions[bar_index].base_addr;
+    XEN_PT_LOG(d, "get MSI-X table BAR base 0x%"PRIx64"\n", msix->table_base);
+
+    fd = open("/dev/mem", O_RDWR);
+    if (fd == -1) {
+        rc = -errno;
+        XEN_PT_ERR(d, "Can't open /dev/mem: %s\n", strerror(errno));
+        goto error_out;
+    }
+    XEN_PT_LOG(d, "table_off = %#x, total_entries = %d\n",
+               table_off, total_entries);
+    msix->table_offset_adjust = table_off & 0x0fff;
+    msix->phys_iomem_base =
+        mmap(NULL,
+             total_entries * PCI_MSIX_ENTRY_SIZE + msix->table_offset_adjust,
+             PROT_READ,
+             MAP_SHARED | MAP_LOCKED,
+             fd,
+             msix->table_base + table_off - msix->table_offset_adjust);
+    close(fd);
+    if (msix->phys_iomem_base == MAP_FAILED) {
+        rc = -errno;
+        XEN_PT_ERR(d, "Can't map physical MSI-X table: %s\n", strerror(errno));
+        goto error_out;
+    }
+    msix->phys_iomem_base = (char *)msix->phys_iomem_base
+        + msix->table_offset_adjust;
+
+    XEN_PT_LOG(d, "mapping physical MSI-X table to %p\n",
+               msix->phys_iomem_base);
+
+    memory_region_add_subregion_overlap(&s->bar[bar_index], table_off,
+                                        &msix->mmio,
+                                        2); /* Priority: pci default + 1 */
+
+    return 0;
+
+error_out:
+    memory_region_destroy(&msix->mmio);
+    g_free(s->msix);
+    s->msix = NULL;
+    return rc;
+}
+
+void xen_pt_msix_delete(XenPCIPassthroughState *s)
+{
+    XenPTMSIX *msix = s->msix;
+
+    if (!msix) {
+        return;
+    }
+
+    /* unmap the MSI-X memory mapped register area */
+    if (msix->phys_iomem_base) {
+        XEN_PT_LOG(&s->dev, "unmapping physical MSI-X table from %p\n",
+                   msix->phys_iomem_base);
+        munmap(msix->phys_iomem_base, msix->total_entries * PCI_MSIX_ENTRY_SIZE
+               + msix->table_offset_adjust);
+    }
+
+    memory_region_del_subregion(&s->bar[msix->bar_index], &msix->mmio);
+    memory_region_destroy(&msix->mmio);
+
+    g_free(s->msix);
+    s->msix = NULL;
+}
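For reference, the arithmetic that xen_pt_msix_init() performs on the MSI-X capability can be checked in isolation. The sketch assumes 4 KiB pages, since the code above masks table_off with 0x0fff. It mirrors PCI_MSIX_FLAGS_QSIZE (0x7ff), PCI_MSIX_FLAGS_BIRMASK (0x7) and the 16-byte PCI_MSIX_ENTRY_SIZE with local constants. struct msix_layout and msix_compute_layout() are hypothetical names.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE   4096u   /* assumed XC_PAGE_SIZE */
#define SKETCH_ENTRY_SIZE  16u     /* mirrors PCI_MSIX_ENTRY_SIZE */
#define SKETCH_BIR_MASK    0x7u    /* mirrors PCI_MSIX_FLAGS_BIRMASK */
#define SKETCH_QSIZE_MASK  0x7ffu  /* mirrors PCI_MSIX_FLAGS_QSIZE */

struct msix_layout {
    int total_entries;
    int bar_index;           /* BAR holding the table (BIR) */
    uint32_t table_off;      /* table offset inside that BAR */
    uint32_t offset_adjust;  /* table_off's offset within its page */
    uint64_t map_offset;     /* mmap() offset (page aligned if the BAR base is) */
    uint64_t map_length;     /* bytes passed to mmap() */
    uint64_t region_size;    /* trapping MemoryRegion, rounded up to a page */
};

static struct msix_layout msix_compute_layout(uint16_t control,
                                              uint32_t table_reg,
                                              uint64_t bar_base)
{
    struct msix_layout l;

    /* Table Size is encoded as N - 1 */
    l.total_entries = (int)(control & SKETCH_QSIZE_MASK) + 1;
    /* Low 3 bits select the BAR, the rest is the offset */
    l.bar_index = (int)(table_reg & SKETCH_BIR_MASK);
    l.table_off = table_reg & ~SKETCH_BIR_MASK;
    /* mmap() needs a page-aligned offset, so map from the page start */
    l.offset_adjust = l.table_off & (SKETCH_PAGE_SIZE - 1);
    l.map_offset = bar_base + l.table_off - l.offset_adjust;
    l.map_length = (uint64_t)l.total_entries * SKETCH_ENTRY_SIZE
                   + l.offset_adjust;
    l.region_size = ((uint64_t)l.total_entries * SKETCH_ENTRY_SIZE
                     + SKETCH_PAGE_SIZE - 1)
                    & ~(uint64_t)(SKETCH_PAGE_SIZE - 1);
    return l;
}
```

Take a control word of 0x003f and a Table Offset/BIR register of 0x00002013 on a BAR at 0xf0000000. This describes 64 entries in BAR 3 at offset 0x2010. /dev/mem is mapped from 0xf0002000 for 1040 bytes, and phys_iomem_base is then advanced by the 0x10 adjustment. The trapping MemoryRegion is rounded up to one page.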
-- 
Anthony PERARD


* [PATCH V13 9/9] Introduce Xen PCI Passthrough, MSI (3/3)
  2012-06-14 17:01 [Qemu-devel] [PATCH V13 0/9] Xen PCI Passthrough Anthony PERARD
                   ` (12 preceding siblings ...)
  2012-06-14 17:01 ` [Qemu-devel] [PATCH V13 9/9] Introduce Xen PCI Passthrough, MSI (3/3) Anthony PERARD
@ 2012-06-14 17:01 ` Anthony PERARD
  2012-06-14 19:41 ` [PATCH V13 0/9] Xen PCI Passthrough Michael S. Tsirkin
  2012-06-14 19:41 ` [Qemu-devel] " Michael S. Tsirkin
  15 siblings, 0 replies; 37+ messages in thread
From: Anthony PERARD @ 2012-06-14 17:01 UTC (permalink / raw)
  To: QEMU-devel
  Cc: Anthony Liguori, Jiang Yunhong, Stefano Stabellini, Jan Kiszka,
	Michael S. Tsirkin, Shan Haitao, Xen Devel, Anthony PERARD

From: Jiang Yunhong <yunhong.jiang@intel.com>

A more complete history can be found here:
git://xenbits.xensource.com/qemu-xen-unstable.git

Signed-off-by: Jiang Yunhong <yunhong.jiang@intel.com>
Signed-off-by: Shan Haitao <haitao.shan@intel.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
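Note on the gvec == 0 case in msi_msix_setup(): a guest that wants a specific pirq writes vector 0 and spreads the pirq number over the message address. Bits 7:0 go in the destination ID field, read by msi_dest_id(). The higher bits go in the upper 24 bits of the upper address dword, read by msi_ext_dest_id(). The sketch below round-trips that encoding. Two details are assumed from the x86 MSI format: the destination ID position (address bits 19:12) and the 0xfee00000 base. Both helper names are hypothetical.

```c
#include <stdint.h>

/* Decode as msi_msix_setup() does when the vector is 0: high bits from
 * msi_ext_dest_id(addr_hi), low 8 bits from msi_dest_id(addr_lo). */
static int sketch_requested_pirq(uint32_t addr_lo, uint32_t addr_hi)
{
    uint32_t ext = addr_hi & 0xffffff00u;     /* as msi_ext_dest_id() */
    uint32_t low = (addr_lo >> 12) & 0xffu;   /* assumed dest ID field */

    return (int)(ext | low);
}

/* Hypothetical guest-side encoding of a pirq into an MSI address. */
static void sketch_encode_pirq(int pirq, uint32_t *addr_lo, uint32_t *addr_hi)
{
    *addr_hi = (uint32_t)pirq & 0xffffff00u;
    *addr_lo = 0xfee00000u | (((uint32_t)pirq & 0xffu) << 12);
}
```

A decoded value of 0 corresponds to the branch that falls back to XEN_PT_UNASSIGNED_PIRQ.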
 hw/i386/Makefile.objs   |    2 +-
 hw/xen_pt.c             |   31 +++-
 hw/xen_pt.h             |   51 ++++
 hw/xen_pt_config_init.c |  471 +++++++++++++++++++++++++++++++++++
 hw/xen_pt_msi.c         |  620 +++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 1173 insertions(+), 2 deletions(-)
 create mode 100644 hw/xen_pt_msi.c
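Note on the register write handlers added to xen_pt_config_init.c: each one computes two masks. writable_mask = emu_mask & ~ro_mask & valid_mask controls the update of the emulated copy. throughable_mask = ~emu_mask & valid_mask selects which guest bits reach the device. The sketch below assumes that XEN_PT_MERGE_VALUE(value, data, mask) takes the masked bits from value and the remaining bits from data. merge_value() and split_guest_write() are hypothetical stand-ins. The test values are the Message Control masks from xen_pt_emu_reg_msi (emu_mask 0x007F, ro_mask 0xFF8E).

```c
#include <stdint.h>

/* Bits set in mask come from value; all other bits are kept from data. */
static uint16_t merge_value(uint16_t value, uint16_t data, uint16_t mask)
{
    return (uint16_t)((value & mask) | (data & ~mask));
}

/* Split one guest write to a 16-bit register: *emu receives the new
 * emulated value, and the return value is what would be written to
 * the physical device. */
static uint16_t split_guest_write(uint16_t guest_val, uint16_t *emu,
                                  uint16_t dev_value, uint16_t emu_mask,
                                  uint16_t ro_mask, uint16_t valid_mask)
{
    uint16_t writable_mask = (uint16_t)(emu_mask & ~ro_mask & valid_mask);
    uint16_t throughable_mask = (uint16_t)(~emu_mask & valid_mask);

    *emu = merge_value(guest_val, *emu, writable_mask);
    return merge_value(guest_val, dev_value, throughable_mask);
}
```

Suppose the guest enables MSI by writing 0x0001 while the emulated register holds the read-only 64-bit flag 0x0080. The result is an emulated value of 0x0081 and nothing forwarded, because bit 0 is an emulated bit. This is why xen_pt_msgctrl_reg_write() copies PCI_MSI_FLAGS_ENABLE from raw_val back into *val explicitly.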

diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index e361a92..d364c37 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -8,7 +8,7 @@ obj-y += pc_piix.o
 obj-y += pc_sysfw.o
 obj-$(CONFIG_XEN) += xen_platform.o xen_apic.o
 obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen-host-pci-device.o
-obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen_pt.o xen_pt_config_init.o
+obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen_pt.o xen_pt_config_init.o xen_pt_msi.o
 obj-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o kvm/i8259.o kvm/ioapic.o kvm/i8254.o
 obj-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 
diff --git a/hw/xen_pt.c b/hw/xen_pt.c
index 92ad0fa..3b6d186 100644
--- a/hw/xen_pt.c
+++ b/hw/xen_pt.c
@@ -36,6 +36,20 @@
  *
  *     Write '1'
  *       - Set real bit to '1'.
+ *
+ * MSI interrupt:
+ *   Initialize MSI register (xen_pt_msi_setup, xen_pt_msi_update)
+ *     Bind MSI (xc_domain_update_msi_irq)
+ *       <fail>
+ *         - Unmap MSI.
+ *         - Set s->msi->pirq to '-1'.
+ *
+ * MSI-X interrupt:
+ *   Initialize MSI-X register (xen_pt_msix_update_one)
+ *     Bind MSI-X (xc_domain_update_msi_irq)
+ *       <fail>
+ *         - Unmap MSI-X.
+ *         - Set entry->pirq to '-1'.
  */
 
 #include <sys/ioctl.h>
@@ -534,7 +548,15 @@ static void xen_pt_region_update(XenPCIPassthroughState *s,
     };
 
     bar = xen_pt_bar_from_region(s, mr);
-    if (bar == -1) {
+    if (bar == -1 && (!s->msix || &s->msix->mmio != mr)) {
+        return;
+    }
+
+    if (s->msix && &s->msix->mmio == mr) {
+        if (adding) {
+            s->msix->mmio_base_addr = sec->offset_within_address_space;
+            rc = xen_pt_msix_update_remap(s, s->msix->bar_index);
+        }
         return;
     }
 
@@ -764,6 +786,13 @@ static int xen_pt_unregister_device(PCIDevice *d)
         }
     }
 
+    if (s->msi) {
+        xen_pt_msi_disable(s);
+    }
+    if (s->msix) {
+        xen_pt_msix_disable(s);
+    }
+
     if (machine_irq) {
         xen_pt_mapped_machine_irq[machine_irq]--;
 
diff --git a/hw/xen_pt.h b/hw/xen_pt.h
index 4b76073..41904ec 100644
--- a/hw/xen_pt.h
+++ b/hw/xen_pt.h
@@ -160,6 +160,36 @@ typedef struct XenPTRegGroup {
 
 
 #define XEN_PT_UNASSIGNED_PIRQ (-1)
+typedef struct XenPTMSI {
+    uint16_t flags;
+    uint32_t addr_lo;  /* guest message address */
+    uint32_t addr_hi;  /* guest message upper address */
+    uint16_t data;     /* guest message data */
+    uint32_t ctrl_offset; /* saved control offset */
+    int pirq;          /* corresponding guest pirq */
+    bool initialized;  /* whether guest MSI is initialized */
+    bool mapped;       /* whether the pirq is mapped */
+} XenPTMSI;
+
+typedef struct XenPTMSIXEntry {
+    int pirq;
+    uint64_t addr;
+    uint32_t data;
+    uint32_t vector_ctrl;
+    bool updated; /* indicate whether MSI ADDR or DATA is updated */
+} XenPTMSIXEntry;
+typedef struct XenPTMSIX {
+    uint32_t ctrl_offset;
+    bool enabled;
+    int total_entries;
+    int bar_index;
+    uint64_t table_base;
+    uint32_t table_offset_adjust; /* page align mmap */
+    uint64_t mmio_base_addr;
+    MemoryRegion mmio;
+    void *phys_iomem_base;
+    XenPTMSIXEntry msix_entry[0];
+} XenPTMSIX;
 
 struct XenPCIPassthroughState {
     PCIDevice dev;
@@ -172,6 +202,9 @@ struct XenPCIPassthroughState {
 
     uint32_t machine_irq;
 
+    XenPTMSI *msi;
+    XenPTMSIX *msix;
+
     MemoryRegion bar[PCI_NUM_REGIONS - 1];
     MemoryRegion rom;
 
@@ -247,4 +280,22 @@ static inline uint8_t xen_pt_pci_intx(XenPCIPassthroughState *s)
     return r_val;
 }
 
+/* MSI/MSI-X */
+int xen_pt_msi_set_enable(XenPCIPassthroughState *s, bool en);
+int xen_pt_msi_setup(XenPCIPassthroughState *s);
+int xen_pt_msi_update(XenPCIPassthroughState *d);
+void xen_pt_msi_disable(XenPCIPassthroughState *s);
+
+int xen_pt_msix_init(XenPCIPassthroughState *s, uint32_t base);
+void xen_pt_msix_delete(XenPCIPassthroughState *s);
+int xen_pt_msix_update(XenPCIPassthroughState *s);
+int xen_pt_msix_update_remap(XenPCIPassthroughState *s, int bar_index);
+void xen_pt_msix_disable(XenPCIPassthroughState *s);
+
+static inline bool xen_pt_has_msix_mapping(XenPCIPassthroughState *s, int bar)
+{
+    return s->msix && s->msix->bar_index == bar;
+}
+
+
 #endif /* !XEN_PT_H */
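The XenPTMSIXEntry fields map onto the 16-byte MSI-X table entries that pci_msix_read() and pci_msix_write() decode. The guest's byte offset into the table selects entry offset / 16. Then offset % 16 picks the register: lower address (0), upper address (4), data (8) or vector control (12). The sketch mirrors get_entry_value() and set_entry_value() on a plain array, including how the two 32-bit halves of the 64-bit address are merged. The struct and function names are hypothetical. The bounds check stands in for the total_entries test in pci_msix_write().

```c
#include <stdint.h>

/* Offsets inside one MSI-X table entry (PCI_MSIX_ENTRY_* in the patch). */
enum {
    ENTRY_LOWER_ADDR  = 0,
    ENTRY_UPPER_ADDR  = 4,
    ENTRY_DATA        = 8,
    ENTRY_VECTOR_CTRL = 12,
    ENTRY_SIZE        = 16,
};

/* Hypothetical stand-in for XenPTMSIXEntry (pirq/updated omitted). */
struct sketch_msix_entry {
    uint64_t addr;
    uint32_t data;
    uint32_t vector_ctrl;
};

/* Guest 32-bit write at byte offset off into a table of n entries. */
static void sketch_entry_write(struct sketch_msix_entry *table, int n,
                               uint64_t off, uint32_t val)
{
    struct sketch_msix_entry *e;

    if (off / ENTRY_SIZE >= (uint64_t)n) {
        return;                          /* out of range: ignore */
    }
    e = &table[off / ENTRY_SIZE];

    switch (off % ENTRY_SIZE) {
    case ENTRY_LOWER_ADDR:               /* keep the upper half */
        e->addr = (e->addr & 0xffffffff00000000ull) | val;
        break;
    case ENTRY_UPPER_ADDR:               /* keep the lower half */
        e->addr = ((uint64_t)val << 32) | (e->addr & 0xffffffffull);
        break;
    case ENTRY_DATA:
        e->data = val;
        break;
    case ENTRY_VECTOR_CTRL:
        e->vector_ctrl = val;
        break;
    }
}

/* Guest 32-bit read; returns 0 for out-of-range or unknown offsets. */
static uint32_t sketch_entry_read(const struct sketch_msix_entry *table,
                                  int n, uint64_t off)
{
    const struct sketch_msix_entry *e;

    if (off / ENTRY_SIZE >= (uint64_t)n) {
        return 0;
    }
    e = &table[off / ENTRY_SIZE];

    switch (off % ENTRY_SIZE) {
    case ENTRY_LOWER_ADDR:
        return (uint32_t)(e->addr & 0xffffffffull);
    case ENTRY_UPPER_ADDR:
        return (uint32_t)(e->addr >> 32);
    case ENTRY_DATA:
        return e->data;
    case ENTRY_VECTOR_CTRL:
        return e->vector_ctrl;
    default:
        return 0;
    }
}
```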
diff --git a/hw/xen_pt_config_init.c b/hw/xen_pt_config_init.c
index 1d97876..00eb3d9 100644
--- a/hw/xen_pt_config_init.c
+++ b/hw/xen_pt_config_init.c
@@ -1022,6 +1022,410 @@ static XenPTRegInfo xen_pt_emu_reg_pm[] = {
 };
 
 
+/********************************
+ * MSI Capability
+ */
+
+/* Helper */
+static bool xen_pt_msgdata_check_type(uint32_t offset, uint16_t flags)
+{
+    /* check whether the offset matches the 32/64 bit type */
+    bool is_32 = (offset == PCI_MSI_DATA_32) && !(flags & PCI_MSI_FLAGS_64BIT);
+    bool is_64 = (offset == PCI_MSI_DATA_64) &&  (flags & PCI_MSI_FLAGS_64BIT);
+    return is_32 || is_64;
+}
+
+/* Message Control register */
+static int xen_pt_msgctrl_reg_init(XenPCIPassthroughState *s,
+                                   XenPTRegInfo *reg, uint32_t real_offset,
+                                   uint32_t *data)
+{
+    PCIDevice *d = &s->dev;
+    XenPTMSI *msi = s->msi;
+    uint16_t reg_field = 0;
+
+    /* use I/O device register's value as initial value */
+    reg_field = pci_get_word(d->config + real_offset);
+
+    if (reg_field & PCI_MSI_FLAGS_ENABLE) {
+        XEN_PT_LOG(&s->dev, "MSI already enabled, disabling it first\n");
+        xen_host_pci_set_word(&s->real_device, real_offset,
+                              reg_field & ~PCI_MSI_FLAGS_ENABLE);
+    }
+    msi->flags |= reg_field;
+    msi->ctrl_offset = real_offset;
+    msi->initialized = false;
+    msi->mapped = false;
+
+    *data = reg->init_val;
+    return 0;
+}
+static int xen_pt_msgctrl_reg_write(XenPCIPassthroughState *s,
+                                    XenPTReg *cfg_entry, uint16_t *val,
+                                    uint16_t dev_value, uint16_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    XenPTMSI *msi = s->msi;
+    uint16_t writable_mask = 0;
+    uint16_t throughable_mask = 0;
+    uint16_t raw_val;
+
+    /* Currently no support for multi-vector */
+    if (*val & PCI_MSI_FLAGS_QSIZE) {
+        XEN_PT_WARN(&s->dev, "Tries to set more than 1 vector ctrl %x\n", *val);
+    }
+
+    /* modify emulated register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask;
+    cfg_entry->data = XEN_PT_MERGE_VALUE(*val, cfg_entry->data, writable_mask);
+    msi->flags |= cfg_entry->data & ~PCI_MSI_FLAGS_ENABLE;
+
+    /* create value for writing to I/O device register */
+    raw_val = *val;
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *val = XEN_PT_MERGE_VALUE(*val, dev_value, throughable_mask);
+
+    /* update MSI */
+    if (raw_val & PCI_MSI_FLAGS_ENABLE) {
+        /* setup MSI pirq for the first time */
+        if (!msi->initialized) {
+            /* Init physical one */
+            XEN_PT_LOG(&s->dev, "setup MSI\n");
+            if (xen_pt_msi_setup(s)) {
+                /* We do not propagate the error to the framework code, so
+                 * that MSI errors are contained in the MSI emulation code and
+                 * QEMU can keep running.
+                 * The guest's MSI will not actually work, though.
+                 */
+                *val &= ~PCI_MSI_FLAGS_ENABLE;
+                XEN_PT_WARN(&s->dev, "Cannot map MSI.\n");
+                return 0;
+            }
+            if (xen_pt_msi_update(s)) {
+                *val &= ~PCI_MSI_FLAGS_ENABLE;
+                XEN_PT_WARN(&s->dev, "Cannot bind MSI.\n");
+                return 0;
+            }
+            msi->initialized = true;
+            msi->mapped = true;
+        }
+        msi->flags |= PCI_MSI_FLAGS_ENABLE;
+    } else {
+        msi->flags &= ~PCI_MSI_FLAGS_ENABLE;
+    }
+
+    /* pass through MSI_ENABLE bit */
+    *val &= ~PCI_MSI_FLAGS_ENABLE;
+    *val |= raw_val & PCI_MSI_FLAGS_ENABLE;
+
+    return 0;
+}
+
+/* initialize Message Upper Address register */
+static int xen_pt_msgaddr64_reg_init(XenPCIPassthroughState *s,
+                                     XenPTRegInfo *reg, uint32_t real_offset,
+                                     uint32_t *data)
+{
+    /* no need to initialize in case of 32 bit type */
+    if (!(s->msi->flags & PCI_MSI_FLAGS_64BIT)) {
+        *data = XEN_PT_INVALID_REG;
+    } else {
+        *data = reg->init_val;
+    }
+
+    return 0;
+}
+/* this function will be called twice (for 32 bit and 64 bit type) */
+/* initialize Message Data register */
+static int xen_pt_msgdata_reg_init(XenPCIPassthroughState *s,
+                                   XenPTRegInfo *reg, uint32_t real_offset,
+                                   uint32_t *data)
+{
+    uint32_t flags = s->msi->flags;
+    uint32_t offset = reg->offset;
+
+    /* check whether the offset matches the 32/64 bit type */
+    if (xen_pt_msgdata_check_type(offset, flags)) {
+        *data = reg->init_val;
+    } else {
+        *data = XEN_PT_INVALID_REG;
+    }
+    return 0;
+}
+
+/* write Message Address register */
+static int xen_pt_msgaddr32_reg_write(XenPCIPassthroughState *s,
+                                      XenPTReg *cfg_entry, uint32_t *val,
+                                      uint32_t dev_value, uint32_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint32_t writable_mask = 0;
+    uint32_t throughable_mask = 0;
+    uint32_t old_addr = cfg_entry->data;
+
+    /* modify emulated register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask;
+    cfg_entry->data = XEN_PT_MERGE_VALUE(*val, cfg_entry->data, writable_mask);
+    s->msi->addr_lo = cfg_entry->data;
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *val = XEN_PT_MERGE_VALUE(*val, dev_value, throughable_mask);
+
+    /* update MSI */
+    if (cfg_entry->data != old_addr) {
+        if (s->msi->mapped) {
+            xen_pt_msi_update(s);
+        }
+    }
+
+    return 0;
+}
+/* write Message Upper Address register */
+static int xen_pt_msgaddr64_reg_write(XenPCIPassthroughState *s,
+                                      XenPTReg *cfg_entry, uint32_t *val,
+                                      uint32_t dev_value, uint32_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint32_t writable_mask = 0;
+    uint32_t throughable_mask = 0;
+    uint32_t old_addr = cfg_entry->data;
+
+    /* check whether the type is 64 bit or not */
+    if (!(s->msi->flags & PCI_MSI_FLAGS_64BIT)) {
+        XEN_PT_ERR(&s->dev,
+                   "Can't write to the upper address without 64 bit support\n");
+        return -1;
+    }
+
+    /* modify emulated register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask;
+    cfg_entry->data = XEN_PT_MERGE_VALUE(*val, cfg_entry->data, writable_mask);
+    /* update the msi_info too */
+    s->msi->addr_hi = cfg_entry->data;
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *val = XEN_PT_MERGE_VALUE(*val, dev_value, throughable_mask);
+
+    /* update MSI */
+    if (cfg_entry->data != old_addr) {
+        if (s->msi->mapped) {
+            xen_pt_msi_update(s);
+        }
+    }
+
+    return 0;
+}
+
+
+/* this function will be called twice (for 32 bit and 64 bit type) */
+/* write Message Data register */
+static int xen_pt_msgdata_reg_write(XenPCIPassthroughState *s,
+                                    XenPTReg *cfg_entry, uint16_t *val,
+                                    uint16_t dev_value, uint16_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    XenPTMSI *msi = s->msi;
+    uint16_t writable_mask = 0;
+    uint16_t throughable_mask = 0;
+    uint16_t old_data = cfg_entry->data;
+    uint32_t offset = reg->offset;
+
+    /* check whether the offset matches the 32/64 bit type */
+    if (!xen_pt_msgdata_check_type(offset, msi->flags)) {
+        /* exit I/O emulator */
+        XEN_PT_ERR(&s->dev, "the offset does not match the 32/64 bit type!\n");
+        return -1;
+    }
+
+    /* modify emulated register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask;
+    cfg_entry->data = XEN_PT_MERGE_VALUE(*val, cfg_entry->data, writable_mask);
+    /* update the msi_info too */
+    msi->data = cfg_entry->data;
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *val = XEN_PT_MERGE_VALUE(*val, dev_value, throughable_mask);
+
+    /* update MSI */
+    if (cfg_entry->data != old_data) {
+        if (msi->mapped) {
+            xen_pt_msi_update(s);
+        }
+    }
+
+    return 0;
+}
+
+/* MSI Capability Structure reg static information table */
+static XenPTRegInfo xen_pt_emu_reg_msi[] = {
+    /* Next Pointer reg */
+    {
+        .offset     = PCI_CAP_LIST_NEXT,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = xen_pt_ptr_reg_init,
+        .u.b.read   = xen_pt_byte_reg_read,
+        .u.b.write  = xen_pt_byte_reg_write,
+    },
+    /* Message Control reg */
+    {
+        .offset     = PCI_MSI_FLAGS,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xFF8E,
+        .emu_mask   = 0x007F,
+        .init       = xen_pt_msgctrl_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_msgctrl_reg_write,
+    },
+    /* Message Address reg */
+    {
+        .offset     = PCI_MSI_ADDRESS_LO,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .ro_mask    = 0x00000003,
+        .emu_mask   = 0xFFFFFFFF,
+        .no_wb      = 1,
+        .init       = xen_pt_common_reg_init,
+        .u.dw.read  = xen_pt_long_reg_read,
+        .u.dw.write = xen_pt_msgaddr32_reg_write,
+    },
+    /* Message Upper Address reg (if PCI_MSI_FLAGS_64BIT set) */
+    {
+        .offset     = PCI_MSI_ADDRESS_HI,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .ro_mask    = 0x00000000,
+        .emu_mask   = 0xFFFFFFFF,
+        .no_wb      = 1,
+        .init       = xen_pt_msgaddr64_reg_init,
+        .u.dw.read  = xen_pt_long_reg_read,
+        .u.dw.write = xen_pt_msgaddr64_reg_write,
+    },
+    /* Message Data reg (16 bits of data for 32-bit devices) */
+    {
+        .offset     = PCI_MSI_DATA_32,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0x0000,
+        .emu_mask   = 0xFFFF,
+        .no_wb      = 1,
+        .init       = xen_pt_msgdata_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_msgdata_reg_write,
+    },
+    /* Message Data reg (16 bits of data for 64-bit devices) */
+    {
+        .offset     = PCI_MSI_DATA_64,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0x0000,
+        .emu_mask   = 0xFFFF,
+        .no_wb      = 1,
+        .init       = xen_pt_msgdata_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_msgdata_reg_write,
+    },
+    {
+        .size = 0,
+    },
+};
+
+
+/**************************************
+ * MSI-X Capability
+ */
+
+/* Message Control register for MSI-X */
+static int xen_pt_msixctrl_reg_init(XenPCIPassthroughState *s,
+                                    XenPTRegInfo *reg, uint32_t real_offset,
+                                    uint32_t *data)
+{
+    PCIDevice *d = &s->dev;
+    uint16_t reg_field = 0;
+
+    /* use I/O device register's value as initial value */
+    reg_field = pci_get_word(d->config + real_offset);
+
+    if (reg_field & PCI_MSIX_FLAGS_ENABLE) {
+        XEN_PT_LOG(d, "MSIX already enabled, disabling it first\n");
+        xen_host_pci_set_word(&s->real_device, real_offset,
+                              reg_field & ~PCI_MSIX_FLAGS_ENABLE);
+    }
+
+    s->msix->ctrl_offset = real_offset;
+
+    *data = reg->init_val;
+    return 0;
+}
+static int xen_pt_msixctrl_reg_write(XenPCIPassthroughState *s,
+                                     XenPTReg *cfg_entry, uint16_t *val,
+                                     uint16_t dev_value, uint16_t valid_mask)
+{
+    XenPTRegInfo *reg = cfg_entry->reg;
+    uint16_t writable_mask = 0;
+    uint16_t throughable_mask = 0;
+    int debug_msix_enabled_old;
+
+    /* modify emulated register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask;
+    cfg_entry->data = XEN_PT_MERGE_VALUE(*val, cfg_entry->data, writable_mask);
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *val = XEN_PT_MERGE_VALUE(*val, dev_value, throughable_mask);
+
+    /* update MSI-X */
+    if ((*val & PCI_MSIX_FLAGS_ENABLE)
+        && !(*val & PCI_MSIX_FLAGS_MASKALL)) {
+        xen_pt_msix_update(s);
+    }
+
+    debug_msix_enabled_old = s->msix->enabled;
+    s->msix->enabled = !!(*val & PCI_MSIX_FLAGS_ENABLE);
+    if (s->msix->enabled != debug_msix_enabled_old) {
+        XEN_PT_LOG(&s->dev, "%s MSI-X\n",
+                   s->msix->enabled ? "enable" : "disable");
+    }
+
+    return 0;
+}
+
+/* MSI-X Capability Structure reg static information table */
+static XenPTRegInfo xen_pt_emu_reg_msix[] = {
+    /* Next Pointer reg */
+    {
+        .offset     = PCI_CAP_LIST_NEXT,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = xen_pt_ptr_reg_init,
+        .u.b.read   = xen_pt_byte_reg_read,
+        .u.b.write  = xen_pt_byte_reg_write,
+    },
+    /* Message Control reg */
+    {
+        .offset     = PCI_MSI_FLAGS,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0x3FFF,
+        .emu_mask   = 0x0000,
+        .init       = xen_pt_msixctrl_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_msixctrl_reg_write,
+    },
+    {
+        .size = 0,
+    },
+};
+
+
 /****************************
  * Capabilities
  */
@@ -1115,6 +1519,49 @@ static int xen_pt_pcie_size_init(XenPCIPassthroughState *s,
     *size = pcie_size;
     return 0;
 }
+/* get MSI Capability Structure register group size */
+static int xen_pt_msi_size_init(XenPCIPassthroughState *s,
+                                const XenPTRegGroupInfo *grp_reg,
+                                uint32_t base_offset, uint8_t *size)
+{
+    PCIDevice *d = &s->dev;
+    uint16_t msg_ctrl = 0;
+    uint8_t msi_size = 0xa;
+
+    msg_ctrl = pci_get_word(d->config + (base_offset + PCI_MSI_FLAGS));
+
+    /* 64-bit address adds 4 bytes; per-vector masking adds 10 more */
+    if (msg_ctrl & PCI_MSI_FLAGS_64BIT) {
+        msi_size += 4;
+    }
+    if (msg_ctrl & PCI_MSI_FLAGS_MASKBIT) {
+        msi_size += 10;
+    }
+
+    s->msi = g_new0(XenPTMSI, 1);
+    s->msi->pirq = XEN_PT_UNASSIGNED_PIRQ;
+
+    *size = msi_size;
+    return 0;
+}
+/* get MSI-X Capability Structure register group size */
+static int xen_pt_msix_size_init(XenPCIPassthroughState *s,
+                                 const XenPTRegGroupInfo *grp_reg,
+                                 uint32_t base_offset, uint8_t *size)
+{
+    int rc = 0;
+
+    rc = xen_pt_msix_init(s, base_offset);
+
+    if (rc < 0) {
+        XEN_PT_ERR(&s->dev, "Internal error: Invalid xen_pt_msix_init.\n");
+        return rc;
+    }
+
+    *size = grp_reg->grp_size;
+    return 0;
+}
+
 
 static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
     /* Header Type0 reg group */
@@ -1155,6 +1602,14 @@ static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
         .grp_size   = 0x04,
         .size_init  = xen_pt_reg_grp_size_init,
     },
+    /* MSI Capability Structure reg group */
+    {
+        .grp_id      = PCI_CAP_ID_MSI,
+        .grp_type    = XEN_PT_GRP_TYPE_EMU,
+        .grp_size    = 0xFF,
+        .size_init   = xen_pt_msi_size_init,
+        .emu_regs = xen_pt_emu_reg_msi,
+    },
     /* PCI-X Capabilities List Item reg group */
     {
         .grp_id     = PCI_CAP_ID_PCIX,
@@ -1199,6 +1654,14 @@ static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
         .size_init   = xen_pt_pcie_size_init,
         .emu_regs = xen_pt_emu_reg_pcie,
     },
+    /* MSI-X Capability Structure reg group */
+    {
+        .grp_id      = PCI_CAP_ID_MSIX,
+        .grp_type    = XEN_PT_GRP_TYPE_EMU,
+        .grp_size    = 0x0C,
+        .size_init   = xen_pt_msix_size_init,
+        .emu_regs = xen_pt_emu_reg_msix,
+    },
     {
         .grp_size = 0,
     },
@@ -1384,6 +1847,14 @@ void xen_pt_config_delete(XenPCIPassthroughState *s)
     struct XenPTRegGroup *reg_group, *next_grp;
     struct XenPTReg *reg, *next_reg;
 
+    /* free MSI/MSI-X info table */
+    if (s->msix) {
+        xen_pt_msix_delete(s);
+    }
+    if (s->msi) {
+        g_free(s->msi);
+    }
+
     /* free all register group entry */
     QLIST_FOREACH_SAFE(reg_group, &s->reg_grps, entries, next_grp) {
         /* free all register entry */
diff --git a/hw/xen_pt_msi.c b/hw/xen_pt_msi.c
new file mode 100644
index 0000000..2299cc7
--- /dev/null
+++ b/hw/xen_pt_msi.c
@@ -0,0 +1,620 @@
+/*
+ * Copyright (c) 2007, Intel Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Jiang Yunhong <yunhong.jiang@intel.com>
+ *
+ * This file implements direct PCI assignment to a HVM guest
+ */
+
+#include <sys/mman.h>
+
+#include "xen_backend.h"
+#include "xen_pt.h"
+#include "apic-msidef.h"
+
+
+#define XEN_PT_AUTO_ASSIGN -1
+
+/* shift count for gflags */
+#define XEN_PT_GFLAGS_SHIFT_DEST_ID        0
+#define XEN_PT_GFLAGS_SHIFT_RH             8
+#define XEN_PT_GFLAGS_SHIFT_DM             9
+#define XEN_PT_GFLAGSSHIFT_DELIV_MODE     12
+#define XEN_PT_GFLAGSSHIFT_TRG_MODE       15
+
+
+/*
+ * Helpers
+ */
+
+static inline uint8_t msi_vector(uint32_t data)
+{
+    return (data & MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
+}
+
+static inline uint8_t msi_dest_id(uint32_t addr)
+{
+    return (addr & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
+}
+
+static inline uint32_t msi_ext_dest_id(uint32_t addr_hi)
+{
+    return addr_hi & 0xffffff00;
+}
+
+static uint32_t msi_gflags(uint32_t data, uint64_t addr)
+{
+    uint32_t result = 0;
+    int rh, dm, dest_id, deliv_mode, trig_mode;
+
+    rh = (addr >> MSI_ADDR_REDIRECTION_SHIFT) & 0x1;
+    dm = (addr >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1;
+    dest_id = msi_dest_id(addr);
+    deliv_mode = (data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x7;
+    trig_mode = (data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
+
+    result = dest_id | (rh << XEN_PT_GFLAGS_SHIFT_RH)
+        | (dm << XEN_PT_GFLAGS_SHIFT_DM)
+        | (deliv_mode << XEN_PT_GFLAGSSHIFT_DELIV_MODE)
+        | (trig_mode << XEN_PT_GFLAGSSHIFT_TRG_MODE);
+
+    return result;
+}
+
+static inline uint64_t msi_addr64(XenPTMSI *msi)
+{
+    return (uint64_t)msi->addr_hi << 32 | msi->addr_lo;
+}
+
+static int msi_msix_enable(XenPCIPassthroughState *s,
+                           uint32_t address,
+                           uint16_t flag,
+                           bool enable)
+{
+    uint16_t val = 0;
+
+    if (!address) {
+        return -1;
+    }
+
+    xen_host_pci_get_word(&s->real_device, address, &val);
+    if (enable) {
+        val |= flag;
+    } else {
+        val &= ~flag;
+    }
+    xen_host_pci_set_word(&s->real_device, address, val);
+    return 0;
+}
+
+static int msi_msix_setup(XenPCIPassthroughState *s,
+                          uint64_t addr,
+                          uint32_t data,
+                          int *ppirq,
+                          bool is_msix,
+                          int msix_entry,
+                          bool is_not_mapped)
+{
+    uint8_t gvec = msi_vector(data);
+    int rc = 0;
+
+    assert((!is_msix && msix_entry == 0) || is_msix);
+
+    if (gvec == 0) {
+        /* if gvec is 0, the guest is asking for a particular pirq that
+         * is passed as dest_id */
+        *ppirq = msi_ext_dest_id(addr >> 32) | msi_dest_id(addr);
+        if (!*ppirq) {
+            /* this probably indicates a misconfiguration of the guest,
+             * try the emulated path */
+            *ppirq = XEN_PT_UNASSIGNED_PIRQ;
+        } else {
+            XEN_PT_LOG(&s->dev, "requested pirq %d for MSI%s"
+                       " (vec: %#x, entry: %#x)\n",
+                       *ppirq, is_msix ? "-X" : "", gvec, msix_entry);
+        }
+    }
+
+    if (is_not_mapped) {
+        uint64_t table_base = 0;
+
+        if (is_msix) {
+            table_base = s->msix->table_base;
+        }
+
+        rc = xc_physdev_map_pirq_msi(xen_xc, xen_domid, XEN_PT_AUTO_ASSIGN,
+                                     ppirq, PCI_DEVFN(s->real_device.dev,
+                                                      s->real_device.func),
+                                     s->real_device.bus,
+                                     msix_entry, table_base);
+        if (rc) {
+            XEN_PT_ERR(&s->dev,
+                       "Mapping of MSI%s (rc: %i, vec: %#x, entry %#x)\n",
+                       is_msix ? "-X" : "", rc, gvec, msix_entry);
+            return rc;
+        }
+    }
+
+    return 0;
+}
+static int msi_msix_update(XenPCIPassthroughState *s,
+                           uint64_t addr,
+                           uint32_t data,
+                           int pirq,
+                           bool is_msix,
+                           int msix_entry,
+                           int *old_pirq)
+{
+    PCIDevice *d = &s->dev;
+    uint8_t gvec = msi_vector(data);
+    uint32_t gflags = msi_gflags(data, addr);
+    int rc = 0;
+    uint64_t table_addr = 0;
+
+    XEN_PT_LOG(d, "Updating MSI%s with pirq %d gvec %#x gflags %#x"
+               " (entry: %#x)\n",
+               is_msix ? "-X" : "", pirq, gvec, gflags, msix_entry);
+
+    if (is_msix) {
+        table_addr = s->msix->mmio_base_addr;
+    }
+
+    rc = xc_domain_update_msi_irq(xen_xc, xen_domid, gvec,
+                                  pirq, gflags, table_addr);
+
+    if (rc) {
+        XEN_PT_ERR(d, "Updating of MSI%s failed. (rc: %d)\n",
+                   is_msix ? "-X" : "", rc);
+
+        if (xc_physdev_unmap_pirq(xen_xc, xen_domid, *old_pirq)) {
+            XEN_PT_ERR(d, "Unmapping of MSI%s pirq %d failed.\n",
+                       is_msix ? "-X" : "", *old_pirq);
+        }
+        *old_pirq = XEN_PT_UNASSIGNED_PIRQ;
+    }
+    return rc;
+}
+
+static int msi_msix_disable(XenPCIPassthroughState *s,
+                            uint64_t addr,
+                            uint32_t data,
+                            int pirq,
+                            bool is_msix,
+                            bool is_binded)
+{
+    PCIDevice *d = &s->dev;
+    uint8_t gvec = msi_vector(data);
+    uint32_t gflags = msi_gflags(data, addr);
+    int rc = 0;
+
+    if (pirq == XEN_PT_UNASSIGNED_PIRQ) {
+        return 0;
+    }
+
+    if (is_binded) {
+        XEN_PT_LOG(d, "Unbind MSI%s with pirq %d, gvec %#x\n",
+                   is_msix ? "-X" : "", pirq, gvec);
+        rc = xc_domain_unbind_msi_irq(xen_xc, xen_domid, gvec, pirq, gflags);
+        if (rc) {
+            XEN_PT_ERR(d, "Unbinding of MSI%s failed. (pirq: %d, gvec: %#x)\n",
+                       is_msix ? "-X" : "", pirq, gvec);
+            return rc;
+        }
+    }
+
+    XEN_PT_LOG(d, "Unmap MSI%s pirq %d\n", is_msix ? "-X" : "", pirq);
+    rc = xc_physdev_unmap_pirq(xen_xc, xen_domid, pirq);
+    if (rc) {
+        XEN_PT_ERR(d, "Unmapping of MSI%s pirq %d failed. (rc: %i)\n",
+                   is_msix ? "-X" : "", pirq, rc);
+        return rc;
+    }
+
+    return 0;
+}
+
+/*
+ * MSI virtualization functions
+ */
+
+int xen_pt_msi_set_enable(XenPCIPassthroughState *s, bool enable)
+{
+    XEN_PT_LOG(&s->dev, "%s MSI.\n", enable ? "enabling" : "disabling");
+
+    if (!s->msi) {
+        return -1;
+    }
+
+    return msi_msix_enable(s, s->msi->ctrl_offset, PCI_MSI_FLAGS_ENABLE,
+                           enable);
+}
+
+/* setup physical msi, but don't enable it */
+int xen_pt_msi_setup(XenPCIPassthroughState *s)
+{
+    int pirq = XEN_PT_UNASSIGNED_PIRQ;
+    int rc = 0;
+    XenPTMSI *msi = s->msi;
+
+    if (msi->initialized) {
+        XEN_PT_ERR(&s->dev,
+                   "Setup physical MSI when it has already been initialized.\n");
+        return -1;
+    }
+
+    rc = msi_msix_setup(s, msi_addr64(msi), msi->data, &pirq, false, 0, true);
+    if (rc) {
+        return rc;
+    }
+
+    if (pirq < 0) {
+        XEN_PT_ERR(&s->dev, "Invalid pirq number: %d.\n", pirq);
+        return -1;
+    }
+
+    msi->pirq = pirq;
+    XEN_PT_LOG(&s->dev, "MSI mapped with pirq %d.\n", pirq);
+
+    return 0;
+}
+
+int xen_pt_msi_update(XenPCIPassthroughState *s)
+{
+    XenPTMSI *msi = s->msi;
+    return msi_msix_update(s, msi_addr64(msi), msi->data, msi->pirq,
+                           false, 0, &msi->pirq);
+}
+
+void xen_pt_msi_disable(XenPCIPassthroughState *s)
+{
+    XenPTMSI *msi = s->msi;
+
+    if (!msi) {
+        return;
+    }
+
+    xen_pt_msi_set_enable(s, false);
+
+    msi_msix_disable(s, msi_addr64(msi), msi->data, msi->pirq, false,
+                     msi->initialized);
+
+    /* clear msi info */
+    msi->flags = 0;
+    msi->mapped = false;
+    msi->pirq = XEN_PT_UNASSIGNED_PIRQ;
+}
+
+/*
+ * MSI-X virtualization functions
+ */
+
+static int msix_set_enable(XenPCIPassthroughState *s, bool enabled)
+{
+    XEN_PT_LOG(&s->dev, "%s MSI-X.\n", enabled ? "enabling" : "disabling");
+
+    if (!s->msix) {
+        return -1;
+    }
+
+    return msi_msix_enable(s, s->msix->ctrl_offset, PCI_MSIX_FLAGS_ENABLE,
+                           enabled);
+}
+
+static int xen_pt_msix_update_one(XenPCIPassthroughState *s, int entry_nr)
+{
+    XenPTMSIXEntry *entry = NULL;
+    int pirq;
+    int rc;
+
+    if (entry_nr < 0 || entry_nr >= s->msix->total_entries) {
+        return -EINVAL;
+    }
+
+    entry = &s->msix->msix_entry[entry_nr];
+
+    if (!entry->updated) {
+        return 0;
+    }
+
+    pirq = entry->pirq;
+
+    rc = msi_msix_setup(s, entry->addr, entry->data, &pirq, true, entry_nr,
+                        entry->pirq == XEN_PT_UNASSIGNED_PIRQ);
+    if (rc) {
+        return rc;
+    }
+    if (entry->pirq == XEN_PT_UNASSIGNED_PIRQ) {
+        entry->pirq = pirq;
+    }
+
+    rc = msi_msix_update(s, entry->addr, entry->data, pirq, true,
+                         entry_nr, &entry->pirq);
+
+    if (!rc) {
+        entry->updated = false;
+    }
+
+    return rc;
+}
+
+int xen_pt_msix_update(XenPCIPassthroughState *s)
+{
+    XenPTMSIX *msix = s->msix;
+    int i;
+
+    for (i = 0; i < msix->total_entries; i++) {
+        xen_pt_msix_update_one(s, i);
+    }
+
+    return 0;
+}
+
+void xen_pt_msix_disable(XenPCIPassthroughState *s)
+{
+    int i = 0;
+
+    msix_set_enable(s, false);
+
+    for (i = 0; i < s->msix->total_entries; i++) {
+        XenPTMSIXEntry *entry = &s->msix->msix_entry[i];
+
+        msi_msix_disable(s, entry->addr, entry->data, entry->pirq, true, true);
+
+        /* clear MSI-X info */
+        entry->pirq = XEN_PT_UNASSIGNED_PIRQ;
+        entry->updated = false;
+    }
+}
+
+int xen_pt_msix_update_remap(XenPCIPassthroughState *s, int bar_index)
+{
+    XenPTMSIXEntry *entry;
+    int i, ret;
+
+    if (!(s->msix && s->msix->bar_index == bar_index)) {
+        return 0;
+    }
+
+    for (i = 0; i < s->msix->total_entries; i++) {
+        entry = &s->msix->msix_entry[i];
+        if (entry->pirq != XEN_PT_UNASSIGNED_PIRQ) {
+            ret = xc_domain_unbind_pt_irq(xen_xc, xen_domid, entry->pirq,
+                                          PT_IRQ_TYPE_MSI, 0, 0, 0, 0);
+            if (ret) {
+                XEN_PT_ERR(&s->dev, "unbind MSI-X pirq %d failed\n",
+                           entry->pirq);
+            }
+            entry->updated = true;
+        }
+    }
+    return xen_pt_msix_update(s);
+}
+
+static uint32_t get_entry_value(XenPTMSIXEntry *e, int offset)
+{
+    switch (offset) {
+    case PCI_MSIX_ENTRY_LOWER_ADDR:
+        return e->addr & UINT32_MAX;
+    case PCI_MSIX_ENTRY_UPPER_ADDR:
+        return e->addr >> 32;
+    case PCI_MSIX_ENTRY_DATA:
+        return e->data;
+    case PCI_MSIX_ENTRY_VECTOR_CTRL:
+        return e->vector_ctrl;
+    default:
+        return 0;
+    }
+}
+
+static void set_entry_value(XenPTMSIXEntry *e, int offset, uint32_t val)
+{
+    switch (offset) {
+    case PCI_MSIX_ENTRY_LOWER_ADDR:
+        e->addr = (e->addr & ((uint64_t)UINT32_MAX << 32)) | val;
+        break;
+    case PCI_MSIX_ENTRY_UPPER_ADDR:
+        e->addr = (uint64_t)val << 32 | (e->addr & UINT32_MAX);
+        break;
+    case PCI_MSIX_ENTRY_DATA:
+        e->data = val;
+        break;
+    case PCI_MSIX_ENTRY_VECTOR_CTRL:
+        e->vector_ctrl = val;
+        break;
+    }
+}
+
+static void pci_msix_write(void *opaque, target_phys_addr_t addr,
+                           uint64_t val, unsigned size)
+{
+    XenPCIPassthroughState *s = opaque;
+    XenPTMSIX *msix = s->msix;
+    XenPTMSIXEntry *entry;
+    int entry_nr, offset;
+
+    entry_nr = addr / PCI_MSIX_ENTRY_SIZE;
+    if (entry_nr < 0 || entry_nr >= msix->total_entries) {
+        XEN_PT_ERR(&s->dev, "asked MSI-X entry '%i' invalid!\n", entry_nr);
+        return;
+    }
+    entry = &msix->msix_entry[entry_nr];
+    offset = addr % PCI_MSIX_ENTRY_SIZE;
+
+    if (offset != PCI_MSIX_ENTRY_VECTOR_CTRL) {
+        const volatile uint32_t *vec_ctrl;
+
+        if (get_entry_value(entry, offset) == val) {
+            return;
+        }
+
+        /*
+         * If Xen intercepts the mask bit access, entry->vector_ctrl may not be
+         * up-to-date. Read from hardware directly.
+         */
+        vec_ctrl = s->msix->phys_iomem_base + entry_nr * PCI_MSIX_ENTRY_SIZE
+            + PCI_MSIX_ENTRY_VECTOR_CTRL;
+
+        if (msix->enabled && !(*vec_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT)) {
+            XEN_PT_ERR(&s->dev, "Can't update msix entry %d since MSI-X is"
+                       " already enabled.\n", entry_nr);
+            return;
+        }
+
+        entry->updated = true;
+    }
+
+    set_entry_value(entry, offset, val);
+
+    if (offset == PCI_MSIX_ENTRY_VECTOR_CTRL) {
+        if (msix->enabled && !(val & PCI_MSIX_ENTRY_CTRL_MASKBIT)) {
+            xen_pt_msix_update_one(s, entry_nr);
+        }
+    }
+}
+
+static uint64_t pci_msix_read(void *opaque, target_phys_addr_t addr,
+                              unsigned size)
+{
+    XenPCIPassthroughState *s = opaque;
+    XenPTMSIX *msix = s->msix;
+    int entry_nr, offset;
+
+    entry_nr = addr / PCI_MSIX_ENTRY_SIZE;
+    if (entry_nr < 0) {
+        XEN_PT_ERR(&s->dev, "asked MSI-X entry '%i' invalid!\n", entry_nr);
+        return 0;
+    }
+
+    offset = addr % PCI_MSIX_ENTRY_SIZE;
+
+    if (addr < msix->total_entries * PCI_MSIX_ENTRY_SIZE) {
+        return get_entry_value(&msix->msix_entry[entry_nr], offset);
+    } else {
+        /* Pending Bit Array (PBA) */
+        return *(uint32_t *)(msix->phys_iomem_base + addr);
+    }
+}
+
+static const MemoryRegionOps pci_msix_ops = {
+    .read = pci_msix_read,
+    .write = pci_msix_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+    .valid = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+        .unaligned = false,
+    },
+};
+
+int xen_pt_msix_init(XenPCIPassthroughState *s, uint32_t base)
+{
+    uint8_t id = 0;
+    uint16_t control = 0;
+    uint32_t table_off = 0;
+    int i, total_entries, bar_index;
+    XenHostPCIDevice *hd = &s->real_device;
+    PCIDevice *d = &s->dev;
+    int fd = -1;
+    XenPTMSIX *msix = NULL;
+    int rc = 0;
+
+    rc = xen_host_pci_get_byte(hd, base + PCI_CAP_LIST_ID, &id);
+    if (rc) {
+        return rc;
+    }
+
+    if (id != PCI_CAP_ID_MSIX) {
+        XEN_PT_ERR(d, "Invalid id %#x base %#x\n", id, base);
+        return -1;
+    }
+
+    xen_host_pci_get_word(hd, base + PCI_MSIX_FLAGS, &control);
+    total_entries = control & PCI_MSIX_FLAGS_QSIZE;
+    total_entries += 1;
+
+    s->msix = g_malloc0(sizeof (XenPTMSIX)
+                        + total_entries * sizeof (XenPTMSIXEntry));
+    msix = s->msix;
+
+    msix->total_entries = total_entries;
+    for (i = 0; i < total_entries; i++) {
+        msix->msix_entry[i].pirq = XEN_PT_UNASSIGNED_PIRQ;
+    }
+
+    memory_region_init_io(&msix->mmio, &pci_msix_ops, s, "xen-pci-pt-msix",
+                          (total_entries * PCI_MSIX_ENTRY_SIZE
+                           + XC_PAGE_SIZE - 1)
+                          & XC_PAGE_MASK);
+
+    xen_host_pci_get_long(hd, base + PCI_MSIX_TABLE, &table_off);
+    bar_index = msix->bar_index = table_off & PCI_MSIX_FLAGS_BIRMASK;
+    table_off = table_off & ~PCI_MSIX_FLAGS_BIRMASK;
+    msix->table_base = s->real_device.io_regions[bar_index].base_addr;
+    XEN_PT_LOG(d, "get MSI-X table BAR base 0x%"PRIx64"\n", msix->table_base);
+
+    fd = open("/dev/mem", O_RDWR);
+    if (fd == -1) {
+        rc = -errno;
+        XEN_PT_ERR(d, "Can't open /dev/mem: %s\n", strerror(errno));
+        goto error_out;
+    }
+    XEN_PT_LOG(d, "table_off = %#x, total_entries = %d\n",
+               table_off, total_entries);
+    msix->table_offset_adjust = table_off & 0x0fff;
+    msix->phys_iomem_base =
+        mmap(NULL,
+             total_entries * PCI_MSIX_ENTRY_SIZE + msix->table_offset_adjust,
+             PROT_READ,
+             MAP_SHARED | MAP_LOCKED,
+             fd,
+             msix->table_base + table_off - msix->table_offset_adjust);
+    close(fd);
+    if (msix->phys_iomem_base == MAP_FAILED) {
+        rc = -errno;
+        XEN_PT_ERR(d, "Can't map physical MSI-X table: %s\n", strerror(errno));
+        goto error_out;
+    }
+    msix->phys_iomem_base = (char *)msix->phys_iomem_base
+        + msix->table_offset_adjust;
+
+    XEN_PT_LOG(d, "mapping physical MSI-X table to %p\n",
+               msix->phys_iomem_base);
+
+    memory_region_add_subregion_overlap(&s->bar[bar_index], table_off,
+                                        &msix->mmio,
+                                        2); /* Priority: pci default + 1 */
+
+    return 0;
+
+error_out:
+    memory_region_destroy(&msix->mmio);
+    g_free(s->msix);
+    s->msix = NULL;
+    return rc;
+}
+
+void xen_pt_msix_delete(XenPCIPassthroughState *s)
+{
+    XenPTMSIX *msix = s->msix;
+
+    if (!msix) {
+        return;
+    }
+
+    /* unmap the MSI-X memory mapped register area */
+    if (msix->phys_iomem_base) {
+        XEN_PT_LOG(&s->dev, "unmapping physical MSI-X table from %p\n",
+                   msix->phys_iomem_base);
+        munmap(msix->phys_iomem_base, msix->total_entries * PCI_MSIX_ENTRY_SIZE
+               + msix->table_offset_adjust);
+    }
+
+    memory_region_del_subregion(&s->bar[msix->bar_index], &msix->mmio);
+    memory_region_destroy(&msix->mmio);
+
+    g_free(s->msix);
+    s->msix = NULL;
+}
-- 
Anthony PERARD

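A few of the bit manipulations in this patch can be checked in isolation. The standalone sketch below mirrors msi_gflags(), the MSI capability sizing in xen_pt_msi_size_init(), and the way the *_reg_write handlers split a guest write into an emulated part and a part passed through to the device. It is not part of the patch: the SK_*/sketch_* names are hypothetical, the address/data bit positions are the standard x86 MSI layout, and it assumes XEN_PT_MERGE_VALUE(v, d, m) expands to (v & m) | (d & ~m).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; x86 MSI address/data bit positions. */
#define SK_ADDR_DEST_MODE_SHIFT   2   /* address bit 2: destination mode */
#define SK_ADDR_RH_SHIFT          3   /* address bit 3: redirection hint */
#define SK_ADDR_DEST_ID_SHIFT    12   /* address bits 19:12: destination ID */
#define SK_DATA_DELIV_MODE_SHIFT  8   /* data bits 10:8: delivery mode */
#define SK_DATA_TRIGGER_SHIFT    15   /* data bit 15: trigger mode */

/* gflags layout, same values as the XEN_PT_GFLAGS* shifts in the patch */
#define SK_GFLAGS_SHIFT_RH          8
#define SK_GFLAGS_SHIFT_DM          9
#define SK_GFLAGS_SHIFT_DELIV_MODE 12
#define SK_GFLAGS_SHIFT_TRG_MODE   15

/* Repack the guest's MSI address/data into a gflags word (cf. msi_gflags()). */
static uint32_t sketch_msi_gflags(uint32_t data, uint64_t addr)
{
    uint32_t dest_id = (uint32_t)(addr >> SK_ADDR_DEST_ID_SHIFT) & 0xff;
    uint32_t rh = (uint32_t)(addr >> SK_ADDR_RH_SHIFT) & 0x1;
    uint32_t dm = (uint32_t)(addr >> SK_ADDR_DEST_MODE_SHIFT) & 0x1;
    uint32_t deliv = (data >> SK_DATA_DELIV_MODE_SHIFT) & 0x7;
    uint32_t trig = (data >> SK_DATA_TRIGGER_SHIFT) & 0x1;

    return dest_id | (rh << SK_GFLAGS_SHIFT_RH) | (dm << SK_GFLAGS_SHIFT_DM)
        | (deliv << SK_GFLAGS_SHIFT_DELIV_MODE)
        | (trig << SK_GFLAGS_SHIFT_TRG_MODE);
}

/* MSI capability length for a Message Control value
 * (cf. xen_pt_msi_size_init()). 0x0080 is PCI_MSI_FLAGS_64BIT and
 * 0x0100 is PCI_MSI_FLAGS_MASKBIT. */
static uint8_t sketch_msi_cap_size(uint16_t msg_ctrl)
{
    uint8_t size = 0xa;       /* ID, next, control, address low, data */

    if (msg_ctrl & 0x0080) {
        size += 4;            /* upper address dword */
    }
    if (msg_ctrl & 0x0100) {
        size += 10;           /* reserved word, mask bits, pending bits */
    }
    return size;
}

/* Assumed equivalent of XEN_PT_MERGE_VALUE: bits under mask come from value. */
static uint16_t sketch_merge(uint16_t value, uint16_t data, uint16_t mask)
{
    return (uint16_t)((value & mask) | (data & ~mask));
}

/* Update the emulated copy with writable bits, then build the value for the
 * real device from the guest's pass-through bits and the device's own bits. */
static void sketch_split_write(uint16_t *emu_data, uint16_t *val,
                               uint16_t dev_value, uint16_t emu_mask,
                               uint16_t ro_mask, uint16_t valid_mask)
{
    uint16_t writable = (uint16_t)(emu_mask & ~ro_mask & valid_mask);
    uint16_t throughable = (uint16_t)(~emu_mask & valid_mask);

    assert(emu_data != NULL && val != NULL);
    *emu_data = sketch_merge(*val, *emu_data, writable);
    *val = sketch_merge(*val, dev_value, throughable);
}
```

With these definitions, data 0x8131 and address 0xfee01008 (destination ID 1, redirection hint set, delivery mode 1, level-triggered) pack into gflags 0x9101. For the MSI-X Message Control entry (emu_mask 0x0000), nothing is emulated, so a guest write reaches the device unchanged.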
^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH V13 1/9] pci_ids: Add INTEL_82599_SFP_VF id.
  2012-06-14 17:01   ` Anthony PERARD
  (?)
  (?)
@ 2012-06-14 19:37   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 37+ messages in thread
From: Michael S. Tsirkin @ 2012-06-14 19:37 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: Jan Kiszka, Anthony Liguori, Stefano Stabellini, QEMU-devel, Xen Devel

On Thu, Jun 14, 2012 at 06:01:41PM +0100, Anthony PERARD wrote:
> We are using this in our quirk lookup provided by patch
> titled: Introduce Xen PCI Passthrough, PCI config space helpers.
> 
> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  hw/pci_ids.h |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/pci_ids.h b/hw/pci_ids.h
> index e8235a7..649e6b3 100644
> --- a/hw/pci_ids.h
> +++ b/hw/pci_ids.h
> @@ -118,6 +118,7 @@
>  #define PCI_DEVICE_ID_INTEL_82801I_UHCI6 0x2939
>  #define PCI_DEVICE_ID_INTEL_82801I_EHCI1 0x293a
>  #define PCI_DEVICE_ID_INTEL_82801I_EHCI2 0x293c
> +#define PCI_DEVICE_ID_INTEL_82599_SFP_VF 0x10ed
>  
>  #define PCI_VENDOR_ID_XEN               0x5853
>  #define PCI_DEVICE_ID_XEN_PLATFORM      0x0001
> -- 
> Anthony PERARD

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH V13 4/9] pci.c: Add opaque argument to pci_for_each_device.
  2012-06-14 17:01 ` [Qemu-devel] [PATCH V13 4/9] pci.c: Add opaque argument to pci_for_each_device Anthony PERARD
  2012-06-14 19:38   ` Michael S. Tsirkin
@ 2012-06-14 19:38   ` Michael S. Tsirkin
  2012-06-14 19:54   ` [Qemu-devel] [Xen-devel] " Konrad Rzeszutek Wilk
  2012-06-14 19:54   ` Konrad Rzeszutek Wilk
  3 siblings, 0 replies; 37+ messages in thread
From: Michael S. Tsirkin @ 2012-06-14 19:38 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: Jan Kiszka, Anthony Liguori, Stefano Stabellini, QEMU-devel, Xen Devel

On Thu, Jun 14, 2012 at 06:01:44PM +0100, Anthony PERARD wrote:
> The purpose is to have a more generic pci_for_each_device by passing an extra
> argument to the function called on every device.
> 
> This will be used by a later patch in this series.
> 
> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  hw/pci.c          |   11 +++++++----
>  hw/pci.h          |    4 +++-
>  hw/xen_platform.c |    8 ++++----
>  3 files changed, 14 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/pci.c b/hw/pci.c
> index 127b7ac..d6537e3 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -1127,7 +1127,9 @@ static const pci_class_desc pci_class_descriptions[] =
>  };
>  
>  static void pci_for_each_device_under_bus(PCIBus *bus,
> -                                          void (*fn)(PCIBus *b, PCIDevice *d))
> +                                          void (*fn)(PCIBus *b, PCIDevice *d,
> +                                                     void *opaque),
> +                                          void *opaque)
>  {
>      PCIDevice *d;
>      int devfn;
> @@ -1135,18 +1137,19 @@ static void pci_for_each_device_under_bus(PCIBus *bus,
>      for(devfn = 0; devfn < ARRAY_SIZE(bus->devices); devfn++) {
>          d = bus->devices[devfn];
>          if (d) {
> -            fn(bus, d);
> +            fn(bus, d, opaque);
>          }
>      }
>  }
>  
>  void pci_for_each_device(PCIBus *bus, int bus_num,
> -                         void (*fn)(PCIBus *b, PCIDevice *d))
> +                         void (*fn)(PCIBus *b, PCIDevice *d, void *opaque),
> +                         void *opaque)
>  {
>      bus = pci_find_bus_nr(bus, bus_num);
>  
>      if (bus) {
> -        pci_for_each_device_under_bus(bus, fn);
> +        pci_for_each_device_under_bus(bus, fn, opaque);
>      }
>  }
>  
> diff --git a/hw/pci.h b/hw/pci.h
> index 7f223c0..95b608c 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -312,7 +312,9 @@ PCIDevice *pci_nic_init(NICInfo *nd, const char *default_model,
>  PCIDevice *pci_nic_init_nofail(NICInfo *nd, const char *default_model,
>                                 const char *default_devaddr);
>  int pci_bus_num(PCIBus *s);
> -void pci_for_each_device(PCIBus *bus, int bus_num, void (*fn)(PCIBus *bus, PCIDevice *d));
> +void pci_for_each_device(PCIBus *bus, int bus_num,
> +                         void (*fn)(PCIBus *bus, PCIDevice *d, void *opaque),
> +                         void *opaque);
>  PCIBus *pci_find_root_bus(int domain);
>  int pci_find_domain(const PCIBus *bus);
>  PCIDevice *pci_find_device(PCIBus *bus, int bus_num, uint8_t devfn);
> diff --git a/hw/xen_platform.c b/hw/xen_platform.c
> index 0214f37..c1fe984 100644
> --- a/hw/xen_platform.c
> +++ b/hw/xen_platform.c
> @@ -83,7 +83,7 @@ static void log_writeb(PCIXenPlatformState *s, char val)
>  #define UNPLUG_ALL_NICS 2
>  #define UNPLUG_AUX_IDE_DISKS 4
>  
> -static void unplug_nic(PCIBus *b, PCIDevice *d)
> +static void unplug_nic(PCIBus *b, PCIDevice *d, void *o)
>  {
>      if (pci_get_word(d->config + PCI_CLASS_DEVICE) ==
>              PCI_CLASS_NETWORK_ETHERNET) {
> @@ -96,10 +96,10 @@ static void unplug_nic(PCIBus *b, PCIDevice *d)
>  
>  static void pci_unplug_nics(PCIBus *bus)
>  {
> -    pci_for_each_device(bus, 0, unplug_nic);
> +    pci_for_each_device(bus, 0, unplug_nic, NULL);
>  }
>  
> -static void unplug_disks(PCIBus *b, PCIDevice *d)
> +static void unplug_disks(PCIBus *b, PCIDevice *d, void *o)
>  {
>      if (pci_get_word(d->config + PCI_CLASS_DEVICE) ==
>              PCI_CLASS_STORAGE_IDE) {
> @@ -109,7 +109,7 @@ static void unplug_disks(PCIBus *b, PCIDevice *d)
>  
>  static void pci_unplug_disks(PCIBus *bus)
>  {
> -    pci_for_each_device(bus, 0, unplug_disks);
> +    pci_for_each_device(bus, 0, unplug_disks, NULL);
>  }
>  
>  static void platform_fixed_ioport_writew(void *opaque, uint32_t addr, uint32_t val)
> -- 
> Anthony PERARD
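
The extra opaque argument lets a callback keep its state in a struct owned by the caller instead of in a global. Below is a minimal, self-contained sketch of that pattern. MiniPCIBus, MiniPCIDevice, mini_for_each_device() and count_devices_of_class() are hypothetical, heavily simplified stand-ins, not QEMU's real types or helpers. The class codes are the standard PCI values that the unplug_nic() and unplug_disks() callbacks compare against.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for QEMU's PCIBus/PCIDevice. */
#define MINI_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define PCI_CLASS_NETWORK_ETHERNET 0x0200
#define PCI_CLASS_STORAGE_IDE      0x0101

typedef struct MiniPCIDevice {
    unsigned short class_id;   /* stands in for the PCI_CLASS_DEVICE config word */
} MiniPCIDevice;

typedef struct MiniPCIBus {
    MiniPCIDevice *devices[256];   /* indexed by devfn; NULL means empty */
} MiniPCIBus;

/* Callback type with the extra opaque argument, as in the patch. */
typedef void (*mini_pci_fn)(MiniPCIBus *b, MiniPCIDevice *d, void *opaque);

/* Same loop shape as the patched pci_for_each_device_under_bus(). */
static void mini_for_each_device(MiniPCIBus *bus, mini_pci_fn fn, void *opaque)
{
    size_t devfn;

    assert(bus != NULL && fn != NULL);
    for (devfn = 0; devfn < MINI_ARRAY_SIZE(bus->devices); devfn++) {
        MiniPCIDevice *d = bus->devices[devfn];
        if (d) {
            fn(bus, d, opaque);   /* opaque is forwarded untouched */
        }
    }
}

/* Per-walk state, owned by the caller and reached through opaque. */
typedef struct ClassCount {
    unsigned short class_id;
    int count;
} ClassCount;

static void count_class(MiniPCIBus *b, MiniPCIDevice *d, void *opaque)
{
    ClassCount *cc = opaque;

    (void)b;   /* the bus argument is not needed here */
    if (d->class_id == cc->class_id) {
        cc->count++;
    }
}

static int count_devices_of_class(MiniPCIBus *bus, unsigned short class_id)
{
    ClassCount cc = { class_id, 0 };

    mini_for_each_device(bus, count_class, &cc);
    return cc.count;
}
```

With two Ethernet devices and one IDE controller on the bus, count_devices_of_class() returns 2 for PCI_CLASS_NETWORK_ETHERNET. Callers that need no state, such as pci_unplug_nics() and pci_unplug_disks(), simply pass NULL.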

* Re: [Qemu-devel] [PATCH V13 5/9] qdev-properties: Introduce pci-host-devaddr.
  2012-06-14 17:01 ` [Qemu-devel] " Anthony PERARD
  2012-06-14 19:39   ` Michael S. Tsirkin
@ 2012-06-14 19:39   ` Michael S. Tsirkin
  1 sibling, 0 replies; 37+ messages in thread
From: Michael S. Tsirkin @ 2012-06-14 19:39 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: Jan Kiszka, Anthony Liguori, Stefano Stabellini, QEMU-devel, Xen Devel

On Thu, Jun 14, 2012 at 06:01:45PM +0100, Anthony PERARD wrote:
> This new property will be used to specify a host PCI device address.
> 
> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  hw/qdev-properties.c |  107 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/qdev.h            |    3 +
>  qemu-common.h        |    7 +++
>  3 files changed, 117 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c
> index 9ae3187..43e1964 100644
> --- a/hw/qdev-properties.c
> +++ b/hw/qdev-properties.c
> @@ -899,6 +899,113 @@ PropertyInfo qdev_prop_blocksize = {
>      .set   = set_blocksize,
>  };
>  
> +/* --- pci host address --- */
> +
> +static void get_pci_host_devaddr(Object *obj, Visitor *v, void *opaque,
> +                                 const char *name, Error **errp)
> +{
> +    DeviceState *dev = DEVICE(obj);
> +    Property *prop = opaque;
> +    PCIHostDeviceAddress *addr = qdev_get_prop_ptr(dev, prop);
> +    char buffer[] = "xxxx:xx:xx.x";
> +    char *p = buffer;
> +    int rc = 0;
> +
> +    rc = snprintf(buffer, sizeof(buffer), "%04x:%02x:%02x.%d",
> +                  addr->domain, addr->bus, addr->slot, addr->function);
> +    assert(rc == sizeof(buffer) - 1);
> +
> +    visit_type_str(v, &p, name, errp);
> +}
> +
> +/*
> + * Parse [<domain>:]<bus>:<slot>.<func>
> + *   if <domain> is not supplied, it's assumed to be 0.
> + */
> +static void set_pci_host_devaddr(Object *obj, Visitor *v, void *opaque,
> +                                 const char *name, Error **errp)
> +{
> +    DeviceState *dev = DEVICE(obj);
> +    Property *prop = opaque;
> +    PCIHostDeviceAddress *addr = qdev_get_prop_ptr(dev, prop);
> +    Error *local_err = NULL;
> +    char *str, *p;
> +    char *e;
> +    unsigned long val;
> +    unsigned long dom = 0, bus = 0;
> +    unsigned int slot = 0, func = 0;
> +
> +    if (dev->state != DEV_STATE_CREATED) {
> +        error_set(errp, QERR_PERMISSION_DENIED);
> +        return;
> +    }
> +
> +    visit_type_str(v, &str, name, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    p = str;
> +    val = strtoul(p, &e, 16);
> +    if (e == p || *e != ':') {
> +        goto inval;
> +    }
> +    bus = val;
> +
> +    p = e + 1;
> +    val = strtoul(p, &e, 16);
> +    if (e == p) {
> +        goto inval;
> +    }
> +    if (*e == ':') {
> +        dom = bus;
> +        bus = val;
> +        p = e + 1;
> +        val = strtoul(p, &e, 16);
> +        if (e == p) {
> +            goto inval;
> +        }
> +    }
> +    slot = val;
> +
> +    if (*e != '.') {
> +        goto inval;
> +    }
> +    p = e + 1;
> +    val = strtoul(p, &e, 10);
> +    if (e == p) {
> +        goto inval;
> +    }
> +    func = val;
> +
> +    if (dom > 0xffff || bus > 0xff || slot > 0x1f || func > 7) {
> +        goto inval;
> +    }
> +
> +    if (*e) {
> +        goto inval;
> +    }
> +
> +    addr->domain = dom;
> +    addr->bus = bus;
> +    addr->slot = slot;
> +    addr->function = func;
> +
> +    g_free(str);
> +    return;
> +
> +inval:
> +    error_set_from_qdev_prop_error(errp, EINVAL, dev, prop, str);
> +    g_free(str);
> +}
> +
> +PropertyInfo qdev_prop_pci_host_devaddr = {
> +    .name = "pci-host-devaddr",
> +    .get = get_pci_host_devaddr,
> +    .set = set_pci_host_devaddr,
> +};
> +
>  /* --- public helpers --- */
>  
>  static Property *qdev_prop_walk(Property *props, const char *name)
> diff --git a/hw/qdev.h b/hw/qdev.h
> index 5386b16..8746f84 100644
> --- a/hw/qdev.h
> +++ b/hw/qdev.h
> @@ -223,6 +223,7 @@ extern PropertyInfo qdev_prop_netdev;
>  extern PropertyInfo qdev_prop_vlan;
>  extern PropertyInfo qdev_prop_pci_devfn;
>  extern PropertyInfo qdev_prop_blocksize;
> +extern PropertyInfo qdev_prop_pci_host_devaddr;
>  
>  #define DEFINE_PROP(_name, _state, _field, _prop, _type) { \
>          .name      = (_name),                                    \
> @@ -286,6 +287,8 @@ extern PropertyInfo qdev_prop_blocksize;
>                          LostTickPolicy)
>  #define DEFINE_PROP_BLOCKSIZE(_n, _s, _f, _d) \
>      DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_blocksize, uint16_t)
> +#define DEFINE_PROP_PCI_HOST_DEVADDR(_n, _s, _f) \
> +    DEFINE_PROP(_n, _s, _f, qdev_prop_pci_host_devaddr, PCIHostDeviceAddress)
>  
>  #define DEFINE_PROP_END_OF_LIST()               \
>      {}
> diff --git a/qemu-common.h b/qemu-common.h
> index 91e0562..0d6e51c 100644
> --- a/qemu-common.h
> +++ b/qemu-common.h
> @@ -274,6 +274,13 @@ typedef enum LostTickPolicy {
>      LOST_TICK_MAX
>  } LostTickPolicy;
>  
> +typedef struct PCIHostDeviceAddress {
> +    unsigned int domain;
> +    unsigned int bus;
> +    unsigned int slot;
> +    unsigned int function;
> +} PCIHostDeviceAddress;
> +
>  void tcg_exec_init(unsigned long tb_size);
>  bool tcg_enabled(void);
>  
> -- 
> Anthony PERARD
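
The parsing steps in set_pci_host_devaddr() can be exercised outside QEMU with a standalone sketch like the one below. It follows the same strtoul() sequence and range checks, but the HostAddr type, HOST_ADDR_BUFSZ and the parse_host_addr()/format_host_addr() names are hypothetical, and errors are reported through a return value rather than an Error object.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical mirror of the PCIHostDeviceAddress struct added to qemu-common.h. */
typedef struct HostAddr {
    unsigned int domain;
    unsigned int bus;
    unsigned int slot;
    unsigned int function;
} HostAddr;

/* Room for the canonical "dddd:bb:ss.f" form plus the terminating NUL. */
#define HOST_ADDR_BUFSZ sizeof("xxxx:xx:xx.x")

/*
 * Parse [<domain>:]<bus>:<slot>.<func> with the same steps as
 * set_pci_host_devaddr(). Returns 0 on success and -1 on malformed or
 * out-of-range input; *out is written only on success.
 */
static int parse_host_addr(const char *str, HostAddr *out)
{
    const char *p = str;
    char *e;
    unsigned long val, dom = 0, bus, slot, func;

    val = strtoul(p, &e, 16);
    if (e == p || *e != ':') {
        return -1;
    }
    bus = val;

    p = e + 1;
    val = strtoul(p, &e, 16);
    if (e == p) {
        return -1;
    }
    if (*e == ':') {            /* three fields: the first was the domain */
        dom = bus;
        bus = val;
        p = e + 1;
        val = strtoul(p, &e, 16);
        if (e == p) {
            return -1;
        }
    }
    slot = val;

    if (*e != '.') {
        return -1;
    }
    p = e + 1;
    val = strtoul(p, &e, 10);   /* the function number is parsed as decimal */
    if (e == p) {
        return -1;
    }
    func = val;

    if (dom > 0xffff || bus > 0xff || slot > 0x1f || func > 7 || *e != '\0') {
        return -1;
    }

    out->domain = dom;
    out->bus = bus;
    out->slot = slot;
    out->function = func;
    return 0;
}

/* Format as "dddd:bb:ss.f", like get_pci_host_devaddr(); buf needs HOST_ADDR_BUFSZ bytes. */
static void format_host_addr(const HostAddr *a, char *buf)
{
    int rc = snprintf(buf, HOST_ADDR_BUFSZ, "%04x:%02x:%02x.%u",
                      a->domain, a->bus, a->slot, a->function);

    assert(rc == (int)HOST_ADDR_BUFSZ - 1);   /* holds only for in-range fields */
}
```

For example, "0a:1f.7" parses with the domain defaulting to 0, while "03:20.0" (slot above 0x1f) and "03:00.8" (function above 7) are rejected. One deliberate difference: slot and function are kept in unsigned long here, so the range check sees the full strtoul() result; in the patch they are assigned to unsigned int first, which on LP64 hosts can truncate an oversized value before it is checked.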

* Re: [Qemu-devel] [PATCH V13 8/9] Introduce apic-msidef.h
  2012-06-14 17:01 ` [Qemu-devel] " Anthony PERARD
  2012-06-14 19:40   ` Michael S. Tsirkin
@ 2012-06-14 19:40   ` Michael S. Tsirkin
  1 sibling, 0 replies; 37+ messages in thread
From: Michael S. Tsirkin @ 2012-06-14 19:40 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: Jan Kiszka, Anthony Liguori, Stefano Stabellini, QEMU-devel, Xen Devel

On Thu, Jun 14, 2012 at 06:01:48PM +0100, Anthony PERARD wrote:
> This patch moves the MSI definitions from apic.c to apic-msidef.h so that they
> can also be used by other .c files.
> 
> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  hw/apic-msidef.h |   30 ++++++++++++++++++++++++++++++
>  hw/apic.c        |   11 +----------
>  2 files changed, 31 insertions(+), 10 deletions(-)
>  create mode 100644 hw/apic-msidef.h
> 
> diff --git a/hw/apic-msidef.h b/hw/apic-msidef.h
> new file mode 100644
> index 0000000..6e2eb71
> --- /dev/null
> +++ b/hw/apic-msidef.h
> @@ -0,0 +1,30 @@
> +#ifndef HW_APIC_MSIDEF_H
> +#define HW_APIC_MSIDEF_H
> +
> +/*
> + * Intel APIC constants: from include/asm/msidef.h
> + */
> +
> +/*
> + * Shifts for MSI data
> + */
> +
> +#define MSI_DATA_VECTOR_SHIFT           0
> +#define  MSI_DATA_VECTOR_MASK           0x000000ff
> +
> +#define MSI_DATA_DELIVERY_MODE_SHIFT    8
> +#define MSI_DATA_LEVEL_SHIFT            14
> +#define MSI_DATA_TRIGGER_SHIFT          15
> +
> +/*
> + * Shift/mask fields for msi address
> + */
> +
> +#define MSI_ADDR_DEST_MODE_SHIFT        2
> +
> +#define MSI_ADDR_REDIRECTION_SHIFT      3
> +
> +#define MSI_ADDR_DEST_ID_SHIFT          12
> +#define  MSI_ADDR_DEST_ID_MASK          0x00ffff0
> +
> +#endif /* HW_APIC_MSIDEF_H */
> diff --git a/hw/apic.c b/hw/apic.c
> index 5fbf01c..60552df 100644
> --- a/hw/apic.c
> +++ b/hw/apic.c
> @@ -23,19 +23,10 @@
>  #include "host-utils.h"
>  #include "trace.h"
>  #include "pc.h"
> +#include "apic-msidef.h"
>  
>  #define MAX_APIC_WORDS 8
>  
> -/* Intel APIC constants: from include/asm/msidef.h */
> -#define MSI_DATA_VECTOR_SHIFT		0
> -#define MSI_DATA_VECTOR_MASK		0x000000ff
> -#define MSI_DATA_DELIVERY_MODE_SHIFT	8
> -#define MSI_DATA_TRIGGER_SHIFT		15
> -#define MSI_DATA_LEVEL_SHIFT		14
> -#define MSI_ADDR_DEST_MODE_SHIFT	2
> -#define MSI_ADDR_DEST_ID_SHIFT		12
> -#define	MSI_ADDR_DEST_ID_MASK		0x00ffff0
> -
>  #define SYNC_FROM_VAPIC                 0x1
>  #define SYNC_TO_VAPIC                   0x2
>  #define SYNC_ISR_IRR_TO_VAPIC           0x4
> -- 
> Anthony PERARD
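
As a sketch of how the shared constants are meant to be used, the block below packs and unpacks an x86 MSI address/data pair. It assumes the usual 0xfee00000 MSI address window; the MsiFields type, MSI_ADDR_BASE and the msi_encode()/msi_decode() helpers are hypothetical. The decode side uses the same shift-and-mask style that apic.c applies with the definitions moved into apic-msidef.h.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the new hw/apic-msidef.h (only the ones used here). */
#define MSI_DATA_VECTOR_SHIFT           0
#define MSI_DATA_VECTOR_MASK            0x000000ff
#define MSI_DATA_DELIVERY_MODE_SHIFT    8
#define MSI_DATA_TRIGGER_SHIFT          15
#define MSI_ADDR_DEST_MODE_SHIFT        2
#define MSI_ADDR_DEST_ID_SHIFT          12
#define MSI_ADDR_DEST_ID_MASK           0x00ffff0

/* Assumption: the standard x86 MSI address window. */
#define MSI_ADDR_BASE 0xfee00000u

/* Hypothetical decoded view of one MSI message. */
typedef struct MsiFields {
    uint8_t dest;           /* destination APIC ID, address bits 19:12 */
    uint8_t dest_mode;      /* 0 = physical, 1 = logical */
    uint8_t vector;         /* data bits 7:0 */
    uint8_t delivery_mode;  /* 3-bit field, 0 = fixed */
    uint8_t trigger_mode;   /* 0 = edge, 1 = level */
} MsiFields;

/* Pack the fields into an address/data pair using the header's shifts. */
static void msi_encode(const MsiFields *f, uint32_t *addr, uint32_t *data)
{
    assert(f->delivery_mode <= 7 && f->dest_mode <= 1 && f->trigger_mode <= 1);
    *addr = MSI_ADDR_BASE
          | ((uint32_t)f->dest << MSI_ADDR_DEST_ID_SHIFT)
          | ((uint32_t)f->dest_mode << MSI_ADDR_DEST_MODE_SHIFT);
    *data = ((uint32_t)f->vector << MSI_DATA_VECTOR_SHIFT)
          | ((uint32_t)f->delivery_mode << MSI_DATA_DELIVERY_MODE_SHIFT)
          | ((uint32_t)f->trigger_mode << MSI_DATA_TRIGGER_SHIFT);
}

/* Recover the fields with shift-and-mask, the inverse of msi_encode(). */
static MsiFields msi_decode(uint32_t addr, uint32_t data)
{
    MsiFields f;

    f.dest = (addr & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
    f.dest_mode = (addr >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1;
    f.vector = (data & MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
    f.delivery_mode = (data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x7;
    f.trigger_mode = (data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
    return f;
}
```

Encoding destination APIC ID 0x03 in logical mode with vector 0x41, fixed delivery and level trigger yields address 0xfee03004 and data 0x8041, and decoding returns the same fields. The sketch does not touch the level bit (MSI_DATA_LEVEL_SHIFT) or the redirection-hint bit (MSI_ADDR_REDIRECTION_SHIFT).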

* Re: [Qemu-devel] [PATCH V13 0/9] Xen PCI Passthrough
  2012-06-14 17:01 [Qemu-devel] [PATCH V13 0/9] Xen PCI Passthrough Anthony PERARD
                   ` (14 preceding siblings ...)
  2012-06-14 19:41 ` [PATCH V13 0/9] Xen PCI Passthrough Michael S. Tsirkin
@ 2012-06-14 19:41 ` Michael S. Tsirkin
  2012-06-15 11:05   ` Stefano Stabellini
  2012-06-15 11:05   ` [Qemu-devel] " Stefano Stabellini
  15 siblings, 2 replies; 37+ messages in thread
From: Michael S. Tsirkin @ 2012-06-14 19:41 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: Jan Kiszka, Anthony Liguori, Stefano Stabellini, QEMU-devel, Xen Devel

On Thu, Jun 14, 2012 at 06:01:40PM +0100, Anthony PERARD wrote:
> Hi all,
> 
> This patch series introduces the PCI passthrough for Xen.
> 
> Changes since the last version:
>   - New patch that introduces a new qdev property, pci-host-devaddr.
>   => the "export pci_parse_devaddr" patch is no longer useful.
> 
> Thanks,

I reviewed some patches and acked them. I won't have time to
review the rest of the series in the short term.
If you need me to merge some of these patches myself,
please let me know.

> Allen Kay (2):
>   Introduce Xen PCI Passthrough, qdevice (1/3)
>   Introduce Xen PCI Passthrough, PCI config space helpers (2/3)
> 
> Anthony PERARD (6):
>   pci_ids: Add INTEL_82599_SFP_VF id.
>   configure: Introduce --enable-xen-pci-passthrough.
>   Introduce XenHostPCIDevice to access a pci device on the host.
>   pci.c: Add opaque argument to pci_for_each_device.
>   qdev-properties: Introduce pci-host-devaddr.
>   Introduce apic-msidef.h
> 
> Jiang Yunhong (1):
>   Introduce Xen PCI Passthrough, MSI (3/3)
> 
>  configure                |   29 +
>  hw/apic-msidef.h         |   30 +
>  hw/apic.c                |   11 +-
>  hw/i386/Makefile.objs    |    2 +
>  hw/pci.c                 |   11 +-
>  hw/pci.h                 |    4 +-
>  hw/pci_ids.h             |    1 +
>  hw/qdev-properties.c     |  107 +++
>  hw/qdev.h                |    3 +
>  hw/xen-host-pci-device.c |  396 ++++++++++
>  hw/xen-host-pci-device.h |   55 ++
>  hw/xen_common.h          |    3 +
>  hw/xen_platform.c        |    8 +-
>  hw/xen_pt.c              |  851 +++++++++++++++++++++
>  hw/xen_pt.h              |  301 ++++++++
>  hw/xen_pt_config_init.c  | 1869 ++++++++++++++++++++++++++++++++++++++++++++++
>  hw/xen_pt_msi.c          |  620 +++++++++++++++
>  qemu-common.h            |    7 +
>  xen-all.c                |   12 +
>  19 files changed, 4301 insertions(+), 19 deletions(-)
>  create mode 100644 hw/apic-msidef.h
>  create mode 100644 hw/xen-host-pci-device.c
>  create mode 100644 hw/xen-host-pci-device.h
>  create mode 100644 hw/xen_pt.c
>  create mode 100644 hw/xen_pt.h
>  create mode 100644 hw/xen_pt_config_init.c
>  create mode 100644 hw/xen_pt_msi.c
> 
> -- 
> Anthony PERARD

* Re: [Qemu-devel] [Xen-devel] [PATCH V13 1/9] pci_ids: Add INTEL_82599_SFP_VF id.
  2012-06-14 17:01   ` Anthony PERARD
                     ` (3 preceding siblings ...)
  (?)
@ 2012-06-14 19:52   ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 37+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-06-14 19:52 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: Anthony Liguori, Michael S. Tsirkin, Jan Kiszka,
	Stefano Stabellini, QEMU-devel, Xen Devel

On Thu, Jun 14, 2012 at 06:01:41PM +0100, Anthony PERARD wrote:
> We use this in our quirk lookup, provided by the patch
> titled "Introduce Xen PCI Passthrough, PCI config space helpers".
> 
> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  hw/pci_ids.h |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/pci_ids.h b/hw/pci_ids.h
> index e8235a7..649e6b3 100644
> --- a/hw/pci_ids.h
> +++ b/hw/pci_ids.h
> @@ -118,6 +118,7 @@
>  #define PCI_DEVICE_ID_INTEL_82801I_UHCI6 0x2939
>  #define PCI_DEVICE_ID_INTEL_82801I_EHCI1 0x293a
>  #define PCI_DEVICE_ID_INTEL_82801I_EHCI2 0x293c
> +#define PCI_DEVICE_ID_INTEL_82599_SFP_VF 0x10ed
>  
>  #define PCI_VENDOR_ID_XEN               0x5853
>  #define PCI_DEVICE_ID_XEN_PLATFORM      0x0001
> -- 
> Anthony PERARD
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

* Re: [Qemu-devel] [Xen-devel] [PATCH V13 3/9] Introduce XenHostPCIDevice to access a pci device on the host.
  2012-06-14 17:01   ` Anthony PERARD
  (?)
  (?)
@ 2012-06-14 19:53   ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 37+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-06-14 19:53 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: Anthony Liguori, Michael S. Tsirkin, Jan Kiszka,
	Stefano Stabellini, QEMU-devel, Xen Devel

On Thu, Jun 14, 2012 at 06:01:43PM +0100, Anthony PERARD wrote:
> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  hw/i386/Makefile.objs    |    1 +
>  hw/xen-host-pci-device.c |  396 ++++++++++++++++++++++++++++++++++++++++++++++
>  hw/xen-host-pci-device.h |   55 +++++++
>  3 files changed, 452 insertions(+), 0 deletions(-)
>  create mode 100644 hw/xen-host-pci-device.c
>  create mode 100644 hw/xen-host-pci-device.h
> 
> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
> index d43f1df..b719d8e 100644
> --- a/hw/i386/Makefile.objs
> +++ b/hw/i386/Makefile.objs
> @@ -7,6 +7,7 @@ obj-y += debugcon.o multiboot.o
>  obj-y += pc_piix.o
>  obj-y += pc_sysfw.o
>  obj-$(CONFIG_XEN) += xen_platform.o xen_apic.o
> +obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen-host-pci-device.o
>  obj-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o kvm/i8259.o kvm/ioapic.o kvm/i8254.o
>  obj-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
>  
> diff --git a/hw/xen-host-pci-device.c b/hw/xen-host-pci-device.c
> new file mode 100644
> index 0000000..e7ff680
> --- /dev/null
> +++ b/hw/xen-host-pci-device.c
> @@ -0,0 +1,396 @@
> +/*
> + * Copyright (C) 2011       Citrix Ltd.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu-common.h"
> +#include "xen-host-pci-device.h"
> +
> +#define XEN_HOST_PCI_MAX_EXT_CAP \
> +    ((PCIE_CONFIG_SPACE_SIZE - PCI_CONFIG_SPACE_SIZE) / (PCI_CAP_SIZEOF + 4))
> +
> +#ifdef XEN_HOST_PCI_DEVICE_DEBUG
> +#  define XEN_HOST_PCI_LOG(f, a...) fprintf(stderr, "%s: " f, __func__, ##a)
> +#else
> +#  define XEN_HOST_PCI_LOG(f, a...) (void)0
> +#endif
> +
> +/*
> + * from linux/ioport.h
> + * IO resources have these defined flags.
> + */
> +#define IORESOURCE_BITS         0x000000ff      /* Bus-specific bits */
> +
> +#define IORESOURCE_TYPE_BITS    0x00000f00      /* Resource type */
> +#define IORESOURCE_IO           0x00000100
> +#define IORESOURCE_MEM          0x00000200
> +
> +#define IORESOURCE_PREFETCH     0x00001000      /* No side effects */
> +#define IORESOURCE_MEM_64       0x00100000
> +
> +static int xen_host_pci_sysfs_path(const XenHostPCIDevice *d,
> +                                   const char *name, char *buf, ssize_t size)
> +{
> +    int rc;
> +
> +    rc = snprintf(buf, size, "/sys/bus/pci/devices/%04x:%02x:%02x.%d/%s",
> +                  d->domain, d->bus, d->dev, d->func, name);
> +
> +    if (rc >= size || rc < 0) {
> +        /* The output was truncated or another error was encountered */
> +        return -ENODEV;
> +    }
> +    return 0;
> +}
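
A minimal standalone sketch of the same path construction, for illustration only. `sysfs_bdf_path()` is a hypothetical name, and it takes the domain/bus/device/function values directly instead of a `XenHostPCIDevice`. It shows why `rc >= size` is the truncation check: `snprintf()` returns the length it would have written, not the length it actually wrote.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of xen_host_pci_sysfs_path(): build
 * /sys/bus/pci/devices/DDDD:BB:DD.F/<name> and reject truncated output. */
static int sysfs_bdf_path(uint16_t domain, uint8_t bus, uint8_t dev,
                          uint8_t func, const char *name,
                          char *buf, size_t size)
{
    int rc = snprintf(buf, size, "/sys/bus/pci/devices/%04x:%02x:%02x.%d/%s",
                      domain, bus, dev, func, name);

    /* A negative value is an encoding error; a value >= size means the
     * full path did not fit and buf holds only a truncated prefix. */
    if (rc < 0 || (size_t)rc >= size) {
        return -ENODEV;
    }
    return 0;
}
```

With a 64-byte buffer, domain 0, bus 3, device 0, function 1 and name `config` yield `/sys/bus/pci/devices/0000:03:00.1/config`; with an 8-byte buffer the call returns `-ENODEV`.
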
> +
> +
> +/* This size should be enough to read the first 7 lines of a resource file */
> +#define XEN_HOST_PCI_RESSOURCE_BUFFER_SIZE 400
> +static int xen_host_pci_get_resource(XenHostPCIDevice *d)
> +{
> +    int i, rc, fd;
> +    char path[PATH_MAX];
> +    char buf[XEN_HOST_PCI_RESSOURCE_BUFFER_SIZE];
> +    unsigned long long start, end, flags, size;
> +    char *endptr, *s;
> +    uint8_t type;
> +
> +    rc = xen_host_pci_sysfs_path(d, "resource", path, sizeof (path));
> +    if (rc) {
> +        return rc;
> +    }
> +    fd = open(path, O_RDONLY);
> +    if (fd == -1) {
> +        XEN_HOST_PCI_LOG("Error: Can't open %s: %s\n", path, strerror(errno));
> +        return -errno;
> +    }
> +
> +    do {
> +        rc = read(fd, &buf, sizeof (buf) - 1);
> +        if (rc < 0 && errno != EINTR) {
> +            rc = -errno;
> +            goto out;
> +        }
> +    } while (rc < 0);
> +    buf[rc] = 0;
> +    rc = 0;
> +
> +    s = buf;
> +    for (i = 0; i < PCI_NUM_REGIONS; i++) {
> +        type = 0;
> +
> +        start = strtoull(s, &endptr, 16);
> +        if (*endptr != ' ' || s == endptr) {
> +            break;
> +        }
> +        s = endptr + 1;
> +        end = strtoull(s, &endptr, 16);
> +        if (*endptr != ' ' || s == endptr) {
> +            break;
> +        }
> +        s = endptr + 1;
> +        flags = strtoull(s, &endptr, 16);
> +        if (*endptr != '\n' || s == endptr) {
> +            break;
> +        }
> +        s = endptr + 1;
> +
> +        if (start) {
> +            size = end - start + 1;
> +        } else {
> +            size = 0;
> +        }
> +
> +        if (flags & IORESOURCE_IO) {
> +            type |= XEN_HOST_PCI_REGION_TYPE_IO;
> +        }
> +        if (flags & IORESOURCE_MEM) {
> +            type |= XEN_HOST_PCI_REGION_TYPE_MEM;
> +        }
> +        if (flags & IORESOURCE_PREFETCH) {
> +            type |= XEN_HOST_PCI_REGION_TYPE_PREFETCH;
> +        }
> +        if (flags & IORESOURCE_MEM_64) {
> +            type |= XEN_HOST_PCI_REGION_TYPE_MEM_64;
> +        }
> +
> +        if (i < PCI_ROM_SLOT) {
> +            d->io_regions[i].base_addr = start;
> +            d->io_regions[i].size = size;
> +            d->io_regions[i].type = type;
> +            d->io_regions[i].bus_flags = flags & IORESOURCE_BITS;
> +        } else {
> +            d->rom.base_addr = start;
> +            d->rom.size = size;
> +            d->rom.type = type;
> +            d->rom.bus_flags = flags & IORESOURCE_BITS;
> +        }
> +    }
> +    if (i != PCI_NUM_REGIONS) {
> +        /* Invalid format or input too short */
> +        rc = -ENODEV;
> +    }
> +
> +out:
> +    close(fd);
> +    return rc;
> +}
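
A self-contained sketch of the per-line parsing above, assuming each line of the sysfs `resource` file has the form `start end flags` in hex (the format the loop expects). `parse_resource_line()` and `Region` are hypothetical names; the flag and type constants are copied from this patch. It uses `strtoull()` because the fields are unsigned 64-bit values.

```c
#include <stdint.h>
#include <stdlib.h>

/* IORESOURCE_* values as in the patch's linux/ioport.h excerpt. */
#define IORESOURCE_IO       0x00000100
#define IORESOURCE_MEM      0x00000200
#define IORESOURCE_PREFETCH 0x00001000
#define IORESOURCE_MEM_64   0x00100000

/* Same bit values as the patch's XEN_HOST_PCI_REGION_TYPE_* enum. */
enum {
    REGION_TYPE_IO = 1 << 1,
    REGION_TYPE_MEM = 1 << 2,
    REGION_TYPE_PREFETCH = 1 << 3,
    REGION_TYPE_MEM_64 = 1 << 4,
};

typedef struct {
    unsigned long long base, size;
    uint8_t type;
} Region;

/* Parse one "start end flags\n" line. Returns a pointer just past the
 * newline, or NULL if the line is malformed. */
static const char *parse_resource_line(const char *s, Region *r)
{
    char *end;
    unsigned long long start, last, flags;

    start = strtoull(s, &end, 16);
    if (end == s || *end != ' ') return NULL;
    s = end + 1;
    last = strtoull(s, &end, 16);
    if (end == s || *end != ' ') return NULL;
    s = end + 1;
    flags = strtoull(s, &end, 16);
    if (end == s || *end != '\n') return NULL;

    /* An unused BAR is reported with start == 0, so its size stays 0. */
    r->base = start;
    r->size = start ? last - start + 1 : 0;
    r->type = 0;
    if (flags & IORESOURCE_IO)       r->type |= REGION_TYPE_IO;
    if (flags & IORESOURCE_MEM)      r->type |= REGION_TYPE_MEM;
    if (flags & IORESOURCE_PREFETCH) r->type |= REGION_TYPE_PREFETCH;
    if (flags & IORESOURCE_MEM_64)   r->type |= REGION_TYPE_MEM_64;
    return end + 1;
}
```

For example, the made-up line `0x00000000fe000000 0x00000000fe00ffff 0x0000000000140204` parses to a 64 KiB region typed `REGION_TYPE_MEM | REGION_TYPE_MEM_64`, because flag bits 0x200 and 0x100000 are set.
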
> +
> +/* This size should be enough to read a long from a file */
> +#define XEN_HOST_PCI_GET_VALUE_BUFFER_SIZE 22
> +static int xen_host_pci_get_value(XenHostPCIDevice *d, const char *name,
> +                                  unsigned int *pvalue, int base)
> +{
> +    char path[PATH_MAX];
> +    char buf[XEN_HOST_PCI_GET_VALUE_BUFFER_SIZE];
> +    int fd, rc;
> +    unsigned long value;
> +    char *endptr;
> +
> +    rc = xen_host_pci_sysfs_path(d, name, path, sizeof (path));
> +    if (rc) {
> +        return rc;
> +    }
> +    fd = open(path, O_RDONLY);
> +    if (fd == -1) {
> +        XEN_HOST_PCI_LOG("Error: Can't open %s: %s\n", path, strerror(errno));
> +        return -errno;
> +    }
> +    do {
> +        rc = read(fd, &buf, sizeof (buf) - 1);
> +        if (rc < 0 && errno != EINTR) {
> +            rc = -errno;
> +            goto out;
> +        }
> +    } while (rc < 0);
> +    buf[rc] = 0;
> +    value = strtol(buf, &endptr, base);
> +    if (endptr == buf || *endptr != '\n') {
> +        rc = -1;
> +    } else if ((value == LONG_MIN || value == LONG_MAX) && errno == ERANGE) {
> +        rc = -errno;
> +    } else {
> +        rc = 0;
> +        *pvalue = value;
> +    }
> +out:
> +    close(fd);
> +    return rc;
> +}
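
`strtol()` sets `errno` only on failure, so the `errno == ERANGE` test above may observe a stale value left by an earlier call. A sketch of a stricter parser, assuming the same newline-terminated sysfs contents (for example `0x8086\n` in `vendor`); `parse_sysfs_uint()` is a hypothetical helper, not part of the patch.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a newline-terminated number into an unsigned int.
 * errno is cleared first because strtoul() only writes it on failure. */
static int parse_sysfs_uint(const char *buf, int base, unsigned int *out)
{
    char *end;
    unsigned long v;

    errno = 0;
    v = strtoul(buf, &end, base);
    if (end == buf || *end != '\n') {
        return -EINVAL;          /* no digits, or trailing junk */
    }
    if (errno == ERANGE || v > UINT_MAX) {
        return -ERANGE;          /* does not fit in unsigned int */
    }
    *out = (unsigned int)v;
    return 0;
}
```

Clearing `errno` before the call follows the advice in the `strtoul(3)` man page, since `ULONG_MAX` is also a legitimate successful result.
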
> +
> +static inline int xen_host_pci_get_hex_value(XenHostPCIDevice *d,
> +                                             const char *name,
> +                                             unsigned int *pvalue)
> +{
> +    return xen_host_pci_get_value(d, name, pvalue, 16);
> +}
> +
> +static inline int xen_host_pci_get_dec_value(XenHostPCIDevice *d,
> +                                             const char *name,
> +                                             unsigned int *pvalue)
> +{
> +    return xen_host_pci_get_value(d, name, pvalue, 10);
> +}
> +
> +static bool xen_host_pci_dev_is_virtfn(XenHostPCIDevice *d)
> +{
> +    char path[PATH_MAX];
> +    struct stat buf;
> +
> +    if (xen_host_pci_sysfs_path(d, "physfn", path, sizeof (path))) {
> +        return false;
> +    }
> +    return !stat(path, &buf);
> +}
> +
> +static int xen_host_pci_config_open(XenHostPCIDevice *d)
> +{
> +    char path[PATH_MAX];
> +    int rc;
> +
> +    rc = xen_host_pci_sysfs_path(d, "config", path, sizeof (path));
> +    if (rc) {
> +        return rc;
> +    }
> +    d->config_fd = open(path, O_RDWR);
> +    if (d->config_fd < 0) {
> +        return -errno;
> +    }
> +    return 0;
> +}
> +
> +static int xen_host_pci_config_read(XenHostPCIDevice *d,
> +                                    int pos, void *buf, int len)
> +{
> +    int rc;
> +
> +    do {
> +        rc = pread(d->config_fd, buf, len, pos);
> +    } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
> +    if (rc != len) {
> +        return rc < 0 ? -errno : -EIO;
> +    }
> +    return 0;
> +}
> +
> +static int xen_host_pci_config_write(XenHostPCIDevice *d,
> +                                     int pos, const void *buf, int len)
> +{
> +    int rc;
> +
> +    do {
> +        rc = pwrite(d->config_fd, buf, len, pos);
> +    } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
> +    if (rc != len) {
> +        return rc < 0 ? -errno : -EIO;
> +    }
> +    return 0;
> +}
> +
> +
> +int xen_host_pci_get_byte(XenHostPCIDevice *d, int pos, uint8_t *p)
> +{
> +    uint8_t buf;
> +    int rc = xen_host_pci_config_read(d, pos, &buf, 1);
> +    if (!rc) {
> +        *p = buf;
> +    }
> +    return rc;
> +}
> +
> +int xen_host_pci_get_word(XenHostPCIDevice *d, int pos, uint16_t *p)
> +{
> +    uint16_t buf;
> +    int rc = xen_host_pci_config_read(d, pos, &buf, 2);
> +    if (!rc) {
> +        *p = le16_to_cpu(buf);
> +    }
> +    return rc;
> +}
> +
> +int xen_host_pci_get_long(XenHostPCIDevice *d, int pos, uint32_t *p)
> +{
> +    uint32_t buf;
> +    int rc = xen_host_pci_config_read(d, pos, &buf, 4);
> +    if (!rc) {
> +        *p = le32_to_cpu(buf);
> +    }
> +    return rc;
> +}
> +
> +int xen_host_pci_get_block(XenHostPCIDevice *d, int pos, uint8_t *buf, int len)
> +{
> +    return xen_host_pci_config_read(d, pos, buf, len);
> +}
> +
> +int xen_host_pci_set_byte(XenHostPCIDevice *d, int pos, uint8_t data)
> +{
> +    return xen_host_pci_config_write(d, pos, &data, 1);
> +}
> +
> +int xen_host_pci_set_word(XenHostPCIDevice *d, int pos, uint16_t data)
> +{
> +    data = cpu_to_le16(data);
> +    return xen_host_pci_config_write(d, pos, &data, 2);
> +}
> +
> +int xen_host_pci_set_long(XenHostPCIDevice *d, int pos, uint32_t data)
> +{
> +    data = cpu_to_le32(data);
> +    return xen_host_pci_config_write(d, pos, &data, 4);
> +}
> +
> +int xen_host_pci_set_block(XenHostPCIDevice *d, int pos, uint8_t *buf, int len)
> +{
> +    return xen_host_pci_config_write(d, pos, buf, len);
> +}
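
The `le16_to_cpu()`/`cpu_to_le32()` wrappers exist because PCI configuration space is little-endian. A host-independent sketch of the same conversion on an in-memory buffer; the `cfg_*` helpers are hypothetical and only illustrate the byte order.

```c
#include <stdint.h>

/* Assemble little-endian values byte by byte, so the result is the
 * same on big- and little-endian hosts. */
static uint16_t cfg_get_word(const uint8_t *cfg, int pos)
{
    return (uint16_t)(cfg[pos] | (cfg[pos + 1] << 8));
}

static uint32_t cfg_get_long(const uint8_t *cfg, int pos)
{
    return (uint32_t)cfg[pos] | ((uint32_t)cfg[pos + 1] << 8) |
           ((uint32_t)cfg[pos + 2] << 16) | ((uint32_t)cfg[pos + 3] << 24);
}

/* Store the low byte first, as the device expects. */
static void cfg_set_word(uint8_t *cfg, int pos, uint16_t v)
{
    cfg[pos] = v & 0xff;
    cfg[pos + 1] = v >> 8;
}
```

Bytes `86 80 ed 10` at offset 0 read back as vendor 0x8086 and device 0x10ed, the `PCI_DEVICE_ID_INTEL_82599_SFP_VF` value added earlier in this series.
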
> +
> +int xen_host_pci_find_ext_cap_offset(XenHostPCIDevice *d, uint32_t cap)
> +{
> +    uint32_t header = 0;
> +    int max_cap = XEN_HOST_PCI_MAX_EXT_CAP;
> +    int pos = PCI_CONFIG_SPACE_SIZE;
> +
> +    do {
> +        if (xen_host_pci_get_long(d, pos, &header)) {
> +            break;
> +        }
> +        /*
> +         * If we have no capabilities, this is indicated by cap ID,
> +         * cap version and next pointer all being 0.
> +         */
> +        if (header == 0) {
> +            break;
> +        }
> +
> +        if (PCI_EXT_CAP_ID(header) == cap) {
> +            return pos;
> +        }
> +
> +        pos = PCI_EXT_CAP_NEXT(header);
> +        if (pos < PCI_CONFIG_SPACE_SIZE) {
> +            break;
> +        }
> +
> +        max_cap--;
> +    } while (max_cap > 0);
> +
> +    return -1;
> +}
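
A sketch of the same extended-capability walk over an in-memory 4 KiB configuration-space image, so it can run without a device. The `PCI_EXT_CAP_*` macros are redefined here with the bit layout used by Linux's `pci_regs.h` (16-bit ID, 4-bit version, 12-bit next pointer); `find_ext_cap()` and `rd32()` are hypothetical names. The iteration bound is (4096 - 256) / 8 = 480, which matches `XEN_HOST_PCI_MAX_EXT_CAP` assuming `PCI_CAP_SIZEOF` is 4.

```c
#include <stdint.h>

#define PCI_CONFIG_SPACE_SIZE   0x100
#define PCIE_CONFIG_SPACE_SIZE  0x1000
#define PCI_EXT_CAP_ID(h)   ((h) & 0x0000ffff)
#define PCI_EXT_CAP_NEXT(h) (((h) >> 20) & 0xffc)

/* Little-endian 32-bit read from the config-space image. */
static uint32_t rd32(const uint8_t *cfg, int pos)
{
    return (uint32_t)cfg[pos] | ((uint32_t)cfg[pos + 1] << 8) |
           ((uint32_t)cfg[pos + 2] << 16) | ((uint32_t)cfg[pos + 3] << 24);
}

/* Return the offset of extended capability 'cap', or -1 if absent.
 * The budget guards against a malformed list that loops on itself. */
static int find_ext_cap(const uint8_t *cfg, uint32_t cap)
{
    int pos = PCI_CONFIG_SPACE_SIZE;
    int budget = (PCIE_CONFIG_SPACE_SIZE - PCI_CONFIG_SPACE_SIZE) / 8;

    while (budget-- > 0) {
        uint32_t header = rd32(cfg, pos);
        if (header == 0) {
            break;                /* no extended capabilities at all */
        }
        if (PCI_EXT_CAP_ID(header) == cap) {
            return pos;
        }
        pos = PCI_EXT_CAP_NEXT(header);
        if (pos < PCI_CONFIG_SPACE_SIZE) {
            break;                /* end of list (next == 0) or bad pointer */
        }
    }
    return -1;
}
```

Stopping when the next pointer drops below 0x100 covers both the normal end of the list and a corrupt pointer back into conventional configuration space.
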
> +
> +int xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
> +                            uint8_t bus, uint8_t dev, uint8_t func)
> +{
> +    unsigned int v;
> +    int rc = 0;
> +
> +    d->config_fd = -1;
> +    d->domain = domain;
> +    d->bus = bus;
> +    d->dev = dev;
> +    d->func = func;
> +
> +    rc = xen_host_pci_config_open(d);
> +    if (rc) {
> +        goto error;
> +    }
> +    rc = xen_host_pci_get_resource(d);
> +    if (rc) {
> +        goto error;
> +    }
> +    rc = xen_host_pci_get_hex_value(d, "vendor", &v);
> +    if (rc) {
> +        goto error;
> +    }
> +    d->vendor_id = v;
> +    rc = xen_host_pci_get_hex_value(d, "device", &v);
> +    if (rc) {
> +        goto error;
> +    }
> +    d->device_id = v;
> +    rc = xen_host_pci_get_dec_value(d, "irq", &v);
> +    if (rc) {
> +        goto error;
> +    }
> +    d->irq = v;
> +    d->is_virtfn = xen_host_pci_dev_is_virtfn(d);
> +
> +    return 0;
> +error:
> +    if (d->config_fd >= 0) {
> +        close(d->config_fd);
> +        d->config_fd = -1;
> +    }
> +    return rc;
> +}
> +
> +void xen_host_pci_device_put(XenHostPCIDevice *d)
> +{
> +    if (d->config_fd >= 0) {
> +        close(d->config_fd);
> +        d->config_fd = -1;
> +    }
> +}
> diff --git a/hw/xen-host-pci-device.h b/hw/xen-host-pci-device.h
> new file mode 100644
> index 0000000..0079dac
> --- /dev/null
> +++ b/hw/xen-host-pci-device.h
> @@ -0,0 +1,55 @@
> +#ifndef XEN_HOST_PCI_DEVICE_H
> +#define XEN_HOST_PCI_DEVICE_H
> +
> +#include "pci.h"
> +
> +enum {
> +    XEN_HOST_PCI_REGION_TYPE_IO = 1 << 1,
> +    XEN_HOST_PCI_REGION_TYPE_MEM = 1 << 2,
> +    XEN_HOST_PCI_REGION_TYPE_PREFETCH = 1 << 3,
> +    XEN_HOST_PCI_REGION_TYPE_MEM_64 = 1 << 4,
> +};
> +
> +typedef struct XenHostPCIIORegion {
> +    pcibus_t base_addr;
> +    pcibus_t size;
> +    uint8_t type;
> +    uint8_t bus_flags; /* Bus-specific bits */
> +} XenHostPCIIORegion;
> +
> +typedef struct XenHostPCIDevice {
> +    uint16_t domain;
> +    uint8_t bus;
> +    uint8_t dev;
> +    uint8_t func;
> +
> +    uint16_t vendor_id;
> +    uint16_t device_id;
> +    int irq;
> +
> +    XenHostPCIIORegion io_regions[PCI_NUM_REGIONS - 1];
> +    XenHostPCIIORegion rom;
> +
> +    bool is_virtfn;
> +
> +    int config_fd;
> +} XenHostPCIDevice;
> +
> +int xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
> +                            uint8_t bus, uint8_t dev, uint8_t func);
> +void xen_host_pci_device_put(XenHostPCIDevice *pci_dev);
> +
> +int xen_host_pci_get_byte(XenHostPCIDevice *d, int pos, uint8_t *p);
> +int xen_host_pci_get_word(XenHostPCIDevice *d, int pos, uint16_t *p);
> +int xen_host_pci_get_long(XenHostPCIDevice *d, int pos, uint32_t *p);
> +int xen_host_pci_get_block(XenHostPCIDevice *d, int pos, uint8_t *buf,
> +                           int len);
> +int xen_host_pci_set_byte(XenHostPCIDevice *d, int pos, uint8_t data);
> +int xen_host_pci_set_word(XenHostPCIDevice *d, int pos, uint16_t data);
> +int xen_host_pci_set_long(XenHostPCIDevice *d, int pos, uint32_t data);
> +int xen_host_pci_set_block(XenHostPCIDevice *d, int pos, uint8_t *buf,
> +                           int len);
> +
> +int xen_host_pci_find_ext_cap_offset(XenHostPCIDevice *s, uint32_t cap);
> +
> +#endif /* !XEN_HOST_PCI_DEVICE_H */
> -- 
> Anthony PERARD
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [Xen-devel] [PATCH V13 4/9] pci.c: Add opaque argument to pci_for_each_device.
  2012-06-14 17:01 ` [Qemu-devel] [PATCH V13 4/9] pci.c: Add opaque argument to pci_for_each_device Anthony PERARD
  2012-06-14 19:38   ` Michael S. Tsirkin
  2012-06-14 19:38   ` [Qemu-devel] " Michael S. Tsirkin
@ 2012-06-14 19:54   ` Konrad Rzeszutek Wilk
  2012-06-14 19:54   ` Konrad Rzeszutek Wilk
  3 siblings, 0 replies; 37+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-06-14 19:54 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: Anthony Liguori, Michael S. Tsirkin, Jan Kiszka,
	Stefano Stabellini, QEMU-devel, Xen Devel

On Thu, Jun 14, 2012 at 06:01:44PM +0100, Anthony PERARD wrote:
> The purpose is to have a more generic pci_for_each_device by passing an extra
> argument to the function called on every device.
> 
> This will be used by a later patch.
> 
> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  hw/pci.c          |   11 +++++++----
>  hw/pci.h          |    4 +++-
>  hw/xen_platform.c |    8 ++++----
>  3 files changed, 14 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/pci.c b/hw/pci.c
> index 127b7ac..d6537e3 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -1127,7 +1127,9 @@ static const pci_class_desc pci_class_descriptions[] =
>  };
>  
>  static void pci_for_each_device_under_bus(PCIBus *bus,
> -                                          void (*fn)(PCIBus *b, PCIDevice *d))
> +                                          void (*fn)(PCIBus *b, PCIDevice *d,
> +                                                     void *opaque),
> +                                          void *opaque)
>  {
>      PCIDevice *d;
>      int devfn;
> @@ -1135,18 +1137,19 @@ static void pci_for_each_device_under_bus(PCIBus *bus,
>      for(devfn = 0; devfn < ARRAY_SIZE(bus->devices); devfn++) {
>          d = bus->devices[devfn];
>          if (d) {
> -            fn(bus, d);
> +            fn(bus, d, opaque);
>          }
>      }
>  }
>  
>  void pci_for_each_device(PCIBus *bus, int bus_num,
> -                         void (*fn)(PCIBus *b, PCIDevice *d))
> +                         void (*fn)(PCIBus *b, PCIDevice *d, void *opaque),
> +                         void *opaque)
>  {
>      bus = pci_find_bus_nr(bus, bus_num);
>  
>      if (bus) {
> -        pci_for_each_device_under_bus(bus, fn);
> +        pci_for_each_device_under_bus(bus, fn, opaque);
>      }
>  }
>  
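
A standalone sketch of the callback-with-opaque pattern this patch introduces. `Bus`, `Dev`, `for_each_dev()` and `count_class()` are hypothetical stand-ins for `PCIBus`, `PCIDevice`, `pci_for_each_device()` and a user callback; the point is that per-call state travels through `opaque` rather than through a global.

```c
#include <stddef.h>

/* Hypothetical stand-ins; only the shape of the callback matters. */
typedef struct Dev { int class_id; } Dev;
typedef struct Bus { Dev *devices[4]; } Bus;

/* Same shape as the patched pci_for_each_device(): the caller's opaque
 * pointer is passed unchanged to every callback invocation. */
static void for_each_dev(Bus *bus, void (*fn)(Bus *b, Dev *d, void *opaque),
                         void *opaque)
{
    for (size_t i = 0; i < sizeof(bus->devices) / sizeof(bus->devices[0]); i++) {
        if (bus->devices[i]) {
            fn(bus, bus->devices[i], opaque);
        }
    }
}

/* A stateful callback: count devices of one class code (e.g. 0x0200,
 * Ethernet). Without opaque, the counter would have to be a global. */
struct class_count { int class_id; int count; };

static void count_class(Bus *b, Dev *d, void *opaque)
{
    struct class_count *cc = opaque;

    (void)b;
    if (d->class_id == cc->class_id) {
        cc->count++;
    }
}
```

Callers with no state, such as `unplug_nic()` and `unplug_disks()` in `xen_platform.c`, simply pass `NULL` and ignore the argument.
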
> diff --git a/hw/pci.h b/hw/pci.h
> index 7f223c0..95b608c 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -312,7 +312,9 @@ PCIDevice *pci_nic_init(NICInfo *nd, const char *default_model,
>  PCIDevice *pci_nic_init_nofail(NICInfo *nd, const char *default_model,
>                                 const char *default_devaddr);
>  int pci_bus_num(PCIBus *s);
> -void pci_for_each_device(PCIBus *bus, int bus_num, void (*fn)(PCIBus *bus, PCIDevice *d));
> +void pci_for_each_device(PCIBus *bus, int bus_num,
> +                         void (*fn)(PCIBus *bus, PCIDevice *d, void *opaque),
> +                         void *opaque);
>  PCIBus *pci_find_root_bus(int domain);
>  int pci_find_domain(const PCIBus *bus);
>  PCIDevice *pci_find_device(PCIBus *bus, int bus_num, uint8_t devfn);
> diff --git a/hw/xen_platform.c b/hw/xen_platform.c
> index 0214f37..c1fe984 100644
> --- a/hw/xen_platform.c
> +++ b/hw/xen_platform.c
> @@ -83,7 +83,7 @@ static void log_writeb(PCIXenPlatformState *s, char val)
>  #define UNPLUG_ALL_NICS 2
>  #define UNPLUG_AUX_IDE_DISKS 4
>  
> -static void unplug_nic(PCIBus *b, PCIDevice *d)
> +static void unplug_nic(PCIBus *b, PCIDevice *d, void *o)
>  {
>      if (pci_get_word(d->config + PCI_CLASS_DEVICE) ==
>              PCI_CLASS_NETWORK_ETHERNET) {
> @@ -96,10 +96,10 @@ static void unplug_nic(PCIBus *b, PCIDevice *d)
>  
>  static void pci_unplug_nics(PCIBus *bus)
>  {
> -    pci_for_each_device(bus, 0, unplug_nic);
> +    pci_for_each_device(bus, 0, unplug_nic, NULL);
>  }
>  
> -static void unplug_disks(PCIBus *b, PCIDevice *d)
> +static void unplug_disks(PCIBus *b, PCIDevice *d, void *o)
>  {
>      if (pci_get_word(d->config + PCI_CLASS_DEVICE) ==
>              PCI_CLASS_STORAGE_IDE) {
> @@ -109,7 +109,7 @@ static void unplug_disks(PCIBus *b, PCIDevice *d)
>  
>  static void pci_unplug_disks(PCIBus *bus)
>  {
> -    pci_for_each_device(bus, 0, unplug_disks);
> +    pci_for_each_device(bus, 0, unplug_disks, NULL);
>  }
>  
>  static void platform_fixed_ioport_writew(void *opaque, uint32_t addr, uint32_t val)
> -- 
> Anthony PERARD
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
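A minimal sketch of what the extra argument enables: a callback can receive caller-owned state instead of relying on globals. The C below models the patched loop under stated assumptions. PCIBus and PCIDevice are hypothetical, heavily simplified stand-ins for the QEMU types (a flat 256-slot devfn array and a bare class code). for_each_device_under_bus, struct class_count and count_class are illustrative names, not QEMU APIs. The class codes 0x0200 (Ethernet) and 0x0101 (IDE) are the standard PCI values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for QEMU's PCIDevice/PCIBus. */
typedef struct PCIDevice {
    uint16_t class_id;          /* e.g. 0x0200 for an Ethernet controller */
} PCIDevice;

typedef struct PCIBus {
    PCIDevice *devices[256];    /* one slot per devfn; NULL if empty */
} PCIBus;

/* Same shape as the patched helper: forward opaque to every callback. */
static void for_each_device_under_bus(PCIBus *bus,
                                      void (*fn)(PCIBus *b, PCIDevice *d,
                                                 void *opaque),
                                      void *opaque)
{
    size_t devfn;

    for (devfn = 0; devfn < sizeof(bus->devices) / sizeof(bus->devices[0]);
         devfn++) {
        PCIDevice *d = bus->devices[devfn];
        if (d) {
            fn(bus, d, opaque);
        }
    }
}

/* Caller-owned state, handed to the callback through the opaque pointer. */
struct class_count {
    uint16_t class_id;   /* input: class code to look for */
    unsigned count;      /* output: number of matching devices */
};

static void count_class(PCIBus *b, PCIDevice *d, void *opaque)
{
    struct class_count *cc = opaque;

    (void)b;             /* the bus is not needed by this callback */
    if (d->class_id == cc->class_id) {
        cc->count++;
    }
}
```

Because the counter lives in a struct owned by the caller, two walks can run with independent state and the callback needs no file-scope variables. Callbacks that need no context, such as unplug_nic in the diff, simply ignore the pointer, and their callers pass NULL.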


* Re: [PATCH V13 4/9] pci.c: Add opaque argument to pci_for_each_device.
  2012-06-14 17:01 ` [Qemu-devel] [PATCH V13 4/9] pci.c: Add opaque argument to pci_for_each_device Anthony PERARD
                     ` (2 preceding siblings ...)
  2012-06-14 19:54   ` [Qemu-devel] [Xen-devel] " Konrad Rzeszutek Wilk
@ 2012-06-14 19:54   ` Konrad Rzeszutek Wilk
  3 siblings, 0 replies; 37+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-06-14 19:54 UTC
  To: Anthony PERARD
  Cc: Anthony Liguori, Michael S. Tsirkin, Jan Kiszka,
	Stefano Stabellini, QEMU-devel, Xen Devel

On Thu, Jun 14, 2012 at 06:01:44PM +0100, Anthony PERARD wrote:
> The purpose is to have a more generic pci_for_each_device by passing an extra
> argument to the function called on every device.
> 
> This will be used by a later patch in the series.
> 
> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  hw/pci.c          |   11 +++++++----
>  hw/pci.h          |    4 +++-
>  hw/xen_platform.c |    8 ++++----
>  3 files changed, 14 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/pci.c b/hw/pci.c
> index 127b7ac..d6537e3 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -1127,7 +1127,9 @@ static const pci_class_desc pci_class_descriptions[] =
>  };
>  
>  static void pci_for_each_device_under_bus(PCIBus *bus,
> -                                          void (*fn)(PCIBus *b, PCIDevice *d))
> +                                          void (*fn)(PCIBus *b, PCIDevice *d,
> +                                                     void *opaque),
> +                                          void *opaque)
>  {
>      PCIDevice *d;
>      int devfn;
> @@ -1135,18 +1137,19 @@ static void pci_for_each_device_under_bus(PCIBus *bus,
>      for(devfn = 0; devfn < ARRAY_SIZE(bus->devices); devfn++) {
>          d = bus->devices[devfn];
>          if (d) {
> -            fn(bus, d);
> +            fn(bus, d, opaque);
>          }
>      }
>  }
>  
>  void pci_for_each_device(PCIBus *bus, int bus_num,
> -                         void (*fn)(PCIBus *b, PCIDevice *d))
> +                         void (*fn)(PCIBus *b, PCIDevice *d, void *opaque),
> +                         void *opaque)
>  {
>      bus = pci_find_bus_nr(bus, bus_num);
>  
>      if (bus) {
> -        pci_for_each_device_under_bus(bus, fn);
> +        pci_for_each_device_under_bus(bus, fn, opaque);
>      }
>  }
>  
> diff --git a/hw/pci.h b/hw/pci.h
> index 7f223c0..95b608c 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -312,7 +312,9 @@ PCIDevice *pci_nic_init(NICInfo *nd, const char *default_model,
>  PCIDevice *pci_nic_init_nofail(NICInfo *nd, const char *default_model,
>                                 const char *default_devaddr);
>  int pci_bus_num(PCIBus *s);
> -void pci_for_each_device(PCIBus *bus, int bus_num, void (*fn)(PCIBus *bus, PCIDevice *d));
> +void pci_for_each_device(PCIBus *bus, int bus_num,
> +                         void (*fn)(PCIBus *bus, PCIDevice *d, void *opaque),
> +                         void *opaque);
>  PCIBus *pci_find_root_bus(int domain);
>  int pci_find_domain(const PCIBus *bus);
>  PCIDevice *pci_find_device(PCIBus *bus, int bus_num, uint8_t devfn);
> diff --git a/hw/xen_platform.c b/hw/xen_platform.c
> index 0214f37..c1fe984 100644
> --- a/hw/xen_platform.c
> +++ b/hw/xen_platform.c
> @@ -83,7 +83,7 @@ static void log_writeb(PCIXenPlatformState *s, char val)
>  #define UNPLUG_ALL_NICS 2
>  #define UNPLUG_AUX_IDE_DISKS 4
>  
> -static void unplug_nic(PCIBus *b, PCIDevice *d)
> +static void unplug_nic(PCIBus *b, PCIDevice *d, void *o)
>  {
>      if (pci_get_word(d->config + PCI_CLASS_DEVICE) ==
>              PCI_CLASS_NETWORK_ETHERNET) {
> @@ -96,10 +96,10 @@ static void unplug_nic(PCIBus *b, PCIDevice *d)
>  
>  static void pci_unplug_nics(PCIBus *bus)
>  {
> -    pci_for_each_device(bus, 0, unplug_nic);
> +    pci_for_each_device(bus, 0, unplug_nic, NULL);
>  }
>  
> -static void unplug_disks(PCIBus *b, PCIDevice *d)
> +static void unplug_disks(PCIBus *b, PCIDevice *d, void *o)
>  {
>      if (pci_get_word(d->config + PCI_CLASS_DEVICE) ==
>              PCI_CLASS_STORAGE_IDE) {
> @@ -109,7 +109,7 @@ static void unplug_disks(PCIBus *b, PCIDevice *d)
>  
>  static void pci_unplug_disks(PCIBus *bus)
>  {
> -    pci_for_each_device(bus, 0, unplug_disks);
> +    pci_for_each_device(bus, 0, unplug_disks, NULL);
>  }
>  
>  static void platform_fixed_ioport_writew(void *opaque, uint32_t addr, uint32_t val)
> -- 
> Anthony PERARD
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
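The same pattern also lets inputs and a result travel together. The sketch below is a hypothetical example, not taken from the series: it asks whether a proposed memory range overlaps the BAR of any other device on the bus. PCIDevice and PCIBus are again simplified stand-ins (one BAR per device), and for_each_device_under_bus, struct overlap_args and check_overlap are illustrative names, not QEMU APIs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: each device exposes a single memory BAR. */
typedef struct PCIDevice {
    uint64_t bar_addr;
    uint64_t bar_size;          /* 0 means the BAR is unused */
} PCIDevice;

typedef struct PCIBus {
    PCIDevice *devices[256];    /* one slot per devfn; NULL if empty */
} PCIBus;

/* Same shape as the patched helper: forward opaque to every callback. */
static void for_each_device_under_bus(PCIBus *bus,
                                      void (*fn)(PCIBus *b, PCIDevice *d,
                                                 void *opaque),
                                      void *opaque)
{
    size_t devfn;

    for (devfn = 0; devfn < sizeof(bus->devices) / sizeof(bus->devices[0]);
         devfn++) {
        PCIDevice *d = bus->devices[devfn];
        if (d) {
            fn(bus, d, opaque);
        }
    }
}

/* Inputs and the result share one struct passed as opaque. */
struct overlap_args {
    const PCIDevice *self;      /* input: device whose own BAR is skipped */
    uint64_t addr;              /* input: start of the proposed range */
    uint64_t size;              /* input: length of the proposed range */
    bool overlap;               /* output: set if another BAR intersects */
};

static void check_overlap(PCIBus *b, PCIDevice *d, void *opaque)
{
    struct overlap_args *args = opaque;

    (void)b;                    /* the bus is not needed by this callback */
    if (d == args->self || d->bar_size == 0 || args->size == 0) {
        return;
    }
    /* Half-open ranges intersect iff each one starts before the other ends. */
    if (args->addr < d->bar_addr + d->bar_size &&
        d->bar_addr < args->addr + args->size) {
        args->overlap = true;
    }
}
```

The walk has no early-exit mechanism, so the callback only ever sets the flag and the caller reads args.overlap once the walk returns. The range test uses half-open intervals and does not handle wrap-around at the top of the 64-bit address space.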


* Re: [Qemu-devel] [PATCH V13 0/9] Xen PCI Passthrough
  2012-06-14 19:41 ` [Qemu-devel] " Michael S. Tsirkin
  2012-06-15 11:05   ` Stefano Stabellini
@ 2012-06-15 11:05   ` Stefano Stabellini
  1 sibling, 0 replies; 37+ messages in thread
From: Stefano Stabellini @ 2012-06-15 11:05 UTC
  To: Michael S. Tsirkin
  Cc: Anthony Liguori, Stefano Stabellini, Jan Kiszka, QEMU-devel,
	Xen Devel, Anthony Perard

On Thu, 14 Jun 2012, Michael S. Tsirkin wrote:
> On Thu, Jun 14, 2012 at 06:01:40PM +0100, Anthony PERARD wrote:
> > Hi all,
> > 
> > This patch series introduces the PCI passthrough for Xen.
> > 
> > Changes since the last version:
> >   - New patch that introduces a new qdev-property, pci-host-devaddr.
> >   => the "export pci_parse_devaddr" patch is no longer useful.
> > 
> > Thanks,
> 
> I reviewed some patches and Acked. Won't have the time to
> review the rest of the series short term.
> If you need me to merge some of these patches myself
> pls let me know.

Considering that you Acked all the non-Xen patches (apart from the
configure patch), I think I'll just go ahead and submit a pull request
to Anthony myself, if you are OK with it.


end of thread

Thread overview: 37+ messages
2012-06-14 17:01 [Qemu-devel] [PATCH V13 0/9] Xen PCI Passthrough Anthony PERARD
2012-06-14 17:01 ` [Qemu-devel] [PATCH V13 1/9] pci_ids: Add INTEL_82599_SFP_VF id Anthony PERARD
2012-06-14 17:01   ` Anthony PERARD
2012-06-14 19:37   ` Michael S. Tsirkin
2012-06-14 19:37   ` [Qemu-devel] " Michael S. Tsirkin
2012-06-14 19:52   ` Konrad Rzeszutek Wilk
2012-06-14 19:52   ` [Qemu-devel] [Xen-devel] " Konrad Rzeszutek Wilk
2012-06-14 17:01 ` [PATCH V13 2/9] configure: Introduce --enable-xen-pci-passthrough Anthony PERARD
2012-06-14 17:01 ` [Qemu-devel] " Anthony PERARD
2012-06-14 17:01 ` [Qemu-devel] [PATCH V13 3/9] Introduce XenHostPCIDevice to access a pci device on the host Anthony PERARD
2012-06-14 17:01   ` Anthony PERARD
2012-06-14 19:53   ` Konrad Rzeszutek Wilk
2012-06-14 19:53   ` [Qemu-devel] [Xen-devel] " Konrad Rzeszutek Wilk
2012-06-14 17:01 ` [Qemu-devel] [PATCH V13 4/9] pci.c: Add opaque argument to pci_for_each_device Anthony PERARD
2012-06-14 19:38   ` Michael S. Tsirkin
2012-06-14 19:38   ` [Qemu-devel] " Michael S. Tsirkin
2012-06-14 19:54   ` [Qemu-devel] [Xen-devel] " Konrad Rzeszutek Wilk
2012-06-14 19:54   ` Konrad Rzeszutek Wilk
2012-06-14 17:01 ` Anthony PERARD
2012-06-14 17:01 ` [PATCH V13 5/9] qdev-properties: Introduce pci-host-devaddr Anthony PERARD
2012-06-14 17:01 ` [Qemu-devel] " Anthony PERARD
2012-06-14 19:39   ` Michael S. Tsirkin
2012-06-14 19:39   ` [Qemu-devel] " Michael S. Tsirkin
2012-06-14 17:01 ` [Qemu-devel] [PATCH V13 6/9] Introduce Xen PCI Passthrough, qdevice (1/3) Anthony PERARD
2012-06-14 17:01   ` Anthony PERARD
2012-06-14 17:01 ` [Qemu-devel] [PATCH V13 7/9] Introduce Xen PCI Passthrough, PCI config space helpers (2/3) Anthony PERARD
2012-06-14 17:01   ` Anthony PERARD
2012-06-14 17:01 ` [PATCH V13 8/9] Introduce apic-msidef.h Anthony PERARD
2012-06-14 17:01 ` [Qemu-devel] " Anthony PERARD
2012-06-14 19:40   ` Michael S. Tsirkin
2012-06-14 19:40   ` [Qemu-devel] " Michael S. Tsirkin
2012-06-14 17:01 ` [Qemu-devel] [PATCH V13 9/9] Introduce Xen PCI Passthrough, MSI (3/3) Anthony PERARD
2012-06-14 17:01 ` Anthony PERARD
2012-06-14 19:41 ` [PATCH V13 0/9] Xen PCI Passthrough Michael S. Tsirkin
2012-06-14 19:41 ` [Qemu-devel] " Michael S. Tsirkin
2012-06-15 11:05   ` Stefano Stabellini
2012-06-15 11:05   ` [Qemu-devel] " Stefano Stabellini
