[Qemu-devel] [PATCH v3 00/11] igd passthrough chipset tweaks

* [Qemu-devel] [PATCH v3 00/11] igd passthrough chipset tweaks
@ 2016-01-05 11:41 ` Gerd Hoffmann
  0 siblings, 0 replies; 132+ messages in thread
From: Gerd Hoffmann @ 2016-01-05 11:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini, Cao jin,
	vfio-users, Gerd Hoffmann

  Hi,

We have some code in our tree to support pci passthrough of intel
graphics devices (igd) on xen, which requires some chipset tweaks
for (a) the host bridge and (b) the lpc/isa-bridge to meet the
expectations of the guest driver.

For kvm we need pretty much the same, and the requirements for vgpu
(xengt/kvmgt) are very similar.  This patch series wires up the existing
support for kvm.  It also brings a bunch of bugfixes and cleanups.

Unfortunately the oldish laptop I had planned to use for testing turned
out to have no working iommu support for igd, so this patch series
has seen only very light testing so far.  Any testing feedback is very
welcome.

Testing with kvm/i440fx:
  Add '-M pc,igd-passthru=on' to turn on the chipset tweaks.
  Passthrough the igd using vfio.

Testing with kvm/q35:
  Add '-M q35,igd-passthru=on' to turn on the chipset tweaks.
  Pick up the linux kernel patch referenced in patch #11, build a
  custom kernel with it.  Passthrough the igd using vfio.

Testing with xen:
  Existing setups should continue working ;)

Changes in v3:
  * Handle igd-passthrough-isa-bridge creation in machine init.
  * Fix xen build failure.

Changes in v2:
  * Added igd-passthrough-isa-bridge support for kvm.
  * Added patch to drop has_igd_gfx_passthru.

cheers,
  Gerd

Gerd Hoffmann (11):
  pc: wire up TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE for !xen
  pc: remove has_igd_gfx_passthru global
  pc: move igd support code to igd.c
  igd: switch TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE to realize
  igd: TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE: call parent realize
  igd: use defines for standard pci config space offsets
  igd: revamp host config read
  igd: add q35 support
  igd: move igd-passthrough-isa-bridge to igd.c too
  igd: handle igd-passthrough-isa-bridge setup in realize()
  igd: move igd-passthrough-isa-bridge creation to machine init

 hw/i386/pc_piix.c         | 130 +++------------------------------
 hw/pci-host/Makefile.objs |   3 +
 hw/pci-host/igd.c         | 181 ++++++++++++++++++++++++++++++++++++++++++++++
 hw/pci-host/piix.c        |  88 ----------------------
 hw/pci-host/q35.c         |   6 +-
 hw/xen/xen_pt.c           |  14 ----
 hw/xen/xen_pt.h           |   5 +-
 include/hw/i386/pc.h      |   2 +-
 vl.c                      |  10 ---
 9 files changed, 204 insertions(+), 235 deletions(-)
 create mode 100644 hw/pci-host/igd.c

-- 
1.8.3.1

* [Qemu-devel] [PATCH v3 01/11] pc: wire up TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE for !xen
@ 2016-01-05 11:41   ` Gerd Hoffmann
  0 siblings, 0 replies; 132+ messages in thread
From: Gerd Hoffmann @ 2016-01-05 11:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	Michael S. Tsirkin, Cao jin, vfio-users, Paolo Bonzini,
	Richard Henderson, Gerd Hoffmann

Rename pc_xen_hvm_init_pci to pc_i440fx_init_pci and use it for both
xen and non-xen init.

This changes the behavior of all pc-i440fx-$version machine types:
specifying -machine igd-passthru=on used to have no effect there and
now it does.  It is unlikely to cause any trouble though, as there used
to be no reason to add that option to kvm guests in the first place.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 hw/i386/pc_piix.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 438cdae..6532e32 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -368,10 +368,9 @@ static void pc_init_isa(MachineState *machine)
     pc_init1(machine, TYPE_I440FX_PCI_HOST_BRIDGE, TYPE_I440FX_PCI_DEVICE);
 }
 
-#ifdef CONFIG_XEN
-static void pc_xen_hvm_init_pci(MachineState *machine)
+static void pc_i440fx_init_pci(MachineState *machine)
 {
-    const char *pci_type = has_igd_gfx_passthru ?
+    const char *pci_type = machine->igd_gfx_passthru ?
                 TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE : TYPE_I440FX_PCI_DEVICE;
 
     pc_init1(machine,
@@ -379,6 +378,7 @@ static void pc_xen_hvm_init_pci(MachineState *machine)
              pci_type);
 }
 
+#ifdef CONFIG_XEN
 static void pc_xen_hvm_init(MachineState *machine)
 {
     PCIBus *bus;
@@ -388,7 +388,7 @@ static void pc_xen_hvm_init(MachineState *machine)
         exit(1);
     }
 
-    pc_xen_hvm_init_pci(machine);
+    pc_i440fx_init_pci(machine);
 
     bus = pci_find_primary_bus();
     if (bus != NULL) {
@@ -404,8 +404,7 @@ static void pc_xen_hvm_init(MachineState *machine)
         if (compat) { \
             compat(machine); \
         } \
-        pc_init1(machine, TYPE_I440FX_PCI_HOST_BRIDGE, \
-                 TYPE_I440FX_PCI_DEVICE); \
+        pc_i440fx_init_pci(machine); \
     } \
     DEFINE_PC_MACHINE(suffix, name, pc_init_##suffix, optionfn)
 
-- 
1.8.3.1
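
Reader's note: the core of this patch is that the host bridge device type
is now picked from the machine property alone, for xen and non-xen guests
alike.  A minimal standalone sketch of that selection follows.  It does not
use QEMU headers; SketchMachine, sketch_pick_pci_type() and the two string
constants are hypothetical stand-ins for MachineState, pc_i440fx_init_pci()
and the TYPE_* macros.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder type names (assumption); the real ones come from QEMU headers. */
#define SKETCH_TYPE_I440FX      "sketch-i440fx"
#define SKETCH_TYPE_IGD_I440FX  "sketch-igd-passthrough-i440fx"

/* Stand-in for the single MachineState field the patch consults. */
typedef struct {
    bool igd_gfx_passthru;
} SketchMachine;

/*
 * Same ternary as in pc_i440fx_init_pci(): the machine property decides
 * which host bridge PCI device type gets instantiated.
 */
static const char *sketch_pick_pci_type(const SketchMachine *machine)
{
    if (machine == NULL) {
        return SKETCH_TYPE_I440FX;  /* defensive default, sketch only */
    }
    return machine->igd_gfx_passthru ? SKETCH_TYPE_IGD_I440FX
                                     : SKETCH_TYPE_I440FX;
}
```

With the property off the plain i440fx type is returned, which is why
existing kvm setups that never passed igd-passthru=on are unaffected.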

* [Qemu-devel] [PATCH v3 02/11] pc: remove has_igd_gfx_passthru global
@ 2016-01-05 11:41   ` Gerd Hoffmann
  0 siblings, 0 replies; 132+ messages in thread
From: Gerd Hoffmann @ 2016-01-05 11:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini, Cao jin,
	vfio-users, Paolo Bonzini, Gerd Hoffmann

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 hw/xen/xen_pt.h |  5 +++--
 vl.c            | 10 ----------
 2 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/hw/xen/xen_pt.h b/hw/xen/xen_pt.h
index 3749711..cdd73ff 100644
--- a/hw/xen/xen_pt.h
+++ b/hw/xen/xen_pt.h
@@ -4,6 +4,7 @@
 #include "qemu-common.h"
 #include "hw/xen/xen_common.h"
 #include "hw/pci/pci.h"
+#include "hw/boards.h"
 #include "xen-host-pci-device.h"
 
 void xen_pt_log(const PCIDevice *d, const char *f, ...) GCC_FMT_ATTR(2, 3);
@@ -322,10 +323,10 @@ extern void *pci_assign_dev_load_option_rom(PCIDevice *dev,
                                             unsigned int domain,
                                             unsigned int bus, unsigned int slot,
                                             unsigned int function);
-extern bool has_igd_gfx_passthru;
 static inline bool is_igd_vga_passthrough(XenHostPCIDevice *dev)
 {
-    return (has_igd_gfx_passthru
+    MachineState *machine = MACHINE(qdev_get_machine());
+    return (machine->igd_gfx_passthru
             && ((dev->class_code >> 0x8) == PCI_CLASS_DISPLAY_VGA));
 }
 int xen_pt_register_vga_regions(XenHostPCIDevice *dev);
diff --git a/vl.c b/vl.c
index 5aaea77..d4e51ec 100644
--- a/vl.c
+++ b/vl.c
@@ -1365,13 +1365,6 @@ static inline void semihosting_arg_fallback(const char *file, const char *cmd)
     }
 }
 
-/* Now we still need this for compatibility with XEN. */
-bool has_igd_gfx_passthru;
-static void igd_gfx_passthru(void)
-{
-    has_igd_gfx_passthru = current_machine->igd_gfx_passthru;
-}
-
 /***********************************************************/
 /* USB devices */
 
@@ -4550,9 +4543,6 @@ int main(int argc, char **argv, char **envp)
             exit(1);
     }
 
-    /* Check if IGD GFX passthrough. */
-    igd_gfx_passthru();
-
     /* init generic devices */
     if (qemu_opts_foreach(qemu_find_opts("device"),
                           device_init_func, NULL, NULL)) {
-- 
1.8.3.1
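
Reader's note: with the global gone, is_igd_vga_passthrough() combines the
machine property with a class check on the host device.  The sketch below
works through that check in isolation.  The Sketch* names are hypothetical,
the machine is passed in explicitly instead of being looked up through
qdev_get_machine(), and it assumes class_code holds the full 24-bit
class/subclass/prog-if value, which is what the shift by 8 in the patch
implies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Numeric value of PCI_CLASS_DISPLAY_VGA (class 0x03, subclass 0x00). */
#define SKETCH_PCI_CLASS_DISPLAY_VGA 0x0300

typedef struct {
    bool igd_gfx_passthru;      /* stand-in for MachineState's field */
} SketchMachine;

typedef struct {
    uint32_t class_code;        /* assumed 24-bit: class, subclass, prog-if */
} SketchHostDev;

/* True only if passthrough is enabled AND the device is a VGA controller. */
static bool sketch_is_igd_vga_passthrough(const SketchMachine *machine,
                                          const SketchHostDev *dev)
{
    /* Dropping the low 8 bits (prog-if) leaves the 16-bit class/subclass. */
    return machine->igd_gfx_passthru
           && (dev->class_code >> 8) == SKETCH_PCI_CLASS_DISPLAY_VGA;
}
```

For example, class_code 0x030000 (VGA) shifts down to 0x0300 and matches,
while 0x038000 (other display controller) shifts down to 0x0380 and does
not.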

* [Qemu-devel] [PATCH v3 03/11] pc: move igd support code to igd.c
@ 2016-01-05 11:41   ` Gerd Hoffmann
  0 siblings, 0 replies; 132+ messages in thread
From: Gerd Hoffmann @ 2016-01-05 11:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	Michael S. Tsirkin, Cao jin, vfio-users, Gerd Hoffmann

Pure code motion, except for dropping instance_size for
TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE (no need to set,
we can inherit it from TYPE_I440FX_PCI_DEVICE).

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 hw/pci-host/Makefile.objs |  3 ++
 hw/pci-host/igd.c         | 96 +++++++++++++++++++++++++++++++++++++++++++++++
 hw/pci-host/piix.c        | 88 -------------------------------------------
 3 files changed, 99 insertions(+), 88 deletions(-)
 create mode 100644 hw/pci-host/igd.c

diff --git a/hw/pci-host/Makefile.objs b/hw/pci-host/Makefile.objs
index 45f1f0e..e341a49 100644
--- a/hw/pci-host/Makefile.objs
+++ b/hw/pci-host/Makefile.objs
@@ -11,6 +11,9 @@ common-obj-$(CONFIG_PPCE500_PCI) += ppce500.o
 # ARM devices
 common-obj-$(CONFIG_VERSATILE_PCI) += versatile.o
 
+# igd passthrough support
+common-obj-$(CONFIG_LINUX) += igd.o
+
 common-obj-$(CONFIG_PCI_APB) += apb.o
 common-obj-$(CONFIG_FULONG) += bonito.o
 common-obj-$(CONFIG_PCI_PIIX) += piix.o
diff --git a/hw/pci-host/igd.c b/hw/pci-host/igd.c
new file mode 100644
index 0000000..ef0273b
--- /dev/null
+++ b/hw/pci-host/igd.c
@@ -0,0 +1,96 @@
+#include "qemu-common.h"
+#include "hw/pci/pci.h"
+#include "hw/i386/pc.h"
+
+/* IGD Passthrough Host Bridge. */
+typedef struct {
+    uint8_t offset;
+    uint8_t len;
+} IGDHostInfo;
+
+/* Here we just expose minimal host bridge offset subset. */
+static const IGDHostInfo igd_host_bridge_infos[] = {
+    {0x08, 2},  /* revision id */
+    {0x2c, 2},  /* sybsystem vendor id */
+    {0x2e, 2},  /* sybsystem id */
+    {0x50, 2},  /* SNB: processor graphics control register */
+    {0x52, 2},  /* processor graphics control register */
+    {0xa4, 4},  /* SNB: graphics base of stolen memory */
+    {0xa8, 4},  /* SNB: base of GTT stolen memory */
+};
+
+static int host_pci_config_read(int pos, int len, uint32_t val)
+{
+    char path[PATH_MAX];
+    int config_fd;
+    ssize_t size = sizeof(path);
+    /* Access real host bridge. */
+    int rc = snprintf(path, size, "/sys/bus/pci/devices/%04x:%02x:%02x.%d/%s",
+                      0, 0, 0, 0, "config");
+    int ret = 0;
+
+    if (rc >= size || rc < 0) {
+        return -ENODEV;
+    }
+
+    config_fd = open(path, O_RDWR);
+    if (config_fd < 0) {
+        return -ENODEV;
+    }
+
+    if (lseek(config_fd, pos, SEEK_SET) != pos) {
+        ret = -errno;
+        goto out;
+    }
+    do {
+        rc = read(config_fd, (uint8_t *)&val, len);
+    } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
+    if (rc != len) {
+        ret = -errno;
+    }
+out:
+    close(config_fd);
+    return ret;
+}
+
+static int igd_pt_i440fx_initfn(struct PCIDevice *pci_dev)
+{
+    uint32_t val = 0;
+    int rc, i, num;
+    int pos, len;
+
+    num = ARRAY_SIZE(igd_host_bridge_infos);
+    for (i = 0; i < num; i++) {
+        pos = igd_host_bridge_infos[i].offset;
+        len = igd_host_bridge_infos[i].len;
+        rc = host_pci_config_read(pos, len, val);
+        if (rc) {
+            return -ENODEV;
+        }
+        pci_default_write_config(pci_dev, pos, val, len);
+    }
+
+    return 0;
+}
+
+static void igd_passthrough_i440fx_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+    k->init = igd_pt_i440fx_initfn;
+    dc->desc = "IGD Passthrough Host bridge";
+}
+
+static const TypeInfo igd_passthrough_i440fx_info = {
+    .name          = TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE,
+    .parent        = TYPE_I440FX_PCI_DEVICE,
+    .class_init    = igd_passthrough_i440fx_class_init,
+};
+
+static void igd_register_types(void)
+{
+    type_register_static(&igd_passthrough_i440fx_info);
+}
+
+type_init(igd_register_types)
diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
index 715208b..ccacb57 100644
--- a/hw/pci-host/piix.c
+++ b/hw/pci-host/piix.c
@@ -744,93 +744,6 @@ static const TypeInfo i440fx_info = {
     .class_init    = i440fx_class_init,
 };
 
-/* IGD Passthrough Host Bridge. */
-typedef struct {
-    uint8_t offset;
-    uint8_t len;
-} IGDHostInfo;
-
-/* Here we just expose minimal host bridge offset subset. */
-static const IGDHostInfo igd_host_bridge_infos[] = {
-    {0x08, 2},  /* revision id */
-    {0x2c, 2},  /* sybsystem vendor id */
-    {0x2e, 2},  /* sybsystem id */
-    {0x50, 2},  /* SNB: processor graphics control register */
-    {0x52, 2},  /* processor graphics control register */
-    {0xa4, 4},  /* SNB: graphics base of stolen memory */
-    {0xa8, 4},  /* SNB: base of GTT stolen memory */
-};
-
-static int host_pci_config_read(int pos, int len, uint32_t val)
-{
-    char path[PATH_MAX];
-    int config_fd;
-    ssize_t size = sizeof(path);
-    /* Access real host bridge. */
-    int rc = snprintf(path, size, "/sys/bus/pci/devices/%04x:%02x:%02x.%d/%s",
-                      0, 0, 0, 0, "config");
-    int ret = 0;
-
-    if (rc >= size || rc < 0) {
-        return -ENODEV;
-    }
-
-    config_fd = open(path, O_RDWR);
-    if (config_fd < 0) {
-        return -ENODEV;
-    }
-
-    if (lseek(config_fd, pos, SEEK_SET) != pos) {
-        ret = -errno;
-        goto out;
-    }
-    do {
-        rc = read(config_fd, (uint8_t *)&val, len);
-    } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
-    if (rc != len) {
-        ret = -errno;
-    }
-out:
-    close(config_fd);
-    return ret;
-}
-
-static int igd_pt_i440fx_initfn(struct PCIDevice *pci_dev)
-{
-    uint32_t val = 0;
-    int rc, i, num;
-    int pos, len;
-
-    num = ARRAY_SIZE(igd_host_bridge_infos);
-    for (i = 0; i < num; i++) {
-        pos = igd_host_bridge_infos[i].offset;
-        len = igd_host_bridge_infos[i].len;
-        rc = host_pci_config_read(pos, len, val);
-        if (rc) {
-            return -ENODEV;
-        }
-        pci_default_write_config(pci_dev, pos, val, len);
-    }
-
-    return 0;
-}
-
-static void igd_passthrough_i440fx_class_init(ObjectClass *klass, void *data)
-{
-    DeviceClass *dc = DEVICE_CLASS(klass);
-    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
-
-    k->init = igd_pt_i440fx_initfn;
-    dc->desc = "IGD Passthrough Host bridge";
-}
-
-static const TypeInfo igd_passthrough_i440fx_info = {
-    .name          = TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE,
-    .parent        = TYPE_I440FX_PCI_DEVICE,
-    .instance_size = sizeof(PCII440FXState),
-    .class_init    = igd_passthrough_i440fx_class_init,
-};
-
 static const char *i440fx_pcihost_root_bus_path(PCIHostState *host_bridge,
                                                 PCIBus *rootbus)
 {
@@ -872,7 +785,6 @@ static const TypeInfo i440fx_pcihost_info = {
 static void i440fx_register_types(void)
 {
     type_register_static(&i440fx_info);
-    type_register_static(&igd_passthrough_i440fx_info);
     type_register_static(&piix3_pci_type_info);
     type_register_static(&piix3_info);
     type_register_static(&piix3_xen_info);
-- 
1.8.3.1
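
Reader's note: the code moved by this patch is unchanged, so it carries
over a few quirks.  host_pci_config_read() takes val by value, so the bytes
read from sysfs never reach the caller and igd_pt_i440fx_initfn() always
writes 0.  A short read is reported as -errno even though errno may not be
set, and the file is opened O_RDWR although it is only read.  Below is a
minimal sketch of a read helper without those quirks.  It is an
illustration only, not the fix applied later in this series.  The name
sketch_pci_config_read() and the path parameter are assumptions, the latter
added so the helper can be pointed at any file.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Read len bytes (1..4) at offset pos of a PCI config space file and
 * return them through *val.  Returns 0 on success, negative errno on error.
 */
static int sketch_pci_config_read(const char *path, off_t pos, size_t len,
                                  uint32_t *val)
{
    uint8_t buf[4] = { 0 };
    ssize_t rc;
    int fd, ret = 0;

    if (len == 0 || len > sizeof(buf)) {
        return -EINVAL;
    }
    fd = open(path, O_RDONLY);          /* read-only access is sufficient */
    if (fd < 0) {
        return -errno;
    }
    if (lseek(fd, pos, SEEK_SET) != pos) {
        ret = -errno;
        goto out;
    }
    do {
        rc = read(fd, buf, len);
    } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
    if (rc < 0) {
        ret = -errno;
    } else if ((size_t)rc != len) {
        ret = -EIO;                     /* short read: errno is not set */
    } else {
        /* PCI config space is little-endian; assemble independent of host. */
        *val = (uint32_t)buf[0] | (uint32_t)buf[1] << 8 |
               (uint32_t)buf[2] << 16 | (uint32_t)buf[3] << 24;
    }
out:
    close(fd);
    return ret;
}
```

A caller would pass something like "/sys/bus/pci/devices/0000:00:00.0/config"
as path, matching the host bridge address hardcoded in the patch, and hand
the result to pci_default_write_config() only when the return value is 0.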

* [Qemu-devel] [PATCH v3 04/11] igd: switch TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE to realize
@ 2016-01-05 11:41   ` Gerd Hoffmann
  0 siblings, 0 replies; 132+ messages in thread
From: Gerd Hoffmann @ 2016-01-05 11:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini, Cao jin,
	vfio-users, Gerd Hoffmann

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 hw/pci-host/igd.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/hw/pci-host/igd.c b/hw/pci-host/igd.c
index ef0273b..d1eeafb 100644
--- a/hw/pci-host/igd.c
+++ b/hw/pci-host/igd.c
@@ -53,7 +53,7 @@ out:
     return ret;
 }
 
-static int igd_pt_i440fx_initfn(struct PCIDevice *pci_dev)
+static void igd_pt_i440fx_realize(PCIDevice *pci_dev, Error **errp)
 {
     uint32_t val = 0;
     int rc, i, num;
@@ -65,12 +65,11 @@ static int igd_pt_i440fx_initfn(struct PCIDevice *pci_dev)
         len = igd_host_bridge_infos[i].len;
         rc = host_pci_config_read(pos, len, val);
         if (rc) {
-            return -ENODEV;
+            error_setg(errp, "failed to read host config");
+            return;
         }
         pci_default_write_config(pci_dev, pos, val, len);
     }
-
-    return 0;
 }
 
 static void igd_passthrough_i440fx_class_init(ObjectClass *klass, void *data)
@@ -78,7 +77,7 @@ static void igd_passthrough_i440fx_class_init(ObjectClass *klass, void *data)
     DeviceClass *dc = DEVICE_CLASS(klass);
     PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
-    k->init = igd_pt_i440fx_initfn;
+    k->realize = igd_pt_i440fx_realize;
     dc->desc = "IGD Passthrough Host bridge";
 }
 
-- 
1.8.3.1

* [Qemu-devel] [PATCH v3 05/11] igd: TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE: call parent realize
@ 2016-01-05 11:41   ` Gerd Hoffmann
  0 siblings, 0 replies; 132+ messages in thread
From: Gerd Hoffmann @ 2016-01-05 11:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini, Cao jin,
	vfio-users, Gerd Hoffmann

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 hw/pci-host/igd.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/hw/pci-host/igd.c b/hw/pci-host/igd.c
index d1eeafb..6f52ab1 100644
--- a/hw/pci-host/igd.c
+++ b/hw/pci-host/igd.c
@@ -53,12 +53,20 @@ out:
     return ret;
 }
 
+static void (*i440fx_realize)(PCIDevice *pci_dev, Error **errp);
 static void igd_pt_i440fx_realize(PCIDevice *pci_dev, Error **errp)
 {
+    Error *err = NULL;
     uint32_t val = 0;
     int rc, i, num;
     int pos, len;
 
+    i440fx_realize(pci_dev, &err);
+    if (err != NULL) {
+        error_propagate(errp, err);
+        return;
+    }
+
     num = ARRAY_SIZE(igd_host_bridge_infos);
     for (i = 0; i < num; i++) {
         pos = igd_host_bridge_infos[i].offset;
@@ -77,6 +85,7 @@ static void igd_passthrough_i440fx_class_init(ObjectClass *klass, void *data)
     DeviceClass *dc = DEVICE_CLASS(klass);
     PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
+    i440fx_realize = k->realize;
     k->realize = igd_pt_i440fx_realize;
     dc->desc = "IGD Passthrough Host bridge";
 }
-- 
1.8.3.1

* [Qemu-devel] [PATCH v3 06/11] igd: use defines for standard pci config space offsets
  2016-01-05 11:41 ` Gerd Hoffmann
@ 2016-01-05 11:41   ` Gerd Hoffmann
  -1 siblings, 0 replies; 132+ messages in thread
From: Gerd Hoffmann @ 2016-01-05 11:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini, Cao jin,
	vfio-users, Gerd Hoffmann

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 hw/pci-host/igd.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/pci-host/igd.c b/hw/pci-host/igd.c
index 6f52ab1..0784128 100644
--- a/hw/pci-host/igd.c
+++ b/hw/pci-host/igd.c
@@ -10,9 +10,9 @@ typedef struct {
 
 /* Here we just expose minimal host bridge offset subset. */
 static const IGDHostInfo igd_host_bridge_infos[] = {
-    {0x08, 2},  /* revision id */
-    {0x2c, 2},  /* sybsystem vendor id */
-    {0x2e, 2},  /* sybsystem id */
+    {PCI_REVISION_ID,         2},
+    {PCI_SUBSYSTEM_VENDOR_ID, 2},
+    {PCI_SUBSYSTEM_ID,        2},
     {0x50, 2},  /* SNB: processor graphics control register */
     {0x52, 2},  /* processor graphics control register */
     {0xa4, 4},  /* SNB: graphics base of stolen memory */
-- 
1.8.3.1

* [Qemu-devel] [PATCH v3 07/11] igd: revamp host config read
@ 2016-01-05 11:41   ` Gerd Hoffmann
  0 siblings, 0 replies; 132+ messages in thread
From: Gerd Hoffmann @ 2016-01-05 11:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini, Cao jin,
	vfio-users, Gerd Hoffmann

Move all work to the host_pci_config_copy helper function,
which we can easily reuse when adding q35 support.
Open sysfs file only once for all values.  Use pread.
Proper error handling.  Fix bugs:

 * Don't throw away results (like old host_pci_config_read
   did because val was passed by value not reference).
 * Update config space directly (writing via
   pci_default_write_config only works for registers
   whitelisted in wmask).
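
To illustrate the first bug in isolation, here is a minimal standalone
sketch. It is not QEMU code: fake_host_config, read_by_value and
read_by_reference are hypothetical names made up for this example. Note
that the patch itself does not switch to a pointer argument; it reads
straight into guest->config instead. The sketch only shows why the old
signature could never hand a value back to the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for host config space bytes; not QEMU code. */
static const uint8_t fake_host_config[4] = { 0x34, 0x12, 0x00, 0x00 };

/* Buggy shape: 'val' is a local copy, so the caller never sees the read. */
static int read_by_value(int len, uint32_t val)
{
    assert(len >= 0 && (size_t)len <= sizeof(val));
    memcpy(&val, fake_host_config, (size_t)len);
    (void)val; /* the bytes just read are dropped on return */
    return 0;
}

/* Working shape: store through a pointer supplied by the caller. */
static int read_by_reference(int len, uint32_t *val)
{
    assert(len >= 0 && (size_t)len <= sizeof(*val));
    memcpy(val, fake_host_config, (size_t)len);
    return 0;
}
```

With the by-value variant the caller's variable keeps its initial 0, which
is what the old loop then wrote into the guest config space.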

Hmm, this code can hardly ever have worked before,
/me wonders what test coverage it had.

With this patch in place igd-passthru=on actually
works, although it still requires root privileges
because linux refuses to allow non-root users to access
pci config space above offset 0x50.
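
The "open once, pread each range" approach described above can be sketched
outside of QEMU roughly as follows. This is an illustration under
assumptions, not the patch's implementation: RegRange and copy_ranges are
hypothetical names, errors are reported with a plain -1 instead of QEMU's
Error objects, and the destination is an ordinary byte buffer standing in
for the guest config space.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical mirror of an offset/length table entry. */
typedef struct {
    int offset;
    int len;
} RegRange;

/*
 * Copy each listed range from the file at 'path' into 'dst' at the same
 * offset. The file is opened once and read with pread(), so no seeking
 * is needed. Returns 0 on success, -1 on open failure or short read.
 */
static int copy_ranges(const char *path, uint8_t *dst, size_t dst_size,
                       const RegRange *list, int count)
{
    int fd, i, ret = 0;

    fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }
    for (i = 0; i < count; i++) {
        ssize_t rc;

        /* Guard the destination buffer against out-of-range entries. */
        assert(list[i].offset >= 0 && list[i].len >= 0);
        assert((size_t)list[i].offset + (size_t)list[i].len <= dst_size);

        rc = pread(fd, dst + list[i].offset, (size_t)list[i].len,
                   list[i].offset);
        if (rc != list[i].len) {
            ret = -1;
            break;
        }
    }
    close(fd);
    return ret;
}
```

A read that comes back short is treated as a failure here, the same as an
error return from pread().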

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 hw/pci-host/igd.c | 65 +++++++++++++++++++++++--------------------------------
 1 file changed, 27 insertions(+), 38 deletions(-)

diff --git a/hw/pci-host/igd.c b/hw/pci-host/igd.c
index 0784128..ec48875 100644
--- a/hw/pci-host/igd.c
+++ b/hw/pci-host/igd.c
@@ -19,47 +19,39 @@ static const IGDHostInfo igd_host_bridge_infos[] = {
     {0xa8, 4},  /* SNB: base of GTT stolen memory */
 };
 
-static int host_pci_config_read(int pos, int len, uint32_t val)
+static void host_pci_config_copy(PCIDevice *guest, const char *host,
+                                 const IGDHostInfo *list, int len, Error **errp)
 {
-    char path[PATH_MAX];
-    int config_fd;
-    ssize_t size = sizeof(path);
-    /* Access real host bridge. */
-    int rc = snprintf(path, size, "/sys/bus/pci/devices/%04x:%02x:%02x.%d/%s",
-                      0, 0, 0, 0, "config");
-    int ret = 0;
+    char *path;
+    int config_fd, rc, i;
 
-    if (rc >= size || rc < 0) {
-        return -ENODEV;
-    }
-
-    config_fd = open(path, O_RDWR);
+    path = g_strdup_printf("/sys/bus/pci/devices/%s/config", host);
+    config_fd = open(path, O_RDONLY);
     if (config_fd < 0) {
-        return -ENODEV;
+        error_setg_file_open(errp, errno, path);
+        goto out_free;
     }
 
-    if (lseek(config_fd, pos, SEEK_SET) != pos) {
-        ret = -errno;
-        goto out;
+    for (i = 0; i < len; i++) {
+        rc = pread(config_fd, guest->config + list[i].offset,
+                   list[i].len, list[i].offset);
+        if (rc != list[i].len) {
+            error_setg_errno(errp, errno, "read %s, offset 0x%x",
+                             path, list[i].offset);
+            goto out_close;
+        }
     }
-    do {
-        rc = read(config_fd, (uint8_t *)&val, len);
-    } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
-    if (rc != len) {
-        ret = -errno;
-    }
-out:
+
+out_close:
     close(config_fd);
-    return ret;
+out_free:
+    g_free(path);
 }
 
 static void (*i440fx_realize)(PCIDevice *pci_dev, Error **errp);
 static void igd_pt_i440fx_realize(PCIDevice *pci_dev, Error **errp)
 {
     Error *err = NULL;
-    uint32_t val = 0;
-    int rc, i, num;
-    int pos, len;
 
     i440fx_realize(pci_dev, &err);
     if (err != NULL) {
@@ -67,16 +59,13 @@ static void igd_pt_i440fx_realize(PCIDevice *pci_dev, Error **errp)
         return;
     }
 
-    num = ARRAY_SIZE(igd_host_bridge_infos);
-    for (i = 0; i < num; i++) {
-        pos = igd_host_bridge_infos[i].offset;
-        len = igd_host_bridge_infos[i].len;
-        rc = host_pci_config_read(pos, len, val);
-        if (rc) {
-            error_setg(errp, "failed to read host config");
-            return;
-        }
-        pci_default_write_config(pci_dev, pos, val, len);
+    host_pci_config_copy(pci_dev, "0000:00:00.0",
+                         igd_host_bridge_infos,
+                         ARRAY_SIZE(igd_host_bridge_infos),
+                         &err);
+    if (err != NULL) {
+        error_propagate(errp, err);
+        return;
     }
 }
 
-- 
1.8.3.1

* [Qemu-devel] [PATCH v3 08/11] igd: add q35 support
@ 2016-01-05 11:41   ` Gerd Hoffmann
  0 siblings, 0 replies; 132+ messages in thread
From: Gerd Hoffmann @ 2016-01-05 11:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	Michael S. Tsirkin, Cao jin, vfio-users, Gerd Hoffmann

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 hw/pci-host/igd.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 hw/pci-host/q35.c |  6 +++++-
 2 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/hw/pci-host/igd.c b/hw/pci-host/igd.c
index ec48875..f6e3f7a 100644
--- a/hw/pci-host/igd.c
+++ b/hw/pci-host/igd.c
@@ -1,5 +1,6 @@
 #include "qemu-common.h"
 #include "hw/pci/pci.h"
+#include "hw/pci-host/q35.h"
 #include "hw/i386/pc.h"
 
 /* IGD Passthrough Host Bridge. */
@@ -76,7 +77,7 @@ static void igd_passthrough_i440fx_class_init(ObjectClass *klass, void *data)
 
     i440fx_realize = k->realize;
     k->realize = igd_pt_i440fx_realize;
-    dc->desc = "IGD Passthrough Host bridge";
+    dc->desc = "IGD Passthrough Host bridge (i440fx)";
 }
 
 static const TypeInfo igd_passthrough_i440fx_info = {
@@ -85,9 +86,47 @@ static const TypeInfo igd_passthrough_i440fx_info = {
     .class_init    = igd_passthrough_i440fx_class_init,
 };
 
+static void (*q35_realize)(PCIDevice *pci_dev, Error **errp);
+static void igd_pt_q35_realize(PCIDevice *pci_dev, Error **errp)
+{
+    Error *err = NULL;
+
+    q35_realize(pci_dev, &err);
+    if (err != NULL) {
+        error_propagate(errp, err);
+        return;
+    }
+
+    host_pci_config_copy(pci_dev, "0000:00:00.0",
+                         igd_host_bridge_infos,
+                         ARRAY_SIZE(igd_host_bridge_infos),
+                         &err);
+    if (err != NULL) {
+        error_propagate(errp, err);
+        return;
+    }
+}
+
+static void igd_passthrough_q35_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+    q35_realize = k->realize;
+    k->realize = igd_pt_q35_realize;
+    dc->desc = "IGD Passthrough Host bridge (q35)";
+}
+
+static const TypeInfo igd_passthrough_q35_info = {
+    .name          = "igd-passthrough-q35-mch",
+    .parent        = TYPE_MCH_PCI_DEVICE,
+    .class_init    = igd_passthrough_q35_class_init,
+};
+
 static void igd_register_types(void)
 {
     type_register_static(&igd_passthrough_i440fx_info);
+    type_register_static(&igd_passthrough_q35_info);
 }
 
 type_init(igd_register_types)
diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index 1fb4707..07dc595 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -151,7 +151,11 @@ static void q35_host_initfn(Object *obj)
     memory_region_init_io(&phb->data_mem, obj, &pci_host_data_le_ops, phb,
                           "pci-conf-data", 4);
 
-    object_initialize(&s->mch, sizeof(s->mch), TYPE_MCH_PCI_DEVICE);
+    if (object_property_get_bool(qdev_get_machine(), "igd-passthru", NULL)) {
+        object_initialize(&s->mch, sizeof(s->mch), "igd-passthrough-q35-mch");
+    } else {
+        object_initialize(&s->mch, sizeof(s->mch), TYPE_MCH_PCI_DEVICE);
+    }
     object_property_add_child(OBJECT(s), "mch", OBJECT(&s->mch), NULL);
     qdev_prop_set_uint32(DEVICE(&s->mch), "addr", PCI_DEVFN(0, 0));
     qdev_prop_set_bit(DEVICE(&s->mch), "multifunction", false);
-- 
1.8.3.1

* [Qemu-devel] [PATCH v3 09/11] igd: move igd-passthrough-isa-bridge to igd.c too
@ 2016-01-05 11:41   ` Gerd Hoffmann
  0 siblings, 0 replies; 132+ messages in thread
From: Gerd Hoffmann @ 2016-01-05 11:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	Michael S. Tsirkin, Cao jin, vfio-users, Paolo Bonzini,
	Richard Henderson, Gerd Hoffmann

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 hw/i386/pc_piix.c | 113 ------------------------------------------------------
 hw/pci-host/igd.c | 108 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 108 insertions(+), 113 deletions(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 6532e32..f36222e 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -914,119 +914,6 @@ static void pc_i440fx_0_10_machine_options(MachineClass *m)
 DEFINE_I440FX_MACHINE(v0_10, "pc-0.10", pc_compat_0_13,
                       pc_i440fx_0_10_machine_options);
 
-typedef struct {
-    uint16_t gpu_device_id;
-    uint16_t pch_device_id;
-    uint8_t pch_revision_id;
-} IGDDeviceIDInfo;
-
-/* In real world different GPU should have different PCH. But actually
- * the different PCH DIDs likely map to different PCH SKUs. We do the
- * same thing for the GPU. For PCH, the different SKUs are going to be
- * all the same silicon design and implementation, just different
- * features turn on and off with fuses. The SW interfaces should be
- * consistent across all SKUs in a given family (eg LPT). But just same
- * features may not be supported.
- *
- * Most of these different PCH features probably don't matter to the
- * Gfx driver, but obviously any difference in display port connections
- * will so it should be fine with any PCH in case of passthrough.
- *
- * So currently use one PCH version, 0x8c4e, to cover all HSW(Haswell)
- * scenarios, 0x9cc3 for BDW(Broadwell).
- */
-static const IGDDeviceIDInfo igd_combo_id_infos[] = {
-    /* HSW Classic */
-    {0x0402, 0x8c4e, 0x04}, /* HSWGT1D, HSWD_w7 */
-    {0x0406, 0x8c4e, 0x04}, /* HSWGT1M, HSWM_w7 */
-    {0x0412, 0x8c4e, 0x04}, /* HSWGT2D, HSWD_w7 */
-    {0x0416, 0x8c4e, 0x04}, /* HSWGT2M, HSWM_w7 */
-    {0x041E, 0x8c4e, 0x04}, /* HSWGT15D, HSWD_w7 */
-    /* HSW ULT */
-    {0x0A06, 0x8c4e, 0x04}, /* HSWGT1UT, HSWM_w7 */
-    {0x0A16, 0x8c4e, 0x04}, /* HSWGT2UT, HSWM_w7 */
-    {0x0A26, 0x8c4e, 0x06}, /* HSWGT3UT, HSWM_w7 */
-    {0x0A2E, 0x8c4e, 0x04}, /* HSWGT3UT28W, HSWM_w7 */
-    {0x0A1E, 0x8c4e, 0x04}, /* HSWGT2UX, HSWM_w7 */
-    {0x0A0E, 0x8c4e, 0x04}, /* HSWGT1ULX, HSWM_w7 */
-    /* HSW CRW */
-    {0x0D26, 0x8c4e, 0x04}, /* HSWGT3CW, HSWM_w7 */
-    {0x0D22, 0x8c4e, 0x04}, /* HSWGT3CWDT, HSWD_w7 */
-    /* HSW Server */
-    {0x041A, 0x8c4e, 0x04}, /* HSWSVGT2, HSWD_w7 */
-    /* HSW SRVR */
-    {0x040A, 0x8c4e, 0x04}, /* HSWSVGT1, HSWD_w7 */
-    /* BSW */
-    {0x1606, 0x9cc3, 0x03}, /* BDWULTGT1, BDWM_w7 */
-    {0x1616, 0x9cc3, 0x03}, /* BDWULTGT2, BDWM_w7 */
-    {0x1626, 0x9cc3, 0x03}, /* BDWULTGT3, BDWM_w7 */
-    {0x160E, 0x9cc3, 0x03}, /* BDWULXGT1, BDWM_w7 */
-    {0x161E, 0x9cc3, 0x03}, /* BDWULXGT2, BDWM_w7 */
-    {0x1602, 0x9cc3, 0x03}, /* BDWHALOGT1, BDWM_w7 */
-    {0x1612, 0x9cc3, 0x03}, /* BDWHALOGT2, BDWM_w7 */
-    {0x1622, 0x9cc3, 0x03}, /* BDWHALOGT3, BDWM_w7 */
-    {0x162B, 0x9cc3, 0x03}, /* BDWHALO28W, BDWM_w7 */
-    {0x162A, 0x9cc3, 0x03}, /* BDWGT3WRKS, BDWM_w7 */
-    {0x162D, 0x9cc3, 0x03}, /* BDWGT3SRVR, BDWM_w7 */
-};
-
-static void isa_bridge_class_init(ObjectClass *klass, void *data)
-{
-    DeviceClass *dc = DEVICE_CLASS(klass);
-    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
-
-    dc->desc        = "ISA bridge faked to support IGD PT";
-    k->vendor_id    = PCI_VENDOR_ID_INTEL;
-    k->class_id     = PCI_CLASS_BRIDGE_ISA;
-};
-
-static TypeInfo isa_bridge_info = {
-    .name          = "igd-passthrough-isa-bridge",
-    .parent        = TYPE_PCI_DEVICE,
-    .instance_size = sizeof(PCIDevice),
-    .class_init = isa_bridge_class_init,
-};
-
-static void pt_graphics_register_types(void)
-{
-    type_register_static(&isa_bridge_info);
-}
-type_init(pt_graphics_register_types)
-
-void igd_passthrough_isa_bridge_create(PCIBus *bus, uint16_t gpu_dev_id)
-{
-    struct PCIDevice *bridge_dev;
-    int i, num;
-    uint16_t pch_dev_id = 0xffff;
-    uint8_t pch_rev_id;
-
-    num = ARRAY_SIZE(igd_combo_id_infos);
-    for (i = 0; i < num; i++) {
-        if (gpu_dev_id == igd_combo_id_infos[i].gpu_device_id) {
-            pch_dev_id = igd_combo_id_infos[i].pch_device_id;
-            pch_rev_id = igd_combo_id_infos[i].pch_revision_id;
-        }
-    }
-
-    if (pch_dev_id == 0xffff) {
-        return;
-    }
-
-    /* Currently IGD drivers always need to access PCH by 1f.0. */
-    bridge_dev = pci_create_simple(bus, PCI_DEVFN(0x1f, 0),
-                                   "igd-passthrough-isa-bridge");
-
-    /*
-     * Note that vendor id is always PCI_VENDOR_ID_INTEL.
-     */
-    if (!bridge_dev) {
-        fprintf(stderr, "set igd-passthrough-isa-bridge failed!\n");
-        return;
-    }
-    pci_config_set_device_id(bridge_dev->config, pch_dev_id);
-    pci_config_set_revision(bridge_dev->config, pch_rev_id);
-}
-
 static void isapc_machine_options(MachineClass *m)
 {
     PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
diff --git a/hw/pci-host/igd.c b/hw/pci-host/igd.c
index f6e3f7a..96b679d 100644
--- a/hw/pci-host/igd.c
+++ b/hw/pci-host/igd.c
@@ -123,10 +123,118 @@ static const TypeInfo igd_passthrough_q35_info = {
     .class_init    = igd_passthrough_q35_class_init,
 };
 
+typedef struct {
+    uint16_t gpu_device_id;
+    uint16_t pch_device_id;
+    uint8_t pch_revision_id;
+} IGDDeviceIDInfo;
+
+/* In real world different GPU should have different PCH. But actually
+ * the different PCH DIDs likely map to different PCH SKUs. We do the
+ * same thing for the GPU. For PCH, the different SKUs are going to be
+ * all the same silicon design and implementation, just different
+ * features turn on and off with fuses. The SW interfaces should be
+ * consistent across all SKUs in a given family (eg LPT). But just same
+ * features may not be supported.
+ *
+ * Most of these different PCH features probably don't matter to the
+ * Gfx driver, but obviously any difference in display port connections
+ * will so it should be fine with any PCH in case of passthrough.
+ *
+ * So currently use one PCH version, 0x8c4e, to cover all HSW(Haswell)
+ * scenarios, 0x9cc3 for BDW(Broadwell).
+ */
+static const IGDDeviceIDInfo igd_combo_id_infos[] = {
+    /* HSW Classic */
+    {0x0402, 0x8c4e, 0x04}, /* HSWGT1D, HSWD_w7 */
+    {0x0406, 0x8c4e, 0x04}, /* HSWGT1M, HSWM_w7 */
+    {0x0412, 0x8c4e, 0x04}, /* HSWGT2D, HSWD_w7 */
+    {0x0416, 0x8c4e, 0x04}, /* HSWGT2M, HSWM_w7 */
+    {0x041E, 0x8c4e, 0x04}, /* HSWGT15D, HSWD_w7 */
+    /* HSW ULT */
+    {0x0A06, 0x8c4e, 0x04}, /* HSWGT1UT, HSWM_w7 */
+    {0x0A16, 0x8c4e, 0x04}, /* HSWGT2UT, HSWM_w7 */
+    {0x0A26, 0x8c4e, 0x06}, /* HSWGT3UT, HSWM_w7 */
+    {0x0A2E, 0x8c4e, 0x04}, /* HSWGT3UT28W, HSWM_w7 */
+    {0x0A1E, 0x8c4e, 0x04}, /* HSWGT2UX, HSWM_w7 */
+    {0x0A0E, 0x8c4e, 0x04}, /* HSWGT1ULX, HSWM_w7 */
+    /* HSW CRW */
+    {0x0D26, 0x8c4e, 0x04}, /* HSWGT3CW, HSWM_w7 */
+    {0x0D22, 0x8c4e, 0x04}, /* HSWGT3CWDT, HSWD_w7 */
+    /* HSW Server */
+    {0x041A, 0x8c4e, 0x04}, /* HSWSVGT2, HSWD_w7 */
+    /* HSW SRVR */
+    {0x040A, 0x8c4e, 0x04}, /* HSWSVGT1, HSWD_w7 */
+    /* BSW */
+    {0x1606, 0x9cc3, 0x03}, /* BDWULTGT1, BDWM_w7 */
+    {0x1616, 0x9cc3, 0x03}, /* BDWULTGT2, BDWM_w7 */
+    {0x1626, 0x9cc3, 0x03}, /* BDWULTGT3, BDWM_w7 */
+    {0x160E, 0x9cc3, 0x03}, /* BDWULXGT1, BDWM_w7 */
+    {0x161E, 0x9cc3, 0x03}, /* BDWULXGT2, BDWM_w7 */
+    {0x1602, 0x9cc3, 0x03}, /* BDWHALOGT1, BDWM_w7 */
+    {0x1612, 0x9cc3, 0x03}, /* BDWHALOGT2, BDWM_w7 */
+    {0x1622, 0x9cc3, 0x03}, /* BDWHALOGT3, BDWM_w7 */
+    {0x162B, 0x9cc3, 0x03}, /* BDWHALO28W, BDWM_w7 */
+    {0x162A, 0x9cc3, 0x03}, /* BDWGT3WRKS, BDWM_w7 */
+    {0x162D, 0x9cc3, 0x03}, /* BDWGT3SRVR, BDWM_w7 */
+};
+
+static void isa_bridge_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+    dc->desc        = "ISA bridge faked to support IGD PT";
+    k->vendor_id    = PCI_VENDOR_ID_INTEL;
+    k->class_id     = PCI_CLASS_BRIDGE_ISA;
+};
+
+static TypeInfo igd_passthrough_isa_bridge_info = {
+    .name          = "igd-passthrough-isa-bridge",
+    .parent        = TYPE_PCI_DEVICE,
+    .instance_size = sizeof(PCIDevice),
+    .class_init = isa_bridge_class_init,
+};
+
+void igd_passthrough_isa_bridge_create(PCIBus *bus, uint16_t gpu_dev_id)
+{
+    struct PCIDevice *bridge_dev;
+    int i, num;
+    uint16_t pch_dev_id = 0xffff;
+    uint8_t pch_rev_id;
+
+    num = ARRAY_SIZE(igd_combo_id_infos);
+    for (i = 0; i < num; i++) {
+        if (gpu_dev_id == igd_combo_id_infos[i].gpu_device_id) {
+            pch_dev_id = igd_combo_id_infos[i].pch_device_id;
+            pch_rev_id = igd_combo_id_infos[i].pch_revision_id;
+        }
+    }
+
+    if (pch_dev_id == 0xffff) {
+        return;
+    }
+
+    /* Currently IGD drivers always need to access PCH by 1f.0. */
+    bridge_dev = pci_create_simple(bus, PCI_DEVFN(0x1f, 0),
+                                   "igd-passthrough-isa-bridge");
+
+    /*
+     * Note that vendor id is always PCI_VENDOR_ID_INTEL.
+     */
+    if (!bridge_dev) {
+        fprintf(stderr, "set igd-passthrough-isa-bridge failed!\n");
+        return;
+    }
+    pci_config_set_device_id(bridge_dev->config, pch_dev_id);
+    pci_config_set_revision(bridge_dev->config, pch_rev_id);
+}
+
 static void igd_register_types(void)
 {
     type_register_static(&igd_passthrough_i440fx_info);
     type_register_static(&igd_passthrough_q35_info);
+    type_register_static(&igd_passthrough_isa_bridge_info);
 }
 
 type_init(igd_register_types)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 132+ messages in thread

* [Qemu-devel] [PATCH v3 10/11] igd: handle igd-passthrough-isa-bridge setup in realize()
@ 2016-01-05 11:41   ` Gerd Hoffmann
  0 siblings, 0 replies; 132+ messages in thread
From: Gerd Hoffmann @ 2016-01-05 11:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	Michael S. Tsirkin, Cao jin, vfio-users, Gerd Hoffmann

That way a simple '-device igd-passthrough-isa-bridge,addr=1f' will
do the setup.

Also, instead of looking up reasonable PCI IDs based on the graphics
device id, simply copy over the ids from the host, thereby reusing the
infrastructure we have in place for the igd host bridges.  Less code,
and it should be more robust, as we no longer have to maintain the id
table to keep things going.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
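A minimal sketch of the table-driven copy described above, for readers
who want to see the idea in isolation.  It is not the code from this
series: host_pci_config_copy() in the patch reads the host device, while
the sketch copies between two in-memory buffers.  The names CfgField,
isa_bridge_fields and copy_cfg_fields are made up for illustration; the
offsets are the standard PCI config header offsets.  The revision ID
register is one byte wide, so the sketch copies one byte there, whereas
the table in the patch uses a length of 2 for every entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard PCI configuration header offsets. */
#define CFG_VENDOR_ID            0x00
#define CFG_DEVICE_ID            0x02
#define CFG_REVISION_ID          0x08
#define CFG_SUBSYSTEM_VENDOR_ID  0x2c
#define CFG_SUBSYSTEM_ID         0x2e
#define CFG_HEADER_SIZE          64

/* Hypothetical stand-in for IGDHostInfo: one config field to copy. */
typedef struct {
    uint8_t offset;
    uint8_t len;
} CfgField;

/* Fields copied from the host bridge at 1f.0 to the emulated one. */
static const CfgField isa_bridge_fields[] = {
    {CFG_VENDOR_ID,           2},
    {CFG_DEVICE_ID,           2},
    {CFG_REVISION_ID,         1},
    {CFG_SUBSYSTEM_VENDOR_ID, 2},
    {CFG_SUBSYSTEM_ID,        2},
};

/* Copy each listed field from the host header into the guest header. */
static void copy_cfg_fields(uint8_t *guest, const uint8_t *host,
                            const CfgField *fields, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        /* Every field must lie inside the 64-byte standard header. */
        assert((size_t)fields[i].offset + fields[i].len <= CFG_HEADER_SIZE);
        memcpy(guest + fields[i].offset, host + fields[i].offset,
               fields[i].len);
    }
}
```

Bytes not named in the table (class code, BARs and so on) are left
untouched in the guest header, which is the point of copying by table
instead of cloning the whole config space.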
 hw/pci-host/igd.c    | 115 +++++++++++++--------------------------------------
 hw/xen/xen_pt.c      |   2 +-
 include/hw/i386/pc.h |   2 +-
 3 files changed, 30 insertions(+), 89 deletions(-)

diff --git a/hw/pci-host/igd.c b/hw/pci-host/igd.c
index 96b679d..8f32c39 100644
--- a/hw/pci-host/igd.c
+++ b/hw/pci-host/igd.c
@@ -123,111 +123,52 @@ static const TypeInfo igd_passthrough_q35_info = {
     .class_init    = igd_passthrough_q35_class_init,
 };
 
-typedef struct {
-    uint16_t gpu_device_id;
-    uint16_t pch_device_id;
-    uint8_t pch_revision_id;
-} IGDDeviceIDInfo;
-
-/* In real world different GPU should have different PCH. But actually
- * the different PCH DIDs likely map to different PCH SKUs. We do the
- * same thing for the GPU. For PCH, the different SKUs are going to be
- * all the same silicon design and implementation, just different
- * features turn on and off with fuses. The SW interfaces should be
- * consistent across all SKUs in a given family (eg LPT). But just same
- * features may not be supported.
- *
- * Most of these different PCH features probably don't matter to the
- * Gfx driver, but obviously any difference in display port connections
- * will so it should be fine with any PCH in case of passthrough.
- *
- * So currently use one PCH version, 0x8c4e, to cover all HSW(Haswell)
- * scenarios, 0x9cc3 for BDW(Broadwell).
- */
-static const IGDDeviceIDInfo igd_combo_id_infos[] = {
-    /* HSW Classic */
-    {0x0402, 0x8c4e, 0x04}, /* HSWGT1D, HSWD_w7 */
-    {0x0406, 0x8c4e, 0x04}, /* HSWGT1M, HSWM_w7 */
-    {0x0412, 0x8c4e, 0x04}, /* HSWGT2D, HSWD_w7 */
-    {0x0416, 0x8c4e, 0x04}, /* HSWGT2M, HSWM_w7 */
-    {0x041E, 0x8c4e, 0x04}, /* HSWGT15D, HSWD_w7 */
-    /* HSW ULT */
-    {0x0A06, 0x8c4e, 0x04}, /* HSWGT1UT, HSWM_w7 */
-    {0x0A16, 0x8c4e, 0x04}, /* HSWGT2UT, HSWM_w7 */
-    {0x0A26, 0x8c4e, 0x06}, /* HSWGT3UT, HSWM_w7 */
-    {0x0A2E, 0x8c4e, 0x04}, /* HSWGT3UT28W, HSWM_w7 */
-    {0x0A1E, 0x8c4e, 0x04}, /* HSWGT2UX, HSWM_w7 */
-    {0x0A0E, 0x8c4e, 0x04}, /* HSWGT1ULX, HSWM_w7 */
-    /* HSW CRW */
-    {0x0D26, 0x8c4e, 0x04}, /* HSWGT3CW, HSWM_w7 */
-    {0x0D22, 0x8c4e, 0x04}, /* HSWGT3CWDT, HSWD_w7 */
-    /* HSW Server */
-    {0x041A, 0x8c4e, 0x04}, /* HSWSVGT2, HSWD_w7 */
-    /* HSW SRVR */
-    {0x040A, 0x8c4e, 0x04}, /* HSWSVGT1, HSWD_w7 */
-    /* BSW */
-    {0x1606, 0x9cc3, 0x03}, /* BDWULTGT1, BDWM_w7 */
-    {0x1616, 0x9cc3, 0x03}, /* BDWULTGT2, BDWM_w7 */
-    {0x1626, 0x9cc3, 0x03}, /* BDWULTGT3, BDWM_w7 */
-    {0x160E, 0x9cc3, 0x03}, /* BDWULXGT1, BDWM_w7 */
-    {0x161E, 0x9cc3, 0x03}, /* BDWULXGT2, BDWM_w7 */
-    {0x1602, 0x9cc3, 0x03}, /* BDWHALOGT1, BDWM_w7 */
-    {0x1612, 0x9cc3, 0x03}, /* BDWHALOGT2, BDWM_w7 */
-    {0x1622, 0x9cc3, 0x03}, /* BDWHALOGT3, BDWM_w7 */
-    {0x162B, 0x9cc3, 0x03}, /* BDWHALO28W, BDWM_w7 */
-    {0x162A, 0x9cc3, 0x03}, /* BDWGT3WRKS, BDWM_w7 */
-    {0x162D, 0x9cc3, 0x03}, /* BDWGT3SRVR, BDWM_w7 */
+static const IGDHostInfo igd_isa_bridge_infos[] = {
+    {PCI_VENDOR_ID,           2},
+    {PCI_DEVICE_ID,           2},
+    {PCI_REVISION_ID,         2},
+    {PCI_SUBSYSTEM_VENDOR_ID, 2},
+    {PCI_SUBSYSTEM_ID,        2},
 };
 
+static void igd_pt_isa_bridge_realize(PCIDevice *pci_dev, Error **errp)
+{
+    Error *err = NULL;
+
+    if (pci_dev->devfn != PCI_DEVFN(0x1f, 0)) {
+        error_setg(errp, "igd isa bridge must have address 1f.0");
+        return;
+    }
+
+    host_pci_config_copy(pci_dev, "0000:00:1f.0",
+                         igd_isa_bridge_infos,
+                         ARRAY_SIZE(igd_isa_bridge_infos),
+                         &err);
+    if (err != NULL) {
+        error_propagate(errp, err);
+        return;
+    }
+}
+
 static void isa_bridge_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
     PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
     dc->desc        = "ISA bridge faked to support IGD PT";
-    k->vendor_id    = PCI_VENDOR_ID_INTEL;
+    k->realize      = igd_pt_isa_bridge_realize;
     k->class_id     = PCI_CLASS_BRIDGE_ISA;
 };
 
 static TypeInfo igd_passthrough_isa_bridge_info = {
     .name          = "igd-passthrough-isa-bridge",
     .parent        = TYPE_PCI_DEVICE,
-    .instance_size = sizeof(PCIDevice),
     .class_init = isa_bridge_class_init,
 };
 
-void igd_passthrough_isa_bridge_create(PCIBus *bus, uint16_t gpu_dev_id)
+void igd_passthrough_isa_bridge_create(PCIBus *bus)
 {
-    struct PCIDevice *bridge_dev;
-    int i, num;
-    uint16_t pch_dev_id = 0xffff;
-    uint8_t pch_rev_id;
-
-    num = ARRAY_SIZE(igd_combo_id_infos);
-    for (i = 0; i < num; i++) {
-        if (gpu_dev_id == igd_combo_id_infos[i].gpu_device_id) {
-            pch_dev_id = igd_combo_id_infos[i].pch_device_id;
-            pch_rev_id = igd_combo_id_infos[i].pch_revision_id;
-        }
-    }
-
-    if (pch_dev_id == 0xffff) {
-        return;
-    }
-
-    /* Currently IGD drivers always need to access PCH by 1f.0. */
-    bridge_dev = pci_create_simple(bus, PCI_DEVFN(0x1f, 0),
-                                   "igd-passthrough-isa-bridge");
-
-    /*
-     * Note that vendor id is always PCI_VENDOR_ID_INTEL.
-     */
-    if (!bridge_dev) {
-        fprintf(stderr, "set igd-passthrough-isa-bridge failed!\n");
-        return;
-    }
-    pci_config_set_device_id(bridge_dev->config, pch_dev_id);
-    pci_config_set_revision(bridge_dev->config, pch_rev_id);
+    pci_create_simple(bus, PCI_DEVFN(0x1f, 0), "igd-passthrough-isa-bridge");
 }
 
 static void igd_register_types(void)
diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index aa96288..18a7f72 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -693,7 +693,7 @@ xen_igd_passthrough_isa_bridge_create(XenPCIPassthroughState *s,
     PCIDevice *d = &s->dev;
 
     gpu_dev_id = dev->device_id;
-    igd_passthrough_isa_bridge_create(d->bus, gpu_dev_id);
+    igd_passthrough_isa_bridge_create(d->bus);
 }
 
 /* destroy. */
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index b0d6283..48cdd03 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -863,5 +863,5 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
     (m)->compat_props = props; \
 } while (0)
 
-extern void igd_passthrough_isa_bridge_create(PCIBus *bus, uint16_t gpu_dev_id);
+extern void igd_passthrough_isa_bridge_create(PCIBus *bus);
 #endif
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 132+ messages in thread

* [Qemu-devel] [PATCH v3 11/11] igd: move igd-passthrough-isa-bridge creation to machine init
@ 2016-01-05 11:41   ` Gerd Hoffmann
  0 siblings, 0 replies; 132+ messages in thread
From: Gerd Hoffmann @ 2016-01-05 11:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	Michael S. Tsirkin, Cao jin, vfio-users, Paolo Bonzini,
	Richard Henderson, Gerd Hoffmann

This patch moves igd-passthrough-isa-bridge creation out of the xen
passthrough code into machine init.  It is triggered by the
igd-passthru=on machine option.  Advantages:

 * This works on both xen and kvm.
 * It is activated for the pc machine type only; q35 has a real
   isa bridge on 1f.0 and must be handled differently.  The q35
   plan is https://lkml.org/lkml/2015/11/26/183 (should land in
   the next merge window, i.e. linux 4.5).
 * If we don't need it any more some day (intel is busy removing
   chipset dependencies from the guest driver), we have a single
   machine switch to just turn off all igd passthru chipset
   tweaks.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
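A small sketch of why the fake bridge can be added on pc but not on
q35, assuming a toy bus model rather than the real QEMU PCI code.  The
DEVFN macros use the same slot/function encoding as PCI_DEVFN(); ToyBus,
toy_bus_init and toy_bus_add are hypothetical names used only here.  On
i440fx the PIIX3 isa bridge sits at 01.0, so 1f.0 is free; on q35 the
real bridge already occupies 1f.0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same encoding as PCI_DEVFN(): 5-bit slot, 3-bit function. */
#define DEVFN(slot, func)  ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define DEVFN_SLOT(devfn)  (((devfn) >> 3) & 0x1f)
#define DEVFN_FUNC(devfn)  ((devfn) & 0x07)

/* Hypothetical toy bus: one flag per devfn, true when occupied. */
typedef struct {
    bool used[256];
} ToyBus;

static void toy_bus_init(ToyBus *bus)
{
    assert(bus != NULL);
    memset(bus->used, 0, sizeof(bus->used));
}

/* Claim a devfn; returns false when a device already lives there. */
static bool toy_bus_add(ToyBus *bus, uint8_t devfn)
{
    assert(bus != NULL);
    if (bus->used[devfn]) {
        return false;
    }
    bus->used[devfn] = true;
    return true;
}
```

With this model, adding DEVFN(0x1f, 0) succeeds on a bus that only has
01.0 populated and fails on a bus where 1f.0 is already taken, which is
the situation the commit message describes for q35.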
 hw/i386/pc_piix.c |  6 ++++++
 hw/xen/xen_pt.c   | 14 --------------
 2 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index f36222e..2afbbd3 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -281,6 +281,12 @@ static void pc_init1(MachineState *machine,
     if (pcmc->pci_enabled) {
         pc_pci_device_init(pci_bus);
     }
+
+#ifdef CONFIG_LINUX
+    if (machine->igd_gfx_passthru) {
+        igd_passthrough_isa_bridge_create(pci_bus);
+    }
+#endif
 }
 
 /* Looking for a pc_compat_2_4() function? It doesn't exist.
diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index 18a7f72..5f626c9 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -685,17 +685,6 @@ static const MemoryListener xen_pt_io_listener = {
     .priority = 10,
 };
 
-static void
-xen_igd_passthrough_isa_bridge_create(XenPCIPassthroughState *s,
-                                      XenHostPCIDevice *dev)
-{
-    uint16_t gpu_dev_id;
-    PCIDevice *d = &s->dev;
-
-    gpu_dev_id = dev->device_id;
-    igd_passthrough_isa_bridge_create(d->bus);
-}
-
 /* destroy. */
 static void xen_pt_destroy(PCIDevice *d) {
 
@@ -810,9 +799,6 @@ static int xen_pt_initfn(PCIDevice *d)
             xen_host_pci_device_put(&s->real_device);
             return -1;
         }
-
-        /* Register ISA bridge for passthrough GFX. */
-        xen_igd_passthrough_isa_bridge_create(s, &s->real_device);
     }
 
     /* Handle real device's MMIO/PIO BARs */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [PATCH v3 00/11] igd passthrough chipset tweaks
  2016-01-05 11:41 ` Gerd Hoffmann
@ 2016-01-05 13:07   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 132+ messages in thread
From: Michael S. Tsirkin @ 2016-01-05 13:07 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	qemu-devel, Cao jin, vfio-users

On Tue, Jan 05, 2016 at 12:41:27PM +0100, Gerd Hoffmann wrote:
>   Hi,
> 
> We have some code in our tree to support pci passthrough of intel
> graphics devices (igd) on xen, which requires some chipset tweaks
> for (a) the host bridge and (b) the lpc/isa-bridge to meet the
> expectations of the guest driver.
> 
> For kvm we need pretty much the same, also the requirements for vgpu
> (xengt/kvmgt) are very similar.  This patch wires up the existing
> support for kvm.  It also brings a bunch of bugfixes and cleanups.
> 
> Unfortunately the oldish laptop I had planned to use for testing turned
> out to have no working iommu support for igd, so this patch series
> has still seen only very light testing.  Any testing feedback is very
> welcome.

I'm very interested to hear about it too, especially given that
config accesses to the host seem completely broken at the moment.


> Testing with kvm/i440fx:
>   Add '-M pc,igd-passthru=on' to turn on the chipset tweaks.
>   Passthrough the igd using vfio.
> 
> Testing with kvm/q35:
>   Add '-M q35,igd-passthru=on' to turn on the chipset tweaks.
>   Pick up the linux kernel patch referenced in patch #11, build a
>   custom kernel with it.  Passthrough the igd using vfio.
> 
> Testing with xen:
>   Existing setups should continue working ;)
> 
> Changes in v3:
>   * Handle igd-passthrough-isa-bridge creation in machine init.
>   * Fix xen build failure.
> 
> Changes in v2:
>   * Added igd-passthrough-isa-bridge support for kvm.
>   * Added patch to drop has_igd_gfx_passthru.
> 
> cheers,
>   Gerd
> 
> Gerd Hoffmann (11):
>   pc: wire up TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE for !xen
>   pc: remove has_igd_gfx_passthru global
>   pc: move igd support code to igd.c
>   igd: switch TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE to realize
>   igd: TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE: call parent realize
>   igd: use defines for standard pci config space offsets
>   igd: revamp host config read
>   igd: add q35 support
>   igd: move igd-passthrough-isa-bridge to igd.c too
>   igd: handle igd-passthrough-isa-bridge setup in realize()
>   igd: move igd-passthrough-isa-bridge creation to machine init
> 
>  hw/i386/pc_piix.c         | 130 +++------------------------------
>  hw/pci-host/Makefile.objs |   3 +
>  hw/pci-host/igd.c         | 181 ++++++++++++++++++++++++++++++++++++++++++++++
>  hw/pci-host/piix.c        |  88 ----------------------
>  hw/pci-host/q35.c         |   6 +-
>  hw/xen/xen_pt.c           |  14 ----
>  hw/xen/xen_pt.h           |   5 +-
>  include/hw/i386/pc.h      |   2 +-
>  vl.c                      |  10 ---
>  9 files changed, 204 insertions(+), 235 deletions(-)
>  create mode 100644 hw/pci-host/igd.c
> 
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [Xen-devel] [PATCH v3 02/11] pc: remove has_igd_gfx_passthru global
  2016-01-05 11:41   ` Gerd Hoffmann
@ 2016-01-06 14:32     ` Stefano Stabellini
  -1 siblings, 0 replies; 132+ messages in thread
From: Stefano Stabellini @ 2016-01-06 14:32 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	qemu-devel, Cao jin, vfio-users, Paolo Bonzini

On Tue, 5 Jan 2016, Gerd Hoffmann wrote:
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>


>  hw/xen/xen_pt.h |  5 +++--
>  vl.c            | 10 ----------
>  2 files changed, 3 insertions(+), 12 deletions(-)
> 
> diff --git a/hw/xen/xen_pt.h b/hw/xen/xen_pt.h
> index 3749711..cdd73ff 100644
> --- a/hw/xen/xen_pt.h
> +++ b/hw/xen/xen_pt.h
> @@ -4,6 +4,7 @@
>  #include "qemu-common.h"
>  #include "hw/xen/xen_common.h"
>  #include "hw/pci/pci.h"
> +#include "hw/boards.h"
>  #include "xen-host-pci-device.h"
>  
>  void xen_pt_log(const PCIDevice *d, const char *f, ...) GCC_FMT_ATTR(2, 3);
> @@ -322,10 +323,10 @@ extern void *pci_assign_dev_load_option_rom(PCIDevice *dev,
>                                              unsigned int domain,
>                                              unsigned int bus, unsigned int slot,
>                                              unsigned int function);
> -extern bool has_igd_gfx_passthru;
>  static inline bool is_igd_vga_passthrough(XenHostPCIDevice *dev)
>  {
> -    return (has_igd_gfx_passthru
> +    MachineState *machine = MACHINE(qdev_get_machine());
> +    return (machine->igd_gfx_passthru
>              && ((dev->class_code >> 0x8) == PCI_CLASS_DISPLAY_VGA));
>  }
>  int xen_pt_register_vga_regions(XenHostPCIDevice *dev);
> diff --git a/vl.c b/vl.c
> index 5aaea77..d4e51ec 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -1365,13 +1365,6 @@ static inline void semihosting_arg_fallback(const char *file, const char *cmd)
>      }
>  }
>  
> -/* Now we still need this for compatibility with XEN. */
> -bool has_igd_gfx_passthru;
> -static void igd_gfx_passthru(void)
> -{
> -    has_igd_gfx_passthru = current_machine->igd_gfx_passthru;
> -}
> -
>  /***********************************************************/
>  /* USB devices */
>  
> @@ -4550,9 +4543,6 @@ int main(int argc, char **argv, char **envp)
>              exit(1);
>      }
>  
> -    /* Check if IGD GFX passthrough. */
> -    igd_gfx_passthru();
> -
>      /* init generic devices */
>      if (qemu_opts_foreach(qemu_find_opts("device"),
>                            device_init_func, NULL, NULL)) {
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [PATCH v3 04/11] igd: switch TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE to realize
  2016-01-05 11:41   ` Gerd Hoffmann
@ 2016-01-06 14:32     ` Stefano Stabellini
  -1 siblings, 0 replies; 132+ messages in thread
From: Stefano Stabellini @ 2016-01-06 14:32 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	qemu-devel, Cao jin, vfio-users

On Tue, 5 Jan 2016, Gerd Hoffmann wrote:
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>


>  hw/pci-host/igd.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/pci-host/igd.c b/hw/pci-host/igd.c
> index ef0273b..d1eeafb 100644
> --- a/hw/pci-host/igd.c
> +++ b/hw/pci-host/igd.c
> @@ -53,7 +53,7 @@ out:
>      return ret;
>  }
>  
> -static int igd_pt_i440fx_initfn(struct PCIDevice *pci_dev)
> +static void igd_pt_i440fx_realize(PCIDevice *pci_dev, Error **errp)
>  {
>      uint32_t val = 0;
>      int rc, i, num;
> @@ -65,12 +65,11 @@ static int igd_pt_i440fx_initfn(struct PCIDevice *pci_dev)
>          len = igd_host_bridge_infos[i].len;
>          rc = host_pci_config_read(pos, len, val);
>          if (rc) {
> -            return -ENODEV;
> +            error_setg(errp, "failed to read host config");
> +            return;
>          }
>          pci_default_write_config(pci_dev, pos, val, len);
>      }
> -
> -    return 0;
>  }
>  
>  static void igd_passthrough_i440fx_class_init(ObjectClass *klass, void *data)
> @@ -78,7 +77,7 @@ static void igd_passthrough_i440fx_class_init(ObjectClass *klass, void *data)
>      DeviceClass *dc = DEVICE_CLASS(klass);
>      PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>  
> -    k->init = igd_pt_i440fx_initfn;
> +    k->realize = igd_pt_i440fx_realize;
>      dc->desc = "IGD Passthrough Host bridge";
>  }
>  
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [PATCH v3 05/11] igd: TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE: call parent realize
@ 2016-01-06 14:41     ` Stefano Stabellini
  0 siblings, 0 replies; 132+ messages in thread
From: Stefano Stabellini @ 2016-01-06 14:41 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	qemu-devel, Cao jin, vfio-users

On Tue, 5 Jan 2016, Gerd Hoffmann wrote:
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>  hw/pci-host/igd.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/hw/pci-host/igd.c b/hw/pci-host/igd.c
> index d1eeafb..6f52ab1 100644
> --- a/hw/pci-host/igd.c
> +++ b/hw/pci-host/igd.c
> @@ -53,12 +53,20 @@ out:
>      return ret;
>  }
>  
> +static void (*i440fx_realize)(PCIDevice *pci_dev, Error **errp);
>  static void igd_pt_i440fx_realize(PCIDevice *pci_dev, Error **errp)
>  {
> +    Error *err = NULL;
>      uint32_t val = 0;
>      int rc, i, num;
>      int pos, len;

Can't we get the parent PCIDeviceClass realize function from pci_dev, so
that we don't have to introduce i440fx_realize?
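
To illustrate the idea, here is a minimal standalone sketch of reaching a
parent's realize through the class hierarchy instead of a saved static
function pointer. ToyClass, ToyDevice, parent_realize and child_realize are
hypothetical names for a simplified model; QEMU's actual QOM types and
helpers differ and are not used here.

```c
#include <stddef.h>

/* Hypothetical, simplified class model; QEMU's QOM types differ. */
typedef struct ToyClass ToyClass;
typedef struct ToyDevice ToyDevice;

struct ToyClass {
    const ToyClass *parent;           /* NULL for the root class */
    int (*realize)(ToyDevice *dev);   /* returns 0 on success */
};

struct ToyDevice {
    const ToyClass *klass;
    int parent_done;
    int child_done;
};

static int parent_realize(ToyDevice *dev)
{
    dev->parent_done = 1;
    return 0;
}

static const ToyClass parent_class = { NULL, parent_realize };

static int child_realize(ToyDevice *dev);
static const ToyClass child_class = { &parent_class, child_realize };

static int child_realize(ToyDevice *dev)
{
    /* Reach the parent's realize through the class hierarchy. */
    int rc = child_class.parent->realize(dev);
    if (rc != 0) {
        return rc;   /* parent failed, skip the child's own setup */
    }
    dev->child_done = 1;
    return 0;
}
```

The sketch resolves the parent from the class that defines child_realize
rather than from dev->klass, since the latter would loop forever if a
further subclass inherited child_realize unchanged.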


> +    i440fx_realize(pci_dev, &err);
> +    if (err != NULL) {
> +        error_propagate(errp, err);
> +        return;
> +    }
> +
>      num = ARRAY_SIZE(igd_host_bridge_infos);
>      for (i = 0; i < num; i++) {
>          pos = igd_host_bridge_infos[i].offset;
> @@ -77,6 +85,7 @@ static void igd_passthrough_i440fx_class_init(ObjectClass *klass, void *data)
>      DeviceClass *dc = DEVICE_CLASS(klass);
>      PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>  
> +    i440fx_realize = k->realize;
>      k->realize = igd_pt_i440fx_realize;
>      dc->desc = "IGD Passthrough Host bridge";
>  }
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [PATCH v3 06/11] igd: use defines for standard pci config space offsets
  2016-01-05 11:41   ` Gerd Hoffmann
@ 2016-01-06 14:43     ` Stefano Stabellini
  -1 siblings, 0 replies; 132+ messages in thread
From: Stefano Stabellini @ 2016-01-06 14:43 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	qemu-devel, Cao jin, vfio-users

On Tue, 5 Jan 2016, Gerd Hoffmann wrote:
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>


>  hw/pci-host/igd.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/pci-host/igd.c b/hw/pci-host/igd.c
> index 6f52ab1..0784128 100644
> --- a/hw/pci-host/igd.c
> +++ b/hw/pci-host/igd.c
> @@ -10,9 +10,9 @@ typedef struct {
>  
>  /* Here we just expose minimal host bridge offset subset. */
>  static const IGDHostInfo igd_host_bridge_infos[] = {
> -    {0x08, 2},  /* revision id */
> -    {0x2c, 2},  /* sybsystem vendor id */
> -    {0x2e, 2},  /* sybsystem id */
> +    {PCI_REVISION_ID,         2},
> +    {PCI_SUBSYSTEM_VENDOR_ID, 2},
> +    {PCI_SUBSYSTEM_ID,        2},
>      {0x50, 2},  /* SNB: processor graphics control register */
>      {0x52, 2},  /* processor graphics control register */
>      {0xa4, 4},  /* SNB: graphics base of stolen memory */
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [PATCH v3 07/11] igd: revamp host config read
  2016-01-05 11:41   ` Gerd Hoffmann
@ 2016-01-06 15:02     ` Stefano Stabellini
  -1 siblings, 0 replies; 132+ messages in thread
From: Stefano Stabellini @ 2016-01-06 15:02 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	qemu-devel, Cao jin, vfio-users

On Tue, 5 Jan 2016, Gerd Hoffmann wrote:
> Move all work to the host_pci_config_copy helper function,
> which we can easily reuse when adding q35 support.
> Open sysfs file only once for all values.  Use pread.
> Proper error handling.  Fix bugs:
> 
>  * Don't throw away results (like old host_pci_config_read
>    did because val was passed by value not reference).
>  * Update config space directly (writing via
>    pci_default_write_config only works for registers
>    whitelisted in wmask).
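
The two bugs listed above can be reproduced in a small standalone sketch.
This is a toy model, not QEMU code: host_config, read_by_value,
read_by_pointer and masked_write are hypothetical names, and masked_write
only imitates the "keep bits outside wmask" behaviour of a masked config
write.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the host's config space; not QEMU code. */
static const uint8_t host_config[4] = { 0x34, 0x12, 0xcd, 0xab };

/* Mirrors the old bug: 'val' is a copy, so the caller never sees the data. */
static int read_by_value(int pos, int len, uint32_t val)
{
    memcpy(&val, host_config + pos, len);   /* fills the local copy only */
    return 0;
}

/* Fixed variant: the caller passes the address of its variable. */
static int read_by_pointer(int pos, int len, uint32_t *val)
{
    *val = 0;
    memcpy(val, host_config + pos, len);
    return 0;
}

/* Toy model of a masked config write: only bits set in wmask change. */
static uint8_t masked_write(uint8_t old, uint8_t val, uint8_t wmask)
{
    return (uint8_t)((old & ~wmask) | (val & wmask));
}
```

With a wmask of 0 (a register the guest may not write), masked_write leaves
the old value in place, which is why copying into the config array directly
is needed for read-only registers.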
> 
> Hmm, this code can hardly ever have worked before,
> /me wonders what test coverage it had.
> 
> With this patch in place igd-passthru=on actually
> works, although it still requires root privileges
> because linux refuses to let non-root users access
> pci config space above offset 0x50.
> 
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>  hw/pci-host/igd.c | 65 +++++++++++++++++++++++--------------------------------
>  1 file changed, 27 insertions(+), 38 deletions(-)
> 
> diff --git a/hw/pci-host/igd.c b/hw/pci-host/igd.c
> index 0784128..ec48875 100644
> --- a/hw/pci-host/igd.c
> +++ b/hw/pci-host/igd.c
> @@ -19,47 +19,39 @@ static const IGDHostInfo igd_host_bridge_infos[] = {
>      {0xa8, 4},  /* SNB: base of GTT stolen memory */
>  };
>  
> -static int host_pci_config_read(int pos, int len, uint32_t val)
> +static void host_pci_config_copy(PCIDevice *guest, const char *host,
> +                                 const IGDHostInfo *list, int len, Error **errp)
>  {
> -    char path[PATH_MAX];
> -    int config_fd;
> -    ssize_t size = sizeof(path);
> -    /* Access real host bridge. */
> -    int rc = snprintf(path, size, "/sys/bus/pci/devices/%04x:%02x:%02x.%d/%s",
> -                      0, 0, 0, 0, "config");
> -    int ret = 0;
> +    char *path;
> +    int config_fd, rc, i;
>  
> -    if (rc >= size || rc < 0) {
> -        return -ENODEV;
> -    }
> -
> -    config_fd = open(path, O_RDWR);
> +    path = g_strdup_printf("/sys/bus/pci/devices/%s/config", host);
> +    config_fd = open(path, O_RDONLY);
>      if (config_fd < 0) {
> -        return -ENODEV;
> +        error_setg_file_open(errp, errno, path);
> +        goto out_free;
>      }
>  
> -    if (lseek(config_fd, pos, SEEK_SET) != pos) {
> -        ret = -errno;
> -        goto out;
> +    for (i = 0; i < len; i++) {
> +        rc = pread(config_fd, guest->config + list[i].offset,
> +                   list[i].len, list[i].offset);
> +        if (rc != list[i].len) {

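pread is allowed to return early with a short count (the number of bytes
actually read), so rc != list[i].len is not necessarily an error, and errno
is not meaningful in that case.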
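
One way to handle that is a small wrapper that keeps reading until the
requested length is complete. The following is a minimal sketch, assuming a
POSIX system; pread_full is a hypothetical helper name, not an existing QEMU
function, and reporting premature end of file as EIO is a choice made for
this sketch.

```c
#define _POSIX_C_SOURCE 200809L   /* for pread() under strict C modes */
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read exactly 'len' bytes at 'offset', retrying on short reads and EINTR.
 * Returns 0 on success, -1 on error (errno set; EIO on premature EOF).
 * pread_full is a hypothetical helper name, not an existing QEMU function.
 */
static int pread_full(int fd, void *buf, size_t len, off_t offset)
{
    uint8_t *p = buf;

    while (len > 0) {
        ssize_t rc = pread(fd, p, len, offset);
        if (rc < 0) {
            if (errno == EINTR) {
                continue;       /* interrupted, try again */
            }
            return -1;          /* real error, errno set by pread */
        }
        if (rc == 0) {
            errno = EIO;        /* hit end of file before 'len' bytes */
            return -1;
        }
        p += rc;
        len -= (size_t)rc;
        offset += rc;
    }
    return 0;
}
```

A caller can then treat any non-zero return as a failure and rely on errno
being valid for the error message.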



> +            error_setg_errno(errp, errno, "read %s, offset 0x%x",
> +                             path, list[i].offset);
> +            goto out_close;
> +        }
>      }
> -    do {
> -        rc = read(config_fd, (uint8_t *)&val, len);
> -    } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
> -    if (rc != len) {
> -        ret = -errno;
> -    }
> -out:
> +
> +out_close:
>      close(config_fd);
> -    return ret;
> +out_free:
> +    g_free(path);
>  }
>  
>  static void (*i440fx_realize)(PCIDevice *pci_dev, Error **errp);
>  static void igd_pt_i440fx_realize(PCIDevice *pci_dev, Error **errp)
>  {
>      Error *err = NULL;
> -    uint32_t val = 0;
> -    int rc, i, num;
> -    int pos, len;
>  
>      i440fx_realize(pci_dev, &err);
>      if (err != NULL) {
> @@ -67,16 +59,13 @@ static void igd_pt_i440fx_realize(PCIDevice *pci_dev, Error **errp)
>          return;
>      }
>  
> -    num = ARRAY_SIZE(igd_host_bridge_infos);
> -    for (i = 0; i < num; i++) {
> -        pos = igd_host_bridge_infos[i].offset;
> -        len = igd_host_bridge_infos[i].len;
> -        rc = host_pci_config_read(pos, len, val);
> -        if (rc) {
> -            error_setg(errp, "failed to read host config");
> -            return;
> -        }
> -        pci_default_write_config(pci_dev, pos, val, len);
> +    host_pci_config_copy(pci_dev, "0000:00:00.0",
> +                         igd_host_bridge_infos,
> +                         ARRAY_SIZE(igd_host_bridge_infos),
> +                         &err);
> +    if (err != NULL) {
> +        error_propagate(errp, err);
> +        return;
>      }
>  }
>  
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [PATCH v3 10/11] igd: handle igd-passthrough-isa-bridge setup in realize()
  2016-01-05 11:41   ` Gerd Hoffmann
@ 2016-01-06 15:29     ` Stefano Stabellini
  -1 siblings, 0 replies; 132+ messages in thread
From: Stefano Stabellini @ 2016-01-06 15:29 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	Michael S. Tsirkin, qemu-devel, Cao jin, vfio-users

On Tue, 5 Jan 2016, Gerd Hoffmann wrote:
> That way a simple '-device igd-passthrough-isa-bridge,addr=1f' will
> do the setup.

Is this going to change the QEMU command line arguments needed to use it?


> Also instead of looking up reasonable PCI IDs based on the graphic
> device id simply copy over the ids from the host, thereby reusing the
> infrastructure we have in place for the igd host bridges.  Less code,
> and should be more robust as we don't have to maintain the id table
> to keep things going.
> 
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>  hw/pci-host/igd.c    | 115 +++++++++++++--------------------------------------
>  hw/xen/xen_pt.c      |   2 +-
>  include/hw/i386/pc.h |   2 +-
>  3 files changed, 30 insertions(+), 89 deletions(-)
> 
> diff --git a/hw/pci-host/igd.c b/hw/pci-host/igd.c
> index 96b679d..8f32c39 100644
> --- a/hw/pci-host/igd.c
> +++ b/hw/pci-host/igd.c
> @@ -123,111 +123,52 @@ static const TypeInfo igd_passthrough_q35_info = {
>      .class_init    = igd_passthrough_q35_class_init,
>  };
>  
> -typedef struct {
> -    uint16_t gpu_device_id;
> -    uint16_t pch_device_id;
> -    uint8_t pch_revision_id;
> -} IGDDeviceIDInfo;
> -
> -/* In real world different GPU should have different PCH. But actually
> - * the different PCH DIDs likely map to different PCH SKUs. We do the
> - * same thing for the GPU. For PCH, the different SKUs are going to be
> - * all the same silicon design and implementation, just different
> - * features turn on and off with fuses. The SW interfaces should be
> - * consistent across all SKUs in a given family (eg LPT). But just same
> - * features may not be supported.
> - *
> - * Most of these different PCH features probably don't matter to the
> - * Gfx driver, but obviously any difference in display port connections
> - * will so it should be fine with any PCH in case of passthrough.
> - *
> - * So currently use one PCH version, 0x8c4e, to cover all HSW(Haswell)
> - * scenarios, 0x9cc3 for BDW(Broadwell).
> - */
> -static const IGDDeviceIDInfo igd_combo_id_infos[] = {
> -    /* HSW Classic */
> -    {0x0402, 0x8c4e, 0x04}, /* HSWGT1D, HSWD_w7 */
> -    {0x0406, 0x8c4e, 0x04}, /* HSWGT1M, HSWM_w7 */
> -    {0x0412, 0x8c4e, 0x04}, /* HSWGT2D, HSWD_w7 */
> -    {0x0416, 0x8c4e, 0x04}, /* HSWGT2M, HSWM_w7 */
> -    {0x041E, 0x8c4e, 0x04}, /* HSWGT15D, HSWD_w7 */
> -    /* HSW ULT */
> -    {0x0A06, 0x8c4e, 0x04}, /* HSWGT1UT, HSWM_w7 */
> -    {0x0A16, 0x8c4e, 0x04}, /* HSWGT2UT, HSWM_w7 */
> -    {0x0A26, 0x8c4e, 0x06}, /* HSWGT3UT, HSWM_w7 */
> -    {0x0A2E, 0x8c4e, 0x04}, /* HSWGT3UT28W, HSWM_w7 */
> -    {0x0A1E, 0x8c4e, 0x04}, /* HSWGT2UX, HSWM_w7 */
> -    {0x0A0E, 0x8c4e, 0x04}, /* HSWGT1ULX, HSWM_w7 */
> -    /* HSW CRW */
> -    {0x0D26, 0x8c4e, 0x04}, /* HSWGT3CW, HSWM_w7 */
> -    {0x0D22, 0x8c4e, 0x04}, /* HSWGT3CWDT, HSWD_w7 */
> -    /* HSW Server */
> -    {0x041A, 0x8c4e, 0x04}, /* HSWSVGT2, HSWD_w7 */
> -    /* HSW SRVR */
> -    {0x040A, 0x8c4e, 0x04}, /* HSWSVGT1, HSWD_w7 */
> -    /* BSW */
> -    {0x1606, 0x9cc3, 0x03}, /* BDWULTGT1, BDWM_w7 */
> -    {0x1616, 0x9cc3, 0x03}, /* BDWULTGT2, BDWM_w7 */
> -    {0x1626, 0x9cc3, 0x03}, /* BDWULTGT3, BDWM_w7 */
> -    {0x160E, 0x9cc3, 0x03}, /* BDWULXGT1, BDWM_w7 */
> -    {0x161E, 0x9cc3, 0x03}, /* BDWULXGT2, BDWM_w7 */
> -    {0x1602, 0x9cc3, 0x03}, /* BDWHALOGT1, BDWM_w7 */
> -    {0x1612, 0x9cc3, 0x03}, /* BDWHALOGT2, BDWM_w7 */
> -    {0x1622, 0x9cc3, 0x03}, /* BDWHALOGT3, BDWM_w7 */
> -    {0x162B, 0x9cc3, 0x03}, /* BDWHALO28W, BDWM_w7 */
> -    {0x162A, 0x9cc3, 0x03}, /* BDWGT3WRKS, BDWM_w7 */
> -    {0x162D, 0x9cc3, 0x03}, /* BDWGT3SRVR, BDWM_w7 */
> +static const IGDHostInfo igd_isa_bridge_infos[] = {
> +    {PCI_VENDOR_ID,           2},
> +    {PCI_DEVICE_ID,           2},
> +    {PCI_REVISION_ID,         2},
> +    {PCI_SUBSYSTEM_VENDOR_ID, 2},
> +    {PCI_SUBSYSTEM_ID,        2},
>  };
>  
> +static void igd_pt_isa_bridge_realize(PCIDevice *pci_dev, Error **errp)
> +{
> +    Error *err = NULL;
> +
> +    if (pci_dev->devfn != PCI_DEVFN(0x1f, 0)) {
> +        error_setg(errp, "igd isa bridge must have address 1f.0");
> +        return;
> +    }
> +
> +    host_pci_config_copy(pci_dev, "0000:00:1f.0",
> +                         igd_isa_bridge_infos,
> +                         ARRAY_SIZE(igd_isa_bridge_infos),
> +                         &err);
> +    if (err != NULL) {
> +        error_propagate(errp, err);
> +        return;
> +    }
> +}
> +
>  static void isa_bridge_class_init(ObjectClass *klass, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(klass);
>      PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>  
>      dc->desc        = "ISA bridge faked to support IGD PT";
> -    k->vendor_id    = PCI_VENDOR_ID_INTEL;
> +    k->realize      = igd_pt_isa_bridge_realize;
>      k->class_id     = PCI_CLASS_BRIDGE_ISA;
>  };
>  
>  static TypeInfo igd_passthrough_isa_bridge_info = {
>      .name          = "igd-passthrough-isa-bridge",
>      .parent        = TYPE_PCI_DEVICE,
> -    .instance_size = sizeof(PCIDevice),
>      .class_init = isa_bridge_class_init,
>  };
>  
> -void igd_passthrough_isa_bridge_create(PCIBus *bus, uint16_t gpu_dev_id)
> +void igd_passthrough_isa_bridge_create(PCIBus *bus)
>  {
> -    struct PCIDevice *bridge_dev;
> -    int i, num;
> -    uint16_t pch_dev_id = 0xffff;
> -    uint8_t pch_rev_id;
> -
> -    num = ARRAY_SIZE(igd_combo_id_infos);
> -    for (i = 0; i < num; i++) {
> -        if (gpu_dev_id == igd_combo_id_infos[i].gpu_device_id) {
> -            pch_dev_id = igd_combo_id_infos[i].pch_device_id;
> -            pch_rev_id = igd_combo_id_infos[i].pch_revision_id;
> -        }
> -    }
> -
> -    if (pch_dev_id == 0xffff) {
> -        return;
> -    }
> -
> -    /* Currently IGD drivers always need to access PCH by 1f.0. */
> -    bridge_dev = pci_create_simple(bus, PCI_DEVFN(0x1f, 0),
> -                                   "igd-passthrough-isa-bridge");
> -
> -    /*
> -     * Note that vendor id is always PCI_VENDOR_ID_INTEL.
> -     */
> -    if (!bridge_dev) {
> -        fprintf(stderr, "set igd-passthrough-isa-bridge failed!\n");
> -        return;
> -    }
> -    pci_config_set_device_id(bridge_dev->config, pch_dev_id);
> -    pci_config_set_revision(bridge_dev->config, pch_rev_id);
> +    pci_create_simple(bus, PCI_DEVFN(0x1f, 0), "igd-passthrough-isa-bridge");
>  }
>  
>  static void igd_register_types(void)
> diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
> index aa96288..18a7f72 100644
> --- a/hw/xen/xen_pt.c
> +++ b/hw/xen/xen_pt.c
> @@ -693,7 +693,7 @@ xen_igd_passthrough_isa_bridge_create(XenPCIPassthroughState *s,
>      PCIDevice *d = &s->dev;
>  
>      gpu_dev_id = dev->device_id;
> -    igd_passthrough_isa_bridge_create(d->bus, gpu_dev_id);
> +    igd_passthrough_isa_bridge_create(d->bus);
>  }
>  
>  /* destroy. */
> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> index b0d6283..48cdd03 100644
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -863,5 +863,5 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
>      (m)->compat_props = props; \
>  } while (0)
>  
> -extern void igd_passthrough_isa_bridge_create(PCIBus *bus, uint16_t gpu_dev_id);
> +extern void igd_passthrough_isa_bridge_create(PCIBus *bus);
>  #endif
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 132+ messages in thread
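
For readers following the patch quoted above, the table-driven copy it
describes can be sketched in isolation.  This is a minimal sketch, not
the code from the series: FieldInfo, isa_bridge_fields,
config_copy_fields and the CFG_* names are hypothetical stand-ins for
IGDHostInfo, igd_isa_bridge_infos, host_pci_config_copy and the PCI_*
constants, and both config spaces are plain in-memory buffers rather
than a sysfs file and a PCIDevice.  The posted table lists two bytes
for PCI_REVISION_ID; the sketch copies one, on the assumption that only
the one-byte revision register is wanted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CFG_SIZE 256  /* size of the legacy PCI config space */

/* Standard config-space offsets from the PCI specification. */
enum {
    CFG_VENDOR_ID           = 0x00,
    CFG_DEVICE_ID           = 0x02,
    CFG_REVISION_ID         = 0x08,
    CFG_SUBSYSTEM_VENDOR_ID = 0x2c,
    CFG_SUBSYSTEM_ID        = 0x2e,
};

/* Hypothetical stand-in for the patch's IGDHostInfo entries. */
typedef struct {
    size_t offset;
    size_t len;
} FieldInfo;

static const FieldInfo isa_bridge_fields[] = {
    { CFG_VENDOR_ID,           2 },
    { CFG_DEVICE_ID,           2 },
    { CFG_REVISION_ID,         1 },  /* assumption: one-byte register */
    { CFG_SUBSYSTEM_VENDOR_ID, 2 },
    { CFG_SUBSYSTEM_ID,        2 },
};

/*
 * Copy only the listed fields from host to guest config space.
 * Returns 0 on success, -1 if an entry would run past the buffer.
 */
static int config_copy_fields(uint8_t *guest, const uint8_t *host,
                              const FieldInfo *list, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (list[i].offset > CFG_SIZE ||
            list[i].len > CFG_SIZE - list[i].offset) {
            return -1;
        }
        memcpy(guest + list[i].offset, host + list[i].offset, list[i].len);
    }
    return 0;
}
```

Everything outside the listed fields is left as the guest device set it
up, which is why the class id can still be assigned in class_init while
the vendor id no longer needs to be.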

* Re: [Qemu-devel] [PATCH v3 11/11] igd: move igd-passthrough-isa-bridge creation to machine init
  2016-01-05 11:41   ` Gerd Hoffmann
@ 2016-01-06 15:36     ` Stefano Stabellini
  -1 siblings, 0 replies; 132+ messages in thread
From: Stefano Stabellini @ 2016-01-06 15:36 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	Michael S. Tsirkin, qemu-devel, Cao jin, vfio-users,
	Paolo Bonzini, Richard Henderson

On Tue, 5 Jan 2016, Gerd Hoffmann wrote:
> This patch moves igd-passthrough-isa-bridge creation out of the xen
> passthrough code into machine init.  It is triggered by the
> igd-passthru=on machine option.  Advantages:
> 
>  * This works on both xen and kvm.
>  * It is activated for the pc machine type only, q35 has a real
>    isa bridge on 1f.0 and must be handled differently.  The q35
>    plan is https://lkml.org/lkml/2015/11/26/183 (should land in
>    the next merge window, i.e. linux 4.5).
>  * If we don't need it any more some day (intel is busy removing
>    chipset dependencies from the guest driver) we have a single
>    machine switch to just turn off all igd passthru chipset
>    tweaks.
> 
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>  hw/i386/pc_piix.c |  6 ++++++
>  hw/xen/xen_pt.c   | 14 --------------
>  2 files changed, 6 insertions(+), 14 deletions(-)
> 
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index f36222e..2afbbd3 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -281,6 +281,12 @@ static void pc_init1(MachineState *machine,
>      if (pcmc->pci_enabled) {
>          pc_pci_device_init(pci_bus);
>      }
> +
> +#ifdef CONFIG_LINUX
> +    if (machine->igd_gfx_passthru) {
> +        igd_passthrough_isa_bridge_create(pci_bus);
> +    }
> +#endif

One thing I don't like about this is that it is going to skip the checks
done in xen_pt_initfn. For example it is going to create the isa bridge,
even if there is going to be an error loading the vga bios or if the
device specified is not even an Intel graphic card.


>  }
>  
>  /* Looking for a pc_compat_2_4() function? It doesn't exist.
> diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
> index 18a7f72..5f626c9 100644
> --- a/hw/xen/xen_pt.c
> +++ b/hw/xen/xen_pt.c
> @@ -685,17 +685,6 @@ static const MemoryListener xen_pt_io_listener = {
>      .priority = 10,
>  };
>  
> -static void
> -xen_igd_passthrough_isa_bridge_create(XenPCIPassthroughState *s,
> -                                      XenHostPCIDevice *dev)
> -{
> -    uint16_t gpu_dev_id;
> -    PCIDevice *d = &s->dev;
> -
> -    gpu_dev_id = dev->device_id;
> -    igd_passthrough_isa_bridge_create(d->bus);
> -}
> -
>  /* destroy. */
>  static void xen_pt_destroy(PCIDevice *d) {
>  
> @@ -810,9 +799,6 @@ static int xen_pt_initfn(PCIDevice *d)
>              xen_host_pci_device_put(&s->real_device);
>              return -1;
>          }
> -
> -        /* Register ISA bridge for passthrough GFX. */
> -        xen_igd_passthrough_isa_bridge_create(s, &s->real_device);
>      }
>  
>      /* Handle real device's MMIO/PIO BARs */
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [PATCH v3 05/11] igd: TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE: call parent realize
  2016-01-06 14:41     ` Stefano Stabellini
@ 2016-01-06 15:45       ` Gerd Hoffmann
  -1 siblings, 0 replies; 132+ messages in thread
From: Gerd Hoffmann @ 2016-01-06 15:45 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: igvt-g, xen-devel, Eduardo Habkost, qemu-devel, Cao jin, vfio-users

> >  
> > +static void (*i440fx_realize)(PCIDevice *pci_dev, Error **errp);
> >  static void igd_pt_i440fx_realize(PCIDevice *pci_dev, Error **errp)
> >  {
> > +    Error *err = NULL;
> >      uint32_t val = 0;
> >      int rc, i, num;
> >      int pos, len;
> 
> Can't we get the parent PCIDeviceClass realize function from pci_dev? So
> that we don't have to introduce i440fx_realize?

I don't think so ...

> >  
> > +    i440fx_realize = k->realize;
> >      k->realize = igd_pt_i440fx_realize;

... because we are overriding it right here.

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 132+ messages in thread
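
The point made in the reply above (the parent's realize has to be saved
because class_init overwrites the slot) can be illustrated with a small
standalone sketch.  Device, DeviceClass and the parent_/child_ names
are hypothetical miniatures, not QEMU types; the sketch only assumes,
as in the quoted hunk, that the child class starts out with the
parent's method pointer already in place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a device class with one virtual method. */
typedef struct Device {
    int parent_ready;
    int child_ready;
} Device;

typedef struct DeviceClass {
    void (*realize)(Device *dev);
} DeviceClass;

/* Parent implementation, installed by the parent's class_init. */
static void parent_realize(Device *dev)
{
    dev->parent_ready = 1;
}

static void parent_class_init(DeviceClass *k)
{
    k->realize = parent_realize;
}

/* Saved parent method; filled in before the slot is overwritten. */
static void (*saved_parent_realize)(Device *dev);

static void child_realize(Device *dev)
{
    /* Chain to the parent first, then do the child's own setup. */
    saved_parent_realize(dev);
    dev->child_ready = 1;
}

static void child_class_init(DeviceClass *k)
{
    /*
     * k->realize still points at the parent's function here.  Save it
     * now: after the next assignment the slot no longer holds it.
     */
    saved_parent_realize = k->realize;
    k->realize = child_realize;
}
```

Once child_class_init has run, the class structure itself only knows
about child_realize, so the saved pointer is the one place the parent's
function can still be reached from in this model.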

* Re: [Qemu-devel] [PATCH v3 07/11] igd: revamp host config read
  2016-01-06 15:02     ` Stefano Stabellini
@ 2016-01-06 15:51       ` Gerd Hoffmann
  -1 siblings, 0 replies; 132+ messages in thread
From: Gerd Hoffmann @ 2016-01-06 15:51 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: igvt-g, xen-devel, Eduardo Habkost, qemu-devel, Cao jin, vfio-users

> > +    for (i = 0; i < len; i++) {
> > +        rc = pread(config_fd, guest->config + list[i].offset,
> > +                   list[i].len, list[i].offset);
> > +        if (rc != list[i].len) {
> 
> pread is allowed to return early, returning the number of bytes read.
> 

This is a sysfs file though, not a socket or pipe where a partial read
makes sense and will actually happen.  If we can't read something
that'll be because the kernel denies access.

So IMHO it should be fine to treat anything which doesn't give us the
amount of bytes we asked for as an error condition.

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [PATCH v3 10/11] igd: handle igd-passthrough-isa-bridge setup in realize()
  2016-01-06 15:29     ` Stefano Stabellini
@ 2016-01-06 15:52       ` Gerd Hoffmann
  -1 siblings, 0 replies; 132+ messages in thread
From: Gerd Hoffmann @ 2016-01-06 15:52 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: igvt-g, xen-devel, Eduardo Habkost, Michael S. Tsirkin,
	qemu-devel, Cao jin, vfio-users

On Wed, 2016-01-06 at 15:29 +0000, Stefano Stabellini wrote:
> On Tue, 5 Jan 2016, Gerd Hoffmann wrote:
> > That way a simple '-device igd-passthrough-isa-bridge,addr=1f' will
> > do the setup.
> 
> Is this going to change the QEMU command line arguments to use it?

See patch 11 ;)

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [Xen-devel] [PATCH v3 07/11] igd: revamp host config read
@ 2016-01-06 16:23         ` Stefano Stabellini
  0 siblings, 0 replies; 132+ messages in thread
From: Stefano Stabellini @ 2016-01-06 16:23 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	qemu-devel, Cao jin, vfio-users

On Wed, 6 Jan 2016, Gerd Hoffmann wrote:
> > > +    for (i = 0; i < len; i++) {
> > > +        rc = pread(config_fd, guest->config + list[i].offset,
> > > +                   list[i].len, list[i].offset);
> > > +        if (rc != list[i].len) {
> > 
> > pread is allowed to return early, returning the number of bytes read.
> > 
> 
> This is a sysfs file though, not a socket or pipe where a partial read
> makes sense and will actually happen.  If we can't read something
> that'll be because the kernel denies access.
> 
> So IMHO it should be fine to treat anything which doesn't give us the
> amount of bytes we asked for as an error condition.

True, but it is still theoretically possible for pread to return early.
Who knows what glibc and Linux are going to do in the future?

^ permalink raw reply	[flat|nested] 132+ messages in thread
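
One way to accommodate both positions in the exchange above is a helper
that retries short reads and still reports an error if the data never
arrives.  This is a minimal sketch under that assumption; pread_full is
a hypothetical name and is not part of the posted series, which checks
the return value of a single pread() call instead.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read exactly len bytes starting at offset, retrying short reads and
 * EINTR.  Returns 0 on success and -1 on failure; a premature end of
 * file is reported with errno set to EIO.
 */
static int pread_full(int fd, void *buf, size_t len, off_t offset)
{
    unsigned char *p = buf;

    while (len > 0) {
        ssize_t rc = pread(fd, p, len, offset);
        if (rc < 0) {
            if (errno == EINTR) {
                continue;       /* interrupted, try again */
            }
            return -1;          /* real error, errno set by pread */
        }
        if (rc == 0) {
            errno = EIO;        /* end of file before len bytes */
            return -1;
        }
        p += rc;
        offset += rc;
        len -= (size_t)rc;
    }
    return 0;
}
```

For a sysfs config file the loop would normally run once, so the cost
over the single-call check is small; whether the extra code is worth it
is the judgment call the two messages disagree on.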

* Re: [Qemu-devel] [PATCH v3 11/11] igd: move igd-passthrough-isa-bridge creation to machine init
  2016-01-06 15:36     ` Stefano Stabellini
@ 2016-01-07  7:38       ` Gerd Hoffmann
  -1 siblings, 0 replies; 132+ messages in thread
From: Gerd Hoffmann @ 2016-01-07  7:38 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: igvt-g, xen-devel, Eduardo Habkost, Michael S. Tsirkin,
	qemu-devel, Cao jin, vfio-users, Paolo Bonzini,
	Richard Henderson

  Hi,

> One thing I don't like about this is that it is going to skip the checks
> done in xen_pt_initfn.

Hmm?  Those checks are still done when you assign an igd ...

> For example it is going to create the isa bridge,
> even if there is going to be an error loading the vga bios or if the
> device specified is not even an Intel graphic card.

Creating the special igd-isa-bridge is no longer tied to actually
assigning an igd, but to the igd-passthru=on machine option being present
(and machine type being 'pc').

xen_pt_initfn checks that igd-passthru=on is set when it finds that an
igd device is assigned; that makes sure the igd-isa-bridge is present.

But, yes, you can create an igd-isa-bridge now even when not assigning an
igd device, either by specifying igd-passthru=on or using -device.  I
fail to see why this is a problem though, care to explain?

Also note that moving this to machine init nicely handles the fact that
the igd-isa-bridge is needed on 'pc' only, not on 'q35'.  If you don't
want to create the igd-isa-bridge in machine init, what is your
alternative suggestion to handle this?

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 132+ messages in thread
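
The ordering described in the message above can be modelled in a few
lines.  This is only a toy model under stated assumptions: Machine,
machine_init and assign_igd are hypothetical names, the 'xenfv' machine
type is not modelled, and the real checks in xen_pt_initfn cover more
than the single option test shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two machine types discussed in the thread. */
typedef enum { MACHINE_PC, MACHINE_Q35 } MachineType;

typedef struct {
    MachineType type;
    bool igd_passthru;       /* the igd-passthru=on machine option */
    bool has_isa_bridge_pt;  /* set once the fake 1f.0 bridge exists */
} Machine;

/* Machine init: the bridge depends only on machine type and option. */
static void machine_init(Machine *m)
{
    m->has_isa_bridge_pt = (m->type == MACHINE_PC && m->igd_passthru);
}

/*
 * Device assignment: an igd is accepted only if the option was given,
 * which on 'pc' implies the bridge was already created at init time.
 */
static bool assign_igd(const Machine *m)
{
    return m->igd_passthru;
}
```

In this model the bridge exists whenever the option is set on 'pc',
whether or not an igd is assigned later, which is the behaviour the
thread goes on to debate.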

* Re: [Qemu-devel] [PATCH v3 11/11] igd: move igd-passthrough-isa-bridge creation to machine init
  2016-01-07  7:38       ` Gerd Hoffmann
@ 2016-01-07 13:10         ` Stefano Stabellini
  -1 siblings, 0 replies; 132+ messages in thread
From: Stefano Stabellini @ 2016-01-07 13:10 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: igvt-g, xen-devel, Eduardo Habkost, Michael S. Tsirkin,
	Andrew Cooper, Stefano Stabellini, qemu-devel, Jan Beulich,
	Cao jin, vfio-users, Paolo Bonzini, Richard Henderson

CC'ing the Xen x86 maintainers

On Thu, 7 Jan 2016, Gerd Hoffmann wrote:
>   Hi,
> 
> > One thing I don't like about this is that it is going to skip the checks
> > done in xen_pt_initfn.
> 
> Hmm?  Those checks are still done when you assign a igd ...

Their failure doesn't affect the creation of the bridge.


> > For example it is going to create the isa bridge,
> > even if there is going to be an error loading the vga bios or if the
> > device specified is not even an Intel graphic card.
> 
> Creating the special igd-isa-bridge is no longer tied to actually
> assigning a igd, but to the igd-passthru=on machine option being present
> (and machine type being 'pc').

and machine type 'xenfv', unless I am mistaken?


> xen_pt_initfn checks that igd-passthru=on is set in case it finds a igd
> device is assigned, that will make sure the igd-isa-bridge is present.
> 
> But, yes, you can create a igd-isa-bridge now even when not assigning a
> igd device, either by specifying igd-passthru=on or using -device.  I
> fail to see why this is a problem though, care to explain?

It is going to change the PCI layout of any virtual machines with a
config file containing

gfx_passthru="igd"

and no pci config line. A Xen 4.7 user could add gfx_passthru="igd" to
all her VM config files, because actually it does nothing unless an
Intel graphics card is assigned to the VM. With this change she couldn't
migrate her VMs from Xen 4.7 to Xen 4.8 safely, because suddenly a new
bridge appears.

That said, Xen 4.7 hasn't been released yet, so we could still change
the intended behaviour. But it would require a bit of coordination (the
qemu-xen tree in 4.7 is based on v2.4.1).



> Also note that moving this to machine init nicely handles the fact that
> the igd-isa-bridge is needed on 'pc' only, not on 'q35'.  If you don't
> want create the igd-isa-bridge in machine init, what is your alternative
> suggestion to handle this?

Maybe we could retain the check whether an Intel graphics card has been
assigned?

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [PATCH v3 11/11] igd: move igd-passthrough-isa-bridge creation to machine init
@ 2016-01-07 15:50           ` Gerd Hoffmann
  0 siblings, 0 replies; 132+ messages in thread
From: Gerd Hoffmann @ 2016-01-07 15:50 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: igvt-g, xen-devel, Eduardo Habkost, Michael S. Tsirkin,
	Andrew Cooper, qemu-devel, Jan Beulich, Cao jin, vfio-users,
	Paolo Bonzini, Richard Henderson

On Do, 2016-01-07 at 13:10 +0000, Stefano Stabellini wrote:
> CC'ing the Xen x86 maintainers
> 
> On Thu, 7 Jan 2016, Gerd Hoffmann wrote:
> >   Hi,
> > 
> > > One thing I don't like about this is that it is going to skip the checks
> > > done in xen_pt_initfn.
> > 
> > Hmm?  Those checks are still done when you assign a igd ...
> 
> Their failure doesn't affect the creation of the bridge.

Doesn't their failure make qemu throw a fatal error and exit?
So the guest isn't going to run either way?

> > > For example it is going to create the isa bridge,
> > > even if there is going to be an error loading the vga bios or if the
> > > device specified is not even an Intel graphic card.
> > 
> > Creating the special igd-isa-bridge is no longer tied to actually
> > assigning a igd, but to the igd-passthru=on machine option being present
> > (and machine type being 'pc').
> 
> and machine type 'xenfv', unless I am mistaken?

Yes, xenfv too (it uses i440fx too and thus is a 'pc' derivative).

> > xen_pt_initfn checks that igd-passthru=on is set in case it finds a igd
> > device is assigned, that will make sure the igd-isa-bridge is present.
> > 
> > But, yes, you can create a igd-isa-bridge now even when not assigning a
> > igd device, either by specifying igd-passthru=on or using -device.  I
> > fail to see why this is a problem though, care to explain?
> 
> It is going to change the PCI layout of any virtual machines with a
> config file containing
> 
> gfx_passthru="igd"
> 
> and no pci config line. A Xen 4.7 user could add gfx_passthru="igd" to
> all her VM config files, because actually it does nothing unless an
> Intel graphic card is assigned to the VM.

No.  It changes the host bridge even when not passing through an igd,
because that is linked to igd-passthru=on only.

So making both host bridge tweak and isa bridge tweak triggered by
igd-passthru=on brings more consistency to the whole thing.
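
A small model of the two behaviours, as a sketch in plain C.  This is
my reading of the discussion for i440fx-based machines, not code from
the series; the Tweaks struct and the tweaks_old()/tweaks_new() helpers
are made up for illustration.

```c
#include <stdbool.h>

/* Which chipset tweaks end up visible to the guest (hypothetical model). */
typedef struct {
    bool host_bridge_tweak;
    bool isa_bridge_tweak;
} Tweaks;

/*
 * Old behaviour, as described in this thread: the host bridge follows
 * the machine option alone, the isa bridge also needs an igd assigned.
 */
static Tweaks tweaks_old(bool igd_passthru, bool igd_assigned)
{
    Tweaks t = { igd_passthru, igd_passthru && igd_assigned };
    return t;
}

/* New behaviour: both tweaks follow the machine option alone. */
static Tweaks tweaks_new(bool igd_passthru, bool igd_assigned)
{
    Tweaks t = { igd_passthru, igd_passthru };
    (void)igd_assigned;   /* no longer part of the decision */
    return t;
}

/* True if the guest would see the same set of tweaks in both cases. */
static bool tweaks_equal(Tweaks a, Tweaks b)
{
    return a.host_bridge_tweak == b.host_bridge_tweak &&
           a.isa_bridge_tweak == b.isa_bridge_tweak;
}
```

The only input where old and new differ is igd-passthru=on without an
igd assigned: the host bridge is tweaked in both cases, the isa bridge
in the new one only.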

> > Also note that moving this to machine init nicely handles the fact that
> > the igd-isa-bridge is needed on 'pc' only, not on 'q35'.  If you don't
> > want create the igd-isa-bridge in machine init, what is your alternative
> > suggestion to handle this?
> 
> Maybe we could retain the check whether an Intel graphic card has been
> assigned? 

Should be possible, but is not that easy due to initialization order
issues.
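
A rough sketch of the ordering problem in plain C.  It assumes that
machine init runs before the passed-through device is created; the
Model struct and the model_*() functions are hypothetical and only
mimic the two phases, they are not QEMU code.

```c
#include <stdbool.h>

/* Hypothetical model: records what each phase can see. */
typedef struct {
    bool igd_passthru;        /* machine option, known from the start */
    bool igd_assigned;        /* becomes true only once the device exists */
    bool isa_bridge_created;
} Model;

/* Phase 1: machine init.  Only the machine option is reliable here. */
static void model_machine_init(Model *m, bool check_assigned)
{
    bool want = m->igd_passthru;

    if (check_assigned) {
        /* igd_assigned has not been set yet, so this check always fails. */
        want = want && m->igd_assigned;
    }
    m->isa_bridge_created = want;
}

/* Phase 2: device creation, e.g. the passed-through igd. */
static void model_create_devices(Model *m, bool assign_igd)
{
    m->igd_assigned = assign_igd;
}

/* Run both phases in the assumed order and return the final state. */
static Model model_boot(bool igd_passthru, bool assign_igd, bool check_assigned)
{
    Model m = { .igd_passthru = igd_passthru };

    model_machine_init(&m, check_assigned);
    model_create_devices(&m, assign_igd);
    return m;
}
```

In this model, retaining the "igd assigned?" check in machine init means
the bridge never gets created, even when an igd is assigned later on.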

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [PATCH v3 11/11] igd: move igd-passthrough-isa-bridge creation to machine init
  2016-01-07 15:50           ` Gerd Hoffmann
@ 2016-01-08 11:20             ` Stefano Stabellini
  -1 siblings, 0 replies; 132+ messages in thread
From: Stefano Stabellini @ 2016-01-08 11:20 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: igvt-g, xen-devel, Eduardo Habkost, Michael S. Tsirkin,
	Andrew Cooper, Stefano Stabellini, qemu-devel, Jan Beulich,
	Cao jin, vfio-users, Paolo Bonzini, Richard Henderson

On Thu, 7 Jan 2016, Gerd Hoffmann wrote:
> On Do, 2016-01-07 at 13:10 +0000, Stefano Stabellini wrote:
> > CC'ing the Xen x86 maintainers
> > 
> > On Thu, 7 Jan 2016, Gerd Hoffmann wrote:
> > >   Hi,
> > > 
> > > > One thing I don't like about this is that it is going to skip the checks
> > > > done in xen_pt_initfn.
> > > 
> > > Hmm?  Those checks are still done when you assign a igd ...
> > 
> > Their failure doesn't affect the creation of the bridge.
> 
> Doesn't their failure makes qemu throw a fatal error and exit?
> So the guest isn't going to run either way?

No, it doesn't. QEMU doesn't even print an error message.


> > > > For example it is going to create the isa bridge,
> > > > even if there is going to be an error loading the vga bios or if the
> > > > device specified is not even an Intel graphic card.
> > > 
> > > Creating the special igd-isa-bridge is no longer tied to actually
> > > assigning a igd, but to the igd-passthru=on machine option being present
> > > (and machine type being 'pc').
> > 
> > and machine type 'xenfv', unless I am mistaken?
> 
> Yes, xenfv too (uses i440fx too and thus is a 'pc' derivate).

Good


> > > xen_pt_initfn checks that igd-passthru=on is set in case it finds a igd
> > > device is assigned, that will make sure the igd-isa-bridge is present.
> > > 
> > > But, yes, you can create a igd-isa-bridge now even when not assigning a
> > > igd device, either by specifying igd-passthru=on or using -device.  I
> > > fail to see why this is a problem though, care to explain?
> > 
> > It is going to change the PCI layout of any virtual machines with a
> > config file containing
> > 
> > gfx_passthru="igd"
> > 
> > and no pci config line. A Xen 4.7 user could add gfx_passthru="igd" to
> > all her VM config files, because actually it does nothing unless an
> > Intel graphic card is assigned to the VM.
> 
> No.  It changes the host bridge even when not passing through a igd,
> because that is linked to igd-passthru=on only.
>
> So making both host bridge tweak and isa bridge tweak triggered by
> igd-passthru=on brings more consistency to the whole thing.

That is true. Given that the only qemu-xen codebase with igd support is
4.7 and 4.7 hasn't been released yet, I am OK with changing the
guest-visible PCI layout. I might ask for your help in backporting the patches
;-)


> > > Also note that moving this to machine init nicely handles the fact that
> > > the igd-isa-bridge is needed on 'pc' only, not on 'q35'.  If you don't
> > > want create the igd-isa-bridge in machine init, what is your alternative
> > > suggestion to handle this?
> > 
> > Maybe we could retain the check whether an Intel graphic card has been
> > assigned? 
> 
> Should be possible, but is not that easy due to initialization order
> issues.

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [PATCH v3 11/11] igd: move igd-passthrough-isa-bridge creation to machine init
@ 2016-01-08 12:12               ` Stefano Stabellini
  0 siblings, 0 replies; 132+ messages in thread
From: Stefano Stabellini @ 2016-01-08 12:12 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: igvt-g, xen-devel, Eduardo Habkost, Michael S. Tsirkin,
	Andrew Cooper, xudong.hao, qemu-devel, Jan Beulich, Cao jin,
	Gerd Hoffmann, Paolo Bonzini, Richard Henderson, vfio-users

On Fri, 8 Jan 2016, Stefano Stabellini wrote:
> > > > xen_pt_initfn checks that igd-passthru=on is set in case it finds a igd
> > > > device is assigned, that will make sure the igd-isa-bridge is present.
> > > > 
> > > > But, yes, you can create a igd-isa-bridge now even when not assigning a
> > > > igd device, either by specifying igd-passthru=on or using -device.  I
> > > > fail to see why this is a problem though, care to explain?
> > > 
> > > It is going to change the PCI layout of any virtual machines with a
> > > config file containing
> > > 
> > > gfx_passthru="igd"
> > > 
> > > and no pci config line. A Xen 4.7 user could add gfx_passthru="igd" to
> > > all her VM config files, because actually it does nothing unless an
> > > Intel graphic card is assigned to the VM.
> > 
> > No.  It changes the host bridge even when not passing through a igd,
> > because that is linked to igd-passthru=on only.
> >
> > So making both host bridge tweak and isa bridge tweak triggered by
> > igd-passthru=on brings more consistency to the whole thing.
> 
> That is true. Given that the only qemu-xen codebase with igd support is
> 4.7 and 4.7 hasn't been released yet, I am OK with changing the guest
> visible PCI layout. I might ask for your help in backporting the patches
> ;-)

One thing that I forgot to consider is that QEMU 2.5 has been released
with igd passthrough too and Xen 4.6 + QEMU 2.5 is a combination we
should support.

However, QEMU 2.5 has a serious bug
(http://marc.info/?l=qemu-devel&m=145172165010604) which probably
prevents igd passthrough from working at all. I asked Xudong to
investigate. I am thinking that if the feature works in 2.5, we need to
support it, therefore we cannot break migration by changing the PCI
layout.  Otherwise if the feature doesn't work, we could take the
liberty to make the change.  Do you agree?

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [PATCH v3 11/11] igd: move igd-passthrough-isa-bridge creation to machine init
  2016-01-08 12:12               ` Stefano Stabellini
@ 2016-01-08 12:32                 ` Gerd Hoffmann
  -1 siblings, 0 replies; 132+ messages in thread
From: Gerd Hoffmann @ 2016-01-08 12:32 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: igvt-g, xen-devel, Eduardo Habkost, Michael S. Tsirkin,
	Andrew Cooper, xudong.hao, qemu-devel, Jan Beulich, Cao jin,
	vfio-users, Paolo Bonzini, Richard Henderson

  Hi,

> > That is true. Given that the only qemu-xen codebase with igd support is
> > 4.7 and 4.7 hasn't been released yet, I am OK with changing the guest
> > visible PCI layout. I might ask for your help in backporting the patches
> > ;-)

What are the 4.7 release plans btw?

> One thing that I forgot to consider is that QEMU 2.5 has been released
> with igd passthrough too and Xen 4.6 + QEMU 2.5 is a combination we
> should support.
> 
> However QEMU 2.5 has a serious bug
> (http://marc.info/?l=qemu-devel&m=145172165010604) which probably
> prevents igd passthrough from working at all.

Stumbled over that one too, so my series has a (different) fix for it as
well.

> I asked Xudong to investigate.  I am thinking that if the feature
> works in 2.5, we need to support it, therefore we cannot break
> migration by changing the PCI layout.

I'd expect it is broken (at least for older guests).  In case 2.5 works
fine as-is, we should be able to ditch
TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE altogether, since the bug makes
it a no-op in 2.5.

But let's wait for the test results ...

> Otherwise if the feature doesn't work, we could take the liberty to
> make the change.  Do you agree?

Yes.

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [PATCH v3 11/11] igd: move igd-passthrough-isa-bridge creation to machine init
  2016-01-08 12:32                 ` Gerd Hoffmann
@ 2016-01-08 12:38                   ` Stefano Stabellini
  -1 siblings, 0 replies; 132+ messages in thread
From: Stefano Stabellini @ 2016-01-08 12:38 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: igvt-g, xen-devel, Eduardo Habkost, Michael S. Tsirkin,
	Andrew Cooper, Stefano Stabellini, xudong.hao, qemu-devel,
	Jan Beulich, Cao jin, vfio-users, Paolo Bonzini,
	Richard Henderson

On Fri, 8 Jan 2016, Gerd Hoffmann wrote:
>   Hi,
> 
> > > That is true. Given that the only qemu-xen codebase with igd support is
> > > 4.7 and 4.7 hasn't been released yet, I am OK with changing the guest
> > > visible PCI layout. I might ask for your help in backporting the patches
> > > ;-)
> 
> What are the 4.7 release plans btw?

This is the last development update from the release manager:

http://marc.info/?l=qemu-devel&m=145172165010604

* Last posting date: March 18, 2016
* Hard code freeze: April 1, 2016
* RC1: TBD
* Release: June 3, 2016

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [PATCH v3 02/11] pc: remove has_igd_gfx_passthru global
  2016-01-05 11:41   ` Gerd Hoffmann
@ 2016-01-19 15:09     ` Eduardo Habkost
  -1 siblings, 0 replies; 132+ messages in thread
From: Eduardo Habkost @ 2016-01-19 15:09 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: igvt-g, xen-devel, Stefano Stabellini, qemu-devel, Cao jin,
	vfio-users, Paolo Bonzini

On Tue, Jan 05, 2016 at 12:41:29PM +0100, Gerd Hoffmann wrote:
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>

-- 
Eduardo

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [PATCH v3 05/11] igd: TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE: call parent realize
@ 2016-01-19 15:13         ` Eduardo Habkost
  0 siblings, 0 replies; 132+ messages in thread
From: Eduardo Habkost @ 2016-01-19 15:13 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: igvt-g, xen-devel, Stefano Stabellini, qemu-devel, Cao jin, vfio-users

On Wed, Jan 06, 2016 at 04:45:01PM +0100, Gerd Hoffmann wrote:
> > >  
> > > +static void (*i440fx_realize)(PCIDevice *pci_dev, Error **errp);
> > >  static void igd_pt_i440fx_realize(PCIDevice *pci_dev, Error **errp)
> > >  {
> > > +    Error *err = NULL;
> > >      uint32_t val = 0;
> > >      int rc, i, num;
> > >      int pos, len;
> > 
> > Can't we get the parent PCIDeviceClass realize function from pci_dev? So
> > that we don't have to introduce i440fx_realize?
> 
> I don't think so ...
> 
> > >  
> > > +    i440fx_realize = k->realize;
> > >      k->realize = igd_pt_i440fx_realize;
> 
> ... because we are overriding it right here.

Many device classes have a parent_realize field so they can keep
a pointer to the original realize function. It's better than a
static variable.
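
A standalone sketch of that pattern, using mock types only.  Dev,
DevClass, IgdClass and the functions below are made up for illustration
and are much simpler than QEMU's real PCIDevice/PCIDeviceClass and QOM
machinery.

```c
#include <stddef.h>

/* Mock types, for illustration only. */
typedef struct Error { const char *msg; } Error;
typedef struct Dev Dev;
typedef void (*RealizeFn)(Dev *dev, Error **errp);

typedef struct DevClass {
    RealizeFn realize;
} DevClass;

/* Subclass class struct: embeds the parent class, saves its realize. */
typedef struct IgdClass {
    DevClass parent_class;      /* must stay the first member */
    RealizeFn parent_realize;   /* replaces the static variable */
} IgdClass;

struct Dev {
    DevClass *klass;
    int parent_realized;
    int child_realized;
};

/* Stand-in for the parent's (i440fx) realize function. */
static void base_realize(Dev *dev, Error **errp)
{
    (void)errp;
    dev->parent_realized = 1;
}

static void igd_realize(Dev *dev, Error **errp)
{
    /* Valid cast because parent_class is the first member of IgdClass. */
    IgdClass *k = (IgdClass *)dev->klass;
    Error *err = NULL;

    k->parent_realize(dev, &err);   /* chain to the parent first */
    if (err != NULL) {
        if (errp != NULL) {
            *errp = err;            /* hand the error to the caller */
        }
        return;
    }
    dev->child_realized = 1;        /* igd specific setup would go here */
}

static void igd_class_init(IgdClass *k)
{
    k->parent_realize = k->parent_class.realize;   /* remember the original */
    k->parent_class.realize = igd_realize;         /* then override it */
}
```

The saved pointer lives in the class struct, so it is per-class state
rather than a file-scope static.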

-- 
Eduardo

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [PATCH v3 05/11] igd: TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE: call parent realize
@ 2016-01-20  9:10           ` Gerd Hoffmann
  0 siblings, 0 replies; 132+ messages in thread
From: Gerd Hoffmann @ 2016-01-20  9:10 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: igvt-g, xen-devel, Stefano Stabellini, qemu-devel, Cao jin, vfio-users

[-- Attachment #1: Type: text/plain, Size: 413 bytes --]

  Hi,

> > > > +    i440fx_realize = k->realize;
> > > >      k->realize = igd_pt_i440fx_realize;
> > 
> > ... because we are overriding it right here.
> 
> Many device classes have a parent_realize field so they can keep
> a pointer to the original realize function. It's better than a
> static variable.

How does the attached patch (incremental fix, not tested yet) look?

cheers,
  Gerd


[-- Attachment #2: 0001-fixup-TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE-realize.patch --]
[-- Type: text/x-patch, Size: 2398 bytes --]

From 3d110e297b5182107e055db3ab69092affdef5bb Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann <kraxel@redhat.com>
Date: Wed, 20 Jan 2016 10:08:19 +0100
Subject: [PATCH] [fixup] TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE realize_parent

---
 hw/pci-host/igd.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/hw/pci-host/igd.c b/hw/pci-host/igd.c
index 160828f..2d51745 100644
--- a/hw/pci-host/igd.c
+++ b/hw/pci-host/igd.c
@@ -49,12 +49,24 @@ out_free:
     g_free(path);
 }
 
-static void (*i440fx_realize)(PCIDevice *pci_dev, Error **errp);
+#define IGD_PT_I440FX_CLASS(class)                              \
+    OBJECT_CLASS_CHECK(IGDPtI440fxClass, (class),               \
+                       TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE)
+#define IGD_PT_I440FX_GET_CLASS(obj)                            \
+    OBJECT_GET_CLASS(IGDPtI440fxClass, (obj),                   \
+                     TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE)
+
+typedef struct IGDPtI440fxClass {
+    PCIDeviceClass parent_class;
+    void (*parent_realize)(PCIDevice *dev, Error **errp);
+} IGDPtI440fxClass;
+
 static void igd_pt_i440fx_realize(PCIDevice *pci_dev, Error **errp)
 {
+    IGDPtI440fxClass *k = IGD_PT_I440FX_GET_CLASS(pci_dev);
     Error *err = NULL;
 
-    i440fx_realize(pci_dev, &err);
+    k->parent_realize(pci_dev, &err);
     if (err != NULL) {
         error_propagate(errp, err);
         return;
@@ -72,11 +84,12 @@ static void igd_pt_i440fx_realize(PCIDevice *pci_dev, Error **errp)
 
 static void igd_passthrough_i440fx_class_init(ObjectClass *klass, void *data)
 {
+    IGDPtI440fxClass *k = IGD_PT_I440FX_CLASS(klass);
     DeviceClass *dc = DEVICE_CLASS(klass);
-    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+    PCIDeviceClass *pc = PCI_DEVICE_CLASS(klass);
 
-    i440fx_realize = k->realize;
-    k->realize = igd_pt_i440fx_realize;
+    k->parent_realize = pc->realize;
+    pc->realize = igd_pt_i440fx_realize;
     dc->desc = "IGD Passthrough Host bridge (i440fx)";
 }
 
@@ -84,6 +97,7 @@ static const TypeInfo igd_passthrough_i440fx_info = {
     .name          = TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE,
     .parent        = TYPE_I440FX_PCI_DEVICE,
     .class_init    = igd_passthrough_i440fx_class_init,
+    .class_size    = sizeof(IGDPtI440fxClass),
 };
 
 static void (*q35_realize)(PCIDevice *pci_dev, Error **errp);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [PATCH v3 04/11] igd: switch TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE to realize
  2016-01-05 11:41   ` Gerd Hoffmann
@ 2016-01-23 14:51     ` Eduardo Habkost
  -1 siblings, 0 replies; 132+ messages in thread
From: Eduardo Habkost @ 2016-01-23 14:51 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: igvt-g, xen-devel, Stefano Stabellini, qemu-devel, Cao jin, vfio-users

On Tue, Jan 05, 2016 at 12:41:31PM +0100, Gerd Hoffmann wrote:
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>  hw/pci-host/igd.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/pci-host/igd.c b/hw/pci-host/igd.c
> index ef0273b..d1eeafb 100644
> --- a/hw/pci-host/igd.c
> +++ b/hw/pci-host/igd.c
> @@ -53,7 +53,7 @@ out:
>      return ret;
>  }
>  
> -static int igd_pt_i440fx_initfn(struct PCIDevice *pci_dev)
> +static void igd_pt_i440fx_realize(PCIDevice *pci_dev, Error **errp)
>  {
>      uint32_t val = 0;
>      int rc, i, num;
> @@ -65,12 +65,11 @@ static int igd_pt_i440fx_initfn(struct PCIDevice *pci_dev)
>          len = igd_host_bridge_infos[i].len;
>          rc = host_pci_config_read(pos, len, val);
>          if (rc) {
> -            return -ENODEV;
> +            error_setg(errp, "failed to read host config");
> +            return;
>          }
>          pci_default_write_config(pci_dev, pos, val, len);
>      }
> -
> -    return 0;
>  }
>  
>  static void igd_passthrough_i440fx_class_init(ObjectClass *klass, void *data)
> @@ -78,7 +77,7 @@ static void igd_passthrough_i440fx_class_init(ObjectClass *klass, void *data)
>      DeviceClass *dc = DEVICE_CLASS(klass);
>      PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>  
> -    k->init = igd_pt_i440fx_initfn;
> +    k->realize = igd_pt_i440fx_realize;

I am trying to understand how this has ever worked before:

* PCIDeviceClass::init is called by pci_default_realize()
  (default value for PCIDeviceClass::realize)
* i440fx_class_init() overrides PCIDeviceClass::realize
  to i440fx_realize()

So, when exactly was igd_pt_i440fx_initfn() being called, before
this series? I don't have a Xen host to be able to test it using
xenfv, and if I test "-machine pc,igd-passthru=on" after
applying patch 01/11, I don't see igd_pt_i440fx_initfn() being
called at all.
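
The dispatch problem described above can be modelled in a few lines of
self-contained C.  This is a sketch under assumptions, not QEMU code:
DevClass, base_class_init() and friends are hypothetical names that only
mimic the two bullet points (the default realize calls init; the host
bridge class replaces realize with one that does not).

```c
#include <assert.h>
#include <stddef.h>

typedef struct Dev Dev;

typedef struct DevClass {
    int  (*init)(Dev *d);      /* legacy hook */
    void (*realize)(Dev *d);   /* newer hook  */
} DevClass;

struct Dev {
    const DevClass *klass;
    int init_calls;
    int host_realized;
};

/* Default realize: the only place the legacy init hook is invoked. */
static void default_realize(Dev *d)
{
    if (d->klass->init) {
        d->klass->init(d);
    }
}

/* Host bridge realize: does its own work and never calls init. */
static void host_realize(Dev *d)
{
    d->host_realized = 1;
}

static int igd_init(Dev *d)
{
    d->init_calls++;
    return 0;
}

static void base_class_init(DevClass *k)
{
    k->init = NULL;
    k->realize = default_realize;
}

/* Parent class: overrides realize. */
static void host_class_init(DevClass *k)
{
    base_class_init(k);
    k->realize = host_realize;
}

/* Subclass of the host bridge that only sets the legacy init hook. */
static void igd_class_init(DevClass *k)
{
    host_class_init(k);
    k->init = igd_init;
}

/* Control case: same init hook, but derived directly from the base. */
static void plain_class_init(DevClass *k)
{
    base_class_init(k);
    k->init = igd_init;
}

/* Realizes one device of the given class; returns how often init ran. */
static int count_init_calls(void (*class_init)(DevClass *))
{
    DevClass k;
    Dev d = { &k, 0, 0 };

    class_init(&k);
    d.klass->realize(&d);
    return d.init_calls;
}
```

In this model the subclass's init hook runs once when it derives from
the base class and never when it derives from the class that overrides
realize, which matches the observation that the init function is not
reached.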

-- 
Eduardo

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [PATCH v3 05/11] igd: TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE: call parent realize
  2016-01-20  9:10           ` Gerd Hoffmann
@ 2016-01-23 14:52             ` Eduardo Habkost
  -1 siblings, 0 replies; 132+ messages in thread
From: Eduardo Habkost @ 2016-01-23 14:52 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: igvt-g, xen-devel, Stefano Stabellini, qemu-devel, Cao jin, vfio-users

On Wed, Jan 20, 2016 at 10:10:11AM +0100, Gerd Hoffmann wrote:
>   Hi,
> 
> > > > > +    i440fx_realize = k->realize;
> > > > >      k->realize = igd_pt_i440fx_realize;
> > > 
> > > ... because we are overriding it right here.
> > 
> > Many device classes have a parent_realize field so they can keep
> > a pointer to the original realize function. It's better than a
> > static variable.
> 
> How does the attached patch (incremental fix, not tested yet) look?

Looks good.

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>

But, I have a similar question to the one I had about patch
04/11: how did this ever work before?

Does that mean i440fx_realize() was never called when
creating/initializing a TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE
before?

-- 
Eduardo

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [PATCH v3 04/11] igd: switch TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE to realize
  2016-01-23 14:51     ` Eduardo Habkost
@ 2016-01-25  8:59       ` Gerd Hoffmann
  -1 siblings, 0 replies; 132+ messages in thread
From: Gerd Hoffmann @ 2016-01-25  8:59 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: igvt-g, xen-devel, Stefano Stabellini, qemu-devel, Cao jin, vfio-users

  Hi,

> >  static void igd_passthrough_i440fx_class_init(ObjectClass *klass, void *data)
> > @@ -78,7 +77,7 @@ static void igd_passthrough_i440fx_class_init(ObjectClass *klass, void *data)
> >      DeviceClass *dc = DEVICE_CLASS(klass);
> >      PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> >  
> > -    k->init = igd_pt_i440fx_initfn;
> > +    k->realize = igd_pt_i440fx_realize;
> 
> I am trying to understand how this has ever worked before:
> 
> * PCIDeviceClass::init is called by pci_default_realize()
>   (default value for PCIDeviceClass::realize)
> * i440fx_class_init() overrides PCIDeviceClass::realize
>   to i440fx_realize()
> 
> So, when exactly was igd_pt_i440fx_initfn() being called, before
> this series?

It simply wasn't?

I suspect this got ported over from the qemu-xen tree, but wasn't really
tested and also not adapted to commit "9af21db pci: Trivial device model
conversions to realize".  So this patch actually is yet another
bugfix ...

Current test status of this series (by Hao) is this:

  * newer linux intel drivers work just fine without chipset tweaks.
  * older linux intel drivers and windows guests don't work even with
    all the fixes in this series.

So applying the series doesn't improve things at all (code cleanups
aside).  My current plan to go forward is:

  (a) get test hardware (wip atm).
  (b) go figure what is really needed, lots of testing.
  (c) rework and repost series.

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [PATCH v3 04/11] igd: switch TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE to realize
  2016-01-25  8:59       ` Gerd Hoffmann
@ 2016-01-25 11:53         ` Stefano Stabellini
  -1 siblings, 0 replies; 132+ messages in thread
From: Stefano Stabellini @ 2016-01-25 11:53 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	qemu-devel, Cao jin, vfio-users

On Mon, 25 Jan 2016, Gerd Hoffmann wrote:
>   Hi,
> 
> > >  static void igd_passthrough_i440fx_class_init(ObjectClass *klass, void *data)
> > > @@ -78,7 +77,7 @@ static void igd_passthrough_i440fx_class_init(ObjectClass *klass, void *data)
> > >      DeviceClass *dc = DEVICE_CLASS(klass);
> > >      PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> > >  
> > > -    k->init = igd_pt_i440fx_initfn;
> > > +    k->realize = igd_pt_i440fx_realize;
> > 
> > I am trying to understand how this has ever worked before:
> > 
> > * PCIDeviceClass::init is called by pci_default_realize()
> >   (default value for PCIDeviceClass::realize)
> > * i440fx_class_init() overrides PCIDeviceClass::realize
> >   to i440fx_realize()
> > 
> > So, when exactly was igd_pt_i440fx_initfn() being called, before
> > this series?
> 
> It simply wasn't?
> 
> I suspect this got ported over from the qemu-xen tree, but wasn't really
> tested and also not adapted to commit "9af21db pci: Trivial device model
> conversions to realize".  So this patch actually is yet another
> bugfix ...

You are probably right. For your reference, the original code is here:

http://xenbits.xen.org/gitweb/?p=qemu-xen-traditional.git;a=blob_plain;f=hw/pt-graphics.c;hb=HEAD

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
  2016-01-05 11:41 ` Gerd Hoffmann
@ 2016-01-28 19:35   ` Alex Williamson
  -1 siblings, 0 replies; 132+ messages in thread
From: Alex Williamson @ 2016-01-28 19:35 UTC (permalink / raw)
  To: Gerd Hoffmann, qemu-devel
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini, Cao jin,
	vfio-users

On Tue, 2016-01-05 at 12:41 +0100, Gerd Hoffmann wrote:
>   Hi,
> 
> We have some code in our tree to support pci passthrough of intel
> graphics devices (igd) on xen, which requires some chipset tweaks
> for (a) the host bridge and (b) the lpc/isa-bridge to meet the
> expectations of the guest driver.
> 
> For kvm we need pretty much the same, also the requirements for vgpu
> (xengt/kvmgt) are very similar.  This patch wires up the existing
> support for kvm.  It also brings a bunch of bugfixes and cleanups.
> 
> Unfortunately the oldish laptop I had planned to use for testing turned
> out to have no working iommu support for igd, so this patch series
> still has seen very light testing only.  Any testing feedback is very
> welcome.
> 

Hi Gerd,

I believe I have working code for getting the IGD OpRegion from the host
into QEMU using a vfio device specific region, but now comes the question
of how to expose it to the VM, and I'm looking for suggestions.
Effectively in vfio-pci I have a MemoryRegion that can access the host
OpRegion.  We can map that directly into the guest, map it read-only
into the guest, or we can read out the contents and have our own virtual
version of it.  So let me throw out the options, some of which come from
you, and we can hammer out which way to go.

1) The OpRegion MemoryRegion is mapped into system_memory through
programming of the 0xFC config space register.
 a) vfio-pci could pick an address to do this as it is realized.
 b) SeaBIOS/OVMF could program this.

Discussion: 1.a) Avoids any BIOS dependency, but vfio-pci would need to
pick an address and mark it as e820 reserved.  I'm not sure how to pick
that address.  We'd probably want to make the 0xFC config register
read-only.  1.b) has the issue you mentioned where in most cases the
OpRegion will be 8K, but the BIOS won't know how much address space it's
mapping into system memory when it writes the 0xFC register.  I don't
know how much of a problem this is since the BIOS can easily determine
the size once mapped and re-map it somewhere there's sufficient space.
Practically, it seems like it's always going to be 8K.  This of course
requires modification to every BIOS.  It also leaves the 0xFC register
as a mapping control rather than a pointer to the OpRegion in RAM, which
doesn't really match real hardware.  The BIOS would need to pick an
address in this case.

2) Read-only mappings version of 1)

Discussion: Really nothing changes from the issues above, just prevents
any possibility of the guest modifying anything in the host.  Xen
apparently allows write access to the host page already.

3) Copy OpRegion contents into buffer and do either 1) or 2) above.

Discussion: No benefit that I can see over the above, other than maybe
allowing write access that doesn't affect the host.

4) Copy contents into a guest RAM location, mark it reserved, point to
it via 0xFC config as scratch register.
 a) Done by QEMU (vfio-pci)
 b) Done by SeaBIOS/OVMF

Discussion: This is the most like real hardware.  4.a) has the usual
issue of how to pick an address, but the benefit of not requiring BIOS
changes (simply mark the RAM reserved via existing methods).  4.b) would
require passing a buffer containing the contents of the OpRegion via
fw_cfg and letting the BIOS do the setup.  The latter of course requires
modifying each BIOS for this support.
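
As a rough illustration of option 4.a), here is a self-contained C model
of "copy the contents into guest RAM and point to it via 0xFC".  It is a
sketch under assumptions, not vfio-pci code: GuestModel,
place_opregion() and OPREGION_PTR_REG are hypothetical names, the
pointer is assumed to be a 32-bit little-endian value, and marking the
range reserved is left as a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CFG_SIZE         256
#define OPREGION_PTR_REG 0xFC   /* config offset named in the thread */

/* Hypothetical guest model: flat RAM plus emulated PCI config space. */
typedef struct GuestModel {
    uint8_t *ram;
    size_t   ram_size;
    uint8_t  config[CFG_SIZE];
} GuestModel;

/* Copies the OpRegion image to guest-physical address gpa and stores
 * gpa in the 0xFC register.  Returns 0 on success, -1 if the range does
 * not fit in guest RAM (in which case nothing is modified). */
static int place_opregion(GuestModel *g, uint32_t gpa,
                          const uint8_t *src, size_t len)
{
    if (gpa > g->ram_size || len > g->ram_size - gpa) {
        return -1;
    }
    memcpy(g->ram + gpa, src, len);
    /* A real implementation would also mark [gpa, gpa + len) as
     * reserved (e820) so the guest OS does not reuse the RAM. */
    g->config[OPREGION_PTR_REG + 0] = (uint8_t)(gpa & 0xff);
    g->config[OPREGION_PTR_REG + 1] = (uint8_t)((gpa >> 8) & 0xff);
    g->config[OPREGION_PTR_REG + 2] = (uint8_t)((gpa >> 16) & 0xff);
    g->config[OPREGION_PTR_REG + 3] = (uint8_t)((gpa >> 24) & 0xff);
    return 0;
}

/* What a guest driver would do: read the pointer back from 0xFC. */
static uint32_t read_opregion_ptr(const GuestModel *g)
{
    return (uint32_t)g->config[OPREGION_PTR_REG + 0]
         | (uint32_t)g->config[OPREGION_PTR_REG + 1] << 8
         | (uint32_t)g->config[OPREGION_PTR_REG + 2] << 16
         | (uint32_t)g->config[OPREGION_PTR_REG + 3] << 24;
}

/* Returns 1 if placement, bounds checking and read-back all behave. */
static int demo_place_opregion(void)
{
    static uint8_t ram[64 * 1024];
    uint8_t opregion[8 * 1024];
    GuestModel g = { ram, sizeof(ram), {0} };

    memset(opregion, 0xA5, sizeof(opregion));
    if (place_opregion(&g, 0x4000, opregion, sizeof(opregion)) != 0) {
        return 0;
    }
    /* 0xF000 + 8K would run past the end of the 64K RAM: must fail. */
    if (place_opregion(&g, 0xF000, opregion, sizeof(opregion)) == 0) {
        return 0;
    }
    return read_opregion_ptr(&g) == 0x4000
        && ram[0x4000] == 0xA5 && ram[0x5FFF] == 0xA5
        && ram[0x6000] == 0;
}
```

The guest only ever sees a pointer into its own RAM here, which is why
this variant is described as the closest to real hardware; the choice of
gpa is exactly the open question raised for 4.a).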

Of course none of these support hotplug, nor really can they, since
reserved memory regions are not dynamic in the architecture.

In all cases, some piece of software needs to know where it can place
the OpRegion in guest memory.  There are advantages and disadvantages
either way, whether that's done by QEMU or the BIOS, but we only need
to do it once if it's QEMU.  Suggestions, comments, preferences?
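
On the size question from 1.b), whoever places the OpRegion can read its
size from the region's own header once it is accessible.  The sketch
below assumes the header layout used by the Linux i915 and vfio-pci
drivers (a 16-byte "IntelGraphicsMem" signature followed by a
little-endian 32-bit size in KiB); that layout is not stated in this
thread, and opregion_size_bytes() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed header layout: 16-byte signature, then LE32 size in KiB. */
#define OPREGION_SIGNATURE     "IntelGraphicsMem"
#define OPREGION_SIGNATURE_LEN 16
#define OPREGION_SIZE_OFFSET   16

/* Returns the OpRegion size in bytes, or 0 if the signature is wrong. */
static uint32_t opregion_size_bytes(const uint8_t *base)
{
    uint32_t kib;

    if (memcmp(base, OPREGION_SIGNATURE, OPREGION_SIGNATURE_LEN) != 0) {
        return 0;
    }
    kib = (uint32_t)base[OPREGION_SIZE_OFFSET + 0]
        | (uint32_t)base[OPREGION_SIZE_OFFSET + 1] << 8
        | (uint32_t)base[OPREGION_SIZE_OFFSET + 2] << 16
        | (uint32_t)base[OPREGION_SIZE_OFFSET + 3] << 24;
    return kib * 1024u;
}

/* Builds a fake header advertising 8 KiB and parses it. */
static uint32_t demo_opregion_size(void)
{
    uint8_t hdr[20] = {0};

    memcpy(hdr, OPREGION_SIGNATURE, OPREGION_SIGNATURE_LEN);
    hdr[OPREGION_SIZE_OFFSET] = 8;
    return opregion_size_bytes(hdr);
}

/* A header without the signature is rejected. */
static uint32_t demo_bad_signature(void)
{
    uint8_t hdr[20] = {0};

    return opregion_size_bytes(hdr);
}
```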

Another thing I notice in this series is the access to PCI config space
of both the host bridge and the LPC bridge.  This prevents unprivileged
use cases and is a barrier to libvirt support since it will need to
provide access to the pci-sysfs files for the process.  Should vfio add
additional device specific regions to expose the config space of these
other devices?  I don't see that there's any write access necessary, so
these would be read-only.  The comment in the kernel regarding why an
unprivileged user can only access standard config space indicates that
some devices lock up if unimplemented config space is accessed.  It seems
like that's probably not an issue for recent-ish Intel host bridges and
LPC devices.  If OpRegion, host bridge config, and LPC config were all
provided through vfio, would there be any need for igd-passthrough
switches on the machine type?  It seems like the QEMU vfio-pci driver
could enable the necessary features and pre-fill the host and LPC bridge
config items on demand when parsing an IGD device.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
  2016-01-28 19:35   ` Alex Williamson
@ 2016-01-29  2:22     ` Kay, Allen M
  -1 siblings, 0 replies; 132+ messages in thread
From: Kay, Allen M @ 2016-01-29  2:22 UTC (permalink / raw)
  To: Alex Williamson, Gerd Hoffmann, qemu-devel
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini, Cao jin,
	vfio-users



> -----Original Message-----
> From: iGVT-g [mailto:igvt-g-bounces@lists.01.org] On Behalf Of Alex
> Williamson
> Sent: Thursday, January 28, 2016 11:36 AM
> To: Gerd Hoffmann; qemu-devel@nongnu.org
> Cc: igvt-g@ml01.01.org; xen-devel@lists.xensource.com; Eduardo Habkost;
> Stefano Stabellini; Cao jin; vfio-users@redhat.com
> Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset
> tweaks
> 
> 
> 1) The OpRegion MemoryRegion is mapped into system_memory through
> programming of the 0xFC config space register.
>  a) vfio-pci could pick an address to do this as it is realized.
>  b) SeaBIOS/OVMF could program this.
> 
> Discussion: 1.a) Avoids any BIOS dependency, but vfio-pci would need to pick
> an address and mark it as e820 reserved.  I'm not sure how to pick that
> address.  We'd probably want to make the 0xFC config register read-
> only.  1.b) has the issue you mentioned where in most cases the OpRegion
> will be 8k, but the BIOS won't know how much address space it's mapping
> into system memory when it writes the 0xFC register.  I don't know how
> much of a problem this is since the BIOS can easily determine the size once
> mapped and re-map it somewhere there's sufficient space.
> Practically, it seems like it's always going to be 8K.  This of course requires
> modification to every BIOS.  It also leaves the 0xFC register as a mapping
> control rather than a pointer to the OpRegion in RAM, which doesn't really
> match real hardware.  The BIOS would need to pick an address in this case.
> 
> 2) Read-only mappings version of 1)
> 
> Discussion: Really nothing changes from the issues above, just prevents any
> possibility of the guest modifying anything in the host.  Xen apparently allows
> write access to the host page already.
> 
> 3) Copy OpRegion contents into buffer and do either 1) or 2) above.
> 
> Discussion: No benefit that I can see over above other than maybe allowing
> write access that doesn't affect the host.
> 
> 4) Copy contents into a guest RAM location, mark it reserved, point to it via
> 0xFC config as scratch register.
>  a) Done by QEMU (vfio-pci)
>  b) Done by SeaBIOS/OVMF
> 
> Discussion: This is the most like real hardware.  4.a) has the usual issue of
> how to pick an address, but the benefit of not requiring BIOS changes (simply
> mark the RAM reserved via existing methods).  4.b) would require passing a
> buffer containing the contents of the OpRegion via fw_cfg and letting the
> BIOS do the setup.  The latter of course requires modifying each BIOS for this
> support.
> 
> Of course none of these support hotplug nor really can they since reserved
> memory regions are not dynamic in the architecture.
> 
> In all cases, some piece of software needs to know where it can place the
> OpRegion in guest memory.  It seems like there are advantages or
> disadvantages whether that's done by QEMU or the BIOS, but we only need
> to do it once if it's QEMU.  Suggestions, comments, preferences?
> 

Hi Alex, another thing to consider is how to communicate to the guest
driver that the address at 0xFC contains a valid GPA that the driver can
access without causing an EPT fault, since the same driver will be used
on other hypervisors, which may not EPT-map the OpRegion memory.  One
idea proposed by the display driver team is to set bit 0 of the address
to 1 to indicate that the OpRegion memory can be safely accessed by the
guest driver.

> 
> Another thing I notice in this series is the access to PCI config space of both
> the host bridge and the LPC bridge.  This prevents unprivileged use cases and
> is a barrier to libvirt support since it will need to provide access to the pci-
> sysfs files for the process.  Should vfio add additional device specific regions
> to expose the config space of these other devices?  I don't see that there's
> any write access necessary, so these would be read-only.  The comment in
> the kernel regarding why an unprivileged user can only access standard
> config space indicates that some devices lockup if unimplemented config
> space is accessed.  It seems like that's probably not an issue for recent-ish
> Intel host bridges and LPC devices.  If OpRegion, host bridge config, and LPC
> config were all provided through vfio, would there be any need for igd-
> passthrough switches on the machine type?  It seems like the QEMU vfio-pci
> driver could enable the necessary features and pre-fill the host and LPC
> bridge config items on demand when parsing an IGD device.  Thanks,
> 
> Alex
> 
Allen

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
  2016-01-29  2:22     ` Kay, Allen M
@ 2016-01-29  2:54       ` Alex Williamson
  -1 siblings, 0 replies; 132+ messages in thread
From: Alex Williamson @ 2016-01-29  2:54 UTC (permalink / raw)
  To: Kay, Allen M, Gerd Hoffmann, qemu-devel
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini, Cao jin,
	vfio-users

On Fri, 2016-01-29 at 02:22 +0000, Kay, Allen M wrote:
> 
> > -----Original Message-----
> > From: iGVT-g [mailto:igvt-g-bounces@lists.01.org] On Behalf Of Alex
> > Williamson
> > Sent: Thursday, January 28, 2016 11:36 AM
> > To: Gerd Hoffmann; qemu-devel@nongnu.org
> > Cc: igvt-g@ml01.01.org; xen-devel@lists.xensource.com; Eduardo Habkost;
> > Stefano Stabellini; Cao jin; vfio-users@redhat.com
> > Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset
> > tweaks
> > 
> > 
> > 1) The OpRegion MemoryRegion is mapped into system_memory through
> > programming of the 0xFC config space register.
> >  a) vfio-pci could pick an address to do this as it is realized.
> >  b) SeaBIOS/OVMF could program this.
> > 
> > Discussion: 1.a) Avoids any BIOS dependency, but vfio-pci would need to pick
> > an address and mark it as e820 reserved.  I'm not sure how to pick that
> > address.  We'd probably want to make the 0xFC config register read-
> > only.  1.b) has the issue you mentioned where in most cases the OpRegion
> > will be 8k, but the BIOS won't know how much address space it's mapping
> > into system memory when it writes the 0xFC register.  I don't know how
> > much of a problem this is since the BIOS can easily determine the size once
> > mapped and re-map it somewhere there's sufficient space.
> > Practically, it seems like it's always going to be 8K.  This of course requires
> > modification to every BIOS.  It also leaves the 0xFC register as a mapping
> > control rather than a pointer to the OpRegion in RAM, which doesn't really
> > match real hardware.  The BIOS would need to pick an address in this case.
> > 
> > 2) Read-only mappings version of 1)
> > 
> > Discussion: Really nothing changes from the issues above, just prevents any
> > possibility of the guest modifying anything in the host.  Xen apparently allows
> > write access to the host page already.
> > 
> > 3) Copy OpRegion contents into buffer and do either 1) or 2) above.
> > 
> > Discussion: No benefit that I can see over above other than maybe allowing
> > write access that doesn't affect the host.
> > 
> > 4) Copy contents into a guest RAM location, mark it reserved, point to it via
> > 0xFC config as scratch register.
> >  a) Done by QEMU (vfio-pci)
> >  b) Done by SeaBIOS/OVMF
> > 
> > Discussion: This is the most like real hardware.  4.a) has the usual issue of
> > how to pick an address, but the benefit of not requiring BIOS changes (simply
> > mark the RAM reserved via existing methods).  4.b) would require passing a
> > buffer containing the contents of the OpRegion via fw_cfg and letting the
> > BIOS do the setup.  The latter of course requires modifying each BIOS for this
> > support.
> > 
> > Of course none of these support hotplug nor really can they since reserved
> > memory regions are not dynamic in the architecture.
> > 
> > In all cases, some piece of software needs to know where it can place the
> > OpRegion in guest memory.  It seems like there are advantages or
> > disadvantages whether that's done by QEMU or the BIOS, but we only need
> > to do it once if it's QEMU.  Suggestions, comments, preferences?
> > 
> 
> Hi Alex, another thing to consider is how to communicate to the guest driver the address at 0xFC contains a valid GPA address that can be accessed by the driver without causing a EPT fault - since
> the same driver will be used on other hypervisors and they may not EPT map OpRegion memory.  On idea proposed by display driver team is to set bit0 of the address to 1 for indicating OpRegion memory
> can be safely accessed by the guest driver.

Hi Allen,

Why is that any different than a guest accessing any other memory area
that it shouldn't?  The OpRegion starts with a 16-byte ID string, so if
the guest finds that string, it should feel fairly confident the OpRegion
data is valid.  The published spec also seems to define all bits of 0xfc
as valid, not implying any sort of alignment requirements, and the i915
driver does a memremap directly on the value read from 0xfc.  So I'm not
sure whether there's really a need to, or ability to, define any of those
bits in an ad hoc way to indicate mapping.  If we do things right, the
guest driver shouldn't even know it's running in a VM, at least for the
KVMGT-d case, so we need to be compatible with physical hardware.  Thanks,

Alex
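
For illustration only, the ID-string check mentioned above could look
roughly like the sketch below.  It assumes the 16-byte signature is
"IntelGraphicsMem" at offset 0 of the OpRegion; the helper name
opregion_signature_ok() is made up for this example and is not taken
from i915 or QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed 16-byte ID string at the start of the OpRegion. */
#define OPREGION_SIGNATURE "IntelGraphicsMem"
#define OPREGION_SIG_LEN   16

/*
 * Hypothetical helper: returns true if the buffer is large enough to
 * hold the ID string and begins with it.  'base' points at memory the
 * caller has already mapped from the address found in the 0xfc register.
 */
static bool opregion_signature_ok(const uint8_t *base, size_t len)
{
    assert(base != NULL);

    if (len < OPREGION_SIG_LEN) {
        return false;
    }
    return memcmp(base, OPREGION_SIGNATURE, OPREGION_SIG_LEN) == 0;
}
```

A driver that maps whatever 0xfc points at and then runs a check like
this needs no extra flag bit in the register to decide whether the data
is usable.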

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
  2016-01-29  2:54       ` Alex Williamson
@ 2016-01-29  6:21         ` Jike Song
  -1 siblings, 0 replies; 132+ messages in thread
From: Jike Song @ 2016-01-29  6:21 UTC (permalink / raw)
  To: Alex Williamson
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini, Kay,
	Allen M, qemu-devel, Cao jin, Gerd Hoffmann, vfio-users

On 01/29/2016 10:54 AM, Alex Williamson wrote:
> On Fri, 2016-01-29 at 02:22 +0000, Kay, Allen M wrote:
>>  
>>> -----Original Message-----
>>> From: iGVT-g [mailto:igvt-g-bounces@lists.01.org] On Behalf Of Alex
>>> Williamson
>>> Sent: Thursday, January 28, 2016 11:36 AM
>>> To: Gerd Hoffmann; qemu-devel@nongnu.org
>>> Cc: igvt-g@ml01.01.org; xen-devel@lists.xensource.com; Eduardo Habkost;
>>> Stefano Stabellini; Cao jin; vfio-users@redhat.com
>>> Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset
>>> tweaks
>>>  
>>>  
>>> 1) The OpRegion MemoryRegion is mapped into system_memory through
>>> programming of the 0xFC config space register.
>>>  a) vfio-pci could pick an address to do this as it is realized.
>>>  b) SeaBIOS/OVMF could program this.
>>>  
>>> Discussion: 1.a) Avoids any BIOS dependency, but vfio-pci would need to pick
>>> an address and mark it as e820 reserved.  I'm not sure how to pick that
>>> address.  We'd probably want to make the 0xFC config register read-
>>> only.  1.b) has the issue you mentioned where in most cases the OpRegion
>>> will be 8k, but the BIOS won't know how much address space it's mapping
>>> into system memory when it writes the 0xFC register.  I don't know how
>>> much of a problem this is since the BIOS can easily determine the size once
>>> mapped and re-map it somewhere there's sufficient space.
>>> Practically, it seems like it's always going to be 8K.  This of course requires
>>> modification to every BIOS.  It also leaves the 0xFC register as a mapping
>>> control rather than a pointer to the OpRegion in RAM, which doesn't really
>>> match real hardware.  The BIOS would need to pick an address in this case.
>>>  
>>> 2) Read-only mappings version of 1)
>>>  
>>> Discussion: Really nothing changes from the issues above, just prevents any
>>> possibility of the guest modifying anything in the host.  Xen apparently allows
>>> write access to the host page already.
>>>  
>>> 3) Copy OpRegion contents into buffer and do either 1) or 2) above.
>>>  
>>> Discussion: No benefit that I can see over above other than maybe allowing
>>> write access that doesn't affect the host.
>>>  
>>> 4) Copy contents into a guest RAM location, mark it reserved, point to it via
>>> 0xFC config as scratch register.
>>>  a) Done by QEMU (vfio-pci)
>>>  b) Done by SeaBIOS/OVMF
>>>  
>>> Discussion: This is the most like real hardware.  4.a) has the usual issue of
>>> how to pick an address, but the benefit of not requiring BIOS changes (simply
>>> mark the RAM reserved via existing methods).  4.b) would require passing a
>>> buffer containing the contents of the OpRegion via fw_cfg and letting the
>>> BIOS do the setup.  The latter of course requires modifying each BIOS for this
>>> support.
>>>  
>>> Of course none of these support hotplug nor really can they since reserved
>>> memory regions are not dynamic in the architecture.
>>>  
>>> In all cases, some piece of software needs to know where it can place the
>>> OpRegion in guest memory.  It seems like there are advantages or
>>> disadvantages whether that's done by QEMU or the BIOS, but we only need
>>> to do it once if it's QEMU.  Suggestions, comments, preferences?
>>>  
>>  
>> Hi Alex, another thing to consider is how to communicate to the guest driver the address at 0xFC contains a valid GPA address that can be accessed by the driver without causing a EPT fault - since
>> the same driver will be used on other hypervisors and they may not EPT map OpRegion memory.  On idea proposed by display driver team is to set bit0 of the address to 1 for indicating OpRegion memory
>> can be safely accessed by the guest driver.
> 
> Hi Allen,
> 
> Why is that any different than a guest accessing any other memory area
> that it shouldn't?  The OpRegion starts with a 16-byte ID string, so if
> the guest finds that it should feel fairly confident the OpRegion data
> is valid.  The published spec also seems to define all bits of 0xfc as
> valid, not implying any sort of alignment requirements, and the i915
> driver does a memremap directly on the value read from 0xfc.  So I'm not
> sure whether there's really a need to or ability to define any of those
> bits in an adhoc way to indicate mapping.  If we do things right,
> shouldn't the guest driver not even know it's running in a VM, at least
> for the KVMGT-d case, so we need to be compatible with physical
> hardware.  Thanks,
> 

I agree.  An EPT page fault is allowed when the guest accesses the
OpRegion, as long as KVM finds a proper PFN for that GPA while handling
the fault.  That's exactly what is expected for 'normal' memory.

> Alex
> 

--
Thanks,
Jike

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
  2016-01-28 19:35   ` Alex Williamson
@ 2016-01-29  7:09     ` Gerd Hoffmann
  -1 siblings, 0 replies; 132+ messages in thread
From: Gerd Hoffmann @ 2016-01-29  7:09 UTC (permalink / raw)
  To: Alex Williamson
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	qemu-devel, Cao jin, vfio-users

  Hi,

> 1) The OpRegion MemoryRegion is mapped into system_memory through
> programming of the 0xFC config space register.
>  a) vfio-pci could pick an address to do this as it is realized.
>  b) SeaBIOS/OVMF could program this.
> 
> Discussion: 1.a) Avoids any BIOS dependency, but vfio-pci would need to
> pick an address and mark it as e820 reserved.  I'm not sure how to pick
> that address.

Because of that I'd let the firmware pick the address and program 0xfc
accordingly, i.e. (b).  seabios can simply malloc two pages and be done
with it (any ram allocated by seabios will be tagged as e820 reserved).

> 2) Read-only mappings version of 1)
> 
> Discussion: Really nothing changes from the issues above, just prevents
> any possibility of the guest modifying anything in the host.  Xen
> apparently allows write access to the host page already.

I think read-only is out.  Probably xen allows write access because
guest drivers expect they have write access to the opregion, so the
question is ...

> 3) Copy OpRegion contents into buffer and do either 1) or 2) above.

whether we give the guest a copy of the host opregion or direct access.

> 4) Copy contents into a guest RAM location, mark it reserved, point to
> it via 0xFC config as scratch register.
>  a) Done by QEMU (vfio-pci)
>  b) Done by SeaBIOS/OVMF
> 
> Discussion: This is the most like real hardware.  4.a) has the usual
> issue of how to pick an address, but the benefit of not requiring BIOS
> changes (simply mark the RAM reserved via existing methods).  4.b) would
> require passing a buffer containing the contents of the OpRegion via
> fw_cfg and letting the BIOS do the setup.  The latter of course requires
> modifying each BIOS for this support.

Maybe we should define the interface as "guest writes 0xfc to pick
address, qemu takes care to place opregion there".  That gives us the
freedom to change the qemu implementation (either copy host opregion or
map the host opregion) without breaking things.

> Of course none of these support hotplug nor really can they since
> reserved memory regions are not dynamic in the architecture.

igd is chipset graphics and therefore not hotpluggable anyway (on
physical hardware), I'd be very surprised if the guest drivers are
prepared to handle hotplug.

> Another thing I notice in this series is the access to PCI config space
> of both the host bridge and the LPC bridge.  This prevents unprivileged
> use cases

lpc bridge is no problem, only pci id fields are copied over and
unprivileged access is allowed for them.

Copying the gfx registers of the host bridge is a problem indeed.

> Should vfio add
> additional device specific regions to expose the config space of these
> other devices?

That is an option.  It is not clear yet which route we have to take
though.  Testing shows that newer linux drivers work fine even without
igd-passthru=on tweaks, whereas older linux kernels and windows drivers
don't work even with this series applied and igd-passthru=on.  I'll go
look at this as soon as I have test hardware (getting some is wip atm).

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
  2016-01-29  7:09     ` Gerd Hoffmann
@ 2016-01-29 17:59       ` Alex Williamson
  -1 siblings, 0 replies; 132+ messages in thread
From: Alex Williamson @ 2016-01-29 17:59 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	qemu-devel, Cao jin, vfio-users

On Fri, 2016-01-29 at 08:09 +0100, Gerd Hoffmann wrote:
>   Hi,
> 
> > 1) The OpRegion MemoryRegion is mapped into system_memory through
> > programming of the 0xFC config space register.
> >  a) vfio-pci could pick an address to do this as it is realized.
> >  b) SeaBIOS/OVMF could program this.
> > 
> > Discussion: 1.a) Avoids any BIOS dependency, but vfio-pci would need to
> > pick an address and mark it as e820 reserved.  I'm not sure how to pick
> > that address.
> 
> Because of that I'd let the firmware pick the address and program 0xfc
> accordingly, i.e. (b).  seabios can simply malloc two pages and be done
> with it (any ram allocated by seabios will be tagged as e820 reserved).

Thanks for the tip that seabios-allocated pages automatically become
e820 reserved; that simplifies things a bit.

> > 2) Read-only mappings version of 1)
> > 
> > Discussion: Really nothing changes from the issues above, just prevents
> > any possibility of the guest modifying anything in the host.  Xen
> > apparently allows write access to the host page already.
> 
> I think read-only is out.  Probably xen allows write access because
> guest drivers expect they have write access to the opregion, so the
> question is ...
> 
> > 3) Copy OpRegion contents into buffer and do either 1) or 2) above.
> 
> whether we give the guest a copy of the host opregion or direct access.
> 
> > 4) Copy contents into a guest RAM location, mark it reserved, point to
> > it via 0xFC config as scratch register.
> >  a) Done by QEMU (vfio-pci)
> >  b) Done by SeaBIOS/OVMF
> > 
> > Discussion: This is the most like real hardware.  4.a) has the usual
> > issue of how to pick an address, but the benefit of not requiring BIOS
> > changes (simply mark the RAM reserved via existing methods).  4.b) would
> > require passing a buffer containing the contents of the OpRegion via
> > fw_cfg and letting the BIOS do the setup.  The latter of course requires
> > modifying each BIOS for this support.
> 
> Maybe we should define the interface as "guest writes 0xfc to pick
> address, qemu takes care to place opregion there".  That gives us the
> freedom to change the qemu implementation (either copy host opregion or
> map the host opregion) without breaking things.

Ok, so seabios allocates two pages, writes the base address of those
pages to 0xfc and looks to see whether the signature appears at that
address due to qemu mapping.  It verifies the size and does a
free/realloc if not the right size.  If the graphics signature does not
appear, free those pages and assume no opregion support.  If we later
decide to use a copy, we'd need to disable the 0xfc automagic mapping
and probably pass the data via fw_cfg.  Sound right?
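
The handshake described above can be sketched as a small model.  This is only an illustration under stated assumptions, not seabios or qemu code: FakeVm, FirmwareHeap and firmware_setup_opregion are hypothetical names, and the header layout (16-byte "IntelGraphicsMem" signature followed by a 32-bit size in KB) is assumed from the published OpRegion layout.

```python
import struct

OPREGION_SIGNATURE = b"IntelGraphicsMem"  # 16-byte ID string at offset 0
PAGE_SIZE = 4096


def make_opregion(size_kb):
    """Fake OpRegion: signature, then an assumed 32-bit size field in KB."""
    header = OPREGION_SIGNATURE + struct.pack("<I", size_kb)
    return header.ljust(size_kb * 1024, b"\0")


class FakeVm:
    """Toy model of the qemu side: writing 0xfc maps the OpRegion there."""

    def __init__(self, opregion=None, ram_size=1 << 20):
        self.ram = bytearray(ram_size)
        self.opregion = opregion      # None models "no opregion support"
        self.asls = 0                 # last value written to 0xfc

    def write_0xfc(self, gpa):
        self.asls = gpa

    def read(self, gpa, length):
        # The mapped OpRegion overlays RAM starting at the 0xfc address.
        if self.opregion is not None and self.asls:
            off = gpa - self.asls
            if 0 <= off and off + length <= len(self.opregion):
                return self.opregion[off:off + length]
        return bytes(self.ram[gpa:gpa + length])


class FirmwareHeap:
    """Hypothetical bump allocator standing in for the firmware malloc."""

    def __init__(self, start):
        self.next = start
        self.live = {}

    def alloc(self, size):
        size = -(-size // PAGE_SIZE) * PAGE_SIZE   # round up to whole pages
        base, self.next = self.next, self.next + size
        self.live[base] = size
        return base

    def free(self, base):
        del self.live[base]


def firmware_setup_opregion(vm, heap):
    """Return (base, size) of the OpRegion, or None if it never appears."""
    size = 2 * PAGE_SIZE              # initial guess: two pages
    base = heap.alloc(size)
    vm.write_0xfc(base)
    header = vm.read(base, 20)
    if header[:16] != OPREGION_SIGNATURE:
        heap.free(base)               # no signature: assume no opregion
        return None
    actual = struct.unpack_from("<I", header, 16)[0] * 1024
    if actual != size:                # wrong guess: free and reallocate
        heap.free(base)
        base = heap.alloc(actual)
        vm.write_0xfc(base)
    return base, actual
```

For example, firmware_setup_opregion(FakeVm(make_opregion(8)), FirmwareHeap(0x10000)) settles on the first two-page allocation, while a FakeVm() without an opregion leaves the heap empty again.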

Do guest drivers depend on IGD appearing at 00:02.0?  I'm currently
testing for any Intel VGA device, but I wonder if I should only be
enabling anything opregion if it also appears at a specific address.

> > Of course none of these support hotplug nor really can they since
> > reserved memory regions are not dynamic in the architecture.
> 
> igd is chipset graphics and therefore not hotpluggable anyway (on
> physical hardware), I'd be very surprised if the guest drivers are
> prepared to handle hotplug.
> 
> > Another thing I notice in this series is the access to PCI config space
> > of both the host bridge and the LPC bridge.  This prevents unprivileged
> > use cases
> 
> lpc bridge is no problem, only pci id fields are copied over and
> unprivileged access is allowed for them.
> 
> Copying the gfx registers of the host bridge is a problem indeed.

I would argue that both are really a problem: libvirt wants to put QEMU
in a container that prevents access to any host system files other than
those explicitly allowed.  Therefore libvirt needs to grant the process
access to the lpc sysfs config file even though it only needs user
visible register values.
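
To make the "user visible register values" concrete, here is a minimal sketch of pulling the ID fields out of the standard 64-byte config header, which is the part of the sysfs config file Linux lets unprivileged readers see.  The function names are hypothetical, the offsets are the standard type 0 header ones, and which fields the bridge emulation actually copies is an assumption here, not something taken from the series.

```python
import struct


def parse_pci_ids(config):
    """Extract ID fields from the first 64 bytes of PCI config space."""
    if len(config) < 64:
        raise ValueError("need the full 64-byte standard header")
    vendor_id, device_id = struct.unpack_from("<HH", config, 0x00)
    sub_vendor_id, sub_device_id = struct.unpack_from("<HH", config, 0x2C)
    return {
        "vendor_id": vendor_id,
        "device_id": device_id,
        "revision": config[0x08],
        # prog-if, subclass and base class live at 0x09..0x0b
        "class_code": int.from_bytes(config[0x09:0x0C], "little"),
        "subsystem_vendor_id": sub_vendor_id,
        "subsystem_id": sub_device_id,
    }


def read_pci_ids_from_sysfs(bdf):
    """Read the IDs of a host device such as '0000:00:1f.0' via sysfs."""
    with open(f"/sys/bus/pci/devices/{bdf}/config", "rb") as f:
        return parse_pci_ids(f.read(64))
```

Even though these reads need no privilege, the process still has to be able to open the sysfs file, which is the containment problem described above.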

> > Should vfio add
> > additional device specific regions to expose the config space of these
> > other devices?
> 
> That is an option.  It is not clear yet which route we have to take
> though.  Testing shows that newer linux drivers work fine even without
> igd-passthru=on tweaks, whereas older linux kernels and windows drivers
> don't work even with this series applied and igd-passthru=on.  I'll go
> look at this as soon as I have test hardware (getting some is wip atm).

Ok, well we certainly don't need to tie config space of
those two devices together with opregion access; they can be added
later, but we should revisit before we make QEMU grab those config space
values itself, if we can make that functionality add value.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
  2016-01-29  2:54       ` Alex Williamson
@ 2016-01-29 21:58         ` Kay, Allen M
  -1 siblings, 0 replies; 132+ messages in thread
From: Kay, Allen M @ 2016-01-29 21:58 UTC (permalink / raw)
  To: Alex Williamson, Gerd Hoffmann, qemu-devel
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini, Cao jin,
	vfio-users



> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Thursday, January 28, 2016 6:55 PM
> To: Kay, Allen M; Gerd Hoffmann; qemu-devel@nongnu.org
> Cc: igvt-g@ml01.01.org; xen-devel@lists.xensource.com; Eduardo Habkost;
> Stefano Stabellini; Cao jin; vfio-users@redhat.com
> Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset
> tweaks
> 
> On Fri, 2016-01-29 at 02:22 +0000, Kay, Allen M wrote:
> >
> > > -----Original Message-----
> > > From: iGVT-g [mailto:igvt-g-bounces@lists.01.org] On Behalf Of Alex
> > > Williamson
> > > Sent: Thursday, January 28, 2016 11:36 AM
> > > To: Gerd Hoffmann; qemu-devel@nongnu.org
> > > Cc: igvt-g@ml01.01.org; xen-devel@lists.xensource.com; Eduardo
> > > Habkost; Stefano Stabellini; Cao jin; vfio-users@redhat.com
> > > Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough
> > > chipset tweaks
> > >
> > >
> > > 1) The OpRegion MemoryRegion is mapped into system_memory through
> > > programming of the 0xFC config space register.
> > >  a) vfio-pci could pick an address to do this as it is realized.
> > >  b) SeaBIOS/OVMF could program this.
> > >
> > > Discussion: 1.a) Avoids any BIOS dependency, but vfio-pci would need
> > > to pick an address and mark it as e820 reserved.  I'm not sure how
> > > to pick that address.  We'd probably want to make the 0xFC config
> > > register read-only.  1.b) has the issue you mentioned where in most
> > > cases the OpRegion will be 8k, but the BIOS won't know how much
> > > address space it's mapping into system memory when it writes the
> > > 0xFC register.  I don't know how much of a problem this is since the
> > > BIOS can easily determine the size once mapped and re-map it
> > > somewhere there's sufficient space.
> > > Practically, it seems like it's always going to be 8K.  This of
> > > course requires modification to every BIOS.  It also leaves the 0xFC
> > > register as a mapping control rather than a pointer to the OpRegion
> > > in RAM, which doesn't really match real hardware.  The BIOS would need
> > > to pick an address in this case.
> > >
> > > 2) Read-only mappings version of 1)
> > >
> > > Discussion: Really nothing changes from the issues above, just
> > > prevents any possibility of the guest modifying anything in the
> > > host.  Xen apparently allows write access to the host page already.
> > >
> > > 3) Copy OpRegion contents into buffer and do either 1) or 2) above.
> > >
> > > Discussion: No benefit that I can see over above other than maybe
> > > allowing write access that doesn't affect the host.
> > >
> > > 4) Copy contents into a guest RAM location, mark it reserved, point
> > > to it via 0xFC config as scratch register.
> > >  a) Done by QEMU (vfio-pci)
> > >  b) Done by SeaBIOS/OVMF
> > >
> > > Discussion: This is the most like real hardware.  4.a) has the usual
> > > issue of how to pick an address, but the benefit of not requiring
> > > BIOS changes (simply mark the RAM reserved via existing methods).
> > > 4.b) would require passing a buffer containing the contents of the
> > > OpRegion via fw_cfg and letting the BIOS do the setup.  The latter
> > > of course requires modifying each BIOS for this support.
> > >
> > > Of course none of these support hotplug nor really can they since
> > > reserved memory regions are not dynamic in the architecture.
> > >
> > > In all cases, some piece of software needs to know where it can
> > > place the OpRegion in guest memory.  It seems like there are
> > > advantages or disadvantages whether that's done by QEMU or the BIOS,
> > > but we only need to do it once if it's QEMU.  Suggestions, comments,
> > > preferences?
> > >
> >
> > Hi Alex, another thing to consider is how to communicate to the guest
> > driver that the address at 0xFC contains a valid GPA that can be
> > accessed by the driver without causing an EPT fault, since the same
> > driver will be used on other hypervisors and they may not EPT map
> > OpRegion memory.  One idea proposed by the display driver team is to set
> > bit0 of the address to 1 to indicate that OpRegion memory can be safely
> > accessed by the guest driver.
> 
> Hi Allen,
> 
> Why is that any different than a guest accessing any other memory area that
> it shouldn't?  The OpRegion starts with a 16-byte ID string, so if the guest
> finds that it should feel fairly confident the OpRegion data is valid.  The
> published spec also seems to define all bits of 0xfc as valid, not implying any
> sort of alignment requirements, and the i915 driver does a memremap
> directly on the value read from 0xfc.  So I'm not sure whether there's really a
> need to or ability to define any of those bits in an ad hoc way to indicate
> mapping.  If we do things right, shouldn't the guest driver not even know it's
> running in a VM, at least for the KVMGT-d case, so we need to be compatible
> with physical hardware.  Thanks,
> 

First of all, I would like to clarify that I'm talking about the general IGD passthrough case, not one specific to KVMGT.  In an IGD passthrough configuration, one of the following will happen when the driver accesses the OpRegion:

1) If the hypervisor sets up the OpRegion GPA/HPA mapping, either by pre-mapping it (e.g. Xen) or by mapping it during an EPT page fault (e.g. KVM), the guest can successfully read the content of the OpRegion and check the ID string.  In this case, everything works fine.

2) If the hypervisor does not set up the OpRegion GPA/HPA mapping at all, then the guest driver's attempt to set up the GVA/GPA mapping will fail, which causes the driver to fail.  In this case, the guest driver won't have the opportunity to look into the content of OpRegion memory and check the ID string.

> Alex


^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
  2016-01-29 17:59       ` Alex Williamson
@ 2016-01-30  1:18         ` Kay, Allen M
  -1 siblings, 0 replies; 132+ messages in thread
From: Kay, Allen M @ 2016-01-30  1:18 UTC (permalink / raw)
  To: Alex Williamson, Gerd Hoffmann
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	qemu-devel, Cao jin, vfio-users



> -----Original Message-----
> From: iGVT-g [mailto:igvt-g-bounces@lists.01.org] On Behalf Of Alex
> Williamson
> Sent: Friday, January 29, 2016 10:00 AM
> To: Gerd Hoffmann
> Cc: igvt-g@ml01.01.org; xen-devel@lists.xensource.com; Eduardo Habkost;
> Stefano Stabellini; qemu-devel@nongnu.org; Cao jin; vfio-
> users@redhat.com
> Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset
> tweaks
> 
> Do guest drivers depend on IGD appearing at 00:02.0?  I'm currently testing
> for any Intel VGA device, but I wonder if I should only be enabling anything
> opregion if it also appears at a specific address.
> 

No.  Both Windows and Linux IGD drivers should work at any PCI slot.  We have seen 0:5.0 in the guest and the driver works.

Allen

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
  2016-01-30  1:18         ` Kay, Allen M
@ 2016-01-31 17:42           ` Alex Williamson
  -1 siblings, 0 replies; 132+ messages in thread
From: Alex Williamson @ 2016-01-31 17:42 UTC (permalink / raw)
  To: Kay, Allen M, Gerd Hoffmann, David Woodhouse
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	qemu-devel, Cao jin, vfio-users

On Sat, 2016-01-30 at 01:18 +0000, Kay, Allen M wrote:
> 
> > -----Original Message-----
> > From: iGVT-g [mailto:igvt-g-bounces@lists.01.org] On Behalf Of Alex
> > Williamson
> > Sent: Friday, January 29, 2016 10:00 AM
> > To: Gerd Hoffmann
> > Cc: igvt-g@ml01.01.org; xen-devel@lists.xensource.com; Eduardo Habkost;
> > Stefano Stabellini; qemu-devel@nongnu.org; Cao jin; vfio-
> > users@redhat.com
> > Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset
> > tweaks
> > 
> > Do guest drivers depend on IGD appearing at 00:02.0?  I'm currently testing
> > for any Intel VGA device, but I wonder if I should only be enabling anything
> > opregion if it also appears at a specific address.
> > 
> 
> No.  Both Windows and Linux IGD drivers should work at any PCI slot.  We have seen 0:5.0 in the guest and the driver works.

Thanks Allen.  Another question: when I boot a VM with an assigned HD
P4000 GPU, my console streams with IOMMU faults, like:

DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa30000 
DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa30000 
DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa30000 
DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa30000 
DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa30000 

All of these fall within the host RMRR range for the device:

DMAR: Setting RMRR:
DMAR: Setting identity map for device 0000:00:02.0 [0x9f800000 - 0xaf9fffff]
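
As a quick check of that claim, the sketch below tests the faulting
address against the range printed above.  It assumes the bounds in the
log line are inclusive; the struct and function names are made up for
illustration and are not kernel code.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper type: an RMRR window with inclusive bounds. */
struct rmrr_range {
    uint64_t base;
    uint64_t end;   /* inclusive, as printed in the DMAR log line */
};

/* True if addr lies inside the reserved range. */
static bool rmrr_contains(const struct rmrr_range *r, uint64_t addr)
{
    return addr >= r->base && addr <= r->end;
}
```

With base 0x9f800000 and end 0xaf9fffff the window covers 0x10200000
bytes (258 MiB), and the fault address 0x9fa30000 sits 0x230000 bytes
above the base, so it is inside the window.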

A while back, we excluded devices using RMRRs from participating in
IOMMU API domains because they may continue to DMA to these reserved
regions after assignment, possibly corrupting VM memory
(c875d2c1b808).  Intel later decided this exclusion shouldn't apply to
graphics devices (18436afdc11a).  Don't the above IOMMU faults reveal
that exactly the problem we're trying to prevent by general exclusion of
RMRR-encumbered devices from the IOMMU API is actually occurring?  If I
were to have VM memory within the RMRR address range, I wouldn't be
seeing these faults, I'd be having the GPU corrupt my VM memory.

David notes in the latter commit above:

"We should be able to successfully assign graphics devices to guests
too, as long as the initial handling of stolen memory is reconfigured
appropriately."

What code is supposed to be doing that reconfiguration when a device is
assigned?  Clearly we don't have it yet, making assignment of these
devices very unsafe.  It seems like vfio or IOMMU code in the kernel
needs device-specific code to clear these settings to make it safe for
userspace, then perhaps VM BIOS support to reallocate.  Is there any
consistency across IGD revisions for doing this?  Is there a spec?
Thanks,

Alex

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
  2016-01-29 17:59       ` Alex Williamson
@ 2016-02-01 12:49         ` Gerd Hoffmann
  -1 siblings, 0 replies; 132+ messages in thread
From: Gerd Hoffmann @ 2016-02-01 12:49 UTC (permalink / raw)
  To: Alex Williamson
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	qemu-devel, Cao jin, vfio-users

  Hi,

> Thanks for the tip that seabios allocated pages automatically become
> e820 reserved, that simplifies things a bit.

It's common practice for all firmware.  The e820 table from qemu is just
a starting point, it is not passed on to the guest os as-is.  All
permanent allocations (acpi tables, smbios tables, seabios driver data
such as virtio rings, ...) are taken away from RAM and added to
RESERVED, and IIRC seabios also takes care to reserve the bios and
option rom regions in real mode address space.
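
To make the bookkeeping concrete, here is a toy model of carving a
permanent allocation out of a RAM entry.  It is not SeaBIOS's actual
allocator; the struct layout, the fixed table size, and the function
name are assumptions for illustration.  Only the type codes (1 for RAM,
2 for reserved) follow the usual e820 convention.

```c
#include <stdint.h>
#include <stddef.h>

#define E820_RAM      1
#define E820_RESERVED 2
#define E820_MAX      32   /* hypothetical table capacity */

struct e820_entry {
    uint64_t start;
    uint64_t size;
    uint32_t type;
};

struct e820_map {
    struct e820_entry e[E820_MAX];
    size_t count;
};

/*
 * Take len bytes off the top of RAM entry idx and record them as a new
 * RESERVED entry (appended, so the table is not kept sorted here).
 * Returns the start address of the carved range, or 0 on failure.
 */
static uint64_t e820_reserve_top(struct e820_map *m, size_t idx, uint64_t len)
{
    struct e820_entry *ram;
    uint64_t addr;

    if (idx >= m->count || m->count >= E820_MAX)
        return 0;
    ram = &m->e[idx];
    if (ram->type != E820_RAM || len == 0 || len >= ram->size)
        return 0;

    ram->size -= len;                 /* RAM entry shrinks ... */
    addr = ram->start + ram->size;    /* ... and the tail becomes reserved */
    m->e[m->count++] = (struct e820_entry){ addr, len, E820_RESERVED };
    return addr;
}
```

The guest OS then sees the shrunken RAM entry plus the reserved one, so
it never hands the firmware's pages out as free memory.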

> > Maybe we should define the interface as "guest writes 0xfc to pick
> > address, qemu takes care to place opregion there".  That gives us the
> > freedom to change the qemu implementation (either copy host opregion or
> > map the host opregion) without breaking things.
> 
> Ok, so seabios allocates two pages, writes the base address of those
> pages to 0xfc and looks to see whether the signature appears at that
> address due to qemu mapping.  It verifies the size and does a
> free/realloc if not the right size.

I think seabios first needs to reserve something big enough for a
temporary mapping, to check signature + size, otherwise the opregion
might scratch data structures beyond opregion in case it happens to be
larger than 8k.

How likely is it that the opregion size ever changes?  Should we be
prepared to handle it?  Or would it be ok to have a ...

   if (opregion_size > 8k)
      panic();

... style sanity check?
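
Spelled out a little further, such a check might look like the sketch
below.  It assumes the header layout used elsewhere in this thread (a
16-byte "IntelGraphicsMem" signature followed by a little-endian size in
KiB); the cap constant and function names are hypothetical, and the
caller decides whether a failure means panic or just "no opregion".

```c
#include <stdint.h>
#include <string.h>

#define OPREGION_SIGNATURE "IntelGraphicsMem"
#define OPREGION_SIG_LEN   16
#define OPREGION_MAX_KB    8   /* hypothetical cap, matching the 8k above */

/* Decode a little-endian u32 byte by byte, so alignment does not matter. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Returns the size in KiB if the header looks sane, 0 if the signature
 * is missing, and -1 if the size is zero or exceeds the cap.
 */
static int opregion_checked_size_kb(const uint8_t *hdr)
{
    uint32_t kb;

    if (memcmp(hdr, OPREGION_SIGNATURE, OPREGION_SIG_LEN) != 0)
        return 0;
    kb = read_le32(hdr + OPREGION_SIG_LEN);
    if (kb == 0 || kb > OPREGION_MAX_KB)
        return -1;
    return (int)kb;
}
```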

> If the graphics signature does not
> appear, free those pages and assume no opregion support.

Yes.

> If we later
> decide to use a copy, we'd need to disable the 0xfc automagic mapping
> and probably pass the data via fw_cfg.  Sound right?

I'd have qemu copy the data on 0xfc write then, so things continue to
work without updating seabios.  So, the firmware has to allocate space,
reserve it etc., and program the 0xfc register.  Qemu has to make
sure the opregion appears at the address written by the firmware, by
whatever method it prefers.

> > lpc bridge is no problem, only pci id fields are copied over and
> > unprivileged access is allowed for them.
> > 
> > Copying the gfx registers of the host bridge is a problem indeed.
> 
> I would argue that both are really a problem, libvirt wants to put QEMU
> in a container that prevents access to any host system files other than
> those explicitly allowed.  Therefore libvirt needs to grant the process
> access to the lpc sysfs config file even though it only needs user
> visible register values.

Yes, correct.  We want svirt to be as strict as possible.

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
@ 2016-02-01 22:16           ` Alex Williamson
  0 siblings, 0 replies; 132+ messages in thread
From: Alex Williamson @ 2016-02-01 22:16 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	qemu-devel, Cao jin, vfio-users

On Mon, 2016-02-01 at 13:49 +0100, Gerd Hoffmann wrote:
> > > Maybe we should define the interface as "guest writes 0xfc to pick
> > > address, qemu takes care to place opregion there".  That gives us the
> > > freedom to change the qemu implementation (either copy host opregion or
> > > map the host opregion) without breaking things.
> > 
> > Ok, so seabios allocates two pages, writes the base address of those
> > pages to 0xfc and looks to see whether the signature appears at that
> > address due to qemu mapping.  It verifies the size and does a
> > free/realloc if not the right size.
> 
> I think seabios first needs to reserve something big enough for a
> temporary mapping, to check signature + size, otherwise the opregion
> might scratch data structures beyond opregion in case it happens to be
> larger than 8k.
> 
> How likely is it that the opregion size ever changes?  Should we be
> prepared to handle it?  Or would it be ok to have a ...
> 
>    if (opregion_size > 8k)
>       panic();
> 
> ... style sanity check?
> 

The patch below is what I'm working with now.  It assumes that the
opregion is 8K, maps, verifies, and re-allocs if it's a different size.
Maybe it is safer to abort if it is over 8K, but we're not actually
clobbering anything with the mapping, we're just temporarily mapping
over it.  So if there's not another thread of execution that could be
accessing something there and we're not stepping on our own stack or
data, it doesn't seem like there's a problem.

diff --git a/src/fw/pciinit.c b/src/fw/pciinit.c
index c31c2fa..4f3251e 100644
--- a/src/fw/pciinit.c
+++ b/src/fw/pciinit.c
@@ -257,6 +257,52 @@ static void ich9_smbus_setup(struct pci_device *dev, void *
     pci_config_writeb(bdf, ICH9_SMB_HOSTC, ICH9_SMB_HOSTC_HST_EN);
 }
 
+static void intel_igd_opregion_setup(struct pci_device *dev, void *arg)
+{
+    u16 bdf = dev->bdf;
+    u32 orig;
+    void *opregion;
+    int size = 8;
+
+    if (!CONFIG_QEMU)
+        return;
+
+    orig = pci_config_readl(bdf, 0xFC);
+
+realloc:
+    opregion = malloc_high(size * 1024);
+    if (!opregion) {
+        warn_noalloc();
+        return;
+    }
+
+    /*
+     * QEMU maps the OpRegion into system memory at the address written here,
+     * this overlaps our malloc, which marks the range e820 reserved.
+     */
+    pci_config_writel(bdf, 0xFC, cpu_to_le32((u32)opregion));
+
+    if (memcmp(opregion, "IntelGraphicsMem", 16)) {
+        pci_config_writel(bdf, 0xFC, orig);
+        free(opregion);
+        return; /* the opregion didn't magically appear, not supported */
+    }
+
+    if (size == le32_to_cpu(*(u32 *)(opregion + 16))) {
+        dprintf(1, "Intel IGD OpRegion enabled on %02x:%02x.%x\n",
+                pci_bdf_to_bus(bdf), pci_bdf_to_dev(bdf), pci_bdf_to_fn(bdf));
+        return; /* success! */
+    }
+
+    pci_config_writel(bdf, 0xFC, orig);
+    free(opregion);
+
+    if (size == 8) { /* try once more with a new size */
+        size = le32_to_cpu(*(u32 *)(opregion + 16));
+        goto realloc;
+    }
+}
+
 static const struct pci_device_id pci_device_tbl[] = {
     /* PIIX3/PIIX4 PCI to ISA bridge */
     PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82371SB_0,
@@ -290,6 +336,10 @@ static const struct pci_device_id pci_device_tbl[] = {
     PCI_DEVICE_CLASS(PCI_VENDOR_ID_APPLE, 0x0017, 0xff00, apple_macio_setup),
     PCI_DEVICE_CLASS(PCI_VENDOR_ID_APPLE, 0x0022, 0xff00, apple_macio_setup),
 
+    /* Intel IGD OpRegion setup */
+    PCI_DEVICE_CLASS(PCI_VENDOR_ID_INTEL, PCI_ANY_ID, PCI_CLASS_DISPLAY_VGA,
+                     intel_igd_opregion_setup),
+
     PCI_DEVICE_END,
 };
 

> > If the graphics signature does not
> > appear, free those pages and assume no opregion support.
> 
> Yes.
> 
> > If we later
> > decide to use a copy, we'd need to disable the 0xfc automagic mapping
> > and probably pass the data via fw_cfg.  Sound right?
> 
> I'd have qemu copy the data on 0xfc write then, so things continue to
> work without updating seabios.  So, the firmware has to allocate space,
> reserve it etc., and program the 0xfc register.  Qemu has to make
> sure the opregion appears at the address written by the firmware, by
> whatever method it prefers.

Ah, so here is where we'd clobber data in firmware.  I currently do
this in vfio's pci config write in QEMU:

        orig = pci_get_long(pdev->config + IGD_OPREGION);
        pci_default_write_config(pdev, addr, val, len);
        cur = pci_get_long(pdev->config + IGD_OPREGION);

        if (cur != orig) {
            if (orig) {
                memory_region_del_subregion(get_system_memory(),
                                            vdev->igd_opregion->mem);
            }

            if (cur) {
                memory_region_add_subregion(get_system_memory(),
                                            cur, vdev->igd_opregion->mem);
            }
        }

This means that fw can write 0x0 back to the ASL storage register and
the mapping goes away; no firmware data is overwritten and the overlap
was temporary.  If we copy it into the firmware-provided buffer with
firmware not knowing the actual size then yes, we've just clobbered
something and it can't be recovered.  I'll post my patches and we can
hash out whether there's a better approach over something a little more
concrete.  I can see the opregion gets exposed and the guest driver does
use it.  I'm not entirely sure what functionality it's adding though
since a cursory test of booting an FC23 live iso image seems to
initialize the display correctly with or without the opregion.
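
One ordering detail follows from the mapping going away on write-back:
the size field is only readable while 0xFC still points at the firmware
buffer.  The retry path in the SeaBIOS patch above appears to read
opregion + 16 after 0xFC has been restored and the buffer freed, so it
would see stale RAM rather than the OpRegion header.  The standalone
model below (hypothetical types, not SeaBIOS or QEMU code) shows the
difference; it assumes the size is a little-endian u32 in KiB at offset
16, as the patch does.

```c
#include <stdint.h>

#define OPREGION_SIZE_OFFSET 16

/* Toy model of what the firmware sees at the buffer address. */
struct guest_view {
    const uint8_t *ram;       /* firmware buffer contents (plain RAM) */
    const uint8_t *opregion;  /* host OpRegion overlaid by QEMU */
    int mapped;               /* nonzero while 0xFC points at the buffer */
};

static const uint8_t *visible(const struct guest_view *g)
{
    return g->mapped ? g->opregion : g->ram;
}

/* Decode the little-endian size field from a header pointer. */
static uint32_t size_kb(const uint8_t *p)
{
    p += OPREGION_SIZE_OFFSET;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Capture the size first, then drop the mapping (models restoring 0xFC). */
static uint32_t unmap_for_retry(struct guest_view *g)
{
    uint32_t kb = size_kb(visible(g));
    g->mapped = 0;
    return kb;
}
```

In the patch this would amount to saving the size in a local before the
pci_config_writel(bdf, 0xFC, orig) and free(opregion) calls, then using
that saved value for the retry.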

> > > lpc bridge is no problem, only pci id fields are copied over and
> > > unprivileged access is allowed for them.
> > > 
> > > Copying the gfx registers of the host bridge is a problem indeed.
> > 
> > I would argue that both are really a problem, libvirt wants to put QEMU
> > in a container that prevents access to any host system files other than
> > those explicitly allowed.  Therefore libvirt needs to grant the process
> > access to the lpc sysfs config file even though it only needs user
> > visible register values.
> 
> Yes, correct.  We want svirt to be as strict as possible.

So it might be a good idea to expose these through vfio.  What about
stolen memory?  I noted the IOMMU faults that I get when assigning IGD;
the bulk of them seem to target the memory reserved as stolen for the GPU.
I can avoid those by clearing the guest view of the BDSM register, but I
think then we're just leaving stolen memory unused, which seems rather
wasteful.  Trying to identity map that stolen memory into the VM so that
we don't need to reconfigure the GPU seems problematic, but if vfio
exposed it as another region, we could do the same trick of mapping into
the VM address space.  The size of stolen memory is quite variable, so
we couldn't just assume a size.  We'd also need to know how to
reconfigure (and restore) the GPU for a new location, the BDSM register
just reports it.  Thanks,

Alex

^ permalink raw reply related	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
  2016-01-31 17:42           ` Alex Williamson
@ 2016-02-02  0:04             ` Kay, Allen M
  -1 siblings, 0 replies; 132+ messages in thread
From: Kay, Allen M @ 2016-02-02  0:04 UTC (permalink / raw)
  To: Alex Williamson, Gerd Hoffmann, David Woodhouse
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	qemu-devel, Cao jin, vfio-users



> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Sunday, January 31, 2016 9:42 AM
> To: Kay, Allen M; Gerd Hoffmann; David Woodhouse
> Cc: igvt-g@ml01.01.org; xen-devel@lists.xensource.com; Eduardo Habkost;
> Stefano Stabellini; qemu-devel@nongnu.org; Cao jin; vfio-
> users@redhat.com
> Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset
> tweaks
> 
> On Sat, 2016-01-30 at 01:18 +0000, Kay, Allen M wrote:
> >
> > > -----Original Message-----
> > > From: iGVT-g [mailto:igvt-g-bounces@lists.01.org] On Behalf Of Alex
> > > Williamson
> > > Sent: Friday, January 29, 2016 10:00 AM
> > > To: Gerd Hoffmann
> > > Cc: igvt-g@ml01.01.org; xen-devel@lists.xensource.com; Eduardo
> > > Habkost; Stefano Stabellini; qemu-devel@nongnu.org; Cao jin; vfio-
> > > users@redhat.com
> > > Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough
> > > chipset tweaks
> > >
> > > Do guest drivers depend on IGD appearing at 00:02.0?  I'm currently
> > > testing for any Intel VGA device, but I wonder if I should only be
> > > enabling anything opregion if it also appears at a specific address.
> > >
> >
> > No.  Both Windows and Linux IGD drivers should work at any PCI slot.  We
> have seen 0:5.0 in the guest and the driver works.
> 
> Thanks Allen.  Another question: when I boot a VM with an assigned HD
> P4000 GPU, my console streams IOMMU faults, like:
> 
> DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa30000
> DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa30000
> DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa30000
> DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa30000
> DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa30000
> 
> All of these fall within the host RMRR range for the device:
> 
> DMAR: Setting RMRR:
> DMAR: Setting identity map for device 0000:00:02.0 [0x9f800000 - 0xaf9fffff]

Hi Alex,

Do you configure IGD as primary or secondary display in your KVM setup?  If primary, are you running Intel vBIOS as part of guest boot?

On BDW/SKL systems, we have started to configure IGD as secondary and QEMU VGA as primary.  In this setup, we are no longer running vBIOS in the guest, which avoids some complications.  vBIOS uses stolen memory for display buffers, which requires RMRR mapping.  We have been using a similar setup (IGD as secondary) on other hypervisors and have not seen IOMMU faults.

I will set up a KVM configuration on my SKL and see if I can duplicate your problem here.  I will try to call into Don's Thursday meeting to discuss this (I'm on call for jury duty this week).  I will give you a heads up on Wednesday evening.

Allen

> 
> A while back, we excluded devices using RMRRs from participating in IOMMU
> API domains because they may continue to DMA to these reserved regions
> after assignment, possibly corrupting VM memory (c875d2c1b808).  Intel
> later decided this exclusion shouldn't apply to graphics devices
> (18436afdc11a).  Don't the above IOMMU faults reveal that exactly the
> problem we're trying to prevent by general exclusion of RMRR-encumbered
> devices from the IOMMU API is actually occurring?  If I were to have VM
> memory within the RMRR address range, I wouldn't be seeing these faults,
> I'd be having the GPU corrupt my VM memory.
> 
> David notes in the latter commit above:
> 
> "We should be able to successfully assign graphics devices to guests too, as
> long as the initial handling of stolen memory is reconfigured appropriately."
> 
> What code is supposed to be doing that reconfiguration when a device is
> assigned?  Clearly we don't have it yet, making assignment of these devices
> very unsafe.  It seems like vfio or IOMMU code in the kernel needs
> device-specific code to clear these settings to make it safe for userspace, then
> perhaps VM BIOS support to reallocate.  Is there any consistency across IGD
> revisions for doing this?  Is there a spec?
> Thanks,
> 
> Alex


^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
@ 2016-02-02  0:04             ` Kay, Allen M
  0 siblings, 0 replies; 132+ messages in thread
From: Kay, Allen M @ 2016-02-02  0:04 UTC (permalink / raw)
  To: Alex Williamson, Gerd Hoffmann, David Woodhouse
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	qemu-devel, Cao jin, vfio-users



> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Sunday, January 31, 2016 9:42 AM
> To: Kay, Allen M; Gerd Hoffmann; David Woodhouse
> Cc: igvt-g@ml01.01.org; xen-devel@lists.xensource.com; Eduardo Habkost;
> Stefano Stabellini; qemu-devel@nongnu.org; Cao jin; vfio-
> users@redhat.com
> Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset
> tweaks
> 
> On Sat, 2016-01-30 at 01:18 +0000, Kay, Allen M wrote:
> >
> > > -----Original Message-----
> > > From: iGVT-g [mailto:igvt-g-bounces@lists.01.org] On Behalf Of Alex
> > > Williamson
> > > Sent: Friday, January 29, 2016 10:00 AM
> > > To: Gerd Hoffmann
> > > Cc: igvt-g@ml01.01.org; xen-devel@lists.xensource.com; Eduardo
> > > Habkost; Stefano Stabellini; qemu-devel@nongnu.org; Cao jin; vfio-
> > > users@redhat.com
> > > Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough
> > > chipset tweaks
> > >
> > > Do guest drivers depend on IGD appearing at 00:02.0?  I'm currently
> > > testing for any Intel VGA device, but I wonder if I should only be
> > > enabling anything opregion if it also appears at a specific address.
> > >
> >
> > No.  Both Windows and Linux IGD drivers should work at any PCI slot.  We
> have seen 0:5.0 in the guest and the driver works.
> 
> Thanks Allen.  Another question: when I boot a VM with an assigned HD
> P4000 GPU, my console streams with IOMMU faults, like:
> 
> DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa30000
> DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa30000
> DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa30000
> DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa30000
> DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa30000
> 
> All of these fall within the host RMRR range for the device:
> 
> DMAR: Setting RMRR:
> DMAR: Setting identity map for device 0000:00:02.0 [0x9f800000 - 0xaf9fffff]

Hi Alex,

Do you configure IGD as primary or secondary display in your KVM setup?   If primary, are you running Intel vBIOS as part of guest boot?

On BDW/SKL systems, we have started to configure IGD as secondary and QEMU VGA as primary.  In this setup, we are no longer running vBIOS in the guest, which avoids some complications.  vBIOS uses stolen memory for display buffers, which requires RMRR mapping.  We have been using a similar setup (IGD as secondary) on other hypervisors and have not seen IOMMU faults.

I will set up a KVM configuration on my SKL and see if I can duplicate your problem here.   I will try to call into Don's Thursday meeting to discuss this (I'm on call for jury duty this week).  I will give you a heads up on Wednesday evening.

Allen

> 
> A while back, we excluded devices using RMRRs from participating in IOMMU
> API domains because they may continue to DMA to these reserved regions
> after assignment, possibly corrupting VM memory (c875d2c1b808).  Intel
> later decided this exclusion shouldn't apply to graphics devices
> (18436afdc11a).  Don't the above IOMMU faults reveal that exactly the
> problem we're trying to prevent by general exclusion of RMRR encumbered
> devices from the IOMMU API is actually occurring?  If I were to have VM
> memory within the RMRR address range, I wouldn't be seeing these faults,
> I'd be having the GPU corrupt my VM memory.
> 
> David notes in the latter commit above:
> 
> "We should be able to successfully assign graphics devices to guests too, as
> long as the initial handling of stolen memory is reconfigured appropriately."
> 
> What code is supposed to be doing that reconfiguration when a device is
> assigned?  Clearly we don't have it yet, making assignment of these devices
> very unsafe.  It seems like vfio or IOMMU code  in the kernel needs device
> specific code to clear these settings to make it safe for userspace, then
> perhaps VM BIOS support to reallocate.  Is there any consistency across IGD
> revisions for doing this?  Is there a spec?
> Thanks,
> 
> Alex
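
As a quick check of the claim that the faults land inside the RMRR, here
is a minimal C sketch that tests the logged fault address against the
logged identity-map range.  The struct and function names are
hypothetical, made up for illustration; only the addresses come from the
DMAR log quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one RMRR; 'end' is inclusive, as in the log. */
struct rmrr_range {
    uint64_t base;
    uint64_t end;
};

/* Return true if a DMA fault address falls inside the reserved range. */
static bool rmrr_contains(const struct rmrr_range *r, uint64_t addr)
{
    return addr >= r->base && addr <= r->end;
}
```

With base 0x9f800000 and end 0xaf9fffff, rmrr_contains() is true for the
fault address 0x9fa30000, which is the situation Alex describes.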


^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
  2016-02-02  0:04             ` Kay, Allen M
@ 2016-02-02  6:42               ` Tian, Kevin
  -1 siblings, 0 replies; 132+ messages in thread
From: Tian, Kevin @ 2016-02-02  6:42 UTC (permalink / raw)
  To: Kay, Allen M, Alex Williamson, Gerd Hoffmann, David Woodhouse
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	qemu-devel, Cao jin, vfio-users

> From: Kay, Allen M
> Sent: Tuesday, February 02, 2016 8:04 AM
> >
> > David notes in the latter commit above:
> >
> > "We should be able to successfully assign graphics devices to guests too, as
> > long as the initial handling of stolen memory is reconfigured appropriately."
> >
> > What code is supposed to be doing that reconfiguration when a device is
> > assigned?  Clearly we don't have it yet, making assignment of these devices
> > very unsafe.  It seems like vfio or IOMMU code  in the kernel needs device
> > specific code to clear these settings to make it safe for userspace, then
> > perhaps VM BIOS support to reallocate.  Is there any consistency across IGD
> > revisions for doing this?  Is there a spec?
> > Thanks,

I don't think stolen memory should be handled explicitly. If yes, it should be
listed as a RMRR region so general RMRR setup will cover it. But as Allen
pointed out, the whole RMRR becomes unnecessary if we target only secondary
device for IGD.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
@ 2016-02-02  7:01           ` Tian, Kevin
  0 siblings, 0 replies; 132+ messages in thread
From: Tian, Kevin @ 2016-02-02  7:01 UTC (permalink / raw)
  To: Gerd Hoffmann, Alex Williamson
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	qemu-devel, Cao jin, vfio-users

> From: Gerd Hoffmann
> Sent: Monday, February 01, 2016 8:49 PM
> 
>   Hi,
> 
> > Thanks for the tip that seabios allocated pages automatically become
> > e820 reserved, that simplifies things a bit.
> 
> It's common practice for all firmware.  The e820 table from qemu is just
> a starting point, it is not passed on to the guest os as-is.  All
> permanent allocations (acpi tables, smbios tables, seabios driver data
> such as virtio rings, ...) are taken away from RAM and added to
> RESERVED, and IIRC seabios also takes care to reserve the bios and
> option rom regions in real mode address space.

Agree. It's cleaner to have seabios do the allocation; otherwise it's prone to
cause conflicts if vfio-pci randomly picks an address (though we can
do special tweaks within seabios to skip that one by reading 0xFC).

> 
> > > Maybe we should define the interface as "guest writes 0xfc to pick
> > > address, qemu takes care to place opregion there".  That gives us the
> > > freedom to change the qemu implementation (either copy host opregion or
> > > map the host opregion) without breaking things.
> >
> > Ok, so seabios allocates two pages, writes the base address of those
> > pages to 0xfc and looks to see whether the signature appears at that
> > address due to qemu mapping.  It verifies the size and does a
> > free/realloc if not the right size.
> 
> I think seabios first needs to reserve something big enough for a
> temporary mapping, to check signature + size, otherwise the opregion
> might scratch data structures beyond opregion in case it happens to be
> larger than 8k.
> 
> How likely is it that the opregion size ever changes?  Should we better
> be prepared to handle it?  Or would it be ok to have a ...
> 
>    if (opregion_size > 8k)
>       panic();
> 
> ... style sanity check?

Above sanity check should be enough. We use 8k in KVMGT too.
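
A minimal sketch of that sanity check, written as standalone C rather
than SeaBIOS code.  It assumes the opregion header starts with the
16-byte signature "IntelGraphicsMem" followed by a little-endian 32-bit
size in KiB, which is consistent with the "size * 1024" in Alex's patch.
The function name and the limit constant are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define OPREGION_SIGNATURE "IntelGraphicsMem"  /* assumed 16-byte signature */
#define OPREGION_MAX_KB    8u                  /* the 8k limit discussed above */

/* Check signature and size of a candidate opregion header. */
static bool opregion_header_ok(const uint8_t *hdr, size_t len)
{
    uint32_t size_kb;

    if (len < 20)
        return false;                          /* too short for sig + size */
    if (memcmp(hdr, OPREGION_SIGNATURE, 16) != 0)
        return false;                          /* no opregion here */

    /* Assumed layout: little-endian size in KiB right after the signature. */
    size_kb = (uint32_t)hdr[16] |
              ((uint32_t)hdr[17] << 8) |
              ((uint32_t)hdr[18] << 16) |
              ((uint32_t)hdr[19] << 24);

    return size_kb > 0 && size_kb <= OPREGION_MAX_KB;
}
```

Firmware could call this on the temporary mapping and refuse to continue
(the panic() in Gerd's pseudocode) when it returns false for a region
that does carry the signature.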

> 
> > If the graphics signature does not
> > appear, free those pages and assume no opregion support.
> 
> Yes.
> 
> > If we later
> > decide to use a copy, we'd need to disable the 0xfc automagic mapping
> > and probably pass the data via fw_cfg.  Sound right?
> 
> I'd have qemu copy the data on 0xfc write then, so things continue to
> work without updating seabios.  So, the firmware has to allocate space,
> reserve it etc., and program the 0xfc register.  Qemu has to make
> sure the opregion appears at the address written by the firmware, by
> whatever method it prefers.

Yup. It's Qemu's responsibility to expose opregion content. 

btw, I'd prefer to do copying here. It's pointless to allow writes from the
guest side. One write example is the SWSCI mailbox, through which the gfx
driver can trigger an SCI event to communicate with the BIOS (specifically
ACPI methods here), mostly for some monitor operations. However, it's
not right for the guest to trigger a host SCI and thus kick host
ACPI methods.
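
To illustrate the copy approach, here is a small standalone C sketch:
the guest is given a private snapshot, so whatever it writes (for
example into the mailbox area) never reaches the host opregion.  The
helper name is hypothetical and this is not how qemu is structured; it
only shows the isolation property.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a heap-allocated private copy of the host
 * opregion contents, or NULL on allocation failure.  The caller owns the
 * copy and releases it with free().
 */
static uint8_t *opregion_snapshot(const uint8_t *host, size_t size)
{
    uint8_t *copy = malloc(size);

    if (copy)
        memcpy(copy, host, size);
    return copy;
}
```

Guest writes then modify only the snapshot; the host buffer stays as it
was when the copy was taken.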

> 
> > > lpc bridge is no problem, only pci id fields are copied over and
> > > unprivileged access is allowed for them.
> > >
> > > Copying the gfx registers of the host bridge is a problem indeed.
> >
> > I would argue that both are really a problem, libvirt wants to put QEMU
> > in a container that prevents access to any host system files other than
> > those explicitly allowed.  Therefore libvirt needs to grant the process
> > access to the lpc sysfs config file even though it only needs user
> > visible register values.
> 
> Yes, correct.  We want svirt to be as strict as possible.
> 

That is the most tricky part in IGD pass-thru, which is why Intel decides
to remove those dependencies to make pass-thru easier.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
  2016-01-29 21:58         ` Kay, Allen M
@ 2016-02-02  7:07           ` Tian, Kevin
  -1 siblings, 0 replies; 132+ messages in thread
From: Tian, Kevin @ 2016-02-02  7:07 UTC (permalink / raw)
  To: Kay, Allen M, Alex Williamson, Gerd Hoffmann, qemu-devel
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini, Cao jin,
	vfio-users

> From: Kay, Allen M
> Sent: Saturday, January 30, 2016 5:58 AM
>
> First of all, I would like to clarify I'm talking about general IGD passthrough case - not
> specific to KVMGT.  In IGD passthrough configuration, one of the following will happen
> when the driver accesses OpRegion:
> 
> 1) If the hypervisor sets up OpRegion GPA/HPA mapping, either by pre-map it (i.e. Xen)
> or map it during EPT page fault (i.e. KVM), guest can successfully read the content of the
> OpRegion and check the ID string.  In this case, everything works fine.
> 
> 2) if the hypervisor does not setup OpRegion GPA/HPA mapping at all, then guest driver's
> attempt to setup GVA/GPA mapping will fail, which causes the driver to fail.  In this case,
> guest driver won't have the opportunity to look into the content of OpRegion memory and
> check the ID string.
> 

Guest mapping of GVA->GPA can always succeed regardless of whether
GPA->HPA is valid. Failure will happen only when the GVA is actually
accessed by guest.
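
A toy model of the two translation stages may make this concrete.  It is
a sketch with made-up names and tiny fixed-size tables, not real
page-table or EPT code: the guest-side map call only touches the guest's
own table, so it succeeds whether or not the hypervisor has backed the
GPA, and the failure shows up only on access.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NPAGES   8u
#define TOY_UNMAPPED UINT32_MAX

struct toy_vm {
    uint32_t gva_to_gpa[TOY_NPAGES];  /* guest page table, owned by guest */
    uint32_t gpa_to_hpa[TOY_NPAGES];  /* EPT-like table, owned by hypervisor */
};

static void toy_vm_init(struct toy_vm *vm)
{
    for (uint32_t i = 0; i < TOY_NPAGES; i++) {
        vm->gva_to_gpa[i] = TOY_UNMAPPED;
        vm->gpa_to_hpa[i] = TOY_UNMAPPED;
    }
}

/* Guest maps GVA->GPA; this never consults the hypervisor's table. */
static bool toy_guest_map(struct toy_vm *vm, uint32_t gva, uint32_t gpa)
{
    if (gva >= TOY_NPAGES || gpa >= TOY_NPAGES)
        return false;
    vm->gva_to_gpa[gva] = gpa;
    return true;
}

/* Guest touches the GVA; both stages must resolve for this to succeed. */
static bool toy_guest_access(const struct toy_vm *vm, uint32_t gva,
                             uint32_t *hpa)
{
    uint32_t gpa;

    if (gva >= TOY_NPAGES)
        return false;
    gpa = vm->gva_to_gpa[gva];
    if (gpa == TOY_UNMAPPED || vm->gpa_to_hpa[gpa] == TOY_UNMAPPED)
        return false;                 /* the fault happens here, on access */
    *hpa = vm->gpa_to_hpa[gpa];
    return true;
}
```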

I don't understand 2). If the hypervisor doesn't want to set up the mapping,
there is no chance for the guest driver to get the opregion content, right? Or
do you mean some hypervisor wants to emulate the opregion access?
But even in that case there's no failure per se, just a slower path.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
  2016-02-01 22:16           ` Alex Williamson
@ 2016-02-02  7:43             ` Gerd Hoffmann
  -1 siblings, 0 replies; 132+ messages in thread
From: Gerd Hoffmann @ 2016-02-02  7:43 UTC (permalink / raw)
  To: Alex Williamson
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	qemu-devel, Cao jin, vfio-users

  Hi,

> +realloc:
> +    opregion = malloc_high(size * 1024);

memalign_high(PAGE_SIZE, size * 1024);
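
The point of the aligned allocation is that the address written to 0xfc
should start on a page boundary, presumably because qemu places a whole
memory region there.  A small standalone C sketch of the arithmetic,
assuming a 4 KiB page; the names are illustrative and not SeaBIOS API:

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size */

/* Round addr up to the next multiple of align (align must be a power of 2).
 * No overflow handling: addr is assumed to be well below UINT32_MAX. */
static uint32_t toy_align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* True if addr starts exactly on a page boundary. */
static bool toy_is_page_aligned(uint32_t addr)
{
    return (addr & (TOY_PAGE_SIZE - 1)) == 0;
}
```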

> > I'd have qemu copy the data on 0xfc write then, so things continue to
> > work without updating seabios.  So, the firmware has to allocate space,
> > reserve it etc., and program the 0xfc register.  Qemu has to make
> > sure the opregion appears at the address written by the firmware, by
> > whatever method it prefers.
> 
> Ah, so here is where we'd clobber data in firmware.  I currently do
> this in vfio's pci config write in QEMU:
> 
>         orig = pci_get_long(pdev->config + IGD_OPREGION);
>         pci_default_write_config(pdev, addr, val, len);
>         cur = pci_get_long(pdev->config + IGD_OPREGION);
> 
>         if (cur != orig) {
>             if (orig) {
>                 memory_region_del_subregion(get_system_memory(),
>                                             vdev->igd_opregion->mem);
>             }
> 
>             if (cur) {
>                 memory_region_add_subregion(get_system_memory(),
>                                             cur, vdev->igd_opregion->mem);
>             }
>         }

Ok, so we avoid the clobber and qemu still has the choice to implement
the opregion in different ways, by simply changing how
vdev->igd_opregion->mem is backed.  Good.
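
The remap-on-write logic in Alex's snippet can be modelled without any
qemu types.  The sketch below is such a toy model, with hypothetical
names and counters added only so the behaviour can be observed: a write
that changes the register unmaps the old location (if any) and maps the
new one (if nonzero), and a write of the same value does nothing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the device state; not a qemu structure. */
struct toy_opregion {
    uint32_t reg;        /* current value of the 0xfc register */
    bool     mapped;     /* is the region currently visible to the guest? */
    uint32_t mapped_at;  /* guest address of the current mapping */
    unsigned maps;       /* number of map operations performed */
    unsigned unmaps;     /* number of unmap operations performed */
};

/* Model of a guest write to the 0xfc register. */
static void toy_write_0xfc(struct toy_opregion *o, uint32_t val)
{
    uint32_t orig = o->reg;

    o->reg = val;
    if (val == orig)
        return;                       /* nothing changed, keep mapping */

    if (orig) {                       /* drop the old mapping */
        o->mapped = false;
        o->unmaps++;
    }
    if (val) {                        /* establish the new mapping */
        o->mapped = true;
        o->mapped_at = val;
        o->maps++;
    }
}
```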

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
  2016-02-02  7:01           ` [iGVT-g] " Tian, Kevin
@ 2016-02-02  8:56             ` Gerd Hoffmann
  -1 siblings, 0 replies; 132+ messages in thread
From: Gerd Hoffmann @ 2016-02-02  8:56 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	qemu-devel, Alex Williamson, Cao jin, vfio-users

  Hi,

> > I'd have qemu copy the data on 0xfc write then, so things continue to
> > work without updating seabios.  So, the firmware has to allocate space,
> > reserve it etc., and program the 0xfc register.  Qemu has to make
> > sure the opregion appears at the address written by the firmware, by
> > whatever method it prefers.
> 
> Yup. It's Qemu's responsibility to expose opregion content. 
> 
> btw, prefer to do copying here. It's pointless to allow write from guest
> side. One write example is SWSCI mailbox, thru which gfx driver can
> trigger some SCI event to communicate with BIOS (specifically ACPI
> methods here), mostly for some monitor operations. However it's 
> not a right thing for guest to trigger host SCI and thus kick host 
> ACPI methods.

Thanks.

So, the question again is how best to do that.  Option one is the mmap way,
i.e. basically what the patches posted by Alex are doing.  Option two
is the fw_cfg way, i.e. place an opregion copy in fw_cfg and have
seabios not only set 0xfc, but also store the opregion there by copying
from fw_cfg.

Advantage of option one is that we'll keep the option to do things in a
different way in the future, without breaking the guest/qemu interface.

Disadvantage is that it'll cause hugepage mappings to be split.

Hmm.

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
  2016-02-02  6:42               ` Tian, Kevin
@ 2016-02-02 11:50                 ` David Woodhouse
  -1 siblings, 0 replies; 132+ messages in thread
From: David Woodhouse @ 2016-02-02 11:50 UTC (permalink / raw)
  To: Tian, Kevin, Kay, Allen M, Alex Williamson, Gerd Hoffmann
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	qemu-devel, Cao jin, vfio-users

[-- Attachment #1: Type: text/plain, Size: 2592 bytes --]

On Tue, 2016-02-02 at 06:42 +0000, Tian, Kevin wrote:
> > From: Kay, Allen M
> > Sent: Tuesday, February 02, 2016 8:04 AM
> > > 
> > > David notes in the latter commit above:
> > > 
> > > "We should be able to successfully assign graphics devices to guests too, as
> > > long as the initial handling of stolen memory is reconfigured appropriately."
> > > 
> > > What code is supposed to be doing that reconfiguration when a device is
> > > assigned?  Clearly we don't have it yet, making assignment of these devices
> > > very unsafe.  It seems like vfio or IOMMU code  in the kernel needs device
> > > specific code to clear these settings to make it safe for userspace, then
> > > perhaps VM BIOS support to reallocate.  Is there any consistency across IGD
> > > revisions for doing this?  Is there a spec?
> > > Thanks,

I haven't ever successfully assigned an IGD device to a VM myself, but
my understanding was that it *has* been done. So when the code was
changed to prevent assignment of devices afflicted by RMRRs (except USB
where we know it's OK), I just added the integrated graphics to that
same exception as USB, to preserve the status quo ante.

> I don't think stolen memory should be handled explicitly. If yes, it should be
> listed as a RMRR region so general RMRR setup will cover it. But as Allen
> pointed out, the whole RMRR becomes unnecessary if we target only secondary
> device for IGD.

Perhaps the best option is *not* to have special cases in the IOMMU
code for "those devices which can safely be assigned despite RMRRs".

Instead, let's let the device driver — or whatever — tell the IOMMU
code when it's *stopped* the firmware from (ab)using the device's DMA
facilities.

So when the USB code does the handoff thing to quiesce the firmware's
access to USB and take over in the OS, it would call the IOMMU function
to revoke the RMRR for the USB controller.

And if/when the graphics driver resets its device into a state where
it's no longer accessing stolen memory and can be assigned to a VM, it
can also call that 'RMRR revoke' function.

Likewise, if we teach device drivers to cancel whatever abominations
the HP firmware tends to set up behind the OS's back on other PCI
devices, we can cancel the RMRRs for those too.

Then the IOMMU code has a simple choice and no special cases — we can
assign a device iff it has no active RMRR.
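
A rough sketch of what that rule could look like, as standalone C.  No
such interface exists in the code being discussed; the struct and both
function names are hypothetical and only model the proposed policy: a
driver revokes the RMRR once it has quiesced the firmware's use of the
device, and assignment is allowed only when no RMRR is active.

```c
#include <stdbool.h>

/* Toy device record; 'rmrr_active' starts true if firmware declared an RMRR. */
struct toy_dev {
    const char *name;
    bool        rmrr_active;
};

/* Called by a driver once the firmware no longer DMAs through the device. */
static void toy_rmrr_revoke(struct toy_dev *dev)
{
    dev->rmrr_active = false;
}

/* The single rule with no special cases: assignable iff no active RMRR. */
static bool toy_can_assign(const struct toy_dev *dev)
{
    return !dev->rmrr_active;
}
```

In this model a USB controller becomes assignable after its handoff code
calls toy_rmrr_revoke(), and a graphics device stays unassignable until
its driver has done the same.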

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5691 bytes --]

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
  2016-02-02  0:04             ` Kay, Allen M
@ 2016-02-02 14:38               ` Alex Williamson
  -1 siblings, 0 replies; 132+ messages in thread
From: Alex Williamson @ 2016-02-02 14:38 UTC (permalink / raw)
  To: Kay, Allen M, Gerd Hoffmann, David Woodhouse
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	qemu-devel, Cao jin, vfio-users

On Tue, 2016-02-02 at 00:04 +0000, Kay, Allen M wrote:
> 
> > -----Original Message-----
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Sunday, January 31, 2016 9:42 AM
> > To: Kay, Allen M; Gerd Hoffmann; David Woodhouse
> > Cc: igvt-g@ml01.01.org; xen-devel@lists.xensource.com; Eduardo Habkost;
> > Stefano Stabellini; qemu-devel@nongnu.org; Cao jin; vfio-
> > users@redhat.com
> > Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset
> > tweaks
> > 
> > On Sat, 2016-01-30 at 01:18 +0000, Kay, Allen M wrote:
> > > 
> > > > -----Original Message-----
> > > > From: iGVT-g [mailto:igvt-g-bounces@lists.01.org] On Behalf Of Alex
> > > > Williamson
> > > > Sent: Friday, January 29, 2016 10:00 AM
> > > > To: Gerd Hoffmann
> > > > Cc: igvt-g@ml01.01.org; xen-devel@lists.xensource.com; Eduardo
> > > > Habkost; Stefano Stabellini; qemu-devel@nongnu.org; Cao jin; vfio-
> > > > users@redhat.com
> > > > Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough
> > > > chipset tweaks
> > > > 
> > > > Do guest drivers depend on IGD appearing at 00:02.0?  I'm currently
> > > > testing for any Intel VGA device, but I wonder if I should only be
> > > > enabling anything opregion if it also appears at a specific address.
> > > > 
> > > 
> > > No.  Both Windows and Linux IGD drivers should work at any PCI slot.  We
> > > have seen 0:5.0 in the guest and the driver works.
> > 
> > Thanks Allen.  Another question, when I boot a VM with an assigned HD
> > P4000 GPU, my console streams with IOMMU faults, like:
> > 
> > DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa30000
> > DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa30000
> > DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa30000
> > DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa30000
> > DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa30000
> > 
> > All of these fall within the host RMRR range for the device:
> > 
> > DMAR: Setting RMRR:
> > DMAR: Setting identity map for device 0000:00:02.0 [0x9f800000 - 0xaf9fffff]
> 
> Hi Alex,
> 
> Do you configure IGD as primary or secondary display in your KVM setup?   If primary, are you running Intel vBIOS as part of guest boot?
> 
> On BDW/SKL systems, we have started to configure IGD as secondary and QEMU VGA as primary.  In this setup, we are no longer running vBIOS in the guest, which avoids some complications.  vBIOS uses
> stolen memory for display buffers, which requires RMRR mapping.  We have been using a similar setup (IGD as secondary) on other hypervisors and have not seen IOMMU faults.
> 
> I will set up a KVM configuration on my SKL and see if I can duplicate your problem here.   I will try to call into Don's Thursday meeting to discuss this (I'm on call for jury duty this week).  I
> will give you a heads up on Wednesday evening.

Hi Allen,

I'm currently trying to run as primary, but I don't get any output until
well into the guest boot, so clearly the Intel vBIOS is not happy,
regardless of whether I provide VGA region access.  When I try to run as
secondary I don't get any output at all on the assigned device and the
FC23 Live CD I'm booting doesn't appear to see the IGD output.  I've
only just started playing with actually using it though, so perhaps I
haven't dialed it in just yet.  I will note though that the DMAR faults
occur well after the vBIOS would have run; I see that the i915 driver
reads the stolen memory base from config register 0x5c.  Emulating this
register as returning 0x0 avoids the DMAR faults and fixes corruption of
the framebuffer, so this doesn't appear to be exclusive to the vBIOS.
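
To make the range claim concrete, here is a minimal sketch that checks the
logged fault address against the identity-map range quoted above.  The struct
and function names are hypothetical and exist only for this illustration; the
only values taken from the thread are the three addresses in the log.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one RMRR identity-mapped range (inclusive). */
struct rmrr_range {
    uint64_t base;
    uint64_t end;
};

/* Range from the "Setting identity map" line in the host log above. */
static const struct rmrr_range igd_rmrr = { 0x9f800000ULL, 0xaf9fffffULL };

/* Returns true when addr lies inside the inclusive range. */
static bool rmrr_contains(const struct rmrr_range *r, uint64_t addr)
{
    return addr >= r->base && addr <= r->end;
}
```

With these values, rmrr_contains(&igd_rmrr, 0x9fa30000) is true, which matches
the observation that the faulting writes land inside the range the host
firmware reserved for the device.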

Regardless of which we intend to support, device assignment is an
advanced topic for most users and I think we need to do something to
protect users from having their VM memory stomped on by an IGD device
writing framebuffer data over RAM.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
  2016-02-02 11:50                 ` David Woodhouse
@ 2016-02-02 14:54                   ` Alex Williamson
  -1 siblings, 0 replies; 132+ messages in thread
From: Alex Williamson @ 2016-02-02 14:54 UTC (permalink / raw)
  To: David Woodhouse, Tian, Kevin, Kay, Allen M, Gerd Hoffmann
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	qemu-devel, Cao jin, vfio-users

On Tue, 2016-02-02 at 11:50 +0000, David Woodhouse wrote:
> On Tue, 2016-02-02 at 06:42 +0000, Tian, Kevin wrote:
> > > From: Kay, Allen M
> > > Sent: Tuesday, February 02, 2016 8:04 AM
> > > > 
> > > > David notes in the latter commit above:
> > > > 
> > > > "We should be able to successfully assign graphics devices to guests too, as
> > > > long as the initial handling of stolen memory is reconfigured appropriately."
> > > > 
> > > > What code is supposed to be doing that reconfiguration when a device is
> > > > assigned?  Clearly we don't have it yet, making assignment of these devices
> > > > very unsafe.  It seems like vfio or IOMMU code  in the kernel needs device
> > > > specific code to clear these settings to make it safe for userspace, then
> > > > perhaps VM BIOS support to reallocate.  Is there any consistency across IGD
> > > > revisions for doing this?  Is there a spec?
> > > > Thanks,
> 
> I haven't ever successfully assigned an IGD device to a VM myself, but
> my understanding was that it *has* been done. So when the code was
> changed to prevent assignment of devices afflicted by RMRRs (except USB
> where we know it's OK), I just added the integrated graphics to that
> same exception as USB, to preserve the status quo ante.

It had been successfully done on /Xen/, not on anything that actually made
use of that exclusion, so there was no status quo to preserve.

> > I don't think stolen memory should be handled explicitly. If yes, it should be
> > listed as a RMRR region so general RMRR setup will cover it. But as Allen
> > pointed out, the whole RMRR becomes unnecessary if we target only secondary
> > device for IGD.
> 
> Perhaps the best option is *not* to have special cases in the IOMMU
> code for "those devices which can safely be assigned despite RMRRs".
> 
> Instead, let's let the device driver — or whatever — tell the IOMMU
> code when it's *stopped* the firmware from (ab)using the device's DMA
> facilities.
> 
> So when the USB code does the handoff thing to quiesce the firmware's
> access to USB and take over in the OS, it would call the IOMMU function
> to revoke the RMRR for the USB controller.
> 
> And if/when the graphics driver resets its device into a state where
> it's no longer accessing stolen memory and can be assigned to a VM, it
> can also call that 'RMRR revoke' function.
> 
> Likewise, if we teach device drivers to cancel whatever abominations
> the HP firmware tends to set up behind the OS's back on other PCI
> devices, we can cancel the RMRRs for those too.
> 
> Then the IOMMU code has a simple choice and no special cases — we can
> assign a device iff it has no active RMRR.

At first glance I like it, but there's a problem: it assumes there is a
host driver for the device that will permanently release the device from
the RMRR even after the device is unbound.  Currently we don't have a
requirement that the user must first bind the device to a native host
driver, unbind it, and only then is it eligible for device assignment.
In fact with GPUs we often blacklist the native driver or attach them
directly to a stub driver to avoid the host driver.  Maybe that issue
works itself out since the IOMMU won't allow access to the device
without this step, but it means that i915 needs to be better than most
graphics drivers when it comes to unbinding the device (which is not a
very high bar).  Of course, as I've shown on IGD, it's not simply a
matter of declaring the RMRR unused; some reconfiguration of the device
is necessary such that the guest driver doesn't try to start using that
same reserved range.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
  2016-02-02 14:54                   ` Alex Williamson
@ 2016-02-02 15:06                     ` David Woodhouse
  -1 siblings, 0 replies; 132+ messages in thread
From: David Woodhouse @ 2016-02-02 15:06 UTC (permalink / raw)
  To: Alex Williamson, Tian, Kevin, Kay, Allen M, Gerd Hoffmann
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	qemu-devel, Cao jin, vfio-users

[-- Attachment #1: Type: text/plain, Size: 1323 bytes --]

On Tue, 2016-02-02 at 07:54 -0700, Alex Williamson wrote:
> 
> At first glance I like it, but there's a problem: it assumes there is a
> host driver for the device that will permanently release the device from
> the RMRR even after the device is unbound.  Currently we don't have a
> requirement that the user must first bind the device to a native host
> driver, unbind it, and only then is it eligible for device assignment.

It doesn't *have* to be a full native driver. It can be a PCI quirk
(the USB controllers could potentially do it that way, although they
don't). Or a stub 'shut it down' driver, potentially even done somehow
through VFIO.

But fundamentally, in all of these cases you have to do *something* to
stop the BIOS-controlled DMA. Otherwise the RMRR shouldn't have been
there in the first place, surely?
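
As a rough illustration of the rule being discussed ("assign a device iff it
has no active RMRR"), here is a small model in C.  Everything in it is
hypothetical: the struct, the field names and the functions are invented for
this sketch and do not correspond to existing kernel interfaces.  It only
shows that the decision collapses to one check once some agent (native driver,
PCI quirk, or stub driver) has reported that firmware DMA was stopped.

```c
#include <stdbool.h>

/* Hypothetical per-device bookkeeping; not an existing kernel structure. */
struct dev_rmrr_state {
    const char *name;
    bool has_rmrr;      /* firmware declared an RMRR for this device */
    bool rmrr_revoked;  /* something stopped the firmware-controlled DMA */
};

/* Called by a driver, PCI quirk, or stub once firmware DMA is quiesced. */
static void rmrr_revoke(struct dev_rmrr_state *d)
{
    d->rmrr_revoked = true;
}

/* An RMRR is active if it was declared and nobody has revoked it yet. */
static bool rmrr_active(const struct dev_rmrr_state *d)
{
    return d->has_rmrr && !d->rmrr_revoked;
}

/* The single rule: a device is assignable iff it has no active RMRR. */
static bool device_assignable(const struct dev_rmrr_state *d)
{
    return !rmrr_active(d);
}
```

Note that the revoked flag lives with the device, not with the driver, so in
this model it would survive an unbind.  The sketch says nothing about what has
to be done to the hardware before calling rmrr_revoke(), which is the open
question for the gfx case.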

But for the gfx case... what *do* we have to do? Does the VMM (and the
VM's BIOS, between them) have to provision a "stolen" region of guest
memory and point the gfx framebuffer at that?

Once we have a proper handle on precisely what needs to happen, we can
have a better conversation about where/how to do that...

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5691 bytes --]

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
  2016-02-02  8:56             ` Gerd Hoffmann
@ 2016-02-02 16:31               ` Kevin O'Connor
  -1 siblings, 0 replies; 132+ messages in thread
From: Kevin O'Connor @ 2016-02-02 16:31 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: igvt-g, Tian, Kevin, xen-devel, Eduardo Habkost,
	Stefano Stabellini, seabios, qemu-devel, Alex Williamson,
	Cao jin, vfio-users, Laszlo Ersek

On Tue, Feb 02, 2016 at 09:56:20AM +0100, Gerd Hoffmann wrote:
>   Hi,
> 
> > > I'd have qemu copy the data on 0xfc write then, so things continue to
> > > work without updating seabios.  So, the firmware has to allocate space,
> > > reserve it etc., and program the 0xfc register.  Qemu has to make
> > > sure the opregion appears at the address written by the firmware, by
> > > whatever method it prefers.
> > 
> > Yup. It's Qemu's responsibility to expose opregion content. 
> > 
> > btw, I prefer copying here. It's pointless to allow writes from the guest
> > side. One write example is the SWSCI mailbox, through which the gfx driver can
> > trigger an SCI event to communicate with the BIOS (specifically ACPI
> > methods here), mostly for some monitor operations. However, it's
> > not right for the guest to trigger a host SCI and thus kick host
> > ACPI methods.
> 
> Thanks.
> 
> So, question again how we do that best.  Option one being the mmap way,
> i.e. basically what the patches posted by alex are doing.  Option two
> being the fw_cfg way, i.e. place an opregion copy in fw_cfg and have
> seabios not only set 0xfc, but also store the opregion there by copying
> from fw_cfg.

What about option 2a - SeaBIOS copies from fw_cfg to memory and then
programs 0xfc.  QEMU can detect the write to 0xfc and choose to map
that ram (thus completely ignoring the contents that were just copied
in) or it can choose not to map that ram (thus guest uses the contents
just copied in).

The advantage of this approach is that it is a bit simpler in the
firmware (no size probing is needed as the size comes from fw_cfg) and
it allows for future flexibility as the choice of mapping can be
deferred.

Totally untested seabios code below as example.

As an aside, if this type of "program a pci register with a memory
address" operation becomes common, we could enhance the acpi-style "linker
script" system to automate this.

-Kevin


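static void intel_igd_opregion_setup(struct pci_device *dev, void *arg)
{
    /* QEMU is assumed to expose an OpRegion copy under this fw_cfg name. */
    struct romfile_s *file = romfile_find("etc/igd-opregion");
    if (!file)
        return;
    /* The size comes from fw_cfg, so no size probing is needed. */
    void *data = memalign_high(PAGE_SIZE, file->size);
    if (!data) {
        warn_noalloc();
        return;
    }
    /* Copy the OpRegion contents from fw_cfg into the allocated memory. */
    int ret = file->copy(file, data, file->size);
    if (ret < 0) {
        free(data);
        return;
    }
    /* Program the address; QEMU can trap this write and decide to map. */
    pci_config_writel(dev->bdf, 0xFC, (u32)data);
}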
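
For the other half of option 2a, the following is a toy model of what the QEMU
side might do when it sees the write to 0xFC.  It is a sketch under stated
assumptions, not QEMU code: the struct, the register macro name and the hook
function are all hypothetical, and the actual mapping step is reduced to a
flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Config offset the firmware programs, per the thread; name is made up. */
#define IGD_OPREGION_REG 0xFC

/* Hypothetical device model state; not QEMU's actual structures. */
struct igd_model {
    uint32_t opregion_gpa;   /* last guest-physical address written to 0xFC */
    bool map_host_opregion;  /* policy: overlay host OpRegion or keep copy */
    bool host_mapped;        /* outcome of the decision for the last write */
};

/* Toy config-space write hook: only reacts to writes at 0xFC. */
static void igd_config_write(struct igd_model *m, uint32_t off, uint32_t val)
{
    if (off != IGD_OPREGION_REG)
        return;
    m->opregion_gpa = val;
    /*
     * Either map host memory at val (the firmware's copy is then ignored),
     * or do nothing and let the guest use what the firmware copied in.
     */
    m->host_mapped = m->map_host_opregion;
}
```

The point of the sketch is that the firmware behaves identically in both
cases; the choice between mapping and copying stays entirely on the QEMU side
and can be changed later without touching the firmware.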

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
  2016-02-02 16:31               ` Kevin O'Connor
@ 2016-02-02 16:49                 ` Laszlo Ersek
  -1 siblings, 0 replies; 132+ messages in thread
From: Laszlo Ersek @ 2016-02-02 16:49 UTC (permalink / raw)
  To: Kevin O'Connor, Gerd Hoffmann
  Cc: igvt-g, Tian, Kevin, xen-devel, Eduardo Habkost,
	Stefano Stabellini, Michael S. Tsirkin, seabios, qemu-devel,
	Alex Williamson, Cao jin, vfio-users, Igor Mammedov

Including Igor & MST

Thanks
Laszlo

On 02/02/16 17:31, Kevin O'Connor wrote:
> On Tue, Feb 02, 2016 at 09:56:20AM +0100, Gerd Hoffmann wrote:
>>   Hi,
>>
>>>> I'd have qemu copy the data on 0xfc write then, so things continue to
>>>> work without updating seabios.  So, the firmware has to allocate space,
>>>> reserve it etc., and program the 0xfc register.  Qemu has to make
>>>> sure the opregion appears at the address written by the firmware, by
>>>> whatever method it prefers.
>>>
>>> Yup. It's Qemu's responsibility to expose opregion content. 
>>>
>>> btw, I prefer copying here. It's pointless to allow writes from the guest
>>> side. One write example is the SWSCI mailbox, through which the gfx driver can
>>> trigger an SCI event to communicate with the BIOS (specifically ACPI
>>> methods here), mostly for some monitor operations. However, it's
>>> not right for the guest to trigger a host SCI and thus kick host
>>> ACPI methods.
>>
>> Thanks.
>>
>> So, question again how we do that best.  Option one being the mmap way,
>> i.e. basically what the patches posted by alex are doing.  Option two
>> being the fw_cfg way, i.e. place an opregion copy in fw_cfg and have
>> seabios not only set 0xfc, but also store the opregion there by copying
>> from fw_cfg.
> 
> What about option 2a - SeaBIOS copies from fw_cfg to memory and then
> programs 0xfc.  QEMU can detect the write to 0xfc and choose to map
> that ram (thus completely ignoring the contents that were just copied
> in) or it can choose not to map that ram (thus guest uses the contents
> just copied in).
> 
> The advantage of this approach is that it is a bit simpler in the
> firmware (no size probing is needed as the size comes from fw_cfg) and
> it allows for future flexibility as the choice of mapping can be
> deferred.
> 
> Totally untested seabios code below as example.
> 
> As an aside, if this type of "program a pci register with a memory
> address" operation becomes common, we could enhance the acpi-style "linker
> script" system to automate this.
> 
> -Kevin
> 
> 
> static void intel_igd_opregion_setup(struct pci_device *dev, void *arg)
> {
>     struct romfile_s *file = romfile_find("etc/igd-opregion");
>     if (!file)
>         return;
>     void *data = memalign_high(PAGE_SIZE, file->size);
>     if (!data) {
>         warn_noalloc();
>         return;
>     }
>     int ret = file->copy(file, data, file->size);
>     if (ret < 0) {
>         free(data);
>         return;
>     }
>     pci_config_writel(dev->bdf, 0xFC, (u32)data);
> }
> 

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
@ 2016-02-02 19:10             ` Kay, Allen M
  0 siblings, 0 replies; 132+ messages in thread
From: Kay, Allen M @ 2016-02-02 19:10 UTC (permalink / raw)
  To: Tian, Kevin, Alex Williamson, Gerd Hoffmann, qemu-devel
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini, Cao jin,
	vfio-users



> -----Original Message-----
> From: Tian, Kevin
> Sent: Monday, February 01, 2016 11:08 PM
> To: Kay, Allen M; Alex Williamson; Gerd Hoffmann; qemu-devel@nongnu.org
> Cc: igvt-g@ml01.01.org; xen-devel@lists.xensource.com; Eduardo Habkost;
> Stefano Stabellini; Cao jin; vfio-users@redhat.com
> Subject: RE: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset
> tweaks
> 
> > From: Kay, Allen M
> > Sent: Saturday, January 30, 2016 5:58 AM
> >
> > First of all, I would like to clarify I'm talking about general IGD
> > passthrough case - not specific to KVMGT.  In IGD passthrough
> > configuration, one of the following will happen when the driver accesses
> > OpRegion:
> >
> > 1) If the hypervisor sets up OpRegion GPA/HPA mapping, either by
> > pre-map it (i.e. Xen) or map it during EPT page fault (i.e. KVM),
> > guest can successfully read the content of the OpRegion and check the ID
> > string.  In this case, everything works fine.
> >
> > 2) if the hypervisor does not setup OpRegion GPA/HPA mapping at all,
> > then guest driver's attempt to setup GVA/GPA mapping will fail, which
> > causes the driver to fail.  In this case, guest driver won't have the
> > opportunity to look into the content of OpRegion memory and check the ID
> > string.
> >
> 
> Guest mapping of GVA->GPA can always succeed regardless of whether
> GPA->HPA is valid. Failure will happen only when the GVA is actually
> accessed by guest.
> 

That data comes from a team that debugged IGD passthrough on a closed-source hypervisor that does not map the OpRegion with EPT.  The end result is the same: the driver cannot access the contents of the OpRegion without failing.

> I don't understand 2). If hypervisor doesn't want to setup mapping, there is
> no chance for guest driver to get opregion content, right?

That was precisely the point I was trying to make.  As a result, the guest driver needs some indication from the hypervisor that the address at 0xFC is a GPA that the driver can safely access.  Otherwise it risks an unrecoverable failure on hypervisors that do not map the OpRegion and simply leave the HPA at 0xFC.

Allen

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
  2016-02-02 19:10             ` [iGVT-g] " Kay, Allen M
@ 2016-02-02 19:37               ` Alex Williamson
  -1 siblings, 0 replies; 132+ messages in thread
From: Alex Williamson @ 2016-02-02 19:37 UTC (permalink / raw)
  To: Kay, Allen M, Tian, Kevin, Gerd Hoffmann, qemu-devel
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini, Cao jin,
	vfio-users

On Tue, 2016-02-02 at 19:10 +0000, Kay, Allen M wrote:
> 
> > -----Original Message-----
> > From: Tian, Kevin
> > Sent: Monday, February 01, 2016 11:08 PM
> > To: Kay, Allen M; Alex Williamson; Gerd Hoffmann; qemu-devel@nongnu.org
> > Cc: igvt-g@ml01.01.org; xen-devel@lists.xensource.com; Eduardo Habkost;
> > Stefano Stabellini; Cao jin; vfio-users@redhat.com
> > Subject: RE: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset
> > tweaks
> > 
> > > From: Kay, Allen M
> > > Sent: Saturday, January 30, 2016 5:58 AM
> > > 
> > > > First of all, I would like to clarify I'm talking about general IGD
> > > > passthrough case - not specific to KVMGT.  In IGD passthrough
> > > > configuration, one of the following will happen when the driver accesses
> > > > OpRegion:
> > > > 
> > > > 1) If the hypervisor sets up OpRegion GPA/HPA mapping, either by
> > > > pre-map it (i.e. Xen) or map it during EPT page fault (i.e. KVM),
> > > > guest can successfully read the content of the OpRegion and check the ID
> > > > string.  In this case, everything works fine.
> > > > 
> > > > 2) if the hypervisor does not setup OpRegion GPA/HPA mapping at all,
> > > > then guest driver's attempt to setup GVA/GPA mapping will fail, which
> > > > causes the driver to fail.  In this case, guest driver won't have the
> > > > opportunity to look into the content of OpRegion memory and check the ID
> > > > string.
> > > > 
> > 
> > Guest mapping of GVA->GPA can always succeed regardless of whether
> > GPA->HPA is valid. Failure will happen only when the GVA is actually
> > accessed by guest.
> > 

Hi Allen,

> That is the data from team debugged IGD passthrough on a closed source hypervisor that does not map OpRegion with EPT.  The end result is the same -driver cannot access inside of OpRegion without
> failing.

Define "failing".

> > I don't understand 2). If hypervisor doesn't want to setup mapping, there is
> > no chance for guest driver to get opregion content, right?
> 
> That was precisely the point I was trying to make.  As a result, guest driver needs some indication from the hypervisor that the address at 0xFC contains GPA that can be safely accessed by the
> driver without causing unrecoverable failure on hypervisors that does not map OpRegion - by leaving HPA address at 0xFC.

I think the thing that doesn't make sense to everyone here is that it's
common practice for x86 systems, especially legacy OSes, to probe
memory, get back -1 and move on.  A hypervisor should support that.  So
if there's a bogus address in the ASL Storage register and the driver
tries to read from the GPA indicated by that address, the VM should at
worst get back -1 or a memory space that doesn't contain the graphics
signature.  If there's a super strict hypervisor that doesn't handle the
VM faulting outside of its address space, that's very prone to exploit.
If a driver wants to avoid it anyway, perhaps they should be doing
standard things like checking whether the ASL Storage address falls
within a reserved memory region rather than coming up with ad-hoc
register content based solutions.  Thanks,

Alex
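
A minimal sketch of the two checks described above, for illustration only.
It assumes a hypothetical e820-style table (struct mem_range and
addr_in_reserved() are made-up names, not taken from any driver) and the
16-byte "IntelGraphicsMem" signature that the Linux i915 driver expects at
the start of the OpRegion.  A read that comes back as all ones (-1) simply
fails the signature comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical e820-style memory map entry; names are illustrative. */
enum mem_type { MEM_RAM = 1, MEM_RESERVED = 2 };

struct mem_range {
    uint64_t start;
    uint64_t size;
    enum mem_type type;
};

#define OPREGION_SIGNATURE     "IntelGraphicsMem"
#define OPREGION_SIGNATURE_LEN 16

/* True if [addr, addr + len) lies entirely inside one reserved range. */
static bool addr_in_reserved(const struct mem_range *map, size_t n,
                             uint64_t addr, uint64_t len)
{
    assert(len > 0);
    for (size_t i = 0; i < n; i++) {
        if (map[i].type != MEM_RESERVED)
            continue;
        /* Written to avoid overflow in addr + len. */
        if (addr >= map[i].start &&
            len <= map[i].size &&
            addr - map[i].start <= map[i].size - len)
            return true;
    }
    return false;
}

/* True if the first 16 bytes carry the OpRegion signature.  Memory that
 * reads back as 0xff bytes (an unmapped probe) is rejected here. */
static bool opregion_signature_ok(const void *hdr)
{
    return memcmp(hdr, OPREGION_SIGNATURE, OPREGION_SIGNATURE_LEN) == 0;
}
```

A driver following this idea would call addr_in_reserved() on the value
read from the ASL Storage register before mapping it, and only then look
at the signature.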

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
  2016-02-02 16:31               ` Kevin O'Connor
@ 2016-02-02 20:18                 ` Alex Williamson
  -1 siblings, 0 replies; 132+ messages in thread
From: Alex Williamson @ 2016-02-02 20:18 UTC (permalink / raw)
  To: Kevin O'Connor, Gerd Hoffmann
  Cc: igvt-g, Tian, Kevin, xen-devel, Eduardo Habkost,
	Stefano Stabellini, seabios, qemu-devel, Cao jin, vfio-users,
	Laszlo Ersek

On Tue, 2016-02-02 at 11:31 -0500, Kevin O'Connor wrote:
> On Tue, Feb 02, 2016 at 09:56:20AM +0100, Gerd Hoffmann wrote:
> >   Hi,
> > 
> > > > I'd have qemu copy the data on 0xfc write then, so things continue to
> > > > work without updating seabios.  So, the firmware has to allocate space,
> > > > reserve it etc.,  and programming the 0xfc register.  Qemu has to make
> > > > sure the opregion appears at the address written by the firmware, by
> > > > whatever method it prefers.
> > > 
> > > Yup. It's Qemu's responsibility to expose opregion content. 
> > > 
> > > btw, prefer to do copying here. It's pointless to allow write from guest
> > > side. One write example is SWSCI mailbox, thru which gfx driver can
> > > trigger some SCI event to communicate with BIOS (specifically ACPI
> > > methods here), mostly for some monitor operations. However it's 
> > > not a right thing for guest to trigger host SCI and thus kick host 
> > > ACPI methods.
> > 
> > Thanks.
> > 
> > So, question again how we do that best.  Option one being the mmap way,
> > i.e. basically what the patches posted by alex are doing.  Option two
> > being the fw_cfg way, i.e. place a opregion copy in fw_cfg and have
> > seabios not only set 0xfc, but also store the opregion there by copying
> > from fw_cfg.
> 
> What about option 2a - SeaBIOS copies from fw_cfg to memory and then
> programs 0xfc.  QEMU can detect the write to 0xfc and choose to map
> that ram (thus completely ignoring the contents that were just copied
> in) or it can choose not to map that ram (thus guest uses the contents
> just copied in).
> 
> The advantage of this approach is that it is a bit simpler in the
> firmware (no size probing is needed as the size comes from fw_cfg) and
> it allows for future flexibility as the choice of mapping can be
> deferred.
> 
> Totally untested seabios code below as example.
> 
> As an aside, if this type of "program a pci register" with a memory
> address becomes common, we could enhance the acpi-style "linker
> script" system to automate this..
> 
> -Kevin
> 
> 
> static void intel_igd_opregion_setup(struct pci_device *dev, void *arg)
> {
>     struct romfile_s *file = romfile_find("etc/igd-opregion");
>     if (!file)
>         return;
>     void *data = memalign_high(PAGE_SIZE, file->size);
>     if (!data) {
>         warn_noalloc();
>         return;
>     }
>     int ret = file->copy(file, data, file->size);
>     if (ret < 0) {
>         free(data);
>         return;
>     }
>     pci_config_writel(dev->bdf, 0xFC, (u32)data);
> }

I posted a v2 of the last QEMU patch and the SeaBIOS patch that takes
this approach with an option in QEMU for the direct map.  Thanks,

Alex
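
For illustration, the QEMU-side choice in option 2a can be modeled as
below.  This is a simplified, self-contained sketch and not the posted
patch: struct igd_model, igd_config_write32() and guest_read8() are
hypothetical names, guest RAM is a flat array, and the "direct map" is
modeled as a read overlay rather than a real memory-region mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IGD_ASLS 0xFC   /* ASL Storage register in PCI config space */

/* Hypothetical device model state; none of these names come from QEMU. */
struct igd_model {
    uint8_t config[256];          /* emulated PCI config space */
    const uint8_t *host_opregion; /* host OpRegion contents */
    size_t opregion_size;
    bool direct_map;              /* true: expose host contents at the GPA */
    uint8_t *guest_ram;           /* flat model of guest RAM */
    size_t guest_ram_size;
    uint32_t mapped_gpa;          /* GPA of the overlay, 0 if none */
};

/* Store a 32-bit config write; on a write to 0xFC decide whether the
 * firmware's copy stays visible or the host OpRegion is overlaid.
 * Returns false if the programmed range does not fit in guest RAM. */
static bool igd_config_write32(struct igd_model *d, uint32_t off, uint32_t val)
{
    assert(off <= sizeof(d->config) - 4);
    for (int i = 0; i < 4; i++)
        d->config[off + i] = (uint8_t)(val >> (8 * i));
    if (off != IGD_ASLS)
        return true;

    d->mapped_gpa = 0;            /* drop any previous overlay */
    if (val == 0)
        return true;
    if (val >= d->guest_ram_size ||
        d->opregion_size > d->guest_ram_size - val)
        return false;
    if (d->direct_map)
        d->mapped_gpa = val;      /* firmware's copy is now hidden */
    return true;                  /* otherwise the copy stays in use */
}

/* Guest-visible read: overlay first, then RAM, then 0xff for holes. */
static uint8_t guest_read8(const struct igd_model *d, uint32_t gpa)
{
    if (d->direct_map && d->mapped_gpa &&
        gpa >= d->mapped_gpa && gpa - d->mapped_gpa < d->opregion_size)
        return d->host_opregion[gpa - d->mapped_gpa];
    return gpa < d->guest_ram_size ? d->guest_ram[gpa] : 0xff;
}
```

With direct_map off the guest keeps reading whatever the firmware copied
from fw_cfg; with it on, the same GPA shows the host contents instead, so
the firmware side does not need to know which mode is in effect.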

^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
  2016-02-02 19:37               ` Alex Williamson
@ 2016-02-02 23:32                 ` Kay, Allen M
  -1 siblings, 0 replies; 132+ messages in thread
From: Kay, Allen M @ 2016-02-02 23:32 UTC (permalink / raw)
  To: Alex Williamson, Tian, Kevin, Gerd Hoffmann, qemu-devel
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini, Cao jin,
	vfio-users



> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Tuesday, February 02, 2016 11:37 AM
> To: Kay, Allen M; Tian, Kevin; Gerd Hoffmann; qemu-devel@nongnu.org
> Cc: igvt-g@ml01.01.org; xen-devel@lists.xensource.com; Eduardo Habkost;
> Stefano Stabellini; Cao jin; vfio-users@redhat.com
> Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset
> tweaks
> 
> On Tue, 2016-02-02 at 19:10 +0000, Kay, Allen M wrote:
> >
> > > -----Original Message-----
> > > From: Tian, Kevin
> > > Sent: Monday, February 01, 2016 11:08 PM
> > > To: Kay, Allen M; Alex Williamson; Gerd Hoffmann;
> > > qemu-devel@nongnu.org
> > > Cc: igvt-g@ml01.01.org; xen-devel@lists.xensource.com; Eduardo
> > > Habkost; Stefano Stabellini; Cao jin; vfio-users@redhat.com
> > > Subject: RE: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough
> > > chipset tweaks
> > >
> > > > From: Kay, Allen M
> > > > Sent: Saturday, January 30, 2016 5:58 AM
> > > >
> > > > > First of all, I would like to clarify I'm talking about general
> > > > > IGD passthrough case - not specific to KVMGT.  In IGD passthrough
> > > > > configuration, one of the following will happen when the driver
> > > > > accesses OpRegion:
> > > > >
> > > > > 1) If the hypervisor sets up OpRegion GPA/HPA mapping, either by
> > > > > pre-map it (i.e. Xen) or map it during EPT page fault (i.e. KVM),
> > > > > guest can successfully read the content of the OpRegion and check
> > > > > the ID string.  In this case, everything works fine.
> > > > >
> > > > > 2) if the hypervisor does not setup OpRegion GPA/HPA mapping at
> > > > > all, then guest driver's attempt to setup GVA/GPA mapping will
> > > > > fail, which causes the driver to fail.  In this case, guest driver
> > > > > won't have the opportunity to look into the content of OpRegion
> > > > > memory and check the ID string.
> > > > >
> > >
> > > Guest mapping of GVA->GPA can always succeed regardless of whether
> > > GPA->HPA is valid. Failure will happen only when the GVA is actually
> > > accessed by guest.
> > >
> 
> Hi Allen,
> 
> > That is the data from team debugged IGD passthrough on a closed source
> > hypervisor that does not map OpRegion with EPT.  The end result is the
> > same -driver cannot access inside of OpRegion without failing.
> 
> Define "failing".
> 

Hi Alex,

The reported behavior is that the OpRegion mapping in the guest fails, which causes the driver to fail to load.  However, I think what you described below is reasonable.  I will take a closer look at it after I get my KVM environment set up.

Allen

> > > I don't understand 2). If hypervisor doesn't want to setup mapping,
> > > there is no chance for guest driver to get opregion content, right?
> >
> > That was precisely the point I was trying to make.  As a result, guest
> > driver needs some indication from the hypervisor that the address at 0xFC
> > contains GPA that can be safely accessed by the driver without causing
> > unrecoverable failure on hypervisors that does not map OpRegion - by
> > leaving HPA address at 0xFC.
> 
> I think the thing that doesn't make sense to everyone here is that it's
> common practice for x86 systems, especially legacy OSes, to probe memory,
> get back -1 and move on.  A hypervisor should support that.  So if there's a
> bogus address in the ASL Storage register and the driver tries to read from
> the GPA indicated by that address, the VM should at worst get back -1 or a
> memory space that doesn't contain the graphics signature.  
> If there's a super strict hypervisor that doesn't handle the VM faulting outside of it's address
> space, that's very prone to exploit.
> If a driver wants to avoid it anyway, perhaps they should be doing standard
> things like checking whether the ASL Storage address falls within a reserved
> memory region rather than coming up with ad-hoc register content based
> solutions.  Thanks,
> 
> Alex


^ permalink raw reply	[flat|nested] 132+ messages in thread

* Re: [Qemu-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
  2016-02-02  8:56             ` Gerd Hoffmann
@ 2016-02-03  6:08               ` Tian, Kevin
  -1 siblings, 0 replies; 132+ messages in thread
From: Tian, Kevin @ 2016-02-03  6:08 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: igvt-g, xen-devel, Eduardo Habkost, Stefano Stabellini,
	qemu-devel, Alex Williamson, Cao jin, vfio-users

> From: Gerd Hoffmann [mailto:kraxel@redhat.com]
> Sent: Tuesday, February 02, 2016 4:56 PM
> 
>   Hi,
> 
> > > I'd have qemu copy the data on 0xfc write then, so things continue to
> > > work without updating seabios.  So, the firmware has to allocate space,
> > > reserve it etc.,  and programming the 0xfc register.  Qemu has to make
> > > sure the opregion appears at the address written by the firmware, by
> > > whatever method it prefers.
> >
> > Yup. It's Qemu's responsibility to expose opregion content.
> >
> > btw, prefer to do copying here. It's pointless to allow write from guest
> > side. One write example is SWSCI mailbox, thru which gfx driver can
> > trigger some SCI event to communicate with BIOS (specifically ACPI
> > methods here), mostly for some monitor operations. However it's
> > not a right thing for guest to trigger host SCI and thus kick host
> > ACPI methods.
> 
> Thanks.
> 
> So, question again how we do that best.  Option one being the mmap way,
> i.e. basically what the patches posted by alex are doing.  Option two
> being the fw_cfg way, i.e. place a opregion copy in fw_cfg and have
> seabios not only set 0xfc, but also store the opregion there by copying
> from fw_cfg.
> 
> Advantage of option one is that we'll keep the option to do things in a
> different way in the future, without breaking the guest/qemu interface.
> 
> Disadvantage is that it'll cause hugepage mappings to be splitted.
> 

That depends on where you pick the gfn to map or copy the opregion to. On
physical machines the opregion usually sits close to the MMIO region, where
several other reserved e820 entries also exist. If we do the same for the
virtual opregion, it shouldn't impact hugepages.

Thanks
Kevin
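
A rough way to reason about the hugepage concern, as a sketch only.  It
assumes 2 MiB huge pages and a single contiguous guest RAM range, and the
helper name splits_ram_hugepage() is made up for this example.  A carve-out
only costs a huge page if it lands in a 2 MiB frame that is otherwise
entirely RAM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HUGEPAGE_SIZE (2ULL << 20)   /* assumed 2 MiB huge pages */

/* True if carving [addr, addr + len) out of the guest address space hits
 * at least one 2 MiB frame that lies fully inside guest RAM, i.e. a frame
 * that could otherwise have been backed by a single huge page. */
static bool splits_ram_hugepage(uint64_t ram_start, uint64_t ram_size,
                                uint64_t addr, uint64_t len)
{
    assert(len > 0);
    uint64_t ram_end = ram_start + ram_size;
    uint64_t first = addr & ~(HUGEPAGE_SIZE - 1);
    uint64_t last = (addr + len - 1) & ~(HUGEPAGE_SIZE - 1);

    for (uint64_t frame = first; ; frame += HUGEPAGE_SIZE) {
        if (frame >= ram_start && frame + HUGEPAGE_SIZE <= ram_end)
            return true;
        if (frame == last)
            break;
    }
    return false;
}
```

Under these assumptions, an opregion placed next to the MMIO hole among
other reserved entries falls outside any all-RAM frame, while one placed
in the middle of RAM does not.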

^ permalink raw reply	[flat|nested] 132+ messages in thread
