* [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
       [not found] <b62fbc602a629941c1acaad3b93d250a3eba33c0.1647222184.git.brchuckz.ref@netscape.net>
@ 2022-03-14  3:41 ` Chuck Zmudzinski
  2022-03-15 11:38   ` Jan Beulich
                     ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Chuck Zmudzinski @ 2022-03-14  3:41 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Anthony PERARD, Juergen Gross, Jan Beulich

When gfx_passthru is enabled for the Intel IGD, hvmloader maps the IGD
opregion to the guest but libxl does not grant the guest permission to
access the mapped memory region. This results in a crash of the i915.ko
kernel module in a Linux HVM guest when it needs to access the IGD
opregion:

Oct 23 11:36:33 domU kernel: Call Trace:
Oct 23 11:36:33 domU kernel:  ? idr_alloc+0x39/0x70
Oct 23 11:36:33 domU kernel:  drm_get_last_vbltimestamp+0xaa/0xc0 [drm]
Oct 23 11:36:33 domU kernel:  drm_reset_vblank_timestamp+0x5b/0xd0 [drm]
Oct 23 11:36:33 domU kernel:  drm_crtc_vblank_on+0x7b/0x130 [drm]
Oct 23 11:36:33 domU kernel:  intel_modeset_setup_hw_state+0xbd4/0x1900 [i915]
Oct 23 11:36:33 domU kernel:  ? _cond_resched+0x16/0x40
Oct 23 11:36:33 domU kernel:  ? ww_mutex_lock+0x15/0x80
Oct 23 11:36:33 domU kernel:  intel_modeset_init_nogem+0x867/0x1d30 [i915]
Oct 23 11:36:33 domU kernel:  ? gen6_write32+0x4b/0x1c0 [i915]
Oct 23 11:36:33 domU kernel:  ? intel_irq_postinstall+0xb9/0x670 [i915]
Oct 23 11:36:33 domU kernel:  i915_driver_probe+0x5c2/0xc90 [i915]
Oct 23 11:36:33 domU kernel:  ? vga_switcheroo_client_probe_defer+0x1f/0x40
Oct 23 11:36:33 domU kernel:  ? i915_pci_probe+0x3f/0x150 [i915]
Oct 23 11:36:33 domU kernel:  local_pci_probe+0x42/0x80
Oct 23 11:36:33 domU kernel:  ? _cond_resched+0x16/0x40
Oct 23 11:36:33 domU kernel:  pci_device_probe+0xfd/0x1b0
Oct 23 11:36:33 domU kernel:  really_probe+0x222/0x480
Oct 23 11:36:33 domU kernel:  driver_probe_device+0xe1/0x150
Oct 23 11:36:33 domU kernel:  device_driver_attach+0xa1/0xb0
Oct 23 11:36:33 domU kernel:  __driver_attach+0x8a/0x150
Oct 23 11:36:33 domU kernel:  ? device_driver_attach+0xb0/0xb0
Oct 23 11:36:33 domU kernel:  ? device_driver_attach+0xb0/0xb0
Oct 23 11:36:33 domU kernel:  bus_for_each_dev+0x78/0xc0
Oct 23 11:36:33 domU kernel:  bus_add_driver+0x12b/0x1e0
Oct 23 11:36:33 domU kernel:  driver_register+0x8b/0xe0
Oct 23 11:36:33 domU kernel:  ? 0xffffffffc06b8000
Oct 23 11:36:33 domU kernel:  i915_init+0x5d/0x70 [i915]
Oct 23 11:36:33 domU kernel:  do_one_initcall+0x44/0x1d0
Oct 23 11:36:33 domU kernel:  ? do_init_module+0x23/0x260
Oct 23 11:36:33 domU kernel:  ? kmem_cache_alloc_trace+0xf5/0x200
Oct 23 11:36:33 domU kernel:  do_init_module+0x5c/0x260
Oct 23 11:36:33 domU kernel:  __do_sys_finit_module+0xb1/0x110
Oct 23 11:36:33 domU kernel:  do_syscall_64+0x33/0x80
Oct 23 11:36:33 domU kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct 23 11:36:33 domU kernel: RIP: 0033:0x7f188e1aa9b9

This bug first appeared with commits abfb006f1ff4 and 0561e1f01e87
during the development of Xen 4.5 in 2014 and is present in all Xen
versions 4.5 and higher.

Currently, because of another bug in Qemu upstream, this crash can
only be reproduced using the traditional Qemu device model, and of
course it can only be reproduced on a system with an Intel IGD and
compatible hardware and system BIOS that supports Intel VT-d. It also
only occurs with Linux Intel graphics drivers (i915) in a Linux guest.
It does not occur in a Windows guest with proprietary Windows Intel
graphics drivers. This testing was done with Qemu traditional running as
a process in dom0.

Commit abfb006f1ff4 was intended to explicitly grant access to all
needed I/O memory ranges so that access requests by guests would not
fail after commit 0561e1f01e87 was applied, but it failed to handle the
case of an Intel IGD being passed to an HVM guest for VGA passthrough.
This patch grants access to the Intel IGD opregion if an Intel IGD is
passed to an HVM guest and gfx_passthru is enabled in the xl.cfg guest
configuration file, in addition to the other memory that commit
abfb006f1ff4 already configured for access.

The fix is implemented as follows:

The first hunk of the patch adds two macros that determine the starting
address and size of the Intel IGD opregion: PCI_INTEL_OPREGION is the
offset in the PCI config space where the opregion address is stored and
matches the value in tools/firmware/hvmloader/pci_regs.h;
IGD_OPREGION_PAGES is the size of the opregion in pages and matches the
value in tools/firmware/hvmloader/config.h.

The second hunk is the new sysfs_dev_get_igd_opregion function, using
the same coding style as the other sysfs_dev_get_xxx functions in
the patched file. It returns the start address of the Intel IGD opregion
from dom0's point of view if there are no errors, and it is called by
code in the third and final hunk of the patch.

The third hunk extends the libxl__grant_vga_iomem_permission function,
which was newly added by one of the commits being fixed (abfb006f1ff4).
In addition to what it did before, that function now checks if we have
an Intel IGD and, if so, calls the new sysfs_dev_get_igd_opregion
function to get the starting address of the memory to grant access to.

This problem was discovered by building and testing versions of
Xen 4.5-unstable until the aforementioned patches that broke passthrough
of the Intel IGD to a Linux HVM guest were found.

That alone, though, was not enough information to arrive at this fix.
After identifying abfb006f1ff4 and 0561e1f01e87 as the commits that were
causing the trouble, it was necessary to find out what memory was being
denied that previously was allowed. By examining verbose logs from both
Qemu and Xen, and the logs from a custom build of Xen that added a
custom log entry to indicate the address of the memory that was being
denied, it was possible to determine without doubt that the memory that
was being denied was the Intel IGD opregion.

Windows HVM guests are not affected by this issue, presumably because
the proprietary Windows Intel graphics drivers do not need to access the
Intel IGD opregion if for some reason it is inaccessible.

Guidelines for maintaining this code: This code needs to be kept
consistent with how hvmloader maps the Intel IGD opregion to the guest,
how hvmloader and libxenlight detect an Intel IGD on the PCI bus, and
how Qemu sets up the mapped IGD opregion in the guest and detects an
Intel IGD. For example, all these components should agree on the size of
the IGD opregion.

The most important point for the future is accurate detection of the
Intel IGD, which libxl__is_igd_vga_passthru currently provides. This
patch does not modify that function, but it does use it. It will be
important to ensure that function accurately detects an Intel IGD,
because that is the only way we validate the contents of the Intel
IGD opregion that we are permitting the guest to access. Currently, if
we have a VGA device, the vendor is Intel, and the gfx_passthru option
is enabled, we assume the contents of the memory we are mapping to
and sharing with the guest are valid. The libxl__is_igd_vga_passthru
function also reads the device id, but currently it assumes every
Intel GPU device has an opregion. So, for example, if it is discovered
that the newer discrete Intel GPUs do not have an opregion, the
libxl__is_igd_vga_passthru function should be modified to return false
for those devices.

Fixes: abfb006f1ff4 (tools/libxl: explicitly grant access to needed I/O-memory ranges)
Fixes: 0561e1f01e87 (xen/common: do not implicitly permit access to mapped I/O memory)
Backport: 4.12+

Signed-off-by: Chuck Zmudzinski <brchuckz@netscape.net>
---
 tools/libs/light/libxl_pci.c | 72 ++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 4bbbfe9f16..a4fc473de9 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -24,6 +24,8 @@
 #define PCI_OPTIONS            "msitranslate=%d,power_mgmt=%d"
 #define PCI_BDF_XSPATH         "%04x-%02x-%02x-%01x"
 #define PCI_PT_QDEV_ID         "pci-pt-%02x_%02x.%01x"
+#define PCI_INTEL_OPREGION     0xfc
+#define IGD_OPREGION_PAGES     3
 
 static unsigned int pci_encode_bdf(libxl_device_pci *pci)
 {
@@ -610,6 +612,45 @@ out:
     return ret;
 }
 
+static uint32_t sysfs_dev_get_igd_opregion(libxl__gc *gc,
+                                           libxl_device_pci *pci)
+{
+    char *pci_device_config_path =
+            GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/config",
+                      pci->domain, pci->bus, pci->dev, pci->func);
+    size_t read_items;
+    uint32_t igd_opregion;
+    uint32_t error = 0xffffffff;
+
+    FILE *f = fopen(pci_device_config_path, "r");
+    if (!f) {
+        LOGE(ERROR,
+             "pci device "PCI_BDF" does not have config attribute",
+             pci->domain, pci->bus, pci->dev, pci->func);
+        goto out;
+    }
+    if (fseek(f, PCI_INTEL_OPREGION, SEEK_SET)) {
+        LOGE(ERROR,
+             "cannot find igd opregion address of pci device "PCI_BDF,
+             pci->domain, pci->bus, pci->dev, pci->func);
+        goto out;
+    }
+    read_items = fread(&igd_opregion, 4, 1, f);
+    if (read_items != 1) {
+        LOGE(ERROR,
+             "cannot read igd opregion address of pci device "PCI_BDF,
+             pci->domain, pci->bus, pci->dev, pci->func);
+        goto out;
+    }
+    fclose(f);
+    return igd_opregion;
+
+out:
+    if (f)
+        fclose(f);
+    return error;
+}
+
 /*
  * Some devices may need some ways to work well. Here like IGD,
  * we have to pass a specific option to qemu.
@@ -2531,6 +2572,37 @@ int libxl__grant_vga_iomem_permission(libxl__gc *gc, const uint32_t domid,
                   domid, vga_iomem_start, (vga_iomem_start + 0x20 - 1));
             return ret;
         }
+
+        /* If this is an Intel IGD, allow access to the IGD opregion */
+        if (!libxl__is_igd_vga_passthru(gc, d_config)) return 0;
+
+        uint32_t igd_opregion = sysfs_dev_get_igd_opregion(gc, pci);
+        uint32_t error = 0xffffffff;
+        if (igd_opregion == error) break;
+
+        vga_iomem_start = ( (uint64_t) igd_opregion ) >> XC_PAGE_SHIFT;
+        ret = xc_domain_iomem_permission(CTX->xch, stubdom_domid,
+                                         vga_iomem_start,
+                                         IGD_OPREGION_PAGES, 1);
+        if (ret < 0) {
+            LOGED(ERROR, domid,
+                  "failed to give stubdom%d access to iomem range "
+                  "%"PRIx64"-%"PRIx64" for IGD passthru",
+                  stubdom_domid, vga_iomem_start, (vga_iomem_start +
+                                                IGD_OPREGION_PAGES - 1));
+            return ret;
+        }
+        ret = xc_domain_iomem_permission(CTX->xch, domid,
+                                         vga_iomem_start,
+                                         IGD_OPREGION_PAGES, 1);
+        if (ret < 0) {
+            LOGED(ERROR, domid,
+                  "failed to give dom%d access to iomem range "
+                  "%"PRIx64"-%"PRIx64" for IGD passthru",
+                  domid, vga_iomem_start, (vga_iomem_start +
+                                           IGD_OPREGION_PAGES - 1));
+            return ret;
+        }
         break;
     }
 
-- 
2.35.1




* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-03-14  3:41 ` [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion Chuck Zmudzinski
@ 2022-03-15 11:38   ` Jan Beulich
  2022-03-15 18:31     ` Chuck Zmudzinski
                       ` (2 more replies)
  2022-03-18  8:13   ` Jan Beulich
  2022-03-30 17:15   ` Anthony PERARD
  2 siblings, 3 replies; 34+ messages in thread
From: Jan Beulich @ 2022-03-15 11:38 UTC (permalink / raw)
  To: Chuck Zmudzinski; +Cc: Wei Liu, Anthony PERARD, Juergen Gross, xen-devel

On 14.03.2022 04:41, Chuck Zmudzinski wrote:
> Fixes: abfb006f1ff4 (tools/libxl: explicitly grant access to needed I/O-memory ranges)
> Fixes: 0561e1f01e87 (xen/common: do not implicitly permit access to mapped I/O memory)
> Backport: 4.12+

Just fyi: This is fine to have as a tag, but it wouldn't be backported
farther than to 4.15.

Apart from this largely some style issues (see below), but please
realize that I'm not a libxl maintainer and hence I may not have good
enough knowledge of, in particular, potential unwritten conventions.

> @@ -610,6 +612,45 @@ out:
>      return ret;
>  }
>  
> +static uint32_t sysfs_dev_get_igd_opregion(libxl__gc *gc,
> +                                           libxl_device_pci *pci)
> +{
> +    char *pci_device_config_path =
> +            GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/config",
> +                      pci->domain, pci->bus, pci->dev, pci->func);
> +    size_t read_items;
> +    uint32_t igd_opregion;
> +    uint32_t error = 0xffffffff;

I think this constant wants to gain a #define, to be able to correlate
the use sites. I'm also not sure of the value - in principle the
register can hold this value, but of course then it won't be 3 pages.
Maybe the error check further down should be to see whether adding 2
to the value would overflow in 32 bits? (In that case a #define may
not be needed anymore, as there wouldn't be multiple instances of the
constant in the code.)
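
Expressed as code, the overflow-style check being suggested might look
something like this (a sketch only, using the patch's
IGD_OPREGION_PAGES macro and libxl's XC_PAGE_SIZE; the exact form is up
to the author):

    /* Sketch: reject register values for which the last opregion
     * page would wrap around in 32 bits, instead of comparing
     * against the magic constant 0xffffffff. */
    if (igd_opregion > UINT32_MAX - (IGD_OPREGION_PAGES - 1) * XC_PAGE_SIZE)
        return 0; /* or report the error to the caller */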

> +
> +    FILE *f = fopen(pci_device_config_path, "r");
> +    if (!f) {

While libxl has some special style rules, I think it still wants a
blank line between declaration(s) and statement(s), just like we
expect elsewhere. Effectively you want to simply move the blank line
you have one line down.

> @@ -2531,6 +2572,37 @@ int libxl__grant_vga_iomem_permission(libxl__gc *gc, const uint32_t domid,
>                    domid, vga_iomem_start, (vga_iomem_start + 0x20 - 1));
>              return ret;
>          }
> +
> +        /* If this is an Intel IGD, allow access to the IGD opregion */
> +        if (!libxl__is_igd_vga_passthru(gc, d_config)) return 0;

Despite the provision for "return" or alike to go on the same line
as an error code check, I don't think this is okay here. It would be
if, as iirc generally expected in libxl, you latched the function
return value into a local variable named "rc" (I think).

> +        uint32_t igd_opregion = sysfs_dev_get_igd_opregion(gc, pci);
> +        uint32_t error = 0xffffffff;

Please don't mix declarations and statements. I also don't think
"error" is really necessary as a local variable, but with the change
suggested above it might disappear anyway.

> +        if (igd_opregion == error) break;

Like above I'm not sure this is okay to all live on one line. I also
think it would be nice if you used "return 0" or "break" consistently.
Of course a related question is whether failure here should actually
be reported to the caller.

> +        vga_iomem_start = ( (uint64_t) igd_opregion ) >> XC_PAGE_SHIFT;

There's no need for a cast here, as you're right-shifting. Also
(just fyi) there would have been three too many spaces here. I'm
additionally not certain whether re-using a variable for a purpose
not matching its name is deemed acceptable by libxl maintainers.
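
For the record, the cast-free form being suggested would simply be (a
sketch; the shift happens in 32 bits and the at-most-20-bit result
widens safely on assignment to the 64-bit variable):

    vga_iomem_start = igd_opregion >> XC_PAGE_SHIFT;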

> +        ret = xc_domain_iomem_permission(CTX->xch, stubdom_domid,
> +                                         vga_iomem_start,
> +                                         IGD_OPREGION_PAGES, 1);
> +        if (ret < 0) {
> +            LOGED(ERROR, domid,
> +                  "failed to give stubdom%d access to iomem range "
> +                  "%"PRIx64"-%"PRIx64" for IGD passthru",
> +                  stubdom_domid, vga_iomem_start, (vga_iomem_start +
> +                                                IGD_OPREGION_PAGES - 1));
> +            return ret;
> +        }

I have to admit that I find it odd that this is done unconditionally,
but I notice the same is done in pre-existing code. I would have
expected this to happen only when there actually is a device model
stub domain.

Jan

> +        ret = xc_domain_iomem_permission(CTX->xch, domid,
> +                                         vga_iomem_start,
> +                                         IGD_OPREGION_PAGES, 1);
> +        if (ret < 0) {
> +            LOGED(ERROR, domid,
> +                  "failed to give dom%d access to iomem range "
> +                  "%"PRIx64"-%"PRIx64" for IGD passthru",
> +                  domid, vga_iomem_start, (vga_iomem_start +
> +                                           IGD_OPREGION_PAGES - 1));
> +            return ret;
> +        }
>          break;
>      }
>  




* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-03-15 11:38   ` Jan Beulich
@ 2022-03-15 18:31     ` Chuck Zmudzinski
  2022-03-16  1:27     ` Chuck Zmudzinski
  2022-03-28 21:31     ` Chuck Zmudzinski
  2 siblings, 0 replies; 34+ messages in thread
From: Chuck Zmudzinski @ 2022-03-15 18:31 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Anthony PERARD, Juergen Gross

On 3/15/22 7:38 AM, Jan Beulich wrote:
> On 14.03.2022 04:41, Chuck Zmudzinski wrote:
>> Fixes: abfb006f1ff4 (tools/libxl: explicitly grant access to needed I/O-memory ranges)
>> Fixes: 0561e1f01e87 (xen/common: do not implicitly permit access to mapped I/O memory)
>> Backport: 4.12+
> Just fyi: This is fine to have as a tag, but it wouldn't be backported
> farther than to 4.15.

That's entirely reasonable. I understand this is a bug fix, not a
security issue.
>
> Apart from this largely some style issues (see below), but please
> realize that I'm not a libxl maintainer and hence I may not have good
> enough knowledge of, in particular, potential unwritten conventions.

I will take your comments into consideration regarding style before
writing the next version of the patch, and carefully check libxl's
coding style file.
>
>> @@ -610,6 +612,45 @@ out:
>>       return ret;
>>   }
>>   
>> +static uint32_t sysfs_dev_get_igd_opregion(libxl__gc *gc,
>> +                                           libxl_device_pci *pci)
>> +{
>> +    char *pci_device_config_path =
>> +            GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/config",
>> +                      pci->domain, pci->bus, pci->dev, pci->func);
>> +    size_t read_items;
>> +    uint32_t igd_opregion;
>> +    uint32_t error = 0xffffffff;
> I think this constant wants to gain a #define, to be able to correlate
> the use sites. I'm also not sure of the value - in principle the
> register can hold this value, but of course then it won't be 3 pages.

What we are storing as the return value is the starting address,
not the size, of the opregion, and that is a 32-bit value. If we
cannot read it, we return 0xffffffff instead to indicate the error
that we were unable to read the starting address of the opregion
from the config attribute in sysfs. The 32-bit value we are looking for
is read at offset 0xfc from the start of the config attribute of the
Intel IGD in sysfs. The offset 0xfc is defined by PCI_INTEL_OPREGION
both here and in hvmloader (and also in Qemu). The data that is
read at this offset from the start of the config attribute of the Intel
IGD in sysfs is the 32-bit address of the start of the opregion.
> Maybe the error check further down should be to see whether adding 2
> to the value would overflow in 32 bits? (In that case a #define may
> not be needed anymore, as there wouldn't be multiple instances of the
> constant in the code.)

That would work also. Please note that I chose that value for an error
value to be consistent with the way the existing function
sysfs_dev_get_vendor does it. That function does not assign a variable
'error' to its error return value, which in its case is 0xffff because
it returns uint16_t instead of uint32_t; here I chose to explicitly
assign the error variable to that value to help make the code more
readable. Your comment that this could be a #define instead is good. I
also think we should use a #define for the error return value of the
sysfs_dev_get_vendor function. Something like:

#define ERROR_16    0xffff
#define ERROR_32    0xffffffff

might be appropriate. But that would touch code unrelated to this bug
fix, so again I think the libxl maintainers should weigh in about what
to do here. They might let me take this opportunity to update and
improve the style of other functions in the patched file, but I am not
inclined to do that without an explicit request from them. So I am not
sure yet what I will do in the next version of the patch, but I will
address your concerns here and try to explain my reasoning for the
changes in the changelog for version 2 of the patch.
>
>> +
>> +    FILE *f = fopen(pci_device_config_path, "r");
>> +    if (!f) {
> While libxl has some special style rules, I think it still wants a
> blank line between declaration(s) and statement(s), just like we
> expect elsewhere. Effectively you want to simply move the blank line
> you have one line down.

I think I followed the same style here as the existing sysfs_dev_get_xxx
functions. I will double-check that and use the same style the other
functions use unless they clearly violate the rules; if they do, I will
note that I deviated from the style of the existing functions to conform
to the current coding style, and suggest a subsequent patch to update
the style of the other functions.
>
>> @@ -2531,6 +2572,37 @@ int libxl__grant_vga_iomem_permission(libxl__gc *gc, const uint32_t domid,
>>                     domid, vga_iomem_start, (vga_iomem_start + 0x20 - 1));
>>               return ret;
>>           }
>> +
>> +        /* If this is an Intel IGD, allow access to the IGD opregion */
>> +        if (!libxl__is_igd_vga_passthru(gc, d_config)) return 0;
> Despite the provision for "return" or alike to go on the same line
> as an error code check, I don't think this is okay here. It would be
> if, as iirc generally expected in libxl, you latched the function
> return value into a local variable named "rc" (I think).

I will double-check how the function being patched handles the return
value. I don't even remember if it has a local variable named rc for a
return value; IIRC it was either ret or 0. I understand that libxl
expects rc to be used these days, though. This might be another
candidate for updating the file to libxl's current standards.
>
>> +        uint32_t igd_opregion = sysfs_dev_get_igd_opregion(gc, pci);
>> +        uint32_t error = 0xffffffff;
> Please don't mix declarations and statements.

I presume you are saying these two lines should be re-written as:

uint32_t igd_opregion;
uint32_t error;

igd_opregion = sysfs_dev_get_igd_opregion(gc, pci);
error = 0xffffffff;

Please reply if my understanding here is not correct.
> I also don't think
> "error" is really necessary as a local variable, but with the change
> suggested above it might disappear anyway.

I do plan for the next version of the patch to use a #define for
this instead of the error variable (or add 2 to overflow it), so it
will disappear in the next version.
>
>> +        if (igd_opregion == error) break;
> Like above I'm not sure this is okay to all live on one line. I also
> think it would be nice if you used "return 0" or "break" consistently.
> Of course a related question is whether failure here should actually
> be reported to the caller.

Good points here. I agree about consistency with break and return 0.
I will change this to return 0 and move it to the next line. I do not
want to change the current meaning of the return value without
knowledge of how the caller uses the return value. IIRC, currently the
function always returns 0 unless it encounters a negative return value
from xc_domain_iomem_permission, in which case it returns that
negative value to indicate an error to the caller. So if we return
anything other than 0 here, we might be returning an error code that
the caller does not expect or interpret correctly.

I will also consider putting an error message here before returning 0.
A message something like "dom%d: Intel IGD detected, but could not
find IGD opregion" would explain the error that happens here. I don't
think a non-zero error code to the caller is appropriate here, though,
because, as already mentioned, IIRC this might return a value the
caller does not interpret correctly. If it is necessary to return an
error to the caller here instead of 0, it will be necessary to ensure
all callers of this function will interpret it correctly. I would
suggest an error return value greater than 0 to distinguish it from
the return value < 0 which indicates an error from
xc_domain_iomem_permission, but I hope the libxl maintainers will
accept a return value of 0 here, at least for this patch.

A later patch could re-work the return value of this function, which
would also probably require touching the caller(s) of this function to
properly respond to this particular error, which is different from an
error from xc_domain_iomem_permission. In any case, I will
double-check whether my current understanding of the meaning of the
return value is correct before writing the next version of the patch.
For now, I will use return 0 instead of break here and move it to the
next line, unless I hear otherwise from the libxl maintainers.
>
>> +        vga_iomem_start = ( (uint64_t) igd_opregion ) >> XC_PAGE_SHIFT;
> There's no need for a cast here, as you're right-shifting. Also
> (just fyi) there would have been three too many spaces here. I'm
> additionally not certain whether re-using a variable for a purpose
> not matching its name is deemed acceptable by libxl maintainers.

I wrote it that way expecting a compiler error if I didn't do the cast.
I have not checked if the cast is necessary, though, and maybe you
are right. I will check whether it is necessary by removing the cast
and seeing if the compiler complains.

If the cast is not needed, I will just use the 32-bit igd_opregion variable
when calling xc_domain_iomem_permission instead of the 64-bit
vga_iomem_start variable. I will remove the three spaces and use a
more descriptive variable instead of re-using vga_iomem_start if the
compiler insists on the cast from 32-bit to 64-bit.
>
>> +        ret = xc_domain_iomem_permission(CTX->xch, stubdom_domid,
>> +                                         vga_iomem_start,
>> +                                         IGD_OPREGION_PAGES, 1);
>> +        if (ret < 0) {
>> +            LOGED(ERROR, domid,
>> +                  "failed to give stubdom%d access to iomem range "
>> +                  "%"PRIx64"-%"PRIx64" for IGD passthru",
>> +                  stubdom_domid, vga_iomem_start, (vga_iomem_start +
>> +                                                IGD_OPREGION_PAGES - 1));
>> +            return ret;
>> +        }
> I have to admit that I find it odd that this is done unconditionally,
> but I notice the same is done in pre-existing code. I would have
> expected this to happen only when there actually is a device model
> stub domain.

I don't understand how that works either. All my tests have been with
the device model running as a process in dom0. I am thinking maybe
in that case it just uses dom0 for the stub domain, but I have not checked
that. I will check it by dumping the value of stubdom_domid to a log in my
next test.

Thank you for responding promptly. Now I have some work to do writing
the next version of the patch and documenting it clearly in its changelog.
It will take me a while - I will spend enough time on it so hopefully the
libxl maintainers don't have to spend so much time on it.

Chuck

N.B. I forgot to send this reply to xen-devel and cc the libxl
maintainers, so I am doing so here. I also re-formatted my replies
to avoid lines with too many characters. Sorry for the
confusion.



* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-03-15 11:38   ` Jan Beulich
  2022-03-15 18:31     ` Chuck Zmudzinski
@ 2022-03-16  1:27     ` Chuck Zmudzinski
  2022-03-16 12:18       ` Chuck Zmudzinski
  2022-03-28 21:31     ` Chuck Zmudzinski
  2 siblings, 1 reply; 34+ messages in thread
From: Chuck Zmudzinski @ 2022-03-16  1:27 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Wei Liu, Anthony PERARD, Juergen Gross, xen-devel



On 3/15/22 7:38 AM, Jan Beulich wrote:
> On 14.03.2022 04:41, Chuck Zmudzinski wrote:
>
>> @@ -610,6 +612,45 @@ out:
>>       return ret;
>>   }
>>   
>> +static uint32_t sysfs_dev_get_igd_opregion(libxl__gc *gc,
>> +                                           libxl_device_pci *pci)
>> +{
>> +    char *pci_device_config_path =
>> +            GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/config",
>> +                      pci->domain, pci->bus, pci->dev, pci->func);
>> +    size_t read_items;
>> +    uint32_t igd_opregion;
>> +    uint32_t error = 0xffffffff;
> I think this constant wants to gain a #define, to be able to correlate
> the use sites. I'm also not sure of the value - in principle the
> register can hold this value, but of course then it won't be 3 pages.
>

I have one more comment to add here. I am not intending
to define igd_opregion as a data structure 3 pages (12k)
long, much less as a pointer to such a structure. However,
it would be nice to have access to the actual data structure
in libxl, because we could use it to validate its contents.
I looked in the code for the i915 Linux kernel module, and
the IGD opregion does have a signature that we could check
if we have access to it. That would mitigate my concerns
expressed in my first version of the patch about a false
positive when identifying an Intel IGD. Hvmloader should
probably also do this check before it maps the Intel IGD
into guest memory, if that is possible. However, I expect
that it is not memory that libxl has access to; it is
probably a structure in kernel space, but it might be
possible for libxl to ask the hypervisor for access to it.
Perhaps the libxl maintainers can shed some light on that
possibility. If this is possible, I will include such a check
for the validity of the contents of the IGD opregion in
version 2 of the patch.

Regards,

Chuck



* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-03-16  1:27     ` Chuck Zmudzinski
@ 2022-03-16 12:18       ` Chuck Zmudzinski
  0 siblings, 0 replies; 34+ messages in thread
From: Chuck Zmudzinski @ 2022-03-16 12:18 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Wei Liu, Anthony PERARD, Juergen Gross, xen-devel

On 3/15/2022 9:27 PM, Chuck Zmudzinski wrote:
>
>
> On 3/15/22 7:38 AM, Jan Beulich wrote:
>> On 14.03.2022 04:41, Chuck Zmudzinski wrote:
>>
>>> @@ -610,6 +612,45 @@ out:
>>>       return ret;
>>>   }
>>>   +static uint32_t sysfs_dev_get_igd_opregion(libxl__gc *gc,
>>> +                                           libxl_device_pci *pci)
>>> +{
>>> +    char *pci_device_config_path =
>>> +            GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/config",
>>> +                      pci->domain, pci->bus, pci->dev, pci->func);
>>> +    size_t read_items;
>>> +    uint32_t igd_opregion;
>>> +    uint32_t error = 0xffffffff;
>> I think this constant wants to gain a #define, to be able to correlate
>> the use sites. I'm also not sure of the value - in principle the
>> register can hold this value, but of course then it won't be 3 pages.
>>
>
> I have one more comment to add here. I am not intending
> to define igd_opregion as a data structure 3 pages (12k)

Correction: Actually, the igd_opregion itself would be 2 pages.
The three pages come from the fact that it is not guaranteed
to be page aligned, so it takes three pages to ensure
that it will be fully mapped to the guest. From the commit
message in hvmloader that increased it from two to three
pages:

From: Keir Fraser <keir@xxxxxxx>
Date: Thu, 10 Jan 2013 17:26:24 +0000 (+0000)
Subject: hvmloader: Allocate 3 pages for Intel GPU OpRegion passthrough.
X-Git-Tag: 4.3.0-rc1~546
X-Git-Url:
https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff_plain;h=408a9e56343b006c9e58a334f0b97dd2deedf9ac

hvmloader: Allocate 3 pages for Intel GPU OpRegion passthrough.

The 8kB region may not be page aligned, hence requiring 3 pages to
be mapped through.

Signed-off-by: Keir Fraser <keir@xxxxxxx>

In tests on my system, this is true: the opregion was, IIRC,
offset 0x18 (24) bytes from a page boundary.
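
As a concrete check of that arithmetic, a sketch with 4 KiB pages, an
8 KiB opregion, and a hypothetical base address (only the 0x18 offset
is the value observed above):

    uint64_t start = 0x12345000 + 0x18; /* hypothetical, unaligned */
    uint64_t size  = 0x2000;            /* 8 KiB opregion */
    uint64_t first = start >> 12;
    uint64_t last  = (start + size - 1) >> 12;
    /* last - first + 1 == 3: the region straddles three pages */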

This has an unfortunate side effect of granting
access to one page that the guest does not really
need access to. My well-behaved and trusted Linux
and Windows guests only request the two pages of
the igd_opregion, but a guest could also access the
24 bytes before it or the (4k - 24) bytes after it. I
don't think that greatly increases the security risk of
including this patch, because passthrough of PCI
devices must always be to a trusted guest for it to
be secure. An attacker who gained full control over
a guest with PCI devices passed through to it would
not need this exploit to successfully attack dom0 or
the control domain from the guest; the damage could
be done whether or not the attacker has access to
that extra page.

Regards,

Chuck



* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-03-14  3:41 ` [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion Chuck Zmudzinski
  2022-03-15 11:38   ` Jan Beulich
@ 2022-03-18  8:13   ` Jan Beulich
  2022-03-30 18:45     ` Jason Andryuk
  2022-03-30 17:15   ` Anthony PERARD
  2 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2022-03-18  8:13 UTC (permalink / raw)
  To: Chuck Zmudzinski; +Cc: Wei Liu, Anthony PERARD, Juergen Gross, xen-devel

On 14.03.2022 04:41, Chuck Zmudzinski wrote:
> When gfx_passthru is enabled for the Intel IGD, hvmloader maps the IGD
> opregion to the guest but libxl does not grant the guest permission to
> access the mapped memory region. This results in a crash of the i915.ko
> kernel module in a Linux HVM guest when it needs to access the IGD
> opregion:
> 
> Oct 23 11:36:33 domU kernel: Call Trace:
> Oct 23 11:36:33 domU kernel:  ? idr_alloc+0x39/0x70
> Oct 23 11:36:33 domU kernel:  drm_get_last_vbltimestamp+0xaa/0xc0 [drm]
> Oct 23 11:36:33 domU kernel:  drm_reset_vblank_timestamp+0x5b/0xd0 [drm]
> Oct 23 11:36:33 domU kernel:  drm_crtc_vblank_on+0x7b/0x130 [drm]
> Oct 23 11:36:33 domU kernel:  intel_modeset_setup_hw_state+0xbd4/0x1900 [i915]
> Oct 23 11:36:33 domU kernel:  ? _cond_resched+0x16/0x40
> Oct 23 11:36:33 domU kernel:  ? ww_mutex_lock+0x15/0x80
> Oct 23 11:36:33 domU kernel:  intel_modeset_init_nogem+0x867/0x1d30 [i915]
> Oct 23 11:36:33 domU kernel:  ? gen6_write32+0x4b/0x1c0 [i915]
> Oct 23 11:36:33 domU kernel:  ? intel_irq_postinstall+0xb9/0x670 [i915]
> Oct 23 11:36:33 domU kernel:  i915_driver_probe+0x5c2/0xc90 [i915]
> Oct 23 11:36:33 domU kernel:  ? vga_switcheroo_client_probe_defer+0x1f/0x40
> Oct 23 11:36:33 domU kernel:  ? i915_pci_probe+0x3f/0x150 [i915]
> Oct 23 11:36:33 domU kernel:  local_pci_probe+0x42/0x80
> Oct 23 11:36:33 domU kernel:  ? _cond_resched+0x16/0x40
> Oct 23 11:36:33 domU kernel:  pci_device_probe+0xfd/0x1b0
> Oct 23 11:36:33 domU kernel:  really_probe+0x222/0x480
> Oct 23 11:36:33 domU kernel:  driver_probe_device+0xe1/0x150
> Oct 23 11:36:33 domU kernel:  device_driver_attach+0xa1/0xb0
> Oct 23 11:36:33 domU kernel:  __driver_attach+0x8a/0x150
> Oct 23 11:36:33 domU kernel:  ? device_driver_attach+0xb0/0xb0
> Oct 23 11:36:33 domU kernel:  ? device_driver_attach+0xb0/0xb0
> Oct 23 11:36:33 domU kernel:  bus_for_each_dev+0x78/0xc0
> Oct 23 11:36:33 domU kernel:  bus_add_driver+0x12b/0x1e0
> Oct 23 11:36:33 domU kernel:  driver_register+0x8b/0xe0
> Oct 23 11:36:33 domU kernel:  ? 0xffffffffc06b8000
> Oct 23 11:36:33 domU kernel:  i915_init+0x5d/0x70 [i915]
> Oct 23 11:36:33 domU kernel:  do_one_initcall+0x44/0x1d0
> Oct 23 11:36:33 domU kernel:  ? do_init_module+0x23/0x260
> Oct 23 11:36:33 domU kernel:  ? kmem_cache_alloc_trace+0xf5/0x200
> Oct 23 11:36:33 domU kernel:  do_init_module+0x5c/0x260
> Oct 23 11:36:33 domU kernel:  __do_sys_finit_module+0xb1/0x110
> Oct 23 11:36:33 domU kernel:  do_syscall_64+0x33/0x80
> Oct 23 11:36:33 domU kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9

The call trace alone leaves open where exactly the crash occurred.
Looking at 5.17 I notice that the first thing the driver does
after mapping the range is to check the signature (both in
intel_opregion_setup()). As the signature can't possibly match
with no access granted to the underlying mappings, there shouldn't
be any further attempts to use the region in the driver; if there
are, I'd view this as a driver bug.

Furthermore I've found indications (e.g. in the Core Gen11 doc)
that the register may not hold an address in the first place, but
instead a set of bitfields. I can't help the impression that the
driver would still try to map the range pointed at by the value
(as long as it's non-zero), which would imply unpredictable
behavior.

And then, looking further at intel_opregion_setup(), there's yet
one more memory range which the guest may need access to:
opregion->asle->rvda (or a value derived from it) also is handed
to memremap() under certain conditions.

Jan




* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-03-15 11:38   ` Jan Beulich
  2022-03-15 18:31     ` Chuck Zmudzinski
  2022-03-16  1:27     ` Chuck Zmudzinski
@ 2022-03-28 21:31     ` Chuck Zmudzinski
  2 siblings, 0 replies; 34+ messages in thread
From: Chuck Zmudzinski @ 2022-03-28 21:31 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Wei Liu, Anthony PERARD, Juergen Gross, xen-devel


On 3/15/22 7:38 AM, Jan Beulich wrote:
> On 14.03.2022 04:41, Chuck Zmudzinski wrote:
>
>> +        ret = xc_domain_iomem_permission(CTX->xch, stubdom_domid,
>> +                                         vga_iomem_start,
>> +                                         IGD_OPREGION_PAGES, 1);
>> +        if (ret < 0) {
>> +            LOGED(ERROR, domid,
>> +                  "failed to give stubdom%d access to iomem range "
>> +                  "%"PRIx64"-%"PRIx64" for IGD passthru",
>> +                  stubdom_domid, vga_iomem_start, (vga_iomem_start +
>> +                                                IGD_OPREGION_PAGES - 1));
>> +            return ret;
>> +        }
> I have to admit that I find it odd that this is done unconditionally,
> but I notice the same is done in pre-existing code. I would have
> expected this to happen only when there actually is a device model
> stub domain.
>
> Jan

I dumped the value of stubdom_domid for my tests with the
device model running in dom0:

libxl: info: libxl_pci.c:2556:libxl__grant_vga_iomem_permission: Domain 3: stubdom id: 0

As I expected, when there is not a device model stub domain
and the device model runs in dom0, stubdom_domid is 0.

I will now do some tests to see if this is necessary when the
device model runs in dom0. I would like to know if the device
model running in dom0 needs to have access granted here
or not. When there is a device model stub domain, I presume
it is necessary, and I can check that also and write the
next version of the patch accordingly.

Chuck



* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-03-14  3:41 ` [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion Chuck Zmudzinski
  2022-03-15 11:38   ` Jan Beulich
  2022-03-18  8:13   ` Jan Beulich
@ 2022-03-30 17:15   ` Anthony PERARD
  2022-03-30 17:27     ` Andrew Cooper
                       ` (4 more replies)
  2 siblings, 5 replies; 34+ messages in thread
From: Anthony PERARD @ 2022-03-30 17:15 UTC (permalink / raw)
  To: Chuck Zmudzinski; +Cc: xen-devel, Wei Liu, Juergen Gross, Jan Beulich

Hi Chuck,

On Sun, Mar 13, 2022 at 11:41:37PM -0400, Chuck Zmudzinski wrote:
> When gfx_passthru is enabled for the Intel IGD, hvmloader maps the IGD
> opregion to the guest but libxl does not grant the guest permission to

I'm not reading the same thing when looking at the code in hvmloader.
It seems that hvmloader allocates some memory for the IGD opregion
rather than mapping it.


tools/firmware/hvmloader/pci.c:184
    if ( vendor_id == 0x8086 )
    {
        igd_opregion_pgbase = mem_hole_alloc(IGD_OPREGION_PAGES);
        /*
         * Write the the OpRegion offset to give the opregion
         * address to the device model. The device model will trap
         * and map the OpRegion at the give address.
         */
        pci_writel(vga_devfn, PCI_INTEL_OPREGION,
                   igd_opregion_pgbase << PAGE_SHIFT);
    }

I think this write would go through QEMU, does it do something with it?
(I kind of replying to this question at the end of the mail.)

Is this code in hvmloader actually run in your case?

> Currently, because of another bug in Qemu upstream, this crash can
> only be reproduced using the traditional Qemu device model, and of

qemu-traditional is a bit old... What is the bug in QEMU? Do you have a
link to a patch/mail?

> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
> index 4bbbfe9f16..a4fc473de9 100644
> --- a/tools/libs/light/libxl_pci.c
> +++ b/tools/libs/light/libxl_pci.c
> @@ -2531,6 +2572,37 @@ int libxl__grant_vga_iomem_permission(libxl__gc *gc, const uint32_t domid,
>                    domid, vga_iomem_start, (vga_iomem_start + 0x20 - 1));
>              return ret;
>          }
> +
> +        /* If this is an Intel IGD, allow access to the IGD opregion */
> +        if (!libxl__is_igd_vga_passthru(gc, d_config)) return 0;
> +
> +        uint32_t igd_opregion = sysfs_dev_get_igd_opregion(gc, pci);
> +        uint32_t error = 0xffffffff;
> +        if (igd_opregion == error) break;
> +
> +        vga_iomem_start = ( (uint64_t) igd_opregion ) >> XC_PAGE_SHIFT;
> +        ret = xc_domain_iomem_permission(CTX->xch, stubdom_domid,
> +                                         vga_iomem_start,
> +                                         IGD_OPREGION_PAGES, 1);
> +        if (ret < 0) {
> +            LOGED(ERROR, domid,
> +                  "failed to give stubdom%d access to iomem range "
> +                  "%"PRIx64"-%"PRIx64" for IGD passthru",
> +                  stubdom_domid, vga_iomem_start, (vga_iomem_start +
> +                                                IGD_OPREGION_PAGES - 1));
> +            return ret;
> +        }
> +        ret = xc_domain_iomem_permission(CTX->xch, domid,
> +                                         vga_iomem_start,
> +                                         IGD_OPREGION_PAGES, 1);

Here, you seem to add permission to an address that is read from the
pci config space of the device, but as I've pointed out above,
hvmloader seems to overwrite this address. Does this call to
xc_domain_iomem_permission() actually do anything useful?
Or is it by luck that the address returned by
sysfs_dev_get_igd_opregion() happened to be the address that hvmloader
is going to write?

Or maybe hvmloader doesn't actually do anything?


Some more thought on that: looking at QEMU, it seems that there's already
a call to xc_domain_iomem_permission(), in igd_write_opregion(). So
adding one in libxl would seem redundant, or maybe it's the one for the
device model's domain that's needed (named 'stubdom_domid' here)?

Thanks,

-- 
Anthony PERARD



* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-03-30 17:15   ` Anthony PERARD
@ 2022-03-30 17:27     ` Andrew Cooper
  2022-03-31  3:54       ` Chuck Zmudzinski
  2022-03-31  2:41     ` Chuck Zmudzinski
                       ` (3 subsequent siblings)
  4 siblings, 1 reply; 34+ messages in thread
From: Andrew Cooper @ 2022-03-30 17:27 UTC (permalink / raw)
  To: Anthony PERARD, Chuck Zmudzinski
  Cc: xen-devel, Wei Liu, Juergen Gross, Jan Beulich

On 30/03/2022 18:15, Anthony PERARD wrote:
> Hi Chuck,
>
> On Sun, Mar 13, 2022 at 11:41:37PM -0400, Chuck Zmudzinski wrote:
>> When gfx_passthru is enabled for the Intel IGD, hvmloader maps the IGD
>> opregion to the guest but libxl does not grant the guest permission to
> I'm not reading the same thing when looking at the code in hvmloader.
> It seems that hvmloader allocates some memory for the IGD opregion
> rather than mapping it.
>
>
> tools/firmware/hvmloader/pci.c:184
>     if ( vendor_id == 0x8086 )
>     {
>         igd_opregion_pgbase = mem_hole_alloc(IGD_OPREGION_PAGES);
>         /*
>          * Write the the OpRegion offset to give the opregion
>          * address to the device model. The device model will trap
>          * and map the OpRegion at the give address.
>          */
>         pci_writel(vga_devfn, PCI_INTEL_OPREGION,
>                    igd_opregion_pgbase << PAGE_SHIFT);
>     }
>
> I think this write would go through QEMU, does it do something with it?
> (I kind of replying to this question at the end of the mail.)
>
> Is this code in hvmloader actually run in your case?
>
>> Currently, because of another bug in Qemu upstream, this crash can
>> only be reproduced using the traditional Qemu device model, and of
> qemu-traditional is a bit old... What is the bug in QEMU? Do you have a
> link to a patch/mail?
>
>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
>> index 4bbbfe9f16..a4fc473de9 100644
>> --- a/tools/libs/light/libxl_pci.c
>> +++ b/tools/libs/light/libxl_pci.c
>> @@ -2531,6 +2572,37 @@ int libxl__grant_vga_iomem_permission(libxl__gc *gc, const uint32_t domid,
>>                    domid, vga_iomem_start, (vga_iomem_start + 0x20 - 1));
>>              return ret;
>>          }
>> +
>> +        /* If this is an Intel IGD, allow access to the IGD opregion */
>> +        if (!libxl__is_igd_vga_passthru(gc, d_config)) return 0;
>> +
>> +        uint32_t igd_opregion = sysfs_dev_get_igd_opregion(gc, pci);
>> +        uint32_t error = 0xffffffff;
>> +        if (igd_opregion == error) break;
>> +
>> +        vga_iomem_start = ( (uint64_t) igd_opregion ) >> XC_PAGE_SHIFT;
>> +        ret = xc_domain_iomem_permission(CTX->xch, stubdom_domid,
>> +                                         vga_iomem_start,
>> +                                         IGD_OPREGION_PAGES, 1);
>> +        if (ret < 0) {
>> +            LOGED(ERROR, domid,
>> +                  "failed to give stubdom%d access to iomem range "
>> +                  "%"PRIx64"-%"PRIx64" for IGD passthru",
>> +                  stubdom_domid, vga_iomem_start, (vga_iomem_start +
>> +                                                IGD_OPREGION_PAGES - 1));
>> +            return ret;
>> +        }
>> +        ret = xc_domain_iomem_permission(CTX->xch, domid,
>> +                                         vga_iomem_start,
>> +                                         IGD_OPREGION_PAGES, 1);
> Here, you seem to add permission to an address that is read from the
> pci config space of the device, but as I've pointed out above,
> hvmloader seems to overwrite this address. Does this call to
> xc_domain_iomem_permission() actually do anything useful?
> Or is it by luck that the address returned by
> sysfs_dev_get_igd_opregion() happened to be the address that hvmloader
> is going to write?
>
> Or maybe hvmloader doesn't actually do anything?
>
>
> Some more thought on that: looking at QEMU, it seems that there's already
> a call to xc_domain_iomem_permission(), in igd_write_opregion(). So
> adding one in libxl would seem redundant, or maybe it's the one for the
> device model's domain that's needed (named 'stubdom_domid' here)?

This has been discussed before, but no one's done anything about it. 
It's a massive layering violation for QEMU to issue
xc_domain_iomem_permission()/etc hypercalls.

It should be the toolstack, and only the toolstack, which makes
permissions hypercalls, which in turn will fix a slew of "QEMU doesn't
work when it doesn't have dom0 superpowers" bugs with stubdomains.

In this case specifically, the opregion is a magic Intel graphics
specific bodge.  The i915 driver in the guest really does need to access
part of the real PCH during init, which (in Xen's permission model)
really does require permitting access to the MMIO range (8k iirc) so it
can be mapped as a BAR in QEMU's emulated PCH.

~Andrew



* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-03-18  8:13   ` Jan Beulich
@ 2022-03-30 18:45     ` Jason Andryuk
  2022-03-31  4:34       ` Chuck Zmudzinski
  2022-04-01 13:21       ` Chuck Zmudzinski
  0 siblings, 2 replies; 34+ messages in thread
From: Jason Andryuk @ 2022-03-30 18:45 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Chuck Zmudzinski, Wei Liu, Anthony PERARD, Juergen Gross, xen-devel

[-- Attachment #1: Type: text/plain, Size: 3350 bytes --]

On Fri, Mar 18, 2022 at 4:13 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 14.03.2022 04:41, Chuck Zmudzinski wrote:
> > When gfx_passthru is enabled for the Intel IGD, hvmloader maps the IGD
> > opregion to the guest but libxl does not grant the guest permission to
> > access the mapped memory region. This results in a crash of the i915.ko
> > kernel module in a Linux HVM guest when it needs to access the IGD
> > opregion:
> >
> > Oct 23 11:36:33 domU kernel: Call Trace:
> > Oct 23 11:36:33 domU kernel:  ? idr_alloc+0x39/0x70
> > Oct 23 11:36:33 domU kernel:  drm_get_last_vbltimestamp+0xaa/0xc0 [drm]
> > Oct 23 11:36:33 domU kernel:  drm_reset_vblank_timestamp+0x5b/0xd0 [drm]
> > Oct 23 11:36:33 domU kernel:  drm_crtc_vblank_on+0x7b/0x130 [drm]
> > Oct 23 11:36:33 domU kernel:  intel_modeset_setup_hw_state+0xbd4/0x1900 [i915]
> > Oct 23 11:36:33 domU kernel:  ? _cond_resched+0x16/0x40
> > Oct 23 11:36:33 domU kernel:  ? ww_mutex_lock+0x15/0x80
> > Oct 23 11:36:33 domU kernel:  intel_modeset_init_nogem+0x867/0x1d30 [i915]
> > Oct 23 11:36:33 domU kernel:  ? gen6_write32+0x4b/0x1c0 [i915]
> > Oct 23 11:36:33 domU kernel:  ? intel_irq_postinstall+0xb9/0x670 [i915]
> > Oct 23 11:36:33 domU kernel:  i915_driver_probe+0x5c2/0xc90 [i915]
> > Oct 23 11:36:33 domU kernel:  ? vga_switcheroo_client_probe_defer+0x1f/0x40
> > Oct 23 11:36:33 domU kernel:  ? i915_pci_probe+0x3f/0x150 [i915]
> > Oct 23 11:36:33 domU kernel:  local_pci_probe+0x42/0x80
> > Oct 23 11:36:33 domU kernel:  ? _cond_resched+0x16/0x40
> > Oct 23 11:36:33 domU kernel:  pci_device_probe+0xfd/0x1b0
> > Oct 23 11:36:33 domU kernel:  really_probe+0x222/0x480
> > Oct 23 11:36:33 domU kernel:  driver_probe_device+0xe1/0x150
> > Oct 23 11:36:33 domU kernel:  device_driver_attach+0xa1/0xb0
> > Oct 23 11:36:33 domU kernel:  __driver_attach+0x8a/0x150
> > Oct 23 11:36:33 domU kernel:  ? device_driver_attach+0xb0/0xb0
> > Oct 23 11:36:33 domU kernel:  ? device_driver_attach+0xb0/0xb0
> > Oct 23 11:36:33 domU kernel:  bus_for_each_dev+0x78/0xc0
> > Oct 23 11:36:33 domU kernel:  bus_add_driver+0x12b/0x1e0
> > Oct 23 11:36:33 domU kernel:  driver_register+0x8b/0xe0
> > Oct 23 11:36:33 domU kernel:  ? 0xffffffffc06b8000
> > Oct 23 11:36:33 domU kernel:  i915_init+0x5d/0x70 [i915]
> > Oct 23 11:36:33 domU kernel:  do_one_initcall+0x44/0x1d0
> > Oct 23 11:36:33 domU kernel:  ? do_init_module+0x23/0x260
> > Oct 23 11:36:33 domU kernel:  ? kmem_cache_alloc_trace+0xf5/0x200
> > Oct 23 11:36:33 domU kernel:  do_init_module+0x5c/0x260
> > Oct 23 11:36:33 domU kernel:  __do_sys_finit_module+0xb1/0x110
> > Oct 23 11:36:33 domU kernel:  do_syscall_64+0x33/0x80
> > Oct 23 11:36:33 domU kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> The call trace alone leaves open where exactly the crash occurred.
> Looking at 5.17 I notice that the first thing the driver does
> after mapping the range is to check the signature (both in
> intel_opregion_setup()). As the signature can't possibly match
> with no access granted to the underlying mappings, there shouldn't
> be any further attempts to use the region in the driver; if there
> are, I'd view this as a driver bug.

Yes.  i915_driver_hw_probe does not check the return value of
intel_opregion_setup(dev_priv) and just continues on.

Chuck, the attached patch may help if you want to test it.

Regards,
Jason

[-- Attachment #2: 0001-i915-Fail-probe-when-opregion-setup-fails.patch --]
[-- Type: application/x-patch, Size: 1750 bytes --]
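
The attachment body is not preserved inline here; based on the
description above, the change is presumably of roughly this shape (a
sketch, not the actual attachment; the unwind label is hypothetical):

    /* In i915_driver_hw_probe(): fail the probe when opregion setup
     * fails, instead of ignoring the return value. */
    ret = intel_opregion_setup(dev_priv);
    if (ret)
        goto err_unwind; /* hypothetical error path */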


* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-03-30 17:15   ` Anthony PERARD
  2022-03-30 17:27     ` Andrew Cooper
@ 2022-03-31  2:41     ` Chuck Zmudzinski
  2022-03-31 18:32     ` Chuck Zmudzinski
                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 34+ messages in thread
From: Chuck Zmudzinski @ 2022-03-31  2:41 UTC (permalink / raw)
  To: Anthony PERARD; +Cc: xen-devel, Wei Liu, Juergen Gross, Jan Beulich

On 3/30/22 1:15 PM, Anthony PERARD wrote:
> Hi Chuck,
>
> On Sun, Mar 13, 2022 at 11:41:37PM -0400, Chuck Zmudzinski wrote:
>> When gfx_passthru is enabled for the Intel IGD, hvmloader maps the IGD
>> opregion to the guest but libxl does not grant the guest permission to
> I'm not reading the same thing when looking at the code in hvmloader.
> It seems that hvmloader allocates some memory for the IGD opregion
> rather than mapping it.
>
>
> tools/firmware/hvmloader/pci.c:184
>      if ( vendor_id == 0x8086 )
>      {
>          igd_opregion_pgbase = mem_hole_alloc(IGD_OPREGION_PAGES);
>          /*
>           * Write the the OpRegion offset to give the opregion
>           * address to the device model. The device model will trap
>           * and map the OpRegion at the give address.
>           */
>          pci_writel(vga_devfn, PCI_INTEL_OPREGION,
>                     igd_opregion_pgbase << PAGE_SHIFT);
>      }
>
> I think this write would go through QEMU, does it do something with it?

AFAICT Qemu does do something with it:

In upstream Qemu, in hw/xen/xen_pt_config_init.c
and hw/xen/xen_pt_graphics.c, we can see that Qemu
implements functions to read and write the OpRegion
register, which means it reads and writes the mapped
OpRegion address that hvmloader passes to it. It is a
32-bit address that is stored in the config attribute
in sysfs for the Intel IGD on Linux guests.

I have examined the config attribute of the Intel IGD in
both the control domain (dom0) and the Linux guest. In
each case, the address of the IGD OpRegion in the config
attribute is consistent with custom logs I have taken from
xen/common/domctl.c, which show the same machine and
guest addresses for the OpRegion as the config attribute
shows for the machine and the guest.

> (I kind of replying to this question at the end of the mail.)
>
> Is this code in hvmloader actually run in your case?

I admit I have not verified with certainty that Qemu and
the guest are getting the OpRegion address from hvmloader.
I will do a test to verify it, such as removing the code
from hvmloader that writes the address to the pci config
attribute and seeing if that prevents the guest from finding
the address where the OpRegion is mapped in the guest.
That would prove that this hvmloader code runs in my case.
>
>> Currently, because of another bug in Qemu upstream, this crash can
>> only be reproduced using the traditional Qemu device model, and of
> qemu-traditional is a bit old... What is the bug in QEMU? Do you have a
> link to a patch/mail?

Not yet. I am working on it now. The bug is related to the
passthrough of the IRQ to the guest. So far I have compared
the logs in the Linux guest using Qemu upstream with Qemu
traditional, and I have found that with the upstream Qemu,
the Linux kernel in the guest reports that it cannot obtain
IRQ 24:

Mar 29 18:31:39 debian kernel: xen: --> pirq=16 -> irq=24 (gsi=24)
Mar 29 18:31:39 debian kernel: i915 0000:00:02.0: [drm] VT-d active for gfx access
...
Mar 29 18:31:39 debian kernel: xen:events: Failed to obtain physical IRQ 24

When using the traditional device model, this failure is not
reported. The failure of the system to handle the IRQ prevents
the guest from booting normally with the upstream Qemu.

Comparing Qemu upstream code with Qemu traditional code,
I notice that when the traditional Qemu sets up the pci-isa
bridge at slot 1f, it configures an IRQ, but the upstream
Qemu does not, so I suspect that is where the problem is, but I
have not found the fix yet. It is well known that the pci-isa
bridge at slot 1f needs to be specially configured for the
Intel IGD to function properly.
>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
>> index 4bbbfe9f16..a4fc473de9 100644
>> --- a/tools/libs/light/libxl_pci.c
>> +++ b/tools/libs/light/libxl_pci.c
>> @@ -2531,6 +2572,37 @@ int libxl__grant_vga_iomem_permission(libxl__gc *gc, const uint32_t domid,
>>                     domid, vga_iomem_start, (vga_iomem_start + 0x20 - 1));
>>               return ret;
>>           }
>> +
>> +        /* If this is an Intel IGD, allow access to the IGD opregion */
>> +        if (!libxl__is_igd_vga_passthru(gc, d_config)) return 0;
>> +
>> +        uint32_t igd_opregion = sysfs_dev_get_igd_opregion(gc, pci);
>> +        uint32_t error = 0xffffffff;
>> +        if (igd_opregion == error) break;
>> +
>> +        vga_iomem_start = ( (uint64_t) igd_opregion ) >> XC_PAGE_SHIFT;
>> +        ret = xc_domain_iomem_permission(CTX->xch, stubdom_domid,
>> +                                         vga_iomem_start,
>> +                                         IGD_OPREGION_PAGES, 1);
>> +        if (ret < 0) {
>> +            LOGED(ERROR, domid,
>> +                  "failed to give stubdom%d access to iomem range "
>> +                  "%"PRIx64"-%"PRIx64" for IGD passthru",
>> +                  stubdom_domid, vga_iomem_start, (vga_iomem_start +
>> +                                                IGD_OPREGION_PAGES - 1));
>> +            return ret;
>> +        }
>> +        ret = xc_domain_iomem_permission(CTX->xch, domid,
>> +                                         vga_iomem_start,
>> +                                         IGD_OPREGION_PAGES, 1);
> Here, you seems to add permission to an address that is read from the
> pci config space of the device, but as I've pointed above hvmloader
> seems to overwrite this address.

I do not think hvmloader overwrites this address. I think of
it as hvmloader passing the mapped address to the device
model and guest, but as I said above, I will do more tests
to verify with certainty what hvmloader is actually doing.
> It this call to
> xc_domain_iomem_permission() does actually anything useful?

I am certain this call to xc_domain_iomem_permission()
is necessary with the Qemu traditional device model to
obtain permission for the guest (domid) to access the
OpRegion. Without it, the bug is not fixed and I get the
crash in the i915 kernel module in the Linux guest and a dark
screen instead of a guest with output to the screen.

Moreover, I have verified with custom logs from
xen/common/domctl.c that access to the opregion
is denied to domid, but not to dom0, when this
patch is not applied.

> Or is it by luck that the address returned by
> sysfs_dev_get_igd_opregion() happened to be the address that hvmloader
> is going to write?
>
> Or maybe hvmloader doesn't actually do anything?

I mentioned earlier my plans to verify that
hvmloader does provide the device model and
the guest with the mapped address of the
Intel IGD OpRegion.
>
>
> Some more though on that, looking at QEMU, it seems that there's already
> a call to xc_domain_iomem_permission(), in igd_write_opregion(). So
> adding one in libxl would seems redundant,

You may be right that with Qemu upstream it is not
needed. I will be able to check it once I have a patch
for the IRQ problem in upstream Qemu. When I wrote
this patch a couple of weeks ago, though, I did not yet
know that Qemu upstream also calls
xc_domain_iomem_permission() and since Qemu
upstream seems to obtain the necessary permission,
the call here to xc_domain_iomem_permission()
can be done only when the device model is Qemu
traditional.
> or maybe it the one for the
> device model's domain that's needed  (named 'stubdom_domid' here)?

Well, I am not using a device model stub domain but
running the device model in dom0, and my tests
indicate it is not necessary to obtain permission for
dom0, but I do admit I need to do more tests before
submitting the next version of a patch. I plan to do
some tests with a device model stub domain and see
what configurations require permission for the stub
domain. I expect it will only be needed when the
device model is Qemu traditional because Qemu
upstream obtains the necessary permission.

I have no experience running the device model
in a stub domain, and IIRC the Xen wiki explains
that the supported configuration from the Xen Project
is with the traditional device model and mini os in
the stub domain, and others, such as Qubes OS,
have done some work on the upstream Qemu and
Linux running in a stub domain. Please correct me
if this is not correct.

Thank you for taking the time to look at this patch.

Chuck


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-03-30 17:27     ` Andrew Cooper
@ 2022-03-31  3:54       ` Chuck Zmudzinski
  2022-03-31 12:29         ` Jason Andryuk
  0 siblings, 1 reply; 34+ messages in thread
From: Chuck Zmudzinski @ 2022-03-31  3:54 UTC (permalink / raw)
  To: Andrew Cooper, Anthony PERARD
  Cc: xen-devel, Wei Liu, Juergen Gross, Jan Beulich

On 3/30/22 1:27 PM, Andrew Cooper wrote:
> On 30/03/2022 18:15, Anthony PERARD wrote:
>>
>> Some more though on that, looking at QEMU, it seems that there's already
>> a call to xc_domain_iomem_permission(), in igd_write_opregion().
> This has been discussed before, but noone's done anything about it.
> It's a massive layering violation for QEMU to issue
> xc_domain_iomem_permission()/etc hypercalls.
>
> It should be the toolstack, and only the toolstack, which makes
> permissions hypercalls, which in turn will fix a slew of "QEMU doesn't
> work when it doesn't have dom0 superpowers" bugs with stubdomains.

How much say does the Xen project have over the code
in Qemu under hw/xen? I would not be against having libxl
do the permissions hypercalls in this case instead of Qemu
doing it. My test with Qemu traditional and this patch proves
the permission can be granted by libxl instead of the device
model.
> In this case specifically, the opregion is a magic Intel graphics
> specific bodge.  The i915 driver in the guest really does need to access
> part of the real PCH during init, which (in Xen's permission model)
> really does require permitting access to the MMIO range (8k iirc) so it
> can be mapped as a BAR in QEMU's emulated PCH.

That is exactly what my testing confirmed, but in my
tests only Linux guests need access to the magic Intel
opregion. The proprietary Windows Intel graphics
drivers are apparently able to work around lack of
access to the opregion, at least on my system, and
Windows guests with Intel IGD passthrough function
very well without needing this patch. So the problem
could be fixed in the Linux i915 kernel driver, but Intel
has not contributed the magic sauce to the Linux kernel
that would enable Linux guests to work without access
to the Intel opregion.

And you are correct, the opregion is 8k (2 pages) in size,
but it is not always page aligned, as mentioned earlier,
so the IGD_OPREGION_PAGES constant is set to 3 instead
of 2.

Chuck


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-03-30 18:45     ` Jason Andryuk
@ 2022-03-31  4:34       ` Chuck Zmudzinski
  2022-03-31 12:23         ` Jason Andryuk
  2022-04-01 13:21       ` Chuck Zmudzinski
  1 sibling, 1 reply; 34+ messages in thread
From: Chuck Zmudzinski @ 2022-03-31  4:34 UTC (permalink / raw)
  To: Jason Andryuk, Jan Beulich
  Cc: Wei Liu, Anthony PERARD, Juergen Gross, xen-devel

On 3/30/22 2:45 PM, Jason Andryuk wrote:
> On Fri, Mar 18, 2022 at 4:13 AM Jan Beulich <jbeulich@suse.com> wrote:
>> On 14.03.2022 04:41, Chuck Zmudzinski wrote:
>>> When gfx_passthru is enabled for the Intel IGD, hvmloader maps the IGD
>>> opregion to the guest but libxl does not grant the guest permission to
>>> access the mapped memory region. This results in a crash of the i915.ko
>>> kernel module in a Linux HVM guest when it needs to access the IGD
>>> opregion:
>>>
>>> Oct 23 11:36:33 domU kernel: Call Trace:
>>> Oct 23 11:36:33 domU kernel:  ? idr_alloc+0x39/0x70
>>> Oct 23 11:36:33 domU kernel:  drm_get_last_vbltimestamp+0xaa/0xc0 [drm]
>>> Oct 23 11:36:33 domU kernel:  drm_reset_vblank_timestamp+0x5b/0xd0 [drm]
>>> Oct 23 11:36:33 domU kernel:  drm_crtc_vblank_on+0x7b/0x130 [drm]
>>> Oct 23 11:36:33 domU kernel:  intel_modeset_setup_hw_state+0xbd4/0x1900 [i915]
>>> Oct 23 11:36:33 domU kernel:  ? _cond_resched+0x16/0x40
>>> Oct 23 11:36:33 domU kernel:  ? ww_mutex_lock+0x15/0x80
>>> Oct 23 11:36:33 domU kernel:  intel_modeset_init_nogem+0x867/0x1d30 [i915]
>>> Oct 23 11:36:33 domU kernel:  ? gen6_write32+0x4b/0x1c0 [i915]
>>> Oct 23 11:36:33 domU kernel:  ? intel_irq_postinstall+0xb9/0x670 [i915]
>>> Oct 23 11:36:33 domU kernel:  i915_driver_probe+0x5c2/0xc90 [i915]
>>> Oct 23 11:36:33 domU kernel:  ? vga_switcheroo_client_probe_defer+0x1f/0x40
>>> Oct 23 11:36:33 domU kernel:  ? i915_pci_probe+0x3f/0x150 [i915]
>>> Oct 23 11:36:33 domU kernel:  local_pci_probe+0x42/0x80
>>> Oct 23 11:36:33 domU kernel:  ? _cond_resched+0x16/0x40
>>> Oct 23 11:36:33 domU kernel:  pci_device_probe+0xfd/0x1b0
>>> Oct 23 11:36:33 domU kernel:  really_probe+0x222/0x480
>>> Oct 23 11:36:33 domU kernel:  driver_probe_device+0xe1/0x150
>>> Oct 23 11:36:33 domU kernel:  device_driver_attach+0xa1/0xb0
>>> Oct 23 11:36:33 domU kernel:  __driver_attach+0x8a/0x150
>>> Oct 23 11:36:33 domU kernel:  ? device_driver_attach+0xb0/0xb0
>>> Oct 23 11:36:33 domU kernel:  ? device_driver_attach+0xb0/0xb0
>>> Oct 23 11:36:33 domU kernel:  bus_for_each_dev+0x78/0xc0
>>> Oct 23 11:36:33 domU kernel:  bus_add_driver+0x12b/0x1e0
>>> Oct 23 11:36:33 domU kernel:  driver_register+0x8b/0xe0
>>> Oct 23 11:36:33 domU kernel:  ? 0xffffffffc06b8000
>>> Oct 23 11:36:33 domU kernel:  i915_init+0x5d/0x70 [i915]
>>> Oct 23 11:36:33 domU kernel:  do_one_initcall+0x44/0x1d0
>>> Oct 23 11:36:33 domU kernel:  ? do_init_module+0x23/0x260
>>> Oct 23 11:36:33 domU kernel:  ? kmem_cache_alloc_trace+0xf5/0x200
>>> Oct 23 11:36:33 domU kernel:  do_init_module+0x5c/0x260
>>> Oct 23 11:36:33 domU kernel:  __do_sys_finit_module+0xb1/0x110
>>> Oct 23 11:36:33 domU kernel:  do_syscall_64+0x33/0x80
>>> Oct 23 11:36:33 domU kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> The call trace alone leaves open where exactly the crash occurred.
>> Looking at 5.17 I notice that the first thing the driver does
>> after mapping the range it to check the signature (both in
>> intel_opregion_setup()). As the signature can't possibly match
>> with no access granted to the underlying mappings, there shouldn't
>> be any further attempts to use the region in the driver; if there
>> are, I'd view this as a driver bug.
> Yes.  i915_driver_hw_probe does not check the return value of
> intel_opregion_setup(dev_priv) and just continues on.
>
> Chuck, the attached patch may help if you want to test it.
>
> Regards,
> Jason

Thanks for the patch, I will try it when I get a chance
and report if it prevents the crash and enables video
output to my screen. Has your patch been committed
to Linux? I just checked on the gitlab Linux master
branch and didn't see it there.

Regards,

Chuck


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-03-31  4:34       ` Chuck Zmudzinski
@ 2022-03-31 12:23         ` Jason Andryuk
  2022-03-31 13:57           ` Chuck Zmudzinski
  0 siblings, 1 reply; 34+ messages in thread
From: Jason Andryuk @ 2022-03-31 12:23 UTC (permalink / raw)
  To: Chuck Zmudzinski
  Cc: Jan Beulich, Wei Liu, Anthony PERARD, Juergen Gross, xen-devel

On Thu, Mar 31, 2022 at 12:34 AM Chuck Zmudzinski <brchuckz@netscape.net> wrote:
>
> On 3/30/22 2:45 PM, Jason Andryuk wrote:
> > On Fri, Mar 18, 2022 at 4:13 AM Jan Beulich <jbeulich@suse.com> wrote:
> >> On 14.03.2022 04:41, Chuck Zmudzinski wrote:
> >>> When gfx_passthru is enabled for the Intel IGD, hvmloader maps the IGD
> >>> opregion to the guest but libxl does not grant the guest permission to
> >>> access the mapped memory region. This results in a crash of the i915.ko
> >>> kernel module in a Linux HVM guest when it needs to access the IGD
> >>> opregion:
> >>>
> >>> Oct 23 11:36:33 domU kernel: Call Trace:
> >>> Oct 23 11:36:33 domU kernel:  ? idr_alloc+0x39/0x70
> >>> Oct 23 11:36:33 domU kernel:  drm_get_last_vbltimestamp+0xaa/0xc0 [drm]
> >>> Oct 23 11:36:33 domU kernel:  drm_reset_vblank_timestamp+0x5b/0xd0 [drm]
> >>> Oct 23 11:36:33 domU kernel:  drm_crtc_vblank_on+0x7b/0x130 [drm]
> >>> Oct 23 11:36:33 domU kernel:  intel_modeset_setup_hw_state+0xbd4/0x1900 [i915]
> >>> Oct 23 11:36:33 domU kernel:  ? _cond_resched+0x16/0x40
> >>> Oct 23 11:36:33 domU kernel:  ? ww_mutex_lock+0x15/0x80
> >>> Oct 23 11:36:33 domU kernel:  intel_modeset_init_nogem+0x867/0x1d30 [i915]
> >>> Oct 23 11:36:33 domU kernel:  ? gen6_write32+0x4b/0x1c0 [i915]
> >>> Oct 23 11:36:33 domU kernel:  ? intel_irq_postinstall+0xb9/0x670 [i915]
> >>> Oct 23 11:36:33 domU kernel:  i915_driver_probe+0x5c2/0xc90 [i915]
> >>> Oct 23 11:36:33 domU kernel:  ? vga_switcheroo_client_probe_defer+0x1f/0x40
> >>> Oct 23 11:36:33 domU kernel:  ? i915_pci_probe+0x3f/0x150 [i915]
> >>> Oct 23 11:36:33 domU kernel:  local_pci_probe+0x42/0x80
> >>> Oct 23 11:36:33 domU kernel:  ? _cond_resched+0x16/0x40
> >>> Oct 23 11:36:33 domU kernel:  pci_device_probe+0xfd/0x1b0
> >>> Oct 23 11:36:33 domU kernel:  really_probe+0x222/0x480
> >>> Oct 23 11:36:33 domU kernel:  driver_probe_device+0xe1/0x150
> >>> Oct 23 11:36:33 domU kernel:  device_driver_attach+0xa1/0xb0
> >>> Oct 23 11:36:33 domU kernel:  __driver_attach+0x8a/0x150
> >>> Oct 23 11:36:33 domU kernel:  ? device_driver_attach+0xb0/0xb0
> >>> Oct 23 11:36:33 domU kernel:  ? device_driver_attach+0xb0/0xb0
> >>> Oct 23 11:36:33 domU kernel:  bus_for_each_dev+0x78/0xc0
> >>> Oct 23 11:36:33 domU kernel:  bus_add_driver+0x12b/0x1e0
> >>> Oct 23 11:36:33 domU kernel:  driver_register+0x8b/0xe0
> >>> Oct 23 11:36:33 domU kernel:  ? 0xffffffffc06b8000
> >>> Oct 23 11:36:33 domU kernel:  i915_init+0x5d/0x70 [i915]
> >>> Oct 23 11:36:33 domU kernel:  do_one_initcall+0x44/0x1d0
> >>> Oct 23 11:36:33 domU kernel:  ? do_init_module+0x23/0x260
> >>> Oct 23 11:36:33 domU kernel:  ? kmem_cache_alloc_trace+0xf5/0x200
> >>> Oct 23 11:36:33 domU kernel:  do_init_module+0x5c/0x260
> >>> Oct 23 11:36:33 domU kernel:  __do_sys_finit_module+0xb1/0x110
> >>> Oct 23 11:36:33 domU kernel:  do_syscall_64+0x33/0x80
> >>> Oct 23 11:36:33 domU kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >> The call trace alone leaves open where exactly the crash occurred.
> >> Looking at 5.17 I notice that the first thing the driver does
> >> after mapping the range it to check the signature (both in
> >> intel_opregion_setup()). As the signature can't possibly match
> >> with no access granted to the underlying mappings, there shouldn't
> >> be any further attempts to use the region in the driver; if there
> >> are, I'd view this as a driver bug.
> > Yes.  i915_driver_hw_probe does not check the return value of
> > intel_opregion_setup(dev_priv) and just continues on.
> >
> > Chuck, the attached patch may help if you want to test it.
> >
> > Regards,
> > Jason
>
> Thanks for the patch, I will try it when I get a chance
> and report if it prevents the crash and enables video
> output to my screen. Has your patch been committed
> to Linux? I just checked on the gitlab Linux master
> branch and didn't see it there.

This patch should just make the i915 probe error out properly inside
the domU when the opregion cannot be mapped properly.  It would avoid
trigger the domU trace you posted above, but it wouldn't solve any other
issue.

I have not yet submitted upstream.

Regard,
Jason


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-03-31  3:54       ` Chuck Zmudzinski
@ 2022-03-31 12:29         ` Jason Andryuk
  2022-03-31 14:05           ` Chuck Zmudzinski
  0 siblings, 1 reply; 34+ messages in thread
From: Jason Andryuk @ 2022-03-31 12:29 UTC (permalink / raw)
  To: Chuck Zmudzinski
  Cc: Andrew Cooper, Anthony PERARD, xen-devel, Wei Liu, Juergen Gross,
	Jan Beulich

On Wed, Mar 30, 2022 at 11:54 PM Chuck Zmudzinski <brchuckz@netscape.net> wrote:
>
> On 3/30/22 1:27 PM, Andrew Cooper wrote:
> > On 30/03/2022 18:15, Anthony PERARD wrote:
> >>
> >> Some more though on that, looking at QEMU, it seems that there's already
> >> a call to xc_domain_iomem_permission(), in igd_write_opregion().
> > This has been discussed before, but noone's done anything about it.
> > It's a massive layering violation for QEMU to issue
> > xc_domain_iomem_permission()/etc hypercalls.
> >
> > It should be the toolstack, and only the toolstack, which makes
> > permissions hypercalls, which in turn will fix a slew of "QEMU doesn't
> > work when it doesn't have dom0 superpowers" bugs with stubdomains.
>
> How much say does the Xen project have over the code
> in Qemu under hw/xen? I would not be against having libxl
> do the permissions hypercalls in this case instead of Qemu
> doing it. My test with Qemu traditional and this patch proves
> the permission can be granted by libxl instead of the device
> model.

Qubes patches libxl and QEMU, and they move the hypercalls to the
toolstack.  They are using linux stubdoms, and I think it works for
them.

QEMU:
https://github.com/QubesOS/qubes-vmm-xen-stubdom-linux/blob/master/qemu/patches/0015-IGD-fix-undefined-behaviour.patch
https://github.com/QubesOS/qubes-vmm-xen-stubdom-linux/blob/master/qemu/patches/0016-IGD-improve-legacy-vbios-handling.patch
https://github.com/QubesOS/qubes-vmm-xen-stubdom-linux/blob/master/qemu/patches/0017-IGD-move-enabling-opregion-access-to-libxl.patch
libxl:
https://github.com/QubesOS/qubes-vmm-xen/blob/xen-4.14/patch-fix-igd-passthrough-with-linux-stubdomain.patch
maybe this one, too:
https://github.com/QubesOS/qubes-vmm-xen/blob/xen-4.14/patch-libxl-automatically-enable-gfx_passthru-if-IGD-is-as.patch

Regards,
Jason


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-03-31 12:23         ` Jason Andryuk
@ 2022-03-31 13:57           ` Chuck Zmudzinski
  0 siblings, 0 replies; 34+ messages in thread
From: Chuck Zmudzinski @ 2022-03-31 13:57 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Jan Beulich, Wei Liu, Anthony PERARD, Juergen Gross, xen-devel

On 3/31/2022 8:23 AM, Jason Andryuk wrote:
> On Thu, Mar 31, 2022 at 12:34 AM Chuck Zmudzinski <brchuckz@netscape.net> wrote:
>> On 3/30/22 2:45 PM, Jason Andryuk wrote:
>>> On Fri, Mar 18, 2022 at 4:13 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>> On 14.03.2022 04:41, Chuck Zmudzinski wrote:
>>>>> When gfx_passthru is enabled for the Intel IGD, hvmloader maps the IGD
>>>>> opregion to the guest but libxl does not grant the guest permission to
>>>>> access the mapped memory region. This results in a crash of the i915.ko
>>>>> kernel module in a Linux HVM guest when it needs to access the IGD
>>>>> opregion:
>>>>>
>>>>> Oct 23 11:36:33 domU kernel: Call Trace:
>>>>> Oct 23 11:36:33 domU kernel:  ? idr_alloc+0x39/0x70
>>>>> Oct 23 11:36:33 domU kernel:  drm_get_last_vbltimestamp+0xaa/0xc0 [drm]
>>>>> Oct 23 11:36:33 domU kernel:  drm_reset_vblank_timestamp+0x5b/0xd0 [drm]
>>>>> Oct 23 11:36:33 domU kernel:  drm_crtc_vblank_on+0x7b/0x130 [drm]
>>>>> Oct 23 11:36:33 domU kernel:  intel_modeset_setup_hw_state+0xbd4/0x1900 [i915]
>>>>> Oct 23 11:36:33 domU kernel:  ? _cond_resched+0x16/0x40
>>>>> Oct 23 11:36:33 domU kernel:  ? ww_mutex_lock+0x15/0x80
>>>>> Oct 23 11:36:33 domU kernel:  intel_modeset_init_nogem+0x867/0x1d30 [i915]
>>>>> Oct 23 11:36:33 domU kernel:  ? gen6_write32+0x4b/0x1c0 [i915]
>>>>> Oct 23 11:36:33 domU kernel:  ? intel_irq_postinstall+0xb9/0x670 [i915]
>>>>> Oct 23 11:36:33 domU kernel:  i915_driver_probe+0x5c2/0xc90 [i915]
>>>>> Oct 23 11:36:33 domU kernel:  ? vga_switcheroo_client_probe_defer+0x1f/0x40
>>>>> Oct 23 11:36:33 domU kernel:  ? i915_pci_probe+0x3f/0x150 [i915]
>>>>> Oct 23 11:36:33 domU kernel:  local_pci_probe+0x42/0x80
>>>>> Oct 23 11:36:33 domU kernel:  ? _cond_resched+0x16/0x40
>>>>> Oct 23 11:36:33 domU kernel:  pci_device_probe+0xfd/0x1b0
>>>>> Oct 23 11:36:33 domU kernel:  really_probe+0x222/0x480
>>>>> Oct 23 11:36:33 domU kernel:  driver_probe_device+0xe1/0x150
>>>>> Oct 23 11:36:33 domU kernel:  device_driver_attach+0xa1/0xb0
>>>>> Oct 23 11:36:33 domU kernel:  __driver_attach+0x8a/0x150
>>>>> Oct 23 11:36:33 domU kernel:  ? device_driver_attach+0xb0/0xb0
>>>>> Oct 23 11:36:33 domU kernel:  ? device_driver_attach+0xb0/0xb0
>>>>> Oct 23 11:36:33 domU kernel:  bus_for_each_dev+0x78/0xc0
>>>>> Oct 23 11:36:33 domU kernel:  bus_add_driver+0x12b/0x1e0
>>>>> Oct 23 11:36:33 domU kernel:  driver_register+0x8b/0xe0
>>>>> Oct 23 11:36:33 domU kernel:  ? 0xffffffffc06b8000
>>>>> Oct 23 11:36:33 domU kernel:  i915_init+0x5d/0x70 [i915]
>>>>> Oct 23 11:36:33 domU kernel:  do_one_initcall+0x44/0x1d0
>>>>> Oct 23 11:36:33 domU kernel:  ? do_init_module+0x23/0x260
>>>>> Oct 23 11:36:33 domU kernel:  ? kmem_cache_alloc_trace+0xf5/0x200
>>>>> Oct 23 11:36:33 domU kernel:  do_init_module+0x5c/0x260
>>>>> Oct 23 11:36:33 domU kernel:  __do_sys_finit_module+0xb1/0x110
>>>>> Oct 23 11:36:33 domU kernel:  do_syscall_64+0x33/0x80
>>>>> Oct 23 11:36:33 domU kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>> The call trace alone leaves open where exactly the crash occurred.
>>>> Looking at 5.17 I notice that the first thing the driver does
>>>> after mapping the range it to check the signature (both in
>>>> intel_opregion_setup()). As the signature can't possibly match
>>>> with no access granted to the underlying mappings, there shouldn't
>>>> be any further attempts to use the region in the driver; if there
>>>> are, I'd view this as a driver bug.
>>> Yes.  i915_driver_hw_probe does not check the return value of
>>> intel_opregion_setup(dev_priv) and just continues on.
>>>
>>> Chuck, the attached patch may help if you want to test it.
>>>
>>> Regards,
>>> Jason
>> Thanks for the patch, I will try it when I get a chance
>> and report if it prevents the crash and enables video
>> output to my screen. Has your patch been committed
>> to Linux? I just checked on the gitlab Linux master
>> branch and didn't see it there.
> This patch should just make the i915 probe error out properly inside
> the domU when the opregion cannot be mapped properly.  It would avoid
> trigger the domU trace you posted above, but it wouldn't solve any other
> issue.
>
> I have not yet submitted upstream.
>
> Regard,
> Jason

I understand the limitations of this patch, that the guest will still
not have access to the opregion. Still, I can test it - I do remember
some configurations when I could get output on the VGA port
but not the HDMI port, and maybe with this patch at least
the VGA port will work. In fact, I am not even sure the VGA port
does not currently work without your patch, but I know the HDMI
port does not work without your patch and an unpatched Xen
tool stack. So the patch might help some and if it does help it
probably is suitable for upstream.

Chuck


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-03-31 12:29         ` Jason Andryuk
@ 2022-03-31 14:05           ` Chuck Zmudzinski
  2022-03-31 14:19             ` Jason Andryuk
  0 siblings, 1 reply; 34+ messages in thread
From: Chuck Zmudzinski @ 2022-03-31 14:05 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Andrew Cooper, Anthony PERARD, xen-devel, Wei Liu, Juergen Gross,
	Jan Beulich

On 3/31/2022 8:29 AM, Jason Andryuk wrote:
> On Wed, Mar 30, 2022 at 11:54 PM Chuck Zmudzinski <brchuckz@netscape.net> wrote:
>> On 3/30/22 1:27 PM, Andrew Cooper wrote:
>>>
>>> This has been discussed before, but noone's done anything about it.
>>> It's a massive layering violation for QEMU to issue
>>> xc_domain_iomem_permission()/etc hypercalls.
>>>
>>> It should be the toolstack, and only the toolstack, which makes
>>> permissions hypercalls, which in turn will fix a slew of "QEMU doesn't
>>> work when it doesn't have dom0 superpowers" bugs with stubdomains.
>> How much say does the Xen project have over the code
>> in Qemu under hw/xen? I would not be against having libxl
>> do the permissions hypercalls in this case instead of Qemu
>> doing it. My test with Qemu traditional and this patch proves
>> the permission can be granted by libxl instead of the device
>> model.
> Qubes patches libxl and QEMU, and they move the hypercalls to the
> toolstack.  They are using linux stubdoms, and I think it works for
> them.

That still doesn't answer my question - will the Qemu upstream
accept the patches that move the hypercalls to the toolstack? If
not, we have to live with what we have now, which is that the
hypercalls are done in Qemu.

Regards,

Chuck


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-03-31 14:05           ` Chuck Zmudzinski
@ 2022-03-31 14:19             ` Jason Andryuk
  2022-03-31 19:44               ` Chuck Zmudzinski
  0 siblings, 1 reply; 34+ messages in thread
From: Jason Andryuk @ 2022-03-31 14:19 UTC (permalink / raw)
  To: Chuck Zmudzinski
  Cc: Andrew Cooper, Anthony PERARD, xen-devel, Wei Liu, Juergen Gross,
	Jan Beulich

On Thu, Mar 31, 2022 at 10:05 AM Chuck Zmudzinski <brchuckz@netscape.net> wrote:
>
> On 3/31/2022 8:29 AM, Jason Andryuk wrote:
> > On Wed, Mar 30, 2022 at 11:54 PM Chuck Zmudzinski <brchuckz@netscape.net> wrote:
> >> On 3/30/22 1:27 PM, Andrew Cooper wrote:
> >>>
> >>> This has been discussed before, but noone's done anything about it.
> >>> It's a massive layering violation for QEMU to issue
> >>> xc_domain_iomem_permission()/etc hypercalls.
> >>>
> >>> It should be the toolstack, and only the toolstack, which makes
> >>> permissions hypercalls, which in turn will fix a slew of "QEMU doesn't
> >>> work when it doesn't have dom0 superpowers" bugs with stubdomains.
> >> How much say does the Xen project have over the code
> >> in Qemu under hw/xen? I would not be against having libxl
> >> do the permissions hypercalls in this case instead of Qemu
> >> doing it. My test with Qemu traditional and this patch proves
> >> the permission can be granted by libxl instead of the device
> >> model.
> > Qubes patches libxl and QEMU, and they move the hypercalls to the
> > toolstack.  They are using linux stubdoms, and I think it works for
> > them.
>
> That still doesn't answer my question - will the Qemu upstream
> accept the patches that move the hypercalls to the toolstack? If
> not, we have to live with what we have now, which is that the
> hypercalls are done in Qemu.

Xen-associated people maintain hw/xen code in QEMU, so yes it could be accepted.

Maybe it would need to be backwards compatible to have libxl check the
QEMU version to decide who makes the hypercall?  Unless it is broken
today, in which case just make it work.

Regards,
Jason


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-03-30 17:15   ` Anthony PERARD
  2022-03-30 17:27     ` Andrew Cooper
  2022-03-31  2:41     ` Chuck Zmudzinski
@ 2022-03-31 18:32     ` Chuck Zmudzinski
  2022-04-01  0:27       ` Chuck Zmudzinski
  2022-03-31 23:23     ` Chuck Zmudzinski
  2022-06-04  1:10     ` Chuck Zmudzinski
  4 siblings, 1 reply; 34+ messages in thread
From: Chuck Zmudzinski @ 2022-03-31 18:32 UTC (permalink / raw)
  To: Anthony PERARD; +Cc: xen-devel, Wei Liu, Juergen Gross, Jan Beulich

On 3/30/22 1:15 PM, Anthony PERARD wrote:
> Hi Chuck,
>
> On Sun, Mar 13, 2022 at 11:41:37PM -0400, Chuck Zmudzinski wrote:
>> When gfx_passthru is enabled for the Intel IGD, hvmloader maps the IGD
>> opregion to the guest but libxl does not grant the guest permission to
> I'm not reading the same thing when looking at code in hvmloader. It
> seems that hvmloader allocate some memory for the IGD opregion rather
> than mapping it.
>
>
> tools/firmware/hvmloader/pci.c:184
>      if ( vendor_id == 0x8086 )
>      {
>          igd_opregion_pgbase = mem_hole_alloc(IGD_OPREGION_PAGES);
>          /*
>           * Write the the OpRegion offset to give the opregion
>           * address to the device model. The device model will trap
>           * and map the OpRegion at the give address.
>           */
>          pci_writel(vga_devfn, PCI_INTEL_OPREGION,
>                     igd_opregion_pgbase << PAGE_SHIFT);
>      }
>
> I think this write would go through QEMU, does it do something with it?
> (I kind of replying to this question at the end of the mail.)
>
> Is this code in hvmloader actually run in your case?

Hi Anthony,

To test your theory that hvmloader is not called in my case and
does nothing, I did the following tests:

I wrote a little C program to read the mapped IGD opregion
address in the guest from the config attribute of the
IGD in sysfs:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdint.h>

int main()
{
     int fd = open("/sys/devices/pci0000:00/0000:00:02.0/config",
                    O_RDONLY);
     uint32_t buf;
     off_t offset = 0xfc;
     pread(fd, &buf, 4, offset);
     printf("opregion = %lx\n", buf);
     return close(fd);
}
------------------------ end of C program -----------------

Since the config attribute cannot be read by an ordinary user, it is
necessary to run the little C program as the super user to successfully
read the opregion address from sysfs with the little C program.

I also grabbed the BIOS-provided physical RAM map in the
guest which shows the 3 pages mapped by hvmloader for the
opregion (it is the second to last entry in the table):

Mar 31 13:20:16 kolbe kernel: BIOS-provided physical RAM map:
Mar 31 13:20:16 kolbe kernel: BIOS-e820: [mem 
0x0000000000000000-0x000000000009dfff] usable
Mar 31 13:20:16 kolbe kernel: BIOS-e820: [mem 
0x000000000009e000-0x000000000009ffff] reserved
Mar 31 13:20:16 kolbe kernel: BIOS-e820: [mem 
0x00000000000e0000-0x00000000000fffff] reserved
Mar 31 13:20:16 kolbe kernel: BIOS-e820: [mem 
0x0000000000100000-0x00000000bfbfffff] usable
Mar 31 13:20:16 kolbe kernel: BIOS-e820: [mem 
0x00000000cc346000-0x00000000cc352fff] reserved
Mar 31 13:20:16 kolbe kernel: BIOS-e820: [mem 
0x00000000cf800000-0x00000000df9fffff] reserved
Mar 31 13:20:16 kolbe kernel: BIOS-e820: [mem 
0x00000000fc000000-0x00000000fc009fff] ACPI NVS
Mar 31 13:20:16 kolbe kernel: BIOS-e820: [mem 
0x00000000fc00a000-0x00000000fdffbfff] reserved
Mar 31 13:20:16 kolbe kernel: BIOS-e820: [mem 
0x00000000fdffc000-0x00000000fdffefff] ACPI NVS
Mar 31 13:20:16 kolbe kernel: BIOS-e820: [mem 
0x00000000fdfff000-0x00000000ffffffff] reserved

Now with the current code and no patches, the output of the little C
program to print the opregion address from sysfs in the guest is wrong,
the opregion address in the guest is not displayed (it should be
fdffc018 because the offset of the opregion from the page boundary is
0x18 (24) bytes on my hardware) but it displays an error value
(ffffffff) instead:

opregion = ffffffff

I would expect it to be correct if nothing overwrites the value
hvmloader wrote. In fact, I think the value hvmloader wrote might be the
page aligned address of fdffc000 instead of fdffc018. I am not sure
where this error value comes from, so I do need to do some
investigations because I think you are correct that the value that
hvmloader wrote was overwritten somewhere.

Now when I apply my patch to libxl, I get the same physical RAM map
in the guest:

Mar 31 13:12:35 kolbe kernel: BIOS-provided physical RAM map:
Mar 31 13:12:35 kolbe kernel: BIOS-e820: [mem 
0x0000000000000000-0x000000000009dfff] usable
Mar 31 13:12:35 kolbe kernel: BIOS-e820: [mem 
0x000000000009e000-0x000000000009ffff] reserved
Mar 31 13:12:35 kolbe kernel: BIOS-e820: [mem 
0x00000000000e0000-0x00000000000fffff] reserved
Mar 31 13:12:35 kolbe kernel: BIOS-e820: [mem 
0x0000000000100000-0x00000000bfbfffff] usable
Mar 31 13:12:35 kolbe kernel: BIOS-e820: [mem 
0x00000000cc346000-0x00000000cc352fff] reserved
Mar 31 13:12:35 kolbe kernel: BIOS-e820: [mem 
0x00000000cf800000-0x00000000df9fffff] reserved
Mar 31 13:12:35 kolbe kernel: BIOS-e820: [mem 
0x00000000fc000000-0x00000000fc009fff] ACPI NVS
Mar 31 13:12:35 kolbe kernel: BIOS-e820: [mem 
0x00000000fc00a000-0x00000000fdffbfff] reserved
Mar 31 13:12:35 kolbe kernel: BIOS-e820: [mem 
0x00000000fdffc000-0x00000000fdffefff] ACPI NVS
Mar 31 13:12:35 kolbe kernel: BIOS-e820: [mem 
0x00000000fdfff000-0x00000000ffffffff] reserved

And now I get the correct opregion address from the little C program to
read the opregion address from sysfs in the guest, not even the
page-aligned address that hvmloader appears to have written:

opregion = fdffc018

Next I removed the code snippet from hvmloader that allocates three
pages in the guest for the opregion and writes the opregion address to
the pci config attribute, and now there is no memory hole allocated for
the opregion in the guest:

Mar 31 12:47:34 kolbe kernel: BIOS-provided physical RAM map:
Mar 31 12:47:34 kolbe kernel: BIOS-e820: [mem 
0x0000000000000000-0x000000000009dfff] usable
Mar 31 12:47:34 kolbe kernel: BIOS-e820: [mem 
0x000000000009e000-0x000000000009ffff] reserved
Mar 31 12:47:34 kolbe kernel: BIOS-e820: [mem 
0x00000000000e0000-0x00000000000fffff] reserved
Mar 31 12:47:34 kolbe kernel: BIOS-e820: [mem 
0x0000000000100000-0x00000000bfbfffff] usable
Mar 31 12:47:34 kolbe kernel: BIOS-e820: [mem 
0x00000000cc346000-0x00000000cc352fff] reserved
Mar 31 12:47:34 kolbe kernel: BIOS-e820: [mem 
0x00000000cf800000-0x00000000df9fffff] reserved
Mar 31 12:47:34 kolbe kernel: BIOS-e820: [mem 
0x00000000fc000000-0x00000000fc009fff] ACPI NVS
Mar 31 12:47:34 kolbe kernel: BIOS-e820: [mem 
0x00000000fc00a000-0x00000000ffffffff] reserved

I ran this case without my patch to libxl for safety reasons because the
opregion address in sysfs in the guest is wrong and I get the same
error address return value using the little C program to read the
opregion address from sysfs:

opregion = ffffffff

So the conclusion is that hvmloader does allocate the three pages in the
guest for the opregion and writes a value for the opregion address, but
it appears it is overwritten later with the error value when the guest
cannot access the opregion and with the correct value when the guest can
access the opregion.

So I agree that I should understand what is going on here better. I
tentatively think the call to pci_writel in hvmloader can be safely
removed because that value seems to be changed later on somewhere.
But we do need to keep the call to allocate the memory hole for the
opregion in hvmloader, or maybe move that call to the toolstack.

So I will need to have a better answer to your questions before I
propose the next version of the patch / patch series.

Regards,

Chuck


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-03-31 14:19             ` Jason Andryuk
@ 2022-03-31 19:44               ` Chuck Zmudzinski
  2022-06-17 13:46                 ` Anthony PERARD
  0 siblings, 1 reply; 34+ messages in thread
From: Chuck Zmudzinski @ 2022-03-31 19:44 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Andrew Cooper, Anthony PERARD, xen-devel, Wei Liu, Juergen Gross,
	Jan Beulich

On 3/31/22 10:19 AM, Jason Andryuk wrote:
> On Thu, Mar 31, 2022 at 10:05 AM Chuck Zmudzinski <brchuckz@netscape.net> wrote:
>>
>> That still doesn't answer my question - will the Qemu upstream
>> accept the patches that move the hypercalls to the toolstack? If
>> not, we have to live with what we have now, which is that the
>> hypercalls are done in Qemu.
> Xen-associated people maintain hw/xen code in QEMU, so yes it could be accepted.
>
> Maybe it would need to be backwards compatible to have libxl check the
> QEMU version to decide who makes the hypercall?  Unless it is broken
> today, in which case just make it work.
>
> Regards,
> Jason

I know of another reason to check the Qemu upstream version,
and that is dealing with deprecated / removed device model
options that xl.cfg still uses. I looked at that a few years ago
with regard to the deprecated 'usbdevice tablet' Qemu option,
but I did not see anything in libxl to distinguish Qemu versions
except for upstream vs. traditional. AFAICT, detecting traditional
vs. upstream Qemu depends solely on the device_model_version
xl.cfg setting. So it might be useful for libxl to add the capability
to detect the upstream Qemu version or at least create an xl.cfg
setting to inform libxl about the Qemu version.

Regards,

Chuck



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-03-30 17:15   ` Anthony PERARD
                       ` (2 preceding siblings ...)
  2022-03-31 18:32     ` Chuck Zmudzinski
@ 2022-03-31 23:23     ` Chuck Zmudzinski
  2022-06-04  1:10     ` Chuck Zmudzinski
  4 siblings, 0 replies; 34+ messages in thread
From: Chuck Zmudzinski @ 2022-03-31 23:23 UTC (permalink / raw)
  To: Anthony PERARD; +Cc: xen-devel, Wei Liu, Juergen Gross, Jan Beulich

On 3/30/22 1:15 PM, Anthony PERARD wrote:
> Hi Chuck,
>
> On Sun, Mar 13, 2022 at 11:41:37PM -0400, Chuck Zmudzinski wrote:
>> When gfx_passthru is enabled for the Intel IGD, hvmloader maps the IGD
>> opregion to the guest but libxl does not grant the guest permission to
> I'm not reading the same thing when looking at code in hvmloader. It
> seems that hvmloader allocate some memory for the IGD opregion rather
> than mapping it.
>
>
> tools/firmware/hvmloader/pci.c:184
>      if ( vendor_id == 0x8086 )
>      {
>          igd_opregion_pgbase = mem_hole_alloc(IGD_OPREGION_PAGES);
>          /*
>           * Write the the OpRegion offset to give the opregion
>           * address to the device model. The device model will trap
>           * and map the OpRegion at the give address.
>           */
>          pci_writel(vga_devfn, PCI_INTEL_OPREGION,
>                     igd_opregion_pgbase << PAGE_SHIFT);
>      }
>
> I think this write would go through QEMU, does it do something with it?
> (I kind of replying to this question at the end of the mail.)
>
> Is this code in hvmloader actually run in your case?

Hi Anthony,

Let me try to answer your question again. My tests indicate
that this code in hvmloader is actually run in my case. As I
mentioned in an earlier message, the allocation of the three
pages for the opregion is not done for the guest if I remove
this code from hvmloader. The only concern I had was about
the difference in what I was reading for the opregion address
in sysfs (fcffc018 in my case) and the address that hvmloader
wrote (fcffc000 in my case). The change is easily explained by
what the Qemu device model (both upstream and traditional)
does when the device model writes the opregion address into
the Intel IGD config attribute:

This is the traditional Qemu code in hw/pt_graphics.c:66

void igd_write_opregion(struct pt_dev *real_dev, uint32_t val)
{
     uint32_t host_opregion = 0;
     int ret;

     if ( igd_guest_opregion )
     {
         PT_LOG("opregion register already been set, ignoring %x\n", val);
         return;
     }

     host_opregion = pt_pci_host_read(real_dev->pci_dev, 
PCI_INTEL_OPREGION, 4);
     igd_guest_opregion = (val & ~0xfff) | (host_opregion & 0xfff);

------------------------ End of code snippet 
-----------------------------------

The offset of the opregion in the guest (0x18 in my case) is
recovered by that last statement. The upstream model does the
same thing using the constant XEN_PCI_INTEL_OPREGION_MASK
set to 0xfff to recover the offset.

So what we have in hvmloader is correct and necessary.

>
>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
>> index 4bbbfe9f16..a4fc473de9 100644
>> --- a/tools/libs/light/libxl_pci.c
>> +++ b/tools/libs/light/libxl_pci.c
>> @@ -2531,6 +2572,37 @@ int libxl__grant_vga_iomem_permission(libxl__gc *gc, const uint32_t domid,
>>                     domid, vga_iomem_start, (vga_iomem_start + 0x20 - 1));
>>               return ret;
>>           }
>> +
>> +        /* If this is an Intel IGD, allow access to the IGD opregion */
>> +        if (!libxl__is_igd_vga_passthru(gc, d_config)) return 0;
>> +
>> +        uint32_t igd_opregion = sysfs_dev_get_igd_opregion(gc, pci);
>> +        uint32_t error = 0xffffffff;
>> +        if (igd_opregion == error) break;
>> +
>> +        vga_iomem_start = ( (uint64_t) igd_opregion ) >> XC_PAGE_SHIFT;
>> +        ret = xc_domain_iomem_permission(CTX->xch, stubdom_domid,
>> +                                         vga_iomem_start,
>> +                                         IGD_OPREGION_PAGES, 1);
>> +        if (ret < 0) {
>> +            LOGED(ERROR, domid,
>> +                  "failed to give stubdom%d access to iomem range "
>> +                  "%"PRIx64"-%"PRIx64" for IGD passthru",
>> +                  stubdom_domid, vga_iomem_start, (vga_iomem_start +
>> +                                                IGD_OPREGION_PAGES - 1));
>> +            return ret;
>> +        }
>> +        ret = xc_domain_iomem_permission(CTX->xch, domid,
>> +                                         vga_iomem_start,
>> +                                         IGD_OPREGION_PAGES, 1);
> Here, you seems to add permission to an address that is read from the
> pci config space of the device, but as I've pointed above hvmloader
> seems to overwrite this address.

No, hvmloader wrote the mapped address and here we are
reading the opregion address of the host, not the mapped
address of the guest. There is no problem here.
> It this call to
> xc_domain_iomem_permission() does actually anything useful?
> Or is it by luck that the address returned by
> sysfs_dev_get_igd_opregion() happened to be the address that hvmloader
> is going to write?

No, it is not luck, we use the same constant in hvmloader,
Qemu, and here in this patch to properly map the opregion
to the guest, and the constant is PCI_INTEL_OPREGION, set
to 0xfc, the offset of where in the config attribute the
opregion address is stored.
>
> Or maybe hvmloader doesn't actually do anything?

It does do something, and what it does is necessary.

Regards,

Chuck


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-03-31 18:32     ` Chuck Zmudzinski
@ 2022-04-01  0:27       ` Chuck Zmudzinski
  0 siblings, 0 replies; 34+ messages in thread
From: Chuck Zmudzinski @ 2022-04-01  0:27 UTC (permalink / raw)
  To: Anthony PERARD; +Cc: xen-devel, Wei Liu, Juergen Gross, Jan Beulich

On 3/31/22 2:32 PM, Chuck Zmudzinski wrote:
> On 3/30/22 1:15 PM, Anthony PERARD wrote:
>> Hi Chuck,
>>
>> On Sun, Mar 13, 2022 at 11:41:37PM -0400, Chuck Zmudzinski wrote:
>>> When gfx_passthru is enabled for the Intel IGD, hvmloader maps the IGD
>>> opregion to the guest but libxl does not grant the guest permission to
>> I'm not reading the same thing when looking at code in hvmloader. It
>> seems that hvmloader allocate some memory for the IGD opregion rather
>> than mapping it.
>>
>>
>> tools/firmware/hvmloader/pci.c:184
>>      if ( vendor_id == 0x8086 )
>>      {
>>          igd_opregion_pgbase = mem_hole_alloc(IGD_OPREGION_PAGES);
>>          /*
>>           * Write the the OpRegion offset to give the opregion
>>           * address to the device model. The device model will trap
>>           * and map the OpRegion at the give address.
>>           */
>>          pci_writel(vga_devfn, PCI_INTEL_OPREGION,
>>                     igd_opregion_pgbase << PAGE_SHIFT);
>>      }
>>
>> I think this write would go through QEMU, does it do something with it?
>> (I kind of replying to this question at the end of the mail.)
>>
>> Is this code in hvmloader actually run in your case?
>
> Hi Anthony,
> ...
> So the conclusion is that hvmloader does allocate the three pages in the
> guest for the opregion and writes a value for the opregion address, but
> it appears it is overwritten later with the error value when the guest
> cannot access the opregion and with the correct value when the guest can
> access the opregion.
>
> So I agree that I should understand what is going on here better. I
> tentatively think the call to pci_writel in hvmloader can be safely
> removed because that value seems to be changed later on somewhere.

After discovering how the device model recovers the offset
of the opregion from the page boundary, I am now certain
that what we currently have in hvmloader is necessary. We
do need to call pci_writel in hvmolader because that is where
we write the mapped value of the page-aligned address of
the opregion in the guest, and then the device model reads
that value, recovers the offset of the opregion to the page
boundary, and writes the address of the beginning of the
opregion, not the page-aligned address that hvmloader wrote,
into the config attribute of the Intel IGD that is passed
through to the guest. That is why it seemed to me that the
address was changed somewhere. The device model modifies
it so it is the actual address of the opregion and not the address
of the page boundary that immediately precedes the opregion.
I hope this is an acceptable explanation of what we currently
have in hvmloader.

Regards,

Chuck


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-03-30 18:45     ` Jason Andryuk
  2022-03-31  4:34       ` Chuck Zmudzinski
@ 2022-04-01 13:21       ` Chuck Zmudzinski
  2022-04-01 13:41         ` Chuck Zmudzinski
  2022-04-06  1:31         ` Chuck Zmudzinski
  1 sibling, 2 replies; 34+ messages in thread
From: Chuck Zmudzinski @ 2022-04-01 13:21 UTC (permalink / raw)
  To: Jason Andryuk, Jan Beulich
  Cc: Wei Liu, Anthony PERARD, Juergen Gross, xen-devel

On 3/30/22 2:45 PM, Jason Andryuk wrote:
> On Fri, Mar 18, 2022 at 4:13 AM Jan Beulich <jbeulich@suse.com> wrote:
>> On 14.03.2022 04:41, Chuck Zmudzinski wrote:
>>> When gfx_passthru is enabled for the Intel IGD, hvmloader maps the IGD
>>> opregion to the guest but libxl does not grant the guest permission to
>>> access the mapped memory region. This results in a crash of the i915.ko
>>> kernel module in a Linux HVM guest when it needs to access the IGD
>>> opregion:
>>>
>>> Oct 23 11:36:33 domU kernel: Call Trace:
>>> Oct 23 11:36:33 domU kernel:  ? idr_alloc+0x39/0x70
>>> Oct 23 11:36:33 domU kernel:  drm_get_last_vbltimestamp+0xaa/0xc0 [drm]
>>> Oct 23 11:36:33 domU kernel:  drm_reset_vblank_timestamp+0x5b/0xd0 [drm]
>>> Oct 23 11:36:33 domU kernel:  drm_crtc_vblank_on+0x7b/0x130 [drm]
>>> Oct 23 11:36:33 domU kernel:  intel_modeset_setup_hw_state+0xbd4/0x1900 [i915]
>>> Oct 23 11:36:33 domU kernel:  ? _cond_resched+0x16/0x40
>>> Oct 23 11:36:33 domU kernel:  ? ww_mutex_lock+0x15/0x80
>>> Oct 23 11:36:33 domU kernel:  intel_modeset_init_nogem+0x867/0x1d30 [i915]
>>> Oct 23 11:36:33 domU kernel:  ? gen6_write32+0x4b/0x1c0 [i915]
>>> Oct 23 11:36:33 domU kernel:  ? intel_irq_postinstall+0xb9/0x670 [i915]
>>> Oct 23 11:36:33 domU kernel:  i915_driver_probe+0x5c2/0xc90 [i915]
>>> Oct 23 11:36:33 domU kernel:  ? vga_switcheroo_client_probe_defer+0x1f/0x40
>>> Oct 23 11:36:33 domU kernel:  ? i915_pci_probe+0x3f/0x150 [i915]
>>> Oct 23 11:36:33 domU kernel:  local_pci_probe+0x42/0x80
>>> Oct 23 11:36:33 domU kernel:  ? _cond_resched+0x16/0x40
>>> Oct 23 11:36:33 domU kernel:  pci_device_probe+0xfd/0x1b0
>>> Oct 23 11:36:33 domU kernel:  really_probe+0x222/0x480
>>> Oct 23 11:36:33 domU kernel:  driver_probe_device+0xe1/0x150
>>> Oct 23 11:36:33 domU kernel:  device_driver_attach+0xa1/0xb0
>>> Oct 23 11:36:33 domU kernel:  __driver_attach+0x8a/0x150
>>> Oct 23 11:36:33 domU kernel:  ? device_driver_attach+0xb0/0xb0
>>> Oct 23 11:36:33 domU kernel:  ? device_driver_attach+0xb0/0xb0
>>> Oct 23 11:36:33 domU kernel:  bus_for_each_dev+0x78/0xc0
>>> Oct 23 11:36:33 domU kernel:  bus_add_driver+0x12b/0x1e0
>>> Oct 23 11:36:33 domU kernel:  driver_register+0x8b/0xe0
>>> Oct 23 11:36:33 domU kernel:  ? 0xffffffffc06b8000
>>> Oct 23 11:36:33 domU kernel:  i915_init+0x5d/0x70 [i915]
>>> Oct 23 11:36:33 domU kernel:  do_one_initcall+0x44/0x1d0
>>> Oct 23 11:36:33 domU kernel:  ? do_init_module+0x23/0x260
>>> Oct 23 11:36:33 domU kernel:  ? kmem_cache_alloc_trace+0xf5/0x200
>>> Oct 23 11:36:33 domU kernel:  do_init_module+0x5c/0x260
>>> Oct 23 11:36:33 domU kernel:  __do_sys_finit_module+0xb1/0x110
>>> Oct 23 11:36:33 domU kernel:  do_syscall_64+0x33/0x80
>>> Oct 23 11:36:33 domU kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> The call trace alone leaves open where exactly the crash occurred.
>> Looking at 5.17 I notice that the first thing the driver does
>> after mapping the range it to check the signature (both in
>> intel_opregion_setup()). As the signature can't possibly match
>> with no access granted to the underlying mappings, there shouldn't
>> be any further attempts to use the region in the driver; if there
>> are, I'd view this as a driver bug.
> Yes.  i915_driver_hw_probe does not check the return value of
> intel_opregion_setup(dev_priv) and just continues on.
>
> Chuck, the attached patch may help if you want to test it.
>
> Regards,
> Jason

I tested the patch - it made no noticeable difference. I still
get the same crash and call trace with the patch. Actually,
the call trace I posted here is only the first of three call
traces, and I still see all three call traces with the patch.
The patch is harmless and the i915 module with the patch
works normally when it can access the intel opregion.

Regards,

Chuck


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-04-01 13:21       ` Chuck Zmudzinski
@ 2022-04-01 13:41         ` Chuck Zmudzinski
  2022-04-01 13:49           ` Jason Andryuk
  2022-04-06  1:31         ` Chuck Zmudzinski
  1 sibling, 1 reply; 34+ messages in thread
From: Chuck Zmudzinski @ 2022-04-01 13:41 UTC (permalink / raw)
  To: Jason Andryuk, Jan Beulich; +Cc: xen-devel

On 4/1/22 9:21 AM, Chuck Zmudzinski wrote:
> On 3/30/22 2:45 PM, Jason Andryuk wrote:
>> On Fri, Mar 18, 2022 at 4:13 AM Jan Beulich <jbeulich@suse.com> wrote:
>>> On 14.03.2022 04:41, Chuck Zmudzinski wrote:
>>>> When gfx_passthru is enabled for the Intel IGD, hvmloader maps the IGD
>>>> opregion to the guest but libxl does not grant the guest permission to
>>>> access the mapped memory region. This results in a crash of the 
>>>> i915.ko
>>>> kernel module in a Linux HVM guest when it needs to access the IGD
>>>> opregion:
>>>>
>>>> Oct 23 11:36:33 domU kernel: Call Trace:
>>>> Oct 23 11:36:33 domU kernel:  ? idr_alloc+0x39/0x70
>>>> Oct 23 11:36:33 domU kernel: drm_get_last_vbltimestamp+0xaa/0xc0 [drm]
>>>> Oct 23 11:36:33 domU kernel: drm_reset_vblank_timestamp+0x5b/0xd0 
>>>> [drm]
>>>> Oct 23 11:36:33 domU kernel:  drm_crtc_vblank_on+0x7b/0x130 [drm]
>>>> Oct 23 11:36:33 domU kernel: 
>>>> intel_modeset_setup_hw_state+0xbd4/0x1900 [i915]
>>>> Oct 23 11:36:33 domU kernel:  ? _cond_resched+0x16/0x40
>>>> Oct 23 11:36:33 domU kernel:  ? ww_mutex_lock+0x15/0x80
>>>> Oct 23 11:36:33 domU kernel: intel_modeset_init_nogem+0x867/0x1d30 
>>>> [i915]
>>>> Oct 23 11:36:33 domU kernel:  ? gen6_write32+0x4b/0x1c0 [i915]
>>>> Oct 23 11:36:33 domU kernel:  ? intel_irq_postinstall+0xb9/0x670 
>>>> [i915]
>>>> Oct 23 11:36:33 domU kernel:  i915_driver_probe+0x5c2/0xc90 [i915]
>>>> Oct 23 11:36:33 domU kernel:  ? 
>>>> vga_switcheroo_client_probe_defer+0x1f/0x40
>>>> Oct 23 11:36:33 domU kernel:  ? i915_pci_probe+0x3f/0x150 [i915]
>>>> Oct 23 11:36:33 domU kernel:  local_pci_probe+0x42/0x80
>>>> Oct 23 11:36:33 domU kernel:  ? _cond_resched+0x16/0x40
>>>> Oct 23 11:36:33 domU kernel:  pci_device_probe+0xfd/0x1b0
>>>> Oct 23 11:36:33 domU kernel:  really_probe+0x222/0x480
>>>> Oct 23 11:36:33 domU kernel:  driver_probe_device+0xe1/0x150
>>>> Oct 23 11:36:33 domU kernel:  device_driver_attach+0xa1/0xb0
>>>> Oct 23 11:36:33 domU kernel:  __driver_attach+0x8a/0x150
>>>> Oct 23 11:36:33 domU kernel:  ? device_driver_attach+0xb0/0xb0
>>>> Oct 23 11:36:33 domU kernel:  ? device_driver_attach+0xb0/0xb0
>>>> Oct 23 11:36:33 domU kernel:  bus_for_each_dev+0x78/0xc0
>>>> Oct 23 11:36:33 domU kernel:  bus_add_driver+0x12b/0x1e0
>>>> Oct 23 11:36:33 domU kernel:  driver_register+0x8b/0xe0
>>>> Oct 23 11:36:33 domU kernel:  ? 0xffffffffc06b8000
>>>> Oct 23 11:36:33 domU kernel:  i915_init+0x5d/0x70 [i915]
>>>> Oct 23 11:36:33 domU kernel:  do_one_initcall+0x44/0x1d0
>>>> Oct 23 11:36:33 domU kernel:  ? do_init_module+0x23/0x260
>>>> Oct 23 11:36:33 domU kernel:  ? kmem_cache_alloc_trace+0xf5/0x200
>>>> Oct 23 11:36:33 domU kernel:  do_init_module+0x5c/0x260
>>>> Oct 23 11:36:33 domU kernel: __do_sys_finit_module+0xb1/0x110
>>>> Oct 23 11:36:33 domU kernel:  do_syscall_64+0x33/0x80
>>>> Oct 23 11:36:33 domU kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> The call trace alone leaves open where exactly the crash occurred.
>>> Looking at 5.17 I notice that the first thing the driver does
>>> after mapping the range it to check the signature (both in
>>> intel_opregion_setup()). As the signature can't possibly match
>>> with no access granted to the underlying mappings, there shouldn't
>>> be any further attempts to use the region in the driver; if there
>>> are, I'd view this as a driver bug.
>> Yes.  i915_driver_hw_probe does not check the return value of
>> intel_opregion_setup(dev_priv) and just continues on.
>>
>> Chuck, the attached patch may help if you want to test it.
>>
>> Regards,
>> Jason
>
> I tested the patch - it made no noticeable difference.I still
> get the same crash and call trace with the patch. Actually,
> the call trace I posted here is only the first of three call
> traces, and I still see all three call traces with the patch.

It is probably necessary to patch intet_opregion_setup to
return from it with an error sooner if the goal is to suppress
the call traces that occur when the driver cannot access
the opregion.

Regards,

Chuck


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-04-01 13:41         ` Chuck Zmudzinski
@ 2022-04-01 13:49           ` Jason Andryuk
  0 siblings, 0 replies; 34+ messages in thread
From: Jason Andryuk @ 2022-04-01 13:49 UTC (permalink / raw)
  To: Chuck Zmudzinski; +Cc: Jan Beulich, xen-devel

On Fri, Apr 1, 2022 at 9:41 AM Chuck Zmudzinski <brchuckz@netscape.net> wrote:
>
> On 4/1/22 9:21 AM, Chuck Zmudzinski wrote:
> > On 3/30/22 2:45 PM, Jason Andryuk wrote:
> >> On Fri, Mar 18, 2022 at 4:13 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>> On 14.03.2022 04:41, Chuck Zmudzinski wrote:
> >>>> When gfx_passthru is enabled for the Intel IGD, hvmloader maps the IGD
> >>>> opregion to the guest but libxl does not grant the guest permission to
> >>>> access the mapped memory region. This results in a crash of the
> >>>> i915.ko
> >>>> kernel module in a Linux HVM guest when it needs to access the IGD
> >>>> opregion:
> >>>>
> >>>> Oct 23 11:36:33 domU kernel: Call Trace:
> >>>> Oct 23 11:36:33 domU kernel:  ? idr_alloc+0x39/0x70
> >>>> Oct 23 11:36:33 domU kernel: drm_get_last_vbltimestamp+0xaa/0xc0 [drm]
> >>>> Oct 23 11:36:33 domU kernel: drm_reset_vblank_timestamp+0x5b/0xd0
> >>>> [drm]
> >>>> Oct 23 11:36:33 domU kernel:  drm_crtc_vblank_on+0x7b/0x130 [drm]
> >>>> Oct 23 11:36:33 domU kernel:
> >>>> intel_modeset_setup_hw_state+0xbd4/0x1900 [i915]
> >>>> Oct 23 11:36:33 domU kernel:  ? _cond_resched+0x16/0x40
> >>>> Oct 23 11:36:33 domU kernel:  ? ww_mutex_lock+0x15/0x80
> >>>> Oct 23 11:36:33 domU kernel: intel_modeset_init_nogem+0x867/0x1d30
> >>>> [i915]
> >>>> Oct 23 11:36:33 domU kernel:  ? gen6_write32+0x4b/0x1c0 [i915]
> >>>> Oct 23 11:36:33 domU kernel:  ? intel_irq_postinstall+0xb9/0x670
> >>>> [i915]
> >>>> Oct 23 11:36:33 domU kernel:  i915_driver_probe+0x5c2/0xc90 [i915]
> >>>> Oct 23 11:36:33 domU kernel:  ?
> >>>> vga_switcheroo_client_probe_defer+0x1f/0x40
> >>>> Oct 23 11:36:33 domU kernel:  ? i915_pci_probe+0x3f/0x150 [i915]
> >>>> Oct 23 11:36:33 domU kernel:  local_pci_probe+0x42/0x80
> >>>> Oct 23 11:36:33 domU kernel:  ? _cond_resched+0x16/0x40
> >>>> Oct 23 11:36:33 domU kernel:  pci_device_probe+0xfd/0x1b0
> >>>> Oct 23 11:36:33 domU kernel:  really_probe+0x222/0x480
> >>>> Oct 23 11:36:33 domU kernel:  driver_probe_device+0xe1/0x150
> >>>> Oct 23 11:36:33 domU kernel:  device_driver_attach+0xa1/0xb0
> >>>> Oct 23 11:36:33 domU kernel:  __driver_attach+0x8a/0x150
> >>>> Oct 23 11:36:33 domU kernel:  ? device_driver_attach+0xb0/0xb0
> >>>> Oct 23 11:36:33 domU kernel:  ? device_driver_attach+0xb0/0xb0
> >>>> Oct 23 11:36:33 domU kernel:  bus_for_each_dev+0x78/0xc0
> >>>> Oct 23 11:36:33 domU kernel:  bus_add_driver+0x12b/0x1e0
> >>>> Oct 23 11:36:33 domU kernel:  driver_register+0x8b/0xe0
> >>>> Oct 23 11:36:33 domU kernel:  ? 0xffffffffc06b8000
> >>>> Oct 23 11:36:33 domU kernel:  i915_init+0x5d/0x70 [i915]
> >>>> Oct 23 11:36:33 domU kernel:  do_one_initcall+0x44/0x1d0
> >>>> Oct 23 11:36:33 domU kernel:  ? do_init_module+0x23/0x260
> >>>> Oct 23 11:36:33 domU kernel:  ? kmem_cache_alloc_trace+0xf5/0x200
> >>>> Oct 23 11:36:33 domU kernel:  do_init_module+0x5c/0x260
> >>>> Oct 23 11:36:33 domU kernel: __do_sys_finit_module+0xb1/0x110
> >>>> Oct 23 11:36:33 domU kernel:  do_syscall_64+0x33/0x80
> >>>> Oct 23 11:36:33 domU kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>> The call trace alone leaves open where exactly the crash occurred.
> >>> Looking at 5.17 I notice that the first thing the driver does
> >>> after mapping the range is to check the signature (both in
> >>> intel_opregion_setup()). As the signature can't possibly match
> >>> with no access granted to the underlying mappings, there shouldn't
> >>> be any further attempts to use the region in the driver; if there
> >>> are, I'd view this as a driver bug.
> >> Yes.  i915_driver_hw_probe does not check the return value of
> >> intel_opregion_setup(dev_priv) and just continues on.
> >>
> >> Chuck, the attached patch may help if you want to test it.
> >>
> >> Regards,
> >> Jason
> >
> > I tested the patch - it made no noticeable difference. I still
> > get the same crash and call trace with the patch. Actually,
> > the call trace I posted here is only the first of three call
> > traces, and I still see all three call traces with the patch.

Thanks for testing.  Sorry it didn't help.

> It is probably necessary to patch intel_opregion_setup to
> return from it with an error sooner if the goal is to suppress
> the call traces that occur when the driver cannot access
> the opregion.

It looks correct for 5.17 running in your domU.  I thought the
opregion signature check would fail.  A failure in
intel_opregion_setup would percolate up through i915_driver_hw_probe
to i915_driver_probe.  In i915_driver_probe the error should goto
out_cleanup_mmio and skip intel_modeset_init_nogem which is in your
backtrace.
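
Roughly the shape involved, as a sketch only - the unwind label
inside i915_driver_hw_probe is my assumption and may differ
between kernel versions; out_cleanup_mmio is the label described
above:

	/* in i915_driver_hw_probe(), with the patch applied */
	ret = intel_opregion_setup(dev_priv);
	if (ret)
		goto err_msi;	/* assumed unwind label */

	/* in i915_driver_probe() */
	ret = i915_driver_hw_probe(i915);
	if (ret < 0)
		goto out_cleanup_mmio;	/* skips intel_modeset_init_nogem() */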

Regards,
Jason


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-04-01 13:21       ` Chuck Zmudzinski
  2022-04-01 13:41         ` Chuck Zmudzinski
@ 2022-04-06  1:31         ` Chuck Zmudzinski
  2022-04-06 13:10           ` Jason Andryuk
  1 sibling, 1 reply; 34+ messages in thread
From: Chuck Zmudzinski @ 2022-04-06  1:31 UTC (permalink / raw)
  To: Jason Andryuk, Jan Beulich; +Cc: xen-devel



On 4/1/22 9:21 AM, Chuck Zmudzinski wrote:
> On 3/30/22 2:45 PM, Jason Andryuk wrote:
>> On Fri, Mar 18, 2022 at 4:13 AM Jan Beulich <jbeulich@suse.com> wrote:
>>> On 14.03.2022 04:41, Chuck Zmudzinski wrote:
>>>> When gfx_passthru is enabled for the Intel IGD, hvmloader maps the IGD
>>>> opregion to the guest but libxl does not grant the guest permission to
>>>> access the mapped memory region. This results in a crash of the 
>>>> i915.ko
>>>> kernel module in a Linux HVM guest when it needs to access the IGD
>>>> opregion:
>>>>
>>>> Oct 23 11:36:33 domU kernel: Call Trace:
>>>> Oct 23 11:36:33 domU kernel:  ? idr_alloc+0x39/0x70
>>>> Oct 23 11:36:33 domU kernel: drm_get_last_vbltimestamp+0xaa/0xc0 [drm]
>>>> Oct 23 11:36:33 domU kernel: drm_reset_vblank_timestamp+0x5b/0xd0 
>>>> [drm]
>>>> Oct 23 11:36:33 domU kernel:  drm_crtc_vblank_on+0x7b/0x130 [drm]
>>>> Oct 23 11:36:33 domU kernel: 
>>>> intel_modeset_setup_hw_state+0xbd4/0x1900 [i915]
>>>> Oct 23 11:36:33 domU kernel:  ? _cond_resched+0x16/0x40
>>>> Oct 23 11:36:33 domU kernel:  ? ww_mutex_lock+0x15/0x80
>>>> Oct 23 11:36:33 domU kernel: intel_modeset_init_nogem+0x867/0x1d30 
>>>> [i915]
>>>> Oct 23 11:36:33 domU kernel:  ? gen6_write32+0x4b/0x1c0 [i915]
>>>> Oct 23 11:36:33 domU kernel:  ? intel_irq_postinstall+0xb9/0x670 
>>>> [i915]
>>>> Oct 23 11:36:33 domU kernel:  i915_driver_probe+0x5c2/0xc90 [i915]
>>>> Oct 23 11:36:33 domU kernel:  ? 
>>>> vga_switcheroo_client_probe_defer+0x1f/0x40
>>>> Oct 23 11:36:33 domU kernel:  ? i915_pci_probe+0x3f/0x150 [i915]
>>>> Oct 23 11:36:33 domU kernel:  local_pci_probe+0x42/0x80
>>>> Oct 23 11:36:33 domU kernel:  ? _cond_resched+0x16/0x40
>>>> Oct 23 11:36:33 domU kernel:  pci_device_probe+0xfd/0x1b0
>>>> Oct 23 11:36:33 domU kernel:  really_probe+0x222/0x480
>>>> Oct 23 11:36:33 domU kernel:  driver_probe_device+0xe1/0x150
>>>> Oct 23 11:36:33 domU kernel:  device_driver_attach+0xa1/0xb0
>>>> Oct 23 11:36:33 domU kernel:  __driver_attach+0x8a/0x150
>>>> Oct 23 11:36:33 domU kernel:  ? device_driver_attach+0xb0/0xb0
>>>> Oct 23 11:36:33 domU kernel:  ? device_driver_attach+0xb0/0xb0
>>>> Oct 23 11:36:33 domU kernel:  bus_for_each_dev+0x78/0xc0
>>>> Oct 23 11:36:33 domU kernel:  bus_add_driver+0x12b/0x1e0
>>>> Oct 23 11:36:33 domU kernel:  driver_register+0x8b/0xe0
>>>> Oct 23 11:36:33 domU kernel:  ? 0xffffffffc06b8000
>>>> Oct 23 11:36:33 domU kernel:  i915_init+0x5d/0x70 [i915]
>>>> Oct 23 11:36:33 domU kernel:  do_one_initcall+0x44/0x1d0
>>>> Oct 23 11:36:33 domU kernel:  ? do_init_module+0x23/0x260
>>>> Oct 23 11:36:33 domU kernel:  ? kmem_cache_alloc_trace+0xf5/0x200
>>>> Oct 23 11:36:33 domU kernel:  do_init_module+0x5c/0x260
>>>> Oct 23 11:36:33 domU kernel: __do_sys_finit_module+0xb1/0x110
>>>> Oct 23 11:36:33 domU kernel:  do_syscall_64+0x33/0x80
>>>> Oct 23 11:36:33 domU kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> The call trace alone leaves open where exactly the crash occurred.
>>> Looking at 5.17 I notice that the first thing the driver does
>>> after mapping the range is to check the signature (both in
>>> intel_opregion_setup()). As the signature can't possibly match
>>> with no access granted to the underlying mappings, there shouldn't
>>> be any further attempts to use the region in the driver; if there
>>> are, I'd view this as a driver bug.
>> Yes.  i915_driver_hw_probe does not check the return value of
>> intel_opregion_setup(dev_priv) and just continues on.
>>
>> Chuck, the attached patch may help if you want to test it.
>>
>> Regards,
>> Jason
>
> I tested the patch - it made no noticeable difference.

Correction (sorry for the confusion):

I didn't know I needed to replace more than just a
re-built i915.ko module to enable the patch
for testing. When I updated the entire Debian kernel
package including all the modules and the kernel
image with the patched kernel package, it made
quite a difference.

With Jason's patch, the three call traces just became a
much shorter error message:

Apr 05 20:46:18 debian kernel: xen: --> pirq=16 -> irq=24 (gsi=24)
Apr 05 20:46:18 debian kernel: i915 0000:00:02.0: [drm] VT-d active for 
gfx access
Apr 05 20:46:18 debian kernel: i915 0000:00:02.0: vgaarb: deactivate vga 
console
Apr 05 20:46:18 debian kernel: Console: switching to colour dummy device 
80x25
Apr 05 20:46:18 debian kernel: i915 0000:00:02.0: [drm] DMAR active, 
disabling use of stolen memory
Apr 05 20:46:18 debian kernel: resource sanity check: requesting [mem 
0xffffffff-0x100001ffe], which spans more than Reserved [mem 
0xfdfff000-0xffffffff]
Apr 05 20:46:18 debian kernel: caller memremap+0xeb/0x1c0 mapping 
multiple BARs
Apr 05 20:46:18 debian kernel: i915 0000:00:02.0: Device initialization 
failed (-22)
Apr 05 20:46:18 debian kernel: i915 0000:00:02.0: Please file a bug on 
drm/i915; see 
https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs 
for details.
Apr 05 20:46:18 debian kernel: i915: probe of 0000:00:02.0 failed with 
error -22
--------------------- End of Kernel Error Log ----------------------

So I think the patch does propagate the error up the
stack and bails out before producing the Call traces,

and...

I even had output after booting - the gdm3 Gnome display
manager login page was displayed. But when I tried to log in
to the Gnome desktop, the screen went dark, and after that I
could not even log in to the headless Xen Dom0 control domain
via ssh, so I just used the reset button on the machine to
reboot it. The patch therefore causes some trouble with the
Dom0 when the guest cannot access the opregion. The patch
works fine when the guest can access the opregion - in that
case I was able to log in to the Gnome session - but when the
guest cannot access the opregion, trying to log in to the
Gnome session in the guest apparently crashed the Dom0, or
at least caused networking in the Dom0 to stop working. So
I would not recommend Jason's patch as is for the Linux
kernel. The main reason is that it looks like it is working
at first, with a login screen displayed, but when a user
tries to log in, the whole system crashes.

Regards,

Chuck


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-04-06  1:31         ` Chuck Zmudzinski
@ 2022-04-06 13:10           ` Jason Andryuk
  2022-04-06 15:24             ` Chuck Zmudzinski
  2022-04-06 17:39             ` Chuck Zmudzinski
  0 siblings, 2 replies; 34+ messages in thread
From: Jason Andryuk @ 2022-04-06 13:10 UTC (permalink / raw)
  To: Chuck Zmudzinski; +Cc: Jan Beulich, xen-devel

On Tue, Apr 5, 2022 at 9:31 PM Chuck Zmudzinski <brchuckz@netscape.net> wrote:
> Correction (sorry for the confusion):
>
> I didn't know I needed to replace more than just a
> re-built i915.ko module to enable the patch
> for testing. When I updated the entire Debian kernel
> package including all the modules and the kernel
> image with the patched kernel package, it made
> quite a difference.
>
> With Jason's patch, the three call traces just became a
> much shorter error message:
>
> Apr 05 20:46:18 debian kernel: xen: --> pirq=16 -> irq=24 (gsi=24)
> Apr 05 20:46:18 debian kernel: i915 0000:00:02.0: [drm] VT-d active for
> gfx access
> Apr 05 20:46:18 debian kernel: i915 0000:00:02.0: vgaarb: deactivate vga
> console
> Apr 05 20:46:18 debian kernel: Console: switching to colour dummy device
> 80x25
> Apr 05 20:46:18 debian kernel: i915 0000:00:02.0: [drm] DMAR active,
> disabling use of stolen memory
> Apr 05 20:46:18 debian kernel: resource sanity check: requesting [mem
> 0xffffffff-0x100001ffe], which spans more than Reserved [mem
> 0xfdfff000-0xffffffff]
> Apr 05 20:46:18 debian kernel: caller memremap+0xeb/0x1c0 mapping
> multiple BARs
> Apr 05 20:46:18 debian kernel: i915 0000:00:02.0: Device initialization
> failed (-22)
> Apr 05 20:46:18 debian kernel: i915 0000:00:02.0: Please file a bug on
> drm/i915; see
> https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs
> for details.
> Apr 05 20:46:18 debian kernel: i915: probe of 0000:00:02.0 failed with
> error -22
> --------------------- End of Kernel Error Log ----------------------
>
> So I think the patch does propagate the error up the
> stack and bails out before producing the Call traces,

Thanks for re-testing.

> and...
>
> I even had output after booting - the gdm3 Gnome display
> manager login page was displayed. But when I tried to log in
> to the Gnome desktop, the screen went dark, and after that I
> could not even log in to the headless Xen Dom0 control domain
> via ssh, so I just used the reset button on the machine to
> reboot it. The patch therefore causes some trouble with the
> Dom0 when the guest cannot access the opregion. The patch
> works fine when the guest can access the opregion - in that
> case I was able to log in to the Gnome session - but when the
> guest cannot access the opregion, trying to log in to the
> Gnome session in the guest apparently crashed the Dom0, or
> at least caused networking in the Dom0 to stop working. So
> I would not recommend Jason's patch as is for the Linux
> kernel. The main reason is that it looks like it is working
> at first, with a login screen displayed, but when a user
> tries to log in, the whole system crashes.

I'm a little surprised you still had output from the VM & display with
the i915 driver not binding.  I guess Linux fell back to another VGA
or Framebuffer driver for the display.

However, locking up the host isn't good.  You didn't happen to catch
any Xen or dom0 output when that happened?

Regards,
Jason


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-04-06 13:10           ` Jason Andryuk
@ 2022-04-06 15:24             ` Chuck Zmudzinski
  2022-04-06 17:39             ` Chuck Zmudzinski
  1 sibling, 0 replies; 34+ messages in thread
From: Chuck Zmudzinski @ 2022-04-06 15:24 UTC (permalink / raw)
  To: Jason Andryuk; +Cc: Jan Beulich, xen-devel

On 4/6/22 9:10 AM, Jason Andryuk wrote:
> On Tue, Apr 5, 2022 at 9:31 PM Chuck Zmudzinski <brchuckz@netscape.net> wrote:
>> Correction (sorry for the confusion):
>>
>> I didn't know I needed to replace more than just a
>> re-built i915.ko module to enable the patch
>> for testing. When I updated the entire Debian kernel
>> package including all the modules and the kernel
>> image with the patched kernel package, it made
>> quite a difference.
>>
>> With Jason's patch, the three call traces just became a
>> much shorter error message:
>>
>> Apr 05 20:46:18 debian kernel: xen: --> pirq=16 -> irq=24 (gsi=24)
>> Apr 05 20:46:18 debian kernel: i915 0000:00:02.0: [drm] VT-d active for
>> gfx access
>> Apr 05 20:46:18 debian kernel: i915 0000:00:02.0: vgaarb: deactivate vga
>> console
>> Apr 05 20:46:18 debian kernel: Console: switching to colour dummy device
>> 80x25
>> Apr 05 20:46:18 debian kernel: i915 0000:00:02.0: [drm] DMAR active,
>> disabling use of stolen memory
>> Apr 05 20:46:18 debian kernel: resource sanity check: requesting [mem
>> 0xffffffff-0x100001ffe], which spans more than Reserved [mem
>> 0xfdfff000-0xffffffff]
>> Apr 05 20:46:18 debian kernel: caller memremap+0xeb/0x1c0 mapping
>> multiple BARs
>> Apr 05 20:46:18 debian kernel: i915 0000:00:02.0: Device initialization
>> failed (-22)
>> Apr 05 20:46:18 debian kernel: i915 0000:00:02.0: Please file a bug on
>> drm/i915; see
>> https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs
>> for details.
>> Apr 05 20:46:18 debian kernel: i915: probe of 0000:00:02.0 failed with
>> error -22
>> --------------------- End of Kernel Error Log ----------------------
>>
>> So I think the patch does propagate the error up the
>> stack and bails out before producing the Call traces,
> Thanks for re-testing.
>
>> and...
>>
>> I even had output after booting - the gdm3 Gnome display
>> manager login page was displayed. But when I tried to log in
>> to the Gnome desktop, the screen went dark, and after that I
>> could not even log in to the headless Xen Dom0 control domain
>> via ssh, so I just used the reset button on the machine to
>> reboot it. The patch therefore causes some trouble with the
>> Dom0 when the guest cannot access the opregion. The patch
>> works fine when the guest can access the opregion - in that
>> case I was able to log in to the Gnome session - but when the
>> guest cannot access the opregion, trying to log in to the
>> Gnome session in the guest apparently crashed the Dom0, or
>> at least caused networking in the Dom0 to stop working. So
>> I would not recommend Jason's patch as is for the Linux
>> kernel. The main reason is that it looks like it is working
>> at first, with a login screen displayed, but when a user
>> tries to log in, the whole system crashes.
> I'm a little surprised you still had output from the VM & display with
> the i915 driver not binding.  I guess Linux fell back to another VGA
> or Framebuffer driver for the display.
>
> However, locking up the host isn't good.  You didn't happen to catch
> any Xen or dom0 output when that happened?
>
> Regards,
> Jason

I just looked at Dom0's systemd journal and it did not
capture anything. The six-minute gap between
Apr 05 20:46 and Apr 05 20:52, which is when I
rebooted Dom0 after the crash, is when bad things
happened:

Apr 05 20:46:01 Dom0 kernel: pciback 0000:00:1b.0: xen_pciback: vpci: 
assign to virtual slot 0
Apr 05 20:46:01 Dom0 kernel: pciback 0000:00:1b.0: registering for 18
Apr 05 20:46:01 Dom0 kernel: pciback 0000:00:14.0: xen_pciback: vpci: 
assign to virtual slot 1
Apr 05 20:46:01 Dom0 kernel: pciback 0000:00:14.0: registering for 18
Apr 05 20:46:01 Dom0 kernel: pciback 0000:00:02.0: xen_pciback: vpci: 
assign to virtual slot 2
Apr 05 20:46:01 Dom0 kernel: pciback 0000:00:02.0: registering for 18
Apr 05 20:46:01 Dom0 sudo[9639]: pam_unix(sudo:session): session closed 
for user root
Apr 05 20:46:13 Dom0 sshd[9541]: Received disconnect from <redacted> 
port 60294:11: disconnected by user
Apr 05 20:46:13 Dom0 sshd[9541]: Disconnected from user <redacted> 
<redacted> port 60294
Apr 05 20:46:13 Dom0 sshd[9521]: pam_unix(sshd:session): session closed 
for user <redacted>
Apr 05 20:46:13 Dom0 systemd-logind[497]: Session 27 logged out. Waiting 
for processes to exit.
Apr 05 20:46:17 Dom0 kernel: xen-blkback: backend/vbd/18/51712: using 4 
queues, protocol 1 (x86_64-abi) persistent grants
Apr 05 20:46:17 Dom0 kernel: xen-blkback: backend/vbd/18/51728: using 4 
queues, protocol 1 (x86_64-abi) persistent grants
Apr 05 20:46:17 Dom0 kernel: vif vif-18-0 vif18.0: Guest Rx ready
Apr 05 20:46:17 Dom0 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vif18.0: 
link becomes ready
Apr 05 20:46:19 Dom0 dhcpd[9852]: DHCPDISCOVER from <redacted> via vif18.0
Apr 05 20:46:19 Dom0 dhcpd[9852]: DHCPOFFER on <redacted> to <redacted> 
via vif18.0
Apr 05 20:46:19 Dom0 dhcpd[9852]: DHCPREQUEST for <redacted> 
(<redacted>) from <redacted> via vif18.0
Apr 05 20:46:19 Dom0 dhcpd[9852]: DHCPACK on <redacted> to <redacted> 
via vif18.0
Apr 05 20:52:34 Dom0 kernel: Linux version 5.16.0-6-amd64 
(debian-kernel@lists.debian.org) (gcc-11 (Debian 11.2.0-19) 11.2.0, GNU 
ld (GNU Binutils for Debian) 2.38) #1 SMP PREEMPT Debian 5.16.18-1 
(2022-03-29)
Apr 05 20:52:34 Dom0 kernel: Command line: placeholder 
root=/dev/mapper/systems-unstable ro reboot=bios quiet console=hvc0

I would probably need to connect Dom0 to a serial
console to capture something from Dom0 or Xen.
I have done that in the past using a serial cable
connected to a Windows 8 laptop with a USB-to-serial
adapter I have, but the last time I tried it the adapter
did not work - I think because of the laptop's upgrade
to Windows 10.

Regards,

Chuck


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-04-06 13:10           ` Jason Andryuk
  2022-04-06 15:24             ` Chuck Zmudzinski
@ 2022-04-06 17:39             ` Chuck Zmudzinski
  1 sibling, 0 replies; 34+ messages in thread
From: Chuck Zmudzinski @ 2022-04-06 17:39 UTC (permalink / raw)
  To: Jason Andryuk; +Cc: Jan Beulich, xen-devel

On 4/6/22 9:10 AM, Jason Andryuk wrote:
> On Tue, Apr 5, 2022 at 9:31 PM Chuck Zmudzinski <brchuckz@netscape.net> wrote:
>> Correction (sorry for the confusion):
>>
>> I didn't know I needed to replace more than just a
>> re-built i915.ko module to enable the patch
>> for testing. When I updated the entire Debian kernel
>> package including all the modules and the kernel
>> image with the patched kernel package, it made
>> quite a difference.
>>
>> With Jason's patch, the three call traces just became a
>> much shorter error message:
>>
>> Apr 05 20:46:18 debian kernel: xen: --> pirq=16 -> irq=24 (gsi=24)
>> Apr 05 20:46:18 debian kernel: i915 0000:00:02.0: [drm] VT-d active for
>> gfx access
>> Apr 05 20:46:18 debian kernel: i915 0000:00:02.0: vgaarb: deactivate vga
>> console
>> Apr 05 20:46:18 debian kernel: Console: switching to colour dummy device
>> 80x25
>> Apr 05 20:46:18 debian kernel: i915 0000:00:02.0: [drm] DMAR active,
>> disabling use of stolen memory
>> Apr 05 20:46:18 debian kernel: resource sanity check: requesting [mem
>> 0xffffffff-0x100001ffe], which spans more than Reserved [mem
>> 0xfdfff000-0xffffffff]
>> Apr 05 20:46:18 debian kernel: caller memremap+0xeb/0x1c0 mapping
>> multiple BARs
>> Apr 05 20:46:18 debian kernel: i915 0000:00:02.0: Device initialization
>> failed (-22)
>> Apr 05 20:46:18 debian kernel: i915 0000:00:02.0: Please file a bug on
>> drm/i915; see
>> https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs
>> for details.
>> Apr 05 20:46:18 debian kernel: i915: probe of 0000:00:02.0 failed with
>> error -22
>> --------------------- End of Kernel Error Log ----------------------
>>
>> So I think the patch does propagate the error up the
>> stack and bails out before producing the Call traces,
> Thanks for re-testing.
>
> I'm a little surprised you still had output from the VM & display with
> the i915 driver not binding.  I guess Linux fell back to another VGA
> or Framebuffer driver for the display.

By looking at journal entries in the guest, it is clear the Xorg
driver fell back from the kms modesetting driver to the vesa
driver, as shown by the following journal entries.

When the guest can access the opregion, gdm logs:

Apr 05 20:42:45 debian /usr/libexec/gdm-x-session[1226]: (II) modeset(0):
  Serial No: LX1AA0044210
Apr 05 20:42:45 debian /usr/libexec/gdm-x-session[1226]: (II) modeset(0)
: Monitor name: Acer H236HL

When the guest cannot access the opregion, gdm logs:

Apr 05 20:46:22 debian /usr/libexec/gdm-x-session[1164]: (II) VESA(0):
  Serial No: LX1AA0044210
Apr 05 20:46:22 debian /usr/libexec/gdm-x-session[1164]: (II) VESA(0):
  Monitor name: Acer H236HL

But as I said, when I tried to log in to a Gnome session,
the system hung, and no journal entries were captured
in either the Dom0 or the guest, so it is hard to tell what
happened. I think the full Gnome session, as opposed
to the gdm3 display manager, did not fall back to the Xorg
vesa driver; when it tried to use the Xorg modesetting
driver, it caused the system to hang, because the modesetting
driver uses KMS and probably tried to use the i915 module,
which was not initialized correctly due to the inability to
access the opregion.

I also noted in an earlier message in this thread that when
the guest cannot access the opregion, it overwrites the
register that contains the mapped opregion address (which
the Qemu device model provides for the guest) with the
invalid value of 0xffffffff.

When the Gnome session manager started the session, it
apparently caused the i915 module to try to access the
opregion at the invalid address 0xffffffff and thus caused
the system to hang, as shown in the journal entry I posted
yesterday:

Apr 05 20:46:18 debian kernel: resource sanity check: requesting
[mem 0xffffffff-0x100001ffe], which spans more than Reserved
[mem 0xfdfff000-0xffffffff]

This is a request by the guest for 2 pages, which is the
size of the opregion, but it uses the invalid address
0xffffffff as the opregion base. So although this resource
sanity check failed, the system still hung later on when the
user tried to log in to the Gnome session.
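
(As a quick check on the size, assuming the usual 4 KiB pages:
0x100001ffe - 0xffffffff + 1 = 0x2000 bytes = 2 * 0x1000, i.e.
exactly two pages starting at the bogus base address 0xffffffff.)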

Regards,

Chuck


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-03-30 17:15   ` Anthony PERARD
                       ` (3 preceding siblings ...)
  2022-03-31 23:23     ` Chuck Zmudzinski
@ 2022-06-04  1:10     ` Chuck Zmudzinski
  2022-06-06 18:47       ` Chuck Zmudzinski
  4 siblings, 1 reply; 34+ messages in thread
From: Chuck Zmudzinski @ 2022-06-04  1:10 UTC (permalink / raw)
  To: Anthony PERARD, Jan Beulich; +Cc: xen-devel, Wei Liu, Juergen Gross

On 3/30/22 1:15 PM, Anthony PERARD wrote:
> Hi Chuck,
>
> On Sun, Mar 13, 2022 at 11:41:37PM -0400, Chuck Zmudzinski wrote:
>> When gfx_passthru is enabled for the Intel IGD, hvmloader maps the IGD
>> opregion to the guest but libxl does not grant the guest permission to
> I'm not reading the same thing when looking at code in hvmloader. It
> seems that hvmloader allocates some memory for the IGD opregion rather
> than mapping it.
>
>
> tools/firmware/hvmloader/pci.c:184
>      if ( vendor_id == 0x8086 )
>      {
>          igd_opregion_pgbase = mem_hole_alloc(IGD_OPREGION_PAGES);
>          /*
>          * Write the OpRegion offset to give the opregion
>           * address to the device model. The device model will trap
>           * and map the OpRegion at the given address.
>           */
>          pci_writel(vga_devfn, PCI_INTEL_OPREGION,
>                     igd_opregion_pgbase << PAGE_SHIFT);
>      }
>
> I think this write would go through QEMU; does it do something with it?
> (I kind of replying to this question at the end of the mail.)
>
> Is this code in hvmloader actually run in your case?
>
>> Currently, because of another bug in Qemu upstream, this crash can
>> only be reproduced using the traditional Qemu device model, and of
> qemu-traditional is a bit old... What is the bug in QEMU? Do you have a
> link to a patch/mail?

I finally found a patch for the other bug in Qemu upstream. The
patch is currently being used in QubesOS, and they first added
it to their version of Qemu way back in 2017:

https://github.com/QubesOS/qubes-vmm-xen-stubdom-linux/pull/3/commits/ab2b4c2ad02827a73c52ba561e9a921cc4bb227c

Although this patch is advertised as applying to the device model
running in a Linux stub domain, it is also needed (at least on my
system) with the device model running in Dom0.

Here is the story:

The patch is titled "qemu: fix wrong mask in pci capability registers 
handling"

There is scant information in the commit message about the nature of
the problem, but I discovered the following in my testing:

On my Intel Haswell system configured for PCI passthrough to the
Xen HVM guest, Qemu does indeed incorrectly set the emulated
register because it uses the wrong mask that disables instead of
enables the PCI_STATUS_CAP_LIST bit of the PCI_STATUS register.

This disabled the MSI-x capability of two of the three PCI devices
passed through to the Xen HVM guest. The problem only
manifests in a bad way in a Linux guest, not in a Windows guest.

One possible reason that only Linux guests are affected is that
I discovered in the Xen xl-dmesg verbose logs that Windows and
Linux use different callbacks for interrupts:

(XEN) Dom1 callback via changed to GSI 28
...
(XEN) Dom3 callback via changed to Direct Vector 0xf3

Dom1 is a Windows Xen HVM and Dom3 is a Linux HVM

Apparently the Direct Vector callback that Linux uses requires
MSI or MSI-x capability of the passed through devices, but the
wrong mask in Qemu disables that capability.

After applying the QubesOS patch to Qemu upstream, the
PCI_STATUS_CAP_LIST bit is set correctly for the guest and
PCI and Intel IGD passthrough works normally because the
Linux guest can make use of the MSI-x capability of the
PCI devices.
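
As an illustration of the failure mode - not the actual Qemu
code, just a sketch of the usual mask-merge pattern for emulated
config space registers:

    /* guest-visible value: emulated bits where emu_mask is set,
       host bits everywhere else */
    uint16_t merge_status(uint16_t host_status, uint16_t emu_val,
                          uint16_t emu_mask)
    {
        return (host_status & ~emu_mask) | (emu_val & emu_mask);
    }

If emu_mask wrongly includes PCI_STATUS_CAP_LIST (bit 4, 0x0010)
while emu_val has that bit clear, the guest reads PCI_STATUS with
the capabilities-list bit forced to 0 and never walks the
capability chain, so MSI and MSI-x are never discovered.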

The problem was discovered almost five years ago. I don't
know why the fix has not been committed to Qemu
upstream yet.

After this, I was able to discover that having libxl
explicitly grant permission for access to the Intel OpRegion
is only needed for the old traditional device model, because
the Xen hypercall to gain permission is included in upstream
Qemu but omitted from the old traditional device model.

So this patch is not needed for users of the upstream device
model who also add the QubesOS patch to fix the
PCI_STATUS_CAP_LIST bit in the PCI_STATUS register.

This all assumes the device model is running in Dom0. The
permission for access to the Intel OpRegion might still be
needed if the upstream device model is running in a stub
domain.

There are other problems in addition to this problem of access
to the Intel OpRegion that are discussed here:

https://www.qubes-os.org/news/2017/10/18/msi-support/

As old as that post is, the feature of allowing PCI and VGA
passthrough to HVM domains is still not well supported,
especially for the case when the device model runs in a
stub domain.

Since my proposed patch only applies to the very insecure
case of the old traditional device model running in Dom0,
I will not pursue it further.

I will look for this feature in future versions of Xen. Currently,
Xen 4.16 advertises support for Linux-based stub domains
as "Tech Preview." So future versions of Xen might handle
this problem in libxl or perhaps in some other way, and also
hopefully the patch to Qemu to fix the PCI capabilities mask
can be committed to Qemu upstream soon so this feature
of Intel IGD passthrough can at least work with Linux
guests and the upstream Qemu running in Dom0.

Regards,

Chuck

>
>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
>> index 4bbbfe9f16..a4fc473de9 100644
>> --- a/tools/libs/light/libxl_pci.c
>> +++ b/tools/libs/light/libxl_pci.c
>> @@ -2531,6 +2572,37 @@ int libxl__grant_vga_iomem_permission(libxl__gc *gc, const uint32_t domid,
>>                     domid, vga_iomem_start, (vga_iomem_start + 0x20 - 1));
>>               return ret;
>>           }
>> +
>> +        /* If this is an Intel IGD, allow access to the IGD opregion */
>> +        if (!libxl__is_igd_vga_passthru(gc, d_config)) return 0;
>> +
>> +        uint32_t igd_opregion = sysfs_dev_get_igd_opregion(gc, pci);
>> +        uint32_t error = 0xffffffff;
>> +        if (igd_opregion == error) break;
>> +
>> +        vga_iomem_start = ( (uint64_t) igd_opregion ) >> XC_PAGE_SHIFT;
>> +        ret = xc_domain_iomem_permission(CTX->xch, stubdom_domid,
>> +                                         vga_iomem_start,
>> +                                         IGD_OPREGION_PAGES, 1);
>> +        if (ret < 0) {
>> +            LOGED(ERROR, domid,
>> +                  "failed to give stubdom%d access to iomem range "
>> +                  "%"PRIx64"-%"PRIx64" for IGD passthru",
>> +                  stubdom_domid, vga_iomem_start, (vga_iomem_start +
>> +                                                IGD_OPREGION_PAGES - 1));
>> +            return ret;
>> +        }
>> +        ret = xc_domain_iomem_permission(CTX->xch, domid,
>> +                                         vga_iomem_start,
>> +                                         IGD_OPREGION_PAGES, 1);
> Here, you seem to add permission to an address that is read from the
> pci config space of the device, but as I've pointed out above, hvmloader
> seems to overwrite this address. Is this call to
> xc_domain_iomem_permission() actually doing anything useful?
> Or is it by luck that the address returned by
> sysfs_dev_get_igd_opregion() happened to be the address that hvmloader
> is going to write?
>
> Or maybe hvmloader doesn't actually do anything?
>
>
> Some more thought on that: looking at QEMU, it seems that there's already
> a call to xc_domain_iomem_permission(), in igd_write_opregion(). So
> adding one in libxl would seem redundant - or maybe it's the one for the
> device model's domain (named 'stubdom_domid' here) that's needed?
>
> Thanks,
>



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-06-04  1:10     ` Chuck Zmudzinski
@ 2022-06-06 18:47       ` Chuck Zmudzinski
  0 siblings, 0 replies; 34+ messages in thread
From: Chuck Zmudzinski @ 2022-06-06 18:47 UTC (permalink / raw)
  To: Anthony PERARD, Jan Beulich; +Cc: xen-devel, Wei Liu, Juergen Gross

On 6/3/22 9:10 PM, Chuck Zmudzinski wrote:
> On 3/30/22 1:15 PM, Anthony PERARD wrote:
>> Hi Chuck,
>>
>> On Sun, Mar 13, 2022 at 11:41:37PM -0400, Chuck Zmudzinski wrote:
>>> When gfx_passthru is enabled for the Intel IGD, hvmloader maps the IGD
>>> opregion to the guest but libxl does not grant the guest permission to
>> I'm not reading the same thing when looking at code in hvmloader. It
>> seems that hvmloader allocates some memory for the IGD opregion rather
>> than mapping it.
>>
>>
>> tools/firmware/hvmloader/pci.c:184
>>      if ( vendor_id == 0x8086 )
>>      {
>>          igd_opregion_pgbase = mem_hole_alloc(IGD_OPREGION_PAGES);
>>          /*
>>          * Write the OpRegion offset to give the opregion
>>           * address to the device model. The device model will trap
>>           * and map the OpRegion at the given address.
>>           */
>>          pci_writel(vga_devfn, PCI_INTEL_OPREGION,
>>                     igd_opregion_pgbase << PAGE_SHIFT);
>>      }
>>
>> I think this write would go through QEMU; does it do something with it?
>> (I kind of replying to this question at the end of the mail.)
>>
>> Is this code in hvmloader actually run in your case?
>>
>>> Currently, because of another bug in Qemu upstream, this crash can
>>> only be reproduced using the traditional Qemu device model, and of
>> qemu-traditional is a bit old... What is the bug in QEMU? Do you have a
>> link to a patch/mail?

I finally found a patch that fixes the upstream bug on my system
but I am not sure it is the best fix. It is a patch that QubesOS has
been using since 2017.

I just opened an issue titled "Incorrect register mask for PCI
passthrough on Xen" with Qemu upstream about this problem
which gives all the details:

https://gitlab.com/qemu-project/qemu/-/issues/1061

When you get a chance, can you take a look at it?

Not being an official Xen or Qemu developer, I would appreciate
any suggestions you might have for me before I formally submit
a patch to Qemu upstream. Please reply here or on the Qemu issue
tracker.

Best Regards,

Chuck

>
> I finally found a patch for the other bug in Qemu upstream. The
> patch is currently being used in QubesOS, and they first added
> it to their version of Qemu way back in 2017:
>
> https://github.com/QubesOS/qubes-vmm-xen-stubdom-linux/pull/3/commits/ab2b4c2ad02827a73c52ba561e9a921cc4bb227c 
>
>
> Although this patch is advertised as applying to the device model
> running in a Linux stub domain, it is also needed (at least on my
> system) with the device model running in Dom0.
>
> Here is the story:
>
> The patch is titled "qemu: fix wrong mask in pci capability registers 
> handling"
>
> There is scant information in the commit message about the nature of
> the problem, but I discovered the following in my testing:
>
> On my Intel Haswell system configured for PCI passthrough to the
> Xen HVM guest, Qemu does indeed incorrectly set the emulated
> register because it uses the wrong mask that disables instead of
> enables the PCI_STATUS_CAP_LIST bit of the PCI_STATUS register.
>
> This disabled the MSI-x capability of two of the three PCI devices
> passed through to the Xen HVM guest. The problem only
> manifests in a bad way in a Linux guest, not in a Windows guest.
>
> One possible reason that only Linux guests are affected is that
> I discovered in the Xen xl-dmesg verbose logs that Windows and
> Linux use different callbacks for interrupts:
>
> (XEN) Dom1 callback via changed to GSI 28
> ...
> (XEN) Dom3 callback via changed to Direct Vector 0xf3
>
> Dom1 is a Windows Xen HVM and Dom3 is a Linux HVM
>
> Apparently the Direct Vector callback that Linux uses requires
> MSI or MSI-x capability of the passed through devices, but the
> wrong mask in Qemu disables that capability.
>
> After applying the QubesOS patch to Qemu upstream, the
> PCI_STATUS_CAP_LIST bit is set correctly for the guest and
> PCI and Intel IGD passthrough works normally because the
> Linux guest can make use of the MSI-x capability of the
> PCI devices.
>
> The problem was discovered almost five years ago. I don't
> know why the fix has not been committed to Qemu
> upstream yet.
>
> After this, I was able to discover that having libxl
> explicitly grant permission for access to the Intel OpRegion
> is only needed for the old traditional device model, because
> the Xen hypercall to gain permission is included in upstream
> Qemu but omitted from the old traditional device model.
>
> So this patch is not needed for users of the upstream device
> model who also add the QubesOS patch to fix the
> PCI_STATUS_CAP_LIST bit in the PCI_STATUS register.
>
> This all assumes the device model is running in Dom0. The
> permission for access to the Intel OpRegion might still be
> needed if the upstream device model is running in a stub
> domain.
>
> There are other problems in addition to this problem of access
> to the Intel OpRegion that are discussed here:
>
> https://www.qubes-os.org/news/2017/10/18/msi-support/
>
> As old as that post is, the feature of allowing PCI and VGA
> passthrough to HVM domains is still not well supported,
> especially for the case when the device model runs in a
> stub domain.
>
> Since my proposed patch only applies to the very insecure
> case of the old traditional device model running in Dom0,
> I will not pursue it further.
>
> I will look for this feature in future versions of Xen. Currently,
> Xen 4.16 advertises support for Linux-based stub domains
> as "Tech Preview." So future versions of Xen might handle
> this problem in libxl or perhaps in some other way, and also
> hopefully the patch to Qemu to fix the PCI capabilities mask
> can be committed to Qemu upstream soon so this feature
> of Intel IGD passthrough can at least work with Linux
> guests and the upstream Qemu running in Dom0.
>
> Regards,
>
> Chuck


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-03-31 19:44               ` Chuck Zmudzinski
@ 2022-06-17 13:46                 ` Anthony PERARD
  2022-06-17 20:39                   ` Chuck Zmudzinski
  0 siblings, 1 reply; 34+ messages in thread
From: Anthony PERARD @ 2022-06-17 13:46 UTC (permalink / raw)
  To: Chuck Zmudzinski
  Cc: Jason Andryuk, Andrew Cooper, xen-devel, Wei Liu, Juergen Gross,
	Jan Beulich

On Thu, Mar 31, 2022 at 03:44:33PM -0400, Chuck Zmudzinski wrote:
> On 3/31/22 10:19 AM, Jason Andryuk wrote:
> > On Thu, Mar 31, 2022 at 10:05 AM Chuck Zmudzinski <brchuckz@netscape.net> wrote:
> > > 
> > > That still doesn't answer my question - will the Qemu upstream
> > > accept the patches that move the hypercalls to the toolstack? If
> > > not, we have to live with what we have now, which is that the
> > > hypercalls are done in Qemu.
> > Xen-associated people maintain hw/xen code in QEMU, so yes it could be accepted.
> > 
> > Maybe it would need to be backwards compatible to have libxl check the
> > QEMU version to decide who makes the hypercall?  Unless it is broken
> > today, in which case just make it work.
> > 
> > Regards,
> > Jason
> 
> I know of another reason to check the Qemu upstream version,
> and that is dealing with deprecated / removed device model
> options that xl.cfg still uses. I looked at that a few years ago
> with regard to the deprecated 'usbdevice tablet' Qemu option,
> but I did not see anything in libxl to distinguish Qemu versions
> except for upstream vs. traditional. AFAICT, detecting traditional
> vs. upstream Qemu depends solely on the device_model_version
> xl.cfg setting. So it might be useful for libxl to add the capability
> to detect the upstream Qemu version or at least create an xl.cfg
> setting to inform libxl about the Qemu version.

Hi,

There's already some code to deal with QEMU's version (QEMU = upstream
Qemu) in libxl, but this code is dealing with an already running QEMU.
When libxl interacts with QEMU through QMP to set up some more devices,
QEMU also advertises its version, which we can use on follow-up qmp
commands.

I think adding passthrough pci devices is all done via QMP, so we can
potentially move a hypercall from QEMU to libxl, and tell libxl that,
depending on which version of QEMU it is talking to, it does or does
not need to do the hypercall. Also, we could probably add a parameter
so that QEMU knows whether it has to do the hypercall or not, and thus
newer versions of QEMU could also deal with older versions of libxl.

(There may be some examples like that in both QEMU and libxl, in the
live migration code; dm_state_save_to_fdset() in libxl is a pointer.)
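
A rough sketch of the idea - the version-check helper and the QMP
parameter name below are entirely hypothetical, and rc, domid and
opregion_pfn would come from the surrounding code; only
xc_domain_iomem_permission() is real:

    /* in libxl, while setting up a passthrough PCI device over QMP */
    if (qemu_version_at_least(qmp, 8, 0, 0)) {       /* hypothetical */
        /* tell newer QEMUs not to do the hypercall themselves */
        qmp_set_flag(qmp, "igd-opregion-permission", false); /* hypothetical */
        rc = xc_domain_iomem_permission(CTX->xch, domid, opregion_pfn,
                                        IGD_OPREGION_PAGES, 1);
    } /* else: an older QEMU keeps doing the hypercall itself */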

Cheers,

-- 
Anthony PERARD


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
  2022-06-17 13:46                 ` Anthony PERARD
@ 2022-06-17 20:39                   ` Chuck Zmudzinski
  0 siblings, 0 replies; 34+ messages in thread
From: Chuck Zmudzinski @ 2022-06-17 20:39 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: Jason Andryuk, Andrew Cooper, xen-devel, Wei Liu, Juergen Gross,
	Jan Beulich

On 6/17/22 9:46 AM, Anthony PERARD wrote:
> On Thu, Mar 31, 2022 at 03:44:33PM -0400, Chuck Zmudzinski wrote:
>> On 3/31/22 10:19 AM, Jason Andryuk wrote:
>>> On Thu, Mar 31, 2022 at 10:05 AM Chuck Zmudzinski <brchuckz@netscape.net> wrote:
>>>> That still doesn't answer my question - will the Qemu upstream
>>>> accept the patches that move the hypercalls to the toolstack? If
>>>> not, we have to live with what we have now, which is that the
>>>> hypercalls are done in Qemu.
>>> [...]
>> I know of another reason to check the Qemu upstream version,
>> and that is dealing with deprecated / removed device model
>> options that xl.cfg still uses. I looked at that a few years ago
>> with regard to the deprecated 'usbdevice tablet' Qemu option,
>> but I did not see anything in libxl to distinguish Qemu versions
>> except for upstream vs. traditional. AFAICT, detecting traditional
>> vs. upstream Qemu depends solely on the device_model_version
>> xl.cfg setting. So it might be useful for libxl to add the capability
>> to detect the upstream Qemu version or at least create an xl.cfg
>> setting to inform libxl about the Qemu version.
> Hi,
>
> There's already some code to deal with QEMU's version (QEMU = upstream
> Qemu) in libxl, but this code is dealing with an already running QEMU.
> When libxl interacts with QEMU through QMP to set up some more devices,
> QEMU also advertises its version, which we can use on follow-up qmp
> commands.
>
> I think adding passthrough pci devices is all done via QMP, so we can
> potentially move a hypercall from QEMU to libxl, and tell libxl that,
> depending on which version of QEMU it is talking to, it does or does
> not need to do the hypercall. Also, we could probably add a parameter
> so that QEMU knows whether it has to do the hypercall or not, and thus
> newer versions of QEMU could also deal with older versions of libxl.
>
> (There may be some examples like that in both QEMU and libxl, in the
> live migration code; dm_state_save_to_fdset() in libxl is a pointer.)
>
> Cheers,
>

Hi Anthony,

Thanks for this information - it is useful because I plan to work on
improved Xen / Qemu interactions to better support features
such as running the device model in a stub domain, in which case
it is probably better to do some hypercalls in libxl or maybe in
hvmloader that are currently done in Qemu.

I also would like to see Xen HVM using Q35 instead of i440fx
emulation, which also requires improved Xen / Qemu interactions.
I know of the patch set for Q35 emulation that was proposed back
in 2018, but I have not tried anything like that yet. That looks like
a complex problem. Has there been any more recent work in that
area? I couldn't find much recent work on that by searching the
Internet. I have quite a bit to learn before I can make contributions
to support Q35 as a replacement for i440fx, but I plan to slowly
work on it. Any suggestions anyone has are welcome.

Best Regards,

Chuck


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
       [not found] <b11b0b13-4f10-7d77-d02d-bb9a22bce887@netscape.net>
@ 2022-03-15 17:25 ` Chuck Zmudzinski
  0 siblings, 0 replies; 34+ messages in thread
From: Chuck Zmudzinski @ 2022-03-15 17:25 UTC (permalink / raw)
  To: xen-devel

On 3/15/2022 7:38 AM, Jan Beulich wrote:
> On 14.03.2022 04:41, Chuck Zmudzinski wrote:
>> Fixes: abfb006f1ff4 (tools/libxl: explicitly grant access to needed I/O-memory ranges)
>> Fixes: 0561e1f01e87 (xen/common: do not implicitly permit access to mapped I/O memory)
>> Backport: 4.12+
> Just fyi: This is fine to have as a tag, but it wouldn't be backported
> farther than to 4.15.

That's entirely reasonable. I understand this is a bug fix, not a
security issue.

> Apart from this largely some style issues (see below), but please
> realize that I'm not a libxl maintainer and hence I may not have good
> enough knowledge of, in particular, potential unwritten conventions.

I will take your comments into consideration regarding style before
writing the next version of the patch, and carefully check libxl's
coding style file.

>>
>> @@ -610,6 +612,45 @@
>>  out:
>>      return ret;
>>  }
>> +
>> +static uint32_t sysfs_dev_get_igd_opregion(libxl__gc *gc,
>> +                                           libxl_device_pci *pci)
>> +{
>> +    char *pci_device_config_path =
>> +        GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/config",
>> +                  pci->domain, pci->bus, pci->dev, pci->func);
>> +    size_t read_items;
>> +    uint32_t igd_opregion;
>> +    uint32_t error = 0xffffffff;
> I think this constant wants to gain a #define, to be able to correlate
> the use sites. I'm also not sure of the value - in principle the
> register can hold this value, but of course then it won't be 3 pages.

What we are storing as the return value is the starting address,
not the size, of the opregion, and that is a 32-bit value. If we
cannot read it, we return 0xffffffff instead to indicate the error
that we were unable to read the starting address of the opregion
from the config attribute in sysfs. The 32-bit value we are looking for
is read at offset 0xfc from the start of the config attribute of the
Intel IGD in sysfs. The offset 0xfc is defined by PCI_INTEL_OPREGION
both here and in hvmloader (and also in Qemu). The data that is
read at this offset from the start of the config attribute of the Intel
IGD in sysfs is the 32-bit address of the start of the opregion.

> Maybe the error check further down should be to see whether adding 2
> to the value would overflow in 32 bits? (In that case a #define may
> not be needed anymore, as there wouldn't be multiple instances of the
> constant in the code.)

That would work also. Please note that I chose that value for the error
value to be consistent with the way the existing sysfs_dev_get_vendor
function does it. That function does not assign the variable 'error' to
its return value for an error, which in its case is 0xffff because it
returns uint16_t instead of uint32_t; I chose to explicitly assign the
error variable to that value to help make the code more readable. Your
comment that this could be a #define instead is good. I also think we
should use a #define for the error return value of the
sysfs_dev_get_vendor function. Something like:

#define ERROR_16    0xffff
#define ERROR_32    0xffffffff

might be appropriate. But that would be touching code unrelated to
this bug fix. I think again the libxl maintainers should weigh in about
what to do here. They might let me take this opportunity to update
and improve the style of the patched file in other functions in the
file not related to this bug fix but I am not inclined to do that without
an explicit request from them to do so. So I am not sure yet what I will
do in the next version of the patch, but I will address your concerns here
and try to explain my reasoning for the changes in the changelog for
version 2 of the patch.
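
For version 2, the helper would then look roughly like this (a
minimal sketch only: INVALID_OPREGION is a hypothetical name for
the new #define, and the GCSPRINTF/SYSFS_PCI_DEV/PCI_BDF pieces
are the ones already used in the patch):

#define PCI_INTEL_OPREGION 0xfc         /* config space offset */
#define INVALID_OPREGION   0xffffffffu  /* hypothetical sentinel */

static uint32_t sysfs_dev_get_igd_opregion(libxl__gc *gc,
                                           libxl_device_pci *pci)
{
    char *path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/config",
                           pci->domain, pci->bus, pci->dev, pci->func);
    uint32_t igd_opregion = INVALID_OPREGION;
    FILE *f = fopen(path, "r");

    if (!f)
        return INVALID_OPREGION;
    /* the 32-bit OpRegion address lives at offset 0xfc */
    if (fseek(f, PCI_INTEL_OPREGION, SEEK_SET) == 0)
        (void)fread(&igd_opregion, sizeof(igd_opregion), 1, f);
    fclose(f);
    return igd_opregion;
}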

>> +
>> +    FILE *f = fopen(pci_device_config_path, "r");
>> +    if (!f) {
> While libxl has some special style rules, I think it still wants a
> blank line between declaration(s) and statement(s), just like we
> expect elsewhere. Effectively you want to simply move the blank line
> you have one line down. I can double check in libxl's coding style file.

I think I followed the same style here as the existing sysfs_dev_get_xxx
functions. I will double check that and use the same style the other
functions use unless they clearly violate the rules. If I deviate from
the style of the existing functions to conform to the current coding
style, I will note that and suggest a subsequent patch to update the
style of the other functions.

>> @@ -2531,6 +2572,37 @@ int libxl__grant_vga_iomem_permission(libxl__gc *gc, const uint32_t domid,
>>                     domid, vga_iomem_start, (vga_iomem_start + 0x20 - 1));
>>               return ret;
>>           }
>> +
>> +        /* If this is an Intel IGD, allow access to the IGD opregion */
>> +        if (!libxl__is_igd_vga_passthru(gc, d_config)) return 0;
> Despite the provision for "return" or alike to go on the same line
> as an error code check, I don't think this is okay here. It would be
> if, as iirc generally expected in libxl, you latched the function
> return value into a local variable named "rc" (I think).

I will double check how the function being patched handles the return
value. I don't even remember if it has a local variable named rc for a 
return
value. IIRC it was either ret or 0. I understand that libxl expects rc to be
used these days, though. This might be another candidate for updating the
file to libxl's current standards.

>> +        uint32_t igd_opregion = sysfs_dev_get_igd_opregion(gc, pci);
>> +        uint32_t error = 0xffffffff;
> Please don't mix declarations and statements.

I presume you are saying these two lines should be re-written as:

uint32_t igd_opregion;
uint32_t error;

igd_opregion = sysfs_dev_get_igd_opregion(gc, pci);
error = 0xffffffff;

Please reply if my understanding here is not correct.

> I also don't think
> "error" is really necessary as a local variable, but with the change
> suggested above it might disappear anyway.

I do plan for the next version of the patch to use a #define for this
instead of the error variable (or to use the add-2 overflow check you
suggested), so the variable will disappear in the next version.
>> +        if (igd_opregion == error) break;
> Like above I'm not sure this is okay to all live on one line. I also
> think it would be nice if you used "return 0" or "break" consistently.
> Of course a related question is whether failure here should actually
> be reported to the caller.

Good points here. I agree about consistency with break and return 0.
I will change this to return 0 and move it to the next line. I do not
want to change the current meaning of the return value
without knowledge of how the caller uses the return value.
IIRC, currently the function always returns 0 unless it encounters a 
negative
return value from xc_domain_iomem_permission, in which case it returns
that negative value to indicate an error to the caller. So if we return 
anything
other than 0 here, we might be returning an error code that the caller does
not expect or interpret correctly. I will also consider putting an error 
message
here before returning 0. A message something like "dom%d: Intel IGD
detected, but could not find IGD opregion" would explain the error that
happens here. I don't think a non-zero error code to the caller is
appropriate here, though, because, as already mentioned, IIRC this might
return a value the caller does not interpret correctly. If it is 
necessary to
return an error to the caller here instead of 0, it will be necessary to 
ensure
all callers of this function will interpret it correctly. I would 
suggest an error
return value greater than 0 to distinguish it from the return value < 0 
which
indicates an error from xc_domain_iomem_permission, but I hope libxl
maintainers will accept a return value of 0 here, at least for this 
patch. A later
patch could re-work the return value of this function which would also 
probably
require touching the caller(s) of this function to properly respond to this
particular error which is different from an error from 
xc_domain_iomem_permission.
In any case, I will double-check to see if my current understanding of the
meaning of the return value is correct before writing the next version 
of the
patch. For now, I will use return 0 instead of break here and move it
to the next
line, unless I hear otherwise from the libxl maintainers.

>> +        vga_iomem_start = ( (uint64_t) igd_opregion ) >> XC_PAGE_SHIFT;
> There's no need for a cast here, as you're right-shifting. Also
> (just fyi) there would have been three too many spaces here. I'm
> additionally not certain whether re-using a variable for a purpose
> not matching its name is deemed acceptable by libxl maintainers.

I wrote it that way expecting a compiler error if I didn't do the cast.
I have not checked if the cast is necessary, though, and maybe you
are right. I will check whether it is necessary by removing the cast
and seeing if the compiler complains.

If the cast is not needed, I will just use the 32-bit igd_opregion variable
when calling xc_domain_iomem_permission instead of the 64-bit
vga_iomem_start variable. I will remove the three spaces and use a
more descriptive variable instead of re-using vga_iomem_start if the
compiler insists on the cast from 32-bit to 64-bit.
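
That is, just (a sketch, assuming vga_iomem_start stays uint64_t):

    vga_iomem_start = igd_opregion >> XC_PAGE_SHIFT;

The shift happens in 32 bits and the result is widened on
assignment; since a right shift only makes the value smaller,
no cast is needed.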

>> +        ret = xc_domain_iomem_permission(CTX->xch, stubdom_domid,
>> +                                         vga_iomem_start,
>> +                                         IGD_OPREGION_PAGES, 1);
>> +        if (ret < 0) {
>> +            LOGED(ERROR, domid,
>> +                  "failed to give stubdom%d access to iomem range "
>> +                  "%"PRIx64"-%"PRIx64" for IGD passthru",
>> +                  stubdom_domid, vga_iomem_start, (vga_iomem_start +
>> +                                                IGD_OPREGION_PAGES - 1));
>> +            return ret;
>> +        }
> I have to admit that I find it odd that this is done unconditionally,
> but I notice the same is done in pre-existing code. I would have
> expected this to happen only when there actually is a device model
> stub domain.

I don't understand how that works either. All my tests have been with
the device model running as a process in dom0. I am thinking maybe
in that case it just uses dom0 for the stub domain, but I have not checked
that. I will check it by dumping the value of stubdom_domid to a log in my
next test.

Thank you for responding promptly. Now I have some work to do writing
the next version of the patch and documenting it clearly in its changelog.
It will take me a while - I will spend enough time on it so hopefully the
libxl maintainers don't have to spend so much time on it.

Chuck


^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2022-06-17 20:39 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <b62fbc602a629941c1acaad3b93d250a3eba33c0.1647222184.git.brchuckz.ref@netscape.net>
2022-03-14  3:41 ` [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion Chuck Zmudzinski
2022-03-15 11:38   ` Jan Beulich
2022-03-15 18:31     ` Chuck Zmudzinski
2022-03-16  1:27     ` Chuck Zmudzinski
2022-03-16 12:18       ` Chuck Zmudzinski
2022-03-28 21:31     ` Chuck Zmudzinski
2022-03-18  8:13   ` Jan Beulich
2022-03-30 18:45     ` Jason Andryuk
2022-03-31  4:34       ` Chuck Zmudzinski
2022-03-31 12:23         ` Jason Andryuk
2022-03-31 13:57           ` Chuck Zmudzinski
2022-04-01 13:21       ` Chuck Zmudzinski
2022-04-01 13:41         ` Chuck Zmudzinski
2022-04-01 13:49           ` Jason Andryuk
2022-04-06  1:31         ` Chuck Zmudzinski
2022-04-06 13:10           ` Jason Andryuk
2022-04-06 15:24             ` Chuck Zmudzinski
2022-04-06 17:39             ` Chuck Zmudzinski
2022-03-30 17:15   ` Anthony PERARD
2022-03-30 17:27     ` Andrew Cooper
2022-03-31  3:54       ` Chuck Zmudzinski
2022-03-31 12:29         ` Jason Andryuk
2022-03-31 14:05           ` Chuck Zmudzinski
2022-03-31 14:19             ` Jason Andryuk
2022-03-31 19:44               ` Chuck Zmudzinski
2022-06-17 13:46                 ` Anthony PERARD
2022-06-17 20:39                   ` Chuck Zmudzinski
2022-03-31  2:41     ` Chuck Zmudzinski
2022-03-31 18:32     ` Chuck Zmudzinski
2022-04-01  0:27       ` Chuck Zmudzinski
2022-03-31 23:23     ` Chuck Zmudzinski
2022-06-04  1:10     ` Chuck Zmudzinski
2022-06-06 18:47       ` Chuck Zmudzinski
     [not found] <b11b0b13-4f10-7d77-d02d-bb9a22bce887@netscape.net>
2022-03-15 17:25 ` Chuck Zmudzinski
