From: Chuck Zmudzinski <brchuckz@netscape.net>
To: Anthony PERARD <anthony.perard@citrix.com>
Cc: xen-devel@lists.xenproject.org, Wei Liu <wl@xen.org>,
	Juergen Gross <jgross@suse.com>, Jan Beulich <jbeulich@suse.com>
Subject: Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
Date: Wed, 30 Mar 2022 22:41:30 -0400
Message-ID: <255ef47f-ea8a-79e5-601d-4ec8d6a44b7e@netscape.net>
In-Reply-To: <YkSQIoYhomhNKpYR@perard.uk.xensource.com>

On 3/30/22 1:15 PM, Anthony PERARD wrote:
> Hi Chuck,
>
> On Sun, Mar 13, 2022 at 11:41:37PM -0400, Chuck Zmudzinski wrote:
>> When gfx_passthru is enabled for the Intel IGD, hvmloader maps the IGD
>> opregion to the guest but libxl does not grant the guest permission to
> I'm not reading the same thing when looking at the code in
> hvmloader. It seems that hvmloader allocates some memory for
> the IGD opregion rather than mapping it.
>
>
> tools/firmware/hvmloader/pci.c:184
>      if ( vendor_id == 0x8086 )
>      {
>          igd_opregion_pgbase = mem_hole_alloc(IGD_OPREGION_PAGES);
>          /*
>           * Write the OpRegion offset to give the opregion
>           * address to the device model. The device model will trap
>           * and map the OpRegion at the given address.
>           */
>          pci_writel(vga_devfn, PCI_INTEL_OPREGION,
>                     igd_opregion_pgbase << PAGE_SHIFT);
>      }
>
> I think this write would go through QEMU, does it do something with it?

AFAICT Qemu does do something with it:

In the upstream Qemu, hw/xen/xen_pt_config_init.c and
hw/xen/xen_pt_graphics.c implement the functions that read
and write the OpRegion register, which means Qemu reads and
writes the mapped OpRegion address that hvmloader passes to
it. It is a 32-bit address that is stored in the config
attribute in sysfs for the Intel IGD on Linux guests.
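
As an illustration, here is a hedged sketch (not part of the
patch) of how that address can be inspected on Linux: the
32-bit OpRegion address sits at config offset 0xfc, the
offset Qemu and hvmloader name PCI_INTEL_OPREGION. The
device path below is the usual IGD address and is an
assumption:

    /* Sketch: read the IGD OpRegion address (offset 0xfc) from
     * the sysfs config attribute.  Run as root, in dom0 or in
     * the guest. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        uint32_t opregion = 0;
        int fd = open("/sys/bus/pci/devices/0000:00:02.0/config",
                      O_RDONLY);
        if (fd < 0)
            return 1;
        /* 0xfc is PCI_INTEL_OPREGION in Qemu and hvmloader */
        if (pread(fd, &opregion, sizeof(opregion), 0xfc) ==
            sizeof(opregion))
            printf("OpRegion address: 0x%08x\n", opregion);
        close(fd);
        return 0;
    }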

I have examined the config attribute of the Intel IGD in
the control domain (dom0) and in the Linux guest. In both
cases, the address of the IGD OpRegion is consistent with
custom logs I added to xen/common/domctl.c: the machine and
guest addresses those logs report for the OpRegion match
the addresses in the respective config attributes.

> (I kind of reply to this question at the end of the mail.)
>
> Is this code in hvmloader actually run in your case?

I admit I have not verified with certainty that Qemu and
the guest are getting the OpRegion address from hvmloader.
I will do a test to verify it, such as removing the code
from hvmloader that writes the address into the pci config
space and checking whether that prevents the guest from
finding the address where the OpRegion is mapped in the
guest. That would prove that the hvmloader code here runs
in my case. Concretely, the test could look like the hunk
below.
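
An illustrative hunk for that test (not a submitted patch;
the comment between the allocation and the write is elided):

    --- a/tools/firmware/hvmloader/pci.c
    +++ b/tools/firmware/hvmloader/pci.c
    @@ ... @@
         if ( vendor_id == 0x8086 )
         {
             igd_opregion_pgbase = mem_hole_alloc(IGD_OPREGION_PAGES);
    -        pci_writel(vga_devfn, PCI_INTEL_OPREGION,
    -                   igd_opregion_pgbase << PAGE_SHIFT);
    +        /* test: do not tell the device model where the
    +         * OpRegion is */
         }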
>
>> Currently, because of another bug in Qemu upstream, this crash can
>> only be reproduced using the traditional Qemu device model, and of
> qemu-traditional is a bit old... What is the bug in QEMU? Do you have a
> link to a patch/mail?

Not yet. I am working on it now. The bug is related to the
passthrough of the IRQ to the guest. So far I have compared
the Linux guest's logs under upstream Qemu and under the
traditional Qemu, and found that with the upstream Qemu the
Linux kernel in the guest reports that it cannot obtain
IRQ 24:

Mar 29 18:31:39 debian kernel: xen: --> pirq=16 -> irq=24 (gsi=24)
Mar 29 18:31:39 debian kernel: i915 0000:00:02.0: [drm] VT-d active for 
gfx access
...
Mar 29 18:31:39 debian kernel: xen:events: Failed to obtain physical IRQ 24

When using the traditional device model, this failure is
not reported. The failure to obtain the IRQ prevents the
guest from booting normally with the upstream Qemu.

Comparing the upstream and traditional Qemu code, I notice
that when the traditional Qemu sets up the pci-isa bridge
at slot 1f, it configures an IRQ, but the upstream Qemu
does not. I suspect that is where the problem is, although
I have not found the fix yet. It is well known that the
pci-isa bridge at slot 1f needs to be specially configured
for the Intel IGD to function properly.
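
If that suspicion is right, the fix may be as small as
programming the bridge's interrupt line. A hypothetical
sketch (the function is my invention, and the idea that
this is the fix is an assumption, not a confirmed Qemu
patch):

    /* hypothetical: give the pci-isa bridge at slot 1f an
     * interrupt line, as the traditional device model does
     * during setup */
    static void igd_isa_bridge_set_irq(PCIDevice *isa_bridge,
                                       uint8_t irq)
    {
        /* PCI_INTERRUPT_LINE is standard config offset 0x3c */
        pci_set_byte(isa_bridge->config + PCI_INTERRUPT_LINE, irq);
    }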
>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
>> index 4bbbfe9f16..a4fc473de9 100644
>> --- a/tools/libs/light/libxl_pci.c
>> +++ b/tools/libs/light/libxl_pci.c
>> @@ -2531,6 +2572,37 @@ int libxl__grant_vga_iomem_permission(libxl__gc *gc, const uint32_t domid,
>>                     domid, vga_iomem_start, (vga_iomem_start + 0x20 - 1));
>>               return ret;
>>           }
>> +
>> +        /* If this is an Intel IGD, allow access to the IGD opregion */
>> +        if (!libxl__is_igd_vga_passthru(gc, d_config)) return 0;
>> +
>> +        uint32_t igd_opregion = sysfs_dev_get_igd_opregion(gc, pci);
>> +        uint32_t error = 0xffffffff;
>> +        if (igd_opregion == error) break;
>> +
>> +        vga_iomem_start = ( (uint64_t) igd_opregion ) >> XC_PAGE_SHIFT;
>> +        ret = xc_domain_iomem_permission(CTX->xch, stubdom_domid,
>> +                                         vga_iomem_start,
>> +                                         IGD_OPREGION_PAGES, 1);
>> +        if (ret < 0) {
>> +            LOGED(ERROR, domid,
>> +                  "failed to give stubdom%d access to iomem range "
>> +                  "%"PRIx64"-%"PRIx64" for IGD passthru",
>> +                  stubdom_domid, vga_iomem_start, (vga_iomem_start +
>> +                                                IGD_OPREGION_PAGES - 1));
>> +            return ret;
>> +        }
>> +        ret = xc_domain_iomem_permission(CTX->xch, domid,
>> +                                         vga_iomem_start,
>> +                                         IGD_OPREGION_PAGES, 1);
> Here, you seem to add permission to an address that is read
> from the pci config space of the device, but as I've pointed
> out above hvmloader seems to overwrite this address.

I do not think hvmloader overwrites this address. I think of
it as hvmloader passing the mapped address to the device
model and guest, but as I said above, I will do more tests
to verify with certainty what hvmloader is actually doing.
> Does this call to
> xc_domain_iomem_permission() actually do anything useful?

I am certain this call to xc_domain_iomem_permission()
is necessary with the Qemu traditional device model to
obtain permission for the guest (domid) to access the
OpRegion. Without it, the bug is not fixed: the i915
kernel module crashes in the Linux guest, and I get a dark
screen instead of a guest with output to the screen.

Moreover, I have verified with custom logs from
xen/common/domctl.c that, when this patch is not applied,
access to the OpRegion is denied to domid but not to dom0.

> Or is it by luck that the address returned by
> sysfs_dev_get_igd_opregion() happened to be the address that hvmloader
> is going to write?
>
> Or maybe hvmloader doesn't actually do anything?

I mentioned earlier my plans to verify that
hvmloader does provide the device model and
the guest with the mapped address of the
Intel IGD OpRegion.
>
>
> Some more thoughts on that: looking at QEMU, it seems that
> there's already a call to xc_domain_iomem_permission(), in
> igd_write_opregion(). So adding one in libxl would seem
> redundant,

You may be right that it is not needed with the upstream
Qemu. I will be able to check that once I have a patch for
the IRQ problem in upstream Qemu. When I wrote this patch a
couple of weeks ago, though, I did not yet know that Qemu
upstream also calls xc_domain_iomem_permission(). Since
Qemu upstream seems to obtain the necessary permission
itself, the call here can be restricted to the case when
the device model is Qemu traditional, as sketched below.
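
A sketch of that shape for the next version (assuming the
upstream device model keeps granting the permission itself
in igd_write_opregion()):

    /* sketch: grant the guest access to the OpRegion here
     * only for qemu-traditional; upstream Qemu already calls
     * xc_domain_iomem_permission() in igd_write_opregion() */
    if (d_config->b_info.device_model_version ==
        LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL) {
        ret = xc_domain_iomem_permission(CTX->xch, domid,
                                         vga_iomem_start,
                                         IGD_OPREGION_PAGES, 1);
        if (ret < 0)
            return ret;
    }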
> or maybe it's the one for the
> device model's domain that's needed (named 'stubdom_domid' here)?

Well, I am not using a device model stub domain but running
the device model in dom0, and my tests indicate it is not
necessary to obtain permission for dom0. I do admit I need
to do more tests before submitting the next version of the
patch. I plan to do some tests with a device model stub
domain and see which configurations require permission for
the stub domain. I expect it will only be needed when the
device model is Qemu traditional, because Qemu upstream
obtains the necessary permission itself.

I have no experience running the device model in a stub
domain, and IIRC the Xen wiki explains that the
configuration supported by the Xen Project is the
traditional device model with MiniOS in the stub domain,
while others, such as Qubes OS, have done some work on
running the upstream Qemu and Linux in a stub domain.
Please correct me if this is not correct.

Thank you for taking the time to look at this patch.

Chuck

