On Thu, 16 Mar 2023, Jan Beulich wrote:
> On 16.03.2023 10:27, Roger Pau Monné wrote:
> > On Thu, Mar 16, 2023 at 09:55:03AM +0100, Jan Beulich wrote:
> >> On 16.03.2023 01:44, Stefano Stabellini wrote:
> >>> On Wed, 15 Mar 2023, Roger Pau Monné wrote:
> >>>> On Sun, Mar 12, 2023 at 03:54:55PM +0800, Huang Rui wrote:
> >>>>> From: Chen Jiqian
> >>>>>
> >>>>> Use new xc_physdev_gsi_from_irq to get the GSI number
> >>>>>
> >>>>> Signed-off-by: Chen Jiqian
> >>>>> Signed-off-by: Huang Rui
> >>>>> ---
> >>>>>  tools/libs/light/libxl_pci.c | 1 +
> >>>>>  1 file changed, 1 insertion(+)
> >>>>>
> >>>>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
> >>>>> index f4c4f17545..47cf2799bf 100644
> >>>>> --- a/tools/libs/light/libxl_pci.c
> >>>>> +++ b/tools/libs/light/libxl_pci.c
> >>>>> @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc,
> >>>>>              goto out_no_irq;
> >>>>>          }
> >>>>>          if ((fscanf(f, "%u", &irq) == 1) && irq) {
> >>>>> +            irq = xc_physdev_gsi_from_irq(ctx->xch, irq);
> >>>>
> >>>> This is just a shot in the dark, because I don't really have enough
> >>>> context to understand what's going on here, but see below.
> >>>>
> >>>> I've taken a look at this on my box, and it seems like on
> >>>> dom0 the value returned by /sys/bus/pci/devices/SBDF/irq is not
> >>>> very consistent.
> >>>>
> >>>> If devices are in use by a driver the irq sysfs node reports either
> >>>> the GSI irq or the MSI IRQ (in case a single MSI interrupt is
> >>>> set up).
> >>>>
> >>>> It seems like pciback in Linux does something to report the correct
> >>>> value:
> >>>>
> >>>> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq
> >>>> 74
> >>>> root@lcy2-dt107:~# xl pci-assignable-add 00:14.0
> >>>> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq
> >>>> 16
> >>>>
> >>>> As you can see, making the device assignable changed the value
> >>>> reported by the irq node to be the GSI instead of the MSI IRQ, I would
> >>>> think you are missing something similar in the PVH setup (some pciback
> >>>> magic)?
> >>>>
> >>>> Albeit I have no idea why you would need to translate from IRQ to GSI
> >>>> in the way you do in this and related patches, because I'm missing the
> >>>> context.
> >>>
> >>> As I mention in another email, also keep in mind that we need QEMU to
> >>> work and QEMU calls:
> >>> 1) xc_physdev_map_pirq (this is also called from libxl)
> >>> 2) xc_domain_bind_pt_pci_irq
> >>>
> >>>
> >>> In this case IRQ != GSI (IRQ == 112, GSI == 28). Sysfs returns the IRQ
> >>> in Linux (112), but actually xc_physdev_map_pirq expects the GSI, not
> >>> the IRQ. If you look at the implementation of xc_physdev_map_pirq,
> >>> you'll see the type is "MAP_PIRQ_TYPE_GSI" and also see the check in Xen
> >>> xen/arch/x86/irq.c:allocate_and_map_gsi_pirq:
> >>>
> >>>     if ( index < 0 || index >= nr_irqs_gsi )
> >>>     {
> >>>         dprintk(XENLOG_G_ERR, "dom%d: map invalid irq %d\n", d->domain_id,
> >>>                 index);
> >>>         return -EINVAL;
> >>>     }
> >>>
> >>> nr_irqs_gsi < 112, and the check will fail.
> >>>
> >>> So we need to pass the GSI to xc_physdev_map_pirq. To do that, we need
> >>> to discover the GSI number corresponding to the IRQ number.
> >>
> >> That's one possible approach. Another could be (making a lot of assumptions)
> >> that a PVH Dom0 would pass in the IRQ it knows for this interrupt and Xen
> >> then translates that to GSI, knowing that PVH doesn't have (host) GSIs
> >> exposed to it.
> >
> > I don't think Xen can translate a Linux IRQ to a GSI, as that's a
> > Linux abstraction Xen has no part in.
>
> Well, I was talking about whatever Dom0 and Xen use to communicate. I.e.
> if at all I might have meant pIRQ, but now that you mention ...
>
> > The GSIs exposed to a PVH dom0 are the native (host) ones, as we
> > create an emulated IO-APIC topology that mimics the physical one.
> >
> > Question here is why Linux ends up with an IRQ != GSI, as it's my
> > understanding on Linux GSIs will always be identity mapped to IRQs, and
> > the IRQ space up to the last possible GSI is explicitly reserved for
> > this purpose.
>
> ... this I guess pIRQ was a PV-only concept, and it really ought to be
> GSI in the PVH case. So yes, it then all boils down to that
> Linux-internal question.

Excellent question, but we'll have to wait for Ray as he is the one with
access to the hardware. But I have this data I can share in the meantime:

[ 1.260378] IRQ to pin mappings:
[ 1.260387] IRQ1 -> 0:1
[ 1.260395] IRQ2 -> 0:2
[ 1.260403] IRQ3 -> 0:3
[ 1.260410] IRQ4 -> 0:4
[ 1.260418] IRQ5 -> 0:5
[ 1.260425] IRQ6 -> 0:6
[ 1.260432] IRQ7 -> 0:7
[ 1.260440] IRQ8 -> 0:8
[ 1.260447] IRQ9 -> 0:9
[ 1.260455] IRQ10 -> 0:10
[ 1.260462] IRQ11 -> 0:11
[ 1.260470] IRQ12 -> 0:12
[ 1.260478] IRQ13 -> 0:13
[ 1.260485] IRQ14 -> 0:14
[ 1.260493] IRQ15 -> 0:15
[ 1.260505] IRQ106 -> 1:8
[ 1.260513] IRQ112 -> 1:4
[ 1.260521] IRQ116 -> 1:13
[ 1.260529] IRQ117 -> 1:14
[ 1.260537] IRQ118 -> 1:15
[ 1.260544] .................................... done.

And I think Ray traced the point in Linux where Linux gives us an IRQ ==
112 (which is the one causing issues):

__acpi_register_gsi->
        acpi_register_gsi_ioapic->
                mp_map_gsi_to_irq->
                        mp_map_pin_to_irq->
                                __irq_resolve_mapping()

        if (likely(data)) {
                desc = irq_data_to_desc(data);
                if (irq)
                        *irq = data->irq;
                /* this IRQ is 112, IO-APIC-34 domain */
        }