From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Stefano Stabellini <sstabellini@kernel.org>
Cc: "Jan Beulich" <jbeulich@suse.com>,
"Huang Rui" <ray.huang@amd.com>,
"Anthony PERARD" <anthony.perard@citrix.com>,
xen-devel@lists.xenproject.org,
"Alex Deucher" <alexander.deucher@amd.com>,
"Christian König" <christian.koenig@amd.com>,
"Stewart Hildebrand" <Stewart.Hildebrand@amd.com>,
"Xenia Ragiadakou" <burzalodowa@gmail.com>,
"Honglei Huang" <honglei1.huang@amd.com>,
"Julia Zhang" <julia.zhang@amd.com>,
"Chen Jiqian" <Jiqian.Chen@amd.com>
Subject: Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi
Date: Mon, 20 Mar 2023 16:16:19 +0100 [thread overview]
Message-ID: <ZBh4w+hDajq06BU6@Air-de-Roger> (raw)
In-Reply-To: <alpine.DEB.2.22.394.2303171346410.3359@ubuntu-linux-20-04-desktop>
On Fri, Mar 17, 2023 at 01:55:08PM -0700, Stefano Stabellini wrote:
> On Fri, 17 Mar 2023, Roger Pau Monné wrote:
> > On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote:
> > > On Fri, 17 Mar 2023, Roger Pau Monné wrote:
> > > > On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote:
> > > > > On 17.03.2023 00:19, Stefano Stabellini wrote:
> > > > > > On Thu, 16 Mar 2023, Jan Beulich wrote:
> > > > > >> So yes, it then all boils down to that Linux-
> > > > > >> internal question.
> > > > > >
> > > > > > Excellent question but we'll have to wait for Ray as he is the one with
> > > > > > access to the hardware. But I have this data I can share in the
> > > > > > meantime:
> > > > > >
> > > > > > [ 1.260378] IRQ to pin mappings:
> > > > > > [ 1.260387] IRQ1 -> 0:1
> > > > > > [ 1.260395] IRQ2 -> 0:2
> > > > > > [ 1.260403] IRQ3 -> 0:3
> > > > > > [ 1.260410] IRQ4 -> 0:4
> > > > > > [ 1.260418] IRQ5 -> 0:5
> > > > > > [ 1.260425] IRQ6 -> 0:6
> > > > > > [ 1.260432] IRQ7 -> 0:7
> > > > > > [ 1.260440] IRQ8 -> 0:8
> > > > > > [ 1.260447] IRQ9 -> 0:9
> > > > > > [ 1.260455] IRQ10 -> 0:10
> > > > > > [ 1.260462] IRQ11 -> 0:11
> > > > > > [ 1.260470] IRQ12 -> 0:12
> > > > > > [ 1.260478] IRQ13 -> 0:13
> > > > > > [ 1.260485] IRQ14 -> 0:14
> > > > > > [ 1.260493] IRQ15 -> 0:15
> > > > > > [ 1.260505] IRQ106 -> 1:8
> > > > > > [ 1.260513] IRQ112 -> 1:4
> > > > > > [ 1.260521] IRQ116 -> 1:13
> > > > > > [ 1.260529] IRQ117 -> 1:14
> > > > > > [ 1.260537] IRQ118 -> 1:15
> > > > > > [ 1.260544] .................................... done.
> > > > >
> > > > > And what does Linux think are IRQs 16 ... 105? Have you compared with
> > > > > Linux running baremetal on the same hardware?
> > > >
> > > > So I have some emails from Ray from he time he was looking into this,
> > > > and on Linux dom0 PVH dmesg there is:
> > > >
> > > > [ 0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23
> > > > [ 0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55
> > > >
> > > > So it seems the vIO-APIC data provided by Xen to dom0 is at least
> > > > consistent.
> > > >
> > > > > > And I think Ray traced the point in Linux where Linux gives us an IRQ ==
> > > > > > 112 (which is the one causing issues):
> > > > > >
> > > > > > __acpi_register_gsi->
> > > > > > acpi_register_gsi_ioapic->
> > > > > > mp_map_gsi_to_irq->
> > > > > > mp_map_pin_to_irq->
> > > > > > __irq_resolve_mapping()
> > > > > >
> > > > > > if (likely(data)) {
> > > > > > desc = irq_data_to_desc(data);
> > > > > > if (irq)
> > > > > > *irq = data->irq;
> > > > > > /* this IRQ is 112, IO-APIC-34 domain */
> > > > > > }
> > > >
> > > >
> > > > Could this all be a result of patch 4/5 in the Linux series ("[RFC
> > > > PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different
> > > > __acpi_register_gsi hook is installed for PVH in order to setup GSIs
> > > > using PHYSDEV ops instead of doing it natively from the IO-APIC?
> > > >
> > > > FWIW, the introduced function in that patch
> > > > (acpi_register_gsi_xen_pvh()) seems to unconditionally call
> > > > acpi_register_gsi_ioapic() without checking if the GSI is already
> > > > registered, which might lead to multiple IRQs being allocated for the
> > > > same underlying GSI?
> > >
> > > I understand this point and I think it needs investigating.
> > >
> > >
> > > > As I commented there, I think that approach is wrong. If the GSI has
> > > > not been mapped in Xen (because dom0 hasn't unmasked the respective
> > > > IO-APIC pin) we should add some logic in the toolstack to map it
> > > > before attempting to bind.
> > >
> > > But this statement confuses me. The toolstack doesn't get involved in
> > > IRQ setup for PCI devices for HVM guests?
> >
> > It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call
> > to xc_physdev_map_pirq(). I'm not sure whether that's a remnant that
> > cold be removed (maybe for qemu-trad only?) or it's also required by
> > QEMU upstream, I would have to investigate more.
>
> You are right. I am not certain, but it seems like a mistake in the
> toolstack to me. In theory, pci_add_dm_done should only be needed for PV
> guests, not for HVM guests. I am not sure. But I can see the call to
> xc_physdev_map_pirq you were referring to now.
>
>
> > It's my understanding it's in pci_add_dm_done() where Ray was getting
> > the mismatched IRQ vs GSI number.
>
> I think the mismatch was actually caused by the xc_physdev_map_pirq call
> from QEMU, which makes sense because in any case it should happen before
> the same call done by pci_add_dm_done (pci_add_dm_done is called after
> sending the pci passthrough QMP command to QEMU). So the first to hit
> the IRQ!=GSI problem would be QEMU.
I've been thinking about this a bit, and I think one of the possible
issues with the current handling of GSIs in a PVH dom0 is that GSIs
don't get registered until/unless they are unmasked. I could see this
as a problem when doing passthrough: it's possible for a GSI (iow:
vIO-APIC pin) to never get unmasked on dom0, because the device
driver(s) are using MSI(-X) interrupts instead. However, the IO-APIC
pin must be configured for it to be able to be mapped into a domU.
A possible solution is to propagate the vIO-APIC pin configuration
trigger/polarity when dom0 writes the low part of the redirection
table entry.
The patch below enables the usage of PHYSDEVOP_{un,}map_pirq from PVH
domains (I need to assert this is secure even for domUs) and also
propagates the vIO-APIC pin trigger/polarity mode on writes to the
low part of the RTE. Such propagation leads to the following
interrupt setup in Xen:
IRQ: 0 vec:f0 IO-APIC-edge status=000 aff:{0}/{0} arch/x86/time.c#timer_interrupt()
IRQ: 1 vec:38 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 2 vec:a8 IO-APIC-edge status=000 aff:{0-7}/{0-7} no_action()
IRQ: 3 vec:f1 IO-APIC-edge status=000 aff:{0-7}/{0-7} drivers/char/ns16550.c#ns16550_interrupt()
IRQ: 4 vec:40 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 5 vec:48 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 6 vec:50 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 7 vec:58 IO-APIC-edge status=006 aff:{0-7}/{0} mapped, unbound
IRQ: 8 vec:60 IO-APIC-edge status=010 aff:{0}/{0} in-flight=0 d0: 8(-M-)
IRQ: 9 vec:68 IO-APIC-edge status=010 aff:{0}/{0} in-flight=0 d0: 9(-M-)
IRQ: 10 vec:70 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 11 vec:78 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 12 vec:88 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 13 vec:90 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 14 vec:98 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 15 vec:a0 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 16 vec:b0 IO-APIC-edge status=010 aff:{1}/{0-7} in-flight=0 d0: 16(-M-)
IRQ: 17 vec:b8 IO-APIC-edge status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ: 18 vec:c0 IO-APIC-edge status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ: 19 vec:c8 IO-APIC-edge status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ: 20 vec:d0 IO-APIC-edge status=010 aff:{1}/{0-7} in-flight=0 d0: 20(-M-)
IRQ: 21 vec:d8 IO-APIC-edge status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ: 22 vec:e0 IO-APIC-edge status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ: 23 vec:e8 IO-APIC-edge status=002 aff:{0-7}/{0-7} mapped, unbound
Note how now all GSIs on my box are setup, even when not bound to
dom0 anymore. The output without this patch looks like:
IRQ: 0 vec:f0 IO-APIC-edge status=000 aff:{0}/{0} arch/x86/time.c#timer_interrupt()
IRQ: 1 vec:38 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 3 vec:f1 IO-APIC-edge status=000 aff:{0-7}/{0-7} drivers/char/ns16550.c#ns16550_interrupt()
IRQ: 4 vec:40 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 5 vec:48 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 6 vec:50 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 7 vec:58 IO-APIC-edge status=006 aff:{0}/{0} mapped, unbound
IRQ: 8 vec:d0 IO-APIC-edge status=010 aff:{6}/{0-7} in-flight=0 d0: 8(-M-)
IRQ: 9 vec:a8 IO-APIC-level status=010 aff:{2}/{0-7} in-flight=0 d0: 9(-M-)
IRQ: 10 vec:70 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 11 vec:78 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 12 vec:88 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 13 vec:90 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 14 vec:98 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 15 vec:a0 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 16 vec:e0 IO-APIC-level status=010 aff:{6}/{0-7} in-flight=0 d0: 16(-M-),d1: 16(-M-)
IRQ: 20 vec:d8 IO-APIC-level status=010 aff:{6}/{0-7} in-flight=0 d0: 20(-M-)
Legacy IRQs (below 16) are always registered.
With the patch above I seem to be able to do PCI passthrough to an HVM
domU from a PVH dom0.
Regards, Roger.
---
diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 405d0a95af..cc53a3bd12 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -86,6 +86,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
{
case PHYSDEVOP_map_pirq:
case PHYSDEVOP_unmap_pirq:
+ break;
+
case PHYSDEVOP_eoi:
case PHYSDEVOP_irq_status_query:
case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 41e3c4d5e4..50e23a093c 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -180,9 +180,7 @@ static int vioapic_hwdom_map_gsi(unsigned int gsi, unsigned int trig,
/* Interrupt has been unmasked, bind it now. */
ret = mp_register_gsi(gsi, trig, pol);
- if ( ret == -EEXIST )
- return 0;
- if ( ret )
+ if ( ret && ret != -EEXIST )
{
gprintk(XENLOG_WARNING, "vioapic: error registering GSI %u: %d\n",
gsi, ret);
@@ -244,12 +242,18 @@ static void vioapic_write_redirent(
}
else
{
+ int ret;
+
unmasked = ent.fields.mask;
/* Remote IRR and Delivery Status are read-only. */
ent.bits = ((ent.bits >> 32) << 32) | val;
ent.fields.delivery_status = 0;
ent.fields.remote_irr = pent->fields.remote_irr;
unmasked = unmasked && !ent.fields.mask;
+ ret = mp_register_gsi(gsi, ent.fields.trig_mode, ent.fields.polarity);
+ if ( ret && ret != -EEXIST )
+ gprintk(XENLOG_WARNING, "vioapic: error registering GSI %u: %d\n",
+ gsi, ret);
}
*pent = ent;
next prev parent reply other threads:[~2023-03-20 15:17 UTC|newest]
Thread overview: 75+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-12 7:54 [RFC XEN PATCH 0/6] Introduce VirtIO GPU and Passthrough GPU support on Xen PVH dom0 Huang Rui
2023-03-12 7:54 ` [RFC XEN PATCH 1/6] x86/pvh: report ACPI VFCT table to dom0 if present Huang Rui
2023-03-13 11:55 ` Andrew Cooper
2023-03-13 12:21 ` Roger Pau Monné
2023-03-13 12:27 ` Andrew Cooper
2023-03-21 6:26 ` Huang Rui
2023-03-12 7:54 ` [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH Huang Rui
2023-03-13 7:23 ` Christian König
2023-03-13 7:26 ` Christian König
2023-03-13 8:46 ` Jan Beulich
2023-03-13 8:55 ` Huang Rui
2023-03-14 23:42 ` Stefano Stabellini
2023-03-14 16:02 ` Jan Beulich
2023-03-21 9:36 ` Huang Rui
2023-03-21 9:41 ` Jan Beulich
2023-03-21 10:14 ` Huang Rui
2023-03-21 10:20 ` Jan Beulich
2023-03-21 11:49 ` Huang Rui
2023-03-21 12:20 ` Roger Pau Monné
2023-03-21 12:25 ` Jan Beulich
2023-03-21 12:59 ` Huang Rui
2023-03-21 12:27 ` Jan Beulich
2023-03-21 13:03 ` Huang Rui
2023-03-22 7:28 ` Huang Rui
2023-03-22 7:45 ` Jan Beulich
2023-03-22 9:34 ` Roger Pau Monné
2023-03-22 12:33 ` Huang Rui
2023-03-22 12:48 ` Jan Beulich
2023-03-23 10:26 ` Huang Rui
2023-03-23 14:16 ` Jan Beulich
2023-03-23 10:43 ` Roger Pau Monné
2023-03-23 13:34 ` Huang Rui
2023-03-23 16:23 ` Roger Pau Monné
2023-03-24 4:37 ` Huang Rui
2023-03-12 7:54 ` [RFC XEN PATCH 3/6] x86/pvh: shouldn't check pirq flag when map pirq in PVH Huang Rui
2023-03-14 16:27 ` Jan Beulich
2023-03-15 15:57 ` Roger Pau Monné
2023-03-16 0:22 ` Stefano Stabellini
2023-03-21 10:09 ` Huang Rui
2023-03-21 10:17 ` Jan Beulich
2023-03-12 7:54 ` [RFC XEN PATCH 4/6] x86/pvh: PVH dom0 also need PHYSDEVOP_setup_gsi call Huang Rui
2023-03-14 16:30 ` Jan Beulich
2023-03-15 17:01 ` Andrew Cooper
2023-03-16 0:26 ` Stefano Stabellini
2023-03-16 0:39 ` Stefano Stabellini
2023-03-16 8:51 ` Jan Beulich
2023-03-16 9:18 ` Roger Pau Monné
2023-03-16 7:05 ` Jan Beulich
2023-03-21 12:42 ` Huang Rui
2023-03-21 12:22 ` Huang Rui
2023-03-12 7:54 ` [RFC XEN PATCH 5/6] tools/libs/call: add linux os call to get gsi from irq Huang Rui
2023-03-14 16:36 ` Jan Beulich
2023-03-12 7:54 ` [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi Huang Rui
2023-03-14 16:39 ` Jan Beulich
2023-03-15 16:35 ` Roger Pau Monné
2023-03-16 0:44 ` Stefano Stabellini
2023-03-16 8:54 ` Roger Pau Monné
2023-03-16 8:55 ` Jan Beulich
2023-03-16 9:27 ` Roger Pau Monné
2023-03-16 9:42 ` Jan Beulich
2023-03-16 23:19 ` Stefano Stabellini
2023-03-17 8:39 ` Jan Beulich
2023-03-17 9:51 ` Roger Pau Monné
2023-03-17 18:15 ` Stefano Stabellini
2023-03-17 19:48 ` Roger Pau Monné
2023-03-17 20:55 ` Stefano Stabellini
2023-03-20 15:16 ` Roger Pau Monné [this message]
2023-03-20 15:29 ` Jan Beulich
2023-03-20 16:50 ` Roger Pau Monné
2023-07-31 16:40 ` Chen, Jiqian
2023-08-23 8:57 ` Roger Pau Monné
2023-08-31 8:56 ` Chen, Jiqian
2023-03-13 7:24 ` [RFC XEN PATCH 0/6] Introduce VirtIO GPU and Passthrough GPU support on Xen PVH dom0 Christian König
2023-03-21 10:26 ` Huang Rui
2023-03-20 16:22 ` Huang Rui
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZBh4w+hDajq06BU6@Air-de-Roger \
--to=roger.pau@citrix.com \
--cc=Jiqian.Chen@amd.com \
--cc=Stewart.Hildebrand@amd.com \
--cc=alexander.deucher@amd.com \
--cc=anthony.perard@citrix.com \
--cc=burzalodowa@gmail.com \
--cc=christian.koenig@amd.com \
--cc=honglei1.huang@amd.com \
--cc=jbeulich@suse.com \
--cc=julia.zhang@amd.com \
--cc=ray.huang@amd.com \
--cc=sstabellini@kernel.org \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).