From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Stefano Stabellini <sstabellini@kernel.org>
Cc: "Jan Beulich" <jbeulich@suse.com>,
	"Huang Rui" <ray.huang@amd.com>,
	"Anthony PERARD" <anthony.perard@citrix.com>,
	xen-devel@lists.xenproject.org,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Stewart Hildebrand" <Stewart.Hildebrand@amd.com>,
	"Xenia Ragiadakou" <burzalodowa@gmail.com>,
	"Honglei Huang" <honglei1.huang@amd.com>,
	"Julia Zhang" <julia.zhang@amd.com>,
	"Chen Jiqian" <Jiqian.Chen@amd.com>
Subject: Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi
Date: Mon, 20 Mar 2023 16:16:19 +0100	[thread overview]
Message-ID: <ZBh4w+hDajq06BU6@Air-de-Roger> (raw)
In-Reply-To: <alpine.DEB.2.22.394.2303171346410.3359@ubuntu-linux-20-04-desktop>

On Fri, Mar 17, 2023 at 01:55:08PM -0700, Stefano Stabellini wrote:
> On Fri, 17 Mar 2023, Roger Pau Monné wrote:
> > On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote:
> > > On Fri, 17 Mar 2023, Roger Pau Monné wrote:
> > > > On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote:
> > > > > On 17.03.2023 00:19, Stefano Stabellini wrote:
> > > > > > On Thu, 16 Mar 2023, Jan Beulich wrote:
> > > > > >> So yes, it then all boils down to that Linux-
> > > > > >> internal question.
> > > > > > 
> > > > > > Excellent question but we'll have to wait for Ray as he is the one with
> > > > > > access to the hardware. But I have this data I can share in the
> > > > > > meantime:
> > > > > > 
> > > > > > [    1.260378] IRQ to pin mappings:
> > > > > > [    1.260387] IRQ1 -> 0:1
> > > > > > [    1.260395] IRQ2 -> 0:2
> > > > > > [    1.260403] IRQ3 -> 0:3
> > > > > > [    1.260410] IRQ4 -> 0:4
> > > > > > [    1.260418] IRQ5 -> 0:5
> > > > > > [    1.260425] IRQ6 -> 0:6
> > > > > > [    1.260432] IRQ7 -> 0:7
> > > > > > [    1.260440] IRQ8 -> 0:8
> > > > > > [    1.260447] IRQ9 -> 0:9
> > > > > > [    1.260455] IRQ10 -> 0:10
> > > > > > [    1.260462] IRQ11 -> 0:11
> > > > > > [    1.260470] IRQ12 -> 0:12
> > > > > > [    1.260478] IRQ13 -> 0:13
> > > > > > [    1.260485] IRQ14 -> 0:14
> > > > > > [    1.260493] IRQ15 -> 0:15
> > > > > > [    1.260505] IRQ106 -> 1:8
> > > > > > [    1.260513] IRQ112 -> 1:4
> > > > > > [    1.260521] IRQ116 -> 1:13
> > > > > > [    1.260529] IRQ117 -> 1:14
> > > > > > [    1.260537] IRQ118 -> 1:15
> > > > > > [    1.260544] .................................... done.
> > > > > 
> > > > > And what does Linux think are IRQs 16 ... 105? Have you compared with
> > > > > Linux running baremetal on the same hardware?
> > > > 
> > > > So I have some emails from Ray from the time he was looking into this,
> > > > and on Linux dom0 PVH dmesg there is:
> > > > 
> > > > [    0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23
> > > > [    0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55
> > > > 
> > > > So it seems the vIO-APIC data provided by Xen to dom0 is at least
> > > > consistent.
> > > >  
> > > > > > And I think Ray traced the point in Linux where Linux gives us an IRQ ==
> > > > > > 112 (which is the one causing issues):
> > > > > > 
> > > > > > __acpi_register_gsi->
> > > > > >         acpi_register_gsi_ioapic->
> > > > > >                 mp_map_gsi_to_irq->
> > > > > >                         mp_map_pin_to_irq->
> > > > > >                                 __irq_resolve_mapping()
> > > > > > 
> > > > > >         if (likely(data)) {
> > > > > >                 desc = irq_data_to_desc(data);
> > > > > >                 if (irq)
> > > > > >                         *irq = data->irq;
> > > > > >                 /* this IRQ is 112, IO-APIC-34 domain */
> > > > > >         }
> > > > 
> > > > 
> > > > Could this all be a result of patch 4/5 in the Linux series ("[RFC
> > > > PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different
> > > > __acpi_register_gsi hook is installed for PVH in order to setup GSIs
> > > > using PHYSDEV ops instead of doing it natively from the IO-APIC?
> > > > 
> > > > FWIW, the introduced function in that patch
> > > > (acpi_register_gsi_xen_pvh()) seems to unconditionally call
> > > > acpi_register_gsi_ioapic() without checking if the GSI is already
> > > > registered, which might lead to multiple IRQs being allocated for the
> > > > same underlying GSI?
> > > 
> > > I understand this point and I think it needs investigating.
> > > 
> > > 
> > > > As I commented there, I think that approach is wrong.  If the GSI has
> > > > not been mapped in Xen (because dom0 hasn't unmasked the respective
> > > > IO-APIC pin) we should add some logic in the toolstack to map it
> > > > before attempting to bind.
> > > 
> > > But this statement confuses me. The toolstack doesn't get involved in
> > > IRQ setup for PCI devices for HVM guests?
> > 
> > It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call
> > to xc_physdev_map_pirq().  I'm not sure whether that's a remnant that
> > could be removed (maybe for qemu-trad only?) or whether it's also required by
> > QEMU upstream, I would have to investigate more.
> 
> You are right. I am not certain, but it seems like a mistake in the
> toolstack to me. In theory, pci_add_dm_done should only be needed for PV
> guests, not for HVM guests. I am not sure. But I can see the call to
> xc_physdev_map_pirq you were referring to now.
> 
> 
> > It's my understanding it's in pci_add_dm_done() where Ray was getting
> > the mismatched IRQ vs GSI number.
> 
> I think the mismatch was actually caused by the xc_physdev_map_pirq call
> from QEMU, which makes sense because in any case it should happen before
> the same call done by pci_add_dm_done (pci_add_dm_done is called after
> sending the pci passthrough QMP command to QEMU). So the first to hit
> the IRQ!=GSI problem would be QEMU.

I've been thinking about this a bit, and I think one of the possible
issues with the current handling of GSIs in a PVH dom0 is that GSIs
don't get registered until/unless they are unmasked.  I could see this
as a problem when doing passthrough: it's possible for a GSI (iow:
vIO-APIC pin) to never get unmasked on dom0, because the device
driver(s) are using MSI(-X) interrupts instead.  However, the IO-APIC
pin must be configured before it can be mapped into a domU.

A possible solution is to propagate the vIO-APIC pin configuration
trigger/polarity when dom0 writes the low part of the redirection
table entry.

The patch below enables the use of PHYSDEVOP_{un,}map_pirq from PVH
domains (I still need to verify this is safe even for domUs) and also
propagates the vIO-APIC pin trigger/polarity mode on writes to the
low part of the RTE.  Such propagation leads to the following
interrupt setup in Xen:

IRQ:   0 vec:f0 IO-APIC-edge    status=000 aff:{0}/{0} arch/x86/time.c#timer_interrupt()
IRQ:   1 vec:38 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:   2 vec:a8 IO-APIC-edge    status=000 aff:{0-7}/{0-7} no_action()
IRQ:   3 vec:f1 IO-APIC-edge    status=000 aff:{0-7}/{0-7} drivers/char/ns16550.c#ns16550_interrupt()
IRQ:   4 vec:40 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:   5 vec:48 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:   6 vec:50 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:   7 vec:58 IO-APIC-edge    status=006 aff:{0-7}/{0} mapped, unbound
IRQ:   8 vec:60 IO-APIC-edge    status=010 aff:{0}/{0} in-flight=0 d0:  8(-M-)
IRQ:   9 vec:68 IO-APIC-edge    status=010 aff:{0}/{0} in-flight=0 d0:  9(-M-)
IRQ:  10 vec:70 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:  11 vec:78 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:  12 vec:88 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:  13 vec:90 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:  14 vec:98 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:  15 vec:a0 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:  16 vec:b0 IO-APIC-edge    status=010 aff:{1}/{0-7} in-flight=0 d0: 16(-M-)
IRQ:  17 vec:b8 IO-APIC-edge    status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ:  18 vec:c0 IO-APIC-edge    status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ:  19 vec:c8 IO-APIC-edge    status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ:  20 vec:d0 IO-APIC-edge    status=010 aff:{1}/{0-7} in-flight=0 d0: 20(-M-)
IRQ:  21 vec:d8 IO-APIC-edge    status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ:  22 vec:e0 IO-APIC-edge    status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ:  23 vec:e8 IO-APIC-edge    status=002 aff:{0-7}/{0-7} mapped, unbound

Note how all GSIs on my box are now set up, even those not bound to
dom0 anymore.  The output without this patch looks like:

IRQ:   0 vec:f0 IO-APIC-edge    status=000 aff:{0}/{0} arch/x86/time.c#timer_interrupt()
IRQ:   1 vec:38 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:   3 vec:f1 IO-APIC-edge    status=000 aff:{0-7}/{0-7} drivers/char/ns16550.c#ns16550_interrupt()
IRQ:   4 vec:40 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:   5 vec:48 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:   6 vec:50 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:   7 vec:58 IO-APIC-edge    status=006 aff:{0}/{0} mapped, unbound
IRQ:   8 vec:d0 IO-APIC-edge    status=010 aff:{6}/{0-7} in-flight=0 d0:  8(-M-)
IRQ:   9 vec:a8 IO-APIC-level   status=010 aff:{2}/{0-7} in-flight=0 d0:  9(-M-)
IRQ:  10 vec:70 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:  11 vec:78 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:  12 vec:88 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:  13 vec:90 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:  14 vec:98 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:  15 vec:a0 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:  16 vec:e0 IO-APIC-level   status=010 aff:{6}/{0-7} in-flight=0 d0: 16(-M-),d1: 16(-M-)
IRQ:  20 vec:d8 IO-APIC-level   status=010 aff:{6}/{0-7} in-flight=0 d0: 20(-M-)

Legacy IRQs (below 16) are always registered.

With the patch below I seem to be able to do PCI passthrough to an HVM
domU from a PVH dom0.

Regards, Roger.

---
diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 405d0a95af..cc53a3bd12 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -86,6 +86,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     {
     case PHYSDEVOP_map_pirq:
     case PHYSDEVOP_unmap_pirq:
+        break;
+
     case PHYSDEVOP_eoi:
     case PHYSDEVOP_irq_status_query:
     case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 41e3c4d5e4..50e23a093c 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -180,9 +180,7 @@ static int vioapic_hwdom_map_gsi(unsigned int gsi, unsigned int trig,
 
     /* Interrupt has been unmasked, bind it now. */
     ret = mp_register_gsi(gsi, trig, pol);
-    if ( ret == -EEXIST )
-        return 0;
-    if ( ret )
+    if ( ret && ret != -EEXIST )
     {
         gprintk(XENLOG_WARNING, "vioapic: error registering GSI %u: %d\n",
                  gsi, ret);
@@ -244,12 +242,18 @@ static void vioapic_write_redirent(
     }
     else
     {
+        int ret;
+
         unmasked = ent.fields.mask;
         /* Remote IRR and Delivery Status are read-only. */
         ent.bits = ((ent.bits >> 32) << 32) | val;
         ent.fields.delivery_status = 0;
         ent.fields.remote_irr = pent->fields.remote_irr;
         unmasked = unmasked && !ent.fields.mask;
+        ret = mp_register_gsi(gsi, ent.fields.trig_mode, ent.fields.polarity);
+        if ( ret && ret != -EEXIST )
+            gprintk(XENLOG_WARNING, "vioapic: error registering GSI %u: %d\n",
+                    gsi, ret);
     }
 
     *pent = ent;



