All of lore.kernel.org
 help / color / mirror / Atom feed
From: Barry Song <21cnbao@gmail.com>
To: Marc Zyngier <maz@kernel.org>
Cc: Bjorn Helgaas <helgaas@kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Jonathan.Cameron@huawei.com, bilbao@vt.edu,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	leon@kernel.org, LKML <linux-kernel@vger.kernel.org>,
	linux-pci@vger.kernel.org, Linuxarm <linuxarm@huawei.com>,
	luzmaximilian@gmail.com, mchehab+huawei@kernel.org,
	schnelle@linux.ibm.com, Barry Song <song.bao.hua@hisilicon.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH v2 1/2] PCI/MSI: Fix the confusing IRQ sysfs ABI for MSI-X
Date: Sun, 22 Aug 2021 10:41:17 +1200	[thread overview]
Message-ID: <CAGsJ_4w35+mRE_qp117HhNOaHeUN1cO6GGPW36qtjaX6wUcQNA@mail.gmail.com> (raw)
In-Reply-To: <CAGsJ_4wXqnudVO92qSKLdyJaMNuDE-d0srs=4rgJmOQKcG2P3g@mail.gmail.com>

On Sun, Aug 22, 2021 at 10:14 AM Barry Song <21cnbao@gmail.com> wrote:
>
> On Sat, Aug 21, 2021 at 10:42 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> > Hi Bjorn,
> >
> > On Sat, 21 Aug 2021 00:33:28 +0100,
> > Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >
> > > [+cc Thomas, Marc]
> > >
> > > On Sat, Aug 21, 2021 at 10:37:43AM +1200, Barry Song wrote:
> > > > From: Barry Song <song.bao.hua@hisilicon.com>
> > > >
> > > > /sys/bus/pci/devices/.../irq sysfs ABI is very confusing at this
> > > > moment especially for MSI-X cases.
> > >
> > > AFAICT this patch *only* affects MSI-X.  So are you saying the sysfs
> > > ABI is fine for MSI but confusing for MSI-X?
> > >
> > > > While MSI sets IRQ to the first
> > > > number in the vector, MSI-X does nothing for this though it saves
> > > > default_irq in msix_setup_entries(). Weird the saved default_irq
> > > > for MSI-X is never used in pci_msix_shutdown(), which is quite
> > > > different with pci_msi_shutdown(). Thus, this patch moves to show
> > > > the first IRQ number which is from the first msi_entry for MSI-X.
> > > > Hopefully, this can make IRQ ABI more clear and more consistent.
> > > >
> > > > Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
> > > > ---
> > > >  drivers/pci/msi.c | 6 ++++++
> > > >  1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > > > index 9232255..6bbf81b 100644
> > > > --- a/drivers/pci/msi.c
> > > > +++ b/drivers/pci/msi.c
> > > > @@ -771,6 +771,7 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
> > > >     int ret;
> > > >     u16 control;
> > > >     void __iomem *base;
> > > > +   struct msi_desc *desc;
> > > >
> > > >     /* Ensure MSI-X is disabled while it is set up */
> > > >     pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
> > > > @@ -814,6 +815,10 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
> > > >     pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
> > > >
> > > >     pcibios_free_irq(dev);
> > > > +
> > > > +   desc = first_pci_msi_entry(dev);
> > > > +   dev->irq = desc->irq;
> > >
> > > This change is not primarily about sysfs.  This is about changing
> > > "dev->irq" when MSI-X is enabled, and it's only incidental that sysfs
> > > reflects that.
> > >
> > > So we need to know the effect of changing dev->irq.  Drivers may use
> > > the value of dev->irq, and I'm *guessing* this change shouldn't break
> > > them since we already do this for MSI, but I'd like some more expert
> > > opinion than mine :)
> > >
> > > For MSI we have:
> > >
> > >   msi_capability_init
> > >     msi_setup_entry
> > >       entry = alloc_msi_entry(nvec)
> > >       entry->msi_attrib.default_irq = dev->irq;     /* Save IOAPIC IRQ */
> > >     dev->irq = entry->irq;
> > >
> > >   pci_msi_shutdown
> > >     /* Restore dev->irq to its default pin-assertion IRQ */
> > >     dev->irq = desc->msi_attrib.default_irq;
> > >
> > > and for MSI-X we have:
> > >
> > >   msix_capability_init
> > >     msix_setup_entries
> > >       for (i = 0; i < nvec; i++)
> > >         entry = alloc_msi_entry(1)
> > >       entry->msi_attrib.default_irq = dev->irq;
> > >
> > >   pci_msix_shutdown
> > >     for_each_pci_msi_entry(entry, dev)
> > >       __pci_msix_desc_mask_irq
> > > +   dev->irq = entry->msi_attrib.default_irq;   # added by this patch
> > >
> > >
> > > Things that seem strange to me:
> > >
> > >   - The msi_setup_entry() comment "Save IOAPIC IRQ" seems needlessly
> > >     specific; maybe it should be "INTx IRQ".
> > >
> > >   - The pci_msi_shutdown() comment "Restore ... pin-assertion IRQ"
> > >     should match the msi_setup_entry() one, e.g., maybe it should also
> > >     be "INTx IRQ".  There are no INTx or IOAPIC pins in PCIe.
> > >
> > >   - The only use of .default_irq is to save and restore dev->irq, so
> > >     it looks like a per-device thing, not a per-vector thing.
> > >
> > >     In msi_setup_entry() there's only one msi_entry, so there's only
> > >     one saved .default_irq.
> > >
> > >     In msix_setup_entries(), we get nvecs msi_entry structs, and we
> > >     get a saved .default_irq in each one?
> >
> > That's a key point.
> >
> > Old-school PCI/MSI is represented by a single interrupt, and you
> > *could* somehow make it relatively easy for drivers that only
> > understand INTx to migrate to MSI if you replaced whatever is held in
> > dev->irq (which should only represent the INTx mapping) with the MSI
> > interrupt number. Which I guess is what the MSI code is doing.
> >
> > This is the 21st century, and nobody should ever rely on such horror,
> > but I'm sure we do have such drivers in the tree. Boo.
> >
> > However, this *cannot* hold true for Multi-MSI, nor MSI-X, because
> > there is a plurality of interrupts. Even worse, for MSI-X, there is
> > zero guarantee that the allocated interrupts will be in a contiguous
> > space.
> >
> > Given that, what is dev->irq good for? "Absolutely Nothing! (say it
> > again!)".
> >
>
> The only thing is that dev->irq is an sysfs ABI to userspace. Due to
> the inconsistency
> between legacy PCI INTx, MSI, MSI-X, this ABI should have been
> absolutely broken nowadays.
> This is actually what the patchset was originally aiming at to fix.
>
> One more question from me is that does dev->irq actually hold any
> valid hardware INTx
> information while hardware is using MSI-X? At least in my hardware,
> sysfs ABI for PCI is all "0".
>
> root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# cat irq
> 0
>
> root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# ls -l msi_irqs/*
> -r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/499
> -r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/500
> -r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/501
> ...
> root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# cat msi_irqs/499
> msix
>
> Not quite sure how it is going on different hardware platforms.
>
> > MSI-X is not something you can "accidentally" use. You have to
> > actively embrace it. In all honesty, this patch tries to move in the
> > wrong direction. If anything, we should kill this hack altogether and
> > fix the (handful of?) drivers that rely on it. That'd actually be a
> > good way to find whether they are still worth keeping in the tree. And
> > if it breaks too many of them, then at least we'll know where we
> > stand.
> >
> > I'd be tempted to leave the below patch simmer in -next for a few
> > weeks and see if how many people shout:
>
> This looks like a more proper direction to go.
> but here i am wondering how sysfs ABI document should follow the below change
> doc is patch 2/2:
> https://lore.kernel.org/lkml/20210820223744.8439-3-21cnbao@gmail.com/
>
> On the other hand, my feeling is that nobody should depend on sysfs
> irq entry nowadays.
> For example, userspace irqbalance is actually using /sys/devices/.../msi_irqs/
> So probably we should set this ABI invisible when devices are using
> MSI or MSI-X?

i mean something like the below,

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 5d63df7..1323841 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -26,6 +26,7 @@
 #include <linux/slab.h>
 #include <linux/vgaarb.h>
 #include <linux/pm_runtime.h>
+#include <linux/msi.h>
 #include <linux/of.h>
 #include "pci.h"

@@ -1437,6 +1438,16 @@ static umode_t pci_dev_attrs_are_visible(struct
kobject *kobj,
                if ((pdev->class >> 8) != PCI_CLASS_DISPLAY_VGA)
                        return 0;

+#ifdef CONFIG_PCI_MSI
+       /*
+        * if devices are MSI and MSI-X, IRQ sysfs ABI is meaningless
+        * and broken
+        */
+       if (a == &dev_attr_irq.attr)
+               if (first_pci_msi_entry(pdev))
+                       return 0;
+#endif
+
        return a->mode;
 }

>
> >
> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > index e5e75331b415..2be9a01cbe72 100644
> > --- a/drivers/pci/msi.c
> > +++ b/drivers/pci/msi.c
> > @@ -591,7 +591,6 @@ msi_setup_entry(struct pci_dev *dev, int nvec, struct irq_affinity *affd)
> >         entry->msi_attrib.is_virtual    = 0;
> >         entry->msi_attrib.entry_nr      = 0;
> >         entry->msi_attrib.maskbit       = !!(control & PCI_MSI_FLAGS_MASKBIT);
> > -       entry->msi_attrib.default_irq   = dev->irq;     /* Save IOAPIC IRQ */
> >         entry->msi_attrib.multi_cap     = (control & PCI_MSI_FLAGS_QMASK) >> 1;
> >         entry->msi_attrib.multiple      = ilog2(__roundup_pow_of_two(nvec));
> >
> > @@ -682,7 +681,6 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
> >         dev->msi_enabled = 1;
> >
> >         pcibios_free_irq(dev);
> > -       dev->irq = entry->irq;
> >         return 0;
> >  }
> >
> > @@ -742,7 +740,6 @@ static int msix_setup_entries(struct pci_dev *dev, void __iomem *base,
> >                 entry->msi_attrib.is_virtual =
> >                         entry->msi_attrib.entry_nr >= vec_count;
> >
> > -               entry->msi_attrib.default_irq   = dev->irq;
> >                 entry->mask_base                = base;
> >
> >                 addr = pci_msix_desc_addr(entry);
> > @@ -964,8 +961,6 @@ static void pci_msi_shutdown(struct pci_dev *dev)
> >         mask = msi_mask(desc->msi_attrib.multi_cap);
> >         msi_mask_irq(desc, mask, 0);
> >
> > -       /* Restore dev->irq to its default pin-assertion IRQ */
> > -       dev->irq = desc->msi_attrib.default_irq;
> >         pcibios_alloc_irq(dev);
> >  }
> >
> > diff --git a/include/linux/msi.h b/include/linux/msi.h
> > index e8bdcb83172b..a631664c1c38 100644
> > --- a/include/linux/msi.h
> > +++ b/include/linux/msi.h
> > @@ -114,7 +114,6 @@ struct ti_sci_inta_msi_desc {
> >   * @maskbit:   [PCI MSI/X] Mask-Pending bit supported?
> >   * @is_64:     [PCI MSI/X] Address size: 0=32bit 1=64bit
> >   * @entry_nr:  [PCI MSI/X] Entry which is described by this descriptor
> > - * @default_irq:[PCI MSI/X] The default pre-assigned non-MSI irq
> >   * @mask_pos:  [PCI MSI]   Mask register position
> >   * @mask_base: [PCI MSI-X] Mask register base address
> >   * @platform:  [platform]  Platform device specific msi descriptor data
> > @@ -148,7 +147,6 @@ struct msi_desc {
> >                                 u8      is_64           : 1;
> >                                 u8      is_virtual      : 1;
> >                                 u16     entry_nr;
> > -                               unsigned default_irq;
> >                         } msi_attrib;
> >                         union {
> >                                 u8      mask_pos;
> >
> > Thanks,
> >
> >         M.
> >
> > --
> > Without deviation from the norm, progress is not possible.
>

Thanks
barry

  reply	other threads:[~2021-08-21 22:41 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-20 22:37 [PATCH v2 0/2] PCI/MSI: Clarify the IRQ sysfs ABI for PCI devices Barry Song
2021-08-20 22:37 ` [PATCH v2 1/2] PCI/MSI: Fix the confusing IRQ sysfs ABI for MSI-X Barry Song
2021-08-20 23:33   ` Bjorn Helgaas
2021-08-21 10:42     ` Marc Zyngier
2021-08-21 22:14       ` Barry Song
2021-08-21 22:41         ` Barry Song [this message]
2021-08-23 10:33           ` Marc Zyngier
2021-08-24 19:25           ` Bjorn Helgaas
2021-08-23 10:30         ` Marc Zyngier
2021-08-23 11:03           ` Barry Song
2021-08-23 11:28             ` Marc Zyngier
2021-08-23 22:46               ` Barry Song
2021-08-24 19:34                 ` Bjorn Helgaas
2021-08-25  9:45                   ` Marc Zyngier
2021-08-24 20:51       ` Barry Song
2021-08-24 21:29         ` Barry Song
2021-08-25 10:24           ` Marc Zyngier
2021-08-24 22:51             ` Barry Song
2021-08-20 22:37 ` [PATCH v2 2/2] Documentation: ABI: sysfs-bus-pci: Add description for IRQ entry Barry Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGsJ_4w35+mRE_qp117HhNOaHeUN1cO6GGPW36qtjaX6wUcQNA@mail.gmail.com \
    --to=21cnbao@gmail.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=bhelgaas@google.com \
    --cc=bilbao@vt.edu \
    --cc=corbet@lwn.net \
    --cc=gregkh@linuxfoundation.org \
    --cc=helgaas@kernel.org \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=luzmaximilian@gmail.com \
    --cc=maz@kernel.org \
    --cc=mchehab+huawei@kernel.org \
    --cc=schnelle@linux.ibm.com \
    --cc=song.bao.hua@hisilicon.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.