ntb.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Logan Gunthorpe <logang@deltatee.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Bjorn Helgaas <helgaas@kernel.org>, Marc Zygnier <maz@kernel.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	Kevin Tian <kevin.tian@intel.com>,
	Megha Dey <megha.dey@intel.com>, Ashok Raj <ashok.raj@intel.com>,
	linux-pci@vger.kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Jon Mason <jdmason@kudzu.us>, Dave Jiang <dave.jiang@intel.com>,
	Allen Hubbe <allenbh@gmail.com>,
	linux-ntb@googlegroups.com, linux-s390@vger.kernel.org,
	Heiko Carstens <hca@linux.ibm.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	x86@kernel.org, Joerg Roedel <jroedel@suse.de>,
	iommu@lists.linux-foundation.org,
	Kalle Valo <kvalo@codeaurora.org>
Subject: Re: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()
Date: Mon, 06 Dec 2021 21:28:47 +0100	[thread overview]
Message-ID: <875ys14gw0.ffs@tglx> (raw)
In-Reply-To: <20211206170035.GJ4670@nvidia.com>

Jason,

On Mon, Dec 06 2021 at 13:00, Jason Gunthorpe wrote:
> On Mon, Dec 06, 2021 at 04:47:58PM +0100, Thomas Gleixner wrote:
>> It will need some more than that, e.g. mask/unmask and as we discussed
>> quite some time ago something like the irq_buslock/unlock pair, so you
>> can handle updates to the state from thread context via a command queue
>> (IIRC).
>
> pci_msi_create_irq_domain() hooks into the platforms irq_chip as an
> alternative to hierarchical irq domains (?)

It's based on hierarchical irqdomains. It's the outermost domain and irq
chip. On X86:

  [VECTOR chip=apic]->[PCI chip=PCI]
or
  [VECTOR chip=apic]->[IOMMU chip=IR]->[PCI chip=PCI-IR]

The chips are different because of quirks. See below :)

>		chip->irq_write_msi_msg = pci_msi_domain_write_msg;
> In almost all cases 'ops' will come along with a 'state', so lets
> create one:
>
> struct msi_storage {  // Look, I avoided the word table!

I already made a note, that I need to smuggle a table in somewhere :)

> And others for the different cases. Look no ifs!
>
> OK?

That's already the plan in some form, but there's a long way towards
that. See below.

Also there will be a question of how many different callbacks we're
going to create just to avoid one conditional. At some point this might
become silly.

> Now, we have some duplication between the struct msi_storage_ops and
> the struct irq_chip. Let's see what that is about:
>
> arch/x86/kernel/apic/msi.c:     .irq_write_msi_msg      = dmar_msi_write_msg,
> arch/x86/kernel/hpet.c: .irq_write_msi_msg = hpet_msi_write_msg,
>
> Surprised! These are actually IMS. The HPET and DMAR devices both have
> device-specific message storage! So these could use
> msi_storage_ops. And WTF is IOMMU DMAR driver code doing in
> apic/msi.c ???

Historical reasons coming from the pre irqdomain aera. Also DMAR needs
direct access to the x86 low level composer which we didn't want to
expose. Plus DMAR is shared with ia64 to make it more interesting.

Yes, they can be converted. But that's the least of my worries. Those
are straight forward and not really relevant for the design.

> arch/powerpc/platforms/pseries/msi.c:   .irq_write_msi_msg      = pseries_msi_write_msg,
>
> AFAICT this is really like virtualization? The hypervisor is
> controlling the real MSI table and what the OS sees is faked out
> somewhat.
>
> This is more of quirk in the PCI MSI implementation (do not touch the
> storage) and a block on non-PCI uses of MSI similar to what x86 needs?

There is an underlying hypervisor of some sorts and that stuff needs to
deal with it. I leave that to the powerpc wizards to sort out.

> drivers/irqchip/irq-gic-v2m.c:  .irq_write_msi_msg      = pci_msi_domain_write_msg,
> drivers/irqchip/irq-gic-v3-its-pci-msi.c:       .irq_write_msi_msg      = pci_msi_domain_write_msg,
> drivers/irqchip/irq-gic-v3-mbi.c:       .irq_write_msi_msg      = pci_msi_domain_write_msg,
>
> ARM seems to be replacing the 'mask at source' with 'mask at
> destination' - I wonder why?

Because the majority of PCI/MSI endpoint implementations do not provide
masking...

We're telling hardware people for 15+ years that this is a horrible
idea, but it's as effective as talking to a wall. Sure the spec grants
them to make my life miserable...

> Should this really be hierarchical where we mask *both* the MSI
> originating device (storage_ops->mask) and at the CPU IRQ controller?
> (gicv2m_mask_msi_irq ?) if it can?

I wish I could mask underneath for some stuff on x86. Though that would
not help with the worst problem vs. affinity settings. See the horrible
dance in:

    x86/kernel/apic/msi.c::msi_set_affinity()

So this will end up with a shim as the base domain for !IOMMU systems:

		 			       |--[HPET]
  [VECTOR chip=apic]-|--[x86-msi chip=x86-msi]-|--[PCI/MSI]
		     |--[DMAR]		       |--[PCI/MSI-X]

That nonsense can't move into the apic domain set_affinity() callback as
this is not required when interrupt remapping is enabled.

With IOMMU this looks then:

		 		        |--[HPET]
  [VECTOR chip=apic]-|--[IOMMU chip=IR]-|--[PCI/MSI]
		     |--[DMAR]	        |--[PCI/MSI-X]
		 		        |--[PCI/IMS]

> drivers/base/platform-msi.c:            chip->irq_write_msi_msg = platform_msi_write_msg;
>
> Oh! this is doing what I kind of just suggested, just non-generically
> and hacked into platform bus drivers the same as PCI does:

Correct. It's a hack and it's on the list of things which need to
vanish. I was already discussing that with Marc on the side for quite a
while.

> PCI, HPET, DMAR move to msi_storage_ops instead of using irq_chip

With different parent domains. DMAR hangs always directly off the vector
domain. HPET has its own IOMMU zone.

You forgot IO/APIC which is a MSI endpoint too, just more convoluted but
it's not using MSI domains so it's not in the way. I'm not going to
touch that with a ten foot pole. :)

There's also VMD, HyperV and the XEN crime which is a horrible shim to
make device->msi_domain consistent on x86. For fixing XEN properly I'm
not masochistic enough.

> For API compat every pci struct device will have to instantiate a
> msi_storage someplace, but that seems easy enough.

That's easy to hide in the existing driver interfaces for PCI/MSI and
PCI/MSI-X.

> Seems like a nice uniform solution?

That's where I'm heading.

I have a full inventory of the various horrors involved, so I have a
pretty good picture what kind of oddities are involved, where a shim
domain is required and which underlying platforms require the MSI irq
chip to do:

    irq_chip_mask()
       msi_ops->mask()
       parent->chip->mask()

and so forth. I need to figure out how the parent irq chip / irqdomain
transports that requirement.

But that part is not where the real work is. I'll get there eventually
once I sorted the underlying parts:

   - Building up the infrastructure in kernel/irq/

   - Decomposing pci/msi/* further

   - Make use of the infrastructure for an alternate pci/msi
     implemention.
   
   - Have a transition mechanism to convert one part at a time to keep
     the patch sizes reviewable and the whole mess bisectable.

Doing all this while keeping the full universe of legacy/arch, current
PCI/MSI domains alive makes that interesting. I broke the world in the
past, so I'm not afraid of doing it again. Though I try to avoid it to
the extent possible. :)

I have a pretty good picture in my head and notes already, which needs
to be dumped into code. But let me post part 1-3 V2 first, so that pile
gets out of the way. Not having to juggle 90 patches makes life easier.

Thanks,

        tglx

  reply	other threads:[~2021-12-06 20:28 UTC|newest]

Thread overview: 141+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-27  1:23 [patch 00/32] genirq/msi, PCI/MSI: Spring cleaning - Part 2 Thomas Gleixner
2021-11-27  1:23 ` [patch 01/32] genirq/msi: Move descriptor list to struct msi_device_data Thomas Gleixner
2021-11-27 12:19   ` Greg Kroah-Hartman
2021-11-27  1:23 ` [patch 02/32] genirq/msi: Add mutex for MSI list protection Thomas Gleixner
2021-11-27  1:23 ` [patch 03/32] genirq/msi: Provide msi_domain_alloc/free_irqs_descs_locked() Thomas Gleixner
2021-11-27  1:23 ` [patch 04/32] genirq/msi: Provide a set of advanced MSI accessors and iterators Thomas Gleixner
2021-11-28  1:00   ` Jason Gunthorpe
2021-11-28 19:22     ` Thomas Gleixner
2021-11-29  9:26       ` Thomas Gleixner
2021-11-29 14:01         ` Jason Gunthorpe
2021-11-29 14:46           ` Thomas Gleixner
2021-11-27  1:23 ` [patch 05/32] genirq/msi: Provide msi_alloc_msi_desc() and a simple allocator Thomas Gleixner
2021-11-27  1:23 ` [patch 06/32] genirq/msi: Provide domain flags to allocate/free MSI descriptors automatically Thomas Gleixner
2021-11-27  1:23 ` [patch 07/32] genirq/msi: Count the allocated MSI descriptors Thomas Gleixner
2021-11-27 12:19   ` Greg Kroah-Hartman
2021-11-27 19:22     ` Thomas Gleixner
2021-11-27 19:45       ` Thomas Gleixner
2021-11-28 11:07         ` Greg Kroah-Hartman
2021-11-28 19:23           ` Thomas Gleixner
2021-11-27  1:23 ` [patch 08/32] PCI/MSI: Protect MSI operations Thomas Gleixner
2021-11-27  1:23 ` [patch 09/32] PCI/MSI: Use msi_add_msi_desc() Thomas Gleixner
2021-11-27  1:23 ` [patch 10/32] PCI/MSI: Let core code free MSI descriptors Thomas Gleixner
2021-11-27  1:23 ` [patch 11/32] PCI/MSI: Use msi_on_each_desc() Thomas Gleixner
2021-11-27  1:23 ` [patch 12/32] x86/pci/xen: Use msi_for_each_desc() Thomas Gleixner
2021-11-27  1:23 ` [patch 13/32] xen/pcifront: Rework MSI handling Thomas Gleixner
2021-11-27  1:23 ` [patch 14/32] s390/pci: Rework MSI descriptor walk Thomas Gleixner
2021-11-29 10:31   ` Niklas Schnelle
2021-11-29 13:04     ` Thomas Gleixner
2021-11-27  1:23 ` [patch 15/32] powerpc/4xx/hsta: Rework MSI handling Thomas Gleixner
2021-11-27  1:23 ` [patch 16/32] powerpc/cell/axon_msi: Convert to msi_on_each_desc() Thomas Gleixner
2021-11-27  1:23 ` [patch 17/32] powerpc/pasemi/msi: Convert to msi_on_each_dec() Thomas Gleixner
2021-11-27  1:23 ` [patch 18/32] powerpc/fsl_msi: Use msi_for_each_desc() Thomas Gleixner
2021-11-27  1:23 ` [patch 19/32] powerpc/mpic_u3msi: Use msi_for_each-desc() Thomas Gleixner
2021-11-27  1:24 ` [patch 20/32] PCI: hv: Rework MSI handling Thomas Gleixner
2021-11-27  1:24 ` [patch 21/32] NTB/msi: Convert to msi_on_each_desc() Thomas Gleixner
2021-11-29 18:21   ` Logan Gunthorpe
2021-11-29 20:51     ` Thomas Gleixner
2021-11-29 22:27       ` Logan Gunthorpe
2021-11-29 22:50         ` Dave Jiang
2021-11-29 23:31         ` Jason Gunthorpe
2021-11-29 23:52           ` Logan Gunthorpe
2021-11-30  0:01             ` Jason Gunthorpe
2021-11-30  0:29         ` Thomas Gleixner
2021-11-30 19:21           ` Logan Gunthorpe
2021-11-30 19:48             ` Thomas Gleixner
2021-11-30 20:14               ` Logan Gunthorpe
2021-11-30 20:28               ` Jason Gunthorpe
2021-11-30 21:23                 ` Thomas Gleixner
2021-12-01  0:17                   ` Jason Gunthorpe
2021-12-01 10:16                     ` Thomas Gleixner
2021-12-01 13:00                       ` Jason Gunthorpe
2021-12-01 17:35                         ` Thomas Gleixner
2021-12-01 18:14                           ` Jason Gunthorpe
2021-12-01 18:46                             ` Logan Gunthorpe
2021-12-01 20:21                             ` Thomas Gleixner
2021-12-02  0:01                               ` Thomas Gleixner
2021-12-02 13:55                                 ` Jason Gunthorpe
2021-12-02 14:23                                   ` Greg Kroah-Hartman
2021-12-02 14:45                                     ` Jason Gunthorpe
2021-12-02 19:25                                   ` Thomas Gleixner
2021-12-02 20:00                                     ` Jason Gunthorpe
2021-12-02 22:31                                       ` Thomas Gleixner
2021-12-03  0:37                                         ` Jason Gunthorpe
2021-12-03 15:07                                           ` Thomas Gleixner
2021-12-03 16:41                                             ` Jason Gunthorpe
2021-12-04 14:20                                               ` Thomas Gleixner
2021-12-05 14:16                                                 ` Thomas Gleixner
2021-12-06 14:43                                                   ` Jason Gunthorpe
2021-12-06 15:47                                                     ` Thomas Gleixner
2021-12-06 17:00                                                       ` Jason Gunthorpe
2021-12-06 20:28                                                         ` Thomas Gleixner [this message]
2021-12-06 21:06                                                           ` Jason Gunthorpe
2021-12-06 22:21                                                             ` Thomas Gleixner
2021-12-06 14:19                                                 ` Jason Gunthorpe
2021-12-06 15:06                                                   ` Thomas Gleixner
2021-12-09  6:26                                               ` Tian, Kevin
2021-12-09  9:03                                                 ` Thomas Gleixner
2021-12-09 12:17                                                   ` Tian, Kevin
2021-12-09 15:57                                                     ` Thomas Gleixner
2021-12-10  7:37                                                       ` Tian, Kevin
2021-12-09  5:41                                   ` Tian, Kevin
2021-12-09  5:47                                     ` Jason Wang
2021-12-01 16:28                       ` Dave Jiang
2021-12-01 18:41                         ` Thomas Gleixner
2021-12-01 18:47                           ` Dave Jiang
2021-12-01 20:25                             ` Thomas Gleixner
2021-12-01 21:21                               ` Dave Jiang
2021-12-01 21:44                                 ` Thomas Gleixner
2021-12-01 21:49                                   ` Dave Jiang
2021-12-01 22:03                                     ` Thomas Gleixner
2021-12-01 22:53                                       ` Dave Jiang
2021-12-01 23:57                                         ` Thomas Gleixner
2021-12-09  5:23                                   ` Tian, Kevin
2021-12-09  8:37                                     ` Thomas Gleixner
2021-12-09 12:31                                       ` Tian, Kevin
2021-12-09 16:21                                       ` Jason Gunthorpe
2021-12-09 20:32                                         ` Thomas Gleixner
2021-12-09 20:58                                           ` Jason Gunthorpe
2021-12-09 22:09                                             ` Thomas Gleixner
2021-12-10  0:26                                               ` Thomas Gleixner
2021-12-10  7:29                                                 ` Tian, Kevin
2021-12-10 12:13                                                   ` Thomas Gleixner
2021-12-11  8:06                                                     ` Tian, Kevin
2021-12-10 12:39                                                   ` Jason Gunthorpe
2021-12-10 19:00                                                     ` Thomas Gleixner
2021-12-11  7:44                                                       ` Tian, Kevin
2021-12-11 13:04                                                         ` Thomas Gleixner
2021-12-12  1:56                                                           ` Tian, Kevin
2021-12-12 20:55                                                             ` Thomas Gleixner
2021-12-12 23:37                                                               ` Jason Gunthorpe
2021-12-13  7:50                                                                 ` Tian, Kevin
2021-12-11  7:52                                                     ` Tian, Kevin
2021-12-12  0:12                                                       ` Thomas Gleixner
2021-12-12  2:14                                                         ` Tian, Kevin
2021-12-12 20:50                                                           ` Thomas Gleixner
2021-12-12 23:42                                                         ` Jason Gunthorpe
2021-12-10  7:36                                             ` Tian, Kevin
2021-12-10 12:30                                               ` Jason Gunthorpe
2021-12-12  6:44                                               ` Mika Penttilä
2021-12-12 23:27                                                 ` Jason Gunthorpe
2021-12-01 14:52                   ` Thomas Gleixner
2021-12-01 15:11                     ` Jason Gunthorpe
2021-12-01 18:37                       ` Thomas Gleixner
2021-12-01 18:47                         ` Jason Gunthorpe
2021-12-01 20:26                           ` Thomas Gleixner
2021-11-27  1:24 ` [patch 22/32] soc: ti: ti_sci_inta_msi: Rework MSI descriptor allocation Thomas Gleixner
2021-11-27  1:24 ` [patch 23/32] soc: ti: ti_sci_inta_msi: Remove ti_sci_inta_msi_domain_free_irqs() Thomas Gleixner
2021-11-27  1:24 ` [patch 24/32] bus: fsl-mc-msi: Simplify MSI descriptor handling Thomas Gleixner
2021-11-27  1:24 ` [patch 25/32] platform-msi: Let core code handle MSI descriptors Thomas Gleixner
2021-11-27  1:24 ` [patch 26/32] platform-msi: Simplify platform device MSI code Thomas Gleixner
2021-11-27  1:24 ` [patch 27/32] genirq/msi: Make interrupt allocation less convoluted Thomas Gleixner
2021-11-27  1:24 ` [patch 28/32] genirq/msi: Convert to new functions Thomas Gleixner
2021-11-27  1:24 ` [patch 29/32] genirq/msi: Mop up old interfaces Thomas Gleixner
2021-11-27  1:24 ` [patch 30/32] genirq/msi: Add abuse prevention comment to msi header Thomas Gleixner
2021-11-27  1:24 ` [patch 31/32] genirq/msi: Simplify sysfs handling Thomas Gleixner
2021-11-27 12:32   ` Greg Kroah-Hartman
2021-11-27 19:31     ` Thomas Gleixner
2021-11-28 11:07       ` Greg Kroah-Hartman
2021-11-28 19:33         ` Thomas Gleixner
2021-11-27  1:24 ` [patch 32/32] genirq/msi: Convert storage to xarray Thomas Gleixner
2021-11-27 12:33   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=875ys14gw0.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=alex.williamson@redhat.com \
    --cc=allenbh@gmail.com \
    --cc=ashok.raj@intel.com \
    --cc=borntraeger@de.ibm.com \
    --cc=dave.jiang@intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hca@linux.ibm.com \
    --cc=helgaas@kernel.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jdmason@kudzu.us \
    --cc=jgg@nvidia.com \
    --cc=jroedel@suse.de \
    --cc=kevin.tian@intel.com \
    --cc=kvalo@codeaurora.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-ntb@googlegroups.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=logang@deltatee.com \
    --cc=maz@kernel.org \
    --cc=megha.dey@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).