ntb.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Logan Gunthorpe <logang@deltatee.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Bjorn Helgaas <helgaas@kernel.org>, Marc Zygnier <maz@kernel.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	Kevin Tian <kevin.tian@intel.com>,
	Megha Dey <megha.dey@intel.com>, Ashok Raj <ashok.raj@intel.com>,
	linux-pci@vger.kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Jon Mason <jdmason@kudzu.us>, Dave Jiang <dave.jiang@intel.com>,
	Allen Hubbe <allenbh@gmail.com>,
	linux-ntb@googlegroups.com, linux-s390@vger.kernel.org,
	Heiko Carstens <hca@linux.ibm.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	x86@kernel.org, Joerg Roedel <jroedel@suse.de>,
	iommu@lists.linux-foundation.org
Subject: Re: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()
Date: Fri, 03 Dec 2021 16:07:58 +0100	[thread overview]
Message-ID: <877dcl681d.ffs@tglx> (raw)
In-Reply-To: <20211203003749.GT4670@nvidia.com>

Jason,

On Thu, Dec 02 2021 at 20:37, Jason Gunthorpe wrote:
> On Thu, Dec 02, 2021 at 11:31:11PM +0100, Thomas Gleixner wrote:
>> >> Of course we can store them in pci_dev.dev.msi.data.store. Either with a
>> >> dedicated xarray or by partitioning the xarray space. Both have their
>> >> pro and cons.
>> >
>> > This decision seems to drive the question of how many 'struct devices'
>> > do we need, and where do we get them..
>> 
>> Not really. There is nothing what enforces to make the MSI irqdomain
>> storage strictly hang off struct device. There has to be a connection to
>> a struct device in some way obviously to make IOMMU happy.
>
> Yes, I thought this too, OK we agree

One thing which just occured to me and I missed so far is the
association of the irqdomain.

Right now we know for each device which irqdomain it belongs to. That's
part of device discovery (PCI, device tree or whatever) which set's up

     device->irqdomain

That obviously cannot work for that device specific IMS domain. Why?

Because the PCIe device provides MSI[x] which obviously requires that
device->irqdomain is pointing to the PCI/MSI irqdomain. Otherwise the
PCI/MSI allocation mechanism wont work.

Sure each driver can have

     mystruct->ims_irqdomain

and then do the allocation via

    msi_irq_domain_alloc(mystruct->ims_irqdomain....)

but I really need to look at all of the MSI infrastructure code whether
it makes assumptions about this:

   msi_desc->dev->irqdomain

being the correct one. Which would obviously just be correct when we'd
have a per queue struct device :)

>> Just badly named because the table itself is where the resulting message
>> is stored, which is composed with the help of the relevant MSI
>> descriptor. See above.
>
> I picked the name because it looks like it will primarily contain an
> xarray and the API you suggested to query it is idx based. To me that
> is a table. A table of msi_desc storage to be specific.
>
> It seems we have some agreement here as your lfunc also included the
> same xarray and uses an idx as part of the API, right?

It's a per lfunc xarray, yes.

>> We really should not try to make up an artifical table representation
>> for something which does not necessarily have a table at all, i.e. the
>> devices you talk about which store the message in queue specific system
>> memory. Pretending that this is a table is just silly.
>
> Now I'm a bit confused, what is the idx in the lfunc? 
>
> I think this is language again, I would call idx an artificial table
> representation?

Ok. I prefer the term indexed storage, but yeah.

>> I really meant a container like this:
>> 
>> struct logical_function {
>>         /* Pointer to the physical device */
>>         struct device		*phys_device;
>>         /* MSI descriptor storage */
>> 	struct msi_data		msi;
>
> Up to here is basically what I thought the "message table" would
> contain. Possibly a pointer to the iommu_domain too?
>
>>         /* The queue number */
>>         unsigned int		queue_nr;
>>         /* Add more information which is common to these things */
>
> Not sure why do we need this?
>
> Lets imagine a simple probe function for a simple timer device. It has
> no cdev, vfio, queues, etc. However, the device only supports IMS. No
> MSI, MSI-X, nothing else.
>
> What does the probe function look like to get to request_irq()?

The timer device would be represented by what? A struct device?

If so then something needs to set device->irqdomain and then you can
just do:

     msi_irq_domain_alloc_irqs(dev->irqdomain, dev, ...);

> Why does this simple driver create something called a 'logical
> function' to access its only interrupt?

It does not have to when it's struct device based.

> I think you have the right idea here, just forget about VFIO, the IDXD
> use case, all of it. Provide a way to use IMS cleanly and concurrently
> with MSI.
>
> Do that and everything else should have sane solutions too.

To do that, I need to understand the bigger picture and the intended
usage because otherwise we end up with an utter mess.

>> The idea is to have a common representation for these type of things
>> which allows:
>> 
>>  1) Have common code for exposing queues to VFIO, cdev, sysfs...
>> 
>>     You still need myqueue specific code, but the common stuff which is
>>     in struct logical_function can be generic and device independent.
>
> I will quote you: "Seriously, you cannot make something uniform which
> is by definition non-uniform." :)
>
> We will find there is no common stuff here, we did this exercise
> already when we designed struct auxiliary_device, and came up
> empty.

Really?

>>  2) Having the MSI storage per logical function (queue) allows to have
>>     a queue relative 0 based MSI index space.
>
> Can you explain a bit what you see 0 meaning in this idx numbering?
>
> I also don't understand what 'queue relative' means?
>
>>     The actual index in the physical table (think IMS) would be held in
>>     the msi descriptor itself.
>
> This means that a device like IDXD would store the phsical IMS table
> entry # in the msi descriptor? What is idx then?
>
> For a device like IDXD with a simple linear table, how does the driver
> request a specific entry in the IMS table? eg maybe IMS table entry #0
> is special like we often see in MSI?

If there is a hardwired entry then this hardwired entry belongs to the
controlblock (status, error or whatever), but certainly not to a
dynamically allocatable queue as that would defeat the whole purpose.

That hardwired entry could surely exist in a IDXD type setup where the
message storage is in an defined array on the device.

But with the use case you described where you store the message at some
place in per queue system memory, the MSI index is not relevant at all,
because there is no table. So it's completely meaningless for the device
and just useful for finding the related MSI descriptor again.

If the queue or the controlblock have an internal small array of
messages in their queue or controlblock specific memory, then how would
a per device global index help?

Or has each queue and controlblock and whatever access to a shared large
array where the messages are stored and the indices are handed out to
the queues and controlblocks?

If each of them have their own small array, then queue relative indexing
makes a ton of sense, no?

> For the devices I know about there are two approaches for allocating
> interrupts. 

I'm aware of that.

> When I imagine mlx5 using IMS, I see IMS as a simple extension of the
> existing MSI-X vector pool. Every IMS vector is interchangable and the
> main PCI driver will apply exactly the same allocation algorithm we
> already have today for MSI, just with even more vectors. When VFIO
> wants a vector it may get a MSI or it may get an IMS, I don't want to
> care.
>
> All of this about logical functions just confuses
> responsibilities. The IRQ layer should be worrying about configuring
> IRQs and not dictating how the device will design its IRQ assignment
> policy or subdevice scheme.

The IRQ layer _has_ to worry about it because there is a strict
relationship between device and irqdomain today.

Now with a PCIe device having PCI/MSI[X] it's a strict 1:1
relationship. Now add IMS to the picture and we end up with a 1:2
relationship, which does not work today.

That's why my initial reaction was to partition the device into
subdevices which would have solved the problem nicely.

> IMHO this has become hyper focused on the special IDXD VFIO use case -
> step back and think about my timer example above - a simple pci_driver
> that just wants to use IMS for itself. No queues, no VFIO, no
> mess. Just how does it configure and use the interrupt source.

That would mean that the PCI device would not use MSI-X at all, right?
So it could have pci_dev.dev.irqdomain = IMSdomain.

But that sucks too because it's yet another quirk as the PCIe discovery
cannot set that up simply because the device specific IMS domain does
not exist yet.

> Is it helpful feedback?

Yes, definitely. It helps me to understand the bigger picture and to
find a proper representation of these things so that we don't end up
with the usual mess in drivers/* which makes it a fricking nightmare to
change anything at the core code at all.

Thanks,

        tglx

  reply	other threads:[~2021-12-03 15:07 UTC|newest]

Thread overview: 141+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-27  1:23 [patch 00/32] genirq/msi, PCI/MSI: Spring cleaning - Part 2 Thomas Gleixner
2021-11-27  1:23 ` [patch 01/32] genirq/msi: Move descriptor list to struct msi_device_data Thomas Gleixner
2021-11-27 12:19   ` Greg Kroah-Hartman
2021-11-27  1:23 ` [patch 02/32] genirq/msi: Add mutex for MSI list protection Thomas Gleixner
2021-11-27  1:23 ` [patch 03/32] genirq/msi: Provide msi_domain_alloc/free_irqs_descs_locked() Thomas Gleixner
2021-11-27  1:23 ` [patch 04/32] genirq/msi: Provide a set of advanced MSI accessors and iterators Thomas Gleixner
2021-11-28  1:00   ` Jason Gunthorpe
2021-11-28 19:22     ` Thomas Gleixner
2021-11-29  9:26       ` Thomas Gleixner
2021-11-29 14:01         ` Jason Gunthorpe
2021-11-29 14:46           ` Thomas Gleixner
2021-11-27  1:23 ` [patch 05/32] genirq/msi: Provide msi_alloc_msi_desc() and a simple allocator Thomas Gleixner
2021-11-27  1:23 ` [patch 06/32] genirq/msi: Provide domain flags to allocate/free MSI descriptors automatically Thomas Gleixner
2021-11-27  1:23 ` [patch 07/32] genirq/msi: Count the allocated MSI descriptors Thomas Gleixner
2021-11-27 12:19   ` Greg Kroah-Hartman
2021-11-27 19:22     ` Thomas Gleixner
2021-11-27 19:45       ` Thomas Gleixner
2021-11-28 11:07         ` Greg Kroah-Hartman
2021-11-28 19:23           ` Thomas Gleixner
2021-11-27  1:23 ` [patch 08/32] PCI/MSI: Protect MSI operations Thomas Gleixner
2021-11-27  1:23 ` [patch 09/32] PCI/MSI: Use msi_add_msi_desc() Thomas Gleixner
2021-11-27  1:23 ` [patch 10/32] PCI/MSI: Let core code free MSI descriptors Thomas Gleixner
2021-11-27  1:23 ` [patch 11/32] PCI/MSI: Use msi_on_each_desc() Thomas Gleixner
2021-11-27  1:23 ` [patch 12/32] x86/pci/xen: Use msi_for_each_desc() Thomas Gleixner
2021-11-27  1:23 ` [patch 13/32] xen/pcifront: Rework MSI handling Thomas Gleixner
2021-11-27  1:23 ` [patch 14/32] s390/pci: Rework MSI descriptor walk Thomas Gleixner
2021-11-29 10:31   ` Niklas Schnelle
2021-11-29 13:04     ` Thomas Gleixner
2021-11-27  1:23 ` [patch 15/32] powerpc/4xx/hsta: Rework MSI handling Thomas Gleixner
2021-11-27  1:23 ` [patch 16/32] powerpc/cell/axon_msi: Convert to msi_on_each_desc() Thomas Gleixner
2021-11-27  1:23 ` [patch 17/32] powerpc/pasemi/msi: Convert to msi_on_each_dec() Thomas Gleixner
2021-11-27  1:23 ` [patch 18/32] powerpc/fsl_msi: Use msi_for_each_desc() Thomas Gleixner
2021-11-27  1:23 ` [patch 19/32] powerpc/mpic_u3msi: Use msi_for_each-desc() Thomas Gleixner
2021-11-27  1:24 ` [patch 20/32] PCI: hv: Rework MSI handling Thomas Gleixner
2021-11-27  1:24 ` [patch 21/32] NTB/msi: Convert to msi_on_each_desc() Thomas Gleixner
2021-11-29 18:21   ` Logan Gunthorpe
2021-11-29 20:51     ` Thomas Gleixner
2021-11-29 22:27       ` Logan Gunthorpe
2021-11-29 22:50         ` Dave Jiang
2021-11-29 23:31         ` Jason Gunthorpe
2021-11-29 23:52           ` Logan Gunthorpe
2021-11-30  0:01             ` Jason Gunthorpe
2021-11-30  0:29         ` Thomas Gleixner
2021-11-30 19:21           ` Logan Gunthorpe
2021-11-30 19:48             ` Thomas Gleixner
2021-11-30 20:14               ` Logan Gunthorpe
2021-11-30 20:28               ` Jason Gunthorpe
2021-11-30 21:23                 ` Thomas Gleixner
2021-12-01  0:17                   ` Jason Gunthorpe
2021-12-01 10:16                     ` Thomas Gleixner
2021-12-01 13:00                       ` Jason Gunthorpe
2021-12-01 17:35                         ` Thomas Gleixner
2021-12-01 18:14                           ` Jason Gunthorpe
2021-12-01 18:46                             ` Logan Gunthorpe
2021-12-01 20:21                             ` Thomas Gleixner
2021-12-02  0:01                               ` Thomas Gleixner
2021-12-02 13:55                                 ` Jason Gunthorpe
2021-12-02 14:23                                   ` Greg Kroah-Hartman
2021-12-02 14:45                                     ` Jason Gunthorpe
2021-12-02 19:25                                   ` Thomas Gleixner
2021-12-02 20:00                                     ` Jason Gunthorpe
2021-12-02 22:31                                       ` Thomas Gleixner
2021-12-03  0:37                                         ` Jason Gunthorpe
2021-12-03 15:07                                           ` Thomas Gleixner [this message]
2021-12-03 16:41                                             ` Jason Gunthorpe
2021-12-04 14:20                                               ` Thomas Gleixner
2021-12-05 14:16                                                 ` Thomas Gleixner
2021-12-06 14:43                                                   ` Jason Gunthorpe
2021-12-06 15:47                                                     ` Thomas Gleixner
2021-12-06 17:00                                                       ` Jason Gunthorpe
2021-12-06 20:28                                                         ` Thomas Gleixner
2021-12-06 21:06                                                           ` Jason Gunthorpe
2021-12-06 22:21                                                             ` Thomas Gleixner
2021-12-06 14:19                                                 ` Jason Gunthorpe
2021-12-06 15:06                                                   ` Thomas Gleixner
2021-12-09  6:26                                               ` Tian, Kevin
2021-12-09  9:03                                                 ` Thomas Gleixner
2021-12-09 12:17                                                   ` Tian, Kevin
2021-12-09 15:57                                                     ` Thomas Gleixner
2021-12-10  7:37                                                       ` Tian, Kevin
2021-12-09  5:41                                   ` Tian, Kevin
2021-12-09  5:47                                     ` Jason Wang
2021-12-01 16:28                       ` Dave Jiang
2021-12-01 18:41                         ` Thomas Gleixner
2021-12-01 18:47                           ` Dave Jiang
2021-12-01 20:25                             ` Thomas Gleixner
2021-12-01 21:21                               ` Dave Jiang
2021-12-01 21:44                                 ` Thomas Gleixner
2021-12-01 21:49                                   ` Dave Jiang
2021-12-01 22:03                                     ` Thomas Gleixner
2021-12-01 22:53                                       ` Dave Jiang
2021-12-01 23:57                                         ` Thomas Gleixner
2021-12-09  5:23                                   ` Tian, Kevin
2021-12-09  8:37                                     ` Thomas Gleixner
2021-12-09 12:31                                       ` Tian, Kevin
2021-12-09 16:21                                       ` Jason Gunthorpe
2021-12-09 20:32                                         ` Thomas Gleixner
2021-12-09 20:58                                           ` Jason Gunthorpe
2021-12-09 22:09                                             ` Thomas Gleixner
2021-12-10  0:26                                               ` Thomas Gleixner
2021-12-10  7:29                                                 ` Tian, Kevin
2021-12-10 12:13                                                   ` Thomas Gleixner
2021-12-11  8:06                                                     ` Tian, Kevin
2021-12-10 12:39                                                   ` Jason Gunthorpe
2021-12-10 19:00                                                     ` Thomas Gleixner
2021-12-11  7:44                                                       ` Tian, Kevin
2021-12-11 13:04                                                         ` Thomas Gleixner
2021-12-12  1:56                                                           ` Tian, Kevin
2021-12-12 20:55                                                             ` Thomas Gleixner
2021-12-12 23:37                                                               ` Jason Gunthorpe
2021-12-13  7:50                                                                 ` Tian, Kevin
2021-12-11  7:52                                                     ` Tian, Kevin
2021-12-12  0:12                                                       ` Thomas Gleixner
2021-12-12  2:14                                                         ` Tian, Kevin
2021-12-12 20:50                                                           ` Thomas Gleixner
2021-12-12 23:42                                                         ` Jason Gunthorpe
2021-12-10  7:36                                             ` Tian, Kevin
2021-12-10 12:30                                               ` Jason Gunthorpe
2021-12-12  6:44                                               ` Mika Penttilä
2021-12-12 23:27                                                 ` Jason Gunthorpe
2021-12-01 14:52                   ` Thomas Gleixner
2021-12-01 15:11                     ` Jason Gunthorpe
2021-12-01 18:37                       ` Thomas Gleixner
2021-12-01 18:47                         ` Jason Gunthorpe
2021-12-01 20:26                           ` Thomas Gleixner
2021-11-27  1:24 ` [patch 22/32] soc: ti: ti_sci_inta_msi: Rework MSI descriptor allocation Thomas Gleixner
2021-11-27  1:24 ` [patch 23/32] soc: ti: ti_sci_inta_msi: Remove ti_sci_inta_msi_domain_free_irqs() Thomas Gleixner
2021-11-27  1:24 ` [patch 24/32] bus: fsl-mc-msi: Simplify MSI descriptor handling Thomas Gleixner
2021-11-27  1:24 ` [patch 25/32] platform-msi: Let core code handle MSI descriptors Thomas Gleixner
2021-11-27  1:24 ` [patch 26/32] platform-msi: Simplify platform device MSI code Thomas Gleixner
2021-11-27  1:24 ` [patch 27/32] genirq/msi: Make interrupt allocation less convoluted Thomas Gleixner
2021-11-27  1:24 ` [patch 28/32] genirq/msi: Convert to new functions Thomas Gleixner
2021-11-27  1:24 ` [patch 29/32] genirq/msi: Mop up old interfaces Thomas Gleixner
2021-11-27  1:24 ` [patch 30/32] genirq/msi: Add abuse prevention comment to msi header Thomas Gleixner
2021-11-27  1:24 ` [patch 31/32] genirq/msi: Simplify sysfs handling Thomas Gleixner
2021-11-27 12:32   ` Greg Kroah-Hartman
2021-11-27 19:31     ` Thomas Gleixner
2021-11-28 11:07       ` Greg Kroah-Hartman
2021-11-28 19:33         ` Thomas Gleixner
2021-11-27  1:24 ` [patch 32/32] genirq/msi: Convert storage to xarray Thomas Gleixner
2021-11-27 12:33   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=877dcl681d.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=alex.williamson@redhat.com \
    --cc=allenbh@gmail.com \
    --cc=ashok.raj@intel.com \
    --cc=borntraeger@de.ibm.com \
    --cc=dave.jiang@intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hca@linux.ibm.com \
    --cc=helgaas@kernel.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jdmason@kudzu.us \
    --cc=jgg@nvidia.com \
    --cc=jroedel@suse.de \
    --cc=kevin.tian@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-ntb@googlegroups.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=logang@deltatee.com \
    --cc=maz@kernel.org \
    --cc=megha.dey@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).