From: Thomas Gleixner <tglx@linutronix.de>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Logan Gunthorpe <logang@deltatee.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Bjorn Helgaas <helgaas@kernel.org>, Marc Zyngier <maz@kernel.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	Kevin Tian <kevin.tian@intel.com>,
	Megha Dey <megha.dey@intel.com>, Ashok Raj <ashok.raj@intel.com>,
	linux-pci@vger.kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Jon Mason <jdmason@kudzu.us>, Dave Jiang <dave.jiang@intel.com>,
	Allen Hubbe <allenbh@gmail.com>,
	linux-ntb@googlegroups.com, linux-s390@vger.kernel.org,
	Heiko Carstens <hca@linux.ibm.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	x86@kernel.org, Joerg Roedel <jroedel@suse.de>,
	iommu@lists.linux-foundation.org
Subject: Re: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()
Date: Thu, 02 Dec 2021 23:31:11 +0100
Message-ID: <87o85y63m8.ffs@tglx>
In-Reply-To: <20211202200017.GS4670@nvidia.com>

Jason,

On Thu, Dec 02 2021 at 16:00, Jason Gunthorpe wrote:
> On Thu, Dec 02, 2021 at 08:25:48PM +0100, Thomas Gleixner wrote:
>> We seem to have a serious problem of terminology and the understanding
>> of topology which is why we continue to talk past each other forever.
>
> I think I understand and agree with everything you said below.

Good!

> The point we diverge is where to put the vector storage:

Kinda. The vector, i.e. the message storage, is one of:

  - MSI entry in the PCI config space
  - MSI-X table in the PCI config space
  - Device specific IMS storage

The software representation aka struct msi_desc is a different
story. That's what we are debating.
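
To make that split concrete, here is a simplified sketch; these are not
the actual kernel structures, the names and fields are illustrative
only:

struct msg_sketch {
	u32	address_lo;	/* target address, low 32 bits */
	u32	address_hi;	/* target address, high 32 bits */
	u32	data;		/* payload identifying the vector */
};

struct desc_sketch {
	unsigned int	  index;	/* slot in the storage (MSI-X table, IMS, ...) */
	void		  *storage;	/* where the message has to be written */
	struct msg_sketch msg;		/* last composed message */
};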

>> Of course we can store them in pci_dev.dev.msi.data.store. Either with a
>> dedicated xarray or by partitioning the xarray space. Both have their
>> pro and cons.
>
> This decision seems to drive the question of how many 'struct devices'
> do we need, and where do we get them..

Not really. There is nothing that forces the MSI irqdomain storage to
hang strictly off struct device. Obviously there has to be a connection
to a struct device in some way to make the IOMMU happy.

>> Such a logical function would be the entity to hand out for VFIO or
>> cdev.
>
> What is a logical function, concretely?

That's a name I came up with for an abstract representation of such a
queue container, as an obvious consequence of my previous reasoning
about PF -> VF -> XF.

> Does it have struct device?

It does not necessarily have to.

> Can I instead suggest a name like 'message interrupt table' ?

Well yes, but that's not what I meant. See below.

> Ie a device has two linearly indexed message interrupt tables - the
> PCI SIG defined MSI/MSI-X one created by the PCI core and the IMS one
> created by the driver.
>
> Both start at 0 index and they have different irq_domains.
>
> Instead of asking the driver to create a domain we ask the driver to
> create a new 'message interrupt table'. The driver provides the
> irq_chip to program the messages and the pci_device. The core code
> manages the irq domain setup.
>
> Using what you say below:
>
>> If this is not split out, then every driver and wrapper has to come up
>> with its own representation of this instead of being able to do:
>> 
>>      request_irq(msi_get_virq(lfunc, idx=0), handler0, ...);
>>      request_irq(msi_get_virq(lfunc, idx=1), handler1, ...);
>
> We could say:
>   msi_get_virq(device.pci_msi_table, index=0)
>
> Is the 0th PCI SIG MSI vector
>
> Something like:
>
>  ims_table = pci_create_msi_table(pci_dev, my_irq_chip,..)
>  msi_get_virq(ims_table, index=0)

Which is pretty much a wrapper around two different irqdomains for the
device and either a partitioned index space in the xarray or two
xarrays.

It's just badly named, because the table itself is where the resulting
message is stored, and that message is composed with the help of the
relevant MSI descriptor. See above.
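
Just for illustration, the partitioned variant could be as simple as
reserving a few upper bits of the xarray index for the irqdomain; a
rough sketch with made-up names:

#define MSI_DOMAIN_SHIFT	16U
#define MSI_DOMAIN_PCI		0U	/* PCI MSI/MSI-X irqdomain */
#define MSI_DOMAIN_IMS		1U	/* device specific IMS irqdomain */

/* Sketch only: encode the irqdomain in the upper bits of the index */
static inline unsigned long msi_storage_index(unsigned int domid, unsigned int idx)
{
	return ((unsigned long)domid << MSI_DOMAIN_SHIFT) | idx;
}

/*
 * The descriptor for the 0th IMS vector would then live at index
 * msi_storage_index(MSI_DOMAIN_IMS, 0) in dev.msi.data.store, while the
 * two xarray variant avoids the encoding entirely.
 */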

We really should not try to make up an artificial table representation
for something which does not necessarily have a table at all, i.e. the
devices you talk about which store the message in queue specific system
memory. Pretending that this is a table is just silly.

Also I disagree that this has to be tied to a PCI specific interface,
except for a PCI specific wrapper so that a driver developer does not
have to write '&pdev->dev', which is the very least of our problems.

IMS as a technical concept is absolutely not PCI specific at all and was
not invented by the PCI-SIG. It's a marketing brand name for something
which existed way before they thought about it: message signaled
interrupts.

Aside from that, 'my_irq_chip' does not cut it at all because of the way
the resulting messages are stored. IDXD has IOMEM storage and a storage
space limitation, while your device uses system memory storage and has
other limitations, i.e. system memory and the number of queues the
device can provide.

An irqchip is just a set of functions to talk to hardware, either
directly or via some indirect transport (I2C, SPI, MLX queue management
magic...). These functions require irqdomain and/or device specific
information to do their job.
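
To make the storage difference concrete, here is a sketch of two such
functions. The callback shape matches irq_chip::irq_write_msi_msg(); the
queue context structure and the register offsets are made up for
illustration:

struct my_queue_ctx {		/* made up queue context */
	u64	msi_addr;
	u32	msi_data;
};

/* IOMEM backed storage (IDXD style): write the message into a table slot */
static void iomem_write_msi_msg(struct irq_data *data, struct msi_msg *msg)
{
	void __iomem *slot = irq_data_get_irq_chip_data(data);

	writel(msg->address_lo, slot + 0x0);
	writel(msg->address_hi, slot + 0x4);
	writel(msg->data, slot + 0x8);
}

/* System memory backed storage: the message lives in the queue context */
static void queue_mem_write_msi_msg(struct irq_data *data, struct msi_msg *msg)
{
	struct my_queue_ctx *ctx = irq_data_get_irq_chip_data(data);

	ctx->msi_addr = ((u64)msg->address_hi << 32) | msg->address_lo;
	ctx->msi_data = msg->data;
	/* plus whatever device specific magic makes the update take effect */
}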

Trying to create a universal pci_create_foo() wrapper around this is
going to be like the 13th Herculean task.

Seriously, you cannot make something uniform which is by definition
non-uniform.

Let's not even try to pretend that it is possible.

> Is the 0th IMS vector
>
> Is it close to what you are thinking with lfunc?

Not really. I was really reasoning about an abstract representation for
a functional queue, which is more than just a queue allocated from the
PF or VF device.

I really meant a container like this:

struct logical_function {
	/* Pointer to the physical device */
	struct device		*phys_device;
	/* MSI descriptor storage */
	struct msi_data		msi;
	/* The queue number */
	unsigned int		queue_nr;
	/* Add more information which is common to these things */
};

Now the real queue, which is obviously not generic:

struct myqueue_function {
	struct logical_function	lfunc;
	struct myqueue		queue;
};
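
Common code would only ever see the struct logical_function pointer; the
driver gets back to its own queue representation with the usual
container_of() pattern:

static inline struct myqueue_function *to_myqueue(struct logical_function *lfunc)
{
	/* Recover the device specific queue from the generic part */
	return container_of(lfunc, struct myqueue_function, lfunc);
}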

The idea is to have a common representation for these type of things
which allows:

 1) Have common code for exposing queues to VFIO, cdev, sysfs...

    You still need myqueue specific code, but the common stuff which is
    in struct logical_function can be generic and device independent.

 2) Having the MSI storage per logical function (queue) makes it
    possible to have a queue relative, 0 based MSI index space.

    The actual index in the physical table (think IMS) would be held in
    the msi descriptor itself (a rough sketch of this follows after 3).

    Which then allows queue relative addressing without extra
    device/queue specific meta storage.

    i.e.

        msi_get_virq(&myqueue->lfunc.msi, idx = 0)

    v.s.

        idx = myqueue->msidx[0];
        msi_get_virq(pcidev->dev, idx);

        where the queue management code has to set up myqueue->msidx[]
        and stick the index of the underlying device storage into it.

 3) Setup and teardown would then simply be per logical function for
    all of the related resources which are required.

    Interrupt teardown would look like this:

      msi_domain_free_all_irqs(irqdomain, &lfunc->msi);

    vs.

      for (i = 0; i < myqueue->nrirqs; i++)
           msi_domain_free_irq(irqdomain, &pcidev->dev, myqueue->msidx[i]);
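
The 'index held in the descriptor' part from 2) sketched out with
made-up fields (this is not the real struct msi_desc):

struct msi_desc_sketch {
	unsigned int	index;		/* queue relative, 0 based */
	unsigned int	hw_slot;	/* slot in the physical IMS storage */
	/* message, masking state, ... */
};

/*
 * msi_get_virq(&myqueue->lfunc.msi, 0) resolves index 0 within the
 * lfunc local MSI storage, and the irqchip uses desc->hw_slot to
 * address the real hardware entry, so no myqueue->msidx[] translation
 * is needed.
 */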


Now change struct logical_function to:

struct logical_function {
-       /* Pointer to the physical device */
-       struct device		*phys_device;

+       /* Pseudo device to allow using devres */
+       struct pseudo_device	pseudo_device;

	/* MSI descriptor storage */
	struct msi_data		msi;
        /* The queue number */
        unsigned int		queue_nr;
        /* Add more information which is common to these things */
};

where struct pseudo_device holds the phys_device pointer. You can then
utilize the devres infrastructure like you do for any other device and
do:

      pseudo_device_add(&myqueue->lfunc.pseudo_device);

at setup time and

      pseudo_device_remove(&myqueue->lfunc.pseudo_device);

on teardown and let all the resources including MSI interrupts be
released automatically.
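
A minimal sketch of what such a pseudo device could look like; none of
this exists, the names are made up, and naming, release callback and
error handling are omitted:

struct pseudo_device {
	struct device	dev;		/* provides devres, sysfs, ... */
	struct device	*phys_device;	/* the underlying physical device */
};

static inline int pseudo_device_add(struct pseudo_device *pdev)
{
	pdev->dev.parent = pdev->phys_device;
	device_initialize(&pdev->dev);
	return device_add(&pdev->dev);
}

static inline void pseudo_device_remove(struct pseudo_device *pdev)
{
	device_del(&pdev->dev);
	/* The final put releases all devres managed resources */
	put_device(&pdev->dev);
}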

Needs some infrastructure obviously, but to me that makes a lot of
sense.

And I named it pseudo_device on purpose, as it is just a vehicle to make
existing device specific infrastructure usable for this kind of thing.

I might be completely off track. Feel free to tell me so :)

Thanks,

        tglx

