From: Jason Gunthorpe <jgg@nvidia.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>, <x86@kernel.org>,
Marc Zyngier <maz@kernel.org>, Megha Dey <megha.dey@intel.com>,
Dave Jiang <dave.jiang@intel.com>,
Alex Williamson <alex.williamson@redhat.com>,
"Jacob Pan" <jacob.jun.pan@intel.com>,
Baolu Lu <baolu.lu@intel.com>, Kevin Tian <kevin.tian@intel.com>,
Dan Williams <dan.j.williams@intel.com>,
Joerg Roedel <joro@8bytes.org>,
<iommu@lists.linux-foundation.org>,
<linux-hyperv@vger.kernel.org>,
Haiyang Zhang <haiyangz@microsoft.com>,
"Jon Derrick" <jonathan.derrick@intel.com>,
Lu Baolu <baolu.lu@linux.intel.com>, Wei Liu <wei.liu@kernel.org>,
"K. Y. Srinivasan" <kys@microsoft.com>,
"Stephen Hemminger" <sthemmin@microsoft.com>,
Steve Wahl <steve.wahl@hpe.com>,
"Dimitri Sivanich" <sivanich@hpe.com>,
Russ Anderson <rja@hpe.com>, <linux-pci@vger.kernel.org>,
Bjorn Helgaas <bhelgaas@google.com>,
"Lorenzo Pieralisi" <lorenzo.pieralisi@arm.com>,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
<xen-devel@lists.xenproject.org>, Juergen Gross <jgross@suse.com>,
Boris Ostrovsky <boris.ostrovsky@oracle.com>,
"Stefano Stabellini" <sstabellini@kernel.org>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
"Rafael J. Wysocki" <rafael@kernel.org>
Subject: Re: [patch RFC 38/38] irqchip: Add IMS array driver - NOT FOR MERGING
Date: Fri, 21 Aug 2020 17:17:05 -0300 [thread overview]
Message-ID: <20200821201705.GA2811871@nvidia.com> (raw)
In-Reply-To: <874kovsrvk.fsf@nanos.tec.linutronix.de>
On Fri, Aug 21, 2020 at 09:47:43PM +0200, Thomas Gleixner wrote:
> On Fri, Aug 21 2020 at 09:45, Jason Gunthorpe wrote:
> > On Fri, Aug 21, 2020 at 02:25:02AM +0200, Thomas Gleixner wrote:
> >> +static void ims_mask_irq(struct irq_data *data)
> >> +{
> >> + struct msi_desc *desc = irq_data_get_msi_desc(data);
> >> + struct ims_array_slot __iomem *slot = desc->device_msi.priv_iomem;
> >> + u32 __iomem *ctrl = &slot->ctrl;
> >> +
> >> + iowrite32(ioread32(ctrl) & ~IMS_VECTOR_CTRL_UNMASK, ctrl);
> >
> > Just to be clear, this is exactly the sort of operation we can't do
> > with non-MSI interrupts. For a real PCI device to execute this it
> > would have to keep the data on die.
>
> We means NVIDIA and your new device, right?
We'd like to use this in the current Mellanox NIC HW, eg the mlx5
driver. (NVIDIA acquired Mellanox recently)
> So if I understand correctly then the queue memory where the MSI
> descriptor sits is in RAM.
Yes, IMHO that is the whole point of this 'IMS' stuff. If devices
could have enough on-die memory then they could just use really big
MSI-X tables. Currently due to on-die memory constraints mlx5 is
limited to a few hundred MSI-X vectors.
Since MSI-X tables are exposed via MMIO they can't be 'swapped' to
RAM.
Moving away from MSI-X's MMIO access model allows them to be swapped
to RAM. The cost is that accessing them for update is a
command/response operation not a MMIO operation.
The HW is already swapping the queues causing the interrupts to RAM,
so adding a bit of additional data to store the MSI addr/data is
reasonable.
To give some sense, a 'working set' for the NIC device in some cases
can be hundreds of megabytes of data. System RAM is used to store
this, and precious on-die memory holds some dynamic active set, much
like a processor cache.
> How is that supposed to work if interrupt remapping is disabled?
The best we can do is issue a command to the device and spin/sleep
until completion. The device will serialize everything internally.
If the device has died the driver has code to detect and trigger a
PCI function reset which will definitely stop the interrupt.
So, the implementation of these functions would be to push any change
onto a command queue, trigger the device to DMA the command, spin/sleep
until the device returns a response and then continue on. If the
device doesn't return a response in a time window then trigger a WQ to
do a full device reset.
The spin/sleep is only needed if the update has to be synchronous, so
things like rebalancing could just push the rebalancing work and
immediately return.
> If interrupt remapping is enabled then both are trivial because then the
> irq chip can delegate everything to the parent chip, i.e. the remapping
> unit.
I did like this notion that IRQ remapping could avoid the overhead of
spin/spleep. Most of the use cases we have for this will require the
IOMMU anyhow.
> > I saw the idxd driver was doing something like this, I assume it
> > avoids trouble because it is a fake PCI device integrated with the
> > CPU, not on a real PCI bus?
>
> That's how it is implemented as far as I understood the patches. It's
> device memory therefore iowrite32().
I don't know anything about idxd.. Given the scale of interrupt need I
assumed the idxd HW had some hidden swapping to RAM.
Since it is on-die with the CPU there are a bunch of ways I could
imagine Intel could make MMIO triggered swapping work that are not
available to a true PCI-E device.
Jason
next prev parent reply other threads:[~2020-08-21 20:17 UTC|newest]
Thread overview: 71+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-21 0:24 [patch RFC 00/38] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 01/38] iommu/amd: Prevent NULL pointer dereference Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 02/38] x86/init: Remove unused init ops Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 03/38] x86/irq: Rename X86_IRQ_ALLOC_TYPE_MSI* to reflect PCI dependency Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 04/38] x86/irq: Add allocation type for parent domain retrieval Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 05/38] iommu/vt-d: Consolidate irq domain getter Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 06/38] iommu/amd: " Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 07/38] iommu/irq_remapping: Consolidate irq domain lookup Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 08/38] x86/irq: Prepare consolidation of irq_alloc_info Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 09/38] x86/msi: Consolidate HPET allocation Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 10/38] x86/ioapic: Consolidate IOAPIC allocation Thomas Gleixner
2020-08-26 8:40 ` Boqun Feng
2020-08-26 9:53 ` Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 11/38] x86/irq: Consolidate DMAR irq allocation Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 12/38] x86/irq: Consolidate UV domain allocation Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 13/38] PCI: MSI: Rework pci_msi_domain_calc_hwirq() Thomas Gleixner
2020-08-25 20:03 ` Bjorn Helgaas
2020-08-25 21:11 ` Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 14/38] x86/msi: Consolidate MSI allocation Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 15/38] x86/msi: Use generic MSI domain ops Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 16/38] x86/irq: Move apic_post_init() invocation to one place Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 17/38] x86/pci: Reducde #ifdeffery in PCI init code Thomas Gleixner
2020-08-25 20:20 ` Bjorn Helgaas
2020-08-21 0:24 ` [patch RFC 18/38] x86/irq: Initialize PCI/MSI domain at PCI init time Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 19/38] irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 20/38] PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI Thomas Gleixner
2020-08-25 20:04 ` Bjorn Helgaas
2020-08-21 0:24 ` [patch RFC 21/38] PCI: MSI: Provide pci_dev_has_special_msi_domain() helper Thomas Gleixner
2020-08-25 20:16 ` Bjorn Helgaas
2020-08-21 0:24 ` [patch RFC 22/38] x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init() Thomas Gleixner
2020-08-24 4:48 ` Jürgen Groß
2020-08-21 0:24 ` [patch RFC 23/38] x86/xen: Rework MSI teardown Thomas Gleixner
2020-08-24 5:09 ` Jürgen Groß
2020-08-21 0:24 ` [patch RFC 24/38] x86/xen: Consolidate XEN-MSI init Thomas Gleixner
2020-08-24 4:59 ` Jürgen Groß
2020-08-24 21:21 ` Thomas Gleixner
2020-08-25 4:21 ` Jürgen Groß
2020-08-25 9:51 ` Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 25/38] irqdomain/msi: Allow to override msi_domain_alloc/free_irqs() Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 26/38] x86/xen: Wrap XEN MSI management into irqdomain Thomas Gleixner
2020-08-24 6:21 ` Jürgen Groß
2020-08-25 7:57 ` Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 27/38] iommm/vt-d: Store irq domain in struct device Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 28/38] iommm/amd: " Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 29/38] x86/pci: Set default irq domain in pcibios_add_device() Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 30/38] PCI/MSI: Allow to disable arch fallbacks Thomas Gleixner
2020-08-25 20:07 ` Bjorn Helgaas
2020-08-25 21:28 ` Thomas Gleixner
2020-08-25 21:35 ` Bjorn Helgaas
2020-08-25 21:40 ` Thomas Gleixner
2020-08-25 22:03 ` Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 31/38] x86/irq: Cleanup the arch_*_msi_irqs() leftovers Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 32/38] x86/irq: Make most MSI ops XEN private Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 33/38] x86/irq: Add DEV_MSI allocation type Thomas Gleixner
2020-08-21 0:24 ` [patch RFC 34/38] x86/msi: Let pci_msi_prepare() handle non-PCI MSI Thomas Gleixner
2020-08-25 20:24 ` Bjorn Helgaas
2020-08-25 21:30 ` Thomas Gleixner
2020-08-25 21:50 ` Bjorn Helgaas
2020-08-21 0:24 ` [patch RFC 35/38] platform-msi: Provide default irq_chip::ack Thomas Gleixner
2020-08-21 0:25 ` [patch RFC 36/38] platform-msi: Add device MSI infrastructure Thomas Gleixner
2020-08-21 0:25 ` [patch RFC 37/38] irqdomain/msi: Provide msi_alloc/free_store() callbacks Thomas Gleixner
2020-08-21 0:25 ` [patch RFC 38/38] irqchip: Add IMS array driver - NOT FOR MERGING Thomas Gleixner
2020-08-21 12:45 ` Jason Gunthorpe
2020-08-21 19:47 ` Thomas Gleixner
2020-08-21 20:17 ` Jason Gunthorpe [this message]
2020-08-21 23:47 ` Thomas Gleixner
2020-08-22 0:51 ` Jason Gunthorpe
2020-08-22 1:34 ` Thomas Gleixner
2020-08-22 23:05 ` Jason Gunthorpe
2020-08-23 8:03 ` Thomas Gleixner
2020-08-22 14:19 ` [patch RFC 00/38] x86, PCI, XEN, genirq ...: Prepare for device MSI Jürgen Groß
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200821201705.GA2811871@nvidia.com \
--to=jgg@nvidia.com \
--cc=alex.williamson@redhat.com \
--cc=baolu.lu@intel.com \
--cc=baolu.lu@linux.intel.com \
--cc=bhelgaas@google.com \
--cc=boris.ostrovsky@oracle.com \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=gregkh@linuxfoundation.org \
--cc=haiyangz@microsoft.com \
--cc=iommu@lists.linux-foundation.org \
--cc=jacob.jun.pan@intel.com \
--cc=jgross@suse.com \
--cc=jonathan.derrick@intel.com \
--cc=joro@8bytes.org \
--cc=kevin.tian@intel.com \
--cc=konrad.wilk@oracle.com \
--cc=kys@microsoft.com \
--cc=linux-hyperv@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=lorenzo.pieralisi@arm.com \
--cc=maz@kernel.org \
--cc=megha.dey@intel.com \
--cc=rafael@kernel.org \
--cc=rja@hpe.com \
--cc=sivanich@hpe.com \
--cc=sstabellini@kernel.org \
--cc=steve.wahl@hpe.com \
--cc=sthemmin@microsoft.com \
--cc=tglx@linutronix.de \
--cc=wei.liu@kernel.org \
--cc=x86@kernel.org \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).