linux-hyperv.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	x86@kernel.org, Marc Zyngier <maz@kernel.org>,
	Megha Dey <megha.dey@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Jacob Pan <jacob.jun.pan@intel.com>,
	Baolu Lu <baolu.lu@intel.com>, Kevin Tian <kevin.tian@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Joerg Roedel <joro@8bytes.org>,
	iommu@lists.linux-foundation.org, linux-hyperv@vger.kernel.org,
	Haiyang Zhang <haiyangz@microsoft.com>,
	Jon Derrick <jonathan.derrick@intel.com>,
	Lu Baolu <baolu.lu@linux.intel.com>, Wei Liu <wei.liu@kernel.org>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	Stephen Hemminger <sthemmin@microsoft.com>,
	Steve Wahl <steve.wahl@hpe.com>,
	Dimitri Sivanich <sivanich@hpe.com>, Russ Anderson <rja@hpe.com>,
	linux-pci@vger.kernel.org, Bjorn Helgaas <bhelgaas@google.com>,
	Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	xen-devel@lists.xenproject.org, Juergen Gross <jgross@suse.com>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Stefano Stabellini <sstabellini@kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>
Subject: Re: [patch RFC 38/38] irqchip: Add IMS array driver - NOT FOR MERGING
Date: Sat, 22 Aug 2020 03:34:45 +0200	[thread overview]
Message-ID: <874kovqx8q.fsf@nanos.tec.linutronix.de> (raw)
In-Reply-To: <20200822005125.GB1152540@nvidia.com>

Jason,

On Fri, Aug 21 2020 at 21:51, Jason Gunthorpe wrote:
> On Sat, Aug 22, 2020 at 01:47:12AM +0200, Thomas Gleixner wrote:
>> > If the device has died the driver has code to detect and trigger a
>> > PCI function reset which will definitely stop the interrupt.
>> 
>> If that interrupt is gone into storm mode for some reason then this will
>> render your machine unusable before you can do that.
>
> Yes, but in general the HW design is to have one-shot interrupts, it
> would have to be well off the rails to storm. The kind of off the
> rails where it could also be doing crazy stuff on PCI-E that would be
> very harmful.

Yeah. One shot should prevent most of the wreckage. I just wanted to
spell it out.

>> One question is whether the device can see partial updates to that
>> memory due to the async 'swap' of context from the device CPU.
>
> It is worse than just partial updates.. The device operation is much
> more like you'd imagine a CPU cache. There could be copies of the RAM
> in the device for long periods of time, dirty data in the device that
> will flush back to CPU RAM overwriting CPU changes, etc.

TBH, that's insane. You clearly want to think about this some more. If
you swap out device state and device control state then you definitly
want to have regions which are read only from the device POV and never
written back. The MSI msg store clearly belongs into that category.
But that's not restricted to the MSI msg store, there is certainly other
stuff which never wants to be written back by the device.

If you don't do that then you simply can't write to that space from the
CPU and you have to transport this kind information always via command
queues.

But that does not make sense. It's trivial enough to have

    | RO state |
    | RW state |

and on swap in the whole thing is DMA'd into the device and on swap out
only the RW state part. It's not rocket science and makes a huge amount
of sense.

> Without involving the device there is just no way to create data
> consistency, and no way to change the data from the CPU. 
>
> This is the down side of having device data in the RAM. It cannot be
> so simple as 'just fetch it every time before you use it' as
> performance would be horrible.

That's clear, but with a proper seperation like the above and some extra
mechanism which allows you to tickle a relaod of 'RO state' you can
avoid quite some of the problems which you create otherwise.

>> If we really can get away with atomically updating the message as
>> outlined above and just let it happen at some point in the future then
>> most problems are solved, except for the nastyness of CPU hotplug.
>
> Since we can't avoid a device command, I'm think more along the lines
> of having the affinity update trigger an async WQ to issue the command
> from a thread context. Since it doesn't need to be synchronous it can
> make it out 'eventually'.
>
> I suppose the core code could provide this as a service? Sort of a
> varient of the other lazy things above?

Kinda. That needs a lot of thought for the affinity setting stuff
because it can be called from contexts which do not allow that. It's
solvable though, but I clearly need to stare at the corner cases for a
while.

> But it would have to work with ARM - is remapping a x86 only thing?

No. ARM64 has that as well.

> Does ARM put the affinity in the GIC tables not in the MSI data?

IIRC, yes.

>> Let me summarize what I think would be the sane solution for this:
>> 
>>   1) Utilize atomic writes for either all 16 bytes or reorder the bytes
>>      and update 8 bytes atomically which is sufficient as the wide
>>      address is only used with irq remapping and the MSI message in the
>>      device is never changed after startup.
>
> Sadly not something the device can manage due to data coherence

I disagree :)

>>   2) No requirement for issuing a command for regular migration
>>      operations as they have no requirements to be synchronous.
>> 
>>      Eventually store some state to force a reload on the next regular
>>      queue operation.
>
> Would the async version above be OK?

Async is fine in any variant (except for hotplug). Though having an
async WQ or whatever there needs some thought.

>>   3) No requirement for issuing a command for mask and unmask operations.
>>      The core code uses and handles lazy masking already. So if the
>>      hardware causes the lazyness, so be it.
>
> This lazy masking thing sounds good, I'm totally unfamiliar with it
> though.

It's used to avoid irq chip (often MMIO) access in scenarios where
disable/enable of an interrupt line happens with high frequency. Serial
has that issue. So we mark it disabled, but do not mask it and the core
can handle that and masks it once an interrupt comes in in masked
state. That obviously does not work out of the box to protect against
not disabled but masked state, but conceptually it's a similar problem
and can be made work without massive changes. 

OTOH, in normal operation for MSI interrupts (edge type) masking is not
used at all and just restricted to the startup teardown.

But I clearly need to think about it with a more awake brain some more.

> This email is super helpful, I definately don't know all these corners
> of the IRQ subsystem as my past with it has mostly been SOC stuff that
> isn't as complicated!

It's differently complicated and not less horrible :)

Thanks,

        tglx

  reply	other threads:[~2020-08-22  1:34 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-21  0:24 [patch RFC 00/38] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 01/38] iommu/amd: Prevent NULL pointer dereference Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 02/38] x86/init: Remove unused init ops Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 03/38] x86/irq: Rename X86_IRQ_ALLOC_TYPE_MSI* to reflect PCI dependency Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 04/38] x86/irq: Add allocation type for parent domain retrieval Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 05/38] iommu/vt-d: Consolidate irq domain getter Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 06/38] iommu/amd: " Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 07/38] iommu/irq_remapping: Consolidate irq domain lookup Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 08/38] x86/irq: Prepare consolidation of irq_alloc_info Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 09/38] x86/msi: Consolidate HPET allocation Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 10/38] x86/ioapic: Consolidate IOAPIC allocation Thomas Gleixner
2020-08-26  8:40   ` Boqun Feng
2020-08-26  9:53     ` Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 11/38] x86/irq: Consolidate DMAR irq allocation Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 12/38] x86/irq: Consolidate UV domain allocation Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 13/38] PCI: MSI: Rework pci_msi_domain_calc_hwirq() Thomas Gleixner
2020-08-25 20:03   ` Bjorn Helgaas
2020-08-25 21:11     ` Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 14/38] x86/msi: Consolidate MSI allocation Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 15/38] x86/msi: Use generic MSI domain ops Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 16/38] x86/irq: Move apic_post_init() invocation to one place Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 17/38] x86/pci: Reducde #ifdeffery in PCI init code Thomas Gleixner
2020-08-25 20:20   ` Bjorn Helgaas
2020-08-21  0:24 ` [patch RFC 18/38] x86/irq: Initialize PCI/MSI domain at PCI init time Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 19/38] irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 20/38] PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI Thomas Gleixner
2020-08-25 20:04   ` Bjorn Helgaas
2020-08-21  0:24 ` [patch RFC 21/38] PCI: MSI: Provide pci_dev_has_special_msi_domain() helper Thomas Gleixner
2020-08-25 20:16   ` Bjorn Helgaas
2020-08-21  0:24 ` [patch RFC 22/38] x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init() Thomas Gleixner
2020-08-24  4:48   ` Jürgen Groß
2020-08-21  0:24 ` [patch RFC 23/38] x86/xen: Rework MSI teardown Thomas Gleixner
2020-08-24  5:09   ` Jürgen Groß
2020-08-21  0:24 ` [patch RFC 24/38] x86/xen: Consolidate XEN-MSI init Thomas Gleixner
2020-08-24  4:59   ` Jürgen Groß
2020-08-24 21:21     ` Thomas Gleixner
2020-08-25  4:21       ` Jürgen Groß
2020-08-25  9:51         ` Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 25/38] irqdomain/msi: Allow to override msi_domain_alloc/free_irqs() Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 26/38] x86/xen: Wrap XEN MSI management into irqdomain Thomas Gleixner
2020-08-24  6:21   ` Jürgen Groß
2020-08-25  7:57     ` Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 27/38] iommm/vt-d: Store irq domain in struct device Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 28/38] iommm/amd: " Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 29/38] x86/pci: Set default irq domain in pcibios_add_device() Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 30/38] PCI/MSI: Allow to disable arch fallbacks Thomas Gleixner
2020-08-25 20:07   ` Bjorn Helgaas
2020-08-25 21:28     ` Thomas Gleixner
2020-08-25 21:35       ` Bjorn Helgaas
2020-08-25 21:40         ` Thomas Gleixner
2020-08-25 22:03           ` Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 31/38] x86/irq: Cleanup the arch_*_msi_irqs() leftovers Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 32/38] x86/irq: Make most MSI ops XEN private Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 33/38] x86/irq: Add DEV_MSI allocation type Thomas Gleixner
2020-08-21  0:24 ` [patch RFC 34/38] x86/msi: Let pci_msi_prepare() handle non-PCI MSI Thomas Gleixner
2020-08-25 20:24   ` Bjorn Helgaas
2020-08-25 21:30     ` Thomas Gleixner
2020-08-25 21:50       ` Bjorn Helgaas
2020-08-21  0:24 ` [patch RFC 35/38] platform-msi: Provide default irq_chip::ack Thomas Gleixner
2020-08-21  0:25 ` [patch RFC 36/38] platform-msi: Add device MSI infrastructure Thomas Gleixner
2020-08-21  0:25 ` [patch RFC 37/38] irqdomain/msi: Provide msi_alloc/free_store() callbacks Thomas Gleixner
2020-08-21  0:25 ` [patch RFC 38/38] irqchip: Add IMS array driver - NOT FOR MERGING Thomas Gleixner
2020-08-21 12:45   ` Jason Gunthorpe
2020-08-21 19:47     ` Thomas Gleixner
2020-08-21 20:17       ` Jason Gunthorpe
2020-08-21 23:47         ` Thomas Gleixner
2020-08-22  0:51           ` Jason Gunthorpe
2020-08-22  1:34             ` Thomas Gleixner [this message]
2020-08-22 23:05               ` Jason Gunthorpe
2020-08-23  8:03                 ` Thomas Gleixner
2020-08-22 14:19 ` [patch RFC 00/38] x86, PCI, XEN, genirq ...: Prepare for device MSI Jürgen Groß

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=874kovqx8q.fsf@nanos.tec.linutronix.de \
    --to=tglx@linutronix.de \
    --cc=alex.williamson@redhat.com \
    --cc=baolu.lu@intel.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=bhelgaas@google.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=haiyangz@microsoft.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jacob.jun.pan@intel.com \
    --cc=jgg@nvidia.com \
    --cc=jgross@suse.com \
    --cc=jonathan.derrick@intel.com \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=konrad.wilk@oracle.com \
    --cc=kys@microsoft.com \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lorenzo.pieralisi@arm.com \
    --cc=maz@kernel.org \
    --cc=megha.dey@intel.com \
    --cc=rafael@kernel.org \
    --cc=rja@hpe.com \
    --cc=sivanich@hpe.com \
    --cc=sstabellini@kernel.org \
    --cc=steve.wahl@hpe.com \
    --cc=sthemmin@microsoft.com \
    --cc=wei.liu@kernel.org \
    --cc=x86@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).