From: ebiederm@xmission.com (Eric W. Biederman)
To: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw2@infradead.org>,
	Bjorn Helgaas <helgaas@kernel.org>,
	"Guilherme G. Piccoli" <gpiccoli@canonical.com>,
	lukas@wunner.de, linux-pci@vger.kernel.org, kernelfans@gmail.com,
	andi@firstfloor.org, hpa@zytor.com, bhe@redhat.com,
	x86@kernel.org, okaya@kernel.org, mingo@redhat.com,
	jay.vosburgh@canonical.com, dyoung@redhat.com,
	gavin.guo@canonical.com, bp@alien8.de, bhelgaas@google.com,
	Guowen Shan <gshan@redhat.com>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	kernel@gpiccoli.net, kexec@lists.infradead.org,
	linux-kernel@vger.kernel.org, ddstreet@canonical.com,
	vgoyal@redhat.com
Subject: Re: [PATCH 1/3] x86/quirks: Scan all busses for early PCI quirks
Date: Tue, 17 Nov 2020 16:25:23 -0600
Message-ID: <87blfv7h9o.fsf@x220.int.ebiederm.org>
In-Reply-To: <87wnyjwzeo.fsf@nanos.tec.linutronix.de> (Thomas Gleixner's message of "Tue, 17 Nov 2020 20:34:23 +0100")

Thomas Gleixner <tglx@linutronix.de> writes:

> On Tue, Nov 17 2020 at 12:19, David Woodhouse wrote:
>> On Tue, 2020-11-17 at 10:53 +0100, Thomas Gleixner wrote:
>>> But that does not solve the problem either simply because then the IOMMU
>>> will catch the rogue MSIs and you get an interrupt storm on the IOMMU
>>> error interrupt.
>>
>> Not if you can tell the IOMMU to stop reporting those errors.
>>
>> We can easily do it per-device for DMA errors; not quite sure what
>> granularity we have for interrupts. Perhaps the Intel IOMMU only lets
>> you set the Fault Processing Disable bit per IRTE entry, and you still
>> get faults for Compatibility Format interrupts? Not sure about AMD...
>
> It looks like the fault (DMAR) and event (AMD) interrupts can be
> disabled in the IOMMU. That might help to bridge the gap until the PCI
> bus is scanned in full glory and the devices can be shut up for real.
>
> If we make this conditional for a crash dump kernel that might do the
> trick.
>
> Lot's of _might_ there :)

Worth testing.

Folks tracking this down, is this enough of a hint for you to write a
patch and test it?

Also worth checking how close irqpoll is to handling a case like this.
At least historically it did a pretty good job at shutting down problem
interrupts.

I really find it weird that an edge-triggered irq was firing fast enough
to stop a system from booting.  Level-triggered irqs do that if they are
acknowledged without actually being handled.  I think edge-triggered
irqs only fire when another event comes in, and it seems odd to get so
many actual events causing interrupts that the system soft locks
up.  Is my memory of that situation confused?
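
[Illustration, not part of the original message: a toy model of the
level vs. edge distinction described above.  It is not kernel code; the
function name count_deliveries and its parameters are hypothetical and
exist only for this sketch.  It assumes one delivery opportunity per
time step and a handler that acknowledges every interrupt.]

```python
def count_deliveries(trigger, events, ticks, handler_clears_device=False):
    """Count interrupt deliveries over `ticks` time steps (toy model).

    trigger: "level" or "edge"
    events: set of time steps at which the device raises a new event
    handler_clears_device: True if the handler actually services the
        device; False if it only acknowledges the interrupt
    """
    if trigger not in ("level", "edge"):
        raise ValueError("trigger must be 'level' or 'edge'")

    pending = False   # device condition still asserted
    delivered = 0
    for t in range(ticks):
        new_event = t in events
        if new_event:
            pending = True
        # Level: fires for as long as the condition stays asserted.
        # Edge: fires only on the step where a new event arrives.
        fire = pending if trigger == "level" else new_event
        if fire:
            delivered += 1
            if handler_clears_device:
                pending = False
    return delivered


# One unhandled event: level re-fires on every step, edge fires once.
level_storm = count_deliveries("level", {0}, 1000)
edge_single = count_deliveries("edge", {0}, 1000)
# An edge-triggered storm needs the device to keep producing events.
edge_storm = count_deliveries("edge", set(range(1000)), 1000)
```

In this model an edge-triggered source can only flood the CPU if the
device itself keeps generating new events, which is the case being
questioned above.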

I recommend making these facilities general debug facilities so that
they can be used for cases other than crash dump.  The crash dump kernel
would always enable them because it can assume that the hardware is
very likely in a wonky state.

Eric



