All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jerry Hoemann <jerry.hoemann@hpe.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Kairui Song <kasong@redhat.com>, Baoquan He <bhe@redhat.com>,
	Deepa Dinamani <deepa.kernel@gmail.com>,
	jroedel@suse.de, Myron Stowe <myron.stowe@redhat.com>,
	linux-pci@vger.kernel.org, kexec@lists.infradead.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Randy Wright <rwright@hpe.com>, Dave Young <dyoung@redhat.com>,
	Khalid Aziz <khalid@gonehiking.org>
Subject: Re: [RFC PATCH] PCI, kdump: Clear bus master bit upon shutdown in kdump kernel
Date: Wed, 22 Jul 2020 15:50:48 -0600	[thread overview]
Message-ID: <20200722215048.GL220876@anatevka.americas.hpqcorp.net> (raw)
In-Reply-To: <20200722152123.GA1278089@bjorn-Precision-5520>

On Wed, Jul 22, 2020 at 10:21:23AM -0500, Bjorn Helgaas wrote:
> On Wed, Jul 22, 2020 at 10:52:26PM +0800, Kairui Song wrote:
> > On Fri, Mar 6, 2020 at 5:38 PM Baoquan He <bhe@redhat.com> wrote:
> > > On 03/04/20 at 08:53pm, Deepa Dinamani wrote:
> > > > On Wed, Mar 4, 2020 at 7:53 PM Baoquan He <bhe@redhat.com> wrote:
> > > > > On 03/03/20 at 01:01pm, Deepa Dinamani wrote:
> > > > > > I looked at this some more. Looks like we do not clear irqs
> > > > > > when we do a kexec reboot. And, the bootup code maintains
> > > > > > the same table for the kexec-ed kernel. I'm looking at the
> > > > > > following code in
> > > > >
> > > > > I guess you are talking about kdump reboot here, right? Kexec
> > > > > and kdump boot take the similar mechanism, but differ a
> > > > > little.
> > > >
> > > > Right I meant kdump kernel here. And, clearly the
> > > > is_kdump_kernel() case below.
> > > >
> > > > > > intel_irq_remapping.c:
> > > > > >
> > > > > >         if (ir_pre_enabled(iommu)) {
> > > > > >                 if (!is_kdump_kernel()) {
> > > > > >                         pr_warn("IRQ remapping was enabled on %s but
> > > > > > we are not in kdump mode\n",
> > > > > >                                 iommu->name);
> > > > > >                         clear_ir_pre_enabled(iommu);
> > > > > >                         iommu_disable_irq_remapping(iommu);
> > > > > >                 } else if (iommu_load_old_irte(iommu))
> > > > >
> > > > > Here, it's for kdump kernel to copy old ir table from 1st kernel.
> > > >
> > > > Correct.
> > > >
> > > > > >                         pr_err("Failed to copy IR table for %s from
> > > > > > previous kernel\n",
> > > > > >                                iommu->name);
> > > > > >                 else
> > > > > >                         pr_info("Copied IR table for %s from previous kernel\n",
> > > > > >                                 iommu->name);
> > > > > >         }
> > > > > >
> > > > > > Would cleaning the interrupts(like in the non kdump path
> > > > > > above) just before shutdown help here? This should clear the
> > > > > > interrupts enabled for all the devices in the current
> > > > > > kernel. So when kdump kernel starts, it starts clean. This
> > > > > > should probably help block out the interrupts from a device
> > > > > > that does not have a driver.
> > > > >
> > > > > I think stopping those devices out of control from continue
> > > > > sending interrupts is a good idea. While not sure if only
> > > > > clearing the interrupt will be enough. Those devices which
> > > > > will be initialized by their driver will brake, but devices
> > > > > which drivers are not loaded into kdump kernel may continue
> > > > > acting. Even though interrupts are cleaning at this time, the
> > > > > on-flight DMA could continue triggerring interrupt since the
> > > > > ir table and iopage table are rebuilt.
> > > >
> > > > This should be handled by the IOMMU, right? And, hence you are
> > > > getting UR. This seems like the correct execution flow to me.
> > >
> > > Sorry for late reply.
> > > Yes, this is initializing IOMMU device.
> > >
> > > > Anyway, you could just test this theory by removing the
> > > > is_kdump_kernel() check above and see if it solves your problem.
> > > > Obviously, check the VT-d spec to figure out the exact sequence to
> > > > turn off the IR.
> > >
> > > OK, I will talk to Kairui and get a machine to test it. Thanks for your
> > > nice idea, if you have a draft patch, we are happy to test it.
> > >
> > > > Note that the device that is causing the problem here is a legit
> > > > device. We want to have interrupts from devices we don't know about
> > > > blocked anyway because we can have compromised firmware/ devices that
> > > > could cause a DoS attack. So blocking the unwanted interrupts seems
> > > > like the right thing to do here.
> > >
> > > Kairui said it's a device which driver is not loaded in kdump kernel
> > > because it's not needed by kdump. We try to only load kernel modules
> > > which are needed, e.g one device is the dump target, its driver has to
> > > be loaded in. In this case, the device is more like a out of control
> > > device to kdump kernel.
> > 
> > Hi Bao, Deepa, sorry for this very late response. The test machine was
> > not available for sometime, and I restarted to work on this problem.
> > 
> > For the workaround mention by Deepa (by remote the is_kdump_kernel()
> > check), it didn't work, the machine still hangs upon shutdown.
> > The devices that were left in an unknown state and sending interrupt
> > could be a problem, but it's irrelevant to this hanging problem.
> > 
> > I think I didn't make one thing clear, The PCI UR error never arrives
> > in kernel, it's the iLo BMC on that HPE machine caught the error, and
> > send kernel an NMI. kernel is panicked by NMI, I'm still trying to
> > figure out why the NMI hanged kernel, even with panic=-1,
> > panic_on_io_nmi, panic_on_unknown_nmi all set. But if we can avoid the
> > NMI by shutdown the devices in right order, that's also a solution.
> 
> I'm not sure how much sympathy to have for this situation.  A PCIe UR
> is fatal for the transaction and maybe even the device, but from the
> overall system point of view, it *should* be a recoverable error and
> we shouldn't panic.
> 
> Errors like that should be reported via the normal AER or ACPI/APEI
> mechanisms.  It sounds like in this case, the platform has decided
> these aren't enough and it is trying to force a reboot?  If this is
> "special" platform behavior, I'm not sure how much we need to cater
> for it.
> 

Are these AER errors the type processed by the GHES code?

I'll note that RedHat runs their crash kernel with:  hest_disable.
So, the ghes code is disabled in the crash kernel.


-- 

-----------------------------------------------------------------------------
Jerry Hoemann                  Software Engineer   Hewlett Packard Enterprise
-----------------------------------------------------------------------------

WARNING: multiple messages have this Message-ID (diff)
From: Jerry Hoemann <jerry.hoemann@hpe.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: jroedel@suse.de, Kairui Song <kasong@redhat.com>,
	Baoquan He <bhe@redhat.com>,
	linux-pci@vger.kernel.org, Dave Young <dyoung@redhat.com>,
	kexec@lists.infradead.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Randy Wright <rwright@hpe.com>,
	Deepa Dinamani <deepa.kernel@gmail.com>,
	Myron Stowe <myron.stowe@redhat.com>,
	Khalid Aziz <khalid@gonehiking.org>
Subject: Re: [RFC PATCH] PCI, kdump: Clear bus master bit upon shutdown in kdump kernel
Date: Wed, 22 Jul 2020 15:50:48 -0600	[thread overview]
Message-ID: <20200722215048.GL220876@anatevka.americas.hpqcorp.net> (raw)
In-Reply-To: <20200722152123.GA1278089@bjorn-Precision-5520>

On Wed, Jul 22, 2020 at 10:21:23AM -0500, Bjorn Helgaas wrote:
> On Wed, Jul 22, 2020 at 10:52:26PM +0800, Kairui Song wrote:
> > On Fri, Mar 6, 2020 at 5:38 PM Baoquan He <bhe@redhat.com> wrote:
> > > On 03/04/20 at 08:53pm, Deepa Dinamani wrote:
> > > > On Wed, Mar 4, 2020 at 7:53 PM Baoquan He <bhe@redhat.com> wrote:
> > > > > On 03/03/20 at 01:01pm, Deepa Dinamani wrote:
> > > > > > I looked at this some more. Looks like we do not clear irqs
> > > > > > when we do a kexec reboot. And, the bootup code maintains
> > > > > > the same table for the kexec-ed kernel. I'm looking at the
> > > > > > following code in
> > > > >
> > > > > I guess you are talking about kdump reboot here, right? Kexec
> > > > > and kdump boot take the similar mechanism, but differ a
> > > > > little.
> > > >
> > > > Right I meant kdump kernel here. And, clearly the
> > > > is_kdump_kernel() case below.
> > > >
> > > > > > intel_irq_remapping.c:
> > > > > >
> > > > > >         if (ir_pre_enabled(iommu)) {
> > > > > >                 if (!is_kdump_kernel()) {
> > > > > >                         pr_warn("IRQ remapping was enabled on %s but
> > > > > > we are not in kdump mode\n",
> > > > > >                                 iommu->name);
> > > > > >                         clear_ir_pre_enabled(iommu);
> > > > > >                         iommu_disable_irq_remapping(iommu);
> > > > > >                 } else if (iommu_load_old_irte(iommu))
> > > > >
> > > > > Here, it's for kdump kernel to copy old ir table from 1st kernel.
> > > >
> > > > Correct.
> > > >
> > > > > >                         pr_err("Failed to copy IR table for %s from
> > > > > > previous kernel\n",
> > > > > >                                iommu->name);
> > > > > >                 else
> > > > > >                         pr_info("Copied IR table for %s from previous kernel\n",
> > > > > >                                 iommu->name);
> > > > > >         }
> > > > > >
> > > > > > Would cleaning the interrupts(like in the non kdump path
> > > > > > above) just before shutdown help here? This should clear the
> > > > > > interrupts enabled for all the devices in the current
> > > > > > kernel. So when kdump kernel starts, it starts clean. This
> > > > > > should probably help block out the interrupts from a device
> > > > > > that does not have a driver.
> > > > >
> > > > > I think stopping those devices out of control from continue
> > > > > sending interrupts is a good idea. While not sure if only
> > > > > clearing the interrupt will be enough. Those devices which
> > > > > will be initialized by their driver will brake, but devices
> > > > > which drivers are not loaded into kdump kernel may continue
> > > > > acting. Even though interrupts are cleaning at this time, the
> > > > > on-flight DMA could continue triggerring interrupt since the
> > > > > ir table and iopage table are rebuilt.
> > > >
> > > > This should be handled by the IOMMU, right? And, hence you are
> > > > getting UR. This seems like the correct execution flow to me.
> > >
> > > Sorry for late reply.
> > > Yes, this is initializing IOMMU device.
> > >
> > > > Anyway, you could just test this theory by removing the
> > > > is_kdump_kernel() check above and see if it solves your problem.
> > > > Obviously, check the VT-d spec to figure out the exact sequence to
> > > > turn off the IR.
> > >
> > > OK, I will talk to Kairui and get a machine to test it. Thanks for your
> > > nice idea, if you have a draft patch, we are happy to test it.
> > >
> > > > Note that the device that is causing the problem here is a legit
> > > > device. We want to have interrupts from devices we don't know about
> > > > blocked anyway because we can have compromised firmware/ devices that
> > > > could cause a DoS attack. So blocking the unwanted interrupts seems
> > > > like the right thing to do here.
> > >
> > > Kairui said it's a device which driver is not loaded in kdump kernel
> > > because it's not needed by kdump. We try to only load kernel modules
> > > which are needed, e.g one device is the dump target, its driver has to
> > > be loaded in. In this case, the device is more like a out of control
> > > device to kdump kernel.
> > 
> > Hi Bao, Deepa, sorry for this very late response. The test machine was
> > not available for sometime, and I restarted to work on this problem.
> > 
> > For the workaround mention by Deepa (by remote the is_kdump_kernel()
> > check), it didn't work, the machine still hangs upon shutdown.
> > The devices that were left in an unknown state and sending interrupt
> > could be a problem, but it's irrelevant to this hanging problem.
> > 
> > I think I didn't make one thing clear, The PCI UR error never arrives
> > in kernel, it's the iLo BMC on that HPE machine caught the error, and
> > send kernel an NMI. kernel is panicked by NMI, I'm still trying to
> > figure out why the NMI hanged kernel, even with panic=-1,
> > panic_on_io_nmi, panic_on_unknown_nmi all set. But if we can avoid the
> > NMI by shutdown the devices in right order, that's also a solution.
> 
> I'm not sure how much sympathy to have for this situation.  A PCIe UR
> is fatal for the transaction and maybe even the device, but from the
> overall system point of view, it *should* be a recoverable error and
> we shouldn't panic.
> 
> Errors like that should be reported via the normal AER or ACPI/APEI
> mechanisms.  It sounds like in this case, the platform has decided
> these aren't enough and it is trying to force a reboot?  If this is
> "special" platform behavior, I'm not sure how much we need to cater
> for it.
> 

Are these AER errors the type processed by the GHES code?

I'll note that RedHat runs their crash kernel with:  hest_disable.
So, the ghes code is disabled in the crash kernel.


-- 

-----------------------------------------------------------------------------
Jerry Hoemann                  Software Engineer   Hewlett Packard Enterprise
-----------------------------------------------------------------------------

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

  reply	other threads:[~2020-07-22 21:51 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-25 19:21 [RFC PATCH] PCI, kdump: Clear bus master bit upon shutdown in kdump kernel Kairui Song
2019-12-25 19:21 ` Kairui Song
2019-12-27 11:11 ` kbuild test robot
2020-01-03  7:58 ` Kairui Song
2020-01-10 21:42 ` Bjorn Helgaas
2020-01-10 22:25   ` Khalid Aziz and Shuah Khan
2020-01-10 23:00     ` Jerry Hoemann
2020-01-11  0:18       ` Khalid Aziz
2020-01-11  0:50         ` Baoquan He
2020-01-11  3:45           ` Khalid Aziz
2020-01-11  9:35             ` Kairui Song
2020-01-11 18:32               ` Deepa Dinamani
2020-01-13 17:07                 ` Kairui Song
2020-01-15  1:16                   ` Deepa Dinamani
2020-01-15  7:56                     ` Kairui Song
2020-01-15 17:30                   ` Khalid Aziz
2020-01-15 18:05                     ` Kairui Song
2020-01-15 21:17                       ` Khalid Aziz
2020-01-17  3:24                         ` Dave Young
2020-01-17  3:46                           ` Baoquan He
2020-01-17 15:44                           ` Khalid Aziz
2020-01-11 10:04             ` Baoquan He
2020-01-11  0:45       ` Baoquan He
2020-01-11  0:51         ` Baoquan He
2020-01-11  1:46         ` Baoquan He
2020-01-11  9:24         ` Kairui Song
2020-01-10 23:36   ` Jerry Hoemann
2020-01-11  8:46   ` Kairui Song
2020-02-22 16:56 ` Bjorn Helgaas
2020-02-24  4:56   ` Dave Young
2020-02-24 17:30   ` Kairui Song
2020-02-28 19:53     ` Deepa Dinamani
2020-03-03 21:01       ` Deepa Dinamani
2020-03-05  3:53         ` Baoquan He
2020-03-05  4:53           ` Deepa Dinamani
2020-03-05  6:06             ` Deepa Dinamani
2020-03-06  9:38             ` Baoquan He
2020-07-22 14:52               ` Kairui Song
2020-07-22 14:52                 ` Kairui Song
2020-07-22 15:21                 ` Bjorn Helgaas
2020-07-22 15:21                   ` Bjorn Helgaas
2020-07-22 21:50                   ` Jerry Hoemann [this message]
2020-07-22 21:50                     ` Jerry Hoemann
2020-07-23  0:00                     ` Bjorn Helgaas
2020-07-23  0:00                       ` Bjorn Helgaas
2020-07-23 18:34                       ` Kairui Song
2020-07-23 18:34                         ` Kairui Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200722215048.GL220876@anatevka.americas.hpqcorp.net \
    --to=jerry.hoemann@hpe.com \
    --cc=bhe@redhat.com \
    --cc=deepa.kernel@gmail.com \
    --cc=dyoung@redhat.com \
    --cc=helgaas@kernel.org \
    --cc=jroedel@suse.de \
    --cc=kasong@redhat.com \
    --cc=kexec@lists.infradead.org \
    --cc=khalid@gonehiking.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=myron.stowe@redhat.com \
    --cc=rwright@hpe.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.