From: Marc Zyngier <marc.zyngier@arm.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Sinan Kaya <okaya@codeaurora.org>,
Paul Menzel <pmenzel+linux-pci@molgen.mpg.de>,
Dave Young <dyoung@redhat.com>, <linux-pci@vger.kernel.org>,
<kexec@lists.infradead.org>, <linux-kernel@vger.kernel.org>,
Lukas Wunner <lukas@wunner.de>,
Eric Biederman <ebiederm@xmission.com>,
Bjorn Helgaas <bhelgaas@google.com>,
Vivek Goyal <vgoyal@redhat.com>
Subject: Re: pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago)
Date: Tue, 01 May 2018 17:31:43 +0100
Message-ID: <861sevcp74.wl-marc.zyngier@arm.com>
In-Reply-To: <20180501132554.GA11698@bhelgaas-glaptop.roam.corp.google.com>
On Tue, 01 May 2018 14:25:54 +0100,
Bjorn Helgaas wrote:
Hi Bjorn,
> On Tue, May 01, 2018 at 01:59:20PM +0100, Marc Zyngier wrote:
> > On 01/05/18 13:38, Sinan Kaya wrote:
> > > +Marc,
> > >
> > > On 4/30/2018 5:27 PM, Sinan Kaya wrote:
> > >> On 4/30/2018 5:17 PM, Bjorn Helgaas wrote:
> > >>>> What should we do about this?
> > >>>>
> > >>>> Since there is an actual HW errata involved, should we quirk this
> > >>>> root port and not wait as if remove/shutdown doesn't exist?
> > >>> I was hoping to avoid a quirk because AFAIK all Intel parts have this
> > >>> issue so it will be an ongoing maintenance issue. I tried to avoid
> > >>> the timeout delays, e.g., with 40b960831cfa ("PCI: pciehp: Compute
> > >>> timeout from hotplug command start time").
> > >>>
> > >>> But we still see the alarming messages, so we should probably add a
> > >>> quirk to get rid of those.
> > >>>
> > >>> But I haven't given up on the idea of getting rid of the
> > >>> pciehp_remove() path. I'm not convinced yet that we actually need to
> > >>> do anything to shut this device down. I don't like the assumption
> > >>> that kexec requires this. The kexec is fundamentally just a branch,
> > >>> and anything we do before the branch (i.e., in the old kernel), we
> > >>> should also be able to do after the branch (i.e., in the kexec-ed
> > >>> kernel).
> > >>>
> > >>
> > >> In my experience with kexec, MSI type edge interrupts are harmless.
> > >> You might just see a few unhandled interrupt messages during boot
> > >> if something is pending from the first kernel.
> >
> > Unfortunately, that's not always the case.
> >
> > A number of GICv3/v4 implementations (a very common interrupt controller
> > on ARM servers) cannot be disabled, which means they will keep writing
> > to their pending tables long after kexec will have started the new
> > kernel. And since we don't track memory allocation across kexec, you
> > end up with significant chances of observing single-bit corruption as
> > interrupts carry on being delivered. Oh, and you won't actually be able
> > to take MSIs because you can't even reprogram the damn thing.
> >
> > Yes, this can be considered a HW bug.
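To make the single-bit corruption concrete, here is a minimal sketch in plain userspace C, not kernel code. It assumes the pending table is a bitmap with one bit per interrupt ID (byte intid / 8, bit intid % 8), which is my understanding of the LPI pending table layout. The names set_pending(), bits_differing() and stale_pending_write(), the 4 KiB buffer and the 0xA5 fill pattern are all made up for the illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FAKE_RAM_SIZE 4096u

/*
 * Set the pending bit for interrupt ID `intid` in a table that stores
 * one bit per interrupt: byte intid / 8, bit intid % 8.
 */
static void set_pending(uint8_t *table, uint32_t intid)
{
	table[intid / 8] |= (uint8_t)(1u << (intid % 8));
}

/* Count the bits that differ between two equally sized buffers. */
static unsigned int bits_differing(const uint8_t *a, const uint8_t *b,
				   size_t len)
{
	unsigned int n = 0;

	for (size_t i = 0; i < len; i++) {
		uint8_t x = a[i] ^ b[i];

		while (x) {
			n += x & 1u;
			x >>= 1;
		}
	}
	return n;
}

/*
 * Simulate a stale write: the buffer stands for memory that used to be
 * the old kernel's pending table, the new kernel has since filled it
 * with its own data (0xA5), and the interrupt controller still marks
 * one interrupt as pending. Returns how many bits of the new kernel's
 * data were changed. `intid` must be below FAKE_RAM_SIZE * 8.
 */
static unsigned int stale_pending_write(uint32_t intid)
{
	static uint8_t ram[FAKE_RAM_SIZE];
	uint8_t expected[FAKE_RAM_SIZE];

	memset(ram, 0xA5, sizeof(ram));
	memcpy(expected, ram, sizeof(ram));

	set_pending(ram, intid);	/* the "hardware" write */

	return bits_differing(ram, expected, sizeof(ram));
}
```

With this fill pattern, stale_pending_write(8193) flips exactly one bit, while stale_pending_write(8192) changes nothing because that bit happened to be set already. Whether the write is visible depends on the data it lands on, which is part of what makes this kind of corruption hard to spot.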
> >
> > >> It is the level interrupts that are more concerning. It remains pending
> > >> until the interrupt source is cleared. CPU never returns from the
> > >> interrupt handler to actually continue booting the second kernel.
> > >
> > > This makes me wonder why kexec doesn't disable all interrupt sources by
> > > itself instead of relying on the drivers shutdown routine. Some drivers
> > > don't even have a shutdown callback. Kexec could have done both as another
> > > example. Something like:
> > >
> > > 1. Call shutdown for all drivers if available.
> > > 2. Disable all interrupt sources in the interrupt controller
> > > 3. Start the new kernel.
> >
> > See above. Although you can shut off the end-point and to some extent
> > mask interrupts before jumping into the payload, it is not always
> > possible to go back to a reasonable state where you can actually take MSIs.
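For reference, the first two steps of the proposed sequence can be sketched as below. This is a toy model in userspace C under stated assumptions, not the kexec code path: struct fake_driver, struct fake_irq_source, fake_shutdown() and quiesce_before_kexec() are hypothetical names, and "masking" is just a flag. It shows the part that is easy (walk the drivers, call shutdown where it exists, mask everything) and, by omission, the part that is not: nothing here can model a controller that keeps writing to memory or cannot be reprogrammed afterwards.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a driver and an interrupt source. */
struct fake_driver {
	void (*shutdown)(struct fake_driver *drv);	/* may be NULL */
	bool quiesced;
};

struct fake_irq_source {
	bool masked;
};

/* Example shutdown callback: just records that it ran. */
static void fake_shutdown(struct fake_driver *drv)
{
	drv->quiesced = true;
}

/*
 * Steps 1 and 2 of the list above. Returns how many drivers had no
 * shutdown callback and were therefore left untouched by step 1.
 */
static size_t quiesce_before_kexec(struct fake_driver *drvs, size_t ndrvs,
				   struct fake_irq_source *irqs, size_t nirqs)
{
	size_t skipped = 0;

	/* Step 1: call shutdown for every driver that provides one. */
	for (size_t i = 0; i < ndrvs; i++) {
		if (drvs[i].shutdown)
			drvs[i].shutdown(&drvs[i]);
		else
			skipped++;
	}

	/* Step 2: mask every interrupt source, whoever owns it. */
	for (size_t i = 0; i < nirqs; i++)
		irqs[i].masked = true;

	return skipped;
}
```

A driver without a shutdown callback is only covered by step 2 in this model, so its device is masked but not quiesced. Step 3 (the actual jump) is left out on purpose.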
>
> This is exactly the sort of thing it would be nice to collect and
> document as part of the background of "why kexec works the way it
> does." It certainly helps explain things that are far from obvious if
> you don't have the background.
I'd certainly be happy to help with it if someone was willing to
kickstart such a document. kexec/kdump is a huge bag of "interesting"
tricks, and it has driven me mad over the past couple of months (I'm
typing this from a laptop that uses kexec as its bootloader, and it is
*not fun*).
M.
--
Jazz is not dead, it just smell funny.