linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: "Alex G." <mr.nuke.me@gmail.com>
Cc: Alex_Gagniuc@Dellteam.com, bhelgaas@google.com,
	keith.busch@intel.com, Austin.Bolen@dell.com,
	Shyam.Iyer@dell.com, fred@fredlawl.com, poza@codeaurora.org,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] PCI/AER: Do not clear AER bits if we don't own AER
Date: Thu, 9 Aug 2018 14:18:32 -0500	[thread overview]
Message-ID: <20180809191832.GC113140@bhelgaas-glaptop.roam.corp.google.com> (raw)
In-Reply-To: <d235e71d-647f-00d1-8a21-2784b7939c25@gmail.com>

On Thu, Aug 09, 2018 at 02:00:23PM -0500, Alex G. wrote:
> On 08/09/2018 01:29 PM, Bjorn Helgaas wrote:
> > On Thu, Aug 09, 2018 at 04:46:32PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> > > On 08/09/2018 09:16 AM, Bjorn Helgaas wrote:
> (snip_
> > > >     enable_ecrc_checking()
> > > >     disable_ecrc_checking()
> > > 
> > > I don't immediately see how this would affect FFS, but the bits are part
> > > of the AER capability structure. According to the FFS model, those would
> > > be owned by FW, and we'd have to avoid touching them.
> > 
> > Per ACPI v6.2, sec 18.3.2.4, the HEST may contain entries for Root
> > Ports that contain the FIRMWARE_FIRST flag as well as values the OS is
> > supposed to write to several AER capability registers.  It looks like
> > we currently ignore everything except the FIRMWARE_FIRST and GLOBAL
> > flags (ACPI_HEST_FIRMWARE_FIRST and ACPI_HEST_GLOBAL in Linux).
> > 
> > That seems like a pretty major screwup and more than I want to fix
> > right now.
> 
> The logic is not very clear, but I think it goes like this:
> For GLOBAL and FFS, disable native AER everywhere.
> When !GLOBAL and FFS, then only disable native AER for the root port
> described by the HEST entry.

I agree the code is convoluted, but that sounds right to me.

What I meant is that we ignore the values the HEST entry tells us
we're supposed to write to Device Control and the AER Uncorrectable
Error Mask, Uncorrectable Error Severity, Correctable Error Mask, and
AER Capabilities and Control.

> > > For practical considerations this is not an issue today. The ACPI error
> > > handling code currently crashes when it encounters any fatal error, so
> > > we wouldn't hit this in the FFS case.
> > 
> > I wasn't aware the firmware-first path was *that* broken.  Are there
> > problem reports for this?  Is this a regression?
> 
> It's been like this since, I believe, 3.10, and probably much earlier. All
> reports that I have seen of linux crashing on surprise hot-plug have been
> caused by the panic() call in the apei code. Dell BIOSes do an extreme
> amount of work to determine when it's safe to _not_ report errors to the OS,
> since all known OSes crash on this path.

Oh, is this the __ghes_panic() path?  If so, I'm going to turn away
and plead ignorance unless the PCI core is doing something wrong that
eventually results in that panic.

  reply	other threads:[~2018-08-09 21:44 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-17 15:31 [PATCH] PCI/AER: Do not clear AER bits if we don't own AER Alexandru Gagniuc
2018-07-17 15:41 ` Sinan Kaya
2018-07-19 15:55   ` Alex G.
2018-07-19 16:58     ` Sinan Kaya
2018-07-19 19:56       ` Alex G.
2018-07-23 16:52 ` [PATCH v2] " Alexandru Gagniuc
2018-07-24 15:59   ` Alex G.
2018-07-30 23:35     ` [PATCH v3] " Alexandru Gagniuc
2018-08-08  1:14       ` Bjorn Helgaas
2018-08-08  3:46         ` Alex G.
2018-07-24 17:08   ` [PATCH v2] " kbuild test robot
2018-07-25  1:03   ` kbuild test robot
2018-08-09 14:15 ` [PATCH] " Bjorn Helgaas
2018-08-09 16:46   ` Alex_Gagniuc
2018-08-09 18:29     ` Bjorn Helgaas
2018-08-09 19:00       ` Alex G.
2018-08-09 19:18         ` Bjorn Helgaas [this message]
2018-08-09 19:42           ` Alex G.

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180809191832.GC113140@bhelgaas-glaptop.roam.corp.google.com \
    --to=helgaas@kernel.org \
    --cc=Alex_Gagniuc@Dellteam.com \
    --cc=Austin.Bolen@dell.com \
    --cc=Shyam.Iyer@dell.com \
    --cc=bhelgaas@google.com \
    --cc=fred@fredlawl.com \
    --cc=keith.busch@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=mr.nuke.me@gmail.com \
    --cc=poza@codeaurora.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).