linux-pci.vger.kernel.org archive mirror
From: "Derrick, Jonathan" <jonathan.derrick@intel.com>
To: "helgaas@kernel.org" <helgaas@kernel.org>
Cc: "rajatja@google.com" <rajatja@google.com>,
	"fred@fredlawl.com" <fred@fredlawl.com>,
	"ruscur@russell.cc" <ruscur@russell.cc>,
	"kbusch@kernel.org" <kbusch@kernel.org>,
	"Wysocki, Rafael J" <rafael.j.wysocki@intel.com>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"olof@lixom.net" <olof@lixom.net>,
	"sbobroff@linux.ibm.com" <sbobroff@linux.ibm.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"oohall@gmail.com" <oohall@gmail.com>,
	"mika.westerberg@linux.intel.com"
	<mika.westerberg@linux.intel.com>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"Patel, Mayurkumar" <mayurkumar.patel@intel.com>,
	"andriy.shevchenko@linux.intel.com" 
	<andriy.shevchenko@linux.intel.com>,
	"sathyanarayanan.kuppuswamy@linux.intel.com" 
	<sathyanarayanan.kuppuswamy@linux.intel.com>
Subject: Re: [PATCH v2 1/2] PCI/AER: Allow Native AER Host Bridges to use AER
Date: Mon, 27 Apr 2020 16:11:07 +0000
Message-ID: <ac3d3b2d3f0e678b792281a1debf5762f1d52b1f.camel@intel.com>
In-Reply-To: <20200424233016.GA218665@google.com>

Hi Bjorn,

On Fri, 2020-04-24 at 18:30 -0500, Bjorn Helgaas wrote:
> Hi Jon,
> 
> I'm glad you raised this because I think the way we handle
> FIRMWARE_FIRST is really screwed up.
> 
> On Mon, Apr 20, 2020 at 03:37:09PM -0600, Jon Derrick wrote:
> > Some platforms have a mix of ports whose capabilities can be negotiated
> > by _OSC, and some ports which are not described by ACPI and instead
> > managed by Native drivers. The existing Firmware-First HEST model can
> > incorrectly tag these Native, Non-ACPI ports as Firmware-First managed
> > ports by advertising the HEST Global Flag and matching the type and
> > class of the port (aer_hest_parse).
> > 
> > If the port requests Native AER through the Host Bridge's capability
> > settings, the AER driver should honor those settings and allow the port
> > to bind. This patch changes the definition of Firmware-First to exclude
> > ports whose Host Bridges request Native AER.
> > 
> > Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
> > ---
> >  drivers/pci/pcie/aer.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > index f4274d3..30fbd1f 100644
> > --- a/drivers/pci/pcie/aer.c
> > +++ b/drivers/pci/pcie/aer.c
> > @@ -314,6 +314,9 @@ int pcie_aer_get_firmware_first(struct pci_dev *dev)
> >  	if (pcie_ports_native)
> >  		return 0;
> >  
> > +	if (pci_find_host_bridge(dev->bus)->native_aer)
> > +		return 0;
> 
> I hope we don't have to complicate pcie_aer_get_firmware_first() by
> adding this "native_aer" check here.  I'm not sure what we actually
> *should* do based on FIRMWARE_FIRST, but I don't think the current
> uses really make sense.
> 
> I think Linux makes too many assumptions based on the FIRMWARE_FIRST
> bit.  The ACPI spec really only says (ACPI v6.3, sec 18.3.2.4):
> 
>   If set, FIRMWARE_FIRST indicates to the OSPM that system firmware
>   will handle errors from this source first.
> 
>   If FIRMWARE_FIRST is set in the flags field, the Enabled field [of
>   the HEST AER structure] is ignored by the OSPM.
> 
> I do not see anything there about who owns the AER Capability, but
> Linux assumes that if FIRMWARE_FIRST is set, firmware must own the AER
> Capability.  I think that's reading too much into the spec.
> 
> We already have _OSC, which *does* explicitly talk about who owns the
> AER Capability, and I think we should rely on that.  If firmware
> doesn't want the OS to touch the AER Capability, it should decline to
> give ownership to the OS via _OSC.
> 
> >  	if (!dev->__aer_firmware_first_valid)
> >  		aer_set_firmware_first(dev);
> >  	return dev->__aer_firmware_first;
> > -- 
> > 1.8.3.1
> > 

Here is a little bit of spec reading along with my interpretation.
Much of this seems to be layers upon layers of possibly conflicting,
yet intentionally vague, descriptions.

_OSC seems to say that the OSPM can take over AER handling (6.2.11.3):
PCI Express Advanced Error Reporting (AER) control
   The OS sets this bit to 1 to request control over PCI Express AER.
   If the OS successfully receives control of this feature, it must
   handle error reporting through the AER Capability as described in
   the PCI Express Base Specification.


For AER and DPC, the ACPI root port enumeration will properly set
native_aer and native_dpc based on what _OSC granted:

struct pci_bus *acpi_pci_root_create(struct acpi_pci_root *root,
...
	if (!(root->osc_control_set & OSC_PCI_EXPRESS_AER_CONTROL))
		host_bridge->native_aer = 0;
	if (!(root->osc_control_set & OSC_PCI_EXPRESS_PME_CONTROL))
		host_bridge->native_pme = 0;
	if (!(root->osc_control_set & OSC_PCI_EXPRESS_LTR_CONTROL))
		host_bridge->native_ltr = 0;
	if (!(root->osc_control_set & OSC_PCI_EXPRESS_DPC_CONTROL))
		host_bridge->native_dpc = 0;
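
To make the effect of that snippet concrete, here is a stand-alone
sketch of the same masking logic. The struct and function names are
made up for illustration, and the OSC_* values are what I believe
include/linux/acpi.h defines, so treat them as assumptions rather than
a quote from the tree. It also assumes the native_* bits default to 1,
which is why a bridge that never goes through _OSC keeps them set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the _OSC control bits in include/linux/acpi.h. */
#define OSC_PCI_EXPRESS_PME_CONTROL	0x00000004
#define OSC_PCI_EXPRESS_AER_CONTROL	0x00000008
#define OSC_PCI_EXPRESS_LTR_CONTROL	0x00000020
#define OSC_PCI_EXPRESS_DPC_CONTROL	0x00000080

/* Hypothetical stand-in for the native_* bits of struct pci_host_bridge. */
struct bridge_native {
	unsigned int native_aer:1;
	unsigned int native_pme:1;
	unsigned int native_ltr:1;
	unsigned int native_dpc:1;
};

/* Assumed default: the OS owns every feature until told otherwise. */
static struct bridge_native bridge_native_default(void)
{
	struct bridge_native n = { 1, 1, 1, 1 };

	return n;
}

/* Clear each feature that firmware did not grant to the OS via _OSC. */
static void bridge_apply_osc(struct bridge_native *n, uint32_t osc_control_set)
{
	assert(n != NULL);

	if (!(osc_control_set & OSC_PCI_EXPRESS_AER_CONTROL))
		n->native_aer = 0;
	if (!(osc_control_set & OSC_PCI_EXPRESS_PME_CONTROL))
		n->native_pme = 0;
	if (!(osc_control_set & OSC_PCI_EXPRESS_LTR_CONTROL))
		n->native_ltr = 0;
	if (!(osc_control_set & OSC_PCI_EXPRESS_DPC_CONTROL))
		n->native_dpc = 0;
}
```

A bridge that is not described by ACPI would only ever see
bridge_native_default() in this model, so native_aer stays set for it.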

Since DPC control was only defined in an ECN [1], I would imagine the
AER control will need to cover DPC on legacy platforms that predate the
ECN.



The complication is that HEST also seems to describe how ports (and
other devices) are managed either individually or globally:

Table 18-387  PCI Express Root Port AER Structure
...
Flags:
   [0] - FIRMWARE_FIRST: If set, this bit indicates to the OSPM that
   system firmware will handle errors from this source
   [1] - GLOBAL: If set, indicates that the settings contained in this
   structure apply globally to all PCI Express Devices. All other bits
   must be set to zero
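
As a rough sketch of how I read those two flags, the following shows
why a GLOBAL entry ends up tagging every port, including ones that
ACPI does not describe. The struct layouts and function names are
hypothetical and heavily trimmed. The real matching in aer_hest_parse
also compares the port type, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions taken from the Flags description quoted above. */
#define HEST_FIRMWARE_FIRST	(1u << 0)
#define HEST_GLOBAL		(1u << 1)

/* Hypothetical, trimmed-down view of a HEST PCIe AER entry. */
struct hest_aer_entry {
	uint8_t flags;
	uint16_t segment;
	uint8_t bus;
	uint8_t device;
	uint8_t function;
};

/* Hypothetical address of the port being checked. */
struct port_addr {
	uint16_t segment;
	uint8_t bus;
	uint8_t device;
	uint8_t function;
};

/* A GLOBAL entry covers every port; otherwise the address must match. */
static bool hest_entry_covers(const struct hest_aer_entry *e,
			      const struct port_addr *p)
{
	assert(e != NULL && p != NULL);

	if (e->flags & HEST_GLOBAL)
		return true;

	return e->segment == p->segment && e->bus == p->bus &&
	       e->device == p->device && e->function == p->function;
}

/* HEST alone says "firmware first" if a covering entry has bit 0 set. */
static bool hest_says_firmware_first(const struct hest_aer_entry *e,
				     const struct port_addr *p)
{
	return hest_entry_covers(e, p) &&
	       (e->flags & HEST_FIRMWARE_FIRST) != 0;
}
```

With a GLOBAL | FIRMWARE_FIRST entry, hest_says_firmware_first()
returns true for any port address at all, which is the mis-tagging
described in the patch.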


The _OSC definition seems to contradict or negate the above
FIRMWARE_FIRST definition, which says that only firmware will handle
errors. This is a bit different from the IA-32 MCE definition, which
allows for a GHES_ASSIST condition: that would still be Firmware
'First', but it does allow the error to be received by OSPM AER via
GHES:

Table 18-385  IA-32 Architecture Corrected Machine Check Structure
   [0] - FIRMWARE_FIRST: If set, this bit indicates that system
   firmware will handle errors from this source first.
   [2] - GHES_ASSIST: If set, this bit indicates that although OSPM is
   responsible for directly handling the error (as expected when
   FIRMWARE_FIRST is not set), system firmware reports additional
   information in the context of an interrupt generated by the error.
   The additional information is reported in a Generic Hardware Error
   Source structure with a matching Related Source Id.


I think Linux needs to assume that devices either enumerated
individually in HEST or covered globally by HEST should be managed by
FFS. However, it seems that Linux should also be correlating that with
_OSC, as _OSC seems to directly contradict and possibly supersede the
HEST expectation.
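
Put as code, the correlation I have in mind looks roughly like the
sketch below. It follows the same ordering as the check in the patch:
the _OSC result is consulted first, and HEST only matters when the OS
was not given AER. The struct and function are hypothetical and exist
only to illustrate the precedence, they are not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what is known about one port. */
struct port_view {
	bool native_aer;		/* host bridge kept AER (via _OSC or by default) */
	bool hest_firmware_first;	/* a matching or GLOBAL HEST entry sets FIRMWARE_FIRST */
};

/* _OSC ownership takes precedence; HEST decides only when the OS lacks AER. */
static bool port_is_firmware_first(const struct port_view *p)
{
	assert(p != NULL);

	if (p->native_aer)
		return false;

	return p->hest_firmware_first;
}
```

Under this model a native, non-ACPI port is never treated as
firmware-first, even when HEST advertises a GLOBAL FIRMWARE_FIRST
entry.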



[1] https://members.pcisig.com/wg/PCI-SIG/document/12888

