linux-acpi.vger.kernel.org archive mirror
* [RFC PATCH 0/2] PCI/AER: handling for RCiEPs
@ 2020-05-21 17:31 Jonathan Cameron
  2020-05-21 17:31 ` [PATCH 1/2] PCI/AER: Do not reset the device status if doing firmware first handling Jonathan Cameron
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Jonathan Cameron @ 2020-05-21 17:31 UTC
  To: Bjorn Helgaas, linux-pci
  Cc: linux-acpi, linuxarm, Lorenzo Pieralisi, Jonathan Cameron

This RFC adds minimal AER handling for Root Complex integrated End Points
(RCiEPs).   These report their errors via a Root Complex Event Collector
(RCEC).  Note that this series does not provide a driver for said RCEC
because we do not need to do anything to it on a Hardware-Reduced ACPI
platform such as the ARM server we wish to support.

My assumption is that anyone needing RCEC support will need to enumerate
the association between the RCEC and RCiEPs, setting the rcec pointer
added to struct pci_dev.  If an alternative mechanism is preferred, let me
know.
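As a rough illustration of what enumerating that association could involve, the sketch below models in userspace C how an RCEC's Endpoint Association Extended Capability might be decoded to decide which RCiEPs get their rcec pointer set. It assumes the register layout as I understand the PCIe spec: a 32-bit bitmap at offset 0x04 in which bit n selects device n on the RCEC's own bus, and (capability version 2 and later only) an Associated Bus Numbers register at offset 0x08 with the next bus in bits 15:8 and the last bus in bits 23:16. struct rcec_assoc, rcec_assoc_decode() and rcec_assoc_rciep() are hypothetical names, not existing kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded association state for one RCEC (hypothetical model). */
struct rcec_assoc {
	uint8_t bus;		/* bus number the RCEC itself is on */
	uint32_t bitmap;	/* bit n set: device n on the RCEC's bus */
	uint8_t next_bus;	/* first associated bus (version >= 2) */
	uint8_t last_bus;	/* last associated bus (version >= 2) */
};

/* Decode raw values assumed to be read from offsets 0x04 and 0x08. */
static struct rcec_assoc rcec_assoc_decode(uint8_t rcec_bus, unsigned int cap_ver,
					   uint32_t bitmap_reg, uint32_t busn_reg)
{
	struct rcec_assoc a = { .bus = rcec_bus, .bitmap = bitmap_reg };

	if (cap_ver >= 2) {
		a.next_bus = (busn_reg >> 8) & 0xff;
		a.last_bus = (busn_reg >> 16) & 0xff;
	} else {
		/* No bus-number register: make the bus range empty. */
		a.next_bus = 0xff;
		a.last_bus = 0x00;
	}
	return a;
}

/* Should the RCiEP at (bus, device) have its rcec pointer set to this RCEC? */
static bool rcec_assoc_rciep(const struct rcec_assoc *a, uint8_t bus,
			     unsigned int device)
{
	assert(a != NULL && device < 32);

	/* Same bus as the RCEC: consult the per-device bitmap. */
	if (bus == a->bus)
		return (a->bitmap >> device) & 1u;

	/* Other buses: associated if inside [next_bus, last_bus]. */
	return bus >= a->next_bus && bus <= a->last_bus;
}
```

For example, with a version 2 capability whose bitmap is (1u << 3) | (1u << 5) and whose bus register is 0x040200, devices 3 and 5 on the RCEC's bus match, as does any RCiEP on buses 2 through 4.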

The open questions are mainly in the patch 2 description.  In particular,
a number of the normal reset actions make little sense for an RCiEP (slot
reset?), so I'm unclear whether we should just call them all anyway.

Patch 1 avoids a reset of the Device Status register on the root port in
a firmware first flow.  This reset can occur in the normal EP flow as
well.  It probably shouldn't, but the effects are likely minor (as
firmware should have reset the register already).

All comments welcome.  NB. We only care about the Hardware-Reduced
firmware first case, so I'm more than happy to rip out the hints of
explicit RCEC support if people would prefer.  I just put them in
for the RFC to show how that just possibly 'might' work.

There are other places that I suspect would need to take the RCEC case
into account that I have not addressed here.  Whilst we do have real
hardware RCiEPs, testing here was done with QEMU to allow comparison
of the flows for RCiEPs and EPs that were otherwise identical.
It is also easier to add whatever error injection is needed in QEMU than
on real hardware.

Only the Hardware-Reduced ACPI case has been tested, as we would need
to add a bunch more stuff to QEMU to test the alternative forms
of firmware first or kernel first handling (which we don't care about :)

Jonathan Cameron (2):
  PCI/AER: Do not reset the device status if doing firmware first
    handling.
  PCI/AER: Add partial initial support for RCiEPs using RCEC or firmware
    first

 drivers/pci/pcie/aer.c |  3 +++
 drivers/pci/pcie/err.c | 61 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h    |  1 +
 3 files changed, 65 insertions(+)

-- 
2.19.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] PCI/AER: Do not reset the device status if doing firmware first handling.
  2020-05-21 17:31 [RFC PATCH 0/2] PCI/AER: handling for RCiEPs Jonathan Cameron
@ 2020-05-21 17:31 ` Jonathan Cameron
  2020-06-16 17:47   ` Bjorn Helgaas
  2020-05-21 17:31 ` [PATCH 2/2] PCI/AER: Add partial initial support for RCiEPs using RCEC or firmware first Jonathan Cameron
  2020-06-16 10:47 ` [RFC PATCH 0/2] PCI/AER: handling for RCiEPs Jonathan Cameron
  2 siblings, 1 reply; 9+ messages in thread
From: Jonathan Cameron @ 2020-05-21 17:31 UTC
  To: Bjorn Helgaas, linux-pci
  Cc: linux-acpi, linuxarm, Lorenzo Pieralisi, Jonathan Cameron

pci_aer_clear_device_status() currently resets the device status even when
firmware first handling is going on.  In particular, it resets it on the
root port.

This has been discussed previously:
https://lore.kernel.org/patchwork/patch/427375/.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 drivers/pci/pcie/aer.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index f4274d301235..43e78b97ace6 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -373,6 +373,9 @@ void pci_aer_clear_device_status(struct pci_dev *dev)
 {
 	u16 sta;
 
+	if (pcie_aer_get_firmware_first(dev))
+		return;
+
 	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
 	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
 }
-- 
2.19.1



* [PATCH 2/2] PCI/AER: Add partial initial support for RCiEPs using RCEC or firmware first
  2020-05-21 17:31 [RFC PATCH 0/2] PCI/AER: handling for RCiEPs Jonathan Cameron
  2020-05-21 17:31 ` [PATCH 1/2] PCI/AER: Do not reset the device status if doing firmware first handling Jonathan Cameron
@ 2020-05-21 17:31 ` Jonathan Cameron
  2020-06-16 10:47 ` [RFC PATCH 0/2] PCI/AER: handling for RCiEPs Jonathan Cameron
  2 siblings, 0 replies; 9+ messages in thread
From: Jonathan Cameron @ 2020-05-21 17:31 UTC
  To: Bjorn Helgaas, linux-pci
  Cc: linux-acpi, linuxarm, Lorenzo Pieralisi, Jonathan Cameron

Note that this provides complete support for our use case on an ARM server
using Hardware Reduced ACPI, and adds an appropriate place for an RCEC
driver to hook in if someone else cares to write one, either for firmware
first handling on non-Hardware Reduced ACPI or for kernel first AER handling.

For Root Complex integrated End Points (RCiEPs) there is no root port to
discover and hence we cannot walk the bus from the root port to do
appropriate resets.
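The new code keys on the Device/Port Type field of the PCI Express Capabilities register, which pci_pcie_type() extracts. A minimal userspace sketch of that dispatch, assuming the constant values from include/uapi/linux/pci_regs.h; enum recovery_path, pcie_type_from_flags() and choose_path() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_FLAGS_TYPE	0x00f0	/* Device/Port Type field */
#define PCI_EXP_TYPE_ENDPOINT	0x0
#define PCI_EXP_TYPE_ROOT_PORT	0x4
#define PCI_EXP_TYPE_RC_END	0x9	/* Root Complex integrated Endpoint */
#define PCI_EXP_TYPE_RC_EC	0xa	/* Root Complex Event Collector */

/* Hypothetical label for which recovery path would be taken. */
enum recovery_path {
	PATH_WALK_FROM_PORT,	/* existing path: walk the bus below a port */
	PATH_RCIEP,		/* new path: no root port above the device */
};

/* Same extraction pci_pcie_type() performs: bits 7:4 of the flags. */
static int pcie_type_from_flags(uint16_t exp_flags)
{
	return (exp_flags & PCI_EXP_FLAGS_TYPE) >> 4;
}

/* RCiEPs have no port above them, so they need the separate branch. */
static enum recovery_path choose_path(uint16_t exp_flags)
{
	return pcie_type_from_flags(exp_flags) == PCI_EXP_TYPE_RC_END ?
		PATH_RCIEP : PATH_WALK_FROM_PORT;
}
```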

The PCI specification provides Root Complex Event Collectors to deal with
this circumstance.  These are peers of the RCiEPs that provide (amongst
other things) collection and interrupt facilities for AER reporting for a
set of RCiEPs in the same root complex.

In the case of a Hardware Reduced ACPI platform, the AER errors are
reported via a GHESv2 path using CPER records as defined in the UEFI
specification.  These are intended to provide complete information and
an appropriate handshake in a fashion that does not require a specific
form of error reporting hardware.  This is in contrast to AER handling via
the various HEST entries for PCI Root Port, PCI Device, etc., where we do
require direct access to the RCEC.

As such, my interpretation of the spec is that a Hardware Reduced ACPI
platform should not access the RCEC from the OS at all during AER handling,
and is in fact welcome to use non-standard hardware interfaces to provide
the equivalent functionality in any fashion it wishes (as it is all hidden
behind the firmware).

Hence I am making the provision of an RCEC optional.
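A minimal userspace sketch of what "optional" means for the tail of the RCiEP branch in this patch: status is cleared on the RCEC only when one has been associated, and the success message is logged against the RCEC if present, otherwise against the RCiEP itself. struct fake_pci_dev and finish_rciep_recovery() are hypothetical stand-ins. The non-fatal clear keeps only the bits whose severity bit is set, which is my understanding of what pci_aer_clear_nonfatal_status() does with PCI_ERR_UNCOR_SEVER.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, cut-down stand-in for struct pci_dev. */
struct fake_pci_dev {
	const char *name;
	struct fake_pci_dev *rcec;	/* NULL if no RCEC was associated */
	uint16_t devsta;		/* models the RW1C error bits of PCI_EXP_DEVSTA */
	uint32_t uncor_status;		/* models PCI_ERR_UNCOR_STATUS */
	uint32_t uncor_sever;		/* models PCI_ERR_UNCOR_SEVER */
};

/* Returns the device the "recovery successful" message is logged against. */
static struct fake_pci_dev *finish_rciep_recovery(struct fake_pci_dev *dev)
{
	struct fake_pci_dev *reporter;

	assert(dev != NULL);
	reporter = dev->rcec ? dev->rcec : dev;

	if (dev->rcec) {
		/* Like pci_aer_clear_device_status(): clear the error bits. */
		dev->rcec->devsta = 0;
		/* Like pci_aer_clear_nonfatal_status(): keep fatal bits only. */
		dev->rcec->uncor_status &= dev->rcec->uncor_sever;
	}

	printf("%s: device recovery successful\n", reporter->name);
	return reporter;
}
```

With no RCEC, nothing is cleared and the RCiEP reports its own recovery, which matches the Hardware Reduced ACPI case where firmware owns the collector.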

The aim of the rest of the code was to replicate the actions that would
have occurred if this had been an EP below a root port. Some of them make
absolutely no sense, but I hope this RFC can start a discussion on what
we should be doing under these circumstances.
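To make the sequence concrete, here is a userspace model of the order in which the RCiEP branch in this patch would invoke driver callbacks. It assumes a simplified vote merge (a NONE vote keeps the current result, anything else replaces it), which is not the full logic of the kernel's merge_result(). enum ers_result, struct driver_votes, the CALLED_* bits and rciep_recover() are hypothetical names loosely following pci_ers_result_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Named after a subset of pci_ers_result_t; numeric values differ. */
enum ers_result {
	ERS_NONE,
	ERS_CAN_RECOVER,
	ERS_NEED_RESET,
	ERS_DISCONNECT,
	ERS_RECOVERED,
};

/* What a hypothetical driver votes at each stage. */
struct driver_votes {
	enum ers_result detected;	/* error_detected() */
	enum ers_result mmio_enabled;	/* mmio_enabled() */
	enum ers_result slot_reset;	/* slot_reset() */
};

/* Bits recording which callbacks the flow invoked. */
#define CALLED_DETECTED	0x1
#define CALLED_MMIO	0x2
#define CALLED_SLOT	0x4
#define CALLED_RESUME	0x8

/* Simplified merge: NONE keeps the current result, anything else wins. */
static enum ers_result merge(enum ers_result cur, enum ers_result vote)
{
	return vote == ERS_NONE ? cur : vote;
}

/* Follows the ordering of the RCiEP branch; returns true on recovery. */
static bool rciep_recover(const struct driver_votes *v, bool frozen,
			  unsigned int *called)
{
	enum ers_result status = ERS_CAN_RECOVER;

	assert(v != NULL && called != NULL);
	*called = CALLED_DETECTED;
	status = merge(status, v->detected);
	if (frozen)
		return false;	/* no link to reset for an RCiEP */

	if (status == ERS_CAN_RECOVER) {
		*called |= CALLED_MMIO;
		status = merge(ERS_RECOVERED, v->mmio_enabled);
	}
	if (status == ERS_NEED_RESET) {
		*called |= CALLED_SLOT;	/* no actual reset is performed */
		status = merge(ERS_RECOVERED, v->slot_reset);
	}
	if (status != ERS_RECOVERED)
		return false;

	*called |= CALLED_RESUME;
	return true;
}
```

The two cases this RFC asks about show up directly: a frozen channel fails straight after error_detected(), and a NEED_RESET vote leads to slot_reset() being called without any reset having happened.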

It probably makes sense to pull this new block of code out to a separate
function but for the RFC I've left it in place to keep it next to the
existing path.

It appears that the current kernel first code does not support detecting
the 'multiple error' bits being set in the root port error status
register.  This seems like a limitation for both the normal EP / Root Port
case and for RCiEPs.
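For reference, a small sketch of why those bits matter, using the Root Error Status bit values from include/uapi/linux/pci_regs.h. My understanding of the spec is that the Error Source Identification register only latches the requester ID of the first ERR_COR and of the first ERR_FATAL/NONFATAL message until the corresponding status bit is cleared, so when a "multiple" bit is set the latched ID alone does not identify every reporting device. root_status_needs_scan() and root_err_source() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Root Error Status bits, as defined in include/uapi/linux/pci_regs.h. */
#define PCI_ERR_ROOT_COR_RCV		0x00000001 /* ERR_COR received */
#define PCI_ERR_ROOT_MULTI_COR_RCV	0x00000002 /* multiple ERR_COR */
#define PCI_ERR_ROOT_UNCOR_RCV		0x00000004 /* ERR_FATAL/NONFATAL */
#define PCI_ERR_ROOT_MULTI_UNCOR_RCV	0x00000008 /* multiple FATAL/NONFATAL */

/* True if more messages arrived than the source ID register recorded. */
static bool root_status_needs_scan(uint32_t root_status)
{
	return root_status & (PCI_ERR_ROOT_MULTI_COR_RCV |
			      PCI_ERR_ROOT_MULTI_UNCOR_RCV);
}

/* Requester ID latched for the first message of the given class. */
static uint16_t root_err_source(uint32_t err_src, bool correctable)
{
	return (uint16_t)(correctable ? err_src & 0xffff : err_src >> 16);
}
```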

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 drivers/pci/pcie/err.c | 61 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h    |  1 +
 2 files changed, 62 insertions(+)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 14bb8f54723e..d34be4483f73 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -153,6 +153,67 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
 	struct pci_bus *bus;
 
+	if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) {
+		struct pci_dev *rcec = dev->rcec;
+		/* Not clear this makes any sense - we can't reset link anyway...*/
+		if (state == pci_channel_io_frozen) {
+			report_frozen_detected(dev, &status);
+			pci_err(dev, "io is frozen and cannot reset link\n");
+			goto failed;
+		} else {
+			report_normal_detected(dev, &status);
+		}
+
+		if (status == PCI_ERS_RESULT_CAN_RECOVER) {
+			status = PCI_ERS_RESULT_RECOVERED;
+			pci_dbg(dev, "broadcast mmio_enabled message\n");
+			report_mmio_enabled(dev, &status);
+		}
+
+		if (status == PCI_ERS_RESULT_NEED_RESET) {
+			/* No actual slot reset possible */
+			status = PCI_ERS_RESULT_RECOVERED;
+			pci_dbg(dev, "broadcast slot_reset message\n");
+			report_slot_reset(dev, &status);
+		}
+
+		if (status != PCI_ERS_RESULT_RECOVERED)
+			goto failed;
+
+		report_resume(dev, &status);
+
+		/*
+		 * These two should be called on the RCEC, but in the case
+		 * of firmware first they should be no-ops, given that
+		 * in a Hardware Reduced ACPI system it is possible there
+		 * is no standards-compliant RCEC at all.
+		 *
+		 * Add some sort of check on what type of HEST entries we have?
+		 */
+		if (rcec) {
+			/*
+			 * Unlike the upstream port case for an EP, we have not
+			 * issued a reset on all devices the RCEC handles, so
+			 * perhaps we should be more careful about resetting
+			 * the status registers on the RCEC?
+			 *
+			 * In particular, we may need to provide a means to handle
+			 * the multiple error bits being set in PCI_ERR_ROOT_STATUS
+			 */
+			pci_aer_clear_device_status(rcec);
+			pci_aer_clear_nonfatal_status(rcec);
+			/*
+			 * Non RCiEP case uses the downstream port above the device
+			 * for this message.
+			 */
+			pci_info(rcec, "device recovery successful\n");
+		} else {
+			pci_info(dev, "device recovery successful\n");
+		}
+
+		return status;
+	}
+
 	/*
 	 * Error recovery runs on all subordinates of the first downstream port.
 	 * If the downstream port detected the error, it is cleared at the end.
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 83ce1cdf5676..cb21dfe05f8c 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -298,6 +298,7 @@ struct pci_dev {
 	struct list_head bus_list;	/* Node in per-bus list */
 	struct pci_bus	*bus;		/* Bus this device is on */
 	struct pci_bus	*subordinate;	/* Bus this device bridges to */
+	struct pci_dev	*rcec;		/* Root Complex Event Collector used */
 
 	void		*sysdata;	/* Hook for sys-specific extension */
 	struct proc_dir_entry *procent;	/* Device entry in /proc/bus/pci */
-- 
2.19.1



* Re: [RFC PATCH 0/2] PCI/AER: handling for RCiEPs
  2020-05-21 17:31 [RFC PATCH 0/2] PCI/AER: handling for RCiEPs Jonathan Cameron
  2020-05-21 17:31 ` [PATCH 1/2] PCI/AER: Do not reset the device status if doing firmware first handling Jonathan Cameron
  2020-05-21 17:31 ` [PATCH 2/2] PCI/AER: Add partial initial support for RCiEPs using RCEC or firmware first Jonathan Cameron
@ 2020-06-16 10:47 ` Jonathan Cameron
  2 siblings, 0 replies; 9+ messages in thread
From: Jonathan Cameron @ 2020-06-16 10:47 UTC
  To: Bjorn Helgaas, linux-pci; +Cc: linux-acpi, linuxarm, Lorenzo Pieralisi

Hi All,

Now that the merge window is closed, I'd appreciate any comments on this
series from both ACPI and PCI related people.

Thanks,

Jonathan



* Re: [PATCH 1/2] PCI/AER: Do not reset the device status if doing firmware first handling.
  2020-05-21 17:31 ` [PATCH 1/2] PCI/AER: Do not reset the device status if doing firmware first handling Jonathan Cameron
@ 2020-06-16 17:47   ` Bjorn Helgaas
  2020-06-16 18:00     ` Kuppuswamy, Sathyanarayanan
  2020-06-17  9:18     ` Jonathan Cameron
  0 siblings, 2 replies; 9+ messages in thread
From: Bjorn Helgaas @ 2020-06-16 17:47 UTC
  To: Jonathan Cameron
  Cc: Bjorn Helgaas, linux-pci, linux-acpi, linuxarm,
	Lorenzo Pieralisi, Kuppuswamy Sathyanarayanan

[+cc Sathy]

On Fri, May 22, 2020 at 01:31:33AM +0800, Jonathan Cameron wrote:
> pci_aer_clear_device_status() currently resets the device status even when
> firmware first handling is going on.  In particular it resets it on the
> root port.
>
> This has been discussed previously
> https://lore.kernel.org/patchwork/patch/427375/.

I don't think this reference is really pertinent, is it?  That patch
to b2c8881da764 changes pci_cleanup_aer_uncorrect_error_status() so it
doesn't clear PCI_ERR_UNCOR_STATUS in "firmware-first" mode.

But your patch only affects PCI_EXP_DEVSTA, not PCI_ERR_UNCOR_STATUS.

> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>  drivers/pci/pcie/aer.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index f4274d301235..43e78b97ace6 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -373,6 +373,9 @@ void pci_aer_clear_device_status(struct pci_dev *dev)
>  {
>  	u16 sta;
>  
> +	if (pcie_aer_get_firmware_first(dev))
> +		return;

This needs to be adjusted because pcie_aer_get_firmware_first() no
longer exists after 708b20003624 ("PCI/AER: Remove HEST/FIRMWARE_FIRST
parsing for AER ownership").

This will use the _OSC AER ownership bit to gate clearing of the
status bits in the PCIe capability (not the AER capability).

I think that's the right thing to do, but it's certainly not obvious
from the _OSC description in the PCI Firmware Spec r3.2.  I think we
need a pointer to the ECN that clarifies this, i.e., sec 4.5.1 of:

  System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24,
  2020, affecting PCI Firmware Specification, Rev. 3.2
  https://members.pcisig.com/wg/PCI-SIG/document/14076
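A userspace sketch of the gating described above, assuming the "OS owns AER" decision is available as a boolean (pcie_aer_is_native() is the helper suggested later in this thread). It also shows why the read-then-write-back in pci_aer_clear_device_status() clears the error bits: they are RW1C, so writing back the value just read clears exactly the bits that were set, while read-only bits are unaffected. struct fake_dev, write_devsta() and clear_device_status() are hypothetical names; the DEVSTA bit values are from include/uapi/linux/pci_regs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* PCI_EXP_DEVSTA error bits, from include/uapi/linux/pci_regs.h. */
#define PCI_EXP_DEVSTA_CED	0x0001	/* Correctable Error Detected */
#define PCI_EXP_DEVSTA_NFED	0x0002	/* Non-Fatal Error Detected */
#define PCI_EXP_DEVSTA_FED	0x0004	/* Fatal Error Detected */
#define PCI_EXP_DEVSTA_URD	0x0008	/* Unsupported Request Detected */
#define DEVSTA_RW1C_MASK	(PCI_EXP_DEVSTA_CED | PCI_EXP_DEVSTA_NFED | \
				 PCI_EXP_DEVSTA_FED | PCI_EXP_DEVSTA_URD)

/* Hypothetical model of one device's register and _OSC outcome. */
struct fake_dev {
	bool aer_native;	/* OS was granted AER control via _OSC */
	uint16_t devsta;	/* error bits RW1C, other bits read-only */
};

/* RW1C write: each 1 in an error bit clears it; other bits are ignored. */
static void write_devsta(struct fake_dev *dev, uint16_t val)
{
	dev->devsta &= (uint16_t)~(val & DEVSTA_RW1C_MASK);
}

/* Sketch of the gated clear suggested in review. */
static void clear_device_status(struct fake_dev *dev)
{
	uint16_t sta;

	assert(dev != NULL);
	if (!dev->aer_native)
		return;		/* firmware owns AER: leave status alone */

	sta = dev->devsta;	/* read ... */
	write_devsta(dev, sta);	/* ... and write back to clear what was set */
}
```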

>  	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
>  	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
>  }
> -- 
> 2.19.1
> 


* Re: [PATCH 1/2] PCI/AER: Do not reset the device status if doing firmware first handling.
  2020-06-16 17:47   ` Bjorn Helgaas
@ 2020-06-16 18:00     ` Kuppuswamy, Sathyanarayanan
  2020-06-17  9:31       ` Jonathan Cameron
  2020-06-17  9:18     ` Jonathan Cameron
  1 sibling, 1 reply; 9+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2020-06-16 18:00 UTC
  To: Bjorn Helgaas, Jonathan Cameron
  Cc: Bjorn Helgaas, linux-pci, linux-acpi, linuxarm, Lorenzo Pieralisi

Hi Jonathan,

On 6/16/20 10:47 AM, Bjorn Helgaas wrote:
> [+cc Sathy]
> 
> On Fri, May 22, 2020 at 01:31:33AM +0800, Jonathan Cameron wrote:
>> pci_aer_clear_device_status() currently resets the device status even when
>> firmware first handling is going on.  In particular it resets it on the
>> root port.
>>
>> This has been discussed previously
>> https://lore.kernel.org/patchwork/patch/427375/.
pci_aer_clear_device_status() is only used by handle_error_source(). And
I don't think handle_error_source() is called in FF mode. Can you
give more details on this issue?
> 
> I don't think this reference is really pertinent, is it?  That patch
> to b2c8881da764 changes pci_cleanup_aer_uncorrect_error_status() so it
> doesn't clear PCI_ERR_UNCOR_STATUS in "firmware-first" mode.
> 
> But your patch only affects PCI_EXP_DEVSTA, not PCI_ERR_UNCOR_STATUS.
> 
>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>>   drivers/pci/pcie/aer.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>> index f4274d301235..43e78b97ace6 100644
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> @@ -373,6 +373,9 @@ void pci_aer_clear_device_status(struct pci_dev *dev)
>>   {
>>   	u16 sta;
>>   
>> +	if (pcie_aer_get_firmware_first(dev))
>> +		return;
> 
> This needs to be adjusted because pcie_aer_get_firmware_first() no
> longer exists after 708b20003624 ("PCI/AER: Remove HEST/FIRMWARE_FIRST
> parsing for AER ownership").
> 
> This will use the _OSC AER ownership bit to gate clearing of the
> status bits in the PCIe capability (not the AER capability).
> 
> I think that's the right thing to do, but it's certainly not obvious
> from the _OSC description in the PCI Firmware Spec r3.2.  I think we
> need a pointer to the ECN that clarifies this, i.e., sec 4.5.1 of:
> 
>    System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24,
>    2020, affecting PCI Firmware Specification, Rev. 3.2
>    https://members.pcisig.com/wg/PCI-SIG/document/14076
> 
>>   	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
>>   	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
>>   }
>> -- 
>> 2.19.1
>>

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


* Re: [PATCH 1/2] PCI/AER: Do not reset the device status if doing firmware first handling.
  2020-06-16 17:47   ` Bjorn Helgaas
  2020-06-16 18:00     ` Kuppuswamy, Sathyanarayanan
@ 2020-06-17  9:18     ` Jonathan Cameron
  1 sibling, 0 replies; 9+ messages in thread
From: Jonathan Cameron @ 2020-06-17  9:18 UTC
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, linux-pci, linux-acpi, linuxarm,
	Lorenzo Pieralisi, Kuppuswamy Sathyanarayanan

On Tue, 16 Jun 2020 12:47:31 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:

> [+cc Sathy]
> 
> On Fri, May 22, 2020 at 01:31:33AM +0800, Jonathan Cameron wrote:
> > pci_aer_clear_device_status() currently resets the device status even when
> > firmware first handling is going on.  In particular it resets it on the
> > root port.
> >
> > This has been discussed previously
> > https://lore.kernel.org/patchwork/patch/427375/.  
> 
> I don't think this reference is really pertinent, is it?  That patch
> to b2c8881da764 changes pci_cleanup_aer_uncorrect_error_status() so it
> doesn't clear PCI_ERR_UNCOR_STATUS in "firmware-first" mode.
> 
> But your patch only affects PCI_EXP_DEVSTA, not PCI_ERR_UNCOR_STATUS.

I'll be honest: I've mostly forgotten my reasoning behind including that
reference.  It might have been as simple as my getting lost in the renames.

I'll drop the reference.

> 
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > ---
> >  drivers/pci/pcie/aer.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > index f4274d301235..43e78b97ace6 100644
> > --- a/drivers/pci/pcie/aer.c
> > +++ b/drivers/pci/pcie/aer.c
> > @@ -373,6 +373,9 @@ void pci_aer_clear_device_status(struct pci_dev *dev)
> >  {
> >  	u16 sta;
> >  
> > +	if (pcie_aer_get_firmware_first(dev))
> > +		return;  
> 
> This needs to be adjusted because pcie_aer_get_firmware_first() no
> longer exists after 708b20003624 ("PCI/AER: Remove HEST/FIRMWARE_FIRST
> parsing for AER ownership").
> 
> This will use the _OSC AER ownership bit to gate clearing of the
> status bits in the PCIe capability (not the AER capability).
> 
> I think that's the right thing to do, but it's certainly not obvious
> from the _OSC description in the PCI Firmware Spec r3.2.  I think we
> need a pointer to the ECN that clarifies this, i.e., sec 4.5.1 of:
> 
>   System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24,
>   2020, affecting PCI Firmware Specification, Rev. 3.2
>   https://members.pcisig.com/wg/PCI-SIG/document/14076

Thanks. I'll add that (though I can't check the document currently,
for reasons you can probably figure out *sigh*).

Note this patch is rather tangential to patch 2, which is the one
I really need feedback on.  Whilst this appeared to be
wrong, it is 'mostly harmless'.

> 
> >  	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
> >  	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
> >  }
> > -- 
> > 2.19.1
> >   




* Re: [PATCH 1/2] PCI/AER: Do not reset the device status if doing firmware first handling.
  2020-06-16 18:00     ` Kuppuswamy, Sathyanarayanan
@ 2020-06-17  9:31       ` Jonathan Cameron
  2020-06-17 20:57         ` Kuppuswamy, Sathyanarayanan
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Cameron @ 2020-06-17  9:31 UTC
  To: Kuppuswamy, Sathyanarayanan
  Cc: Bjorn Helgaas, Bjorn Helgaas, linux-pci, linux-acpi, linuxarm,
	Lorenzo Pieralisi

On Tue, 16 Jun 2020 11:00:32 -0700
"Kuppuswamy, Sathyanarayanan" <sathyanarayanan.kuppuswamy@linux.intel.com> wrote:

> Hi Jonathan,
> 
> On 6/16/20 10:47 AM, Bjorn Helgaas wrote:
> > [+cc Sathy]
> > 
> > On Fri, May 22, 2020 at 01:31:33AM +0800, Jonathan Cameron wrote:  
> >> pci_aer_clear_device_status() currently resets the device status even when
> >> firmware first handling is going on.  In particular it resets it on the
> >> root port.
> >>
> >> This has been discussed previously
> >> https://lore.kernel.org/patchwork/patch/427375/.  
> pci_aer_clear_device_status() is only used by handle_error_source(). And
> I don't think handle_error_source() is called in FF mode. Can you
> give more details on this issue?

It's called in pcie_do_recovery():

https://elixir.bootlin.com/linux/latest/source/drivers/pci/pcie/err.c#L200

which is called from both handle_error_source() and aer_recover_work_func().
The latter is reached indirectly from ghes_handle_aer() / ghes_do_proc().

This particular flow will only happen (I think) on Hardware Reduced ACPI systems.

Jonathan

> > 
> > I don't think this reference is really pertinent, is it?  That patch
> > to b2c8881da764 changes pci_cleanup_aer_uncorrect_error_status() so it
> > doesn't clear PCI_ERR_UNCOR_STATUS in "firmware-first" mode.
> > 
> > But your patch only affects PCI_EXP_DEVSTA, not PCI_ERR_UNCOR_STATUS.
> >   
> >> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >> ---
> >>   drivers/pci/pcie/aer.c | 3 +++
> >>   1 file changed, 3 insertions(+)
> >>
> >> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> >> index f4274d301235..43e78b97ace6 100644
> >> --- a/drivers/pci/pcie/aer.c
> >> +++ b/drivers/pci/pcie/aer.c
> >> @@ -373,6 +373,9 @@ void pci_aer_clear_device_status(struct pci_dev *dev)
> >>   {
> >>   	u16 sta;
> >>   
> >> +	if (pcie_aer_get_firmware_first(dev))
> >> +		return;  
> > 
> > This needs to be adjusted because pcie_aer_get_firmware_first() no
> > longer exists after 708b20003624 ("PCI/AER: Remove HEST/FIRMWARE_FIRST
> > parsing for AER ownership").
> > 
> > This will use the _OSC AER ownership bit to gate clearing of the
> > status bits in the PCIe capability (not the AER capability).
> > 
> > I think that's the right thing to do, but it's certainly not obvious
> > from the _OSC description in the PCI Firmware Spec r3.2.  I think we
> > need a pointer to the ECN that clarifies this, i.e., sec 4.5.1 of:
> > 
> >    System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24,
> >    2020, affecting PCI Firmware Specification, Rev. 3.2
> >    https://members.pcisig.com/wg/PCI-SIG/document/14076
> >   
> >>   	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
> >>   	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
> >>   }
> >> -- 
> >> 2.19.1
> >>  
> 




* Re: [PATCH 1/2] PCI/AER: Do not reset the device status if doing firmware first handling.
  2020-06-17  9:31       ` Jonathan Cameron
@ 2020-06-17 20:57         ` Kuppuswamy, Sathyanarayanan
  0 siblings, 0 replies; 9+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2020-06-17 20:57 UTC
  To: Jonathan Cameron
  Cc: Bjorn Helgaas, Bjorn Helgaas, linux-pci, linux-acpi, linuxarm,
	Lorenzo Pieralisi

Hi,

On 6/17/20 2:31 AM, Jonathan Cameron wrote:
> On Tue, 16 Jun 2020 11:00:32 -0700
> "Kuppuswamy, Sathyanarayanan" <sathyanarayanan.kuppuswamy@linux.intel.com> wrote:
> 
>> Hi Jonathan,
>>
>> On 6/16/20 10:47 AM, Bjorn Helgaas wrote:
>>> [+cc Sathy]
>>>
>>> On Fri, May 22, 2020 at 01:31:33AM +0800, Jonathan Cameron wrote:
>>>> pci_aer_clear_device_status() currently resets the device status even when
>>>> firmware first handling is going on.  In particular it resets it on the
>>>> root port.
>>>>
>>>> This has been discussed previously
>>>> https://lore.kernel.org/patchwork/patch/427375/.
>> pci_aer_clear_device_status() is only used by handle_error_source(). And
>> I don't think handle_error_source() is called in FF mode. Can you
>> give more details on this issue?
> 
> It's called in pcie_do_recovery
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/pci/pcie/err.c#L200
> 
> Which is called from both handle_error_source and aer_recover_work_func.
> 
> indirectly called from ghes_handle_aer / ghes_do_proc
> 
> This particular flow will only happen (I think) on hardware reduced ACPI systems.
Ok. Makes sense.
> 
> Jonathan
> 
>>>
>>> I don't think this reference is really pertinent, is it?  That patch
>>> to b2c8881da764 changes pci_cleanup_aer_uncorrect_error_status() so it
>>> doesn't clear PCI_ERR_UNCOR_STATUS in "firmware-first" mode.
>>>
>>> But your patch only affects PCI_EXP_DEVSTA, not PCI_ERR_UNCOR_STATUS.
>>>    
>>>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>>> ---
>>>>    drivers/pci/pcie/aer.c | 3 +++
>>>>    1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>>>> index f4274d301235..43e78b97ace6 100644
>>>> --- a/drivers/pci/pcie/aer.c
>>>> +++ b/drivers/pci/pcie/aer.c
>>>> @@ -373,6 +373,9 @@ void pci_aer_clear_device_status(struct pci_dev *dev)
>>>>    {
>>>>    	u16 sta;
>>>>    
>>>> +	if (pcie_aer_get_firmware_first(dev))
Use if (!pcie_aer_is_native(dev)) here instead.
>>>> +		return;
>>>
>>> This needs to be adjusted because pcie_aer_get_firmware_first() no
>>> longer exists after 708b20003624 ("PCI/AER: Remove HEST/FIRMWARE_FIRST
>>> parsing for AER ownership").
>>>
>>> This will use the _OSC AER ownership bit to gate clearing of the
>>> status bits in the PCIe capability (not the AER capability).
>>>
>>> I think that's the right thing to do, but it's certainly not obvious
>>> from the _OSC description in the PCI Firmware Spec r3.2.  I think we
>>> need a pointer to the ECN that clarifies this, i.e., sec 4.5.1 of:
>>>
>>>     System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24,
>>>     2020, affecting PCI Firmware Specification, Rev. 3.2
>>>     https://members.pcisig.com/wg/PCI-SIG/document/14076
>>>    
>>>>    	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
>>>>    	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
>>>>    }
>>>> -- 
>>>> 2.19.1
>>>>   
>>
> 
> 

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


end of thread

Thread overview: 9+ messages
2020-05-21 17:31 [RFC PATCH 0/2] PCI/AER: handling for RCiEPs Jonathan Cameron
2020-05-21 17:31 ` [PATCH 1/2] PCI/AER: Do not reset the device status if doing firmware first handling Jonathan Cameron
2020-06-16 17:47   ` Bjorn Helgaas
2020-06-16 18:00     ` Kuppuswamy, Sathyanarayanan
2020-06-17  9:31       ` Jonathan Cameron
2020-06-17 20:57         ` Kuppuswamy, Sathyanarayanan
2020-06-17  9:18     ` Jonathan Cameron
2020-05-21 17:31 ` [PATCH 2/2] PCI/AER: Add partial initial support for RCiEPs using RCEC or firmware first Jonathan Cameron
2020-06-16 10:47 ` [RFC PATCH 0/2] PCI/AER: handling for RCiEPs Jonathan Cameron
