linux-pci.vger.kernel.org archive mirror
* [PATCH v18 00/11] Add Error Disconnect Recover (EDR) support
@ 2020-03-24  0:25 sathyanarayanan.kuppuswamy
  2020-03-24  0:25 ` [PATCH v18 01/11] PCI/ERR: Update error status after reset_link() sathyanarayanan.kuppuswamy
                   ` (11 more replies)
  0 siblings, 12 replies; 29+ messages in thread
From: sathyanarayanan.kuppuswamy @ 2020-03-24  0:25 UTC (permalink / raw)
  To: bhelgaas; +Cc: linux-pci, linux-kernel, ashok.raj, sathyanarayanan.kuppuswamy

From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

This patchset adds support for the following features:

1. Error Disconnect Recover (EDR) support.
2. _OSC based negotiation support for DPC.

The EDR specification is available at the following link:

https://members.pcisig.com/wg/PCI-SIG/document/12614

Changes since v17 + Bjorn's changes:
 * This version is based on Bjorn's review/edr branch.
 * Moved {pciehp,shpchp}_is_native() function definitions to pci.c and
   removed their CONFIG_ACPI dependency.
 * Modified dpc_reset_link() to return PCI_ERS_RESULT_NEED_RESET
   when hotplug is not supported or not enabled in the kernel.
 * Modified reset_link() to handle PCI_ERS_RESULT_NEED_RESET as a
   valid return value.
 * Moved the implementation of reset_link() function to pcie_do_recovery()
   and renamed function callback parameter from reset_cb to reset_link.
 * Moved the pci_acpi_add_edr_notifier() and pci_acpi_remove_edr_notifier()
   calls in pci_acpi_setup() and pci_acpi_cleanup() above the
   wakeup-capable support checks.
 * Used acpi_check_dsm() in edr.c to check whether the given _DSM is
   supported.

Changes since v16:
 * Removed reset_link from pcie_port_service_driver.
 * Removed pcie_port_find_service().
 * Added pci_dpc_init() in pci_init_capabilities().

Changes since v15:
 * Split patch #3 of the previous set into multiple patches.
 * Refactored the EDR driver to use pci_dev instead of dpc_dev.
 * Added some debug logs to the EDR driver.
 * Used pci_aer_raw_clear_status() for clearing AER errors in the EDR path.
 * Addressed other comments from Bjorn.
 * Rebased patches on top of Bjorn's "PCI/DPC: Move data to struct pci_dev" patch.

Changes since v14:
 * Rebased on top of v5.6-rc1

Changes since v13:
 * Moved all EDR related code to edr.c
 * Addressed Bjorn's comments.

Changes since v12:
 * Addressed Bjorn's comments.
 * Added check for CONFIG_PCIE_EDR before requesting DPC control from firmware.
 * Removed ff_check parameter from AER APIs.
 * Used macros for _OST return status values in DPC driver.

Changes since v11:
 * Allowed error recovery to proceed after successful reset_link().
 * Used correct ACPI handle for sending EDR status.
 * Rebased on top of v5.5-rc5

Changes since v10:
 * Added "edr_enabled" member to dpc priv structure, which is used to cache EDR
   enabling status based on status of pcie_ports_dpc_native and FF mode.
 * Changed type of _DSM argument from Integer to Package in acpi_enable_dpc_port()
   function to fix ACPI related boot warnings.
 * Rebased on top of v5.5-rc3

Changes since v9:
 * Removed caching of pcie_aer_get_firmware_first() in dpc driver.
 * Added proper spec reference in git log for patch 5 & 7.
 * Added new function parameter "ff_check" to pci_cleanup_aer_uncorrect_error_status(),
   pci_aer_clear_fatal_status() and pci_cleanup_aer_error_status_regs() functions.
 * Rebased on top of v5.4-rc5

Changes since v8:
 * Rebased on top of v5.4-rc1

Changes since v7:
 * Updated DSM version number to match the spec.

Changes since v6:
 * Modified the order of patches to enable EDR only after all necessary support is added to the kernel.
 * Addressed Bjorn's comments.

Changes since v5:
 * Addressed Keith's comments.
 * Added additional check for FF mode in pci_aer_init().
 * Updated commit history of "PCI/DPC: Add support for DPC recovery on NON_FATAL errors" patch.

Changes since v4:
 * Rebased on top of v5.3-rc1
 * Fixed lock/unlock issue in edr_handle_event().
 * Merged "Update error status after reset_link()" patch into this patchset.

Changes since v3:
 * Moved EDR related ACPI functions/definitions to pci-acpi.c
 * Modified the commit messages in a few patches to include spec references.
 * Added support to handle DPC triggered by NON_FATAL errors.
 * Added edr_lock to protect against duplicate EDR notifications for the same PCI device.
 * Addressed Bjorn's comments.

Changes since v2:
 * Split EDR support patch into multiple patches.
 * Addressed Bjorn's comments.

Changes since v1:
 * Rebased on top of v5.1-rc1

Bjorn Helgaas (1):
  PCI/DPC: Move DPC data into struct pci_dev

Kuppuswamy Sathyanarayanan (10):
  PCI/ERR: Update error status after reset_link()
  PCI: move {pciehp,shpchp}_is_native() definitions to pci.c
  PCI/DPC: Fix DPC recovery issue in non hotplug case
  PCI/ERR: Remove service dependency in pcie_do_recovery()
  PCI/ERR: Return status of pcie_do_recovery()
  PCI/DPC: Cache DPC capabilities in pci_init_capabilities()
  PCI/AER: Add pci_aer_raw_clear_status() to unconditionally clear Error
    Status
  PCI/DPC: Expose dpc_process_error(), dpc_reset_link() for use by EDR
  PCI/DPC: Add Error Disconnect Recover (EDR) support
  PCI/AER: Rationalize error status register clearing

 Documentation/PCI/pcieaer-howto.rst       |  23 +-
 drivers/acpi/pci_root.c                   |  15 ++
 drivers/net/ethernet/intel/ice/ice_main.c |   4 +-
 drivers/ntb/hw/idt/ntb_hw_idt.c           |   4 +-
 drivers/pci/pci-acpi.c                    |  40 +---
 drivers/pci/pci.c                         |  40 +++-
 drivers/pci/pci.h                         |  13 +-
 drivers/pci/pcie/Kconfig                  |  10 +
 drivers/pci/pcie/Makefile                 |   1 +
 drivers/pci/pcie/aer.c                    |  40 ++--
 drivers/pci/pcie/dpc.c                    | 145 ++++++-------
 drivers/pci/pcie/edr.c                    | 251 ++++++++++++++++++++++
 drivers/pci/pcie/err.c                    |  67 ++----
 drivers/pci/pcie/portdrv.h                |   5 -
 drivers/pci/pcie/portdrv_core.c           |  21 --
 drivers/pci/probe.c                       |   2 +
 drivers/scsi/lpfc/lpfc_attr.c             |   4 +-
 include/linux/acpi.h                      |   6 +-
 include/linux/aer.h                       |   9 +-
 include/linux/pci-acpi.h                  |   8 +
 include/linux/pci.h                       |   6 +
 include/linux/pci_hotplug.h               |   7 +-
 22 files changed, 471 insertions(+), 250 deletions(-)
 create mode 100644 drivers/pci/pcie/edr.c

-- 
2.17.1



* [PATCH v18 01/11] PCI/ERR: Update error status after reset_link()
  2020-03-24  0:25 [PATCH v18 00/11] Add Error Disconnect Recover (EDR) support sathyanarayanan.kuppuswamy
@ 2020-03-24  0:25 ` sathyanarayanan.kuppuswamy
  2020-03-24  0:25 ` [PATCH v18 02/11] PCI: move {pciehp,shpchp}_is_native() definitions to pci.c sathyanarayanan.kuppuswamy
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: sathyanarayanan.kuppuswamy @ 2020-03-24  0:25 UTC (permalink / raw)
  To: bhelgaas; +Cc: linux-pci, linux-kernel, ashok.raj, sathyanarayanan.kuppuswamy

From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

Commit bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") uses
reset_link() to recover from fatal errors. But during fatal error recovery,
if the initial error status is PCI_ERS_RESULT_DISCONNECT or
PCI_ERS_RESULT_NO_AER_DRIVER, then even after successful recovery (using
reset_link()), pcie_do_recovery() will report the recovery result as a
failure. So update the error status after reset_link().
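The effect of the reordering can be modeled with a small user-space sketch. The enum and helper names below are hypothetical stand-ins for pci_ers_result_t and pcie_do_recovery(), and every driver callback is assumed to succeed; this is an illustration of the status flow, not the kernel code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the pci_ers_result_t values involved. */
enum ers_result {
	ERS_CAN_RECOVER,
	ERS_NEED_RESET,
	ERS_DISCONNECT,
	ERS_RECOVERED,
	ERS_NO_AER_DRIVER,
};

/* Tail of the recovery sequence, assuming every driver callback succeeds. */
static bool finish(enum ers_result status)
{
	if (status == ERS_CAN_RECOVER)
		status = ERS_RECOVERED;		/* mmio_enabled() broadcast */
	if (status == ERS_NEED_RESET)
		status = ERS_RECOVERED;		/* slot_reset() broadcast */
	return status == ERS_RECOVERED;
}

/* Before: the error_detected() vote survives a successful reset_link(). */
static bool old_frozen(enum ers_result detected, enum ers_result link)
{
	if (link != ERS_RECOVERED)
		return false;
	return finish(detected);
}

/* After: the reset_link() result replaces the error_detected() vote. */
static bool new_frozen(enum ers_result detected, enum ers_result link)
{
	enum ers_result status = link;

	(void)detected;		/* discarded once the link has been reset */
	if (status != ERS_RECOVERED)
		return false;
	return finish(status);
}
```

With detected = ERS_NO_AER_DRIVER and a successful reset, old_frozen() returns false while new_frozen() returns true, which matches the spurious failure described above.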

You can reproduce this issue by triggering a software DPC using the "DPC
Software Trigger" bit in the "DPC Control Register". You should see a
"device recovery failed" dmesg log like the one below.

  pcieport 0000:00:16.0: DPC: containment event, status:0x1f27 source:0x0000
  pcieport 0000:00:16.0: DPC: software trigger detected
  pci 0000:04:00.0: AER: can't recover (no error_detected callback)
  pcieport 0000:00:16.0: AER: device recovery failed
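As a worked example, the status:0x1f27 value in the log above can be decoded with the same reason/ext_reason arithmetic that dpc_handler() uses. The mask constants are assumed to mirror the PCI_EXP_DPC_STATUS_* definitions in include/uapi/linux/pci_regs.h, and the "not triggered" case is an addition for this sketch only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed to mirror PCI_EXP_DPC_STATUS_TRIGGER{,_RSN,_RSN_EXT}. */
#define DPC_STATUS_TRIGGER		0x0001
#define DPC_STATUS_TRIGGER_RSN		0x0006
#define DPC_STATUS_TRIGGER_RSN_EXT	0x0060

/* Map a DPC Status register value to the reason string dpc_handler() logs. */
static const char *dpc_trigger_reason(uint16_t status)
{
	uint16_t reason = (status & DPC_STATUS_TRIGGER_RSN) >> 1;
	uint16_t ext_reason = (status & DPC_STATUS_TRIGGER_RSN_EXT) >> 5;

	if (!(status & DPC_STATUS_TRIGGER))
		return "not triggered";		/* sketch-only case */

	switch (reason) {
	case 0:
		return "unmasked uncorrectable error";
	case 1:
		return "ERR_NONFATAL";
	case 2:
		return "ERR_FATAL";
	}

	/* reason == 3: see the extended trigger reason field */
	if (ext_reason == 0)
		return "RP PIO error";
	if (ext_reason == 1)
		return "software trigger";
	return "reserved error";
}
```

For 0x1f27, bit 0 (Trigger Status) is set, bits 2:1 give reason 3, and bits 6:5 give ext_reason 1, which yields "software trigger" as in the log.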

Fixes: bdb5ac85777d ("PCI/ERR: Handle fatal error recovery")
Link: https://lore.kernel.org/r/15e702a33cc27314f9d43a06ccb408086a229cef.1583286655.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
---
 drivers/pci/pcie/err.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 01dfc8bb7ca0..1ac57e9e1e71 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -203,14 +203,14 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
 	bus = dev->subordinate;
 
 	pci_dbg(dev, "broadcast error_detected message\n");
-	if (state == pci_channel_io_frozen)
+	if (state == pci_channel_io_frozen) {
 		pci_walk_bus(bus, report_frozen_detected, &status);
-	else
+		status = reset_link(dev, service);
+		if (status != PCI_ERS_RESULT_RECOVERED)
+			goto failed;
+	} else {
 		pci_walk_bus(bus, report_normal_detected, &status);
-
-	if (state == pci_channel_io_frozen &&
-	    reset_link(dev, service) != PCI_ERS_RESULT_RECOVERED)
-		goto failed;
+	}
 
 	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
 		status = PCI_ERS_RESULT_RECOVERED;
-- 
2.17.1



* [PATCH v18 02/11] PCI: move {pciehp,shpchp}_is_native() definitions to pci.c
  2020-03-24  0:25 [PATCH v18 00/11] Add Error Disconnect Recover (EDR) support sathyanarayanan.kuppuswamy
  2020-03-24  0:25 ` [PATCH v18 01/11] PCI/ERR: Update error status after reset_link() sathyanarayanan.kuppuswamy
@ 2020-03-24  0:25 ` sathyanarayanan.kuppuswamy
  2020-03-24  0:26 ` [PATCH v18 03/11] PCI/DPC: Fix DPC recovery issue in non hotplug case sathyanarayanan.kuppuswamy
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: sathyanarayanan.kuppuswamy @ 2020-03-24  0:25 UTC (permalink / raw)
  To: bhelgaas; +Cc: linux-pci, linux-kernel, ashok.raj, sathyanarayanan.kuppuswamy

From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

Currently, pciehp_is_native() and shpchp_is_native() always return true
if CONFIG_ACPI is not defined. But these APIs do not have any dependency
on ACPI. In the non-ACPI case, we should return true only if the slot
supports native hotplug.

So move the definitions out of pci-acpi.c and always evaluate the checks
before returning the status.
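A minimal model of the pciehp decision, assuming its inputs can be captured in a plain struct. struct hp_inputs and the *_model()/*_stub_* helpers are hypothetical, and SLTCAP_HPC is assumed to mirror PCI_EXP_SLTCAP_HPC. The old non-ACPI stub ignores the slot entirely, while the moved function checks the Hot-Plug Capable bit first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SLTCAP_HPC 0x00000040	/* assumed to mirror PCI_EXP_SLTCAP_HPC */

/* Hypothetical snapshot of the inputs pciehp_is_native() consults. */
struct hp_inputs {
	bool hotplug_pci_pcie_enabled;	/* IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE) */
	uint32_t slot_cap;		/* Slot Capabilities register */
	bool ports_native;		/* pcie_ports_native */
	bool host_native_hotplug;	/* host->native_pcie_hotplug */
};

/* Old !CONFIG_ACPI stub: "native" no matter what the slot reports. */
static bool old_stub_is_native(const struct hp_inputs *in)
{
	(void)in;
	return true;
}

/* Same sequence of checks as the moved pciehp_is_native(). */
static bool pciehp_is_native_model(const struct hp_inputs *in)
{
	if (!in->hotplug_pci_pcie_enabled)
		return false;
	if (!(in->slot_cap & SLTCAP_HPC))
		return false;
	if (in->ports_native)
		return true;
	return in->host_native_hotplug;
}
```

For a slot without the Hot-Plug Capable bit, the stub still answers true, whereas the moved logic answers false.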

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---
 drivers/pci/pci-acpi.c      | 38 -------------------------------------
 drivers/pci/pci.c           | 38 +++++++++++++++++++++++++++++++++++++
 include/linux/pci_hotplug.h |  7 +++----
 3 files changed, 41 insertions(+), 42 deletions(-)

diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index 0c02d500158f..1bf8765c41bd 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -781,44 +781,6 @@ int pci_acpi_program_hp_params(struct pci_dev *dev)
 	return -ENODEV;
 }
 
-/**
- * pciehp_is_native - Check whether a hotplug port is handled by the OS
- * @bridge: Hotplug port to check
- *
- * Returns true if the given @bridge is handled by the native PCIe hotplug
- * driver.
- */
-bool pciehp_is_native(struct pci_dev *bridge)
-{
-	const struct pci_host_bridge *host;
-	u32 slot_cap;
-
-	if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))
-		return false;
-
-	pcie_capability_read_dword(bridge, PCI_EXP_SLTCAP, &slot_cap);
-	if (!(slot_cap & PCI_EXP_SLTCAP_HPC))
-		return false;
-
-	if (pcie_ports_native)
-		return true;
-
-	host = pci_find_host_bridge(bridge->bus);
-	return host->native_pcie_hotplug;
-}
-
-/**
- * shpchp_is_native - Check whether a hotplug port is handled by the OS
- * @bridge: Hotplug port to check
- *
- * Returns true if the given @bridge is handled by the native SHPC hotplug
- * driver.
- */
-bool shpchp_is_native(struct pci_dev *bridge)
-{
-	return bridge->shpc_managed;
-}
-
 /**
  * pci_acpi_wake_bus - Root bus wakeup notification fork function.
  * @context: Device wakeup context.
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d828ca835a98..e724341cefff 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4783,6 +4783,44 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
 	}
 }
 
+/**
+ * pciehp_is_native - Check whether a hotplug port is handled by the OS
+ * @bridge: Hotplug port to check
+ *
+ * Returns true if the given @bridge is handled by the native PCIe hotplug
+ * driver.
+ */
+bool pciehp_is_native(struct pci_dev *bridge)
+{
+	const struct pci_host_bridge *host;
+	u32 slot_cap;
+
+	if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))
+		return false;
+
+	pcie_capability_read_dword(bridge, PCI_EXP_SLTCAP, &slot_cap);
+	if (!(slot_cap & PCI_EXP_SLTCAP_HPC))
+		return false;
+
+	if (pcie_ports_native)
+		return true;
+
+	host = pci_find_host_bridge(bridge->bus);
+	return host->native_pcie_hotplug;
+}
+
+/**
+ * shpchp_is_native - Check whether a hotplug port is handled by the OS
+ * @bridge: Hotplug port to check
+ *
+ * Returns true if the given @bridge is handled by the native SHPC hotplug
+ * driver.
+ */
+bool shpchp_is_native(struct pci_dev *bridge)
+{
+	return bridge->shpc_managed;
+}
+
 void pci_reset_secondary_bus(struct pci_dev *dev)
 {
 	u16 ctrl;
diff --git a/include/linux/pci_hotplug.h b/include/linux/pci_hotplug.h
index b482e42d7153..11660e19a133 100644
--- a/include/linux/pci_hotplug.h
+++ b/include/linux/pci_hotplug.h
@@ -88,9 +88,7 @@ void pci_hp_deregister(struct hotplug_slot *slot);
 
 #ifdef CONFIG_ACPI
 #include <linux/acpi.h>
-bool pciehp_is_native(struct pci_dev *bridge);
 int acpi_get_hp_hw_control_from_firmware(struct pci_dev *bridge);
-bool shpchp_is_native(struct pci_dev *bridge);
 int acpi_pci_check_ejectable(struct pci_bus *pbus, acpi_handle handle);
 int acpi_pci_detect_ejectable(acpi_handle handle);
 #else
@@ -98,10 +96,11 @@ static inline int acpi_get_hp_hw_control_from_firmware(struct pci_dev *bridge)
 {
 	return 0;
 }
-static inline bool pciehp_is_native(struct pci_dev *bridge) { return true; }
-static inline bool shpchp_is_native(struct pci_dev *bridge) { return true; }
 #endif
 
+bool pciehp_is_native(struct pci_dev *bridge);
+bool shpchp_is_native(struct pci_dev *bridge);
+
 static inline bool hotplug_is_native(struct pci_dev *bridge)
 {
 	return pciehp_is_native(bridge) || shpchp_is_native(bridge);
-- 
2.17.1



* [PATCH v18 03/11] PCI/DPC: Fix DPC recovery issue in non hotplug case
  2020-03-24  0:25 [PATCH v18 00/11] Add Error Disconnect Recover (EDR) support sathyanarayanan.kuppuswamy
  2020-03-24  0:25 ` [PATCH v18 01/11] PCI/ERR: Update error status after reset_link() sathyanarayanan.kuppuswamy
  2020-03-24  0:25 ` [PATCH v18 02/11] PCI: move {pciehp,shpchp}_is_native() definitions to pci.c sathyanarayanan.kuppuswamy
@ 2020-03-24  0:26 ` sathyanarayanan.kuppuswamy
  2020-03-24 23:49   ` Bjorn Helgaas
  2020-03-24  0:26 ` [PATCH v18 04/11] PCI/DPC: Move DPC data into struct pci_dev sathyanarayanan.kuppuswamy
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: sathyanarayanan.kuppuswamy @ 2020-03-24  0:26 UTC (permalink / raw)
  To: bhelgaas; +Cc: linux-pci, linux-kernel, ashok.raj, sathyanarayanan.kuppuswamy

From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

If hotplug is supported, then during a DPC event the hotplug driver
removes the affected devices and detaches their drivers on the DLLSC
link-down event, and re-enumerates them once DPC recovery is handled and
the link comes back online (on the DLLSC link-up event). Hence we don't
depend on the .mmio_enabled or .slot_reset callbacks in the error
recovery handler to restore the device.

But if hotplug is not supported or not enabled, we need to let the error
recovery handler attempt to recover the devices using a slot reset.

So if hotplug is not supported, return PCI_ERS_RESULT_NEED_RESET instead
of PCI_ERS_RESULT_RECOVERED.

Also modify the way the error recovery handler processes the recovery
value: reset_link() now accepts PCI_ERS_RESULT_NEED_RESET, and
pcie_do_recovery() only fails when it gets PCI_ERS_RESULT_DISCONNECT.
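The resulting decision chain can be sketched in user space as follows. The enum and the *_model() helpers are hypothetical simplifications of dpc_reset_link(), reset_link() and the check in pcie_do_recovery(); link_up stands in for pcie_wait_for_link(pdev, true) and hotplug_native for hotplug_is_native(pdev).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the pci_ers_result_t values involved. */
enum ers_result {
	ERS_CAN_RECOVER,
	ERS_NEED_RESET,
	ERS_DISCONNECT,
	ERS_RECOVERED,
};

/* Tail of dpc_reset_link(): link state and hotplug ownership decide. */
static enum ers_result dpc_reset_link_model(bool link_up, bool hotplug_native)
{
	if (!link_up)
		return ERS_DISCONNECT;
	if (!hotplug_native)
		return ERS_NEED_RESET;	/* let slot_reset() restore devices */
	return ERS_RECOVERED;		/* hotplug re-enumerates on link up */
}

/* reset_link() filter: NEED_RESET is now an accepted result. */
static enum ers_result reset_link_model(enum ers_result status)
{
	if (status != ERS_RECOVERED && status != ERS_NEED_RESET)
		return ERS_DISCONNECT;
	return status;
}

/* pcie_do_recovery() now bails out only on DISCONNECT. */
static bool recovery_continues(enum ers_result status)
{
	return status != ERS_DISCONNECT;
}
```

On a port without native hotplug, a DPC reset with the link back up now yields ERS_NEED_RESET, passes the reset_link() filter, and lets recovery continue to the slot_reset() broadcast.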

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---
 drivers/pci/pcie/dpc.c | 8 ++++++++
 drivers/pci/pcie/err.c | 5 +++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index e06f42f58d3d..0e356ed0d73f 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -13,6 +13,7 @@
 #include <linux/interrupt.h>
 #include <linux/init.h>
 #include <linux/pci.h>
+#include <linux/pci_hotplug.h>
 
 #include "portdrv.h"
 #include "../pci.h"
@@ -144,6 +145,13 @@ static pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
 	if (!pcie_wait_for_link(pdev, true))
 		return PCI_ERS_RESULT_DISCONNECT;
 
+	/*
+	 * If hotplug is not supported/enabled then let the device
+	 * recover using slot reset.
+	 */
+	if (!hotplug_is_native(pdev))
+		return PCI_ERS_RESULT_NEED_RESET;
+
 	return PCI_ERS_RESULT_RECOVERED;
 }
 
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 1ac57e9e1e71..6e52591a4722 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -178,7 +178,8 @@ static pci_ers_result_t reset_link(struct pci_dev *dev, u32 service)
 		return PCI_ERS_RESULT_DISCONNECT;
 	}
 
-	if (status != PCI_ERS_RESULT_RECOVERED) {
+	if ((status != PCI_ERS_RESULT_RECOVERED) &&
+	    (status != PCI_ERS_RESULT_NEED_RESET)) {
 		pci_printk(KERN_DEBUG, dev, "link reset at upstream device %s failed\n",
 			pci_name(dev));
 		return PCI_ERS_RESULT_DISCONNECT;
@@ -206,7 +207,7 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
 	if (state == pci_channel_io_frozen) {
 		pci_walk_bus(bus, report_frozen_detected, &status);
 		status = reset_link(dev, service);
-		if (status != PCI_ERS_RESULT_RECOVERED)
+		if (status == PCI_ERS_RESULT_DISCONNECT)
 			goto failed;
 	} else {
 		pci_walk_bus(bus, report_normal_detected, &status);
-- 
2.17.1



* [PATCH v18 04/11] PCI/DPC: Move DPC data into struct pci_dev
  2020-03-24  0:25 [PATCH v18 00/11] Add Error Disconnect Recover (EDR) support sathyanarayanan.kuppuswamy
                   ` (2 preceding siblings ...)
  2020-03-24  0:26 ` [PATCH v18 03/11] PCI/DPC: Fix DPC recovery issue in non hotplug case sathyanarayanan.kuppuswamy
@ 2020-03-24  0:26 ` sathyanarayanan.kuppuswamy
  2020-03-24  0:26 ` [PATCH v18 05/11] PCI/ERR: Remove service dependency in pcie_do_recovery() sathyanarayanan.kuppuswamy
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: sathyanarayanan.kuppuswamy @ 2020-03-24  0:26 UTC (permalink / raw)
  To: bhelgaas; +Cc: linux-pci, linux-kernel, ashok.raj, sathyanarayanan.kuppuswamy

From: Bjorn Helgaas <bhelgaas@google.com>

We only need 25 bits of data for DPC, so I don't think it's worth the
complexity of allocating and keeping track of the struct dpc_dev separately
from the pci_dev.  Move that data into the struct pci_dev.
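The three fields added below account for 16 + 1 + 8 = 25 bits. The sketch mirrors them in a hypothetical struct dpc_fields and fills them from a DPC Capability register value the way dpc_probe() does in the diff, including the 4..9 range check on the RP PIO Log Size. DPC_CAP_RP_EXT and DPC_RP_PIO_LOG_SIZE are assumed to mirror PCI_EXP_DPC_CAP_RP_EXT and PCI_EXP_DPC_RP_PIO_LOG_SIZE.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the DPC Capability field masks in pci_regs.h. */
#define DPC_CAP_RP_EXT		0x0020
#define DPC_RP_PIO_LOG_SIZE	0x0F00

/* Hypothetical mirror of the fields added to struct pci_dev. */
struct dpc_fields {
	uint16_t dpc_cap;			/* config offset of DPC capability */
	unsigned int dpc_rp_extensions:1;	/* RP Extensions supported */
	uint8_t dpc_rp_log_size;		/* RP PIO log size in dwords */
};

/* Decode a DPC Capability register value into the cached fields. */
static void dpc_fill(struct dpc_fields *f, uint16_t cap_pos, uint16_t cap)
{
	f->dpc_cap = cap_pos;
	f->dpc_rp_extensions = (cap & DPC_CAP_RP_EXT) ? 1 : 0;
	f->dpc_rp_log_size = 0;		/* pci_dev is zeroed at allocation */

	if (f->dpc_rp_extensions) {
		f->dpc_rp_log_size = (cap & DPC_RP_PIO_LOG_SIZE) >> 8;
		if (f->dpc_rp_log_size < 4 || f->dpc_rp_log_size > 9)
			f->dpc_rp_log_size = 0;	/* invalid size, ignore log */
	}
}
```

Caching these values once per device is what lets the rest of dpc.c drop the to_dpc_dev() lookup.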

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/pcie/dpc.c | 103 +++++++++++++----------------------------
 include/linux/pci.h    |   5 ++
 2 files changed, 36 insertions(+), 72 deletions(-)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index 0e356ed0d73f..5c2e9d45a269 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -18,13 +18,6 @@
 #include "portdrv.h"
 #include "../pci.h"
 
-struct dpc_dev {
-	struct pcie_device	*dev;
-	u16			cap_pos;
-	bool			rp_extensions;
-	u8			rp_log_size;
-};
-
 static const char * const rp_pio_error_string[] = {
 	"Configuration Request received UR Completion",	 /* Bit Position 0  */
 	"Configuration Request received CA Completion",	 /* Bit Position 1  */
@@ -47,63 +40,42 @@ static const char * const rp_pio_error_string[] = {
 	"Memory Request Completion Timeout",		 /* Bit Position 18 */
 };
 
-static struct dpc_dev *to_dpc_dev(struct pci_dev *dev)
-{
-	struct device *device;
-
-	device = pcie_port_find_device(dev, PCIE_PORT_SERVICE_DPC);
-	if (!device)
-		return NULL;
-	return get_service_data(to_pcie_device(device));
-}
-
 void pci_save_dpc_state(struct pci_dev *dev)
 {
-	struct dpc_dev *dpc;
 	struct pci_cap_saved_state *save_state;
 	u16 *cap;
 
 	if (!pci_is_pcie(dev))
 		return;
 
-	dpc = to_dpc_dev(dev);
-	if (!dpc)
-		return;
-
 	save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_DPC);
 	if (!save_state)
 		return;
 
 	cap = (u16 *)&save_state->cap.data[0];
-	pci_read_config_word(dev, dpc->cap_pos + PCI_EXP_DPC_CTL, cap);
+	pci_read_config_word(dev, dev->dpc_cap + PCI_EXP_DPC_CTL, cap);
 }
 
 void pci_restore_dpc_state(struct pci_dev *dev)
 {
-	struct dpc_dev *dpc;
 	struct pci_cap_saved_state *save_state;
 	u16 *cap;
 
 	if (!pci_is_pcie(dev))
 		return;
 
-	dpc = to_dpc_dev(dev);
-	if (!dpc)
-		return;
-
 	save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_DPC);
 	if (!save_state)
 		return;
 
 	cap = (u16 *)&save_state->cap.data[0];
-	pci_write_config_word(dev, dpc->cap_pos + PCI_EXP_DPC_CTL, *cap);
+	pci_write_config_word(dev, dev->dpc_cap + PCI_EXP_DPC_CTL, *cap);
 }
 
-static int dpc_wait_rp_inactive(struct dpc_dev *dpc)
+static int dpc_wait_rp_inactive(struct pci_dev *pdev)
 {
 	unsigned long timeout = jiffies + HZ;
-	struct pci_dev *pdev = dpc->dev->port;
-	u16 cap = dpc->cap_pos, status;
+	u16 cap = pdev->dpc_cap, status;
 
 	pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status);
 	while (status & PCI_EXP_DPC_RP_BUSY &&
@@ -120,15 +92,13 @@ static int dpc_wait_rp_inactive(struct dpc_dev *dpc)
 
 static pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
 {
-	struct dpc_dev *dpc;
 	u16 cap;
 
 	/*
 	 * DPC disables the Link automatically in hardware, so it has
 	 * already been reset by the time we get here.
 	 */
-	dpc = to_dpc_dev(pdev);
-	cap = dpc->cap_pos;
+	cap = pdev->dpc_cap;
 
 	/*
 	 * Wait until the Link is inactive, then clear DPC Trigger Status
@@ -136,7 +106,7 @@ static pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
 	 */
 	pcie_wait_for_link(pdev, false);
 
-	if (dpc->rp_extensions && dpc_wait_rp_inactive(dpc))
+	if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev))
 		return PCI_ERS_RESULT_DISCONNECT;
 
 	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
@@ -155,10 +125,9 @@ static pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
 	return PCI_ERS_RESULT_RECOVERED;
 }
 
-static void dpc_process_rp_pio_error(struct dpc_dev *dpc)
+static void dpc_process_rp_pio_error(struct pci_dev *pdev)
 {
-	struct pci_dev *pdev = dpc->dev->port;
-	u16 cap = dpc->cap_pos, dpc_status, first_error;
+	u16 cap = pdev->dpc_cap, dpc_status, first_error;
 	u32 status, mask, sev, syserr, exc, dw0, dw1, dw2, dw3, log, prefix;
 	int i;
 
@@ -183,7 +152,7 @@ static void dpc_process_rp_pio_error(struct dpc_dev *dpc)
 				first_error == i ? " (First)" : "");
 	}
 
-	if (dpc->rp_log_size < 4)
+	if (pdev->dpc_rp_log_size < 4)
 		goto clear_status;
 	pci_read_config_dword(pdev, cap + PCI_EXP_DPC_RP_PIO_HEADER_LOG,
 			      &dw0);
@@ -196,12 +165,12 @@ static void dpc_process_rp_pio_error(struct dpc_dev *dpc)
 	pci_err(pdev, "TLP Header: %#010x %#010x %#010x %#010x\n",
 		dw0, dw1, dw2, dw3);
 
-	if (dpc->rp_log_size < 5)
+	if (pdev->dpc_rp_log_size < 5)
 		goto clear_status;
 	pci_read_config_dword(pdev, cap + PCI_EXP_DPC_RP_PIO_IMPSPEC_LOG, &log);
 	pci_err(pdev, "RP PIO ImpSpec Log %#010x\n", log);
 
-	for (i = 0; i < dpc->rp_log_size - 5; i++) {
+	for (i = 0; i < pdev->dpc_rp_log_size - 5; i++) {
 		pci_read_config_dword(pdev,
 			cap + PCI_EXP_DPC_RP_PIO_TLPPREFIX_LOG, &prefix);
 		pci_err(pdev, "TLP Prefix Header: dw%d, %#010x\n", i, prefix);
@@ -234,10 +203,9 @@ static int dpc_get_aer_uncorrect_severity(struct pci_dev *dev,
 
 static irqreturn_t dpc_handler(int irq, void *context)
 {
+	struct pci_dev *pdev = context;
+	u16 cap = pdev->dpc_cap, status, source, reason, ext_reason;
 	struct aer_err_info info;
-	struct dpc_dev *dpc = context;
-	struct pci_dev *pdev = dpc->dev->port;
-	u16 cap = dpc->cap_pos, status, source, reason, ext_reason;
 
 	pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status);
 	pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID, &source);
@@ -256,8 +224,8 @@ static irqreturn_t dpc_handler(int irq, void *context)
 				     "reserved error");
 
 	/* show RP PIO error detail information */
-	if (dpc->rp_extensions && reason == 3 && ext_reason == 0)
-		dpc_process_rp_pio_error(dpc);
+	if (pdev->dpc_rp_extensions && reason == 3 && ext_reason == 0)
+		dpc_process_rp_pio_error(pdev);
 	else if (reason == 0 &&
 		 dpc_get_aer_uncorrect_severity(pdev, &info) &&
 		 aer_get_device_error_info(pdev, &info)) {
@@ -274,9 +242,8 @@ static irqreturn_t dpc_handler(int irq, void *context)
 
 static irqreturn_t dpc_irq(int irq, void *context)
 {
-	struct dpc_dev *dpc = (struct dpc_dev *)context;
-	struct pci_dev *pdev = dpc->dev->port;
-	u16 cap = dpc->cap_pos, status;
+	struct pci_dev *pdev = context;
+	u16 cap = pdev->dpc_cap, status;
 
 	pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status);
 
@@ -293,7 +260,6 @@ static irqreturn_t dpc_irq(int irq, void *context)
 #define FLAG(x, y) (((x) & (y)) ? '+' : '-')
 static int dpc_probe(struct pcie_device *dev)
 {
-	struct dpc_dev *dpc;
 	struct pci_dev *pdev = dev->port;
 	struct device *device = &dev->device;
 	int status;
@@ -302,43 +268,37 @@ static int dpc_probe(struct pcie_device *dev)
 	if (pcie_aer_get_firmware_first(pdev) && !pcie_ports_dpc_native)
 		return -ENOTSUPP;
 
-	dpc = devm_kzalloc(device, sizeof(*dpc), GFP_KERNEL);
-	if (!dpc)
-		return -ENOMEM;
-
-	dpc->cap_pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_DPC);
-	dpc->dev = dev;
-	set_service_data(dev, dpc);
+	pdev->dpc_cap = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_DPC);
 
 	status = devm_request_threaded_irq(device, dev->irq, dpc_irq,
 					   dpc_handler, IRQF_SHARED,
-					   "pcie-dpc", dpc);
+					   "pcie-dpc", pdev);
 	if (status) {
 		pci_warn(pdev, "request IRQ%d failed: %d\n", dev->irq,
 			 status);
 		return status;
 	}
 
-	pci_read_config_word(pdev, dpc->cap_pos + PCI_EXP_DPC_CAP, &cap);
-	pci_read_config_word(pdev, dpc->cap_pos + PCI_EXP_DPC_CTL, &ctl);
+	pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CAP, &cap);
+	pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl);
 
-	dpc->rp_extensions = (cap & PCI_EXP_DPC_CAP_RP_EXT);
-	if (dpc->rp_extensions) {
-		dpc->rp_log_size = (cap & PCI_EXP_DPC_RP_PIO_LOG_SIZE) >> 8;
-		if (dpc->rp_log_size < 4 || dpc->rp_log_size > 9) {
+	pdev->dpc_rp_extensions = (cap & PCI_EXP_DPC_CAP_RP_EXT) ? 1 : 0;
+	if (pdev->dpc_rp_extensions) {
+		pdev->dpc_rp_log_size = (cap & PCI_EXP_DPC_RP_PIO_LOG_SIZE) >> 8;
+		if (pdev->dpc_rp_log_size < 4 || pdev->dpc_rp_log_size > 9) {
 			pci_err(pdev, "RP PIO log size %u is invalid\n",
-				dpc->rp_log_size);
-			dpc->rp_log_size = 0;
+				pdev->dpc_rp_log_size);
+			pdev->dpc_rp_log_size = 0;
 		}
 	}
 
 	ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN;
-	pci_write_config_word(pdev, dpc->cap_pos + PCI_EXP_DPC_CTL, ctl);
+	pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl);
 
 	pci_info(pdev, "error containment capabilities: Int Msg #%d, RPExt%c PoisonedTLP%c SwTrigger%c RP PIO Log %d, DL_ActiveErr%c\n",
 		 cap & PCI_EXP_DPC_IRQ, FLAG(cap, PCI_EXP_DPC_CAP_RP_EXT),
 		 FLAG(cap, PCI_EXP_DPC_CAP_POISONED_TLP),
-		 FLAG(cap, PCI_EXP_DPC_CAP_SW_TRIGGER), dpc->rp_log_size,
+		 FLAG(cap, PCI_EXP_DPC_CAP_SW_TRIGGER), pdev->dpc_rp_log_size,
 		 FLAG(cap, PCI_EXP_DPC_CAP_DL_ACTIVE));
 
 	pci_add_ext_cap_save_buffer(pdev, PCI_EXT_CAP_ID_DPC, sizeof(u16));
@@ -347,13 +307,12 @@ static int dpc_probe(struct pcie_device *dev)
 
 static void dpc_remove(struct pcie_device *dev)
 {
-	struct dpc_dev *dpc = get_service_data(dev);
 	struct pci_dev *pdev = dev->port;
 	u16 ctl;
 
-	pci_read_config_word(pdev, dpc->cap_pos + PCI_EXP_DPC_CTL, &ctl);
+	pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl);
 	ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN);
-	pci_write_config_word(pdev, dpc->cap_pos + PCI_EXP_DPC_CTL, ctl);
+	pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl);
 }
 
 static struct pcie_port_service_driver dpcdriver = {
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 3840a541a9de..a0b7e7a53741 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -444,6 +444,11 @@ struct pci_dev {
 	const struct attribute_group **msi_irq_groups;
 #endif
 	struct pci_vpd *vpd;
+#ifdef CONFIG_PCIE_DPC
+	u16		dpc_cap;
+	unsigned int	dpc_rp_extensions:1;
+	u8		dpc_rp_log_size;
+#endif
 #ifdef CONFIG_PCI_ATS
 	union {
 		struct pci_sriov	*sriov;		/* PF: SR-IOV info */
-- 
2.17.1



* [PATCH v18 05/11] PCI/ERR: Remove service dependency in pcie_do_recovery()
  2020-03-24  0:25 [PATCH v18 00/11] Add Error Disconnect Recover (EDR) support sathyanarayanan.kuppuswamy
                   ` (3 preceding siblings ...)
  2020-03-24  0:26 ` [PATCH v18 04/11] PCI/DPC: Move DPC data into struct pci_dev sathyanarayanan.kuppuswamy
@ 2020-03-24  0:26 ` sathyanarayanan.kuppuswamy
  2020-03-28 21:12   ` Kuppuswamy, Sathyanarayanan
  2020-03-24  0:26 ` [PATCH v18 06/11] PCI/ERR: Return status of pcie_do_recovery() sathyanarayanan.kuppuswamy
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: sathyanarayanan.kuppuswamy @ 2020-03-24  0:26 UTC (permalink / raw)
  To: bhelgaas; +Cc: linux-pci, linux-kernel, ashok.raj, sathyanarayanan.kuppuswamy

From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

Previously we passed the PCIe service type parameter to pcie_do_recovery(),
where reset_link() looked up the underlying pci_port_service_driver and its
.reset_link() function pointer. Instead of using this roundabout way, we
can just pass the driver-specific .reset_link() callback function when
calling pcie_do_recovery() function.

This allows us to call pcie_do_recovery() from code that is not a PCIe port
service driver, e.g., Error Disconnect Recover (EDR) support.

Remove pcie_port_find_service() and pcie_port_service_driver.reset_link
since they are now unused.
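The change replaces a service-type lookup with a plain function-pointer parameter. A minimal user-space sketch of that pattern follows; struct fake_dev, fake_dpc_reset_link() and do_recovery() are hypothetical names, and treating a NULL callback as a failed reset is an assumption of this sketch, not necessarily what err.c does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's types. */
enum ers_result { ERS_DISCONNECT, ERS_NEED_RESET, ERS_RECOVERED };
struct fake_dev { const char *name; int resets; };

typedef enum ers_result (*reset_link_fn)(struct fake_dev *dev);

/* Caller-specific reset, e.g. what a DPC or EDR path would pass in. */
static enum ers_result fake_dpc_reset_link(struct fake_dev *dev)
{
	dev->resets++;
	return ERS_RECOVERED;
}

/* Recovery core: no service lookup, just invoke the supplied callback. */
static enum ers_result do_recovery(struct fake_dev *dev, reset_link_fn reset_link)
{
	if (!reset_link)
		return ERS_DISCONNECT;	/* assumption: no callback, no recovery */
	return reset_link(dev);
}
```

Because the callback travels with the call, any caller (a port service driver or non-port code such as EDR) can reuse the same recovery core.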

Link: https://lore.kernel.org/r/152c530a3ca8780ae85c2325f97f5f35f5d3602f.1583286655.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 Documentation/PCI/pcieaer-howto.rst | 19 +++-------
 drivers/pci/pci.h                   |  2 +-
 drivers/pci/pcie/aer.c              | 12 +++----
 drivers/pci/pcie/dpc.c              |  3 +-
 drivers/pci/pcie/err.c              | 54 +++++------------------------
 drivers/pci/pcie/portdrv.h          |  5 ---
 drivers/pci/pcie/portdrv_core.c     | 21 -----------
 7 files changed, 19 insertions(+), 97 deletions(-)

diff --git a/Documentation/PCI/pcieaer-howto.rst b/Documentation/PCI/pcieaer-howto.rst
index 18bdefaafd1a..afbd8c1c321d 100644
--- a/Documentation/PCI/pcieaer-howto.rst
+++ b/Documentation/PCI/pcieaer-howto.rst
@@ -156,12 +156,6 @@ default reset_link function, but different upstream ports might
 have different specifications to reset pci express link, so all
 upstream ports should provide their own reset_link functions.
 
-In struct pcie_port_service_driver, a new pointer, reset_link, is
-added.
-::
-
-	pci_ers_result_t (*reset_link) (struct pci_dev *dev);
-
 Section 3.2.2.2 provides more detailed info on when to call
 reset_link.
 
@@ -212,15 +206,10 @@ error_detected(dev, pci_channel_io_frozen) to all drivers within
 a hierarchy in question. Then, performing link reset at upstream is
 necessary. As different kinds of devices might use different approaches
 to reset link, AER port service driver is required to provide the
-function to reset link. Firstly, kernel looks for if the upstream
-component has an aer driver. If it has, kernel uses the reset_link
-callback of the aer driver. If the upstream component has no aer driver
-and the port is downstream port, we will perform a hot reset as the
-default by setting the Secondary Bus Reset bit of the Bridge Control
-register associated with the downstream port. As for upstream ports,
-they should provide their own aer service drivers with reset_link
-function. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER and
-reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes
+function to reset link via callback parameter of pcie_do_recovery()
+function. If reset_link is not NULL, recovery function will use it
+to reset the link. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER
+and reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes
 to mmio_enabled.
 
 helper functions
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 6394e7746fb5..3e5efb83e9a2 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -548,7 +548,7 @@ static inline int pci_dev_specific_disable_acs_redir(struct pci_dev *dev)
 
 /* PCI error reporting and recovery */
 void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
-		      u32 service);
+		      pci_ers_result_t (*reset_link)(struct pci_dev *pdev));
 
 bool pcie_wait_for_link(struct pci_dev *pdev, bool active);
 #ifdef CONFIG_PCIEASPM
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 4a818b07a1af..c0540c3761dc 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -102,6 +102,7 @@ struct aer_stats {
 #define ERR_UNCOR_ID(d)			(d >> 16)
 
 static int pcie_aer_disable;
+static pci_ers_result_t aer_root_reset(struct pci_dev *dev);
 
 void pci_no_aer(void)
 {
@@ -1053,11 +1054,9 @@ static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
 					info->status);
 		pci_aer_clear_device_status(dev);
 	} else if (info->severity == AER_NONFATAL)
-		pcie_do_recovery(dev, pci_channel_io_normal,
-				 PCIE_PORT_SERVICE_AER);
+		pcie_do_recovery(dev, pci_channel_io_normal, aer_root_reset);
 	else if (info->severity == AER_FATAL)
-		pcie_do_recovery(dev, pci_channel_io_frozen,
-				 PCIE_PORT_SERVICE_AER);
+		pcie_do_recovery(dev, pci_channel_io_frozen, aer_root_reset);
 	pci_dev_put(dev);
 }
 
@@ -1094,10 +1093,10 @@ static void aer_recover_work_func(struct work_struct *work)
 		cper_print_aer(pdev, entry.severity, entry.regs);
 		if (entry.severity == AER_NONFATAL)
 			pcie_do_recovery(pdev, pci_channel_io_normal,
-					 PCIE_PORT_SERVICE_AER);
+					 aer_root_reset);
 		else if (entry.severity == AER_FATAL)
 			pcie_do_recovery(pdev, pci_channel_io_frozen,
-					 PCIE_PORT_SERVICE_AER);
+					 aer_root_reset);
 		pci_dev_put(pdev);
 	}
 }
@@ -1501,7 +1500,6 @@ static struct pcie_port_service_driver aerdriver = {
 
 	.probe		= aer_probe,
 	.remove		= aer_remove,
-	.reset_link	= aer_root_reset,
 };
 
 /**
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index 5c2e9d45a269..0c45133a9a91 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -235,7 +235,7 @@ static irqreturn_t dpc_handler(int irq, void *context)
 	}
 
 	/* We configure DPC so it only triggers on ERR_FATAL */
-	pcie_do_recovery(pdev, pci_channel_io_frozen, PCIE_PORT_SERVICE_DPC);
+	pcie_do_recovery(pdev, pci_channel_io_frozen, dpc_reset_link);
 
 	return IRQ_HANDLED;
 }
@@ -321,7 +321,6 @@ static struct pcie_port_service_driver dpcdriver = {
 	.service	= PCIE_PORT_SERVICE_DPC,
 	.probe		= dpc_probe,
 	.remove		= dpc_remove,
-	.reset_link	= dpc_reset_link,
 };
 
 int __init pcie_dpc_init(void)
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 6e52591a4722..caeb6f5d9970 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -146,50 +146,9 @@ static int report_resume(struct pci_dev *dev, void *data)
 	return 0;
 }
 
-/**
- * default_reset_link - default reset function
- * @dev: pointer to pci_dev data structure
- *
- * Invoked when performing link reset on a Downstream Port or a
- * Root Port with no aer driver.
- */
-static pci_ers_result_t default_reset_link(struct pci_dev *dev)
-{
-	int rc;
-
-	rc = pci_bus_error_reset(dev);
-	pci_printk(KERN_DEBUG, dev, "downstream link has been reset\n");
-	return rc ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED;
-}
-
-static pci_ers_result_t reset_link(struct pci_dev *dev, u32 service)
-{
-	pci_ers_result_t status;
-	struct pcie_port_service_driver *driver = NULL;
-
-	driver = pcie_port_find_service(dev, service);
-	if (driver && driver->reset_link) {
-		status = driver->reset_link(dev);
-	} else if (pcie_downstream_port(dev)) {
-		status = default_reset_link(dev);
-	} else {
-		pci_printk(KERN_DEBUG, dev, "no link-reset support at upstream device %s\n",
-			pci_name(dev));
-		return PCI_ERS_RESULT_DISCONNECT;
-	}
-
-	if ((status != PCI_ERS_RESULT_RECOVERED) &&
-	    (status != PCI_ERS_RESULT_NEED_RESET)) {
-		pci_printk(KERN_DEBUG, dev, "link reset at upstream device %s failed\n",
-			pci_name(dev));
-		return PCI_ERS_RESULT_DISCONNECT;
-	}
-
-	return status;
-}
-
-void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
-		      u32 service)
+void pcie_do_recovery(struct pci_dev *dev,
+		      enum pci_channel_state state,
+		      pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
 {
 	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
 	struct pci_bus *bus;
@@ -206,9 +165,12 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
 	pci_dbg(dev, "broadcast error_detected message\n");
 	if (state == pci_channel_io_frozen) {
 		pci_walk_bus(bus, report_frozen_detected, &status);
-		status = reset_link(dev, service);
-		if (status == PCI_ERS_RESULT_DISCONNECT)
+		status = reset_link(dev);
+		if ((status != PCI_ERS_RESULT_RECOVERED) &&
+		    (status != PCI_ERS_RESULT_NEED_RESET)) {
+			pci_dbg(dev, "link reset at upstream device failed\n");
 			goto failed;
+		}
 	} else {
 		pci_walk_bus(bus, report_normal_detected, &status);
 	}
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index 1e673619b101..64b5e081cdb2 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -92,9 +92,6 @@ struct pcie_port_service_driver {
 	/* Device driver may resume normal operations */
 	void (*error_resume)(struct pci_dev *dev);
 
-	/* Link Reset Capability - AER service driver specific */
-	pci_ers_result_t (*reset_link)(struct pci_dev *dev);
-
 	int port_type;  /* Type of the port this driver can handle */
 	u32 service;    /* Port service this device represents */
 
@@ -161,7 +158,5 @@ static inline int pcie_aer_get_firmware_first(struct pci_dev *pci_dev)
 }
 #endif
 
-struct pcie_port_service_driver *pcie_port_find_service(struct pci_dev *dev,
-							u32 service);
 struct device *pcie_port_find_device(struct pci_dev *dev, u32 service);
 #endif /* _PORTDRV_H_ */
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 5075cb9e850c..50a9522ab07d 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -458,27 +458,6 @@ static int find_service_iter(struct device *device, void *data)
 	return 0;
 }
 
-/**
- * pcie_port_find_service - find the service driver
- * @dev: PCI Express port the service is associated with
- * @service: Service to find
- *
- * Find PCI Express port service driver associated with given service
- */
-struct pcie_port_service_driver *pcie_port_find_service(struct pci_dev *dev,
-							u32 service)
-{
-	struct pcie_port_service_driver *drv;
-	struct portdrv_service_data pdrvs;
-
-	pdrvs.drv = NULL;
-	pdrvs.service = service;
-	device_for_each_child(&dev->dev, &pdrvs, find_service_iter);
-
-	drv = pdrvs.drv;
-	return drv;
-}
-
 /**
  * pcie_port_find_device - find the struct device
  * @dev: PCI Express port the service is associated with
-- 
2.17.1



* [PATCH v18 06/11] PCI/ERR: Return status of pcie_do_recovery()
  2020-03-24  0:25 [PATCH v18 00/11] Add Error Disconnect Recover (EDR) support sathyanarayanan.kuppuswamy
                   ` (4 preceding siblings ...)
  2020-03-24  0:26 ` [PATCH v18 05/11] PCI/ERR: Remove service dependency in pcie_do_recovery() sathyanarayanan.kuppuswamy
@ 2020-03-24  0:26 ` sathyanarayanan.kuppuswamy
  2020-03-24  0:26 ` [PATCH v18 07/11] PCI/DPC: Cache DPC capabilities in pci_init_capabilities() sathyanarayanan.kuppuswamy
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: sathyanarayanan.kuppuswamy @ 2020-03-24  0:26 UTC (permalink / raw)
  To: bhelgaas; +Cc: linux-pci, linux-kernel, ashok.raj, sathyanarayanan.kuppuswamy

From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

As per the DPC Enhancements ECN [1], sec 4.5.1, table 4-4, if the OS
supports Error Disconnect Recover (EDR), it must invalidate the software
state associated with child devices of the port without attempting to
access the child device hardware. In addition, if the OS supports DPC, it
must attempt to recover the child devices if the port implements the DPC
Capability. If the OS continues operation, the OS must inform the firmware
of the status of the recovery operation via the _OST method.

Return the result of pcie_do_recovery() so we can report it to firmware via
_OST.
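As an illustration of how a caller might use the returned value, here is a small C sketch that maps a recovery result to an _OST status code. EDR_OST_SUCCESS and EDR_OST_FAILED are the values defined in edr.c later in this series. The enum and the helper edr_ost_status() are hypothetical, and the assumption that only a fully recovered result counts as success is ours.

```c
#include <assert.h>
#include <stdint.h>

/* Status codes as defined in edr.c later in this series */
#define EDR_OST_SUCCESS	0x80
#define EDR_OST_FAILED	0x81

/* Simplified stand-in for pci_ers_result_t (hypothetical) */
typedef enum {
	ERS_RESULT_CAN_RECOVER,
	ERS_RESULT_NEED_RESET,
	ERS_RESULT_DISCONNECT,
	ERS_RESULT_RECOVERED,
} ers_result_t;

/*
 * Hypothetical helper: translate the value returned by
 * pcie_do_recovery() into the code reported to firmware via _OST.
 * Anything short of a recovered hierarchy is reported as a failure.
 */
uint32_t edr_ost_status(ers_result_t result)
{
	return result == ERS_RESULT_RECOVERED ? EDR_OST_SUCCESS
					      : EDR_OST_FAILED;
}
```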

[1] Downstream Port Containment Related Enhancements ECN, Jan 28, 2019,
    affecting PCI Firmware Specification, Rev. 3.2
    https://members.pcisig.com/wg/PCI-SIG/document/12888
Link: https://lore.kernel.org/r/a795fe1f1f42f5aa1d334768d4e719d8c147894e.1583286655.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/pci.h      |  5 +++--
 drivers/pci/pcie/err.c | 10 ++++++----
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 3e5efb83e9a2..efbe94096050 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -547,8 +547,9 @@ static inline int pci_dev_specific_disable_acs_redir(struct pci_dev *dev)
 #endif
 
 /* PCI error reporting and recovery */
-void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
-		      pci_ers_result_t (*reset_link)(struct pci_dev *pdev));
+pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
+			enum pci_channel_state state,
+			pci_ers_result_t (*reset_link)(struct pci_dev *pdev));
 
 bool pcie_wait_for_link(struct pci_dev *pdev, bool active);
 #ifdef CONFIG_PCIEASPM
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index caeb6f5d9970..7881de20af29 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -146,9 +146,9 @@ static int report_resume(struct pci_dev *dev, void *data)
 	return 0;
 }
 
-void pcie_do_recovery(struct pci_dev *dev,
-		      enum pci_channel_state state,
-		      pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
+pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
+			enum pci_channel_state state,
+			pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
 {
 	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
 	struct pci_bus *bus;
@@ -201,11 +201,13 @@ void pcie_do_recovery(struct pci_dev *dev,
 	pci_aer_clear_device_status(dev);
 	pci_cleanup_aer_uncorrect_error_status(dev);
 	pci_info(dev, "device recovery successful\n");
-	return;
+	return status;
 
 failed:
 	pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
 
 	/* TODO: Should kernel panic here? */
 	pci_info(dev, "device recovery failed\n");
+
+	return status;
 }
-- 
2.17.1



* [PATCH v18 07/11] PCI/DPC: Cache DPC capabilities in pci_init_capabilities()
  2020-03-24  0:25 [PATCH v18 00/11] Add Error Disconnect Recover (EDR) support sathyanarayanan.kuppuswamy
                   ` (5 preceding siblings ...)
  2020-03-24  0:26 ` [PATCH v18 06/11] PCI/ERR: Return status of pcie_do_recovery() sathyanarayanan.kuppuswamy
@ 2020-03-24  0:26 ` sathyanarayanan.kuppuswamy
  2020-03-24  0:26 ` [PATCH v18 08/11] PCI/AER: Add pci_aer_raw_clear_status() to unconditionally clear Error Status sathyanarayanan.kuppuswamy
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: sathyanarayanan.kuppuswamy @ 2020-03-24  0:26 UTC (permalink / raw)
  To: bhelgaas; +Cc: linux-pci, linux-kernel, ashok.raj, sathyanarayanan.kuppuswamy

From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

Since Error Disconnect Recover needs to use DPC error handling routines
even if the OS doesn't have control of DPC, move the initialization and
caching of DPC capabilities from the DPC driver to pci_init_capabilities().
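The capability decoding that moves into pci_dpc_init() can be modeled in user space as shown below. The mask values are assumed to mirror PCI_EXP_DPC_CAP_RP_EXT and PCI_EXP_DPC_RP_PIO_LOG_SIZE from include/uapi/linux/pci_regs.h, so treat them as an assumption. struct dpc_info and dpc_decode_cap() are hypothetical names. The 4..9 range check is the one used by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field masks, assumed to mirror include/uapi/linux/pci_regs.h */
#define DPC_CAP_RP_EXT		0x0020	/* Root Port Extensions */
#define DPC_RP_PIO_LOG_SIZE	0x0F00	/* RP PIO Log Size, bits 11:8 */

/* Hypothetical summary of the fields cached in struct pci_dev */
struct dpc_info {
	bool rp_extensions;
	uint8_t rp_log_size;	/* 0 if absent or invalid */
};

/* Decode a DPC Capability register value like pci_dpc_init() does */
struct dpc_info dpc_decode_cap(uint16_t cap)
{
	struct dpc_info info = { false, 0 };

	if (!(cap & DPC_CAP_RP_EXT))
		return info;

	info.rp_extensions = true;
	info.rp_log_size = (cap & DPC_RP_PIO_LOG_SIZE) >> 8;
	/* Same sanity check as the patch: only 4..9 is accepted */
	if (info.rp_log_size < 4 || info.rp_log_size > 9)
		info.rp_log_size = 0;
	return info;
}
```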

Link: https://lore.kernel.org/r/6ac4e893e7d1054fe43efed0f89ca02f072c3190.1583286655.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/pci.h      |  2 ++
 drivers/pci/pcie/dpc.c | 33 +++++++++++++++++++++------------
 drivers/pci/probe.c    |  1 +
 3 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index efbe94096050..e48677a0ba42 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -448,9 +448,11 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info);
 #ifdef CONFIG_PCIE_DPC
 void pci_save_dpc_state(struct pci_dev *dev);
 void pci_restore_dpc_state(struct pci_dev *dev);
+void pci_dpc_init(struct pci_dev *pdev);
 #else
 static inline void pci_save_dpc_state(struct pci_dev *dev) {}
 static inline void pci_restore_dpc_state(struct pci_dev *dev) {}
+static inline void pci_dpc_init(struct pci_dev *pdev) {}
 #endif
 
 #ifdef CONFIG_PCI_ATS
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index 0c45133a9a91..5870a0f154fc 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -257,6 +257,27 @@ static irqreturn_t dpc_irq(int irq, void *context)
 	return IRQ_HANDLED;
 }
 
+void pci_dpc_init(struct pci_dev *pdev)
+{
+	u16 cap;
+
+	pdev->dpc_cap = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_DPC);
+	if (!pdev->dpc_cap)
+		return;
+
+	pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CAP, &cap);
+	if (!(cap & PCI_EXP_DPC_CAP_RP_EXT))
+		return;
+
+	pdev->dpc_rp_extensions = true;
+	pdev->dpc_rp_log_size = (cap & PCI_EXP_DPC_RP_PIO_LOG_SIZE) >> 8;
+	if (pdev->dpc_rp_log_size < 4 || pdev->dpc_rp_log_size > 9) {
+		pci_err(pdev, "RP PIO log size %u is invalid\n",
+			pdev->dpc_rp_log_size);
+		pdev->dpc_rp_log_size = 0;
+	}
+}
+
 #define FLAG(x, y) (((x) & (y)) ? '+' : '-')
 static int dpc_probe(struct pcie_device *dev)
 {
@@ -268,8 +289,6 @@ static int dpc_probe(struct pcie_device *dev)
 	if (pcie_aer_get_firmware_first(pdev) && !pcie_ports_dpc_native)
 		return -ENOTSUPP;
 
-	pdev->dpc_cap = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_DPC);
-
 	status = devm_request_threaded_irq(device, dev->irq, dpc_irq,
 					   dpc_handler, IRQF_SHARED,
 					   "pcie-dpc", pdev);
@@ -282,16 +301,6 @@ static int dpc_probe(struct pcie_device *dev)
 	pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CAP, &cap);
 	pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl);
 
-	pdev->dpc_rp_extensions = (cap & PCI_EXP_DPC_CAP_RP_EXT) ? 1 : 0;
-	if (pdev->dpc_rp_extensions) {
-		pdev->dpc_rp_log_size = (cap & PCI_EXP_DPC_RP_PIO_LOG_SIZE) >> 8;
-		if (pdev->dpc_rp_log_size < 4 || pdev->dpc_rp_log_size > 9) {
-			pci_err(pdev, "RP PIO log size %u is invalid\n",
-				pdev->dpc_rp_log_size);
-			pdev->dpc_rp_log_size = 0;
-		}
-	}
-
 	ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN;
 	pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl);
 
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 512cb4312ddd..c6f91f886818 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2329,6 +2329,7 @@ static void pci_init_capabilities(struct pci_dev *dev)
 	pci_enable_acs(dev);		/* Enable ACS P2P upstream forwarding */
 	pci_ptm_init(dev);		/* Precision Time Measurement */
 	pci_aer_init(dev);		/* Advanced Error Reporting */
+	pci_dpc_init(dev);		/* Downstream Port Containment */
 
 	pcie_report_downtraining(dev);
 
-- 
2.17.1



* [PATCH v18 08/11] PCI/AER: Add pci_aer_raw_clear_status() to unconditionally clear Error Status
  2020-03-24  0:25 [PATCH v18 00/11] Add Error Disconnect Recover (EDR) support sathyanarayanan.kuppuswamy
                   ` (6 preceding siblings ...)
  2020-03-24  0:26 ` [PATCH v18 07/11] PCI/DPC: Cache DPC capabilities in pci_init_capabilities() sathyanarayanan.kuppuswamy
@ 2020-03-24  0:26 ` sathyanarayanan.kuppuswamy
  2020-03-24  0:26 ` [PATCH v18 09/11] PCI/DPC: Expose dpc_process_error(), dpc_reset_link() for use by EDR sathyanarayanan.kuppuswamy
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: sathyanarayanan.kuppuswamy @ 2020-03-24  0:26 UTC (permalink / raw)
  To: bhelgaas; +Cc: linux-pci, linux-kernel, ashok.raj, sathyanarayanan.kuppuswamy

From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

Per the SFI _OSC and DPC Updates ECN [1] implementation note flowchart, the
OS seems to be expected to clear AER status even if it doesn't have
ownership of the AER capability.  Unlike the DPC capability, where a DPC
ECN [2] specifies a window when the OS is allowed to access DPC registers
even if it doesn't have ownership, there is no clear model for AER.

Add pci_aer_raw_clear_status() to clear the AER error status registers
unconditionally.  This is intended for use only by the EDR path (see [2]).
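The split is a common pattern: a "raw" primitive with no ownership check, plus a gated wrapper that keeps the old behavior for normal callers. The sketch below models it in user space. struct fake_aer_dev and the function names are hypothetical. Write-1-to-clear is modeled by writing back the bits that were read. The Root Error Status handling done for Root Ports is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a device's AER status registers */
struct fake_aer_dev {
	bool has_aer;		/* AER capability present */
	bool firmware_first;	/* firmware owns AER */
	uint32_t cor_status;
	uint32_t uncor_status;
};

/* Model of RW1C semantics: writing 1 to a bit clears it */
static void rw1c_write(uint32_t *reg, uint32_t val)
{
	*reg &= ~val;
}

/* Like pci_aer_raw_clear_status(): clears regardless of ownership */
int raw_clear_status(struct fake_aer_dev *dev)
{
	if (!dev->has_aer)
		return -EIO;
	rw1c_write(&dev->cor_status, dev->cor_status);
	rw1c_write(&dev->uncor_status, dev->uncor_status);
	return 0;
}

/* Like pci_cleanup_aer_error_status_regs(): refuses if firmware owns AER */
int cleanup_status_regs(struct fake_aer_dev *dev)
{
	if (dev->firmware_first)
		return -EIO;
	return raw_clear_status(dev);
}
```

Existing callers keep the ownership check, and only the EDR path calls the raw variant.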

[1] System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24,
    2020, affecting PCI Firmware Specification, Rev. 3.2
    https://members.pcisig.com/wg/PCI-SIG/document/14076
[2] Downstream Port Containment Related Enhancements ECN, Jan 28, 2019,
    affecting PCI Firmware Specification, Rev. 3.2
    https://members.pcisig.com/wg/PCI-SIG/document/12888
[bhelgaas: changelog]
Link: https://lore.kernel.org/r/29fb514a0d86e9bcc75cec4ea8474cd4db33adbf.1583286655.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/pci.h      |  2 ++
 drivers/pci/pcie/aer.c | 22 ++++++++++++++++++----
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index e48677a0ba42..6d09bb22b73d 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -654,12 +654,14 @@ void pci_aer_exit(struct pci_dev *dev);
 extern const struct attribute_group aer_stats_attr_group;
 void pci_aer_clear_fatal_status(struct pci_dev *dev);
 void pci_aer_clear_device_status(struct pci_dev *dev);
+int pci_aer_raw_clear_status(struct pci_dev *dev);
 #else
 static inline void pci_no_aer(void) { }
 static inline void pci_aer_init(struct pci_dev *d) { }
 static inline void pci_aer_exit(struct pci_dev *d) { }
 static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
 static inline void pci_aer_clear_device_status(struct pci_dev *dev) { }
+static inline int pci_aer_raw_clear_status(struct pci_dev *dev) { return -EINVAL; }
 #endif
 
 #ifdef CONFIG_ACPI
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index c0540c3761dc..bd9f122165e0 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -420,7 +420,16 @@ void pci_aer_clear_fatal_status(struct pci_dev *dev)
 		pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, status);
 }
 
-int pci_cleanup_aer_error_status_regs(struct pci_dev *dev)
+/**
+ * pci_aer_raw_clear_status - Clear AER error registers.
+ * @dev: the PCI device
+ *
+ * Clear AER error status registers unconditionally, regardless of
+ * whether they're owned by firmware or the OS.
+ *
+ * Returns 0 on success, or negative on failure.
+ */
+int pci_aer_raw_clear_status(struct pci_dev *dev)
 {
 	int pos;
 	u32 status;
@@ -433,9 +442,6 @@ int pci_cleanup_aer_error_status_regs(struct pci_dev *dev)
 	if (!pos)
 		return -EIO;
 
-	if (pcie_aer_get_firmware_first(dev))
-		return -EIO;
-
 	port_type = pci_pcie_type(dev);
 	if (port_type == PCI_EXP_TYPE_ROOT_PORT) {
 		pci_read_config_dword(dev, pos + PCI_ERR_ROOT_STATUS, &status);
@@ -451,6 +457,14 @@ int pci_cleanup_aer_error_status_regs(struct pci_dev *dev)
 	return 0;
 }
 
+int pci_cleanup_aer_error_status_regs(struct pci_dev *dev)
+{
+	if (pcie_aer_get_firmware_first(dev))
+		return -EIO;
+
+	return pci_aer_raw_clear_status(dev);
+}
+
 void pci_save_aer_state(struct pci_dev *dev)
 {
 	struct pci_cap_saved_state *save_state;
-- 
2.17.1



* [PATCH v18 09/11] PCI/DPC: Expose dpc_process_error(), dpc_reset_link() for use by EDR
  2020-03-24  0:25 [PATCH v18 00/11] Add Error Disconnect Recover (EDR) support sathyanarayanan.kuppuswamy
                   ` (7 preceding siblings ...)
  2020-03-24  0:26 ` [PATCH v18 08/11] PCI/AER: Add pci_aer_raw_clear_status() to unconditionally clear Error Status sathyanarayanan.kuppuswamy
@ 2020-03-24  0:26 ` sathyanarayanan.kuppuswamy
  2020-03-24  0:26 ` [PATCH v18 10/11] PCI/DPC: Add Error Disconnect Recover (EDR) support sathyanarayanan.kuppuswamy
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: sathyanarayanan.kuppuswamy @ 2020-03-24  0:26 UTC (permalink / raw)
  To: bhelgaas; +Cc: linux-pci, linux-kernel, ashok.raj, sathyanarayanan.kuppuswamy

From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

If firmware controls DPC, it is generally responsible for managing the DPC
capability and events, and the OS should not access the DPC capability.

However, if firmware controls DPC and both the OS and the platform support
Error Disconnect Recover (EDR) notifications, the OS EDR notify handler is
responsible for recovery, and the notify handler may read/write the DPC
capability until it clears the DPC Trigger Status bit.  See [1], sec 4.5.1,
table 4-6.

Expose some DPC error handling functions so they can be used by the EDR
notify handler.
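The refactoring follows a simple shape: the interrupt thread and the EDR notify handler share one error-processing helper and one recovery call. The user-space sketch below shows it. Counters stand in for the real side effects. struct fake_port and all function names are hypothetical. The _OST report in edr_notify() reflects what the commit message of the next patch describes and is only modeled here.

```c
#include <assert.h>

/* Hypothetical port model: counters stand in for real side effects */
struct fake_port {
	int errors_logged;	/* times the shared error decoder ran */
	int recoveries;		/* times recovery was attempted */
	int ost_reports;	/* times status was reported to firmware */
};

/* Plays the role of dpc_process_error(): decode, log, clear status */
void process_error(struct fake_port *port)
{
	port->errors_logged++;
}

/* Plays the role of pcie_do_recovery(pdev, ..., dpc_reset_link) */
void do_recovery(struct fake_port *port)
{
	port->recoveries++;
}

/* Native DPC path, like the new dpc_handler() */
void dpc_irq_thread(struct fake_port *port)
{
	process_error(port);
	do_recovery(port);
}

/* Hypothetical EDR notify handler: same helpers, plus an _OST report */
void edr_notify(struct fake_port *port)
{
	process_error(port);
	do_recovery(port);
	port->ost_reports++;
}
```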

[1] Downstream Port Containment Related Enhancements ECN, Jan 28, 2019,
    affecting PCI Firmware Specification, Rev. 3.2
    https://members.pcisig.com/wg/PCI-SIG/document/12888
Link: https://lore.kernel.org/r/ac8816d4d41d0894720660f9b51dbeac0842869d.1583286655.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/pci.h      |  2 ++
 drivers/pci/pcie/dpc.c | 12 +++++++++---
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 6d09bb22b73d..25265bf80a83 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -449,6 +449,8 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info);
 void pci_save_dpc_state(struct pci_dev *dev);
 void pci_restore_dpc_state(struct pci_dev *dev);
 void pci_dpc_init(struct pci_dev *pdev);
+void dpc_process_error(struct pci_dev *pdev);
+pci_ers_result_t dpc_reset_link(struct pci_dev *pdev);
 #else
 static inline void pci_save_dpc_state(struct pci_dev *dev) {}
 static inline void pci_restore_dpc_state(struct pci_dev *dev) {}
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index 5870a0f154fc..e9087f5f32ec 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -90,7 +90,7 @@ static int dpc_wait_rp_inactive(struct pci_dev *pdev)
 	return 0;
 }
 
-static pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
+pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
 {
 	u16 cap;
 
@@ -201,9 +201,8 @@ static int dpc_get_aer_uncorrect_severity(struct pci_dev *dev,
 	return 1;
 }
 
-static irqreturn_t dpc_handler(int irq, void *context)
+void dpc_process_error(struct pci_dev *pdev)
 {
-	struct pci_dev *pdev = context;
 	u16 cap = pdev->dpc_cap, status, source, reason, ext_reason;
 	struct aer_err_info info;
 
@@ -233,6 +232,13 @@ static irqreturn_t dpc_handler(int irq, void *context)
 		pci_cleanup_aer_uncorrect_error_status(pdev);
 		pci_aer_clear_fatal_status(pdev);
 	}
+}
+
+static irqreturn_t dpc_handler(int irq, void *context)
+{
+	struct pci_dev *pdev = context;
+
+	dpc_process_error(pdev);
 
 	/* We configure DPC so it only triggers on ERR_FATAL */
 	pcie_do_recovery(pdev, pci_channel_io_frozen, dpc_reset_link);
-- 
2.17.1



* [PATCH v18 10/11] PCI/DPC: Add Error Disconnect Recover (EDR) support
  2020-03-24  0:25 [PATCH v18 00/11] Add Error Disconnect Recover (EDR) support sathyanarayanan.kuppuswamy
                   ` (8 preceding siblings ...)
  2020-03-24  0:26 ` [PATCH v18 09/11] PCI/DPC: Expose dpc_process_error(), dpc_reset_link() for use by EDR sathyanarayanan.kuppuswamy
@ 2020-03-24  0:26 ` sathyanarayanan.kuppuswamy
  2020-03-24 21:37   ` Bjorn Helgaas
  2024-04-11 18:07   ` Bjorn Helgaas
  2020-03-24  0:26 ` [PATCH v18 11/11] PCI/AER: Rationalize error status register clearing sathyanarayanan.kuppuswamy
  2020-03-31 15:28 ` [PATCH v18 00/11] Add Error Disconnect Recover (EDR) support Bjorn Helgaas
  11 siblings, 2 replies; 29+ messages in thread
From: sathyanarayanan.kuppuswamy @ 2020-03-24  0:26 UTC (permalink / raw)
  To: bhelgaas
  Cc: linux-pci, linux-kernel, ashok.raj, sathyanarayanan.kuppuswamy,
	Len Brown, Rafael J. Wysocki

From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

Error Disconnect Recover (EDR) is a feature that allows ACPI firmware to
notify OSPM that a device has been disconnected due to an error condition
(ACPI v6.3, sec 5.6.6).  OSPM advertises its support for EDR on PCI devices
via _OSC (see [1], sec 4.5.1, table 4-4).  The OSPM EDR notify handler
should invalidate software state associated with disconnected devices and
may attempt to recover them.  OSPM communicates the status of recovery to
the firmware via _OST (sec 6.3.5.2).

For PCIe, firmware may use Downstream Port Containment (DPC) to support
EDR.  Per [1], sec 4.5.1, table 4-6, even if firmware has retained control
of DPC, OSPM may read/write DPC control and status registers during the EDR
notification processing window, i.e., from the time it receives an EDR
notification until it clears the DPC Trigger Status.
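The access rule above can be read as a small state model. The OS may touch DPC registers if it owns DPC natively. If firmware retains control, the OS may touch them only between receiving an EDR notification and clearing DPC Trigger Status. The C sketch below is hypothetical: struct dpc_access and its helpers are illustrative names and do not come from the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of who may access the DPC registers, and when */
struct dpc_access {
	bool os_native;		/* OS was granted DPC control via _OSC */
	bool in_edr_window;	/* EDR notification received, not yet cleared */
};

bool dpc_regs_accessible(const struct dpc_access *a)
{
	return a->os_native || a->in_edr_window;
}

/* Receiving an EDR notification opens the window */
void edr_notification_received(struct dpc_access *a)
{
	a->in_edr_window = true;
}

/* Clearing DPC Trigger Status closes the window */
void dpc_trigger_status_cleared(struct dpc_access *a)
{
	a->in_edr_window = false;
}
```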

Note that per [1], sec 4.5.1 and 4.5.2.4,

  1. If the OS supports EDR, it should advertise that to firmware by
     setting OSC_PCI_EDR_SUPPORT in _OSC Support.

  2. If the OS sets OSC_PCI_EXPRESS_DPC_CONTROL in _OSC Control to request
     control of the DPC capability, it must also set OSC_PCI_EDR_SUPPORT in
     _OSC Support.
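The two rules can be sketched as the logic negotiate_os_control() gains in this patch. EDR is advertised when it is built in, and DPC control is requested only when both DPC and EDR are built in. The bit values below are placeholders chosen for illustration, not the kernel's OSC_PCI_* definitions. The earlier _OSC negotiation steps are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for illustration only (hypothetical) */
#define SKETCH_EDR_SUPPORT	(1u << 7)
#define SKETCH_DPC_CONTROL	(1u << 7)

/* _OSC Support and Control dwords (separate words, so bits may overlap) */
struct osc_request {
	uint32_t support;
	uint32_t control;
};

/* Build the request from the two build-time options */
struct osc_request build_osc(bool dpc_enabled, bool edr_enabled)
{
	struct osc_request r = { 0, 0 };

	if (edr_enabled)
		r.support |= SKETCH_EDR_SUPPORT;
	if (dpc_enabled && edr_enabled)
		r.control |= SKETCH_DPC_CONTROL;

	/* Rule 2: requesting DPC control implies advertising EDR support */
	assert(!(r.control & SKETCH_DPC_CONTROL) ||
	       (r.support & SKETCH_EDR_SUPPORT));
	return r;
}
```

Gating the control bit on both options makes rule 2 hold by construction, so the assert is only a self-check.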

Add an EDR notify handler to attempt recovery.

[1] Downstream Port Containment Related Enhancements ECN, Jan 28, 2019,
    affecting PCI Firmware Specification, Rev. 3.2
    https://members.pcisig.com/wg/PCI-SIG/document/12888
Link: https://lore.kernel.org/r/9ae1d3285beeb81bbf85571a89b8f3d4451eae8f.1583286655.git.sathyanarayanan.kuppuswamy@linux.intel.com
Link: https://lore.kernel.org/r/246aa05acca8f0a7e6d20a65ab05af0027f60118.1583286655.git.sathyanarayanan.kuppuswamy@linux.intel.com
[bhelgaas: squash add/enable patches into one]
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Len Brown <lenb@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
---
 drivers/acpi/pci_root.c   |  15 +++
 drivers/pci/pci-acpi.c    |   2 +
 drivers/pci/pcie/Kconfig  |  10 ++
 drivers/pci/pcie/Makefile |   1 +
 drivers/pci/pcie/edr.c    | 251 ++++++++++++++++++++++++++++++++++++++
 drivers/pci/probe.c       |   1 +
 include/linux/acpi.h      |   6 +-
 include/linux/pci-acpi.h  |   8 ++
 include/linux/pci.h       |   1 +
 9 files changed, 293 insertions(+), 2 deletions(-)
 create mode 100644 drivers/pci/pcie/edr.c

diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index d1e666ef3fcc..0cb9df5462c3 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -131,6 +131,7 @@ static struct pci_osc_bit_struct pci_osc_support_bit[] = {
 	{ OSC_PCI_CLOCK_PM_SUPPORT, "ClockPM" },
 	{ OSC_PCI_SEGMENT_GROUPS_SUPPORT, "Segments" },
 	{ OSC_PCI_MSI_SUPPORT, "MSI" },
+	{ OSC_PCI_EDR_SUPPORT, "EDR" },
 	{ OSC_PCI_HPX_TYPE_3_SUPPORT, "HPX-Type3" },
 };
 
@@ -141,6 +142,7 @@ static struct pci_osc_bit_struct pci_osc_control_bit[] = {
 	{ OSC_PCI_EXPRESS_AER_CONTROL, "AER" },
 	{ OSC_PCI_EXPRESS_CAPABILITY_CONTROL, "PCIeCapability" },
 	{ OSC_PCI_EXPRESS_LTR_CONTROL, "LTR" },
+	{ OSC_PCI_EXPRESS_DPC_CONTROL, "DPC" },
 };
 
 static void decode_osc_bits(struct acpi_pci_root *root, char *msg, u32 word,
@@ -440,6 +442,8 @@ static void negotiate_os_control(struct acpi_pci_root *root, int *no_aspm,
 		support |= OSC_PCI_ASPM_SUPPORT | OSC_PCI_CLOCK_PM_SUPPORT;
 	if (pci_msi_enabled())
 		support |= OSC_PCI_MSI_SUPPORT;
+	if (IS_ENABLED(CONFIG_PCIE_EDR))
+		support |= OSC_PCI_EDR_SUPPORT;
 
 	decode_osc_support(root, "OS supports", support);
 	status = acpi_pci_osc_support(root, support);
@@ -487,6 +491,15 @@ static void negotiate_os_control(struct acpi_pci_root *root, int *no_aspm,
 			control |= OSC_PCI_EXPRESS_AER_CONTROL;
 	}
 
+	/*
+	 * Per the Downstream Port Containment Related Enhancements ECN to
+	 * the PCI Firmware Spec, r3.2, sec 4.5.1, table 4-5,
+	 * OSC_PCI_EXPRESS_DPC_CONTROL indicates the OS supports both DPC
+	 * and EDR.
+	 */
+	if (IS_ENABLED(CONFIG_PCIE_DPC) && IS_ENABLED(CONFIG_PCIE_EDR))
+		control |= OSC_PCI_EXPRESS_DPC_CONTROL;
+
 	requested = control;
 	status = acpi_pci_osc_control_set(handle, &control,
 					  OSC_PCI_EXPRESS_CAPABILITY_CONTROL);
@@ -916,6 +929,8 @@ struct pci_bus *acpi_pci_root_create(struct acpi_pci_root *root,
 		host_bridge->native_pme = 0;
 	if (!(root->osc_control_set & OSC_PCI_EXPRESS_LTR_CONTROL))
 		host_bridge->native_ltr = 0;
+	if (!(root->osc_control_set & OSC_PCI_EXPRESS_DPC_CONTROL))
+		host_bridge->native_dpc = 0;
 
 	/*
 	 * Evaluate the "PCI Boot Configuration" _DSM Function.  If it
diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index 1bf8765c41bd..fec50d4af415 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -1203,6 +1203,7 @@ static void pci_acpi_setup(struct device *dev)
 
 	pci_acpi_optimize_delay(pci_dev, adev->handle);
 	pci_acpi_set_untrusted(pci_dev);
+	pci_acpi_add_edr_notifier(pci_dev);
 
 	pci_acpi_add_pm_notifier(adev, pci_dev);
 	if (!adev->wakeup.flags.valid)
@@ -1230,6 +1231,7 @@ static void pci_acpi_cleanup(struct device *dev)
 	if (!adev)
 		return;
 
+	pci_acpi_remove_edr_notifier(pci_dev);
 	pci_acpi_remove_pm_notifier(adev);
 	if (adev->wakeup.flags.valid) {
 		acpi_device_power_remove_dependent(adev, dev);
diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
index 6e3c04b46fb1..772b1f4cb19e 100644
--- a/drivers/pci/pcie/Kconfig
+++ b/drivers/pci/pcie/Kconfig
@@ -140,3 +140,13 @@ config PCIE_BW
 	  This enables PCI Express Bandwidth Change Notification.  If
 	  you know link width or rate changes occur only to correct
 	  unreliable links, you may answer Y.
+
+config PCIE_EDR
+	bool "PCI Express Error Disconnect Recover support"
+	depends on PCIE_DPC && ACPI
+	help
+	  This option adds Error Disconnect Recover support as specified
+	  in the Downstream Port Containment Related Enhancements ECN to
+	  the PCI Firmware Specification r3.2.  Enable this if you want to
+	  support the hybrid DPC model, which uses both firmware and OS to
+	  implement DPC.
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index efb9d2e71e9e..68da9280ff11 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -13,3 +13,4 @@ obj-$(CONFIG_PCIE_PME)		+= pme.o
 obj-$(CONFIG_PCIE_DPC)		+= dpc.o
 obj-$(CONFIG_PCIE_PTM)		+= ptm.o
 obj-$(CONFIG_PCIE_BW)		+= bw_notification.o
+obj-$(CONFIG_PCIE_EDR)		+= edr.o
diff --git a/drivers/pci/pcie/edr.c b/drivers/pci/pcie/edr.c
new file mode 100644
index 000000000000..a4378d758599
--- /dev/null
+++ b/drivers/pci/pcie/edr.c
@@ -0,0 +1,251 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * PCI Error Disconnect Recover support
+ * Author: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
+ *
+ * Copyright (C) 2020 Intel Corp.
+ */
+
+#define dev_fmt(fmt) "EDR: " fmt
+
+#include <linux/pci.h>
+#include <linux/pci-acpi.h>
+
+#include "portdrv.h"
+#include "../pci.h"
+
+#define EDR_PORT_DPC_ENABLE_DSM		0x0C
+#define EDR_PORT_LOCATE_DSM		0x0D
+#define EDR_OST_SUCCESS			0x80
+#define EDR_OST_FAILED			0x81
+
+/*
+ * _DSM wrapper function to enable DPC
+ * @pdev   : PCI device structure
+ *
+ * Returns 0 on success or errno on failure.
+ */
+static int acpi_enable_dpc(struct pci_dev *pdev)
+{
+	struct acpi_device *adev = ACPI_COMPANION(&pdev->dev);
+	union acpi_object *obj, argv4, req;
+	int status;
+
+	/*
+	 * Some firmware implementations return default values for
+	 * unsupported _DSM calls, so checking acpi_evaluate_dsm() for a
+	 * NULL return is not a reliable way to tell whether a given _DSM
+	 * function is supported.  Instead, use acpi_check_dsm(), which
+	 * evaluates _DSM function 0 to query the bitmap of supported
+	 * functions.
+	 */
+	status = acpi_check_dsm(adev->handle, &pci_acpi_dsm_guid, 5,
+				1ULL << EDR_PORT_DPC_ENABLE_DSM);
+	if (!status)
+		return 0;
+
+	status = 0;
+	req.type = ACPI_TYPE_INTEGER;
+	req.integer.value = 1;
+
+	argv4.type = ACPI_TYPE_PACKAGE;
+	argv4.package.count = 1;
+	argv4.package.elements = &req;
+
+	/*
+	 * Per Downstream Port Containment Related Enhancements ECN to PCI
+	 * Firmware Specification r3.2, sec 4.6.12, EDR_PORT_DPC_ENABLE_DSM is
+	 * optional.  Return success if it's not implemented.
+	 */
+	obj = acpi_evaluate_dsm(adev->handle, &pci_acpi_dsm_guid, 5,
+				EDR_PORT_DPC_ENABLE_DSM, &argv4);
+	if (!obj)
+		return 0;
+
+	if (obj->type != ACPI_TYPE_INTEGER) {
+		pci_err(pdev, FW_BUG "Enable DPC _DSM returned non integer\n");
+		status = -EIO;
+	}
+
+	if (obj->type == ACPI_TYPE_INTEGER && obj->integer.value != 1) {
+		pci_err(pdev, "Enable DPC _DSM failed to enable DPC\n");
+		status = -EIO;
+	}
+
+	ACPI_FREE(obj);
+
+	return status;
+}
+
+/*
+ * _DSM wrapper function to locate DPC port
+ * @pdev   : Device which received EDR event
+ *
+ * Returns pci_dev or NULL.  Caller is responsible for dropping a reference
+ * on the returned pci_dev with pci_dev_put().
+ */
+static struct pci_dev *acpi_dpc_port_get(struct pci_dev *pdev)
+{
+	struct acpi_device *adev = ACPI_COMPANION(&pdev->dev);
+	union acpi_object *obj;
+	u16 port;
+	bool status;
+
+	/*
+	 * Some firmware implementations return default values for
+	 * unsupported _DSM calls, so checking acpi_evaluate_dsm() for a
+	 * NULL return is not a reliable way to tell whether a given _DSM
+	 * function is supported.  Instead, use acpi_check_dsm(), which
+	 * evaluates _DSM function 0 to query the bitmap of supported
+	 * functions.
+	 */
+	status = acpi_check_dsm(adev->handle, &pci_acpi_dsm_guid, 5,
+				1ULL << EDR_PORT_LOCATE_DSM);
+	if (!status)
+		return pci_dev_get(pdev);
+
+	obj = acpi_evaluate_dsm(adev->handle, &pci_acpi_dsm_guid, 5,
+				EDR_PORT_LOCATE_DSM, NULL);
+	if (!obj)
+		return pci_dev_get(pdev);
+
+	if (obj->type != ACPI_TYPE_INTEGER) {
+		ACPI_FREE(obj);
+		pci_err(pdev, FW_BUG "Locate Port _DSM returned non integer\n");
+		return NULL;
+	}
+
+	/*
+	 * Firmware returns DPC port BDF details in following format:
+	 *	15:8 = bus
+	 *	 7:3 = device
+	 *	 2:0 = function
+	 */
+	port = obj->integer.value;
+
+	ACPI_FREE(obj);
+
+	return pci_get_domain_bus_and_slot(pci_domain_nr(pdev->bus),
+					   PCI_BUS_NUM(port), port & 0xff);
+}
+
+/*
+ * _OST wrapper function to let firmware know the status of EDR event
+ * @pdev   : Device used to send _OST
+ * @edev   : Device which experienced EDR event
+ * @status : Status of EDR event
+ */
+static int acpi_send_edr_status(struct pci_dev *pdev, struct pci_dev *edev,
+				u16 status)
+{
+	struct acpi_device *adev = ACPI_COMPANION(&pdev->dev);
+	u32 ost_status;
+
+	pci_dbg(pdev, "Status for %s: %#x\n", pci_name(edev), status);
+
+	ost_status = PCI_DEVID(edev->bus->number, edev->devfn) << 16;
+	ost_status |= status;
+
+	status = acpi_evaluate_ost(adev->handle, ACPI_NOTIFY_DISCONNECT_RECOVER,
+				   ost_status, NULL);
+	if (ACPI_FAILURE(status))
+		return -EINVAL;
+
+	return 0;
+}
+
+static void edr_handle_event(acpi_handle handle, u32 event, void *data)
+{
+	struct pci_dev *pdev = data, *edev;
+	pci_ers_result_t estate = PCI_ERS_RESULT_DISCONNECT;
+	u16 status;
+
+	pci_info(pdev, "ACPI event %#x received\n", event);
+
+	if (event != ACPI_NOTIFY_DISCONNECT_RECOVER)
+		return;
+
+	/* Locate the port which issued EDR event */
+	edev = acpi_dpc_port_get(pdev);
+	if (!edev) {
+		pci_err(pdev, "Firmware failed to locate DPC port\n");
+		return;
+	}
+
+	pci_dbg(pdev, "Reported EDR dev: %s\n", pci_name(edev));
+
+	/* If port does not support DPC, just send the OST */
+	if (!edev->dpc_cap) {
+		pci_err(edev, FW_BUG "This device doesn't support DPC\n");
+		goto send_ost;
+	}
+
+	/* Check if there is a valid DPC trigger */
+	pci_read_config_word(edev, edev->dpc_cap + PCI_EXP_DPC_STATUS, &status);
+	if (!(status & PCI_EXP_DPC_STATUS_TRIGGER)) {
+		pci_err(edev, "Invalid DPC trigger %#010x\n", status);
+		goto send_ost;
+	}
+
+	dpc_process_error(edev);
+	pci_aer_raw_clear_status(edev);
+
+	/*
+	 * Irrespective of whether the DPC event is triggered by ERR_FATAL
+	 * or ERR_NONFATAL, since the link is already down, use the FATAL
+	 * error recovery path for both cases.
+	 */
+	estate = pcie_do_recovery(edev, pci_channel_io_frozen, dpc_reset_link);
+
+send_ost:
+
+	/*
+	 * If recovery is successful, send _OST(0xF, BDF << 16 | 0x80)
+	 * to firmware. If not successful, send _OST(0xF, BDF << 16 | 0x81).
+	 */
+	if (estate == PCI_ERS_RESULT_RECOVERED) {
+		pci_dbg(edev, "DPC port successfully recovered\n");
+		acpi_send_edr_status(pdev, edev, EDR_OST_SUCCESS);
+	} else {
+		pci_dbg(edev, "DPC port recovery failed\n");
+		acpi_send_edr_status(pdev, edev, EDR_OST_FAILED);
+	}
+
+	pci_dev_put(edev);
+}
+
+void pci_acpi_add_edr_notifier(struct pci_dev *pdev)
+{
+	struct acpi_device *adev = ACPI_COMPANION(&pdev->dev);
+	acpi_status status;
+
+	if (!adev) {
+		pci_dbg(pdev, "No valid ACPI node, skipping EDR init\n");
+		return;
+	}
+
+	status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
+					     edr_handle_event, pdev);
+	if (ACPI_FAILURE(status)) {
+		pci_err(pdev, "Failed to install notify handler\n");
+		return;
+	}
+
+	if (acpi_enable_dpc(pdev))
+		acpi_remove_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
+					   edr_handle_event);
+	else
+		pci_dbg(pdev, "Notify handler installed\n");
+}
+
+void pci_acpi_remove_edr_notifier(struct pci_dev *pdev)
+{
+	struct acpi_device *adev = ACPI_COMPANION(&pdev->dev);
+
+	if (!adev)
+		return;
+
+	acpi_remove_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
+				   edr_handle_event);
+	pci_dbg(pdev, "Notify handler removed\n");
+}
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index c6f91f886818..f67c007edcae 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -598,6 +598,7 @@ static void pci_init_host_bridge(struct pci_host_bridge *bridge)
 	bridge->native_shpc_hotplug = 1;
 	bridge->native_pme = 1;
 	bridge->native_ltr = 1;
+	bridge->native_dpc = 1;
 }
 
 struct pci_host_bridge *pci_alloc_host_bridge(size_t priv)
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 0f24d701fbdc..b7d3caf6f205 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -530,8 +530,9 @@ extern bool osc_pc_lpi_support_confirmed;
 #define OSC_PCI_CLOCK_PM_SUPPORT		0x00000004
 #define OSC_PCI_SEGMENT_GROUPS_SUPPORT		0x00000008
 #define OSC_PCI_MSI_SUPPORT			0x00000010
+#define OSC_PCI_EDR_SUPPORT			0x00000080
 #define OSC_PCI_HPX_TYPE_3_SUPPORT		0x00000100
-#define OSC_PCI_SUPPORT_MASKS			0x0000011f
+#define OSC_PCI_SUPPORT_MASKS			0x0000019f
 
 /* PCI Host Bridge _OSC: Capabilities DWORD 3: Control Field */
 #define OSC_PCI_EXPRESS_NATIVE_HP_CONTROL	0x00000001
@@ -540,7 +541,8 @@ extern bool osc_pc_lpi_support_confirmed;
 #define OSC_PCI_EXPRESS_AER_CONTROL		0x00000008
 #define OSC_PCI_EXPRESS_CAPABILITY_CONTROL	0x00000010
 #define OSC_PCI_EXPRESS_LTR_CONTROL		0x00000020
-#define OSC_PCI_CONTROL_MASKS			0x0000003f
+#define OSC_PCI_EXPRESS_DPC_CONTROL		0x00000080
+#define OSC_PCI_CONTROL_MASKS			0x000000bf
 
 #define ACPI_GSB_ACCESS_ATTRIB_QUICK		0x00000002
 #define ACPI_GSB_ACCESS_ATTRIB_SEND_RCV         0x00000004
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 62b7fdcc661c..2d155bfb8fbf 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -112,6 +112,14 @@ extern const guid_t pci_acpi_dsm_guid;
 #define RESET_DELAY_DSM			0x08
 #define FUNCTION_DELAY_DSM		0x09
 
+#ifdef CONFIG_PCIE_EDR
+void pci_acpi_add_edr_notifier(struct pci_dev *pdev);
+void pci_acpi_remove_edr_notifier(struct pci_dev *pdev);
+#else
+static inline void pci_acpi_add_edr_notifier(struct pci_dev *pdev) { }
+static inline void pci_acpi_remove_edr_notifier(struct pci_dev *pdev) { }
+#endif /* CONFIG_PCIE_EDR */
+
 #else	/* CONFIG_ACPI */
 static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
 static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
diff --git a/include/linux/pci.h b/include/linux/pci.h
index a0b7e7a53741..7ed7c088c952 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -515,6 +515,7 @@ struct pci_host_bridge {
 	unsigned int	native_shpc_hotplug:1;	/* OS may use SHPC hotplug */
 	unsigned int	native_pme:1;		/* OS may use PCIe PME */
 	unsigned int	native_ltr:1;		/* OS may use PCIe LTR */
+	unsigned int	native_dpc:1;		/* OS may use PCIe DPC */
 	unsigned int	preserve_config:1;	/* Preserve FW resource setup */
 
 	/* Resource alignment requirements */
-- 
2.17.1



* [PATCH v18 11/11] PCI/AER: Rationalize error status register clearing
  2020-03-24  0:25 [PATCH v18 00/11] Add Error Disconnect Recover (EDR) support sathyanarayanan.kuppuswamy
                   ` (9 preceding siblings ...)
  2020-03-24  0:26 ` [PATCH v18 10/11] PCI/DPC: Add Error Disconnect Recover (EDR) support sathyanarayanan.kuppuswamy
@ 2020-03-24  0:26 ` sathyanarayanan.kuppuswamy
  2020-03-31 15:28 ` [PATCH v18 00/11] Add Error Disconnect Recover (EDR) support Bjorn Helgaas
  11 siblings, 0 replies; 29+ messages in thread
From: sathyanarayanan.kuppuswamy @ 2020-03-24  0:26 UTC (permalink / raw)
  To: bhelgaas; +Cc: linux-pci, linux-kernel, ashok.raj, sathyanarayanan.kuppuswamy

From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

The AER interfaces to clear error status registers were a confusing mess:

  - pci_cleanup_aer_uncorrect_error_status() cleared non-fatal errors
    from the Uncorrectable Error Status register.

  - pci_aer_clear_fatal_status() cleared fatal errors from the
    Uncorrectable Error Status register.

  - pci_cleanup_aer_error_status_regs() cleared the Root Error Status
    register (for Root Ports), the Uncorrectable Error Status register,
    and the Correctable Error Status register.

Rename them to make them consistent:

  From                                     To
  ---------------------------------------- -------------------------------
  pci_cleanup_aer_uncorrect_error_status() pci_aer_clear_nonfatal_status()
  pci_aer_clear_fatal_status()             pci_aer_clear_fatal_status()
  pci_cleanup_aer_error_status_regs()      pci_aer_clear_status()

Since pci_cleanup_aer_error_status_regs() (renamed to
pci_aer_clear_status()) is only used within drivers/pci/, move the
declaration from <linux/aer.h> to drivers/pci/pci.h.

[bhelgaas: commit log, add renames]
Link: https://lore.kernel.org/r/ca897f459ccb6da6bad81e3893d8daf9e865fac1.1583286655.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 Documentation/PCI/pcieaer-howto.rst       | 4 ++--
 drivers/net/ethernet/intel/ice/ice_main.c | 4 ++--
 drivers/ntb/hw/idt/ntb_hw_idt.c           | 4 ++--
 drivers/pci/pci.c                         | 2 +-
 drivers/pci/pci.h                         | 2 ++
 drivers/pci/pcie/aer.c                    | 8 ++++----
 drivers/pci/pcie/dpc.c                    | 2 +-
 drivers/pci/pcie/err.c                    | 2 +-
 drivers/scsi/lpfc/lpfc_attr.c             | 4 ++--
 include/linux/aer.h                       | 9 ++-------
 10 files changed, 19 insertions(+), 22 deletions(-)

diff --git a/Documentation/PCI/pcieaer-howto.rst b/Documentation/PCI/pcieaer-howto.rst
index afbd8c1c321d..0b36b9ebfa4b 100644
--- a/Documentation/PCI/pcieaer-howto.rst
+++ b/Documentation/PCI/pcieaer-howto.rst
@@ -232,9 +232,9 @@ messages to root port when an error is detected.
 
 ::
 
-  int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);`
+  int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
 
-pci_cleanup_aer_uncorrect_error_status cleanups the uncorrectable
+pci_aer_clear_nonfatal_status clears non-fatal errors in the uncorrectable
 error status register.
 
 Frequent Asked Questions
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 5ae671609f98..effca3fa92e0 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3495,10 +3495,10 @@ static pci_ers_result_t ice_pci_err_slot_reset(struct pci_dev *pdev)
 			result = PCI_ERS_RESULT_DISCONNECT;
 	}
 
-	err = pci_cleanup_aer_uncorrect_error_status(pdev);
+	err = pci_aer_clear_nonfatal_status(pdev);
 	if (err)
 		dev_dbg(&pdev->dev,
-			"pci_cleanup_aer_uncorrect_error_status failed, error %d\n",
+			"pci_aer_clear_nonfatal_status() failed, error %d\n",
 			err);
 		/* non-fatal, continue */
 
diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index dcf234680535..edae52384b8a 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -2674,8 +2674,8 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
 	ret = pci_enable_pcie_error_reporting(pdev);
 	if (ret != 0)
 		dev_warn(&pdev->dev, "PCIe AER capability disabled\n");
-	else /* Cleanup uncorrectable error status before getting to init */
-		pci_cleanup_aer_uncorrect_error_status(pdev);
+	else /* Clean up non-fatal error status before getting to init */
+		pci_aer_clear_nonfatal_status(pdev);
 
 	/* First enable the PCI device */
 	ret = pcim_enable_device(pdev);
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e724341cefff..735d59b27cfd 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1503,7 +1503,7 @@ void pci_restore_state(struct pci_dev *dev)
 	pci_restore_rebar_state(dev);
 	pci_restore_dpc_state(dev);
 
-	pci_cleanup_aer_error_status_regs(dev);
+	pci_aer_clear_status(dev);
 	pci_restore_aer_state(dev);
 
 	pci_restore_config_space(dev);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 25265bf80a83..bd46f23e3db1 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -656,6 +656,7 @@ void pci_aer_exit(struct pci_dev *dev);
 extern const struct attribute_group aer_stats_attr_group;
 void pci_aer_clear_fatal_status(struct pci_dev *dev);
 void pci_aer_clear_device_status(struct pci_dev *dev);
+int pci_aer_clear_status(struct pci_dev *dev);
 int pci_aer_raw_clear_status(struct pci_dev *dev);
 #else
 static inline void pci_no_aer(void) { }
@@ -663,6 +664,7 @@ static inline void pci_aer_init(struct pci_dev *d) { }
 static inline void pci_aer_exit(struct pci_dev *d) { }
 static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
 static inline void pci_aer_clear_device_status(struct pci_dev *dev) { }
+static inline int pci_aer_clear_status(struct pci_dev *dev) { return -EINVAL; }
 static inline int pci_aer_raw_clear_status(struct pci_dev *dev) { return -EINVAL; }
 #endif
 
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index bd9f122165e0..f4274d301235 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -377,7 +377,7 @@ void pci_aer_clear_device_status(struct pci_dev *dev)
 	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
 }
 
-int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev)
+int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
 {
 	int pos;
 	u32 status, sev;
@@ -398,7 +398,7 @@ int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev)
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(pci_cleanup_aer_uncorrect_error_status);
+EXPORT_SYMBOL_GPL(pci_aer_clear_nonfatal_status);
 
 void pci_aer_clear_fatal_status(struct pci_dev *dev)
 {
@@ -457,7 +457,7 @@ int pci_aer_raw_clear_status(struct pci_dev *dev)
 	return 0;
 }
 
-int pci_cleanup_aer_error_status_regs(struct pci_dev *dev)
+int pci_aer_clear_status(struct pci_dev *dev)
 {
 	if (pcie_aer_get_firmware_first(dev))
 		return -EIO;
@@ -530,7 +530,7 @@ void pci_aer_init(struct pci_dev *dev)
 	n = pcie_cap_has_rtctl(dev) ? 5 : 4;
 	pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_ERR, sizeof(u32) * n);
 
-	pci_cleanup_aer_error_status_regs(dev);
+	pci_aer_clear_status(dev);
 }
 
 void pci_aer_exit(struct pci_dev *dev)
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index e9087f5f32ec..338f72f27043 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -229,7 +229,7 @@ void dpc_process_error(struct pci_dev *pdev)
 		 dpc_get_aer_uncorrect_severity(pdev, &info) &&
 		 aer_get_device_error_info(pdev, &info)) {
 		aer_print_error(pdev, &info);
-		pci_cleanup_aer_uncorrect_error_status(pdev);
+		pci_aer_clear_nonfatal_status(pdev);
 		pci_aer_clear_fatal_status(pdev);
 	}
 }
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 7881de20af29..bb25006fb721 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -199,7 +199,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 	pci_walk_bus(bus, report_resume, &status);
 
 	pci_aer_clear_device_status(dev);
-	pci_cleanup_aer_uncorrect_error_status(dev);
+	pci_aer_clear_nonfatal_status(dev);
 	pci_info(dev, "device recovery successful\n");
 	return status;
 
diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index 46f56f30f77e..847300de7ff1 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -4783,7 +4783,7 @@ static DEVICE_ATTR_RW(lpfc_aer_support);
  * Description:
  * If the @buf contains 1 and the device currently has the AER support
  * enabled, then invokes the kernel AER helper routine
- * pci_cleanup_aer_uncorrect_error_status to clean up the uncorrectable
+ * pci_aer_clear_nonfatal_status() to clean up the uncorrectable
  * error status register.
  *
  * Notes:
@@ -4809,7 +4809,7 @@ lpfc_aer_cleanup_state(struct device *dev, struct device_attribute *attr,
 		return -EINVAL;
 
 	if (phba->hba_flag & HBA_AER_ENABLED)
-		rc = pci_cleanup_aer_uncorrect_error_status(phba->pcidev);
+		rc = pci_aer_clear_nonfatal_status(phba->pcidev);
 
 	if (rc == 0)
 		return strlen(buf);
diff --git a/include/linux/aer.h b/include/linux/aer.h
index fa19e01f418a..97f64ba1b34a 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -44,8 +44,7 @@ struct aer_capability_regs {
 /* PCIe port driver needs this function to enable AER */
 int pci_enable_pcie_error_reporting(struct pci_dev *dev);
 int pci_disable_pcie_error_reporting(struct pci_dev *dev);
-int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);
-int pci_cleanup_aer_error_status_regs(struct pci_dev *dev);
+int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
 void pci_save_aer_state(struct pci_dev *dev);
 void pci_restore_aer_state(struct pci_dev *dev);
 #else
@@ -57,11 +56,7 @@ static inline int pci_disable_pcie_error_reporting(struct pci_dev *dev)
 {
 	return -EINVAL;
 }
-static inline int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev)
-{
-	return -EINVAL;
-}
-static inline int pci_cleanup_aer_error_status_regs(struct pci_dev *dev)
+static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
 {
 	return -EINVAL;
 }
-- 
2.17.1



* Re: [PATCH v18 10/11] PCI/DPC: Add Error Disconnect Recover (EDR) support
  2020-03-24  0:26 ` [PATCH v18 10/11] PCI/DPC: Add Error Disconnect Recover (EDR) support sathyanarayanan.kuppuswamy
@ 2020-03-24 21:37   ` Bjorn Helgaas
  2020-03-25  1:00     ` Kuppuswamy, Sathyanarayanan
  2024-04-11 18:07   ` Bjorn Helgaas
  1 sibling, 1 reply; 29+ messages in thread
From: Bjorn Helgaas @ 2020-03-24 21:37 UTC (permalink / raw)
  To: sathyanarayanan.kuppuswamy
  Cc: linux-pci, linux-kernel, ashok.raj, Len Brown, Rafael J. Wysocki

On Mon, Mar 23, 2020 at 05:26:07PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> 
> Error Disconnect Recover (EDR) is a feature that allows ACPI firmware to
> notify OSPM that a device has been disconnected due to an error condition
> (ACPI v6.3, sec 5.6.6).  OSPM advertises its support for EDR on PCI devices
> via _OSC (see [1], sec 4.5.1, table 4-4).  The OSPM EDR notify handler
> should invalidate software state associated with disconnected devices and
> may attempt to recover them.  OSPM communicates the status of recovery to
> the firmware via _OST (sec 6.3.5.2).
> 
> For PCIe, firmware may use Downstream Port Containment (DPC) to support
> EDR.  Per [1], sec 4.5.1, table 4-6, even if firmware has retained control
> of DPC, OSPM may read/write DPC control and status registers during the EDR
> notification processing window, i.e., from the time it receives an EDR
> notification until it clears the DPC Trigger Status.
> 
> Note that per [1], sec 4.5.1 and 4.5.2.4,
> 
>   1. If the OS supports EDR, it should advertise that to firmware by
>      setting OSC_PCI_EDR_SUPPORT in _OSC Support.
> 
>   2. If the OS sets OSC_PCI_EXPRESS_DPC_CONTROL in _OSC Control to request
>      control of the DPC capability, it must also set OSC_PCI_EDR_SUPPORT in
>      _OSC Support.
> 
> Add an EDR notify handler to attempt recovery.
> 
> [1] Downstream Port Containment Related Enhancements ECN, Jan 28, 2019,
>     affecting PCI Firmware Specification, Rev. 3.2
>     https://members.pcisig.com/wg/PCI-SIG/document/12888
> Link: https://lore.kernel.org/r/9ae1d3285beeb81bbf85571a89b8f3d4451eae8f.1583286655.git.sathyanarayanan.kuppuswamy@linux.intel.com
> Link: https://lore.kernel.org/r/246aa05acca8f0a7e6d20a65ab05af0027f60118.1583286655.git.sathyanarayanan.kuppuswamy@linux.intel.com
> [bhelgaas: squash add/enable patches into one]
> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Len Brown <lenb@kernel.org>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>

> +static int acpi_enable_dpc(struct pci_dev *pdev)
> +{
> +	struct acpi_device *adev = ACPI_COMPANION(&pdev->dev);
> +	union acpi_object *obj, argv4, req;
> +	int status;
> +
> +	/*
> +	 * Some firmware implementations will return default values for
> +	 * unsupported _DSM calls. So checking acpi_evaluate_dsm() return
> +	 * value for NULL condition is not a complete method for finding
> +	 * whether given _DSM function is supported or not. So use
> +	 * explicit func 0 call to find whether given _DSM function is
> +	 * supported or not.
> +	 */
> +	status = acpi_check_dsm(adev->handle, &pci_acpi_dsm_guid, 5,
> +				1ULL << EDR_PORT_DPC_ENABLE_DSM);

This is really ugly.  What's the story on this firmware?  It sounds
defective to me.

Or is everybody that uses _DSM supposed to check before evaluating it?
E.g.,

  if (!acpi_check_dsm(...))
    return -EINVAL;

  obj = acpi_evaluate_dsm(...);

If everybody is supposed to do this, it seems like the check part
should be moved into acpi_evaluate_dsm().

> +	if (!status)
> +		return 0;
> +
> +	status = 0;
> +	req.type = ACPI_TYPE_INTEGER;
> +	req.integer.value = 1;
> +
> +	argv4.type = ACPI_TYPE_PACKAGE;
> +	argv4.package.count = 1;
> +	argv4.package.elements = &req;
> +
> +	/*
> +	 * Per Downstream Port Containment Related Enhancements ECN to PCI
> +	 * Firmware Specification r3.2, sec 4.6.12, EDR_PORT_DPC_ENABLE_DSM is
> +	 * optional.  Return success if it's not implemented.
> +	 */
> +	obj = acpi_evaluate_dsm(adev->handle, &pci_acpi_dsm_guid, 5,
> +				EDR_PORT_DPC_ENABLE_DSM, &argv4);
> +	if (!obj)
> +		return 0;


* Re: [PATCH v18 03/11] PCI/DPC: Fix DPC recovery issue in non hotplug case
  2020-03-24  0:26 ` [PATCH v18 03/11] PCI/DPC: Fix DPC recovery issue in non hotplug case sathyanarayanan.kuppuswamy
@ 2020-03-24 23:49   ` Bjorn Helgaas
  2020-03-25  1:17     ` Kuppuswamy, Sathyanarayanan
  0 siblings, 1 reply; 29+ messages in thread
From: Bjorn Helgaas @ 2020-03-24 23:49 UTC (permalink / raw)
  To: sathyanarayanan.kuppuswamy; +Cc: linux-pci, linux-kernel, ashok.raj

On Mon, Mar 23, 2020 at 05:26:00PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> 
> If hotplug is supported, then during a DPC event the hotplug driver
> removes the affected devices and detaches their drivers on the DLLSC
> link-down event, and re-enumerates them once DPC recovery is complete
> and the link comes back up (on the DLLSC link-up event).  Hence we
> don't depend on the .mmio_enabled or .slot_reset callbacks in the
> error recovery handler to restore the device.
> 
> But if hotplug is not supported/enabled, then we need to let the
> error recovery handler attempt to recover the devices using a slot
> reset.
> 
> So if hotplug is not supported, return PCI_ERS_RESULT_NEED_RESET
> instead of PCI_ERS_RESULT_RECOVERED.
> 
> Also modify the way the error recovery handler processes the
> recovery value.
> 
> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> ---
>  drivers/pci/pcie/dpc.c | 8 ++++++++
>  drivers/pci/pcie/err.c | 5 +++--
>  2 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index e06f42f58d3d..0e356ed0d73f 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -13,6 +13,7 @@
>  #include <linux/interrupt.h>
>  #include <linux/init.h>
>  #include <linux/pci.h>
> +#include <linux/pci_hotplug.h>
>  
>  #include "portdrv.h"
>  #include "../pci.h"
> @@ -144,6 +145,13 @@ static pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
>  	if (!pcie_wait_for_link(pdev, true))
>  		return PCI_ERS_RESULT_DISCONNECT;
>  
> +	/*
> +	 * If hotplug is not supported/enabled then let the device
> +	 * recover using slot reset.
> +	 */
> +	if (!hotplug_is_native(pdev))
> +		return PCI_ERS_RESULT_NEED_RESET;

I don't understand why hotplug is relevant here.  This path
(dpc_reset_link()) is only used for downstream ports that support DPC.
DPC has already disabled the link, which resets everything below the
port, regardless of whether the port supports hotplug.

I do see that PCI_ERS_RESULT_NEED_RESET seems to promise a lot more
than it actually *does*.  The doc (pci-error-recovery.rst) says
.error_detected() can return PCI_ERS_RESULT_NEED_RESET to *request* a
slot reset.  But if that happens, pcie_do_recovery() doesn't do a
reset at all.  It calls the driver's .slot_reset() method, which tells
the driver "we've reset your device; please re-initialize the
hardware."

I guess this abuses PCI_ERS_RESULT_NEED_RESET by taking advantage of
that implementation deficiency in pcie_do_recovery(): we know the
downstream devices have already been reset via DPC, and returning
PCI_ERS_RESULT_NEED_RESET means we'll call .slot_reset() to tell the
driver about that reset.

I can see how this achieves the desired result, but if/when we fix
pcie_do_recovery() to actually *do* the reset promised by
PCI_ERS_RESULT_NEED_RESET, we will be doing *two* resets: the first
via DPC and a second via whatever slot reset mechanism
pcie_do_recovery() would use.

So I guess the real issue (as you allude to in the commit log) is that
we rely on hotplug to unbind/rebind the driver, and without hotplug we
need to at least tell the driver the device was reset.

I'll try to expand the comment here so it reminds me what's going on
when we have to look at this again :)  Let me know if I'm on the right
track.

>  	return PCI_ERS_RESULT_RECOVERED;
>  }
>  
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 1ac57e9e1e71..6e52591a4722 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -178,7 +178,8 @@ static pci_ers_result_t reset_link(struct pci_dev *dev, u32 service)
>  		return PCI_ERS_RESULT_DISCONNECT;
>  	}
>  
> -	if (status != PCI_ERS_RESULT_RECOVERED) {
> +	if ((status != PCI_ERS_RESULT_RECOVERED) &&
> +	    (status != PCI_ERS_RESULT_NEED_RESET)) {
>  		pci_printk(KERN_DEBUG, dev, "link reset at upstream device %s failed\n",
>  			pci_name(dev));
>  		return PCI_ERS_RESULT_DISCONNECT;
> @@ -206,7 +207,7 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
>  	if (state == pci_channel_io_frozen) {
>  		pci_walk_bus(bus, report_frozen_detected, &status);
>  		status = reset_link(dev, service);
> -		if (status != PCI_ERS_RESULT_RECOVERED)
> +		if (status == PCI_ERS_RESULT_DISCONNECT)
>  			goto failed;
>  	} else {
>  		pci_walk_bus(bus, report_normal_detected, &status);
> -- 
> 2.17.1
> 


* Re: [PATCH v18 10/11] PCI/DPC: Add Error Disconnect Recover (EDR) support
  2020-03-24 21:37   ` Bjorn Helgaas
@ 2020-03-25  1:00     ` Kuppuswamy, Sathyanarayanan
  2020-03-26 22:36       ` Bjorn Helgaas
  0 siblings, 1 reply; 29+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2020-03-25  1:00 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-kernel, ashok.raj, Len Brown, Rafael J. Wysocki

Hi Bjorn,

On 3/24/20 2:37 PM, Bjorn Helgaas wrote:
> This is really ugly.  What's the story on this firmware?  It sounds
> defective to me.
I don't think there is a defined standard for this. I have checked a few
_DSM implementations. Some of them return a default value and some
don't. But at least on the test hardware I use, we need this check.

> 
> Or is everybody that uses _DSM supposed to check before evaluating it?
I think it's safer to do this check.
> E.g.,
> 
>    if (!acpi_check_dsm(...))
>      return -EINVAL;
> 
>    obj = acpi_evaluate_dsm(...);
> 
> If everybody is supposed to do this, it seems like the check part
> should be moved into acpi_evaluate_dsm().



* Re: [PATCH v18 03/11] PCI/DPC: Fix DPC recovery issue in non hotplug case
  2020-03-24 23:49   ` Bjorn Helgaas
@ 2020-03-25  1:17     ` Kuppuswamy, Sathyanarayanan
  2020-03-28 17:10       ` Bjorn Helgaas
  0 siblings, 1 reply; 29+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2020-03-25  1:17 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linux-kernel, ashok.raj

Hi Bjorn,

On 3/24/20 4:49 PM, Bjorn Helgaas wrote:
> I don't understand why hotplug is relevant here.  This path
> (dpc_reset_link()) is only used for downstream ports that support DPC.
> DPC has already disabled the link, which resets everything below the
> port, regardless of whether the port supports hotplug.
> 
> I do see that PCI_ERS_RESULT_NEED_RESET seems to promise a lot more
> than it actually *does*.  The doc (pci-error-recovery.rst) says
> .error_detected() can return PCI_ERS_RESULT_NEED_RESET to *request* a
> slot reset.  But if that happens, pcie_do_recovery() doesn't do a
> reset at all.  It calls the driver's .slot_reset() method, which tells
> the driver "we've reset your device; please re-initialize the
> hardware."
> 
> I guess this abuses PCI_ERS_RESULT_NEED_RESET by taking advantage of
> that implementation deficiency in pcie_do_recovery(): we know the
> downstream devices have already been reset via DPC, and returning
> PCI_ERS_RESULT_NEED_RESET means we'll call .slot_reset() to tell the
> driver about that reset.
> 
> I can see how this achieves the desired result, but if/when we fix
> pcie_do_recovery() to actually *do* the reset promised by
> PCI_ERS_RESULT_NEED_RESET, we will be doing *two* resets: the first
> via DPC and a second via whatever slot reset mechanism
> pcie_do_recovery() would use.
When we fix this issue, if we make sure the reset logic is
implemented before we call the .reset_link callback, we should be
able to avoid resetting the device twice. Before we call the DPC
.reset_link callback, the device link will not be up, and hence we
should not be able to reset it.
> 
> So I guess the real issue (as you allude to in the commit log) is that
> we rely on hotplug to unbind/rebind the driver, and without hotplug we
> need to at least tell the driver the device was reset.
Agree
> 
> I'll try to expand the comment here so it reminds me what's going on
> when we have to look at this again:)   Let me know if I'm on the right
> track.
Yes, your understanding is correct.
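To make the PCI_ERS_RESULT_NEED_RESET behavior discussed above concrete, here is a small user-space model of the pcie_do_recovery() flow as modified in this series (reset_link() may return either RECOVERED or NEED_RESET). It is a simplified sketch, not kernel code: the enum only stands in for pci_ers_result_t, all drivers are assumed to give one combined .error_detected() answer, and driver results from .mmio_enabled()/.slot_reset() are not merged back into the status.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for a few pci_ers_result_t values (names only). */
enum ers { ERS_CAN_RECOVER, ERS_NEED_RESET, ERS_DISCONNECT, ERS_RECOVERED };

/* Which driver callbacks the model would have broadcast. */
struct trace {
	bool mmio_enabled;
	bool slot_reset;
	bool resume;
	bool failed;
};

/*
 * Simplified model of pcie_do_recovery(). 'vote' is the combined
 * .error_detected() answer; 'link_result' is what reset_link()
 * returns for a frozen error.
 */
static struct trace model_do_recovery(bool frozen, enum ers vote,
				      enum ers link_result)
{
	struct trace t = { false, false, false, false };
	enum ers status = vote;

	if (frozen) {
		status = link_result;	/* reset_link() result replaces the vote */
		if (status != ERS_RECOVERED && status != ERS_NEED_RESET) {
			t.failed = true;	/* "link reset ... failed" path */
			return t;
		}
	}

	if (status == ERS_CAN_RECOVER) {
		status = ERS_RECOVERED;
		t.mmio_enabled = true;
	}

	if (status == ERS_NEED_RESET) {
		/* No reset is performed here; drivers are only told about one. */
		status = ERS_RECOVERED;
		t.slot_reset = true;
	}

	if (status != ERS_RECOVERED) {
		t.failed = true;
		return t;
	}
	t.resume = true;
	return t;
}
```

Under these assumptions, a frozen error whose reset_link() returns RECOVERED goes straight to .resume(), while returning NEED_RESET produces a .slot_reset() notification even though no additional reset happens, which is the behavior the DPC patch relies on.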

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v18 10/11] PCI/DPC: Add Error Disconnect Recover (EDR) support
  2020-03-25  1:00     ` Kuppuswamy, Sathyanarayanan
@ 2020-03-26 22:36       ` Bjorn Helgaas
  0 siblings, 0 replies; 29+ messages in thread
From: Bjorn Helgaas @ 2020-03-26 22:36 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan
  Cc: linux-pci, linux-kernel, ashok.raj, Len Brown, Rafael J. Wysocki

On Tue, Mar 24, 2020 at 06:00:31PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> Hi Bjorn,
> 
> On 3/24/20 2:37 PM, Bjorn Helgaas wrote:
> > This is really ugly.  What's the story on this firmware?  It sounds
> > defective to me.
>
> I think there is no defined standard for this. I have checked a few
> _DSM implementations. Some of them return a default value and some
> don't. But at least on the test hardware I use, we need this check.

I agree that I don't see anything in the ACPI spec v6.3 about what
should happen if we supply a Function Index that isn't supported.
That looks like a hole in the spec.

> > Or is everybody that uses _DSM supposed to check before evaluating it?
>
> I think it's safer to do this check.
>
> > E.g.,
> > 
> >    if (!acpi_check_dsm(...))
> >      return -EINVAL;
> > 
> >    obj = acpi_evaluate_dsm(...);
> > 
> > If everybody is supposed to do this, it seems like the check part
> > should be moved into acpi_evaluate_dsm().

So my question, and I guess this is really for Rafael, is that since
it seems like *everybody* needs to use acpi_check_dsm() in order to
use acpi_evaluate_dsm() safely, why don't we move the check *into*
acpi_evaluate_dsm()?

It's just error prone if we expect everybody to call both interfaces.
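The "check inside evaluate" idea can be illustrated with a user-space model. Everything below is hypothetical: fake_dsm stands in for firmware (including the quirk of answering unsupported indices with a default value), and model_check_dsm() only mirrors the acpi_check_dsm() rule that Function 0 returns a bitmap in which bit 0 and every requested bit must be set. It is not the kernel ACPI API, which works with handles, GUIDs and union acpi_object results.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of a _DSM; not the kernel ACPI API.
 * Function 0 returns a bitmap of the supported function indices.
 */
struct fake_dsm {
	uint64_t supported;		/* bitmap returned by Function 0 */
	bool default_on_unknown;	/* quirk: answers unsupported indices too */
};

/* Evaluate one function; returns true if the "firmware" produced a value. */
static bool fake_evaluate(const struct fake_dsm *fw, unsigned int func,
			  uint64_t *out)
{
	if (func >= 64)
		return false;
	if (func == 0) {
		*out = fw->supported;
		return true;
	}
	if (fw->supported & (1ULL << func)) {
		*out = 0x1000 + func;	/* arbitrary "real" answer */
		return true;
	}
	if (fw->default_on_unknown) {
		*out = 0;		/* default value a caller may misread */
		return true;
	}
	return false;
}

/* Same rule as acpi_check_dsm(): bit 0 set and all requested bits set. */
static bool model_check_dsm(const struct fake_dsm *fw, uint64_t funcs)
{
	uint64_t mask;

	if (funcs == 0 || !fake_evaluate(fw, 0, &mask))
		return false;
	return (mask & 1) && (mask & funcs) == funcs;
}

/* Hypothetical combined helper: the check happens inside the evaluate call. */
static bool model_evaluate_dsm(const struct fake_dsm *fw, unsigned int func,
			       uint64_t *out)
{
	if (func >= 64)
		return false;
	if (func != 0 && !model_check_dsm(fw, 1ULL << func))
		return false;
	return fake_evaluate(fw, func, out);
}
```

With the check folded in, a caller cannot forget it, and firmware that returns a default value for an unsupported index no longer looks like a successful evaluation. The cost in this model is an extra Function 0 evaluation per call unless the mask is cached.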

Bjorn

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v18 03/11] PCI/DPC: Fix DPC recovery issue in non hotplug case
  2020-03-25  1:17     ` Kuppuswamy, Sathyanarayanan
@ 2020-03-28 17:10       ` Bjorn Helgaas
  2020-03-28 22:04         ` Kuppuswamy, Sathyanarayanan
  0 siblings, 1 reply; 29+ messages in thread
From: Bjorn Helgaas @ 2020-03-28 17:10 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan; +Cc: linux-pci, linux-kernel, ashok.raj

On Tue, Mar 24, 2020 at 06:17:44PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> On 3/24/20 4:49 PM, Bjorn Helgaas wrote:
> > I don't understand why hotplug is relevant here.  This path
> > (dpc_reset_link()) is only used for downstream ports that support DPC.
> > DPC has already disabled the link, which resets everything below the
> > port, regardless of whether the port supports hotplug.
> > 
> > I do see that PCI_ERS_RESULT_NEED_RESET seems to promise a lot more
> > than it actually *does*.  The doc (pci-error-recovery.rst) says
> > .error_detected() can return PCI_ERS_RESULT_NEED_RESET to *request* a
> > slot reset.  But if that happens, pcie_do_recovery() doesn't do a
> > reset at all.  It calls the driver's .slot_reset() method, which tells
> > the driver "we've reset your device; please re-initialize the
> > hardware."
> > 
> > I guess this abuses PCI_ERS_RESULT_NEED_RESET by taking advantage of
> > that implementation deficiency in pcie_do_recovery(): we know the
> > downstream devices have already been reset via DPC, and returning
> > PCI_ERS_RESULT_NEED_RESET means we'll call .slot_reset() to tell the
> > driver about that reset.
> > 
> > I can see how this achieves the desired result, but if/when we fix
> > pcie_do_recovery() to actually *do* the reset promised by
> > PCI_ERS_RESULT_NEED_RESET, we will be doing *two* resets: the first
> > via DPC and a second via whatever slot reset mechanism
> > pcie_do_recovery() would use.
>
> When we fix this issue, if we make sure the reset logic is
> implemented before we call the .reset_link callback, we should be
> able to avoid resetting the device twice. Before we call the DPC
> .reset_link callback, the device link will not be up, and hence we
> should not be able to reset it.
>
> > So I guess the real issue (as you allude to in the commit log) is that
> > we rely on hotplug to unbind/rebind the driver, and without hotplug we
> > need to at least tell the driver the device was reset.
>
> Agree
>
> > I'll try to expand the comment here so it reminds me what's going on
> > when we have to look at this again:)   Let me know if I'm on the right
> > track.
>
> Yes, your understanding is correct.

OK, thanks.  I'm still uncomfortable with this issue, so I think I'm
going to apply this series but omit this patch.  Here's why:

1) The fact that resets may cause hotplug events isn't specific to
DPC, so I don't think dpc_reset_link() is the right place.  For
instance, aer_root_reset() ultimately does a secondary bus reset.  The
pci_slot_reset() -> pciehp_reset_slot() path goes to some trouble to
ignore the resulting hotplug event, but the pci_bus_reset() path does
not.

2) I'm not convinced that "hotplug_is_native()" is the correct test.
Even if we're using ACPI hotplug (acpiphp), that will detach the
drivers and remove the devices, won't it?

I considered something like the patch below, which partly addresses my
first concern, but not the second.  Even the first one is awfully
messy because of the different ways the aer_root_reset() path can
work.


PCI/ERR: Skip driver callbacks if reset causes hotplug remove/add

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 1ac57e9e1e71..000551a06013 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -208,6 +208,18 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
 		status = reset_link(dev, service);
 		if (status != PCI_ERS_RESULT_RECOVERED)
 			goto failed;
+
+		/*
+		 * If pdev supports hotplug, a link reset causes a hotplug
+		 * remove event.  If we have a hotplug driver, it will
+		 * detach all drivers of downstream devices and remove the
+		 * devices, so we can't call any driver error recovery
+		 * callbacks.  Bringing the link back up causes a hotplug
+		 * add event, and the devices should be re-enumerated and
+		 * the drivers re-attached.
+		 */
+		if (hotplug_is_native(pdev))
+			goto succeeded;
 	} else {
 		pci_walk_bus(bus, report_normal_detected, &status);
 	}
@@ -224,7 +236,11 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
 		 * functions to reset slot before calling
 		 * drivers' slot_reset callbacks?
 		 */
+		pci_warn(pdev, "driver requested reset, but that's not implemented\n");
 		status = PCI_ERS_RESULT_RECOVERED;
+	}
+
+	if (status == PCI_ERS_RESULT_RECOVERED) {
 		pci_dbg(dev, "broadcast slot_reset message\n");
 		pci_walk_bus(bus, report_slot_reset, &status);
 	}
@@ -235,6 +251,7 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
 	pci_dbg(dev, "broadcast resume message\n");
 	pci_walk_bus(bus, report_resume, &status);
 
+succeeded:
 	pci_aer_clear_device_status(dev);
 	pci_cleanup_aer_uncorrect_error_status(dev);
 	pci_info(dev, "device recovery successful\n");

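Concern 2) above can be made concrete with a tiny model of the "skip driver callbacks" decision. The structure and helper names are hypothetical: model_hotplug_is_native() only mirrors the shape of hotplug_is_native() (native pciehp or SHPC), and model_hotplug_is_enabled() is one possible broader test that also counts acpiphp. The model ignores whether a given reset actually generates a hotplug event, which, as noted above, differs between the pci_slot_reset() and pci_bus_reset() paths.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of who manages hotplug on a downstream port. */
struct port_model {
	bool pciehp_native;	/* native PCIe hotplug owns the slot */
	bool shpchp_native;	/* native SHPC hotplug owns the slot */
	bool acpiphp;		/* firmware (ACPI) hotplug owns the slot */
};

/* Mirrors the shape of hotplug_is_native(): native drivers only. */
static bool model_hotplug_is_native(const struct port_model *p)
{
	return p->pciehp_native || p->shpchp_native;
}

/* Hypothetical broader test that also counts ACPI hotplug. */
static bool model_hotplug_is_enabled(const struct port_model *p)
{
	return model_hotplug_is_native(p) || p->acpiphp;
}

/*
 * After a link reset, skip the drivers' recovery callbacks only if some
 * hotplug driver will remove and re-add the devices below the port.
 */
static bool model_skip_driver_callbacks(const struct port_model *p,
					bool use_native_test)
{
	return use_native_test ? model_hotplug_is_native(p)
			       : model_hotplug_is_enabled(p);
}
```

For a port handled only by acpiphp, the native-only test keeps calling driver callbacks even though the devices are about to be removed, which is the gap concern 2) points at.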
^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v18 05/11] PCI/ERR: Remove service dependency in pcie_do_recovery()
  2020-03-24  0:26 ` [PATCH v18 05/11] PCI/ERR: Remove service dependency in pcie_do_recovery() sathyanarayanan.kuppuswamy
@ 2020-03-28 21:12   ` Kuppuswamy, Sathyanarayanan
  2020-03-28 21:32     ` Bjorn Helgaas
  0 siblings, 1 reply; 29+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2020-03-28 21:12 UTC (permalink / raw)
  To: bhelgaas; +Cc: linux-pci, linux-kernel, ashok.raj

Hi Bjorn,

On 3/23/20 5:26 PM, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> 

> +void pcie_do_recovery(struct pci_dev *dev,
> +		      enum pci_channel_state state,
> +		      pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
>   {
>   	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
>   	struct pci_bus *bus;
> @@ -206,9 +165,12 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
>   	pci_dbg(dev, "broadcast error_detected message\n");
>   	if (state == pci_channel_io_frozen) {
>   		pci_walk_bus(bus, report_frozen_detected, &status);
> -		status = reset_link(dev, service);
> -		if (status == PCI_ERS_RESULT_DISCONNECT
> +		status = reset_link(dev);
The line above needs to be replaced as below, since reset_link
could be NULL (even though that is currently not the case).
		if (reset_link)
			status = reset_link(dev);
Shall I submit another version that adds the above fix on top of
our pci/edr branch?
> +		if ((status != PCI_ERS_RESULT_RECOVERED) &&
> +		    (status != PCI_ERS_RESULT_NEED_RESET)) {
> +			pci_dbg(dev, "link reset at upstream device failed\n");
>   			goto failed;
> +		}
>   	} else {
>   		pci_walk_bus(bus, report_normal_detected, &status);
>   	}
> diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
> index 1e673619b101..64b5e081cdb2 100644
> --- a/drivers/pci/pcie/portdrv.h
> +++ b/drivers/pci/pcie/portdrv.h
> @@ -92,9 +92,6 @@ struct pcie_port_service_driver {
>   	/* Device driver may resume normal operations */
>   	void (*error_resume)(struct pci_dev *dev);
>   
> -	/* Link Reset Capability - AER service driver specific */
> -	pci_ers_result_t (*reset_link)(struct pci_dev *dev);
> -
>   	int port_type;  /* Type of the port this driver can handle */
>   	u32 service;    /* Port service this device represents */
>   
> @@ -161,7 +158,5 @@ static inline int pcie_aer_get_firmware_first(struct pci_dev *pci_dev)
>   }
>   #endif
>   
> -struct pcie_port_service_driver *pcie_port_find_service(struct pci_dev *dev,
> -							u32 service);
>   struct device *pcie_port_find_device(struct pci_dev *dev, u32 service);
>   #endif /* _PORTDRV_H_ */
> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index 5075cb9e850c..50a9522ab07d 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -458,27 +458,6 @@ static int find_service_iter(struct device *device, void *data)
>   	return 0;
>   }
>   
> -/**
> - * pcie_port_find_service - find the service driver
> - * @dev: PCI Express port the service is associated with
> - * @service: Service to find
> - *
> - * Find PCI Express port service driver associated with given service
> - */
> -struct pcie_port_service_driver *pcie_port_find_service(struct pci_dev *dev,
> -							u32 service)
> -{
> -	struct pcie_port_service_driver *drv;
> -	struct portdrv_service_data pdrvs;
> -
> -	pdrvs.drv = NULL;
> -	pdrvs.service = service;
> -	device_for_each_child(&dev->dev, &pdrvs, find_service_iter);
> -
> -	drv = pdrvs.drv;
> -	return drv;
> -}
> -
>   /**
>    * pcie_port_find_device - find the struct device
>    * @dev: PCI Express port the service is associated with
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v18 05/11] PCI/ERR: Remove service dependency in pcie_do_recovery()
  2020-03-28 21:12   ` Kuppuswamy, Sathyanarayanan
@ 2020-03-28 21:32     ` Bjorn Helgaas
  2020-03-28 21:55       ` Kuppuswamy, Sathyanarayanan
  0 siblings, 1 reply; 29+ messages in thread
From: Bjorn Helgaas @ 2020-03-28 21:32 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan; +Cc: linux-pci, linux-kernel, ashok.raj

On Sat, Mar 28, 2020 at 02:12:48PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> On 3/23/20 5:26 PM, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> > From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > 
> 
> > +void pcie_do_recovery(struct pci_dev *dev,
> > +		      enum pci_channel_state state,
> > +		      pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
> >   {
> >   	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
> >   	struct pci_bus *bus;
> > @@ -206,9 +165,12 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
> >   	pci_dbg(dev, "broadcast error_detected message\n");
> >   	if (state == pci_channel_io_frozen) {
> >   		pci_walk_bus(bus, report_frozen_detected, &status);
> > -		status = reset_link(dev, service);
> > -		if (status == PCI_ERS_RESULT_DISCONNECT
> > +		status = reset_link(dev);
> The line above needs to be replaced as below, since reset_link
> could be NULL (even though that is currently not the case).
> 		if (reset_link)
> 			status = reset_link(dev);
> Shall I submit another version that adds the above fix on top of
> our pci/edr branch?

No, I can squash that in if needed.

But I don't actually think we *do* need it.  All the callers supply a
valid reset_link function pointer, and if somebody changes or adds a
new one that doesn't, I'd rather take the null pointer exception and
find out about it than silently ignore it.

Bjorn

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v18 05/11] PCI/ERR: Remove service dependency in pcie_do_recovery()
  2020-03-28 21:32     ` Bjorn Helgaas
@ 2020-03-28 21:55       ` Kuppuswamy, Sathyanarayanan
  2020-03-28 22:16         ` Bjorn Helgaas
  0 siblings, 1 reply; 29+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2020-03-28 21:55 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linux-kernel, ashok.raj

Hi Bjorn,

On 3/28/20 2:32 PM, Bjorn Helgaas wrote:
> On Sat, Mar 28, 2020 at 02:12:48PM -0700, Kuppuswamy, Sathyanarayanan wrote:
>> On 3/23/20 5:26 PM, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
>>> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>>
>>
>>> +void pcie_do_recovery(struct pci_dev *dev,
>>> +		      enum pci_channel_state state,
>>> +		      pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
>>>    {
>>>    	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
>>>    	struct pci_bus *bus;
>>> @@ -206,9 +165,12 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
>>>    	pci_dbg(dev, "broadcast error_detected message\n");
>>>    	if (state == pci_channel_io_frozen) {
>>>    		pci_walk_bus(bus, report_frozen_detected, &status);
>>> -		status = reset_link(dev, service);
>>> -		if (status == PCI_ERS_RESULT_DISCONNECT
>>> +		status = reset_link(dev);
>> The line above needs to be replaced as below, since reset_link
>> could be NULL (even though that is currently not the case).
>> 		if (reset_link)
>> 			status = reset_link(dev);
>> Shall I submit another version that adds the above fix on top of
>> our pci/edr branch?
> 
> No, I can squash that in if needed.
> 
> But I don't actually think we *do* need it.  All the callers supply a
> valid reset_link function pointer, and if somebody changes or adds a
> new one that doesn't, I'd rather take the null pointer exception and
> find out about it than silently ignore it.
But the documentation says "If reset_link is not NULL, recovery function
will use it to reset the link." It treats NULL as a possible case,
so I think it's better to allow that case with a pci_warn() message.
> 
> Bjorn
> 

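For comparison, the guarded call proposed in this message might look like the following user-space sketch. The types and the fake callback are hypothetical stand-ins, and fprintf() takes the place of pci_warn(). With a NULL callback, the status from the error_detected stage is simply carried forward.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-ins for a few pci_ers_result_t values (names only). */
enum ers { ERS_NEED_RESET, ERS_DISCONNECT, ERS_RECOVERED };

struct fake_dev { const char *name; };	/* hypothetical stand-in for pci_dev */

typedef enum ers (*reset_link_fn)(struct fake_dev *dev);

/* Guarded call: warn and keep the current status if no callback exists. */
static enum ers model_call_reset_link(struct fake_dev *dev,
				      reset_link_fn reset_link,
				      enum ers status)
{
	if (!reset_link) {
		/* stands in for pci_warn() */
		fprintf(stderr, "%s: no reset_link callback, link not reset\n",
			dev->name);
		return status;
	}
	return reset_link(dev);
}

/* Hypothetical callback that always reports a successful link reset. */
static enum ers fake_reset_link(struct fake_dev *dev)
{
	(void)dev;
	return ERS_RECOVERED;
}
```

The trade-off raised in the thread is visible here: the guarded version reports a missing callback only as a log line and carries on, whereas calling reset_link unconditionally turns the same mistake into an immediate NULL pointer dereference that is hard to miss.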
^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v18 03/11] PCI/DPC: Fix DPC recovery issue in non hotplug case
  2020-03-28 17:10       ` Bjorn Helgaas
@ 2020-03-28 22:04         ` Kuppuswamy, Sathyanarayanan
  2020-03-28 22:21           ` Bjorn Helgaas
  0 siblings, 1 reply; 29+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2020-03-28 22:04 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linux-kernel, ashok.raj

Hi Bjorn,

On 3/28/20 10:10 AM, Bjorn Helgaas wrote:
> On Tue, Mar 24, 2020 at 06:17:44PM -0700, Kuppuswamy, Sathyanarayanan wrote:
>> On 3/24/20 4:49 PM, Bjorn Helgaas wrote:
>>> I don't understand why hotplug is relevant here.  This path
>>> (dpc_reset_link()) is only used for downstream ports that support DPC.
>>> DPC has already disabled the link, which resets everything below the
>>> port, regardless of whether the port supports hotplug.
>>>
>>> I do see that PCI_ERS_RESULT_NEED_RESET seems to promise a lot more
>>> than it actually *does*.  The doc (pci-error-recovery.rst) says
>>> .error_detected() can return PCI_ERS_RESULT_NEED_RESET to *request* a
>>> slot reset.  But if that happens, pcie_do_recovery() doesn't do a
>>> reset at all.  It calls the driver's .slot_reset() method, which tells
>>> the driver "we've reset your device; please re-initialize the
>>> hardware."
>>>
>>> I guess this abuses PCI_ERS_RESULT_NEED_RESET by taking advantage of
>>> that implementation deficiency in pcie_do_recovery(): we know the
>>> downstream devices have already been reset via DPC, and returning
>>> PCI_ERS_RESULT_NEED_RESET means we'll call .slot_reset() to tell the
>>> driver about that reset.
>>>
>>> I can see how this achieves the desired result, but if/when we fix
>>> pcie_do_recovery() to actually *do* the reset promised by
>>> PCI_ERS_RESULT_NEED_RESET, we will be doing *two* resets: the first
>>> via DPC and a second via whatever slot reset mechanism
>>> pcie_do_recovery() would use.
>>
>> When we fix this issue, if we make sure the reset logic is
>> implemented before we call the .reset_link callback, we should be
>> able to avoid resetting the device twice. Before we call the DPC
>> .reset_link callback, the device link will not be up, and hence we
>> should not be able to reset it.
>>
>>> So I guess the real issue (as you allude to in the commit log) is that
>>> we rely on hotplug to unbind/rebind the driver, and without hotplug we
>>> need to at least tell the driver the device was reset.
>>
>> Agree
>>
>>> I'll try to expand the comment here so it reminds me what's going on
>>> when we have to look at this again:)   Let me know if I'm on the right
>>> track.
>>
>> Yes, your understanding is correct.
> 
> OK, thanks.  I'm still uncomfortable with this issue, so I think I'm
> going to apply this series but omit this patch.  Here's why:
> 
> 1) The fact that resets may cause hotplug events isn't specific to
> DPC, so I don't think dpc_reset_link() is the right place.  For
> instance, aer_root_reset() ultimately does a secondary bus reset. 
Agree. Reset is common for pci_channel_io_frozen errors. I had not
looked into the aer_root_reset() implementation. So if the state
is pci_channel_io_frozen, we can assume the slot has been
reset.
> The pci_slot_reset() -> pciehp_reset_slot() path goes to some trouble to
> ignore the resulting hotplug event, but the pci_bus_reset() path does
> not.
> 
> 2) I'm not convinced that "hotplug_is_native()" is the correct test.
> Even if we're using ACPI hotplug (acpiphp), that will detach the
> drivers and remove the devices, won't it?
Yes, agreed. It does not handle the ACPI hotplug case. With
ACPI hotplug, native_pcie_hotplug = 0. Maybe we need a new helper
function, hotplug_is_enabled()?
> 
> I considered something like the patch below, which partly addresses my
> first concern, but not the second.  Even the first one is awfully
> messy because of the different ways the aer_root_reset() path can
> work.
> 
> 
> PCI/ERR: Skip driver callbacks if reset causes hotplug remove/add
> 
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 1ac57e9e1e71..000551a06013 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -208,6 +208,18 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
>   		status = reset_link(dev, service);
>   		if (status != PCI_ERS_RESULT_RECOVERED)
>   			goto failed;
> +
> +		/*
> +		 * If pdev supports hotplug, a link reset causes a hotplug
> +		 * remove event.  If we have a hotplug driver, it will
> +		 * detach all drivers of downstream devices and remove the
> +		 * devices, so we can't call any driver error recovery
> +		 * callbacks.  Bringing the link back up causes a hotplug
> +		 * add event, and the devices should be re-enumerated and
> +		 * the drivers re-attached.
> +		 */
> +		if (hotplug_is_native(pdev))
> +			goto succeeded;
>   	} else {
>   		pci_walk_bus(bus, report_normal_detected, &status);
>   	}
> @@ -224,7 +236,11 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
>   		 * functions to reset slot before calling
>   		 * drivers' slot_reset callbacks?
>   		 */
> +		pci_warn(pdev, "driver requested reset, but that's not implemented\n");
>   		status = PCI_ERS_RESULT_RECOVERED;
> +	}
> +
> +	if (status == PCI_ERS_RESULT_RECOVERED) {
Moving it outside the status == PCI_ERS_RESULT_NEED_RESET check will let
it execute for non-frozen errors as well. IIUC, we should not call it for
all error types. Let me know your comments.
>   		pci_dbg(dev, "broadcast slot_reset message\n");
>   		pci_walk_bus(bus, report_slot_reset, &status);
>   	}
> @@ -235,6 +251,7 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
>   	pci_dbg(dev, "broadcast resume message\n");
>   	pci_walk_bus(bus, report_resume, &status);
>   
> +succeeded:
>   	pci_aer_clear_device_status(dev);
>   	pci_cleanup_aer_uncorrect_error_status(dev);
>   	pci_info(dev, "device recovery successful\n");
> 

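The concern about moving the .slot_reset() broadcast can be checked with a small model of just the gating logic. It is a hypothetical sketch: the enum only stands in for pci_ers_result_t, the input is the status after the error_detected (and, for frozen errors, reset_link) stage, and `moved` selects the draft above, where the broadcast is gated on status == RECOVERED after the NEED_RESET block instead of sitting inside it.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for a few pci_ers_result_t values (names only). */
enum ers { ERS_CAN_RECOVER, ERS_NEED_RESET, ERS_DISCONNECT, ERS_RECOVERED };

/* Returns true if .slot_reset() would be broadcast for this status. */
static bool model_broadcasts_slot_reset(enum ers status, bool moved)
{
	if (status == ERS_CAN_RECOVER)
		status = ERS_RECOVERED;		/* .mmio_enabled() path */

	if (!moved)				/* original placement */
		return status == ERS_NEED_RESET;

	if (status == ERS_NEED_RESET)
		status = ERS_RECOVERED;		/* reset still not implemented */
	return status == ERS_RECOVERED;		/* draft placement */
}
```

In the draft variant, a non-frozen error where drivers answer CAN_RECOVER is converted to RECOVERED by the .mmio_enabled() step and then also receives .slot_reset(), which is the case raised above; with the original placement it does not.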
^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v18 05/11] PCI/ERR: Remove service dependency in pcie_do_recovery()
  2020-03-28 21:55       ` Kuppuswamy, Sathyanarayanan
@ 2020-03-28 22:16         ` Bjorn Helgaas
  0 siblings, 0 replies; 29+ messages in thread
From: Bjorn Helgaas @ 2020-03-28 22:16 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan; +Cc: linux-pci, linux-kernel, ashok.raj

On Sat, Mar 28, 2020 at 02:55:50PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> On 3/28/20 2:32 PM, Bjorn Helgaas wrote:
> > On Sat, Mar 28, 2020 at 02:12:48PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> > > On 3/23/20 5:26 PM, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> > > > From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > > > 
> > > 
> > > > +void pcie_do_recovery(struct pci_dev *dev,
> > > > +		      enum pci_channel_state state,
> > > > +		      pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
> > > >    {
> > > >    	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
> > > >    	struct pci_bus *bus;
> > > > @@ -206,9 +165,12 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
> > > >    	pci_dbg(dev, "broadcast error_detected message\n");
> > > >    	if (state == pci_channel_io_frozen) {
> > > >    		pci_walk_bus(bus, report_frozen_detected, &status);
> > > > -		status = reset_link(dev, service);
> > > > -		if (status == PCI_ERS_RESULT_DISCONNECT
> > > > +		status = reset_link(dev);
> > > The line above needs to be replaced as below, since reset_link
> > > could be NULL (even though that is currently not the case).
> > > 		if (reset_link)
> > > 			status = reset_link(dev);
> > > Shall I submit another version that adds the above fix on top of
> > > our pci/edr branch?
> > 
> > No, I can squash that in if needed.
> > 
> > But I don't actually think we *do* need it.  All the callers supply a
> > valid reset_link function pointer, and if somebody changes or adds a
> > new one that doesn't, I'd rather take the null pointer exception and
> > find out about it than silently ignore it.
>
> But the documentation says "If reset_link is not NULL, recovery function
> will use it to reset the link." It treats NULL as a possible case,
> so I think it's better to allow that case with a pci_warn() message.

I think we should rework the documentation to remove that.
pcie_do_recovery() is internal to the PCI core and not directly
relevant to drivers.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v18 03/11] PCI/DPC: Fix DPC recovery issue in non hotplug case
  2020-03-28 22:04         ` Kuppuswamy, Sathyanarayanan
@ 2020-03-28 22:21           ` Bjorn Helgaas
  2020-03-28 22:40             ` Kuppuswamy, Sathyanarayanan
  0 siblings, 1 reply; 29+ messages in thread
From: Bjorn Helgaas @ 2020-03-28 22:21 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan; +Cc: linux-pci, linux-kernel, ashok.raj

On Sat, Mar 28, 2020 at 03:04:05PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> Hi Bjorn,
> 
> On 3/28/20 10:10 AM, Bjorn Helgaas wrote:
> > On Tue, Mar 24, 2020 at 06:17:44PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> > > On 3/24/20 4:49 PM, Bjorn Helgaas wrote:
> > > > I don't understand why hotplug is relevant here.  This path
> > > > (dpc_reset_link()) is only used for downstream ports that support DPC.
> > > > DPC has already disabled the link, which resets everything below the
> > > > port, regardless of whether the port supports hotplug.
> > > > 
> > > > I do see that PCI_ERS_RESULT_NEED_RESET seems to promise a lot more
> > > > than it actually *does*.  The doc (pci-error-recovery.rst) says
> > > > .error_detected() can return PCI_ERS_RESULT_NEED_RESET to *request* a
> > > > slot reset.  But if that happens, pcie_do_recovery() doesn't do a
> > > > reset at all.  It calls the driver's .slot_reset() method, which tells
> > > > the driver "we've reset your device; please re-initialize the
> > > > hardware."
> > > > 
> > > > I guess this abuses PCI_ERS_RESULT_NEED_RESET by taking advantage of
> > > > that implementation deficiency in pcie_do_recovery(): we know the
> > > > downstream devices have already been reset via DPC, and returning
> > > > PCI_ERS_RESULT_NEED_RESET means we'll call .slot_reset() to tell the
> > > > driver about that reset.
> > > > 
> > > > I can see how this achieves the desired result, but if/when we fix
> > > > pcie_do_recovery() to actually *do* the reset promised by
> > > > PCI_ERS_RESULT_NEED_RESET, we will be doing *two* resets: the first
> > > > via DPC and a second via whatever slot reset mechanism
> > > > pcie_do_recovery() would use.
> > > 
> > > When we fix this issue, if we make sure the reset logic is
> > > implemented before we call the .reset_link callback, we should be
> > > able to avoid resetting the device twice. Before we call the DPC
> > > .reset_link callback, the device link will not be up, and hence we
> > > should not be able to reset it.
> > > 
> > > > So I guess the real issue (as you allude to in the commit log) is that
> > > > we rely on hotplug to unbind/rebind the driver, and without hotplug we
> > > > need to at least tell the driver the device was reset.
> > > 
> > > Agree
> > > 
> > > > I'll try to expand the comment here so it reminds me what's going on
> > > > when we have to look at this again:)   Let me know if I'm on the right
> > > > track.
> > > 
> > > Yes, your understanding is correct.
> > 
> > OK, thanks.  I'm still uncomfortable with this issue, so I think I'm
> > going to apply this series but omit this patch.  Here's why:
> > 
> > 1) The fact that resets may cause hotplug events isn't specific to
> > DPC, so I don't think dpc_reset_link() is the right place.  For
> > instance, aer_root_reset() ultimately does a secondary bus reset.
> Agree. Reset is common for pci_channel_io_frozen errors. I had not
> looked into the aer_root_reset() implementation. So if the state
> is pci_channel_io_frozen, we can assume the slot has been
> reset.
> > The pci_slot_reset() -> pciehp_reset_slot() path goes to some trouble to
> > ignore the resulting hotplug event, but the pci_bus_reset() path does
> > not.
> > 
> > 2) I'm not convinced that "hotplug_is_native()" is the correct test.
> > Even if we're using ACPI hotplug (acpiphp), that will detach the
> > drivers and remove the devices, won't it?
> Yes, agreed. It does not handle the ACPI hotplug case. With
> ACPI hotplug, native_pcie_hotplug = 0. Maybe we need a new helper
> function, hotplug_is_enabled()?

I'm not proposing the patch below to be applied.  I only included it
as an idea of where the hotplug testing should be.

I'm proposing to merge the pci/edr branch as-is, without these two
patches:

  PCI: move {pciehp,shpchp}_is_native() definitions to pci.c
  PCI/DPC: Fix DPC recovery issue in non hotplug case

accepting that we still have some issues in the non-hotplug case that
we can fix later.

> > I considered something like the patch below, which partly addresses my
> > first concern, but not the second.  Even the first one is awfully
> > messy because of the different ways the aer_root_reset() path can
> > work.
> > 
> > 
> > PCI/ERR: Skip driver callbacks if reset causes hotplug remove/add
> > 
> > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> > index 1ac57e9e1e71..000551a06013 100644
> > --- a/drivers/pci/pcie/err.c
> > +++ b/drivers/pci/pcie/err.c
> > @@ -208,6 +208,18 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
> >   		status = reset_link(dev, service);
> >   		if (status != PCI_ERS_RESULT_RECOVERED)
> >   			goto failed;
> > +
> > +		/*
> > +		 * If pdev supports hotplug, a link reset causes a hotplug
> > +		 * remove event.  If we have a hotplug driver, it will
> > +		 * detach all drivers of downstream devices and remove the
> > +		 * devices, so we can't call any driver error recovery
> > +		 * callbacks.  Bringing the link back up causes a hotplug
> > +		 * add event, and the devices should be re-enumerated and
> > +		 * the drivers re-attached.
> > +		 */
> > +		if (hotplug_is_native(pdev))
> > +			goto succeeded;
> >   	} else {
> >   		pci_walk_bus(bus, report_normal_detected, &status);
> >   	}
> > @@ -224,7 +236,11 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
> >   		 * functions to reset slot before calling
> >   		 * drivers' slot_reset callbacks?
> >   		 */
> > +		pci_warn(pdev, "driver requested reset, but that's not implemented\n");
> >   		status = PCI_ERS_RESULT_RECOVERED;
> > +	}
> > +
> > +	if (status == PCI_ERS_RESULT_RECOVERED) {
> Moving it outside the status == PCI_ERS_RESULT_NEED_RESET check will let
> it execute for non-frozen errors as well. IIUC, we should not call it for
> all error types. Let me know your comments.
> >   		pci_dbg(dev, "broadcast slot_reset message\n");
> >   		pci_walk_bus(bus, report_slot_reset, &status);
> >   	}
> > @@ -235,6 +251,7 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
> >   	pci_dbg(dev, "broadcast resume message\n");
> >   	pci_walk_bus(bus, report_resume, &status);
> > +succeeded:
> >   	pci_aer_clear_device_status(dev);
> >   	pci_cleanup_aer_uncorrect_error_status(dev);
> >   	pci_info(dev, "device recovery successful\n");
> > 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v18 03/11] PCI/DPC: Fix DPC recovery issue in non hotplug case
  2020-03-28 22:21           ` Bjorn Helgaas
@ 2020-03-28 22:40             ` Kuppuswamy, Sathyanarayanan
  0 siblings, 0 replies; 29+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2020-03-28 22:40 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linux-kernel, ashok.raj



On 3/28/20 3:21 PM, Bjorn Helgaas wrote:
>>> OK, thanks.  I'm still uncomfortable with this issue, so I think I'm
>>> going to apply this series but omit this patch.  Here's why:
>>>
>>> 1) The fact that resets may cause hotplug events isn't specific to
>>> DPC, so I don't think dpc_reset_link() is the right place.  For
>>> instance, aer_root_reset() ultimately does a secondary bus reset.
>> Agree. Reset is common for pci_channel_io_frozen errors. I did not
>> look into the aer_root_reset() implementation. So if the state
>> is pci_channel_io_frozen, then we can assume the slot has been
>> reset.
>>> The pci_slot_reset() -> pciehp_reset_slot() path goes to some trouble to
>>> ignore the resulting hotplug event, but the pci_bus_reset() path does
>>> not.
>>>
>>> 2) I'm not convinced that "hotplug_is_native()" is the correct test.
>>> Even if we're using ACPI hotplug (acpiphp), that will detach the
>>> drivers and remove the devices, won't it?
>> Yes, agreed. It does not handle the ACPI hotplug case. With
>> ACPI hotplug, native_pcie_hotplug = 0. Maybe we need a new helper
>> function, hotplug_is_enabled()?
> I'm not proposing the patch below to be applied.  I only included it
> as an idea of where the hotplug testing should be.
> 
> I'm proposing to merge the pci/edr branch as-is, without these two
> patches:
> 
>    PCI: move {pciehp,shpchp}_is_native() definitions to pci.c
>    PCI/DPC: Fix DPC recovery issue in non hotplug case
> 
> accepting that we still have some issues in the non-hotplug case that
> we can fix later.
Ok. I am fine with it. Thanks for working on it.
> 
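Bjorn's second point, and the hotplug_is_enabled() idea raised in reply, can be sketched as a userspace model. Every name here (struct model_port, its fields, model_hotplug_is_native(), model_hotplug_is_enabled()) is hypothetical; the real kernel decision also involves SHPC and details this ignores. The sketch only illustrates why a test limited to native hotplug misses slots that acpiphp owns.

```c
#include <stdbool.h>

/* Simplified, hypothetical view of who owns a hotplug slot. */
struct model_port {
	bool is_hotplug_bridge;    /* slot reports Hot-Plug Capable */
	bool native_pcie_hotplug;  /* OS was granted PCIe hotplug via _OSC */
	bool pciehp_built;         /* native hotplug driver available */
	bool acpi_slot;            /* firmware describes the slot to acpiphp */
	bool acpiphp_built;        /* ACPI hotplug driver available */
};

/* Roughly the question a native-only test answers (PCIe part only). */
static bool model_hotplug_is_native(const struct model_port *p)
{
	return p->is_hotplug_bridge && p->native_pcie_hotplug &&
	       p->pciehp_built;
}

/*
 * Hypothetical hotplug_is_enabled(): is any hotplug driver, native or
 * ACPI, expected to remove downstream devices when the link goes down?
 */
static bool model_hotplug_is_enabled(const struct model_port *p)
{
	if (model_hotplug_is_native(p))
		return true;
	/* Without native control, firmware-described slots go to acpiphp. */
	return !p->native_pcie_hotplug && p->acpi_slot && p->acpiphp_built;
}
```

For an ACPI-managed slot the native test returns false even though a hotplug driver will still detach drivers and remove devices, which is the gap Bjorn describes.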


* Re: [PATCH v18 00/11] Add Error Disconnect Recover (EDR) support
  2020-03-24  0:25 [PATCH v18 00/11] Add Error Disconnect Recover (EDR) support sathyanarayanan.kuppuswamy
                   ` (10 preceding siblings ...)
  2020-03-24  0:26 ` [PATCH v18 11/11] PCI/AER: Rationalize error status register clearing sathyanarayanan.kuppuswamy
@ 2020-03-31 15:28 ` Bjorn Helgaas
  2020-03-31 16:28   ` Kuppuswamy, Sathyanarayanan
  11 siblings, 1 reply; 29+ messages in thread
From: Bjorn Helgaas @ 2020-03-31 15:28 UTC (permalink / raw)
  To: sathyanarayanan.kuppuswamy; +Cc: linux-pci, linux-kernel, ashok.raj

On Mon, Mar 23, 2020 at 05:25:57PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> 
> This patchset adds support for following features:
> 
> 1. Error Disconnect Recover (EDR) support.
> 2. _OSC based negotiation support for DPC.
> 
> You can find EDR spec in the following link.
> 
> https://members.pcisig.com/wg/PCI-SIG/document/12614
> 
> Changes since v17 + Bjorn's changes:
>  * This version is based on Bjorn's review/edr branch.
>  * Moved {pciehp,shpchp}_is_native() function definitions to pci.c and
>    removed their CONFIG_ACPI dependency.
>  * Modified dpc_reset_link() function to return PCI_ERS_RESULT_NEED_RESET
>    when hotplug is not supported or enabled in the kernel.
>  * Modified reset_link() function to handle PCI_ERS_RESULT_NEED_RESET as
>    a valid return value.
>  * Moved the implementation of reset_link() function to pcie_do_recovery()
>    and renamed function callback parameter from reset_cb to reset_link.
>  * Moved the order of pci_acpi_add_edr_notifier() and
>    pci_acpi_remove_edr_notifier() calls in pci_acpi_setup() and
>    pci_acpi_cleanup() above wakeup capable support checks.
>  * Used acpi_check_dsm() to check whether given _DSM is supported or
>    not in edr.c.
> 
> Changes since v16:
>  * Removed reset_link from pcie_port_service_driver.
>  * Removed pcie_port_find_service().
>  * Added pci_dpc_init() in pci_init_capabilities().
> 
> Changes since v15:
>  * Split patch #3 in the previous set into multiple patches.
>  * Refactored EDR driver to use pci_dev instead of dpc_dev.
>  * Added some debug logs to EDR driver.
>  * Used pci_aer_raw_clear_status() for clearing AER errors in EDR path.
>  * Addressed other comments from Bjorn.
>  * Rebased patches on top of Bjorn's "PCI/DPC: Move data to struct pci_dev" patch.
> 
> Changes since v14:
>  * Rebased on top of v5.6-rc1
> 
> Changes since v13:
>  * Moved all EDR related code to edr.c
>  * Addressed Bjorn's comments.
> 
> Changes since v12:
>  * Addressed Bjorn's comments.
>  * Added check for CONFIG_PCIE_EDR before requesting DPC control from firmware.
>  * Removed ff_check parameter from AER APIs.
>  * Used macros for _OST return status values in DPC driver.
> 
> Changes since v11:
>  * Allowed error recovery to proceed after successful reset_link().
>  * Used correct ACPI handle for sending EDR status.
>  * Rebased on top of v5.5-rc5
> 
> Changes since v10:
>  * Added "edr_enabled" member to dpc priv structure, which is used to cache EDR
>    enabling status based on status of pcie_ports_dpc_native and FF mode.
>  * Changed type of _DSM argument from Integer to Package in acpi_enable_dpc_port()
>    function to fix ACPI related boot warnings.
>  * Rebased on top of v5.5-rc3
> 
> Changes since v9:
>  * Removed caching of pcie_aer_get_firmware_first() in dpc driver.
>  * Added proper spec reference in git log for patch 5 & 7.
>  * Added new function parameter "ff_check" to pci_cleanup_aer_uncorrect_error_status(),
>    pci_aer_clear_fatal_status() and pci_cleanup_aer_error_status_regs() functions.
>  * Rebased on top of v5.4-rc5
> 
> Changes since v8:
>  * Rebased on top of v5.4-rc1
> 
> Changes since v7:
>  * Updated DSM version number to match the spec.
> 
> Changes since v6:
>  * Modified the order of patches to enable EDR only after all necessary support is added in the kernel.
>  * Addressed Bjorn's comments.
> 
> Changes since v5:
>  * Addressed Keith's comments.
>  * Added additional check for FF mode in pci_aer_init().
>  * Updated the commit log of "PCI/DPC: Add support for DPC recovery on NON_FATAL errors" patch.
> 
> Changes since v4:
>  * Rebased on top of v5.3-rc1
>  * Fixed lock/unlock issue in edr_handle_event().
>  * Merged "Update error status after reset_link()" patch into this patchset.
> 
> Changes since v3:
>  * Moved EDR related ACPI functions/definitions to pci-acpi.c
>  * Modified the commit log in a few patches to include spec references.
>  * Added support to handle DPC triggered by NON_FATAL errors.
>  * Added edr_lock to protect against a PCI device receiving duplicate EDR notifications.
>  * Addressed Bjorn's comments.
> 
> Changes since v2:
>  * Split EDR support patch into multiple patches.
>  * Addressed Bjorn's comments.
> 
> Changes since v1:
>  * Rebased on top of v5.1-rc1
> 
> Bjorn Helgaas (1):
>   PCI/DPC: Move DPC data into struct pci_dev
> 
> Kuppuswamy Sathyanarayanan (10):
>   PCI/ERR: Update error status after reset_link()
>   PCI: move {pciehp,shpchp}_is_native() definitions to pci.c
>   PCI/DPC: Fix DPC recovery issue in non hotplug case
>   PCI/ERR: Remove service dependency in pcie_do_recovery()
>   PCI/ERR: Return status of pcie_do_recovery()
>   PCI/DPC: Cache DPC capabilities in pci_init_capabilities()
>   PCI/AER: Add pci_aer_raw_clear_status() to unconditionally clear Error
>     Status
>   PCI/DPC: Expose dpc_process_error(), dpc_reset_link() for use by EDR
>   PCI/DPC: Add Error Disconnect Recover (EDR) support
>   PCI/AER: Rationalize error status register clearing

Applied to pci/edr for v5.7, except these two:

    PCI: move {pciehp,shpchp}_is_native() definitions to pci.c
    PCI/DPC: Fix DPC recovery issue in non hotplug case

>  Documentation/PCI/pcieaer-howto.rst       |  23 +-
>  drivers/acpi/pci_root.c                   |  15 ++
>  drivers/net/ethernet/intel/ice/ice_main.c |   4 +-
>  drivers/ntb/hw/idt/ntb_hw_idt.c           |   4 +-
>  drivers/pci/pci-acpi.c                    |  40 +---
>  drivers/pci/pci.c                         |  40 +++-
>  drivers/pci/pci.h                         |  13 +-
>  drivers/pci/pcie/Kconfig                  |  10 +
>  drivers/pci/pcie/Makefile                 |   1 +
>  drivers/pci/pcie/aer.c                    |  40 ++--
>  drivers/pci/pcie/dpc.c                    | 145 ++++++-------
>  drivers/pci/pcie/edr.c                    | 251 ++++++++++++++++++++++
>  drivers/pci/pcie/err.c                    |  67 ++----
>  drivers/pci/pcie/portdrv.h                |   5 -
>  drivers/pci/pcie/portdrv_core.c           |  21 --
>  drivers/pci/probe.c                       |   2 +
>  drivers/scsi/lpfc/lpfc_attr.c             |   4 +-
>  include/linux/acpi.h                      |   6 +-
>  include/linux/aer.h                       |   9 +-
>  include/linux/pci-acpi.h                  |   8 +
>  include/linux/pci.h                       |   6 +
>  include/linux/pci_hotplug.h               |   7 +-
>  22 files changed, 471 insertions(+), 250 deletions(-)
>  create mode 100644 drivers/pci/pcie/edr.c
> 
> -- 
> 2.17.1
> 


* Re: [PATCH v18 00/11] Add Error Disconnect Recover (EDR) support
  2020-03-31 15:28 ` [PATCH v18 00/11] Add Error Disconnect Recover (EDR) support Bjorn Helgaas
@ 2020-03-31 16:28   ` Kuppuswamy, Sathyanarayanan
  0 siblings, 0 replies; 29+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2020-03-31 16:28 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linux-kernel, ashok.raj

Hi Bjorn,

On 3/31/20 8:28 AM, Bjorn Helgaas wrote:
> Applied to pci/edr for v5.7, except these two:
Great. Thanks.
> 
>      PCI: move {pciehp,shpchp}_is_native() definitions to pci.c
>      PCI/DPC: Fix DPC recovery issue in non hotplug case


* Re: [PATCH v18 10/11] PCI/DPC: Add Error Disconnect Recover (EDR) support
  2020-03-24  0:26 ` [PATCH v18 10/11] PCI/DPC: Add Error Disconnect Recover (EDR) support sathyanarayanan.kuppuswamy
  2020-03-24 21:37   ` Bjorn Helgaas
@ 2024-04-11 18:07   ` Bjorn Helgaas
  2024-04-11 19:16     ` Kuppuswamy Sathyanarayanan
  1 sibling, 1 reply; 29+ messages in thread
From: Bjorn Helgaas @ 2024-04-11 18:07 UTC (permalink / raw)
  To: sathyanarayanan.kuppuswamy
  Cc: bhelgaas, linux-pci, linux-kernel, ashok.raj, Len Brown,
	Rafael J. Wysocki

On Mon, Mar 23, 2020 at 05:26:07PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> 
> Error Disconnect Recover (EDR) is a feature that allows ACPI firmware to
> notify OSPM that a device has been disconnected due to an error condition
> (ACPI v6.3, sec 5.6.6).  OSPM advertises its support for EDR on PCI devices
> via _OSC (see [1], sec 4.5.1, table 4-4).  The OSPM EDR notify handler
> should invalidate software state associated with disconnected devices and
> may attempt to recover them.  OSPM communicates the status of recovery to
> the firmware via _OST (sec 6.3.5.2).
> 
> For PCIe, firmware may use Downstream Port Containment (DPC) to support
> EDR.  Per [1], sec 4.5.1, table 4-6, even if firmware has retained control
> of DPC, OSPM may read/write DPC control and status registers during the EDR
> notification processing window, i.e., from the time it receives an EDR
> notification until it clears the DPC Trigger Status.
> 
> Note that per [1], sec 4.5.1 and 4.5.2.4,
> 
>   1. If the OS supports EDR, it should advertise that to firmware by
>      setting OSC_PCI_EDR_SUPPORT in _OSC Support.
> 
>   2. If the OS sets OSC_PCI_EXPRESS_DPC_CONTROL in _OSC Control to request
>      control of the DPC capability, it must also set OSC_PCI_EDR_SUPPORT in
>      _OSC Support.
> 
> Add an EDR notify handler to attempt recovery.
> 
> [1] Downstream Port Containment Related Enhancements ECN, Jan 28, 2019,
>     affecting PCI Firmware Specification, Rev. 3.2
>     https://members.pcisig.com/wg/PCI-SIG/document/12888
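The two _OSC rules quoted above reduce to a simple bit-mask check. This is a userspace sketch: the mask values are assumed to match the OSC_PCI_EDR_SUPPORT and OSC_PCI_EXPRESS_DPC_CONTROL definitions in include/linux/acpi.h and should be verified there, and osc_build()/osc_request_ok() are hypothetical helpers, not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match include/linux/acpi.h; verify before relying on them. */
#define OSC_PCI_EDR_SUPPORT          0x00000080u /* _OSC Support field */
#define OSC_PCI_EXPRESS_DPC_CONTROL  0x00000080u /* _OSC Control field */

/* Rule 2: requesting DPC control requires advertising EDR support. */
static bool osc_request_ok(uint32_t support, uint32_t control)
{
	if ((control & OSC_PCI_EXPRESS_DPC_CONTROL) &&
	    !(support & OSC_PCI_EDR_SUPPORT))
		return false;
	return true;
}

/*
 * Rule 1: advertise EDR when the OS supports it (for example, when built
 * with CONFIG_PCIE_EDR), and only then request DPC control.
 */
static void osc_build(bool os_supports_edr, bool want_dpc,
		      uint32_t *support, uint32_t *control)
{
	if (os_supports_edr)
		*support |= OSC_PCI_EDR_SUPPORT;
	if (want_dpc && os_supports_edr)
		*control |= OSC_PCI_EXPRESS_DPC_CONTROL;
}
```

Gating the DPC control request on EDR support keeps the built masks valid by construction; osc_request_ok() is then only a cross-check.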

> +static int acpi_enable_dpc(struct pci_dev *pdev)
> +{
> +	struct acpi_device *adev = ACPI_COMPANION(&pdev->dev);
> +	union acpi_object *obj, argv4, req;
> +	int status;
> +
> +	/*
> +	 * Some firmware implementations will return default values for
> +	 * unsupported _DSM calls. So checking acpi_evaluate_dsm() return
> +	 * value for NULL condition is not a complete method for finding
> +	 * whether given _DSM function is supported or not. So use
> +	 * explicit func 0 call to find whether given _DSM function is
> +	 * supported or not.
> +	 */
> +	status = acpi_check_dsm(adev->handle, &pci_acpi_dsm_guid, 5,
> +				1ULL << EDR_PORT_DPC_ENABLE_DSM);
> +	if (!status)
> +		return 0;
> +
> +	status = 0;
> +	req.type = ACPI_TYPE_INTEGER;
> +	req.integer.value = 1;
> +
> +	argv4.type = ACPI_TYPE_PACKAGE;
> +	argv4.package.count = 1;
> +	argv4.package.elements = &req;
> +
> +	/*
> +	 * Per Downstream Port Containment Related Enhancements ECN to PCI
> +	 * Firmware Specification r3.2, sec 4.6.12, EDR_PORT_DPC_ENABLE_DSM is
> +	 * optional.  Return success if it's not implemented.
> +	 */
> +	obj = acpi_evaluate_dsm(adev->handle, &pci_acpi_dsm_guid, 5,
> +				EDR_PORT_DPC_ENABLE_DSM, &argv4);

This has been upstream for a while, just a follow-up question: this
_DSM function was defined by the ECN with Rev 5.  The ECN was
incorporated into the PCI Firmware spec r3.3 with slightly different
behavior as Rev 6.

The main differences are:

  ECN
    - Rev 5
    - Arg3 is an Integer
    - Return is 0 (DPC disabled) or 1 (DPC enabled)

  r3.3 spec
    - Rev 6
    - Arg3 is a Package of one Integer
    - Return is 0 (DPC disabled, Hot-Plug Surprise may be set), 1 (DPC
      enabled, Hot-Plug Surprise may be cleared), or 2 (failure)

So the question is whether this actually implements Rev 5 or Rev 6?
It looks like this builds a *package* for Arg3 (which would correspond
to Rev 6), but we're evaluating Rev 5, which specified an Integer.

The meaning of the Arg3 values is basically the same, so I don't see
an issue there, but it looks like if a platform implemented Rev 5
according to the ECN to take a bare Integer, this might not work
correctly.
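The Rev 5 / Rev 6 differences listed above can be captured in a small table-driven model. This is a userspace sketch, not the kernel's implementation; the struct, enum, and function names are hypothetical, and the values come only from the comparison in this message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shape of Arg3 that firmware expects for EDR_PORT_DPC_ENABLE_DSM. */
enum arg3_kind { ARG3_INTEGER, ARG3_PACKAGE_OF_ONE_INTEGER };

struct dpc_enable_variant {
	uint64_t rev;          /* _DSM revision passed by the caller */
	enum arg3_kind arg3;   /* expected Arg3 shape */
	uint64_t max_result;   /* largest defined return value */
};

static const struct dpc_enable_variant ecn_rev5 = { 5, ARG3_INTEGER, 1 };
static const struct dpc_enable_variant r33_rev6 = { 6, ARG3_PACKAGE_OF_ONE_INTEGER, 2 };

enum dpc_enable_result { DPC_DISABLED, DPC_ENABLED, DPC_FAILED, DPC_UNDEFINED };

/* Interpret the _DSM return value according to the given revision. */
static enum dpc_enable_result interpret_dpc_enable(const struct dpc_enable_variant *v,
						   uint64_t ret)
{
	if (ret > v->max_result)
		return DPC_UNDEFINED;
	switch (ret) {
	case 0:
		return DPC_DISABLED;  /* Rev 6: Hot-Plug Surprise may be set */
	case 1:
		return DPC_ENABLED;   /* Rev 6: Hot-Plug Surprise may be cleared */
	default:
		return DPC_FAILED;    /* 2 is defined only in Rev 6 */
	}
}

/* Does the Arg3 shape the caller builds match what this revision defines? */
static bool arg3_shape_matches(const struct dpc_enable_variant *v,
			       enum arg3_kind passed)
{
	return v->arg3 == passed;
}
```

In this model, the posted code corresponds to evaluating ecn_rev5 while passing ARG3_PACKAGE_OF_ONE_INTEGER, which arg3_shape_matches() reports as a mismatch.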

> +	if (!obj)
> +		return 0;
> +
> +	if (obj->type != ACPI_TYPE_INTEGER) {
> +		pci_err(pdev, FW_BUG "Enable DPC _DSM returned non integer\n");
> +		status = -EIO;
> +	}
> +
> +	if (obj->integer.value != 1) {
> +		pci_err(pdev, "Enable DPC _DSM failed to enable DPC\n");
> +		status = -EIO;
> +	}
> +
> +	ACPI_FREE(obj);
> +
> +	return status;
> +}
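A related detail in the quoted function: when the type check fails, the code still goes on to read obj->integer.value. Below is a minimal userspace sketch of checking the result in a safe order. struct model_obj and model_check_dpc_enable_result() are hypothetical stand-ins for union acpi_object and the tail of acpi_enable_dpc(), and the type values are illustrative only.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the parts of union acpi_object used here. */
enum model_obj_type { MODEL_TYPE_INTEGER = 1, MODEL_TYPE_PACKAGE = 4 };

struct model_obj {
	enum model_obj_type type;
	uint64_t integer_value;   /* meaningful only when type is INTEGER */
};

/*
 * Returns 0 on success, -EIO otherwise.  A NULL result means the optional
 * _DSM is not implemented, which the quoted code treats as success.
 */
static int model_check_dpc_enable_result(const struct model_obj *obj)
{
	if (obj == NULL)
		return 0;
	/* Stop on a type mismatch so integer_value is never read. */
	if (obj->type != MODEL_TYPE_INTEGER)
		return -EIO;
	if (obj->integer_value != 1)
		return -EIO;
	return 0;
}
```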


* Re: [PATCH v18 10/11] PCI/DPC: Add Error Disconnect Recover (EDR) support
  2024-04-11 18:07   ` Bjorn Helgaas
@ 2024-04-11 19:16     ` Kuppuswamy Sathyanarayanan
  0 siblings, 0 replies; 29+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2024-04-11 19:16 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: bhelgaas, linux-pci, linux-kernel, ashok.raj, Len Brown,
	Rafael J. Wysocki


On 4/11/24 11:07 AM, Bjorn Helgaas wrote:
> On Mon, Mar 23, 2020 at 05:26:07PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
>> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>
>> Error Disconnect Recover (EDR) is a feature that allows ACPI firmware to
>> notify OSPM that a device has been disconnected due to an error condition
>> (ACPI v6.3, sec 5.6.6).  OSPM advertises its support for EDR on PCI devices
>> via _OSC (see [1], sec 4.5.1, table 4-4).  The OSPM EDR notify handler
>> should invalidate software state associated with disconnected devices and
>> may attempt to recover them.  OSPM communicates the status of recovery to
>> the firmware via _OST (sec 6.3.5.2).
>>
>> For PCIe, firmware may use Downstream Port Containment (DPC) to support
>> EDR.  Per [1], sec 4.5.1, table 4-6, even if firmware has retained control
>> of DPC, OSPM may read/write DPC control and status registers during the EDR
>> notification processing window, i.e., from the time it receives an EDR
>> notification until it clears the DPC Trigger Status.
>>
>> Note that per [1], sec 4.5.1 and 4.5.2.4,
>>
>>   1. If the OS supports EDR, it should advertise that to firmware by
>>      setting OSC_PCI_EDR_SUPPORT in _OSC Support.
>>
>>   2. If the OS sets OSC_PCI_EXPRESS_DPC_CONTROL in _OSC Control to request
>>      control of the DPC capability, it must also set OSC_PCI_EDR_SUPPORT in
>>      _OSC Support.
>>
>> Add an EDR notify handler to attempt recovery.
>>
>> [1] Downstream Port Containment Related Enhancements ECN, Jan 28, 2019,
>>     affecting PCI Firmware Specification, Rev. 3.2
>>     https://members.pcisig.com/wg/PCI-SIG/document/12888
>> +static int acpi_enable_dpc(struct pci_dev *pdev)
>> +{
>> +	struct acpi_device *adev = ACPI_COMPANION(&pdev->dev);
>> +	union acpi_object *obj, argv4, req;
>> +	int status;
>> +
>> +	/*
>> +	 * Some firmware implementations will return default values for
>> +	 * unsupported _DSM calls. So checking acpi_evaluate_dsm() return
>> +	 * value for NULL condition is not a complete method for finding
>> +	 * whether given _DSM function is supported or not. So use
>> +	 * explicit func 0 call to find whether given _DSM function is
>> +	 * supported or not.
>> +	 */
>> +	status = acpi_check_dsm(adev->handle, &pci_acpi_dsm_guid, 5,
>> +				1ULL << EDR_PORT_DPC_ENABLE_DSM);
>> +	if (!status)
>> +		return 0;
>> +
>> +	status = 0;
>> +	req.type = ACPI_TYPE_INTEGER;
>> +	req.integer.value = 1;
>> +
>> +	argv4.type = ACPI_TYPE_PACKAGE;
>> +	argv4.package.count = 1;
>> +	argv4.package.elements = &req;
>> +
>> +	/*
>> +	 * Per Downstream Port Containment Related Enhancements ECN to PCI
>> +	 * Firmware Specification r3.2, sec 4.6.12, EDR_PORT_DPC_ENABLE_DSM is
>> +	 * optional.  Return success if it's not implemented.
>> +	 */
>> +	obj = acpi_evaluate_dsm(adev->handle, &pci_acpi_dsm_guid, 5,
>> +				EDR_PORT_DPC_ENABLE_DSM, &argv4);
> This has been upstream for a while, just a follow-up question: this
> _DSM function was defined by the ECN with Rev 5.  The ECN was
> incorporated into the PCI Firmware spec r3.3 with slightly different
> behavior as Rev 6.
>
> The main differences are:
>
>   ECN
>     - Rev 5
>     - Arg3 is an Integer
>     - Return is 0 (DPC disabled) or 1 (DPC enabled)
>
>   r3.3 spec
>     - Rev 6
>     - Arg3 is a Package of one Integer
>     - Return is 0 (DPC disabled, Hot-Plug Surprise may be set), 1 (DPC
>       enabled, Hot-Plug Surprise may be cleared), or 2 (failure)
>
> So the question is whether this actually implements Rev 5 or Rev 6?
> It looks like this builds a *package* for Arg3 (which would correspond
> to Rev 6), but we're evaluating Rev 5, which specified an Integer.
>
> The meaning of the Arg3 values is basically the same, so I don't see
> an issue there, but it looks like if a platform implemented Rev 5
> according to the ECN to take a bare Integer, this might not work
> correctly.

I think it implements Rev 6, so the revision number (5) passed to acpi_check_dsm() and acpi_evaluate_dsm() needs to be updated to 6.

If you would like, I can submit a patch to fix it.

>
>> +	if (!obj)
>> +		return 0;
>> +
>> +	if (obj->type != ACPI_TYPE_INTEGER) {
>> +		pci_err(pdev, FW_BUG "Enable DPC _DSM returned non integer\n");
>> +		status = -EIO;
>> +	}
>> +
>> +	if (obj->integer.value != 1) {
>> +		pci_err(pdev, "Enable DPC _DSM failed to enable DPC\n");
>> +		status = -EIO;
>> +	}
>> +
>> +	ACPI_FREE(obj);
>> +
>> +	return status;
>> +}

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer



Thread overview: 29+ messages
2020-03-24  0:25 [PATCH v18 00/11] Add Error Disconnect Recover (EDR) support sathyanarayanan.kuppuswamy
2020-03-24  0:25 ` [PATCH v18 01/11] PCI/ERR: Update error status after reset_link() sathyanarayanan.kuppuswamy
2020-03-24  0:25 ` [PATCH v18 02/11] PCI: move {pciehp,shpchp}_is_native() definitions to pci.c sathyanarayanan.kuppuswamy
2020-03-24  0:26 ` [PATCH v18 03/11] PCI/DPC: Fix DPC recovery issue in non hotplug case sathyanarayanan.kuppuswamy
2020-03-24 23:49   ` Bjorn Helgaas
2020-03-25  1:17     ` Kuppuswamy, Sathyanarayanan
2020-03-28 17:10       ` Bjorn Helgaas
2020-03-28 22:04         ` Kuppuswamy, Sathyanarayanan
2020-03-28 22:21           ` Bjorn Helgaas
2020-03-28 22:40             ` Kuppuswamy, Sathyanarayanan
2020-03-24  0:26 ` [PATCH v18 04/11] PCI/DPC: Move DPC data into struct pci_dev sathyanarayanan.kuppuswamy
2020-03-24  0:26 ` [PATCH v18 05/11] PCI/ERR: Remove service dependency in pcie_do_recovery() sathyanarayanan.kuppuswamy
2020-03-28 21:12   ` Kuppuswamy, Sathyanarayanan
2020-03-28 21:32     ` Bjorn Helgaas
2020-03-28 21:55       ` Kuppuswamy, Sathyanarayanan
2020-03-28 22:16         ` Bjorn Helgaas
2020-03-24  0:26 ` [PATCH v18 06/11] PCI/ERR: Return status of pcie_do_recovery() sathyanarayanan.kuppuswamy
2020-03-24  0:26 ` [PATCH v18 07/11] PCI/DPC: Cache DPC capabilities in pci_init_capabilities() sathyanarayanan.kuppuswamy
2020-03-24  0:26 ` [PATCH v18 08/11] PCI/AER: Add pci_aer_raw_clear_status() to unconditionally clear Error Status sathyanarayanan.kuppuswamy
2020-03-24  0:26 ` [PATCH v18 09/11] PCI/DPC: Expose dpc_process_error(), dpc_reset_link() for use by EDR sathyanarayanan.kuppuswamy
2020-03-24  0:26 ` [PATCH v18 10/11] PCI/DPC: Add Error Disconnect Recover (EDR) support sathyanarayanan.kuppuswamy
2020-03-24 21:37   ` Bjorn Helgaas
2020-03-25  1:00     ` Kuppuswamy, Sathyanarayanan
2020-03-26 22:36       ` Bjorn Helgaas
2024-04-11 18:07   ` Bjorn Helgaas
2024-04-11 19:16     ` Kuppuswamy Sathyanarayanan
2020-03-24  0:26 ` [PATCH v18 11/11] PCI/AER: Rationalize error status register clearing sathyanarayanan.kuppuswamy
2020-03-31 15:28 ` [PATCH v18 00/11] Add Error Disconnect Recover (EDR) support Bjorn Helgaas
2020-03-31 16:28   ` Kuppuswamy, Sathyanarayanan
