All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v10 0/7] Address error and recovery for AER and DPC
@ 2018-02-22 16:02 Oza Pawandeep
  2018-02-22 16:02 ` [PATCH v10 1/7] PCI/AER: Rename error recovery to generic pci naming Oza Pawandeep
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Oza Pawandeep @ 2018-02-22 16:02 UTC (permalink / raw)
  To: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi
  Cc: Oza Pawandeep

This patch set brings in error handling support for DPC

The current implementation of AER and error message broadcasting to the
EP driver is tightly coupled and limited to AER service driver.
It is important to factor out broadcasting and other link handling
callbacks. So that not only when AER gets triggered, but also when DPC get
triggered (for e.g. ERR_FATAL), callbacks are handled appropriately.

DPC should enumerate the devices after recovering the link, which is
achieved by implementing error_resume callback.

Changes since v9:
    Sinan's comments addressed.
Changes since v8:
    Fixed Kbuild errors.
Changes since v7:
    Rebased the code on pci master
	> https://kernel.googlesource.com/pub/scm/linux/kernel/git/helgaas/pci
Changes since v6:
    Sinan's and Stefan's comments implemented.
	> reordered patch 6 and 7
	> cleaned up
Changes since v5:
    Sinan's and Keith's comments incorporated.
        > made separate patch for mutex
        > unified error repotting codes into driver/pci/pci.h
        > got rid of wait link active/inactive and
          made generic function in driver/pci/pci.c
Changes since v4:
    Bjorn's comments incorporated.
        > Renamed only do_recovery.
        > moved the things more locally to drivers/pci/pci.h
Changes since v3:
    Bjorn's comments incorporated.
        > Made separate patch renaming generic pci_err.c
        > Introduce pci_err.h to contain all the error types and recovery
        > removed all the dependencies on pci.h
Changes since v2:
    Based on feedback from Keith:
    "
    When DPC is triggered due to receipt of an uncorrectable error Message,
    the Requester ID from the Message is recorded in the DPC Error
    Source ID register and that Message is discarded and not forwarded Upstream.
    "
    Removed the patch where AER checks if DPC service is active
Changes since v1:
    Kbuild errors fixed:
        > pci_find_dpc_dev made static
        > ras_event.h updated
        > pci_find_aer_service call with CONFIG check
        > pci_find_dpc_service call with CONFIG check

Oza Pawandeep (7):
  PCI/AER: Rename error recovery to generic pci naming
  PCI/AER: factor out error reporting from AER
  PCI/ERR: add mutex to synchronize recovery
  PCI/DPC: Unify and plumb error handling into DPC
  PCI/AER: Unify aer error defines at single space
  PCI/DPC: Enumerate the devices after DPC trigger event
  PCI: Unify wait for link active into generic pci

 drivers/acpi/apei/ghes.c               |   1 +
 drivers/pci/hotplug/pciehp_hpc.c       |  21 +-
 drivers/pci/pci.c                      |  39 +++-
 drivers/pci/pci.h                      |  11 +
 drivers/pci/pcie/Makefile              |   2 +-
 drivers/pci/pcie/aer/aerdrv.h          |  30 ---
 drivers/pci/pcie/aer/aerdrv_core.c     | 293 +-------------------------
 drivers/pci/pcie/aer/aerdrv_errprint.c |   1 +
 drivers/pci/pcie/pcie-dpc.c            | 115 ++++++++++-
 drivers/pci/pcie/pcie-err.c            | 366 +++++++++++++++++++++++++++++++++
 drivers/pci/pcie/portdrv.h             |   2 +
 include/linux/aer.h                    |   4 -
 include/linux/pci.h                    |   1 +
 13 files changed, 534 insertions(+), 352 deletions(-)
 create mode 100644 drivers/pci/pcie/pcie-err.c

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.,
a Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v10 1/7] PCI/AER: Rename error recovery to generic pci naming
  2018-02-22 16:02 [PATCH v10 0/7] Address error and recovery for AER and DPC Oza Pawandeep
@ 2018-02-22 16:02 ` Oza Pawandeep
  2018-02-22 18:34   ` Christoph Hellwig
  2018-02-22 16:02 ` [PATCH v10 2/7] PCI/AER: factor out error reporting from AER Oza Pawandeep
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Oza Pawandeep @ 2018-02-22 16:02 UTC (permalink / raw)
  To: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi
  Cc: Oza Pawandeep

This patch renames error recovery to generic name with pci prefix

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>

diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index a4bfea5..306bf2f 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -478,7 +478,7 @@ static pci_ers_result_t reset_link(struct pci_dev *dev)
 }
 
 /**
- * do_recovery - handle nonfatal/fatal error recovery process
+ * pci_do_recovery - handle nonfatal/fatal error recovery process
  * @dev: pointer to a pci_dev data structure of agent detecting an error
  * @severity: error severity type
  *
@@ -486,7 +486,7 @@ static pci_ers_result_t reset_link(struct pci_dev *dev)
  * error detected message to all downstream drivers within a hierarchy in
  * question and return the returned code.
  */
-static void do_recovery(struct pci_dev *dev, int severity)
+static void pci_do_recovery(struct pci_dev *dev, int severity)
 {
 	pci_ers_result_t status, result = PCI_ERS_RESULT_RECOVERED;
 	enum pci_channel_state state;
@@ -566,7 +566,7 @@ static void handle_error_source(struct pcie_device *aerdev,
 			pci_write_config_dword(dev, pos + PCI_ERR_COR_STATUS,
 					info->status);
 	} else
-		do_recovery(dev, info->severity);
+		pci_do_recovery(dev, info->severity);
 }
 
 #ifdef CONFIG_ACPI_APEI_PCIEAER
@@ -631,7 +631,7 @@ static void aer_recover_work_func(struct work_struct *work)
 		}
 		cper_print_aer(pdev, entry.severity, entry.regs);
 		if (entry.severity != AER_CORRECTABLE)
-			do_recovery(pdev, entry.severity);
+			pci_do_recovery(pdev, entry.severity);
 		pci_dev_put(pdev);
 	}
 }
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.,
a Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v10 2/7] PCI/AER: factor out error reporting from AER
  2018-02-22 16:02 [PATCH v10 0/7] Address error and recovery for AER and DPC Oza Pawandeep
  2018-02-22 16:02 ` [PATCH v10 1/7] PCI/AER: Rename error recovery to generic pci naming Oza Pawandeep
@ 2018-02-22 16:02 ` Oza Pawandeep
  2018-02-22 18:34   ` Christoph Hellwig
  2018-02-22 16:02 ` [PATCH v10 3/7] PCI/ERR: add mutex to synchronize recovery Oza Pawandeep
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Oza Pawandeep @ 2018-02-22 16:02 UTC (permalink / raw)
  To: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi
  Cc: Oza Pawandeep

This patch factors out error reporting callbacks, which are currently
tightly coupled with AER.
DPC should be able to register callbacks and attmept recovery when DPC
trigger event occurs.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index fcd8191..a5a79f0 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -342,6 +342,9 @@ static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
 
 void pci_enable_acs(struct pci_dev *dev);
 
+/* PCI error reporting and recovery */
+void pci_do_recovery(struct pci_dev *dev, int severity);
+
 #ifdef CONFIG_PCIEASPM
 void pcie_aspm_init_link_state(struct pci_dev *pdev);
 void pcie_aspm_exit_link_state(struct pci_dev *pdev);
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index 223e4c3..d669497 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -6,7 +6,7 @@
 # Build PCI Express ASPM if needed
 obj-$(CONFIG_PCIEASPM)		+= aspm.o
 
-pcieportdrv-y			:= portdrv_core.o portdrv_pci.o portdrv_bus.o
+pcieportdrv-y			:= portdrv_core.o portdrv_pci.o portdrv_bus.o pcie-err.o
 pcieportdrv-$(CONFIG_ACPI)	+= portdrv_acpi.o
 
 obj-$(CONFIG_PCIEPORTBUS)	+= pcieportdrv.o
diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
index 5449e5c..bc9db53 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -76,36 +76,6 @@ struct aer_rpc {
 					 */
 };
 
-struct aer_broadcast_data {
-	enum pci_channel_state state;
-	enum pci_ers_result result;
-};
-
-static inline pci_ers_result_t merge_result(enum pci_ers_result orig,
-		enum pci_ers_result new)
-{
-	if (new == PCI_ERS_RESULT_NO_AER_DRIVER)
-		return PCI_ERS_RESULT_NO_AER_DRIVER;
-
-	if (new == PCI_ERS_RESULT_NONE)
-		return orig;
-
-	switch (orig) {
-	case PCI_ERS_RESULT_CAN_RECOVER:
-	case PCI_ERS_RESULT_RECOVERED:
-		orig = new;
-		break;
-	case PCI_ERS_RESULT_DISCONNECT:
-		if (new == PCI_ERS_RESULT_NEED_RESET)
-			orig = PCI_ERS_RESULT_NEED_RESET;
-		break;
-	default:
-		break;
-	}
-
-	return orig;
-}
-
 extern struct bus_type pcie_port_bus_type;
 void aer_isr(struct work_struct *work);
 void aer_print_error(struct pci_dev *dev, struct aer_err_info *info);
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index 306bf2f..410d4b8 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -23,6 +23,7 @@
 #include <linux/slab.h>
 #include <linux/kfifo.h>
 #include "aerdrv.h"
+#include "../../pci.h"
 
 #define	PCI_EXP_AER_FLAGS	(PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE | \
 				 PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)
@@ -230,191 +231,6 @@ static bool find_source_device(struct pci_dev *parent,
 	return true;
 }
 
-static int report_error_detected(struct pci_dev *dev, void *data)
-{
-	pci_ers_result_t vote;
-	const struct pci_error_handlers *err_handler;
-	struct aer_broadcast_data *result_data;
-	result_data = (struct aer_broadcast_data *) data;
-
-	device_lock(&dev->dev);
-	dev->error_state = result_data->state;
-
-	if (!dev->driver ||
-		!dev->driver->err_handler ||
-		!dev->driver->err_handler->error_detected) {
-		if (result_data->state == pci_channel_io_frozen &&
-			dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) {
-			/*
-			 * In case of fatal recovery, if one of down-
-			 * stream device has no driver. We might be
-			 * unable to recover because a later insmod
-			 * of a driver for this device is unaware of
-			 * its hw state.
-			 */
-			pci_printk(KERN_DEBUG, dev, "device has %s\n",
-				   dev->driver ?
-				   "no AER-aware driver" : "no driver");
-		}
-
-		/*
-		 * If there's any device in the subtree that does not
-		 * have an error_detected callback, returning
-		 * PCI_ERS_RESULT_NO_AER_DRIVER prevents calling of
-		 * the subsequent mmio_enabled/slot_reset/resume
-		 * callbacks of "any" device in the subtree. All the
-		 * devices in the subtree are left in the error state
-		 * without recovery.
-		 */
-
-		if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE)
-			vote = PCI_ERS_RESULT_NO_AER_DRIVER;
-		else
-			vote = PCI_ERS_RESULT_NONE;
-	} else {
-		err_handler = dev->driver->err_handler;
-		vote = err_handler->error_detected(dev, result_data->state);
-		pci_uevent_ers(dev, PCI_ERS_RESULT_NONE);
-	}
-
-	result_data->result = merge_result(result_data->result, vote);
-	device_unlock(&dev->dev);
-	return 0;
-}
-
-static int report_mmio_enabled(struct pci_dev *dev, void *data)
-{
-	pci_ers_result_t vote;
-	const struct pci_error_handlers *err_handler;
-	struct aer_broadcast_data *result_data;
-	result_data = (struct aer_broadcast_data *) data;
-
-	device_lock(&dev->dev);
-	if (!dev->driver ||
-		!dev->driver->err_handler ||
-		!dev->driver->err_handler->mmio_enabled)
-		goto out;
-
-	err_handler = dev->driver->err_handler;
-	vote = err_handler->mmio_enabled(dev);
-	result_data->result = merge_result(result_data->result, vote);
-out:
-	device_unlock(&dev->dev);
-	return 0;
-}
-
-static int report_slot_reset(struct pci_dev *dev, void *data)
-{
-	pci_ers_result_t vote;
-	const struct pci_error_handlers *err_handler;
-	struct aer_broadcast_data *result_data;
-	result_data = (struct aer_broadcast_data *) data;
-
-	device_lock(&dev->dev);
-	if (!dev->driver ||
-		!dev->driver->err_handler ||
-		!dev->driver->err_handler->slot_reset)
-		goto out;
-
-	err_handler = dev->driver->err_handler;
-	vote = err_handler->slot_reset(dev);
-	result_data->result = merge_result(result_data->result, vote);
-out:
-	device_unlock(&dev->dev);
-	return 0;
-}
-
-static int report_resume(struct pci_dev *dev, void *data)
-{
-	const struct pci_error_handlers *err_handler;
-
-	device_lock(&dev->dev);
-	dev->error_state = pci_channel_io_normal;
-
-	if (!dev->driver ||
-		!dev->driver->err_handler ||
-		!dev->driver->err_handler->resume)
-		goto out;
-
-	err_handler = dev->driver->err_handler;
-	err_handler->resume(dev);
-	pci_uevent_ers(dev, PCI_ERS_RESULT_RECOVERED);
-out:
-	device_unlock(&dev->dev);
-	return 0;
-}
-
-/**
- * broadcast_error_message - handle message broadcast to downstream drivers
- * @dev: pointer to from where in a hierarchy message is broadcasted down
- * @state: error state
- * @error_mesg: message to print
- * @cb: callback to be broadcasted
- *
- * Invoked during error recovery process. Once being invoked, the content
- * of error severity will be broadcasted to all downstream drivers in a
- * hierarchy in question.
- */
-static pci_ers_result_t broadcast_error_message(struct pci_dev *dev,
-	enum pci_channel_state state,
-	char *error_mesg,
-	int (*cb)(struct pci_dev *, void *))
-{
-	struct aer_broadcast_data result_data;
-
-	pci_printk(KERN_DEBUG, dev, "broadcast %s message\n", error_mesg);
-	result_data.state = state;
-	if (cb == report_error_detected)
-		result_data.result = PCI_ERS_RESULT_CAN_RECOVER;
-	else
-		result_data.result = PCI_ERS_RESULT_RECOVERED;
-
-	if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
-		/*
-		 * If the error is reported by a bridge, we think this error
-		 * is related to the downstream link of the bridge, so we
-		 * do error recovery on all subordinates of the bridge instead
-		 * of the bridge and clear the error status of the bridge.
-		 */
-		if (cb == report_error_detected)
-			dev->error_state = state;
-		pci_walk_bus(dev->subordinate, cb, &result_data);
-		if (cb == report_resume) {
-			pci_cleanup_aer_uncorrect_error_status(dev);
-			dev->error_state = pci_channel_io_normal;
-		}
-	} else {
-		/*
-		 * If the error is reported by an end point, we think this
-		 * error is related to the upstream link of the end point.
-		 */
-		if (state == pci_channel_io_normal)
-			/*
-			 * the error is non fatal so the bus is ok, just invoke
-			 * the callback for the function that logged the error.
-			 */
-			cb(dev, &result_data);
-		else
-			pci_walk_bus(dev->bus, cb, &result_data);
-	}
-
-	return result_data.result;
-}
-
-/**
- * default_reset_link - default reset function
- * @dev: pointer to pci_dev data structure
- *
- * Invoked when performing link reset on a Downstream Port or a
- * Root Port with no aer driver.
- */
-static pci_ers_result_t default_reset_link(struct pci_dev *dev)
-{
-	pci_reset_bridge_secondary_bus(dev);
-	pci_printk(KERN_DEBUG, dev, "downstream link has been reset\n");
-	return PCI_ERS_RESULT_RECOVERED;
-}
-
 static int find_aer_service_iter(struct device *device, void *data)
 {
 	struct pcie_port_service_driver *service_driver, **drv;
@@ -432,7 +248,7 @@ static int find_aer_service_iter(struct device *device, void *data)
 	return 0;
 }
 
-static struct pcie_port_service_driver *find_aer_service(struct pci_dev *dev)
+struct pcie_port_service_driver *pci_find_aer_service(struct pci_dev *dev)
 {
 	struct pcie_port_service_driver *drv = NULL;
 
@@ -440,107 +256,7 @@ static struct pcie_port_service_driver *find_aer_service(struct pci_dev *dev)
 
 	return drv;
 }
-
-static pci_ers_result_t reset_link(struct pci_dev *dev)
-{
-	struct pci_dev *udev;
-	pci_ers_result_t status;
-	struct pcie_port_service_driver *driver;
-
-	if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
-		/* Reset this port for all subordinates */
-		udev = dev;
-	} else {
-		/* Reset the upstream component (likely downstream port) */
-		udev = dev->bus->self;
-	}
-
-	/* Use the aer driver of the component firstly */
-	driver = find_aer_service(udev);
-
-	if (driver && driver->reset_link) {
-		status = driver->reset_link(udev);
-	} else if (udev->has_secondary_link) {
-		status = default_reset_link(udev);
-	} else {
-		pci_printk(KERN_DEBUG, dev, "no link-reset support at upstream device %s\n",
-			pci_name(udev));
-		return PCI_ERS_RESULT_DISCONNECT;
-	}
-
-	if (status != PCI_ERS_RESULT_RECOVERED) {
-		pci_printk(KERN_DEBUG, dev, "link reset at upstream device %s failed\n",
-			pci_name(udev));
-		return PCI_ERS_RESULT_DISCONNECT;
-	}
-
-	return status;
-}
-
-/**
- * pci_do_recovery - handle nonfatal/fatal error recovery process
- * @dev: pointer to a pci_dev data structure of agent detecting an error
- * @severity: error severity type
- *
- * Invoked when an error is nonfatal/fatal. Once being invoked, broadcast
- * error detected message to all downstream drivers within a hierarchy in
- * question and return the returned code.
- */
-static void pci_do_recovery(struct pci_dev *dev, int severity)
-{
-	pci_ers_result_t status, result = PCI_ERS_RESULT_RECOVERED;
-	enum pci_channel_state state;
-
-	if (severity == AER_FATAL)
-		state = pci_channel_io_frozen;
-	else
-		state = pci_channel_io_normal;
-
-	status = broadcast_error_message(dev,
-			state,
-			"error_detected",
-			report_error_detected);
-
-	if (severity == AER_FATAL) {
-		result = reset_link(dev);
-		if (result != PCI_ERS_RESULT_RECOVERED)
-			goto failed;
-	}
-
-	if (status == PCI_ERS_RESULT_CAN_RECOVER)
-		status = broadcast_error_message(dev,
-				state,
-				"mmio_enabled",
-				report_mmio_enabled);
-
-	if (status == PCI_ERS_RESULT_NEED_RESET) {
-		/*
-		 * TODO: Should call platform-specific
-		 * functions to reset slot before calling
-		 * drivers' slot_reset callbacks?
-		 */
-		status = broadcast_error_message(dev,
-				state,
-				"slot_reset",
-				report_slot_reset);
-	}
-
-	if (status != PCI_ERS_RESULT_RECOVERED)
-		goto failed;
-
-	broadcast_error_message(dev,
-				state,
-				"resume",
-				report_resume);
-
-	pci_info(dev, "AER: Device recovery successful\n");
-	return;
-
-failed:
-	pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
-	/* TODO: Should kernel panic here? */
-	pci_info(dev, "AER: Device recovery failed\n");
-}
+EXPORT_SYMBOL(pci_find_aer_service);
 
 /**
  * handle_error_source - handle logging error into an event log
diff --git a/drivers/pci/pcie/pcie-err.c b/drivers/pci/pcie/pcie-err.c
new file mode 100644
index 0000000..a532fe0
--- /dev/null
+++ b/drivers/pci/pcie/pcie-err.c
@@ -0,0 +1,334 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * This file implements the error recovery as a core part of PCIe error reporting.
+ * When a PCIe error is delivered, an error message will be collected and printed
+ * to console, then, an error recovery procedure will be executed by following
+ * the PCI error recovery rules.
+ *
+ * Copyright (C) 2006 Intel Corp.
+ *	Tom Long Nguyen (tom.l.nguyen@intel.com)
+ *	Zhang Yanmin (yanmin.zhang@intel.com)
+ *
+ */
+
+#include <linux/pci.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/aer.h>
+#include <linux/pcieport_if.h>
+#include "portdrv.h"
+
+struct aer_broadcast_data {
+	enum pci_channel_state state;
+	enum pci_ers_result result;
+};
+
+static pci_ers_result_t merge_result(enum pci_ers_result orig,
+				  enum pci_ers_result new)
+{
+	if (new == PCI_ERS_RESULT_NO_AER_DRIVER)
+		return PCI_ERS_RESULT_NO_AER_DRIVER;
+
+	if (new == PCI_ERS_RESULT_NONE)
+		return orig;
+
+	switch (orig) {
+	case PCI_ERS_RESULT_CAN_RECOVER:
+	case PCI_ERS_RESULT_RECOVERED:
+		orig = new;
+		break;
+	case PCI_ERS_RESULT_DISCONNECT:
+		if (new == PCI_ERS_RESULT_NEED_RESET)
+			orig = PCI_ERS_RESULT_NEED_RESET;
+		break;
+	default:
+		break;
+	}
+
+	return orig;
+}
+
+static int report_mmio_enabled(struct pci_dev *dev, void *data)
+{
+	pci_ers_result_t vote;
+	const struct pci_error_handlers *err_handler;
+	struct aer_broadcast_data *result_data;
+
+	result_data = (struct aer_broadcast_data *) data;
+
+	device_lock(&dev->dev);
+	if (!dev->driver ||
+		!dev->driver->err_handler ||
+		!dev->driver->err_handler->mmio_enabled)
+		goto out;
+
+	err_handler = dev->driver->err_handler;
+	vote = err_handler->mmio_enabled(dev);
+	result_data->result = merge_result(result_data->result, vote);
+out:
+	device_unlock(&dev->dev);
+	return 0;
+}
+
+static int report_slot_reset(struct pci_dev *dev, void *data)
+{
+	pci_ers_result_t vote;
+	const struct pci_error_handlers *err_handler;
+	struct aer_broadcast_data *result_data;
+
+	result_data = (struct aer_broadcast_data *) data;
+
+	device_lock(&dev->dev);
+	if (!dev->driver ||
+		!dev->driver->err_handler ||
+		!dev->driver->err_handler->slot_reset)
+		goto out;
+
+	err_handler = dev->driver->err_handler;
+	vote = err_handler->slot_reset(dev);
+	result_data->result = merge_result(result_data->result, vote);
+out:
+	device_unlock(&dev->dev);
+	return 0;
+}
+
+static int report_resume(struct pci_dev *dev, void *data)
+{
+	const struct pci_error_handlers *err_handler;
+
+	device_lock(&dev->dev);
+	dev->error_state = pci_channel_io_normal;
+
+	if (!dev->driver ||
+		!dev->driver->err_handler ||
+		!dev->driver->err_handler->resume)
+		goto out;
+
+	err_handler = dev->driver->err_handler;
+	err_handler->resume(dev);
+out:
+	device_unlock(&dev->dev);
+	return 0;
+}
+
+static int report_error_detected(struct pci_dev *dev, void *data)
+{
+	pci_ers_result_t vote;
+	const struct pci_error_handlers *err_handler;
+	struct aer_broadcast_data *result_data;
+
+	result_data = (struct aer_broadcast_data *) data;
+
+	device_lock(&dev->dev);
+	dev->error_state = result_data->state;
+
+	if (!dev->driver ||
+		!dev->driver->err_handler ||
+		!dev->driver->err_handler->error_detected) {
+		if (result_data->state == pci_channel_io_frozen &&
+			dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) {
+			/*
+			 * In case of fatal recovery, if one of down-
+			 * stream device has no driver. We might be
+			 * unable to recover because a later insmod
+			 * of a driver for this device is unaware of
+			 * its hw state.
+			 */
+			dev_printk(KERN_DEBUG, &dev->dev, "device has %s\n",
+				   dev->driver ?
+				   "no error-aware driver" : "no driver");
+		}
+
+		/*
+		 * If there's any device in the subtree that does not
+		 * have an error_detected callback, returning
+		 * PCI_ERS_RESULT_NO_AER_DRIVER prevents calling of
+		 * the subsequent mmio_enabled/slot_reset/resume
+		 * callbacks of "any" device in the subtree. All the
+		 * devices in the subtree are left in the error state
+		 * without recovery.
+		 */
+
+		if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE)
+			vote = PCI_ERS_RESULT_NO_AER_DRIVER;
+		else
+			vote = PCI_ERS_RESULT_NONE;
+	} else {
+		err_handler = dev->driver->err_handler;
+		vote = err_handler->error_detected(dev, result_data->state);
+	}
+
+	result_data->result = merge_result(result_data->result, vote);
+	device_unlock(&dev->dev);
+	return 0;
+}
+
+/**
+ * default_reset_link - default reset function
+ * @dev: pointer to pci_dev data structure
+ *
+ * Invoked when performing link reset on a Downstream Port or a
+ * Root Port with no aer driver.
+ */
+static pci_ers_result_t default_reset_link(struct pci_dev *dev)
+{
+	pci_reset_bridge_secondary_bus(dev);
+	dev_printk(KERN_DEBUG, &dev->dev, "downstream link has been reset\n");
+	return PCI_ERS_RESULT_RECOVERED;
+}
+
+static pci_ers_result_t reset_link(struct pci_dev *dev)
+{
+	struct pci_dev *udev;
+	pci_ers_result_t status;
+	struct pcie_port_service_driver *driver = NULL;
+
+	if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+		/* Reset this port for all subordinates */
+		udev = dev;
+	} else {
+		/* Reset the upstream component (likely downstream port) */
+		udev = dev->bus->self;
+	}
+
+#if IS_ENABLED(CONFIG_PCIEAER)
+	/* Use the aer driver of the component firstly */
+	driver = pci_find_aer_service(udev);
+#endif
+
+	if (driver && driver->reset_link) {
+		status = driver->reset_link(udev);
+	} else if (udev->has_secondary_link) {
+		status = default_reset_link(udev);
+	} else {
+		dev_printk(KERN_DEBUG, &dev->dev,
+			"no link-reset support at upstream device %s\n",
+			pci_name(udev));
+		return PCI_ERS_RESULT_DISCONNECT;
+	}
+
+	if (status != PCI_ERS_RESULT_RECOVERED) {
+		dev_printk(KERN_DEBUG, &dev->dev,
+			"link reset at upstream device %s failed\n",
+			pci_name(udev));
+		return PCI_ERS_RESULT_DISCONNECT;
+	}
+
+	return status;
+}
+
+/**
+ * broadcast_error_message - handle message broadcast to downstream drivers
+ * @dev: pointer to from where in a hierarchy message is broadcasted down
+ * @state: error state
+ * @error_mesg: message to print
+ * @cb: callback to be broadcasted
+ *
+ * Invoked during error recovery process. Once being invoked, the content
+ * of error severity will be broadcasted to all downstream drivers in a
+ * hierarchy in question.
+ */
+static pci_ers_result_t broadcast_error_message(struct pci_dev *dev,
+	enum pci_channel_state state,
+	char *error_mesg,
+	int (*cb)(struct pci_dev *, void *))
+{
+	struct aer_broadcast_data result_data;
+
+	dev_printk(KERN_DEBUG, &dev->dev, "broadcast %s message\n", error_mesg);
+	result_data.state = state;
+	if (cb == report_error_detected)
+		result_data.result = PCI_ERS_RESULT_CAN_RECOVER;
+	else
+		result_data.result = PCI_ERS_RESULT_RECOVERED;
+
+	if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+		/*
+		 * If the error is reported by a bridge, we think this error
+		 * is related to the downstream link of the bridge, so we
+		 * do error recovery on all subordinates of the bridge instead
+		 * of the bridge and clear the error status of the bridge.
+		 */
+		if (cb == report_error_detected)
+			dev->error_state = state;
+		pci_walk_bus(dev->subordinate, cb, &result_data);
+		if (cb == report_resume) {
+			pci_cleanup_aer_uncorrect_error_status(dev);
+			dev->error_state = pci_channel_io_normal;
+		}
+	} else {
+		/*
+		 * If the error is reported by an end point, we think this
+		 * error is related to the upstream link of the end point.
+		 */
+		pci_walk_bus(dev->bus, cb, &result_data);
+	}
+
+	return result_data.result;
+}
+
+/**
+ * pci_do_recovery - handle nonfatal/fatal error recovery process
+ * @dev: pointer to a pci_dev data structure of agent detecting an error
+ * @severity: error severity type
+ *
+ * Invoked when an error is nonfatal/fatal. Once being invoked, broadcast
+ * error detected message to all downstream drivers within a hierarchy in
+ * question and return the returned code.
+ */
+void pci_do_recovery(struct pci_dev *dev, int severity)
+{
+	pci_ers_result_t status, result = PCI_ERS_RESULT_RECOVERED;
+	enum pci_channel_state state;
+
+	if (severity == AER_FATAL)
+		state = pci_channel_io_frozen;
+	else
+		state = pci_channel_io_normal;
+
+	status = broadcast_error_message(dev,
+			state,
+			"error_detected",
+			report_error_detected);
+
+	if (severity == AER_FATAL) {
+		result = reset_link(dev);
+		if (result != PCI_ERS_RESULT_RECOVERED)
+			goto failed;
+	}
+
+	if (status == PCI_ERS_RESULT_CAN_RECOVER)
+		status = broadcast_error_message(dev,
+				state,
+				"mmio_enabled",
+				report_mmio_enabled);
+
+	if (status == PCI_ERS_RESULT_NEED_RESET) {
+		/*
+		 * TODO: Should call platform-specific
+		 * functions to reset slot before calling
+		 * drivers' slot_reset callbacks?
+		 */
+		status = broadcast_error_message(dev,
+				state,
+				"slot_reset",
+				report_slot_reset);
+	}
+
+	if (status != PCI_ERS_RESULT_RECOVERED)
+		goto failed;
+
+	broadcast_error_message(dev,
+				state,
+				"resume",
+				report_resume);
+
+	dev_info(&dev->dev, "Device recovery successful\n");
+	return;
+
+failed:
+	/* TODO: Should kernel panic here? */
+	dev_info(&dev->dev, "Device recovery failed\n");
+}
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index a854bc5..4f1992d 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -79,4 +79,5 @@ static inline void pcie_port_platform_notify(struct pci_dev *port, int *mask)
 static inline void pcie_port_platform_notify(struct pci_dev *port, int *mask){}
 #endif /* !CONFIG_ACPI */
 
+struct pcie_port_service_driver *pci_find_aer_service(struct pci_dev *dev);
 #endif /* _PORTDRV_H_ */
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.,
a Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v10 3/7] PCI/ERR: add mutex to synchronize recovery
  2018-02-22 16:02 [PATCH v10 0/7] Address error and recovery for AER and DPC Oza Pawandeep
  2018-02-22 16:02 ` [PATCH v10 1/7] PCI/AER: Rename error recovery to generic pci naming Oza Pawandeep
  2018-02-22 16:02 ` [PATCH v10 2/7] PCI/AER: factor out error reporting from AER Oza Pawandeep
@ 2018-02-22 16:02 ` Oza Pawandeep
  2018-02-22 18:36   ` Christoph Hellwig
  2018-02-22 16:02 ` [PATCH v10 4/7] PCI/DPC: Unify and plumb error handling into DPC Oza Pawandeep
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Oza Pawandeep @ 2018-02-22 16:02 UTC (permalink / raw)
  To: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi
  Cc: Oza Pawandeep

This patch protects pci_do_recovery with mutex.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>

diff --git a/drivers/pci/pcie/pcie-err.c b/drivers/pci/pcie/pcie-err.c
index a532fe0..8318c84 100644
--- a/drivers/pci/pcie/pcie-err.c
+++ b/drivers/pci/pcie/pcie-err.c
@@ -20,6 +20,8 @@
 #include <linux/pcieport_if.h>
 #include "portdrv.h"
 
+static DEFINE_MUTEX(pci_err_recovery_lock);
+
 struct aer_broadcast_data {
 	enum pci_channel_state state;
 	enum pci_ers_result result;
@@ -283,6 +285,8 @@ void pci_do_recovery(struct pci_dev *dev, int severity)
 	pci_ers_result_t status, result = PCI_ERS_RESULT_RECOVERED;
 	enum pci_channel_state state;
 
+	mutex_lock(&pci_err_recovery_lock);
+
 	if (severity == AER_FATAL)
 		state = pci_channel_io_frozen;
 	else
@@ -326,9 +330,11 @@ void pci_do_recovery(struct pci_dev *dev, int severity)
 				report_resume);
 
 	dev_info(&dev->dev, "Device recovery successful\n");
+	mutex_unlock(&pci_err_recovery_lock);
 	return;
 
 failed:
 	/* TODO: Should kernel panic here? */
 	dev_info(&dev->dev, "Device recovery failed\n");
+	mutex_unlock(&pci_err_recovery_lock);
 }
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.,
a Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v10 4/7] PCI/DPC: Unify and plumb error handling into DPC
  2018-02-22 16:02 [PATCH v10 0/7] Address error and recovery for AER and DPC Oza Pawandeep
                   ` (2 preceding siblings ...)
  2018-02-22 16:02 ` [PATCH v10 3/7] PCI/ERR: add mutex to synchronize recovery Oza Pawandeep
@ 2018-02-22 16:02 ` Oza Pawandeep
  2018-02-22 18:39   ` Christoph Hellwig
  2018-02-22 16:02 ` [PATCH v10 5/7] PCI/AER: Unify aer error defines at single space Oza Pawandeep
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Oza Pawandeep @ 2018-02-22 16:02 UTC (permalink / raw)
  To: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi
  Cc: Oza Pawandeep

Current DPC driver does not do recovery, e.g. calling end-point's driver's
callbacks, which sanitize the sw.

DPC driver implements link_reset callback, and calls pci_do_recovery.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index a5a79f0..124f42e 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -343,6 +343,8 @@ static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
 void pci_enable_acs(struct pci_dev *dev);
 
 /* PCI error reporting and recovery */
+#define DPC_FATAL	4
+
 void pci_do_recovery(struct pci_dev *dev, int severity);
 
 #ifdef CONFIG_PCIEASPM
diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c
index 38e40c6..208b427 100644
--- a/drivers/pci/pcie/pcie-dpc.c
+++ b/drivers/pci/pcie/pcie-dpc.c
@@ -13,6 +13,7 @@
 #include <linux/pcieport_if.h>
 #include "../pci.h"
 #include "aer/aerdrv.h"
+#include "portdrv.h"
 
 struct dpc_dev {
 	struct pcie_device	*dev;
@@ -45,6 +46,60 @@ struct dpc_dev {
 	"Memory Request Completion Timeout",		 /* Bit Position 18 */
 };
 
+static int find_dpc_dev_iter(struct device *device, void *data)
+{
+	struct pcie_port_service_driver *service_driver;
+	struct device **dev;
+
+	dev = (struct device **) data;
+
+	if (device->bus == &pcie_port_bus_type && device->driver) {
+		service_driver = to_service_driver(device->driver);
+		if (service_driver->service == PCIE_PORT_SERVICE_DPC) {
+			*dev = device;
+			return 1;
+		}
+	}
+
+	return 0;
+}
+
+static struct device *pci_find_dpc_dev(struct pci_dev *pdev)
+{
+	struct device *dev = NULL;
+
+	device_for_each_child(&pdev->dev, &dev, find_dpc_dev_iter);
+
+	return dev;
+}
+
+static int find_dpc_service_iter(struct device *device, void *data)
+{
+	struct pcie_port_service_driver *service_driver, **drv;
+
+	drv = (struct pcie_port_service_driver **) data;
+
+	if (device->bus == &pcie_port_bus_type && device->driver) {
+		service_driver = to_service_driver(device->driver);
+		if (service_driver->service == PCIE_PORT_SERVICE_DPC) {
+			*drv = service_driver;
+			return 1;
+		}
+	}
+
+	return 0;
+}
+
+struct pcie_port_service_driver *pci_find_dpc_service(struct pci_dev *dev)
+{
+	struct pcie_port_service_driver *drv = NULL;
+
+	device_for_each_child(&dev->dev, &drv, find_dpc_service_iter);
+
+	return drv;
+}
+EXPORT_SYMBOL(pci_find_dpc_service);
+
 static int dpc_wait_rp_inactive(struct dpc_dev *dpc)
 {
 	unsigned long timeout = jiffies + HZ;
@@ -82,12 +137,25 @@ static void dpc_wait_link_inactive(struct dpc_dev *dpc)
 		dev_warn(dev, "Link state not disabled for DPC event\n");
 }
 
-static void dpc_work(struct work_struct *work)
+/**
+ * dpc_reset_link - reset link DPC  routine
+ * @dev: pointer to Root Port's pci_dev data structure
+ *
+ * Invoked by Port Bus driver when performing link reset at Root Port.
+ */
+static pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
 {
-	struct dpc_dev *dpc = container_of(work, struct dpc_dev, work);
-	struct pci_dev *dev, *temp, *pdev = dpc->dev->port;
 	struct pci_bus *parent = pdev->subordinate;
-	u16 cap = dpc->cap_pos, ctl;
+	struct pci_dev *dev, *temp;
+	struct dpc_dev *dpc;
+	struct pcie_device *pciedev;
+	struct device *devdpc;
+	u16 cap, ctl;
+
+	devdpc = pci_find_dpc_dev(pdev);
+	pciedev = to_pcie_device(devdpc);
+	dpc = get_service_data(pciedev);
+	cap = dpc->cap_pos;
 
 	pci_lock_rescan_remove();
 	list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
@@ -104,21 +172,31 @@ static void dpc_work(struct work_struct *work)
 
 	dpc_wait_link_inactive(dpc);
 	if (dpc->rp_extensions && dpc_wait_rp_inactive(dpc))
-		return;
+		return PCI_ERS_RESULT_DISCONNECT;
 	if (dpc->rp_extensions && dpc->rp_pio_status) {
 		pci_write_config_dword(pdev, cap + PCI_EXP_DPC_RP_PIO_STATUS,
 				       dpc->rp_pio_status);
 		dpc->rp_pio_status = 0;
 	}
 
-	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
+	pci_write_config_word(pdev, dpc->cap_pos + PCI_EXP_DPC_STATUS,
 		PCI_EXP_DPC_STATUS_TRIGGER | PCI_EXP_DPC_STATUS_INTERRUPT);
 
 	pci_read_config_word(pdev, cap + PCI_EXP_DPC_CTL, &ctl);
 	pci_write_config_word(pdev, cap + PCI_EXP_DPC_CTL,
-			      ctl | PCI_EXP_DPC_CTL_INT_EN);
+				ctl | PCI_EXP_DPC_CTL_INT_EN);
+
+	return PCI_ERS_RESULT_RECOVERED;
 }
 
+static void dpc_work(struct work_struct *work)
+{
+	struct dpc_dev *dpc = container_of(work, struct dpc_dev, work);
+	struct pci_dev *pdev = dpc->dev->port;
+
+	/* From DPC point of view error is always FATAL. */
+	pci_do_recovery(pdev, DPC_FATAL);
+}
 static void dpc_process_rp_pio_error(struct dpc_dev *dpc)
 {
 	struct device *dev = &dpc->dev->device;
@@ -297,6 +375,7 @@ static void dpc_remove(struct pcie_device *dev)
 	.service	= PCIE_PORT_SERVICE_DPC,
 	.probe		= dpc_probe,
 	.remove		= dpc_remove,
+	.reset_link     = dpc_reset_link,
 };
 
 static int __init dpc_service_init(void)
diff --git a/drivers/pci/pcie/pcie-err.c b/drivers/pci/pcie/pcie-err.c
index 8318c84..6844347 100644
--- a/drivers/pci/pcie/pcie-err.c
+++ b/drivers/pci/pcie/pcie-err.c
@@ -19,6 +19,7 @@
 #include <linux/aer.h>
 #include <linux/pcieport_if.h>
 #include "portdrv.h"
+#include "./../pci.h"
 
 static DEFINE_MUTEX(pci_err_recovery_lock);
 
@@ -181,7 +182,7 @@ static pci_ers_result_t default_reset_link(struct pci_dev *dev)
 	return PCI_ERS_RESULT_RECOVERED;
 }
 
-static pci_ers_result_t reset_link(struct pci_dev *dev)
+static pci_ers_result_t reset_link(struct pci_dev *dev, int severity)
 {
 	struct pci_dev *udev;
 	pci_ers_result_t status;
@@ -195,9 +196,17 @@ static pci_ers_result_t reset_link(struct pci_dev *dev)
 		udev = dev->bus->self;
 	}
 
+
+	/* Use the service driver of the component firstly */
+#if IS_ENABLED(CONFIG_PCIE_DPC)
+	if (severity == DPC_FATAL)
+		driver = pci_find_dpc_service(udev);
+#endif
 #if IS_ENABLED(CONFIG_PCIEAER)
-	/* Use the aer driver of the component firstly */
-	driver = pci_find_aer_service(udev);
+	if ((severity == AER_FATAL) ||
+	    (severity == AER_NONFATAL) ||
+	    (severity == AER_CORRECTABLE))
+		driver = pci_find_aer_service(udev);
 #endif
 
 	if (driver && driver->reset_link) {
@@ -287,7 +296,8 @@ void pci_do_recovery(struct pci_dev *dev, int severity)
 
 	mutex_lock(&pci_err_recovery_lock);
 
-	if (severity == AER_FATAL)
+	if ((severity == AER_FATAL) ||
+	    (severity == DPC_FATAL))
 		state = pci_channel_io_frozen;
 	else
 		state = pci_channel_io_normal;
@@ -297,10 +307,14 @@ void pci_do_recovery(struct pci_dev *dev, int severity)
 			"error_detected",
 			report_error_detected);
 
-	if (severity == AER_FATAL) {
-		result = reset_link(dev);
+	if ((severity == AER_FATAL) ||
+	    (severity == DPC_FATAL)) {
+		result = reset_link(dev, severity);
 		if (result != PCI_ERS_RESULT_RECOVERED)
 			goto failed;
+		else if (severity == DPC_FATAL)
+			goto resume;
+
 	}
 
 	if (status == PCI_ERS_RESULT_CAN_RECOVER)
@@ -324,6 +338,7 @@ void pci_do_recovery(struct pci_dev *dev, int severity)
 	if (status != PCI_ERS_RESULT_RECOVERED)
 		goto failed;
 
+resume:
 	broadcast_error_message(dev,
 				state,
 				"resume",
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index 4f1992d..b013e24 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -80,4 +80,5 @@ static inline void pcie_port_platform_notify(struct pci_dev *port, int *mask){}
 #endif /* !CONFIG_ACPI */
 
 struct pcie_port_service_driver *pci_find_aer_service(struct pci_dev *dev);
+struct pcie_port_service_driver *pci_find_dpc_service(struct pci_dev *dev);
 #endif /* _PORTDRV_H_ */
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.,
a Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v10 5/7] PCI/AER: Unify aer error defines at single space
  2018-02-22 16:02 [PATCH v10 0/7] Address error and recovery for AER and DPC Oza Pawandeep
                   ` (3 preceding siblings ...)
  2018-02-22 16:02 ` [PATCH v10 4/7] PCI/DPC: Unify and plumb error handling into DPC Oza Pawandeep
@ 2018-02-22 16:02 ` Oza Pawandeep
  2018-02-22 16:06   ` poza
  2018-02-22 16:02 ` [PATCH v10 6/7] PCI: Unify wait for link active into generic pci Oza Pawandeep
  2018-02-22 16:02 ` [PATCH v10 7/7] PCI/DPC: Enumerate the devices after DPC trigger event Oza Pawandeep
  6 siblings, 1 reply; 19+ messages in thread
From: Oza Pawandeep @ 2018-02-22 16:02 UTC (permalink / raw)
  To: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi
  Cc: Oza Pawandeep

This patch moves AER error defines to drivers/pci/pci.h.
So that it unifies the error repoting codes at single place along with dpc

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 1efefe9..7ae9bb3 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -56,6 +56,7 @@
 #include <ras/ras_event.h>
 
 #include "apei-internal.h"
+#include "../../pci/pci.h"
 
 #define GHES_PFX	"GHES: "
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 124f42e..b0e63b5 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -343,7 +343,11 @@ static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
 void pci_enable_acs(struct pci_dev *dev);
 
 /* PCI error reporting and recovery */
-#define DPC_FATAL	4
+#define AER_NONFATAL		0
+#define AER_FATAL		1
+#define AER_CORRECTABLE		2
+
+#define DPC_FATAL		4
 
 void pci_do_recovery(struct pci_dev *dev, int severity);
 
diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
index 6a352e6..4c59f37 100644
--- a/drivers/pci/pcie/aer/aerdrv_errprint.c
+++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
@@ -19,6 +19,7 @@
 #include <linux/cper.h>
 
 #include "aerdrv.h"
+#include "../../pci.h"
 #include <ras/ras_event.h>
 
 #define AER_AGENT_RECEIVER		0
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 8f87bbe..3eac8ed 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -11,10 +11,6 @@
 #include <linux/errno.h>
 #include <linux/types.h>
 
-#define AER_NONFATAL			0
-#define AER_FATAL			1
-#define AER_CORRECTABLE			2
-
 struct pci_dev;
 
 struct aer_header_log_regs {
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index 9c68986..d75c75b 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -13,6 +13,7 @@
 #include <linux/aer.h>
 #include <linux/cper.h>
 #include <linux/mm.h>
+#include "../../../drivers/pci/pci.h"
 
 /*
  * MCE Extended Error Log trace event
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.,
a Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v10 6/7] PCI: Unify wait for link active into generic pci
  2018-02-22 16:02 [PATCH v10 0/7] Address error and recovery for AER and DPC Oza Pawandeep
                   ` (4 preceding siblings ...)
  2018-02-22 16:02 ` [PATCH v10 5/7] PCI/AER: Unify aer error defines at single space Oza Pawandeep
@ 2018-02-22 16:02 ` Oza Pawandeep
  2018-02-22 18:43   ` Christoph Hellwig
  2018-02-22 16:02 ` [PATCH v10 7/7] PCI/DPC: Enumerate the devices after DPC trigger event Oza Pawandeep
  6 siblings, 1 reply; 19+ messages in thread
From: Oza Pawandeep @ 2018-02-22 16:02 UTC (permalink / raw)
  To: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi
  Cc: Oza Pawandeep

Clients such as pciehp, dpc are using pcie_wait_link_active, which waits
till the link becomes active or inactive.

Made generic function and moved it to drivers/pci/pci.c

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 18a42f8..de9b0ea 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -231,25 +231,11 @@ bool pciehp_check_link_active(struct controller *ctrl)
 	return ret;
 }
 
-static void __pcie_wait_link_active(struct controller *ctrl, bool active)
+static bool pcie_wait_link_active(struct controller *ctrl)
 {
-	int timeout = 1000;
-
-	if (pciehp_check_link_active(ctrl) == active)
-		return;
-	while (timeout > 0) {
-		msleep(10);
-		timeout -= 10;
-		if (pciehp_check_link_active(ctrl) == active)
-			return;
-	}
-	ctrl_dbg(ctrl, "Data Link Layer Link Active not %s in 1000 msec\n",
-			active ? "set" : "cleared");
-}
+	struct pci_dev *pdev = ctrl_dev(ctrl);
 
-static void pcie_wait_link_active(struct controller *ctrl)
-{
-	__pcie_wait_link_active(ctrl, true);
+	return pci_wait_for_link(pdev, true);
 }
 
 static bool pci_bus_check_dev(struct pci_bus *bus, int devfn)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index f6a4dd1..5440696 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4176,6 +4176,40 @@ static int pci_pm_reset(struct pci_dev *dev, int probe)
 	return 0;
 }
 
+/**
+ * pci__wait_for_link - Wait for link till its active/inactive
+ * @pdev: Bridge device
+ * @active: waiting for active or inactive ?
+ *
+ * Use this to wait till link becomes active or inactive.
+ */
+bool pci_wait_for_link(struct pci_dev *pdev, bool active)
+{
+	int timeout = 1000;
+	bool ret;
+	u16 lnk_status;
+
+	pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
+	ret = !!(lnk_status & PCI_EXP_LNKSTA_DLLLA);
+
+	while ((ret != active) && (timeout > 0)) {
+		msleep(10);
+		timeout -= 10;
+		pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
+		ret = !!(lnk_status & PCI_EXP_LNKSTA_DLLLA);
+	}
+
+	if (ret == active)
+		return true;
+
+	dev_printk(KERN_DEBUG, &pdev->dev,
+		   "Data Link Layer Link Active not %s in 1000 msec\n",
+		   active ? "set" : "cleared");
+
+	return false;
+}
+EXPORT_SYMBOL(pci_wait_for_link);
+
 void pci_reset_secondary_bus(struct pci_dev *dev)
 {
 	u16 ctrl;
diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c
index 208b427..0524bd3 100644
--- a/drivers/pci/pcie/pcie-dpc.c
+++ b/drivers/pci/pcie/pcie-dpc.c
@@ -122,19 +122,9 @@ static int dpc_wait_rp_inactive(struct dpc_dev *dpc)
 
 static void dpc_wait_link_inactive(struct dpc_dev *dpc)
 {
-	unsigned long timeout = jiffies + HZ;
 	struct pci_dev *pdev = dpc->dev->port;
-	struct device *dev = &dpc->dev->device;
-	u16 lnk_status;
 
-	pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
-	while (lnk_status & PCI_EXP_LNKSTA_DLLLA &&
-					!time_after(jiffies, timeout)) {
-		msleep(10);
-		pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
-	}
-	if (lnk_status & PCI_EXP_LNKSTA_DLLLA)
-		dev_warn(dev, "Link state not disabled for DPC event\n");
+	pci_wait_for_link(pdev, false);
 }
 
 /**
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 024a1be..cb674c3 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1195,6 +1195,7 @@ int pci_add_ext_cap_save_buffer(struct pci_dev *dev,
 int pci_request_selected_regions(struct pci_dev *, int, const char *);
 int pci_request_selected_regions_exclusive(struct pci_dev *, int, const char *);
 void pci_release_selected_regions(struct pci_dev *, int);
+bool pci_wait_for_link(struct pci_dev *pdev, bool active);
 
 /* drivers/pci/bus.c */
 struct pci_bus *pci_bus_get(struct pci_bus *bus);
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.,
a Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v10 7/7] PCI/DPC: Enumerate the devices after DPC trigger event
  2018-02-22 16:02 [PATCH v10 0/7] Address error and recovery for AER and DPC Oza Pawandeep
                   ` (5 preceding siblings ...)
  2018-02-22 16:02 ` [PATCH v10 6/7] PCI: Unify wait for link active into generic pci Oza Pawandeep
@ 2018-02-22 16:02 ` Oza Pawandeep
  6 siblings, 0 replies; 19+ messages in thread
From: Oza Pawandeep @ 2018-02-22 16:02 UTC (permalink / raw)
  To: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi
  Cc: Oza Pawandeep

Implement error_resume callback in DPC so, after DPC trigger event
enumerates the devices beneath.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>

diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c
index 0524bd3..382d5c5 100644
--- a/drivers/pci/pcie/pcie-dpc.c
+++ b/drivers/pci/pcie/pcie-dpc.c
@@ -128,6 +128,23 @@ static void dpc_wait_link_inactive(struct dpc_dev *dpc)
 }
 
 /**
+ * dpc_error_resume - enumerate the devices beneath
+ * @dev: pointer to Root Port's pci_dev data structure
+ *
+ * Invoked by Port Bus driver during nonfatal recovery.
+ */
+static void dpc_error_resume(struct pci_dev *pdev)
+{
+	bool active = true;
+
+	if (pci_wait_for_link(pdev, active)) {
+		pci_lock_rescan_remove();
+		pci_rescan_bus(pdev->bus);
+		pci_unlock_rescan_remove();
+	}
+}
+
+/**
  * dpc_reset_link - reset link DPC  routine
  * @dev: pointer to Root Port's pci_dev data structure
  *
@@ -365,6 +382,7 @@ static void dpc_remove(struct pcie_device *dev)
 	.service	= PCIE_PORT_SERVICE_DPC,
 	.probe		= dpc_probe,
 	.remove		= dpc_remove,
+	.error_resume	= dpc_error_resume,
 	.reset_link     = dpc_reset_link,
 };
 
diff --git a/drivers/pci/pcie/pcie-err.c b/drivers/pci/pcie/pcie-err.c
index 6844347..d7e2377 100644
--- a/drivers/pci/pcie/pcie-err.c
+++ b/drivers/pci/pcie/pcie-err.c
@@ -236,6 +236,7 @@ static pci_ers_result_t reset_link(struct pci_dev *dev, int severity)
  * @state: error state
  * @error_mesg: message to print
  * @cb: callback to be broadcasted
+ * @severity: error severity
  *
  * Invoked during error recovery process. Once being invoked, the content
  * of error severity will be broadcasted to all downstream drivers in a
@@ -244,7 +245,8 @@ static pci_ers_result_t reset_link(struct pci_dev *dev, int severity)
 static pci_ers_result_t broadcast_error_message(struct pci_dev *dev,
 	enum pci_channel_state state,
 	char *error_mesg,
-	int (*cb)(struct pci_dev *, void *))
+	int (*cb)(struct pci_dev *, void *),
+	int severity)
 {
 	struct aer_broadcast_data result_data;
 
@@ -256,6 +258,17 @@ static pci_ers_result_t broadcast_error_message(struct pci_dev *dev,
 		result_data.result = PCI_ERS_RESULT_RECOVERED;
 
 	if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+		/* If DPC is triggered, call resume error handler
+		 * because, at this point we can safely assume that
+		 * link recovery has happened, this is only handled if
+		 * callback is resume, as this function can be called
+		 * with multiple callbacks.
+		 */
+		if ((severity == DPC_FATAL) &&
+			(cb == report_resume)) {
+			cb(dev, NULL);
+			return PCI_ERS_RESULT_RECOVERED;
+		}
 		/*
 		 * If the error is reported by a bridge, we think this error
 		 * is related to the downstream link of the bridge, so we
@@ -305,7 +318,8 @@ void pci_do_recovery(struct pci_dev *dev, int severity)
 	status = broadcast_error_message(dev,
 			state,
 			"error_detected",
-			report_error_detected);
+			report_error_detected,
+			severity);
 
 	if ((severity == AER_FATAL) ||
 	    (severity == DPC_FATAL)) {
@@ -321,7 +335,8 @@ void pci_do_recovery(struct pci_dev *dev, int severity)
 		status = broadcast_error_message(dev,
 				state,
 				"mmio_enabled",
-				report_mmio_enabled);
+				report_mmio_enabled,
+				severity);
 
 	if (status == PCI_ERS_RESULT_NEED_RESET) {
 		/*
@@ -332,7 +347,8 @@ void pci_do_recovery(struct pci_dev *dev, int severity)
 		status = broadcast_error_message(dev,
 				state,
 				"slot_reset",
-				report_slot_reset);
+				report_slot_reset,
+				severity);
 	}
 
 	if (status != PCI_ERS_RESULT_RECOVERED)
@@ -342,7 +358,8 @@ void pci_do_recovery(struct pci_dev *dev, int severity)
 	broadcast_error_message(dev,
 				state,
 				"resume",
-				report_resume);
+				report_resume,
+				severity);
 
 	dev_info(&dev->dev, "Device recovery successful\n");
 	mutex_unlock(&pci_err_recovery_lock);
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.,
a Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v10 5/7] PCI/AER: Unify aer error defines at single space
  2018-02-22 16:02 ` [PATCH v10 5/7] PCI/AER: Unify aer error defines at single space Oza Pawandeep
@ 2018-02-22 16:06   ` poza
  0 siblings, 0 replies; 19+ messages in thread
From: poza @ 2018-02-22 16:06 UTC (permalink / raw)
  To: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi

On 2018-02-22 21:32, Oza Pawandeep wrote:
> This patch moves AER error defines to drivers/pci/pci.h.
> So that it unifies the error repoting codes at single place along with 
> dpc
> 
> Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 1efefe9..7ae9bb3 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -56,6 +56,7 @@
>  #include <ras/ras_event.h>
> 
>  #include "apei-internal.h"
> +#include "../../pci/pci.h"
> 
>  #define GHES_PFX	"GHES: "
> 
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 124f42e..b0e63b5 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -343,7 +343,11 @@ static inline resource_size_t
> pci_resource_alignment(struct pci_dev *dev,
>  void pci_enable_acs(struct pci_dev *dev);
> 
>  /* PCI error reporting and recovery */
> -#define DPC_FATAL	4
> +#define AER_NONFATAL		0
> +#define AER_FATAL		1
> +#define AER_CORRECTABLE		2
> +
> +#define DPC_FATAL		4
> 
>  void pci_do_recovery(struct pci_dev *dev, int severity);
> 
> diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c
> b/drivers/pci/pcie/aer/aerdrv_errprint.c
> index 6a352e6..4c59f37 100644
> --- a/drivers/pci/pcie/aer/aerdrv_errprint.c
> +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
> @@ -19,6 +19,7 @@
>  #include <linux/cper.h>
> 
>  #include "aerdrv.h"
> +#include "../../pci.h"
>  #include <ras/ras_event.h>
> 
>  #define AER_AGENT_RECEIVER		0
> diff --git a/include/linux/aer.h b/include/linux/aer.h
> index 8f87bbe..3eac8ed 100644
> --- a/include/linux/aer.h
> +++ b/include/linux/aer.h
> @@ -11,10 +11,6 @@
>  #include <linux/errno.h>
>  #include <linux/types.h>
> 
> -#define AER_NONFATAL			0
> -#define AER_FATAL			1
> -#define AER_CORRECTABLE			2
> -
>  struct pci_dev;
> 
>  struct aer_header_log_regs {
> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
> index 9c68986..d75c75b 100644
> --- a/include/ras/ras_event.h
> +++ b/include/ras/ras_event.h
> @@ -13,6 +13,7 @@
>  #include <linux/aer.h>
>  #include <linux/cper.h>
>  #include <linux/mm.h>
> +#include "../../../drivers/pci/pci.h"
> 
>  /*
>   * MCE Extended Error Log trace event


Hi Bjorn,

Based on your earlier comment I have moved AER and DPC error defined in 
driver/pci/pci.h
but I am not entirely comfortable with moving them there because

file such as
drivers/acpi/apei/ghes.c
include/ras/ras_event.h

they are outside of the scope of driver/pci/pci.h and they have to 
include the files
#include "../../pci/pci.h" and #include "../../../drivers/pci/pci.h" 
respectively.

hence my suggestion is to move error defines in include/linux/pci.h

let me know your view.

Regards,
Oza.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v10 1/7] PCI/AER: Rename error recovery to generic pci naming
  2018-02-22 16:02 ` [PATCH v10 1/7] PCI/AER: Rename error recovery to generic pci naming Oza Pawandeep
@ 2018-02-22 18:34   ` Christoph Hellwig
  2018-02-23  5:24     ` poza
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2018-02-22 18:34 UTC (permalink / raw)
  To: Oza Pawandeep
  Cc: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi

On Thu, Feb 22, 2018 at 09:32:06PM +0530, Oza Pawandeep wrote:
> This patch renames error recovery to generic name with pci prefix
> 
> Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
> 
> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
> index a4bfea5..306bf2f 100644
> --- a/drivers/pci/pcie/aer/aerdrv_core.c
> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> @@ -478,7 +478,7 @@ static pci_ers_result_t reset_link(struct pci_dev *dev)
>  }
>  
>  /**
> - * do_recovery - handle nonfatal/fatal error recovery process
> + * pci_do_recovery - handle nonfatal/fatal error recovery process
>   * @dev: pointer to a pci_dev data structure of agent detecting an error
>   * @severity: error severity type
>   *
> @@ -486,7 +486,7 @@ static pci_ers_result_t reset_link(struct pci_dev *dev)
>   * error detected message to all downstream drivers within a hierarchy in
>   * question and return the returned code.
>   */
> -static void do_recovery(struct pci_dev *dev, int severity)
> +static void pci_do_recovery(struct pci_dev *dev, int severity)

Maybe pcie_do_recovery?

Otherwise looks fine:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v10 2/7] PCI/AER: factor out error reporting from AER
  2018-02-22 16:02 ` [PATCH v10 2/7] PCI/AER: factor out error reporting from AER Oza Pawandeep
@ 2018-02-22 18:34   ` Christoph Hellwig
  2018-02-22 18:42     ` David Laight
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2018-02-22 18:34 UTC (permalink / raw)
  To: Oza Pawandeep
  Cc: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi

> -	pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
> -	/* TODO: Should kernel panic here? */
> -	pci_info(dev, "AER: Device recovery failed\n");
> -}
> +EXPORT_SYMBOL(pci_find_aer_service);

EXPORT_SYMBPOL_GPL please.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v10 3/7] PCI/ERR: add mutex to synchronize recovery
  2018-02-22 16:02 ` [PATCH v10 3/7] PCI/ERR: add mutex to synchronize recovery Oza Pawandeep
@ 2018-02-22 18:36   ` Christoph Hellwig
  2018-02-23  5:27     ` poza
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2018-02-22 18:36 UTC (permalink / raw)
  To: Oza Pawandeep
  Cc: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi

On Thu, Feb 22, 2018 at 09:32:08PM +0530, Oza Pawandeep wrote:
> This patch protects pci_do_recovery with mutex.

Do we really want a global mutex for this and not one per root complex
at very least?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v10 4/7] PCI/DPC: Unify and plumb error handling into DPC
  2018-02-22 16:02 ` [PATCH v10 4/7] PCI/DPC: Unify and plumb error handling into DPC Oza Pawandeep
@ 2018-02-22 18:39   ` Christoph Hellwig
  2018-02-23  5:28     ` poza
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2018-02-22 18:39 UTC (permalink / raw)
  To: Oza Pawandeep
  Cc: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi

On Thu, Feb 22, 2018 at 09:32:09PM +0530, Oza Pawandeep wrote:
> Current DPC driver does not do recovery, e.g. calling end-point's driver's
> callbacks, which sanitize the sw.
> 
> DPC driver implements link_reset callback, and calls pci_do_recovery.
> 
> Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
> 
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index a5a79f0..124f42e 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -343,6 +343,8 @@ static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
>  void pci_enable_acs(struct pci_dev *dev);
>  
>  /* PCI error reporting and recovery */
> +#define DPC_FATAL	4
> +
>  void pci_do_recovery(struct pci_dev *dev, int severity);
>  
>  #ifdef CONFIG_PCIEASPM
> diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c
> index 38e40c6..208b427 100644
> --- a/drivers/pci/pcie/pcie-dpc.c
> +++ b/drivers/pci/pcie/pcie-dpc.c
> @@ -13,6 +13,7 @@
>  #include <linux/pcieport_if.h>
>  #include "../pci.h"
>  #include "aer/aerdrv.h"
> +#include "portdrv.h"
>  
>  struct dpc_dev {
>  	struct pcie_device	*dev;
> @@ -45,6 +46,60 @@ struct dpc_dev {
>  	"Memory Request Completion Timeout",		 /* Bit Position 18 */
>  };
>  
> +static int find_dpc_dev_iter(struct device *device, void *data)
> +{
> +	struct pcie_port_service_driver *service_driver;
> +	struct device **dev;
> +
> +	dev = (struct device **) data;
> +
> +	if (device->bus == &pcie_port_bus_type && device->driver) {
> +		service_driver = to_service_driver(device->driver);

Please move the initial assignment to the declaration line to
clean this up a bit:

	struct device **dev = (struct device **)data;

Same thing happens a couple more times in this patch.

> +EXPORT_SYMBOL(pci_find_dpc_service);

EXPORT_SYMBOL_GPL for PCI layer internals again - and ditto for
probably about everything in this series.

>  #if IS_ENABLED(CONFIG_PCIEAER)
> -	/* Use the aer driver of the component firstly */
> -	driver = pci_find_aer_service(udev);
> +	if ((severity == AER_FATAL) ||
> +	    (severity == AER_NONFATAL) ||
> +	    (severity == AER_CORRECTABLE))
> +		driver = pci_find_aer_service(udev);
>  #endif

No need for the inner braces here.

> -	if (severity == AER_FATAL)
> +	if ((severity == AER_FATAL) ||
> +	    (severity == DPC_FATAL))
>  		state = pci_channel_io_frozen;
>  	else
>  		state = pci_channel_io_normal;

Same here and a few more times below.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH v10 2/7] PCI/AER: factor out error reporting from AER
  2018-02-22 18:34   ` Christoph Hellwig
@ 2018-02-22 18:42     ` David Laight
  0 siblings, 0 replies; 19+ messages in thread
From: David Laight @ 2018-02-22 18:42 UTC (permalink / raw)
  To: 'Christoph Hellwig', Oza Pawandeep
  Cc: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi

From: Christoph Hellwig
> > -	pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
> > -	/* TODO: Should kernel panic here? */
> > -	pci_info(dev, "AER: Device recovery failed\n");
> > -}
> > +EXPORT_SYMBOL(pci_find_aer_service);
> 
> EXPORT_SYMBPOL_GPL please.

Is this a function that a 'normal' PCIe device driver would need to call?
If so it is a PITA requiring all drivers that want AER to be GPL.

	David

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v10 6/7] PCI: Unify wait for link active into generic pci
  2018-02-22 16:02 ` [PATCH v10 6/7] PCI: Unify wait for link active into generic pci Oza Pawandeep
@ 2018-02-22 18:43   ` Christoph Hellwig
  2018-02-23  6:53     ` poza
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2018-02-22 18:43 UTC (permalink / raw)
  To: Oza Pawandeep
  Cc: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi

>  
> +/**
> + * pci__wait_for_link - Wait for link till its active/inactive

typo - just wants a single underscore.

> +	pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
> +	ret = !!(lnk_status & PCI_EXP_LNKSTA_DLLLA);

no need for the !! when assigning to a boolean.

> +
> +	while ((ret != active) && (timeout > 0)) {

No need for either pair of inner braces.

> +		msleep(10);
> +		timeout -= 10;
> +		pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
> +		ret = !!(lnk_status & PCI_EXP_LNKSTA_DLLLA);

Same as above.

> +	}
> +
> +	if (ret == active)
> +		return true;

Seems like the structure is a bit odd.  Why not:

	for (;;) {
		pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
		if ((lnk_status & PCI_EXP_LNKSTA_DLLLA) == active)
			return true;
		if (timeout <= 0)
			break;
		timeout -= 10;
	}

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v10 1/7] PCI/AER: Rename error recovery to generic pci naming
  2018-02-22 18:34   ` Christoph Hellwig
@ 2018-02-23  5:24     ` poza
  0 siblings, 0 replies; 19+ messages in thread
From: poza @ 2018-02-23  5:24 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi

On 2018-02-23 00:04, Christoph Hellwig wrote:
> On Thu, Feb 22, 2018 at 09:32:06PM +0530, Oza Pawandeep wrote:
>> This patch renames error recovery to generic name with pci prefix
>> 
>> Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
>> 
>> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c 
>> b/drivers/pci/pcie/aer/aerdrv_core.c
>> index a4bfea5..306bf2f 100644
>> --- a/drivers/pci/pcie/aer/aerdrv_core.c
>> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
>> @@ -478,7 +478,7 @@ static pci_ers_result_t reset_link(struct pci_dev 
>> *dev)
>>  }
>> 
>>  /**
>> - * do_recovery - handle nonfatal/fatal error recovery process
>> + * pci_do_recovery - handle nonfatal/fatal error recovery process
>>   * @dev: pointer to a pci_dev data structure of agent detecting an 
>> error
>>   * @severity: error severity type
>>   *
>> @@ -486,7 +486,7 @@ static pci_ers_result_t reset_link(struct pci_dev 
>> *dev)
>>   * error detected message to all downstream drivers within a 
>> hierarchy in
>>   * question and return the returned code.
>>   */
>> -static void do_recovery(struct pci_dev *dev, int severity)
>> +static void pci_do_recovery(struct pci_dev *dev, int severity)
> 
> Maybe pcie_do_recovery?
> 

sure will take care.
Thanks.

> Otherwise looks fine:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v10 3/7] PCI/ERR: add mutex to synchronize recovery
  2018-02-22 18:36   ` Christoph Hellwig
@ 2018-02-23  5:27     ` poza
  0 siblings, 0 replies; 19+ messages in thread
From: poza @ 2018-02-23  5:27 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi

On 2018-02-23 00:06, Christoph Hellwig wrote:
> On Thu, Feb 22, 2018 at 09:32:08PM +0530, Oza Pawandeep wrote:
>> This patch protects pci_do_recovery with mutex.
> 
> Do we really want a global mutex for this and not one per root complex
> at very least?

This protects pcie_do_recovery, which is called by port service drivers.
mainly aer and dpc as of now. so the association is not root complex 
agents
instead service drivers who attempts to do link recovery.

so I think we are good here.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v10 4/7] PCI/DPC: Unify and plumb error handling into DPC
  2018-02-22 18:39   ` Christoph Hellwig
@ 2018-02-23  5:28     ` poza
  0 siblings, 0 replies; 19+ messages in thread
From: poza @ 2018-02-23  5:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi,
	linux-pci-owner

On 2018-02-23 00:09, Christoph Hellwig wrote:
> On Thu, Feb 22, 2018 at 09:32:09PM +0530, Oza Pawandeep wrote:
>> Current DPC driver does not do recovery, e.g. calling end-point's 
>> driver's
>> callbacks, which sanitize the sw.
>> 
>> DPC driver implements link_reset callback, and calls pci_do_recovery.
>> 
>> Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
>> 
>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>> index a5a79f0..124f42e 100644
>> --- a/drivers/pci/pci.h
>> +++ b/drivers/pci/pci.h
>> @@ -343,6 +343,8 @@ static inline resource_size_t 
>> pci_resource_alignment(struct pci_dev *dev,
>>  void pci_enable_acs(struct pci_dev *dev);
>> 
>>  /* PCI error reporting and recovery */
>> +#define DPC_FATAL	4
>> +
>>  void pci_do_recovery(struct pci_dev *dev, int severity);
>> 
>>  #ifdef CONFIG_PCIEASPM
>> diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c
>> index 38e40c6..208b427 100644
>> --- a/drivers/pci/pcie/pcie-dpc.c
>> +++ b/drivers/pci/pcie/pcie-dpc.c
>> @@ -13,6 +13,7 @@
>>  #include <linux/pcieport_if.h>
>>  #include "../pci.h"
>>  #include "aer/aerdrv.h"
>> +#include "portdrv.h"
>> 
>>  struct dpc_dev {
>>  	struct pcie_device	*dev;
>> @@ -45,6 +46,60 @@ struct dpc_dev {
>>  	"Memory Request Completion Timeout",		 /* Bit Position 18 */
>>  };
>> 
>> +static int find_dpc_dev_iter(struct device *device, void *data)
>> +{
>> +	struct pcie_port_service_driver *service_driver;
>> +	struct device **dev;
>> +
>> +	dev = (struct device **) data;
>> +
>> +	if (device->bus == &pcie_port_bus_type && device->driver) {
>> +		service_driver = to_service_driver(device->driver);
> 
> Please move the initial assignment to the declaration line to
> clean this up a bit:
> 
> 	struct device **dev = (struct device **)data;
> 
> Same thing happens a couple more times in this patch.
> 
>> +EXPORT_SYMBOL(pci_find_dpc_service);
> 
> EXPORT_SYMBOL_GPL for PCI layer internals again - and ditto for
> probably about everything in this series.

will take care of all these comments.

> 
>>  #if IS_ENABLED(CONFIG_PCIEAER)
>> -	/* Use the aer driver of the component firstly */
>> -	driver = pci_find_aer_service(udev);
>> +	if ((severity == AER_FATAL) ||
>> +	    (severity == AER_NONFATAL) ||
>> +	    (severity == AER_CORRECTABLE))
>> +		driver = pci_find_aer_service(udev);
>>  #endif
> 
> No need for the inner braces here.
> 
>> -	if (severity == AER_FATAL)
>> +	if ((severity == AER_FATAL) ||
>> +	    (severity == DPC_FATAL))
>>  		state = pci_channel_io_frozen;
>>  	else
>>  		state = pci_channel_io_normal;
> 
> Same here and a few more times below.

will take care
Thanks,
Oza.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v10 6/7] PCI: Unify wait for link active into generic pci
  2018-02-22 18:43   ` Christoph Hellwig
@ 2018-02-23  6:53     ` poza
  0 siblings, 0 replies; 19+ messages in thread
From: poza @ 2018-02-23  6:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi

On 2018-02-23 00:13, Christoph Hellwig wrote:
>> 
>> +/**
>> + * pci__wait_for_link - Wait for link till its active/inactive
> 
> typo - just wants a single underscore.
> 
>> +	pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
>> +	ret = !!(lnk_status & PCI_EXP_LNKSTA_DLLLA);
> 
> no need for the !! when assigning to a boolean.

bool is 8 bit, while
#define  PCI_EXP_LNKSTA_DLLLA	0x2000	/* Data Link Layer Link Active */
so I think we need it. otherwise it will treat lnk_status & 
PCI_EXP_LNKSTA_DLLLA as 0 even if 13th bit is set.

> 
>> +
>> +	while ((ret != active) && (timeout > 0)) {
> 
> No need for either pair of inner braces.
> 
>> +		msleep(10);
>> +		timeout -= 10;
>> +		pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
>> +		ret = !!(lnk_status & PCI_EXP_LNKSTA_DLLLA);
> 
> Same as above.
> 
>> +	}
>> +
>> +	if (ret == active)
>> +		return true;
> 
> Seems like the structure is a bit odd.  Why not:
> 
> 	for (;;) {
> 		pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
> 		if ((lnk_status & PCI_EXP_LNKSTA_DLLLA) == active)
> 			return true;
> 		if (timeout <= 0)
> 			break;
> 		timeout -= 10;
> 	}

Thanks for suggestion, will do this way. thanks.
Regards,
Oza.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2018-02-23  6:53 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-22 16:02 [PATCH v10 0/7] Address error and recovery for AER and DPC Oza Pawandeep
2018-02-22 16:02 ` [PATCH v10 1/7] PCI/AER: Rename error recovery to generic pci naming Oza Pawandeep
2018-02-22 18:34   ` Christoph Hellwig
2018-02-23  5:24     ` poza
2018-02-22 16:02 ` [PATCH v10 2/7] PCI/AER: factor out error reporting from AER Oza Pawandeep
2018-02-22 18:34   ` Christoph Hellwig
2018-02-22 18:42     ` David Laight
2018-02-22 16:02 ` [PATCH v10 3/7] PCI/ERR: add mutex to synchronize recovery Oza Pawandeep
2018-02-22 18:36   ` Christoph Hellwig
2018-02-23  5:27     ` poza
2018-02-22 16:02 ` [PATCH v10 4/7] PCI/DPC: Unify and plumb error handling into DPC Oza Pawandeep
2018-02-22 18:39   ` Christoph Hellwig
2018-02-23  5:28     ` poza
2018-02-22 16:02 ` [PATCH v10 5/7] PCI/AER: Unify aer error defines at single space Oza Pawandeep
2018-02-22 16:06   ` poza
2018-02-22 16:02 ` [PATCH v10 6/7] PCI: Unify wait for link active into generic pci Oza Pawandeep
2018-02-22 18:43   ` Christoph Hellwig
2018-02-23  6:53     ` poza
2018-02-22 16:02 ` [PATCH v10 7/7] PCI/DPC: Enumerate the devices after DPC trigger event Oza Pawandeep

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.