All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/13] EEH refactoring 2
@ 2018-05-25  3:11 Sam Bobroff
  2018-05-25  3:11 ` [PATCH v2 01/13] powerpc/eeh: Add eeh_max_freezes to initial EEH log line Sam Bobroff
                   ` (12 more replies)
  0 siblings, 13 replies; 18+ messages in thread
From: Sam Bobroff @ 2018-05-25  3:11 UTC (permalink / raw)
  To: linuxppc-dev

Hello everyone,

Here is a second, somewhat deeper, set of cleanups for the EEH code
(mostly eeh_drver.c).

These changes are not intended to significantly alter the actual processing,
but rather to improve the readability and maintainability of the code. They are
subjective by nature so I would appreciate comments and suggestions.

The earlier changes are mostly to support the last patch, where we're finally
able to use a common infrastructure for the reporting functions (basically
wrappers around the driver's handlers). This allows removal of a fair bit of
code, and the easy addition of some useful messaging which should make future
maintenance easier (as an example, a recent fix in this area "powerpc/eeh: Fix
race with driver un/bind" would have required adding two lines rather than
42+/26-).

Cheers,
Sam.

Patch set changelog follows:

====== v1 -> v2: ======

Patch 1/13: powerpc/eeh: Add eeh_max_freezes to initial EEH log line

Patch 2/13: powerpc/eeh: Add final message for successful recovery

Patch 3/13: powerpc/eeh: Fix use-after-release of EEH driver

Patch 4/13: powerpc/eeh: Remove unused eeh_pcid_name()

Patch 5/13: powerpc/eeh: Strengthen types of eeh traversal functions

Patch 6/13: powerpc/eeh: Add message when PE processing at parent

Patch 7/13: powerpc/eeh: Clean up pci_ers_result handling
* Added the value, and missing newline, to some WARN()s.
* Improved name of merge_result() to pci_ers_merge_result().
* Adjusted the result priorities so that unknown doesn't overlap with _NONE.

Patch 8/13: powerpc/eeh: Introduce eeh_for_each_pe()

Patch 9/13: powerpc/eeh: Introduce eeh_edev_actionable()

Patch 10/13: powerpc/eeh: Introduce eeh_set_channel_state()

Patch 11/13: powerpc/eeh: Introduce eeh_set_irq_state()
* Reorganised eeh_set_irq_state() to reduce nesting depth.

Patch 12/13: powerpc/eeh: Cleaner handling of EEH_DEV_NO_HANDLER

Patch 13/13: powerpc/eeh: Refactor report functions
* Better name for eeh_infoline() and implement using a function.
* Move the core of eeh_pe_report() into a new function to improve readability.
* pci_ers_result_name(): match parameter name to eeh_result_priority(), correct and improve warning message.

====== v1: ======

Patch 1/13: powerpc/eeh: Add eeh_max_freezes to initial EEH log line
Patch 2/13: powerpc/eeh: Add final message for successful recovery
Patch 3/13: powerpc/eeh: Fix use-after-release of EEH driver
Patch 4/13: powerpc/eeh: Remove unused eeh_pcid_name()
Patch 5/13: powerpc/eeh: Strengthen types of eeh traversal functions
Patch 6/13: powerpc/eeh: Add message when PE processing at parent
Patch 7/13: powerpc/eeh: Clean up pci_ers_result handling
Patch 8/13: powerpc/eeh: Introduce eeh_for_each_pe()
Patch 9/13: powerpc/eeh: Introduce eeh_edev_actionable()
Patch 10/13: powerpc/eeh: Introduce eeh_set_channel_state()
Patch 11/13: powerpc/eeh: Introduce eeh_set_irq_state()
Patch 12/13: powerpc/eeh: Cleaner handling of EEH_DEV_NO_HANDLER
Patch 13/13: powerpc/eeh: Refactor report functions

Sam Bobroff (13):
  powerpc/eeh: Add eeh_max_freezes to initial EEH log line
  powerpc/eeh: Add final message for successful recovery
  powerpc/eeh: Fix use-after-release of EEH driver
  powerpc/eeh: Remove unused eeh_pcid_name()
  powerpc/eeh: Strengthen types of eeh traversal functions
  powerpc/eeh: Add message when PE processing at parent
  powerpc/eeh: Clean up pci_ers_result handling
  powerpc/eeh: Introduce eeh_for_each_pe()
  powerpc/eeh: Introduce eeh_edev_actionable()
  powerpc/eeh: Introduce eeh_set_channel_state()
  powerpc/eeh: Introduce eeh_set_irq_state()
  powerpc/eeh: Cleaner handling of EEH_DEV_NO_HANDLER
  powerpc/eeh: Refactor report functions

 arch/powerpc/include/asm/eeh.h   |  11 +-
 arch/powerpc/kernel/eeh.c        |  19 +-
 arch/powerpc/kernel/eeh_driver.c | 493 +++++++++++++++++++++------------------
 arch/powerpc/kernel/eeh_pe.c     |  26 +--
 4 files changed, 294 insertions(+), 255 deletions(-)

-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 01/13] powerpc/eeh: Add eeh_max_freezes to initial EEH log line
  2018-05-25  3:11 [PATCH v2 00/13] EEH refactoring 2 Sam Bobroff
@ 2018-05-25  3:11 ` Sam Bobroff
  2018-06-04 14:11   ` [v2, " Michael Ellerman
  2018-05-25  3:11 ` [PATCH v2 02/13] powerpc/eeh: Add final message for successful recovery Sam Bobroff
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 18+ messages in thread
From: Sam Bobroff @ 2018-05-25  3:11 UTC (permalink / raw)
  To: linuxppc-dev

The current failure message includes the number of failures that have
occurred in the last hour (for a device) but it does not indicate
how many failures will be tolerated before the device is permanently
disabled.

Include the limit (eeh_max_freezes) to make this less surprising when
it happens.

Also remove the embedded newline from the existing message to make it
easier to grep for.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index b8a329f04814..56a60b9eb397 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -778,14 +778,13 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	eeh_pe_update_time_stamp(pe);
 	pe->freeze_count++;
 	if (pe->freeze_count > eeh_max_freezes) {
-		pr_err("EEH: PHB#%x-PE#%x has failed %d times in the\n"
-		       "last hour and has been permanently disabled.\n",
+		pr_err("EEH: PHB#%x-PE#%x has failed %d times in the last hour and has been permanently disabled.\n",
 		       pe->phb->global_number, pe->addr,
 		       pe->freeze_count);
 		goto hard_fail;
 	}
-	pr_warn("EEH: This PCI device has failed %d times in the last hour\n",
-		pe->freeze_count);
+	pr_warn("EEH: This PCI device has failed %d times in the last hour and will be permanently disabled after %d failures.\n",
+		pe->freeze_count, eeh_max_freezes);
 
 	/* Walk the various device drivers attached to this slot through
 	 * a reset sequence, giving each an opportunity to do what it needs
-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 02/13] powerpc/eeh: Add final message for successful recovery
  2018-05-25  3:11 [PATCH v2 00/13] EEH refactoring 2 Sam Bobroff
  2018-05-25  3:11 ` [PATCH v2 01/13] powerpc/eeh: Add eeh_max_freezes to initial EEH log line Sam Bobroff
@ 2018-05-25  3:11 ` Sam Bobroff
  2018-05-25  3:11 ` [PATCH v2 03/13] powerpc/eeh: Fix use-after-release of EEH driver Sam Bobroff
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 18+ messages in thread
From: Sam Bobroff @ 2018-05-25  3:11 UTC (permalink / raw)
  To: linuxppc-dev

Add a single log line at the end of successful EEH recovery, so that
it's clear that event processing has finished.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 56a60b9eb397..07e0a42035ce 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -910,6 +910,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	pr_info("EEH: Notify device driver to resume\n");
 	eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
 
+	pr_info("EEH: Recovery successful.\n");
 	goto final;
 
 hard_fail:
-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 03/13] powerpc/eeh: Fix use-after-release of EEH driver
  2018-05-25  3:11 [PATCH v2 00/13] EEH refactoring 2 Sam Bobroff
  2018-05-25  3:11 ` [PATCH v2 01/13] powerpc/eeh: Add eeh_max_freezes to initial EEH log line Sam Bobroff
  2018-05-25  3:11 ` [PATCH v2 02/13] powerpc/eeh: Add final message for successful recovery Sam Bobroff
@ 2018-05-25  3:11 ` Sam Bobroff
  2018-05-25  3:11 ` [PATCH v2 04/13] powerpc/eeh: Remove unused eeh_pcid_name() Sam Bobroff
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 18+ messages in thread
From: Sam Bobroff @ 2018-05-25  3:11 UTC (permalink / raw)
  To: linuxppc-dev

Correct two cases where eeh_pcid_get() is used to reference the driver's
module but the reference is dropped before the driver pointer is used.

In eeh_rmv_device() also refactor a little so that only two calls to
eeh_pcid_put() are needed, rather than three and the reference isn't
taken at all if it wasn't needed.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 07e0a42035ce..54333f6c9d67 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -458,9 +458,11 @@ static void *eeh_add_virt_device(void *data, void *userdata)
 
 	driver = eeh_pcid_get(dev);
 	if (driver) {
-		eeh_pcid_put(dev);
-		if (driver->err_handler)
+		if (driver->err_handler) {
+			eeh_pcid_put(dev);
 			return NULL;
+		}
+		eeh_pcid_put(dev);
 	}
 
 #ifdef CONFIG_PCI_IOV
@@ -497,17 +499,19 @@ static void *eeh_rmv_device(void *data, void *userdata)
 	if (eeh_dev_removed(edev))
 		return NULL;
 
-	driver = eeh_pcid_get(dev);
-	if (driver) {
-		eeh_pcid_put(dev);
-		if (removed &&
-		    eeh_pe_passed(edev->pe))
-			return NULL;
-		if (removed &&
-		    driver->err_handler &&
-		    driver->err_handler->error_detected &&
-		    driver->err_handler->slot_reset)
+	if (removed) {
+		if (eeh_pe_passed(edev->pe))
 			return NULL;
+		driver = eeh_pcid_get(dev);
+		if (driver) {
+			if (driver->err_handler &&
+			    driver->err_handler->error_detected &&
+			    driver->err_handler->slot_reset) {
+				eeh_pcid_put(dev);
+				return NULL;
+			}
+			eeh_pcid_put(dev);
+		}
 	}
 
 	/* Remove it from PCI subsystem */
-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 04/13] powerpc/eeh: Remove unused eeh_pcid_name()
  2018-05-25  3:11 [PATCH v2 00/13] EEH refactoring 2 Sam Bobroff
                   ` (2 preceding siblings ...)
  2018-05-25  3:11 ` [PATCH v2 03/13] powerpc/eeh: Fix use-after-release of EEH driver Sam Bobroff
@ 2018-05-25  3:11 ` Sam Bobroff
  2018-05-25  3:11 ` [PATCH v2 05/13] powerpc/eeh: Strengthen types of eeh traversal functions Sam Bobroff
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 18+ messages in thread
From: Sam Bobroff @ 2018-05-25  3:11 UTC (permalink / raw)
  To: linuxppc-dev

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 54333f6c9d67..ca9a73fe9cc5 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -39,20 +39,6 @@ struct eeh_rmv_data {
 	int removed;
 };
 
-/**
- * eeh_pcid_name - Retrieve name of PCI device driver
- * @pdev: PCI device
- *
- * This routine is used to retrieve the name of PCI device driver
- * if that's valid.
- */
-static inline const char *eeh_pcid_name(struct pci_dev *pdev)
-{
-	if (pdev && pdev->dev.driver)
-		return pdev->dev.driver->name;
-	return "";
-}
-
 /**
  * eeh_pcid_get - Get the PCI device driver
  * @pdev: PCI device
-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 05/13] powerpc/eeh: Strengthen types of eeh traversal functions
  2018-05-25  3:11 [PATCH v2 00/13] EEH refactoring 2 Sam Bobroff
                   ` (3 preceding siblings ...)
  2018-05-25  3:11 ` [PATCH v2 04/13] powerpc/eeh: Remove unused eeh_pcid_name() Sam Bobroff
@ 2018-05-25  3:11 ` Sam Bobroff
  2018-05-25  3:11 ` [PATCH v2 06/13] powerpc/eeh: Add message when PE processing at parent Sam Bobroff
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 18+ messages in thread
From: Sam Bobroff @ 2018-05-25  3:11 UTC (permalink / raw)
  To: linuxppc-dev

The traversal functions eeh_pe_traverse() and eeh_pe_dev_traverse()
both provide their first argument as void * but every single user casts
it to the expected type.

Change the type of the first parameter from void * to the appropriate
type, and clean up all uses.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/include/asm/eeh.h   |  7 ++++---
 arch/powerpc/kernel/eeh.c        | 13 +++++--------
 arch/powerpc/kernel/eeh_driver.c | 30 ++++++++++--------------------
 arch/powerpc/kernel/eeh_pe.c     | 19 +++++++------------
 4 files changed, 26 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index c2266ca61853..f02e0400e6f2 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -262,7 +262,8 @@ static inline bool eeh_state_active(int state)
 	== (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE);
 }
 
-typedef void *(*eeh_traverse_func)(void *data, void *flag);
+typedef void *(*eeh_edev_traverse_func)(struct eeh_dev *edev, void *flag);
+typedef void *(*eeh_pe_traverse_func)(struct eeh_pe *pe, void *flag);
 void eeh_set_pe_aux_size(int size);
 int eeh_phb_pe_create(struct pci_controller *phb);
 struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb);
@@ -272,9 +273,9 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev);
 int eeh_rmv_from_parent_pe(struct eeh_dev *edev);
 void eeh_pe_update_time_stamp(struct eeh_pe *pe);
 void *eeh_pe_traverse(struct eeh_pe *root,
-		eeh_traverse_func fn, void *flag);
+		      eeh_pe_traverse_func fn, void *flag);
 void *eeh_pe_dev_traverse(struct eeh_pe *root,
-		eeh_traverse_func fn, void *flag);
+			  eeh_edev_traverse_func fn, void *flag);
 void eeh_pe_restore_bars(struct eeh_pe *pe);
 const char *eeh_pe_loc_get(struct eeh_pe *pe);
 struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index bc640e4c5ca5..f82dade4fb9a 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -263,9 +263,8 @@ static size_t eeh_dump_dev_log(struct eeh_dev *edev, char *buf, size_t len)
 	return n;
 }
 
-static void *eeh_dump_pe_log(void *data, void *flag)
+static void *eeh_dump_pe_log(struct eeh_pe *pe, void *flag)
 {
-	struct eeh_pe *pe = data;
 	struct eeh_dev *edev, *tmp;
 	size_t *plen = flag;
 
@@ -686,9 +685,9 @@ int eeh_pci_enable(struct eeh_pe *pe, int function)
 	return rc;
 }
 
-static void *eeh_disable_and_save_dev_state(void *data, void *userdata)
+static void *eeh_disable_and_save_dev_state(struct eeh_dev *edev,
+					    void *userdata)
 {
-	struct eeh_dev *edev = data;
 	struct pci_dev *pdev = eeh_dev_to_pci_dev(edev);
 	struct pci_dev *dev = userdata;
 
@@ -714,9 +713,8 @@ static void *eeh_disable_and_save_dev_state(void *data, void *userdata)
 	return NULL;
 }
 
-static void *eeh_restore_dev_state(void *data, void *userdata)
+static void *eeh_restore_dev_state(struct eeh_dev *edev, void *userdata)
 {
-	struct eeh_dev *edev = data;
 	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
 	struct pci_dev *pdev = eeh_dev_to_pci_dev(edev);
 	struct pci_dev *dev = userdata;
@@ -856,11 +854,10 @@ int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state stat
  * the indicated device and its children so that the bunch of the
  * devices could be reset properly.
  */
-static void *eeh_set_dev_freset(void *data, void *flag)
+static void *eeh_set_dev_freset(struct eeh_dev *edev, void *flag)
 {
 	struct pci_dev *dev;
 	unsigned int *freset = (unsigned int *)flag;
-	struct eeh_dev *edev = (struct eeh_dev *)data;
 
 	dev = eeh_dev_to_pci_dev(edev);
 	if (dev)
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index ca9a73fe9cc5..188d15c4fe3a 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -149,9 +149,8 @@ static bool eeh_dev_removed(struct eeh_dev *edev)
 	return false;
 }
 
-static void *eeh_dev_save_state(void *data, void *userdata)
+static void *eeh_dev_save_state(struct eeh_dev *edev, void *userdata)
 {
-	struct eeh_dev *edev = data;
 	struct pci_dev *pdev;
 
 	if (!edev)
@@ -184,9 +183,8 @@ static void *eeh_dev_save_state(void *data, void *userdata)
  * merge the device driver responses. Cumulative response
  * passed back in "userdata".
  */
-static void *eeh_report_error(void *data, void *userdata)
+static void *eeh_report_error(struct eeh_dev *edev, void *userdata)
 {
-	struct eeh_dev *edev = (struct eeh_dev *)data;
 	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
 	enum pci_ers_result rc, *res = userdata;
 	struct pci_driver *driver;
@@ -231,9 +229,8 @@ static void *eeh_report_error(void *data, void *userdata)
  * are now enabled. Collects up and merges the device driver responses.
  * Cumulative response passed back in "userdata".
  */
-static void *eeh_report_mmio_enabled(void *data, void *userdata)
+static void *eeh_report_mmio_enabled(struct eeh_dev *edev, void *userdata)
 {
-	struct eeh_dev *edev = (struct eeh_dev *)data;
 	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
 	enum pci_ers_result rc, *res = userdata;
 	struct pci_driver *driver;
@@ -273,9 +270,8 @@ static void *eeh_report_mmio_enabled(void *data, void *userdata)
  * some actions, usually to save data the driver needs so that the
  * driver can work again while the device is recovered.
  */
-static void *eeh_report_reset(void *data, void *userdata)
+static void *eeh_report_reset(struct eeh_dev *edev, void *userdata)
 {
-	struct eeh_dev *edev = (struct eeh_dev *)data;
 	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
 	enum pci_ers_result rc, *res = userdata;
 	struct pci_driver *driver;
@@ -310,9 +306,8 @@ static void *eeh_report_reset(void *data, void *userdata)
 	return NULL;
 }
 
-static void *eeh_dev_restore_state(void *data, void *userdata)
+static void *eeh_dev_restore_state(struct eeh_dev *edev, void *userdata)
 {
-	struct eeh_dev *edev = data;
 	struct pci_dev *pdev;
 
 	if (!edev)
@@ -348,9 +343,8 @@ static void *eeh_dev_restore_state(void *data, void *userdata)
  * could resume so that the device driver can do some initialization
  * to make the recovered device work again.
  */
-static void *eeh_report_resume(void *data, void *userdata)
+static void *eeh_report_resume(struct eeh_dev *edev, void *userdata)
 {
-	struct eeh_dev *edev = (struct eeh_dev *)data;
 	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
 	bool was_in_error;
 	struct pci_driver *driver;
@@ -397,9 +391,8 @@ static void *eeh_report_resume(void *data, void *userdata)
  * This informs the device driver that the device is permanently
  * dead, and that no further recovery attempts will be made on it.
  */
-static void *eeh_report_failure(void *data, void *userdata)
+static void *eeh_report_failure(struct eeh_dev *edev, void *userdata)
 {
-	struct eeh_dev *edev = (struct eeh_dev *)data;
 	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
 	struct pci_driver *driver;
 
@@ -457,10 +450,9 @@ static void *eeh_add_virt_device(void *data, void *userdata)
 	return NULL;
 }
 
-static void *eeh_rmv_device(void *data, void *userdata)
+static void *eeh_rmv_device(struct eeh_dev *edev, void *userdata)
 {
 	struct pci_driver *driver;
-	struct eeh_dev *edev = (struct eeh_dev *)data;
 	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
 	struct eeh_rmv_data *rmv_data = (struct eeh_rmv_data *)userdata;
 	int *removed = rmv_data ? &rmv_data->removed : NULL;
@@ -532,9 +524,8 @@ static void *eeh_rmv_device(void *data, void *userdata)
 	return NULL;
 }
 
-static void *eeh_pe_detach_dev(void *data, void *userdata)
+static void *eeh_pe_detach_dev(struct eeh_pe *pe, void *userdata)
 {
-	struct eeh_pe *pe = (struct eeh_pe *)data;
 	struct eeh_dev *edev, *tmp;
 
 	eeh_pe_for_each_dev(pe, edev, tmp) {
@@ -555,9 +546,8 @@ static void *eeh_pe_detach_dev(void *data, void *userdata)
  * PE reset (for 3 times), we try to clear the frozen state
  * for 3 times as well.
  */
-static void *__eeh_clear_pe_frozen_state(void *data, void *flag)
+static void *__eeh_clear_pe_frozen_state(struct eeh_pe *pe, void *flag)
 {
-	struct eeh_pe *pe = (struct eeh_pe *)data;
 	bool clear_sw_state = *(bool *)flag;
 	int i, rc = 1;
 
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index ee5a67d57aab..38a4bcd8ed13 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -173,7 +173,7 @@ static struct eeh_pe *eeh_pe_next(struct eeh_pe *pe,
  * to be traversed.
  */
 void *eeh_pe_traverse(struct eeh_pe *root,
-		      eeh_traverse_func fn, void *flag)
+		      eeh_pe_traverse_func fn, void *flag)
 {
 	struct eeh_pe *pe;
 	void *ret;
@@ -196,7 +196,7 @@ void *eeh_pe_traverse(struct eeh_pe *root,
  * PE and its child PEs.
  */
 void *eeh_pe_dev_traverse(struct eeh_pe *root,
-		eeh_traverse_func fn, void *flag)
+			  eeh_edev_traverse_func fn, void *flag)
 {
 	struct eeh_pe *pe;
 	struct eeh_dev *edev, *tmp;
@@ -235,9 +235,8 @@ struct eeh_pe_get_flag {
 	int config_addr;
 };
 
-static void *__eeh_pe_get(void *data, void *flag)
+static void *__eeh_pe_get(struct eeh_pe *pe, void *flag)
 {
-	struct eeh_pe *pe = (struct eeh_pe *)data;
 	struct eeh_pe_get_flag *tmp = (struct eeh_pe_get_flag *) flag;
 
 	/* Unexpected PHB PE */
@@ -551,9 +550,8 @@ void eeh_pe_update_time_stamp(struct eeh_pe *pe)
  * PE. Also, the associated PCI devices will be put into IO frozen
  * state as well.
  */
-static void *__eeh_pe_state_mark(void *data, void *flag)
+static void *__eeh_pe_state_mark(struct eeh_pe *pe, void *flag)
 {
-	struct eeh_pe *pe = (struct eeh_pe *)data;
 	int state = *((int *)flag);
 	struct eeh_dev *edev, *tmp;
 	struct pci_dev *pdev;
@@ -595,9 +593,8 @@ void eeh_pe_state_mark(struct eeh_pe *pe, int state)
 }
 EXPORT_SYMBOL_GPL(eeh_pe_state_mark);
 
-static void *__eeh_pe_dev_mode_mark(void *data, void *flag)
+static void *__eeh_pe_dev_mode_mark(struct eeh_dev *edev, void *flag)
 {
-	struct eeh_dev *edev = data;
 	int mode = *((int *)flag);
 
 	edev->mode |= mode;
@@ -625,9 +622,8 @@ void eeh_pe_dev_mode_mark(struct eeh_pe *pe, int mode)
  * given PE. Besides, we also clear the check count of the PE
  * as well.
  */
-static void *__eeh_pe_state_clear(void *data, void *flag)
+static void *__eeh_pe_state_clear(struct eeh_pe *pe, void *flag)
 {
-	struct eeh_pe *pe = (struct eeh_pe *)data;
 	int state = *((int *)flag);
 	struct eeh_dev *edev, *tmp;
 	struct pci_dev *pdev;
@@ -858,9 +854,8 @@ static void eeh_restore_device_bars(struct eeh_dev *edev)
  * the expansion ROM base address, the latency timer, and etc.
  * from the saved values in the device node.
  */
-static void *eeh_restore_one_device_bars(void *data, void *flag)
+static void *eeh_restore_one_device_bars(struct eeh_dev *edev, void *flag)
 {
-	struct eeh_dev *edev = (struct eeh_dev *)data;
 	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
 
 	/* Do special restore for bridges */
-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 06/13] powerpc/eeh: Add message when PE processing at parent
  2018-05-25  3:11 [PATCH v2 00/13] EEH refactoring 2 Sam Bobroff
                   ` (4 preceding siblings ...)
  2018-05-25  3:11 ` [PATCH v2 05/13] powerpc/eeh: Strengthen types of eeh traversal functions Sam Bobroff
@ 2018-05-25  3:11 ` Sam Bobroff
  2018-05-25  3:11 ` [PATCH v2 07/13] powerpc/eeh: Clean up pci_ers_result handling Sam Bobroff
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 18+ messages in thread
From: Sam Bobroff @ 2018-05-25  3:11 UTC (permalink / raw)
  To: linuxppc-dev

To aid debugging, add a message to show when EEH processing for a PE
will be done at the device's parent, rather than directly at the
device.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/kernel/eeh.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index f82dade4fb9a..1139821a9aec 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -541,8 +541,12 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 
 		/* Frozen parent PE ? */
 		ret = eeh_ops->get_state(parent_pe, NULL);
-		if (ret > 0 && !eeh_state_active(ret))
+		if (ret > 0 && !eeh_state_active(ret)) {
 			pe = parent_pe;
+			pr_err("EEH: Failure of PHB#%x-PE#%x will be handled at parent PHB#%x-PE#%x.\n",
+			       pe->phb->global_number, pe->addr,
+			       pe->phb->global_number, parent_pe->addr);
+		}
 
 		/* Next parent level */
 		parent_pe = parent_pe->parent;
-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 07/13] powerpc/eeh: Clean up pci_ers_result handling
  2018-05-25  3:11 [PATCH v2 00/13] EEH refactoring 2 Sam Bobroff
                   ` (5 preceding siblings ...)
  2018-05-25  3:11 ` [PATCH v2 06/13] powerpc/eeh: Add message when PE processing at parent Sam Bobroff
@ 2018-05-25  3:11 ` Sam Bobroff
  2018-06-01 15:40   ` Michael Ellerman
  2018-05-25  3:11 ` [PATCH v2 08/13] powerpc/eeh: Introduce eeh_for_each_pe() Sam Bobroff
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 18+ messages in thread
From: Sam Bobroff @ 2018-05-25  3:11 UTC (permalink / raw)
  To: linuxppc-dev

As EEH event handling progresses, a cumulative result of type
pci_ers_result is built up by (some of) the eeh_report_*() functions
using either:
	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
or:
	if ((*res == PCI_ERS_RESULT_NONE) ||
	    (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
	if (*res == PCI_ERS_RESULT_DISCONNECT &&
	    rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
(Where *res is the accumulator.)

However, the intent is not immediately clear and the result in some
situations is order dependent.

Address this by assigning a priority to each result value, and always
merging to the highest priority. This renders the intent clear, and
provides a stable value for all orderings.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
====== v1 -> v2: ======
* Added the value, and missing newline, to some WARN()s.
* Improved name of merge_result() to pci_ers_merge_result().
* Adjusted the result priorities so that unknown doesn't overlap with _NONE.

 arch/powerpc/kernel/eeh_driver.c | 36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 188d15c4fe3a..2d3cac584899 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -39,6 +39,29 @@ struct eeh_rmv_data {
 	int removed;
 };
 
+static int eeh_result_priority(enum pci_ers_result result)
+{
+	switch (result) {
+	case PCI_ERS_RESULT_NONE: return 1;
+	case PCI_ERS_RESULT_NO_AER_DRIVER: return 2;
+	case PCI_ERS_RESULT_RECOVERED: return 3;
+	case PCI_ERS_RESULT_CAN_RECOVER: return 4;
+	case PCI_ERS_RESULT_DISCONNECT: return 5;
+	case PCI_ERS_RESULT_NEED_RESET: return 6;
+	default:
+		WARN_ONCE(1, "Unknown pci_ers_result value: %d\n", (int)result);
+		return 0;
+	}
+};
+
+static enum pci_ers_result pci_ers_merge_result(enum pci_ers_result old,
+						enum pci_ers_result new)
+{
+	if (eeh_result_priority(new) > eeh_result_priority(old))
+		return new;
+	return old;
+}
+
 /**
  * eeh_pcid_get - Get the PCI device driver
  * @pdev: PCI device
@@ -206,9 +229,7 @@ static void *eeh_report_error(struct eeh_dev *edev, void *userdata)
 
 	rc = driver->err_handler->error_detected(dev, pci_channel_io_frozen);
 
-	/* A driver that needs a reset trumps all others */
-	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
-	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
+	*res = pci_ers_merge_result(*res, rc);
 
 	edev->in_error = true;
 	pci_uevent_ers(dev, PCI_ERS_RESULT_NONE);
@@ -249,9 +270,7 @@ static void *eeh_report_mmio_enabled(struct eeh_dev *edev, void *userdata)
 
 	rc = driver->err_handler->mmio_enabled(dev);
 
-	/* A driver that needs a reset trumps all others */
-	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
-	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
+	*res = pci_ers_merge_result(*res, rc);
 
 out:
 	eeh_pcid_put(dev);
@@ -294,10 +313,7 @@ static void *eeh_report_reset(struct eeh_dev *edev, void *userdata)
 		goto out;
 
 	rc = driver->err_handler->slot_reset(dev);
-	if ((*res == PCI_ERS_RESULT_NONE) ||
-	    (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
-	if (*res == PCI_ERS_RESULT_DISCONNECT &&
-	     rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
+	*res = pci_ers_merge_result(*res, rc);
 
 out:
 	eeh_pcid_put(dev);
-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 08/13] powerpc/eeh: Introduce eeh_for_each_pe()
  2018-05-25  3:11 [PATCH v2 00/13] EEH refactoring 2 Sam Bobroff
                   ` (6 preceding siblings ...)
  2018-05-25  3:11 ` [PATCH v2 07/13] powerpc/eeh: Clean up pci_ers_result handling Sam Bobroff
@ 2018-05-25  3:11 ` Sam Bobroff
  2018-05-25  3:11 ` [PATCH v2 09/13] powerpc/eeh: Introduce eeh_edev_actionable() Sam Bobroff
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 18+ messages in thread
From: Sam Bobroff @ 2018-05-25  3:11 UTC (permalink / raw)
  To: linuxppc-dev

Add a for_each-style macro for iterating through PEs without the
boilerplate required by a traversal function. eeh_pe_next() is now
exported, as it is now used directly in place.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/include/asm/eeh.h | 4 ++++
 arch/powerpc/kernel/eeh_pe.c   | 7 +++----
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index f02e0400e6f2..677102baf3cd 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -106,6 +106,9 @@ struct eeh_pe {
 #define eeh_pe_for_each_dev(pe, edev, tmp) \
 		list_for_each_entry_safe(edev, tmp, &pe->edevs, list)
 
+#define eeh_for_each_pe(root, pe) \
+	for (pe = root; pe; pe = eeh_pe_next(pe, root))
+
 static inline bool eeh_pe_passed(struct eeh_pe *pe)
 {
 	return pe ? !!atomic_read(&pe->pass_dev_cnt) : false;
@@ -267,6 +270,7 @@ typedef void *(*eeh_pe_traverse_func)(struct eeh_pe *pe, void *flag);
 void eeh_set_pe_aux_size(int size);
 int eeh_phb_pe_create(struct pci_controller *phb);
 struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb);
+struct eeh_pe *eeh_pe_next(struct eeh_pe *pe, struct eeh_pe *root);
 struct eeh_pe *eeh_pe_get(struct pci_controller *phb,
 			  int pe_no, int config_addr);
 int eeh_add_to_parent_pe(struct eeh_dev *edev);
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 38a4bcd8ed13..1b238ecc553e 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -142,8 +142,7 @@ struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb)
  * The function is used to retrieve the next PE in the
  * hierarchy PE tree.
  */
-static struct eeh_pe *eeh_pe_next(struct eeh_pe *pe,
-				  struct eeh_pe *root)
+struct eeh_pe *eeh_pe_next(struct eeh_pe *pe, struct eeh_pe *root)
 {
 	struct list_head *next = pe->child_list.next;
 
@@ -178,7 +177,7 @@ void *eeh_pe_traverse(struct eeh_pe *root,
 	struct eeh_pe *pe;
 	void *ret;
 
-	for (pe = root; pe; pe = eeh_pe_next(pe, root)) {
+	eeh_for_each_pe(root, pe) {
 		ret = fn(pe, flag);
 		if (ret) return ret;
 	}
@@ -209,7 +208,7 @@ void *eeh_pe_dev_traverse(struct eeh_pe *root,
 	}
 
 	/* Traverse root PE */
-	for (pe = root; pe; pe = eeh_pe_next(pe, root)) {
+	eeh_for_each_pe(root, pe) {
 		eeh_pe_for_each_dev(pe, edev, tmp) {
 			ret = fn(edev, flag);
 			if (ret)
-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 09/13] powerpc/eeh: Introduce eeh_edev_actionable()
  2018-05-25  3:11 [PATCH v2 00/13] EEH refactoring 2 Sam Bobroff
                   ` (7 preceding siblings ...)
  2018-05-25  3:11 ` [PATCH v2 08/13] powerpc/eeh: Introduce eeh_for_each_pe() Sam Bobroff
@ 2018-05-25  3:11 ` Sam Bobroff
  2018-05-25  3:11 ` [PATCH v2 10/13] powerpc/eeh: Introduce eeh_set_channel_state() Sam Bobroff
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 18+ messages in thread
From: Sam Bobroff @ 2018-05-25  3:11 UTC (permalink / raw)
  To: linuxppc-dev

The same test is done in every EEH report function, so factor it out.

Since eeh_dev_removed() needs to be moved higher up in the file,
simplify it a little while we're at it.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 2d3cac584899..30898866418b 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -62,6 +62,17 @@ static enum pci_ers_result pci_ers_merge_result(enum pci_ers_result old,
 	return old;
 }
 
+static bool eeh_dev_removed(struct eeh_dev *edev)
+{
+	return !edev || (edev->mode & EEH_DEV_REMOVED);
+}
+
+static bool eeh_edev_actionable(struct eeh_dev *edev)
+{
+	return (edev->pdev && !eeh_dev_removed(edev) &&
+		!eeh_pe_passed(edev->pe));
+}
+
 /**
  * eeh_pcid_get - Get the PCI device driver
  * @pdev: PCI device
@@ -163,15 +174,6 @@ static void eeh_enable_irq(struct pci_dev *dev)
 	}
 }
 
-static bool eeh_dev_removed(struct eeh_dev *edev)
-{
-	/* EEH device removed ? */
-	if (!edev || (edev->mode & EEH_DEV_REMOVED))
-		return true;
-
-	return false;
-}
-
 static void *eeh_dev_save_state(struct eeh_dev *edev, void *userdata)
 {
 	struct pci_dev *pdev;
@@ -212,7 +214,7 @@ static void *eeh_report_error(struct eeh_dev *edev, void *userdata)
 	enum pci_ers_result rc, *res = userdata;
 	struct pci_driver *driver;
 
-	if (!dev || eeh_dev_removed(edev) || eeh_pe_passed(edev->pe))
+	if (!eeh_edev_actionable(edev))
 		return NULL;
 
 	device_lock(&dev->dev);
@@ -256,7 +258,7 @@ static void *eeh_report_mmio_enabled(struct eeh_dev *edev, void *userdata)
 	enum pci_ers_result rc, *res = userdata;
 	struct pci_driver *driver;
 
-	if (!dev || eeh_dev_removed(edev) || eeh_pe_passed(edev->pe))
+	if (!eeh_edev_actionable(edev))
 		return NULL;
 
 	device_lock(&dev->dev);
@@ -295,7 +297,7 @@ static void *eeh_report_reset(struct eeh_dev *edev, void *userdata)
 	enum pci_ers_result rc, *res = userdata;
 	struct pci_driver *driver;
 
-	if (!dev || eeh_dev_removed(edev) || eeh_pe_passed(edev->pe))
+	if (!eeh_edev_actionable(edev))
 		return NULL;
 
 	device_lock(&dev->dev);
@@ -365,7 +367,7 @@ static void *eeh_report_resume(struct eeh_dev *edev, void *userdata)
 	bool was_in_error;
 	struct pci_driver *driver;
 
-	if (!dev || eeh_dev_removed(edev) || eeh_pe_passed(edev->pe))
+	if (!eeh_edev_actionable(edev))
 		return NULL;
 
 	device_lock(&dev->dev);
@@ -412,7 +414,7 @@ static void *eeh_report_failure(struct eeh_dev *edev, void *userdata)
 	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
 	struct pci_driver *driver;
 
-	if (!dev || eeh_dev_removed(edev) || eeh_pe_passed(edev->pe))
+	if (!eeh_edev_actionable(edev))
 		return NULL;
 
 	device_lock(&dev->dev);
-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 10/13] powerpc/eeh: Introduce eeh_set_channel_state()
  2018-05-25  3:11 [PATCH v2 00/13] EEH refactoring 2 Sam Bobroff
                   ` (8 preceding siblings ...)
  2018-05-25  3:11 ` [PATCH v2 09/13] powerpc/eeh: Introduce eeh_edev_actionable() Sam Bobroff
@ 2018-05-25  3:11 ` Sam Bobroff
  2018-05-25  3:11 ` [PATCH v2 11/13] powerpc/eeh: Introduce eeh_set_irq_state() Sam Bobroff
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 18+ messages in thread
From: Sam Bobroff @ 2018-05-25  3:11 UTC (permalink / raw)
  To: linuxppc-dev

To ease future refactoring, extract setting of the channel state
from the report functions out into their own functions. This increases
the amount of code that is identical across all of the report
functions.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 30898866418b..76951ca701b2 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -199,6 +199,17 @@ static void *eeh_dev_save_state(struct eeh_dev *edev, void *userdata)
 	return NULL;
 }
 
+static void eeh_set_channel_state(struct eeh_pe *root, enum pci_channel_state s)
+{
+	struct eeh_pe *pe;
+	struct eeh_dev *edev, *tmp;
+
+	eeh_for_each_pe(root, pe)
+		eeh_pe_for_each_dev(pe, edev, tmp)
+			if (eeh_edev_actionable(edev))
+				edev->pdev->error_state = s;
+}
+
 /**
  * eeh_report_error - Report pci error to each device driver
  * @data: eeh device
@@ -218,7 +229,6 @@ static void *eeh_report_error(struct eeh_dev *edev, void *userdata)
 		return NULL;
 
 	device_lock(&dev->dev);
-	dev->error_state = pci_channel_io_frozen;
 
 	driver = eeh_pcid_get(dev);
 	if (!driver) goto out_no_dev;
@@ -301,7 +311,6 @@ static void *eeh_report_reset(struct eeh_dev *edev, void *userdata)
 		return NULL;
 
 	device_lock(&dev->dev);
-	dev->error_state = pci_channel_io_normal;
 
 	driver = eeh_pcid_get(dev);
 	if (!driver) goto out_no_dev;
@@ -371,7 +380,6 @@ static void *eeh_report_resume(struct eeh_dev *edev, void *userdata)
 		return NULL;
 
 	device_lock(&dev->dev);
-	dev->error_state = pci_channel_io_normal;
 
 	driver = eeh_pcid_get(dev);
 	if (!driver) goto out_no_dev;
@@ -795,6 +803,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	 * hotplug for this case.
 	 */
 	pr_info("EEH: Notify device drivers to shutdown\n");
+	eeh_set_channel_state(pe, pci_channel_io_frozen);
 	eeh_pe_dev_traverse(pe, eeh_report_error, &result);
 	if ((pe->type & EEH_PE_PHB) &&
 	    result != PCI_ERS_RESULT_NONE &&
@@ -885,6 +894,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 		pr_info("EEH: Notify device drivers "
 			"the completion of reset\n");
 		result = PCI_ERS_RESULT_NONE;
+		eeh_set_channel_state(pe, pci_channel_io_normal);
 		eeh_pe_dev_traverse(pe, eeh_report_reset, &result);
 	}
 
@@ -906,6 +916,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 
 	/* Tell all device drivers that they can resume operations */
 	pr_info("EEH: Notify device driver to resume\n");
+	eeh_set_channel_state(pe, pci_channel_io_normal);
 	eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
 
 	pr_info("EEH: Recovery successful.\n");
@@ -924,6 +935,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	eeh_slot_error_detail(pe, EEH_LOG_PERM);
 
 	/* Notify all devices that they're about to go down. */
+	eeh_set_channel_state(pe, pci_channel_io_perm_failure);
 	eeh_pe_dev_traverse(pe, eeh_report_failure, NULL);
 
 	/* Mark the PE to be removed permanently */
@@ -1033,6 +1045,7 @@ void eeh_handle_special_event(void)
 
 				/* Notify all devices to be down */
 				eeh_pe_state_clear(pe, EEH_PE_PRI_BUS);
+				eeh_set_channel_state(pe, pci_channel_io_perm_failure);
 				eeh_pe_dev_traverse(pe,
 					eeh_report_failure, NULL);
 				bus = eeh_pe_bus_get(phb_pe);
-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 11/13] powerpc/eeh: Introduce eeh_set_irq_state()
  2018-05-25  3:11 [PATCH v2 00/13] EEH refactoring 2 Sam Bobroff
                   ` (9 preceding siblings ...)
  2018-05-25  3:11 ` [PATCH v2 10/13] powerpc/eeh: Introduce eeh_set_channel_state() Sam Bobroff
@ 2018-05-25  3:11 ` Sam Bobroff
  2018-05-25  3:11 ` [PATCH v2 12/13] powerpc/eeh: Cleaner handling of EEH_DEV_NO_HANDLER Sam Bobroff
  2018-05-25  3:11 ` [PATCH v2 13/13] powerpc/eeh: Refactor report functions Sam Bobroff
  12 siblings, 0 replies; 18+ messages in thread
From: Sam Bobroff @ 2018-05-25  3:11 UTC (permalink / raw)
  To: linuxppc-dev

To ease future refactoring, extract calls to eeh_enable_irq() and
eeh_disable_irq() from the various report functions. This makes
the report functions initial sequences more similar, as well as making
the IRQ changes visible when reading eeh_handle_normal_event().

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
====== v1 -> v2: ======
* Reorganised eeh_set_irq_state() to reduce nesting depth.

 arch/powerpc/kernel/eeh_driver.c | 52 ++++++++++++++++++++++++++--------------
 1 file changed, 34 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 76951ca701b2..8dd2a33424a1 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -118,22 +118,20 @@ static inline void eeh_pcid_put(struct pci_dev *pdev)
  * do real work because EEH should freeze DMA transfers for those PCI
  * devices encountering EEH errors, which includes MSI or MSI-X.
  */
-static void eeh_disable_irq(struct pci_dev *dev)
+static void eeh_disable_irq(struct eeh_dev *edev)
 {
-	struct eeh_dev *edev = pci_dev_to_eeh_dev(dev);
-
 	/* Don't disable MSI and MSI-X interrupts. They are
 	 * effectively disabled by the DMA Stopped state
 	 * when an EEH error occurs.
 	 */
-	if (dev->msi_enabled || dev->msix_enabled)
+	if (edev->pdev->msi_enabled || edev->pdev->msix_enabled)
 		return;
 
-	if (!irq_has_action(dev->irq))
+	if (!irq_has_action(edev->pdev->irq))
 		return;
 
 	edev->mode |= EEH_DEV_IRQ_DISABLED;
-	disable_irq_nosync(dev->irq);
+	disable_irq_nosync(edev->pdev->irq);
 }
 
 /**
@@ -143,10 +141,8 @@ static void eeh_disable_irq(struct pci_dev *dev)
  * This routine must be called to enable interrupt while failed
  * device could be resumed.
  */
-static void eeh_enable_irq(struct pci_dev *dev)
+static void eeh_enable_irq(struct eeh_dev *edev)
 {
-	struct eeh_dev *edev = pci_dev_to_eeh_dev(dev);
-
 	if ((edev->mode) & EEH_DEV_IRQ_DISABLED) {
 		edev->mode &= ~EEH_DEV_IRQ_DISABLED;
 		/*
@@ -169,8 +165,8 @@ static void eeh_enable_irq(struct pci_dev *dev)
 		 *
 		 *	tglx
 		 */
-		if (irqd_irq_disabled(irq_get_irq_data(dev->irq)))
-			enable_irq(dev->irq);
+		if (irqd_irq_disabled(irq_get_irq_data(edev->pdev->irq)))
+			enable_irq(edev->pdev->irq);
 	}
 }
 
@@ -210,6 +206,29 @@ static void eeh_set_channel_state(struct eeh_pe *root, enum pci_channel_state s)
 				edev->pdev->error_state = s;
 }
 
+static void eeh_set_irq_state(struct eeh_pe *root, bool enable)
+{
+	struct eeh_pe *pe;
+	struct eeh_dev *edev, *tmp;
+
+	eeh_for_each_pe(root, pe) {
+		eeh_pe_for_each_dev(pe, edev, tmp) {
+			if (!eeh_edev_actionable(edev))
+				continue;
+
+			if (!eeh_pcid_get(edev->pdev))
+				continue;
+
+			if (enable)
+				eeh_enable_irq(edev);
+			else
+				eeh_disable_irq(edev);
+
+			eeh_pcid_put(edev->pdev);
+		}
+	}
+}
+
 /**
  * eeh_report_error - Report pci error to each device driver
  * @data: eeh device
@@ -233,8 +252,6 @@ static void *eeh_report_error(struct eeh_dev *edev, void *userdata)
 	driver = eeh_pcid_get(dev);
 	if (!driver) goto out_no_dev;
 
-	eeh_disable_irq(dev);
-
 	if (!driver->err_handler ||
 	    !driver->err_handler->error_detected)
 		goto out;
@@ -315,8 +332,6 @@ static void *eeh_report_reset(struct eeh_dev *edev, void *userdata)
 	driver = eeh_pcid_get(dev);
 	if (!driver) goto out_no_dev;
 
-	eeh_enable_irq(dev);
-
 	if (!driver->err_handler ||
 	    !driver->err_handler->slot_reset ||
 	    (edev->mode & EEH_DEV_NO_HANDLER) ||
@@ -386,7 +401,6 @@ static void *eeh_report_resume(struct eeh_dev *edev, void *userdata)
 
 	was_in_error = edev->in_error;
 	edev->in_error = false;
-	eeh_enable_irq(dev);
 
 	if (!driver->err_handler ||
 	    !driver->err_handler->resume ||
@@ -431,8 +445,6 @@ static void *eeh_report_failure(struct eeh_dev *edev, void *userdata)
 	driver = eeh_pcid_get(dev);
 	if (!driver) goto out_no_dev;
 
-	eeh_disable_irq(dev);
-
 	if (!driver->err_handler ||
 	    !driver->err_handler->error_detected)
 		goto out;
@@ -804,6 +816,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	 */
 	pr_info("EEH: Notify device drivers to shutdown\n");
 	eeh_set_channel_state(pe, pci_channel_io_frozen);
+	eeh_set_irq_state(pe, false);
 	eeh_pe_dev_traverse(pe, eeh_report_error, &result);
 	if ((pe->type & EEH_PE_PHB) &&
 	    result != PCI_ERS_RESULT_NONE &&
@@ -895,6 +908,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 			"the completion of reset\n");
 		result = PCI_ERS_RESULT_NONE;
 		eeh_set_channel_state(pe, pci_channel_io_normal);
+		eeh_set_irq_state(pe, true);
 		eeh_pe_dev_traverse(pe, eeh_report_reset, &result);
 	}
 
@@ -917,6 +931,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	/* Tell all device drivers that they can resume operations */
 	pr_info("EEH: Notify device driver to resume\n");
 	eeh_set_channel_state(pe, pci_channel_io_normal);
+	eeh_set_irq_state(pe, true);
 	eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
 
 	pr_info("EEH: Recovery successful.\n");
@@ -936,6 +951,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 
 	/* Notify all devices that they're about to go down. */
 	eeh_set_channel_state(pe, pci_channel_io_perm_failure);
+	eeh_set_irq_state(pe, false);
 	eeh_pe_dev_traverse(pe, eeh_report_failure, NULL);
 
 	/* Mark the PE to be removed permanently */
-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 12/13] powerpc/eeh: Cleaner handling of EEH_DEV_NO_HANDLER
  2018-05-25  3:11 [PATCH v2 00/13] EEH refactoring 2 Sam Bobroff
                   ` (10 preceding siblings ...)
  2018-05-25  3:11 ` [PATCH v2 11/13] powerpc/eeh: Introduce eeh_set_irq_state() Sam Bobroff
@ 2018-05-25  3:11 ` Sam Bobroff
  2018-05-25  3:11 ` [PATCH v2 13/13] powerpc/eeh: Refactor report functions Sam Bobroff
  12 siblings, 0 replies; 18+ messages in thread
From: Sam Bobroff @ 2018-05-25  3:11 UTC (permalink / raw)
  To: linuxppc-dev

If a device without a driver is recovered via EEH, the flag
EEH_DEV_NO_HANDLER is incorrectly left set on the device after
recovery, because the test in eeh_report_resume() for the existence of
a bound driver is done before the flag is cleared. If a driver is
later bound, and EEH experienced again, some of the drivers EEH
handers are not called.

To correct this, clear the flag unconditionally after EEH processing
is complete.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 8dd2a33424a1..42fe80ed6f59 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -405,7 +405,6 @@ static void *eeh_report_resume(struct eeh_dev *edev, void *userdata)
 	if (!driver->err_handler ||
 	    !driver->err_handler->resume ||
 	    (edev->mode & EEH_DEV_NO_HANDLER) || !was_in_error) {
-		edev->mode &= ~EEH_DEV_NO_HANDLER;
 		goto out;
 	}
 
@@ -780,6 +779,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 {
 	struct pci_bus *bus;
 	struct eeh_dev *edev, *tmp;
+	struct eeh_pe *tmp_pe;
 	int rc = 0;
 	enum pci_ers_result result = PCI_ERS_RESULT_NONE;
 	struct eeh_rmv_data rmv_data = {LIST_HEAD_INIT(rmv_data.edev_list), 0};
@@ -934,6 +934,10 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	eeh_set_irq_state(pe, true);
 	eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
 
+	eeh_for_each_pe(pe, tmp_pe)
+		eeh_pe_for_each_dev(tmp_pe, edev, tmp)
+			edev->mode &= ~EEH_DEV_NO_HANDLER;
+
 	pr_info("EEH: Recovery successful.\n");
 	goto final;
 
-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 13/13] powerpc/eeh: Refactor report functions
  2018-05-25  3:11 [PATCH v2 00/13] EEH refactoring 2 Sam Bobroff
                   ` (11 preceding siblings ...)
  2018-05-25  3:11 ` [PATCH v2 12/13] powerpc/eeh: Cleaner handling of EEH_DEV_NO_HANDLER Sam Bobroff
@ 2018-05-25  3:11 ` Sam Bobroff
  12 siblings, 0 replies; 18+ messages in thread
From: Sam Bobroff @ 2018-05-25  3:11 UTC (permalink / raw)
  To: linuxppc-dev

The EEH report functions now share a fair bit of code around the start
and end of each function.

So factor out as much as possible, and move the traversal into a
custom function. This also allows accurate debug to be generated more
easily.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
====== v1 -> v2: ======
* Better name for eeh_infoline() and implement using a function.
* Move the core of eeh_pe_report() into a new function to improve readability.
* pci_ers_result_name(): match parameter name to eeh_result_priority(), correct and improve warning message.

 arch/powerpc/kernel/eeh_driver.c | 310 ++++++++++++++++++++-------------------
 1 file changed, 159 insertions(+), 151 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 42fe80ed6f59..7cab58abbec9 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -54,6 +54,40 @@ static int eeh_result_priority(enum pci_ers_result result)
 	}
 };
 
+const char *pci_ers_result_name(enum pci_ers_result result)
+{
+	switch (result) {
+	case PCI_ERS_RESULT_NONE: return "none";
+	case PCI_ERS_RESULT_CAN_RECOVER: return "can recover";
+	case PCI_ERS_RESULT_NEED_RESET: return "need reset";
+	case PCI_ERS_RESULT_DISCONNECT: return "disconnect";
+	case PCI_ERS_RESULT_RECOVERED: return "recovered";
+	case PCI_ERS_RESULT_NO_AER_DRIVER: return "no AER driver";
+	default:
+		WARN_ONCE(1, "Unknown result type: %d\n", (int)result);
+		return "unknown";
+	}
+};
+
+static __printf(2, 3)
+void eeh_edev_info(const struct eeh_dev *edev, const char *fmt, ...)
+{
+	struct va_format vaf;
+	va_list args;
+
+	va_start(args, fmt);
+
+	vaf.fmt = fmt;
+	vaf.va = &args;
+
+	printk(KERN_INFO "EEH: PE#%x (PCI %s): %pV\n",
+			edev->pe_config_addr,
+			edev->pdev ? dev_name(&edev->pdev->dev) : "none",
+			&vaf);
+
+	va_end(args);
+}
+
 static enum pci_ers_result pci_ers_merge_result(enum pci_ers_result old,
 						enum pci_ers_result new)
 {
@@ -229,123 +263,124 @@ static void eeh_set_irq_state(struct eeh_pe *root, bool enable)
 	}
 }
 
-/**
- * eeh_report_error - Report pci error to each device driver
- * @data: eeh device
- * @userdata: return value
- *
- * Report an EEH error to each device driver, collect up and
- * merge the device driver responses. Cumulative response
- * passed back in "userdata".
- */
-static void *eeh_report_error(struct eeh_dev *edev, void *userdata)
+typedef enum pci_ers_result (*eeh_report_fn)(struct eeh_dev *,
+					     struct pci_driver *);
+static void eeh_pe_report_edev(struct eeh_dev *edev,
+					      eeh_report_fn fn,
+					      enum pci_ers_result *result)
 {
-	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
-	enum pci_ers_result rc, *res = userdata;
 	struct pci_driver *driver;
+	enum pci_ers_result new_result;
+
+	device_lock(&edev->pdev->dev);
+	if (eeh_edev_actionable(edev)) {
+		driver = eeh_pcid_get(edev->pdev);
+
+		if (!driver)
+			eeh_edev_info(edev, "no driver");
+		else if (!driver->err_handler)
+			eeh_edev_info(edev,
+				     "driver not EEH aware");
+		else if (edev->mode & EEH_DEV_NO_HANDLER)
+			eeh_edev_info(edev,
+				     "driver bound too late");
+		else {
+			new_result = fn(edev, driver);
+			eeh_edev_info(edev,
+				"%s driver reports: '%s'",
+				driver->name,
+				pci_ers_result_name(new_result));
+			if (result)
+				*result = pci_ers_merge_result(*result,
+							       new_result);
+		}
+		if (driver)
+			eeh_pcid_put(edev->pdev);
+	} else {
+		eeh_edev_info(edev, "not actionable (%d,%d,%d)",
+			     !!edev->pdev,
+			     !eeh_dev_removed(edev),
+			     !eeh_pe_passed(edev->pe));
+	}
+	device_unlock(&edev->pdev->dev);
+}
 
-	if (!eeh_edev_actionable(edev))
-		return NULL;
+static void eeh_pe_report(const char *name, struct eeh_pe *root,
+			  eeh_report_fn fn,
+			  enum pci_ers_result *result)
+{
+	struct eeh_pe *pe;
+	struct eeh_dev *edev, *tmp;
 
-	device_lock(&dev->dev);
+	pr_info("EEH: Beginning: '%s'\n", name);
+	eeh_for_each_pe(root, pe)
+		eeh_pe_for_each_dev(pe, edev, tmp)
+			eeh_pe_report_edev(edev, fn, result);
+	if (result)
+		pr_info("EEH: Finished:'%s' with aggregate recovery state:'%s'\n",
+			name, pci_ers_result_name(*result));
+	else
+		pr_info("EEH: Finished:'%s'", name);
+}
 
-	driver = eeh_pcid_get(dev);
-	if (!driver) goto out_no_dev;
+/**
+ * eeh_report_error - Report pci error to each device driver
+ * @edev: eeh device
+ * @driver: device's PCI driver
+ *
+ * Report an EEH error to each device driver.
+ */
+static enum pci_ers_result eeh_report_error(struct eeh_dev *edev,
+			      struct pci_driver *driver)
+{
+	enum pci_ers_result rc;
+	struct pci_dev *dev = edev->pdev;
 
-	if (!driver->err_handler ||
-	    !driver->err_handler->error_detected)
-		goto out;
+	if (!driver->err_handler->error_detected)
+		return PCI_ERS_RESULT_NONE;
 
+	eeh_edev_info(edev, "Invoking %s->error_detected(IO frozen)", driver->name);
 	rc = driver->err_handler->error_detected(dev, pci_channel_io_frozen);
 
-	*res = pci_ers_merge_result(*res, rc);
-
 	edev->in_error = true;
 	pci_uevent_ers(dev, PCI_ERS_RESULT_NONE);
-
-out:
-	eeh_pcid_put(dev);
-out_no_dev:
-	device_unlock(&dev->dev);
-	return NULL;
+	return rc;
 }
 
 /**
  * eeh_report_mmio_enabled - Tell drivers that MMIO has been enabled
- * @data: eeh device
- * @userdata: return value
+ * @edev: eeh device
+ * @driver: device's PCI driver
  *
  * Tells each device driver that IO ports, MMIO and config space I/O
- * are now enabled. Collects up and merges the device driver responses.
- * Cumulative response passed back in "userdata".
+ * are now enabled.
  */
-static void *eeh_report_mmio_enabled(struct eeh_dev *edev, void *userdata)
+static enum pci_ers_result eeh_report_mmio_enabled(struct eeh_dev *edev,
+						   struct pci_driver *driver)
 {
-	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
-	enum pci_ers_result rc, *res = userdata;
-	struct pci_driver *driver;
-
-	if (!eeh_edev_actionable(edev))
-		return NULL;
-
-	device_lock(&dev->dev);
-	driver = eeh_pcid_get(dev);
-	if (!driver) goto out_no_dev;
-
-	if (!driver->err_handler ||
-	    !driver->err_handler->mmio_enabled ||
-	    (edev->mode & EEH_DEV_NO_HANDLER))
-		goto out;
-
-	rc = driver->err_handler->mmio_enabled(dev);
-
-	*res = pci_ers_merge_result(*res, rc);
-
-out:
-	eeh_pcid_put(dev);
-out_no_dev:
-	device_unlock(&dev->dev);
-	return NULL;
+	if (!driver->err_handler->mmio_enabled)
+		return PCI_ERS_RESULT_NONE;
+	eeh_edev_info(edev, "Invoking %s->mmio_enabled()", driver->name);
+	return driver->err_handler->mmio_enabled(edev->pdev);
 }
 
 /**
  * eeh_report_reset - Tell device that slot has been reset
- * @data: eeh device
- * @userdata: return value
+ * @edev: eeh device
+ * @driver: device's PCI driver
  *
  * This routine must be called while EEH tries to reset particular
  * PCI device so that the associated PCI device driver could take
  * some actions, usually to save data the driver needs so that the
  * driver can work again while the device is recovered.
  */
-static void *eeh_report_reset(struct eeh_dev *edev, void *userdata)
+static enum pci_ers_result eeh_report_reset(struct eeh_dev *edev,
+					    struct pci_driver *driver)
 {
-	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
-	enum pci_ers_result rc, *res = userdata;
-	struct pci_driver *driver;
-
-	if (!eeh_edev_actionable(edev))
-		return NULL;
-
-	device_lock(&dev->dev);
-
-	driver = eeh_pcid_get(dev);
-	if (!driver) goto out_no_dev;
-
-	if (!driver->err_handler ||
-	    !driver->err_handler->slot_reset ||
-	    (edev->mode & EEH_DEV_NO_HANDLER) ||
-	    (!edev->in_error))
-		goto out;
-
-	rc = driver->err_handler->slot_reset(dev);
-	*res = pci_ers_merge_result(*res, rc);
-
-out:
-	eeh_pcid_put(dev);
-out_no_dev:
-	device_unlock(&dev->dev);
-	return NULL;
+	if (!driver->err_handler->slot_reset || !edev->in_error)
+		return PCI_ERS_RESULT_NONE;
+	eeh_edev_info(edev, "Invoking %s->slot_reset()", driver->name);
+	return driver->err_handler->slot_reset(edev->pdev);
 }
 
 static void *eeh_dev_restore_state(struct eeh_dev *edev, void *userdata)
@@ -378,84 +413,52 @@ static void *eeh_dev_restore_state(struct eeh_dev *edev, void *userdata)
 
 /**
  * eeh_report_resume - Tell device to resume normal operations
- * @data: eeh device
- * @userdata: return value
+ * @edev: eeh device
+ * @driver: device's PCI driver
  *
  * This routine must be called to notify the device driver that it
  * could resume so that the device driver can do some initialization
  * to make the recovered device work again.
  */
-static void *eeh_report_resume(struct eeh_dev *edev, void *userdata)
+static enum pci_ers_result eeh_report_resume(struct eeh_dev *edev,
+					     struct pci_driver *driver)
 {
-	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
-	bool was_in_error;
-	struct pci_driver *driver;
+	if (!driver->err_handler->resume || !edev->in_error)
+		return PCI_ERS_RESULT_NONE;
 
-	if (!eeh_edev_actionable(edev))
-		return NULL;
-
-	device_lock(&dev->dev);
-
-	driver = eeh_pcid_get(dev);
-	if (!driver) goto out_no_dev;
-
-	was_in_error = edev->in_error;
-	edev->in_error = false;
+	eeh_edev_info(edev, "Invoking %s->resume()", driver->name);
+	driver->err_handler->resume(edev->pdev);
 
-	if (!driver->err_handler ||
-	    !driver->err_handler->resume ||
-	    (edev->mode & EEH_DEV_NO_HANDLER) || !was_in_error) {
-		goto out;
-	}
-
-	driver->err_handler->resume(dev);
-
-	pci_uevent_ers(dev, PCI_ERS_RESULT_RECOVERED);
-out:
-	eeh_pcid_put(dev);
+	pci_uevent_ers(edev->pdev, PCI_ERS_RESULT_RECOVERED);
 #ifdef CONFIG_PCI_IOV
 	if (eeh_ops->notify_resume && eeh_dev_to_pdn(edev))
 		eeh_ops->notify_resume(eeh_dev_to_pdn(edev));
 #endif
-out_no_dev:
-	device_unlock(&dev->dev);
-	return NULL;
+	return PCI_ERS_RESULT_NONE;
 }
 
 /**
  * eeh_report_failure - Tell device driver that device is dead.
- * @data: eeh device
- * @userdata: return value
+ * @edev: eeh device
+ * @driver: device's PCI driver
  *
  * This informs the device driver that the device is permanently
  * dead, and that no further recovery attempts will be made on it.
  */
-static void *eeh_report_failure(struct eeh_dev *edev, void *userdata)
+static enum pci_ers_result eeh_report_failure(struct eeh_dev *edev,
+					      struct pci_driver *driver)
 {
-	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
-	struct pci_driver *driver;
-
-	if (!eeh_edev_actionable(edev))
-		return NULL;
-
-	device_lock(&dev->dev);
-	dev->error_state = pci_channel_io_perm_failure;
-
-	driver = eeh_pcid_get(dev);
-	if (!driver) goto out_no_dev;
+	enum pci_ers_result rc;
 
-	if (!driver->err_handler ||
-	    !driver->err_handler->error_detected)
-		goto out;
+	if (!driver->err_handler->error_detected)
+		return PCI_ERS_RESULT_NONE;
 
-	driver->err_handler->error_detected(dev, pci_channel_io_perm_failure);
+	eeh_edev_info(edev, "Invoking %s->error_detected(permanent failure)",
+		     driver->name);
+	rc = driver->err_handler->error_detected(edev->pdev, pci_channel_io_perm_failure);
 
-	pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
-out:
-	eeh_pcid_put(dev);
-out_no_dev:
-	device_unlock(&dev->dev);
-	return NULL;
+	pci_uevent_ers(edev->pdev, PCI_ERS_RESULT_DISCONNECT);
+	return rc;
 }
 
 static void *eeh_add_virt_device(void *data, void *userdata)
@@ -817,7 +820,8 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	pr_info("EEH: Notify device drivers to shutdown\n");
 	eeh_set_channel_state(pe, pci_channel_io_frozen);
 	eeh_set_irq_state(pe, false);
-	eeh_pe_dev_traverse(pe, eeh_report_error, &result);
+	eeh_pe_report("error_detected(IO frozen)", pe,
+		      eeh_report_error, &result);
 	if ((pe->type & EEH_PE_PHB) &&
 	    result != PCI_ERS_RESULT_NONE &&
 	    result != PCI_ERS_RESULT_NEED_RESET)
@@ -864,7 +868,8 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 			result = PCI_ERS_RESULT_NEED_RESET;
 		} else {
 			pr_info("EEH: Notify device drivers to resume I/O\n");
-			eeh_pe_dev_traverse(pe, eeh_report_mmio_enabled, &result);
+			eeh_pe_report("mmio_enabled", pe,
+				      eeh_report_mmio_enabled, &result);
 		}
 	}
 
@@ -909,7 +914,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 		result = PCI_ERS_RESULT_NONE;
 		eeh_set_channel_state(pe, pci_channel_io_normal);
 		eeh_set_irq_state(pe, true);
-		eeh_pe_dev_traverse(pe, eeh_report_reset, &result);
+		eeh_pe_report("slot_reset", pe, eeh_report_reset, &result);
 	}
 
 	/* All devices should claim they have recovered by now. */
@@ -932,11 +937,13 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	pr_info("EEH: Notify device driver to resume\n");
 	eeh_set_channel_state(pe, pci_channel_io_normal);
 	eeh_set_irq_state(pe, true);
-	eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
-
-	eeh_for_each_pe(pe, tmp_pe)
-		eeh_pe_for_each_dev(tmp_pe, edev, tmp)
+	eeh_pe_report("resume", pe, eeh_report_resume, NULL);
+	eeh_for_each_pe(pe, tmp_pe) {
+		eeh_pe_for_each_dev(tmp_pe, edev, tmp) {
 			edev->mode &= ~EEH_DEV_NO_HANDLER;
+			edev->in_error = false;
+		}
+	}
 
 	pr_info("EEH: Recovery successful.\n");
 	goto final;
@@ -956,7 +963,8 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	/* Notify all devices that they're about to go down. */
 	eeh_set_channel_state(pe, pci_channel_io_perm_failure);
 	eeh_set_irq_state(pe, false);
-	eeh_pe_dev_traverse(pe, eeh_report_failure, NULL);
+	eeh_pe_report("error_detected(permanent failure)", pe,
+		      eeh_report_failure, NULL);
 
 	/* Mark the PE to be removed permanently */
 	eeh_pe_state_mark(pe, EEH_PE_REMOVED);
@@ -1066,8 +1074,8 @@ void eeh_handle_special_event(void)
 				/* Notify all devices to be down */
 				eeh_pe_state_clear(pe, EEH_PE_PRI_BUS);
 				eeh_set_channel_state(pe, pci_channel_io_perm_failure);
-				eeh_pe_dev_traverse(pe,
-					eeh_report_failure, NULL);
+				eeh_pe_report("error_detected(permanent failure)",
+					       pe, eeh_report_failure, NULL);
 				bus = eeh_pe_bus_get(phb_pe);
 				if (!bus) {
 					pr_err("%s: Cannot find PCI bus for "
-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 07/13] powerpc/eeh: Clean up pci_ers_result handling
  2018-05-25  3:11 ` [PATCH v2 07/13] powerpc/eeh: Clean up pci_ers_result handling Sam Bobroff
@ 2018-06-01 15:40   ` Michael Ellerman
  2018-06-04  1:28     ` Sam Bobroff
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Ellerman @ 2018-06-01 15:40 UTC (permalink / raw)
  To: Sam Bobroff, linuxppc-dev

Sam Bobroff <sbobroff@linux.ibm.com> writes:

> As EEH event handling progresses, a cumulative result of type
> pci_ers_result is built up by (some of) the eeh_report_*() functions
> using either:
> 	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> 	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
> or:
> 	if ((*res == PCI_ERS_RESULT_NONE) ||
> 	    (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
> 	if (*res == PCI_ERS_RESULT_DISCONNECT &&
> 	    rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> (Where *res is the accumulator.)
>
> However, the intent is not immediately clear and the result in some
> situations is order dependent.
>
> Address this by assigning a priority to each result value, and always
> merging to the highest priority. This renders the intent clear, and
> provides a stable value for all orderings.
>
> Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
> ---
> ====== v1 -> v2: ======
>
> * Added the value, and missing newline, to some WARN()s.
> * Improved name of merge_result() to pci_ers_merge_result().
> * Adjusted the result priorities so that unknown doesn't overlap with _NONE.

These === markers seem to have confused patchwork, they ended up in the
patch, and then git put them in the changelog.

http://patchwork.ozlabs.org/patch/920194/

The usual format is just something like:

v2 - Added the value, and missing newline, to some WARN()s.
   - Improved name of merge_result() to pci_ers_merge_result().
   - Adjusted the result priorities so that unknown doesn't overlap with _NONE.

cheers

>  arch/powerpc/kernel/eeh_driver.c | 36 ++++++++++++++++++++++++++----------
>  1 file changed, 26 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
> index 188d15c4fe3a..2d3cac584899 100644
> --- a/arch/powerpc/kernel/eeh_driver.c
> +++ b/arch/powerpc/kernel/eeh_driver.c
> @@ -39,6 +39,29 @@ struct eeh_rmv_data {
>  	int removed;
>  };
>  
> +static int eeh_result_priority(enum pci_ers_result result)
> +{
> +	switch (result) {
> +	case PCI_ERS_RESULT_NONE: return 1;
> +	case PCI_ERS_RESULT_NO_AER_DRIVER: return 2;
> +	case PCI_ERS_RESULT_RECOVERED: return 3;
> +	case PCI_ERS_RESULT_CAN_RECOVER: return 4;
> +	case PCI_ERS_RESULT_DISCONNECT: return 5;
> +	case PCI_ERS_RESULT_NEED_RESET: return 6;
> +	default:
> +		WARN_ONCE(1, "Unknown pci_ers_result value: %d\n", (int)result);
> +		return 0;
> +	}
> +};
> +
> +static enum pci_ers_result pci_ers_merge_result(enum pci_ers_result old,
> +						enum pci_ers_result new)
> +{
> +	if (eeh_result_priority(new) > eeh_result_priority(old))
> +		return new;
> +	return old;
> +}
> +
>  /**
>   * eeh_pcid_get - Get the PCI device driver
>   * @pdev: PCI device
> @@ -206,9 +229,7 @@ static void *eeh_report_error(struct eeh_dev *edev, void *userdata)
>  
>  	rc = driver->err_handler->error_detected(dev, pci_channel_io_frozen);
>  
> -	/* A driver that needs a reset trumps all others */
> -	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> -	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
> +	*res = pci_ers_merge_result(*res, rc);
>  
>  	edev->in_error = true;
>  	pci_uevent_ers(dev, PCI_ERS_RESULT_NONE);
> @@ -249,9 +270,7 @@ static void *eeh_report_mmio_enabled(struct eeh_dev *edev, void *userdata)
>  
>  	rc = driver->err_handler->mmio_enabled(dev);
>  
> -	/* A driver that needs a reset trumps all others */
> -	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> -	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
> +	*res = pci_ers_merge_result(*res, rc);
>  
>  out:
>  	eeh_pcid_put(dev);
> @@ -294,10 +313,7 @@ static void *eeh_report_reset(struct eeh_dev *edev, void *userdata)
>  		goto out;
>  
>  	rc = driver->err_handler->slot_reset(dev);
> -	if ((*res == PCI_ERS_RESULT_NONE) ||
> -	    (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
> -	if (*res == PCI_ERS_RESULT_DISCONNECT &&
> -	     rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> +	*res = pci_ers_merge_result(*res, rc);
>  
>  out:
>  	eeh_pcid_put(dev);
> -- 
> 2.16.1.74.g9b0b1f47b

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 07/13] powerpc/eeh: Clean up pci_ers_result handling
  2018-06-01 15:40   ` Michael Ellerman
@ 2018-06-04  1:28     ` Sam Bobroff
  2018-06-04  9:43       ` Michael Ellerman
  0 siblings, 1 reply; 18+ messages in thread
From: Sam Bobroff @ 2018-06-04  1:28 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 4454 bytes --]

On Sat, Jun 02, 2018 at 01:40:46AM +1000, Michael Ellerman wrote:
> Sam Bobroff <sbobroff@linux.ibm.com> writes:
> 
> > As EEH event handling progresses, a cumulative result of type
> > pci_ers_result is built up by (some of) the eeh_report_*() functions
> > using either:
> > 	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> > 	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
> > or:
> > 	if ((*res == PCI_ERS_RESULT_NONE) ||
> > 	    (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
> > 	if (*res == PCI_ERS_RESULT_DISCONNECT &&
> > 	    rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> > (Where *res is the accumulator.)
> >
> > However, the intent is not immediately clear and the result in some
> > situations is order dependent.
> >
> > Address this by assigning a priority to each result value, and always
> > merging to the highest priority. This renders the intent clear, and
> > provides a stable value for all orderings.
> >
> > Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
> > ---
> > ====== v1 -> v2: ======
> >
> > * Added the value, and missing newline, to some WARN()s.
> > * Improved name of merge_result() to pci_ers_merge_result().
> > * Adjusted the result priorities so that unknown doesn't overlap with _NONE.
> 
> These === markers seem to have confused patchwork, they ended up in the
> patch, and then git put them in the changelog.
> 
> http://patchwork.ozlabs.org/patch/920194/
> 
> The usual format is just something like:
> 
> v2 - Added the value, and missing newline, to some WARN()s.
>    - Improved name of merge_result() to pci_ers_merge_result().
>    - Adjusted the result priorities so that unknown doesn't overlap with _NONE.
> 
> cheers

Oh! I'll change it!

Sam.

> 
> >  arch/powerpc/kernel/eeh_driver.c | 36 ++++++++++++++++++++++++++----------
> >  1 file changed, 26 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
> > index 188d15c4fe3a..2d3cac584899 100644
> > --- a/arch/powerpc/kernel/eeh_driver.c
> > +++ b/arch/powerpc/kernel/eeh_driver.c
> > @@ -39,6 +39,29 @@ struct eeh_rmv_data {
> >  	int removed;
> >  };
> >  
> > +static int eeh_result_priority(enum pci_ers_result result)
> > +{
> > +	switch (result) {
> > +	case PCI_ERS_RESULT_NONE: return 1;
> > +	case PCI_ERS_RESULT_NO_AER_DRIVER: return 2;
> > +	case PCI_ERS_RESULT_RECOVERED: return 3;
> > +	case PCI_ERS_RESULT_CAN_RECOVER: return 4;
> > +	case PCI_ERS_RESULT_DISCONNECT: return 5;
> > +	case PCI_ERS_RESULT_NEED_RESET: return 6;
> > +	default:
> > +		WARN_ONCE(1, "Unknown pci_ers_result value: %d\n", (int)result);
> > +		return 0;
> > +	}
> > +};
> > +
> > +static enum pci_ers_result pci_ers_merge_result(enum pci_ers_result old,
> > +						enum pci_ers_result new)
> > +{
> > +	if (eeh_result_priority(new) > eeh_result_priority(old))
> > +		return new;
> > +	return old;
> > +}
> > +
> >  /**
> >   * eeh_pcid_get - Get the PCI device driver
> >   * @pdev: PCI device
> > @@ -206,9 +229,7 @@ static void *eeh_report_error(struct eeh_dev *edev, void *userdata)
> >  
> >  	rc = driver->err_handler->error_detected(dev, pci_channel_io_frozen);
> >  
> > -	/* A driver that needs a reset trumps all others */
> > -	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> > -	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
> > +	*res = pci_ers_merge_result(*res, rc);
> >  
> >  	edev->in_error = true;
> >  	pci_uevent_ers(dev, PCI_ERS_RESULT_NONE);
> > @@ -249,9 +270,7 @@ static void *eeh_report_mmio_enabled(struct eeh_dev *edev, void *userdata)
> >  
> >  	rc = driver->err_handler->mmio_enabled(dev);
> >  
> > -	/* A driver that needs a reset trumps all others */
> > -	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> > -	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
> > +	*res = pci_ers_merge_result(*res, rc);
> >  
> >  out:
> >  	eeh_pcid_put(dev);
> > @@ -294,10 +313,7 @@ static void *eeh_report_reset(struct eeh_dev *edev, void *userdata)
> >  		goto out;
> >  
> >  	rc = driver->err_handler->slot_reset(dev);
> > -	if ((*res == PCI_ERS_RESULT_NONE) ||
> > -	    (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
> > -	if (*res == PCI_ERS_RESULT_DISCONNECT &&
> > -	     rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> > +	*res = pci_ers_merge_result(*res, rc);
> >  
> >  out:
> >  	eeh_pcid_put(dev);
> > -- 
> > 2.16.1.74.g9b0b1f47b
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 07/13] powerpc/eeh: Clean up pci_ers_result handling
  2018-06-04  1:28     ` Sam Bobroff
@ 2018-06-04  9:43       ` Michael Ellerman
  0 siblings, 0 replies; 18+ messages in thread
From: Michael Ellerman @ 2018-06-04  9:43 UTC (permalink / raw)
  To: Sam Bobroff; +Cc: linuxppc-dev

Sam Bobroff <sbobroff@linux.ibm.com> writes:

> On Sat, Jun 02, 2018 at 01:40:46AM +1000, Michael Ellerman wrote:
>> Sam Bobroff <sbobroff@linux.ibm.com> writes:
>> 
>> > As EEH event handling progresses, a cumulative result of type
>> > pci_ers_result is built up by (some of) the eeh_report_*() functions
>> > using either:
>> > 	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
>> > 	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
>> > or:
>> > 	if ((*res == PCI_ERS_RESULT_NONE) ||
>> > 	    (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
>> > 	if (*res == PCI_ERS_RESULT_DISCONNECT &&
>> > 	    rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
>> > (Where *res is the accumulator.)
>> >
>> > However, the intent is not immediately clear and the result in some
>> > situations is order dependent.
>> >
>> > Address this by assigning a priority to each result value, and always
>> > merging to the highest priority. This renders the intent clear, and
>> > provides a stable value for all orderings.
>> >
>> > Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
>> > ---
>> > ====== v1 -> v2: ======
>> >
>> > * Added the value, and missing newline, to some WARN()s.
>> > * Improved name of merge_result() to pci_ers_merge_result().
>> > * Adjusted the result priorities so that unknown doesn't overlap with _NONE.
>> 
>> These === markers seem to have confused patchwork, they ended up in the
>> patch, and then git put them in the changelog.
>> 
>> http://patchwork.ozlabs.org/patch/920194/
>> 
>> The usual format is just something like:
>> 
>> v2 - Added the value, and missing newline, to some WARN()s.
>>    - Improved name of merge_result() to pci_ers_merge_result().
>>    - Adjusted the result priorities so that unknown doesn't overlap with _NONE.
>> 
>> cheers
>
> Oh! I'll change it!

Thanks.

cheers

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [v2, 01/13] powerpc/eeh: Add eeh_max_freezes to initial EEH log line
  2018-05-25  3:11 ` [PATCH v2 01/13] powerpc/eeh: Add eeh_max_freezes to initial EEH log line Sam Bobroff
@ 2018-06-04 14:11   ` Michael Ellerman
  0 siblings, 0 replies; 18+ messages in thread
From: Michael Ellerman @ 2018-06-04 14:11 UTC (permalink / raw)
  To: Sam Bobroff, linuxppc-dev

On Fri, 2018-05-25 at 03:11:28 UTC, Sam Bobroff wrote:
> The current failure message includes the number of failures that have
> occurred in the last hour (for a device) but it does not indicate
> how many failures will be tolerated before the device is permanently
> disabled.
> 
> Include the limit (eeh_max_freezes) to make this less surprising when
> it happens.
> 
> Also remove the embedded newline from the existing message to make it
> easier to grep for.
> 
> Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/796b9f5b317a46d1b744f661c38a62

cheers

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2018-06-04 14:11 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-25  3:11 [PATCH v2 00/13] EEH refactoring 2 Sam Bobroff
2018-05-25  3:11 ` [PATCH v2 01/13] powerpc/eeh: Add eeh_max_freezes to initial EEH log line Sam Bobroff
2018-06-04 14:11   ` [v2, " Michael Ellerman
2018-05-25  3:11 ` [PATCH v2 02/13] powerpc/eeh: Add final message for successful recovery Sam Bobroff
2018-05-25  3:11 ` [PATCH v2 03/13] powerpc/eeh: Fix use-after-release of EEH driver Sam Bobroff
2018-05-25  3:11 ` [PATCH v2 04/13] powerpc/eeh: Remove unused eeh_pcid_name() Sam Bobroff
2018-05-25  3:11 ` [PATCH v2 05/13] powerpc/eeh: Strengthen types of eeh traversal functions Sam Bobroff
2018-05-25  3:11 ` [PATCH v2 06/13] powerpc/eeh: Add message when PE processing at parent Sam Bobroff
2018-05-25  3:11 ` [PATCH v2 07/13] powerpc/eeh: Clean up pci_ers_result handling Sam Bobroff
2018-06-01 15:40   ` Michael Ellerman
2018-06-04  1:28     ` Sam Bobroff
2018-06-04  9:43       ` Michael Ellerman
2018-05-25  3:11 ` [PATCH v2 08/13] powerpc/eeh: Introduce eeh_for_each_pe() Sam Bobroff
2018-05-25  3:11 ` [PATCH v2 09/13] powerpc/eeh: Introduce eeh_edev_actionable() Sam Bobroff
2018-05-25  3:11 ` [PATCH v2 10/13] powerpc/eeh: Introduce eeh_set_channel_state() Sam Bobroff
2018-05-25  3:11 ` [PATCH v2 11/13] powerpc/eeh: Introduce eeh_set_irq_state() Sam Bobroff
2018-05-25  3:11 ` [PATCH v2 12/13] powerpc/eeh: Cleaner handling of EEH_DEV_NO_HANDLER Sam Bobroff
2018-05-25  3:11 ` [PATCH v2 13/13] powerpc/eeh: Refactor report functions Sam Bobroff

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.