All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/13] EEH refactoring 2
@ 2018-05-02  6:34 Sam Bobroff
  2018-05-02  6:34 ` [PATCH 01/13] powerpc/eeh: Add eeh_max_freezes to initial EEH log line Sam Bobroff
                   ` (12 more replies)
  0 siblings, 13 replies; 33+ messages in thread
From: Sam Bobroff @ 2018-05-02  6:34 UTC (permalink / raw)
  To: linuxppc-dev

Hello everyone,

Here is a second, somewhat deeper, set of cleanups for the EEH code
(mostly eeh_drver.c).

These changes are not intended to significantly alter the actual processing,
but rather to improve the readability and maintainability of the code. They are
subjective by nature so I would appreciate comments and suggestions.

The earlier changes are mostly to support the last patch, where we're finally
able to use a common infrastructure for the reporting functions (basically
wrappers around the driver's handlers). This allows removal of a fair bit of
code, and the easy addition of some useful messaging which should make future
maintenance easier (as an example, a recent fix in this area "powerpc/eeh: Fix
race with driver un/bind" would have required adding two lines rather than
42+/26-).

Cheers,
Sam.

Sam Bobroff (13):
  powerpc/eeh: Add eeh_max_freezes to initial EEH log line
  powerpc/eeh: Add final message for successful recovery
  powerpc/eeh: Fix use-after-release of EEH driver
  powerpc/eeh: Remove unused eeh_pcid_name()
  powerpc/eeh: Strengthen types of eeh traversal functions
  powerpc/eeh: Add message when PE processing at parent
  powerpc/eeh: Clean up pci_ers_result handling
  powerpc/eeh: Introduce eeh_for_each_pe()
  powerpc/eeh: Introduce eeh_edev_actionable()
  powerpc/eeh: Introduce eeh_set_channel_state()
  powerpc/eeh: Introduce eeh_set_irq_state()
  powerpc/eeh: Cleaner handling of EEH_DEV_NO_HANDLER
  powerpc/eeh: Refactor report functions

 arch/powerpc/include/asm/eeh.h   |  11 +-
 arch/powerpc/kernel/eeh.c        |  19 +-
 arch/powerpc/kernel/eeh_driver.c | 468 ++++++++++++++++++++-------------------
 arch/powerpc/kernel/eeh_pe.c     |  26 +--
 4 files changed, 268 insertions(+), 256 deletions(-)

-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 01/13] powerpc/eeh: Add eeh_max_freezes to initial EEH log line
  2018-05-02  6:34 [PATCH 00/13] EEH refactoring 2 Sam Bobroff
@ 2018-05-02  6:34 ` Sam Bobroff
  2018-05-04  5:46   ` Russell Currey
  2018-05-02  6:35 ` [PATCH 02/13] powerpc/eeh: Add final message for successful recovery Sam Bobroff
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Sam Bobroff @ 2018-05-02  6:34 UTC (permalink / raw)
  To: linuxppc-dev

The current failure message includes the number of failures that have
occurred in the last hour (for a device) but it does not indicate
how many failures will be tolerated before the device is permanently
disabled.

Include the limit (eeh_max_freezes) to make this less surprising when
it happens.

Also remove the embedded newline from the existing message to make it
easier to grep for.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index b8a329f04814..56a60b9eb397 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -778,14 +778,13 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	eeh_pe_update_time_stamp(pe);
 	pe->freeze_count++;
 	if (pe->freeze_count > eeh_max_freezes) {
-		pr_err("EEH: PHB#%x-PE#%x has failed %d times in the\n"
-		       "last hour and has been permanently disabled.\n",
+		pr_err("EEH: PHB#%x-PE#%x has failed %d times in the last hour and has been permanently disabled.\n",
 		       pe->phb->global_number, pe->addr,
 		       pe->freeze_count);
 		goto hard_fail;
 	}
-	pr_warn("EEH: This PCI device has failed %d times in the last hour\n",
-		pe->freeze_count);
+	pr_warn("EEH: This PCI device has failed %d times in the last hour and will be permanently disabled after %d failures.\n",
+		pe->freeze_count, eeh_max_freezes);
 
 	/* Walk the various device drivers attached to this slot through
 	 * a reset sequence, giving each an opportunity to do what it needs
-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 02/13] powerpc/eeh: Add final message for successful recovery
  2018-05-02  6:34 [PATCH 00/13] EEH refactoring 2 Sam Bobroff
  2018-05-02  6:34 ` [PATCH 01/13] powerpc/eeh: Add eeh_max_freezes to initial EEH log line Sam Bobroff
@ 2018-05-02  6:35 ` Sam Bobroff
  2018-05-04  2:55   ` Michael Ellerman
  2018-05-02  6:35 ` [PATCH 03/13] powerpc/eeh: Fix use-after-release of EEH driver Sam Bobroff
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Sam Bobroff @ 2018-05-02  6:35 UTC (permalink / raw)
  To: linuxppc-dev

Add a single log line at the end of successful EEH recovery, so that
it's clear that event processing has finished.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 56a60b9eb397..07e0a42035ce 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -910,6 +910,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	pr_info("EEH: Notify device driver to resume\n");
 	eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
 
+	pr_info("EEH: Recovery successful.\n");
 	goto final;
 
 hard_fail:
-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 03/13] powerpc/eeh: Fix use-after-release of EEH driver
  2018-05-02  6:34 [PATCH 00/13] EEH refactoring 2 Sam Bobroff
  2018-05-02  6:34 ` [PATCH 01/13] powerpc/eeh: Add eeh_max_freezes to initial EEH log line Sam Bobroff
  2018-05-02  6:35 ` [PATCH 02/13] powerpc/eeh: Add final message for successful recovery Sam Bobroff
@ 2018-05-02  6:35 ` Sam Bobroff
  2018-05-04  2:56   ` Michael Ellerman
  2018-05-02  6:35 ` [PATCH 04/13] powerpc/eeh: Remove unused eeh_pcid_name() Sam Bobroff
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Sam Bobroff @ 2018-05-02  6:35 UTC (permalink / raw)
  To: linuxppc-dev

Correct two cases where eeh_pcid_get() is used to reference the driver's
module but the reference is dropped before the driver pointer is used.

In eeh_rmv_device() also refactor a little so that only two calls to
eeh_pcid_put() are needed, rather than three and the reference isn't
taken at all if it wasn't needed.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 07e0a42035ce..54333f6c9d67 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -458,9 +458,11 @@ static void *eeh_add_virt_device(void *data, void *userdata)
 
 	driver = eeh_pcid_get(dev);
 	if (driver) {
-		eeh_pcid_put(dev);
-		if (driver->err_handler)
+		if (driver->err_handler) {
+			eeh_pcid_put(dev);
 			return NULL;
+		}
+		eeh_pcid_put(dev);
 	}
 
 #ifdef CONFIG_PCI_IOV
@@ -497,17 +499,19 @@ static void *eeh_rmv_device(void *data, void *userdata)
 	if (eeh_dev_removed(edev))
 		return NULL;
 
-	driver = eeh_pcid_get(dev);
-	if (driver) {
-		eeh_pcid_put(dev);
-		if (removed &&
-		    eeh_pe_passed(edev->pe))
-			return NULL;
-		if (removed &&
-		    driver->err_handler &&
-		    driver->err_handler->error_detected &&
-		    driver->err_handler->slot_reset)
+	if (removed) {
+		if (eeh_pe_passed(edev->pe))
 			return NULL;
+		driver = eeh_pcid_get(dev);
+		if (driver) {
+			if (driver->err_handler &&
+			    driver->err_handler->error_detected &&
+			    driver->err_handler->slot_reset) {
+				eeh_pcid_put(dev);
+				return NULL;
+			}
+			eeh_pcid_put(dev);
+		}
 	}
 
 	/* Remove it from PCI subsystem */
-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 04/13] powerpc/eeh: Remove unused eeh_pcid_name()
  2018-05-02  6:34 [PATCH 00/13] EEH refactoring 2 Sam Bobroff
                   ` (2 preceding siblings ...)
  2018-05-02  6:35 ` [PATCH 03/13] powerpc/eeh: Fix use-after-release of EEH driver Sam Bobroff
@ 2018-05-02  6:35 ` Sam Bobroff
  2018-05-04  6:29   ` Russell Currey
  2018-05-02  6:35 ` [PATCH 05/13] powerpc/eeh: Strengthen types of eeh traversal functions Sam Bobroff
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Sam Bobroff @ 2018-05-02  6:35 UTC (permalink / raw)
  To: linuxppc-dev

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 54333f6c9d67..ca9a73fe9cc5 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -39,20 +39,6 @@ struct eeh_rmv_data {
 	int removed;
 };
 
-/**
- * eeh_pcid_name - Retrieve name of PCI device driver
- * @pdev: PCI device
- *
- * This routine is used to retrieve the name of PCI device driver
- * if that's valid.
- */
-static inline const char *eeh_pcid_name(struct pci_dev *pdev)
-{
-	if (pdev && pdev->dev.driver)
-		return pdev->dev.driver->name;
-	return "";
-}
-
 /**
  * eeh_pcid_get - Get the PCI device driver
  * @pdev: PCI device
-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 05/13] powerpc/eeh: Strengthen types of eeh traversal functions
  2018-05-02  6:34 [PATCH 00/13] EEH refactoring 2 Sam Bobroff
                   ` (3 preceding siblings ...)
  2018-05-02  6:35 ` [PATCH 04/13] powerpc/eeh: Remove unused eeh_pcid_name() Sam Bobroff
@ 2018-05-02  6:35 ` Sam Bobroff
  2018-05-04  6:32   ` Russell Currey
  2018-05-02  6:35 ` [PATCH 06/13] powerpc/eeh: Add message when PE processing at parent Sam Bobroff
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Sam Bobroff @ 2018-05-02  6:35 UTC (permalink / raw)
  To: linuxppc-dev

The traversal functions eeh_pe_traverse() and eeh_pe_dev_traverse()
both provide their first argument as void * but every single user casts
it to the expected type.

Change the type of the first parameter from void * to the appropriate
type, and clean up all uses.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/include/asm/eeh.h   |  7 ++++---
 arch/powerpc/kernel/eeh.c        | 13 +++++--------
 arch/powerpc/kernel/eeh_driver.c | 30 ++++++++++--------------------
 arch/powerpc/kernel/eeh_pe.c     | 19 +++++++------------
 4 files changed, 26 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index c2266ca61853..f02e0400e6f2 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -262,7 +262,8 @@ static inline bool eeh_state_active(int state)
 	== (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE);
 }
 
-typedef void *(*eeh_traverse_func)(void *data, void *flag);
+typedef void *(*eeh_edev_traverse_func)(struct eeh_dev *edev, void *flag);
+typedef void *(*eeh_pe_traverse_func)(struct eeh_pe *pe, void *flag);
 void eeh_set_pe_aux_size(int size);
 int eeh_phb_pe_create(struct pci_controller *phb);
 struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb);
@@ -272,9 +273,9 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev);
 int eeh_rmv_from_parent_pe(struct eeh_dev *edev);
 void eeh_pe_update_time_stamp(struct eeh_pe *pe);
 void *eeh_pe_traverse(struct eeh_pe *root,
-		eeh_traverse_func fn, void *flag);
+		      eeh_pe_traverse_func fn, void *flag);
 void *eeh_pe_dev_traverse(struct eeh_pe *root,
-		eeh_traverse_func fn, void *flag);
+			  eeh_edev_traverse_func fn, void *flag);
 void eeh_pe_restore_bars(struct eeh_pe *pe);
 const char *eeh_pe_loc_get(struct eeh_pe *pe);
 struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index bc640e4c5ca5..f82dade4fb9a 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -263,9 +263,8 @@ static size_t eeh_dump_dev_log(struct eeh_dev *edev, char *buf, size_t len)
 	return n;
 }
 
-static void *eeh_dump_pe_log(void *data, void *flag)
+static void *eeh_dump_pe_log(struct eeh_pe *pe, void *flag)
 {
-	struct eeh_pe *pe = data;
 	struct eeh_dev *edev, *tmp;
 	size_t *plen = flag;
 
@@ -686,9 +685,9 @@ int eeh_pci_enable(struct eeh_pe *pe, int function)
 	return rc;
 }
 
-static void *eeh_disable_and_save_dev_state(void *data, void *userdata)
+static void *eeh_disable_and_save_dev_state(struct eeh_dev *edev,
+					    void *userdata)
 {
-	struct eeh_dev *edev = data;
 	struct pci_dev *pdev = eeh_dev_to_pci_dev(edev);
 	struct pci_dev *dev = userdata;
 
@@ -714,9 +713,8 @@ static void *eeh_disable_and_save_dev_state(void *data, void *userdata)
 	return NULL;
 }
 
-static void *eeh_restore_dev_state(void *data, void *userdata)
+static void *eeh_restore_dev_state(struct eeh_dev *edev, void *userdata)
 {
-	struct eeh_dev *edev = data;
 	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
 	struct pci_dev *pdev = eeh_dev_to_pci_dev(edev);
 	struct pci_dev *dev = userdata;
@@ -856,11 +854,10 @@ int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state stat
  * the indicated device and its children so that the bunch of the
  * devices could be reset properly.
  */
-static void *eeh_set_dev_freset(void *data, void *flag)
+static void *eeh_set_dev_freset(struct eeh_dev *edev, void *flag)
 {
 	struct pci_dev *dev;
 	unsigned int *freset = (unsigned int *)flag;
-	struct eeh_dev *edev = (struct eeh_dev *)data;
 
 	dev = eeh_dev_to_pci_dev(edev);
 	if (dev)
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index ca9a73fe9cc5..188d15c4fe3a 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -149,9 +149,8 @@ static bool eeh_dev_removed(struct eeh_dev *edev)
 	return false;
 }
 
-static void *eeh_dev_save_state(void *data, void *userdata)
+static void *eeh_dev_save_state(struct eeh_dev *edev, void *userdata)
 {
-	struct eeh_dev *edev = data;
 	struct pci_dev *pdev;
 
 	if (!edev)
@@ -184,9 +183,8 @@ static void *eeh_dev_save_state(void *data, void *userdata)
  * merge the device driver responses. Cumulative response
  * passed back in "userdata".
  */
-static void *eeh_report_error(void *data, void *userdata)
+static void *eeh_report_error(struct eeh_dev *edev, void *userdata)
 {
-	struct eeh_dev *edev = (struct eeh_dev *)data;
 	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
 	enum pci_ers_result rc, *res = userdata;
 	struct pci_driver *driver;
@@ -231,9 +229,8 @@ static void *eeh_report_error(void *data, void *userdata)
  * are now enabled. Collects up and merges the device driver responses.
  * Cumulative response passed back in "userdata".
  */
-static void *eeh_report_mmio_enabled(void *data, void *userdata)
+static void *eeh_report_mmio_enabled(struct eeh_dev *edev, void *userdata)
 {
-	struct eeh_dev *edev = (struct eeh_dev *)data;
 	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
 	enum pci_ers_result rc, *res = userdata;
 	struct pci_driver *driver;
@@ -273,9 +270,8 @@ static void *eeh_report_mmio_enabled(void *data, void *userdata)
  * some actions, usually to save data the driver needs so that the
  * driver can work again while the device is recovered.
  */
-static void *eeh_report_reset(void *data, void *userdata)
+static void *eeh_report_reset(struct eeh_dev *edev, void *userdata)
 {
-	struct eeh_dev *edev = (struct eeh_dev *)data;
 	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
 	enum pci_ers_result rc, *res = userdata;
 	struct pci_driver *driver;
@@ -310,9 +306,8 @@ static void *eeh_report_reset(void *data, void *userdata)
 	return NULL;
 }
 
-static void *eeh_dev_restore_state(void *data, void *userdata)
+static void *eeh_dev_restore_state(struct eeh_dev *edev, void *userdata)
 {
-	struct eeh_dev *edev = data;
 	struct pci_dev *pdev;
 
 	if (!edev)
@@ -348,9 +343,8 @@ static void *eeh_dev_restore_state(void *data, void *userdata)
  * could resume so that the device driver can do some initialization
  * to make the recovered device work again.
  */
-static void *eeh_report_resume(void *data, void *userdata)
+static void *eeh_report_resume(struct eeh_dev *edev, void *userdata)
 {
-	struct eeh_dev *edev = (struct eeh_dev *)data;
 	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
 	bool was_in_error;
 	struct pci_driver *driver;
@@ -397,9 +391,8 @@ static void *eeh_report_resume(void *data, void *userdata)
  * This informs the device driver that the device is permanently
  * dead, and that no further recovery attempts will be made on it.
  */
-static void *eeh_report_failure(void *data, void *userdata)
+static void *eeh_report_failure(struct eeh_dev *edev, void *userdata)
 {
-	struct eeh_dev *edev = (struct eeh_dev *)data;
 	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
 	struct pci_driver *driver;
 
@@ -457,10 +450,9 @@ static void *eeh_add_virt_device(void *data, void *userdata)
 	return NULL;
 }
 
-static void *eeh_rmv_device(void *data, void *userdata)
+static void *eeh_rmv_device(struct eeh_dev *edev, void *userdata)
 {
 	struct pci_driver *driver;
-	struct eeh_dev *edev = (struct eeh_dev *)data;
 	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
 	struct eeh_rmv_data *rmv_data = (struct eeh_rmv_data *)userdata;
 	int *removed = rmv_data ? &rmv_data->removed : NULL;
@@ -532,9 +524,8 @@ static void *eeh_rmv_device(void *data, void *userdata)
 	return NULL;
 }
 
-static void *eeh_pe_detach_dev(void *data, void *userdata)
+static void *eeh_pe_detach_dev(struct eeh_pe *pe, void *userdata)
 {
-	struct eeh_pe *pe = (struct eeh_pe *)data;
 	struct eeh_dev *edev, *tmp;
 
 	eeh_pe_for_each_dev(pe, edev, tmp) {
@@ -555,9 +546,8 @@ static void *eeh_pe_detach_dev(void *data, void *userdata)
  * PE reset (for 3 times), we try to clear the frozen state
  * for 3 times as well.
  */
-static void *__eeh_clear_pe_frozen_state(void *data, void *flag)
+static void *__eeh_clear_pe_frozen_state(struct eeh_pe *pe, void *flag)
 {
-	struct eeh_pe *pe = (struct eeh_pe *)data;
 	bool clear_sw_state = *(bool *)flag;
 	int i, rc = 1;
 
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index ee5a67d57aab..38a4bcd8ed13 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -173,7 +173,7 @@ static struct eeh_pe *eeh_pe_next(struct eeh_pe *pe,
  * to be traversed.
  */
 void *eeh_pe_traverse(struct eeh_pe *root,
-		      eeh_traverse_func fn, void *flag)
+		      eeh_pe_traverse_func fn, void *flag)
 {
 	struct eeh_pe *pe;
 	void *ret;
@@ -196,7 +196,7 @@ void *eeh_pe_traverse(struct eeh_pe *root,
  * PE and its child PEs.
  */
 void *eeh_pe_dev_traverse(struct eeh_pe *root,
-		eeh_traverse_func fn, void *flag)
+			  eeh_edev_traverse_func fn, void *flag)
 {
 	struct eeh_pe *pe;
 	struct eeh_dev *edev, *tmp;
@@ -235,9 +235,8 @@ struct eeh_pe_get_flag {
 	int config_addr;
 };
 
-static void *__eeh_pe_get(void *data, void *flag)
+static void *__eeh_pe_get(struct eeh_pe *pe, void *flag)
 {
-	struct eeh_pe *pe = (struct eeh_pe *)data;
 	struct eeh_pe_get_flag *tmp = (struct eeh_pe_get_flag *) flag;
 
 	/* Unexpected PHB PE */
@@ -551,9 +550,8 @@ void eeh_pe_update_time_stamp(struct eeh_pe *pe)
  * PE. Also, the associated PCI devices will be put into IO frozen
  * state as well.
  */
-static void *__eeh_pe_state_mark(void *data, void *flag)
+static void *__eeh_pe_state_mark(struct eeh_pe *pe, void *flag)
 {
-	struct eeh_pe *pe = (struct eeh_pe *)data;
 	int state = *((int *)flag);
 	struct eeh_dev *edev, *tmp;
 	struct pci_dev *pdev;
@@ -595,9 +593,8 @@ void eeh_pe_state_mark(struct eeh_pe *pe, int state)
 }
 EXPORT_SYMBOL_GPL(eeh_pe_state_mark);
 
-static void *__eeh_pe_dev_mode_mark(void *data, void *flag)
+static void *__eeh_pe_dev_mode_mark(struct eeh_dev *edev, void *flag)
 {
-	struct eeh_dev *edev = data;
 	int mode = *((int *)flag);
 
 	edev->mode |= mode;
@@ -625,9 +622,8 @@ void eeh_pe_dev_mode_mark(struct eeh_pe *pe, int mode)
  * given PE. Besides, we also clear the check count of the PE
  * as well.
  */
-static void *__eeh_pe_state_clear(void *data, void *flag)
+static void *__eeh_pe_state_clear(struct eeh_pe *pe, void *flag)
 {
-	struct eeh_pe *pe = (struct eeh_pe *)data;
 	int state = *((int *)flag);
 	struct eeh_dev *edev, *tmp;
 	struct pci_dev *pdev;
@@ -858,9 +854,8 @@ static void eeh_restore_device_bars(struct eeh_dev *edev)
  * the expansion ROM base address, the latency timer, and etc.
  * from the saved values in the device node.
  */
-static void *eeh_restore_one_device_bars(void *data, void *flag)
+static void *eeh_restore_one_device_bars(struct eeh_dev *edev, void *flag)
 {
-	struct eeh_dev *edev = (struct eeh_dev *)data;
 	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
 
 	/* Do special restore for bridges */
-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 06/13] powerpc/eeh: Add message when PE processing at parent
  2018-05-02  6:34 [PATCH 00/13] EEH refactoring 2 Sam Bobroff
                   ` (4 preceding siblings ...)
  2018-05-02  6:35 ` [PATCH 05/13] powerpc/eeh: Strengthen types of eeh traversal functions Sam Bobroff
@ 2018-05-02  6:35 ` Sam Bobroff
  2018-05-04  6:51   ` Russell Currey
  2018-05-02  6:36 ` [PATCH 07/13] powerpc/eeh: Clean up pci_ers_result handling Sam Bobroff
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Sam Bobroff @ 2018-05-02  6:35 UTC (permalink / raw)
  To: linuxppc-dev

To aid debugging, add a message to show when EEH processing for a PE
will be done at the device's parent, rather than directly at the
device.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/kernel/eeh.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index f82dade4fb9a..1139821a9aec 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -541,8 +541,12 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 
 		/* Frozen parent PE ? */
 		ret = eeh_ops->get_state(parent_pe, NULL);
-		if (ret > 0 && !eeh_state_active(ret))
+		if (ret > 0 && !eeh_state_active(ret)) {
 			pe = parent_pe;
+			pr_err("EEH: Failure of PHB#%x-PE#%x will be handled at parent PHB#%x-PE#%x.\n",
+			       pe->phb->global_number, pe->addr,
+			       pe->phb->global_number, parent_pe->addr);
+		}
 
 		/* Next parent level */
 		parent_pe = parent_pe->parent;
-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 07/13] powerpc/eeh: Clean up pci_ers_result handling
  2018-05-02  6:34 [PATCH 00/13] EEH refactoring 2 Sam Bobroff
                   ` (5 preceding siblings ...)
  2018-05-02  6:35 ` [PATCH 06/13] powerpc/eeh: Add message when PE processing at parent Sam Bobroff
@ 2018-05-02  6:36 ` Sam Bobroff
  2018-05-04  2:58   ` Michael Ellerman
  2018-05-04  6:58   ` Russell Currey
  2018-05-02  6:36 ` [PATCH 08/13] powerpc/eeh: Introduce eeh_for_each_pe() Sam Bobroff
                   ` (5 subsequent siblings)
  12 siblings, 2 replies; 33+ messages in thread
From: Sam Bobroff @ 2018-05-02  6:36 UTC (permalink / raw)
  To: linuxppc-dev

As EEH event handling progresses, a cumulative result of type
pci_ers_result is built up by (some of) the eeh_report_*() functions
using either:
	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
or:
	if ((*res == PCI_ERS_RESULT_NONE) ||
	    (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
	if (*res == PCI_ERS_RESULT_DISCONNECT &&
	    rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
(Where *res is the accumulator.)

However, the intent is not immediately clear and the result in some
situations is order dependent.

Address this by assigning a priority to each result value, and always
merging to the highest priority. This renders the intent clear, and
provides a stable value for all orderings.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 188d15c4fe3a..f33dd68a9ca2 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -39,6 +39,29 @@ struct eeh_rmv_data {
 	int removed;
 };
 
+static int eeh_result_priority(enum pci_ers_result result)
+{
+	switch (result) {
+	case PCI_ERS_RESULT_NONE: return 0;
+	case PCI_ERS_RESULT_NO_AER_DRIVER: return 1;
+	case PCI_ERS_RESULT_RECOVERED: return 2;
+	case PCI_ERS_RESULT_CAN_RECOVER: return 3;
+	case PCI_ERS_RESULT_DISCONNECT: return 4;
+	case PCI_ERS_RESULT_NEED_RESET: return 5;
+	default:
+		WARN_ONCE(1, "Unknown pci_ers_result value");
+		return 0;
+	}
+};
+
+static enum pci_ers_result merge_result(enum pci_ers_result old,
+					enum pci_ers_result new)
+{
+	if (eeh_result_priority(new) > eeh_result_priority(old))
+		return new;
+	return old;
+}
+
 /**
  * eeh_pcid_get - Get the PCI device driver
  * @pdev: PCI device
@@ -206,9 +229,7 @@ static void *eeh_report_error(struct eeh_dev *edev, void *userdata)
 
 	rc = driver->err_handler->error_detected(dev, pci_channel_io_frozen);
 
-	/* A driver that needs a reset trumps all others */
-	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
-	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
+	*res = merge_result(*res, rc);
 
 	edev->in_error = true;
 	pci_uevent_ers(dev, PCI_ERS_RESULT_NONE);
@@ -249,9 +270,7 @@ static void *eeh_report_mmio_enabled(struct eeh_dev *edev, void *userdata)
 
 	rc = driver->err_handler->mmio_enabled(dev);
 
-	/* A driver that needs a reset trumps all others */
-	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
-	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
+	*res = merge_result(*res, rc);
 
 out:
 	eeh_pcid_put(dev);
@@ -294,10 +313,7 @@ static void *eeh_report_reset(struct eeh_dev *edev, void *userdata)
 		goto out;
 
 	rc = driver->err_handler->slot_reset(dev);
-	if ((*res == PCI_ERS_RESULT_NONE) ||
-	    (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
-	if (*res == PCI_ERS_RESULT_DISCONNECT &&
-	     rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
+	*res = merge_result(*res, rc);
 
 out:
 	eeh_pcid_put(dev);
-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 08/13] powerpc/eeh: Introduce eeh_for_each_pe()
  2018-05-02  6:34 [PATCH 00/13] EEH refactoring 2 Sam Bobroff
                   ` (6 preceding siblings ...)
  2018-05-02  6:36 ` [PATCH 07/13] powerpc/eeh: Clean up pci_ers_result handling Sam Bobroff
@ 2018-05-02  6:36 ` Sam Bobroff
  2018-05-04  6:59   ` Russell Currey
  2018-05-02  6:36 ` [PATCH 09/13] powerpc/eeh: Introduce eeh_edev_actionable() Sam Bobroff
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Sam Bobroff @ 2018-05-02  6:36 UTC (permalink / raw)
  To: linuxppc-dev

Add a for_each-style macro for iterating through PEs without the
boilerplate required by a traversal function. eeh_pe_next() is now
exported, as it is now used directly in place.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/include/asm/eeh.h | 4 ++++
 arch/powerpc/kernel/eeh_pe.c   | 7 +++----
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index f02e0400e6f2..677102baf3cd 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -106,6 +106,9 @@ struct eeh_pe {
 #define eeh_pe_for_each_dev(pe, edev, tmp) \
 		list_for_each_entry_safe(edev, tmp, &pe->edevs, list)
 
+#define eeh_for_each_pe(root, pe) \
+	for (pe = root; pe; pe = eeh_pe_next(pe, root))
+
 static inline bool eeh_pe_passed(struct eeh_pe *pe)
 {
 	return pe ? !!atomic_read(&pe->pass_dev_cnt) : false;
@@ -267,6 +270,7 @@ typedef void *(*eeh_pe_traverse_func)(struct eeh_pe *pe, void *flag);
 void eeh_set_pe_aux_size(int size);
 int eeh_phb_pe_create(struct pci_controller *phb);
 struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb);
+struct eeh_pe *eeh_pe_next(struct eeh_pe *pe, struct eeh_pe *root);
 struct eeh_pe *eeh_pe_get(struct pci_controller *phb,
 			  int pe_no, int config_addr);
 int eeh_add_to_parent_pe(struct eeh_dev *edev);
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 38a4bcd8ed13..1b238ecc553e 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -142,8 +142,7 @@ struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb)
  * The function is used to retrieve the next PE in the
  * hierarchy PE tree.
  */
-static struct eeh_pe *eeh_pe_next(struct eeh_pe *pe,
-				  struct eeh_pe *root)
+struct eeh_pe *eeh_pe_next(struct eeh_pe *pe, struct eeh_pe *root)
 {
 	struct list_head *next = pe->child_list.next;
 
@@ -178,7 +177,7 @@ void *eeh_pe_traverse(struct eeh_pe *root,
 	struct eeh_pe *pe;
 	void *ret;
 
-	for (pe = root; pe; pe = eeh_pe_next(pe, root)) {
+	eeh_for_each_pe(root, pe) {
 		ret = fn(pe, flag);
 		if (ret) return ret;
 	}
@@ -209,7 +208,7 @@ void *eeh_pe_dev_traverse(struct eeh_pe *root,
 	}
 
 	/* Traverse root PE */
-	for (pe = root; pe; pe = eeh_pe_next(pe, root)) {
+	eeh_for_each_pe(root, pe) {
 		eeh_pe_for_each_dev(pe, edev, tmp) {
 			ret = fn(edev, flag);
 			if (ret)
-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 09/13] powerpc/eeh: Introduce eeh_edev_actionable()
  2018-05-02  6:34 [PATCH 00/13] EEH refactoring 2 Sam Bobroff
                   ` (7 preceding siblings ...)
  2018-05-02  6:36 ` [PATCH 08/13] powerpc/eeh: Introduce eeh_for_each_pe() Sam Bobroff
@ 2018-05-02  6:36 ` Sam Bobroff
  2018-05-02  6:36 ` [PATCH 10/13] powerpc/eeh: Introduce eeh_set_channel_state() Sam Bobroff
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Sam Bobroff @ 2018-05-02  6:36 UTC (permalink / raw)
  To: linuxppc-dev

The same test is done in every EEH report function, so factor it out.

Since eeh_dev_removed() needs to be moved higher up in the file,
simplify it a little while we're at it.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index f33dd68a9ca2..9dd98da57c13 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -62,6 +62,17 @@ static enum pci_ers_result merge_result(enum pci_ers_result old,
 	return old;
 }
 
+static bool eeh_dev_removed(struct eeh_dev *edev)
+{
+	return !edev || (edev->mode & EEH_DEV_REMOVED);
+}
+
+static bool eeh_edev_actionable(struct eeh_dev *edev)
+{
+	return (edev->pdev && !eeh_dev_removed(edev) &&
+		!eeh_pe_passed(edev->pe));
+}
+
 /**
  * eeh_pcid_get - Get the PCI device driver
  * @pdev: PCI device
@@ -163,15 +174,6 @@ static void eeh_enable_irq(struct pci_dev *dev)
 	}
 }
 
-static bool eeh_dev_removed(struct eeh_dev *edev)
-{
-	/* EEH device removed ? */
-	if (!edev || (edev->mode & EEH_DEV_REMOVED))
-		return true;
-
-	return false;
-}
-
 static void *eeh_dev_save_state(struct eeh_dev *edev, void *userdata)
 {
 	struct pci_dev *pdev;
@@ -212,7 +214,7 @@ static void *eeh_report_error(struct eeh_dev *edev, void *userdata)
 	enum pci_ers_result rc, *res = userdata;
 	struct pci_driver *driver;
 
-	if (!dev || eeh_dev_removed(edev) || eeh_pe_passed(edev->pe))
+	if (!eeh_edev_actionable(edev))
 		return NULL;
 
 	device_lock(&dev->dev);
@@ -256,7 +258,7 @@ static void *eeh_report_mmio_enabled(struct eeh_dev *edev, void *userdata)
 	enum pci_ers_result rc, *res = userdata;
 	struct pci_driver *driver;
 
-	if (!dev || eeh_dev_removed(edev) || eeh_pe_passed(edev->pe))
+	if (!eeh_edev_actionable(edev))
 		return NULL;
 
 	device_lock(&dev->dev);
@@ -295,7 +297,7 @@ static void *eeh_report_reset(struct eeh_dev *edev, void *userdata)
 	enum pci_ers_result rc, *res = userdata;
 	struct pci_driver *driver;
 
-	if (!dev || eeh_dev_removed(edev) || eeh_pe_passed(edev->pe))
+	if (!eeh_edev_actionable(edev))
 		return NULL;
 
 	device_lock(&dev->dev);
@@ -365,7 +367,7 @@ static void *eeh_report_resume(struct eeh_dev *edev, void *userdata)
 	bool was_in_error;
 	struct pci_driver *driver;
 
-	if (!dev || eeh_dev_removed(edev) || eeh_pe_passed(edev->pe))
+	if (!eeh_edev_actionable(edev))
 		return NULL;
 
 	device_lock(&dev->dev);
@@ -412,7 +414,7 @@ static void *eeh_report_failure(struct eeh_dev *edev, void *userdata)
 	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
 	struct pci_driver *driver;
 
-	if (!dev || eeh_dev_removed(edev) || eeh_pe_passed(edev->pe))
+	if (!eeh_edev_actionable(edev))
 		return NULL;
 
 	device_lock(&dev->dev);
-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 10/13] powerpc/eeh: Introduce eeh_set_channel_state()
  2018-05-02  6:34 [PATCH 00/13] EEH refactoring 2 Sam Bobroff
                   ` (8 preceding siblings ...)
  2018-05-02  6:36 ` [PATCH 09/13] powerpc/eeh: Introduce eeh_edev_actionable() Sam Bobroff
@ 2018-05-02  6:36 ` Sam Bobroff
  2018-05-02  6:36 ` [PATCH 11/13] powerpc/eeh: Introduce eeh_set_irq_state() Sam Bobroff
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Sam Bobroff @ 2018-05-02  6:36 UTC (permalink / raw)
  To: linuxppc-dev

To ease future refactoring, extract setting of the channel state
from the report functions out into their own functions. This increases
the amount of code that is identical across all of the report
functions.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 9dd98da57c13..f63a01d336ee 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -199,6 +199,17 @@ static void *eeh_dev_save_state(struct eeh_dev *edev, void *userdata)
 	return NULL;
 }
 
+static void eeh_set_channel_state(struct eeh_pe *root, enum pci_channel_state s)
+{
+	struct eeh_pe *pe;
+	struct eeh_dev *edev, *tmp;
+
+	eeh_for_each_pe(root, pe)
+		eeh_pe_for_each_dev(pe, edev, tmp)
+			if (eeh_edev_actionable(edev))
+				edev->pdev->error_state = s;
+}
+
 /**
  * eeh_report_error - Report pci error to each device driver
  * @data: eeh device
@@ -218,7 +229,6 @@ static void *eeh_report_error(struct eeh_dev *edev, void *userdata)
 		return NULL;
 
 	device_lock(&dev->dev);
-	dev->error_state = pci_channel_io_frozen;
 
 	driver = eeh_pcid_get(dev);
 	if (!driver) goto out_no_dev;
@@ -301,7 +311,6 @@ static void *eeh_report_reset(struct eeh_dev *edev, void *userdata)
 		return NULL;
 
 	device_lock(&dev->dev);
-	dev->error_state = pci_channel_io_normal;
 
 	driver = eeh_pcid_get(dev);
 	if (!driver) goto out_no_dev;
@@ -371,7 +380,6 @@ static void *eeh_report_resume(struct eeh_dev *edev, void *userdata)
 		return NULL;
 
 	device_lock(&dev->dev);
-	dev->error_state = pci_channel_io_normal;
 
 	driver = eeh_pcid_get(dev);
 	if (!driver) goto out_no_dev;
@@ -795,6 +803,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	 * hotplug for this case.
 	 */
 	pr_info("EEH: Notify device drivers to shutdown\n");
+	eeh_set_channel_state(pe, pci_channel_io_frozen);
 	eeh_pe_dev_traverse(pe, eeh_report_error, &result);
 	if ((pe->type & EEH_PE_PHB) &&
 	    result != PCI_ERS_RESULT_NONE &&
@@ -885,6 +894,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 		pr_info("EEH: Notify device drivers "
 			"the completion of reset\n");
 		result = PCI_ERS_RESULT_NONE;
+		eeh_set_channel_state(pe, pci_channel_io_normal);
 		eeh_pe_dev_traverse(pe, eeh_report_reset, &result);
 	}
 
@@ -906,6 +916,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 
 	/* Tell all device drivers that they can resume operations */
 	pr_info("EEH: Notify device driver to resume\n");
+	eeh_set_channel_state(pe, pci_channel_io_normal);
 	eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
 
 	pr_info("EEH: Recovery successful.\n");
@@ -924,6 +935,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	eeh_slot_error_detail(pe, EEH_LOG_PERM);
 
 	/* Notify all devices that they're about to go down. */
+	eeh_set_channel_state(pe, pci_channel_io_perm_failure);
 	eeh_pe_dev_traverse(pe, eeh_report_failure, NULL);
 
 	/* Mark the PE to be removed permanently */
@@ -1033,6 +1045,7 @@ void eeh_handle_special_event(void)
 
 				/* Notify all devices to be down */
 				eeh_pe_state_clear(pe, EEH_PE_PRI_BUS);
+				eeh_set_channel_state(pe, pci_channel_io_perm_failure);
 				eeh_pe_dev_traverse(pe,
 					eeh_report_failure, NULL);
 				bus = eeh_pe_bus_get(phb_pe);
-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 11/13] powerpc/eeh: Introduce eeh_set_irq_state()
  2018-05-02  6:34 [PATCH 00/13] EEH refactoring 2 Sam Bobroff
                   ` (9 preceding siblings ...)
  2018-05-02  6:36 ` [PATCH 10/13] powerpc/eeh: Introduce eeh_set_channel_state() Sam Bobroff
@ 2018-05-02  6:36 ` Sam Bobroff
  2018-05-04  3:02   ` Michael Ellerman
  2018-05-02  6:36 ` [PATCH 12/13] powerpc/eeh: Cleaner handling of EEH_DEV_NO_HANDLER Sam Bobroff
  2018-05-02  6:36 ` [PATCH 13/13] powerpc/eeh: Refactor report functions Sam Bobroff
  12 siblings, 1 reply; 33+ messages in thread
From: Sam Bobroff @ 2018-05-02  6:36 UTC (permalink / raw)
  To: linuxppc-dev

To ease future refactoring, extract calls to eeh_enable_irq() and
eeh_disable_irq() from the various report functions. This makes
the report functions initial sequences more similar, as well as making
the IRQ changes visible when reading eeh_handle_normal_event().

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 46 ++++++++++++++++++++++++----------------
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index f63a01d336ee..b3edd0df04b8 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -118,22 +118,20 @@ static inline void eeh_pcid_put(struct pci_dev *pdev)
  * do real work because EEH should freeze DMA transfers for those PCI
  * devices encountering EEH errors, which includes MSI or MSI-X.
  */
-static void eeh_disable_irq(struct pci_dev *dev)
+static void eeh_disable_irq(struct eeh_dev *edev)
 {
-	struct eeh_dev *edev = pci_dev_to_eeh_dev(dev);
-
 	/* Don't disable MSI and MSI-X interrupts. They are
 	 * effectively disabled by the DMA Stopped state
 	 * when an EEH error occurs.
 	 */
-	if (dev->msi_enabled || dev->msix_enabled)
+	if (edev->pdev->msi_enabled || edev->pdev->msix_enabled)
 		return;
 
-	if (!irq_has_action(dev->irq))
+	if (!irq_has_action(edev->pdev->irq))
 		return;
 
 	edev->mode |= EEH_DEV_IRQ_DISABLED;
-	disable_irq_nosync(dev->irq);
+	disable_irq_nosync(edev->pdev->irq);
 }
 
 /**
@@ -143,10 +141,8 @@ static void eeh_disable_irq(struct pci_dev *dev)
  * This routine must be called to enable interrupt while failed
  * device could be resumed.
  */
-static void eeh_enable_irq(struct pci_dev *dev)
+static void eeh_enable_irq(struct eeh_dev *edev)
 {
-	struct eeh_dev *edev = pci_dev_to_eeh_dev(dev);
-
 	if ((edev->mode) & EEH_DEV_IRQ_DISABLED) {
 		edev->mode &= ~EEH_DEV_IRQ_DISABLED;
 		/*
@@ -169,8 +165,8 @@ static void eeh_enable_irq(struct pci_dev *dev)
 		 *
 		 *	tglx
 		 */
-		if (irqd_irq_disabled(irq_get_irq_data(dev->irq)))
-			enable_irq(dev->irq);
+		if (irqd_irq_disabled(irq_get_irq_data(edev->pdev->irq)))
+			enable_irq(edev->pdev->irq);
 	}
 }
 
@@ -210,6 +206,23 @@ static void eeh_set_channel_state(struct eeh_pe *root, enum pci_channel_state s)
 				edev->pdev->error_state = s;
 }
 
+static void eeh_set_irq_state(struct eeh_pe *root, bool enable)
+{
+	struct eeh_pe *pe;
+	struct eeh_dev *edev, *tmp;
+
+	eeh_for_each_pe(root, pe)
+		eeh_pe_for_each_dev(pe, edev, tmp)
+			if (eeh_edev_actionable(edev))
+				if (eeh_pcid_get(edev->pdev)) {
+					if (enable)
+						eeh_enable_irq(edev);
+					else
+						eeh_disable_irq(edev);
+					eeh_pcid_put(edev->pdev);
+				}
+}
+
 /**
  * eeh_report_error - Report pci error to each device driver
  * @data: eeh device
@@ -233,8 +246,6 @@ static void *eeh_report_error(struct eeh_dev *edev, void *userdata)
 	driver = eeh_pcid_get(dev);
 	if (!driver) goto out_no_dev;
 
-	eeh_disable_irq(dev);
-
 	if (!driver->err_handler ||
 	    !driver->err_handler->error_detected)
 		goto out;
@@ -315,8 +326,6 @@ static void *eeh_report_reset(struct eeh_dev *edev, void *userdata)
 	driver = eeh_pcid_get(dev);
 	if (!driver) goto out_no_dev;
 
-	eeh_enable_irq(dev);
-
 	if (!driver->err_handler ||
 	    !driver->err_handler->slot_reset ||
 	    (edev->mode & EEH_DEV_NO_HANDLER) ||
@@ -386,7 +395,6 @@ static void *eeh_report_resume(struct eeh_dev *edev, void *userdata)
 
 	was_in_error = edev->in_error;
 	edev->in_error = false;
-	eeh_enable_irq(dev);
 
 	if (!driver->err_handler ||
 	    !driver->err_handler->resume ||
@@ -431,8 +439,6 @@ static void *eeh_report_failure(struct eeh_dev *edev, void *userdata)
 	driver = eeh_pcid_get(dev);
 	if (!driver) goto out_no_dev;
 
-	eeh_disable_irq(dev);
-
 	if (!driver->err_handler ||
 	    !driver->err_handler->error_detected)
 		goto out;
@@ -804,6 +810,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	 */
 	pr_info("EEH: Notify device drivers to shutdown\n");
 	eeh_set_channel_state(pe, pci_channel_io_frozen);
+	eeh_set_irq_state(pe, false);
 	eeh_pe_dev_traverse(pe, eeh_report_error, &result);
 	if ((pe->type & EEH_PE_PHB) &&
 	    result != PCI_ERS_RESULT_NONE &&
@@ -895,6 +902,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 			"the completion of reset\n");
 		result = PCI_ERS_RESULT_NONE;
 		eeh_set_channel_state(pe, pci_channel_io_normal);
+		eeh_set_irq_state(pe, true);
 		eeh_pe_dev_traverse(pe, eeh_report_reset, &result);
 	}
 
@@ -917,6 +925,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	/* Tell all device drivers that they can resume operations */
 	pr_info("EEH: Notify device driver to resume\n");
 	eeh_set_channel_state(pe, pci_channel_io_normal);
+	eeh_set_irq_state(pe, true);
 	eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
 
 	pr_info("EEH: Recovery successful.\n");
@@ -936,6 +945,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 
 	/* Notify all devices that they're about to go down. */
 	eeh_set_channel_state(pe, pci_channel_io_perm_failure);
+	eeh_set_irq_state(pe, false);
 	eeh_pe_dev_traverse(pe, eeh_report_failure, NULL);
 
 	/* Mark the PE to be removed permanently */
-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 12/13] powerpc/eeh: Cleaner handling of EEH_DEV_NO_HANDLER
  2018-05-02  6:34 [PATCH 00/13] EEH refactoring 2 Sam Bobroff
                   ` (10 preceding siblings ...)
  2018-05-02  6:36 ` [PATCH 11/13] powerpc/eeh: Introduce eeh_set_irq_state() Sam Bobroff
@ 2018-05-02  6:36 ` Sam Bobroff
  2018-05-02  6:36 ` [PATCH 13/13] powerpc/eeh: Refactor report functions Sam Bobroff
  12 siblings, 0 replies; 33+ messages in thread
From: Sam Bobroff @ 2018-05-02  6:36 UTC (permalink / raw)
  To: linuxppc-dev

If a device without a driver is recovered via EEH, the flag
EEH_DEV_NO_HANDLER is incorrectly left set on the device after
recovery, because the test in eeh_report_resume() for the existence of
a bound driver is done before the flag is cleared. If a driver is
later bound, and EEH experienced again, some of the drivers EEH
handers are not called.

To correct this, clear the flag unconditionally after EEH processing
is complete.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index b3edd0df04b8..eb4feee81ff4 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -399,7 +399,6 @@ static void *eeh_report_resume(struct eeh_dev *edev, void *userdata)
 	if (!driver->err_handler ||
 	    !driver->err_handler->resume ||
 	    (edev->mode & EEH_DEV_NO_HANDLER) || !was_in_error) {
-		edev->mode &= ~EEH_DEV_NO_HANDLER;
 		goto out;
 	}
 
@@ -774,6 +773,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 {
 	struct pci_bus *bus;
 	struct eeh_dev *edev, *tmp;
+	struct eeh_pe *tmp_pe;
 	int rc = 0;
 	enum pci_ers_result result = PCI_ERS_RESULT_NONE;
 	struct eeh_rmv_data rmv_data = {LIST_HEAD_INIT(rmv_data.edev_list), 0};
@@ -928,6 +928,10 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	eeh_set_irq_state(pe, true);
 	eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
 
+	eeh_for_each_pe(pe, tmp_pe)
+		eeh_pe_for_each_dev(tmp_pe, edev, tmp)
+			edev->mode &= ~EEH_DEV_NO_HANDLER;
+
 	pr_info("EEH: Recovery successful.\n");
 	goto final;
 
-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 13/13] powerpc/eeh: Refactor report functions
  2018-05-02  6:34 [PATCH 00/13] EEH refactoring 2 Sam Bobroff
                   ` (11 preceding siblings ...)
  2018-05-02  6:36 ` [PATCH 12/13] powerpc/eeh: Cleaner handling of EEH_DEV_NO_HANDLER Sam Bobroff
@ 2018-05-02  6:36 ` Sam Bobroff
  2018-05-03 13:27   ` Michael Ellerman
  12 siblings, 1 reply; 33+ messages in thread
From: Sam Bobroff @ 2018-05-02  6:36 UTC (permalink / raw)
  To: linuxppc-dev

The EEH report functions now share a fair bit of code around the start
and end of each function.

So factor out as much as possible, and move the traversal into a
custom function. This also allows accurate debug to be generated more
easily.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 289 +++++++++++++++++++--------------------
 1 file changed, 138 insertions(+), 151 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index eb4feee81ff4..1c4336dcf9f5 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -54,6 +54,25 @@ static int eeh_result_priority(enum pci_ers_result result)
 	}
 };
 
+const char *pci_ers_result_name(enum pci_ers_result r)
+{
+	switch (r) {
+	case PCI_ERS_RESULT_NONE: return "none";
+	case PCI_ERS_RESULT_CAN_RECOVER: return "can recover";
+	case PCI_ERS_RESULT_NEED_RESET: return "need reset";
+	case PCI_ERS_RESULT_DISCONNECT: return "disconnect";
+	case PCI_ERS_RESULT_RECOVERED: return "recovered";
+	case PCI_ERS_RESULT_NO_AER_DRIVER: return "no AER driver";
+	default:
+		WARN_ONCE(1, "Unknown result type");
+		return "unknown";
+	}
+};
+
+#define eeh_infoline(EDEV, FMT, ...) \
+pr_info("EEH: PE#%x (PCI %s): " pr_fmt(FMT) "\n", EDEV->pe_config_addr, \
+((EDEV->pdev) ? dev_name(&EDEV->pdev->dev) : "NONE"), ##__VA_ARGS__)
+
 static enum pci_ers_result merge_result(enum pci_ers_result old,
 					enum pci_ers_result new)
 {
@@ -223,123 +242,118 @@ static void eeh_set_irq_state(struct eeh_pe *root, bool enable)
 				}
 }
 
+static void eeh_pe_report(const char *name, struct eeh_pe *root,
+			  enum pci_ers_result (*fn)(struct eeh_dev *,
+						    struct pci_driver *),
+			  enum pci_ers_result *result)
+{
+	struct eeh_pe *pe;
+	struct eeh_dev *edev, *tmp;
+	enum pci_ers_result new_result;
+
+	pr_info("EEH: Beginning: '%s'\n", name);
+	eeh_for_each_pe(root, pe) {
+		eeh_pe_for_each_dev(pe, edev, tmp) {
+			device_lock(&edev->pdev->dev);
+			if (eeh_edev_actionable(edev)) {
+				struct pci_driver *driver;
+
+				driver = eeh_pcid_get(edev->pdev);
+
+				if (!driver)
+					eeh_infoline(edev, "no driver");
+				else if (!driver->err_handler)
+					eeh_infoline(edev,
+						     "driver not EEH aware");
+				else if (edev->mode & EEH_DEV_NO_HANDLER)
+					eeh_infoline(edev,
+						     "driver bound too late");
+				else {
+					new_result = fn(edev, driver);
+					eeh_infoline(edev,
+						"%s driver reports: '%s'",
+						driver->name,
+						pci_ers_result_name(new_result));
+					if (result)
+						*result = merge_result(*result,
+								       new_result);
+				}
+				if (driver)
+					eeh_pcid_put(edev->pdev);
+			} else {
+				eeh_infoline(edev, "not actionable (%d,%d,%d)",
+					     !!edev->pdev,
+					     !eeh_dev_removed(edev),
+					     !eeh_pe_passed(edev->pe));
+			}
+			device_unlock(&edev->pdev->dev);
+		}
+	}
+	if (result)
+		pr_info("EEH: Finished:'%s' with aggregate recovery state:'%s'\n",
+			name, pci_ers_result_name(*result));
+	else
+		pr_info("EEH: Finished:'%s'", name);
+}
+
 /**
  * eeh_report_error - Report pci error to each device driver
- * @data: eeh device
- * @userdata: return value
+ * @edev: eeh device
+ * @driver: device's PCI driver
  *
- * Report an EEH error to each device driver, collect up and
- * merge the device driver responses. Cumulative response
- * passed back in "userdata".
+ * Report an EEH error to each device driver.
  */
-static void *eeh_report_error(struct eeh_dev *edev, void *userdata)
+static enum pci_ers_result eeh_report_error(struct eeh_dev *edev,
+			      struct pci_driver *driver)
 {
-	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
-	enum pci_ers_result rc, *res = userdata;
-	struct pci_driver *driver;
+	enum pci_ers_result rc;
+	struct pci_dev *dev = edev->pdev;
 
-	if (!eeh_edev_actionable(edev))
-		return NULL;
-
-	device_lock(&dev->dev);
-
-	driver = eeh_pcid_get(dev);
-	if (!driver) goto out_no_dev;
-
-	if (!driver->err_handler ||
-	    !driver->err_handler->error_detected)
-		goto out;
+	if (!driver->err_handler->error_detected)
+		return PCI_ERS_RESULT_NONE;
 
+	eeh_infoline(edev, "Invoking %s->error_detected(IO frozen)", driver->name);
 	rc = driver->err_handler->error_detected(dev, pci_channel_io_frozen);
 
-	*res = merge_result(*res, rc);
-
 	edev->in_error = true;
 	pci_uevent_ers(dev, PCI_ERS_RESULT_NONE);
-
-out:
-	eeh_pcid_put(dev);
-out_no_dev:
-	device_unlock(&dev->dev);
-	return NULL;
+	return rc;
 }
 
 /**
  * eeh_report_mmio_enabled - Tell drivers that MMIO has been enabled
- * @data: eeh device
- * @userdata: return value
+ * @edev: eeh device
+ * @driver: device's PCI driver
  *
  * Tells each device driver that IO ports, MMIO and config space I/O
- * are now enabled. Collects up and merges the device driver responses.
- * Cumulative response passed back in "userdata".
+ * are now enabled.
  */
-static void *eeh_report_mmio_enabled(struct eeh_dev *edev, void *userdata)
+static enum pci_ers_result eeh_report_mmio_enabled(struct eeh_dev *edev,
+						   struct pci_driver *driver)
 {
-	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
-	enum pci_ers_result rc, *res = userdata;
-	struct pci_driver *driver;
-
-	if (!eeh_edev_actionable(edev))
-		return NULL;
-
-	device_lock(&dev->dev);
-	driver = eeh_pcid_get(dev);
-	if (!driver) goto out_no_dev;
-
-	if (!driver->err_handler ||
-	    !driver->err_handler->mmio_enabled ||
-	    (edev->mode & EEH_DEV_NO_HANDLER))
-		goto out;
-
-	rc = driver->err_handler->mmio_enabled(dev);
-
-	*res = merge_result(*res, rc);
-
-out:
-	eeh_pcid_put(dev);
-out_no_dev:
-	device_unlock(&dev->dev);
-	return NULL;
+	if (!driver->err_handler->mmio_enabled)
+		return PCI_ERS_RESULT_NONE;
+	eeh_infoline(edev, "Invoking %s->mmio_enabled()", driver->name);
+	return driver->err_handler->mmio_enabled(edev->pdev);
 }
 
 /**
  * eeh_report_reset - Tell device that slot has been reset
- * @data: eeh device
- * @userdata: return value
+ * @edev: eeh device
+ * @driver: device's PCI driver
  *
  * This routine must be called while EEH tries to reset particular
  * PCI device so that the associated PCI device driver could take
  * some actions, usually to save data the driver needs so that the
  * driver can work again while the device is recovered.
  */
-static void *eeh_report_reset(struct eeh_dev *edev, void *userdata)
+static enum pci_ers_result eeh_report_reset(struct eeh_dev *edev,
+					    struct pci_driver *driver)
 {
-	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
-	enum pci_ers_result rc, *res = userdata;
-	struct pci_driver *driver;
-
-	if (!eeh_edev_actionable(edev))
-		return NULL;
-
-	device_lock(&dev->dev);
-
-	driver = eeh_pcid_get(dev);
-	if (!driver) goto out_no_dev;
-
-	if (!driver->err_handler ||
-	    !driver->err_handler->slot_reset ||
-	    (edev->mode & EEH_DEV_NO_HANDLER) ||
-	    (!edev->in_error))
-		goto out;
-
-	rc = driver->err_handler->slot_reset(dev);
-	*res = merge_result(*res, rc);
-
-out:
-	eeh_pcid_put(dev);
-out_no_dev:
-	device_unlock(&dev->dev);
-	return NULL;
+	if (!driver->err_handler->slot_reset || !edev->in_error)
+		return PCI_ERS_RESULT_NONE;
+	eeh_infoline(edev, "Invoking %s->slot_reset()", driver->name);
+	return driver->err_handler->slot_reset(edev->pdev);
 }
 
 static void *eeh_dev_restore_state(struct eeh_dev *edev, void *userdata)
@@ -372,84 +386,52 @@ static void *eeh_dev_restore_state(struct eeh_dev *edev, void *userdata)
 
 /**
  * eeh_report_resume - Tell device to resume normal operations
- * @data: eeh device
- * @userdata: return value
+ * @edev: eeh device
+ * @driver: device's PCI driver
  *
  * This routine must be called to notify the device driver that it
  * could resume so that the device driver can do some initialization
  * to make the recovered device work again.
  */
-static void *eeh_report_resume(struct eeh_dev *edev, void *userdata)
+static enum pci_ers_result eeh_report_resume(struct eeh_dev *edev,
+					     struct pci_driver *driver)
 {
-	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
-	bool was_in_error;
-	struct pci_driver *driver;
-
-	if (!eeh_edev_actionable(edev))
-		return NULL;
+	if (!driver->err_handler->resume || !edev->in_error)
+		return PCI_ERS_RESULT_NONE;
 
-	device_lock(&dev->dev);
+	eeh_infoline(edev, "Invoking %s->resume()", driver->name);
+	driver->err_handler->resume(edev->pdev);
 
-	driver = eeh_pcid_get(dev);
-	if (!driver) goto out_no_dev;
-
-	was_in_error = edev->in_error;
-	edev->in_error = false;
-
-	if (!driver->err_handler ||
-	    !driver->err_handler->resume ||
-	    (edev->mode & EEH_DEV_NO_HANDLER) || !was_in_error) {
-		goto out;
-	}
-
-	driver->err_handler->resume(dev);
-
-	pci_uevent_ers(dev, PCI_ERS_RESULT_RECOVERED);
-out:
-	eeh_pcid_put(dev);
+	pci_uevent_ers(edev->pdev, PCI_ERS_RESULT_RECOVERED);
 #ifdef CONFIG_PCI_IOV
 	if (eeh_ops->notify_resume && eeh_dev_to_pdn(edev))
 		eeh_ops->notify_resume(eeh_dev_to_pdn(edev));
 #endif
-out_no_dev:
-	device_unlock(&dev->dev);
-	return NULL;
+	return PCI_ERS_RESULT_NONE;
 }
 
 /**
  * eeh_report_failure - Tell device driver that device is dead.
- * @data: eeh device
- * @userdata: return value
+ * @edev: eeh device
+ * @driver: device's PCI driver
  *
  * This informs the device driver that the device is permanently
  * dead, and that no further recovery attempts will be made on it.
  */
-static void *eeh_report_failure(struct eeh_dev *edev, void *userdata)
+static enum pci_ers_result eeh_report_failure(struct eeh_dev *edev,
+					      struct pci_driver *driver)
 {
-	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
-	struct pci_driver *driver;
+	enum pci_ers_result rc;
 
-	if (!eeh_edev_actionable(edev))
-		return NULL;
+	if (!driver->err_handler->error_detected)
+		return PCI_ERS_RESULT_NONE;
 
-	device_lock(&dev->dev);
-	dev->error_state = pci_channel_io_perm_failure;
+	eeh_infoline(edev, "Invoking %s->error_detected(permanent failure)",
+		     driver->name);
+	rc = driver->err_handler->error_detected(edev->pdev, pci_channel_io_perm_failure);
 
-	driver = eeh_pcid_get(dev);
-	if (!driver) goto out_no_dev;
-
-	if (!driver->err_handler ||
-	    !driver->err_handler->error_detected)
-		goto out;
-
-	driver->err_handler->error_detected(dev, pci_channel_io_perm_failure);
-
-	pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
-out:
-	eeh_pcid_put(dev);
-out_no_dev:
-	device_unlock(&dev->dev);
-	return NULL;
+	pci_uevent_ers(edev->pdev, PCI_ERS_RESULT_DISCONNECT);
+	return rc;
 }
 
 static void *eeh_add_virt_device(void *data, void *userdata)
@@ -811,7 +793,8 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	pr_info("EEH: Notify device drivers to shutdown\n");
 	eeh_set_channel_state(pe, pci_channel_io_frozen);
 	eeh_set_irq_state(pe, false);
-	eeh_pe_dev_traverse(pe, eeh_report_error, &result);
+	eeh_pe_report("error_detected(IO frozen)", pe,
+		      eeh_report_error, &result);
 	if ((pe->type & EEH_PE_PHB) &&
 	    result != PCI_ERS_RESULT_NONE &&
 	    result != PCI_ERS_RESULT_NEED_RESET)
@@ -858,7 +841,8 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 			result = PCI_ERS_RESULT_NEED_RESET;
 		} else {
 			pr_info("EEH: Notify device drivers to resume I/O\n");
-			eeh_pe_dev_traverse(pe, eeh_report_mmio_enabled, &result);
+			eeh_pe_report("mmio_enabled", pe,
+				      eeh_report_mmio_enabled, &result);
 		}
 	}
 
@@ -903,7 +887,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 		result = PCI_ERS_RESULT_NONE;
 		eeh_set_channel_state(pe, pci_channel_io_normal);
 		eeh_set_irq_state(pe, true);
-		eeh_pe_dev_traverse(pe, eeh_report_reset, &result);
+		eeh_pe_report("slot_reset", pe, eeh_report_reset, &result);
 	}
 
 	/* All devices should claim they have recovered by now. */
@@ -926,11 +910,13 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	pr_info("EEH: Notify device driver to resume\n");
 	eeh_set_channel_state(pe, pci_channel_io_normal);
 	eeh_set_irq_state(pe, true);
-	eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
-
-	eeh_for_each_pe(pe, tmp_pe)
-		eeh_pe_for_each_dev(tmp_pe, edev, tmp)
+	eeh_pe_report("resume", pe, eeh_report_resume, NULL);
+	eeh_for_each_pe(pe, tmp_pe) {
+		eeh_pe_for_each_dev(tmp_pe, edev, tmp) {
 			edev->mode &= ~EEH_DEV_NO_HANDLER;
+			edev->in_error = false;
+		}
+	}
 
 	pr_info("EEH: Recovery successful.\n");
 	goto final;
@@ -950,7 +936,8 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	/* Notify all devices that they're about to go down. */
 	eeh_set_channel_state(pe, pci_channel_io_perm_failure);
 	eeh_set_irq_state(pe, false);
-	eeh_pe_dev_traverse(pe, eeh_report_failure, NULL);
+	eeh_pe_report("error_detected(permanent failure)", pe,
+		      eeh_report_failure, NULL);
 
 	/* Mark the PE to be removed permanently */
 	eeh_pe_state_mark(pe, EEH_PE_REMOVED);
@@ -1060,8 +1047,8 @@ void eeh_handle_special_event(void)
 				/* Notify all devices to be down */
 				eeh_pe_state_clear(pe, EEH_PE_PRI_BUS);
 				eeh_set_channel_state(pe, pci_channel_io_perm_failure);
-				eeh_pe_dev_traverse(pe,
-					eeh_report_failure, NULL);
+				eeh_pe_report("error_detected(permanent failure)",
+					       pe, eeh_report_failure, NULL);
 				bus = eeh_pe_bus_get(phb_pe);
 				if (!bus) {
 					pr_err("%s: Cannot find PCI bus for "
-- 
2.16.1.74.g9b0b1f47b

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 13/13] powerpc/eeh: Refactor report functions
  2018-05-02  6:36 ` [PATCH 13/13] powerpc/eeh: Refactor report functions Sam Bobroff
@ 2018-05-03 13:27   ` Michael Ellerman
  2018-05-07  5:23     ` Sam Bobroff
  0 siblings, 1 reply; 33+ messages in thread
From: Michael Ellerman @ 2018-05-03 13:27 UTC (permalink / raw)
  To: Sam Bobroff, linuxppc-dev

Sam Bobroff <sbobroff@linux.ibm.com> writes:
> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
> index eb4feee81ff4..1c4336dcf9f5 100644
> --- a/arch/powerpc/kernel/eeh_driver.c
> +++ b/arch/powerpc/kernel/eeh_driver.c
> @@ -54,6 +54,25 @@ static int eeh_result_priority(enum pci_ers_result result)
>  	}
>  };
>  
> +const char *pci_ers_result_name(enum pci_ers_result r)
> +{
> +	switch (r) {
> +	case PCI_ERS_RESULT_NONE: return "none";
> +	case PCI_ERS_RESULT_CAN_RECOVER: return "can recover";
> +	case PCI_ERS_RESULT_NEED_RESET: return "need reset";
> +	case PCI_ERS_RESULT_DISCONNECT: return "disconnect";
> +	case PCI_ERS_RESULT_RECOVERED: return "recovered";
> +	case PCI_ERS_RESULT_NO_AER_DRIVER: return "no AER driver";
> +	default:
> +		WARN_ONCE(1, "Unknown result type");
> +		return "unknown";
> +	}
> +};
> +
> +#define eeh_infoline(EDEV, FMT, ...) \
> +pr_info("EEH: PE#%x (PCI %s): " pr_fmt(FMT) "\n", EDEV->pe_config_addr, \
> +((EDEV->pdev) ? dev_name(&EDEV->pdev->dev) : "NONE"), ##__VA_ARGS__)

Ooof, I guess.

It would be nicer as a function.

"infoline" annoys me for some reason, "eeh_info" ?

> @@ -223,123 +242,118 @@ static void eeh_set_irq_state(struct eeh_pe *root, bool enable)
>  				}
>  }
>  
> +static void eeh_pe_report(const char *name, struct eeh_pe *root,
> +			  enum pci_ers_result (*fn)(struct eeh_dev *,
> +						    struct pci_driver *),
> +			  enum pci_ers_result *result)
> +{
> +	struct eeh_pe *pe;
> +	struct eeh_dev *edev, *tmp;
> +	enum pci_ers_result new_result;
> +
> +	pr_info("EEH: Beginning: '%s'\n", name);
> +	eeh_for_each_pe(root, pe) {
> +		eeh_pe_for_each_dev(pe, edev, tmp) {

This should be in a separate function.

> +			device_lock(&edev->pdev->dev);
> +			if (eeh_edev_actionable(edev)) {
> +				struct pci_driver *driver;
> +
> +				driver = eeh_pcid_get(edev->pdev);
> +
> +				if (!driver)
> +					eeh_infoline(edev, "no driver");
> +				else if (!driver->err_handler)
> +					eeh_infoline(edev,
> +						     "driver not EEH aware");
> +				else if (edev->mode & EEH_DEV_NO_HANDLER)
> +					eeh_infoline(edev,
> +						     "driver bound too late");
> +				else {
> +					new_result = fn(edev, driver);
> +					eeh_infoline(edev,
> +						"%s driver reports: '%s'",
> +						driver->name,
> +						pci_ers_result_name(new_result));
> +					if (result)
> +						*result = merge_result(*result,
> +								       new_result);
> +				}
> +				if (driver)
> +					eeh_pcid_put(edev->pdev);
> +			} else {
> +				eeh_infoline(edev, "not actionable (%d,%d,%d)",
> +					     !!edev->pdev,
> +					     !eeh_dev_removed(edev),
> +					     !eeh_pe_passed(edev->pe));
> +			}
> +			device_unlock(&edev->pdev->dev);

^^^


cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/13] powerpc/eeh: Add final message for successful recovery
  2018-05-02  6:35 ` [PATCH 02/13] powerpc/eeh: Add final message for successful recovery Sam Bobroff
@ 2018-05-04  2:55   ` Michael Ellerman
  2018-05-04  6:08     ` Russell Currey
  2018-05-07  5:29     ` Sam Bobroff
  0 siblings, 2 replies; 33+ messages in thread
From: Michael Ellerman @ 2018-05-04  2:55 UTC (permalink / raw)
  To: Sam Bobroff, linuxppc-dev

Sam Bobroff <sbobroff@linux.ibm.com> writes:

> Add a single log line at the end of successful EEH recovery, so that
> it's clear that event processing has finished.
>
> Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
> ---
>  arch/powerpc/kernel/eeh_driver.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
> index 56a60b9eb397..07e0a42035ce 100644
> --- a/arch/powerpc/kernel/eeh_driver.c
> +++ b/arch/powerpc/kernel/eeh_driver.c
> @@ -910,6 +910,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
>  	pr_info("EEH: Notify device driver to resume\n");
>  	eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
>  
> +	pr_info("EEH: Recovery successful.\n");

Is it possible for recovery for multiple devices to be interleaved?

Should that message include the device?

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/13] powerpc/eeh: Fix use-after-release of EEH driver
  2018-05-02  6:35 ` [PATCH 03/13] powerpc/eeh: Fix use-after-release of EEH driver Sam Bobroff
@ 2018-05-04  2:56   ` Michael Ellerman
  2018-05-07  5:38     ` Sam Bobroff
  0 siblings, 1 reply; 33+ messages in thread
From: Michael Ellerman @ 2018-05-04  2:56 UTC (permalink / raw)
  To: Sam Bobroff, linuxppc-dev

Sam Bobroff <sbobroff@linux.ibm.com> writes:

> Correct two cases where eeh_pcid_get() is used to reference the driver's
> module but the reference is dropped before the driver pointer is used.
>
> In eeh_rmv_device() also refactor a little so that only two calls to
> eeh_pcid_put() are needed, rather than three and the reference isn't
> taken at all if it wasn't needed.

This sounds like a crash or memory corruption bug?

But it doesn't have Fixes or Cc: stable. Is it not a major problem for
some reason?

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 07/13] powerpc/eeh: Clean up pci_ers_result handling
  2018-05-02  6:36 ` [PATCH 07/13] powerpc/eeh: Clean up pci_ers_result handling Sam Bobroff
@ 2018-05-04  2:58   ` Michael Ellerman
  2018-05-04  6:58   ` Russell Currey
  1 sibling, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2018-05-04  2:58 UTC (permalink / raw)
  To: Sam Bobroff, linuxppc-dev

Sam Bobroff <sbobroff@linux.ibm.com> writes:
> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
> index 188d15c4fe3a..f33dd68a9ca2 100644
> --- a/arch/powerpc/kernel/eeh_driver.c
> +++ b/arch/powerpc/kernel/eeh_driver.c
> @@ -39,6 +39,29 @@ struct eeh_rmv_data {
>  	int removed;
>  };
>  
> +static int eeh_result_priority(enum pci_ers_result result)
> +{
> +	switch (result) {
> +	case PCI_ERS_RESULT_NONE: return 0;
> +	case PCI_ERS_RESULT_NO_AER_DRIVER: return 1;
> +	case PCI_ERS_RESULT_RECOVERED: return 2;
> +	case PCI_ERS_RESULT_CAN_RECOVER: return 3;
> +	case PCI_ERS_RESULT_DISCONNECT: return 4;
> +	case PCI_ERS_RESULT_NEED_RESET: return 5;
> +	default:
> +		WARN_ONCE(1, "Unknown pci_ers_result value");
> +		return 0;
> +	}
> +};
> +
> +static enum pci_ers_result merge_result(enum pci_ers_result old,
> +					enum pci_ers_result new)
> +{
> +	if (eeh_result_priority(new) > eeh_result_priority(old))
> +		return new;
> +	return old;

max() ?

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 11/13] powerpc/eeh: Introduce eeh_set_irq_state()
  2018-05-02  6:36 ` [PATCH 11/13] powerpc/eeh: Introduce eeh_set_irq_state() Sam Bobroff
@ 2018-05-04  3:02   ` Michael Ellerman
  2018-05-08  1:12     ` Sam Bobroff
  0 siblings, 1 reply; 33+ messages in thread
From: Michael Ellerman @ 2018-05-04  3:02 UTC (permalink / raw)
  To: Sam Bobroff, linuxppc-dev

Sam Bobroff <sbobroff@linux.ibm.com> writes:

> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
> index f63a01d336ee..b3edd0df04b8 100644
> --- a/arch/powerpc/kernel/eeh_driver.c
> +++ b/arch/powerpc/kernel/eeh_driver.c
> @@ -210,6 +206,23 @@ static void eeh_set_channel_state(struct eeh_pe *root, enum pci_channel_state s)
>  				edev->pdev->error_state = s;
>  }
>  
> +static void eeh_set_irq_state(struct eeh_pe *root, bool enable)
> +{
> +	struct eeh_pe *pe;
> +	struct eeh_dev *edev, *tmp;
> +
> +	eeh_for_each_pe(root, pe)
> +		eeh_pe_for_each_dev(pe, edev, tmp)
> +			if (eeh_edev_actionable(edev))
> +				if (eeh_pcid_get(edev->pdev)) {
> +					if (enable)
> +						eeh_enable_irq(edev);
> +					else
> +						eeh_disable_irq(edev);
> +					eeh_pcid_put(edev->pdev);
> +				}

Yikes.

What about?

	eeh_for_each_pe(root, pe) {
		eeh_pe_for_each_dev(pe, edev, tmp) {
			if (!eeh_edev_actionable(edev))
				continue;

			if (!eeh_pcid_get(edev->pdev))
				continue;

			if (enable)
				eeh_enable_irq(edev);
			else
				eeh_disable_irq(edev);

			eeh_pcid_put(edev->pdev);
		}
	}

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 01/13] powerpc/eeh: Add eeh_max_freezes to initial EEH log line
  2018-05-02  6:34 ` [PATCH 01/13] powerpc/eeh: Add eeh_max_freezes to initial EEH log line Sam Bobroff
@ 2018-05-04  5:46   ` Russell Currey
  0 siblings, 0 replies; 33+ messages in thread
From: Russell Currey @ 2018-05-04  5:46 UTC (permalink / raw)
  To: Sam Bobroff, linuxppc-dev

On Wed, 2018-05-02 at 16:34 +1000, Sam Bobroff wrote:
> The current failure message includes the number of failures that have
> occurred in the last hour (for a device) but it does not indicate
> how many failures will be tolerated before the device is permanently
> disabled.
> 
> Include the limit (eeh_max_freezes) to make this less surprising when
> it happens.
> 
> Also remove the embedded newline from the existing message to make it
> easier to grep for.
> 
> Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>

Good idea.

Reviewed-by: Russell Currey <ruscur@russell.cc>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/13] powerpc/eeh: Add final message for successful recovery
  2018-05-04  2:55   ` Michael Ellerman
@ 2018-05-04  6:08     ` Russell Currey
  2018-05-07  5:35       ` Sam Bobroff
  2018-05-07  5:29     ` Sam Bobroff
  1 sibling, 1 reply; 33+ messages in thread
From: Russell Currey @ 2018-05-04  6:08 UTC (permalink / raw)
  To: Michael Ellerman, Sam Bobroff, linuxppc-dev

On Fri, 2018-05-04 at 12:55 +1000, Michael Ellerman wrote:
> Sam Bobroff <sbobroff@linux.ibm.com> writes:
> 
> > Add a single log line at the end of successful EEH recovery, so
> > that
> > it's clear that event processing has finished.
> > 
> > Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
> > ---
> >  arch/powerpc/kernel/eeh_driver.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/arch/powerpc/kernel/eeh_driver.c
> > b/arch/powerpc/kernel/eeh_driver.c
> > index 56a60b9eb397..07e0a42035ce 100644
> > --- a/arch/powerpc/kernel/eeh_driver.c
> > +++ b/arch/powerpc/kernel/eeh_driver.c
> > @@ -910,6 +910,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
> >  	pr_info("EEH: Notify device driver to resume\n");
> >  	eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
> >  
> > +	pr_info("EEH: Recovery successful.\n");
> Is it possible for recovery for multiple devices to be interleaved?
> 
> Should that message include the device?

Pretty sure EEH will only process a single error at a time so this
*should* always let you infer from context, but PHB and PE should
probably be included anyway.  It'd be cool to move pe_{err/warn/info}()
out of powernv for messages like this.

- Russell

> 
> cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 04/13] powerpc/eeh: Remove unused eeh_pcid_name()
  2018-05-02  6:35 ` [PATCH 04/13] powerpc/eeh: Remove unused eeh_pcid_name() Sam Bobroff
@ 2018-05-04  6:29   ` Russell Currey
  0 siblings, 0 replies; 33+ messages in thread
From: Russell Currey @ 2018-05-04  6:29 UTC (permalink / raw)
  To: Sam Bobroff, linuxppc-dev

On Wed, 2018-05-02 at 16:35 +1000, Sam Bobroff wrote:
> Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>

Wow, this has been around since 2006.

Reviewed-by: Russell Currey <ruscur@russell.cc>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 05/13] powerpc/eeh: Strengthen types of eeh traversal functions
  2018-05-02  6:35 ` [PATCH 05/13] powerpc/eeh: Strengthen types of eeh traversal functions Sam Bobroff
@ 2018-05-04  6:32   ` Russell Currey
  0 siblings, 0 replies; 33+ messages in thread
From: Russell Currey @ 2018-05-04  6:32 UTC (permalink / raw)
  To: Sam Bobroff, linuxppc-dev

On Wed, 2018-05-02 at 16:35 +1000, Sam Bobroff wrote:
> The traversal functions eeh_pe_traverse() and eeh_pe_dev_traverse()
> both provide their first argument as void * but every single user
> casts
> it to the expected type.
> 
> Change the type of the first parameter from void * to the appropriate
> type, and clean up all uses.
> 
> Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>

Reviewed-by: Russell Currey <ruscur@russell.cc>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 06/13] powerpc/eeh: Add message when PE processing at parent
  2018-05-02  6:35 ` [PATCH 06/13] powerpc/eeh: Add message when PE processing at parent Sam Bobroff
@ 2018-05-04  6:51   ` Russell Currey
  0 siblings, 0 replies; 33+ messages in thread
From: Russell Currey @ 2018-05-04  6:51 UTC (permalink / raw)
  To: Sam Bobroff, linuxppc-dev

On Wed, 2018-05-02 at 16:35 +1000, Sam Bobroff wrote:
> To aid debugging, add a message to show when EEH processing for a PE
> will be done at the device's parent, rather than directly at the
> device.
> 
> Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>

Good idea!

Reviewed-by: Russell Currey <ruscur@russell.cc>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 07/13] powerpc/eeh: Clean up pci_ers_result handling
  2018-05-02  6:36 ` [PATCH 07/13] powerpc/eeh: Clean up pci_ers_result handling Sam Bobroff
  2018-05-04  2:58   ` Michael Ellerman
@ 2018-05-04  6:58   ` Russell Currey
  2018-05-08  1:09     ` Sam Bobroff
  1 sibling, 1 reply; 33+ messages in thread
From: Russell Currey @ 2018-05-04  6:58 UTC (permalink / raw)
  To: Sam Bobroff, linuxppc-dev

On Wed, 2018-05-02 at 16:36 +1000, Sam Bobroff wrote:
> As EEH event handling progresses, a cumulative result of type
> pci_ers_result is built up by (some of) the eeh_report_*() functions
> using either:
> 	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> 	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
> or:
> 	if ((*res == PCI_ERS_RESULT_NONE) ||
> 	    (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
> 	if (*res == PCI_ERS_RESULT_DISCONNECT &&
> 	    rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> (Where *res is the accumulator.)
> 
> However, the intent is not immediately clear and the result in some
> situations is order dependent.
> 
> Address this by assigning a priority to each result value, and always
> merging to the highest priority. This renders the intent clear, and
> provides a stable value for all orderings.
> 
> Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
> ---
>  arch/powerpc/kernel/eeh_driver.c | 36 ++++++++++++++++++++++++++--
> --------
>  1 file changed, 26 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/eeh_driver.c
> b/arch/powerpc/kernel/eeh_driver.c
> index 188d15c4fe3a..f33dd68a9ca2 100644
> --- a/arch/powerpc/kernel/eeh_driver.c
> +++ b/arch/powerpc/kernel/eeh_driver.c
> @@ -39,6 +39,29 @@ struct eeh_rmv_data {
>  	int removed;
>  };
>  
> +static int eeh_result_priority(enum pci_ers_result result)
> +{
> +	switch (result) {
> +	case PCI_ERS_RESULT_NONE: return 0;
> +	case PCI_ERS_RESULT_NO_AER_DRIVER: return 1;
> +	case PCI_ERS_RESULT_RECOVERED: return 2;
> +	case PCI_ERS_RESULT_CAN_RECOVER: return 3;
> +	case PCI_ERS_RESULT_DISCONNECT: return 4;
> +	case PCI_ERS_RESULT_NEED_RESET: return 5;
> +	default:
> +		WARN_ONCE(1, "Unknown pci_ers_result value");

Missing newline and also would be good to print the value was

> +		return 0;
> +	}
> +};
> +
> +static enum pci_ers_result merge_result(enum pci_ers_result old,
> +					enum pci_ers_result new)

merge_result() sounds like something really generic, maybe call it
pci_ers_merge_result() or something?

Note: just learned that it stands for Error Recovery System and that's
a thing (?)

> +{
> +	if (eeh_result_priority(new) > eeh_result_priority(old))
> +		return new;
> +	return old;

MAX would be nicer as per mpe's suggestion

> +}
> +
>  /**
>   * eeh_pcid_get - Get the PCI device driver
>   * @pdev: PCI device
> @@ -206,9 +229,7 @@ static void *eeh_report_error(struct eeh_dev
> *edev, void *userdata)
>  
>  	rc = driver->err_handler->error_detected(dev,
> pci_channel_io_frozen);
>  
> -	/* A driver that needs a reset trumps all others */
> -	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> -	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
> +	*res = merge_result(*res, rc);
>  
>  	edev->in_error = true;
>  	pci_uevent_ers(dev, PCI_ERS_RESULT_NONE);
> @@ -249,9 +270,7 @@ static void *eeh_report_mmio_enabled(struct
> eeh_dev *edev, void *userdata)
>  
>  	rc = driver->err_handler->mmio_enabled(dev);
>  
> -	/* A driver that needs a reset trumps all others */
> -	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> -	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
> +	*res = merge_result(*res, rc);
>  
>  out:
>  	eeh_pcid_put(dev);
> @@ -294,10 +313,7 @@ static void *eeh_report_reset(struct eeh_dev
> *edev, void *userdata)
>  		goto out;
>  
>  	rc = driver->err_handler->slot_reset(dev);
> -	if ((*res == PCI_ERS_RESULT_NONE) ||
> -	    (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
> -	if (*res == PCI_ERS_RESULT_DISCONNECT &&
> -	     rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> +	*res = merge_result(*res, rc);
>  
>  out:
>  	eeh_pcid_put(dev);

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 08/13] powerpc/eeh: Introduce eeh_for_each_pe()
  2018-05-02  6:36 ` [PATCH 08/13] powerpc/eeh: Introduce eeh_for_each_pe() Sam Bobroff
@ 2018-05-04  6:59   ` Russell Currey
  0 siblings, 0 replies; 33+ messages in thread
From: Russell Currey @ 2018-05-04  6:59 UTC (permalink / raw)
  To: Sam Bobroff, linuxppc-dev

On Wed, 2018-05-02 at 16:36 +1000, Sam Bobroff wrote:
> Add a for_each-style macro for iterating through PEs without the
> boilerplate required by a traversal function. eeh_pe_next() is now
> exported, as it is now used directly in place.
> 
> Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>

Reviewed-by: Russell Currey <ruscur@russell.cc>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 13/13] powerpc/eeh: Refactor report functions
  2018-05-03 13:27   ` Michael Ellerman
@ 2018-05-07  5:23     ` Sam Bobroff
  2018-05-09 14:51       ` Michael Ellerman
  0 siblings, 1 reply; 33+ messages in thread
From: Sam Bobroff @ 2018-05-07  5:23 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 3621 bytes --]

On Thu, May 03, 2018 at 11:27:12PM +1000, Michael Ellerman wrote:
> Sam Bobroff <sbobroff@linux.ibm.com> writes:
> > diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
> > index eb4feee81ff4..1c4336dcf9f5 100644
> > --- a/arch/powerpc/kernel/eeh_driver.c
> > +++ b/arch/powerpc/kernel/eeh_driver.c
> > @@ -54,6 +54,25 @@ static int eeh_result_priority(enum pci_ers_result result)
> >  	}
> >  };
> >  
> > +const char *pci_ers_result_name(enum pci_ers_result r)
> > +{
> > +	switch (r) {
> > +	case PCI_ERS_RESULT_NONE: return "none";
> > +	case PCI_ERS_RESULT_CAN_RECOVER: return "can recover";
> > +	case PCI_ERS_RESULT_NEED_RESET: return "need reset";
> > +	case PCI_ERS_RESULT_DISCONNECT: return "disconnect";
> > +	case PCI_ERS_RESULT_RECOVERED: return "recovered";
> > +	case PCI_ERS_RESULT_NO_AER_DRIVER: return "no AER driver";
> > +	default:
> > +		WARN_ONCE(1, "Unknown result type");
> > +		return "unknown";
> > +	}
> > +};
> > +
> > +#define eeh_infoline(EDEV, FMT, ...) \
> > +pr_info("EEH: PE#%x (PCI %s): " pr_fmt(FMT) "\n", EDEV->pe_config_addr, \
> > +((EDEV->pdev) ? dev_name(&EDEV->pdev->dev) : "NONE"), ##__VA_ARGS__)
> 
> Ooof, I guess.
> 
> It would be nicer as a function.

OK (I'd prefer to avoid macros too), but I'm not sure what kind of
function you mean. I initially tried to use a function, but the existing
pr_*() type macros seemed to be macros all the way down to printk, so there
didn't seem to be anywhere to hook into, and I ended up having to
open-code the varargs and allocate a buffer to print into.  Then it can
overflow etc. and it ended up seeming worse. Is there a better way I'm
missing here?

> "infoline" annoys me for some reason, "eeh_info" ?

OK. How about eeh_edev_info(), to indicate that it acts on an 'edev'?

> > @@ -223,123 +242,118 @@ static void eeh_set_irq_state(struct eeh_pe *root, bool enable)
> >  				}
> >  }
> >  
> > +static void eeh_pe_report(const char *name, struct eeh_pe *root,
> > +			  enum pci_ers_result (*fn)(struct eeh_dev *,
> > +						    struct pci_driver *),
> > +			  enum pci_ers_result *result)
> > +{
> > +	struct eeh_pe *pe;
> > +	struct eeh_dev *edev, *tmp;
> > +	enum pci_ers_result new_result;
> > +
> > +	pr_info("EEH: Beginning: '%s'\n", name);
> > +	eeh_for_each_pe(root, pe) {
> > +		eeh_pe_for_each_dev(pe, edev, tmp) {
> 
> This should be in a separate function.

Oh, that's better. Will do!

> > +			device_lock(&edev->pdev->dev);
> > +			if (eeh_edev_actionable(edev)) {
> > +				struct pci_driver *driver;
> > +
> > +				driver = eeh_pcid_get(edev->pdev);
> > +
> > +				if (!driver)
> > +					eeh_infoline(edev, "no driver");
> > +				else if (!driver->err_handler)
> > +					eeh_infoline(edev,
> > +						     "driver not EEH aware");
> > +				else if (edev->mode & EEH_DEV_NO_HANDLER)
> > +					eeh_infoline(edev,
> > +						     "driver bound too late");
> > +				else {
> > +					new_result = fn(edev, driver);
> > +					eeh_infoline(edev,
> > +						"%s driver reports: '%s'",
> > +						driver->name,
> > +						pci_ers_result_name(new_result));
> > +					if (result)
> > +						*result = merge_result(*result,
> > +								       new_result);
> > +				}
> > +				if (driver)
> > +					eeh_pcid_put(edev->pdev);
> > +			} else {
> > +				eeh_infoline(edev, "not actionable (%d,%d,%d)",
> > +					     !!edev->pdev,
> > +					     !eeh_dev_removed(edev),
> > +					     !eeh_pe_passed(edev->pe));
> > +			}
> > +			device_unlock(&edev->pdev->dev);
> 
> ^^^
> 
> 
> cheers

Thanks!

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/13] powerpc/eeh: Add final message for successful recovery
  2018-05-04  2:55   ` Michael Ellerman
  2018-05-04  6:08     ` Russell Currey
@ 2018-05-07  5:29     ` Sam Bobroff
  1 sibling, 0 replies; 33+ messages in thread
From: Sam Bobroff @ 2018-05-07  5:29 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 1154 bytes --]

On Fri, May 04, 2018 at 12:55:42PM +1000, Michael Ellerman wrote:
> Sam Bobroff <sbobroff@linux.ibm.com> writes:
> 
> > Add a single log line at the end of successful EEH recovery, so that
> > it's clear that event processing has finished.
> >
> > Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
> > ---
> >  arch/powerpc/kernel/eeh_driver.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
> > index 56a60b9eb397..07e0a42035ce 100644
> > --- a/arch/powerpc/kernel/eeh_driver.c
> > +++ b/arch/powerpc/kernel/eeh_driver.c
> > @@ -910,6 +910,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
> >  	pr_info("EEH: Notify device driver to resume\n");
> >  	eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
> >  
> > +	pr_info("EEH: Recovery successful.\n");
> 
> Is it possible for recovery for multiple devices to be interleaved?
> 
> Should that message include the device?

No, all the calls are serialized. They're only called from
eeh_event_handler() which is in the main loop of a kernel thread (see
eeh_event_init()).

> cheers
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/13] powerpc/eeh: Add final message for successful recovery
  2018-05-04  6:08     ` Russell Currey
@ 2018-05-07  5:35       ` Sam Bobroff
  0 siblings, 0 replies; 33+ messages in thread
From: Sam Bobroff @ 2018-05-07  5:35 UTC (permalink / raw)
  To: Russell Currey; +Cc: Michael Ellerman, linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 1891 bytes --]

On Fri, May 04, 2018 at 04:08:28PM +1000, Russell Currey wrote:
> On Fri, 2018-05-04 at 12:55 +1000, Michael Ellerman wrote:
> > Sam Bobroff <sbobroff@linux.ibm.com> writes:
> > 
> > > Add a single log line at the end of successful EEH recovery, so
> > > that
> > > it's clear that event processing has finished.
> > > 
> > > Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
> > > ---
> > >  arch/powerpc/kernel/eeh_driver.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/arch/powerpc/kernel/eeh_driver.c
> > > b/arch/powerpc/kernel/eeh_driver.c
> > > index 56a60b9eb397..07e0a42035ce 100644
> > > --- a/arch/powerpc/kernel/eeh_driver.c
> > > +++ b/arch/powerpc/kernel/eeh_driver.c
> > > @@ -910,6 +910,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
> > >  	pr_info("EEH: Notify device driver to resume\n");
> > >  	eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
> > >  
> > > +	pr_info("EEH: Recovery successful.\n");
> > Is it possible for recovery for multiple devices to be interleaved?
> > 
> > Should that message include the device?
> 
> Pretty sure EEH will only process a single error at a time so this
> *should* always let you infer from context, but PHB and PE should
> probably be included anyway.  It'd be cool to move pe_{err/warn/info}()
> out of powernv for messages like this.

While we only process a single "event" at a time, that event covers the
initial PE, or possibly it's parent or PHB, and any dependent PEs so I
thought it could be misleading to show only a single one for that
message -- I wanted something to indicate that the whole "event" had
finished (since that was otherwise not really clear to me). Does that
seem reasonable?

I'm sorry but I don't understand what you mean by moving some functions
out of powernv, could you explain more?

> - Russell
> 
> > 
> > cheers
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/13] powerpc/eeh: Fix use-after-release of EEH driver
  2018-05-04  2:56   ` Michael Ellerman
@ 2018-05-07  5:38     ` Sam Bobroff
  0 siblings, 0 replies; 33+ messages in thread
From: Sam Bobroff @ 2018-05-07  5:38 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 845 bytes --]

On Fri, May 04, 2018 at 12:56:55PM +1000, Michael Ellerman wrote:
> Sam Bobroff <sbobroff@linux.ibm.com> writes:
> 
> > Correct two cases where eeh_pcid_get() is used to reference the driver's
> > module but the reference is dropped before the driver pointer is used.
> >
> > In eeh_rmv_device() also refactor a little so that only two calls to
> > eeh_pcid_put() are needed, rather than three and the reference isn't
> > taken at all if it wasn't needed.
> 
> This sounds like a crash or memory corruption bug?
> 
> But it doesn't have Fixes or Cc: stable. Is it not a major problem for
> some reason?

Only that I've exercised that code path a fair bit during testing and
never managed to cause a problem with it. I found it by inspection.

Do you think I should mark it fixes or stable in the next version?

> cheers
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 07/13] powerpc/eeh: Clean up pci_ers_result handling
  2018-05-04  6:58   ` Russell Currey
@ 2018-05-08  1:09     ` Sam Bobroff
  0 siblings, 0 replies; 33+ messages in thread
From: Sam Bobroff @ 2018-05-08  1:09 UTC (permalink / raw)
  To: Russell Currey; +Cc: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 4248 bytes --]

On Fri, May 04, 2018 at 04:58:01PM +1000, Russell Currey wrote:
> On Wed, 2018-05-02 at 16:36 +1000, Sam Bobroff wrote:
> > As EEH event handling progresses, a cumulative result of type
> > pci_ers_result is built up by (some of) the eeh_report_*() functions
> > using either:
> > 	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> > 	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
> > or:
> > 	if ((*res == PCI_ERS_RESULT_NONE) ||
> > 	    (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
> > 	if (*res == PCI_ERS_RESULT_DISCONNECT &&
> > 	    rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> > (Where *res is the accumulator.)
> > 
> > However, the intent is not immediately clear and the result in some
> > situations is order dependent.
> > 
> > Address this by assigning a priority to each result value, and always
> > merging to the highest priority. This renders the intent clear, and
> > provides a stable value for all orderings.
> > 
> > Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
> > ---
> >  arch/powerpc/kernel/eeh_driver.c | 36 ++++++++++++++++++++++++++--
> > --------
> >  1 file changed, 26 insertions(+), 10 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/eeh_driver.c
> > b/arch/powerpc/kernel/eeh_driver.c
> > index 188d15c4fe3a..f33dd68a9ca2 100644
> > --- a/arch/powerpc/kernel/eeh_driver.c
> > +++ b/arch/powerpc/kernel/eeh_driver.c
> > @@ -39,6 +39,29 @@ struct eeh_rmv_data {
> >  	int removed;
> >  };
> >  
> > +static int eeh_result_priority(enum pci_ers_result result)
> > +{
> > +	switch (result) {
> > +	case PCI_ERS_RESULT_NONE: return 0;
> > +	case PCI_ERS_RESULT_NO_AER_DRIVER: return 1;
> > +	case PCI_ERS_RESULT_RECOVERED: return 2;
> > +	case PCI_ERS_RESULT_CAN_RECOVER: return 3;
> > +	case PCI_ERS_RESULT_DISCONNECT: return 4;
> > +	case PCI_ERS_RESULT_NEED_RESET: return 5;
> > +	default:
> > +		WARN_ONCE(1, "Unknown pci_ers_result value");
> 
> Missing newline and also would be good to print the value was

Sounds good, will do! I'm not sure if it's mentioned elsewhere but I'll
fix the same issues in pci_ers_result_name() as well.

> > +		return 0;
> > +	}
> > +};
> > +
> > +static enum pci_ers_result merge_result(enum pci_ers_result old,
> > +					enum pci_ers_result new)
> 
> merge_result() sounds like something really generic, maybe call it
> pci_ers_merge_result() or something?

Sounds good, will do.

> Note: just learned that it stands for Error Recovery System and that's
> a thing (?)
> 
> > +{
> > +	if (eeh_result_priority(new) > eeh_result_priority(old))
> > +		return new;
> > +	return old;
> 
> MAX would be nicer as per mpe's suggestion

Sorry, I don't understand. The return value isn't MAX(new, old) so
how can MAX() do this?

> > +}
> > +
> >  /**
> >   * eeh_pcid_get - Get the PCI device driver
> >   * @pdev: PCI device
> > @@ -206,9 +229,7 @@ static void *eeh_report_error(struct eeh_dev
> > *edev, void *userdata)
> >  
> >  	rc = driver->err_handler->error_detected(dev,
> > pci_channel_io_frozen);
> >  
> > -	/* A driver that needs a reset trumps all others */
> > -	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> > -	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
> > +	*res = merge_result(*res, rc);
> >  
> >  	edev->in_error = true;
> >  	pci_uevent_ers(dev, PCI_ERS_RESULT_NONE);
> > @@ -249,9 +270,7 @@ static void *eeh_report_mmio_enabled(struct
> > eeh_dev *edev, void *userdata)
> >  
> >  	rc = driver->err_handler->mmio_enabled(dev);
> >  
> > -	/* A driver that needs a reset trumps all others */
> > -	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> > -	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
> > +	*res = merge_result(*res, rc);
> >  
> >  out:
> >  	eeh_pcid_put(dev);
> > @@ -294,10 +313,7 @@ static void *eeh_report_reset(struct eeh_dev
> > *edev, void *userdata)
> >  		goto out;
> >  
> >  	rc = driver->err_handler->slot_reset(dev);
> > -	if ((*res == PCI_ERS_RESULT_NONE) ||
> > -	    (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
> > -	if (*res == PCI_ERS_RESULT_DISCONNECT &&
> > -	     rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> > +	*res = merge_result(*res, rc);
> >  
> >  out:
> >  	eeh_pcid_put(dev);
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 11/13] powerpc/eeh: Introduce eeh_set_irq_state()
  2018-05-04  3:02   ` Michael Ellerman
@ 2018-05-08  1:12     ` Sam Bobroff
  0 siblings, 0 replies; 33+ messages in thread
From: Sam Bobroff @ 2018-05-08  1:12 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 1368 bytes --]

On Fri, May 04, 2018 at 01:02:32PM +1000, Michael Ellerman wrote:
> Sam Bobroff <sbobroff@linux.ibm.com> writes:
> 
> > diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
> > index f63a01d336ee..b3edd0df04b8 100644
> > --- a/arch/powerpc/kernel/eeh_driver.c
> > +++ b/arch/powerpc/kernel/eeh_driver.c
> > @@ -210,6 +206,23 @@ static void eeh_set_channel_state(struct eeh_pe *root, enum pci_channel_state s)
> >  				edev->pdev->error_state = s;
> >  }
> >  
> > +static void eeh_set_irq_state(struct eeh_pe *root, bool enable)
> > +{
> > +	struct eeh_pe *pe;
> > +	struct eeh_dev *edev, *tmp;
> > +
> > +	eeh_for_each_pe(root, pe)
> > +		eeh_pe_for_each_dev(pe, edev, tmp)
> > +			if (eeh_edev_actionable(edev))
> > +				if (eeh_pcid_get(edev->pdev)) {
> > +					if (enable)
> > +						eeh_enable_irq(edev);
> > +					else
> > +						eeh_disable_irq(edev);
> > +					eeh_pcid_put(edev->pdev);
> > +				}
> 
> Yikes.
> 
> What about?
> 
> 	eeh_for_each_pe(root, pe) {
> 		eeh_pe_for_each_dev(pe, edev, tmp) {
> 			if (!eeh_edev_actionable(edev))
> 				continue;
> 
> 			if (!eeh_pcid_get(edev->pdev))
> 				continue;
> 
> 			if (enable)
> 				eeh_enable_irq(edev);
> 			else
> 				eeh_disable_irq(edev);
> 
> 			eeh_pcid_put(edev->pdev);
> 		}
> 	}
> 
> cheers

Sure, will do.

Cheers,

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 13/13] powerpc/eeh: Refactor report functions
  2018-05-07  5:23     ` Sam Bobroff
@ 2018-05-09 14:51       ` Michael Ellerman
  0 siblings, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2018-05-09 14:51 UTC (permalink / raw)
  To: Sam Bobroff; +Cc: linuxppc-dev

Sam Bobroff <sbobroff@linux.ibm.com> writes:
> On Thu, May 03, 2018 at 11:27:12PM +1000, Michael Ellerman wrote:
>> Sam Bobroff <sbobroff@linux.ibm.com> writes:
>> > diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
>> > index eb4feee81ff4..1c4336dcf9f5 100644
>> > --- a/arch/powerpc/kernel/eeh_driver.c
>> > +++ b/arch/powerpc/kernel/eeh_driver.c
>> > @@ -54,6 +54,25 @@ static int eeh_result_priority(enum pci_ers_result result)
>> >  	}
>> >  };
>> >  
>> > +const char *pci_ers_result_name(enum pci_ers_result r)
>> > +{
>> > +	switch (r) {
>> > +	case PCI_ERS_RESULT_NONE: return "none";
>> > +	case PCI_ERS_RESULT_CAN_RECOVER: return "can recover";
>> > +	case PCI_ERS_RESULT_NEED_RESET: return "need reset";
>> > +	case PCI_ERS_RESULT_DISCONNECT: return "disconnect";
>> > +	case PCI_ERS_RESULT_RECOVERED: return "recovered";
>> > +	case PCI_ERS_RESULT_NO_AER_DRIVER: return "no AER driver";
>> > +	default:
>> > +		WARN_ONCE(1, "Unknown result type");
>> > +		return "unknown";
>> > +	}
>> > +};
>> > +
>> > +#define eeh_infoline(EDEV, FMT, ...) \
>> > +pr_info("EEH: PE#%x (PCI %s): " pr_fmt(FMT) "\n", EDEV->pe_config_addr, \
>> > +((EDEV->pdev) ? dev_name(&EDEV->pdev->dev) : "NONE"), ##__VA_ARGS__)
>> 
>> Ooof, I guess.
>> 
>> It would be nicer as a function.
>
> OK (I'd prefer to avoid macros too), but I'm not sure what kind of
> function you mean. I initially tried to use a function, but the existing
> pr_*() type macros seemed to be macros all the way down to printk, so there
> didn't seem to be anywhere to hook into, and I ended up having to
> open-code the varargs and allocate a buffer to print into.  Then it can
> overflow etc. and it ended up seeming worse. Is there a better way I'm
> missing here?

Something like (not tested):

void eeh_edev_info(const struct eeh_info *info, const char *fmt, ...)
{
	struct va_format vaf;
	va_list args;

	va_start(args, fmt);

	vaf.fmt = fmt;
	vaf.va = &args;

        printk(KERN_INFO "EEH: PE#%x (PCI %s): %pV\n",
        	info->pe_config_addr,
                info->pdev ? dev_name(&info->pdev->dev) : "none",
                &vaf);

	va_end(args);
}

>
>> "infoline" annoys me for some reason, "eeh_info" ?
>
> OK. How about eeh_edev_info(), to indicate that it acts on an 'edev'?

Sounds good.

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2018-05-09 14:51 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-02  6:34 [PATCH 00/13] EEH refactoring 2 Sam Bobroff
2018-05-02  6:34 ` [PATCH 01/13] powerpc/eeh: Add eeh_max_freezes to initial EEH log line Sam Bobroff
2018-05-04  5:46   ` Russell Currey
2018-05-02  6:35 ` [PATCH 02/13] powerpc/eeh: Add final message for successful recovery Sam Bobroff
2018-05-04  2:55   ` Michael Ellerman
2018-05-04  6:08     ` Russell Currey
2018-05-07  5:35       ` Sam Bobroff
2018-05-07  5:29     ` Sam Bobroff
2018-05-02  6:35 ` [PATCH 03/13] powerpc/eeh: Fix use-after-release of EEH driver Sam Bobroff
2018-05-04  2:56   ` Michael Ellerman
2018-05-07  5:38     ` Sam Bobroff
2018-05-02  6:35 ` [PATCH 04/13] powerpc/eeh: Remove unused eeh_pcid_name() Sam Bobroff
2018-05-04  6:29   ` Russell Currey
2018-05-02  6:35 ` [PATCH 05/13] powerpc/eeh: Strengthen types of eeh traversal functions Sam Bobroff
2018-05-04  6:32   ` Russell Currey
2018-05-02  6:35 ` [PATCH 06/13] powerpc/eeh: Add message when PE processing at parent Sam Bobroff
2018-05-04  6:51   ` Russell Currey
2018-05-02  6:36 ` [PATCH 07/13] powerpc/eeh: Clean up pci_ers_result handling Sam Bobroff
2018-05-04  2:58   ` Michael Ellerman
2018-05-04  6:58   ` Russell Currey
2018-05-08  1:09     ` Sam Bobroff
2018-05-02  6:36 ` [PATCH 08/13] powerpc/eeh: Introduce eeh_for_each_pe() Sam Bobroff
2018-05-04  6:59   ` Russell Currey
2018-05-02  6:36 ` [PATCH 09/13] powerpc/eeh: Introduce eeh_edev_actionable() Sam Bobroff
2018-05-02  6:36 ` [PATCH 10/13] powerpc/eeh: Introduce eeh_set_channel_state() Sam Bobroff
2018-05-02  6:36 ` [PATCH 11/13] powerpc/eeh: Introduce eeh_set_irq_state() Sam Bobroff
2018-05-04  3:02   ` Michael Ellerman
2018-05-08  1:12     ` Sam Bobroff
2018-05-02  6:36 ` [PATCH 12/13] powerpc/eeh: Cleaner handling of EEH_DEV_NO_HANDLER Sam Bobroff
2018-05-02  6:36 ` [PATCH 13/13] powerpc/eeh: Refactor report functions Sam Bobroff
2018-05-03 13:27   ` Michael Ellerman
2018-05-07  5:23     ` Sam Bobroff
2018-05-09 14:51       ` Michael Ellerman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.