All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/8] EEH Improvement and Cleanup
@ 2015-10-08  3:58 Gavin Shan
  2015-10-08  3:58 ` [PATCH v2 1/8] powerpc/eeh: Don't unfreeze PHB PE after reset Gavin Shan
                   ` (7 more replies)
  0 siblings, 8 replies; 22+ messages in thread
From: Gavin Shan @ 2015-10-08  3:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mpe, Gavin Shan

The PATCH[1] of this series skips unfreezing PHB PE after reset during
recovery because that's unnecessary. PATCH[2] makes the hotplug criterion
more realistic by validating all required error handlers. PATCH[3] fixes
the issue that reset isn't issued when the error handler (error_detected())
of device driver doesn't expect reset during recovery, leading to the error
can't be recovered. PATCH[4] fixes the issue of enabling IO path that has
been enabled on pSeries platform. PATCH[5-8] cleans up eeh_pseries.c and
eeh-powernv.c without introducing logical changes.

Gavin Shan (8):
  powerpc/eeh: Don't unfreeze PHB PE after reset
  powerpc/eeh: More relexed hotplug criterion
  powerpc/eeh: Force reset on fenced PHB
  powerpc/eeh: More relxed condition for enabled IO path
  powerpc/pseries: Cleanup on pseries_eeh_get_state()
  powerpc/powernv: Cleanup on EEH comments
  powerpc/powernv: Remove pnv_eeh_cap_start()
  powerpc/powernv: Simplify pnv_eeh_set_option()

 arch/powerpc/kernel/eeh.c                    |  2 +-
 arch/powerpc/kernel/eeh_driver.c             | 27 +++++++--
 arch/powerpc/platforms/powernv/eeh-powernv.c | 83 +++++++++++-----------------
 arch/powerpc/platforms/pseries/eeh_pseries.c | 60 +++++++++-----------
 4 files changed, 81 insertions(+), 91 deletions(-)

-- 
2.1.0

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 1/8] powerpc/eeh: Don't unfreeze PHB PE after reset
  2015-10-08  3:58 [PATCH v2 0/8] EEH Improvement and Cleanup Gavin Shan
@ 2015-10-08  3:58 ` Gavin Shan
  2015-10-21 11:41   ` [v2,1/8] " Michael Ellerman
  2015-10-08  3:58 ` [PATCH v2 2/8] powerpc/eeh: More relexed hotplug criterion Gavin Shan
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Gavin Shan @ 2015-10-08  3:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mpe, Gavin Shan

On PowerNV platform, the PE is kept in frozen state until the PE
reset is completed to avoid recursive EEH error caused by MMIO
access during the period of EEH reset. The PE's frozen state is
cleared after BARs of PCI device included in the PE are restored
and enabled. However, we needn't clear the frozen state for PHB PE
explicitly at this point as there is no real PE for PHB PE. As the
PHB PE is always binding with PE#0, we actually clear PE#0, which
is wrong. It doesn't incur any problem though.

This checks if the PE is PHB PE and doesn't clear the frozen state
if it is.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 89eb4bc..3a626ed 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -587,10 +587,16 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 	eeh_ops->configure_bridge(pe);
 	eeh_pe_restore_bars(pe);
 
-	/* Clear frozen state */
-	rc = eeh_clear_pe_frozen_state(pe, false);
-	if (rc)
-		return rc;
+	/*
+	 * If it's PHB PE, the frozen state on all available PEs should have
+	 * been cleared by the PHB reset. Otherwise, we unfreeze the PE and its
+	 * child PEs because they might be in frozen state.
+	 */
+	if (!(pe->type & EEH_PE_PHB)) {
+		rc = eeh_clear_pe_frozen_state(pe, false);
+		if (rc)
+			return rc;
+	}
 
 	/* Give the system 5 seconds to finish running the user-space
 	 * hotplug shutdown scripts, e.g. ifdown for ethernet.  Yes,
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 2/8] powerpc/eeh: More relexed hotplug criterion
  2015-10-08  3:58 [PATCH v2 0/8] EEH Improvement and Cleanup Gavin Shan
  2015-10-08  3:58 ` [PATCH v2 1/8] powerpc/eeh: Don't unfreeze PHB PE after reset Gavin Shan
@ 2015-10-08  3:58 ` Gavin Shan
  2015-10-12 22:55   ` Daniel Axtens
  2015-10-08  3:58 ` [PATCH v2 3/8] powerpc/eeh: Force reset on fenced PHB Gavin Shan
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Gavin Shan @ 2015-10-08  3:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mpe, Gavin Shan

Currently, we rely on the existence of struct pci_driver::err_handler
to judge if the corresponding PCI device should be unplugged during
EEH recovery (partially hotplug case). However, it's not elaborate.
some device drivers are implementing part of the EEH error handlers
to collect diag-data. That means the driver still expects a hotplug
to recover from the EEH error.

This makes the hotplug criterion more relaxed: if the device driver
doesn't provide all necessary EEH error handlers, it will experience
hotplug during EEH recovery.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 3a626ed..32178a4 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -416,7 +416,10 @@ static void *eeh_rmv_device(void *data, void *userdata)
 	driver = eeh_pcid_get(dev);
 	if (driver) {
 		eeh_pcid_put(dev);
-		if (driver->err_handler)
+		if (driver->err_handler &&
+		    driver->err_handler->error_detected &&
+		    driver->err_handler->slot_reset &&
+		    driver->err_handler->resume)
 			return NULL;
 	}
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 3/8] powerpc/eeh: Force reset on fenced PHB
  2015-10-08  3:58 [PATCH v2 0/8] EEH Improvement and Cleanup Gavin Shan
  2015-10-08  3:58 ` [PATCH v2 1/8] powerpc/eeh: Don't unfreeze PHB PE after reset Gavin Shan
  2015-10-08  3:58 ` [PATCH v2 2/8] powerpc/eeh: More relexed hotplug criterion Gavin Shan
@ 2015-10-08  3:58 ` Gavin Shan
  2015-10-13  1:43   ` Daniel Axtens
  2015-10-08  3:58 ` [PATCH v2 4/8] powerpc/eeh: More relxed condition for enabled IO path Gavin Shan
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Gavin Shan @ 2015-10-08  3:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mpe, Gavin Shan

On fenced PHB, the error handlers in the drivers of its subordinate
devices could return PCI_ERS_RESULT_CAN_RECOVER, indicating no reset
will be issued during the recovery. It's conflicting with the fact
that fenced PHB won't be recovered without reset.

This limits the return value from the error handlers in the drivers
of the fenced PHB's subordinate devices to PCI_ERS_RESULT_NEED_NONE
or PCI_ERS_RESULT_NEED_RESET, to ensure reset will be issued during
recovery.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 32178a4..80dfe89 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -664,9 +664,17 @@ static void eeh_handle_normal_event(struct eeh_pe *pe)
 	 * to accomplish the reset.  Each child gets a report of the
 	 * status ... if any child can't handle the reset, then the entire
 	 * slot is dlpar removed and added.
+	 *
+	 * When the PHB is fenced, we have to issue a reset to recover from
+	 * the error. Override the result if necessary to have partially
+	 * hotplug for this case.
 	 */
 	pr_info("EEH: Notify device drivers to shutdown\n");
 	eeh_pe_dev_traverse(pe, eeh_report_error, &result);
+	if ((pe->type & EEH_PE_PHB) &&
+	    result != PCI_ERS_RESULT_NONE &&
+	    result != PCI_ERS_RESULT_NEED_RESET)
+		result = PCI_ERS_RESULT_NEED_RESET;
 
 	/* Get the current PCI slot state. This can take a long time,
 	 * sometimes over 300 seconds for certain systems.
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 4/8] powerpc/eeh: More relxed condition for enabled IO path
  2015-10-08  3:58 [PATCH v2 0/8] EEH Improvement and Cleanup Gavin Shan
                   ` (2 preceding siblings ...)
  2015-10-08  3:58 ` [PATCH v2 3/8] powerpc/eeh: Force reset on fenced PHB Gavin Shan
@ 2015-10-08  3:58 ` Gavin Shan
  2015-10-08  3:58 ` [PATCH v2 5/8] powerpc/pseries: Cleanup on pseries_eeh_get_state() Gavin Shan
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Gavin Shan @ 2015-10-08  3:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mpe, Gavin Shan

When one of below two flags or both of them are marked in the PE
state, the PE's IO path is regarded as enabled: EEH_STATE_MMIO_ACTIVE
or EEH_STATE_MMIO_ENABLED.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index e968533..ddbf406 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -630,7 +630,7 @@ int eeh_pci_enable(struct eeh_pe *pe, int function)
 	 */
 	switch (function) {
 	case EEH_OPT_THAW_MMIO:
-		active_flag = EEH_STATE_MMIO_ACTIVE;
+		active_flag = EEH_STATE_MMIO_ACTIVE | EEH_STATE_MMIO_ENABLED;
 		break;
 	case EEH_OPT_THAW_DMA:
 		active_flag = EEH_STATE_DMA_ACTIVE;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 5/8] powerpc/pseries: Cleanup on pseries_eeh_get_state()
  2015-10-08  3:58 [PATCH v2 0/8] EEH Improvement and Cleanup Gavin Shan
                   ` (3 preceding siblings ...)
  2015-10-08  3:58 ` [PATCH v2 4/8] powerpc/eeh: More relxed condition for enabled IO path Gavin Shan
@ 2015-10-08  3:58 ` Gavin Shan
  2015-10-08  4:15   ` Andrew Donnellan
  2015-10-08  3:58 ` [PATCH v2 6/8] powerpc/powernv: Cleanup on EEH comments Gavin Shan
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Gavin Shan @ 2015-10-08  3:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mpe, Gavin Shan

This cleans up pseries_eeh_get_state(), no functional changes:

   * Return EEH_STATE_NOT_SUPPORT early when the 2nd RTAS output
     argument is zero to avoid nested if statements.
   * Skip clearing bits in the PE state represented by variable
     "result" to simplify the code.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/eeh_pseries.c | 60 ++++++++++++----------------
 1 file changed, 26 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 1ba55d0..ac3ffd9 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -433,42 +433,34 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int *state)
 		return ret;
 
 	/* Parse the result out */
-	result = 0;
-	if (rets[1]) {
-		switch(rets[0]) {
-		case 0:
-			result &= ~EEH_STATE_RESET_ACTIVE;
-			result |= EEH_STATE_MMIO_ACTIVE;
-			result |= EEH_STATE_DMA_ACTIVE;
-			break;
-		case 1:
-			result |= EEH_STATE_RESET_ACTIVE;
-			result |= EEH_STATE_MMIO_ACTIVE;
-			result |= EEH_STATE_DMA_ACTIVE;
-			break;
-		case 2:
-			result &= ~EEH_STATE_RESET_ACTIVE;
-			result &= ~EEH_STATE_MMIO_ACTIVE;
-			result &= ~EEH_STATE_DMA_ACTIVE;
-			break;
-		case 4:
-			result &= ~EEH_STATE_RESET_ACTIVE;
-			result &= ~EEH_STATE_MMIO_ACTIVE;
-			result &= ~EEH_STATE_DMA_ACTIVE;
-			result |= EEH_STATE_MMIO_ENABLED;
-			break;
-		case 5:
-			if (rets[2]) {
-				if (state) *state = rets[2];
-				result = EEH_STATE_UNAVAILABLE;
-			} else {
-				result = EEH_STATE_NOT_SUPPORT;
-			}
-			break;
-		default:
+	if (!rets[1])
+		return EEH_STATE_NOT_SUPPORT;
+
+	switch(rets[0]) {
+	case 0:
+		result = EEH_STATE_MMIO_ACTIVE |
+			 EEH_STATE_DMA_ACTIVE;
+		break;
+	case 1:
+		result = EEH_STATE_RESET_ACTIVE |
+			 EEH_STATE_MMIO_ACTIVE  |
+			 EEH_STATE_DMA_ACTIVE;
+		break;
+	case 2:
+		result = 0;
+		break;
+	case 4:
+		result = EEH_STATE_MMIO_ENABLED;
+		break;
+	case 5:
+		if (rets[2]) {
+			if (state) *state = rets[2];
+			result = EEH_STATE_UNAVAILABLE;
+		} else {
 			result = EEH_STATE_NOT_SUPPORT;
 		}
-	} else {
+		break;
+	default:
 		result = EEH_STATE_NOT_SUPPORT;
 	}
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 6/8] powerpc/powernv: Cleanup on EEH comments
  2015-10-08  3:58 [PATCH v2 0/8] EEH Improvement and Cleanup Gavin Shan
                   ` (4 preceding siblings ...)
  2015-10-08  3:58 ` [PATCH v2 5/8] powerpc/pseries: Cleanup on pseries_eeh_get_state() Gavin Shan
@ 2015-10-08  3:58 ` Gavin Shan
  2015-10-08  3:58 ` [PATCH v2 7/8] powerpc/powernv: Remove pnv_eeh_cap_start() Gavin Shan
  2015-10-08  3:58 ` [PATCH v2 8/8] powerpc/powernv: Simplify pnv_eeh_set_option() Gavin Shan
  7 siblings, 0 replies; 22+ messages in thread
From: Gavin Shan @ 2015-10-08  3:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mpe, Gavin Shan

This applies cleanup on eeh-powernv.c, no functional changes:

   * Remove unnecessary comments and empty line.
   * Correct inaccurate comments.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 3bb6acb..2032936 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -43,17 +43,11 @@
 static bool pnv_eeh_nb_init = false;
 static int eeh_event_irq = -EINVAL;
 
-/**
- * pnv_eeh_init - EEH platform dependent initialization
- *
- * EEH platform dependent initialization on powernv
- */
 static int pnv_eeh_init(void)
 {
 	struct pci_controller *hose;
 	struct pnv_phb *phb;
 
-	/* We require OPALv3 */
 	if (!firmware_has_feature(FW_FEATURE_OPALv3)) {
 		pr_warn("%s: OPALv3 is required !\n",
 			__func__);
@@ -77,9 +71,9 @@ static int pnv_eeh_init(void)
 		/*
 		 * PE#0 should be regarded as valid by EEH core
 		 * if it's not the reserved one. Currently, we
-		 * have the reserved PE#0 and PE#127 for PHB3
+		 * have the reserved PE#255 and PE#127 for PHB3
 		 * and P7IOC separately. So we should regard
-		 * PE#0 as valid for P7IOC.
+		 * PE#0 as valid for PHB3 and P7IOC.
 		 */
 		if (phb->ioda.reserved_pe != 0)
 			eeh_add_flag(EEH_VALID_PE_ZERO);
@@ -284,7 +278,6 @@ static int pnv_eeh_post_init(void)
 #endif /* CONFIG_DEBUG_FS */
 	}
 
-
 	return ret;
 }
 
@@ -490,7 +483,6 @@ static int pnv_eeh_set_option(struct eeh_pe *pe, int option)
 	int opt, ret = 0;
 	s64 rc;
 
-	/* Sanity check on option */
 	switch (option) {
 	case EEH_OPT_DISABLE:
 		return -EPERM;
@@ -1065,7 +1057,6 @@ static int pnv_eeh_err_inject(struct eeh_pe *pe, int type, int func,
 	struct pnv_phb *phb = hose->private_data;
 	s64 rc;
 
-	/* Sanity check on error type */
 	if (type != OPAL_ERR_INJECT_TYPE_IOA_BUS_ERR &&
 	    type != OPAL_ERR_INJECT_TYPE_IOA_BUS_ERR64) {
 		pr_warn("%s: Invalid error type %d\n",
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 7/8] powerpc/powernv: Remove pnv_eeh_cap_start()
  2015-10-08  3:58 [PATCH v2 0/8] EEH Improvement and Cleanup Gavin Shan
                   ` (5 preceding siblings ...)
  2015-10-08  3:58 ` [PATCH v2 6/8] powerpc/powernv: Cleanup on EEH comments Gavin Shan
@ 2015-10-08  3:58 ` Gavin Shan
  2015-10-08  4:18   ` Andrew Donnellan
  2015-10-08  3:58 ` [PATCH v2 8/8] powerpc/powernv: Simplify pnv_eeh_set_option() Gavin Shan
  7 siblings, 1 reply; 22+ messages in thread
From: Gavin Shan @ 2015-10-08  3:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mpe, Gavin Shan

This moves the logic of pnv_eeh_cap_start() to pnv_eeh_find_cap()
as the function is only called by pnv_eeh_find_cap(). The logic
of both functions are pretty simple. No need to have separate
functions.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 2032936..2b5c70b 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -281,29 +281,20 @@ static int pnv_eeh_post_init(void)
 	return ret;
 }
 
-static int pnv_eeh_cap_start(struct pci_dn *pdn)
+static int pnv_eeh_find_cap(struct pci_dn *pdn, int cap)
 {
-	u32 status;
+	int pos = PCI_CAPABILITY_LIST;
+	int cnt = 48;   /* Maximal number of capabilities */
+	u32 status, id;
 
 	if (!pdn)
 		return 0;
 
+	/* Check if the device supports capabilities */
 	pnv_pci_cfg_read(pdn, PCI_STATUS, 2, &status);
 	if (!(status & PCI_STATUS_CAP_LIST))
 		return 0;
 
-	return PCI_CAPABILITY_LIST;
-}
-
-static int pnv_eeh_find_cap(struct pci_dn *pdn, int cap)
-{
-	int pos = pnv_eeh_cap_start(pdn);
-	int cnt = 48;   /* Maximal number of capabilities */
-	u32 id;
-
-	if (!pos)
-		return 0;
-
 	while (cnt--) {
 		pnv_pci_cfg_read(pdn, pos, 1, &pos);
 		if (pos < 0x40)
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 8/8] powerpc/powernv: Simplify pnv_eeh_set_option()
  2015-10-08  3:58 [PATCH v2 0/8] EEH Improvement and Cleanup Gavin Shan
                   ` (6 preceding siblings ...)
  2015-10-08  3:58 ` [PATCH v2 7/8] powerpc/powernv: Remove pnv_eeh_cap_start() Gavin Shan
@ 2015-10-08  3:58 ` Gavin Shan
  2015-10-08  4:33   ` Andrew Donnellan
  7 siblings, 1 reply; 22+ messages in thread
From: Gavin Shan @ 2015-10-08  3:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mpe, Gavin Shan

This simplifies pnv_eeh_set_option() to avoid unnecessary nested
if statements, to improve readability. No functional changes.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 51 ++++++++++++++--------------
 1 file changed, 25 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 2b5c70b..d62007f 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -471,7 +471,7 @@ static int pnv_eeh_set_option(struct eeh_pe *pe, int option)
 	struct pci_controller *hose = pe->phb;
 	struct pnv_phb *phb = hose->private_data;
 	bool freeze_pe = false;
-	int opt, ret = 0;
+	int opt;
 	s64 rc;
 
 	switch (option) {
@@ -494,38 +494,37 @@ static int pnv_eeh_set_option(struct eeh_pe *pe, int option)
 		return -EINVAL;
 	}
 
-	/* If PHB supports compound PE, to handle it */
+	/* Freeze master and slave PEs if PHB supports compound PEs */
 	if (freeze_pe) {
 		if (phb->freeze_pe) {
 			phb->freeze_pe(phb, pe->addr);
-		} else {
-			rc = opal_pci_eeh_freeze_set(phb->opal_id,
-						     pe->addr, opt);
-			if (rc != OPAL_SUCCESS) {
-				pr_warn("%s: Failure %lld freezing "
-					"PHB#%x-PE#%x\n",
-					__func__, rc,
-					phb->hose->global_number, pe->addr);
-				ret = -EIO;
-			}
+			return 0;
 		}
-	} else {
-		if (phb->unfreeze_pe) {
-			ret = phb->unfreeze_pe(phb, pe->addr, opt);
-		} else {
-			rc = opal_pci_eeh_freeze_clear(phb->opal_id,
-						       pe->addr, opt);
-			if (rc != OPAL_SUCCESS) {
-				pr_warn("%s: Failure %lld enable %d "
-					"for PHB#%x-PE#%x\n",
-					__func__, rc, option,
-					phb->hose->global_number, pe->addr);
-				ret = -EIO;
-			}
+
+		rc = opal_pci_eeh_freeze_set(phb->opal_id, pe->addr, opt);
+		if (rc != OPAL_SUCCESS) {
+			pr_warn("%s: Failure %lld freezing PHB#%x-PE#%x\n",
+				__func__, rc, phb->hose->global_number,
+				pe->addr);
+			return -EIO;
 		}
+
+		return 0;
 	}
 
-	return ret;
+	/* Unfreeze master and slave PEs if PHB supports */
+	if (phb->unfreeze_pe)
+		return phb->unfreeze_pe(phb, pe->addr, opt);
+
+	rc = opal_pci_eeh_freeze_clear(phb->opal_id, pe->addr, opt);
+	if (rc != OPAL_SUCCESS) {
+		pr_warn("%s: Failure %lld enable %d for PHB#%x-PE#%x\n",
+			__func__, rc, option, phb->hose->global_number,
+			pe->addr);
+		return -EIO;
+	}
+
+	return 0;
 }
 
 /**
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 5/8] powerpc/pseries: Cleanup on pseries_eeh_get_state()
  2015-10-08  3:58 ` [PATCH v2 5/8] powerpc/pseries: Cleanup on pseries_eeh_get_state() Gavin Shan
@ 2015-10-08  4:15   ` Andrew Donnellan
  0 siblings, 0 replies; 22+ messages in thread
From: Andrew Donnellan @ 2015-10-08  4:15 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev

On 08/10/15 14:58, Gavin Shan wrote:
> This cleans up pseries_eeh_get_state(), no functional changes:
>
>     * Return EEH_STATE_NOT_SUPPORT early when the 2nd RTAS output
>       argument is zero to avoid nested if statements.
>     * Skip clearing bits in the PE state represented by variable
>       "result" to simplify the code.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>

Much easier to follow!

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

-- 
Andrew Donnellan              Software Engineer, OzLabs
andrew.donnellan@au1.ibm.com  Australia Development Lab, Canberra
+61 2 6201 8874 (work)        IBM Australia Limited

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 7/8] powerpc/powernv: Remove pnv_eeh_cap_start()
  2015-10-08  3:58 ` [PATCH v2 7/8] powerpc/powernv: Remove pnv_eeh_cap_start() Gavin Shan
@ 2015-10-08  4:18   ` Andrew Donnellan
  0 siblings, 0 replies; 22+ messages in thread
From: Andrew Donnellan @ 2015-10-08  4:18 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev

On 08/10/15 14:58, Gavin Shan wrote:
> This moves the logic of pnv_eeh_cap_start() to pnv_eeh_find_cap()
> as the function is only called by pnv_eeh_find_cap(). The logic
> of both functions are pretty simple. No need to have separate
> functions.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

-- 
Andrew Donnellan              Software Engineer, OzLabs
andrew.donnellan@au1.ibm.com  Australia Development Lab, Canberra
+61 2 6201 8874 (work)        IBM Australia Limited

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 8/8] powerpc/powernv: Simplify pnv_eeh_set_option()
  2015-10-08  3:58 ` [PATCH v2 8/8] powerpc/powernv: Simplify pnv_eeh_set_option() Gavin Shan
@ 2015-10-08  4:33   ` Andrew Donnellan
  0 siblings, 0 replies; 22+ messages in thread
From: Andrew Donnellan @ 2015-10-08  4:33 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev

On 08/10/15 14:58, Gavin Shan wrote:
> This simplifies pnv_eeh_set_option() to avoid unnecessary nested
> if statements, to improve readability. No functional changes.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

-- 
Andrew Donnellan              Software Engineer, OzLabs
andrew.donnellan@au1.ibm.com  Australia Development Lab, Canberra
+61 2 6201 8874 (work)        IBM Australia Limited

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/8] powerpc/eeh: More relexed hotplug criterion
  2015-10-08  3:58 ` [PATCH v2 2/8] powerpc/eeh: More relexed hotplug criterion Gavin Shan
@ 2015-10-12 22:55   ` Daniel Axtens
  2015-10-12 23:25     ` Gavin Shan
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Axtens @ 2015-10-12 22:55 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev; +Cc: Gavin Shan

[-- Attachment #1: Type: text/plain, Size: 1902 bytes --]

Gavin Shan <gwshan@linux.vnet.ibm.com> writes:

Hi Gavin,

> Currently, we rely on the existence of struct pci_driver::err_handler
> to judge if the corresponding PCI device should be unplugged during
> EEH recovery (partially hotplug case). However, it's not elaborate.
> some device drivers are implementing part of the EEH error handlers
> to collect diag-data. That means the driver still expects a hotplug
> to recover from the EEH error.


> This makes the hotplug criterion more relaxed: if the device driver
> doesn't provide all necessary EEH error handlers, it will experience
> hotplug during EEH recovery.

Interesting.

My understanding of Documentation/PCI/pci-error-recovery.txt is that a
driver should be able to just supply an error_detected() callback. If
the driver just wants to collect diag-data and wants to be hotplugged,
it should return PCI_ERS_RESULT_NONE.

What drivers did you have in mind?

Regards,
Daniel

>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/eeh_driver.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
> index 3a626ed..32178a4 100644
> --- a/arch/powerpc/kernel/eeh_driver.c
> +++ b/arch/powerpc/kernel/eeh_driver.c
> @@ -416,7 +416,10 @@ static void *eeh_rmv_device(void *data, void *userdata)
>  	driver = eeh_pcid_get(dev);
>  	if (driver) {
>  		eeh_pcid_put(dev);
> -		if (driver->err_handler)
> +		if (driver->err_handler &&
> +		    driver->err_handler->error_detected &&
> +		    driver->err_handler->slot_reset &&
> +		    driver->err_handler->resume)
>  			return NULL;
>  	}
>  
> -- 
> 2.1.0
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 859 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/8] powerpc/eeh: More relexed hotplug criterion
  2015-10-12 22:55   ` Daniel Axtens
@ 2015-10-12 23:25     ` Gavin Shan
  2015-10-13  2:48       ` Daniel Axtens
  0 siblings, 1 reply; 22+ messages in thread
From: Gavin Shan @ 2015-10-12 23:25 UTC (permalink / raw)
  To: Daniel Axtens; +Cc: Gavin Shan, linuxppc-dev

On Tue, Oct 13, 2015 at 09:55:53AM +1100, Daniel Axtens wrote:
>> Currently, we rely on the existence of struct pci_driver::err_handler
>> to judge if the corresponding PCI device should be unplugged during
>> EEH recovery (partially hotplug case). However, it's not elaborate.
>> some device drivers are implementing part of the EEH error handlers
>> to collect diag-data. That means the driver still expects a hotplug
>> to recover from the EEH error.
>
>
>> This makes the hotplug criterion more relaxed: if the device driver
>> doesn't provide all necessary EEH error handlers, it will experience
>> hotplug during EEH recovery.
>
>Interesting.
>
>My understanding of Documentation/PCI/pci-error-recovery.txt is that a
>driver should be able to just supply an error_detected() callback. If
>the driver just wants to collect diag-data and wants to be hotplugged,
>it should return PCI_ERS_RESULT_NONE.
>
>What drivers did you have in mind?
>

Danienl, The issue is tracked by IBM's bugzilla 127612 reported from Nvida
private GPU drivers. I tried to find the source code from upstream kernel,
but failed.

Taking an example, one PE has two different devices A and B. A's driver
privides error_detected()/slot_reset()/resume() and it's returning NEED_RESET.
B's driver just provides error_detected() that returns NONE as you said.
EEH core receives NEED_RESET and B won't be having hotplug during recovery.
The error won't be recovered on B.

Thanks,
Gavin

>>
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/kernel/eeh_driver.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
>> index 3a626ed..32178a4 100644
>> --- a/arch/powerpc/kernel/eeh_driver.c
>> +++ b/arch/powerpc/kernel/eeh_driver.c
>> @@ -416,7 +416,10 @@ static void *eeh_rmv_device(void *data, void *userdata)
>>  	driver = eeh_pcid_get(dev);
>>  	if (driver) {
>>  		eeh_pcid_put(dev);
>> -		if (driver->err_handler)
>> +		if (driver->err_handler &&
>> +		    driver->err_handler->error_detected &&
>> +		    driver->err_handler->slot_reset &&
>> +		    driver->err_handler->resume)
>>  			return NULL;
>>  	}
>>  
>> -- 
>> 2.1.0
>>
>> _______________________________________________
>> Linuxppc-dev mailing list
>> Linuxppc-dev@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/8] powerpc/eeh: Force reset on fenced PHB
  2015-10-08  3:58 ` [PATCH v2 3/8] powerpc/eeh: Force reset on fenced PHB Gavin Shan
@ 2015-10-13  1:43   ` Daniel Axtens
  2015-10-13  5:01     ` Gavin Shan
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Axtens @ 2015-10-13  1:43 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev; +Cc: Gavin Shan

[-- Attachment #1: Type: text/plain, Size: 1028 bytes --]

Gavin Shan <gwshan@linux.vnet.ibm.com> writes:


> +	 *
> +	 * When the PHB is fenced, we have to issue a reset to recover from
> +	 * the error. Override the result if necessary to have partially
> +	 * hotplug for this case.
>  	 */
>  	pr_info("EEH: Notify device drivers to shutdown\n");
>  	eeh_pe_dev_traverse(pe, eeh_report_error, &result);
> +	if ((pe->type & EEH_PE_PHB) &&
> +	    result != PCI_ERS_RESULT_NONE &&
> +	    result != PCI_ERS_RESULT_NEED_RESET)
> +		result = PCI_ERS_RESULT_NEED_RESET;
I think we shouldn't discard the DISCONNECT state. A driver could ask
that the device be disconnected in the error_detected callback and we
should probably honour that.

Regards,
Daniel

>  
>  	/* Get the current PCI slot state. This can take a long time,
>  	 * sometimes over 300 seconds for certain systems.
> -- 
> 2.1.0
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 859 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/8] powerpc/eeh: More relexed hotplug criterion
  2015-10-12 23:25     ` Gavin Shan
@ 2015-10-13  2:48       ` Daniel Axtens
  2015-10-13  4:28         ` Gavin Shan
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Axtens @ 2015-10-13  2:48 UTC (permalink / raw)
  To: Gavin Shan; +Cc: Gavin Shan, linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 2451 bytes --]

Gavin Shan <gwshan@linux.vnet.ibm.com> writes:

> Danienl, The issue is tracked by IBM's bugzilla 127612 reported from Nvida
> private GPU drivers. I tried to find the source code from upstream kernel,
> but failed.

OK. So I've read the internal bug, and I'm going to do my best to summarise
without including confidential info.

 1) A PHB with 2 devices is fenced via error injection.

 2) The error_detected() callback is run on both devices. One returns
    CAN_RECOVER, the other returns NONE.

We then fall through to partial-hotplug handling. (BTW this isn't
documented in Documentation/PCI/pci-error-recovery.txt, so at some point
this should be fixed!)

Partial hotplug is detected by the presence of an err_handler, not by
storing the result of error_detected. Would it be better to store the
result from eeh_report_error in the eeh_dev structure, rather than by
looking at more elements of the err_handler structure?

More generally, drivers using error_detect and then returning NONE as a
way to get data and then not participate in EEH is a hack, and it's not
surprising it's breaking in odd ways, especially with partial hotplug.

Partial hotplug is pretty hacky to begin with, and a driver being able
to opt out of EEH selectively is a useful feature, so we probably want
to redesign the state machine to handle them both better. That would be
a long term project.

Regards,
Daniel

> Thanks,
> Gavin
>
>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>  arch/powerpc/kernel/eeh_driver.c | 5 ++++-
>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
>>> index 3a626ed..32178a4 100644
>>> --- a/arch/powerpc/kernel/eeh_driver.c
>>> +++ b/arch/powerpc/kernel/eeh_driver.c
>>> @@ -416,7 +416,10 @@ static void *eeh_rmv_device(void *data, void *userdata)
>>>  	driver = eeh_pcid_get(dev);
>>>  	if (driver) {
>>>  		eeh_pcid_put(dev);
>>> -		if (driver->err_handler)
>>> +		if (driver->err_handler &&
>>> +		    driver->err_handler->error_detected &&
>>> +		    driver->err_handler->slot_reset &&
>>> +		    driver->err_handler->resume)
>>>  			return NULL;
>>>  	}
>>>  
>>> -- 
>>> 2.1.0
>>>
>>> _______________________________________________
>>> Linuxppc-dev mailing list
>>> Linuxppc-dev@lists.ozlabs.org
>>> https://lists.ozlabs.org/listinfo/linuxppc-dev

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 859 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/8] powerpc/eeh: More relexed hotplug criterion
  2015-10-13  2:48       ` Daniel Axtens
@ 2015-10-13  4:28         ` Gavin Shan
  2015-10-13 23:48           ` Daniel Axtens
  0 siblings, 1 reply; 22+ messages in thread
From: Gavin Shan @ 2015-10-13  4:28 UTC (permalink / raw)
  To: Daniel Axtens; +Cc: Gavin Shan, linuxppc-dev

On Tue, Oct 13, 2015 at 01:48:54PM +1100, Daniel Axtens wrote:
>Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
>
>> Danienl, The issue is tracked by IBM's bugzilla 127612 reported from Nvida
>> private GPU drivers. I tried to find the source code from upstream kernel,
>> but failed.
>
>OK. So I've read the internal bug, and I'm going to do my best to summarise
>without including confidential info.
>
> 1) A PHB with 2 devices is fenced via error injection.
>
> 2) The error_detected() callback is run on both devices. One returns
>    CAN_RECOVER, the other returns NONE.
>
>We then fall through to partial-hotplug handling. (BTW this isn't
>documented in Documentation/PCI/pci-error-recovery.txt, so at some point
>this should be fixed!)
>

No hotplug is triggered when EEH core receives CAN_RECOVER. It seems the
bug brought confusion instead of helping to explain the situation as I
intended to. I was intended to say: there has driver which implements
part of the EEH callbacks to collect diag-data.

>Partial hotplug is detected by the presence of an err_handler, not by
>storing the result of error_detected. Would it be better to store the
>result from eeh_report_error in the eeh_dev structure, rather than by
>looking at more elements of the err_handler structure?
>

I don't see the benefit to do that. In eeh_report_error(), the specific
error handlers still need to be checked and the result (from the check)
is temporary, and not worthy to store that in eeh_dev. The current code
looks good.

>More generally, drivers using error_detect and then returning NONE as a
>way to get data and then not participate in EEH is a hack, and it's not
>surprising it's breaking in odd ways, especially with partial hotplug.
>

I think you're talking about the situation reported from the bug. It's
CAN_RECOVER instead of NONE returned from error_detected(). With the
CAN_RECOVER, the driver hopes the EEH core to enable the IO path so that
it can collect diag-data from IO space at late point.

>Partial hotplug is pretty hacky to begin with, and a driver being able
>to opt out of EEH selectively is a useful feature, so we probably want
>to redesign the state machine to handle them both better. That would be
>a long term project.
>

Thanks,
Gavin

>>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>> ---
>>>>  arch/powerpc/kernel/eeh_driver.c | 5 ++++-
>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
>>>> index 3a626ed..32178a4 100644
>>>> --- a/arch/powerpc/kernel/eeh_driver.c
>>>> +++ b/arch/powerpc/kernel/eeh_driver.c
>>>> @@ -416,7 +416,10 @@ static void *eeh_rmv_device(void *data, void *userdata)
>>>>  	driver = eeh_pcid_get(dev);
>>>>  	if (driver) {
>>>>  		eeh_pcid_put(dev);
>>>> -		if (driver->err_handler)
>>>> +		if (driver->err_handler &&
>>>> +		    driver->err_handler->error_detected &&
>>>> +		    driver->err_handler->slot_reset &&
>>>> +		    driver->err_handler->resume)
>>>>  			return NULL;
>>>>  	}
>>>>  
>>>> -- 
>>>> 2.1.0
>>>>
>>>> _______________________________________________
>>>> Linuxppc-dev mailing list
>>>> Linuxppc-dev@lists.ozlabs.org
>>>> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/8] powerpc/eeh: Force reset on fenced PHB
  2015-10-13  1:43   ` Daniel Axtens
@ 2015-10-13  5:01     ` Gavin Shan
  2015-10-13  5:18       ` Daniel Axtens
  0 siblings, 1 reply; 22+ messages in thread
From: Gavin Shan @ 2015-10-13  5:01 UTC (permalink / raw)
  To: Daniel Axtens; +Cc: Gavin Shan, linuxppc-dev

On Tue, Oct 13, 2015 at 12:43:23PM +1100, Daniel Axtens wrote:
>Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
>
>> +	 *
>> +	 * When the PHB is fenced, we have to issue a reset to recover from
>> +	 * the error. Override the result if necessary to have partially
>> +	 * hotplug for this case.
>>  	 */
>>  	pr_info("EEH: Notify device drivers to shutdown\n");
>>  	eeh_pe_dev_traverse(pe, eeh_report_error, &result);
>> +	if ((pe->type & EEH_PE_PHB) &&
>> +	    result != PCI_ERS_RESULT_NONE &&
>> +	    result != PCI_ERS_RESULT_NEED_RESET)
>> +		result = PCI_ERS_RESULT_NEED_RESET;
>I think we shouldn't discard the DISCONNECT state. A driver could ask
>that the device be disconnected in the error_detected callback and we
>should probably honour that.
>

Not exactly, the improvement is limited to fenced PHB, not frozen PE case.
That's ok to discard DISCONNECT which forces all PHB's subordinate devices
to offline permanently, which isn't so reasonable.

This flag (DISCONNECT) has been there before the partial hotplug is
added. I think the flag can die now with partial hotplug support.

Thanks,
Gavin

>>  
>>  	/* Get the current PCI slot state. This can take a long time,
>>  	 * sometimes over 300 seconds for certain systems.
>> -- 
>> 2.1.0
>>
>> _______________________________________________
>> Linuxppc-dev mailing list
>> Linuxppc-dev@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/8] powerpc/eeh: Force reset on fenced PHB
  2015-10-13  5:01     ` Gavin Shan
@ 2015-10-13  5:18       ` Daniel Axtens
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel Axtens @ 2015-10-13  5:18 UTC (permalink / raw)
  To: Gavin Shan; +Cc: Gavin Shan, linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 1977 bytes --]

Gavin Shan <gwshan@linux.vnet.ibm.com> writes:

> On Tue, Oct 13, 2015 at 12:43:23PM +1100, Daniel Axtens wrote:
>>Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
>>
>>> +	 *
>>> +	 * When the PHB is fenced, we have to issue a reset to recover from
>>> +	 * the error. Override the result if necessary to have partially
>>> +	 * hotplug for this case.
>>>  	 */
>>>  	pr_info("EEH: Notify device drivers to shutdown\n");
>>>  	eeh_pe_dev_traverse(pe, eeh_report_error, &result);
>>> +	if ((pe->type & EEH_PE_PHB) &&
>>> +	    result != PCI_ERS_RESULT_NONE &&
>>> +	    result != PCI_ERS_RESULT_NEED_RESET)
>>> +		result = PCI_ERS_RESULT_NEED_RESET;
>>I think we shouldn't discard the DISCONNECT state. A driver could ask
>>that the device be disconnected in the error_detected callback and we
>>should probably honour that.
>>
>
> Not exactly, the improvement is limited to fenced PHB, not frozen PE case.
> That's ok to discard DISCONNECT which forces all PHB's subordinate devices
> to offline permanently, which isn't so reasonable.

I see. That makes more sense now. It's still a bit hacky but I can see
why it should be the way it is.

Reviewed-by: Daniel Axtens <dja@axtens.net>

>
> This flag (DISCONNECT) has been there before the partial hotplug is
> added. I think the flag can die now with partial hotplug support.

OK. It looks like I need to put some more thought into partial hotplug.
I'm increasingly thinking it would be worth redesigning the state
machine and the enumerations/flags to better deal with how things have
evolved over the years.

Regards,
Daniel


>
> Thanks,
> Gavin
>
>>>  
>>>  	/* Get the current PCI slot state. This can take a long time,
>>>  	 * sometimes over 300 seconds for certain systems.
>>> -- 
>>> 2.1.0
>>>
>>> _______________________________________________
>>> Linuxppc-dev mailing list
>>> Linuxppc-dev@lists.ozlabs.org
>>> https://lists.ozlabs.org/listinfo/linuxppc-dev

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 859 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/8] powerpc/eeh: More relexed hotplug criterion
  2015-10-13  4:28         ` Gavin Shan
@ 2015-10-13 23:48           ` Daniel Axtens
  2015-10-14  1:33             ` Gavin Shan
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Axtens @ 2015-10-13 23:48 UTC (permalink / raw)
  To: Gavin Shan; +Cc: Gavin Shan, linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 1898 bytes --]

Gavin Shan <gwshan@linux.vnet.ibm.com> writes:


> I think you're talking about the situation reported from the bug. It's
> CAN_RECOVER instead of NONE returned from error_detected(). With the
> CAN_RECOVER, the driver hopes the EEH core to enable the IO path so that
> it can collect diag-data from IO space at late point.

Oh. That's an interesting decision from the driver's point of view.

I obviously need to re-read the patch and the surrounding code and try
again to make sense of it later. Thanks for your attempts to explain it!

Regards,
Daniel

>
>>Partial hotplug is pretty hacky to begin with, and a driver being able
>>to opt out of EEH selectively is a useful feature, so we probably want
>>to redesign the state machine to handle them both better. That would be
>>a long term project.
>>
>
> Thanks,
> Gavin
>
>>>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>> ---
>>>>>  arch/powerpc/kernel/eeh_driver.c | 5 ++++-
>>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
>>>>> index 3a626ed..32178a4 100644
>>>>> --- a/arch/powerpc/kernel/eeh_driver.c
>>>>> +++ b/arch/powerpc/kernel/eeh_driver.c
>>>>> @@ -416,7 +416,10 @@ static void *eeh_rmv_device(void *data, void *userdata)
>>>>>  	driver = eeh_pcid_get(dev);
>>>>>  	if (driver) {
>>>>>  		eeh_pcid_put(dev);
>>>>> -		if (driver->err_handler)
>>>>> +		if (driver->err_handler &&
>>>>> +		    driver->err_handler->error_detected &&
>>>>> +		    driver->err_handler->slot_reset &&
>>>>> +		    driver->err_handler->resume)
>>>>>  			return NULL;
>>>>>  	}
>>>>>  
>>>>> -- 
>>>>> 2.1.0
>>>>>
>>>>> _______________________________________________
>>>>> Linuxppc-dev mailing list
>>>>> Linuxppc-dev@lists.ozlabs.org
>>>>> https://lists.ozlabs.org/listinfo/linuxppc-dev

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 859 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/8] powerpc/eeh: More relexed hotplug criterion
  2015-10-13 23:48           ` Daniel Axtens
@ 2015-10-14  1:33             ` Gavin Shan
  0 siblings, 0 replies; 22+ messages in thread
From: Gavin Shan @ 2015-10-14  1:33 UTC (permalink / raw)
  To: Daniel Axtens; +Cc: Gavin Shan, linuxppc-dev

On Wed, Oct 14, 2015 at 10:48:15AM +1100, Daniel Axtens wrote:
>Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
>> I think you're talking about the situation reported from the bug. It's
>> CAN_RECOVER instead of NONE returned from error_detected(). With the
>> CAN_RECOVER, the driver hopes the EEH core to enable the IO path so that
>> it can collect diag-data from IO space at late point.
>
>Oh. That's an interesting decision from the driver's point of view.
>
>I obviously need to re-read the patch and the surrounding code and try
>again to make sense of it later. Thanks for your attempts to explain it!
>

Yeah, that was the tricky solution we had after discussion. Obviously,
that's breaking EEH core's assumption that driver implements all error
handlers or none of them as you said. Unfortunately, I think there might
have more drivers to continue breaking but EEH core has to support. On
the other hand, the error handlers could be used for purposes other than
recovery, which is good.

Thanks,
Gavin

>>
>>>Partial hotplug is pretty hacky to begin with, and a driver being able
>>>to opt out of EEH selectively is a useful feature, so we probably want
>>>to redesign the state machine to handle them both better. That would be
>>>a long term project.
>>>

>>>>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>>> ---
>>>>>>  arch/powerpc/kernel/eeh_driver.c | 5 ++++-
>>>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
>>>>>> index 3a626ed..32178a4 100644
>>>>>> --- a/arch/powerpc/kernel/eeh_driver.c
>>>>>> +++ b/arch/powerpc/kernel/eeh_driver.c
>>>>>> @@ -416,7 +416,10 @@ static void *eeh_rmv_device(void *data, void *userdata)
>>>>>>  	driver = eeh_pcid_get(dev);
>>>>>>  	if (driver) {
>>>>>>  		eeh_pcid_put(dev);
>>>>>> -		if (driver->err_handler)
>>>>>> +		if (driver->err_handler &&
>>>>>> +		    driver->err_handler->error_detected &&
>>>>>> +		    driver->err_handler->slot_reset &&
>>>>>> +		    driver->err_handler->resume)
>>>>>>  			return NULL;
>>>>>>  	}
>>>>>>  
>>>>>> -- 
>>>>>> 2.1.0
>>>>>>
>>>>>> _______________________________________________
>>>>>> Linuxppc-dev mailing list
>>>>>> Linuxppc-dev@lists.ozlabs.org
>>>>>> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v2,1/8] powerpc/eeh: Don't unfreeze PHB PE after reset
  2015-10-08  3:58 ` [PATCH v2 1/8] powerpc/eeh: Don't unfreeze PHB PE after reset Gavin Shan
@ 2015-10-21 11:41   ` Michael Ellerman
  0 siblings, 0 replies; 22+ messages in thread
From: Michael Ellerman @ 2015-10-21 11:41 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev; +Cc: Gavin Shan

On Thu, 2015-08-10 at 03:58:52 UTC, Gavin Shan wrote:
> On PowerNV platform, the PE is kept in frozen state until the PE
> reset is completed to avoid recursive EEH error caused by MMIO
> access during the period of EEH reset. The PE's frozen state is
> cleared after BARs of PCI device included in the PE are restored
> and enabled. However, we needn't clear the frozen state for PHB PE
> explicitly at this point as there is no real PE for PHB PE. As the
> PHB PE is always binding with PE#0, we actually clear PE#0, which
> is wrong. It doesn't incur any problem though.
> 
> This checks if the PE is PHB PE and doesn't clear the frozen state
> if it is.
> 
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/527d10ef3a315d3cb9dc098d

cheers

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2015-10-21 11:41 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-08  3:58 [PATCH v2 0/8] EEH Improvement and Cleanup Gavin Shan
2015-10-08  3:58 ` [PATCH v2 1/8] powerpc/eeh: Don't unfreeze PHB PE after reset Gavin Shan
2015-10-21 11:41   ` [v2,1/8] " Michael Ellerman
2015-10-08  3:58 ` [PATCH v2 2/8] powerpc/eeh: More relexed hotplug criterion Gavin Shan
2015-10-12 22:55   ` Daniel Axtens
2015-10-12 23:25     ` Gavin Shan
2015-10-13  2:48       ` Daniel Axtens
2015-10-13  4:28         ` Gavin Shan
2015-10-13 23:48           ` Daniel Axtens
2015-10-14  1:33             ` Gavin Shan
2015-10-08  3:58 ` [PATCH v2 3/8] powerpc/eeh: Force reset on fenced PHB Gavin Shan
2015-10-13  1:43   ` Daniel Axtens
2015-10-13  5:01     ` Gavin Shan
2015-10-13  5:18       ` Daniel Axtens
2015-10-08  3:58 ` [PATCH v2 4/8] powerpc/eeh: More relxed condition for enabled IO path Gavin Shan
2015-10-08  3:58 ` [PATCH v2 5/8] powerpc/pseries: Cleanup on pseries_eeh_get_state() Gavin Shan
2015-10-08  4:15   ` Andrew Donnellan
2015-10-08  3:58 ` [PATCH v2 6/8] powerpc/powernv: Cleanup on EEH comments Gavin Shan
2015-10-08  3:58 ` [PATCH v2 7/8] powerpc/powernv: Remove pnv_eeh_cap_start() Gavin Shan
2015-10-08  4:18   ` Andrew Donnellan
2015-10-08  3:58 ` [PATCH v2 8/8] powerpc/powernv: Simplify pnv_eeh_set_option() Gavin Shan
2015-10-08  4:33   ` Andrew Donnellan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.