linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/2] powerpc/eeh: Release EEH device state synchronously
@ 2020-04-28  3:45 Sam Bobroff
  2020-04-28  3:45 ` [PATCH v4 1/2] powerpc/eeh: fix pseries_eeh_configure_bridge() Sam Bobroff
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Sam Bobroff @ 2020-04-28  3:45 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nathan Lynch, Oliver O'Halloran

Hi everyone,

Here are some fixes and cleanups that have come from other work but that I
think stand on their own.

Only one patch ("Release EEH device state synchronously", suggested by Oliver
O'Halloran) is a significant change: it moves the cleanup of some EEH device
data out of the (possibly asynchronous) device release handler and into the
(synchronously called) bus notifier. This is useful for future work as it makes
it easier to reason about the lifetimes of EEH structures.

Note that I've left a few WARN_ON_ONCEs in the code because I'm paranoid, but I
have not been able to hit them during testing.

Cheers,
Sam.

Notes for v4:
Stopped using rtas_error_rc() as it is too specific, intead just translate the
one code that is valid for this RTAS call. Therefore, the new patch to export
rtas_error_rc() is dropped.

Notes for v3:
I've tweaked the fix for pseries_eeh_configure_bridge() to return the correct
error code (even though it's not used) by calling an already present RTAS
function, rtas_error_rc(). However, I had to make another change to export that
function and while it does seem like the right thing to do, but I'm concerned
it's a bit out of scope for such a small fix.

Notes for v2:

I've dropped both cleanup patches (3/4, 4/4) because that type of cleanup
(replacing a call to eeh_rmv_from_parent_pe() with one to eeh_remove_device())
is incorrect: if called during recovery, it will cause edev->pe to remain set
when it would have been cleared previously. This would lead to stale
information in the edev. I think there should be a way to simplify the code
around EEH_PE_KEEP but I'll look at that separately.

Patch set changelog follows:

Patch set v4: 
Patch 1/2 (was 2/3): powerpc/eeh: fix pseries_eeh_configure_bridge()
- Just handle the error translation locally, as it's specific to the RTAS call,
  but log the unaltered code in case it's useful for debugging.
Patch 2/2 (was 3/3): powerpc/eeh: Release EEH device state synchronously
Dropped (was 1/3) powerpc/rtas: Export rtas_error_rc

Patch set v3: 
Patch 1/3 (new in this version): powerpc/rtas: Export rtas_error_rc
Patch 2/3 (was 1/2): powerpc/eeh: fix pseries_eeh_configure_bridge()
Patch 3/3 (was 2/2): powerpc/eeh: Release EEH device state synchronously

Patch set v2: 
Patch 1/2: powerpc/eeh: fix pseries_eeh_configure_bridge()
Patch 2/2: powerpc/eeh: Release EEH device state synchronously
- Added comment explaining why the add case can't be handled similarly to the remove case.
Dropped (was 4/4) powerpc/eeh: Clean up edev cleanup for VFs
Dropped (was 3/4) powerpc/eeh: Remove workaround from eeh_add_device_late()

Patch set v1:
Patch 1/4: powerpc/eeh: fix pseries_eeh_configure_bridge()
Patch 2/4: powerpc/eeh: Release EEH device state synchronously
Patch 3/4: powerpc/eeh: Remove workaround from eeh_add_device_late()
Patch 4/4: powerpc/eeh: Clean up edev cleanup for VFs

Sam Bobroff (2):
  powerpc/eeh: fix pseries_eeh_configure_bridge()
  powerpc/eeh: Release EEH device state synchronously

 arch/powerpc/kernel/eeh.c                    | 31 ++++++++++++++++++++
 arch/powerpc/kernel/pci-hotplug.c            |  2 --
 arch/powerpc/platforms/pseries/eeh_pseries.c |  8 ++++-
 3 files changed, 38 insertions(+), 3 deletions(-)

-- 
2.22.0.216.g00a2a96fc9


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v4 1/2] powerpc/eeh: fix pseries_eeh_configure_bridge()
  2020-04-28  3:45 [PATCH v4 0/2] powerpc/eeh: Release EEH device state synchronously Sam Bobroff
@ 2020-04-28  3:45 ` Sam Bobroff
  2020-05-06 14:55   ` Nathan Lynch
  2020-04-28  3:45 ` [PATCH v4 2/2] powerpc/eeh: Release EEH device state synchronously Sam Bobroff
  2020-05-20 11:00 ` [PATCH v4 0/2] " Michael Ellerman
  2 siblings, 1 reply; 5+ messages in thread
From: Sam Bobroff @ 2020-04-28  3:45 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nathan Lynch, Oliver O'Halloran

If a device is hot unplgged during EEH recovery, it's possible for the
RTAS call to ibm,configure-pe in pseries_eeh_configure() to return
parameter error (-3), however negative return values are not checked
for and this leads to an infinite loop.

Fix this by correctly bailing out on negative values.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
v4 - Just handle the error translation locally, as it's specific to the RTAS call,
     but log the unaltered code in case it's useful for debugging.

 arch/powerpc/platforms/pseries/eeh_pseries.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 893ba3f562c4..04c1ed79bc6e 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -607,6 +607,8 @@ static int pseries_eeh_configure_bridge(struct eeh_pe *pe)
 
 		if (!ret)
 			return ret;
+		if (ret < 0)
+			break;
 
 		/*
 		 * If RTAS returns a delay value that's above 100ms, cut it
@@ -627,7 +629,11 @@ static int pseries_eeh_configure_bridge(struct eeh_pe *pe)
 
 	pr_warn("%s: Unable to configure bridge PHB#%x-PE#%x (%d)\n",
 		__func__, pe->phb->global_number, pe->addr, ret);
-	return ret;
+	/* PAPR defines -3 as "Parameter Error" for this function: */
+	if (ret == -3)
+		return -EINVAL;
+	else
+		return -EIO;
 }
 
 /**
-- 
2.22.0.216.g00a2a96fc9


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v4 2/2] powerpc/eeh: Release EEH device state synchronously
  2020-04-28  3:45 [PATCH v4 0/2] powerpc/eeh: Release EEH device state synchronously Sam Bobroff
  2020-04-28  3:45 ` [PATCH v4 1/2] powerpc/eeh: fix pseries_eeh_configure_bridge() Sam Bobroff
@ 2020-04-28  3:45 ` Sam Bobroff
  2020-05-20 11:00 ` [PATCH v4 0/2] " Michael Ellerman
  2 siblings, 0 replies; 5+ messages in thread
From: Sam Bobroff @ 2020-04-28  3:45 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nathan Lynch, Oliver O'Halloran

EEH device state is currently removed (by eeh_remove_device()) during
the device release handler, which is invoked as the device's reference
count drops to zero. This may take some time, or forever, as other
threads may hold references.

However, the PCI device state is released synchronously by
pci_stop_and_remove_bus_device(). This mismatch causes problems, for
example the device may be re-discovered as a new device before the
release handler has been called, leaving the PCI and EEH state
mismatched.

So instead, call eeh_remove_device() from the bus device removal
handlers, which are called synchronously in the removal path.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/kernel/eeh.c         | 31 +++++++++++++++++++++++++++++++
 arch/powerpc/kernel/pci-hotplug.c |  2 --
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 17cb3e9b5697..64361311bc8e 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1106,6 +1106,37 @@ static int eeh_init(void)
 
 core_initcall_sync(eeh_init);
 
+static int eeh_device_notifier(struct notifier_block *nb,
+			       unsigned long action, void *data)
+{
+	struct device *dev = data;
+
+	switch (action) {
+	/*
+	 * Note: It's not possible to perform EEH device addition (i.e.
+	 * {pseries,pnv}_pcibios_bus_add_device()) here because it depends on
+	 * the device's resources, which have not yet been set up.
+	 */
+	case BUS_NOTIFY_DEL_DEVICE:
+		eeh_remove_device(to_pci_dev(dev));
+		break;
+	default:
+		break;
+	}
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block eeh_device_nb = {
+	.notifier_call = eeh_device_notifier,
+};
+
+static __init int eeh_set_bus_notifier(void)
+{
+	bus_register_notifier(&pci_bus_type, &eeh_device_nb);
+	return 0;
+}
+arch_initcall(eeh_set_bus_notifier);
+
 /**
  * eeh_add_device_early - Enable EEH for the indicated device node
  * @pdn: PCI device node for which to set up EEH
diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index d6a67f814983..28e9aa274f64 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -57,8 +57,6 @@ void pcibios_release_device(struct pci_dev *dev)
 	struct pci_controller *phb = pci_bus_to_host(dev->bus);
 	struct pci_dn *pdn = pci_get_pdn(dev);
 
-	eeh_remove_device(dev);
-
 	if (phb->controller_ops.release_device)
 		phb->controller_ops.release_device(dev);
 
-- 
2.22.0.216.g00a2a96fc9


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v4 1/2] powerpc/eeh: fix pseries_eeh_configure_bridge()
  2020-04-28  3:45 ` [PATCH v4 1/2] powerpc/eeh: fix pseries_eeh_configure_bridge() Sam Bobroff
@ 2020-05-06 14:55   ` Nathan Lynch
  0 siblings, 0 replies; 5+ messages in thread
From: Nathan Lynch @ 2020-05-06 14:55 UTC (permalink / raw)
  To: Sam Bobroff; +Cc: linuxppc-dev, Oliver O'Halloran

Sam Bobroff <sbobroff@linux.ibm.com> writes:
> If a device is hot unplgged during EEH recovery, it's possible for the
> RTAS call to ibm,configure-pe in pseries_eeh_configure() to return
> parameter error (-3), however negative return values are not checked
> for and this leads to an infinite loop.
>
> Fix this by correctly bailing out on negative values.

Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>

Thanks.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v4 0/2] powerpc/eeh: Release EEH device state synchronously
  2020-04-28  3:45 [PATCH v4 0/2] powerpc/eeh: Release EEH device state synchronously Sam Bobroff
  2020-04-28  3:45 ` [PATCH v4 1/2] powerpc/eeh: fix pseries_eeh_configure_bridge() Sam Bobroff
  2020-04-28  3:45 ` [PATCH v4 2/2] powerpc/eeh: Release EEH device state synchronously Sam Bobroff
@ 2020-05-20 11:00 ` Michael Ellerman
  2 siblings, 0 replies; 5+ messages in thread
From: Michael Ellerman @ 2020-05-20 11:00 UTC (permalink / raw)
  To: Sam Bobroff, linuxppc-dev; +Cc: Nathan Lynch, Oliver O'Halloran

On Tue, 28 Apr 2020 13:45:04 +1000, Sam Bobroff wrote:
> Here are some fixes and cleanups that have come from other work but that I
> think stand on their own.
> 
> Only one patch ("Release EEH device state synchronously", suggested by Oliver
> O'Halloran) is a significant change: it moves the cleanup of some EEH device
> data out of the (possibly asynchronous) device release handler and into the
> (synchronously called) bus notifier. This is useful for future work as it makes
> it easier to reason about the lifetimes of EEH structures.
> 
> [...]

Applied to powerpc/next.

[1/2] powerpc/eeh: Fix pseries_eeh_configure_bridge()
      https://git.kernel.org/powerpc/c/6fa13640aea7bb0760846981aa2da4245307bd26
[2/2] powerpc/eeh: Release EEH device state synchronously
      https://git.kernel.org/powerpc/c/466381ecdc741b1767d980e10b1ec49f6bde56f3

cheers

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-05-20 11:33 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-28  3:45 [PATCH v4 0/2] powerpc/eeh: Release EEH device state synchronously Sam Bobroff
2020-04-28  3:45 ` [PATCH v4 1/2] powerpc/eeh: fix pseries_eeh_configure_bridge() Sam Bobroff
2020-05-06 14:55   ` Nathan Lynch
2020-04-28  3:45 ` [PATCH v4 2/2] powerpc/eeh: Release EEH device state synchronously Sam Bobroff
2020-05-20 11:00 ` [PATCH v4 0/2] " Michael Ellerman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).