* [PATCH] powerpc/eeh: Probe after unbalanced kref check
@ 2015-08-14  6:03 Daniel Axtens
  2015-08-14  7:30 ` Gavin Shan
  2015-08-17  8:03 ` Michael Ellerman
  0 siblings, 2 replies; 3+ messages in thread
From: Daniel Axtens @ 2015-08-14  6:03 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: mpe, benh, Matthew R. Ochs, Manoj Kumar, mikey, imunsie,
	Gavin Shan, Daniel Axtens

In the complete hotplug case, EEH PEs are supposed to be released
and set to NULL. Normally, this is done by eeh_remove_device(),
which is called from pcibios_release_device().

However, if something is holding a kref to the device, it will not
be released, and the PE will remain. eeh_add_device_late() has
a check for this which will explicitly destroy the PE in this case.

This check in eeh_add_device_late() occurs after a call to
eeh_ops->probe(). On PowerNV, probe is a pointer to pnv_eeh_probe(),
which will exit without probing if there is an existing PE.

This means that on PowerNV, devices with outstanding krefs will not
be rediscovered by EEH correctly after a complete hotplug. This is
affecting CXL (CAPI) devices in the field.

Put the probe after the kref check so that the PE is destroyed
and affected devices are correctly rediscovered by EEH.

Fixes: d91dafc02f42 ("powerpc/eeh: Delay probing EEH device during hotplug")
Cc: stable@vger.kernel.org
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 arch/powerpc/kernel/eeh.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
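
The ordering problem described above can be illustrated with a small
userspace model. This is only a sketch, not kernel code: the types and
helpers (model_pe, model_edev, model_probe, model_drop_stale_pe,
add_device_late) are hypothetical stand-ins, and the stale-PE cleanup is
reduced to clearing two fields. It assumes only what the description says:
the probe exits early when a PE already exists, and the unbalanced kref
check destroys a PE left over from before the unplug.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's PE and EEH device structures. */
struct model_pe { int id; };

struct model_edev {
	int pdev;             /* 0 = no device bound; otherwise a device id */
	struct model_pe *pe;  /* survives unplug if a kref was still held */
};

/* Models the described probe behaviour: do nothing if a PE already exists. */
static void model_probe(struct model_edev *edev, struct model_pe *fresh)
{
	if (edev->pe)
		return;
	edev->pe = fresh;
}

/* Models the unbalanced kref check: drop state left by a different device. */
static void model_drop_stale_pe(struct model_edev *edev, int dev)
{
	if (edev->pdev && edev->pdev != dev) {
		edev->pe = NULL;
		edev->pdev = 0;
	}
}

/*
 * Returns true if the device ends up attached to the freshly probed PE.
 * probe_first = true models the old order, false models the patched order.
 */
bool add_device_late(struct model_edev *edev, int dev,
		     struct model_pe *fresh, bool probe_first)
{
	if (probe_first) {
		model_probe(edev, fresh);
		model_drop_stale_pe(edev, dev);
	} else {
		model_drop_stale_pe(edev, dev);
		model_probe(edev, fresh);
	}
	edev->pdev = dev;
	return edev->pe == fresh;
}
```

In this model, an edev that still carries an old pdev and PE ends up with
no PE at all under the old order, because the probe bails out and the
cleanup then removes the stale PE. With the cleanup first, the probe sees
no PE and attaches the fresh one. An edev with no leftover state behaves
the same under both orders.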

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index af9b597b10af..8e61d717915e 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1116,9 +1116,6 @@ void eeh_add_device_late(struct pci_dev *dev)
 		return;
 	}
 
-	if (eeh_has_flag(EEH_PROBE_MODE_DEV))
-		eeh_ops->probe(pdn, NULL);
-
 	/*
 	 * The EEH cache might not be removed correctly because of
 	 * unbalanced kref to the device during unplug time, which
@@ -1142,6 +1139,9 @@ void eeh_add_device_late(struct pci_dev *dev)
 		dev->dev.archdata.edev = NULL;
 	}
 
+	if (eeh_has_flag(EEH_PROBE_MODE_DEV))
+		eeh_ops->probe(pdn, NULL);
+
 	edev->pdev = dev;
 	dev->dev.archdata.edev = edev;
 
-- 
2.1.4


* Re: [PATCH] powerpc/eeh: Probe after unbalanced kref check
  2015-08-14  6:03 [PATCH] powerpc/eeh: Probe after unbalanced kref check Daniel Axtens
@ 2015-08-14  7:30 ` Gavin Shan
  2015-08-17  8:03 ` Michael Ellerman
  1 sibling, 0 replies; 3+ messages in thread
From: Gavin Shan @ 2015-08-14  7:30 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: linuxppc-dev, mpe, benh, Matthew R. Ochs, Manoj Kumar, mikey,
	imunsie, Gavin Shan

On Fri, Aug 14, 2015 at 04:03:19PM +1000, Daniel Axtens wrote:
>In the complete hotplug case, EEH PEs are supposed to be released
>and set to NULL. Normally, this is done by eeh_remove_device(),
>which is called from pcibios_release_device().
>
>However, if something is holding a kref to the device, it will not
>be released, and the PE will remain. eeh_add_device_late() has
>a check for this which will explicitly destroy the PE in this case.
>
>This check in eeh_add_device_late() occurs after a call to
>eeh_ops->probe(). On PowerNV, probe is a pointer to pnv_eeh_probe(),
>which will exit without probing if there is an existing PE.
>
>This means that on PowerNV, devices with outstanding krefs will not
>be rediscovered by EEH correctly after a complete hotplug. This is
>affecting CXL (CAPI) devices in the field.
>
>Put the probe after the kref check so that the PE is destroyed
>and affected devices are correctly rediscovered by EEH.
>
>Fixes: d91dafc02f42 ("powerpc/eeh: Delay probing EEH device during hotplug")
>Cc: stable@vger.kernel.org
>Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
>Signed-off-by: Daniel Axtens <dja@axtens.net>

Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>

Thanks,
Gavin



* Re: powerpc/eeh: Probe after unbalanced kref check
  2015-08-14  6:03 [PATCH] powerpc/eeh: Probe after unbalanced kref check Daniel Axtens
  2015-08-14  7:30 ` Gavin Shan
@ 2015-08-17  8:03 ` Michael Ellerman
  1 sibling, 0 replies; 3+ messages in thread
From: Michael Ellerman @ 2015-08-17  8:03 UTC (permalink / raw)
  To: Daniel Axtens, linuxppc-dev
  Cc: mikey, Matthew R. Ochs, imunsie, Gavin Shan, Manoj Kumar, Daniel Axtens

On Fri, 2015-08-14 at 06:03:19 UTC, Daniel Axtens wrote:
> In the complete hotplug case, EEH PEs are supposed to be released
> and set to NULL. Normally, this is done by eeh_remove_device(),
> which is called from pcibios_release_device().
> 
> However, if something is holding a kref to the device, it will not
> be released, and the PE will remain. eeh_add_device_late() has
> a check for this which will explicitly destroy the PE in this case.
> 
> This check in eeh_add_device_late() occurs after a call to
> eeh_ops->probe(). On PowerNV, probe is a pointer to pnv_eeh_probe(),
> which will exit without probing if there is an existing PE.
> 
> This means that on PowerNV, devices with outstanding krefs will not
> be rediscovered by EEH correctly after a complete hotplug. This is
> affecting CXL (CAPI) devices in the field.
> 
> Put the probe after the kref check so that the PE is destroyed
> and affected devices are correctly rediscovered by EEH.
> 
> Fixes: d91dafc02f42 ("powerpc/eeh: Delay probing EEH device during hotplug")
> Cc: stable@vger.kernel.org
> Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
> Signed-off-by: Daniel Axtens <dja@axtens.net>
> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/e642d11bdbfe8eb10116

cheers
