* [PATCH 1/1] powerpc/eeh: fix deadlock handling dead PHB
@ 2020-02-07  4:57 Sam Bobroff
  2020-02-07  8:56 ` Frederic Barrat
  2020-02-19 12:39 ` Michael Ellerman
  0 siblings, 2 replies; 3+ messages in thread
From: Sam Bobroff @ 2020-02-07  4:57 UTC
  To: linuxppc-dev; +Cc: fbarrat

Recovering a dead PHB can currently deadlock because the PCI
rescan/remove lock is taken twice.

This is a symptom of an existing bug in eeh_handle_special_event():
the pe is processed inside the loop that traverses the PHBs, even
though the pe is unrelated to that loop. As a result the pe is,
incorrectly, processed more than once.

Untangle this section by moving the pe processing out of the loop and
outside the locked section, which corrects both problems.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
I have only compile-tested this fix; Frederic Barrat (who discovered the
problem) has offered to test it (thanks!).

 arch/powerpc/kernel/eeh_driver.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 3dd1a422fc29..d6e75a8a14ce 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -1190,6 +1190,17 @@ void eeh_handle_special_event(void)
 			eeh_pe_state_mark(pe, EEH_PE_RECOVERING);
 			eeh_handle_normal_event(pe);
 		} else {
+			eeh_for_each_pe(pe, tmp_pe)
+				eeh_pe_for_each_dev(tmp_pe, edev, tmp_edev)
+					edev->mode &= ~EEH_DEV_NO_HANDLER;
+
+			/* Notify all devices to be down */
+			eeh_pe_state_clear(pe, EEH_PE_PRI_BUS, true);
+			eeh_set_channel_state(pe, pci_channel_io_perm_failure);
+			eeh_pe_report(
+				"error_detected(permanent failure)", pe,
+				eeh_report_failure, NULL);
+
 			pci_lock_rescan_remove();
 			list_for_each_entry(hose, &hose_list, list_node) {
 				phb_pe = eeh_phb_pe_get(hose);
@@ -1198,16 +1209,6 @@ void eeh_handle_special_event(void)
 				    (phb_pe->state & EEH_PE_RECOVERING))
 					continue;
 
-				eeh_for_each_pe(pe, tmp_pe)
-					eeh_pe_for_each_dev(tmp_pe, edev, tmp_edev)
-						edev->mode &= ~EEH_DEV_NO_HANDLER;
-
-				/* Notify all devices to be down */
-				eeh_pe_state_clear(pe, EEH_PE_PRI_BUS, true);
-				eeh_set_channel_state(pe, pci_channel_io_perm_failure);
-				eeh_pe_report(
-					"error_detected(permanent failure)", pe,
-					eeh_report_failure, NULL);
 				bus = eeh_pe_bus_get(phb_pe);
 				if (!bus) {
 					pr_err("%s: Cannot find PCI bus for "
-- 
2.22.0.216.g00a2a96fc9



* Re: [PATCH 1/1] powerpc/eeh: fix deadlock handling dead PHB
  2020-02-07  4:57 [PATCH 1/1] powerpc/eeh: fix deadlock handling dead PHB Sam Bobroff
@ 2020-02-07  8:56 ` Frederic Barrat
  2020-02-19 12:39 ` Michael Ellerman
  1 sibling, 0 replies; 3+ messages in thread
From: Frederic Barrat @ 2020-02-07  8:56 UTC
  To: Sam Bobroff, linuxppc-dev



On 07/02/2020 at 05:57, Sam Bobroff wrote:
> Recovering a dead PHB can currently cause a deadlock as the PCI
> rescan/remove lock is taken twice.
> 
> This is caused as part of an existing bug in
> eeh_handle_special_event(). The pe is processed while traversing the
> PHBs even though the pe is unrelated to the loop. This causes the pe
> to be, incorrectly, processed more than once.
> 
> Untangling this section can move the pe processing out of the loop and
> also outside the locked section, correcting both problems.
> 
> Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
> ---


Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Tested-by: Frederic Barrat <fbarrat@linux.ibm.com>

I think it also needs:
Fixes: 2e25505147b8 ("powerpc/eeh: Fix crash when edev->pdev changes")
Cc: stable@vger.kernel.org # 5.4+


   Fred





* Re: [PATCH 1/1] powerpc/eeh: fix deadlock handling dead PHB
  2020-02-07  4:57 [PATCH 1/1] powerpc/eeh: fix deadlock handling dead PHB Sam Bobroff
  2020-02-07  8:56 ` Frederic Barrat
@ 2020-02-19 12:39 ` Michael Ellerman
  1 sibling, 0 replies; 3+ messages in thread
From: Michael Ellerman @ 2020-02-19 12:39 UTC
  To: Sam Bobroff, linuxppc-dev; +Cc: fbarrat

On Fri, 2020-02-07 at 04:57:31 UTC, Sam Bobroff wrote:
> Recovering a dead PHB can currently cause a deadlock as the PCI
> rescan/remove lock is taken twice.
> 
> This is caused as part of an existing bug in
> eeh_handle_special_event(). The pe is processed while traversing the
> PHBs even though the pe is unrelated to the loop. This causes the pe
> to be, incorrectly, processed more than once.
> 
> Untangling this section can move the pe processing out of the loop and
> also outside the locked section, correcting both problems.
> 
> Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/d4f194ed9eb9841a8f978710e4d24296f791a85b

cheers
