* [PATCH] powerpc/eeh: Set channel state after notifying the drivers
@ 2023-02-09 10:56 Ganesh Goudar
  2023-02-20  3:49 ` Michael Ellerman
From: Ganesh Goudar @ 2023-02-09 10:56 UTC
  To: linuxppc-dev, mpe; +Cc: dick.kennedy, Ganesh Goudar, oohall, mahesh, wenxiong

When a PCI error is encountered for the 6th time in an hour, we
set the channel state to perm_failure and notify the
driver about the permanent failure.

However, after upstream commit 38ddc011478e ("powerpc/eeh:
Make permanently failed devices non-actionable"), the EEH handler
stops calling any driver routine once the device is marked as
permanently failed. As a result, the driver is never notified of
the permanent failure, which can lead to fatal consequences such
as a kernel hang with certain PCI devices.

The following logs were observed with the lpfc driver, with and
without this change. Without this change, the kernel hangs if a
PCI error is encountered 6 times for a device in an hour.

Without the change

 EEH: Beginning: 'error_detected(permanent failure)'
 PCI 0132:60:00.0#600000: EEH: not actionable (1,1,1)
 PCI 0132:60:00.1#600000: EEH: not actionable (1,1,1)
 EEH: Finished:'error_detected(permanent failure)'

With the change

 EEH: Beginning: 'error_detected(permanent failure)'
 EEH: Invoking lpfc->error_detected(permanent failure)
 EEH: lpfc driver reports: 'disconnect'
 EEH: Invoking lpfc->error_detected(permanent failure)
 EEH: lpfc driver reports: 'disconnect'
 EEH: Finished:'error_detected(permanent failure)'

To fix the issue, set the channel state to permanent failure only
after notifying the drivers.

Fixes: 38ddc011478e ("powerpc/eeh: Make permanently failed devices non-actionable")
Suggested-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
---
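Note for readers (not part of the commit): the ordering problem can be
illustrated with a small user-space model. This is only a sketch under
stated assumptions, not kernel code. The enum, struct, and function
names below are hypothetical stand-ins invented for illustration; the
"actionable" check merely mimics the behaviour described above, where a
device already marked as permanently failed is skipped during
reporting.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
enum channel_state { CHANNEL_NORMAL, CHANNEL_FROZEN, CHANNEL_PERM_FAILURE };

struct model_dev {
	enum channel_state state;
	int notified;	/* how many times the driver callback ran */
};

/* Assumed model of the check: permanently failed devices are skipped. */
static bool model_dev_actionable(const struct model_dev *dev)
{
	return dev->state != CHANNEL_PERM_FAILURE;
}

/* Models the report step: invoke the driver only if the device is actionable. */
static void model_report_failure(struct model_dev *dev)
{
	if (!model_dev_actionable(dev)) {
		printf("model: not actionable\n");
		return;
	}
	dev->notified++;
	printf("model: driver notified of permanent failure\n");
}

/* Old order: mark the device failed first, then try to notify. */
static int notify_old_order(struct model_dev *dev)
{
	dev->state = CHANNEL_PERM_FAILURE;
	model_report_failure(dev);
	return dev->notified;
}

/* New order: notify first, then mark the device failed. */
static int notify_new_order(struct model_dev *dev)
{
	model_report_failure(dev);
	dev->state = CHANNEL_PERM_FAILURE;
	return dev->notified;
}
```

In this model, calling notify_old_order() on a frozen device leaves its
notified count at zero, while notify_new_order() reaches the driver
callback once. Both leave the device in the permanent failure state,
which is the outcome the patch aims for.
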
 arch/powerpc/kernel/eeh_driver.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index f279295179bd..438568a472d0 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -1065,10 +1065,10 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	eeh_slot_error_detail(pe, EEH_LOG_PERM);
 
 	/* Notify all devices that they're about to go down. */
-	eeh_set_channel_state(pe, pci_channel_io_perm_failure);
 	eeh_set_irq_state(pe, false);
 	eeh_pe_report("error_detected(permanent failure)", pe,
 		      eeh_report_failure, NULL);
+	eeh_set_channel_state(pe, pci_channel_io_perm_failure);
 
 	/* Mark the PE to be removed permanently */
 	eeh_pe_state_mark(pe, EEH_PE_REMOVED);
@@ -1185,10 +1185,10 @@ void eeh_handle_special_event(void)
 
 			/* Notify all devices to be down */
 			eeh_pe_state_clear(pe, EEH_PE_PRI_BUS, true);
-			eeh_set_channel_state(pe, pci_channel_io_perm_failure);
 			eeh_pe_report(
 				"error_detected(permanent failure)", pe,
 				eeh_report_failure, NULL);
+			eeh_set_channel_state(pe, pci_channel_io_perm_failure);
 
 			pci_lock_rescan_remove();
 			list_for_each_entry(hose, &hose_list, list_node) {
-- 
2.39.1


* Re: [PATCH] powerpc/eeh: Set channel state after notifying the drivers
  2023-02-09 10:56 [PATCH] powerpc/eeh: Set channel state after notifying the drivers Ganesh Goudar
@ 2023-02-20  3:49 ` Michael Ellerman
From: Michael Ellerman @ 2023-02-20  3:49 UTC
  To: Ganesh Goudar, mpe, linuxppc-dev; +Cc: dick.kennedy, oohall, mahesh, wenxiong

On Thu, 9 Feb 2023 16:26:49 +0530, Ganesh Goudar wrote:
> When a PCI error is encountered for the 6th time in an hour, we
> set the channel state to perm_failure and notify the
> driver about the permanent failure.
> 
> However, after upstream commit 38ddc011478e ("powerpc/eeh:
> Make permanently failed devices non-actionable"), the EEH handler
> stops calling any driver routine once the device is marked as
> permanently failed. This can lead to fatal consequences such
> as a kernel hang with certain PCI devices.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/eeh: Set channel state after notifying the drivers
      https://git.kernel.org/powerpc/c/9efcdaac36e1643a1b7f5337e6143ce142d381b1

cheers
