EEH Causes bnx2x to Hang in napi_disable

* EEH Causes bnx2x to Hang in napi_disable
@ 2022-04-11 18:06 David Christensen
  2022-04-15 14:48 ` [EXT] " Manish Chopra
  0 siblings, 1 reply; 3+ messages in thread
From: David Christensen @ 2022-04-11 18:06 UTC (permalink / raw)
  To: Netdev, aelior, manishc, skalluru

Experiencing an inaccessible system when bnx2x attempts to recover from 
an injected EEH error. This is a POWER10 system running SLES 15 SP3 
(5.3.18-150300.59.49-default) with a BCM57810 dual port NIC.

System message log shows the following:
=======================================
[ 3222.537510] EEH: Detected PCI bus error on PHB#384-PE#800000
[ 3222.537511] EEH: This PCI device has failed 2 times in the last hour 
and will be permanently disabled after 5 failures.
[ 3222.537512] EEH: Notify device drivers to shutdown
[ 3222.537513] EEH: Beginning: 'error_detected(IO frozen)'
[ 3222.537514] EEH: PE#800000 (PCI 0384:80:00.0): Invoking 
bnx2x->error_detected(IO frozen)
[ 3222.537516] bnx2x: [bnx2x_io_error_detected:14236(eth14)]IO error 
detected
[ 3222.537650] EEH: PE#800000 (PCI 0384:80:00.0): bnx2x driver reports: 
'need reset'
[ 3222.537651] EEH: PE#800000 (PCI 0384:80:00.1): Invoking 
bnx2x->error_detected(IO frozen)
[ 3222.537651] bnx2x: [bnx2x_io_error_detected:14236(eth13)]IO error 
detected
[ 3222.537729] EEH: PE#800000 (PCI 0384:80:00.1): bnx2x driver reports: 
'need reset'
[ 3222.537729] EEH: Finished:'error_detected(IO frozen)' with aggregate 
recovery state:'need reset'
[ 3222.537890] EEH: Collect temporary log
[ 3222.583481] EEH: of node=0384:80:00.0
[ 3222.583519] EEH: PCI device/vendor: 168e14e4
[ 3222.583557] EEH: PCI cmd/status register: 00100140
[ 3222.583557] EEH: PCI-E capabilities and status follow:
[ 3222.583744] EEH: PCI-E 00: 00020010 012c8da2 00095d5e 00455c82
[ 3222.583892] EEH: PCI-E 10: 10820000 00000000 00000000 00000000
[ 3222.583893] EEH: PCI-E 20: 00000000
[ 3222.583893] EEH: PCI-E AER capability register set follows:
[ 3222.584079] EEH: PCI-E AER 00: 13c10001 00000000 00000000 00062030
[ 3222.584230] EEH: PCI-E AER 10: 00002000 000031c0 000001e0 00000000
[ 3222.584378] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000
[ 3222.584416] EEH: PCI-E AER 30: 00000000 00000000
[ 3222.584416] EEH: of node=0384:80:00.1
[ 3222.584454] EEH: PCI device/vendor: 168e14e4
[ 3222.584491] EEH: PCI cmd/status register: 00100140
[ 3222.584492] EEH: PCI-E capabilities and status follow:
[ 3222.584677] EEH: PCI-E 00: 00020010 012c8da2 00095d5e 00455c82
[ 3222.584825] EEH: PCI-E 10: 10820000 00000000 00000000 00000000
[ 3222.584826] EEH: PCI-E 20: 00000000
[ 3222.584826] EEH: PCI-E AER capability register set follows:
[ 3222.585011] EEH: PCI-E AER 00: 13c10001 00000000 00000000 00062030
[ 3222.585160] EEH: PCI-E AER 10: 00002000 000031c0 000001e0 00000000
[ 3222.585309] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000
[ 3222.585347] EEH: PCI-E AER 30: 00000000 00000000
[ 3222.586872] RTAS: event: 5, Type: Platform Error (224), Severity: 2
[ 3222.586873] EEH: Reset without hotplug activity
[ 3224.762767] EEH: Beginning: 'slot_reset'
[ 3224.762770] EEH: PE#800000 (PCI 0384:80:00.0): Invoking 
bnx2x->slot_reset()
[ 3224.762771] bnx2x: [bnx2x_io_slot_reset:14271(eth14)]IO slot reset 
initializing...
[ 3224.762887] bnx2x 0384:80:00.0: enabling device (0140 -> 0142)
[ 3224.768157] bnx2x: [bnx2x_io_slot_reset:14287(eth14)]IO slot reset 
--> driver unload

Uninterruptable tasks
=====================
crash> ps | grep UN
     213      2  11  c000000004c89e00  UN   0.0       0      0  [eehd]
     215      2   0  c000000004c80000  UN   0.0       0      0 
[kworker/0:2]
    2196      1  28  c000000004504f00  UN   0.1   15936  11136  wickedd
    4287      1   9  c00000020d076800  UN   0.0    4032   3008  agetty
    4289      1  20  c00000020d056680  UN   0.0    7232   3840  agetty
   32423      2  26  c00000020038c580  UN   0.0       0      0 
[kworker/26:3]
   32871   4241  27  c0000002609ddd00  UN   0.1   18624  11648  sshd
   32920  10130  16  c00000027284a100  UN   0.1   48512  12608  sendmail
   33092  32987   0  c000000205218b00  UN   0.1   48512  12608  sendmail
   33154   4567  16  c000000260e51780  UN   0.1   48832  12864  pickup
   33209   4241  36  c000000270cb6500  UN   0.1   18624  11712  sshd
   33473  33283   0  c000000205211480  UN   0.1   48512  12672  sendmail
   33531   4241  37  c00000023c902780  UN   0.1   18624  11648  sshd

EEH handler hung while bnx2x sleeping and holding RTNL lock
===========================================================
crash> bt 213
PID: 213    TASK: c000000004c89e00  CPU: 11  COMMAND: "eehd"
  #0 [c000000004d477e0] __schedule at c000000000c70808
  #1 [c000000004d478b0] schedule at c000000000c70ee0
  #2 [c000000004d478e0] schedule_timeout at c000000000c76dec
  #3 [c000000004d479c0] msleep at c0000000002120cc
  #4 [c000000004d479f0] napi_disable at c000000000a06448
                                        ^^^^^^^^^^^^^^^^
  #5 [c000000004d47a30] bnx2x_netif_stop at c0080000018dba94 [bnx2x]
  #6 [c000000004d47a60] bnx2x_io_slot_reset at c0080000018a551c [bnx2x]
  #7 [c000000004d47b20] eeh_report_reset at c00000000004c9bc
  #8 [c000000004d47b90] eeh_pe_report at c00000000004d1a8
  #9 [c000000004d47c40] eeh_handle_normal_event at c00000000004da64
#10 [c000000004d47d00] eeh_event_handler at c00000000004e330
#11 [c000000004d47db0] kthread at c00000000017583c
#12 [c000000004d47e20] ret_from_kernel_thread at c00000000000cda8

And the sleeping source code
============================
crash> dis -ls c000000000a06448
FILE: ../net/core/dev.c
LINE: 6702

   6697  {
   6698          might_sleep();
   6699          set_bit(NAPI_STATE_DISABLE, &n->state);
   6700
   6701          while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
* 6702                  msleep(1);
   6703          while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
   6704                  msleep(1);
   6705
   6706          hrtimer_cancel(&n->timer);
   6707
   6708          clear_bit(NAPI_STATE_DISABLE, &n->state);
   6709  }

EEH calls into bnx2x twice based on the system log above, first through 
bnx2x_io_error_detected() and then bnx2x_io_slot_reset(), and executes 
the following call chains:

bnx2x_io_error_detected()
  +-> bnx2x_eeh_nic_unload()
       +-> bnx2x_del_all_napi()
            +-> __netif_napi_del()

bnx2x_io_slot_reset()
  +-> bnx2x_netif_stop()
       +-> bnx2x_napi_disable()
            +->napi_disable()

I'm suspicious of the napi_disable() following the __netif_napi_del(), 
based on my read of the NAPI API 
(https://wiki.linuxfoundation.org/networking/napi).  Though not 
explicitly documented, it seems like disabling NAPI after deleting NAPI 
resources is a bad thing.  Am I mistaken in my understanding of the NAPI 
API or is there a bug here?  If the latter, how can we fix it?

David Christensen

^ permalink raw reply	[flat|nested] 3+ messages in thread