wilc1000 kernel crash

From: Michael Walle @ 2022-10-24 13:54 UTC
  To: Ajay Singh, Claudiu Beznea; +Cc: linux-wireless, Kalle Valo

Hi,

I'm using the WILC1000 wifi chip in SDIO mode together with
NetworkManager, which seems to be probing the network in the
background. What I am seeing is a kernel oops while the driver's
workqueue is being processed.

This is on kernel 5.15.74. It also happens with the latest next,
though less often, I guess due to different timing.

My reduced steps to reproduce are the following:
  $ while true; do ifconfig wlan0 up; iw dev wlan0 scan & \
      ifconfig wlan0 down; done

After a while I get the following splat:

[  487.955326] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[  487.955363] Mem abort info:
[  487.955366]   ESR = 0x96000004
[  487.955370]   EC = 0x25: DABT (current EL), IL = 32 bits
[  487.965939] FW not responding
[  487.971033]   SET = 0, FnV = 0
[  487.971039]   EA = 0, S1PTW = 0
[  487.971043]   FSC = 0x04: level 0 translation fault
[  487.971047] Data abort info:
[  487.971050]   ISV = 0, ISS = 0x00000004
[  487.971053]   CM = 0, WnR = 0
[  487.971059] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000497b0000
[  487.971066] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[  487.971085] Internal error: Oops: 96000004 [#1] SMP
[  487.971094] Modules linked in:
[  487.971104] CPU: 1 PID: 9 Comm: kworker/u8:0 Not tainted 5.15.74-00013-g2d5897cb12ef #130
[  487.971113] Hardware name: NXP i.MX8MNano DDR3L EVK board (DT)
[  487.971122] Workqueue: WILC_wq handle_rcvd_ntwrk_info
[  488.035377] wilc1000_sdio mmc1:0001:1: chipid (001003a0)
[  487.971085] Internal error: Oops:
[  488.041180] 96000004 [#1] SMP
[  488.041186] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  488.041196] pc : handle_rcvd_ntwrk_info+0x7c/0xc4
[  488.041208] lr : handle_rcvd_ntwrk_info+0x70/0xc4
[  488.049128] wilc1000_sdio mmc1:0001:1: has_thrpt_enh3 = 1...
[  488.057273] sp : ffff80000a20bd70
[  488.057277] x29: ffff80000a20bd70 x28: 0000000000000000 x27: 0000000000000000
[  488.057289] x26: ffff000000118470 x25: ffff000005059d05 x24: ffff00000de94d30
[  488.057299] x23: 0000000000000000
[  488.062670] wilc1000_sdio mmc1:0001:1 wlan0: ChipID [1003a0] loading firmware [atmel/wilc1000_wifi_firmware-1.bin]
[  488.070418]  x22: ffff000005059d00 x21: 0000000000000000
[  488.070428] x20: ffff00000de94d00 x19: ffff00000de94d28 x18: 0000000000000000
[  488.070440] x17: 0000000000000000 x16: 0000000000000000 x15: a4270000a4030001
[  488.070450] x14: 010102f2500018dd x13: 0018dd0000010002 x12: 0546000000000000
[  488.070461] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff800008ad92a0
[  488.150644] x8 : fefefefefefefeff x7 : 0000000000000018 x6 : ffff0000063e88c0
[  488.157799] x5 : 0000000000000000 x4 : 0000000000000003 x3 : 0000000000000000
[  488.164947] x2 : 0000000000000000 x1 : 0000000000000001 x0 : 0000000000000001
[  488.172095] Call trace:
[  488.174548]  handle_rcvd_ntwrk_info+0x7c/0xc4
[  488.175400] FW not responding
[  488.178927]  process_one_work+0x1ec/0x48c
[  488.178941]  worker_thread+0x170/0x564
[  488.178948]  kthread+0x128/0x13c
[  488.178959]  ret_from_fork+0x10/0x20
[  488.185280] FW not responding
[  488.185958] Code: 9415ea8e b4000060 39400401 35000201 (f94002a3)
[  488.192042] FW not responding
[  488.192957] ---[ end trace fa915dc840cf0355 ]---
[  488.199700] FW not responding
[  488.205601] Kernel panic - not syncing: Oops: Fatal exception
[  488.205608] SMP: stopping secondary CPUs
[  488.205630] Kernel Offset: disabled
[  488.205634] CPU features: 0x00002001,20000846
[  488.205642] Memory Limit: none

In handle_rcvd_ntwrk_info(), scan_req->scan_result isn't valid anymore,
although it doesn't contain NULL. Thus the driver is calling into a
bogus function pointer. There seems to be no locking between the
asynchronous work items queued on the workqueue (wilc_enqueue_work())
and the teardown that runs when the interface is disabled
(wilc_deinit()). wilc_deinit() frees the host_if_drv object, which
might still be in use by a work item running in workqueue context.
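
To make the suspected ordering problem concrete, here is a minimal
userspace sketch using pthreads. It is not the driver's code and not a
proposed patch. Every name in it (fake_drv, fake_wq, fake_enqueue(),
fake_deinit(), demo_run()) is made up for illustration. It only shows
the general pattern: teardown drains and stops the worker first, and
frees the object the work items point to afterwards. In the kernel the
draining step would presumably be something like flush_workqueue() or
cancel_work_sync(), but whether that alone is sufficient for wilc1000
is an assumption.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for host_if_drv: it owns the scan callback. */
struct fake_drv {
	void (*scan_result)(struct fake_drv *drv);
	int results_seen;
};

/* Hypothetical stand-in for one queued work item. */
struct fake_work {
	struct fake_drv *drv;
	struct fake_work *next;
};

/* Hypothetical single-threaded work queue. */
struct fake_wq {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	struct fake_work *head;
	bool stopping;
	pthread_t thread;
};

static void count_result(struct fake_drv *drv)
{
	drv->results_seen++;
}

/* Worker: runs queued items, exits once stopping is set and the queue is empty. */
static void *fake_worker(void *arg)
{
	struct fake_wq *wq = arg;

	pthread_mutex_lock(&wq->lock);
	for (;;) {
		while (!wq->head && !wq->stopping)
			pthread_cond_wait(&wq->cond, &wq->lock);
		if (!wq->head)
			break;	/* stopping and fully drained */

		struct fake_work *w = wq->head;
		wq->head = w->next;
		pthread_mutex_unlock(&wq->lock);

		/* This call is only safe while drv has not been freed. */
		w->drv->scan_result(w->drv);
		free(w);

		pthread_mutex_lock(&wq->lock);
	}
	pthread_mutex_unlock(&wq->lock);
	return NULL;
}

static int fake_enqueue(struct fake_wq *wq, struct fake_drv *drv)
{
	struct fake_work *w = malloc(sizeof(*w));

	if (!w)
		return -1;
	w->drv = drv;
	pthread_mutex_lock(&wq->lock);
	w->next = wq->head;
	wq->head = w;
	pthread_cond_signal(&wq->cond);
	pthread_mutex_unlock(&wq->lock);
	return 0;
}

/* Teardown: drain and stop the worker first, free drv only afterwards. */
static int fake_deinit(struct fake_wq *wq, struct fake_drv *drv)
{
	pthread_mutex_lock(&wq->lock);
	wq->stopping = true;
	pthread_cond_signal(&wq->cond);
	pthread_mutex_unlock(&wq->lock);
	pthread_join(wq->thread, NULL);

	assert(wq->head == NULL);	/* nothing left that could touch drv */
	int seen = drv->results_seen;
	free(drv);
	return seen;
}

/* Queue n items, tear down, return how many callbacks ran (-1 on setup error). */
int demo_run(int n)
{
	struct fake_wq wq = { .head = NULL, .stopping = false };
	struct fake_drv *drv = calloc(1, sizeof(*drv));

	if (!drv)
		return -1;
	drv->scan_result = count_result;
	pthread_mutex_init(&wq.lock, NULL);
	pthread_cond_init(&wq.cond, NULL);
	if (pthread_create(&wq.thread, NULL, fake_worker, &wq)) {
		free(drv);
		return -1;
	}
	for (int i = 0; i < n; i++)
		fake_enqueue(&wq, drv);

	int seen = fake_deinit(&wq, drv);

	pthread_cond_destroy(&wq.cond);
	pthread_mutex_destroy(&wq.lock);
	return seen;
}
```

Built with -pthread, demo_run(n) runs every queued callback before the
object is freed. Moving the free(drv) in fake_deinit() ahead of the
pthread_join() would recreate the kind of stale-pointer call described
above.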

BTW, ignore the "FW not responding" messages for now; that seems to be
a different problem.

-michael
