* bad unlock balance WARNING at nvme/045 @ 2022-10-18 8:03 Shinichiro Kawasaki 2022-10-18 10:57 ` Sagi Grimberg 0 siblings, 1 reply; 8+ messages in thread From: Shinichiro Kawasaki @ 2022-10-18 8:03 UTC (permalink / raw) To: linux-nvme, Hannes Reinecke; +Cc: Damien Le Moal Hello Hannes, I observed "WARNING: bad unlock balance detected!" at nvme/045 [1]. As the Call Trace shows, nvme_auth_reset() has unbalanced mutex lock/unlock. mutex_lock(&ctrl->dhchap_auth_mutex); list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) { mutex_unlock(&ctrl->dhchap_auth_mutex); flush_work(&chap->auth_work); __nvme_auth_reset(chap); } mutex_unlock(&ctrl->dhchap_auth_mutex); I tried to remove the mutex_unlock in the list iteration with a patch [2], but it resulted in another "WARNING: possible recursive locking detected" [3]. I'm not sure but cause of this WARN could be __nvme_auth_work and nvme_dhchap_auth_work in same nvme_wq. Could you take a look for fix? [1] [ 89.883480] loop: module loaded [ 90.271575] run blktests nvme/045 at 2022-10-18 15:13:33 [ 90.774324] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 [ 91.410877] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid: with DH-HMAC-CHAP. [ 91.454315] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048 [ 91.456445] nvme nvme6: qid 0: controller authenticated [ 91.458581] nvme nvme6: qid 0: authenticated [ 91.466122] nvme nvme6: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices. [ 91.469709] nvme nvme6: creating 4 I/O queues. [ 91.620115] nvme nvme6: new ctrl: "blktests-subsystem-1" [ 93.775756] ===================================== [ 93.776961] WARNING: bad unlock balance detected! 
[ 93.778088] 6.1.0-rc1 #3 Not tainted [ 93.779011] ------------------------------------- [ 93.780090] check/961 is trying to release lock (&ctrl->dhchap_auth_mutex) at: [ 93.781726] [<ffffffffc03bf28b>] nvme_auth_reset+0x5b/0xb0 [nvme_core] [ 93.783310] but there are no more locks to release! [ 93.784461] other info that might help us debug this: [ 93.786087] 3 locks held by check/961: [ 93.787098] #0: ffff8881135fa460 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0xe7/0x1b0 [ 93.788923] #1: ffff88811fb1d888 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x21d/0x530 [ 93.790888] #2: ffff88810d8160f0 (kn->active#110){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x241/0x530 [ 93.792960] stack backtrace: [ 93.794168] CPU: 1 PID: 961 Comm: check Not tainted 6.1.0-rc1 #3 [ 93.795552] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 [ 93.798016] Call Trace: [ 93.798716] <TASK> [ 93.799298] dump_stack_lvl+0x5b/0x77 [ 93.800251] lock_release.cold+0x10/0x4e [ 93.801244] ? nvme_auth_reset+0x5b/0xb0 [nvme_core] [ 93.802479] ? reacquire_held_locks+0x4e0/0x4e0 [ 93.803537] ? mark_held_locks+0x9e/0xe0 [ 93.804544] __mutex_unlock_slowpath+0x8c/0x5f0 [ 93.805650] ? kasan_quarantine_put+0x94/0x1f0 [ 93.806759] ? bit_wait_timeout+0x170/0x170 [ 93.807757] ? __nvme_auth_reset+0x198/0x3d0 [nvme_core] [ 93.809010] ? __kmem_cache_free+0xa9/0x390 [ 93.810059] nvme_auth_reset+0x5b/0xb0 [nvme_core] [ 93.811253] nvme_ctrl_dhchap_secret_store+0x1b4/0x1d0 [nvme_core] [ 93.812710] kernfs_fop_write_iter+0x356/0x530 [ 93.813754] vfs_write+0x519/0xc50 [ 93.814513] ? kernel_write+0x590/0x590 [ 93.815410] ? __up_read+0x182/0x700 [ 93.816252] ? __fget_light+0x51/0x230 [ 93.817092] ksys_write+0xe7/0x1b0 [ 93.817888] ? __ia32_sys_read+0xa0/0xa0 [ 93.818750] ? lockdep_hardirqs_on_prepare+0x17b/0x410 [ 93.819901] ? syscall_enter_from_user_mode+0x22/0xc0 [ 93.820929] ? 
lockdep_hardirqs_on+0x7d/0x100 [ 93.821848] do_syscall_64+0x37/0x90 [ 93.822583] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 93.823615] RIP: 0033:0x7f6f84d018f7 [ 93.824372] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 [ 93.827699] RSP: 002b:00007ffcdb0ce5d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 93.829137] RAX: ffffffffffffffda RBX: 000000000000003c RCX: 00007f6f84d018f7 [ 93.830502] RDX: 000000000000003c RSI: 00005582004ce6e0 RDI: 0000000000000001 [ 93.831879] RBP: 00005582004ce6e0 R08: 0000000000000000 R09: 0000000000000073 [ 93.833236] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000003c [ 93.834594] R13: 00007f6f84df9780 R14: 000000000000003c R15: 00007f6f84df49e0 [ 93.835979] </TASK> [ 93.836823] nvme nvme6: re-authenticating controller [ 93.871885] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048 [ 93.873754] nvme nvme6: qid 0: controller authenticated [ 93.916346] nvme nvme6: re-authenticating controller [ 93.948156] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048 [ 93.949498] nvme nvme6: qid 0: controller authenticated [ 93.992725] nvme nvme6: re-authenticating controller [ 94.025843] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048 [ 94.027160] nvme nvme6: qid 0: controller authenticated [ 98.836070] nvme nvme6: re-authenticating controller [ 99.345893] nvme nvme6: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe8192 [ 99.347859] nvme nvme6: qid 0: controller authenticated [ 99.736321] nvme nvme6: re-authenticating controller [ 100.014469] nvme nvme6: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe8192 [ 100.015192] nvme nvme6: qid 0: controller authenticated [ 101.560718] nvme nvme6: Removing ctrl: NQN "blktests-subsystem-1" [2] diff --git a/drivers/nvme/host/auth.c 
b/drivers/nvme/host/auth.c index c8a6db7c4498..4e824aab30eb 100644 --- a/drivers/nvme/host/auth.c +++ b/drivers/nvme/host/auth.c @@ -926,7 +926,6 @@ void nvme_auth_reset(struct nvme_ctrl *ctrl) mutex_lock(&ctrl->dhchap_auth_mutex); list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) { - mutex_unlock(&ctrl->dhchap_auth_mutex); flush_work(&chap->auth_work); __nvme_auth_reset(chap); } [3] [ 2678.264545] loop: module loaded [ 2678.679919] run blktests nvme/045 at 2022-10-18 16:34:00 [ 2679.231157] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 [ 2679.854789] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid: with DH-HMAC-CHAP. [ 2679.898208] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048 [ 2679.900508] nvme nvme6: qid 0: controller authenticated [ 2679.902183] nvme nvme6: qid 0: authenticated [ 2679.907944] nvme nvme6: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices. [ 2679.912737] nvme nvme6: creating 4 I/O queues. 
[ 2680.068539] nvme nvme6: new ctrl: "blktests-subsystem-1" [ 2682.156749] nvme nvme6: re-authenticating controller [ 2682.160640] ============================================ [ 2682.161903] WARNING: possible recursive locking detected [ 2682.163112] 6.1.0-rc1+ #4 Not tainted [ 2682.164063] -------------------------------------------- [ 2682.165319] kworker/u8:0/927 is trying to acquire lock: [ 2682.166573] ffff888119d27138 ((wq_completion)nvme-wq){+.+.}-{0:0}, at: __flush_work+0x40e/0x900 [ 2682.168529] but task is already holding lock: [ 2682.170022] ffff888119d27138 ((wq_completion)nvme-wq){+.+.}-{0:0}, at: process_one_work+0x73c/0x1300 [ 2682.172088] other info that might help us debug this: [ 2682.173695] Possible unsafe locking scenario: [ 2682.175196] CPU0 [ 2682.175908] ---- [ 2682.176559] lock((wq_completion)nvme-wq); [ 2682.177589] lock((wq_completion)nvme-wq); [ 2682.178631] *** DEADLOCK *** [ 2682.180252] May be due to missing lock nesting notation [ 2682.181918] 3 locks held by kworker/u8:0/927: [ 2682.182988] #0: ffff888119d27138 ((wq_completion)nvme-wq){+.+.}-{0:0}, at: process_one_work+0x73c/0x1300 [ 2682.185078] #1: ffff88810c8bfdd0 ((work_completion)(&ctrl->dhchap_auth_work)){+.+.}-{0:0}, at: process_one_work+0x769/0x1300 [ 2682.187535] #2: ffffffffb483d560 (rcu_read_lock){....}-{1:2}, at: __flush_work+0xc2/0x900 [ 2682.189373] stack backtrace: [ 2682.190590] CPU: 2 PID: 927 Comm: kworker/u8:0 Not tainted 6.1.0-rc1+ #4 [ 2682.192118] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 [ 2682.194630] Workqueue: nvme-wq nvme_dhchap_auth_work [nvme_core] [ 2682.196108] Call Trace: [ 2682.196755] <TASK> [ 2682.197290] dump_stack_lvl+0x5b/0x77 [ 2682.198143] __lock_acquire.cold+0x36f/0x3f5 [ 2682.198510] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048 [ 2682.199075] ? lock_chain_count+0x20/0x20 [ 2682.199089] ? 
lockdep_hardirqs_on_prepare+0x410/0x410 [ 2682.200949] nvme nvme6: qid 0: controller authenticated [ 2682.201771] lock_acquire+0x194/0x4f0 [ 2682.201783] ? __flush_work+0x40e/0x900 [ 2682.201796] ? lock_downgrade+0x6b0/0x6b0 [ 2682.201808] ? mark_held_locks+0x9e/0xe0 [ 2682.207078] ? lockdep_hardirqs_on_prepare+0x17b/0x410 [ 2682.208041] __flush_work+0x42e/0x900 [ 2682.208736] ? __flush_work+0x40e/0x900 [ 2682.209507] ? queue_delayed_work_on+0x90/0x90 [ 2682.210322] ? flush_workqueue_prep_pwqs+0x3f0/0x3f0 [ 2682.211209] nvme_dhchap_auth_work+0xf1/0x1f8 [nvme_core] [ 2682.212163] process_one_work+0x816/0x1300 [ 2682.212858] ? lock_downgrade+0x6b0/0x6b0 [ 2682.213589] ? pwq_dec_nr_in_flight+0x230/0x230 [ 2682.214367] ? rwlock_bug.part.0+0x90/0x90 [ 2682.215062] worker_thread+0xfc/0x1270 [ 2682.215670] ? __kthread_parkme+0xc1/0x1f0 [ 2682.216367] ? process_one_work+0x1300/0x1300 [ 2682.217052] kthread+0x29b/0x340 [ 2682.217620] ? kthread_complete_and_exit+0x20/0x20 [ 2682.218313] ret_from_fork+0x1f/0x30 [ 2682.218922] </TASK> [ 2682.254321] nvme nvme6: re-authenticating controller [ 2682.292068] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048 [ 2682.293937] nvme nvme6: qid 0: controller authenticated [ 2682.334786] nvme nvme6: re-authenticating controller [ 2682.364032] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048 [ 2682.365898] nvme nvme6: qid 0: controller authenticated [ 2686.934974] nvme nvme6: re-authenticating controller [ 2687.345435] nvme nvme6: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe8192 [ 2687.347055] nvme nvme6: qid 0: controller authenticated [ 2687.738816] nvme nvme6: re-authenticating controller [ 2688.133101] nvme nvme6: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe8192 [ 2688.135123] nvme nvme6: qid 0: controller authenticated [ 2689.624308] nvme nvme6: Removing ctrl: NQN "blktests-subsystem-1" [ 2886.071612] loop: module loaded [ 2886.269473] run blktests nvme/045 at 
2022-10-18 16:37:28 [ 2886.533625] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 [ 2886.726637] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid: with DH-HMAC-CHAP. [ 2886.763826] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048 [ 2886.765778] nvme nvme6: qid 0: controller authenticated [ 2886.767199] nvme nvme6: qid 0: authenticated [ 2886.769632] nvme nvme6: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices. [ 2886.772454] nvme nvme6: creating 4 I/O queues. [ 2886.918047] nvme nvme6: new ctrl: "blktests-subsystem-1" [ 2888.496721] nvme nvme6: re-authenticating controller [ 2888.532075] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048 [ 2888.534031] nvme nvme6: qid 0: controller authenticated [ 2888.583146] nvme nvme6: re-authenticating controller [ 2888.615654] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048 [ 2888.617588] nvme nvme6: qid 0: controller authenticated [ 2888.656311] nvme nvme6: re-authenticating controller [ 2888.683953] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048 [ 2888.685868] nvme nvme6: qid 0: controller authenticated [ 2888.724955] nvme nvme6: re-authenticating controller [ 2889.200478] nvme nvme6: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe8192 [ 2889.202425] nvme nvme6: qid 0: controller authenticated [ 2889.625949] nvme nvme6: re-authenticating controller [ 2889.992842] nvme nvme6: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe8192 [ 2889.994815] nvme nvme6: qid 0: controller authenticated [ 2891.514773] nvme nvme6: Removing ctrl: NQN "blktests-subsystem-1" -- Shin'ichiro Kawasaki ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: bad unlock balance WARNING at nvme/045 2022-10-18 8:03 bad unlock balance WARNING at nvme/045 Shinichiro Kawasaki @ 2022-10-18 10:57 ` Sagi Grimberg 2022-10-26 2:13 ` Shinichiro Kawasaki 2022-10-26 6:42 ` Hannes Reinecke 0 siblings, 2 replies; 8+ messages in thread From: Sagi Grimberg @ 2022-10-18 10:57 UTC (permalink / raw) To: Shinichiro Kawasaki, linux-nvme, Hannes Reinecke; +Cc: Damien Le Moal > Hello Hannes, > > I observed "WARNING: bad unlock balance detected!" at nvme/045 [1]. As the Call > Trace shows, nvme_auth_reset() has unbalanced mutex lock/unlock. > > mutex_lock(&ctrl->dhchap_auth_mutex); > list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) { > mutex_unlock(&ctrl->dhchap_auth_mutex); > flush_work(&chap->auth_work); > __nvme_auth_reset(chap); > } > mutex_unlock(&ctrl->dhchap_auth_mutex); > > I tried to remove the mutex_unlock in the list iteration with a patch [2], but > it resulted in another "WARNING: possible recursive locking detected" [3]. I'm > not sure but cause of this WARN could be __nvme_auth_work and > nvme_dhchap_auth_work in same nvme_wq. > > Could you take a look for fix? I'm looking at the code and I think that the way the concurrent negotiations and how dhchap_auth_mutex is handled is very fragile, also why should the per-queue auth_work hold the controller-wide dhchap_auth_mutex? The only reason I see is because nvme_auth_negotiate is checking if the chap context is already queued? Why should we allow that? I'd suggest to splice dhchap_auth_list, to a local list and then just flush nvmet_wq in teardown flows. Same for renegotiations/reset flows. And we should prevent for the double-queuing of chap negotiations to begin with, instead of handling them (I still don't understand why this is permitted, but perhaps just return EBUSY in this case?) ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bad unlock balance WARNING at nvme/045
From: Shinichiro Kawasaki @ 2022-10-26 2:13 UTC
To: Sagi Grimberg; +Cc: linux-nvme, Hannes Reinecke, Damien Le Moal

On Oct 18, 2022 / 13:57, Sagi Grimberg wrote:
> 
> > Hello Hannes,
> >
> > I observed "WARNING: bad unlock balance detected!" at nvme/045 [1]. As the Call
> > Trace shows, nvme_auth_reset() has unbalanced mutex lock/unlock.
> >
> >     mutex_lock(&ctrl->dhchap_auth_mutex);
> >     list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
> >             mutex_unlock(&ctrl->dhchap_auth_mutex);
> >             flush_work(&chap->auth_work);
> >             __nvme_auth_reset(chap);
> >     }
> >     mutex_unlock(&ctrl->dhchap_auth_mutex);
> >
> > I tried to remove the mutex_unlock in the list iteration with a patch [2], but
> > it resulted in another "WARNING: possible recursive locking detected" [3]. I'm
> > not sure but cause of this WARN could be __nvme_auth_work and
> > nvme_dhchap_auth_work in same nvme_wq.
> >
> > Could you take a look for fix?

Sagi, thank you for the comments.

> I'm looking at the code and I think that the way the concurrent
> negotiations and how dhchap_auth_mutex is handled is very fragile,
> also why should the per-queue auth_work hold the controller-wide
> dhchap_auth_mutex?

It looks to me that the dhchap_auth_mutex lock in nvme_auth_negotiate() simply
guards ctrl->dhchap_auth_list access (look up the list and add an element to
the list).

> The only reason I see is because nvme_auth_negotiate
> is checking if the chap context is already queued? Why should we
> allow that?

I'm not sure about this. I would like to ask other experts' comments on it.

> 
> I'd suggest to splice dhchap_auth_list, to a local list and then just
> flush nvmet_wq in teardown flows. Same for renegotiations/reset flows.

Thanks for the suggestion.
I'm not sure how to use a list splice of dhchap_auth_list at renegotiation.
But as for reset by nvme_auth_reset(), your suggestion looks good to me to
avoid the unbalanced lock. It can be applied to the teardown by
nvme_auth_free() also. I created a patch [1] with this approach. It just frees
the dhchap contexts at reset and does not reuse them for negotiation. I think
this is fine since the contexts are re-allocated at renegotiation.

> And we should prevent for the double-queuing of chap negotiations to
> begin with, instead of handling them (I still don't understand why this
> is permitted, but perhaps just return EBUSY in this case?)

Again, I'm not sure about this point.

I confirmed that the patch [1] avoids the unbalanced lock. However, the other
"WARNING: possible recursive locking detected" was still reported. I created
another patch to add a workqueue dedicated to nvme auth, which avoided that
WARN. Are these fix approaches OK? If so, I can post the two patches for the
two WARNs.

[1]

diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
index c8a6db7c4498..c71f1be7d403 100644
--- a/drivers/nvme/host/auth.c
+++ b/drivers/nvme/host/auth.c
@@ -920,17 +920,25 @@ int nvme_auth_wait(struct nvme_ctrl *ctrl, int qid)
 }
 EXPORT_SYMBOL_GPL(nvme_auth_wait);
 
-void nvme_auth_reset(struct nvme_ctrl *ctrl)
+static void nvme_auth_free_queue_contexts(struct nvme_ctrl *ctrl)
 {
-	struct nvme_dhchap_queue_context *chap;
+	struct nvme_dhchap_queue_context *chap = NULL, *tmp;
+	LIST_HEAD(splice);
 
 	mutex_lock(&ctrl->dhchap_auth_mutex);
-	list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
-		mutex_unlock(&ctrl->dhchap_auth_mutex);
+	list_splice_init(&ctrl->dhchap_auth_list, &splice);
+	mutex_unlock(&ctrl->dhchap_auth_mutex);
+
+	list_for_each_entry_safe(chap, tmp, &splice, entry) {
+		list_del_init(&chap->entry);
 		flush_work(&chap->auth_work);
-		__nvme_auth_reset(chap);
+		__nvme_auth_free(chap);
 	}
-	mutex_unlock(&ctrl->dhchap_auth_mutex);
+}
+
+void nvme_auth_reset(struct nvme_ctrl *ctrl)
+{
+	nvme_auth_free_queue_contexts(ctrl);
 }
 EXPORT_SYMBOL_GPL(nvme_auth_reset);
 
@@ -996,15 +1004,8 @@ EXPORT_SYMBOL_GPL(nvme_auth_stop);
 
 void nvme_auth_free(struct nvme_ctrl *ctrl)
 {
-	struct nvme_dhchap_queue_context *chap = NULL, *tmp;
+	nvme_auth_free_queue_contexts(ctrl);
 
-	mutex_lock(&ctrl->dhchap_auth_mutex);
-	list_for_each_entry_safe(chap, tmp, &ctrl->dhchap_auth_list, entry) {
-		list_del_init(&chap->entry);
-		flush_work(&chap->auth_work);
-		__nvme_auth_free(chap);
-	}
-	mutex_unlock(&ctrl->dhchap_auth_mutex);
 	if (ctrl->host_key) {
 		nvme_auth_free_key(ctrl->host_key);
 		ctrl->host_key = NULL;

-- 
Shin'ichiro Kawasaki
* Re: bad unlock balance WARNING at nvme/045 2022-10-18 10:57 ` Sagi Grimberg 2022-10-26 2:13 ` Shinichiro Kawasaki @ 2022-10-26 6:42 ` Hannes Reinecke 2022-10-26 12:01 ` Shinichiro Kawasaki 2022-10-26 12:27 ` Sagi Grimberg 1 sibling, 2 replies; 8+ messages in thread From: Hannes Reinecke @ 2022-10-26 6:42 UTC (permalink / raw) To: Sagi Grimberg, Shinichiro Kawasaki, linux-nvme; +Cc: Damien Le Moal On 10/18/22 12:57, Sagi Grimberg wrote: > >> Hello Hannes, >> >> I observed "WARNING: bad unlock balance detected!" at nvme/045 [1]. As >> the Call >> Trace shows, nvme_auth_reset() has unbalanced mutex lock/unlock. >> >> mutex_lock(&ctrl->dhchap_auth_mutex); >> list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) { >> mutex_unlock(&ctrl->dhchap_auth_mutex); >> flush_work(&chap->auth_work); >> __nvme_auth_reset(chap); >> } >> mutex_unlock(&ctrl->dhchap_auth_mutex); >> >> I tried to remove the mutex_unlock in the list iteration with a patch >> [2], but >> it resulted in another "WARNING: possible recursive locking detected" >> [3]. I'm >> not sure but cause of this WARN could be __nvme_auth_work and >> nvme_dhchap_auth_work in same nvme_wq. >> >> Could you take a look for fix? > > I'm looking at the code and I think that the way the concurrent > negotiations and how dhchap_auth_mutex is handled is very fragile, > also why should the per-queue auth_work hold the controller-wide > dhchap_auth_mutex? The only reason I see is because nvme_auth_negotiate > is checking if the chap context is already queued? Why should we > allow that? > Well; that's partially due to the internal design of linux-nvme. The controller structure itself doesn't have 'queues' per se; there just is a general 'ctrl' structure. So while I would have loved to have a per-queue structure to hook the chap authentication into, all I have is the controller structure. Hence we have a controller-wide list holding all 'chap' structures for the individual queues. 
Hence the controller-wide mutex to gate list modifications. > I'd suggest to splice dhchap_auth_list, to a local list and then just > flush nvmet_wq in teardown flows. Same for renegotiations/reset flows. > And we should prevent for the double-queuing of chap negotiations to > begin with, instead of handling them (I still don't understand why this > is permitted, but perhaps just return EBUSY in this case?) We don't double queue; we're re-using the existing entries. Can you check if this fix works? diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c index c8a6db7c4498..4e824aab30eb 100644 --- a/drivers/nvme/host/auth.c +++ b/drivers/nvme/host/auth.c @@ -926,7 +926,6 @@ void nvme_auth_reset(struct nvme_ctrl *ctrl) mutex_lock(&ctrl->dhchap_auth_mutex); list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) { - mutex_unlock(&ctrl->dhchap_auth_mutex); flush_work(&chap->auth_work); __nvme_auth_reset(chap); } Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@suse.de +49 911 74053 688 SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), GF: Felix Imendörffer ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: bad unlock balance WARNING at nvme/045 2022-10-26 6:42 ` Hannes Reinecke @ 2022-10-26 12:01 ` Shinichiro Kawasaki 2022-10-26 12:38 ` Sagi Grimberg 2022-10-26 12:27 ` Sagi Grimberg 1 sibling, 1 reply; 8+ messages in thread From: Shinichiro Kawasaki @ 2022-10-26 12:01 UTC (permalink / raw) To: Hannes Reinecke; +Cc: Sagi Grimberg, linux-nvme, Damien Le Moal On Oct 26, 2022 / 08:42, Hannes Reinecke wrote: > On 10/18/22 12:57, Sagi Grimberg wrote: > > > > > Hello Hannes, > > > > > > I observed "WARNING: bad unlock balance detected!" at nvme/045 [1]. > > > As the Call > > > Trace shows, nvme_auth_reset() has unbalanced mutex lock/unlock. > > > > > > mutex_lock(&ctrl->dhchap_auth_mutex); > > > list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) { > > > mutex_unlock(&ctrl->dhchap_auth_mutex); > > > flush_work(&chap->auth_work); > > > __nvme_auth_reset(chap); > > > } > > > mutex_unlock(&ctrl->dhchap_auth_mutex); > > > > > > I tried to remove the mutex_unlock in the list iteration with a > > > patch [2], but > > > it resulted in another "WARNING: possible recursive locking > > > detected" [3]. I'm > > > not sure but cause of this WARN could be __nvme_auth_work and > > > nvme_dhchap_auth_work in same nvme_wq. > > > > > > Could you take a look for fix? > > > > I'm looking at the code and I think that the way the concurrent > > negotiations and how dhchap_auth_mutex is handled is very fragile, > > also why should the per-queue auth_work hold the controller-wide > > dhchap_auth_mutex? The only reason I see is because nvme_auth_negotiate > > is checking if the chap context is already queued? Why should we > > allow that? > > > Well; that's partially due to the internal design of linux-nvme. > The controller structure itself doesn't have 'queues' per se; there just is > a general 'ctrl' structure. So while I would have loved to have a per-queue > structure to hook the chap authentication into, all I have is the controller > structure. 
> Hence we have a controller-wide list holding all 'chap' structures for the > individual queues. > Hence the controller-wide mutex to gate list modifications. > > > I'd suggest to splice dhchap_auth_list, to a local list and then just > > flush nvmet_wq in teardown flows. Same for renegotiations/reset flows. > > And we should prevent for the double-queuing of chap negotiations to > > begin with, instead of handling them (I still don't understand why this > > is permitted, but perhaps just return EBUSY in this case?) > > We don't double queue; we're re-using the existing entries. Hannes, thanks for the explanations. > > Can you check if this fix works? > > diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c > index c8a6db7c4498..4e824aab30eb 100644 > --- a/drivers/nvme/host/auth.c > +++ b/drivers/nvme/host/auth.c > @@ -926,7 +926,6 @@ void nvme_auth_reset(struct nvme_ctrl *ctrl) > > mutex_lock(&ctrl->dhchap_auth_mutex); > list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) { > - mutex_unlock(&ctrl->dhchap_auth_mutex); > flush_work(&chap->auth_work); > __nvme_auth_reset(chap); > } I confirmed this hunk avoids the "WARNING: bad unlock balance detected!". As far as I ran blktests with this change, I observe no failure in other test cases. However, I observed another new WARN at nvme/045: "WARNING: possible recursive locking detected". I think it was caused by nvme_dhchap_auth_work in nvme_wq tried to flush another work __nvme_auth_work in the same workqueue. I created a patch below which creates another workqueue nvme_auth_wq for __nvme_auth_work. Do you think this fix approach is acceptable? diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c index 4e824aab30eb..946085070223 100644 --- a/drivers/nvme/host/auth.c +++ b/drivers/nvme/host/auth.c @@ -42,6 +42,8 @@ struct nvme_dhchap_queue_context { int sess_key_len; }; +struct workqueue_struct *nvme_auth_wq; + #define nvme_auth_flags_from_qid(qid) \ (qid == 0) ? 
0 : BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_RESERVED #define nvme_auth_queue_from_qid(ctrl, qid) \ @@ -869,7 +871,7 @@ int nvme_auth_negotiate(struct nvme_ctrl *ctrl, int qid) mutex_unlock(&ctrl->dhchap_auth_mutex); flush_work(&chap->auth_work); __nvme_auth_reset(chap); - queue_work(nvme_wq, &chap->auth_work); + queue_work(nvme_auth_wq, &chap->auth_work); return 0; } } @@ -896,7 +898,7 @@ int nvme_auth_negotiate(struct nvme_ctrl *ctrl, int qid) INIT_WORK(&chap->auth_work, __nvme_auth_work); list_add(&chap->entry, &ctrl->dhchap_auth_list); mutex_unlock(&ctrl->dhchap_auth_mutex); - queue_work(nvme_wq, &chap->auth_work); + queue_work(nvme_auth_wq, &chap->auth_work); return 0; } EXPORT_SYMBOL_GPL(nvme_auth_negotiate); @@ -969,6 +971,21 @@ static void nvme_dhchap_auth_work(struct work_struct *work) */ } +int nvme_auth_init(void) +{ + nvme_auth_wq = alloc_workqueue("nvme-auth-wq", + WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0); + if (!nvme_auth_wq) + return -ENOMEM; + + return 0; +} + +void nvme_auth_exit(void) +{ + destroy_workqueue(nvme_auth_wq); +} + void nvme_auth_init_ctrl(struct nvme_ctrl *ctrl) { INIT_LIST_HEAD(&ctrl->dhchap_auth_list); diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 059737c1a2c1..aa06c686ad29 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -5341,8 +5341,14 @@ static int __init nvme_core_init(void) goto unregister_generic_ns; } + result = nvme_auth_init(); + if (result) + goto exit_nvme_auth; + return 0; +exit_nvme_auth: + nvme_auth_exit(); unregister_generic_ns: unregister_chrdev_region(nvme_ns_chr_devt, NVME_MINORS); destroy_subsys_class: @@ -5363,6 +5369,7 @@ static int __init nvme_core_init(void) static void __exit nvme_core_exit(void) { + nvme_auth_exit(); class_destroy(nvme_ns_chr_class); class_destroy(nvme_subsys_class); class_destroy(nvme_class); diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index a29877217ee6..472ab2d14a67 100644 --- a/drivers/nvme/host/nvme.h +++ 
b/drivers/nvme/host/nvme.h @@ -1019,6 +1019,8 @@ static inline bool nvme_ctrl_sgl_supported(struct nvme_ctrl *ctrl) } #ifdef CONFIG_NVME_AUTH +int nvme_auth_init(void); +void nvme_auth_exit(void); void nvme_auth_init_ctrl(struct nvme_ctrl *ctrl); void nvme_auth_stop(struct nvme_ctrl *ctrl); int nvme_auth_negotiate(struct nvme_ctrl *ctrl, int qid); @@ -1026,6 +1028,8 @@ int nvme_auth_wait(struct nvme_ctrl *ctrl, int qid); void nvme_auth_reset(struct nvme_ctrl *ctrl); void nvme_auth_free(struct nvme_ctrl *ctrl); #else +static inline int nvme_auth_init(void) {}; +static inline void nvme_auth_exit(void) {}; static inline void nvme_auth_init_ctrl(struct nvme_ctrl *ctrl) {}; static inline void nvme_auth_stop(struct nvme_ctrl *ctrl) {}; static inline int nvme_auth_negotiate(struct nvme_ctrl *ctrl, int qid) -- Shin'ichiro Kawasaki ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: bad unlock balance WARNING at nvme/045 2022-10-26 12:01 ` Shinichiro Kawasaki @ 2022-10-26 12:38 ` Sagi Grimberg 2022-10-28 13:52 ` Hannes Reinecke 0 siblings, 1 reply; 8+ messages in thread From: Sagi Grimberg @ 2022-10-26 12:38 UTC (permalink / raw) To: Shinichiro Kawasaki, Hannes Reinecke; +Cc: linux-nvme, Damien Le Moal >>>> Hello Hannes, >>>> >>>> I observed "WARNING: bad unlock balance detected!" at nvme/045 [1]. >>>> As the Call >>>> Trace shows, nvme_auth_reset() has unbalanced mutex lock/unlock. >>>> >>>> mutex_lock(&ctrl->dhchap_auth_mutex); >>>> list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) { >>>> mutex_unlock(&ctrl->dhchap_auth_mutex); >>>> flush_work(&chap->auth_work); >>>> __nvme_auth_reset(chap); >>>> } >>>> mutex_unlock(&ctrl->dhchap_auth_mutex); >>>> >>>> I tried to remove the mutex_unlock in the list iteration with a >>>> patch [2], but >>>> it resulted in another "WARNING: possible recursive locking >>>> detected" [3]. I'm >>>> not sure but cause of this WARN could be __nvme_auth_work and >>>> nvme_dhchap_auth_work in same nvme_wq. >>>> >>>> Could you take a look for fix? >>> >>> I'm looking at the code and I think that the way the concurrent >>> negotiations and how dhchap_auth_mutex is handled is very fragile, >>> also why should the per-queue auth_work hold the controller-wide >>> dhchap_auth_mutex? The only reason I see is because nvme_auth_negotiate >>> is checking if the chap context is already queued? Why should we >>> allow that? >>> >> Well; that's partially due to the internal design of linux-nvme. >> The controller structure itself doesn't have 'queues' per se; there just is >> a general 'ctrl' structure. So while I would have loved to have a per-queue >> structure to hook the chap authentication into, all I have is the controller >> structure. >> Hence we have a controller-wide list holding all 'chap' structures for the >> individual queues. >> Hence the controller-wide mutex to gate list modifications. 
>>
>>> I'd suggest to splice dhchap_auth_list to a local list and then just
>>> flush nvmet_wq in teardown flows. Same for renegotiation/reset flows.
>>> And we should prevent the double-queuing of chap negotiations to
>>> begin with, instead of handling it (I still don't understand why this
>>> is permitted, but perhaps just return EBUSY in this case?)
>>
>> We don't double queue; we're re-using the existing entries.
>
> Hannes, thanks for the explanations.
>
>>
>> Can you check if this fix works?
>>
>> diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
>> index c8a6db7c4498..4e824aab30eb 100644
>> --- a/drivers/nvme/host/auth.c
>> +++ b/drivers/nvme/host/auth.c
>> @@ -926,7 +926,6 @@ void nvme_auth_reset(struct nvme_ctrl *ctrl)
>>
>>      mutex_lock(&ctrl->dhchap_auth_mutex);
>>      list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
>> -            mutex_unlock(&ctrl->dhchap_auth_mutex);
>>              flush_work(&chap->auth_work);
>>              __nvme_auth_reset(chap);
>>      }
>
> I confirmed this hunk avoids the "WARNING: bad unlock balance
> detected!". As far as I ran blktests with this change, I observed no
> failures in other test cases.
>
> However, I observed another new WARN at nvme/045: "WARNING: possible
> recursive locking detected". I think it was caused by
> nvme_dhchap_auth_work in nvme_wq trying to flush another work,
> __nvme_auth_work, in the same workqueue. I created a patch below which
> creates another workqueue, nvme_auth_wq, for __nvme_auth_work.
> Do you think this fix approach is acceptable?

It is fine to flush work on the same workqueue; the problem is that they
share the same lock.

I think what we should do is:
1. have chap contexts in an array and not a list so we don't need
   to traverse to locate a context.
2. omit the dhchap_auth_mutex as I don't see a need to protect the
   array.
3. synchronize only the chap context itself, either with a lock, or
   a flag, or a state (atomic).
4. get rid of long lived allocations, it is not optimizing a whole lot
   imho, and right now we are reserving something like 5k per queue.
5. add one more synchronization between chap and ctrl accessing the keys

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: bad unlock balance WARNING at nvme/045
  2022-10-26 12:38 ` Sagi Grimberg
@ 2022-10-28 13:52 ` Hannes Reinecke
  0 siblings, 0 replies; 8+ messages in thread
From: Hannes Reinecke @ 2022-10-28 13:52 UTC (permalink / raw)
To: Sagi Grimberg, Shinichiro Kawasaki; +Cc: linux-nvme, Damien Le Moal

On 10/26/22 14:38, Sagi Grimberg wrote:
> [ .. ]
>> Hannes, thanks for the explanations.
>>
>>>
>>> Can you check if this fix works?
>>>
>>> diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
>>> index c8a6db7c4498..4e824aab30eb 100644
>>> --- a/drivers/nvme/host/auth.c
>>> +++ b/drivers/nvme/host/auth.c
>>> @@ -926,7 +926,6 @@ void nvme_auth_reset(struct nvme_ctrl *ctrl)
>>>
>>>      mutex_lock(&ctrl->dhchap_auth_mutex);
>>>      list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
>>> -            mutex_unlock(&ctrl->dhchap_auth_mutex);
>>>              flush_work(&chap->auth_work);
>>>              __nvme_auth_reset(chap);
>>>      }
>>
>> I confirmed this hunk avoids the "WARNING: bad unlock balance
>> detected!". As far as I ran blktests with this change, I observed no
>> failures in other test cases.
>>
>> However, I observed another new WARN at nvme/045: "WARNING: possible
>> recursive locking detected". I think it was caused by
>> nvme_dhchap_auth_work in nvme_wq trying to flush another work,
>> __nvme_auth_work, in the same workqueue. I created a patch below which
>> creates another workqueue, nvme_auth_wq, for __nvme_auth_work.
>> Do you think this fix approach is acceptable?
>
> It is fine to flush work on the same workqueue; the problem is that they
> share the same lock.
>
> I think what we should do is:
> 1. have chap contexts in an array and not a list so we don't need
>    to traverse to locate a context.
> 2. omit the dhchap_auth_mutex as I don't see a need to protect the
>    array.
> 3. synchronize only the chap context itself, either with a lock, or
>    a flag, or a state (atomic).
> 4. get rid of long lived allocations, it is not optimizing a whole lot
>    imho, and right now we are reserving something like 5k per queue.
> 5. add one more synchronization between chap and ctrl accessing the keys
>
I've now sent a patchset which should address this; can you check if
that's what you had in mind?

Cheers,

Hannes
--
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: bad unlock balance WARNING at nvme/045
  2022-10-26  6:42 ` Hannes Reinecke
  2022-10-26 12:01   ` Shinichiro Kawasaki
@ 2022-10-26 12:27 ` Sagi Grimberg
  1 sibling, 0 replies; 8+ messages in thread
From: Sagi Grimberg @ 2022-10-26 12:27 UTC (permalink / raw)
To: Hannes Reinecke, Shinichiro Kawasaki, linux-nvme; +Cc: Damien Le Moal

On 10/26/22 09:42, Hannes Reinecke wrote:
> On 10/18/22 12:57, Sagi Grimberg wrote:
>>
>>> Hello Hannes,
>>>
>>> I observed "WARNING: bad unlock balance detected!" at nvme/045 [1].
>>> As the Call Trace shows, nvme_auth_reset() has unbalanced mutex
>>> lock/unlock.
>>>
>>>     mutex_lock(&ctrl->dhchap_auth_mutex);
>>>     list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
>>>             mutex_unlock(&ctrl->dhchap_auth_mutex);
>>>             flush_work(&chap->auth_work);
>>>             __nvme_auth_reset(chap);
>>>     }
>>>     mutex_unlock(&ctrl->dhchap_auth_mutex);
>>>
>>> I tried to remove the mutex_unlock in the list iteration with a
>>> patch [2], but it resulted in another "WARNING: possible recursive
>>> locking detected" [3]. I'm not sure, but the cause of this WARN
>>> could be __nvme_auth_work and nvme_dhchap_auth_work being in the
>>> same nvme_wq.
>>>
>>> Could you take a look for a fix?
>>
>> I'm looking at the code and I think that the way the concurrent
>> negotiations run and how dhchap_auth_mutex is handled is very
>> fragile. Also, why should the per-queue auth_work hold the
>> controller-wide dhchap_auth_mutex? The only reason I see is that
>> nvme_auth_negotiate is checking if the chap context is already
>> queued. Why should we allow that?
>>
> Well; that's partially due to the internal design of linux-nvme.
> The controller structure itself doesn't have 'queues' per se; there
> just is a general 'ctrl' structure. So while I would have loved to
> have a per-queue structure to hook the chap authentication into, all
> I have is the controller structure.
> Hence we have a controller-wide list holding all 'chap' structures
> for the individual queues.
> Hence the controller-wide mutex to gate list modifications.

This still can easily be an array that does not enforce a list
traversal.

>
>> I'd suggest to splice dhchap_auth_list to a local list and then just
>> flush nvmet_wq in teardown flows. Same for renegotiation/reset flows.
>> And we should prevent the double-queuing of chap negotiations to
>> begin with, instead of handling it (I still don't understand why this
>> is permitted, but perhaps just return EBUSY in this case?)
>
> We don't double queue; we're re-using the existing entries.

I'd argue that you shouldn't. It'd be better to simply delete entries
once queue auth is done...

> Can you check if this fix works?
>
> diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
> index c8a6db7c4498..4e824aab30eb 100644
> --- a/drivers/nvme/host/auth.c
> +++ b/drivers/nvme/host/auth.c
> @@ -926,7 +926,6 @@ void nvme_auth_reset(struct nvme_ctrl *ctrl)
>
>      mutex_lock(&ctrl->dhchap_auth_mutex);
>      list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
> -            mutex_unlock(&ctrl->dhchap_auth_mutex);
>              flush_work(&chap->auth_work);
>              __nvme_auth_reset(chap);
>      }

How can this work? auth_work currently acquires the mutex. This
shouldn't solve anything.

^ permalink raw reply	[flat|nested] 8+ messages in thread
end of thread, other threads:[~2022-10-28 13:52 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-18  8:03 bad unlock balance WARNING at nvme/045 Shinichiro Kawasaki
2022-10-18 10:57 ` Sagi Grimberg
2022-10-26  2:13   ` Shinichiro Kawasaki
2022-10-26  6:42     ` Hannes Reinecke
2022-10-26 12:01       ` Shinichiro Kawasaki
2022-10-26 12:38         ` Sagi Grimberg
2022-10-28 13:52           ` Hannes Reinecke
2022-10-26 12:27       ` Sagi Grimberg