* bad unlock balance WARNING at nvme/045 @ 2022-10-18 8:03 Shinichiro Kawasaki 2022-10-18 10:57 ` Sagi Grimberg 0 siblings, 1 reply; 8+ messages in thread From: Shinichiro Kawasaki @ 2022-10-18 8:03 UTC (permalink / raw) To: linux-nvme, Hannes Reinecke; +Cc: Damien Le Moal Hello Hannes, I observed "WARNING: bad unlock balance detected!" at nvme/045 [1]. As the Call Trace shows, nvme_auth_reset() has unbalanced mutex lock/unlock. mutex_lock(&ctrl->dhchap_auth_mutex); list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) { mutex_unlock(&ctrl->dhchap_auth_mutex); flush_work(&chap->auth_work); __nvme_auth_reset(chap); } mutex_unlock(&ctrl->dhchap_auth_mutex); I tried to remove the mutex_unlock in the list iteration with a patch [2], but it resulted in another "WARNING: possible recursive locking detected" [3]. I'm not sure but cause of this WARN could be __nvme_auth_work and nvme_dhchap_auth_work in same nvme_wq. Could you take a look for fix? [1] [ 89.883480] loop: module loaded [ 90.271575] run blktests nvme/045 at 2022-10-18 15:13:33 [ 90.774324] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 [ 91.410877] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid: with DH-HMAC-CHAP. [ 91.454315] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048 [ 91.456445] nvme nvme6: qid 0: controller authenticated [ 91.458581] nvme nvme6: qid 0: authenticated [ 91.466122] nvme nvme6: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices. [ 91.469709] nvme nvme6: creating 4 I/O queues. [ 91.620115] nvme nvme6: new ctrl: "blktests-subsystem-1" [ 93.775756] ===================================== [ 93.776961] WARNING: bad unlock balance detected! 
[ 93.778088] 6.1.0-rc1 #3 Not tainted [ 93.779011] ------------------------------------- [ 93.780090] check/961 is trying to release lock (&ctrl->dhchap_auth_mutex) at: [ 93.781726] [<ffffffffc03bf28b>] nvme_auth_reset+0x5b/0xb0 [nvme_core] [ 93.783310] but there are no more locks to release! [ 93.784461] other info that might help us debug this: [ 93.786087] 3 locks held by check/961: [ 93.787098] #0: ffff8881135fa460 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0xe7/0x1b0 [ 93.788923] #1: ffff88811fb1d888 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x21d/0x530 [ 93.790888] #2: ffff88810d8160f0 (kn->active#110){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x241/0x530 [ 93.792960] stack backtrace: [ 93.794168] CPU: 1 PID: 961 Comm: check Not tainted 6.1.0-rc1 #3 [ 93.795552] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 [ 93.798016] Call Trace: [ 93.798716] <TASK> [ 93.799298] dump_stack_lvl+0x5b/0x77 [ 93.800251] lock_release.cold+0x10/0x4e [ 93.801244] ? nvme_auth_reset+0x5b/0xb0 [nvme_core] [ 93.802479] ? reacquire_held_locks+0x4e0/0x4e0 [ 93.803537] ? mark_held_locks+0x9e/0xe0 [ 93.804544] __mutex_unlock_slowpath+0x8c/0x5f0 [ 93.805650] ? kasan_quarantine_put+0x94/0x1f0 [ 93.806759] ? bit_wait_timeout+0x170/0x170 [ 93.807757] ? __nvme_auth_reset+0x198/0x3d0 [nvme_core] [ 93.809010] ? __kmem_cache_free+0xa9/0x390 [ 93.810059] nvme_auth_reset+0x5b/0xb0 [nvme_core] [ 93.811253] nvme_ctrl_dhchap_secret_store+0x1b4/0x1d0 [nvme_core] [ 93.812710] kernfs_fop_write_iter+0x356/0x530 [ 93.813754] vfs_write+0x519/0xc50 [ 93.814513] ? kernel_write+0x590/0x590 [ 93.815410] ? __up_read+0x182/0x700 [ 93.816252] ? __fget_light+0x51/0x230 [ 93.817092] ksys_write+0xe7/0x1b0 [ 93.817888] ? __ia32_sys_read+0xa0/0xa0 [ 93.818750] ? lockdep_hardirqs_on_prepare+0x17b/0x410 [ 93.819901] ? syscall_enter_from_user_mode+0x22/0xc0 [ 93.820929] ? 
lockdep_hardirqs_on+0x7d/0x100 [ 93.821848] do_syscall_64+0x37/0x90 [ 93.822583] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 93.823615] RIP: 0033:0x7f6f84d018f7 [ 93.824372] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 [ 93.827699] RSP: 002b:00007ffcdb0ce5d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 93.829137] RAX: ffffffffffffffda RBX: 000000000000003c RCX: 00007f6f84d018f7 [ 93.830502] RDX: 000000000000003c RSI: 00005582004ce6e0 RDI: 0000000000000001 [ 93.831879] RBP: 00005582004ce6e0 R08: 0000000000000000 R09: 0000000000000073 [ 93.833236] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000003c [ 93.834594] R13: 00007f6f84df9780 R14: 000000000000003c R15: 00007f6f84df49e0 [ 93.835979] </TASK> [ 93.836823] nvme nvme6: re-authenticating controller [ 93.871885] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048 [ 93.873754] nvme nvme6: qid 0: controller authenticated [ 93.916346] nvme nvme6: re-authenticating controller [ 93.948156] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048 [ 93.949498] nvme nvme6: qid 0: controller authenticated [ 93.992725] nvme nvme6: re-authenticating controller [ 94.025843] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048 [ 94.027160] nvme nvme6: qid 0: controller authenticated [ 98.836070] nvme nvme6: re-authenticating controller [ 99.345893] nvme nvme6: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe8192 [ 99.347859] nvme nvme6: qid 0: controller authenticated [ 99.736321] nvme nvme6: re-authenticating controller [ 100.014469] nvme nvme6: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe8192 [ 100.015192] nvme nvme6: qid 0: controller authenticated [ 101.560718] nvme nvme6: Removing ctrl: NQN "blktests-subsystem-1" [2] diff --git a/drivers/nvme/host/auth.c 
b/drivers/nvme/host/auth.c index c8a6db7c4498..4e824aab30eb 100644 --- a/drivers/nvme/host/auth.c +++ b/drivers/nvme/host/auth.c @@ -926,7 +926,6 @@ void nvme_auth_reset(struct nvme_ctrl *ctrl) mutex_lock(&ctrl->dhchap_auth_mutex); list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) { - mutex_unlock(&ctrl->dhchap_auth_mutex); flush_work(&chap->auth_work); __nvme_auth_reset(chap); } [3] [ 2678.264545] loop: module loaded [ 2678.679919] run blktests nvme/045 at 2022-10-18 16:34:00 [ 2679.231157] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 [ 2679.854789] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid: with DH-HMAC-CHAP. [ 2679.898208] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048 [ 2679.900508] nvme nvme6: qid 0: controller authenticated [ 2679.902183] nvme nvme6: qid 0: authenticated [ 2679.907944] nvme nvme6: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices. [ 2679.912737] nvme nvme6: creating 4 I/O queues. 
[ 2680.068539] nvme nvme6: new ctrl: "blktests-subsystem-1" [ 2682.156749] nvme nvme6: re-authenticating controller [ 2682.160640] ============================================ [ 2682.161903] WARNING: possible recursive locking detected [ 2682.163112] 6.1.0-rc1+ #4 Not tainted [ 2682.164063] -------------------------------------------- [ 2682.165319] kworker/u8:0/927 is trying to acquire lock: [ 2682.166573] ffff888119d27138 ((wq_completion)nvme-wq){+.+.}-{0:0}, at: __flush_work+0x40e/0x900 [ 2682.168529] but task is already holding lock: [ 2682.170022] ffff888119d27138 ((wq_completion)nvme-wq){+.+.}-{0:0}, at: process_one_work+0x73c/0x1300 [ 2682.172088] other info that might help us debug this: [ 2682.173695] Possible unsafe locking scenario: [ 2682.175196] CPU0 [ 2682.175908] ---- [ 2682.176559] lock((wq_completion)nvme-wq); [ 2682.177589] lock((wq_completion)nvme-wq); [ 2682.178631] *** DEADLOCK *** [ 2682.180252] May be due to missing lock nesting notation [ 2682.181918] 3 locks held by kworker/u8:0/927: [ 2682.182988] #0: ffff888119d27138 ((wq_completion)nvme-wq){+.+.}-{0:0}, at: process_one_work+0x73c/0x1300 [ 2682.185078] #1: ffff88810c8bfdd0 ((work_completion)(&ctrl->dhchap_auth_work)){+.+.}-{0:0}, at: process_one_work+0x769/0x1300 [ 2682.187535] #2: ffffffffb483d560 (rcu_read_lock){....}-{1:2}, at: __flush_work+0xc2/0x900 [ 2682.189373] stack backtrace: [ 2682.190590] CPU: 2 PID: 927 Comm: kworker/u8:0 Not tainted 6.1.0-rc1+ #4 [ 2682.192118] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 [ 2682.194630] Workqueue: nvme-wq nvme_dhchap_auth_work [nvme_core] [ 2682.196108] Call Trace: [ 2682.196755] <TASK> [ 2682.197290] dump_stack_lvl+0x5b/0x77 [ 2682.198143] __lock_acquire.cold+0x36f/0x3f5 [ 2682.198510] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048 [ 2682.199075] ? lock_chain_count+0x20/0x20 [ 2682.199089] ? 
lockdep_hardirqs_on_prepare+0x410/0x410 [ 2682.200949] nvme nvme6: qid 0: controller authenticated [ 2682.201771] lock_acquire+0x194/0x4f0 [ 2682.201783] ? __flush_work+0x40e/0x900 [ 2682.201796] ? lock_downgrade+0x6b0/0x6b0 [ 2682.201808] ? mark_held_locks+0x9e/0xe0 [ 2682.207078] ? lockdep_hardirqs_on_prepare+0x17b/0x410 [ 2682.208041] __flush_work+0x42e/0x900 [ 2682.208736] ? __flush_work+0x40e/0x900 [ 2682.209507] ? queue_delayed_work_on+0x90/0x90 [ 2682.210322] ? flush_workqueue_prep_pwqs+0x3f0/0x3f0 [ 2682.211209] nvme_dhchap_auth_work+0xf1/0x1f8 [nvme_core] [ 2682.212163] process_one_work+0x816/0x1300 [ 2682.212858] ? lock_downgrade+0x6b0/0x6b0 [ 2682.213589] ? pwq_dec_nr_in_flight+0x230/0x230 [ 2682.214367] ? rwlock_bug.part.0+0x90/0x90 [ 2682.215062] worker_thread+0xfc/0x1270 [ 2682.215670] ? __kthread_parkme+0xc1/0x1f0 [ 2682.216367] ? process_one_work+0x1300/0x1300 [ 2682.217052] kthread+0x29b/0x340 [ 2682.217620] ? kthread_complete_and_exit+0x20/0x20 [ 2682.218313] ret_from_fork+0x1f/0x30 [ 2682.218922] </TASK> [ 2682.254321] nvme nvme6: re-authenticating controller [ 2682.292068] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048 [ 2682.293937] nvme nvme6: qid 0: controller authenticated [ 2682.334786] nvme nvme6: re-authenticating controller [ 2682.364032] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048 [ 2682.365898] nvme nvme6: qid 0: controller authenticated [ 2686.934974] nvme nvme6: re-authenticating controller [ 2687.345435] nvme nvme6: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe8192 [ 2687.347055] nvme nvme6: qid 0: controller authenticated [ 2687.738816] nvme nvme6: re-authenticating controller [ 2688.133101] nvme nvme6: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe8192 [ 2688.135123] nvme nvme6: qid 0: controller authenticated [ 2689.624308] nvme nvme6: Removing ctrl: NQN "blktests-subsystem-1" [ 2886.071612] loop: module loaded [ 2886.269473] run blktests nvme/045 at 
2022-10-18 16:37:28 [ 2886.533625] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 [ 2886.726637] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid: with DH-HMAC-CHAP. [ 2886.763826] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048 [ 2886.765778] nvme nvme6: qid 0: controller authenticated [ 2886.767199] nvme nvme6: qid 0: authenticated [ 2886.769632] nvme nvme6: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices. [ 2886.772454] nvme nvme6: creating 4 I/O queues. [ 2886.918047] nvme nvme6: new ctrl: "blktests-subsystem-1" [ 2888.496721] nvme nvme6: re-authenticating controller [ 2888.532075] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048 [ 2888.534031] nvme nvme6: qid 0: controller authenticated [ 2888.583146] nvme nvme6: re-authenticating controller [ 2888.615654] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048 [ 2888.617588] nvme nvme6: qid 0: controller authenticated [ 2888.656311] nvme nvme6: re-authenticating controller [ 2888.683953] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048 [ 2888.685868] nvme nvme6: qid 0: controller authenticated [ 2888.724955] nvme nvme6: re-authenticating controller [ 2889.200478] nvme nvme6: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe8192 [ 2889.202425] nvme nvme6: qid 0: controller authenticated [ 2889.625949] nvme nvme6: re-authenticating controller [ 2889.992842] nvme nvme6: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe8192 [ 2889.994815] nvme nvme6: qid 0: controller authenticated [ 2891.514773] nvme nvme6: Removing ctrl: NQN "blktests-subsystem-1" -- Shin'ichiro Kawasaki ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: bad unlock balance WARNING at nvme/045 2022-10-18 8:03 bad unlock balance WARNING at nvme/045 Shinichiro Kawasaki @ 2022-10-18 10:57 ` Sagi Grimberg 2022-10-26 2:13 ` Shinichiro Kawasaki 2022-10-26 6:42 ` Hannes Reinecke 0 siblings, 2 replies; 8+ messages in thread From: Sagi Grimberg @ 2022-10-18 10:57 UTC (permalink / raw) To: Shinichiro Kawasaki, linux-nvme, Hannes Reinecke; +Cc: Damien Le Moal > Hello Hannes, > > I observed "WARNING: bad unlock balance detected!" at nvme/045 [1]. As the Call > Trace shows, nvme_auth_reset() has unbalanced mutex lock/unlock. > > mutex_lock(&ctrl->dhchap_auth_mutex); > list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) { > mutex_unlock(&ctrl->dhchap_auth_mutex); > flush_work(&chap->auth_work); > __nvme_auth_reset(chap); > } > mutex_unlock(&ctrl->dhchap_auth_mutex); > > I tried to remove the mutex_unlock in the list iteration with a patch [2], but > it resulted in another "WARNING: possible recursive locking detected" [3]. I'm > not sure but cause of this WARN could be __nvme_auth_work and > nvme_dhchap_auth_work in same nvme_wq. > > Could you take a look for fix? I'm looking at the code and I think that the way the concurrent negotiations and how dhchap_auth_mutex is handled is very fragile, also why should the per-queue auth_work hold the controller-wide dhchap_auth_mutex? The only reason I see is because nvme_auth_negotiate is checking if the chap context is already queued? Why should we allow that? I'd suggest to splice dhchap_auth_list, to a local list and then just flush nvmet_wq in teardown flows. Same for renegotiations/reset flows. And we should prevent for the double-queuing of chap negotiations to begin with, instead of handling them (I still don't understand why this is permitted, but perhaps just return EBUSY in this case?) ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bad unlock balance WARNING at nvme/045
From: Shinichiro Kawasaki @ 2022-10-26 2:13 UTC
To: Sagi Grimberg; +Cc: linux-nvme, Hannes Reinecke, Damien Le Moal

On Oct 18, 2022 / 13:57, Sagi Grimberg wrote:
> 
> > Hello Hannes,
> >
> > I observed "WARNING: bad unlock balance detected!" at nvme/045 [1]. As the Call
> > Trace shows, nvme_auth_reset() has unbalanced mutex lock/unlock.
> >
> >     mutex_lock(&ctrl->dhchap_auth_mutex);
> >     list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
> >             mutex_unlock(&ctrl->dhchap_auth_mutex);
> >             flush_work(&chap->auth_work);
> >             __nvme_auth_reset(chap);
> >     }
> >     mutex_unlock(&ctrl->dhchap_auth_mutex);
> >
> > I tried to remove the mutex_unlock in the list iteration with a patch [2], but
> > it resulted in another "WARNING: possible recursive locking detected" [3]. I'm
> > not sure but cause of this WARN could be __nvme_auth_work and
> > nvme_dhchap_auth_work in same nvme_wq.
> >
> > Could you take a look for fix?

Sagi, thank you for the comments.

> I'm looking at the code and I think that the way the concurrent
> negotiations and how dhchap_auth_mutex is handled is very fragile,
> also why should the per-queue auth_work hold the controller-wide
> dhchap_auth_mutex?

It looks to me that the dhchap_auth_mutex lock in nvme_auth_negotiate() simply
guards ctrl->dhchap_auth_list access (look up the list and add an element to
the list).

> The only reason I see is because nvme_auth_negotiate
> is checking if the chap context is already queued? Why should we
> allow that?

I'm not sure about this. I would like to ask other experts' comments on it.

> 
> I'd suggest to splice dhchap_auth_list, to a local list and then just
> flush nvmet_wq in teardown flows. Same for renegotiations/reset flows.

Thanks for the suggestion.
I'm not sure how to use a list splice of dhchap_auth_list at renegotiation.
But as for reset by nvme_auth_reset(), your suggestion looks good to me to
avoid the unbalanced lock. It can be applied to the teardown by
nvme_auth_free() also. I created a patch [1] with this approach. It just frees
the dhchap contexts at reset and does not reuse them for negotiation. I think
this is fine since the contexts are re-allocated at renegotiation.

> And we should prevent for the double-queuing of chap negotiations to
> begin with, instead of handling them (I still don't understand why this
> is permitted, but perhaps just return EBUSY in this case?)

Again, I'm not sure about this point.

I confirmed that the patch [1] avoids the unbalanced lock. However, the other
"WARNING: possible recursive locking detected" was still reported. I created
another patch to add a workqueue dedicated to nvme auth, which avoided that
WARN. Are these fix approaches OK? If so, I can post the two patches for the
two WARNs.

[1]

diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
index c8a6db7c4498..c71f1be7d403 100644
--- a/drivers/nvme/host/auth.c
+++ b/drivers/nvme/host/auth.c
@@ -920,17 +920,25 @@ int nvme_auth_wait(struct nvme_ctrl *ctrl, int qid)
 }
 EXPORT_SYMBOL_GPL(nvme_auth_wait);
 
-void nvme_auth_reset(struct nvme_ctrl *ctrl)
+static void nvme_auth_free_queue_contexts(struct nvme_ctrl *ctrl)
 {
-	struct nvme_dhchap_queue_context *chap;
+	struct nvme_dhchap_queue_context *chap = NULL, *tmp;
+	LIST_HEAD(splice);
 
 	mutex_lock(&ctrl->dhchap_auth_mutex);
-	list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
-		mutex_unlock(&ctrl->dhchap_auth_mutex);
+	list_splice_init(&ctrl->dhchap_auth_list, &splice);
+	mutex_unlock(&ctrl->dhchap_auth_mutex);
+
+	list_for_each_entry_safe(chap, tmp, &splice, entry) {
+		list_del_init(&chap->entry);
 		flush_work(&chap->auth_work);
-		__nvme_auth_reset(chap);
+		__nvme_auth_free(chap);
 	}
-	mutex_unlock(&ctrl->dhchap_auth_mutex);
+}
+
+void nvme_auth_reset(struct nvme_ctrl *ctrl)
+{
+	nvme_auth_free_queue_contexts(ctrl);
 }
 EXPORT_SYMBOL_GPL(nvme_auth_reset);
 
@@ -996,15 +1004,8 @@ EXPORT_SYMBOL_GPL(nvme_auth_stop);
 
 void nvme_auth_free(struct nvme_ctrl *ctrl)
 {
-	struct nvme_dhchap_queue_context *chap = NULL, *tmp;
+	nvme_auth_free_queue_contexts(ctrl);
 
-	mutex_lock(&ctrl->dhchap_auth_mutex);
-	list_for_each_entry_safe(chap, tmp, &ctrl->dhchap_auth_list, entry) {
-		list_del_init(&chap->entry);
-		flush_work(&chap->auth_work);
-		__nvme_auth_free(chap);
-	}
-	mutex_unlock(&ctrl->dhchap_auth_mutex);
 	if (ctrl->host_key) {
 		nvme_auth_free_key(ctrl->host_key);
 		ctrl->host_key = NULL;

-- 
Shin'ichiro Kawasaki
* Re: bad unlock balance WARNING at nvme/045 2022-10-18 10:57 ` Sagi Grimberg 2022-10-26 2:13 ` Shinichiro Kawasaki @ 2022-10-26 6:42 ` Hannes Reinecke 2022-10-26 12:01 ` Shinichiro Kawasaki 2022-10-26 12:27 ` Sagi Grimberg 1 sibling, 2 replies; 8+ messages in thread From: Hannes Reinecke @ 2022-10-26 6:42 UTC (permalink / raw) To: Sagi Grimberg, Shinichiro Kawasaki, linux-nvme; +Cc: Damien Le Moal On 10/18/22 12:57, Sagi Grimberg wrote: > >> Hello Hannes, >> >> I observed "WARNING: bad unlock balance detected!" at nvme/045 [1]. As >> the Call >> Trace shows, nvme_auth_reset() has unbalanced mutex lock/unlock. >> >> mutex_lock(&ctrl->dhchap_auth_mutex); >> list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) { >> mutex_unlock(&ctrl->dhchap_auth_mutex); >> flush_work(&chap->auth_work); >> __nvme_auth_reset(chap); >> } >> mutex_unlock(&ctrl->dhchap_auth_mutex); >> >> I tried to remove the mutex_unlock in the list iteration with a patch >> [2], but >> it resulted in another "WARNING: possible recursive locking detected" >> [3]. I'm >> not sure but cause of this WARN could be __nvme_auth_work and >> nvme_dhchap_auth_work in same nvme_wq. >> >> Could you take a look for fix? > > I'm looking at the code and I think that the way the concurrent > negotiations and how dhchap_auth_mutex is handled is very fragile, > also why should the per-queue auth_work hold the controller-wide > dhchap_auth_mutex? The only reason I see is because nvme_auth_negotiate > is checking if the chap context is already queued? Why should we > allow that? > Well; that's partially due to the internal design of linux-nvme. The controller structure itself doesn't have 'queues' per se; there just is a general 'ctrl' structure. So while I would have loved to have a per-queue structure to hook the chap authentication into, all I have is the controller structure. Hence we have a controller-wide list holding all 'chap' structures for the individual queues. 
Hence the controller-wide mutex to gate list modifications. > I'd suggest to splice dhchap_auth_list, to a local list and then just > flush nvmet_wq in teardown flows. Same for renegotiations/reset flows. > And we should prevent for the double-queuing of chap negotiations to > begin with, instead of handling them (I still don't understand why this > is permitted, but perhaps just return EBUSY in this case?) We don't double queue; we're re-using the existing entries. Can you check if this fix works? diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c index c8a6db7c4498..4e824aab30eb 100644 --- a/drivers/nvme/host/auth.c +++ b/drivers/nvme/host/auth.c @@ -926,7 +926,6 @@ void nvme_auth_reset(struct nvme_ctrl *ctrl) mutex_lock(&ctrl->dhchap_auth_mutex); list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) { - mutex_unlock(&ctrl->dhchap_auth_mutex); flush_work(&chap->auth_work); __nvme_auth_reset(chap); } Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@suse.de +49 911 74053 688 SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), GF: Felix Imendörffer ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: bad unlock balance WARNING at nvme/045 2022-10-26 6:42 ` Hannes Reinecke @ 2022-10-26 12:01 ` Shinichiro Kawasaki 2022-10-26 12:38 ` Sagi Grimberg 2022-10-26 12:27 ` Sagi Grimberg 1 sibling, 1 reply; 8+ messages in thread From: Shinichiro Kawasaki @ 2022-10-26 12:01 UTC (permalink / raw) To: Hannes Reinecke; +Cc: Sagi Grimberg, linux-nvme, Damien Le Moal On Oct 26, 2022 / 08:42, Hannes Reinecke wrote: > On 10/18/22 12:57, Sagi Grimberg wrote: > > > > > Hello Hannes, > > > > > > I observed "WARNING: bad unlock balance detected!" at nvme/045 [1]. > > > As the Call > > > Trace shows, nvme_auth_reset() has unbalanced mutex lock/unlock. > > > > > > mutex_lock(&ctrl->dhchap_auth_mutex); > > > list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) { > > > mutex_unlock(&ctrl->dhchap_auth_mutex); > > > flush_work(&chap->auth_work); > > > __nvme_auth_reset(chap); > > > } > > > mutex_unlock(&ctrl->dhchap_auth_mutex); > > > > > > I tried to remove the mutex_unlock in the list iteration with a > > > patch [2], but > > > it resulted in another "WARNING: possible recursive locking > > > detected" [3]. I'm > > > not sure but cause of this WARN could be __nvme_auth_work and > > > nvme_dhchap_auth_work in same nvme_wq. > > > > > > Could you take a look for fix? > > > > I'm looking at the code and I think that the way the concurrent > > negotiations and how dhchap_auth_mutex is handled is very fragile, > > also why should the per-queue auth_work hold the controller-wide > > dhchap_auth_mutex? The only reason I see is because nvme_auth_negotiate > > is checking if the chap context is already queued? Why should we > > allow that? > > > Well; that's partially due to the internal design of linux-nvme. > The controller structure itself doesn't have 'queues' per se; there just is > a general 'ctrl' structure. So while I would have loved to have a per-queue > structure to hook the chap authentication into, all I have is the controller > structure. 
> Hence we have a controller-wide list holding all 'chap' structures for the > individual queues. > Hence the controller-wide mutex to gate list modifications. > > > I'd suggest to splice dhchap_auth_list, to a local list and then just > > flush nvmet_wq in teardown flows. Same for renegotiations/reset flows. > > And we should prevent for the double-queuing of chap negotiations to > > begin with, instead of handling them (I still don't understand why this > > is permitted, but perhaps just return EBUSY in this case?) > > We don't double queue; we're re-using the existing entries. Hannes, thanks for the explanations. > > Can you check if this fix works? > > diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c > index c8a6db7c4498..4e824aab30eb 100644 > --- a/drivers/nvme/host/auth.c > +++ b/drivers/nvme/host/auth.c > @@ -926,7 +926,6 @@ void nvme_auth_reset(struct nvme_ctrl *ctrl) > > mutex_lock(&ctrl->dhchap_auth_mutex); > list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) { > - mutex_unlock(&ctrl->dhchap_auth_mutex); > flush_work(&chap->auth_work); > __nvme_auth_reset(chap); > } I confirmed this hunk avoids the "WARNING: bad unlock balance detected!". As far as I ran blktests with this change, I observe no failure in other test cases. However, I observed another new WARN at nvme/045: "WARNING: possible recursive locking detected". I think it was caused by nvme_dhchap_auth_work in nvme_wq tried to flush another work __nvme_auth_work in the same workqueue. I created a patch below which creates another workqueue nvme_auth_wq for __nvme_auth_work. Do you think this fix approach is acceptable? diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c index 4e824aab30eb..946085070223 100644 --- a/drivers/nvme/host/auth.c +++ b/drivers/nvme/host/auth.c @@ -42,6 +42,8 @@ struct nvme_dhchap_queue_context { int sess_key_len; }; +struct workqueue_struct *nvme_auth_wq; + #define nvme_auth_flags_from_qid(qid) \ (qid == 0) ? 
0 : BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_RESERVED #define nvme_auth_queue_from_qid(ctrl, qid) \ @@ -869,7 +871,7 @@ int nvme_auth_negotiate(struct nvme_ctrl *ctrl, int qid) mutex_unlock(&ctrl->dhchap_auth_mutex); flush_work(&chap->auth_work); __nvme_auth_reset(chap); - queue_work(nvme_wq, &chap->auth_work); + queue_work(nvme_auth_wq, &chap->auth_work); return 0; } } @@ -896,7 +898,7 @@ int nvme_auth_negotiate(struct nvme_ctrl *ctrl, int qid) INIT_WORK(&chap->auth_work, __nvme_auth_work); list_add(&chap->entry, &ctrl->dhchap_auth_list); mutex_unlock(&ctrl->dhchap_auth_mutex); - queue_work(nvme_wq, &chap->auth_work); + queue_work(nvme_auth_wq, &chap->auth_work); return 0; } EXPORT_SYMBOL_GPL(nvme_auth_negotiate); @@ -969,6 +971,21 @@ static void nvme_dhchap_auth_work(struct work_struct *work) */ } +int nvme_auth_init(void) +{ + nvme_auth_wq = alloc_workqueue("nvme-auth-wq", + WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0); + if (!nvme_auth_wq) + return -ENOMEM; + + return 0; +} + +void nvme_auth_exit(void) +{ + destroy_workqueue(nvme_auth_wq); +} + void nvme_auth_init_ctrl(struct nvme_ctrl *ctrl) { INIT_LIST_HEAD(&ctrl->dhchap_auth_list); diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 059737c1a2c1..aa06c686ad29 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -5341,8 +5341,14 @@ static int __init nvme_core_init(void) goto unregister_generic_ns; } + result = nvme_auth_init(); + if (result) + goto exit_nvme_auth; + return 0; +exit_nvme_auth: + nvme_auth_exit(); unregister_generic_ns: unregister_chrdev_region(nvme_ns_chr_devt, NVME_MINORS); destroy_subsys_class: @@ -5363,6 +5369,7 @@ static int __init nvme_core_init(void) static void __exit nvme_core_exit(void) { + nvme_auth_exit(); class_destroy(nvme_ns_chr_class); class_destroy(nvme_subsys_class); class_destroy(nvme_class); diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index a29877217ee6..472ab2d14a67 100644 --- a/drivers/nvme/host/nvme.h +++ 
b/drivers/nvme/host/nvme.h @@ -1019,6 +1019,8 @@ static inline bool nvme_ctrl_sgl_supported(struct nvme_ctrl *ctrl) } #ifdef CONFIG_NVME_AUTH +int nvme_auth_init(void); +void nvme_auth_exit(void); void nvme_auth_init_ctrl(struct nvme_ctrl *ctrl); void nvme_auth_stop(struct nvme_ctrl *ctrl); int nvme_auth_negotiate(struct nvme_ctrl *ctrl, int qid); @@ -1026,6 +1028,8 @@ int nvme_auth_wait(struct nvme_ctrl *ctrl, int qid); void nvme_auth_reset(struct nvme_ctrl *ctrl); void nvme_auth_free(struct nvme_ctrl *ctrl); #else +static inline int nvme_auth_init(void) {}; +static inline void nvme_auth_exit(void) {}; static inline void nvme_auth_init_ctrl(struct nvme_ctrl *ctrl) {}; static inline void nvme_auth_stop(struct nvme_ctrl *ctrl) {}; static inline int nvme_auth_negotiate(struct nvme_ctrl *ctrl, int qid) -- Shin'ichiro Kawasaki ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: bad unlock balance WARNING at nvme/045 2022-10-26 12:01 ` Shinichiro Kawasaki @ 2022-10-26 12:38 ` Sagi Grimberg 2022-10-28 13:52 ` Hannes Reinecke 0 siblings, 1 reply; 8+ messages in thread From: Sagi Grimberg @ 2022-10-26 12:38 UTC (permalink / raw) To: Shinichiro Kawasaki, Hannes Reinecke; +Cc: linux-nvme, Damien Le Moal >>>> Hello Hannes, >>>> >>>> I observed "WARNING: bad unlock balance detected!" at nvme/045 [1]. >>>> As the Call >>>> Trace shows, nvme_auth_reset() has unbalanced mutex lock/unlock. >>>> >>>> mutex_lock(&ctrl->dhchap_auth_mutex); >>>> list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) { >>>> mutex_unlock(&ctrl->dhchap_auth_mutex); >>>> flush_work(&chap->auth_work); >>>> __nvme_auth_reset(chap); >>>> } >>>> mutex_unlock(&ctrl->dhchap_auth_mutex); >>>> >>>> I tried to remove the mutex_unlock in the list iteration with a >>>> patch [2], but >>>> it resulted in another "WARNING: possible recursive locking >>>> detected" [3]. I'm >>>> not sure but cause of this WARN could be __nvme_auth_work and >>>> nvme_dhchap_auth_work in same nvme_wq. >>>> >>>> Could you take a look for fix? >>> >>> I'm looking at the code and I think that the way the concurrent >>> negotiations and how dhchap_auth_mutex is handled is very fragile, >>> also why should the per-queue auth_work hold the controller-wide >>> dhchap_auth_mutex? The only reason I see is because nvme_auth_negotiate >>> is checking if the chap context is already queued? Why should we >>> allow that? >>> >> Well; that's partially due to the internal design of linux-nvme. >> The controller structure itself doesn't have 'queues' per se; there just is >> a general 'ctrl' structure. So while I would have loved to have a per-queue >> structure to hook the chap authentication into, all I have is the controller >> structure. >> Hence we have a controller-wide list holding all 'chap' structures for the >> individual queues. >> Hence the controller-wide mutex to gate list modifications. 
>>
>>> I'd suggest to splice dhchap_auth_list to a local list and then just
>>> flush nvmet_wq in teardown flows. Same for renegotiation/reset flows.
>>> And we should prevent the double-queuing of chap negotiations to
>>> begin with, instead of handling it (I still don't understand why this
>>> is permitted, but perhaps just return EBUSY in this case?)
>>
>> We don't double queue; we're re-using the existing entries.
>
> Hannes, thanks for the explanations.
>
>>
>> Can you check if this fix works?
>>
>> diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
>> index c8a6db7c4498..4e824aab30eb 100644
>> --- a/drivers/nvme/host/auth.c
>> +++ b/drivers/nvme/host/auth.c
>> @@ -926,7 +926,6 @@ void nvme_auth_reset(struct nvme_ctrl *ctrl)
>>
>>      mutex_lock(&ctrl->dhchap_auth_mutex);
>>      list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
>> -            mutex_unlock(&ctrl->dhchap_auth_mutex);
>>              flush_work(&chap->auth_work);
>>              __nvme_auth_reset(chap);
>>      }
>
> I confirmed this hunk avoids the "WARNING: bad unlock balance
> detected!". As far as I ran blktests with this change, I observed no
> failures in other test cases.
>
> However, I observed another new WARN at nvme/045: "WARNING: possible
> recursive locking detected". I think it was caused by
> nvme_dhchap_auth_work in nvme_wq trying to flush another work,
> __nvme_auth_work, in the same workqueue. I created a patch below which
> creates another workqueue, nvme_auth_wq, for __nvme_auth_work.
> Do you think this fix approach is acceptable?

It is fine to flush work on the same workqueue; the problem is that they
share the same lock.

I think what we should do is:
1. have chap contexts in an array and not a list so we don't need
   to traverse to locate a context.
2. omit the dhchap_auth_mutex as I don't see a need to protect the
   array.
3. synchronize only the chap context itself, either with a lock, or
   a flag, or a state (atomic).
4. get rid of long lived allocations, it is not optimizing a whole lot
   imho, and right now we are reserving something like 5k per queue.
5. add one more synchronization between chap and ctrl accessing the keys

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: bad unlock balance WARNING at nvme/045
  2022-10-26 12:38 ` Sagi Grimberg
@ 2022-10-28 13:52 ` Hannes Reinecke
  0 siblings, 0 replies; 8+ messages in thread
From: Hannes Reinecke @ 2022-10-28 13:52 UTC (permalink / raw)
To: Sagi Grimberg, Shinichiro Kawasaki; +Cc: linux-nvme, Damien Le Moal

On 10/26/22 14:38, Sagi Grimberg wrote:
> [ .. ]
>> Hannes, thanks for the explanations.
>>
>>>
>>> Can you check if this fix works?
>>>
>>> diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
>>> index c8a6db7c4498..4e824aab30eb 100644
>>> --- a/drivers/nvme/host/auth.c
>>> +++ b/drivers/nvme/host/auth.c
>>> @@ -926,7 +926,6 @@ void nvme_auth_reset(struct nvme_ctrl *ctrl)
>>>
>>>      mutex_lock(&ctrl->dhchap_auth_mutex);
>>>      list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
>>> -            mutex_unlock(&ctrl->dhchap_auth_mutex);
>>>              flush_work(&chap->auth_work);
>>>              __nvme_auth_reset(chap);
>>>      }
>>
>> I confirmed this hunk avoids the "WARNING: bad unlock balance
>> detected!". As far as I ran blktests with this change, I observed no
>> failures in other test cases.
>>
>> However, I observed another new WARN at nvme/045: "WARNING: possible
>> recursive locking detected". I think it was caused by
>> nvme_dhchap_auth_work in nvme_wq trying to flush another work,
>> __nvme_auth_work, in the same workqueue. I created a patch below which
>> creates another workqueue, nvme_auth_wq, for __nvme_auth_work.
>> Do you think this fix approach is acceptable?
>
> It is fine to flush work on the same workqueue; the problem is that they
> share the same lock.
>
> I think what we should do is:
> 1. have chap contexts in an array and not a list so we don't need
>    to traverse to locate a context.
> 2. omit the dhchap_auth_mutex as I don't see a need to protect the
>    array.
> 3. synchronize only the chap context itself, either with a lock, or
>    a flag, or a state (atomic).
> 4. get rid of long lived allocations, it is not optimizing a whole lot
>    imho, and right now we are reserving something like 5k per queue.
> 5. add one more synchronization between chap and ctrl accessing the keys
>
I've now sent a patchset which should address this; can you check if
that's what you had in mind?

Cheers,

Hannes
--
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: bad unlock balance WARNING at nvme/045
  2022-10-26  6:42 ` Hannes Reinecke
  2022-10-26 12:01   ` Shinichiro Kawasaki
@ 2022-10-26 12:27 ` Sagi Grimberg
  1 sibling, 0 replies; 8+ messages in thread
From: Sagi Grimberg @ 2022-10-26 12:27 UTC (permalink / raw)
To: Hannes Reinecke, Shinichiro Kawasaki, linux-nvme; +Cc: Damien Le Moal

On 10/26/22 09:42, Hannes Reinecke wrote:
> On 10/18/22 12:57, Sagi Grimberg wrote:
>>
>>> Hello Hannes,
>>>
>>> I observed "WARNING: bad unlock balance detected!" at nvme/045 [1].
>>> As the Call Trace shows, nvme_auth_reset() has unbalanced mutex
>>> lock/unlock.
>>>
>>>     mutex_lock(&ctrl->dhchap_auth_mutex);
>>>     list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
>>>             mutex_unlock(&ctrl->dhchap_auth_mutex);
>>>             flush_work(&chap->auth_work);
>>>             __nvme_auth_reset(chap);
>>>     }
>>>     mutex_unlock(&ctrl->dhchap_auth_mutex);
>>>
>>> I tried to remove the mutex_unlock in the list iteration with a
>>> patch [2], but it resulted in another "WARNING: possible recursive
>>> locking detected" [3]. I'm not sure, but the cause of this WARN
>>> could be __nvme_auth_work and nvme_dhchap_auth_work being in the
>>> same nvme_wq.
>>>
>>> Could you take a look for a fix?
>>
>> I'm looking at the code and I think that the way the concurrent
>> negotiations run and how dhchap_auth_mutex is handled is very
>> fragile. Also, why should the per-queue auth_work hold the
>> controller-wide dhchap_auth_mutex? The only reason I see is that
>> nvme_auth_negotiate is checking if the chap context is already
>> queued. Why should we allow that?
>>
> Well; that's partially due to the internal design of linux-nvme.
> The controller structure itself doesn't have 'queues' per se; there
> just is a general 'ctrl' structure. So while I would have loved to
> have a per-queue structure to hook the chap authentication into, all
> I have is the controller structure.
> Hence we have a controller-wide list holding all 'chap' structures
> for the individual queues.
> Hence the controller-wide mutex to gate list modifications.

This still can easily be an array that does not enforce a list
traversal.

>
>> I'd suggest to splice dhchap_auth_list to a local list and then just
>> flush nvmet_wq in teardown flows. Same for renegotiation/reset flows.
>> And we should prevent the double-queuing of chap negotiations to
>> begin with, instead of handling it (I still don't understand why this
>> is permitted, but perhaps just return EBUSY in this case?)
>
> We don't double queue; we're re-using the existing entries.

I'd argue that you shouldn't. It'd be better to simply delete entries
once queue auth is done...

> Can you check if this fix works?
>
> diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
> index c8a6db7c4498..4e824aab30eb 100644
> --- a/drivers/nvme/host/auth.c
> +++ b/drivers/nvme/host/auth.c
> @@ -926,7 +926,6 @@ void nvme_auth_reset(struct nvme_ctrl *ctrl)
>
>      mutex_lock(&ctrl->dhchap_auth_mutex);
>      list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
> -            mutex_unlock(&ctrl->dhchap_auth_mutex);
>              flush_work(&chap->auth_work);
>              __nvme_auth_reset(chap);
>      }

How can this work? auth_work currently acquires the mutex. This
shouldn't solve anything.

^ permalink raw reply	[flat|nested] 8+ messages in thread
end of thread, other threads:[~2022-10-28 13:52 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-18  8:03 bad unlock balance WARNING at nvme/045 Shinichiro Kawasaki
2022-10-18 10:57 ` Sagi Grimberg
2022-10-26  2:13   ` Shinichiro Kawasaki
2022-10-26  6:42     ` Hannes Reinecke
2022-10-26 12:01       ` Shinichiro Kawasaki
2022-10-26 12:38         ` Sagi Grimberg
2022-10-28 13:52           ` Hannes Reinecke
2022-10-26 12:27       ` Sagi Grimberg