* bad unlock balance WARNING at nvme/045
@ 2022-10-18 8:03 Shinichiro Kawasaki
2022-10-18 10:57 ` Sagi Grimberg
0 siblings, 1 reply; 8+ messages in thread
From: Shinichiro Kawasaki @ 2022-10-18 8:03 UTC (permalink / raw)
To: linux-nvme, Hannes Reinecke; +Cc: Damien Le Moal
Hello Hannes,
I observed "WARNING: bad unlock balance detected!" at nvme/045 [1]. As the Call
Trace shows, nvme_auth_reset() has unbalanced mutex lock/unlock.
	mutex_lock(&ctrl->dhchap_auth_mutex);
	list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
		mutex_unlock(&ctrl->dhchap_auth_mutex);
		flush_work(&chap->auth_work);
		__nvme_auth_reset(chap);
	}
	mutex_unlock(&ctrl->dhchap_auth_mutex);
I tried removing the mutex_unlock in the list iteration with a patch [2], but
it resulted in another warning, "WARNING: possible recursive locking detected"
[3]. I'm not sure, but the cause of this WARN could be that __nvme_auth_work
and nvme_dhchap_auth_work run in the same workqueue, nvme_wq.
Could you take a look and fix this?
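For reference, the imbalance is structural: the function takes the mutex once,
but unlocks once per list entry plus once more after the loop, so any non-empty
list releases the lock more times than it was acquired (and the list is walked
with the lock dropped). A minimal userspace model of just the lock counting,
with illustrative names (not kernel code):

```c
#include <assert.h>

/* Model the mutex as a depth counter: lock = +1, unlock = -1.
 * A negative depth is the "no more locks to release" condition
 * that lockdep reports as a bad unlock balance. */
static int depth;
static int min_depth;

static void lock_(void)   { depth++; }
static void unlock_(void) { depth--; if (depth < min_depth) min_depth = depth; }

/* Mirrors the structure of nvme_auth_reset(): one lock, an unlock per
 * list entry inside the loop, and one final unlock after the loop. */
int model_reset(int n_entries)
{
    min_depth = depth = 0;
    lock_();
    for (int i = 0; i < n_entries; i++)
        unlock_();          /* the mutex_unlock() inside list_for_each_entry() */
    unlock_();              /* the trailing mutex_unlock() */
    return min_depth;       /* < 0 means more unlocks than locks */
}
```

With two list entries the depth ends at -2; only an empty list stays balanced.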
[1]
[ 89.883480] loop: module loaded
[ 90.271575] run blktests nvme/045 at 2022-10-18 15:13:33
[ 90.774324] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 91.410877] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid: with DH-HMAC-CHAP.
[ 91.454315] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
[ 91.456445] nvme nvme6: qid 0: controller authenticated
[ 91.458581] nvme nvme6: qid 0: authenticated
[ 91.466122] nvme nvme6: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.
[ 91.469709] nvme nvme6: creating 4 I/O queues.
[ 91.620115] nvme nvme6: new ctrl: "blktests-subsystem-1"
[ 93.775756] =====================================
[ 93.776961] WARNING: bad unlock balance detected!
[ 93.778088] 6.1.0-rc1 #3 Not tainted
[ 93.779011] -------------------------------------
[ 93.780090] check/961 is trying to release lock (&ctrl->dhchap_auth_mutex) at:
[ 93.781726] [<ffffffffc03bf28b>] nvme_auth_reset+0x5b/0xb0 [nvme_core]
[ 93.783310] but there are no more locks to release!
[ 93.784461]
other info that might help us debug this:
[ 93.786087] 3 locks held by check/961:
[ 93.787098] #0: ffff8881135fa460 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0xe7/0x1b0
[ 93.788923] #1: ffff88811fb1d888 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x21d/0x530
[ 93.790888] #2: ffff88810d8160f0 (kn->active#110){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x241/0x530
[ 93.792960]
stack backtrace:
[ 93.794168] CPU: 1 PID: 961 Comm: check Not tainted 6.1.0-rc1 #3
[ 93.795552] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 93.798016] Call Trace:
[ 93.798716] <TASK>
[ 93.799298] dump_stack_lvl+0x5b/0x77
[ 93.800251] lock_release.cold+0x10/0x4e
[ 93.801244] ? nvme_auth_reset+0x5b/0xb0 [nvme_core]
[ 93.802479] ? reacquire_held_locks+0x4e0/0x4e0
[ 93.803537] ? mark_held_locks+0x9e/0xe0
[ 93.804544] __mutex_unlock_slowpath+0x8c/0x5f0
[ 93.805650] ? kasan_quarantine_put+0x94/0x1f0
[ 93.806759] ? bit_wait_timeout+0x170/0x170
[ 93.807757] ? __nvme_auth_reset+0x198/0x3d0 [nvme_core]
[ 93.809010] ? __kmem_cache_free+0xa9/0x390
[ 93.810059] nvme_auth_reset+0x5b/0xb0 [nvme_core]
[ 93.811253] nvme_ctrl_dhchap_secret_store+0x1b4/0x1d0 [nvme_core]
[ 93.812710] kernfs_fop_write_iter+0x356/0x530
[ 93.813754] vfs_write+0x519/0xc50
[ 93.814513] ? kernel_write+0x590/0x590
[ 93.815410] ? __up_read+0x182/0x700
[ 93.816252] ? __fget_light+0x51/0x230
[ 93.817092] ksys_write+0xe7/0x1b0
[ 93.817888] ? __ia32_sys_read+0xa0/0xa0
[ 93.818750] ? lockdep_hardirqs_on_prepare+0x17b/0x410
[ 93.819901] ? syscall_enter_from_user_mode+0x22/0xc0
[ 93.820929] ? lockdep_hardirqs_on+0x7d/0x100
[ 93.821848] do_syscall_64+0x37/0x90
[ 93.822583] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 93.823615] RIP: 0033:0x7f6f84d018f7
[ 93.824372] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 93.827699] RSP: 002b:00007ffcdb0ce5d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 93.829137] RAX: ffffffffffffffda RBX: 000000000000003c RCX: 00007f6f84d018f7
[ 93.830502] RDX: 000000000000003c RSI: 00005582004ce6e0 RDI: 0000000000000001
[ 93.831879] RBP: 00005582004ce6e0 R08: 0000000000000000 R09: 0000000000000073
[ 93.833236] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000003c
[ 93.834594] R13: 00007f6f84df9780 R14: 000000000000003c R15: 00007f6f84df49e0
[ 93.835979] </TASK>
[ 93.836823] nvme nvme6: re-authenticating controller
[ 93.871885] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
[ 93.873754] nvme nvme6: qid 0: controller authenticated
[ 93.916346] nvme nvme6: re-authenticating controller
[ 93.948156] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
[ 93.949498] nvme nvme6: qid 0: controller authenticated
[ 93.992725] nvme nvme6: re-authenticating controller
[ 94.025843] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
[ 94.027160] nvme nvme6: qid 0: controller authenticated
[ 98.836070] nvme nvme6: re-authenticating controller
[ 99.345893] nvme nvme6: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe8192
[ 99.347859] nvme nvme6: qid 0: controller authenticated
[ 99.736321] nvme nvme6: re-authenticating controller
[ 100.014469] nvme nvme6: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe8192
[ 100.015192] nvme nvme6: qid 0: controller authenticated
[ 101.560718] nvme nvme6: Removing ctrl: NQN "blktests-subsystem-1"
[2]
diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
index c8a6db7c4498..4e824aab30eb 100644
--- a/drivers/nvme/host/auth.c
+++ b/drivers/nvme/host/auth.c
@@ -926,7 +926,6 @@ void nvme_auth_reset(struct nvme_ctrl *ctrl)
mutex_lock(&ctrl->dhchap_auth_mutex);
list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
- mutex_unlock(&ctrl->dhchap_auth_mutex);
flush_work(&chap->auth_work);
__nvme_auth_reset(chap);
}
[3]
[ 2678.264545] loop: module loaded
[ 2678.679919] run blktests nvme/045 at 2022-10-18 16:34:00
[ 2679.231157] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 2679.854789] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid: with DH-HMAC-CHAP.
[ 2679.898208] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
[ 2679.900508] nvme nvme6: qid 0: controller authenticated
[ 2679.902183] nvme nvme6: qid 0: authenticated
[ 2679.907944] nvme nvme6: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.
[ 2679.912737] nvme nvme6: creating 4 I/O queues.
[ 2680.068539] nvme nvme6: new ctrl: "blktests-subsystem-1"
[ 2682.156749] nvme nvme6: re-authenticating controller
[ 2682.160640] ============================================
[ 2682.161903] WARNING: possible recursive locking detected
[ 2682.163112] 6.1.0-rc1+ #4 Not tainted
[ 2682.164063] --------------------------------------------
[ 2682.165319] kworker/u8:0/927 is trying to acquire lock:
[ 2682.166573] ffff888119d27138 ((wq_completion)nvme-wq){+.+.}-{0:0}, at: __flush_work+0x40e/0x900
[ 2682.168529]
but task is already holding lock:
[ 2682.170022] ffff888119d27138 ((wq_completion)nvme-wq){+.+.}-{0:0}, at: process_one_work+0x73c/0x1300
[ 2682.172088]
other info that might help us debug this:
[ 2682.173695] Possible unsafe locking scenario:
[ 2682.175196] CPU0
[ 2682.175908] ----
[ 2682.176559] lock((wq_completion)nvme-wq);
[ 2682.177589] lock((wq_completion)nvme-wq);
[ 2682.178631]
*** DEADLOCK ***
[ 2682.180252] May be due to missing lock nesting notation
[ 2682.181918] 3 locks held by kworker/u8:0/927:
[ 2682.182988] #0: ffff888119d27138 ((wq_completion)nvme-wq){+.+.}-{0:0}, at: process_one_work+0x73c/0x1300
[ 2682.185078] #1: ffff88810c8bfdd0 ((work_completion)(&ctrl->dhchap_auth_work)){+.+.}-{0:0}, at: process_one_work+0x769/0x1300
[ 2682.187535] #2: ffffffffb483d560 (rcu_read_lock){....}-{1:2}, at: __flush_work+0xc2/0x900
[ 2682.189373]
stack backtrace:
[ 2682.190590] CPU: 2 PID: 927 Comm: kworker/u8:0 Not tainted 6.1.0-rc1+ #4
[ 2682.192118] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 2682.194630] Workqueue: nvme-wq nvme_dhchap_auth_work [nvme_core]
[ 2682.196108] Call Trace:
[ 2682.196755] <TASK>
[ 2682.197290] dump_stack_lvl+0x5b/0x77
[ 2682.198143] __lock_acquire.cold+0x36f/0x3f5
[ 2682.198510] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
[ 2682.199075] ? lock_chain_count+0x20/0x20
[ 2682.199089] ? lockdep_hardirqs_on_prepare+0x410/0x410
[ 2682.200949] nvme nvme6: qid 0: controller authenticated
[ 2682.201771] lock_acquire+0x194/0x4f0
[ 2682.201783] ? __flush_work+0x40e/0x900
[ 2682.201796] ? lock_downgrade+0x6b0/0x6b0
[ 2682.201808] ? mark_held_locks+0x9e/0xe0
[ 2682.207078] ? lockdep_hardirqs_on_prepare+0x17b/0x410
[ 2682.208041] __flush_work+0x42e/0x900
[ 2682.208736] ? __flush_work+0x40e/0x900
[ 2682.209507] ? queue_delayed_work_on+0x90/0x90
[ 2682.210322] ? flush_workqueue_prep_pwqs+0x3f0/0x3f0
[ 2682.211209] nvme_dhchap_auth_work+0xf1/0x1f8 [nvme_core]
[ 2682.212163] process_one_work+0x816/0x1300
[ 2682.212858] ? lock_downgrade+0x6b0/0x6b0
[ 2682.213589] ? pwq_dec_nr_in_flight+0x230/0x230
[ 2682.214367] ? rwlock_bug.part.0+0x90/0x90
[ 2682.215062] worker_thread+0xfc/0x1270
[ 2682.215670] ? __kthread_parkme+0xc1/0x1f0
[ 2682.216367] ? process_one_work+0x1300/0x1300
[ 2682.217052] kthread+0x29b/0x340
[ 2682.217620] ? kthread_complete_and_exit+0x20/0x20
[ 2682.218313] ret_from_fork+0x1f/0x30
[ 2682.218922] </TASK>
[ 2682.254321] nvme nvme6: re-authenticating controller
[ 2682.292068] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
[ 2682.293937] nvme nvme6: qid 0: controller authenticated
[ 2682.334786] nvme nvme6: re-authenticating controller
[ 2682.364032] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
[ 2682.365898] nvme nvme6: qid 0: controller authenticated
[ 2686.934974] nvme nvme6: re-authenticating controller
[ 2687.345435] nvme nvme6: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe8192
[ 2687.347055] nvme nvme6: qid 0: controller authenticated
[ 2687.738816] nvme nvme6: re-authenticating controller
[ 2688.133101] nvme nvme6: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe8192
[ 2688.135123] nvme nvme6: qid 0: controller authenticated
[ 2689.624308] nvme nvme6: Removing ctrl: NQN "blktests-subsystem-1"
[ 2886.071612] loop: module loaded
[ 2886.269473] run blktests nvme/045 at 2022-10-18 16:37:28
[ 2886.533625] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 2886.726637] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid: with DH-HMAC-CHAP.
[ 2886.763826] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
[ 2886.765778] nvme nvme6: qid 0: controller authenticated
[ 2886.767199] nvme nvme6: qid 0: authenticated
[ 2886.769632] nvme nvme6: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.
[ 2886.772454] nvme nvme6: creating 4 I/O queues.
[ 2886.918047] nvme nvme6: new ctrl: "blktests-subsystem-1"
[ 2888.496721] nvme nvme6: re-authenticating controller
[ 2888.532075] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
[ 2888.534031] nvme nvme6: qid 0: controller authenticated
[ 2888.583146] nvme nvme6: re-authenticating controller
[ 2888.615654] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
[ 2888.617588] nvme nvme6: qid 0: controller authenticated
[ 2888.656311] nvme nvme6: re-authenticating controller
[ 2888.683953] nvme nvme6: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
[ 2888.685868] nvme nvme6: qid 0: controller authenticated
[ 2888.724955] nvme nvme6: re-authenticating controller
[ 2889.200478] nvme nvme6: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe8192
[ 2889.202425] nvme nvme6: qid 0: controller authenticated
[ 2889.625949] nvme nvme6: re-authenticating controller
[ 2889.992842] nvme nvme6: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe8192
[ 2889.994815] nvme nvme6: qid 0: controller authenticated
[ 2891.514773] nvme nvme6: Removing ctrl: NQN "blktests-subsystem-1"
--
Shin'ichiro Kawasaki
* Re: bad unlock balance WARNING at nvme/045
2022-10-18 8:03 bad unlock balance WARNING at nvme/045 Shinichiro Kawasaki
@ 2022-10-18 10:57 ` Sagi Grimberg
2022-10-26 2:13 ` Shinichiro Kawasaki
2022-10-26 6:42 ` Hannes Reinecke
0 siblings, 2 replies; 8+ messages in thread
From: Sagi Grimberg @ 2022-10-18 10:57 UTC (permalink / raw)
To: Shinichiro Kawasaki, linux-nvme, Hannes Reinecke; +Cc: Damien Le Moal
> Hello Hannes,
>
> I observed "WARNING: bad unlock balance detected!" at nvme/045 [1]. As the Call
> Trace shows, nvme_auth_reset() has unbalanced mutex lock/unlock.
>
> mutex_lock(&ctrl->dhchap_auth_mutex);
> list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
> mutex_unlock(&ctrl->dhchap_auth_mutex);
> flush_work(&chap->auth_work);
> __nvme_auth_reset(chap);
> }
> mutex_unlock(&ctrl->dhchap_auth_mutex);
>
> I tried to remove the mutex_unlock in the list iteration with a patch [2], but
> it resulted in another "WARNING: possible recursive locking detected" [3]. I'm
> not sure but cause of this WARN could be __nvme_auth_work and
> nvme_dhchap_auth_work in same nvme_wq.
>
> Could you take a look for fix?
I'm looking at the code, and I think that the way the concurrent
negotiations run and the way dhchap_auth_mutex is handled are very
fragile. Also, why should the per-queue auth_work hold the
controller-wide dhchap_auth_mutex? The only reason I see is that
nvme_auth_negotiate checks whether the chap context is already queued.
Why should we allow that?
I'd suggest splicing dhchap_auth_list to a local list and then just
flushing nvme_wq in the teardown flows. Same for renegotiation/reset flows.
And we should prevent the double-queuing of chap negotiations to
begin with, instead of handling it (I still don't understand why this
is permitted, but perhaps just return -EBUSY in this case?)
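The splice idea above can be sketched in plain userspace C (hand-rolled list
and stubbed lock, names purely illustrative, not the kernel API): steal the
whole list while holding the mutex, then flush and free the entries with the
mutex dropped, so the iteration never needs to touch the lock at all.

```c
#include <assert.h>
#include <stddef.h>

/* Single-threaded stand-ins for the kernel primitives (illustrative only).
 * The lock model asserts on imbalance, which is what lockdep checks. */
static int mutex_held;
static void mutex_lock_(void)   { assert(!mutex_held); mutex_held = 1; }
static void mutex_unlock_(void) { assert(mutex_held);  mutex_held = 0; }

struct chap_ctx {
    int qid;
    struct chap_ctx *next;
};

static struct chap_ctx *dhchap_auth_list;   /* controller-wide list head */

/* Splice pattern: detach the whole list under the mutex, then tear the
 * entries down outside it -- the loop never locks or unlocks, so the
 * lock/unlock count is balanced by construction. */
static int auth_free_all(void)
{
    struct chap_ctx *spliced, *chap, *next;
    int n = 0;

    mutex_lock_();
    spliced = dhchap_auth_list;     /* list_splice_init() stand-in */
    dhchap_auth_list = NULL;
    mutex_unlock_();

    for (chap = spliced; chap; chap = next) {
        next = chap->next;
        /* flush_work(&chap->auth_work); __nvme_auth_free(chap); */
        n++;
    }
    return n;
}

int demo(void)
{
    static struct chap_ctx q1 = { .qid = 1 }, q2 = { .qid = 2 };

    q1.next = &q2;
    q2.next = NULL;
    dhchap_auth_list = &q1;
    return auth_free_all();         /* tears down both contexts */
}
```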
* Re: bad unlock balance WARNING at nvme/045
2022-10-18 10:57 ` Sagi Grimberg
@ 2022-10-26 2:13 ` Shinichiro Kawasaki
2022-10-26 6:42 ` Hannes Reinecke
1 sibling, 0 replies; 8+ messages in thread
From: Shinichiro Kawasaki @ 2022-10-26 2:13 UTC (permalink / raw)
To: Sagi Grimberg; +Cc: linux-nvme, Hannes Reinecke, Damien Le Moal
On Oct 18, 2022 / 13:57, Sagi Grimberg wrote:
>
> > Hello Hannes,
> >
> > I observed "WARNING: bad unlock balance detected!" at nvme/045 [1]. As the Call
> > Trace shows, nvme_auth_reset() has unbalanced mutex lock/unlock.
> >
> > mutex_lock(&ctrl->dhchap_auth_mutex);
> > list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
> > mutex_unlock(&ctrl->dhchap_auth_mutex);
> > flush_work(&chap->auth_work);
> > __nvme_auth_reset(chap);
> > }
> > mutex_unlock(&ctrl->dhchap_auth_mutex);
> >
> > I tried to remove the mutex_unlock in the list iteration with a patch [2], but
> > it resulted in another "WARNING: possible recursive locking detected" [3]. I'm
> > not sure but cause of this WARN could be __nvme_auth_work and
> > nvme_dhchap_auth_work in same nvme_wq.
> >
> > Could you take a look for fix?
Sagi, thank you for the comments.
> I'm looking at the code and I think that the way the concurrent
> negotiations and how dhchap_auth_mutex is handled is very fragile,
> also why should the per-queue auth_work hold the controller-wide
> dhchap_auth_mutex?
It looks to me like the dhchap_auth_mutex lock in nvme_auth_negotiate()
simply guards access to ctrl->dhchap_auth_list (looking up the list and
adding an element to it).
> The only reason I see is because nvme_auth_negotiate
> is checking if the chap context is already queued? Why should we
> allow that?
I'm not sure about this. I would like to ask other experts to comment on it.
>
> I'd suggest to splice dhchap_auth_list, to a local list and then just
> flush nvmet_wq in teardown flows. Same for renegotiations/reset flows.
Thanks for the suggestion. I'm not sure how to use a list splice of
dhchap_auth_list at renegotiation. But as for reset by nvme_auth_reset(),
your suggestion looks good to me as a way to avoid the unbalanced lock. It can
also be applied to teardown by nvme_auth_free(). I created a patch [1] with
this approach. It just frees the dhchap contexts at reset and does not reuse
them for negotiation. I think this is fine since the contexts are re-allocated
at renegotiation.
> And we should prevent for the double-queuing of chap negotiations to
> begin with, instead of handling them (I still don't understand why this
> is permitted, but perhaps just return EBUSY in this case?)
Again, I'm not sure about this point.
I confirmed that the patch [1] avoids the unbalanced lock. However, the other
"WARNING: possible recursive locking detected" was still reported. I created
another patch that adds a workqueue dedicated to nvme auth, which avoided that
WARN. Are these fix approaches OK? If so, I can post the two patches for the
two WARNs.
[1]
diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
index c8a6db7c4498..c71f1be7d403 100644
--- a/drivers/nvme/host/auth.c
+++ b/drivers/nvme/host/auth.c
@@ -920,17 +920,25 @@ int nvme_auth_wait(struct nvme_ctrl *ctrl, int qid)
}
EXPORT_SYMBOL_GPL(nvme_auth_wait);
-void nvme_auth_reset(struct nvme_ctrl *ctrl)
+static void nvme_auth_free_queue_contexts(struct nvme_ctrl *ctrl)
{
- struct nvme_dhchap_queue_context *chap;
+ struct nvme_dhchap_queue_context *chap = NULL, *tmp;
+ LIST_HEAD(splice);
mutex_lock(&ctrl->dhchap_auth_mutex);
- list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
- mutex_unlock(&ctrl->dhchap_auth_mutex);
+ list_splice_init(&ctrl->dhchap_auth_list, &splice);
+ mutex_unlock(&ctrl->dhchap_auth_mutex);
+
+ list_for_each_entry_safe(chap, tmp, &splice, entry) {
+ list_del_init(&chap->entry);
flush_work(&chap->auth_work);
- __nvme_auth_reset(chap);
+ __nvme_auth_free(chap);
}
- mutex_unlock(&ctrl->dhchap_auth_mutex);
+}
+
+void nvme_auth_reset(struct nvme_ctrl *ctrl)
+{
+ nvme_auth_free_queue_contexts(ctrl);
}
EXPORT_SYMBOL_GPL(nvme_auth_reset);
@@ -996,15 +1004,8 @@ EXPORT_SYMBOL_GPL(nvme_auth_stop);
void nvme_auth_free(struct nvme_ctrl *ctrl)
{
- struct nvme_dhchap_queue_context *chap = NULL, *tmp;
+ nvme_auth_free_queue_contexts(ctrl);
- mutex_lock(&ctrl->dhchap_auth_mutex);
- list_for_each_entry_safe(chap, tmp, &ctrl->dhchap_auth_list, entry) {
- list_del_init(&chap->entry);
- flush_work(&chap->auth_work);
- __nvme_auth_free(chap);
- }
- mutex_unlock(&ctrl->dhchap_auth_mutex);
if (ctrl->host_key) {
nvme_auth_free_key(ctrl->host_key);
ctrl->host_key = NULL;
--
Shin'ichiro Kawasaki
* Re: bad unlock balance WARNING at nvme/045
2022-10-18 10:57 ` Sagi Grimberg
2022-10-26 2:13 ` Shinichiro Kawasaki
@ 2022-10-26 6:42 ` Hannes Reinecke
2022-10-26 12:01 ` Shinichiro Kawasaki
2022-10-26 12:27 ` Sagi Grimberg
1 sibling, 2 replies; 8+ messages in thread
From: Hannes Reinecke @ 2022-10-26 6:42 UTC (permalink / raw)
To: Sagi Grimberg, Shinichiro Kawasaki, linux-nvme; +Cc: Damien Le Moal
On 10/18/22 12:57, Sagi Grimberg wrote:
>
>> Hello Hannes,
>>
>> I observed "WARNING: bad unlock balance detected!" at nvme/045 [1]. As
>> the Call
>> Trace shows, nvme_auth_reset() has unbalanced mutex lock/unlock.
>>
>> mutex_lock(&ctrl->dhchap_auth_mutex);
>> list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
>> mutex_unlock(&ctrl->dhchap_auth_mutex);
>> flush_work(&chap->auth_work);
>> __nvme_auth_reset(chap);
>> }
>> mutex_unlock(&ctrl->dhchap_auth_mutex);
>>
>> I tried to remove the mutex_unlock in the list iteration with a patch
>> [2], but
>> it resulted in another "WARNING: possible recursive locking detected"
>> [3]. I'm
>> not sure but cause of this WARN could be __nvme_auth_work and
>> nvme_dhchap_auth_work in same nvme_wq.
>>
>> Could you take a look for fix?
>
> I'm looking at the code and I think that the way the concurrent
> negotiations and how dhchap_auth_mutex is handled is very fragile,
> also why should the per-queue auth_work hold the controller-wide
> dhchap_auth_mutex? The only reason I see is because nvme_auth_negotiate
> is checking if the chap context is already queued? Why should we
> allow that?
>
Well, that's partially due to the internal design of linux-nvme.
The controller structure itself doesn't have 'queues' per se; there is
just a general 'ctrl' structure. So while I would have loved to have a
per-queue structure to hook the chap authentication into, all I have is
the controller structure.
Hence we have a controller-wide list holding all 'chap' structures for
the individual queues, and hence the controller-wide mutex to gate list
modifications.
> I'd suggest to splice dhchap_auth_list, to a local list and then just
> flush nvmet_wq in teardown flows. Same for renegotiations/reset flows.
> And we should prevent for the double-queuing of chap negotiations to
> begin with, instead of handling them (I still don't understand why this
> is permitted, but perhaps just return EBUSY in this case?)
We don't double queue; we're re-using the existing entries.
Can you check if this fix works?
diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
index c8a6db7c4498..4e824aab30eb 100644
--- a/drivers/nvme/host/auth.c
+++ b/drivers/nvme/host/auth.c
@@ -926,7 +926,6 @@ void nvme_auth_reset(struct nvme_ctrl *ctrl)
mutex_lock(&ctrl->dhchap_auth_mutex);
list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
- mutex_unlock(&ctrl->dhchap_auth_mutex);
flush_work(&chap->auth_work);
__nvme_auth_reset(chap);
}
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
* Re: bad unlock balance WARNING at nvme/045
2022-10-26 6:42 ` Hannes Reinecke
@ 2022-10-26 12:01 ` Shinichiro Kawasaki
2022-10-26 12:38 ` Sagi Grimberg
2022-10-26 12:27 ` Sagi Grimberg
1 sibling, 1 reply; 8+ messages in thread
From: Shinichiro Kawasaki @ 2022-10-26 12:01 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: Sagi Grimberg, linux-nvme, Damien Le Moal
On Oct 26, 2022 / 08:42, Hannes Reinecke wrote:
> On 10/18/22 12:57, Sagi Grimberg wrote:
> >
> > > Hello Hannes,
> > >
> > > I observed "WARNING: bad unlock balance detected!" at nvme/045 [1].
> > > As the Call
> > > Trace shows, nvme_auth_reset() has unbalanced mutex lock/unlock.
> > >
> > > mutex_lock(&ctrl->dhchap_auth_mutex);
> > > list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
> > > mutex_unlock(&ctrl->dhchap_auth_mutex);
> > > flush_work(&chap->auth_work);
> > > __nvme_auth_reset(chap);
> > > }
> > > mutex_unlock(&ctrl->dhchap_auth_mutex);
> > >
> > > I tried to remove the mutex_unlock in the list iteration with a
> > > patch [2], but
> > > it resulted in another "WARNING: possible recursive locking
> > > detected" [3]. I'm
> > > not sure but cause of this WARN could be __nvme_auth_work and
> > > nvme_dhchap_auth_work in same nvme_wq.
> > >
> > > Could you take a look for fix?
> >
> > I'm looking at the code and I think that the way the concurrent
> > negotiations and how dhchap_auth_mutex is handled is very fragile,
> > also why should the per-queue auth_work hold the controller-wide
> > dhchap_auth_mutex? The only reason I see is because nvme_auth_negotiate
> > is checking if the chap context is already queued? Why should we
> > allow that?
> >
> Well; that's partially due to the internal design of linux-nvme.
> The controller structure itself doesn't have 'queues' per se; there just is
> a general 'ctrl' structure. So while I would have loved to have a per-queue
> structure to hook the chap authentication into, all I have is the controller
> structure.
> Hence we have a controller-wide list holding all 'chap' structures for the
> individual queues.
> Hence the controller-wide mutex to gate list modifications.
>
> > I'd suggest to splice dhchap_auth_list, to a local list and then just
> > flush nvmet_wq in teardown flows. Same for renegotiations/reset flows.
> > And we should prevent for the double-queuing of chap negotiations to
> > begin with, instead of handling them (I still don't understand why this
> > is permitted, but perhaps just return EBUSY in this case?)
>
> We don't double queue; we're re-using the existing entries.
Hannes, thanks for the explanations.
>
> Can you check if this fix works?
>
> diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
> index c8a6db7c4498..4e824aab30eb 100644
> --- a/drivers/nvme/host/auth.c
> +++ b/drivers/nvme/host/auth.c
> @@ -926,7 +926,6 @@ void nvme_auth_reset(struct nvme_ctrl *ctrl)
>
> mutex_lock(&ctrl->dhchap_auth_mutex);
> list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
> - mutex_unlock(&ctrl->dhchap_auth_mutex);
> flush_work(&chap->auth_work);
> __nvme_auth_reset(chap);
> }
I confirmed this hunk avoids the "WARNING: bad unlock balance detected!". As
far as I have run blktests with this change, I observed no failures in other
test cases.
However, I observed another new WARN at nvme/045: "WARNING: possible recursive
locking detected". I think it was caused by nvme_dhchap_auth_work on nvme_wq
trying to flush another work item, __nvme_auth_work, on the same workqueue. I
created a patch below which adds a separate workqueue, nvme_auth_wq, for
__nvme_auth_work. Do you think this fix approach is acceptable?
diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
index 4e824aab30eb..946085070223 100644
--- a/drivers/nvme/host/auth.c
+++ b/drivers/nvme/host/auth.c
@@ -42,6 +42,8 @@ struct nvme_dhchap_queue_context {
int sess_key_len;
};
+struct workqueue_struct *nvme_auth_wq;
+
#define nvme_auth_flags_from_qid(qid) \
(qid == 0) ? 0 : BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_RESERVED
#define nvme_auth_queue_from_qid(ctrl, qid) \
@@ -869,7 +871,7 @@ int nvme_auth_negotiate(struct nvme_ctrl *ctrl, int qid)
mutex_unlock(&ctrl->dhchap_auth_mutex);
flush_work(&chap->auth_work);
__nvme_auth_reset(chap);
- queue_work(nvme_wq, &chap->auth_work);
+ queue_work(nvme_auth_wq, &chap->auth_work);
return 0;
}
}
@@ -896,7 +898,7 @@ int nvme_auth_negotiate(struct nvme_ctrl *ctrl, int qid)
INIT_WORK(&chap->auth_work, __nvme_auth_work);
list_add(&chap->entry, &ctrl->dhchap_auth_list);
mutex_unlock(&ctrl->dhchap_auth_mutex);
- queue_work(nvme_wq, &chap->auth_work);
+ queue_work(nvme_auth_wq, &chap->auth_work);
return 0;
}
EXPORT_SYMBOL_GPL(nvme_auth_negotiate);
@@ -969,6 +971,21 @@ static void nvme_dhchap_auth_work(struct work_struct *work)
*/
}
+int nvme_auth_init(void)
+{
+ nvme_auth_wq = alloc_workqueue("nvme-auth-wq",
+ WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
+ if (!nvme_auth_wq)
+ return -ENOMEM;
+
+ return 0;
+}
+
+void nvme_auth_exit(void)
+{
+ destroy_workqueue(nvme_auth_wq);
+}
+
void nvme_auth_init_ctrl(struct nvme_ctrl *ctrl)
{
INIT_LIST_HEAD(&ctrl->dhchap_auth_list);
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 059737c1a2c1..aa06c686ad29 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -5341,8 +5341,12 @@ static int __init nvme_core_init(void)
goto unregister_generic_ns;
}
+ result = nvme_auth_init();
+ if (result)
+ goto unregister_generic_ns;
+
return 0;
unregister_generic_ns:
unregister_chrdev_region(nvme_ns_chr_devt, NVME_MINORS);
destroy_subsys_class:
@@ -5363,6 +5369,7 @@ static int __init nvme_core_init(void)
static void __exit nvme_core_exit(void)
{
+ nvme_auth_exit();
class_destroy(nvme_ns_chr_class);
class_destroy(nvme_subsys_class);
class_destroy(nvme_class);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a29877217ee6..472ab2d14a67 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -1019,6 +1019,8 @@ static inline bool nvme_ctrl_sgl_supported(struct nvme_ctrl *ctrl)
}
#ifdef CONFIG_NVME_AUTH
+int nvme_auth_init(void);
+void nvme_auth_exit(void);
void nvme_auth_init_ctrl(struct nvme_ctrl *ctrl);
void nvme_auth_stop(struct nvme_ctrl *ctrl);
int nvme_auth_negotiate(struct nvme_ctrl *ctrl, int qid);
@@ -1026,6 +1028,8 @@ int nvme_auth_wait(struct nvme_ctrl *ctrl, int qid);
void nvme_auth_reset(struct nvme_ctrl *ctrl);
void nvme_auth_free(struct nvme_ctrl *ctrl);
#else
+static inline int nvme_auth_init(void) { return 0; }
+static inline void nvme_auth_exit(void) {}
static inline void nvme_auth_init_ctrl(struct nvme_ctrl *ctrl) {};
static inline void nvme_auth_stop(struct nvme_ctrl *ctrl) {};
static inline int nvme_auth_negotiate(struct nvme_ctrl *ctrl, int qid)
--
Shin'ichiro Kawasaki
* Re: bad unlock balance WARNING at nvme/045
2022-10-26 6:42 ` Hannes Reinecke
2022-10-26 12:01 ` Shinichiro Kawasaki
@ 2022-10-26 12:27 ` Sagi Grimberg
1 sibling, 0 replies; 8+ messages in thread
From: Sagi Grimberg @ 2022-10-26 12:27 UTC (permalink / raw)
To: Hannes Reinecke, Shinichiro Kawasaki, linux-nvme; +Cc: Damien Le Moal
On 10/26/22 09:42, Hannes Reinecke wrote:
> On 10/18/22 12:57, Sagi Grimberg wrote:
>>
>>> Hello Hannes,
>>>
>>> I observed "WARNING: bad unlock balance detected!" at nvme/045 [1].
>>> As the Call
>>> Trace shows, nvme_auth_reset() has unbalanced mutex lock/unlock.
>>>
>>> mutex_lock(&ctrl->dhchap_auth_mutex);
>>> list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
>>> mutex_unlock(&ctrl->dhchap_auth_mutex);
>>> flush_work(&chap->auth_work);
>>> __nvme_auth_reset(chap);
>>> }
>>> mutex_unlock(&ctrl->dhchap_auth_mutex);
>>>
>>> I tried to remove the mutex_unlock in the list iteration with a patch
>>> [2], but
>>> it resulted in another "WARNING: possible recursive locking detected"
>>> [3]. I'm
>>> not sure but cause of this WARN could be __nvme_auth_work and
>>> nvme_dhchap_auth_work in same nvme_wq.
>>>
>>> Could you take a look for fix?
>>
>> I'm looking at the code and I think that the way the concurrent
>> negotiations and how dhchap_auth_mutex is handled is very fragile,
>> also why should the per-queue auth_work hold the controller-wide
>> dhchap_auth_mutex? The only reason I see is because nvme_auth_negotiate
>> is checking if the chap context is already queued? Why should we
>> allow that?
>>
> Well; that's partially due to the internal design of linux-nvme.
> The controller structure itself doesn't have 'queues' per se; there just
> is a general 'ctrl' structure. So while I would have loved to have a
> per-queue structure to hook the chap authentication into, all I have is
> the controller structure.
> Hence we have a controller-wide list holding all 'chap' structures for
> the individual queues.
> Hence the controller-wide mutex to gate list modifications.
This still can easily be an array, which would not require a list traversal.
>
>> I'd suggest to splice dhchap_auth_list, to a local list and then just
>> flush nvmet_wq in teardown flows. Same for renegotiations/reset flows.
>> And we should prevent for the double-queuing of chap negotiations to
>> begin with, instead of handling them (I still don't understand why this
>> is permitted, but perhaps just return EBUSY in this case?)
>
> We don't double queue; we're re-using the existing entries.
I'd argue that you shouldn't. It'd be better to simply delete entries
once queue auth is done...
> Can you check if this fix works?
>
> diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
> index c8a6db7c4498..4e824aab30eb 100644
> --- a/drivers/nvme/host/auth.c
> +++ b/drivers/nvme/host/auth.c
> @@ -926,7 +926,6 @@ void nvme_auth_reset(struct nvme_ctrl *ctrl)
>
> mutex_lock(&ctrl->dhchap_auth_mutex);
> list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
> - mutex_unlock(&ctrl->dhchap_auth_mutex);
> flush_work(&chap->auth_work);
> __nvme_auth_reset(chap);
> }
How can this work? auth_work itself acquires the mutex, so this
shouldn't solve anything.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bad unlock balance WARNING at nvme/045
2022-10-26 12:01 ` Shinichiro Kawasaki
@ 2022-10-26 12:38 ` Sagi Grimberg
2022-10-28 13:52 ` Hannes Reinecke
0 siblings, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2022-10-26 12:38 UTC (permalink / raw)
To: Shinichiro Kawasaki, Hannes Reinecke; +Cc: linux-nvme, Damien Le Moal
>>>> Hello Hannes,
>>>>
>>>> I observed "WARNING: bad unlock balance detected!" at nvme/045 [1].
>>>> As the Call Trace shows, nvme_auth_reset() has an unbalanced mutex
>>>> lock/unlock.
>>>>
>>>> mutex_lock(&ctrl->dhchap_auth_mutex);
>>>> list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
>>>> mutex_unlock(&ctrl->dhchap_auth_mutex);
>>>> flush_work(&chap->auth_work);
>>>> __nvme_auth_reset(chap);
>>>> }
>>>> mutex_unlock(&ctrl->dhchap_auth_mutex);
>>>>
>>>> I tried to remove the mutex_unlock in the list iteration with a
>>>> patch [2], but it resulted in another "WARNING: possible recursive
>>>> locking detected" [3]. I'm not sure, but the cause of this WARN
>>>> could be __nvme_auth_work and nvme_dhchap_auth_work running in the
>>>> same nvme_wq.
>>>>
>>>> Could you take a look and suggest a fix?
>>>
>>> I'm looking at the code and I think the way concurrent negotiations
>>> and the dhchap_auth_mutex are handled is very fragile. Also, why
>>> should the per-queue auth_work hold the controller-wide
>>> dhchap_auth_mutex? The only reason I see is that nvme_auth_negotiate
>>> checks whether the chap context is already queued. Why should we
>>> allow that?
>>>
>> Well; that's partially due to the internal design of linux-nvme.
>> The controller structure itself doesn't have 'queues' per se; there just is
>> a general 'ctrl' structure. So while I would have loved to have a per-queue
>> structure to hook the chap authentication into, all I have is the controller
>> structure.
>> Hence we have a controller-wide list holding all 'chap' structures for the
>> individual queues.
>> Hence the controller-wide mutex to gate list modifications.
>>
>>> I'd suggest splicing dhchap_auth_list to a local list and then just
>>> flushing nvmet_wq in teardown flows. Same for renegotiation/reset flows.
>>> And we should prevent the double-queuing of chap negotiations to
>>> begin with, instead of handling it (I still don't understand why this
>>> is permitted, but perhaps just return -EBUSY in this case?)
>>
>> We don't double queue; we're re-using the existing entries.
>
> Hannes, thanks for the explanations.
>
>>
>> Can you check if this fix works?
>>
>> diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
>> index c8a6db7c4498..4e824aab30eb 100644
>> --- a/drivers/nvme/host/auth.c
>> +++ b/drivers/nvme/host/auth.c
>> @@ -926,7 +926,6 @@ void nvme_auth_reset(struct nvme_ctrl *ctrl)
>>
>> mutex_lock(&ctrl->dhchap_auth_mutex);
>> list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
>> - mutex_unlock(&ctrl->dhchap_auth_mutex);
>> flush_work(&chap->auth_work);
>> __nvme_auth_reset(chap);
>> }
>
> I confirmed this hunk avoids the "WARNING: bad unlock balance
> detected!". As far as I ran blktests with this change, I observed no
> failures in other test cases.
>
> However, I observed another new WARN at nvme/045: "WARNING: possible
> recursive locking detected". I think it was caused by
> nvme_dhchap_auth_work in nvme_wq trying to flush another work,
> __nvme_auth_work, in the same workqueue. I created a patch below which
> creates another workqueue, nvme_auth_wq, for __nvme_auth_work.
> Do you think this fix approach is acceptable?
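[For reference, the dedicated-workqueue idea described above could look roughly like the fragment below. This is a hypothetical kernel-code sketch based only on the description in this mail, not the actual posted patch; the symbol names and flag choices are assumptions.]

```c
/* Sketch: give __nvme_auth_work its own workqueue, so that
 * nvme_dhchap_auth_work (running on nvme_wq) never flushes a work
 * item queued on the workqueue it itself runs on. */

static struct workqueue_struct *nvme_auth_wq;

/* at module init, alongside the other nvme workqueues */
nvme_auth_wq = alloc_workqueue("nvme-auth-wq",
			       WQ_UNBOUND | WQ_MEM_RECLAIM, 0);

/* when starting a per-queue handshake, queue it on the dedicated
 * workqueue instead of nvme_wq */
queue_work(nvme_auth_wq, &chap->auth_work);
```
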
It is fine to flush work on the same workqueue; the problem is that they
share the same lock.
I think what we should do is:
1. have chap contexts in an array and not a list so we don't need
to traverse to locate a context.
2. omit the dhchap_auth_mutex as I don't see a need to protect the
array.
3. synchronize only the chap context itself, either with a lock, or
a flag, or a state (atomic).
4. get rid of long lived allocations, it is not optimizing a whole lot
imho, and right now we are reserving something like 5k per queue.
5. add one more synchronization between chap and ctrl accessing the keys
* Re: bad unlock balance WARNING at nvme/045
2022-10-26 12:38 ` Sagi Grimberg
@ 2022-10-28 13:52 ` Hannes Reinecke
0 siblings, 0 replies; 8+ messages in thread
From: Hannes Reinecke @ 2022-10-28 13:52 UTC (permalink / raw)
To: Sagi Grimberg, Shinichiro Kawasaki; +Cc: linux-nvme, Damien Le Moal
On 10/26/22 14:38, Sagi Grimberg wrote:
>
[ .. ]
>> Hannes, thanks for the explanations.
>>
>>>
>>> Can you check if this fix works?
>>>
>>> diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
>>> index c8a6db7c4498..4e824aab30eb 100644
>>> --- a/drivers/nvme/host/auth.c
>>> +++ b/drivers/nvme/host/auth.c
>>> @@ -926,7 +926,6 @@ void nvme_auth_reset(struct nvme_ctrl *ctrl)
>>>
>>> mutex_lock(&ctrl->dhchap_auth_mutex);
>>> list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
>>> - mutex_unlock(&ctrl->dhchap_auth_mutex);
>>> flush_work(&chap->auth_work);
>>> __nvme_auth_reset(chap);
>>> }
>>
>> I confirmed this hunk avoids the "WARNING: bad unlock balance
>> detected!". As far as I ran blktests with this change, I observed no
>> failures in other test cases.
>>
>> However, I observed another new WARN at nvme/045: "WARNING: possible
>> recursive locking detected". I think it was caused by
>> nvme_dhchap_auth_work in nvme_wq trying to flush another work,
>> __nvme_auth_work, in the same workqueue. I created a patch below
>> which creates another workqueue, nvme_auth_wq, for __nvme_auth_work.
>> Do you think this fix approach is acceptable?
>
> It is fine to flush work on the same workqueue; the problem is that they
> share the same lock.
>
> I think what we should do is:
> 1. have chap contexts in an array and not a list so we don't need
> to traverse to locate a context.
> 2. omit the dhchap_auth_mutex as I don't see a need to protect the
> array.
> 3. synchronize only the chap context itself, either with a lock, or
> a flag, or a state (atomic).
> 4. get rid of long lived allocations, it is not optimizing a whole lot
> imho, and right now we are reserving something like 5k per queue.
> 5. add one more synchronization between chap and ctrl accessing the keys
>
I've now sent a patchset which should address this; can you check if
that's what you had in mind?
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
end of thread, other threads:[~2022-10-28 13:52 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-18 8:03 bad unlock balance WARNING at nvme/045 Shinichiro Kawasaki
2022-10-18 10:57 ` Sagi Grimberg
2022-10-26 2:13 ` Shinichiro Kawasaki
2022-10-26 6:42 ` Hannes Reinecke
2022-10-26 12:01 ` Shinichiro Kawasaki
2022-10-26 12:38 ` Sagi Grimberg
2022-10-28 13:52 ` Hannes Reinecke
2022-10-26 12:27 ` Sagi Grimberg