* lockdep splat when disconnecting NVMe-oF TCP initiator
@ 2020-09-23 9:02 Lucas Stach
2020-09-29 8:20 ` Sagi Grimberg
0 siblings, 1 reply; 2+ messages in thread
From: Lucas Stach @ 2020-09-23 9:02 UTC (permalink / raw)
To: Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni; +Cc: linux-nvme
Hi nvme folks,
I've toyed a bit with NVMe-oF TCP over the last few days and came
across the following lockdep splat on the target when disconnecting the
initiator. I don't know if I'll have time to look into this anytime
soon, so I figured I could at least leave the report here.
Regards,
Lucas
[ 142.078964] WARNING: possible circular locking dependency detected
[ 142.085172] 5.9.0-rc5 #2 Not tainted
[ 142.091982] ------------------------------------------------------
[ 142.098186] kworker/0:3/74 is trying to acquire lock:
[ 142.103259] ffff000015228830 ((work_completion)(&queue->io_work)){+.+.}-{0:0}, at: __flush_work+0x54/0x510
[ 142.112981]
[ 142.112981] but task is already holding lock:
[ 142.118835] ffff800011efbdc0 ((work_completion)(&queue->release_work)){+.+.}-{0:0}, at: process_one_work+0x1ec/0x710
[ 142.129413]
[ 142.129413] which lock already depends on the new lock.
[ 142.129413]
[ 142.137618]
[ 142.137618] the existing dependency chain (in reverse order) is:
[ 142.145124]
[ 142.145124] -> #2 ((work_completion)(&queue->release_work)){+.+.}-{0:0}:
[ 142.153352] process_one_work+0x248/0x710
[ 142.157906] worker_thread+0x74/0x470
[ 142.162119] kthread+0x15c/0x160
[ 142.165894] ret_from_fork+0x10/0x38
[ 142.170005]
[ 142.170005] -> #1 ((wq_completion)events){+.+.}-{0:0}:
[ 142.176665] flush_workqueue+0x98/0x400
[ 142.181054] nvmet_tcp_install_queue+0x11c/0x130
[ 142.186217] nvmet_install_queue+0xbc/0x150
[ 142.190946] nvmet_execute_admin_connect+0x11c/0x200
[ 142.196458] nvmet_tcp_io_work+0x8c0/0x950
[ 142.201099] process_one_work+0x294/0x710
[ 142.205654] worker_thread+0x74/0x470
[ 142.209860] kthread+0x15c/0x160
[ 142.213630] ret_from_fork+0x10/0x38
[ 142.217742]
[ 142.217742] -> #0 ((work_completion)(&queue->io_work)){+.+.}-{0:0}:
[ 142.225542] __lock_acquire+0x13fc/0x2160
[ 142.230097] lock_acquire+0xec/0x4d0
[ 142.234215] __flush_work+0x7c/0x510
[ 142.238333] flush_work+0x14/0x20
[ 142.242192] nvmet_tcp_release_queue_work+0xb0/0x280
[ 142.247703] process_one_work+0x294/0x710
[ 142.252257] worker_thread+0x74/0x470
[ 142.256465] kthread+0x15c/0x160
[ 142.260234] ret_from_fork+0x10/0x38
[ 142.264346]
[ 142.264346] other info that might help us debug this:
[ 142.264346]
[ 142.272375] Chain exists of:
[ 142.272375] (work_completion)(&queue->io_work) --> (wq_completion)events --> (work_completion)(&queue->release_work)
[ 142.272375]
[ 142.287476] Possible unsafe locking scenario:
[ 142.287476]
[ 142.293415]        CPU0                    CPU1
[ 142.297962]        ----                    ----
[ 142.302508]   lock((work_completion)(&queue->release_work));
[ 142.308194]                                lock((wq_completion)events);
[ 142.314836]                                lock((work_completion)(&queue->release_work));
[ 142.323046]   lock((work_completion)(&queue->io_work));
[ 142.328296]
[ 142.328296] *** DEADLOCK ***
[ 142.328296]
[ 142.334239] 2 locks held by kworker/0:3/74:
[ 142.338438] #0: ffff000017405738 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1ec/0x710
[ 142.347887] #1: ffff800011efbdc0 ((work_completion)(&queue->release_work)){+.+.}-{0:0}, at: process_one_work+0x1ec/0x710
[ 142.358901]
[ 142.358901] stack backtrace:
[ 142.363284] CPU: 0 PID: 74 Comm: kworker/0:3 Not tainted 5.9.0-rc5 #2
[ 142.372970] Hardware name: XXX (DT)
[ 142.378053] Workqueue: events nvmet_tcp_release_queue_work
[ 142.383563] Call trace:
[ 142.386032] dump_backtrace+0x0/0x1b0
[ 142.389717] show_stack+0x18/0x30
[ 142.393055] dump_stack+0xe8/0x15c
[ 142.396480] print_circular_bug+0x278/0x280
[ 142.400688] check_noncircular+0x164/0x1e0
[ 142.404808] __lock_acquire+0x13fc/0x2160
[ 142.408841] lock_acquire+0xec/0x4d0
[ 142.412437] __flush_work+0x7c/0x510
[ 142.416032] flush_work+0x14/0x20
[ 142.419369] nvmet_tcp_release_queue_work+0xb0/0x280
[ 142.424358] process_one_work+0x294/0x710
[ 142.428390] worker_thread+0x74/0x470
[ 142.432075] kthread+0x15c/0x160
[ 142.435322] ret_from_fork+0x10/0x38
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
* Re: lockdep splat when disconnecting NVMe-oF TCP initiator
2020-09-23 9:02 lockdep splat when disconnecting NVMe-oF TCP initiator Lucas Stach
@ 2020-09-29 8:20 ` Sagi Grimberg
0 siblings, 0 replies; 2+ messages in thread
From: Sagi Grimberg @ 2020-09-29 8:20 UTC (permalink / raw)
To: Lucas Stach, Christoph Hellwig, Chaitanya Kulkarni; +Cc: linux-nvme
> Hi nvme folks,
>
> I've toyed a bit with NVMe-oF TCP over the last few days and came
> across the following lockdep splat on the target when disconnecting the
> initiator. I don't know if I'll have time to look into this anytime
> soon, so I figured I could at least leave the report here.
This is the way the driver has always behaved... To be honest, it's not
at all clear to me that this isn't a false positive.
Nothing in the release_work path actually depends on the
flush_scheduled_work waiting for online queue removals...
I'll try to take a deeper look later.
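To make the reported cycle easier to follow, here is a non-runnable C sketch reconstructed purely from the backtraces above. It is not the actual drivers/nvme/target/tcp.c source; the function bodies and the exact flush calls shown are assumptions inferred from the trace:

```c
/* Sketch of the dependency cycle lockdep reports; names taken from
 * the backtraces, bodies reconstructed/assumed, not real kernel code.
 */

/* -> #1: io_work runs on the system "events" workqueue and, while
 * handling the admin connect, ends up flushing that same workqueue:
 *
 *   nvmet_tcp_io_work()
 *     -> nvmet_execute_admin_connect()
 *       -> nvmet_install_queue()
 *         -> nvmet_tcp_install_queue()
 */
static void nvmet_tcp_io_work(struct work_struct *w)
{
	/* ... admin connect handling ... */
	flush_workqueue(system_wq);	/* waits for ALL "events" work */
}

/* -> #0: release_work is also queued on "events" and waits for
 * io_work to finish before tearing the queue down.
 */
static void nvmet_tcp_release_queue_work(struct work_struct *w)
{
	/* queue obtained from the work item; elided here */
	flush_work(&queue->io_work);	/* waits for io_work */
}

/* Cycle lockdep closes: flushing "events" from io_work must wait for
 * release_work (same workqueue), release_work waits for io_work via
 * flush_work(), and io_work is the context doing the workqueue flush.
 */
```

Whether this cycle is reachable in practice depends on whether an io_work instance flushing the workqueue can ever coexist with a release_work instance waiting on that same io_work, which is exactly the point under discussion above.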
> [full lockdep splat trimmed; quoted verbatim from the report above]