* Lockdep warning after `mdadm -S`
@ 2020-06-10 7:19 Michał Mirosław
From: Michał Mirosław @ 2020-06-10 7:19 UTC (permalink / raw)
To: linux-btrfs, linux-raid, linux-fsdevel; +Cc: linux-kernel
Dear Developers,
I found a lockdep warning in dmesg some time after doing 'mdadm -S' while
also having btrfs mounted (light to no I/O load). The disks under MD and
btrfs are unrelated.
Best Regards,
Michał Mirosław
======================================================
WARNING: possible circular locking dependency detected
5.7.1mq+ #383 Tainted: G O
------------------------------------------------------
kworker/u16:3/8175 is trying to acquire lock:
ffff8882f19556a0 (sb_internal#3){.+.+}-{0:0}, at: start_transaction+0x37e/0x550 [btrfs]
but task is already holding lock:
ffffc900087c7e68 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_one_work+0x235/0x620
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #8 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}:
__flush_work+0x331/0x490
wb_shutdown+0x8f/0xb0
bdi_unregister+0x72/0x1f0
del_gendisk+0x2b0/0x2c0
md_free+0x28/0x90
kobject_put+0xa6/0x1b0
process_one_work+0x2b6/0x620
worker_thread+0x35/0x3e0
kthread+0x143/0x160
ret_from_fork+0x3a/0x50
-> #7 ((work_completion)(&mddev->del_work)){+.+.}-{0:0}:
process_one_work+0x28d/0x620
worker_thread+0x35/0x3e0
kthread+0x143/0x160
ret_from_fork+0x3a/0x50
-> #6 ((wq_completion)md_misc){+.+.}-{0:0}:
flush_workqueue+0xa9/0x4e0
__md_stop_writes+0x18/0x100
do_md_stop+0x165/0x2d0
md_ioctl+0xa52/0x1d60
blkdev_ioctl+0x1cc/0x2a0
block_ioctl+0x3a/0x40
ksys_ioctl+0x81/0xc0
__x64_sys_ioctl+0x11/0x20
do_syscall_64+0x4f/0x210
entry_SYSCALL_64_after_hwframe+0x49/0xb3
-> #5 (&mddev->open_mutex){+.+.}-{3:3}:
__mutex_lock+0x93/0x9c0
md_open+0x43/0xc0
__blkdev_get+0xea/0x560
blkdev_get+0x60/0x130
do_dentry_open+0x147/0x3e0
path_openat+0x84f/0xa80
do_filp_open+0x8e/0x100
do_sys_openat2+0x225/0x2e0
do_sys_open+0x46/0x80
do_syscall_64+0x4f/0x210
entry_SYSCALL_64_after_hwframe+0x49/0xb3
-> #4 (&bdev->bd_mutex){+.+.}-{3:3}:
__mutex_lock+0x93/0x9c0
__blkdev_get+0x77/0x560
blkdev_get+0x60/0x130
blkdev_get_by_path+0x41/0x80
btrfs_get_bdev_and_sb+0x16/0xb0 [btrfs]
open_fs_devices+0x9d/0x240 [btrfs]
btrfs_open_devices+0x89/0x90 [btrfs]
btrfs_mount_root+0x26a/0x4b0 [btrfs]
legacy_get_tree+0x2b/0x50
vfs_get_tree+0x23/0xc0
fc_mount+0x9/0x40
vfs_kern_mount.part.40+0x57/0x80
btrfs_mount+0x148/0x3f0 [btrfs]
legacy_get_tree+0x2b/0x50
vfs_get_tree+0x23/0xc0
do_mount+0x712/0xa40
__x64_sys_mount+0xbf/0xe0
do_syscall_64+0x4f/0x210
entry_SYSCALL_64_after_hwframe+0x49/0xb3
-> #3 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
__mutex_lock+0x93/0x9c0
btrfs_run_dev_stats+0x44/0x470 [btrfs]
commit_cowonly_roots+0xac/0x2a0 [btrfs]
btrfs_commit_transaction+0x511/0xa70 [btrfs]
transaction_kthread+0x13c/0x160 [btrfs]
kthread+0x143/0x160
ret_from_fork+0x3a/0x50
-> #2 (&fs_info->tree_log_mutex){+.+.}-{3:3}:
__mutex_lock+0x93/0x9c0
btrfs_commit_transaction+0x4b6/0xa70 [btrfs]
transaction_kthread+0x13c/0x160 [btrfs]
kthread+0x143/0x160
ret_from_fork+0x3a/0x50
-> #1 (&fs_info->reloc_mutex){+.+.}-{3:3}:
__mutex_lock+0x93/0x9c0
btrfs_record_root_in_trans+0x3e/0x60 [btrfs]
start_transaction+0xcb/0x550 [btrfs]
btrfs_mkdir+0x5c/0x1e0 [btrfs]
vfs_mkdir+0x107/0x1d0
do_mkdirat+0xe7/0x110
do_syscall_64+0x4f/0x210
entry_SYSCALL_64_after_hwframe+0x49/0xb3
-> #0 (sb_internal#3){.+.+}-{0:0}:
__lock_acquire+0x11f9/0x1aa0
lock_acquire+0x9e/0x380
__sb_start_write+0x13a/0x270
start_transaction+0x37e/0x550 [btrfs]
cow_file_range_inline.constprop.74+0xe4/0x640 [btrfs]
cow_file_range+0xe5/0x3f0 [btrfs]
btrfs_run_delalloc_range+0x128/0x620 [btrfs]
writepage_delalloc+0xe2/0x140 [btrfs]
__extent_writepage+0x1a3/0x370 [btrfs]
extent_write_cache_pages+0x2b8/0x470 [btrfs]
extent_writepages+0x3f/0x90 [btrfs]
do_writepages+0x3c/0xe0
__writeback_single_inode+0x4f/0x650
writeback_sb_inodes+0x1f7/0x560
__writeback_inodes_wb+0x58/0xa0
wb_writeback+0x33b/0x4b0
wb_workfn+0x428/0x5b0
process_one_work+0x2b6/0x620
worker_thread+0x35/0x3e0
kthread+0x143/0x160
ret_from_fork+0x3a/0x50
other info that might help us debug this:
Chain exists of:
sb_internal#3 --> (work_completion)(&mddev->del_work) --> (work_completion)(&(&wb->dwork)->work)
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock((work_completion)(&(&wb->dwork)->work));
                               lock((work_completion)(&mddev->del_work));
                               lock((work_completion)(&(&wb->dwork)->work));
  lock(sb_internal#3);
*** DEADLOCK ***
3 locks held by kworker/u16:3/8175:
#0: ffff88840baa6948 ((wq_completion)writeback){+.+.}-{0:0}, at: process_one_work+0x235/0x620
#1: ffffc900087c7e68 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_one_work+0x235/0x620
#2: ffff8882f19550e8 (&type->s_umount_key#52){++++}-{3:3}, at: trylock_super+0x11/0x50
stack backtrace:
CPU: 1 PID: 8175 Comm: kworker/u16:3 Tainted: G O 5.7.1mq+ #383
Hardware name: System manufacturer System Product Name/P8Z68-V PRO, BIOS 3603 11/09/2012
Workqueue: writeback wb_workfn (flush-btrfs-1)
Call Trace:
dump_stack+0x71/0xa0
check_noncircular+0x165/0x180
? stack_trace_save+0x46/0x70
__lock_acquire+0x11f9/0x1aa0
lock_acquire+0x9e/0x380
? start_transaction+0x37e/0x550 [btrfs]
__sb_start_write+0x13a/0x270
? start_transaction+0x37e/0x550 [btrfs]
start_transaction+0x37e/0x550 [btrfs]
? kmem_cache_alloc+0x1b0/0x2c0
cow_file_range_inline.constprop.74+0xe4/0x640 [btrfs]
? lock_acquire+0x9e/0x380
? test_range_bit+0x3d/0x130 [btrfs]
cow_file_range+0xe5/0x3f0 [btrfs]
btrfs_run_delalloc_range+0x128/0x620 [btrfs]
? find_lock_delalloc_range+0x1f3/0x220 [btrfs]
writepage_delalloc+0xe2/0x140 [btrfs]
__extent_writepage+0x1a3/0x370 [btrfs]
extent_write_cache_pages+0x2b8/0x470 [btrfs]
? __lock_acquire+0x3fc/0x1aa0
extent_writepages+0x3f/0x90 [btrfs]
do_writepages+0x3c/0xe0
? find_held_lock+0x2d/0x90
__writeback_single_inode+0x4f/0x650
writeback_sb_inodes+0x1f7/0x560
__writeback_inodes_wb+0x58/0xa0
wb_writeback+0x33b/0x4b0
wb_workfn+0x428/0x5b0
? sched_clock_cpu+0xe/0xd0
process_one_work+0x2b6/0x620
? worker_thread+0xc7/0x3e0
worker_thread+0x35/0x3e0
? process_one_work+0x620/0x620
kthread+0x143/0x160
? kthread_park+0x80/0x80
ret_from_fork+0x3a/0x50
* Re: Lockdep warning after `mdadm -S`
@ 2020-06-10 11:00 ` Nikolay Borisov
From: Nikolay Borisov @ 2020-06-10 11:00 UTC (permalink / raw)
To: Michał Mirosław, linux-btrfs, linux-raid, linux-fsdevel
Cc: linux-kernel
On 10.06.20 at 10:19, Michał Mirosław wrote:
> Dear Developers,
>
> I found a lockdep warning in dmesg some time after doing 'mdadm -S' while
> also having btrfs mounted (light to no I/O load). The disks under MD and
> btrfs are unrelated.
Huhz, I think that's genuine, because btrfs and md share at least bd_mutex
and the workqueue. So the scenario could go something along these lines:
T1: initiates the process of stopping md, holds bd_mutex and open_mutex,
    blocks on flushing md_misc_wq (callstack #6).

T2: should begin mddev_delayed_delete but never does, because the
    workqueue is blocked due to T3 being blocked.

T3: wants to begin a new transaction but blocks at callstack #0 due to a
    running transaction commit in T4. NB: this happens from writeback
    context, i.e. the same workqueue as T2.

T4: transaction commit blocked on device_list_mutex (callstack #3) while
    holding sb_internal.

T5: user tries to (erroneously) mount the mddev, but blocks at
    callstack #4 on bd_mutex held by T1, while itself holding
    device_list_mutex.
So T5 blocks T4, which blocks T3, which blocks the shared writeback
workqueue; this prevents T2 from running, and only when T2 completes would
T1 unlock bd_mutex and unblock T5. I think this is generally possible but
highly unlikely.
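Condensed, the lock/flush orderings look roughly like this (a paraphrase
of callstacks #6, #7/#8, #4, #3 and #0 quoted below, not compilable kernel
code; the locking sites are simplified and the "held by" annotations follow
the scenario sketched above):

    /* T1: mdadm -S, callstack #6 */
    do_md_stop()
        mutex_lock(&mddev->open_mutex);    /* per the scenario, bd_mutex is also held */
        __md_stop_writes()
            flush_workqueue(md_misc_wq);   /* waits for T2's del_work */

    /* T2: del_work on md_misc_wq, callstacks #7/#8 */
    mddev_delayed_delete() -> kobject_put() -> md_free()
        -> del_gendisk() -> bdi_unregister()
        -> wb_shutdown();                  /* flushes T3's wb->dwork */

    /* T5: btrfs mount on top of the mddev, callstack #4 */
    btrfs_open_devices()
        mutex_lock(&fs_devices->device_list_mutex);
        open_fs_devices() -> btrfs_get_bdev_and_sb() -> blkdev_get_by_path()
            -> __blkdev_get();             /* bd_mutex, held by T1 */

    /* T4: transaction commit, callstack #3; the transaction holds sb_internal */
    btrfs_commit_transaction()
        commit_cowonly_roots()
            btrfs_run_dev_stats()
                mutex_lock(&fs_devices->device_list_mutex);  /* held by T5 */

    /* T3: writeback (the wb->dwork that T2 flushes), callstack #0 */
    start_transaction()
        __sb_start_write();                /* sb_internal, stuck behind T4's commit */

Following the waits around (T1 -> T2 -> T3 -> T4 -> T5 -> T1) closes the
loop, which is the cycle lockdep is warning about.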
Also, looking at the code in T5 (callstack #4 below), it seems the same
could happen if a scan ioctl is sent for the mddev. I discussed this with
peterz and he proposed the following half-baked patch:
https://paste.debian.net/1151314/
The idea is to remove the md open_mutex, which would break the #4 -> #6
dependency chain. What do the mdraid people think?
>
> Best Regards,
> Michał Mirosław
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.7.1mq+ #383 Tainted: G O
> ------------------------------------------------------
> kworker/u16:3/8175 is trying to acquire lock:
> ffff8882f19556a0 (sb_internal#3){.+.+}-{0:0}, at: start_transaction+0x37e/0x550 [btrfs]
>
> but task is already holding lock:
> ffffc900087c7e68 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_one_work+0x235/0x620
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #8 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}:
> __flush_work+0x331/0x490
> wb_shutdown+0x8f/0xb0
> bdi_unregister+0x72/0x1f0
> del_gendisk+0x2b0/0x2c0
> md_free+0x28/0x90
> kobject_put+0xa6/0x1b0
> process_one_work+0x2b6/0x620
> worker_thread+0x35/0x3e0
> kthread+0x143/0x160
> ret_from_fork+0x3a/0x50
>
> -> #7 ((work_completion)(&mddev->del_work)){+.+.}-{0:0}:
> process_one_work+0x28d/0x620
> worker_thread+0x35/0x3e0
> kthread+0x143/0x160
> ret_from_fork+0x3a/0x50
>
> -> #6 ((wq_completion)md_misc){+.+.}-{0:0}:
> flush_workqueue+0xa9/0x4e0
> __md_stop_writes+0x18/0x100
> do_md_stop+0x165/0x2d0
> md_ioctl+0xa52/0x1d60
> blkdev_ioctl+0x1cc/0x2a0
> block_ioctl+0x3a/0x40
> ksys_ioctl+0x81/0xc0
> __x64_sys_ioctl+0x11/0x20
> do_syscall_64+0x4f/0x210
> entry_SYSCALL_64_after_hwframe+0x49/0xb3
>
> -> #5 (&mddev->open_mutex){+.+.}-{3:3}:
> __mutex_lock+0x93/0x9c0
> md_open+0x43/0xc0
> __blkdev_get+0xea/0x560
> blkdev_get+0x60/0x130
> do_dentry_open+0x147/0x3e0
> path_openat+0x84f/0xa80
> do_filp_open+0x8e/0x100
> do_sys_openat2+0x225/0x2e0
> do_sys_open+0x46/0x80
> do_syscall_64+0x4f/0x210
> entry_SYSCALL_64_after_hwframe+0x49/0xb3
>
> -> #4 (&bdev->bd_mutex){+.+.}-{3:3}:
> __mutex_lock+0x93/0x9c0
> __blkdev_get+0x77/0x560
> blkdev_get+0x60/0x130
> blkdev_get_by_path+0x41/0x80
> btrfs_get_bdev_and_sb+0x16/0xb0 [btrfs]
> open_fs_devices+0x9d/0x240 [btrfs]
> btrfs_open_devices+0x89/0x90 [btrfs]
> btrfs_mount_root+0x26a/0x4b0 [btrfs]
> legacy_get_tree+0x2b/0x50
> vfs_get_tree+0x23/0xc0
> fc_mount+0x9/0x40
> vfs_kern_mount.part.40+0x57/0x80
> btrfs_mount+0x148/0x3f0 [btrfs]
> legacy_get_tree+0x2b/0x50
> vfs_get_tree+0x23/0xc0
> do_mount+0x712/0xa40
> __x64_sys_mount+0xbf/0xe0
> do_syscall_64+0x4f/0x210
> entry_SYSCALL_64_after_hwframe+0x49/0xb3
>
> -> #3 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
> __mutex_lock+0x93/0x9c0
> btrfs_run_dev_stats+0x44/0x470 [btrfs]
> commit_cowonly_roots+0xac/0x2a0 [btrfs]
> btrfs_commit_transaction+0x511/0xa70 [btrfs]
> transaction_kthread+0x13c/0x160 [btrfs]
> kthread+0x143/0x160
> ret_from_fork+0x3a/0x50
>
> -> #2 (&fs_info->tree_log_mutex){+.+.}-{3:3}:
> __mutex_lock+0x93/0x9c0
> btrfs_commit_transaction+0x4b6/0xa70 [btrfs]
> transaction_kthread+0x13c/0x160 [btrfs]
> kthread+0x143/0x160
> ret_from_fork+0x3a/0x50
>
> -> #1 (&fs_info->reloc_mutex){+.+.}-{3:3}:
> __mutex_lock+0x93/0x9c0
> btrfs_record_root_in_trans+0x3e/0x60 [btrfs]
> start_transaction+0xcb/0x550 [btrfs]
> btrfs_mkdir+0x5c/0x1e0 [btrfs]
> vfs_mkdir+0x107/0x1d0
> do_mkdirat+0xe7/0x110
> do_syscall_64+0x4f/0x210
> entry_SYSCALL_64_after_hwframe+0x49/0xb3
>
> -> #0 (sb_internal#3){.+.+}-{0:0}:
> __lock_acquire+0x11f9/0x1aa0
> lock_acquire+0x9e/0x380
> __sb_start_write+0x13a/0x270
> start_transaction+0x37e/0x550 [btrfs]
> cow_file_range_inline.constprop.74+0xe4/0x640 [btrfs]
> cow_file_range+0xe5/0x3f0 [btrfs]
> btrfs_run_delalloc_range+0x128/0x620 [btrfs]
> writepage_delalloc+0xe2/0x140 [btrfs]
> __extent_writepage+0x1a3/0x370 [btrfs]
> extent_write_cache_pages+0x2b8/0x470 [btrfs]
> extent_writepages+0x3f/0x90 [btrfs]
> do_writepages+0x3c/0xe0
> __writeback_single_inode+0x4f/0x650
> writeback_sb_inodes+0x1f7/0x560
> __writeback_inodes_wb+0x58/0xa0
> wb_writeback+0x33b/0x4b0
> wb_workfn+0x428/0x5b0
> process_one_work+0x2b6/0x620
> worker_thread+0x35/0x3e0
> kthread+0x143/0x160
> ret_from_fork+0x3a/0x50
>
> other info that might help us debug this:
>
> Chain exists of:
> sb_internal#3 --> (work_completion)(&mddev->del_work) --> (work_completion)(&(&wb->dwork)->work)
>
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock((work_completion)(&(&wb->dwork)->work));
>                                lock((work_completion)(&mddev->del_work));
>                                lock((work_completion)(&(&wb->dwork)->work));
>   lock(sb_internal#3);
>
> *** DEADLOCK ***
>
> 3 locks held by kworker/u16:3/8175:
> #0: ffff88840baa6948 ((wq_completion)writeback){+.+.}-{0:0}, at: process_one_work+0x235/0x620
> #1: ffffc900087c7e68 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_one_work+0x235/0x620
> #2: ffff8882f19550e8 (&type->s_umount_key#52){++++}-{3:3}, at: trylock_super+0x11/0x50
>
> stack backtrace:
> CPU: 1 PID: 8175 Comm: kworker/u16:3 Tainted: G O 5.7.1mq+ #383
> Hardware name: System manufacturer System Product Name/P8Z68-V PRO, BIOS 3603 11/09/2012
> Workqueue: writeback wb_workfn (flush-btrfs-1)
> Call Trace:
> dump_stack+0x71/0xa0
> check_noncircular+0x165/0x180
> ? stack_trace_save+0x46/0x70
> __lock_acquire+0x11f9/0x1aa0
> lock_acquire+0x9e/0x380
> ? start_transaction+0x37e/0x550 [btrfs]
> __sb_start_write+0x13a/0x270
> ? start_transaction+0x37e/0x550 [btrfs]
> start_transaction+0x37e/0x550 [btrfs]
> ? kmem_cache_alloc+0x1b0/0x2c0
> cow_file_range_inline.constprop.74+0xe4/0x640 [btrfs]
> ? lock_acquire+0x9e/0x380
> ? test_range_bit+0x3d/0x130 [btrfs]
> cow_file_range+0xe5/0x3f0 [btrfs]
> btrfs_run_delalloc_range+0x128/0x620 [btrfs]
> ? find_lock_delalloc_range+0x1f3/0x220 [btrfs]
> writepage_delalloc+0xe2/0x140 [btrfs]
> __extent_writepage+0x1a3/0x370 [btrfs]
> extent_write_cache_pages+0x2b8/0x470 [btrfs]
> ? __lock_acquire+0x3fc/0x1aa0
> extent_writepages+0x3f/0x90 [btrfs]
> do_writepages+0x3c/0xe0
> ? find_held_lock+0x2d/0x90
> __writeback_single_inode+0x4f/0x650
> writeback_sb_inodes+0x1f7/0x560
> __writeback_inodes_wb+0x58/0xa0
> wb_writeback+0x33b/0x4b0
> wb_workfn+0x428/0x5b0
> ? sched_clock_cpu+0xe/0xd0
> process_one_work+0x2b6/0x620
> ? worker_thread+0xc7/0x3e0
> worker_thread+0x35/0x3e0
> ? process_one_work+0x620/0x620
> kthread+0x143/0x160
> ? kthread_park+0x80/0x80
> ret_from_fork+0x3a/0x50
>