linux-fsdevel.vger.kernel.org archive mirror
* Lockdep warning after `mdadm -S`
@ 2020-06-10  7:19 Michał Mirosław
  2020-06-10 11:00 ` Nikolay Borisov
From: Michał Mirosław @ 2020-06-10  7:19 UTC (permalink / raw)
  To: linux-btrfs, linux-raid, linux-fsdevel; +Cc: linux-kernel

Dear Developers,

I found a lockdep warning in dmesg some time after running 'mdadm -S' while
also having btrfs mounted (light to no I/O load).  The disks under MD and
btrfs are unrelated.
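
For context: 'mdadm -S' ultimately boils down to a STOP_ARRAY ioctl on the
array's device node, which is the md_ioctl()/do_md_stop() entry point
visible in callstack #6 below. A minimal sketch of that call (the device
path is just an example; real mdadm does more bookkeeping around it):

/* build: cc -o mdstop mdstop.c ; run as root */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/major.h>       /* MD_MAJOR, used by the md ioctl numbers */
#include <linux/raid/md_u.h>   /* STOP_ARRAY */

int main(void)
{
	int fd = open("/dev/md0", O_RDONLY);   /* example device node */

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* Enters md_ioctl() -> do_md_stop(), as in callstack #6 below. */
	if (ioctl(fd, STOP_ARRAY, NULL) < 0)
		perror("ioctl(STOP_ARRAY)");
	close(fd);
	return 0;
}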

Best Regards,
Michał Mirosław

======================================================
WARNING: possible circular locking dependency detected
5.7.1mq+ #383 Tainted: G           O     
------------------------------------------------------
kworker/u16:3/8175 is trying to acquire lock:
ffff8882f19556a0 (sb_internal#3){.+.+}-{0:0}, at: start_transaction+0x37e/0x550 [btrfs]

but task is already holding lock:
ffffc900087c7e68 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_one_work+0x235/0x620

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #8 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}:
       __flush_work+0x331/0x490
       wb_shutdown+0x8f/0xb0
       bdi_unregister+0x72/0x1f0
       del_gendisk+0x2b0/0x2c0
       md_free+0x28/0x90
       kobject_put+0xa6/0x1b0
       process_one_work+0x2b6/0x620
       worker_thread+0x35/0x3e0
       kthread+0x143/0x160
       ret_from_fork+0x3a/0x50

-> #7 ((work_completion)(&mddev->del_work)){+.+.}-{0:0}:
       process_one_work+0x28d/0x620
       worker_thread+0x35/0x3e0
       kthread+0x143/0x160
       ret_from_fork+0x3a/0x50

-> #6 ((wq_completion)md_misc){+.+.}-{0:0}:
       flush_workqueue+0xa9/0x4e0
       __md_stop_writes+0x18/0x100
       do_md_stop+0x165/0x2d0
       md_ioctl+0xa52/0x1d60
       blkdev_ioctl+0x1cc/0x2a0
       block_ioctl+0x3a/0x40
       ksys_ioctl+0x81/0xc0
       __x64_sys_ioctl+0x11/0x20
       do_syscall_64+0x4f/0x210
       entry_SYSCALL_64_after_hwframe+0x49/0xb3

-> #5 (&mddev->open_mutex){+.+.}-{3:3}:
       __mutex_lock+0x93/0x9c0
       md_open+0x43/0xc0
       __blkdev_get+0xea/0x560
       blkdev_get+0x60/0x130
       do_dentry_open+0x147/0x3e0
       path_openat+0x84f/0xa80
       do_filp_open+0x8e/0x100
       do_sys_openat2+0x225/0x2e0
       do_sys_open+0x46/0x80
       do_syscall_64+0x4f/0x210
       entry_SYSCALL_64_after_hwframe+0x49/0xb3

-> #4 (&bdev->bd_mutex){+.+.}-{3:3}:
       __mutex_lock+0x93/0x9c0
       __blkdev_get+0x77/0x560
       blkdev_get+0x60/0x130
       blkdev_get_by_path+0x41/0x80
       btrfs_get_bdev_and_sb+0x16/0xb0 [btrfs]
       open_fs_devices+0x9d/0x240 [btrfs]
       btrfs_open_devices+0x89/0x90 [btrfs]
       btrfs_mount_root+0x26a/0x4b0 [btrfs]
       legacy_get_tree+0x2b/0x50
       vfs_get_tree+0x23/0xc0
       fc_mount+0x9/0x40
       vfs_kern_mount.part.40+0x57/0x80
       btrfs_mount+0x148/0x3f0 [btrfs]
       legacy_get_tree+0x2b/0x50
       vfs_get_tree+0x23/0xc0
       do_mount+0x712/0xa40
       __x64_sys_mount+0xbf/0xe0
       do_syscall_64+0x4f/0x210
       entry_SYSCALL_64_after_hwframe+0x49/0xb3

-> #3 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
       __mutex_lock+0x93/0x9c0
       btrfs_run_dev_stats+0x44/0x470 [btrfs]
       commit_cowonly_roots+0xac/0x2a0 [btrfs]
       btrfs_commit_transaction+0x511/0xa70 [btrfs]
       transaction_kthread+0x13c/0x160 [btrfs]
       kthread+0x143/0x160
       ret_from_fork+0x3a/0x50

-> #2 (&fs_info->tree_log_mutex){+.+.}-{3:3}:
       __mutex_lock+0x93/0x9c0
       btrfs_commit_transaction+0x4b6/0xa70 [btrfs]
       transaction_kthread+0x13c/0x160 [btrfs]
       kthread+0x143/0x160
       ret_from_fork+0x3a/0x50

-> #1 (&fs_info->reloc_mutex){+.+.}-{3:3}:
       __mutex_lock+0x93/0x9c0
       btrfs_record_root_in_trans+0x3e/0x60 [btrfs]
       start_transaction+0xcb/0x550 [btrfs]
       btrfs_mkdir+0x5c/0x1e0 [btrfs]
       vfs_mkdir+0x107/0x1d0
       do_mkdirat+0xe7/0x110
       do_syscall_64+0x4f/0x210
       entry_SYSCALL_64_after_hwframe+0x49/0xb3

-> #0 (sb_internal#3){.+.+}-{0:0}:
       __lock_acquire+0x11f9/0x1aa0
       lock_acquire+0x9e/0x380
       __sb_start_write+0x13a/0x270
       start_transaction+0x37e/0x550 [btrfs]
       cow_file_range_inline.constprop.74+0xe4/0x640 [btrfs]
       cow_file_range+0xe5/0x3f0 [btrfs]
       btrfs_run_delalloc_range+0x128/0x620 [btrfs]
       writepage_delalloc+0xe2/0x140 [btrfs]
       __extent_writepage+0x1a3/0x370 [btrfs]
       extent_write_cache_pages+0x2b8/0x470 [btrfs]
       extent_writepages+0x3f/0x90 [btrfs]
       do_writepages+0x3c/0xe0
       __writeback_single_inode+0x4f/0x650
       writeback_sb_inodes+0x1f7/0x560
       __writeback_inodes_wb+0x58/0xa0
       wb_writeback+0x33b/0x4b0
       wb_workfn+0x428/0x5b0
       process_one_work+0x2b6/0x620
       worker_thread+0x35/0x3e0
       kthread+0x143/0x160
       ret_from_fork+0x3a/0x50

other info that might help us debug this:

Chain exists of:
  sb_internal#3 --> (work_completion)(&mddev->del_work) --> (work_completion)(&(&wb->dwork)->work)

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((work_completion)(&(&wb->dwork)->work));
                               lock((work_completion)(&mddev->del_work));
                               lock((work_completion)(&(&wb->dwork)->work));
  lock(sb_internal#3);

 *** DEADLOCK ***

3 locks held by kworker/u16:3/8175:
 #0: ffff88840baa6948 ((wq_completion)writeback){+.+.}-{0:0}, at: process_one_work+0x235/0x620
 #1: ffffc900087c7e68 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_one_work+0x235/0x620
 #2: ffff8882f19550e8 (&type->s_umount_key#52){++++}-{3:3}, at: trylock_super+0x11/0x50

stack backtrace:
CPU: 1 PID: 8175 Comm: kworker/u16:3 Tainted: G           O      5.7.1mq+ #383
Hardware name: System manufacturer System Product Name/P8Z68-V PRO, BIOS 3603 11/09/2012
Workqueue: writeback wb_workfn (flush-btrfs-1)
Call Trace:
 dump_stack+0x71/0xa0
 check_noncircular+0x165/0x180
 ? stack_trace_save+0x46/0x70
 __lock_acquire+0x11f9/0x1aa0
 lock_acquire+0x9e/0x380
 ? start_transaction+0x37e/0x550 [btrfs]
 __sb_start_write+0x13a/0x270
 ? start_transaction+0x37e/0x550 [btrfs]
 start_transaction+0x37e/0x550 [btrfs]
 ? kmem_cache_alloc+0x1b0/0x2c0
 cow_file_range_inline.constprop.74+0xe4/0x640 [btrfs]
 ? lock_acquire+0x9e/0x380
 ? test_range_bit+0x3d/0x130 [btrfs]
 cow_file_range+0xe5/0x3f0 [btrfs]
 btrfs_run_delalloc_range+0x128/0x620 [btrfs]
 ? find_lock_delalloc_range+0x1f3/0x220 [btrfs]
 writepage_delalloc+0xe2/0x140 [btrfs]
 __extent_writepage+0x1a3/0x370 [btrfs]
 extent_write_cache_pages+0x2b8/0x470 [btrfs]
 ? __lock_acquire+0x3fc/0x1aa0
 extent_writepages+0x3f/0x90 [btrfs]
 do_writepages+0x3c/0xe0
 ? find_held_lock+0x2d/0x90
 __writeback_single_inode+0x4f/0x650
 writeback_sb_inodes+0x1f7/0x560
 __writeback_inodes_wb+0x58/0xa0
 wb_writeback+0x33b/0x4b0
 wb_workfn+0x428/0x5b0
 ? sched_clock_cpu+0xe/0xd0
 process_one_work+0x2b6/0x620
 ? worker_thread+0xc7/0x3e0
 worker_thread+0x35/0x3e0
 ? process_one_work+0x620/0x620
 kthread+0x143/0x160
 ? kthread_park+0x80/0x80
 ret_from_fork+0x3a/0x50



* Re: Lockdep warning after `mdadm -S`
  2020-06-10  7:19 Lockdep warning after `mdadm -S` Michał Mirosław
@ 2020-06-10 11:00 ` Nikolay Borisov
From: Nikolay Borisov @ 2020-06-10 11:00 UTC (permalink / raw)
  To: Michał Mirosław, linux-btrfs, linux-raid, linux-fsdevel
  Cc: linux-kernel



On 10.06.20 at 10:19, Michał Mirosław wrote:
> Dear Developers,
> 
> I found a lockdep warning in dmesg some time after running 'mdadm -S'
> while also having btrfs mounted (light to no I/O load).  The disks under
> MD and btrfs are unrelated.

Huh, I think that's genuine, because btrfs and md share at least bd_mutex
and the writeback workqueue. So the scenario could go something along the
lines of:

T1: initiates the process of stopping md; holds bd_mutex and open_mutex
    and blocks flushing md_misc_wq (callstack #6).

T2: should run mddev_delayed_delete, but never does because the workqueue
    is blocked due to T3 being blocked.

T3: wants to begin a new transaction, but blocks at callstack #0 due to a
    running transaction commit in T4. NB: this happens from writeback
    context, i.e. the same workqueue as T2.

T4: transaction commit blocked on device_list_mutex (callstack #3) while
    holding sb_internal.

T5: user tries to (erroneously) mount the md device, but blocks at
    callstack #4 on bd_mutex (held by T1), while itself holding
    device_list_mutex.

So T5 blocks T4, which blocks T3, which blocks the shared writeback
workqueue; this prevents T2 from running, which, once finished, would let
T1 unlock bd_mutex and thereby unblock T5. I think this is generally
possible but highly unlikely.
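
To make the loop easier to see, here is a purely illustrative userspace
model of the five waits above, using pthread mutexes and semaphores in
place of the kernel locks and workqueue flushes. The names mirror the
kernel objects, but none of this is kernel code; the small sleeps only
order the acquisitions so the demo reliably forms the cycle:

/* build: cc -pthread -o cycle cycle.c */
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t bd_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t open_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t device_list_mutex = PTHREAD_MUTEX_INITIALIZER;
static sem_t del_work_done;   /* T1's flush_workqueue(md_misc_wq) wait  */
static sem_t wb_work_done;    /* T2's flush of the writeback work item  */
static sem_t commit_done;     /* T3 waiting on T4's transaction commit  */

static void *t1_md_stop(void *arg)      /* mdadm -S: callstacks #5/#6 */
{
	(void)arg;
	pthread_mutex_lock(&bd_mutex);
	pthread_mutex_lock(&open_mutex);
	sem_wait(&del_work_done);       /* never posted: T2 is stuck */
	return NULL;
}

static void *t2_del_work(void *arg)     /* mddev_delayed_delete: #7/#8 */
{
	(void)arg;
	sem_wait(&wb_work_done);        /* never posted: T3 is stuck */
	sem_post(&del_work_done);
	return NULL;
}

static void *t3_writeback(void *arg)    /* wb_workfn -> start_transaction: #0 */
{
	(void)arg;
	sem_wait(&commit_done);         /* never posted: T4 is stuck */
	sem_post(&wb_work_done);
	return NULL;
}

static void *t4_commit(void *arg)       /* btrfs transaction commit: #3 */
{
	(void)arg;
	usleep(200000);                          /* crude ordering for the demo */
	pthread_mutex_lock(&device_list_mutex);  /* held by T5 */
	sem_post(&commit_done);
	pthread_mutex_unlock(&device_list_mutex);
	return NULL;
}

static void *t5_mount(void *arg)        /* mount of the md device: #4 */
{
	(void)arg;
	usleep(100000);                          /* let T1 grab bd_mutex first */
	pthread_mutex_lock(&device_list_mutex);
	pthread_mutex_lock(&bd_mutex);           /* held by T1 */
	pthread_mutex_unlock(&bd_mutex);
	pthread_mutex_unlock(&device_list_mutex);
	return NULL;
}

int main(void)
{
	void *(*fn[5])(void *) = { t1_md_stop, t2_del_work, t3_writeback,
				   t4_commit, t5_mount };
	pthread_t t[5];
	int i;

	sem_init(&del_work_done, 0, 0);
	sem_init(&wb_work_done, 0, 0);
	sem_init(&commit_done, 0, 0);
	alarm(3);   /* SIGALRM terminates the deadlocked demo after 3s */
	for (i = 0; i < 5; i++)
		pthread_create(&t[i], NULL, fn[i], NULL);
	for (i = 0; i < 5; i++)
		pthread_join(t[i], NULL);
	puts("no deadlock - not reached when the cycle forms");
	return 0;
}

When run, all five threads end up waiting on one another and the process
only exits via the alarm; that closed wait loop is the shape lockdep is
warning about above.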

Also, looking at the code in T5 (callstack #4 below), it seems the same
could happen if a scan ioctl is sent for the md device. Discussing this
with peterz, he proposed the following half-baked patch:
https://paste.debian.net/1151314/

The idea is to remove the md open_mutex, which would break the dependency
chain between #4 and #6. What do the mdraid people think?
											
> 
> Best Regards,
> Michał Mirosław
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.7.1mq+ #383 Tainted: G           O     
> ------------------------------------------------------
> kworker/u16:3/8175 is trying to acquire lock:
> ffff8882f19556a0 (sb_internal#3){.+.+}-{0:0}, at: start_transaction+0x37e/0x550 [btrfs]
> 
> but task is already holding lock:
> ffffc900087c7e68 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_one_work+0x235/0x620
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #8 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}:
>        __flush_work+0x331/0x490
>        wb_shutdown+0x8f/0xb0
>        bdi_unregister+0x72/0x1f0
>        del_gendisk+0x2b0/0x2c0
>        md_free+0x28/0x90
>        kobject_put+0xa6/0x1b0
>        process_one_work+0x2b6/0x620
>        worker_thread+0x35/0x3e0
>        kthread+0x143/0x160
>        ret_from_fork+0x3a/0x50
> 
> -> #7 ((work_completion)(&mddev->del_work)){+.+.}-{0:0}:
>        process_one_work+0x28d/0x620
>        worker_thread+0x35/0x3e0
>        kthread+0x143/0x160
>        ret_from_fork+0x3a/0x50
> 
> -> #6 ((wq_completion)md_misc){+.+.}-{0:0}:
>        flush_workqueue+0xa9/0x4e0
>        __md_stop_writes+0x18/0x100
>        do_md_stop+0x165/0x2d0
>        md_ioctl+0xa52/0x1d60
>        blkdev_ioctl+0x1cc/0x2a0
>        block_ioctl+0x3a/0x40
>        ksys_ioctl+0x81/0xc0
>        __x64_sys_ioctl+0x11/0x20
>        do_syscall_64+0x4f/0x210
>        entry_SYSCALL_64_after_hwframe+0x49/0xb3
> 
> -> #5 (&mddev->open_mutex){+.+.}-{3:3}:
>        __mutex_lock+0x93/0x9c0
>        md_open+0x43/0xc0
>        __blkdev_get+0xea/0x560
>        blkdev_get+0x60/0x130
>        do_dentry_open+0x147/0x3e0
>        path_openat+0x84f/0xa80
>        do_filp_open+0x8e/0x100
>        do_sys_openat2+0x225/0x2e0
>        do_sys_open+0x46/0x80
>        do_syscall_64+0x4f/0x210
>        entry_SYSCALL_64_after_hwframe+0x49/0xb3
> 
> -> #4 (&bdev->bd_mutex){+.+.}-{3:3}:
>        __mutex_lock+0x93/0x9c0
>        __blkdev_get+0x77/0x560
>        blkdev_get+0x60/0x130
>        blkdev_get_by_path+0x41/0x80
>        btrfs_get_bdev_and_sb+0x16/0xb0 [btrfs]
>        open_fs_devices+0x9d/0x240 [btrfs]
>        btrfs_open_devices+0x89/0x90 [btrfs]
>        btrfs_mount_root+0x26a/0x4b0 [btrfs]
>        legacy_get_tree+0x2b/0x50
>        vfs_get_tree+0x23/0xc0
>        fc_mount+0x9/0x40
>        vfs_kern_mount.part.40+0x57/0x80
>        btrfs_mount+0x148/0x3f0 [btrfs]
>        legacy_get_tree+0x2b/0x50
>        vfs_get_tree+0x23/0xc0
>        do_mount+0x712/0xa40
>        __x64_sys_mount+0xbf/0xe0
>        do_syscall_64+0x4f/0x210
>        entry_SYSCALL_64_after_hwframe+0x49/0xb3
> 
> -> #3 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
>        __mutex_lock+0x93/0x9c0
>        btrfs_run_dev_stats+0x44/0x470 [btrfs]
>        commit_cowonly_roots+0xac/0x2a0 [btrfs]
>        btrfs_commit_transaction+0x511/0xa70 [btrfs]
>        transaction_kthread+0x13c/0x160 [btrfs]
>        kthread+0x143/0x160
>        ret_from_fork+0x3a/0x50
> 
> -> #2 (&fs_info->tree_log_mutex){+.+.}-{3:3}:
>        __mutex_lock+0x93/0x9c0
>        btrfs_commit_transaction+0x4b6/0xa70 [btrfs]
>        transaction_kthread+0x13c/0x160 [btrfs]
>        kthread+0x143/0x160
>        ret_from_fork+0x3a/0x50
> 
> -> #1 (&fs_info->reloc_mutex){+.+.}-{3:3}:
>        __mutex_lock+0x93/0x9c0
>        btrfs_record_root_in_trans+0x3e/0x60 [btrfs]
>        start_transaction+0xcb/0x550 [btrfs]
>        btrfs_mkdir+0x5c/0x1e0 [btrfs]
>        vfs_mkdir+0x107/0x1d0
>        do_mkdirat+0xe7/0x110
>        do_syscall_64+0x4f/0x210
>        entry_SYSCALL_64_after_hwframe+0x49/0xb3
> 
> -> #0 (sb_internal#3){.+.+}-{0:0}:
>        __lock_acquire+0x11f9/0x1aa0
>        lock_acquire+0x9e/0x380
>        __sb_start_write+0x13a/0x270
>        start_transaction+0x37e/0x550 [btrfs]
>        cow_file_range_inline.constprop.74+0xe4/0x640 [btrfs]
>        cow_file_range+0xe5/0x3f0 [btrfs]
>        btrfs_run_delalloc_range+0x128/0x620 [btrfs]
>        writepage_delalloc+0xe2/0x140 [btrfs]
>        __extent_writepage+0x1a3/0x370 [btrfs]
>        extent_write_cache_pages+0x2b8/0x470 [btrfs]
>        extent_writepages+0x3f/0x90 [btrfs]
>        do_writepages+0x3c/0xe0
>        __writeback_single_inode+0x4f/0x650
>        writeback_sb_inodes+0x1f7/0x560
>        __writeback_inodes_wb+0x58/0xa0
>        wb_writeback+0x33b/0x4b0
>        wb_workfn+0x428/0x5b0
>        process_one_work+0x2b6/0x620
>        worker_thread+0x35/0x3e0
>        kthread+0x143/0x160
>        ret_from_fork+0x3a/0x50
> 
> other info that might help us debug this:
> 
> Chain exists of:
>   sb_internal#3 --> (work_completion)(&mddev->del_work) --> (work_completion)(&(&wb->dwork)->work)
> 
>  Possible unsafe locking scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock((work_completion)(&(&wb->dwork)->work));
>                                lock((work_completion)(&mddev->del_work));
>                                lock((work_completion)(&(&wb->dwork)->work));
>   lock(sb_internal#3);
> 
>  *** DEADLOCK ***
> 
> 3 locks held by kworker/u16:3/8175:
>  #0: ffff88840baa6948 ((wq_completion)writeback){+.+.}-{0:0}, at: process_one_work+0x235/0x620
>  #1: ffffc900087c7e68 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_one_work+0x235/0x620
>  #2: ffff8882f19550e8 (&type->s_umount_key#52){++++}-{3:3}, at: trylock_super+0x11/0x50
> 
> stack backtrace:
> CPU: 1 PID: 8175 Comm: kworker/u16:3 Tainted: G           O      5.7.1mq+ #383
> Hardware name: System manufacturer System Product Name/P8Z68-V PRO, BIOS 3603 11/09/2012
> Workqueue: writeback wb_workfn (flush-btrfs-1)
> Call Trace:
>  dump_stack+0x71/0xa0
>  check_noncircular+0x165/0x180
>  ? stack_trace_save+0x46/0x70
>  __lock_acquire+0x11f9/0x1aa0
>  lock_acquire+0x9e/0x380
>  ? start_transaction+0x37e/0x550 [btrfs]
>  __sb_start_write+0x13a/0x270
>  ? start_transaction+0x37e/0x550 [btrfs]
>  start_transaction+0x37e/0x550 [btrfs]
>  ? kmem_cache_alloc+0x1b0/0x2c0
>  cow_file_range_inline.constprop.74+0xe4/0x640 [btrfs]
>  ? lock_acquire+0x9e/0x380
>  ? test_range_bit+0x3d/0x130 [btrfs]
>  cow_file_range+0xe5/0x3f0 [btrfs]
>  btrfs_run_delalloc_range+0x128/0x620 [btrfs]
>  ? find_lock_delalloc_range+0x1f3/0x220 [btrfs]
>  writepage_delalloc+0xe2/0x140 [btrfs]
>  __extent_writepage+0x1a3/0x370 [btrfs]
>  extent_write_cache_pages+0x2b8/0x470 [btrfs]
>  ? __lock_acquire+0x3fc/0x1aa0
>  extent_writepages+0x3f/0x90 [btrfs]
>  do_writepages+0x3c/0xe0
>  ? find_held_lock+0x2d/0x90
>  __writeback_single_inode+0x4f/0x650
>  writeback_sb_inodes+0x1f7/0x560
>  __writeback_inodes_wb+0x58/0xa0
>  wb_writeback+0x33b/0x4b0
>  wb_workfn+0x428/0x5b0
>  ? sched_clock_cpu+0xe/0xd0
>  process_one_work+0x2b6/0x620
>  ? worker_thread+0xc7/0x3e0
>  worker_thread+0x35/0x3e0
>  ? process_one_work+0x620/0x620
>  kthread+0x143/0x160
>  ? kthread_park+0x80/0x80
>  ret_from_fork+0x3a/0x50
> 
