* [PATCH] btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers
@ 2018-03-16 18:36 jeffm
2018-03-16 18:45 ` Liu Bo
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: jeffm @ 2018-03-16 18:36 UTC
To: linux-btrfs; +Cc: Jeff Mahoney
From: Jeff Mahoney <jeffm@suse.com>
While running btrfs/011, I hit the following lockdep splat.

This is the important bit:

pcpu_alloc+0x1ac/0x5e0
__percpu_counter_init+0x4e/0xb0
btrfs_init_fs_root+0x99/0x1c0 [btrfs]
btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
resolve_indirect_refs+0x130/0x830 [btrfs]
find_parent_nodes+0x69e/0xff0 [btrfs]
btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
btrfs_find_all_roots+0x50/0x70 [btrfs]
btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]

The percpu_counter_init call in btrfs_alloc_subvolume_writers
uses GFP_KERNEL, which we can't do during transaction commit: a
GFP_KERNEL allocation may enter filesystem reclaim, and reclaim can
evict btrfs inodes and take &delayed_node->mutex (see the kswapd0
trace below).

This switches it to GFP_NOFS.
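A userspace sketch of why changing the mask breaks this particular chain. It assumes the check at the top of pcpu_alloc() in mm/percpu.c of roughly this kernel version, as we read it, `is_atomic = (gfp & GFP_KERNEL) != GFP_KERNEL`, with pcpu_alloc_mutex taken only for non-atomic requests; please verify against the tree in question. The MODEL_* names and bit values are invented for illustration; only the relation GFP_NOFS == GFP_KERNEL without __GFP_FS mirrors the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Invented stand-ins for the kernel's GFP bits. Only the relationship
 * GFP_NOFS == GFP_KERNEL & ~__GFP_FS mirrors the real definitions.
 */
#define MODEL_GFP_RECLAIM 0x1u /* stands in for __GFP_RECLAIM */
#define MODEL_GFP_IO      0x2u /* stands in for __GFP_IO */
#define MODEL_GFP_FS      0x4u /* stands in for __GFP_FS */

#define MODEL_GFP_KERNEL (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_RECLAIM | MODEL_GFP_IO)

/*
 * Models the (assumed) pcpu_alloc() check: any mask missing a GFP_KERNEL
 * bit is treated as an "atomic" request.
 */
static bool model_pcpu_is_atomic(unsigned int gfp)
{
	return (gfp & MODEL_GFP_KERNEL) != MODEL_GFP_KERNEL;
}

/* Would this request take pcpu_alloc_mutex, the lock in the reported chain? */
static bool model_pcpu_takes_alloc_mutex(unsigned int gfp)
{
	return !model_pcpu_is_atomic(gfp);
}

/* Can an allocation with this mask recurse into filesystem reclaim? */
static bool model_may_enter_fs_reclaim(unsigned int gfp)
{
	return (gfp & MODEL_GFP_RECLAIM) && (gfp & MODEL_GFP_FS);
}
```

Under this model, GFP_KERNEL both takes pcpu_alloc_mutex and may enter fs reclaim, while GFP_NOFS does neither, which removes the commit-path acquisition of pcpu_alloc_mutex from the chain reported below.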
========================================================
WARNING: possible irq lock inversion dependency detected
4.12.14-kvmsmall #8 Tainted: G W
--------------------------------------------------------
kswapd0/50 just changed the state of lock:
(&delayed_node->mutex){+.+.-.}, at: [<ffffffffc06994fa>] __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
but this lock took another, RECLAIM_FS-unsafe lock in the past:
(pcpu_alloc_mutex){+.+.+.}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
Chain exists of:
&delayed_node->mutex --> &found->groups_sem --> pcpu_alloc_mutex

Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(pcpu_alloc_mutex);
                               local_irq_disable();
                               lock(&delayed_node->mutex);
                               lock(&found->groups_sem);
  <Interrupt>
    lock(&delayed_node->mutex);

*** DEADLOCK ***

2 locks held by kswapd0/50:
#0: (shrinker_rwsem){++++..}, at: [<ffffffff811dc11f>] shrink_slab+0x7f/0x5b0
#1: (&type->s_umount_key#30){+++++.}, at: [<ffffffff8126dec6>] trylock_super+0x16/0x50

the shortest dependencies between 2nd lock and 1st lock:
-> (pcpu_alloc_mutex){+.+.+.} ops: 4904 {
HARDIRQ-ON-W at:
__mutex_lock+0x4e/0x8c0
pcpu_alloc+0x1ac/0x5e0
alloc_kmem_cache_cpus.isra.70+0x25/0xa0
__do_tune_cpucache+0x2c/0x220
do_tune_cpucache+0x26/0xc0
enable_cpucache+0x6d/0xf0
kmem_cache_init_late+0x42/0x75
start_kernel+0x343/0x4cb
x86_64_start_kernel+0x127/0x134
secondary_startup_64+0xa5/0xb0
SOFTIRQ-ON-W at:
__mutex_lock+0x4e/0x8c0
pcpu_alloc+0x1ac/0x5e0
alloc_kmem_cache_cpus.isra.70+0x25/0xa0
__do_tune_cpucache+0x2c/0x220
do_tune_cpucache+0x26/0xc0
enable_cpucache+0x6d/0xf0
kmem_cache_init_late+0x42/0x75
start_kernel+0x343/0x4cb
x86_64_start_kernel+0x127/0x134
secondary_startup_64+0xa5/0xb0
RECLAIM_FS-ON-W at:
__kmalloc+0x47/0x310
pcpu_extend_area_map+0x2b/0xc0
pcpu_alloc+0x3ec/0x5e0
alloc_kmem_cache_cpus.isra.70+0x25/0xa0
__do_tune_cpucache+0x2c/0x220
do_tune_cpucache+0x26/0xc0
enable_cpucache+0x6d/0xf0
__kmem_cache_create+0x1bf/0x390
create_cache+0xba/0x1b0
kmem_cache_create+0x1f8/0x2b0
ksm_init+0x6f/0x19d
do_one_initcall+0x50/0x1b0
kernel_init_freeable+0x201/0x289
kernel_init+0xa/0x100
ret_from_fork+0x3a/0x50
INITIAL USE at:
__mutex_lock+0x4e/0x8c0
pcpu_alloc+0x1ac/0x5e0
alloc_kmem_cache_cpus.isra.70+0x25/0xa0
setup_cpu_cache+0x2f/0x1f0
__kmem_cache_create+0x1bf/0x390
create_boot_cache+0x8b/0xb1
kmem_cache_init+0xa1/0x19e
start_kernel+0x270/0x4cb
x86_64_start_kernel+0x127/0x134
secondary_startup_64+0xa5/0xb0
}
... key at: [<ffffffff821d8e70>] pcpu_alloc_mutex+0x70/0xa0
... acquired at:
pcpu_alloc+0x1ac/0x5e0
__percpu_counter_init+0x4e/0xb0
btrfs_init_fs_root+0x99/0x1c0 [btrfs]
btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
resolve_indirect_refs+0x130/0x830 [btrfs]
find_parent_nodes+0x69e/0xff0 [btrfs]
btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
btrfs_find_all_roots+0x50/0x70 [btrfs]
btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
transaction_kthread+0x176/0x1b0 [btrfs]
kthread+0x102/0x140
ret_from_fork+0x3a/0x50

-> (&fs_info->commit_root_sem){++++..} ops: 1566382 {
HARDIRQ-ON-W at:
down_write+0x3e/0xa0
cache_block_group+0x287/0x420 [btrfs]
find_free_extent+0x106c/0x12d0 [btrfs]
btrfs_reserve_extent+0xd8/0x170 [btrfs]
cow_file_range.isra.66+0x133/0x470 [btrfs]
run_delalloc_range+0x121/0x410 [btrfs]
writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
__extent_writepage+0x19a/0x360 [btrfs]
extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
extent_writepages+0x4d/0x60 [btrfs]
do_writepages+0x1a/0x70
__filemap_fdatawrite_range+0xa7/0xe0
btrfs_rename+0x5ee/0xdb0 [btrfs]
vfs_rename+0x52a/0x7e0
SyS_rename+0x351/0x3b0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
HARDIRQ-ON-R at:
down_read+0x35/0x90
caching_thread+0x57/0x560 [btrfs]
normal_work_helper+0x1c0/0x5e0 [btrfs]
process_one_work+0x1e0/0x5c0
worker_thread+0x44/0x390
kthread+0x102/0x140
ret_from_fork+0x3a/0x50
SOFTIRQ-ON-W at:
down_write+0x3e/0xa0
cache_block_group+0x287/0x420 [btrfs]
find_free_extent+0x106c/0x12d0 [btrfs]
btrfs_reserve_extent+0xd8/0x170 [btrfs]
cow_file_range.isra.66+0x133/0x470 [btrfs]
run_delalloc_range+0x121/0x410 [btrfs]
writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
__extent_writepage+0x19a/0x360 [btrfs]
extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
extent_writepages+0x4d/0x60 [btrfs]
do_writepages+0x1a/0x70
__filemap_fdatawrite_range+0xa7/0xe0
btrfs_rename+0x5ee/0xdb0 [btrfs]
vfs_rename+0x52a/0x7e0
SyS_rename+0x351/0x3b0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
SOFTIRQ-ON-R at:
down_read+0x35/0x90
caching_thread+0x57/0x560 [btrfs]
normal_work_helper+0x1c0/0x5e0 [btrfs]
process_one_work+0x1e0/0x5c0
worker_thread+0x44/0x390
kthread+0x102/0x140
ret_from_fork+0x3a/0x50
INITIAL USE at:
down_write+0x3e/0xa0
cache_block_group+0x287/0x420 [btrfs]
find_free_extent+0x106c/0x12d0 [btrfs]
btrfs_reserve_extent+0xd8/0x170 [btrfs]
cow_file_range.isra.66+0x133/0x470 [btrfs]
run_delalloc_range+0x121/0x410 [btrfs]
writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
__extent_writepage+0x19a/0x360 [btrfs]
extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
extent_writepages+0x4d/0x60 [btrfs]
do_writepages+0x1a/0x70
__filemap_fdatawrite_range+0xa7/0xe0
btrfs_rename+0x5ee/0xdb0 [btrfs]
vfs_rename+0x52a/0x7e0
SyS_rename+0x351/0x3b0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
}
... key at: [<ffffffffc0729578>] __key.61970+0x0/0xfffffffffff9aa88 [btrfs]
... acquired at:
cache_block_group+0x287/0x420 [btrfs]
find_free_extent+0x106c/0x12d0 [btrfs]
btrfs_reserve_extent+0xd8/0x170 [btrfs]
btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs]
btrfs_create_tree+0xbb/0x2a0 [btrfs]
btrfs_create_uuid_tree+0x37/0x140 [btrfs]
open_ctree+0x23c0/0x2660 [btrfs]
btrfs_mount+0xd36/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
btrfs_mount+0x18c/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
do_mount+0x1c1/0xcc0
SyS_mount+0x7e/0xd0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7

-> (&found->groups_sem){++++..} ops: 2134587 {
HARDIRQ-ON-W at:
down_write+0x3e/0xa0
__link_block_group+0x34/0x130 [btrfs]
btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
open_ctree+0x2054/0x2660 [btrfs]
btrfs_mount+0xd36/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
btrfs_mount+0x18c/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
do_mount+0x1c1/0xcc0
SyS_mount+0x7e/0xd0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
HARDIRQ-ON-R at:
down_read+0x35/0x90
btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs]
open_ctree+0x207b/0x2660 [btrfs]
btrfs_mount+0xd36/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
btrfs_mount+0x18c/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
do_mount+0x1c1/0xcc0
SyS_mount+0x7e/0xd0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
SOFTIRQ-ON-W at:
down_write+0x3e/0xa0
__link_block_group+0x34/0x130 [btrfs]
btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
open_ctree+0x2054/0x2660 [btrfs]
btrfs_mount+0xd36/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
btrfs_mount+0x18c/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
do_mount+0x1c1/0xcc0
SyS_mount+0x7e/0xd0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
SOFTIRQ-ON-R at:
down_read+0x35/0x90
btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs]
open_ctree+0x207b/0x2660 [btrfs]
btrfs_mount+0xd36/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
btrfs_mount+0x18c/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
do_mount+0x1c1/0xcc0
SyS_mount+0x7e/0xd0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
INITIAL USE at:
down_write+0x3e/0xa0
__link_block_group+0x34/0x130 [btrfs]
btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
open_ctree+0x2054/0x2660 [btrfs]
btrfs_mount+0xd36/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
btrfs_mount+0x18c/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
do_mount+0x1c1/0xcc0
SyS_mount+0x7e/0xd0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
}
... key at: [<ffffffffc0729488>] __key.59101+0x0/0xfffffffffff9ab78 [btrfs]
... acquired at:
find_free_extent+0xcb4/0x12d0 [btrfs]
btrfs_reserve_extent+0xd8/0x170 [btrfs]
btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs]
__btrfs_cow_block+0x110/0x5b0 [btrfs]
btrfs_cow_block+0xd7/0x290 [btrfs]
btrfs_search_slot+0x1f6/0x960 [btrfs]
btrfs_lookup_inode+0x2a/0x90 [btrfs]
__btrfs_update_delayed_inode+0x65/0x210 [btrfs]
btrfs_commit_inode_delayed_inode+0x121/0x130 [btrfs]
btrfs_evict_inode+0x3fe/0x6a0 [btrfs]
evict+0xc4/0x190
__dentry_kill+0xbf/0x170
dput+0x2ae/0x2f0
SyS_rename+0x2a6/0x3b0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7

-> (&delayed_node->mutex){+.+.-.} ops: 5580204 {
HARDIRQ-ON-W at:
__mutex_lock+0x4e/0x8c0
btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
btrfs_update_inode+0x83/0x110 [btrfs]
btrfs_dirty_inode+0x62/0xe0 [btrfs]
touch_atime+0x8c/0xb0
do_generic_file_read+0x818/0xb10
__vfs_read+0xdc/0x150
vfs_read+0x8a/0x130
SyS_read+0x45/0xa0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
SOFTIRQ-ON-W at:
__mutex_lock+0x4e/0x8c0
btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
btrfs_update_inode+0x83/0x110 [btrfs]
btrfs_dirty_inode+0x62/0xe0 [btrfs]
touch_atime+0x8c/0xb0
do_generic_file_read+0x818/0xb10
__vfs_read+0xdc/0x150
vfs_read+0x8a/0x130
SyS_read+0x45/0xa0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
IN-RECLAIM_FS-W at:
__mutex_lock+0x4e/0x8c0
__btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
btrfs_evict_inode+0x22c/0x6a0 [btrfs]
evict+0xc4/0x190
dispose_list+0x35/0x50
prune_icache_sb+0x42/0x50
super_cache_scan+0x139/0x190
shrink_slab+0x262/0x5b0
shrink_node+0x2eb/0x2f0
kswapd+0x2eb/0x890
kthread+0x102/0x140
ret_from_fork+0x3a/0x50
INITIAL USE at:
__mutex_lock+0x4e/0x8c0
btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
btrfs_update_inode+0x83/0x110 [btrfs]
btrfs_dirty_inode+0x62/0xe0 [btrfs]
touch_atime+0x8c/0xb0
do_generic_file_read+0x818/0xb10
__vfs_read+0xdc/0x150
vfs_read+0x8a/0x130
SyS_read+0x45/0xa0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
}
... key at: [<ffffffffc072d488>] __key.56935+0x0/0xfffffffffff96b78 [btrfs]
... acquired at:
__lock_acquire+0x264/0x11c0
lock_acquire+0xbd/0x1e0
__mutex_lock+0x4e/0x8c0
__btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
btrfs_evict_inode+0x22c/0x6a0 [btrfs]
evict+0xc4/0x190
dispose_list+0x35/0x50
prune_icache_sb+0x42/0x50
super_cache_scan+0x139/0x190
shrink_slab+0x262/0x5b0
shrink_node+0x2eb/0x2f0
kswapd+0x2eb/0x890
kthread+0x102/0x140
ret_from_fork+0x3a/0x50

stack backtrace:
CPU: 1 PID: 50 Comm: kswapd0 Tainted: G W 4.12.14-kvmsmall #8 SLE15 (unreleased)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
dump_stack+0x78/0xb7
print_irq_inversion_bug.part.38+0x19f/0x1aa
check_usage_forwards+0x102/0x120
? ret_from_fork+0x3a/0x50
? check_usage_backwards+0x110/0x110
mark_lock+0x16c/0x270
__lock_acquire+0x264/0x11c0
? pagevec_lookup_entries+0x1a/0x30
? truncate_inode_pages_range+0x2b3/0x7f0
lock_acquire+0xbd/0x1e0
? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
__mutex_lock+0x4e/0x8c0
? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
? btrfs_evict_inode+0x1f6/0x6a0 [btrfs]
__btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
btrfs_evict_inode+0x22c/0x6a0 [btrfs]
evict+0xc4/0x190
dispose_list+0x35/0x50
prune_icache_sb+0x42/0x50
super_cache_scan+0x139/0x190
shrink_slab+0x262/0x5b0
shrink_node+0x2eb/0x2f0
kswapd+0x2eb/0x890
kthread+0x102/0x140
? mem_cgroup_shrink_node+0x2c0/0x2c0
? kthread_create_on_node+0x40/0x40
ret_from_fork+0x3a/0x50
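The report can be read as a small dependency graph. The sketch below is a simplified userspace model, not lockdep itself: each edge means "the second lock was acquired while the first was held", following the "acquired at" traces above, and the reclaim-state inversion (pcpu_alloc_mutex is held across an allocation that may enter fs reclaim, and reclaim takes &delayed_node->mutex) is treated as one more edge, which is our simplification. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Lock classes named in the report (labels only; this is not lockdep). */
enum lock_class {
	DELAYED_NODE_MUTEX, /* &delayed_node->mutex */
	GROUPS_SEM,         /* &found->groups_sem */
	COMMIT_ROOT_SEM,    /* &fs_info->commit_root_sem */
	PCPU_ALLOC_MUTEX,   /* pcpu_alloc_mutex */
	NR_CLASSES
};

/* held_before[a][b] == true means "b was acquired while a was held". */
struct dep_graph {
	bool held_before[NR_CLASSES][NR_CLASSES];
};

static void add_dep(struct dep_graph *g, enum lock_class held,
		    enum lock_class taken)
{
	g->held_before[held][taken] = true;
}

/* Depth-first search; reaching a node still on the stack means a cycle. */
static bool dfs_finds_cycle(const struct dep_graph *g, int node,
			    bool *visited, bool *on_stack)
{
	visited[node] = true;
	on_stack[node] = true;
	for (int next = 0; next < NR_CLASSES; next++) {
		if (!g->held_before[node][next])
			continue;
		if (on_stack[next])
			return true;
		if (!visited[next] && dfs_finds_cycle(g, next, visited, on_stack))
			return true;
	}
	on_stack[node] = false;
	return false;
}

static bool has_cycle(const struct dep_graph *g)
{
	bool visited[NR_CLASSES] = { false };
	bool on_stack[NR_CLASSES] = { false };

	for (int n = 0; n < NR_CLASSES; n++)
		if (!visited[n] && dfs_finds_cycle(g, n, visited, on_stack))
			return true;
	return false;
}

/* Edges from the report; the flag selects whether commit takes pcpu_alloc_mutex. */
static struct dep_graph build_report_graph(bool commit_takes_pcpu_mutex)
{
	struct dep_graph g = { 0 };

	add_dep(&g, DELAYED_NODE_MUTEX, GROUPS_SEM);   /* evict -> find_free_extent */
	add_dep(&g, GROUPS_SEM, COMMIT_ROOT_SEM);      /* find_free_extent -> cache_block_group */
	if (commit_takes_pcpu_mutex)
		add_dep(&g, COMMIT_ROOT_SEM, PCPU_ALLOC_MUTEX); /* commit -> btrfs_init_fs_root */
	/* Modeled reclaim edge: fs reclaim under pcpu_alloc_mutex evicts a btrfs inode. */
	add_dep(&g, PCPU_ALLOC_MUTEX, DELAYED_NODE_MUTEX);
	return g;
}
```

With the commit path taking pcpu_alloc_mutex, has_cycle() finds the loop; dropping that single edge, which is what the GFP_NOFS change is meant to achieve, leaves the modeled graph acyclic.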
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
---
fs/btrfs/disk-io.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 21f34ad0d411..eb6bb3169a9e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1108,7 +1108,7 @@ static struct btrfs_subvolume_writers *btrfs_alloc_subvolume_writers(void)
 	if (!writers)
 		return ERR_PTR(-ENOMEM);
 
-	ret = percpu_counter_init(&writers->counter, 0, GFP_KERNEL);
+	ret = percpu_counter_init(&writers->counter, 0, GFP_NOFS);
 	if (ret < 0) {
 		kfree(writers);
 		return ERR_PTR(ret);
--
2.15.1
* Re: [PATCH] btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers
2018-03-16 18:36 [PATCH] btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers jeffm
@ 2018-03-16 18:45 ` Liu Bo
2018-03-16 18:48 ` Nikolay Borisov
2018-03-16 20:12 ` David Sterba
2 siblings, 0 replies; 9+ messages in thread
From: Liu Bo @ 2018-03-16 18:45 UTC
To: jeffm; +Cc: linux-btrfs
On Fri, Mar 16, 2018 at 11:36 AM, <jeffm@suse.com> wrote:
> From: Jeff Mahoney <jeffm@suse.com>
>
> While running btrfs/011, I hit the following lockdep splat.
>
> This is the important bit:
> pcpu_alloc+0x1ac/0x5e0
> __percpu_counter_init+0x4e/0xb0
> btrfs_init_fs_root+0x99/0x1c0 [btrfs]
> btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
> resolve_indirect_refs+0x130/0x830 [btrfs]
> find_parent_nodes+0x69e/0xff0 [btrfs]
> btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
> btrfs_find_all_roots+0x50/0x70 [btrfs]
> btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
> btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
>
> The percpu_counter_init call in btrfs_alloc_subvolume_writers
> uses GFP_KERNEL, which we can't do during transaction commit.
>
> This switches it to GFP_NOFS.
>
> <snip>
>
Looks OK to me.
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
thanks,
liubo
> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> ---
> fs/btrfs/disk-io.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 21f34ad0d411..eb6bb3169a9e 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1108,7 +1108,7 @@ static struct btrfs_subvolume_writers *btrfs_alloc_subvolume_writers(void)
> if (!writers)
> return ERR_PTR(-ENOMEM);
>
> - ret = percpu_counter_init(&writers->counter, 0, GFP_KERNEL);
> + ret = percpu_counter_init(&writers->counter, 0, GFP_NOFS);
> if (ret < 0) {
> kfree(writers);
> return ERR_PTR(ret);
> --
> 2.15.1
* Re: [PATCH] btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers
2018-03-16 18:36 [PATCH] btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers jeffm
2018-03-16 18:45 ` Liu Bo
@ 2018-03-16 18:48 ` Nikolay Borisov
2018-03-16 18:54 ` Jeff Mahoney
2018-03-16 20:12 ` David Sterba
2 siblings, 1 reply; 9+ messages in thread
From: Nikolay Borisov @ 2018-03-16 18:48 UTC
To: jeffm, linux-btrfs
On 16.03.2018 20:36, jeffm@suse.com wrote:
> From: Jeff Mahoney <jeffm@suse.com>
>
> While running btrfs/011, I hit the following lockdep splat.
>
> This is the important bit:
> pcpu_alloc+0x1ac/0x5e0
> __percpu_counter_init+0x4e/0xb0
> btrfs_init_fs_root+0x99/0x1c0 [btrfs]
> btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
> resolve_indirect_refs+0x130/0x830 [btrfs]
> find_parent_nodes+0x69e/0xff0 [btrfs]
> btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
> btrfs_find_all_roots+0x50/0x70 [btrfs]
> btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
> btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
>
> The percpu_counter_init call in btrfs_alloc_subvolume_writers
> uses GFP_KERNEL, which we can't do during transaction commit.
>
> This switches it to GFP_NOFS.
Given there is effort underway to actually kill GFP_NOFS and replace it
with the context annotation routines, shouldn't we use those routines
directly instead?
<snip>
* Re: [PATCH] btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers
2018-03-16 18:48 ` Nikolay Borisov
@ 2018-03-16 18:54 ` Jeff Mahoney
0 siblings, 0 replies; 9+ messages in thread
From: Jeff Mahoney @ 2018-03-16 18:54 UTC
To: Nikolay Borisov, linux-btrfs
On 3/16/18 2:48 PM, Nikolay Borisov wrote:
>
>
> On 16.03.2018 20:36, jeffm@suse.com wrote:
>> From: Jeff Mahoney <jeffm@suse.com>
>>
>> While running btrfs/011, I hit the following lockdep splat.
>>
>> This is the important bit:
>> pcpu_alloc+0x1ac/0x5e0
>> __percpu_counter_init+0x4e/0xb0
>> btrfs_init_fs_root+0x99/0x1c0 [btrfs]
>> btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
>> resolve_indirect_refs+0x130/0x830 [btrfs]
>> find_parent_nodes+0x69e/0xff0 [btrfs]
>> btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
>> btrfs_find_all_roots+0x50/0x70 [btrfs]
>> btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
>> btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
>>
>> The percpu_counter_init call in btrfs_alloc_subvolume_writers
>> uses GFP_KERNEL, which we can't do during transaction commit.
>>
>> This switches it to GFP_NOFS.
>
> Given there is effort underway to actually kill GFP_NOFS and replace it
> with the context annotation routines, shouldn't we use those routines
> directly instead?
I don't think those have landed yet. When they do, they should make the
gfp flags here obsolete in every context, since we also read roots from
code that doesn't need GFP_NOFS.
-Jeff
--
Jeff Mahoney
SUSE Labs
* Re: [PATCH] btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers
2018-03-16 18:36 [PATCH] btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers jeffm
2018-03-16 18:45 ` Liu Bo
2018-03-16 18:48 ` Nikolay Borisov
@ 2018-03-16 20:12 ` David Sterba
2018-03-16 21:21 ` Jeff Mahoney
2018-03-19 17:52 ` Jeff Mahoney
2 siblings, 2 replies; 9+ messages in thread
From: David Sterba @ 2018-03-16 20:12 UTC
To: jeffm; +Cc: linux-btrfs
On Fri, Mar 16, 2018 at 02:36:27PM -0400, jeffm@suse.com wrote:
> From: Jeff Mahoney <jeffm@suse.com>
>
> While running btrfs/011, I hit the following lockdep splat.
>
> This is the important bit:
> pcpu_alloc+0x1ac/0x5e0
> __percpu_counter_init+0x4e/0xb0
> btrfs_init_fs_root+0x99/0x1c0 [btrfs]
> btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
> resolve_indirect_refs+0x130/0x830 [btrfs]
> find_parent_nodes+0x69e/0xff0 [btrfs]
> btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
> btrfs_find_all_roots+0x50/0x70 [btrfs]
> btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
> btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
>
> The percpu_counter_init call in btrfs_alloc_subvolume_writers
> uses GFP_KERNEL, which we can't do during transaction commit.
>
> This switches it to GFP_NOFS.
> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> ---
> fs/btrfs/disk-io.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 21f34ad0d411..eb6bb3169a9e 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1108,7 +1108,7 @@ static struct btrfs_subvolume_writers *btrfs_alloc_subvolume_writers(void)
> if (!writers)
> return ERR_PTR(-ENOMEM);
>
> - ret = percpu_counter_init(&writers->counter, 0, GFP_KERNEL);
> + ret = percpu_counter_init(&writers->counter, 0, GFP_NOFS);
A line above the diff context is another allocation that does GFP_NOFS,
so one of the gfp flags was wrong.
Looks like there's another instance where percpu allocates with
GFP_KERNEL: create_space_info that can be called from the path that
allocates chunks, so this also looks like a NOFS candidate.
And in the same function, there's another indirect and hidden GFP_KERNEL
allocation from kobject_init_and_add. So in this case we can't fix all
the gfp problems at the call site and will have to use the scoped
approach eventually.
I haven't found any instance of such lockdep reports in my logs (over a
long period), so it's quite unlikely to end up in the recursive
allocation.
Patch added to next, thanks.
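The point about kobject_init_and_add() can be illustrated with a minimal userspace model, assuming a thread-local "in NOFS scope" flag standing in for PF_MEMALLOC_NOFS. The helper names (model_hidden_gfp_kernel_helper(), model_create_space_info()) are hypothetical; the model only shows why changing the gfp argument at a call site cannot reach an allocation that a callee makes with its own hard-coded mask, while a scope set around the whole call does.

```c
#include <assert.h>

/* Illustrative gfp bits for this model only. */
typedef unsigned int gfp_t;
#define MODEL_GFP_FS     0x4u
#define MODEL_GFP_KERNEL 0x7u /* RECLAIM | IO | FS in this model */
#define MODEL_GFP_NOFS   0x3u /* RECLAIM | IO */

/* Stand-in for current->flags & PF_MEMALLOC_NOFS. */
static _Thread_local int model_in_nofs_scope;

/* Records the mask the most recent allocation was effectively made with. */
static gfp_t model_last_alloc_gfp;

/* Model allocator: honours the scope by stripping the FS bit. */
static void model_alloc(gfp_t gfp)
{
	if (model_in_nofs_scope)
		gfp &= ~MODEL_GFP_FS;
	model_last_alloc_gfp = gfp;
}

/* Hypothetical helper standing in for kobject_init_and_add(): it allocates
 * internally with a hard-coded GFP_KERNEL that the caller cannot override. */
static void model_hidden_gfp_kernel_helper(void)
{
	model_alloc(MODEL_GFP_KERNEL);
}

/* Hypothetical create_space_info()-like caller. Passing GFP_NOFS at its own
 * call site cannot reach the helper's allocation; only the scope can. */
static void model_create_space_info(int use_scope)
{
	int saved = model_in_nofs_scope;

	if (use_scope)
		model_in_nofs_scope = 1;
	model_alloc(MODEL_GFP_NOFS);      /* the call site we can fix directly */
	model_hidden_gfp_kernel_helper(); /* the one we cannot */
	model_in_nofs_scope = saved;
}
```

Without the scope, the last allocation still carries the FS bit even though the visible call site was fixed; with the scope, the hidden allocation is downgraded as well.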
* Re: [PATCH] btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers
2018-03-16 20:12 ` David Sterba
@ 2018-03-16 21:21 ` Jeff Mahoney
2018-03-19 17:52 ` Jeff Mahoney
1 sibling, 0 replies; 9+ messages in thread
From: Jeff Mahoney @ 2018-03-16 21:21 UTC (permalink / raw)
To: dsterba, linux-btrfs
On 3/16/18 4:12 PM, David Sterba wrote:
> On Fri, Mar 16, 2018 at 02:36:27PM -0400, jeffm@suse.com wrote:
>> From: Jeff Mahoney <jeffm@suse.com>
>>
>> While running btrfs/011, I hit the following lockdep splat.
>>
>> This is the important bit:
>> pcpu_alloc+0x1ac/0x5e0
>> __percpu_counter_init+0x4e/0xb0
>> btrfs_init_fs_root+0x99/0x1c0 [btrfs]
>> btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
>> resolve_indirect_refs+0x130/0x830 [btrfs]
>> find_parent_nodes+0x69e/0xff0 [btrfs]
>> btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
>> btrfs_find_all_roots+0x50/0x70 [btrfs]
>> btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
>> btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
>>
>> The percpu_counter_init call in btrfs_alloc_subvolume_writers
>> uses GFP_KERNEL, which we can't do during transaction commit.
>>
>> This switches it to GFP_NOFS.
>
>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>> ---
>> fs/btrfs/disk-io.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 21f34ad0d411..eb6bb3169a9e 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -1108,7 +1108,7 @@ static struct btrfs_subvolume_writers *btrfs_alloc_subvolume_writers(void)
>> if (!writers)
>> return ERR_PTR(-ENOMEM);
>>
>> - ret = percpu_counter_init(&writers->counter, 0, GFP_KERNEL);
>> + ret = percpu_counter_init(&writers->counter, 0, GFP_NOFS);
>
> A line above the diff context is another allocation that does GFP_NOFS,
> so one of the gfp flags was wrong.
This one was wrong. It was implicitly GFP_KERNEL until Tejun added the
gfp_t argument and passed GFP_KERNEL at most of the call sites. Since
that was effectively a no-op, it was the right thing for him to do
without asking every subsystem maintainer for their preference.
> Looks like there's another instance where percpu allocates with
> GFP_KERNEL: create_space_info that can be called from the path that
> allocates chunks, so this also looks like a NOFS candidate.
That's probably for the same reason.
> And in the same function, there's another indirect and hidden GFP_KERNEL
> allocation from kobject_init_and_add. So in this case we can't fix all
> the gfp problems at the call site and will have to use the scoped
> approach eventually.
Yep. That's not a huge barrier, though. We can push the kobject_add
into a workqueue pretty easily.
> I haven't found any instance of such lockdep reports in my logs (over a
> long period), so it's quite unlikely to end up in the recursive
> allocation.
>
> Patch added to next, thanks.
When hunting to see whether this had already been fixed, I did find two
reports: one from Qu in April of last year and another from Mike
Galbraith in 2016.
-Jeff
--
Jeff Mahoney
SUSE Labs
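Deferring kobject_add to a workqueue amounts to recording the work while the constrained context is active and running it later from a context with no filesystem locks held. The sketch below models that with a small FIFO drained by a "worker" function; in the kernel this would be a struct work_struct set up with INIT_WORK() and submitted with queue_work() or schedule_work(). Everything named model_* is hypothetical, and the model ignores concurrency and the lifetime of the queued argument, both of which a real implementation would have to handle.

```c
#include <assert.h>
#include <stddef.h>

/* A deferred work item: a callback plus its argument (a simplified
 * stand-in for struct work_struct). */
struct model_work {
	void (*fn)(void *arg);
	void *arg;
};

#define MODEL_QUEUE_LEN 16

static struct model_work model_queue[MODEL_QUEUE_LEN];
static size_t model_queue_head, model_queue_tail;

/* Set while "inside transaction commit"; GFP_KERNEL work must not run here. */
static int model_in_commit;

/* Queue a callback to run later; returns 0 on success, -1 if the queue is full. */
static int model_queue_work(void (*fn)(void *), void *arg)
{
	if (model_queue_tail - model_queue_head == MODEL_QUEUE_LEN)
		return -1;
	model_queue[model_queue_tail % MODEL_QUEUE_LEN] = (struct model_work){ fn, arg };
	model_queue_tail++;
	return 0;
}

/* The "worker": drains queued items from a context with no commit in progress. */
static void model_run_worker(void)
{
	assert(!model_in_commit);
	while (model_queue_head != model_queue_tail) {
		struct model_work w = model_queue[model_queue_head % MODEL_QUEUE_LEN];

		model_queue_head++;
		w.fn(w.arg);
	}
}

/* Hypothetical stand-in for the sysfs kobject_add(), which may allocate
 * with GFP_KERNEL; the assert marks the unsafe case. */
static int model_sysfs_added;
static void model_add_sysfs_kobject(void *arg)
{
	assert(!model_in_commit);
	model_sysfs_added += *(int *)arg;
}
```

The commit path only appends to the queue, which in this model allocates nothing; the allocation-heavy registration happens once the worker runs outside the commit.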
* Re: [PATCH] btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers
2018-03-16 20:12 ` David Sterba
2018-03-16 21:21 ` Jeff Mahoney
@ 2018-03-19 17:52 ` Jeff Mahoney
2018-03-19 18:08 ` David Sterba
1 sibling, 1 reply; 9+ messages in thread
From: Jeff Mahoney @ 2018-03-19 17:52 UTC (permalink / raw)
To: dsterba, linux-btrfs
On 3/16/18 4:12 PM, David Sterba wrote:
> On Fri, Mar 16, 2018 at 02:36:27PM -0400, jeffm@suse.com wrote:
>> From: Jeff Mahoney <jeffm@suse.com>
>>
>> While running btrfs/011, I hit the following lockdep splat.
>>
>> This is the important bit:
>> pcpu_alloc+0x1ac/0x5e0
>> __percpu_counter_init+0x4e/0xb0
>> btrfs_init_fs_root+0x99/0x1c0 [btrfs]
>> btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
>> resolve_indirect_refs+0x130/0x830 [btrfs]
>> find_parent_nodes+0x69e/0xff0 [btrfs]
>> btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
>> btrfs_find_all_roots+0x50/0x70 [btrfs]
>> btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
>> btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
>>
>> The percpu_counter_init call in btrfs_alloc_subvolume_writers
>> uses GFP_KERNEL, which we can't do during transaction commit.
>>
>> This switches it to GFP_NOFS.
>
>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>> ---
>> fs/btrfs/disk-io.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 21f34ad0d411..eb6bb3169a9e 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -1108,7 +1108,7 @@ static struct btrfs_subvolume_writers *btrfs_alloc_subvolume_writers(void)
>> if (!writers)
>> return ERR_PTR(-ENOMEM);
>>
>> - ret = percpu_counter_init(&writers->counter, 0, GFP_KERNEL);
>> + ret = percpu_counter_init(&writers->counter, 0, GFP_NOFS);
>
> A line above the diff context is another allocation that does GFP_NOFS,
> so one of the gfp flags was wrong.
>
> Looks like there's another instance where percpu allocates with
> GFP_KERNEL: create_space_info that can be called from the path that
> allocates chunks, so this also looks like a NOFS candidate.
We can get rid of this case entirely. Those call sites should be
removed since the space_infos are all allocated at mount time.
-Jeff
--
Jeff Mahoney
SUSE Labs
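If every space_info is created at mount time, the lookup used on the chunk allocation path becomes a pure search that never allocates. A minimal sketch of that shape, assuming one entry each for data, system and metadata block groups (mixed block groups are ignored here); the structures, function names and flag values are illustrative, not btrfs's own.

```c
#include <assert.h>
#include <stdlib.h>

/* Block group type bits for this model (illustrative values). */
enum { MODEL_BG_DATA = 1, MODEL_BG_SYSTEM = 2, MODEL_BG_METADATA = 4 };

struct model_space_info {
	unsigned int flags;
	unsigned long long total_bytes;
};

struct model_fs_info {
	struct model_space_info *space_info[3];
};

static const unsigned int model_types[3] = {
	MODEL_BG_DATA, MODEL_BG_SYSTEM, MODEL_BG_METADATA,
};

/* Called once at mount time, where a GFP_KERNEL-style allocation is fine.
 * The caller must zero-initialize *fs and call model_free_space_info()
 * on failure. */
static int model_init_space_info(struct model_fs_info *fs)
{
	for (int i = 0; i < 3; i++) {
		fs->space_info[i] = calloc(1, sizeof(*fs->space_info[i]));
		if (!fs->space_info[i])
			return -1;
		fs->space_info[i]->flags = model_types[i];
	}
	return 0;
}

/* Lookup never allocates: after mount every type already has an entry,
 * so a NULL return would indicate a bug rather than a missing entry. */
static struct model_space_info *model_find_space_info(struct model_fs_info *fs,
						      unsigned int type)
{
	for (int i = 0; i < 3; i++)
		if (fs->space_info[i] && fs->space_info[i]->flags == type)
			return fs->space_info[i];
	return NULL;
}

static void model_free_space_info(struct model_fs_info *fs)
{
	for (int i = 0; i < 3; i++) {
		free(fs->space_info[i]);
		fs->space_info[i] = NULL;
	}
}
```

With this shape, the allocation moves out of the chunk allocation path entirely, which is why no gfp decision is needed there at all.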
* Re: [PATCH] btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers
2018-03-19 17:52 ` Jeff Mahoney
@ 2018-03-19 18:08 ` David Sterba
2018-03-19 21:15 ` Jeff Mahoney
0 siblings, 1 reply; 9+ messages in thread
From: David Sterba @ 2018-03-19 18:08 UTC (permalink / raw)
To: Jeff Mahoney; +Cc: dsterba, linux-btrfs
On Mon, Mar 19, 2018 at 01:52:05PM -0400, Jeff Mahoney wrote:
> On 3/16/18 4:12 PM, David Sterba wrote:
> > On Fri, Mar 16, 2018 at 02:36:27PM -0400, jeffm@suse.com wrote:
> >> From: Jeff Mahoney <jeffm@suse.com>
> >>
> >> While running btrfs/011, I hit the following lockdep splat.
> >>
> >> This is the important bit:
> >> pcpu_alloc+0x1ac/0x5e0
> >> __percpu_counter_init+0x4e/0xb0
> >> btrfs_init_fs_root+0x99/0x1c0 [btrfs]
> >> btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
> >> resolve_indirect_refs+0x130/0x830 [btrfs]
> >> find_parent_nodes+0x69e/0xff0 [btrfs]
> >> btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
> >> btrfs_find_all_roots+0x50/0x70 [btrfs]
> >> btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
> >> btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
> >>
> >> The percpu_counter_init call in btrfs_alloc_subvolume_writers
> >> uses GFP_KERNEL, which we can't do during transaction commit.
> >>
> >> This switches it to GFP_NOFS.
> >
> >> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> >> ---
> >> fs/btrfs/disk-io.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> >> index 21f34ad0d411..eb6bb3169a9e 100644
> >> --- a/fs/btrfs/disk-io.c
> >> +++ b/fs/btrfs/disk-io.c
> >> @@ -1108,7 +1108,7 @@ static struct btrfs_subvolume_writers *btrfs_alloc_subvolume_writers(void)
> >> if (!writers)
> >> return ERR_PTR(-ENOMEM);
> >>
> >> - ret = percpu_counter_init(&writers->counter, 0, GFP_KERNEL);
> >> + ret = percpu_counter_init(&writers->counter, 0, GFP_NOFS);
> >
> > A line above the diff context is another allocation that does GFP_NOFS,
> > so one of the gfp flags was wrong.
> >
> > Looks like there's another instance where percpu allocates with
> > GFP_KERNEL: create_space_info that can be called from the path that
> > allocates chunks, so this also looks like a NOFS candidate.
>
> We can get rid of this case entirely. Those call sites should be
> removed since the space_infos are all allocated at mount time.
That would be great and make a few things simpler. So this means that
__find_space_info never fails once the space infos are properly
initialized, right? That was my concern in do_chunk_alloc and
btrfs_make_block_group (that's called from __btrfs_alloc_chunk).
* Re: [PATCH] btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers
2018-03-19 18:08 ` David Sterba
@ 2018-03-19 21:15 ` Jeff Mahoney
0 siblings, 0 replies; 9+ messages in thread
From: Jeff Mahoney @ 2018-03-19 21:15 UTC (permalink / raw)
To: dsterba, linux-btrfs
On 3/19/18 2:08 PM, David Sterba wrote:
> On Mon, Mar 19, 2018 at 01:52:05PM -0400, Jeff Mahoney wrote:
>> On 3/16/18 4:12 PM, David Sterba wrote:
>>> On Fri, Mar 16, 2018 at 02:36:27PM -0400, jeffm@suse.com wrote:
>>>> From: Jeff Mahoney <jeffm@suse.com>
>>>>
>>>> While running btrfs/011, I hit the following lockdep splat.
>>>>
>>>> This is the important bit:
>>>> pcpu_alloc+0x1ac/0x5e0
>>>> __percpu_counter_init+0x4e/0xb0
>>>> btrfs_init_fs_root+0x99/0x1c0 [btrfs]
>>>> btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
>>>> resolve_indirect_refs+0x130/0x830 [btrfs]
>>>> find_parent_nodes+0x69e/0xff0 [btrfs]
>>>> btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
>>>> btrfs_find_all_roots+0x50/0x70 [btrfs]
>>>> btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
>>>> btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
>>>>
>>>> The percpu_counter_init call in btrfs_alloc_subvolume_writers
>>>> uses GFP_KERNEL, which we can't do during transaction commit.
>>>>
>>>> This switches it to GFP_NOFS.
>>>
>>>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>>>> ---
>>>> fs/btrfs/disk-io.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>> index 21f34ad0d411..eb6bb3169a9e 100644
>>>> --- a/fs/btrfs/disk-io.c
>>>> +++ b/fs/btrfs/disk-io.c
>>>> @@ -1108,7 +1108,7 @@ static struct btrfs_subvolume_writers *btrfs_alloc_subvolume_writers(void)
>>>> if (!writers)
>>>> return ERR_PTR(-ENOMEM);
>>>>
>>>> - ret = percpu_counter_init(&writers->counter, 0, GFP_KERNEL);
>>>> + ret = percpu_counter_init(&writers->counter, 0, GFP_NOFS);
>>>
>>> A line above the diff context is another allocation that does GFP_NOFS,
>>> so one of the gfp flags was wrong.
>>>
>>> Looks like there's another instance where percpu allocates with
>>> GFP_KERNEL: create_space_info that can be called from the path that
>>> allocates chunks, so this also looks like a NOFS candidate.
>>
>> We can get rid of this case entirely. Those call sites should be
>> removed since the space_infos are all allocated at mount time.
>
> That would be great and make a few things simpler. So this means that
> __find_space_info never fails once the space infos are properly
> initialized, right? That was my concern in do_chunk_alloc and
> btrfs_make_block_group (that's called from __btrfs_alloc_chunk).
That's a different case. The raid levels are added when the first block
group of a particular raid level is loaded. That can happen either when
the block groups are initially read in, where it should be safe to use
GFP_KERNEL, or when a chunk of a new type is allocated. The thing is
that a chunk of a new type will only be allocated when we're converting
via balance, so we may be able to do the kobject_add for the raid level
when we start the balance rather than waiting for it to create the block
group.
-Jeff
--
Jeff Mahoney
SUSE Labs
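The suggestion to do the per-raid-level kobject_add when balance starts can be sketched as a registration that is idempotent and is triggered early for the conversion target. This is a hypothetical userspace model: the enum of profiles, the registered[] table and the in-transaction flag are assumptions for illustration, and the assert in model_register_raid_kobj() stands for "this would be an allocation in an unsafe context".

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative RAID profile indices; btrfs has its own enum for these. */
enum model_raid { MODEL_SINGLE, MODEL_DUP, MODEL_RAID0, MODEL_RAID1, MODEL_NR_RAID };

/* Which profiles already have a sysfs entry. */
static bool model_raid_registered[MODEL_NR_RAID];
static int model_registrations; /* counts simulated kobject_add calls */
static bool model_in_transaction;

/* Stand-in for adding the per-profile kobject; it may allocate with
 * GFP_KERNEL, so it must not actually do work inside a transaction. */
static void model_register_raid_kobj(enum model_raid r)
{
	if (model_raid_registered[r])
		return;
	assert(!model_in_transaction);
	model_raid_registered[r] = true;
	model_registrations++;
}

/* Mount: register every profile found among the existing block groups. */
static void model_load_block_groups(const enum model_raid *found, int n)
{
	for (int i = 0; i < n; i++)
		model_register_raid_kobj(found[i]);
}

/* Proposed: when a convert balance starts (an unconstrained context),
 * register the target profile up front ... */
static void model_start_balance_convert(enum model_raid target)
{
	model_register_raid_kobj(target);
}

/* ... so creating the first block group of that profile inside a
 * transaction finds the entry already present and never allocates. */
static void model_make_block_group(enum model_raid r)
{
	model_in_transaction = true;
	model_register_raid_kobj(r);
	model_in_transaction = false;
}
```

Because registration is idempotent, the block-group path can keep its call as a fallback; with the early registration in place it becomes a no-op during the transaction.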
end of thread, other threads:[~2018-03-19 21:15 UTC | newest]
Thread overview: 9+ messages
2018-03-16 18:36 [PATCH] btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers jeffm
2018-03-16 18:45 ` Liu Bo
2018-03-16 18:48 ` Nikolay Borisov
2018-03-16 18:54 ` Jeff Mahoney
2018-03-16 20:12 ` David Sterba
2018-03-16 21:21 ` Jeff Mahoney
2018-03-19 17:52 ` Jeff Mahoney
2018-03-19 18:08 ` David Sterba
2018-03-19 21:15 ` Jeff Mahoney