* [PATCH] btrfs: nofs inode allocations
@ 2019-09-09 14:12 Josef Bacik
2019-09-09 14:28 ` Nikolay Borisov
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Josef Bacik @ 2019-09-09 14:12 UTC (permalink / raw)
To: kernel-team, linux-btrfs; +Cc: Zdenek Sojka
A user reported a lockdep splat
======================================================
WARNING: possible circular locking dependency detected
5.2.11-gentoo #2 Not tainted
------------------------------------------------------
kswapd0/711 is trying to acquire lock:
000000007777a663 (sb_internal){.+.+}, at: start_transaction+0x3a8/0x500
but task is already holding lock:
000000000ba86300 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x0/0x30
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (fs_reclaim){+.+.}:
kmem_cache_alloc+0x1f/0x1c0
btrfs_alloc_inode+0x1f/0x260
alloc_inode+0x16/0xa0
new_inode+0xe/0xb0
btrfs_new_inode+0x70/0x610
btrfs_symlink+0xd0/0x420
vfs_symlink+0x9c/0x100
do_symlinkat+0x66/0xe0
do_syscall_64+0x55/0x1c0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
-> #0 (sb_internal){.+.+}:
__sb_start_write+0xf6/0x150
start_transaction+0x3a8/0x500
btrfs_commit_inode_delayed_inode+0x59/0x110
btrfs_evict_inode+0x19e/0x4c0
evict+0xbc/0x1f0
inode_lru_isolate+0x113/0x190
__list_lru_walk_one.isra.4+0x5c/0x100
list_lru_walk_one+0x32/0x50
prune_icache_sb+0x36/0x80
super_cache_scan+0x14a/0x1d0
do_shrink_slab+0x131/0x320
shrink_node+0xf7/0x380
balance_pgdat+0x2d5/0x640
kswapd+0x2ba/0x5e0
kthread+0x147/0x160
ret_from_fork+0x24/0x30
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(fs_reclaim);
lock(sb_internal);
lock(fs_reclaim);
lock(sb_internal);
*** DEADLOCK ***
3 locks held by kswapd0/711:
#0: 000000000ba86300 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x0/0x30
#1: 000000004a5100f8 (shrinker_rwsem){++++}, at: shrink_node+0x9a/0x380
#2: 00000000f956fa46 (&type->s_umount_key#30){++++}, at: super_cache_scan+0x35/0x1d0
stack backtrace:
CPU: 7 PID: 711 Comm: kswapd0 Not tainted 5.2.11-gentoo #2
Hardware name: Dell Inc. Precision Tower 3620/0MWYPT, BIOS 2.4.2 09/29/2017
Call Trace:
dump_stack+0x85/0xc7
print_circular_bug.cold.40+0x1d9/0x235
__lock_acquire+0x18b1/0x1f00
lock_acquire+0xa6/0x170
? start_transaction+0x3a8/0x500
__sb_start_write+0xf6/0x150
? start_transaction+0x3a8/0x500
start_transaction+0x3a8/0x500
btrfs_commit_inode_delayed_inode+0x59/0x110
btrfs_evict_inode+0x19e/0x4c0
? var_wake_function+0x20/0x20
evict+0xbc/0x1f0
inode_lru_isolate+0x113/0x190
? discard_new_inode+0xc0/0xc0
__list_lru_walk_one.isra.4+0x5c/0x100
? discard_new_inode+0xc0/0xc0
list_lru_walk_one+0x32/0x50
prune_icache_sb+0x36/0x80
super_cache_scan+0x14a/0x1d0
do_shrink_slab+0x131/0x320
shrink_node+0xf7/0x380
balance_pgdat+0x2d5/0x640
kswapd+0x2ba/0x5e0
? __wake_up_common_lock+0x90/0x90
kthread+0x147/0x160
? balance_pgdat+0x640/0x640
? __kthread_create_on_node+0x160/0x160
ret_from_fork+0x24/0x30
This is because btrfs_new_inode() calls new_inode() under the
transaction. We could probably move the new_inode() outside of this but
for now just wrap it in memalloc_nofs_save().
Reported-by: Zdenek Sojka <zsojka@seznam.cz>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
fs/btrfs/inode.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index aece5dd0e7a8..bf40d1085e4e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6291,13 +6291,16 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans,
u32 sizes[2];
int nitems = name ? 2 : 1;
unsigned long ptr;
+ unsigned long nofs_flag;
int ret;
path = btrfs_alloc_path();
if (!path)
return ERR_PTR(-ENOMEM);
+ nofs_flag = memalloc_nofs_save();
inode = new_inode(fs_info->sb);
+ memalloc_nofs_restore(nofs_flag);
if (!inode) {
btrfs_free_path(path);
return ERR_PTR(-ENOMEM);
--
2.21.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] btrfs: nofs inode allocations
2019-09-09 14:12 [PATCH] btrfs: nofs inode allocations Josef Bacik
@ 2019-09-09 14:28 ` Nikolay Borisov
2019-10-01 18:06 ` David Sterba
2019-09-11 9:37 ` Filipe Manana
2019-10-01 18:11 ` David Sterba
2 siblings, 1 reply; 5+ messages in thread
From: Nikolay Borisov @ 2019-09-09 14:28 UTC (permalink / raw)
To: Josef Bacik, kernel-team, linux-btrfs; +Cc: Zdenek Sojka
On 9.09.19 г. 17:12 ч., Josef Bacik wrote:
> A user reported a lockdep splat
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.2.11-gentoo #2 Not tainted
> ------------------------------------------------------
> kswapd0/711 is trying to acquire lock:
> 000000007777a663 (sb_internal){.+.+}, at: start_transaction+0x3a8/0x500
>
> but task is already holding lock:
> 000000000ba86300 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x0/0x30
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (fs_reclaim){+.+.}:
> kmem_cache_alloc+0x1f/0x1c0
> btrfs_alloc_inode+0x1f/0x260
> alloc_inode+0x16/0xa0
> new_inode+0xe/0xb0
> btrfs_new_inode+0x70/0x610
> btrfs_symlink+0xd0/0x420
> vfs_symlink+0x9c/0x100
> do_symlinkat+0x66/0xe0
> do_syscall_64+0x55/0x1c0
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> -> #0 (sb_internal){.+.+}:
> __sb_start_write+0xf6/0x150
> start_transaction+0x3a8/0x500
> btrfs_commit_inode_delayed_inode+0x59/0x110
> btrfs_evict_inode+0x19e/0x4c0
> evict+0xbc/0x1f0
> inode_lru_isolate+0x113/0x190
> __list_lru_walk_one.isra.4+0x5c/0x100
> list_lru_walk_one+0x32/0x50
> prune_icache_sb+0x36/0x80
> super_cache_scan+0x14a/0x1d0
> do_shrink_slab+0x131/0x320
> shrink_node+0xf7/0x380
> balance_pgdat+0x2d5/0x640
> kswapd+0x2ba/0x5e0
> kthread+0x147/0x160
> ret_from_fork+0x24/0x30
>
> other info that might help us debug this:
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(fs_reclaim);
> lock(sb_internal);
> lock(fs_reclaim);
> lock(sb_internal);
> *** DEADLOCK ***
>
> 3 locks held by kswapd0/711:
> #0: 000000000ba86300 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x0/0x30
> #1: 000000004a5100f8 (shrinker_rwsem){++++}, at: shrink_node+0x9a/0x380
> #2: 00000000f956fa46 (&type->s_umount_key#30){++++}, at: super_cache_scan+0x35/0x1d0
>
> stack backtrace:
> CPU: 7 PID: 711 Comm: kswapd0 Not tainted 5.2.11-gentoo #2
> Hardware name: Dell Inc. Precision Tower 3620/0MWYPT, BIOS 2.4.2 09/29/2017
> Call Trace:
> dump_stack+0x85/0xc7
> print_circular_bug.cold.40+0x1d9/0x235
> __lock_acquire+0x18b1/0x1f00
> lock_acquire+0xa6/0x170
> ? start_transaction+0x3a8/0x500
> __sb_start_write+0xf6/0x150
> ? start_transaction+0x3a8/0x500
> start_transaction+0x3a8/0x500
> btrfs_commit_inode_delayed_inode+0x59/0x110
> btrfs_evict_inode+0x19e/0x4c0
> ? var_wake_function+0x20/0x20
> evict+0xbc/0x1f0
> inode_lru_isolate+0x113/0x190
> ? discard_new_inode+0xc0/0xc0
> __list_lru_walk_one.isra.4+0x5c/0x100
> ? discard_new_inode+0xc0/0xc0
> list_lru_walk_one+0x32/0x50
> prune_icache_sb+0x36/0x80
> super_cache_scan+0x14a/0x1d0
> do_shrink_slab+0x131/0x320
> shrink_node+0xf7/0x380
> balance_pgdat+0x2d5/0x640
> kswapd+0x2ba/0x5e0
> ? __wake_up_common_lock+0x90/0x90
> kthread+0x147/0x160
> ? balance_pgdat+0x640/0x640
> ? __kthread_create_on_node+0x160/0x160
> ret_from_fork+0x24/0x30
>
> This is because btrfs_new_inode() calls new_inode() under the
> transaction. We could probably move the new_inode() outside of this but
> for now just wrap it in memalloc_nofs_save().
If I'm understanding correctly what happens here is that symlinking
wants to instantiate a new inode
(new_inode->btrfs_alloc_inode->kmem_cache_alloc with GFP_KERNEL) but it
triggers background reclaim, hence goes to sleep, while holding a
transaction. At the same time background reclaim joins the same
transaction which is now blocked by the symlinking thread, hence it's
prone to deadlock ?
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] btrfs: nofs inode allocations
2019-09-09 14:12 [PATCH] btrfs: nofs inode allocations Josef Bacik
2019-09-09 14:28 ` Nikolay Borisov
@ 2019-09-11 9:37 ` Filipe Manana
2019-10-01 18:11 ` David Sterba
2 siblings, 0 replies; 5+ messages in thread
From: Filipe Manana @ 2019-09-11 9:37 UTC (permalink / raw)
To: Josef Bacik; +Cc: kernel-team, linux-btrfs, Zdenek Sojka
On Tue, Sep 10, 2019 at 7:22 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> A user reported a lockdep splat
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.2.11-gentoo #2 Not tainted
> ------------------------------------------------------
> kswapd0/711 is trying to acquire lock:
> 000000007777a663 (sb_internal){.+.+}, at: start_transaction+0x3a8/0x500
>
> but task is already holding lock:
> 000000000ba86300 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x0/0x30
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (fs_reclaim){+.+.}:
> kmem_cache_alloc+0x1f/0x1c0
> btrfs_alloc_inode+0x1f/0x260
> alloc_inode+0x16/0xa0
> new_inode+0xe/0xb0
> btrfs_new_inode+0x70/0x610
> btrfs_symlink+0xd0/0x420
> vfs_symlink+0x9c/0x100
> do_symlinkat+0x66/0xe0
> do_syscall_64+0x55/0x1c0
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> -> #0 (sb_internal){.+.+}:
> __sb_start_write+0xf6/0x150
> start_transaction+0x3a8/0x500
> btrfs_commit_inode_delayed_inode+0x59/0x110
> btrfs_evict_inode+0x19e/0x4c0
> evict+0xbc/0x1f0
> inode_lru_isolate+0x113/0x190
> __list_lru_walk_one.isra.4+0x5c/0x100
> list_lru_walk_one+0x32/0x50
> prune_icache_sb+0x36/0x80
> super_cache_scan+0x14a/0x1d0
> do_shrink_slab+0x131/0x320
> shrink_node+0xf7/0x380
> balance_pgdat+0x2d5/0x640
> kswapd+0x2ba/0x5e0
> kthread+0x147/0x160
> ret_from_fork+0x24/0x30
>
> other info that might help us debug this:
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(fs_reclaim);
> lock(sb_internal);
> lock(fs_reclaim);
> lock(sb_internal);
> *** DEADLOCK ***
>
> 3 locks held by kswapd0/711:
> #0: 000000000ba86300 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x0/0x30
> #1: 000000004a5100f8 (shrinker_rwsem){++++}, at: shrink_node+0x9a/0x380
> #2: 00000000f956fa46 (&type->s_umount_key#30){++++}, at: super_cache_scan+0x35/0x1d0
>
> stack backtrace:
> CPU: 7 PID: 711 Comm: kswapd0 Not tainted 5.2.11-gentoo #2
> Hardware name: Dell Inc. Precision Tower 3620/0MWYPT, BIOS 2.4.2 09/29/2017
> Call Trace:
> dump_stack+0x85/0xc7
> print_circular_bug.cold.40+0x1d9/0x235
> __lock_acquire+0x18b1/0x1f00
> lock_acquire+0xa6/0x170
> ? start_transaction+0x3a8/0x500
> __sb_start_write+0xf6/0x150
> ? start_transaction+0x3a8/0x500
> start_transaction+0x3a8/0x500
> btrfs_commit_inode_delayed_inode+0x59/0x110
> btrfs_evict_inode+0x19e/0x4c0
> ? var_wake_function+0x20/0x20
> evict+0xbc/0x1f0
> inode_lru_isolate+0x113/0x190
> ? discard_new_inode+0xc0/0xc0
> __list_lru_walk_one.isra.4+0x5c/0x100
> ? discard_new_inode+0xc0/0xc0
> list_lru_walk_one+0x32/0x50
> prune_icache_sb+0x36/0x80
> super_cache_scan+0x14a/0x1d0
> do_shrink_slab+0x131/0x320
> shrink_node+0xf7/0x380
> balance_pgdat+0x2d5/0x640
> kswapd+0x2ba/0x5e0
> ? __wake_up_common_lock+0x90/0x90
> kthread+0x147/0x160
> ? balance_pgdat+0x640/0x640
> ? __kthread_create_on_node+0x160/0x160
> ret_from_fork+0x24/0x30
>
> This is because btrfs_new_inode() calls new_inode() under the
> transaction. We could probably move the new_inode() outside of this but
> for now just wrap it in memalloc_nofs_save().
>
> Reported-by: Zdenek Sojka <zsojka@seznam.cz>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
To aid the stable team figure out to where to backport things, adding
a Fixes tag (Fixes: 712e36c5f2a7fa ("btrfs: use GFP_KERNEL in
btrfs_alloc_inode")) or a " CC: stable@vger.kernel.org # 4.16+" would
help.
> ---
> fs/btrfs/inode.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index aece5dd0e7a8..bf40d1085e4e 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6291,13 +6291,16 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans,
> u32 sizes[2];
> int nitems = name ? 2 : 1;
> unsigned long ptr;
> + unsigned long nofs_flag;
Should be unsigned int (it's what memalloc_nofs_save() returns exactly).
Anyway, the change itself looks good to me.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
> int ret;
>
> path = btrfs_alloc_path();
> if (!path)
> return ERR_PTR(-ENOMEM);
>
> + nofs_flag = memalloc_nofs_save();
> inode = new_inode(fs_info->sb);
> + memalloc_nofs_restore(nofs_flag);
> if (!inode) {
> btrfs_free_path(path);
> return ERR_PTR(-ENOMEM);
> --
> 2.21.0
>
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] btrfs: nofs inode allocations
2019-09-09 14:28 ` Nikolay Borisov
@ 2019-10-01 18:06 ` David Sterba
0 siblings, 0 replies; 5+ messages in thread
From: David Sterba @ 2019-10-01 18:06 UTC (permalink / raw)
To: Nikolay Borisov; +Cc: Josef Bacik, kernel-team, linux-btrfs, Zdenek Sojka
On Mon, Sep 09, 2019 at 05:28:11PM +0300, Nikolay Borisov wrote:
> > This is because btrfs_new_inode() calls new_inode() under the
> > transaction. We could probably move the new_inode() outside of this but
> > for now just wrap it in memalloc_nofs_save().
>
> If I'm understanding correctly what happens here is that symlinking
> wants to instantiate a new inode
> (new_inode->btrfs_alloc_inode->kmem_cache_alloc with GFP_KERNEL) but it
> triggers background reclaim, hence goes to sleep, while holding a
> transaction. At the same time background reclaim joins the same
> transaction which is now blocked by the symlinking thread, hence it's
> prone to deadlock ?
Yes, that's the pattern, GFP_KERNEL inside
start_transaction/commit_transaction. The more generic fix is in the works.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] btrfs: nofs inode allocations
2019-09-09 14:12 [PATCH] btrfs: nofs inode allocations Josef Bacik
2019-09-09 14:28 ` Nikolay Borisov
2019-09-11 9:37 ` Filipe Manana
@ 2019-10-01 18:11 ` David Sterba
2 siblings, 0 replies; 5+ messages in thread
From: David Sterba @ 2019-10-01 18:11 UTC (permalink / raw)
To: Josef Bacik; +Cc: kernel-team, linux-btrfs, Zdenek Sojka
On Mon, Sep 09, 2019 at 10:12:04AM -0400, Josef Bacik wrote:
> A user reported a lockdep splat
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.2.11-gentoo #2 Not tainted
> ------------------------------------------------------
> kswapd0/711 is trying to acquire lock:
> 000000007777a663 (sb_internal){.+.+}, at: start_transaction+0x3a8/0x500
>
> but task is already holding lock:
> 000000000ba86300 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x0/0x30
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (fs_reclaim){+.+.}:
> kmem_cache_alloc+0x1f/0x1c0
> btrfs_alloc_inode+0x1f/0x260
> alloc_inode+0x16/0xa0
> new_inode+0xe/0xb0
> btrfs_new_inode+0x70/0x610
> btrfs_symlink+0xd0/0x420
> vfs_symlink+0x9c/0x100
> do_symlinkat+0x66/0xe0
> do_syscall_64+0x55/0x1c0
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> -> #0 (sb_internal){.+.+}:
> __sb_start_write+0xf6/0x150
> start_transaction+0x3a8/0x500
> btrfs_commit_inode_delayed_inode+0x59/0x110
> btrfs_evict_inode+0x19e/0x4c0
> evict+0xbc/0x1f0
> inode_lru_isolate+0x113/0x190
> __list_lru_walk_one.isra.4+0x5c/0x100
> list_lru_walk_one+0x32/0x50
> prune_icache_sb+0x36/0x80
> super_cache_scan+0x14a/0x1d0
> do_shrink_slab+0x131/0x320
> shrink_node+0xf7/0x380
> balance_pgdat+0x2d5/0x640
> kswapd+0x2ba/0x5e0
> kthread+0x147/0x160
> ret_from_fork+0x24/0x30
>
> other info that might help us debug this:
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(fs_reclaim);
> lock(sb_internal);
> lock(fs_reclaim);
> lock(sb_internal);
> *** DEADLOCK ***
>
> 3 locks held by kswapd0/711:
> #0: 000000000ba86300 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x0/0x30
> #1: 000000004a5100f8 (shrinker_rwsem){++++}, at: shrink_node+0x9a/0x380
> #2: 00000000f956fa46 (&type->s_umount_key#30){++++}, at: super_cache_scan+0x35/0x1d0
>
> stack backtrace:
> CPU: 7 PID: 711 Comm: kswapd0 Not tainted 5.2.11-gentoo #2
> Hardware name: Dell Inc. Precision Tower 3620/0MWYPT, BIOS 2.4.2 09/29/2017
> Call Trace:
> dump_stack+0x85/0xc7
> print_circular_bug.cold.40+0x1d9/0x235
> __lock_acquire+0x18b1/0x1f00
> lock_acquire+0xa6/0x170
> ? start_transaction+0x3a8/0x500
> __sb_start_write+0xf6/0x150
> ? start_transaction+0x3a8/0x500
> start_transaction+0x3a8/0x500
> btrfs_commit_inode_delayed_inode+0x59/0x110
> btrfs_evict_inode+0x19e/0x4c0
> ? var_wake_function+0x20/0x20
> evict+0xbc/0x1f0
> inode_lru_isolate+0x113/0x190
> ? discard_new_inode+0xc0/0xc0
> __list_lru_walk_one.isra.4+0x5c/0x100
> ? discard_new_inode+0xc0/0xc0
> list_lru_walk_one+0x32/0x50
> prune_icache_sb+0x36/0x80
> super_cache_scan+0x14a/0x1d0
> do_shrink_slab+0x131/0x320
> shrink_node+0xf7/0x380
> balance_pgdat+0x2d5/0x640
> kswapd+0x2ba/0x5e0
> ? __wake_up_common_lock+0x90/0x90
> kthread+0x147/0x160
> ? balance_pgdat+0x640/0x640
> ? __kthread_create_on_node+0x160/0x160
> ret_from_fork+0x24/0x30
>
> This is because btrfs_new_inode() calls new_inode() under the
> transaction. We could probably move the new_inode() outside of this but
> for now just wrap it in memalloc_nofs_save().
>
> Reported-by: Zdenek Sojka <zsojka@seznam.cz>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Added to 5.4 queue, with the type fixed and updated subject.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2019-10-01 18:11 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-09 14:12 [PATCH] btrfs: nofs inode allocations Josef Bacik
2019-09-09 14:28 ` Nikolay Borisov
2019-10-01 18:06 ` David Sterba
2019-09-11 9:37 ` Filipe Manana
2019-10-01 18:11 ` David Sterba
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.