All of lore.kernel.org
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: fdmanana@kernel.org
Cc: linux-btrfs@vger.kernel.org, josef@toxicpanda.com,
	Filipe Manana <fdmanana@suse.com>
Subject: Re: [PATCH v2] Btrfs: fix deadlock on tree root leaf when finding free extent
Date: Mon, 22 Oct 2018 14:55:57 -0400	[thread overview]
Message-ID: <20181022185556.2bfuoijklvsdyzx6@destiny> (raw)
In-Reply-To: <20181022184830.415-1-fdmanana@kernel.org>

On Mon, Oct 22, 2018 at 07:48:30PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When we are writing out a free space cache, during the transaction commit
> phase, we can end up in a deadlock which results in a stack trace like the
> following:
> 
>  schedule+0x28/0x80
>  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
>  ? finish_wait+0x80/0x80
>  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
>  btrfs_search_slot+0xf6/0x9f0 [btrfs]
>  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
>  ? inode_insert5+0x119/0x190
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_iget+0x113/0x690 [btrfs]
>  __lookup_free_space_inode+0xd8/0x150 [btrfs]
>  lookup_free_space_inode+0x5b/0xb0 [btrfs]
>  load_free_space_cache+0x7c/0x170 [btrfs]
>  ? cache_block_group+0x72/0x3b0 [btrfs]
>  cache_block_group+0x1b3/0x3b0 [btrfs]
>  ? finish_wait+0x80/0x80
>  find_free_extent+0x799/0x1010 [btrfs]
>  btrfs_reserve_extent+0x9b/0x180 [btrfs]
>  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
>  __btrfs_cow_block+0x11d/0x500 [btrfs]
>  btrfs_cow_block+0xdc/0x180 [btrfs]
>  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_update_inode_item+0x46/0x100 [btrfs]
>  cache_save_setup+0xe4/0x3a0 [btrfs]
>  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
>  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> 
> At cache_save_setup() we need to update the inode item of a block group's
> cache which is located in the tree root (fs_info->tree_root), which means
> that it may result in COWing a leaf from that tree. If that happens we
> need to find a free metadata extent and while looking for one, if we find
> a block group which was not cached yet we attempt to load its cache by
> calling cache_block_group(). However this function will try to load the
> inode of the free space cache, which requires finding the matching inode
> item in the tree root - if that inode item is located in the same leaf as
> the inode item of the space cache we are updating at cache_save_setup(),
> we end up in a deadlock, since we try to obtain a read lock on the same
> extent buffer that we previously write locked.
> 
> So fix this by skipping the loading of free space caches of any block
> groups that are not yet cached (rare cases) if we are COWing an extent
> buffer from the root tree and space caching is enabled (-o space_cache
> mount option). This is a rare case and its downside is failure to
> find a free extent (return -ENOSPC) when all the already cached block
> groups have no free extents.
> 
> Reported-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/
> Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> Tested-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
> 
> V2: Made the solution more generic, since the problem could happen in any
>     path COWing an extent buffer from the root tree.
> 
>     Applies on top of a previous patch titled:
> 
>      "Btrfs: fix deadlock when writing out free space caches"
> 
>  fs/btrfs/ctree.c       |  4 ++++
>  fs/btrfs/ctree.h       |  3 +++
>  fs/btrfs/disk-io.c     |  2 ++
>  fs/btrfs/extent-tree.c | 15 ++++++++++++++-
>  4 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 089b46c4d97f..646aafda55a3 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1065,10 +1065,14 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
>  	    root == fs_info->chunk_root ||
>  	    root == fs_info->dev_root)
>  		trans->can_flush_pending_bgs = false;
> +	else if (root == fs_info->tree_root)
> +		atomic_inc(&fs_info->tree_root_cows);
>  
>  	cow = btrfs_alloc_tree_block(trans, root, parent_start,
>  			root->root_key.objectid, &disk_key, level,
>  			search_start, empty_size);
> +	if (root == fs_info->tree_root)
> +		atomic_dec(&fs_info->tree_root_cows);

Do we need this though?  Our root should be the root we're cow'ing the block
for, and it should be passed all the way down to find_free_extent properly, so
we really should be able to just do if (root == fs_info->tree_root) and not add
all this stuff.

Not to mention this will race with anybody else doing stuff, so if another
thread that isn't actually touching the tree_root it would skip caching a block
group when it's completely ok for that thread to do it.  Thanks,

Josef

  reply	other threads:[~2018-10-22 18:56 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-22  9:09 [PATCH] Btrfs: fix deadlock on tree root leaf when finding free extent fdmanana
2018-10-22  9:53 ` Filipe Manana
2018-10-22  9:54 ` Filipe Manana
2018-10-22 18:07 ` Josef Bacik
2018-10-22 18:51   ` Filipe Manana
2018-10-22 18:48 ` [PATCH v2] " fdmanana
2018-10-22 18:55   ` Josef Bacik [this message]
2018-10-22 19:10     ` Filipe Manana
2018-10-22 19:10 ` [PATCH v3] " fdmanana
2018-10-22 19:18   ` Josef Bacik
2018-10-22 22:05     ` Filipe Manana
2018-10-24  4:08       ` Josef Bacik
2018-10-24  9:07         ` Filipe Manana
2018-10-24  9:13 ` [PATCH v4] " fdmanana
2018-10-24 11:37   ` Josef Bacik
2018-10-24 11:53     ` Filipe Manana
2018-10-24 12:40       ` Josef Bacik
2018-10-24 12:48         ` Filipe Manana
2018-10-24 23:58           ` Josef Bacik
2018-11-05 16:28           ` David Sterba
2018-11-05 16:30             ` Filipe Manana
2018-11-05 16:34               ` David Sterba
2018-11-05 16:36                 ` Filipe Manana
2018-11-13 14:33                   ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181022185556.2bfuoijklvsdyzx6@destiny \
    --to=josef@toxicpanda.com \
    --cc=fdmanana@kernel.org \
    --cc=fdmanana@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.