All of lore.kernel.org
 help / color / mirror / Atom feed
From: Filipe Manana <fdmanana@kernel.org>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>,
	Filipe David Borba Manana <fdmanana@suse.com>
Subject: Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent
Date: Wed, 24 Oct 2018 13:48:40 +0100	[thread overview]
Message-ID: <CAL3q7H7dZm8s2Ey185Xw=bsjG6fZBeEWjQu+He2kQ5JYMPCHWg@mail.gmail.com> (raw)
In-Reply-To: <20181024123959.icp7ssu5sjhqg6c3@MacBook-Pro-91.local>

On Wed, Oct 24, 2018 at 1:40 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On Wed, Oct 24, 2018 at 12:53:59PM +0100, Filipe Manana wrote:
> > On Wed, Oct 24, 2018 at 12:37 PM Josef Bacik <josef@toxicpanda.com> wrote:
> > >
> > > On Wed, Oct 24, 2018 at 10:13:03AM +0100, fdmanana@kernel.org wrote:
> > > > From: Filipe Manana <fdmanana@suse.com>
> > > >
> > > > When we are writing out a free space cache, during the transaction commit
> > > > phase, we can end up in a deadlock which results in a stack trace like the
> > > > following:
> > > >
> > > >  schedule+0x28/0x80
> > > >  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
> > > >  ? finish_wait+0x80/0x80
> > > >  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> > > >  btrfs_search_slot+0xf6/0x9f0 [btrfs]
> > > >  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
> > > >  ? inode_insert5+0x119/0x190
> > > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > > >  ? kmem_cache_alloc+0x166/0x1d0
> > > >  btrfs_iget+0x113/0x690 [btrfs]
> > > >  __lookup_free_space_inode+0xd8/0x150 [btrfs]
> > > >  lookup_free_space_inode+0x5b/0xb0 [btrfs]
> > > >  load_free_space_cache+0x7c/0x170 [btrfs]
> > > >  ? cache_block_group+0x72/0x3b0 [btrfs]
> > > >  cache_block_group+0x1b3/0x3b0 [btrfs]
> > > >  ? finish_wait+0x80/0x80
> > > >  find_free_extent+0x799/0x1010 [btrfs]
> > > >  btrfs_reserve_extent+0x9b/0x180 [btrfs]
> > > >  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
> > > >  __btrfs_cow_block+0x11d/0x500 [btrfs]
> > > >  btrfs_cow_block+0xdc/0x180 [btrfs]
> > > >  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
> > > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > > >  ? kmem_cache_alloc+0x166/0x1d0
> > > >  btrfs_update_inode_item+0x46/0x100 [btrfs]
> > > >  cache_save_setup+0xe4/0x3a0 [btrfs]
> > > >  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
> > > >  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> > > >
> > > > At cache_save_setup() we need to update the inode item of a block group's
> > > > cache which is located in the tree root (fs_info->tree_root), which means
> > > > that it may result in COWing a leaf from that tree. If that happens we
> > > > need to find a free metadata extent and while looking for one, if we find
> > > > a block group which was not cached yet we attempt to load its cache by
> > > > calling cache_block_group(). However this function will try to load the
> > > > inode of the free space cache, which requires finding the matching inode
> > > > item in the tree root - if that inode item is located in the same leaf as
> > > > the inode item of the space cache we are updating at cache_save_setup(),
> > > > we end up in a deadlock, since we try to obtain a read lock on the same
> > > > extent buffer that we previously write locked.
> > > >
> > > > So fix this by using the tree root's commit root when searching for a
> > > > block group's free space cache inode item when we are attempting to load
> > > > a free space cache. This is safe since block groups once loaded stay in
> > > > memory forever, as well as their caches, so after they are first loaded
> > > > we will never need to read their inode items again. For new block groups,
> > > > once they are created they get their ->cached field set to
> > > > BTRFS_CACHE_FINISHED meaning we will not need to read their inode item.
> > > >
> > > > Reported-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> > > > Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/
> > > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> > > > Tested-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> > > > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > > > ---
> > > >
> > >
> > > Now my goal is to see how many times I can get you to redo this thing.
> > >
> > > Why not instead just do
> > >
> > > if (btrfs_is_free_space_inode(inode))
> > >   path->search_commit_root = 1;
> > >
> > > in read_locked_inode?  That would be cleaner.  If we don't want to do that for
> > > the inode cache (I'm not sure if it's ok or not) we could just do
> > >
> > > if (root == fs_info->tree_root)
> >
> > We can't (not just that at least).
> > Tried something like that, but we get into a BUG_ON when writing out
> > the space cache for new block groups (created in the current
> > transaction).
> > Because at cache_save_setup() we have this:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/extent-tree.c?h=v4.19#n3342
> >
> > Lookup for the inode in normal root, doesn't exist, create it then
> > repeat - if still not found, BUG_ON.
> > Could also make create_free_space_inode() return an inode pointer and
> > make it call btrfs_iget().
> >
>
> Ah ok makes sense.  Well in that case lets just make btrfs_read_locked_inode()
> take a path, and allocate it in btrfs_iget, that'll remove the ugly
>
> if (path != in_path)

You mean the following on top of v4:

https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg

Not much different, just saves one such if statement. I'm ok with that.

>
> stuff.  Thanks,
>
> Josef

  reply	other threads:[~2018-10-24 12:48 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-22  9:09 [PATCH] Btrfs: fix deadlock on tree root leaf when finding free extent fdmanana
2018-10-22  9:53 ` Filipe Manana
2018-10-22  9:54 ` Filipe Manana
2018-10-22 18:07 ` Josef Bacik
2018-10-22 18:51   ` Filipe Manana
2018-10-22 18:48 ` [PATCH v2] " fdmanana
2018-10-22 18:55   ` Josef Bacik
2018-10-22 19:10     ` Filipe Manana
2018-10-22 19:10 ` [PATCH v3] " fdmanana
2018-10-22 19:18   ` Josef Bacik
2018-10-22 22:05     ` Filipe Manana
2018-10-24  4:08       ` Josef Bacik
2018-10-24  9:07         ` Filipe Manana
2018-10-24  9:13 ` [PATCH v4] " fdmanana
2018-10-24 11:37   ` Josef Bacik
2018-10-24 11:53     ` Filipe Manana
2018-10-24 12:40       ` Josef Bacik
2018-10-24 12:48         ` Filipe Manana [this message]
2018-10-24 23:58           ` Josef Bacik
2018-11-05 16:28           ` David Sterba
2018-11-05 16:30             ` Filipe Manana
2018-11-05 16:34               ` David Sterba
2018-11-05 16:36                 ` Filipe Manana
2018-11-13 14:33                   ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAL3q7H7dZm8s2Ey185Xw=bsjG6fZBeEWjQu+He2kQ5JYMPCHWg@mail.gmail.com' \
    --to=fdmanana@kernel.org \
    --cc=fdmanana@suse.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.