linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Filipe Manana <fdmanana@gmail.com>
To: Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>,
	Josef Bacik <josef@toxicpanda.com>,
	stable@vger.kernel.org, David Sterba <dsterba@suse.com>,
	Filipe Manana <fdmanana@suse.com>
Subject: Re: [PATCH v2] btrfs: honor path->skip_locking in backref code
Date: Wed, 13 Feb 2019 12:31:07 +0000	[thread overview]
Message-ID: <CAL3q7H78vZiG+miY7Eb+Jh1_CyOYTkVuWzzK6_JW-Wqv19q-Uw@mail.gmail.com> (raw)
In-Reply-To: <20190213062704.25121-1-wqu@suse.com>

On Wed, Feb 13, 2019 at 11:33 AM Qu Wenruo <wqu@suse.com> wrote:
>
> From: Josef Bacik <josef@toxicpanda.com>
>
> Qgroups will do the old roots lookup at delayed ref time, which could be
> while walking down the extent root while running a delayed ref.  This
> should be fine, except we specifically lock eb's in the backref walking
> code irrespective of path->skip_locking, which deadlocks the system.
> Fix up the backref code to honor path->skip_locking, nobody will be
> modifying the commit_root when we're searching so it's completely safe
> to do.
>
> This happens Since fb235dc06fac ("btrfs: qgroup: Move half of the qgroup
> accounting time out of commit trans"), kernel may lockup with quota
> enabled.
>
> There is one backref trace triggered by snapshot dropping along with
> write operation in the source subvolume.  The example can be reliably
> reproduced:
>
>   btrfs-cleaner   D    0  4062      2 0x80000000
>   Call Trace:
>    schedule+0x32/0x90
>    btrfs_tree_read_lock+0x93/0x130 [btrfs]
>    find_parent_nodes+0x29b/0x1170 [btrfs]
>    btrfs_find_all_roots_safe+0xa8/0x120 [btrfs]
>    btrfs_find_all_roots+0x57/0x70 [btrfs]
>    btrfs_qgroup_trace_extent_post+0x37/0x70 [btrfs]
>    btrfs_qgroup_trace_leaf_items+0x10b/0x140 [btrfs]
>    btrfs_qgroup_trace_subtree+0xc8/0xe0 [btrfs]
>    do_walk_down+0x541/0x5e3 [btrfs]
>    walk_down_tree+0xab/0xe7 [btrfs]
>    btrfs_drop_snapshot+0x356/0x71a [btrfs]
>    btrfs_clean_one_deleted_snapshot+0xb8/0xf0 [btrfs]
>    cleaner_kthread+0x12b/0x160 [btrfs]
>    kthread+0x112/0x130
>    ret_from_fork+0x27/0x50
>
> When dropping snapshots with qgroup enabled, we will trigger backref
> walk.
>
> However such backref walk at that timing is pretty dangerous, as if one
> of the parent nodes get WRITE locked by other thread, we could cause a
> dead lock.
>
> For example:
>
>            FS 260     FS 261 (Dropped)
>             node A        node B
>            /      \      /      \
>        node C      node D      node E
>       /   \         /  \        /     \
>   leaf F|leaf G|leaf H|leaf I|leaf J|leaf K
>
> The lock sequence would be:
>
>       Thread A (cleaner)             |       Thread B (other writer)
> -----------------------------------------------------------------------
> write_lock(B)                        |
> write_lock(D)                        |
> ^^^ called by walk_down_tree()       |
>                                      |       write_lock(A)
>                                      |       write_lock(D) << Stall
> read_lock(H) << for backref walk     |
> read_lock(D) << lock owner is        |
>                 the same thread A    |
>                 so read lock is OK   |
> read_lock(A) << Stall                |
>
> So thread A hold write lock D, and needs read lock A to unlock.
> While thread B holds write lock A, while needs lock D to unlock.
>
> This will cause a deadlock.
>
> This is not only limited to snapshot dropping case.
> As the backref walk, even only happens on commit trees, is breaking the
> normal top-down locking order, makes it deadlock prone.
>
> Fixes: fb235dc06fac ("btrfs: qgroup: Move half of the qgroup accounting time out of commit trans")
> CC: stable@vger.kernel.org # 4.19+
> Reported-and-tested-by: David Sterba <dsterba@suse.com>
> Reported-by: Filipe Manana <fdmanana@suse.com>
> Reviewed-by: Qu Wenruo <wqu@suse.com>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> [ copy logs and deadlock analysis from Qu's patch ]
> [ rebase to latest branch and fix lock assert bug ]
> Signed-off-by: David Sterba <dsterba@suse.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good.

> ---
> changelog:
> v2:
> - Rebased to latest misc-next branch.
> - Fix a lock assert by moving btrfs_set_lock_blocking_read() into the
>   same branch of btrfs_tree_read_lock().
> ---
>  fs/btrfs/backref.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
> index 136454dbb4af..6e647520b65a 100644
> --- a/fs/btrfs/backref.c
> +++ b/fs/btrfs/backref.c
> @@ -712,7 +712,7 @@ static int resolve_indirect_refs(struct btrfs_fs_info *fs_info,
>   * read tree blocks and add keys where required.
>   */
>  static int add_missing_keys(struct btrfs_fs_info *fs_info,
> -                           struct preftrees *preftrees)
> +                           struct preftrees *preftrees, bool lock)
>  {
>         struct prelim_ref *ref;
>         struct extent_buffer *eb;
> @@ -737,12 +737,14 @@ static int add_missing_keys(struct btrfs_fs_info *fs_info,
>                         free_extent_buffer(eb);
>                         return -EIO;
>                 }
> -               btrfs_tree_read_lock(eb);
> +               if (lock)
> +                       btrfs_tree_read_lock(eb);
>                 if (btrfs_header_level(eb) == 0)
>                         btrfs_item_key_to_cpu(eb, &ref->key_for_search, 0);
>                 else
>                         btrfs_node_key_to_cpu(eb, &ref->key_for_search, 0);
> -               btrfs_tree_read_unlock(eb);
> +               if (lock)
> +                       btrfs_tree_read_unlock(eb);
>                 free_extent_buffer(eb);
>                 prelim_ref_insert(fs_info, &preftrees->indirect, ref, NULL);
>                 cond_resched();
> @@ -1227,7 +1229,7 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans,
>
>         btrfs_release_path(path);
>
> -       ret = add_missing_keys(fs_info, &preftrees);
> +       ret = add_missing_keys(fs_info, &preftrees, path->skip_locking == 0);
>         if (ret)
>                 goto out;
>
> @@ -1288,11 +1290,14 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans,
>                                         ret = -EIO;
>                                         goto out;
>                                 }
> -                               btrfs_tree_read_lock(eb);
> -                               btrfs_set_lock_blocking_read(eb);
> +                               if (!path->skip_locking) {
> +                                       btrfs_tree_read_lock(eb);
> +                                       btrfs_set_lock_blocking_read(eb);
> +                               }
>                                 ret = find_extent_in_eb(eb, bytenr,
>                                                         *extent_item_pos, &eie, ignore_offset);
> -                               btrfs_tree_read_unlock_blocking(eb);
> +                               if (!path->skip_locking)
> +                                       btrfs_tree_read_unlock_blocking(eb);
>                                 free_extent_buffer(eb);
>                                 if (ret < 0)
>                                         goto out;
> --
> 2.18.0
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

  reply	other threads:[~2019-02-13 12:31 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-13  6:27 [PATCH v2] btrfs: honor path->skip_locking in backref code Qu Wenruo
2019-02-13 12:31 ` Filipe Manana [this message]
2019-02-15 18:35 ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAL3q7H78vZiG+miY7Eb+Jh1_CyOYTkVuWzzK6_JW-Wqv19q-Uw@mail.gmail.com \
    --to=fdmanana@gmail.com \
    --cc=dsterba@suse.com \
    --cc=fdmanana@suse.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).