From: Filipe Manana <fdmanana@gmail.com>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>, kernel-team@fb.com
Subject: Re: [PATCH 2/2] btrfs: save drop_progress if we drop refs at all
Date: Thu, 7 Feb 2019 12:07:22 +0000 [thread overview]
Message-ID: <CAL3q7H5+EDeaG7rhE_FQaCkhzwW0dBZD991Zyn1Y3Gov_OFRyg@mail.gmail.com> (raw)
In-Reply-To: <20190206204615.5862-3-josef@toxicpanda.com>
On Wed, Feb 6, 2019 at 8:47 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> Previously we only updated the drop_progress key if we were in the
> DROP_REFERENCE stage of snapshot deletion. This is because the
> UPDATE_BACKREF stage checks the flags of the blocks it's converting to
> FULL_BACKREF, so if we go over a block we processed before it doesn't
> matter, we just don't do anything.
>
> The problem is in do_walk_down() we will go ahead and drop the roots
> reference to any blocks that we know we won't need to walk into.
>
> Given subvolume A and snapshot B. The root of B points to all of the
> nodes that belong to A, so all of those nodes have a refcnt > 1. If B
> did not modify those blocks it'll hit this condition in do_walk_down
>
> if (!wc->update_ref ||
> generation <= root->root_key.offset)
> goto skip;
>
> and in "goto skip" we simply do a btrfs_free_extent() for that bytenr
> that we point at.
>
> Now assume we modified some data in B, and then took a snapshot of B and
> call it C. C points to all the nodes in B, making every node the root
> of B points to have a refcnt > 1. This assumes the root level is 2 or
> higher.
>
> We delete snapshot B, which does the above work in do_walk_down,
> free'ing our ref for nodes we share with A that we didn't modify. Now
> we hit a node we _did_ modify, thus we own. We need to walk down into
> this node and we set wc->stage == UPDATE_BACKREF. We walk down to level
> 0 which we also own because we modified data. We can't walk any further
> down and thus now need to walk up and start the next part of the
> deletion. Now walk_up_proc is supposed to put us back into
> DROP_REFERENCE, but there's an exception to this
>
> if (level < wc->shared_level)
> goto out;
>
> we are at level == 0, and our shared_level == 1. We skip out of this
> one and go up to level 1. Since path->slots[1] < nritems we
> path->slots[1]++ and break out of walk_up_tree to stop our transaction
> and loop back around. Now in btrfs_drop_snapshot we have this snippet
>
> if (wc->stage == DROP_REFERENCE) {
> level = wc->level;
> btrfs_node_key(path->nodes[level],
> &root_item->drop_progress,
> path->slots[level]);
> root_item->drop_level = level;
> }
>
> our stage == UPDATE_BACKREF still, so we don't update the drop_progress
> key. This is a problem because we would have done btrfs_free_extent()
> for the nodes leading up to our current position. If we crash or
> unmount here and go to remount we'll start over where we were before and
> try to free our ref for blocks we've already freed, and thus abort()
> out.
>
> Fix this by keeping track of the last place we dropped a reference for
> our block in do_walk_down. Then if wc->stage == UPDATE_BACKREF we know
> we'll start over from a place we meant to, and otherwise things continue
> to work as they did before.
>
> I have a complicated reproducer for this problem, without this patch
> we'll fail to fsck the fs when replaying the log writes log. With this
> patch we can replay the whole log without any fsck or mount failures.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Looks good, great catch!
> ---
> fs/btrfs/extent-tree.c | 26 ++++++++++++++++++++------
> 1 file changed, 20 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index f40d6086c947..53fd4626660c 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -8765,6 +8765,8 @@ struct walk_control {
> u64 refs[BTRFS_MAX_LEVEL];
> u64 flags[BTRFS_MAX_LEVEL];
> struct btrfs_key update_progress;
> + struct btrfs_key drop_progress;
> + int drop_level;
> int stage;
> int level;
> int shared_level;
> @@ -9148,6 +9150,16 @@ static noinline int do_walk_down(struct btrfs_trans_handle *trans,
> ret);
> }
> }
> +
> + /*
> + * We need to update the next key in our walk control so we can
> + * update the drop_progress key accordingly. We don't care if
> + * find_next_key doesn't find a key because that means we're at
> + * the end and are going to clean up now.
> + */
> + wc->drop_level = level;
> + find_next_key(path, level, &wc->drop_progress);
> +
> ret = btrfs_free_extent(trans, root, bytenr, fs_info->nodesize,
> parent, root->root_key.objectid,
> level - 1, 0);
> @@ -9499,12 +9511,14 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
> }
>
> if (wc->stage == DROP_REFERENCE) {
> - level = wc->level;
> - btrfs_node_key(path->nodes[level],
> - &root_item->drop_progress,
> - path->slots[level]);
> - root_item->drop_level = level;
> - }
> + wc->drop_level = wc->level;
> + btrfs_node_key_to_cpu(path->nodes[wc->drop_level],
> + &wc->drop_progress,
> + path->slots[wc->drop_level]);
> + }
> + btrfs_cpu_key_to_disk(&root_item->drop_progress,
> + &wc->drop_progress);
> + root_item->drop_level = wc->drop_level;
>
> BUG_ON(wc->level == 0);
> if (btrfs_should_end_transaction(trans) ||
> --
> 2.14.3
>
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”
next prev parent reply other threads:[~2019-02-07 12:07 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-02-06 20:46 [PATCH 0/2] Fix missing reference aborts when resuming snapshot delete Josef Bacik
2019-02-06 20:46 ` [PATCH 1/2] btrfs: check for refs on snapshot delete resume Josef Bacik
2019-02-07 12:06 ` Filipe Manana
2019-02-18 14:31 ` David Sterba
2019-02-06 20:46 ` [PATCH 2/2] btrfs: save drop_progress if we drop refs at all Josef Bacik
2019-02-07 12:07 ` Filipe Manana [this message]
2019-02-27 13:08 ` [PATCH 0/2] Fix missing reference aborts when resuming snapshot delete David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAL3q7H5+EDeaG7rhE_FQaCkhzwW0dBZD991Zyn1Y3Gov_OFRyg@mail.gmail.com \
--to=fdmanana@gmail.com \
--cc=josef@toxicpanda.com \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).