linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Filipe Manana <fdmanana@gmail.com>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>, kernel-team@fb.com
Subject: Re: [PATCH 2/2] btrfs: save drop_progress if we drop refs at all
Date: Thu, 7 Feb 2019 12:07:22 +0000	[thread overview]
Message-ID: <CAL3q7H5+EDeaG7rhE_FQaCkhzwW0dBZD991Zyn1Y3Gov_OFRyg@mail.gmail.com> (raw)
In-Reply-To: <20190206204615.5862-3-josef@toxicpanda.com>

On Wed, Feb 6, 2019 at 8:47 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> Previously we only updated the drop_progress key if we were in the
> DROP_REFERENCE stage of snapshot deletion.  This is because the
> UPDATE_BACKREF stage checks the flags of the blocks it's converting to
> FULL_BACKREF, so if we go over a block we processed before it doesn't
> matter, we just don't do anything.
>
> The problem is in do_walk_down() we will go ahead and drop the roots
> reference to any blocks that we know we won't need to walk into.
>
> Given subvolume A and snapshot B.  The root of B points to all of the
> nodes that belong to A, so all of those nodes have a refcnt > 1.  If B
> did not modify those blocks it'll hit this condition in do_walk_down
>
> if (!wc->update_ref ||
>     generation <= root->root_key.offset)
>         goto skip;
>
> and in "goto skip" we simply do a btrfs_free_extent() for that bytenr
> that we point at.
>
> Now assume we modified some data in B, and then took a snapshot of B and
> call it C.  C points to all the nodes in B, making every node the root
> of B points to have a refcnt > 1.  This assumes the root level is 2 or
> higher.
>
> We delete snapshot B, which does the above work in do_walk_down,
> free'ing our ref for nodes we share with A that we didn't modify.  Now
> we hit a node we _did_ modify, thus we own.  We need to walk down into
> this node and we set wc->stage == UPDATE_BACKREF.  We walk down to level
> 0 which we also own because we modified data.  We can't walk any further
> down and thus now need to walk up and start the next part of the
> deletion.  Now walk_up_proc is supposed to put us back into
> DROP_REFERENCE, but there's an exception to this
>
> if (level < wc->shared_level)
>         goto out;
>
> we are at level == 0, and our shared_level == 1.  We skip out of this
> one and go up to level 1.  Since path->slots[1] < nritems we
> path->slots[1]++ and break out of walk_up_tree to stop our transaction
> and loop back around.  Now in btrfs_drop_snapshot we have this snippet
>
> if (wc->stage == DROP_REFERENCE) {
>         level = wc->level;
>         btrfs_node_key(path->nodes[level],
>                        &root_item->drop_progress,
>                        path->slots[level]);
>         root_item->drop_level = level;
> }
>
> our stage == UPDATE_BACKREF still, so we don't update the drop_progress
> key.  This is a problem because we would have done btrfs_free_extent()
> for the nodes leading up to our current position.  If we crash or
> unmount here and go to remount we'll start over where we were before and
> try to free our ref for blocks we've already freed, and thus abort()
> out.
>
> Fix this by keeping track of the last place we dropped a reference for
> our block in do_walk_down.  Then if wc->stage == UPDATE_BACKREF we know
> we'll start over from a place we meant to, and otherwise things continue
> to work as they did before.
>
> I have a complicated reproducer for this problem, without this patch
> we'll fail to fsck the fs when replaying the log writes log.  With this
> patch we can replay the whole log without any fsck or mount failures.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, great catch!

> ---
>  fs/btrfs/extent-tree.c | 26 ++++++++++++++++++++------
>  1 file changed, 20 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index f40d6086c947..53fd4626660c 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -8765,6 +8765,8 @@ struct walk_control {
>         u64 refs[BTRFS_MAX_LEVEL];
>         u64 flags[BTRFS_MAX_LEVEL];
>         struct btrfs_key update_progress;
> +       struct btrfs_key drop_progress;
> +       int drop_level;
>         int stage;
>         int level;
>         int shared_level;
> @@ -9148,6 +9150,16 @@ static noinline int do_walk_down(struct btrfs_trans_handle *trans,
>                                              ret);
>                         }
>                 }
> +
> +               /*
> +                * We need to update the next key in our walk control so we can
> +                * update the drop_progress key accordingly.  We don't care if
> +                * find_next_key doesn't find a key because that means we're at
> +                * the end and are going to clean up now.
> +                */
> +               wc->drop_level = level;
> +               find_next_key(path, level, &wc->drop_progress);
> +
>                 ret = btrfs_free_extent(trans, root, bytenr, fs_info->nodesize,
>                                         parent, root->root_key.objectid,
>                                         level - 1, 0);
> @@ -9499,12 +9511,14 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>                 }
>
>                 if (wc->stage == DROP_REFERENCE) {
> -                       level = wc->level;
> -                       btrfs_node_key(path->nodes[level],
> -                                      &root_item->drop_progress,
> -                                      path->slots[level]);
> -                       root_item->drop_level = level;
> -               }
> +                       wc->drop_level = wc->level;
> +                       btrfs_node_key_to_cpu(path->nodes[wc->drop_level],
> +                                             &wc->drop_progress,
> +                                             path->slots[wc->drop_level]);
> +               }
> +               btrfs_cpu_key_to_disk(&root_item->drop_progress,
> +                                     &wc->drop_progress);
> +               root_item->drop_level = wc->drop_level;
>
>                 BUG_ON(wc->level == 0);
>                 if (btrfs_should_end_transaction(trans) ||
> --
> 2.14.3
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

  reply	other threads:[~2019-02-07 12:07 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-06 20:46 [PATCH 0/2] Fix missing reference aborts when resuming snapshot delete Josef Bacik
2019-02-06 20:46 ` [PATCH 1/2] btrfs: check for refs on snapshot delete resume Josef Bacik
2019-02-07 12:06   ` Filipe Manana
2019-02-18 14:31   ` David Sterba
2019-02-06 20:46 ` [PATCH 2/2] btrfs: save drop_progress if we drop refs at all Josef Bacik
2019-02-07 12:07   ` Filipe Manana [this message]
2019-02-27 13:08 ` [PATCH 0/2] Fix missing reference aborts when resuming snapshot delete David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAL3q7H5+EDeaG7rhE_FQaCkhzwW0dBZD991Zyn1Y3Gov_OFRyg@mail.gmail.com \
    --to=fdmanana@gmail.com \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).