linux-btrfs.vger.kernel.org archive mirror
From: David Sterba <dsterba@suse.cz>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 0/2] Fix missing reference aborts when resuming snapshot delete
Date: Wed, 27 Feb 2019 14:08:23 +0100
Message-ID: <20190227130823.GG24609@twin.jikos.cz>
In-Reply-To: <20190206204615.5862-1-josef@toxicpanda.com>

On Wed, Feb 06, 2019 at 03:46:13PM -0500, Josef Bacik wrote:
> With my delayed refs rsv patches in place we started hitting issues in our build
> servers that do a lot of snapshot deletions.  Turns out there was a bug in
> btrfs_end_transaction_throttle() that caused it to basically always commit the
> transaction, which uncovered this particular bug.
> 
> The gory details are in the change logs for both patches, but generally speaking
> it's a problem with how we update our root_item->drop_progress key.  We will
> skip updating it sometimes even though we will have dropped references to
> blocks.  If we crash or unmount at these times we will start at a point earlier
> in our delete than we should be and try to free blocks that we already freed,
> thus ending up with a transaction abort because we couldn't find the extent
> reference.
> 
> There are 2 patches, 1 patch to deal with already broken file systems, and 1
> patch to keep this problem from happening in the first place.
> 
> Reproducing this easily is sort of tricky.  I had to add a couple of debug
> patches to the kernel to make it easy; basically I just needed to make sure
> we actually committed the transaction every time we finished a
> walk_down_tree/walk_up_tree combo.
> 
> The reproducer:
> 
> 1) Creates a base subvolume.
> 2) Creates 100k files in the subvolume.
> 3) Snapshots the base subvolume (snap1).
> 4) Touches files 5000-6000 in snap1.
> 5) Snapshots snap1 (snap2).
> 6) Deletes snap1.
> 
> I do this with dm-log-writes, and then replay to every FUA in the log and fsck
> the fs.  Without these patches this falls over pretty quickly.  With just the
> first patch we can mount the fs at the point that the fsck fails and it cleans
> everything up properly.  With both patches applied the fsck never fails and
> we're golden.  Thanks,

I copied the reproducer steps to the 2nd patch. Patches 1 and 2 added to
misc-next.

Thread overview: 7+ messages
2019-02-06 20:46 [PATCH 0/2] Fix missing reference aborts when resuming snapshot delete Josef Bacik
2019-02-06 20:46 ` [PATCH 1/2] btrfs: check for refs on snapshot delete resume Josef Bacik
2019-02-07 12:06   ` Filipe Manana
2019-02-18 14:31   ` David Sterba
2019-02-06 20:46 ` [PATCH 2/2] btrfs: save drop_progress if we drop refs at all Josef Bacik
2019-02-07 12:07   ` Filipe Manana
2019-02-27 13:08 ` David Sterba [this message]
