linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Fix missing reference aborts when resuming snapshot delete
@ 2019-02-06 20:46 Josef Bacik
  2019-02-06 20:46 ` [PATCH 1/2] btrfs: check for refs on snapshot delete resume Josef Bacik
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Josef Bacik @ 2019-02-06 20:46 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

With my delayed refs rsv patches in place we started hitting issues in our build
servers that do a lot of snapshot deletions.  Turns out there was a bug in
btrfs_end_transaction_throttle() that caused it to basically always commit the
transaction, which uncovered this particular bug.

The gory details are in the change logs for both patches, but generally speaking
it's a problem with how we update our root_item->drop_progress key.  We will
skip updating it some times even though we will have dropped references to
blocks.  If we crash or unmount at these times we will start at a point earlier
in our delete than we should be and try to free blocks that we already freed,
thus ending up with a transaction abort because we couldn't find the extent
reference.

There are 2 patches, 1 patch to deal with already broken file systems, and 1
patch to keep this problem from happening in the first place.

The steps to reproduce this easily are sort of tricky, I had to add a couple of
debug patches to the kernel in order to make it easy, basically I just needed to
make sure we did actually commit the transaction every time we finished a
walk_down_tree/walk_up_tree combo.

The reproducer

1) Creates a base subvolume.
2) Creates 100k files in the subvolume.
3) Snapshots the base subvolume (snap1).
4) Touches files 5000-6000 in snap1.
5) Snapshots snap1 (snap2).
6) Deletes snap1.

I do this with dm-log-writes, and then replay to every FUA in the log and fsck
the fs.  Without these patches this falls over pretty quickly.  With just the
first patch we can mount the fs at the point that the fsck fails and it cleans
everything up properly.  With both patches applied the fsck never fails and
we're golden.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-02-27 13:07 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-06 20:46 [PATCH 0/2] Fix missing reference aborts when resuming snapshot delete Josef Bacik
2019-02-06 20:46 ` [PATCH 1/2] btrfs: check for refs on snapshot delete resume Josef Bacik
2019-02-07 12:06   ` Filipe Manana
2019-02-18 14:31   ` David Sterba
2019-02-06 20:46 ` [PATCH 2/2] btrfs: save drop_progress if we drop refs at all Josef Bacik
2019-02-07 12:07   ` Filipe Manana
2019-02-27 13:08 ` [PATCH 0/2] Fix missing reference aborts when resuming snapshot delete David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).