linux-btrfs.vger.kernel.org archive mirror
From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: waxhead <waxhead@dirtcellar.net>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Switching from spacecache v1 to v2
Date: Sun, 1 Nov 2020 12:49:03 -0500
Message-ID: <20201101174902.GU5890@hungrycats.org>
In-Reply-To: <fc45b21c-d24e-641c-efab-e1544aa98071@dirtcellar.net>

On Sat, Oct 31, 2020 at 01:27:57AM +0100, waxhead wrote:
> A couple of months ago I asked on IRC how to properly switch from version 1
> to version 2 of the space cache. I also asked if the space cache v2 was
> considered stable.
> I only remember what we talked about, and from what I understood it was not
> as easy to switch as the wiki may seem to indicate.
> 
> We run a box with a btrfs filesystem at 19TB, 9 disks, 11 subvolumes that
> contains about 6.5 million files (and this number is growing).
> 
> The filesystem has always been mounted with just the default options.
> 
> Performance is slow, and it improved when I moved the bulk of the files to
> various subvolumes for some reason. The wiki states that performance on very
> large filesystems (what is considered large?) may degrade drastically.

The important number for space_cache=v1 performance is the number of block
groups in which some space was allocated or deallocated per transaction
(i.e. the number of block groups that have to be updated on disk),
divided by the speed of the drives (i.e. the number of seeks they can
perform per second).

"Large" could be 100GB if it were on a slow disk with a highly fragmented
workload and a low-latency requirement.

A 19TB filesystem has up to 19000 block groups (at roughly 1GB each), and
a spinning disk can do maybe 150 seeks per second, so a worst-case commit
that touches every block group (19000 / 150, or about 127 seconds) could
take a couple of minutes.  Delete a few old snapshots, and you'll add
enough fragmentation to touch a significant portion of the block groups,
and thus see a lot of additional latency.
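
The estimate above is simple enough to reproduce as back-of-the-envelope
arithmetic.  This is only a sketch: it assumes about 1GB per block group
and one seek per touched block group, and the function name and the
touched_fraction parameter are made up for illustration, not anything
taken from btrfs itself.

```python
def worst_case_commit_seconds(fs_size_gb, seeks_per_second,
                              touched_fraction=1.0, block_group_gb=1.0):
    """Rough space_cache=v1 commit latency: touched block groups / seek rate.

    Assumes one seek per touched block group, which is a simplification.
    """
    block_groups = fs_size_gb / block_group_gb
    return block_groups * touched_fraction / seeks_per_second


# 19TB filesystem on a spinning disk doing maybe 150 seeks per second.
full = worst_case_commit_seconds(19000, 150)
# Hypothetical case: only a quarter of the block groups were touched.
partial = worst_case_commit_seconds(19000, 150, touched_fraction=0.25)
print(f"all block groups touched: {full:.0f} s")
print(f"a quarter touched:        {partial:.0f} s")
```

The point of the touched_fraction knob is that the filesystem size alone
does not set the latency; what matters is how many block groups a single
transaction dirties.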

> I would like to try v2 of the space cache to see if that improves speed a
> bit.
> 
> So is space cache v2 safe to use?!

AFAIK it has been 663 days since the last bug fix specific to the free space
tree (a6d8654d885d "Btrfs: fix deadlock when using free space tree due
to block group creation" from 5.0).  That fix was backported to earlier
LTS kernels.

We switched to space_cache=v2 for all new filesystems back in 2016, and
upgraded our last legacy machine still running space_cache=v1 in 2019.

I have never considered going back to v1:  we have no machines running
v1, I don't run regression tests on new kernels with v1, and I've never
seen a filesystem fail in the field due to v2 (even with the bugs we
now know it had).

IMHO the real question is "is v1 safe to use", given that its design is
based on letting errors happen, then detecting and recovering from them
after they occur (this is the mechanism behind the ubiquitous "failed to
load free space cache for block group %llu, rebuilding it now" message).
v2 prevents the errors from happening in the first place by using the
same btrfs metadata update mechanisms that are used for everything else
in the filesystem.

The problems in v1 may be mostly theoretical.  I've never cared enough
about v1 to try a practical experiment to see if btrfs recovers from
these problems correctly (or not).  v2 doesn't have those problems even
in theory, and it works, so I use v2 instead.

> And
> How do I make the switch properly?

Unmount the filesystem, then mount it once with -o clear_cache,space_cache=v2.
It will take some time to create the tree.  After that, no mount option
is needed.
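
As a sketch of the steps above, the helper below only builds and prints
the two commands (it executes nothing), and a second helper checks a
/proc/mounts-style listing for the space_cache=v2 option afterwards.
The device and mount point are placeholders, both function names are
hypothetical, and the check assumes the kernel reports space_cache=v2
among the mount options once the free space tree is in use.

```python
import shlex


def conversion_commands(device, mountpoint):
    """Return the umount/mount commands for a one-time v1 -> v2 conversion.

    Nothing is executed here; review the output and run it as root yourself.
    """
    return [
        ["umount", mountpoint],
        ["mount", "-o", "clear_cache,space_cache=v2", device, mountpoint],
    ]


def uses_space_cache_v2(mounts_text, mountpoint):
    """Check a /proc/mounts-style listing for space_cache=v2 on mountpoint."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint and fields[2] == "btrfs":
            return "space_cache=v2" in fields[3].split(",")
    return False


# Placeholder device and mount point, for illustration only.
for argv in conversion_commands("/dev/sdX", "/mnt/data"):
    print(shlex.join(argv))
```

Once the converting mount has finished, passing the contents of
/proc/mounts to uses_space_cache_v2() is one way to confirm that the
switch took effect before dropping the options from the mount command.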

With current kernels it is not possible to upgrade while the filesystem is
online, i.e. to upgrade "/" you have to set rootflags in the bootloader
or boot from external media.  That and the long mount time to do the
conversion (which offends systemd's default mount timeout parameters)
are the two major gotchas.

There are some patches for future kernels that will take care of details
like deleting the v1 space cache inodes and other inert parts of the
space_cache=v1 infrastructure.  I would not bother with these
now, and instead let future kernels clean up automatically.
