linux-btrfs.vger.kernel.org archive mirror
* Switching from spacecache v1 to v2
@ 2020-10-31  0:27 waxhead
  2020-11-01 17:49 ` Zygo Blaxell
From: waxhead @ 2020-10-31  0:27 UTC (permalink / raw)
  To: Btrfs BTRFS

A couple of months ago I asked on IRC how to properly switch from 
version 1 to version 2 of the space cache. I also asked if the space 
cache v2 was considered stable.
I only partly remember what we talked about, but from what I understood 
the switch was not as easy as the wiki seems to indicate.

We run a box with a 19TB btrfs filesystem spanning 9 disks, with 11 
subvolumes containing about 6.5 million files (and this number is growing).

The filesystem has always been mounted with just the default options.

Performance is slow; for some reason it improved when I moved the bulk 
of the files into various subvolumes. The wiki states that performance 
on very large filesystems (what is considered large?) may degrade 
drastically.

I would like to try v2 of the space cache to see if that improves speed 
a bit.

So is space cache v2 safe to use?!
And
How do I make the switch properly?


* Re: Switching from spacecache v1 to v2
  2020-10-31  0:27 Switching from spacecache v1 to v2 waxhead
@ 2020-11-01 17:49 ` Zygo Blaxell
  2020-11-02  5:48   ` A L
  2020-11-02 17:03   ` waxhead
From: Zygo Blaxell @ 2020-11-01 17:49 UTC (permalink / raw)
  To: waxhead; +Cc: Btrfs BTRFS

On Sat, Oct 31, 2020 at 01:27:57AM +0100, waxhead wrote:
> A couple of months ago I asked on IRC how to properly switch from version 1
> to version 2 of the space cache. I also asked if the space cache v2 was
> considered stable.
> I only partly remember what we talked about, but from what I understood
> the switch was not as easy as the wiki seems to indicate.
> 
> We run a box with a 19TB btrfs filesystem spanning 9 disks, with 11
> subvolumes containing about 6.5 million files (and this number is growing).
> 
> The filesystem has always been mounted with just the default options.
> 
> Performance is slow; for some reason it improved when I moved the bulk of
> the files into various subvolumes. The wiki states that performance on very
> large filesystems (what is considered large?) may degrade drastically.

The important number for space_cache=v1 performance is the number of block
groups in which some space was allocated or deallocated per transaction
(i.e. the number of block groups that have to be updated on disk),
divided by the speed of the drives (i.e. the number of seeks they can
perform per second).

"Large" could be 100GB if it was on a slow disk with a highly fragmented
workload and low latency requirement.

A 19TB filesystem has up to 19000 block groups and a spinning disk can do
maybe 150 seeks per second, so a worst-case commit could take a couple of
minutes.  Delete a few old snapshots, and you'll add enough fragmentation
to touch a significant portion of the block groups, and thus see a lot
of additional latency.
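
As a back-of-envelope sketch of that worst case (assuming roughly 1GiB
data block groups and one seek per updated block group, both rough
figures):

    19 TB / 1 GiB per block group  ~=  19000 block groups
    19000 seeks / 150 seeks/s      ~=  127 s, i.e. about two minutes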

> I would like to try v2 of the space cache to see if that improves speed a
> bit.
> 
> So is space cache v2 safe to use?!

AFAIK it has been 663 days since the last bug fix specific to the free
space tree (a6d8654d885d "Btrfs: fix deadlock when using free space tree
due to block group creation" from 5.0).  That fix was backported to earlier
LTS kernels.

We switched to space_cache=v2 for all new filesystems back in 2016, and
upgraded our last legacy machine still running space_cache=v1 in 2019.

I have never considered going back to v1:  we have no machines running
v1, I don't run regression tests on new kernels with v1, and I've never
seen a filesystem fail in the field due to v2 (even with the bugs we
now know it had).

IMHO the real question is "is v1 safe to use", given that its design is
based on letting errors happen, then detecting and recovering from them
after they occur (this is the mechanism behind the ubiquitous "failed to
load free space cache for block group %llu, rebuilding it now" message).
v2 prevents the errors from happening in the first place by using the
same btrfs metadata update mechanisms that are used for everything else
in the filesystem.

The problems in v1 may be mostly theoretical.  I've never cared enough
about v1 to try a practical experiment to see if btrfs recovers from
these problems correctly (or not).  v2 doesn't have those problems even
in theory, and it works, so I use v2 instead.

> And
> How do I make the switch properly?

Unmount the filesystem, mount it once with -o clear_cache,space_cache=v2.
It will take some time to create the tree.  After that, no mount option
is needed.
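
A minimal sketch of that, with /dev/sdX and /mnt as placeholders for
your device and mount point:

    # umount /mnt
    # mount -o clear_cache,space_cache=v2 /dev/sdX /mnt

Building the free space tree sets a compat_ro flag in the superblock,
which is why later plain mounts pick up v2 automatically.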

With current kernels it is not possible to upgrade while the filesystem is
online, i.e. to upgrade "/" you have to set rootflags in the bootloader
or boot from external media.  That and the long mount time to do the
conversion (which offends systemd's default mount timeout parameters)
are the two major gotchas.
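
As a sketch of the corresponding workarounds (the file paths and the
timeout value here are assumptions, adjust for your distro):

    # one-shot conversion of "/" via the bootloader, e.g. in
    # /etc/default/grub (then regenerate the grub config):
    GRUB_CMDLINE_LINUX="rootflags=clear_cache,space_cache=v2"

    # for a large non-root filesystem in /etc/fstab, stretch systemd's
    # mount timeout so the conversion mount doesn't get killed:
    /dev/sdX /data btrfs clear_cache,space_cache=v2,x-systemd.mount-timeout=30min 0 0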

There are some patches for future kernels that will take care of details
like deleting the v1 space cache inodes and other inert parts of the
space_cache=v1 infrastructure.  I would not bother with these
now, and instead let future kernels clean up automatically.

* Re: Switching from spacecache v1 to v2
  2020-11-01 17:49 ` Zygo Blaxell
@ 2020-11-02  5:48   ` A L
  2020-11-02 14:40     ` Zygo Blaxell
  2020-11-02 17:03   ` waxhead
From: A L @ 2020-11-02  5:48 UTC (permalink / raw)
  To: Zygo Blaxell, waxhead; +Cc: Btrfs BTRFS


>> And
>> How do I make the switch properly?
> Unmount the filesystem, mount it once with -o clear_cache,space_cache=v2.
> It will take some time to create the tree.  After that, no mount option
> is needed.
>
> With current kernels it is not possible to upgrade while the filesystem is
> online, i.e. to upgrade "/" you have to set rootflags in the bootloader
> or boot from external media.  That and the long mount time to do the
> conversion (which offends systemd's default mount timeout parameters)
> are the two major gotchas.
>
> There are some patches for future kernels that will take care of details
> like deleting the v1 space cache inodes and other inert parts of the
> space_cache=v1 infrastructure.  I would not bother with these
> now, and instead let future kernels clean up automatically.

There is also this option according to the man page of btrfs-check:
https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-check

--clear-space-cache v1|v2
     completely wipe all free space cache of given type
     For free space cache v1, the clear_cache kernel mount option only
     rebuilds the free space cache for block groups that are modified
     while the filesystem is mounted with that option. Thus, using this
     option with v1 makes it possible to actually clear the entire free
     space cache.
     For free space cache v2, the clear_cache kernel mount option
     destroys the entire free space cache. This option with v2 provides
     an alternative method of clearing the free space cache that doesn’t
     require mounting the filesystem.
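
For reference, the invocation would be something along these lines
(run with the filesystem unmounted; /dev/sdX is a placeholder):

    # btrfs check --clear-space-cache v1 /dev/sdX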

Is there any practical difference compared to clearing the space cache 
using the mount options? For example, would a lot of old space_cache=v1 
data remain on-disk after mounting with -o clear_cache,space_cache=v2? 
Would that affect performance in any way?

Thanks.

* Re: Switching from spacecache v1 to v2
  2020-11-02  5:48   ` A L
@ 2020-11-02 14:40     ` Zygo Blaxell
From: Zygo Blaxell @ 2020-11-02 14:40 UTC (permalink / raw)
  To: A L; +Cc: waxhead, Btrfs BTRFS

On Mon, Nov 02, 2020 at 06:48:11AM +0100, A L wrote:
> 
> > > And
> > > How do I make the switch properly?
> > Unmount the filesystem, mount it once with -o clear_cache,space_cache=v2.
> > It will take some time to create the tree.  After that, no mount option
> > is needed.
> > 
> > With current kernels it is not possible to upgrade while the filesystem is
> > online, i.e. to upgrade "/" you have to set rootflags in the bootloader
> > or boot from external media.  That and the long mount time to do the
> > conversion (which offends systemd's default mount timeout parameters)
> > are the two major gotchas.
> > 
> > There are some patches for future kernels that will take care of details
> > like deleting the v1 space cache inodes and other inert parts of the
> > space_cache=v1 infrastructure.  I would not bother with these
> > now, and instead let future kernels clean up automatically.
> 
> There is also this option according to the man page of btrfs-check:
> https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-check
> 
> --clear-space-cache v1|v2
>     completely wipe all free space cache of given type
>     For free space cache v1, the clear_cache kernel mount option only
> rebuilds the free space cache for block groups that are modified while the
> filesystem is mounted with that option. Thus, using this option with v1
> makes it possible to actually clear the entire free space cache.
>     For free space cache v2, the clear_cache kernel mount option destroys
> the entire free space cache. This option with v2 provides an alternative
> method of clearing the free space cache that doesn’t require mounting the
> filesystem.
> 
> Is there any practical difference compared to clearing the space cache
> using the mount options?

It's easier, because mount requires only setting flags.  It doesn't
require the additional separate step of running btrfs check.

The kernel will currently recreate parts of the v1 structures when
space_cache=v2 is used, so it will partially cancel out the work btrfs
check does.  There is a patch out there to fix that (see "btrfs: skip
space_cache v1 setup when not using it").

> For example, would a lot of old space_cache=v1 data remain on-disk
> after mounting with -o clear_cache,space_cache=v2?

It does, but the space used is negligible.  Future kernels will clean
it up automatically, assuming the patch set lands.

> Would that affect performance in any way?

Unused space cache is inert.  It only takes up space, and not very much
of that.

> Thanks.
> 

* Re: Switching from spacecache v1 to v2
  2020-11-01 17:49 ` Zygo Blaxell
  2020-11-02  5:48   ` A L
@ 2020-11-02 17:03   ` waxhead
From: waxhead @ 2020-11-02 17:03 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: Btrfs BTRFS

Zygo Blaxell wrote:
> On Sat, Oct 31, 2020 at 01:27:57AM +0100, waxhead wrote:
>> A couple of months ago I asked on IRC how to properly switch from version 1
>> to version 2 of the space cache. I also asked if the space cache v2 was
>> considered stable.
>> I only partly remember what we talked about, but from what I understood
>> the switch was not as easy as the wiki seems to indicate.
>>
>> We run a box with a 19TB btrfs filesystem spanning 9 disks, with 11
>> subvolumes containing about 6.5 million files (and this number is growing).
>>
>> The filesystem has always been mounted with just the default options.
>>
>> Performance is slow; for some reason it improved when I moved the bulk of
>> the files into various subvolumes. The wiki states that performance on very
>> large filesystems (what is considered large?) may degrade drastically.
> 
> The important number for space_cache=v1 performance is the number of block
> groups in which some space was allocated or deallocated per transaction
> (i.e. the number of block groups that have to be updated on disk),
> divided by the speed of the drives (i.e. the number of seeks they can
> perform per second).
> 
> "Large" could be 100GB if it was on a slow disk with a highly fragmented
> workload and low latency requirement.
> 
> A 19TB filesystem has up to 19000 block groups and a spinning disk can do
> maybe 150 seeks per second, so a worst-case commit could take a couple of
> minutes.  Delete a few old snapshots, and you'll add enough fragmentation
> to touch a significant portion of the block groups, and thus see a lot
> of additional latency.
> 
>> I would like to try v2 of the space cache to see if that improves speed a
>> bit.
>>
>> So is space cache v2 safe to use?!
> 
> AFAIK it has been 663 days since the last bug fix specific to the free
> space tree (a6d8654d885d "Btrfs: fix deadlock when using free space tree
> due to block group creation" from 5.0).  That fix was backported to earlier
> LTS kernels.
> 
> We switched to space_cache=v2 for all new filesystems back in 2016, and
> upgraded our last legacy machine still running space_cache=v1 in 2019.
> 
> I have never considered going back to v1:  we have no machines running
> v1, I don't run regression tests on new kernels with v1, and I've never
> seen a filesystem fail in the field due to v2 (even with the bugs we
> now know it had).
> 
> IMHO the real question is "is v1 safe to use", given that its design is
> based on letting errors happen, then detecting and recovering from them
> after they occur (this is the mechanism behind the ubiquitous "failed to
> load free space cache for block group %llu, rebuilding it now" message).
> v2 prevents the errors from happening in the first place by using the
> same btrfs metadata update mechanisms that are used for everything else
> in the filesystem.
> 
> The problems in v1 may be mostly theoretical.  I've never cared enough
> about v1 to try a practical experiment to see if btrfs recovers from
> these problems correctly (or not).  v2 doesn't have those problems even
> in theory, and it works, so I use v2 instead.
> 
>> And
>> How do I make the switch properly?
> 
> Unmount the filesystem, mount it once with -o clear_cache,space_cache=v2.
> It will take some time to create the tree.  After that, no mount option
> is needed.
> 
> With current kernels it is not possible to upgrade while the filesystem is
> online, i.e. to upgrade "/" you have to set rootflags in the bootloader
> or boot from external media.  That and the long mount time to do the
> conversion (which offends systemd's default mount timeout parameters)
> are the two major gotchas.
> 
> There are some patches for future kernels that will take care of details
> like deleting the v1 space cache inodes and other inert parts of the
> space_cache=v1 infrastructure.  I would not bother with these
> now, and instead let future kernels clean up automatically.
> 

Well I did exactly as you said. I mounted the filesystem from a live CD 
with -o clear_cache,space_cache=v2 and rebooted back into the system 
(yes, the rootfs is btrfs).
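
For anyone following along: one way to double-check that the conversion
took is to look for the free space tree flags in the superblock (the
exact output format depends on the btrfs-progs version):

    # btrfs inspect-internal dump-super /dev/sdX | grep -A2 compat_ro_flags
    compat_ro_flags         0x3
                            ( FREE_SPACE_TREE |
                              FREE_SPACE_TREE_VALID )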

Everything I am about to say is of course subjective, but the system is 
significantly snappier now - quite a lot, in fact. So unless the live CD 
with kernel 5.9 tuned something magnificent that carried over to 5.8 as 
well, the change to the v2 space cache made a real difference on our box.

So if I may summarize... COW-ABUNGA! WOW!
Not sure why it had such a profound impact on performance, but perhaps 
V2 should be the default?!
