seeking advice for a backup server (accepting btrfs receive streams via SSH)

* seeking advice for a backup server (accepting btrfs receive streams via SSH)
@ 2021-09-09 18:28 Dave T
  2021-09-11 20:41 ` Dave T
  0 siblings, 1 reply; 6+ messages in thread
From: Dave T @ 2021-09-09 18:28 UTC
  To: Btrfs BTRFS

Hello. I have a server on a LAN that will act as a backup target for
clients that use btrbk to send snapshots via SSH.

After my initial attempt, the backup server became extremely slow. I
don't know the cause yet, and I'm starting to investigate.

The first thing I would like to know from this group is whether there
are special considerations for configuring or managing a server that
will receive many btrfs snapshots from other devices.

For example, do the general rules about limiting the number of
snapshots on a volume still apply in this case?

Thanks for any input.

As a somewhat unrelated side note, I had a large USB attached storage
device on another server and it got up to more than 20,000 btrfs
snapshots in one volume without any apparent issues. Some of the
snapshots dated back about six years. I no longer have all those
snapshots, but it was an interesting experiment and it succeeded as
far as I am concerned.


* Re: seeking advice for a backup server (accepting btrfs receive streams via SSH)
  2021-09-09 18:28 seeking advice for a backup server (accepting btrfs receive streams via SSH) Dave T
@ 2021-09-11 20:41 ` Dave T
  2021-09-11 22:01   ` Forza
  0 siblings, 1 reply; 6+ messages in thread
From: Dave T @ 2021-09-11 20:41 UTC
  To: Btrfs BTRFS

Hello. I have a server on a LAN that will act as a backup target for
clients that use btrbk to send snapshots via SSH.

After my initial attempt, the backup server became extremely slow. I
don't know the cause yet, and I'm starting to investigate.

The first thing I would like to know from this group is whether there
are special considerations for configuring or managing a server that
will receive many btrfs snapshots from other devices.

For example, do the general rules about limiting the number of
snapshots on a volume still apply in this case?

Thanks for any input.


* Re: seeking advice for a backup server (accepting btrfs receive streams via SSH)
  2021-09-11 20:41 ` Dave T
@ 2021-09-11 22:01   ` Forza
  2021-09-11 22:22     ` Dave T
  0 siblings, 1 reply; 6+ messages in thread
From: Forza @ 2021-09-11 22:01 UTC
  To: Dave T, Btrfs BTRFS

---- From: Dave T <davestechshop@gmail.com> -- Sent: 2021-09-11 - 22:41 ----

> Hello. I have a server on a LAN that will act as a backup target for
> clients that use btrbk to send snapshots via SSH.
> 
> After my initial attempt, the backup server became extremely slow. I
> don't know the cause yet, and I'm starting to investigate.
> 
> The first thing I would like to know from this group is whether there
> are special considerations for configuring or managing a server that
> will receive many btrfs snapshots from other devices.
> 
> For example, do the general rules about limiting the number of
> snapshots on a volume still apply in this case?
> 
> Thanks for any input.

It's hard to say much without more detailed information about your set up, such as hardware configuration, filesystem setup, etc. What do you consider slow?

Some pointers to look at:
* Deleting snapshots can cause increased I/O.
* atime updates can affect snapshots because they force CoW of metadata. Mount with noatime.
* Exclude snapshots from mlocate/updatedb and other indexing services. I forgot once and ended up with a database of several GB... :D
* space_cache=v2 can be helpful, but it increases metadata usage a little.
* Monitor disk usage and allocation with 'btrfs filesystem usage /mnt'.
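
The mount-option pointers above can be checked with a small script. This is only a sketch: the function names are made up for illustration, /mnt is a placeholder for the backup mount point, and it assumes findmnt (util-linux) is available.

```sh
#!/bin/sh
# Hypothetical helper: succeed if a comma-separated mount option string
# contains the given option as an exact entry (e.g. "noatime").
has_mount_option() {
    opts=$1
    want=$2
    case ",$opts," in
        *",$want,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report whether noatime and space_cache=v2 are active on a mount point.
# /mnt is a placeholder; pass the real backup mount point instead.
check_backup_mount() {
    mnt=${1:-/mnt}
    # findmnt prints only the option string of the filesystem holding $mnt.
    opts=$(findmnt -n -o OPTIONS --target "$mnt") || return 1
    for want in noatime space_cache=v2; do
        if has_mount_option "$opts" "$want"; then
            echo "$mnt: $want is set"
        else
            echo "$mnt: $want is NOT set"
        fi
    done
}
```

Running `check_backup_mount /mnt` prints one line per option, so it is easy to see at a glance whether the backup volume is mounted as intended.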

Good luck. 


* Re: seeking advice for a backup server (accepting btrfs receive streams via SSH)
  2021-09-11 22:01   ` Forza
@ 2021-09-11 22:22     ` Dave T
  2021-09-12 17:47       ` Forza
  0 siblings, 1 reply; 6+ messages in thread
From: Dave T @ 2021-09-11 22:22 UTC
  To: Forza, Btrfs BTRFS

On Sat, Sep 11, 2021 at 6:01 PM Forza <forza@tnonline.net> wrote:
>
>
>
> ---- From: Dave T <davestechshop@gmail.com> -- Sent: 2021-09-11 - 22:41 ----
>
> > Hello. I have a server on a LAN that will act as a backup target for
> > clients that use btrbk to send snapshots via SSH.
> >
> > After my initial attempt, the backup server became extremely slow. I
> > don't know the cause yet, and I'm starting to investigate.
> >
> > The first thing I would like to know from this group is whether there
> > are special considerations for configuring or managing a server that
> > will receive many btrfs snapshots from other devices.
> >
> > For example, do the general rules about limiting the number of
> > snapshots on a volume still apply in this case?
> >
> > Thanks for any input.
>
>
> It's hard to say much without more detailed information about your set up, such as hardware configuration, filesystem setup, etc.

I can offer any additional info needed. Rather than guess at what
anyone wants to see, I will respond with the info upon request.

> What do you consider slow?

The connected clients would freeze for several minutes, sometimes up
to 15 minutes or more. It was not just "normal slow"; it was unusable.
These periods of extreme slowness did not correspond, as far as I
could tell, to the moments when clients were running any btrbk
operations. It seemed random.

I started over with a "new" (i.e., repurposed) server and so far it
seems OK in testing with just a few clients. But before I go too far
down this path I want to make sure the general idea is workable,
assuming I have adequate hardware.

> Some pointers to look at may be

Thank you for offering these pointers.

> * deleting snapshots can cause increased I/O.

Under what circumstances? Do you mean that when there are a lot of
snapshots, deleting some may cause increased I/O? Deletions are
managed per client by the btrbk config running on that client. btrbk
sends snapshot diffs (incremental backups) to the backup server
according to the schedule on each client, and it removes existing
backups that exceed the allotted qty.

> * atimes can affect snapshots as they mean cow of metadata. Mount as noatime.

I am already doing that.

> * exclude snapshots from mlocate/updatedb and other indexing services. I forgot once and ended up with several gb database... :D

I am not aware of these services (mlocate/updatedb and other indexing
services). Do you have tips for finding any such running services or
what some of the others might be?

# which mlocate
which: no mlocate in
(/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl)
# man mlocate
No manual entry for mlocate

The mlocate package is not installed by my package manager.

I got similar results for updatedb.
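
A slightly broader check than the two commands above might look like the following sketch. Note that the mlocate package normally installs binaries named locate and updatedb rather than mlocate, so those are the names worth testing. The function names are hypothetical, and the list of candidate indexers passed in is an assumption about what a distribution might ship.

```sh
#!/bin/sh
# Print each of the given command names that is actually found on PATH.
find_indexers() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
        fi
    done
}

# Hypothetical check: succeed if the PRUNEPATHS line of an
# updatedb.conf-style file already lists the given directory.
prunepaths_has() {
    conf=$1
    dir=$2
    grep -E '^[[:space:]]*PRUNEPATHS' "$conf" 2>/dev/null | grep -qwF -- "$dir"
}
```

For example, `find_indexers updatedb locate plocate baloo_file tracker3` prints whichever of those exist, and `prunepaths_has /etc/updatedb.conf /mnt/backups` (with /mnt/backups as a placeholder) tells whether the snapshot directory is already excluded. On systemd machines, `systemctl list-timers --all` also shows any scheduled indexing jobs.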

> * space_cache=v2 can be helpful, but it increases metadata usage a little.

I am using space_cache=v2 on the main volumes, which includes where
these "backups" are saved. The root (os) volume itself hasn't been
converted from space_cache v1 yet (b/c I haven't had time to read up
on that).

> * monitor disk usage allocation with 'btrfs filesystem usage /mnt'

That's a very vague recommendation. I'm already doing regular balance,
scrub and making sure the disks are not out of space.
>
> Good luck.
>
Thank you for replying. Can I assume that it is generally OK to use a
backup server in this way where it will receive (over time) hundreds
or thousands of backups (incremental usually) via btrbk running on
different clients?


* Re: seeking advice for a backup server (accepting btrfs receive streams via SSH)
  2021-09-11 22:22     ` Dave T
@ 2021-09-12 17:47       ` Forza
  2021-09-13 13:12         ` Dave T
  0 siblings, 1 reply; 6+ messages in thread
From: Forza @ 2021-09-12 17:47 UTC
  To: Dave T, Btrfs BTRFS



On 2021-09-12 00:22, Dave T wrote:
> On Sat, Sep 11, 2021 at 6:01 PM Forza <forza@tnonline.net> wrote:
>>
>>
>>
>> ---- From: Dave T <davestechshop@gmail.com> -- Sent: 2021-09-11 - 22:41 ----
>>
>>> Hello. I have a server on a LAN that will act as a backup target for
>>> clients that use btrbk to send snapshots via SSH.
>>>
>>> After my initial attempt, the backup server became extremely slow. I
>>> don't know the cause yet, and I'm starting to investigate.
>>>
>>> The first thing I would like to know from this group is whether there
>>> are special considerations for configuring or managing a server that
>>> will receive many btrfs snapshots from other devices.
>>>
>>> For example, do the general rules about limiting the number of
>>> snapshots on a volume still apply in this case?
>>>
>>> Thanks for any input.
>>
>>
>> It's hard to say much without more detailed information about your set up, such as hardware configuration, filesystem setup, etc.
> 
> I can offer any additional info needed. Rather than guess at what
> anyone wants to see, I will respond with the info upon request.

For example, what kinds of disks do you use (make and model)? SMR drives
can have really poor performance under heavy metadata I/O.

And show output of "btrfs filesystem usage -T /mnt/"

> 
>> What do you consider slow?
> 
> The connected clients will freeze for several minutes, up to 15
> minutes or more sometimes. It was not just "normal slow" it was
> unusable. These periods of extreme slowness did not correspond, as far
> as I could tell, to the moments when clients were running any btrbk
> operations. It seemed random.
> 
> I started over with a "new" (i.e., repurposed) server and so far it
> seems OK in testing with just a few clients. But before I go too far
> down this path I want to make sure the general idea is workable,
> assuming I have adequate hardware.

This sounds like it is connected to snapshot deletions. They can cause
long stalls while btrfs is in a transaction. I use btrfs like this for
backups myself and have not noticed it, but I have heard from users in
the #btrfs IRC channel (https://web.libera.chat/#btrfs) that it can
happen. Mostly, I think, those systems are heavily loaded.

> 
>> Some pointers to look at may be
> 
> Thank you for offering these pointers.
> 
>> * deleting snapshots can cause increased I/O.
> 
> Under what circumstances? Do you mean that when there are a lot of
> snapshots, deleting some may cause increased I/O? Deletions are
> managed per client by the btrbk config running on that client. btrbk
> sends snapshot diffs (incremental backups) to the backup server
> according to the schedule on each client, and it removes existing
> backups that exceed the allotted qty.

It would be some time after someone (btrbk) issues a "btrfs subvolume
delete".

An alternative can be to use "rm -rf", which itself is slower, but can
have less of an overall impact.
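
To see whether stalls line up with this background cleanup, one could log how many deleted subvolumes are still waiting to be cleaned. The sketch below assumes the usual "ID ... path DELETED" line format of 'btrfs subvolume list -d'; the function names, the /mnt path and the 60-second interval are placeholders.

```sh
#!/bin/sh
# Count subvolumes that were deleted but not yet cleaned up.
# Reads `btrfs subvolume list -d` output on stdin, one subvolume per line.
count_pending_deletions() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c '^ID ' || true
}

# Print a timestamped pending-deletion count at a fixed interval so it can
# be compared with the times at which clients froze. Runs until interrupted.
watch_pending_deletions() {
    mnt=${1:-/mnt}
    interval=${2:-60}
    while :; do
        n=$(btrfs subvolume list -d "$mnt" | count_pending_deletions)
        echo "$(date '+%F %T') pending deletions: $n"
        sleep "$interval"
    done
}
```

Run as root, `watch_pending_deletions /mnt 60` gives a simple timeline. 'btrfs subvolume sync /mnt' can also be used to block until all pending deletions have finished.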

> 
>> * atimes can affect snapshots as they mean cow of metadata. Mount as noatime.
> 
> I am already doing that.
> 
>> * exclude snapshots from mlocate/updatedb and other indexing services. I forgot once and ended up with several gb database... :D
> 
> I am not aware of these services (mlocate/updatedb and other indexing
> services). Do you have tips for finding any such running services or
> what some of the others might be?
> 
> # which mlocate
> which: no mlocate in
> (/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl)
> # man mlocate
> No manual entry for mlocate
> 
> The mlocate package is not installed by my package manager.
> 
> I got similar results for updatedb.

This was just a guess based on my own experience.

> 
>> * space_cache=v2 can be helpful, but it increases metadata usage a little.
> 
> I am using space_cache=v2 on the main volumes, which includes where
> these "backups" are saved. The root (os) volume itself hasn't been
> converted from space_cache v1 yet (b/c I haven't had time to read up
> on that).
> 
>> * monitor disk usage allocation with 'btrfs filesystem usage /mnt'
> 
> That's a very vague recommendation. I'm already doing regular balance,
> scrub and making sure the disks are not out of space.

What I mean here is to avoid running close to full as that is not good 
for a COW filesystem.
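
As a rough illustration, the allocation level can be pulled out of the usage output with a few lines of awk. This sketch assumes the "Device size:" and "Device allocated:" lines printed in the Overall section by 'btrfs filesystem usage -b' (raw bytes); the function name is hypothetical.

```sh
#!/bin/sh
# Read `btrfs filesystem usage -b` output on stdin and print the share of
# the device size that is allocated to chunks, as a whole-number percentage.
allocated_percent() {
    awk '
        /Device size:/      { size = $NF }
        /Device allocated:/ { alloc = $NF }
        END {
            if (size > 0) printf "%d\n", alloc * 100 / size
            else exit 1
        }
    '
}
```

Usage would be `btrfs filesystem usage -b /mnt | allocated_percent`, which could be run from cron and compared against whatever threshold one is comfortable with.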

>>
>> Good luck.
>>
> Thank you for replying. Can I assume that it is generally OK to use a
> backup server in this way where it will receive (over time) hundreds
> or thousands of backups (incremental usually) via btrbk running on
> different clients?
> 

I would say it is generally OK!




* Re: seeking advice for a backup server (accepting btrfs receive streams via SSH)
  2021-09-12 17:47       ` Forza
@ 2021-09-13 13:12         ` Dave T
  0 siblings, 0 replies; 6+ messages in thread
From: Dave T @ 2021-09-13 13:12 UTC
  To: Forza, Btrfs BTRFS

> >> * deleting snapshots can cause increased I/O.
> >
> > Under what circumstances? Do you mean that when there are a lot of
> > snapshots, deleting some may cause increased I/O? Deletions are
> > managed per client by the btrbk config running on that client. btrbk
> > sends snapshot diffs (incremental backups) to the backup server
> > according to the schedule on each client, and it removes existing
> > backups that exceed the allotted qty.
>
> It would be some time after someone (btrbk) issues a "btrfs subvolume
> delete".
>
> An alternative can be to use "rm -rf", which itself is slower, but can
> have less of an overall impact.

That's interesting. I would have guessed that using "rm -rf" to remove
subvolumes that are part of a chain of incremental backups would lead
to problems such as the backups not being complete or future
incremental backups failing.

> > The connected clients will freeze for several minutes, up to 15
> > minutes or more sometimes. It was not just "normal slow" it was
> > unusable. These periods of extreme slowness did not correspond, as far
> > as I could tell, to the moments when clients were running any btrbk
> > operations. It seemed random.
> >
> > I started over with a "new" (i.e., repurposed) server and so far it
> > seems OK in testing with just a few clients. But before I go too far
> > down this path I want to make sure the general idea is workable,
> > assuming I have adequate hardware.
>
> This sounds like it is connected to snapshot deletions. They can cause
> long stalls while btrfs is in transaction. I am using btrfs myself like
> this for backups and I have not noticed it myself, however I have heard
> from users in the IRC forum #btrfs (https://web.libera.chat/#btrfs) that
> it can happen. Mostly, I think, those systems are heavily loaded.
>

OK, I think you may be right that it is related to snapshot deletions.
This gives me some ideas to pursue.

> >>
> > Thank you for replying. Can I assume that it is generally OK to use a
> > backup server in this way where it will receive (over time) hundreds
> > or thousands of backups (incremental usually) via btrbk running on
> > different clients?
> >
>
> I would say it is generally OK!

Thanks again. This is enough to encourage me to persist with this plan.

> What I mean here is to avoid running close to full as that is not good for a COW filesystem.

I never let btrfs storage devices get more than about 75 or 80% full
according to btrfs fi usage cmd, and most are at 50% or less. Sending
cmd output wouldn't help now because I recently deleted everything
from the backup target drive to start over. The drive is nearly empty,
atm.

In terms of hardware, it hasn't changed recently. I've been doing
hourly btrfs snapshots and daily send | receive backups for years on
generally the same hardware and there were no performance issues. The
problems started recently when I did 2 things:

1. switched to btrbk
2. started sending the daily send | receive backups to a server on the
LAN instead of to local USB attached storage.

The backup disk in the server is currently an HGST HUH721212ALE600
(10.9T), which is definitely a better disk than what is inside the
low-end WD Passport USB HDDs I was using.

The client devices (hosts) are generally running Samsung SSDs like:

Samsung SSD 970 EVO Plus
Samsung SSD 980 PRO

Since the recent performance problems don't coincide with any hardware
changes, and I am not seeing any hardware-related errors, I will focus
my troubleshooting on my btrbk configuration.

