* btrfs based backup?
@ 2019-11-12 18:34 Ulli Horlacher
  2019-11-12 18:58 ` joshua
                   ` (6 more replies)
  0 siblings, 7 replies; 31+ messages in thread
From: Ulli Horlacher @ 2019-11-12 18:34 UTC (permalink / raw)
  To: linux-btrfs


I need a new backup system for some servers. Destination is a RAID, not
tapes.

So far I have used a self-written shell script: 25 years old, over 1000
lines of (HORRIBLE) code, no longer maintainable :-}

All backup software I know is either too primitive (e.g. no versioning) or
very complex and takes a long time to master.

My new idea is:

Set up a backup server with btrfs storage (with compress mount option),
the clients do their backup with rsync over nfs.

For versioning I make btrfs snapshots.


To have a secondary backup I will use btrfs send / receive.


Any comments on this? Or better suggestions?

The backup software must be open source!
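
For illustration, the snapshot and send/receive part of this plan could be sketched roughly as follows. This is only a sketch under assumptions: all paths, the snapshot names and the host name `backup2` are made up, and the `DRY_RUN` switch is a helper invented for this example. With `DRY_RUN=1` (the default here) the commands are only printed, not executed.

```shell
#!/bin/bash
# Sketch only: snapshot a backup subvolume and send it incrementally.
# All paths, names and the remote host are hypothetical examples.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
# Uses eval, so paths must not contain spaces or shell metacharacters.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    eval "$*"
  fi
}

# snapshot_and_send SUBVOL SNAPDIR PARENT NEW REMOTE DEST
snapshot_and_send() {
  local subvol=$1 snapdir=$2 parent=$3 new=$4 remote=$5 dest=$6
  # btrfs send only accepts read-only snapshots, hence -r.
  run "btrfs subvolume snapshot -r $subvol $snapdir/$new"
  # Incremental stream relative to the previous snapshot (-p); that
  # parent must already exist on the receiving side. The very first
  # transfer would be a full "btrfs send" without -p.
  run "btrfs send -p $snapdir/$parent $snapdir/$new | ssh $remote btrfs receive $dest"
}

snapshot_and_send /backup/rsync/tandem /backup/snapshots \
  tandem.20191111 tandem.20191112 backup2 /backup/received
```

Run as-is, this just prints the two btrfs command lines, which makes it easy to check them before setting `DRY_RUN=0`.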

-- 
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK         
Universitaet Stuttgart         E-Mail: horlacher@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
REF:<20191112183425.GA1257@tik.uni-stuttgart.de>


* Re: btrfs based backup?
  2019-11-12 18:34 btrfs based backup? Ulli Horlacher
@ 2019-11-12 18:58 ` joshua
  2019-11-12 19:09 ` Oliver Freyermuth
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 31+ messages in thread
From: joshua @ 2019-11-12 18:58 UTC (permalink / raw)
  To: Ulli Horlacher, linux-btrfs

I highly recommend something similar to snapper or btrbk (https://github.com/digint/btrbk) for the automation of snapshotting.

I've used snapper previously and currently use btrbk, and both allow you to set very customizable retention policies for snapshots.

Say you take snapshots every hour; you could then configure something like:
- Keep Hourly snapshots for 24 hours.
- Keep Daily snapshots for 7 days.
- Keep Weekly snapshots for 4 weeks.
- Keep Monthly snapshots for 6 months.

Of course, you can tune which snapshots you keep based on your knowledge of the data, balancing point-in-time recovery against keeping the snapshot count low enough that btrfs operations do not slow down.

btrbk is geared towards running on both a source and a destination server to automate send & receive for backups, but it can also simply manage snapshots on the local machine.
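
To make the idea of such a retention policy concrete, here is a minimal sketch of a much simplified rule: keep the N newest snapshots plus the newest snapshot of each of the last D days. It is not how snapper or btrbk implement retention; the function name and the `name.YYYYMMDDTHHMM` naming scheme are assumptions made for this example.

```shell
#!/bin/bash
# Sketch only: print the snapshots a simple "N newest + one per day"
# policy would delete. Reads snapshot names on stdin, one per line, all
# with the same prefix and ending in .YYYYMMDDTHHMM (hypothetical scheme).
prune_candidates() {
  local keep_newest=$1 keep_daily=$2
  # Newest first; C locale so the sort is a plain byte-wise comparison.
  LC_ALL=C sort -r | awk -v n="$keep_newest" -v d="$keep_daily" '
    {
      stamp = $0; sub(/^.*\./, "", stamp)   # YYYYMMDDTHHMM
      day = substr(stamp, 1, 8)             # YYYYMMDD
      keep = 0
      if (NR <= n) keep = 1                 # among the n newest
      if (!(day in seen)) {                 # newest snapshot of its day
        seen[day] = 1
        if (++days <= d) keep = 1           # only for the d most recent days
      }
      if (!keep) print                      # everything else may go
    }'
}

printf '%s\n' \
  home.20191110T1200 home.20191111T1200 home.20191111T2300 \
  home.20191112T1000 home.20191112T1100 home.20191112T1200 |
  prune_candidates 2 2
```

With two newest and two daily snapshots kept, the example prints `home.20191112T1000`, `home.20191111T1200` and `home.20191110T1200` as deletion candidates. The output could then be fed to `btrfs subvolume delete`, but a real tool handles many more corner cases.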




* Re: btrfs based backup?
  2019-11-12 18:34 btrfs based backup? Ulli Horlacher
  2019-11-12 18:58 ` joshua
@ 2019-11-12 19:09 ` Oliver Freyermuth
  2019-11-12 19:14 ` Remi Gauvin
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 31+ messages in thread
From: Oliver Freyermuth @ 2019-11-12 19:09 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Ulli Horlacher


Hi,

I'm not sure if the btrfs list is the correct place for a generic answer - but I'll try to give one
mentioning all the backup solutions I have collected experience with (all open source, of course). 

1) btrbk ( https://github.com/digint/btrbk )
   I use it on all my personal machines, both for local snapshotting (to roll back my own mistakes easily...) and for sending the incrementals
   to an external storage. It's basically a well-working btrfs send / receive automation, so it needs btrfs at both ends (or becomes less efficient), which may not match your use case. 

2) Borg Backup ( https://borgbackup.readthedocs.io/en/stable/ )
   I use this whenever I do not have btrfs at one / both ends. It can also do encryption, compression and deduplication, purge old incrementals without ever doing a full backup,
   you can even mount your backups. 
   I use this for some smaller machines (e.g. on a Raspberry Pi) and we use it on our infrastructure for some configuration backups. 

3) Restic ( https://restic.readthedocs.io/en/latest/ )
   Restic is (feature-wise) like borg (but no compression yet). The main difference is that it can (but does not have to) back up to cloud-like storages such as S3. 
   We intend to use this heavily with a local Ceph storage system with 3x replication offering S3/Swift via Rados Gateway nodes. 
   If you want something less heavy than a Ceph cluster (we love it, it does not bite!) you can try minio ( https://min.io/ ). I have never used minio myself,
   but have only heard good things about it. 

4) Duplicati ( https://www.duplicati.com/ )
   Like Borg / Restic (can also talk S3 if wanted, or store to a file system, also does compression). 
   The main advantage here is that it has a GUI. Probably not interesting for your use case, but we intend to recommend that to our users
   with Windows / MacOS X who may prefer some buttons to click. 

5) Since you mention VMs in your signature, I'll also mention:
   https://benji-backup.me/
   http://backy2.com/
   https://bitbucket.org/flyingcircus/backy
   I'll personally recommend benji here, due to a large featureset, very active development and high efficiency. 
   It does differential backups of RBD volumes, so it will only be really useful to you if you use Ceph RBD
   (you can also get it to work with LVM and raw block devices, I think). 
   You can find some of our experiences with it here:
   https://indico.cern.ch/event/765214/contributions/3517132/

I think all of these are not too complex (of course, they only work well if your infrastructure matches them)
since you can essentially arrive at a working backup and restore in a few minutes. 
I'll also add that for almost all of our servers, we do not do any backups at all - file servers and services with data have their storage replicated to their HA partner node(s),
and all configuration is "backed up" by having it completely in Foreman / Puppet, so a machine can be reinstalled at the push of a button. 

The main things you give up with your original idea (rsync to one replicated node) are that you do not have deduplication built-in and would need to do encryption at the source yourself if needed. 
Also, you would have to do regular "full" backups and think about how you can keep incrementals - rsnapshot (which nowadays seems rather dead) could do something similar via rsync with hardlinks,
but that meant you had tons of files, which no FS really likes, and you would always have to store a full file even if only a single byte changed. 

Cheers and hope that helps,
	Oliver


* Re: btrfs based backup?
  2019-11-12 18:34 btrfs based backup? Ulli Horlacher
  2019-11-12 18:58 ` joshua
  2019-11-12 19:09 ` Oliver Freyermuth
@ 2019-11-12 19:14 ` Remi Gauvin
  2019-11-12 20:05 ` Oliver Freyermuth
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 31+ messages in thread
From: Remi Gauvin @ 2019-11-12 19:14 UTC (permalink / raw)
  To: linux-btrfs



On 2019-11-12 1:34 p.m., Ulli Horlacher wrote:

> Set up a backup server with btrfs storage (with compress mount option),
> the clients do their backup with rsync over nfs.
> 
> For versioning I make btrfs snapshots.
> 

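My KISS script to do exactly this looks like this:
(Permissions on the backup are handled by default ACLs on the receiving
end.)

# Mirror /nas to the backup server; --inplace rewrites changed blocks in
# place, so unchanged blocks stay shared with older snapshots.
/usr/bin/rsync \
	-a --no-o --no-g --no-p --chmod=ugo=rwX --info=STATS \
	--inplace --exclude='.snapshots' --delete \
	/nas/  \
	backup.server:/backups/server/nas/ \
    || exit 1

# Defragment the received data on the backup server.
/usr/bin/ssh backup.server \
  "/bin/btrfs filesystem defragment -r -t 32M /backups/server"

# Take a snapshot of the result with snapper ("number" cleanup algorithm).
/usr/bin/ssh backup.server \
  "snapper -c server create -c number" \
  || exit 1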



* Re: btrfs based backup?
  2019-11-12 18:34 btrfs based backup? Ulli Horlacher
                   ` (2 preceding siblings ...)
  2019-11-12 19:14 ` Remi Gauvin
@ 2019-11-12 20:05 ` Oliver Freyermuth
  2019-11-20 16:36   ` freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?) Christian Pernegger
  2019-11-12 20:48 ` btrfs based backup? Michael
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 31+ messages in thread
From: Oliver Freyermuth @ 2019-11-12 20:05 UTC (permalink / raw)
  To: linux-btrfs


* Re: btrfs based backup?
  2019-11-12 18:34 btrfs based backup? Ulli Horlacher
                   ` (3 preceding siblings ...)
  2019-11-12 20:05 ` Oliver Freyermuth
@ 2019-11-12 20:48 ` Michael
  2019-11-13 15:04 ` Austin S. Hemmelgarn
  2019-11-18 12:56 ` Ulli Horlacher
  6 siblings, 0 replies; 31+ messages in thread
From: Michael @ 2019-11-12 20:48 UTC (permalink / raw)
  To: linux-btrfs

On 12.11.2019 20:34, Ulli Horlacher wrote:
> I need a new backup system for some servers. Destination is a RAID, not
> tapes.
>
> So far I have used a self written shell script. 25 years old, over 1000
> lines of (HORRIBLE) code, no longer maintainable :-}
>
> All backup software I know is either too primitive (e.g. no versioning) or
> very complex and needs a long time to master it.
>
> My new idea is:
>
> Set up a backup server with btrfs storage (with compress mount option),
> the clients do their backup with rsync over nfs.
>
> For versioning I make btrfs snapshots.
>
>
> To have a secondary backup I will use btrfs send / receive,

See my message with the subject "Read-only snapshot send speed very slow
after modify original data. Need help":
https://www.spinics.net/lists/linux-btrfs/msg94128.html

Sending a read-only snapshot becomes very, very slow after the original
rw subvolume has been modified, if compression is in use. In some cases
it takes ~8-15 hours per snapshot.

CPU load is at 100%, and only a 5-100 MB diff gets sent.
There is no reaction.

>
> Any comments on this? Or better suggestions?
>
> The backup software must be open source!
>

-- 
Best regards, Michael
067-786-11-75


* Re: btrfs based backup?
  2019-11-12 18:34 btrfs based backup? Ulli Horlacher
                   ` (4 preceding siblings ...)
  2019-11-12 20:48 ` btrfs based backup? Michael
@ 2019-11-13 15:04 ` Austin S. Hemmelgarn
  2019-11-18 12:56 ` Ulli Horlacher
  6 siblings, 0 replies; 31+ messages in thread
From: Austin S. Hemmelgarn @ 2019-11-13 15:04 UTC (permalink / raw)
  To: linux-btrfs

On 2019-11-12 13:34, Ulli Horlacher wrote:
> 
> I need a new backup system for some servers. Destination is a RAID, not
> tapes.
> 
> So far I have used a self written shell script. 25 years old, over 1000
> lines of (HORRIBLE) code, no longer maintainable :-}
> 
> All backup software I know is either too primitive (e.g. no versioning) or
> very complex and needs a long time to master it.
> 
> My new idea is:
> 
> Set up a backup server with btrfs storage (with compress mount option),
> the clients do their backup with rsync over nfs.
> 
> For versioning I make btrfs snapshots.
> 
> 
> To have a secondary backup I will use btrfs send / receive,
> 
> 
> Any comments on this? Or better suggestions?
> 
> The backup software must be open source!
> 

Borg [1] backup on the clients. That will get you:

* Automatic 'versioning' without needing snapshots.
* Automatic compression and deduplication of the backups (without 
needing BTRFS to do either).
* Automatic encryption (if you want it).
* The ability to mount your backups like a filesystem (through FUSE).
* All in a layout that's reasonably friendly to copy between systems 
with tools like rsync or rclone.

Borg's big thing is that it does reference-counted deduplication of the 
individual blocks of the backup, so incrementals take up next to no 
space or time but still give you a full view of the backed up filesystem 
with each snapshot. It also has support for accessing a backup server 
over SSH, which is a bit more efficient than using something like NFS.

For the copy to the secondary backup, you could just use rsync to mirror 
the backups done with Borg (or alternatively, use rclone to mirror them 
offsite to cloud storage).

[1] https://borgbackup.readthedocs.io/en/stable/
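
As a rough illustration of the Borg workflow described above, here is a minimal sketch assuming Borg 1.x on the client and SSH access to the backup server. The repository URL, the backed-up paths, the retention numbers and the `DRY_RUN` helper are all made up for this example; with `DRY_RUN=1` (the default here) the commands are only printed.

```shell
#!/bin/bash
# Sketch only: a typical Borg 1.x cycle (init once, then create + prune).
# Repository URL, paths and retention values are hypothetical examples.
DRY_RUN=${DRY_RUN:-1}
REPO=${REPO:-ssh://backup@backup.example/srv/borg/client1}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# One-time setup: create an encrypted, deduplicating repository.
borg_setup() {
  run borg init --encryption=repokey "$REPO"
}

# Regular run: add a new archive, then thin out old ones.
borg_backup() {
  local archive=$1
  run borg create --stats --compression lz4 --one-file-system \
    "$REPO::$archive" /etc /home
  run borg prune --keep-hourly 24 --keep-daily 7 \
    --keep-weekly 4 --keep-monthly 6 "$REPO"
}

borg_backup "${HOSTNAME:-client}-$(date +%Y-%m-%d)"
```

Every archive created this way shows up as a full view of the backed-up paths, even though only new chunks are stored. The repository directory on the server can then be mirrored to the secondary backup with rsync or rclone.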


* Re: btrfs based backup?
  2019-11-12 18:34 btrfs based backup? Ulli Horlacher
                   ` (5 preceding siblings ...)
  2019-11-13 15:04 ` Austin S. Hemmelgarn
@ 2019-11-18 12:56 ` Ulli Horlacher
  6 siblings, 0 replies; 31+ messages in thread
From: Ulli Horlacher @ 2019-11-18 12:56 UTC (permalink / raw)
  To: linux-btrfs

On Tue 2019-11-12 (19:34), Ulli Horlacher wrote:

> I need a new backup system for some servers. Destination is a RAID, not
> tapes.
> 
> So far I have used a self written shell script. 25 years old, over 1000
> lines of (HORRIBLE) code, no longer maintainable :-}

Thanks for all your suggestions, but I found myself a really easy solution
just with btrfs+rsync.

On the clients I use:

root@tandem:~# grep backup /etc/fstab 
mutter:/backup/rsync/tandem     /backup         nfs ro,tcp,soft,retrans=1 0 0

root@tandem:~# cat bin/rsync_backup 
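#!/bin/bash

# Patterns excluded from the backup. An array keeps each pattern a
# single word and avoids accidental glob expansion by the shell.
exclude=(
  --exclude=.snapshot
  --exclude=.del
  --exclude='*.iso'
  --exclude='tmp/*'
  --exclude=backup
)

# When not run from a terminal (e.g. from cron), append output to a log.
if [ ! -t 0 ]; then
  exec >>/var/log/backup.log
  chmod 600 /var/log/backup.log
  echo
  date +"%Y-%m-%d %H:%M:%S"
fi

# /backup is mounted read-only per fstab; switch to rw only for the run.
mount /backup
mount -o remount,rw /backup || exit 1
rsync -vaxH --delete "${exclude[@]}" / /export /backup/
# Marker file: tells the backup server that a snapshot can be taken.
touch /backup/.ready
mount -o remount,ro /backup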


On the backup server I use:

root@mutter:/backup/rsync# grep backup /etc/crontab 
0  *    * * *   root    /backup/rsync/snapshot >/dev/null

root@mutter:/backup/rsync# cat snapshot 
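#!/bin/bash

PATH="$PATH:/opt/btrfs-tools/bin"

# A client leaves a .ready marker after each completed rsync run;
# snapshot only those directories, then clear the marker.
for i in /backup/rsync/*/.ready; do
  if [ -f "$i" ]; then
    rm "$i"
    snaprotate rsync 10 "$(dirname "$i")"
  fi
done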



That's all! Works like a charm :-)
And it replaces an unmaintainable 1000+ line shell script.
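
For readers without such a helper: the sketch below shows what a snapshot rotation step might look like in principle. It is NOT the snaprotate tool used above, whose internals are not shown here. The function name, the `.snapshot` directory layout and the timestamp format are assumptions for illustration, and the function only prints the btrfs commands instead of running them.

```shell
#!/bin/bash
# Sketch only: plan a snapshot rotation. Not the snaprotate tool from the
# message above; layout and naming are hypothetical.
#
# rotate_plan DIR CLASS KEEP [STAMP]
# Prints the commands to create DIR/.snapshot/CLASS.STAMP read-only and
# to delete the oldest CLASS snapshots so that KEEP remain in total.
rotate_plan() {
  local dir=$1 class=$2 keep=$3 stamp=${4:-$(date +%Y%m%dT%H%M)}
  local snap total=0 excess
  echo "btrfs subvolume snapshot -r $dir $dir/.snapshot/$class.$stamp"
  # Count the existing snapshots of this class.
  for snap in "$dir/.snapshot/$class".*; do
    [ -e "$snap" ] && total=$((total + 1))
  done
  # Keep the KEEP-1 newest existing snapshots (plus the new one).
  excess=$((total - (keep - 1)))
  # Globs expand in sorted order, so the oldest timestamps come first.
  for snap in "$dir/.snapshot/$class".*; do
    [ -e "$snap" ] || continue
    [ "$excess" -gt 0 ] || break
    echo "btrfs subvolume delete $snap"
    excess=$((excess - 1))
  done
}

rotate_plan /backup/rsync/tandem rsync 10
```

Printing the plan first makes it easy to review what would be deleted before piping the output to a shell.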



-- 
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK         
Universitaet Stuttgart         E-Mail: horlacher@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
REF:<20191112183425.GA1257@tik.uni-stuttgart.de>


* freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?)
  2019-11-12 20:05 ` Oliver Freyermuth
@ 2019-11-20 16:36   ` Christian Pernegger
  2019-11-20 17:59     ` Oliver Freyermuth
                       ` (3 more replies)
  0 siblings, 4 replies; 31+ messages in thread
From: Christian Pernegger @ 2019-11-20 16:36 UTC (permalink / raw)
  To: linux-btrfs

Hello,

I've decided to go with a snapshot-based backup solution for our new
Linux desktops -- thank you for the timely thread --, namely btrbk.
A couple of subvolumes for different stuff, with hourly snapshots that
regularly go to another machine. Brilliant in theory, less so in
practice, because every time btrbk runs, the box'll freeze for a few
seconds, as in, Firefox and LibreOffice, for instance, become entirely
unresponsive, games hang and so on. (AFAICT, all it does is snapshot
each subvolume and delete ones that are out of the retention period.)

I'm aware that having many snapshots can impact performance of some
operations, but I didn't think that "many" <= 200, "impact" = stop
dead and "some operations" = light desktop use. These are decently
specced, after all (Zen 2 8/12 core, 32 GB RAM, Samsung 970 Evo Plus).
What I'm asking is, is this to be expected, does it just need tuning,
is the hardware buggy, the kernel version (Ubuntu 18.04.3 HWE, their
5.0 series) a stinker, something else awry ...?

Cheers,
C.


* Re: freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?)
  2019-11-20 16:36   ` freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?) Christian Pernegger
@ 2019-11-20 17:59     ` Oliver Freyermuth
  2019-11-20 18:32     ` Chris Murphy
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 31+ messages in thread
From: Oliver Freyermuth @ 2019-11-20 17:59 UTC (permalink / raw)
  To: Christian Pernegger, linux-btrfs

Hi,

I'm using a ~4 year old laptop, 4 cores (+4 HT), 32 GB RAM,
Crucial mSATA SSD, and I notice neither the snapshotting, nor the deletion of snapshots, nor the transferring at all
(I have been doing this for years now). 

I'm running kernel 5.3 now, but have also been on 5.0 some time ago (but I'm on Gentoo, not Ubuntu). So I'd say this is not normal. 

The first thing you'd need to check is when exactly it happens - btrbk logs the steps it is doing. Does it happen during the snapshotting, transferring,
or deletion of snapshots? Anything in the kernel log? 
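
One way to narrow this down by hand is to time the individual steps. The following is a minimal sketch, assuming GNU date (for `%N`); the `timed` helper and the snapshot paths in the comments are made up for this example.

```shell
#!/bin/bash
# Sketch only: run a command, then log its exit code and wall-clock
# duration to stderr. Assumes GNU date, which supports %N (nanoseconds).
timed() {
  local start end rc
  start=$(date +%s%N)
  "$@"
  rc=$?
  end=$(date +%s%N)
  echo "$(date +%H:%M:%S) rc=$rc $(( (end - start) / 1000000 )) ms: $*" >&2
  return "$rc"
}

# Harmless demonstration. On a real system one might instead wrap, e.g.:
#   timed btrfs subvolume snapshot -r /home /snapshots/home.test
#   timed btrfs subvolume delete /snapshots/home.test
#   timed btrfs subvolume sync /snapshots
# (hypothetical paths). Deleted subvolumes are cleaned up in the
# background, so the delete itself can return quickly; "subvolume sync"
# waits for that cleanup to finish.
timed sleep 1
```

If the freeze lines up with one of the timed steps, or only with the background cleanup after a delete, that already says a lot about where to look next.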

Did you run a deduplication tool on the BTRFS volumes, or use quotas? These are the only things which come to my mind which can cause high CPU load here
(but in any case, nothing should "block"). 

Cheers,
	Oliver



* Re: freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?)
  2019-11-20 16:36   ` freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?) Christian Pernegger
  2019-11-20 17:59     ` Oliver Freyermuth
@ 2019-11-20 18:32     ` Chris Murphy
  2019-11-21  1:51     ` Qu Wenruo
  2019-11-21 22:22     ` Zygo Blaxell
  3 siblings, 0 replies; 31+ messages in thread
From: Chris Murphy @ 2019-11-20 18:32 UTC (permalink / raw)
  To: Christian Pernegger; +Cc: linux-btrfs

On Wed, Nov 20, 2019 at 9:36 AM Christian Pernegger <pernegger@gmail.com> wrote:
>
> Hello,
>
> I've decided to go with a snapshot-based backup solution for our new
> Linux desktops -- thank you for the timely thread --, namely btrbk.
> A couple of subvolumes for different stuff, with hourly snapshots that
> regularly go to another machine. Brilliant in theory, less so in
> practice, because every time btrbk runs, the box'll freeze for a few
> seconds, as in, Firefox and LibreOffice, for instance, become entirely
> unresponsive, games hang and so on. (AFAICT, all it does is snapshot
> each subvolume and delete ones that are out of the retention period.)
>
> I'm aware that having many snapshots can impact performance of some
> operations, but I didn't think that "many" <= 200, "impact" = stop
> dead and "some operations" = light desktop use. These are decently
> specced, after all (Zen 2 8/12 core, 32 GB RAM, Samsung 970 Evo Plus).
> What I'm asking is, is this to be expected, does it just need tuning,
> is the hardware buggy, the kernel version (Ubuntu 18.04.3 HWE, their
> 5.0 series) a stinker, something else awry ...?


What are the mount options? And what's the workload immediately prior to
the snapshot? Or does it always happen no matter the workload?

I use Btrfs on a variety of hardware and storage devices, USB flash,
NVMe, hard drives, and a Samsung 940 EVO, and I can't say I experience
anything like a freeze or hang. If I'm doing something like updates
(dnf updates, RPM) and take a snapshot while the update is happening
(a bit kooky, because that snapshot represents an in-between state of the
update, essentially useless except as intentionally poking things
with a stick just to see what happens), I do see a user space "hang", as
a flush is required as part of the snapshot, and I can see this flush
in top. But so far I have only seen it affect the snapshot command itself
(it's a delay rather than a hang). I don't see it affect GUI
responsiveness.

-- 
Chris Murphy


* Re: freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?)
  2019-11-20 16:36   ` freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?) Christian Pernegger
  2019-11-20 17:59     ` Oliver Freyermuth
  2019-11-20 18:32     ` Chris Murphy
@ 2019-11-21  1:51     ` Qu Wenruo
  2019-11-21 16:44       ` Christian Pernegger
  2019-11-21 22:22     ` Zygo Blaxell
  3 siblings, 1 reply; 31+ messages in thread
From: Qu Wenruo @ 2019-11-21  1:51 UTC (permalink / raw)
  To: Christian Pernegger, linux-btrfs





On 2019/11/21 12:36 AM, Christian Pernegger wrote:
> Hello,
> 
> I've decided to go with a snapshot-based backup solution for our new
> Linux desktops -- thank you for the timely thread --, namely btrbk.
> A couple of subvolumes for different stuff, with hourly snapshots that
> regularly go to another machine. Brilliant in theory, less so in
> practice, because every time btrbk runs, the box'll freeze for a few
> seconds, as in, Firefox and LibreOffice, for instance, become entirely
> unresponsive, games hang and so on. (AFAICT, all it does is snapshot
> each subvolume and delete ones that are out of the retention period.)
> 
> I'm aware that having many snapshots can impact performance of some
> operations, but I didn't think that "many" <= 200, "impact" = stop
> dead and "some operations" = light desktop use. These are decently
> specced, after all (Zen 2 8/12 core, 32 GB RAM, Samsung 970 Evo Plus).
> What I'm asking is, is this to be expected, does it just need tuning,
> is the hardware buggy, the kernel version (Ubuntu 18.04.3 HWE, their
> 5.0 series) a stinker, something else awry ...?

Are you using qgroup?

With qgroup, snapshot deleting is still a problem though.
(But not for snapshot creation, that shouldn't cause any slow down,
unless you're using multi-level qgroups)
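
For anyone wanting to check this, here is a minimal sketch. It assumes that `btrfs qgroup show` exits with an error when quotas are not enabled on the filesystem, which is my understanding of current btrfs-progs; the function name and the default mount point are made up for this example.

```shell
#!/bin/bash
# Sketch only: report whether btrfs quotas (qgroups) appear to be enabled.
# Assumption: "btrfs qgroup show" fails when quotas are not enabled.
quota_status() {
  local mnt=$1
  if btrfs qgroup show "$mnt" >/dev/null 2>&1; then
    echo "$mnt: quotas enabled"
  else
    # The command also fails for non-btrfs paths or without privileges.
    echo "$mnt: quotas not enabled (or not btrfs / insufficient privileges)"
  fi
}

quota_status "${1:-/}"
```

This needs to run as root on the affected mount point. If quotas turn out to be enabled and are not needed, `btrfs quota disable <mountpoint>` turns them off again.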

Thanks,
Qu

> 
> Cheers,
> C.
> 



* Re: freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?)
  2019-11-21  1:51     ` Qu Wenruo
@ 2019-11-21 16:44       ` Christian Pernegger
  2019-11-21 19:37         ` Oliver Freyermuth
  0 siblings, 1 reply; 31+ messages in thread
From: Christian Pernegger @ 2019-11-21 16:44 UTC (permalink / raw)
  To: linux-btrfs

On Wed, 20 Nov 2019 at 18:59, Oliver Freyermuth
<o.freyermuth@googlemail.com> wrote:
> So I'd say this is not normal.

Good to hear, that means it might be fixable. The alternative would be
to switch to Borg or restic, and I just don't feel comfortable with
deduplication relying solely on hashes, I'm a Luddite like that.

> The first thing you'd need to check is when exactly it happens

Currently 17 minutes past the hour, which is when my cron.hourly runs,
and that only runs btrbk. I can't say for certain if it happens every
hour, but I'm reasonably confident.

> btrbk logs the steps it is doing. Does it happen during the snapshotting, transferring, or deletion of snapshots?

It's just configured to snapshot & prune, no transfer. A central
backup server (grand name, for a white-box NAS) pulls the snapshots
each night and does its own pruning. I'm not sure how to tell when
exactly it happens, as I have not much agency while it is happening.

> Anything in the kernel log?

Nothing suspicious in btrbk.log, dmesg or the systemd journal. The
affected things just stop reacting, then continue as if nothing had
happened.

> Did you run a deduplication tool on the BTRFS volumes, or use quotas?

No to deduplication, maybe to quotas. It's possible that Timeshift
enables them, how can I check?

Just had another episode:
2019-11-21T17:17:01+0100 startup v0.26.0 - - - # btrbk command line client, version 0.26.0
2019-11-21T17:17:01+0100 snapshot starting /mnt/timeshift/backup/btrbk-snapshots/@.20191121T171701+0100 /mnt/timeshift/backup/@ - -
2019-11-21T17:17:01+0100 snapshot success /mnt/timeshift/backup/btrbk-snapshots/@.20191121T171701+0100 /mnt/timeshift/backup/@ - -
2019-11-21T17:17:01+0100 snapshot starting /mnt/timeshift/backup/btrbk-snapshots/@home.20191121T171701+0100 /mnt/timeshift/backup/@home - -
2019-11-21T17:17:01+0100 snapshot success /mnt/timeshift/backup/btrbk-snapshots/@home.20191121T171701+0100 /mnt/timeshift/backup/@home - -
2019-11-21T17:17:01+0100 delete_snapshot starting /mnt/timeshift/backup/btrbk-snapshots/@.20191119T161701+0100 - - -
2019-11-21T17:17:01+0100 delete_snapshot success /mnt/timeshift/backup/btrbk-snapshots/@.20191119T161701+0100 - - -
2019-11-21T17:17:01+0100 delete_snapshot starting /mnt/timeshift/backup/btrbk-snapshots/@home.20191119T161701+0100 - - -
2019-11-21T17:17:01+0100 delete_snapshot success /mnt/timeshift/backup/btrbk-snapshots/@home.20191119T161701+0100 - - -
2019-11-21T17:17:01+0100 delete_snapshot starting /mnt/timeshift/backup/btrbk-snapshots/@home-chris-.steam.20191119T161701+0100 - - -
2019-11-21T17:17:01+0100 delete_snapshot success /mnt/timeshift/backup/btrbk-snapshots/@home-chris-.steam.20191119T161701+0100 - - -
2019-11-21T17:17:01+0100 finished success - - - -

I had a tail on the log, these came out in one go, no larger pauses.
At first I thought, just my luck, here I am lying in wait and of
course everything works, then the mini-freeze happened. CPU usage in
one core spiked during the freeze, but I couldn't switch tabs from the
graphs to the process list in gnome-system-monitor. Top it is, next
time.

On Wed, 20 Nov 2019 at 19:32, Chris Murphy
<lists@colorremedies.com> wrote:
> What are the mount options?

defaults, which comes out as
rw,relatime,ssd,space_cache,subvolid=,subvol=, according to mount.

> And what's the workload immediate prior to the snapshot? Or does it always happen no matter the workload?

Can't guarantee "always", but ... This time I was in the process of
composing this e-Mail. A couple of things open, sure, Firefox, couple
of terminals, Signal, evince, deadbeat [stopped], but not doing
anything much. I'd call the workload "idle", especially fs-wise. Last
time I was typing at a bash prompt via gnome-terminal -- the input
wouldn't show or register until it was over. It's not only
i/o-intensive stuff that blocks.

On Thu, 21 Nov 2019 at 02:51, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> Are you using qgroup?

Not knowingly. If either Timeshift or btrbk enable them, it's possible.

Cheers,
C.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?)
  2019-11-21 16:44       ` Christian Pernegger
@ 2019-11-21 19:37         ` Oliver Freyermuth
  2019-11-21 20:30           ` Christian Pernegger
  0 siblings, 1 reply; 31+ messages in thread
From: Oliver Freyermuth @ 2019-11-21 19:37 UTC (permalink / raw)
  To: Christian Pernegger, linux-btrfs

On 21.11.19 at 17:44, Christian Pernegger wrote:
> No to deduplication, maybe to quotas. It's possible that Timeshift
> enables them, how can I check?

You can test with:
 $ btrfs qgroup show /
 ERROR: can't list qgroups: quotas not enabled
but none of the tools you are using should activate qgroups I think
(at least btrbk does not). 
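
A minimal sketch of how that check could be wrapped for use in a script, assuming `btrfs qgroup show` exits non-zero when quotas are not enabled (as the error above suggests). The function name `quotas_enabled` and the `BTRFS` override variable are made up for illustration:

```shell
#!/bin/sh
# BTRFS can be overridden to point at another binary; it defaults to "btrfs".
BTRFS=${BTRFS:-btrfs}

# Hypothetical helper: succeed if quotas (qgroups) are enabled on the given path.
# Assumption: "btrfs qgroup show" fails when quotas are not enabled. Note that it
# also fails for other reasons (not a btrfs path, missing privileges), so a
# failure here only means "could not list qgroups".
quotas_enabled() {
    "$BTRFS" qgroup show "$1" >/dev/null 2>&1
}

# Usage (needs root on a real system):
#   if quotas_enabled /; then echo "qgroups are on"; else echo "qgroups are off"; fi
```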

> Just had another episode:
> 2019-11-21T17:17:01+0100 startup v0.26.0 - - - # btrbk command line
> client, version 0.26.0
> 2019-11-21T17:17:01+0100 snapshot starting
> /mnt/timeshift/backup/btrbk-snapshots/@.20191121T171701+0100
> /mnt/timeshift/backup/@ - -
> 2019-11-21T17:17:01+0100 snapshot success
> /mnt/timeshift/backup/btrbk-snapshots/@.20191121T171701+0100
> /mnt/timeshift/backup/@ - -
> 2019-11-21T17:17:01+0100 snapshot starting
> /mnt/timeshift/backup/btrbk-snapshots/@home.20191121T171701+0100
> /mnt/timeshift/backup/@home - -
> 2019-11-21T17:17:01+0100 snapshot success
> /mnt/timeshift/backup/btrbk-snapshots/@home.20191121T171701+0100
> /mnt/timeshift/backup/@home - -
> 2019-11-21T17:17:01+0100 delete_snapshot starting
> /mnt/timeshift/backup/btrbk-snapshots/@.20191119T161701+0100 - - -
> 2019-11-21T17:17:01+0100 delete_snapshot success
> /mnt/timeshift/backup/btrbk-snapshots/@.20191119T161701+0100 - - -
> 2019-11-21T17:17:01+0100 delete_snapshot starting
> /mnt/timeshift/backup/btrbk-snapshots/@home.20191119T161701+0100 - - -
> 2019-11-21T17:17:01+0100 delete_snapshot success
> /mnt/timeshift/backup/btrbk-snapshots/@home.20191119T161701+0100 - - -
> 2019-11-21T17:17:01+0100 delete_snapshot starting
> /mnt/timeshift/backup/btrbk-snapshots/@home-chris-.steam.20191119T161701+0100
> - - -
> 2019-11-21T17:17:01+0100 delete_snapshot success
> /mnt/timeshift/backup/btrbk-snapshots/@home-chris-.steam.20191119T161701+0100
> - - -
> 2019-11-21T17:17:01+0100 finished success - - - -
> 
> I had a tail on the log, these came out in one go, no larger pauses.
> At first I thought, just my luck, here I am lying in wait and of
> course everything works, then the mini-freeze happened. CPU usage in
> one core spiked during the freeze, but I couldn't switch tabs from the
> graphs to the process list in gnome-system-monitor. Top it is, next
> time.

This is an interesting observation. I believe this means it is happening when the snapshot deletes are actually going to the storage,
which usually happens only _after_ btrbk is finished (in case you catch it with top, a kernel thread "btrfs-cleaner" should be doing this job). 
Another interesting test could be to adjust btrbk configuration to:
btrfs_commit_delete = each
which will ensure the delete_snapshot operations are flushed to disk one by one, so the freeze should then correlate to the log
(and might be converted from one longer freeze to multiple, contiguous smaller freezes). 

Sadly, I have no idea on why this would freeze for you (well, it's the only actual I/O-heavy part when you don't do the transfers at this point in time). 
But maybe Qu will have a good idea. 

Cheers,
	Oliver

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?)
  2019-11-21 19:37         ` Oliver Freyermuth
@ 2019-11-21 20:30           ` Christian Pernegger
  2019-11-21 21:34             ` Christian Pernegger
  2019-11-21 23:57             ` Oliver Freyermuth
  0 siblings, 2 replies; 31+ messages in thread
From: Christian Pernegger @ 2019-11-21 20:30 UTC (permalink / raw)
  To: linux-btrfs

> On 21.11.19 at 17:44, Christian Pernegger wrote:
> > maybe to quotas. It's possible that Timeshift enables them, how can I check?
>
> You can test with:
>  $ btrfs qgroup show /

Definitely enabled, then. ... ... ... There it is: Timeshift has a
pre-selected checkbox "enable BTRFS qgroups (recommended)" [translated
from German].

1) How can I safely disable qgroups? Is it enough to uncheck the
Timeshift option and then run btrfs quota disable or do I have to
manually remove the qgroups somehow?

2) I'm wondering if this couldn't be improved. Considering qgroups are
only used (in this case) for reporting on allocated space, not
limiting it, and btrfs free space reporting is notoriously lazy [not
meant in a bad way, can't think of a better word right now] anyway,
why does anything need to block at all? Even if I were using quotas, I
might prefer fuzzy quotas [that can be hit too early/late because
accounting is catching up] to a temporary standstill, as an option.

> This is an interesting observation. I believe this means it is happening when the snapshot deletes are actually going to the storage,
> which usually happens only _after_ btrbk is finished (in case you catch it with top, a kernel thread "btrfs-cleaner" should be doing this job).

Ok, so btrbk runs, finishes, soon (but not immediately) after that
btrfs-cleaner indeed tops the CPU charts, pegging one core to 100 %.
The system is still responsive at this point. A couple of seconds into
the btrfs-cleaner run, the system becomes unresponsive (top still
updates throughout, though). btrfs-cleaner drops off, and
btrfs-transacti[obv. cut off] takes its place, taking 100 % CPU.
Still unresponsive. As soon as btrfs-transacti is done, the system
immediately recovers. Then btrfs-cleaner returns, briefly, with no
impact on performance. (Keep in mind that top only updates every
couple of seconds, so it's possible btrfs-cleaner is blameless and
btrfs-transacti the culprit.)
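
To get a finer-grained record than watching interactive top, something like the following sketch could be left running around the cron slot. It assumes procps top in batch mode; the helper name `filter_btrfs_threads` is hypothetical, and the pattern matches the truncated thread name as top shows it:

```shell
#!/bin/sh
# Hypothetical helper: keep only the lines for the btrfs kernel threads
# from batch-mode top output read on stdin.
filter_btrfs_threads() {
    grep -E 'btrfs-(cleaner|transacti)'
}

# Usage: sample once per second for two minutes, e.g. starting at :16
#   top -b -d 1 -n 120 | filter_btrfs_threads
```

With a one-second interval it should be easier to tell whether the freeze starts while btrfs-cleaner or btrfs-transacti is on top.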

> Another interesting test could be to adjust btrbk configuration to:
> btrfs_commit_delete = each

Will do.

Cheers,
C.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?)
  2019-11-21 20:30           ` Christian Pernegger
@ 2019-11-21 21:34             ` Christian Pernegger
  2019-11-21 22:39               ` Marc Joliet
  2019-11-21 23:57             ` Oliver Freyermuth
  1 sibling, 1 reply; 31+ messages in thread
From: Christian Pernegger @ 2019-11-21 21:34 UTC (permalink / raw)
  To: linux-btrfs

> > Another interesting test could be to adjust btrbk configuration to:
> > btrfs_commit_delete = each
>
> Will do.

Hm. No freeze, this time (with btrbk set to commit after each delete).

In other news,
- I seem to be leaking qgroups. There are currently 191 subvolumes
(most of which are ro snapshots), but 547 "0/*" qgroups. Should
deleting a subvolume take care of removing its (auto-created) qgroup,
or does that always have to be done manually (or by setting the
experimental *_qgroup_destroy options in btrbk.conf)? Any elegant ways
to remove orphaned qgroups?
- Timeshift, at :00, triggers this as well, it's just less severe
(maybe because that's 1 subvolume instead of 3).
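
For the orphan hunt, a rough sketch of how the two listings could be compared. It assumes the usual output formats of `btrfs subvolume list` ("ID <n> gen ..." lines) and `btrfs qgroup show` (qgroup id in the first column); the helper name `orphan_qgroups` and the file arguments are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print level-0 qgroups that have no matching subvolume.
#   $1: file holding the output of "btrfs subvolume list <mountpoint>"
#   $2: file holding the output of "btrfs qgroup show <mountpoint>"
# 0/5 is skipped because it belongs to the top-level subvolume, which
# "subvolume list" does not print.
orphan_qgroups() {
    awk '
        # First file: remember every subvolume ID.
        NR == FNR { if ($1 == "ID") ids[$2] = 1; next }
        # Second file: report 0/<n> qgroups whose <n> is not a known ID.
        $1 ~ /^0\/[0-9]+$/ && $1 != "0/5" {
            split($1, part, "/")
            if (!(part[2] in ids)) print $1
        }
    ' "$1" "$2"
}
```

Each id it prints could then be checked by hand and, if it really is stale, removed with `btrfs qgroup destroy <qgroupid> <mountpoint>`.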

Cheers,
C.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?)
  2019-11-20 16:36   ` freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?) Christian Pernegger
                       ` (2 preceding siblings ...)
  2019-11-21  1:51     ` Qu Wenruo
@ 2019-11-21 22:22     ` Zygo Blaxell
  2019-11-22  4:59       ` Zygo Blaxell
  2019-11-22 14:36       ` Christian Pernegger
  3 siblings, 2 replies; 31+ messages in thread
From: Zygo Blaxell @ 2019-11-21 22:22 UTC (permalink / raw)
  To: Christian Pernegger; +Cc: linux-btrfs

On Wed, Nov 20, 2019 at 05:36:04PM +0100, Christian Pernegger wrote:
> Hello,
> 
> I've decided to go with a snapshot-based backup solution for our new
> Linux desktops -- thank you for the timely thread --, namely btrbk.
> A couple of subvolumes for different stuff, with hourly snapshots that
> regularly go to another machine. Brilliant in theory, less so in
> practice, because every time btrbk runs, the box'll freeze for a few
> seconds, as in, Firefox and LibreOffice, for instance, become entirely
> unresponsive, games hang and so on. (AFAICT, all it does is snapshot
> each subvolume and delete ones that are out of the retention period.)

Snapshot delete is pretty aggressive with IO and can force a lot of
commits if you are modifying a lot of metadata pages between snapshots.
Generally I get a coffee when my 1TB NVME systems decide it's time to
drop a snapshot, as the system can effectively hang for a few minutes
while btrfs-cleaner runs.  On performance-critical systems we only ever
have one snapshot active on the filesystem at a time, and we only create
it once a day for backups.  I'd love a way to throttle btrfs-cleaner so
it's not so aggressive with IO and CPU.

Snapshot create has unbounded running time on 5.0 kernels.  The creation
process has to flush dirty buffers to the filesystem to get a clean
snapshot state.  Any process that is writing data while the flush is
running gets its data included in the snapshot flush, so in the worst
possible case, the snapshot flush never ends (unless you run out of disk
space, or whatever was writing new data stops, whichever comes first).

Anything that needs to take a sb_writer lock (which is almost everything
that modifies the filesystem) will hang until the snapshot create is done;
however, processes that are reading the filesystem will not be obstructed.
This can lead to starvation of the writing processes.  cgroups and ionice
won't help here--the block layer doesn't detect waits for sb_writers
(there is no associated block device for those, so they're invisible to
the block layer), so it doesn't know that writer processes are waiting
for IO, and all the writers' IO bandwidth gets reallocated to the reader
processes, making for long-lasting priority inversions.  The IO pressure
stall subsystem reads _zero_ IO pressure even though writing processes
are continuously blocked for hours.
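
To see what the pressure stall interface reports during such a stall, a small sketch, assuming a kernel with PSI enabled so that /proc/pressure/io exists. The helper name `psi_some_avg10` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: extract the avg10 value of the "some" line from PSI
# data read on stdin, in the format used by /proc/pressure/io:
#   some avg10=0.00 avg60=0.00 avg300=0.00 total=0
psi_some_avg10() {
    awk '$1 == "some" { sub(/^avg10=/, "", $2); print $2 }'
}

# Usage, e.g. once per second while a snapshot create is running:
#   while sleep 1; do psi_some_avg10 < /proc/pressure/io; done
```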

On small systems, this is all over in a second or less.  On bigger
fileservers, I've had single snapshot creates run for many hours.  As a
workaround, I have some scripts that freeze processes that write to the
disk while 'btrfs sub snap' runs, to force the snapshot create to finish
in a timely manner.  I think I saw some patches going into later 5.x
kernels that solve the problem in the kernel, too (writes that occur after
the snapshot creation starts are not included in the snapshot any more).
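
The idea behind that workaround can be sketched roughly as follows. This is not the actual script, only an illustration that assumes the writers can be paused safely with SIGSTOP; the helper name `with_frozen`, the paths and the PID lookup are all placeholders:

```shell
#!/bin/sh
# Hypothetical helper: stop the given PIDs, run a command, then resume them.
#   $1: space-separated list of PIDs to freeze
#   remaining arguments: the command to run while they are stopped
with_frozen() {
    pids=$1
    shift
    for pid in $pids; do kill -STOP "$pid"; done
    "$@"
    status=$?
    # Always resume the writers, even if the command failed.
    for pid in $pids; do kill -CONT "$pid"; done
    return "$status"
}

# Usage (placeholders throughout):
#   with_frozen "$(pgrep -d ' ' rsync)" \
#       btrfs subvolume snapshot -r /srv/data /srv/snapshots/data.new
```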

> I'm aware that having many snapshots can impact performance of some
> operations, but I didn't think that "many" <= 200, "impact" = stop
> dead and "some operations" = light desktop use. These are decently
> specced, after all (Zen 2 8/12 core, 32 GB RAM, Samsung 970 Evo Plus).
> What I'm asking is, is this to be expected, does it just need tuning,
> is the hardware buggy, the kernel version (Ubuntu 18.04.3 HWE, their
> 5.0 series) a stinker, something else awry ...?
> 
> Cheers,
> C.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?)
  2019-11-21 21:34             ` Christian Pernegger
@ 2019-11-21 22:39               ` Marc Joliet
  2019-11-22  1:36                 ` Chris Murphy
  0 siblings, 1 reply; 31+ messages in thread
From: Marc Joliet @ 2019-11-21 22:39 UTC (permalink / raw)
  To: linux-btrfs

On Thursday, 21 November 2019, 22:34:41 CET, Christian Pernegger wrote:
> > > Another interesting test could be to adjust btrbk configuration to:
> > > btrfs_commit_delete = each
> >
> > Will do.
>
> Hm. No freeze, this time (with btrbk set to commit after each delete).
>
> In other news,
> - I seem to be leaking qgroups. There are currently 191 subvolumes
> (most of which are ro snapshots), but 547 "0/*" qgroups. Should
> deleting a subvolume take care of removing its (auto-created) qgroup,
> or does that always have to be done manually (or by setting the
> experimental *_qgroup_destroy options in btrbk.conf)? Any elegant ways
> to remove orphaned qgroups?
> - Timeshift, at :00, triggers this as well, it's just less severe
> (maybe because that's 1 subvolume instead of 3).
>
> Cheers,
> C.

As Qu said, the freezes should only happen on snapshot deletion.  Depending on
how you have btrbk configured and how regularly it runs, not every btrbk run
will delete snapshots.  Therefore not every run will cause the system to lock
up.

On a side note, I am also really annoyed by the lockups caused by qgroups.  On
my Gentoo systems (which use btrbk) I have it disabled for that reason, but I
left it on on my openSUSE laptop (a Dell XPS 13 9360), which locks up for
about 15-30 minutes while cleaning up snapshots a few times a week (usually
after reboots or after "zypper dup").  Of course, that's with snapshots active
for /home, which I do so that the file system doesn't change out from under
borg while it's running.  I'm tentatively considering turning it off there,
too, but I'll experiment with the snapper configuration first.

Greetings
--
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?)
  2019-11-21 20:30           ` Christian Pernegger
  2019-11-21 21:34             ` Christian Pernegger
@ 2019-11-21 23:57             ` Oliver Freyermuth
  2019-11-22 12:30               ` Christian Pernegger
  1 sibling, 1 reply; 31+ messages in thread
From: Oliver Freyermuth @ 2019-11-21 23:57 UTC (permalink / raw)
  To: Christian Pernegger, linux-btrfs

On 21.11.19 at 21:30, Christian Pernegger wrote:
> Definitely enabled, then. ... ... ... There it is: Timeshift has a
> pre-selected checkbox "enable BTRFS qgroups (recommended)" [translated
> from German].

Since I've never used qgroups myself, I'll only comment on the parts where I can. 
However, I would say "(recommended)" just to get an estimate of space consumption
is a rather hard label for the option in Timeshift. 

You can check the known issues on qgroups:
https://btrfs.wiki.kernel.org/index.php/Quota_support#Known_issues
This contains, amongst other things, the observed performance issues and also:
"- After deleting a subvolume, you must manually delete the associated qgroup."
which you observe, too. But it does indeed seem btrbk can help out here:
https://github.com/digint/btrbk/issues/49

Manpages of btrfs-quota and btrfs-qgroup contain quite some warnings about the existence
of these known issues, the status page at:
https://btrfs.wiki.kernel.org/index.php/Status
links them, etc. So I believe the recommendation by Timeshift is somewhat hefty. 
Other downstreams (see e.g. https://wiki.debian.org/Btrfs or https://wiki.archlinux.org/index.php/Btrfs#Quota ) 
explicitly recommend not to use qgroup unless really needed. 

Apparently this has also been raised to the developer:
https://github.com/teejee2008/timeshift/issues/127
which has at least led to the addition of the checkmark to allow not enabling qgroup. 

> 2) I'm wondering if this couldn't be improved. Considering qgroups are
> only used (in this case) for reporting on allocated space, not
> limiting it, and btrfs free space reporting is notoriously lazy [not
> meant in a bad way, can't think of a better word right now] anyway,
> why does anything need to block at all? Even if I were using quotas, I
> might prefer fuzzy quotas [that can be hit too early/late because
> accounting is catching up] to a temporary standstill, as an option.

You can check e.g. the man page btrfs-quota(8) for a short discussion on why doing quota correctly
with btrfs is not as easy as it may seem. 
I'll leave more comments (and how to disable them safely) to those who have experience with qgroups ;-). 

>> Another interesting test could be to adjust btrbk configuration to:
>> btrfs_commit_delete = each
> 
> Will do.
...
> Hm. No freeze, this time (with btrbk set to commit after each delete).

That might be a red herring if there was just less to delete, as Marc Joliet pointed out.
At least, I think this means we identified the reason for the freezes you get.

Cheers,
	Oliver

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?)
  2019-11-21 22:39               ` Marc Joliet
@ 2019-11-22  1:36                 ` Chris Murphy
  2019-11-22 23:21                   ` Marc Joliet
  0 siblings, 1 reply; 31+ messages in thread
From: Chris Murphy @ 2019-11-22  1:36 UTC (permalink / raw)
  To: Marc Joliet; +Cc: Btrfs BTRFS

On Thu, Nov 21, 2019 at 3:39 PM Marc Joliet <marcec@gmx.de> wrote:

> On a side note, I am also really annoyed by the lockups caused by qgroups.  On
> my Gentoo systems (which use btrbk) I have it disabled for that reason, but I
> left it on on my openSUSE laptop (a Dell XPS 13 9360), which locks up for
> about 15-30 minutes while cleaning up snapshots a few times a week (usually
> after reboots or after "zypper dup").

15 seconds is not at all acceptable on a desktop system, 15 minutes is
atrocious. When a computer appears to hang for 15 seconds, it is
completely reasonable for ordinary users to conclude that it has totally
faceplanted and will not recover, and to force power off. The
distribution really needs to do something about that kind of negative
user experience.

And by the way, I've recently done some unprivileged compilations of
webkitgtk, with default options that cause n cores +2 to be used,
eating all available RAM and swap, and quickly totally hanging the
system while swap thrashing and basically acting like a fork bomb. I'm
using Btrfs for the rootfs as well as user home for this compile, and
have done hundreds of forced power offs during these events and have
seen exactly zero corruptions or Btrfs complaints. So at least there's
that, however unscientific a sample that is.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?)
  2019-11-21 22:22     ` Zygo Blaxell
@ 2019-11-22  4:59       ` Zygo Blaxell
  2019-11-22 14:36       ` Christian Pernegger
  1 sibling, 0 replies; 31+ messages in thread
From: Zygo Blaxell @ 2019-11-22  4:59 UTC (permalink / raw)
  To: linux-btrfs

On Thu, Nov 21, 2019 at 05:22:28PM -0500, Zygo Blaxell wrote:
> On Wed, Nov 20, 2019 at 05:36:04PM +0100, Christian Pernegger wrote:
> > Hello,
> > 
> > I've decided to go with a snapshot-based backup solution for our new
> > Linux desktops -- thank you for the timely thread --, namely btrbk.
> > A couple of subvolumes for different stuff, with hourly snapshots that
> > regularly go to another machine. Brilliant in theory, less so in
> > practice, because every time btrbk runs, the box'll freeze for a few
> > seconds, as in, Firefox and LibreOffice, for instance, become entirely
> > unresponsive, games hang and so on. (AFAICT, all it does is snapshot
> > each subvolume and delete ones that are out of the retention period.)
> 
> Snapshot delete is pretty aggressive with IO and can force a lot of
> commits if you are modifying a lot of metadata pages between snapshots.
> Generally I get a coffee when my 1TB NVME systems decide it's time to
> drop a snapshot, as the system can effectively hang for a few minutes
> while btrfs-cleaner runs.  On performance-critical systems we only ever
> have one snapshot active on the filesystem at a time, and we only create
> it once a day for backups.  I'd love a way to throttle btrfs-cleaner so
> it's not so aggressive with IO and CPU.
> 
> Snapshot create has unbounded running time on 5.0 kernels.  The creation
> process has to flush dirty buffers to the filesystem to get a clean
> snapshot state.  Any process that is writing data while the flush is
> running gets its data included in the snapshot flush, so in the worst
> possible case, the snapshot flush never ends (unless you run out of disk
> space, or whatever was writing new data stops, whichever comes first).
> 
> Anything that needs to take a sb_writer lock (which is almost everything
> that modifies the filesystem) will hang until the snapshot create is done;
> however, processes that are reading the filesystem will not be obstructed.
> This can lead to starvation of the writing processes.  cgroups and ionice
> won't help here--the block layer doesn't detect waits for sb_writers
> (there is no associated block device for those, so they're invisible to
> the block layer), so it doesn't know that writer processes are waiting
> for IO, and all the writers' IO bandwidth gets reallocated to the reader
> processes, making for long-lasting priority inversions.  The IO pressure
> stall subsystem reads _zero_ IO pressure even though writing processes
> are continuously blocked for hours.
> 
> On small systems, this is all over in a second or less.  On bigger
> fileservers, I've had single snapshot creates run for many hours.  As a
> workaround, I have some scripts that freeze processes that write to the
> disk while 'btrfs sub snap' runs, to force the snapshot create to finish
> in a timely manner.  I think I saw some patches going into later 5.x
> kernels that solve the problem in the kernel, too (writes that occur after
> the snapshot creation starts are not included in the snapshot any more).

Nope, the patch I'm thinking of is dated Nov 1 *2018* and is already in
5.0.  So either that fix is ineffective, or the slow snapshots are caused
by something else.

> > I'm aware that having many snapshots can impact performance of some
> > operations, but I didn't think that "many" <= 200, "impact" = stop
> > dead and "some operations" = light desktop use. These are decently
> > specced, after all (Zen 2 8/12 core, 32 GB RAM, Samsung 970 Evo Plus).
> > What I'm asking is, is this to be expected, does it just need tuning,
> > is the hardware buggy, the kernel version (Ubuntu 18.04.3 HWE, their
> > 5.0 series) a stinker, something else awry ...?
> > 
> > Cheers,
> > C.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?)
  2019-11-21 23:57             ` Oliver Freyermuth
@ 2019-11-22 12:30               ` Christian Pernegger
  2019-11-22 12:34                 ` Qu Wenruo
  0 siblings, 1 reply; 31+ messages in thread
From: Christian Pernegger @ 2019-11-22 12:30 UTC (permalink / raw)
  To: linux-btrfs

On Fri, 22 Nov 2019 at 00:57, Oliver Freyermuth
<o.freyermuth@googlemail.com> wrote:
> > 2) I'm wondering if this couldn't be improved. [...]
>
> You can check e.g. the man page btrfs-quota(8) for a short discussion on why doing quota correctly
> with btrfs is not as easy as it may seem.

I've read that and I appreciate the difficulties in getting accurate
usage information (or even defining what that means) from a COW
filesystem. IMHO, performance, and the trade-off between performance
and up-to-the-minute accuracy are separate issues.

FWIW, running btrfs quota disable, enable, and rescan got rid of the
orphan qgroups. The full rescan ran for all of 3 seconds and didn't
block.
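
For reference, that sequence as a small sketch. The `BTRFS` override variable and the function name `reset_qgroups` are made up for illustration; the mount point is whatever path the filesystem is mounted at:

```shell
#!/bin/sh
# BTRFS can be overridden (e.g. BTRFS=echo for a dry run); defaults to "btrfs".
BTRFS=${BTRFS:-btrfs}

# Hypothetical helper: turn quotas off and on again for the filesystem at $1,
# then start a rescan and wait for it to finish (-w).
reset_qgroups() {
    "$BTRFS" quota disable "$1" &&
    "$BTRFS" quota enable "$1" &&
    "$BTRFS" quota rescan -w "$1"
}

# Usage (needs root):
#   reset_qgroups /mnt/timeshift/backup
```

To leave quotas off for good, the first command on its own should be enough.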

Cheers,
C.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?)
  2019-11-22 12:30               ` Christian Pernegger
@ 2019-11-22 12:34                 ` Qu Wenruo
  2019-11-22 14:43                   ` Christian Pernegger
  0 siblings, 1 reply; 31+ messages in thread
From: Qu Wenruo @ 2019-11-22 12:34 UTC (permalink / raw)
  To: Christian Pernegger, linux-btrfs


On 2019/11/22 8:30 PM, Christian Pernegger wrote:
> On Fri, 22 Nov 2019 at 00:57, Oliver Freyermuth
> <o.freyermuth@googlemail.com> wrote:
>>> 2) I'm wondering if this couldn't be improved. [...]
>>
>> You can check e.g. the man page btrfs-quota(8) for a short discussion on why doing quota correctly
>> with btrfs is not as easy as it may seem.
> 
> I've read that and I appreciate the difficulties in getting accurate
> usage information (or even defining what that means) from a COW
> filesystem. IMHO, performance, and the trade-off between performance
> and up-to-the-minute accuracy are separate issues.
> 
> FWIW, running btrfs quota disable, enable, and rescan got rid of the
> orphan qgroups. The full rescan ran for all of 3 seconds and didn't
> block.

BTW, for automatic deletion of empty qgroups, we already have a pending
patch. It's just not merged yet.

https://patchwork.kernel.org/patch/11195067/


But for the snapshot deletion part, there is still a performance impact.
(For a completely independent subvolume, IIRC there is a fast path,
so there is no performance penalty in that case.)

Thanks,
Qu
> 
> Cheers,
> C.
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?)
  2019-11-21 22:22     ` Zygo Blaxell
  2019-11-22  4:59       ` Zygo Blaxell
@ 2019-11-22 14:36       ` Christian Pernegger
  2019-11-23  3:49         ` Zygo Blaxell
  1 sibling, 1 reply; 31+ messages in thread
From: Christian Pernegger @ 2019-11-22 14:36 UTC (permalink / raw)
  To: linux-btrfs

On Thu, 21 Nov 2019 at 23:22, Zygo Blaxell
<ce3g8jdj@umail.furryterror.org> wrote:
> Snapshot delete is pretty aggressive with IO [...]  can effectively hang for a few minutes
> while btrfs-cleaner runs.

It doesn't look like it's btrfs-cleaner that blocks here, though,
more like it's btrfs-transacti.

> Snapshot create has unbounded running time on 5.0 kernels.

It looks to me like delete, not create, is the culprit here.

> Anything that needs to take a sb_writer lock (which is almost everything
> that modifies the filesystem) will hang until the snapshot create is done;

It's not just fs activity, either. Even if I'm just typing in
LibreOffice or at a bash prompt, the input isn't registered during the
freeze (it's buffered, so it comes out all at once in the end).

Cheers,
C.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?)
  2019-11-22 12:34                 ` Qu Wenruo
@ 2019-11-22 14:43                   ` Christian Pernegger
  2019-11-24  0:38                     ` Qu Wenruo
  0 siblings, 1 reply; 31+ messages in thread
From: Christian Pernegger @ 2019-11-22 14:43 UTC (permalink / raw)
  To: linux-btrfs

On Fri, 22 Nov 2019 at 13:34, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> But for the snapshot deletion part, there is still a performance impact.

Ok. It's just that I'd have expected *slower* write and read
performance until everything's settled, maybe sync writes taking
noticeably longer than usual, not that all user input blocks across
the whole system regardless of fs activity.

Cheers,
C.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?)
  2019-11-22  1:36                 ` Chris Murphy
@ 2019-11-22 23:21                   ` Marc Joliet
  2020-03-08 15:11                     ` Marc Joliet
  0 siblings, 1 reply; 31+ messages in thread
From: Marc Joliet @ 2019-11-22 23:21 UTC (permalink / raw)
  To: linux-btrfs

On Friday, 22 November 2019, 02:36:56 CET, Chris Murphy wrote:
> On Thu, Nov 21, 2019 at 3:39 PM Marc Joliet <marcec@gmx.de> wrote:
> > On a side note, I am also really annoyed by the lockups caused by qgroups.
> >  On my Gentoo systems (which use btrbk) I have it disabled for that
> > reason, but I left it on on my openSUSE laptop (a Dell XPS 13 9360),
> > which locks up for about 15-30 minutes while cleaning up snapshots a few
> > times a week (usually after reboots or after "zypper dup").
>
> 15 seconds is not at all acceptable on a desktop system, 15 minutes is
> atrocious. A computer that appears to hang for 15 seconds, it is
> completely reasonable for ordinary users to consider has totally
> faceplanted, will not recover, and to force power off. The
> distribution really needs to do something about that kind of negative
> user experience.

Sadly, I can't say if it's better without snapshotting /home, because I hadn't
accumulated many / snapshots at that point in time.  It might have gotten
worse even with only / being snapshotted.  But like I said, I'll experiment
with configuring snapper before blaming SUSE.  I believe the installation even
recommends against snapshotting /home, but hey, I wanted to do it anyway :-) .

But to be precise, it's not locked up continuously during snapshot deletion.
Occasionally I'll be able to operate my desktop for a few seconds, and if I
leave top running in a GUI terminal (in my case konsole), I'll see it updating
(almost) the entire time.  My guess (emphasis on *guess*) is that the qgroups
update is holding some lock that is preventing other I/O from finishing, thus
locking up any application that wants to write to disk and isn't doing so
concurrently (maybe Plasma is blocking on fsync() at the time?).

> And by the way, I've recently done some unprivileged compilations of
> webkitgtk, with default options that cause n cores +2 to be used,
> eating all available RAM and swap, and quickly totally hanging the
> system while swap thrashing and basically acting like a fork bomb. I'm
> using Btrfs for the rootfs as well as user home for this compile, and
> have done hundreds of forced power offs during these events and have
> seen exactly zero corruptions or Btrfs complaints. So at least there's
> that, however unscientific a sample that is.

My experience has also been that forced reboots don't cause any damage,
although I only rarely have to do them [0].  I mean, with COW it should
be expected to be safe.

[0] I have two main situations where this happens: The first is RCU stalls
that hang my desktop (occasionally during bootup, somewhere between the boot
loader and the login screen), and which recently also started affecting my
home server.  The second only affects my home server (a used small business
server), namely a wonky e1000e NIC; I only recently learned that these are
sometimes buggy and are known for causing servers to crash.  The
workaround is apparently to turn off TSO and GSO, and sometimes also GRO, but
I've been able to get away with only the first two without experiencing any
more crashes thus far.  Interestingly enough the RCU stalls happened shortly
after I did that.
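
The offload workaround mentioned in the footnote can be sketched roughly as
follows.  This is only an illustration: the helper name offload_off_cmd and
the interface name eth0 are made up for the example, and the function merely
prints the ethtool command instead of running it, so it can be reviewed first.

```shell
# Hypothetical helper: print (not run) the ethtool command that turns off
# the given offload features on one interface.
#   $1      interface name (eth0 below is a placeholder)
#   $2...   offload keywords as understood by "ethtool -K", e.g. tso gso gro
offload_off_cmd() {
    iface=$1
    shift
    cmd="ethtool -K $iface"
    for feature in "$@"; do
        cmd="$cmd $feature off"
    done
    echo "$cmd"
}

# Example: only TSO and GSO, as described in the footnote.
offload_off_cmd eth0 tso gso
```

The printed line would then have to be run as root on the affected machine;
adding "gro" to the argument list covers the "sometimes also GRO" case.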

Greetings
--
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?)
  2019-11-22 14:36       ` Christian Pernegger
@ 2019-11-23  3:49         ` Zygo Blaxell
  0 siblings, 0 replies; 31+ messages in thread
From: Zygo Blaxell @ 2019-11-23  3:49 UTC (permalink / raw)
  To: Christian Pernegger; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1764 bytes --]

On Fri, Nov 22, 2019 at 03:36:43PM +0100, Christian Pernegger wrote:
> On Thu, 21 Nov 2019 at 23:22, Zygo Blaxell
> <ce3g8jdj@umail.furryterror.org> wrote:
> > Snapshot delete is pretty aggressive with IO [...]  can effectively hang for a few minutes
> > while btrfs-cleaner runs.
> 
> It doesn't look like it's btrfs-cleaner that blocks here, though,
> more like it's btrfs-transacti.

It's hard to tell.  btrfs-transaction does a lot of work for other threads.
If you have kernel stacks enabled,

	watch -n.1 cat /proc/<pid of btrfs-cleaner>/stack

will show you what btrfs-cleaner is up to.  If it's something like
'wait_for_commit' then btrfs-cleaner dumped a bunch of work on
btrfs-transaction, and now btrfs-transaction is trying to catch up.
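
A small sketch of how the stack check above could be automated, assuming the
kernel exposes /proc/<pid>/stack and that you run it as root.  The function
name classify_stack and its output labels are made up for this example; it
only looks for the 'wait_for_commit' frame mentioned above.

```shell
# Hypothetical helper: read one kernel stack dump on stdin and report
# whether any frame mentions wait_for_commit.
classify_stack() {
    if grep -q 'wait_for_commit'; then
        echo "waiting-on-commit"
    else
        echo "other"
    fi
}

# Possible usage (as root), sampling btrfs-cleaner ten times a second:
#   pid=$(pgrep -x btrfs-cleaner | head -n1)
#   while sleep 0.1; do classify_stack < "/proc/$pid/stack"; done
```

If most samples say "waiting-on-commit", that would point to the case
described above, where btrfs-cleaner has handed its work to
btrfs-transaction and is waiting for it to catch up.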

> > Snapshot create has unbounded running time on 5.0 kernels.
> 
> It looks to me like delete, not create, is the culprit here.
> 
> > Anything that needs to take a sb_writer lock (which is almost everything
> > that modifies the filesystem) will hang until the snapshot create is done;
> 
> It's not just fs activity, either. Even if I'm just typing in
> LibreOffice or at a bash prompt, the input isn't registered during the
> freeze (it's buffered, so it comes out all at once in the end).

IO pressure, especially blocked writes, can delay memory allocations
on Linux.  That stops almost everything dead in a modern GUI.

If you can log into the box from another machine you might be able to
watch what it's doing with 'top' etc.
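
As a rough complement to 'top' over a remote login, one could list the tasks
that are in uninterruptible sleep (state D), which is where tasks waiting on
blocked IO usually end up.  This is my own assumption about what is worth
looking at, not something established in this thread, and the function name
blocked_tasks is made up for the example.

```shell
# Hypothetical helper: read "pid stat comm" lines on stdin and print the
# pid and command name of every task whose state starts with D.
# Only the first word of the command name is printed.
blocked_tasks() {
    awk '$2 ~ /^D/ { print $1, $3 }'
}

# Possible usage over ssh during a freeze:
#   ps -eo pid=,stat=,comm= | blocked_tasks
```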

On the other hand, from the other messages in this thread, it sounds like
you're using qgroups, which multiplies everything I said above by 1000.
qgroups is all in-kernel CPU, too, so userspace can't preempt it.

> Cheers,
> C.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?)
  2019-11-22 14:43                   ` Christian Pernegger
@ 2019-11-24  0:38                     ` Qu Wenruo
  2019-11-24 19:09                       ` Christian Pernegger
  0 siblings, 1 reply; 31+ messages in thread
From: Qu Wenruo @ 2019-11-24  0:38 UTC (permalink / raw)
  To: Christian Pernegger, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 1178 bytes --]



On 2019/11/22 10:43 PM, Christian Pernegger wrote:
> On Fri, 22 Nov 2019 at 13:34, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> But still, for snapshot deletion part, there is still a performance impact.
> 
> Ok. It's just that I'd have expected *slower* write and read
> performance until everything's settled, maybe sync writes taking
> noticeably longer than usual, not that all user input blocks across
> the whole system regardless of fs activity.

The slowdown happens during transaction commit, and while a commit is in
progress a lot of operations are blocked until the current transaction is
committed.

That's why it blocks everything.

We have tried our best to reduce the impact, but deletion is still a big
problem, as it can cause tons of extents to change their owner, which is
what triggers the slowdown.


In short, unless you really need to know how many bytes each snapshot
actually takes, disable qgroups.

And BTW, by "many" subvolumes/snapshots, I guess we mean 20.
200 is already prone to cause problems, not only for qgroups, but also for send.

So it's also recommended to reduce the number of snapshots.
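
A minimal sketch of how the two recommendations could be checked from a
shell.  The function name check_snapshot_count and the /mnt path are made up
for this example; the default limit of 20 is simply the figure quoted above,
not a hard rule.

```shell
# Hypothetical helper: read the output of "btrfs subvolume list -s <path>"
# on stdin and compare the number of snapshots against a limit.
#   $1  limit (defaults to 20)
check_snapshot_count() {
    limit=${1:-20}
    # Every snapshot line printed by "btrfs subvolume list" starts with "ID ".
    count=$(grep -c '^ID ' || true)
    if [ "$count" -gt "$limit" ]; then
        echo "too-many $count"
    else
        echo "ok $count"
    fi
}

# Possible usage (as root, /mnt is a placeholder mount point):
#   btrfs subvolume list -s /mnt | check_snapshot_count
#   btrfs quota disable /mnt
```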

Thanks,
Qu
> 
> Cheers,
> C.
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 484 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?)
  2019-11-24  0:38                     ` Qu Wenruo
@ 2019-11-24 19:09                       ` Christian Pernegger
  2019-11-25  1:22                         ` Qu Wenruo
  0 siblings, 1 reply; 31+ messages in thread
From: Christian Pernegger @ 2019-11-24 19:09 UTC (permalink / raw)
  To: linux-btrfs

On Sun, 24 Nov 2019 at 01:38, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> In short, unless you really need to know how many bytes each snapshot
> actually takes, disable qgroups.
>
> And BTW, by "many" subvolumes/snapshots, I guess we mean 20.
> 200 is already prone to cause problems, not only for qgroups, but also for send.
>
> So it's also recommended to reduce the number of snapshots.

I've disabled qgroups for now, we'll see how that goes. These are
personal desktops, they would have been nice to have, that's all.
Sadly that means that they probably won't work on any storage setup
complex enough for them to be really useful, either, yet.
If btrfs scales so badly with the number of subvolumes that having >20
at a time should be avoided, doesn't that kill a lot of interesting
use-cases? My "time machine" desktop setup, certainly, but anything
with a couple of users or VMs would chew through that 20 pretty
quickly, even before snapshots. Which leaves the LVM use-case
(snapshot, backup the snapshot, delete the snapshot).

> The slowdown happens during transaction commit, and while a commit is in
> progress a lot of operations are blocked until the current transaction is
> committed.
>
> That's why it blocks everything.
>
> We have tried our best to reduce the impact, but deletion is still a big
> problem, as it can cause tons of extents to change their owner, which is
> what triggers the slowdown.

Sure, but why does it *have to* block? Couldn't the intent to delete
the subvolume be committed, the metadata changes / actual deletion
happen at leisure? Yes, if qgroups are on, then the qgroup info will
be behind, but so what? At least I think that lax/lazy qgroups would
be a nice option to have.
Also, I still don't get why disabling qgroups, reenabling them and
doing a full rescan is lightning fast (and non-blocking), while just
leaving them on results in the observed behaviour.

Cheers,
C.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?)
  2019-11-24 19:09                       ` Christian Pernegger
@ 2019-11-25  1:22                         ` Qu Wenruo
  0 siblings, 0 replies; 31+ messages in thread
From: Qu Wenruo @ 2019-11-25  1:22 UTC (permalink / raw)
  To: Christian Pernegger, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 3122 bytes --]



On 2019/11/25 3:09 AM, Christian Pernegger wrote:
> On Sun, 24 Nov 2019 at 01:38, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> In short, unless you really need to know how many bytes each snapshot
>> actually takes, disable qgroups.
>>
>> And BTW, by "many" subvolumes/snapshots, I guess we mean 20.
>> 200 is already prone to cause problems, not only for qgroups, but also for send.
>>
>> So it's also recommended to reduce the number of snapshots.
> 
> I've disabled qgroups for now, we'll see how that goes. These are
> personal desktops, they would have been nice to have, that's all.
> Sadly that means that they probably won't work on any storage setup
> complex enough for them to be really useful, either, yet.
> If btrfs scales so badly with the number of subvolumes that having >20
> at a time should be avoided, doesn't that kill a lot of interesting
> use-cases? My "time machine" desktop setup, certainly, but anything
> with a couple of users or VMs would chew through that 20 pretty
> quickly, even before snapshots. Which leaves the LVM use-case
> (snapshot, backup the snapshot, delete the snapshot).

BTW, that number of 20 refers to 20 snapshots (they all share some tree
blocks).

If it's 20 subvolumes (no shared tree blocks or data between them), then
it counts as 1.

The main time-consuming part is the check for shared tree blocks/data, as
btrfs records sharing indirectly on disk, which forces us to do a complex
walk-back.

Thankfully, we have some plans to improve it.

> 
>> The slowdown happens during transaction commit, and while a commit is in
>> progress a lot of operations are blocked until the current transaction is
>> committed.
>>
>> That's why it blocks everything.
>>
>> We have tried our best to reduce the impact, but deletion is still a big
>> problem, as it can cause tons of extents to change their owner, which is
>> what triggers the slowdown.
> 
> Sure, but why does it *have to* block? Couldn't the intent to delete
> the subvolume be committed, the metadata changes / actual deletion
> happen at leisure?

Unfortunately, it's not that easy.
We already delay a lot of metadata operations, and transaction commit is the
only time we get a consistent view of the metadata.

That's why it has to happen in that critical section.

> Yes, if qgroups are on, then the qgroup info will
> be behind, but so what?

It's already behind.

> At least I think that lax/lazy qgroups would
> be a nice option to have.

Qgroup accounting is bound to delayed extent tree updates.
Extent tree updates are already delayed until transaction commit time;
if they were delayed any further, the consistency of the fs would be broken.

The plan to solve this is to introduce a global cache for backref walks,
which would benefit not only qgroups, but also send with reflinks.

There will be some new challenges, though, so we will see whether the cache
is worth it.

Thanks,
Qu

> Also, I still don't get why disabling qgroups, reenabling them and
> doing a full rescan is lightning fast (and non-blocking), while just
> leaving them on results in the observed behaviour.
> 
> Cheers,
> C.
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?)
  2019-11-22 23:21                   ` Marc Joliet
@ 2020-03-08 15:11                     ` Marc Joliet
  0 siblings, 0 replies; 31+ messages in thread
From: Marc Joliet @ 2020-03-08 15:11 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2835 bytes --]

On Saturday, 23 November 2019, 00:21:18 CET, you wrote:
> On Friday, 22 November 2019, 02:36:56 CET, Chris Murphy wrote:
> > On Thu, Nov 21, 2019 at 3:39 PM Marc Joliet <marcec@gmx.de> wrote:
> > > On a side note, I am also really annoyed by the lockups caused by
> > > qgroups.
> > > On my Gentoo systems (which use btrbk) I have it disabled for that
> > > reason, but I left it on on my openSUSE laptop (a Dell XPS 13 9360),
> > > which locks up for about 15-30 minutes while cleaning up snapshots a few
> > > times a week (usually after reboots or after "zypper dup").
> >
> > 15 seconds is not at all acceptable on a desktop system, 15 minutes is
> > atrocious. A computer that appears to hang for 15 seconds, it is
> > completely reasonable for ordinary users to consider has totally
> > faceplanted, will not recover, and to force power off. The
> > distribution really needs to do something about that kind of negative
> > user experience.
>
> Sadly, I can't say if it's better without snapshotting /home, because I
> hadn't accumulated many / snapshots at that point in time.  It might have
> gotten worse even with only / being snapshotted.  But like I said, I'll
> experiment with configuring snapper before blaming SUSE.  I believe the
> installation even recommends against snapshotting /home, but hey, I wanted
> to do it anyway :-) .
>
> But to be precise, it's not locked up continuously during snapshot deletion.
> Occasionally I'll be able to operate my desktop for a few seconds, and if I
> leave top running in a GUI terminal (in my case konsole), I'll see it
> updating (almost) the entire time.  My guess (emphasis on *guess*) is that
> the qgroups update is holding some lock that is preventing other I/O from
> finishing, thus locking up any application that wants to write to disk and
> isn't doing so concurrently (maybe Plasma is blocking on fsync() at the
> time?).

So just to follow up on this, reducing the total number of snapshots and
increasing the time between their creation from hourly to once every six hours
did help a *little* bit.  However, about a week ago I decided to try an
experiment and added the "autodefrag" mount option (which I don't usually do
on SSDs), and that helped *massively*.  Ever since, snapper-cleanup.service
runs without me noticing at all!
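
For anyone wanting to repeat the experiment, here is a rough sketch of how
one might check for and enable the option.  The function name has_autodefrag
is made up for this example, and remounting / is only an illustration;
whether autodefrag helps on other systems is not something I can promise.

```shell
# Hypothetical helper: succeed if the comma-separated mount option string
# in $1 contains "autodefrag" as a whole option.
has_autodefrag() {
    case ",$1," in
        *,autodefrag,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible usage (as root), enabling the option on / if it is missing:
#   has_autodefrag "$(findmnt -n -o OPTIONS /)" || mount -o remount,autodefrag /
```

To keep the option across reboots it would also have to be added to the
options field of the corresponding /etc/fstab entry.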

[ What made me try it was that booting the laptop and logging in started
getting really slow and top was showing several btrfs-endio threads hogging
the CPU, *before* snapper-cleanup.service or anything else specific to btrfs
was running (their activity usually coincided with KDE Baloo activity), i.e.,
general I/O was performing badly. ]

Greetings
--
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2020-03-08 15:12 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-12 18:34 btrfs based backup? Ulli Horlacher
2019-11-12 18:58 ` joshua
2019-11-12 19:09 ` Oliver Freyermuth
2019-11-12 19:14 ` Remi Gauvin
2019-11-12 20:05 ` Oliver Freyermuth
2019-11-20 16:36   ` freezes during snapshot creation/deletion -- to be expected? (Was: Re: btrfs based backup?) Christian Pernegger
2019-11-20 17:59     ` Oliver Freyermuth
2019-11-20 18:32     ` Chris Murphy
2019-11-21  1:51     ` Qu Wenruo
2019-11-21 16:44       ` Christian Pernegger
2019-11-21 19:37         ` Oliver Freyermuth
2019-11-21 20:30           ` Christian Pernegger
2019-11-21 21:34             ` Christian Pernegger
2019-11-21 22:39               ` Marc Joliet
2019-11-22  1:36                 ` Chris Murphy
2019-11-22 23:21                   ` Marc Joliet
2020-03-08 15:11                     ` Marc Joliet
2019-11-21 23:57             ` Oliver Freyermuth
2019-11-22 12:30               ` Christian Pernegger
2019-11-22 12:34                 ` Qu Wenruo
2019-11-22 14:43                   ` Christian Pernegger
2019-11-24  0:38                     ` Qu Wenruo
2019-11-24 19:09                       ` Christian Pernegger
2019-11-25  1:22                         ` Qu Wenruo
2019-11-21 22:22     ` Zygo Blaxell
2019-11-22  4:59       ` Zygo Blaxell
2019-11-22 14:36       ` Christian Pernegger
2019-11-23  3:49         ` Zygo Blaxell
2019-11-12 20:48 ` btrfs based backup? Michael
2019-11-13 15:04 ` Austin S. Hemmelgarn
2019-11-18 12:56 ` Ulli Horlacher
