* synchronize btrfs snapshots over an unreliable, slow connection
@ 2021-01-04 20:51  
  2021-01-05  8:34 ` Forza
  2021-01-07  1:59 ` Zygo Blaxell
  0 siblings, 2 replies; 11+ messages in thread
From: Cedric.dewijs@eclipso.eu @ 2021-01-04 20:51 UTC (permalink / raw)
  To: linux-btrfs

I have a master NAS that makes one read-only snapshot of my data per day. I want to transfer these snapshots to a slave NAS over a slow, unreliable internet connection (it's a cheap provider). This rules out a "btrfs send -> ssh -> btrfs receive" construction, as that can't be resumed.

Therefore I want to use rsync to synchronize the snapshots on the master NAS to the slave NAS.

My first thought is something like this:
1) create a read-only snapshot on the master NAS:
btrfs subvolume snapshot -r /mnt/nas/storage /mnt/nas/storage_snapshots/storage-$(date +%Y_%m_%d-%H%M)
2) send that data to the slave NAS like this:
rsync --partial -var --compress --bwlimit=500KB -e "ssh -i ~/slave-nas.key" /mnt/nas/storage_snapshots/storage-$(date +%Y_%m_%d-%H%M) cedric@123.123.123.123:/nas/storage
3) Restart rsync until all data is copied (by checking the exit code of rsync; if it's 0, all data has been transferred)
4) Create the read-only snapshot on the slave NAS with the same name as in step 1.

Does somebody already have a script that does this?
Is there a problem with this approach that I have not yet considered?
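
A minimal sketch of steps 1-4 as one script, assuming the paths, host and key file from the post. The three tool commands are parameters so the retry logic can be exercised without real btrfs/rsync; step 4 assumes /nas/storage on the slave is a writable subvolume that receives the rsync and then gets snapshotted:

```shell
#!/bin/sh
# Sketch of steps 1-4 above. Paths, host and key are the ones from the post;
# the three tool commands are parameters so the logic can be dry-run.
# Usage: sync_snapshot <btrfs-cmd> <rsync-cmd> <ssh-cmd>
sync_snapshot() {
    btrfs_cmd=$1; rsync_cmd=$2; ssh_cmd=$3
    snap="storage-$(date +%Y_%m_%d-%H%M)"   # %M = minutes (%m would repeat the month)

    # 1) read-only snapshot on the master NAS
    "$btrfs_cmd" subvolume snapshot -r /mnt/nas/storage \
        "/mnt/nas/storage_snapshots/$snap" || return 1

    # 2+3) rsync, retried until exit code 0 (all data transferred)
    until "$rsync_cmd" --partial -a --compress --bwlimit=500KB \
            -e "$ssh_cmd -i $HOME/slave-nas.key" \
            "/mnt/nas/storage_snapshots/$snap/" \
            "cedric@123.123.123.123:/nas/storage/"
    do
        sleep "${RETRY_DELAY:-60}"          # wait out the flaky link
    done

    # 4) matching read-only snapshot on the slave NAS
    "$ssh_cmd" -i "$HOME/slave-nas.key" cedric@123.123.123.123 \
        "btrfs subvolume snapshot -r /nas/storage /nas/storage_snapshots/$snap"
}
```

Trailing slashes on the rsync source/destination sync the snapshot's contents into the slave's subvolume so the remote snapshot in step 4 can reuse the same name.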




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: synchronize btrfs snapshots over an unreliable, slow connection
  2021-01-04 20:51 synchronize btrfs snapshots over an unreliable, slow connection  
@ 2021-01-05  8:34 ` Forza
  2021-01-05 11:24   ` Graham Cobb
  2021-01-07  3:09   ` Zygo Blaxell
  2021-01-07  1:59 ` Zygo Blaxell
  1 sibling, 2 replies; 11+ messages in thread
From: Forza @ 2021-01-05  8:34 UTC (permalink / raw)
  To: Cedric.dewijs, linux-btrfs



On 2021-01-04 21:51, Cedric.dewijs@eclipso.eu wrote:
> I have a master NAS that makes one read only snapshot of my data per day. I want to transfer these snapshots to a slave NAS over a slow, unreliable internet connection. (it's a cheap provider). This rules out a "btrfs send -> ssh -> btrfs receive" construction, as that can't be resumed.
> 
> Therefore I want to use rsync to synchronize the snapshots on the master NAS to the slave NAS.
> 
> My first thought is something like this:
> 1) create a read-only snapshot on the master NAS:
> btrfs subvolume snapshot -r /mnt/nas/storage /mnt/nas/storage_snapshots/storage-$(date +%Y_%m_%d-%H%m)
> 2) send that data to the slave NAS like this:
> rsync --partial -var --compress --bwlimit=500KB -e "ssh -i ~/slave-nas.key" /mnt/nas/storage_snapshots/storage-$(date +%Y_%m_%d-%H%m) cedric@123.123.123.123/nas/storage
> 3) Restart rsync until all data is copied (by checking the error code of rsync; if it's 0 then all data has been transferred)
> 4) Create the read-only snapshot on the slave NAS with the same name as in step 1.
> 
> Does somebody already have a script that does this?
> Is there a problem with this approach that I have not yet considered?
> 

One option is to store the send stream as a compressed file, rsync that 
file over, and verify it with a shasum or similar.

Steps would be something like this on the sender side:

1) create read-only snapshot as 
/mnt/nas/storage_snapshots/storage-210105-0930
2) btrfs send /mnt/nas/storage_snapshots/storage-210105-0930 | xz -T0 - > /some/path/storage-210105-0930.xz

Send this file to the remote location, verify its integrity, then do:
3) xzcat storage-210105-0930.xz | btrfs receive /nas/storage
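
The "verify integrity" step could be a checksum file shipped alongside the archive. A sketch with the file names from the example above (the helper names are mine):

```shell
#!/bin/sh
# Ship a sha256 alongside the .xz send stream and check it on the receiver
# before feeding it to btrfs receive. Helper names are illustrative.
make_checksum() {     # sender: make_checksum /some/path/storage-210105-0930.xz
    ( cd "$(dirname "$1")" && sha256sum "$(basename "$1")" > "$(basename "$1").sha256" )
}
verify_checksum() {   # receiver: verify_checksum /path/storage-210105-0930.xz
    ( cd "$(dirname "$1")" && sha256sum -c "$(basename "$1").sha256" )
}
# receiver, only after verify_checksum succeeds:
#   xzcat storage-210105-0930.xz | btrfs receive /nas/storage
```

rsync both the archive and the .sha256 file; a failed verify means re-running rsync rather than feeding a truncated stream to btrfs receive.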

You could expand on the compression scheme by using self-healing 
archives such as PAR[1] or similar tools, in case you want to keep the 
archived files.

btrbk[2] is a Btrfs backup tool that can also store snapshots as 
archives at a remote location. You may want to have a look at that too.

Good Luck!


[1]https://en.wikipedia.org/wiki/Parchive
[2]https://digint.ch/btrbk/




* Re: synchronize btrfs snapshots over an unreliable, slow connection
  2021-01-05  8:34 ` Forza
@ 2021-01-05 11:24   ` Graham Cobb
  2021-01-05 11:53     ` Roman Mamedov
  2021-01-05 12:24     ` Cerem Cem ASLAN
  2021-01-07  3:09   ` Zygo Blaxell
  1 sibling, 2 replies; 11+ messages in thread
From: Graham Cobb @ 2021-01-05 11:24 UTC (permalink / raw)
  To: Forza, Cedric.dewijs, linux-btrfs

On 05/01/2021 08:34, Forza wrote:
> 
> 
> On 2021-01-04 21:51, Cedric.dewijs@eclipso.eu wrote:
> >> I have a master NAS that makes one read only snapshot of my data per
>> day. I want to transfer these snapshots to a slave NAS over a slow,
>> unreliable internet connection. (it's a cheap provider). This rules
>> out a "btrfs send -> ssh -> btrfs receive" construction, as that can't
>> be resumed.
>>
>> Therefore I want to use rsync to synchronize the snapshots on the
>> master NAS to the slave NAS.
>>
> >> My first thought is something like this:
>> 1) create a read-only snapshot on the master NAS:
>> btrfs subvolume snapshot -r /mnt/nas/storage
>> /mnt/nas/storage_snapshots/storage-$(date +%Y_%m_%d-%H%m)
>> 2) send that data to the slave NAS like this:
>> rsync --partial -var --compress --bwlimit=500KB -e "ssh -i
>> ~/slave-nas.key" /mnt/nas/storage_snapshots/storage-$(date
>> +%Y_%m_%d-%H%m) cedric@123.123.123.123/nas/storage
>> 3) Restart rsync until all data is copied (by checking the error code
> >> of rsync; if it's 0 then all data has been transferred)
>> 4) Create the read-only snapshot on the slave NAS with the same name
>> as in step 1.

Seems like a reasonable approach to me, but see comment below.

>> Does somebody already has a script that does this?

I don't.

> >> Is there a problem with this approach that I have not yet considered?

Not a problem as such, but you could also consider using something like
rsnapshot (or reimplementing your own version by using rsync
--link-dest) instead of relying on btrfs snapshots on the slave NAS.
That way you don't need btrfs on that NAS at all if you don't want. I
used that approach as the (old) NAS I was using had a very old linux
version and didn't even run btrfs.

> One option is to store the send stream as a compressed file and rsync
> that file over and do a shasum or similar on it.

I have looked into that in the past and eventually decided against it.

My main concern was being too reliant on very complex and less used
features of btrfs, including one which has had several bugs in the past
(send/receive). I decided my backups needed to be reliable and robust
more than they need to be optimally efficient.

I had even considered just saving the original send stream, and the
subsequent incremental sends (all compressed) - until I realised that
any tiny corruption or bug in even one of those streams could make the
later streams completely unrestorable.

In the end, I decided to use a very boring (but powerful and
well-maintained), widely used, conventional backup tool (specifically,
dar, under the control of the dar_automatic_backup script) and I copy
the dar archives (compressed and encrypted) onto my offsite backup
server (actually, now, I store them in S3, using rclone). They are also
convenient to occasionally put on a disk which I can give to a friend to
put at the back of their cupboard somewhere in case I need it (faster
and cheaper to access than S3)!

In my case, I had some spare disks and plenty of bandwidth so I also use
rsnapshot from my onsite NAS to an offsite NAS. But that is for
convenience (not having to have dar read through all the archives) - I
consider the S3 dar archives my "main" disaster-recovery backup.

> btrbk[2] is a Btrfs backup tool that also can store snapshots as
> archives on remote location. You may want to have a look at that too.

I use btrbk for local snapshots (storing snapshots of all my systems on
my main server system). But I consider those convenient copies for
restoring files deleted by mistake, or restoring earlier configurations,
not backups (for example, a serious electrical problem or fire in the
server machine could destroy both the original disk and the snapshot disk).

Your situation is different, of course - so just some things to consider.




* Re: synchronize btrfs snapshots over an unreliable, slow connection
  2021-01-05 11:24   ` Graham Cobb
@ 2021-01-05 11:53     ` Roman Mamedov
  2021-01-05 12:24     ` Cerem Cem ASLAN
  1 sibling, 0 replies; 11+ messages in thread
From: Roman Mamedov @ 2021-01-05 11:53 UTC (permalink / raw)
  To: Graham Cobb; +Cc: Forza, Cedric.dewijs, linux-btrfs

On Tue, 5 Jan 2021 11:24:24 +0000
Graham Cobb <g.btrfs@cobb.uk.net> wrote:

> used that approach as the (old) NAS I was using had a very old linux
> version and didn't even run btrfs.

One anecdote --

I do use an old D-Link DNS-323 NAS with an old kernel and distro (an older
Debian), and only ~60 MB of RAM, to serve an 8 TB disk or two. How does that
even work?

Simple: it exports the disk(s) over the network as block devices via NBD, and
they are mounted remotely on a much more modern and powerful host.

A bit of secret sauce surprisingly turned out to be QEMU's NBD server
(qemu-nbd): it allows setting the disk cache modes inherited from QEMU
itself, and with "--cache=none" the little thing no longer chokes RAM-wise,
even with full jumbo frames enabled on the network side.

(Other NBD servers were much less performant and/or reliable on this hardware.)

Transfer speeds are around 17 MBytes/sec. That's on a Gbit LAN, and admittedly
running block-device-level access over a network does prefer a low ping
for good performance.

-- 
With respect,
Roman


* Re: synchronize btrfs snapshots over an unreliable, slow connection
  2021-01-05 11:24   ` Graham Cobb
  2021-01-05 11:53     ` Roman Mamedov
@ 2021-01-05 12:24     ` Cerem Cem ASLAN
  2021-01-06  8:18       ` Forza
  1 sibling, 1 reply; 11+ messages in thread
From: Cerem Cem ASLAN @ 2021-01-05 12:24 UTC (permalink / raw)
  To: Graham Cobb; +Cc: Forza, Cedric.dewijs, Btrfs BTRFS

I also thought about a different approach in the past:

1. Take a snapshot and rsync it to the server.
2. When it succeeds, make it readonly and take a note on the remote
site that indicates the Received_UUID and checksum of entire
subvolume.
3. When you want to send your diff, run `btrfs send -p ./first
./second | list-file-changes -o my-diff-for-second.txt` if that
Received_UUID on the remote site matches with ./first. (Otherwise, you
should run rsync without taking advantage of
`my-diff-for-second.txt`.)
4. Use rsync to send the changed files listed in `my-diff-for-second.txt`.
5. Verify by using a rolling hash, create a second snapshot and so on.

That approach would combine all the advantages of rsync with the "change
detection" benefit of BTRFS. The problem is, I don't know how to
implement the `list-file-changes` tool.

By the way, why wouldn't BTRFS keep a CHECKSUM field on readonly
subvolumes and simply use that field for diff and patch operations?
Calculating incremental checksums on every new readonly snapshot seems
like a computationally cheap operation. We could then transfer our
snapshots with whatever method/tool we like (we could even create the
/home/foo/hello.txt file with "hello world" content manually and then
create another snapshot that would automatically match our new
local snapshot).


* Re: synchronize btrfs snapshots over an unreliable, slow connection
  2021-01-05 12:24     ` Cerem Cem ASLAN
@ 2021-01-06  8:18       ` Forza
  2021-01-07  2:06         ` Zygo Blaxell
  2021-01-11  9:32         ` Cerem Cem ASLAN
  0 siblings, 2 replies; 11+ messages in thread
From: Forza @ 2021-01-06  8:18 UTC (permalink / raw)
  To: Cerem Cem ASLAN, Graham Cobb; +Cc: Cedric.dewijs, Btrfs BTRFS



On 2021-01-05 13:24, Cerem Cem ASLAN wrote:
> I also thought about a different approach in the past:
> 
> 1. Take a snapshot and rsync it to the server.
> 2. When it succeeds, make it readonly and take a note on the remote
> site that indicates the Received_UUID and checksum of entire
> subvolume.
> 3. When you want to send your diff, run `btrfs send -p ./first
> ./second | list-file-changes -o my-diff-for-second.txt` if that
> Received_UUID on the remote site matches with ./first. (Otherwise, you
> should run rsync without taking advantage of
> `my-diff-for-second.txt`.)

You can use `btrbk diff old-snap new-snap` to list changes between 
snapshots.

Example:
------------------------------------------------------------------------------
# btrbk diff /mnt/systemRoot/snapshots/root.20210101T0001/ /mnt/systemRoot/snapshots/root.20210102T0001/

Subvolume Diff (btrbk command line client, version 0.30.0)

     Date:   Wed Jan  6 09:06:37 2021

Showing changed files for subvolume:
   /mnt/systemRoot/snapshots/root.20210102T0001  (gen=6050233)

Starting at generation after subvolume:
   /mnt/systemRoot/snapshots/root.20210101T0001  (gen=6046626)

This will show all files modified within generation range: 
[6046627..6050233]
Newest file generation (transid marker) was: 6050233

Legend:
     +..     file accessed at offset 0 (at least once)
     .c.     flags COMPRESS or COMPRESS|INLINE set (at least once)
     ..i     flags INLINE or COMPRESS|INLINE set (at least once)
     <count> file was modified in <count> generations
     <size>  file was modified for a total of <size> bytes
------------------------------------------------------------------------------
+ci   1       1318  etc/csh.env
+ci   1       2116  etc/dispatch-conf.conf
+ci   1       1111  etc/environment.d/10-gentoo-env.conf
+ci   1       2000  etc/etc-update.conf
+c.   1      94208  etc/ld.so.cache
...
------------------------------------------------------------------------------

You can also use `btrfs find-new` to list filesystem changes, but its 
output is much more verbose than btrbk's, and you need to figure out the 
generation IDs first. I also believe that some changes, such as deleted 
and renamed files, do not get listed. [*]

Example:
------------------------------------------------------------------------------
# btrfs subvolume find-new /mnt/systemRoot/snapshots/root.20210102T0001/ 6046626

inode 3054490 file offset 0 len 8192 disk start 239676399616 offset 0 
gen 6048209 flags COMPRESS etc/passwd-
inode 9527306 file offset 0 len 4096 disk start 239792578560 offset 0 
gen 6049979 flags COMPRESS var/lib/dhcp/dhclient.leases
inode 9527306 file offset 4096 len 4096 disk start 239437688832 offset 0 
gen 6050179 flags COMPRESS var/lib/dhcp/dhclient.leases
inode 9527306 file offset 8192 len 4096 disk start 241226248192 offset 0 
gen 6050220 flags NONE var/lib/dhcp/dhclient.leases
inode 9527438 file offset 0 len 4096 disk start 244439986176 offset 0 
gen 6049681 flags NONE var/lib/samba/wins.tdb
inode 9527438 file offset 4096 len 4096 disk start 244569776128 offset 0 
gen 6050217 flags NONE var/lib/samba/wins.tdb
inode 9527438 file offset 8192 len 4096 disk start 243901612032 offset 0 
gen 6049543 flags NONE var/lib/samba/wins.tdb
inode 9527438 file offset 12288 len 8192 disk start 242191458304 offset 
4096 gen 6048901 flags PREALLOC var/lib/samba/wins.tdb
inode 9527438 file offset 20480 len 4096 disk start 244319576064 offset 
0 gen 6049691 flags NONE var/lib/samba/wins.tdb
------------------------------------------------------------------------------

> 4. Use rsync to send the changed files listed in `my-diff-for-second.txt`.
> 5. Verify by using a rolling hash, create a second snapshot and so on.
> 
> That approach will use all advantages of rsync and adds the "change
> detection" benefit from BTRFS. The problem is, I don't know how to
> implement the `list-file-changes` tool.
> 
> By the way, why wouldn't BTRFS keep a CHECKSUM field on readonly
> subvolumes and simply use that field for diff and patch operations?
> Calculating incremental checksums on every new readonly snapshot seems
> like a computationally cheap operation. We could then transfer our
> snapshots whatever method/tool we like (even we could create the
> /home/foo/hello.txt file with "hello world" content manually and then
> create another snapshot that will automatically match with our new
> local snapshot).
> 
[*]http://marc.merlins.org/perso/btrfs/post_2014-05-19_Btrfs-diff-Between-Snapshots.html


* Re: synchronize btrfs snapshots over an unreliable, slow connection
  2021-01-04 20:51 synchronize btrfs snapshots over an unreliable, slow connection  
  2021-01-05  8:34 ` Forza
@ 2021-01-07  1:59 ` Zygo Blaxell
  1 sibling, 0 replies; 11+ messages in thread
From: Zygo Blaxell @ 2021-01-07  1:59 UTC (permalink / raw)
  To: Cedric.dewijs; +Cc: linux-btrfs

On Mon, Jan 04, 2021 at 09:51:46PM +0100, Cedric.dewijs@eclipso.eu wrote:
> I have a master NAS that makes one read only snapshot of my data per day. I want to transfer these snapshots to a slave NAS over a slow, unreliable internet connection. (it's a cheap provider). This rules out a "btrfs send -> ssh -> btrfs receive" construction, as that can't be resumed.
> 
> Therefore I want to use rsync to synchronize the snapshots on the master NAS to the slave NAS.
> 
> My first thought is something like this:
> 1) create a read-only snapshot on the master NAS:
> btrfs subvolume snapshot -r /mnt/nas/storage /mnt/nas/storage_snapshots/storage-$(date +%Y_%m_%d-%H%m)
> 2) send that data to the slave NAS like this:
> rsync --partial -var --compress --bwlimit=500KB -e "ssh -i ~/slave-nas.key" /mnt/nas/storage_snapshots/storage-$(date +%Y_%m_%d-%H%m) cedric@123.123.123.123/nas/storage
> 3) Restart rsync until all data is copied (by checking the error code of rsync; if it's 0 then all data has been transferred)
> 4) Create the read-only snapshot on the slave NAS with the same name as in step 1.
> 
> Does somebody already have a script that does this?

Yes, and it is pretty much what you wrote above.  You probably also
want rsync options -aXXHS and --del, possibly also --numeric-ids and/or
--fake-super depending on how exact you want this copy to be (i.e. should
it preserve uid/gids, do both NAS hosts have all the same user names but
different user IDs, do you want the receiver to run rsync as root or an
unprivileged user, etc).

> Is there a problem with this approach that I have not yet considered?

rsync will not propagate extent sharing to the receiver, and by default
if part of a file is modified, the entire file becomes unshared.  If this
is a problem, you may want to run dedupe on the receiver.

If you omit the -S option and add --inplace to rsync, then there is better
extent sharing (now partially modified files don't unshare the entire file)
but you lose sparse file support (so files that have large holes will have
them filled in with zero-data blocks).  This can result in a size increase
with some file formats, to astronomical sizes in the case of files like
/var/log/lastlog.

If the link can fail, then ssh commands to create snapshots on the receiver
can fail too.  You can loop to retry those as well.

If it takes more than one day to propagate a snapshot over the link,
you will have to decide whether to let rsync keep trying to catch up,
or abort and start over from the next day's snapshot.  You might want
to exit the rsync retry loop if the date changes while it's running.
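
The retry loop with that date-change bailout might look like this (the helper name and RETRY_DELAY knob are mine):

```shell
#!/bin/sh
# Retry a command until it succeeds, but give up once the date rolls over,
# so the next day's snapshot gets its turn. Helper name is illustrative.
retry_until_day_change() {
    day=$(date +%Y_%m_%d)
    until "$@"; do
        [ "$(date +%Y_%m_%d)" = "$day" ] || return 1   # midnight passed: give up
        sleep "${RETRY_DELAY:-60}"
    done
}
```

The daily driver would then call something like `retry_until_day_change rsync --partial ...` and treat a nonzero return as "abandon this snapshot".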

A related question is what is expected when the sending host reboots.
Does it forget previous incomplete sends and just start a fresh rsync
with the current date's snapshot, or does it loop over all snapshots
in reverse order until it gets to one the receiver has, and then loops
forward from there to send each one from the backlog?

I solved the last two problems by not retaining the snapshots on the
sender side.  Each rsync instance sends from its own freshly created
snapshot that is deleted as soon as rsync exits (or upon reboot after
a crash), and the receiver provides its own snapshot names.  There is
no problem with backlog this way, but if you want to keep snapshots on
both sides of the SSH connection then this approach is not for you.



* Re: synchronize btrfs snapshots over an unreliable, slow connection
  2021-01-06  8:18       ` Forza
@ 2021-01-07  2:06         ` Zygo Blaxell
  2021-01-11  9:32         ` Cerem Cem ASLAN
  1 sibling, 0 replies; 11+ messages in thread
From: Zygo Blaxell @ 2021-01-07  2:06 UTC (permalink / raw)
  To: Forza; +Cc: Cerem Cem ASLAN, Graham Cobb, Cedric.dewijs, Btrfs BTRFS

On Wed, Jan 06, 2021 at 09:18:30AM +0100, Forza wrote:
> 
> 
> On 2021-01-05 13:24, Cerem Cem ASLAN wrote:
> > I also thought about a different approach in the past:
> > 
> > 1. Take a snapshot and rsync it to the server.
> > 2. When it succeeds, make it readonly and take a note on the remote
> > site that indicates the Received_UUID and checksum of entire
> > subvolume.
> > 3. When you want to send your diff, run `btrfs send -p ./first
> > ./second | list-file-changes -o my-diff-for-second.txt` if that
> > Received_UUID on the remote site matches with ./first. (Otherwise, you
> > should run rsync without taking advantage of
> > `my-diff-for-second.txt`.)
> 
> You can use `btrbk diff old-snap new-snap` to list changes between
> snapshots.
> 
> Example:
> ------------------------------------------------------------------------------
> #btrbk diff /mnt/systemRoot/snapshots/root.20210101T0001/
> /mnt/systemRoot/snapshots/root.20210102T0001/
> 
> Subvolume Diff (btrbk command line client, version 0.30.0)
> 
>     Date:   Wed Jan  6 09:06:37 2021
> 
> Showing changed files for subvolume:
>   /mnt/systemRoot/snapshots/root.20210102T0001  (gen=6050233)
> 
> Starting at generation after subvolume:
>   /mnt/systemRoot/snapshots/root.20210101T0001  (gen=6046626)
> 
> This will show all files modified within generation range:
> [6046627..6050233]
> Newest file generation (transid marker) was: 6050233
> 
> Legend:
>     +..     file accessed at offset 0 (at least once)
>     .c.     flags COMPRESS or COMPRESS|INLINE set (at least once)
>     ..i     flags INLINE or COMPRESS|INLINE set (at least once)
>     <count> file was modified in <count> generations
>     <size>  file was modified for a total of <size> bytes
> ------------------------------------------------------------------------------
> +ci   1       1318  etc/csh.env
> +ci   1       2116  etc/dispatch-conf.conf
> +ci   1       1111  etc/environment.d/10-gentoo-env.conf
> +ci   1       2000  etc/etc-update.conf
> +c.   1      94208  etc/ld.so.cache
> ...
> ------------------------------------------------------------------------------
> 
> You can also use `btrfs find-new` to list filesystem changes, but the output
> is much more verbose than that of btrbk, and you need to figure out the
> generation id's first. I also think that some things like deleted files and
> renamed files do not get listed? [*]

find-new runs TREE_SEARCH to find everything in subvol metadata pages that
were unshared since the given transid.  It then filters out references to
file data that are older than the given transid, and prints what is left.

It's roughly all the new extents in the subvol since the given transid.  No
deletions (it has nothing to compare against, so it can't tell that something
is no longer there), no file attributes, and no new clones or reflinks of old
data (i.e. after 'cp --reflink=always old_file old_file_2', old_file_2 will
not show up in find-new).

> Example:
> ------------------------------------------------------------------------------
> # btrfs subvolume find-new /mnt/systemRoot/snapshots/root.20210102T0001/
> 6046626
> 
> inode 3054490 file offset 0 len 8192 disk start 239676399616 offset 0 gen
> 6048209 flags COMPRESS etc/passwd-
> inode 9527306 file offset 0 len 4096 disk start 239792578560 offset 0 gen
> 6049979 flags COMPRESS var/lib/dhcp/dhclient.leases
> inode 9527306 file offset 4096 len 4096 disk start 239437688832 offset 0 gen
> 6050179 flags COMPRESS var/lib/dhcp/dhclient.leases
> inode 9527306 file offset 8192 len 4096 disk start 241226248192 offset 0 gen
> 6050220 flags NONE var/lib/dhcp/dhclient.leases
> inode 9527438 file offset 0 len 4096 disk start 244439986176 offset 0 gen
> 6049681 flags NONE var/lib/samba/wins.tdb
> inode 9527438 file offset 4096 len 4096 disk start 244569776128 offset 0 gen
> 6050217 flags NONE var/lib/samba/wins.tdb
> inode 9527438 file offset 8192 len 4096 disk start 243901612032 offset 0 gen
> 6049543 flags NONE var/lib/samba/wins.tdb
> inode 9527438 file offset 12288 len 8192 disk start 242191458304 offset 4096
> gen 6048901 flags PREALLOC var/lib/samba/wins.tdb
> inode 9527438 file offset 20480 len 4096 disk start 244319576064 offset 0
> gen 6049691 flags NONE var/lib/samba/wins.tdb
> ------------------------------------------------------------------------------
> 
> > 4. Use rsync to send the changed files listed in `my-diff-for-second.txt`.
> > 5. Verify by using a rolling hash, create a second snapshot and so on.
> > 
> > That approach will use all advantages of rsync and adds the "change
> > detection" benefit from BTRFS. The problem is, I don't know how to
> > implement the `list-file-changes` tool.
> > 
> > By the way, why wouldn't BTRFS keep a CHECKSUM field on readonly
> > subvolumes and simply use that field for diff and patch operations?
> > Calculating incremental checksums on every new readonly snapshot seems
> > like a computationally cheap operation. We could then transfer our
> > snapshots whatever method/tool we like (even we could create the
> > /home/foo/hello.txt file with "hello world" content manually and then
> > create another snapshot that will automatically match with our new
> > local snapshot).
> > 
> [*]http://marc.merlins.org/perso/btrfs/post_2014-05-19_Btrfs-diff-Between-Snapshots.html


* Re: synchronize btrfs snapshots over an unreliable, slow connection
  2021-01-05  8:34 ` Forza
  2021-01-05 11:24   ` Graham Cobb
@ 2021-01-07  3:09   ` Zygo Blaxell
  2021-01-07 19:22     ` Graham Cobb
  1 sibling, 1 reply; 11+ messages in thread
From: Zygo Blaxell @ 2021-01-07  3:09 UTC (permalink / raw)
  To: Forza; +Cc: Cedric.dewijs, linux-btrfs

On Tue, Jan 05, 2021 at 09:34:24AM +0100, Forza wrote:
> 
> 
> On 2021-01-04 21:51, Cedric.dewijs@eclipso.eu wrote:
> > I have a master NAS that makes one read only snapshot of my data per day. I want to transfer these snapshots to a slave NAS over a slow, unreliable internet connection. (it's a cheap provider). This rules out a "btrfs send -> ssh -> btrfs receive" construction, as that can't be resumed.
> > 
> > Therefore I want to use rsync to synchronize the snapshots on the master NAS to the slave NAS.
> > 
> > My first thought is something like this:
> > 1) create a read-only snapshot on the master NAS:
> > btrfs subvolume snapshot -r /mnt/nas/storage /mnt/nas/storage_snapshots/storage-$(date +%Y_%m_%d-%H%m)
> > 2) send that data to the slave NAS like this:
> > rsync --partial -var --compress --bwlimit=500KB -e "ssh -i ~/slave-nas.key" /mnt/nas/storage_snapshots/storage-$(date +%Y_%m_%d-%H%m) cedric@123.123.123.123/nas/storage
> > 3) Restart rsync until all data is copied (by checking the error code of rsync; if it's 0 then all data has been transferred)
> > 4) Create the read-only snapshot on the slave NAS with the same name as in step 1.
> > 
> > Does somebody already have a script that does this?
> > Is there a problem with this approach that I have not yet considered?
> > 
> 
> One option is to store the send stream as a compressed file and rsync that
> file over and do a shasum or similar on it.
> 
> Steps would be something like this on the sender side:
> 
> 1) create read-only snapshot as
> /mnt/nas/storage_snapshots/storage-210105-0930
> 2) btrfs send /mnt/nas/storage_snapshots/storage-210105-0930| xz -T0 - >
> /some/path/storage-210105-0930.xz

Should be:

	if btrfs send /mnt/nas/storage_snapshots/storage-210105-0930 > >(xz -T0 - > /some/path/storage-210105-0930.xz); then
		if xz -t /some/path/storage-210105-0930.xz; then
			... good to go ...
		else
			... handle xz failure ...
		fi
	else
		... handle btrfs send failure ...
	fi
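An equivalent guard without process substitution is bash's pipefail (a sketch, reusing the placeholder paths from this thread; note that pipefail is a bashism, not plain sh):

```shell
#!/bin/bash
# Sketch: with pipefail, the pipeline fails if btrfs send fails,
# not only if xz fails.  Paths are the placeholders used in the thread.
set -o pipefail
if btrfs send /mnt/nas/storage_snapshots/storage-210105-0930 \
        | xz -T0 > /some/path/storage-210105-0930.xz \
   && xz -t /some/path/storage-210105-0930.xz; then
    echo "stream written and verified"
else
    echo "send or compression failed" >&2   # handle the failure here
fi
```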

> send this file to remote location, verify integrity, then do:

For huge files and broken links, use these options:

	rsync --append --inplace --partial --bwlimit=500KB storage-210105-0930.xz receiver:/path

This will avoid copy-and-rename of a big file in case the backup file is
very large and the link is failing relatively often.
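To implement the original step 3 (restart until rsync exits 0), a minimal retry wrapper might look like this (a sketch; the commented-out rsync invocation just reuses the placeholder paths from this thread):

```shell
#!/bin/sh
# retry_until_ok CMD ARGS...: rerun CMD until it exits 0.
# rsync exits non-zero when the connection drops, so looping on the exit
# code resumes the transfer; --append/--inplace keep the partial file.
retry_until_ok() {
    until "$@"; do
        echo "command failed (exit $?), retrying..." >&2
        sleep "${RETRY_DELAY:-60}"
    done
}

# Hypothetical invocation matching the command above:
# retry_until_ok rsync --append --inplace --partial --bwlimit=500KB \
#     /some/path/storage-210105-0930.xz receiver:/path/
```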

> 3) xzcat storage-210105-0930.xz | btrfs receive /nas/storage

This can be problematic if the deltas are unusually large.  e.g. if we
add 1TB of new data on Tuesday, and delete it all on Thursday, we can't
unpack Friday's snapshot because ssh is still working on getting Tuesday's
snapshot over the network (and will be for at least 3 more Tuesdays at a
500 KB/s link speed).

What we really need here is a way to bless an rsync-synchronized btrfs
subvol for future use with btrfs send -p, so we can freely switch between
them as required.  Like this, perhaps:

1) sender:  btrfs sub snap -r subvol snapshot-$(date +%s)

2) sender:  rsync -avxHSXX --partial --numeric-ids --del snapshot-$(date +%s)/ receiver:/path/to/receive/subvol

   (do not leave out:

	-a - recurse and preserve inode attributes

	-x - stay within the subvol

	-XX - preserve xattrs (receive might fail if they are missing)

	-H - hardlinks (send/receive might fail unless every inode has
	all of the same names on both sides)

	--numeric-ids (send and receive use numeric ids)

	--del (receive will not attempt to create an existing file,
	it will just fail)

    and don't use --exclude or --fake-super, or any variation like --include '!foo' or --no-fake-super)

3) sender:  get generation and uuid from snapshot-$(date +%s) and send it to receiver

4) receiver:

        struct btrfs_ioctl_received_subvol_args rs_args = {
                .received_uuid = /* insert the subvol uuid from the sender side */,
                .stransid = /* insert the subvol transid from the sender side */,
        };
        u64 flags;

	subvol_fd = open("/path/to/received/subvol", O_RDONLY);
	/* set the received uuid on the subvol so btrfs send -p can use it as a parent */
        ret = ioctl(subvol_fd, BTRFS_IOC_SET_RECEIVED_SUBVOL, &rs_args);

	/* this is just "btrfs prop set <snapshot> ro true", but you can also do it from C code */
        ret = ioctl(subvol_fd, BTRFS_IOC_SUBVOL_GETFLAGS, &flags);
        flags |= BTRFS_SUBVOL_RDONLY;
        ret = ioctl(subvol_fd, BTRFS_IOC_SUBVOL_SETFLAGS, &flags);
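Step 3 above could be sketched on the sender like this (hedged: the exact text layout of `btrfs subvolume show` output varies between btrfs-progs versions, so check the parsing against your system first):

```shell
#!/bin/sh
# parse_show FIELD: read "btrfs subvolume show" output on stdin and
# print the value of FIELD (e.g. UUID or Generation).
parse_show() {
    awk -v f="$1" 'BEGIN { key = f ":" } $1 == key { print $2 }'
}

# Hypothetical use on the sender, with the snapshot name from step 1:
# snap=snapshot-$(date +%s)
# uuid=$(btrfs subvolume show "$snap" | parse_show UUID)
# gen=$(btrfs subvolume show "$snap" | parse_show Generation)
# echo "$uuid $gen" | ssh receiver 'cat > /tmp/parent-info'
```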

The result should be compatible with future btrfs send -p (in
theory--remember, I've already been wrong about btrfs send once this
week ;).

This code is untested, and things will go badly and silently wrong if the
two subvols are _not_ in fact identical (e.g. if you excluded anything
in the transfer, added anything on the receive side, or didn't use _all_
of the rsync options listed above to produce topologically identical
inode-to-filename graphs on both sides).

> You could expand on the compression scheme by using self-healing archives
> using PAR[1] or similar tools, in case you want to keep the archived files.

I would only attempt to put the archives into long-term storage after
verifying that they produce correct output when fed to btrfs receive;
otherwise, you could find out too late that a months-old archive was
damaged, incomplete, or incorrect, and restores after that point are no
longer possible.

Once that verification has been done and the subvol is no longer needed
for incremental sends, you can delete the subvol and keep the archive(s)
that produced it.
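The verify-then-archive step could be sketched as follows (assumptions: bash for pipefail, and /mnt/scratch is a hypothetical btrfs mount used only for verification):

```shell
#!/bin/bash
set -o pipefail   # a failing xzcat must fail the whole pipeline

# verify_archive ARCHIVE CMD...: decompress ARCHIVE and feed the stream
# to CMD; succeeds only if both decompression and CMD succeed.
verify_archive() {
    local archive=$1; shift
    xzcat "$archive" | "$@"
}

# Hypothetical use: only move the file to long-term storage if
# btrfs receive accepts the stream.
# if verify_archive /some/path/storage-210105-0930.xz \
#        btrfs receive /mnt/scratch/; then
#     mv /some/path/storage-210105-0930.xz /long-term-storage/
# fi
```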

> btrbk[2] is a Btrfs backup tool that also can store snapshots as archives on
> remote location. You may want to have a look at that too.
> 
> Good Luck!
> 
> 
> [1]https://en.wikipedia.org/wiki/Parchive
> [2]https://digint.ch/btrbk/
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: synchronize btrfs snapshots over a unreliable, slow connection
  2021-01-07  3:09   ` Zygo Blaxell
@ 2021-01-07 19:22     ` Graham Cobb
  0 siblings, 0 replies; 11+ messages in thread
From: Graham Cobb @ 2021-01-07 19:22 UTC (permalink / raw)
  To: Zygo Blaxell, Forza; +Cc: Cedric.dewijs, linux-btrfs

On 07/01/2021 03:09, Zygo Blaxell wrote:
...
> I would only attempt to put the archives into long-term storage after
> verifying that they produce correct output when fed to btrfs receive;
> otherwise, you could find out too late that a months-old archive was
> damaged, incomplete, or incorrect, and restores after that point are no
> longer possible.
> 
> Once that verification has been done and the subvol is no longer needed
> for incremental sends, you can delete the subvol and keep the archive(s)
> that produced it.

Personally, I wouldn't do that. Particularly if this was my only or main
backup. I don't think btrfs has many tests checking that new versions of
"receive" can correctly process old archives - let alone an incremental
sequence of them generated by versions of "send" with bugs fixed years
before.

If it was me, I would always keep the "latest" subvol online, or at
least as a newly created full (not incremental) send archive.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: synchronize btrfs snapshots over a unreliable, slow connection
  2021-01-06  8:18       ` Forza
  2021-01-07  2:06         ` Zygo Blaxell
@ 2021-01-11  9:32         ` Cerem Cem ASLAN
  1 sibling, 0 replies; 11+ messages in thread
From: Cerem Cem ASLAN @ 2021-01-11  9:32 UTC (permalink / raw)
  To: Forza; +Cc: Graham Cobb, Cedric.dewijs, Btrfs BTRFS

> You can use `btrbk diff old-snap new-snap` to list changes between
> snapshots.
>

The problem with the 'btrbk diff' approach (as stated here [1]) is that
it cannot show changes for empty files, empty folders and deletions,
because it also uses 'btrfs find_new' under the hood (see [2]).

However, I found this[3] tool at the time of writing this reply, which
works great, and the idea behind it (parsing 'btrfs send --no-data'
output) is rock solid.

[1]: https://serverfault.com/a/580264/261445
[2]: https://github.com/digint/btrbk/blob/7dc827bdc3c23fb839540ff1e41f1186fe5ffa19/btrbk#L5692
[3]: https://github.com/sysnux/btrfs-snapshots-diff

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-01-11  9:33 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-04 20:51 synchronize btrfs snapshots over a unreliable, slow connection  
2021-01-05  8:34 ` Forza
2021-01-05 11:24   ` Graham Cobb
2021-01-05 11:53     ` Roman Mamedov
2021-01-05 12:24     ` Cerem Cem ASLAN
2021-01-06  8:18       ` Forza
2021-01-07  2:06         ` Zygo Blaxell
2021-01-11  9:32         ` Cerem Cem ASLAN
2021-01-07  3:09   ` Zygo Blaxell
2021-01-07 19:22     ` Graham Cobb
2021-01-07  1:59 ` Zygo Blaxell
