linux-btrfs.vger.kernel.org archive mirror
* receive failing for incremental streams
@ 2021-12-15 20:27 Eric Levy
  2021-12-15 23:35 ` Graham Cobb
  2021-12-16  5:36 ` Andrei Borzenkov
  0 siblings, 2 replies; 9+ messages in thread
From: Eric Levy @ 2021-12-15 20:27 UTC
  To: linux-btrfs

Hello.

I have been experiencing very confusing problems with incremental
streams.

For a subvolume, I have a simple incremental backup created from two
stages:

btrfs send old/@ > base.btrfs
btrfs send new/@ -p old/@ > update.btrfs

The two source subvolumes are snapshots captured at separate times from
the same actively mounted subvolume.

On the target, I attempt to restore:

btrfs receive ./ < base.btrfs
btrfs receive ./ < update.btrfs

The expectation is that the prior command would create a restored
snapshot of the initial backup stage, and that the latter would apply
the updated stage.

The prior command succeeds, but the latter fails:

ERROR: creating snapshot ./@ -> @ failed: File exists

Since it is obvious I cannot usefully apply the second stage to a
target that does not exist, I am puzzled about why the process performs
this check, as well as what is expected to have success applying the
update.

How may I apply the update stage to the target generated from restoring
the initial stage?




* Re: receive failing for incremental streams
  2021-12-15 20:27 receive failing for incremental streams Eric Levy
@ 2021-12-15 23:35 ` Graham Cobb
  2021-12-15 23:52   ` Eric Levy
  2021-12-16  5:36 ` Andrei Borzenkov
  1 sibling, 1 reply; 9+ messages in thread
From: Graham Cobb @ 2021-12-15 23:35 UTC
  To: Eric Levy, linux-btrfs

On 15/12/2021 20:27, Eric Levy wrote:
> Hello.
> 
> I have been experiencing very confusing problems with incremental
> streams.

There is no such thing as an incremental stream. Send sends all the
information necessary to create a subvolume. Some of that includes
instructions to share data in other subvolumes but it is not incremental.

> For a subvolume, I have a simple incremental backup created from two
> stages:
> 
> btrfs send old/@ > base.btrfs
> btrfs send new/@ -p old/@ > update.btrfs
> 
> The two source subvolumes are snapshots captured at separate times from
> the same actively mounted subvolume.
> 
> On the target, I attempt to restore:
> 
> btrfs receive ./ < base.btrfs
> btrfs receive ./ < update.btrfs
> 
> The expectation is that the prior command would create a restored
> snapshot of the initial backup stage, 

Yes

> and that the latter would apply
> the updated stage.

No. Receive always creates a brand new subvolume - it doesn't update
anything. Of course, the new subvolume may include clones of data stored
in other subvolumes but it doesn't modify any existing subvolumes.

> 
> The prior command succeeds, but the latter fails:
> 
> ERROR: creating snapshot ./@ -> @ failed: File exists
> 
> Since it is obvious I cannot usefully apply the second stage to a
> target that does not exist, I am puzzled about why the process performs
> this check, as well as what is expected to have success applying the
> update.
> 
> How may I apply the update stage to the target generated from restoring
> the initial stage?

You don't. Receive will create a new subvolume - which will include
unchanged data from the initial stage and whatever changes have
happened. If you want, you can then snapshot that (read-only or
read-write as you wish) into any position you want in your destination
filesystem.

Graham


* Re: receive failing for incremental streams
  2021-12-15 23:35 ` Graham Cobb
@ 2021-12-15 23:52   ` Eric Levy
  2021-12-16  0:55     ` Graham Cobb
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Levy @ 2021-12-15 23:52 UTC
  To: linux-btrfs

Thank you for the reply. Please see my questions, below.

On Wed, 2021-12-15 at 23:35 +0000, Graham Cobb wrote:

> There is no such thing as an incremental stream. Send sends all the
> information necessary to create a subvolume. Some of that includes
> instructions to share data in other subvolumes but it is not
> incremental.

Perhaps you would clarify the distinction, as to me an incremental
backup is a minimal set of data needed to recreate the original volume
when combined with the previous capture.

> You don't. Receive will create a new subvolume - which will include
> unchanged data from the initial stage and whatever changes have
> happened. If you want, you can then snapshot that (read-only or
> read-write as you wish) into any position you want in your
> destination
> filesystem.

How should I use the latter stream? From the stream length it is
obvious it does not contain most of the data from the earlier one.



* Re: receive failing for incremental streams
  2021-12-15 23:52   ` Eric Levy
@ 2021-12-16  0:55     ` Graham Cobb
  2021-12-16  1:13       ` Eric Levy
  0 siblings, 1 reply; 9+ messages in thread
From: Graham Cobb @ 2021-12-16  0:55 UTC
  To: Eric Levy, linux-btrfs


On 15/12/2021 23:52, Eric Levy wrote:
> Thank you for the reply. Please see my questions, below.
> 
> On Wed, 2021-12-15 at 23:35 +0000, Graham Cobb wrote:
> 
>> There is no such thing as an incremental stream. Send sends all the
>> information necessary to create a subvolume. Some of that includes
>> instructions to share data in other subvolumes but it is not
>> incremental.
> 
> Perhaps you would clarify the distinction, as to me an incremental
> backup is a minimal set of data needed to recreate the original volume
> when combined with the previous capture.

Maybe it isn't a real difference. I mean that the stream is not intended
to make **changes** to an existing subvolume to create the new version.
It is intended to **create** a new version, reusing some of the extents
from the earlier version (but, not changing the earlier version at all).

> 
>> You don't. Receive will create a new subvolume - which will include
>> unchanged data from the initial stage and whatever changes have
>> happened. If you want, you can then snapshot that (read-only or
>> read-write as you wish) into any position you want in your
>> destination
>> filesystem.
> 
> How should I use the latter stream? From the stream length it is
> obvious it does not contain most of the data from the earlier one.
> 

Imagine you have a subvolume called /data on the source system. One day
you snapshot it to create /data-1. You then send /data-1 to the second
system to create a read-only subvolume on that system - let's call it
/copy-data-1.

Later you snapshot /data again to create /data-2 on the source system.
You btrfs-send /data-2 to the other system again and it creates a new
read-only subvolume - you tell btrfs-receive what to call it and where
to put it, let's say you call it /copy-data-2 - using the data in the
stream and reusing some extents from the existing /copy-data-1.
/copy-data-2 is now a (read-only) copy of /data-2 from the source system.

How you use that copy is up to you. If you are just taking backups you
probably do nothing with it unless you have a problem (it will form part
of the source for data for any future /copy-data-3). If you want to use
it to initialize a read-write subvolume on the destination system you
can take a read-write snapshot of /copy-data-2 to create a new subvolume
(say /my-new-data) on the destination system.


* Re: receive failing for incremental streams
  2021-12-16  0:55     ` Graham Cobb
@ 2021-12-16  1:13       ` Eric Levy
  2021-12-16 10:24         ` Graham Cobb
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Levy @ 2021-12-16  1:13 UTC
  To: linux-btrfs

> Later you snapshot /data again to create /data-2 on the source
> system.
> You btrfs-send /data-2 to the other system again and it creates a new
> read-only subvolume - you tell btrfs-receive what to call it and
> where
> to put it, let's say you call it /copy-data-2 - using the data in the
> stream and reusing some extents from the existing /copy-data-1.
> /copy-data-2 is now a (read-only) copy of /data-2 from the source
> system.
> 
> How you use that copy is up to you. If you are just taking backups
> you
> probably do nothing with it unless you have a problem (it will form
> part
> of the source for data for any future /copy-data-3). If you want to
> use
> it to initialize a read-write subvolume on the destination system you
> can take a read-write snapshot of /copy-data-2 to create a new
> subvolume
> (say /my-new-data) on the destination system.

Such is close to what I have always understood about receive, but the
confusion is that the second receive command makes no reference to the
subvolume created by the first command. How do I ultimately create a
restore target that combines the original full capture with the
incremental differences?

When I ask how I use it, I mean what commands do I enter into the
system.

Note in my case I archive the streams into regular (compressed) files
for later recovery.




* Re: receive failing for incremental streams
  2021-12-15 20:27 receive failing for incremental streams Eric Levy
  2021-12-15 23:35 ` Graham Cobb
@ 2021-12-16  5:36 ` Andrei Borzenkov
  1 sibling, 0 replies; 9+ messages in thread
From: Andrei Borzenkov @ 2021-12-16  5:36 UTC
  To: Eric Levy, linux-btrfs

On 15.12.2021 23:27, Eric Levy wrote:
> Hello.
> 
> I have been experiencing very confusing problems with incremental
> streams.
> 
> For a subvolume, I have a simple incremental backup created from two
> stages:
> 
> btrfs send old/@ > base.btrfs
> btrfs send new/@ -p old/@ > update.btrfs
> 
> The two source subvolumes are snapshots captured at separate times from
> the same actively mounted subvolume.
> 
> On the target, I attempt to restore:
> 
> btrfs receive ./ < base.btrfs
> btrfs receive ./ < update.btrfs
> 
> The expectation is that the prior command would create a restored
> snapshot of the initial backup stage, and that the latter would apply
> the updated stage.
> 
> The prior command succeeds, but the latter fails:
> 
> ERROR: creating snapshot ./@ -> @ failed: File exists
> 

You need to restore it in a different directory. Each send stream
defines a subvolume, and you cannot have two subvolumes with the same
name in the same directory.

> Since it is obvious I cannot usefully apply the second stage to a
> target that does not exist, I am puzzled about why the process performs
> this check, as well as what is expected to have success applying the
> update.
> 
> How may I apply the update stage to the target generated from restoring
> the initial stage?
> 
> 

You misunderstand what happens. btrfs receive does not update an existing
subvolume. It always creates a new subvolume by cloning the parent replica
and applying changes to this clone. The parent remains in its original
state and read-only.
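The name collision can be imitated without btrfs at all. The following
is only a toy sketch in Python, not btrfs code: toy_receive and demo are
hypothetical names, and an ordinary directory created with os.mkdir
stands in for the subvolume that receive would create under the name
carried in the stream ("@" in the original report).

```python
import os
import tempfile


def toy_receive(dest_dir, name_in_stream):
    """Stand-in for receive: create an entry named after the stream in dest_dir."""
    path = os.path.join(dest_dir, name_in_stream)
    os.mkdir(path)  # raises FileExistsError if that name is already taken
    return path


def demo():
    """Receive two streams that both carry the name "@"."""
    with tempfile.TemporaryDirectory() as top:
        base_dir = os.path.join(top, "base")
        update_dir = os.path.join(top, "update")
        os.mkdir(base_dir)
        os.mkdir(update_dir)

        toy_receive(base_dir, "@")              # first stream: fine
        try:
            toy_receive(base_dir, "@")          # second stream, same directory
            collided = False
        except FileExistsError:
            collided = True

        second = toy_receive(update_dir, "@")   # separate directory: fine
        return collided, os.path.relpath(second, top)


print(demo())
```

The second call into the same directory fails for the same reason the
reported command did, while receiving into a separate directory leaves
both results side by side.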


* Re: receive failing for incremental streams
  2021-12-16  1:13       ` Eric Levy
@ 2021-12-16 10:24         ` Graham Cobb
  2021-12-16 11:38           ` Hugo Mills
  0 siblings, 1 reply; 9+ messages in thread
From: Graham Cobb @ 2021-12-16 10:24 UTC
  To: Eric Levy, linux-btrfs

On 16/12/2021 01:13, Eric Levy wrote:
>> Later you snapshot /data again to create /data-2 on the source
>> system.
>> You btrfs-send /data-2 to the other system again and it creates a new
>> read-only subvolume - you tell btrfs-receive what to call it and
>> where
>> to put it, let's say you call it /copy-data-2 - using the data in the
>> stream and reusing some extents from the existing /copy-data-1.
>> /copy-data-2 is now a (read-only) copy of /data-2 from the source
>> system.
>>
>> How you use that copy is up to you. If you are just taking backups
>> you
>> probably do nothing with it unless you have a problem (it will form
>> part
>> of the source for data for any future /copy-data-3). If you want to
>> use
>> it to initialize a read-write subvolume on the destination system you
>> can take a read-write snapshot of /copy-data-2 to create a new
>> subvolume
>> (say /my-new-data) on the destination system.
> 
> Such is close to what I have always understood about receive, but the
> confusion is that the second receive command makes no reference to the
> subvolume created by the first command. How do I ultimately create a
> restore target that combines the original full capture with the
> incremental differences?

It's just magic. Seriously, as long as you have already restored the
parent (and any clone sources, if you have specified those) to the same
filesystem, btrfs will find them and clone the necessary files into the
new subvolume.

> 
> When I ask how I use it, I mean what commands do I enter into the
> system.

Assume subvolume called /data.

On the sending side...

btrfs subv snapshot -r /data /data-1
btrfs send /data-1 -f data-1.send

Later, to generate the incremental stream from /data-1...

btrfs subv snapshot -r /data /data-2
btrfs send -p /data-1 /data-2 -f data-2.send

When you want to restore...

btrfs receive -f data-1.send /recv-data-1
btrfs receive -f data-2.send /recv-data-2

If you want read-write access to the data you need to create a new
subvolume...

btrfs subv snapshot /recv-data-2 /new-data

[I haven't tested these so sorry for any mistakes - hopefully you get
the idea]

> 
> Note in my case I archive the streams into regular (compressed) files
> for later recovery.

I considered doing that but I don't recommend it. The biggest issue is
that you have to keep all the incrementals since the last full backup,
as all the steps must complete in order to restore. This means that if
something has gone wrong with the archive (even a single bit corruption,
or an unexpected truncation) all the incremental streams after that
point are useless. btrfs receive doesn't have a "try hard" mode - it
will just fail unless all the sources it needs, and the stream it is
processing, are perfect. And you don't know, unless you do regular test
restorations.
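To make the chain dependency concrete, here is a minimal Python sketch
under stated assumptions: it supposes that a SHA-256 checksum was
recorded for each archived stream file when it was written, and the
function names sha256_of and usable_prefix are hypothetical. It reports
how much of an ordered chain (full stream first, then each incremental)
is still byte-for-byte intact. A matching checksum only shows that the
file has not changed; it is not a substitute for a test restoration.

```python
import hashlib
from typing import Dict, List


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large stream files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def usable_prefix(chain: List[str], recorded: Dict[str, str]) -> List[str]:
    """Return the leading streams of the chain whose checksums still match.

    chain is ordered: the full stream first, then each incremental.
    The scan stops at the first missing or altered file, because every
    later incremental depends on the ones before it.
    """
    good = []
    for path in chain:
        try:
            if sha256_of(path) != recorded.get(path):
                break
        except OSError:
            break
        good.append(path)
    return good
```

If the second of five archived streams is damaged, usable_prefix returns
only the first one, which mirrors the point that everything after the
damaged stream can no longer be restored.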

In the end I decided I would keep the archive subvolumes themselves, not
the streams. Even in the worst case, this takes very little more space
(assuming you have turned on compression) - after all the cloned data is
still cloned. And even if something has been corrupted you can still get
at undamaged files in the various subvolumes. And if you make sure that
each send stream is only using the directly previous snapshot as its
clone source, you can remove any older snapshots that you like without
making later subvolumes useless.

Once I decided that, I ended up using btrbk - which makes a good job of
managing the backup and archive subvolumes, on both the source system
and the destination system. Of course, many other tools are available.



* Re: receive failing for incremental streams
  2021-12-16 10:24         ` Graham Cobb
@ 2021-12-16 11:38           ` Hugo Mills
  2021-12-18 23:53             ` Eric Levy
  0 siblings, 1 reply; 9+ messages in thread
From: Hugo Mills @ 2021-12-16 11:38 UTC
  To: Graham Cobb; +Cc: Eric Levy, linux-btrfs

On Thu, Dec 16, 2021 at 10:24:09AM +0000, Graham Cobb wrote:
> On 16/12/2021 01:13, Eric Levy wrote:
> >> Later you snapshot /data again to create /data-2 on the source
> >> system.
> >> You btrfs-send /data-2 to the other system again and it creates a new
> >> read-only subvolume - you tell btrfs-receive what to call it and
> >> where
> >> to put it, let's say you call it /copy-data-2 - using the data in the
> >> stream and reusing some extents from the existing /copy-data-1.
> >> /copy-data-2 is now a (read-only) copy of /data-2 from the source
> >> system.
> >>
> >> How you use that copy is up to you. If you are just taking backups
> >> you
> >> probably do nothing with it unless you have a problem (it will form
> >> part
> >> of the source for data for any future /copy-data-3). If you want to
> >> use
> >> it to initialize a read-write subvolume on the destination system you
> >> can take a read-write snapshot of /copy-data-2 to create a new
> >> subvolume
> >> (say /my-new-data) on the destination system.
> > 
> > Such is close to what I have always understood about receive, but the
> > confusion is that the second receive command makes no reference to the
> > subvolume created by the first command. How do I ultimately create a
> > restore target that combines the original full capture with the
> > incremental differences?
> 
> It's just magic. Seriously, as long as you have already restored the
> parent (and any clone sources, if you have specified those) to the same
> filesystem, btrfs will find them and clone the necessary files into the
> new subvolume.

   This is what happens:

Sending machine                        Receiving machine

$ send A
    Send all the data of A
    plus its UUID (uA)

                                       $ receive
                                           Make a new subvol, A'
                                           Write all the data to it
                                           Set "received_uuid" on A' to uA
                                           Make A' read-only

$ send B -p A
    Send the differences between A
    and B, plus their UUIDs, uA and uB

                                       $ receive
                                           Find the subvol with
                                           "received_uuid" == uA (this is A')
                                           Snapshot it to B'
                                           Modify B' using the differences
                                           Set "received_uuid" of B' to uB
                                           Make B' read-only
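The steps above can be mimicked with a small toy model. This is a sketch
in Python, not btrfs code: Subvol, Stream, send and receive are
hypothetical names, a subvolume is reduced to a dict of file names to
contents, and only the "received_uuid" lookup described above is
modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
from uuid import uuid4


@dataclass
class Subvol:
    """Toy subvolume: file name -> contents, plus the two UUIDs."""
    files: Dict[str, str]
    uuid: str = field(default_factory=lambda: str(uuid4()))
    received_uuid: Optional[str] = None
    read_only: bool = False


@dataclass
class Stream:
    """Toy send stream: sender UUID, optional parent UUID, and the changes."""
    uuid: str
    parent_uuid: Optional[str]
    writes: Dict[str, str]            # files to create or overwrite
    deletes: frozenset = frozenset()  # files to remove


def send(subvol, parent=None):
    """Full stream without a parent, differences only with one."""
    if parent is None:
        return Stream(subvol.uuid, None, dict(subvol.files))
    writes = {k: v for k, v in subvol.files.items() if parent.files.get(k) != v}
    deletes = frozenset(k for k in parent.files if k not in subvol.files)
    return Stream(subvol.uuid, parent.uuid, writes, deletes)


def receive(fs, name, stream):
    """fs maps names to Subvol objects on the receiving side."""
    if name in fs:
        raise FileExistsError(name)
    if stream.parent_uuid is None:
        files = {}                                    # make a new, empty subvol
    else:
        parent = next((s for s in fs.values()
                       if s.received_uuid == stream.parent_uuid), None)
        if parent is None:
            raise LookupError("no subvolume with a matching received_uuid")
        files = dict(parent.files)                    # snapshot the parent copy
    files.update(stream.writes)                       # apply the differences
    for k in stream.deletes:
        files.pop(k, None)
    fs[name] = Subvol(files, received_uuid=stream.uuid, read_only=True)
    return fs[name]


a = Subvol({"f": "v1", "g": "same"})
b = Subvol({"f": "v2", "g": "same", "h": "new"})
target = {}
receive(target, "A'", send(a))
b_copy = receive(target, "B'", send(b, parent=a))
print(sorted(b_copy.files))
```

In this model the second stream carries only "f" and "h", yet B' ends up
with all three files, and A' is left untouched. Receiving the second
stream into an empty target raises LookupError, which corresponds to the
requirement that the parent be restored first.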


> > When I ask how I use it, I mean what commands do I enter into the
> > system.
> 
> Assume subvolume called /data.
> 
> On the sending side...
> 
> btrfs subv snapshot -r /data /data-1
> btrfs send /data-1 -f data-1.send
> 
> Later, to generate the incremental stream from /data-1...
> 
> btrfs subv snapshot -r /data /data-2
> btrfs send -p /data-1 /data-2 -f data-2.send
> 
> When you want to restore...
> 
> btrfs receive -f data-1.send /recv-data-1
> btrfs receive -f data-2.send /recv-data-2
> 
> If you want read-write access to the data you need to create a new
> subvolume...
> 
> btrfs subv snapshot /recv-data-2 /new-data
> 
> [I haven't tested these so sorry for any mistakes - hopefully you get
> the idea]
> 
> > 
> > Note in my case I archive the streams into regular (compressed) files
> > for later recovery.
> 
> I considered doing that but I don't recommend it. The biggest issue is
> that you have to keep all the incrementals since the last full backup,
> as all the steps must complete in order to restore. This means that if
> something has gone wrong with the archive (even a single bit corruption,
> or an unexpected truncation) all the incremental streams after that
> point are useless. btrfs receive doesn't have a "try hard" mode - it
> will just fail unless all the sources it needs, and the stream it is
> processing, are perfect. And you don't know, unless you do regular test
> restorations.
> 
> In the end I decided I would keep the archive subvolumes themselves, not
> the streams. Even in the worst case, this takes very little more space
> (assuming you have turned on compression) - after all the cloned data is
> still cloned. And even if something has been corrupted you can still get
> at undamaged files in the various subvolumes. And if you make sure that
> each send stream is only using the directly previous snapshot as its
> clone source, you can remove any older snapshots that you like without
> making later subvolumes useless.
> 
> Once I decided that, I ended up using btrbk - which makes a good job of
> managing the backup and archive subvolumes, on both the source system
> and the destination system. Of course, many other tools are available.
> 

-- 
Hugo Mills             | Computer Science is not about computers, any more
hugo@... carfax.org.uk | than astronomy is about telescopes.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                                       Edsger Dijkstra


* Re: receive failing for incremental streams
  2021-12-16 11:38           ` Hugo Mills
@ 2021-12-18 23:53             ` Eric Levy
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Levy @ 2021-12-18 23:53 UTC
  To: linux-btrfs

Thank you for the explanation about streams.

My first observation is that the details clarified in this conversation
are not easily understood from the man page, nor from any official
online documentation I had found, nor even from any other discussion or
documentation I had found through web searches. Thus, even if it were
the only change to result from these considerations, I would suggest
that the man page should include a more robust explanation of the
design.

Next, the child stream being restored to a new subvolume, with the
result sharing references with the parent, may be practical from a
standpoint of underlying implementation, but may not be intuitive for a
user in a typical work flow. It might be helpful for users to have some
direct support for the use case of updating an existing stream in
place.

Finally, the constraint that a restore target must have the same file
name as the original subvolume is, at least to my thinking,
inconvenient, if not also in many cases challenging, as when the
original name is not known, perhaps having been chosen arbitrarily. A
useful feature would be an option in the administrative tool to choose
the name of the restored subvolume, not simply the parent directory.

Whether any such enhancements require changes to the file system
functionality is beyond my knowledge, but it is certainly worthwhile to
consider any that are possible through changing only tools in user
space.



