* hierarchical, tree-like structure of snapshots
@ 2020-12-30 16:56 john terragon
  2020-12-30 17:03 ` john terragon
  2020-12-30 17:24 ` sys
  0 siblings, 2 replies; 17+ messages in thread
From: john terragon @ 2020-12-30 16:56 UTC (permalink / raw)
  To: Btrfs BTRFS

Hi.
I would like to maintain a tree-like hierarchical structure of
snapshots. Let me try to explain what I mean by that.

Let's say I have a btrfs fs with just one subvolume X, and let's say
that I make a readonly snapshot Y of X. As far as I understand, there
is a parent-child relation between Y (the parent) and X (the child).

Now let's say that after some time and modifications of X I take another
snapshot Z of X. Now the "temporal" structure would be Y-Z-X. So X is
now the "child" of Z and Z is now the "child" of Y. The structure is a
path, which is a special case of a tree.

Now let's suppose that I want to start modifying Y but I still want to
have a parent of Z which I might use as a point of reference for Z in a
send to somewhere. That is, I still want to be able to do a send -p Y Z
to another btrfs filesystem where there is a previously sent copy of Y
(which, remember, has been readonly up to this point and I'm just now
wanting to start modifying it).
The only thing I think I can do would be to make a readonly snapshot
Y1 of Y and make Y writeable (so that I can start modifying it). At that
point the structure would be

Y1-Y
    \
      Z-X

(yes my ascii art is atrocious...) which is a "proper" tree where Y1
is the root with two children (Y and Z), Z has one child (X) and Y and
X are leaves.
Now, my question is, would Y1 still be usable in send -p Y1 Z, just
like Y was before becoming writeable and being modified? I would say
that Y1 would be just as good as the readonly original Y was as a
parent for Z in a send. But maybe there is some implementation detail
that escapes me and that prevents Y1 from being used as a perfect
replacement for the original Y.
I hope I was clear enough.
Thanks
John

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: hierarchical, tree-like structure of snapshots
  2020-12-30 16:56 hierarchical, tree-like structure of snapshots john terragon
@ 2020-12-30 17:03 ` john terragon
  2020-12-30 17:24 ` sys
  1 sibling, 0 replies; 17+ messages in thread
From: john terragon @ 2020-12-30 17:03 UTC (permalink / raw)
  To: Btrfs BTRFS

Sorry, that ascii tree came out awful and it looks like Z is the child
of Y instead of Y1. I hope this one below looks better.

Y1-Y
 \
  Z-X


* Re: hierarchical, tree-like structure of snapshots
  2020-12-30 16:56 hierarchical, tree-like structure of snapshots john terragon
  2020-12-30 17:03 ` john terragon
@ 2020-12-30 17:24 ` sys
  2020-12-30 17:39   ` john terragon
  1 sibling, 1 reply; 17+ messages in thread
From: sys @ 2020-12-30 17:24 UTC (permalink / raw)
  To: john terragon, Btrfs BTRFS



On 2020-12-30 17:56, john terragon wrote:
> Hi.
> I would like to maintain a tree-like hierarchical structure of
> snapshots. Let me try to explain what I mean by that.
> 
> Let's say I have a btrfs fs with just one subvolume X, and let's say
> that I make a readonly snapshot Y of X. As far as I understand, there
> is a parent-child relation between Y (the parent) and X (the child).
>
> Now let's say that after some time and modifications of X I take another
> snapshot Z of X. Now the "temporal" structure would be Y-Z-X. So X is
> now the "child" of Z and Z is now the "child" of Y. The structure is a
> path, which is a special case of a tree.
>
> Now let's suppose that I want to start modifying Y but I still want to
> have a parent of Z which I might use as a point of reference for Z in a
> send to somewhere. That is, I still want to be able to do a send -p Y Z
> to another btrfs filesystem where there is a previously sent copy of Y
> (which, remember, has been readonly up to this point and I'm just now
> wanting to start modifying it).
> The only thing I think I can do would be to make a readonly snapshot
> Y1 of Y and make Y writeable (so that I can start modifying it).

You should simply make a 'read-write' snapshot (Y-rw) of the 'read-only' 
snapshot (Y) that is part of your backup/send scheme. Do not modify 
read-only snapshots to be rw.
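
A minimal sketch of that advice, assuming Y lives at a hypothetical
path such as /mnt/pool/Y. The helper names run and make_rw_clone are
made up for illustration and are not part of btrfs-progs. By default
the script only prints the command; set DRY_RUN=0 to execute it.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a writable clone of a read-only snapshot, leaving the original
# read-only so that it can keep serving as a send parent.
make_rw_clone() {
    ro_snap=$1
    rw_clone=$2
    # Without -r, "btrfs subvolume snapshot" creates a writable snapshot.
    run btrfs subvolume snapshot "$ro_snap" "$rw_clone"
    # Deliberately NOT done here:
    #   btrfs property set -ts "$ro_snap" ro false
}
```

For example, "make_rw_clone /mnt/pool/Y /mnt/pool/Y-rw" would leave Y
untouched and give you Y-rw to modify.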


> At that point the structure would be
> 
> Y1-Y
>      \
>        Z-X
> 
> (yes my ascii art is atrocious...) which is a "proper" tree where Y1
> is the root with two children (Y and Z), Z has one child (X) and Y and
> X are leaves.
> Now, my question is, would Y1 still be usable in send -p Y1 Z, just
> like Y was before becoming writeable and being modified? I would say
> that Y1 would be just as good as the readonly original Y was as a
> parent for Z in a send. But maybe there is some implementation detail
> that escapes me and that prevents Y1 from being used as a perfect
> replacement for the original Y.
> I hope I was clear enough.
> Thanks
> John
> 


* Re: hierarchical, tree-like structure of snapshots
  2020-12-30 17:24 ` sys
@ 2020-12-30 17:39   ` john terragon
  2020-12-31  7:05     ` Andrei Borzenkov
  0 siblings, 1 reply; 17+ messages in thread
From: john terragon @ 2020-12-30 17:39 UTC (permalink / raw)
  To: sys; +Cc: Btrfs BTRFS

On Wed, Dec 30, 2020 at 6:24 PM sys <system@lechevalier.se> wrote:
>
>
>
[...]
> You should simply make a 'read-write' snapshot (Y-rw) of the 'read-only'
> snapshot (Y) that is part of your backup/send scheme. Do not modify
> read-only snapshots to be rw.
>

OK, but then could I use Y as parent of the rw snapshot, let's call it
W, in a send?
So I would have this tree where Y is still the root.

Y-W
 \
  Z-X

Can I do a send -p Y W ?
Because I thought it was the other way around, that is, I make a readonly
snapshot W of Y and that will be the base for incrementally sending
the future modified Y to another FS (provided of course W is already
there).


* Re: hierarchical, tree-like structure of snapshots
  2020-12-30 17:39   ` john terragon
@ 2020-12-31  7:05     ` Andrei Borzenkov
  2020-12-31 10:00       ` Forza
  2020-12-31 16:08       ` john terragon
  0 siblings, 2 replies; 17+ messages in thread
From: Andrei Borzenkov @ 2020-12-31  7:05 UTC (permalink / raw)
  To: john terragon, sys; +Cc: Btrfs BTRFS

On 30.12.2020 20:39, john terragon wrote:
> On Wed, Dec 30, 2020 at 6:24 PM sys <system@lechevalier.se> wrote:
>>
>>
>>
> [...]
>> You should simply make a 'read-write' snapshot (Y-rw) of the 'read-only'
>> snapshot (Y) that is part of your backup/send scheme. Do not modify
>> read-only snapshots to be rw.
>>
> 
> OK, but then could I use Y as parent of the rw snapshot, let's call it
> W, in a send?

No

> So I would have this tree where Y is still the root.
> 
> Y-W
>  \
>   Z-X
> 
> Can I do a send -p Y W ?

No. All subvolumes used in send/receive must be read-only. And they must
remain read-only from the moment they are created - we have seen quite a
lot of reports where users removed the read-only property from a subvolume
used in the past as a send source, modified it, set it read-only again and
tried to continue replication. This resulted in a complete mess on the
receive side. Also, if you try to modify destination snapshots, it will
break at some point.

The general rule: everything used for replication must remain
read-only. If you want to use (that is, modify) any snapshot that is
part of replication, clone it and use the clone.
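
As a rough illustration of this rule, a wrapper could refuse to send
unless both subvolumes currently carry the read-only property. This is
only a sketch: is_readonly and checked_send are hypothetical helper
names, and the check relies on "btrfs property get -ts <path> ro"
printing "ro=true" for a read-only subvolume.

```shell
#!/bin/sh
# Succeed only if the subvolume at $1 currently has ro=true.
is_readonly() {
    [ "$(btrfs property get -ts "$1" ro 2>/dev/null)" = "ro=true" ]
}

# Refuse to send unless both the parent and the snapshot are read-only.
# Note: this only looks at the current flag. It cannot detect a snapshot
# that was flipped to rw, modified, and flipped back to ro.
checked_send() {
    parent=$1
    snap=$2
    for sv in "$parent" "$snap"; do
        if ! is_readonly "$sv"; then
            echo "refusing: $sv is not read-only" >&2
            return 1
        fi
    done
    btrfs send -p "$parent" "$snap"
}
```

It would be used in place of a plain send, for example
"checked_send Y V | btrfs receive /mnt/backup", where /mnt/backup is
an assumed destination.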

> Because I thought it was other way around, that is I do a readonly
> snapshot W of Y and that will be the base for incrementally sending
> the future modified Y to another  FS (provided of course W is already
> there).
> 

If you want to capture the changes made to W since it was cloned from Y,
create another read-only snapshot of W and send that.

btrfs subvolume snapshot -r W V
btrfs send -p Y V

It is possible that the btrfs implementation is optimized for sequential
snapshots of the same subvolume, in which case the send stream here would
be larger. I am not familiar with these low-level details. From the naïve
end-user point of view there should be no difference between

btrfs subvolume snapshot -r W R1
btrfs send R1
modify W
btrfs subvolume snapshot -r W R2
btrfs send -p R1 R2

and

btrfs send R1
btrfs subvolume snapshot R1 W
modify W
btrfs subvolume snapshot -r W R2
btrfs send -p R1 R2


* Re: hierarchical, tree-like structure of snapshots
  2020-12-31  7:05     ` Andrei Borzenkov
@ 2020-12-31 10:00       ` Forza
  2020-12-31 16:08       ` john terragon
  1 sibling, 0 replies; 17+ messages in thread
From: Forza @ 2020-12-31 10:00 UTC (permalink / raw)
  To: Andrei Borzenkov, john terragon; +Cc: Btrfs BTRFS



On 2020-12-31 08:05, Andrei Borzenkov wrote:
> On 30.12.2020 20:39, john terragon wrote:
>> On Wed, Dec 30, 2020 at 6:24 PM sys <system@lechevalier.se> wrote:
>>>
>>>
>>>
>> [...]
>>> You should simply make a 'read-write' snapshot (Y-rw) of the 'read-only'
>>> snapshot (Y) that is part of your backup/send scheme. Do not modify
>>> read-only snapshots to be rw.
>>>
>>
>> OK, but then could I use Y as parent of the rw snapshot, let's call it
>> W, in a send?
> 
> No
> 
>> So I would have this tree where Y is still the root.
>>
>> Y-W
>>   \
>>    Z-X
>>
>> Can I do a send -p Y W ?
> 
> No. All subvolumes used in send/receive must be read-only. And they must
> remain read-only from the moment they are created - we have seen quite a
> lot of reports when users removed read-only property from subvolume used
> in the past as send source, modified it, set as read-only again and
> tried to continue replication. This resulted in complete mess on receive
> side. Also if you try to modify destination snapshots it will break at
> some point.
> 
> The general rule - everything used for replication must remain
> read-only. If you want to use any snapshot that is part of replication
> you clone it and use its clone.
> 
>> Because I thought it was other way around, that is I do a readonly
>> snapshot W of Y and that will be the base for incrementally sending
>> the future modified Y to another  FS (provided of course W is already
>> there).
>>
> 
> If you want to capture changes in W since it was cloned from Y you
> create another read-only snapshot of W and use it.
> 
> btrfs subvolume snapshot -r W V
> btrfs send -p Y V
> 
> It is possible that btrfs implementation is optimized for sequential
> snapshots from the same subvolume so the send stream size will be
> larger. I am not familiar with these low level details. From the naïve
> end-user point of view there should be no difference between
> 
> btrfs subvolume snapshot -r W R1
> btrfs send R1
> modify W
> btrfs subvolume snapshot -r W R2
> btrfs send -p R1 R2
> 
> and
> 
> btrfs send R1
> btrfs subvolume snapshot R1 W
> modify W
> btrfs subvolume snapshot -r W R2
> btrfs send -p R1 R2
> 

I think you are correct. The man page specifies that all snapshots must
be read-only, but it is rather unclear about what happens if you modify
some snapshots in between.

https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-send



* Re: hierarchical, tree-like structure of snapshots
  2020-12-31  7:05     ` Andrei Borzenkov
  2020-12-31 10:00       ` Forza
@ 2020-12-31 16:08       ` john terragon
  2020-12-31 17:28         ` Zygo Blaxell
  1 sibling, 1 reply; 17+ messages in thread
From: john terragon @ 2020-12-31 16:08 UTC (permalink / raw)
  To: Andrei Borzenkov; +Cc: sys, Btrfs BTRFS

On Thu, Dec 31, 2020 at 8:05 AM Andrei Borzenkov <arvidjaar@gmail.com> wrote:
>

> >
> > OK, but then could I use Y as parent of the rw snapshot, let's call it
> > W, in a send?
>
> No
>

Of course I didn't mean to use Y as a parent of W itself but of a
readonly snapshot of W whenever I want to send it to the second FS.

And I just tried the following steps and they worked:

1) created subvol X
2) created readonly snap Y of X
3) sent Y to second FS
4) modified X
5) created readonly snap X1 of X
6) sent -p Y X1 to second FS
7) created readwrite snap Y1 of Y
8) modified Y1
9) created readonly snap Y1_RO of Y1
10) sent -p Y Y1_RO to second FS

So, as you can see,

-in 6) I've used the RO snap Y of X as the parent of X1 (and X) to
send X1 to the second FS

-in 10) I did the opposite, Y is still used as the parent but this
time I've sent the RO snap of a subvol that is a snap of Y.

So it seems to work both ways
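
For reference, the ten steps above could be replayed roughly as
follows. This is only a sketch: replay_steps is an illustrative name,
its two arguments are hypothetical mount points of two different btrfs
filesystems, and the "modified" steps are stood in for by writing a
small file.

```shell
#!/bin/sh
# Replay of the ten steps. $1 = source filesystem, $2 = second filesystem.
replay_steps() {
    src_fs=$1
    dst_fs=$2
    btrfs subvolume create "$src_fs/X"                               # 1
    btrfs subvolume snapshot -r "$src_fs/X" "$src_fs/Y"              # 2
    btrfs send "$src_fs/Y" | btrfs receive "$dst_fs"                 # 3
    date > "$src_fs/X/changed"                                       # 4
    btrfs subvolume snapshot -r "$src_fs/X" "$src_fs/X1"             # 5
    btrfs send -p "$src_fs/Y" "$src_fs/X1" | btrfs receive "$dst_fs" # 6
    btrfs subvolume snapshot "$src_fs/Y" "$src_fs/Y1"                # 7
    date > "$src_fs/Y1/changed"                                      # 8
    btrfs subvolume snapshot -r "$src_fs/Y1" "$src_fs/Y1_RO"         # 9
    btrfs send -p "$src_fs/Y" "$src_fs/Y1_RO" | btrfs receive "$dst_fs" # 10
}
```

Y itself is never modified here; only X and the writable clone Y1 are.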


* Re: hierarchical, tree-like structure of snapshots
  2020-12-31 16:08       ` john terragon
@ 2020-12-31 17:28         ` Zygo Blaxell
  2020-12-31 18:19           ` john terragon
  0 siblings, 1 reply; 17+ messages in thread
From: Zygo Blaxell @ 2020-12-31 17:28 UTC (permalink / raw)
  To: john terragon; +Cc: Andrei Borzenkov, sys, Btrfs BTRFS


On Thu, Dec 31, 2020 at 05:08:57PM +0100, john terragon wrote:
> On Thu, Dec 31, 2020 at 8:05 AM Andrei Borzenkov <arvidjaar@gmail.com> wrote:
> >
> 
> > >
> > > OK, but then could I use Y as parent of the rw snapshot, let's call it
> > > W, in a send?
> >
> > No
> >
> 
> Of course I didn't mean to use Y as a parent of W itself but to a
> readonly snapshot of W whenever I want to send it to the second FS.
> 
> And I just tried the following steps and they worked:
> 
> 1) created subvol X
> 2) created readonly snap Y of X
> 3) sent Y to second FS
> 4) modified X
> 5) created readonly snap X1 of X
> 6) sent -p Y X1 to second FS
> 7) created readwrite snap Y1 of Y
> 8) modified Y1
> 9) created readonly snap Y1_RO of Y1
> 10) sent -p Y Y1_RO to second FS
> 
> So, as you can see,
> 
> -in 6) I've used the RO snap Y of X as the parent of X1 (and X) to
> send X1 to the second FS
> 
> -in 10) I did the opposite, Y is still used as the parent but this
> time I've sent the RO snap of a subvol that is a snap of Y.
> 
> So it seems to work both ways

I think your confusion is that you are thinking of these as a tree.
There is no tree, each subvol is an equal peer in the filesystem.

"send -p A B" just walks over subvol A and B and sends a diff of the
parts of B not in A.  You can pick any subvol with -p as long as it's
read-only and present on the receiving side.  Obviously it's much more
efficient if the two subvols have a lot of shared extents (e.g. because
B and A were both snapshots made at different times of some other subvol
C), but this is not required.
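
One way to check the "present on the receiving side" condition is to
compare UUIDs, as in the sketch below. The function names are made up,
and the parsing assumes the "UUID:" and "Received UUID:" lines printed
by "btrfs subvolume show" in recent btrfs-progs. It also assumes the
sending-side subvolume was created locally rather than received from
elsewhere.

```shell
#!/bin/sh
# Print the UUID of a subvolume, parsed from "btrfs subvolume show".
subvol_uuid() {
    btrfs subvolume show "$1" | awk '$1 == "UUID:" { print $2; exit }'
}

# Print the Received UUID of a subvolume created by "btrfs receive".
received_uuid() {
    btrfs subvolume show "$1" |
        awk '$1 == "Received" && $2 == "UUID:" { print $3; exit }'
}

# Succeed if the copy at $2 (receiving side) was received from the
# subvolume at $1 (sending side).
parent_is_present() {
    src=$(subvol_uuid "$1")
    dst=$(received_uuid "$2")
    [ -n "$src" ] && [ "$src" = "$dst" ]
}
```

For example, "parent_is_present /mnt/pool/A /mnt/backup/A" (both paths
hypothetical) before running "btrfs send -p /mnt/pool/A ...".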


* Re: hierarchical, tree-like structure of snapshots
  2020-12-31 17:28         ` Zygo Blaxell
@ 2020-12-31 18:19           ` john terragon
  2020-12-31 19:42             ` Andrei Borzenkov
  0 siblings, 1 reply; 17+ messages in thread
From: john terragon @ 2020-12-31 18:19 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: Andrei Borzenkov, sys, Btrfs BTRFS

On Thu, Dec 31, 2020 at 6:28 PM Zygo Blaxell
<ce3g8jdj@umail.furryterror.org> wrote:

>
> I think your confusion is that you are thinking of these as a tree.
> There is no tree, each subvol is an equal peer in the filesystem.
>
> "send -p A B" just walks over subvol A and B and sends a diff of the
> parts of B not in A.  You can pick any subvol with -p as long as it's
> read-only and present on the receiving side.  Obviously it's much more
> efficient if the two subvols have a lot of shared extents (e.g. because
> B and A were both snapshots made at different times of some other subvol
> C), but this is not required.

Can you really use ANY subvol with -p? Because if I

1) create a subvol X
2) create a subvol W with exactly the same content as X (but created
independently)
3) do a RO snap X_RO of X
4) do a RO snap W_RO of W
5) send W_RO to the other FS
6) send -p W_RO X_RO to the other FS

I get this:

At subvol X_RO
At snapshot X_RO
ERROR: chown o257-1648413-0 failed: No such file or directory

any idea?


* Re: hierarchical, tree-like structure of snapshots
  2020-12-31 18:19           ` john terragon
@ 2020-12-31 19:42             ` Andrei Borzenkov
  2020-12-31 20:48               ` john terragon
  0 siblings, 1 reply; 17+ messages in thread
From: Andrei Borzenkov @ 2020-12-31 19:42 UTC (permalink / raw)
  To: john terragon, Zygo Blaxell; +Cc: sys, Btrfs BTRFS

On 31.12.2020 21:19, john terragon wrote:
> On Thu, Dec 31, 2020 at 6:28 PM Zygo Blaxell
> <ce3g8jdj@umail.furryterror.org> wrote:
> 
>>
>> I think your confusion is that you are thinking of these as a tree.
>> There is no tree, each subvol is an equal peer in the filesystem.
>>
>> "send -p A B" just walks over subvol A and B and sends a diff of the
>> parts of B not in A.  You can pick any subvol with -p as long as it's
>> read-only and present on the receiving side.  Obviously it's much more
>> efficient if the two subvols have a lot of shared extents (e.g. because
>> B and A were both snapshots made at different times of some other subvol
>> C), but this is not required.
> 
> Can you really use ANY subvol to use with -p. Because if I
> 
> 1) create a subvol X
> 2) create a subvol W with the exact same content of X (but created
> independently)

How exactly did you create a subvolume with the same content? There are
many possible interpretations.

> 3) do a RO snap X_RO of X
> 4) do a RO snap W_RO of W
> 5) send W_RO to the other FS

Show the actual command, please.

> 6) send -p W_RO X_RO to the other FS
> 

Again, show the full command please, including the receive command.

> I get this:
> 
> At subvol X_RO
> At snapshot X_RO
> ERROR: chown o257-1648413-0 failed: No such file or directory
> 

Where do you get this? On the source or on the destination?

> any idea?
> 



* Re: hierarchical, tree-like structure of snapshots
  2020-12-31 19:42             ` Andrei Borzenkov
@ 2020-12-31 20:48               ` john terragon
  2020-12-31 21:36                 ` Zygo Blaxell
  0 siblings, 1 reply; 17+ messages in thread
From: john terragon @ 2020-12-31 20:48 UTC (permalink / raw)
  To: Andrei Borzenkov; +Cc: Zygo Blaxell, sys, Btrfs BTRFS

On Thu, Dec 31, 2020 at 8:42 PM Andrei Borzenkov <arvidjaar@gmail.com> wrote:
>

>
> How exactly you create subvolume with the same content? There are many
> possible interpretations.
>

Zygo wrote that any subvol could be used with -p. So, out of
curiosity, I did the following

1) btrfs sub create X
2) I unpacked some source (linux kernel) in X
3) btrfs sub create W
4) I unpacked the same source in W (so X and W have the same content
but they are independent)
5) btrfs sub snap -r X X_RO
6) btrfs sub snap -r W W_RO
7) btrfs send W_RO | btrfs receive /mnt/btrfs2
8) btrfs send -p W_RO X_RO | btrfs receive /mnt/btrfs2

And this is the exact output of 8)

At subvol X_RO
At snapshot X_RO
ERROR: chown o257-1648413-0 failed: No such file or directory


* Re: hierarchical, tree-like structure of snapshots
  2020-12-31 20:48               ` john terragon
@ 2020-12-31 21:36                 ` Zygo Blaxell
  2021-01-01  4:54                   ` john terragon
  2021-01-01 11:42                   ` Andrei Borzenkov
  0 siblings, 2 replies; 17+ messages in thread
From: Zygo Blaxell @ 2020-12-31 21:36 UTC (permalink / raw)
  To: john terragon; +Cc: Andrei Borzenkov, sys, Btrfs BTRFS


On Thu, Dec 31, 2020 at 09:48:54PM +0100, john terragon wrote:
> On Thu, Dec 31, 2020 at 8:42 PM Andrei Borzenkov <arvidjaar@gmail.com> wrote:
> >
> 
> >
> > How exactly you create subvolume with the same content? There are many
> > possible interpretations.
> >
> 
> Zygo wrote that any subvol could be used with -p. So, out of
> curiosity, I did the following
> 
> 1) btrfs sub create X
> 2) I unpacked some source (linux kernel) in X
> 3) btrfs sub create W
> 4) I unpacked the same source in W (so X and W have the same content
> but they are independent)
> 5) btrfs sub snap -r X X_RO
> 6) btrfs sub snap -r W W_RO
> 7) btrfs send W_RO | btrfs receive /mnt/btrfs2
> 8) btrfs send -p W_RO X_RO | btrfs receive /mnt/btrfs2
> 
> And this is the exact output of 8)
> 
> At subvol X_RO
> At snapshot X_RO
> ERROR: chown o257-1648413-0 failed: No such file or directory

Yeah, I only checked that send completed without error and produced a
smaller stream.

I just dumped the send metadata stream from the incremental snapshot now,
and it's more or less garbage at the start:

	# btrfs sub create A
	# btrfs sub create B
	# date > A/date
	# date > B/date
	# mkdir A/t B/u
	# btrfs sub snap -r A A_RO
	# btrfs sub snap -r B B_RO
	# btrfs send A_RO | btrfs receive --dump
	At subvol A_RO
	subvol          ./A_RO                          uuid=995adde4-00ac-5e49-8c6f-f01743def072 transid=7329268
	chown           ./A_RO/                         gid=0 uid=0
	chmod           ./A_RO/                         mode=755
	utimes          ./A_RO/                         atime=2020-12-31T15:51:31-0500 mtime=2020-12-31T15:51:48-0500 ctime=2020-12-31T15:51:48-0500
	mkfile          ./A_RO/o257-7329268-0
	rename          ./A_RO/o257-7329268-0           dest=./A_RO/date
	utimes          ./A_RO/                         atime=2020-12-31T15:51:31-0500 mtime=2020-12-31T15:51:48-0500 ctime=2020-12-31T15:51:48-0500
	write           ./A_RO/date                     offset=0 len=29
	chown           ./A_RO/date                     gid=0 uid=0
	chmod           ./A_RO/date                     mode=644
	utimes          ./A_RO/date                     atime=2020-12-31T15:51:38-0500 mtime=2020-12-31T15:51:38-0500 ctime=2020-12-31T15:51:38-0500
	mkdir           ./A_RO/o258-7329268-0
	rename          ./A_RO/o258-7329268-0           dest=./A_RO/t
	utimes          ./A_RO/                         atime=2020-12-31T15:51:31-0500 mtime=2020-12-31T15:51:48-0500 ctime=2020-12-31T15:51:48-0500
	chown           ./A_RO/t                        gid=0 uid=0
	chmod           ./A_RO/t                        mode=755
	utimes          ./A_RO/t                        atime=2020-12-31T15:51:48-0500 mtime=2020-12-31T15:51:48-0500 ctime=2020-12-31T15:51:48-0500
	# btrfs send B_RO -p A_RO | btrfs receive --dump
	At subvol B_RO
	snapshot        ./B_RO                          uuid=4aa7db26-b219-694e-9b3c-f8f737a46bdb transid=7329268 parent_uuid=995adde4-00ac-5e49-8c6f-f01743def072 parent_transid=7329268
	utimes          ./B_RO/                         atime=2020-12-31T15:51:33-0500 mtime=2020-12-31T15:51:52-0500 ctime=2020-12-31T15:51:52-0500
	link            ./B_RO/date                     dest=date
	unlink          ./B_RO/date
	utimes          ./B_RO/                         atime=2020-12-31T15:51:33-0500 mtime=2020-12-31T15:51:52-0500 ctime=2020-12-31T15:51:52-0500
	write           ./B_RO/date                     offset=0 len=29
	utimes          ./B_RO/date                     atime=2020-12-31T15:51:41-0500 mtime=2020-12-31T15:51:41-0500 ctime=2020-12-31T15:51:41-0500
	rename          ./B_RO/t                        dest=./B_RO/u
	utimes          ./B_RO/                         atime=2020-12-31T15:51:33-0500 mtime=2020-12-31T15:51:52-0500 ctime=2020-12-31T15:51:52-0500
	utimes          ./B_RO/u                        atime=2020-12-31T15:51:52-0500 mtime=2020-12-31T15:51:52-0500 ctime=2020-12-31T15:51:52-0500
	# btrfs send A_RO | btrfs receive -v /tmp/test
	At subvol A_RO
	At subvol A_RO
	receiving subvol A_RO uuid=995adde4-00ac-5e49-8c6f-f01743def072, stransid=7329268
	write date - offset=0 length=29
	BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=995adde4-00ac-5e49-8c6f-f01743def072, stransid=7329268
	# btrfs send B_RO -p A_RO | btrfs receive -v /tmp/test
	At subvol B_RO
	At snapshot B_RO
	receiving snapshot B_RO uuid=4aa7db26-b219-694e-9b3c-f8f737a46bdb, ctransid=7329268 parent_uuid=995adde4-00ac-5e49-8c6f-f01743def072, parent_ctransid=7329268
	ERROR: link date -> date failed: File exists

The btrfs_compare_trees function can handle arbitrary tree differences,
but something happens in one of the support functions and we get a
bogus link command.  The rest of the stream is OK though:  we fill
in the contents of B_RO/date, rename A_RO/t to B_RO/u, and update all
the timestamps.
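
To look at an incremental stream in this way before trusting it,
something like the following sketch could be used. The function names
are illustrative. --no-data makes send omit file contents, and
"receive --dump" prints the stream commands instead of applying them.
Listing the link commands is just a convenience for manual review,
since legitimate hard links produce link commands too.

```shell
#!/bin/sh
# Dump the metadata of an incremental stream without applying it.
dump_incremental() {
    parent=$1
    snap=$2
    btrfs send --no-data -p "$parent" "$snap" | btrfs receive --dump
}

# Filter a dump read from stdin down to its "link" commands.
list_links() {
    awk '$1 == "link"'
}
```

For example, "dump_incremental A_RO B_RO | list_links" would print the
link line seen in the dump above.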

Oh well, I didn't say send didn't have any bugs.  ;)


* Re: hierarchical, tree-like structure of snapshots
  2020-12-31 21:36                 ` Zygo Blaxell
@ 2021-01-01  4:54                   ` john terragon
  2021-01-01 11:42                   ` Andrei Borzenkov
  1 sibling, 0 replies; 17+ messages in thread
From: john terragon @ 2021-01-01  4:54 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: Andrei Borzenkov, sys, Btrfs BTRFS

Although I'm glad that a bug has been uncovered, maybe it's best if I
stick with good old rsync for backups.
It would be kind of ironic if the first data loss that I experienced
in many years of btrfs use would be caused by an ancillary backup
tool.

On Thu, Dec 31, 2020 at 10:36 PM Zygo Blaxell
<ce3g8jdj@umail.furryterror.org> wrote:
>
> On Thu, Dec 31, 2020 at 09:48:54PM +0100, john terragon wrote:
> > On Thu, Dec 31, 2020 at 8:42 PM Andrei Borzenkov <arvidjaar@gmail.com> wrote:
> > >
> >
> > >
> > > How exactly you create subvolume with the same content? There are many
> > > possible interpretations.
> > >
> >
> > Zygo wrote that any subvol could be used with -p. So, out of
> > curiosity, I did the following
> >
> > 1) btrfs sub create X
> > 2) I unpacked some source (linux kernel) in X
> > 3) btrfs sub create W
> > 4) I unpacked the same source in W (so X and W have the same content
> > but they are independent)
> > 5) btrfs sub snap -r X X_RO
> > 6) btrfs sub snap -r W W_RO
> > 7) btrfs send W_RO | btrfs receive /mnt/btrfs2
> > 8) btrfs send -p W_RO X_RO | btrfs receive /mnt/btrfs2
> >
> > And this is the exact output of 8)
> >
> > At subvol X_RO
> > At snapshot X_RO
> > ERROR: chown o257-1648413-0 failed: No such file or directory
>
> Yeah, I only checked that send completed without error and produced a
> smaller stream.
>
> I just dumped the send metadata stream from the incremental snapshot now,
> and it's more or less garbage at the start:
>
>         # btrfs sub create A
>         # btrfs sub create B
>         # date > A/date
>         # date > B/date
>         # mkdir A/t B/u
>         # btrfs sub snap -r A A_RO
>         # btrfs sub snap -r B B_RO
>         # btrfs send A_RO | btrfs receive --dump
>         At subvol A_RO
>         subvol          ./A_RO                          uuid=995adde4-00ac-5e49-8c6f-f01743def072 transid=7329268
>         chown           ./A_RO/                         gid=0 uid=0
>         chmod           ./A_RO/                         mode=755
>         utimes          ./A_RO/                         atime=2020-12-31T15:51:31-0500 mtime=2020-12-31T15:51:48-0500 ctime=2020-12-31T15:51:48-0500
>         mkfile          ./A_RO/o257-7329268-0
>         rename          ./A_RO/o257-7329268-0           dest=./A_RO/date
>         utimes          ./A_RO/                         atime=2020-12-31T15:51:31-0500 mtime=2020-12-31T15:51:48-0500 ctime=2020-12-31T15:51:48-0500
>         write           ./A_RO/date                     offset=0 len=29
>         chown           ./A_RO/date                     gid=0 uid=0
>         chmod           ./A_RO/date                     mode=644
>         utimes          ./A_RO/date                     atime=2020-12-31T15:51:38-0500 mtime=2020-12-31T15:51:38-0500 ctime=2020-12-31T15:51:38-0500
>         mkdir           ./A_RO/o258-7329268-0
>         rename          ./A_RO/o258-7329268-0           dest=./A_RO/t
>         utimes          ./A_RO/                         atime=2020-12-31T15:51:31-0500 mtime=2020-12-31T15:51:48-0500 ctime=2020-12-31T15:51:48-0500
>         chown           ./A_RO/t                        gid=0 uid=0
>         chmod           ./A_RO/t                        mode=755
>         utimes          ./A_RO/t                        atime=2020-12-31T15:51:48-0500 mtime=2020-12-31T15:51:48-0500 ctime=2020-12-31T15:51:48-0500
>         # btrfs send B_RO -p A_RO | btrfs receive --dump
>         At subvol B_RO
>         snapshot        ./B_RO                          uuid=4aa7db26-b219-694e-9b3c-f8f737a46bdb transid=7329268 parent_uuid=995adde4-00ac-5e49-8c6f-f01743def072 parent_transid=7329268
>         utimes          ./B_RO/                         atime=2020-12-31T15:51:33-0500 mtime=2020-12-31T15:51:52-0500 ctime=2020-12-31T15:51:52-0500
>         link            ./B_RO/date                     dest=date
>         unlink          ./B_RO/date
>         utimes          ./B_RO/                         atime=2020-12-31T15:51:33-0500 mtime=2020-12-31T15:51:52-0500 ctime=2020-12-31T15:51:52-0500
>         write           ./B_RO/date                     offset=0 len=29
>         utimes          ./B_RO/date                     atime=2020-12-31T15:51:41-0500 mtime=2020-12-31T15:51:41-0500 ctime=2020-12-31T15:51:41-0500
>         rename          ./B_RO/t                        dest=./B_RO/u
>         utimes          ./B_RO/                         atime=2020-12-31T15:51:33-0500 mtime=2020-12-31T15:51:52-0500 ctime=2020-12-31T15:51:52-0500
>         utimes          ./B_RO/u                        atime=2020-12-31T15:51:52-0500 mtime=2020-12-31T15:51:52-0500 ctime=2020-12-31T15:51:52-0500
>         # btrfs send A_RO | btrfs receive -v /tmp/test
>         At subvol A_RO
>         At subvol A_RO
>         receiving subvol A_RO uuid=995adde4-00ac-5e49-8c6f-f01743def072, stransid=7329268
>         write date - offset=0 length=29
>         BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=995adde4-00ac-5e49-8c6f-f01743def072, stransid=7329268
>         # btrfs send B_RO -p A_RO | btrfs receive -v /tmp/test
>         At subvol B_RO
>         At snapshot B_RO
>         receiving snapshot B_RO uuid=4aa7db26-b219-694e-9b3c-f8f737a46bdb, ctransid=7329268 parent_uuid=995adde4-00ac-5e49-8c6f-f01743def072, parent_ctransid=7329268
>         ERROR: link date -> date failed: File exists
>
> The btrfs_compare_trees function can handle arbitrary tree differences,
> but something happens in one of the support functions and we get a
> bogus link command.  The rest of the stream is OK though:  we fill
> in the contents of B_RO/date, rename A_RO/t to B_RO/u, and update all
> the timestamps.
>
> Oh well, I didn't say send didn't have any bugs.  ;)

* Re: hierarchical, tree-like structure of snapshots
  2020-12-31 21:36                 ` Zygo Blaxell
  2021-01-01  4:54                   ` john terragon
@ 2021-01-01 11:42                   ` Andrei Borzenkov
  2021-01-01 20:40                     ` Andrei Borzenkov
  1 sibling, 1 reply; 17+ messages in thread
From: Andrei Borzenkov @ 2021-01-01 11:42 UTC
  To: Zygo Blaxell, john terragon; +Cc: sys, Btrfs BTRFS


01.01.2021 00:36, Zygo Blaxell wrote:
...
> 
> Yeah, I only checked that send completed without error and produced a
> smaller stream.
> 
> I just dumped the send metadata stream from the incremental snapshot now,
> and it's more or less garbage at the start:
> 
> 	# btrfs sub create A
> 	# btrfs sub create B
> 	# date > A/date
> 	# date > B/date
> 	# mkdir A/t B/u
> 	# btrfs sub snap -r A A_RO
> 	# btrfs sub snap -r B B_RO
...
> 	# btrfs send A_RO | btrfs receive -v /tmp/test
> 	At subvol A_RO
> 	At subvol A_RO
> 	receiving subvol A_RO uuid=995adde4-00ac-5e49-8c6f-f01743def072, stransid=7329268
> 	write date - offset=0 length=29
> 	BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=995adde4-00ac-5e49-8c6f-f01743def072, stransid=7329268
> 	# btrfs send B_RO -p A_RO | btrfs receive -v /tmp/test
> 	At subvol B_RO
> 	At snapshot B_RO
> 	receiving snapshot B_RO uuid=4aa7db26-b219-694e-9b3c-f8f737a46bdb, ctransid=7329268 parent_uuid=995adde4-00ac-5e49-8c6f-f01743def072, parent_ctransid=7329268
> 	ERROR: link date -> date failed: File exists
> 
> The btrfs_compare_trees function can handle arbitrary tree differences,

I am not sure. It apparently relies on inode numbers always increasing
monotonically. This is probably true for clones of the same subvolume
(I assume a clone inherits highest_objectid), but two subvolumes
created independently have the same range of inode numbers.

Also, I am not sure whether using a later clone as the base for a
difference against an earlier clone will work, for the same reason.

> but something happens in one of the support functions and we get a
> bogus link command.  The rest of the stream is OK though:  we fill
> in the contents of B_RO/date, rename A_RO/t to B_RO/u, and update all
> the timestamps.
> 
> Oh well, I didn't say send didn't have any bugs.  ;)
> 



* Re: hierarchical, tree-like structure of snapshots
  2021-01-01 11:42                   ` Andrei Borzenkov
@ 2021-01-01 20:40                     ` Andrei Borzenkov
  2021-01-01 23:11                       ` Zygo Blaxell
  0 siblings, 1 reply; 17+ messages in thread
From: Andrei Borzenkov @ 2021-01-01 20:40 UTC
  To: Zygo Blaxell, john terragon; +Cc: sys, Btrfs BTRFS


01.01.2021 14:42, Andrei Borzenkov wrote:
> 01.01.2021 00:36, Zygo Blaxell wrote:
> ...
>>
>> Yeah, I only checked that send completed without error and produced a
>> smaller stream.
>>
>> I just dumped the send metadata stream from the incremental snapshot now,
>> and it's more or less garbage at the start:
>>
>> 	# btrfs sub create A
>> 	# btrfs sub create B
>> 	# date > A/date
>> 	# date > B/date
>> 	# mkdir A/t B/u
>> 	# btrfs sub snap -r A A_RO
>> 	# btrfs sub snap -r B B_RO
> ...
>> 	# btrfs send A_RO | btrfs receive -v /tmp/test
>> 	At subvol A_RO
>> 	At subvol A_RO
>> 	receiving subvol A_RO uuid=995adde4-00ac-5e49-8c6f-f01743def072, stransid=7329268
>> 	write date - offset=0 length=29
>> 	BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=995adde4-00ac-5e49-8c6f-f01743def072, stransid=7329268
>> 	# btrfs send B_RO -p A_RO | btrfs receive -v /tmp/test
>> 	At subvol B_RO
>> 	At snapshot B_RO
>> 	receiving snapshot B_RO uuid=4aa7db26-b219-694e-9b3c-f8f737a46bdb, ctransid=7329268 parent_uuid=995adde4-00ac-5e49-8c6f-f01743def072, parent_ctransid=7329268
>> 	ERROR: link date -> date failed: File exists
>>
>> The btrfs_compare_trees function can handle arbitrary tree differences,
> 
> I am not sure. It apparently relies on inode numbers always increasing
> monotonically. This is probably true for clones of the same subvolume
> (I assume a clone inherits highest_objectid), but two subvolumes
> created independently have the same range of inode numbers.
> 

In particular, in your example A/date and B/date have identical
inode numbers, and in general their INODE_ITEMs are identical (including
generation numbers) apart from the times, so the two inodes compare as
changed. At the same time, their INODE_REFs are considered different
because the INODE_ITEMs for root have different generations. This leads
to a code path that attempts to create an additional alias for an
existing inode; as it is a regular file, it tries to link it. It does
not actually compare ref names at this point at all.

This would not really be possible if A and B were clones of the same
subvolume (not necessarily consecutive), as A/date and B/date would
always have different inode numbers.
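
A toy model may make the inode-number argument concrete. It is only a
sketch, not the kernel's allocator: the Subvol class and its methods are
invented for illustration, and it assumes (as guessed earlier in this
thread) that a snapshot inherits highest_objectid and that the first
file in a fresh subvolume gets inode 257.

```python
from dataclasses import dataclass, field

# Inode number of a subvolume's root directory; new files come after it.
FIRST_FREE_OBJECTID = 256


@dataclass
class Subvol:
    """Toy subvolume: a name -> inode-number map plus an allocation cursor."""
    highest_objectid: int = FIRST_FREE_OBJECTID
    files: dict = field(default_factory=dict)

    def create(self, name):
        # Allocate the next inode number; numbers are never reused here.
        self.highest_objectid += 1
        self.files[name] = self.highest_objectid
        return self.files[name]

    def unlink(self, name):
        del self.files[name]

    def snapshot(self):
        # Assumption: the snapshot inherits the allocation cursor.
        return Subvol(self.highest_objectid, dict(self.files))


# Two independently created subvolumes allocate from the same start.
a, b = Subvol(), Subvol()
ino_a = a.create("date")
ino_b = b.create("date")

# Two snapshots of ONE subvolume, with "date" replaced in between.
s = Subvol()
s.create("date")
first = s.snapshot()
s.unlink("date")
s.create("date")
second = s.snapshot()

print(ino_a, ino_b)                               # 257 257
print(first.files["date"], second.files["date"])  # 257 258
```

In the independent case the two unrelated files share one inode number,
which is the situation in the A_RO/B_RO example; in the snapshot case
the replaced file gets a fresh number.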

If I force different generation numbers for A/date and B/date (by
syncing in between), the send stream contains the correct sequence:
removing the old B/date (from the A clone) and re-creating it.

This shows that, unfortunately, generation numbers are not reliable for
differentiating between different object generations (pun unintended).
As I understand it, a generation is tied to a transaction, and multiple
changes can be packed into one transaction.
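
The effect of the sync can be sketched as a decision on the pair (inode
number, generation). This is a simplified illustration of the behaviour
described above, not the logic in send.c; the Inode tuple, the classify
function and the generation values 100 and 101 are all made up.

```python
from collections import namedtuple

# Hypothetical, minimal stand-in for the fields of an INODE_ITEM that matter here.
Inode = namedtuple("Inode", "ino generation")


def classify(parent, child):
    """Toy decision for one inode number that is present in both trees."""
    if parent.ino != child.ino:
        raise ValueError("only inodes with the same number are compared")
    if parent.generation == child.generation:
        # Indistinguishable from one inode that was modified in place.
        return "same inode, changed in place"
    # Different generation: the number was reused for a new object.
    return "old inode deleted, new inode created"


# Both files written in the same transaction: same generation.
same_txn = classify(Inode(257, 100), Inode(257, 100))

# A sync between the two writes puts them in different transactions.
synced = classify(Inode(257, 100), Inode(257, 101))

print(same_txn)
print(synced)
```

With equal generations nothing in the pair distinguishes two unrelated
files from one file that changed, which is consistent with the bogus
link command; with a sync in between the pair differs and the
delete-and-recreate sequence is produced.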

> Also, I am not sure whether using a later clone as the base for a
> difference against an earlier clone will work, for the same reason.
> 
>> but something happens in one of the support functions and we get a
>> bogus link command.  The rest of the stream is OK though:  we fill
>> in the contents of B_RO/date, rename A_RO/t to B_RO/u, and update all
>> the timestamps.
>>
>> Oh well, I didn't say send didn't have any bugs.  ;)
>>
> 




* Re: hierarchical, tree-like structure of snapshots
  2021-01-01 20:40                     ` Andrei Borzenkov
@ 2021-01-01 23:11                       ` Zygo Blaxell
  2021-01-02  9:25                         ` Andrei Borzenkov
  0 siblings, 1 reply; 17+ messages in thread
From: Zygo Blaxell @ 2021-01-01 23:11 UTC
  To: Andrei Borzenkov; +Cc: john terragon, sys, Btrfs BTRFS

On Fri, Jan 01, 2021 at 11:40:28PM +0300, Andrei Borzenkov wrote:
> 01.01.2021 14:42, Andrei Borzenkov wrote:
> > 01.01.2021 00:36, Zygo Blaxell wrote:
> > ...
> >>
> >> Yeah, I only checked that send completed without error and produced a
> >> smaller stream.
> >>
> >> I just dumped the send metadata stream from the incremental snapshot now,
> >> and it's more or less garbage at the start:
> >>
> >> 	# btrfs sub create A
> >> 	# btrfs sub create B
> >> 	# date > A/date
> >> 	# date > B/date
> >> 	# mkdir A/t B/u
> >> 	# btrfs sub snap -r A A_RO
> >> 	# btrfs sub snap -r B B_RO
> > ...
> >> 	# btrfs send A_RO | btrfs receive -v /tmp/test
> >> 	At subvol A_RO
> >> 	At subvol A_RO
> >> 	receiving subvol A_RO uuid=995adde4-00ac-5e49-8c6f-f01743def072, stransid=7329268
> >> 	write date - offset=0 length=29
> >> 	BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=995adde4-00ac-5e49-8c6f-f01743def072, stransid=7329268
> >> 	# btrfs send B_RO -p A_RO | btrfs receive -v /tmp/test
> >> 	At subvol B_RO
> >> 	At snapshot B_RO
> >> 	receiving snapshot B_RO uuid=4aa7db26-b219-694e-9b3c-f8f737a46bdb, ctransid=7329268 parent_uuid=995adde4-00ac-5e49-8c6f-f01743def072, parent_ctransid=7329268
> >> 	ERROR: link date -> date failed: File exists
> >>
> >> The btrfs_compare_trees function can handle arbitrary tree differences,
> > 
> > I am not sure. It apparently relies on inode numbers always increasing
> > monotonically. This is probably true for clones of the same subvolume
> > (I assume a clone inherits highest_objectid), but two subvolumes
> > created independently have the same range of inode numbers.
> > 
> 
> In particular, in your example A/date and B/date have identical
> inode numbers, and in general their INODE_ITEMs are identical (including
> generation numbers) apart from the times, so the two inodes compare as
> changed. At the same time, their INODE_REFs are considered different
> because the INODE_ITEMs for root have different generations. This leads
> to a code path that attempts to create an additional alias for an
> existing inode; as it is a regular file, it tries to link it. It does
> not actually compare ref names at this point at all.
> 
> This would not really be possible if A and B were clones of the same
> subvolume (not necessarily consecutive), as A/date and B/date would
> always have different inode numbers.

After v5.11-rc1 inode_cache can no longer be used, but any filesystem that
has inode_cache in its history might have cases like this hiding in
metadata even with a linear series of snapshots.

The send code is mostly used to transmit linear sequences of snapshots
(a series of snapshots which capture the state of a single subvol at
different times, ordered from oldest to newest) between machines that
are not using the inode_cache mount option.  Any other case isn't getting
very well tested in the field, even if it happens to work sometimes.
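
For the well-trodden linear case, parent selection can be sketched as
follows. This is only an illustration: plan_sends and the snapshot names
are hypothetical, and the plan assumes every earlier send in the list
succeeds on the receiving side. Only the `btrfs send [-p parent] subvol`
command form is taken from this thread.

```python
def plan_sends(snapshots, already_sent):
    """Plan sends for a linear snapshot series.

    snapshots:    names ordered from oldest to newest
    already_sent: names known to exist on the destination
    Returns a list of argv lists, one per snapshot still to be sent.
    """
    sent = set(already_sent)
    parent = None
    commands = []
    for snap in snapshots:
        if snap in sent:
            # Newest snapshot seen so far that the destination already has.
            parent = snap
            continue
        if parent is None:
            # Nothing in common with the destination yet: full send.
            commands.append(["btrfs", "send", snap])
        else:
            # Incremental send against the immediately preceding snapshot.
            commands.append(["btrfs", "send", "-p", parent, snap])
        # Assumes this send succeeds, so it can serve as the next parent.
        parent = snap
    return commands


plan = plan_sends(["snap.1", "snap.2", "snap.3"], {"snap.1"})
for argv in plan:
    print(" ".join(argv))
# btrfs send -p snap.1 snap.2
# btrfs send -p snap.2 snap.3
```

Each parent is always an older snapshot of the same subvolume, which is
the only arrangement described here as getting real field testing.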

> If I force different generation numbers for A/date and B/date (by
> syncing in between), the send stream contains the correct sequence:
> removing the old B/date (from the A clone) and re-creating it.
>
> This shows that, unfortunately, generation numbers are not reliable for
> differentiating between different object generations (pun unintended).
> As I understand it, a generation is tied to a transaction, and multiple
> changes can be packed into one transaction.

I'm pretty sure that the 6000+ lines of special-case code in send.c still
don't cover every possible case, or even all of the likely ones, even
with linear snapshot sequences.  We still get people on IRC reporting
strange receive issues, and usually the best solution we can find is
to start over with a new full send.  That's OK for small filesystems,
but when you have to unexpectedly do a full send of dozens of terabytes
over a medium-speed link, it's probably time to switch to rsync.

Subversion used to have problems like this (maybe it still does, I
switched to git years ago) where a complicated commit that combined
multiple operations on objects of the same name would break the tool.
I'm surprised btrfs is trying to do similar things in the kernel
(though with the current send implementation there's nowhere else we
could do them).  At least for fsync we get to say "nope, too hard,
do a full commit instead" when complications arise.

> > Also, I am not sure whether using a later clone as the base for a
> > difference against an earlier clone will work, for the same reason.

That use case can come up e.g. if you have snapshots of / and you roll
back to an earlier snapshot after a bad upgrade, but your backups are
using incremental snapshots made from '/'.  Then the last-sent-snapshot
(from the bad upgrade) is newer than the origin subvol (from an earlier
good upgrade, with new modifications on top).

Cases like these really need to work, or at least reliably throw
errors when they have failed, as the application that rolls back to
earlier snapshots might have no knowledge of the application that does
incremental send backups on a user's system if they integrated tools
from different vendors.
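
One conservative check a backup tool could make before an incremental
send is sketched below. It is a heuristic under stated assumptions, not
something send does today: same_lineage and the dict layout are
hypothetical, the UUIDs are made up, and the only btrfs fact relied on
is that a snapshot records the UUID of the subvolume it was taken from
(shown as "Parent UUID" by `btrfs subvolume show`).

```python
def same_lineage(last_sent, candidate):
    """Return True if both snapshots record the same source subvolume.

    Each argument is a dict with a 'parent_uuid' key (hypothetical layout).
    """
    parent = last_sent.get("parent_uuid")
    return parent is not None and parent == candidate.get("parent_uuid")


# Made-up UUIDs, shortened for readability.
snap1 = {"uuid": "s1", "parent_uuid": "root-old"}  # before the bad upgrade
snap2 = {"uuid": "s2", "parent_uuid": "root-old"}  # after the bad upgrade

# Rollback: a new writable root is created from snap1, then snapshotted.
new_root = {"uuid": "root-new", "parent_uuid": "s1"}
snap3 = {"uuid": "s3", "parent_uuid": "root-new"}

linear_ok = same_lineage(snap1, snap2)
after_rollback = same_lineage(snap2, snap3)

print(linear_ok)       # True
print(after_rollback)  # False
```

A False result does not prove that an incremental send would fail; it
only flags that the series is no longer linear, so the tool could fall
back to a full send or report an error instead of proceeding silently.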

> >> but something happens in one of the support functions and we get a
> >> bogus link command.  The rest of the stream is OK though:  we fill
> >> in the contents of B_RO/date, rename A_RO/t to B_RO/u, and update all
> >> the timestamps.
> >>
> >> Oh well, I didn't say send didn't have any bugs.  ;)
> >>
> > 
> 
> 





* Re: hierarchical, tree-like structure of snapshots
  2021-01-01 23:11                       ` Zygo Blaxell
@ 2021-01-02  9:25                         ` Andrei Borzenkov
  0 siblings, 0 replies; 17+ messages in thread
From: Andrei Borzenkov @ 2021-01-02  9:25 UTC
  To: Zygo Blaxell; +Cc: john terragon, sys, Btrfs BTRFS


02.01.2021 02:11, Zygo Blaxell wrote:
> On Fri, Jan 01, 2021 at 11:40:28PM +0300, Andrei Borzenkov wrote:
>> 01.01.2021 14:42, Andrei Borzenkov wrote:
>>> 01.01.2021 00:36, Zygo Blaxell wrote:
>>> ...
>>>>
>>>> Yeah, I only checked that send completed without error and produced a
>>>> smaller stream.
>>>>
>>>> I just dumped the send metadata stream from the incremental snapshot now,
>>>> and it's more or less garbage at the start:
>>>>
>>>> 	# btrfs sub create A
>>>> 	# btrfs sub create B
>>>> 	# date > A/date
>>>> 	# date > B/date
>>>> 	# mkdir A/t B/u
>>>> 	# btrfs sub snap -r A A_RO
>>>> 	# btrfs sub snap -r B B_RO
>>> ...
>>>> 	# btrfs send A_RO | btrfs receive -v /tmp/test
>>>> 	At subvol A_RO
>>>> 	At subvol A_RO
>>>> 	receiving subvol A_RO uuid=995adde4-00ac-5e49-8c6f-f01743def072, stransid=7329268
>>>> 	write date - offset=0 length=29
>>>> 	BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=995adde4-00ac-5e49-8c6f-f01743def072, stransid=7329268
>>>> 	# btrfs send B_RO -p A_RO | btrfs receive -v /tmp/test
>>>> 	At subvol B_RO
>>>> 	At snapshot B_RO
>>>> 	receiving snapshot B_RO uuid=4aa7db26-b219-694e-9b3c-f8f737a46bdb, ctransid=7329268 parent_uuid=995adde4-00ac-5e49-8c6f-f01743def072, parent_ctransid=7329268
>>>> 	ERROR: link date -> date failed: File exists
>>>>
>>>> The btrfs_compare_trees function can handle arbitrary tree differences,
>>>
>>> I am not sure. It apparently relies on inode numbers always increasing
>>> monotonically. This is probably true for clones of the same subvolume
>>> (I assume a clone inherits highest_objectid), but two subvolumes
>>> created independently have the same range of inode numbers.
>>>
>>
>> In particular, in your example A/date and B/date have identical
>> inode numbers, and in general their INODE_ITEMs are identical (including
>> generation numbers) apart from the times, so the two inodes compare as
>> changed. At the same time, their INODE_REFs are considered different
>> because the INODE_ITEMs for root have different generations. This leads
>> to a code path that attempts to create an additional alias for an
>> existing inode; as it is a regular file, it tries to link it. It does
>> not actually compare ref names at this point at all.
>>
>> This would not really be possible if A and B were clones of the same
>> subvolume (not necessarily consecutive), as A/date and B/date would
>> always have different inode numbers.
> 
> After v5.11-rc1 inode_cache can no longer be used, but any filesystem that
> has inode_cache in its history might have cases like this hiding in
> metadata even with a linear series of snapshots.
> 
> The send code is mostly used to transmit linear sequences of snapshots
> (a series of snapshots which capture the state of a single subvol at
> different times, ordered from oldest to newest) between machines that
> are not using the inode_cache mount option.  Any other case isn't getting
> very well tested in the field, even if it happens to work sometimes.
> 

This is the only possible way to do it in NetApp and ZFS. But NetApp is
really much more usable than just that (I am not saying ZFS is not, I
just have less experience with it). It retains a unique identification
of every snapshot that is transferred, so you can reverse replication
and then reverse it back; clone both source and destination volumes
(which clones their snapshots) and continue incremental replication of
the clones starting from an arbitrary snapshot pair; cascade
replications (including fanning them out); and resume incremental
replication between an arbitrary pair of systems in a replication
cascade. None of this is even remotely possible in btrfs. Replication
in btrfs is not really suitable for anything more than offsite backup.

>> If I force different generation numbers for A/date and B/date (by
>> syncing in between), the send stream contains the correct sequence:
>> removing the old B/date (from the A clone) and re-creating it.
>>
>> This shows that, unfortunately, generation numbers are not reliable for
>> differentiating between different object generations (pun unintended).
>> As I understand it, a generation is tied to a transaction, and multiple
>> changes can be packed into one transaction.
> 
> I'm pretty sure that the 6000+ lines of special-case code in send.c still
> don't cover every possible case, or even all of the likely ones, even
> with linear snapshot sequences.  We still get people on IRC reporting
> strange receive issues, and usually the best solution we can find is
> to start over with a new full send.  That's OK for small filesystems,
> but when you have to unexpectedly do a full send of dozens of terabytes
> over a medium-speed link, it's probably time to switch to rsync.
> 
> Subversion used to have problems like this (maybe it still does, I
> switched to git years ago) where a complicated commit that combined
> multiple operations on objects of the same name would break the tool.
> I'm surprised btrfs is trying to do similar things in the kernel
> (though with the current send implementation there's nowhere else we
> could do them).  At least for fsync we get to say "nope, too hard,
> do a full commit instead" when complications arise.
> 
>>> Also, I am not sure whether using a later clone as the base for a
>>> difference against an earlier clone will work, for the same reason.
> 
> That use case can come up e.g. if you have snapshots of / and you roll
> back to an earlier snapshot after a bad upgrade, but your backups are
> using incremental snapshots made from '/'.  Then the last-sent-snapshot
> (from the bad upgrade) is newer than the origin subvol (from an earlier
> good upgrade, with new modifications on top).
> 

btrfs does not even support rollback. What is called "rollback" today
works only for the root subvolume and is not really a rollback but a
switch to a different copy.

> Cases like these really need to work, or at least reliably throw
> errors when they have failed, 

Of course :)

> as the application that rolls back to
> earlier snapshots might have no knowledge of the application that does
> incremental send backups on a user's system if they integrated tools
> from different vendors.
> 
>>>> but something happens in one of the support functions and we get a
>>>> bogus link command.  The rest of the stream is OK though:  we fill
>>>> in the contents of B_RO/date, rename A_RO/t to B_RO/u, and update all
>>>> the timestamps.
>>>>
>>>> Oh well, I didn't say send didn't have any bugs.  ;)
>>>>
>>>
>>
>>
> 
> 
> 




Thread overview: 17+ messages
2020-12-30 16:56 hierarchical, tree-like structure of snapshots john terragon
2020-12-30 17:03 ` john terragon
2020-12-30 17:24 ` sys
2020-12-30 17:39   ` john terragon
2020-12-31  7:05     ` Andrei Borzenkov
2020-12-31 10:00       ` Forza
2020-12-31 16:08       ` john terragon
2020-12-31 17:28         ` Zygo Blaxell
2020-12-31 18:19           ` john terragon
2020-12-31 19:42             ` Andrei Borzenkov
2020-12-31 20:48               ` john terragon
2020-12-31 21:36                 ` Zygo Blaxell
2021-01-01  4:54                   ` john terragon
2021-01-01 11:42                   ` Andrei Borzenkov
2021-01-01 20:40                     ` Andrei Borzenkov
2021-01-01 23:11                       ` Zygo Blaxell
2021-01-02  9:25                         ` Andrei Borzenkov