* btrfs-send format that contains binary diffs
@ 2021-03-29 13:16 Claudius Heine
  2021-03-29 16:30 ` Andrei Borzenkov
  0 siblings, 1 reply; 13+ messages in thread
From: Claudius Heine @ 2021-03-29 13:16 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Henning Schild

Hi,

I am currently investigating the possibility of using `btrfs-stream` files 
(generated by `btrfs send`) for deploying an image-based update to 
systems (probably embedded ones).

One of the issues I encountered here is that btrfs-send does not use any 
diff algorithm on files that have changed from one snapshot to the next.

One way to implement this would be to add some sort of 'patch' command 
to the `btrfs-stream` format.

Is this something upstream would be interested in?

Let's say we introduce a new `btrfs-send` format, let's call it 
`btrfs-delta-stream`, which can be created from a `btrfs-stream`:

1. For all `write` commands, check the requirements:
    - Does the file already exist in the old snapshot?
    - Is the file smaller than x MiB? (This depends on the diff algorithm
      and the available resources.)
2. If the file fulfills those requirements, replace the 'write' command
with a 'patch' command and calculate the binary delta.  Also check whether
the delta is actually smaller than the data of the new file.  Possibly add
the binary diff algorithm used, as well as a checksum of the 'old' file,
to the command.
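
The two steps above can be sketched roughly as follows. This is only an
illustration, not a proposed implementation: `MAX_PATCH_SIZE`,
`make_delta`, `apply_delta` and `encode_write` are hypothetical names, and
zlib with a preset dictionary is used as a stand-in for a real binary diff
such as `bsdiff` (zlib only looks at the last 32 KiB of the dictionary, so
it is a poor delta for large files).

```python
import hashlib
import zlib

# Hypothetical limit standing in for the "x MiB" threshold.
MAX_PATCH_SIZE = 60 * 1024 * 1024


def make_delta(old: bytes, new: bytes) -> bytes:
    """Stand-in delta: deflate `new` using `old` as a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=old)
    return comp.compress(new) + comp.flush()


def apply_delta(old: bytes, delta: bytes) -> bytes:
    """Inverse of make_delta(); needs the same `old` data."""
    decomp = zlib.decompressobj(zdict=old)
    return decomp.decompress(delta) + decomp.flush()


def encode_write(old, new):
    """Return ('patch', delta, sha256_of_old) or ('write', new, None).

    `old` is the file content in the old snapshot, or None if the file
    did not exist there.
    """
    # Requirement checks from step 1.
    if old is None or len(new) > MAX_PATCH_SIZE:
        return ("write", new, None)
    # Step 2: only keep the delta if it is actually smaller.
    delta = make_delta(old, new)
    if len(delta) >= len(new):
        return ("write", new, None)
    return ("patch", delta, hashlib.sha256(old).hexdigest())
```

The checksum of the old file lets the receiving side refuse to apply a
patch against content it does not actually have.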

This file format can of course be converted back to `btrfs-stream` and 
then applied with `btrfs-receive`.

I would probably start with `bsdiff` for the diff algorithm, but maybe 
we want to be flexible here.

Of course, if `btrfs-delta-stream` is implemented in `btrfs-progs`, then 
we can create and apply this format directly.

regards,
Claudius


* Re: btrfs-send format that contains binary diffs
  2021-03-29 13:16 btrfs-send format that contains binary diffs Claudius Heine
@ 2021-03-29 16:30 ` Andrei Borzenkov
  2021-03-29 17:25   ` Henning Schild
  2021-03-29 19:14   ` Claudius Heine
  0 siblings, 2 replies; 13+ messages in thread
From: Andrei Borzenkov @ 2021-03-29 16:30 UTC (permalink / raw)
  To: Claudius Heine, linux-btrfs; +Cc: Henning Schild

On 29.03.2021 16:16, Claudius Heine wrote:
> Hi,
> 
> I am currently investigating the possibility to use `btrfs-stream` files
> (generated by `btrfs send`) for deploying a image based update to
> systems (probably embedded ones).
> 
> One of the issues I encountered here is that btrfs-send does not use any
> diff algorithm on files that have changed from one snapshot to the next.
> 

btrfs send works on block level. It sends blocks that differ between two
snapshots.

> One way to implement this would be to add some sort of 'patch' command
> to the `btrfs-stream` format.
> 

This would require reading the complete content of both snapshots instead
of just computing the block diff using metadata. Unless I misunderstand
what you mean.




* Re: btrfs-send format that contains binary diffs
  2021-03-29 16:30 ` Andrei Borzenkov
@ 2021-03-29 17:25   ` Henning Schild
  2021-03-29 18:00     ` Martin Raiber
  2021-03-29 19:14   ` Claudius Heine
  1 sibling, 1 reply; 13+ messages in thread
From: Henning Schild @ 2021-03-29 17:25 UTC (permalink / raw)
  To: Andrei Borzenkov; +Cc: Claudius Heine, linux-btrfs

On Mon, 29 Mar 2021 19:30:34 +0300
Andrei Borzenkov <arvidjaar@gmail.com> wrote:

> On 29.03.2021 16:16, Claudius Heine wrote:
> > Hi,
> > 
> > I am currently investigating the possibility to use `btrfs-stream`
> > files (generated by `btrfs send`) for deploying a image based
> > update to systems (probably embedded ones).
> > 
> > One of the issues I encountered here is that btrfs-send does not
> > use any diff algorithm on files that have changed from one snapshot
> > to the next. 
> 
> btrfs send works on block level. It sends blocks that differ between
> two snapshots.
> 
> > One way to implement this would be to add some sort of 'patch'
> > command to the `btrfs-stream` format.
> >   
> 
> This would require reading complete content of both snapshots instead
> if just computing block diff using metadata. Unless I misunderstand
> what you mean.

On embedded systems it is common to update complete "firmware" images
as opposed to package based partial updates. You often have two root
filesystems to be able to always fall back to a working state in case
of any sort of error.

Take the picture from
https://sbabic.github.io/swupdate/overview.html#double-copy

and assume that "Application software" is a full blown OS with
everything that makes your device.

That approach offers great "control" but unfortunately can also lead to
large downloads for an update. The basic idea is to download only the
binary diff between the future and the current rootfs.
Given that a filesystem supports snapshots, it would be great to
"send/receive" them as diffs.

Today most people who do such things with other filesystems script around
xdelta etc. But btrfs is more "integrated", so when considering it for
such embedded use cases, native support would most likely be better than
hacks on top.

We have several use-cases in mind with btrfs.
 - ro-base with rw overlays
 - binary diff updates against such a ro-base
 - backup/restore with snapshots of certain subvolumes
 - factory reset with wiping certain subvolumes

regards,
Henning




* Re: btrfs-send format that contains binary diffs
  2021-03-29 17:25   ` Henning Schild
@ 2021-03-29 18:00     ` Martin Raiber
  2021-03-29 19:25       ` Claudius Heine
  0 siblings, 1 reply; 13+ messages in thread
From: Martin Raiber @ 2021-03-29 18:00 UTC (permalink / raw)
  To: Henning Schild, Andrei Borzenkov; +Cc: Claudius Heine, linux-btrfs

On 29.03.2021 19:25 Henning Schild wrote:
> On Mon, 29 Mar 2021 19:30:34 +0300
> Andrei Borzenkov <arvidjaar@gmail.com> wrote:
>
>> On 29.03.2021 16:16, Claudius Heine wrote:
>>> Hi,
>>>
>>> I am currently investigating the possibility to use `btrfs-stream`
>>> files (generated by `btrfs send`) for deploying a image based
>>> update to systems (probably embedded ones).
>>>
>>> One of the issues I encountered here is that btrfs-send does not
>>> use any diff algorithm on files that have changed from one snapshot
>>> to the next. 
>> btrfs send works on block level. It sends blocks that differ between
>> two snapshots.
>>
>>> One way to implement this would be to add some sort of 'patch'
>>> command to the `btrfs-stream` format.
>>>   
>> This would require reading complete content of both snapshots instead
>> if just computing block diff using metadata. Unless I misunderstand
>> what you mean.
> On embedded systems it is common to update complete "firmware" images
> as opposed to package based partial updates. You often have two root
> filesystems to be able to always fall back to a working state in case
> of any sort or error.
>
> Take the picture from
> https://sbabic.github.io/swupdate/overview.html#double-copy
>
> and assume that "Application software" is a full blown OS with
> everything that makes your device.
>
> That approach offers great "control" but unfortunately can also lead to
> great downloads required for an update. The basic idea is to download
> the binary-diff between the future and the current rootfs only.
> Given a filesystem supports snapshots, it would be great to
> "send/receive" them as diffs.
>
> Today most people that do such things with other fss script around with
> xdelta etc. But btrfs is more "integrated", so when considering it for
> such embedded usecases native support would most likely be better than
> hacks on top.
>
> We have several use-cases in mind with btrfs.
>  - ro-base with rw overlays
>  - binary diff updates against such a ro-base
>  - backup/restore with snapshots of certain subvolumes
>  - factory reset with wiping certain submodules
>
> regards,
> Henning

I think I know what you want to accomplish, and I've been doing it for a while now. But I don't know what the problem with btrfs send is. Do you want a non-block-based diff to make updates smaller? Have you overwritten files completely and need to dedupe or reflink them before sending them? Theoretically the btrfs send format could support something like bsdiff (a non-block-based diff: it is just a set of e.g. write commands with offset and binary data, or reflinks to copy data from one file to another), but there currently isn't a tool to create this.

How I've done it is:

 - Create a btrfs image with a rw sys_root_current subvol
 - E.g. debootstrap a Linux system into it
 - Create sys_root_v1 as ro snapshot of sys_root_current

Use that system image on different systems.

To update, on the original image:

 - Modify sys_root_current
 - Create ro snapshot sys_root_v2 of sys_root_current
 - Create a btrfs send update that modifies sys_root_v1 to sys_root_v2: btrfs send -p sys_root_v1 sys_root_v2 | xz -c > update_v1.btrfs.xz
 - Publish update_v1.btrfs.xz

On the systems:

 - Download update_v1.btrfs.xz (verify signature)
 - Create sys_root_v2 by applying differences to sys_root_v1: cat update_v1.btrfs.xz | xz -d -c | btrfs receive /rootfs
 - Rename (exchange) sys_root_current to sys_root_last
 - Create rw snapshot of sys_root_v2 as sys_root_current
 - Reboot into new system
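
The send and receive steps above could be scripted along these lines. This
is a minimal sketch under assumptions: the helper names (`send_argv`,
`receive_argv`, `run_pipeline`, `build_update`, `apply_update`) are made
up for illustration, only the `btrfs send -p`, `xz -c`, `xz -d -c` and
`btrfs receive` invocations are taken from the steps above, and signature
verification, the snapshot renames and the reboot are left out.

```python
import subprocess


def send_argv(parent, snapshot):
    """Command line for an incremental send of `snapshot` against `parent`."""
    return ["btrfs", "send", "-p", parent, snapshot]


def receive_argv(dest):
    """Command line for receiving a stream into the directory `dest`."""
    return ["btrfs", "receive", dest]


def run_pipeline(stages, stdin=None, stdout=None):
    """Run a list of argv lists as a pipeline; raise if any stage fails."""
    procs = []
    prev = stdin
    for i, argv in enumerate(stages):
        last = i == len(stages) - 1
        proc = subprocess.Popen(
            argv, stdin=prev, stdout=stdout if last else subprocess.PIPE
        )
        if prev is not None and prev is not stdin:
            prev.close()  # only the child keeps the read end of the pipe
        prev = proc.stdout
        procs.append(proc)
    for proc in procs:
        if proc.wait() != 0:
            raise subprocess.CalledProcessError(proc.returncode, proc.args)


def build_update(parent, snapshot, out_path):
    """btrfs send -p parent snapshot | xz -c > out_path"""
    with open(out_path, "wb") as out:
        run_pipeline([send_argv(parent, snapshot), ["xz", "-c"]], stdout=out)


def apply_update(update_path, dest):
    """xz -d -c < update_path | btrfs receive dest"""
    with open(update_path, "rb") as update:
        run_pipeline([["xz", "-d", "-c"], receive_argv(dest)], stdin=update)
```

With the names used above, the build side would be
`build_update("sys_root_v1", "sys_root_v2", "update_v1.btrfs.xz")` and the
device side `apply_update("update_v1.btrfs.xz", "/rootfs")`.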





* Re: btrfs-send format that contains binary diffs
  2021-03-29 16:30 ` Andrei Borzenkov
  2021-03-29 17:25   ` Henning Schild
@ 2021-03-29 19:14   ` Claudius Heine
  2021-03-29 19:53     ` Lionel Bouton
  2021-03-30  5:33     ` Andrei Borzenkov
  1 sibling, 2 replies; 13+ messages in thread
From: Claudius Heine @ 2021-03-29 19:14 UTC (permalink / raw)
  To: Andrei Borzenkov, linux-btrfs; +Cc: Henning Schild

Hi Andrei,

On 2021-03-29 18:30, Andrei Borzenkov wrote:
> On 29.03.2021 16:16, Claudius Heine wrote:
>> Hi,
>>
>> I am currently investigating the possibility to use `btrfs-stream` files
>> (generated by `btrfs send`) for deploying a image based update to
>> systems (probably embedded ones).
>>
>> One of the issues I encountered here is that btrfs-send does not use any
>> diff algorithm on files that have changed from one snapshot to the next.
>>
> 
> btrfs send works on block level. It sends blocks that differ between two
> snapshots.

Are you sure?

I did a test with a 32MiB random file. I created one snapshot, then 
changed (not deleted or added) one byte in that file and then created a 
snapshot again. `btrfs send` created a >32MiB `btrfs-stream` file. If it 
were purely block based, I would have expected the stream to contain 
just the changed block, not the whole file. And if I use a smaller 
file on the same file system, then the `btrfs-stream` is smaller as well.

I looked into those `btrfs-stream` files using [1] and also [2] as well 
as the code. While I haven't understood everything there yet, it 
currently looks to me like it is file based.
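
For this kind of inspection, a rough sketch that sums the payload bytes
per command in a stream file may help. The layout used here is my
understanding of the kernel's fs/btrfs/send.h (a 13-byte magic, a u32
version, then commands with a packed little-endian header of u32 length,
u16 command id and u32 checksum), and `CMD_WRITE = 15` is likewise my
reading of that header, so please check both against the source. The
function name is made up.

```python
import struct
from collections import Counter

MAGIC = b"btrfs-stream\0"            # includes the trailing NUL byte
CMD_HEADER = struct.Struct("<IHI")   # payload length, command id, checksum
CMD_WRITE = 15                       # assumed value of BTRFS_SEND_C_WRITE


def payload_bytes_per_command(stream: bytes) -> Counter:
    """Map command id -> total payload bytes found in `stream`."""
    if not stream.startswith(MAGIC):
        raise ValueError("not a btrfs send stream")
    pos = len(MAGIC) + 4  # skip the magic and the u32 version field
    totals = Counter()
    while pos + CMD_HEADER.size <= len(stream):
        length, cmd, _checksum = CMD_HEADER.unpack_from(stream, pos)
        pos += CMD_HEADER.size
        if pos + length > len(stream):
            raise ValueError("truncated command payload")
        totals[cmd] += length
        pos += length
    return totals
```

If nearly all bytes are attributed to `CMD_WRITE`, the size of the stream
is dominated by file data rather than by metadata commands.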

> 
>> One way to implement this would be to add some sort of 'patch' command
>> to the `btrfs-stream` format.
>>
> 
> This would require reading complete content of both snapshots instead if
> just computing block diff using metadata. Unless I misunderstand what
> you mean.
I think I should only need access to the old snapshot as well as the 
`btrfs-stream` file. But I currently don't have a complete PoC of this 
ready.

regards,
Claudius

[1] https://github.com/sysnux/btrfs-snapshots-diff
[2] https://btrfs.wiki.kernel.org/index.php/Design_notes_on_Send/Receive


* Re: btrfs-send format that contains binary diffs
  2021-03-29 18:00     ` Martin Raiber
@ 2021-03-29 19:25       ` Claudius Heine
  0 siblings, 0 replies; 13+ messages in thread
From: Claudius Heine @ 2021-03-29 19:25 UTC (permalink / raw)
  To: Martin Raiber, Henning Schild, Andrei Borzenkov; +Cc: linux-btrfs

Hi Martin,

On 2021-03-29 20:00, Martin Raiber wrote:
> On 29.03.2021 19:25 Henning Schild wrote:
>> On Mon, 29 Mar 2021 19:30:34 +0300
>> Andrei Borzenkov <arvidjaar@gmail.com> wrote:
>>
>>> On 29.03.2021 16:16, Claudius Heine wrote:
>>>> Hi,
>>>>
>>>> I am currently investigating the possibility to use `btrfs-stream`
>>>> files (generated by `btrfs send`) for deploying a image based
>>>> update to systems (probably embedded ones).
>>>>
>>>> One of the issues I encountered here is that btrfs-send does not
>>>> use any diff algorithm on files that have changed from one snapshot
>>>> to the next.
>>> btrfs send works on block level. It sends blocks that differ between
>>> two snapshots.
>>>
>>>> One way to implement this would be to add some sort of 'patch'
>>>> command to the `btrfs-stream` format.
>>>>    
>>> This would require reading complete content of both snapshots instead
>>> if just computing block diff using metadata. Unless I misunderstand
>>> what you mean.
>> On embedded systems it is common to update complete "firmware" images
>> as opposed to package based partial updates. You often have two root
>> filesystems to be able to always fall back to a working state in case
>> of any sort or error.
>>
>> Take the picture from
>> https://sbabic.github.io/swupdate/overview.html#double-copy
>>
>> and assume that "Application software" is a full blown OS with
>> everything that makes your device.
>>
>> That approach offers great "control" but unfortunately can also lead to
>> great downloads required for an update. The basic idea is to download
>> the binary-diff between the future and the current rootfs only.
>> Given a filesystem supports snapshots, it would be great to
>> "send/receive" them as diffs.
>>
>> Today most people that do such things with other fss script around with
>> xdelta etc. But btrfs is more "integrated", so when considering it for
>> such embedded usecases native support would most likely be better than
>> hacks on top.
>>
>> We have several use-cases in mind with btrfs.
>>   - ro-base with rw overlays
>>   - binary diff updates against such a ro-base
>>   - backup/restore with snapshots of certain subvolumes
>>   - factory reset with wiping certain submodules
>>
>> regards,
>> Henning
> 
> I think I know what you want to accomplish and I've been doing it for a while now. But I don't know what the problem with btrfs send is? Do you want to have non-block based diff to make updates smaller? Have you overwritten files completely and need to dedupe or reflink them before sending them? Theoretically the btrfs send format would be able to support something like bsdiff (non-block based diff -- it is just a set of e.g. write commands with offset and binary data or using reflink to copy data from one file to another), but there currently isn't a tool to create this.
> 
> How I've done it is:
> 
>   - Create a btrfs image with a rw sys_root_current subvol
>   - E.g. debootstrap a Linux system into it
>   - Create sys_root_v1 as ro snapshot of sys_root_current
> 
> Use that system image on different systems.
> 
> On update on the original image:
> 
>   - Modify sys_root_current
>   - Create ro snapshot sys_root_v2 of sys_root_current
>   - Create an btrfs send update that modifies sys_root_v1 to sys_root_v2: btrfs send -p sys_root_v1 sys_root_v2 | xz -c > update_v1.btrfs.xz
>   - Publish update_v1.btrfs.xz
> 
> On the systems:
> 
>   - Download update_v1.btrfs.xz (verify signature)
>   - Create sys_root_v2 by applying differences to sys_root_v1: cat update_v1.btrfs.xz | xz -d -c | btrfs receive /rootfs
>   - Rename (exchange) sys_root_current to sys_root_last
>   - Create rw snapshot of sys_root_v2 as sys_root_current
>   - Reboot into new system

That is mostly the approach we have envisioned. However, we also wanted 
to allow binary deltas for smaller blobs (e.g. <60MiB), because 
competing approaches such as OSTree or zchunk, which are built on 
different file systems, achieve smaller update sizes with similar 
mechanisms.

OSTree, for instance, can calculate static deltas [1] in order to 
make the download size of the update smaller. However, currently you 
cannot really use OSTree to apply those completely offline.

zchunk splits a big image into smaller chunks, so that only 
the chunks that changed need to be fetched [2]. Again, not really 
possible offline.

regards,
Claudius

[1] 
https://ostreedev.github.io/ostree/repository-management/#derived-data---static-deltas-and-the-summary-file
[2] https://github.com/zchunk/zchunk


* Re: btrfs-send format that contains binary diffs
  2021-03-29 19:14   ` Claudius Heine
@ 2021-03-29 19:53     ` Lionel Bouton
  2021-03-30  7:48       ` Claudius Heine
  2021-03-30  5:33     ` Andrei Borzenkov
  1 sibling, 1 reply; 13+ messages in thread
From: Lionel Bouton @ 2021-03-29 19:53 UTC (permalink / raw)
  To: Claudius Heine, Andrei Borzenkov, linux-btrfs; +Cc: Henning Schild

Hi Claudius,

On 29/03/2021 at 21:14, Claudius Heine wrote:
> [...]
> Are you sure?
>
> I did a test with a 32MiB random file. I created one snapshot, then
> changed (not deleted or added) one byte in that file and then created
> a snapshot again. `btrfs send` created a >32MiB `btrfs-stream` file.
> If it would be only block based, then I would have expected that it
> would just contain the changed block, not the whole file.

I suspect there is another possible explanation: the tool you used to
change one byte actually rewrote the whole file.

You can test this by appending data to your file (for example with "cat
otherfile >> originalfile" or "dd if=/dev/urandom of=originalfile bs=1M
count=4 conv=notrunc oflag=append") and checking the size of `btrfs
send`'s output.

When I append data with dd as described above to a 32M file originally
created with "dd if=/dev/urandom of=originalfile bs=1M count=32", I get a
file with only 1 extent in each snapshot, both marked shared, and a little
over 4M in `btrfs send`'s output.
filefrag -v should tell you if the extents in your file are shared.

Note that if you use compression and your files compress well, they will
use small extents (128kB, from memory). This can be bad when you try to
avoid fragmentation but could help COW find more data to share, if I
understand correctly how COW works with respect to extents.

Finally, using "dd if=/dev/urandom of=originalfile bs=1M count=1
conv=notrunc seek=12M" to write in the middle of my now 36M file results
in a little over 1M with `btrfs send` using -p <previous snapshot>.
And filefrag -v shows 3 extents for this file: 2 of them share the same
logical offsets as the file in the previous snapshot, and the last uses a
new range, confirming the allocation of a new extent and reuse of the
previous ones.
This seems to confirm my hypothesis that the tool you used did rewrite
the whole file.
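
To make the difference concrete, here is a small sketch of the two ways a
tool can "change one byte". The function names are hypothetical. The first
overwrites the byte inside the existing file, like dd with conv=notrunc.
The second writes a full modified copy and renames it over the original,
which is what many editors and update tools do and which produces a new
file (new inode, all new data).

```python
import os
import tempfile


def patch_byte_in_place(path, offset, value):
    """Overwrite one byte without truncating or recreating the file."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(bytes([value]))


def rewrite_with_byte(path, offset, value):
    """Write a modified copy to a temp file, then rename it over `path`."""
    with open(path, "rb") as f:
        data = bytearray(f.read())
    data[offset] = value
    # Create the temp file in the same directory so the rename stays on
    # one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, path)
```

Comparing `os.stat(path).st_ino` before and after shows which of the two
a given tool did: the inode number stays the same in the first case and
changes in the second.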

Another possibility would be that COW is disabled, either by a mount
option or a file attribute (see lsattr's output for your file).

Best regards,

Lionel



* Re: btrfs-send format that contains binary diffs
  2021-03-29 19:14   ` Claudius Heine
  2021-03-29 19:53     ` Lionel Bouton
@ 2021-03-30  5:33     ` Andrei Borzenkov
  2021-03-30  5:38       ` Andrei Borzenkov
  1 sibling, 1 reply; 13+ messages in thread
From: Andrei Borzenkov @ 2021-03-30  5:33 UTC (permalink / raw)
  To: Claudius Heine, linux-btrfs; +Cc: Henning Schild

On 29.03.2021 22:14, Claudius Heine wrote:
> Hi Andrei,
> 
> On 2021-03-29 18:30, Andrei Borzenkov wrote:
>> On 29.03.2021 16:16, Claudius Heine wrote:
>>> Hi,
>>>
>>> I am currently investigating the possibility to use `btrfs-stream` files
>>> (generated by `btrfs send`) for deploying a image based update to
>>> systems (probably embedded ones).
>>>
>>> One of the issues I encountered here is that btrfs-send does not use any
>>> diff algorithm on files that have changed from one snapshot to the next.
>>>
>>
>> btrfs send works on block level. It sends blocks that differ between two
>> snapshots.
> 
> Are you sure?
> 

Yes.

> I did a test with a 32MiB random file. I created one snapshot, then
> changed (not deleted or added) one byte in that file and then created a
> snapshot again. `btrfs send` created a >32MiB `btrfs-stream` file. If it
> would be only block based, then I would have expected that it would just
> contain the changed block, not the whole file. And if I use a smaller
> file on the same file system, then the `btrfs-stream` is smaller as well.
> 
> I looked into those `btrfs-stream` files using [1] and also [2] as well
> as the code. While I haven't understood everything there yet, it
> currently looks to me like it is file based.
> 

btrfs send is not a pure block-based image, because that would require two
absolutely identical filesystems. It needs to replicate the filesystem
structure, so it of course needs to know which files are created/deleted.
But for each file it only sends the parts changed since the previous
snapshot. This only works if both snapshots refer to the *same* file.

As was already mentioned, you need to understand how your files are
changed. In particular, standard tools for software update do not
rewrite files in place; they create new files with new content. From the
btrfs perspective these are completely different: two files with the same
name in two snapshots do not share a single byte. When you compute the
delta between two snapshots, you get instructions to delete the old file
and create a new file with the new content (which will be renamed to the
same name as the deleted old file). This by necessity also sends the full
new content.

So yes, btrfs replication is block based; similarity is determined by
how much physical data is shared between two files. What you expect is
file-based replication, where file names determine whether files should be
considered the same and changes are computed for two files with the same
name.




* Re: btrfs-send format that contains binary diffs
  2021-03-30  5:33     ` Andrei Borzenkov
@ 2021-03-30  5:38       ` Andrei Borzenkov
  2021-03-30  8:12         ` Claudius Heine
  0 siblings, 1 reply; 13+ messages in thread
From: Andrei Borzenkov @ 2021-03-30  5:38 UTC (permalink / raw)
  To: Claudius Heine, linux-btrfs; +Cc: Henning Schild

On 30.03.2021 08:33, Andrei Borzenkov wrote:
> On 29.03.2021 22:14, Claudius Heine wrote:
>> Hi Andrei,
>>
>> On 2021-03-29 18:30, Andrei Borzenkov wrote:
>>> On 29.03.2021 16:16, Claudius Heine wrote:
>>>> Hi,
>>>>
>>>> I am currently investigating the possibility to use `btrfs-stream` files
>>>> (generated by `btrfs send`) for deploying a image based update to
>>>> systems (probably embedded ones).
>>>>
>>>> One of the issues I encountered here is that btrfs-send does not use any
>>>> diff algorithm on files that have changed from one snapshot to the next.
>>>>
>>>
>>> btrfs send works on block level. It sends blocks that differ between two
>>> snapshots.
>>
>> Are you sure?
>>
> 
> Yes.
> 
>> I did a test with a 32MiB random file. I created one snapshot, then
>> changed (not deleted or added) one byte in that file and then created a
>> snapshot again. `btrfs send` created a >32MiB `btrfs-stream` file. If it
>> would be only block based, then I would have expected that it would just
>> contain the changed block, not the whole file. And if I use a smaller
>> file on the same file system, then the `btrfs-stream` is smaller as well.
>>
>> I looked into those `btrfs-stream` files using [1] and also [2] as well
>> as the code. While I haven't understood everything there yet, it
>> currently looks to me like it is file based.
>>
> 
> btrfs send is not pure block based image, because it would require two
> absolutely identical filesystems. It needs to replicate filesystem
> structure so it of course needs to know which files are created/deleted.
> But for each file it only sends changed parts since previous snapshot.
> This only works if both snapshots refer to the *same* file.
> 

Or more precisely: btrfs send knows which filesystem content was part
of the previous snapshot and so is already present on the destination,
and it will not send this content again. It is actually more or less
irrelevant which files this content belongs to.





* Re: btrfs-send format that contains binary diffs
  2021-03-29 19:53     ` Lionel Bouton
@ 2021-03-30  7:48       ` Claudius Heine
  0 siblings, 0 replies; 13+ messages in thread
From: Claudius Heine @ 2021-03-30  7:48 UTC (permalink / raw)
  To: Lionel Bouton, Andrei Borzenkov, linux-btrfs; +Cc: Henning Schild

Hi Lionel,

On 2021-03-29 21:53, Lionel Bouton wrote:
> Hi Claudius,
> 
> On 29/03/2021 at 21:14, Claudius Heine wrote:
>> [...]
>> Are you sure?
>>
>> I did a test with a 32MiB random file. I created one snapshot, then
>> changed (not deleted or added) one byte in that file and then created
>> a snapshot again. `btrfs send` created a >32MiB `btrfs-stream` file.
>> If it would be only block based, then I would have expected that it
>> would just contain the changed block, not the whole file.
> 
> I suspect there is another possible explanations : the tool you used to
> change one byte actually rewrote the whole file.
> 
> You can test this by appending data to your file (for example with "cat
> otherfile >> originalfile" or "dd if=/dev/urandom of=originalfile bs=1M
> count=4 conv=notrunc oflag=append") and checking the size of `btrfs
> send`'s output.
> 
> When I append data with dd as described above to a 32M file originally
> created with "dd if=/dev/urandom of=originalfile bs=1M count=32" I get a
> file with 1 extent only in each snapshot both marked shared and a little
> other 4M in `btrfs send`'s output.
> filefrag -v should tell you if the extents in your file are shared.
> 
> Note that if you use compression and your files compress well they will
> use small extents (128kB from memory), this can be bad when you try to
> avoid fragmentation but could help COW find more data to share if I
> understand how COW works in respect to extents correctly.
> 
> Finally, using "dd if=/dev/urandom of=originalfile bs=1M count=1
> conv=notrunc seek=12" to write in the middle of my now 36M file results
> in a little over 1M with `btrfs send` using -p <previous snapshot>.
> And filefrag -v shows 3 extents for this file. 2 of them share the same
> logical offsets as the file in the previous snapshot, the last uses a
> new range, confirming the allocation of a new extent and reuse of the
> previous ones.
> This seems to confirm my hypothesis that the tool you used did rewrite
> the whole file.

Yes, I think you are right here. I will have to experiment with this a 
bit further. Thanks!
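
A minimal sketch of such an experiment, assuming GNU coreutils (`dd`,
`stat -c`); the scratch file, its size and the offset are made up for
illustration. It overwrites a single byte in place, without truncating or
recreating the file, and records the inode and size before and after. An
unchanged inode is a sign that the tool did not replace the file:

```shell
#!/bin/sh
# patch_byte FILE OFFSET: write one 0xFF byte at byte OFFSET (0-based)
# without truncating FILE (conv=notrunc) and without creating a new file.
patch_byte() {
    printf '\377' | dd of="$1" bs=1 seek="$2" count=1 conv=notrunc 2>/dev/null
}

# Demo on a scratch file (hypothetical name and size).
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1024 count=64 2>/dev/null

inode_before=$(stat -c %i "$f")
patch_byte "$f" 12345
inode_after=$(stat -c %i "$f")
size_after=$(stat -c %s "$f")

echo "inode $inode_before -> $inode_after, size $size_after"
```

Running the same `patch_byte` step on a file inside a subvolume, between
two read-only snapshots, and then comparing the output size of
`btrfs send -p <previous snapshot> <new snapshot>` with the file size
should show whether only the touched block is sent.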

regards,
Claudius

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: btrfs-send format that contains binary diffs
  2021-03-30  5:38       ` Andrei Borzenkov
@ 2021-03-30  8:12         ` Claudius Heine
  2021-03-30 16:32           ` Henning Schild
  2021-03-31  1:17           ` Zygo Blaxell
  0 siblings, 2 replies; 13+ messages in thread
From: Claudius Heine @ 2021-03-30  8:12 UTC (permalink / raw)
  To: Andrei Borzenkov, linux-btrfs; +Cc: Henning Schild

Hi Andrei,

On 2021-03-30 07:38, Andrei Borzenkov wrote:
> On 30.03.2021 08:33, Andrei Borzenkov wrote:
>> On 29.03.2021 22:14, Claudius Heine wrote:
>>> Hi Andrei,
>>>
>>> On 2021-03-29 18:30, Andrei Borzenkov wrote:
>>>> On 29.03.2021 16:16, Claudius Heine wrote:
>>>>> Hi,
>>>>>
>>>>> I am currently investigating the possibility to use `btrfs-stream` files
>>>>> (generated by `btrfs send`) for deploying an image-based update to
>>>>> systems (probably embedded ones).
>>>>>
>>>>> One of the issues I encountered here is that btrfs-send does not use any
>>>>> diff algorithm on files that have changed from one snapshot to the next.
>>>>>
>>>>
>>>> btrfs send works on block level. It sends blocks that differ between two
>>>> snapshots.
>>>
>>> Are you sure?
>>>
>>
>> Yes.

Ok, sorry for doubting you. My assumptions were wrong.

>>
>>> I did a test with a 32MiB random file. I created one snapshot, then
>>> changed (not deleted or added) one byte in that file and then created a
>>> snapshot again. `btrfs send` created a >32MiB `btrfs-stream` file. If it
>>> would be only block based, then I would have expected that it would just
>>> contain the changed block, not the whole file. And if I use a smaller
>>> file on the same file system, then the `btrfs-stream` is smaller as well.
>>>
>>> I looked into those `btrfs-stream` files using [1] and also [2] as well
>>> as the code. While I haven't understood everything there yet, it
>>> currently looks to me like it is file based.
>>>
>>
>> btrfs send is not pure block based image, because it would require two
>> absolutely identical filesystems. It needs to replicate filesystem
>> structure so it of course needs to know which files are created/deleted.
>> But for each file it only sends changed parts since previous snapshot.
>> This only works if both snapshots refer to the *same* file.
>>
> 
> Or more precisely - btrfs send knows which filesystem content was part
> of previous snapshot and so is already present on destination and it
> will not send this content again. It is actually more or less irrelevant
> which files this content belongs to.

I think I understood that now.

> 
>> As was already mentioned, you need to understand how your files are
>> changed. In particular, standard tools for software update do not
>> rewrite files in place - they create new files with new content. From
>> btrfs perspective they are completely different; two files with the same
>> name in two snapshots do not share a single byte. When you compute delta
>> between two snapshots you get instructions to delete old file and create
>> new file with new content (that will be renamed to the same name as
>> deleted old file). This also by necessity sends full new content.

As you said, many standard tools create new files instead of updating 
files in place. But I guess a `dedupe` run before creating the snapshot 
could help here, right?

If we have a root file system build process that always regenerates all 
files, and then copies those into a file system, then all files are 
'different' from a btrfs perspective.
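
To get a feeling for how much such a dedupe run could recover, here is a
rough sketch that lists the files whose content is byte-identical at the
same path in the old and the freshly built tree. The directory layout and
the function name `unchanged_files` are assumptions for illustration, and
paths containing newlines are not handled:

```shell
#!/bin/sh
# unchanged_files OLD NEW: print the relative path of every regular file
# in NEW that also exists in OLD with byte-identical content.
unchanged_files() {
    ( cd "$2" && find . -type f ) | while IFS= read -r rel; do
        if [ -f "$1/$rel" ] && cmp -s "$1/$rel" "$2/$rel"; then
            printf '%s\n' "${rel#./}"
        fi
    done
}

# Demo with two tiny hypothetical trees.
work=$(mktemp -d)
mkdir "$work/old" "$work/new"
printf 'same\n' > "$work/old/a.txt"; printf 'same\n' > "$work/new/a.txt"
printf 'v1\n'   > "$work/old/b.txt"; printf 'v2\n'   > "$work/new/b.txt"
printf 'new\n'  > "$work/new/c.txt"

candidates=$(unchanged_files "$work/old" "$work/new")
echo "$candidates"
```

On btrfs each listed file could then be replaced by a reflink copy of the
old one (for example with `cp --reflink=always`) or handed to a dedupe
tool. Whether `btrfs send` then avoids resending that data is something
to test, not to assume.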

>> So yes, btrfs replication is block based; similarity is determined by
>> how much physical data is shared between two files. And you expect file
>> based replication where file names determine whether files should be
>> considered the same and changes are computed for two files with the same
>> name.

Right. Maybe we could use the file path just as a hint for an 
opportunity of saving resources by creating block based deltas.
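
A rough sketch of that idea, assuming two versions of a file found under
the same path: count how many fixed-size blocks actually differ, to judge
whether a block based delta would pay off. The 4096 byte default and the
function name `changed_blocks` are assumptions; `cmp -l` only compares the
common prefix, so data appended to the longer file is ignored here:

```shell
#!/bin/sh
# changed_blocks OLD NEW [BLOCKSIZE]: print how many BLOCKSIZE-sized
# blocks contain at least one differing byte (default 4096).
changed_blocks() {
    cmp -l "$1" "$2" 2>/dev/null |
        awk -v bs="${3:-4096}" '
            # cmp -l prints the 1-based byte number in the first column
            { b = int(($1 - 1) / bs); if (!(b in seen)) { seen[b] = 1; n++ } }
            END { print n + 0 }'
}

# Demo: two 64KiB files that differ in two bytes in different blocks.
old=$(mktemp); new=$(mktemp)
dd if=/dev/zero of="$old" bs=4096 count=16 2>/dev/null
cp "$old" "$new"
printf '\377' | dd of="$new" bs=1 seek=5000  conv=notrunc 2>/dev/null
printf '\377' | dd of="$new" bs=1 seek=40000 conv=notrunc 2>/dev/null

n=$(changed_blocks "$old" "$new")
echo "$n of 16 blocks changed"
```

A file where only a few blocks changed would be a candidate for being
patched in place on top of the old version instead of being written as a
new file.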

I guess I have to think about this some more.

Thanks a lot!
Claudius

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: btrfs-send format that contains binary diffs
  2021-03-30  8:12         ` Claudius Heine
@ 2021-03-30 16:32           ` Henning Schild
  2021-03-31  1:17           ` Zygo Blaxell
  1 sibling, 0 replies; 13+ messages in thread
From: Henning Schild @ 2021-03-30 16:32 UTC (permalink / raw)
  To: Claudius Heine; +Cc: Andrei Borzenkov, linux-btrfs

On Tue, 30 Mar 2021 10:12:40 +0200
Claudius Heine <ch@denx.de> wrote:

> Hi Andrei,
> 
> On 2021-03-30 07:38, Andrei Borzenkov wrote:
> > On 30.03.2021 08:33, Andrei Borzenkov wrote:  
> >> On 29.03.2021 22:14, Claudius Heine wrote:  
> >>> Hi Andrei,
> >>>
> >>> On 2021-03-29 18:30, Andrei Borzenkov wrote:  
> >>>> On 29.03.2021 16:16, Claudius Heine wrote:  
> >>>>> Hi,
> >>>>>
> >>>>> I am currently investigating the possibility to use
> >>>>> `btrfs-stream` files (generated by `btrfs send`) for deploying
> >>>>> an image-based update to systems (probably embedded ones).
> >>>>>
> >>>>> One of the issues I encountered here is that btrfs-send does
> >>>>> not use any diff algorithm on files that have changed from one
> >>>>> snapshot to the next. 
> >>>>
> >>>> btrfs send works on block level. It sends blocks that differ
> >>>> between two snapshots.  
> >>>
> >>> Are you sure?
> >>>  
> >>
> >> Yes.  
> 
> Ok, sorry for doubting you. My assumptions were wrong.
> 
> >>  
> >>> I did a test with a 32MiB random file. I created one snapshot,
> >>> then changed (not deleted or added) one byte in that file and
> >>> then created a snapshot again. `btrfs send` created a >32MiB
> >>> `btrfs-stream` file. If it would be only block based, then I
> >>> would have expected that it would just contain the changed block,
> >>> not the whole file. And if I use a smaller file on the same file
> >>> system, then the `btrfs-stream` is smaller as well.
> >>>
> >>> I looked into those `btrfs-stream` files using [1] and also [2]
> >>> as well as the code. While I haven't understood everything there
> >>> yet, it currently looks to me like it is file based.
> >>>  
> >>
> >> btrfs send is not pure block based image, because it would require
> >> two absolutely identical filesystems. It needs to replicate
> >> filesystem structure so it of course needs to know which files are
> >> created/deleted. But for each file it only sends changed parts
> >> since previous snapshot. This only works if both snapshots refer
> >> to the *same* file. 
> > 
> > Or more precisely - btrfs send knows which filesystem content was
> > part of previous snapshot and so is already present on destination
> > and it will not send this content again. It is actually more or
> > less irrelevant which files this content belongs to.  
> 
> I think I understood that now.
> 
> >   
> >> As was already mentioned, you need to understand how your files are
> >> changed. In particular, standard tools for software update do not
> >> rewrite files in place - they create new files with new content.
> >> From btrfs perspective they are completely different; two files
> >> with the same name in two snapshots do not share a single byte.
> >> When you compute delta between two snapshots you get instructions
> >> to delete old file and create new file with new content (that will
> >> be renamed to the same name as deleted old file). This also by
> >> necessity sends full new content.  
> 
> As you said, many standard tools create new files instead of updating 
> files in place. But I guess a `dedupe` run before creating the
> snapshot could help here, right?
> 
> If we have a root file system build process that always regenerates
> all files, and then copies those into a file system, then all files
> are 'different' from a btrfs perspective.

Not to mention that it would be kind of hard to inject that rootfs into
an existing filesystem. Building with e.g. Yocto or Isar (in our
specific case) generates a new fs every time. A compression based
on binary diffs would be an additional, optional step, which depends on
still having that old state. And it would have to be done many times,
once for every base version we want to allow updating from.
That all sounds to me like the same limitation we already know from other
filesystems, with btrfs not coming to the rescue. But a solution for
btrfs might still look different than one for other filesystems.

Henning

> >> So yes, btrfs replication is block based; similarity is determined
> >> by how much physical data is shared between two files. And you
> >> expect file based replication where file names determine whether
> >> files should be considered the same and changes are computed for
> >> two files with the same name.  
> 
> Right. Maybe we could use the file path just as a hint for an 
> opportunity of saving resources by creating block based deltas.
> 
> I guess I have to think about this some more.
> 
> Thanks a lot!
> Claudius


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: btrfs-send format that contains binary diffs
  2021-03-30  8:12         ` Claudius Heine
  2021-03-30 16:32           ` Henning Schild
@ 2021-03-31  1:17           ` Zygo Blaxell
  1 sibling, 0 replies; 13+ messages in thread
From: Zygo Blaxell @ 2021-03-31  1:17 UTC (permalink / raw)
  To: Claudius Heine; +Cc: Andrei Borzenkov, linux-btrfs, Henning Schild

On Tue, Mar 30, 2021 at 10:12:40AM +0200, Claudius Heine wrote:
> Hi Andrei,
> 
> On 2021-03-30 07:38, Andrei Borzenkov wrote:
> > On 30.03.2021 08:33, Andrei Borzenkov wrote:
> > > On 29.03.2021 22:14, Claudius Heine wrote:
> > > > Hi Andrei,
> > > > 
> > > > On 2021-03-29 18:30, Andrei Borzenkov wrote:
> > > > > On 29.03.2021 16:16, Claudius Heine wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > I am currently investigating the possibility to use `btrfs-stream` files
> > > > > > > (generated by `btrfs send`) for deploying an image-based update to
> > > > > > systems (probably embedded ones).
> > > > > > 
> > > > > > One of the issues I encountered here is that btrfs-send does not use any
> > > > > > diff algorithm on files that have changed from one snapshot to the next.
> > > > > > 
> > > > > 
> > > > > btrfs send works on block level. It sends blocks that differ between two
> > > > > snapshots.
> > > > 
> > > > Are you sure?
> > > > 
> > > 
> > > Yes.
> 
> Ok, sorry for doubting you. My assumptions were wrong.
> 
> > > 
> > > > I did a test with a 32MiB random file. I created one snapshot, then
> > > > changed (not deleted or added) one byte in that file and then created a
> > > > snapshot again. `btrfs send` created a >32MiB `btrfs-stream` file. If it
> > > > would be only block based, then I would have expected that it would just
> > > > contain the changed block, not the whole file. And if I use a smaller
> > > > file on the same file system, then the `btrfs-stream` is smaller as well.
> > > > 
> > > > I looked into those `btrfs-stream` files using [1] and also [2] as well
> > > > as the code. While I haven't understood everything there yet, it
> > > > currently looks to me like it is file based.
> > > > 
> > > 
> > > btrfs send is not pure block based image, because it would require two
> > > absolutely identical filesystems. It needs to replicate filesystem
> > > structure so it of course needs to know which files are created/deleted.
> > > But for each file it only sends changed parts since previous snapshot.
> > > This only works if both snapshots refer to the *same* file.
> > > 
> > 
> > Or more precisely - btrfs send knows which filesystem content was part
> > of previous snapshot and so is already present on destination and it
> > will not send this content again. It is actually more or less irrelevant
> > which files this content belongs to.
> 
> I think I understood that now.
> 
> > 
> > > As was already mentioned, you need to understand how your files are
> > > changed. In particular, standard tools for software update do not
> > > rewrite files in place - they create new files with new content. From
> > > btrfs perspective they are completely different; two files with the same
> > > name in two snapshots do not share a single byte. When you compute delta
> > > between two snapshots you get instructions to delete old file and create
> > > new file with new content (that will be renamed to the same name as
> > > deleted old file). This also by necessity sends full new content.
> 
> As you said, many standard tools create new files instead of updating files
> in place. But I guess a `dedupe` run before creating the snapshot could help
> here, right?

Test thoroughly.  btrfs send was designed and implemented before btrfs
had dedupe.  There may be bugs with some use cases.

> If we have a root file system build process that always regenerates all
> files, and then copies those into a file system, then all files are
> 'different' from a btrfs perspective.

It sounds like you want to use rsync or casync instead of btrfs send.
They do more work on the sending side to minimize the cost in transit
and at the receiving side.  They don't particularly care about *how*
the files came to contain the data they do--contrast with btrfs send,
which cares about nothing else.  They both have output stream options,
so you can replicate the data changes on multiple receivers if they have
access to identical pre-delta content.
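
For rsync, the output stream option is batch mode. A minimal sketch,
assuming rsync is installed and that every receiver holds exactly the
content of the old tree; the directory arguments and the function names
are hypothetical. Files that were removed in the new tree are not handled
in this sketch:

```shell
#!/bin/sh
# make_update OLD NEW BATCH: record the changes needed to turn a copy of
# OLD into NEW in the file BATCH, without modifying OLD itself.
make_update() {
    rsync -a --only-write-batch="$3" "$2"/ "$1"/
}

# apply_update TARGET BATCH: replay BATCH on TARGET, which must hold the
# same content that OLD had when the batch was written.
apply_update() {
    rsync -a --read-batch="$2" "$1"/
}
```

The sender would call something like `make_update old new update.batch`
once, ship `update.batch`, and each receiver would run
`apply_update /path/to/rootfs update.batch`.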

rsync doesn't do dedupe or reflink (not sure about casync), but it may be
easier to add reflink copy to rsync than it is to add delta compression
to btrfs send (*).  You can also dedupe after the fact on the receiver
side, but that might not be desirable for assorted good reasons.

btrfs send is mostly about extracting data from the source in bulk as
quickly as possible.  The backup use case makes only a single copy,
and the sender wants the cost at their end to be as close to zero
as possible.  Delta compression works against those goals, especially
when considering kernel memory constraints.

(*) maybe...?  rsync and btrfs send (kernel) are both pretty gnarly C
programs, and if it was easy to add reflink to rsync, I would expect
rsync to be already doing reflink by now...

> > > So yes, btrfs replication is block based; similarity is determined by
> > > how much physical data is shared between two files. And you expect file
> > > based replication where file names determine whether files should be
> > > considered the same and changes are computed for two files with the same
> > > name.
> 
> Right. Maybe we could use the file path just as a hint for an opportunity of
> saving resources by creating block based deltas.
> 
> I guess I have to think about this some more.
> 
> Thanks a lot!
> Claudius

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-03-31  1:18 UTC | newest]

Thread overview: 13+ messages
2021-03-29 13:16 btrfs-send format that contains binary diffs Claudius Heine
2021-03-29 16:30 ` Andrei Borzenkov
2021-03-29 17:25   ` Henning Schild
2021-03-29 18:00     ` Martin Raiber
2021-03-29 19:25       ` Claudius Heine
2021-03-29 19:14   ` Claudius Heine
2021-03-29 19:53     ` Lionel Bouton
2021-03-30  7:48       ` Claudius Heine
2021-03-30  5:33     ` Andrei Borzenkov
2021-03-30  5:38       ` Andrei Borzenkov
2021-03-30  8:12         ` Claudius Heine
2021-03-30 16:32           ` Henning Schild
2021-03-31  1:17           ` Zygo Blaxell
