linux-btrfs.vger.kernel.org archive mirror
* reproducible builds with btrfs seed feature
@ 2018-10-13 22:28 Chris Murphy
  2018-10-13 23:05 ` Chris Murphy
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Chris Murphy @ 2018-10-13 22:28 UTC (permalink / raw)
  To: Btrfs BTRFS; +Cc: Anand Jain

Is it practical and desirable to make Btrfs based OS installation
images reproducible? Or is Btrfs simply too complex and
non-deterministic? [1]

The main three problems with Btrfs right now for reproducibility are:
a. many objects have uuids other than the volume uuid; and mkfs only
lets us set the volume uuid
b. atime, ctime, mtime, otime; and no way to make them all the same
c. non-deterministic allocation of file extents, compression, inode
assignment, logical and physical address allocation

I'm imagining reproducible image creation would be a mkfs feature that
builds on Btrfs seed and --rootdir concepts to constrain Btrfs
features to maybe make reproducible Btrfs volumes possible:

- No raid
- Either all objects needing uuids can have those uuids specified by
switch, or possibly a defined set of uuids expressly for this use
case, or possibly all of them can just be zeros (eek? not sure)
- A flag to set all times the same
- Possibly require that target block device is zero filled before
creation of the Btrfs
- Possibly disallow subvolumes and snapshots
- Require the resulting image is seed/ro and maybe also a new
compat_ro flag to enforce that such Btrfs file systems cannot be
modified after the fact.
- Enforce a consistent means of allocation and compression

The end result is creating two Btrfs volumes would yield image files
with matching hashes.
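
As a rough illustration of the "set all times the same" item above, the following Python sketch clamps atime and mtime for every entry in a source tree before it is handed to mkfs with --rootdir. This is a workaround under stated assumptions, not an existing mkfs.btrfs feature: the function name clamp_tree_times is made up for this example, it assumes Linux (where os.utime can act on symlinks themselves), and ctime and otime cannot be set this way from user space, which is part of why a dedicated mkfs flag is being asked about.

```python
import os


def clamp_tree_times(root, epoch):
    """Set atime and mtime of root and everything below it to `epoch` seconds.

    Returns the number of entries touched. Symlinks are updated themselves,
    not their targets. ctime is not controllable here: the kernel sets it to
    the current time whenever utime is called.
    """
    touched = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            os.utime(path, (epoch, epoch), follow_symlinks=False)
            touched += 1
    # The root directory itself is not listed by the loop above.
    os.utime(root, (epoch, epoch))
    return touched + 1
```

A build script could take the timestamp from the SOURCE_DATE_EPOCH environment variable, a convention documented by reproducible-builds.org, and call clamp_tree_times(source_dir, int(os.environ["SOURCE_DATE_EPOCH"])) just before creating the image.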

If I had to guess, the biggest challenge would be allocation. But it's
also possible that such an image may have problems with "sprouts". A
non-removable sprout seems fairly straightforward and safe; but if a
"reproducible build" type of seed is removed, it seems like removal
needs to be smart enough to refresh *all* uuids found in the sprout: a
hard break from the seed.

Competing file systems: ext4 (with the make_ext4 fork) and squashfs. At the
moment I'm thinking it might be easier to teach squashfs integrity
checking than to make Btrfs reproducible.  But then I also think
restricting Btrfs features, and applying some requirements to
constrain Btrfs to make it reproducible, really enhances the Btrfs
seed-sprout feature.

Any thoughts? Useful? Difficult to implement?

Squashfs might be a better fit for this use case *if* it can be taught
about integrity checking. It does per file checksums for the purpose
of deduplication but those checksums aren't retained for later
integrity checking.

[1] problems of reproducible system images
https://reproducible-builds.org/docs/system-images/

[2] purpose and motivation for reproducible builds
https://reproducible-builds.org/

[3] who is involved?
https://reproducible-builds.org/who/#Qubes%20OS




-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: reproducible builds with btrfs seed feature
  2018-10-13 22:28 reproducible builds with btrfs seed feature Chris Murphy
@ 2018-10-13 23:05 ` Chris Murphy
  2018-10-14 12:20   ` Cerem Cem ASLAN
  2018-10-15 12:29 ` Austin S. Hemmelgarn
  2018-10-16  8:13 ` Anand Jain
  2 siblings, 1 reply; 13+ messages in thread
From: Chris Murphy @ 2018-10-13 23:05 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS, Anand Jain

On Sat, Oct 13, 2018 at 4:28 PM, Chris Murphy <lists@colorremedies.com> wrote:
> Is it practical and desirable to make Btrfs based OS installation
> images reproducible? Or is Btrfs simply too complex and
> non-deterministic? [1]
>
> The main three problems with Btrfs right now for reproducibility are:
> a. many objects have uuids other than the volume uuid; and mkfs only
> lets us set the volume uuid
> b. atime, ctime, mtime, otime; and no way to make them all the same
> c. non-deterministic allocation of file extents, compression, inode
> assignment, logical and physical address allocation

d. generation, just pick a consistent default because the entire image
is made with mkfs and then never rw mounted so it's not a problem

> - Possibly disallow subvolumes and snapshots

There's no actual mechanism to do either of these with mkfs, so it's
not a problem. And if a sprout is created, it's fine for newly created
subvolumes to follow the usual behavior of having unique UUID and
incrementing generation. Thing is, the sprout will inherit the seed's
preset chunk uuid, which, while it shouldn't cause a problem, is a kind
of violation of uuid uniqueness; but ultimately I'm not sure how big
of a problem it is for such uuids to spread.



-- 
Chris Murphy


* Re: reproducible builds with btrfs seed feature
  2018-10-13 23:05 ` Chris Murphy
@ 2018-10-14 12:20   ` Cerem Cem ASLAN
  2018-10-14 18:10     ` Chris Murphy
  0 siblings, 1 reply; 13+ messages in thread
From: Cerem Cem ASLAN @ 2018-10-14 12:20 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS, anand.jain

I'm not sure I fully understand the goal, but it sounds like (or this
may just be selective perception on my part) it's somehow related to
"creating reproducible snapshots"
(https://unix.stackexchange.com/q/462451/65781), no?
On Sun, 14 Oct 2018 at 02:05, Chris Murphy <lists@colorremedies.com> wrote:
>
> On Sat, Oct 13, 2018 at 4:28 PM, Chris Murphy <lists@colorremedies.com> wrote:
> > Is it practical and desirable to make Btrfs based OS installation
> > images reproducible? Or is Btrfs simply too complex and
> > non-deterministic? [1]
> >
> > The main three problems with Btrfs right now for reproducibility are:
> > a. many objects have uuids other than the volume uuid; and mkfs only
> > lets us set the volume uuid
> > b. atime, ctime, mtime, otime; and no way to make them all the same
> > c. non-deterministic allocation of file extents, compression, inode
> > assignment, logical and physical address allocation
>
> d. generation, just pick a consistent default because the entire image
> is made with mkfs and then never rw mounted so it's not a problem
>
> > - Possibly disallow subvolumes and snapshots
>
> There's no actual mechanism to do either of these with mkfs, so it's
> not a problem. And if a sprout is created, it's fine for newly created
> subvolumes to follow the usual behavior of having unique UUID and
> incrementing generation. Thing is, the sprout will inherit the seeds
> preset chunk uuid, which while it shouldn't cause a problem is a kind
> of violation of uuid uniqueness; but ultimately I'm not sure how big
> of a problem it is for such uuids to spread.
>
>
>
> --
> Chris Murphy


* Re: reproducible builds with btrfs seed feature
  2018-10-14 12:20   ` Cerem Cem ASLAN
@ 2018-10-14 18:10     ` Chris Murphy
  2018-10-14 19:09       ` Cerem Cem ASLAN
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Murphy @ 2018-10-14 18:10 UTC (permalink / raw)
  To: Cerem Cem ASLAN; +Cc: Btrfs BTRFS, Anand Jain

On Sun, Oct 14, 2018 at 6:20 AM, Cerem Cem ASLAN <ceremcem@ceremcem.net> wrote:
> I'm not sure I could fully understand the desired achievement but it
> sounds like (or this would be an example of selective perception) it's
> somehow related with "creating reproducible snapshots"
> (https://unix.stackexchange.com/q/462451/65781), no?

No, the idea is to be able to consistently reproduce a distro installer
image (like an ISO file) with the same hash. Inside the ISO image there is
typically a root.img or squash.img which itself contains a file system
like ext4 or squashfs, to act as the system root. And that root.img is
the main thing I'm talking about here. There is work to make squashfs
deterministic, as well as ext4. And I'm wondering if there are sane
ways to constrain Btrfs features to make it likewise deterministic.

For example:

fallocate -l 5G btrfsroot.img
losetup /dev/loop0 btrfsroot.img
mkfs.btrfs -m single -d single -rseed --rootdir /tmp/ -T \
    "20181010T1200" --uuidv $X --uuidc $Y --uuidd $Z ...
shasum btrfsroot.img

And then do it again, and the shasums should be the same. I realize
today it's not that way. Inode assignment and extent allocation
(number, size, locality) are all factors in making Btrfs quickly
non-deterministic, which is also why I'm assuming this needs to be done
in user space. That would be the point of the -rseed flag: set the seed
flag, possibly set a compat_ro flag, fix generation/transid to 1,
require the use of -T (similar to make_ext4) to set all timestamps to
this value, allow configurable uuids for everything that uses uuids,
and apply whatever other constraints are necessary.
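
The $X, $Y and $Z values in the example have to come from somewhere that every builder agrees on. One possible approach, sketched here in Python under the assumption that a name-based (version 5) UUID is acceptable for each object, derives them from a build identifier. The namespace URL, the role names and the function derive_uuids are all hypothetical; the --uuidv/--uuidc/--uuidd switches are part of the proposal in this message, not existing mkfs.btrfs options.

```python
import uuid

# Hypothetical namespace for this illustration; any fixed, published value works.
BUILD_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://your-distro.org/images")


def derive_uuids(build_id, roles=("volume", "chunk", "device")):
    """Return one deterministic UUID per role for the given build identifier.

    The same build_id always yields the same UUIDs, and each role gets a
    distinct value, so nothing has to be all zeros.
    """
    return {role: uuid.uuid5(BUILD_NAMESPACE, f"{build_id}/{role}") for role in roles}
```

Two builders calling derive_uuids with the same build identifier get identical values without having to exchange them, while different releases still end up with different uuids.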


-- 
Chris Murphy


* Re: reproducible builds with btrfs seed feature
  2018-10-14 18:10     ` Chris Murphy
@ 2018-10-14 19:09       ` Cerem Cem ASLAN
  2018-10-14 23:38         ` Chris Murphy
  0 siblings, 1 reply; 13+ messages in thread
From: Cerem Cem ASLAN @ 2018-10-14 19:09 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS, anand.jain

Thanks for the explanation, I got it now. I still think this is
related to my needs, so I'll keep an eye on this.

What is the possible use case? I can think of only one scenario: you
have a rootfs that contains a distro installer, and you want to
generate distro.img files (using Btrfs under the hood) in different
locations that all have the same hash, so you can publish the verified
image hash from a single source (https://your-distro.org).
You'll sync the next release's files to the remote servers using diffs
(btrfs send/receive), and they will each generate distro.img
independently, still producing the same hash, which can later be
verified against https://your-distro.org.
On Sun, 14 Oct 2018 at 21:10, Chris Murphy <lists@colorremedies.com> wrote:
>
> On Sun, Oct 14, 2018 at 6:20 AM, Cerem Cem ASLAN <ceremcem@ceremcem.net> wrote:
> > I'm not sure I could fully understand the desired achievement but it
> > sounds like (or this would be an example of selective perception) it's
> > somehow related with "creating reproducible snapshots"
> > (https://unix.stackexchange.com/q/462451/65781), no?
>
> No the idea is to be able to consistently reproduce a distro installer
> image (like an ISO file) with the same hash. Inside the ISO image, is
> typically a root.img or squash.img which itself contains a file system
> like ext4 or squashfs, to act as the system root. And that root.img is
> the main thing I'm talking about here. There is work to make squashfs
> deterministic, as well as ext4. And I'm wondering if there are sane
> ways to constrain Btrfs features to make it likewise deterministic.
>
> For example:
>
> fallocate -l 5G btrfsroot.img
> losetup /dev/loop0 btrfsroot.img
> mkfs.btrfs -m single -d single -rseed --rootdir /tmp/ -T
> "20181010T1200" --uuidv $X --uuidc $Y --uuidd $Z ...
> shasum btrfsroot.img
>
> And then do it again, and the shasum's should be the same. I realize
> today it's not that way. And that inode assignment, extent allocation
> (number, size, locality) are all factors in making Btrfs quickly
> non-determinstic, and also why I'm assuming this needs to be done in
> user space. That would be the point of the -rseed flag: set the seed
> flag, possibly set a compat_ro flag, fix generation/transid to 1,
> require the use of -T (similar to make_ext4) to set all timestamps to
> this value, and configurable uuid's for everything that uses uuids,
> and whatever other constraints are necessary.
>
>
> --
> Chris Murphy


* Re: reproducible builds with btrfs seed feature
  2018-10-14 19:09       ` Cerem Cem ASLAN
@ 2018-10-14 23:38         ` Chris Murphy
  0 siblings, 0 replies; 13+ messages in thread
From: Chris Murphy @ 2018-10-14 23:38 UTC (permalink / raw)
  To: Cerem Cem ASLAN; +Cc: Chris Murphy, Btrfs BTRFS, Anand Jain

On Sun, Oct 14, 2018 at 1:09 PM, Cerem Cem ASLAN <ceremcem@ceremcem.net> wrote:
> Thanks for the explanation, I got it now. I still think this is
> related with my needs, so I'll keep an eye on this.
>
> What is the possible use case? I can think of only one scenario: You
> have a rootfs that contains a distro installer and you want to
> generate distro.img files which uses Btrfs under the hood in different
> locations and still have the same hash, so you can publish your
> verified image hash by a single source (https://your-distro.org).

The first step is accepting reproducible builds as a worthy goal in
and of itself independent of Btrfs. Specifically "Why does it matter?"
found here https://reproducible-builds.org/

Btrfs does bring valuable features for installation images: always-on
checksumming; the seed feature permits a straightforward way to set up a
volatile overlay on a zram device; the ability to convert it to a
non-volatile overlay, and boot either the seed or overlay; and even
installation by adding the install target and removing both overlay
and seed. And yet it remains compatible with a conventional copy to
another file system if it's not desirable to use Btrfs as root.
Win-win.

By subsetting Btrfs features we don't care about in the installation
seed context, can we achieve reproducibility as a consequence, while
retaining some of the more interesting features? Of course once
sprouted, those limitations wouldn't apply.

Basically it's a "btrfs seed device 2.0" idea. But Btrfs is so
complicated it's maybe too much work, hence the question.



-- 
Chris Murphy


* Re: reproducible builds with btrfs seed feature
  2018-10-13 22:28 reproducible builds with btrfs seed feature Chris Murphy
  2018-10-13 23:05 ` Chris Murphy
@ 2018-10-15 12:29 ` Austin S. Hemmelgarn
  2018-10-15 19:52   ` Chris Murphy
  2018-10-16  8:13 ` Anand Jain
  2 siblings, 1 reply; 13+ messages in thread
From: Austin S. Hemmelgarn @ 2018-10-15 12:29 UTC (permalink / raw)
  To: Chris Murphy, Btrfs BTRFS; +Cc: Anand Jain

On 2018-10-13 18:28, Chris Murphy wrote:
> Is it practical and desirable to make Btrfs based OS installation
> images reproducible? Or is Btrfs simply too complex and
> non-deterministic? [1]
> 
> The main three problems with Btrfs right now for reproducibility are:
> a. many objects have uuids other than the volume uuid; and mkfs only
> lets us set the volume uuid
> b. atime, ctime, mtime, otime; and no way to make them all the same
> c. non-deterministic allocation of file extents, compression, inode
> assignment, logical and physical address allocation
> 
> I'm imagining reproducible image creation would be a mkfs feature that
> builds on Btrfs seed and --rootdir concepts to constrain Btrfs
> features to maybe make reproducible Btrfs volumes possible:
> 
> - No raid
> - Either all objects needing uuids can have those uuids specified by
> switch, or possibly a defined set of uuids expressly for this use
> case, or possibly all of them can just be zeros (eek? not sure)
> - A flag to set all times the same
> - Possibly require that target block device is zero filled before
> creation of the Btrfs
> - Possibly disallow subvolumes and snapshots
> - Require the resulting image is seed/ro and maybe also a new
> compat_ro flag to enforce that such Btrfs file systems cannot be
> modified after the fact.
> - Enforce a consistent means of allocation and compression
> 
> The end result is creating two Btrfs volumes would yield image files
> with matching hashes.
So in other words, you care about matching the block layout _exactly_.
This is a great idea for paranoid people, but it's usually overkill.
Realistically, almost nothing in userspace cares about the block layout;
worrying about it just makes verifying the reproduced image a bit easier
(there's no reason you can't verify all the relevant data without doing
a checksum or HMAC of the image as a whole).
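
A minimal sketch of that kind of layout-independent verification, in Python: walk the mounted tree in sorted order and hash each entry's relative path, file type, permission bits and content, while ignoring timestamps, inode numbers and block placement. The function names and the record format are assumptions chosen for illustration, and ownership and xattrs are left out for brevity.

```python
import hashlib
import os
import stat


def file_sha256(path, chunk_size=1 << 20):
    """Return the raw SHA-256 digest of a regular file's content."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.digest()


def tree_digest(root):
    """Hash paths, modes and contents under root, independent of layout."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            entries.append(os.path.relpath(os.path.join(dirpath, name), root))
    outer = hashlib.sha256()
    for rel in sorted(entries):  # sorted, so readdir order does not matter
        full = os.path.join(root, rel)
        st = os.lstat(full)
        if stat.S_ISLNK(st.st_mode):
            payload = os.fsencode(os.readlink(full))
        elif stat.S_ISREG(st.st_mode):
            payload = file_sha256(full)
        else:
            payload = b""
        # st_mode carries both the file type and the permission bits.
        record = b"\0".join([os.fsencode(rel), b"%o" % st.st_mode, payload])
        outer.update(hashlib.sha256(record).digest())
    return outer.hexdigest()
```

Two images whose mounted trees give the same tree_digest agree on every path, mode and file content, even if their block layouts differ. It does not cover anything hidden outside the mounted tree, which is the residual ambiguity that hashing the whole image removes.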
> 
> If I had to guess, the biggest challenge would be allocation. But it's
> also possible that such an image may have problems with "sprouts". A
> non-removable sprout seems fairly straightforward and safe; but if a
> "reproducible build" type of seed is removed, it seems like removal
> needs to be smart enough to refresh *all* uuids found in the sprout: a
> hard break from the seed.
> 
> Competing file systems, ext4 with make_ext4 fork, and squashfs. At the
> moment I'm thinking it might be easier to teach squashfs integrity
> checking than to make Btrfs reproducible.  But then I also think
> restricting Btrfs features, and applying some requirements to
> constrain Btrfs to make it reproducible, really enhances the Btrfs
> seed-sprout feature.
> 
> Any thoughts? Useful? Difficult to implement?
> 
> Squashfs might be a better fit for this use case *if* it can be taught
> about integrity checking. It does per file checksums for the purpose
> of deduplication but those checksums aren't retained for later
> integrity checking.
I've seen projects with SquashFS that store integrity data separately 
but leverage other infrastructure.  Methods I've seen so far include:

* GPG-signed SquashFS images, usually with detached signatures
* SquashFS with PAR2 integrity checking data
* SquashFS on top of dm-verity
* SquashFS on top of dm-integrity

The first two need to be externally checked prior to mount, but doing so 
is not hard.  The fourth is tricky to set up right, but provides better 
integration with encrypted images.  The third does exactly what's needed 
though.  You just use the embedded data variant of dm-verity, bind the 
resultant image to a loop device, activate dm-verity on the loop device, 
and mount the resultant mapped device like any other SquashFS image.
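
For readers unfamiliar with why dm-verity fits here: it verifies each block against a hash tree whose root hash is the only value that has to be trusted. The Python sketch below shows that idea in simplified form only. It is not the dm-verity on-disk format (no salt, no superblock, no padding of hash blocks), and the function name and defaults are assumptions for this example.

```python
import hashlib


def merkle_root(data, block_size=4096, fanout=128):
    """Return the hex root of a simple hash tree over fixed-size blocks."""
    # Leaf level: one digest per data block (one empty block for empty input).
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)] or [b""]
    level = [hashlib.sha256(block).digest() for block in blocks]
    # Upper levels: hash each group of `fanout` child digests until one is left.
    while len(level) > 1:
        level = [
            hashlib.sha256(b"".join(level[i:i + fanout])).digest()
            for i in range(0, len(level), fanout)
        ]
    return level[0].hex()
```

Changing any single byte of the input changes the root, and a reader only needs the digests along one path of the tree to check one block, which is what allows verification at read time instead of before mount.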

I've also seen some talk of using SquashFS with IMA and IMA appraisal, 
but I've not seen anybody actually _do_ that, and it wouldn't be on 
quite the level you seem to want (it verifies the files in the image, 
but not the image as a whole).


* Re: reproducible builds with btrfs seed feature
  2018-10-15 12:29 ` Austin S. Hemmelgarn
@ 2018-10-15 19:52   ` Chris Murphy
  0 siblings, 0 replies; 13+ messages in thread
From: Chris Murphy @ 2018-10-15 19:52 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Chris Murphy, Btrfs BTRFS, Anand Jain

On Mon, Oct 15, 2018 at 6:29 AM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:
> On 2018-10-13 18:28, Chris Murphy wrote:

>> The end result is creating two Btrfs volumes would yield image files
>> with matching hashes.
>
> So in other words, you care about matching the block layout _exactly_.

Only because that's the easiest way to verify reproducibility without
any ambiguity.

If someone's compromised a build system such that everyone is getting
the malicious payload, but they can hide it behind a subvolume or
reflink that's not used by default, could someone plausibly cause
selective use of their malicious payload? I dunno, I leave that for
more crafty people. But even if it's a tiny bit of ambiguity, it's
non-zero. Hashing a file that contains the entire file system is
unambiguous.

I think populating the image with --rootdir at mkfs time should be
pretty deterministic. One stream in and out. No generations, no
snapshot, no delayed allocation. It'd be quite similar to mksquashfs.
I guess I'd have to try it a few times, and see if really the only
differences are uuids and times, and not allocation related things.
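
One way to run that experiment is to build the image twice and list which blocks differ, then check whether those blocks only hold uuids and timestamps. A small Python sketch, assuming two image files from the same build inputs and a 4096-byte comparison granularity (the function name is made up for this example):

```python
def differing_ranges(path_a, path_b, block_size=4096):
    """Return inclusive (first_block, last_block) ranges where the files differ."""
    ranges = []
    start = None  # first block of the run of differences currently open
    index = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            block_a = a.read(block_size)
            block_b = b.read(block_size)
            if not block_a and not block_b:
                break  # both files exhausted
            if block_a != block_b:
                if start is None:
                    start = index
            elif start is not None:
                ranges.append((start, index - 1))
                start = None
            index += 1
    if start is not None:
        ranges.append((start, index - 1))
    return ranges
```

An empty list means the images are byte-identical. A few short ranges would suggest differences confined to fixed fields, while long or shifting ranges would point at allocation differences.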



-- 
Chris Murphy


* Re: reproducible builds with btrfs seed feature
  2018-10-13 22:28 reproducible builds with btrfs seed feature Chris Murphy
  2018-10-13 23:05 ` Chris Murphy
  2018-10-15 12:29 ` Austin S. Hemmelgarn
@ 2018-10-16  8:13 ` Anand Jain
  2018-10-16 19:49   ` Chris Murphy
  2 siblings, 1 reply; 13+ messages in thread
From: Anand Jain @ 2018-10-16  8:13 UTC (permalink / raw)
  To: Chris Murphy, Btrfs BTRFS



On 10/14/2018 06:28 AM, Chris Murphy wrote:
> Is it practical and desirable to make Btrfs based OS installation
> images reproducible? Or is Btrfs simply too complex and
> non-deterministic? [1]
> 
> The main three problems with Btrfs right now for reproducibility are:
> a. many objects have uuids other than the volume uuid; and mkfs only
> lets us set the volume uuid
> b. atime, ctime, mtime, otime; and no way to make them all the same
> c. non-deterministic allocation of file extents, compression, inode
> assignment, logical and physical address allocation
> 
> I'm imagining reproducible image creation would be a mkfs feature that
> builds on Btrfs seed and --rootdir concepts to constrain Btrfs
> features to maybe make reproducible Btrfs volumes possible:
> 
> - No raid
> - Either all objects needing uuids can have those uuids specified by
> switch, or possibly a defined set of uuids expressly for this use
> case, or possibly all of them can just be zeros (eek? not sure)
> - A flag to set all times the same
> - Possibly require that target block device is zero filled before
> creation of the Btrfs
> - Possibly disallow subvolumes and snapshots
> - Require the resulting image is seed/ro and maybe also a new
> compat_ro flag to enforce that such Btrfs file systems cannot be
> modified after the fact.
> - Enforce a consistent means of allocation and compression
> 
> The end result is creating two Btrfs volumes would yield image files
> with matching hashes.

> If I had to guess, the biggest challenge would be allocation. But it's
> also possible that such an image may have problems with "sprouts". A
> non-removable sprout seems fairly straightforward and safe; but if a
> "reproducible build" type of seed is removed, it seems like removal
> needs to be smart enough to refresh *all* uuids found in the sprout: a
> hard break from the seed.

Right. The seed fsid will be gone in a detached sprout.

> Competing file systems, ext4 with make_ext4 fork, and squashfs. At the
> moment I'm thinking it might be easier to teach squashfs integrity
> checking than to make Btrfs reproducible.  But then I also think
> restricting Btrfs features, and applying some requirements to
> constrain Btrfs to make it reproducible, really enhances the Btrfs
> seed-sprout feature.

> Any thoughts? Useful? Difficult to implement?

Recently Nikolay sent a patch to change the fsid on a mounted btrfs.
However, reproducible builds also need neutralized uuids, times, and
bytenr(s). Furthermore, although the on-disk layout won't change without
notice, block bytenrs might.

One question: why not have reproducible builds take the file data
extents from the image and stitch their hashes together to verify a
combined hash? There could also be a vfs ioctl to import and export
filesystem images, to better support use cases like reproducible builds.
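
A rough sketch of that stitching idea in Python, assuming the caller already has each image's data extents as (offset, length) pairs listed in logical order (for example by file path, then file offset). How to obtain that list is outside this sketch, and the function name is hypothetical.

```python
import hashlib


def stitched_digest(image_path, extents):
    """Hash each extent, then hash the concatenated extent digests.

    `extents` is an iterable of (offset, length) pairs in logical order.
    Physical placement does not affect the result, only the bytes inside
    each extent and the order in which the extents are listed.
    """
    outer = hashlib.sha256()
    with open(image_path, "rb") as f:
        for offset, length in extents:
            f.seek(offset)
            data = f.read(length)
            if len(data) != length:
                raise ValueError("extent runs past the end of the image")
            outer.update(hashlib.sha256(data).digest())
    return outer.hexdigest()
```

Two images that place the same data at different block addresses produce the same stitched digest as long as the logical extent list matches, but the result says nothing about metadata.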

For the seed sprout feature, one thing I have in mind is to make it
image and subvolume granular rather than disk and fsid granular, with
the ability to propagate golden image (seed) updates, but I haven't
checked the feasibility yet.

Thanks, Anand

> 
> Squashfs might be a better fit for this use case *if* it can be taught
> about integrity checking.

> It does per file checksums for the purpose
> of deduplication but those checksums aren't retained for later
> integrity checking.
> 
> [1] problems of reproducible system images
> https://reproducible-builds.org/docs/system-images/
> 
> [2] purpose and motivation for reproducible builds
> https://reproducible-builds.org/
> 
> [3] who is involved?
> https://reproducible-builds.org/who/#Qubes%20OS
> 
> 
> 
> 


* Re: reproducible builds with btrfs seed feature
  2018-10-16  8:13 ` Anand Jain
@ 2018-10-16 19:49   ` Chris Murphy
  2018-10-17  4:08     ` Anand Jain
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Murphy @ 2018-10-16 19:49 UTC (permalink / raw)
  To: Anand Jain; +Cc: Chris Murphy, Btrfs BTRFS

On Tue, Oct 16, 2018 at 2:13 AM, Anand Jain <anand.jain@oracle.com> wrote:
>
>
> On 10/14/2018 06:28 AM, Chris Murphy wrote:
>>
>> Is it practical and desirable to make Btrfs based OS installation
>> images reproducible? Or is Btrfs simply too complex and
>> non-deterministic? [1]
>>
>> The main three problems with Btrfs right now for reproducibility are:
>> a. many objects have uuids other than the volume uuid; and mkfs only
>> lets us set the volume uuid
>> b. atime, ctime, mtime, otime; and no way to make them all the same
>> c. non-deterministic allocation of file extents, compression, inode
>> assignment, logical and physical address allocation
>>
>> I'm imagining reproducible image creation would be a mkfs feature that
>> builds on Btrfs seed and --rootdir concepts to constrain Btrfs
>> features to maybe make reproducible Btrfs volumes possible:
>>
>> - No raid
>> - Either all objects needing uuids can have those uuids specified by
>> switch, or possibly a defined set of uuids expressly for this use
>> case, or possibly all of them can just be zeros (eek? not sure)
>> - A flag to set all times the same
>> - Possibly require that target block device is zero filled before
>> creation of the Btrfs
>> - Possibly disallow subvolumes and snapshots
>> - Require the resulting image is seed/ro and maybe also a new
>> compat_ro flag to enforce that such Btrfs file systems cannot be
>> modified after the fact.
>> - Enforce a consistent means of allocation and compression
>>
>> The end result is creating two Btrfs volumes would yield image files
>> with matching hashes.
>
>
>> If I had to guess, the biggest challenge would be allocation. But it's
>> also possible that such an image may have problems with "sprouts". A
>> non-removable sprout seems fairly straightforward and safe; but if a
>> "reproducible build" type of seed is removed, it seems like removal
>> needs to be smart enough to refresh *all* uuids found in the sprout: a
>> hard break from the seed.
>
>
> Right. The seed fsid will be gone in a detached sprout.

I think we already get a new devid, volume uuid, and device uuid. The
open question is whether any other uuids need to be refreshed, such as
the chunk uuid, since that appears in every node and leaf.


>> Any thoughts? Useful? Difficult to implement?
>
> Recently Nikolay sent a patch to change fsid on a mounted btrfs. However for
> a reproducible builds it also needs neutralized uuids, time, bytenr(s)
> further more though the ondisk layout won't change without notice but
> block-bytenr might.

Seems like the mkfs population method of such a seed could be made
very deterministic as to what the start logical address and physical
address are. The vast majority of non-deterministic behavior comes
from the nature of kernel code having to handle so many complex inputs
and outputs, and negotiate them.


> One question why not reproducible builds get the file data extents from the
> image and stitch the hashes together to verify the hash. And there could be
> a vfs ioctl to import and export filesystem images for a better
> support-ability of the use-case similar to the reproducible builds.

Perhaps. I don't know the reproducible build requirements very well:
whether all they really care about is the hash of the data extents, and
how important fs metadata really is. Metadata is important when it comes
to fuzzing file systems that have no metadata checksumming, like
squashfs; of course you'd have to checksum the whole file system
image.

Another feature the mkfs variety of seed image would need is
deduplication.  As far as I know, deduplication is kernel code only.
You'd want to be able to deduplicate, as well as compress, to have the
smallest distributed seed possible. And mksquashfs does deduplication
by default.


-- 
Chris Murphy


* Re: reproducible builds with btrfs seed feature
  2018-10-16 19:49   ` Chris Murphy
@ 2018-10-17  4:08     ` Anand Jain
  2018-10-18 18:02       ` Chris Murphy
  0 siblings, 1 reply; 13+ messages in thread
From: Anand Jain @ 2018-10-17  4:08 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS



On 10/17/2018 03:49 AM, Chris Murphy wrote:
> On Tue, Oct 16, 2018 at 2:13 AM, Anand Jain <anand.jain@oracle.com> wrote:
>>
>>
>> On 10/14/2018 06:28 AM, Chris Murphy wrote:
>>>
>>> Is it practical and desirable to make Btrfs based OS installation
>>> images reproducible? Or is Btrfs simply too complex and
>>> non-deterministic? [1]
>>>
>>> The main three problems with Btrfs right now for reproducibility are:
>>> a. many objects have uuids other than the volume uuid; and mkfs only
>>> lets us set the volume uuid
>>> b. atime, ctime, mtime, otime; and no way to make them all the same
>>> c. non-deterministic allocation of file extents, compression, inode
>>> assignment, logical and physical address allocation
>>>
>>> I'm imagining reproducible image creation would be a mkfs feature that
>>> builds on Btrfs seed and --rootdir concepts to constrain Btrfs
>>> features to maybe make reproducible Btrfs volumes possible:
>>>
>>> - No raid
>>> - Either all objects needing uuids can have those uuids specified by
>>> switch, or possibly a defined set of uuids expressly for this use
>>> case, or possibly all of them can just be zeros (eek? not sure)
>>> - A flag to set all times the same
>>> - Possibly require that target block device is zero filled before
>>> creation of the Btrfs
>>> - Possibly disallow subvolumes and snapshots
>>> - Require the resulting image is seed/ro and maybe also a new
>>> compat_ro flag to enforce that such Btrfs file systems cannot be
>>> modified after the fact.
>>> - Enforce a consistent means of allocation and compression
>>>
>>> The end result is creating two Btrfs volumes would yield image files
>>> with matching hashes.
>>
>>
>>> If I had to guess, the biggest challenge would be allocation. But it's
>>> also possible that such an image may have problems with "sprouts". A
>>> non-removable sprout seems fairly straightforward and safe; but if a
>>> "reproducible build" type of seed is removed, it seems like removal
>>> needs to be smart enough to refresh *all* uuids found in the sprout: a
>>> hard break from the seed.
>>
>>
>> Right. The seed fsid will be gone in a detached sprout.
> 
> I think already we get a new devid, volume uuid, and device uuid.

  Yes on the sprout.

> Open
> question is whether any other uuid's need to be refreshed, such as
> chunk uuid since that appears in every node and leaf.

  There are quite a number of uuids.

>>> Any thoughts? Useful? Difficult to implement?
>>
>> Recently Nikolay sent a patch to change fsid on a mounted btrfs. However for
>> a reproducible builds it also needs neutralized uuids, time, bytenr(s)
>> further more though the ondisk layout won't change without notice but
>> block-bytenr might.
> 
> Seems like the mkfs population method of such a seed,

> could be made
> very deterministic as to what the start logical address and physical
> address are.

  Can be. But it can change in future fixes as those aren't EXPORTED().

> The vast majority of non-deterministic behavior comes
> from the nature of kernel code having to handle so many complex inputs
> and outputs, and negotiate them.

> 
>> One question why not reproducible builds get the file data extents from the
>> image and stitch the hashes together to verify the hash. And there could be
>> a vfs ioctl to import and export filesystem images for a better
>> support-ability of the use-case similar to the reproducible builds.
> 
> Perhaps. I don't know the reproducible build requirements very well,
> if all they really care about is the hash of the data extents, and
> really how important fs metadata is.


> That is important when it comes
> to fuzzing file systems that have no metadata checksumming like
> squashfs; of course you'd have to checksum the whole file system
> image.
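
To make the two kinds of check discussed here concrete, a rough sketch
follows: a content-only digest that ignores filesystem metadata, and a
digest over the whole image. The function names tree_digest and
image_digest are made up for illustration. Note that tree_digest hashes
file contents and relative paths as seen through a mount point, which only
approximates "stitching together" hashes of the data extents themselves.
It assumes GNU find, sort, xargs and coreutils.

```shell
#!/bin/sh
# tree_digest DIR: print one SHA-256 computed over the contents and relative
# paths of all regular files under DIR, visited in byte-wise sorted order.
# Timestamps, uuids, inode numbers and block addresses do not affect it.
tree_digest() {
    (
        cd "$1" || exit 1
        find . -type f -print0 | LC_ALL=C sort -z \
            | xargs -0 -r sha256sum | sha256sum | cut -d' ' -f1
    )
}

# image_digest FILE: print the SHA-256 of every byte of the image file,
# so any difference in metadata or layout changes the result.
image_digest() {
    sha256sum "$1" | cut -d' ' -f1
}
```

Two builds with identical file contents would agree on tree_digest even if
their times and uuids differ, while image_digest only matches when the
images are bit-for-bit identical.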


> Another feature the mkfs variety of seed image would need is
> deduplication.  As far as I know, deduplication is kernel code only.
> You'd want to be able to deduplicate, 


> as well as compress, to have the
> smallest distributed seed possible.

btrfs-image(8) already does compress.

I don't think mkfs is the right place to sanitize the uuid/fsid/time;
it should happen when we generate the btrfs-image.

  So a possible solution for the reproducible builds:
    usual mkfs.btrfs dev
    Write the data
    unmount; create btrfs-image with uuid/fsid/time sanitized; mark it
    as a seed (RO).
    check/verify the hash of the image.

   If the hashes match, to use this btrfs-image:
    Reset the seed (RO) flag; mount and use it;
    OR
    Mount the seed device; add a RW sprout; detach the seed;
    OR
    Don't set the RO at all (above) and just mount and use it;
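
The build half of the steps above could be scripted roughly as follows.
This is only a sketch under several assumptions: it hashes the raw image
file rather than a btrfs-image dump, the uuid/fsid/time sanitizing step is
left as a comment because it is the proposed feature (as far as I know
btrfs-image -s only sanitizes file names), and the helper names run and
build_seed_image, the 1G size and the default fsid are all made up. It uses
mkfs.btrfs -U to pin the volume uuid and btrfstune -S 1 to set the seeding
flag. Set DRY_RUN=1 to print the commands instead of running them as root.

```shell
#!/bin/sh
# run CMD...: execute CMD, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# build_seed_image IMG SRC_DIR MNT [FSID]
# Populate a file-backed btrfs from SRC_DIR, mark it as a seed, print its hash.
build_seed_image() {
    img=$1 src=$2 mnt=$3
    fsid=${4:-00000000-0000-4000-8000-000000000001}  # hypothetical fixed uuid

    run truncate -s 1G "$img" || return 1       # sparse, zero-filled backing file
    run mkfs.btrfs -q -U "$fsid" "$img" || return 1
    run mount -o loop "$img" "$mnt" || return 1
    run cp -a "$src/." "$mnt/" || return 1      # write the data
    run umount "$mnt" || return 1
    # A uuid/fsid/time sanitizing pass would go here; no such tool is assumed.
    run btrfstune -S 1 "$img" || return 1       # set the seeding (RO) flag
    run sha256sum "$img"                        # hash to compare across builds
}
```

For example, DRY_RUN=1 build_seed_image seed.img ./rootfs /mnt prints the
seven commands in order without touching anything.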

Thanks, Anand

> And mksquashfs does deduplication
> by default.




* Re: reproducible builds with btrfs seed feature
  2018-10-17  4:08     ` Anand Jain
@ 2018-10-18 18:02       ` Chris Murphy
  2018-10-19  0:47         ` Anand Jain
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Murphy @ 2018-10-18 18:02 UTC (permalink / raw)
  To: Anand Jain; +Cc: Chris Murphy, Btrfs BTRFS

On Tue, Oct 16, 2018 at 10:08 PM, Anand Jain <anand.jain@oracle.com> wrote:


>
>  So a possible solution for the reproducible builds:
>    usual mkfs.btrfs dev
>    Write the data
>    unmount; create btrfs-image with uuid/fsid/time sanitized; mark it as a
> seed (RO).
>    check/verify the hash of the image.

Gotcha. Generation/transid needs to be included in that list. Imagine
a fast system vs a slow system. The slow system certainly will end up
with higher transids for the latest completed transactions.
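
One way to see whether two builds diverge on this is to compare the
generation recorded in each superblock. A small sketch, assuming the output
of "btrfs inspect-internal dump-super" has one "name value" pair per line
with a field called generation (which matches the btrfs-progs versions I
have seen); super_field is a made-up helper name:

```shell
#!/bin/sh
# super_field NAME: read dump-super output on stdin and print the value of
# the first line whose first column is exactly NAME.
super_field() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# Intended use (image names are placeholders):
#   a=$(btrfs inspect-internal dump-super a.img | super_field generation)
#   b=$(btrfs inspect-internal dump-super b.img | super_field generation)
#   [ "$a" = "$b" ] || echo "generation differs: $a vs $b"
```

The same helper can pull out fsid or any other single-valued superblock
field for comparison.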

But also, I don't know how the kernel code chooses block numbers,
either physical (chunk allocation) or logical (extent allocation) and
if that could be made deterministic. Same for inode assignment.

Another question that comes up later when creating the sprout by
removing the seed device, is how a script can know when all block
groups have successfully copied from seed to sprout, and that the
sprout can be unmounted.



-- 
Chris Murphy


* Re: reproducible builds with btrfs seed feature
  2018-10-18 18:02       ` Chris Murphy
@ 2018-10-19  0:47         ` Anand Jain
  0 siblings, 0 replies; 13+ messages in thread
From: Anand Jain @ 2018-10-19  0:47 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS



On 10/19/2018 02:02 AM, Chris Murphy wrote:
> On Tue, Oct 16, 2018 at 10:08 PM, Anand Jain <anand.jain@oracle.com> wrote:
> 
> 
>>
>>   So a possible solution for the reproducible builds:
>>     usual mkfs.btrfs dev
>>     Write the data
>>     unmount; create btrfs-image with uuid/fsid/time sanitized; mark it as a
>> seed (RO).
>>     check/verify the hash of the image.
> 
> Gotcha. Generation/transid needs to be included in that list. Imagine
> a fast system vs a slow system. The slow system certainly will end up
> with higher transids for the latest completed transactions.

  In a scripted build environment the transid could remain the same, as
there won't be any extra sync or mount -o transid changes, etc.

> But also, I don't know how the kernel code chooses block numbers,
> either physical (chunk allocation) or logical (extent allocation) and
> if that could be made deterministic. Same for inode assignment.

  The above list may not be complete. To avoid changes related to disk size
and type, one can run mkfs on a file instead of a disk. But the point I am
trying to make with bytenr is that if a tool relies on items which are not
explicitly EXPORTED or exposed via ioctl, those items can change without
notice, and the tool will break unless it is kept in line with the btrfs
kernel changes.

> Another question that comes up later when creating the sprout by
> removing the seed device, is how a script can know when all block
> groups have successfully copied from seed to sprout, and that the
> sprout can be unmounted.

  Oh.
    mount -o loop seed.img /seed  <-- this will be RO
    btrfs device add /dev/sprout /seed <-- new FSID on the same mount
point; /dev/sprout will have a new SB with the new FSID, and the sprout's
device count will include the seed device. Originally this was already RW
at this point, but we broke it somewhere. Not a big deal, as we can remount.
    mount -o remount,rw /dev/sprout /seed <-- this is RW. Only _new_
writes go to /dev/sprout, and the sprout still needs the seed to mount.
    btrfs device delete <seed-devid> /seed <-- this will transfer all
seed blocks to /dev/sprout.

    Now /dev/sprout is an independent RW FS with the contents from the 
seed and its total device count is now 1.
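
    For a script, the sequence above might be wrapped as in the sketch
below. The assumption that answers the "how does a script know" question
is that btrfs device delete only returns once the relocation has finished,
so its exit status is the signal that unmounting is safe; as far as I know
that is how it behaves, but treat it as an assumption. The names run and
sprout_from_seed are made up, and DRY_RUN=1 prints the commands instead of
running them.

```shell
#!/bin/sh
# run CMD...: execute CMD, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# sprout_from_seed SEED_IMG SPROUT_DEV MNT SEED_DEVID
# Turn a mounted seed into an independent RW filesystem on SPROUT_DEV.
sprout_from_seed() {
    seed=$1 sprout=$2 mnt=$3 devid=$4

    run mount -o loop "$seed" "$mnt" || return 1        # seed mounts RO
    run btrfs device add "$sprout" "$mnt" || return 1   # sprout gets a new FSID
    run mount -o remount,rw "$mnt" || return 1          # new writes go to the sprout
    # Assumed to block until all seed blocks are copied to the sprout.
    run btrfs device delete "$devid" "$mnt" || return 1
    run umount "$mnt"                                   # reached only on success
}
```

For example, DRY_RUN=1 sprout_from_seed seed.img /dev/sprout /seed 1 shows
the five steps; a failed delete stops the function before the unmount.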

Thanks, Anand
> 
> 
> 


Thread overview: 13+ messages
2018-10-13 22:28 reproducible builds with btrfs seed feature Chris Murphy
2018-10-13 23:05 ` Chris Murphy
2018-10-14 12:20   ` Cerem Cem ASLAN
2018-10-14 18:10     ` Chris Murphy
2018-10-14 19:09       ` Cerem Cem ASLAN
2018-10-14 23:38         ` Chris Murphy
2018-10-15 12:29 ` Austin S. Hemmelgarn
2018-10-15 19:52   ` Chris Murphy
2018-10-16  8:13 ` Anand Jain
2018-10-16 19:49   ` Chris Murphy
2018-10-17  4:08     ` Anand Jain
2018-10-18 18:02       ` Chris Murphy
2018-10-19  0:47         ` Anand Jain
