* Reproducible XFS filesystem artifacts
@ 2018-01-16  4:49 Philipp Schrader
  2018-01-16  7:55 ` Darrick J. Wong
  0 siblings, 1 reply; 9+ messages in thread
From: Philipp Schrader @ 2018-01-16  4:49 UTC (permalink / raw)
  To: linux-xfs; +Cc: Austin Schuh, Alison Chaiken

Hi all,

We're currently trying to clean up our build processes to make sure
that all binary output is reproducible (along the lines of
https://reproducible-builds.org/).

A few of our build artifacts are filesystem images. We have VFAT and
XFS images. They represent the system updates for units in the field
and at developers' desks.

We're trying to make these images reproducible and I'm in need of some
help. As far as I can tell, one of the biggest culprits is VFAT's
"creation time" and XFS' ctime fields.

Example of VFAT's differences:
$ hexdump -C ~/swu-tests/swu1/dvt-controller-kernel.vfat >
~/swu-tests/swu1/dvt-controller-kernel.vfat.dump
$ hexdump -C ~/swu-tests/swu3/dvt-controller-kernel.vfat >
~/swu-tests/swu3/dvt-controller-kernel.vfat.dump
$ diff -u ~/swu-tests/swu1/dvt-controller-kernel.vfat.dump
~/swu-tests/swu3/dvt-controller-kernel.vfat.dump
--- /x1/home/philipp/swu-tests/swu1/dvt-controller-kernel.vfat.dump
 2017-12-28 12:19:57.349880993 -0800
+++ /x1/home/philipp/swu-tests/swu3/dvt-controller-kernel.vfat.dump
 2017-12-28 12:19:50.253881196 -0800
...
@@ -1011,13 +1011,13 @@
 *
 00006200  41 7a 00 49 00 6d 00 61  00 67 00 0f 00 7c 65 00  |Az.I.m.a.g...|e.|
 00006210  00 00 ff ff ff ff ff ff  ff ff 00 00 ff ff ff ff  |................|
-00006220  5a 49 4d 41 47 45 20 20  20 20 20 20 00 00 63 9b  |ZIMAGE      ..c.|
+00006220  5a 49 4d 41 47 45 20 20  20 20 20 20 00 64 d8 95  |ZIMAGE      .d..|
 00006230  9c 4b 9c 4b 00 00 00 00  21 00 03 00 a0 2d 4a 00  |.K.K....!....-J.|
 00006240  42 6f 00 6c 00 6c 00 65  00 72 00 0f 00 67 2d 00  |Bo.l.l.e.r...g-.|
 00006250  64 00 76 00 74 00 2e 00  64 00 00 00 74 00 62 00  |d.v.t...d...t.b.|
...

As per the structure, that's the ctime (creation time) being different.
https://www.kernel.org/doc/Documentation/filesystems/vfat.txt

I've not had much luck digging into the XFS spec to prove that the
ctime is different, but I'm pretty certain. When I mount the images, I
can see that ctime is different:
$ stat -c %x,%y,%z,%n /mnt/{a,b}/log/syslog
2017-12-28 11:26:53.552000096 -0800,1969-12-31 16:00:00.000000000
-0800,2017-12-28 11:28:50.524000060 -0800,/mnt/a/log/syslog
2017-12-28 10:46:38.739999913 -0800,1969-12-31 16:00:00.000000000
-0800,2017-12-28 10:48:17.180000049 -0800,/mnt/b/log/syslog
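
(To see the problem at the whole-image level you can just hash or
byte-compare the two builds; the image names here are placeholders,
not our real artifact names:
$ sha256sum swu1/rootfs.xfs swu3/rootfs.xfs
$ cmp -l swu1/rootfs.xfs swu3/rootfs.xfs | wc -l
Identical inputs currently produce different hashes.)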

As far as I can tell, there are no mount options to null out the ctime
fields. (As an aside I'm curious as to the reason for this).

Is there a tool that lets me null out ctime fields on an XFS filesystem
image? Or maybe is there a library that lets me traverse the file
system and set the fields to zero manually?

Does what I'm asking make sense? I feel like I'm not the first person
to tackle this, but I haven't been lucky with finding anything to
address this.

Thanks,
Phil


* Re: Reproducible XFS filesystem artifacts
  2018-01-16  4:49 Reproducible XFS filesystem artifacts Philipp Schrader
@ 2018-01-16  7:55 ` Darrick J. Wong
  2018-01-17  0:52   ` Philipp Schrader
  0 siblings, 1 reply; 9+ messages in thread
From: Darrick J. Wong @ 2018-01-16  7:55 UTC (permalink / raw)
  To: Philipp Schrader; +Cc: linux-xfs, Austin Schuh, Alison Chaiken

On Mon, Jan 15, 2018 at 08:49:02PM -0800, Philipp Schrader wrote:
> Hi all,
> 
> We're currently trying to clean up our build processes to make sure
> that all binary output is reproducible (along the lines of
> https://reproducible-builds.org/).
> 
> A few of our build artifacts are filesystem images. We have VFAT and
> XFS images. They represent the system updates for units in the field
> and at developers' desks.
> 
> We're trying to make these images reproducible and I'm in need of some
> help. As far as I can tell, one of the biggest culprits is VFAT's
> "creation time" and XFS' ctime fields.
> 
> Example of VFAT's differences:
> $ hexdump -C ~/swu-tests/swu1/dvt-controller-kernel.vfat >
> ~/swu-tests/swu1/dvt-controller-kernel.vfat.dump
> $ hexdump -C ~/swu-tests/swu3/dvt-controller-kernel.vfat >
> ~/swu-tests/swu3/dvt-controller-kernel.vfat.dump
> $ diff -u ~/swu-tests/swu1/dvt-controller-kernel.vfat.dump
> ~/swu-tests/swu3/dvt-controller-kernel.vfat.dump
> --- /x1/home/philipp/swu-tests/swu1/dvt-controller-kernel.vfat.dump
>  2017-12-28 12:19:57.349880993 -0800
> +++ /x1/home/philipp/swu-tests/swu3/dvt-controller-kernel.vfat.dump
>  2017-12-28 12:19:50.253881196 -0800
> ...
> @@ -1011,13 +1011,13 @@
>  *
>  00006200  41 7a 00 49 00 6d 00 61  00 67 00 0f 00 7c 65 00  |Az.I.m.a.g...|e.|
>  00006210  00 00 ff ff ff ff ff ff  ff ff 00 00 ff ff ff ff  |................|
> -00006220  5a 49 4d 41 47 45 20 20  20 20 20 20 00 00 63 9b  |ZIMAGE      ..c.|
> +00006220  5a 49 4d 41 47 45 20 20  20 20 20 20 00 64 d8 95  |ZIMAGE      .d..|
>  00006230  9c 4b 9c 4b 00 00 00 00  21 00 03 00 a0 2d 4a 00  |.K.K....!....-J.|
>  00006240  42 6f 00 6c 00 6c 00 65  00 72 00 0f 00 67 2d 00  |Bo.l.l.e.r...g-.|
>  00006250  64 00 76 00 74 00 2e 00  64 00 00 00 74 00 62 00  |d.v.t...d...t.b.|
> ...
> 
> As per the structure, that's the ctime (creation time) being different.
> https://www.kernel.org/doc/Documentation/filesystems/vfat.txt

Yep.

> I've not had much luck digging into the XFS spec to prove that the
> ctime is different, but I'm pretty certain. When I mount the images, I
> can see that ctime is different:
> $ stat -c %x,%y,%z,%n /mnt/{a,b}/log/syslog
> 2017-12-28 11:26:53.552000096 -0800,1969-12-31 16:00:00.000000000
> -0800,2017-12-28 11:28:50.524000060 -0800,/mnt/a/log/syslog
> 2017-12-28 10:46:38.739999913 -0800,1969-12-31 16:00:00.000000000
> -0800,2017-12-28 10:48:17.180000049 -0800,/mnt/b/log/syslog
> 
> As far as I can tell, there are no mount options to null out the ctime
> fields. (As an aside I'm curious as to the reason for this).

Correct, there's (afaict) no userspace interface to change ctime, since
it reflects the last time the inode metadata was updated by the kernel.
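
(Quick illustration on any local filesystem - the file name is just an
example: you can set atime/mtime from userspace, but ctime only gets
bumped to "now" as a side effect:
$ touch -d '2018-01-01 00:00:00' testfile
$ stat -c '%x %y %z' testfile
The first two timestamps come back as 2018-01-01, the third is the
current time.)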

> Is there a tool that lets me null out ctime fields on an XFS filesystem
> image

None that I know of.

> Or maybe is there a library that lets me traverse the file
> system and set the fields to zero manually?

Not really, other than messing up the image with the debugger.

> Does what I'm asking make sense? I feel like I'm not the first person
> to tackle this, but I haven't been lucky with finding anything to
> address this.

I'm not sure I understand the use case for exactly reproducible filesystem
images (as opposed to the stuff inside said fs), can you tell us more?

--D

> Thanks,
> Phil


* Re: Reproducible XFS filesystem artifacts
  2018-01-16  7:55 ` Darrick J. Wong
@ 2018-01-17  0:52   ` Philipp Schrader
  2018-01-17  4:05     ` Amir Goldstein
  0 siblings, 1 reply; 9+ messages in thread
From: Philipp Schrader @ 2018-01-17  0:52 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, Austin Schuh, Alison Chaiken

> > I've not had much luck digging into the XFS spec to prove that the
> > ctime is different, but I'm pretty certain. When I mount the images, I
> > can see that ctime is different:
> > $ stat -c %x,%y,%z,%n /mnt/{a,b}/log/syslog
> > 2017-12-28 11:26:53.552000096 -0800,1969-12-31 16:00:00.000000000
> > -0800,2017-12-28 11:28:50.524000060 -0800,/mnt/a/log/syslog
> > 2017-12-28 10:46:38.739999913 -0800,1969-12-31 16:00:00.000000000
> > -0800,2017-12-28 10:48:17.180000049 -0800,/mnt/b/log/syslog
> >
> > As far as I can tell, there are no mount options to null out the ctime
> > fields. (As an aside I'm curious as to the reason for this).
>
> Correct, there's (afaict) no userspace interface to change ctime, since
> it reflects the last time the inode metadata was updated by the kernel.
>
> > Is there a tool that lets me null out ctime fields on an XFS filesystem
> > image
>
> None that I know of.
>
> > Or maybe is there a library that lets me traverse the file
> > system and set the fields to zero manually?
>
> Not really, other than messing up the image with the debugger.

Which debugger are you talking about? Do you mean xfs_db? I was really
hoping to avoid that :)

>
> > Does what I'm asking make sense? I feel like I'm not the first person
> > to tackle this, but I haven't been lucky with finding anything to
> > address this.
>
> I'm not sure I understand the use case for exactly reproducible filesystem
> images (as opposed to the stuff inside said fs), can you tell us more?

For some background, these images serve as read-only root file system
images on vehicles. During the initial install or during a system
update, new images get written to the disks. This uses a process
equivalent to using dd(1).

We have two primary goals with reproducible filesystem images:

1. Caching for distributed builds.
We're in the process of moving to a distributed build system. That
includes a caching server. The build artifacts are cached so that they
can be quickly retrieved when someone else builds the same thing. To
make the caching actually work we need the artifacts to be
reproducible. In other words, each unique combination of source files
should produce a unique (but repeatable) result.
Until we can build these images in the distributed build, each
developer is forced to build them on their own machine. It'd be nice to
move this into the caching infrastructure.

2. Confidence
We use the filesystem images as updates for our installs (and for
initial installs). We need the confidence that when a report from the
field comes in we can reliably re-create everything locally. If we
cannot reproduce the filesystem images, then it quickly becomes a lot
more difficult to validate that we're re-creating the same
environment locally. Having everything reproducible makes testing in
an automotive safety context a lot simpler.

I could dive into a lot more detail here, but I hope that was a
reasonable high-level summary. The "Why does it matter?" section on
https://reproducible-builds.org/ provides some good links for more
reading.

Phil

>
> --D


* Re: Reproducible XFS filesystem artifacts
  2018-01-17  0:52   ` Philipp Schrader
@ 2018-01-17  4:05     ` Amir Goldstein
  2018-01-17  6:15       ` Dave Chinner
  2018-01-22 19:45       ` Philipp Schrader
  0 siblings, 2 replies; 9+ messages in thread
From: Amir Goldstein @ 2018-01-17  4:05 UTC (permalink / raw)
  To: Philipp Schrader
  Cc: Darrick J. Wong, linux-xfs, Austin Schuh, Alison Chaiken, Theodore Tso

On Wed, Jan 17, 2018 at 2:52 AM, Philipp Schrader
<philipp@peloton-tech.com> wrote:
>> > I've not had much luck digging into the XFS spec to prove that the
>> > ctime is different, but I'm pretty certain. When I mount the images, I
>> > can see that ctime is different:
>> > $ stat -c %x,%y,%z,%n /mnt/{a,b}/log/syslog
>> > 2017-12-28 11:26:53.552000096 -0800,1969-12-31 16:00:00.000000000
>> > -0800,2017-12-28 11:28:50.524000060 -0800,/mnt/a/log/syslog
>> > 2017-12-28 10:46:38.739999913 -0800,1969-12-31 16:00:00.000000000
>> > -0800,2017-12-28 10:48:17.180000049 -0800,/mnt/b/log/syslog
>> >
>> > As far as I can tell, there are no mount options to null out the ctime
>> > fields. (As an aside I'm curious as to the reason for this).
>>
>> Correct, there's (afaict) no userspace interface to change ctime, since
>> it reflects the last time the inode metadata was updated by the kernel.
>>
>> > Is there a tool that lets me null out ctime fields on an XFS filesystem
>> > image
>>
>> None that I know of.
>>
>> > Or maybe is there a library that lets me traverse the file
>> > system and set the fields to zero manually?
>>
>> Not really, other than messing up the image with the debugger.
>
> Which debugger are you talking about? Do you mean xfs_db? I was really
> hoping to avoid that :)
>
>>
>> > Does what I'm asking make sense? I feel like I'm not the first person
>> > to tackle this, but I haven't been lucky with finding anything to
>> > address this.
>>
>> I'm not sure I understand the use case for exactly reproducible filesystem
>> images (as opposed to the stuff inside said fs), can you tell us more?
>
> For some background, these images serve as read-only root file system
> images on vehicles. During the initial install or during a system
> update, new images get written to the disks. This uses a process
> equivalent to using dd(1).
>

So I'm curious. Why xfs and fat and not, say, squashfs?
https://reproducible-builds.org/events/athens2015/system-images/

A quick glance at mksquashfs --help suggests it's a much better
tool for the job (e.g. -fstime secs), not to mention that squashfs
is optimized for the read-only root file system distribution use case.
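
Something along these lines should give you a stable image (an
untested sketch - -noappend, -all-root and -fstime are documented
mksquashfs options, and the paths are placeholders):
$ mksquashfs rootfs-staging/ rootfs.squashfs -noappend -all-root -fstime 0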

Another example of a read-only root file system distributed over the
air to a few billion devices is ext4 on Android.
I'm not sure if the Android build system cares about reproducible
system images, but I know it used to create the "system" image with
a home-brewed tool called make_ext4fs, which creates a well
"packed" fs. This means the fs takes the minimal space it can for
a given set of files. mkfs.ext4 was not designed for the use
case of creating a file system with 0% free space.

I remember Ted saying that he is happy Google moved away
from make_ext4fs (although it probably still lives on in vendor
builds), but I wonder: what is the Android build system's
replacement for creating a "packed" ext4 image?

I added Ted to CC for his inputs, but I suggest that you add
linux-fsdevel to CC for a larger diversity of inputs.

That is not to suggest that you should not use xfs. You probably
have your reasons for it, but whatever was already done by
others for other fs (e.g. e2image -Qa) may be the way to go
for xfs. xfs_copy would be the first tool I would look into extending
for your use case.

Cheers,
Amir.


* Re: Reproducible XFS filesystem artifacts
  2018-01-17  4:05     ` Amir Goldstein
@ 2018-01-17  6:15       ` Dave Chinner
  2018-01-17  6:34         ` Dave Chinner
  2018-01-22 19:45       ` Philipp Schrader
  1 sibling, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2018-01-17  6:15 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Philipp Schrader, Darrick J. Wong, linux-xfs, Austin Schuh,
	Alison Chaiken, Theodore Tso

On Wed, Jan 17, 2018 at 06:05:17AM +0200, Amir Goldstein wrote:
> On Wed, Jan 17, 2018 at 2:52 AM, Philipp Schrader
> <philipp@peloton-tech.com> wrote:
> >> > I've not had much luck digging into the XFS spec to prove that the
> >> > ctime is different, but I'm pretty certain. When I mount the images, I
> >> > can see that ctime is different:
> >> > $ stat -c %x,%y,%z,%n /mnt/{a,b}/log/syslog
> >> > 2017-12-28 11:26:53.552000096 -0800,1969-12-31 16:00:00.000000000
> >> > -0800,2017-12-28 11:28:50.524000060 -0800,/mnt/a/log/syslog
> >> > 2017-12-28 10:46:38.739999913 -0800,1969-12-31 16:00:00.000000000
> >> > -0800,2017-12-28 10:48:17.180000049 -0800,/mnt/b/log/syslog
> >> >
> >> > As far as I can tell, there are no mount options to null out the ctime
> >> > fields. (As an aside I'm curious as to the reason for this).
> >>
> >> Correct, there's (afaict) no userspace interface to change ctime, since
> >> it reflects the last time the inode metadata was updated by the kernel.
> >>
> >> > Is there a tool that lets me null out ctime fields on an XFS filesystem
> >> > image
> >>
> >> None that I know of.
> >>
> >> > Or maybe is there a library that lets me traverse the file
> >> > system and set the fields to zero manually?
> >>
> >> Not really, other than messing up the image with the debugger.
> >
> > Which debugger are you talking about? Do you mean xfs_db? I was really
> > hoping to avoid that :)

Yup, xfs_db is the only way you can write custom timestamps in XFS
inodes in an OOB manner. But it's not scalable in any way :/
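
For a single inode that looks roughly like this (an untested sketch
from memory - the inode number and image name are made up, and the
v3.crtime fields only exist on v5 filesystems):
$ xfs_db -x -c 'inode 133' \
        -c 'write core.ctime.sec 0' -c 'write core.ctime.nsec 0' \
        -c 'write v3.crtime.sec 0' -c 'write v3.crtime.nsec 0' \
        image.xfs
You'd have to script that over every inode in the filesystem, which
is where the "not scalable" part comes in.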

> >> > Does what I'm asking make sense? I feel like I'm not the first person
> >> > to tackle this, but I haven't been lucky with finding anything to
> >> > address this.
> >>
> >> I'm not sure I understand the use case for exactly reproducible filesystem
> >> images (as opposed to the stuff inside said fs), can you tell us more?
> >
> > For some background, these images serve as read-only root file system
> > images on vehicles. During the initial install or during a system
> > update, new images get written to the disks. This uses a process
> > equivalent to using dd(1).

[....]


> That is not to suggest that you should not use xfs. You probably
> have your reasons for it, but whatever was already done by
> others for other fs (e.g. e2image -Qa) may be the way to go
> for xfs. xfs_copy would be the first tool I would look into extending
> for your use case.

Let's make sure we're all on the same page here.

xfs_copy was written for efficient installation of XFS filesystem
images. It doesn't store or copy unused space in its packed
filesystem image....

What xfs_copy cannot do is modify filesystem metadata. IOWs, it can't
solve the timestamp problem the reproducible build process needs
fixed - it can only be used to optimise deployment of images, and
that's not the problem that is being discussed here.
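
(Its command line reflects that - it's just source and target(s), e.g.
with a made-up device name:
$ xfs_copy /dev/sdb1 deploy.img
There's no knob in there for rewriting inode metadata on the way
through.)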

Philipp, if you need bulk modification of all inodes in the
filesystem, then the only tool we have that has the capability of
doing this in an automated fashion is xfs_repair. It wouldn't take
much modification to set all inode timestamps to a fixed timestamp
(including the hidden ones that users can't see like crtime) and
zero out other variable things like change counters.

*However*

Even with timestamp normalisation, there's still no absolute
guarantee that two filesystems produced by different builds will be
identical. The kernel can decide to write back two files in a
different order (e.g. due to differences in memory pressure on the
machine during the build), and that means they'll be allocated
differently on disk. Or there could be races with background
filesystem operations, resulting in an AG being locked when an
allocation is attempted and so the data extent is allocated in the
next available AG rather than the one local to the inode. And so on.
Even minor kernel version differences can result in the filesystem
images having different layouts.
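
You can see that layout variance directly by comparing the extent
maps of the same file in two images built from the same inputs, e.g.
with the mount points from earlier in this thread:
$ xfs_bmap -v /mnt/a/log/syslog
$ xfs_bmap -v /mnt/b/log/syslog
The block numbers and AGs can differ even though the file contents
are identical.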

This is not an XFS-specific issue, either. All kernel filesystems are
susceptible to physical layout variance for one reason or another.
The fundamental problem is that the build system /cannot control the
filesystem layout/, so even if the contents and user-visible
metadata are the same, the filesystem images will still not be 100%
identical on every build.

IOWs, you're chasing a goal (100% reproducible filesystem images)
that simply cannot be achieved via writing files through a
kernel-based filesystem....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Reproducible XFS filesystem artifacts
  2018-01-17  6:15       ` Dave Chinner
@ 2018-01-17  6:34         ` Dave Chinner
  2018-01-22 19:45           ` Philipp Schrader
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2018-01-17  6:34 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Philipp Schrader, Darrick J. Wong, linux-xfs, Austin Schuh,
	Alison Chaiken, Theodore Tso

On Wed, Jan 17, 2018 at 05:15:33PM +1100, Dave Chinner wrote:
> IOWs, you're chasing a goal (100% reproducible filesystem images)
> that simply cannot be achieved via writing files through a
> kernel-based filesystem....

That said, we do have a mechanism for populating XFS filesystems
from userspace in a manner that we may be able to make deterministic
enough for reproducible image file creation: the mkfs.xfs protofile
infrastructure. That runs from mkfs in userspace, and creates the
directory structure and files specified in the protofile. There's
nothing that runs concurrently with this; it will always run the
creation operations in the same order, and I think we could
extend it to specify a global timestamp for all inodes and solve
that problem too.

The protofile infrastructure uses the kernel allocation code which
we already know is deterministic (i.e. gives the same allocation
results for the same operations if the initial state is the same)
and so we can probably get very close to a 100% reproducible
filesystem image through this mechanism.
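
For reference, the protofile is passed as "mkfs.xfs -p protofile ..."
and - going from memory of the mkfs.xfs(8) description, so treat this
as a sketch with made-up names and staging paths - looks roughly like:
dummy
0 0
d--755 0 0
etc      d--755 0 0
hostname ---644 0 0 /staging/etc/hostname
$
bin      d--755 0 0
sh       ---755 0 0 /staging/bin/sh
$
$
The first two lines are historical placeholders, each directory entry
is followed by its contents and terminated by a '$', and regular
files name the host path to copy the data from.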

It'd need some work and extensions to provide everything that is
needed in a reliable manner, and a bunch of regression tests
added to fstests to make sure it works and keeps working. If you
want to stick with XFS as the base filesystem for your images, this
may be the best way to proceed....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Reproducible XFS filesystem artifacts
  2018-01-17  4:05     ` Amir Goldstein
  2018-01-17  6:15       ` Dave Chinner
@ 2018-01-22 19:45       ` Philipp Schrader
  2018-01-22 20:28         ` Austin Schuh
  1 sibling, 1 reply; 9+ messages in thread
From: Philipp Schrader @ 2018-01-22 19:45 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Darrick J. Wong, linux-xfs, Austin Schuh, Alison Chaiken, Theodore Tso

On Tue, Jan 16, 2018 at 8:05 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, Jan 17, 2018 at 2:52 AM, Philipp Schrader
> <philipp@peloton-tech.com> wrote:
>>> > I've not had much luck digging into the XFS spec to prove that the
>>> > ctime is different, but I'm pretty certain. When I mount the images, I
>>> > can see that ctime is different:
>>> > $ stat -c %x,%y,%z,%n /mnt/{a,b}/log/syslog
>>> > 2017-12-28 11:26:53.552000096 -0800,1969-12-31 16:00:00.000000000
>>> > -0800,2017-12-28 11:28:50.524000060 -0800,/mnt/a/log/syslog
>>> > 2017-12-28 10:46:38.739999913 -0800,1969-12-31 16:00:00.000000000
>>> > -0800,2017-12-28 10:48:17.180000049 -0800,/mnt/b/log/syslog
>>> >
>>> > As far as I can tell, there are no mount options to null out the ctime
>>> > fields. (As an aside I'm curious as to the reason for this).
>>>
>>> Correct, there's (afaict) no userspace interface to change ctime, since
>>> it reflects the last time the inode metadata was updated by the kernel.
>>>
>>> > Is there a tool that lets me null out ctime fields on an XFS filesystem
>>> > image
>>>
>>> None that I know of.
>>>
>>> > Or maybe is there a library that lets me traverse the file
>>> > system and set the fields to zero manually?
>>>
>>> Not really, other than messing up the image with the debugger.
>>
>> Which debugger are you talking about? Do you mean xfs_db? I was really
>> hoping to avoid that :)
>>
>>>
>>> > Does what I'm asking make sense? I feel like I'm not the first person
>>> > to tackle this, but I haven't been lucky with finding anything to
>>> > address this.
>>>
>>> I'm not sure I understand the use case for exactly reproducible filesystem
>>> images (as opposed to the stuff inside said fs), can you tell us more?
>>
>> For some background, these images serve as read-only root file system
>> images on vehicles. During the initial install or during a system
>> update, new images get written to the disks. This uses a process
>> equivalent to using dd(1).
>>
>
> So I'm curious. Why xfs and fat and not, say, squashfs?
> https://reproducible-builds.org/events/athens2015/system-images/

It's a good question. It's largely because of historical reasons
internally. We started with XFS on the first iteration of our product.
We fixed a few minor bugs and were overall really happy with
performance etc. Later down the line came the question of system
upgrades without breaking what we currently had. Anyway, so XFS is
where we're at today.

That being said, for the future something like squashfs is definitely
a better choice. Thanks for the suggestion. I'll do more research on
that.

> A quick glance at mksquashfs --help suggests it's a much better
> tool for the job (e.g. -fstime secs), not to mention that squashfs
> is optimized for the read-only root file system distribution use case.
>
> Another example of a read-only root file system distributed over the
> air to a few billion devices is ext4 on Android.
> I'm not sure if the Android build system cares about reproducible
> system images, but I know it used to create the "system" image with
> a home-brewed tool called make_ext4fs, which creates a well
> "packed" fs. This means the fs takes the minimal space it can for
> a given set of files. mkfs.ext4 was not designed for the use
> case of creating a file system with 0% free space.

That's fascinating. I hadn't heard of that tool, but it looks
straightforward. I was imagining something like that might exist for XFS,
but it's starting to sound like it doesn't. I'm starting to think
that I've been approaching this problem from the wrong direction :)

> I remember Ted saying that he is happy Google moved away
> from make_ext4fs (although it probably still lives on in vendor
> builds), but I wonder: what is the Android build system's
> replacement for creating a "packed" ext4 image?
>
> I added Ted to CC for his inputs, but I suggest that you add
> linux-fsdevel to CC for a larger diversity of inputs.
>
> That is not to suggest that you should not use xfs. You probably
> have your reasons for it, but whatever was already done by
> others for other fs (e.g. e2image -Qa) may be the way to go
> for xfs. xfs_copy would be the first tool I would look into extending
> for your use case.

That sounds reasonable. Thank you for the suggestion. I'll take
another look at xfs_copy.

>
> Cheers,
> Amir.


* Re: Reproducible XFS filesystem artifacts
  2018-01-17  6:34         ` Dave Chinner
@ 2018-01-22 19:45           ` Philipp Schrader
  0 siblings, 0 replies; 9+ messages in thread
From: Philipp Schrader @ 2018-01-22 19:45 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Amir Goldstein, Darrick J. Wong, linux-xfs, Austin Schuh,
	Alison Chaiken, Theodore Tso

On Tue, Jan 16, 2018 at 10:34 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Wed, Jan 17, 2018 at 05:15:33PM +1100, Dave Chinner wrote:
>> IOWs, you're chasing a goal (100% reproducible filesystem images)
>> that simply cannot be achieved via writing files through a
>> kernel-based filesystem....
>
> That said, we do have a mechanism for populating XFS filesystems
> from userspace in a manner that we may be able to make deterministic
> enough for reproducible image file creation: the mkfs.xfs protofile
> infrastructure. That runs from mkfs in userspace, and creates the
> directory structure and files specified in the protofile. There's
> nothing that runs concurrently with this; it will always run the
> creation operations in the same order, and I think we could
> extend it to specify a global timestamp for all inodes and solve
> that problem too.
>
> The protofile infrastructure uses the kernel allocation code which
> we already know is deterministic (i.e. gives the same allocation
> results for the same operations if the initial state is the same)
> and so we can probably get very close to a 100% reproducible
> filesystem image through this mechanism.
>
> It'd need some work and extensions to provide everything that is
> needed in a reliable manner, and a bunch of regression tests
> added to fstests to make sure it works and keeps working. If you
> want to stick with XFS as the base filesystem for your images, this
> may be the best way to proceed....

Wow, thank you very much for all that information! I'm starting to see
that I've been approaching this problem from the wrong angle.

It sounds like in the long run I should look at other filesystems more
suitable/amenable to this kind of thing.

The protofile infrastructure, however, sounds really interesting. I've
not encountered it before, but the documentation for it makes me agree
with your assessment.  I'll spend more time looking at all the
information everyone provided on this thread. It sounds like there are
a few decisions ahead of us :)

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com


* Re: Reproducible XFS filesystem artifacts
  2018-01-22 19:45       ` Philipp Schrader
@ 2018-01-22 20:28         ` Austin Schuh
  0 siblings, 0 replies; 9+ messages in thread
From: Austin Schuh @ 2018-01-22 20:28 UTC (permalink / raw)
  To: Philipp Schrader
  Cc: Amir Goldstein, Darrick J. Wong, linux-xfs, Alison Chaiken, Theodore Tso

On Mon, Jan 22, 2018 at 11:45 AM, Philipp Schrader
<philipp@peloton-tech.com> wrote:
> On Tue, Jan 16, 2018 at 8:05 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> So I'm curious. Why xfs and fat and not, say, squashfs?
>> https://reproducible-builds.org/events/athens2015/system-images/
>
> It's a good question. It's largely because of historical reasons
> internally. We started with XFS on the first iteration of our product.
> We fixed a few minor bugs and were overall really happy with
> performance etc. Later down the line came the question of system
> upgrades without breaking what we currently had. Anyway, so XFS is
> where we're at today.
>
> That being said, for the future something like squashfs is definitely
> a better choice. Thanks for the suggestion. I'll do more research on
> that.

Adding to the historical reasons list, we've got a lot of test time
with XFS and CONFIG_PREEMPT_RT, and it's working well for us.  That's
always very hard to quantify when making a decision whether or not to
switch filesystems.

Austin


end of thread

Thread overview: 9+ messages
2018-01-16  4:49 Reproducible XFS filesystem artifacts Philipp Schrader
2018-01-16  7:55 ` Darrick J. Wong
2018-01-17  0:52   ` Philipp Schrader
2018-01-17  4:05     ` Amir Goldstein
2018-01-17  6:15       ` Dave Chinner
2018-01-17  6:34         ` Dave Chinner
2018-01-22 19:45           ` Philipp Schrader
2018-01-22 19:45       ` Philipp Schrader
2018-01-22 20:28         ` Austin Schuh
