From: Amir Goldstein <amir73il@gmail.com>
To: Alexander Larsson <alexl@redhat.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
gscrivan@redhat.com, brauner@kernel.org,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
david@fromorbit.com, viro@zeniv.linux.org.uk,
Vivek Goyal <vgoyal@redhat.com>,
Josef Bacik <josef@toxicpanda.com>,
Gao Xiang <hsiangkao@linux.alibaba.com>,
Jingbo Xu <jefflexu@linux.alibaba.com>
Subject: Re: [PATCH v3 0/6] Composefs: an opportunistically sharing verified image filesystem
Date: Mon, 6 Feb 2023 09:59:36 +0200
Message-ID: <CAOQ4uxjDz93CNHJpenBzSNqsktnKg0NwpBV4LZ+dTDhKtbi5Vg@mail.gmail.com>
In-Reply-To: <CAOQ4uximQZ_DL1atbrCg0bQ8GN8JfrEartxDSP+GB_hFvYQOhg@mail.gmail.com>
On Sun, Feb 5, 2023 at 9:06 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> > >>> Apart from that, I still fail to see (apart from unprivileged
> > >>> mounts) how the EROFS + overlayfs combination fails on automotive
> > >>> real workloads aside from "ls -lR" (readdir + stat).
> > >>>
> > >>> And eventually we still need overlayfs for most use cases to do
> > >>> writable stuff anyway, so it needs some words to describe why such
> > >>> a < 1s difference is very important to the real workload, as you
> > >>> already mentioned before.
> > >>>
> > >>> And with overlayfs lazy lookup, I think it can be close to ~100ms or
> > >>> better.
> > >>>
> > >>
> > >> If we had an overlay.fs-verity xattr, then I think there are no
> > >> individual features lacking for it to work for the automotive usecase
> > >> I'm working on. Nor for the OCI container usecase. However, the
> > >> possibility of doing something doesn't mean it is the better technical
> > >> solution.
> > >>
> > >> The container usecase is very important in real world Linux use today,
> > >> and as such it makes sense to have a technically excellent solution for
> > >> it, not just a workable solution. Obviously we all have different
> > >> viewpoints of what that is, but these are the reasons why I think a
> > >> composefs solution is better:
> > >>
> > >> * It is faster than all other approaches for the one thing it actually
> > >> needs to do (lookup and readdir performance). Other kinds of
> > >> performance (file i/o speed, etc) are up to the backing filesystem
> > >> anyway.
> > >>
> > >> Even if there are possible approaches to make overlayfs perform better
> > >> here (the "lazy lookup" idea) it will not reach the performance of
> > >> composefs, while further complicating the overlayfs codebase. (btw, did
> > >> someone ask Miklos what he thinks of that idea?)
> > >>
> > >
> > > Well, Miklos was CCed (now in TO:)
> > > I did ask him specifically about relaxing -ouserxattr,metacopy,redirect:
> > > https://lore.kernel.org/linux-unionfs/20230126082228.rweg75ztaexykejv@wittgenstein/T/#mc375df4c74c0d41aa1a2251c97509c6522487f96
> > > but no response on that yet.
> > >
> > > TBH, in the end, Miklos really is the one who is going to have the most
> > > weight on the outcome.
> > >
> > > If Miklos is interested in adding this functionality to overlayfs, you are going
> > > to have a VERY hard sell, trying to merge composefs as an independent
> > > expert filesystem. The community simply does not approve of this sort of
> > > fragmentation unless there is a very good reason to do that.
> > >
> > >> For the automotive usecase we have strict cold-boot time requirements
> > >> that make cold-cache performance very important to us. Of course, there
> > >> are no simple time requirements for the specific case of listing files
> > >> in an image, but any improvement in cold-cache performance for both the
> > >> ostree rootfs and the containers started during boot will be worth its
> > >> weight in gold trying to reach these hard KPIs.
> > >>
> > >> * It uses less memory, as we don't need the extra inodes that come
> > >> with the overlayfs mount. (See profiling data in Giuseppe's mail[1]).
> > >
> > > Understood, but we will need profiling data with the optimized ovl
> > > (or with the single blob hack) to compare the relevant alternatives.
> >
> > My little request again, could you help benchmark on your real workload
> > rather than "ls -lR" stuff? If your hard KPI is really as you
> > said, why not just benchmark the real workload now and write a detailed
> > analysis to everyone to explain it's a _must_ that we should upstream
> > a new stacked fs for this?
> >
>
> I agree that benchmarking the actual KPI (boot time) will have
> a much stronger impact and help to build a much stronger case
> for composefs if you can prove that the boot time difference really matters.
>
> In order to test boot time on fair grounds, I prepared for you a POC
> branch with overlayfs lazy lookup:
> https://github.com/amir73il/linux/commits/ovl-lazy-lowerdata
>
> It is very lightly tested, but should be sufficient for the benchmark.
> Note that:
> 1. You need to opt-in with redirect_dir=lazyfollow,metacopy=on
> 2. The lazyfollow POC only works with a read-only overlay that
> has two lower dirs (one metadata layer and one data blobs layer)
> 3. The data layer must be a local blockdev fs (i.e. not a network fs)
> 4. Only absolute path redirects are lazy (e.g. "/objects/cc/3da...")
I forgot to mention:
5. The redirect path should be a realpath within the local fs -
symlinks are not followed.
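
As a rough illustration of the opt-in and the limitations listed above, the
overlay option string could be assembled along these lines. This is only a
minimal sketch, not code from the POC branch: the function names and the
/img/... and /objects/ab/cdef paths are hypothetical, and only
redirect_dir=lazyfollow and metacopy=on are taken from the notes above. It
does not check limitation 5 (symlinks), which needs access to the data layer.

```python
import posixpath


def build_lazyfollow_options(metadata_dir, data_dir):
    """Build an overlayfs option string for a read-only, two-layer mount.

    Hypothetical helper: metadata_dir is the metadata layer (topmost lower
    dir), data_dir is the data blobs layer (bottom lower dir).
    """
    for layer in (metadata_dir, data_dir):
        if not posixpath.isabs(layer):
            raise ValueError(f"lower dir must be an absolute path: {layer!r}")
        # Keep the sketch simple: reject characters that would need escaping
        # inside the comma/colon separated option string.
        if ":" in layer or "," in layer:
            raise ValueError(f"unsupported character in path: {layer!r}")
    # No upperdir/workdir is given, so the overlay is read-only (note 2).
    return (
        f"ro,lowerdir={metadata_dir}:{data_dir},"
        "redirect_dir=lazyfollow,metacopy=on"
    )


def is_lazy_redirect(redirect):
    """Note 4: only absolute path redirects are followed lazily."""
    return redirect.startswith("/")


if __name__ == "__main__":
    print(build_lazyfollow_options("/img/meta", "/img/objects"))
    print(is_lazy_redirect("/objects/ab/cdef"))
```

The resulting string would be passed as the -o argument of
"mount -t overlay overlay -o <options> <mountpoint>" on a kernel built from
the POC branch.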
>
> These limitations could be easily lifted with a bit more work.
> If any of those limitations stand in your way for running the benchmark
> let me know and I'll see what I can do.
>
> If there is any issue with the POC branch, please let me know.
>
Thanks,
Amir.