linux-kernel.vger.kernel.org archive mirror
From: Gao Xiang <hsiangkao@linux.alibaba.com>
To: Alexander Larsson <alexl@redhat.com>,
	Amir Goldstein <amir73il@gmail.com>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	gscrivan@redhat.com, david@fromorbit.com, brauner@kernel.org,
	viro@zeniv.linux.org.uk, Vivek Goyal <vgoyal@redhat.com>,
	Miklos Szeredi <miklos@szeredi.hu>
Subject: Re: [PATCH v3 0/6] Composefs: an opportunistically sharing verified image filesystem
Date: Tue, 24 Jan 2023 22:40:39 +0800
Message-ID: <1d65be2f-6d3a-13c6-4982-66bbb0f9b530@linux.alibaba.com>
In-Reply-To: <5fb32a1297821040edd8c19ce796fc0540101653.camel@redhat.com>



On 2023/1/24 21:10, Alexander Larsson wrote:
> On Tue, 2023-01-24 at 05:24 +0200, Amir Goldstein wrote:
>> On Mon, Jan 23, 2023 at 7:56 PM Alexander Larsson <alexl@redhat.com>

...

>>
>> No it is not overlayfs, it is overlayfs+squashfs, please stick to
>> facts.
>> As Gao wrote, squashfs does not optimize directory lookup.
>> You can run a test with ext4 for POC as Gao suggested.
>> I am sure that mkfs.erofs sparse file support can be added if needed.
> 
> New measurements follow, they now include also erofs over loopback,
> although that isn't strictly fair, because that image is much larger
> due to the fact that it didn't store the files sparsely. It also
> includes a version where the topmost lower is directly on the backing
> xfs (i.e. not via loopback). I attached the scripts used to create the
> images and do the profiling in case anyone wants to reproduce.
> 
> Here are the results (on x86-64, xfs base fs):
> 
> overlayfs + loopback squashfs - uncached
> Benchmark 1: ls -lR mnt-ovl
>    Time (mean ± σ):      2.483 s ±  0.029 s    [User: 0.167 s, System: 1.656 s]
>    Range (min … max):    2.427 s …  2.530 s    10 runs
>   
> overlayfs + loopback squashfs - cached
> Benchmark 1: ls -lR mnt-ovl
>    Time (mean ± σ):     429.2 ms ±   4.6 ms    [User: 123.6 ms, System: 295.0 ms]
>    Range (min … max):   421.2 ms … 435.3 ms    10 runs
>   
> overlayfs + loopback ext4 - uncached
> Benchmark 1: ls -lR mnt-ovl
>    Time (mean ± σ):      4.332 s ±  0.060 s    [User: 0.204 s, System: 3.150 s]
>    Range (min … max):    4.261 s …  4.442 s    10 runs
>   
> overlayfs + loopback ext4 - cached
> Benchmark 1: ls -lR mnt-ovl
>    Time (mean ± σ):     528.3 ms ±   4.0 ms    [User: 143.4 ms, System: 381.2 ms]
>    Range (min … max):   521.1 ms … 536.4 ms    10 runs
>   
> overlayfs + loopback erofs - uncached
> Benchmark 1: ls -lR mnt-ovl
>    Time (mean ± σ):      3.045 s ±  0.127 s    [User: 0.198 s, System: 1.129 s]
>    Range (min … max):    2.926 s …  3.338 s    10 runs
>   
> overlayfs + loopback erofs - cached
> Benchmark 1: ls -lR mnt-ovl
>    Time (mean ± σ):     516.9 ms ±   5.7 ms    [User: 139.4 ms, System: 374.0 ms]
>    Range (min … max):   503.6 ms … 521.9 ms    10 runs
>   
> overlayfs + direct - uncached
> Benchmark 1: ls -lR mnt-ovl
>    Time (mean ± σ):      2.562 s ±  0.028 s    [User: 0.199 s, System: 1.129 s]
>    Range (min … max):    2.497 s …  2.585 s    10 runs
>   
> overlayfs + direct - cached
> Benchmark 1: ls -lR mnt-ovl
>    Time (mean ± σ):     524.5 ms ±   1.6 ms    [User: 148.7 ms, System: 372.2 ms]
>    Range (min … max):   522.8 ms … 527.8 ms    10 runs
>   
> composefs - uncached
> Benchmark 1: ls -lR mnt-fs
>    Time (mean ± σ):     681.4 ms ±  14.1 ms    [User: 154.4 ms, System: 369.9 ms]
>    Range (min … max):   652.5 ms … 703.2 ms    10 runs
>   
> composefs - cached
> Benchmark 1: ls -lR mnt-fs
>    Time (mean ± σ):     390.8 ms ±   4.7 ms    [User: 144.7 ms, System: 243.7 ms]
>    Range (min … max):   382.8 ms … 399.1 ms    10 runs
> 
> For the uncached case, composefs is still almost four times faster than
> the fastest overlay combo (squashfs), and the non-squashfs versions are
> strictly slower. For the cached case the difference is less (10%) but
> with similar order of performance.
> 
> For size comparison, here are the resulting images:
> 
> 8.6M large.composefs
> 2.5G large.erofs
> 200M large.ext4
> 2.6M large.squashfs
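
As a quick sanity check on the ratios quoted above, the comparison can be recomputed from the reported means. This is only a sketch: the numbers are copied from the quoted benchmark output, while the dictionary keys and the helper name are made up for illustration.

```python
# Mean wall-clock times in seconds, copied from the quoted benchmark output.
# The keys are informal labels, not names used by any tool.
uncached = {
    "overlayfs+squashfs": 2.483,
    "overlayfs+ext4": 4.332,
    "overlayfs+erofs": 3.045,
    "overlayfs+direct": 2.562,
    "composefs": 0.6814,
}
cached = {
    "overlayfs+squashfs": 0.4292,
    "overlayfs+ext4": 0.5283,
    "overlayfs+erofs": 0.5169,
    "overlayfs+direct": 0.5245,
    "composefs": 0.3908,
}


def ratio_vs_composefs(times):
    """Return each setup's mean time divided by the composefs mean time."""
    base = times["composefs"]
    return {name: t / base for name, t in times.items() if name != "composefs"}


if __name__ == "__main__":
    for label, times in (("uncached", uncached), ("cached", cached)):
        for name, ratio in sorted(ratio_vs_composefs(times).items()):
            print(f"{label:8s} {name:20s} {ratio:.2f}x composefs")
```

With these inputs the fastest overlay combination (squashfs) comes out at roughly 3.6 times the composefs time uncached and roughly 1.10 times cached, which matches the "almost four times" and "10%" figures in the quoted text.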

OK, I have to say I'm a bit surprised by these results. Just a wild guess:
`ls -lR` is a sequential-like access pattern, so compressed data (assuming
you use compression) benefits from it.  I cannot think of a proper cause
without looking into it more.  EROFS is impacted because its on-disk inodes
are not arranged together by the current mkfs.erofs implementation (that is
just a userspace implementation detail; if people really care about it, I
will refine the implementation), and I will also implement sparse file
support later so that the on-disk inodes won't be impacted by that either
(I'm on vacation, but I will try my best).

Honestly, from the overall results I can't tell where the main bottleneck
is:
   maybe, as you said, it is overlayfs overhead;
   or maybe it is the loopback device.

So it would be much better to also show some "ls -lR" results without
overlayfs stacked.
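
One way to take such a measurement on any mount point (with or without overlayfs on top) is a small script that does roughly the metadata work of `ls -lR`: walk the tree and lstat every entry. This is a minimal sketch, not the scripts attached earlier in the thread; the function names and the run count are assumptions chosen for illustration.

```python
import os
import statistics
import time


def lstat_walk(root):
    """Walk root recursively and lstat every entry; return the entry count."""
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            os.lstat(os.path.join(dirpath, name))
            count += 1
    return count


def time_walk(root, runs=10):
    """Time lstat_walk(root) several times; return (mean, min, max) seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        lstat_walk(root)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), min(samples), max(samples)


if __name__ == "__main__":
    import sys

    mean, fastest, slowest = time_walk(sys.argv[1])
    print(f"mean {mean:.3f}s  min {fastest:.3f}s  max {slowest:.3f}s")
```

As written this mostly measures the cached case after the first run. An uncached measurement would need the page, dentry and inode caches dropped between runs (for example by writing 3 to /proc/sys/vm/drop_caches as root), which the sketch deliberately leaves out.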

IMHO, Amir's main point has always been [1]:
"w.r.t overlayfs, I am not even sure that anything needs to be modified
  in the driver.
  overlayfs already supports "metacopy" feature which means that an upper
  layer could be composed in a way that the file content would be read
  from an arbitrary path in lower fs, e.g. objects/cc/XXX. "

I think there is nothing wrong with it (except for fs-verity). From the
results, such functionality can indeed already be achieved by overlayfs
+ some localfs with some user-space adaptation. And that was not mentioned
in the RFC or in v2.
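
To make the overlayfs-only setup concrete, here is a rough sketch of the mount invocation such a user-space adaptation might build. It assumes a read-only overlay with two lower layers: a metadata-only layer whose files carry the overlayfs metacopy and redirect xattrs, and an object store holding the file contents. The helper name and all paths are hypothetical; only the `lowerdir`, `metacopy` and `redirect_dir` mount options are taken from overlayfs itself.

```python
def overlay_mount_argv(lower_dirs, mountpoint, metacopy=True):
    """Build the argv for a lower-only (read-only) overlayfs mount.

    lower_dirs is ordered topmost first, e.g. the metadata layer followed
    by the object store. Nothing is executed here; the caller decides
    whether to run the command (mounting needs the right privileges).
    """
    opts = ["lowerdir=" + ":".join(lower_dirs)]
    if metacopy:
        # Follow metacopy/redirect xattrs found in the metadata layer so
        # that file data is read from the redirected path in a lower layer.
        opts += ["metacopy=on", "redirect_dir=follow"]
    return ["mount", "-t", "overlay", "overlay", "-o", ",".join(opts), mountpoint]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(" ".join(overlay_mount_argv(["/mnt/meta", "/srv/objects"], "/mnt/ovl")))
```

The function only assembles the command line, so the options can be inspected or logged before anything is mounted.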

So, leaving the fs-verity requirement aside, your proposal currently mainly
addresses a performance issue of an existing in-kernel approach (apart from
unprivileged mounts).  It would be much better to describe in the cover
letter the original problem and why overlayfs + (localfs or FUSE for
metadata) doesn't meet the requirements.  That would make much more sense
than the current cover letter.

Thanks,
Gao Xiang

[1] https://lore.kernel.org/r/CAOQ4uxh34udueT-+Toef6TmTtyLjFUnSJs=882DH=HxADX8pKw@mail.gmail.com/
