From: Du Rui <durui@linux.alibaba.com>
To: snitzer@kernel.org
Cc: agk@redhat.com, alexl@redhat.com, dm-devel@redhat.com,
	durui@linux.alibaba.com, gscrivan@redhat.com,
	linux-kernel@vger.kernel.org
Subject: Re: Re: dm overlaybd: targets mapping OverlayBD image
Date: Fri, 26 May 2023 18:25:32 +0800
Message-ID: <20230526102532.29276-1-durui@linux.alibaba.com>
In-Reply-To: <ZGz32yw7ecKhW+lj@redhat.com>

Hi Mike:

> I appreciate that this work is being done with an eye toward
> containerd "community" and standardization 

> it appears that this format of OCI image storage/use is only
> used by Alibaba? 

> But you'd do well to explain why the userspace solution isn't
> acceptable.

Yes, overlaybd has its origins in the container community, but this work
(the kernel modules) does *NOT* actually target containers. On-demand lazy
loading of container images involves complex interactions with the image
registry over HTTP(S), and possibly with other transport services (HTTP
proxy, SOCKS5 proxy, P2P, cache, etc.). This is better implemented in user
space and exported to the kernel as a virtual block device via TCMU or
ublk. The user-space implementation of overlaybd has a very large install
base at Alibaba, as well as at some other big companies, including another
major cloud provider (we would rather not disclose their names until we
have their permission). We are also pleased with the flexibility of user
space, which allows easy integration into various systems and
environments.

We implemented these kernel modules and are trying to contribute them
upstream because we believe they are useful for the device mapper and LVM
ecosystem:

(1) dm-overlaybd essentially implements generic, redistributable snapshots
    of a block device. This may enable LVM to push/pull individual
    snapshots to/from a globally distributed volume repository.

(2) dm-overlaybd is highly efficient. Its index lookup performance does not
    degrade as the number of snapshots grows. In contrast, qcow2 (dm-qcow2)
    does not support efficient external snapshots: it has O(n) overhead in
    this case, where n is the number of (backing-file) snapshots.

(3) dm-zfile is an efficient, generic compressed block device. This allows
    LVM to support compressed snapshots, saving disk space without
    compromising much performance, and possibly even improving performance
    in some cases.
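To make the complexity argument in (2) concrete, here is a minimal sketch
under simplifying assumptions. It is not the on-disk format or code of
dm-overlaybd or dm-qcow2: each layer is modeled as a plain dict mapping a
logical block to an offset in that layer, and the names Extent,
chain_lookup, merge_index and merged_lookup are hypothetical. A
backing-file chain must probe layers newest-first until it finds the
block (worst case O(n) in the number of layers), whereas a merged extent
index built once ahead of time resolves any block with a binary search
whose cost depends on the extent count, not on the number of layers.

```python
from bisect import bisect_right
from typing import NamedTuple


class Extent(NamedTuple):
    """A run of logical blocks stored contiguously in one layer (hypothetical)."""
    start: int   # first logical block address covered
    length: int  # number of blocks in the run
    layer: int   # index of the layer holding the data
    offset: int  # block offset of the data inside that layer


def chain_lookup(layers, lba):
    """Backing-chain style: probe each layer's own table, newest first.

    Worst case touches every layer, i.e. O(n) in the number of layers.
    Returns (layer, offset, probes), or (None, None, probes) for a hole.
    """
    probes = 0
    for layer in range(len(layers) - 1, -1, -1):
        probes += 1
        if lba in layers[layer]:
            return layer, layers[layer][lba], probes
    return None, None, probes


def merge_index(layers):
    """Flatten all layers into one sorted extent list, built once up front.

    Upper layers are applied last, so they override lower ones.
    """
    flat = {}
    for layer, table in enumerate(layers):
        for lba, off in table.items():
            flat[lba] = (layer, off)
    extents = []
    for lba in sorted(flat):
        layer, off = flat[lba]
        if extents:
            last = extents[-1]
            # Coalesce blocks that are contiguous both logically and in the same layer.
            if (last.start + last.length == lba and last.layer == layer
                    and last.offset + last.length == off):
                extents[-1] = last._replace(length=last.length + 1)
                continue
        extents.append(Extent(lba, 1, layer, off))
    return extents


def merged_lookup(extents, starts, lba):
    """Binary search over extent start addresses: O(log m), independent of n."""
    i = bisect_right(starts, lba) - 1
    if i >= 0:
        e = extents[i]
        if lba < e.start + e.length:
            return e.layer, e.offset + (lba - e.start)
    return None, None


# Hypothetical example: a base image plus two snapshot layers.
layers = [
    {0: 0, 1: 1, 2: 2, 3: 3},  # base layer
    {2: 0},                    # snapshot 1 rewrote block 2
    {5: 0},                    # snapshot 2 added block 5
]
extents = merge_index(layers)
starts = [e.start for e in extents]
print(chain_lookup(layers, 0))            # (0, 0, 3): probed all three layers
print(merged_lookup(extents, starts, 0))  # (0, 0): one binary search
```

Adding more snapshot layers increases the worst-case probe count of
chain_lookup linearly, while merged_lookup stays a single search over the
flattened index; the trade-off is that the index must be rebuilt (or
incrementally merged) when a new layer is added.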


> I also have doubts that this solution is _actually_ more performant
> than a proper filesystem based solution

This proposal is not focused on performance; it is focused on bringing the
new features described above to dm and LVM. Still, I would encourage you to
run benchmarks and look at the results. After all, ext4, xfs and other
mature file systems are highly optimized as well.

> solution that allows page cache sharing

Page cache sharing can be achieved with DAX support in the dm targets
(and in the inner file system), together with a virtual pmem device backend.

> There is an active discussion about, and active development effort
> for, using overlayfs + erofs for container images.  I'm reluctant to
> merge this DM based container image approach without wider consensus
> from other container stakeholders.

This proposal is intended to help the dm and LVM ecosystem, and is not
related to those file systems. It supports all kinds of file systems with
full capabilities. It is of little use for containers, where the user-space
implementation is more practical. And nothing prevents the container
stakeholders from continuing to discuss and develop overlayfs, erofs,
composefs, etc.
