From: Du Rui <durui@linux.alibaba.com>
To: Mike Snitzer <snitzer@kernel.org>
Cc: dm-devel@redhat.com, linux-kernel@vger.kernel.org,
	Alasdair Kergon <agk@redhat.com>,
	Alexander Larsson <alexl@redhat.com>,
	Giuseppe Scrivano <gscrivan@redhat.com>
Subject: Re: dm overlaybd: targets mapping OverlayBD image
Date: Wed, 24 May 2023 14:59:39 +0800
Message-ID: <ff5a72a3-3977-68f1-c5c6-41c90b7229b2@linux.alibaba.com>
In-Reply-To: <ZGz32yw7ecKhW+lj@redhat.com>

Hi Mike,

On 5/24/23 1:28 AM, Mike Snitzer wrote:
> On Fri, May 19 2023 at  6:27P -0400,
> Du Rui <durui@linux.alibaba.com> wrote:
> 
>> OverlayBD is a novel layering block-level image format, which is design
>> for container, secure container and applicable to virtual machine,
>> published in USENIX ATC '20
>> https://www.usenix.org/system/files/atc20-li-huiba.pdf
>>
>> OverlayBD already has a ContainerD non-core sub-project implementation
>> in userspace, as an accelerated container image service
>> https://github.com/containerd/accelerated-container-image
>>
>> It could be much more efficient when do decompressing and mapping works
>> in the kernel with the framework of device-mapper, in many circumstances,
>> such as secure container runtime, mobile-devices, etc.
>>
>> This patch contains a module, dm-overlaybd, provides two kinds of targets
>> dm-zfile and dm-lsmt, to expose a group of block-devices contains
>> OverlayBD image as a overlaid read-only block-device.
>>
>> Signed-off-by: Du Rui <durui@linux.alibaba.com>
> 
> <snip, original patch here: [1] >
> 
> I appreciate that this work is being done with an eye toward
> containerd "community" and standardization but based on my limited
> research it appears that this format of OCI image storage/use is only
> used by Alibaba? (but I could be wrong...)
> 
> But you'd do well to explain why the userspace solution isn't
> acceptable. Are there security issues that moving the implementation
> to kernel addresses?
> 
> I also have doubts that this solution is _actually_ more performant
> than a proper filesystem based solution that allows page cache sharing
> of container image data across multiple containers.
> 
> There is an active discussion about, and active development effort
> for, using overlayfs + erofs for container images.  I'm reluctant to
> merge this DM based container image approach without wider consensus
> from other container stakeholders.
> 
> But short of reaching wider consensus on the need for these DM
> targets: there is nothing preventing you from carrying these changes
> in your alibaba kernel.
> 
> Mike
> 
> [1]: https://patchwork.kernel.org/project/dm-devel/patch/9505927dabc3b6695d62dfe1be371b12f5bdebf7.1684491648.git.durui@linux.alibaba.com/

OverlayBD is a generic solution for overlayable, randomly accessible
read-only block devices. It is part of a container image solution, but
it is not designed only for container images: our team also uses it for
VM images and other data images.

Container images in the OverlayBD format are not used only at Alibaba.
As an open-source containerd solution, it already has users in the
community, and the project also has contributors from the community.

I do like erofs, and I am also looking forward to widely used
filesystem-based container image solutions. But a filesystem-based
container image solution does not conflict with a generic block device
image.

Any filesystem that accesses its data through a block device can be
used to create an OverlayBD image, including the widely used ones. With
dm-snapshot or dm-thin providing a writable layer on top of the
read-only block device, block images can be mounted as full-featured
filesystems, 100% compatible with the same filesystems on normal block
devices.

In my tests, erofs, btrfs, squashfs, and other filesystems on OverlayBD
perform very well, and in certain circumstances even better than on raw
block devices.

As for sharing the page cache, many filesystems support DAX for PMEM
devices, which I think might be a way to work around this. Currently
the related implementation is not part of this module.

Thanks for your reply.

Du Rui
