From: Gao Xiang <hsiangkao@linux.alibaba.com>
To: Du Rui <durui@linux.alibaba.com>
Cc: agk@redhat.com, alexl@redhat.com, dm-devel@redhat.com,
	gscrivan@redhat.com, linux-kernel@vger.kernel.org,
	snitzer@kernel.org, Gao Xiang <xiang@kernel.org>
Subject: Re: dm overlaybd: targets mapping OverlayBD image
Date: Sat, 27 May 2023 12:12:24 +0800
Message-ID: <11c1e59f-d05e-5479-fa6b-36d9a793c16e@linux.alibaba.com>
In-Reply-To: <20230527031319.92200-1-durui@linux.alibaba.com>



On 2023/5/27 11:13, Du Rui wrote:
>> Block drivers have nothing to do with filesystem page cache, and
>> currently your approach has nothing to do with pmem either. (If you
>> must mention "DAX" to propose your "page cache sharing", please write
>> down your detailed design _here_ first and explain how it could work
>> with ours, if you really want to do that.)
> 
> We have already done experiments (with virtio-pmem) to create a virtual
> PMEM device in QEMU, so that guest VMs share only one memory mapping on
> the host, using a filesystem that supports DAX. In the guest VM, the fs
> keeps no page cache; maybe "sharing pagecache" is not an accurate
> description, but sharing memory pages on the host does prevent
> duplicated page cache pages across VMs.

First, does virtio-pmem have any relationship with this in-kernel
"dm / lvm" proposal of yours?  Does your virtio-pmem work on bare
metal, cloud servers or runC (I mean, without some host-side
adaptation)?

Secondly, does your virtio-pmem have any relationship with this
kernel approach? If not, why not directly use your userspace work
for your specific use case? How does this kernel DM approach help
your "sharing pagecache" at all?

Do you know how kernel FSDAX works and what type of memory pmem
is?  Could you give me your detailed kernel design for doing an
in-kernel DM + pmem DAX mapping?
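
For reference, the userspace-visible effect of FSDAX is roughly the
following (a minimal sketch with a hypothetical path; it assumes an
ext4/XFS filesystem mounted with "-o dax" on a pmem device, e.g. the
virtio-pmem device mentioned above):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	/* Hypothetical path on a filesystem mounted with "-o dax". */
	int fd = open("/mnt/pmem/layer.img", O_RDONLY);

	if (fd < 0) {
		perror("open");
		return EXIT_FAILURE;
	}

	/*
	 * With FSDAX, this mapping is served directly from the pmem
	 * device pages (with virtio-pmem, ultimately host memory), so
	 * the guest kernel keeps no page-cache copy of the file data.
	 */
	unsigned char *p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);

	if (p == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return EXIT_FAILURE;
	}
	printf("first byte: 0x%02x\n", p[0]);

	munmap(p, 4096);
	close(fd);
	return EXIT_SUCCESS;
}

Whether such a guest-side mapping has anything to do with the proposed
in-kernel DM target is exactly the question being asked above.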

> 
> Please make sure that you have already understood that dm-overlaybd is
> for GENERIC purposes. It is NOT a special design for containers, and
> has nothing to do with filesystem implementations.

The previous dm-qcow2 proposal was more generic: the qcow2 on-disk
format is friendly to read-write use, and its two-level L1/L2 indexes
take a smaller persistent runtime memory footprint than your on-disk
format, which has to load and parse the hardly-seekable on-disk
LSMT+zfile layer indexes into some new in-memory representation for
random access before any real I/O can happen, and these in-memory
indexes _cannot_ be _partially reclaimed_ from memory. qcow2 also has
a much wider ecosystem than your approach, but could you see the
community tendency on this?

ublk-qcow2: ublk-qcow2 is available:
https://lore.kernel.org/r/Yza1u1KfKa7ycQm0@T590
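
To make the index point above concrete, here is a minimal sketch
(purely hypothetical code, neither qcow2 nor your LSMT/zfile
implementation) of why a seekable two-level L1/L2 layout only needs a
tiny pinned table in memory: each L2 table is read from disk on first
use and can be dropped again under memory pressure, whereas a
hardly-seekable format has to be parsed into one big in-memory
representation up front, which cannot be partially dropped and re-read
later.

#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define CLUSTER_BITS	16		/* 64 KiB clusters (qcow2 default) */
#define L2_ENTRIES	(1u << 13)	/* 8192 entries per 64 KiB L2 table */
#define L2_SIZE		(L2_ENTRIES * sizeof(uint64_t))

struct two_level_index {
	int	   fd;			/* backing image file or device */
	uint64_t  *l1;			/* pinned and tiny: one entry per L2 table */
	uint64_t **l2_cache;		/* L2 tables, loaded on demand */
	size_t	   l1_entries;
};

/* Map a guest offset to a host offset, touching at most one L2 table. */
static uint64_t map_guest_offset(struct two_level_index *idx, uint64_t offset)
{
	uint64_t cluster = offset >> CLUSTER_BITS;
	size_t l1_idx = cluster / L2_ENTRIES;
	size_t l2_idx = cluster % L2_ENTRIES;

	if (!idx->l2_cache[l1_idx]) {	/* demand-load just this L2 table */
		uint64_t *l2 = malloc(L2_SIZE);

		if (!l2 || pread(idx->fd, l2, L2_SIZE, idx->l1[l1_idx]) != L2_SIZE)
			return 0;	/* error handling elided in this sketch */
		idx->l2_cache[l1_idx] = l2;
	}
	return idx->l2_cache[l1_idx][l2_idx];
}

/* Under memory pressure any cached L2 table can simply be freed ... */
static void reclaim_l2(struct two_level_index *idx, size_t l1_idx)
{
	free(idx->l2_cache[l1_idx]);
	idx->l2_cache[l1_idx] = NULL;	/* ... and re-read later on demand */
}

QEMU's qcow2 driver bounds its L2 table cache in essentially this way.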

Second, I have to mention here your previous attempt, which read (and
maybe later would have written) your DADI files via the VFS directly
from your in-kernel block driver; I think that was really dangerous:

see vfsfile.c of your previous codebase
https://github.com/data-accelerator/dadi-kernel-mod/commit/ff12687f2c567ddf51a28df88b25dd2d0e3737a2

static struct file *file_open(const char *path, int flags, int rights)
{
..
	fp = filp_open(path, O_RDONLY, 0);
..
}

static ssize_t file_read(struct file *file, void *buf, size_t count, loff_t pos)
{
..
	vfs_fadvise(file, pos, count, POSIX_FADV_SEQUENTIAL);
..
		ret = kernel_read(file, buf, count, &lpos);
..
}

In your currently proposed patch, you still call it "struct vfile" but
use raw block devices instead.

But such raw block device use cases are limited (almost useless) for
containers since, as Alex said, almost all container users have
switched to filesystem-based approaches (I don't want to repeat why).
And your kernel approach is almost useless for virtual machine use
cases (see how qcow2 works for VMs).

In the end, if you *end up* later upstreaming reads of backing
filesystem files directly under the block layer (for example, as your
second step), that is really a no-go.

Anyway, all the above is said on my own behalf.

Thanks,
Gao Xiang
