From: Gao Xiang <hsiangkao@linux.alibaba.com>
To: David Howells <dhowells@redhat.com>
Cc: Jeffle Xu <jefflexu@linux.alibaba.com>,
	linux-cachefs@redhat.com, xiang@kernel.org, chao@kernel.org,
	linux-erofs@lists.ozlabs.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-fsdevel@vger.kernel.org, joseph.qi@linux.alibaba.com,
	bo.liu@linux.alibaba.com, tao.peng@linux.alibaba.com,
	gerry@linux.alibaba.com, eguan@linux.alibaba.com,
	linux-kernel@vger.kernel.org
Subject: Re: [Linux-cachefs] [PATCH v2 00/20] fscache,  erofs: fscache-based demand-read semantics
Date: Wed, 19 Jan 2022 14:40:59 +0800
Message-ID: <Yeeye2AUZITDsdh8@B-P7TQMD6M-0146.local>
In-Reply-To: <20220118131216.85338-1-jefflexu@linux.alibaba.com>

Hi David,

On Tue, Jan 18, 2022 at 09:11:56PM +0800, Jeffle Xu wrote:
> changes since v1:
> - rebase to v5.17
> - erofs: In chunk based layout, since the logical file offset has the
>   same remainder over PAGE_SIZE with the corresponding physical address
>   inside the data blob file, the file page cache can be directly
>   transferred to netfs library to contain the data from data blob file.
>   (patch 15) (Gao Xiang)
> - netfs,cachefiles: manage logical/physical offset separately. (patch 2)
>   (It is used by erofs_begin_cache_operation() in patch 15.)
> - cachefiles: introduce a new devnode specifically for on-demand reading.
>   (patch 6)
> - netfs,fscache,cachefiles: add new CONFIG_* for on-demand reading.
>   (patch 3/5)
> - You could start a quick test by
>   https://github.com/lostjeffle/demand-read-cachefilesd
> - add more background information (mainly introduction to nydus) in the
>   "Background" part of this cover letter
> 
> [Important Issues]
> The following issues still need further discussion. Thanks for your time
> and patience.
> 
> 1. I noticed that there's refactoring of netfs library[1], and patch 1
> is not needed since [2].
> 
> 2. The current implementation will severely conflict with the
> refactoring of netfs library[1][2]. The assumption of 'struct
> netfs_i_context' [2] is that, every file in the upper netfs will
> correspond to only one backing file. While in our scenario, one file in
> erofs can correspond to multiple backing files. That is, the content of
> one file can be divided into multiple chunks, which are distributed over
> multiple blob files, i.e. multiple backing files. Currently I have no
> good idea for solving this conflict.
>
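
To make the mismatch described above concrete, here is a minimal sketch of
a chunk-based mapping in which one file is backed by several blobs. The
page size, chunk size, blob names and the CHUNKS table are all assumptions
made up for illustration; this is a model of the idea, not the erofs
implementation.

```python
PAGE_SIZE = 4096             # assumed page size
CHUNK_SIZE = 4 * PAGE_SIZE   # assumed chunk size, a multiple of PAGE_SIZE

# Hypothetical chunk table for ONE file:
# chunk index -> (blob name, chunk-aligned start offset inside that blob)
CHUNKS = [
    ("blob-a", 0 * CHUNK_SIZE),
    ("blob-b", 3 * CHUNK_SIZE),
    ("blob-a", 7 * CHUNK_SIZE),
]

def map_offset(logical):
    """Translate a logical file offset into (blob, physical offset)."""
    index, within = divmod(logical, CHUNK_SIZE)
    blob, start = CHUNKS[index]
    return blob, start + within

def backing_blobs(length):
    """Return the set of blobs backing the first `length` bytes of the file."""
    return {map_offset(off)[0] for off in range(0, length, CHUNK_SIZE)}
```

Because every chunk starts at a page-aligned offset in its blob, the
logical and physical offsets share the same remainder modulo PAGE_SIZE,
while backing_blobs() shows that a single file can still span more than
one backing file.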

Would you mind giving more hints on this? Personally, I still think fscache
is a useful and clean way to handle on-demand loading for image distribution,
in addition to caching network fs data, as a more generic in-kernel caching
framework. Judging from the current diffstat, it needs only slight
modifications to netfslib and cachefiles (apart from a new daemon):
 fs/netfs/Kconfig         |   8 +
 fs/netfs/read_helper.c   |  65 ++++++--
 include/linux/netfs.h    |  10 ++

 fs/cachefiles/Kconfig    |   8 +
 fs/cachefiles/daemon.c   | 147 ++++++++++++++++-
 fs/cachefiles/internal.h |  23 +++
 fs/cachefiles/io.c       |  82 +++++++++-
 fs/cachefiles/main.c     |  27 ++++
 fs/cachefiles/namei.c    |  60 ++++++-

Besides, I think that allowing cookies to be set according to data mapping
(instead of being fixed per file) will benefit the following scenario in
addition to our on-demand load use cases:
  It will benefit file cache data deduplication. As far as I can see,
netfslib may have some follow-on development in order to support
encryption and compression. However, I think cache data deduplication
is also potentially useful to minimize cache storage, since many local
fses already support reflink. That said, I'm not sure it's a great idea
for cachefiles to rely on the underlying fs's abilities for cache
deduplication. So for cache deduplication scenarios, I'm not sure a
per-file cookie is still a good idea for us (the alternative, maintaining
a more complicated mapping per cookie inside fscache besides the
filesystem mapping, is too unnecessary IMO).
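
As a rough illustration of the deduplication point, the sketch below
compares how many chunks a cache would hold when keyed per file versus
keyed by data mapping. The FILES table, the blob names and both helper
functions are hypothetical; nothing here reflects how fscache or
cachefiles actually index their data.

```python
# Hypothetical layout: each file is a list of (blob, chunk index) references.
FILES = {
    "a.txt": [("blob-1", 0), ("blob-1", 1)],
    "b.txt": [("blob-1", 1), ("blob-2", 0)],
}

def cached_chunks_per_file(files):
    """Cache keyed per file: every file keeps its own copy of each chunk."""
    return sum(len(chunks) for chunks in files.values())

def cached_chunks_per_mapping(files):
    """Cache keyed by data mapping: each (blob, chunk) pair is stored once."""
    return len({ref for chunks in files.values() for ref in chunks})
```

In this toy layout the two files share one chunk, so keying by data
mapping stores one chunk fewer than keying per file.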
  
By the way, in general, I'm not sure it's a great idea to cache on a
per-file basis (especially with too many small files), which is why we
introduced data-deduplicated blobs. At least, it's simpler for read-only
fses. Recently, I found another good article that summarizes this:
http://0pointer.net/blog/casync-a-tool-for-distributing-file-system-images.html

Thanks,
Gao Xiang
