* [PATCH v9 00/21] fscache,erofs: fscache-based on-demand read semantics
@ 2022-04-15 12:35 ` Jeffle Xu
  0 siblings, 0 replies; 96+ messages in thread
From: Jeffle Xu @ 2022-04-15 12:35 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend,
	tianzichen, fannaihao, zhangjiachen.jaycee

changes since v8:
- rebase to 5.18-rc2
- cachefiles: use object_id rather than anon_fd to uniquely identify a
  cachefile object to avoid potential issues when the user moves the
  anonymous fd around, e.g. through dup() (refer to commit message and
  cachefiles_ondemand_get_fd() of patch 2 for more details)
  (David Howells)
- cachefiles: add @unbind_pincount refcount to avoid the potential deadlock
  (refer to commit message of patch 3 for more details)
- cachefiles: move the calling site of cachefiles_ondemand_read() from
  cachefiles_read() to cachefiles_prepare_read() (refer to commit message
  of patch 5 for more details)
- cachefiles: add tracepoints (patch 7) (David Howells)
- cachefiles: update documentation (patch 8) (David Howells)
- erofs: update Reviewed-by tag from Gao Xiang
- erofs: move the logic of initializing bdev/dax_dev in fscache mode out
  from patch 15/20. Instead move it into patch 9, so that patch 20 can
  focus on the mount option handling
- erofs: update the subject line and commit message of patch 12 (Gao
  Xiang)
- erofs: remove and fold erofs_fscache_get_folio() helper (patch 16)
  (Gao Xiang)
- erofs: change kmap() to kmap_local_folio(), and comment cleanup (patch
  18) (Gao Xiang)
- update "advantage of fscache-based on-demand read" section of the
  cover letter
- we've finished a preliminary end-to-end on-demand download daemon in
  order to test the fscache on-demand kernel code as a real end-to-end
  workload for container use cases. The test user guide is added in the
  cover letter.
- Thanks Zichen Tian for testing
  Tested-by: Zichen Tian <tianzichen@kuaishou.com>


Kernel Patchset
---------------
Git tree:

    https://github.com/lostjeffle/linux.git jingbo/dev-erofs-fscache-v9

Gitweb:

    https://github.com/lostjeffle/linux/commits/jingbo/dev-erofs-fscache-v9


User Guide for E2E Container Use Case
-------------------------------------
User guide:

    https://github.com/dragonflyoss/image-service/blob/fscache/docs/nydus-fscache.md

Video:

    https://youtu.be/F4IF2_DENXo


User Daemon for Quick Test
--------------------------
Git tree:

    https://github.com/lostjeffle/demand-read-cachefilesd.git main

Gitweb:

    https://github.com/lostjeffle/demand-read-cachefilesd


RFC: https://lore.kernel.org/all/YbRL2glGzjfZkVbH@B-P7TQMD6M-0146.local/t/
v1: https://lore.kernel.org/lkml/47831875-4bdd-8398-9f2d-0466b31a4382@linux.alibaba.com/T/
v2: https://lore.kernel.org/all/2946d871-b9e1-cf29-6d39-bcab30f2854f@linux.alibaba.com/t/
v3: https://lore.kernel.org/lkml/20220209060108.43051-1-jefflexu@linux.alibaba.com/T/
v4: https://lore.kernel.org/lkml/20220307123305.79520-1-jefflexu@linux.alibaba.com/T/#t
v5: https://lore.kernel.org/lkml/202203170912.gk2sqkaK-lkp@intel.com/T/
v6: https://lore.kernel.org/lkml/202203260720.uA5o7k5w-lkp@intel.com/T/
v7: https://lore.kernel.org/lkml/557bcf75-2334-5fbb-d2e0-c65e96da566d@linux.alibaba.com/T/
v8: https://lore.kernel.org/all/ac8571b8-0935-1f4f-e9f1-e424f059b5ed@linux.alibaba.com/T/


[Background]
============
Nydus [1] is an image distribution service optimized for distribution
over the network. It accelerates container image startup by pulling data
from the remote only when needed (a.k.a. on-demand reading), and it also
supports chunk-based deduplication, compression, etc.

erofs (Enhanced Read-Only File System) is a filesystem designed for
read-only scenarios. (Documentation/filesystems/erofs.rst)

Over the past months we've been focusing on supporting the Nydus image
service with the in-kernel erofs format [2]. In that case, each container
image is organized as one bootstrap (metadata) and, optionally, multiple
data blobs in erofs format. A large number of container images may be
stored on one machine.

To accelerate container startup (fetching the container image from remote
and then starting the container), we want the bootstrap and blob files to
support on-demand read. That is, erofs can be mounted and accessed even
when the bootstrap/data blob files have not been fully downloaded. Once
the data is available locally, access runs at native performance.

That means we have to manage the cache state of the bootstrap/data blob
files (if cache hit, read directly from the local cache; if cache miss,
fetch the data somehow). It would be painful and may be dumb for erofs to
implement the cache management itself. Thus we prefer fscache/cachefiles
to do the cache management instead.

The fscache on-demand read feature is implemented in the fscache subsystem
in a generic way, so that it can also benefit other use cases and/or
filesystems.

[1] https://nydus.dev
[2] https://sched.co/pcdL


[Overall Design]
================
Please refer to patch 8 ("cachefiles: document on-demand read mode") for
more details.

In the original mode, cachefiles mainly serves as a local cache for a
remote networking fs. In on-demand read mode, cachefiles can serve
scenarios where on-demand read semantics are needed, e.g. container image
distribution.

The essential difference between the two modes is what happens on a cache
miss. In the original mode, netfs itself fetches the data from remote and
then writes it into the cache file. In on-demand read mode, a user daemon
is responsible for fetching the data and feeding it to the kernel fscache
side.

The on-demand read mode relies on a simple protocol used for communication
between kernel and user daemon.

The proposed implementation relies on the anonymous fd mechanism to avoid
depending on the format of the cache file. When a fscache cache file is
opened for the first time, an anon_fd associated with the cache file is
sent to the user daemon. With the given anon_fd, the user daemon can fetch
data and write it into the cache file in the background, even before the
kernel has triggered a cache miss. Besides, a write() syscall on the
anon_fd ends up in the cachefiles kernel module, which writes the data to
the cache file in the latest cache file format.

1. cache miss
On a cache miss, the cachefiles kernel module notifies the user daemon with
the anon_fd, along with the requested file range. When notified, the user
daemon needs to fetch the data of the requested file range and write it
into the cache file through the given anonymous fd. When it has finished
processing the request, the user daemon needs to notify the kernel.

After notifying the user daemon, the kernel read routine blocks until the
request has been handled by the user daemon. When it is woken by the
notification from the user daemon, i.e. the corresponding hole has been
filled, it retries the read on the same file range.

2. cache hit
Once the data is ready in the cache file, netfs reads from the cache file
directly.
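
The miss/hit flow described above can be sketched as a toy user-space
model. This is only an illustration under stated assumptions: every toy_*
name, CHUNK_SIZE and NR_CHUNKS are hypothetical and do not appear in the
patchset, the "daemon" is an ordinary function call rather than a separate
process, and the blocking/wakeup step is collapsed into a synchronous call.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy model; none of these names come from the patchset. */
#define CHUNK_SIZE 4096u
#define NR_CHUNKS  8u

struct toy_cache {
	unsigned char data[NR_CHUNKS][CHUNK_SIZE];
	bool present[NR_CHUNKS];	/* true once the hole has been filled */
	unsigned int nr_requests;	/* how often the daemon was notified */
};

/*
 * Stands in for the user daemon: "fetch" one chunk and write it into the
 * cache. Returning from this function plays the role of the completion
 * notification sent back to the kernel.
 */
static void toy_daemon_fill(struct toy_cache *c, unsigned int chunk)
{
	memset(c->data[chunk], 'A' + (int)chunk, CHUNK_SIZE);
	c->present[chunk] = true;
	c->nr_requests++;
}

/*
 * Stands in for the kernel read path: on a miss, notify the daemon and
 * wait for it, then retry the same range; on a hit, read directly.
 */
static int toy_read(struct toy_cache *c, unsigned int chunk,
		    unsigned char *buf)
{
	if (chunk >= NR_CHUNKS)
		return -1;

	if (!c->present[chunk])			/* cache miss */
		toy_daemon_fill(c, chunk);	/* synchronous in this model */

	assert(c->present[chunk]);		/* hole is filled before retry */
	memcpy(buf, c->data[chunk], CHUNK_SIZE);
	return 0;
}
```

In this model a second toy_read() on the same chunk does not increment
nr_requests, which mirrors the cache-hit case: the daemon is not involved
once the data is present.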


[Advantage of fscache-based on-demand read]
===========================================
1. Asynchronous prefetch
In the current mechanism, fscache is responsible for cache state
management, while the data plane (fetching data from local/remote on cache
miss) is handled by the user daemon, which can fetch data even when no
file system request drives it. In addition, if the cached data is already
available locally, fscache uses it directly instead of trapping to user
space.

Therefore, unlike event-driven approaches, the fscache on-demand user
daemon can also fetch data (from remote) asynchronously in the background,
just like most multi-threaded HTTP downloaders.

2. Flexible request amplification
Since the data plane is controlled independently by the user daemon, the
user daemon can also fetch more data from remote than the file system
actually requests for small I/O sizes. The data fetched in bulk is then
available at once, and fscache won't trap into the user daemon again for
that range.
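
One way a daemon might amplify a request is to widen the requested range
to fixed-size window boundaries before fetching. The sketch below is an
assumption for illustration only: toy_amplify(), struct toy_range and the
window parameter are hypothetical, and the cover letter does not say how
the real daemon chooses the amount of extra data.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: the byte range the daemon would fetch. */
struct toy_range {
	uint64_t off;
	uint64_t len;
};

/*
 * Widen [off, off + len) to multiples of 'window', clamped to file_size.
 * Returns an empty range if there is nothing to fetch.
 */
static struct toy_range toy_amplify(uint64_t off, uint64_t len,
				    uint64_t window, uint64_t file_size)
{
	struct toy_range r = { 0, 0 };
	uint64_t start, end, rem;

	if (window == 0 || len == 0 || off >= file_size)
		return r;

	if (len > file_size - off)	/* clamp the request itself to EOF */
		len = file_size - off;

	start = off - off % window;	/* round the start down */
	end = off + len;
	rem = end % window;
	if (rem) {			/* round the end up, never past EOF */
		uint64_t pad = window - rem;

		end = (pad > file_size - end) ? file_size : end + pad;
	}

	assert(start <= off && end <= file_size);
	r.off = start;
	r.len = end - start;
	return r;
}
```

With a 1 MiB window, a 100-byte request at offset 5000 of a 3 MiB file
becomes a fetch of the whole first 1 MiB, so later small reads in that
window are served from the cache without another round trip to the daemon.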

3. Support massive blobs
This mechanism naturally supports a large number of backing files, and
thus benefits densely deployed scenarios. In our use cases, one container
image consists of one bootstrap (required) and multiple chunk-deduplicated
data blobs (optional).

For example, one container image for node.js corresponds to ~20 files in
total. In a densely deployed environment, there could be hundreds of
containers and thus thousands of backing files on one machine.




Jeffle Xu (21):
  cachefiles: extract write routine
  cachefiles: notify user daemon when looking up cookie
  cachefiles: unbind cachefiles gracefully in on-demand mode
  cachefiles: notify user daemon when withdrawing cookie
  cachefiles: implement on-demand read
  cachefiles: enable on-demand read mode
  cachefiles: add tracepoints for on-demand read mode
  cachefiles: document on-demand read mode
  erofs: make erofs_map_blocks() generally available
  erofs: add fscache mode check helper
  erofs: register fscache volume
  erofs: add fscache context helper functions
  erofs: add anonymous inode caching metadata for data blobs
  erofs: add erofs_fscache_read_folios() helper
  erofs: register fscache context for primary data blob
  erofs: register fscache context for extra data blobs
  erofs: implement fscache-based metadata read
  erofs: implement fscache-based data read for non-inline layout
  erofs: implement fscache-based data read for inline layout
  erofs: implement fscache-based data readahead
  erofs: add 'fsid' mount option

 .../filesystems/caching/cachefiles.rst        | 170 ++++++
 fs/cachefiles/Kconfig                         |  11 +
 fs/cachefiles/Makefile                        |   1 +
 fs/cachefiles/daemon.c                        | 116 +++-
 fs/cachefiles/interface.c                     |   2 +
 fs/cachefiles/internal.h                      |  74 +++
 fs/cachefiles/io.c                            |  76 ++-
 fs/cachefiles/namei.c                         |  16 +-
 fs/cachefiles/ondemand.c                      | 496 ++++++++++++++++++
 fs/erofs/Kconfig                              |  10 +
 fs/erofs/Makefile                             |   1 +
 fs/erofs/data.c                               |  26 +-
 fs/erofs/fscache.c                            | 365 +++++++++++++
 fs/erofs/inode.c                              |   4 +
 fs/erofs/internal.h                           |  49 ++
 fs/erofs/super.c                              | 105 +++-
 fs/erofs/sysfs.c                              |   4 +-
 include/linux/fscache.h                       |   1 +
 include/linux/netfs.h                         |   2 +
 include/trace/events/cachefiles.h             | 176 +++++++
 include/uapi/linux/cachefiles.h               |  68 +++
 21 files changed, 1694 insertions(+), 79 deletions(-)
 create mode 100644 fs/cachefiles/ondemand.c
 create mode 100644 fs/erofs/fscache.c
 create mode 100644 include/uapi/linux/cachefiles.h

-- 
2.27.0


* [PATCH v9 01/21] cachefiles: extract write routine
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-15 12:35   ` Jeffle Xu
  -1 siblings, 0 replies; 96+ messages in thread
From: Jeffle Xu @ 2022-04-15 12:35 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend,
	tianzichen, fannaihao, zhangjiachen.jaycee

Extract the generic routine of writing data to cache files, and make it
generally available.

This will be used by the following patch implementing on-demand read
mode. Since it's called inside cachefiles module in this case, make the
interface generic and unrelated to netfs_cache_resources.

It is worth noting that ki->inval_counter is not initialized after this
cleanup. This should not make any visible difference, since inval_counter
is no longer used in the write completion routine, i.e.
cachefiles_write_complete().

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
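The shape of this refactoring can be illustrated outside the kernel. The
following is a minimal user-space sketch, not the kernel code: all toy_*
names are hypothetical. It shows a context-independent core routine plus a
thin wrapper that performs the context-specific readiness check, with the
completion callback invoked exactly once on every path, including the
pre-submission error paths.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical completion callback type, loosely modelled on term_func. */
typedef void (*toy_term_func_t)(void *priv, long result);

/* Hypothetical caller context; the core routine never sees it. */
struct toy_ctx {
	bool ready;
	long *slot;
};

/* Records how often the callback ran and the last result it saw. */
struct toy_result {
	int calls;
	long last;
};

static void toy_record(void *priv, long result)
{
	struct toy_result *r = priv;

	assert(r != NULL);
	r->calls++;
	r->last = result;
}

/* Core routine: depends only on the target, not on the caller context. */
static int toy_core_write(long *slot, long value,
			  toy_term_func_t term, void *priv)
{
	if (!slot) {
		if (term)
			term(priv, -EINVAL);
		return -EINVAL;
	}

	*slot = value;
	if (term)
		term(priv, 0);
	return 0;
}

/* Wrapper: does the context-specific check, then calls the core. */
static int toy_write(struct toy_ctx *ctx, long value,
		     toy_term_func_t term, void *priv)
{
	if (!ctx->ready) {
		if (term)
			term(priv, -ENOBUFS);
		return -ENOBUFS;
	}

	return toy_core_write(ctx->slot, value, term, priv);
}
```

A caller that has no toy_ctx at all can call toy_core_write() directly,
which is the point of the split.
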
 fs/cachefiles/internal.h | 10 +++++++
 fs/cachefiles/io.c       | 61 +++++++++++++++++++++++-----------------
 2 files changed, 45 insertions(+), 26 deletions(-)

diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index c793d33b0224..e80673d0ab97 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -201,6 +201,16 @@ extern void cachefiles_put_object(struct cachefiles_object *object,
  */
 extern bool cachefiles_begin_operation(struct netfs_cache_resources *cres,
 				       enum fscache_want_state want_state);
+extern int __cachefiles_prepare_write(struct cachefiles_object *object,
+				      struct file *file,
+				      loff_t *_start, size_t *_len,
+				      bool no_space_allocated_yet);
+extern int __cachefiles_write(struct cachefiles_object *object,
+			      struct file *file,
+			      loff_t start_pos,
+			      struct iov_iter *iter,
+			      netfs_io_terminated_t term_func,
+			      void *term_func_priv);
 
 /*
  * key.c
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 9dc81e781f2b..50a14e8f0aac 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -277,36 +277,33 @@ static void cachefiles_write_complete(struct kiocb *iocb, long ret)
 /*
  * Initiate a write to the cache.
  */
-static int cachefiles_write(struct netfs_cache_resources *cres,
-			    loff_t start_pos,
-			    struct iov_iter *iter,
-			    netfs_io_terminated_t term_func,
-			    void *term_func_priv)
+int __cachefiles_write(struct cachefiles_object *object,
+		       struct file *file,
+		       loff_t start_pos,
+		       struct iov_iter *iter,
+		       netfs_io_terminated_t term_func,
+		       void *term_func_priv)
 {
-	struct cachefiles_object *object;
 	struct cachefiles_cache *cache;
 	struct cachefiles_kiocb *ki;
 	struct inode *inode;
-	struct file *file;
 	unsigned int old_nofs;
-	ssize_t ret = -ENOBUFS;
+	ssize_t ret;
 	size_t len = iov_iter_count(iter);
 
-	if (!fscache_wait_for_operation(cres, FSCACHE_WANT_WRITE))
-		goto presubmission_error;
 	fscache_count_write();
-	object = cachefiles_cres_object(cres);
 	cache = object->volume->cache;
-	file = cachefiles_cres_file(cres);
 
 	_enter("%pD,%li,%llx,%zx/%llx",
 	       file, file_inode(file)->i_ino, start_pos, len,
 	       i_size_read(file_inode(file)));
 
-	ret = -ENOMEM;
 	ki = kzalloc(sizeof(struct cachefiles_kiocb), GFP_KERNEL);
-	if (!ki)
-		goto presubmission_error;
+	if (!ki) {
+		if (term_func)
+			term_func(term_func_priv, -ENOMEM, false);
+		return -ENOMEM;
+	}
 
 	refcount_set(&ki->ki_refcnt, 2);
 	ki->iocb.ki_filp	= file;
@@ -314,7 +311,6 @@ static int cachefiles_write(struct netfs_cache_resources *cres,
 	ki->iocb.ki_flags	= IOCB_DIRECT | IOCB_WRITE;
 	ki->iocb.ki_ioprio	= get_current_ioprio();
 	ki->object		= object;
-	ki->inval_counter	= cres->inval_counter;
 	ki->start		= start_pos;
 	ki->len			= len;
 	ki->term_func		= term_func;
@@ -369,11 +365,24 @@ static int cachefiles_write(struct netfs_cache_resources *cres,
 	cachefiles_put_kiocb(ki);
 	_leave(" = %zd", ret);
 	return ret;
+}
 
-presubmission_error:
-	if (term_func)
-		term_func(term_func_priv, ret, false);
-	return ret;
+static int cachefiles_write(struct netfs_cache_resources *cres,
+			    loff_t start_pos,
+			    struct iov_iter *iter,
+			    netfs_io_terminated_t term_func,
+			    void *term_func_priv)
+{
+	if (!fscache_wait_for_operation(cres, FSCACHE_WANT_WRITE)) {
+		if (term_func)
+			term_func(term_func_priv, -ENOBUFS, false);
+		return -ENOBUFS;
+	}
+
+	return __cachefiles_write(cachefiles_cres_object(cres),
+				  cachefiles_cres_file(cres),
+				  start_pos, iter,
+				  term_func, term_func_priv);
 }
 
 /*
@@ -484,13 +493,12 @@ static enum netfs_io_source cachefiles_prepare_read(struct netfs_io_subrequest *
 /*
  * Prepare for a write to occur.
  */
-static int __cachefiles_prepare_write(struct netfs_cache_resources *cres,
-				      loff_t *_start, size_t *_len, loff_t i_size,
-				      bool no_space_allocated_yet)
+int __cachefiles_prepare_write(struct cachefiles_object *object,
+			       struct file *file,
+			       loff_t *_start, size_t *_len,
+			       bool no_space_allocated_yet)
 {
-	struct cachefiles_object *object = cachefiles_cres_object(cres);
 	struct cachefiles_cache *cache = object->volume->cache;
-	struct file *file = cachefiles_cres_file(cres);
 	loff_t start = *_start, pos;
 	size_t len = *_len, down;
 	int ret;
@@ -577,7 +585,8 @@ static int cachefiles_prepare_write(struct netfs_cache_resources *cres,
 	}
 
 	cachefiles_begin_secure(cache, &saved_cred);
-	ret = __cachefiles_prepare_write(cres, _start, _len, i_size,
+	ret = __cachefiles_prepare_write(object, cachefiles_cres_file(cres),
+					 _start, _len,
 					 no_space_allocated_yet);
 	cachefiles_end_secure(cache, saved_cred);
 	return ret;
-- 
2.27.0


+					 _start, _len,
 					 no_space_allocated_yet);
 	cachefiles_end_secure(cache, saved_cred);
 	return ret;
-- 
2.27.0



* [PATCH v9 02/21] cachefiles: notify user daemon when looking up cookie
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-15 12:35   ` Jeffle Xu
  -1 siblings, 0 replies; 96+ messages in thread
From: Jeffle Xu @ 2022-04-15 12:35 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend,
	tianzichen, fannaihao, zhangjiachen.jaycee

Fscache/cachefiles used to serve as a local cache for a remote
networking fs. This patch, along with the following patches, introduces
a new on-demand read mode for cachefiles, which benefits scenarios
where on-demand read semantics are needed, e.g. container image
distribution.

The essential difference between these two modes is that, in the
original mode, when a cache miss occurs, the netfs fetches the data from
the remote server and then writes it to the cache file. In on-demand
read mode, however, fetching the data and writing it to the cache are
delegated to a user daemon.

As the first step, notify the user daemon when looking up a cookie. In
this case, an anonymous fd is sent to the user daemon, through which the
user daemon can write the fetched data to the cache file. Since the user
daemon may move the anonymous fd around, e.g. through dup(), an object
ID uniquely identifying the cache file is also attached.
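
As an illustration only, a user daemon might decode the OPEN message it
reads from the cachefiles device roughly as sketched below. The header
layouts mirror struct cachefiles_msg and struct cachefiles_open from the
uapi header added by this patch; struct open_req and parse_open_msg()
are hypothetical names used for the sketch, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* User-space mirrors of the fixed-size headers in the proposed
 * include/uapi/linux/cachefiles.h; payload bytes follow each header. */
struct msg_hdr {
	uint32_t msg_id;
	uint32_t opcode;
	uint32_t len;		/* header plus payload, in bytes */
	uint32_t object_id;
};

struct open_hdr {
	uint32_t volume_key_size;	/* includes the trailing NUL */
	uint32_t cookie_key_size;
	uint32_t fd;
	uint32_t flags;
};

#define OP_OPEN 0u	/* CACHEFILES_OP_OPEN is the first enumerator */

/* Hypothetical parsed view; the key pointers alias the caller's buffer. */
struct open_req {
	uint32_t msg_id;
	uint32_t object_id;
	int fd;
	const char *volume_key;		/* NUL-terminated string */
	const uint8_t *cookie_key;	/* netfs-specific binary data */
	uint32_t cookie_key_size;
};

/* Returns 0 on success, -1 if buf does not hold a well-formed OPEN message. */
static int parse_open_msg(const uint8_t *buf, size_t n, struct open_req *out)
{
	struct msg_hdr mh;
	struct open_hdr oh;
	size_t keys;

	assert(buf != NULL && out != NULL);
	if (n < sizeof(mh) + sizeof(oh))
		return -1;

	/* Copy the headers out to avoid unaligned accesses. */
	memcpy(&mh, buf, sizeof(mh));
	memcpy(&oh, buf + sizeof(mh), sizeof(oh));

	if (mh.opcode != OP_OPEN || mh.len > n ||
	    mh.len < sizeof(mh) + sizeof(oh))
		return -1;

	/* Both keys must fit in the payload that follows the headers. */
	keys = mh.len - sizeof(mh) - sizeof(oh);
	if (oh.volume_key_size == 0 || oh.volume_key_size > keys ||
	    oh.cookie_key_size > keys - oh.volume_key_size)
		return -1;

	out->volume_key = (const char *)(buf + sizeof(mh) + sizeof(oh));
	if (out->volume_key[oh.volume_key_size - 1] != '\0')
		return -1;

	out->msg_id = mh.msg_id;
	out->object_id = mh.object_id;
	out->fd = (int)oh.fd;
	out->cookie_key = (const uint8_t *)out->volume_key + oh.volume_key_size;
	out->cookie_key_size = oh.cookie_key_size;
	return 0;
}
```

In this sketch the daemon would key its own bookkeeping on object_id
rather than on fd, since the fd number may change if it is dup()ed.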

Also add one advisory flag (FSCACHE_ADV_WANT_CACHE_SIZE) indicating that
the cache file size shall be retrieved at runtime. This helps the
scenario where one cache file contains multiple netfs files, e.g. for
the purpose of deduplication. In this case, the netfs itself does not
know the size of the cache file, so the user daemon has to supply it.
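
For illustration, the daemon supplies the size by writing a
"copen <id>,<cache_size>" command back to the cachefiles device, where
<id> is the msg_id of the OPEN request and a negative <cache_size>
reports an error code. A minimal sketch of building that command string
follows; format_copen() is a hypothetical helper name, not part of this
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Build "copen <id>,<cache_size>" into out[cap].
 * size_or_err is the object size if >= 0, or a negative error code.
 * Returns the command length, or -1 if it does not fit.
 */
static int format_copen(char *out, size_t cap, uint32_t msg_id,
			long size_or_err)
{
	int n;

	assert(out != NULL);
	n = snprintf(out, cap, "copen %u,%ld", (unsigned int)msg_id,
		     size_or_err);
	return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

The resulting string would then be passed to write() on the fd the
daemon holds open for the cachefiles device.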

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/cachefiles/Kconfig             |  11 +
 fs/cachefiles/Makefile            |   1 +
 fs/cachefiles/daemon.c            |  79 +++++--
 fs/cachefiles/internal.h          |  47 ++++
 fs/cachefiles/namei.c             |  16 +-
 fs/cachefiles/ondemand.c          | 371 ++++++++++++++++++++++++++++++
 include/linux/fscache.h           |   1 +
 include/trace/events/cachefiles.h |   2 +
 include/uapi/linux/cachefiles.h   |  50 ++++
 9 files changed, 563 insertions(+), 15 deletions(-)
 create mode 100644 fs/cachefiles/ondemand.c
 create mode 100644 include/uapi/linux/cachefiles.h

diff --git a/fs/cachefiles/Kconfig b/fs/cachefiles/Kconfig
index 719faeeda168..67371cb1eb08 100644
--- a/fs/cachefiles/Kconfig
+++ b/fs/cachefiles/Kconfig
@@ -26,3 +26,14 @@ config CACHEFILES_ERROR_INJECTION
 	help
 	  This permits error injection to be enabled in cachefiles whilst a
 	  cache is in service.
+
+config CACHEFILES_ONDEMAND
+	bool "Support for on-demand read"
+	depends on CACHEFILES
+	default n
+	help
+	  This permits on-demand read mode of cachefiles.  In this mode, on a
+	  cache miss, the cachefiles backend, rather than the netfs, is
+	  responsible for fetching data, e.g. through a user daemon.
+
+	  If unsure, say N.
diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
index 16d811f1a2fa..c37a7a9af10b 100644
--- a/fs/cachefiles/Makefile
+++ b/fs/cachefiles/Makefile
@@ -16,5 +16,6 @@ cachefiles-y := \
 	xattr.o
 
 cachefiles-$(CONFIG_CACHEFILES_ERROR_INJECTION) += error_inject.o
+cachefiles-$(CONFIG_CACHEFILES_ONDEMAND) += ondemand.o
 
 obj-$(CONFIG_CACHEFILES) := cachefiles.o
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 7ac04ee2c0a0..69ca22aa6abf 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -75,6 +75,9 @@ static const struct cachefiles_daemon_cmd cachefiles_daemon_cmds[] = {
 	{ "inuse",	cachefiles_daemon_inuse		},
 	{ "secctx",	cachefiles_daemon_secctx	},
 	{ "tag",	cachefiles_daemon_tag		},
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+	{ "copen",	cachefiles_ondemand_copen	},
+#endif
 	{ "",		NULL				}
 };
 
@@ -108,6 +111,10 @@ static int cachefiles_daemon_open(struct inode *inode, struct file *file)
 	INIT_LIST_HEAD(&cache->volumes);
 	INIT_LIST_HEAD(&cache->object_list);
 	spin_lock_init(&cache->object_list_lock);
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+	xa_init_flags(&cache->reqs, XA_FLAGS_ALLOC);
+	xa_init_flags(&cache->ondemand_ids, XA_FLAGS_ALLOC1);
+#endif
 
 	/* set default caching limits
 	 * - limit at 1% free space and/or free files
@@ -126,6 +133,30 @@ static int cachefiles_daemon_open(struct inode *inode, struct file *file)
 	return 0;
 }
 
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+static void cachefiles_flush_reqs(struct cachefiles_cache *cache)
+{
+	struct xarray *xa = &cache->reqs;
+	struct cachefiles_req *req;
+	unsigned long index;
+
+	/*
+	 * 1) Cache has been marked as dead state, and then 2) flush all
+	 * pending requests in @reqs xarray. The barrier inside set_bit()
+	 * will ensure that above two ops won't be reordered.
+	 */
+	xa_lock(xa);
+	xa_for_each(xa, index, req) {
+		req->error = -EIO;
+		complete(&req->done);
+	}
+	xa_unlock(xa);
+
+	xa_destroy(&cache->reqs);
+	xa_destroy(&cache->ondemand_ids);
+}
+#endif
+
 /*
  * Release a cache.
  */
@@ -139,6 +170,9 @@ static int cachefiles_daemon_release(struct inode *inode, struct file *file)
 
 	set_bit(CACHEFILES_DEAD, &cache->flags);
 
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+	cachefiles_flush_reqs(cache);
+#endif
 	cachefiles_daemon_unbind(cache);
 
 	/* clean up the control file interface */
@@ -152,23 +186,14 @@ static int cachefiles_daemon_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
-/*
- * Read the cache state.
- */
-static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer,
-				      size_t buflen, loff_t *pos)
+static ssize_t cachefiles_do_daemon_read(struct cachefiles_cache *cache,
+					 char __user *_buffer, size_t buflen)
 {
-	struct cachefiles_cache *cache = file->private_data;
 	unsigned long long b_released;
 	unsigned f_released;
 	char buffer[256];
 	int n;
 
-	//_enter(",,%zu,", buflen);
-
-	if (!test_bit(CACHEFILES_READY, &cache->flags))
-		return 0;
-
 	/* check how much space the cache has */
 	cachefiles_has_space(cache, 0, 0, cachefiles_has_space_check);
 
@@ -206,6 +231,26 @@ static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer,
 	return n;
 }
 
+/*
+ * Read the cache state.
+ */
+static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer,
+				      size_t buflen, loff_t *pos)
+{
+	struct cachefiles_cache *cache = file->private_data;
+
+	//_enter(",,%zu,", buflen);
+
+	if (!test_bit(CACHEFILES_READY, &cache->flags))
+		return 0;
+
+	if (IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) &&
+	    test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags))
+		return cachefiles_ondemand_daemon_read(cache, _buffer, buflen);
+	else
+		return cachefiles_do_daemon_read(cache, _buffer, buflen);
+}
+
 /*
  * Take a command from cachefilesd, parse it and act on it.
  */
@@ -297,8 +342,16 @@ static __poll_t cachefiles_daemon_poll(struct file *file,
 	poll_wait(file, &cache->daemon_pollwq, poll);
 	mask = 0;
 
-	if (test_bit(CACHEFILES_STATE_CHANGED, &cache->flags))
-		mask |= EPOLLIN;
+	if (IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) &&
+	    test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags)) {
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+		if (!xa_empty(&cache->reqs))
+			mask |= EPOLLIN;
+#endif
+	} else {
+		if (test_bit(CACHEFILES_STATE_CHANGED, &cache->flags))
+			mask |= EPOLLIN;
+	}
 
 	if (test_bit(CACHEFILES_CULLING, &cache->flags))
 		mask |= EPOLLOUT;
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index e80673d0ab97..8ebe238af20b 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -15,6 +15,8 @@
 #include <linux/fscache-cache.h>
 #include <linux/cred.h>
 #include <linux/security.h>
+#include <linux/xarray.h>
+#include <linux/cachefiles.h>
 
 #define CACHEFILES_DIO_BLOCK_SIZE 4096
 
@@ -58,8 +60,13 @@ struct cachefiles_object {
 	enum cachefiles_content		content_info:8;	/* Info about content presence */
 	unsigned long			flags;
 #define CACHEFILES_OBJECT_USING_TMPFILE	0		/* Have an unlinked tmpfile */
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+	int				ondemand_id;
+#endif
 };
 
+#define CACHEFILES_ONDEMAND_ID_CLOSED	-1
+
 /*
  * Cache files cache definition
  */
@@ -98,11 +105,26 @@ struct cachefiles_cache {
 #define CACHEFILES_DEAD			1	/* T if cache dead */
 #define CACHEFILES_CULLING		2	/* T if cull engaged */
 #define CACHEFILES_STATE_CHANGED	3	/* T if state changed (poll trigger) */
+#define CACHEFILES_ONDEMAND_MODE	4	/* T if in on-demand read mode */
 	char				*rootdirname;	/* name of cache root directory */
 	char				*secctx;	/* LSM security context */
 	char				*tag;		/* cache binding tag */
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+	struct xarray			reqs;		/* xarray of pending on-demand requests */
+	struct xarray			ondemand_ids;	/* xarray for ondemand_id allocation */
+	u32				ondemand_id_next;
+#endif
+};
+
+struct cachefiles_req {
+	struct cachefiles_object *object;
+	struct completion done;
+	int error;
+	struct cachefiles_msg msg;
 };
 
+#define CACHEFILES_REQ_NEW	XA_MARK_1
+
 #include <trace/events/cachefiles.h>
 
 static inline
@@ -250,6 +272,31 @@ extern struct file *cachefiles_create_tmpfile(struct cachefiles_object *object);
 extern bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
 				      struct cachefiles_object *object);
 
+/*
+ * ondemand.c
+ */
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+extern ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
+					char __user *_buffer, size_t buflen);
+
+extern int cachefiles_ondemand_copen(struct cachefiles_cache *cache,
+				     char *args);
+
+extern int cachefiles_ondemand_init_object(struct cachefiles_object *object);
+
+#else
+static inline ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
+					char __user *_buffer, size_t buflen)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int cachefiles_ondemand_init_object(struct cachefiles_object *object)
+{
+	return 0;
+}
+#endif
+
 /*
  * security.c
  */
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index ca9f3e4ec4b3..facf2ebe464b 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -452,10 +452,9 @@ struct file *cachefiles_create_tmpfile(struct cachefiles_object *object)
 	struct dentry *fan = volume->fanout[(u8)object->cookie->key_hash];
 	struct file *file;
 	struct path path;
-	uint64_t ni_size = object->cookie->object_size;
+	uint64_t ni_size;
 	long ret;
 
-	ni_size = round_up(ni_size, CACHEFILES_DIO_BLOCK_SIZE);
 
 	cachefiles_begin_secure(cache, &saved_cred);
 
@@ -481,6 +480,15 @@ struct file *cachefiles_create_tmpfile(struct cachefiles_object *object)
 		goto out_dput;
 	}
 
+	ret = cachefiles_ondemand_init_object(object);
+	if (ret < 0) {
+		file = ERR_PTR(ret);
+		goto out_unuse;
+	}
+
+	ni_size = object->cookie->object_size;
+	ni_size = round_up(ni_size, CACHEFILES_DIO_BLOCK_SIZE);
+
 	if (ni_size > 0) {
 		trace_cachefiles_trunc(object, d_backing_inode(path.dentry), 0, ni_size,
 				       cachefiles_trunc_expand_tmpfile);
@@ -586,6 +594,10 @@ static bool cachefiles_open_file(struct cachefiles_object *object,
 	}
 	_debug("file -> %pd positive", dentry);
 
+	ret = cachefiles_ondemand_init_object(object);
+	if (ret < 0)
+		goto error_fput;
+
 	ret = cachefiles_check_auxdata(object, file);
 	if (ret < 0)
 		goto check_failed;
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
new file mode 100644
index 000000000000..890cd3ecc2f0
--- /dev/null
+++ b/fs/cachefiles/ondemand.c
@@ -0,0 +1,371 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#include <linux/fdtable.h>
+#include <linux/anon_inodes.h>
+#include <linux/uio.h>
+#include "internal.h"
+
+static int cachefiles_ondemand_fd_release(struct inode *inode,
+					  struct file *file)
+{
+	struct cachefiles_object *object = file->private_data;
+	struct cachefiles_cache *cache = object->volume->cache;
+	int object_id = object->ondemand_id;
+
+	object->ondemand_id = CACHEFILES_ONDEMAND_ID_CLOSED;
+	xa_erase(&cache->ondemand_ids, object_id);
+	cachefiles_put_object(object, cachefiles_obj_put_ondemand_fd);
+	return 0;
+}
+
+static ssize_t cachefiles_ondemand_fd_write_iter(struct kiocb *kiocb,
+						 struct iov_iter *iter)
+{
+	struct cachefiles_object *object = kiocb->ki_filp->private_data;
+	struct cachefiles_cache *cache = object->volume->cache;
+	struct file *file = object->file;
+	size_t len = iter->count;
+	loff_t pos = kiocb->ki_pos;
+	const struct cred *saved_cred;
+	int ret;
+
+	if (!file)
+		return -ENOBUFS;
+
+	cachefiles_begin_secure(cache, &saved_cred);
+	ret = __cachefiles_prepare_write(object, file, &pos, &len, true);
+	cachefiles_end_secure(cache, saved_cred);
+	if (ret < 0)
+		return ret;
+
+	ret = __cachefiles_write(object, file, pos, iter, NULL, NULL);
+	if (!ret)
+		ret = len;
+
+	return ret;
+}
+
+static loff_t cachefiles_ondemand_fd_llseek(struct file *filp, loff_t pos,
+					    int whence)
+{
+	struct cachefiles_object *object = filp->private_data;
+	struct file *file = object->file;
+
+	if (!file)
+		return -ENOBUFS;
+
+	return vfs_llseek(file, pos, whence);
+}
+
+static const struct file_operations cachefiles_ondemand_fd_fops = {
+	.owner		= THIS_MODULE,
+	.release	= cachefiles_ondemand_fd_release,
+	.write_iter	= cachefiles_ondemand_fd_write_iter,
+	.llseek		= cachefiles_ondemand_fd_llseek,
+};
+
+/*
+ * OPEN request Completion (copen)
+ * - command: "copen <id>,<cache_size>"
+ *   <cache_size> represents the object size if >=0, error code if negative
+ */
+int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args)
+{
+	struct cachefiles_req *req;
+	struct fscache_cookie *cookie;
+	char *pid, *psize;
+	unsigned long id;
+	long size;
+	int ret;
+
+	if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags))
+		return -EOPNOTSUPP;
+
+	if (!*args) {
+		pr_err("Empty id specified\n");
+		return -EINVAL;
+	}
+
+	pid = args;
+	psize = strchr(args, ',');
+	if (!psize) {
+		pr_err("Cache size is not specified\n");
+		return -EINVAL;
+	}
+
+	*psize = 0;
+	psize++;
+
+	ret = kstrtoul(pid, 0, &id);
+	if (ret)
+		return ret;
+
+	req = xa_erase(&cache->reqs, id);
+	if (!req)
+		return -EINVAL;
+
+	/* fail OPEN request if copen format is invalid */
+	ret = kstrtol(psize, 0, &size);
+	if (ret) {
+		req->error = ret;
+		goto out;
+	}
+
+	/* fail OPEN request if daemon reports an error */
+	if (size < 0) {
+		if (!IS_ERR_VALUE(size))
+			size = -EINVAL;
+		req->error = size;
+		goto out;
+	}
+
+	cookie = req->object->cookie;
+	cookie->object_size = size;
+	if (size)
+		clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
+	else
+		set_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
+
+out:
+	complete(&req->done);
+	return ret;
+}
+
+static int cachefiles_ondemand_get_fd(struct cachefiles_req *req)
+{
+	struct cachefiles_object *object;
+	struct cachefiles_cache *cache;
+	struct cachefiles_open *load;
+	struct file *file;
+	u32 object_id;
+	int ret, fd;
+
+	object = cachefiles_grab_object(req->object,
+			cachefiles_obj_get_ondemand_fd);
+	cache = object->volume->cache;
+
+	ret = xa_alloc_cyclic(&cache->ondemand_ids, &object_id, NULL,
+			      XA_LIMIT(1, INT_MAX),
+			      &cache->ondemand_id_next, GFP_KERNEL);
+	if (ret < 0)
+		goto err;
+
+	fd = get_unused_fd_flags(O_WRONLY);
+	if (fd < 0) {
+		ret = fd;
+		goto err_free_id;
+	}
+
+	file = anon_inode_getfile("[cachefiles]", &cachefiles_ondemand_fd_fops,
+				  object, O_WRONLY);
+	if (IS_ERR(file)) {
+		ret = PTR_ERR(file);
+		goto err_put_fd;
+	}
+
+	file->f_mode |= FMODE_PWRITE | FMODE_LSEEK;
+	fd_install(fd, file);
+
+	load = (void *)req->msg.data;
+	load->fd = fd;
+	req->msg.object_id = object_id;
+	object->ondemand_id = object_id;
+	return 0;
+
+err_put_fd:
+	put_unused_fd(fd);
+err_free_id:
+	xa_erase(&cache->ondemand_ids, object_id);
+err:
+	cachefiles_put_object(object, cachefiles_obj_put_ondemand_fd);
+	return ret;
+}
+
+ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
+					char __user *_buffer, size_t buflen)
+{
+	struct cachefiles_req *req;
+	struct cachefiles_msg *msg;
+	unsigned long id = 0;
+	size_t n;
+	int ret = 0;
+	XA_STATE(xas, &cache->reqs, 0);
+
+	/*
+	 * Search for a request that has not ever been processed, to prevent
+	 * requests from being processed repeatedly.
+	 */
+	xa_lock(&cache->reqs);
+	req = xas_find_marked(&xas, UINT_MAX, CACHEFILES_REQ_NEW);
+	if (!req) {
+		xa_unlock(&cache->reqs);
+		return 0;
+	}
+
+	msg = &req->msg;
+	n = msg->len;
+
+	if (n > buflen) {
+		xa_unlock(&cache->reqs);
+		return -EMSGSIZE;
+	}
+
+	xas_clear_mark(&xas, CACHEFILES_REQ_NEW);
+	xa_unlock(&cache->reqs);
+
+	id = xas.xa_index;
+	msg->msg_id = id;
+
+	if (msg->opcode == CACHEFILES_OP_OPEN) {
+		ret = cachefiles_ondemand_get_fd(req);
+		if (ret)
+			goto error;
+	}
+
+	if (copy_to_user(_buffer, msg, n) != 0) {
+		ret = -EFAULT;
+		goto err_put_fd;
+	}
+
+	return n;
+
+err_put_fd:
+	if (msg->opcode == CACHEFILES_OP_OPEN)
+		close_fd(((struct cachefiles_open *)msg->data)->fd);
+error:
+	xa_erase(&cache->reqs, id);
+	req->error = ret;
+	complete(&req->done);
+	return ret;
+}
+
+typedef int (*init_req_fn)(struct cachefiles_req *req, void *private);
+
+static int cachefiles_ondemand_send_req(struct cachefiles_object *object,
+					enum cachefiles_opcode opcode,
+					size_t data_len,
+					init_req_fn init_req,
+					void *private)
+{
+	struct cachefiles_cache *cache = object->volume->cache;
+	struct cachefiles_req *req;
+	XA_STATE(xas, &cache->reqs, 0);
+	int ret;
+
+	if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags))
+		return 0;
+
+	if (test_bit(CACHEFILES_DEAD, &cache->flags))
+		return -EIO;
+
+	req = kzalloc(sizeof(*req) + data_len, GFP_KERNEL);
+	if (!req)
+		return -ENOMEM;
+
+	req->object = object;
+	init_completion(&req->done);
+	req->msg.opcode = opcode;
+	req->msg.len = sizeof(struct cachefiles_msg) + data_len;
+
+	ret = init_req(req, private);
+	if (ret)
+		goto out;
+
+	do {
+		/*
+		 * Stop enqueuing the request when daemon is dying. So we need
+		 * to 1) check cache state, and 2) enqueue request if cache is
+		 * alive.
+		 *
+		 * These two ops need to be atomic as a whole. Otherwise request
+		 * may be enqueued after xarray has been flushed, in which case
+		 * the orphan request will never be completed and thus netfs
+		 * will hang there forever.
+		 */
+		xas_lock(&xas);
+
+		/* recheck dead state with lock held */
+		if (test_bit(CACHEFILES_DEAD, &cache->flags)) {
+			xas_unlock(&xas);
+			ret = -EIO;
+			goto out;
+		}
+
+		xas.xa_index = 0;
+		xas_find_marked(&xas, UINT_MAX, XA_FREE_MARK);
+		if (xas.xa_node == XAS_RESTART)
+			xas_set_err(&xas, -EBUSY);
+		xas_store(&xas, req);
+		xas_clear_mark(&xas, XA_FREE_MARK);
+		xas_set_mark(&xas, CACHEFILES_REQ_NEW);
+		xas_unlock(&xas);
+	} while (xas_nomem(&xas, GFP_KERNEL));
+
+	ret = xas_error(&xas);
+	if (ret)
+		goto out;
+
+	wake_up_all(&cache->daemon_pollwq);
+	wait_for_completion(&req->done);
+	ret = req->error;
+out:
+	kfree(req);
+	return ret;
+}
+
+static int cachefiles_ondemand_init_open_req(struct cachefiles_req *req,
+					     void *private)
+{
+	struct cachefiles_object *object = req->object;
+	struct fscache_cookie *cookie = object->cookie;
+	struct fscache_volume *volume = object->volume->vcookie;
+	struct cachefiles_open *load = (void *)req->msg.data;
+	size_t volume_key_size, cookie_key_size;
+	void *volume_key, *cookie_key;
+
+	/*
+	 * Volume key is a NUL-terminated string. key[0] stores strlen() of the
+	 * string, followed by the content of the string (excluding '\0').
+	 */
+	volume_key_size = volume->key[0] + 1;
+	volume_key = volume->key + 1;
+
+	/* Cookie key is binary data, which is netfs specific. */
+	cookie_key_size = cookie->key_len;
+	cookie_key = fscache_get_key(cookie);
+
+	if (!(object->cookie->advice & FSCACHE_ADV_WANT_CACHE_SIZE)) {
+		pr_err("WANT_CACHE_SIZE is needed for on-demand mode\n");
+		return -EINVAL;
+	}
+
+	load->volume_key_size = volume_key_size;
+	load->cookie_key_size = cookie_key_size;
+	memcpy(load->data, volume_key, volume_key_size);
+	memcpy(load->data + volume_key_size, cookie_key, cookie_key_size);
+
+	return 0;
+}
+
+int cachefiles_ondemand_init_object(struct cachefiles_object *object)
+{
+	struct fscache_cookie *cookie = object->cookie;
+	struct fscache_volume *volume = object->volume->vcookie;
+	size_t volume_key_size, cookie_key_size, data_len;
+
+	/*
+	 * Cachefiles first checks the cache file under the root cache
+	 * directory. If the coherency check fails, it falls back to creating
+	 * a new tmpfile as the cache file. Reuse the previously allocated
+	 * object ID if any.
+	 */
+	if (object->ondemand_id > 0)
+		return 0;
+
+	volume_key_size = volume->key[0] + 1;
+	cookie_key_size = cookie->key_len;
+	data_len = sizeof(struct cachefiles_open) +
+		   volume_key_size + cookie_key_size;
+
+	return cachefiles_ondemand_send_req(object, CACHEFILES_OP_OPEN,
+			data_len, cachefiles_ondemand_init_open_req, NULL);
+}
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index e25539072463..72585c9729a2 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -39,6 +39,7 @@ struct fscache_cookie;
 #define FSCACHE_ADV_SINGLE_CHUNK	0x01 /* The object is a single chunk of data */
 #define FSCACHE_ADV_WRITE_CACHE		0x00 /* Do cache if written to locally */
 #define FSCACHE_ADV_WRITE_NOCACHE	0x02 /* Don't cache if written to locally */
+#define FSCACHE_ADV_WANT_CACHE_SIZE	0x04 /* Retrieve cache size at runtime */
 
 #define FSCACHE_INVAL_DIO_WRITE		0x01 /* Invalidate due to DIO write */
 
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index 311c14a20e70..93df9391bd7f 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -31,6 +31,8 @@ enum cachefiles_obj_ref_trace {
 	cachefiles_obj_see_lookup_failed,
 	cachefiles_obj_see_withdraw_cookie,
 	cachefiles_obj_see_withdrawal,
+	cachefiles_obj_get_ondemand_fd,
+	cachefiles_obj_put_ondemand_fd,
 };
 
 enum fscache_why_object_killed {
diff --git a/include/uapi/linux/cachefiles.h b/include/uapi/linux/cachefiles.h
new file mode 100644
index 000000000000..521f2fe4fe9c
--- /dev/null
+++ b/include/uapi/linux/cachefiles.h
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _LINUX_CACHEFILES_H
+#define _LINUX_CACHEFILES_H
+
+#include <linux/types.h>
+
+/*
+ * Fscache ensures that the maximum length of cookie key is 255. The volume key
+ * is controlled by netfs, and generally no bigger than 255.
+ */
+#define CACHEFILES_MSG_MAX_SIZE	1024
+
+enum cachefiles_opcode {
+	CACHEFILES_OP_OPEN,
+};
+
+/*
+ * Message Header
+ *
+ * @msg_id	a unique ID identifying this message
+ * @opcode	message type, CACHEFILES_OP_*
+ * @len		message length, including message header and following data
+ * @object_id	a unique ID identifying a cache file
+ * @data	message type specific payload
+ */
+struct cachefiles_msg {
+	__u32 msg_id;
+	__u32 opcode;
+	__u32 len;
+	__u32 object_id;
+	__u8  data[];
+};
+
+/*
+ * @data contains the volume_key followed directly by the cookie_key. volume_key
+ * is a NUL-terminated string; @volume_key_size indicates the size of the volume
+ * key in bytes. cookie_key is binary data, which is netfs specific;
+ * @cookie_key_size indicates the size of the cookie key in bytes.
+ *
+ * @fd identifies an anon_fd referring to the cache file.
+ */
+struct cachefiles_open {
+	__u32 volume_key_size;
+	__u32 cookie_key_size;
+	__u32 fd;
+	__u32 flags;
+	__u8  data[];
+};
+
+#endif
-- 
2.27.0



* [PATCH v9 02/21] cachefiles: notify user daemon when looking up cookie
@ 2022-04-15 12:35   ` Jeffle Xu
  0 siblings, 0 replies; 96+ messages in thread
From: Jeffle Xu @ 2022-04-15 12:35 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: gregkh, fannaihao, willy, linux-kernel, tianzichen, joseph.qi,
	zhangjiachen.jaycee, linux-fsdevel, luodaowen.backend, gerry,
	torvalds

Fscache/cachefiles used to serve as a local cache for a remote
networking fs. This patch, along with the following patches, introduces
a new on-demand read mode for cachefiles, which benefits scenarios
where on-demand read semantics are needed, e.g. container image
distribution.

The essential difference between these two modes is that, in the
original mode, when a cache miss occurs, the netfs fetches the data from
the remote server and then writes it to the cache file. In on-demand
read mode, however, fetching the data and writing it to the cache are
delegated to a user daemon.

As the first step, notify the user daemon when looking up a cookie. In
this case, an anonymous fd is sent to the user daemon, through which the
user daemon can write the fetched data to the cache file. Since the user
daemon may move the anonymous fd around, e.g. through dup(), an object
ID uniquely identifying the cache file is also attached.

Also add one advisory flag (FSCACHE_ADV_WANT_CACHE_SIZE) indicating that
the cache file size shall be retrieved at runtime. This helps the
scenario where one cache file contains multiple netfs files, e.g. for
the purpose of deduplication. In this case, the netfs itself does not
know the size of the cache file, so the user daemon has to supply it.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/cachefiles/Kconfig             |  11 +
 fs/cachefiles/Makefile            |   1 +
 fs/cachefiles/daemon.c            |  79 +++++--
 fs/cachefiles/internal.h          |  47 ++++
 fs/cachefiles/namei.c             |  16 +-
 fs/cachefiles/ondemand.c          | 371 ++++++++++++++++++++++++++++++
 include/linux/fscache.h           |   1 +
 include/trace/events/cachefiles.h |   2 +
 include/uapi/linux/cachefiles.h   |  50 ++++
 9 files changed, 563 insertions(+), 15 deletions(-)
 create mode 100644 fs/cachefiles/ondemand.c
 create mode 100644 include/uapi/linux/cachefiles.h

diff --git a/fs/cachefiles/Kconfig b/fs/cachefiles/Kconfig
index 719faeeda168..67371cb1eb08 100644
--- a/fs/cachefiles/Kconfig
+++ b/fs/cachefiles/Kconfig
@@ -26,3 +26,14 @@ config CACHEFILES_ERROR_INJECTION
 	help
 	  This permits error injection to be enabled in cachefiles whilst a
 	  cache is in service.
+
+config CACHEFILES_ONDEMAND
+	bool "Support for on-demand read"
+	depends on CACHEFILES
+	default n
+	help
+	  This permits on-demand read mode of cachefiles.  In this mode, on a
+	  cache miss, the cachefiles backend, rather than the netfs, is
+	  responsible for fetching data, e.g. through a user daemon.
+
+	  If unsure, say N.
diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
index 16d811f1a2fa..c37a7a9af10b 100644
--- a/fs/cachefiles/Makefile
+++ b/fs/cachefiles/Makefile
@@ -16,5 +16,6 @@ cachefiles-y := \
 	xattr.o
 
 cachefiles-$(CONFIG_CACHEFILES_ERROR_INJECTION) += error_inject.o
+cachefiles-$(CONFIG_CACHEFILES_ONDEMAND) += ondemand.o
 
 obj-$(CONFIG_CACHEFILES) := cachefiles.o
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 7ac04ee2c0a0..69ca22aa6abf 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -75,6 +75,9 @@ static const struct cachefiles_daemon_cmd cachefiles_daemon_cmds[] = {
 	{ "inuse",	cachefiles_daemon_inuse		},
 	{ "secctx",	cachefiles_daemon_secctx	},
 	{ "tag",	cachefiles_daemon_tag		},
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+	{ "copen",	cachefiles_ondemand_copen	},
+#endif
 	{ "",		NULL				}
 };
 
@@ -108,6 +111,10 @@ static int cachefiles_daemon_open(struct inode *inode, struct file *file)
 	INIT_LIST_HEAD(&cache->volumes);
 	INIT_LIST_HEAD(&cache->object_list);
 	spin_lock_init(&cache->object_list_lock);
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+	xa_init_flags(&cache->reqs, XA_FLAGS_ALLOC);
+	xa_init_flags(&cache->ondemand_ids, XA_FLAGS_ALLOC1);
+#endif
 
 	/* set default caching limits
 	 * - limit at 1% free space and/or free files
@@ -126,6 +133,30 @@ static int cachefiles_daemon_open(struct inode *inode, struct file *file)
 	return 0;
 }
 
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+static void cachefiles_flush_reqs(struct cachefiles_cache *cache)
+{
+	struct xarray *xa = &cache->reqs;
+	struct cachefiles_req *req;
+	unsigned long index;
+
+	/*
+	 * 1) Cache has been marked as dead state, and then 2) flush all
+	 * pending requests in @reqs xarray. The barrier inside set_bit()
+	 * will ensure that above two ops won't be reordered.
+	 */
+	xa_lock(xa);
+	xa_for_each(xa, index, req) {
+		req->error = -EIO;
+		complete(&req->done);
+	}
+	xa_unlock(xa);
+
+	xa_destroy(&cache->reqs);
+	xa_destroy(&cache->ondemand_ids);
+}
+#endif
+
 /*
  * Release a cache.
  */
@@ -139,6 +170,9 @@ static int cachefiles_daemon_release(struct inode *inode, struct file *file)
 
 	set_bit(CACHEFILES_DEAD, &cache->flags);
 
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+	cachefiles_flush_reqs(cache);
+#endif
 	cachefiles_daemon_unbind(cache);
 
 	/* clean up the control file interface */
@@ -152,23 +186,14 @@ static int cachefiles_daemon_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
-/*
- * Read the cache state.
- */
-static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer,
-				      size_t buflen, loff_t *pos)
+static ssize_t cachefiles_do_daemon_read(struct cachefiles_cache *cache,
+					 char __user *_buffer, size_t buflen)
 {
-	struct cachefiles_cache *cache = file->private_data;
 	unsigned long long b_released;
 	unsigned f_released;
 	char buffer[256];
 	int n;
 
-	//_enter(",,%zu,", buflen);
-
-	if (!test_bit(CACHEFILES_READY, &cache->flags))
-		return 0;
-
 	/* check how much space the cache has */
 	cachefiles_has_space(cache, 0, 0, cachefiles_has_space_check);
 
@@ -206,6 +231,26 @@ static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer,
 	return n;
 }
 
+/*
+ * Read the cache state.
+ */
+static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer,
+				      size_t buflen, loff_t *pos)
+{
+	struct cachefiles_cache *cache = file->private_data;
+
+	//_enter(",,%zu,", buflen);
+
+	if (!test_bit(CACHEFILES_READY, &cache->flags))
+		return 0;
+
+	if (IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) &&
+	    test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags))
+		return cachefiles_ondemand_daemon_read(cache, _buffer, buflen);
+	else
+		return cachefiles_do_daemon_read(cache, _buffer, buflen);
+}
+
 /*
  * Take a command from cachefilesd, parse it and act on it.
  */
@@ -297,8 +342,16 @@ static __poll_t cachefiles_daemon_poll(struct file *file,
 	poll_wait(file, &cache->daemon_pollwq, poll);
 	mask = 0;
 
-	if (test_bit(CACHEFILES_STATE_CHANGED, &cache->flags))
-		mask |= EPOLLIN;
+	if (IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) &&
+	    test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags)) {
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+		if (!xa_empty(&cache->reqs))
+			mask |= EPOLLIN;
+#endif
+	} else {
+		if (test_bit(CACHEFILES_STATE_CHANGED, &cache->flags))
+			mask |= EPOLLIN;
+	}
 
 	if (test_bit(CACHEFILES_CULLING, &cache->flags))
 		mask |= EPOLLOUT;
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index e80673d0ab97..8ebe238af20b 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -15,6 +15,8 @@
 #include <linux/fscache-cache.h>
 #include <linux/cred.h>
 #include <linux/security.h>
+#include <linux/xarray.h>
+#include <linux/cachefiles.h>
 
 #define CACHEFILES_DIO_BLOCK_SIZE 4096
 
@@ -58,8 +60,13 @@ struct cachefiles_object {
 	enum cachefiles_content		content_info:8;	/* Info about content presence */
 	unsigned long			flags;
 #define CACHEFILES_OBJECT_USING_TMPFILE	0		/* Have an unlinked tmpfile */
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+	int				ondemand_id;
+#endif
 };
 
+#define CACHEFILES_ONDEMAND_ID_CLOSED	-1
+
 /*
  * Cache files cache definition
  */
@@ -98,11 +105,26 @@ struct cachefiles_cache {
 #define CACHEFILES_DEAD			1	/* T if cache dead */
 #define CACHEFILES_CULLING		2	/* T if cull engaged */
 #define CACHEFILES_STATE_CHANGED	3	/* T if state changed (poll trigger) */
+#define CACHEFILES_ONDEMAND_MODE	4	/* T if in on-demand read mode */
 	char				*rootdirname;	/* name of cache root directory */
 	char				*secctx;	/* LSM security context */
 	char				*tag;		/* cache binding tag */
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+	struct xarray			reqs;		/* xarray of pending on-demand requests */
+	struct xarray			ondemand_ids;	/* xarray for ondemand_id allocation */
+	u32				ondemand_id_next;
+#endif
+};
+
+struct cachefiles_req {
+	struct cachefiles_object *object;
+	struct completion done;
+	int error;
+	struct cachefiles_msg msg;
 };
 
+#define CACHEFILES_REQ_NEW	XA_MARK_1
+
 #include <trace/events/cachefiles.h>
 
 static inline
@@ -250,6 +272,31 @@ extern struct file *cachefiles_create_tmpfile(struct cachefiles_object *object);
 extern bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
 				      struct cachefiles_object *object);
 
+/*
+ * ondemand.c
+ */
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+extern ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
+					char __user *_buffer, size_t buflen);
+
+extern int cachefiles_ondemand_copen(struct cachefiles_cache *cache,
+				     char *args);
+
+extern int cachefiles_ondemand_init_object(struct cachefiles_object *object);
+
+#else
+static inline ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
+					char __user *_buffer, size_t buflen)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int cachefiles_ondemand_init_object(struct cachefiles_object *object)
+{
+	return 0;
+}
+#endif
+
 /*
  * security.c
  */
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index ca9f3e4ec4b3..facf2ebe464b 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -452,10 +452,9 @@ struct file *cachefiles_create_tmpfile(struct cachefiles_object *object)
 	struct dentry *fan = volume->fanout[(u8)object->cookie->key_hash];
 	struct file *file;
 	struct path path;
-	uint64_t ni_size = object->cookie->object_size;
+	uint64_t ni_size;
 	long ret;
 
-	ni_size = round_up(ni_size, CACHEFILES_DIO_BLOCK_SIZE);
 
 	cachefiles_begin_secure(cache, &saved_cred);
 
@@ -481,6 +480,15 @@ struct file *cachefiles_create_tmpfile(struct cachefiles_object *object)
 		goto out_dput;
 	}
 
+	ret = cachefiles_ondemand_init_object(object);
+	if (ret < 0) {
+		file = ERR_PTR(ret);
+		goto out_unuse;
+	}
+
+	ni_size = object->cookie->object_size;
+	ni_size = round_up(ni_size, CACHEFILES_DIO_BLOCK_SIZE);
+
 	if (ni_size > 0) {
 		trace_cachefiles_trunc(object, d_backing_inode(path.dentry), 0, ni_size,
 				       cachefiles_trunc_expand_tmpfile);
@@ -586,6 +594,10 @@ static bool cachefiles_open_file(struct cachefiles_object *object,
 	}
 	_debug("file -> %pd positive", dentry);
 
+	ret = cachefiles_ondemand_init_object(object);
+	if (ret < 0)
+		goto error_fput;
+
 	ret = cachefiles_check_auxdata(object, file);
 	if (ret < 0)
 		goto check_failed;
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
new file mode 100644
index 000000000000..890cd3ecc2f0
--- /dev/null
+++ b/fs/cachefiles/ondemand.c
@@ -0,0 +1,371 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#include <linux/fdtable.h>
+#include <linux/anon_inodes.h>
+#include <linux/uio.h>
+#include "internal.h"
+
+static int cachefiles_ondemand_fd_release(struct inode *inode,
+					  struct file *file)
+{
+	struct cachefiles_object *object = file->private_data;
+	struct cachefiles_cache *cache = object->volume->cache;
+	int object_id = object->ondemand_id;
+
+	object->ondemand_id = CACHEFILES_ONDEMAND_ID_CLOSED;
+	xa_erase(&cache->ondemand_ids, object_id);
+	cachefiles_put_object(object, cachefiles_obj_put_ondemand_fd);
+	return 0;
+}
+
+static ssize_t cachefiles_ondemand_fd_write_iter(struct kiocb *kiocb,
+						 struct iov_iter *iter)
+{
+	struct cachefiles_object *object = kiocb->ki_filp->private_data;
+	struct cachefiles_cache *cache = object->volume->cache;
+	struct file *file = object->file;
+	size_t len = iter->count;
+	loff_t pos = kiocb->ki_pos;
+	const struct cred *saved_cred;
+	int ret;
+
+	if (!file)
+		return -ENOBUFS;
+
+	cachefiles_begin_secure(cache, &saved_cred);
+	ret = __cachefiles_prepare_write(object, file, &pos, &len, true);
+	cachefiles_end_secure(cache, saved_cred);
+	if (ret < 0)
+		return ret;
+
+	ret = __cachefiles_write(object, file, pos, iter, NULL, NULL);
+	if (!ret)
+		ret = len;
+
+	return ret;
+}
+
+static loff_t cachefiles_ondemand_fd_llseek(struct file *filp, loff_t pos,
+					    int whence)
+{
+	struct cachefiles_object *object = filp->private_data;
+	struct file *file = object->file;
+
+	if (!file)
+		return -ENOBUFS;
+
+	return vfs_llseek(file, pos, whence);
+}
+
+static const struct file_operations cachefiles_ondemand_fd_fops = {
+	.owner		= THIS_MODULE,
+	.release	= cachefiles_ondemand_fd_release,
+	.write_iter	= cachefiles_ondemand_fd_write_iter,
+	.llseek		= cachefiles_ondemand_fd_llseek,
+};
+
+/*
+ * OPEN request Completion (copen)
+ * - command: "copen <id>,<cache_size>"
+ *   <cache_size> represents the object size if >=0, error code if negative
+ */
+int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args)
+{
+	struct cachefiles_req *req;
+	struct fscache_cookie *cookie;
+	char *pid, *psize;
+	unsigned long id;
+	long size;
+	int ret;
+
+	if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags))
+		return -EOPNOTSUPP;
+
+	if (!*args) {
+		pr_err("Empty id specified\n");
+		return -EINVAL;
+	}
+
+	pid = args;
+	psize = strchr(args, ',');
+	if (!psize) {
+		pr_err("Cache size is not specified\n");
+		return -EINVAL;
+	}
+
+	*psize = 0;
+	psize++;
+
+	ret = kstrtoul(pid, 0, &id);
+	if (ret)
+		return ret;
+
+	req = xa_erase(&cache->reqs, id);
+	if (!req)
+		return -EINVAL;
+
+	/* fail OPEN request if copen format is invalid */
+	ret = kstrtol(psize, 0, &size);
+	if (ret) {
+		req->error = ret;
+		goto out;
+	}
+
+	/* fail OPEN request if daemon reports an error */
+	if (size < 0) {
+		if (!IS_ERR_VALUE(size))
+			size = -EINVAL;
+		req->error = size;
+		goto out;
+	}
+
+	cookie = req->object->cookie;
+	cookie->object_size = size;
+	if (size)
+		clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
+	else
+		set_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
+
+out:
+	complete(&req->done);
+	return ret;
+}
+
+static int cachefiles_ondemand_get_fd(struct cachefiles_req *req)
+{
+	struct cachefiles_object *object;
+	struct cachefiles_cache *cache;
+	struct cachefiles_open *load;
+	struct file *file;
+	u32 object_id;
+	int ret, fd;
+
+	object = cachefiles_grab_object(req->object,
+			cachefiles_obj_get_ondemand_fd);
+	cache = object->volume->cache;
+
+	ret = xa_alloc_cyclic(&cache->ondemand_ids, &object_id, NULL,
+			      XA_LIMIT(1, INT_MAX),
+			      &cache->ondemand_id_next, GFP_KERNEL);
+	if (ret < 0)
+		goto err;
+
+	fd = get_unused_fd_flags(O_WRONLY);
+	if (fd < 0) {
+		ret = fd;
+		goto err_free_id;
+	}
+
+	file = anon_inode_getfile("[cachefiles]", &cachefiles_ondemand_fd_fops,
+				  object, O_WRONLY);
+	if (IS_ERR(file)) {
+		ret = PTR_ERR(file);
+		goto err_put_fd;
+	}
+
+	file->f_mode |= FMODE_PWRITE | FMODE_LSEEK;
+	fd_install(fd, file);
+
+	load = (void *)req->msg.data;
+	load->fd = fd;
+	req->msg.object_id = object_id;
+	object->ondemand_id = object_id;
+	return 0;
+
+err_put_fd:
+	put_unused_fd(fd);
+err_free_id:
+	xa_erase(&cache->ondemand_ids, object_id);
+err:
+	cachefiles_put_object(object, cachefiles_obj_put_ondemand_fd);
+	return ret;
+}
+
+ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
+					char __user *_buffer, size_t buflen)
+{
+	struct cachefiles_req *req;
+	struct cachefiles_msg *msg;
+	unsigned long id = 0;
+	size_t n;
+	int ret = 0;
+	XA_STATE(xas, &cache->reqs, 0);
+
+	/*
+	 * Search for a request that has not ever been processed, to prevent
+	 * requests from being processed repeatedly.
+	 */
+	xa_lock(&cache->reqs);
+	req = xas_find_marked(&xas, UINT_MAX, CACHEFILES_REQ_NEW);
+	if (!req) {
+		xa_unlock(&cache->reqs);
+		return 0;
+	}
+
+	msg = &req->msg;
+	n = msg->len;
+
+	if (n > buflen) {
+		xa_unlock(&cache->reqs);
+		return -EMSGSIZE;
+	}
+
+	xas_clear_mark(&xas, CACHEFILES_REQ_NEW);
+	xa_unlock(&cache->reqs);
+
+	id = xas.xa_index;
+	msg->msg_id = id;
+
+	if (msg->opcode == CACHEFILES_OP_OPEN) {
+		ret = cachefiles_ondemand_get_fd(req);
+		if (ret)
+			goto error;
+	}
+
+	if (copy_to_user(_buffer, msg, n) != 0) {
+		ret = -EFAULT;
+		goto err_put_fd;
+	}
+
+	return n;
+
+err_put_fd:
+	if (msg->opcode == CACHEFILES_OP_OPEN)
+		close_fd(((struct cachefiles_open *)msg->data)->fd);
+error:
+	xa_erase(&cache->reqs, id);
+	req->error = ret;
+	complete(&req->done);
+	return ret;
+}
+
+typedef int (*init_req_fn)(struct cachefiles_req *req, void *private);
+
+static int cachefiles_ondemand_send_req(struct cachefiles_object *object,
+					enum cachefiles_opcode opcode,
+					size_t data_len,
+					init_req_fn init_req,
+					void *private)
+{
+	struct cachefiles_cache *cache = object->volume->cache;
+	struct cachefiles_req *req;
+	XA_STATE(xas, &cache->reqs, 0);
+	int ret;
+
+	if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags))
+		return 0;
+
+	if (test_bit(CACHEFILES_DEAD, &cache->flags))
+		return -EIO;
+
+	req = kzalloc(sizeof(*req) + data_len, GFP_KERNEL);
+	if (!req)
+		return -ENOMEM;
+
+	req->object = object;
+	init_completion(&req->done);
+	req->msg.opcode = opcode;
+	req->msg.len = sizeof(struct cachefiles_msg) + data_len;
+
+	ret = init_req(req, private);
+	if (ret)
+		goto out;
+
+	do {
+		/*
+		 * Stop enqueuing the request when daemon is dying. So we need
+		 * to 1) check cache state, and 2) enqueue request if cache is
+		 * alive.
+		 *
+		 * These two ops need to be atomic as a whole. Otherwise request
+		 * may be enqueued after xarray has been flushed, in which case
+		 * the orphan request will never be completed and thus netfs
+		 * will hang there forever.
+		 */
+		xas_lock(&xas);
+
+		/* recheck dead state with lock held */
+		if (test_bit(CACHEFILES_DEAD, &cache->flags)) {
+			xas_unlock(&xas);
+			ret = -EIO;
+			goto out;
+		}
+
+		xas.xa_index = 0;
+		xas_find_marked(&xas, UINT_MAX, XA_FREE_MARK);
+		if (xas.xa_node == XAS_RESTART)
+			xas_set_err(&xas, -EBUSY);
+		xas_store(&xas, req);
+		xas_clear_mark(&xas, XA_FREE_MARK);
+		xas_set_mark(&xas, CACHEFILES_REQ_NEW);
+		xas_unlock(&xas);
+	} while (xas_nomem(&xas, GFP_KERNEL));
+
+	ret = xas_error(&xas);
+	if (ret)
+		goto out;
+
+	wake_up_all(&cache->daemon_pollwq);
+	wait_for_completion(&req->done);
+	ret = req->error;
+out:
+	kfree(req);
+	return ret;
+}
+
+static int cachefiles_ondemand_init_open_req(struct cachefiles_req *req,
+					     void *private)
+{
+	struct cachefiles_object *object = req->object;
+	struct fscache_cookie *cookie = object->cookie;
+	struct fscache_volume *volume = object->volume->vcookie;
+	struct cachefiles_open *load = (void *)req->msg.data;
+	size_t volume_key_size, cookie_key_size;
+	void *volume_key, *cookie_key;
+
+	/*
+	 * Volume key is a NUL-terminated string. key[0] stores strlen() of the
+	 * string, followed by the content of the string (excluding '\0').
+	 */
+	volume_key_size = volume->key[0] + 1;
+	volume_key = volume->key + 1;
+
+	/* Cookie key is binary data, which is netfs specific. */
+	cookie_key_size = cookie->key_len;
+	cookie_key = fscache_get_key(cookie);
+
+	if (!(object->cookie->advice & FSCACHE_ADV_WANT_CACHE_SIZE)) {
+		pr_err("WANT_CACHE_SIZE is needed for on-demand mode\n");
+		return -EINVAL;
+	}
+
+	load->volume_key_size = volume_key_size;
+	load->cookie_key_size = cookie_key_size;
+	memcpy(load->data, volume_key, volume_key_size);
+	memcpy(load->data + volume_key_size, cookie_key, cookie_key_size);
+
+	return 0;
+}
+
+int cachefiles_ondemand_init_object(struct cachefiles_object *object)
+{
+	struct fscache_cookie *cookie = object->cookie;
+	struct fscache_volume *volume = object->volume->vcookie;
+	size_t volume_key_size, cookie_key_size, data_len;
+
+	/*
+	 * Cachefiles will firstly check cache file under the root cache
+	 * directory. If coherency check failed, it will fallback to creating a
+	 * new tmpfile as the cache file. Reuse the previously allocated object
+	 * ID if any.
+	 */
+	if (object->ondemand_id > 0)
+		return 0;
+
+	volume_key_size = volume->key[0] + 1;
+	cookie_key_size = cookie->key_len;
+	data_len = sizeof(struct cachefiles_open) +
+		   volume_key_size + cookie_key_size;
+
+	return cachefiles_ondemand_send_req(object, CACHEFILES_OP_OPEN,
+			data_len, cachefiles_ondemand_init_open_req, NULL);
+}
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index e25539072463..72585c9729a2 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -39,6 +39,7 @@ struct fscache_cookie;
 #define FSCACHE_ADV_SINGLE_CHUNK	0x01 /* The object is a single chunk of data */
 #define FSCACHE_ADV_WRITE_CACHE		0x00 /* Do cache if written to locally */
 #define FSCACHE_ADV_WRITE_NOCACHE	0x02 /* Don't cache if written to locally */
+#define FSCACHE_ADV_WANT_CACHE_SIZE	0x04 /* Retrieve cache size at runtime */
 
 #define FSCACHE_INVAL_DIO_WRITE		0x01 /* Invalidate due to DIO write */
 
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index 311c14a20e70..93df9391bd7f 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -31,6 +31,8 @@ enum cachefiles_obj_ref_trace {
 	cachefiles_obj_see_lookup_failed,
 	cachefiles_obj_see_withdraw_cookie,
 	cachefiles_obj_see_withdrawal,
+	cachefiles_obj_get_ondemand_fd,
+	cachefiles_obj_put_ondemand_fd,
 };
 
 enum fscache_why_object_killed {
diff --git a/include/uapi/linux/cachefiles.h b/include/uapi/linux/cachefiles.h
new file mode 100644
index 000000000000..521f2fe4fe9c
--- /dev/null
+++ b/include/uapi/linux/cachefiles.h
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _LINUX_CACHEFILES_H
+#define _LINUX_CACHEFILES_H
+
+#include <linux/types.h>
+
+/*
+ * Fscache ensures that the maximum length of cookie key is 255. The volume key
+ * is controlled by netfs, and generally no bigger than 255.
+ */
+#define CACHEFILES_MSG_MAX_SIZE	1024
+
+enum cachefiles_opcode {
+	CACHEFILES_OP_OPEN,
+};
+
+/*
+ * Message Header
+ *
+ * @msg_id	a unique ID identifying this message
+ * @opcode	message type, CACHEFILES_OP_*
+ * @len		message length, including message header and following data
+ * @object_id	a unique ID identifying a cache file
+ * @data	message type specific payload
+ */
+struct cachefiles_msg {
+	__u32 msg_id;
+	__u32 opcode;
+	__u32 len;
+	__u32 object_id;
+	__u8  data[];
+};
+
+/*
+ * @data contains the volume_key followed directly by the cookie_key. volume_key
+ * is a NUL-terminated string; @volume_key_size indicates the size of the volume
+ * key in bytes. cookie_key is binary data, which is netfs specific;
+ * @cookie_key_size indicates the size of the cookie key in bytes.
+ *
+ * @fd identifies an anon_fd referring to the cache file.
+ */
+struct cachefiles_open {
+	__u32 volume_key_size;
+	__u32 cookie_key_size;
+	__u32 fd;
+	__u32 flags;
+	__u8  data[];
+};
+
+#endif
-- 
2.27.0
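
Not part of the patch: a minimal userspace sketch of how a daemon might
consume the OPEN message laid out in include/uapi/linux/cachefiles.h
above and build the "copen <id>,<cache_size>" reply. The two structs
mirror the uapi header; struct open_req, parse_open() and format_copen()
are hypothetical names used only for illustration, and the validation
shown is an assumption about what a careful daemon would check.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Userspace copies of the uapi structs added by this patch. */
struct cachefiles_msg {
	uint32_t msg_id;
	uint32_t opcode;
	uint32_t len;
	uint32_t object_id;
	uint8_t  data[];
};

struct cachefiles_open {
	uint32_t volume_key_size;
	uint32_t cookie_key_size;
	uint32_t fd;
	uint32_t flags;
	uint8_t  data[];
};

enum { CACHEFILES_OP_OPEN = 0 };

/* Hypothetical result type; not part of the uapi. */
struct open_req {
	uint32_t msg_id;
	uint32_t object_id;
	uint32_t fd;
	const char *volume_key;		/* NUL-terminated, points into buf */
	const uint8_t *cookie_key;	/* binary, points into buf */
	uint32_t cookie_key_size;
};

/* Validate one OPEN message and locate both keys; 0 on success, -1 if malformed. */
static int parse_open(const uint8_t *buf, size_t buflen, struct open_req *out)
{
	struct cachefiles_msg msg;
	struct cachefiles_open load;
	const size_t hdr = sizeof(msg) + sizeof(load);
	const uint8_t *keys;

	if (buflen < hdr)
		return -1;
	/* memcpy avoids assuming that buf is suitably aligned. */
	memcpy(&msg, buf, sizeof(msg));
	memcpy(&load, buf + sizeof(msg), sizeof(load));

	if (msg.opcode != CACHEFILES_OP_OPEN || msg.len < hdr || msg.len > buflen)
		return -1;
	/* Both keys must fit in the payload that follows the two headers. */
	if ((uint64_t)load.volume_key_size + load.cookie_key_size > msg.len - hdr)
		return -1;
	keys = buf + hdr;
	/* volume_key_size counts the trailing NUL, so it is at least 1. */
	if (load.volume_key_size == 0 || keys[load.volume_key_size - 1] != '\0')
		return -1;

	out->msg_id = msg.msg_id;
	out->object_id = msg.object_id;
	out->fd = load.fd;
	out->volume_key = (const char *)keys;
	out->cookie_key = keys + load.volume_key_size;
	out->cookie_key_size = load.cookie_key_size;
	return 0;
}

/*
 * Build the reply command. A negative size reports an error code to the
 * kernel. Returns the string length, or -1 if cmd is too small.
 */
static int format_copen(char *cmd, size_t cap, uint32_t msg_id, long long size)
{
	int n;

	assert(cmd != NULL && cap > 0);
	n = snprintf(cmd, cap, "copen %u,%lld", (unsigned int)msg_id, size);
	return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

The daemon would write the formatted command to the fd of the device
node; the anonymous fd carried in the OPEN message is only used for
writing data into the cache file.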



* [PATCH v9 03/21] cachefiles: unbind cachefiles gracefully in on-demand mode
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-15 12:35   ` Jeffle Xu
  -1 siblings, 0 replies; 96+ messages in thread
From: Jeffle Xu @ 2022-04-15 12:35 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend,
	tianzichen, fannaihao, zhangjiachen.jaycee

Add a refcount to avoid a deadlock in on-demand read mode. On-demand
read mode pins the corresponding cachefiles object for each anonymous
fd, and the object is unpinned when the anonymous fd is closed. When the
user daemon exits and the fd of the "/dev/cachefiles" device node is
closed, the release path waits for all cachefiles objects to be
withdrawn. If any anonymous fd is closed after the fd of the device
node, the user daemon hangs forever, waiting for all objects to be
withdrawn.

To fix this, add a refcount that tracks whether any object is still
pinned by an anonymous fd. The cachefiles cache is unbound and withdrawn
when the refcount drops to 0. This does not change the behaviour of the
original mode, in which the cachefiles cache is unbound and withdrawn as
soon as the fd of the device node is closed. Besides, kref_get() is
adequate and kref_get_unless_zero() is not needed here, since no more
anonymous fds are created once the .release() callback of the device
node fd has been called.
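
Not part of the patch: a single-threaded userspace model of the pin
count described above, as a sketch only. struct cache_model and its
helpers are hypothetical stand-ins; the real code uses struct kref,
which is atomic, and runs cachefiles_daemon_unbind() where the model
merely sets a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct cachefiles_cache; only the count matters. */
struct cache_model {
	int unbind_pincount;	/* models the kref */
	bool unbound;		/* set where the unbind would happen */
};

/* Models kref_init() in the device node open path: the count starts at 1. */
static void cache_open(struct cache_model *c)
{
	c->unbind_pincount = 1;
	c->unbound = false;
}

/* Models kref_get(): taken once per anonymous fd handed to the daemon. */
static void cache_get(struct cache_model *c)
{
	assert(c->unbind_pincount > 0);
	c->unbind_pincount++;
}

/*
 * Models kref_put(): called when the device node fd is released and
 * when each anonymous fd is released. Only the last put unbinds.
 */
static void cache_put(struct cache_model *c)
{
	assert(c->unbind_pincount > 0);
	if (--c->unbind_pincount == 0)
		c->unbound = true;
}
```

With two anonymous fds outstanding, releasing the device node fd first
leaves the count at 2, so nothing waits on objects that are still
pinned; the unbind happens on whichever put comes last.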

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/cachefiles/daemon.c   | 24 +++++++++++++++++++++---
 fs/cachefiles/internal.h |  3 +++
 fs/cachefiles/ondemand.c |  3 +++
 3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 69ca22aa6abf..2e946e4eb65a 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -111,6 +111,7 @@ static int cachefiles_daemon_open(struct inode *inode, struct file *file)
 	INIT_LIST_HEAD(&cache->volumes);
 	INIT_LIST_HEAD(&cache->object_list);
 	spin_lock_init(&cache->object_list_lock);
+	kref_init(&cache->unbind_pincount);
 #ifdef CONFIG_CACHEFILES_ONDEMAND
 	xa_init_flags(&cache->reqs, XA_FLAGS_ALLOC);
 	xa_init_flags(&cache->ondemand_ids, XA_FLAGS_ALLOC1);
@@ -157,6 +158,25 @@ static void cachefiles_flush_reqs(struct cachefiles_cache *cache)
 }
 #endif
 
+static void cachefiles_release_cache(struct kref *kref)
+{
+	struct cachefiles_cache *cache;
+
+	cache = container_of(kref, struct cachefiles_cache, unbind_pincount);
+	cachefiles_daemon_unbind(cache);
+	kfree(cache);
+}
+
+void cachefiles_put_unbind_pincount(struct cachefiles_cache *cache)
+{
+	kref_put(&cache->unbind_pincount, cachefiles_release_cache);
+}
+
+void cachefiles_get_unbind_pincount(struct cachefiles_cache *cache)
+{
+	kref_get(&cache->unbind_pincount);
+}
+
 /*
  * Release a cache.
  */
@@ -173,14 +193,12 @@ static int cachefiles_daemon_release(struct inode *inode, struct file *file)
 #ifdef CONFIG_CACHEFILES_ONDEMAND
 	cachefiles_flush_reqs(cache);
 #endif
-	cachefiles_daemon_unbind(cache);
-
 	/* clean up the control file interface */
 	cache->cachefilesd = NULL;
 	file->private_data = NULL;
 	cachefiles_open = 0;
 
-	kfree(cache);
+	cachefiles_put_unbind_pincount(cache);
 
 	_leave("");
 	return 0;
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 8ebe238af20b..9b83d8c82709 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -109,6 +109,7 @@ struct cachefiles_cache {
 	char				*rootdirname;	/* name of cache root directory */
 	char				*secctx;	/* LSM security context */
 	char				*tag;		/* cache binding tag */
+	struct kref			unbind_pincount;/* refcount to do daemon unbind */
 #ifdef CONFIG_CACHEFILES_ONDEMAND
 	struct xarray			reqs;		/* xarray of pending on-demand requests */
 	struct xarray			ondemand_ids;	/* xarray for ondemand_id allocation */
@@ -167,6 +168,8 @@ extern int cachefiles_has_space(struct cachefiles_cache *cache,
  * daemon.c
  */
 extern const struct file_operations cachefiles_daemon_fops;
+extern void cachefiles_get_unbind_pincount(struct cachefiles_cache *cache);
+extern void cachefiles_put_unbind_pincount(struct cachefiles_cache *cache);
 
 /*
  * error_inject.c
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index 890cd3ecc2f0..eec883640efa 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -14,6 +14,7 @@ static int cachefiles_ondemand_fd_release(struct inode *inode,
 	object->ondemand_id = CACHEFILES_ONDEMAND_ID_CLOSED;
 	xa_erase(&cache->ondemand_ids, object_id);
 	cachefiles_put_object(object, cachefiles_obj_put_ondemand_fd);
+	cachefiles_put_unbind_pincount(cache);
 	return 0;
 }
 
@@ -169,6 +170,8 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req)
 	load->fd = fd;
 	req->msg.object_id = object_id;
 	object->ondemand_id = object_id;
+
+	cachefiles_get_unbind_pincount(cache);
 	return 0;
 
 err_put_fd:
-- 
2.27.0



* [PATCH v9 04/21] cachefiles: notify user daemon when withdrawing cookie
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-15 12:35   ` Jeffle Xu
  -1 siblings, 0 replies; 96+ messages in thread
From: Jeffle Xu @ 2022-04-15 12:35 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend,
	tianzichen, fannaihao, zhangjiachen.jaycee

Notify the user daemon that the cookie is going to be withdrawn,
providing a hint that the associated anonymous fd can be closed.

Note that this is only a hint. The user daemon may close the associated
anonymous fd when receiving the CLOSE request, in which case it will
receive another anonymous fd when the cookie is looked up again. Or it
may ignore the CLOSE request and keep writing data through the anonymous
fd. Even then, the next time the cookie is looked up, the user daemon
will still receive a new anonymous fd.
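
Not part of the patch: a minimal sketch of how a daemon might track
anonymous fds by object ID and act on the CLOSE hint. struct fd_table,
table_add(), table_take() and handle_request() are hypothetical names,
and the fixed-size table is an arbitrary choice for the sketch. It
relies on object IDs starting at 1, as allocated by the kernel side in
patch 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { CACHEFILES_OP_OPEN = 0, CACHEFILES_OP_CLOSE = 1 };

#define MAX_OBJECTS 64	/* arbitrary bound for the sketch */

struct fd_table {
	uint32_t object_id[MAX_OBJECTS];	/* 0 marks a free slot */
	int fd[MAX_OBJECTS];
};

/* Remember the anonymous fd delivered with an OPEN request. */
static int table_add(struct fd_table *t, uint32_t object_id, int fd)
{
	assert(object_id != 0);
	for (size_t i = 0; i < MAX_OBJECTS; i++) {
		if (t->object_id[i] == 0) {
			t->object_id[i] = object_id;
			t->fd[i] = fd;
			return 0;
		}
	}
	return -1;	/* table full */
}

/* Forget an object and hand back its fd, or -1 if it is unknown. */
static int table_take(struct fd_table *t, uint32_t object_id)
{
	for (size_t i = 0; i < MAX_OBJECTS; i++) {
		if (object_id != 0 && t->object_id[i] == object_id) {
			t->object_id[i] = 0;
			return t->fd[i];
		}
	}
	return -1;
}

/*
 * Dispatch one request. Returns an fd the caller may close(), or -1 if
 * there is nothing to close. CLOSE needs no reply to the kernel.
 */
static int handle_request(struct fd_table *t, uint32_t opcode,
			  uint32_t object_id, int open_fd)
{
	switch (opcode) {
	case CACHEFILES_OP_OPEN:
		/* If the table is full, give the new fd back to the caller. */
		return table_add(t, object_id, open_fd) ? open_fd : -1;
	case CACHEFILES_OP_CLOSE:
		return table_take(t, object_id);
	default:
		return -1;
	}
}
```

A daemon that prefers to ignore the hint can simply skip the CLOSE case
and keep the fd; either way a later lookup of the cookie arrives as a
fresh OPEN request carrying a new fd.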

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/cachefiles/interface.c       |  2 ++
 fs/cachefiles/internal.h        |  5 +++++
 fs/cachefiles/ondemand.c        | 38 +++++++++++++++++++++++++++++++++
 include/uapi/linux/cachefiles.h |  1 +
 4 files changed, 46 insertions(+)

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index ae93cee9d25d..a69073a1d3f0 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -362,6 +362,8 @@ static void cachefiles_withdraw_cookie(struct fscache_cookie *cookie)
 		spin_unlock(&cache->object_list_lock);
 	}
 
+	cachefiles_ondemand_clean_object(object);
+
 	if (object->file) {
 		cachefiles_begin_secure(cache, &saved_cred);
 		cachefiles_clean_up_object(object, cache);
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 9b83d8c82709..15332eae43c0 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -286,6 +286,7 @@ extern int cachefiles_ondemand_copen(struct cachefiles_cache *cache,
 				     char *args);
 
 extern int cachefiles_ondemand_init_object(struct cachefiles_object *object);
+extern void cachefiles_ondemand_clean_object(struct cachefiles_object *object);
 
 #else
 static inline ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
@@ -298,6 +299,10 @@ static inline int cachefiles_ondemand_init_object(struct cachefiles_object *obje
 {
 	return 0;
 }
+
+static inline void cachefiles_ondemand_clean_object(struct cachefiles_object *object)
+{
+}
 #endif
 
 /*
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index eec883640efa..7ce383536f27 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -229,6 +229,12 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
 		goto err_put_fd;
 	}
 
+	/* CLOSE request has no reply */
+	if (msg->opcode == CACHEFILES_OP_CLOSE) {
+		xa_erase(&cache->reqs, id);
+		complete(&req->done);
+	}
+
 	return n;
 
 err_put_fd:
@@ -293,6 +299,13 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object,
 			goto out;
 		}
 
+		if (opcode != CACHEFILES_OP_OPEN && object->ondemand_id <= 0) {
+			WARN_ON_ONCE(object->ondemand_id == 0);
+			xas_unlock(&xas);
+			ret = -EIO;
+			goto out;
+		}
+
 		xas.xa_index = 0;
 		xas_find_marked(&xas, UINT_MAX, XA_FREE_MARK);
 		if (xas.xa_node == XAS_RESTART)
@@ -349,6 +362,25 @@ static int cachefiles_ondemand_init_open_req(struct cachefiles_req *req,
 	return 0;
 }
 
+static int cachefiles_ondemand_init_close_req(struct cachefiles_req *req,
+					      void *private)
+{
+	struct cachefiles_object *object = req->object;
+	int object_id = object->ondemand_id;
+
+	/*
+	 * It's possible that object id is still 0 if the cookie looking up
+	 * phase failed before OPEN request has ever been sent. Also avoid
+	 * sending CLOSE request for CACHEFILES_ONDEMAND_ID_CLOSED, which means
+	 * anon_fd has already been closed.
+	 */
+	if (object_id <= 0)
+		return -ENOENT;
+
+	req->msg.object_id = object_id;
+	return 0;
+}
+
 int cachefiles_ondemand_init_object(struct cachefiles_object *object)
 {
 	struct fscache_cookie *cookie = object->cookie;
@@ -372,3 +404,9 @@ int cachefiles_ondemand_init_object(struct cachefiles_object *object)
 	return cachefiles_ondemand_send_req(object, CACHEFILES_OP_OPEN,
 			data_len, cachefiles_ondemand_init_open_req, NULL);
 }
+
+void cachefiles_ondemand_clean_object(struct cachefiles_object *object)
+{
+	cachefiles_ondemand_send_req(object, CACHEFILES_OP_CLOSE, 0,
+			cachefiles_ondemand_init_close_req, NULL);
+}
diff --git a/include/uapi/linux/cachefiles.h b/include/uapi/linux/cachefiles.h
index 521f2fe4fe9c..37a0071037c8 100644
--- a/include/uapi/linux/cachefiles.h
+++ b/include/uapi/linux/cachefiles.h
@@ -12,6 +12,7 @@
 
 enum cachefiles_opcode {
 	CACHEFILES_OP_OPEN,
+	CACHEFILES_OP_CLOSE,
 };
 
 /*
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v9 05/21] cachefiles: implement on-demand read
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-15 12:35   ` Jeffle Xu
  -1 siblings, 0 replies; 96+ messages in thread
From: Jeffle Xu @ 2022-04-15 12:35 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend,
	tianzichen, fannaihao, zhangjiachen.jaycee

Implement the data plane of on-demand read mode.

The earlier implementation [1] placed the entry to
cachefiles_ondemand_read() in fscache_read(). However, fscache_read()
can only detect whether the requested file range is a complete cache
miss, whereas we need to notify the user daemon whenever there is a hole
inside the requested file range.

Thus the entry is now placed in cachefiles_prepare_read(). When working
in on-demand read mode, once a hole is detected, the read routine sends
a READ request to the user daemon. The user daemon needs to fetch the
data and write it to the cache file. After sending the READ request, the
read routine blocks until the READ request is handled by the user
daemon. Then it retries the read from the same file range. If no
progress is made, the read routine fails.

A new NETFS_SREQ_ONDEMAND flag is introduced to indicate that an
on-demand read should be done when a cache miss is encountered.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
[1] https://lore.kernel.org/all/20220406075612.60298-6-jefflexu@linux.alibaba.com/ #v8
---
 fs/cachefiles/internal.h        |  9 ++++
 fs/cachefiles/io.c              | 26 ++++++++++-
 fs/cachefiles/ondemand.c        | 77 +++++++++++++++++++++++++++++++++
 include/linux/netfs.h           |  2 +
 include/uapi/linux/cachefiles.h | 17 ++++++++
 5 files changed, 129 insertions(+), 2 deletions(-)

diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 15332eae43c0..3025556ff7d4 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -288,6 +288,9 @@ extern int cachefiles_ondemand_copen(struct cachefiles_cache *cache,
 extern int cachefiles_ondemand_init_object(struct cachefiles_object *object);
 extern void cachefiles_ondemand_clean_object(struct cachefiles_object *object);
 
+extern int cachefiles_ondemand_read(struct cachefiles_object *object,
+				    loff_t pos, size_t len);
+
 #else
 static inline ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
 					char __user *_buffer, size_t buflen)
@@ -303,6 +306,12 @@ static inline int cachefiles_ondemand_init_object(struct cachefiles_object *obje
 static inline void cachefiles_ondemand_clean_object(struct cachefiles_object *object)
 {
 }
+
+static inline int cachefiles_ondemand_read(struct cachefiles_object *object,
+					   loff_t pos, size_t len)
+{
+	return -EOPNOTSUPP;
+}
 #endif
 
 /*
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 50a14e8f0aac..ccf77a969653 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -95,6 +95,7 @@ static int cachefiles_read(struct netfs_cache_resources *cres,
 	       file, file_inode(file)->i_ino, start_pos, len,
 	       i_size_read(file_inode(file)));
 
+retry:
 	/* If the caller asked us to seek for data before doing the read, then
 	 * we should do that now.  If we find a gap, we fill it with zeros.
 	 */
@@ -119,6 +120,16 @@ static int cachefiles_read(struct netfs_cache_resources *cres,
 			if (read_hole == NETFS_READ_HOLE_FAIL)
 				goto presubmission_error;
 
+			if (read_hole == NETFS_READ_HOLE_ONDEMAND) {
+				ret = cachefiles_ondemand_read(object, off, len);
+				if (ret)
+					goto presubmission_error;
+
+				/* fail the read if no progress achieved */
+				read_hole = NETFS_READ_HOLE_FAIL;
+				goto retry;
+			}
+
 			iov_iter_zero(len, iter);
 			skipped = len;
 			ret = 0;
@@ -403,6 +414,7 @@ static enum netfs_io_source cachefiles_prepare_read(struct netfs_io_subrequest *
 	enum netfs_io_source ret = NETFS_DOWNLOAD_FROM_SERVER;
 	loff_t off, to;
 	ino_t ino = file ? file_inode(file)->i_ino : 0;
+	int rc;
 
 	_enter("%zx @%llx/%llx", subreq->len, subreq->start, i_size);
 
@@ -415,7 +427,8 @@ static enum netfs_io_source cachefiles_prepare_read(struct netfs_io_subrequest *
 	if (test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags)) {
 		__set_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
 		why = cachefiles_trace_read_no_data;
-		goto out_no_object;
+		if (!test_bit(NETFS_SREQ_ONDEMAND, &subreq->flags))
+			goto out_no_object;
 	}
 
 	/* The object and the file may be being created in the background. */
@@ -432,7 +445,7 @@ static enum netfs_io_source cachefiles_prepare_read(struct netfs_io_subrequest *
 	object = cachefiles_cres_object(cres);
 	cache = object->volume->cache;
 	cachefiles_begin_secure(cache, &saved_cred);
-
+retry:
 	off = cachefiles_inject_read_error();
 	if (off == 0)
 		off = vfs_llseek(file, subreq->start, SEEK_DATA);
@@ -483,6 +496,15 @@ static enum netfs_io_source cachefiles_prepare_read(struct netfs_io_subrequest *
 
 download_and_store:
 	__set_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
+	if (test_bit(NETFS_SREQ_ONDEMAND, &subreq->flags)) {
+		rc = cachefiles_ondemand_read(object, subreq->start,
+					      subreq->len);
+		if (!rc) {
+			__clear_bit(NETFS_SREQ_ONDEMAND, &subreq->flags);
+			goto retry;
+		}
+		ret = NETFS_INVALID_READ;
+	}
 out:
 	cachefiles_end_secure(cache, saved_cred);
 out_no_object:
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index 7ce383536f27..10bdac26ce23 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -10,8 +10,25 @@ static int cachefiles_ondemand_fd_release(struct inode *inode,
 	struct cachefiles_object *object = file->private_data;
 	struct cachefiles_cache *cache = object->volume->cache;
 	int object_id = object->ondemand_id;
+	struct cachefiles_req *req;
+	XA_STATE(xas, &cache->reqs, 0);
 
+	xa_lock(&cache->reqs);
 	object->ondemand_id = CACHEFILES_ONDEMAND_ID_CLOSED;
+
+	/*
+	 * Flush all pending READ requests since their completion depends on
+	 * anon_fd.
+	 */
+	xas_for_each(&xas, req, ULONG_MAX) {
+		if (req->msg.opcode == CACHEFILES_OP_READ) {
+			req->error = -EIO;
+			complete(&req->done);
+			xas_store(&xas, NULL);
+		}
+	}
+	xa_unlock(&cache->reqs);
+
 	xa_erase(&cache->ondemand_ids, object_id);
 	cachefiles_put_object(object, cachefiles_obj_put_ondemand_fd);
 	cachefiles_put_unbind_pincount(cache);
@@ -57,11 +74,35 @@ static loff_t cachefiles_ondemand_fd_llseek(struct file *filp, loff_t pos,
 	return vfs_llseek(file, pos, whence);
 }
 
+static long cachefiles_ondemand_fd_ioctl(struct file *filp, unsigned int ioctl,
+					 unsigned long arg)
+{
+	struct cachefiles_object *object = filp->private_data;
+	struct cachefiles_cache *cache = object->volume->cache;
+	struct cachefiles_req *req;
+	unsigned long id;
+
+	if (ioctl != CACHEFILES_IOC_CREAD)
+		return -EINVAL;
+
+	if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags))
+		return -EOPNOTSUPP;
+
+	id = arg;
+	req = xa_erase(&cache->reqs, id);
+	if (!req)
+		return -EINVAL;
+
+	complete(&req->done);
+	return 0;
+}
+
 static const struct file_operations cachefiles_ondemand_fd_fops = {
 	.owner		= THIS_MODULE,
 	.release	= cachefiles_ondemand_fd_release,
 	.write_iter	= cachefiles_ondemand_fd_write_iter,
 	.llseek		= cachefiles_ondemand_fd_llseek,
+	.unlocked_ioctl	= cachefiles_ondemand_fd_ioctl,
 };
 
 /*
@@ -381,6 +422,32 @@ static int cachefiles_ondemand_init_close_req(struct cachefiles_req *req,
 	return 0;
 }
 
+struct cachefiles_read_ctx {
+	loff_t off;
+	size_t len;
+};
+
+static int cachefiles_ondemand_init_read_req(struct cachefiles_req *req,
+					     void *private)
+{
+	struct cachefiles_object *object = req->object;
+	struct cachefiles_read *load = (void *)req->msg.data;
+	struct cachefiles_read_ctx *read_ctx = private;
+	int object_id = object->ondemand_id;
+
+	/* Stop enqueuing requests when daemon has closed anon_fd. */
+	if (object_id <= 0) {
+		WARN_ON_ONCE(object_id == 0);
+		pr_info_once("READ: anonymous fd closed prematurely.\n");
+		return -EIO;
+	}
+
+	req->msg.object_id = object_id;
+	load->off = read_ctx->off;
+	load->len = read_ctx->len;
+	return 0;
+}
+
 int cachefiles_ondemand_init_object(struct cachefiles_object *object)
 {
 	struct fscache_cookie *cookie = object->cookie;
@@ -410,3 +477,13 @@ void cachefiles_ondemand_clean_object(struct cachefiles_object *object)
 	cachefiles_ondemand_send_req(object, CACHEFILES_OP_CLOSE, 0,
 			cachefiles_ondemand_init_close_req, NULL);
 }
+
+int cachefiles_ondemand_read(struct cachefiles_object *object,
+			     loff_t pos, size_t len)
+{
+	struct cachefiles_read_ctx read_ctx = {pos, len};
+
+	return cachefiles_ondemand_send_req(object, CACHEFILES_OP_READ,
+			sizeof(struct cachefiles_read),
+			cachefiles_ondemand_init_read_req, &read_ctx);
+}
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index c7bf1eaf51d5..02dbde48bb68 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -159,6 +159,7 @@ struct netfs_io_subrequest {
 #define NETFS_SREQ_SHORT_IO		2	/* Set if the I/O was short */
 #define NETFS_SREQ_SEEK_DATA_READ	3	/* Set if ->read() should SEEK_DATA first */
 #define NETFS_SREQ_NO_PROGRESS		4	/* Set if we didn't manage to read any data */
+#define NETFS_SREQ_ONDEMAND		5	/* Set if it's from on-demand read mode */
 };
 
 enum netfs_io_origin {
@@ -222,6 +223,7 @@ enum netfs_read_from_hole {
 	NETFS_READ_HOLE_IGNORE,
 	NETFS_READ_HOLE_CLEAR,
 	NETFS_READ_HOLE_FAIL,
+	NETFS_READ_HOLE_ONDEMAND,
 };
 
 /*
diff --git a/include/uapi/linux/cachefiles.h b/include/uapi/linux/cachefiles.h
index 37a0071037c8..028fbf15e02b 100644
--- a/include/uapi/linux/cachefiles.h
+++ b/include/uapi/linux/cachefiles.h
@@ -3,6 +3,7 @@
 #define _LINUX_CACHEFILES_H
 
 #include <linux/types.h>
+#include <linux/ioctl.h>
 
 /*
  * Fscache ensures that the maximum length of cookie key is 255. The volume key
@@ -13,6 +14,7 @@
 enum cachefiles_opcode {
 	CACHEFILES_OP_OPEN,
 	CACHEFILES_OP_CLOSE,
+	CACHEFILES_OP_READ,
 };
 
 /*
@@ -48,4 +50,19 @@ struct cachefiles_open {
 	__u8  data[];
 };
 
+/*
+ * @off		indicates the starting offset of the requested file range
+ * @len		indicates the length of the requested file range
+ */
+struct cachefiles_read {
+	__u64 off;
+	__u64 len;
+};
+
+/*
+ * Reply for READ request (Completion for READ)
+ * arg for CACHEFILES_IOC_CREAD ioctl is the @id field of READ request.
+ */
+#define CACHEFILES_IOC_CREAD	_IOW(0x98, 1, int)
+
 #endif
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v9 06/21] cachefiles: enable on-demand read mode
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-15 12:35   ` Jeffle Xu
  -1 siblings, 0 replies; 96+ messages in thread
From: Jeffle Xu @ 2022-04-15 12:35 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend,
	tianzichen, fannaihao, zhangjiachen.jaycee

Enable on-demand read mode by adding an optional parameter to the "bind"
command.

On-demand mode will be turned on when this parameter is "ondemand", i.e.
"bind ondemand". Otherwise cachefiles will work in the original mode.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
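Note for readers: the argument handling this patch adds to
cachefiles_daemon_bind() can be modelled in user space roughly as below.
This is a sketch only; bind_check_args() and its parameters are
hypothetical names, with ondemand_built_in standing in for
IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) and *ondemand_mode standing in for
the CACHEFILES_ONDEMAND_MODE flag bit.

```c
#include <errno.h>
#include <string.h>

/*
 * Model of the "bind" argument check: "ondemand" selects on-demand mode
 * when the feature is built in, an empty argument keeps the original
 * mode, and anything else is rejected.
 */
int bind_check_args(const char *args, int ondemand_built_in,
		    int *ondemand_mode)
{
	*ondemand_mode = 0;

	if (ondemand_built_in && strcmp(args, "ondemand") == 0) {
		*ondemand_mode = 1;
		return 0;
	}
	if (*args)
		return -EINVAL;
	return 0;
}
```

One consequence of this ordering is that "bind ondemand" is rejected with
-EINVAL on a kernel built without CONFIG_CACHEFILES_ONDEMAND, so a daemon
can detect the missing feature at bind time.
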
 fs/cachefiles/daemon.c | 13 ++++++++-----
 fs/cachefiles/io.c     | 11 -----------
 2 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 2e946e4eb65a..c8bde21ace6a 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -758,11 +758,6 @@ static int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args)
 	    cache->brun_percent  >= 100)
 		return -ERANGE;
 
-	if (*args) {
-		pr_err("'bind' command doesn't take an argument\n");
-		return -EINVAL;
-	}
-
 	if (!cache->rootdirname) {
 		pr_err("No cache directory specified\n");
 		return -EINVAL;
@@ -774,6 +769,14 @@ static int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args)
 		return -EBUSY;
 	}
 
+	if (IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) &&
+	    !strcmp(args, "ondemand")) {
+		set_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags);
+	} else if (*args) {
+		pr_err("'bind' command doesn't take an argument\n");
+		return -EINVAL;
+	}
+
 	/* Make sure we have copies of the tag string */
 	if (!cache->tag) {
 		/*
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index ccf77a969653..000a28f46e59 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -95,7 +95,6 @@ static int cachefiles_read(struct netfs_cache_resources *cres,
 	       file, file_inode(file)->i_ino, start_pos, len,
 	       i_size_read(file_inode(file)));
 
-retry:
 	/* If the caller asked us to seek for data before doing the read, then
 	 * we should do that now.  If we find a gap, we fill it with zeros.
 	 */
@@ -120,16 +119,6 @@ static int cachefiles_read(struct netfs_cache_resources *cres,
 			if (read_hole == NETFS_READ_HOLE_FAIL)
 				goto presubmission_error;
 
-			if (read_hole == NETFS_READ_HOLE_ONDEMAND) {
-				ret = cachefiles_ondemand_read(object, off, len);
-				if (ret)
-					goto presubmission_error;
-
-				/* fail the read if no progress achieved */
-				read_hole = NETFS_READ_HOLE_FAIL;
-				goto retry;
-			}
-
 			iov_iter_zero(len, iter);
 			skipped = len;
 			ret = 0;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v9 07/21] cachefiles: add tracepoints for on-demand read mode
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-15 12:36   ` Jeffle Xu
  -1 siblings, 0 replies; 96+ messages in thread
From: Jeffle Xu @ 2022-04-15 12:36 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend,
	tianzichen, fannaihao, zhangjiachen.jaycee

Add tracepoints for on-demand read mode.  The following tracepoints are
added:

	OPEN request / COPEN reply
	CLOSE request
	READ request / CREAD reply
	write through anonymous fd
	release of anonymous fd

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/cachefiles/ondemand.c          |   7 ++
 include/trace/events/cachefiles.h | 174 ++++++++++++++++++++++++++++++
 2 files changed, 181 insertions(+)

diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index 10bdac26ce23..3be65b825037 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -30,6 +30,7 @@ static int cachefiles_ondemand_fd_release(struct inode *inode,
 	xa_unlock(&cache->reqs);
 
 	xa_erase(&cache->ondemand_ids, object_id);
+	trace_cachefiles_ondemand_fd_release(object, object_id);
 	cachefiles_put_object(object, cachefiles_obj_put_ondemand_fd);
 	cachefiles_put_unbind_pincount(cache);
 	return 0;
@@ -55,6 +56,7 @@ static ssize_t cachefiles_ondemand_fd_write_iter(struct kiocb *kiocb,
 	if (ret < 0)
 		return ret;
 
+	trace_cachefiles_ondemand_fd_write(object, file_inode(file), pos, len);
 	ret = __cachefiles_write(object, file, pos, iter, NULL, NULL);
 	if (!ret)
 		ret = len;
@@ -93,6 +95,7 @@ static long cachefiles_ondemand_fd_ioctl(struct file *filp, unsigned int ioctl,
 	if (!req)
 		return -EINVAL;
 
+	trace_cachefiles_ondemand_cread(object, id);
 	complete(&req->done);
 	return 0;
 }
@@ -166,6 +169,7 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args)
 		clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
 	else
 		set_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
+	trace_cachefiles_ondemand_copen(req->object, id, size);
 
 out:
 	complete(&req->done);
@@ -213,6 +217,7 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req)
 	object->ondemand_id = object_id;
 
 	cachefiles_get_unbind_pincount(cache);
+	trace_cachefiles_ondemand_open(object, &req->msg, load);
 	return 0;
 
 err_put_fd:
@@ -419,6 +424,7 @@ static int cachefiles_ondemand_init_close_req(struct cachefiles_req *req,
 		return -ENOENT;
 
 	req->msg.object_id = object_id;
+	trace_cachefiles_ondemand_close(object, &req->msg);
 	return 0;
 }
 
@@ -445,6 +451,7 @@ static int cachefiles_ondemand_init_read_req(struct cachefiles_req *req,
 	req->msg.object_id = object_id;
 	load->off = read_ctx->off;
 	load->len = read_ctx->len;
+	trace_cachefiles_ondemand_read(object, &req->msg, load);
 	return 0;
 }
 
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index 93df9391bd7f..d8d4d73fe7b6 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -673,6 +673,180 @@ TRACE_EVENT(cachefiles_io_error,
 		      __entry->error)
 	    );
 
+TRACE_EVENT(cachefiles_ondemand_open,
+	    TP_PROTO(struct cachefiles_object *obj, struct cachefiles_msg *msg,
+		     struct cachefiles_open *load),
+
+	    TP_ARGS(obj, msg, load),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,	obj		)
+		    __field(unsigned int,	msg_id		)
+		    __field(unsigned int,	object_id	)
+		    __field(unsigned int,	fd		)
+		    __field(unsigned int,	flags		)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->obj	= obj ? obj->debug_id : 0;
+		    __entry->msg_id	= msg->msg_id;
+		    __entry->object_id	= msg->object_id;
+		    __entry->fd		= load->fd;
+		    __entry->flags	= load->flags;
+			   ),
+
+	    TP_printk("o=%08x mid=%x oid=%x fd=%d f=%x",
+		      __entry->obj,
+		      __entry->msg_id,
+		      __entry->object_id,
+		      __entry->fd,
+		      __entry->flags)
+	    );
+
+TRACE_EVENT(cachefiles_ondemand_copen,
+	    TP_PROTO(struct cachefiles_object *obj, unsigned int msg_id,
+		     long len),
+
+	    TP_ARGS(obj, msg_id, len),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,	obj	)
+		    __field(unsigned int,	msg_id	)
+		    __field(long,		len	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->obj	= obj ? obj->debug_id : 0;
+		    __entry->msg_id	= msg_id;
+		    __entry->len	= len;
+			   ),
+
+	    TP_printk("o=%08x mid=%x l=%lx",
+		      __entry->obj,
+		      __entry->msg_id,
+		      __entry->len)
+	    );
+
+TRACE_EVENT(cachefiles_ondemand_close,
+	    TP_PROTO(struct cachefiles_object *obj, struct cachefiles_msg *msg),
+
+	    TP_ARGS(obj, msg),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,	obj		)
+		    __field(unsigned int,	msg_id		)
+		    __field(unsigned int,	object_id	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->obj	= obj ? obj->debug_id : 0;
+		    __entry->msg_id	= msg->msg_id;
+		    __entry->object_id	= msg->object_id;
+			   ),
+
+	    TP_printk("o=%08x mid=%x oid=%x",
+		      __entry->obj,
+		      __entry->msg_id,
+		      __entry->object_id)
+	    );
+
+TRACE_EVENT(cachefiles_ondemand_read,
+	    TP_PROTO(struct cachefiles_object *obj, struct cachefiles_msg *msg,
+		     struct cachefiles_read *load),
+
+	    TP_ARGS(obj, msg, load),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,	obj		)
+		    __field(unsigned int,	msg_id		)
+		    __field(unsigned int,	object_id	)
+		    __field(loff_t,		start		)
+		    __field(size_t,		len		)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->obj	= obj ? obj->debug_id : 0;
+		    __entry->msg_id	= msg->msg_id;
+		    __entry->object_id	= msg->object_id;
+		    __entry->start	= load->off;
+		    __entry->len	= load->len;
+			   ),
+
+	    TP_printk("o=%08x mid=%x oid=%x s=%llx l=%zx",
+		      __entry->obj,
+		      __entry->msg_id,
+		      __entry->object_id,
+		      __entry->start,
+		      __entry->len)
+	    );
+
+TRACE_EVENT(cachefiles_ondemand_cread,
+	    TP_PROTO(struct cachefiles_object *obj, unsigned int msg_id),
+
+	    TP_ARGS(obj, msg_id),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,	obj	)
+		    __field(unsigned int,	msg_id	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->obj	= obj ? obj->debug_id : 0;
+		    __entry->msg_id	= msg_id;
+			   ),
+
+	    TP_printk("o=%08x mid=%x",
+		      __entry->obj,
+		      __entry->msg_id)
+	    );
+
+TRACE_EVENT(cachefiles_ondemand_fd_write,
+	    TP_PROTO(struct cachefiles_object *obj, struct inode *backer,
+		     loff_t start, size_t len),
+
+	    TP_ARGS(obj, backer, start, len),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,	obj	)
+		    __field(unsigned int,	backer	)
+		    __field(loff_t,		start	)
+		    __field(size_t,		len	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->obj	= obj ? obj->debug_id : 0;
+		    __entry->backer	= backer->i_ino;
+		    __entry->start	= start;
+		    __entry->len	= len;
+			   ),
+
+	    TP_printk("o=%08x iB=%x s=%llx l=%zx",
+		      __entry->obj,
+		      __entry->backer,
+		      __entry->start,
+		      __entry->len)
+	    );
+
+TRACE_EVENT(cachefiles_ondemand_fd_release,
+	    TP_PROTO(struct cachefiles_object *obj, int object_id),
+
+	    TP_ARGS(obj, object_id),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,	obj		)
+		    __field(unsigned int,	object_id	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->obj	= obj ? obj->debug_id : 0;
+		    __entry->object_id	= object_id;
+			   ),
+
+	    TP_printk("o=%08x oid=%x",
+		      __entry->obj,
+		      __entry->object_id)
+	    );
+
 #endif /* _TRACE_CACHEFILES_H */
 
 /* This part must be outside protection */
-- 
2.27.0
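
Note on consuming the new events: the TP_printk() format of
cachefiles_ondemand_read above ("o=%08x mid=%x oid=%x s=%llx l=%zx") can
be decoded in userspace with a matching sscanf() format. The sketch below
assumes the caller has already stripped the generic trace prefix (task,
CPU, timestamp, event name), whose layout is not defined by this patch;
struct ondemand_read_event and parse_ondemand_read() are hypothetical
names used only for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace mirror of the cachefiles_ondemand_read fields. */
struct ondemand_read_event {
	unsigned int obj;		/* o=   object debug_id */
	unsigned int msg_id;		/* mid= request message ID */
	unsigned int object_id;		/* oid= on-demand object ID */
	unsigned long long start;	/* s=   file offset, hex */
	size_t len;			/* l=   length, hex */
};

/*
 * Parse the event payload, i.e. the text produced by TP_printk() only.
 * Returns 0 when all five fields were found, -1 otherwise.
 */
static int parse_ondemand_read(const char *payload,
			       struct ondemand_read_event *ev)
{
	int n = sscanf(payload, "o=%x mid=%x oid=%x s=%llx l=%zx",
		       &ev->obj, &ev->msg_id, &ev->object_id,
		       &ev->start, &ev->len);

	return n == 5 ? 0 : -1;
}
```

All values are printed in hexadecimal by the tracepoint, so a payload
such as "o=0000002a mid=3 oid=1 s=1000 l=4000" describes a 16 KiB read
at offset 4 KiB.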



* [PATCH v9 08/21] cachefiles: document on-demand read mode
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-15 12:36   ` Jeffle Xu
  -1 siblings, 0 replies; 96+ messages in thread
From: Jeffle Xu @ 2022-04-15 12:36 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend,
	tianzichen, fannaihao, zhangjiachen.jaycee

Document the new user interface introduced by on-demand read mode.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 .../filesystems/caching/cachefiles.rst        | 170 ++++++++++++++++++
 1 file changed, 170 insertions(+)

diff --git a/Documentation/filesystems/caching/cachefiles.rst b/Documentation/filesystems/caching/cachefiles.rst
index 8bf396b76359..c10a16957141 100644
--- a/Documentation/filesystems/caching/cachefiles.rst
+++ b/Documentation/filesystems/caching/cachefiles.rst
@@ -28,6 +28,7 @@ Cache on Already Mounted Filesystem
 
  (*) Debugging.
 
+ (*) On-demand Read.
 
 
 Overview
@@ -482,3 +483,172 @@ the control file.  For example::
 	echo $((1|4|8)) >/sys/module/cachefiles/parameters/debug
 
 will turn on all function entry debugging.
+
+
+On-demand Read
+==============
+
+When working in its original mode, cachefiles mainly serves as a local cache
+for a remote networking fs, while in on-demand read mode, cachefiles can
+accelerate scenarios where on-demand read semantics are needed, e.g. container
+image distribution.
+
+The essential difference between these two modes is that, in original mode,
+when a cache miss occurs, the netfs will fetch the data from the remote server
+and then write it to the cache file.  With on-demand read mode, however,
+fetching the data and writing it into the cache is delegated to a user daemon.
+
+``CONFIG_CACHEFILES_ONDEMAND`` should be enabled to support on-demand read mode.
+
+
+Protocol Communication
+----------------------
+
+The on-demand read mode relies on a simple protocol used for communication
+between kernel and user daemon. The protocol can be modeled as::
+
+	kernel --[request]--> user daemon --[reply]--> kernel
+
+The cachefiles kernel module will send requests to the user daemon when needed.
+The user daemon needs to poll on the devnode ('/dev/cachefiles') to check if
+there's a pending request to be processed.  A POLLIN event will be returned
+when there's a pending request.
+
+The user daemon then reads the devnode to fetch a request and process it
+accordingly.  It is worth noting that each read only gets one request. When
+finished processing the request, the user daemon needs to write the reply to
+the devnode.
+
+Each request starts with a message header of the form::
+
+	struct cachefiles_msg {
+		__u32 msg_id;
+		__u32 opcode;
+		__u32 len;
+		__u32 object_id;
+		__u8  data[];
+	};
+
+	where:
+
+	* ``msg_id`` is a unique ID identifying this request among all pending
+	  requests.
+
+	* ``opcode`` indicates the type of this request.
+
+	* ``object_id`` is a unique ID identifying the cache file operated on.
+
+	* ``data`` indicates the payload of this request.
+
+	* ``len`` indicates the whole length of this request, including the
+	  header and following type-specific payload.
+
+
+Turn on On-demand Mode
+----------------------
+
+An optional parameter is added to the "bind" command::
+
+	bind [ondemand]
+
+When the "bind" command is given no argument, it defaults to the original
+mode.  When the "bind" command is given the "ondemand" argument, i.e.
+"bind ondemand", on-demand read mode will be enabled.
+
+
+The OPEN Request
+----------------
+
+When the netfs opens a cache file for the first time, a request with the
+CACHEFILES_OP_OPEN opcode, a.k.a. an OPEN request, will be sent to the user
+daemon.  The payload format is of the form::
+
+	struct cachefiles_open {
+		__u32 volume_key_size;
+		__u32 cookie_key_size;
+		__u32 fd;
+		__u32 flags;
+		__u8  data[];
+	};
+
+	where:
+
+	* ``data`` contains the volume_key followed directly by the cookie_key.
+	  The volume key is a NUL-terminated string; the cookie key is binary
+	  data.
+
+	* ``volume_key_size`` indicates the size of the volume key in bytes.
+
+	* ``cookie_key_size`` indicates the size of the cookie key in bytes.
+
+	* ``fd`` indicates an anonymous fd referring to the cache file, through
+	  which the user daemon can perform write/llseek file operations on the
+	  cache file.
+
+
+The user daemon is able to distinguish the requested cache file with the given
+(volume_key, cookie_key) pair. Each cache file has a unique object_id, while it
+may have multiple anonymous fds. The user daemon may duplicate anonymous fds
+from the initial anonymous fd indicated by the @fd field through dup(). Thus
+each object_id can be mapped to multiple anonymous fds, while the user daemon
+itself needs to maintain the mapping.
+
+With the given anonymous fd, the user daemon can fetch data and write it to the
+cache file in the background, even when kernel has not triggered a cache miss
+yet.
+
+The user daemon should complete the OPEN request by issuing a "copen" (complete
+open) command on the devnode::
+
+	copen <msg_id>,<cache_size>
+
+	* ``msg_id`` must match the msg_id field of the previous OPEN request.
+
+	* When >= 0, ``cache_size`` indicates the size of the cache file;
+	  when < 0, ``cache_size`` indicates the error code encountered by the
+	  user daemon.
+
+
+The CLOSE Request
+-----------------
+
+When a cookie is withdrawn, a CLOSE request (opcode CACHEFILES_OP_CLOSE) will
+be sent to the user daemon. It will notify the user daemon to close all
+anonymous fds associated with the given object_id.  The CLOSE request has no
+extra payload.
+
+
+The READ Request
+----------------
+
+When on-demand read mode is turned on and a cache miss is encountered, the
+kernel will send a READ request (opcode CACHEFILES_OP_READ) to the user daemon.
+This tells the user daemon to fetch the data of the requested file range. The
+payload is of the form::
+
+	struct cachefiles_read {
+		__u64 off;
+		__u64 len;
+	};
+
+	where:
+
+	* ``off`` indicates the starting offset of the requested file range.
+
+	* ``len`` indicates the length of the requested file range.
+
+
+When receiving a READ request, the user daemon needs to fetch the data of the
+requested file range, and then write it to the cache file identified by
+object_id.
+
+To finish processing the READ request, the user daemon should reply with the
+CACHEFILES_IOC_CREAD ioctl on one of the anonymous fds associated with the given
+object_id in the READ request.  The ioctl is of the form::
+
+	ioctl(fd, CACHEFILES_IOC_CREAD, msg_id);
+
+	* ``fd`` is one of the anonymous fds associated with the given object_id
+	  in the READ request.
+
+	* ``msg_id`` must match the msg_id field of the previous READ request.
-- 
2.27.0
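
Note on the protocol documented above: a user daemon might decode an OPEN
request and build the matching "copen" reply roughly as sketched below.
This is an illustration under stated assumptions, not a reference
implementation. The structs mirror only the fixed parts of struct
cachefiles_msg and struct cachefiles_open with <stdint.h> types so the
block builds without kernel headers; struct open_req, decode_open() and
format_copen() are hypothetical helpers; and volume_key_size is assumed
to count the terminating NUL, which the text does not state explicitly.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fixed part of struct cachefiles_msg as documented (data[] follows). */
struct msg_hdr {
	uint32_t msg_id;
	uint32_t opcode;
	uint32_t len;
	uint32_t object_id;
};

/* Fixed part of struct cachefiles_open as documented (data[] follows). */
struct open_hdr {
	uint32_t volume_key_size;
	uint32_t cookie_key_size;
	uint32_t fd;
	uint32_t flags;
};

/* Hypothetical decoded view of one OPEN request. */
struct open_req {
	struct msg_hdr msg;
	struct open_hdr open;
	const uint8_t *volume_key;	/* assumed to include the NUL */
	const uint8_t *cookie_key;	/* binary, cookie_key_size bytes */
};

/*
 * Decode one request that has already been read from the devnode.
 * Returns 0 on success, -1 if the buffer is too short or inconsistent.
 */
static int decode_open(const uint8_t *buf, size_t n, struct open_req *req)
{
	size_t fixed = sizeof(req->msg) + sizeof(req->open);

	if (n < fixed)
		return -1;
	memcpy(&req->msg, buf, sizeof(req->msg));
	memcpy(&req->open, buf + sizeof(req->msg), sizeof(req->open));

	/* len covers the header plus the type-specific payload */
	if (req->msg.len > n || req->msg.len < fixed)
		return -1;
	if ((uint64_t)req->open.volume_key_size + req->open.cookie_key_size >
	    req->msg.len - fixed)
		return -1;

	/* data[] holds the volume key followed directly by the cookie key */
	req->volume_key = buf + fixed;
	req->cookie_key = req->volume_key + req->open.volume_key_size;
	return 0;
}

/* Format "copen <msg_id>,<cache_size>"; returns its length or -1. */
static int format_copen(char *out, size_t cap, uint32_t msg_id,
			long long cache_size)
{
	int n = snprintf(out, cap, "copen %u,%lld",
			 (unsigned int)msg_id, cache_size);

	return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

The daemon would then write() the formatted string to the devnode, with a
negative cache_size carrying the error code if it could not set up the
cache file. The copies via memcpy() avoid relying on the read buffer
being suitably aligned for direct struct access.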



* [PATCH v9 09/21] erofs: make erofs_map_blocks() generally available
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-15 12:36   ` Jeffle Xu
  -1 siblings, 0 replies; 96+ messages in thread
From: Jeffle Xu @ 2022-04-15 12:36 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend,
	tianzichen, fannaihao, zhangjiachen.jaycee

... so that it can be used by the fscache mode introduced in the following
patches.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/data.c     | 4 ++--
 fs/erofs/internal.h | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 780db1e5f4b7..bc22642358ec 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -110,8 +110,8 @@ static int erofs_map_blocks_flatmode(struct inode *inode,
 	return 0;
 }
 
-static int erofs_map_blocks(struct inode *inode,
-			    struct erofs_map_blocks *map, int flags)
+int erofs_map_blocks(struct inode *inode,
+		     struct erofs_map_blocks *map, int flags)
 {
 	struct super_block *sb = inode->i_sb;
 	struct erofs_inode *vi = EROFS_I(inode);
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 5298c4ee277d..fe9564e5091e 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -486,6 +486,8 @@ void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
 int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *dev);
 int erofs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 		 u64 start, u64 len);
+int erofs_map_blocks(struct inode *inode,
+		     struct erofs_map_blocks *map, int flags);
 
 /* inode.c */
 static inline unsigned long erofs_inode_hash(erofs_nid_t nid)
-- 
2.27.0



* [PATCH v9 10/21] erofs: add fscache mode check helper
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-15 12:36   ` Jeffle Xu
  -1 siblings, 0 replies; 96+ messages in thread
From: Jeffle Xu @ 2022-04-15 12:36 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend,
	tianzichen, fannaihao, zhangjiachen.jaycee

Until now, erofs has been a purely blockdev-based filesystem.

A new fscache-based mode is going to be introduced for erofs to support
scenarios where on-demand read semantics is needed, e.g. container
image distribution. In this case, erofs could be mounted from data blobs
through fscache.

Add a helper that checks which mode erofs is working in, and tweak the
code in preparation for the upcoming fscache mode.
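
As a rough userspace illustration of the check (not the kernel code
itself), the sketch below models the helper with hypothetical stand-in
types. struct super_block_model, struct block_device_model and
MODEL_ONDEMAND_ENABLED are assumptions made for this example; they
replace struct super_block, struct block_device and
IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND). The second function mirrors how
statfs reports an id of 0 when no block device backs the mount.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct block_device_model {
	unsigned long long dev_id;
};

struct super_block_model {
	struct block_device_model *s_bdev;	/* NULL when no blockdev backs the mount */
};

/* Assumed stand-in for IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND). */
#ifndef MODEL_ONDEMAND_ENABLED
#define MODEL_ONDEMAND_ENABLED 1
#endif

/* fscache mode: the feature is compiled in and there is no block device. */
bool model_is_fscache_mode(const struct super_block_model *sb)
{
	return MODEL_ONDEMAND_ENABLED && !sb->s_bdev;
}

/* Only touch s_bdev when it is known to exist; otherwise report 0. */
unsigned long long model_statfs_id(const struct super_block_model *sb)
{
	if (model_is_fscache_mode(sb))
		return 0;
	return sb->s_bdev->dev_id;
}
```

Guarding every s_bdev dereference behind one predicate keeps the
blockdev-only paths from being reached when the mount has no block
device.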

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/erofs/internal.h |  5 +++++
 fs/erofs/super.c    | 44 +++++++++++++++++++++++++++++---------------
 2 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index fe9564e5091e..05a97533b1e9 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -161,6 +161,11 @@ struct erofs_sb_info {
 #define set_opt(opt, option)	((opt)->mount_opt |= EROFS_MOUNT_##option)
 #define test_opt(opt, option)	((opt)->mount_opt & EROFS_MOUNT_##option)
 
+static inline bool erofs_is_fscache_mode(struct super_block *sb)
+{
+	return IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && !sb->s_bdev;
+}
+
 enum {
 	EROFS_ZIP_CACHE_DISABLED,
 	EROFS_ZIP_CACHE_READAHEAD,
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 0c4b41130c2f..724d5ff0d78c 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -259,15 +259,19 @@ static int erofs_init_devices(struct super_block *sb,
 		}
 		dis = ptr + erofs_blkoff(pos);
 
-		bdev = blkdev_get_by_path(dif->path,
-					  FMODE_READ | FMODE_EXCL,
-					  sb->s_type);
-		if (IS_ERR(bdev)) {
-			err = PTR_ERR(bdev);
-			break;
+		if (!erofs_is_fscache_mode(sb)) {
+			bdev = blkdev_get_by_path(dif->path,
+						  FMODE_READ | FMODE_EXCL,
+						  sb->s_type);
+			if (IS_ERR(bdev)) {
+				err = PTR_ERR(bdev);
+				break;
+			}
+			dif->bdev = bdev;
+			dif->dax_dev = fs_dax_get_by_bdev(bdev,
+							  &dif->dax_part_off);
 		}
-		dif->bdev = bdev;
-		dif->dax_dev = fs_dax_get_by_bdev(bdev, &dif->dax_part_off);
+
 		dif->blocks = le32_to_cpu(dis->blocks);
 		dif->mapped_blkaddr = le32_to_cpu(dis->mapped_blkaddr);
 		sbi->total_blocks += dif->blocks;
@@ -586,21 +590,28 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 
 	sb->s_magic = EROFS_SUPER_MAGIC;
 
-	if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) {
-		erofs_err(sb, "failed to set erofs blksize");
-		return -EINVAL;
-	}
-
 	sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
 	if (!sbi)
 		return -ENOMEM;
 
 	sb->s_fs_info = sbi;
 	sbi->opt = ctx->opt;
-	sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev, &sbi->dax_part_off);
 	sbi->devs = ctx->devs;
 	ctx->devs = NULL;
 
+	if (erofs_is_fscache_mode(sb)) {
+		sb->s_blocksize = EROFS_BLKSIZ;
+		sb->s_blocksize_bits = LOG_BLOCK_SIZE;
+	} else {
+		if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) {
+			erofs_err(sb, "failed to set erofs blksize");
+			return -EINVAL;
+		}
+
+		sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev,
+						  &sbi->dax_part_off);
+	}
+
 	err = erofs_read_superblock(sb);
 	if (err)
 		return err;
@@ -857,7 +868,10 @@ static int erofs_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
 	struct super_block *sb = dentry->d_sb;
 	struct erofs_sb_info *sbi = EROFS_SB(sb);
-	u64 id = huge_encode_dev(sb->s_bdev->bd_dev);
+	u64 id = 0;
+
+	if (!erofs_is_fscache_mode(sb))
+		id = huge_encode_dev(sb->s_bdev->bd_dev);
 
 	buf->f_type = sb->s_magic;
 	buf->f_bsize = EROFS_BLKSIZ;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v9 11/21] erofs: register fscache volume
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-15 12:36   ` Jeffle Xu
  -1 siblings, 0 replies; 96+ messages in thread
From: Jeffle Xu @ 2022-04-15 12:36 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend,
	tianzichen, fannaihao, zhangjiachen.jaycee

A new fscache-based mode is going to be introduced for erofs, in which
case on-demand read semantics is implemented through fscache.

As the first step, register an fscache volume for each erofs filesystem.
This means that data blobs cannot be shared among erofs filesystems. In
a later iteration, we are going to introduce domain semantics, in which
case several erofs filesystems can belong to one domain, and data blobs
can be shared among the erofs filesystems of that domain.
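
To make the per-filesystem scoping concrete, here is a minimal
userspace sketch of how the volume name is derived from the fsid. It
uses snprintf() and malloc() as stand-ins for kasprintf(); the function
name model_volume_name and the fsid values used with it are
hypothetical. Two filesystems with different fsids end up with
different volume names, which is why their blobs are not shared at this
stage.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace analogue of kasprintf(GFP_KERNEL, "erofs,%s", fsid).
 * Returns a heap-allocated string that the caller must free(), or NULL
 * on failure.
 */
char *model_volume_name(const char *fsid)
{
	int len = snprintf(NULL, 0, "erofs,%s", fsid);
	char *name;

	if (len < 0)
		return NULL;

	name = malloc((size_t)len + 1);
	if (!name)
		return NULL;

	snprintf(name, (size_t)len + 1, "erofs,%s", fsid);
	return name;
}
```

Calling it with the made-up fsids "img-a" and "img-b" gives
"erofs,img-a" and "erofs,img-b" respectively.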

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/Kconfig    | 10 ++++++++++
 fs/erofs/Makefile   |  1 +
 fs/erofs/fscache.c  | 37 +++++++++++++++++++++++++++++++++++++
 fs/erofs/internal.h | 16 ++++++++++++++++
 fs/erofs/super.c    |  5 +++++
 5 files changed, 69 insertions(+)
 create mode 100644 fs/erofs/fscache.c

diff --git a/fs/erofs/Kconfig b/fs/erofs/Kconfig
index f57255ab88ed..85490370e0ca 100644
--- a/fs/erofs/Kconfig
+++ b/fs/erofs/Kconfig
@@ -98,3 +98,13 @@ config EROFS_FS_ZIP_LZMA
 	  systems will be readable without selecting this option.
 
 	  If unsure, say N.
+
+config EROFS_FS_ONDEMAND
+	bool "EROFS fscache-based on-demand read support"
+	depends on CACHEFILES_ONDEMAND && (EROFS_FS=m && FSCACHE || EROFS_FS=y && FSCACHE=y)
+	default n
+	help
+	  This permits EROFS to use fscache-backed data blobs with on-demand
+	  read support.
+
+	  If unsure, say N.
diff --git a/fs/erofs/Makefile b/fs/erofs/Makefile
index 8a3317e38e5a..99bbc597a3e9 100644
--- a/fs/erofs/Makefile
+++ b/fs/erofs/Makefile
@@ -5,3 +5,4 @@ erofs-objs := super.o inode.o data.o namei.o dir.o utils.o pcpubuf.o sysfs.o
 erofs-$(CONFIG_EROFS_FS_XATTR) += xattr.o
 erofs-$(CONFIG_EROFS_FS_ZIP) += decompressor.o zmap.o zdata.o
 erofs-$(CONFIG_EROFS_FS_ZIP_LZMA) += decompressor_lzma.o
+erofs-$(CONFIG_EROFS_FS_ONDEMAND) += fscache.o
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
new file mode 100644
index 000000000000..7a6d0239ebb1
--- /dev/null
+++ b/fs/erofs/fscache.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2022, Alibaba Cloud
+ */
+#include <linux/fscache.h>
+#include "internal.h"
+
+int erofs_fscache_register_fs(struct super_block *sb)
+{
+	struct erofs_sb_info *sbi = EROFS_SB(sb);
+	struct fscache_volume *volume;
+	char *name;
+	int ret = 0;
+
+	name = kasprintf(GFP_KERNEL, "erofs,%s", sbi->opt.fsid);
+	if (!name)
+		return -ENOMEM;
+
+	volume = fscache_acquire_volume(name, NULL, NULL, 0);
+	if (IS_ERR_OR_NULL(volume)) {
+		erofs_err(sb, "failed to register volume for %s", name);
+		ret = volume ? PTR_ERR(volume) : -EOPNOTSUPP;
+		volume = NULL;
+	}
+
+	sbi->volume = volume;
+	kfree(name);
+	return ret;
+}
+
+void erofs_fscache_unregister_fs(struct super_block *sb)
+{
+	struct erofs_sb_info *sbi = EROFS_SB(sb);
+
+	fscache_relinquish_volume(sbi->volume, NULL, false);
+	sbi->volume = NULL;
+}
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 05a97533b1e9..e4f6a13f161f 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -74,6 +74,7 @@ struct erofs_mount_opts {
 	unsigned int max_sync_decompress_pages;
 #endif
 	unsigned int mount_opt;
+	char *fsid;
 };
 
 struct erofs_dev_context {
@@ -146,6 +147,9 @@ struct erofs_sb_info {
 	/* sysfs support */
 	struct kobject s_kobj;		/* /sys/fs/erofs/<devname> */
 	struct completion s_kobj_unregister;
+
+	/* fscache support */
+	struct fscache_volume *volume;
 };
 
 #define EROFS_SB(sb) ((struct erofs_sb_info *)(sb)->s_fs_info)
@@ -618,6 +622,18 @@ static inline int z_erofs_load_lzma_config(struct super_block *sb,
 }
 #endif	/* !CONFIG_EROFS_FS_ZIP */
 
+/* fscache.c */
+#ifdef CONFIG_EROFS_FS_ONDEMAND
+int erofs_fscache_register_fs(struct super_block *sb);
+void erofs_fscache_unregister_fs(struct super_block *sb);
+#else
+static inline int erofs_fscache_register_fs(struct super_block *sb)
+{
+	return 0;
+}
+static inline void erofs_fscache_unregister_fs(struct super_block *sb) {}
+#endif
+
 #define EFSCORRUPTED    EUCLEAN         /* Filesystem is corrupted */
 
 #endif	/* __EROFS_INTERNAL_H */
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 724d5ff0d78c..fd8daa447237 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -602,6 +602,10 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 	if (erofs_is_fscache_mode(sb)) {
 		sb->s_blocksize = EROFS_BLKSIZ;
 		sb->s_blocksize_bits = LOG_BLOCK_SIZE;
+
+		err = erofs_fscache_register_fs(sb);
+		if (err)
+			return err;
 	} else {
 		if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) {
 			erofs_err(sb, "failed to set erofs blksize");
@@ -768,6 +772,7 @@ static void erofs_kill_sb(struct super_block *sb)
 
 	erofs_free_dev_context(sbi->devs);
 	fs_put_dax(sbi->dax_dev);
+	erofs_fscache_unregister_fs(sb);
 	kfree(sbi);
 	sb->s_fs_info = NULL;
 }
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v9 12/21] erofs: add fscache context helper functions
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-15 12:36   ` Jeffle Xu
  -1 siblings, 0 replies; 96+ messages in thread
From: Jeffle Xu @ 2022-04-15 12:36 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend,
	tianzichen, fannaihao, zhangjiachen.jaycee

Introduce a context structure for managing data blobs, and helper
functions for initializing and cleaning up this context structure.
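
The register/unregister pairing can be sketched in userspace as
follows. This is only a model under stated assumptions: struct
model_cookie, struct model_fscache and the model_* functions are
hypothetical, and calloc()/free() stand in for the fscache cookie calls.
The points illustrated are the double-pointer convention (the context is
handed back through the out parameter and reset to NULL on teardown) and
an unregister helper that is safe to call on an empty context. Freeing
the context on the failure path while leaving the caller's name
untouched is a choice made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; the real cookie is owned by fscache. */
struct model_cookie {
	char *key;
};

struct model_fscache {
	struct model_cookie *cookie;
};

/* Pretend cookie acquisition: fails for a missing or empty name. */
struct model_cookie *model_acquire_cookie(const char *name)
{
	struct model_cookie *c;

	if (!name || !*name)
		return NULL;

	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;

	c->key = malloc(strlen(name) + 1);
	if (!c->key) {
		free(c);
		return NULL;
	}
	strcpy(c->key, name);
	return c;
}

/* Returns 0 and stores the new context in *fscache, or a negative errno. */
int model_register_cookie(struct model_fscache **fscache, const char *name)
{
	struct model_fscache *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return -ENOMEM;

	ctx->cookie = model_acquire_cookie(name);
	if (!ctx->cookie) {
		free(ctx);		/* release what this function allocated */
		return -EINVAL;
	}

	*fscache = ctx;
	return 0;
}

/* Tears the context down and clears the caller's pointer. */
void model_unregister_cookie(struct model_fscache **fscache)
{
	struct model_fscache *ctx = *fscache;

	if (!ctx)
		return;

	free(ctx->cookie->key);
	free(ctx->cookie);
	free(ctx);
	*fscache = NULL;
}
```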

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/fscache.c  | 46 +++++++++++++++++++++++++++++++++++++++++++++
 fs/erofs/internal.h | 19 +++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 7a6d0239ebb1..67a3c4935245 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -5,6 +5,52 @@
 #include <linux/fscache.h>
 #include "internal.h"
 
+/*
+ * Create an fscache context for data blob.
+ * Return: 0 on success and allocated fscache context is assigned to @fscache,
+ *	   negative error number on failure.
+ */
+int erofs_fscache_register_cookie(struct super_block *sb,
+				  struct erofs_fscache **fscache, char *name)
+{
+	struct fscache_volume *volume = EROFS_SB(sb)->volume;
+	struct erofs_fscache *ctx;
+	struct fscache_cookie *cookie;
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	cookie = fscache_acquire_cookie(volume, FSCACHE_ADV_WANT_CACHE_SIZE,
+					name, strlen(name), NULL, 0, 0);
+	if (!cookie) {
+		erofs_err(sb, "failed to get cookie for %s", name);
+		kfree(name);
+		return -EINVAL;
+	}
+
+	fscache_use_cookie(cookie, false);
+	ctx->cookie = cookie;
+
+	*fscache = ctx;
+	return 0;
+}
+
+void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache)
+{
+	struct erofs_fscache *ctx = *fscache;
+
+	if (!ctx)
+		return;
+
+	fscache_unuse_cookie(ctx->cookie, NULL, NULL);
+	fscache_relinquish_cookie(ctx->cookie, false);
+	ctx->cookie = NULL;
+
+	kfree(ctx);
+	*fscache = NULL;
+}
+
 int erofs_fscache_register_fs(struct super_block *sb)
 {
 	struct erofs_sb_info *sbi = EROFS_SB(sb);
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index e4f6a13f161f..b1f19f058503 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -97,6 +97,10 @@ struct erofs_sb_lz4_info {
 	u16 max_pclusterblks;
 };
 
+struct erofs_fscache {
+	struct fscache_cookie *cookie;
+};
+
 struct erofs_sb_info {
 	struct erofs_mount_opts opt;	/* options */
 #ifdef CONFIG_EROFS_FS_ZIP
@@ -626,12 +630,27 @@ static inline int z_erofs_load_lzma_config(struct super_block *sb,
 #ifdef CONFIG_EROFS_FS_ONDEMAND
 int erofs_fscache_register_fs(struct super_block *sb);
 void erofs_fscache_unregister_fs(struct super_block *sb);
+
+int erofs_fscache_register_cookie(struct super_block *sb,
+				  struct erofs_fscache **fscache, char *name);
+void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache);
 #else
 static inline int erofs_fscache_register_fs(struct super_block *sb)
 {
 	return 0;
 }
 static inline void erofs_fscache_unregister_fs(struct super_block *sb) {}
+
+static inline int erofs_fscache_register_cookie(struct super_block *sb,
+						struct erofs_fscache **fscache,
+						char *name)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache)
+{
+}
 #endif
 
 #define EFSCORRUPTED    EUCLEAN         /* Filesystem is corrupted */
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v9 13/21] erofs: add anonymous inode caching metadata for data blobs
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-15 12:36   ` Jeffle Xu
  -1 siblings, 0 replies; 96+ messages in thread
From: Jeffle Xu @ 2022-04-15 12:36 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend,
	tianzichen, fannaihao, zhangjiachen.jaycee

Introduce an anonymous inode for data blobs so that erofs can cache
metadata directly within that anonymous inode.
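
The optional inode and the staged error unwinding can be modelled in
userspace as below. Everything named model_* is hypothetical:
model_fail_inode_alloc is a test hook that simulates an inode allocation
failure, model_live_cookies counts cookies that are still held, and
LLONG_MAX stands in for OFFSET_MAX. The sketch only shows the control
flow: resources are released in reverse order of acquisition through
goto labels, and the out parameter is written only on success.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel objects. */
struct model_inode {
	long long i_size;
};

struct model_ctx {
	bool cookie_in_use;
	struct model_inode *inode;	/* stays NULL when need_inode is false */
};

bool model_fail_inode_alloc;	/* test hook: force the inode step to fail */
int model_live_cookies;		/* number of cookies currently held */

int model_register(struct model_ctx **out, bool need_inode)
{
	struct model_ctx *ctx = calloc(1, sizeof(*ctx));
	int ret;

	if (!ctx)
		return -ENOMEM;

	/* Step 1: take the cookie. */
	ctx->cookie_in_use = true;
	model_live_cookies++;

	/* Step 2: optionally attach an anonymous inode. */
	if (need_inode) {
		struct model_inode *inode = NULL;

		if (!model_fail_inode_alloc)
			inode = calloc(1, sizeof(*inode));
		if (!inode) {
			ret = -ENOMEM;
			goto err_cookie;
		}
		inode->i_size = LLONG_MAX;
		ctx->inode = inode;
	}

	*out = ctx;
	return 0;

err_cookie:
	/* Undo step 1 before freeing the context. */
	ctx->cookie_in_use = false;
	model_live_cookies--;
	free(ctx);
	return ret;
}

void model_unregister(struct model_ctx **out)
{
	struct model_ctx *ctx = *out;

	if (!ctx)
		return;

	model_live_cookies--;
	free(ctx->inode);	/* free(NULL) is a no-op */
	free(ctx);
	*out = NULL;
}
```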

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/fscache.c  | 39 ++++++++++++++++++++++++++++++++++++---
 fs/erofs/internal.h |  6 ++++--
 2 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 67a3c4935245..1c88614203d2 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -5,17 +5,22 @@
 #include <linux/fscache.h>
 #include "internal.h"
 
+static const struct address_space_operations erofs_fscache_meta_aops = {
+};
+
 /*
  * Create an fscache context for data blob.
  * Return: 0 on success and allocated fscache context is assigned to @fscache,
  *	   negative error number on failure.
  */
 int erofs_fscache_register_cookie(struct super_block *sb,
-				  struct erofs_fscache **fscache, char *name)
+				  struct erofs_fscache **fscache,
+				  char *name, bool need_inode)
 {
 	struct fscache_volume *volume = EROFS_SB(sb)->volume;
 	struct erofs_fscache *ctx;
 	struct fscache_cookie *cookie;
+	int ret;
 
 	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
 	if (!ctx)
@@ -25,15 +30,40 @@ int erofs_fscache_register_cookie(struct super_block *sb,
 					name, strlen(name), NULL, 0, 0);
 	if (!cookie) {
 		erofs_err(sb, "failed to get cookie for %s", name);
-		kfree(name);
-		return -EINVAL;
+		ret = -EINVAL;
+		goto err;
 	}
 
 	fscache_use_cookie(cookie, false);
 	ctx->cookie = cookie;
 
+	if (need_inode) {
+		struct inode *const inode = new_inode(sb);
+
+		if (!inode) {
+			erofs_err(sb, "failed to get anon inode for %s", name);
+			ret = -ENOMEM;
+			goto err_cookie;
+		}
+
+		set_nlink(inode, 1);
+		inode->i_size = OFFSET_MAX;
+		inode->i_mapping->a_ops = &erofs_fscache_meta_aops;
+		mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
+
+		ctx->inode = inode;
+	}
+
 	*fscache = ctx;
 	return 0;
+
+err_cookie:
+	fscache_unuse_cookie(ctx->cookie, NULL, NULL);
+	fscache_relinquish_cookie(ctx->cookie, false);
+	ctx->cookie = NULL;
+err:
+	kfree(ctx);
+	return ret;
 }
 
 void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache)
@@ -47,6 +77,9 @@ void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache)
 	fscache_relinquish_cookie(ctx->cookie, false);
 	ctx->cookie = NULL;
 
+	iput(ctx->inode);
+	ctx->inode = NULL;
+
 	kfree(ctx);
 	*fscache = NULL;
 }
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index b1f19f058503..5867cb63fd74 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -99,6 +99,7 @@ struct erofs_sb_lz4_info {
 
 struct erofs_fscache {
 	struct fscache_cookie *cookie;
+	struct inode *inode;
 };
 
 struct erofs_sb_info {
@@ -632,7 +633,8 @@ int erofs_fscache_register_fs(struct super_block *sb);
 void erofs_fscache_unregister_fs(struct super_block *sb);
 
 int erofs_fscache_register_cookie(struct super_block *sb,
-				  struct erofs_fscache **fscache, char *name);
+				  struct erofs_fscache **fscache,
+				  char *name, bool need_inode);
 void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache);
 #else
 static inline int erofs_fscache_register_fs(struct super_block *sb)
@@ -643,7 +645,7 @@ static inline void erofs_fscache_unregister_fs(struct super_block *sb) {}
 
 static inline int erofs_fscache_register_cookie(struct super_block *sb,
 						struct erofs_fscache **fscache,
-						char *name)
+						char *name, bool need_inode)
 {
 	return -EOPNOTSUPP;
 }
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v9 13/21] erofs: add anonymous inode caching metadata for data blobs
@ 2022-04-15 12:36   ` Jeffle Xu
  0 siblings, 0 replies; 96+ messages in thread
From: Jeffle Xu @ 2022-04-15 12:36 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: gregkh, fannaihao, willy, linux-kernel, tianzichen, joseph.qi,
	zhangjiachen.jaycee, linux-fsdevel, luodaowen.backend, gerry,
	torvalds

Introduce an anonymous inode for data blobs so that erofs can cache
metadata directly within that anonymous inode.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/fscache.c  | 39 ++++++++++++++++++++++++++++++++++++---
 fs/erofs/internal.h |  6 ++++--
 2 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 67a3c4935245..1c88614203d2 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -5,17 +5,22 @@
 #include <linux/fscache.h>
 #include "internal.h"
 
+static const struct address_space_operations erofs_fscache_meta_aops = {
+};
+
 /*
  * Create an fscache context for data blob.
  * Return: 0 on success and allocated fscache context is assigned to @fscache,
  *	   negative error number on failure.
  */
 int erofs_fscache_register_cookie(struct super_block *sb,
-				  struct erofs_fscache **fscache, char *name)
+				  struct erofs_fscache **fscache,
+				  char *name, bool need_inode)
 {
 	struct fscache_volume *volume = EROFS_SB(sb)->volume;
 	struct erofs_fscache *ctx;
 	struct fscache_cookie *cookie;
+	int ret;
 
 	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
 	if (!ctx)
@@ -25,15 +30,40 @@ int erofs_fscache_register_cookie(struct super_block *sb,
 					name, strlen(name), NULL, 0, 0);
 	if (!cookie) {
 		erofs_err(sb, "failed to get cookie for %s", name);
-		kfree(name);
-		return -EINVAL;
+		ret = -EINVAL;
+		goto err;
 	}
 
 	fscache_use_cookie(cookie, false);
 	ctx->cookie = cookie;
 
+	if (need_inode) {
+		struct inode *const inode = new_inode(sb);
+
+		if (!inode) {
+			erofs_err(sb, "failed to get anon inode for %s", name);
+			ret = -ENOMEM;
+			goto err_cookie;
+		}
+
+		set_nlink(inode, 1);
+		inode->i_size = OFFSET_MAX;
+		inode->i_mapping->a_ops = &erofs_fscache_meta_aops;
+		mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
+
+		ctx->inode = inode;
+	}
+
 	*fscache = ctx;
 	return 0;
+
+err_cookie:
+	fscache_unuse_cookie(ctx->cookie, NULL, NULL);
+	fscache_relinquish_cookie(ctx->cookie, false);
+	ctx->cookie = NULL;
+err:
+	kfree(ctx);
+	return ret;
 }
 
 void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache)
@@ -47,6 +77,9 @@ void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache)
 	fscache_relinquish_cookie(ctx->cookie, false);
 	ctx->cookie = NULL;
 
+	iput(ctx->inode);
+	ctx->inode = NULL;
+
 	kfree(ctx);
 	*fscache = NULL;
 }
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index b1f19f058503..5867cb63fd74 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -99,6 +99,7 @@ struct erofs_sb_lz4_info {
 
 struct erofs_fscache {
 	struct fscache_cookie *cookie;
+	struct inode *inode;
 };
 
 struct erofs_sb_info {
@@ -632,7 +633,8 @@ int erofs_fscache_register_fs(struct super_block *sb);
 void erofs_fscache_unregister_fs(struct super_block *sb);
 
 int erofs_fscache_register_cookie(struct super_block *sb,
-				  struct erofs_fscache **fscache, char *name);
+				  struct erofs_fscache **fscache,
+				  char *name, bool need_inode);
 void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache);
 #else
 static inline int erofs_fscache_register_fs(struct super_block *sb)
@@ -643,7 +645,7 @@ static inline void erofs_fscache_unregister_fs(struct super_block *sb) {}
 
 static inline int erofs_fscache_register_cookie(struct super_block *sb,
 						struct erofs_fscache **fscache,
-						char *name)
+						char *name, bool need_inode)
 {
 	return -EOPNOTSUPP;
 }
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v9 14/21] erofs: add erofs_fscache_read_folios() helper
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-15 12:36   ` Jeffle Xu
  -1 siblings, 0 replies; 96+ messages in thread
From: Jeffle Xu @ 2022-04-15 12:36 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend,
	tianzichen, fannaihao, zhangjiachen.jaycee

Add the erofs_fscache_read_folios() helper, which reads from fscache. It
supports on-demand read semantics: on a cache miss it makes the backend
prepare the data, and once the data is ready it reads from the cache.

This helper can then be used to implement .readpage()/.readahead() with
on-demand read semantics.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/fscache.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 1c88614203d2..066f68c062e2 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -5,6 +5,59 @@
 #include <linux/fscache.h>
 #include "internal.h"
 
+/*
+ * Read data from fscache and fill the read data into page cache described by
+ * @start/len, which shall be both aligned with PAGE_SIZE. @pstart describes
+ * the start physical address in the cache file.
+ */
+static int erofs_fscache_read_folios(struct fscache_cookie *cookie,
+				     struct address_space *mapping,
+				     loff_t start, size_t len,
+				     loff_t pstart)
+{
+	enum netfs_io_source source;
+	struct netfs_io_subrequest subreq;
+	struct netfs_io_request rreq;
+	struct netfs_cache_resources *cres = &rreq.cache_resources;
+	struct iov_iter iter;
+	size_t done = 0;
+	int ret;
+
+	memset(&rreq, 0, sizeof(rreq));
+	memset(&subreq, 0, sizeof(subreq));
+	subreq.rreq = &rreq;
+
+	ret = fscache_begin_read_operation(cres, cookie);
+	if (ret)
+		return ret;
+
+	while (done < len) {
+		subreq.start = pstart + done;
+		subreq.len = len - done;
+		subreq.flags = 1 << NETFS_SREQ_ONDEMAND;
+
+		source = cres->ops->prepare_read(&subreq, LLONG_MAX);
+		if (WARN_ON(subreq.len == 0))
+			source = NETFS_INVALID_READ;
+		if (source != NETFS_READ_FROM_CACHE) {
+			ret = -EIO;
+			goto out;
+		}
+
+		iov_iter_xarray(&iter, READ, &mapping->i_pages,
+				start + done, subreq.len);
+		ret = fscache_read(cres, subreq.start, &iter,
+				   NETFS_READ_HOLE_FAIL, NULL, NULL);
+		if (ret)
+			goto out;
+
+		done += subreq.len;
+	}
+out:
+	fscache_end_operation(cres);
+	return ret;
+}
+
 static const struct address_space_operations erofs_fscache_meta_aops = {
 };
 
-- 
2.27.0
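
The loop in erofs_fscache_read_folios() keeps preparing and issuing reads
until the whole range is covered, since the prepare step may shorten each
subrequest. A minimal userspace sketch of that control flow follows. It
assumes a hypothetical prepare callback that caps each chunk, and none of the
names below are kernel APIs:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the cache backend: it may shorten *len and
 * returns 0 when the chunk can be served from the cache.
 */
typedef int (*prepare_read_fn)(size_t pos, size_t *len);

/* Example backend that serves at most 4096 bytes per chunk. */
static int prepare_4k(size_t pos, size_t *len)
{
	(void)pos;
	if (*len > 4096)
		*len = 4096;
	return 0;
}

/*
 * Copy len bytes from cache[pstart..] to dst chunk by chunk, mirroring the
 * "done < len" loop of the patch. Returns 0 or a negative errno.
 */
static int read_chunks(const unsigned char *cache, size_t pstart,
		       unsigned char *dst, size_t len,
		       prepare_read_fn prepare, size_t *nr_chunks)
{
	size_t done = 0;

	assert(cache && dst && prepare && nr_chunks);
	*nr_chunks = 0;
	while (done < len) {
		size_t chunk = len - done;

		/* A failed, empty or oversized chunk is an I/O error. */
		if (prepare(pstart + done, &chunk) ||
		    chunk == 0 || chunk > len - done)
			return -EIO;
		memcpy(dst + done, cache + pstart + done, chunk);
		done += chunk;
		(*nr_chunks)++;
	}
	return 0;
}
```

With a 4096-byte cap, a 10000-byte request is served in three chunks, and the
source offset advances together with the destination offset, as pstart + done
and start + done do in the patch.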


* [PATCH v9 15/21] erofs: register fscache context for primary data blob
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-15 12:36   ` Jeffle Xu
  -1 siblings, 0 replies; 96+ messages in thread
From: Jeffle Xu @ 2022-04-15 12:36 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend,
	tianzichen, fannaihao, zhangjiachen.jaycee

Register the fscache context for the primary data blob. Also move the
initialization of s_op and related fields forward, since the anonymous
inode is allocated under the super block when the fscache context is
registered.

Two points about the cleanup routine are worth mentioning:

1. The fscache context instantiates anonymous inodes under the super
block. Release these anonymous inodes when .put_super() is called, or
we'll get a "VFS: Busy inodes after unmount." warning.

2. The fscache context is initialized prior to the root inode. If the
mount fails before the root inode has been initialized, .kill_sb() is
called but .put_super() is not. Thus .kill_sb() shall also contain the
cleanup routine.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/internal.h |  1 +
 fs/erofs/super.c    | 15 +++++++++++----
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 5867cb63fd74..386658416159 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -155,6 +155,7 @@ struct erofs_sb_info {
 
 	/* fscache support */
 	struct fscache_volume *volume;
+	struct erofs_fscache *s_fscache;
 };
 
 #define EROFS_SB(sb) ((struct erofs_sb_info *)(sb)->s_fs_info)
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index fd8daa447237..61dc900295f9 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -589,6 +589,9 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 	int err;
 
 	sb->s_magic = EROFS_SUPER_MAGIC;
+	sb->s_flags |= SB_RDONLY | SB_NOATIME;
+	sb->s_maxbytes = MAX_LFS_FILESIZE;
+	sb->s_op = &erofs_sops;
 
 	sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
 	if (!sbi)
@@ -606,6 +609,11 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 		err = erofs_fscache_register_fs(sb);
 		if (err)
 			return err;
+
+		err = erofs_fscache_register_cookie(sb, &sbi->s_fscache,
+						    sbi->opt.fsid, true);
+		if (err)
+			return err;
 	} else {
 		if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) {
 			erofs_err(sb, "failed to set erofs blksize");
@@ -628,11 +636,8 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 			clear_opt(&sbi->opt, DAX_ALWAYS);
 		}
 	}
-	sb->s_flags |= SB_RDONLY | SB_NOATIME;
-	sb->s_maxbytes = MAX_LFS_FILESIZE;
-	sb->s_time_gran = 1;
 
-	sb->s_op = &erofs_sops;
+	sb->s_time_gran = 1;
 	sb->s_xattr = erofs_xattr_handlers;
 
 	if (test_opt(&sbi->opt, POSIX_ACL))
@@ -772,6 +777,7 @@ static void erofs_kill_sb(struct super_block *sb)
 
 	erofs_free_dev_context(sbi->devs);
 	fs_put_dax(sbi->dax_dev);
+	erofs_fscache_unregister_cookie(&sbi->s_fscache);
 	erofs_fscache_unregister_fs(sb);
 	kfree(sbi);
 	sb->s_fs_info = NULL;
@@ -790,6 +796,7 @@ static void erofs_put_super(struct super_block *sb)
 	iput(sbi->managed_cache);
 	sbi->managed_cache = NULL;
 #endif
+	erofs_fscache_unregister_cookie(&sbi->s_fscache);
 }
 
 static struct file_system_type erofs_fs_type = {
-- 
2.27.0
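
Since both .put_super() and .kill_sb() call
erofs_fscache_unregister_cookie() on the same context, the second call has to
be a no-op. The sketch below illustrates that idempotent release pattern in
plain C. The names (ctx, ctx_register, ctx_unregister, live_contexts) are
hypothetical, and it assumes the real helper returns early when the context
pointer is already NULL, which the hunks above do not show:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct ctx {
	int cookie;	/* placeholder for the cached resources */
};

static int live_contexts;	/* contexts registered but not yet released */

static int ctx_register(struct ctx **out)
{
	struct ctx *c;

	assert(out);
	c = calloc(1, sizeof(*c));
	if (!c)
		return -ENOMEM;
	live_contexts++;
	*out = c;
	return 0;
}

/* Safe to call more than once: the first call frees and clears *pctx. */
static void ctx_unregister(struct ctx **pctx)
{
	struct ctx *c = *pctx;

	if (!c)
		return;
	free(c);
	live_contexts--;
	*pctx = NULL;
}
```

Taking a pointer to the pointer is what lets the helper clear the caller's
reference, so a later call from the other teardown path sees NULL and returns.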


* [PATCH v9 16/21] erofs: register fscache context for extra data blobs
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-15 12:36   ` Jeffle Xu
  -1 siblings, 0 replies; 96+ messages in thread
From: Jeffle Xu @ 2022-04-15 12:36 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend,
	tianzichen, fannaihao, zhangjiachen.jaycee

Similar to multi-device mode, erofs can be mounted from one primary
data blob (mandatory) and multiple extra data blobs (optional).

Register an fscache context for each extra data blob.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/erofs/data.c     | 3 +++
 fs/erofs/internal.h | 2 ++
 fs/erofs/super.c    | 8 +++++++-
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index bc22642358ec..14b64d960541 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -199,6 +199,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
 	map->m_bdev = sb->s_bdev;
 	map->m_daxdev = EROFS_SB(sb)->dax_dev;
 	map->m_dax_part_off = EROFS_SB(sb)->dax_part_off;
+	map->m_fscache = EROFS_SB(sb)->s_fscache;
 
 	if (map->m_deviceid) {
 		down_read(&devs->rwsem);
@@ -210,6 +211,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
 		map->m_bdev = dif->bdev;
 		map->m_daxdev = dif->dax_dev;
 		map->m_dax_part_off = dif->dax_part_off;
+		map->m_fscache = dif->fscache;
 		up_read(&devs->rwsem);
 	} else if (devs->extra_devices) {
 		down_read(&devs->rwsem);
@@ -227,6 +229,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
 				map->m_bdev = dif->bdev;
 				map->m_daxdev = dif->dax_dev;
 				map->m_dax_part_off = dif->dax_part_off;
+				map->m_fscache = dif->fscache;
 				break;
 			}
 		}
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 386658416159..fa488af8dfcf 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -49,6 +49,7 @@ typedef u32 erofs_blk_t;
 
 struct erofs_device_info {
 	char *path;
+	struct erofs_fscache *fscache;
 	struct block_device *bdev;
 	struct dax_device *dax_dev;
 	u64 dax_part_off;
@@ -482,6 +483,7 @@ static inline int z_erofs_map_blocks_iter(struct inode *inode,
 #endif	/* !CONFIG_EROFS_FS_ZIP */
 
 struct erofs_map_dev {
+	struct erofs_fscache *m_fscache;
 	struct block_device *m_bdev;
 	struct dax_device *m_daxdev;
 	u64 m_dax_part_off;
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 61dc900295f9..c6755bcae4a6 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -259,7 +259,12 @@ static int erofs_init_devices(struct super_block *sb,
 		}
 		dis = ptr + erofs_blkoff(pos);
 
-		if (!erofs_is_fscache_mode(sb)) {
+		if (erofs_is_fscache_mode(sb)) {
+			err = erofs_fscache_register_cookie(sb, &dif->fscache,
+							    dif->path, false);
+			if (err)
+				break;
+		} else {
 			bdev = blkdev_get_by_path(dif->path,
 						  FMODE_READ | FMODE_EXCL,
 						  sb->s_type);
@@ -710,6 +715,7 @@ static int erofs_release_device_info(int id, void *ptr, void *data)
 	fs_put_dax(dif->dax_dev);
 	if (dif->bdev)
 		blkdev_put(dif->bdev, FMODE_READ | FMODE_EXCL);
+	erofs_fscache_unregister_cookie(&dif->fscache);
 	kfree(dif->path);
 	kfree(dif);
 	return 0;
-- 
2.27.0
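
erofs_map_dev() now also reports which fscache context a mapping belongs to:
the primary blob by default, or an extra blob when a device id is given. A
simplified userspace sketch of the device-id branch only follows. The types
and the fixed-size table are hypothetical, and the "slot i holds device id
i + 1" layout is an assumption of this sketch rather than something shown in
the hunks above:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct blob {
	const char *name;
};

struct blob_table {
	struct blob *primary;	/* mandatory */
	struct blob *extra[4];	/* optional; slot i holds device id i + 1 */
	unsigned int nr_extra;
};

/* Resolve a device id to its blob; id 0 selects the primary blob. */
static int select_blob(const struct blob_table *t, unsigned int deviceid,
		       struct blob **out)
{
	assert(t && t->primary && out);
	if (deviceid == 0) {
		*out = t->primary;
		return 0;
	}
	if (deviceid > t->nr_extra)
		return -ENODEV;
	*out = t->extra[deviceid - 1];
	return 0;
}
```

The address-range lookup used when no device id is given is not modeled here.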


* [PATCH v9 17/21] erofs: implement fscache-based metadata read
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-15 12:36   ` Jeffle Xu
  -1 siblings, 0 replies; 96+ messages in thread
From: Jeffle Xu @ 2022-04-15 12:36 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend,
	tianzichen, fannaihao, zhangjiachen.jaycee

Implement the data plane of reading metadata from the primary data blob
over fscache.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/erofs/data.c    | 19 +++++++++++++++----
 fs/erofs/fscache.c | 27 +++++++++++++++++++++++++++
 2 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 14b64d960541..bb9c1fd48c19 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -6,6 +6,7 @@
  */
 #include "internal.h"
 #include <linux/prefetch.h>
+#include <linux/sched/mm.h>
 #include <linux/dax.h>
 #include <trace/events/erofs.h>
 
@@ -35,14 +36,20 @@ void *erofs_bread(struct erofs_buf *buf, struct inode *inode,
 	erofs_off_t offset = blknr_to_addr(blkaddr);
 	pgoff_t index = offset >> PAGE_SHIFT;
 	struct page *page = buf->page;
+	struct folio *folio;
+	unsigned int nofs_flag;
 
 	if (!page || page->index != index) {
 		erofs_put_metabuf(buf);
-		page = read_cache_page_gfp(mapping, index,
-				mapping_gfp_constraint(mapping, ~__GFP_FS));
-		if (IS_ERR(page))
-			return page;
+
+		nofs_flag = memalloc_nofs_save();
+		folio = read_cache_folio(mapping, index, NULL, NULL);
+		memalloc_nofs_restore(nofs_flag);
+		if (IS_ERR(folio))
+			return folio;
+
 		/* should already be PageUptodate, no need to lock page */
+		page = folio_file_page(folio, index);
 		buf->page = page;
 	}
 	if (buf->kmap_type == EROFS_NO_KMAP) {
@@ -63,6 +70,10 @@ void *erofs_bread(struct erofs_buf *buf, struct inode *inode,
 void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
 			 erofs_blk_t blkaddr, enum erofs_kmap_type type)
 {
+	if (erofs_is_fscache_mode(sb))
+		return erofs_bread(buf, EROFS_SB(sb)->s_fscache->inode,
+				   blkaddr, type);
+
 	return erofs_bread(buf, sb->s_bdev->bd_inode, blkaddr, type);
 }
 
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 066f68c062e2..3f00eb34ac35 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -58,7 +58,34 @@ static int erofs_fscache_read_folios(struct fscache_cookie *cookie,
 	return ret;
 }
 
+static int erofs_fscache_meta_readpage(struct file *data, struct page *page)
+{
+	int ret;
+	struct folio *folio = page_folio(page);
+	struct super_block *sb = folio_mapping(folio)->host->i_sb;
+	struct erofs_map_dev mdev = {
+		.m_deviceid = 0,
+		.m_pa = folio_pos(folio),
+	};
+
+	ret = erofs_map_dev(sb, &mdev);
+	if (ret)
+		goto out;
+
+	ret = erofs_fscache_read_folios(mdev.m_fscache->cookie,
+			folio_mapping(folio), folio_pos(folio),
+			folio_size(folio), mdev.m_pa);
+	if (ret)
+		goto out;
+
+	folio_mark_uptodate(folio);
+out:
+	folio_unlock(folio);
+	return ret;
+}
+
 static const struct address_space_operations erofs_fscache_meta_aops = {
+	.readpage = erofs_fscache_meta_readpage,
 };
 
 /*
-- 
2.27.0
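
erofs_bread() turns a block address into a byte offset and then into a page
index in the metadata mapping, and reuses the buffered page when its index
already matches. The arithmetic can be sketched in userspace as below. The
4 KiB block and page sizes and all names are assumptions of this sketch, not
values taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BLKBITS   12	/* assumed 4 KiB blocks */
#define SKETCH_PAGESHIFT 12	/* assumed 4 KiB pages */

struct meta_pos {
	uint64_t page_index;	/* which page of the metadata mapping */
	uint32_t page_offset;	/* byte offset inside that page */
};

/* Locate a byte of a metadata block inside the page-cache mapping. */
static struct meta_pos meta_locate(uint32_t blkaddr, uint32_t byte_in_blk)
{
	struct meta_pos pos;
	uint64_t off;

	assert(byte_in_blk < (1u << SKETCH_BLKBITS));
	off = ((uint64_t)blkaddr << SKETCH_BLKBITS) + byte_in_blk;
	pos.page_index = off >> SKETCH_PAGESHIFT;
	pos.page_offset = (uint32_t)(off & ((1u << SKETCH_PAGESHIFT) - 1));
	return pos;
}

/* A buffered page can be reused only when it covers the wanted index. */
static int meta_can_reuse(int have_page, uint64_t cached_index, uint64_t want)
{
	return have_page && cached_index == want;
}
```

When the index differs, the caller drops the old page and reads the new one,
which in fscache mode goes through the anonymous inode's .readpage().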


* [PATCH v9 18/21] erofs: implement fscache-based data read for non-inline layout
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-15 12:36   ` Jeffle Xu
  -1 siblings, 0 replies; 96+ messages in thread
From: Jeffle Xu @ 2022-04-15 12:36 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend,
	tianzichen, fannaihao, zhangjiachen.jaycee

Implement the data plane of reading data from data blobs over fscache
for the non-inline layout.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/fscache.c  | 51 +++++++++++++++++++++++++++++++++++++++++++++
 fs/erofs/inode.c    |  4 ++++
 fs/erofs/internal.h |  2 ++
 3 files changed, 57 insertions(+)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 3f00eb34ac35..b799b0fe1b67 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -84,10 +84,61 @@ static int erofs_fscache_meta_readpage(struct file *data, struct page *page)
 	return ret;
 }
 
+static int erofs_fscache_readpage(struct file *file, struct page *page)
+{
+	struct folio *folio = page_folio(page);
+	struct inode *inode = folio_mapping(folio)->host;
+	struct super_block *sb = inode->i_sb;
+	struct erofs_map_blocks map;
+	struct erofs_map_dev mdev;
+	erofs_off_t pos;
+	loff_t pstart;
+	int ret = 0;
+
+	DBG_BUGON(folio_size(folio) != EROFS_BLKSIZ);
+
+	pos = folio_pos(folio);
+	map.m_la = pos;
+
+	ret = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW);
+	if (ret)
+		goto out_unlock;
+
+	if (!(map.m_flags & EROFS_MAP_MAPPED)) {
+		folio_zero_range(folio, 0, folio_size(folio));
+		goto out_uptodate;
+	}
+
+	mdev = (struct erofs_map_dev) {
+		.m_deviceid = map.m_deviceid,
+		.m_pa = map.m_pa,
+	};
+
+	ret = erofs_map_dev(sb, &mdev);
+	if (ret)
+		goto out_unlock;
+
+	pstart = mdev.m_pa + (pos - map.m_la);
+	ret = erofs_fscache_read_folios(mdev.m_fscache->cookie,
+			folio_mapping(folio), folio_pos(folio),
+			folio_size(folio), pstart);
+
+out_uptodate:
+	if (!ret)
+		folio_mark_uptodate(folio);
+out_unlock:
+	folio_unlock(folio);
+	return ret;
+}
+
 static const struct address_space_operations erofs_fscache_meta_aops = {
 	.readpage = erofs_fscache_meta_readpage,
 };
 
+const struct address_space_operations erofs_fscache_access_aops = {
+	.readpage = erofs_fscache_readpage,
+};
+
 /*
  * Create an fscache context for data blob.
  * Return: 0 on success and allocated fscache context is assigned to @fscache,
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
index e8b37ba5e9ad..8d3f56c6469b 100644
--- a/fs/erofs/inode.c
+++ b/fs/erofs/inode.c
@@ -297,6 +297,10 @@ static int erofs_fill_inode(struct inode *inode, int isdir)
 		goto out_unlock;
 	}
 	inode->i_mapping->a_ops = &erofs_raw_access_aops;
+#ifdef CONFIG_EROFS_FS_ONDEMAND
+	if (erofs_is_fscache_mode(inode->i_sb))
+		inode->i_mapping->a_ops = &erofs_fscache_access_aops;
+#endif
 
 out_unlock:
 	erofs_put_metabuf(&buf);
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index fa488af8dfcf..c8f6ac910976 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -639,6 +639,8 @@ int erofs_fscache_register_cookie(struct super_block *sb,
 				  struct erofs_fscache **fscache,
 				  char *name, bool need_inode);
 void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache);
+
+extern const struct address_space_operations erofs_fscache_access_aops;
 #else
 static inline int erofs_fscache_register_fs(struct super_block *sb)
 {
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v9 19/21] erofs: implement fscache-based data read for inline layout
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-15 12:36   ` Jeffle Xu
  -1 siblings, 0 replies; 96+ messages in thread
From: Jeffle Xu @ 2022-04-15 12:36 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend,
	tianzichen, fannaihao, zhangjiachen.jaycee

Implement the data plane of reading data from data blobs over fscache
for inline layout.

For the heading non-inline part, the data plane for non-inline layout is
reused, while only the tail packing part needs special handling.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/erofs/fscache.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index b799b0fe1b67..08849c15500f 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -84,6 +84,33 @@ static int erofs_fscache_meta_readpage(struct file *data, struct page *page)
 	return ret;
 }
 
+static int erofs_fscache_readpage_inline(struct folio *folio,
+					 struct erofs_map_blocks *map)
+{
+	struct super_block *sb = folio_mapping(folio)->host->i_sb;
+	struct erofs_buf buf = __EROFS_BUF_INITIALIZER;
+	erofs_blk_t blknr;
+	size_t offset, len;
+	void *src, *dst;
+
+	/* For tail packing layout, the offset may be non-zero. */
+	offset = erofs_blkoff(map->m_pa);
+	blknr = erofs_blknr(map->m_pa);
+	len = map->m_llen;
+
+	src = erofs_read_metabuf(&buf, sb, blknr, EROFS_KMAP);
+	if (IS_ERR(src))
+		return PTR_ERR(src);
+
+	dst = kmap_local_folio(folio, 0);
+	memcpy(dst, src + offset, len);
+	memset(dst + len, 0, PAGE_SIZE - len);
+	kunmap_local(dst);
+
+	erofs_put_metabuf(&buf);
+	return 0;
+}
+
 static int erofs_fscache_readpage(struct file *file, struct page *page)
 {
 	struct folio *folio = page_folio(page);
@@ -109,6 +136,11 @@ static int erofs_fscache_readpage(struct file *file, struct page *page)
 		goto out_uptodate;
 	}
 
+	if (map.m_flags & EROFS_MAP_META) {
+		ret = erofs_fscache_readpage_inline(folio, &map);
+		goto out_uptodate;
+	}
+
 	mdev = (struct erofs_map_dev) {
 		.m_deviceid = map.m_deviceid,
 		.m_pa = map.m_pa,
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 96+ messages in thread
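
The tail-packing copy done by erofs_fscache_readpage_inline() in the patch above can be tried out in user space. This is only a sketch under assumptions: `fill_inline_page()` and `SKETCH_PAGE_SIZE` are hypothetical names, plain buffers stand in for the metabuf and the folio mapping, and the asserts replace whatever validation the kernel does elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages and blocks */

/*
 * Copy `len` bytes of inline data that start `offset` bytes into the
 * metadata block, then zero the rest of the destination page so that
 * no stale bytes are visible past the end of the file data.
 */
static void fill_inline_page(unsigned char *dst, const unsigned char *meta_blk,
			     size_t offset, size_t len)
{
	assert(len <= SKETCH_PAGE_SIZE);
	assert(offset + len <= SKETCH_PAGE_SIZE);

	memcpy(dst, meta_blk + offset, len);
	memset(dst + len, 0, SKETCH_PAGE_SIZE - len);
}
```

The non-zero offset is the point of the sketch: with tail packing, the inline data usually does not start at the beginning of the metadata block.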

* [PATCH v9 20/21] erofs: implement fscache-based data readahead
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-15 12:36   ` Jeffle Xu
  -1 siblings, 0 replies; 96+ messages in thread
From: Jeffle Xu @ 2022-04-15 12:36 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend,
	tianzichen, fannaihao, zhangjiachen.jaycee

Implement fscache-based data readahead. Also register an individual
bdi for each erofs instance to enable readahead.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/erofs/fscache.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/erofs/super.c   |  4 +++
 2 files changed, 90 insertions(+)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 08849c15500f..eaa50692ddba 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -163,12 +163,98 @@ static int erofs_fscache_readpage(struct file *file, struct page *page)
 	return ret;
 }
 
+static void erofs_fscache_unlock_folios(struct readahead_control *rac,
+					size_t len)
+{
+	while (len) {
+		struct folio *folio = readahead_folio(rac);
+
+		len -= folio_size(folio);
+		folio_mark_uptodate(folio);
+		folio_unlock(folio);
+	}
+}
+
+static void erofs_fscache_readahead(struct readahead_control *rac)
+{
+	struct inode *inode = rac->mapping->host;
+	struct super_block *sb = inode->i_sb;
+	size_t len, count, done = 0;
+	erofs_off_t pos;
+	loff_t start, offset;
+	int ret;
+
+	if (!readahead_count(rac))
+		return;
+
+	start = readahead_pos(rac);
+	len = readahead_length(rac);
+
+	do {
+		struct erofs_map_blocks map;
+		struct erofs_map_dev mdev;
+
+		pos = start + done;
+		map.m_la = pos;
+
+		ret = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW);
+		if (ret)
+			return;
+
+		offset = start + done;
+		count = min_t(size_t, map.m_llen - (pos - map.m_la),
+			      len - done);
+
+		if (!(map.m_flags & EROFS_MAP_MAPPED)) {
+			struct iov_iter iter;
+
+			iov_iter_xarray(&iter, READ, &rac->mapping->i_pages,
+					offset, count);
+			iov_iter_zero(count, &iter);
+
+			erofs_fscache_unlock_folios(rac, count);
+			ret = count;
+			continue;
+		}
+
+		if (map.m_flags & EROFS_MAP_META) {
+			struct folio *folio = readahead_folio(rac);
+
+			ret = erofs_fscache_readpage_inline(folio, &map);
+			if (!ret) {
+				folio_mark_uptodate(folio);
+				ret = folio_size(folio);
+			}
+
+			folio_unlock(folio);
+			continue;
+		}
+
+		mdev = (struct erofs_map_dev) {
+			.m_deviceid = map.m_deviceid,
+			.m_pa = map.m_pa,
+		};
+		ret = erofs_map_dev(sb, &mdev);
+		if (ret)
+			return;
+
+		ret = erofs_fscache_read_folios(mdev.m_fscache->cookie,
+				rac->mapping, offset, count,
+				mdev.m_pa + (pos - map.m_la));
+		if (!ret) {
+			erofs_fscache_unlock_folios(rac, count);
+			ret = count;
+		}
+	} while (ret > 0 && ((done += ret) < len));
+}
+
 static const struct address_space_operations erofs_fscache_meta_aops = {
 	.readpage = erofs_fscache_meta_readpage,
 };
 
 const struct address_space_operations erofs_fscache_access_aops = {
 	.readpage = erofs_fscache_readpage,
+	.readahead = erofs_fscache_readahead,
 };
 
 /*
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index c6755bcae4a6..f68ba929100d 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -619,6 +619,10 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 						    sbi->opt.fsid, true);
 		if (err)
 			return err;
+
+		err = super_setup_bdi(sb);
+		if (err)
+			return err;
 	} else {
 		if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) {
 			erofs_err(sb, "failed to set erofs blksize");
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 96+ messages in thread
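
The way the readahead loop in the patch above sizes each step can be isolated as a small helper: each iteration handles the smaller of what is left in the current mapped extent and what is left in the readahead window. This is a sketch, not the patch's code: `struct extent` and `ra_step_len()` are hypothetical names whose fields mirror `m_la` and `m_llen` as they look after erofs_map_blocks() returns.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the two extent fields the loop relies on. */
struct extent {
	uint64_t la;	/* logical start of the extent (m_la after mapping) */
	uint64_t llen;	/* logical length of the extent (m_llen) */
};

/*
 * Length of one readahead step starting at `pos`:
 * min(bytes left in the extent, bytes left in the window).
 */
static uint64_t ra_step_len(const struct extent *ext, uint64_t pos,
			    uint64_t remaining)
{
	uint64_t left_in_extent;

	assert(pos >= ext->la && pos - ext->la < ext->llen);

	left_in_extent = ext->llen - (pos - ext->la);
	return left_in_extent < remaining ? left_in_extent : remaining;
}
```

With an 8 KiB extent starting at offset 0, a step beginning at offset 4096 is capped at 4096 bytes even when the window still has 16 KiB left, so the next iteration maps the following extent.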

* [PATCH v9 21/21] erofs: add 'fsid' mount option
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-15 12:36   ` Jeffle Xu
  -1 siblings, 0 replies; 96+ messages in thread
From: Jeffle Xu @ 2022-04-15 12:36 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend,
	tianzichen, fannaihao, zhangjiachen.jaycee

Introduce 'fsid' mount option to enable on-demand read semantics, in
which case erofs will be mounted from data blobs. Users can specify
the name of the primary data blob with this mount option.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/erofs/super.c | 31 ++++++++++++++++++++++++++++++-
 fs/erofs/sysfs.c |  4 ++--
 2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index f68ba929100d..4a623630e1c4 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -371,6 +371,8 @@ static int erofs_read_superblock(struct super_block *sb)
 
 	if (erofs_sb_has_ztailpacking(sbi))
 		erofs_info(sb, "EXPERIMENTAL compressed inline data feature in use. Use at your own risk!");
+	if (erofs_is_fscache_mode(sb))
+		erofs_info(sb, "EXPERIMENTAL fscache-based on-demand read feature in use. Use at your own risk!");
 out:
 	erofs_put_metabuf(&buf);
 	return ret;
@@ -399,6 +401,7 @@ enum {
 	Opt_dax,
 	Opt_dax_enum,
 	Opt_device,
+	Opt_fsid,
 	Opt_err
 };
 
@@ -423,6 +426,7 @@ static const struct fs_parameter_spec erofs_fs_parameters[] = {
 	fsparam_flag("dax",             Opt_dax),
 	fsparam_enum("dax",		Opt_dax_enum, erofs_dax_param_enums),
 	fsparam_string("device",	Opt_device),
+	fsparam_string("fsid",		Opt_fsid),
 	{}
 };
 
@@ -518,6 +522,16 @@ static int erofs_fc_parse_param(struct fs_context *fc,
 		}
 		++ctx->devs->extra_devices;
 		break;
+	case Opt_fsid:
+#ifdef CONFIG_EROFS_FS_ONDEMAND
+		kfree(ctx->opt.fsid);
+		ctx->opt.fsid = kstrdup(param->string, GFP_KERNEL);
+		if (!ctx->opt.fsid)
+			return -ENOMEM;
+#else
+		errorfc(fc, "fsid option not supported");
+#endif
+		break;
 	default:
 		return -ENOPARAM;
 	}
@@ -604,6 +618,7 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 
 	sb->s_fs_info = sbi;
 	sbi->opt = ctx->opt;
+	ctx->opt.fsid = NULL;
 	sbi->devs = ctx->devs;
 	ctx->devs = NULL;
 
@@ -690,6 +705,11 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 
 static int erofs_fc_get_tree(struct fs_context *fc)
 {
+	struct erofs_fs_context *ctx = fc->fs_private;
+
+	if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && ctx->opt.fsid)
+		return get_tree_nodev(fc, erofs_fc_fill_super);
+
 	return get_tree_bdev(fc, erofs_fc_fill_super);
 }
 
@@ -739,6 +759,7 @@ static void erofs_fc_free(struct fs_context *fc)
 	struct erofs_fs_context *ctx = fc->fs_private;
 
 	erofs_free_dev_context(ctx->devs);
+	kfree(ctx->opt.fsid);
 	kfree(ctx);
 }
 
@@ -779,7 +800,10 @@ static void erofs_kill_sb(struct super_block *sb)
 
 	WARN_ON(sb->s_magic != EROFS_SUPER_MAGIC);
 
-	kill_block_super(sb);
+	if (erofs_is_fscache_mode(sb))
+		generic_shutdown_super(sb);
+	else
+		kill_block_super(sb);
 
 	sbi = EROFS_SB(sb);
 	if (!sbi)
@@ -789,6 +813,7 @@ static void erofs_kill_sb(struct super_block *sb)
 	fs_put_dax(sbi->dax_dev);
 	erofs_fscache_unregister_cookie(&sbi->s_fscache);
 	erofs_fscache_unregister_fs(sb);
+	kfree(sbi->opt.fsid);
 	kfree(sbi);
 	sb->s_fs_info = NULL;
 }
@@ -938,6 +963,10 @@ static int erofs_show_options(struct seq_file *seq, struct dentry *root)
 		seq_puts(seq, ",dax=always");
 	if (test_opt(opt, DAX_NEVER))
 		seq_puts(seq, ",dax=never");
+#ifdef CONFIG_EROFS_FS_ONDEMAND
+	if (opt->fsid)
+		seq_printf(seq, ",fsid=%s", opt->fsid);
+#endif
 	return 0;
 }
 
diff --git a/fs/erofs/sysfs.c b/fs/erofs/sysfs.c
index f3babf1e6608..c1383e508bbe 100644
--- a/fs/erofs/sysfs.c
+++ b/fs/erofs/sysfs.c
@@ -205,8 +205,8 @@ int erofs_register_sysfs(struct super_block *sb)
 
 	sbi->s_kobj.kset = &erofs_root;
 	init_completion(&sbi->s_kobj_unregister);
-	err = kobject_init_and_add(&sbi->s_kobj, &erofs_sb_ktype, NULL,
-				   "%s", sb->s_id);
+	err = kobject_init_and_add(&sbi->s_kobj, &erofs_sb_ktype, NULL, "%s",
+			erofs_is_fscache_mode(sb) ? sbi->opt.fsid : sb->s_id);
 	if (err)
 		goto put_sb_kobj;
 	return 0;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* Re: [PATCH v9 00/21] fscache, erofs: fscache-based on-demand read semantics
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-20  8:52   ` JiaZhu
  -1 siblings, 0 replies; 96+ messages in thread
From: JiaZhu @ 2022-04-20  8:52 UTC (permalink / raw)
  To: Jeffle Xu, xiang
  Cc: gregkh, fannaihao, willy, linux-kernel, tianzichen, joseph.qi,
	zhangjiachen.jaycee, linux-fsdevel, luodaowen.backend, gerry,
	torvalds, dhowells, linux-cachefs, chao, linux-erofs, zhujia.zj



On 4/15/22 8:35 PM, Jeffle Xu wrote:
> changes since v8:
> - rebase to 5.18-rc2
> - cachefiles: use object_id rather than anon_fd to uniquely identify a
>    cachefile object to avoid potential issues when the user moves the
>    anonymous fd around, e.g. through dup() (refer to commit message and
>    cachefiles_ondemand_get_fd() of patch 2 for more details)
>    (David Howells)
> - cachefiles: add @unbind_pincount refcount to avoid the potential deadlock
>    (refer to commit message of patch 3 for more details)
> - cachefiles: move the calling site of cachefiles_ondemand_read() from
>    cachefiles_read() to cachefiles_prep_read() (refer to commit message
>    of patch 5 for more details)
> - cachefiles: add tracepoints (patch 7) (David Howells)
> - cachefiles: update documentation (patch 8) (David Howells)
> - erofs: update Reviewed-by tag from Gao Xiang
> - erofs: move the logic of initializing bdev/dax_dev in fscache mode out
>    from patch 15/20. Instead move it into patch 9, so that patch 20 can
>    focus on the mount option handling
> - erofs: update the subject line and commit message of patch 12 (Gao
>    Xiang)
> - erofs: remove and fold erofs_fscache_get_folio() helper (patch 16)
>    (Gao Xiang)
> - erofs: change kmap() to kmap_local_folio(), and comment cleanup (patch
>    18) (Gao Xiang)
> - update "advantage of fscache-based on-demand read" section of the
>    cover letter
> - we've finished a preliminary end-to-end on-demand download daemon in
>    order to test the fscache on-demand kernel code as a real end-to-end
>    workload for container use cases. The test user guide is added in the
>    cover letter.
> - Thanks Zichen Tian for testing
>    Tested-by: Zichen Tian <tianzichen@kuaishou.com>
> 
> 
> Kernel Patchset
> ---------------
> Git tree:
> 
>      https://github.com/lostjeffle/linux.git jingbo/dev-erofs-fscache-v9
> 
> Gitweb:
> 
>      https://github.com/lostjeffle/linux/commits/jingbo/dev-erofs-fscache-v9
> 
> 
> User Guide for E2E Container Use Case
> -------------------------------------
> User guide:
> 
>      https://github.com/dragonflyoss/image-service/blob/fscache/docs/nydus-fscache.md
> 
> Video:
> 
>      https://youtu.be/F4IF2_DENXo
> 
> 
> User Daemon for Quick Test
> --------------------------
> Git tree:
> 
>      https://github.com/lostjeffle/demand-read-cachefilesd.git main
> 
> Gitweb:
> 
>      https://github.com/lostjeffle/demand-read-cachefilesd
> 
> 
> RFC: https://lore.kernel.org/all/YbRL2glGzjfZkVbH@B-P7TQMD6M-0146.local/t/
> v1: https://lore.kernel.org/lkml/47831875-4bdd-8398-9f2d-0466b31a4382@linux.alibaba.com/T/
> v2: https://lore.kernel.org/all/2946d871-b9e1-cf29-6d39-bcab30f2854f@linux.alibaba.com/t/
> v3: https://lore.kernel.org/lkml/20220209060108.43051-1-jefflexu@linux.alibaba.com/T/
> v4: https://lore.kernel.org/lkml/20220307123305.79520-1-jefflexu@linux.alibaba.com/T/#t
> v5: https://lore.kernel.org/lkml/202203170912.gk2sqkaK-lkp@intel.com/T/
> v6: https://lore.kernel.org/lkml/202203260720.uA5o7k5w-lkp@intel.com/T/
> v7: https://lore.kernel.org/lkml/557bcf75-2334-5fbb-d2e0-c65e96da566d@linux.alibaba.com/T/
> v8: https://lore.kernel.org/all/ac8571b8-0935-1f4f-e9f1-e424f059b5ed@linux.alibaba.com/T/
> 
> 
> [Background]
> ============
> Nydus [1] is an image distribution service especially optimized for
> distribution over network. Nydus is an excellent container image
> acceleration solution, since it only pulls data from remote when needed,
> a.k.a. on-demand reading and it also supports chunk-based deduplication,
> compression, etc.
> 
> erofs (Enhanced Read-Only File System) is a filesystem designed for
> read-only scenarios. (Documentation/filesystems/erofs.rst)
> 
> Over the past months we've been focusing on supporting Nydus image service
> with in-kernel erofs format[2]. In that case, each container image will be
> organized in one bootstrap (metadata) and (optional) multiple data blobs in
> erofs format. Massive container images will be stored on one machine.
> 
> To accelerate the container startup (fetching container images from remote
> and then start the container), we do hope that the bootstrap & blob files
> could support on-demand read. That is, erofs can be mounted and accessed
> even when the bootstrap/data blob files have not been fully downloaded.
> Then it'll have native performance after data is available locally.
> 
> That means we have to manage the cache state of the bootstrap/data blob
> files (if cache hit, read directly from the local cache; if cache miss,
> fetch the data somehow). It would be painful and may be dumb for erofs to
> implement the cache management itself. Thus we prefer fscache/cachefiles
> to do the cache management instead.
> 
> The fscache on-demand read feature aims to be implemented in a generic way
> so that it can benefit other use cases and/or filesystems if it's
> implemented in the fscache subsystem.
> 
> [1] https://nydus.dev
> [2] https://sched.co/pcdL
> 
> 
> [Overall Design]
> ================
> Please refer to patch 8 ("cachefiles: document on-demand read mode") for
> more details.
> 
> When working in the original mode, cachefiles mainly serves as a local cache
> for remote networking fs, while in on-demand read mode, cachefiles can work
> in the scenario where on-demand read semantics is needed, e.g. container image
> distribution.
> 
> The essential difference between these two modes is that, in the original
> mode, on a cache miss netfs itself fetches data from remote and then writes
> the fetched data into the cache file. In on-demand read mode, a user daemon
> is responsible for fetching data and feeding it to the kernel fscache side.
> 
> The on-demand read mode relies on a simple protocol used for communication
> between kernel and user daemon.
> 
> The proposed implementation relies on the anonymous fd mechanism to avoid
> the dependence on the format of cache file. When a fscache cachefile is opened
> for the first time, an anon_fd associated with the cache file is sent to the
> user daemon. With the given anon_fd, user daemon could fetch and write data
> into the cache file in the background, even when kernel has not triggered the
> cache miss. Besides, the write() syscall to the anon_fd will finally call
> cachefiles kernel module, which will write data to cache file in the latest
> format of cache file.
> 
> 1. cache miss
> On a cache miss, the cachefiles kernel module notifies the user daemon with
> the anon_fd, along with the requested file range. When notified, the user
> daemon needs to fetch the data of the requested file range and then write
> the fetched data into the cache file through the given anonymous fd. When it
> has finished processing the request, the user daemon needs to notify the
> kernel.
> 
> After notifying the user daemon, the kernel read routine blocks until the
> request is handled by the user daemon. When it is awakened by the
> notification from the user daemon, i.e. the corresponding hole has been
> filled by the user daemon, it retries the read from the same file range.
> 
> 2. cache hit
> Once data is already ready in cache file, netfs will read from cache
> file directly.
> 
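
The miss/fill/retry flow described above can be modelled in plain user-space
C. This is only a toy sketch under stated assumptions: struct toy_cache,
toy_daemon_fill() and toy_read() are hypothetical names invented for
illustration, the "remote" is a formula, and nothing here uses the real
cachefiles uapi or an actual anon_fd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CHUNK   4096u   /* granularity of the toy cache (assumption) */
#define NCHUNKS 8u

/* Toy stand-in for a cache file: data plus a per-chunk "present" map. */
struct toy_cache {
	unsigned char data[NCHUNKS * CHUNK];
	bool present[NCHUNKS];
	unsigned int daemon_requests;   /* how often the "daemon" was asked */
};

/* Stand-in for the remote blob: byte at offset i has value (i % 251). */
static unsigned char remote_byte(size_t off)
{
	return (unsigned char)(off % 251);
}

/*
 * Toy "user daemon": fetch every chunk touched by [off, off + len) and
 * store it in the cache. In the real design this would be a write() to
 * the anon_fd followed by a completion notification to the kernel.
 */
static void toy_daemon_fill(struct toy_cache *c, size_t off, size_t len)
{
	size_t first = off / CHUNK, last = (off + len - 1) / CHUNK;

	c->daemon_requests++;
	for (size_t i = first; i <= last; i++) {
		for (size_t b = 0; b < CHUNK; b++)
			c->data[i * CHUNK + b] = remote_byte(i * CHUNK + b);
		c->present[i] = true;
	}
}

/* True if every chunk touched by [off, off + len) is already cached. */
static bool toy_range_present(const struct toy_cache *c, size_t off, size_t len)
{
	for (size_t i = off / CHUNK; i <= (off + len - 1) / CHUNK; i++)
		if (!c->present[i])
			return false;
	return true;
}

/*
 * Toy read path: on a miss, hand the range to the daemon, then read the
 * same range again. Returns 0 on success, -1 on an invalid range.
 */
static int toy_read(struct toy_cache *c, size_t off, size_t len, void *buf)
{
	if (len == 0 || off >= sizeof(c->data) || len > sizeof(c->data) - off)
		return -1;
	if (!toy_range_present(c, off, len))
		toy_daemon_fill(c, off, len);   /* the real kernel would sleep here */
	assert(toy_range_present(c, off, len));
	memcpy(buf, c->data + off, len);
	return 0;
}
```

With a zero-initialized struct toy_cache, the first toy_read() of a range
bumps daemon_requests once; a second read of the same range is served from
the cache and leaves the counter unchanged, which is the cache-hit case.
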
> 
> [Advantage of fscache-based on-demand read]
> ========================================
> 1. Asynchronous prefetch
> In the current mechanism, fscache is responsible for cache state management,
> while the data plane (fetching data from local/remote on a cache miss) is
> handled on the user daemon side, which may run even when no file system
> request drives it. In addition, once cached data is available locally,
> fscache uses it directly instead of trapping to user space.
> 
> Therefore, unlike event-driven approaches, the fscache on-demand user
> daemon can also fetch data (from remote) asynchronously in the background,
> just like most multi-threaded HTTP downloaders.
> 
> 2. Flexible request amplification
> Since the data plane is controlled independently by the user daemon, the
> user daemon can fetch more data from remote than the file system actually
> requests when I/O sizes are small. The data fetched in bulk then becomes
> available at once, and fscache does not need to trap into the user daemon
> again for it.
> 
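
One way a daemon might amplify a small request is to round it out to whole
fetch units and clamp the result to the blob size. The sketch below is an
assumption about how such a policy could look, not something taken from the
patchset or from any existing daemon; struct byte_range, amplify_request()
and the notion of a fixed "unit" size are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct byte_range {
	uint64_t off;
	uint64_t len;
};

/*
 * Expand a requested range to whole fetch units of `unit` bytes, clamped
 * to the blob size. The caller must pass a non-zero unit and a non-empty
 * request that lies entirely inside the blob.
 */
static struct byte_range amplify_request(struct byte_range req,
					 uint64_t unit, uint64_t blob_size)
{
	struct byte_range out;
	uint64_t end, rem;

	assert(unit > 0 && req.len > 0);
	assert(req.off < blob_size && req.len <= blob_size - req.off);

	out.off = req.off - req.off % unit;     /* round the start down */

	end = req.off + req.len;                /* exclusive end, <= blob_size */
	rem = end % unit;
	if (rem != 0) {
		uint64_t pad = unit - rem;

		/* round the end up, but never past the end of the blob */
		end = (pad > blob_size - end) ? blob_size : end + pad;
	}

	out.len = end - out.off;
	return out;
}
```

For a 4 KiB read at offset 12388 with a 1 MiB unit and a 10 MiB blob, this
returns the range starting at 0 with a length of 1 MiB, so later reads that
fall inside that first megabyte would not need another round trip.
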
> 3. Support massive blobs
> This mechanism naturally supports a large number of backing files, and
> thus benefits densely deployed scenarios. In our use cases, one container
> image consists of one bootstrap (required) and multiple chunk-deduplicated
> data blobs (optional).
> 
> For example, one container image for node.js corresponds to ~20 files in
> total. In a densely deployed environment, there could be hundreds of
> containers and thus thousands of backing files on one machine.
> 
> 
> 
> 
> Jeffle Xu (21):
>    cachefiles: extract write routine
>    cachefiles: notify user daemon when looking up cookie
>    cachefiles: unbind cachefiles gracefully in on-demand mode
>    cachefiles: notify user daemon when withdrawing cookie
>    cachefiles: implement on-demand read
>    cachefiles: enable on-demand read mode
>    cachefiles: add tracepoints for on-demand read mode
>    cachefiles: document on-demand read mode
>    erofs: make erofs_map_blocks() generally available
>    erofs: add fscache mode check helper
>    erofs: register fscache volume
>    erofs: add fscache context helper functions
>    erofs: add anonymous inode caching metadata for data blobs
>    erofs: add erofs_fscache_read_folios() helper
>    erofs: register fscache context for primary data blob
>    erofs: register fscache context for extra data blobs
>    erofs: implement fscache-based metadata read
>    erofs: implement fscache-based data read for non-inline layout
>    erofs: implement fscache-based data read for inline layout
>    erofs: implement fscache-based data readahead
>    erofs: add 'fsid' mount option
> 
>   .../filesystems/caching/cachefiles.rst        | 170 ++++++
>   fs/cachefiles/Kconfig                         |  11 +
>   fs/cachefiles/Makefile                        |   1 +
>   fs/cachefiles/daemon.c                        | 116 +++-
>   fs/cachefiles/interface.c                     |   2 +
>   fs/cachefiles/internal.h                      |  74 +++
>   fs/cachefiles/io.c                            |  76 ++-
>   fs/cachefiles/namei.c                         |  16 +-
>   fs/cachefiles/ondemand.c                      | 496 ++++++++++++++++++
>   fs/erofs/Kconfig                              |  10 +
>   fs/erofs/Makefile                             |   1 +
>   fs/erofs/data.c                               |  26 +-
>   fs/erofs/fscache.c                            | 365 +++++++++++++
>   fs/erofs/inode.c                              |   4 +
>   fs/erofs/internal.h                           |  49 ++
>   fs/erofs/super.c                              | 105 +++-
>   fs/erofs/sysfs.c                              |   4 +-
>   include/linux/fscache.h                       |   1 +
>   include/linux/netfs.h                         |   2 +
>   include/trace/events/cachefiles.h             | 176 +++++++
>   include/uapi/linux/cachefiles.h               |  68 +++
>   21 files changed, 1694 insertions(+), 79 deletions(-)
>   create mode 100644 fs/cachefiles/ondemand.c
>   create mode 100644 fs/erofs/fscache.c
>   create mode 100644 include/uapi/linux/cachefiles.h
> 
Hi Jeffle & Xiang,

Thanks for coming up with such an innovative solution. We are interested in
this and want to deploy it in our systems. We have run the tests from the
user guide and also did some error injection tests using the User Daemon
Demo provided by Jeffle. We hope it can become an upstream feature.

Thanks,
Jia

Tested-by: Jia Zhu <zhujia.zj@bytedance.com>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v9 10/21] erofs: add fscache mode check helper
  2022-04-15 12:36   ` Jeffle Xu
@ 2022-04-21  7:53     ` Gao Xiang
  -1 siblings, 0 replies; 96+ messages in thread
From: Gao Xiang @ 2022-04-21  7:53 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: dhowells, linux-cachefs, xiang, chao, linux-erofs, torvalds,
	gregkh, willy, linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry,
	eguan, linux-kernel, luodaowen.backend, tianzichen, fannaihao,
	zhangjiachen.jaycee

On Fri, Apr 15, 2022 at 08:36:03PM +0800, Jeffle Xu wrote:
> Until now, erofs has been a purely blockdev-based filesystem.
> 
> A new fscache-based mode is going to be introduced for erofs to support
> scenarios where on-demand read semantics is needed, e.g. container
> image distribution. In this case, erofs could be mounted from data blobs
> through fscache.
> 
> Add a helper checking which mode erofs works in, and twist the code in
> prep for the following fscache mode.

in preparation for the upcoming fscache mode.

> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>

Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>

Thanks,
Gao Xiang

> ---
>  fs/erofs/internal.h |  5 +++++
>  fs/erofs/super.c    | 44 +++++++++++++++++++++++++++++---------------
>  2 files changed, 34 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index fe9564e5091e..05a97533b1e9 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -161,6 +161,11 @@ struct erofs_sb_info {
>  #define set_opt(opt, option)	((opt)->mount_opt |= EROFS_MOUNT_##option)
>  #define test_opt(opt, option)	((opt)->mount_opt & EROFS_MOUNT_##option)
>  
> +static inline bool erofs_is_fscache_mode(struct super_block *sb)
> +{
> +	return IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && !sb->s_bdev;
> +}
> +
>  enum {
>  	EROFS_ZIP_CACHE_DISABLED,
>  	EROFS_ZIP_CACHE_READAHEAD,
> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> index 0c4b41130c2f..724d5ff0d78c 100644
> --- a/fs/erofs/super.c
> +++ b/fs/erofs/super.c
> @@ -259,15 +259,19 @@ static int erofs_init_devices(struct super_block *sb,
>  		}
>  		dis = ptr + erofs_blkoff(pos);
>  
> -		bdev = blkdev_get_by_path(dif->path,
> -					  FMODE_READ | FMODE_EXCL,
> -					  sb->s_type);
> -		if (IS_ERR(bdev)) {
> -			err = PTR_ERR(bdev);
> -			break;
> +		if (!erofs_is_fscache_mode(sb)) {
> +			bdev = blkdev_get_by_path(dif->path,
> +						  FMODE_READ | FMODE_EXCL,
> +						  sb->s_type);
> +			if (IS_ERR(bdev)) {
> +				err = PTR_ERR(bdev);
> +				break;
> +			}
> +			dif->bdev = bdev;
> +			dif->dax_dev = fs_dax_get_by_bdev(bdev,
> +							  &dif->dax_part_off);
>  		}
> -		dif->bdev = bdev;
> -		dif->dax_dev = fs_dax_get_by_bdev(bdev, &dif->dax_part_off);
> +
>  		dif->blocks = le32_to_cpu(dis->blocks);
>  		dif->mapped_blkaddr = le32_to_cpu(dis->mapped_blkaddr);
>  		sbi->total_blocks += dif->blocks;
> @@ -586,21 +590,28 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
>  
>  	sb->s_magic = EROFS_SUPER_MAGIC;
>  
> -	if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) {
> -		erofs_err(sb, "failed to set erofs blksize");
> -		return -EINVAL;
> -	}
> -
>  	sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
>  	if (!sbi)
>  		return -ENOMEM;
>  
>  	sb->s_fs_info = sbi;
>  	sbi->opt = ctx->opt;
> -	sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev, &sbi->dax_part_off);
>  	sbi->devs = ctx->devs;
>  	ctx->devs = NULL;
>  
> +	if (erofs_is_fscache_mode(sb)) {
> +		sb->s_blocksize = EROFS_BLKSIZ;
> +		sb->s_blocksize_bits = LOG_BLOCK_SIZE;
> +	} else {
> +		if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) {
> +			erofs_err(sb, "failed to set erofs blksize");
> +			return -EINVAL;
> +		}
> +
> +		sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev,
> +						  &sbi->dax_part_off);
> +	}
> +
>  	err = erofs_read_superblock(sb);
>  	if (err)
>  		return err;
> @@ -857,7 +868,10 @@ static int erofs_statfs(struct dentry *dentry, struct kstatfs *buf)
>  {
>  	struct super_block *sb = dentry->d_sb;
>  	struct erofs_sb_info *sbi = EROFS_SB(sb);
> -	u64 id = huge_encode_dev(sb->s_bdev->bd_dev);
> +	u64 id = 0;
> +
> +	if (!erofs_is_fscache_mode(sb))
> +		id = huge_encode_dev(sb->s_bdev->bd_dev);
>  
>  	buf->f_type = sb->s_magic;
>  	buf->f_bsize = EROFS_BLKSIZ;
> -- 
> 2.27.0
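
For readers skimming the diff, the effect of the new helper on the statfs
path can be modelled in user space. This is a rough sketch only: the toy_*
types and functions are hypothetical stand-ins, and the boolean parameter
replaces the compile-time IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; not the kernel's definitions. */
struct toy_bdev {
	uint64_t dev_id;
};

struct toy_sb {
	struct toy_bdev *bdev;   /* NULL when no block device backs the mount */
};

/* Same shape as the helper: feature compiled in AND no block device. */
static bool toy_is_fscache_mode(bool ondemand_enabled, const struct toy_sb *sb)
{
	return ondemand_enabled && sb->bdev == NULL;
}

/* Same shape as the statfs change: touch bdev only in blockdev mode. */
static uint64_t toy_statfs_id(bool ondemand_enabled, const struct toy_sb *sb)
{
	uint64_t id = 0;

	if (!toy_is_fscache_mode(ondemand_enabled, sb)) {
		/* blockdev mode is assumed to always have a bdev */
		assert(sb->bdev != NULL);
		id = sb->bdev->dev_id;
	}
	return id;
}
```

In this model a mount without a block device reports id 0, while a
blockdev-backed mount reports the device id as before.
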

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v9 16/21] erofs: register fscache context for extra data blobs
  2022-04-15 12:36   ` Jeffle Xu
@ 2022-04-21 10:58     ` Gao Xiang
  -1 siblings, 0 replies; 96+ messages in thread
From: Gao Xiang @ 2022-04-21 10:58 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: dhowells, linux-cachefs, xiang, chao, linux-erofs, torvalds,
	gregkh, willy, linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry,
	eguan, linux-kernel, luodaowen.backend, tianzichen, fannaihao,
	zhangjiachen.jaycee

On Fri, Apr 15, 2022 at 08:36:09PM +0800, Jeffle Xu wrote:
> Similar to the multi device mode, erofs could be mounted from one
> primary data blob (mandatory) and multiple extra data blobs (optional).
> 
> Register fscache context for each extra data blob.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>

Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>

Thanks,
Gao Xiang

> ---
>  fs/erofs/data.c     | 3 +++
>  fs/erofs/internal.h | 2 ++
>  fs/erofs/super.c    | 8 +++++++-
>  3 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
> index bc22642358ec..14b64d960541 100644
> --- a/fs/erofs/data.c
> +++ b/fs/erofs/data.c
> @@ -199,6 +199,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
>  	map->m_bdev = sb->s_bdev;
>  	map->m_daxdev = EROFS_SB(sb)->dax_dev;
>  	map->m_dax_part_off = EROFS_SB(sb)->dax_part_off;
> +	map->m_fscache = EROFS_SB(sb)->s_fscache;
>  
>  	if (map->m_deviceid) {
>  		down_read(&devs->rwsem);
> @@ -210,6 +211,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
>  		map->m_bdev = dif->bdev;
>  		map->m_daxdev = dif->dax_dev;
>  		map->m_dax_part_off = dif->dax_part_off;
> +		map->m_fscache = dif->fscache;
>  		up_read(&devs->rwsem);
>  	} else if (devs->extra_devices) {
>  		down_read(&devs->rwsem);
> @@ -227,6 +229,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
>  				map->m_bdev = dif->bdev;
>  				map->m_daxdev = dif->dax_dev;
>  				map->m_dax_part_off = dif->dax_part_off;
> +				map->m_fscache = dif->fscache;
>  				break;
>  			}
>  		}
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index 386658416159..fa488af8dfcf 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -49,6 +49,7 @@ typedef u32 erofs_blk_t;
>  
>  struct erofs_device_info {
>  	char *path;
> +	struct erofs_fscache *fscache;
>  	struct block_device *bdev;
>  	struct dax_device *dax_dev;
>  	u64 dax_part_off;
> @@ -482,6 +483,7 @@ static inline int z_erofs_map_blocks_iter(struct inode *inode,
>  #endif	/* !CONFIG_EROFS_FS_ZIP */
>  
>  struct erofs_map_dev {
> +	struct erofs_fscache *m_fscache;
>  	struct block_device *m_bdev;
>  	struct dax_device *m_daxdev;
>  	u64 m_dax_part_off;
> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> index 61dc900295f9..c6755bcae4a6 100644
> --- a/fs/erofs/super.c
> +++ b/fs/erofs/super.c
> @@ -259,7 +259,12 @@ static int erofs_init_devices(struct super_block *sb,
>  		}
>  		dis = ptr + erofs_blkoff(pos);
>  
> -		if (!erofs_is_fscache_mode(sb)) {
> +		if (erofs_is_fscache_mode(sb)) {
> +			err = erofs_fscache_register_cookie(sb, &dif->fscache,
> +							    dif->path, false);
> +			if (err)
> +				break;
> +		} else {
>  			bdev = blkdev_get_by_path(dif->path,
>  						  FMODE_READ | FMODE_EXCL,
>  						  sb->s_type);
> @@ -710,6 +715,7 @@ static int erofs_release_device_info(int id, void *ptr, void *data)
>  	fs_put_dax(dif->dax_dev);
>  	if (dif->bdev)
>  		blkdev_put(dif->bdev, FMODE_READ | FMODE_EXCL);
> +	erofs_fscache_unregister_cookie(&dif->fscache);
>  	kfree(dif->path);
>  	kfree(dif);
>  	return 0;
> -- 
> 2.27.0
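
The three branches visible in the erofs_map_dev() hunks (default to the
primary blob, explicit device id, search of the extra devices) can be
modelled roughly as below. This is a user-space sketch under assumptions:
struct toy_blob and toy_map_dev() are hypothetical, the 1-based device id
and the range-matching rule are guesses about code not shown in the diff,
and a name string stands in for the per-blob fscache context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_blob {
	const char *name;        /* stands in for the per-blob fscache context */
	uint64_t mapped_start;   /* start in the flat address space, 0 = unmapped */
	uint64_t length;         /* size of the mapped window in bytes */
};

/*
 * Pick the blob that backs address *pa.
 * - device_id != 0: use that extra blob directly (ids are 1-based here).
 * - device_id == 0: use an extra blob whose mapped window covers *pa and
 *   rebase *pa into it; otherwise fall back to the primary blob.
 * Returns NULL for an out-of-range device id.
 */
static const struct toy_blob *toy_map_dev(const struct toy_blob *primary,
					  const struct toy_blob *extra,
					  size_t nr_extra,
					  unsigned int device_id,
					  uint64_t *pa)
{
	assert(primary != NULL && pa != NULL);

	if (device_id) {
		if (device_id > nr_extra)
			return NULL;
		return &extra[device_id - 1];
	}

	for (size_t i = 0; i < nr_extra; i++) {
		const struct toy_blob *b = &extra[i];

		if (!b->mapped_start)
			continue;
		if (*pa >= b->mapped_start && *pa - b->mapped_start < b->length) {
			*pa -= b->mapped_start;
			return b;
		}
	}
	return primary;
}
```

The point of the patch is that whichever branch selects the device, the
matching fscache context travels with it, just as bdev and dax_dev already
do.
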

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v9 16/21] erofs: register fscache context for extra data blobs
@ 2022-04-21 10:58     ` Gao Xiang
  0 siblings, 0 replies; 96+ messages in thread
From: Gao Xiang @ 2022-04-21 10:58 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: tianzichen, linux-erofs, fannaihao, willy, linux-kernel,
	dhowells, joseph.qi, zhangjiachen.jaycee, linux-cachefs, gregkh,
	linux-fsdevel, luodaowen.backend, gerry, torvalds

On Fri, Apr 15, 2022 at 08:36:09PM +0800, Jeffle Xu wrote:
> Similar to the multi device mode, erofs could be mounted from one
> primary data blob (mandatory) and multiple extra data blobs (optional).
> 
> Register fscache context for each extra data blob.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>

Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>

Thanks,
Gao Xiang

> ---
>  fs/erofs/data.c     | 3 +++
>  fs/erofs/internal.h | 2 ++
>  fs/erofs/super.c    | 8 +++++++-
>  3 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
> index bc22642358ec..14b64d960541 100644
> --- a/fs/erofs/data.c
> +++ b/fs/erofs/data.c
> @@ -199,6 +199,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
>  	map->m_bdev = sb->s_bdev;
>  	map->m_daxdev = EROFS_SB(sb)->dax_dev;
>  	map->m_dax_part_off = EROFS_SB(sb)->dax_part_off;
> +	map->m_fscache = EROFS_SB(sb)->s_fscache;
>  
>  	if (map->m_deviceid) {
>  		down_read(&devs->rwsem);
> @@ -210,6 +211,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
>  		map->m_bdev = dif->bdev;
>  		map->m_daxdev = dif->dax_dev;
>  		map->m_dax_part_off = dif->dax_part_off;
> +		map->m_fscache = dif->fscache;
>  		up_read(&devs->rwsem);
>  	} else if (devs->extra_devices) {
>  		down_read(&devs->rwsem);
> @@ -227,6 +229,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
>  				map->m_bdev = dif->bdev;
>  				map->m_daxdev = dif->dax_dev;
>  				map->m_dax_part_off = dif->dax_part_off;
> +				map->m_fscache = dif->fscache;
>  				break;
>  			}
>  		}
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index 386658416159..fa488af8dfcf 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -49,6 +49,7 @@ typedef u32 erofs_blk_t;
>  
>  struct erofs_device_info {
>  	char *path;
> +	struct erofs_fscache *fscache;
>  	struct block_device *bdev;
>  	struct dax_device *dax_dev;
>  	u64 dax_part_off;
> @@ -482,6 +483,7 @@ static inline int z_erofs_map_blocks_iter(struct inode *inode,
>  #endif	/* !CONFIG_EROFS_FS_ZIP */
>  
>  struct erofs_map_dev {
> +	struct erofs_fscache *m_fscache;
>  	struct block_device *m_bdev;
>  	struct dax_device *m_daxdev;
>  	u64 m_dax_part_off;
> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> index 61dc900295f9..c6755bcae4a6 100644
> --- a/fs/erofs/super.c
> +++ b/fs/erofs/super.c
> @@ -259,7 +259,12 @@ static int erofs_init_devices(struct super_block *sb,
>  		}
>  		dis = ptr + erofs_blkoff(pos);
>  
> -		if (!erofs_is_fscache_mode(sb)) {
> +		if (erofs_is_fscache_mode(sb)) {
> +			err = erofs_fscache_register_cookie(sb, &dif->fscache,
> +							    dif->path, false);
> +			if (err)
> +				break;
> +		} else {
>  			bdev = blkdev_get_by_path(dif->path,
>  						  FMODE_READ | FMODE_EXCL,
>  						  sb->s_type);
> @@ -710,6 +715,7 @@ static int erofs_release_device_info(int id, void *ptr, void *data)
>  	fs_put_dax(dif->dax_dev);
>  	if (dif->bdev)
>  		blkdev_put(dif->bdev, FMODE_READ | FMODE_EXCL);
> +	erofs_fscache_unregister_cookie(&dif->fscache);
>  	kfree(dif->path);
>  	kfree(dif);
>  	return 0;
> -- 
> 2.27.0
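
For readers following the device lookup above, here is a minimal userspace sketch of how a device id could select either the primary blob or one of the extra blobs. The names blob_ctx, fs_model and model_map_dev are hypothetical stand-ins, not the kernel structures, and the sketch assumes device id 0 means the primary blob while id N (N >= 1) means extra blob N - 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-blob fscache context. */
struct blob_ctx {
	const char *name;
};

/* Hypothetical, simplified model of the per-filesystem blob table. */
struct fs_model {
	struct blob_ctx *primary;	/* mandatory primary data blob */
	struct blob_ctx **extra;	/* optional extra data blobs */
	size_t nr_extra;
};

/*
 * Assumption: device id 0 selects the primary blob and id N (N >= 1)
 * selects extra blob N - 1. Returns NULL for an out-of-range id.
 */
static struct blob_ctx *model_map_dev(const struct fs_model *fs,
				      unsigned int deviceid)
{
	assert(fs && fs->primary);

	if (deviceid == 0)
		return fs->primary;
	if (deviceid > fs->nr_extra)
		return NULL;
	return fs->extra[deviceid - 1];
}
```

The sketch leaves out the flat address space case and the locking that the real lookup performs.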

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v9 18/21] erofs: implement fscache-based data read for non-inline layout
  2022-04-15 12:36   ` Jeffle Xu
@ 2022-04-21 11:13     ` Gao Xiang
  -1 siblings, 0 replies; 96+ messages in thread
From: Gao Xiang @ 2022-04-21 11:13 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: dhowells, linux-cachefs, xiang, chao, linux-erofs, torvalds,
	gregkh, willy, linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry,
	eguan, linux-kernel, luodaowen.backend, tianzichen, fannaihao,
	zhangjiachen.jaycee

On Fri, Apr 15, 2022 at 08:36:11PM +0800, Jeffle Xu wrote:
> Implement the data plane of reading data from data blobs over fscache
> for non-inline layout.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> ---
>  fs/erofs/fscache.c  | 51 +++++++++++++++++++++++++++++++++++++++++++++
>  fs/erofs/inode.c    |  4 ++++
>  fs/erofs/internal.h |  2 ++
>  3 files changed, 57 insertions(+)
> 
> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
> index 3f00eb34ac35..b799b0fe1b67 100644
> --- a/fs/erofs/fscache.c
> +++ b/fs/erofs/fscache.c
> @@ -84,10 +84,61 @@ static int erofs_fscache_meta_readpage(struct file *data, struct page *page)
>  	return ret;
>  }
>  
> +static int erofs_fscache_readpage(struct file *file, struct page *page)
> +{
> +	struct folio *folio = page_folio(page);
> +	struct inode *inode = folio_mapping(folio)->host;
> +	struct super_block *sb = inode->i_sb;
> +	struct erofs_map_blocks map;
> +	struct erofs_map_dev mdev;
> +	erofs_off_t pos;
> +	loff_t pstart;
> +	int ret = 0;

Redundant assignment now?

> +
> +	DBG_BUGON(folio_size(folio) != EROFS_BLKSIZ);
> +
> +	pos = folio_pos(folio);
> +	map.m_la = pos;
> +
> +	ret = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW);
> +	if (ret)
> +		goto out_unlock;
> +
> +	if (!(map.m_flags & EROFS_MAP_MAPPED)) {
> +		folio_zero_range(folio, 0, folio_size(folio));
> +		goto out_uptodate;
> +	}
> +
> +	mdev = (struct erofs_map_dev) {
> +		.m_deviceid = map.m_deviceid,
> +		.m_pa = map.m_pa,
> +	};
> +
> +	ret = erofs_map_dev(sb, &mdev);
> +	if (ret)
> +		goto out_unlock;
> +
> +	pstart = mdev.m_pa + (pos - map.m_la);
> +	ret = erofs_fscache_read_folios(mdev.m_fscache->cookie,
> +			folio_mapping(folio), folio_pos(folio),
> +			folio_size(folio), pstart);
> +
> +out_uptodate:
> +	if (!ret)
> +		folio_mark_uptodate(folio);
> +out_unlock:
> +	folio_unlock(folio);
> +	return ret;
> +}
> +
>  static const struct address_space_operations erofs_fscache_meta_aops = {
>  	.readpage = erofs_fscache_meta_readpage,
>  };
>  
> +const struct address_space_operations erofs_fscache_access_aops = {
> +	.readpage = erofs_fscache_readpage,
> +};
> +
>  /*
>   * Create an fscache context for data blob.
>   * Return: 0 on success and allocated fscache context is assigned to @fscache,
> diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
> index e8b37ba5e9ad..8d3f56c6469b 100644
> --- a/fs/erofs/inode.c
> +++ b/fs/erofs/inode.c
> @@ -297,6 +297,10 @@ static int erofs_fill_inode(struct inode *inode, int isdir)
>  		goto out_unlock;
>  	}
>  	inode->i_mapping->a_ops = &erofs_raw_access_aops;
> +#ifdef CONFIG_EROFS_FS_ONDEMAND
> +	if (erofs_is_fscache_mode(inode->i_sb))
> +		inode->i_mapping->a_ops = &erofs_fscache_access_aops;
> +#endif
>  
>  out_unlock:
>  	erofs_put_metabuf(&buf);
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index fa488af8dfcf..c8f6ac910976 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -639,6 +639,8 @@ int erofs_fscache_register_cookie(struct super_block *sb,
>  				  struct erofs_fscache **fscache,
>  				  char *name, bool need_inode);
>  void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache);
> +
> +extern const struct address_space_operations erofs_fscache_access_aops;
>  #else
>  static inline int erofs_fscache_register_fs(struct super_block *sb)
>  {
> -- 
> 2.27.0
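
The offset arithmetic in erofs_fscache_readpage() (pstart = mdev.m_pa + (pos - map.m_la)) can be illustrated with a small userspace sketch. The extent_model type and model_pstart() helper are hypothetical; the sketch assumes the mapping step has already returned the extent that covers pos.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one mapped extent. */
struct extent_model {
	uint64_t la;	/* logical (file) offset where the extent starts */
	uint64_t pa;	/* physical (blob) offset where the extent starts */
	uint64_t llen;	/* extent length in bytes */
};

/* Translate a file position inside the extent to an offset in the blob. */
static uint64_t model_pstart(const struct extent_model *ext, uint64_t pos)
{
	/* pos must fall inside the extent, otherwise the mapping is wrong */
	assert(pos >= ext->la && pos - ext->la < ext->llen);

	return ext->pa + (pos - ext->la);
}
```

For example, with an extent starting at file offset 8192 and blob offset 1048576, file position 12288 lands 4096 bytes into the extent.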

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v9 19/21] erofs: implement fscache-based data read for inline layout
  2022-04-15 12:36   ` Jeffle Xu
@ 2022-04-21 11:14     ` Gao Xiang
  -1 siblings, 0 replies; 96+ messages in thread
From: Gao Xiang @ 2022-04-21 11:14 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: dhowells, linux-cachefs, xiang, chao, linux-erofs, torvalds,
	gregkh, willy, linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry,
	eguan, linux-kernel, luodaowen.backend, tianzichen, fannaihao,
	zhangjiachen.jaycee

On Fri, Apr 15, 2022 at 08:36:12PM +0800, Jeffle Xu wrote:
> Implement the data plane of reading data from data blobs over fscache
> for inline layout.
> 
> For the heading non-inline part, the data plane for non-inline layout is
> reused, while only the tail packing part needs special handling.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>

Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>

Thanks,
Gao Xiang

> ---
>  fs/erofs/fscache.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
> index b799b0fe1b67..08849c15500f 100644
> --- a/fs/erofs/fscache.c
> +++ b/fs/erofs/fscache.c
> @@ -84,6 +84,33 @@ static int erofs_fscache_meta_readpage(struct file *data, struct page *page)
>  	return ret;
>  }
>  
> +static int erofs_fscache_readpage_inline(struct folio *folio,
> +					 struct erofs_map_blocks *map)
> +{
> +	struct super_block *sb = folio_mapping(folio)->host->i_sb;
> +	struct erofs_buf buf = __EROFS_BUF_INITIALIZER;
> +	erofs_blk_t blknr;
> +	size_t offset, len;
> +	void *src, *dst;
> +
> +	/* For tail packing layout, the offset may be non-zero. */
> +	offset = erofs_blkoff(map->m_pa);
> +	blknr = erofs_blknr(map->m_pa);
> +	len = map->m_llen;
> +
> +	src = erofs_read_metabuf(&buf, sb, blknr, EROFS_KMAP);
> +	if (IS_ERR(src))
> +		return PTR_ERR(src);
> +
> +	dst = kmap_local_folio(folio, 0);
> +	memcpy(dst, src + offset, len);
> +	memset(dst + len, 0, PAGE_SIZE - len);
> +	kunmap_local(dst);
> +
> +	erofs_put_metabuf(&buf);
> +	return 0;
> +}
> +
>  static int erofs_fscache_readpage(struct file *file, struct page *page)
>  {
>  	struct folio *folio = page_folio(page);
> @@ -109,6 +136,11 @@ static int erofs_fscache_readpage(struct file *file, struct page *page)
>  		goto out_uptodate;
>  	}
>  
> +	if (map.m_flags & EROFS_MAP_META) {
> +		ret = erofs_fscache_readpage_inline(folio, &map);
> +		goto out_uptodate;
> +	}
> +
>  	mdev = (struct erofs_map_dev) {
>  		.m_deviceid = map.m_deviceid,
>  		.m_pa = map.m_pa,
> -- 
> 2.27.0
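
The copy-then-zero step in erofs_fscache_readpage_inline() can be sketched in userspace as below. MODEL_PAGE_SIZE and model_fill_inline() are hypothetical names, and the bounds check is an addition for the sketch only; the quoted patch does not contain one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u

/*
 * Copy len bytes of inline data, located at byte offset `offset` inside
 * the metadata block `src`, to the start of the page `dst`, then zero
 * the rest of the page. Both buffers are MODEL_PAGE_SIZE bytes long.
 * Returns 0 on success, -1 if the request does not fit (sketch-only check).
 */
static int model_fill_inline(unsigned char *dst, const unsigned char *src,
			     size_t offset, size_t len)
{
	assert(dst && src);

	if (offset > MODEL_PAGE_SIZE || len > MODEL_PAGE_SIZE - offset)
		return -1;

	memcpy(dst, src + offset, len);
	memset(dst + len, 0, MODEL_PAGE_SIZE - len);
	return 0;
}
```

Zeroing the tail matters because the page cache page may later be exposed beyond the inline data length, for instance through mmap.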

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v9 20/21] erofs: implement fscache-based data readahead
  2022-04-15 12:36   ` Jeffle Xu
@ 2022-04-21 11:51     ` Gao Xiang
  -1 siblings, 0 replies; 96+ messages in thread
From: Gao Xiang @ 2022-04-21 11:51 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: dhowells, linux-cachefs, xiang, chao, linux-erofs, torvalds,
	gregkh, willy, linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry,
	eguan, linux-kernel, luodaowen.backend, tianzichen, fannaihao,
	zhangjiachen.jaycee

On Fri, Apr 15, 2022 at 08:36:13PM +0800, Jeffle Xu wrote:
> Implement fscache-based data readahead. Also register an individual
> bdi for each erofs instance to enable readahead.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> ---
>  fs/erofs/fscache.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++
>  fs/erofs/super.c   |  4 +++
>  2 files changed, 90 insertions(+)
> 
> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
> index 08849c15500f..eaa50692ddba 100644
> --- a/fs/erofs/fscache.c
> +++ b/fs/erofs/fscache.c
> @@ -163,12 +163,98 @@ static int erofs_fscache_readpage(struct file *file, struct page *page)
>  	return ret;
>  }
>  
> +static void erofs_fscache_unlock_folios(struct readahead_control *rac,
> +					size_t len)
> +{
> +	while (len) {
> +		struct folio *folio = readahead_folio(rac);
> +
> +		len -= folio_size(folio);
> +		folio_mark_uptodate(folio);
> +		folio_unlock(folio);
> +	}
> +}
> +
> +static void erofs_fscache_readahead(struct readahead_control *rac)
> +{
> +	struct inode *inode = rac->mapping->host;
> +	struct super_block *sb = inode->i_sb;
> +	size_t len, count, done = 0;
> +	erofs_off_t pos;
> +	loff_t start, offset;
> +	int ret;
> +
> +	if (!readahead_count(rac))
> +		return;
> +
> +	start = readahead_pos(rac);
> +	len = readahead_length(rac);
> +
> +	do {
> +		struct erofs_map_blocks map;
> +		struct erofs_map_dev mdev;
> +
> +		pos = start + done;
> +		map.m_la = pos;
> +
> +		ret = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW);
> +		if (ret)
> +			return;
> +
> +		offset = start + done;
> +		count = min_t(size_t, map.m_llen - (pos - map.m_la),
> +			      len - done);
> +
> +		if (!(map.m_flags & EROFS_MAP_MAPPED)) {
> +			struct iov_iter iter;
> +
> +			iov_iter_xarray(&iter, READ, &rac->mapping->i_pages,
> +					offset, count);
> +			iov_iter_zero(count, &iter);
> +
> +			erofs_fscache_unlock_folios(rac, count);
> +			ret = count;
> +			continue;
> +		}
> +
> +		if (map.m_flags & EROFS_MAP_META) {
> +			struct folio *folio = readahead_folio(rac);
> +
> +			ret = erofs_fscache_readpage_inline(folio, &map);
> +			if (!ret) {
> +				folio_mark_uptodate(folio);
> +				ret = folio_size(folio);
> +			}
> +
> +			folio_unlock(folio);
> +			continue;
> +		}
> +
> +		mdev = (struct erofs_map_dev) {
> +			.m_deviceid = map.m_deviceid,
> +			.m_pa = map.m_pa,
> +		};
> +		ret = erofs_map_dev(sb, &mdev);
> +		if (ret)
> +			return;
> +
> +		ret = erofs_fscache_read_folios(mdev.m_fscache->cookie,
> +				rac->mapping, offset, count,
> +				mdev.m_pa + (pos - map.m_la));
> +		if (!ret) {
> +			erofs_fscache_unlock_folios(rac, count);
> +			ret = count;
> +		}

I think this really needs a comment explaining why we don't need to
unlock folios in the error cases.

Thanks,
Gao Xiang

> +	} while (ret > 0 && ((done += ret) < len));
> +}
> +
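
The progress accounting of the do/while loop above can be modeled in userspace as follows. This is only a sketch of the chunking arithmetic (count = min(bytes left in the extent, bytes left in the request)); it does not model folio locking or the fscache read itself, and extent_model, model_lookup() and model_readahead() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a logical extent: start offset and length. */
struct extent_model {
	uint64_t la;
	uint64_t llen;
};

/* Linear scan for the extent covering pos; NULL if there is none. */
static const struct extent_model *
model_lookup(const struct extent_model *exts, size_t nr, uint64_t pos)
{
	for (size_t i = 0; i < nr; i++)
		if (pos >= exts[i].la && pos - exts[i].la < exts[i].llen)
			return &exts[i];
	return NULL;
}

/*
 * Walk [start, start + len) one extent at a time, recording each chunk
 * size in chunks[]. Stops early when no extent covers the position,
 * which stands in for the early returns on error in the patch.
 * Returns the number of bytes processed.
 */
static size_t model_readahead(const struct extent_model *exts, size_t nr,
			      uint64_t start, size_t len,
			      size_t *chunks, size_t max_chunks,
			      size_t *nr_chunks)
{
	size_t done = 0;

	assert(chunks && nr_chunks);
	*nr_chunks = 0;

	while (done < len) {
		uint64_t pos = start + done;
		const struct extent_model *ext = model_lookup(exts, nr, pos);
		uint64_t left;
		size_t count;

		if (!ext || *nr_chunks == max_chunks)
			break;

		left = ext->llen - (pos - ext->la);
		count = left < len - done ? (size_t)left : len - done;
		chunks[(*nr_chunks)++] = count;
		done += count;
	}
	return done;
}
```

A request that spans three extents is split into three chunks, and a request that runs past the last extent stops short with a partial byte count.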

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v9 21/21] erofs: add 'fsid' mount option
  2022-04-15 12:36   ` Jeffle Xu
@ 2022-04-21 11:59     ` Gao Xiang
  -1 siblings, 0 replies; 96+ messages in thread
From: Gao Xiang @ 2022-04-21 11:59 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: dhowells, linux-cachefs, xiang, chao, linux-erofs, torvalds,
	gregkh, willy, linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry,
	eguan, linux-kernel, luodaowen.backend, tianzichen, fannaihao,
	zhangjiachen.jaycee

On Fri, Apr 15, 2022 at 08:36:14PM +0800, Jeffle Xu wrote:
> Introduce the 'fsid' mount option to enable on-demand read semantics,
> in which case erofs is mounted from data blobs. Users can specify the
> name of the primary data blob with this mount option.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>

Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>

Thanks,
Gao Xiang

> ---
>  fs/erofs/super.c | 31 ++++++++++++++++++++++++++++++-
>  fs/erofs/sysfs.c |  4 ++--
>  2 files changed, 32 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> index f68ba929100d..4a623630e1c4 100644
> --- a/fs/erofs/super.c
> +++ b/fs/erofs/super.c
> @@ -371,6 +371,8 @@ static int erofs_read_superblock(struct super_block *sb)
>  
>  	if (erofs_sb_has_ztailpacking(sbi))
>  		erofs_info(sb, "EXPERIMENTAL compressed inline data feature in use. Use at your own risk!");
> +	if (erofs_is_fscache_mode(sb))
> +		erofs_info(sb, "EXPERIMENTAL fscache-based on-demand read feature in use. Use at your own risk!");
>  out:
>  	erofs_put_metabuf(&buf);
>  	return ret;
> @@ -399,6 +401,7 @@ enum {
>  	Opt_dax,
>  	Opt_dax_enum,
>  	Opt_device,
> +	Opt_fsid,
>  	Opt_err
>  };
>  
> @@ -423,6 +426,7 @@ static const struct fs_parameter_spec erofs_fs_parameters[] = {
>  	fsparam_flag("dax",             Opt_dax),
>  	fsparam_enum("dax",		Opt_dax_enum, erofs_dax_param_enums),
>  	fsparam_string("device",	Opt_device),
> +	fsparam_string("fsid",		Opt_fsid),
>  	{}
>  };
>  
> @@ -518,6 +522,16 @@ static int erofs_fc_parse_param(struct fs_context *fc,
>  		}
>  		++ctx->devs->extra_devices;
>  		break;
> +	case Opt_fsid:
> +#ifdef CONFIG_EROFS_FS_ONDEMAND
> +		kfree(ctx->opt.fsid);
> +		ctx->opt.fsid = kstrdup(param->string, GFP_KERNEL);
> +		if (!ctx->opt.fsid)
> +			return -ENOMEM;
> +#else
> +		errorfc(fc, "fsid option not supported");
> +#endif
> +		break;
>  	default:
>  		return -ENOPARAM;
>  	}
> @@ -604,6 +618,7 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
>  
>  	sb->s_fs_info = sbi;
>  	sbi->opt = ctx->opt;
> +	ctx->opt.fsid = NULL;
>  	sbi->devs = ctx->devs;
>  	ctx->devs = NULL;
>  
> @@ -690,6 +705,11 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
>  
>  static int erofs_fc_get_tree(struct fs_context *fc)
>  {
> +	struct erofs_fs_context *ctx = fc->fs_private;
> +
> +	if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && ctx->opt.fsid)
> +		return get_tree_nodev(fc, erofs_fc_fill_super);
> +
>  	return get_tree_bdev(fc, erofs_fc_fill_super);
>  }
>  
> @@ -739,6 +759,7 @@ static void erofs_fc_free(struct fs_context *fc)
>  	struct erofs_fs_context *ctx = fc->fs_private;
>  
>  	erofs_free_dev_context(ctx->devs);
> +	kfree(ctx->opt.fsid);
>  	kfree(ctx);
>  }
>  
> @@ -779,7 +800,10 @@ static void erofs_kill_sb(struct super_block *sb)
>  
>  	WARN_ON(sb->s_magic != EROFS_SUPER_MAGIC);
>  
> -	kill_block_super(sb);
> +	if (erofs_is_fscache_mode(sb))
> +		generic_shutdown_super(sb);
> +	else
> +		kill_block_super(sb);
>  
>  	sbi = EROFS_SB(sb);
>  	if (!sbi)
> @@ -789,6 +813,7 @@ static void erofs_kill_sb(struct super_block *sb)
>  	fs_put_dax(sbi->dax_dev);
>  	erofs_fscache_unregister_cookie(&sbi->s_fscache);
>  	erofs_fscache_unregister_fs(sb);
> +	kfree(sbi->opt.fsid);
>  	kfree(sbi);
>  	sb->s_fs_info = NULL;
>  }
> @@ -938,6 +963,10 @@ static int erofs_show_options(struct seq_file *seq, struct dentry *root)
>  		seq_puts(seq, ",dax=always");
>  	if (test_opt(opt, DAX_NEVER))
>  		seq_puts(seq, ",dax=never");
> +#ifdef CONFIG_EROFS_FS_ONDEMAND
> +	if (opt->fsid)
> +		seq_printf(seq, ",fsid=%s", opt->fsid);
> +#endif
>  	return 0;
>  }
>  
> diff --git a/fs/erofs/sysfs.c b/fs/erofs/sysfs.c
> index f3babf1e6608..c1383e508bbe 100644
> --- a/fs/erofs/sysfs.c
> +++ b/fs/erofs/sysfs.c
> @@ -205,8 +205,8 @@ int erofs_register_sysfs(struct super_block *sb)
>  
>  	sbi->s_kobj.kset = &erofs_root;
>  	init_completion(&sbi->s_kobj_unregister);
> -	err = kobject_init_and_add(&sbi->s_kobj, &erofs_sb_ktype, NULL,
> -				   "%s", sb->s_id);
> +	err = kobject_init_and_add(&sbi->s_kobj, &erofs_sb_ktype, NULL, "%s",
> +			erofs_is_fscache_mode(sb) ? sbi->opt.fsid : sb->s_id);
>  	if (err)
>  		goto put_sb_kobj;
>  	return 0;
> -- 
> 2.27.0
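
The handling of the fsid string in the patch above (duplicate it in the parser, copy the options into sbi, then clear the pointer in the context) is an ownership hand-off that keeps the two free paths from freeing the same string. A minimal userspace sketch of that idiom follows; all of the model_* names and struct types are hypothetical, and malloc()/free() stand in for kstrdup()/kfree().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct opts_model { char *fsid; };
struct ctx_model { struct opts_model opt; };	/* mount-time context */
struct sbi_model { struct opts_model opt; };	/* per-superblock info */

/* Parse step: a later fsid= option replaces an earlier one. */
static int model_parse_fsid(struct ctx_model *ctx, const char *value)
{
	size_t n;

	assert(ctx && value);
	n = strlen(value) + 1;

	free(ctx->opt.fsid);
	ctx->opt.fsid = malloc(n);
	if (!ctx->opt.fsid)
		return -1;
	memcpy(ctx->opt.fsid, value, n);
	return 0;
}

/* Fill-super step: shallow copy, then hand ownership to sbi. */
static void model_fill_super(struct sbi_model *sbi, struct ctx_model *ctx)
{
	sbi->opt = ctx->opt;
	ctx->opt.fsid = NULL;	/* ctx must not free the string any more */
}

/* Context teardown: frees the string only if it was never handed over. */
static void model_free_ctx(struct ctx_model *ctx)
{
	free(ctx->opt.fsid);
	ctx->opt.fsid = NULL;
}

/* Superblock teardown: frees the string that sbi now owns. */
static void model_kill_sb(struct sbi_model *sbi)
{
	free(sbi->opt.fsid);
	sbi->opt.fsid = NULL;
}
```

Because free(NULL) is a no-op, both teardown paths can run unconditionally in either order once the pointer has been cleared on the side that gave it up.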

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v9 17/21] erofs: implement fscache-based metadata read
  2022-04-15 12:36   ` Jeffle Xu
@ 2022-04-21 13:03     ` Gao Xiang
  -1 siblings, 0 replies; 96+ messages in thread
From: Gao Xiang @ 2022-04-21 13:03 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: dhowells, linux-cachefs, xiang, chao, linux-erofs, torvalds,
	gregkh, willy, linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry,
	eguan, linux-kernel, luodaowen.backend, tianzichen, fannaihao,
	zhangjiachen.jaycee

On Fri, Apr 15, 2022 at 08:36:10PM +0800, Jeffle Xu wrote:
> Implement the data plane of reading metadata from primary data blob
> over fscache.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> ---
>  fs/erofs/data.c    | 19 +++++++++++++++----
>  fs/erofs/fscache.c | 27 +++++++++++++++++++++++++++
>  2 files changed, 42 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
> index 14b64d960541..bb9c1fd48c19 100644
> --- a/fs/erofs/data.c
> +++ b/fs/erofs/data.c
> @@ -6,6 +6,7 @@
>   */
>  #include "internal.h"
>  #include <linux/prefetch.h>
> +#include <linux/sched/mm.h>
>  #include <linux/dax.h>
>  #include <trace/events/erofs.h>
>  
> @@ -35,14 +36,20 @@ void *erofs_bread(struct erofs_buf *buf, struct inode *inode,
>  	erofs_off_t offset = blknr_to_addr(blkaddr);
>  	pgoff_t index = offset >> PAGE_SHIFT;
>  	struct page *page = buf->page;
> +	struct folio *folio;
> +	unsigned int nofs_flag;
>  
>  	if (!page || page->index != index) {
>  		erofs_put_metabuf(buf);
> -		page = read_cache_page_gfp(mapping, index,
> -				mapping_gfp_constraint(mapping, ~__GFP_FS));
> -		if (IS_ERR(page))
> -			return page;
> +
> +		nofs_flag = memalloc_nofs_save();
> +		folio = read_cache_folio(mapping, index, NULL, NULL);
> +		memalloc_nofs_restore(nofs_flag);
> +		if (IS_ERR(folio))
> +			return folio;
> +
>  		/* should already be PageUptodate, no need to lock page */
> +		page = folio_file_page(folio, index);
>  		buf->page = page;
>  	}
>  	if (buf->kmap_type == EROFS_NO_KMAP) {
> @@ -63,6 +70,10 @@ void *erofs_bread(struct erofs_buf *buf, struct inode *inode,
>  void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
>  			 erofs_blk_t blkaddr, enum erofs_kmap_type type)
>  {
> +	if (erofs_is_fscache_mode(sb))
> +		return erofs_bread(buf, EROFS_SB(sb)->s_fscache->inode,
> +				   blkaddr, type);
> +
>  	return erofs_bread(buf, sb->s_bdev->bd_inode, blkaddr, type);
>  }
>  
> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
> index 066f68c062e2..3f00eb34ac35 100644
> --- a/fs/erofs/fscache.c
> +++ b/fs/erofs/fscache.c
> @@ -58,7 +58,34 @@ static int erofs_fscache_read_folios(struct fscache_cookie *cookie,
>  	return ret;
>  }
>  
> +static int erofs_fscache_meta_readpage(struct file *data, struct page *page)
> +{
> +	int ret;
> +	struct folio *folio = page_folio(page);
> +	struct super_block *sb = folio_mapping(folio)->host->i_sb;
> +	struct erofs_map_dev mdev = {
> +		.m_deviceid = 0,
> +		.m_pa = folio_pos(folio),
> +	};
> +
> +	ret = erofs_map_dev(sb, &mdev);
> +	if (ret)
> +		goto out;
> +
> +	ret = erofs_fscache_read_folios(mdev.m_fscache->cookie,
> +			folio_mapping(folio), folio_pos(folio),
> +			folio_size(folio), mdev.m_pa);
> +	if (ret)
> +		goto out;
> +
> +	folio_mark_uptodate(folio);

	if (!ret)
		folio_mark_uptodate(folio);

Thanks,
Gao Xiang

> +out:
> +	folio_unlock(folio);
> +	return ret;
> +}
> +
>  static const struct address_space_operations erofs_fscache_meta_aops = {
> +	.readpage = erofs_fscache_meta_readpage,
>  };
>  
>  /*
> -- 
> 2.27.0

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v9 01/21] cachefiles: extract write routine
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-21 13:24   ` David Howells
  -1 siblings, 0 replies; 96+ messages in thread
From: David Howells @ 2022-04-21 13:24 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: dhowells, linux-cachefs, xiang, chao, linux-erofs, torvalds,
	gregkh, willy, linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry,
	eguan, linux-kernel, luodaowen.backend, tianzichen, fannaihao,
	zhangjiachen.jaycee

Jeffle Xu <jefflexu@linux.alibaba.com> wrote:

> Extract the generic routine of writing data to cache files, and make it
> generally available.
> 
> This will be used by the following patch implementing on-demand read
> mode. Since it's called inside cachefiles module in this case, make the
> interface generic and unrelated to netfs_cache_resources.
> 
> It is worth noting that, ki->inval_counter is not initialized after
> this cleanup. It shall not make any visible difference, since
> inval_counter is no longer used in the write completion routine, i.e.
> cachefiles_write_complete().
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>

Acked-by: David Howells <dhowells@redhat.com>


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v9 02/21] cachefiles: notify user daemon when looking up cookie
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-21 13:57   ` David Howells
  -1 siblings, 0 replies; 96+ messages in thread
From: David Howells @ 2022-04-21 13:57 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: dhowells, linux-cachefs, xiang, chao, linux-erofs, torvalds,
	gregkh, willy, linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry,
	eguan, linux-kernel, luodaowen.backend, tianzichen, fannaihao,
	zhangjiachen.jaycee

Jeffle Xu <jefflexu@linux.alibaba.com> wrote:

> +	help
> +	  This permits on-demand read mode of cachefiles.  In this mode, when
> +	  cache miss, the cachefiles backend instead of netfs, is responsible
> +	  for fetching data, e.g. through user daemon.

How about:

	help
	  This permits userspace to enable the cachefiles on-demand read mode.
	  In this mode, when a cache miss occurs, responsibility for fetching
	  the data lies with the cachefiles backend instead of with the netfs
	  and is delegated to userspace.

> +	/*
> +	 * 1) Cache has been marked as dead state, and then 2) flush all
> +	 * pending requests in @reqs xarray. The barrier inside set_bit()
> +	 * will ensure that above two ops won't be reordered.
> +	 */

What set_bit()?  What "above two ops"?  And that's not how barriers work; they
provide a partial ordering relative to another pair of barriered ops.

Also, set_bit() can't be relied upon to imply a barrier - see
Documentation/memory-barriers.txt.
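
To illustrate the pairing point, here is a minimal userspace sketch using C11
atomics rather than kernel primitives.  The names (cache_dead, pending,
submit_side, release_side) are hypothetical and this is not the cachefiles
code; it only assumes a pattern in which one side marks the cache dead and then
flushes pending requests, while the other side queues a request and then checks
the dead flag.  Each side needs its own full fence, and the two fences pair with
each other.  In kernel code the fence after the flag store would typically be
smp_mb__after_atomic() following set_bit().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool cache_dead;  /* models the "dead" flag bit */
static atomic_int pending;      /* models the number of queued requests */

/* Drop every queued request and report how many were dropped. */
static int flush_pending(void)
{
	return atomic_exchange_explicit(&pending, 0, memory_order_relaxed);
}

/* Release side: store the flag, full fence, then look at the queue. */
static int release_side(void)
{
	int flushed;

	atomic_store_explicit(&cache_dead, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst); /* pairs with submit_side() */
	flushed = flush_pending();
	assert(flushed >= 0);
	return flushed;
}

/* Submit side: queue a request, full fence, then look at the flag. */
static bool submit_side(void)
{
	atomic_fetch_add_explicit(&pending, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst); /* pairs with release_side() */
	if (atomic_load_explicit(&cache_dead, memory_order_relaxed)) {
		flush_pending(); /* cache is dead: do not leave the request queued */
		return false;
	}
	return true;
}
```

With both fences present, either the release side sees the queued request and
flushes it, or the submit side sees the dead flag and backs out.  A fence on
only one side does not give that guarantee.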

> +	if (IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) &&
> +	    test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags)) {

It might be worth abstracting this into an inline function in internal.h:

	static inline bool cachefiles_in_ondemand_mode(struct cachefiles_cache *cache)
	{
		return IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) &&
			test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags);
	}

> +#ifdef CONFIG_CACHEFILES_ONDEMAND

This looks like it ought to be superfluous, given the preceding test - though
I can see why you need it:

> +#ifdef CONFIG_CACHEFILES_ONDEMAND
> +	struct xarray			reqs;		/* xarray of pending on-demand requests */
> +	struct xarray			ondemand_ids;	/* xarray for ondemand_id allocation */
> +	u32				ondemand_id_next;
> +#endif

I'm tempted to say that you should just make them non-conditional.  It's not
like there's likely to be more than one or two cachefiles_cache structs on a
system.
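
A small userspace model of the combined effect, as a sketch only: once the
fields exist unconditionally, a helper built on a compile-time constant can
guard the code without an inner #ifdef, because the compiler still type-checks
the guarded branch and simply discards it when the option is off.  Everything
here is a stand-in: ONDEMAND_ENABLED models IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND),
nr_reqs models the emptiness test on the reqs xarray, and cache_model,
in_ondemand_mode and poll_mask are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef ONDEMAND_ENABLED
#define ONDEMAND_ENABLED 1      /* models IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) */
#endif
#define ONDEMAND_MODE_BIT 0     /* models CACHEFILES_ONDEMAND_MODE */
#define MODEL_EPOLLIN 0x001u    /* models EPOLLIN */

struct cache_model {
	unsigned long flags;
	unsigned int nr_reqs;   /* always present, whatever the config says */
};

/* True only if the feature is compiled in and the mode bit is set. */
static inline bool in_ondemand_mode(const struct cache_model *cache)
{
	return ONDEMAND_ENABLED &&
		(cache->flags & (1UL << ONDEMAND_MODE_BIT)) != 0;
}

/* Report readability when on-demand mode is active and requests are queued. */
static unsigned int poll_mask(const struct cache_model *cache)
{
	unsigned int mask = 0;

	assert(cache != NULL);
	if (in_ondemand_mode(cache) && cache->nr_reqs > 0)
		mask |= MODEL_EPOLLIN;
	return mask;
}
```

Building the same file with -DONDEMAND_ENABLED=0 still compiles, and
poll_mask() then always returns 0.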

David


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v9 03/21] cachefiles: unbind cachefiles gracefully in on-demand mode
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-21 14:02   ` David Howells
  -1 siblings, 0 replies; 96+ messages in thread
From: David Howells @ 2022-04-21 14:02 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: dhowells, linux-cachefs, xiang, chao, linux-erofs, torvalds,
	gregkh, willy, linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry,
	eguan, linux-kernel, luodaowen.backend, tianzichen, fannaihao,
	zhangjiachen.jaycee

Jeffle Xu <jefflexu@linux.alibaba.com> wrote:

> +	struct kref			unbind_pincount;/* refcount to do daemon unbind */

Please use refcount_t or atomic_t, especially as this isn't the refcount for
the structure.
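
For illustration, a userspace sketch of a plain pin counter that is separate
from the object's lifetime, modelled with a C11 atomic.  The names (cache_model,
cache_pin, cache_unpin) are hypothetical, and "the last unpin performs the
unbind" is an assumption about the intent.  In the kernel the corresponding
refcount_t calls would be refcount_set(), refcount_inc() and
refcount_dec_and_test().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct cache_model {
	atomic_uint unbind_pincount;    /* pins that defer unbind, not the object refcount */
	bool unbound;
};

/* The opener holds the initial pin. */
static void cache_init(struct cache_model *c)
{
	atomic_init(&c->unbind_pincount, 1);
	c->unbound = false;
}

/* Take an extra pin; only valid while at least one pin is still held. */
static void cache_pin(struct cache_model *c)
{
	unsigned int old = atomic_fetch_add(&c->unbind_pincount, 1);

	assert(old > 0);
	(void)old;
}

/* Drop a pin; the caller that drops the last one performs the unbind. */
static bool cache_unpin(struct cache_model *c)
{
	unsigned int old = atomic_fetch_sub(&c->unbind_pincount, 1);

	assert(old > 0);
	if (old == 1) {
		c->unbound = true;      /* stands in for the real unbind work */
		return true;
	}
	return false;
}
```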

> -	cachefiles_daemon_unbind(cache);
> -
>  	/* clean up the control file interface */
>  	cache->cachefilesd = NULL;
>  	file->private_data = NULL;
>  	cachefiles_open = 0;

Please call cachefiles_daemon_unbind() before the cleanup.

David


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v9 04/21] cachefiles: notify user daemon when withdrawing cookie
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-21 14:05   ` David Howells
  -1 siblings, 0 replies; 96+ messages in thread
From: David Howells @ 2022-04-21 14:05 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: dhowells, linux-cachefs, xiang, chao, linux-erofs, torvalds,
	gregkh, willy, linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry,
	eguan, linux-kernel, luodaowen.backend, tianzichen, fannaihao,
	zhangjiachen.jaycee

Jeffle Xu <jefflexu@linux.alibaba.com> wrote:

> +	 * It's possiblie that object id is still 0 if the cookie looking up

possiblie -> possible

Otherwise:

Acked-by: David Howells <dhowells@redhat.com>


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v9 05/21] cachefiles: implement on-demand read
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-21 14:14   ` David Howells
  -1 siblings, 0 replies; 96+ messages in thread
From: David Howells @ 2022-04-21 14:14 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: dhowells, linux-cachefs, xiang, chao, linux-erofs, torvalds,
	gregkh, willy, linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry,
	eguan, linux-kernel, luodaowen.backend, tianzichen, fannaihao,
	zhangjiachen.jaycee

Jeffle Xu <jefflexu@linux.alibaba.com> wrote:

> A new NETFS_SREQ_ONDEMAND flag is introduced to indicate that on-demand
> read should be done when a cache miss encountered.

That may conflict with changes I'm making - but it's just a matter of flag
renumbering.

> +#define CACHEFILES_IOC_CREAD	_IOW(0x98, 1, int)

I wonder if CACHEFILES_IOC_READ_COMPLETE would be a better name, but apart
from that:

Acked-by: David Howells <dhowells@redhat.com>


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v9 06/21] cachefiles: enable on-demand read mode
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-21 14:17   ` David Howells
  -1 siblings, 0 replies; 96+ messages in thread
From: David Howells @ 2022-04-21 14:17 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: dhowells, linux-cachefs, xiang, chao, linux-erofs, torvalds,
	gregkh, willy, linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry,
	eguan, linux-kernel, luodaowen.backend, tianzichen, fannaihao,
	zhangjiachen.jaycee

Jeffle Xu <jefflexu@linux.alibaba.com> wrote:

> +	if (IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) &&
> +	    !strcmp(args, "ondemand")) {
> +		set_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags);
> +	} else if (*args) {
> +		pr_err("'bind' command doesn't take an argument\n");

The error message isn't true if CONFIG_CACHEFILES_ONDEMAND=y.  It would be
better to say "Invalid argument to the 'bind' command".
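
A userspace sketch of the argument handling with the reworded message, to show
why the wording matters: when the option is enabled, "ondemand" is accepted and
anything else non-empty is rejected, so "doesn't take an argument" would be
wrong.  parse_bind_args and BIND_ERR_MSG are hypothetical names, and returning
-EINVAL on the error path is an assumption, since the quoted hunk does not show
what follows the pr_err().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The message a caller would log on -EINVAL. */
static const char BIND_ERR_MSG[] = "Invalid argument to the 'bind' command";

/*
 * Parse the argument string of "bind".  ondemand_enabled models
 * IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND).  Returns 0 on success and sets
 * *ondemand, or -EINVAL for any argument that is not accepted.
 */
static int parse_bind_args(const char *args, bool ondemand_enabled,
			   bool *ondemand)
{
	assert(args != NULL && ondemand != NULL);

	*ondemand = false;
	if (ondemand_enabled && strcmp(args, "ondemand") == 0) {
		*ondemand = true;
		return 0;
	}
	if (*args)
		return -EINVAL; /* caller logs BIND_ERR_MSG */
	return 0;
}
```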

> -retry:
>  	/* If the caller asked us to seek for data before doing the read, then
>  	 * we should do that now.  If we find a gap, we fill it with zeros.
>  	 */
> @@ -120,16 +119,6 @@ static int cachefiles_read(struct netfs_cache_resources *cres,
>  			if (read_hole == NETFS_READ_HOLE_FAIL)
>  				goto presubmission_error;
>  
> -			if (read_hole == NETFS_READ_HOLE_ONDEMAND) {
> -				ret = cachefiles_ondemand_read(object, off, len);
> -				if (ret)
> -					goto presubmission_error;
> -
> -				/* fail the read if no progress achieved */
> -				read_hole = NETFS_READ_HOLE_FAIL;
> -				goto retry;
> -			}
> -

Unexplained deletion of newly added code.

David


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v9 07/21] cachefiles: add tracepoints for on-demand read mode
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-21 14:19   ` David Howells
  -1 siblings, 0 replies; 96+ messages in thread
From: David Howells @ 2022-04-21 14:19 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: dhowells, linux-cachefs, xiang, chao, linux-erofs, torvalds,
	gregkh, willy, linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry,
	eguan, linux-kernel, luodaowen.backend, tianzichen, fannaihao,
	zhangjiachen.jaycee

Jeffle Xu <jefflexu@linux.alibaba.com> wrote:

> Add tracepoints for on-demand read mode. Currently following tracepoints
> are added:
> 
> 	OPEN request / COPEN reply
> 	CLOSE request
> 	READ request / CREAD reply
> 	write through anonymous fd
> 	release of anonymous fd
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>

Acked-by: David Howells <dhowells@redhat.com>


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v9 02/21] cachefiles: notify user daemon when looking up cookie
  2022-04-21 13:57   ` David Howells
@ 2022-04-21 14:47     ` JeffleXu
  -1 siblings, 0 replies; 96+ messages in thread
From: JeffleXu @ 2022-04-21 14:47 UTC (permalink / raw)
  To: David Howells
  Cc: linux-cachefs, xiang, chao, linux-erofs, torvalds, gregkh, willy,
	linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry, eguan,
	linux-kernel, luodaowen.backend, tianzichen, fannaihao,
	zhangjiachen.jaycee

Hi David,

Thanks for reviewing :)


On 4/21/22 9:57 PM, David Howells wrote:
> Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
> 
>> +	help
>> +	  This permits on-demand read mode of cachefiles.  In this mode, when
>> +	  cache miss, the cachefiles backend instead of netfs, is responsible
>> +	  for fetching data, e.g. through user daemon.
> 
> How about:
> 
> 	help
> 	  This permits userspace to enable the cachefiles on-demand read mode.
> 	  In this mode, when a cache miss occurs, responsibility for fetching
> 	  the data lies with the cachefiles backend instead of with the netfs
> 	  and is delegated to userspace.
> 
>> +	/*
>> +	 * 1) Cache has been marked as dead state, and then 2) flush all
>> +	 * pending requests in @reqs xarray. The barrier inside set_bit()
>> +	 * will ensure that above two ops won't be reordered.
>> +	 */
> 
> What set_bit()?  

"set_bit(CACHEFILES_DEAD, &cache->flags);" in cachefiles_daemon_release()

> What "above two ops"? 

The two operations I mentioned in the comment:
1) Cache has been marked as dead state, and then
2) flush all pending requests in @reqs xarray.


> And that's not how barriers work; they
> provide a partial ordering relative to another pair of barriered ops.
> 
> Also, set_bit() can't be relied upon to imply a barrier - see
> Documentation/memory-barriers.txt.

Yeah, it seems that set_bit() doesn't imply a memory barrier, though the
x86 implementation (arch/x86/boot/bitops.h) does imply one, which may
have misled me. Thanks for pointing it out. Maybe a full barrier is
needed here before flushing the @reqs xarray.

> 
>> +	if (IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) &&
>> +	    test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags)) {
> 
> It might be worth abstracting this into an inline function in internal.h:
> 
> 	static inline bool cachefiles_in_ondemand_mode(cache)
> 	{
> 		return IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) &&
> 			test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags)
> 	}

Okay, will be fixed in the next version.

> 
>> +#ifdef CONFIG_CACHEFILES_ONDEMAND
> 
> This looks like it ought to be superfluous, given the preceding test - though
> I can see why you need it:

Sorry, I can't see the context, but I guess you are referring to this
snippet of cachefiles_daemon_poll()?

```
+	if (IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) &&
+	    test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags)) {
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+		if (!xa_empty(&cache->reqs))
+			mask |= EPOLLIN;
```

Yes, the implementation here is indeed not elegant. As you suggest
below, if @reqs is defined unconditionally in struct cachefiles_cache,
the superfluous magic here is no longer needed.

> 
>> +#ifdef CONFIG_CACHEFILES_ONDEMAND
>> +	struct xarray			reqs;		/* xarray of pending on-demand requests */
>> +	struct xarray			ondemand_ids;	/* xarray for ondemand_id allocation */
>> +	u32				ondemand_id_next;
>> +#endif
> 
> I'm tempted to say that you should just make them non-conditional.  It's not
> like there's likely to be more than one or two cachefiles_cache structs on a
> system.

Okay, sounds reasonable.


-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v9 08/21] cachefiles: document on-demand read mode
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-21 14:47   ` David Howells
  -1 siblings, 0 replies; 96+ messages in thread
From: David Howells @ 2022-04-21 14:47 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: dhowells, linux-cachefs, xiang, chao, linux-erofs, torvalds,
	gregkh, willy, linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry,
	eguan, linux-kernel, luodaowen.backend, tianzichen, fannaihao,
	zhangjiachen.jaycee

Jeffle Xu <jefflexu@linux.alibaba.com> wrote:

> +When working in its original mode, cachefiles mainly

I'd delete "mainly" there.

> serves as a local cache
> +for a remote networking fs - while in on-demand read mode, cachefiles can boost
> +the scenario where on-demand read semantics is

is -> are.

> +The essential difference between these two modes is that, in original mode,
> +when a cache miss occurs, the netfs will fetch the data from the remote server
> +and then write it to the cache file.  With on-demand read mode, however,
> +fetching the data and writing it into the cache is delegated to a user daemon.

The starting sentence seems off.  How about:

  The essential difference between these two modes is seen when a cache miss
  occurs: In the original mode, the netfs will fetch the data from the remote
  server and then write it to the cache file; in on-demand read mode, fetching
  data and writing it into the cache is delegated to a user daemon.

> +Protocol Communication
> +----------------------
> +
> +The on-demand read mode relies on

relies on -> uses

> a simple protocol used

Delete "used".

> for communication
> +between kernel and user daemon. The protocol can be modeled as::
> +
> +	kernel --[request]--> user daemon --[reply]--> kernel
> +
> +The cachefiles kernel module will send requests to the user daemon when needed.
> +The user daemon needs to

needs to -> should

> poll on

poll on -> poll

> the devnode ('/dev/cachefiles') to check if
> +there's a pending request to be processed.  A POLLIN event will be returned
> +when there's a pending request.
> +
> +The user daemon then reads the devnode to fetch a request and process it
> +accordingly.

Reading the devnode doesn't process the request, so I think something like:

"... and process it accordingly" -> "... that it can then process."

or:

"... and process it accordingly" -> "... to process."

> It is worth noting

"It should be noted"

> that each read only gets one request. When

... it has ...

> +finished processing the request, the user daemon needs to

needs to -> should write

> write the reply to
> +the devnode.
> +
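
As an illustration of the poll-then-read step described above, here is a
minimal userspace sketch. It assumes only what the text says: the daemon
waits for POLLIN on the devnode fd and each read() returns at most one
request. The helper name wait_and_read_request() is hypothetical and not
part of the patch set.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait until fd reports POLLIN, then issue a single read().
 * Returns the number of bytes read, 0 if nothing became readable before
 * the timeout, or -1 with errno set on error.
 * (Hypothetical helper; fd would be the opened /dev/cachefiles devnode.)
 */
static ssize_t wait_and_read_request(int fd, void *buf, size_t size,
				     int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ready;

	assert(buf != NULL);

	/* Restart poll() if it is interrupted by a signal. */
	do {
		ready = poll(&pfd, 1, timeout_ms);
	} while (ready < 0 && errno == EINTR);

	if (ready < 0)
		return -1;
	if (ready == 0 || !(pfd.revents & POLLIN))
		return 0;

	/* One read() fetches one request, per the documentation above. */
	return read(fd, buf, size);
}
```

A daemon would call this in a loop and hand each returned buffer to its
request parser before calling it again.
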
> +Each request starts with a message header of the form::
> +
> +	struct cachefiles_msg {
> +		__u32 msg_id;
> +		__u32 opcode;
> +		__u32 len;
> +		__u32 object_id;
> +		__u8  data[];
> +	};
> +
> +	where:
> +
> +	* ``msg_id`` is a unique ID identifying this request among all pending
> +	  requests.
> +
> +	* ``opcode`` indicates the type of this request.
> +
> +	* ``object_id`` is a unique ID identifying the cache file operated on.
> +
> +	* ``data`` indicates the payload of this request.
> +
> +	* ``len`` indicates the whole length of this request, including the
> +	  header and following type-specific payload.
> +
> +
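
To make the role of ``len`` concrete, the following sketch validates a
buffer returned by one read() against the header quoted above. The struct
is a userspace mirror of the quoted one (with __u32 written as uint32_t);
parse_request() is a hypothetical helper, and the exact checks a real
daemon wants may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of the header quoted above. */
struct cachefiles_msg {
	uint32_t msg_id;
	uint32_t opcode;
	uint32_t len;		/* whole request: header plus payload */
	uint32_t object_id;
	uint8_t  data[];
};

/*
 * Copy the header out of a buffer holding nread bytes and return the
 * payload length, or -1 if the buffer cannot hold the request that the
 * header describes. (Hypothetical helper.)
 */
static long parse_request(const void *buf, size_t nread,
			  struct cachefiles_msg *hdr)
{
	assert(buf != NULL && hdr != NULL);

	if (nread < sizeof(*hdr))
		return -1;
	memcpy(hdr, buf, sizeof(*hdr));

	/* len covers the header too, so it can never be smaller than it. */
	if (hdr->len < sizeof(*hdr) || hdr->len > nread)
		return -1;

	return (long)(hdr->len - sizeof(*hdr));
}
```

The payload itself starts sizeof(struct cachefiles_msg) bytes into the
buffer and is interpreted according to ``opcode``.
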
> +Turn on On-demand Mode

Turning on

> +----------------------
> +
> +An optional parameter is added

is added -> becomes available

> to the "bind" command::
> +
> +	bind [ondemand]
> +
> +When the "bind" command takes without

takes without -> is given no

> argument, it defaults to the original
> +mode.  When the "bind" command is given

When it is given

> the "ondemand" argument, i.e.
> +"bind ondemand", on-demand read mode will be enabled.
> +
> +
> +The OPEN Request
> +----------------
> +
> +When the netfs opens a cache file for the first time, a request with the
> +CACHEFILES_OP_OPEN opcode, a.k.a an OPEN request will be sent to the user
> +daemon.  The payload format is of the form::
> +
> +	struct cachefiles_open {
> +		__u32 volume_key_size;
> +		__u32 cookie_key_size;
> +		__u32 fd;
> +		__u32 flags;
> +		__u8  data[];
> +	};
> +
> +	where:
> +
> +	* ``data`` contains the volume_key followed directly by the cookie_key.
> +	  The volume key is a NUL-terminated string; the cookie key is binary
> +	  data.
> +
> +	* ``volume_key_size`` indicates the size of the volume key in bytes.
> +
> +	* ``cookie_key_size`` indicates the size of the cookie key in bytes.
> +
> +	* ``fd`` indicates an anonymous fd referring to the cache file, through
> +	  which the user daemon can perform write/llseek file operations on the
> +	  cache file.
> +
> +
> +The user daemon is able to distinguish the requested cache file with the given
> +(volume_key, cookie_key) pair.

"The user daemon can use the given (volume_key, cookie_key) pair to
distinguish the requested cache file." might sound better.

> Each cache file has a unique object_id, while it
> +may have multiple anonymous fds. The user daemon may duplicate anonymous fds
> +from the initial anonymous fd indicated by the @fd field through dup(). Thus
> +each object_id can be mapped to multiple anonymous fds, while the usr daemon
> +itself needs to maintain the mapping.
> +
> +With the given anonymous fd, the user daemon can fetch data and write it to the
> +cache file in the background, even when kernel has not triggered a cache miss
> +yet.
> +
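
A sketch of how a daemon might split the OPEN payload into its two keys,
going only by the layout quoted above (volume_key followed directly by
cookie_key in ``data``). The struct is a userspace mirror of the quoted
one; struct open_keys and split_open_payload() are hypothetical names
introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of the OPEN payload quoted above. */
struct cachefiles_open {
	uint32_t volume_key_size;
	uint32_t cookie_key_size;
	uint32_t fd;
	uint32_t flags;
	uint8_t  data[];
};

/* Hypothetical result type: pointers into the caller's buffer. */
struct open_keys {
	const uint8_t *volume_key;	/* NUL-terminated string */
	const uint8_t *cookie_key;	/* binary data */
	uint32_t volume_key_size;
	uint32_t cookie_key_size;
	int fd;				/* anonymous fd for the cache file */
};

/* Returns 0 on success, -1 if the payload is too short for its keys. */
static int split_open_payload(const void *payload, size_t payload_len,
			      struct open_keys *out)
{
	struct cachefiles_open hdr;
	const uint8_t *data;
	uint64_t need;

	assert(payload != NULL && out != NULL);

	if (payload_len < sizeof(hdr))
		return -1;
	memcpy(&hdr, payload, sizeof(hdr));

	/* 64-bit sum so two large sizes cannot wrap around. */
	need = (uint64_t)sizeof(hdr) + hdr.volume_key_size + hdr.cookie_key_size;
	if (need > payload_len)
		return -1;

	data = (const uint8_t *)payload + sizeof(hdr);
	out->volume_key = data;
	out->cookie_key = data + hdr.volume_key_size;
	out->volume_key_size = hdr.volume_key_size;
	out->cookie_key_size = hdr.cookie_key_size;
	out->fd = (int)hdr.fd;
	return 0;
}
```
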
> +The user daemon should complete the READ request

READ request -> OPEN request?

> by issuing a "copen" (complete
> +open) command on the devnode::
> +
> +	copen <msg_id>,<cache_size>
> +
> +	* ``msg_id`` must match the msg_id field of the previous OPEN request.
> +
> +	* When >= 0, ``cache_size`` indicates the size of the cache file;
> +	  when < 0, ``cache_size`` indicates the

the -> any

> error code ecountered

encountered

> by the
> +	  user daemon.
> +
> +
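
For illustration, the "copen" reply described above could be built like
this before being written to the devnode. The command syntax is taken
from the quoted text; format_copen() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format "copen <msg_id>,<cache_size>" into buf.
 * cache_size >= 0 is the size of the cache file; a negative value carries
 * an error code, as described above.
 * Returns the command length, or -1 if buf is too small.
 */
static int format_copen(char *buf, size_t size, uint32_t msg_id,
			long long cache_size)
{
	int n;

	assert(buf != NULL);

	n = snprintf(buf, size, "copen %u,%lld", (unsigned int)msg_id,
		     cache_size);
	if (n < 0 || (size_t)n >= size)
		return -1;
	return n;
}
```

The daemon would then pass the resulting string to write() on the devnode
fd, using the msg_id taken from the OPEN request it is answering.
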
> +The CLOSE Request
> +-----------------
> +
> +When a cookie withdrawn, a CLOSE request (opcode CACHEFILES_OP_CLOSE) will be
> +sent to the user daemon. It will notify

It will notify -> This tells

> the user daemon to close all anonymous
> +fds associated with the given object_id.  The CLOSE request has no extea

extra

> +payload.
> +
> +
> +The READ Request
> +----------------
> +
> +When on-demand read mode is turned on, and a cache miss encountered,

"When a cache miss is encountered in on-demand read mode,"

> the kernel
> +will send a READ request (opcode CACHEFILES_OP_READ) to the user daemon. This
> +will tell

will tell -> tells/asks

> the user daemon to fetch data

data -> the contents

> of the requested file range. The payload
> +is of the form::
> +
> +	struct cachefiles_read {
> +		__u64 off;
> +		__u64 len;
> +	};
> +
> +	where:
> +
> +	* ``off`` indicates the starting offset of the requested file range.
> +
> +	* ``len`` indicates the length of the requested file range.
> +
> +
> +When receiving

receiving -> it receives

> a READ request, the user daemon needs to

needs to -> should

> fetch the

requested

> data of the
> +requested file range,

"of the requested file range," -> "" (including the comma, I think)

> and then

"then" -> ""

> write it to the cache file identified by
> +object_id.
> +
> +To finish

When it has finished

> processing the READ request, the user daemon should reply with

with -> by using

> the
> +CACHEFILES_IOC_CREAD ioctl on one of the anonymous fds associated with the given
> +object_id

given object_id -> object_id given

> in the READ request.  The ioctl is of the form::
> +
> +	ioctl(fd, CACHEFILES_IOC_CREAD, msg_id);
> +
> +	* ``fd`` is one of the anonymous fds associated with the given object_id
> +	  in the READ request.

the given object_id in the READ request -> object_id

> +
> +	* ``msg_id`` must match the msg_id field of the previous READ request.

By "previous READ request" is this referring to something different to "the
READ request" you mentioned against the fd parameter?

David
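
A rough sketch of the READ handling described in the quoted text: write
the fetched data into the cache file through the anonymous fd at the
requested offset, then complete the request with the ioctl. The ioctl
definition is copied from v9 patch 05 as quoted elsewhere in this thread
(the name may change in a later version); write_range() and
complete_read() are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* As defined in v9 patch 05; assumed here, not taken from a uapi header. */
#define CACHEFILES_IOC_CREAD	_IOW(0x98, 1, int)

/* Write len bytes at offset off, retrying on short writes and EINTR. */
static int write_range(int fd, const void *buf, size_t len, off_t off)
{
	const char *p = buf;

	assert(buf != NULL);

	while (len > 0) {
		ssize_t n = pwrite(fd, p, len, off);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			perror("pwrite");
			return -1;
		}
		p += n;
		off += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Tell the kernel that the READ request identified by msg_id is done. */
static int complete_read(int anon_fd, uint32_t msg_id)
{
	return ioctl(anon_fd, CACHEFILES_IOC_CREAD, (unsigned long)msg_id);
}
```

Here anon_fd stands for any of the anonymous fds the daemon holds for the
object_id named in the READ request, and msg_id is the one from that same
request.
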


^ permalink raw reply	[flat|nested] 96+ messages in thread

* EMFILE/ENFILE mitigation needed in erofs?
  2022-04-15 12:35 ` [PATCH v9 00/21] fscache, erofs: " Jeffle Xu
@ 2022-04-21 14:54   ` David Howells
  -1 siblings, 0 replies; 96+ messages in thread
From: David Howells @ 2022-04-21 14:54 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: dhowells, linux-cachefs, xiang, chao, linux-erofs, torvalds,
	gregkh, willy, linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry,
	eguan, linux-kernel, luodaowen.backend, tianzichen, fannaihao,
	zhangjiachen.jaycee

Jeffle Xu <jefflexu@linux.alibaba.com> wrote:

> +	fd_install(fd, file);

Do you need to mitigate potential EMFILE/ENFILE problems?  You're potentially
trebling up the number of accounted systemwide fds: one for erofs itself, one
anonfd per cache object file to communicate with the daemon and one in the
daemon to talk to the server.  Cachefiles has a fourth internally, but it's
kept off the books - further, cachefiles closes them fairly quickly after a
period of nonuse.

David


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v9 04/21] cachefiles: notify user daemon when withdrawing cookie
  2022-04-21 14:05   ` David Howells
@ 2022-04-21 14:57     ` JeffleXu
  -1 siblings, 0 replies; 96+ messages in thread
From: JeffleXu @ 2022-04-21 14:57 UTC (permalink / raw)
  To: David Howells
  Cc: linux-cachefs, xiang, chao, linux-erofs, torvalds, gregkh, willy,
	linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry, eguan,
	linux-kernel, luodaowen.backend, tianzichen, fannaihao,
	zhangjiachen.jaycee



On 4/21/22 10:05 PM, David Howells wrote:
> Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
> 
>> +	 * It's possiblie that object id is still 0 if the cookie looking up
> 
> possiblie -> possible

Thanks.

> 
> Otherwise:
> 
> Acked-by: David Howells <dhowells@redhat.com>

-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v9 05/21] cachefiles: implement on-demand read
  2022-04-21 14:14   ` David Howells
@ 2022-04-21 15:00     ` JeffleXu
  -1 siblings, 0 replies; 96+ messages in thread
From: JeffleXu @ 2022-04-21 15:00 UTC (permalink / raw)
  To: David Howells
  Cc: linux-cachefs, xiang, chao, linux-erofs, torvalds, gregkh, willy,
	linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry, eguan,
	linux-kernel, luodaowen.backend, tianzichen, fannaihao,
	zhangjiachen.jaycee



On 4/21/22 10:14 PM, David Howells wrote:
> Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
> 
>> A new NETFS_SREQ_ONDEMAND flag is introduced to indicate that on-demand
>> read should be done when a cache miss encountered.
> 
> That may conflict with changes I'm making - but it's just a matter of flag
> renumbering.
> 
>> +#define CACHEFILES_IOC_CREAD	_IOW(0x98, 1, int)
> 
> I wonder if CACHEFILES_IOC_READ_COMPLETE would be a better name, 

Okay, it sounds more readable. Thanks.


> but apart
> from that:
> 
> Acked-by: David Howells <dhowells@redhat.com>

-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v9 06/21] cachefiles: enable on-demand read mode
  2022-04-21 14:17   ` David Howells
@ 2022-04-21 15:11     ` JeffleXu
  -1 siblings, 0 replies; 96+ messages in thread
From: JeffleXu @ 2022-04-21 15:11 UTC (permalink / raw)
  To: David Howells
  Cc: linux-cachefs, xiang, chao, linux-erofs, torvalds, gregkh, willy,
	linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry, eguan,
	linux-kernel, luodaowen.backend, tianzichen, fannaihao,
	zhangjiachen.jaycee



On 4/21/22 10:17 PM, David Howells wrote:
> Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
> 
>> +	if (IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) &&
>> +	    !strcmp(args, "ondemand")) {
>> +		set_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags);
>> +	} else if (*args) {
>> +		pr_err("'bind' command doesn't take an argument\n");
> 
> The error message isn't true if CONFIG_CACHEFILES_ONDEMAND=y.  It would be
> better to say "Invalid argument to the 'bind' command".

Right, otherwise users may get confused. Will be fixed in the next version.
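
For reference, a userspace sketch of the argument check with the
suggested message. It mirrors the logic of the quoted hunk only loosely:
the kernel code uses IS_ENABLED(), set_bit() and pr_err(), which are
replaced here by a plain bool parameter, an output flag and fprintf().
parse_bind_args() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * args is the text following "bind" ("" when no argument is given).
 * ondemand_built_in stands in for IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND).
 * Returns 0 and sets *ondemand, or -EINVAL for an unrecognised argument.
 */
static int parse_bind_args(const char *args, bool ondemand_built_in,
			   bool *ondemand)
{
	assert(args != NULL && ondemand != NULL);

	*ondemand = false;
	if (ondemand_built_in && strcmp(args, "ondemand") == 0) {
		*ondemand = true;
		return 0;
	}
	if (*args) {
		/* Accurate whether or not on-demand support is built in. */
		fprintf(stderr, "Invalid argument to the 'bind' command\n");
		return -EINVAL;
	}
	return 0;
}
```
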

> 
>> -retry:
>>  	/* If the caller asked us to seek for data before doing the read, then
>>  	 * we should do that now.  If we find a gap, we fill it with zeros.
>>  	 */
>> @@ -120,16 +119,6 @@ static int cachefiles_read(struct netfs_cache_resources *cres,
>>  			if (read_hole == NETFS_READ_HOLE_FAIL)
>>  				goto presubmission_error;
>>  
>> -			if (read_hole == NETFS_READ_HOLE_ONDEMAND) {
>> -				ret = cachefiles_ondemand_read(object, off, len);
>> -				if (ret)
>> -					goto presubmission_error;
>> -
>> -				/* fail the read if no progress achieved */
>> -				read_hole = NETFS_READ_HOLE_FAIL;
>> -				goto retry;
>> -			}
>> -
> 

Sorry, this is my mistake from doing "git rebase". The previous version
(v8) actually called cachefiles_ondemand_read() in cachefiles_read().
However, as explained in the commit message of patch 5 ("cachefiles:
implement on-demand read"), fscache_read() can only detect whether the
requested file range is a full cache miss; it can't detect a partial
cache miss, i.e. a hole inside the requested file range.

Thus in this patchset (v9), we move the call to
cachefiles_ondemand_read() from cachefiles_read() to
cachefiles_prepare_read(). The above "deletion of newly added code" is
actually reverting the previous change to cachefiles_read(). It was
mistakenly merged into this patch when I was doing "git rebase".
It should have been merged into patch 5 ("cachefiles: implement
on-demand read"), which initially introduced the change to cachefiles_read().

Apologies for the careless mistake.


-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: EMFILE/ENFILE mitigation needed in erofs?
  2022-04-21 14:54   ` David Howells
@ 2022-04-21 16:14     ` JeffleXu
  -1 siblings, 0 replies; 96+ messages in thread
From: JeffleXu @ 2022-04-21 16:14 UTC (permalink / raw)
  To: David Howells
  Cc: linux-erofs, fannaihao, willy, linux-kernel, tianzichen,
	joseph.qi, zhangjiachen.jaycee, linux-cachefs, gregkh,
	linux-fsdevel, luodaowen.backend, gerry, torvalds



On 4/21/22 10:54 PM, David Howells wrote:
> Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
> 
>> +	fd_install(fd, file);
> 
> Do you need to mitigate potential EMFILE/ENFILE problems?  You're potentially
> trebling up the number of accounted systemwide fds: one for erofs itself, one
> anonfd per cache object file to communicate with the daemon and one in the
> daemon to talk to the server.  Cachefiles has a fourth internally, but it's
> kept off the books - further, cachefiles closes them fairly quickly after a
> period of nonuse.
> 

Hi, thanks for pointing it out.

1. In our usage scenarios, one erofs filesystem is formed of
several chunk-deduplicated blobs (which are what Cachefiles really
caches), while each blob can contain many erofs files. For
example, one container image for node.js corresponds to ~20 blob
files in total. Only these blob files are cached by Cachefiles. In a
densely deployed environment, there could be hundreds of containers and
thus thousands of backing files on one machine. That is, only tens of
thousands of fds/files are needed in this case.

2. Our user daemon will set rlimit-nofile to a reasonably large
value (e.g. 1 million), so that it won't fail when trying to allocate fds.

https://github.com/dragonflyoss/image-service/blob/master/src/bin/nydusd/main.rs#L152

3. Our user daemon will close the anonymous fd once the corresponding
backing file has been fully downloaded, to free the fd resources.

4. If fd/file allocation fails (in cachefiles_ondemand_get_fd()),
the INIT request will fail, and thus the erofs mount will fail.
That is, it won't break the upper erofs in this case.

5. If we later find that the number of fds/files is indeed an issue,
we also plan to make the user daemon close some fds to free up
resources. The Cachefiles kernel module would then need to
reallocate an anonymous fd for the backing file on a cache miss. But that
remains to be done later if it's really needed.


-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: EMFILE/ENFILE mitigation needed in erofs?
  2022-04-21 14:54   ` David Howells
@ 2022-04-21 17:57     ` David Howells
  -1 siblings, 0 replies; 96+ messages in thread
From: David Howells @ 2022-04-21 17:57 UTC (permalink / raw)
  To: JeffleXu
  Cc: dhowells, linux-cachefs, xiang, chao, linux-erofs, torvalds,
	gregkh, willy, linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry,
	eguan, linux-kernel, luodaowen.backend, tianzichen, fannaihao,
	zhangjiachen.jaycee

JeffleXu <jefflexu@linux.alibaba.com> wrote:

> 2. Our user daemon will configure rlimit-nofile to a reasonably large
> (e.g. 1 million) value, so that it won't fail when trying to allocate fds.

There's a system-wide limit also; simply increasing the rlimit won't override
that.

David
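
To illustrate the two limits being discussed, here is a small sketch that
raises the per-process soft RLIMIT_NOFILE to its hard limit and reads the
system-wide limit from /proc/sys/fs/file-max. Exceeding the former yields
EMFILE, exceeding the latter ENFILE. Both helper names are hypothetical,
and this is not what any particular daemon does.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Raise the soft per-process fd limit up to the current hard limit. */
static int raise_nofile_soft_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return -1;
	rl.rlim_cur = rl.rlim_max;
	return setrlimit(RLIMIT_NOFILE, &rl);
}

/*
 * Read the system-wide limit on open files. setrlimit() has no effect on
 * this value; it is an administrator-controlled sysctl.
 * Returns 0 on success, -1 if the file is missing or unparsable.
 */
static int read_file_max(unsigned long long *out)
{
	FILE *f = fopen("/proc/sys/fs/file-max", "r");
	int ok;

	assert(out != NULL);

	if (!f)
		return -1;
	ok = (fscanf(f, "%llu", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

A daemon that expects to hold many anonymous fds could log both values at
startup so that a low file-max is noticed before ENFILE shows up.
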


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: EMFILE/ENFILE mitigation needed in erofs?
  2022-04-21 17:57     ` David Howells
@ 2022-04-21 18:16       ` Gao Xiang
  -1 siblings, 0 replies; 96+ messages in thread
From: Gao Xiang @ 2022-04-21 18:16 UTC (permalink / raw)
  To: David Howells
  Cc: JeffleXu, linux-cachefs, xiang, chao, linux-erofs, torvalds,
	gregkh, willy, linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry,
	eguan, linux-kernel, luodaowen.backend, tianzichen, fannaihao,
	zhangjiachen.jaycee

Hi David,

On Thu, Apr 21, 2022 at 06:57:40PM +0100, David Howells wrote:
> JeffleXu <jefflexu@linux.alibaba.com> wrote:
> 
> > 2. Our user daemon will configure rlimit-nofile to a reasonably large
> > (e.g. 1 million) value, so that it won't fail when trying to allocate fds.
> 
> There's a system-wide limit also; simply increasing the rlimit won't override
> that.

Yes, I suggest we add a note to the documentation telling system
administrators to take care of `/proc/sys/fs/file-max', but I think
it's typically not a problem for our on-demand cases.

Each cookie corresponds to an erofs device, so there are not too many
erofs devices (much like docker layers) for one erofs image, and they
are all handled when mounting (which needs privileged permissions).

Because of this, the fscache dir can be easily backed up, restored, and
transferred, since the files in it are really golden erofs image files.
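
To illustrate the two limits being discussed: running into the
per-process RLIMIT_NOFILE makes fd allocation fail with EMFILE, while
exhausting the system-wide `/proc/sys/fs/file-max' makes it fail with
ENFILE. Below is a minimal userspace sketch of how a daemon might look
at both. The helper names raise_nofile_soft_limit() and read_file_max()
are hypothetical and not taken from any existing daemon.

```c
#include <stdio.h>
#include <sys/resource.h>

/*
 * Raise the soft RLIMIT_NOFILE up to the current hard limit.
 * Returns 0 on success, -1 on failure.
 */
static int raise_nofile_soft_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return -1;
	rl.rlim_cur = rl.rlim_max;
	return setrlimit(RLIMIT_NOFILE, &rl);
}

/*
 * Read the system-wide open-file limit from /proc/sys/fs/file-max.
 * Returns the limit, or -1 if it cannot be read or parsed.
 */
static long long read_file_max(void)
{
	FILE *f = fopen("/proc/sys/fs/file-max", "r");
	long long limit = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%lld", &limit) != 1)
		limit = -1;
	fclose(f);
	return limit;
}
```

Raising the hard limit itself needs CAP_SYS_RESOURCE, and neither call
changes file-max, which is why the sysctl has to be pointed out to
administrators separately.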

Thanks,
Gao Xiang

> 
> David

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v9 03/21] cachefiles: unbind cachefiles gracefully in on-demand mode
  2022-04-21 14:02   ` David Howells
@ 2022-04-22  2:44     ` JeffleXu
  -1 siblings, 0 replies; 96+ messages in thread
From: JeffleXu @ 2022-04-22  2:44 UTC (permalink / raw)
  To: David Howells
  Cc: linux-cachefs, xiang, chao, linux-erofs, torvalds, gregkh, willy,
	linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry, eguan,
	linux-kernel, luodaowen.backend, tianzichen, fannaihao,
	zhangjiachen.jaycee



On 4/21/22 10:02 PM, David Howells wrote:
> Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
> 
>> +	struct kref			unbind_pincount;/* refcount to do daemon unbind */
> 
> Please use refcount_t or atomic_t, especially as this isn't the refcount for
> the structure.

Okay, will be done in the next version.

> 
>> -	cachefiles_daemon_unbind(cache);
>> -
>>  	/* clean up the control file interface */
>>  	cache->cachefilesd = NULL;
>>  	file->private_data = NULL;
>>  	cachefiles_open = 0;
> 
> Please call cachefiles_daemon_unbind() before the cleanup.

Since the cachefiles_cache struct will be freed once the pincount drops
to 0, "cache->cachefilesd = NULL;" needs to be done before decreasing
the pincount. BTW, "cachefiles_open = 0;" indeed should be done only
once the pincount has dropped to 0.
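
The ordering constraint can be sketched in userspace C as follows. This
is only an illustration under assumptions: struct cache, cache_new(),
cache_get(), cache_put() and cache_release() are hypothetical stand-ins
rather than the kernel code, C11 atomics take the place of refcount_t,
and cache_freed exists only so the sketch can be checked.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the cache structure; not the kernel definition. */
struct cache {
	atomic_int unbind_pincount;	/* pins that delay the final teardown */
	void *cachefilesd;		/* back-pointer to the control file */
};

static int cache_freed;			/* demo-only counter of completed frees */

static struct cache *cache_new(void *cachefilesd)
{
	struct cache *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;
	atomic_init(&c->unbind_pincount, 1);	/* initial pin held by the daemon */
	c->cachefilesd = cachefilesd;
	return c;
}

/* Take an extra pin, e.g. on behalf of an in-flight request. */
static void cache_get(struct cache *c)
{
	atomic_fetch_add(&c->unbind_pincount, 1);
}

/* Drop one pin; the structure is freed when the last pin goes away. */
static void cache_put(struct cache *c)
{
	if (atomic_fetch_sub(&c->unbind_pincount, 1) == 1) {
		free(c);
		cache_freed++;
	}
}

/* Release path: touch the structure first, drop the pin last. */
static void cache_release(struct cache *c)
{
	c->cachefilesd = NULL;	/* must precede the put, which may free c */
	cache_put(c);
}
```

If the two statements in cache_release() were swapped, the store to
c->cachefilesd could hit freed memory whenever the put drops the last
pin.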


-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v9 08/21] cachefiles: document on-demand read mode
  2022-04-21 14:47   ` David Howells
@ 2022-04-22  3:10     ` JeffleXu
  -1 siblings, 0 replies; 96+ messages in thread
From: JeffleXu @ 2022-04-22  3:10 UTC (permalink / raw)
  To: David Howells
  Cc: linux-cachefs, xiang, chao, linux-erofs, torvalds, gregkh, willy,
	linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry, eguan,
	linux-kernel, luodaowen.backend, tianzichen, fannaihao,
	zhangjiachen.jaycee

Hi David, thanks for polishing the documentation. It's a detailed and
meticulous review again. Thanks a lot for your time :) I will fix all
of these in the next version.

On 4/21/22 10:47 PM, David Howells wrote:
> Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
> 
>> +The essential difference between these two modes is that, in original mode,
>> +when a cache miss occurs, the netfs will fetch the data from the remote server
>> +and then write it to the cache file.  With on-demand read mode, however,
>> +fetching the data and writing it into the cache is delegated to a user daemon.
> 
> The starting sentence seems off.  How about:
> 
>   The essential difference between these two modes is seen when a cache miss
>   occurs: In the original mode, the netfs will fetch the data from the remote
>   server and then write it to the cache file; in on-demand read mode, fetching
>   data and writing it into the cache is delegated to a user daemon.

Okay, it sounds better.

>> the devnode ('/dev/cachefiles') to check if
>> +there's a pending request to be processed.  A POLLIN event will be returned
>> +when there's a pending request.
>> +
>> +The user daemon then reads the devnode to fetch a request and process it
>> +accordingly.
> 
> Reading the devnode doesn't process the request, so I think something like:
> 
> "... and process it accordingly" -> "... that it can then process."
> 
> or:
> 
> "... and process it accordingly" -> "... to process."

Yeah, the original statement is indeed misleading.
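
The poll-then-read step described in the quoted text can be sketched as
below. This is a generic sketch rather than the daemon's actual code:
fetch_request() is a hypothetical helper, it works on any pollable fd,
and it leaves the layout of the message read from the devnode to the
caller. Processing the request is a separate step that happens after
this function returns.

```c
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Wait until fd reports POLLIN, then read one pending request into buf.
 * Returns the number of bytes read, 0 on timeout or end-of-file, and
 * -1 on error.
 */
static ssize_t fetch_request(int fd, void *buf, size_t len, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret < 0)
		return -1;
	if (ret == 0)
		return 0;		/* nothing pending within the timeout */
	if (!(pfd.revents & POLLIN))
		return -1;		/* POLLERR/POLLHUP without readable data */
	return read(fd, buf, len);	/* fetch only; the caller processes it */
}
```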


>> Each cache file has a unique object_id, while it
>> +may have multiple anonymous fds. The user daemon may duplicate anonymous fds
>> +from the initial anonymous fd indicated by the @fd field through dup(). Thus
>> +each object_id can be mapped to multiple anonymous fds, while the usr daemon
>> +itself needs to maintain the mapping.
>> +
>> +With the given anonymous fd, the user daemon can fetch data and write it to the
>> +cache file in the background, even when kernel has not triggered a cache miss
>> +yet.
>> +
>> +The user daemon should complete the READ request
> 
> READ request -> OPEN request?

Good catch. Will be fixed.


>> in the READ request.  The ioctl is of the form::
>> +
>> +	ioctl(fd, CACHEFILES_IOC_CREAD, msg_id);
>> +
>> +	* ``fd`` is one of the anonymous fds associated with the given object_id
>> +	  in the READ request.
> 
> the given object_id in the READ request -> object_id
> 
>> +
>> +	* ``msg_id`` must match the msg_id field of the previous READ request.
> 
> By "previous READ request" is this referring to something different to "the
> READ request" you mentioned against the fd parameter?

Actually it refers to the same thing (the same READ request). I will
simply change the statement to:

``msg_id`` must match the msg_id field of the READ request.

-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 96+ messages in thread

end of thread, other threads:[~2022-04-22  3:10 UTC | newest]

Thread overview: 96+ messages
2022-04-15 12:35 [PATCH v9 00/21] fscache,erofs: fscache-based on-demand read semantics Jeffle Xu
2022-04-15 12:35 ` [PATCH v9 01/21] cachefiles: extract write routine Jeffle Xu
2022-04-15 12:35 ` [PATCH v9 02/21] cachefiles: notify user daemon when looking up cookie Jeffle Xu
2022-04-15 12:35 ` [PATCH v9 03/21] cachefiles: unbind cachefiles gracefully in on-demand mode Jeffle Xu
2022-04-15 12:35 ` [PATCH v9 04/21] cachefiles: notify user daemon when withdrawing cookie Jeffle Xu
2022-04-15 12:35 ` [PATCH v9 05/21] cachefiles: implement on-demand read Jeffle Xu
2022-04-15 12:35 ` [PATCH v9 06/21] cachefiles: enable on-demand read mode Jeffle Xu
2022-04-15 12:36 ` [PATCH v9 07/21] cachefiles: add tracepoints for " Jeffle Xu
2022-04-15 12:36 ` [PATCH v9 08/21] cachefiles: document " Jeffle Xu
2022-04-15 12:36 ` [PATCH v9 09/21] erofs: make erofs_map_blocks() generally available Jeffle Xu
2022-04-15 12:36 ` [PATCH v9 10/21] erofs: add fscache mode check helper Jeffle Xu
2022-04-21  7:53   ` Gao Xiang
2022-04-15 12:36 ` [PATCH v9 11/21] erofs: register fscache volume Jeffle Xu
2022-04-15 12:36 ` [PATCH v9 12/21] erofs: add fscache context helper functions Jeffle Xu
2022-04-15 12:36 ` [PATCH v9 13/21] erofs: add anonymous inode caching metadata for data blobs Jeffle Xu
2022-04-15 12:36 ` [PATCH v9 14/21] erofs: add erofs_fscache_read_folios() helper Jeffle Xu
2022-04-15 12:36 ` [PATCH v9 15/21] erofs: register fscache context for primary data blob Jeffle Xu
2022-04-15 12:36 ` [PATCH v9 16/21] erofs: register fscache context for extra data blobs Jeffle Xu
2022-04-21 10:58   ` Gao Xiang
2022-04-15 12:36 ` [PATCH v9 17/21] erofs: implement fscache-based metadata read Jeffle Xu
2022-04-21 13:03   ` Gao Xiang
2022-04-15 12:36 ` [PATCH v9 18/21] erofs: implement fscache-based data read for non-inline layout Jeffle Xu
2022-04-21 11:13   ` Gao Xiang
2022-04-15 12:36 ` [PATCH v9 19/21] erofs: implement fscache-based data read for inline layout Jeffle Xu
2022-04-21 11:14   ` Gao Xiang
2022-04-15 12:36 ` [PATCH v9 20/21] erofs: implement fscache-based data readahead Jeffle Xu
2022-04-21 11:51   ` Gao Xiang
2022-04-15 12:36 ` [PATCH v9 21/21] erofs: add 'fsid' mount option Jeffle Xu
2022-04-21 11:59   ` Gao Xiang
2022-04-20  8:52 ` [PATCH v9 00/21] fscache, erofs: fscache-based on-demand read semantics JiaZhu
2022-04-21 13:24 ` [PATCH v9 01/21] cachefiles: extract write routine David Howells
2022-04-21 13:57 ` [PATCH v9 02/21] cachefiles: notify user daemon when looking up cookie David Howells
2022-04-21 14:47   ` JeffleXu
2022-04-21 14:02 ` [PATCH v9 03/21] cachefiles: unbind cachefiles gracefully in on-demand mode David Howells
2022-04-22  2:44   ` JeffleXu
2022-04-21 14:05 ` [PATCH v9 04/21] cachefiles: notify user daemon when withdrawing cookie David Howells
2022-04-21 14:57   ` JeffleXu
2022-04-21 14:14 ` [PATCH v9 05/21] cachefiles: implement on-demand read David Howells
2022-04-21 15:00   ` JeffleXu
2022-04-21 14:17 ` [PATCH v9 06/21] cachefiles: enable on-demand read mode David Howells
2022-04-21 15:11   ` JeffleXu
2022-04-21 14:19 ` [PATCH v9 07/21] cachefiles: add tracepoints for " David Howells
2022-04-21 14:47 ` [PATCH v9 08/21] cachefiles: document " David Howells
2022-04-22  3:10   ` JeffleXu
2022-04-21 14:54 ` EMFILE/ENFILE mitigation needed in erofs? David Howells
2022-04-21 16:14   ` JeffleXu
2022-04-21 17:57   ` David Howells
2022-04-21 18:16     ` Gao Xiang
