* [PATCH v5 00/22] fscache,erofs: fscache-based on-demand read semantics
@ 2022-03-16 13:17 ` Jeffle Xu
  0 siblings, 0 replies; 102+ messages in thread
From: Jeffle Xu @ 2022-03-16 13:17 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend

changes since v4:
- erofs: add reviewed-by tag from Chao Yu (patch 8)
- cachefiles: rename CACHEFILES_OP_INIT to CACHEFILES_OP_OPEN (patch 4)
- cachefiles: add a new message type (CACHEFILES_OP_CLOSE). It will be
  sent to user daemon when withdrawing cookie. It is used to notify user daemon
  to close the attached anon_fd. (patch 5)
- cachefiles: add a read-write spinlock @cache->reqs_lock (patch 3) to protect
  concurrent access to the xarray (patch 4).
- cachefiles: remove the logic of automatically flushing all associated
  requests when anon_fd gets closed (in cachefiles_ondemand_fd_release()).
  The reason is that reordering of cread (the response to a READ request) and
  close(anon_fd) may unexpectedly complete another READ request that reuses
  the ID of the previous READ request.

```
Process 1				Process 2
close(anon_fd)
  complete READ request A with ID X

					on-demand read
					  enqueue READ request B into xarray,
					  now READ request B reuses ID X
cread(ID X) of READ request A
  now ID X responds to READ request B
  complete READ request B // unexpected
```

  So now closing anon_fd won't flush all associated requests. A
  mandatory response (cread) is required for each READ request.
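
The ID reuse described above can be replayed in user space. The following
is only a minimal sketch, not the cachefiles code: struct req_table,
req_enqueue(), req_complete(), req_flush_all() and replay() are hypothetical
names, and a small array with lowest-free-ID allocation is assumed as a
stand-in for the xarray.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_REQS 4

/* Hypothetical stand-in for the xarray of in-flight READ requests. */
struct req_table {
	const char *owner[MAX_REQS];	/* NULL means the ID is free */
};

/* Allocate the lowest free ID (assumed allocation policy). */
static int req_enqueue(struct req_table *t, const char *name)
{
	for (int id = 0; id < MAX_REQS; id++) {
		if (!t->owner[id]) {
			t->owner[id] = name;
			return id;
		}
	}
	return -1;
}

/* cread(ID): complete whichever request currently owns the ID. */
static const char *req_complete(struct req_table *t, int id)
{
	const char *name;

	if (id < 0 || id >= MAX_REQS)
		return NULL;
	name = t->owner[id];
	t->owner[id] = NULL;
	return name;
}

/* Old behaviour: close(anon_fd) completes every pending request. */
static void req_flush_all(struct req_table *t)
{
	for (int id = 0; id < MAX_REQS; id++)
		t->owner[id] = NULL;
}

/* Return the name of the request hit by the late cread meant for A. */
static const char *replay(bool flush_on_close)
{
	struct req_table t = { { NULL } };
	int id_a = req_enqueue(&t, "A");

	if (flush_on_close)
		req_flush_all(&t);	/* Process 1: close(anon_fd) */
	req_enqueue(&t, "B");		/* Process 2: on-demand read */
	return req_complete(&t, id_a);	/* Process 1: cread(ID X) for A */
}
```

In this model replay(true) returns "B", i.e. the late cread completes the
wrong request, while replay(false) returns "A" because the ID stays busy
until its own cread arrives.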


RFC: https://lore.kernel.org/all/YbRL2glGzjfZkVbH@B-P7TQMD6M-0146.local/t/
v1: https://lore.kernel.org/lkml/47831875-4bdd-8398-9f2d-0466b31a4382@linux.alibaba.com/T/
v2: https://lore.kernel.org/all/2946d871-b9e1-cf29-6d39-bcab30f2854f@linux.alibaba.com/t/
v3: https://lore.kernel.org/lkml/20220209060108.43051-1-jefflexu@linux.alibaba.com/T/
v4: https://lore.kernel.org/lkml/20220307123305.79520-1-jefflexu@linux.alibaba.com/T/#t


[Background]
============
Nydus [1] is a container image distribution service optimised for
distribution over the network. It accelerates container images by pulling
data from the remote only when it is actually needed, a.k.a. on-demand
reading.

erofs (Enhanced Read-Only File System) is a filesystem specially
optimised for read-only scenarios. (Documentation/filesystems/erofs.rst)

Recently we have been focusing on erofs in the container image distribution
scenario [2], trying to combine it with Nydus. In this case, erofs can
be mounted from one bootstrap file (metadata) with (optional) multiple
data blob files (data) stored on another local filesystem. (All these
files are actually image files in erofs disk format.)

To accelerate container startup (fetching the container image from the
remote and then starting the container), we hope that the bootstrap blob
file can support on-demand read. That is, erofs can be mounted and accessed
even when the bootstrap/data blob files have not been fully downloaded.

That means we have to manage the cache state of the bootstrap/data blob
files (on a cache hit, read directly from the local cache; on a cache miss,
fetch the data somehow). It would be painful, and arguably unwise, for erofs
to implement the cache management itself. Thus we prefer fscache/cachefiles
to do the cache management. Besides, the on-demand read feature should be
generic, and it can benefit other use cases if it is implemented at the
fscache level.

[1] https://nydus.dev
[2] https://sched.co/pcdL


[Overall Design]
================

Please refer to patch 7 ("cachefiles: document on-demand read mode") for
more details.

In the original mode, cachefiles mainly serves as a local cache for a remote
networking fs. In on-demand read mode, cachefiles serves scenarios that need
on-demand read semantics, e.g. container image distribution.

The essential difference between the two modes is who fetches the data on a
cache miss. In the original mode, netfs itself fetches the data from the
remote and then writes it into the cache file. In on-demand read mode, a user
daemon is responsible for fetching the data and writing it to the cache file.

The on-demand read mode relies on a simple protocol for communication
between the kernel and the user daemon.

The current implementation relies on the anonymous fd mechanism to avoid
depending on the format of the cache file. When a cache file is opened
for the first time, an anon_fd associated with the cache file is sent to
the user daemon. With the given anon_fd, the user daemon can fetch and write
data into the cache file in the background, even when the kernel has not
triggered a cache miss. Besides, a write() syscall on the anon_fd ends up
in the cachefiles kernel module, which writes the data to the cache file in
the latest cache file format.

1. cache miss
On a cache miss, the cachefiles kernel module notifies the user daemon of
the anon_fd, along with the requested file range. When notified, the user
daemon needs to fetch the data of the requested file range, and then write
the fetched data into the cache file through the given anonymous fd. When it
has finished processing the request, the user daemon needs to notify the
kernel.

After notifying the user daemon, the kernel read routine blocks until the
request has been handled by the user daemon. When it is awakened by the
notification from the user daemon, i.e. the corresponding hole has been
filled by the user daemon, it retries the read from the same file range.

2. cache hit
Once the data is ready in the cache file, netfs reads from the cache file
directly.
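
The daemon side of the cache-miss path can be sketched roughly as follows.
This is an illustration under stated assumptions, not the daemon from the
series: struct read_req, fetch_fn, write_all() and handle_read_req() are
hypothetical names, the request layout is invented for the example, and any
regular file descriptor stands in for the anon_fd. The final notification
to the kernel is left out because its interface is defined by the patches.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical request: which range of which cache file is missing. */
struct read_req {
	int	fd;	/* stands in for the anon_fd of the cache file */
	off_t	off;	/* start of the missing range */
	size_t	len;	/* length of the missing range */
};

/* Hypothetical fetcher: fills buf with the data for [off, off + len). */
typedef ssize_t (*fetch_fn)(off_t off, void *buf, size_t len);

/* Write the whole buffer at the given offset, retrying short writes. */
static int write_all(int fd, const char *buf, size_t len, off_t off)
{
	while (len > 0) {
		ssize_t put = pwrite(fd, buf, len, off);

		if (put <= 0)
			return -1;
		buf += put;
		off += put;
		len -= (size_t)put;
	}
	return 0;
}

/* Fill the requested range chunk by chunk; 0 on success, -1 on error. */
static int handle_read_req(const struct read_req *req, fetch_fn fetch)
{
	char buf[4096];
	size_t done = 0;

	while (done < req->len) {
		size_t want = req->len - done;
		ssize_t got;

		if (want > sizeof(buf))
			want = sizeof(buf);
		got = fetch(req->off + (off_t)done, buf, want);
		if (got <= 0)
			return -1;
		if (write_all(req->fd, buf, (size_t)got,
			      req->off + (off_t)done))
			return -1;
		done += (size_t)got;
	}
	/* Only now would the daemon report completion to the kernel. */
	return 0;
}
```

The point of the ordering is that the completion is sent only after the
whole range has been written, so the retried kernel read finds the hole
filled.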


[Advantages of fscache-based demand-read]
=========================================
1. Asynchronous Prefetch
In the current mechanism, fscache is responsible for cache state management,
while the data plane (fetching data from local/remote on a cache miss) is
handled on the user daemon side.

If the data is already present in the backing file, the upper fs (e.g.
erofs) reads from the backing file directly and does not trap to
user space. Thus the user daemon can fetch data (from the remote)
asynchronously in the background, which speeds up later accesses to the
backing file.

2. Support massive blob files
This mechanism also supports a large number of backing files, and
thus can benefit densely deployed scenarios.

In our use case, one container image can correspond to one
bootstrap file (required) and multiple data blob files (optional). For
example, one container image for node.js corresponds to ~20 files
in total. In a densely deployed environment, there could be as many as
hundreds of containers and thus thousands of backing files on one
machine.


[Test]
======
You can run a quick test with
https://github.com/lostjeffle/demand-read-cachefilesd


Jeffle Xu (22):
  fscache: export fscache_end_operation()
  cachefiles: extract write routine
  cachefiles: introduce on-demand read mode
  cachefiles: notify user daemon with anon_fd when looking up cookie
  cachefiles: notify user daemon when withdrawing cookie
  cachefiles: implement on-demand read
  cachefiles: document on-demand read mode
  erofs: use meta buffers for erofs_read_superblock()
  erofs: make erofs_map_blocks() generally available
  erofs: add mode checking helper
  erofs: register global fscache volume
  erofs: add cookie context helper functions
  erofs: add anonymous inode managing page cache of blob file
  erofs: add erofs_fscache_read_pages() helper
  erofs: register cookie context for bootstrap blob
  erofs: implement fscache-based metadata read
  erofs: implement fscache-based data read for non-inline layout
  erofs: implement fscache-based data read for inline layout
  erofs: register cookie context for data blobs
  erofs: implement fscache-based data read for data blobs
  erofs: implement fscache-based data readahead
  erofs: add 'uuid' mount option

 .../filesystems/caching/cachefiles.rst        | 176 ++++++
 fs/cachefiles/Kconfig                         |  11 +
 fs/cachefiles/daemon.c                        | 587 +++++++++++++++++-
 fs/cachefiles/interface.c                     |   2 +
 fs/cachefiles/internal.h                      |  53 ++
 fs/cachefiles/io.c                            |  72 ++-
 fs/cachefiles/namei.c                         |  16 +-
 fs/erofs/Makefile                             |   3 +-
 fs/erofs/data.c                               |  18 +-
 fs/erofs/fscache.c                            | 492 +++++++++++++++
 fs/erofs/inode.c                              |   6 +-
 fs/erofs/internal.h                           |  30 +
 fs/erofs/super.c                              | 106 +++-
 fs/fscache/internal.h                         |  11 -
 fs/nfs/fscache.c                              |   8 -
 include/linux/fscache.h                       |  15 +
 include/linux/netfs.h                         |   1 +
 include/trace/events/cachefiles.h             |   2 +
 include/uapi/linux/cachefiles.h               |  51 ++
 19 files changed, 1560 insertions(+), 100 deletions(-)
 create mode 100644 fs/erofs/fscache.c
 create mode 100644 include/uapi/linux/cachefiles.h

-- 
2.27.0


^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH v5 01/22] fscache: export fscache_end_operation()
  2022-03-16 13:17 ` [PATCH v5 00/22] fscache, erofs: " Jeffle Xu
@ 2022-03-16 13:17   ` Jeffle Xu
  -1 siblings, 0 replies; 102+ messages in thread
From: Jeffle Xu @ 2022-03-16 13:17 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend

Export fscache_end_operation() to avoid code duplication.

Besides, since the paired fscache_begin_read_operation() is already
exported, it makes sense to export fscache_end_operation() as well.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
---
 fs/fscache/internal.h   | 11 -----------
 fs/nfs/fscache.c        |  8 --------
 include/linux/fscache.h | 14 ++++++++++++++
 3 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index f121c21590dc..ed1c9ed737f2 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -70,17 +70,6 @@ static inline void fscache_see_cookie(struct fscache_cookie *cookie,
 			     where);
 }
 
-/*
- * io.c
- */
-static inline void fscache_end_operation(struct netfs_cache_resources *cres)
-{
-	const struct netfs_cache_ops *ops = fscache_operation_valid(cres);
-
-	if (ops)
-		ops->end_operation(cres);
-}
-
 /*
  * main.c
  */
diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index cfe901650ab0..39654ca72d3d 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -249,14 +249,6 @@ void nfs_fscache_release_file(struct inode *inode, struct file *filp)
 	}
 }
 
-static inline void fscache_end_operation(struct netfs_cache_resources *cres)
-{
-	const struct netfs_cache_ops *ops = fscache_operation_valid(cres);
-
-	if (ops)
-		ops->end_operation(cres);
-}
-
 /*
  * Fallback page reading interface.
  */
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 296c5f1d9f35..d2430da8aa67 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -456,6 +456,20 @@ int fscache_begin_read_operation(struct netfs_cache_resources *cres,
 	return -ENOBUFS;
 }
 
+/**
+ * fscache_end_operation - End the read operation for the netfs lib
+ * @cres: The cache resources for the read operation
+ *
+ * Clean up the resources at the end of the read request.
+ */
+static inline void fscache_end_operation(struct netfs_cache_resources *cres)
+{
+	const struct netfs_cache_ops *ops = fscache_operation_valid(cres);
+
+	if (ops)
+		ops->end_operation(cres);
+}
+
 /**
  * fscache_read - Start a read from the cache.
  * @cres: The cache resources to use
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v5 02/22] cachefiles: extract write routine
  2022-03-16 13:17 ` [PATCH v5 00/22] fscache, erofs: " Jeffle Xu
@ 2022-03-16 13:17   ` Jeffle Xu
  -1 siblings, 0 replies; 102+ messages in thread
From: Jeffle Xu @ 2022-03-16 13:17 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend

Extract the generic routine of writing data to cache files, and make it
generally available.

This will be used by the following patch implementing on-demand read
mode. Since it's called inside cachefiles module in this case, make the
interface generic and unrelated to netfs_cache_resources.

It is worth noting that ki->inval_counter is no longer initialized after
this cleanup. This should not make any visible difference, since
inval_counter is no longer used in the write completion routine, i.e.
cachefiles_write_complete().
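
The shape of this refactoring can be shown with a small user-space sketch.
It is not the kernel code: term_func_t is only loosely modelled on
netfs_io_terminated_t, and struct cache_object, struct cache_resources,
core_write() and wrapped_write() are hypothetical names. The core routine
takes the object directly, while the thin wrapper validates the resources
and reports an early failure through the completion callback before
delegating.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Completion callback, loosely modelled on netfs_io_terminated_t. */
typedef void (*term_func_t)(void *priv, long result);

/* Hypothetical stand-ins for a cache object and per-operation resources. */
struct cache_object {
	size_t bytes_written;
};

struct cache_resources {
	struct cache_object *object;
	bool ready;		/* false models a failed wait for the operation */
};

/* Core routine: independent of cache_resources, usable by any caller. */
static int core_write(struct cache_object *object, size_t len,
		      term_func_t term_func, void *priv)
{
	object->bytes_written += len;
	if (term_func)
		term_func(priv, (long)len);
	return 0;
}

/* Thin wrapper: check the resources, report early errors, delegate. */
static int wrapped_write(struct cache_resources *cres, size_t len,
			 term_func_t term_func, void *priv)
{
	if (!cres->ready) {
		if (term_func)
			term_func(priv, -ENOBUFS);
		return -ENOBUFS;
	}
	return core_write(cres->object, len, term_func, priv);
}
```

Either way the callback fires exactly once per call, whether the failure
happens in the wrapper or the write completes in the core routine.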

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/cachefiles/internal.h | 10 +++++++
 fs/cachefiles/io.c       | 61 +++++++++++++++++++++++-----------------
 2 files changed, 45 insertions(+), 26 deletions(-)

diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index c793d33b0224..e80673d0ab97 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -201,6 +201,16 @@ extern void cachefiles_put_object(struct cachefiles_object *object,
  */
 extern bool cachefiles_begin_operation(struct netfs_cache_resources *cres,
 				       enum fscache_want_state want_state);
+extern int __cachefiles_prepare_write(struct cachefiles_object *object,
+				      struct file *file,
+				      loff_t *_start, size_t *_len,
+				      bool no_space_allocated_yet);
+extern int __cachefiles_write(struct cachefiles_object *object,
+			      struct file *file,
+			      loff_t start_pos,
+			      struct iov_iter *iter,
+			      netfs_io_terminated_t term_func,
+			      void *term_func_priv);
 
 /*
  * key.c
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 753986ea1583..8dbc1eb254a3 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -278,36 +278,33 @@ static void cachefiles_write_complete(struct kiocb *iocb, long ret)
 /*
  * Initiate a write to the cache.
  */
-static int cachefiles_write(struct netfs_cache_resources *cres,
-			    loff_t start_pos,
-			    struct iov_iter *iter,
-			    netfs_io_terminated_t term_func,
-			    void *term_func_priv)
+int __cachefiles_write(struct cachefiles_object *object,
+		       struct file *file,
+		       loff_t start_pos,
+		       struct iov_iter *iter,
+		       netfs_io_terminated_t term_func,
+		       void *term_func_priv)
 {
-	struct cachefiles_object *object;
 	struct cachefiles_cache *cache;
 	struct cachefiles_kiocb *ki;
 	struct inode *inode;
-	struct file *file;
 	unsigned int old_nofs;
-	ssize_t ret = -ENOBUFS;
+	ssize_t ret;
 	size_t len = iov_iter_count(iter);
 
-	if (!fscache_wait_for_operation(cres, FSCACHE_WANT_WRITE))
-		goto presubmission_error;
 	fscache_count_write();
-	object = cachefiles_cres_object(cres);
 	cache = object->volume->cache;
-	file = cachefiles_cres_file(cres);
 
 	_enter("%pD,%li,%llx,%zx/%llx",
 	       file, file_inode(file)->i_ino, start_pos, len,
 	       i_size_read(file_inode(file)));
 
-	ret = -ENOMEM;
 	ki = kzalloc(sizeof(struct cachefiles_kiocb), GFP_KERNEL);
-	if (!ki)
-		goto presubmission_error;
+	if (!ki) {
+		if (term_func)
+			term_func(term_func_priv, -ENOMEM, false);
+		return -ENOMEM;
+	}
 
 	refcount_set(&ki->ki_refcnt, 2);
 	ki->iocb.ki_filp	= file;
@@ -316,7 +313,6 @@ static int cachefiles_write(struct netfs_cache_resources *cres,
 	ki->iocb.ki_hint	= ki_hint_validate(file_write_hint(file));
 	ki->iocb.ki_ioprio	= get_current_ioprio();
 	ki->object		= object;
-	ki->inval_counter	= cres->inval_counter;
 	ki->start		= start_pos;
 	ki->len			= len;
 	ki->term_func		= term_func;
@@ -371,11 +367,24 @@ static int cachefiles_write(struct netfs_cache_resources *cres,
 	cachefiles_put_kiocb(ki);
 	_leave(" = %zd", ret);
 	return ret;
+}
 
-presubmission_error:
-	if (term_func)
-		term_func(term_func_priv, ret, false);
-	return ret;
+static int cachefiles_write(struct netfs_cache_resources *cres,
+			    loff_t start_pos,
+			    struct iov_iter *iter,
+			    netfs_io_terminated_t term_func,
+			    void *term_func_priv)
+{
+	if (!fscache_wait_for_operation(cres, FSCACHE_WANT_WRITE)) {
+		if (term_func)
+			term_func(term_func_priv, -ENOBUFS, false);
+		return -ENOBUFS;
+	}
+
+	return __cachefiles_write(cachefiles_cres_object(cres),
+				  cachefiles_cres_file(cres),
+				  start_pos, iter,
+				  term_func, term_func_priv);
 }
 
 /*
@@ -486,13 +495,12 @@ static enum netfs_read_source cachefiles_prepare_read(struct netfs_read_subreque
 /*
  * Prepare for a write to occur.
  */
-static int __cachefiles_prepare_write(struct netfs_cache_resources *cres,
-				      loff_t *_start, size_t *_len, loff_t i_size,
-				      bool no_space_allocated_yet)
+int __cachefiles_prepare_write(struct cachefiles_object *object,
+			       struct file *file,
+			       loff_t *_start, size_t *_len,
+			       bool no_space_allocated_yet)
 {
-	struct cachefiles_object *object = cachefiles_cres_object(cres);
 	struct cachefiles_cache *cache = object->volume->cache;
-	struct file *file = cachefiles_cres_file(cres);
 	loff_t start = *_start, pos;
 	size_t len = *_len, down;
 	int ret;
@@ -579,7 +587,8 @@ static int cachefiles_prepare_write(struct netfs_cache_resources *cres,
 	}
 
 	cachefiles_begin_secure(cache, &saved_cred);
-	ret = __cachefiles_prepare_write(cres, _start, _len, i_size,
+	ret = __cachefiles_prepare_write(object, cachefiles_cres_file(cres),
+					 _start, _len,
 					 no_space_allocated_yet);
 	cachefiles_end_secure(cache, saved_cred);
 	return ret;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 102+ messages in thread

+			    loff_t start_pos,
+			    struct iov_iter *iter,
+			    netfs_io_terminated_t term_func,
+			    void *term_func_priv)
+{
+	if (!fscache_wait_for_operation(cres, FSCACHE_WANT_WRITE)) {
+		if (term_func)
+			term_func(term_func_priv, -ENOBUFS, false);
+		return -ENOBUFS;
+	}
+
+	return __cachefiles_write(cachefiles_cres_object(cres),
+				  cachefiles_cres_file(cres),
+				  start_pos, iter,
+				  term_func, term_func_priv);
 }
 
 /*
@@ -486,13 +495,12 @@ static enum netfs_read_source cachefiles_prepare_read(struct netfs_read_subreque
 /*
  * Prepare for a write to occur.
  */
-static int __cachefiles_prepare_write(struct netfs_cache_resources *cres,
-				      loff_t *_start, size_t *_len, loff_t i_size,
-				      bool no_space_allocated_yet)
+int __cachefiles_prepare_write(struct cachefiles_object *object,
+			       struct file *file,
+			       loff_t *_start, size_t *_len,
+			       bool no_space_allocated_yet)
 {
-	struct cachefiles_object *object = cachefiles_cres_object(cres);
 	struct cachefiles_cache *cache = object->volume->cache;
-	struct file *file = cachefiles_cres_file(cres);
 	loff_t start = *_start, pos;
 	size_t len = *_len, down;
 	int ret;
@@ -579,7 +587,8 @@ static int cachefiles_prepare_write(struct netfs_cache_resources *cres,
 	}
 
 	cachefiles_begin_secure(cache, &saved_cred);
-	ret = __cachefiles_prepare_write(cres, _start, _len, i_size,
+	ret = __cachefiles_prepare_write(object, cachefiles_cres_file(cres),
+					 _start, _len,
 					 no_space_allocated_yet);
 	cachefiles_end_secure(cache, saved_cred);
 	return ret;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v5 03/22] cachefiles: introduce on-demand read mode
  2022-03-16 13:17 ` [PATCH v5 00/22] fscache, erofs: " Jeffle Xu
@ 2022-03-16 13:17   ` Jeffle Xu
  -1 siblings, 0 replies; 102+ messages in thread
From: Jeffle Xu @ 2022-03-16 13:17 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend

Fscache/cachefiles has so far served as a local cache for remote
filesystems. This patch, along with the following patches, introduces a
new on-demand read mode for cachefiles, which benefits scenarios where
on-demand read semantics are needed, e.g. container image distribution.

The essential difference between the original mode and on-demand read
mode is what happens on a cache miss. In the original mode, netfs itself
fetches the data from the remote end and then writes the fetched data
into the cache file. In on-demand read mode, a user daemon is
responsible for fetching the data and writing it to the cache file.

This patch only adds the command to enable on-demand read mode. The
"bind" command gains an optional argument. On-demand mode is turned on
when this argument matches "ondemand" exactly, i.e. "bind ondemand".
With no argument, cachefiles keeps working in the original mode; any
other argument is still rejected with -EINVAL.
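
As a rough user-space sketch of this argument handling (the demo_*
names are hypothetical, and the flag is modelled as a plain bit mask
rather than the bit number used with set_bit() in the patch):

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical mask standing in for the CACHEFILES_ONDEMAND_MODE bit. */
#define DEMO_ONDEMAND_MODE (1UL << 4)

/* Returns true only when the argument is exactly "ondemand". */
bool demo_ondemand_bind(unsigned long *flags, const char *args)
{
	if (strcmp(args, "ondemand") == 0) {
		*flags |= DEMO_ONDEMAND_MODE;
		return true;
	}

	return false;
}

/*
 * Models the check in the bind handler: an empty argument keeps the
 * original mode, "ondemand" enables on-demand mode, and anything else
 * is rejected.
 */
int demo_bind(unsigned long *flags, const char *args)
{
	if (!demo_ondemand_bind(flags, args) && *args)
		return -EINVAL;

	return 0;
}
```

Because the comparison is exact, near misses such as "ondemand " with a
trailing space or a bare "on" fall into the error path instead of
silently selecting either mode.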

The following patches will implement the data plane of on-demand read
mode.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/cachefiles/Kconfig    |  11 ++++
 fs/cachefiles/daemon.c   | 132 +++++++++++++++++++++++++++++++--------
 fs/cachefiles/internal.h |   6 ++
 3 files changed, 124 insertions(+), 25 deletions(-)

diff --git a/fs/cachefiles/Kconfig b/fs/cachefiles/Kconfig
index 719faeeda168..58aad1fb4c5c 100644
--- a/fs/cachefiles/Kconfig
+++ b/fs/cachefiles/Kconfig
@@ -26,3 +26,14 @@ config CACHEFILES_ERROR_INJECTION
 	help
 	  This permits error injection to be enabled in cachefiles whilst a
 	  cache is in service.
+
+config CACHEFILES_ONDEMAND
+	bool "Support for on-demand read"
+	depends on CACHEFILES
+	default n
+	help
+	  This permits on-demand read mode of cachefiles. In this mode, when
+	  cache miss, the cachefiles backend instead of netfs, is responsible
+          for fetching data, e.g. through user daemon.
+
+	  If unsure, say N.
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 7ac04ee2c0a0..c0c3a3cbee28 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -78,6 +78,65 @@ static const struct cachefiles_daemon_cmd cachefiles_daemon_cmds[] = {
 	{ "",		NULL				}
 };
 
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+static inline void cachefiles_ondemand_open(struct cachefiles_cache *cache)
+{
+	xa_init_flags(&cache->reqs, XA_FLAGS_ALLOC);
+	rwlock_init(&cache->reqs_lock);
+}
+
+static inline void cachefiles_ondemand_release(struct cachefiles_cache *cache)
+{
+	xa_destroy(&cache->reqs);
+}
+
+static inline __poll_t cachefiles_ondemand_mask(struct cachefiles_cache *cache)
+{
+	__poll_t mask = 0;
+
+	if (!xa_empty(&cache->reqs))
+		mask |= EPOLLIN;
+
+	if (test_bit(CACHEFILES_CULLING, &cache->flags))
+		mask |= EPOLLOUT;
+
+	return mask;
+}
+
+static inline
+bool cachefiles_ondemand_daemon_bind(struct cachefiles_cache *cache, char *args)
+{
+	if (!strcmp(args, "ondemand")) {
+		set_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags);
+		return true;
+	}
+
+	return false;
+}
+
+#else
+static inline void cachefiles_ondemand_open(struct cachefiles_cache *cache) {}
+static inline void cachefiles_ondemand_release(struct cachefiles_cache *cache) {}
+
+static inline
+__poll_t cachefiles_ondemand_mask(struct cachefiles_cache *cache)
+{
+	return 0;
+}
+
+static inline
+bool cachefiles_ondemand_daemon_bind(struct cachefiles_cache *cache, char *args)
+{
+	return false;
+}
+#endif
+
+static inline
+ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
+					char __user *_buffer, size_t buflen)
+{
+	return -EOPNOTSUPP;
+}
 
 /*
  * Prepare a cache for caching.
@@ -108,6 +167,7 @@ static int cachefiles_daemon_open(struct inode *inode, struct file *file)
 	INIT_LIST_HEAD(&cache->volumes);
 	INIT_LIST_HEAD(&cache->object_list);
 	spin_lock_init(&cache->object_list_lock);
+	cachefiles_ondemand_open(cache);
 
 	/* set default caching limits
 	 * - limit at 1% free space and/or free files
@@ -139,6 +199,7 @@ static int cachefiles_daemon_release(struct inode *inode, struct file *file)
 
 	set_bit(CACHEFILES_DEAD, &cache->flags);
 
+	cachefiles_ondemand_release(cache);
 	cachefiles_daemon_unbind(cache);
 
 	/* clean up the control file interface */
@@ -152,23 +213,15 @@ static int cachefiles_daemon_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
-/*
- * Read the cache state.
- */
-static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer,
-				      size_t buflen, loff_t *pos)
+static ssize_t cachefiles_do_daemon_read(struct cachefiles_cache *cache,
+					 char __user *_buffer,
+					 size_t buflen)
 {
-	struct cachefiles_cache *cache = file->private_data;
 	unsigned long long b_released;
 	unsigned f_released;
 	char buffer[256];
 	int n;
 
-	//_enter(",,%zu,", buflen);
-
-	if (!test_bit(CACHEFILES_READY, &cache->flags))
-		return 0;
-
 	/* check how much space the cache has */
 	cachefiles_has_space(cache, 0, 0, cachefiles_has_space_check);
 
@@ -206,6 +259,25 @@ static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer,
 	return n;
 }
 
+/*
+ * Read the cache state.
+ */
+static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer,
+				      size_t buflen, loff_t *pos)
+{
+	struct cachefiles_cache *cache = file->private_data;
+
+	//_enter(",,%zu,", buflen);
+
+	if (!test_bit(CACHEFILES_READY, &cache->flags))
+		return 0;
+
+	if (likely(!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags)))
+		return cachefiles_do_daemon_read(cache, _buffer, buflen);
+	else
+		return cachefiles_ondemand_daemon_read(cache, _buffer, buflen);
+}
+
 /*
  * Take a command from cachefilesd, parse it and act on it.
  */
@@ -284,6 +356,21 @@ static ssize_t cachefiles_daemon_write(struct file *file,
 	goto error;
 }
 
+
+static inline
+__poll_t cachefiles_daemon_mask(struct cachefiles_cache *cache)
+{
+	__poll_t mask = 0;
+
+	if (test_bit(CACHEFILES_STATE_CHANGED, &cache->flags))
+		mask |= EPOLLIN;
+
+	if (test_bit(CACHEFILES_CULLING, &cache->flags))
+		mask |= EPOLLOUT;
+
+	return mask;
+}
+
 /*
  * Poll for culling state
  * - use EPOLLOUT to indicate culling state
@@ -292,18 +379,13 @@ static __poll_t cachefiles_daemon_poll(struct file *file,
 					   struct poll_table_struct *poll)
 {
 	struct cachefiles_cache *cache = file->private_data;
-	__poll_t mask;
 
 	poll_wait(file, &cache->daemon_pollwq, poll);
-	mask = 0;
-
-	if (test_bit(CACHEFILES_STATE_CHANGED, &cache->flags))
-		mask |= EPOLLIN;
-
-	if (test_bit(CACHEFILES_CULLING, &cache->flags))
-		mask |= EPOLLOUT;
 
-	return mask;
+	if (likely(!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags)))
+		return cachefiles_daemon_mask(cache);
+	else
+		return cachefiles_ondemand_mask(cache);
 }
 
 /*
@@ -687,11 +769,6 @@ static int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args)
 	    cache->brun_percent  >= 100)
 		return -ERANGE;
 
-	if (*args) {
-		pr_err("'bind' command doesn't take an argument\n");
-		return -EINVAL;
-	}
-
 	if (!cache->rootdirname) {
 		pr_err("No cache directory specified\n");
 		return -EINVAL;
@@ -703,6 +780,11 @@ static int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args)
 		return -EBUSY;
 	}
 
+	if (!cachefiles_ondemand_daemon_bind(cache, args) && *args) {
+		pr_err("'bind' command doesn't take an argument\n");
+		return -EINVAL;
+	}
+
 	/* Make sure we have copies of the tag string */
 	if (!cache->tag) {
 		/*
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index e80673d0ab97..3f791882fa3f 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -15,6 +15,7 @@
 #include <linux/fscache-cache.h>
 #include <linux/cred.h>
 #include <linux/security.h>
+#include <linux/xarray.h>
 
 #define CACHEFILES_DIO_BLOCK_SIZE 4096
 
@@ -98,9 +99,14 @@ struct cachefiles_cache {
 #define CACHEFILES_DEAD			1	/* T if cache dead */
 #define CACHEFILES_CULLING		2	/* T if cull engaged */
 #define CACHEFILES_STATE_CHANGED	3	/* T if state changed (poll trigger) */
+#define CACHEFILES_ONDEMAND_MODE	4	/* T if in on-demand read mode */
 	char				*rootdirname;	/* name of cache root directory */
 	char				*secctx;	/* LSM security context */
 	char				*tag;		/* cache binding tag */
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+	struct xarray			reqs;		/* xarray of pending on-demand requests */
+	rwlock_t			reqs_lock;	/* Lock for reqs xarray */
+#endif
 };
 
 #include <trace/events/cachefiles.h>
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v5 04/22] cachefiles: notify user daemon with anon_fd when looking up cookie
  2022-03-16 13:17 ` [PATCH v5 00/22] fscache, erofs: " Jeffle Xu
@ 2022-03-16 13:17   ` Jeffle Xu
  -1 siblings, 0 replies; 102+ messages in thread
From: Jeffle Xu @ 2022-03-16 13:17 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend

Send an anonymous fd to the user daemon when looking up a cookie, no
matter whether the cache file already exists or not. With the given
anonymous fd, the user daemon can fetch data and write it into the cache
file in advance, even before any cache miss has happened.

Also add one advisory flag (FSCACHE_ADV_WANT_CACHE_SIZE) suggesting that
the cache file size shall be retrieved at runtime. This helps scenarios
where, e.g., one cache file contains multiple netfs files for the
purpose of deduplication. In this case, netfs itself has no idea of the
cache file size, so the user daemon needs to provide a hint for the
cache file size.
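
The diff below has the daemon complete an OPEN request by writing a
"cinit <id>[,<cache_size>]" command. A minimal user-space sketch of
that text format follows; the demo_* helpers are hypothetical, and
strtoul() only approximates kstrtoul() (it is more lenient about
leading whitespace and signs):

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Daemon side: format the reply. A cache_size of 0 omits the optional
 * ",<cache_size>" part. Returns the string length, or -1 if the
 * buffer is too small.
 */
int demo_format_cinit(char *buf, size_t buflen,
		      unsigned long id, unsigned long cache_size)
{
	int n;

	if (cache_size)
		n = snprintf(buf, buflen, "cinit %lu,%lu", id, cache_size);
	else
		n = snprintf(buf, buflen, "cinit %lu", id);

	if (n < 0 || (size_t)n >= buflen)
		return -1;
	return n;
}

/*
 * Receiving side: parse the argument part "<id>[,<cache_size>]".
 * *size is set to 0 when no size is given.
 */
int demo_parse_cinit_args(const char *args,
			  unsigned long *id, unsigned long *size)
{
	char *end;

	if (!*args)
		return -EINVAL;

	errno = 0;
	*id = strtoul(args, &end, 0);
	if (errno || end == args)
		return -EINVAL;

	*size = 0;
	if (*end == ',') {
		const char *p = end + 1;

		errno = 0;
		*size = strtoul(p, &end, 0);
		if (errno || end == p)
			return -EINVAL;
	}

	/* Reject trailing garbage after the last number. */
	return *end ? -EINVAL : 0;
}
```

The size is only meaningful for requests that carry the
want-cache-size flag; for other requests the daemon can reply with the
id alone.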

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/cachefiles/daemon.c            | 365 +++++++++++++++++++++++++++++-
 fs/cachefiles/internal.h          |  24 ++
 fs/cachefiles/namei.c             |  16 +-
 include/linux/fscache.h           |   1 +
 include/trace/events/cachefiles.h |   2 +
 include/uapi/linux/cachefiles.h   |  39 ++++
 6 files changed, 444 insertions(+), 3 deletions(-)
 create mode 100644 include/uapi/linux/cachefiles.h

diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index c0c3a3cbee28..3c3a461f8cd8 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -19,6 +19,8 @@
 #include <linux/ctype.h>
 #include <linux/string.h>
 #include <linux/fs_struct.h>
+#include <linux/fdtable.h>
+#include <linux/anon_inodes.h>
 #include "internal.h"
 
 static int cachefiles_daemon_open(struct inode *, struct file *);
@@ -43,6 +45,9 @@ static int cachefiles_daemon_secctx(struct cachefiles_cache *, char *);
 static int cachefiles_daemon_tag(struct cachefiles_cache *, char *);
 static int cachefiles_daemon_bind(struct cachefiles_cache *, char *);
 static void cachefiles_daemon_unbind(struct cachefiles_cache *);
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+static int cachefiles_ondemand_cinit(struct cachefiles_cache *, char *);
+#endif
 
 static unsigned long cachefiles_open;
 
@@ -75,6 +80,9 @@ static const struct cachefiles_daemon_cmd cachefiles_daemon_cmds[] = {
 	{ "inuse",	cachefiles_daemon_inuse		},
 	{ "secctx",	cachefiles_daemon_secctx	},
 	{ "tag",	cachefiles_daemon_tag		},
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+	{ "cinit",	cachefiles_ondemand_cinit	},
+#endif
 	{ "",		NULL				}
 };
 
@@ -87,6 +95,21 @@ static inline void cachefiles_ondemand_open(struct cachefiles_cache *cache)
 
 static inline void cachefiles_ondemand_release(struct cachefiles_cache *cache)
 {
+	struct cachefiles_req *req;
+	unsigned long index;
+
+	/*
+	 * 1) Cache has been marked as dead state, and then 2) flush all pending
+	 * requests in @reqs xarray. The barrier inside set_bit() will ensure
+	 * that above two ops won't be reordered.
+	 */
+	write_lock(&cache->reqs_lock);
+	xa_for_each(&cache->reqs, index, req) {
+		req->error = -EIO;
+		complete(&req->done);
+	}
+	write_unlock(&cache->reqs_lock);
+
 	xa_destroy(&cache->reqs);
 }
 
@@ -114,6 +137,346 @@ bool cachefiles_ondemand_daemon_bind(struct cachefiles_cache *cache, char *args)
 	return false;
 }
 
+static int cachefiles_ondemand_fd_release(struct inode *inode, struct file *file)
+{
+	struct cachefiles_object *object = file->private_data;
+
+	/*
+	 * Uninstall anon_fd to the cachefiles object, so that no further
+	 * associated requests will get enqueued.
+	 */
+	object->fd = -1;
+
+	cachefiles_put_object(object, cachefiles_obj_put_ondemand_fd);
+	return 0;
+}
+
+static ssize_t cachefiles_ondemand_fd_write_iter(struct kiocb *kiocb,
+						 struct iov_iter *iter)
+{
+	struct cachefiles_object *object = kiocb->ki_filp->private_data;
+	struct cachefiles_cache *cache = object->volume->cache;
+	struct file *file = object->file;
+	size_t len = iter->count;
+	loff_t pos = kiocb->ki_pos;
+	const struct cred *saved_cred;
+	int ret;
+
+	if (!file)
+		return -ENOBUFS;
+
+	cachefiles_begin_secure(cache, &saved_cred);
+	ret = __cachefiles_prepare_write(object, file, &pos, &len, true);
+	cachefiles_end_secure(cache, saved_cred);
+	if (ret < 0)
+		return ret;
+
+	ret = __cachefiles_write(object, file, pos, iter, NULL, NULL);
+	if (!ret)
+		ret = len;
+
+	return ret;
+}
+
+static loff_t cachefiles_ondemand_fd_llseek(struct file *filp, loff_t pos, int whence)
+{
+	struct cachefiles_object *object = filp->private_data;
+	struct file *file = object->file;
+
+	if (!file)
+		return -ENOBUFS;
+
+	return vfs_llseek(file, pos, whence);
+}
+
+static const struct file_operations cachefiles_ondemand_fd_fops = {
+	.owner		= THIS_MODULE,
+	.release	= cachefiles_ondemand_fd_release,
+	.write_iter	= cachefiles_ondemand_fd_write_iter,
+	.llseek		= cachefiles_ondemand_fd_llseek,
+};
+
+/*
+ * Init request completion
+ * - command: "cinit <id>[,<cache_size>]"
+ */
+static int cachefiles_ondemand_cinit(struct cachefiles_cache *cache, char *args)
+{
+	struct cachefiles_req *req;
+	struct cachefiles_open *load;
+	struct fscache_cookie *cookie;
+	char *tmp, *pid, *psize;
+	unsigned long id, flags, size = 0;
+	int ret;
+
+	if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags))
+		return -EOPNOTSUPP;
+
+	if (!*args) {
+		pr_err("Empty id specified\n");
+		return -EINVAL;
+	}
+
+	tmp = kstrdup(args, GFP_KERNEL);
+	if (!tmp)
+		return -ENOMEM;
+
+	pid = tmp;
+	psize = strchr(tmp, ',');
+	if (psize) {
+		*psize = 0;
+		psize++;
+
+		ret = kstrtoul(psize, 0, &size);
+		if (ret)
+			goto out;
+	}
+
+	ret = kstrtoul(pid, 0, &id);
+	if (ret)
+		goto out;
+
+	req = xa_erase(&cache->reqs, id);
+	if (!req) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	load = (void *)req->msg.data;
+	flags = load->flags;
+
+	if (test_bit(CACHEFILES_OPEN_WANT_CACHE_SIZE, &flags)) {
+		if (WARN_ON_ONCE(!size)) {
+			req->error = -EINVAL;
+		} else {
+			cookie = req->object->cookie;
+			cookie->object_size = size;
+			if (size)
+				set_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
+			else
+				clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
+		}
+	}
+
+	complete(&req->done);
+out:
+	kfree(tmp);
+	return ret;
+}
+
+static int cachefiles_ondemand_get_fd(struct cachefiles_req *req)
+{
+	struct cachefiles_object *object;
+	struct cachefiles_open *load;
+	struct fd f;
+	int ret;
+
+	object = cachefiles_grab_object(req->object,
+			cachefiles_obj_get_ondemand_fd);
+
+	ret = anon_inode_getfd("[cachefiles]", &cachefiles_ondemand_fd_fops,
+				object, O_WRONLY);
+	if (ret < 0) {
+		cachefiles_put_object(object, cachefiles_obj_put_ondemand_fd);
+		return ret;
+	}
+
+	f = fdget_pos(ret);
+	if (WARN_ON_ONCE(!f.file))
+		return -EBADFD;
+
+	f.file->f_mode |= FMODE_PWRITE | FMODE_LSEEK;
+	fdput_pos(f);
+
+	load = (void *)req->msg.data;
+	load->fd = object->fd = ret;
+
+	return 0;
+}
+
+static ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
+					       char __user *_buffer,
+					       size_t buflen)
+{
+	struct cachefiles_req *req;
+	struct cachefiles_msg *msg;
+	unsigned long id = 0;
+	size_t n;
+	int ret = 0;
+	XA_STATE(xas, &cache->reqs, 0);
+
+	/*
+	 * Search for request that has not ever been processed, to prevent
+	 * requests from being sent to user daemon repeatedly.
+	 */
+	xa_lock(&cache->reqs);
+	req = xas_find_marked(&xas, UINT_MAX, CACHEFILES_REQ_NEW);
+	if (req)
+		xas_clear_mark(&xas, CACHEFILES_REQ_NEW);
+	xa_unlock(&cache->reqs);
+
+	if (!req)
+		return 0;
+
+	msg = &req->msg;
+	msg->id = id = xas.xa_index;
+
+	n = msg->len;
+	if (n > buflen) {
+		ret = -EMSGSIZE;
+		goto error;
+	}
+
+	if (msg->opcode == CACHEFILES_OP_OPEN) {
+		ret = cachefiles_ondemand_get_fd(req);
+		if (ret)
+			goto error;
+	}
+
+	if (copy_to_user(_buffer, msg, n) != 0) {
+		ret = -EFAULT;
+		goto err_put_fd;
+	}
+
+	return n;
+
+err_put_fd:
+	if (msg->opcode == CACHEFILES_OP_OPEN)
+		close_fd(req->object->fd);
+error:
+	xa_erase(&cache->reqs, id);
+	req->error = ret;
+	complete(&req->done);
+	return ret;
+}
+
+typedef int (*init_req_fn)(struct cachefiles_req *req, void *private);
+
+static int cachefiles_ondemand_send_req(struct cachefiles_object *object,
+					enum cachefiles_opcode opcode,
+					size_t data_len,
+					init_req_fn init_req,
+					void *private)
+{
+	struct cachefiles_cache *cache = object->volume->cache;
+	struct cachefiles_req *req;
+	struct xarray *xa = &cache->reqs;
+	int ret;
+	u32 id;
+
+	if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags))
+		return -EOPNOTSUPP;
+
+	if (test_bit(CACHEFILES_DEAD, &cache->flags))
+		return -EIO;
+
+	req = kzalloc(sizeof(*req) + data_len, GFP_KERNEL);
+	if (!req)
+		return -ENOMEM;
+
+	req->object = object;
+	init_completion(&req->done);
+	req->msg.opcode = opcode;
+	req->msg.len = sizeof(struct cachefiles_msg) + data_len;
+
+	ret = init_req(req, private);
+	if (ret)
+		goto out;
+
+	/*
+	 * Enqueue the pending request.
+	 *
+	 * Stop enqueuing the request when daemon is dying. So we need to
+	 * 1) check cache state, and 2) enqueue request if cache is alive.
+	 *
+	 * The above two ops need to be atomic as a whole. @reqs_lock is used
+	 * here to ensure that. Otherwise, request may be enqueued after xarray
+	 * has been flushed, in which case the orphan request will never be
+	 * completed and thus netfs will hang there forever.
+	 */
+	read_lock(&cache->reqs_lock);
+
+	/* recheck dead state under lock */
+	if (test_bit(CACHEFILES_DEAD, &cache->flags)) {
+		read_unlock(&cache->reqs_lock);
+		ret = -EIO;
+		goto out;
+	}
+
+	xa_lock(xa);
+	ret = __xa_alloc(xa, &id, req, xa_limit_32b, GFP_KERNEL);
+	if (!ret)
+		__xa_set_mark(xa, id, CACHEFILES_REQ_NEW);
+	xa_unlock(xa);
+
+	read_unlock(&cache->reqs_lock);
+
+	if (ret)
+		goto out;
+
+	wake_up_all(&cache->daemon_pollwq);
+	wait_for_completion(&req->done);
+	ret = req->error;
+out:
+	kfree(req);
+	return ret;
+}
+
+static int init_open_req(struct cachefiles_req *req, void *private)
+{
+	struct cachefiles_object *object = req->object;
+	struct fscache_cookie *cookie = object->cookie;
+	struct fscache_volume *volume = object->volume->vcookie;
+	struct cachefiles_open *load = (void *)req->msg.data;
+	size_t volume_key_len, cookie_key_len;
+	void *volume_key, *cookie_key;
+	unsigned long flags = 0;
+
+	/* volume key is of string format */
+	volume_key_len = volume->key[0] + 1;
+	volume_key = volume->key + 1;
+
+	/* cookie key is of binary format */
+	cookie_key_len = cookie->key_len;
+	cookie_key = fscache_get_key(cookie);
+
+	if (object->cookie->advice & FSCACHE_ADV_WANT_CACHE_SIZE)
+		__set_bit(CACHEFILES_OPEN_WANT_CACHE_SIZE, &flags);
+
+	load->flags = flags;
+	load->volume_key_len = volume_key_len;
+	load->cookie_key_len = cookie_key_len;
+	memcpy(load->data, volume_key, volume_key_len);
+	memcpy(load->data + volume_key_len, cookie_key, cookie_key_len);
+
+	return 0;
+}
+
+int cachefiles_ondemand_init_object(struct cachefiles_object *object)
+{
+	struct fscache_cookie *cookie = object->cookie;
+	struct fscache_volume *volume = object->volume->vcookie;
+	size_t volume_key_len, cookie_key_len, data_len;
+
+	/*
+	 * Cachefiles first checks for the cache file under the root cache
+	 * directory. If the coherency check fails, it falls back to creating
+	 * a new tmpfile as the cache file. Reuse the previously created
+	 * anon_fd, if any.
+	 */
+	if (object->fd > 0)
+		return 0;
+
+	volume_key_len = volume->key[0] + 1;
+	cookie_key_len = cookie->key_len;
+	data_len = sizeof(struct cachefiles_open) +
+		   volume_key_len + cookie_key_len;
+
+	return cachefiles_ondemand_send_req(object,
+					    CACHEFILES_OP_OPEN, data_len,
+					    init_open_req, NULL);
+}
+
 #else
 static inline void cachefiles_ondemand_open(struct cachefiles_cache *cache) {}
 static inline void cachefiles_ondemand_release(struct cachefiles_cache *cache) {}
@@ -129,7 +492,6 @@ bool cachefiles_ondemand_daemon_bind(struct cachefiles_cache *cache, char *args)
 {
 	return false;
 }
-#endif
 
 static inline
 ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
@@ -137,6 +499,7 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
 {
 	return -EOPNOTSUPP;
 }
+#endif
 
 /*
  * Prepare a cache for caching.
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 3f791882fa3f..8450ebd77949 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -16,6 +16,7 @@
 #include <linux/cred.h>
 #include <linux/security.h>
 #include <linux/xarray.h>
+#include <linux/cachefiles.h>
 
 #define CACHEFILES_DIO_BLOCK_SIZE 4096
 
@@ -59,6 +60,9 @@ struct cachefiles_object {
 	enum cachefiles_content		content_info:8;	/* Info about content presence */
 	unsigned long			flags;
 #define CACHEFILES_OBJECT_USING_TMPFILE	0		/* Have an unlinked tmpfile */
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+	int				fd;		/* anonymous fd */
+#endif
 };
 
 /*
@@ -109,6 +113,15 @@ struct cachefiles_cache {
 #endif
 };
 
+struct cachefiles_req {
+	struct cachefiles_object *object;
+	struct completion done;
+	int error;
+	struct cachefiles_msg msg;
+};
+
+#define CACHEFILES_REQ_NEW	XA_MARK_1
+
 #include <trace/events/cachefiles.h>
 
 static inline
@@ -152,6 +165,17 @@ extern int cachefiles_has_space(struct cachefiles_cache *cache,
  */
 extern const struct file_operations cachefiles_daemon_fops;
 
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+extern int cachefiles_ondemand_init_object(struct cachefiles_object *object);
+
+#else
+static inline
+int cachefiles_ondemand_init_object(struct cachefiles_object *object)
+{
+	return 0;
+}
+#endif
+
 /*
  * error_inject.c
  */
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index f256c8aff7bb..22aba4c6a762 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -444,10 +444,9 @@ struct file *cachefiles_create_tmpfile(struct cachefiles_object *object)
 	struct dentry *fan = volume->fanout[(u8)object->cookie->key_hash];
 	struct file *file;
 	struct path path;
-	uint64_t ni_size = object->cookie->object_size;
+	uint64_t ni_size;
 	long ret;
 
-	ni_size = round_up(ni_size, CACHEFILES_DIO_BLOCK_SIZE);
 
 	cachefiles_begin_secure(cache, &saved_cred);
 
@@ -473,6 +472,15 @@ struct file *cachefiles_create_tmpfile(struct cachefiles_object *object)
 		goto out_dput;
 	}
 
+	ret = cachefiles_ondemand_init_object(object);
+	if (ret < 0) {
+		file = ERR_PTR(ret);
+		goto out_dput;
+	}
+
+	ni_size = object->cookie->object_size;
+	ni_size = round_up(ni_size, CACHEFILES_DIO_BLOCK_SIZE);
+
 	if (ni_size > 0) {
 		trace_cachefiles_trunc(object, d_backing_inode(path.dentry), 0, ni_size,
 				       cachefiles_trunc_expand_tmpfile);
@@ -573,6 +581,10 @@ static bool cachefiles_open_file(struct cachefiles_object *object,
 	}
 	_debug("file -> %pd positive", dentry);
 
+	ret = cachefiles_ondemand_init_object(object);
+	if (ret < 0)
+		goto error_fput;
+
 	ret = cachefiles_check_auxdata(object, file);
 	if (ret < 0)
 		goto check_failed;
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index d2430da8aa67..a330354f33ca 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -39,6 +39,7 @@ struct fscache_cookie;
 #define FSCACHE_ADV_SINGLE_CHUNK	0x01 /* The object is a single chunk of data */
 #define FSCACHE_ADV_WRITE_CACHE		0x00 /* Do cache if written to locally */
 #define FSCACHE_ADV_WRITE_NOCACHE	0x02 /* Don't cache if written to locally */
+#define FSCACHE_ADV_WANT_CACHE_SIZE	0x04 /* Retrieve cache size at runtime */
 
 #define FSCACHE_INVAL_DIO_WRITE		0x01 /* Invalidate due to DIO write */
 
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index c6f5aa74db89..371e5816e98c 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -31,6 +31,8 @@ enum cachefiles_obj_ref_trace {
 	cachefiles_obj_see_lookup_failed,
 	cachefiles_obj_see_withdraw_cookie,
 	cachefiles_obj_see_withdrawal,
+	cachefiles_obj_get_ondemand_fd,
+	cachefiles_obj_put_ondemand_fd,
 };
 
 enum fscache_why_object_killed {
diff --git a/include/uapi/linux/cachefiles.h b/include/uapi/linux/cachefiles.h
new file mode 100644
index 000000000000..5ea7285863f1
--- /dev/null
+++ b/include/uapi/linux/cachefiles.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _LINUX_CACHEFILES_H
+#define _LINUX_CACHEFILES_H
+
+#include <linux/types.h>
+
+#define CACHEFILES_MSG_MAX_SIZE	512
+
+enum cachefiles_opcode {
+	CACHEFILES_OP_OPEN,
+};
+
+/*
+ * @id		identifying position of this message in the radix tree
+ * @opcode	message type, CACHEFILES_OP_*
+ * @len		message length, including message header and following data
+ * @data	message type specific payload
+ */
+struct cachefiles_msg {
+	__u32 id;
+	__u32 opcode;
+	__u32 len;
+	__u8  data[];
+};
+
+struct cachefiles_open {
+	__u32 volume_key_len;
+	__u32 cookie_key_len;
+	__u32 fd;
+	__u32 flags;
+	/* following data contains volume_key and cookie_key in sequence */
+	__u8  data[];
+};
+
+enum cachefiles_open_flags {
+	CACHEFILES_OPEN_WANT_CACHE_SIZE,
+};
+
+#endif
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v5 05/22] cachefiles: notify user daemon when withdrawing cookie
  2022-03-16 13:17 ` [PATCH v5 00/22] fscache, erofs: " Jeffle Xu
@ 2022-03-16 13:17   ` Jeffle Xu
  -1 siblings, 0 replies; 102+ messages in thread
From: Jeffle Xu @ 2022-03-16 13:17 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend

Notify the user daemon that the cookie is about to be withdrawn,
providing a hint that the associated anon_fd can be closed. The anon_fd
carried in the CLOSE request is the same as that in the previous OPEN
request.

Note that this is only a hint. The user daemon can close the anon_fd
when it receives the CLOSE request, in which case it will receive
another anon_fd if the cookie gets looked up again. Alternatively, it
can ignore the CLOSE request and keep writing data into the anon_fd.
Even so, the next time the cookie gets looked up, the user daemon will
still receive another anon_fd.
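
For illustration only (not part of this patch): a minimal user-space
sketch of how a daemon might dispatch OPEN and CLOSE messages. The
struct layouts mirror include/uapi/linux/cachefiles.h as of this series,
with uint32_t standing in for __u32. handle_msg(), tracked[] and MAX_FDS
are hypothetical names. The "cinit <id>" reply follows the command
format documented in patch 4, and the sketch assumes the
CACHEFILES_OPEN_WANT_CACHE_SIZE flag is not set.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local mirrors of the uapi layouts, minus the flexible data[] members. */
struct msg_hdr    { uint32_t id; uint32_t opcode; uint32_t len; };
struct open_hdr   { uint32_t volume_key_len; uint32_t cookie_key_len;
		    uint32_t fd; uint32_t flags; };
struct close_body { uint32_t fd; };

enum { OP_OPEN = 0, OP_CLOSE = 1 };	/* CACHEFILES_OP_OPEN / _CLOSE */

/* Hypothetical bookkeeping: tracked[fd] != 0 means the anon_fd is in use. */
#define MAX_FDS 64
static unsigned char tracked[MAX_FDS];

/*
 * Parse one message of n bytes, as returned by read() on the daemon fd.
 * On success, return 0 and store the reply command in @reply (an empty
 * string when no reply is due). Return -1 on a malformed message.
 */
static int handle_msg(const unsigned char *buf, size_t n,
		      char *reply, size_t reply_sz)
{
	struct msg_hdr hdr;
	const unsigned char *data;
	size_t data_len;

	if (n < sizeof(hdr) || reply_sz == 0)
		return -1;
	memcpy(&hdr, buf, sizeof(hdr));
	if (hdr.len != n)	/* len covers the header plus the payload */
		return -1;

	data = buf + sizeof(hdr);
	data_len = n - sizeof(hdr);
	reply[0] = '\0';

	switch (hdr.opcode) {
	case OP_OPEN: {
		struct open_hdr o;
		int ret;

		if (data_len < sizeof(o))
			return -1;
		memcpy(&o, data, sizeof(o));
		/* volume_key and cookie_key follow the fixed part */
		if (data_len - sizeof(o) <
		    (uint64_t)o.volume_key_len + o.cookie_key_len)
			return -1;
		/* WANT_CACHE_SIZE (bit 0) is not handled by this sketch */
		if (o.fd >= MAX_FDS || (o.flags & 1u))
			return -1;
		tracked[o.fd] = 1;
		ret = snprintf(reply, reply_sz, "cinit %u", (unsigned)hdr.id);
		return (ret < 0 || (size_t)ret >= reply_sz) ? -1 : 0;
	}
	case OP_CLOSE: {
		struct close_body c;

		if (data_len < sizeof(c))
			return -1;
		memcpy(&c, data, sizeof(c));
		if (c.fd >= MAX_FDS)
			return -1;
		/* Only a hint: a real daemon may close(c.fd) here, or not. */
		tracked[c.fd] = 0;
		return 0;
	}
	default:
		return -1;
	}
}
```

A CLOSE message produces no reply here, matching the kernel side, which
completes the request as soon as the daemon has read it.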

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/cachefiles/daemon.c          | 27 +++++++++++++++++++++++++++
 fs/cachefiles/interface.c       |  2 ++
 fs/cachefiles/internal.h        |  4 ++++
 include/uapi/linux/cachefiles.h |  5 +++++
 4 files changed, 38 insertions(+)

diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 3c3a461f8cd8..2ecfdf194206 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -338,6 +338,12 @@ static ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
 		goto err_put_fd;
 	}
 
+	/* CLOSE request doesn't expect a reply */
+	if (msg->opcode == CACHEFILES_OP_CLOSE) {
+		xa_erase(&cache->reqs, id);
+		complete(&req->done);
+	}
+
 	return n;
 
 err_put_fd:
@@ -452,6 +458,19 @@ static int init_open_req(struct cachefiles_req *req, void *private)
 	return 0;
 }
 
+static int init_close_req(struct cachefiles_req *req, void *private)
+{
+	struct cachefiles_object *object = req->object;
+	struct cachefiles_close *load = (void *)req->msg.data;
+	int fd = object->fd;
+
+	if (WARN_ON_ONCE(fd == -1))
+		return -EIO;
+
+	load->fd = fd;
+	return 0;
+}
+
 int cachefiles_ondemand_init_object(struct cachefiles_object *object)
 {
 	struct fscache_cookie *cookie = object->cookie;
@@ -477,6 +496,14 @@ int cachefiles_ondemand_init_object(struct cachefiles_object *object)
 					    init_open_req, NULL);
 }
 
+void cachefiles_ondemand_cleanup_object(struct cachefiles_object *object)
+{
+	cachefiles_ondemand_send_req(object,
+				     CACHEFILES_OP_CLOSE,
+				     sizeof(struct cachefiles_close),
+				     init_close_req, NULL);
+}
+
 #else
 static inline void cachefiles_ondemand_open(struct cachefiles_cache *cache) {}
 static inline void cachefiles_ondemand_release(struct cachefiles_cache *cache) {}
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index ae93cee9d25d..c5b8fefd4ccc 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -322,6 +322,8 @@ static void cachefiles_commit_object(struct cachefiles_object *object,
 static void cachefiles_clean_up_object(struct cachefiles_object *object,
 				       struct cachefiles_cache *cache)
 {
+	cachefiles_ondemand_cleanup_object(object);
+
 	if (test_bit(FSCACHE_COOKIE_RETIRED, &object->cookie->flags)) {
 		if (!test_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags)) {
 			cachefiles_see_object(object, cachefiles_obj_see_clean_delete);
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 8450ebd77949..eaac9fae74eb 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -167,6 +167,7 @@ extern const struct file_operations cachefiles_daemon_fops;
 
 #ifdef CONFIG_CACHEFILES_ONDEMAND
 extern int cachefiles_ondemand_init_object(struct cachefiles_object *object);
+extern void cachefiles_ondemand_cleanup_object(struct cachefiles_object *object);
 
 #else
 static inline
@@ -174,6 +175,9 @@ int cachefiles_ondemand_init_object(struct cachefiles_object *object)
 {
 	return 0;
 }
+
+static inline
+void cachefiles_ondemand_cleanup_object(struct cachefiles_object *object) {}
 #endif
 
 /*
diff --git a/include/uapi/linux/cachefiles.h b/include/uapi/linux/cachefiles.h
index 5ea7285863f1..47e53043cfad 100644
--- a/include/uapi/linux/cachefiles.h
+++ b/include/uapi/linux/cachefiles.h
@@ -8,6 +8,7 @@
 
 enum cachefiles_opcode {
 	CACHEFILES_OP_OPEN,
+	CACHEFILES_OP_CLOSE,
 };
 
 /*
@@ -36,4 +37,8 @@ enum cachefiles_open_flags {
 	CACHEFILES_OPEN_WANT_CACHE_SIZE,
 };
 
+struct cachefiles_close {
+	__u32 fd;
+};
+
 #endif
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v5 05/22] cachefiles: notify user daemon when withdrawing cookie
@ 2022-03-16 13:17   ` Jeffle Xu
  0 siblings, 0 replies; 102+ messages in thread
From: Jeffle Xu @ 2022-03-16 13:17 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: gregkh, willy, linux-kernel, joseph.qi, linux-fsdevel,
	luodaowen.backend, gerry, torvalds

Notify user daemon that cookie is going to be withdrawed, providing a
hint that the associated anon_fd can be closed. The anon_fd attached in
the CLOSE request shall be same with that in the previous OPEN request.

Be noted that this is only a hint. User daemon can close the anon_fd
when receiving the CLOSE request, then it will receive another anon_fd
if the cookie gets looked up. Or it can also ignore the CLOSE request,
and keep writing data into the anon_fd. However the next time cookie
gets looked up, the user daemon will still receive another anon_fd.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/cachefiles/daemon.c          | 27 +++++++++++++++++++++++++++
 fs/cachefiles/interface.c       |  2 ++
 fs/cachefiles/internal.h        |  4 ++++
 include/uapi/linux/cachefiles.h |  5 +++++
 4 files changed, 38 insertions(+)

diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 3c3a461f8cd8..2ecfdf194206 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -338,6 +338,12 @@ static ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
 		goto err_put_fd;
 	}
 
+	/* CLOSE request doesn't look forward a reply */
+	if (msg->opcode == CACHEFILES_OP_CLOSE) {
+		xa_erase(&cache->reqs, id);
+		complete(&req->done);
+	}
+
 	return n;
 
 err_put_fd:
@@ -452,6 +458,19 @@ static int init_open_req(struct cachefiles_req *req, void *private)
 	return 0;
 }
 
+static int init_close_req(struct cachefiles_req *req, void *private)
+{
+	struct cachefiles_object *object = req->object;
+	struct cachefiles_close *load = (void *)req->msg.data;
+	int fd = object->fd;
+
+	if (WARN_ON_ONCE(fd == -1))
+		return -EIO;
+
+	load->fd = fd;
+	return 0;
+}
+
 int cachefiles_ondemand_init_object(struct cachefiles_object *object)
 {
 	struct fscache_cookie *cookie = object->cookie;
@@ -477,6 +496,14 @@ int cachefiles_ondemand_init_object(struct cachefiles_object *object)
 					    init_open_req, NULL);
 }
 
+void cachefiles_ondemand_cleanup_object(struct cachefiles_object *object)
+{
+	cachefiles_ondemand_send_req(object,
+				     CACHEFILES_OP_CLOSE,
+				     sizeof(struct cachefiles_close),
+				     init_close_req, NULL);
+}
+
 #else
 static inline void cachefiles_ondemand_open(struct cachefiles_cache *cache) {}
 static inline void cachefiles_ondemand_release(struct cachefiles_cache *cache) {}
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index ae93cee9d25d..c5b8fefd4ccc 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -322,6 +322,8 @@ static void cachefiles_commit_object(struct cachefiles_object *object,
 static void cachefiles_clean_up_object(struct cachefiles_object *object,
 				       struct cachefiles_cache *cache)
 {
+	cachefiles_ondemand_cleanup_object(object);
+
 	if (test_bit(FSCACHE_COOKIE_RETIRED, &object->cookie->flags)) {
 		if (!test_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags)) {
 			cachefiles_see_object(object, cachefiles_obj_see_clean_delete);
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 8450ebd77949..eaac9fae74eb 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -167,6 +167,7 @@ extern const struct file_operations cachefiles_daemon_fops;
 
 #ifdef CONFIG_CACHEFILES_ONDEMAND
 extern int cachefiles_ondemand_init_object(struct cachefiles_object *object);
+extern void cachefiles_ondemand_cleanup_object(struct cachefiles_object *object);
 
 #else
 static inline
@@ -174,6 +175,9 @@ int cachefiles_ondemand_init_object(struct cachefiles_object *object)
 {
 	return 0;
 }
+
+static inline
+void cachefiles_ondemand_cleanup_object(struct cachefiles_object *object) {}
 #endif
 
 /*
diff --git a/include/uapi/linux/cachefiles.h b/include/uapi/linux/cachefiles.h
index 5ea7285863f1..47e53043cfad 100644
--- a/include/uapi/linux/cachefiles.h
+++ b/include/uapi/linux/cachefiles.h
@@ -8,6 +8,7 @@
 
 enum cachefiles_opcode {
 	CACHEFILES_OP_OPEN,
+	CACHEFILES_OP_CLOSE,
 };
 
 /*
@@ -36,4 +37,8 @@ enum cachefiles_open_flags {
 	CACHEFILES_OPEN_WANT_CACHE_SIZE,
 };
 
+struct cachefiles_close {
+	__u32 fd;
+};
+
 #endif
-- 
2.27.0



* [PATCH v5 06/22] cachefiles: implement on-demand read
  2022-03-16 13:17 ` [PATCH v5 00/22] fscache, erofs: " Jeffle Xu
@ 2022-03-16 13:17   ` Jeffle Xu
From: Jeffle Xu @ 2022-03-16 13:17 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend

Implement the data plane of on-demand read mode.

A new NETFS_READ_HOLE_ONDEMAND flag is introduced to indicate that an
on-demand read should be done when a cache miss is encountered. In this
case, the read routine sends a READ request to the user daemon, along
with the anonymous fd and the file range to be read. The user daemon is
then responsible for fetching the data in the given file range and
writing it into the cache file through the given anonymous fd.
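
The daemon side of this exchange can be sketched roughly as below. This is
a minimal userspace illustration, not code from this series: the local
struct names (msg_hdr, read_payload), the OP_* constants and both helper
functions are hypothetical; only the field layouts, the opcode order and
the "cread <id>" reply format are taken from the patches.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local mirrors of the layouts quoted in this series (hypothetical names;
 * a real daemon would include the uapi header instead). */
struct msg_hdr      { uint32_t id, opcode, len; };        /* cachefiles_msg, minus data[] */
struct read_payload { uint64_t off, len; uint32_t fd; };  /* cachefiles_read */
enum { OP_OPEN, OP_CLOSE, OP_READ };                      /* order of enum cachefiles_opcode */

/* Decode one READ request of n bytes from buf; returns 0, or -1 if malformed. */
int parse_read_req(const unsigned char *buf, size_t n,
		   struct msg_hdr *hdr, struct read_payload *rd)
{
	assert(buf && hdr && rd);
	if (n < sizeof(*hdr))
		return -1;
	memcpy(hdr, buf, sizeof(*hdr));	/* memcpy sidesteps unaligned access */
	if (hdr->opcode != OP_READ || hdr->len > n ||
	    hdr->len < sizeof(*hdr) + sizeof(*rd))
		return -1;
	memcpy(rd, buf + sizeof(*hdr), sizeof(*rd));
	return 0;
}

/* Build the "cread <id>" reply; returns its length, or -1 if out is too small. */
int format_cread(char *out, size_t cap, uint32_t id)
{
	int n = snprintf(out, cap, "cread %u", (unsigned int)id);

	return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

A real daemon would presumably read() one request from the devnode, fetch
the range, pwrite() the bytes to rd.fd at rd.off, and then write() the
reply string back to the devnode.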

After sending the READ request, the read routine blocks until the READ
request has been handled by the user daemon. It then retries the read on
the same file range. If a cache miss is encountered again on the same
file range, the read fails.
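
The retry-once behaviour described above can be modelled in plain
userspace C as follows. This is only a sketch of the control flow in the
io.c hunk: struct model_cache, the HOLE_* names and both functions are
hypothetical stand-ins, and returning -ENODATA for the failure case is an
assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum hole_policy { HOLE_FAIL, HOLE_ONDEMAND };

struct model_cache {
	bool has_data;		/* requested range is present in the cache file */
	bool daemon_fills;	/* simulated daemon actually writes the range */
	int  requests_sent;	/* number of READ requests issued so far */
};

/* Stand-in for cachefiles_ondemand_read(): "waits" for the daemon's reply. */
int model_ondemand_read(struct model_cache *c)
{
	c->requests_sent++;
	if (c->daemon_fills)
		c->has_data = true;
	return 0;	/* the daemon replied, whether or not it wrote data */
}

/* Returns 0 if the range can be read, -ENODATA otherwise. */
int model_read(struct model_cache *c, enum hole_policy policy)
{
	assert(c != NULL);
retry:
	if (c->has_data)
		return 0;
	if (policy == HOLE_FAIL)
		return -ENODATA;
	if (model_ondemand_read(c) == 0) {
		/* downgrade so that a second miss fails instead of looping */
		policy = HOLE_FAIL;
		goto retry;
	}
	return -ENODATA;
}
```

In this model a daemon that replies without writing the range causes
exactly one request and then a failed read, which matches the "no progress
achieved" comment in the hunk.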

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/cachefiles/daemon.c          | 65 +++++++++++++++++++++++++++++++++
 fs/cachefiles/internal.h        |  9 +++++
 fs/cachefiles/io.c              | 11 ++++++
 include/linux/netfs.h           |  1 +
 include/uapi/linux/cachefiles.h |  7 ++++
 5 files changed, 93 insertions(+)

diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 2ecfdf194206..29af8943f270 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -47,6 +47,7 @@ static int cachefiles_daemon_bind(struct cachefiles_cache *, char *);
 static void cachefiles_daemon_unbind(struct cachefiles_cache *);
 #ifdef CONFIG_CACHEFILES_ONDEMAND
 static int cachefiles_ondemand_cinit(struct cachefiles_cache *, char *);
+static int cachefiles_ondemand_cread(struct cachefiles_cache *, char *);
 #endif
 
 static unsigned long cachefiles_open;
@@ -82,6 +83,7 @@ static const struct cachefiles_daemon_cmd cachefiles_daemon_cmds[] = {
 	{ "tag",	cachefiles_daemon_tag		},
 #ifdef CONFIG_CACHEFILES_ONDEMAND
 	{ "cinit",	cachefiles_ondemand_cinit	},
+	{ "cread",	cachefiles_ondemand_cread	},
 #endif
 	{ "",		NULL				}
 };
@@ -264,6 +266,36 @@ static int cachefiles_ondemand_cinit(struct cachefiles_cache *cache, char *args)
 	return ret;
 }
 
+/*
+ * Read request completion
+ * - command: "cread <id>"
+ */
+static int cachefiles_ondemand_cread(struct cachefiles_cache *cache, char *args)
+{
+	struct cachefiles_req *req;
+	unsigned long id;
+	int ret;
+
+	if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags))
+		return -EOPNOTSUPP;
+
+	if (!*args) {
+		pr_err("Empty id specified\n");
+		return -EINVAL;
+	}
+
+	ret = kstrtoul(args, 0, &id);
+	if (ret)
+		return ret;
+
+	req = xa_erase(&cache->reqs, id);
+	if (!req)
+		return -EINVAL;
+
+	complete(&req->done);
+	return 0;
+}
+
 static int cachefiles_ondemand_get_fd(struct cachefiles_req *req)
 {
 	struct cachefiles_object *object;
@@ -471,6 +503,28 @@ static int init_close_req(struct cachefiles_req *req, void *private)
 	return 0;
 }
 
+struct cachefiles_read_ctx {
+	loff_t off;
+	size_t len;
+};
+
+static int init_read_req(struct cachefiles_req *req, void *private)
+{
+	struct cachefiles_object *object = req->object;
+	struct cachefiles_read *load = (void *)&req->msg.data;
+	struct cachefiles_read_ctx *read_ctx = private;
+	int fd = object->fd;
+
+	/* Stop enqueuing requests when daemon closes anon_fd prematurely. */
+	if (WARN_ON_ONCE(fd == -1))
+		return -EIO;
+
+	load->off = read_ctx->off;
+	load->len = read_ctx->len;
+	load->fd  = fd;
+	return 0;
+}
+
 int cachefiles_ondemand_init_object(struct cachefiles_object *object)
 {
 	struct fscache_cookie *cookie = object->cookie;
@@ -504,6 +558,17 @@ void cachefiles_ondemand_cleanup_object(struct cachefiles_object *object)
 				     init_close_req, NULL);
 }
 
+int cachefiles_ondemand_read(struct cachefiles_object *object,
+			     loff_t pos, size_t len)
+{
+	struct cachefiles_read_ctx read_ctx = {pos, len};
+
+	return cachefiles_ondemand_send_req(object,
+					    CACHEFILES_OP_READ,
+					    sizeof(struct cachefiles_read),
+					    init_read_req, &read_ctx);
+}
+
 #else
 static inline void cachefiles_ondemand_open(struct cachefiles_cache *cache) {}
 static inline void cachefiles_ondemand_release(struct cachefiles_cache *cache) {}
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index eaac9fae74eb..770b37e23bcc 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -168,6 +168,8 @@ extern const struct file_operations cachefiles_daemon_fops;
 #ifdef CONFIG_CACHEFILES_ONDEMAND
 extern int cachefiles_ondemand_init_object(struct cachefiles_object *object);
 extern void cachefiles_ondemand_cleanup_object(struct cachefiles_object *object);
+extern int cachefiles_ondemand_read(struct cachefiles_object *object,
+				    loff_t pos, size_t len);
 
 #else
 static inline
@@ -178,6 +180,13 @@ int cachefiles_ondemand_init_object(struct cachefiles_object *object)
 
 static inline
 void cachefiles_ondemand_cleanup_object(struct cachefiles_object *object) {}
+
+static inline
+int cachefiles_ondemand_read(struct cachefiles_object *object,
+			     loff_t pos, size_t len)
+{
+	return -EOPNOTSUPP;
+}
 #endif
 
 /*
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 8dbc1eb254a3..ee1283ba7a2c 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -95,6 +95,7 @@ static int cachefiles_read(struct netfs_cache_resources *cres,
 	       file, file_inode(file)->i_ino, start_pos, len,
 	       i_size_read(file_inode(file)));
 
+retry:
 	/* If the caller asked us to seek for data before doing the read, then
 	 * we should do that now.  If we find a gap, we fill it with zeros.
 	 */
@@ -119,6 +120,16 @@ static int cachefiles_read(struct netfs_cache_resources *cres,
 			if (read_hole == NETFS_READ_HOLE_FAIL)
 				goto presubmission_error;
 
+			if (read_hole == NETFS_READ_HOLE_ONDEMAND) {
+				if (!cachefiles_ondemand_read(object, off, len)) {
+					/* fail the read if no progress achieved */
+					read_hole = NETFS_READ_HOLE_FAIL;
+					goto retry;
+				}
+
+				goto presubmission_error;
+			}
+
 			iov_iter_zero(len, iter);
 			skipped = len;
 			ret = 0;
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 614f22213e21..2a9c50d3a928 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -203,6 +203,7 @@ enum netfs_read_from_hole {
 	NETFS_READ_HOLE_IGNORE,
 	NETFS_READ_HOLE_CLEAR,
 	NETFS_READ_HOLE_FAIL,
+	NETFS_READ_HOLE_ONDEMAND,
 };
 
 /*
diff --git a/include/uapi/linux/cachefiles.h b/include/uapi/linux/cachefiles.h
index 47e53043cfad..48a0dbac9a92 100644
--- a/include/uapi/linux/cachefiles.h
+++ b/include/uapi/linux/cachefiles.h
@@ -9,6 +9,7 @@
 enum cachefiles_opcode {
 	CACHEFILES_OP_OPEN,
 	CACHEFILES_OP_CLOSE,
+	CACHEFILES_OP_READ,
 };
 
 /*
@@ -41,4 +42,10 @@ struct cachefiles_close {
 	__u32 fd;
 };
 
+struct cachefiles_read {
+	__u64 off;
+	__u64 len;
+	__u32 fd;
+};
+
 #endif
-- 
2.27.0



* [PATCH v5 07/22] cachefiles: document on-demand read mode
  2022-03-16 13:17 ` [PATCH v5 00/22] fscache, erofs: " Jeffle Xu
@ 2022-03-16 13:17   ` Jeffle Xu
From: Jeffle Xu @ 2022-03-16 13:17 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend

Document the new user interface introduced by on-demand read mode.
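
As a rough illustration of the OPEN handshake documented by this patch, a
daemon could decode the payload and build its "cinit" reply along the
lines below. This is a sketch under stated assumptions: open_fixed,
open_view and both functions are hypothetical names, the parser only
checks the two length fields against the buffer, and the caller is
expected to derive want_size from the flags field using the real uapi
header (the flag's numeric value is not assumed here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fixed part of struct cachefiles_open as documented (hypothetical local name). */
struct open_fixed { uint32_t volume_key_len, cookie_key_len, fd, flags; };

/* Decoded view; the key pointers alias the caller's buffer. */
struct open_view {
	uint32_t fd, flags;
	const unsigned char *volume_key, *cookie_key;
	uint32_t volume_key_len, cookie_key_len;
};

/* Split an OPEN payload of n bytes into its parts; returns 0 or -1. */
int parse_open_payload(const unsigned char *data, size_t n, struct open_view *v)
{
	struct open_fixed f;
	size_t avail;

	assert(data && v);
	if (n < sizeof(f))
		return -1;
	memcpy(&f, data, sizeof(f));
	avail = n - sizeof(f);
	if (f.volume_key_len > avail ||
	    f.cookie_key_len > avail - f.volume_key_len)
		return -1;
	v->fd = f.fd;
	v->flags = f.flags;
	v->volume_key = data + sizeof(f);	/* volume_key comes first */
	v->volume_key_len = f.volume_key_len;
	v->cookie_key = v->volume_key + f.volume_key_len;
	v->cookie_key_len = f.cookie_key_len;
	return 0;
}

/* Build "cinit <id>" or "cinit <id>,<cache_size>"; returns length or -1. */
int format_cinit(char *out, size_t cap, uint32_t id, bool want_size,
		 unsigned long long cache_size)
{
	int n = want_size
		? snprintf(out, cap, "cinit %u,%llu", (unsigned int)id, cache_size)
		: snprintf(out, cap, "cinit %u", (unsigned int)id);

	return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

The keys are returned as pointer/length pairs rather than C strings
because the documentation allows cookie_key to be binary.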

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 .../filesystems/caching/cachefiles.rst        | 176 ++++++++++++++++++
 1 file changed, 176 insertions(+)

diff --git a/Documentation/filesystems/caching/cachefiles.rst b/Documentation/filesystems/caching/cachefiles.rst
index 8bf396b76359..c8286c901eae 100644
--- a/Documentation/filesystems/caching/cachefiles.rst
+++ b/Documentation/filesystems/caching/cachefiles.rst
@@ -28,6 +28,8 @@ Cache on Already Mounted Filesystem
 
  (*) Debugging.
 
+ (*) On-demand Read.
+
 
 
 Overview
@@ -482,3 +484,177 @@ the control file.  For example::
 	echo $((1|4|8)) >/sys/module/cachefiles/parameters/debug
 
 will turn on all function entry debugging.
+
+
+On-demand Read
+==============
+
+When working in original mode, cachefiles mainly serves as a local cache for
+a remote networking fs, while in on-demand read mode, cachefiles can boost
+scenarios where on-demand read semantics are needed, e.g. container image
+distribution.
+
+The essential difference between these two modes is that, in original mode,
+on a cache miss, netfs itself fetches data from the remote and then writes the
+fetched data into the cache file, while in on-demand read mode, a user daemon
+is responsible for fetching data and then writing it to the cache file.
+
+``CONFIG_CACHEFILES_ONDEMAND`` shall be enabled to support on-demand read mode.
+
+
+Protocol Communication
+----------------------
+
+The on-demand read mode relies on a simple protocol used for communication
+between kernel and user daemon. The model is like::
+
+	kernel --[request]--> user daemon --[reply]--> kernel
+
+The cachefiles kernel module sends requests to the user daemon when needed.
+The user daemon needs to poll on the devnode ('/dev/cachefiles') to check
+whether there is a pending request to be processed. A POLLIN event is returned
+when a request is pending.
+
+The user daemon then reads the devnode to fetch one request and processes it
+accordingly. It is worth noting that each read only gets one request. When it
+has finished processing the request, the user daemon writes the reply to the
+devnode.
+
+Each request starts with a message header like::
+
+	struct cachefiles_msg {
+		__u32 id;
+		__u32 opcode;
+		__u32 len;
+		__u8  data[];
+	};
+
+	* ``id`` identifies the position of this request in an internal xarray
+	  managing all pending requests.
+
+	* ``opcode`` identifies the type of this request.
+
+	* ``data`` identifies the payload of this request.
+
+	* ``len`` identifies the whole length of this request, including the
+	  header and the following type-specific payload.
+
+
+Turn on On-demand Mode
+----------------------
+
+An optional parameter is added to the "bind" command::
+
+	bind [ondemand]
+
+Without an argument, the "bind" command defaults to the original mode.
+When it is given the "ondemand" argument, i.e. "bind ondemand", on-demand read
+mode is enabled.
+
+
+OPEN Request
+------------
+
+When netfs opens a cache file for the first time, a request with the
+CACHEFILES_OP_OPEN opcode, a.k.a. an OPEN request, is sent to the user daemon.
+The payload format is like::
+
+	struct cachefiles_open {
+		__u32 volume_key_len;
+		__u32 cookie_key_len;
+		__u32 fd;
+		__u32 flags;
+		__u8  data[];
+	};
+
+	* ``data`` contains volume_key and cookie_key in sequence.
+
+	* ``volume_key_len`` identifies the length of the volume key of the
+	  cache file, in bytes. volume_key is a string terminated by a
+	  trailing '\0'.
+
+	* ``cookie_key_len`` identifies the length of the cookie key of the
+	  cache file, in bytes. The format of cookie_key is netfs specific. It
+	  can be of binary format.
+
+	* ``fd`` identifies the anonymous fd of the cache file, with which user
+	  daemon can perform write/llseek file operations on the cache file.
+
+
+An OPEN request contains the (volume_key, cookie_key, anon_fd) triple for the
+corresponding cache file. With this triple, the user daemon can fetch and write
+data into the cache file in the background, even before the kernel has
+triggered a cache miss. The user daemon identifies the requested cache file by
+the given (volume_key, cookie_key) and writes the fetched data into the cache
+file through the given anon_fd.
+
+After recording the (volume_key, cookie_key, anon_fd) triple, the user daemon
+shall reply with the "cinit" (complete init) command::
+
+	cinit <id>
+
+	* ``id`` is exactly the id field of the previous OPEN request.
+
+
+In addition, the CACHEFILES_OPEN_WANT_CACHE_SIZE flag may be set in the flags
+field of an OPEN request. This flag is used when one cache file can contain
+multiple netfs files, e.g. for the purpose of deduplication. In this case,
+netfs itself does not know the cache file size, so the user daemon needs to
+provide the cache file size as a hint.
+
+Thus, when receiving an OPEN request with the CACHEFILES_OPEN_WANT_CACHE_SIZE
+flag set, the user daemon must reply with the cache file size::
+
+	cinit <id>,<cache_size>
+
+	* ``id`` is exactly the id field of the previous OPEN request.
+
+	* ``cache_size`` identifies the size of the cache file.
+
+
+CLOSE Request
+-------------
+When a cookie is withdrawn, a request with the CACHEFILES_OP_CLOSE opcode,
+a.k.a. a CLOSE request, is sent to the user daemon. It notifies the daemon to
+close the attached anon_fd. The payload format is like::
+
+	struct cachefiles_close {
+		__u32 fd;
+	};
+
+	* ``fd`` identifies the anon_fd to be closed, which is the same as the
+	  one in the OPEN request.
+
+
+READ Request
+------------
+
+When on-demand read mode is turned on and a cache miss is encountered, the
+kernel sends a request with the CACHEFILES_OP_READ opcode, a.k.a. a READ
+request, to the user daemon. It notifies the user daemon to fetch data in the
+requested file range. The payload format is like::
+
+	struct cachefiles_read {
+		__u64 off;
+		__u64 len;
+		__u32 fd;
+	};
+
+	* ``off`` identifies the starting offset of the requested file range.
+
+	* ``len`` identifies the length of the requested file range.
+
+	* ``fd`` identifies the anonymous fd of the requested cache file. It is
+	  guaranteed to be the same as the fd field in the previous OPEN
+	  request.
+
+When receiving a READ request, the user daemon needs to fetch the data of the
+requested file range and then write it into the cache file through the given
+anonymous fd.
+
+When it has finished processing the READ request, the user daemon needs to
+reply with the "cread" (complete read) command::
+
+	cread <id>
+
+	* ``id`` is exactly the id field of the previous READ request.
-- 
2.27.0



* [PATCH v5 08/22] erofs: use meta buffers for erofs_read_superblock()
  2022-03-16 13:17 ` [PATCH v5 00/22] fscache, erofs: " Jeffle Xu
@ 2022-03-16 13:17   ` Jeffle Xu
From: Jeffle Xu @ 2022-03-16 13:17 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend

The only change is that meta buffers read the cache page without the
__GFP_FS flag, which should not matter.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
---
 fs/erofs/super.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 915eefe0d7e2..12755217631f 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -281,21 +281,19 @@ static int erofs_init_devices(struct super_block *sb,
 static int erofs_read_superblock(struct super_block *sb)
 {
 	struct erofs_sb_info *sbi;
-	struct page *page;
+	struct erofs_buf buf = __EROFS_BUF_INITIALIZER;
 	struct erofs_super_block *dsb;
 	unsigned int blkszbits;
 	void *data;
 	int ret;
 
-	page = read_mapping_page(sb->s_bdev->bd_inode->i_mapping, 0, NULL);
-	if (IS_ERR(page)) {
+	data = erofs_read_metabuf(&buf, sb, 0, EROFS_KMAP);
+	if (IS_ERR(data)) {
 		erofs_err(sb, "cannot read erofs superblock");
-		return PTR_ERR(page);
+		return PTR_ERR(data);
 	}
 
 	sbi = EROFS_SB(sb);
-
-	data = kmap(page);
 	dsb = (struct erofs_super_block *)(data + EROFS_SUPER_OFFSET);
 
 	ret = -EINVAL;
@@ -365,8 +363,7 @@ static int erofs_read_superblock(struct super_block *sb)
 	if (erofs_sb_has_ztailpacking(sbi))
 		erofs_info(sb, "EXPERIMENTAL compressed inline data feature in use. Use at your own risk!");
 out:
-	kunmap(page);
-	put_page(page);
+	erofs_put_metabuf(&buf);
 	return ret;
 }
 
-- 
2.27.0



* [PATCH v5 09/22] erofs: make erofs_map_blocks() generally available
  2022-03-16 13:17 ` [PATCH v5 00/22] fscache, erofs: " Jeffle Xu
@ 2022-03-16 13:17   ` Jeffle Xu
  -1 siblings, 0 replies; 102+ messages in thread
From: Jeffle Xu @ 2022-03-16 13:17 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend

... so that it can be used by fs/erofs/fscache.c, which is introduced
later in this series.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/erofs/data.c     | 4 ++--
 fs/erofs/internal.h | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 226a57c57ee6..6e2a28242453 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -104,8 +104,8 @@ static int erofs_map_blocks_flatmode(struct inode *inode,
 	return 0;
 }
 
-static int erofs_map_blocks(struct inode *inode,
-			    struct erofs_map_blocks *map, int flags)
+int erofs_map_blocks(struct inode *inode,
+		     struct erofs_map_blocks *map, int flags)
 {
 	struct super_block *sb = inode->i_sb;
 	struct erofs_inode *vi = EROFS_I(inode);
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 5aa2cf2c2f80..e424293f47a2 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -484,6 +484,8 @@ void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
 int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *dev);
 int erofs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 		 u64 start, u64 len);
+int erofs_map_blocks(struct inode *inode,
+		     struct erofs_map_blocks *map, int flags);
 
 /* inode.c */
 static inline unsigned long erofs_inode_hash(erofs_nid_t nid)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v5 10/22] erofs: add mode checking helper
  2022-03-16 13:17 ` [PATCH v5 00/22] fscache, erofs: " Jeffle Xu
@ 2022-03-16 13:17   ` Jeffle Xu
  -1 siblings, 0 replies; 102+ messages in thread
From: Jeffle Xu @ 2022-03-16 13:17 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend

Until now erofs has been a purely blockdev-based filesystem. In other
use cases (e.g. container images), erofs needs to run on top of files.

This patch set introduces a new nodev mode, in which erofs can be
mounted from a bootstrap blob file containing a complete erofs image.

Add a helper that checks which mode erofs is working in.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
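For readers following along, the mode check can be modelled in userspace
as below. This is only a sketch: the model_* types and the
model_read_path() caller are hypothetical stand-ins, not kernel code.
The only part taken from the patch is the test itself, i.e. "bdev mode"
means the superblock has a block device attached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the kernel structures. */
struct model_block_device {
	int unused;
};

struct model_super_block {
	struct model_block_device *s_bdev;	/* NULL without a blockdev */
};

/* Same test as erofs_bdev_mode() in the patch: is a blockdev attached? */
static bool model_bdev_mode(const struct model_super_block *sb)
{
	return sb->s_bdev != NULL;
}

/* Illustrative caller (assumption): choose a read path by mode. */
static const char *model_read_path(const struct model_super_block *sb)
{
	assert(sb);
	return model_bdev_mode(sb) ? "blockdev" : "nodev";
}
```

A caller would branch on the helper once and keep the two read paths
separate, rather than testing s_bdev at every call site.
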
 fs/erofs/internal.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index e424293f47a2..f66af9ebda43 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -161,6 +161,11 @@ struct erofs_sb_info {
 #define set_opt(opt, option)	((opt)->mount_opt |= EROFS_MOUNT_##option)
 #define test_opt(opt, option)	((opt)->mount_opt & EROFS_MOUNT_##option)
 
+static inline bool erofs_bdev_mode(struct super_block *sb)
+{
+	return sb->s_bdev;
+}
+
 enum {
 	EROFS_ZIP_CACHE_DISABLED,
 	EROFS_ZIP_CACHE_READAHEAD,
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v5 11/22] erofs: register global fscache volume
  2022-03-16 13:17 ` [PATCH v5 00/22] fscache, erofs: " Jeffle Xu
@ 2022-03-16 13:17   ` Jeffle Xu
  -1 siblings, 0 replies; 102+ messages in thread
From: Jeffle Xu @ 2022-03-16 13:17 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend

All erofs instances will share one global fscache volume.

In this use case, one erofs instance can be mounted from one (or
multiple) blob files instead of a blkdev. The number of blob files that
each erofs instance corresponds to is limited, since these blob files
are quite large. For example, when used for container image
distribution, one erofs instance for a node.js container image
corresponds to ~20 blob files in total. Thus in a densely deployed
environment, there could be hundreds of containers and thus thousands
of fscache cookies under one fscache volume.

As for the cachefiles backend, the hash table managing all cookies
under one volume contains 32K slots, so hashing should scale well in
this case. Besides, the cachefiles backend scatters backing files
across 256 fan sub-directories, so looking up backing files should
not become a scalability issue either.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
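The scaling argument in the commit message can be checked with some
rough arithmetic. The sketch below is a userspace illustration, not
kernel code. The 32K slots, the 256 fan sub-directories and the ~20
blobs per image come from the commit message; the container count
passed in and the assumption that no blob is shared between containers
are mine, and both function names are made up for the example.

```c
#include <assert.h>

/* Figures quoted in the commit message. */
#define COOKIE_HASH_SLOTS	(32 * 1024)	/* "32K slots" */
#define FAN_SUBDIRS		256		/* fan sub-directories */

/*
 * Average number of cookies per hash slot, multiplied by 1000 so the
 * result stays an integer (183 means 0.183 cookies per slot).
 */
static unsigned long cookies_per_slot_milli(unsigned long containers,
					    unsigned long blobs_per_image)
{
	assert(containers > 0 && blobs_per_image > 0);
	return containers * blobs_per_image * 1000 / COOKIE_HASH_SLOTS;
}

/* Average number of backing files per fan sub-directory, rounded up. */
static unsigned long files_per_subdir(unsigned long containers,
				      unsigned long blobs_per_image)
{
	unsigned long files = containers * blobs_per_image;

	return (files + FAN_SUBDIRS - 1) / FAN_SUBDIRS;
}
```

With a hypothetical 300 containers of 20 blobs each (6000 cookies), the
table holds well under one cookie per slot on average and each fan
sub-directory holds a couple of dozen files. Blobs shared between
images would lower both numbers further.
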
 fs/erofs/Makefile   |  3 ++-
 fs/erofs/fscache.c  | 21 +++++++++++++++++++++
 fs/erofs/internal.h |  5 +++++
 fs/erofs/super.c    |  7 +++++++
 4 files changed, 35 insertions(+), 1 deletion(-)
 create mode 100644 fs/erofs/fscache.c

diff --git a/fs/erofs/Makefile b/fs/erofs/Makefile
index 8a3317e38e5a..21999e8a4728 100644
--- a/fs/erofs/Makefile
+++ b/fs/erofs/Makefile
@@ -1,7 +1,8 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
 obj-$(CONFIG_EROFS_FS) += erofs.o
-erofs-objs := super.o inode.o data.o namei.o dir.o utils.o pcpubuf.o sysfs.o
+erofs-objs := super.o inode.o data.o namei.o dir.o utils.o pcpubuf.o sysfs.o \
+	      fscache.o
 erofs-$(CONFIG_EROFS_FS_XATTR) += xattr.o
 erofs-$(CONFIG_EROFS_FS_ZIP) += decompressor.o zmap.o zdata.o
 erofs-$(CONFIG_EROFS_FS_ZIP_LZMA) += decompressor_lzma.o
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
new file mode 100644
index 000000000000..9c32f42e1056
--- /dev/null
+++ b/fs/erofs/fscache.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2021, Alibaba Cloud
+ */
+#include "internal.h"
+
+static struct fscache_volume *volume;
+
+int __init erofs_init_fscache(void)
+{
+	volume = fscache_acquire_volume("erofs", NULL, NULL, 0);
+	if (!volume)
+		return -EINVAL;
+
+	return 0;
+}
+
+void erofs_exit_fscache(void)
+{
+	fscache_relinquish_volume(volume, NULL, false);
+}
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index f66af9ebda43..51fe5c2a419d 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -17,6 +17,7 @@
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 #include <linux/iomap.h>
+#include <linux/fscache.h>
 #include "erofs_fs.h"
 
 /* redefine pr_fmt "erofs: " */
@@ -616,6 +617,10 @@ static inline int z_erofs_load_lzma_config(struct super_block *sb,
 }
 #endif	/* !CONFIG_EROFS_FS_ZIP */
 
+/* fscache.c */
+int erofs_init_fscache(void);
+void erofs_exit_fscache(void);
+
 #define EFSCORRUPTED    EUCLEAN         /* Filesystem is corrupted */
 
 #endif	/* __EROFS_INTERNAL_H */
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 12755217631f..798f0c379e35 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -814,6 +814,10 @@ static int __init erofs_module_init(void)
 	if (err)
 		goto sysfs_err;
 
+	err = erofs_init_fscache();
+	if (err)
+		goto fscache_err;
+
 	err = register_filesystem(&erofs_fs_type);
 	if (err)
 		goto fs_err;
@@ -821,6 +825,8 @@ static int __init erofs_module_init(void)
 	return 0;
 
 fs_err:
+	erofs_exit_fscache();
+fscache_err:
 	erofs_exit_sysfs();
 sysfs_err:
 	z_erofs_exit_zip_subsystem();
@@ -841,6 +847,7 @@ static void __exit erofs_module_exit(void)
 	/* Ensure all RCU free inodes / pclusters are safe to be destroyed. */
 	rcu_barrier();
 
+	erofs_exit_fscache();
 	erofs_exit_sysfs();
 	z_erofs_exit_zip_subsystem();
 	z_erofs_lzma_exit();
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v5 12/22] erofs: add cookie context helper functions
  2022-03-16 13:17 ` [PATCH v5 00/22] fscache, erofs: " Jeffle Xu
@ 2022-03-16 13:17   ` Jeffle Xu
  -1 siblings, 0 replies; 102+ messages in thread
From: Jeffle Xu @ 2022-03-16 13:17 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: gregkh, willy, linux-kernel, joseph.qi, linux-fsdevel,
	luodaowen.backend, gerry, torvalds

Introduce 'struct erofs_fscache_context' for managing the cookie of a
backing file. It is also used by the API for reading from the backing
file, which is introduced later in this series.

Besides, introduce two helper functions for initializing and cleaning
up an erofs_fscache_context:

struct erofs_fscache_context *
erofs_fscache_get_ctx(struct super_block *sb, char *path);

void erofs_fscache_put_ctx(struct erofs_fscache_context *ctx);

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
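The get/put pairing and its error convention can be sketched in
userspace as follows. This is only a model under stated assumptions:
the model_* names are hypothetical, the "cookie" is reduced to a copy
of its key, and the model_err_ptr()/model_is_err() helpers imitate the
kernel's ERR_PTR()/IS_ERR(). The shape it copies from the patch is that
the get side returns either a valid context or an encoded errno (never
NULL), while the put side tolerates NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel's ERR_PTR() family (assumption). */
#define MODEL_MAX_ERRNO	4095

static void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static int model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Hypothetical context: the "cookie" is just a copy of its key. */
struct model_ctx {
	char key[64];
};

/* Get side: allocate and initialise, or return an encoded errno. */
static struct model_ctx *model_get_ctx(const char *path)
{
	struct model_ctx *ctx;

	if (!path || !*path)
		return model_err_ptr(-EINVAL);

	ctx = calloc(1, sizeof(*ctx));
	if (!ctx)
		return model_err_ptr(-ENOMEM);

	snprintf(ctx->key, sizeof(ctx->key), "%s", path);
	return ctx;
}

/* Put side: NULL is tolerated, an error pointer is not. */
static void model_put_ctx(struct model_ctx *ctx)
{
	if (!ctx)
		return;
	assert(!model_is_err(ctx));
	free(ctx);
}
```

The practical consequence for callers is that the result of the get
helper has to be checked with IS_ERR() before it is stored, since
passing an error pointer to the put helper later would not be caught
by its NULL check.
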
 fs/erofs/fscache.c  | 74 +++++++++++++++++++++++++++++++++++++++++++++
 fs/erofs/internal.h |  8 +++++
 2 files changed, 82 insertions(+)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 9c32f42e1056..28ec7c69744a 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -6,6 +6,80 @@
 
 static struct fscache_volume *volume;
 
+static int erofs_fscache_init_cookie(struct erofs_fscache_context *ctx,
+				     char *path)
+{
+	struct fscache_cookie *cookie;
+
+	cookie = fscache_acquire_cookie(volume, FSCACHE_ADV_WANT_CACHE_SIZE,
+					path, strlen(path),
+					NULL, 0, 0);
+	if (!cookie)
+		return -EINVAL;
+
+	fscache_use_cookie(cookie, false);
+	ctx->cookie = cookie;
+	return 0;
+}
+
+static inline
+void erofs_fscache_cleanup_cookie(struct erofs_fscache_context *ctx)
+{
+	struct fscache_cookie *cookie = ctx->cookie;
+
+	fscache_unuse_cookie(cookie, NULL, NULL);
+	fscache_relinquish_cookie(cookie, false);
+	ctx->cookie = NULL;
+}
+
+static int erofs_fscache_init_ctx(struct erofs_fscache_context *ctx,
+				  struct super_block *sb, char *path)
+{
+	int ret;
+
+	ret = erofs_fscache_init_cookie(ctx, path);
+	if (ret) {
+		erofs_err(sb, "failed to init cookie");
+		return ret;
+	}
+
+	return 0;
+}
+
+static inline
+void erofs_fscache_cleanup_ctx(struct erofs_fscache_context *ctx)
+{
+	erofs_fscache_cleanup_cookie(ctx);
+}
+
+struct erofs_fscache_context *erofs_fscache_get_ctx(struct super_block *sb,
+						char *path)
+{
+	struct erofs_fscache_context *ctx;
+	int ret;
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return ERR_PTR(-ENOMEM);
+
+	ret = erofs_fscache_init_ctx(ctx, sb, path);
+	if (ret) {
+		kfree(ctx);
+		return ERR_PTR(ret);
+	}
+
+	return ctx;
+}
+
+void erofs_fscache_put_ctx(struct erofs_fscache_context *ctx)
+{
+	if (!ctx)
+		return;
+
+	erofs_fscache_cleanup_ctx(ctx);
+	kfree(ctx);
+}
+
 int __init erofs_init_fscache(void)
 {
 	volume = fscache_acquire_volume("erofs", NULL, NULL, 0);
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 51fe5c2a419d..123a8dfc179b 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -97,6 +97,10 @@ struct erofs_sb_lz4_info {
 	u16 max_pclusterblks;
 };
 
+struct erofs_fscache_context {
+	struct fscache_cookie *cookie;
+};
+
 struct erofs_sb_info {
 	struct erofs_mount_opts opt;	/* options */
 #ifdef CONFIG_EROFS_FS_ZIP
@@ -621,6 +625,10 @@ static inline int z_erofs_load_lzma_config(struct super_block *sb,
 int erofs_init_fscache(void);
 void erofs_exit_fscache(void);
 
+struct erofs_fscache_context *erofs_fscache_get_ctx(struct super_block *sb,
+						char *path);
+void erofs_fscache_put_ctx(struct erofs_fscache_context *ctx);
+
 #define EFSCORRUPTED    EUCLEAN         /* Filesystem is corrupted */
 
 #endif	/* __EROFS_INTERNAL_H */
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v5 13/22] erofs: add anonymous inode managing page cache of blob file
  2022-03-16 13:17 ` [PATCH v5 00/22] fscache, erofs: " Jeffle Xu
@ 2022-03-16 13:17   ` Jeffle Xu
  -1 siblings, 0 replies; 102+ messages in thread
From: Jeffle Xu @ 2022-03-16 13:17 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend

Introduce an anonymous inode for managing the page cache of the
corresponding blob file. erofs can then read directly from the address
space of the anonymous inode on a cache hit.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/erofs/fscache.c  | 45 ++++++++++++++++++++++++++++++++++++++++++---
 fs/erofs/internal.h |  3 ++-
 2 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 28ec7c69744a..684ac6f940bc 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -6,6 +6,9 @@
 
 static struct fscache_volume *volume;
 
+static const struct address_space_operations erofs_fscache_blob_aops = {
+};
+
 static int erofs_fscache_init_cookie(struct erofs_fscache_context *ctx,
 				     char *path)
 {
@@ -32,8 +35,34 @@ void erofs_fscache_cleanup_cookie(struct erofs_fscache_context *ctx)
 	ctx->cookie = NULL;
 }
 
+static int erofs_fscache_get_inode(struct erofs_fscache_context *ctx,
+				   struct super_block *sb)
+{
+	struct inode *const inode = new_inode(sb);
+
+	if (!inode)
+		return -ENOMEM;
+
+	set_nlink(inode, 1);
+	inode->i_size = OFFSET_MAX;
+
+	inode->i_mapping->a_ops = &erofs_fscache_blob_aops;
+	mapping_set_gfp_mask(inode->i_mapping,
+			GFP_NOFS | __GFP_HIGHMEM | __GFP_MOVABLE);
+	ctx->inode = inode;
+	return 0;
+}
+
+static inline
+void erofs_fscache_put_inode(struct erofs_fscache_context *ctx)
+{
+	iput(ctx->inode);
+	ctx->inode = NULL;
+}
+
 static int erofs_fscache_init_ctx(struct erofs_fscache_context *ctx,
-				  struct super_block *sb, char *path)
+				  struct super_block *sb, char *path,
+				  bool need_inode)
 {
 	int ret;
 
@@ -43,6 +72,15 @@ static int erofs_fscache_init_ctx(struct erofs_fscache_context *ctx,
 		return ret;
 	}
 
+	if (need_inode) {
+		ret = erofs_fscache_get_inode(ctx, sb);
+		if (ret) {
+			erofs_err(sb, "failed to get anonymous inode");
+			erofs_fscache_cleanup_cookie(ctx);
+			return ret;
+		}
+	}
+
 	return 0;
 }
 
@@ -50,10 +88,11 @@ static inline
 void erofs_fscache_cleanup_ctx(struct erofs_fscache_context *ctx)
 {
 	erofs_fscache_cleanup_cookie(ctx);
+	erofs_fscache_put_inode(ctx);
 }
 
 struct erofs_fscache_context *erofs_fscache_get_ctx(struct super_block *sb,
-						char *path)
+						char *path, bool need_inode)
 {
 	struct erofs_fscache_context *ctx;
 	int ret;
@@ -62,7 +101,7 @@ struct erofs_fscache_context *erofs_fscache_get_ctx(struct super_block *sb,
 	if (!ctx)
 		return ERR_PTR(-ENOMEM);
 
-	ret = erofs_fscache_init_ctx(ctx, sb, path);
+	ret = erofs_fscache_init_ctx(ctx, sb, path, need_inode);
 	if (ret) {
 		kfree(ctx);
 		return ERR_PTR(ret);
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 123a8dfc179b..32aaa5ee5e14 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -99,6 +99,7 @@ struct erofs_sb_lz4_info {
 
 struct erofs_fscache_context {
 	struct fscache_cookie *cookie;
+	struct inode *inode;
 };
 
 struct erofs_sb_info {
@@ -626,7 +627,7 @@ int erofs_init_fscache(void);
 void erofs_exit_fscache(void);
 
 struct erofs_fscache_context *erofs_fscache_get_ctx(struct super_block *sb,
-						char *path);
+						char *path, bool need_inode);
 void erofs_fscache_put_ctx(struct erofs_fscache_context *ctx);
 
 #define EFSCORRUPTED    EUCLEAN         /* Filesystem is corrupted */
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v5 14/22] erofs: add erofs_fscache_read_pages() helper
  2022-03-16 13:17 ` [PATCH v5 00/22] fscache, erofs: " Jeffle Xu
@ 2022-03-16 13:17   ` Jeffle Xu
  -1 siblings, 0 replies; 102+ messages in thread
From: Jeffle Xu @ 2022-03-16 13:17 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: gregkh, willy, linux-kernel, joseph.qi, linux-fsdevel,
	luodaowen.backend, gerry, torvalds

Add the erofs_fscache_read_pages() helper, which reads from fscache. It
supports on-demand read semantics: on a cache miss, it makes the backend
prepare the data, and once the data is ready, it reinitiates the read
from the cache.

This helper can then be used to implement .readpage()/.readahead() with
on-demand read semantics.

Also add the erofs_fscache_read_page() wrapper helper.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
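A minimal userspace sketch of the on-demand semantics described above,
not kernel code. Every name in it (model_cache, backend_fill,
ondemand_read) is hypothetical; in the patch the equivalent work is done
by fscache_read() with NETFS_READ_HOLE_ONDEMAND and the backend behind
the cache. The model only shows the control flow: a miss triggers the
backend once, and the read is then served from the cache.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u
#define MODEL_NR_PAGES  4u

/* Hypothetical stand-in for the cache file: a page is present or a hole. */
struct model_cache {
	bool present[MODEL_NR_PAGES];
	unsigned char data[MODEL_NR_PAGES][MODEL_PAGE_SIZE];
	unsigned int backend_fills;	/* how often the backend was asked */
};

/* Hypothetical backend: "fetches" page idx and stores it in the cache. */
static void backend_fill(struct model_cache *c, size_t idx)
{
	memset(c->data[idx], (int)(idx + 1), MODEL_PAGE_SIZE);
	c->present[idx] = true;
	c->backend_fills++;
}

/*
 * Read one page. On a miss, have the backend prepare the data, then
 * serve the read from the cache. Returns 0 on success, -1 if idx is
 * out of range.
 */
static int ondemand_read(struct model_cache *c, size_t idx,
			 unsigned char *dst)
{
	if (idx >= MODEL_NR_PAGES)
		return -1;
	if (!c->present[idx])
		backend_fill(c, idx);			/* cache miss */
	memcpy(dst, c->data[idx], MODEL_PAGE_SIZE);	/* read from cache */
	return 0;
}
```

Reading the same page twice asks the backend only once; the second read
is a plain cache hit.
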
 fs/erofs/fscache.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 684ac6f940bc..38b5a9380092 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -6,6 +6,44 @@
 
 static struct fscache_volume *volume;
 
+/*
+ * erofs_fscache_read_pages - Read data from fscache.
+ *
+ * Read data into the page cache range described by @start/@len, both of
+ * which shall be aligned to PAGE_SIZE. @pstart is the corresponding
+ * physical start address in the cache file.
+ */
+static int erofs_fscache_read_pages(struct fscache_cookie *cookie,
+				    struct address_space *mapping,
+				    loff_t start, size_t len,
+				    loff_t pstart)
+{
+	struct netfs_cache_resources cres;
+	struct iov_iter iter;
+	int ret;
+
+	memset(&cres, 0, sizeof(cres));
+
+	ret = fscache_begin_read_operation(&cres, cookie);
+	if (ret)
+		return ret;
+
+	iov_iter_xarray(&iter, READ, &mapping->i_pages, start, len);
+
+	ret = fscache_read(&cres, pstart, &iter,
+			   NETFS_READ_HOLE_ONDEMAND, NULL, NULL);
+
+	fscache_end_operation(&cres);
+	return ret;
+}
+
+static inline int erofs_fscache_read_page(struct fscache_cookie *cookie,
+					  struct page *page, loff_t pstart)
+{
+	return erofs_fscache_read_pages(cookie, page_file_mapping(page),
+					page_offset(page), PAGE_SIZE, pstart);
+}
+
 static const struct address_space_operations erofs_fscache_blob_aops = {
 };
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v5 15/22] erofs: register cookie context for bootstrap blob
  2022-03-16 13:17 ` [PATCH v5 00/22] fscache, erofs: " Jeffle Xu
@ 2022-03-16 13:17   ` Jeffle Xu
  -1 siblings, 0 replies; 102+ messages in thread
From: Jeffle Xu @ 2022-03-16 13:17 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: gregkh, willy, linux-kernel, joseph.qi, linux-fsdevel,
	luodaowen.backend, gerry, torvalds

Register an fscache_cookie for the bootstrap blob file. The bootstrap blob
file can be specified by a new mount option, which is introduced by a
following patch.

Two things are worth noting about the cleanup routine:

1. The init routine runs before the root inode is initialized. If the
mount fails before sb->s_root is set, .put_super() is never called, so
the corresponding cleanup routine must be placed in the .kill_sb()
callback.

2. The init routine instantiates anonymous inodes under the super_block,
so the .put_super() callback must also contain the cleanup routine.
Otherwise we get the "VFS: Busy inodes after unmount." warning.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
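Since the cleanup is called from both .put_super() and .kill_sb(), the
second call sees the pointer that the first one already cleared. The
sketch below is a userspace model of that pattern, with hypothetical
names throughout (model_ctx, model_sbi, model_put_ctx, ...). It assumes
that the put helper ignores NULL; whether erofs_fscache_put_ctx() does so
is not visible in this patch.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for erofs_fscache_context and erofs_sb_info. */
struct model_ctx {
	int cookie;
};

struct model_sbi {
	struct model_ctx *bootstrap;
};

static int model_puts;	/* counts real releases, for illustration only */

static struct model_ctx *model_get_ctx(void)
{
	return calloc(1, sizeof(struct model_ctx));
}

/* Assumed behaviour: a NULL context is silently ignored. */
static void model_put_ctx(struct model_ctx *ctx)
{
	if (!ctx)
		return;
	free(ctx);
	model_puts++;
}

/* Models .put_super(): release, then clear the pointer. */
static void model_put_super(struct model_sbi *sbi)
{
	model_put_ctx(sbi->bootstrap);
	sbi->bootstrap = NULL;
}

/* Models .kill_sb(): runs after .put_super(), or alone on early failure. */
static void model_kill_sb(struct model_sbi *sbi)
{
	model_put_ctx(sbi->bootstrap);
	sbi->bootstrap = NULL;
}
```

On a normal unmount both functions run and the context is released once;
on an early mount failure only model_kill_sb() runs and still releases it.
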
 fs/erofs/internal.h |  3 +++
 fs/erofs/super.c    | 13 +++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 32aaa5ee5e14..cce39339c08e 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -75,6 +75,7 @@ struct erofs_mount_opts {
 	unsigned int max_sync_decompress_pages;
 #endif
 	unsigned int mount_opt;
+	char *uuid;
 };
 
 struct erofs_dev_context {
@@ -152,6 +153,8 @@ struct erofs_sb_info {
 	/* sysfs support */
 	struct kobject s_kobj;		/* /sys/fs/erofs/<devname> */
 	struct completion s_kobj_unregister;
+
+	struct erofs_fscache_context *bootstrap;
 };
 
 #define EROFS_SB(sb) ((struct erofs_sb_info *)(sb)->s_fs_info)
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 798f0c379e35..8c5783c6f71f 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -598,6 +598,16 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 	sbi->devs = ctx->devs;
 	ctx->devs = NULL;
 
+	if (!erofs_bdev_mode(sb)) {
+		struct erofs_fscache_context *bootstrap;
+
+		bootstrap = erofs_fscache_get_ctx(sb, ctx->opt.uuid, true);
+		if (IS_ERR(bootstrap))
+			return PTR_ERR(bootstrap);
+
+		sbi->bootstrap = bootstrap;
+	}
+
 	err = erofs_read_superblock(sb);
 	if (err)
 		return err;
@@ -753,6 +763,7 @@ static void erofs_kill_sb(struct super_block *sb)
 		return;
 
 	erofs_free_dev_context(sbi->devs);
+	erofs_fscache_put_ctx(sbi->bootstrap);
 	fs_put_dax(sbi->dax_dev);
 	kfree(sbi);
 	sb->s_fs_info = NULL;
@@ -771,6 +782,8 @@ static void erofs_put_super(struct super_block *sb)
 	iput(sbi->managed_cache);
 	sbi->managed_cache = NULL;
 #endif
+	erofs_fscache_put_ctx(sbi->bootstrap);
+	sbi->bootstrap = NULL;
 }
 
 static struct file_system_type erofs_fs_type = {
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v5 16/22] erofs: implement fscache-based metadata read
  2022-03-16 13:17 ` [PATCH v5 00/22] fscache, erofs: " Jeffle Xu
@ 2022-03-16 13:17   ` Jeffle Xu
  -1 siblings, 0 replies; 102+ messages in thread
From: Jeffle Xu @ 2022-03-16 13:17 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: gregkh, willy, linux-kernel, joseph.qi, linux-fsdevel,
	luodaowen.backend, gerry, torvalds

This patch implements the data plane for reading metadata from the
bootstrap blob file over fscache.

Note that it currently only supports the case where the backing file has
no holes. If a read hits a hole in the backing file, erofs fails the I/O
with -EOPNOTSUPP for now. A following patch fixes this by implementing
the on-demand read mode.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
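For orientation, a small sketch of how a metadata block address turns
into the page index that is looked up in the bootstrap blob's mapping.
It mirrors the blknr_to_addr() and PAGE_SHIFT arithmetic in
erofs_read_metabuf(), but the helper names and the two constants
(4 KiB block, 4 KiB page) are assumptions made for this example.

```c
#include <stdint.h>

#define MODEL_LOG_BLOCK_SIZE 12	/* assumed 4 KiB EROFS block */
#define MODEL_PAGE_SHIFT     12	/* assumed 4 KiB page */

/* Byte offset of a block inside the bootstrap blob. */
static uint64_t model_blknr_to_addr(uint32_t blkaddr)
{
	return (uint64_t)blkaddr << MODEL_LOG_BLOCK_SIZE;
}

/* Page-cache index that would be looked up for blkaddr. */
static uint64_t model_meta_page_index(uint32_t blkaddr)
{
	return model_blknr_to_addr(blkaddr) >> MODEL_PAGE_SHIFT;
}
```

With equal block and page sizes the index equals the block address; the
widening to 64 bits keeps large block addresses from overflowing.
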
 fs/erofs/data.c     | 11 +++++++++--
 fs/erofs/fscache.c  | 24 ++++++++++++++++++++++++
 fs/erofs/internal.h |  3 +++
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 6e2a28242453..1bff99576883 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -31,15 +31,22 @@ void erofs_put_metabuf(struct erofs_buf *buf)
 void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
 			erofs_blk_t blkaddr, enum erofs_kmap_type type)
 {
-	struct address_space *const mapping = sb->s_bdev->bd_inode->i_mapping;
+	struct address_space *mapping;
+	struct erofs_sb_info *sbi = EROFS_SB(sb);
 	erofs_off_t offset = blknr_to_addr(blkaddr);
 	pgoff_t index = offset >> PAGE_SHIFT;
 	struct page *page = buf->page;
 
 	if (!page || page->index != index) {
 		erofs_put_metabuf(buf);
-		page = read_cache_page_gfp(mapping, index,
+		if (erofs_bdev_mode(sb)) {
+			mapping = sb->s_bdev->bd_inode->i_mapping;
+			page = read_cache_page_gfp(mapping, index,
 				mapping_gfp_constraint(mapping, ~__GFP_FS));
+		} else {
+			page = erofs_fscache_read_cache_page(sbi->bootstrap,
+				index);
+		}
 		if (IS_ERR(page))
 			return page;
 		/* should already be PageUptodate, no need to lock page */
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 38b5a9380092..654414aa87ad 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -44,9 +44,33 @@ static inline int erofs_fscache_read_page(struct fscache_cookie *cookie,
 					page_offset(page), PAGE_SIZE, pstart);
 }
 
+static int erofs_fscache_readpage_blob(struct file *data, struct page *page)
+{
+	int ret;
+	struct erofs_fscache_context *ctx =
+		(struct erofs_fscache_context *)data;
+
+	ret = erofs_fscache_read_page(ctx->cookie, page, page_offset(page));
+	if (!ret)
+		SetPageUptodate(page);
+	else
+		SetPageError(page);
+
+	unlock_page(page);
+	return ret;
+}
+
 static const struct address_space_operations erofs_fscache_blob_aops = {
+	.readpage = erofs_fscache_readpage_blob,
 };
 
+struct page *erofs_fscache_read_cache_page(struct erofs_fscache_context *ctx,
+					   pgoff_t index)
+{
+	DBG_BUGON(!ctx->inode);
+	return read_mapping_page(ctx->inode->i_mapping, index, ctx);
+}
+
 static int erofs_fscache_init_cookie(struct erofs_fscache_context *ctx,
 				     char *path)
 {
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index cce39339c08e..35e7c330e59e 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -633,6 +633,9 @@ struct erofs_fscache_context *erofs_fscache_get_ctx(struct super_block *sb,
 						char *path, bool need_inode);
 void erofs_fscache_put_ctx(struct erofs_fscache_context *ctx);
 
+struct page *erofs_fscache_read_cache_page(struct erofs_fscache_context *ctx,
+					   pgoff_t index);
+
 #define EFSCORRUPTED    EUCLEAN         /* Filesystem is corrupted */
 
 #endif	/* __EROFS_INTERNAL_H */
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v5 17/22] erofs: implement fscache-based data read for non-inline layout
  2022-03-16 13:17 ` [PATCH v5 00/22] fscache, erofs: " Jeffle Xu
@ 2022-03-16 13:17   ` Jeffle Xu
  -1 siblings, 0 replies; 102+ messages in thread
From: Jeffle Xu @ 2022-03-16 13:17 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend

This patch implements the data plane for reading data from the bootstrap
blob file over fscache for the non-inline layout.

Note that the compressed layout is not supported yet.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
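A worked example of the address calculation used by the non-inline read
path, start = m_pa + (o_la - m_la). The struct and function names are
hypothetical mirrors of the fields in struct erofs_fscache_map, and the
64 KiB chunk size in the second case below is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical mirror of the fields used from struct erofs_fscache_map. */
struct model_map {
	uint64_t m_pa;	/* physical address of the mapped extent start */
	uint64_t m_la;	/* logical address of the mapped extent start */
	uint64_t o_la;	/* logical address originally requested */
};

/* Physical address in the blob that corresponds to o_la. */
static uint64_t model_phys_start(const struct model_map *m)
{
	return m->m_pa + (m->o_la - m->m_la);
}
```

For FLAT_PLAIN, m_la equals o_la, so the result is just m_pa. For
CHUNK_BASED with a 64 KiB chunk, a request at o_la = 0x13000 is mapped
with m_la = 0x10000; if that chunk starts at m_pa = 0x200000, the read
starts at 0x203000.
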
 fs/erofs/fscache.c  | 94 +++++++++++++++++++++++++++++++++++++++++++++
 fs/erofs/inode.c    |  6 ++-
 fs/erofs/internal.h |  1 +
 3 files changed, 100 insertions(+), 1 deletion(-)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 654414aa87ad..df56562f33c4 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -4,6 +4,12 @@
  */
 #include "internal.h"
 
+struct erofs_fscache_map {
+	struct erofs_fscache_context *m_ctx;
+	erofs_off_t m_pa, m_la, o_la;
+	u64 m_llen;
+};
+
 static struct fscache_volume *volume;
 
 /*
@@ -60,10 +66,98 @@ static int erofs_fscache_readpage_blob(struct file *data, struct page *page)
 	return ret;
 }
 
+static inline int erofs_fscache_get_map(struct erofs_fscache_map *fsmap,
+					struct erofs_map_blocks *map,
+					struct super_block *sb)
+{
+	struct erofs_sb_info *sbi = EROFS_SB(sb);
+
+	fsmap->m_ctx  = sbi->bootstrap;
+	fsmap->m_la   = map->m_la;
+	fsmap->m_pa   = map->m_pa;
+	fsmap->m_llen = map->m_llen;
+
+	return 0;
+}
+
+static int erofs_fscache_readpage_noinline(struct page *page,
+					   struct erofs_fscache_map *fsmap)
+{
+	struct fscache_cookie *cookie = fsmap->m_ctx->cookie;
+	/*
+	 * 1) For FLAT_PLAIN layout, the output map.m_la shall be equal to o_la,
+	 * and the output map.m_pa is exactly the physical address of o_la.
+	 * 2) For CHUNK_BASED layout, the output map.m_la is rounded down to the
+	 * nearest chunk boundary, and the output map.m_pa is actually the
+	 * physical address of this chunk boundary. So we need to recalculate
+	 * the actual physical address of o_la.
+	 */
+	loff_t start = fsmap->m_pa + (fsmap->o_la - fsmap->m_la);
+
+	return erofs_fscache_read_page(cookie, page, start);
+}
+
+static int erofs_fscache_do_readpage(struct page *page)
+{
+	struct inode *inode = page->mapping->host;
+	struct erofs_inode *vi = EROFS_I(inode);
+	struct super_block *sb = inode->i_sb;
+	struct erofs_map_blocks map;
+	struct erofs_fscache_map fsmap;
+	int ret;
+
+	if (erofs_inode_is_data_compressed(vi->datalayout)) {
+		erofs_info(sb, "compressed layout not supported yet");
+		return -EOPNOTSUPP;
+	}
+
+	map.m_la = fsmap.o_la = page_offset(page);
+
+	ret = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW);
+	if (ret)
+		return ret;
+
+	if (!(map.m_flags & EROFS_MAP_MAPPED)) {
+		zero_user(page, 0, PAGE_SIZE);
+		return 0;
+	}
+
+	ret = erofs_fscache_get_map(&fsmap, &map, sb);
+	if (ret)
+		return ret;
+
+	switch (vi->datalayout) {
+	case EROFS_INODE_FLAT_PLAIN:
+	case EROFS_INODE_CHUNK_BASED:
+		return erofs_fscache_readpage_noinline(page, &fsmap);
+	default:
+		DBG_BUGON(1);
+		return -EOPNOTSUPP;
+	}
+}
+
+static int erofs_fscache_readpage(struct file *file, struct page *page)
+{
+	int ret;
+
+	ret = erofs_fscache_do_readpage(page);
+	if (!ret)
+		SetPageUptodate(page);
+	else
+		SetPageError(page);
+
+	unlock_page(page);
+	return ret;
+}
+
 static const struct address_space_operations erofs_fscache_blob_aops = {
 	.readpage = erofs_fscache_readpage_blob,
 };
 
+const struct address_space_operations erofs_fscache_access_aops = {
+	.readpage = erofs_fscache_readpage,
+};
+
 struct page *erofs_fscache_read_cache_page(struct erofs_fscache_context *ctx,
 					   pgoff_t index)
 {
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
index ff62f84f47d3..2f450cb3a7b9 100644
--- a/fs/erofs/inode.c
+++ b/fs/erofs/inode.c
@@ -296,7 +296,11 @@ static int erofs_fill_inode(struct inode *inode, int isdir)
 		err = z_erofs_fill_inode(inode);
 		goto out_unlock;
 	}
-	inode->i_mapping->a_ops = &erofs_raw_access_aops;
+
+	if (erofs_bdev_mode(inode->i_sb))
+		inode->i_mapping->a_ops = &erofs_raw_access_aops;
+	else
+		inode->i_mapping->a_ops = &erofs_fscache_access_aops;
 
 out_unlock:
 	erofs_put_metabuf(&buf);
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 35e7c330e59e..f94a921eff98 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -393,6 +393,7 @@ struct page *erofs_grab_cache_page_nowait(struct address_space *mapping,
 extern const struct super_operations erofs_sops;
 
 extern const struct address_space_operations erofs_raw_access_aops;
+extern const struct address_space_operations erofs_fscache_access_aops;
 extern const struct address_space_operations z_erofs_aops;
 
 /*
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v5 18/22] erofs: implement fscache-based data read for inline layout
  2022-03-16 13:17 ` [PATCH v5 00/22] fscache, erofs: " Jeffle Xu
@ 2022-03-16 13:17   ` Jeffle Xu
  -1 siblings, 0 replies; 102+ messages in thread
From: Jeffle Xu @ 2022-03-16 13:17 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend

This patch implements the data plane for reading data from the bootstrap
blob file over fscache for the inline layout.

For the leading non-inline part, the data plane for the non-inline layout
is reused; only the tail-packing part needs special handling.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
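A userspace sketch of the tail-packing copy: split the physical address
into block number and in-block offset, copy the inline bytes into the
page, and zero the rest. The names and the 4 KiB block and page sizes
are assumptions for this example, and the bounds check is an addition of
the sketch that the patch itself does not contain.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_BLKSIZ    4096u	/* assumed EROFS block size */
#define MODEL_PAGE_SIZE 4096u	/* assumed page size */

/*
 * Copy len bytes of inline tail data located at physical address pa in
 * the metadata area meta into the page dst, zero-filling the remainder.
 * Returns 0 on success, -1 if the tail would cross the block or exceed
 * the page (a check added by this sketch only).
 */
static int model_read_inline(const unsigned char *meta, uint64_t pa,
			     size_t len, unsigned char *dst)
{
	uint64_t blknr = pa / MODEL_BLKSIZ;
	size_t offset = (size_t)(pa % MODEL_BLKSIZ);
	const unsigned char *src = meta + blknr * MODEL_BLKSIZ;

	if (len > MODEL_PAGE_SIZE || offset + len > MODEL_BLKSIZ)
		return -1;

	memcpy(dst, src + offset, len);
	memset(dst + len, 0, MODEL_PAGE_SIZE - len);
	return 0;
}
```

The zero fill matters because the page is marked up to date as a whole,
so bytes past the inline tail must not expose stale memory.
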
 fs/erofs/fscache.c | 43 +++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index df56562f33c4..254b3e72ab4d 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -85,8 +85,9 @@ static int erofs_fscache_readpage_noinline(struct page *page,
 {
 	struct fscache_cookie *cookie = fsmap->m_ctx->cookie;
 	/*
-	 * 1) For FLAT_PLAIN layout, the output map.m_la shall be equal to o_la,
-	 * and the output map.m_pa is exactly the physical address of o_la.
+	 * 1) For FLAT_PLAIN and FLAT_INLINE (the heading non tail packing part)
+	 * layout, the output map.m_la shall be equal to o_la, and the output
+	 * map.m_pa is exactly the physical address of o_la.
 	 * 2) For CHUNK_BASED layout, the output map.m_la is rounded down to the
 	 * nearest chunk boundary, and the output map.m_pa is actually the
 	 * physical address of this chunk boundary. So we need to recalculate
@@ -97,6 +98,40 @@ static int erofs_fscache_readpage_noinline(struct page *page,
 	return erofs_fscache_read_page(cookie, page, start);
 }
 
+static int erofs_fscache_readpage_inline(struct page *page,
+					 struct erofs_fscache_map *fsmap)
+{
+	struct inode *inode = page->mapping->host;
+	struct super_block *sb = inode->i_sb;
+	struct erofs_buf buf = __EROFS_BUF_INITIALIZER;
+	erofs_blk_t blknr;
+	size_t offset, len;
+	void *src, *dst;
+
+	/*
+	 * For inline (tail packing) layout, the offset may be non-zero, which
+	 * can be calculated from corresponding physical address directly.
+	 * Currently only flat layout supports inline (FLAT_INLINE), and the
+	 * output map.m_pa is exactly the physical address of o_la in this case.
+	 */
+	offset = erofs_blkoff(fsmap->m_pa);
+	blknr = erofs_blknr(fsmap->m_pa);
+	len = fsmap->m_llen;
+
+	src = erofs_read_metabuf(&buf, sb, blknr, EROFS_KMAP);
+	if (IS_ERR(src))
+		return PTR_ERR(src);
+
+	dst = kmap(page);
+	memcpy(dst, src + offset, len);
+	memset(dst + len, 0, PAGE_SIZE - len);
+	kunmap(page);
+
+	erofs_put_metabuf(&buf);
+
+	return 0;
+}
+
 static int erofs_fscache_do_readpage(struct page *page)
 {
 	struct inode *inode = page->mapping->host;
@@ -126,8 +161,12 @@ static int erofs_fscache_do_readpage(struct page *page)
 	if (ret)
 		return ret;
 
+	if (map.m_flags & EROFS_MAP_META)
+		return erofs_fscache_readpage_inline(page, &fsmap);
+
 	switch (vi->datalayout) {
 	case EROFS_INODE_FLAT_PLAIN:
+	case EROFS_INODE_FLAT_INLINE:
 	case EROFS_INODE_CHUNK_BASED:
 		return erofs_fscache_readpage_noinline(page, &fsmap);
 	default:
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v5 19/22] erofs: register cookie context for data blobs
  2022-03-16 13:17 ` [PATCH v5 00/22] fscache, erofs: " Jeffle Xu
@ 2022-03-16 13:17   ` Jeffle Xu
  -1 siblings, 0 replies; 102+ messages in thread
From: Jeffle Xu @ 2022-03-16 13:17 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend

Similar to the multi-device mode, erofs can be mounted from multiple
blob files (one bootstrap blob file and optionally multiple data blob
files). In this case, each device slot contains the path of the
corresponding data blob file.

This patch registers the corresponding cookie context for each data blob
file.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/erofs/internal.h |  1 +
 fs/erofs/super.c    | 27 +++++++++++++++++++--------
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index f94a921eff98..d93de8b6ff44 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -53,6 +53,7 @@ struct erofs_device_info {
 	struct block_device *bdev;
 	struct dax_device *dax_dev;
 	u64 dax_part_off;
+	struct erofs_fscache_context *ctx;
 
 	u32 blocks;
 	u32 mapped_blkaddr;
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 8c5783c6f71f..f058a04a00c7 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -250,6 +250,7 @@ static int erofs_init_devices(struct super_block *sb,
 	down_read(&sbi->devs->rwsem);
 	idr_for_each_entry(&sbi->devs->tree, dif, id) {
 		struct block_device *bdev;
+		struct erofs_fscache_context *ctx;
 
 		ptr = erofs_read_metabuf(&buf, sb, erofs_blknr(pos),
 					 EROFS_KMAP);
@@ -259,15 +260,24 @@ static int erofs_init_devices(struct super_block *sb,
 		}
 		dis = ptr + erofs_blkoff(pos);
 
-		bdev = blkdev_get_by_path(dif->path,
-					  FMODE_READ | FMODE_EXCL,
-					  sb->s_type);
-		if (IS_ERR(bdev)) {
-			err = PTR_ERR(bdev);
-			break;
+		if (erofs_bdev_mode(sb)) {
+			bdev = blkdev_get_by_path(dif->path,
+						  FMODE_READ | FMODE_EXCL,
+						  sb->s_type);
+			if (IS_ERR(bdev)) {
+				err = PTR_ERR(bdev);
+				break;
+			}
+			dif->bdev = bdev;
+			dif->dax_dev = fs_dax_get_by_bdev(bdev, &dif->dax_part_off);
+		} else {
+			ctx = erofs_fscache_get_ctx(sb, dif->path, false);
+			if (IS_ERR(ctx)) {
+				err = PTR_ERR(ctx);
+				break;
+			}
+			dif->ctx = ctx;
 		}
-		dif->bdev = bdev;
-		dif->dax_dev = fs_dax_get_by_bdev(bdev, &dif->dax_part_off);
 		dif->blocks = le32_to_cpu(dis->blocks);
 		dif->mapped_blkaddr = le32_to_cpu(dis->mapped_blkaddr);
 		sbi->total_blocks += dif->blocks;
@@ -694,6 +704,7 @@ static int erofs_release_device_info(int id, void *ptr, void *data)
 {
 	struct erofs_device_info *dif = ptr;
 
+	erofs_fscache_put_ctx(dif->ctx);
 	fs_put_dax(dif->dax_dev);
 	if (dif->bdev)
 		blkdev_put(dif->bdev, FMODE_READ | FMODE_EXCL);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v5 20/22] erofs: implement fscache-based data read for data blobs
  2022-03-16 13:17 ` [PATCH v5 00/22] fscache, erofs: " Jeffle Xu
@ 2022-03-16 13:17   ` Jeffle Xu
  -1 siblings, 0 replies; 102+ messages in thread
From: Jeffle Xu @ 2022-03-16 13:17 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: gregkh, willy, linux-kernel, joseph.qi, linux-fsdevel,
	luodaowen.backend, gerry, torvalds

This patch implements the data plane for reading data from data blob
files over fscache.
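
The context selection this relies on can be sketched as below. It is a
simplified user-space model, not the kernel code: struct blob_ctx,
struct blob_table and pick_ctx() are hypothetical names, and only the
explicit device-id case is modelled (the address-range search that
erofs_map_dev() performs when no device id is given is left out).
Device id 0 resolves to the bootstrap context, and a non-zero id
resolves to the context stored in the matching device slot.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a per-blob cookie context (hypothetical). */
struct blob_ctx {
    const char *path;
};

/* Bootstrap context plus one context per extra device slot. */
struct blob_table {
    struct blob_ctx *bootstrap;   /* serves device id 0 */
    struct blob_ctx **extra;      /* serves device ids 1..nr_extra */
    unsigned int nr_extra;
};

/* Return the context serving `deviceid`, or NULL if the id is unknown. */
static struct blob_ctx *pick_ctx(const struct blob_table *t,
                                 unsigned int deviceid)
{
    if (deviceid == 0)
        return t->bootstrap;
    if (deviceid > t->nr_extra)
        return NULL;
    return t->extra[deviceid - 1];
}
```

In the patch, the chosen context travels in the new m_ctx field of
struct erofs_map_dev, next to the block-device fields it replaces in
blob mode.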

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/erofs/data.c     |  3 +++
 fs/erofs/fscache.c  | 16 +++++++++++++---
 fs/erofs/internal.h |  1 +
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 1bff99576883..c5ccf55c050c 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -200,6 +200,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
 	map->m_bdev = sb->s_bdev;
 	map->m_daxdev = EROFS_SB(sb)->dax_dev;
 	map->m_dax_part_off = EROFS_SB(sb)->dax_part_off;
+	map->m_ctx = EROFS_SB(sb)->bootstrap;
 
 	if (map->m_deviceid) {
 		down_read(&devs->rwsem);
@@ -211,6 +212,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
 		map->m_bdev = dif->bdev;
 		map->m_daxdev = dif->dax_dev;
 		map->m_dax_part_off = dif->dax_part_off;
+		map->m_ctx = dif->ctx;
 		up_read(&devs->rwsem);
 	} else if (devs->extra_devices) {
 		down_read(&devs->rwsem);
@@ -228,6 +230,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
 				map->m_bdev = dif->bdev;
 				map->m_daxdev = dif->dax_dev;
 				map->m_dax_part_off = dif->dax_part_off;
+				map->m_ctx = dif->ctx;
 				break;
 			}
 		}
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 254b3e72ab4d..82c52b6e077e 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -70,11 +70,21 @@ static inline int erofs_fscache_get_map(struct erofs_fscache_map *fsmap,
 					struct erofs_map_blocks *map,
 					struct super_block *sb)
 {
-	struct erofs_sb_info *sbi = EROFS_SB(sb);
+	struct erofs_map_dev mdev;
+	int ret;
+
+	mdev = (struct erofs_map_dev) {
+		.m_deviceid = map->m_deviceid,
+		.m_pa = map->m_pa,
+	};
+
+	ret = erofs_map_dev(sb, &mdev);
+	if (ret)
+		return ret;
 
-	fsmap->m_ctx  = sbi->bootstrap;
+	fsmap->m_ctx  = mdev.m_ctx;
+	fsmap->m_pa   = mdev.m_pa;
 	fsmap->m_la   = map->m_la;
-	fsmap->m_pa   = map->m_pa;
 	fsmap->m_llen = map->m_llen;
 
 	return 0;
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index d93de8b6ff44..f698bdeb88ef 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -486,6 +486,7 @@ struct erofs_map_dev {
 	struct block_device *m_bdev;
 	struct dax_device *m_daxdev;
 	u64 m_dax_part_off;
+	struct erofs_fscache_context *m_ctx;
 
 	erofs_off_t m_pa;
 	unsigned int m_deviceid;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v5 21/22] erofs: implement fscache-based data readahead
  2022-03-16 13:17 ` [PATCH v5 00/22] fscache, erofs: " Jeffle Xu
@ 2022-03-16 13:17   ` Jeffle Xu
  -1 siblings, 0 replies; 102+ messages in thread
From: Jeffle Xu @ 2022-03-16 13:17 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: gregkh, willy, linux-kernel, joseph.qi, linux-fsdevel,
	luodaowen.backend, gerry, torvalds

This patch implements fscache-based data readahead. It also registers an
individual bdi for each erofs instance to enable readahead.
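
The per-step length calculation used by the readahead loop can be
sketched as follows. This is a user-space illustration with assumed
names (struct extent, ra_step_len()); the field names m_la and m_llen
and the request offset o_la follow the patch. For the chunk-based
layout the reported extent may start before the requested offset, so
the bytes preceding o_la are subtracted first, and the result is capped
by what is still outstanding in the readahead window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One mapped extent as reported by a mapping lookup (sketch only). */
struct extent {
    uint64_t m_la;    /* logical start of the extent */
    uint64_t m_llen;  /* logical length of the extent */
};

/*
 * Bytes that one readahead step may cover for a request at logical
 * offset `o_la`. Assumes m_la <= o_la < m_la + m_llen and
 * done <= window_len.
 */
static size_t ra_step_len(const struct extent *ext, uint64_t o_la,
                          size_t window_len, size_t done)
{
    /* Part of the extent that lies at or after the requested offset. */
    uint64_t in_extent = ext->m_llen - (o_la - ext->m_la);
    /* Part of the readahead window not yet processed. */
    size_t remaining = window_len - done;

    return in_extent < remaining ? (size_t)in_extent : remaining;
}
```

The loop in the patch adds each step's result to `done` and stops once
`done` reaches the window length or a step fails.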

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/erofs/fscache.c | 153 +++++++++++++++++++++++++++++++++++++++++++++
 fs/erofs/super.c   |   4 ++
 2 files changed, 157 insertions(+)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 82c52b6e077e..913ca891deb9 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -10,6 +10,13 @@ struct erofs_fscache_map {
 	u64 m_llen;
 };
 
+struct erofs_fscahce_ra_ctx {
+	struct readahead_control *rac;
+	struct address_space *mapping;
+	loff_t start;
+	size_t len, done;
+};
+
 static struct fscache_volume *volume;
 
 /*
@@ -199,12 +206,158 @@ static int erofs_fscache_readpage(struct file *file, struct page *page)
 	return ret;
 }
 
+static inline size_t erofs_fscache_calc_len(struct erofs_fscahce_ra_ctx *ractx,
+					    struct erofs_fscache_map *fsmap)
+{
+	/*
+	 * 1) For CHUNK_BASED layout, the output m_la is rounded down to the
+	 * nearest chunk boundary, and the output m_llen actually starts from
+	 * the start of the containing chunk.
+	 * 2) For other cases, the output m_la is equal to o_la.
+	 */
+	size_t len = fsmap->m_llen - (fsmap->o_la - fsmap->m_la);
+
+	return min_t(size_t, len, ractx->len - ractx->done);
+}
+
+static inline void erofs_fscache_unlock_pages(struct readahead_control *rac,
+					      size_t len)
+{
+	while (len) {
+		struct page *page = readahead_page(rac);
+
+		SetPageUptodate(page);
+		unlock_page(page);
+		put_page(page);
+
+		len -= PAGE_SIZE;
+	}
+}
+
+static int erofs_fscache_ra_hole(struct erofs_fscahce_ra_ctx *ractx,
+				 struct erofs_fscache_map *fsmap)
+{
+	struct iov_iter iter;
+	loff_t start = ractx->start + ractx->done;
+	size_t length = erofs_fscache_calc_len(ractx, fsmap);
+
+	iov_iter_xarray(&iter, READ, &ractx->mapping->i_pages, start, length);
+	iov_iter_zero(length, &iter);
+
+	erofs_fscache_unlock_pages(ractx->rac, length);
+	return length;
+}
+
+static int erofs_fscache_ra_noinline(struct erofs_fscahce_ra_ctx *ractx,
+				     struct erofs_fscache_map *fsmap)
+{
+	struct fscache_cookie *cookie = fsmap->m_ctx->cookie;
+	loff_t start = ractx->start + ractx->done;
+	size_t length = erofs_fscache_calc_len(ractx, fsmap);
+	loff_t pstart = fsmap->m_pa + (fsmap->o_la - fsmap->m_la);
+	int ret;
+
+	ret = erofs_fscache_read_pages(cookie, ractx->mapping,
+				       start, length, pstart);
+	if (!ret) {
+		erofs_fscache_unlock_pages(ractx->rac, length);
+		ret = length;
+	}
+
+	return ret;
+}
+
+static int erofs_fscache_ra_inline(struct erofs_fscahce_ra_ctx *ractx,
+				   struct erofs_fscache_map *fsmap)
+{
+	struct page *page = readahead_page(ractx->rac);
+	int ret;
+
+	ret = erofs_fscache_readpage_inline(page, fsmap);
+	if (!ret) {
+		SetPageUptodate(page);
+		ret = PAGE_SIZE;
+	}
+
+	unlock_page(page);
+	put_page(page);
+	return ret;
+}
+
+static void erofs_fscache_readahead(struct readahead_control *rac)
+{
+	struct inode *inode = rac->mapping->host;
+	struct erofs_inode *vi = EROFS_I(inode);
+	struct super_block *sb = inode->i_sb;
+	struct erofs_fscahce_ra_ctx ractx;
+	int ret;
+
+	if (erofs_inode_is_data_compressed(vi->datalayout)) {
+		erofs_info(sb, "compressed layout not supported yet");
+		return;
+	}
+
+	if (!readahead_count(rac))
+		return;
+
+	ractx = (struct erofs_fscahce_ra_ctx) {
+		.rac = rac,
+		.mapping = rac->mapping,
+		.start = readahead_pos(rac),
+		.len = readahead_length(rac),
+	};
+
+	do {
+		struct erofs_map_blocks map;
+		struct erofs_fscache_map fsmap;
+
+		map.m_la = fsmap.o_la = ractx.start + ractx.done;
+
+		ret = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW);
+		if (ret)
+			return;
+
+		if (!(map.m_flags & EROFS_MAP_MAPPED)) {
+			/*
+			 * Two cases will hit this:
+			 * 1) EOF. Impossible in readahead routine;
+			 * 2) hole. Only CHUNK_BASED layout supports hole.
+			 */
+			fsmap.m_la   = map.m_la;
+			fsmap.m_llen = map.m_llen;
+			ret = erofs_fscache_ra_hole(&ractx, &fsmap);
+			continue;
+		}
+
+		ret = erofs_fscache_get_map(&fsmap, &map, sb);
+		if (ret)
+			return;
+
+		if (map.m_flags & EROFS_MAP_META) {
+			ret = erofs_fscache_ra_inline(&ractx, &fsmap);
+			continue;
+		}
+
+		switch (vi->datalayout) {
+		case EROFS_INODE_FLAT_PLAIN:
+		case EROFS_INODE_FLAT_INLINE:
+		case EROFS_INODE_CHUNK_BASED:
+			ret = erofs_fscache_ra_noinline(&ractx, &fsmap);
+			break;
+		default:
+			DBG_BUGON(1);
+			return;
+		}
+	} while (ret > 0 && ((ractx.done += ret) < ractx.len));
+}
+
 static const struct address_space_operations erofs_fscache_blob_aops = {
 	.readpage = erofs_fscache_readpage_blob,
 };
 
 const struct address_space_operations erofs_fscache_access_aops = {
 	.readpage = erofs_fscache_readpage,
+	.readahead = erofs_fscache_readahead,
 };
 
 struct page *erofs_fscache_read_cache_page(struct erofs_fscache_context *ctx,
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index f058a04a00c7..2942029a7049 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -616,6 +616,10 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 			return PTR_ERR(bootstrap);
 
 		sbi->bootstrap = bootstrap;
+
+		err = super_setup_bdi(sb);
+		if (err)
+			return err;
 	}
 
 	err = erofs_read_superblock(sb);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v5 22/22] erofs: add 'uuid' mount option
  2022-03-16 13:17 ` [PATCH v5 00/22] fscache, erofs: " Jeffle Xu
@ 2022-03-16 13:17   ` Jeffle Xu
  -1 siblings, 0 replies; 102+ messages in thread
From: Jeffle Xu @ 2022-03-16 13:17 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: gregkh, willy, linux-kernel, joseph.qi, linux-fsdevel,
	luodaowen.backend, gerry, torvalds

Introduce the 'uuid' mount option to enable on-demand read semantics. In
this case, erofs can be mounted from blob files instead of a blkdev.
Users can then specify the path of the bootstrap blob file containing
the complete erofs image.
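
The option handling can be sketched in user space as below. The names
struct mount_opts, set_uuid() and uses_blob_mode() are hypothetical, and
malloc()/free() stand in for the kernel allocators. The sketch shows two
points from the patch: a repeated 'uuid' option replaces the earlier
value, and blob (nodev) mode is selected exactly when a uuid was given.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the parsed mount options. */
struct mount_opts {
    char *uuid;   /* NULL when the option was not given */
};

/* Store a copy of `value`, releasing any earlier setting. Returns 0 or -1. */
static int set_uuid(struct mount_opts *opts, const char *value)
{
    size_t n = strlen(value) + 1;
    char *copy = malloc(n);

    if (!copy)
        return -1;
    memcpy(copy, value, n);

    free(opts->uuid);   /* free(NULL) is a no-op */
    opts->uuid = copy;
    return 0;
}

/* Blob mode is used if and only if a uuid was supplied. */
static int uses_blob_mode(const struct mount_opts *opts)
{
    return opts->uuid != NULL;
}
```

Unlike the patch, which frees the old string before duplicating the new
one, this sketch allocates first so that the previous value survives an
allocation failure; that ordering is a choice of the sketch, not of the
patch.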

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/erofs/super.c | 44 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 37 insertions(+), 7 deletions(-)

diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 2942029a7049..8bc4b782f9a9 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -400,6 +400,7 @@ enum {
 	Opt_dax,
 	Opt_dax_enum,
 	Opt_device,
+	Opt_uuid,
 	Opt_err
 };
 
@@ -424,6 +425,7 @@ static const struct fs_parameter_spec erofs_fs_parameters[] = {
 	fsparam_flag("dax",             Opt_dax),
 	fsparam_enum("dax",		Opt_dax_enum, erofs_dax_param_enums),
 	fsparam_string("device",	Opt_device),
+	fsparam_string("uuid",		Opt_uuid),
 	{}
 };
 
@@ -519,6 +521,12 @@ static int erofs_fc_parse_param(struct fs_context *fc,
 		}
 		++ctx->devs->extra_devices;
 		break;
+	case Opt_uuid:
+		kfree(ctx->opt.uuid);
+		ctx->opt.uuid = kstrdup(param->string, GFP_KERNEL);
+		if (!ctx->opt.uuid)
+			return -ENOMEM;
+		break;
 	default:
 		return -ENOPARAM;
 	}
@@ -593,9 +601,14 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 
 	sb->s_magic = EROFS_SUPER_MAGIC;
 
-	if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) {
-		erofs_err(sb, "failed to set erofs blksize");
-		return -EINVAL;
+	if (erofs_bdev_mode(sb)) {
+		if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) {
+			erofs_err(sb, "failed to set erofs blksize");
+			return -EINVAL;
+		}
+	} else {
+		sb->s_blocksize = EROFS_BLKSIZ;
+		sb->s_blocksize_bits = LOG_BLOCK_SIZE;
 	}
 
 	sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
@@ -604,11 +617,12 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 
 	sb->s_fs_info = sbi;
 	sbi->opt = ctx->opt;
-	sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev, &sbi->dax_part_off);
 	sbi->devs = ctx->devs;
 	ctx->devs = NULL;
 
-	if (!erofs_bdev_mode(sb)) {
+	if (erofs_bdev_mode(sb)) {
+		sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev, &sbi->dax_part_off);
+	} else {
 		struct erofs_fscache_context *bootstrap;
 
 		bootstrap = erofs_fscache_get_ctx(sb, ctx->opt.uuid, true);
@@ -620,6 +634,8 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 		err = super_setup_bdi(sb);
 		if (err)
 			return err;
+
+		sbi->dax_dev = NULL;
 	}
 
 	err = erofs_read_superblock(sb);
@@ -682,6 +698,11 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 
 static int erofs_fc_get_tree(struct fs_context *fc)
 {
+	struct erofs_fs_context *ctx = fc->fs_private;
+
+	if (ctx->opt.uuid)
+		return get_tree_nodev(fc, erofs_fc_fill_super);
+
 	return get_tree_bdev(fc, erofs_fc_fill_super);
 }
 
@@ -731,6 +752,7 @@ static void erofs_fc_free(struct fs_context *fc)
 	struct erofs_fs_context *ctx = fc->fs_private;
 
 	erofs_free_dev_context(ctx->devs);
+	kfree(ctx->opt.uuid);
 	kfree(ctx);
 }
 
@@ -771,7 +793,10 @@ static void erofs_kill_sb(struct super_block *sb)
 
 	WARN_ON(sb->s_magic != EROFS_SUPER_MAGIC);
 
-	kill_block_super(sb);
+	if (erofs_bdev_mode(sb))
+		kill_block_super(sb);
+	else
+		generic_shutdown_super(sb);
 
 	sbi = EROFS_SB(sb);
 	if (!sbi)
@@ -889,7 +914,12 @@ static int erofs_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
 	struct super_block *sb = dentry->d_sb;
 	struct erofs_sb_info *sbi = EROFS_SB(sb);
-	u64 id = huge_encode_dev(sb->s_bdev->bd_dev);
+	u64 id;
+
+	if (erofs_bdev_mode(sb))
+		id = huge_encode_dev(sb->s_bdev->bd_dev);
+	else
+		id = 0; /* TODO */
 
 	buf->f_type = sb->s_magic;
 	buf->f_bsize = EROFS_BLKSIZ;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v5 22/22] erofs: add 'uuid' mount option
@ 2022-03-16 13:17   ` Jeffle Xu
  0 siblings, 0 replies; 102+ messages in thread
From: Jeffle Xu @ 2022-03-16 13:17 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: torvalds, gregkh, willy, linux-fsdevel, joseph.qi, bo.liu,
	tao.peng, gerry, eguan, linux-kernel, luodaowen.backend

Introduce 'uuid' mount option to enable on-demand read sementics. In
this case, erofs could be mounted from blob files instead of blkdev.
By then users could specify the path of bootstrap blob file containing
the complete erofs image.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/erofs/super.c | 44 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 37 insertions(+), 7 deletions(-)

diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 2942029a7049..8bc4b782f9a9 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -400,6 +400,7 @@ enum {
 	Opt_dax,
 	Opt_dax_enum,
 	Opt_device,
+	Opt_uuid,
 	Opt_err
 };
 
@@ -424,6 +425,7 @@ static const struct fs_parameter_spec erofs_fs_parameters[] = {
 	fsparam_flag("dax",             Opt_dax),
 	fsparam_enum("dax",		Opt_dax_enum, erofs_dax_param_enums),
 	fsparam_string("device",	Opt_device),
+	fsparam_string("uuid",		Opt_uuid),
 	{}
 };
 
@@ -519,6 +521,12 @@ static int erofs_fc_parse_param(struct fs_context *fc,
 		}
 		++ctx->devs->extra_devices;
 		break;
+	case Opt_uuid:
+		kfree(ctx->opt.uuid);
+		ctx->opt.uuid = kstrdup(param->string, GFP_KERNEL);
+		if (!ctx->opt.uuid)
+			return -ENOMEM;
+		break;
 	default:
 		return -ENOPARAM;
 	}
@@ -593,9 +601,14 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 
 	sb->s_magic = EROFS_SUPER_MAGIC;
 
-	if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) {
-		erofs_err(sb, "failed to set erofs blksize");
-		return -EINVAL;
+	if (erofs_bdev_mode(sb)) {
+		if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) {
+			erofs_err(sb, "failed to set erofs blksize");
+			return -EINVAL;
+		}
+	} else {
+		sb->s_blocksize = EROFS_BLKSIZ;
+		sb->s_blocksize_bits = LOG_BLOCK_SIZE;
 	}
 
 	sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
@@ -604,11 +617,12 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 
 	sb->s_fs_info = sbi;
 	sbi->opt = ctx->opt;
-	sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev, &sbi->dax_part_off);
 	sbi->devs = ctx->devs;
 	ctx->devs = NULL;
 
-	if (!erofs_bdev_mode(sb)) {
+	if (erofs_bdev_mode(sb)) {
+		sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev, &sbi->dax_part_off);
+	} else {
 		struct erofs_fscache_context *bootstrap;
 
 		bootstrap = erofs_fscache_get_ctx(sb, ctx->opt.uuid, true);
@@ -620,6 +634,8 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 		err = super_setup_bdi(sb);
 		if (err)
 			return err;
+
+		sbi->dax_dev = NULL;
 	}
 
 	err = erofs_read_superblock(sb);
@@ -682,6 +698,11 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 
 static int erofs_fc_get_tree(struct fs_context *fc)
 {
+	struct erofs_fs_context *ctx = fc->fs_private;
+
+	if (ctx->opt.uuid)
+		return get_tree_nodev(fc, erofs_fc_fill_super);
+
 	return get_tree_bdev(fc, erofs_fc_fill_super);
 }
 
@@ -731,6 +752,7 @@ static void erofs_fc_free(struct fs_context *fc)
 	struct erofs_fs_context *ctx = fc->fs_private;
 
 	erofs_free_dev_context(ctx->devs);
+	kfree(ctx->opt.uuid);
 	kfree(ctx);
 }
 
@@ -771,7 +793,10 @@ static void erofs_kill_sb(struct super_block *sb)
 
 	WARN_ON(sb->s_magic != EROFS_SUPER_MAGIC);
 
-	kill_block_super(sb);
+	if (erofs_bdev_mode(sb))
+		kill_block_super(sb);
+	else
+		generic_shutdown_super(sb);
 
 	sbi = EROFS_SB(sb);
 	if (!sbi)
@@ -889,7 +914,12 @@ static int erofs_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
 	struct super_block *sb = dentry->d_sb;
 	struct erofs_sb_info *sbi = EROFS_SB(sb);
-	u64 id = huge_encode_dev(sb->s_bdev->bd_dev);
+	u64 id;
+
+	if (erofs_bdev_mode(sb))
+		id = huge_encode_dev(sb->s_bdev->bd_dev);
+	else
+		id = 0; /* TODO */
 
 	buf->f_type = sb->s_magic;
 	buf->f_bsize = EROFS_BLKSIZ;
-- 
2.27.0
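
The `Opt_uuid` case in the diff above frees any earlier value before duplicating the new string, so a repeated `uuid=` option replaces the previous one, and `erofs_fc_free()` releases the final copy. Below is a minimal userspace sketch of that ownership pattern. The names `demo_opts`, `demo_set_uuid()` and `demo_free_opts()` are hypothetical, and `malloc()`/`free()` stand in for the kernel's `kstrdup()`/`kfree()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the mount options kept in the fs context. */
struct demo_opts {
	char *uuid;	/* NULL until a uuid= option has been parsed */
};

/* Store a private copy of @value; a repeated option replaces the old copy. */
static int demo_set_uuid(struct demo_opts *opts, const char *value)
{
	size_t n;

	assert(value != NULL);
	n = strlen(value) + 1;

	free(opts->uuid);		/* free(NULL) is a no-op, like kfree(NULL) */
	opts->uuid = malloc(n);
	if (!opts->uuid)
		return -ENOMEM;
	memcpy(opts->uuid, value, n);
	return 0;
}

/* Counterpart of the kfree() that the patch adds to erofs_fc_free(). */
static void demo_free_opts(struct demo_opts *opts)
{
	free(opts->uuid);
	opts->uuid = NULL;
}
```

Calling `demo_set_uuid()` twice leaves only the second string allocated, which is the "last option wins" behaviour the free-then-duplicate sequence gives.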


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH v5 04/22] cachefiles: notify user daemon with anon_fd when looking up cookie
  2022-03-16 13:17   ` Jeffle Xu
@ 2022-03-16 19:37     ` kernel test robot
  -1 siblings, 0 replies; 102+ messages in thread
From: kernel test robot @ 2022-03-16 19:37 UTC (permalink / raw)
  To: Jeffle Xu, dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: kbuild-all, torvalds, gregkh, willy, linux-fsdevel, joseph.qi,
	bo.liu, tao.peng, gerry, eguan, linux-kernel, luodaowen.backend

Hi Jeffle,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on trondmy-nfs/linux-next]
[also build test ERROR on rostedt-trace/for-next linus/master v5.17-rc8]
[cannot apply to xiang-erofs/dev-test dhowells-fs/fscache-next next-20220316]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Jeffle-Xu/fscache-erofs-fscache-based-on-demand-read-semantics/20220316-214711
base:   git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
config: ia64-randconfig-r033-20220317 (https://download.01.org/0day-ci/archive/20220317/202203170323.idYrKxCZ-lkp@intel.com/config)
compiler: ia64-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/ef29cbdc09ec1e6ab918eaf5a16fa7ba8d23fb54
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jeffle-Xu/fscache-erofs-fscache-based-on-demand-read-semantics/20220316-214711
        git checkout ef29cbdc09ec1e6ab918eaf5a16fa7ba8d23fb54
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=ia64 SHELL=/bin/bash fs/cachefiles/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from arch/ia64/include/asm/pgtable.h:153,
                    from include/linux/pgtable.h:6,
                    from arch/ia64/include/asm/uaccess.h:40,
                    from include/linux/uaccess.h:11,
                    from include/linux/sched/task.h:11,
                    from include/linux/sched/signal.h:9,
                    from include/linux/rcuwait.h:6,
                    from include/linux/percpu-rwsem.h:7,
                    from include/linux/fs.h:33,
                    from fs/cachefiles/daemon.c:13:
   arch/ia64/include/asm/mmu_context.h: In function 'reload_context':
   arch/ia64/include/asm/mmu_context.h:127:48: warning: variable 'old_rr4' set but not used [-Wunused-but-set-variable]
     127 |         unsigned long rr0, rr1, rr2, rr3, rr4, old_rr4;
         |                                                ^~~~~~~
   fs/cachefiles/daemon.c: In function 'cachefiles_ondemand_fd_write_iter':
>> fs/cachefiles/daemon.c:160:26: error: invalid use of undefined type 'struct iov_iter'
     160 |         size_t len = iter->count;
         |                          ^~


vim +160 fs/cachefiles/daemon.c

   153	
   154	static ssize_t cachefiles_ondemand_fd_write_iter(struct kiocb *kiocb,
   155							 struct iov_iter *iter)
   156	{
   157		struct cachefiles_object *object = kiocb->ki_filp->private_data;
   158		struct cachefiles_cache *cache = object->volume->cache;
   159		struct file *file = object->file;
 > 160		size_t len = iter->count;
   161		loff_t pos = kiocb->ki_pos;
   162		const struct cred *saved_cred;
   163		int ret;
   164	
   165		if (!file)
   166			return -ENOBUFS;
   167	
   168		cachefiles_begin_secure(cache, &saved_cred);
   169		ret = __cachefiles_prepare_write(object, file, &pos, &len, true);
   170		cachefiles_end_secure(cache, saved_cred);
   171		if (ret < 0)
   172			return ret;
   173	
   174		ret = __cachefiles_write(object, file, pos, iter, NULL, NULL);
   175		if (!ret)
   176			ret = len;
   177	
   178		return ret;
   179	}
   180	
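
The error at line 160 says that, in this configuration, the compiler has only seen a forward declaration of `struct iov_iter` when it reaches `iter->count`. In the kernel the full definition lives in `<linux/uio.h>`, so including that header directly in fs/cachefiles/daemon.c is a likely fix; that is an assumption, since the thread up to this point does not confirm it. The userspace sketch below shows the same situation with a hypothetical `struct demo_iter`.

```c
#include <assert.h>
#include <stddef.h>

/* Forward declaration: enough to declare pointers, not to read members. */
struct demo_iter;

/* Fine with the incomplete type, because only a pointer is involved. */
size_t demo_iter_len(const struct demo_iter *iter);

/*
 * Full definition. In the kernel this role is played by <linux/uio.h>.
 * Without it, the member access below fails to compile with the same
 * class of "undefined/incomplete type" error as in the report.
 */
struct demo_iter {
	size_t count;
};

size_t demo_iter_len(const struct demo_iter *iter)
{
	assert(iter != NULL);
	return iter->count;	/* needs the complete type */
}
```

Whether the definition is visible can depend on which headers happen to be pulled in indirectly, which would explain why only some randconfig builds fail.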

---
0-DAY CI Kernel Test Service
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v5 11/22] erofs: register global fscache volume
  2022-03-16 13:17   ` Jeffle Xu
@ 2022-03-16 21:52     ` kernel test robot
  -1 siblings, 0 replies; 102+ messages in thread
From: kernel test robot @ 2022-03-16 21:52 UTC (permalink / raw)
  To: Jeffle Xu, dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: llvm, kbuild-all, torvalds, gregkh, willy, linux-fsdevel,
	joseph.qi, bo.liu, tao.peng, gerry, eguan, linux-kernel,
	luodaowen.backend

Hi Jeffle,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on trondmy-nfs/linux-next]
[also build test ERROR on rostedt-trace/for-next linus/master v5.17-rc8]
[cannot apply to xiang-erofs/dev-test dhowells-fs/fscache-next next-20220316]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Jeffle-Xu/fscache-erofs-fscache-based-on-demand-read-semantics/20220316-214711
base:   git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
config: hexagon-randconfig-r041-20220313 (https://download.01.org/0day-ci/archive/20220317/202203170512.Se1LRa68-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project a6ec1e3d798f8eab43fb3a91028c6ab04e115fcb)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/f52882624bb750e533d0ffa591c3903f08f6d8bb
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jeffle-Xu/fscache-erofs-fscache-based-on-demand-read-semantics/20220316-214711
        git checkout f52882624bb750e533d0ffa591c3903f08f6d8bb
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> ld.lld: error: undefined symbol: __fscache_relinquish_volume
   >>> referenced by fscache.c
   >>> erofs/fscache.o:(erofs_exit_fscache) in archive fs/built-in.a
   >>> referenced by fscache.c
   >>> erofs/fscache.o:(erofs_exit_fscache) in archive fs/built-in.a
--
>> ld.lld: error: undefined symbol: __fscache_acquire_volume
   >>> referenced by fscache.c
   >>> erofs/fscache.o:(erofs_init_fscache) in archive fs/built-in.a
   >>> referenced by fscache.c
   >>> erofs/fscache.o:(erofs_init_fscache) in archive fs/built-in.a

---
0-DAY CI Kernel Test Service
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v5 11/22] erofs: register global fscache volume
  2022-03-16 13:17   ` Jeffle Xu
@ 2022-03-17  1:49     ` kernel test robot
  -1 siblings, 0 replies; 102+ messages in thread
From: kernel test robot @ 2022-03-17  1:49 UTC (permalink / raw)
  To: Jeffle Xu, dhowells, linux-cachefs, xiang, chao, linux-erofs
  Cc: kbuild-all, torvalds, gregkh, willy, linux-fsdevel, joseph.qi,
	bo.liu, tao.peng, gerry, eguan, linux-kernel, luodaowen.backend

Hi Jeffle,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on trondmy-nfs/linux-next]
[also build test ERROR on rostedt-trace/for-next linus/master v5.17-rc8]
[cannot apply to xiang-erofs/dev-test dhowells-fs/fscache-next next-20220316]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Jeffle-Xu/fscache-erofs-fscache-based-on-demand-read-semantics/20220316-214711
base:   git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
config: parisc-randconfig-m031-20220317 (https://download.01.org/0day-ci/archive/20220317/202203170912.gk2sqkaK-lkp@intel.com/config)
compiler: hppa-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/f52882624bb750e533d0ffa591c3903f08f6d8bb
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jeffle-Xu/fscache-erofs-fscache-based-on-demand-read-semantics/20220316-214711
        git checkout f52882624bb750e533d0ffa591c3903f08f6d8bb
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=parisc SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   hppa-linux-ld: fs/erofs/fscache.o: in function `erofs_exit_fscache':
>> (.text+0x18): undefined reference to `__fscache_relinquish_volume'
   hppa-linux-ld: fs/erofs/fscache.o: in function `erofs_init_fscache':
>> (.init.text+0x18): undefined reference to `__fscache_acquire_volume'

---
0-DAY CI Kernel Test Service
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v5 21/22] erofs: implement fscache-based data readahead
  2022-03-16 13:17   ` Jeffle Xu
@ 2022-03-17  5:22     ` Gao Xiang
  -1 siblings, 0 replies; 102+ messages in thread
From: Gao Xiang @ 2022-03-17  5:22 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: linux-erofs, willy, linux-kernel, dhowells, joseph.qi,
	linux-cachefs, gregkh, linux-fsdevel, luodaowen.backend, gerry,
	torvalds

On Wed, Mar 16, 2022 at 09:17:22PM +0800, Jeffle Xu wrote:
> This patch implements fscache-based data readahead. It also registers an
> individual bdi for each erofs instance to enable readahead.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> ---
>  fs/erofs/fscache.c | 153 +++++++++++++++++++++++++++++++++++++++++++++
>  fs/erofs/super.c   |   4 ++
>  2 files changed, 157 insertions(+)
> 
> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
> index 82c52b6e077e..913ca891deb9 100644
> --- a/fs/erofs/fscache.c
> +++ b/fs/erofs/fscache.c
> @@ -10,6 +10,13 @@ struct erofs_fscache_map {
>  	u64 m_llen;
>  };
>  
> +struct erofs_fscahce_ra_ctx {

Typo: this should be `erofs_fscache_ra_ctx'.

> +	struct readahead_control *rac;
> +	struct address_space *mapping;
> +	loff_t start;
> +	size_t len, done;
> +};
> +
>  static struct fscache_volume *volume;
>  
>  /*
> @@ -199,12 +206,158 @@ static int erofs_fscache_readpage(struct file *file, struct page *page)
>  	return ret;
>  }
>  
> +static inline size_t erofs_fscache_calc_len(struct erofs_fscahce_ra_ctx *ractx,
> +					    struct erofs_fscache_map *fsmap)
> +{
> +	/*
> +	 * 1) For CHUNK_BASED layout, the output m_la is rounded down to the
> +	 * nearest chunk boundary, and the output m_llen actually starts from
> +	 * the start of the containing chunk.
> +	 * 2) For other cases, the output m_la is equal to o_la.
> +	 */
> +	size_t len = fsmap->m_llen - (fsmap->o_la - fsmap->m_la);
> +
> +	return min_t(size_t, len, ractx->len - ractx->done);
> +}
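
The arithmetic in erofs_fscache_calc_len() can be checked in isolation. The sketch below is a userspace illustration with hypothetical `demo_*` names. It assumes, from the quoted comment, that `o_la` is the logical offset that was requested and that `m_la`/`m_llen` describe the extent containing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the structures in the patch. */
struct demo_map {
	uint64_t o_la;		/* logical offset that was asked for */
	uint64_t m_la;		/* start of the returned extent (<= o_la) */
	uint64_t m_llen;	/* length of the extent, counted from m_la */
};

struct demo_ra_ctx {
	size_t len;		/* total bytes in the readahead window */
	size_t done;		/* bytes already handled */
};

/* Bytes that can be handled now: rest of the extent, capped by the window. */
static size_t demo_calc_len(const struct demo_ra_ctx *ractx,
			    const struct demo_map *map)
{
	size_t in_extent, in_window;

	assert(map->o_la >= map->m_la);
	assert(map->o_la - map->m_la <= map->m_llen);
	assert(ractx->done <= ractx->len);

	in_extent = (size_t)(map->m_llen - (map->o_la - map->m_la));
	in_window = ractx->len - ractx->done;
	return in_extent < in_window ? in_extent : in_window;
}
```

For a 1 MiB chunk starting at `m_la = 0` and a 16 KiB window, a request at `o_la = 4096` is capped by the window (16384 bytes), while a request at `o_la = 1044480` is capped by the end of the chunk (4096 bytes).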
> +
> +static inline void erofs_fscache_unlock_pages(struct readahead_control *rac,
> +					      size_t len)

Can we convert these to folios in advance? The conversion seems fairly
straightforward.

Otherwise I would have to convert them later, which seems unnecessary.


> +{
> +	while (len) {
> +		struct page *page = readahead_page(rac);
> +
> +		SetPageUptodate(page);
> +		unlock_page(page);
> +		put_page(page);
> +
> +		len -= PAGE_SIZE;
> +	}
> +}
> +
> +static int erofs_fscache_ra_hole(struct erofs_fscahce_ra_ctx *ractx,
> +				 struct erofs_fscache_map *fsmap)
> +{
> +	struct iov_iter iter;
> +	loff_t start = ractx->start + ractx->done;
> +	size_t length = erofs_fscache_calc_len(ractx, fsmap);
> +
> +	iov_iter_xarray(&iter, READ, &ractx->mapping->i_pages, start, length);
> +	iov_iter_zero(length, &iter);
> +
> +	erofs_fscache_unlock_pages(ractx->rac, length);
> +	return length;
> +}
> +
> +static int erofs_fscache_ra_noinline(struct erofs_fscahce_ra_ctx *ractx,
> +				     struct erofs_fscache_map *fsmap)
> +{
> +	struct fscache_cookie *cookie = fsmap->m_ctx->cookie;
> +	loff_t start = ractx->start + ractx->done;
> +	size_t length = erofs_fscache_calc_len(ractx, fsmap);
> +	loff_t pstart = fsmap->m_pa + (fsmap->o_la - fsmap->m_la);
> +	int ret;
> +
> +	ret = erofs_fscache_read_pages(cookie, ractx->mapping,
> +				       start, length, pstart);
> +	if (!ret) {
> +		erofs_fscache_unlock_pages(ractx->rac, length);
> +		ret = length;
> +	}
> +
> +	return ret;
> +}
> +
> +static int erofs_fscache_ra_inline(struct erofs_fscahce_ra_ctx *ractx,
> +				   struct erofs_fscache_map *fsmap)
> +{

We could fold this into its caller, since it has only one user.

Thanks,
Gao Xiang

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Linux-cachefs] [PATCH v5 09/22] erofs: make erofs_map_blocks() generally available
  2022-03-16 13:17   ` Jeffle Xu
@ 2022-03-17  5:35     ` Gao Xiang
  -1 siblings, 0 replies; 102+ messages in thread
From: Gao Xiang @ 2022-03-17  5:35 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: gregkh, willy, linux-kernel, dhowells, joseph.qi, linux-cachefs,
	torvalds, linux-fsdevel, luodaowen.backend, gerry, linux-erofs

On Wed, Mar 16, 2022 at 09:17:10PM +0800, Jeffle Xu wrote:
> ... so that it can be used by fs/erofs/fscache.c, which is introduced
> later in this series.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>

Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>

Thanks,
Gao Xiang


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Linux-cachefs] [PATCH v5 10/22] erofs: add mode checking helper
  2022-03-16 13:17   ` Jeffle Xu
@ 2022-03-17  5:36     ` Gao Xiang
  -1 siblings, 0 replies; 102+ messages in thread
From: Gao Xiang @ 2022-03-17  5:36 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: gregkh, willy, linux-kernel, dhowells, joseph.qi, linux-cachefs,
	torvalds, linux-fsdevel, luodaowen.backend, gerry, linux-erofs

On Wed, Mar 16, 2022 at 09:17:11PM +0800, Jeffle Xu wrote:
> Until now, erofs has been a purely blockdev-based filesystem. In other
> use cases (e.g. container images), erofs needs to run on top of files.
> 
> This patch set introduces a new nodev mode, in which erofs can be
> mounted from a bootstrap blob file containing the complete erofs
> image.
> 
> Add a helper that checks which mode erofs is working in.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> ---
>  fs/erofs/internal.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index e424293f47a2..f66af9ebda43 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -161,6 +161,11 @@ struct erofs_sb_info {
>  #define set_opt(opt, option)	((opt)->mount_opt |= EROFS_MOUNT_##option)
>  #define test_opt(opt, option)	((opt)->mount_opt & EROFS_MOUNT_##option)
>  
> +static inline bool erofs_bdev_mode(struct super_block *sb)

How about renaming it as erofs_is_nodev_mode()?
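
One thing to keep in mind with this rename: the suggested name asks the opposite question, so the return value flips and every caller has to swap its branches. A minimal userspace sketch follows, using a hypothetical `struct demo_super_block` in place of `struct super_block`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct super_block; only s_bdev matters here. */
struct demo_super_block {
	void *s_bdev;	/* NULL when the superblock has no backing block device */
};

/* Same test as the helper in the quoted patch, with an explicit comparison. */
static inline bool demo_bdev_mode(const struct demo_super_block *sb)
{
	assert(sb != NULL);
	return sb->s_bdev != NULL;
}

/* The suggested name asks the opposite question, so the result is inverted. */
static inline bool demo_is_nodev_mode(const struct demo_super_block *sb)
{
	return !demo_bdev_mode(sb);
}
```

A caller written as `if (demo_bdev_mode(sb)) A; else B;` becomes `if (demo_is_nodev_mode(sb)) B; else A;` after the rename.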

Thanks,
Gao Xiang

> +{
> +	return sb->s_bdev;
> +}
> +
>  enum {
>  	EROFS_ZIP_CACHE_DISABLED,
>  	EROFS_ZIP_CACHE_READAHEAD,
> -- 
> 2.27.0
> 
> --
> Linux-cachefs mailing list
> Linux-cachefs@redhat.com
> https://listman.redhat.com/mailman/listinfo/linux-cachefs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Linux-cachefs] [PATCH v5 17/22] erofs: implement fscache-based data read for non-inline layout
  2022-03-16 13:17   ` Jeffle Xu
@ 2022-03-17  6:18     ` Gao Xiang
  -1 siblings, 0 replies; 102+ messages in thread
From: Gao Xiang @ 2022-03-17  6:18 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: gregkh, willy, linux-kernel, dhowells, joseph.qi, linux-cachefs,
	torvalds, linux-fsdevel, luodaowen.backend, gerry, linux-erofs

On Wed, Mar 16, 2022 at 09:17:18PM +0800, Jeffle Xu wrote:
> This patch implements the data plane of reading data from bootstrap blob
> file over fscache for non-inline layout.
> 
> Note that the compressed layout is not supported yet.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> ---
>  fs/erofs/fscache.c  | 94 +++++++++++++++++++++++++++++++++++++++++++++
>  fs/erofs/inode.c    |  6 ++-
>  fs/erofs/internal.h |  1 +
>  3 files changed, 100 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
> index 654414aa87ad..df56562f33c4 100644
> --- a/fs/erofs/fscache.c
> +++ b/fs/erofs/fscache.c
> @@ -4,6 +4,12 @@
>   */
>  #include "internal.h"
>  
> +struct erofs_fscache_map {
> +	struct erofs_fscache_context *m_ctx;
> +	erofs_off_t m_pa, m_la, o_la;
> +	u64 m_llen;

Can we directly use "struct erofs_map_blocks map"?
So "erofs_fscache_get_map" can be avoided then.

Thanks,
Gao Xiang

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Linux-cachefs] [PATCH v5 10/22] erofs: add mode checking helper
  2022-03-17  5:36     ` Gao Xiang
  (?)
@ 2022-03-18  5:26     ` JeffleXu
  -1 siblings, 0 replies; 102+ messages in thread
From: JeffleXu @ 2022-03-18  5:26 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs, gregkh,
	tao.peng, willy, linux-kernel, joseph.qi, bo.liu, linux-fsdevel,
	luodaowen.backend, eguan, gerry, torvalds



On 3/17/22 1:36 PM, Gao Xiang wrote:
> On Wed, Mar 16, 2022 at 09:17:11PM +0800, Jeffle Xu wrote:
>> Until now erofs has been a strictly blockdev-based filesystem. In other
>> usage scenarios (e.g. container images), erofs needs to run on files.
>>
>> This patch set introduces a new nodev mode, in which erofs can be
>> mounted from a bootstrap blob file containing a complete erofs
>> image.
>>
>> Add a helper checking which mode erofs works in.
>>
>> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
>> ---
>>  fs/erofs/internal.h | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
>> index e424293f47a2..f66af9ebda43 100644
>> --- a/fs/erofs/internal.h
>> +++ b/fs/erofs/internal.h
>> @@ -161,6 +161,11 @@ struct erofs_sb_info {
>>  #define set_opt(opt, option)	((opt)->mount_opt |= EROFS_MOUNT_##option)
>>  #define test_opt(opt, option)	((opt)->mount_opt & EROFS_MOUNT_##option)
>>  
>> +static inline bool erofs_bdev_mode(struct super_block *sb)
> 
> How about renaming it as erofs_is_nodev_mode()?

Sure, will be renamed in the next version.
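
For reference, a minimal userspace sketch of what the renamed helper might
look like. Note that the rename flips the polarity: erofs_bdev_mode() is
true when a block device backs the mount, so a helper called
erofs_is_nodev_mode() would presumably return the negation. The struct
definitions below are stand-ins rather than the kernel ones, and the body
is an assumption, not the code from the next revision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel types; only the field used here is modelled. */
struct block_device { int unused; };
struct super_block { struct block_device *s_bdev; };

/* Assumed body: nodev mode means the mount has no backing block device. */
static inline bool erofs_is_nodev_mode(const struct super_block *sb)
{
	assert(sb != NULL);
	return sb->s_bdev == NULL;
}
```

Callers that used to test erofs_bdev_mode(sb) would then need their
conditions inverted as well.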


-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Linux-cachefs] [PATCH v5 17/22] erofs: implement fscache-based data read for non-inline layout
  2022-03-17  6:18     ` Gao Xiang
  (?)
@ 2022-03-18  5:29     ` JeffleXu
  -1 siblings, 0 replies; 102+ messages in thread
From: JeffleXu @ 2022-03-18  5:29 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs, gregkh,
	tao.peng, willy, linux-kernel, joseph.qi, bo.liu, linux-fsdevel,
	luodaowen.backend, eguan, gerry, torvalds



On 3/17/22 2:18 PM, Gao Xiang wrote:
> On Wed, Mar 16, 2022 at 09:17:18PM +0800, Jeffle Xu wrote:
>> This patch implements the data plane of reading data from bootstrap blob
>> file over fscache for non-inline layout.
>>
>> Note that the compressed layout is not supported yet.
>>
>> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
>> ---
>>  fs/erofs/fscache.c  | 94 +++++++++++++++++++++++++++++++++++++++++++++
>>  fs/erofs/inode.c    |  6 ++-
>>  fs/erofs/internal.h |  1 +
>>  3 files changed, 100 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
>> index 654414aa87ad..df56562f33c4 100644
>> --- a/fs/erofs/fscache.c
>> +++ b/fs/erofs/fscache.c
>> @@ -4,6 +4,12 @@
>>   */
>>  #include "internal.h"
>>  
>> +struct erofs_fscache_map {
>> +	struct erofs_fscache_context *m_ctx;
>> +	erofs_off_t m_pa, m_la, o_la;
>> +	u64 m_llen;
> 
> Can we directly use "struct erofs_map_blocks map"?
> So "erofs_fscache_get_map" can be avoided then.

OK, the extra fields will be folded into "struct erofs_map_blocks map".

-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v5 21/22] erofs: implement fscache-based data readahead
  2022-03-17  5:22     ` Gao Xiang
  (?)
@ 2022-03-18  5:41     ` JeffleXu
  -1 siblings, 0 replies; 102+ messages in thread
From: JeffleXu @ 2022-03-18  5:41 UTC (permalink / raw)
  To: dhowells, linux-cachefs, xiang, chao, linux-erofs, torvalds,
	gregkh, willy, linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry,
	eguan, linux-kernel, luodaowen.backend



On 3/17/22 1:22 PM, Gao Xiang wrote:
> On Wed, Mar 16, 2022 at 09:17:22PM +0800, Jeffle Xu wrote:
>> This patch implements fscache-based data readahead. Also registers an
>> individual bdi for each erofs instance to enable readahead.
>>
>> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
>> ---
>>  fs/erofs/fscache.c | 153 +++++++++++++++++++++++++++++++++++++++++++++
>>  fs/erofs/super.c   |   4 ++
>>  2 files changed, 157 insertions(+)
>>
>> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
>> index 82c52b6e077e..913ca891deb9 100644
>> --- a/fs/erofs/fscache.c
>> +++ b/fs/erofs/fscache.c
>> @@ -10,6 +10,13 @@ struct erofs_fscache_map {
>>  	u64 m_llen;
>>  };
>>  
>> +struct erofs_fscahce_ra_ctx {
> 
> typo,  should be `erofs_fscache_ra_ctx'

Oops. Thanks.


> 
>> +	struct readahead_control *rac;
>> +	struct address_space *mapping;
>> +	loff_t start;
>> +	size_t len, done;
>> +};
>> +
>>  static struct fscache_volume *volume;
>>  
>>  /*
>> @@ -199,12 +206,158 @@ static int erofs_fscache_readpage(struct file *file, struct page *page)
>>  	return ret;
>>  }
>>  
>> +static inline size_t erofs_fscache_calc_len(struct erofs_fscahce_ra_ctx *ractx,
>> +					    struct erofs_fscache_map *fsmap)
>> +{
>> +	/*
>> +	 * 1) For CHUNK_BASED layout, the output m_la is rounded down to the
>> +	 * nearest chunk boundary, and the output m_llen actually starts from
>> +	 * the start of the containing chunk.
>> +	 * 2) For other cases, the output m_la is equal to o_la.
>> +	 */
>> +	size_t len = fsmap->m_llen - (fsmap->o_la - fsmap->m_la);
>> +
>> +	return min_t(size_t, len, ractx->len - ractx->done);
>> +}
>> +
>> +static inline void erofs_fscache_unlock_pages(struct readahead_control *rac,
>> +					      size_t len)
> 
> Can we convert them into folios in advance? it seems much
> straight-forward to convert these...
> 
> Or I have to convert them later, and it seems unnecessary...

OK, I will try to use the folio API in the next version.


> 
> 
>> +{
>> +	while (len) {
>> +		struct page *page = readahead_page(rac);
>> +
>> +		SetPageUptodate(page);
>> +		unlock_page(page);
>> +		put_page(page);
>> +
>> +		len -= PAGE_SIZE;
>> +	}
>> +}
>> +
>> +static int erofs_fscache_ra_hole(struct erofs_fscahce_ra_ctx *ractx,
>> +				 struct erofs_fscache_map *fsmap)
>> +{
>> +	struct iov_iter iter;
>> +	loff_t start = ractx->start + ractx->done;
>> +	size_t length = erofs_fscache_calc_len(ractx, fsmap);
>> +
>> +	iov_iter_xarray(&iter, READ, &ractx->mapping->i_pages, start, length);
>> +	iov_iter_zero(length, &iter);
>> +
>> +	erofs_fscache_unlock_pages(ractx->rac, length);
>> +	return length;
>> +}
>> +
>> +static int erofs_fscache_ra_noinline(struct erofs_fscahce_ra_ctx *ractx,
>> +				     struct erofs_fscache_map *fsmap)
>> +{
>> +	struct fscache_cookie *cookie = fsmap->m_ctx->cookie;
>> +	loff_t start = ractx->start + ractx->done;
>> +	size_t length = erofs_fscache_calc_len(ractx, fsmap);
>> +	loff_t pstart = fsmap->m_pa + (fsmap->o_la - fsmap->m_la);
>> +	int ret;
>> +
>> +	ret = erofs_fscache_read_pages(cookie, ractx->mapping,
>> +				       start, length, pstart);
>> +	if (!ret) {
>> +		erofs_fscache_unlock_pages(ractx->rac, length);
>> +		ret = length;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>> +static int erofs_fscache_ra_inline(struct erofs_fscahce_ra_ctx *ractx,
>> +				   struct erofs_fscache_map *fsmap)
>> +{
> 
> We could fold in this, since it has the only user.

OK, and "struct erofs_fscahce_ra_ctx" is not needed then.
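
The length calculation in the quoted erofs_fscache_calc_len() can be
checked with a small userspace sketch. The struct layouts and the sample
numbers are illustrative assumptions; only the arithmetic mirrors the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the structures in the quoted patch. */
struct fsmap {
	uint64_t m_la;   /* start of the mapped extent (chunk-aligned for CHUNK_BASED) */
	uint64_t o_la;   /* logical offset that was originally requested */
	uint64_t m_llen; /* mapped length, counted from m_la */
};

struct ra_ctx {
	size_t len;  /* total readahead length */
	size_t done; /* bytes already handled */
};

/* Bytes usable from o_la to the end of the extent, capped by what is left. */
static size_t calc_len(const struct ra_ctx *ractx, const struct fsmap *map)
{
	assert(map->o_la >= map->m_la);
	assert(map->m_llen >= map->o_la - map->m_la);
	assert(ractx->len >= ractx->done);

	uint64_t in_extent = map->m_llen - (map->o_la - map->m_la);
	size_t remaining = ractx->len - ractx->done;

	return in_extent < remaining ? (size_t)in_extent : remaining;
}
```

For example, with a 1 MiB chunk mapped at m_la = 0x100000 and a request at
o_la = 0x101000, 0xff000 bytes remain in the chunk, so a readahead window
with 0x4000 bytes left is capped at 0x4000.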

-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v5 03/22] cachefiles: introduce on-demand read mode
  2022-03-16 13:17 ` [PATCH v5 00/22] fscache, erofs: " Jeffle Xu
@ 2022-03-21 13:34   ` David Howells
  -1 siblings, 0 replies; 102+ messages in thread
From: David Howells @ 2022-03-21 13:34 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: dhowells, linux-cachefs, xiang, chao, linux-erofs, torvalds,
	gregkh, willy, linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry,
	eguan, linux-kernel, luodaowen.backend

Jeffle Xu <jefflexu@linux.alibaba.com> wrote:

> Fscache/cachefiles used to serve as a local cache for remote fs. This
> patch, along with the following patches, introduces a new on-demand read
> mode for cachefiles, which can boost the scenario where on-demand read
> semantics is needed, e.g. container image distribution.
> 
> The essential difference between the original mode and on-demand read
> mode is that, in the original mode, on a cache miss, netfs itself
> fetches the data from the remote and then writes it into the cache file,
> whereas in on-demand read mode a user daemon is responsible for fetching
> the data and writing it to the cache file.
> 
> This patch only adds the command to enable on-demand read mode. An optional
> parameter to "bind" command is added. On-demand mode will be turned on when
> this optional argument matches "ondemand" exactly, i.e.  "bind
> ondemand". Otherwise cachefiles will keep working in the original mode.

You're not really adding a command, per se.  Also, I would recommend
starting the paragraph with a verb.  How about:

	Make it possible to enable on-demand read mode by adding an
	optional parameter to the "bind" command.  On-demand mode will be
	turned on when this parameter is "ondemand", i.e. "bind ondemand".
	Otherwise cachefiles will work in the original mode.

Also, I'd add a note something like the following:

	This is implemented as a variation on the bind command so that it
	can't be turned on accidentally in /etc/cachefilesd.conf when
	cachefilesd isn't expecting it.	

> The following patches will implement the data plane of on-demand read
> mode.

I would remove this line.  If ondemand mode is not fully implemented in
cachefiles at this point, I would be tempted to move this to the end of the
cachefiles subset of the patchset.  That said, I'm not sure it can be made
to do anything much before that point.

> +#ifdef CONFIG_CACHEFILES_ONDEMAND
> +static inline void cachefiles_ondemand_open(struct cachefiles_cache *cache)
> +{
> +	xa_init_flags(&cache->reqs, XA_FLAGS_ALLOC);
> +	rwlock_init(&cache->reqs_lock);
> +}

Just merge that into the caller.

> +static inline void cachefiles_ondemand_release(struct cachefiles_cache *cache)
> +{
> +	xa_destroy(&cache->reqs);
> +}

Ditto.

> +static inline
> +bool cachefiles_ondemand_daemon_bind(struct cachefiles_cache *cache, char *args)
> +{
> +	if (!strcmp(args, "ondemand")) {
> +		set_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags);
> +		return true;
> +	}
> +
> +	return false;
> +}
> ...
> +	if (!cachefiles_ondemand_daemon_bind(cache, args) && *args) {
> +		pr_err("'bind' command doesn't take an argument\n");
> +		return -EINVAL;
> +	}
> +

I would merge these together, I think, and say something like "Ondemand
mode not enabled in kernel" if CONFIG_CACHEFILES_ONDEMAND=n.

David


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v5 03/22] cachefiles: introduce on-demand read mode
  2022-03-16 13:17   ` Jeffle Xu
@ 2022-03-21 13:40     ` Matthew Wilcox
  -1 siblings, 0 replies; 102+ messages in thread
From: Matthew Wilcox @ 2022-03-21 13:40 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: dhowells, linux-cachefs, xiang, chao, linux-erofs, torvalds,
	gregkh, linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry, eguan,
	linux-kernel, luodaowen.backend

On Wed, Mar 16, 2022 at 09:17:04PM +0800, Jeffle Xu wrote:
> +#ifdef CONFIG_CACHEFILES_ONDEMAND
> +	struct xarray			reqs;		/* xarray of pending on-demand requests */
> +	rwlock_t			reqs_lock;	/* Lock for reqs xarray */

Why do you have a separate rwlock when the xarray already has its own
spinlock?  This is usually a really bad idea.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v5 04/22] cachefiles: notify user daemon with anon_fd when looking up cookie
  2022-03-16 13:17 ` [PATCH v5 00/22] fscache, erofs: " Jeffle Xu
@ 2022-03-21 14:01   ` David Howells
  -1 siblings, 0 replies; 102+ messages in thread
From: David Howells @ 2022-03-21 14:01 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: linux-erofs, willy, linux-kernel, dhowells, joseph.qi,
	linux-cachefs, gregkh, linux-fsdevel, luodaowen.backend, gerry,
	torvalds

Jeffle Xu <jefflexu@linux.alibaba.com> wrote:

> +	read_lock(&cache->reqs_lock);
> +
> +	/* recheck dead state under lock */
> +	if (test_bit(CACHEFILES_DEAD, &cache->flags)) {
> +		read_unlock(&cache->reqs_lock);
> +		ret = -EIO;
> +		goto out;
> +	}
> +
> +	xa_lock(xa);
> +	ret = __xa_alloc(xa, &id, req, xa_limit_32b, GFP_KERNEL);

You're holding a spinlock.  You can't use GFP_KERNEL.

> +static int cachefiles_ondemand_cinit(struct cachefiles_cache *cache, char *args)
> +{
> ...
> +	tmp = kstrdup(args, GFP_KERNEL);

No need to copy the string.  The caller already did that and added a NUL for
good measure.

I would probably move most of the functions added in this patch to
fs/cachefiles/ondemand.c.

David


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v5 06/22] cachefiles: implement on-demand read
  2022-03-16 13:17 ` [PATCH v5 00/22] fscache, erofs: " Jeffle Xu
@ 2022-03-21 14:05   ` David Howells
  -1 siblings, 0 replies; 102+ messages in thread
From: David Howells @ 2022-03-21 14:05 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: linux-erofs, willy, linux-kernel, dhowells, joseph.qi,
	linux-cachefs, gregkh, linux-fsdevel, luodaowen.backend, gerry,
	torvalds

Jeffle Xu <jefflexu@linux.alibaba.com> wrote:

> +	{ "cread",	cachefiles_ondemand_cread	},

Rather than adding the cread command, would it be better to use an ioctl on
the anon fd as the /dev/cachefiles write op is serialised?

> +	/* Stop enqueuig request when daemon closes anon_fd prematurely. */

"Enqueuing"

David


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v5 03/22] cachefiles: introduce on-demand read mode
  2022-03-21 13:40     ` Matthew Wilcox
@ 2022-03-21 14:08       ` JeffleXu
  -1 siblings, 0 replies; 102+ messages in thread
From: JeffleXu @ 2022-03-21 14:08 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-erofs, linux-kernel, dhowells, joseph.qi, linux-cachefs,
	gregkh, linux-fsdevel, luodaowen.backend, gerry, torvalds



On 3/21/22 9:40 PM, Matthew Wilcox wrote:
> On Wed, Mar 16, 2022 at 09:17:04PM +0800, Jeffle Xu wrote:
>> +#ifdef CONFIG_CACHEFILES_ONDEMAND
>> +	struct xarray			reqs;		/* xarray of pending on-demand requests */
>> +	rwlock_t			reqs_lock;	/* Lock for reqs xarray */
> 
> Why do you have a separate rwlock when the xarray already has its own
> spinlock?  This is usually a really bad idea.

Hi,

Thanks for reviewing.

reqs_lock is also used to protect the check of cache->flags. Please
refer to patch 4 [1] of this patchset.

```
+	/*
+	 * Enqueue the pending request.
+	 *
+	 * Stop enqueuing the request when daemon is dying. So we need to
+	 * 1) check cache state, and 2) enqueue request if cache is alive.
+	 *
+	 * The above two ops need to be atomic as a whole. @reqs_lock is used
+	 * here to ensure that. Otherwise, request may be enqueued after xarray
+	 * has been flushed, in which case the orphan request will never be
+	 * completed and thus netfs will hang there forever.
+	 */
+	read_lock(&cache->reqs_lock);
+
+	/* recheck dead state under lock */
+	if (test_bit(CACHEFILES_DEAD, &cache->flags)) {
+		read_unlock(&cache->reqs_lock);
+		ret = -EIO;
+		goto out;
+	}
+
+	xa_lock(xa);
+	ret = __xa_alloc(xa, &id, req, xa_limit_32b, GFP_KERNEL);
+	if (!ret)
+		__xa_set_mark(xa, id, CACHEFILES_REQ_NEW);
+	xa_unlock(xa);
+
+	read_unlock(&cache->reqs_lock);
```

It's mainly used to protect against the xarray flush.

Besides, IMHO a read-write lock should perform better, since most
accesses take the read side.
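
The "check, then enqueue" requirement described above can be modelled in
userspace roughly as follows. This is only a sketch under assumptions: a
pthread mutex stands in for whichever kernel lock ends up covering both
steps, a fixed array stands in for the xarray, and all names are
hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REQS 8

/* Hypothetical userspace model of the cache; not the kernel structure. */
struct cache {
	pthread_mutex_t lock;  /* covers both the state check and the enqueue */
	bool dead;             /* models the CACHEFILES_DEAD flag */
	void *reqs[MAX_REQS];  /* models the xarray of pending requests */
};

/* Enqueue req; returns the slot ID, or a negative errno value. */
static int enqueue_req(struct cache *c, void *req)
{
	int id = -ENOSPC;

	pthread_mutex_lock(&c->lock);
	if (c->dead) {
		/* The flush has run or is about to: refuse new requests. */
		id = -EIO;
	} else {
		for (int i = 0; i < MAX_REQS; i++) {
			if (!c->reqs[i]) {
				c->reqs[i] = req;
				id = i;
				break;
			}
		}
	}
	pthread_mutex_unlock(&c->lock);
	return id;
}

/* Mark the cache dead and drop all pending requests in one critical section. */
static int flush_reqs(struct cache *c)
{
	int flushed = 0;

	pthread_mutex_lock(&c->lock);
	c->dead = true;
	for (int i = 0; i < MAX_REQS; i++) {
		if (c->reqs[i]) {
			c->reqs[i] = NULL;
			flushed++;
		}
	}
	pthread_mutex_unlock(&c->lock);
	return flushed;
}
```

Because the flag is tested and the slot is filled under the same lock, a
request can never land in the table after the flush has emptied it. In
this model the caller allocates the request before calling enqueue_req(),
so nothing is allocated while the lock is held.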


[1] https://lkml.org/lkml/2022/3/16/351

-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v5 03/22] cachefiles: introduce on-demand read mode
  2022-03-16 13:17   ` Jeffle Xu
@ 2022-03-21 14:14     ` David Howells
  -1 siblings, 0 replies; 102+ messages in thread
From: David Howells @ 2022-03-21 14:14 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: joseph.qi, torvalds, linux-kernel, dhowells, linux-fsdevel,
	linux-cachefs, gregkh, luodaowen.backend, gerry, linux-erofs

Matthew Wilcox <willy@infradead.org> wrote:

> Why do you have a separate rwlock when the xarray already has its own
> spinlock?  This is usually a really bad idea.

Jeffle wants to hold a lock across the CACHEFILES_DEAD check and the xarray
access.

However, he tells xarray to do a GFP_KERNEL alloc whilst holding the rwlock:-/

David


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v5 03/22] cachefiles: introduce on-demand read mode
  2022-03-21 13:34   ` David Howells
@ 2022-03-21 14:16     ` JeffleXu
  -1 siblings, 0 replies; 102+ messages in thread
From: JeffleXu @ 2022-03-21 14:16 UTC (permalink / raw)
  To: David Howells
  Cc: linux-erofs, willy, linux-kernel, joseph.qi, linux-cachefs,
	gregkh, linux-fsdevel, luodaowen.backend, gerry, torvalds

Hi,

Thanks for reviewing.


On 3/21/22 9:34 PM, David Howells wrote:
> Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
> 
>> Fscache/cachefiles used to serve as a local cache for remote fs. This
>> patch, along with the following patches, introduces a new on-demand read
>> mode for cachefiles, which can boost the scenario where on-demand read
>> semantics is needed, e.g. container image distribution.
>>
>> The essential difference between the original mode and on-demand read
>> mode is that, in the original mode, on a cache miss, netfs itself
>> fetches the data from the remote and then writes it into the cache file,
>> whereas in on-demand read mode a user daemon is responsible for fetching
>> the data and writing it to the cache file.
>>
>> This patch only adds the command to enable on-demand read mode. An optional
>> parameter to "bind" command is added. On-demand mode will be turned on when
>> this optional argument matches "ondemand" exactly, i.e.  "bind
>> ondemand". Otherwise cachefiles will keep working in the original mode.
> 
> You're not really adding a command, per se.  Also, I would recommend
> starting the paragraph with a verb.  How about:
> 
> 	Make it possible to enable on-demand read mode by adding an
> 	optional parameter to the "bind" command.  On-demand mode will be
> 	turned on when this parameter is "ondemand", i.e. "bind ondemand".
> 	Otherwise cachefiles will work in the original mode.
> 
> Also, I'd add a note something like the following:
> 
> 	This is implemented as a variation on the bind command so that it
> 	can't be turned on accidentally in /etc/cachefilesd.conf when
> 	cachefilesd isn't expecting it.	

Alright, looks much better :)

> 
>> The following patches will implement the data plane of on-demand read
>> mode.
> 
> I would remove this line.  If ondemand mode is not fully implemented in
> cachefiles at this point, I would be tempted to move this to the end of the
> cachefiles subset of the patchset.  That said, I'm not sure it can be made
> to do anything much before that point.


Alright.

> 
>> +#ifdef CONFIG_CACHEFILES_ONDEMAND
>> +static inline void cachefiles_ondemand_open(struct cachefiles_cache *cache)
>> +{
>> +	xa_init_flags(&cache->reqs, XA_FLAGS_ALLOC);
>> +	rwlock_init(&cache->reqs_lock);
>> +}
> 
> Just merge that into the caller.
> 
>> +static inline void cachefiles_ondemand_release(struct cachefiles_cache *cache)
>> +{
>> +	xa_destroy(&cache->reqs);
>> +}
> 
> Ditto.
> 
>> +static inline
>> +bool cachefiles_ondemand_daemon_bind(struct cachefiles_cache *cache, char *args)
>> +{
>> +	if (!strcmp(args, "ondemand")) {
>> +		set_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags);
>> +		return true;
>> +	}
>> +
>> +	return false;
>> +}
>> ...
>> +	if (!cachefiles_ondemand_daemon_bind(cache, args) && *args) {
>> +		pr_err("'bind' command doesn't take an argument\n");
>> +		return -EINVAL;
>> +	}
>> +
> 
> I would merge these together, I think, and say something like "Ondemand
> mode not enabled in kernel" if CONFIG_CACHEFILES_ONDEMAND=n.
> 

The reason I extracted all this logic into small functions is that the
**callers** can call cachefiles_ondemand_daemon_bind() directly, without
any clause like:

```
#ifdef CONFIG_CACHEFILES_ONDEMAND
	...
#else
	...
#endif
```

Another option is:

```
if (IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND))
	...
else
	...
```
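
For illustration only, the stub-helper approach described above might look
roughly like the userspace sketch below. The struct, its flag field and the
daemon_bind() caller are simplified stand-ins rather than the code from the
patch; only the helper name and the error message follow the quoted hunk.
With the option compiled out, the helper collapses to a stub that returns
false, so the caller needs no #ifdef of its own.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the Kconfig symbol; remove this line to compile the feature out. */
#define CONFIG_CACHEFILES_ONDEMAND 1

/* Simplified stand-in for struct cachefiles_cache (assumption). */
struct cache {
	bool ondemand;
};

#ifdef CONFIG_CACHEFILES_ONDEMAND
static inline bool cachefiles_ondemand_daemon_bind(struct cache *cache,
						   const char *args)
{
	if (!strcmp(args, "ondemand")) {
		cache->ondemand = true;
		return true;
	}
	return false;
}
#else
/* Stub: with the option disabled, no argument is ever accepted. */
static inline bool cachefiles_ondemand_daemon_bind(struct cache *cache,
						   const char *args)
{
	(void)cache;
	(void)args;
	return false;
}
#endif

/* Hypothetical caller: it stays free of preprocessor conditionals. */
static int daemon_bind(struct cache *cache, const char *args)
{
	assert(cache && args);

	if (!cachefiles_ondemand_daemon_bind(cache, args) && *args) {
		fprintf(stderr, "'bind' command doesn't take an argument\n");
		return -EINVAL;
	}
	return 0;
}
```

An empty argument keeps the original mode, "ondemand" switches the flag on,
and any other argument is rejected with -EINVAL.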


-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v5 05/22] cachefiles: notify user daemon when withdrawing cookie
  2022-03-16 13:17 ` [PATCH v5 00/22] fscache, erofs: " Jeffle Xu
@ 2022-03-21 14:20   ` David Howells
  -1 siblings, 0 replies; 102+ messages in thread
From: David Howells @ 2022-03-21 14:20 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: linux-erofs, willy, linux-kernel, dhowells, joseph.qi,
	linux-cachefs, gregkh, linux-fsdevel, luodaowen.backend, gerry,
	torvalds

Jeffle Xu <jefflexu@linux.alibaba.com> wrote:

> Notify user daemon that cookie is going to be withdrawed, providing a

"withdrawn".

> +	/* CLOSE request doesn't look forward a reply */

I'm not sure what you mean.

David


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Linux-cachefs] [PATCH v5 03/22] cachefiles: introduce on-demand read mode
  2022-03-21 14:14     ` David Howells
@ 2022-03-21 14:20       ` Gao Xiang
  -1 siblings, 0 replies; 102+ messages in thread
From: Gao Xiang @ 2022-03-21 14:20 UTC (permalink / raw)
  To: David Howells
  Cc: gregkh, linux-erofs, Matthew Wilcox, linux-kernel, joseph.qi,
	linux-cachefs, linux-fsdevel, luodaowen.backend, gerry, torvalds

On Mon, Mar 21, 2022 at 02:14:03PM +0000, David Howells wrote:
> Matthew Wilcox <willy@infradead.org> wrote:
> 
> > Why do you have a separate rwlock when the xarray already has its own
> > spinlock?  This is usually a really bad idea.
> 
> Jeffle wants to hold a lock across the CACHEFILES_DEAD check and the xarray
> access.
> 
> However, he tells xarray to do a GFP_KERNEL alloc whilst holding the rwlock:-/

Yeah, sorry, there are trivial mistakes due to sleep in atomic
contexts (sorry that I didn't catch them earlier..)

Thanks,
Gao Xiang

> 
> David
> --
> Linux-cachefs mailing list
> Linux-cachefs@redhat.com
> https://listman.redhat.com/mailman/listinfo/linux-cachefs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v5 03/22] cachefiles: introduce on-demand read mode
  2022-03-21 14:08       ` JeffleXu
@ 2022-03-21 14:26         ` Matthew Wilcox
  -1 siblings, 0 replies; 102+ messages in thread
From: Matthew Wilcox @ 2022-03-21 14:26 UTC (permalink / raw)
  To: JeffleXu
  Cc: linux-erofs, linux-kernel, dhowells, joseph.qi, linux-cachefs,
	gregkh, linux-fsdevel, luodaowen.backend, gerry, torvalds

On Mon, Mar 21, 2022 at 10:08:47PM +0800, JeffleXu wrote:
> reqs_lock is also used to protect the check of cache->flags. Please
> refer to patch 4 [1] of this patchset.

Yes, that's exactly what I meant by "bad idea".

> ```
> +	/*
> +	 * Enqueue the pending request.
> +	 *
> +	 * Stop enqueuing the request when daemon is dying. So we need to
> +	 * 1) check cache state, and 2) enqueue request if cache is alive.
> +	 *
> +	 * The above two ops need to be atomic as a whole. @reqs_lock is used
> +	 * here to ensure that. Otherwise, request may be enqueued after xarray
> +	 * has been flushed, in which case the orphan request will never be
> +	 * completed and thus netfs will hang there forever.
> +	 */
> +	read_lock(&cache->reqs_lock);
> +
> +	/* recheck dead state under lock */
> +	if (test_bit(CACHEFILES_DEAD, &cache->flags)) {
> +		read_unlock(&cache->reqs_lock);
> +		ret = -EIO;
> +		goto out;
> +	}

So this is an error path.  We're almost always going to take the xa_lock
immediately after taking the read_lock.  In other words, you've done two
atomic operations instead of one.

> +	xa_lock(xa);
> +	ret = __xa_alloc(xa, &id, req, xa_limit_32b, GFP_KERNEL);
> +	if (!ret)
> +		__xa_set_mark(xa, id, CACHEFILES_REQ_NEW);
> +	xa_unlock(xa);
> +
> +	read_unlock(&cache->reqs_lock);
> ```
> 
> It's mainly used to protect against the xarray flush.
> 
> Besides, IMHO read-write lock shall be more performance friendly, since
> most cases are the read side.

That's almost never true.  rwlocks are usually a bad idea because you
still have to bounce the cacheline, so you replace lock contention
(which you can see) with cacheline contention (which is harder to
measure).  And then you have questions about reader/writer fairness
(should new readers queue behind a writer if there's one waiting, or
should a steady stream of readers be able to hold a writer off
indefinitely?)

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v5 05/22] cachefiles: notify user daemon when withdrawing cookie
  2022-03-21 14:20   ` David Howells
@ 2022-03-21 14:31     ` JeffleXu
  -1 siblings, 0 replies; 102+ messages in thread
From: JeffleXu @ 2022-03-21 14:31 UTC (permalink / raw)
  To: David Howells
  Cc: linux-erofs, willy, linux-kernel, joseph.qi, linux-cachefs,
	gregkh, linux-fsdevel, luodaowen.backend, gerry, torvalds



On 3/21/22 10:20 PM, David Howells wrote:
> Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
> 
>> Notify user daemon that cookie is going to be withdrawed, providing a
> 
> "withdrawn".

Thanks.

> 
>> +	/* CLOSE request doesn't look forward a reply */
> 
> I'm not sure what you mean.

When a cookie is withdrawn, cachefiles sends a CLOSE request to the user
daemon to tell it that the associated anon_fd can be closed. This is only
a hint: the user daemon may keep the anon_fd open after receiving the
CLOSE request. Having sent the request, cachefiles carries on withdrawing
the cookie and does not wait synchronously for a reply, so a CLOSE
request never needs to be replied to.
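
As a rough sketch of the reply rules described here and in the cover
letter: a READ request must be answered with cread, while a CLOSE request
is only a hint and is never waited on. The enum, its values and the helper
names below are hypothetical and do not reflect the actual uapi layout;
that OPEN is answered with cinit is my reading of patch 4, not something
stated outright.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcodes; the names mirror the series, the values are made up. */
enum ondemand_op {
	OP_OPEN,
	OP_CLOSE,
	OP_READ,
};

/*
 * Name of the command the daemon is expected to answer with, or NULL if
 * the request is only a hint that nobody waits on.
 */
static const char *reply_command(enum ondemand_op op)
{
	switch (op) {
	case OP_OPEN:
		return "cinit";	/* assumption: OPEN is answered by cinit */
	case OP_READ:
		return "cread";	/* mandatory, per the cover letter */
	case OP_CLOSE:
		return NULL;	/* hint only, no reply expected */
	}
	assert(!"unknown opcode");
	return NULL;
}

static bool needs_reply(enum ondemand_op op)
{
	return reply_command(op) != NULL;
}
```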


-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v5 04/22] cachefiles: notify user daemon with anon_fd when looking up cookie
  2022-03-21 14:01   ` David Howells
@ 2022-03-21 14:43     ` JeffleXu
  -1 siblings, 0 replies; 102+ messages in thread
From: JeffleXu @ 2022-03-21 14:43 UTC (permalink / raw)
  To: David Howells
  Cc: linux-cachefs, xiang, chao, linux-erofs, torvalds, gregkh, willy,
	linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry, eguan,
	linux-kernel, luodaowen.backend



On 3/21/22 10:01 PM, David Howells wrote:
> Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
> 
>> +	read_lock(&cache->reqs_lock);
>> +
>> +	/* recheck dead state under lock */
>> +	if (test_bit(CACHEFILES_DEAD, &cache->flags)) {
>> +		read_unlock(&cache->reqs_lock);
>> +		ret = -EIO;
>> +		goto out;
>> +	}
>> +
>> +	xa_lock(xa);
>> +	ret = __xa_alloc(xa, &id, req, xa_limit_32b, GFP_KERNEL);
> 
> You're holding a spinlock.  You can't use GFP_KERNEL.

Oh yes... I've fallen into this trap for the second time... Sorry about that.

> 
>> +static int cachefiles_ondemand_cinit(struct cachefiles_cache *cache, char *args)
>> +{
>> ...
>> +	tmp = kstrdup(args, GFP_KERNEL);
> 
> No need to copy the string.  The caller already did that and added a NUL for
> good measure.

Right.


> 
> I would probably move most of the functions added in this patch to
> fs/cachefiles/ondemand.c.

Alright.


-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v5 06/22] cachefiles: implement on-demand read
  2022-03-21 14:05   ` David Howells
@ 2022-03-21 14:51     ` JeffleXu
  -1 siblings, 0 replies; 102+ messages in thread
From: JeffleXu @ 2022-03-21 14:51 UTC (permalink / raw)
  To: David Howells
  Cc: linux-cachefs, xiang, chao, linux-erofs, torvalds, gregkh, willy,
	linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry, eguan,
	linux-kernel, luodaowen.backend



On 3/21/22 10:05 PM, David Howells wrote:
> Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
> 
>> +	{ "cread",	cachefiles_ondemand_cread	},
> 
> Rather than adding the cread command, would it be better to use an ioctl on
> the anon fd as the /dev/cachefiles write op is serialised?

Sounds reasonable. I will try it in the next version. Thanks.

> 
>> +	/* Stop enqueuig request when daemon closes anon_fd prematurely. */
> 
> "Enqueuing"

Wow.. Maybe I need to update my checkpatch...

-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v5 03/22] cachefiles: introduce on-demand read mode
  2022-03-21 14:26         ` Matthew Wilcox
@ 2022-03-21 15:18           ` JeffleXu
  -1 siblings, 0 replies; 102+ messages in thread
From: JeffleXu @ 2022-03-21 15:18 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: dhowells, linux-cachefs, xiang, chao, linux-erofs, torvalds,
	gregkh, linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry, eguan,
	linux-kernel, luodaowen.backend



On 3/21/22 10:26 PM, Matthew Wilcox wrote:
> On Mon, Mar 21, 2022 at 10:08:47PM +0800, JeffleXu wrote:
>> reqs_lock is also used to protect the check of cache->flags. Please
>> refer to patch 4 [1] of this patchset.
> 
> Yes, that's exactly what I meant by "bad idea".
> 
>> ```
>> +	/*
>> +	 * Enqueue the pending request.
>> +	 *
>> +	 * Stop enqueuing the request when daemon is dying. So we need to
>> +	 * 1) check cache state, and 2) enqueue request if cache is alive.
>> +	 *
>> +	 * The above two ops need to be atomic as a whole. @reqs_lock is used
>> +	 * here to ensure that. Otherwise, request may be enqueued after xarray
>> +	 * has been flushed, in which case the orphan request will never be
>> +	 * completed and thus netfs will hang there forever.
>> +	 */
>> +	read_lock(&cache->reqs_lock);
>> +
>> +	/* recheck dead state under lock */
>> +	if (test_bit(CACHEFILES_DEAD, &cache->flags)) {
>> +		read_unlock(&cache->reqs_lock);
>> +		ret = -EIO;
>> +		goto out;
>> +	}
> 
> So this is an error path.  We're almost always going to take the xa_lock
> immediately after taking the read_lock.  In other words, you've done two
> atomic operations instead of one.

Right.

> 
>> +	xa_lock(xa);
>> +	ret = __xa_alloc(xa, &id, req, xa_limit_32b, GFP_KERNEL);
>> +	if (!ret)
>> +		__xa_set_mark(xa, id, CACHEFILES_REQ_NEW);
>> +	xa_unlock(xa);
>> +
>> +	read_unlock(&cache->reqs_lock);
>> ```
>>
>> It's mainly used to protect against the xarray flush.
>>
>> Besides, IMHO read-write lock shall be more performance friendly, since
>> most cases are the read side.
> 
> That's almost never true.  rwlocks are usually a bad idea because you
> still have to bounce the cacheline, so you replace lock contention
> (which you can see) with cacheline contention (which is harder to
> measure).  And then you have questions about reader/writer fairness
> (should new readers queue behind a writer if there's one waiting, or
> should a steady stream of readers be able to hold a writer off
> indefinitely?)

Interesting, I didn't notice it before. Thanks for explaining it.


BTW what I want is just

```
PROCESS 1		PROCESS 2
=========		=========
#lock			#lock
set DEAD state		if (not DEAD)
flush xarray		   enqueue into xarray
#unlock			#unlock
```

I think it is a generic paradigm. So it seems that the spinlock inside
xarray is already adequate for this job?
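
To make the paradigm above concrete, here is a minimal userspace sketch in
which a single lock covers both the DEAD flag and the queue. A pthread
mutex stands in for the xarray's internal spinlock, and the struct, the
fixed-size array and the function names are all hypothetical. Because the
array is preallocated, the sketch sidesteps the question of allocating
memory while the lock is held.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REQS 8

/* Userspace stand-in for the cache: one lock covers the flag and the queue. */
struct cache {
	pthread_mutex_t lock;	/* plays the role of xa_lock */
	bool dead;		/* plays the role of CACHEFILES_DEAD */
	void *reqs[MAX_REQS];
	size_t nr_reqs;
};

#define CACHE_INIT { .lock = PTHREAD_MUTEX_INITIALIZER }

/* Enqueue side: test the flag and insert inside one critical section. */
static int enqueue_req(struct cache *cache, void *req)
{
	int ret = 0;

	assert(req != NULL);

	pthread_mutex_lock(&cache->lock);
	if (cache->dead)
		ret = -EIO;
	else if (cache->nr_reqs == MAX_REQS)
		ret = -ENOMEM;
	else
		cache->reqs[cache->nr_reqs++] = req;
	pthread_mutex_unlock(&cache->lock);
	return ret;
}

/* Teardown side: set the flag and flush under the same lock. */
static size_t kill_and_flush(struct cache *cache)
{
	size_t flushed;

	pthread_mutex_lock(&cache->lock);
	cache->dead = true;
	flushed = cache->nr_reqs;
	cache->nr_reqs = 0;	/* a real flush would complete each request */
	pthread_mutex_unlock(&cache->lock);
	return flushed;
}
```

Since the flag is only ever set and tested with the lock held, a request
either lands in the queue before the flush and is flushed with the rest,
or sees DEAD and is rejected with -EIO. It cannot be orphaned in between.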

-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v5 03/22] cachefiles: introduce on-demand read mode
  2022-03-21 15:18           ` JeffleXu
@ 2022-03-21 15:21             ` Matthew Wilcox
  -1 siblings, 0 replies; 102+ messages in thread
From: Matthew Wilcox @ 2022-03-21 15:21 UTC (permalink / raw)
  To: JeffleXu
  Cc: linux-erofs, linux-kernel, dhowells, joseph.qi, linux-cachefs,
	gregkh, linux-fsdevel, luodaowen.backend, gerry, torvalds

On Mon, Mar 21, 2022 at 11:18:05PM +0800, JeffleXu wrote:
> >> Besides, IMHO read-write lock shall be more performance friendly, since
> >> most cases are the read side.
> > 
> > That's almost never true.  rwlocks are usually a bad idea because you
> > still have to bounce the cacheline, so you replace lock contention
> > (which you can see) with cacheline contention (which is harder to
> > measure).  And then you have questions about reader/writer fairness
> > (should new readers queue behind a writer if there's one waiting, or
> > should a steady stream of readers be able to hold a writer off
> > indefinitely?)
> 
> Interesting, I didn't notice it before. Thanks for explaining it.

No problem.  It's hard to notice.

> BTW what I want is just
> 
> ```
> PROCESS 1		PROCESS 2
> =========		=========
> #lock			#lock
> set DEAD state		if (not DEAD)
> flush xarray		   enqueue into xarray
> #unlock			#unlock
> ```
> 
> I think it is a generic paradigm. So it seems that the spinlock inside
> xarray is already adequate for this job?

Absolutely; just use xa_lock() to protect both setting & testing the
flag.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v5 03/22] cachefiles: introduce on-demand read mode
  2022-03-21 15:18           ` JeffleXu
@ 2022-03-21 15:30             ` David Howells
  -1 siblings, 0 replies; 102+ messages in thread
From: David Howells @ 2022-03-21 15:30 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: dhowells, JeffleXu, linux-cachefs, xiang, chao, linux-erofs,
	torvalds, gregkh, linux-fsdevel, joseph.qi, bo.liu, tao.peng,
	gerry, eguan, linux-kernel, luodaowen.backend

Matthew Wilcox <willy@infradead.org> wrote:

> Absolutely; just use xa_lock() to protect both setting & testing the
> flag.

How should Jeffle deal with xarray dropping the lock internally in order to do
an allocation and then taking it again (actually in patch 5)?

David


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v5 03/22] cachefiles: introduce on-demand read mode
  2022-03-21 15:30             ` David Howells
@ 2022-03-22 17:04               ` Matthew Wilcox
  -1 siblings, 0 replies; 102+ messages in thread
From: Matthew Wilcox @ 2022-03-22 17:04 UTC (permalink / raw)
  To: David Howells
  Cc: JeffleXu, linux-cachefs, xiang, chao, linux-erofs, torvalds,
	gregkh, linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry, eguan,
	linux-kernel, luodaowen.backend

On Mon, Mar 21, 2022 at 03:30:52PM +0000, David Howells wrote:
> Matthew Wilcox <willy@infradead.org> wrote:
> 
> > Absolutely; just use xa_lock() to protect both setting & testing the
> > flag.
> 
> How should Jeffle deal with xarray dropping the lock internally in order to do
> an allocation and then taking it again (actually in patch 5)?

There are a number of ways to handle this.  I'll outline two; others
are surely possible.

option 1:

add side:

xa_lock();
if (!DEAD) {
	xa_store(GFP_KERNEL);
	if (DEAD)
		xa_erase();
}
xa_unlock();

destroy side:

xa_lock();
set DEAD;
xa_for_each()
	xa_erase();
xa_unlock();

That has the problem (?) that it might be temporarily possible to see
a newly-added entry in a DEAD array.

If that is a problem, you can use xa_reserve() on the add side, followed
by overwriting it or removing it, depending on the state of the DEAD flag.
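
As an illustration of the reserve-then-commit idea, here is a minimal userspace sketch. It does not use the kernel xarray API: `rsv_table`, `rsv_reserve()`, `rsv_load()`, `rsv_add()` and the `RESERVED` marker are hypothetical names, and a pthread mutex stands in for xa_lock(). The only property borrowed from the real API is that a reserved slot reads back as NULL, so a reader never sees a half-added entry.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define RSV_SLOTS 8

/* Address used as a private "reserved" marker; never handed to callers. */
static char rsv_marker;
#define RESERVED ((void *)&rsv_marker)

/* Toy stand-in for an xarray plus a DEAD flag. */
struct rsv_table {
	pthread_mutex_t lock;
	bool dead;
	void *slot[RSV_SLOTS];
};

/* Step 1: claim the slot. In the kernel, memory would be allocated here. */
static int rsv_reserve(struct rsv_table *t, unsigned int idx)
{
	int ret = 0;

	pthread_mutex_lock(&t->lock);
	if (t->slot[idx])
		ret = -EBUSY;
	else
		t->slot[idx] = RESERVED;
	pthread_mutex_unlock(&t->lock);
	return ret;
}

/* Readers see a reserved slot as empty. */
void *rsv_load(struct rsv_table *t, unsigned int idx)
{
	void *p;

	pthread_mutex_lock(&t->lock);
	p = t->slot[idx];
	pthread_mutex_unlock(&t->lock);
	return p == RESERVED ? NULL : p;
}

/* Step 2: with the lock held, overwrite or remove depending on DEAD. */
int rsv_add(struct rsv_table *t, unsigned int idx, void *entry)
{
	int ret;

	if (idx >= RSV_SLOTS)
		return -ERANGE;
	ret = rsv_reserve(t, idx);
	if (ret)
		return ret;

	pthread_mutex_lock(&t->lock);
	if (t->dead) {
		t->slot[idx] = NULL;	/* drop the reservation */
		ret = -EINVAL;
	} else {
		t->slot[idx] = entry;	/* commit; needs no allocation */
	}
	pthread_mutex_unlock(&t->lock);
	return ret;
}
```

The commit step needs no allocation, so the DEAD check and the store happen in one critical section.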

If you really want to, you can decompose the add side so that you always
check the DEAD flag before doing the store, ie:

do {
	xas_lock();
	if (DEAD)
		xas_set_error(-EINVAL);
	else
		xas_store();
	xas_unlock();
} while (xas_nomem(GFP_KERNEL));
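
The shape of that retry loop can be modelled in userspace. The sketch below is not kernel code: `toy_table`, `toy_state`, `toy_store()` and `toy_nomem()` are hypothetical stand-ins for the xarray, the xa_state, xas_store() and xas_nomem(), and a pthread mutex stands in for the xarray lock. It assumes the store cannot allocate while the lock is held, so it reports -ENOMEM and the allocation happens after the lock is dropped.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define SLOTS 8

/* Toy stand-in for an xarray plus a DEAD flag; not the kernel API. */
struct toy_table {
	pthread_mutex_t lock;
	bool dead;
	void *node[SLOTS];	/* backing memory, attached on first store */
	void *entry[SLOTS];
};

/* Toy stand-in for an xa_state: a spare allocation and an error code. */
struct toy_state {
	void *spare;
	int err;
};

/* Called with the lock held, so it must not allocate. */
static void toy_store(struct toy_table *t, struct toy_state *s,
		      unsigned int idx, void *entry)
{
	if (!t->node[idx]) {
		if (!s->spare) {
			s->err = -ENOMEM;	/* caller retries after toy_nomem() */
			return;
		}
		t->node[idx] = s->spare;
		s->spare = NULL;
	}
	t->entry[idx] = entry;
}

/* Called without the lock: allocate and ask for a retry, or clean up. */
static bool toy_nomem(struct toy_state *s)
{
	if (s->err != -ENOMEM) {
		free(s->spare);		/* unused preallocation, if any */
		s->spare = NULL;
		return false;
	}
	s->spare = malloc(64);
	if (!s->spare)
		return false;		/* give up, err stays -ENOMEM */
	s->err = 0;
	return true;
}

/* Add side: the DEAD flag is re-checked on every pass through the loop. */
int toy_add(struct toy_table *t, unsigned int idx, void *entry)
{
	struct toy_state s = { .spare = NULL, .err = 0 };

	if (idx >= SLOTS)
		return -ERANGE;
	do {
		pthread_mutex_lock(&t->lock);
		if (t->dead)
			s.err = -EINVAL;
		else
			toy_store(t, &s, idx, entry);
		pthread_mutex_unlock(&t->lock);
	} while (toy_nomem(&s));

	return s.err;
}

/* Destroy side: set DEAD and erase everything in one critical section. */
void toy_destroy(struct toy_table *t)
{
	pthread_mutex_lock(&t->lock);
	t->dead = true;
	for (unsigned int i = 0; i < SLOTS; i++) {
		t->entry[i] = NULL;
		free(t->node[i]);
		t->node[i] = NULL;
	}
	pthread_mutex_unlock(&t->lock);
}
```

Because the flag test and the store share one lock hold on each pass, an entry cannot be stored once the destroy side has set DEAD, even if the lock was dropped in between to allocate.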


* Re: [PATCH v5 03/22] cachefiles: introduce on-demand read mode
  2022-03-22 17:04               ` Matthew Wilcox
@ 2022-03-23  5:32                 ` JeffleXu
  -1 siblings, 0 replies; 102+ messages in thread
From: JeffleXu @ 2022-03-23  5:32 UTC (permalink / raw)
  To: Matthew Wilcox, David Howells
  Cc: linux-cachefs, xiang, chao, linux-erofs, torvalds, gregkh,
	linux-fsdevel, joseph.qi, bo.liu, tao.peng, gerry, eguan,
	linux-kernel, luodaowen.backend



On 3/23/22 1:04 AM, Matthew Wilcox wrote:
> On Mon, Mar 21, 2022 at 03:30:52PM +0000, David Howells wrote:
>> Matthew Wilcox <willy@infradead.org> wrote:
>>
>>> Absolutely; just use xa_lock() to protect both setting & testing the
>>> flag.
>>
>> How should Jeffle deal with xarray dropping the lock internally in order to do
>> an allocation and then taking it again (actually in patch 5)?
> 
> There are a number of ways to handle this.  I'll outline two; others
> are surely possible.

Thanks.


> 
> option 1:
> 
> add side:
> 
> xa_lock();
> if (!DEAD)
> 	xa_store(GFP_KERNEL);
> 	if (DEAD)
> 		xa_erase();
> xa_unlock();
> 
> destroy side:
> 
> xa_lock();
> set DEAD;
> xa_for_each()
> 	xa_erase();
> xa_unlock();
> 
> That has the problem (?) that it might be temporarily possible to see
> a newly-added entry in a DEAD array.

I think this problem doesn't matter in our scenario.


> 
> If that is a problem, you can use xa_reserve() on the add side, followed
> by overwriting it or removing it, depending on the state of the DEAD flag.

Right. Then even the normal path (when memory allocation succeeds) needs
to call xa_reserve() once.


> 
> If you really want to, you can decompose the add side so that you always
> check the DEAD flag before doing the store, ie:
> 
> do {
> 	xas_lock();
> 	if (DEAD)
> 		xas_set_error(-EINVAL);
> 	else
> 		xas_store();
> 	xas_unlock();
> } while (xas_nomem(GFP_KERNEL));

This way is cleaner in terms of locking semantics, at the cost of some
code duplication. However, after decomposing __xa_alloc(), we can also
reuse the xas when setting the CACHEFILES_REQ_NEW mark.

```
+	xa_lock(xa);
+	ret = __xa_alloc(xa, &id, req, xa_limit_32b, GFP_KERNEL);
+	if (!ret)
+		__xa_set_mark(xa, id, CACHEFILES_REQ_NEW);
+	xa_unlock(xa);
```

So far, I personally prefer the decomposed approach for our scenario.
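
To illustrate why it is convenient to do the DEAD check, the ID allocation and the mark update in a single lock hold, here is a minimal userspace sketch. It is only a model: `req_table`, `req_enqueue()` and `req_fetch_new()` are hypothetical names, a 32-bit bitmap stands in for the CACHEFILES_REQ_NEW mark, a pthread mutex stands in for xa_lock(), and the error codes are arbitrary.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REQS 32

/* Toy request table: slots indexed by ID, plus a "new" mark per slot. */
struct req_table {
	pthread_mutex_t lock;
	bool dead;
	void *req[MAX_REQS];
	uint32_t new_mark;	/* bit i set: request i not yet fetched */
};

/* Check DEAD, allocate the lowest free ID and mark it new, all in one go. */
int req_enqueue(struct req_table *t, void *req, unsigned int *id)
{
	int ret = -EBUSY;	/* returned when no ID is free */
	unsigned int i;

	pthread_mutex_lock(&t->lock);
	if (t->dead) {
		ret = -EIO;
	} else {
		for (i = 0; i < MAX_REQS; i++) {
			if (t->req[i])
				continue;
			t->req[i] = req;
			t->new_mark |= UINT32_C(1) << i;
			*id = i;
			ret = 0;
			break;
		}
	}
	pthread_mutex_unlock(&t->lock);
	return ret;
}

/* Consumer side: take the lowest-numbered new request and clear its mark. */
void *req_fetch_new(struct req_table *t, unsigned int *id)
{
	void *req = NULL;
	unsigned int i;

	pthread_mutex_lock(&t->lock);
	for (i = 0; i < MAX_REQS; i++) {
		if (t->new_mark & (UINT32_C(1) << i)) {
			t->new_mark &= ~(UINT32_C(1) << i);
			req = t->req[i];
			*id = i;
			break;
		}
	}
	pthread_mutex_unlock(&t->lock);
	return req;
}
```

A consumer therefore never observes a stored request without its mark, and nothing is enqueued after the table has been marked dead.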


-- 
Thanks,
Jeffle

* Re: [PATCH v5 00/22] fscache, erofs: fscache-based on-demand read semantics
  2022-03-23  8:38 [PATCH v5 00/22] fscache, erofs: fscache-based on-demand read semantics 田子晨
@ 2022-03-23 11:45   ` gregkh
  0 siblings, 0 replies; 102+ messages in thread
From: gregkh @ 2022-03-23 11:45 UTC (permalink / raw)
  To: 田子晨
  Cc: jefflexu, chao, dhowells, gerry, joseph.qi, linux-cachefs,
	linux-erofs, linux-fsdevel, linux-kernel, luodaowen.backend,
	torvalds, willy, xiang

On Wed, Mar 23, 2022 at 08:38:07AM +0000, 田子晨 wrote:
> This solution looks good, and we're also interested in it. Please accelerate its progress so we can use it.

Please test the patches and provide your feedback on it (i.e.
"Reviewed-by:" and the like.)

thanks,

greg k-h

* Re: [PATCH v5 00/22] fscache, erofs: fscache-based on-demand read semantics
@ 2022-03-23  8:38 田子晨
  2022-03-23 11:45   ` gregkh
  0 siblings, 1 reply; 102+ messages in thread
From: 田子晨 @ 2022-03-23  8:38 UTC (permalink / raw)
  To: jefflexu
  Cc: gregkh, linux-kernel, willy, dhowells, joseph.qi, linux-cachefs,
	torvalds, linux-fsdevel, luodaowen.backend, gerry, linux-erofs

This solution looks good, and we're also interested in it. Please accelerate its progress so we can use it.

Best wishes,
zichen
