* [RFC PATCH 0/9] cachefiles: content map
@ 2022-08-02  3:03 Jingbo Xu
  2022-08-02  3:03 ` [PATCH RFC 1/9] cachefiles: improve FSCACHE_COOKIE_NO_DATA_TO_READ optimization Jingbo Xu
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: Jingbo Xu @ 2022-08-02  3:03 UTC (permalink / raw)
  To: dhowells, linux-cachefs; +Cc: linux-kernel, xiang

Kernel Patchset
===============
Git tree:

    https://github.com/lostjeffle/linux.git jingbo/dev-fscache-bitmap-v1

Gitweb:

    https://github.com/lostjeffle/linux/commits/jingbo/dev-fscache-bitmap-v1


[Introduction]
==============

In addition to the SEEK_[DATA|HOLE] llseek mechanism provided by the
backing filesystem, this patch set introduces a bitmap-based mechanism,
in which a self-maintained bitmap tracks whether a given file range has
been cached in the backing file.


[Design]
========

[Content Map]
The content map is allocated/expanded/shortened in units of PAGE_SIZE,
which is a multiple of the block size of the backing filesystem, so that
holes can easily be punched in the backing content map file when the
content map gets truncated or invalidated. Each bit of the content map
indicates the existence of 4KB of data in the backing file, thus each
(4KB-sized) chunk of the content map covers 128MB of data in the backing
file.
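
To make the arithmetic above concrete, here is a minimal userspace
sketch (not code from this series). It assumes PAGE_SIZE is 4096;
map_size_for() and the MAP_* macros are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_GRAN_SIZE	4096ULL	/* one bit tracks 4KB of the backing file */
#define MAP_CHUNK_SIZE	4096ULL	/* assumed PAGE_SIZE; the map grows in these units */
#define MAP_CHUNK_SPAN	(MAP_CHUNK_SIZE * 8 * MAP_GRAN_SIZE)	/* 128MB per chunk */

/* Size in bytes of the content map needed for a backing file of i_size bytes. */
static uint64_t map_size_for(uint64_t i_size)
{
	uint64_t chunks;

	/* Guard the round-up below against overflow. */
	assert(i_size <= UINT64_MAX - MAP_CHUNK_SPAN);
	chunks = (i_size + MAP_CHUNK_SPAN - 1) / MAP_CHUNK_SPAN;
	return chunks * MAP_CHUNK_SIZE;
}
```

With these assumptions, any backing file of up to 128MB needs a single
4KB chunk of content map, and one more byte of file data rounds the map
up to 8KB.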

In the lookup phase, for the case when the backing file already exists,
the content map is loaded from the backing content map file. When the
backing file gets written, the content map gets updated on the
completion of the write (i.e. cachefiles_write_complete()).
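
The marking and lookup steps can be sketched in userspace as follows.
This is only an illustration, not the code in this series: it assumes
writes are aligned to the 4KB granule, that bit N lives in byte N / 8 at
bit position N % 8, and that the caller has already sized the map. The
names map_mark_range() and map_range_cached() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAP_GRAN_SIZE 4096ULL	/* one bit tracks 4KB of the backing file */

/* Mark [start, start + len) as cached; both must be granule-aligned. */
static void map_mark_range(uint8_t *map, uint64_t start, uint64_t len)
{
	uint64_t g;

	assert(start % MAP_GRAN_SIZE == 0 && len % MAP_GRAN_SIZE == 0);
	for (g = start / MAP_GRAN_SIZE; g < (start + len) / MAP_GRAN_SIZE; g++)
		map[g / 8] |= (uint8_t)(1u << (g % 8));
}

/* Return true only if every granule touched by [start, start + len) is set. */
static bool map_range_cached(const uint8_t *map, uint64_t start, uint64_t len)
{
	uint64_t end = (start + len + MAP_GRAN_SIZE - 1) / MAP_GRAN_SIZE;
	uint64_t g;

	for (g = start / MAP_GRAN_SIZE; g < end; g++)
		if (!(map[g / 8] & (1u << (g % 8))))
			return false;
	return true;
}
```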

When the backing file is truncated to a larger size, the content map
needs to be expanded accordingly. However, the expansion is done lazily:
it is delayed until the content map needs to be marked, inside
cachefiles_write_complete(), i.e. the iocb.ki_complete() callback. It
should be safe to allocate memory with GFP_KERNEL inside the
iocb.ki_complete() callback, since for DIRECT IO the callback is
scheduled from a workqueue.

In the case where the backing file doesn't exist, i.e. a new tmpfile is
created as the backing file, the content map is not allocated in the
lookup phase. Instead, it is expanded at runtime in the same way as
described above.

When the backing file is truncated to a smaller size, only the trailing
part that exceeds the new size gets zeroed, while the content map itself
is not truncated.

Thus the content map size may be smaller or larger than the actual size
of the backing file.
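
Zeroing the trailing part without shrinking the map can be sketched like
this. It is a userspace illustration only, with the same bit layout
assumption as above (bit N in byte N / 8, position N % 8);
map_zero_tail() is a hypothetical name and not the helper added by this
series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAP_GRAN_SIZE 4096ULL	/* one bit tracks 4KB of the backing file */

/* Clear every bit that lies beyond the granules needed for i_size. */
static void map_zero_tail(uint8_t *map, size_t map_size, uint64_t i_size)
{
	uint64_t granules = (i_size + MAP_GRAN_SIZE - 1) / MAP_GRAN_SIZE;
	uint64_t full_bytes = granules / 8;
	unsigned int rem_bits = granules % 8;

	assert(map != NULL);
	if (full_bytes >= map_size)
		return;		/* map is smaller than i_size needs: nothing to zero */
	if (rem_bits) {
		/* Keep the low rem_bits bits of the partially used byte. */
		map[full_bytes] &= (uint8_t)((1u << rem_bits) - 1);
		full_bytes++;
	}
	memset(map + full_bytes, 0, map_size - full_bytes);
}
```

Note that the map size itself is left unchanged, which is why it can end
up larger than what the backing file size would require.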


[Backing Content Map File]
The content map is persisted to the backing content map file. Currently
each sub-directory under a volume maintains one backing content map
file, so that cachefilesd only needs to remove the whole sub-directory
(including the content map file and the backing files in the
sub-directory) as usual when it culls the whole sub-directory or volume.

In this case, the content map file will be shared among all backing
files under the same sub-directory. Thus the offset of the content map
in the backing content map file needs to be stored in the xattr for each
backing file. Besides, since the content map size may be smaller or
larger than the actual size of the backing file as we described above,
the content map size also needs to be stored in the xattr of the backing
file.

When expanding the content map, a new offset inside the backing content
map file also needs to be allocated, and a hole is punched over the old
range starting at the old offset. Currently the new offset is always
allocated in append style, i.e. the previous hole is not reused.
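
The offset allocation can be modelled in userspace as below. The sketch
follows the logic of cachefiles_expand_map_off() in patch 5, which grows
the old range in place when it happens to be the tail of the map file
and appends otherwise. struct map_file, struct map_extent and
map_expand_off() are hypothetical stand-ins, and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct map_file {
	uint64_t size;		/* current size of the content map file */
};

struct map_extent {
	uint64_t off;		/* offset of the (possibly moved) content map */
	bool punch_old;		/* true if the old range should be punched out */
};

static struct map_extent map_expand_off(struct map_file *f, uint64_t old_off,
					uint64_t old_size, uint64_t new_size)
{
	struct map_extent ext;

	assert(new_size >= old_size);
	if (f->size == old_off + old_size) {
		/* Old range is the tail of the file: grow it in place. */
		ext.off = old_off;
		ext.punch_old = false;
		f->size = old_off + new_size;
	} else {
		/* Otherwise append a fresh range; the old one becomes a hole. */
		ext.off = f->size;
		ext.punch_old = true;
		f->size += new_size;
	}
	return ext;
}
```

Since holes are never reused, the map file's logical size only grows,
while punching keeps its on-disk footprint bounded.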


[Time Sequence]
===============
I haven't done much work on this yet, though. There are actually three
actions when filling the cache:

1. write data to the backing file
2. write content map to the backing content map file
3. flush the content of xattr to disk

Currently action 1 goes through DIRECT IO, while action 2 is buffered
IO. To make sure the content map is flushed to disk _before_ the xattr
gets flushed to disk, the backing content map file is opened with
O_DSYNC, so that each subsequent write to the backing content map file
only returns once the written data has been flushed to disk.
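
The O_DSYNC behaviour relied on for action 2 has a direct userspace
analogue, sketched below. This is not the kernel code (patch 3 passes
O_DSYNC to open_with_fake_path() instead); write_map_durably() is a
hypothetical helper, and the xattr update of action 3 would only be
issued after it returns successfully.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write len bytes of map at offset off. Because the file is opened with
 * O_DSYNC, pwrite() only returns once the data has reached stable storage.
 * Returns 0 on success, -1 on error or short write.
 */
static int write_map_durably(const char *path, const void *map, size_t len,
			     off_t off)
{
	ssize_t n;
	int fd;

	assert(path != NULL && map != NULL);
	fd = open(path, O_WRONLY | O_CREAT | O_DSYNC, 0600);
	if (fd < 0)
		return -1;
	n = pwrite(fd, map, len, off);
	if (close(fd) < 0)
		return -1;
	return n == (ssize_t)len ? 0 : -1;
}
```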


[TEST]
======
It passes the test cases for on-demand mode[1].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git/tree/tests/fscache?h=experimental-tests-fscache

It also passes xfstests on NFS 4.0 with fscache enabled.

The performance test is still in progress.


Jingbo Xu (9):
  cachefiles: improve FSCACHE_COOKIE_NO_DATA_TO_READ optimization
  cachefiles: add content map file helpers
  cachefiles: allocate per-subdir content map files
  cachefiles: alloc/load/save content map
  cachefiles: mark content map on write to the backing file
  cachefiles: check content map on read/write
  cachefiles: free content map on invalidate
  cachefiles: resize content map on resize
  cachefiles: cull content map file on cull

 fs/cachefiles/Makefile      |   3 +-
 fs/cachefiles/content-map.c | 333 ++++++++++++++++++++++++++++++++++++
 fs/cachefiles/interface.c   |  10 +-
 fs/cachefiles/internal.h    |  31 ++++
 fs/cachefiles/io.c          |  59 +++++--
 fs/cachefiles/namei.c       |  96 +++++++++++
 fs/cachefiles/ondemand.c    |   5 +-
 fs/cachefiles/volume.c      |  14 +-
 fs/cachefiles/xattr.c       |  26 +++
 fs/fscache/cookie.c         |   2 +-
 10 files changed, 558 insertions(+), 21 deletions(-)
 create mode 100644 fs/cachefiles/content-map.c

-- 
2.27.0



* [PATCH RFC 1/9] cachefiles: improve FSCACHE_COOKIE_NO_DATA_TO_READ optimization
  2022-08-02  3:03 [RFC PATCH 0/9] cachefiles: content map Jingbo Xu
@ 2022-08-02  3:03 ` Jingbo Xu
  2022-08-02  3:03 ` [PATCH RFC 2/9] cachefiles: add content map file helpers Jingbo Xu
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Jingbo Xu @ 2022-08-02  3:03 UTC (permalink / raw)
  To: dhowells, linux-cachefs; +Cc: linux-kernel, xiang

In the content map feature introduced by the following patches,
cachefiles_prepare_[read|write] can query whether the requested range is
cached through either SEEK_[DATA|HOLE] llseek or a self-maintained
bitmap, according to object->content_info.

For already existing backing files, content_info can be derived from the
xattr of the backing file, while for a newly created tmpfile,
content_info is initialized when the backing file is written for the
first time. This time sequence requires the
FSCACHE_COOKIE_NO_DATA_TO_READ optimization, so that llseek is only
called after the first write, i.e. after content_info has been
initialized.

This patch includes the following changes:

1. Enable NO_DATA optimization in cachefiles_prepare_[read|write].

2. Clear FSCACHE_COOKIE_NO_DATA_TO_READ on first write to the backing
   file.

When working in non-on-demand mode, FSCACHE_COOKIE_NO_DATA_TO_READ is
cleared when a_ops->release_folio() is called. In on-demand mode,
however, there's retry logic in cachefiles_prepare_read(), i.e. the
requested range is checked a second time after the on-demand read, thus
FSCACHE_COOKIE_NO_DATA_TO_READ needs to be cleared once the write
completes.

3. Improve the setting/clearing of FSCACHE_COOKIE_NO_DATA_TO_READ in
on-demand mode.

Since we now rely on the NO_DATA optimization when the backing file is
actually a tmpfile, the setting of the FSCACHE_COOKIE_NO_DATA_TO_READ
flag in on-demand mode is delayed until the size of the backing file is
acquired when copen completes, so that the
FSCACHE_COOKIE_NO_DATA_TO_READ flag of the tmpfile can be retained.

Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
---
 fs/cachefiles/io.c       | 20 +++++++++++++-------
 fs/cachefiles/ondemand.c |  5 +----
 fs/fscache/cookie.c      |  2 +-
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 000a28f46e59..b513d9bf81f1 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -255,6 +255,7 @@ static void cachefiles_write_complete(struct kiocb *iocb, long ret)
 {
 	struct cachefiles_kiocb *ki = container_of(iocb, struct cachefiles_kiocb, iocb);
 	struct cachefiles_object *object = ki->object;
+	struct fscache_cookie *cookie = object->cookie;
 	struct inode *inode = file_inode(ki->iocb.ki_filp);
 
 	_enter("%ld", ret);
@@ -269,6 +270,9 @@ static void cachefiles_write_complete(struct kiocb *iocb, long ret)
 
 	atomic_long_sub(ki->b_writing, &object->volume->cache->b_writing);
 	set_bit(FSCACHE_COOKIE_HAVE_DATA, &object->cookie->flags);
+	if (cookie->advice & FSCACHE_ADV_WANT_CACHE_SIZE &&
+	    test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags))
+		clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
 	if (ki->term_func)
 		ki->term_func(ki->term_func_priv, ret, ki->was_async);
 	cachefiles_put_kiocb(ki);
@@ -413,13 +417,6 @@ static enum netfs_io_source cachefiles_prepare_read(struct netfs_io_subrequest *
 		goto out_no_object;
 	}
 
-	if (test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags)) {
-		__set_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
-		why = cachefiles_trace_read_no_data;
-		if (!test_bit(NETFS_SREQ_ONDEMAND, &subreq->flags))
-			goto out_no_object;
-	}
-
 	/* The object and the file may be being created in the background. */
 	if (!file) {
 		why = cachefiles_trace_read_no_file;
@@ -434,6 +431,11 @@ static enum netfs_io_source cachefiles_prepare_read(struct netfs_io_subrequest *
 	object = cachefiles_cres_object(cres);
 	cache = object->volume->cache;
 	cachefiles_begin_secure(cache, &saved_cred);
+
+	if (test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags)) {
+		why = cachefiles_trace_read_no_data;
+		goto download_and_store;
+	}
 retry:
 	off = cachefiles_inject_read_error();
 	if (off == 0)
@@ -510,6 +512,7 @@ int __cachefiles_prepare_write(struct cachefiles_object *object,
 			       bool no_space_allocated_yet)
 {
 	struct cachefiles_cache *cache = object->volume->cache;
+	struct fscache_cookie *cookie = object->cookie;
 	loff_t start = *_start, pos;
 	size_t len = *_len, down;
 	int ret;
@@ -526,6 +529,9 @@ int __cachefiles_prepare_write(struct cachefiles_object *object,
 	if (no_space_allocated_yet)
 		goto check_space;
 
+	if (test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags))
+		goto check_space;
+
 	pos = cachefiles_inject_read_error();
 	if (pos == 0)
 		pos = vfs_llseek(file, *_start, SEEK_DATA);
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index 1fee702d5529..a317857e2dfd 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -166,12 +166,9 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args)
 
 	cookie = req->object->cookie;
 	cookie->object_size = size;
-	if (size)
-		clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
-	else
+	if (size == 0)
 		set_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
 	trace_cachefiles_ondemand_copen(req->object, id, size);
-
 out:
 	complete(&req->done);
 	return ret;
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 74920826d8f6..49c269c078eb 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -340,7 +340,7 @@ static struct fscache_cookie *fscache_alloc_cookie(
 	cookie->key_len		= index_key_len;
 	cookie->aux_len		= aux_data_len;
 	cookie->object_size	= object_size;
-	if (object_size == 0)
+	if (object_size == 0 && !(advice & FSCACHE_ADV_WANT_CACHE_SIZE))
 		__set_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
 
 	if (fscache_set_key(cookie, index_key, index_key_len) < 0)
-- 
2.27.0



* [PATCH RFC 2/9] cachefiles: add content map file helpers
  2022-08-02  3:03 [RFC PATCH 0/9] cachefiles: content map Jingbo Xu
  2022-08-02  3:03 ` [PATCH RFC 1/9] cachefiles: improve FSCACHE_COOKIE_NO_DATA_TO_READ optimization Jingbo Xu
@ 2022-08-02  3:03 ` Jingbo Xu
  2022-08-02  3:03 ` [PATCH RFC 3/9] cachefiles: allocate per-subdir content map files Jingbo Xu
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Jingbo Xu @ 2022-08-02  3:03 UTC (permalink / raw)
  To: dhowells, linux-cachefs; +Cc: linux-kernel, xiang

In addition to the mapping mechanism provided by the backing fs, a
self-maintained bitmap can be used to track whether the corresponding
file range is cached in the backing file. In this case, a content map
file is used to persist the bitmap.

As the first step, add the helper functions for looking up and freeing
these content map files.

Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
---
 fs/cachefiles/internal.h |  4 ++
 fs/cachefiles/namei.c    | 88 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 92 insertions(+)

diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 6cba2c6de2f9..4c3ee6935811 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -270,6 +270,10 @@ extern struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
 					       bool *_is_new);
 extern void cachefiles_put_directory(struct dentry *dir);
 
+int cachefiles_look_up_map(struct cachefiles_cache *cache,
+			   struct dentry *dir, struct file **pfile);
+void cachefiles_put_map(struct file *file);
+
 extern int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
 			   char *filename);
 
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index facf2ebe464b..2948eea18ca2 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -231,6 +231,94 @@ void cachefiles_put_directory(struct dentry *dir)
 	}
 }
 
+/*
+ * Look up a content map file.
+ */
+int cachefiles_look_up_map(struct cachefiles_cache *cache,
+			   struct dentry *dir, struct file **pfile)
+{
+	struct dentry *dentry;
+	struct file *file;
+	struct path path;
+	char *name = "Map";
+	int ret;
+
+	inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
+retry:
+	ret = cachefiles_inject_read_error();
+	if (ret)
+		goto err_unlock_dir;
+
+	dentry = lookup_one_len(name, dir, strlen(name));
+	if (IS_ERR(dentry)) {
+		ret = PTR_ERR(dentry);
+		goto err_unlock_dir;
+	}
+
+	if (d_is_negative(dentry)) {
+		ret = cachefiles_has_space(cache, 1, 0,
+				cachefiles_has_space_for_create);
+		if (ret)
+			goto err_dput;
+
+		ret = vfs_create(&init_user_ns, d_inode(dir), dentry, S_IFREG, true);
+		if (ret)
+			goto err_dput;
+
+		if (unlikely(d_unhashed(dentry))) {
+			cachefiles_put_directory(dentry);
+			goto retry;
+		}
+		ASSERT(d_backing_inode(dentry));
+	}
+
+	inode_lock(d_inode(dentry));
+	inode_unlock(d_inode(dir));
+
+	if (!__cachefiles_mark_inode_in_use(NULL, dentry)) {
+		inode_unlock(d_inode(dentry));
+		dput(dentry);
+		return -EBUSY;
+	}
+
+	inode_unlock(d_inode(dentry));
+	ASSERT(d_backing_inode(dentry));
+
+	if (!d_is_reg(dentry)) {
+		pr_err("%pd is not a file\n", dentry);
+		cachefiles_put_directory(dentry);
+		return -EIO;
+	}
+
+	path.mnt = cache->mnt;
+	path.dentry = dentry;
+	file = open_with_fake_path(&path, O_RDWR | O_LARGEFILE,
+			d_backing_inode(dentry), cache->cache_cred);
+	if (IS_ERR(file))
+		cachefiles_put_directory(dentry);
+
+	*pfile = file;
+	dput(dentry);
+	return 0;
+
+err_dput:
+	dput(dentry);
+err_unlock_dir:
+	inode_unlock(d_inode(dir));
+	return ret;
+}
+
+/*
+ * Put a content map file.
+ */
+void cachefiles_put_map(struct file *file)
+{
+	if (file) {
+		cachefiles_do_unmark_inode_in_use(NULL, file->f_path.dentry);
+		fput(file);
+	}
+}
+
 /*
  * Remove a regular file from the cache.
  */
-- 
2.27.0



* [PATCH RFC 3/9] cachefiles: allocate per-subdir content map files
  2022-08-02  3:03 [RFC PATCH 0/9] cachefiles: content map Jingbo Xu
  2022-08-02  3:03 ` [PATCH RFC 1/9] cachefiles: improve FSCACHE_COOKIE_NO_DATA_TO_READ optimization Jingbo Xu
  2022-08-02  3:03 ` [PATCH RFC 2/9] cachefiles: add content map file helpers Jingbo Xu
@ 2022-08-02  3:03 ` Jingbo Xu
  2022-08-02  3:03 ` [PATCH RFC 4/9] cachefiles: alloc/load/save content map Jingbo Xu
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Jingbo Xu @ 2022-08-02  3:03 UTC (permalink / raw)
  To: dhowells, linux-cachefs; +Cc: linux-kernel, xiang

Allocate one content map file for each sub-directory under a volume, so
that cachefilesd only needs to remove the whole sub-directory (including
the content map file and the backing files in the sub-directory) as
usual when it culls the whole sub-directory or volume.

The content map file is shared among all backing files under the same
sub-directory.

Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
---
 fs/cachefiles/internal.h |  1 +
 fs/cachefiles/namei.c    |  2 +-
 fs/cachefiles/volume.c   | 14 ++++++++++++--
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 4c3ee6935811..06bde4e0e4f5 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -42,6 +42,7 @@ struct cachefiles_volume {
 	struct fscache_volume		*vcookie;	/* The netfs's representation */
 	struct dentry			*dentry;	/* The volume dentry */
 	struct dentry			*fanout[256];	/* Fanout subdirs */
+	struct file                     *content_map[256]; /* Content map files */
 };
 
 /*
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 2948eea18ca2..d2d5feea64e8 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -292,7 +292,7 @@ int cachefiles_look_up_map(struct cachefiles_cache *cache,
 
 	path.mnt = cache->mnt;
 	path.dentry = dentry;
-	file = open_with_fake_path(&path, O_RDWR | O_LARGEFILE,
+	file = open_with_fake_path(&path, O_RDWR | O_LARGEFILE | O_DSYNC,
 			d_backing_inode(dentry), cache->cache_cred);
 	if (IS_ERR(file))
 		cachefiles_put_directory(dentry);
diff --git a/fs/cachefiles/volume.c b/fs/cachefiles/volume.c
index 89df0ba8ba5e..4decc91a8886 100644
--- a/fs/cachefiles/volume.c
+++ b/fs/cachefiles/volume.c
@@ -20,6 +20,7 @@ void cachefiles_acquire_volume(struct fscache_volume *vcookie)
 	struct cachefiles_cache *cache = vcookie->cache->cache_priv;
 	const struct cred *saved_cred;
 	struct dentry *vdentry, *fan;
+	struct file *map;
 	size_t len;
 	char *name;
 	bool is_new = false;
@@ -73,6 +74,11 @@ void cachefiles_acquire_volume(struct fscache_volume *vcookie)
 		if (IS_ERR(fan))
 			goto error_fan;
 		volume->fanout[i] = fan;
+
+		ret = cachefiles_look_up_map(cache, fan, &map);
+		if (ret)
+			goto error_fan;
+		volume->content_map[i] = map;
 	}
 
 	cachefiles_end_secure(cache, saved_cred);
@@ -91,8 +97,10 @@ void cachefiles_acquire_volume(struct fscache_volume *vcookie)
 	return;
 
 error_fan:
-	for (i = 0; i < 256; i++)
+	for (i = 0; i < 256; i++) {
 		cachefiles_put_directory(volume->fanout[i]);
+		cachefiles_put_map(volume->content_map[i]);
+	}
 error_dir:
 	cachefiles_put_directory(volume->dentry);
 error_name:
@@ -113,8 +121,10 @@ static void __cachefiles_free_volume(struct cachefiles_volume *volume)
 
 	volume->vcookie->cache_priv = NULL;
 
-	for (i = 0; i < 256; i++)
+	for (i = 0; i < 256; i++) {
 		cachefiles_put_directory(volume->fanout[i]);
+		cachefiles_put_map(volume->content_map[i]);
+	}
 	cachefiles_put_directory(volume->dentry);
 	kfree(volume);
 }
-- 
2.27.0



* [PATCH RFC 4/9] cachefiles: alloc/load/save content map
  2022-08-02  3:03 [RFC PATCH 0/9] cachefiles: content map Jingbo Xu
                   ` (2 preceding siblings ...)
  2022-08-02  3:03 ` [PATCH RFC 3/9] cachefiles: allocate per-subdir content map files Jingbo Xu
@ 2022-08-02  3:03 ` Jingbo Xu
  2022-08-02  3:03 ` [PATCH RFC 5/9] cachefiles: mark content map on write to the backing file Jingbo Xu
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Jingbo Xu @ 2022-08-02  3:03 UTC (permalink / raw)
  To: dhowells, linux-cachefs; +Cc: linux-kernel, xiang

In addition to the SEEK_[DATA|HOLE] llseek mechanism provided by the
backing filesystem, this patch set introduces a bitmap-based mechanism,
in which a self-maintained bitmap tracks whether a given file range has
been cached in the backing file.

The bitmap is persisted to the corresponding backing content map file.
Since all backing files under one sub-directory share one backing
content map file, the offset of the content map in the backing content
map file is stored in the xattr of each backing file. The size of the
content map is also stored in the xattr of the backing file. As shown in
the following patches, the content map size stored in the xattr can be
smaller or larger than the actual size of the backing file.

In the lookup phase, for the case when the backing file already exists,
the content map is loaded from the backing content map file. If the
backing file doesn't exist, i.e. a new tmpfile is created as the backing
file, the content map will not be initialized at this point. Instead, it
will be expanded at runtime later.

Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
---
 fs/cachefiles/Makefile      |  3 +-
 fs/cachefiles/content-map.c | 93 +++++++++++++++++++++++++++++++++++++
 fs/cachefiles/interface.c   |  8 +++-
 fs/cachefiles/internal.h    | 13 ++++++
 fs/cachefiles/namei.c       |  4 ++
 fs/cachefiles/xattr.c       |  9 ++++
 6 files changed, 128 insertions(+), 2 deletions(-)
 create mode 100644 fs/cachefiles/content-map.c

diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
index c37a7a9af10b..59cd26cd7700 100644
--- a/fs/cachefiles/Makefile
+++ b/fs/cachefiles/Makefile
@@ -13,7 +13,8 @@ cachefiles-y := \
 	namei.o \
 	security.o \
 	volume.o \
-	xattr.o
+	xattr.o \
+	content-map.o
 
 cachefiles-$(CONFIG_CACHEFILES_ERROR_INJECTION) += error_inject.o
 cachefiles-$(CONFIG_CACHEFILES_ONDEMAND) += ondemand.o
diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c
new file mode 100644
index 000000000000..3432efdecbcf
--- /dev/null
+++ b/fs/cachefiles/content-map.c
@@ -0,0 +1,93 @@
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/uio.h>
+#include "internal.h"
+
+/*
+ * Zero the unused tail.
+ *
+ * @i_size indicates the size of the backing object.
+ */
+static void cachefiles_zero_content_map(void *map, size_t content_map_size,
+					size_t i_size)
+{
+	unsigned long granules_needed = DIV_ROUND_UP(i_size, CACHEFILES_GRAN_SIZE);
+	unsigned long bytes_needed = BITS_TO_BYTES(granules_needed);
+	unsigned long byte_end = min_t(unsigned long, bytes_needed, content_map_size);
+	int i;
+
+	if (bytes_needed < content_map_size)
+		memset(map + bytes_needed, 0, content_map_size - bytes_needed);
+
+	for (i = granules_needed; i < byte_end * BITS_PER_BYTE; i++)
+		clear_bit(i, map);
+}
+
+/*
+ * Load the content map from the backing map file.
+ */
+int cachefiles_load_content_map(struct cachefiles_object *object)
+{
+	struct file *file = object->volume->content_map[(u8)object->cookie->key_hash];
+	loff_t off = object->content_map_off;
+	size_t size = object->content_map_size;
+	void *map;
+	int ret;
+
+	if (object->content_info != CACHEFILES_CONTENT_MAP)
+		return 0;
+
+	map = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, get_order(size));
+	if (!map)
+		return -ENOMEM;
+
+	ret = kernel_read(file, map, size, &off);
+	if (ret != size) {
+		free_pages((unsigned long)map, get_order(size));
+		return ret < 0 ? ret : -EIO;
+	}
+
+	/*
+	 * Zero the unused tail. Later when expanding the content map, the
+	 * content map itself may keep the same size while i_size of the backing
+	 * object is increased. In this case, the original content map is reused
+	 * and part of the original unused tail is used now. Be noted that
+	 * content_map_size stored in xattr may be smaller or larger than the
+	 * actual size of the backing object.
+	 */
+	cachefiles_zero_content_map(map, size, object->cookie->object_size);
+
+	object->content_map = map;
+	return 0;
+}
+
+/*
+ * Save the content map to the backing map file.
+ */
+void cachefiles_save_content_map(struct cachefiles_object *object)
+{
+	struct file *file = object->volume->content_map[(u8)object->cookie->key_hash];
+	loff_t off;
+	int ret;
+
+	if (object->content_info != CACHEFILES_CONTENT_MAP ||
+	    !object->content_map_size)
+		return;
+
+	/* allocate space from content map file */
+	off = object->content_map_off;
+	if (off == CACHEFILES_CONTENT_MAP_OFF_INVAL) {
+		struct inode *inode = file_inode(file);
+
+		inode_lock(inode);
+		off = i_size_read(inode);
+		i_size_write(inode, off + object->content_map_size);
+		inode_unlock(inode);
+
+		object->content_map_off = off;
+	}
+
+	ret = kernel_write(file, object->content_map, object->content_map_size, &off);
+	if (ret != object->content_map_size)
+		object->content_info = CACHEFILES_CONTENT_NO_DATA;
+}
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index a69073a1d3f0..4cfbdc87b635 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -38,6 +38,8 @@ struct cachefiles_object *cachefiles_alloc_object(struct fscache_cookie *cookie)
 	object->volume = volume;
 	object->debug_id = atomic_inc_return(&cachefiles_object_debug_id);
 	object->cookie = fscache_get_cookie(cookie, fscache_cookie_get_attach_object);
+	object->content_map_off = CACHEFILES_CONTENT_MAP_OFF_INVAL;
+	rwlock_init(&object->content_map_lock);
 
 	fscache_count_object(vcookie->cache);
 	trace_cachefiles_ref(object->debug_id, cookie->debug_id, 1,
@@ -88,6 +90,8 @@ void cachefiles_put_object(struct cachefiles_object *object,
 		ASSERTCMP(object->file, ==, NULL);
 
 		kfree(object->d_name);
+		free_pages((unsigned long)object->content_map,
+			   get_order(object->content_map_size));
 
 		cache = object->volume->cache->cache;
 		fscache_put_cookie(object->cookie, fscache_cookie_put_object);
@@ -309,8 +313,10 @@ static void cachefiles_commit_object(struct cachefiles_object *object,
 		update = true;
 	if (test_and_clear_bit(FSCACHE_COOKIE_NEEDS_UPDATE, &object->cookie->flags))
 		update = true;
-	if (update)
+	if (update) {
+		cachefiles_save_content_map(object);
 		cachefiles_set_object_xattr(object);
+	}
 
 	if (test_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags))
 		cachefiles_commit_tmpfile(cache, object);
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 06bde4e0e4f5..1335ea5f4a5e 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -19,6 +19,7 @@
 #include <linux/cachefiles.h>
 
 #define CACHEFILES_DIO_BLOCK_SIZE 4096
+#define CACHEFILES_GRAN_SIZE 4096	/* one bit represents 4K */
 
 struct cachefiles_cache;
 struct cachefiles_object;
@@ -30,6 +31,7 @@ enum cachefiles_content {
 	CACHEFILES_CONTENT_ALL		= 2, /* Content is all present, no map */
 	CACHEFILES_CONTENT_BACKFS_MAP	= 3, /* Content is piecemeal, mapped through backing fs */
 	CACHEFILES_CONTENT_DIRTY	= 4, /* Content is dirty (only seen on disk) */
+	CACHEFILES_CONTENT_MAP		= 5, /* Content is piecemeal, map in use */
 	nr__cachefiles_content
 };
 
@@ -59,6 +61,11 @@ struct cachefiles_object {
 	refcount_t			ref;
 	u8				d_name_len;	/* Length of filename */
 	enum cachefiles_content		content_info:8;	/* Info about content presence */
+	rwlock_t			content_map_lock;
+	void				*content_map;
+	size_t				content_map_size; /* size of content map in bytes */
+	loff_t				content_map_off;  /* offset in the backing content map file */
+#define CACHEFILES_CONTENT_MAP_OFF_INVAL	-1
 	unsigned long			flags;
 #define CACHEFILES_OBJECT_USING_TMPFILE	0		/* Have an unlinked tmpfile */
 #ifdef CONFIG_CACHEFILES_ONDEMAND
@@ -169,6 +176,12 @@ extern int cachefiles_has_space(struct cachefiles_cache *cache,
 				unsigned fnr, unsigned bnr,
 				enum cachefiles_has_space_for reason);
 
+/*
+ * content-map.c
+ */
+extern int cachefiles_load_content_map(struct cachefiles_object *object);
+extern void cachefiles_save_content_map(struct cachefiles_object *object);
+
 /*
  * daemon.c
  */
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index d2d5feea64e8..f5e1ec1d9445 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -690,6 +690,10 @@ static bool cachefiles_open_file(struct cachefiles_object *object,
 	if (ret < 0)
 		goto check_failed;
 
+	ret = cachefiles_load_content_map(object);
+	if (ret < 0)
+		goto check_failed;
+
 	object->file = file;
 
 	/* Always update the atime on an object we've just looked up (this is
diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c
index 00b087c14995..05ac6b70787a 100644
--- a/fs/cachefiles/xattr.c
+++ b/fs/cachefiles/xattr.c
@@ -20,6 +20,8 @@
 struct cachefiles_xattr {
 	__be64	object_size;	/* Actual size of the object */
 	__be64	zero_point;	/* Size after which server has no data not written by us */
+	__be64	content_map_off;/* Offset inside the content map file */
+	__be64	content_map_size;/* Size of the content map */
 	__u8	type;		/* Type of object */
 	__u8	content;	/* Content presence (enum cachefiles_content) */
 	__u8	data[];		/* netfs coherency data */
@@ -58,6 +60,8 @@ int cachefiles_set_object_xattr(struct cachefiles_object *object)
 	buf->zero_point		= 0;
 	buf->type		= CACHEFILES_COOKIE_TYPE_DATA;
 	buf->content		= object->content_info;
+	buf->content_map_off	= cpu_to_be64(object->content_map_off);
+	buf->content_map_size	= cpu_to_be64(object->content_map_size);
 	if (test_bit(FSCACHE_COOKIE_LOCAL_WRITE, &object->cookie->flags))
 		buf->content	= CACHEFILES_CONTENT_DIRTY;
 	if (len > 0)
@@ -129,6 +133,11 @@ int cachefiles_check_auxdata(struct cachefiles_object *object, struct file *file
 		pr_warn("Dirty object in cache\n");
 		why = cachefiles_coherency_check_dirty;
 	} else {
+		object->content_info = buf->content;
+		if (object->content_info == CACHEFILES_CONTENT_MAP) {
+			object->content_map_off = be64_to_cpu(buf->content_map_off);
+			object->content_map_size = be64_to_cpu(buf->content_map_size);
+		}
 		why = cachefiles_coherency_check_ok;
 		ret = 0;
 	}
-- 
2.27.0



* [PATCH RFC 5/9] cachefiles: mark content map on write to the backing file
  2022-08-02  3:03 [RFC PATCH 0/9] cachefiles: content map Jingbo Xu
                   ` (3 preceding siblings ...)
  2022-08-02  3:03 ` [PATCH RFC 4/9] cachefiles: alloc/load/save content map Jingbo Xu
@ 2022-08-02  3:03 ` Jingbo Xu
  2022-08-02  3:03 ` [PATCH RFC 6/9] cachefiles: check content map on read/write Jingbo Xu
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Jingbo Xu @ 2022-08-02  3:03 UTC (permalink / raw)
  To: dhowells, linux-cachefs; +Cc: linux-kernel, xiang

Mark the content map on completion of the write to the backing file.

The expansion of the content map (when the backing file is truncated to
a larger size) and the allocation of the content map (when the backing
file is a newly created tmpfile) are delayed until the content map needs
to be marked. It should be safe to allocate memory with GFP_KERNEL
inside the iocb.ki_complete() callback, since for DIRECT IO the callback
is scheduled from a workqueue.

The content map is sized in granules of the block size of the backing
filesystem, so that holes can easily be punched in the backing content
map file when the content map gets truncated or invalidated. Currently
the content map is sized in units of PAGE_SIZE, which should be a
multiple of the block size of the backing filesystem. Each bit of the
content map indicates the existence of 4KB of data in the backing file,
thus each (4KB-sized) chunk of the content map covers 128MB of data in
the backing file.

When expanding the content map, a new content map needs to be allocated.
A new offset inside the backing content map file also needs to be
allocated, and a hole is punched over the old range starting at the old
offset. Currently the new offset is always allocated in append style,
i.e. the previous hole is not reused.

Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
---
 fs/cachefiles/content-map.c | 129 ++++++++++++++++++++++++++++++++++++
 fs/cachefiles/internal.h    |   2 +
 fs/cachefiles/io.c          |   3 +
 3 files changed, 134 insertions(+)

diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c
index 3432efdecbcf..877ff79e181b 100644
--- a/fs/cachefiles/content-map.c
+++ b/fs/cachefiles/content-map.c
@@ -1,8 +1,24 @@
 #include <linux/fs.h>
 #include <linux/namei.h>
 #include <linux/uio.h>
+#include <linux/falloc.h>
 #include "internal.h"
 
+/*
+ * Return the size of the content map in bytes.
+ *
+ * There's one bit per granule (CACHEFILES_GRAN_SIZE, i.e. 4K). We size it in
+ * terms of block size chunks (e.g. 4K), so that the map file can be punched
+ * hole when the content map is truncated or invalidated. In this case, each 4K
+ * chunk spans (4096 * BITS_PER_BYTE * CACHEFILES_GRAN_SIZE, i.e. 128M) of file
+ * space.
+ */
+static size_t cachefiles_map_size(loff_t i_size)
+{
+	i_size = round_up(i_size, PAGE_SIZE * BITS_PER_BYTE * CACHEFILES_GRAN_SIZE);
+	return i_size / BITS_PER_BYTE / CACHEFILES_GRAN_SIZE;
+}
+
 /*
  * Zero the unused tail.
  *
@@ -91,3 +107,116 @@ void cachefiles_save_content_map(struct cachefiles_object *object)
 	if (ret != object->content_map_size)
 		object->content_info = CACHEFILES_CONTENT_NO_DATA;
 }
+
+static loff_t cachefiles_expand_map_off(struct file *file, loff_t old_off,
+					size_t old_size, size_t new_size)
+{
+	struct inode *inode = file_inode(file);
+	loff_t new_off;
+	bool punch = false;
+
+	inode_lock(inode);
+	new_off = i_size_read(inode);
+	/*
+	 * Simply expand the old content map range if possible; or discard the
+	 * old content map range and create a new one.
+	 */
+	if (new_off == old_off + old_size) {
+		i_size_write(inode, old_off + new_size);
+		new_off = old_off;
+	} else {
+		i_size_write(inode, new_off + new_size);
+		punch = true;
+	}
+	inode_unlock(inode);
+
+	if (punch)
+		vfs_fallocate(file, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+			      old_off, old_size);
+
+	return new_off;
+}
+
+/*
+ * Expand the content map to a larger file size.
+ */
+static void cachefiles_expand_content_map(struct cachefiles_object *object)
+{
+	struct file *file = object->volume->content_map[(u8)object->cookie->key_hash];
+	size_t size, zap_size;
+	void *map, *zap;
+	loff_t off;
+
+	size = cachefiles_map_size(object->cookie->object_size);
+	if (size <= object->content_map_size)
+		return;
+
+	map = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, get_order(size));
+	if (!map)
+		return;
+
+	write_lock_bh(&object->content_map_lock);
+	if (size > object->content_map_size) {
+		zap = object->content_map;
+		zap_size = object->content_map_size;
+		memcpy(map, zap, zap_size);
+		object->content_map = map;
+		object->content_map_size = size;
+
+		/* expand the content map file */
+		off = object->content_map_off;
+		if (off != CACHEFILES_CONTENT_MAP_OFF_INVAL)
+			object->content_map_off = cachefiles_expand_map_off(file,
+				off, zap_size, size);
+	} else {
+		zap = map;
+		zap_size = size;
+	}
+	write_unlock_bh(&object->content_map_lock);
+
+	free_pages((unsigned long)zap, get_order(zap_size));
+}
+
+void cachefiles_mark_content_map(struct cachefiles_object *object,
+				 loff_t start, loff_t len)
+{
+	pgoff_t granule;
+	loff_t end = start + len;
+
+	if (object->cookie->advice & FSCACHE_ADV_SINGLE_CHUNK) {
+		if (start == 0) {
+			object->content_info = CACHEFILES_CONTENT_SINGLE;
+			set_bit(FSCACHE_COOKIE_NEEDS_UPDATE, &object->cookie->flags);
+		}
+		return;
+	}
+
+	if (object->content_info == CACHEFILES_CONTENT_NO_DATA)
+		object->content_info = CACHEFILES_CONTENT_MAP;
+
+	/* TODO: set CACHEFILES_CONTENT_BACKFS_MAP accordingly */
+
+	if (object->content_info != CACHEFILES_CONTENT_MAP)
+		return;
+
+	read_lock_bh(&object->content_map_lock);
+	start = round_down(start, CACHEFILES_GRAN_SIZE);
+	do {
+		granule = start / CACHEFILES_GRAN_SIZE;
+		if (granule / BITS_PER_BYTE >= object->content_map_size) {
+			read_unlock_bh(&object->content_map_lock);
+			cachefiles_expand_content_map(object);
+			read_lock_bh(&object->content_map_lock);
+		}
+
+		if (WARN_ON(granule / BITS_PER_BYTE >= object->content_map_size))
+			break;
+
+		set_bit(granule, object->content_map);
+		start += CACHEFILES_GRAN_SIZE;
+	} while (start < end);
+
+	set_bit(FSCACHE_COOKIE_NEEDS_UPDATE, &object->cookie->flags);
+	read_unlock_bh(&object->content_map_lock);
+}
+
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 1335ea5f4a5e..c252746c8f9b 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -181,6 +181,8 @@ extern int cachefiles_has_space(struct cachefiles_cache *cache,
  */
 extern int cachefiles_load_content_map(struct cachefiles_object *object);
 extern void cachefiles_save_content_map(struct cachefiles_object *object);
+extern void cachefiles_mark_content_map(struct cachefiles_object *object,
+					loff_t start, loff_t len);
 
 /*
  * daemon.c
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index b513d9bf81f1..27171fac649e 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -264,6 +264,9 @@ static void cachefiles_write_complete(struct kiocb *iocb, long ret)
 	__sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE);
 	__sb_end_write(inode->i_sb, SB_FREEZE_WRITE);
 
+	if (ret == ki->len)
+		cachefiles_mark_content_map(ki->object, ki->start, ki->len);
+
 	if (ret < 0)
 		trace_cachefiles_io_error(object, inode, ret,
 					  cachefiles_trace_write_error);
-- 
2.27.0



* [PATCH RFC 6/9] cachefiles: check content map on read/write
  2022-08-02  3:03 [RFC PATCH 0/9] cachefiles: content map Jingbo Xu
                   ` (4 preceding siblings ...)
  2022-08-02  3:03 ` [PATCH RFC 5/9] cachefiles: mark content map on write to the backing file Jingbo Xu
@ 2022-08-02  3:03 ` Jingbo Xu
  2022-08-02  3:03 ` [PATCH RFC 7/9] cachefiles: free content map on invalidate Jingbo Xu
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Jingbo Xu @ 2022-08-02  3:03 UTC (permalink / raw)
  To: dhowells, linux-cachefs; +Cc: linux-kernel, xiang

cachefiles_find_next_granule() and cachefiles_find_next_hole() are used
to check whether the requested range has been cached. The return values
of these two functions imitate those of SEEK_[DATA|HOLE], so that the
existing code can be reused as much as possible.
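
The SEEK_[DATA|HOLE]-like semantics can be illustrated with a userspace
sketch over a plain byte array. The sketch_* names are hypothetical, the
4KB granule is an assumption, and bits are numbered byte by byte here,
whereas the kernel's find_next_bit() operates on unsigned long words.
As in the patch, returned offsets are granule aligned.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_GRAN_SIZE 4096LL   /* assumed granule size */

/* Test one bit of the map; bit 0 is the lowest bit of map[0]. */
static int sketch_test_bit(const unsigned char *map, size_t bit)
{
	return (map[bit / 8] >> (bit % 8)) & 1;
}

/* Like SEEK_DATA: offset of the next cached granule, or -ENXIO if none. */
static int64_t sketch_next_data(const unsigned char *map, size_t map_bytes,
				int64_t start)
{
	size_t nbits = map_bytes * 8;
	size_t bit;

	for (bit = (size_t)(start / SKETCH_GRAN_SIZE); bit < nbits; bit++)
		if (sketch_test_bit(map, bit))
			return (int64_t)bit * SKETCH_GRAN_SIZE;
	return -ENXIO;
}

/* Like SEEK_HOLE: offset of the next uncached granule, clamped to EOF. */
static int64_t sketch_next_hole(const unsigned char *map, size_t map_bytes,
				int64_t start, int64_t object_size)
{
	size_t nbits = map_bytes * 8;
	size_t bit = (size_t)(start / SKETCH_GRAN_SIZE);
	int64_t pos;

	while (bit < nbits && sketch_test_bit(map, bit))
		bit++;
	pos = (int64_t)bit * SKETCH_GRAN_SIZE;
	return pos < object_size ? pos : object_size;
}
```

With only granules 2 and 3 marked, a data seek from offset 0 lands on
8192 and a hole seek from 8192 lands on 16384, which matches what a
caller written against vfs_llseek() would expect.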

Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
---
 fs/cachefiles/content-map.c | 30 ++++++++++++++++++++++++++++++
 fs/cachefiles/internal.h    |  4 ++++
 fs/cachefiles/io.c          | 36 +++++++++++++++++++++++++++++++-----
 3 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c
index 877ff79e181b..949ec5d9e4c9 100644
--- a/fs/cachefiles/content-map.c
+++ b/fs/cachefiles/content-map.c
@@ -220,3 +220,33 @@ void cachefiles_mark_content_map(struct cachefiles_object *object,
 	read_unlock_bh(&object->content_map_lock);
 }
 
+loff_t cachefiles_find_next_granule(struct cachefiles_object *object,
+				    loff_t start)
+{
+	unsigned long size, granule = start / CACHEFILES_GRAN_SIZE;
+	loff_t result;
+
+	read_lock_bh(&object->content_map_lock);
+	size = object->content_map_size * BITS_PER_BYTE;
+	result = find_next_bit(object->content_map, size, granule);
+	read_unlock_bh(&object->content_map_lock);
+
+	if (result == size)
+		return -ENXIO;
+	return result * CACHEFILES_GRAN_SIZE;
+}
+
+loff_t cachefiles_find_next_hole(struct cachefiles_object *object,
+				 loff_t start)
+{
+	unsigned long size, granule = start / CACHEFILES_GRAN_SIZE;
+	loff_t result;
+
+	read_lock_bh(&object->content_map_lock);
+	size = object->content_map_size * BITS_PER_BYTE;
+	result = find_next_zero_bit(object->content_map, size, granule);
+	read_unlock_bh(&object->content_map_lock);
+
+	return min_t(loff_t, result * CACHEFILES_GRAN_SIZE,
+			     object->cookie->object_size);
+}
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index c252746c8f9b..506700809a6d 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -183,6 +183,10 @@ extern int cachefiles_load_content_map(struct cachefiles_object *object);
 extern void cachefiles_save_content_map(struct cachefiles_object *object);
 extern void cachefiles_mark_content_map(struct cachefiles_object *object,
 					loff_t start, loff_t len);
+extern loff_t cachefiles_find_next_granule(struct cachefiles_object *object,
+					   loff_t start);
+extern loff_t cachefiles_find_next_hole(struct cachefiles_object *object,
+					loff_t start);
 
 /*
  * daemon.c
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 27171fac649e..5c7c84cdafea 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -30,6 +30,32 @@ struct cachefiles_kiocb {
 	u64			b_writing;
 };
 
+static loff_t cachefiles_seek_data(struct cachefiles_object *object,
+		struct file *file, loff_t start)
+{
+	switch (object->content_info) {
+	case CACHEFILES_CONTENT_MAP:
+		return cachefiles_find_next_granule(object, start);
+	case CACHEFILES_CONTENT_BACKFS_MAP:
+		return vfs_llseek(file, start, SEEK_DATA);
+	default:
+		return -EINVAL;
+	}
+}
+
+static loff_t cachefiles_seek_hole(struct cachefiles_object *object,
+		struct file *file, loff_t start)
+{
+	switch (object->content_info) {
+	case CACHEFILES_CONTENT_MAP:
+		return cachefiles_find_next_hole(object, start);
+	case CACHEFILES_CONTENT_BACKFS_MAP:
+		return vfs_llseek(file, start, SEEK_HOLE);
+	default:
+		return -EINVAL;
+	}
+}
+
 static inline void cachefiles_put_kiocb(struct cachefiles_kiocb *ki)
 {
 	if (refcount_dec_and_test(&ki->ki_refcnt)) {
@@ -103,7 +129,7 @@ static int cachefiles_read(struct netfs_cache_resources *cres,
 
 		off2 = cachefiles_inject_read_error();
 		if (off2 == 0)
-			off2 = vfs_llseek(file, off, SEEK_DATA);
+			off2 = cachefiles_seek_data(object, file, off);
 		if (off2 < 0 && off2 >= (loff_t)-MAX_ERRNO && off2 != -ENXIO) {
 			skipped = 0;
 			ret = off2;
@@ -442,7 +468,7 @@ static enum netfs_io_source cachefiles_prepare_read(struct netfs_io_subrequest *
 retry:
 	off = cachefiles_inject_read_error();
 	if (off == 0)
-		off = vfs_llseek(file, subreq->start, SEEK_DATA);
+		off = cachefiles_seek_data(object, file, subreq->start);
 	if (off < 0 && off >= (loff_t)-MAX_ERRNO) {
 		if (off == (loff_t)-ENXIO) {
 			why = cachefiles_trace_read_seek_nxio;
@@ -468,7 +494,7 @@ static enum netfs_io_source cachefiles_prepare_read(struct netfs_io_subrequest *
 
 	to = cachefiles_inject_read_error();
 	if (to == 0)
-		to = vfs_llseek(file, subreq->start, SEEK_HOLE);
+		to = cachefiles_seek_hole(object, file, subreq->start);
 	if (to < 0 && to >= (loff_t)-MAX_ERRNO) {
 		trace_cachefiles_io_error(object, file_inode(file), to,
 					  cachefiles_trace_seek_error);
@@ -537,7 +563,7 @@ int __cachefiles_prepare_write(struct cachefiles_object *object,
 
 	pos = cachefiles_inject_read_error();
 	if (pos == 0)
-		pos = vfs_llseek(file, *_start, SEEK_DATA);
+		pos = cachefiles_seek_data(object, file, *_start);
 	if (pos < 0 && pos >= (loff_t)-MAX_ERRNO) {
 		if (pos == -ENXIO)
 			goto check_space; /* Unallocated tail */
@@ -558,7 +584,7 @@ int __cachefiles_prepare_write(struct cachefiles_object *object,
 
 	pos = cachefiles_inject_read_error();
 	if (pos == 0)
-		pos = vfs_llseek(file, *_start, SEEK_HOLE);
+		pos = cachefiles_seek_hole(object, file, *_start);
 	if (pos < 0 && pos >= (loff_t)-MAX_ERRNO) {
 		trace_cachefiles_io_error(object, file_inode(file), pos,
 					  cachefiles_trace_seek_error);
-- 
2.27.0



* [PATCH RFC 7/9] cachefiles: free content map on invalidate
  2022-08-02  3:03 [RFC PATCH 0/9] cachefiles: content map Jingbo Xu
                   ` (5 preceding siblings ...)
  2022-08-02  3:03 ` [PATCH RFC 6/9] cachefiles: check content map on read/write Jingbo Xu
@ 2022-08-02  3:03 ` Jingbo Xu
  2022-08-02  3:03 ` [PATCH RFC 8/9] cachefiles: resize content map on resize Jingbo Xu
  2022-08-02  3:03 ` [PATCH RFC 9/9] cachefiles: cull content map file on cull Jingbo Xu
  8 siblings, 0 replies; 10+ messages in thread
From: Jingbo Xu @ 2022-08-02  3:03 UTC (permalink / raw)
  To: dhowells, linux-cachefs; +Cc: linux-kernel, xiang

Free the content map when the cached file is invalidated. Also punch a
hole in the backing content map file, if any.

Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
---
 fs/cachefiles/content-map.c | 21 +++++++++++++++++++++
 fs/cachefiles/interface.c   |  1 +
 fs/cachefiles/internal.h    |  1 +
 3 files changed, 23 insertions(+)

diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c
index 949ec5d9e4c9..b73a109844ca 100644
--- a/fs/cachefiles/content-map.c
+++ b/fs/cachefiles/content-map.c
@@ -250,3 +250,24 @@ loff_t cachefiles_find_next_hole(struct cachefiles_object *object,
 	return min_t(loff_t, result * CACHEFILES_GRAN_SIZE,
 			     object->cookie->object_size);
 }
+
+void cachefiles_invalidate_content_map(struct cachefiles_object *object)
+{
+	struct file *file = object->volume->content_map[(u8)object->cookie->key_hash];
+
+	if (object->content_info != CACHEFILES_CONTENT_MAP)
+		return;
+
+	write_lock_bh(&object->content_map_lock);
+	free_pages((unsigned long)object->content_map,
+		   get_order(object->content_map_size));
+	object->content_map = NULL;
+
+	if (object->content_map_off != CACHEFILES_CONTENT_MAP_OFF_INVAL) {
+		vfs_fallocate(file, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+				object->content_map_off, object->content_map_size);
+		object->content_map_off = CACHEFILES_CONTENT_MAP_OFF_INVAL;
+	}
+	object->content_map_size = 0;
+	write_unlock_bh(&object->content_map_lock);
+}
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 4cfbdc87b635..f87b9a665d85 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -409,6 +409,7 @@ static bool cachefiles_invalidate_cookie(struct fscache_cookie *cookie)
 
 	old_file = object->file;
 	object->file = new_file;
+	cachefiles_invalidate_content_map(object);
 	object->content_info = CACHEFILES_CONTENT_NO_DATA;
 	set_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags);
 	set_bit(FSCACHE_COOKIE_NEEDS_UPDATE, &object->cookie->flags);
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 506700809a6d..c674c4e42529 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -187,6 +187,7 @@ extern loff_t cachefiles_find_next_granule(struct cachefiles_object *object,
 					   loff_t start);
 extern loff_t cachefiles_find_next_hole(struct cachefiles_object *object,
 					loff_t start);
+extern void cachefiles_invalidate_content_map(struct cachefiles_object *object);
 
 /*
  * daemon.c
-- 
2.27.0



* [PATCH RFC 8/9] cachefiles: resize content map on resize
  2022-08-02  3:03 [RFC PATCH 0/9] cachefiles: content map Jingbo Xu
                   ` (6 preceding siblings ...)
  2022-08-02  3:03 ` [PATCH RFC 7/9] cachefiles: free content map on invalidate Jingbo Xu
@ 2022-08-02  3:03 ` Jingbo Xu
  2022-08-02  3:03 ` [PATCH RFC 9/9] cachefiles: cull content map file on cull Jingbo Xu
  8 siblings, 0 replies; 10+ messages in thread
From: Jingbo Xu @ 2022-08-02  3:03 UTC (permalink / raw)
  To: dhowells, linux-cachefs; +Cc: linux-kernel, xiang

Adjust the content map when a backing object is shortened. In this case,
only the now-unused tail of the content map is zeroed, while the size of
the content map itself is unchanged. The corresponding range in the
backing content map file is also left unchanged.

Likewise, the content map and the corresponding range in the backing
content map file are not touched when a backing object is expanded. They
are lazily expanded later, at runtime.
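
Zeroing the unused tail can be sketched in userspace as follows. The
real work is done by cachefiles_zero_content_map(), which is introduced
earlier in the series and not shown here, so sketch_zero_tail() is a
hypothetical stand-in. It assumes a 4KB granule, byte-wise bit
numbering, and that a granule partially covered by the new size keeps
its bit.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_GRAN_SIZE 4096ULL   /* assumed granule size */

/*
 * Clear every bit whose granule lies entirely at or beyond new_size.
 * The map itself keeps its size; only its tail is zeroed.
 */
static void sketch_zero_tail(unsigned char *map, size_t map_bytes,
			     uint64_t new_size)
{
	size_t nbits = map_bytes * 8;
	size_t first = (size_t)((new_size + SKETCH_GRAN_SIZE - 1) / SKETCH_GRAN_SIZE);
	size_t bit;

	for (bit = first; bit < nbits; bit++)
		map[bit / 8] &= (unsigned char)~(1u << (bit % 8));
}
```

Shrinking a fully cached 16-granule object to three granules leaves
only the three lowest bits set.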

Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
---
 fs/cachefiles/content-map.c | 23 +++++++++++++++++++++++
 fs/cachefiles/interface.c   |  1 +
 fs/cachefiles/internal.h    |  2 ++
 3 files changed, 26 insertions(+)

diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c
index b73a109844ca..360c59b06670 100644
--- a/fs/cachefiles/content-map.c
+++ b/fs/cachefiles/content-map.c
@@ -271,3 +271,26 @@ void cachefiles_invalidate_content_map(struct cachefiles_object *object)
 	}
 	write_unlock_bh(&object->content_map_lock);
 }
+
+/*
+ * Adjust the content map when we shorten a backing object.
+ */
+void cachefiles_shorten_content_map(struct cachefiles_object *object,
+				    loff_t new_size)
+{
+	if (object->content_info != CACHEFILES_CONTENT_MAP)
+		return;
+
+	read_lock_bh(&object->content_map_lock);
+	/*
+	 * Nothing needs to be done when content map has not been allocated yet.
+	 */
+	if (!object->content_map_size)
+		goto out;
+
+	if (cachefiles_map_size(new_size) <= object->content_map_size)
+		cachefiles_zero_content_map(object->content_map,
+				object->content_map_size, new_size);
+out:
+	read_unlock_bh(&object->content_map_lock);
+}
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index f87b9a665d85..76f70a9ebe50 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -290,6 +290,7 @@ static void cachefiles_resize_cookie(struct netfs_cache_resources *cres,
 		cachefiles_begin_secure(cache, &saved_cred);
 		cachefiles_shorten_object(object, file, new_size);
 		cachefiles_end_secure(cache, saved_cred);
+		cachefiles_shorten_content_map(object, new_size);
 		object->cookie->object_size = new_size;
 		return;
 	}
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index c674c4e42529..7747f99f00c1 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -188,6 +188,8 @@ extern loff_t cachefiles_find_next_granule(struct cachefiles_object *object,
 extern loff_t cachefiles_find_next_hole(struct cachefiles_object *object,
 					loff_t start);
 extern void cachefiles_invalidate_content_map(struct cachefiles_object *object);
+extern void cachefiles_shorten_content_map(struct cachefiles_object *object,
+					   loff_t new_size);
 
 /*
  * daemon.c
-- 
2.27.0



* [PATCH RFC 9/9] cachefiles: cull content map file on cull
  2022-08-02  3:03 [RFC PATCH 0/9] cachefiles: content map Jingbo Xu
                   ` (7 preceding siblings ...)
  2022-08-02  3:03 ` [PATCH RFC 8/9] cachefiles: resize content map on resize Jingbo Xu
@ 2022-08-02  3:03 ` Jingbo Xu
  8 siblings, 0 replies; 10+ messages in thread
From: Jingbo Xu @ 2022-08-02  3:03 UTC (permalink / raw)
  To: dhowells, linux-cachefs; +Cc: linux-kernel, xiang

Also punch a hole in the backing content map file when the backing
object gets culled.

When cachefilesd is going to cull a whole directory, the whole
directory is moved to the graveyard and cachefilesd itself then
removes all files under the directory one by one. Since each
sub-directory under a volume maintains its own backing content map file,
cachefilesd already works well with this bitmap-based mechanism and
doesn't need any refactoring.

Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
---
 fs/cachefiles/content-map.c | 37 +++++++++++++++++++++++++++++++++++++
 fs/cachefiles/internal.h    |  4 ++++
 fs/cachefiles/namei.c       |  4 ++++
 fs/cachefiles/xattr.c       | 17 +++++++++++++++++
 4 files changed, 62 insertions(+)

diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c
index 360c59b06670..5584a0182df9 100644
--- a/fs/cachefiles/content-map.c
+++ b/fs/cachefiles/content-map.c
@@ -294,3 +294,40 @@ void cachefiles_shorten_content_map(struct cachefiles_object *object,
 out:
 	read_unlock_bh(&object->content_map_lock);
 }
+
+int cachefiles_cull_content_map(struct cachefiles_cache *cache,
+				struct dentry *dir, struct dentry *victim)
+{
+	struct dentry *map;
+	struct file *map_file;
+	size_t content_map_size = 0;
+	loff_t content_map_off = 0;
+	struct path path;
+	int ret;
+
+	if (!d_is_reg(victim))
+		return 0;
+
+	ret = cachefiles_get_content_info(victim, &content_map_size, &content_map_off);
+	if (ret || !content_map_size)
+		return ret;
+
+	map = lookup_positive_unlocked("Map", dir, strlen("Map"));
+	if (IS_ERR(map))
+		return PTR_ERR(map);
+
+	path.mnt = cache->mnt;
+	path.dentry = map;
+	map_file = open_with_fake_path(&path, O_RDWR | O_LARGEFILE,
+			d_backing_inode(map), cache->cache_cred);
+	/* An opened file pins the dentry itself; drop the lookup reference. */
+	dput(map);
+	if (IS_ERR(map_file))
+		return PTR_ERR(map_file);
+
+	ret = vfs_fallocate(map_file, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+			      content_map_off, content_map_size);
+
+	fput(map_file);
+	return ret;
+}
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 7747f99f00c1..9c36631ee051 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -190,6 +190,8 @@ extern loff_t cachefiles_find_next_hole(struct cachefiles_object *object,
 extern void cachefiles_invalidate_content_map(struct cachefiles_object *object);
 extern void cachefiles_shorten_content_map(struct cachefiles_object *object,
 					   loff_t new_size);
+extern int cachefiles_cull_content_map(struct cachefiles_cache *cache,
+				struct dentry *dir, struct dentry *victim);
 
 /*
  * daemon.c
@@ -384,6 +386,8 @@ extern int cachefiles_remove_object_xattr(struct cachefiles_cache *cache,
 extern void cachefiles_prepare_to_write(struct fscache_cookie *cookie);
 extern bool cachefiles_set_volume_xattr(struct cachefiles_volume *volume);
 extern int cachefiles_check_volume_xattr(struct cachefiles_volume *volume);
+extern int cachefiles_get_content_info(struct dentry *dentry,
+		size_t *content_map_size, loff_t *content_map_off);
 
 /*
  * Error handling
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index f5e1ec1d9445..79c759468ab3 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -923,6 +923,10 @@ int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
 	if (ret < 0)
 		goto error_unlock;
 
+	ret = cachefiles_cull_content_map(cache, dir, victim);
+	if (ret < 0)
+		goto error;
+
 	ret = cachefiles_bury_object(cache, NULL, dir, victim,
 				     FSCACHE_OBJECT_WAS_CULLED);
 	if (ret < 0)
diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c
index 05ac6b70787a..b7091c8e4262 100644
--- a/fs/cachefiles/xattr.c
+++ b/fs/cachefiles/xattr.c
@@ -283,3 +283,20 @@ int cachefiles_check_volume_xattr(struct cachefiles_volume *volume)
 	_leave(" = %d", ret);
 	return ret;
 }
+
+int cachefiles_get_content_info(struct dentry *dentry, size_t *content_map_size,
+				loff_t *content_map_off)
+{
+	struct cachefiles_xattr buf;
+	ssize_t xlen, tlen = sizeof(buf);
+
+	xlen = vfs_getxattr(&init_user_ns, dentry, cachefiles_xattr_cache, &buf, tlen);
+	if (xlen != tlen)
+		return -ESTALE;
+
+	if (buf.content == CACHEFILES_CONTENT_MAP) {
+		*content_map_off = be64_to_cpu(buf.content_map_off);
+		*content_map_size = be64_to_cpu(buf.content_map_size);
+	}
+	return 0;
+}
-- 
2.27.0

