All of lore.kernel.org
* [RFC PATCH v2 00/11] ceph: convert to new FSCache API
@ 2020-07-31 13:04 Jeff Layton
  2020-07-31 13:04 ` [RFC PATCH v2 01/11] ceph: break out writeback of incompatible snap context to separate function Jeff Layton
                   ` (10 more replies)
  0 siblings, 11 replies; 19+ messages in thread
From: Jeff Layton @ 2020-07-31 13:04 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-cachefs, idryomov

This patchset converts ceph to use the new (not yet merged) FSCache API.
Trying to use fscache+ceph today usually results in oopses. With this
series, it seems to be quite stable.

Where possible, I've converted the code to use the new read helper, which
hides away a lot of the gory details of page handling. I think that makes
the resulting code clearer than it was.

It starts with a few cleanup/reorganization patches to prepare the code. I then
rip out most of the old ceph fscache helpers and replace them with new
ones for the new API.

The rest of the series then plugs buffered read/write caching support
back into the code, with most of the read-side routines going through
fscache_read_helper.

This passes xfstests' quick group run with the cache disabled. With the
cache enabled, most of the quick group passes, but I hit some OOM kills
on generic/531. I'm still tracking that down, but we suspect the problem
lies in the fscache/cachefiles code and not in these patches.

This is based on top of David's latest fscache-iter branch:

    https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter

...my branch is here:

    https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git/log/?h=ceph-fscache-iter

Jeff Layton (11):
  ceph: break out writeback of incompatible snap context to separate
    function
  ceph: don't call ceph_update_writeable_page from page_mkwrite
  ceph: fold ceph_sync_readpages into ceph_readpage
  ceph: fold ceph_sync_writepages into writepage_nounlock
  ceph: fold ceph_update_writeable_page into ceph_write_begin
  ceph: conversion to new fscache API
  ceph: convert readpage to fscache read helper
  ceph: plug write_begin into read helper
  ceph: convert readpages to fscache_read_helper
  ceph: add fscache writeback support
  ceph: re-enable fscache support

 fs/ceph/Kconfig |   4 +-
 fs/ceph/addr.c  | 939 +++++++++++++++++++++++++++---------------------
 fs/ceph/cache.c | 290 ++++-----------
 fs/ceph/cache.h | 106 ++----
 fs/ceph/caps.c  |  11 +-
 fs/ceph/file.c  |  13 +-
 fs/ceph/inode.c |  14 +-
 fs/ceph/super.h |   1 -
 8 files changed, 645 insertions(+), 733 deletions(-)

-- 
2.26.2

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC PATCH v2 01/11] ceph: break out writeback of incompatible snap context to separate function
  2020-07-31 13:04 [RFC PATCH v2 00/11] ceph: convert to new FSCache API Jeff Layton
@ 2020-07-31 13:04 ` Jeff Layton
  2020-07-31 13:04 ` [RFC PATCH v2 02/11] ceph: don't call ceph_update_writeable_page from page_mkwrite Jeff Layton
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Jeff Layton @ 2020-07-31 13:04 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-cachefs, idryomov

When dirtying a page, we have to flush any incompatible snap contexts.
Move that logic into a separate function, ceph_find_incompatible.
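The new helper folds three outcomes into a single pointer return. For readers unfamiliar with the kernel's ERR_PTR convention it relies on, here is a minimal userspace sketch: ERR_PTR/IS_ERR/PTR_ERR mirror include/linux/err.h, while find_incompatible() is a hypothetical stand-in for illustration, not the actual ceph code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace model of the kernel's ERR_PTR convention: one pointer return
 * carries three outcomes, as the reworked ceph_find_incompatible() does --
 * NULL (nothing to do), an encoded negative errno (hard failure), or a
 * real pointer the caller must wait on before retrying.
 */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* error pointers live in the top MAX_ERRNO bytes of address space */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct snap_context { int seq; };

/* Hypothetical stand-in: outcome < 0 simulates a hard error, > 0 an
 * incompatible snap context to wait on, 0 a page that is safe to dirty. */
static void *find_incompatible(int outcome)
{
	static struct snap_context snapc = { .seq = 1 };

	if (outcome < 0)
		return ERR_PTR(outcome);	/* e.g. -EIO on forced umount */
	if (outcome > 0)
		return &snapc;			/* caller must wait, then retry */
	return NULL;				/* safe to dirty the page */
}
```

A caller checks `IS_ERR()` first, converts with `PTR_ERR()` on failure, and otherwise treats a non-NULL result as a context to wait on, exactly the shape of the new retry loops in this series.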

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/addr.c | 96 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 60 insertions(+), 36 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 6ea761c84494..d8a8803f0e65 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1299,40 +1299,34 @@ static int context_is_writeable_or_written(struct inode *inode,
 	return ret;
 }
 
-/*
- * We are only allowed to write into/dirty the page if the page is
- * clean, or already dirty within the same snap context.
+/**
+ * ceph_find_incompatible - find an incompatible context and return it
+ * @inode: inode associated with page
+ * @page: page being dirtied
  *
- * called with page locked.
- * return success with page locked,
- * or any failure (incl -EAGAIN) with page unlocked.
+ * Returns NULL on success, an ERR_PTR on error, or a snapc ref that should be
+ * waited on otherwise.
  */
-static int ceph_update_writeable_page(struct file *file,
-			    loff_t pos, unsigned len,
-			    struct page *page)
+static struct ceph_snap_context *
+ceph_find_incompatible(struct inode *inode, struct page *page)
 {
-	struct inode *inode = file_inode(file);
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
 	struct ceph_inode_info *ci = ceph_inode(inode);
-	loff_t page_off = pos & PAGE_MASK;
-	int pos_in_page = pos & ~PAGE_MASK;
-	int end_in_page = pos_in_page + len;
-	loff_t i_size;
-	int r;
-	struct ceph_snap_context *snapc, *oldest;
 
 	if (READ_ONCE(fsc->mount_state) == CEPH_MOUNT_SHUTDOWN) {
 		dout(" page %p forced umount\n", page);
-		unlock_page(page);
-		return -EIO;
+		return ERR_PTR(-EIO);
 	}
 
-retry_locked:
-	/* writepages currently holds page lock, but if we change that later, */
-	wait_on_page_writeback(page);
+	for (;;) {
+		struct ceph_snap_context *snapc, *oldest;
+
+		wait_on_page_writeback(page);
+
+		snapc = page_snap_context(page);
+		if (!snapc || snapc == ci->i_head_snapc)
+			break;
 
-	snapc = page_snap_context(page);
-	if (snapc && snapc != ci->i_head_snapc) {
 		/*
 		 * this page is already dirty in another (older) snap
 		 * context!  is it writeable now?
@@ -1347,26 +1341,56 @@ static int ceph_update_writeable_page(struct file *file,
 			 * be writeable or written
 			 */
 			snapc = ceph_get_snap_context(snapc);
-			unlock_page(page);
-			ceph_queue_writeback(inode);
-			r = wait_event_killable(ci->i_cap_wq,
-			       context_is_writeable_or_written(inode, snapc));
-			ceph_put_snap_context(snapc);
-			if (r == -ERESTARTSYS)
-				return r;
-			return -EAGAIN;
+			return snapc;
 		}
 		ceph_put_snap_context(oldest);
 
 		/* yay, writeable, do it now (without dropping page lock) */
 		dout(" page %p snapc %p not current, but oldest\n",
 		     page, snapc);
-		if (!clear_page_dirty_for_io(page))
-			goto retry_locked;
-		r = writepage_nounlock(page, NULL);
-		if (r < 0)
+		if (clear_page_dirty_for_io(page)) {
+			int r = writepage_nounlock(page, NULL);
+			if (r < 0)
+				return ERR_PTR(r);
+		}
+	}
+	return NULL;
+}
+
+/*
+ * We are only allowed to write into/dirty the page if the page is
+ * clean, or already dirty within the same snap context.
+ *
+ * called with page locked.
+ * return success with page locked,
+ * or any failure (incl -EAGAIN) with page unlocked.
+ */
+static int ceph_update_writeable_page(struct file *file,
+			    loff_t pos, unsigned len,
+			    struct page *page)
+{
+	struct inode *inode = file_inode(file);
+	struct ceph_inode_info *ci = ceph_inode(inode);
+	struct ceph_snap_context *snapc;
+	loff_t page_off = pos & PAGE_MASK;
+	int pos_in_page = pos & ~PAGE_MASK;
+	int end_in_page = pos_in_page + len;
+	loff_t i_size;
+	int r;
+
+retry_locked:
+	snapc = ceph_find_incompatible(inode, page);
+	if (snapc) {
+		if (IS_ERR(snapc)) {
+			r = PTR_ERR(snapc);
 			goto fail_unlock;
-		goto retry_locked;
+		}
+		unlock_page(page);
+		ceph_queue_writeback(inode);
+		r = wait_event_killable(ci->i_cap_wq,
+					context_is_writeable_or_written(inode, snapc));
+		ceph_put_snap_context(snapc);
+		return -EAGAIN;
 	}
 
 	if (PageUptodate(page)) {
-- 
2.26.2


* [RFC PATCH v2 02/11] ceph: don't call ceph_update_writeable_page from page_mkwrite
  2020-07-31 13:04 [RFC PATCH v2 00/11] ceph: convert to new FSCache API Jeff Layton
  2020-07-31 13:04 ` [RFC PATCH v2 01/11] ceph: break out writeback of incompatible snap context to separate function Jeff Layton
@ 2020-07-31 13:04 ` Jeff Layton
  2020-07-31 13:04 ` [RFC PATCH v2 03/11] ceph: fold ceph_sync_readpages into ceph_readpage Jeff Layton
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Jeff Layton @ 2020-07-31 13:04 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-cachefs, idryomov

page_mkwrite should only be called with Uptodate pages, so we should
only need to flush incompatible snap contexts.
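The resulting page_mkwrite loop either dirties the page while holding its lock or drops the lock and waits for writeback before retrying. The shape of that loop can be modeled in plain userspace C; every name below is a hypothetical stand-in, not kernel code.

```c
#include <assert.h>

/*
 * Toy model of the reworked ceph_page_mkwrite() loop: take the page lock,
 * look for an incompatible snap context, and either dirty the page with
 * the lock still held, or drop the lock and wait before trying again.
 */
struct fake_page {
	int pending_snapc;	/* incompatible contexts not yet written back */
	int locked;
	int dirty;
};

static int find_incompatible(struct fake_page *p)
{
	return p->pending_snapc > 0;
}

static void wait_for_writeback(struct fake_page *p)
{
	p->pending_snapc--;	/* pretend one old snap context got flushed */
}

/* returns the number of times the page lock had to be taken */
static int fake_mkwrite(struct fake_page *p)
{
	int attempts = 0;

	for (;;) {
		attempts++;
		p->locked = 1;			/* lock_page() */
		if (!find_incompatible(p)) {
			p->dirty = 1;		/* set_page_dirty(); VM_FAULT_LOCKED */
			break;			/* success: lock still held */
		}
		p->locked = 0;			/* unlock_page() */
		wait_for_writeback(p);		/* wait_event_killable(...) */
	}
	return attempts;
}
```

With two stale contexts outstanding, the loop takes the lock three times: twice to discover it must wait, once to succeed.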

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/addr.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index d8a8803f0e65..e02d8915376f 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1692,6 +1692,8 @@ static vm_fault_t ceph_page_mkwrite(struct vm_fault *vmf)
 	inode_inc_iversion_raw(inode);
 
 	do {
+		struct ceph_snap_context *snapc;
+
 		lock_page(page);
 
 		if (page_mkwrite_check_truncate(page, inode) < 0) {
@@ -1700,13 +1702,26 @@ static vm_fault_t ceph_page_mkwrite(struct vm_fault *vmf)
 			break;
 		}
 
-		err = ceph_update_writeable_page(vma->vm_file, off, len, page);
-		if (err >= 0) {
+		snapc = ceph_find_incompatible(inode, page);
+		if (!snapc) {
 			/* success.  we'll keep the page locked. */
 			set_page_dirty(page);
 			ret = VM_FAULT_LOCKED;
+			break;
 		}
-	} while (err == -EAGAIN);
+
+		unlock_page(page);
+
+		if (IS_ERR(snapc)) {
+			ret = VM_FAULT_SIGBUS;
+			break;
+		}
+
+		ceph_queue_writeback(inode);
+		err = wait_event_killable(ci->i_cap_wq,
+				context_is_writeable_or_written(inode, snapc));
+		ceph_put_snap_context(snapc);
+	} while (err == 0);
 
 	if (ret == VM_FAULT_LOCKED ||
 	    ci->i_inline_version != CEPH_INLINE_NONE) {
-- 
2.26.2


* [RFC PATCH v2 03/11] ceph: fold ceph_sync_readpages into ceph_readpage
  2020-07-31 13:04 [RFC PATCH v2 00/11] ceph: convert to new FSCache API Jeff Layton
  2020-07-31 13:04 ` [RFC PATCH v2 01/11] ceph: break out writeback of incompatible snap context to separate function Jeff Layton
  2020-07-31 13:04 ` [RFC PATCH v2 02/11] ceph: don't call ceph_update_writeable_page from page_mkwrite Jeff Layton
@ 2020-07-31 13:04 ` Jeff Layton
  2020-07-31 13:04 ` [RFC PATCH v2 04/11] ceph: fold ceph_sync_writepages into writepage_nounlock Jeff Layton
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Jeff Layton @ 2020-07-31 13:04 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-cachefs, idryomov

ceph_sync_readpages has only one caller, and folding it in will make the
upcoming reorganization easier.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/addr.c | 79 ++++++++++++++++----------------------------------
 1 file changed, 25 insertions(+), 54 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index e02d8915376f..a30e18495f70 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -182,58 +182,15 @@ static int ceph_releasepage(struct page *page, gfp_t g)
 	return !PagePrivate(page);
 }
 
-/*
- * Read some contiguous pages.  If we cross a stripe boundary, shorten
- * *plen.  Return number of bytes read, or error.
- */
-static int ceph_sync_readpages(struct ceph_fs_client *fsc,
-			       struct ceph_vino vino,
-			       struct ceph_file_layout *layout,
-			       u64 off, u64 *plen,
-			       u32 truncate_seq, u64 truncate_size,
-			       struct page **pages, int num_pages,
-			       int page_align)
-{
-	struct ceph_osd_client *osdc = &fsc->client->osdc;
-	struct ceph_osd_request *req;
-	int rc = 0;
-
-	dout("readpages on ino %llx.%llx on %llu~%llu\n", vino.ino,
-	     vino.snap, off, *plen);
-	req = ceph_osdc_new_request(osdc, layout, vino, off, plen, 0, 1,
-				    CEPH_OSD_OP_READ, CEPH_OSD_FLAG_READ,
-				    NULL, truncate_seq, truncate_size,
-				    false);
-	if (IS_ERR(req))
-		return PTR_ERR(req);
-
-	/* it may be a short read due to an object boundary */
-	osd_req_op_extent_osd_data_pages(req, 0,
-				pages, *plen, page_align, false, false);
-
-	dout("readpages  final extent is %llu~%llu (%llu bytes align %d)\n",
-	     off, *plen, *plen, page_align);
-
-	rc = ceph_osdc_start_request(osdc, req, false);
-	if (!rc)
-		rc = ceph_osdc_wait_request(osdc, req);
-
-	ceph_update_read_latency(&fsc->mdsc->metric, req->r_start_latency,
-				 req->r_end_latency, rc);
-
-	ceph_osdc_put_request(req);
-	dout("readpages result %d\n", rc);
-	return rc;
-}
-
-/*
- * read a single page, without unlocking it.
- */
+/* read a single page, without unlocking it. */
 static int ceph_do_readpage(struct file *filp, struct page *page)
 {
 	struct inode *inode = file_inode(filp);
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
+	struct ceph_osd_client *osdc = &fsc->client->osdc;
+	struct ceph_osd_request *req;
+	struct ceph_vino vino = ceph_vino(inode);
 	int err = 0;
 	u64 off = page_offset(page);
 	u64 len = PAGE_SIZE;
@@ -260,12 +217,27 @@ static int ceph_do_readpage(struct file *filp, struct page *page)
 	if (err == 0)
 		return -EINPROGRESS;
 
-	dout("readpage inode %p file %p page %p index %lu\n",
-	     inode, filp, page, page->index);
-	err = ceph_sync_readpages(fsc, ceph_vino(inode),
-				  &ci->i_layout, off, &len,
-				  ci->i_truncate_seq, ci->i_truncate_size,
-				  &page, 1, 0);
+	dout("readpage ino %llx.%llx file %p off %llu len %llu page %p index %lu\n",
+	     vino.ino, vino.snap, filp, off, len, page, page->index);
+	req = ceph_osdc_new_request(osdc, &ci->i_layout, vino, off, &len, 0, 1,
+				    CEPH_OSD_OP_READ, CEPH_OSD_FLAG_READ, NULL,
+				    ci->i_truncate_seq, ci->i_truncate_size,
+				    false);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
+
+	osd_req_op_extent_osd_data_pages(req, 0, &page, len, 0, false, false);
+
+	err = ceph_osdc_start_request(osdc, req, false);
+	if (!err)
+		err = ceph_osdc_wait_request(osdc, req);
+
+	ceph_update_read_latency(&fsc->mdsc->metric, req->r_start_latency,
+				 req->r_end_latency, err);
+
+	ceph_osdc_put_request(req);
+	dout("readpage result %d\n", err);
+
 	if (err == -ENOENT)
 		err = 0;
 	if (err < 0) {
@@ -283,7 +255,6 @@ static int ceph_do_readpage(struct file *filp, struct page *page)
 
 	SetPageUptodate(page);
 	ceph_readpage_to_fscache(inode, page);
-
 out:
 	return err < 0 ? err : 0;
 }
-- 
2.26.2


* [RFC PATCH v2 04/11] ceph: fold ceph_sync_writepages into writepage_nounlock
  2020-07-31 13:04 [RFC PATCH v2 00/11] ceph: convert to new FSCache API Jeff Layton
                   ` (2 preceding siblings ...)
  2020-07-31 13:04 ` [RFC PATCH v2 03/11] ceph: fold ceph_sync_readpages into ceph_readpage Jeff Layton
@ 2020-07-31 13:04 ` Jeff Layton
  2020-07-31 13:04 ` [RFC PATCH v2 05/11] ceph: fold ceph_update_writeable_page into ceph_write_begin Jeff Layton
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Jeff Layton @ 2020-07-31 13:04 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-cachefs, idryomov

ceph_sync_writepages has only one caller, and folding it in will make the
code easier to refactor.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/addr.c | 93 +++++++++++++++++++-------------------------------
 1 file changed, 35 insertions(+), 58 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index a30e18495f70..a04eaf75480b 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -590,50 +590,6 @@ static u64 get_writepages_data_length(struct inode *inode,
 	return end > start ? end - start : 0;
 }
 
-/*
- * do a synchronous write on N pages
- */
-static int ceph_sync_writepages(struct ceph_fs_client *fsc,
-				struct ceph_vino vino,
-				struct ceph_file_layout *layout,
-				struct ceph_snap_context *snapc,
-				u64 off, u64 len,
-				u32 truncate_seq, u64 truncate_size,
-				struct timespec64 *mtime,
-				struct page **pages, int num_pages)
-{
-	struct ceph_osd_client *osdc = &fsc->client->osdc;
-	struct ceph_osd_request *req;
-	int rc = 0;
-	int page_align = off & ~PAGE_MASK;
-
-	req = ceph_osdc_new_request(osdc, layout, vino, off, &len, 0, 1,
-				    CEPH_OSD_OP_WRITE, CEPH_OSD_FLAG_WRITE,
-				    snapc, truncate_seq, truncate_size,
-				    true);
-	if (IS_ERR(req))
-		return PTR_ERR(req);
-
-	/* it may be a short write due to an object boundary */
-	osd_req_op_extent_osd_data_pages(req, 0, pages, len, page_align,
-				false, false);
-	dout("writepages %llu~%llu (%llu bytes)\n", off, len, len);
-
-	req->r_mtime = *mtime;
-	rc = ceph_osdc_start_request(osdc, req, true);
-	if (!rc)
-		rc = ceph_osdc_wait_request(osdc, req);
-
-	ceph_update_write_latency(&fsc->mdsc->metric, req->r_start_latency,
-				  req->r_end_latency, rc);
-
-	ceph_osdc_put_request(req);
-	if (rc == 0)
-		rc = len;
-	dout("writepages result %d\n", rc);
-	return rc;
-}
-
 /*
  * Write a single page, but leave the page locked.
  *
@@ -642,20 +598,19 @@ static int ceph_sync_writepages(struct ceph_fs_client *fsc,
  */
 static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
 {
-	struct inode *inode;
-	struct ceph_inode_info *ci;
-	struct ceph_fs_client *fsc;
+	struct inode *inode = page->mapping->host;
+	struct ceph_inode_info *ci = ceph_inode(inode);
+	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
 	struct ceph_snap_context *snapc, *oldest;
 	loff_t page_off = page_offset(page);
-	int err, len = PAGE_SIZE;
+	int err;
+	loff_t len = PAGE_SIZE;
 	struct ceph_writeback_ctl ceph_wbc;
+	struct ceph_osd_client *osdc = &fsc->client->osdc;
+	struct ceph_osd_request *req;
 
 	dout("writepage %p idx %lu\n", page, page->index);
 
-	inode = page->mapping->host;
-	ci = ceph_inode(inode);
-	fsc = ceph_inode_to_client(inode);
-
 	/* verify this is a writeable snap context */
 	snapc = page_snap_context(page);
 	if (!snapc) {
@@ -684,7 +639,7 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
 	if (ceph_wbc.i_size < page_off + len)
 		len = ceph_wbc.i_size - page_off;
 
-	dout("writepage %p page %p index %lu on %llu~%u snapc %p seq %lld\n",
+	dout("writepage %p page %p index %lu on %llu~%llu snapc %p seq %lld\n",
 	     inode, page, page->index, page_off, len, snapc, snapc->seq);
 
 	if (atomic_long_inc_return(&fsc->writeback_count) >
@@ -692,11 +647,33 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
 		set_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
 
 	set_page_writeback(page);
-	err = ceph_sync_writepages(fsc, ceph_vino(inode),
-				   &ci->i_layout, snapc, page_off, len,
-				   ceph_wbc.truncate_seq,
-				   ceph_wbc.truncate_size,
-				   &inode->i_mtime, &page, 1);
+	req = ceph_osdc_new_request(osdc, &ci->i_layout, ceph_vino(inode), page_off, &len, 0, 1,
+				    CEPH_OSD_OP_WRITE, CEPH_OSD_FLAG_WRITE, snapc,
+				    ceph_wbc.truncate_seq, ceph_wbc.truncate_size,
+				    true);
+	if (IS_ERR(req)) {
+		redirty_page_for_writepage(wbc, page);
+		end_page_writeback(page);
+		return PTR_ERR(req);
+	}
+
+	/* it may be a short write due to an object boundary */
+	WARN_ON_ONCE(len > PAGE_SIZE);
+	osd_req_op_extent_osd_data_pages(req, 0, &page, len, 0, false, false);
+	dout("writepage %llu~%llu (%llu bytes)\n", page_off, len, len);
+
+	req->r_mtime = inode->i_mtime;
+	err = ceph_osdc_start_request(osdc, req, true);
+	if (!err)
+		err = ceph_osdc_wait_request(osdc, req);
+
+	ceph_update_write_latency(&fsc->mdsc->metric, req->r_start_latency,
+				  req->r_end_latency, err);
+
+	ceph_osdc_put_request(req);
+	if (err == 0)
+		err = len;
+
 	if (err < 0) {
 		struct writeback_control tmp_wbc;
 		if (!wbc)
-- 
2.26.2


* [RFC PATCH v2 05/11] ceph: fold ceph_update_writeable_page into ceph_write_begin
  2020-07-31 13:04 [RFC PATCH v2 00/11] ceph: convert to new FSCache API Jeff Layton
                   ` (3 preceding siblings ...)
  2020-07-31 13:04 ` [RFC PATCH v2 04/11] ceph: fold ceph_sync_writepages into writepage_nounlock Jeff Layton
@ 2020-07-31 13:04 ` Jeff Layton
  2020-07-31 13:04 ` [RFC PATCH v2 06/11] ceph: conversion to new fscache API Jeff Layton
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Jeff Layton @ 2020-07-31 13:04 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-cachefs, idryomov

...and reorganize the loop for better clarity.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/addr.c | 141 ++++++++++++++++++++++---------------------------
 1 file changed, 62 insertions(+), 79 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index a04eaf75480b..f4df04769761 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1252,6 +1252,8 @@ static int context_is_writeable_or_written(struct inode *inode,
  * @inode: inode associated with page
  * @page: page being dirtied
  *
+ * We are only allowed to write into/dirty the page if the page is
+ * clean, or already dirty within the same snap context.
+ * Returns NULL on success, an ERR_PTR on error, or a snapc ref that should be
+ * waited on otherwise.
  */
@@ -1308,104 +1310,85 @@ ceph_find_incompatible(struct inode *inode, struct page *page)
 /*
  * We are only allowed to write into/dirty the page if the page is
  * clean, or already dirty within the same snap context.
- *
- * called with page locked.
- * return success with page locked,
- * or any failure (incl -EAGAIN) with page unlocked.
  */
-static int ceph_update_writeable_page(struct file *file,
-			    loff_t pos, unsigned len,
-			    struct page *page)
+static int ceph_write_begin(struct file *file, struct address_space *mapping,
+			    loff_t pos, unsigned len, unsigned flags,
+			    struct page **pagep, void **fsdata)
 {
 	struct inode *inode = file_inode(file);
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_snap_context *snapc;
+	struct page *page = NULL;
+	pgoff_t index = pos >> PAGE_SHIFT;
 	loff_t page_off = pos & PAGE_MASK;
 	int pos_in_page = pos & ~PAGE_MASK;
 	int end_in_page = pos_in_page + len;
 	loff_t i_size;
 	int r;
+refind:
+	/* get a page */
+	page = grab_cache_page_write_begin(mapping, index, 0);
+	if (!page)
+		return -ENOMEM;
 
-retry_locked:
-	snapc = ceph_find_incompatible(inode, page);
-	if (snapc) {
-		if (IS_ERR(snapc)) {
-			r = PTR_ERR(snapc);
-			goto fail_unlock;
+	dout("write_begin file %p inode %p page %p %d~%d\n", file,
+	     inode, page, (int)pos, (int)len);
+
+	for (;;) {
+		snapc = ceph_find_incompatible(inode, page);
+		if (snapc) {
+			if (IS_ERR(snapc)) {
+				r = PTR_ERR(snapc);
+				break;
+			}
+			unlock_page(page);
+			ceph_queue_writeback(inode);
+			r = wait_event_killable(ci->i_cap_wq,
+						context_is_writeable_or_written(inode, snapc));
+			ceph_put_snap_context(snapc);
+			put_page(page);
+			goto refind;
 		}
-		unlock_page(page);
-		ceph_queue_writeback(inode);
-		r = wait_event_killable(ci->i_cap_wq,
-					context_is_writeable_or_written(inode, snapc));
-		ceph_put_snap_context(snapc);
-		return -EAGAIN;
-	}
 
-	if (PageUptodate(page)) {
-		dout(" page %p already uptodate\n", page);
-		return 0;
-	}
+		if (PageUptodate(page)) {
+			dout(" page %p already uptodate\n", page);
+			break;
+		}
 
-	/* full page? */
-	if (pos_in_page == 0 && len == PAGE_SIZE)
-		return 0;
+		/* full page? */
+		if (pos_in_page == 0 && len == PAGE_SIZE)
+			break;
 
-	/* past end of file? */
-	i_size = i_size_read(inode);
-
-	if (page_off >= i_size ||
-	    (pos_in_page == 0 && (pos+len) >= i_size &&
-	     end_in_page - pos_in_page != PAGE_SIZE)) {
-		dout(" zeroing %p 0 - %d and %d - %d\n",
-		     page, pos_in_page, end_in_page, (int)PAGE_SIZE);
-		zero_user_segments(page,
-				   0, pos_in_page,
-				   end_in_page, PAGE_SIZE);
-		return 0;
-	}
+		/* past end of file? */
+		i_size = i_size_read(inode);
+		if (page_off >= i_size ||
+		    (pos_in_page == 0 && (pos+len) >= i_size &&
+		     end_in_page - pos_in_page != PAGE_SIZE)) {
+			dout(" zeroing %p 0 - %d and %d - %d\n",
+			     page, pos_in_page, end_in_page, (int)PAGE_SIZE);
+			zero_user_segments(page,
+					   0, pos_in_page,
+					   end_in_page, PAGE_SIZE);
+			break;
+		}
 
-	/* we need to read it. */
-	r = ceph_do_readpage(file, page);
-	if (r < 0) {
-		if (r == -EINPROGRESS)
-			return -EAGAIN;
-		goto fail_unlock;
+		/* we need to read it. */
+		r = ceph_do_readpage(file, page);
+		if (r) {
+			if (r == -EINPROGRESS)
+				continue;
+			break;
+		}
 	}
-	goto retry_locked;
-fail_unlock:
-	unlock_page(page);
-	return r;
-}
 
-/*
- * We are only allowed to write into/dirty the page if the page is
- * clean, or already dirty within the same snap context.
- */
-static int ceph_write_begin(struct file *file, struct address_space *mapping,
-			    loff_t pos, unsigned len, unsigned flags,
-			    struct page **pagep, void **fsdata)
-{
-	struct inode *inode = file_inode(file);
-	struct page *page;
-	pgoff_t index = pos >> PAGE_SHIFT;
-	int r;
-
-	do {
-		/* get a page */
-		page = grab_cache_page_write_begin(mapping, index, 0);
-		if (!page)
-			return -ENOMEM;
-
-		dout("write_begin file %p inode %p page %p %d~%d\n", file,
-		     inode, page, (int)pos, (int)len);
-
-		r = ceph_update_writeable_page(file, pos, len, page);
-		if (r < 0)
+	if (r < 0) {
+		if (page) {
+			unlock_page(page);
 			put_page(page);
-		else
-			*pagep = page;
-	} while (r == -EAGAIN);
-
+		}
+	} else {
+		*pagep = page;
+	}
 	return r;
 }
 
-- 
2.26.2


* [RFC PATCH v2 06/11] ceph: conversion to new fscache API
  2020-07-31 13:04 [RFC PATCH v2 00/11] ceph: convert to new FSCache API Jeff Layton
                   ` (4 preceding siblings ...)
  2020-07-31 13:04 ` [RFC PATCH v2 05/11] ceph: fold ceph_update_writeable_page into ceph_write_begin Jeff Layton
@ 2020-07-31 13:04 ` Jeff Layton
  2020-07-31 13:04 ` [RFC PATCH v2 07/11] ceph: convert readpage to fscache read helper Jeff Layton
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Jeff Layton @ 2020-07-31 13:04 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-cachefs, idryomov

Now that the fscache API has been reworked and simplified, start
changing ceph over to use it. Drop functions that are no longer needed
and rework some of the others to conform to the new API.

Change how cookies are managed as well. With the old API, we would only
instantiate a cookie when the file was opened for reads. Change it to
acquire the cookie when the inode is instantiated, and to call use/unuse
when the file is opened and closed. This will allow us to plumb in write
support in subsequent patches.

For now, just rip most of the code out of the read+write codepaths, as
subsequent patches will rework that code to use new helper functions.
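The use/unuse model described above is essentially an active-user count on the cookie, with the aux data (version/mtime/size) written back when the last user goes away. A rough userspace sketch follows; the names are hypothetical stand-ins, while the real calls are fscache_use_cookie()/fscache_unuse_cookie() from the fscache-iter branch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the open/close cookie lifecycle: the cookie exists for the
 * whole life of the inode, each open takes a "use", and the last unuse
 * is the point where updated aux data would be committed to the cache.
 */
struct cookie {
	int n_active;		/* opens currently using the cache object */
	bool want_update;	/* aux data needs writing back */
};

static void cookie_use(struct cookie *c, bool will_modify)
{
	c->n_active++;
	if (will_modify)
		c->want_update = true;
}

/* returns true when this was the last user and aux data was flushed */
static bool cookie_unuse(struct cookie *c, bool update)
{
	if (update)
		c->want_update = true;
	if (--c->n_active == 0 && c->want_update) {
		c->want_update = false;	/* pretend we wrote the aux data */
		return true;
	}
	return false;
}
```

An open-for-write marks the cookie as needing an update, but the update itself only happens once the final opener has gone away, which is what lets write support be layered on later in the series.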

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/addr.c  |  47 +++-----
 fs/ceph/cache.c | 290 +++++++++++++-----------------------------------
 fs/ceph/cache.h | 106 ++++--------------
 fs/ceph/caps.c  |  11 +-
 fs/ceph/file.c  |  13 ++-
 fs/ceph/inode.c |  14 ++-
 fs/ceph/super.h |   1 -
 7 files changed, 135 insertions(+), 347 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index f4df04769761..e005c32270f5 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -155,19 +155,18 @@ static void ceph_invalidatepage(struct page *page, unsigned int offset,
 		return;
 	}
 
-	ceph_invalidate_fscache_page(inode, page);
-
 	WARN_ON(!PageLocked(page));
-	if (!PagePrivate(page))
-		return;
+	if (PagePrivate(page)) {
+		dout("%p invalidatepage %p idx %lu full dirty page\n",
+		     inode, page, page->index);
 
-	dout("%p invalidatepage %p idx %lu full dirty page\n",
-	     inode, page, page->index);
+		ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
+		ceph_put_snap_context(snapc);
+		page->private = 0;
+		ClearPagePrivate(page);
+	}
 
-	ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
-	ceph_put_snap_context(snapc);
-	page->private = 0;
-	ClearPagePrivate(page);
+	ceph_wait_on_page_fscache(page);
 }
 
 static int ceph_releasepage(struct page *page, gfp_t g)
@@ -175,11 +174,12 @@ static int ceph_releasepage(struct page *page, gfp_t g)
 	dout("%p releasepage %p idx %lu (%sdirty)\n", page->mapping->host,
 	     page, page->index, PageDirty(page) ? "" : "not ");
 
-	/* Can we release the page from the cache? */
-	if (!ceph_release_fscache_page(page, g))
+	if (PagePrivate(page))
 		return 0;
 
-	return !PagePrivate(page);
+	ceph_wait_on_page_fscache(page);
+
+	return 1;
 }
 
 /* read a single page, without unlocking it. */
@@ -213,10 +213,6 @@ static int ceph_do_readpage(struct file *filp, struct page *page)
 		return 0;
 	}
 
-	err = ceph_readpage_from_fscache(inode, page);
-	if (err == 0)
-		return -EINPROGRESS;
-
 	dout("readpage ino %llx.%llx file %p off %llu len %llu page %p index %lu\n",
 	     vino.ino, vino.snap, filp, off, len, page, page->index);
 	req = ceph_osdc_new_request(osdc, &ci->i_layout, vino, off, &len, 0, 1,
@@ -242,7 +238,6 @@ static int ceph_do_readpage(struct file *filp, struct page *page)
 		err = 0;
 	if (err < 0) {
 		SetPageError(page);
-		ceph_fscache_readpage_cancel(inode, page);
 		if (err == -EBLACKLISTED)
 			fsc->blacklisted = true;
 		goto out;
@@ -254,7 +249,6 @@ static int ceph_do_readpage(struct file *filp, struct page *page)
 		flush_dcache_page(page);
 
 	SetPageUptodate(page);
-	ceph_readpage_to_fscache(inode, page);
 out:
 	return err < 0 ? err : 0;
 }
@@ -294,10 +288,8 @@ static void finish_read(struct ceph_osd_request *req)
 	for (i = 0; i < num_pages; i++) {
 		struct page *page = osd_data->pages[i];
 
-		if (rc < 0 && rc != -ENOENT) {
-			ceph_fscache_readpage_cancel(inode, page);
+		if (rc < 0 && rc != -ENOENT)
 			goto unlock;
-		}
 		if (bytes < (int)PAGE_SIZE) {
 			/* zero (remainder of) page */
 			int s = bytes < 0 ? 0 : bytes;
@@ -307,7 +299,6 @@ static void finish_read(struct ceph_osd_request *req)
 		     page->index);
 		flush_dcache_page(page);
 		SetPageUptodate(page);
-		ceph_readpage_to_fscache(inode, page);
 unlock:
 		unlock_page(page);
 		put_page(page);
@@ -408,7 +399,6 @@ static int start_read(struct inode *inode, struct ceph_rw_context *rw_ctx,
 		     page->index);
 		if (add_to_page_cache_lru(page, &inode->i_data, page->index,
 					  GFP_KERNEL)) {
-			ceph_fscache_uncache_page(inode, page);
 			put_page(page);
 			dout("start_read %p add_to_page_cache failed %p\n",
 			     inode, page);
@@ -441,7 +431,6 @@ static int start_read(struct inode *inode, struct ceph_rw_context *rw_ctx,
 
 out_pages:
 	for (i = 0; i < nr_pages; ++i) {
-		ceph_fscache_readpage_cancel(inode, pages[i]);
 		unlock_page(pages[i]);
 	}
 	ceph_put_page_vector(pages, nr_pages, false);
@@ -471,12 +460,6 @@ static int ceph_readpages(struct file *file, struct address_space *mapping,
 	if (ceph_inode(inode)->i_inline_version != CEPH_INLINE_NONE)
 		return -EINVAL;
 
-	rc = ceph_readpages_from_fscache(mapping->host, mapping, page_list,
-					 &nr_pages);
-
-	if (rc == 0)
-		goto out;
-
 	rw_ctx = ceph_find_rw_context(fi);
 	max = fsc->mount_options->rsize >> PAGE_SHIFT;
 	dout("readpages %p file %p ctx %p nr_pages %d max %d\n",
@@ -487,8 +470,6 @@ static int ceph_readpages(struct file *file, struct address_space *mapping,
 			goto out;
 	}
 out:
-	ceph_fscache_readpages_cancel(inode, page_list);
-
 	dout("readpages %p file %p ret %d\n", inode, file, rc);
 	return rc;
 }
diff --git a/fs/ceph/cache.c b/fs/ceph/cache.c
index 2f5cb6bc78e1..67f1dad09c19 100644
--- a/fs/ceph/cache.c
+++ b/fs/ceph/cache.c
@@ -35,11 +35,6 @@ struct ceph_fscache_entry {
 	char uniquifier[];
 };
 
-static const struct fscache_cookie_def ceph_fscache_fsid_object_def = {
-	.name		= "CEPH.fsid",
-	.type		= FSCACHE_COOKIE_TYPE_INDEX,
-};
-
 int __init ceph_fscache_register(void)
 {
 	return fscache_register_netfs(&ceph_cache_netfs);
@@ -50,6 +45,51 @@ void ceph_fscache_unregister(void)
 	fscache_unregister_netfs(&ceph_cache_netfs);
 }
 
+void ceph_fscache_use_cookie(struct inode *inode, bool will_modify)
+{
+	struct ceph_inode_info *ci = ceph_inode(inode);
+
+	if (ci->fscache)
+		fscache_use_cookie(ci->fscache, will_modify);
+}
+
+void ceph_fscache_unuse_cookie(struct inode *inode, bool update)
+{
+	struct ceph_inode_info *ci = ceph_inode(inode);
+
+	if (!ci->fscache)
+		return;
+
+	if (update) {
+		struct ceph_aux_inode aux = {
+						.version = ci->i_version,
+						.mtime_sec = inode->i_mtime.tv_sec,
+						.mtime_nsec = inode->i_mtime.tv_nsec,
+					    };
+		loff_t i_size = i_size_read(inode);
+
+		fscache_unuse_cookie(ci->fscache, &aux, &i_size);
+	} else {
+		fscache_unuse_cookie(ci->fscache, NULL, NULL);
+	}
+}
+
+void ceph_fscache_update_cookie(struct inode *inode)
+{
+	struct ceph_inode_info *ci = ceph_inode(inode);
+	struct ceph_aux_inode aux;
+	loff_t i_size = i_size_read(inode);
+
+	if (!ci->fscache)
+		return;
+
+	aux.version = ci->i_version;
+	aux.mtime_sec = inode->i_mtime.tv_sec;
+	aux.mtime_nsec = inode->i_mtime.tv_nsec;
+
+	fscache_update_cookie(ci->fscache, &aux, &i_size);
+}
+
 int ceph_fscache_register_fs(struct ceph_fs_client* fsc, struct fs_context *fc)
 {
 	const struct ceph_fsid *fsid = &fsc->client->fsid;
@@ -86,225 +126,70 @@ int ceph_fscache_register_fs(struct ceph_fs_client* fsc, struct fs_context *fc)
 	}
 
 	fsc->fscache = fscache_acquire_cookie(ceph_cache_netfs.primary_index,
-					      &ceph_fscache_fsid_object_def,
-					      &ent->fsid, sizeof(ent->fsid) + uniq_len,
-					      NULL, 0,
-					      fsc, 0, true);
-
+					      FSCACHE_COOKIE_TYPE_INDEX,
+					      "CEPH.fsid", 0, NULL, &ent->fsid,
+					      sizeof(ent->fsid) + uniq_len, NULL, 0, 0);
 	if (fsc->fscache) {
 		ent->fscache = fsc->fscache;
 		list_add_tail(&ent->list, &ceph_fscache_list);
 	} else {
+		pr_warn("Unable to set primary index for fscache! Disabling it.\n");
 		kfree(ent);
-		errorfc(fc, "unable to register fscache cookie for fsid %pU",
-		       fsid);
-		/* all other fs ignore this error */
 	}
 out_unlock:
 	mutex_unlock(&ceph_fscache_lock);
 	return err;
 }
 
-static enum fscache_checkaux ceph_fscache_inode_check_aux(
-	void *cookie_netfs_data, const void *data, uint16_t dlen,
-	loff_t object_size)
-{
-	struct ceph_aux_inode aux;
-	struct ceph_inode_info* ci = cookie_netfs_data;
-	struct inode* inode = &ci->vfs_inode;
-
-	if (dlen != sizeof(aux) ||
-	    i_size_read(inode) != object_size)
-		return FSCACHE_CHECKAUX_OBSOLETE;
-
-	memset(&aux, 0, sizeof(aux));
-	aux.version = ci->i_version;
-	aux.mtime_sec = inode->i_mtime.tv_sec;
-	aux.mtime_nsec = inode->i_mtime.tv_nsec;
-
-	if (memcmp(data, &aux, sizeof(aux)) != 0)
-		return FSCACHE_CHECKAUX_OBSOLETE;
-
-	dout("ceph inode 0x%p cached okay\n", ci);
-	return FSCACHE_CHECKAUX_OKAY;
-}
-
-static const struct fscache_cookie_def ceph_fscache_inode_object_def = {
-	.name		= "CEPH.inode",
-	.type		= FSCACHE_COOKIE_TYPE_DATAFILE,
-	.check_aux	= ceph_fscache_inode_check_aux,
-};
-
 void ceph_fscache_register_inode_cookie(struct inode *inode)
 {
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
 	struct ceph_aux_inode aux;
 
+	/* Register only new inodes */
+	if (!(inode->i_state & I_NEW))
+		return;
+
 	/* No caching for filesystem */
 	if (!fsc->fscache)
 		return;
 
-	/* Only cache for regular files that are read only */
-	if (!S_ISREG(inode->i_mode))
-		return;
+	WARN_ON_ONCE(ci->fscache);
 
-	inode_lock_nested(inode, I_MUTEX_CHILD);
-	if (!ci->fscache) {
-		memset(&aux, 0, sizeof(aux));
-		aux.version = ci->i_version;
-		aux.mtime_sec = inode->i_mtime.tv_sec;
-		aux.mtime_nsec = inode->i_mtime.tv_nsec;
-		ci->fscache = fscache_acquire_cookie(fsc->fscache,
-						     &ceph_fscache_inode_object_def,
-						     &ci->i_vino, sizeof(ci->i_vino),
-						     &aux, sizeof(aux),
-						     ci, i_size_read(inode), false);
-	}
-	inode_unlock(inode);
+	memset(&aux, 0, sizeof(aux));
+	aux.version = ci->i_version;
+	aux.mtime_sec = inode->i_mtime.tv_sec;
+	aux.mtime_nsec = inode->i_mtime.tv_nsec;
+	ci->fscache = fscache_acquire_cookie(fsc->fscache,
+					     FSCACHE_COOKIE_TYPE_DATAFILE,
+					     "CEPH.inode", 0, NULL,
+					     &ci->i_vino,
+					     sizeof(ci->i_vino),
+					     &aux, sizeof(aux),
+					     i_size_read(inode));
 }
 
 void ceph_fscache_unregister_inode_cookie(struct ceph_inode_info* ci)
 {
-	struct fscache_cookie* cookie;
-
-	if ((cookie = ci->fscache) == NULL)
-		return;
-
-	ci->fscache = NULL;
-
-	fscache_uncache_all_inode_pages(cookie, &ci->vfs_inode);
-	fscache_relinquish_cookie(cookie, &ci->i_vino, false);
-}
-
-static bool ceph_fscache_can_enable(void *data)
-{
-	struct inode *inode = data;
-	return !inode_is_open_for_write(inode);
-}
-
-void ceph_fscache_file_set_cookie(struct inode *inode, struct file *filp)
-{
-	struct ceph_inode_info *ci = ceph_inode(inode);
-
-	if (!fscache_cookie_valid(ci->fscache))
-		return;
-
-	if (inode_is_open_for_write(inode)) {
-		dout("fscache_file_set_cookie %p %p disabling cache\n",
-		     inode, filp);
-		fscache_disable_cookie(ci->fscache, &ci->i_vino, false);
-		fscache_uncache_all_inode_pages(ci->fscache, inode);
-	} else {
-		fscache_enable_cookie(ci->fscache, &ci->i_vino, i_size_read(inode),
-				      ceph_fscache_can_enable, inode);
-		if (fscache_cookie_enabled(ci->fscache)) {
-			dout("fscache_file_set_cookie %p %p enabling cache\n",
-			     inode, filp);
-		}
-	}
-}
-
-static void ceph_readpage_from_fscache_complete(struct page *page, void *data, int error)
-{
-	if (!error)
-		SetPageUptodate(page);
+	struct fscache_cookie* cookie = xchg(&ci->fscache, NULL);
 
-	unlock_page(page);
-}
-
-static inline bool cache_valid(struct ceph_inode_info *ci)
-{
-	return ci->i_fscache_gen == ci->i_rdcache_gen;
+	if (cookie)
+		fscache_relinquish_cookie(cookie, false);
 }
 
-
-/* Atempt to read from the fscache,
- *
- * This function is called from the readpage_nounlock context. DO NOT attempt to
- * unlock the page here (or in the callback).
- */
-int ceph_readpage_from_fscache(struct inode *inode, struct page *page)
+void ceph_fscache_invalidate(struct inode *inode, unsigned int flags)
 {
 	struct ceph_inode_info *ci = ceph_inode(inode);
-	int ret;
-
-	if (!cache_valid(ci))
-		return -ENOBUFS;
-
-	ret = fscache_read_or_alloc_page(ci->fscache, page,
-					 ceph_readpage_from_fscache_complete, NULL,
-					 GFP_KERNEL);
-
-	switch (ret) {
-		case 0: /* Page found */
-			dout("page read submitted\n");
-			return 0;
-		case -ENOBUFS: /* Pages were not found, and can't be */
-		case -ENODATA: /* Pages were not found */
-			dout("page/inode not in cache\n");
-			return ret;
-		default:
-			dout("%s: unknown error ret = %i\n", __func__, ret);
-			return ret;
-	}
-}
+	struct ceph_aux_inode aux = { .version		= ci->i_version,
+				      .mtime_sec	= inode->i_mtime.tv_sec,
+				      .mtime_nsec	= inode->i_mtime.tv_nsec };
 
-int ceph_readpages_from_fscache(struct inode *inode,
-				  struct address_space *mapping,
-				  struct list_head *pages,
-				  unsigned *nr_pages)
-{
-	struct ceph_inode_info *ci = ceph_inode(inode);
-	int ret;
-
-	if (!cache_valid(ci))
-		return -ENOBUFS;
-
-	ret = fscache_read_or_alloc_pages(ci->fscache, mapping, pages, nr_pages,
-					  ceph_readpage_from_fscache_complete,
-					  NULL, mapping_gfp_mask(mapping));
-
-	switch (ret) {
-		case 0: /* All pages found */
-			dout("all-page read submitted\n");
-			return 0;
-		case -ENOBUFS: /* Some pages were not found, and can't be */
-		case -ENODATA: /* some pages were not found */
-			dout("page/inode not in cache\n");
-			return ret;
-		default:
-			dout("%s: unknown error ret = %i\n", __func__, ret);
-			return ret;
-	}
-}
-
-void ceph_readpage_to_fscache(struct inode *inode, struct page *page)
-{
-	struct ceph_inode_info *ci = ceph_inode(inode);
-	int ret;
-
-	if (!PageFsCache(page))
-		return;
-
-	if (!cache_valid(ci))
-		return;
-
-	ret = fscache_write_page(ci->fscache, page, i_size_read(inode),
-				 GFP_KERNEL);
-	if (ret)
-		 fscache_uncache_page(ci->fscache, page);
-}
-
-void ceph_invalidate_fscache_page(struct inode* inode, struct page *page)
-{
-	struct ceph_inode_info *ci = ceph_inode(inode);
-
-	if (!PageFsCache(page))
-		return;
+	aux.version = ci->i_version;
+	aux.mtime_sec = inode->i_mtime.tv_sec;
+	aux.mtime_nsec = inode->i_mtime.tv_nsec;
 
-	fscache_wait_on_page_write(ci->fscache, page);
-	fscache_uncache_page(ci->fscache, page);
+	fscache_invalidate(ceph_inode(inode)->fscache, &aux, i_size_read(inode), flags);
 }
 
 void ceph_fscache_unregister_fs(struct ceph_fs_client* fsc)
@@ -325,28 +210,7 @@ void ceph_fscache_unregister_fs(struct ceph_fs_client* fsc)
 		WARN_ON_ONCE(!found);
 		mutex_unlock(&ceph_fscache_lock);
 
-		__fscache_relinquish_cookie(fsc->fscache, NULL, false);
+		__fscache_relinquish_cookie(fsc->fscache, false);
 	}
 	fsc->fscache = NULL;
 }
-
-/*
- * caller should hold CEPH_CAP_FILE_{RD,CACHE}
- */
-void ceph_fscache_revalidate_cookie(struct ceph_inode_info *ci)
-{
-	if (cache_valid(ci))
-		return;
-
-	/* resue i_truncate_mutex. There should be no pending
-	 * truncate while the caller holds CEPH_CAP_FILE_RD */
-	mutex_lock(&ci->i_truncate_mutex);
-	if (!cache_valid(ci)) {
-		if (fscache_check_consistency(ci->fscache, &ci->i_vino))
-			fscache_invalidate(ci->fscache);
-		spin_lock(&ci->i_ceph_lock);
-		ci->i_fscache_gen = ci->i_rdcache_gen;
-		spin_unlock(&ci->i_ceph_lock);
-	}
-	mutex_unlock(&ci->i_truncate_mutex);
-}
diff --git a/fs/ceph/cache.h b/fs/ceph/cache.h
index 89dbdd1eb14a..4d5777cdf86f 100644
--- a/fs/ceph/cache.h
+++ b/fs/ceph/cache.h
@@ -21,63 +21,36 @@ void ceph_fscache_unregister_fs(struct ceph_fs_client* fsc);
 
 void ceph_fscache_register_inode_cookie(struct inode *inode);
 void ceph_fscache_unregister_inode_cookie(struct ceph_inode_info* ci);
-void ceph_fscache_file_set_cookie(struct inode *inode, struct file *filp);
-void ceph_fscache_revalidate_cookie(struct ceph_inode_info *ci);
 
-int ceph_readpage_from_fscache(struct inode *inode, struct page *page);
-int ceph_readpages_from_fscache(struct inode *inode,
-				struct address_space *mapping,
-				struct list_head *pages,
-				unsigned *nr_pages);
-void ceph_readpage_to_fscache(struct inode *inode, struct page *page);
-void ceph_invalidate_fscache_page(struct inode* inode, struct page *page);
+void ceph_fscache_use_cookie(struct inode *inode, bool will_modify);
+void ceph_fscache_unuse_cookie(struct inode *inode, bool update);
+
+void ceph_fscache_update_cookie(struct inode *inode);
+void ceph_fscache_invalidate(struct inode *inode, unsigned int flags);
 
 static inline void ceph_fscache_inode_init(struct ceph_inode_info *ci)
 {
 	ci->fscache = NULL;
-	ci->i_fscache_gen = 0;
-}
-
-static inline void ceph_fscache_invalidate(struct inode *inode)
-{
-	fscache_invalidate(ceph_inode(inode)->fscache);
-}
-
-static inline void ceph_fscache_uncache_page(struct inode *inode,
-					     struct page *page)
-{
-	struct ceph_inode_info *ci = ceph_inode(inode);
-	return fscache_uncache_page(ci->fscache, page);
 }
 
-static inline int ceph_release_fscache_page(struct page *page, gfp_t gfp)
+static inline struct fscache_cookie *ceph_fscache_cookie(struct ceph_inode_info *ci)
 {
-	struct inode* inode = page->mapping->host;
-	struct ceph_inode_info *ci = ceph_inode(inode);
-	return fscache_maybe_release_page(ci->fscache, page, gfp);
+	return ci->fscache;
 }
 
-static inline void ceph_fscache_readpage_cancel(struct inode *inode,
-						struct page *page)
+static inline void ceph_wait_on_page_fscache(struct page *page)
 {
-	struct ceph_inode_info *ci = ceph_inode(inode);
-	if (fscache_cookie_valid(ci->fscache) && PageFsCache(page))
-		__fscache_uncache_page(ci->fscache, page);
+	wait_on_page_fscache(page);
 }
 
-static inline void ceph_fscache_readpages_cancel(struct inode *inode,
-						 struct list_head *pages)
+static inline void ceph_fscache_resize_cookie(struct ceph_inode_info *ci, loff_t new_size)
 {
-	struct ceph_inode_info *ci = ceph_inode(inode);
-	return fscache_readpages_cancel(ci->fscache, pages);
-}
+	struct fscache_cookie *cookie = ceph_fscache_cookie(ci);
 
-static inline void ceph_disable_fscache_readpage(struct ceph_inode_info *ci)
-{
-	ci->i_fscache_gen = ci->i_rdcache_gen - 1;
+	if (cookie)
+		fscache_resize_cookie(cookie, new_size);
 }
-
-#else
+#else /* CONFIG_CEPH_FSCACHE */
 
 static inline int ceph_fscache_register(void)
 {
@@ -110,67 +83,34 @@ static inline void ceph_fscache_unregister_inode_cookie(struct ceph_inode_info*
 {
 }
 
-static inline void ceph_fscache_file_set_cookie(struct inode *inode,
-						struct file *filp)
+static inline void ceph_fscache_use_cookie(struct inode *inode, bool will_modify)
 {
 }
 
-static inline void ceph_fscache_revalidate_cookie(struct ceph_inode_info *ci)
+static inline void ceph_fscache_unuse_cookie(struct inode *inode, bool update)
 {
 }
 
-static inline void ceph_fscache_uncache_page(struct inode *inode,
-					     struct page *pages)
+static inline void ceph_fscache_update_cookie(struct inode *inode)
 {
 }
 
-static inline int ceph_readpage_from_fscache(struct inode* inode,
-					     struct page *page)
+static inline void ceph_fscache_invalidate(struct inode *inode, unsigned int flags)
 {
-	return -ENOBUFS;
 }
 
-static inline int ceph_readpages_from_fscache(struct inode *inode,
-					      struct address_space *mapping,
-					      struct list_head *pages,
-					      unsigned *nr_pages)
+static inline struct fscache_cookie *ceph_fscache_cookie(struct ceph_inode_info *ci)
 {
-	return -ENOBUFS;
+	return NULL;
 }
 
-static inline void ceph_readpage_to_fscache(struct inode *inode,
-					    struct page *page)
+static inline void ceph_wait_on_page_fscache(struct page *page)
 {
 }
 
-static inline void ceph_fscache_invalidate(struct inode *inode)
+static inline void ceph_fscache_resize_cookie(struct ceph_inode_info *ci, loff_t new_size)
 {
 }
-
-static inline void ceph_invalidate_fscache_page(struct inode *inode,
-						struct page *page)
-{
-}
-
-static inline int ceph_release_fscache_page(struct page *page, gfp_t gfp)
-{
-	return 1;
-}
-
-static inline void ceph_fscache_readpage_cancel(struct inode *inode,
-						struct page *page)
-{
-}
-
-static inline void ceph_fscache_readpages_cancel(struct inode *inode,
-						 struct list_head *pages)
-{
-}
-
-static inline void ceph_disable_fscache_readpage(struct ceph_inode_info *ci)
-{
-}
-
-#endif
+#endif /* CONFIG_CEPH_FSCACHE */
 
 #endif
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index f8d0ee19f01d..a4428a40c67c 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -1984,7 +1984,7 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags,
 	    !(ci->i_wb_ref || ci->i_wrbuffer_ref) &&   /* no dirty pages... */
 	    inode->i_data.nrpages &&		/* have cached pages */
 	    (revoking & (CEPH_CAP_FILE_CACHE|
-			 CEPH_CAP_FILE_LAZYIO)) && /*  or revoking cache */
+			 CEPH_CAP_FILE_LAZYIO)) && /* revoking Fc */
 	    !tried_invalidate) {
 		dout("check_caps trying to invalidate on %p\n", inode);
 		if (try_nonblocking_invalidate(inode) < 0) {
@@ -2722,10 +2722,6 @@ static int try_get_cap_refs(struct inode *inode, int need, int want,
 				*got = need | want;
 			else
 				*got = need;
-			if (S_ISREG(inode->i_mode) &&
-			    (need & CEPH_CAP_FILE_RD) &&
-			    !(*got & CEPH_CAP_FILE_CACHE))
-				ceph_disable_fscache_readpage(ci);
 			ceph_take_cap_refs(ci, *got, true);
 			ret = 1;
 		}
@@ -2975,11 +2971,6 @@ int ceph_get_caps(struct file *filp, int need, int want,
 		}
 		break;
 	}
-
-	if (S_ISREG(ci->vfs_inode.i_mode) &&
-	    (_got & CEPH_CAP_FILE_RD) && (_got & CEPH_CAP_FILE_CACHE))
-		ceph_fscache_revalidate_cookie(ci);
-
 	*got = _got;
 	return 0;
 }
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index d51c3f2fdca0..de5dc8cbcd4d 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -250,8 +250,7 @@ static int ceph_init_file(struct inode *inode, struct file *file, int fmode)
 
 	switch (inode->i_mode & S_IFMT) {
 	case S_IFREG:
-		ceph_fscache_register_inode_cookie(inode);
-		ceph_fscache_file_set_cookie(inode, file);
+		ceph_fscache_use_cookie(inode, file->f_mode & FMODE_WRITE);
 		/* fall through */
 	case S_IFDIR:
 		ret = ceph_init_file_info(inode, file, fmode,
@@ -803,6 +802,7 @@ int ceph_release(struct inode *inode, struct file *file)
 		dout("release inode %p regular file %p\n", inode, file);
 		WARN_ON(!list_empty(&fi->rw_contexts));
 
+		ceph_fscache_unuse_cookie(inode, file->f_mode & FMODE_WRITE);
 		ceph_put_fmode(ci, fi->fmode, 1);
 
 		kmem_cache_free(ceph_file_cachep, fi);
@@ -1221,7 +1221,11 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
 	     snapc, snapc ? snapc->seq : 0);
 
 	if (write) {
-		int ret2 = invalidate_inode_pages2_range(inode->i_mapping,
+		int ret2;
+
+		ceph_fscache_invalidate(inode, FSCACHE_INVAL_DIO_WRITE);
+
+		ret2 = invalidate_inode_pages2_range(inode->i_mapping,
 					pos >> PAGE_SHIFT,
 					(pos + count - 1) >> PAGE_SHIFT);
 		if (ret2 < 0)
@@ -1432,6 +1436,8 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
 	if (ret < 0)
 		return ret;
 
+	/* FIXME: write to cache instead? */
+	ceph_fscache_invalidate(inode, 0);
 	ret = invalidate_inode_pages2_range(inode->i_mapping,
 					    pos >> PAGE_SHIFT,
 					    (pos + count - 1) >> PAGE_SHIFT);
@@ -2381,6 +2387,7 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
 		goto out_caps;
 
 	/* Drop dst file cached pages */
+	ceph_fscache_invalidate(dst_inode, 0);
 	ret = invalidate_inode_pages2_range(dst_inode->i_mapping,
 					    dst_off >> PAGE_SHIFT,
 					    (dst_off + len) >> PAGE_SHIFT);
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 357c937699d5..c98830a5be6f 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -623,6 +623,7 @@ int ceph_fill_file_size(struct inode *inode, int issued,
 		}
 		i_size_write(inode, size);
 		inode->i_blocks = calc_inode_blocks(size);
+		ceph_fscache_update_cookie(inode);
 		ci->i_reported_size = size;
 		if (truncate_seq != ci->i_truncate_seq) {
 			dout("truncate_seq %u -> %u\n",
@@ -655,10 +656,6 @@ int ceph_fill_file_size(struct inode *inode, int issued,
 		     truncate_size);
 		ci->i_truncate_size = truncate_size;
 	}
-
-	if (queue_trunc)
-		ceph_fscache_invalidate(inode);
-
 	return queue_trunc;
 }
 
@@ -927,6 +924,7 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
 		inode->i_op = &ceph_file_iops;
 		break;
 	case S_IFREG:
+		ceph_fscache_register_inode_cookie(inode);
 		inode->i_op = &ceph_file_iops;
 		inode->i_fop = &ceph_file_fops;
 		break;
@@ -1790,11 +1788,13 @@ bool ceph_inode_set_size(struct inode *inode, loff_t size)
 	spin_lock(&ci->i_ceph_lock);
 	dout("set_size %p %llu -> %llu\n", inode, inode->i_size, size);
 	i_size_write(inode, size);
+	ceph_fscache_update_cookie(inode);
 	inode->i_blocks = calc_inode_blocks(size);
 
 	ret = __ceph_should_report_size(ci);
 
 	spin_unlock(&ci->i_ceph_lock);
+
 	return ret;
 }
 
@@ -1866,6 +1866,7 @@ void ceph_queue_vmtruncate(struct inode *inode)
 	set_bit(CEPH_I_WORK_VMTRUNCATE, &ci->i_work_mask);
 
 	ihold(inode);
+	ceph_fscache_invalidate(inode, 0);
 	if (queue_work(ceph_inode_to_client(inode)->inode_wq,
 		       &ci->i_work)) {
 		dout("ceph_queue_vmtruncate %p\n", inode);
@@ -1975,6 +1976,10 @@ void __ceph_do_pending_vmtruncate(struct inode *inode)
 	spin_unlock(&ci->i_ceph_lock);
 
 	truncate_pagecache(inode, to);
+	ceph_fscache_use_cookie(inode, true);
+	ceph_fscache_resize_cookie(ci, to);
+	ceph_fscache_unuse_cookie(inode, true);
+
 
 	spin_lock(&ci->i_ceph_lock);
 	if (to == ci->i_truncate_size) {
@@ -2136,6 +2141,7 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr)
 		if ((issued & CEPH_CAP_FILE_EXCL) &&
 		    attr->ia_size > inode->i_size) {
 			i_size_write(inode, attr->ia_size);
+			ceph_fscache_update_cookie(inode);
 			inode->i_blocks = calc_inode_blocks(attr->ia_size);
 			ci->i_reported_size = attr->ia_size;
 			dirtied |= CEPH_CAP_FILE_EXCL;
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 82e7b4d10b04..72a8c302140a 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -427,7 +427,6 @@ struct ceph_inode_info {
 
 #ifdef CONFIG_CEPH_FSCACHE
 	struct fscache_cookie *fscache;
-	u32 i_fscache_gen;
 #endif
 	errseq_t i_meta_err;
 
-- 
2.26.2

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC PATCH v2 07/11] ceph: convert readpage to fscache read helper
  2020-07-31 13:04 [RFC PATCH v2 00/11] ceph: convert to new FSCache API Jeff Layton
                   ` (5 preceding siblings ...)
  2020-07-31 13:04 ` [RFC PATCH v2 06/11] ceph: conversion to new fscache API Jeff Layton
@ 2020-07-31 13:04 ` Jeff Layton
  2020-07-31 13:04 ` [RFC PATCH v2 08/11] ceph: plug write_begin into " Jeff Layton
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Jeff Layton @ 2020-07-31 13:04 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-cachefs, idryomov

Create a new ceph_fscache_req structure that holds an fscache_io_request
and a refcount_t. Convert the readpage code to use the new
infrastructure. Have Kconfig select CONFIG_FSCACHE_SERVICES and
CONFIG_FSCACHE_READ_HELPER so that this still works even when FSCache
proper is disabled.
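
The wrapper's lifetime handling follows the usual get/put refcount
pattern: a reference is taken before the request is handed to the async
OSD machinery, and the completion callback drops it, so the request
stays alive until both the submitter and the callback are done with it.
A minimal userspace sketch of that pattern (illustrative names only;
C11 atomics stand in for the kernel's refcount_t):

```c
/*
 * Userspace sketch of a refcounted request wrapper: the object is freed
 * only when the last reference is dropped. Not the actual kernel API.
 */
#include <stdatomic.h>
#include <stdlib.h>

struct fsreq {
	atomic_int ref;
};

static struct fsreq *fsreq_alloc(void)
{
	struct fsreq *req = calloc(1, sizeof(*req));

	if (req)
		atomic_store(&req->ref, 1);	/* caller owns the first ref */
	return req;
}

static void fsreq_get(struct fsreq *req)
{
	atomic_fetch_add(&req->ref, 1);	/* e.g. a ref for an async callback */
}

/* Returns 1 if this put freed the object (last reference dropped). */
static int fsreq_put(struct fsreq *req)
{
	if (atomic_fetch_sub(&req->ref, 1) == 1) {
		free(req);
		return 1;
	}
	return 0;
}
```

The point of the extra reference is that the I/O may complete (and the
callback may run) after the submitting function has already returned,
so neither side alone may free the request.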

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/Kconfig |   2 +
 fs/ceph/addr.c  | 204 +++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 196 insertions(+), 10 deletions(-)

diff --git a/fs/ceph/Kconfig b/fs/ceph/Kconfig
index e955a38be3c8..432aa34b63e7 100644
--- a/fs/ceph/Kconfig
+++ b/fs/ceph/Kconfig
@@ -6,6 +6,8 @@ config CEPH_FS
 	select LIBCRC32C
 	select CRYPTO_AES
 	select CRYPTO
+	select FSCACHE_SERVICES
+	select FSCACHE_READ_HELPER
 	default n
 	help
 	  Choose Y or M here to include support for mounting the
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index e005c32270f5..75cdd35f1d2e 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -12,6 +12,7 @@
 #include <linux/signal.h>
 #include <linux/iversion.h>
 #include <linux/ktime.h>
+#include <linux/fscache.h>
 
 #include "super.h"
 #include "mds_client.h"
@@ -182,6 +183,199 @@ static int ceph_releasepage(struct page *page, gfp_t g)
 	return 1;
 }
 
+struct ceph_fscache_req {
+	struct fscache_io_request	fscache_req;
+	refcount_t			ref;
+};
+
+static struct ceph_fscache_req *ceph_fsreq_alloc(void)
+{
+	struct ceph_fscache_req *req = kzalloc(sizeof(*req), GFP_NOFS);
+
+	if (req)
+		refcount_set(&req->ref, 1);
+	return req;
+}
+
+static void ceph_fsreq_done(struct fscache_io_request *fsreq)
+{
+}
+
+static void ceph_fsreq_get(struct fscache_io_request *fsreq)
+{
+	struct ceph_fscache_req *req = container_of(fsreq, struct ceph_fscache_req, fscache_req);
+
+	refcount_inc(&req->ref);
+}
+
+static void ceph_fsreq_put(struct fscache_io_request *fsreq)
+{
+	struct ceph_fscache_req *req = container_of(fsreq, struct ceph_fscache_req, fscache_req);
+
+	if (refcount_dec_and_test(&req->ref)) {
+		fscache_free_io_request(fsreq);
+		kfree(req);
+	}
+}
+
+static void ceph_fsreq_reshape(struct fscache_io_request *fsreq,
+			       struct fscache_request_shape *shape)
+{
+	struct inode *inode = fsreq->mapping->host;
+	struct ceph_inode_info *ci = ceph_inode(inode);
+	u64 objno, objoff;
+	u32 xlen;
+
+	/* Truncate the extent at the end of the current object */
+	ceph_calc_file_object_mapping(&ci->i_layout, shape->actual_start << PAGE_SHIFT,
+				      shape->actual_nr_pages << PAGE_SHIFT, &objno, &objoff, &xlen);
+	shape->actual_nr_pages = xlen >> PAGE_SHIFT;
+}
+
+static void finish_fsreq_read(struct ceph_osd_request *req)
+{
+	struct ceph_fs_client *fsc = ceph_inode_to_client(req->r_inode);
+	struct ceph_osd_data *osd_data = osd_req_op_extent_osd_data(req, 0);
+	struct fscache_io_request *fsreq = req->r_priv;
+	int num_pages;
+	int err = req->r_result;
+
+	ceph_update_read_latency(&fsc->mdsc->metric, req->r_start_latency,
+				 req->r_end_latency, err);
+
+	/* no object means success but no data */
+	if (err == -ENOENT)
+		err = 0;
+	else if (err == -EBLACKLISTED)
+		fsc->blacklisted = true;
+
+	dout("%s: result %d\n", __func__, err);
+	if (err >= 0)
+		fsreq->transferred = err;
+	else
+		fsreq->error = err;
+
+	if (fsreq->io_done)
+		fsreq->io_done(fsreq);
+
+	num_pages = calc_pages_for(osd_data->alignment, osd_data->length);
+	ceph_put_page_vector(osd_data->pages, num_pages, false);
+	ceph_fsreq_put(fsreq);
+	iput(req->r_inode);
+}
+
+static void ceph_fsreq_issue_op(struct fscache_io_request *fsreq)
+{
+	struct inode *inode = fsreq->mapping->host;
+	struct ceph_inode_info *ci = ceph_inode(inode);
+	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
+	struct ceph_osd_request *req = NULL;
+	struct ceph_vino vino = ceph_vino(inode);
+	struct iov_iter iter;
+	struct page **pages;
+	size_t page_off;
+	int err = 0;
+	u64 len = fsreq->len;
+
+	req = ceph_osdc_new_request(&fsc->client->osdc, &ci->i_layout, vino, fsreq->pos, &len,
+			0, 1, CEPH_OSD_OP_READ,
+			CEPH_OSD_FLAG_READ | fsc->client->osdc.client->options->read_from_replica,
+			NULL, ci->i_truncate_seq, ci->i_truncate_size, false);
+	if (IS_ERR(req)) {
+		err = PTR_ERR(req);
+		goto out;
+	}
+
+	dout("%s: pos=%llu orig_len=%llu len=%llu\n", __func__, fsreq->pos, fsreq->len, len);
+	iov_iter_mapping(&iter, READ, fsreq->mapping, fsreq->pos, len);
+	len = iov_iter_get_pages_alloc(&iter, &pages, len, &page_off);
+	if (len < 0) {
+		err = len;
+		dout("%s: iov_iter_get_pages_alloc returned %d\n", __func__, err);
+		goto out;
+	}
+
+	/* fscache should always give us a page-aligned read */
+	WARN_ON_ONCE(page_off);
+
+	osd_req_op_extent_osd_data_pages(req, 0, pages, len, 0, false, false);
+	req->r_callback = finish_fsreq_read;
+	req->r_priv = fsreq;
+	ceph_fsreq_get(fsreq);
+	req->r_inode = inode;
+	ihold(inode);
+
+	err = ceph_osdc_start_request(req->r_osdc, req, false);
+	if (err) {
+		iput(inode);
+		ceph_fsreq_put(fsreq);
+	}
+out:
+	if (req)
+		ceph_osdc_put_request(req);
+
+	if (err) {
+		fsreq->error = err;
+		if (fsreq->io_done)
+			fsreq->io_done(fsreq);
+	}
+	dout("%s: result %d\n", __func__, fsreq->error);
+}
+
+const struct fscache_io_request_ops ceph_readpage_fsreq_ops = {
+	.issue_op	= ceph_fsreq_issue_op,
+	.reshape	= ceph_fsreq_reshape,
+	.done		= ceph_fsreq_done,
+	.get		= ceph_fsreq_get,
+	.put		= ceph_fsreq_put,
+};
+
+/* read a single page, without unlocking it. */
+static int ceph_readpage(struct file *filp, struct page *page)
+{
+	struct inode *inode = file_inode(filp);
+	struct ceph_inode_info *ci = ceph_inode(inode);
+	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
+	struct ceph_fscache_req *req;
+	struct ceph_vino vino = ceph_vino(inode);
+	struct fscache_cookie *cookie = ceph_fscache_cookie(ci);
+	int err = 0;
+	u64 off = page_offset(page);
+	u64 len = PAGE_SIZE;
+
+	if (ci->i_inline_version != CEPH_INLINE_NONE) {
+		/*
+		 * Uptodate inline data should have been added
+		 * into page cache while getting Fcr caps.
+		 */
+		if (off == 0) {
+			unlock_page(page);
+			return -EINVAL;
+		}
+		zero_user_segment(page, 0, PAGE_SIZE);
+		SetPageUptodate(page);
+		unlock_page(page);
+		return 0;
+	}
+
+	dout("readpage ino %llx.%llx file %p off %llu len %llu page %p index %lu\n",
+	     vino.ino, vino.snap, filp, off, len, page, page->index);
+
+	req = ceph_fsreq_alloc();
+	if (!req) {
+		unlock_page(page);
+		return -ENOMEM;
+	}
+
+	fscache_init_io_request(&req->fscache_req, cookie, &ceph_readpage_fsreq_ops);
+	req->fscache_req.mapping = inode->i_mapping;
+
+	err = fscache_read_helper_locked_page(&req->fscache_req, page,
+					      fsc->mount_options->rsize >> PAGE_SHIFT);
+	ceph_fsreq_put(&req->fscache_req);
+	return err;
+}
+
 /* read a single page, without unlocking it. */
 static int ceph_do_readpage(struct file *filp, struct page *page)
 {
@@ -253,16 +447,6 @@ static int ceph_do_readpage(struct file *filp, struct page *page)
 	return err < 0 ? err : 0;
 }
 
-static int ceph_readpage(struct file *filp, struct page *page)
-{
-	int r = ceph_do_readpage(filp, page);
-	if (r != -EINPROGRESS)
-		unlock_page(page);
-	else
-		r = 0;
-	return r;
-}
-
 /*
  * Finish an async read(ahead) op.
  */
-- 
2.26.2

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC PATCH v2 08/11] ceph: plug write_begin into read helper
  2020-07-31 13:04 [RFC PATCH v2 00/11] ceph: convert to new FSCache API Jeff Layton
                   ` (6 preceding siblings ...)
  2020-07-31 13:04 ` [RFC PATCH v2 07/11] ceph: convert readpage to fscache read helper Jeff Layton
@ 2020-07-31 13:04 ` Jeff Layton
  2020-07-31 13:04 ` [RFC PATCH v2 09/11] ceph: convert readpages to fscache_read_helper Jeff Layton
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Jeff Layton @ 2020-07-31 13:04 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-cachefs, idryomov

Plug write_begin into the read helper routine. This requires adding a
new is_req_valid op that we can use to vet whether there is an
incompatible snap context that needs to be flushed before we can fill
the page.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/addr.c | 253 ++++++++++++++++++++++++++-----------------------
 1 file changed, 132 insertions(+), 121 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 75cdd35f1d2e..cee497c108bb 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -185,6 +185,7 @@ static int ceph_releasepage(struct page *page, gfp_t g)
 
 struct ceph_fscache_req {
 	struct fscache_io_request	fscache_req;
+	struct ceph_snap_context	*snapc;
 	refcount_t			ref;
 };
 
@@ -376,77 +377,6 @@ static int ceph_readpage(struct file *filp, struct page *page)
 	return err;
 }
 
-/* read a single page, without unlocking it. */
-static int ceph_do_readpage(struct file *filp, struct page *page)
-{
-	struct inode *inode = file_inode(filp);
-	struct ceph_inode_info *ci = ceph_inode(inode);
-	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
-	struct ceph_osd_client *osdc = &fsc->client->osdc;
-	struct ceph_osd_request *req;
-	struct ceph_vino vino = ceph_vino(inode);
-	int err = 0;
-	u64 off = page_offset(page);
-	u64 len = PAGE_SIZE;
-
-	if (off >= i_size_read(inode)) {
-		zero_user_segment(page, 0, PAGE_SIZE);
-		SetPageUptodate(page);
-		return 0;
-	}
-
-	if (ci->i_inline_version != CEPH_INLINE_NONE) {
-		/*
-		 * Uptodate inline data should have been added
-		 * into page cache while getting Fcr caps.
-		 */
-		if (off == 0)
-			return -EINVAL;
-		zero_user_segment(page, 0, PAGE_SIZE);
-		SetPageUptodate(page);
-		return 0;
-	}
-
-	dout("readpage ino %llx.%llx file %p off %llu len %llu page %p index %lu\n",
-	     vino.ino, vino.snap, filp, off, len, page, page->index);
-	req = ceph_osdc_new_request(osdc, &ci->i_layout, vino, off, &len, 0, 1,
-				    CEPH_OSD_OP_READ, CEPH_OSD_FLAG_READ, NULL,
-				    ci->i_truncate_seq, ci->i_truncate_size,
-				    false);
-	if (IS_ERR(req))
-		return PTR_ERR(req);
-
-	osd_req_op_extent_osd_data_pages(req, 0, &page, len, 0, false, false);
-
-	err = ceph_osdc_start_request(osdc, req, false);
-	if (!err)
-		err = ceph_osdc_wait_request(osdc, req);
-
-	ceph_update_read_latency(&fsc->mdsc->metric, req->r_start_latency,
-				 req->r_end_latency, err);
-
-	ceph_osdc_put_request(req);
-	dout("readpage result %d\n", err);
-
-	if (err == -ENOENT)
-		err = 0;
-	if (err < 0) {
-		SetPageError(page);
-		if (err == -EBLACKLISTED)
-			fsc->blacklisted = true;
-		goto out;
-	}
-	if (err < PAGE_SIZE)
-		/* zero fill remainder of page */
-		zero_user_segment(page, err, PAGE_SIZE);
-	else
-		flush_dcache_page(page);
-
-	SetPageUptodate(page);
-out:
-	return err < 0 ? err : 0;
-}
-
 /*
  * Finish an async read(ahead) op.
  */
@@ -1472,6 +1402,30 @@ ceph_find_incompatible(struct inode *inode, struct page *page)
 	return NULL;
 }
 
+static int ceph_fsreq_is_req_valid(struct fscache_io_request *fsreq)
+{
+	struct ceph_snap_context *snapc;
+	struct ceph_fscache_req *req = container_of(fsreq, struct ceph_fscache_req, fscache_req);
+
+	snapc = ceph_find_incompatible(fsreq->mapping->host, fsreq->no_unlock_page);
+	if (snapc) {
+		if (IS_ERR(snapc))
+			return PTR_ERR(snapc);
+		req->snapc = snapc;
+		return -EAGAIN;
+	}
+	return 0;
+}
+
+const struct fscache_io_request_ops ceph_read_for_write_fsreq_ops = {
+	.issue_op	= ceph_fsreq_issue_op,
+	.reshape	= ceph_fsreq_reshape,
+	.is_req_valid	= ceph_fsreq_is_req_valid,
+	.done		= ceph_fsreq_done,
+	.get		= ceph_fsreq_get,
+	.put		= ceph_fsreq_put,
+};
+
 /*
  * We are only allowed to write into/dirty the page if the page is
  * clean, or already dirty within the same snap context.
@@ -1482,76 +1436,133 @@ static int ceph_write_begin(struct file *file, struct address_space *mapping,
 {
 	struct inode *inode = file_inode(file);
 	struct ceph_inode_info *ci = ceph_inode(inode);
-	struct ceph_snap_context *snapc;
+	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
+	struct fscache_cookie *cookie = ceph_fscache_cookie(ci);
 	struct page *page = NULL;
 	pgoff_t index = pos >> PAGE_SHIFT;
-	loff_t page_off = pos & PAGE_MASK;
 	int pos_in_page = pos & ~PAGE_MASK;
-	int end_in_page = pos_in_page + len;
-	loff_t i_size;
 	int r;
-refind:
-	/* get a page */
-	page = grab_cache_page_write_begin(mapping, index, 0);
-	if (!page)
-		return -ENOMEM;
 
-	dout("write_begin file %p inode %p page %p %d~%d\n", file,
-	     inode, page, (int)pos, (int)len);
+	if (ci->i_inline_version != CEPH_INLINE_NONE) {
+		/*
+		 * In principle, we should never get here, as the inode should have been uninlined
+		 * before we're allowed to write to the page (in write_iter or page_mkwrite).
+		 */
+		WARN_ONCE(1, "ceph: write_begin called on still-inlined inode!\n");
 
-	for (;;) {
-		snapc = ceph_find_incompatible(inode, page);
-		if (snapc) {
-			if (IS_ERR(snapc)) {
-				r = PTR_ERR(snapc);
-				break;
-			}
-			unlock_page(page);
-			ceph_queue_writeback(inode);
-			r = wait_event_killable(ci->i_cap_wq,
-						context_is_writeable_or_written(inode, snapc));
-			ceph_put_snap_context(snapc);
-			put_page(page);
-			goto refind;
+		/*
+		 * Uptodate inline data should have been added
+		 * into page cache while getting Fcr caps.
+		 */
+		if (index == 0) {
+			r = -EINVAL;
+			goto out;
 		}
 
-		if (PageUptodate(page)) {
-			dout(" page %p already uptodate\n", page);
-			break;
+		page = grab_cache_page_write_begin(mapping, index, 0);
+		if (!page)
+			return -ENOMEM;
+
+		zero_user_segment(page, 0, PAGE_SIZE);
+		SetPageUptodate(page);
+		r = 0;
+		goto out;
+	}
+
+	do {
+		struct ceph_fscache_req *req;
+		struct ceph_snap_context *snapc = NULL;
+
+		page = pagecache_get_page(mapping, index, FGP_WRITE, 0);
+		if (page) {
+			r = 0;
+			if (PageUptodate(page)) {
+				lock_page(page);
+				if (PageUptodate(page))
+					goto out;
+				unlock_page(page);
+			}
 		}
 
-		/* full page? */
-		if (pos_in_page == 0 && len == PAGE_SIZE)
-			break;
+		/*
+		 * In some cases we don't need to read at all:
+		 * - full page write
+		 * - write that lies completely beyond EOF
+		 * - write that covers the page from start to EOF or beyond it
+		 */
+		if ((pos_in_page == 0 && len == PAGE_SIZE) ||
+		    (pos >= i_size_read(inode)) ||
+		    (pos_in_page == 0 && (pos + len) >= i_size_read(inode))) {
+			if (!page) {
+				page = grab_cache_page_write_begin(mapping, index, 0);
+				if (!page) {
+					r = -ENOMEM;
+					break;
+				}
+			} else {
+				lock_page(page);
+			}
+
+			snapc = ceph_find_incompatible(inode, page);
+			if (!snapc) {
+				zero_user_segments(page, 0, pos_in_page,
+							 pos_in_page + len, PAGE_SIZE);
+				r = 0;
+				goto out;
+			}
+
+			unlock_page(page);
+
+			if (IS_ERR(snapc)) {
+				r = PTR_ERR(snapc);
+				goto out;
+			}
+			goto flush_incompat;
+		}
 
-		/* past end of file? */
-		i_size = i_size_read(inode);
-		if (page_off >= i_size ||
-		    (pos_in_page == 0 && (pos+len) >= i_size &&
-		     end_in_page - pos_in_page != PAGE_SIZE)) {
-			dout(" zeroing %p 0 - %d and %d - %d\n",
-			     page, pos_in_page, end_in_page, (int)PAGE_SIZE);
-			zero_user_segments(page,
-					   0, pos_in_page,
-					   end_in_page, PAGE_SIZE);
+		req = ceph_fsreq_alloc();
+		if (!req) {
+			unlock_page(page);
+			r = -ENOMEM;
 			break;
 		}
 
-		/* we need to read it. */
-		r = ceph_do_readpage(file, page);
-		if (r) {
-			if (r == -EINPROGRESS)
-				continue;
+		/*
+		 * Do the read. If we find out that we need to wait on writeback, then kick that
+		 * off, wait for it and then resubmit the read.
+		 */
+		fscache_init_io_request(&req->fscache_req, cookie, &ceph_read_for_write_fsreq_ops);
+		req->fscache_req.mapping = inode->i_mapping;
+
+		r = fscache_read_helper_for_write(&req->fscache_req, &page, index,
+						  fsc->mount_options->rsize >> PAGE_SHIFT, 0);
+		if (r != -EAGAIN) {
+			if (r == 0) {
+				r = wait_on_bit(&req->fscache_req.flags,
+						FSCACHE_IO_READ_IN_PROGRESS, TASK_KILLABLE);
+				ceph_wait_on_page_fscache(page);
+			}
+			ceph_fsreq_put(&req->fscache_req);
 			break;
 		}
-	}
 
+		BUG_ON(!req->snapc);
+		snapc = ceph_get_snap_context(req->snapc);
+		ceph_fsreq_put(&req->fscache_req);
+flush_incompat:
+		put_page(page);
+		page = NULL;
+		ceph_queue_writeback(inode);
+		r = wait_event_killable(ci->i_cap_wq,
+					context_is_writeable_or_written(inode, snapc));
+		ceph_put_snap_context(snapc);
+	} while (r == 0);
+out:
 	if (r < 0) {
-		if (page) {
-			unlock_page(page);
+		if (page)
 			put_page(page);
-		}
 	} else {
+		WARN_ON_ONCE(!PageLocked(page));
 		*pagep = page;
 	}
 	return r;
-- 
2.26.2

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC PATCH v2 09/11] ceph: convert readpages to fscache_read_helper
  2020-07-31 13:04 [RFC PATCH v2 00/11] ceph: convert to new FSCache API Jeff Layton
                   ` (7 preceding siblings ...)
  2020-07-31 13:04 ` [RFC PATCH v2 08/11] ceph: plug write_begin into " Jeff Layton
@ 2020-07-31 13:04 ` Jeff Layton
  2020-08-09 15:09   ` [Linux-cachefs] " David Wysochanski
                     ` (2 more replies)
  2020-07-31 13:04 ` [RFC PATCH v2 10/11] ceph: add fscache writeback support Jeff Layton
  2020-07-31 13:04 ` [RFC PATCH v2 11/11] ceph: re-enable fscache support Jeff Layton
  10 siblings, 3 replies; 19+ messages in thread
From: Jeff Layton @ 2020-07-31 13:04 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-cachefs, idryomov

Convert ceph_readpages to use the fscache_read_helper. With this we can
rip out a lot of the old readpage/readpages infrastructure.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/addr.c | 209 +++++++------------------------------------------
 1 file changed, 28 insertions(+), 181 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index cee497c108bb..8905fe4a0930 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -377,76 +377,23 @@ static int ceph_readpage(struct file *filp, struct page *page)
 	return err;
 }
 
-/*
- * Finish an async read(ahead) op.
- */
-static void finish_read(struct ceph_osd_request *req)
-{
-	struct inode *inode = req->r_inode;
-	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
-	struct ceph_osd_data *osd_data;
-	int rc = req->r_result <= 0 ? req->r_result : 0;
-	int bytes = req->r_result >= 0 ? req->r_result : 0;
-	int num_pages;
-	int i;
-
-	dout("finish_read %p req %p rc %d bytes %d\n", inode, req, rc, bytes);
-	if (rc == -EBLACKLISTED)
-		ceph_inode_to_client(inode)->blacklisted = true;
-
-	/* unlock all pages, zeroing any data we didn't read */
-	osd_data = osd_req_op_extent_osd_data(req, 0);
-	BUG_ON(osd_data->type != CEPH_OSD_DATA_TYPE_PAGES);
-	num_pages = calc_pages_for((u64)osd_data->alignment,
-					(u64)osd_data->length);
-	for (i = 0; i < num_pages; i++) {
-		struct page *page = osd_data->pages[i];
-
-		if (rc < 0 && rc != -ENOENT)
-			goto unlock;
-		if (bytes < (int)PAGE_SIZE) {
-			/* zero (remainder of) page */
-			int s = bytes < 0 ? 0 : bytes;
-			zero_user_segment(page, s, PAGE_SIZE);
-		}
- 		dout("finish_read %p uptodate %p idx %lu\n", inode, page,
-		     page->index);
-		flush_dcache_page(page);
-		SetPageUptodate(page);
-unlock:
-		unlock_page(page);
-		put_page(page);
-		bytes -= PAGE_SIZE;
-	}
-
-	ceph_update_read_latency(&fsc->mdsc->metric, req->r_start_latency,
-				 req->r_end_latency, rc);
-
-	kfree(osd_data->pages);
-}
-
-/*
- * start an async read(ahead) operation.  return nr_pages we submitted
- * a read for on success, or negative error code.
- */
-static int start_read(struct inode *inode, struct ceph_rw_context *rw_ctx,
-		      struct list_head *page_list, int max)
+static int ceph_readpages(struct file *file, struct address_space *mapping,
+			  struct list_head *page_list, unsigned nr_pages)
 {
-	struct ceph_osd_client *osdc =
-		&ceph_inode_to_client(inode)->client->osdc;
+	struct inode *inode = file_inode(file);
 	struct ceph_inode_info *ci = ceph_inode(inode);
-	struct page *page = lru_to_page(page_list);
-	struct ceph_vino vino;
-	struct ceph_osd_request *req;
-	u64 off;
-	u64 len;
-	int i;
-	struct page **pages;
-	pgoff_t next_index;
-	int nr_pages = 0;
+	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
+	struct ceph_file_info *fi = file->private_data;
+	struct ceph_rw_context *rw_ctx;
+	struct fscache_cookie *cookie = ceph_fscache_cookie(ci);
 	int got = 0;
 	int ret = 0;
+	int max = fsc->mount_options->rsize >> PAGE_SHIFT;
+
+	if (ceph_inode(inode)->i_inline_version != CEPH_INLINE_NONE)
+		return -EINVAL;
 
+	rw_ctx = ceph_find_rw_context(fi);
 	if (!rw_ctx) {
 		/* caller of readpages does not hold buffer and read caps
 		 * (fadvise, madvise and readahead cases) */
@@ -459,133 +406,33 @@ static int start_read(struct inode *inode, struct ceph_rw_context *rw_ctx,
 			dout("start_read %p, no cache cap\n", inode);
 			ret = 0;
 		}
-		if (ret <= 0) {
-			if (got)
-				ceph_put_cap_refs(ci, got);
-			while (!list_empty(page_list)) {
-				page = lru_to_page(page_list);
-				list_del(&page->lru);
-				put_page(page);
-			}
-			return ret;
-		}
+		if (ret <= 0)
+			goto out;
 	}
 
-	off = (u64) page_offset(page);
+	dout("readpages %p file %p ctx %p nr_pages %d max %d\n",
+	     inode, file, rw_ctx, nr_pages, max);
 
-	/* count pages */
-	next_index = page->index;
-	list_for_each_entry_reverse(page, page_list, lru) {
-		if (page->index != next_index)
-			break;
-		nr_pages++;
-		next_index++;
-		if (max && nr_pages == max)
-			break;
-	}
-	len = nr_pages << PAGE_SHIFT;
-	dout("start_read %p nr_pages %d is %lld~%lld\n", inode, nr_pages,
-	     off, len);
-	vino = ceph_vino(inode);
-	req = ceph_osdc_new_request(osdc, &ci->i_layout, vino, off, &len,
-				    0, 1, CEPH_OSD_OP_READ,
-				    CEPH_OSD_FLAG_READ, NULL,
-				    ci->i_truncate_seq, ci->i_truncate_size,
-				    false);
-	if (IS_ERR(req)) {
-		ret = PTR_ERR(req);
-		goto out;
-	}
+	while (ret >= 0 && !list_empty(page_list)) {
+		struct ceph_fscache_req *req = ceph_fsreq_alloc();
 
-	/* build page vector */
-	nr_pages = calc_pages_for(0, len);
-	pages = kmalloc_array(nr_pages, sizeof(*pages), GFP_KERNEL);
-	if (!pages) {
-		ret = -ENOMEM;
-		goto out_put;
-	}
-	for (i = 0; i < nr_pages; ++i) {
-		page = list_entry(page_list->prev, struct page, lru);
-		BUG_ON(PageLocked(page));
-		list_del(&page->lru);
-
- 		dout("start_read %p adding %p idx %lu\n", inode, page,
-		     page->index);
-		if (add_to_page_cache_lru(page, &inode->i_data, page->index,
-					  GFP_KERNEL)) {
-			put_page(page);
-			dout("start_read %p add_to_page_cache failed %p\n",
-			     inode, page);
-			nr_pages = i;
-			if (nr_pages > 0) {
-				len = nr_pages << PAGE_SHIFT;
-				osd_req_op_extent_update(req, 0, len);
-				break;
-			}
-			goto out_pages;
+		if (!req) {
+			ret = -ENOMEM;
+			break;
 		}
-		pages[i] = page;
-	}
-	osd_req_op_extent_osd_data_pages(req, 0, pages, len, 0, false, false);
-	req->r_callback = finish_read;
-	req->r_inode = inode;
-
-	dout("start_read %p starting %p %lld~%lld\n", inode, req, off, len);
-	ret = ceph_osdc_start_request(osdc, req, false);
-	if (ret < 0)
-		goto out_pages;
-	ceph_osdc_put_request(req);
-
-	/* After adding locked pages to page cache, the inode holds cache cap.
-	 * So we can drop our cap refs. */
-	if (got)
-		ceph_put_cap_refs(ci, got);
-
-	return nr_pages;
+		fscache_init_io_request(&req->fscache_req, cookie, &ceph_readpage_fsreq_ops);
+		req->fscache_req.mapping = inode->i_mapping;
 
-out_pages:
-	for (i = 0; i < nr_pages; ++i) {
-		unlock_page(pages[i]);
+		ret = fscache_read_helper_page_list(&req->fscache_req, page_list, max);
+		ceph_fsreq_put(&req->fscache_req);
 	}
-	ceph_put_page_vector(pages, nr_pages, false);
-out_put:
-	ceph_osdc_put_request(req);
 out:
+	/* After adding locked pages to page cache, the inode holds Fc refs. We can drop ours. */
 	if (got)
 		ceph_put_cap_refs(ci, got);
-	return ret;
-}
 
-
-/*
- * Read multiple pages.  Leave pages we don't read + unlock in page_list;
- * the caller (VM) cleans them up.
- */
-static int ceph_readpages(struct file *file, struct address_space *mapping,
-			  struct list_head *page_list, unsigned nr_pages)
-{
-	struct inode *inode = file_inode(file);
-	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
-	struct ceph_file_info *fi = file->private_data;
-	struct ceph_rw_context *rw_ctx;
-	int rc = 0;
-	int max = 0;
-
-	if (ceph_inode(inode)->i_inline_version != CEPH_INLINE_NONE)
-		return -EINVAL;
-
-	rw_ctx = ceph_find_rw_context(fi);
-	max = fsc->mount_options->rsize >> PAGE_SHIFT;
-	dout("readpages %p file %p ctx %p nr_pages %d max %d\n",
-	     inode, file, rw_ctx, nr_pages, max);
-	while (!list_empty(page_list)) {
-		rc = start_read(inode, rw_ctx, page_list, max);
-		if (rc < 0)
-			goto out;
-	}
-out:
-	dout("readpages %p file %p ret %d\n", inode, file, rc);
-	return rc;
+	dout("readpages %p file %p ret %d\n", inode, file, ret);
+	return ret;
 }
 
 struct ceph_writeback_ctl
-- 
2.26.2

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC PATCH v2 10/11] ceph: add fscache writeback support
  2020-07-31 13:04 [RFC PATCH v2 00/11] ceph: convert to new FSCache API Jeff Layton
                   ` (8 preceding siblings ...)
  2020-07-31 13:04 ` [RFC PATCH v2 09/11] ceph: convert readpages to fscache_read_helper Jeff Layton
@ 2020-07-31 13:04 ` Jeff Layton
  2020-07-31 13:04 ` [RFC PATCH v2 11/11] ceph: re-enable fscache support Jeff Layton
  10 siblings, 0 replies; 19+ messages in thread
From: Jeff Layton @ 2020-07-31 13:04 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-cachefs, idryomov

When updating the backing store from the pagecache (a la writepage or
writepages), write to the cache first. This allows us to keep caching
files even when they are open for write. If the OSD write fails,
invalidate the cache.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/addr.c | 122 ++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 116 insertions(+), 6 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 8905fe4a0930..a21aec8ac0f1 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -435,6 +435,98 @@ static int ceph_readpages(struct file *file, struct address_space *mapping,
 	return ret;
 }
 
+#ifdef CONFIG_CEPH_FSCACHE
+const struct fscache_io_request_ops ceph_write_fsreq_ops = {
+	.get		= ceph_fsreq_get,
+	.put		= ceph_fsreq_put,
+};
+
+/*
+ * Clear the PG_fscache flag from a sequence of pages and wake up anyone who's
+ * waiting.  The last page is included in the sequence. Poached from afs_clear_fscache_bits.
+ */
+static void ceph_clear_fscache_bits(struct address_space *mapping,
+				   pgoff_t start, pgoff_t last)
+{
+	struct page *page;
+
+	XA_STATE(xas, &mapping->i_pages, start);
+
+	rcu_read_lock();
+	xas_for_each(&xas, page, last) {
+		unlock_page_fscache(page);
+	}
+	rcu_read_unlock();
+}
+
+static void ceph_write_to_cache_done(struct fscache_io_request *fsreq)
+{
+	pgoff_t start = fsreq->pos >> PAGE_SHIFT;
+	pgoff_t last = start + fsreq->nr_pages - 1;
+
+	ceph_clear_fscache_bits(fsreq->mapping, start, last);
+	if (fsreq->error && fsreq->error != -ENOBUFS)
+		ceph_fscache_invalidate(fsreq->mapping->host, 0);
+}
+
+static void ceph_write_to_cache(struct inode *inode, u64 off, u64 len)
+{
+	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
+	struct iov_iter iter;
+	struct ceph_fscache_req *req;
+	struct fscache_cookie *cookie = ceph_inode(inode)->fscache;
+	pgoff_t start = off >> PAGE_SHIFT;
+	struct fscache_request_shape shape = {
+		.proposed_start		= start,
+		.proposed_nr_pages	= calc_pages_for(off, len),
+		.max_io_pages		= fsc->mount_options->wsize >> PAGE_SHIFT,
+		.i_size			= i_size_read(inode),
+		.for_write		= true,
+	};
+	pgoff_t last = start + shape.proposed_nr_pages - 1;
+
+	/* Don't do anything if cache is disabled */
+	if (!fscache_cookie_enabled(cookie))
+		goto abandon;
+
+	fscache_shape_request(cookie, &shape);
+	if (!(shape.to_be_done & FSCACHE_WRITE_TO_CACHE) ||
+	    shape.actual_nr_pages == 0 ||
+	    shape.actual_start != shape.proposed_start)
+		goto abandon;
+
+	if (shape.actual_nr_pages < shape.proposed_nr_pages) {
+		ceph_clear_fscache_bits(inode->i_mapping, start + shape.actual_nr_pages,
+					start + shape.proposed_nr_pages - 1);
+		last = shape.proposed_start + shape.actual_nr_pages - 1;
+		len = (u64)(last + 1 - start) << PAGE_SHIFT;
+	}
+
+	req = ceph_fsreq_alloc();
+	if (!req)
+		goto abandon;
+
+	fscache_init_io_request(&req->fscache_req, cookie, &ceph_write_fsreq_ops);
+	req->fscache_req.pos = round_down(off, shape.dio_block_size);
+	req->fscache_req.len = round_up(len, shape.dio_block_size);
+	req->fscache_req.nr_pages = shape.actual_nr_pages;
+	req->fscache_req.mapping = inode->i_mapping;
+	req->fscache_req.io_done = ceph_write_to_cache_done;
+
+	iov_iter_mapping(&iter, WRITE, inode->i_mapping, req->fscache_req.pos,
+			 req->fscache_req.len);
+	fscache_write(&req->fscache_req, &iter);
+	ceph_fsreq_put(&req->fscache_req);
+	return;
+abandon:
+	ceph_clear_fscache_bits(inode->i_mapping, start, last);
+}
+#else
+static inline void ceph_write_to_cache(struct inode *inode, u64 off, u64 len)
+{
+}
+#endif /* CONFIG_CEPH_FSCACHE */
+
 struct ceph_writeback_ctl
 {
 	loff_t i_size;
@@ -588,16 +680,17 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
 	    CONGESTION_ON_THRESH(fsc->mount_options->congestion_kb))
 		set_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
 
-	set_page_writeback(page);
 	req = ceph_osdc_new_request(osdc, &ci->i_layout, ceph_vino(inode), page_off, &len, 0, 1,
 				    CEPH_OSD_OP_WRITE, CEPH_OSD_FLAG_WRITE, snapc,
 				    ceph_wbc.truncate_seq, ceph_wbc.truncate_size,
 				    true);
-	if (IS_ERR(req)) {
-		redirty_page_for_writepage(wbc, page);
-		end_page_writeback(page);
+	if (IS_ERR(req))
 		return PTR_ERR(req);
-	}
+
+	set_page_writeback(page);
+	if (TestSetPageFsCache(page))
+		BUG();
+	ceph_write_to_cache(inode, page_off, len);
 
 	/* it may be a short write due to an object boundary */
 	WARN_ON_ONCE(len > PAGE_SIZE);
@@ -656,6 +749,9 @@ static int ceph_writepage(struct page *page, struct writeback_control *wbc)
 	struct inode *inode = page->mapping->host;
 	BUG_ON(!inode);
 	ihold(inode);
+
+	ceph_wait_on_page_fscache(page);
+
 	err = writepage_nounlock(page, wbc);
 	if (err == -ERESTARTSYS) {
 		/* direct memory reclaimer was killed by SIGKILL. return 0
@@ -901,7 +997,7 @@ static int ceph_writepages_start(struct address_space *mapping,
 				unlock_page(page);
 				break;
 			}
-			if (PageWriteback(page)) {
+			if (PageWriteback(page) || PageFsCache(page)) {
 				if (wbc->sync_mode == WB_SYNC_NONE) {
 					dout("%p under writeback\n", page);
 					unlock_page(page);
@@ -909,6 +1005,7 @@ static int ceph_writepages_start(struct address_space *mapping,
 				}
 				dout("waiting on writeback %p\n", page);
 				wait_on_page_writeback(page);
+				ceph_wait_on_page_fscache(page);
 			}
 
 			if (!clear_page_dirty_for_io(page)) {
@@ -1041,9 +1138,19 @@ static int ceph_writepages_start(struct address_space *mapping,
 		op_idx = 0;
 		for (i = 0; i < locked_pages; i++) {
 			u64 cur_offset = page_offset(pages[i]);
+			/*
+			 * Discontinuity in page range? Ceph can handle that by just passing
+			 * multiple extents in the write op.
+			 */
 			if (offset + len != cur_offset) {
+				/* If it's full, stop here */
 				if (op_idx + 1 == req->r_num_ops)
 					break;
+
+				/* Kick off an fscache write with what we have so far. */
+				ceph_write_to_cache(inode, offset, len);
+
+				/* Start a new extent */
 				osd_req_op_extent_dup_last(req, op_idx,
 							   cur_offset - offset);
 				dout("writepages got pages at %llu~%llu\n",
@@ -1060,8 +1167,11 @@ static int ceph_writepages_start(struct address_space *mapping,
 			}
 
 			set_page_writeback(pages[i]);
+			if (TestSetPageFsCache(pages[i]))
+				BUG();
 			len += PAGE_SIZE;
 		}
+		ceph_write_to_cache(inode, offset, len);
 
 		if (ceph_wbc.size_stable) {
 			len = min(len, ceph_wbc.i_size - offset);
-- 
2.26.2

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC PATCH v2 11/11] ceph: re-enable fscache support
  2020-07-31 13:04 [RFC PATCH v2 00/11] ceph: convert to new FSCache API Jeff Layton
                   ` (9 preceding siblings ...)
  2020-07-31 13:04 ` [RFC PATCH v2 10/11] ceph: add fscache writeback support Jeff Layton
@ 2020-07-31 13:04 ` Jeff Layton
  10 siblings, 0 replies; 19+ messages in thread
From: Jeff Layton @ 2020-07-31 13:04 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-cachefs, idryomov

Now that the code has been converted to use the new fscache API, we can
re-enable it.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ceph/Kconfig b/fs/ceph/Kconfig
index 432aa34b63e7..8e6b85275c79 100644
--- a/fs/ceph/Kconfig
+++ b/fs/ceph/Kconfig
@@ -22,7 +22,7 @@ config CEPH_FS
 if CEPH_FS
 config CEPH_FSCACHE
 	bool "Enable Ceph client caching support"
-	depends on CEPH_FS=m && FSCACHE_OLD || CEPH_FS=y && FSCACHE_OLD=y
+	depends on CEPH_FS=m && FSCACHE || CEPH_FS=y && FSCACHE=y
 	help
 	  Choose Y here to enable persistent, read-only local
 	  caching support for Ceph clients using FS-Cache
-- 
2.26.2

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [Linux-cachefs] [RFC PATCH v2 09/11] ceph: convert readpages to fscache_read_helper
  2020-07-31 13:04 ` [RFC PATCH v2 09/11] ceph: convert readpages to fscache_read_helper Jeff Layton
@ 2020-08-09 15:09   ` David Wysochanski
  2020-08-10 11:09     ` Jeff Layton
  2020-08-09 18:06   ` David Wysochanski
  2020-08-10 10:09   ` David Howells
  2 siblings, 1 reply; 19+ messages in thread
From: David Wysochanski @ 2020-08-09 15:09 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, idryomov, linux-cachefs

On Fri, Jul 31, 2020 at 9:05 AM Jeff Layton <jlayton@kernel.org> wrote:
>
> Convert ceph_readpages to use the fscache_read_helper. With this we can
> rip out a lot of the old readpage/readpages infrastructure.
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/ceph/addr.c | 209 +++++++------------------------------------------
>  1 file changed, 28 insertions(+), 181 deletions(-)
>
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index cee497c108bb..8905fe4a0930 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -377,76 +377,23 @@ static int ceph_readpage(struct file *filp, struct page *page)
>         return err;
>  }
>
> -/*
> - * Finish an async read(ahead) op.
> - */
> -static void finish_read(struct ceph_osd_request *req)
> -{
> -       struct inode *inode = req->r_inode;
> -       struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> -       struct ceph_osd_data *osd_data;
> -       int rc = req->r_result <= 0 ? req->r_result : 0;
> -       int bytes = req->r_result >= 0 ? req->r_result : 0;
> -       int num_pages;
> -       int i;
> -
> -       dout("finish_read %p req %p rc %d bytes %d\n", inode, req, rc, bytes);
> -       if (rc == -EBLACKLISTED)
> -               ceph_inode_to_client(inode)->blacklisted = true;
> -
> -       /* unlock all pages, zeroing any data we didn't read */
> -       osd_data = osd_req_op_extent_osd_data(req, 0);
> -       BUG_ON(osd_data->type != CEPH_OSD_DATA_TYPE_PAGES);
> -       num_pages = calc_pages_for((u64)osd_data->alignment,
> -                                       (u64)osd_data->length);
> -       for (i = 0; i < num_pages; i++) {
> -               struct page *page = osd_data->pages[i];
> -
> -               if (rc < 0 && rc != -ENOENT)
> -                       goto unlock;
> -               if (bytes < (int)PAGE_SIZE) {
> -                       /* zero (remainder of) page */
> -                       int s = bytes < 0 ? 0 : bytes;
> -                       zero_user_segment(page, s, PAGE_SIZE);
> -               }
> -               dout("finish_read %p uptodate %p idx %lu\n", inode, page,
> -                    page->index);
> -               flush_dcache_page(page);
> -               SetPageUptodate(page);
> -unlock:
> -               unlock_page(page);
> -               put_page(page);
> -               bytes -= PAGE_SIZE;
> -       }
> -
> -       ceph_update_read_latency(&fsc->mdsc->metric, req->r_start_latency,
> -                                req->r_end_latency, rc);
> -
> -       kfree(osd_data->pages);
> -}
> -
> -/*
> - * start an async read(ahead) operation.  return nr_pages we submitted
> - * a read for on success, or negative error code.
> - */
> -static int start_read(struct inode *inode, struct ceph_rw_context *rw_ctx,
> -                     struct list_head *page_list, int max)
> +static int ceph_readpages(struct file *file, struct address_space *mapping,
> +                         struct list_head *page_list, unsigned nr_pages)
>  {
> -       struct ceph_osd_client *osdc =
> -               &ceph_inode_to_client(inode)->client->osdc;
> +       struct inode *inode = file_inode(file);
>         struct ceph_inode_info *ci = ceph_inode(inode);
> -       struct page *page = lru_to_page(page_list);
> -       struct ceph_vino vino;
> -       struct ceph_osd_request *req;
> -       u64 off;
> -       u64 len;
> -       int i;
> -       struct page **pages;
> -       pgoff_t next_index;
> -       int nr_pages = 0;
> +       struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> +       struct ceph_file_info *fi = file->private_data;
> +       struct ceph_rw_context *rw_ctx;
> +       struct fscache_cookie *cookie = ceph_fscache_cookie(ci);
>         int got = 0;
>         int ret = 0;
> +       int max = fsc->mount_options->rsize >> PAGE_SHIFT;

Have you run tests with different values of rsize?
Specifically, rsize < readahead_size == size_of_readpages

I'm seeing a lot of problems with NFS when varying rsize values are
used wrt readahead.  Specifically I'm seeing panics because fscache
expects a 1:1 mapping of issue_op() to io_done() calls, and I get
panics because multiple read completions are trying to unlock the
same pages inside fscache_read_done().

My understanding is afs does not have such an 'rsize' limitation, so it
may not be an area that is well tested.  It could be my implementation
of the NFS conversion though, as I think what needs to happen is to
respect the above 1:1 mapping of issue_op() to io_done() calls, and my
initial implementation did not do that.

FWIW, specifically this unit test was originally failing for me with a panic.
Sun 09 Aug 2020 11:03:22 AM EDT: 1. On NFS client, install and enable
cachefilesd
Sun 09 Aug 2020 11:03:22 AM EDT: 2. On NFS client, mount -o
vers=4.1,fsc,rsize=16384 127.0.0.1:/export/dir1 /mnt/dir1
Sun 09 Aug 2020 11:03:22 AM EDT: 3. On NFS client, dd if=/dev/zero
of=/mnt/dir1/file1.bin bs=65536 count=1
Sun 09 Aug 2020 11:03:22 AM EDT: 4. On NFS client, echo 3 >
/proc/sys/vm/drop_caches
Sun 09 Aug 2020 11:03:22 AM EDT: 5. On NFS client, ./nfs-readahead.sh
set /mnt/dir1 65536
Sun 09 Aug 2020 11:03:23 AM EDT: 6. On NFS client, dd
if=/mnt/dir1/file1.bin of=/dev/null
Sun 09 Aug 2020 11:03:23 AM EDT: 8. On NFS client, echo 3 >
/proc/sys/vm/drop_caches
Sun 09 Aug 2020 11:03:23 AM EDT: 9. On NFS client, dd
if=/mnt/dir1/file1.bin of=/dev/null



> +
> +       if (ceph_inode(inode)->i_inline_version != CEPH_INLINE_NONE)
> +               return -EINVAL;
>
> +       rw_ctx = ceph_find_rw_context(fi);
>         if (!rw_ctx) {
>                 /* caller of readpages does not hold buffer and read caps
>                  * (fadvise, madvise and readahead cases) */
> @@ -459,133 +406,33 @@ static int start_read(struct inode *inode, struct ceph_rw_context *rw_ctx,
>                         dout("start_read %p, no cache cap\n", inode);
>                         ret = 0;
>                 }
> -               if (ret <= 0) {
> -                       if (got)
> -                               ceph_put_cap_refs(ci, got);
> -                       while (!list_empty(page_list)) {
> -                               page = lru_to_page(page_list);
> -                               list_del(&page->lru);
> -                               put_page(page);
> -                       }
> -                       return ret;
> -               }
> +               if (ret <= 0)
> +                       goto out;
>         }
>
> -       off = (u64) page_offset(page);
> +       dout("readpages %p file %p ctx %p nr_pages %d max %d\n",
> +            inode, file, rw_ctx, nr_pages, max);
>
> -       /* count pages */
> -       next_index = page->index;
> -       list_for_each_entry_reverse(page, page_list, lru) {
> -               if (page->index != next_index)
> -                       break;
> -               nr_pages++;
> -               next_index++;
> -               if (max && nr_pages == max)
> -                       break;
> -       }
> -       len = nr_pages << PAGE_SHIFT;
> -       dout("start_read %p nr_pages %d is %lld~%lld\n", inode, nr_pages,
> -            off, len);
> -       vino = ceph_vino(inode);
> -       req = ceph_osdc_new_request(osdc, &ci->i_layout, vino, off, &len,
> -                                   0, 1, CEPH_OSD_OP_READ,
> -                                   CEPH_OSD_FLAG_READ, NULL,
> -                                   ci->i_truncate_seq, ci->i_truncate_size,
> -                                   false);
> -       if (IS_ERR(req)) {
> -               ret = PTR_ERR(req);
> -               goto out;
> -       }
> +       while (ret >= 0 && !list_empty(page_list)) {
> +               struct ceph_fscache_req *req = ceph_fsreq_alloc();
>
> -       /* build page vector */
> -       nr_pages = calc_pages_for(0, len);
> -       pages = kmalloc_array(nr_pages, sizeof(*pages), GFP_KERNEL);
> -       if (!pages) {
> -               ret = -ENOMEM;
> -               goto out_put;
> -       }
> -       for (i = 0; i < nr_pages; ++i) {
> -               page = list_entry(page_list->prev, struct page, lru);
> -               BUG_ON(PageLocked(page));
> -               list_del(&page->lru);
> -
> -               dout("start_read %p adding %p idx %lu\n", inode, page,
> -                    page->index);
> -               if (add_to_page_cache_lru(page, &inode->i_data, page->index,
> -                                         GFP_KERNEL)) {
> -                       put_page(page);
> -                       dout("start_read %p add_to_page_cache failed %p\n",
> -                            inode, page);
> -                       nr_pages = i;
> -                       if (nr_pages > 0) {
> -                               len = nr_pages << PAGE_SHIFT;
> -                               osd_req_op_extent_update(req, 0, len);
> -                               break;
> -                       }
> -                       goto out_pages;
> +               if (!req) {
> +                       ret = -ENOMEM;
> +                       break;
>                 }
> -               pages[i] = page;
> -       }
> -       osd_req_op_extent_osd_data_pages(req, 0, pages, len, 0, false, false);
> -       req->r_callback = finish_read;
> -       req->r_inode = inode;
> -
> -       dout("start_read %p starting %p %lld~%lld\n", inode, req, off, len);
> -       ret = ceph_osdc_start_request(osdc, req, false);
> -       if (ret < 0)
> -               goto out_pages;
> -       ceph_osdc_put_request(req);
> -
> -       /* After adding locked pages to page cache, the inode holds cache cap.
> -        * So we can drop our cap refs. */
> -       if (got)
> -               ceph_put_cap_refs(ci, got);
> -
> -       return nr_pages;
> +               fscache_init_io_request(&req->fscache_req, cookie, &ceph_readpage_fsreq_ops);
> +               req->fscache_req.mapping = inode->i_mapping;
>
> -out_pages:
> -       for (i = 0; i < nr_pages; ++i) {
> -               unlock_page(pages[i]);
> +               ret = fscache_read_helper_page_list(&req->fscache_req, page_list, max);
> +               ceph_fsreq_put(&req->fscache_req);
>         }
> -       ceph_put_page_vector(pages, nr_pages, false);
> -out_put:
> -       ceph_osdc_put_request(req);
>  out:
> +       /* After adding locked pages to page cache, the inode holds Fc refs. We can drop ours. */
>         if (got)
>                 ceph_put_cap_refs(ci, got);
> -       return ret;
> -}
>
> -
> -/*
> - * Read multiple pages.  Leave pages we don't read + unlock in page_list;
> - * the caller (VM) cleans them up.
> - */
> -static int ceph_readpages(struct file *file, struct address_space *mapping,
> -                         struct list_head *page_list, unsigned nr_pages)
> -{
> -       struct inode *inode = file_inode(file);
> -       struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> -       struct ceph_file_info *fi = file->private_data;
> -       struct ceph_rw_context *rw_ctx;
> -       int rc = 0;
> -       int max = 0;
> -
> -       if (ceph_inode(inode)->i_inline_version != CEPH_INLINE_NONE)
> -               return -EINVAL;
> -
> -       rw_ctx = ceph_find_rw_context(fi);
> -       max = fsc->mount_options->rsize >> PAGE_SHIFT;
> -       dout("readpages %p file %p ctx %p nr_pages %d max %d\n",
> -            inode, file, rw_ctx, nr_pages, max);
> -       while (!list_empty(page_list)) {
> -               rc = start_read(inode, rw_ctx, page_list, max);
> -               if (rc < 0)
> -                       goto out;
> -       }
> -out:
> -       dout("readpages %p file %p ret %d\n", inode, file, rc);
> -       return rc;
> +       dout("readpages %p file %p ret %d\n", inode, file, ret);
> +       return ret;
>  }
>
>  struct ceph_writeback_ctl
> --
> 2.26.2
>
>
> --
> Linux-cachefs mailing list
> Linux-cachefs@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cachefs
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Linux-cachefs] [RFC PATCH v2 09/11] ceph: convert readpages to fscache_read_helper
  2020-07-31 13:04 ` [RFC PATCH v2 09/11] ceph: convert readpages to fscache_read_helper Jeff Layton
  2020-08-09 15:09   ` [Linux-cachefs] " David Wysochanski
@ 2020-08-09 18:06   ` David Wysochanski
  2020-08-10 10:09   ` David Howells
  2 siblings, 0 replies; 19+ messages in thread
From: David Wysochanski @ 2020-08-09 18:06 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, idryomov, linux-cachefs

On Fri, Jul 31, 2020 at 9:05 AM Jeff Layton <jlayton@kernel.org> wrote:
> +static int ceph_readpages(struct file *file, struct address_space *mapping,
> +                         struct list_head *page_list, unsigned nr_pages)
>  {
...
> +       int max = fsc->mount_options->rsize >> PAGE_SHIFT;
...
> +               ret = fscache_read_helper_page_list(&req->fscache_req, page_list, max);

Looks like the root of my problems is that the 'max_pages' parameter
given to fscache_read_helper_page_list() does not work for purposes of
limiting the IO to the 'rsize'.  That is, fscache_io_request.nr_pages
exceeds 'max_pages' and instead follows the readahead size.  So even
though max_pages is based on 'rsize', when issue_op() is called, it is
for a fscache_io_request that exceeds 'rsize', resulting in multiple
NFS reads that go over the wire and multiple completions, each of
which ends up calling back into io_done(), which blows up because
fscache does not expect this.  It looks like fscache_shape_request()
(actually cachefiles_shape_request()) overrides any 'max_pages' value,
so it's unclear why the netfs would pass in a 'max_pages' if it is not
honored - that seems like a bug, or at least the purpose there isn't
obvious.  I tried a custom 'shape' method and got further, but it blew
up on another test, so I'm not sure.

It would be good to know if this somehow works for you, but my guess is
you'll see similar failures when rsize < readahead_size == size_of_readpages.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Linux-cachefs] [RFC PATCH v2 09/11] ceph: convert readpages to fscache_read_helper
  2020-07-31 13:04 ` [RFC PATCH v2 09/11] ceph: convert readpages to fscache_read_helper Jeff Layton
  2020-08-09 15:09   ` [Linux-cachefs] " David Wysochanski
  2020-08-09 18:06   ` David Wysochanski
@ 2020-08-10 10:09   ` David Howells
  2020-08-10 13:50     ` David Wysochanski
  2 siblings, 1 reply; 19+ messages in thread
From: David Howells @ 2020-08-10 10:09 UTC (permalink / raw)
  To: David Wysochanski
  Cc: dhowells, Jeff Layton, ceph-devel, linux-cachefs, idryomov

David Wysochanski <dwysocha@redhat.com> wrote:

> Looks like fscache_shape_request() overrides any 'max_pages' value (actually
> it is cachefiles_shape_request), so it's unclear why the netfs would pass
> in a 'max_pages' if it is not honored - seems like a bug maybe or it's not
> obvious

I think the problem is that cachefiles_shape_request() is applying the limit
too early.  It's using it to cut down the number of pages in the original
request (only applicable to readpages), but then the shaping to fit cache
granules can exceed that, so it needs to be applied later also.

Does the attached patch help?

David
---
diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c
index 2bfba2e41c39..ce05cf1d9a6e 100644
--- a/fs/cachefiles/content-map.c
+++ b/fs/cachefiles/content-map.c
@@ -134,7 +134,8 @@ void cachefiles_shape_request(struct fscache_object *obj,
 	_enter("{%lx,%lx,%x},%llx,%d",
 	       start, end, max_pages, i_size, shape->for_write);
 
-	if (start >= CACHEFILES_SIZE_LIMIT / PAGE_SIZE) {
+	if (start >= CACHEFILES_SIZE_LIMIT / PAGE_SIZE ||
+	    max_pages < CACHEFILES_GRAN_PAGES) {
 		shape->to_be_done = FSCACHE_READ_FROM_SERVER;
 		return;
 	}
@@ -144,10 +145,6 @@ void cachefiles_shape_request(struct fscache_object *obj,
 	if (shape->i_size > CACHEFILES_SIZE_LIMIT)
 		i_size = CACHEFILES_SIZE_LIMIT;
 
-	max_pages = round_down(max_pages, CACHEFILES_GRAN_PAGES);
-	if (end - start > max_pages)
-		end = start + max_pages;
-
 	granule = start / CACHEFILES_GRAN_PAGES;
 	if (granule / 8 >= object->content_map_size) {
 		cachefiles_expand_content_map(object, i_size);
@@ -185,6 +182,10 @@ void cachefiles_shape_request(struct fscache_object *obj,
 		start = round_down(start, CACHEFILES_GRAN_PAGES);
 		end   = round_up(end, CACHEFILES_GRAN_PAGES);
 
+		/* Trim to the maximum size the netfs supports */
+		if (end - start > max_pages)
+			end = round_down(start + max_pages, CACHEFILES_GRAN_PAGES);
+
 		/* But trim to the end of the file and the starting page */
 		eof = (i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
 		if (eof <= shape->proposed_start)


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [Linux-cachefs] [RFC PATCH v2 09/11] ceph: convert readpages to fscache_read_helper
  2020-08-09 15:09   ` [Linux-cachefs] " David Wysochanski
@ 2020-08-10 11:09     ` Jeff Layton
  2020-08-10 12:24       ` David Wysochanski
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff Layton @ 2020-08-10 11:09 UTC (permalink / raw)
  To: David Wysochanski; +Cc: ceph-devel, idryomov, linux-cachefs

On Sun, 2020-08-09 at 11:09 -0400, David Wysochanski wrote:
> On Fri, Jul 31, 2020 at 9:05 AM Jeff Layton <jlayton@kernel.org> wrote:
> > Convert ceph_readpages to use the fscache_read_helper. With this we can
> > rip out a lot of the old readpage/readpages infrastructure.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/ceph/addr.c | 209 +++++++------------------------------------------
> >  1 file changed, 28 insertions(+), 181 deletions(-)
> > 
> > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> > index cee497c108bb..8905fe4a0930 100644
> > --- a/fs/ceph/addr.c
> > +++ b/fs/ceph/addr.c
> > @@ -377,76 +377,23 @@ static int ceph_readpage(struct file *filp, struct page *page)
> >         return err;
> >  }
> > 
> > -/*
> > - * Finish an async read(ahead) op.
> > - */
> > -static void finish_read(struct ceph_osd_request *req)
> > -{
> > -       struct inode *inode = req->r_inode;
> > -       struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> > -       struct ceph_osd_data *osd_data;
> > -       int rc = req->r_result <= 0 ? req->r_result : 0;
> > -       int bytes = req->r_result >= 0 ? req->r_result : 0;
> > -       int num_pages;
> > -       int i;
> > -
> > -       dout("finish_read %p req %p rc %d bytes %d\n", inode, req, rc, bytes);
> > -       if (rc == -EBLACKLISTED)
> > -               ceph_inode_to_client(inode)->blacklisted = true;
> > -
> > -       /* unlock all pages, zeroing any data we didn't read */
> > -       osd_data = osd_req_op_extent_osd_data(req, 0);
> > -       BUG_ON(osd_data->type != CEPH_OSD_DATA_TYPE_PAGES);
> > -       num_pages = calc_pages_for((u64)osd_data->alignment,
> > -                                       (u64)osd_data->length);
> > -       for (i = 0; i < num_pages; i++) {
> > -               struct page *page = osd_data->pages[i];
> > -
> > -               if (rc < 0 && rc != -ENOENT)
> > -                       goto unlock;
> > -               if (bytes < (int)PAGE_SIZE) {
> > -                       /* zero (remainder of) page */
> > -                       int s = bytes < 0 ? 0 : bytes;
> > -                       zero_user_segment(page, s, PAGE_SIZE);
> > -               }
> > -               dout("finish_read %p uptodate %p idx %lu\n", inode, page,
> > -                    page->index);
> > -               flush_dcache_page(page);
> > -               SetPageUptodate(page);
> > -unlock:
> > -               unlock_page(page);
> > -               put_page(page);
> > -               bytes -= PAGE_SIZE;
> > -       }
> > -
> > -       ceph_update_read_latency(&fsc->mdsc->metric, req->r_start_latency,
> > -                                req->r_end_latency, rc);
> > -
> > -       kfree(osd_data->pages);
> > -}
> > -
> > -/*
> > - * start an async read(ahead) operation.  return nr_pages we submitted
> > - * a read for on success, or negative error code.
> > - */
> > -static int start_read(struct inode *inode, struct ceph_rw_context *rw_ctx,
> > -                     struct list_head *page_list, int max)
> > +static int ceph_readpages(struct file *file, struct address_space *mapping,
> > +                         struct list_head *page_list, unsigned nr_pages)
> >  {
> > -       struct ceph_osd_client *osdc =
> > -               &ceph_inode_to_client(inode)->client->osdc;
> > +       struct inode *inode = file_inode(file);
> >         struct ceph_inode_info *ci = ceph_inode(inode);
> > -       struct page *page = lru_to_page(page_list);
> > -       struct ceph_vino vino;
> > -       struct ceph_osd_request *req;
> > -       u64 off;
> > -       u64 len;
> > -       int i;
> > -       struct page **pages;
> > -       pgoff_t next_index;
> > -       int nr_pages = 0;
> > +       struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> > +       struct ceph_file_info *fi = file->private_data;
> > +       struct ceph_rw_context *rw_ctx;
> > +       struct fscache_cookie *cookie = ceph_fscache_cookie(ci);
> >         int got = 0;
> >         int ret = 0;
> > +       int max = fsc->mount_options->rsize >> PAGE_SHIFT;
> 
> Have you run tests with different values of rsize?
> Specifically, rsize < readahead_size == size_of_readpages
> 
> I'm seeing a lot of problems with NFS when varying rsize are used wrt
> readahead values.  Specifically I'm seeing panics because fscache
> expects a 1:1 mapping of issue_op() to io_done() calls, and I get
> panics because multiple read completions are trying to unlock the
> same pages inside fscache_read_done().
> 
> My understanding is afs does not have such 'rsize' limitation, so it
> may not be an area that is well tested.  It could be my implementation
> of the NFS conversion though, as I think what needs to happen is to
> respect the above 1:1 mapping of issue_op() to io_done() calls, and my
> initial implementation did not do that.
> 
> FWIW, specifically this unit test was originally failing for me with a panic.
> Sun 09 Aug 2020 11:03:22 AM EDT: 1. On NFS client, install and enable
> cachefilesd
> Sun 09 Aug 2020 11:03:22 AM EDT: 2. On NFS client, mount -o
> vers=4.1,fsc,rsize=16384 127.0.0.1:/export/dir1 /mnt/dir1
> Sun 09 Aug 2020 11:03:22 AM EDT: 3. On NFS client, dd if=/dev/zero
> of=/mnt/dir1/file1.bin bs=65536 count=1
> Sun 09 Aug 2020 11:03:22 AM EDT: 4. On NFS client, echo 3 >
> /proc/sys/vm/drop_caches
> Sun 09 Aug 2020 11:03:22 AM EDT: 5. On NFS client, ./nfs-readahead.sh
> set /mnt/dir1 65536
> Sun 09 Aug 2020 11:03:23 AM EDT: 8. On NFS client, echo 3 >
> /proc/sys/vm/drop_caches
> Sun 09 Aug 2020 11:03:23 AM EDT: 9. On NFS client, dd
> if=/mnt/dir1/file1.bin of=/dev/null
> 
> 

I haven't tested much with varying rsize and wsize (setting them on
cephfs is pretty rare), but I'll plan to. What's in nfs-readahead.sh?


> 
> > +
> > +       if (ceph_inode(inode)->i_inline_version != CEPH_INLINE_NONE)
> > +               return -EINVAL;
> > 
> > +       rw_ctx = ceph_find_rw_context(fi);
> >         if (!rw_ctx) {
> >                 /* caller of readpages does not hold buffer and read caps
> >                  * (fadvise, madvise and readahead cases) */
> > @@ -459,133 +406,33 @@ static int start_read(struct inode *inode, struct ceph_rw_context *rw_ctx,
> >                         dout("start_read %p, no cache cap\n", inode);
> >                         ret = 0;
> >                 }
> > -               if (ret <= 0) {
> > -                       if (got)
> > -                               ceph_put_cap_refs(ci, got);
> > -                       while (!list_empty(page_list)) {
> > -                               page = lru_to_page(page_list);
> > -                               list_del(&page->lru);
> > -                               put_page(page);
> > -                       }
> > -                       return ret;
> > -               }
> > +               if (ret <= 0)
> > +                       goto out;
> >         }
> > 
> > -       off = (u64) page_offset(page);
> > +       dout("readpages %p file %p ctx %p nr_pages %d max %d\n",
> > +            inode, file, rw_ctx, nr_pages, max);
> > 
> > -       /* count pages */
> > -       next_index = page->index;
> > -       list_for_each_entry_reverse(page, page_list, lru) {
> > -               if (page->index != next_index)
> > -                       break;
> > -               nr_pages++;
> > -               next_index++;
> > -               if (max && nr_pages == max)
> > -                       break;
> > -       }
> > -       len = nr_pages << PAGE_SHIFT;
> > -       dout("start_read %p nr_pages %d is %lld~%lld\n", inode, nr_pages,
> > -            off, len);
> > -       vino = ceph_vino(inode);
> > -       req = ceph_osdc_new_request(osdc, &ci->i_layout, vino, off, &len,
> > -                                   0, 1, CEPH_OSD_OP_READ,
> > -                                   CEPH_OSD_FLAG_READ, NULL,
> > -                                   ci->i_truncate_seq, ci->i_truncate_size,
> > -                                   false);
> > -       if (IS_ERR(req)) {
> > -               ret = PTR_ERR(req);
> > -               goto out;
> > -       }
> > +       while (ret >= 0 && !list_empty(page_list)) {
> > +               struct ceph_fscache_req *req = ceph_fsreq_alloc();
> > 
> > -       /* build page vector */
> > -       nr_pages = calc_pages_for(0, len);
> > -       pages = kmalloc_array(nr_pages, sizeof(*pages), GFP_KERNEL);
> > -       if (!pages) {
> > -               ret = -ENOMEM;
> > -               goto out_put;
> > -       }
> > -       for (i = 0; i < nr_pages; ++i) {
> > -               page = list_entry(page_list->prev, struct page, lru);
> > -               BUG_ON(PageLocked(page));
> > -               list_del(&page->lru);
> > -
> > -               dout("start_read %p adding %p idx %lu\n", inode, page,
> > -                    page->index);
> > -               if (add_to_page_cache_lru(page, &inode->i_data, page->index,
> > -                                         GFP_KERNEL)) {
> > -                       put_page(page);
> > -                       dout("start_read %p add_to_page_cache failed %p\n",
> > -                            inode, page);
> > -                       nr_pages = i;
> > -                       if (nr_pages > 0) {
> > -                               len = nr_pages << PAGE_SHIFT;
> > -                               osd_req_op_extent_update(req, 0, len);
> > -                               break;
> > -                       }
> > -                       goto out_pages;
> > +               if (!req) {
> > +                       ret = -ENOMEM;
> > +                       break;
> >                 }
> > -               pages[i] = page;
> > -       }
> > -       osd_req_op_extent_osd_data_pages(req, 0, pages, len, 0, false, false);
> > -       req->r_callback = finish_read;
> > -       req->r_inode = inode;
> > -
> > -       dout("start_read %p starting %p %lld~%lld\n", inode, req, off, len);
> > -       ret = ceph_osdc_start_request(osdc, req, false);
> > -       if (ret < 0)
> > -               goto out_pages;
> > -       ceph_osdc_put_request(req);
> > -
> > -       /* After adding locked pages to page cache, the inode holds cache cap.
> > -        * So we can drop our cap refs. */
> > -       if (got)
> > -               ceph_put_cap_refs(ci, got);
> > -
> > -       return nr_pages;
> > +               fscache_init_io_request(&req->fscache_req, cookie, &ceph_readpage_fsreq_ops);
> > +               req->fscache_req.mapping = inode->i_mapping;
> > 
> > -out_pages:
> > -       for (i = 0; i < nr_pages; ++i) {
> > -               unlock_page(pages[i]);
> > +               ret = fscache_read_helper_page_list(&req->fscache_req, page_list, max);
> > +               ceph_fsreq_put(&req->fscache_req);
> >         }
> > -       ceph_put_page_vector(pages, nr_pages, false);
> > -out_put:
> > -       ceph_osdc_put_request(req);
> >  out:
> > +       /* After adding locked pages to page cache, the inode holds Fc refs. We can drop ours. */
> >         if (got)
> >                 ceph_put_cap_refs(ci, got);
> > -       return ret;
> > -}
> > 
> > -
> > -/*
> > - * Read multiple pages.  Leave pages we don't read + unlock in page_list;
> > - * the caller (VM) cleans them up.
> > - */
> > -static int ceph_readpages(struct file *file, struct address_space *mapping,
> > -                         struct list_head *page_list, unsigned nr_pages)
> > -{
> > -       struct inode *inode = file_inode(file);
> > -       struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> > -       struct ceph_file_info *fi = file->private_data;
> > -       struct ceph_rw_context *rw_ctx;
> > -       int rc = 0;
> > -       int max = 0;
> > -
> > -       if (ceph_inode(inode)->i_inline_version != CEPH_INLINE_NONE)
> > -               return -EINVAL;
> > -
> > -       rw_ctx = ceph_find_rw_context(fi);
> > -       max = fsc->mount_options->rsize >> PAGE_SHIFT;
> > -       dout("readpages %p file %p ctx %p nr_pages %d max %d\n",
> > -            inode, file, rw_ctx, nr_pages, max);
> > -       while (!list_empty(page_list)) {
> > -               rc = start_read(inode, rw_ctx, page_list, max);
> > -               if (rc < 0)
> > -                       goto out;
> > -       }
> > -out:
> > -       dout("readpages %p file %p ret %d\n", inode, file, rc);
> > -       return rc;
> > +       dout("readpages %p file %p ret %d\n", inode, file, ret);
> > +       return ret;
> >  }
> > 
> >  struct ceph_writeback_ctl
> > --
> > 2.26.2
> > 
> > 
> > --
> > Linux-cachefs mailing list
> > Linux-cachefs@redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cachefs
> > 

-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Linux-cachefs] [RFC PATCH v2 09/11] ceph: convert readpages to fscache_read_helper
  2020-08-10 11:09     ` Jeff Layton
@ 2020-08-10 12:24       ` David Wysochanski
  0 siblings, 0 replies; 19+ messages in thread
From: David Wysochanski @ 2020-08-10 12:24 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, idryomov, linux-cachefs

[-- Attachment #1: Type: text/plain, Size: 6081 bytes --]

On Mon, Aug 10, 2020 at 7:09 AM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Sun, 2020-08-09 at 11:09 -0400, David Wysochanski wrote:
> > On Fri, Jul 31, 2020 at 9:05 AM Jeff Layton <jlayton@kernel.org> wrote:
> > > Convert ceph_readpages to use the fscache_read_helper. With this we can
> > > rip out a lot of the old readpage/readpages infrastructure.
> > >
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > >  fs/ceph/addr.c | 209 +++++++------------------------------------------
> > >  1 file changed, 28 insertions(+), 181 deletions(-)
> > >
> > > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> > > index cee497c108bb..8905fe4a0930 100644
> > > --- a/fs/ceph/addr.c
> > > +++ b/fs/ceph/addr.c
> > > @@ -377,76 +377,23 @@ static int ceph_readpage(struct file *filp, struct page *page)
> > >         return err;
> > >  }
> > >
> > > -/*
> > > - * Finish an async read(ahead) op.
> > > - */
> > > -static void finish_read(struct ceph_osd_request *req)
> > > -{
> > > -       struct inode *inode = req->r_inode;
> > > -       struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> > > -       struct ceph_osd_data *osd_data;
> > > -       int rc = req->r_result <= 0 ? req->r_result : 0;
> > > -       int bytes = req->r_result >= 0 ? req->r_result : 0;
> > > -       int num_pages;
> > > -       int i;
> > > -
> > > -       dout("finish_read %p req %p rc %d bytes %d\n", inode, req, rc, bytes);
> > > -       if (rc == -EBLACKLISTED)
> > > -               ceph_inode_to_client(inode)->blacklisted = true;
> > > -
> > > -       /* unlock all pages, zeroing any data we didn't read */
> > > -       osd_data = osd_req_op_extent_osd_data(req, 0);
> > > -       BUG_ON(osd_data->type != CEPH_OSD_DATA_TYPE_PAGES);
> > > -       num_pages = calc_pages_for((u64)osd_data->alignment,
> > > -                                       (u64)osd_data->length);
> > > -       for (i = 0; i < num_pages; i++) {
> > > -               struct page *page = osd_data->pages[i];
> > > -
> > > -               if (rc < 0 && rc != -ENOENT)
> > > -                       goto unlock;
> > > -               if (bytes < (int)PAGE_SIZE) {
> > > -                       /* zero (remainder of) page */
> > > -                       int s = bytes < 0 ? 0 : bytes;
> > > -                       zero_user_segment(page, s, PAGE_SIZE);
> > > -               }
> > > -               dout("finish_read %p uptodate %p idx %lu\n", inode, page,
> > > -                    page->index);
> > > -               flush_dcache_page(page);
> > > -               SetPageUptodate(page);
> > > -unlock:
> > > -               unlock_page(page);
> > > -               put_page(page);
> > > -               bytes -= PAGE_SIZE;
> > > -       }
> > > -
> > > -       ceph_update_read_latency(&fsc->mdsc->metric, req->r_start_latency,
> > > -                                req->r_end_latency, rc);
> > > -
> > > -       kfree(osd_data->pages);
> > > -}
> > > -
> > > -/*
> > > - * start an async read(ahead) operation.  return nr_pages we submitted
> > > - * a read for on success, or negative error code.
> > > - */
> > > -static int start_read(struct inode *inode, struct ceph_rw_context *rw_ctx,
> > > -                     struct list_head *page_list, int max)
> > > +static int ceph_readpages(struct file *file, struct address_space *mapping,
> > > +                         struct list_head *page_list, unsigned nr_pages)
> > >  {
> > > -       struct ceph_osd_client *osdc =
> > > -               &ceph_inode_to_client(inode)->client->osdc;
> > > +       struct inode *inode = file_inode(file);
> > >         struct ceph_inode_info *ci = ceph_inode(inode);
> > > -       struct page *page = lru_to_page(page_list);
> > > -       struct ceph_vino vino;
> > > -       struct ceph_osd_request *req;
> > > -       u64 off;
> > > -       u64 len;
> > > -       int i;
> > > -       struct page **pages;
> > > -       pgoff_t next_index;
> > > -       int nr_pages = 0;
> > > +       struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> > > +       struct ceph_file_info *fi = file->private_data;
> > > +       struct ceph_rw_context *rw_ctx;
> > > +       struct fscache_cookie *cookie = ceph_fscache_cookie(ci);
> > >         int got = 0;
> > >         int ret = 0;
> > > +       int max = fsc->mount_options->rsize >> PAGE_SHIFT;
> >
> > Have you run tests with different values of rsize?
> > Specifically, rsize < readahead_size == size_of_readpages
> >
> > I'm seeing a lot of problems with NFS when varying rsize are used wrt
> > readahead values.  Specifically I'm seeing panics because fscache
> > expects a 1:1 mapping of issue_op() to io_done() calls, and I get
> > panics because multiple read completions are trying to unlock the
> > same pages inside fscache_read_done().
> >
> > My understanding is afs does not have such 'rsize' limitation, so it
> > may not be an area that is well tested.  It could be my implementation
> > of the NFS conversion though, as I think what needs to happen is to
> > respect the above 1:1 mapping of issue_op() to io_done() calls, and my
> > initial implementation did not do that.
> >
> > FWIW, specifically this unit test was originally failing for me with a panic.
> > Sun 09 Aug 2020 11:03:22 AM EDT: 1. On NFS client, install and enable
> > cachefilesd
> > Sun 09 Aug 2020 11:03:22 AM EDT: 2. On NFS client, mount -o
> > vers=4.1,fsc,rsize=16384 127.0.0.1:/export/dir1 /mnt/dir1
> > Sun 09 Aug 2020 11:03:22 AM EDT: 3. On NFS client, dd if=/dev/zero
> > of=/mnt/dir1/file1.bin bs=65536 count=1
> > Sun 09 Aug 2020 11:03:22 AM EDT: 4. On NFS client, echo 3 >
> > /proc/sys/vm/drop_caches
> > Sun 09 Aug 2020 11:03:22 AM EDT: 5. On NFS client, ./nfs-readahead.sh
> > set /mnt/dir1 65536
> > Sun 09 Aug 2020 11:03:23 AM EDT: 8. On NFS client, echo 3 >
> > /proc/sys/vm/drop_caches
> > Sun 09 Aug 2020 11:03:23 AM EDT: 9. On NFS client, dd
> > if=/mnt/dir1/file1.bin of=/dev/null
> >
> >
>
> I haven't tested much with varying rsize and wsize (setting them on
> cephfs is pretty rare), but I'll plan to. What's in nfs-readahead.sh?
>
>

See attached.

[-- Attachment #2: nfs-readahead.sh --]
[-- Type: application/x-shellscript, Size: 1087 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Linux-cachefs] [RFC PATCH v2 09/11] ceph: convert readpages to fscache_read_helper
  2020-08-10 10:09   ` David Howells
@ 2020-08-10 13:50     ` David Wysochanski
  2020-08-10 18:55       ` David Wysochanski
  0 siblings, 1 reply; 19+ messages in thread
From: David Wysochanski @ 2020-08-10 13:50 UTC (permalink / raw)
  To: David Howells; +Cc: Jeff Layton, ceph-devel, linux-cachefs, idryomov

On Mon, Aug 10, 2020 at 6:09 AM David Howells <dhowells@redhat.com> wrote:
>
> David Wysochanski <dwysocha@redhat.com> wrote:
>
> > Looks like fscache_shape_request() overrides any 'max_pages' value (actually
> > it is cachefiles_shape_request), so it's unclear why the netfs would pass
> > in a 'max_pages' if it is not honored - seems like a bug maybe or it's not
> > obvious
>
> I think the problem is that cachefiles_shape_request() is applying the limit
> too early.  It's using it to cut down the number of pages in the original
> request (only applicable to readpages), but then the shaping to fit cache
> granules can exceed that, so it needs to be applied later also.
>
> Does the attached patch help?
>
> David
> ---
> diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c
> index 2bfba2e41c39..ce05cf1d9a6e 100644
> --- a/fs/cachefiles/content-map.c
> +++ b/fs/cachefiles/content-map.c
> @@ -134,7 +134,8 @@ void cachefiles_shape_request(struct fscache_object *obj,
>         _enter("{%lx,%lx,%x},%llx,%d",
>                start, end, max_pages, i_size, shape->for_write);
>
> -       if (start >= CACHEFILES_SIZE_LIMIT / PAGE_SIZE) {
> +       if (start >= CACHEFILES_SIZE_LIMIT / PAGE_SIZE ||
> +           max_pages < CACHEFILES_GRAN_PAGES) {
>                 shape->to_be_done = FSCACHE_READ_FROM_SERVER;
>                 return;
>         }
> @@ -144,10 +145,6 @@ void cachefiles_shape_request(struct fscache_object *obj,
>         if (shape->i_size > CACHEFILES_SIZE_LIMIT)
>                 i_size = CACHEFILES_SIZE_LIMIT;
>
> -       max_pages = round_down(max_pages, CACHEFILES_GRAN_PAGES);
> -       if (end - start > max_pages)
> -               end = start + max_pages;
> -
>         granule = start / CACHEFILES_GRAN_PAGES;
>         if (granule / 8 >= object->content_map_size) {
>                 cachefiles_expand_content_map(object, i_size);
> @@ -185,6 +182,10 @@ void cachefiles_shape_request(struct fscache_object *obj,
>                 start = round_down(start, CACHEFILES_GRAN_PAGES);
>                 end   = round_up(end, CACHEFILES_GRAN_PAGES);
>
> +               /* Trim to the maximum size the netfs supports */
> +               if (end - start > max_pages)
> +                       end = round_down(start + max_pages, CACHEFILES_GRAN_PAGES);
> +
>                 /* But trim to the end of the file and the starting page */
>                 eof = (i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
>                 if (eof <= shape->proposed_start)
>

I tried this and got the same panic - I think i_size is the culprit
(it is larger than max_pages).  I'll send you a larger trace offline
with cachefiles/fscache debugging enabled if that helps, but the
custom tracing below may be enough, since it shows the before/after
shaping values.

Here's an outline of the test (smaller rsize and readahead for simplicity):
# ./t1_rsize_lt_read.sh 4.1
Setting NFS vers=4.1
Mon 10 Aug 2020 09:34:18 AM EDT: 1. On NFS client, install and enable
cachefilesd
Mon 10 Aug 2020 09:34:18 AM EDT: 2. On NFS client, mount -o
vers=4.1,fsc,rsize=8192 127.0.0.1:/export/dir1 /mnt/dir1
Mon 10 Aug 2020 09:34:18 AM EDT: 3. On NFS client, dd if=/dev/zero
of=/mnt/dir1/file1.bin bs=16384 count=1
Mon 10 Aug 2020 09:34:18 AM EDT: 4. On NFS client, echo 3 >
/proc/sys/vm/drop_caches
Mon 10 Aug 2020 09:34:19 AM EDT: 5. On NFS client, ./nfs-readahead.sh
set /mnt/dir1 16384
Mon 10 Aug 2020 09:34:19 AM EDT: 6. On NFS client, dd
if=/mnt/dir1/file1.bin of=/dev/null
Mon 10 Aug 2020 09:34:19 AM EDT: 7. On NFS client, echo 3 >
/proc/sys/vm/drop_caches
Mon 10 Aug 2020 09:34:19 AM EDT: 8. On NFS client, dd
if=/mnt/dir1/file1.bin of=/dev/null


Console output with custom NFS tracing:
[   62.955355] t1_rsize_lt_rea (4840): drop_caches: 3
[   63.028786] fs/nfs/fscache.c:480 before read_helper_page_list pid
4882 inode ffff8902b4a2a828 cache ffff8902f50b5800 pages
ffffb4b4c0fafca8 max_pages 2
[   63.028804] fs/fscache/read_helper.c:347 pid 4882
fscache_read_helper before shape req ffff8902f50b5800 req->nr_pages 0
shape.actual_nr_pages 48 shape.proposed_nr_pages 4
[   63.037231] fs/fscache/read_helper.c:353 pid 4882
fscache_read_helper after shape req ffff8902f50b5800 req->nr_pages 0
shape.actual_nr_pages 4 shape.proposed_nr_pages 4
[   63.043421] fs/fscache/read_helper.c:531 pid 4882
fscache_read_helper before while req ffff8902f50b5800 req->nr_pages 1
shape.actual_nr_pages 4 shape.proposed_nr_pages 4
[   63.049498] fs/fscache/read_helper.c:531 pid 4882
fscache_read_helper before while req ffff8902f50b5800 req->nr_pages 2
shape.actual_nr_pages 4 shape.proposed_nr_pages 4
[   63.063708] fs/fscache/read_helper.c:531 pid 4882
fscache_read_helper before while req ffff8902f50b5800 req->nr_pages 3
shape.actual_nr_pages 4 shape.proposed_nr_pages 4
[   63.070114] fs/fscache/read_helper.c:531 pid 4882
fscache_read_helper before while req ffff8902f50b5800 req->nr_pages 4
shape.actual_nr_pages 4 shape.proposed_nr_pages 4
[   63.076438] fs/nfs/fscache.c:369 enter nfs_issue_op pid 4882 inode
ffff8902b4a2a828 cache ffff8902f50b5800 start 0 last 3
[   63.082964] fs/nfs/fscache.c:379 before readpage_async_filler pid
4882 inode ffff8902b4a2a828 cache ffff8902f50b5800 page
fffff42f08741a00
[   63.087591] fs/nfs/fscache.c:382 after readpage_async_filler pid
4882 inode ffff8902b4a2a828 cache ffff8902f50b5800 page
fffff42f08741a00 cache.error 0
[   63.093058] fs/nfs/fscache.c:379 before readpage_async_filler pid
4882 inode ffff8902b4a2a828 cache ffff8902f50b5800 page
fffff42f08288680
[   63.098927] fs/nfs/fscache.c:382 after readpage_async_filler pid
4882 inode ffff8902b4a2a828 cache ffff8902f50b5800 page
fffff42f08288680 cache.error 0
[   63.104507] fs/nfs/fscache.c:379 before readpage_async_filler pid
4882 inode ffff8902b4a2a828 cache ffff8902f50b5800 page
fffff42f082816c0
[   63.110922] fs/nfs/fscache.c:382 after readpage_async_filler pid
4882 inode ffff8902b4a2a828 cache ffff8902f50b5800 page
fffff42f082816c0 cache.error 0
[   63.111973] fs/nfs/fscache.c:523 pid 233 before io_done inode
ffff8902b4a2a828 bytes 8192 &req->cache ffff8902f50b5800 cache.pos 0
cache.len 16384
[   63.115407] fs/nfs/fscache.c:379 before readpage_async_filler pid
4882 inode ffff8902b4a2a828 cache ffff8902f50b5800 page
fffff42f067e8f40
[   63.126337] fs/nfs/fscache.c:382 after readpage_async_filler pid
4882 inode ffff8902b4a2a828 cache ffff8902f50b5800 page
fffff42f067e8f40 cache.error 0
[   63.131411] fs/nfs/fscache.c:388 exit nfs_issue_op pid 4882 inode
ffff8902b4a2a828 cache ffff8902f50b5800
[   63.131955] fs/nfs/fscache.c:523 pid 233 before io_done inode
ffff8902b4a2a828 bytes 8192 &req->cache ffff8902f50b5800 cache.pos 0
cache.len 16384
[   63.137012] fs/nfs/fscache.c:484 after read_helper_page_list pid
4882 inode ffff8902b4a2a828 cache ffff8902f50b5800 cache.pos 0
cache.len 16384 cache.nr_pages 4 pages ffffb4b4c0fafca8 ret 0
[   63.140922] page:fffff42f08741a00 refcount:2 mapcount:0
mapping:00000000727f3adc index:0x0
[   63.141091] mapping->aops:nfs_file_aops [nfs] dentry name:"file1.bin"
[   63.146475] fs/nfs/fscache.c:490 outside while(!list_empty(pages))
read_helper_page_list pid 4882 inode ffff8902b4a2a828 cache
ffff8902f50b5800 cache.pos 0 cache.len 16384 cache.nr_pages 4
[   63.146740] fs/fscache/read_helper.c:347 pid 4882
fscache_read_helper before shape req ffff8902f50b5800 req->nr_pages 0
shape.actual_nr_pages 3227832042 shape.proposed_nr_pages 1
[   63.153662] flags: 0x17ffffc0000006(referenced|uptodate)
[   63.153699] raw: 0017ffffc0000006 dead000000000100 dead000000000122
ffff8902b4a2a9a0
[   63.168174] fs/fscache/read_helper.c:353 pid 4882
fscache_read_helper after shape req ffff8902f50b5800 req->nr_pages 0
shape.actual_nr_pages 5 shape.proposed_nr_pages 1
[   63.193131] raw: 0000000000000000 0000000000000000 00000001ffffffff
ffff8902ecfe8000
[   63.203785] page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
[   63.206372] page->mem_cgroup:ffff8902ecfe8000
[   63.208333] ------------[ cut here ]------------
[   63.211081] kernel BUG at mm/filemap.c:1290!
[   63.213152] invalid opcode: 0000 [#1] SMP PTI



* Re: [Linux-cachefs] [RFC PATCH v2 09/11] ceph: convert readpages to fscache_read_helper
  2020-08-10 13:50     ` David Wysochanski
@ 2020-08-10 18:55       ` David Wysochanski
  0 siblings, 0 replies; 19+ messages in thread
From: David Wysochanski @ 2020-08-10 18:55 UTC (permalink / raw)
  To: David Howells; +Cc: Jeff Layton, ceph-devel, linux-cachefs, idryomov

On Mon, Aug 10, 2020 at 9:50 AM David Wysochanski <dwysocha@redhat.com> wrote:
>
> On Mon, Aug 10, 2020 at 6:09 AM David Howells <dhowells@redhat.com> wrote:
> >
> > David Wysochanski <dwysocha@redhat.com> wrote:
> >
> > > Looks like fscache_shape_request() overrides any 'max_pages' value
> > > (actually it is cachefiles_shape_request()), so it's unclear why the
> > > netfs would pass in a 'max_pages' if it is not honored - seems like a
> > > bug, or at least non-obvious behavior.
> >
> > I think the problem is that cachefiles_shape_request() is applying the limit
> > too early.  It's using it to cut down the number of pages in the original
> > request (only applicable to readpages), but then the shaping to fit cache
> > granules can exceed that, so it needs to be applied later also.
> >
> > Does the attached patch help?
> >
> > David
> > ---
> > diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c
> > index 2bfba2e41c39..ce05cf1d9a6e 100644
> > --- a/fs/cachefiles/content-map.c
> > +++ b/fs/cachefiles/content-map.c
> > @@ -134,7 +134,8 @@ void cachefiles_shape_request(struct fscache_object *obj,
> >         _enter("{%lx,%lx,%x},%llx,%d",
> >                start, end, max_pages, i_size, shape->for_write);
> >
> > -       if (start >= CACHEFILES_SIZE_LIMIT / PAGE_SIZE) {
> > +       if (start >= CACHEFILES_SIZE_LIMIT / PAGE_SIZE ||
> > +           max_pages < CACHEFILES_GRAN_PAGES) {
> >                 shape->to_be_done = FSCACHE_READ_FROM_SERVER;
> >                 return;
> >         }
> > @@ -144,10 +145,6 @@ void cachefiles_shape_request(struct fscache_object *obj,
> >         if (shape->i_size > CACHEFILES_SIZE_LIMIT)
> >                 i_size = CACHEFILES_SIZE_LIMIT;
> >
> > -       max_pages = round_down(max_pages, CACHEFILES_GRAN_PAGES);
> > -       if (end - start > max_pages)
> > -               end = start + max_pages;
> > -
> >         granule = start / CACHEFILES_GRAN_PAGES;
> >         if (granule / 8 >= object->content_map_size) {
> >                 cachefiles_expand_content_map(object, i_size);
> > @@ -185,6 +182,10 @@ void cachefiles_shape_request(struct fscache_object *obj,
> >                 start = round_down(start, CACHEFILES_GRAN_PAGES);
> >                 end   = round_up(end, CACHEFILES_GRAN_PAGES);
> >
> > +               /* Trim to the maximum size the netfs supports */
> > +               if (end - start > max_pages)
> > +                       end = round_down(start + max_pages, CACHEFILES_GRAN_PAGES);
> > +
> >                 /* But trim to the end of the file and the starting page */
> >                 eof = (i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
> >                 if (eof <= shape->proposed_start)
> >
>
> I tried this and got the same panic - I think i_size is the culprit
> (it is larger than max_pages).  I'll send you a larger trace offline
> with cachefiles/fscache debugging enabled if that helps, but below is
> some custom tracing that may be enough, since it shows the before/after
> shaping values.
>

FWIW, after testing the aforementioned patch and tracing it, the
culprit is not i_size after all.  I added the small patch below on top
of the cachefiles_shape_request() patch and saw no more panics.  This
may not address the full underlying issue, but it at least gets past
this point, and max_pages now seems to be honored.

---
diff --git a/fs/fscache/read_helper.c b/fs/fscache/read_helper.c
index a464c3e3188a..fa67339e7304 100644
--- a/fs/fscache/read_helper.c
+++ b/fs/fscache/read_helper.c
@@ -318,8 +318,8 @@ static int fscache_read_helper(struct fscache_io_request *req,
        switch (type) {
        case FSCACHE_READ_PAGE_LIST:
                shape.proposed_start = lru_to_page(pages)->index;
-               shape.proposed_nr_pages =
-                       lru_to_last_page(pages)->index - shape.proposed_start + 1;
+               shape.proposed_nr_pages = min_t(unsigned int, max_pages,
+                       lru_to_last_page(pages)->index - shape.proposed_start + 1);
                break;

        case FSCACHE_READ_LOCKED_PAGE:



end of thread, other threads:[~2020-08-10 18:56 UTC | newest]

Thread overview: 19+ messages
-- links below jump to the message on this page --
2020-07-31 13:04 [RFC PATCH v2 00/11] ceph: convert to new FSCache API Jeff Layton
2020-07-31 13:04 ` [RFC PATCH v2 01/11] ceph: break out writeback of incompatible snap context to separate function Jeff Layton
2020-07-31 13:04 ` [RFC PATCH v2 02/11] ceph: don't call ceph_update_writeable_page from page_mkwrite Jeff Layton
2020-07-31 13:04 ` [RFC PATCH v2 03/11] ceph: fold ceph_sync_readpages into ceph_readpage Jeff Layton
2020-07-31 13:04 ` [RFC PATCH v2 04/11] ceph: fold ceph_sync_writepages into writepage_nounlock Jeff Layton
2020-07-31 13:04 ` [RFC PATCH v2 05/11] ceph: fold ceph_update_writeable_page into ceph_write_begin Jeff Layton
2020-07-31 13:04 ` [RFC PATCH v2 06/11] ceph: conversion to new fscache API Jeff Layton
2020-07-31 13:04 ` [RFC PATCH v2 07/11] ceph: convert readpage to fscache read helper Jeff Layton
2020-07-31 13:04 ` [RFC PATCH v2 08/11] ceph: plug write_begin into " Jeff Layton
2020-07-31 13:04 ` [RFC PATCH v2 09/11] ceph: convert readpages to fscache_read_helper Jeff Layton
2020-08-09 15:09   ` [Linux-cachefs] " David Wysochanski
2020-08-10 11:09     ` Jeff Layton
2020-08-10 12:24       ` David Wysochanski
2020-08-09 18:06   ` David Wysochanski
2020-08-10 10:09   ` David Howells
2020-08-10 13:50     ` David Wysochanski
2020-08-10 18:55       ` David Wysochanski
2020-07-31 13:04 ` [RFC PATCH v2 10/11] ceph: add fscache writeback support Jeff Layton
2020-07-31 13:04 ` [RFC PATCH v2 11/11] ceph: re-enable fscache support Jeff Layton
