linux-nfs.vger.kernel.org archive mirror
* [PATCH v11 0/5] Convert NFS with fscache to the netfs API
@ 2023-02-20 13:43 Dave Wysochanski
  2023-02-20 13:43 ` [PATCH v11 1/5] NFS: Rename readpage_async_filler to nfs_read_add_folio Dave Wysochanski
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Dave Wysochanski @ 2023-02-20 13:43 UTC (permalink / raw)
  To: Anna Schumaker, Trond Myklebust, David Howells, Benjamin Maynard,
	Daire Byrne
  Cc: linux-nfs, linux-cachefs

Trond, this v11 patchset addresses your latest feedback on patch #2,
and I did a little bit of cleanup to patch 3 (see Changes notes below).
I'm not sure what further changes to make to patch #3 without a more
in-depth review with specifics, if you feel the current approach is
unacceptable [1].

Ben and Daire, if you could test this set and provide your feedback
and a Tested-by: that would be appreciated.  This set addresses
the existing NFS + fscache performance concerns seen by a few
users [5], which are due to the use of the deprecated
single-page function, fscache_fallback_read_page().  However,
until "known issue #1" below is also resolved, even with these
patches, performance of NFS+fscache will still be a problem
in some scenarios.

This patchset converts NFS with fscache buffered read IO paths to
use the netfs API with a non-invasive approach.  The existing NFS pgio
layer does not need extensive changes, and is the best way so far I've
found to address Trond's previous concerns about modifying the IO
path [2] as well as only enabling netfs when fscache is configured
and enabled [3].  I have not attempted performance comparisons to
address Chuck Lever's concern [4] because we are not converting the
non-fscache enabled NFS IO paths to netfs.
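
As a concrete illustration of the "only when fscache is enabled" point,
the netfs read entry points added in patch 3 bail out with -ENOBUFS
whenever no fscache cookie is attached to the inode, and NFS then falls
back to its existing pgio read path (excerpted from patch 3):

    int nfs_netfs_read_folio(struct file *file, struct folio *folio)
    {
            if (!netfs_inode(folio_inode(folio))->cache)
                    return -ENOBUFS;

            return netfs_read_folio(file, folio);
    }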

The patchset is based on Trond's latest 'testing' branch, which includes
his folio patchset and is itself based on 6.2-rc5.  It has been pushed to
github at:
https://github.com/DaveWysochanskiRH/kernel/commits/nfs-fscache-netfs
https://github.com/DaveWysochanskiRH/kernel/commit/6424e4f139652b7552eff26eb5da1f2282d35616

Changes since v10 [6]
=====================
PATCH6: Dropped
PATCH1: Rename nfs_pageio_add_page to nfs_read_add_folio
PATCH2: Use anonymous union to add struct netfs_inode to nfs_inode (Trond) [7]
PATCH3: Change nfs_netfs_readpage_release() to nfs_netfs_folio_unlock()

Testing
=======
I did a full round of testing on this because it was rebased on top of
Trond's testing branch which included his folio series.
All of my unit tests pass except the one with the known issue #1 below.
Multiple runs of xfstests generic tests (applicable to NFS) were run
against various servers, both with and without fscache enabled, and
compared to baseline (Trond's testing branch).  No new failures were
observed with these patches, and in some xfstest instances, this
patchset improves the results (some tests that were failing now pass).
- hammerspace(pNFS flexfiles): NFS4.1, NFS4.2
- NetApp(pNFS filelayout): NFS4.1, NFS4.0, NFS3
- RHEL9: NFS4.2, NFS4.1, NFS4.0, NFS3

Known issues
============
1. A unit test that sets rsize < readahead does not properly read from
fscache but instead re-reads data from the NFS server
* This will be fixed with another dhowells patch [8]:
  "[PATCH v6 2/2] mm, netfs, fscache: Stop read optimisation when folio removed from pagecache"
* Daire Byrne verified the patch fixes his issue as well

2. "Cache volume key already in use" after xfstest runs involving multiple mounts
* Simple reproducer requires just two mounts as follows:
 mount -overs=4.1,fsc,nosharecache -o context=system_u:object_r:root_t:s0  nfs-server:/exp1 /mnt1
 mount -overs=4.1,fsc,nosharecache -o context=system_u:object_r:root_t:s0  nfs-server:/exp2 /mnt2
* This should be fixed with dhowells patch [9]:
  "[PATCH v5] vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing"


References
==========
[1] https://lore.kernel.org/linux-nfs/0676ecb2bb708e6fc29dbbe6b44551d6a0d021dc.camel@kernel.org/
[2] https://lore.kernel.org/linux-nfs/9cfd5bc3cfc6abc2d3316b0387222e708d67f595.camel@hammerspace.com/
[3] https://lore.kernel.org/linux-nfs/da9200f1bded9b8b078a7aef227fd6b92eb028fb.camel@hammerspace.com/
[4] https://lore.kernel.org/linux-nfs/0A640C47-5F51-47E8-864D-E0E980F8B310@oracle.com/
[5] https://lore.kernel.org/linux-nfs/CA+QRt4tPqH87NVkoETLjxieGjZ_f7XxRj+xS3NVxcJ+b8AAKQg@mail.gmail.com/
[6] https://lore.kernel.org/linux-nfs/20221103161637.1725471-1-dwysocha@redhat.com/
[7] https://lore.kernel.org/linux-nfs/4d60636f62df4f5c200666ed2d1a5f2414c18e1f.camel@kernel.org/
[8] https://lore.kernel.org/linux-nfs/20230216150701.3654894-1-dhowells@redhat.com/T/#mf3807fa68fb6d495b87dde0d76b5237833a0cc81
[9] https://lore.kernel.org/linux-kernel/217595.1662033775@warthog.procyon.org.uk/

Dave Wysochanski (5):
  NFS: Rename readpage_async_filler to nfs_read_add_folio
  NFS: Configure support for netfs when NFS fscache is configured
  NFS: Convert buffered read paths to use netfs when fscache is enabled
  NFS: Remove all NFSIOS_FSCACHE counters due to conversion to netfs API
  NFS: Remove fscache specific trace points and NFS_INO_FSCACHE bit

 fs/nfs/Kconfig             |   1 +
 fs/nfs/fscache.c           | 242 ++++++++++++++++++++++---------------
 fs/nfs/fscache.h           | 131 ++++++++++++++------
 fs/nfs/inode.c             |   2 +
 fs/nfs/internal.h          |   9 ++
 fs/nfs/iostat.h            |  17 ---
 fs/nfs/nfstrace.h          |  91 --------------
 fs/nfs/pagelist.c          |   4 +
 fs/nfs/read.c              | 105 ++++++++--------
 fs/nfs/super.c             |  11 --
 include/linux/nfs_fs.h     |  25 ++--
 include/linux/nfs_iostat.h |  12 --
 include/linux/nfs_page.h   |   3 +
 include/linux/nfs_xdr.h    |   3 +
 14 files changed, 317 insertions(+), 339 deletions(-)

-- 
2.31.1



* [PATCH v11 1/5] NFS: Rename readpage_async_filler to nfs_read_add_folio
  2023-02-20 13:43 [PATCH v11 0/5] Convert NFS with fscache to the netfs API Dave Wysochanski
@ 2023-02-20 13:43 ` Dave Wysochanski
  2023-02-20 13:43 ` [PATCH v11 2/5] NFS: Configure support for netfs when NFS fscache is configured Dave Wysochanski
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Dave Wysochanski @ 2023-02-20 13:43 UTC (permalink / raw)
  To: Anna Schumaker, Trond Myklebust, David Howells, Benjamin Maynard,
	Daire Byrne
  Cc: linux-nfs, linux-cachefs

Rename readpage_async_filler to nfs_read_add_folio to better reflect
what this function does (add a folio to the nfs_pageio_descriptor),
and simplify its arguments by removing struct nfs_readdesc.
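
In short, the signature change (callers now pass the pageio descriptor
and open context directly, rather than bundling them in a descriptor):

    before: static int readpage_async_filler(struct nfs_readdesc *desc,
                                             struct folio *folio);
    after:  static int nfs_read_add_folio(struct nfs_pageio_descriptor *pgio,
                                          struct nfs_open_context *ctx,
                                          struct folio *folio);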

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
---
 fs/nfs/read.c | 54 +++++++++++++++++++++++++--------------------------
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index c380cff4108e..4cb3991c4735 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -126,11 +126,6 @@ static void nfs_readpage_release(struct nfs_page *req, int error)
 	nfs_release_request(req);
 }
 
-struct nfs_readdesc {
-	struct nfs_pageio_descriptor pgio;
-	struct nfs_open_context *ctx;
-};
-
 static void nfs_page_group_set_uptodate(struct nfs_page *req)
 {
 	if (nfs_page_group_sync_on_bit(req, PG_UPTODATE))
@@ -152,7 +147,8 @@ static void nfs_read_completion(struct nfs_pgio_header *hdr)
 
 		if (test_bit(NFS_IOHDR_EOF, &hdr->flags)) {
 			/* note: regions of the page not covered by a
-			 * request are zeroed in readpage_async_filler */
+			 * request are zeroed in nfs_read_add_folio
+			 */
 			if (bytes > hdr->good_bytes) {
 				/* nothing in this request was good, so zero
 				 * the full extent of the request */
@@ -280,7 +276,9 @@ static void nfs_readpage_result(struct rpc_task *task,
 		nfs_readpage_retry(task, hdr);
 }
 
-static int readpage_async_filler(struct nfs_readdesc *desc, struct folio *folio)
+static int nfs_read_add_folio(struct nfs_pageio_descriptor *pgio,
+			      struct nfs_open_context *ctx,
+			      struct folio *folio)
 {
 	struct inode *inode = folio_file_mapping(folio)->host;
 	struct nfs_server *server = NFS_SERVER(inode);
@@ -302,15 +300,15 @@ static int readpage_async_filler(struct nfs_readdesc *desc, struct folio *folio)
 			goto out_unlock;
 	}
 
-	new = nfs_page_create_from_folio(desc->ctx, folio, 0, aligned_len);
+	new = nfs_page_create_from_folio(ctx, folio, 0, aligned_len);
 	if (IS_ERR(new))
 		goto out_error;
 
 	if (len < fsize)
 		folio_zero_segment(folio, len, fsize);
-	if (!nfs_pageio_add_request(&desc->pgio, new)) {
+	if (!nfs_pageio_add_request(pgio, new)) {
 		nfs_list_remove_request(new);
-		error = desc->pgio.pg_error;
+		error = pgio->pg_error;
 		nfs_readpage_release(new, error);
 		goto out;
 	}
@@ -331,8 +329,9 @@ static int readpage_async_filler(struct nfs_readdesc *desc, struct folio *folio)
  */
 int nfs_read_folio(struct file *file, struct folio *folio)
 {
-	struct nfs_readdesc desc;
 	struct inode *inode = file_inode(file);
+	struct nfs_pageio_descriptor pgio;
+	struct nfs_open_context *ctx;
 	int ret;
 
 	trace_nfs_aop_readpage(inode, folio);
@@ -355,25 +354,25 @@ int nfs_read_folio(struct file *file, struct folio *folio)
 	if (NFS_STALE(inode))
 		goto out_unlock;
 
-	desc.ctx = get_nfs_open_context(nfs_file_open_context(file));
+	ctx = get_nfs_open_context(nfs_file_open_context(file));
 
-	xchg(&desc.ctx->error, 0);
-	nfs_pageio_init_read(&desc.pgio, inode, false,
+	xchg(&ctx->error, 0);
+	nfs_pageio_init_read(&pgio, inode, false,
 			     &nfs_async_read_completion_ops);
 
-	ret = readpage_async_filler(&desc, folio);
+	ret = nfs_read_add_folio(&pgio, ctx, folio);
 	if (ret)
 		goto out;
 
-	nfs_pageio_complete_read(&desc.pgio);
-	ret = desc.pgio.pg_error < 0 ? desc.pgio.pg_error : 0;
+	nfs_pageio_complete_read(&pgio);
+	ret = pgio.pg_error < 0 ? pgio.pg_error : 0;
 	if (!ret) {
 		ret = folio_wait_locked_killable(folio);
 		if (!folio_test_uptodate(folio) && !ret)
-			ret = xchg(&desc.ctx->error, 0);
+			ret = xchg(&ctx->error, 0);
 	}
 out:
-	put_nfs_open_context(desc.ctx);
+	put_nfs_open_context(ctx);
 	trace_nfs_aop_readpage_done(inode, folio, ret);
 	return ret;
 out_unlock:
@@ -384,9 +383,10 @@ int nfs_read_folio(struct file *file, struct folio *folio)
 
 void nfs_readahead(struct readahead_control *ractl)
 {
+	struct nfs_pageio_descriptor pgio;
+	struct nfs_open_context *ctx;
 	unsigned int nr_pages = readahead_count(ractl);
 	struct file *file = ractl->file;
-	struct nfs_readdesc desc;
 	struct inode *inode = ractl->mapping->host;
 	struct folio *folio;
 	int ret;
@@ -400,24 +400,24 @@ void nfs_readahead(struct readahead_control *ractl)
 
 	if (file == NULL) {
 		ret = -EBADF;
-		desc.ctx = nfs_find_open_context(inode, NULL, FMODE_READ);
-		if (desc.ctx == NULL)
+		ctx = nfs_find_open_context(inode, NULL, FMODE_READ);
+		if (ctx == NULL)
 			goto out;
 	} else
-		desc.ctx = get_nfs_open_context(nfs_file_open_context(file));
+		ctx = get_nfs_open_context(nfs_file_open_context(file));
 
-	nfs_pageio_init_read(&desc.pgio, inode, false,
+	nfs_pageio_init_read(&pgio, inode, false,
 			     &nfs_async_read_completion_ops);
 
 	while ((folio = readahead_folio(ractl)) != NULL) {
-		ret = readpage_async_filler(&desc, folio);
+		ret = nfs_read_add_folio(&pgio, ctx, folio);
 		if (ret)
 			break;
 	}
 
-	nfs_pageio_complete_read(&desc.pgio);
+	nfs_pageio_complete_read(&pgio);
 
-	put_nfs_open_context(desc.ctx);
+	put_nfs_open_context(ctx);
 out:
 	trace_nfs_aop_readahead_done(inode, nr_pages, ret);
 }
-- 
2.31.1



* [PATCH v11 2/5] NFS: Configure support for netfs when NFS fscache is configured
  2023-02-20 13:43 [PATCH v11 0/5] Convert NFS with fscache to the netfs API Dave Wysochanski
  2023-02-20 13:43 ` [PATCH v11 1/5] NFS: Rename readpage_async_filler to nfs_read_add_folio Dave Wysochanski
@ 2023-02-20 13:43 ` Dave Wysochanski
  2023-02-20 13:43 ` [PATCH v11 3/5] NFS: Convert buffered read paths to use netfs when fscache is enabled Dave Wysochanski
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Dave Wysochanski @ 2023-02-20 13:43 UTC (permalink / raw)
  To: Anna Schumaker, Trond Myklebust, David Howells, Benjamin Maynard,
	Daire Byrne
  Cc: linux-nfs, linux-cachefs

As a first step toward supporting the netfs library when NFS_FSCACHE is
configured, add NETFS_SUPPORT to Kconfig and add the required netfs_inode
into struct nfs_inode.

Using netfs requires that the VFS inode structure be stored inside
struct netfs_inode, along with the fscache_cookie.
Thus, if NFS_FSCACHE is configured, place netfs_inode inside an
anonymous union so the vfs_inode memory is the same and we do
not need to modify other non-fscache areas of NFS.
In addition, inside the NFS fscache code, use the new netfs_inode()
and netfs_i_cookie() helpers, and remove our own helper,
nfs_i_fscache().

Later patches will convert NFS fscache to fully use netfs.
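
To illustrate why the union works, here is a simplified sketch; the
netfs_inode fields shown are abbreviated from linux/netfs.h, and the
real nfs_inode hunk is in the diff below:

    struct netfs_inode {
            struct inode inode;     /* the VFS inode comes first */
            const struct netfs_request_ops *ops;
            struct fscache_cookie *cache;
            /* ... */
    };

    struct nfs_inode {
            /* ... */
            union {
                    struct inode            vfs_inode;
                    struct netfs_inode      netfs;  /* netfs.inode aliases vfs_inode */
            };
    };

Because struct netfs_inode embeds the VFS inode as its first member,
&nfsi->vfs_inode and &nfsi->netfs.inode name the same memory, so the
non-fscache areas of NFS can keep using vfs_inode unchanged.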

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
---
 fs/nfs/Kconfig         |  1 +
 fs/nfs/fscache.c       | 20 +++++++++-----------
 fs/nfs/fscache.h       | 15 ++++++---------
 include/linux/nfs_fs.h | 24 ++++++++++--------------
 4 files changed, 26 insertions(+), 34 deletions(-)

diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
index 1ead5bd740c2..a7942c77415d 100644
--- a/fs/nfs/Kconfig
+++ b/fs/nfs/Kconfig
@@ -171,6 +171,7 @@ config ROOT_NFS
 config NFS_FSCACHE
 	bool "Provide NFS client caching support"
 	depends on NFS_FS=m && FSCACHE || NFS_FS=y && FSCACHE=y
+	select NETFS_SUPPORT
 	help
 	  Say Y here if you want NFS data to be cached locally on disc through
 	  the general filesystem cache manager
diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index e731c00a9fcb..313c1519510c 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -163,13 +163,14 @@ void nfs_fscache_init_inode(struct inode *inode)
 	struct nfs_server *nfss = NFS_SERVER(inode);
 	struct nfs_inode *nfsi = NFS_I(inode);
 
-	nfsi->fscache = NULL;
+	netfs_inode(inode)->cache = NULL;
 	if (!(nfss->fscache && S_ISREG(inode->i_mode)))
 		return;
 
 	nfs_fscache_update_auxdata(&auxdata, inode);
 
-	nfsi->fscache = fscache_acquire_cookie(NFS_SB(inode->i_sb)->fscache,
+	netfs_inode(inode)->cache = fscache_acquire_cookie(
+					       nfss->fscache,
 					       0,
 					       nfsi->fh.data, /* index_key */
 					       nfsi->fh.size,
@@ -183,11 +184,8 @@ void nfs_fscache_init_inode(struct inode *inode)
  */
 void nfs_fscache_clear_inode(struct inode *inode)
 {
-	struct nfs_inode *nfsi = NFS_I(inode);
-	struct fscache_cookie *cookie = nfs_i_fscache(inode);
-
-	fscache_relinquish_cookie(cookie, false);
-	nfsi->fscache = NULL;
+	fscache_relinquish_cookie(netfs_i_cookie(netfs_inode(inode)), false);
+	netfs_inode(inode)->cache = NULL;
 }
 
 /*
@@ -212,7 +210,7 @@ void nfs_fscache_clear_inode(struct inode *inode)
 void nfs_fscache_open_file(struct inode *inode, struct file *filp)
 {
 	struct nfs_fscache_inode_auxdata auxdata;
-	struct fscache_cookie *cookie = nfs_i_fscache(inode);
+	struct fscache_cookie *cookie = netfs_i_cookie(netfs_inode(inode));
 	bool open_for_write = inode_is_open_for_write(inode);
 
 	if (!fscache_cookie_valid(cookie))
@@ -230,7 +228,7 @@ EXPORT_SYMBOL_GPL(nfs_fscache_open_file);
 void nfs_fscache_release_file(struct inode *inode, struct file *filp)
 {
 	struct nfs_fscache_inode_auxdata auxdata;
-	struct fscache_cookie *cookie = nfs_i_fscache(inode);
+	struct fscache_cookie *cookie = netfs_i_cookie(netfs_inode(inode));
 	loff_t i_size = i_size_read(inode);
 
 	nfs_fscache_update_auxdata(&auxdata, inode);
@@ -243,7 +241,7 @@ void nfs_fscache_release_file(struct inode *inode, struct file *filp)
 static int fscache_fallback_read_page(struct inode *inode, struct page *page)
 {
 	struct netfs_cache_resources cres;
-	struct fscache_cookie *cookie = nfs_i_fscache(inode);
+	struct fscache_cookie *cookie = netfs_i_cookie(&NFS_I(inode)->netfs);
 	struct iov_iter iter;
 	struct bio_vec bvec[1];
 	int ret;
@@ -271,7 +269,7 @@ static int fscache_fallback_write_page(struct inode *inode, struct page *page,
 				       bool no_space_allocated_yet)
 {
 	struct netfs_cache_resources cres;
-	struct fscache_cookie *cookie = nfs_i_fscache(inode);
+	struct fscache_cookie *cookie = netfs_i_cookie(&NFS_I(inode)->netfs);
 	struct iov_iter iter;
 	struct bio_vec bvec[1];
 	loff_t start = page_offset(page);
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 2a37af880978..38614ed8f951 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -54,7 +54,7 @@ static inline bool nfs_fscache_release_folio(struct folio *folio, gfp_t gfp)
 		if (current_is_kswapd() || !(gfp & __GFP_FS))
 			return false;
 		folio_wait_fscache(folio);
-		fscache_note_page_release(nfs_i_fscache(folio->mapping->host));
+		fscache_note_page_release(netfs_i_cookie(&NFS_I(folio->mapping->host)->netfs));
 		nfs_inc_fscache_stats(folio->mapping->host,
 				      NFSIOS_FSCACHE_PAGES_UNCACHED);
 	}
@@ -66,7 +66,7 @@ static inline bool nfs_fscache_release_folio(struct folio *folio, gfp_t gfp)
  */
 static inline int nfs_fscache_read_page(struct inode *inode, struct page *page)
 {
-	if (nfs_i_fscache(inode))
+	if (netfs_inode(inode)->cache)
 		return __nfs_fscache_read_page(inode, page);
 	return -ENOBUFS;
 }
@@ -78,7 +78,7 @@ static inline int nfs_fscache_read_page(struct inode *inode, struct page *page)
 static inline void nfs_fscache_write_page(struct inode *inode,
 					   struct page *page)
 {
-	if (nfs_i_fscache(inode))
+	if (netfs_inode(inode)->cache)
 		__nfs_fscache_write_page(inode, page);
 }
 
@@ -101,13 +101,10 @@ static inline void nfs_fscache_update_auxdata(struct nfs_fscache_inode_auxdata *
 static inline void nfs_fscache_invalidate(struct inode *inode, int flags)
 {
 	struct nfs_fscache_inode_auxdata auxdata;
-	struct nfs_inode *nfsi = NFS_I(inode);
+	struct fscache_cookie *cookie =  netfs_i_cookie(&NFS_I(inode)->netfs);
 
-	if (nfsi->fscache) {
-		nfs_fscache_update_auxdata(&auxdata, inode);
-		fscache_invalidate(nfsi->fscache, &auxdata,
-				   i_size_read(inode), flags);
-	}
+	nfs_fscache_update_auxdata(&auxdata, inode);
+	fscache_invalidate(cookie, &auxdata, i_size_read(inode), flags);
 }
 
 /*
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 45c44211e50e..580847c70fec 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -31,6 +31,10 @@
 #include <linux/sunrpc/auth.h>
 #include <linux/sunrpc/clnt.h>
 
+#ifdef CONFIG_NFS_FSCACHE
+#include <linux/netfs.h>
+#endif
+
 #include <linux/nfs.h>
 #include <linux/nfs2.h>
 #include <linux/nfs3.h>
@@ -204,14 +208,15 @@ struct nfs_inode {
 	/* how many bytes have been written/read and how many bytes queued up */
 	__u64 write_io;
 	__u64 read_io;
-#ifdef CONFIG_NFS_FSCACHE
-	struct fscache_cookie	*fscache;
-#endif
-	struct inode		vfs_inode;
-
 #ifdef CONFIG_NFS_V4_2
 	struct nfs4_xattr_cache *xattr_cache;
 #endif
+	union {
+		struct inode		vfs_inode;
+#ifdef CONFIG_NFS_FSCACHE
+		struct netfs_inode	netfs; /* netfs context and VFS inode */
+#endif
+	};
 };
 
 struct nfs4_copy_state {
@@ -329,15 +334,6 @@ static inline int NFS_STALE(const struct inode *inode)
 	return test_bit(NFS_INO_STALE, &NFS_I(inode)->flags);
 }
 
-static inline struct fscache_cookie *nfs_i_fscache(struct inode *inode)
-{
-#ifdef CONFIG_NFS_FSCACHE
-	return NFS_I(inode)->fscache;
-#else
-	return NULL;
-#endif
-}
-
 static inline __u64 NFS_FILEID(const struct inode *inode)
 {
 	return NFS_I(inode)->fileid;
-- 
2.31.1



* [PATCH v11 3/5] NFS: Convert buffered read paths to use netfs when fscache is enabled
  2023-02-20 13:43 [PATCH v11 0/5] Convert NFS with fscache to the netfs API Dave Wysochanski
  2023-02-20 13:43 ` [PATCH v11 1/5] NFS: Rename readpage_async_filler to nfs_read_add_folio Dave Wysochanski
  2023-02-20 13:43 ` [PATCH v11 2/5] NFS: Configure support for netfs when NFS fscache is configured Dave Wysochanski
@ 2023-02-20 13:43 ` Dave Wysochanski
  2023-02-20 13:43 ` [PATCH v11 4/5] NFS: Remove all NFSIOS_FSCACHE counters due to conversion to netfs API Dave Wysochanski
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Dave Wysochanski @ 2023-02-20 13:43 UTC (permalink / raw)
  To: Anna Schumaker, Trond Myklebust, David Howells, Benjamin Maynard,
	Daire Byrne
  Cc: linux-nfs, linux-cachefs

Convert the NFS buffered read code paths to corresponding netfs APIs,
but only when fscache is configured and enabled.

The netfs API defines struct netfs_request_ops which must be filled
in by the network filesystem.  For NFS, we only need to define 5 of
the functions, the main one being the issue_read() function.
The issue_read() function is called by the netfs layer when a read
cannot be fulfilled locally, and must be sent to the server (either
the cache is not active, or it is active but the data is not available).
Once the read from the server is complete, netfs requires a call to
netfs_subreq_terminated() which conveys either how many bytes were read
successfully, or an error.  Note that issue_read() is called with a
structure, netfs_io_subrequest, which defines the IO requested, and
contains a start and a length (both in bytes), and assumes the underlying
netfs will return either an error on the whole region, or the number
of bytes successfully read.

The NFS IO path is page based and the main APIs are the pgio APIs defined
in pagelist.c.  For the pgio APIs, there is no way for the caller to
know how many RPCs will be sent and how the pages will be broken up
into underlying RPCs, each of which will have their own completion and
return code.  In contrast, netfs is subrequest based: a single
subrequest may contain multiple pages, and a single subrequest is
initiated with issue_read() and terminated with netfs_subreq_terminated().
Thus, to utilize the netfs APIs, NFS needs some way to accommodate
the netfs API requirement on the single response to the whole
subrequest, while also minimizing disruptive changes to the NFS
pgio layer.

The approach taken with this patch is to allocate a small structure
for each nfs_netfs_issue_read() call, store the final error and number
of bytes successfully transferred in the structure, and update these values
as each RPC completes.  The refcount on the structure is used as a marker
for the last RPC completion: it is incremented in nfs_netfs_initiate_read()
and decremented inside nfs_netfs_read_completion(), when an nfs_pgio_header
contains a valid pointer to the data.  On the final put (which signals
the final outstanding RPC is complete) in nfs_netfs_read_completion(),
call netfs_subreq_terminated() with either the final error value (if
one or more READs complete with an error) or the number of bytes
successfully transferred (if all RPCs complete successfully).  Note
that when all RPCs complete successfully, the number of bytes transferred
is capped to the length of the subrequest.  Capping the transferred length
to the subrequest length prevents "Subreq overread" warnings from netfs.
This is due to the "aligned_len" in nfs_read_add_folio(), and the
corner case where NFS requests a full page at the end of the file,
even when i_size reflects only a partial page (NFS overread).
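
The refcount scheme can be modeled outside the kernel; the following is
a minimal C11 sketch of the pattern only (hypothetical names, not the
kernel code, which appears in full in the diff below):

    #include <stdatomic.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct io_data {
            atomic_int  refcount;    /* 1 at alloc, +1 per RPC issued */
            atomic_long transferred; /* bytes added by each completion */
            int         error;       /* sticky: any RPC error wins */
            long        subreq_len;  /* cap to avoid "Subreq overread" */
    };

    /* Called once per RPC completion, and once by the issuing thread
     * when it has finished sending RPCs.  Only the final put reports
     * the result, exactly once. */
    static void io_put(struct io_data *d)
    {
            if (atomic_fetch_sub(&d->refcount, 1) != 1)
                    return;
            long n = atomic_load(&d->transferred);
            if (n > d->subreq_len)
                    n = d->subreq_len;
            printf("terminated: %ld\n", d->error ? (long)d->error : n);
            free(d);
    }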

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
---
 fs/nfs/fscache.c         | 226 +++++++++++++++++++++++----------------
 fs/nfs/fscache.h         | 122 +++++++++++++++------
 fs/nfs/inode.c           |   2 +
 fs/nfs/internal.h        |   9 ++
 fs/nfs/pagelist.c        |   4 +
 fs/nfs/read.c            |  61 +++++------
 include/linux/nfs_page.h |   3 +
 include/linux/nfs_xdr.h  |   3 +
 8 files changed, 274 insertions(+), 156 deletions(-)

diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index 313c1519510c..95c2b3056e2b 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -15,6 +15,9 @@
 #include <linux/seq_file.h>
 #include <linux/slab.h>
 #include <linux/iversion.h>
+#include <linux/xarray.h>
+#include <linux/fscache.h>
+#include <linux/netfs.h>
 
 #include "internal.h"
 #include "iostat.h"
@@ -235,112 +238,153 @@ void nfs_fscache_release_file(struct inode *inode, struct file *filp)
 	fscache_unuse_cookie(cookie, &auxdata, &i_size);
 }
 
-/*
- * Fallback page reading interface.
- */
-static int fscache_fallback_read_page(struct inode *inode, struct page *page)
+int nfs_netfs_read_folio(struct file *file, struct folio *folio)
 {
-	struct netfs_cache_resources cres;
-	struct fscache_cookie *cookie = netfs_i_cookie(&NFS_I(inode)->netfs);
-	struct iov_iter iter;
-	struct bio_vec bvec[1];
-	int ret;
-
-	memset(&cres, 0, sizeof(cres));
-	bvec[0].bv_page		= page;
-	bvec[0].bv_offset	= 0;
-	bvec[0].bv_len		= PAGE_SIZE;
-	iov_iter_bvec(&iter, ITER_DEST, bvec, ARRAY_SIZE(bvec), PAGE_SIZE);
-
-	ret = fscache_begin_read_operation(&cres, cookie);
-	if (ret < 0)
-		return ret;
-
-	ret = fscache_read(&cres, page_offset(page), &iter, NETFS_READ_HOLE_FAIL,
-			   NULL, NULL);
-	fscache_end_operation(&cres);
-	return ret;
+	if (!netfs_inode(folio_inode(folio))->cache)
+		return -ENOBUFS;
+
+	return netfs_read_folio(file, folio);
 }
 
-/*
- * Fallback page writing interface.
- */
-static int fscache_fallback_write_page(struct inode *inode, struct page *page,
-				       bool no_space_allocated_yet)
+int nfs_netfs_readahead(struct readahead_control *ractl)
 {
-	struct netfs_cache_resources cres;
-	struct fscache_cookie *cookie = netfs_i_cookie(&NFS_I(inode)->netfs);
-	struct iov_iter iter;
-	struct bio_vec bvec[1];
-	loff_t start = page_offset(page);
-	size_t len = PAGE_SIZE;
-	int ret;
-
-	memset(&cres, 0, sizeof(cres));
-	bvec[0].bv_page		= page;
-	bvec[0].bv_offset	= 0;
-	bvec[0].bv_len		= PAGE_SIZE;
-	iov_iter_bvec(&iter, ITER_SOURCE, bvec, ARRAY_SIZE(bvec), PAGE_SIZE);
-
-	ret = fscache_begin_write_operation(&cres, cookie);
-	if (ret < 0)
-		return ret;
-
-	ret = cres.ops->prepare_write(&cres, &start, &len, i_size_read(inode),
-				      no_space_allocated_yet);
-	if (ret == 0)
-		ret = fscache_write(&cres, page_offset(page), &iter, NULL, NULL);
-	fscache_end_operation(&cres);
-	return ret;
+	struct inode *inode = ractl->mapping->host;
+
+	if (!netfs_inode(inode)->cache)
+		return -ENOBUFS;
+
+	netfs_readahead(ractl);
+	return 0;
 }
 
-/*
- * Retrieve a page from fscache
- */
-int __nfs_fscache_read_page(struct inode *inode, struct page *page)
+atomic_t nfs_netfs_debug_id;
+static int nfs_netfs_init_request(struct netfs_io_request *rreq, struct file *file)
 {
-	int ret;
+	rreq->netfs_priv = get_nfs_open_context(nfs_file_open_context(file));
+	rreq->debug_id = atomic_inc_return(&nfs_netfs_debug_id);
 
-	trace_nfs_fscache_read_page(inode, page);
-	if (PageChecked(page)) {
-		ClearPageChecked(page);
-		ret = 1;
-		goto out;
-	}
+	return 0;
+}
 
-	ret = fscache_fallback_read_page(inode, page);
-	if (ret < 0) {
-		nfs_inc_fscache_stats(inode, NFSIOS_FSCACHE_PAGES_READ_FAIL);
-		SetPageChecked(page);
-		goto out;
-	}
+static void nfs_netfs_free_request(struct netfs_io_request *rreq)
+{
+	put_nfs_open_context(rreq->netfs_priv);
+}
 
-	/* Read completed synchronously */
-	nfs_inc_fscache_stats(inode, NFSIOS_FSCACHE_PAGES_READ_OK);
-	SetPageUptodate(page);
-	ret = 0;
-out:
-	trace_nfs_fscache_read_page_exit(inode, page, ret);
-	return ret;
+static inline int nfs_netfs_begin_cache_operation(struct netfs_io_request *rreq)
+{
+	return fscache_begin_read_operation(&rreq->cache_resources,
+					    netfs_i_cookie(netfs_inode(rreq->inode)));
 }
 
-/*
- * Store a newly fetched page in fscache.  We can be certain there's no page
- * stored in the cache as yet otherwise we would've read it from there.
- */
-void __nfs_fscache_write_page(struct inode *inode, struct page *page)
+static struct nfs_netfs_io_data *nfs_netfs_alloc(struct netfs_io_subrequest *sreq)
 {
-	int ret;
+	struct nfs_netfs_io_data *netfs;
+
+	netfs = kzalloc(sizeof(*netfs), GFP_KERNEL_ACCOUNT);
+	if (!netfs)
+		return NULL;
+	netfs->sreq = sreq;
+	refcount_set(&netfs->refcount, 1);
+	return netfs;
+}
 
-	trace_nfs_fscache_write_page(inode, page);
+static bool nfs_netfs_clamp_length(struct netfs_io_subrequest *sreq)
+{
+	size_t	rsize = NFS_SB(sreq->rreq->inode->i_sb)->rsize;
 
-	ret = fscache_fallback_write_page(inode, page, true);
+	sreq->len = min(sreq->len, rsize);
+	return true;
+}
 
-	if (ret != 0) {
-		nfs_inc_fscache_stats(inode, NFSIOS_FSCACHE_PAGES_WRITTEN_FAIL);
-		nfs_inc_fscache_stats(inode, NFSIOS_FSCACHE_PAGES_UNCACHED);
-	} else {
-		nfs_inc_fscache_stats(inode, NFSIOS_FSCACHE_PAGES_WRITTEN_OK);
+static void nfs_netfs_issue_read(struct netfs_io_subrequest *sreq)
+{
+	struct nfs_netfs_io_data	*netfs;
+	struct nfs_pageio_descriptor	pgio;
+	struct inode *inode = sreq->rreq->inode;
+	struct nfs_open_context *ctx = sreq->rreq->netfs_priv;
+	struct page *page;
+	int err;
+	pgoff_t start = (sreq->start + sreq->transferred) >> PAGE_SHIFT;
+	pgoff_t last = ((sreq->start + sreq->len -
+			 sreq->transferred - 1) >> PAGE_SHIFT);
+	XA_STATE(xas, &sreq->rreq->mapping->i_pages, start);
+
+	nfs_pageio_init_read(&pgio, inode, false,
+			     &nfs_async_read_completion_ops);
+
+	netfs = nfs_netfs_alloc(sreq);
+	if (!netfs)
+		return netfs_subreq_terminated(sreq, -ENOMEM, false);
+
+	pgio.pg_netfs = netfs; /* used in completion */
+
+	xas_lock(&xas);
+	xas_for_each(&xas, page, last) {
+		/* nfs_read_add_folio() may schedule() due to pNFS layout and other RPCs  */
+		xas_pause(&xas);
+		xas_unlock(&xas);
+		err = nfs_read_add_folio(&pgio, ctx, page_folio(page));
+		if (err < 0) {
+			netfs->error = err;
+			goto out;
+		}
+		xas_lock(&xas);
 	}
-	trace_nfs_fscache_write_page_exit(inode, page, ret);
+	xas_unlock(&xas);
+out:
+	nfs_pageio_complete_read(&pgio);
+	nfs_netfs_put(netfs);
 }
+
+void nfs_netfs_initiate_read(struct nfs_pgio_header *hdr)
+{
+	struct nfs_netfs_io_data        *netfs = hdr->netfs;
+
+	if (!netfs)
+		return;
+
+	nfs_netfs_get(netfs);
+}
+
+int nfs_netfs_folio_unlock(struct folio *folio)
+{
+	struct inode *inode = folio_file_mapping(folio)->host;
+
+	/*
+	 * If fscache is enabled, netfs will unlock pages.
+	 */
+	if (netfs_inode(inode)->cache)
+		return 0;
+
+	return 1;
+}
+
+void nfs_netfs_read_completion(struct nfs_pgio_header *hdr)
+{
+	struct nfs_netfs_io_data        *netfs = hdr->netfs;
+	struct netfs_io_subrequest      *sreq;
+
+	if (!netfs)
+		return;
+
+	sreq = netfs->sreq;
+	if (test_bit(NFS_IOHDR_EOF, &hdr->flags))
+		__set_bit(NETFS_SREQ_CLEAR_TAIL, &sreq->flags);
+
+	if (hdr->error)
+		netfs->error = hdr->error;
+	else
+		atomic64_add(hdr->res.count, &netfs->transferred);
+
+	nfs_netfs_put(netfs);
+	hdr->netfs = NULL;
+}
+
+const struct netfs_request_ops nfs_netfs_ops = {
+	.init_request		= nfs_netfs_init_request,
+	.free_request		= nfs_netfs_free_request,
+	.begin_cache_operation	= nfs_netfs_begin_cache_operation,
+	.issue_read		= nfs_netfs_issue_read,
+	.clamp_length		= nfs_netfs_clamp_length
+};
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 38614ed8f951..e1706e736c64 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -34,6 +34,58 @@ struct nfs_fscache_inode_auxdata {
 	u64	change_attr;
 };
 
+struct nfs_netfs_io_data {
+	/*
+	 * NFS may split a netfs_io_subrequest into multiple RPCs, each
+	 * with their own read completion.  In netfs, we can only call
+	 * netfs_subreq_terminated() once for each subrequest.  Use the
+	 * refcount here to double as a marker of the last RPC completion,
+	 * and only call netfs via netfs_subreq_terminated() once.
+	 */
+	refcount_t			refcount;
+	struct netfs_io_subrequest	*sreq;
+
+	/*
+	 * Final disposition of the netfs_io_subrequest, sent in
+	 * netfs_subreq_terminated()
+	 */
+	atomic64_t	transferred;
+	int		error;
+};
+
+static inline void nfs_netfs_get(struct nfs_netfs_io_data *netfs)
+{
+	refcount_inc(&netfs->refcount);
+}
+
+static inline void nfs_netfs_put(struct nfs_netfs_io_data *netfs)
+{
+	ssize_t final_len;
+
+	/* Only the last RPC completion should call netfs_subreq_terminated() */
+	if (!refcount_dec_and_test(&netfs->refcount))
+		return;
+
+	/*
+	 * The NFS pageio interface may read a complete page, even when netfs
+	 * only asked for a partial page.  Specifically, this may be seen when
+	 * one thread is truncating a file while another one is reading the last
+	 * page of the file.
+	 * Correct the final length here to be no larger than the netfs subrequest
+	 * length, and thus avoid netfs's "Subreq overread" warning message.
+	 */
+	final_len = min_t(s64, netfs->sreq->len, atomic64_read(&netfs->transferred));
+	netfs_subreq_terminated(netfs->sreq, netfs->error ?: final_len, false);
+	kfree(netfs);
+}
+static inline void nfs_netfs_inode_init(struct nfs_inode *nfsi)
+{
+	netfs_inode_init(&nfsi->netfs, &nfs_netfs_ops);
+}
+extern void nfs_netfs_initiate_read(struct nfs_pgio_header *hdr);
+extern void nfs_netfs_read_completion(struct nfs_pgio_header *hdr);
+extern int nfs_netfs_folio_unlock(struct folio *folio);
+
 /*
  * fscache.c
  */
@@ -44,9 +96,8 @@ extern void nfs_fscache_init_inode(struct inode *);
 extern void nfs_fscache_clear_inode(struct inode *);
 extern void nfs_fscache_open_file(struct inode *, struct file *);
 extern void nfs_fscache_release_file(struct inode *, struct file *);
-
-extern int __nfs_fscache_read_page(struct inode *, struct page *);
-extern void __nfs_fscache_write_page(struct inode *, struct page *);
+extern int nfs_netfs_readahead(struct readahead_control *ractl);
+extern int nfs_netfs_read_folio(struct file *file, struct folio *folio);
 
 static inline bool nfs_fscache_release_folio(struct folio *folio, gfp_t gfp)
 {
@@ -54,34 +105,11 @@ static inline bool nfs_fscache_release_folio(struct folio *folio, gfp_t gfp)
 		if (current_is_kswapd() || !(gfp & __GFP_FS))
 			return false;
 		folio_wait_fscache(folio);
-		fscache_note_page_release(netfs_i_cookie(&NFS_I(folio->mapping->host)->netfs));
-		nfs_inc_fscache_stats(folio->mapping->host,
-				      NFSIOS_FSCACHE_PAGES_UNCACHED);
 	}
+	fscache_note_page_release(netfs_i_cookie(netfs_inode(folio->mapping->host)));
 	return true;
 }
 
-/*
- * Retrieve a page from an inode data storage object.
- */
-static inline int nfs_fscache_read_page(struct inode *inode, struct page *page)
-{
-	if (netfs_inode(inode)->cache)
-		return __nfs_fscache_read_page(inode, page);
-	return -ENOBUFS;
-}
-
-/*
- * Store a page newly fetched from the server in an inode data storage object
- * in the cache.
- */
-static inline void nfs_fscache_write_page(struct inode *inode,
-					   struct page *page)
-{
-	if (netfs_inode(inode)->cache)
-		__nfs_fscache_write_page(inode, page);
-}
-
 static inline void nfs_fscache_update_auxdata(struct nfs_fscache_inode_auxdata *auxdata,
 					      struct inode *inode)
 {
@@ -117,7 +145,28 @@ static inline const char *nfs_server_fscache_state(struct nfs_server *server)
 	return "no ";
 }
 
+static inline void nfs_netfs_set_pgio_header(struct nfs_pgio_header *hdr,
+					     struct nfs_pageio_descriptor *desc)
+{
+	hdr->netfs = desc->pg_netfs;
+}
+static inline void nfs_netfs_set_pageio_descriptor(struct nfs_pageio_descriptor *desc,
+						   struct nfs_pgio_header *hdr)
+{
+	desc->pg_netfs = hdr->netfs;
+}
+static inline void nfs_netfs_reset_pageio_descriptor(struct nfs_pageio_descriptor *desc)
+{
+	desc->pg_netfs = NULL;
+}
 #else /* CONFIG_NFS_FSCACHE */
+static inline void nfs_netfs_inode_init(struct nfs_inode *nfsi) {}
+static inline void nfs_netfs_initiate_read(struct nfs_pgio_header *hdr) {}
+static inline void nfs_netfs_read_completion(struct nfs_pgio_header *hdr) {}
+static inline int nfs_netfs_folio_unlock(struct folio *folio)
+{
+	return 1;
+}
 static inline void nfs_fscache_release_super_cookie(struct super_block *sb) {}
 
 static inline void nfs_fscache_init_inode(struct inode *inode) {}
@@ -125,22 +174,29 @@ static inline void nfs_fscache_clear_inode(struct inode *inode) {}
 static inline void nfs_fscache_open_file(struct inode *inode,
 					 struct file *filp) {}
 static inline void nfs_fscache_release_file(struct inode *inode, struct file *file) {}
-
-static inline bool nfs_fscache_release_folio(struct folio *folio, gfp_t gfp)
+static inline int nfs_netfs_readahead(struct readahead_control *ractl)
 {
-	return true; /* may release folio */
+	return -ENOBUFS;
 }
-static inline int nfs_fscache_read_page(struct inode *inode, struct page *page)
+static inline int nfs_netfs_read_folio(struct file *file, struct folio *folio)
 {
 	return -ENOBUFS;
 }
-static inline void nfs_fscache_write_page(struct inode *inode, struct page *page) {}
+
+static inline bool nfs_fscache_release_folio(struct folio *folio, gfp_t gfp)
+{
+	return true; /* may release folio */
+}
 static inline void nfs_fscache_invalidate(struct inode *inode, int flags) {}
 
 static inline const char *nfs_server_fscache_state(struct nfs_server *server)
 {
 	return "no ";
 }
-
+static inline void nfs_netfs_set_pgio_header(struct nfs_pgio_header *hdr,
+					     struct nfs_pageio_descriptor *desc) {}
+static inline void nfs_netfs_set_pageio_descriptor(struct nfs_pageio_descriptor *desc,
+						   struct nfs_pgio_header *hdr) {}
+static inline void nfs_netfs_reset_pageio_descriptor(struct nfs_pageio_descriptor *desc) {}
 #endif /* CONFIG_NFS_FSCACHE */
 #endif /* _NFS_FSCACHE_H */
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index e98ee7599eeb..68503a7e5b8d 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -2246,6 +2246,8 @@ struct inode *nfs_alloc_inode(struct super_block *sb)
 #ifdef CONFIG_NFS_V4_2
 	nfsi->xattr_cache = NULL;
 #endif
+	nfs_netfs_inode_init(nfsi);
+
 	return &nfsi->vfs_inode;
 }
 EXPORT_SYMBOL_GPL(nfs_alloc_inode);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 239ae5084774..53e7257feb14 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -452,6 +452,10 @@ extern void nfs_sb_deactive(struct super_block *sb);
 extern int nfs_client_for_each_server(struct nfs_client *clp,
 				      int (*fn)(struct nfs_server *, void *),
 				      void *data);
+#ifdef CONFIG_NFS_FSCACHE
+extern const struct netfs_request_ops nfs_netfs_ops;
+#endif
+
 /* io.c */
 extern void nfs_start_io_read(struct inode *inode);
 extern void nfs_end_io_read(struct inode *inode);
@@ -481,9 +485,14 @@ extern int nfs4_get_rootfh(struct nfs_server *server, struct nfs_fh *mntfh, bool
 
 struct nfs_pgio_completion_ops;
 /* read.c */
+extern const struct nfs_pgio_completion_ops nfs_async_read_completion_ops;
 extern void nfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
 			struct inode *inode, bool force_mds,
 			const struct nfs_pgio_completion_ops *compl_ops);
+extern int nfs_read_add_folio(struct nfs_pageio_descriptor *pgio,
+			       struct nfs_open_context *ctx,
+			       struct folio *folio);
+extern void nfs_pageio_complete_read(struct nfs_pageio_descriptor *pgio);
 extern void nfs_read_prepare(struct rpc_task *task, void *calldata);
 extern void nfs_pageio_reset_read_mds(struct nfs_pageio_descriptor *pgio);
 
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 0b4c07c93a52..43582dcba66c 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -25,6 +25,7 @@
 #include "internal.h"
 #include "pnfs.h"
 #include "nfstrace.h"
+#include "fscache.h"
 
 #define NFSDBG_FACILITY		NFSDBG_PAGECACHE
 
@@ -104,6 +105,7 @@ void nfs_pgheader_init(struct nfs_pageio_descriptor *desc,
 	hdr->good_bytes = mirror->pg_count;
 	hdr->io_completion = desc->pg_io_completion;
 	hdr->dreq = desc->pg_dreq;
+	nfs_netfs_set_pgio_header(hdr, desc);
 	hdr->release = release;
 	hdr->completion_ops = desc->pg_completion_ops;
 	if (hdr->completion_ops->init_hdr)
@@ -940,6 +942,7 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
 	desc->pg_lseg = NULL;
 	desc->pg_io_completion = NULL;
 	desc->pg_dreq = NULL;
+	nfs_netfs_reset_pageio_descriptor(desc);
 	desc->pg_bsize = bsize;
 
 	desc->pg_mirror_count = 1;
@@ -1476,6 +1479,7 @@ int nfs_pageio_resend(struct nfs_pageio_descriptor *desc,
 
 	desc->pg_io_completion = hdr->io_completion;
 	desc->pg_dreq = hdr->dreq;
+	nfs_netfs_set_pageio_descriptor(desc, hdr);
 	list_splice_init(&hdr->pages, &pages);
 	while (!list_empty(&pages)) {
 		struct nfs_page *req = nfs_list_entry(pages.next);
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 4cb3991c4735..e3a630f4764b 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -30,7 +30,7 @@
 
 #define NFSDBG_FACILITY		NFSDBG_PAGECACHE
 
-static const struct nfs_pgio_completion_ops nfs_async_read_completion_ops;
+const struct nfs_pgio_completion_ops nfs_async_read_completion_ops;
 static const struct nfs_rw_ops nfs_rw_read_ops;
 
 static struct kmem_cache *nfs_rdata_cachep;
@@ -73,7 +73,7 @@ void nfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
 }
 EXPORT_SYMBOL_GPL(nfs_pageio_init_read);
 
-static void nfs_pageio_complete_read(struct nfs_pageio_descriptor *pgio)
+void nfs_pageio_complete_read(struct nfs_pageio_descriptor *pgio)
 {
 	struct nfs_pgio_mirror *pgm;
 	unsigned long npages;
@@ -109,20 +109,14 @@ EXPORT_SYMBOL_GPL(nfs_pageio_reset_read_mds);
 
 static void nfs_readpage_release(struct nfs_page *req, int error)
 {
-	struct inode *inode = d_inode(nfs_req_openctx(req)->dentry);
 	struct folio *folio = nfs_page_to_folio(req);
 
-	dprintk("NFS: read done (%s/%llu %d@%lld)\n", inode->i_sb->s_id,
-		(unsigned long long)NFS_FILEID(inode), req->wb_bytes,
-		(long long)req_offset(req));
-
 	if (nfs_error_is_fatal_on_server(error) && error != -ETIMEDOUT)
 		folio_set_error(folio);
-	if (nfs_page_group_sync_on_bit(req, PG_UNLOCKPAGE)) {
-		if (folio_test_uptodate(folio))
-			nfs_fscache_write_page(inode, &folio->page);
-		folio_unlock(folio);
-	}
+	if (nfs_page_group_sync_on_bit(req, PG_UNLOCKPAGE))
+		if (nfs_netfs_folio_unlock(folio))
+			folio_unlock(folio);
+
 	nfs_release_request(req);
 }
 
@@ -176,6 +170,8 @@ static void nfs_read_completion(struct nfs_pgio_header *hdr)
 		nfs_list_remove_request(req);
 		nfs_readpage_release(req, error);
 	}
+	nfs_netfs_read_completion(hdr);
+
 out:
 	hdr->release(hdr);
 }
@@ -186,6 +182,7 @@ static void nfs_initiate_read(struct nfs_pgio_header *hdr,
 			      struct rpc_task_setup *task_setup_data, int how)
 {
 	rpc_ops->read_setup(hdr, msg);
+	nfs_netfs_initiate_read(hdr);
 	trace_nfs_initiate_read(hdr);
 }
 
@@ -201,7 +198,7 @@ nfs_async_read_error(struct list_head *head, int error)
 	}
 }
 
-static const struct nfs_pgio_completion_ops nfs_async_read_completion_ops = {
+const struct nfs_pgio_completion_ops nfs_async_read_completion_ops = {
 	.error_cleanup = nfs_async_read_error,
 	.completion = nfs_read_completion,
 };
@@ -276,9 +273,9 @@ static void nfs_readpage_result(struct rpc_task *task,
 		nfs_readpage_retry(task, hdr);
 }
 
-static int nfs_read_add_folio(struct nfs_pageio_descriptor *pgio,
-			      struct nfs_open_context *ctx,
-			      struct folio *folio)
+int nfs_read_add_folio(struct nfs_pageio_descriptor *pgio,
+		       struct nfs_open_context *ctx,
+		       struct folio *folio)
 {
 	struct inode *inode = folio_file_mapping(folio)->host;
 	struct nfs_server *server = NFS_SERVER(inode);
@@ -294,15 +291,11 @@ static int nfs_read_add_folio(struct nfs_pageio_descriptor *pgio,
 
 	aligned_len = min_t(unsigned int, ALIGN(len, rsize), fsize);
 
-	if (!IS_SYNC(inode)) {
-		error = nfs_fscache_read_page(inode, &folio->page);
-		if (error == 0)
-			goto out_unlock;
-	}
-
 	new = nfs_page_create_from_folio(ctx, folio, 0, aligned_len);
-	if (IS_ERR(new))
-		goto out_error;
+	if (IS_ERR(new)) {
+		error = PTR_ERR(new);
+		goto out;
+	}
 
 	if (len < fsize)
 		folio_zero_segment(folio, len, fsize);
@@ -313,10 +306,6 @@ static int nfs_read_add_folio(struct nfs_pageio_descriptor *pgio,
 		goto out;
 	}
 	return 0;
-out_error:
-	error = PTR_ERR(new);
-out_unlock:
-	folio_unlock(folio);
 out:
 	return error;
 }
@@ -354,6 +343,10 @@ int nfs_read_folio(struct file *file, struct folio *folio)
 	if (NFS_STALE(inode))
 		goto out_unlock;
 
+	ret = nfs_netfs_read_folio(file, folio);
+	if (!ret)
+		goto out;
+
 	ctx = get_nfs_open_context(nfs_file_open_context(file));
 
 	xchg(&ctx->error, 0);
@@ -362,7 +355,7 @@ int nfs_read_folio(struct file *file, struct folio *folio)
 
 	ret = nfs_read_add_folio(&pgio, ctx, folio);
 	if (ret)
-		goto out;
+		goto out_put;
 
 	nfs_pageio_complete_read(&pgio);
 	ret = pgio.pg_error < 0 ? pgio.pg_error : 0;
@@ -371,14 +364,14 @@ int nfs_read_folio(struct file *file, struct folio *folio)
 		if (!folio_test_uptodate(folio) && !ret)
 			ret = xchg(&ctx->error, 0);
 	}
-out:
+out_put:
 	put_nfs_open_context(ctx);
+out:
 	trace_nfs_aop_readpage_done(inode, folio, ret);
 	return ret;
 out_unlock:
 	folio_unlock(folio);
-	trace_nfs_aop_readpage_done(inode, folio, ret);
-	return ret;
+	goto out;
 }
 
 void nfs_readahead(struct readahead_control *ractl)
@@ -398,6 +391,10 @@ void nfs_readahead(struct readahead_control *ractl)
 	if (NFS_STALE(inode))
 		goto out;
 
+	ret = nfs_netfs_readahead(ractl);
+	if (!ret)
+		goto out;
+
 	if (file == NULL) {
 		ret = -EBADF;
 		ctx = nfs_find_open_context(inode, NULL, FMODE_READ);
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index a2f1ca657623..aa9f4c6ebe26 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -105,6 +105,9 @@ struct nfs_pageio_descriptor {
 	struct pnfs_layout_segment *pg_lseg;
 	struct nfs_io_completion *pg_io_completion;
 	struct nfs_direct_req	*pg_dreq;
+#ifdef CONFIG_NFS_FSCACHE
+	void			*pg_netfs;
+#endif
 	unsigned int		pg_bsize;	/* default bsize for mirrors */
 
 	u32			pg_mirror_count;
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index e86cf6642d21..e196ef595908 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1619,6 +1619,9 @@ struct nfs_pgio_header {
 	const struct nfs_rw_ops	*rw_ops;
 	struct nfs_io_completion *io_completion;
 	struct nfs_direct_req	*dreq;
+#ifdef CONFIG_NFS_FSCACHE
+	void			*netfs;
+#endif
 
 	int			pnfs_error;
 	int			error;		/* merge with pnfs_error */
-- 
2.31.1



* [PATCH v11 4/5] NFS: Remove all NFSIOS_FSCACHE counters due to conversion to netfs API
  2023-02-20 13:43 [PATCH v11 0/5] Convert NFS with fscache to the netfs API Dave Wysochanski
                   ` (2 preceding siblings ...)
  2023-02-20 13:43 ` [PATCH v11 3/5] NFS: Convert buffered read paths to use netfs when fscache is enabled Dave Wysochanski
@ 2023-02-20 13:43 ` Dave Wysochanski
  2023-02-20 13:43 ` [PATCH v11 5/5] NFS: Remove fscache specific trace points and NFS_INO_FSCACHE bit Dave Wysochanski
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Dave Wysochanski @ 2023-02-20 13:43 UTC (permalink / raw)
  To: Anna Schumaker, Trond Myklebust, David Howells, Benjamin Maynard,
	Daire Byrne
  Cc: linux-nfs, linux-cachefs

The old NFSIOS_FSCACHE counters are no longer accurate or useful with
the conversion to the new netfs API.  The new API does not have a
page-based interface, and so the counters in nfs_stat_fscachecounters
are no longer obtainable.  The new netfs API has extensive statistics
inside /proc/fs/fscache/stats, so we no longer need NFS-specific
fscache stats.

Note this also removes the 'fsc:' line from /proc/self/mountstats,
so it is a user-visible change.
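
For reference, the line that disappears looked like this; the five
fields are the counters from the removed nfs_stat_fscachecounters enum
(PAGES_READ_OK, PAGES_READ_FAIL, PAGES_WRITTEN_OK, PAGES_WRITTEN_FAIL,
PAGES_UNCACHED), and the values here are purely illustrative:

    fsc:    1130 0 415 0 12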

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfs/iostat.h            | 17 -----------------
 fs/nfs/super.c             | 11 -----------
 include/linux/nfs_iostat.h | 12 ------------
 3 files changed, 40 deletions(-)

diff --git a/fs/nfs/iostat.h b/fs/nfs/iostat.h
index 2ddaab1ac653..5aa776b5a3e7 100644
--- a/fs/nfs/iostat.h
+++ b/fs/nfs/iostat.h
@@ -17,9 +17,6 @@
 
 struct nfs_iostats {
 	unsigned long long	bytes[__NFSIOS_BYTESMAX];
-#ifdef CONFIG_NFS_FSCACHE
-	unsigned long long	fscache[__NFSIOS_FSCACHEMAX];
-#endif
 	unsigned long		events[__NFSIOS_COUNTSMAX];
 } ____cacheline_aligned;
 
@@ -49,20 +46,6 @@ static inline void nfs_add_stats(const struct inode *inode,
 	nfs_add_server_stats(NFS_SERVER(inode), stat, addend);
 }
 
-#ifdef CONFIG_NFS_FSCACHE
-static inline void nfs_add_fscache_stats(struct inode *inode,
-					 enum nfs_stat_fscachecounters stat,
-					 long addend)
-{
-	this_cpu_add(NFS_SERVER(inode)->io_stats->fscache[stat], addend);
-}
-static inline void nfs_inc_fscache_stats(struct inode *inode,
-					 enum nfs_stat_fscachecounters stat)
-{
-	this_cpu_inc(NFS_SERVER(inode)->io_stats->fscache[stat]);
-}
-#endif
-
 static inline struct nfs_iostats __percpu *nfs_alloc_iostats(void)
 {
 	return alloc_percpu(struct nfs_iostats);
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 05ae23657527..90796db8c205 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -692,10 +692,6 @@ int nfs_show_stats(struct seq_file *m, struct dentry *root)
 			totals.events[i] += stats->events[i];
 		for (i = 0; i < __NFSIOS_BYTESMAX; i++)
 			totals.bytes[i] += stats->bytes[i];
-#ifdef CONFIG_NFS_FSCACHE
-		for (i = 0; i < __NFSIOS_FSCACHEMAX; i++)
-			totals.fscache[i] += stats->fscache[i];
-#endif
 
 		preempt_enable();
 	}
@@ -706,13 +702,6 @@ int nfs_show_stats(struct seq_file *m, struct dentry *root)
 	seq_puts(m, "\n\tbytes:\t");
 	for (i = 0; i < __NFSIOS_BYTESMAX; i++)
 		seq_printf(m, "%Lu ", totals.bytes[i]);
-#ifdef CONFIG_NFS_FSCACHE
-	if (nfss->options & NFS_OPTION_FSCACHE) {
-		seq_puts(m, "\n\tfsc:\t");
-		for (i = 0; i < __NFSIOS_FSCACHEMAX; i++)
-			seq_printf(m, "%Lu ", totals.fscache[i]);
-	}
-#endif
 	seq_putc(m, '\n');
 
 	rpc_clnt_show_stats(m, nfss->client);
diff --git a/include/linux/nfs_iostat.h b/include/linux/nfs_iostat.h
index 027874c36c88..8d946089d151 100644
--- a/include/linux/nfs_iostat.h
+++ b/include/linux/nfs_iostat.h
@@ -119,16 +119,4 @@ enum nfs_stat_eventcounters {
 	__NFSIOS_COUNTSMAX,
 };
 
-/*
- * NFS local caching servicing counters
- */
-enum nfs_stat_fscachecounters {
-	NFSIOS_FSCACHE_PAGES_READ_OK,
-	NFSIOS_FSCACHE_PAGES_READ_FAIL,
-	NFSIOS_FSCACHE_PAGES_WRITTEN_OK,
-	NFSIOS_FSCACHE_PAGES_WRITTEN_FAIL,
-	NFSIOS_FSCACHE_PAGES_UNCACHED,
-	__NFSIOS_FSCACHEMAX,
-};
-
 #endif	/* _LINUX_NFS_IOSTAT */
-- 
2.31.1



* [PATCH v11 5/5] NFS: Remove fscache specific trace points and NFS_INO_FSCACHE bit
  2023-02-20 13:43 [PATCH v11 0/5] Convert NFS with fscache to the netfs API Dave Wysochanski
                   ` (3 preceding siblings ...)
  2023-02-20 13:43 ` [PATCH v11 4/5] NFS: Remove all NFSIOS_FSCACHE counters due to conversion to netfs API Dave Wysochanski
@ 2023-02-20 13:43 ` Dave Wysochanski
  2023-02-22 12:50 ` [PATCH v11 0/5] Convert NFS with fscache to the netfs API Daire Byrne
  2023-03-23 17:49 ` [Linux-cachefs] " David Wysochanski
  6 siblings, 0 replies; 13+ messages in thread
From: Dave Wysochanski @ 2023-02-20 13:43 UTC (permalink / raw)
  To: Anna Schumaker, Trond Myklebust, David Howells, Benjamin Maynard,
	Daire Byrne
  Cc: linux-nfs, linux-cachefs

The NFS specific trace points are no longer needed as tracing is well
covered by netfs and fscache.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfs/nfstrace.h      | 91 ------------------------------------------
 include/linux/nfs_fs.h |  1 -
 2 files changed, 92 deletions(-)

diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index a778713343df..4e90ca531176 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -39,7 +39,6 @@
 			{ BIT(NFS_INO_STALE), "STALE" }, \
 			{ BIT(NFS_INO_ACL_LRU_SET), "ACL_LRU_SET" }, \
 			{ BIT(NFS_INO_INVALIDATING), "INVALIDATING" }, \
-			{ BIT(NFS_INO_FSCACHE), "FSCACHE" }, \
 			{ BIT(NFS_INO_LAYOUTCOMMIT), "NEED_LAYOUTCOMMIT" }, \
 			{ BIT(NFS_INO_LAYOUTCOMMITTING), "LAYOUTCOMMIT" }, \
 			{ BIT(NFS_INO_LAYOUTSTATS), "LAYOUTSTATS" }, \
@@ -1243,96 +1242,6 @@ TRACE_EVENT(nfs_readpage_short,
 		)
 );
 
-DECLARE_EVENT_CLASS(nfs_fscache_page_event,
-		TP_PROTO(
-			const struct inode *inode,
-			struct page *page
-		),
-
-		TP_ARGS(inode, page),
-
-		TP_STRUCT__entry(
-			__field(dev_t, dev)
-			__field(u32, fhandle)
-			__field(u64, fileid)
-			__field(loff_t, offset)
-		),
-
-		TP_fast_assign(
-			const struct nfs_inode *nfsi = NFS_I(inode);
-			const struct nfs_fh *fh = &nfsi->fh;
-
-			__entry->offset = page_index(page) << PAGE_SHIFT;
-			__entry->dev = inode->i_sb->s_dev;
-			__entry->fileid = nfsi->fileid;
-			__entry->fhandle = nfs_fhandle_hash(fh);
-		),
-
-		TP_printk(
-			"fileid=%02x:%02x:%llu fhandle=0x%08x "
-			"offset=%lld",
-			MAJOR(__entry->dev), MINOR(__entry->dev),
-			(unsigned long long)__entry->fileid,
-			__entry->fhandle,
-			(long long)__entry->offset
-		)
-);
-DECLARE_EVENT_CLASS(nfs_fscache_page_event_done,
-		TP_PROTO(
-			const struct inode *inode,
-			struct page *page,
-			int error
-		),
-
-		TP_ARGS(inode, page, error),
-
-		TP_STRUCT__entry(
-			__field(int, error)
-			__field(dev_t, dev)
-			__field(u32, fhandle)
-			__field(u64, fileid)
-			__field(loff_t, offset)
-		),
-
-		TP_fast_assign(
-			const struct nfs_inode *nfsi = NFS_I(inode);
-			const struct nfs_fh *fh = &nfsi->fh;
-
-			__entry->offset = page_index(page) << PAGE_SHIFT;
-			__entry->dev = inode->i_sb->s_dev;
-			__entry->fileid = nfsi->fileid;
-			__entry->fhandle = nfs_fhandle_hash(fh);
-			__entry->error = error;
-		),
-
-		TP_printk(
-			"fileid=%02x:%02x:%llu fhandle=0x%08x "
-			"offset=%lld error=%d",
-			MAJOR(__entry->dev), MINOR(__entry->dev),
-			(unsigned long long)__entry->fileid,
-			__entry->fhandle,
-			(long long)__entry->offset, __entry->error
-		)
-);
-#define DEFINE_NFS_FSCACHE_PAGE_EVENT(name) \
-	DEFINE_EVENT(nfs_fscache_page_event, name, \
-			TP_PROTO( \
-				const struct inode *inode, \
-				struct page *page \
-			), \
-			TP_ARGS(inode, page))
-#define DEFINE_NFS_FSCACHE_PAGE_EVENT_DONE(name) \
-	DEFINE_EVENT(nfs_fscache_page_event_done, name, \
-			TP_PROTO( \
-				const struct inode *inode, \
-				struct page *page, \
-				int error \
-			), \
-			TP_ARGS(inode, page, error))
-DEFINE_NFS_FSCACHE_PAGE_EVENT(nfs_fscache_read_page);
-DEFINE_NFS_FSCACHE_PAGE_EVENT_DONE(nfs_fscache_read_page_exit);
-DEFINE_NFS_FSCACHE_PAGE_EVENT(nfs_fscache_write_page);
-DEFINE_NFS_FSCACHE_PAGE_EVENT_DONE(nfs_fscache_write_page_exit);
 
 TRACE_EVENT(nfs_pgio_error,
 	TP_PROTO(
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 580847c70fec..137d296e6002 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -281,7 +281,6 @@ struct nfs4_copy_state {
 #define NFS_INO_ACL_LRU_SET	(2)		/* Inode is on the LRU list */
 #define NFS_INO_INVALIDATING	(3)		/* inode is being invalidated */
 #define NFS_INO_PRESERVE_UNLINKED (4)		/* preserve file if removed while open */
-#define NFS_INO_FSCACHE		(5)		/* inode can be cached by FS-Cache */
 #define NFS_INO_LAYOUTCOMMIT	(9)		/* layoutcommit required */
 #define NFS_INO_LAYOUTCOMMITTING (10)		/* layoutcommit inflight */
 #define NFS_INO_LAYOUTSTATS	(11)		/* layoutstats inflight */
-- 
2.31.1



* Re: [PATCH v11 0/5] Convert NFS with fscache to the netfs API
  2023-02-20 13:43 [PATCH v11 0/5] Convert NFS with fscache to the netfs API Dave Wysochanski
                   ` (4 preceding siblings ...)
  2023-02-20 13:43 ` [PATCH v11 5/5] NFS: Remove fscache specific trace points and NFS_INO_FSCACHE bit Dave Wysochanski
@ 2023-02-22 12:50 ` Daire Byrne
  2023-02-22 15:57   ` David Wysochanski
  2023-03-23 17:49 ` [Linux-cachefs] " David Wysochanski
  6 siblings, 1 reply; 13+ messages in thread
From: Daire Byrne @ 2023-02-22 12:50 UTC (permalink / raw)
  To: Dave Wysochanski
  Cc: Anna Schumaker, Trond Myklebust, David Howells, Benjamin Maynard,
	Daire Byrne, linux-nfs, linux-cachefs

Dave,

Thanks for this! I have been testing it with our production (render
farm) loads for the last couple of days and have not run into any
issues so far. It seems to be performing on par with your previous
version of the patchset (based on v6.0).

I am also running with both the "known issues" dhowells patches [8] &
[9] mentioned in your email (as I was with your previous version).

Tested-by: Daire Byrne <daire@dneg.com>



On Mon, 20 Feb 2023 at 13:44, Dave Wysochanski <dwysocha@redhat.com> wrote:
>
> Trond, this v11 patchset addresses your latest feedback on patch #2,
> and I did a little bit of cleanup to patch 3 (see Changes notes below).
> I'm not sure what further changes to make to patch #3 without a more
> in-depth review with specifics, if you feel the current approach is
> unacceptable [1].
>
> Ben and Daire, if you could test this set and provide your feedback
> and a Tested-by: that would be appreciated.  This set addresses
> the existing NFS + fscache performance concerns seen by a few
> users [5], which are due to the use of the deprecated
> single-page function, fscache_fallback_read_page().  However,
> until "known issue #1" below is also resolved, even with these
> patches, performance of NFS+fscache will still be a problem
> in some scenarios.
>
> This patchset converts NFS with fscache buffered read IO paths to
> use the netfs API with a non-invasive approach.  The existing NFS pgio
> layer does not need extensive changes, and is the best way so far I've
> found to address Trond's previous concerns about modifying the IO
> path [2] as well as only enabling netfs when fscache is configured
> and enabled [3].  I have not attempted performance comparisions to
> address Chuck Lever's concern [4] because we are not converting the
> non-fscache enabled NFS IO paths to netfs.
>
> The patchset is based on Trond's latest 'testing' branch which includes
> his folio patchset, and is based on 6.2-rc5.  It has been pushed to
> github at:
> https://github.com/DaveWysochanskiRH/kernel/commits/nfs-fscache-netfs
> https://github.com/DaveWysochanskiRH/kernel/commit/6424e4f139652b7552eff26eb5da1f2282d35616
>
> Changes since v10 [6]
> =====================
> PATCH6: Dropped
> PATCH1: Rename nfs_pageio_add_page to nfs_read_add_folio
> PATCH2: Use anonymous union to add struct netfs_inode to nfs_inode (Trond) [7]
> PATCH3: Change nfs_netfs_readpage_release() to nfs_netfs_folio_unlock()
>
> Testing
> =======
> I did a full round of testing on this because it was rebased on top of
> Trond's testing branch which included his folio series.
> All of my unit tests pass except the one with the known issue #1 below.
> Multiple runs of xfstests generic tests (applicable to NFS) were run with
> with various servers, both with and without fscache enabled, and
> compared to baseline (Trond's testing branch).  No new failures were
> observed with these patches, and in some xfstest instances, this
> patchset improves the results (some tests that were failing now pass).
> - hammerspace(pNFS flexfiles): NFS4.1, NFS4.2
> - NetApp(pNFS filelayout): NFS4.1, NFS4.0, NFS3
> - RHEL9: NFS4.2, NFS4.1, NFS4.0, NFS3
>
> Known issues
> ============
> 1. Unit test setting rsize < readahead does not properly read from
> fscache but re-reads data from the NFS server
> * This will be fixed with another dhowells patch [8]:
>   "[PATCH v6 2/2] mm, netfs, fscache: Stop read optimisation when folio removed from pagecache"
> * Daire Byrne verified the patch fixes his issue as well
>
> 2. "Cache volume key already in use" after xfstest runs involving multiple mounts
> * Simple reproducer requires just two mounts as follows:
>  mount -overs=4.1,fsc,nosharecache -o context=system_u:object_r:root_t:s0  nfs-server:/exp1 /mnt1
>  mount -overs=4.1,fsc,nosharecache -o context=system_u:object_r:root_t:s0  nfs-server:/exp2 /mnt2
> * This should be fixed with dhowells patch [9]:
>   "[PATCH v5] vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing"
>
>
> References
> ==========
> [1] https://lore.kernel.org/linux-nfs/0676ecb2bb708e6fc29dbbe6b44551d6a0d021dc.camel@kernel.org/
> [2] https://lore.kernel.org/linux-nfs/9cfd5bc3cfc6abc2d3316b0387222e708d67f595.camel@hammerspace.com/
> [3] https://lore.kernel.org/linux-nfs/da9200f1bded9b8b078a7aef227fd6b92eb028fb.camel@hammerspace.com/
> [4] https://lore.kernel.org/linux-nfs/0A640C47-5F51-47E8-864D-E0E980F8B310@oracle.com/
> [5] https://lore.kernel.org/linux-nfs/CA+QRt4tPqH87NVkoETLjxieGjZ_f7XxRj+xS3NVxcJ+b8AAKQg@mail.gmail.com/
> [6] https://lore.kernel.org/linux-nfs/20221103161637.1725471-1-dwysocha@redhat.com/
> [7] https://lore.kernel.org/linux-nfs/4d60636f62df4f5c200666ed2d1a5f2414c18e1f.camel@kernel.org/
> [8] https://lore.kernel.org/linux-nfs/20230216150701.3654894-1-dhowells@redhat.com/T/#mf3807fa68fb6d495b87dde0d76b5237833a0cc81
> [9] https://lore.kernel.org/linux-kernel/217595.1662033775@warthog.procyon.org.uk/
>
> Dave Wysochanski (5):
>   NFS: Rename readpage_async_filler to nfs_read_add_folio
>   NFS: Configure support for netfs when NFS fscache is configured
>   NFS: Convert buffered read paths to use netfs when fscache is enabled
>   NFS: Remove all NFSIOS_FSCACHE counters due to conversion to netfs API
>   NFS: Remove fscache specific trace points and NFS_INO_FSCACHE bit
>
>  fs/nfs/Kconfig             |   1 +
>  fs/nfs/fscache.c           | 242 ++++++++++++++++++++++---------------
>  fs/nfs/fscache.h           | 131 ++++++++++++++------
>  fs/nfs/inode.c             |   2 +
>  fs/nfs/internal.h          |   9 ++
>  fs/nfs/iostat.h            |  17 ---
>  fs/nfs/nfstrace.h          |  91 --------------
>  fs/nfs/pagelist.c          |   4 +
>  fs/nfs/read.c              | 105 ++++++++--------
>  fs/nfs/super.c             |  11 --
>  include/linux/nfs_fs.h     |  25 ++--
>  include/linux/nfs_iostat.h |  12 --
>  include/linux/nfs_page.h   |   3 +
>  include/linux/nfs_xdr.h    |   3 +
>  14 files changed, 317 insertions(+), 339 deletions(-)
>
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v11 0/5] Convert NFS with fscache to the netfs API
  2023-02-22 12:50 ` [PATCH v11 0/5] Convert NFS with fscache to the netfs API Daire Byrne
@ 2023-02-22 15:57   ` David Wysochanski
  2023-02-23 15:47     ` Daire Byrne
  0 siblings, 1 reply; 13+ messages in thread
From: David Wysochanski @ 2023-02-22 15:57 UTC (permalink / raw)
  To: Daire Byrne
  Cc: Anna Schumaker, Trond Myklebust, David Howells, Benjamin Maynard,
	Daire Byrne, linux-nfs, linux-cachefs

Thanks Daire!  Did this also resolve the issue with
/proc/PID/io/read_bytes you reported (link below)
since netfs should increment that count now?

https://lore.kernel.org/linux-nfs/CAPt2mGNEYUk5u8V4abe=5MM5msZqmvzCVrtCP4Qw1n=gCHCnww@mail.gmail.com/
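
In case it helps with checking, here is a minimal userspace sketch that
dumps /proc/self/io around a buffered read.  The /mnt/nfs/testfile path
is just a placeholder, and the file should not already be in the page
cache (otherwise no read I/O is issued at all):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Print the current task's I/O accounting (rchar, read_bytes, ...). */
static void dump_io(const char *tag)
{
	char buf[512];
	int fd = open("/proc/self/io", O_RDONLY);
	ssize_t n;

	if (fd < 0)
		return;
	n = read(fd, buf, sizeof(buf) - 1);
	if (n > 0) {
		buf[n] = '\0';
		printf("--- %s ---\n%s", tag, buf);
	}
	close(fd);
}

int main(void)
{
	static char data[1 << 16];
	int fd = open("/mnt/nfs/testfile", O_RDONLY);	/* placeholder path */

	if (fd < 0)
		return 1;
	dump_io("before");
	while (read(fd, data, sizeof(data)) > 0)
		;				/* plain buffered reads */
	dump_io("after");	/* rchar should grow; the question is read_bytes */
	close(fd);
	return 0;
}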



On Wed, Feb 22, 2023 at 7:51 AM Daire Byrne <daire@dneg.com> wrote:
>
> Dave,
>
> Thanks for this! I have been testing it with our production (render
> farm) loads for the last couple of days and have not run into any
> issues so far. It seems to be performing on par with your previous
> version of the patchset (based on v6.0).
>
> I am also running with both the "known issues" dhowells patches [8] &
> [9] mentioned in your email (as I was with your previous version).
>
> Tested-by: Daire Byrne <daire@dneg.com>
>
> [...]


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v11 0/5] Convert NFS with fscache to the netfs API
  2023-02-22 15:57   ` David Wysochanski
@ 2023-02-23 15:47     ` Daire Byrne
  2023-03-09 23:50       ` David Wysochanski
  0 siblings, 1 reply; 13+ messages in thread
From: Daire Byrne @ 2023-02-23 15:47 UTC (permalink / raw)
  To: David Wysochanski
  Cc: Anna Schumaker, Trond Myklebust, David Howells, Benjamin Maynard,
	Daire Byrne, linux-nfs, linux-cachefs

Oh, I didn't realise that was included.

I did a quick test doing a dd off an NFS mount, but I did not see the
read_bytes increment - rchar yes, read_bytes no (as before). This was
an NFSv3 mount.

Daire

On Wed, 22 Feb 2023 at 15:57, David Wysochanski <dwysocha@redhat.com> wrote:
>
> Thanks Daire!  Did this also resolve the issue with
> /proc/PID/io/read_bytes you reported (link below)
> since netfs should increment that count now?
>
> https://lore.kernel.org/linux-nfs/CAPt2mGNEYUk5u8V4abe=5MM5msZqmvzCVrtCP4Qw1n=gCHCnww@mail.gmail.com/
>
> [...]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v11 0/5] Convert NFS with fscache to the netfs API
  2023-02-23 15:47     ` Daire Byrne
@ 2023-03-09 23:50       ` David Wysochanski
  0 siblings, 0 replies; 13+ messages in thread
From: David Wysochanski @ 2023-03-09 23:50 UTC (permalink / raw)
  To: Daire Byrne
  Cc: Anna Schumaker, Trond Myklebust, David Howells, Benjamin Maynard,
	Daire Byrne, linux-nfs, linux-cachefs

I posted a patch which should fix the problem.  It applied cleanly
to 6.3-rc1 as well as this series.

I think netfs should not be counting in the unlock path
because the PID won't be correct.  Fixing netfs should
be a separate patch though, orthogonal to this v11 series.
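
To sketch what I mean (made-up function names, not the actual netfs
code): task_io_account_read() charges "current", so if it runs in the
folio unlock path the bytes can get charged to a kworker rather than
the task that issued the read:

#include <linux/pagemap.h>
#include <linux/task_io_accounting_ops.h>

/* Hypothetical completion path: runs in whatever context unlocks the folio. */
static void folio_read_done(struct folio *folio, size_t bytes)
{
	/*
	 * On an async completion "current" is likely a kworker, so these
	 * bytes land in the kworker's accounting rather than in the
	 * reading task's /proc/PID/io/read_bytes.
	 */
	task_io_account_read(bytes);
	folio_unlock(folio);
}

/* Hypothetical submission path: runs in the context of the read() caller. */
static void folio_read_submit(struct folio *folio, size_t bytes)
{
	task_io_account_read(bytes);	/* current == the reader here */
	/* ... queue the actual I/O ... */
}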

On Thu, Feb 23, 2023 at 10:47 AM Daire Byrne <daire@dneg.com> wrote:
>
> Oh, I didn't realise that was included.
>
> I did a quick test doing a dd off an NFS mount, but I did not see the
> read_bytes increment - rchar yes, read_bytes no (as before). This was
> an NFSv3 mount.
>
> Daire
>
> [...]


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Linux-cachefs] [PATCH v11 0/5] Convert NFS with fscache to the netfs API
  2023-02-20 13:43 [PATCH v11 0/5] Convert NFS with fscache to the netfs API Dave Wysochanski
                   ` (5 preceding siblings ...)
  2023-02-22 12:50 ` [PATCH v11 0/5] Convert NFS with fscache to the netfs API Daire Byrne
@ 2023-03-23 17:49 ` David Wysochanski
  2023-04-25 17:32   ` Chris Chilvers
  6 siblings, 1 reply; 13+ messages in thread
From: David Wysochanski @ 2023-03-23 17:49 UTC (permalink / raw)
  To: Anna Schumaker, Trond Myklebust, David Howells, Benjamin Maynard,
	Daire Byrne
  Cc: linux-nfs, linux-cachefs

On Mon, Feb 20, 2023 at 8:43 AM Dave Wysochanski <dwysocha@redhat.com> wrote:
>
> Trond, this v11 patchset addresses your latest feedback on patch #2,
> and I did a little bit of cleanup to patch 3 (see Changes notes below).
> I'm not sure of further changes to patch #3 without a more in-depth
> review with specifics, if you feel the current approach is unacceptable [1].
>

Trond and/or Anna,

Have you had a chance to review this patchset and do you have further
concerns?

Note that patch #3 will require a small fixup to apply after v6.3-rc1
due to this commit:
8bb7cd842c44 nfs: use bvec_set_page to initialize bvecs

There is also still the small open issue of netfs counting read_bytes
in its unlock page path, but I view that as a netfs issue.  I'll follow up
with David Howells on that.


> [...]


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Linux-cachefs] [PATCH v11 0/5] Convert NFS with fscache to the netfs API
  2023-03-23 17:49 ` [Linux-cachefs] " David Wysochanski
@ 2023-04-25 17:32   ` Chris Chilvers
  2023-04-25 17:58     ` David Wysochanski
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Chilvers @ 2023-04-25 17:32 UTC (permalink / raw)
  To: David Wysochanski
  Cc: Anna Schumaker, Trond Myklebust, David Howells, Benjamin Maynard,
	Daire Byrne, linux-nfs, linux-cachefs

We've been using NFS with FS-Cache to act as a caching proxy (re-export). When
under load we've encountered an issue where all the nfsd processes seem to get
stuck in I/O wait.

The proxy is running an older version of the nfs-fscache-nfsfs branch taken
from 17th Nov 2022,
https://github.com/DaveWysochanskiRH/kernel/commit/52acbd4584d1b83c844371e48de1a1e39d255a6d.

The proxy mounts the source NFS server using NFS v3 with FS-Cache enabled.
The clients mount the proxy using NFS v4.2 to avoid issues with file handle
sizes.

dmesg suggests that most of the nfsd tasks are blocked in
folio_wait_bit_common:

    INFO: task nfsd:180059 blocked for more than 120 seconds.
          Not tainted 6.1.0-rc5+ #1
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    task:nfsd            state:D stack:0     pid:180059 ppid:2 flags:0x00004000
    Call Trace:
     <TASK>
     __schedule+0x31e/0x14a0
     ? _raw_spin_unlock_irqrestore+0x27/0x50
     schedule+0x6b/0x110
     io_schedule+0x46/0x80
     folio_wait_bit_common+0x124/0x340
     ? xas_find+0x7c/0x1e0
     ? xas_find_marked+0x1f7/0x370
     ? filemap_invalidate_unlock_two+0x50/0x50
     folio_wait_private_2+0x2c/0x50
     nfs_release_folio+0x5e/0xb0 [nfs]
     filemap_release_folio+0x66/0x80
     invalidate_inode_pages2_range+0x240/0x400
     invalidate_inode_pages2+0x17/0x20
     nfs_clear_invalid_mapping+0x1d8/0x2d0 [nfs]
     nfs_revalidate_mapping+0x55/0x70 [nfs]
     nfs_file_read+0x4c/0xc0 [nfs]
     generic_file_splice_read+0x8f/0x160
     do_splice_to+0x7d/0xc0
     splice_direct_to_actor+0xad/0x210
     ? fsid_source+0x60/0x60 [nfsd]
     ? nfsd_file_do_acquire+0xacf/0xbd0 [nfsd]
     nfsd_splice_read+0x7c/0x120 [nfsd]
     nfsd_read+0x147/0x1b0 [nfsd]
     nfsd3_proc_read+0x1b5/0x2d0 [nfsd]
     ? svcxdr_decode_nfs_fh3+0x4e/0x130 [nfsd]
     nfsd_dispatch+0x173/0x2b0 [nfsd]
     svc_process_common+0x3c8/0x620 [sunrpc]
     ? nfsd_svc+0x3e0/0x3e0 [nfsd]
     ? nfsd_shutdown_threads+0xb0/0xb0 [nfsd]
     svc_process+0xb2/0x100 [sunrpc]
     nfsd+0xda/0x190 [nfsd]
     kthread+0xfa/0x130
     ? kthread_complete_and_exit+0x20/0x20
     ret_from_fork+0x1f/0x30
     </TASK>

Checking /proc/PID/stack for all the nfsd processes gave the following counts
(a rough tally program is sketched after this list):

* 72 - nfsd_file_do_acquire+0x20b/0xbd0
* 36 - folio_wait_bit_common+0x124/0x340
* 10 - fscache_begin_operation.part.0+0x288/0x2b0 [fscache]
*  8 - __fscache_use_cookie+0x2e5/0x320 [fscache]
*  2 - ext4_llseek+0x91/0x110
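
(A rough sketch of how this tally can be gathered - it prints the top
frame of every task whose comm is "nfsd", to be piped through
"sort | uniq -c"; reading /proc/PID/stack requires root:)

#include <dirent.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	DIR *proc = opendir("/proc");
	struct dirent *de;
	char path[64], comm[64], frame[256];

	while (proc && (de = readdir(proc))) {
		FILE *f;

		if (de->d_name[0] < '0' || de->d_name[0] > '9')
			continue;	/* not a PID directory */
		snprintf(path, sizeof(path), "/proc/%s/comm", de->d_name);
		f = fopen(path, "r");
		if (!f)
			continue;
		if (!fgets(comm, sizeof(comm), f) || strcmp(comm, "nfsd\n")) {
			fclose(f);
			continue;
		}
		fclose(f);
		snprintf(path, sizeof(path), "/proc/%s/stack", de->d_name);
		f = fopen(path, "r");
		if (f) {
			if (fgets(frame, sizeof(frame), f))
				printf("%s", frame);
			fclose(f);
		}
	}
	if (proc)
		closedir(proc);
	return 0;
}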

dmesg also contains a lot of errors about timeouts waiting for the cookie state
to change:

    FS-Cache: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
    FS-Cache: O-cookie c=0026b915 [fl=602c na=1 nA=2 s=L]
    FS-Cache: O-cookie V=00000001 [Infs,3.0,2,,8335540a,6465ebb4,,,d0,100000,100000,249f0,249f0,249f0,249f0,1]
    FS-Cache: O-key=[32] 'b4eb6564010000005509a81a02000000ffffffff000000000400fd0301000000'
    FS-Cache: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
    FS-Cache: O-cookie c=0024fd88 [fl=602c na=1 nA=2 s=L]
    FS-Cache: O-cookie V=00000001 [Infs,3.0,2,,8335540a,6465ebb4,,,d0,100000,100000,249f0,249f0,249f0,249f0,1]
    FS-Cache: O-key=[32] 'b4eb6564010000007df7aa1702000000ffffffff000000000400fd0301000000'
    FS-Cache: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
    FS-Cache: O-cookie c=0026d61a [fl=4024 na=1 nA=2 s=L]
    FS-Cache: O-cookie V=00000001 [Infs,3.0,2,,8335540a,6465ebb4,,,d0,100000,100000,249f0,249f0,249f0,249f0,1]
    FS-Cache: O-key=[32] 'b4eb656401000000b79947ce01000000ffffffff000000000400fd0301000000'

The clients were shut down, but the proxy instance was kept for further diagnosis.
The nfsd sockets remained stuck in CLOSE_WAIT, and the nfsd processes remained
stuck on various tasks for at least 4 days. It seems at some point over the
weekend the issue resolved itself and now all the nfsd threads are idle.

I'm going to try to see if I can reproduce this on the latest versions of the
patches with lockdep enabled. Though currently we've only seen this issue
while the system is under a heavy production workload (rendering) after several
days.
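
For what it's worth, my reading of the folio_wait_private_2() frames
above (an interpretation, not a diagnosis): PG_private_2 is the flag
fscache uses to mark a folio that is still being written to the cache,
and mm/filemap.c waits for it roughly like this:

/* Paraphrased from mm/filemap.c: block until PG_private_2 is cleared. */
void folio_wait_private_2(struct folio *folio)
{
	while (folio_test_private_2(folio))
		folio_wait_bit(folio, PG_private_2);
}

So if the write-to-cache never completes, nfs_release_folio() would sit
here and nfsd threads would pile up in io_schedule(), which matches the
trace.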

On Thu, 23 Mar 2023 at 17:50, David Wysochanski <dwysocha@redhat.com> wrote:
> [...]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Linux-cachefs] [PATCH v11 0/5] Convert NFS with fscache to the netfs API
  2023-04-25 17:32   ` Chris Chilvers
@ 2023-04-25 17:58     ` David Wysochanski
  0 siblings, 0 replies; 13+ messages in thread
From: David Wysochanski @ 2023-04-25 17:58 UTC (permalink / raw)
  To: Chris Chilvers
  Cc: Anna Schumaker, Trond Myklebust, David Howells, Benjamin Maynard,
	Daire Byrne, linux-nfs, linux-cachefs

On Tue, Apr 25, 2023 at 1:32 PM Chris Chilvers <chilversc@gmail.com> wrote:
>
> We've been using NFS with FS-Cache to act as a caching proxy (re-export). When
> under load we've encountered an issue where all the nfsd processes seem to get
> stuck in I/O wait.
>
> [...]
>
Let me think about this a bit more, though at first glance it does not
ring a bell.

> I'm going to try to see if I can reproduce this on the latest versions of the
> patches with lockdep enabled. Though currently we've only seen this issue
> while the system is under a heavy production workload (rendering) after several
> days.
>

May I suggest using Anna's linux-next branch for further testing?
The v11 NFS netfs patches are in there, along with other patches for the
next merge window.
git://git.linux-nfs.org/projects/anna/linux-nfs.git

$ git log --oneline remotes/nfs-client-anna/linux-next | head --lines=20
e025f0a73f6a NFS: Cleanup unused rpc_clnt variable
c5733ae6dc89 NFS: set varaiable nfs_netfs_debug_id storage-class-specifier to static
691d0b782066 SUNRPC: remove the maximum number of retries in call_bind_status
ec108d3cc766 NFS: Convert readdir page array functions to use a folio
61f02e0ab81e NFS: Convert the readdir array-of-pages into an array-of-folios
3db63daabe21 NFSv3: handle out-of-order write replies.
03f5bd75a4c1 NFS: Remove fscache specific trace points and NFS_INO_FSCACHE bit
0631d5e02a1c NFS: Remove all NFSIOS_FSCACHE counters due to conversion to netfs API
000dbe0bec05 NFS: Convert buffered read paths to use netfs when fscache is enabled
88a4d7bdeec9 NFS: Configure support for netfs when NFS fscache is configured
01c3a40084a4 NFS: Rename readpage_async_filler to nfs_read_add_folio
703c6d03f165 sunrpc: simplify one-level sysctl registration for debug_table
32e356be32b6 sunrpc: move sunrpc_table and proc routines above
c946cb69f238 sunrpc: simplify one-level sysctl registration for xs_tunables_table
17c6d0ce8340 sunrpc: simplify one-level sysctl registration for xr_tunables_table
39724217447f nfs: simplify two-level sysctl registration for nfs_cb_sysctls
a2183160ca7e nfs: simplify two-level sysctl registration for nfs4_cb_sysctls
c1d889cf99b8 lockd: simplify two-level sysctl registration for nlm_sysctls
40882deb83c2 NFSv4.1: Always send a RECLAIM_COMPLETE after establishing lease
09a9639e56c0 Linux 6.3-rc6


> On Thu, 23 Mar 2023 at 17:50, David Wysochanski <dwysocha@redhat.com> wrote:
> >
> > On Mon, Feb 20, 2023 at 8:43 AM Dave Wysochanski <dwysocha@redhat.com> wrote:
> > >
> > > Trond, this v11 patchset addresses your latest feedback on patch #2,
> > > and I did a little bit of cleanup to patch 3 (see Changes notes below).
> > > I'm not sure of further changes to patch #3 without a more in-depth
> > > review with specifics, if you feel the current approach is unacceptable [1].
> > >
> >
> > Trond and/or Anna,
> >
> > Have you had a chance to review this patchset and do you have further
> > concerns?
> >
> > Note that patch #3 will require a small fixup to apply after v6.3-rc1
> > due to this commit:
> > 8bb7cd842c44 nfs: use bvec_set_page to initialize bvecs
> >
> > There is also still the small open issue of netfs counting read_bytes
> > in its unlock page path, but I view that as a netfs issue.  I'll followup
> > with David Howells on that.
> >
> >
> > > Ben and Daire, if you could test this set and provide you feedback
> > > and a Tested-By: that would be appreciated.  This set addresses
> > > the existing NFS + fscache performance concerns seen by a few
> > > users [5], which is due to utilization use of the deprecated
> > > single-page function, fscache_fallback_read_page().  However,
> > > until "known issue #1" below is also resolved, even with these
> > > patches, performance of NFS+fscache will still be a problem
> > > in some scenarios.
> > >
> > > This patchset converts NFS with fscache buffered read IO paths to
> > > use the netfs API with a non-invasive approach.  The existing NFS pgio
> > > layer does not need extensive changes, and is the best way so far I've
> > > found to address Trond's previous concerns about modifying the IO
> > > path [2] as well as only enabling netfs when fscache is configured
> > > and enabled [3].  I have not attempted performance comparisions to
> > > address Chuck Lever's concern [4] because we are not converting the
> > > non-fscache enabled NFS IO paths to netfs.
> > >
> > > The patchset is based on Trond's latest 'testing' branch which includes
> > > his folio patchset, and is based on 6.2-rc5.  It has been pushed to
> > > github at:
> > > https://github.com/DaveWysochanskiRH/kernel/commits/nfs-fscache-netfs
> > > https://github.com/DaveWysochanskiRH/kernel/commit/6424e4f139652b7552eff26eb5da1f2282d35616
> > >
> > > Changes since v10 [6]
> > > =====================
> > > PATCH6: Dropped
> > > PATCH1: Rename nfs_pageio_add_page to nfs_read_add_folio
> > > PATCH2: Use anonymous union to add struct netfs_inode to nfs_inode (Trond) [7]
> > > PATCH3: Change nfs_netfs_readpage_release() to nfs_netfs_folio_unlock()
> > >
> > > Testing
> > > =======
> > > I did a full round of testing on this because it was rebased on top of
> > > Trond's testing branch which included his folio series.
> > > All of my unit tests pass except the one with the known issue #1 below.
> > > Multiple runs of xfstests generic tests (applicable to NFS) were run with
> > > with various servers, both with and without fscache enabled, and
> > > compared to baseline (Trond's testing branch).  No new failures were
> > > observed with these patches, and in some xfstest instances, this
> > > patchset improves the results (some tests that were failing now pass).
> > > - hammerspace(pNFS flexfiles): NFS4.1, NFS4.2
> > > - NetApp(pNFS filelayout): NFS4.1, NFS4.0, NFS3
> > > - RHEL9: NFS4.2, NFS4.1, NFS4.0, NFS3
> > >
> > > Known issues
> > > ============
> > > 1. Unit test setting rsize < readahead does not properly read from
> > > fscache but re-reads data from the NFS server
> > > * This will be fixed with another dhowells patch [8]:
> > >   "[PATCH v6 2/2] mm, netfs, fscache: Stop read optimisation when folio removed from pagecache"
> > > * Daire Byrne verified the patch fixes his issue as well
> > >
> > > 2. "Cache volume key already in use" after xfstest runs involving multiple mounts
> > > * Simple reproducer requires just two mounts as follows:
> > >  mount -overs=4.1,fsc,nosharecache -o context=system_u:object_r:root_t:s0  nfs-server:/exp1 /mnt1
> > >  mount -overs=4.1,fsc,nosharecache -o context=system_u:object_r:root_t:s0  nfs-server:/exp2 /mnt2
> > > * This should be fixed with dhowells patch [9]:
> > >   "[PATCH v5] vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing"
> > >
> > >
> > > References
> > > ==========
> > > [1] https://lore.kernel.org/linux-nfs/0676ecb2bb708e6fc29dbbe6b44551d6a0d021dc.camel@kernel.org/
> > > [2] https://lore.kernel.org/linux-nfs/9cfd5bc3cfc6abc2d3316b0387222e708d67f595.camel@hammerspace.com/
> > > [3] https://lore.kernel.org/linux-nfs/da9200f1bded9b8b078a7aef227fd6b92eb028fb.camel@hammerspace.com/
> > > [4] https://lore.kernel.org/linux-nfs/0A640C47-5F51-47E8-864D-E0E980F8B310@oracle.com/
> > > [5] https://lore.kernel.org/linux-nfs/CA+QRt4tPqH87NVkoETLjxieGjZ_f7XxRj+xS3NVxcJ+b8AAKQg@mail.gmail.com/
> > > [6] https://lore.kernel.org/linux-nfs/20221103161637.1725471-1-dwysocha@redhat.com/
> > > [7] https://lore.kernel.org/linux-nfs/4d60636f62df4f5c200666ed2d1a5f2414c18e1f.camel@kernel.org/
> > > [8] https://lore.kernel.org/linux-nfs/20230216150701.3654894-1-dhowells@redhat.com/T/#mf3807fa68fb6d495b87dde0d76b5237833a0cc81
> > > [9] https://lore.kernel.org/linux-kernel/217595.1662033775@warthog.procyon.org.uk/
> > >
> > > Dave Wysochanski (5):
> > >   NFS: Rename readpage_async_filler to nfs_read_add_folio
> > >   NFS: Configure support for netfs when NFS fscache is configured
> > >   NFS: Convert buffered read paths to use netfs when fscache is enabled
> > >   NFS: Remove all NFSIOS_FSCACHE counters due to conversion to netfs API
> > >   NFS: Remove fscache specific trace points and NFS_INO_FSCACHE bit
> > >
> > >  fs/nfs/Kconfig             |   1 +
> > >  fs/nfs/fscache.c           | 242 ++++++++++++++++++++++---------------
> > >  fs/nfs/fscache.h           | 131 ++++++++++++++------
> > >  fs/nfs/inode.c             |   2 +
> > >  fs/nfs/internal.h          |   9 ++
> > >  fs/nfs/iostat.h            |  17 ---
> > >  fs/nfs/nfstrace.h          |  91 --------------
> > >  fs/nfs/pagelist.c          |   4 +
> > >  fs/nfs/read.c              | 105 ++++++++--------
> > >  fs/nfs/super.c             |  11 --
> > >  include/linux/nfs_fs.h     |  25 ++--
> > >  include/linux/nfs_iostat.h |  12 --
> > >  include/linux/nfs_page.h   |   3 +
> > >  include/linux/nfs_xdr.h    |   3 +
> > >  14 files changed, 317 insertions(+), 339 deletions(-)
> > >
> > > --
> > > 2.31.1
> > >
> > > --
> > > Linux-cachefs mailing list
> > > Linux-cachefs@redhat.com
> > > https://listman.redhat.com/mailman/listinfo/linux-cachefs
> > >
> >
>



end of thread

Thread overview: 13+ messages
2023-02-20 13:43 [PATCH v11 0/5] Convert NFS with fscache to the netfs API Dave Wysochanski
2023-02-20 13:43 ` [PATCH v11 1/5] NFS: Rename readpage_async_filler to nfs_read_add_folio Dave Wysochanski
2023-02-20 13:43 ` [PATCH v11 2/5] NFS: Configure support for netfs when NFS fscache is configured Dave Wysochanski
2023-02-20 13:43 ` [PATCH v11 3/5] NFS: Convert buffered read paths to use netfs when fscache is enabled Dave Wysochanski
2023-02-20 13:43 ` [PATCH v11 4/5] NFS: Remove all NFSIOS_FSCACHE counters due to conversion to netfs API Dave Wysochanski
2023-02-20 13:43 ` [PATCH v11 5/5] NFS: Remove fscache specific trace points and NFS_INO_FSCACHE bit Dave Wysochanski
2023-02-22 12:50 ` [PATCH v11 0/5] Convert NFS with fscache to the netfs API Daire Byrne
2023-02-22 15:57   ` David Wysochanski
2023-02-23 15:47     ` Daire Byrne
2023-03-09 23:50       ` David Wysochanski
2023-03-23 17:49 ` [Linux-cachefs] " David Wysochanski
2023-04-25 17:32   ` Chris Chilvers
2023-04-25 17:58     ` David Wysochanski
