* [RFC PATCH 0/3] Convert NFS to the new netfs API
@ 2022-08-24  9:34 Dave Wysochanski
  2022-08-24  9:34 ` [RFC PATCH 1/3] NFS: Rename readpage_async_filler to nfs_pageio_add_page Dave Wysochanski
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Dave Wysochanski @ 2022-08-24  9:34 UTC (permalink / raw)
  To: Anna Schumaker, Trond Myklebust, David Howells
  Cc: linux-nfs, linux-cachefs, Benjamin Maynard, Daire Byrne

This patchset converts the NFS non-direct READ IO paths to
unconditionally use the netfs API, taking a non-invasive approach.  The
existing NFS pgio layer does not need extensive changes, which is the
best way I have found so far to avoid Trond's earlier objection [1].  I
have not yet attempted performance comparisons to address Chuck Lever's
concern [2].

The main patch to be reviewed is patch #3 which converts nfs_read_folio
and nfs_readahead.

I do not really like this patchset, but so far it is what I have been
able to come up with to move things forward.  I think it is awkward,
still needs work, and has some known issues (below), though it is
fairly stable when running the xfstests generic group against various
servers.  The known issues are as follows:

1. Unit test setting rsize < readahead does not properly read from
fscache but re-reads data from the NFS server
* This will be fixed once another linux-cachefs patch [3],
"Stop read optimisation when folio removed from pagecache", is merged
2. "Cache volume key already in use" after xfstest runs
* xfstests (hammerspace with vers=4.2,fsc) shows the following on the
console after some tests:
"NFS: Cache volume key already in use (nfs,4.1,2,c50,cfe0100a,3,,,8000,100000,100000,bb8,ea60,7530,ea60,1)"
* This may be fixed with another patch [4] that is in progress

3. Occasional process IO hangs in read paths
* May be due to out-of-order RPC replies or something unique to NetApp
triggering a bug in the conversion patch
* xfstests (Example: generic/075 with netapp ontap9 vers=4.1)
* Sample backtrace
 #2 [ffff888104967990] io_schedule at ffffffffbc46969d
 #3 [ffff8881049679b0] folio_wait_bit_common at ffffffffbb40214b
 #4 [ffff888104967ad0] filemap_get_pages at ffffffffbb40688c
 #5 [ffff888104967c00] filemap_read at ffffffffbb406a61
 #6 [ffff888104967d80] nfs_file_read at ffffffffc0e0f415 [nfs]
 #7 [ffff888104967db0] vfs_read at ffffffffbb5681a5

4. Data corruption seen with unit test where rsize < readahead
* Seen with vanilla 6.0-rc2 (did not occur on 5.19); likely unrelated
to this patchset
 mount -o vers=4.2,fsc,rsize=8192 127.0.0.1:/export /mnt
 dd if=/dev/urandom of=/tmp/integrity-rsize-file1.bin bs=16k count=1
 ./nfs-readahead.sh set /mnt 16384
 dd if=/tmp/integrity-rsize-file1.bin of=/mnt/integrity-rsize-file1.bin bs=16k count=1
 echo 3 > /proc/sys/vm/drop_caches
 md5sum /mnt/integrity-rsize-file1.bin /tmp/integrity-rsize-file1.bin
 md5sums don't match, MD5_NFS = 00eaf1a5bc1b3dfd54711db551619afa != MD5_LOCAL = e8d835c83ba1f1264869dc40673fa20c


The patchset is based on 6.0-rc2 and has been pushed to github at:
https://github.com/DaveWysochanskiRH/kernel/commits/nfs-fscache-netfs

[1] https://lore.kernel.org/linux-nfs/9cfd5bc3cfc6abc2d3316b0387222e708d67f595.camel@hammerspace.com/
[2] https://marc.info/?l=linux-nfs&m=160597917525083&w=4
[3] https://www.mail-archive.com/linux-cachefs@redhat.com/msg03043.html
[4] https://marc.info/?l=linux-nfs&m=165962662200679&w=4

Dave Wysochanski (3):
  NFS: Rename readpage_async_filler to nfs_pageio_add_page
  NFS: Add support for netfs in struct nfs_inode and Kconfig
  NFS: Convert nfs_read_folio and nfs_readahead to netfs APIs

 fs/nfs/Kconfig           |   1 +
 fs/nfs/delegation.c      |   2 +-
 fs/nfs/dir.c             |   2 +-
 fs/nfs/fscache.c         | 191 ++++++++++++++++++---------------------
 fs/nfs/fscache.h         |  77 ++++++++--------
 fs/nfs/inode.c           |   8 +-
 fs/nfs/internal.h        |  10 +-
 fs/nfs/pagelist.c        |  14 +++
 fs/nfs/pnfs.c            |  12 +--
 fs/nfs/read.c            | 117 ++++++++----------------
 fs/nfs/write.c           |   2 +-
 include/linux/nfs_fs.h   |  19 +---
 include/linux/nfs_page.h |   1 +
 include/linux/nfs_xdr.h  |   1 +
 14 files changed, 210 insertions(+), 247 deletions(-)

-- 
2.31.1



* [RFC PATCH 1/3] NFS: Rename readpage_async_filler to nfs_pageio_add_page
  2022-08-24  9:34 [RFC PATCH 0/3] Convert NFS to the new netfs API Dave Wysochanski
@ 2022-08-24  9:34 ` Dave Wysochanski
  2022-08-24  9:35 ` [RFC PATCH 2/3] NFS: Add support for netfs in struct nfs_inode and Kconfig Dave Wysochanski
  2022-08-24  9:35 ` [RFC PATCH 3/3] NFS: Convert nfs_read_folio and nfs_readahead to netfs APIs Dave Wysochanski
  2 siblings, 0 replies; 16+ messages in thread
From: Dave Wysochanski @ 2022-08-24  9:34 UTC (permalink / raw)
  To: Anna Schumaker, Trond Myklebust, David Howells
  Cc: linux-nfs, linux-cachefs, Benjamin Maynard, Daire Byrne

Rename readpage_async_filler to nfs_pageio_add_page to
better reflect what this function does (add a page to
the nfs_pageio_descriptor), and simplify its arguments
by removing struct nfs_readdesc.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
---
 fs/nfs/read.c | 60 +++++++++++++++++++++++++--------------------------
 1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 8ae2c8d1219d..525e82ea9a9e 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -127,11 +127,6 @@ static void nfs_readpage_release(struct nfs_page *req, int error)
 	nfs_release_request(req);
 }
 
-struct nfs_readdesc {
-	struct nfs_pageio_descriptor pgio;
-	struct nfs_open_context *ctx;
-};
-
 static void nfs_page_group_set_uptodate(struct nfs_page *req)
 {
 	if (nfs_page_group_sync_on_bit(req, PG_UPTODATE))
@@ -153,7 +148,8 @@ static void nfs_read_completion(struct nfs_pgio_header *hdr)
 
 		if (test_bit(NFS_IOHDR_EOF, &hdr->flags)) {
 			/* note: regions of the page not covered by a
-			 * request are zeroed in readpage_async_filler */
+			 * request are zeroed in nfs_pageio_add_page
+			 */
 			if (bytes > hdr->good_bytes) {
 				/* nothing in this request was good, so zero
 				 * the full extent of the request */
@@ -281,8 +277,10 @@ static void nfs_readpage_result(struct rpc_task *task,
 		nfs_readpage_retry(task, hdr);
 }
 
-static int
-readpage_async_filler(struct nfs_readdesc *desc, struct page *page)
+int
+nfs_pageio_add_page(struct nfs_pageio_descriptor *pgio,
+		    struct nfs_open_context *ctx,
+		    struct page *page)
 {
 	struct inode *inode = page_file_mapping(page)->host;
 	unsigned int rsize = NFS_SERVER(inode)->rsize;
@@ -302,15 +300,15 @@ readpage_async_filler(struct nfs_readdesc *desc, struct page *page)
 			goto out_unlock;
 	}
 
-	new = nfs_create_request(desc->ctx, page, 0, aligned_len);
+	new = nfs_create_request(ctx, page, 0, aligned_len);
 	if (IS_ERR(new))
 		goto out_error;
 
 	if (len < PAGE_SIZE)
 		zero_user_segment(page, len, PAGE_SIZE);
-	if (!nfs_pageio_add_request(&desc->pgio, new)) {
+	if (!nfs_pageio_add_request(pgio, new)) {
 		nfs_list_remove_request(new);
-		error = desc->pgio.pg_error;
+		error = pgio->pg_error;
 		nfs_readpage_release(new, error);
 		goto out;
 	}
@@ -332,7 +330,8 @@ readpage_async_filler(struct nfs_readdesc *desc, struct page *page)
 int nfs_read_folio(struct file *file, struct folio *folio)
 {
 	struct page *page = &folio->page;
-	struct nfs_readdesc desc;
+	struct nfs_pageio_descriptor pgio;
+	struct nfs_open_context *ctx;
 	struct inode *inode = page_file_mapping(page)->host;
 	int ret;
 
@@ -358,29 +357,29 @@ int nfs_read_folio(struct file *file, struct folio *folio)
 
 	if (file == NULL) {
 		ret = -EBADF;
-		desc.ctx = nfs_find_open_context(inode, NULL, FMODE_READ);
-		if (desc.ctx == NULL)
+		ctx = nfs_find_open_context(inode, NULL, FMODE_READ);
+		if (ctx == NULL)
 			goto out_unlock;
 	} else
-		desc.ctx = get_nfs_open_context(nfs_file_open_context(file));
+		ctx = get_nfs_open_context(nfs_file_open_context(file));
 
-	xchg(&desc.ctx->error, 0);
-	nfs_pageio_init_read(&desc.pgio, inode, false,
+	xchg(&ctx->error, 0);
+	nfs_pageio_init_read(&pgio, inode, false,
 			     &nfs_async_read_completion_ops);
 
-	ret = readpage_async_filler(&desc, page);
+	ret = nfs_pageio_add_page(&pgio, ctx, page);
 	if (ret)
 		goto out;
 
-	nfs_pageio_complete_read(&desc.pgio);
-	ret = desc.pgio.pg_error < 0 ? desc.pgio.pg_error : 0;
+	nfs_pageio_complete_read(&pgio);
+	ret = pgio.pg_error < 0 ? pgio.pg_error : 0;
 	if (!ret) {
 		ret = wait_on_page_locked_killable(page);
 		if (!PageUptodate(page) && !ret)
-			ret = xchg(&desc.ctx->error, 0);
+			ret = xchg(&ctx->error, 0);
 	}
 out:
-	put_nfs_open_context(desc.ctx);
+	put_nfs_open_context(ctx);
 	trace_nfs_aop_readpage_done(inode, page, ret);
 	return ret;
 out_unlock:
@@ -391,9 +390,10 @@ int nfs_read_folio(struct file *file, struct folio *folio)
 
 void nfs_readahead(struct readahead_control *ractl)
 {
+	struct nfs_pageio_descriptor pgio;
+	struct nfs_open_context *ctx;
 	unsigned int nr_pages = readahead_count(ractl);
 	struct file *file = ractl->file;
-	struct nfs_readdesc desc;
 	struct inode *inode = ractl->mapping->host;
 	struct page *page;
 	int ret;
@@ -407,25 +407,25 @@ void nfs_readahead(struct readahead_control *ractl)
 
 	if (file == NULL) {
 		ret = -EBADF;
-		desc.ctx = nfs_find_open_context(inode, NULL, FMODE_READ);
-		if (desc.ctx == NULL)
+		ctx = nfs_find_open_context(inode, NULL, FMODE_READ);
+		if (ctx == NULL)
 			goto out;
 	} else
-		desc.ctx = get_nfs_open_context(nfs_file_open_context(file));
+		ctx = get_nfs_open_context(nfs_file_open_context(file));
 
-	nfs_pageio_init_read(&desc.pgio, inode, false,
+	nfs_pageio_init_read(&pgio, inode, false,
 			     &nfs_async_read_completion_ops);
 
 	while ((page = readahead_page(ractl)) != NULL) {
-		ret = readpage_async_filler(&desc, page);
+		ret = nfs_pageio_add_page(&pgio, ctx, page);
 		put_page(page);
 		if (ret)
 			break;
 	}
 
-	nfs_pageio_complete_read(&desc.pgio);
+	nfs_pageio_complete_read(&pgio);
 
-	put_nfs_open_context(desc.ctx);
+	put_nfs_open_context(ctx);
 out:
 	trace_nfs_aop_readahead_done(inode, nr_pages, ret);
 }
-- 
2.31.1



* [RFC PATCH 2/3] NFS: Add support for netfs in struct nfs_inode and Kconfig
  2022-08-24  9:34 [RFC PATCH 0/3] Convert NFS to the new netfs API Dave Wysochanski
  2022-08-24  9:34 ` [RFC PATCH 1/3] NFS: Rename readpage_async_filler to nfs_pageio_add_page Dave Wysochanski
@ 2022-08-24  9:35 ` Dave Wysochanski
  2022-08-24 12:42   ` Trond Myklebust
  2022-08-24  9:35 ` [RFC PATCH 3/3] NFS: Convert nfs_read_folio and nfs_readahead to netfs APIs Dave Wysochanski
  2 siblings, 1 reply; 16+ messages in thread
From: Dave Wysochanski @ 2022-08-24  9:35 UTC (permalink / raw)
  To: Anna Schumaker, Trond Myklebust, David Howells
  Cc: linux-nfs, linux-cachefs, Benjamin Maynard, Daire Byrne

As a first step toward supporting the netfs library, add NETFS_SUPPORT
to Kconfig and add the required struct netfs_inode to struct nfs_inode.
The netfs_inode is now where the vfs_inode is stored, as well as the
fscache_cookie.  In addition, use the netfs_inode() and netfs_i_cookie()
helpers, and remove our own helper, nfs_i_fscache().

Later patches will enable netfs by defining an NFS-specific version of
struct netfs_request_ops and calling netfs_inode_init().

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
---
 fs/nfs/Kconfig         |  1 +
 fs/nfs/delegation.c    |  2 +-
 fs/nfs/dir.c           |  2 +-
 fs/nfs/fscache.c       | 20 +++++++++-----------
 fs/nfs/fscache.h       | 15 ++++++---------
 fs/nfs/inode.c         |  6 +++---
 fs/nfs/internal.h      |  2 +-
 fs/nfs/pnfs.c          | 12 ++++++------
 fs/nfs/write.c         |  2 +-
 include/linux/nfs_fs.h | 19 +++++--------------
 10 files changed, 34 insertions(+), 47 deletions(-)

diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
index 14a72224b657..79b241bed762 100644
--- a/fs/nfs/Kconfig
+++ b/fs/nfs/Kconfig
@@ -5,6 +5,7 @@ config NFS_FS
 	select LOCKD
 	select SUNRPC
 	select NFS_ACL_SUPPORT if NFS_V3_ACL
+	select NETFS_SUPPORT
 	help
 	  Choose Y here if you want to access files residing on other
 	  computers using Sun's Network File System protocol.  To compile
diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index 5c97cad741a7..b80d6ae1ead6 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -306,7 +306,7 @@ nfs_start_delegation_return_locked(struct nfs_inode *nfsi)
 	}
 	spin_unlock(&delegation->lock);
 	if (ret)
-		nfs_clear_verifier_delegated(&nfsi->vfs_inode);
+		nfs_clear_verifier_delegated(&nfsi->netfs.inode);
 out:
 	return ret;
 }
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index dbab3caa15ed..e76d6d05e0e1 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2796,7 +2796,7 @@ nfs_do_access_cache_scan(unsigned int nr_to_scan)
 
 		if (nr_to_scan-- == 0)
 			break;
-		inode = &nfsi->vfs_inode;
+		inode = &nfsi->netfs.inode;
 		spin_lock(&inode->i_lock);
 		if (list_empty(&nfsi->access_cache_entry_lru))
 			goto remove_lru_entry;
diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index e861d7bae305..a6fc1c8b6644 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -163,13 +163,14 @@ void nfs_fscache_init_inode(struct inode *inode)
 	struct nfs_server *nfss = NFS_SERVER(inode);
 	struct nfs_inode *nfsi = NFS_I(inode);
 
-	nfsi->fscache = NULL;
+	netfs_inode(inode)->cache = NULL;
 	if (!(nfss->fscache && S_ISREG(inode->i_mode)))
 		return;
 
 	nfs_fscache_update_auxdata(&auxdata, inode);
 
-	nfsi->fscache = fscache_acquire_cookie(NFS_SB(inode->i_sb)->fscache,
+	netfs_inode(inode)->cache = fscache_acquire_cookie(
+					       nfss->fscache,
 					       0,
 					       nfsi->fh.data, /* index_key */
 					       nfsi->fh.size,
@@ -183,11 +184,8 @@ void nfs_fscache_init_inode(struct inode *inode)
  */
 void nfs_fscache_clear_inode(struct inode *inode)
 {
-	struct nfs_inode *nfsi = NFS_I(inode);
-	struct fscache_cookie *cookie = nfs_i_fscache(inode);
-
-	fscache_relinquish_cookie(cookie, false);
-	nfsi->fscache = NULL;
+	fscache_relinquish_cookie(netfs_i_cookie(&NFS_I(inode)->netfs), false);
+	netfs_inode(inode)->cache = NULL;
 }
 
 /*
@@ -212,7 +210,7 @@ void nfs_fscache_clear_inode(struct inode *inode)
 void nfs_fscache_open_file(struct inode *inode, struct file *filp)
 {
 	struct nfs_fscache_inode_auxdata auxdata;
-	struct fscache_cookie *cookie = nfs_i_fscache(inode);
+	struct fscache_cookie *cookie = netfs_i_cookie(&NFS_I(inode)->netfs);
 	bool open_for_write = inode_is_open_for_write(inode);
 
 	if (!fscache_cookie_valid(cookie))
@@ -230,7 +228,7 @@ EXPORT_SYMBOL_GPL(nfs_fscache_open_file);
 void nfs_fscache_release_file(struct inode *inode, struct file *filp)
 {
 	struct nfs_fscache_inode_auxdata auxdata;
-	struct fscache_cookie *cookie = nfs_i_fscache(inode);
+	struct fscache_cookie *cookie = netfs_i_cookie(&NFS_I(inode)->netfs);
 	loff_t i_size = i_size_read(inode);
 
 	nfs_fscache_update_auxdata(&auxdata, inode);
@@ -243,7 +241,7 @@ void nfs_fscache_release_file(struct inode *inode, struct file *filp)
 static int fscache_fallback_read_page(struct inode *inode, struct page *page)
 {
 	struct netfs_cache_resources cres;
-	struct fscache_cookie *cookie = nfs_i_fscache(inode);
+	struct fscache_cookie *cookie = netfs_i_cookie(&NFS_I(inode)->netfs);
 	struct iov_iter iter;
 	struct bio_vec bvec[1];
 	int ret;
@@ -271,7 +269,7 @@ static int fscache_fallback_write_page(struct inode *inode, struct page *page,
 				       bool no_space_allocated_yet)
 {
 	struct netfs_cache_resources cres;
-	struct fscache_cookie *cookie = nfs_i_fscache(inode);
+	struct fscache_cookie *cookie = netfs_i_cookie(&NFS_I(inode)->netfs);
 	struct iov_iter iter;
 	struct bio_vec bvec[1];
 	loff_t start = page_offset(page);
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 2a37af880978..38614ed8f951 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -54,7 +54,7 @@ static inline bool nfs_fscache_release_folio(struct folio *folio, gfp_t gfp)
 		if (current_is_kswapd() || !(gfp & __GFP_FS))
 			return false;
 		folio_wait_fscache(folio);
-		fscache_note_page_release(nfs_i_fscache(folio->mapping->host));
+		fscache_note_page_release(netfs_i_cookie(&NFS_I(folio->mapping->host)->netfs));
 		nfs_inc_fscache_stats(folio->mapping->host,
 				      NFSIOS_FSCACHE_PAGES_UNCACHED);
 	}
@@ -66,7 +66,7 @@ static inline bool nfs_fscache_release_folio(struct folio *folio, gfp_t gfp)
  */
 static inline int nfs_fscache_read_page(struct inode *inode, struct page *page)
 {
-	if (nfs_i_fscache(inode))
+	if (netfs_inode(inode)->cache)
 		return __nfs_fscache_read_page(inode, page);
 	return -ENOBUFS;
 }
@@ -78,7 +78,7 @@ static inline int nfs_fscache_read_page(struct inode *inode, struct page *page)
 static inline void nfs_fscache_write_page(struct inode *inode,
 					   struct page *page)
 {
-	if (nfs_i_fscache(inode))
+	if (netfs_inode(inode)->cache)
 		__nfs_fscache_write_page(inode, page);
 }
 
@@ -101,13 +101,10 @@ static inline void nfs_fscache_update_auxdata(struct nfs_fscache_inode_auxdata *
 static inline void nfs_fscache_invalidate(struct inode *inode, int flags)
 {
 	struct nfs_fscache_inode_auxdata auxdata;
-	struct nfs_inode *nfsi = NFS_I(inode);
+	struct fscache_cookie *cookie =  netfs_i_cookie(&NFS_I(inode)->netfs);
 
-	if (nfsi->fscache) {
-		nfs_fscache_update_auxdata(&auxdata, inode);
-		fscache_invalidate(nfsi->fscache, &auxdata,
-				   i_size_read(inode), flags);
-	}
+	nfs_fscache_update_auxdata(&auxdata, inode);
+	fscache_invalidate(cookie, &auxdata, i_size_read(inode), flags);
 }
 
 /*
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index b4e46b0ffa2d..3acddec41ba7 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1410,7 +1410,7 @@ int nfs_revalidate_mapping(struct inode *inode, struct address_space *mapping)
 
 static bool nfs_file_has_writers(struct nfs_inode *nfsi)
 {
-	struct inode *inode = &nfsi->vfs_inode;
+	struct inode *inode = &nfsi->netfs.inode;
 
 	if (!S_ISREG(inode->i_mode))
 		return false;
@@ -2248,7 +2248,7 @@ struct inode *nfs_alloc_inode(struct super_block *sb)
 #ifdef CONFIG_NFS_V4_2
 	nfsi->xattr_cache = NULL;
 #endif
-	return &nfsi->vfs_inode;
+	return &nfsi->netfs.inode;
 }
 EXPORT_SYMBOL_GPL(nfs_alloc_inode);
 
@@ -2272,7 +2272,7 @@ static void init_once(void *foo)
 {
 	struct nfs_inode *nfsi = (struct nfs_inode *) foo;
 
-	inode_init_once(&nfsi->vfs_inode);
+	inode_init_once(&nfsi->netfs.inode);
 	INIT_LIST_HEAD(&nfsi->open_files);
 	INIT_LIST_HEAD(&nfsi->access_cache_entry_lru);
 	INIT_LIST_HEAD(&nfsi->access_cache_inode_lru);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 27c720d71b4e..29cfee4660c5 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -355,7 +355,7 @@ nfs4_label_copy(struct nfs4_label *dst, struct nfs4_label *src)
 
 static inline void nfs_zap_label_cache_locked(struct nfs_inode *nfsi)
 {
-	if (nfs_server_capable(&nfsi->vfs_inode, NFS_CAP_SECURITY_LABEL))
+	if (nfs_server_capable(&nfsi->netfs.inode, NFS_CAP_SECURITY_LABEL))
 		nfsi->cache_validity |= NFS_INO_INVALID_LABEL;
 }
 #else
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 41a9b6b58fb9..4087de475829 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -763,19 +763,19 @@ static struct pnfs_layout_hdr *__pnfs_destroy_layout(struct nfs_inode *nfsi)
 	struct pnfs_layout_hdr *lo;
 	LIST_HEAD(tmp_list);
 
-	spin_lock(&nfsi->vfs_inode.i_lock);
+	spin_lock(&nfsi->netfs.inode.i_lock);
 	lo = nfsi->layout;
 	if (lo) {
 		pnfs_get_layout_hdr(lo);
 		pnfs_mark_layout_stateid_invalid(lo, &tmp_list);
 		pnfs_layout_clear_fail_bit(lo, NFS_LAYOUT_RO_FAILED);
 		pnfs_layout_clear_fail_bit(lo, NFS_LAYOUT_RW_FAILED);
-		spin_unlock(&nfsi->vfs_inode.i_lock);
+		spin_unlock(&nfsi->netfs.inode.i_lock);
 		pnfs_free_lseg_list(&tmp_list);
-		nfs_commit_inode(&nfsi->vfs_inode, 0);
+		nfs_commit_inode(&nfsi->netfs.inode, 0);
 		pnfs_put_layout_hdr(lo);
 	} else
-		spin_unlock(&nfsi->vfs_inode.i_lock);
+		spin_unlock(&nfsi->netfs.inode.i_lock);
 	return lo;
 }
 
@@ -790,9 +790,9 @@ static bool pnfs_layout_removed(struct nfs_inode *nfsi,
 {
 	bool ret;
 
-	spin_lock(&nfsi->vfs_inode.i_lock);
+	spin_lock(&nfsi->netfs.inode.i_lock);
 	ret = nfsi->layout != lo;
-	spin_unlock(&nfsi->vfs_inode.i_lock);
+	spin_unlock(&nfsi->netfs.inode.i_lock);
 	return ret;
 }
 
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 51a7e202d6e5..6a8bf34662b9 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -828,7 +828,7 @@ nfs_page_search_commits_for_head_request_locked(struct nfs_inode *nfsi,
 {
 	struct nfs_page *freq, *t;
 	struct nfs_commit_info cinfo;
-	struct inode *inode = &nfsi->vfs_inode;
+	struct inode *inode = &nfsi->netfs.inode;
 
 	nfs_init_cinfo_from_inode(&cinfo, inode);
 
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index b32ed68e7dc4..1df6c48abc7c 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -31,6 +31,8 @@
 #include <linux/sunrpc/auth.h>
 #include <linux/sunrpc/clnt.h>
 
+#include <linux/netfs.h>
+
 #include <linux/nfs.h>
 #include <linux/nfs2.h>
 #include <linux/nfs3.h>
@@ -122,6 +124,8 @@ struct nfs4_xattr_cache;
  * nfs fs inode data in memory
  */
 struct nfs_inode {
+	struct netfs_inode netfs; /* Netfslib context and vfs inode */
+
 	/*
 	 * The 64bit 'inode number'
 	 */
@@ -203,10 +207,6 @@ struct nfs_inode {
 	/* how many bytes have been written/read and how many bytes queued up */
 	__u64 write_io;
 	__u64 read_io;
-#ifdef CONFIG_NFS_FSCACHE
-	struct fscache_cookie	*fscache;
-#endif
-	struct inode		vfs_inode;
 
 #ifdef CONFIG_NFS_V4_2
 	struct nfs4_xattr_cache *xattr_cache;
@@ -283,7 +283,7 @@ struct nfs4_copy_state {
 
 static inline struct nfs_inode *NFS_I(const struct inode *inode)
 {
-	return container_of(inode, struct nfs_inode, vfs_inode);
+	return container_of(inode, struct nfs_inode, netfs.inode);
 }
 
 static inline struct nfs_server *NFS_SB(const struct super_block *s)
@@ -328,15 +328,6 @@ static inline int NFS_STALE(const struct inode *inode)
 	return test_bit(NFS_INO_STALE, &NFS_I(inode)->flags);
 }
 
-static inline struct fscache_cookie *nfs_i_fscache(struct inode *inode)
-{
-#ifdef CONFIG_NFS_FSCACHE
-	return NFS_I(inode)->fscache;
-#else
-	return NULL;
-#endif
-}
-
 static inline __u64 NFS_FILEID(const struct inode *inode)
 {
 	return NFS_I(inode)->fileid;
-- 
2.31.1



* [RFC PATCH 3/3] NFS: Convert nfs_read_folio and nfs_readahead to netfs APIs
  2022-08-24  9:34 [RFC PATCH 0/3] Convert NFS to the new netfs API Dave Wysochanski
  2022-08-24  9:34 ` [RFC PATCH 1/3] NFS: Rename readpage_async_filler to nfs_pageio_add_page Dave Wysochanski
  2022-08-24  9:35 ` [RFC PATCH 2/3] NFS: Add support for netfs in struct nfs_inode and Kconfig Dave Wysochanski
@ 2022-08-24  9:35 ` Dave Wysochanski
  2 siblings, 0 replies; 16+ messages in thread
From: Dave Wysochanski @ 2022-08-24  9:35 UTC (permalink / raw)
  To: Anna Schumaker, Trond Myklebust, David Howells
  Cc: linux-nfs, linux-cachefs, Benjamin Maynard, Daire Byrne

Convert the NFS non-direct read code paths to the corresponding netfs
APIs.  The netfs API defines struct netfs_request_ops, which must be
filled in by the network filesystem.  For NFS, we only need to define
four of the functions, the main one being issue_read().  The
issue_read() function is called by the netfs layer when a read cannot
be fulfilled locally and must be sent to the server (either the cache
is not active, or it is active but the data is not available).  Once
the read from the server is complete, netfs requires a call to
netfs_subreq_terminated(), which conveys either how many bytes were
read successfully or an error.  Note that issue_read() is called with a
struct netfs_io_subrequest, which defines the requested IO with a start
and a length (both in bytes), and assumes the underlying netfs will
return either an error for the whole region or the number of bytes
successfully read (a short read).

The NFS pgio API layer is page based, and there is no way for the caller
of these APIs to know how many RPCs will be sent or how the pages will
be broken up into underlying RPCs, each of which will have its own
return code.  Thus, NFS needs some way to accommodate the netfs API's
requirement of a single response to the whole request, while also
minimizing disruptive changes to the NFS pgio layer.  The approach taken
in this patch is to allocate a small structure for each call to
nfs_issue_read() to keep accounting information for the outstanding
RPCs, as well as the final error value or the number of bytes
successfully read.  The accounting data is updated inside the pgio
layer, when an nfs_pgio_header contains a valid pointer to this data.
In the NFS read completion routine, nfs_read_completion(), the final
error value and bytes read are updated, and the accounting data is used
to determine whether this completion represents the last RPC.  If it
is, netfs_subreq_terminated() is called with the final error value or
the number of bytes transferred.

Note that the final disposition of a netfs_io_subrequest is given by
the transferred_or_error parameter to netfs_subreq_terminated(), which
conveys either the number of bytes transferred (all RPCs were
successful) or an error (one or more RPCs completed with an error).  In
the error case, transferred_or_error is set to the error from the RPC
that completed last.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
---
 fs/nfs/fscache.c         | 175 ++++++++++++++++++---------------------
 fs/nfs/fscache.h         |  68 ++++++++-------
 fs/nfs/inode.c           |   2 +
 fs/nfs/internal.h        |   8 ++
 fs/nfs/pagelist.c        |  14 ++++
 fs/nfs/read.c            |  99 +++++++---------------
 include/linux/nfs_page.h |   1 +
 include/linux/nfs_xdr.h  |   1 +
 8 files changed, 172 insertions(+), 196 deletions(-)

diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index a6fc1c8b6644..8e1f5c2ed8f2 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -15,6 +15,9 @@
 #include <linux/seq_file.h>
 #include <linux/slab.h>
 #include <linux/iversion.h>
+#include <linux/xarray.h>
+#include <linux/fscache.h>
+#include <linux/netfs.h>
 
 #include "internal.h"
 #include "iostat.h"
@@ -235,112 +238,96 @@ void nfs_fscache_release_file(struct inode *inode, struct file *filp)
 	fscache_unuse_cookie(cookie, &auxdata, &i_size);
 }
 
-/*
- * Fallback page reading interface.
- */
-static int fscache_fallback_read_page(struct inode *inode, struct page *page)
+
+atomic_t nfs_netfs_debug_id;
+static int nfs_netfs_init_request(struct netfs_io_request *rreq, struct file *file)
 {
-	struct netfs_cache_resources cres;
-	struct fscache_cookie *cookie = netfs_i_cookie(&NFS_I(inode)->netfs);
-	struct iov_iter iter;
-	struct bio_vec bvec[1];
-	int ret;
-
-	memset(&cres, 0, sizeof(cres));
-	bvec[0].bv_page		= page;
-	bvec[0].bv_offset	= 0;
-	bvec[0].bv_len		= PAGE_SIZE;
-	iov_iter_bvec(&iter, READ, bvec, ARRAY_SIZE(bvec), PAGE_SIZE);
-
-	ret = fscache_begin_read_operation(&cres, cookie);
-	if (ret < 0)
-		return ret;
-
-	ret = fscache_read(&cres, page_offset(page), &iter, NETFS_READ_HOLE_FAIL,
-			   NULL, NULL);
-	fscache_end_operation(&cres);
-	return ret;
+	struct nfs_open_context *ctx;
+
+	if (file == NULL) {
+		ctx = nfs_find_open_context(rreq->inode, NULL, FMODE_READ);
+		if (!ctx)
+			return -ENOMEM;
+	} else
+		ctx = get_nfs_open_context(nfs_file_open_context(file));
+
+	rreq->netfs_priv = ctx;
+
+	if (netfs_i_cookie(&NFS_I(rreq->inode)->netfs))
+		rreq->debug_id = atomic_inc_return(&nfs_netfs_debug_id);
+
+	return 0;
 }
 
-/*
- * Fallback page writing interface.
- */
-static int fscache_fallback_write_page(struct inode *inode, struct page *page,
-				       bool no_space_allocated_yet)
+static void nfs_netfs_free_request(struct netfs_io_request *rreq)
 {
-	struct netfs_cache_resources cres;
-	struct fscache_cookie *cookie = netfs_i_cookie(&NFS_I(inode)->netfs);
-	struct iov_iter iter;
-	struct bio_vec bvec[1];
-	loff_t start = page_offset(page);
-	size_t len = PAGE_SIZE;
-	int ret;
-
-	memset(&cres, 0, sizeof(cres));
-	bvec[0].bv_page		= page;
-	bvec[0].bv_offset	= 0;
-	bvec[0].bv_len		= PAGE_SIZE;
-	iov_iter_bvec(&iter, WRITE, bvec, ARRAY_SIZE(bvec), PAGE_SIZE);
-
-	ret = fscache_begin_write_operation(&cres, cookie);
-	if (ret < 0)
-		return ret;
-
-	ret = cres.ops->prepare_write(&cres, &start, &len, i_size_read(inode),
-				      no_space_allocated_yet);
-	if (ret == 0)
-		ret = fscache_write(&cres, page_offset(page), &iter, NULL, NULL);
-	fscache_end_operation(&cres);
-	return ret;
+	put_nfs_open_context(rreq->netfs_priv);
 }
 
-/*
- * Retrieve a page from fscache
- */
-int __nfs_fscache_read_page(struct inode *inode, struct page *page)
+static inline int nfs_begin_cache_operation(struct netfs_io_request *rreq)
 {
-	int ret;
-
-	trace_nfs_fscache_read_page(inode, page);
-	if (PageChecked(page)) {
-		ClearPageChecked(page);
-		ret = 1;
-		goto out;
-	}
-
-	ret = fscache_fallback_read_page(inode, page);
-	if (ret < 0) {
-		nfs_inc_fscache_stats(inode, NFSIOS_FSCACHE_PAGES_READ_FAIL);
-		SetPageChecked(page);
-		goto out;
-	}
-
-	/* Read completed synchronously */
-	nfs_inc_fscache_stats(inode, NFSIOS_FSCACHE_PAGES_READ_OK);
-	SetPageUptodate(page);
-	ret = 0;
-out:
-	trace_nfs_fscache_read_page_exit(inode, page, ret);
-	return ret;
+	return fscache_begin_read_operation(&rreq->cache_resources,
+					    netfs_i_cookie(&NFS_I(rreq->inode)->netfs));
 }
 
-/*
- * Store a newly fetched page in fscache.  We can be certain there's no page
- * stored in the cache as yet otherwise we would've read it from there.
- */
-void __nfs_fscache_write_page(struct inode *inode, struct page *page)
+static struct nfs_netfs_io_data *nfs_netfs_alloc(struct netfs_io_subrequest *sreq)
 {
-	int ret;
+	struct nfs_netfs_io_data *netfs;
+
+	netfs = kzalloc(sizeof(*netfs), GFP_KERNEL_ACCOUNT);
+	if (!netfs)
+		return NULL;
+	netfs->sreq = sreq;
+	refcount_set(&netfs->refcount, 1);
+	return netfs;
+}
 
-	trace_nfs_fscache_write_page(inode, page);
+static bool nfs_netfs_clamp_length(struct netfs_io_subrequest *sreq)
+{
+	size_t	rsize = NFS_SB(sreq->rreq->inode->i_sb)->rsize;
 
-	ret = fscache_fallback_write_page(inode, page, true);
+	sreq->len = min(sreq->len, rsize);
+	return true;
+}
 
-	if (ret != 0) {
-		nfs_inc_fscache_stats(inode, NFSIOS_FSCACHE_PAGES_WRITTEN_FAIL);
-		nfs_inc_fscache_stats(inode, NFSIOS_FSCACHE_PAGES_UNCACHED);
-	} else {
-		nfs_inc_fscache_stats(inode, NFSIOS_FSCACHE_PAGES_WRITTEN_OK);
+static void nfs_issue_read(struct netfs_io_subrequest *sreq)
+{
+	struct nfs_pageio_descriptor pgio;
+	struct inode *inode = sreq->rreq->inode;
+	struct nfs_open_context *ctx = sreq->rreq->netfs_priv;
+	struct page *page;
+	int err;
+	pgoff_t start = (sreq->start + sreq->transferred) >> PAGE_SHIFT;
+	pgoff_t last = ((sreq->start + sreq->len -
+			 sreq->transferred - 1) >> PAGE_SHIFT);
+	XA_STATE(xas, &sreq->rreq->mapping->i_pages, start);
+
+	nfs_pageio_init_read(&pgio, inode, false,
+			     &nfs_async_read_completion_ops);
+
+	pgio.pg_netfs = nfs_netfs_alloc(sreq); /* used in completion */
+	if (!pgio.pg_netfs)
+		return netfs_subreq_terminated(sreq, -ENOMEM, false);
+
+	xas_lock(&xas);
+	xas_for_each(&xas, page, last) {
+		/* nfs_pageio_add_page() may schedule() due to pNFS layout and other RPCs  */
+		xas_pause(&xas);
+		xas_unlock(&xas);
+		err = nfs_pageio_add_page(&pgio, ctx, page);
+		if (err < 0)
+			return netfs_subreq_terminated(sreq, err, false);
+		xas_lock(&xas);
 	}
-	trace_nfs_fscache_write_page_exit(inode, page, ret);
+	xas_unlock(&xas);
+	nfs_pageio_complete_read(&pgio);
+	nfs_netfs_put(pgio.pg_netfs);
 }
+
+const struct netfs_request_ops nfs_netfs_ops = {
+	.init_request		= nfs_netfs_init_request,
+	.free_request		= nfs_netfs_free_request,
+	.begin_cache_operation	= nfs_begin_cache_operation,
+	.issue_read		= nfs_issue_read,
+	.clamp_length		= nfs_netfs_clamp_length
+};
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 38614ed8f951..90a20ce9c432 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -34,6 +34,41 @@ struct nfs_fscache_inode_auxdata {
 	u64	change_attr;
 };
 
+struct nfs_netfs_io_data {
+	refcount_t			refcount;
+	struct netfs_io_subrequest	*sreq;
+
+	/*
+	 * NFS may split a netfs_io_subrequest into multiple RPCs, each
+	 * with its own read completion.  In netfs, we can only call
+	 * netfs_subreq_terminated() once for each subrequest.  So we
+	 * must keep track of the rpcs and rpc_byte_count for what has
+	 * been submitted, and only call netfs via netfs_subreq_terminated()
+	 * when the final RPC has completed.
+	 */
+	atomic_t	rpcs;
+	unsigned long	rpc_byte_count;
+
+	/*
+	 * Final disposition of the netfs_io_subrequest, sent in
+	 * netfs_subreq_terminated()
+	 */
+	ssize_t		transferred;
+	int		error;
+};
+
+static inline void nfs_netfs_get(struct nfs_netfs_io_data *netfs)
+{
+	refcount_inc(&netfs->refcount);
+}
+
+static inline void nfs_netfs_put(struct nfs_netfs_io_data *netfs)
+{
+	if (!refcount_dec_and_test(&netfs->refcount))
+		return;
+	kfree(netfs);
+}
+
 /*
  * fscache.c
  */
@@ -45,43 +80,17 @@ extern void nfs_fscache_clear_inode(struct inode *);
 extern void nfs_fscache_open_file(struct inode *, struct file *);
 extern void nfs_fscache_release_file(struct inode *, struct file *);
 
-extern int __nfs_fscache_read_page(struct inode *, struct page *);
-extern void __nfs_fscache_write_page(struct inode *, struct page *);
-
 static inline bool nfs_fscache_release_folio(struct folio *folio, gfp_t gfp)
 {
 	if (folio_test_fscache(folio)) {
 		if (current_is_kswapd() || !(gfp & __GFP_FS))
 			return false;
 		folio_wait_fscache(folio);
-		fscache_note_page_release(netfs_i_cookie(&NFS_I(folio->mapping->host)->netfs));
-		nfs_inc_fscache_stats(folio->mapping->host,
-				      NFSIOS_FSCACHE_PAGES_UNCACHED);
 	}
+	fscache_note_page_release(netfs_i_cookie(&NFS_I(folio->mapping->host)->netfs));
 	return true;
 }
 
-/*
- * Retrieve a page from an inode data storage object.
- */
-static inline int nfs_fscache_read_page(struct inode *inode, struct page *page)
-{
-	if (netfs_inode(inode)->cache)
-		return __nfs_fscache_read_page(inode, page);
-	return -ENOBUFS;
-}
-
-/*
- * Store a page newly fetched from the server in an inode data storage object
- * in the cache.
- */
-static inline void nfs_fscache_write_page(struct inode *inode,
-					   struct page *page)
-{
-	if (netfs_inode(inode)->cache)
-		__nfs_fscache_write_page(inode, page);
-}
-
 static inline void nfs_fscache_update_auxdata(struct nfs_fscache_inode_auxdata *auxdata,
 					      struct inode *inode)
 {
@@ -130,11 +139,6 @@ static inline bool nfs_fscache_release_folio(struct folio *folio, gfp_t gfp)
 {
 	return true; /* may release folio */
 }
-static inline int nfs_fscache_read_page(struct inode *inode, struct page *page)
-{
-	return -ENOBUFS;
-}
-static inline void nfs_fscache_write_page(struct inode *inode, struct page *page) {}
 static inline void nfs_fscache_invalidate(struct inode *inode, int flags) {}
 
 static inline const char *nfs_server_fscache_state(struct nfs_server *server)
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 3acddec41ba7..dea13500003f 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -2248,6 +2248,8 @@ struct inode *nfs_alloc_inode(struct super_block *sb)
 #ifdef CONFIG_NFS_V4_2
 	nfsi->xattr_cache = NULL;
 #endif
+	netfs_inode_init(&nfsi->netfs, &nfs_netfs_ops);
+
 	return &nfsi->netfs.inode;
 }
 EXPORT_SYMBOL_GPL(nfs_alloc_inode);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 29cfee4660c5..2895e9ada22e 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -453,6 +453,9 @@ extern void nfs_sb_deactive(struct super_block *sb);
 extern int nfs_client_for_each_server(struct nfs_client *clp,
 				      int (*fn)(struct nfs_server *, void *),
 				      void *data);
+/* fscache.c */
+extern const struct netfs_request_ops nfs_netfs_ops;
+
 /* io.c */
 extern void nfs_start_io_read(struct inode *inode);
 extern void nfs_end_io_read(struct inode *inode);
@@ -482,9 +485,14 @@ extern int nfs4_get_rootfh(struct nfs_server *server, struct nfs_fh *mntfh, bool
 
 struct nfs_pgio_completion_ops;
 /* read.c */
+extern const struct nfs_pgio_completion_ops nfs_async_read_completion_ops;
 extern void nfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
 			struct inode *inode, bool force_mds,
 			const struct nfs_pgio_completion_ops *compl_ops);
+extern int nfs_pageio_add_page(struct nfs_pageio_descriptor *pgio,
+			       struct nfs_open_context *ctx,
+			       struct page *page);
+extern void nfs_pageio_complete_read(struct nfs_pageio_descriptor *pgio);
 extern void nfs_read_prepare(struct rpc_task *task, void *calldata);
 extern void nfs_pageio_reset_read_mds(struct nfs_pageio_descriptor *pgio);
 
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 317cedfa52bf..93458da76e32 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -25,6 +25,7 @@
 #include "internal.h"
 #include "pnfs.h"
 #include "nfstrace.h"
+#include "fscache.h"
 
 #define NFSDBG_FACILITY		NFSDBG_PAGECACHE
 
@@ -68,6 +69,10 @@ void nfs_pgheader_init(struct nfs_pageio_descriptor *desc,
 	hdr->good_bytes = mirror->pg_count;
 	hdr->io_completion = desc->pg_io_completion;
 	hdr->dreq = desc->pg_dreq;
+	if (desc->pg_netfs) {
+		hdr->netfs = desc->pg_netfs;
+		nfs_netfs_get(desc->pg_netfs);
+	}
 	hdr->release = release;
 	hdr->completion_ops = desc->pg_completion_ops;
 	if (hdr->completion_ops->init_hdr)
@@ -846,6 +851,7 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
 	desc->pg_lseg = NULL;
 	desc->pg_io_completion = NULL;
 	desc->pg_dreq = NULL;
+	desc->pg_netfs = NULL;
 	desc->pg_bsize = bsize;
 
 	desc->pg_mirror_count = 1;
@@ -896,6 +902,7 @@ int nfs_generic_pgio(struct nfs_pageio_descriptor *desc,
 	struct nfs_page_array *pg_array = &hdr->page_array;
 	unsigned int pagecount, pageused;
 	gfp_t gfp_flags = nfs_io_gfp_mask();
+	struct nfs_netfs_io_data	*netfs = hdr->netfs;
 
 	pagecount = nfs_page_array_len(mirror->pg_base, mirror->pg_count);
 	pg_array->npages = pagecount;
@@ -940,6 +947,12 @@ int nfs_generic_pgio(struct nfs_pageio_descriptor *desc,
 	/* Set up the argument struct */
 	nfs_pgio_rpcsetup(hdr, mirror->pg_count, desc->pg_ioflags, &cinfo);
 	desc->pg_rpc_callops = &nfs_pgio_common_ops;
+
+	/* Record netfs RPC count */
+	if (netfs) {
+		atomic_inc(&netfs->rpcs);
+		netfs->rpc_byte_count += hdr->args.count;
+	}
 	return 0;
 }
 EXPORT_SYMBOL_GPL(nfs_generic_pgio);
@@ -1360,6 +1373,7 @@ int nfs_pageio_resend(struct nfs_pageio_descriptor *desc,
 
 	desc->pg_io_completion = hdr->io_completion;
 	desc->pg_dreq = hdr->dreq;
+	desc->pg_netfs = hdr->netfs;
 	list_splice_init(&hdr->pages, &pages);
 	while (!list_empty(&pages)) {
 		struct nfs_page *req = nfs_list_entry(pages.next);
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 525e82ea9a9e..f7dcb72530ee 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -30,7 +30,7 @@
 
 #define NFSDBG_FACILITY		NFSDBG_PAGECACHE
 
-static const struct nfs_pgio_completion_ops nfs_async_read_completion_ops;
+const struct nfs_pgio_completion_ops nfs_async_read_completion_ops;
 static const struct nfs_rw_ops nfs_rw_read_ops;
 
 static struct kmem_cache *nfs_rdata_cachep;
@@ -74,7 +74,7 @@ void nfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
 }
 EXPORT_SYMBOL_GPL(nfs_pageio_init_read);
 
-static void nfs_pageio_complete_read(struct nfs_pageio_descriptor *pgio)
+void nfs_pageio_complete_read(struct nfs_pageio_descriptor *pgio)
 {
 	struct nfs_pgio_mirror *pgm;
 	unsigned long npages;
@@ -119,11 +119,7 @@ static void nfs_readpage_release(struct nfs_page *req, int error)
 
 	if (nfs_error_is_fatal_on_server(error) && error != -ETIMEDOUT)
 		SetPageError(page);
-	if (nfs_page_group_sync_on_bit(req, PG_UNLOCKPAGE)) {
-		if (PageUptodate(page))
-			nfs_fscache_write_page(inode, page);
-		unlock_page(page);
-	}
+
 	nfs_release_request(req);
 }
 
@@ -137,6 +133,7 @@ static void nfs_read_completion(struct nfs_pgio_header *hdr)
 {
 	unsigned long bytes = 0;
 	int error;
+	struct nfs_netfs_io_data	*netfs = hdr->netfs;
 
 	if (test_bit(NFS_IOHDR_REDO, &hdr->flags))
 		goto out;
@@ -177,6 +174,26 @@ static void nfs_read_completion(struct nfs_pgio_header *hdr)
 		nfs_list_remove_request(req);
 		nfs_readpage_release(req, error);
 	}
+	if (netfs) {
+		struct netfs_io_subrequest *sreq = netfs->sreq;
+
+		if (test_bit(NFS_IOHDR_EOF, &hdr->flags))
+			__set_bit(NETFS_SREQ_CLEAR_TAIL, &sreq->flags);
+
+		/* FIXME: What if multiple RPCs race, need locking (spinlock?) or is this ok? */
+		if (hdr->error)
+			netfs->error = hdr->error;
+		else
+			netfs->transferred += hdr->res.count;
+
+		/* Only the last RPC completion should call netfs_subreq_terminated() */
+		if (atomic_dec_and_test(&netfs->rpcs) &&
+		    (netfs->rpc_byte_count >= sreq->len)) { /* Could have EOF */
+			netfs_subreq_terminated(sreq, netfs->error ?: netfs->transferred, false);
+			nfs_netfs_put(netfs);
+			hdr->netfs = NULL;
+		}
+	}
 out:
 	hdr->release(hdr);
 }
@@ -202,7 +218,7 @@ nfs_async_read_error(struct list_head *head, int error)
 	}
 }
 
-static const struct nfs_pgio_completion_ops nfs_async_read_completion_ops = {
+const struct nfs_pgio_completion_ops nfs_async_read_completion_ops = {
 	.error_cleanup = nfs_async_read_error,
 	.completion = nfs_read_completion,
 };
@@ -294,12 +310,6 @@ nfs_pageio_add_page(struct nfs_pageio_descriptor *pgio,
 
 	aligned_len = min_t(unsigned int, ALIGN(len, rsize), PAGE_SIZE);
 
-	if (!IS_SYNC(page->mapping->host)) {
-		error = nfs_fscache_read_page(page->mapping->host, page);
-		if (error == 0)
-			goto out_unlock;
-	}
-
 	new = nfs_create_request(ctx, page, 0, aligned_len);
 	if (IS_ERR(new))
 		goto out_error;
@@ -315,8 +325,6 @@ nfs_pageio_add_page(struct nfs_pageio_descriptor *pgio,
 	return 0;
 out_error:
 	error = PTR_ERR(new);
-out_unlock:
-	unlock_page(page);
 out:
 	return error;
 }
@@ -330,8 +338,6 @@ nfs_pageio_add_page(struct nfs_pageio_descriptor *pgio,
 int nfs_read_folio(struct file *file, struct folio *folio)
 {
 	struct page *page = &folio->page;
-	struct nfs_pageio_descriptor pgio;
-	struct nfs_open_context *ctx;
 	struct inode *inode = page_file_mapping(page)->host;
 	int ret;
 
@@ -355,47 +361,19 @@ int nfs_read_folio(struct file *file, struct folio *folio)
 	if (NFS_STALE(inode))
 		goto out_unlock;
 
-	if (file == NULL) {
-		ret = -EBADF;
-		ctx = nfs_find_open_context(inode, NULL, FMODE_READ);
-		if (ctx == NULL)
-			goto out_unlock;
-	} else
-		ctx = get_nfs_open_context(nfs_file_open_context(file));
-
-	xchg(&ctx->error, 0);
-	nfs_pageio_init_read(&pgio, inode, false,
-			     &nfs_async_read_completion_ops);
-
-	ret = nfs_pageio_add_page(&pgio, ctx, page);
-	if (ret)
-		goto out;
-
-	nfs_pageio_complete_read(&pgio);
-	ret = pgio.pg_error < 0 ? pgio.pg_error : 0;
-	if (!ret) {
-		ret = wait_on_page_locked_killable(page);
-		if (!PageUptodate(page) && !ret)
-			ret = xchg(&ctx->error, 0);
-	}
-out:
-	put_nfs_open_context(ctx);
-	trace_nfs_aop_readpage_done(inode, page, ret);
-	return ret;
+	ret = netfs_read_folio(file, folio);
+	goto out;
 out_unlock:
 	unlock_page(page);
+out:
 	trace_nfs_aop_readpage_done(inode, page, ret);
 	return ret;
 }
 
 void nfs_readahead(struct readahead_control *ractl)
 {
-	struct nfs_pageio_descriptor pgio;
-	struct nfs_open_context *ctx;
 	unsigned int nr_pages = readahead_count(ractl);
-	struct file *file = ractl->file;
 	struct inode *inode = ractl->mapping->host;
-	struct page *page;
 	int ret;
 
 	trace_nfs_aop_readahead(inode, readahead_pos(ractl), nr_pages);
@@ -405,27 +383,8 @@ void nfs_readahead(struct readahead_control *ractl)
 	if (NFS_STALE(inode))
 		goto out;
 
-	if (file == NULL) {
-		ret = -EBADF;
-		ctx = nfs_find_open_context(inode, NULL, FMODE_READ);
-		if (ctx == NULL)
-			goto out;
-	} else
-		ctx = get_nfs_open_context(nfs_file_open_context(file));
-
-	nfs_pageio_init_read(&pgio, inode, false,
-			     &nfs_async_read_completion_ops);
-
-	while ((page = readahead_page(ractl)) != NULL) {
-		ret = nfs_pageio_add_page(&pgio, ctx, page);
-		put_page(page);
-		if (ret)
-			break;
-	}
-
-	nfs_pageio_complete_read(&pgio);
-
-	put_nfs_open_context(ctx);
+	ret = 0;
+	netfs_readahead(ractl);
 out:
 	trace_nfs_aop_readahead_done(inode, nr_pages, ret);
 }
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index ba7e2e4b0926..f942ed3f7215 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -101,6 +101,7 @@ struct nfs_pageio_descriptor {
 	struct pnfs_layout_segment *pg_lseg;
 	struct nfs_io_completion *pg_io_completion;
 	struct nfs_direct_req	*pg_dreq;
+	void			*pg_netfs;
 	unsigned int		pg_bsize;	/* default bsize for mirrors */
 
 	u32			pg_mirror_count;
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index e86cf6642d21..770c3348b696 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1619,6 +1619,7 @@ struct nfs_pgio_header {
 	const struct nfs_rw_ops	*rw_ops;
 	struct nfs_io_completion *io_completion;
 	struct nfs_direct_req	*dreq;
+	void			*netfs;
 
 	int			pnfs_error;
 	int			error;		/* merge with pnfs_error */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 2/3] NFS: Add support for netfs in struct nfs_inode and Kconfig
  2022-08-24  9:35 ` [RFC PATCH 2/3] NFS: Add support for netfs in struct nfs_inode and Kconfig Dave Wysochanski
@ 2022-08-24 12:42   ` Trond Myklebust
  2022-08-24 13:00     ` David Wysochanski
  0 siblings, 1 reply; 16+ messages in thread
From: Trond Myklebust @ 2022-08-24 12:42 UTC (permalink / raw)
  To: anna.schumaker, dwysocha, dhowells
  Cc: linux-cachefs, linux-nfs, daire.byrne, benmaynard

On Wed, 2022-08-24 at 05:35 -0400, Dave Wysochanski wrote:
> As first steps for support of the netfs library, add NETFS_SUPPORT
> to Kconfig and add the required netfs_inode into struct nfs_inode.
> The struct netfs_inode is now where the vfs_inode is stored as well
> as the fscache_cookie.  In addition, use the netfs_inode() and
> netfs_i_cookie() helpers, and remove our own helper, nfs_i_fscache().
> 
> Later patches will enable netfs by defining NFS specific version
> of struct netfs_request_ops and calling netfs_inode_init().
> 
> Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
> ---
>  fs/nfs/Kconfig         |  1 +
>  fs/nfs/delegation.c    |  2 +-
>  fs/nfs/dir.c           |  2 +-
>  fs/nfs/fscache.c       | 20 +++++++++-----------
>  fs/nfs/fscache.h       | 15 ++++++---------
>  fs/nfs/inode.c         |  6 +++---
>  fs/nfs/internal.h      |  2 +-
>  fs/nfs/pnfs.c          | 12 ++++++------
>  fs/nfs/write.c         |  2 +-
>  include/linux/nfs_fs.h | 19 +++++--------------
>  10 files changed, 34 insertions(+), 47 deletions(-)
> 
> diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
> index 14a72224b657..79b241bed762 100644
> --- a/fs/nfs/Kconfig
> +++ b/fs/nfs/Kconfig
> @@ -5,6 +5,7 @@ config NFS_FS
>         select LOCKD
>         select SUNRPC
>         select NFS_ACL_SUPPORT if NFS_V3_ACL
> +       select NETFS_SUPPORT
> 

NACK. I'm not at all OK with making netfs mandatory.


-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com




* Re: [RFC PATCH 2/3] NFS: Add support for netfs in struct nfs_inode and Kconfig
  2022-08-24 12:42   ` Trond Myklebust
@ 2022-08-24 13:00     ` David Wysochanski
  2022-08-24 13:05       ` Trond Myklebust
  2022-08-24 14:12       ` David Howells
  0 siblings, 2 replies; 16+ messages in thread
From: David Wysochanski @ 2022-08-24 13:00 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: anna.schumaker, dhowells, linux-cachefs, linux-nfs, daire.byrne,
	benmaynard

On Wed, Aug 24, 2022 at 8:42 AM Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> On Wed, 2022-08-24 at 05:35 -0400, Dave Wysochanski wrote:
> > As first steps for support of the netfs library, add NETFS_SUPPORT
> > to Kconfig and add the required netfs_inode into struct nfs_inode.
> > The struct netfs_inode is now where the vfs_inode is stored as well
> > as the fscache_cookie.  In addition, use the netfs_inode() and
> > netfs_i_cookie() helpers, and remove our own helper, nfs_i_fscache().
> >
> > Later patches will enable netfs by defining NFS specific version
> > of struct netfs_request_ops and calling netfs_inode_init().
> >
> > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
> > ---
> >  fs/nfs/Kconfig         |  1 +
> >  fs/nfs/delegation.c    |  2 +-
> >  fs/nfs/dir.c           |  2 +-
> >  fs/nfs/fscache.c       | 20 +++++++++-----------
> >  fs/nfs/fscache.h       | 15 ++++++---------
> >  fs/nfs/inode.c         |  6 +++---
> >  fs/nfs/internal.h      |  2 +-
> >  fs/nfs/pnfs.c          | 12 ++++++------
> >  fs/nfs/write.c         |  2 +-
> >  include/linux/nfs_fs.h | 19 +++++--------------
> >  10 files changed, 34 insertions(+), 47 deletions(-)
> >
> > diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
> > index 14a72224b657..79b241bed762 100644
> > --- a/fs/nfs/Kconfig
> > +++ b/fs/nfs/Kconfig
> > @@ -5,6 +5,7 @@ config NFS_FS
> >         select LOCKD
> >         select SUNRPC
> >         select NFS_ACL_SUPPORT if NFS_V3_ACL
> > +       select NETFS_SUPPORT
> >
>
> NACK. I'm not at all OK with making netfs mandatory.
>

Just so we're on the same page, are you ok with netfs being enabled if
fscache is enabled like today?



* Re: [RFC PATCH 2/3] NFS: Add support for netfs in struct nfs_inode and Kconfig
  2022-08-24 13:00     ` David Wysochanski
@ 2022-08-24 13:05       ` Trond Myklebust
  2022-08-24 14:12       ` David Howells
  1 sibling, 0 replies; 16+ messages in thread
From: Trond Myklebust @ 2022-08-24 13:05 UTC (permalink / raw)
  To: dwysocha
  Cc: linux-cachefs, linux-nfs, daire.byrne, anna.schumaker,
	benmaynard, dhowells

On Wed, 2022-08-24 at 09:00 -0400, David Wysochanski wrote:
> On Wed, Aug 24, 2022 at 8:42 AM Trond Myklebust
> <trondmy@hammerspace.com> wrote:
> > 
> > On Wed, 2022-08-24 at 05:35 -0400, Dave Wysochanski wrote:
> > > As first steps for support of the netfs library, add
> > > NETFS_SUPPORT
> > > to Kconfig and add the required netfs_inode into struct
> > > nfs_inode.
> > > The struct netfs_inode is now where the vfs_inode is stored as
> > > well
> > > as the fscache_cookie.  In addition, use the netfs_inode() and
> > > netfs_i_cookie() helpers, and remove our own helper,
> > > nfs_i_fscache().
> > > 
> > > Later patches will enable netfs by defining NFS specific version
> > > of struct netfs_request_ops and calling netfs_inode_init().
> > > 
> > > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
> > > ---
> > >  fs/nfs/Kconfig         |  1 +
> > >  fs/nfs/delegation.c    |  2 +-
> > >  fs/nfs/dir.c           |  2 +-
> > >  fs/nfs/fscache.c       | 20 +++++++++-----------
> > >  fs/nfs/fscache.h       | 15 ++++++---------
> > >  fs/nfs/inode.c         |  6 +++---
> > >  fs/nfs/internal.h      |  2 +-
> > >  fs/nfs/pnfs.c          | 12 ++++++------
> > >  fs/nfs/write.c         |  2 +-
> > >  include/linux/nfs_fs.h | 19 +++++--------------
> > >  10 files changed, 34 insertions(+), 47 deletions(-)
> > > 
> > > diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
> > > index 14a72224b657..79b241bed762 100644
> > > --- a/fs/nfs/Kconfig
> > > +++ b/fs/nfs/Kconfig
> > > @@ -5,6 +5,7 @@ config NFS_FS
> > >         select LOCKD
> > >         select SUNRPC
> > >         select NFS_ACL_SUPPORT if NFS_V3_ACL
> > > +       select NETFS_SUPPORT
> > > 
> > 
> > NACK. I'm not at all OK with making netfs mandatory.
> > 
> 
> Just so we're on the same page, are you ok with netfs being enabled
> if
> fscache is enabled like today?
> 

As long as it is an opt-in feature, I'm OK. I don't want to have to
compile it in by default.
A cachefs should never become a mandatory feature of networked
filesystems.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com




* Re: [RFC PATCH 2/3] NFS: Add support for netfs in struct nfs_inode and Kconfig
  2022-08-24 13:00     ` David Wysochanski
  2022-08-24 13:05       ` Trond Myklebust
@ 2022-08-24 14:12       ` David Howells
  2022-08-24 16:27         ` Trond Myklebust
  2022-08-25 15:20         ` David Howells
  1 sibling, 2 replies; 16+ messages in thread
From: David Howells @ 2022-08-24 14:12 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: dhowells, willy, dwysocha, linux-cachefs, linux-nfs, daire.byrne,
	anna.schumaker, benmaynard

Trond Myklebust <trondmy@hammerspace.com> wrote:

> As long as it is an opt-in feature, I'm OK. I don't want to have to
> compile it in by default.
> A cachefs should never become a mandatory feature of networked
> filesystems.

netfslib is intended to be used even if fscache is not enabled.  It is intended
to make the underlying network filesystem maintainer's life easier by:

 - Moving the implementation of all the VM ops from the network filesystems as
   much as possible into one place.  The network filesystem then just has to
   provide a read op and a write op.

 - Making it such that the filesystem doesn't have to deal with the difference
   between DIO and buffered I/O

 - Handling VM features on behalf of all filesystems.  This gives the VM folk
   one place to change instead of 5+.  mpage and iomap are similar things but
   for blockdev filesystems.

 - Providing features to those filesystems that can support them, eg.:

   - fscrypt
   - compression
   - bounce buffering
   - local caching
   - disconnected operation


Currently nfs interacts with fscache on a page-by-page basis, but this needs
to change:

 (1) Multipage folios are now a thing.  You need to roll folios out into nfs
     if you're going to take advantage of this.  Also, you may have noticed
     that all the VM interfaces are being recast in terms of folios.

 (2) I need to fix the cache so that it no longer uses the backing
     filesystem's metadata to track content.  To make this use less diskspace,
     I want to increase the cache block size to, say, 256K or 2M.

     This means that the cache needs to have a say in how big a read the
     network filesystem does - and also that a single cache request might need
     to be stitched together from multiple read ops.

 (3) More pagecache changes are lurking in the future, possibly including
     getting rid of the concept of pages entirely from the pagecache.

There are users of nfs + fscache and we'd like to continue to support them as
best as possible but the current code noticeably degrades performance here.

Unfortunately, I'm also going to need to drop the fallback interface which nfs
currently uses in the next couple versions, we have to at least get the
fscache enabled conversion done.

I've been dealing with the VM, 9p, ceph and cifs people over the direction
that netfslib might need to go in, but for nfs, it's typically been a flat
"no".  I would like to work out how to make netfslib work for nfs also, if
you're willing to discuss it.

I would be open to having a look at importing nfs page handling into netfslib
and working from that - but it still needs to deal with (1) and (2) above, and
I would like to make it pass iterators down to the lower layers as buffer
descriptions.  It's also very complicated stuff.

Also:

 - I've noted the nfs_page structs that nfs uses and I'm looking at a way of
   having something similar, but held separately so that one struct can span
   and store information about multiple folios.

 - I'm looking at punting write-to-the-cache to writepages() or something like
   that so that the VM folks can reclaim the PG_private_2 flag bit, so that
   won't be available to nfs either in the future.

 - aops->write_begin() and ->write_end() are going to go away.  In netfslib
   what I'm trying to do is make a "netfs_perform_write" as a parallel to
   generic_perform_write().

David



* Re: [RFC PATCH 2/3] NFS: Add support for netfs in struct nfs_inode and Kconfig
  2022-08-24 14:12       ` David Howells
@ 2022-08-24 16:27         ` Trond Myklebust
  2022-08-24 16:53           ` Matthew Wilcox
  2022-08-25 15:20         ` David Howells
  1 sibling, 1 reply; 16+ messages in thread
From: Trond Myklebust @ 2022-08-24 16:27 UTC (permalink / raw)
  To: dhowells
  Cc: linux-cachefs, linux-nfs, willy, anna.schumaker, daire.byrne,
	dwysocha, benmaynard

On Wed, 2022-08-24 at 15:12 +0100, David Howells wrote:
> Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
> > As long as it is an opt-in feature, I'm OK. I don't want to have to
> > compile it in by default.
> > A cachefs should never become a mandatory feature of networked
> > filesystems.
> 
> netfslib is intended to be used even if fscache is not enabled.  It is
> intended
> to make the underlying network filesystem maintainer's life easier
> by:
> 
>  - Moving the implementation of all the VM ops from the network
> filesystems as
>    much as possible into one place.  The network filesystem then just
> has to
>    provide a read op and a write op.
> 
>  - Making it such that the filesystem doesn't have to deal with the
> difference
>    between DIO and buffered I/O
> 
>  - Handling VM features on behalf of all filesystems.  This gives the
> VM folk
>    one place to change instead of 5+.  mpage and iomap are similar
> things but
>    for blockdev filesystems.
> 
>  - Providing features to those filesystems that can support them,
> eg.:
> 
>    - fscrypt
>    - compression
>    - bounce buffering
>    - local caching
>    - disconnected operation
> 
> 
> Currently nfs interacts with fscache on a page-by-page basis, but
> this needs
> to change:
> 
>  (1) Multipage folios are now a thing.  You need to roll folios out
> into nfs
>      if you're going to take advantage of this.  Also, you may have
> noticed
>      that all the VM interfaces are being recast in terms of folios.

Right now, I see limited value in adding multipage folios to NFS.

While basic NFSv4 does allow you to pretend there is a fundamental
underlying block size, pNFS has changed all that, and we have had to
engineer support for determining the I/O block size on the fly, and
building the RPC requests accordingly. Client side mirroring just adds
to the fun.

As I see it, the only value that multipage folios might bring to NFS
would be smaller page cache management overhead when dealing with large
files.

> 
>  (2) I need to fix the cache so that it no longer uses the backing
>      filesystem's metadata to track content.  To make this use less
> diskspace,
>      I want to increase the cache block size to, say, 256K or 2M.
> 
>      This means that the cache needs to have a say in how big a read
> the
>      network filesystem does - and also that a single cache request
> might need
>      to be stitched together from multiple read ops.
> 
>  (3) More pagecache changes are lurking in the future, possibly
> including
>      getting rid of the concept of pages entirely from the pagecache.
> 
> There are users of nfs + fscache and we'd like to continue to support
> them as
> best as possible but the current code noticeably degrades performance
> here.
> 
> Unfortunately, I'm also going to need to drop the fallback interface
> which nfs
> currently uses in the next couple versions, we have to at least get
> the
> fscache enabled conversion done.
> 
> I've been dealing with the VM, 9p, ceph and cifs people over the
> direction
> that netfslib might need to go in, but for nfs, it's typically been a
> flat
> "no".  I would like to work out how to make netfslib work for nfs
> also, if
> you're willing to discuss it.
> 
> I would be open to having a look at importing nfs page handling into
> netfslib
> and working from that - but it still needs to deal with (1) and (2)
> above, and
> I would like to make it pass iterators down to the lower layers as
> buffer
> descriptions.  It's also very complicated stuff.
> 
> Also:
> 
>  - I've noted the nfs_page structs that nfs uses and I'm looking at a
> way of
>    having something similar, but held separately so that one struct
> can span
>    and store information about multiple folios.
> 
>  - I'm looking at punting write-to-the-cache to writepages() or
> something like
>    that so that the VM folks can reclaim the PG_private_2 flag bit,
> so that
>    won't be available to nfs either in the future.
> 
>  - aops->write_begin() and ->write_end() are going to go away.  In
> netfslib
>    what I'm trying to do is make a "netfs_perform_write" as a
> parallel to
>    generic_perform_write().
> 

What problems would any of this solve for NFS? I'm worried about the
cost of all this proposed code churn as well; as you said 'it is
complicated stuff', mainly for the good reason that we've been
optimising a lot of code over the last 25-30 years.

However let's start with the "why?" question first. Why do I need an
extra layer of abstraction between NFS and the VM, when one of my
primary concerns right now is that the stack depth keeps growing?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com




* Re: [RFC PATCH 2/3] NFS: Add support for netfs in struct nfs_inode and Kconfig
  2022-08-24 16:27         ` Trond Myklebust
@ 2022-08-24 16:53           ` Matthew Wilcox
  2022-08-24 17:43             ` Trond Myklebust
  0 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2022-08-24 16:53 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: dhowells, linux-cachefs, linux-nfs, anna.schumaker, daire.byrne,
	dwysocha, benmaynard

On Wed, Aug 24, 2022 at 04:27:04PM +0000, Trond Myklebust wrote:
> Right now, I see limited value in adding multipage folios to NFS.
> 
> While basic NFSv4 does allow you to pretend there is a fundamental
> underlying block size, pNFS has changed all that, and we have had to
> engineer support for determining the I/O block size on the fly, and
> building the RPC requests accordingly. Client side mirroring just adds
> to the fun.
> 
> As I see it, the only value that multipage folios might bring to NFS
> would be smaller page cache management overhead when dealing with large
> files.

Yes, but that's a Really Big Deal.  Machines with a lot of memory end
up with very long LRU lists.  We can't afford the overhead of managing
memory in 4kB chunks any more.  (I don't want to dwell on this point too
much; I've run the numbers before and can do so again if you want me to
go into more details).

Beyond that, filesystems have a lot of interactions with the page cache
today.  When I started looking at this, I thought filesystem people all
had a deep understanding of how the page cache worked.  Now I realise
everyone's as clueless as I am.  The real benefit I see to projects like
iomap/netfs is that they insulate filesystems from having to deal with
the page cache.  All the interactions are in two or three places and we
can refactor without having to talk to the owners of 50+ filesystems.

It also gives us a chance to re-examine some of the assumptions that
we have made over the years about how filesystems and page cache should
be interacting.  We've fixed a fair few bugs in recent years that came
about because filesystem people don't tend to have deep knowledge of mm
internals (and they shouldn't need to!)

I don't know that netfs has the perfect interface to be used for nfs.
But that too can be changed to make it work better for your needs.


* Re: [RFC PATCH 2/3] NFS: Add support for netfs in struct nfs_inode and Kconfig
  2022-08-24 16:53           ` Matthew Wilcox
@ 2022-08-24 17:43             ` Trond Myklebust
  2022-08-25 15:01               ` Matthew Wilcox
  2022-08-25 15:30               ` David Howells
  0 siblings, 2 replies; 16+ messages in thread
From: Trond Myklebust @ 2022-08-24 17:43 UTC (permalink / raw)
  To: willy
  Cc: linux-cachefs, linux-nfs, daire.byrne, anna.schumaker,
	benmaynard, dhowells, dwysocha

On Wed, 2022-08-24 at 17:53 +0100, Matthew Wilcox wrote:
> On Wed, Aug 24, 2022 at 04:27:04PM +0000, Trond Myklebust wrote:
> > Right now, I see limited value in adding multipage folios to NFS.
> > 
> > While basic NFSv4 does allow you to pretend there is a fundamental
> > underlying block size, pNFS has changed all that, and we have had to
> > engineer support for determining the I/O block size on the fly, and
> > building the RPC requests accordingly. Client side mirroring just
> > adds to the fun.
> > 
> > As I see it, the only value that multipage folios might bring to NFS
> > would be smaller page cache management overhead when dealing with
> > large files.
> 
> Yes, but that's a Really Big Deal.  Machines with a lot of memory end
> up with very long LRU lists.  We can't afford the overhead of managing
> memory in 4kB chunks any more.  (I don't want to dwell on this point
> too much; I've run the numbers before and can do so again if you want
> me to go into more details).
> 
> Beyond that, filesystems have a lot of interactions with the page
> cache today.  When I started looking at this, I thought filesystem
> people all had a deep understanding of how the page cache worked.
> Now I realise everyone's as clueless as I am.  The real benefit I see
> to projects like iomap/netfs is that they insulate filesystems from
> having to deal with the page cache.  All the interactions are in two
> or three places and we can refactor without having to talk to the
> owners of 50+ filesystems.
> 
> It also gives us a chance to re-examine some of the assumptions that
> we have made over the years about how filesystems and page cache
> should be interacting.  We've fixed a fair few bugs in recent years
> that came about because filesystem people don't tend to have deep
> knowledge of mm internals (and they shouldn't need to!)
> 
> I don't know that netfs has the perfect interface to be used for nfs.
> But that too can be changed to make it work better for your needs.

If the VM folks need it, then adding support for multi-page folios is a
much smaller scope than what David was describing. It can be done
without too much surgery to the existing NFS I/O stack. We already have
code to support I/O block sizes that are much less than the page size,
so converting that to act on larger folios is not a huge deal.

What would be useful there is something like a range tree to allow us
to move beyond the PG_uptodate bit, and help make the
is_partially_uptodate() address_space_operation a bit more useful.
Otherwise, we end up having to read in the entire folio, which is what
we do today for pages, but could get onerous with large folios when
doing file random access.
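The range-tracking idea above can be sketched in a few lines of userspace C. A fixed array of disjoint, sorted byte ranges stands in for the range tree, and every name here is hypothetical rather than kernel API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sketch: per-folio uptodate *range* tracking instead of a single
 * PG_uptodate bit.  A real implementation would use a range tree;
 * a small array keeps the idea visible. */
#define MAX_RANGES 16

struct uptodate_ranges {
    size_t start[MAX_RANGES];   /* inclusive start of each range */
    size_t end[MAX_RANGES];     /* exclusive end of each range */
    int nr;
};

/* Record that [start, end) has been read in, merging any ranges that
 * overlap or touch the new one. */
static void mark_uptodate(struct uptodate_ranges *r, size_t start,
                          size_t end)
{
    size_t s[MAX_RANGES + 1], e[MAX_RANGES + 1];
    int n = 0, i;

    /* Copy disjoint ranges; fold touching/overlapping ones into
     * [start, end). */
    for (i = 0; i < r->nr; i++) {
        if (r->end[i] < start || r->start[i] > end) {
            s[n] = r->start[i]; e[n] = r->end[i]; n++;
        } else {
            if (r->start[i] < start) start = r->start[i];
            if (r->end[i] > end) end = r->end[i];
        }
    }
    s[n] = start; e[n] = end; n++;

    /* Keep the set sorted by start (tiny n: insertion sort). */
    for (i = 1; i < n; i++) {
        size_t ks = s[i], ke = e[i];
        int j = i - 1;
        while (j >= 0 && s[j] > ks) {
            s[j + 1] = s[j]; e[j + 1] = e[j]; j--;
        }
        s[j + 1] = ks; e[j + 1] = ke;
    }
    r->nr = n;
    for (i = 0; i < n; i++) { r->start[i] = s[i]; r->end[i] = e[i]; }
}

/* is_partially_uptodate()-style query: is [start, end) fully covered?
 * Because touching ranges are merged, coverage is always within a
 * single stored range. */
static bool range_uptodate(const struct uptodate_ranges *r,
                           size_t start, size_t end)
{
    int i;
    for (i = 0; i < r->nr; i++)
        if (r->start[i] <= start && end <= r->end[i])
            return true;
    return false;
}
```

With this shape, a random 4 KiB read into a 1 MiB folio only needs to bring its own range uptodate; the folio-wide bit would be set only once a single range covers the whole folio.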

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 2/3] NFS: Add support for netfs in struct nfs_inode and Kconfig
  2022-08-24 17:43             ` Trond Myklebust
@ 2022-08-25 15:01               ` Matthew Wilcox
  2022-08-25 15:32                 ` Trond Myklebust
  2022-08-25 15:30               ` David Howells
  1 sibling, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2022-08-25 15:01 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: linux-cachefs, linux-nfs, daire.byrne, anna.schumaker,
	benmaynard, dhowells, dwysocha

On Wed, Aug 24, 2022 at 05:43:36PM +0000, Trond Myklebust wrote:
> On Wed, 2022-08-24 at 17:53 +0100, Matthew Wilcox wrote:
> > On Wed, Aug 24, 2022 at 04:27:04PM +0000, Trond Myklebust wrote:
> > > Right now, I see limited value in adding multipage folios to NFS.
> > > 
> > > While basic NFSv4 does allow you to pretend there is a fundamental
> > > underlying block size, pNFS has changed all that, and we have had
> > > to engineer support for determining the I/O block size on the fly,
> > > and building the RPC requests accordingly. Client side mirroring
> > > just adds to the fun.
> > > 
> > > As I see it, the only value that multipage folios might bring to
> > > NFS would be smaller page cache management overhead when dealing
> > > with large files.
> > 
> > Yes, but that's a Really Big Deal.  Machines with a lot of memory
> > end up with very long LRU lists.  We can't afford the overhead of
> > managing memory in 4kB chunks any more.  (I don't want to dwell on
> > this point too much; I've run the numbers before and can do so
> > again if you want me to go into more details).
> > 
> > Beyond that, filesystems have a lot of interactions with the page
> > cache today.  When I started looking at this, I thought filesystem
> > people all had a deep understanding of how the page cache worked.
> > Now I realise everyone's as clueless as I am.  The real benefit I
> > see to projects like iomap/netfs is that they insulate filesystems
> > from having to deal with the page cache.  All the interactions are
> > in two or three places and we can refactor without having to talk
> > to the owners of 50+ filesystems.
> > 
> > It also gives us a chance to re-examine some of the assumptions
> > that we have made over the years about how filesystems and page
> > cache should be interacting.  We've fixed a fair few bugs in recent
> > years that came about because filesystem people don't tend to have
> > deep knowledge of mm internals (and they shouldn't need to!)
> > 
> > I don't know that netfs has the perfect interface to be used for
> > nfs.  But that too can be changed to make it work better for your
> > needs.
> 
> If the VM folks need it, then adding support for multi-page folios is
> a much smaller scope than what David was describing. It can be done
> without too much surgery to the existing NFS I/O stack. We already
> have code to support I/O block sizes that are much less than the page
> size, so converting that to act on larger folios is not a huge deal.
> 
> What would be useful there is something like a range tree to allow us
> to move beyond the PG_uptodate bit, and help make the
> is_partially_uptodate() address_space_operation a bit more useful.
> Otherwise, we end up having to read in the entire folio, which is
> what we do today for pages, but could get onerous with large folios
> when doing file random access.

This is interesting because nobody's asked for this before.  I've had
similar discussions around dirty data tracking, but not around uptodate.
Random small reads shouldn't be a terrible problem; if they truly are
random, we behave as today, allocating single pages, reading the entire
page from the server and setting it uptodate.  If the readahead code
detects a contiguous large read, we increase the allocation size to
match, but again we always read the entire folio from the server and
mark it uptodate.

As far as I know, the only time we create !uptodate folios in the page
cache is partial writes to a folio which has not been previously read.
Obviously, those bytes start out dirty and are tracked through the
existing dirty mechanism, but once they've been written back, we have
three choices that I can see:

1. transition those bytes to a mechanism which records they're uptodate
2. discard that information and re-read the entire folio from the server
   if any bytes are subsequently read
3. read the other bytes in that folio from the server and mark the
   entire folio uptodate

We have a mixture of those options implemented in different filesystems
today.  iomap records whether a block is uptodate or not and treats
every uptodate block as dirty if any block in the folio is dirty.
buffer_head has two bits for each block, separately recording whether
it's dirty and/or uptodate.  AFS tracks one dirty range per folio, but
it first brings the folio uptodate by reading it from the server before
overwriting it (I suppose that's a fourth option).

I don't see a compelling reason for different filesystems to behave
differently here.  I'd like us to settle on one design we can all share,
and I was hoping netfs would be the platform for that.
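As a concrete (and purely illustrative) sketch of option 1 and of the iomap-style per-block bookkeeping described above, here is a per-block uptodate bitmap; the names are invented for the example, not the real struct iomap_folio_state:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of per-block state for one large folio: one uptodate bit per
 * fs block.  64 blocks, e.g. a 256 KiB folio with 4 KiB blocks. */
#define BLOCKS_PER_FOLIO 64

struct folio_state {
    uint64_t uptodate;   /* bit n set => block n is uptodate */
};

static void fs_set_range_uptodate(struct folio_state *fs,
                                  unsigned first, unsigned nr)
{
    for (unsigned b = first; b < first + nr; b++)
        fs->uptodate |= 1ULL << b;
}

/* The folio-wide PG_uptodate equivalent: set only once every block is
 * uptodate. */
static bool fs_folio_uptodate(const struct folio_state *fs)
{
    return fs->uptodate == ~0ULL;
}

/* Option 1 from the list above: after a partial write is cleaned, the
 * written blocks stay recorded as uptodate, so a later read of just
 * those bytes needn't go back to the server. */
static bool fs_range_uptodate(const struct folio_state *fs,
                              unsigned first, unsigned nr)
{
    for (unsigned b = first; b < first + nr; b++)
        if (!(fs->uptodate & (1ULL << b)))
            return false;
    return true;
}
```

The bitmap costs one bit per block regardless of access pattern; the range-based alternative costs per-range storage but handles arbitrary block sizes without a fixed granularity.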

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 2/3] NFS: Add support for netfs in struct nfs_inode and Kconfig
  2022-08-24 14:12       ` David Howells
  2022-08-24 16:27         ` Trond Myklebust
@ 2022-08-25 15:20         ` David Howells
  1 sibling, 0 replies; 16+ messages in thread
From: David Howells @ 2022-08-25 15:20 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: dhowells, linux-cachefs, linux-nfs, willy, anna.schumaker,
	daire.byrne, dwysocha, benmaynard

Trond Myklebust <trondmy@hammerspace.com> wrote:

> While basic NFSv4 does allow you to pretend there is a fundamental
> underlying block size, pNFS has changed all that, and we have had to
> engineer support for determining the I/O block size on the fly, and
> building the RPC requests accordingly. Client side mirroring just adds
> to the fun.

I've been working with Jeff to make netfslib handle ceph with its distributed
object model as well as 9p and afs with their more traditional-looking flat
files.

> However let's start with the "why?" question first. Why do I need an
> extra layer of abstraction between NFS and the VM, when one of my
> primary concerns right now is that the stack depth keeps growing?

It's not exactly an extra layer - it's more a case of taking the same layer
out of five[*] network filesystems, combining it and sharing it.

[*] up to 7 if I can roll it out into orangefs and/or fuse as well.

As to why, well I kind of covered that, but we want to add some services to
network filesystems (such as content encryption) and rather than adding
separately to all five, there exists the possibility of just doing it the once
and sharing it (granted there may be parts that can't be shared).

But also, I need to fix cachefiles - and I can't do that whilst nfs is
operating on a page-by-page basis.  Cachefiles has to have an early say on the
size and shape of a transaction.
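That "early say on the size and shape of a transaction" amounts to letting the cache round a read out to whole granules before it is issued. A hedged sketch, with GRANULE and the function name invented for illustration (not the cachefiles API):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative cache granule size; real backends negotiate this. */
#define GRANULE (256 * 1024)

/* Expand a requested read so it covers whole cache granules, clamped
 * to the file size, so the cache only ever stores complete blocks. */
static void cache_expand_read(size_t *start, size_t *len, size_t isize)
{
    size_t end = *start + *len;

    *start -= *start % GRANULE;                      /* round down  */
    end = (end + GRANULE - 1) / GRANULE * GRANULE;   /* round up    */
    if (end > isize)                                 /* clamp to EOF */
        end = isize;
    *len = end - *start;
}
```

This is exactly the kind of decision that cannot be made page by page: the expansion has to happen before the RPCs are built, which is why the cache needs a hook at request-assembly time.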

And speaking of content encryption, if you're using a local cache and content
encryption, you really don't want the unencrypted data to be stored in your
local cache on your laptop, say - so that requires storage of the encrypted
data into the cache.

Further, the VM folks would like the PG_private_2 bit back, along with
PG_checked and PG_error.  So we need a different way of managing writes to the
cache and preventing overlapping DIO writes.

> What problems would any of this solve for NFS? I'm worried about the
> cost of all this proposed code churn as well; as you said 'it is
> complicated stuff', mainly for the good reason that we've been
> optimising a lot of code over the last 25-30 years.

First off, NFS would get to partake of services being implemented in netfslib.
Granted, this isn't exactly solving problems in NFS, more providing additional
features.

Secondly, shared code means less code - and the code would, in theory, be
better-tested as it would have more users.

Thirdly, it would hopefully reduce the maintenance burden, particularly for
the VM people.

David


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 2/3] NFS: Add support for netfs in struct nfs_inode and Kconfig
  2022-08-24 17:43             ` Trond Myklebust
  2022-08-25 15:01               ` Matthew Wilcox
@ 2022-08-25 15:30               ` David Howells
  1 sibling, 0 replies; 16+ messages in thread
From: David Howells @ 2022-08-25 15:30 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: dhowells, Trond Myklebust, linux-cachefs, linux-nfs, daire.byrne,
	anna.schumaker, benmaynard, dwysocha

Matthew Wilcox <willy@infradead.org> wrote:

> AFS tracks one dirty range per folio, but it first brings the folio uptodate
> by reading it from the server before overwriting it (I suppose that's a
> fourth option).

I'm intending to move afs towards the nfs way of doing things when writing
to as-yet unread folios.  The exception is when a cache is in operation: then
we read the folio(s) anyway and store them into the cache, unless the entire
cache granule is going to be overwritten (in which case we can skip the read),
unless we're supporting disconnected mode.  I know that's
exceptions-to-exceptions.

David


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 2/3] NFS: Add support for netfs in struct nfs_inode and Kconfig
  2022-08-25 15:01               ` Matthew Wilcox
@ 2022-08-25 15:32                 ` Trond Myklebust
  2022-08-25 17:53                   ` Matthew Wilcox
  0 siblings, 1 reply; 16+ messages in thread
From: Trond Myklebust @ 2022-08-25 15:32 UTC (permalink / raw)
  To: willy
  Cc: linux-cachefs, linux-nfs, daire.byrne, anna.schumaker,
	benmaynard, dhowells, dwysocha

On Thu, 2022-08-25 at 16:01 +0100, Matthew Wilcox wrote:
> On Wed, Aug 24, 2022 at 05:43:36PM +0000, Trond Myklebust wrote:
> > On Wed, 2022-08-24 at 17:53 +0100, Matthew Wilcox wrote:
> > > On Wed, Aug 24, 2022 at 04:27:04PM +0000, Trond Myklebust wrote:
> > > > Right now, I see limited value in adding multipage folios to
> > > > NFS.
> > > > 
> > > > While basic NFSv4 does allow you to pretend there is a
> > > > fundamental underlying block size, pNFS has changed all that,
> > > > and we have had to engineer support for determining the I/O
> > > > block size on the fly, and building the RPC requests
> > > > accordingly. Client side mirroring just adds to the fun.
> > > > 
> > > > As I see it, the only value that multipage folios might bring
> > > > to NFS would be smaller page cache management overhead when
> > > > dealing with large files.
> > > 
> > > Yes, but that's a Really Big Deal.  Machines with a lot of memory
> > > end up with very long LRU lists.  We can't afford the overhead of
> > > managing memory in 4kB chunks any more.  (I don't want to dwell
> > > on this point too much; I've run the numbers before and can do so
> > > again if you want me to go into more details).
> > > 
> > > Beyond that, filesystems have a lot of interactions with the page
> > > cache today.  When I started looking at this, I thought
> > > filesystem people all had a deep understanding of how the page
> > > cache worked.  Now I realise everyone's as clueless as I am.  The
> > > real benefit I see to projects like iomap/netfs is that they
> > > insulate filesystems from having to deal with the page cache.
> > > All the interactions are in two or three places and we can
> > > refactor without having to talk to the owners of 50+ filesystems.
> > > 
> > > It also gives us a chance to re-examine some of the assumptions
> > > that we have made over the years about how filesystems and page
> > > cache should be interacting.  We've fixed a fair few bugs in
> > > recent years that came about because filesystem people don't tend
> > > to have deep knowledge of mm internals (and they shouldn't need
> > > to!)
> > > 
> > > I don't know that netfs has the perfect interface to be used for
> > > nfs.  But that too can be changed to make it work better for your
> > > needs.
> > 
> > If the VM folks need it, then adding support for multi-page folios
> > is a much smaller scope than what David was describing. It can be
> > done without too much surgery to the existing NFS I/O stack. We
> > already have code to support I/O block sizes that are much less
> > than the page size, so converting that to act on larger folios is
> > not a huge deal.
> > 
> > What would be useful there is something like a range tree to allow
> > us to move beyond the PG_uptodate bit, and help make the
> > is_partially_uptodate() address_space_operation a bit more useful.
> > Otherwise, we end up having to read in the entire folio, which is
> > what we do today for pages, but could get onerous with large folios
> > when doing file random access.
> 
> This is interesting because nobody's asked for this before.  I've had
> similar discussions around dirty data tracking, but not around
> uptodate.  Random small reads shouldn't be a terrible problem; if
> they truly are random, we behave as today, allocating single pages,
> reading the entire page from the server and setting it uptodate.  If
> the readahead code detects a contiguous large read, we increase the
> allocation size to match, but again we always read the entire folio
> from the server and mark it uptodate.
> 
> As far as I know, the only time we create !uptodate folios in the
> page cache is partial writes to a folio which has not been previously
> read.  Obviously, those bytes start out dirty and are tracked through
> the existing dirty mechanism, but once they've been written back, we
> have three choices that I can see:
> 
> 1. transition those bytes to a mechanism which records they're
>    uptodate
> 2. discard that information and re-read the entire folio from the
>    server if any bytes are subsequently read
> 3. read the other bytes in that folio from the server and mark the
>    entire folio uptodate
> 
> We have a mixture of those options implemented in different
> filesystems today.  iomap records whether a block is uptodate or not
> and treats every uptodate block as dirty if any block in the folio is
> dirty.  buffer_head has two bits for each block, separately recording
> whether it's dirty and/or uptodate.  AFS tracks one dirty range per
> folio, but it first brings the folio uptodate by reading it from the
> server before overwriting it (I suppose that's a fourth option).
> 

I'm not talking about the transition of dirty->clean. We already deal
with that. I'm talking about supporting large folios on read-mainly
workloads.

NFS can happily support 1MB sized folios, or even larger than that if
there is a compelling reason to do so.

However, having to read in the entire folio contents if the user is
just asking for a few bytes on a database-style random read workload
can quickly get onerous.
While a lot of NFS servers can do 1MB reads in one RPC call, there are
still many out there that can't. For those servers, we'd have to fall
back to issuing multiple read RPC calls in parallel (which is what we
do today if the user sets an rsize < PAGE_SIZE). This leads to
unnecessary load on the server, which has to deal with multiple RPC
calls for data that won't be used.

The other point is that if your network bandwidth is limited, there is
value in avoiding reads for data that isn't going to be used, which is
why we changed the NFS readahead behaviour to be less aggressive than
it used to be.
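The rsize cost described above is easy to quantify with a trivial sketch (illustrative arithmetic only, not the NFS pgio code):

```c
#include <assert.h>
#include <stddef.h>

/* Number of READ RPCs needed to fill one folio when the negotiated
 * rsize is smaller than the folio. */
static unsigned nfs_reads_for_folio(size_t folio_size, size_t rsize)
{
    return (folio_size + rsize - 1) / rsize;   /* round up */
}
```

A server limited to 64 KiB reads needs 16 RPCs to bring a 1 MiB folio uptodate, all for a workload that may only want a few bytes of it.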

This is why I'm suggesting that if you really want to cut down the LRU
table size, you'll want finer-grained uptodate tracking than the folio.
It's not so much for the case of writes as it is for the read-mostly
workloads.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 2/3] NFS: Add support for netfs in struct nfs_inode and Kconfig
  2022-08-25 15:32                 ` Trond Myklebust
@ 2022-08-25 17:53                   ` Matthew Wilcox
  0 siblings, 0 replies; 16+ messages in thread
From: Matthew Wilcox @ 2022-08-25 17:53 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: linux-cachefs, linux-nfs, daire.byrne, anna.schumaker,
	benmaynard, dhowells, dwysocha

On Thu, Aug 25, 2022 at 03:32:25PM +0000, Trond Myklebust wrote:
> I'm not talking about the transition of dirty->clean. We already deal
> with that. I'm talking about supporting large folios on read-mainly
> workloads.
> 
> NFS can happily support 1MB sized folios, or even larger than that if
> there is a compelling reason to do so.
> 
> However, having to read in the entire folio contents if the user is
> just asking for a few bytes on a database-style random read workload
> can quickly get onerous.

Ah, we don't do that.  If the user is only asking for a few bytes and
there's no indication that it's part of a streaming read, we allocate &
fill a single page, just as before.  We adapt to the user's workload
and only allocate multi-page folios when there's evidence that it'll
be useful to do so.
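The adaptive behaviour described here can be sketched as a simple folio-order ramp; the step size and cap below are illustrative assumptions, not the actual readahead heuristics:

```c
#include <assert.h>

/* Illustrative cap: 2 MiB folios with 4 KiB base pages. */
#define MAX_ORDER 9

/* Choose the order of the next folio to allocate: stay at order-0 for
 * random access, grow while readahead keeps seeing sequential I/O. */
static unsigned next_folio_order(unsigned cur_order, int sequential)
{
    if (!sequential)
        return 0;                      /* random: single pages */
    if (cur_order + 2 > MAX_ORDER)     /* grow, respecting the cap */
        return MAX_ORDER;
    return cur_order + 2;
}
```

So a database-style random reader keeps paying for single pages only, while a streaming reader quickly reaches large folios.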


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-08-25 17:53 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-24  9:34 [RFC PATCH 0/3] Convert NFS to the new netfs API Dave Wysochanski
2022-08-24  9:34 ` [RFC PATCH 1/3] NFS: Rename readpage_async_filler to nfs_pageio_add_page Dave Wysochanski
2022-08-24  9:35 ` [RFC PATCH 2/3] NFS: Add support for netfs in struct nfs_inode and Kconfig Dave Wysochanski
2022-08-24 12:42   ` Trond Myklebust
2022-08-24 13:00     ` David Wysochanski
2022-08-24 13:05       ` Trond Myklebust
2022-08-24 14:12       ` David Howells
2022-08-24 16:27         ` Trond Myklebust
2022-08-24 16:53           ` Matthew Wilcox
2022-08-24 17:43             ` Trond Myklebust
2022-08-25 15:01               ` Matthew Wilcox
2022-08-25 15:32                 ` Trond Myklebust
2022-08-25 17:53                   ` Matthew Wilcox
2022-08-25 15:30               ` David Howells
2022-08-25 15:20         ` David Howells
2022-08-24  9:35 ` [RFC PATCH 3/3] NFS: Convert nfs_read_folio and nfs_readahead to netfs APIs Dave Wysochanski
