* [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache
@ 2024-03-28 16:33 David Howells
  2024-03-28 16:33 ` [PATCH 01/26] cifs: Fix duplicate fscache cookie warnings David Howells
                   ` (31 more replies)
  0 siblings, 32 replies; 62+ messages in thread
From: David Howells @ 2024-03-28 16:33 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: Paulo Alcantara, Tom Talpey, Shyam Prasad N, linux-cifs,
	linux-fsdevel, v9fs, ceph-devel, linux-nfs, Matthew Wilcox,
	linux-kernel, David Howells, Steve French, linux-cachefs,
	linux-mm, netdev, Marc Dionne, netfs, Ilya Dryomov,
	Eric Van Hensbergen, linux-erofs, linux-afs

Hi Christian, Willy,

The primary purpose of these patches is to rework the netfslib writeback
implementation such that pages read from the cache are written to the cache
through ->writepages(), thereby allowing the fscache page flag to be
retired.

The reworking also:

 (1) builds on top of the new writeback_iter() infrastructure;

 (2) makes it possible to use vectored write RPCs as discontiguous streams
     of pages can be accommodated;

 (3) makes it easier to do simultaneous content crypto and stream division;

 (4) provides support for retrying writes and re-dividing a stream;

 (5) replaces the ->launder_folio() op, so that ->writepages() is used
     instead;

 (6) uses mempools to allocate the netfs_io_request and netfs_io_subrequest
     structs to avoid allocation failure in the writeback path.
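
To illustrate (6): keeping a small mempool reserve in front of the request
slab is what guarantees forward progress under memory pressure.  The sketch
below is purely illustrative, using hypothetical example_* names rather
than the series' actual helpers:

    #include <linux/mempool.h>
    #include <linux/netfs.h>
    #include <linux/slab.h>

    static struct kmem_cache *example_rreq_slab;
    static mempool_t *example_rreq_pool;

    static int example_init_pools(void)
    {
            example_rreq_slab = KMEM_CACHE(netfs_io_request, 0);
            if (!example_rreq_slab)
                    return -ENOMEM;

            /* Keep a few requests in reserve so that writeback can always
             * make progress. */
            example_rreq_pool = mempool_create_slab_pool(4, example_rreq_slab);
            if (!example_rreq_pool) {
                    kmem_cache_destroy(example_rreq_slab);
                    return -ENOMEM;
            }
            return 0;
    }

    static struct netfs_io_request *example_alloc_rreq(void)
    {
            /* A sleeping mempool allocation waits for a reserved element
             * rather than failing. */
            return mempool_alloc(example_rreq_pool, GFP_NOFS);
    }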

Some code that uses the fscache page flag is retained for compatibility
purposes with nfs and ceph.  The code is switched to using the synonymous
private_2 label instead and marked with deprecation comments.  I have a
separate set of patches that convert cifs to use this code.

-~-

In this new implementation, writeback_iter() is used to pump folios,
progressively creating two parallel but separate streams.  Either or both
streams can contain gaps, and the subrequests in each stream can be of
variable size, need not align with each other and need not align with the
folios.  (Note that more streams can be added if we have multiple servers
to duplicate data to.)

Indeed, a subrequest can cross folio boundaries and may cover several
folios, or a single folio may be spanned by multiple subrequests, e.g.:

         +---+---+-----+-----+---+----------+
Folios:  |   |   |     |     |   |          |
         +---+---+-----+-----+---+----------+

           +------+------+     +----+----+
Upload:    |      |      |.....|    |    |
           +------+------+     +----+----+

         +------+------+------+------+------+
Cache:   |      |      |      |      |      |
         +------+------+------+------+------+

Data read from the server that needs copying to the cache is stored in
folios that are marked dirty and have folio->private set to a special
value.
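
Roughly speaking (see patch 04 for the actual code), the marking step boils
down to the following; the wrapper name here is hypothetical:

    #include <linux/netfs.h>
    #include <linux/pagemap.h>

    static void example_mark_folio_for_copy(struct folio *folio)
    {
            /* Don't overwrite a writeback group that already owns the
             * folio's private pointer. */
            if (!folio_get_private(folio)) {
                    folio_attach_private(folio, NETFS_FOLIO_COPY_TO_CACHE);
                    filemap_dirty_folio(folio->mapping, folio);
            }
    }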

The progressive subrequest construction permits the algorithm to prepare
both the next upload to the server and the next write to the cache whilst
the previous ones are still in progress.  Throttling can be applied to
control the rate of production of subrequests - and, in any case, we
probably want to write them to the server in ascending order, particularly
if the file will be extended.

Content crypto can also be prepared at the same time as the subrequests and
run asynchronously, with the prepped requests being stalled until the
crypto catches up with them.  This might also be useful for transport
crypto, but that happens at a lower layer, so probably would be harder to
pull off.

The algorithm is split into three parts:

 (1) The issuer.  This walks through the data, packaging it up, encrypting
     it and creating subrequests.  The part of this that generates
     subrequests only deals with file positions and spans and so is usable
     for DIO/unbuffered writes as well as buffered writes.  (A sketch of
     the folio-pumping loop appears after this list.)

 (2) The collector.  This asynchronously collects completed subrequests,
     unlocks folios, frees crypto buffers and performs any retries.  This
     runs in a work queue so that the issuer can return to the caller for
     writeback (so that the VM can have its kswapd thread back) or async
     writes.

     Collection is slightly complex as the collector has to work out where
     discontiguities happen in the folio list so that it doesn't try to
     collect folios that weren't included in the write-out.

 (3) The retryer.  This pauses the issuer, waits for all outstanding
     subrequests to complete and then goes through the failed subrequests
     to reissue them.  This may involve reprepping them (with cifs, the
     credits must be renegotiated and a subrequest may need splitting), and
     doing RMW for content crypto if there's a conflicting change on the
     server.
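
The following is a rough sketch of the issuer's folio-pumping loop
mentioned in (1), built on the writeback_iter() interface that this series
exports; example_issue_folio() is a hypothetical stand-in for the real
subrequest construction:

    #include <linux/mm.h>
    #include <linux/pagemap.h>
    #include <linux/writeback.h>

    /* Stub: this is where the real issuer would package the folio's data
     * into upload/cache subrequests. */
    static int example_issue_folio(struct writeback_control *wbc,
                                   struct folio *folio)
    {
            folio_start_writeback(folio);
            folio_unlock(folio);
            /* ... hand the folio's range to the write streams ... */
            folio_end_writeback(folio);
            return 0;
    }

    static int example_issue_writeback(struct address_space *mapping,
                                       struct writeback_control *wbc)
    {
            struct folio *folio = NULL;
            int error = 0;

            /* Each folio comes back locked with its dirty flag cleared
             * for I/O; the previous folio's error is fed back so the
             * iterator can decide whether to stop. */
            while ((folio = writeback_iter(mapping, wbc, folio, &error)))
                    error = example_issue_folio(wbc, folio);

            return error;
    }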

David

David Howells (26):
  cifs: Fix duplicate fscache cookie warnings
  9p: Clean up some kdoc and unused var warnings.
  netfs: Update i_blocks when write committed to pagecache
  netfs: Replace PG_fscache by setting folio->private and marking dirty
  mm: Remove the PG_fscache alias for PG_private_2
  netfs: Remove deprecated use of PG_private_2 as a second writeback
    flag
  netfs: Make netfs_io_request::subreq_counter an atomic_t
  netfs: Use subreq_counter to allocate subreq debug_index values
  mm: Provide a means of invalidation without using launder_folio
  cifs: Use alternative invalidation to using launder_folio
  9p: Use alternative invalidation to using launder_folio
  afs: Use alternative invalidation to using launder_folio
  netfs: Remove ->launder_folio() support
  netfs: Use mempools for allocating requests and subrequests
  mm: Export writeback_iter()
  netfs: Switch to using unsigned long long rather than loff_t
  netfs: Fix writethrough-mode error handling
  netfs: Add some write-side stats and clean up some stat names
  netfs: New writeback implementation
  netfs, afs: Implement helpers for new write code
  netfs, 9p: Implement helpers for new write code
  netfs, cachefiles: Implement helpers for new write code
  netfs: Cut over to using new writeback code
  netfs: Remove the old writeback code
  netfs: Miscellaneous tidy ups
  netfs, afs: Use writeback retry to deal with alternate keys

 fs/9p/vfs_addr.c             |  60 +--
 fs/9p/vfs_inode_dotl.c       |   4 -
 fs/afs/file.c                |   8 +-
 fs/afs/internal.h            |   6 +-
 fs/afs/validation.c          |   4 +-
 fs/afs/write.c               | 187 ++++----
 fs/cachefiles/io.c           |  75 +++-
 fs/ceph/addr.c               |  24 +-
 fs/ceph/inode.c              |   2 +
 fs/netfs/Makefile            |   3 +-
 fs/netfs/buffered_read.c     |  40 +-
 fs/netfs/buffered_write.c    | 832 ++++-------------------------------
 fs/netfs/direct_write.c      |  30 +-
 fs/netfs/fscache_io.c        |  14 +-
 fs/netfs/internal.h          |  55 ++-
 fs/netfs/io.c                | 155 +------
 fs/netfs/main.c              |  55 ++-
 fs/netfs/misc.c              |  10 +-
 fs/netfs/objects.c           |  81 +++-
 fs/netfs/output.c            | 478 --------------------
 fs/netfs/stats.c             |  17 +-
 fs/netfs/write_collect.c     | 813 ++++++++++++++++++++++++++++++++++
 fs/netfs/write_issue.c       | 673 ++++++++++++++++++++++++++++
 fs/nfs/file.c                |   8 +-
 fs/nfs/fscache.h             |   6 +-
 fs/nfs/write.c               |   4 +-
 fs/smb/client/cifsfs.h       |   1 -
 fs/smb/client/file.c         | 136 +-----
 fs/smb/client/fscache.c      |  16 +-
 fs/smb/client/inode.c        |  27 +-
 include/linux/fscache.h      |  22 +-
 include/linux/netfs.h        | 196 +++++----
 include/linux/pagemap.h      |   1 +
 include/net/9p/client.h      |   2 +
 include/trace/events/netfs.h | 249 ++++++++++-
 mm/filemap.c                 |  52 ++-
 mm/page-writeback.c          |   1 +
 net/9p/Kconfig               |   1 +
 net/9p/client.c              |  49 +++
 net/9p/trans_fd.c            |   1 -
 40 files changed, 2492 insertions(+), 1906 deletions(-)
 delete mode 100644 fs/netfs/output.c
 create mode 100644 fs/netfs/write_collect.c
 create mode 100644 fs/netfs/write_issue.c


* [PATCH 01/26] cifs: Fix duplicate fscache cookie warnings
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
@ 2024-03-28 16:33 ` David Howells
  2024-04-15 11:25   ` Jeff Layton
  2024-04-15 13:03   ` David Howells
  2024-03-28 16:33 ` [PATCH 02/26] 9p: Clean up some kdoc and unused var warnings David Howells
                   ` (30 subsequent siblings)
  31 siblings, 2 replies; 62+ messages in thread
From: David Howells @ 2024-03-28 16:33 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: David Howells, linux-mm, Marc Dionne, linux-afs, Paulo Alcantara,
	linux-cifs, Matthew Wilcox, Steve French, linux-cachefs,
	Ilya Dryomov, Shyam Prasad N, Shyam Prasad N, Tom Talpey,
	ceph-devel, Eric Van Hensbergen, linux-nfs, Rohith Surabattula,
	netdev, v9fs, linux-kernel, Steve French, linux-fsdevel, netfs,
	linux-erofs

fscache emits a lot of duplicate cookie warnings with cifs because the
index key for the fscache cookies does not include everything that the
cifs_find_inode() function compares.  The latter is used with
iget5_locked() to distinguish between inodes in the local inode cache.

Fix this by adding the creation time and file type to the fscache cookie
key.

Additionally, add a couple of comments to note that if one is changed, the
other must be changed too.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/smb/client/fscache.c | 16 +++++++++++++++-
 fs/smb/client/inode.c   |  2 ++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/fs/smb/client/fscache.c b/fs/smb/client/fscache.c
index c4a3cb736881..340efce8f052 100644
--- a/fs/smb/client/fscache.c
+++ b/fs/smb/client/fscache.c
@@ -12,6 +12,16 @@
 #include "cifs_fs_sb.h"
 #include "cifsproto.h"
 
+/*
+ * Key for fscache inode.  [!] Contents must match comparisons in cifs_find_inode().
+ */
+struct cifs_fscache_inode_key {
+
+	__le64  uniqueid;	/* server inode number */
+	__le64  createtime;	/* creation time on server */
+	u8	type;		/* S_IFMT file type */
+} __packed;
+
 static void cifs_fscache_fill_volume_coherency(
 	struct cifs_tcon *tcon,
 	struct cifs_fscache_volume_coherency_data *cd)
@@ -97,15 +107,19 @@ void cifs_fscache_release_super_cookie(struct cifs_tcon *tcon)
 void cifs_fscache_get_inode_cookie(struct inode *inode)
 {
 	struct cifs_fscache_inode_coherency_data cd;
+	struct cifs_fscache_inode_key key;
 	struct cifsInodeInfo *cifsi = CIFS_I(inode);
 	struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
 	struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb);
 
+	key.uniqueid	= cpu_to_le64(cifsi->uniqueid);
+	key.createtime	= cpu_to_le64(cifsi->createtime);
+	key.type	= (inode->i_mode & S_IFMT) >> 12;
 	cifs_fscache_fill_coherency(&cifsi->netfs.inode, &cd);
 
 	cifsi->netfs.cache =
 		fscache_acquire_cookie(tcon->fscache, 0,
-				       &cifsi->uniqueid, sizeof(cifsi->uniqueid),
+				       &key, sizeof(key),
 				       &cd, sizeof(cd),
 				       i_size_read(&cifsi->netfs.inode));
 	if (cifsi->netfs.cache)
diff --git a/fs/smb/client/inode.c b/fs/smb/client/inode.c
index d28ab0af6049..91b07ef9e25c 100644
--- a/fs/smb/client/inode.c
+++ b/fs/smb/client/inode.c
@@ -1351,6 +1351,8 @@ cifs_find_inode(struct inode *inode, void *opaque)
 {
 	struct cifs_fattr *fattr = opaque;
 
+	/* [!] The compared values must be the same in struct cifs_fscache_inode_key. */
+
 	/* don't match inode with different uniqueid */
 	if (CIFS_I(inode)->uniqueid != fattr->cf_uniqueid)
 		return 0;


* [PATCH 02/26] 9p: Clean up some kdoc and unused var warnings.
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
  2024-03-28 16:33 ` [PATCH 01/26] cifs: Fix duplicate fscache cookie warnings David Howells
@ 2024-03-28 16:33 ` David Howells
  2024-03-28 16:33 ` [PATCH 03/26] netfs: Update i_blocks when write committed to pagecache David Howells
                   ` (29 subsequent siblings)
  31 siblings, 0 replies; 62+ messages in thread
From: David Howells @ 2024-03-28 16:33 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: Latchesar Ionkov, Christian Schoenebeck, David Howells, linux-mm,
	Marc Dionne, linux-afs, Paulo Alcantara, linux-cifs,
	Matthew Wilcox, Steve French, linux-cachefs, Ilya Dryomov,
	Shyam Prasad N, Tom Talpey, ceph-devel, Eric Van Hensbergen,
	linux-nfs, netdev, v9fs, linux-kernel, linux-fsdevel, netfs,
	linux-erofs

Remove the kdoc for the removed 'req' member of the 9p_conn struct.

Remove a pair of set-but-not-used v9ses variables.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: v9fs@lists.linux.dev
---
 fs/9p/vfs_inode_dotl.c | 4 ----
 net/9p/trans_fd.c      | 1 -
 2 files changed, 5 deletions(-)

diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index ef9db3e03506..7af27ba1c25d 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -297,7 +297,6 @@ static int v9fs_vfs_mkdir_dotl(struct mnt_idmap *idmap,
 			       umode_t omode)
 {
 	int err;
-	struct v9fs_session_info *v9ses;
 	struct p9_fid *fid = NULL, *dfid = NULL;
 	kgid_t gid;
 	const unsigned char *name;
@@ -307,7 +306,6 @@ static int v9fs_vfs_mkdir_dotl(struct mnt_idmap *idmap,
 	struct posix_acl *dacl = NULL, *pacl = NULL;
 
 	p9_debug(P9_DEBUG_VFS, "name %pd\n", dentry);
-	v9ses = v9fs_inode2v9ses(dir);
 
 	omode |= S_IFDIR;
 	if (dir->i_mode & S_ISGID)
@@ -739,7 +737,6 @@ v9fs_vfs_mknod_dotl(struct mnt_idmap *idmap, struct inode *dir,
 	kgid_t gid;
 	const unsigned char *name;
 	umode_t mode;
-	struct v9fs_session_info *v9ses;
 	struct p9_fid *fid = NULL, *dfid = NULL;
 	struct inode *inode;
 	struct p9_qid qid;
@@ -749,7 +746,6 @@ v9fs_vfs_mknod_dotl(struct mnt_idmap *idmap, struct inode *dir,
 		 dir->i_ino, dentry, omode,
 		 MAJOR(rdev), MINOR(rdev));
 
-	v9ses = v9fs_inode2v9ses(dir);
 	dfid = v9fs_parent_fid(dentry);
 	if (IS_ERR(dfid)) {
 		err = PTR_ERR(dfid);
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 1a3948b8c493..196060dc6138 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -95,7 +95,6 @@ struct p9_poll_wait {
  * @unsent_req_list: accounting for requests that haven't been sent
  * @rreq: read request
  * @wreq: write request
- * @req: current request being processed (if any)
  * @tmp_buf: temporary buffer to read in header
  * @rc: temporary fcall for reading current frame
  * @wpos: write position for current frame


* [PATCH 03/26] netfs: Update i_blocks when write committed to pagecache
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
  2024-03-28 16:33 ` [PATCH 01/26] cifs: Fix duplicate fscache cookie warnings David Howells
  2024-03-28 16:33 ` [PATCH 02/26] 9p: Clean up some kdoc and unused var warnings David Howells
@ 2024-03-28 16:33 ` David Howells
  2024-04-15 11:28   ` Jeff Layton
  2024-04-16 22:47   ` David Howells
  2024-03-28 16:33 ` [PATCH 04/26] netfs: Replace PG_fscache by setting folio->private and marking dirty David Howells
                   ` (28 subsequent siblings)
  31 siblings, 2 replies; 62+ messages in thread
From: David Howells @ 2024-03-28 16:33 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: David Howells, linux-mm, Marc Dionne, linux-afs, Paulo Alcantara,
	linux-cifs, Matthew Wilcox, Steve French, linux-cachefs,
	Ilya Dryomov, Shyam Prasad N, Shyam Prasad N, Tom Talpey,
	ceph-devel, Eric Van Hensbergen, linux-nfs, Rohith Surabattula,
	netdev, v9fs, linux-kernel, Steve French, linux-fsdevel, netfs,
	linux-erofs

Update i_blocks when i_size is updated after we finish making a write to
the pagecache, to reflect the amount of space we expect to be consumed.
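
For illustration only (not part of the patch): with 512-byte sectors, an
i_size of 1000 and a 3000-byte copy taking pos to 4000, the gap up to the
next sector boundary is 512 - (1000 & 511) = 24 bytes, so the estimate adds
DIV_ROUND_UP(3000 - 24, 512) = 6 blocks, capped at DIV_ROUND_UP(4000, 512)
= 8.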

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
---
 fs/netfs/buffered_write.c | 45 +++++++++++++++++++++++++++++----------
 1 file changed, 34 insertions(+), 11 deletions(-)

diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index 9a0d32e4b422..c194655a6dcf 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -130,6 +130,37 @@ static struct folio *netfs_grab_folio_for_write(struct address_space *mapping,
 				   mapping_gfp_mask(mapping));
 }
 
+/*
+ * Update i_size and estimate the update to i_blocks to reflect the additional
+ * data written into the pagecache until we can find out from the server what
+ * the values actually are.
+ */
+static void netfs_update_i_size(struct netfs_inode *ctx, struct inode *inode,
+				loff_t i_size, loff_t pos, size_t copied)
+{
+	blkcnt_t add;
+	size_t gap;
+
+	if (ctx->ops->update_i_size) {
+		ctx->ops->update_i_size(inode, pos);
+		return;
+	}
+
+	i_size_write(inode, pos);
+#if IS_ENABLED(CONFIG_FSCACHE)
+	fscache_update_cookie(ctx->cache, NULL, &pos);
+#endif
+
+	gap = SECTOR_SIZE - (i_size & (SECTOR_SIZE - 1));
+	if (copied > gap) {
+		add = DIV_ROUND_UP(copied - gap, SECTOR_SIZE);
+
+		inode->i_blocks = min_t(blkcnt_t,
+					DIV_ROUND_UP(pos, SECTOR_SIZE),
+					inode->i_blocks + add);
+	}
+}
+
 /**
  * netfs_perform_write - Copy data into the pagecache.
  * @iocb: The operation parameters
@@ -352,18 +383,10 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
 		trace_netfs_folio(folio, trace);
 
 		/* Update the inode size if we moved the EOF marker */
-		i_size = i_size_read(inode);
 		pos += copied;
-		if (pos > i_size) {
-			if (ctx->ops->update_i_size) {
-				ctx->ops->update_i_size(inode, pos);
-			} else {
-				i_size_write(inode, pos);
-#if IS_ENABLED(CONFIG_FSCACHE)
-				fscache_update_cookie(ctx->cache, NULL, &pos);
-#endif
-			}
-		}
+		i_size = i_size_read(inode);
+		if (pos > i_size)
+			netfs_update_i_size(ctx, inode, i_size, pos, copied);
 		written += copied;
 
 		if (likely(!wreq)) {


* [PATCH 04/26] netfs: Replace PG_fscache by setting folio->private and marking dirty
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (2 preceding siblings ...)
  2024-03-28 16:33 ` [PATCH 03/26] netfs: Update i_blocks when write committed to pagecache David Howells
@ 2024-03-28 16:33 ` David Howells
  2024-03-28 16:33 ` [PATCH 05/26] mm: Remove the PG_fscache alias for PG_private_2 David Howells
                   ` (27 subsequent siblings)
  31 siblings, 0 replies; 62+ messages in thread
From: David Howells @ 2024-03-28 16:33 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: Latchesar Ionkov, Christian Schoenebeck, David Howells, linux-mm,
	Marc Dionne, linux-afs, Paulo Alcantara, linux-cifs, Xiubo Li,
	Matthew Wilcox, Steve French, linux-cachefs, Ilya Dryomov,
	Shyam Prasad N, Tom Talpey, Bharath SM, Ronnie Sahlberg,
	ceph-devel, Eric Van Hensbergen, Trond Myklebust, linux-nfs,
	netdev, v9fs, linux-kernel, Steve French, Anna Schumaker,
	linux-fsdevel, netfs, linux-erofs

When dirty data is being written to the cache, setting/waiting on/clearing
the fscache flag is always done in tandem with setting/waiting on/clearing
the writeback flag.  The netfslib buffered write routines wait on and set
both flags and the write request cleanup clears both flags, so the fscache
flag is almost superfluous.

The reason it isn't superfluous is that the fscache flag is also used to
indicate that data just read from the server is being written to the cache.
The flag is used to prevent a race involving overlapping direct-I/O writes
to the cache.

Change this to indicate that a folio is in need of being copied to the
cache by placing a magic value in folio->private and marking it dirty.
Then, when the writeback code sees a folio marked in this way, it writes it
only to the cache and not to the server.

If a folio that has this magic value set is modified, the value is just
replaced and the folio will then be uploaded too.

With this, PG_fscache is no longer required by the netfslib core, 9p and
afs.

Ceph and nfs, however, still need to use the old PG_fscache-based tracking.
To deal with this, a flag, NETFS_ICTX_USE_PGPRIV2, now has to be set in the
flags in the netfs_inode struct for those filesystems.  This re-enables the
use of PG_fscache in that inode.  9p and afs use the netfslib write helpers,
so they get switched over; cifs, for the moment, does page-by-page manual
access to the cache, so it doesn't use PG_fscache and is unaffected.
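
For reference, opting in is just a matter of setting the flag when the
netfs context is initialised, as ceph and nfs do in this patch; here
sketched with a hypothetical helper:

    #include <linux/netfs.h>

    static void example_fs_init_netfs(struct netfs_inode *nctx,
                                      const struct netfs_request_ops *ops)
    {
            netfs_inode_init(nctx, ops, false);
            /* [DEPRECATED] Keep using PG_private_2 to track writes to
             * the cache. */
            __set_bit(NETFS_ICTX_USE_PGPRIV2, &nctx->flags);
    }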

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Matthew Wilcox (Oracle) <willy@infradead.org>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Xiubo Li <xiubli@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
cc: Shyam Prasad N <sprasad@microsoft.com>
cc: Tom Talpey <tom@talpey.com>
cc: Bharath SM <bharathsm@microsoft.com>
cc: Trond Myklebust <trond.myklebust@hammerspace.com>
cc: Anna Schumaker <anna@kernel.org>
cc: netfs@lists.linux.dev
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
---
 fs/ceph/addr.c               |  2 +-
 fs/ceph/inode.c              |  2 +
 fs/netfs/buffered_read.c     | 36 ++++++++++----
 fs/netfs/buffered_write.c    | 93 +++++++++++++++++-------------------
 fs/netfs/fscache_io.c        | 12 +++--
 fs/netfs/internal.h          | 10 ++--
 fs/netfs/io.c                | 18 +++----
 fs/netfs/main.c              |  1 +
 fs/netfs/misc.c              | 10 +---
 fs/netfs/objects.c           |  6 ++-
 fs/nfs/fscache.h             |  2 +
 include/linux/fscache.h      | 22 +++++----
 include/linux/netfs.h        | 20 ++++++--
 include/trace/events/netfs.h |  6 ++-
 14 files changed, 143 insertions(+), 97 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 1340d77124ae..57cbae134b37 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -517,7 +517,7 @@ static void ceph_fscache_write_to_cache(struct inode *inode, u64 off, u64 len, b
 	struct fscache_cookie *cookie = ceph_fscache_cookie(ci);
 
 	fscache_write_to_cache(cookie, inode->i_mapping, off, len, i_size_read(inode),
-			       ceph_fscache_write_terminated, inode, caching);
+			       ceph_fscache_write_terminated, inode, true, caching);
 }
 #else
 static inline void ceph_set_page_fscache(struct page *page)
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 7b2e77517f23..99561fddcb38 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -577,6 +577,8 @@ struct inode *ceph_alloc_inode(struct super_block *sb)
 
 	/* Set parameters for the netfs library */
 	netfs_inode_init(&ci->netfs, &ceph_netfs_ops, false);
+	/* [DEPRECATED] Use PG_private_2 to mark folio being written to the cache. */
+	__set_bit(NETFS_ICTX_USE_PGPRIV2, &ci->netfs.flags);
 
 	spin_lock_init(&ci->i_ceph_lock);
 
diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 3298c29b5548..6d49319c82c6 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -10,8 +10,11 @@
 #include "internal.h"
 
 /*
- * Unlock the folios in a read operation.  We need to set PG_fscache on any
+ * Unlock the folios in a read operation.  We need to set PG_writeback on any
  * folios we're going to write back before we unlock them.
+ *
+ * Note that if the deprecated NETFS_RREQ_USE_PGPRIV2 is set then we use
+ * PG_private_2 and do a direct write to the cache from here instead.
  */
 void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
 {
@@ -48,14 +51,14 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
 	xas_for_each(&xas, folio, last_page) {
 		loff_t pg_end;
 		bool pg_failed = false;
-		bool folio_started;
+		bool wback_to_cache = false;
+		bool folio_started = false;
 
 		if (xas_retry(&xas, folio))
 			continue;
 
 		pg_end = folio_pos(folio) + folio_size(folio) - 1;
 
-		folio_started = false;
 		for (;;) {
 			loff_t sreq_end;
 
@@ -63,10 +66,16 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
 				pg_failed = true;
 				break;
 			}
-			if (!folio_started && test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags)) {
-				trace_netfs_folio(folio, netfs_folio_trace_copy_to_cache);
-				folio_start_fscache(folio);
-				folio_started = true;
+			if (test_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags)) {
+				if (!folio_started && test_bit(NETFS_SREQ_COPY_TO_CACHE,
+							       &subreq->flags)) {
+					trace_netfs_folio(folio, netfs_folio_trace_copy_to_cache);
+					folio_start_fscache(folio);
+					folio_started = true;
+				}
+			} else {
+				wback_to_cache |=
+					test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
 			}
 			pg_failed |= subreq_failed;
 			sreq_end = subreq->start + subreq->len - 1;
@@ -98,6 +107,11 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
 				kfree(finfo);
 			}
 			folio_mark_uptodate(folio);
+			if (wback_to_cache && !WARN_ON_ONCE(folio_get_private(folio) != NULL)) {
+				trace_netfs_folio(folio, netfs_folio_trace_copy_to_cache);
+				folio_attach_private(folio, NETFS_FOLIO_COPY_TO_CACHE);
+				filemap_dirty_folio(folio->mapping, folio);
+			}
 		}
 
 		if (!test_bit(NETFS_RREQ_DONT_UNLOCK_FOLIOS, &rreq->flags)) {
@@ -491,9 +505,11 @@ int netfs_write_begin(struct netfs_inode *ctx,
 	netfs_put_request(rreq, false, netfs_rreq_trace_put_return);
 
 have_folio:
-	ret = folio_wait_fscache_killable(folio);
-	if (ret < 0)
-		goto error;
+	if (test_bit(NETFS_ICTX_USE_PGPRIV2, &ctx->flags)) {
+		ret = folio_wait_fscache_killable(folio);
+		if (ret < 0)
+			goto error;
+	}
 have_folio_no_wait:
 	*_folio = folio;
 	_leave(" = 0");
diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index c194655a6dcf..576a68b7887e 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -30,21 +30,13 @@ static void netfs_cleanup_buffered_write(struct netfs_io_request *wreq);
 
 static void netfs_set_group(struct folio *folio, struct netfs_group *netfs_group)
 {
-	if (netfs_group && !folio_get_private(folio))
-		folio_attach_private(folio, netfs_get_group(netfs_group));
-}
+	void *priv = folio_get_private(folio);
 
-#if IS_ENABLED(CONFIG_FSCACHE)
-static void netfs_folio_start_fscache(bool caching, struct folio *folio)
-{
-	if (caching)
-		folio_start_fscache(folio);
-}
-#else
-static void netfs_folio_start_fscache(bool caching, struct folio *folio)
-{
+	if (netfs_group && (!priv || priv == NETFS_FOLIO_COPY_TO_CACHE))
+		folio_attach_private(folio, netfs_get_group(netfs_group));
+	else if (!netfs_group && priv == NETFS_FOLIO_COPY_TO_CACHE)
+		folio_detach_private(folio);
 }
-#endif
 
 /*
  * Decide how we should modify a folio.  We might be attempting to do
@@ -63,11 +55,12 @@ static enum netfs_how_to_modify netfs_how_to_modify(struct netfs_inode *ctx,
 						    bool maybe_trouble)
 {
 	struct netfs_folio *finfo = netfs_folio_info(folio);
+	struct netfs_group *group = netfs_folio_group(folio);
 	loff_t pos = folio_file_pos(folio);
 
 	_enter("");
 
-	if (netfs_folio_group(folio) != netfs_group)
+	if (group != netfs_group && group != NETFS_FOLIO_COPY_TO_CACHE)
 		return NETFS_FLUSH_CONTENT;
 
 	if (folio_test_uptodate(folio))
@@ -397,9 +390,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
 				folio_clear_dirty_for_io(folio);
 			/* We make multiple writes to the folio... */
 			if (!folio_test_writeback(folio)) {
-				folio_wait_fscache(folio);
 				folio_start_writeback(folio);
-				folio_start_fscache(folio);
 				if (wreq->iter.count == 0)
 					trace_netfs_folio(folio, netfs_folio_trace_wthru);
 				else
@@ -527,6 +518,7 @@ EXPORT_SYMBOL(netfs_file_write_iter);
  */
 vm_fault_t netfs_page_mkwrite(struct vm_fault *vmf, struct netfs_group *netfs_group)
 {
+	struct netfs_group *group;
 	struct folio *folio = page_folio(vmf->page);
 	struct file *file = vmf->vma->vm_file;
 	struct inode *inode = file_inode(file);
@@ -549,7 +541,8 @@ vm_fault_t netfs_page_mkwrite(struct vm_fault *vmf, struct netfs_group *netfs_gr
 		goto out;
 	}
 
-	if (netfs_folio_group(folio) != netfs_group) {
+	group = netfs_folio_group(folio);
+	if (group != netfs_group && group != NETFS_FOLIO_COPY_TO_CACHE) {
 		folio_unlock(folio);
 		err = filemap_fdatawait_range(inode->i_mapping,
 					      folio_pos(folio),
@@ -605,8 +598,6 @@ static void netfs_kill_pages(struct address_space *mapping,
 
 		trace_netfs_folio(folio, netfs_folio_trace_kill);
 		folio_clear_uptodate(folio);
-		if (folio_test_fscache(folio))
-			folio_end_fscache(folio);
 		folio_end_writeback(folio);
 		folio_lock(folio);
 		generic_error_remove_folio(mapping, folio);
@@ -642,8 +633,6 @@ static void netfs_redirty_pages(struct address_space *mapping,
 		next = folio_next_index(folio);
 		trace_netfs_folio(folio, netfs_folio_trace_redirty);
 		filemap_dirty_folio(mapping, folio);
-		if (folio_test_fscache(folio))
-			folio_end_fscache(folio);
 		folio_end_writeback(folio);
 		folio_put(folio);
 	} while (index = next, index <= last);
@@ -699,7 +688,11 @@ static void netfs_pages_written_back(struct netfs_io_request *wreq)
 				if (!folio_test_dirty(folio)) {
 					folio_detach_private(folio);
 					gcount++;
-					trace_netfs_folio(folio, netfs_folio_trace_clear_g);
+					if (group == NETFS_FOLIO_COPY_TO_CACHE)
+						trace_netfs_folio(folio,
+								  netfs_folio_trace_end_copy);
+					else
+						trace_netfs_folio(folio, netfs_folio_trace_clear_g);
 				} else {
 					trace_netfs_folio(folio, netfs_folio_trace_redirtied);
 				}
@@ -723,8 +716,6 @@ static void netfs_pages_written_back(struct netfs_io_request *wreq)
 			trace_netfs_folio(folio, netfs_folio_trace_clear);
 		}
 	end_wb:
-		if (folio_test_fscache(folio))
-			folio_end_fscache(folio);
 		xas_advance(&xas, folio_next_index(folio) - 1);
 		folio_end_writeback(folio);
 	}
@@ -794,7 +785,6 @@ static void netfs_extend_writeback(struct address_space *mapping,
 				   long *_count,
 				   loff_t start,
 				   loff_t max_len,
-				   bool caching,
 				   size_t *_len,
 				   size_t *_top)
 {
@@ -845,8 +835,7 @@ static void netfs_extend_writeback(struct address_space *mapping,
 				break;
 			}
 			if (!folio_test_dirty(folio) ||
-			    folio_test_writeback(folio) ||
-			    folio_test_fscache(folio)) {
+			    folio_test_writeback(folio)) {
 				folio_unlock(folio);
 				folio_put(folio);
 				xas_reset(xas);
@@ -859,7 +848,8 @@ static void netfs_extend_writeback(struct address_space *mapping,
 			if ((const struct netfs_group *)priv != group) {
 				stop = true;
 				finfo = netfs_folio_info(folio);
-				if (finfo->netfs_group != group ||
+				if (!finfo ||
+				    finfo->netfs_group != group ||
 				    finfo->dirty_offset > 0) {
 					folio_unlock(folio);
 					folio_put(folio);
@@ -893,12 +883,14 @@ static void netfs_extend_writeback(struct address_space *mapping,
 
 		for (i = 0; i < folio_batch_count(&fbatch); i++) {
 			folio = fbatch.folios[i];
-			trace_netfs_folio(folio, netfs_folio_trace_store_plus);
+			if (group == NETFS_FOLIO_COPY_TO_CACHE)
+				trace_netfs_folio(folio, netfs_folio_trace_copy_plus);
+			else
+				trace_netfs_folio(folio, netfs_folio_trace_store_plus);
 
 			if (!folio_clear_dirty_for_io(folio))
 				BUG();
 			folio_start_writeback(folio);
-			netfs_folio_start_fscache(caching, folio);
 			folio_unlock(folio);
 		}
 
@@ -924,14 +916,14 @@ static ssize_t netfs_write_back_from_locked_folio(struct address_space *mapping,
 	struct netfs_inode *ctx = netfs_inode(mapping->host);
 	unsigned long long i_size = i_size_read(&ctx->inode);
 	size_t len, max_len;
-	bool caching = netfs_is_cache_enabled(ctx);
 	long count = wbc->nr_to_write;
 	int ret;
 
-	_enter(",%lx,%llx-%llx,%u", folio->index, start, end, caching);
+	_enter(",%lx,%llx-%llx", folio->index, start, end);
 
 	wreq = netfs_alloc_request(mapping, NULL, start, folio_size(folio),
-				   NETFS_WRITEBACK);
+				   group == NETFS_FOLIO_COPY_TO_CACHE ?
+				   NETFS_COPY_TO_CACHE : NETFS_WRITEBACK);
 	if (IS_ERR(wreq)) {
 		folio_unlock(folio);
 		return PTR_ERR(wreq);
@@ -940,7 +932,6 @@ static ssize_t netfs_write_back_from_locked_folio(struct address_space *mapping,
 	if (!folio_clear_dirty_for_io(folio))
 		BUG();
 	folio_start_writeback(folio);
-	netfs_folio_start_fscache(caching, folio);
 
 	count -= folio_nr_pages(folio);
 
@@ -949,7 +940,10 @@ static ssize_t netfs_write_back_from_locked_folio(struct address_space *mapping,
 	 * immediately lockable, is not dirty or is missing, or we reach the
 	 * end of the range.
 	 */
-	trace_netfs_folio(folio, netfs_folio_trace_store);
+	if (group == NETFS_FOLIO_COPY_TO_CACHE)
+		trace_netfs_folio(folio, netfs_folio_trace_copy);
+	else
+		trace_netfs_folio(folio, netfs_folio_trace_store);
 
 	len = wreq->len;
 	finfo = netfs_folio_info(folio);
@@ -972,7 +966,7 @@ static ssize_t netfs_write_back_from_locked_folio(struct address_space *mapping,
 
 		if (len < max_len)
 			netfs_extend_writeback(mapping, group, xas, &count, start,
-					       max_len, caching, &len, &wreq->upper_len);
+					       max_len, &len, &wreq->upper_len);
 	}
 
 cant_expand:
@@ -996,15 +990,18 @@ static ssize_t netfs_write_back_from_locked_folio(struct address_space *mapping,
 
 		iov_iter_xarray(&wreq->iter, ITER_SOURCE, &mapping->i_pages, start,
 				wreq->upper_len);
-		__set_bit(NETFS_RREQ_UPLOAD_TO_SERVER, &wreq->flags);
-		ret = netfs_begin_write(wreq, true, netfs_write_trace_writeback);
+		if (group != NETFS_FOLIO_COPY_TO_CACHE) {
+			__set_bit(NETFS_RREQ_UPLOAD_TO_SERVER, &wreq->flags);
+			ret = netfs_begin_write(wreq, true, netfs_write_trace_writeback);
+		} else {
+			ret = netfs_begin_write(wreq, true, netfs_write_trace_copy_to_cache);
+		}
 		if (ret == 0 || ret == -EIOCBQUEUED)
 			wbc->nr_to_write -= len / PAGE_SIZE;
 	} else {
 		_debug("write discard %zx @%llx [%llx]", len, start, i_size);
 
 		/* The dirty region was entirely beyond the EOF. */
-		fscache_clear_page_bits(mapping, start, len, caching);
 		netfs_pages_written_back(wreq);
 		ret = 0;
 	}
@@ -1057,9 +1054,11 @@ static ssize_t netfs_writepages_begin(struct address_space *mapping,
 
 		/* Skip any dirty folio that's not in the group of interest. */
 		priv = folio_get_private(folio);
-		if ((const struct netfs_group *)priv != group) {
-			finfo = netfs_folio_info(folio);
-			if (finfo->netfs_group != group) {
+		if ((const struct netfs_group *)priv == NETFS_FOLIO_COPY_TO_CACHE) {
+			group = NETFS_FOLIO_COPY_TO_CACHE;
+		} else if ((const struct netfs_group *)priv != group) {
+			finfo = __netfs_folio_info(priv);
+			if (!finfo || finfo->netfs_group != group) {
 				folio_put(folio);
 				continue;
 			}
@@ -1098,14 +1097,10 @@ static ssize_t netfs_writepages_begin(struct address_space *mapping,
 		goto search_again;
 	}
 
-	if (folio_test_writeback(folio) ||
-	    folio_test_fscache(folio)) {
+	if (folio_test_writeback(folio)) {
 		folio_unlock(folio);
 		if (wbc->sync_mode != WB_SYNC_NONE) {
 			folio_wait_writeback(folio);
-#ifdef CONFIG_FSCACHE
-			folio_wait_fscache(folio);
-#endif
 			goto lock_again;
 		}
 
@@ -1264,7 +1259,8 @@ int netfs_launder_folio(struct folio *folio)
 
 	bvec_set_folio(&bvec, folio, len, offset);
 	iov_iter_bvec(&wreq->iter, ITER_SOURCE, &bvec, 1, len);
-	__set_bit(NETFS_RREQ_UPLOAD_TO_SERVER, &wreq->flags);
+	if (group != NETFS_FOLIO_COPY_TO_CACHE)
+		__set_bit(NETFS_RREQ_UPLOAD_TO_SERVER, &wreq->flags);
 	ret = netfs_begin_write(wreq, true, netfs_write_trace_launder);
 
 out_put:
@@ -1273,7 +1269,6 @@ int netfs_launder_folio(struct folio *folio)
 	kfree(finfo);
 	netfs_put_request(wreq, false, netfs_rreq_trace_put_return);
 out:
-	folio_wait_fscache(folio);
 	_leave(" = %d", ret);
 	return ret;
 }
diff --git a/fs/netfs/fscache_io.c b/fs/netfs/fscache_io.c
index 43a651ed8264..5028f2ae30da 100644
--- a/fs/netfs/fscache_io.c
+++ b/fs/netfs/fscache_io.c
@@ -166,6 +166,7 @@ struct fscache_write_request {
 	loff_t			start;
 	size_t			len;
 	bool			set_bits;
+	bool			using_pgpriv2;
 	netfs_io_terminated_t	term_func;
 	void			*term_func_priv;
 };
@@ -197,8 +198,9 @@ static void fscache_wreq_done(void *priv, ssize_t transferred_or_error,
 {
 	struct fscache_write_request *wreq = priv;
 
-	fscache_clear_page_bits(wreq->mapping, wreq->start, wreq->len,
-				wreq->set_bits);
+	if (wreq->using_pgpriv2)
+		fscache_clear_page_bits(wreq->mapping, wreq->start, wreq->len,
+					wreq->set_bits);
 
 	if (wreq->term_func)
 		wreq->term_func(wreq->term_func_priv, transferred_or_error,
@@ -212,7 +214,7 @@ void __fscache_write_to_cache(struct fscache_cookie *cookie,
 			      loff_t start, size_t len, loff_t i_size,
 			      netfs_io_terminated_t term_func,
 			      void *term_func_priv,
-			      bool cond)
+			      bool using_pgpriv2, bool cond)
 {
 	struct fscache_write_request *wreq;
 	struct netfs_cache_resources *cres;
@@ -230,6 +232,7 @@ void __fscache_write_to_cache(struct fscache_cookie *cookie,
 	wreq->mapping		= mapping;
 	wreq->start		= start;
 	wreq->len		= len;
+	wreq->using_pgpriv2	= using_pgpriv2;
 	wreq->set_bits		= cond;
 	wreq->term_func		= term_func;
 	wreq->term_func_priv	= term_func_priv;
@@ -257,7 +260,8 @@ void __fscache_write_to_cache(struct fscache_cookie *cookie,
 abandon_free:
 	kfree(wreq);
 abandon:
-	fscache_clear_page_bits(mapping, start, len, cond);
+	if (using_pgpriv2)
+		fscache_clear_page_bits(mapping, start, len, cond);
 	if (term_func)
 		term_func(term_func_priv, ret, false);
 }
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index ec7045d24400..156ab138e224 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -168,7 +168,7 @@ static inline bool netfs_is_cache_enabled(struct netfs_inode *ctx)
  */
 static inline struct netfs_group *netfs_get_group(struct netfs_group *netfs_group)
 {
-	if (netfs_group)
+	if (netfs_group && netfs_group != NETFS_FOLIO_COPY_TO_CACHE)
 		refcount_inc(&netfs_group->ref);
 	return netfs_group;
 }
@@ -178,7 +178,9 @@ static inline struct netfs_group *netfs_get_group(struct netfs_group *netfs_grou
  */
 static inline void netfs_put_group(struct netfs_group *netfs_group)
 {
-	if (netfs_group && refcount_dec_and_test(&netfs_group->ref))
+	if (netfs_group &&
+	    netfs_group != NETFS_FOLIO_COPY_TO_CACHE &&
+	    refcount_dec_and_test(&netfs_group->ref))
 		netfs_group->free(netfs_group);
 }
 
@@ -187,7 +189,9 @@ static inline void netfs_put_group(struct netfs_group *netfs_group)
  */
 static inline void netfs_put_group_many(struct netfs_group *netfs_group, int nr)
 {
-	if (netfs_group && refcount_sub_and_test(nr, &netfs_group->ref))
+	if (netfs_group &&
+	    netfs_group != NETFS_FOLIO_COPY_TO_CACHE &&
+	    refcount_sub_and_test(nr, &netfs_group->ref))
 		netfs_group->free(netfs_group);
 }
 
diff --git a/fs/netfs/io.c b/fs/netfs/io.c
index 4261ad6c55b6..b3b9827a9709 100644
--- a/fs/netfs/io.c
+++ b/fs/netfs/io.c
@@ -99,8 +99,9 @@ static void netfs_rreq_completed(struct netfs_io_request *rreq, bool was_async)
 }
 
 /*
- * Deal with the completion of writing the data to the cache.  We have to clear
- * the PG_fscache bits on the folios involved and release the caller's ref.
+ * [DEPRECATED] Deal with the completion of writing the data to the cache.  We
+ * have to clear the PG_fscache bits on the folios involved and release the
+ * caller's ref.
  *
  * May be called in softirq mode and we inherit a ref from the caller.
  */
@@ -138,7 +139,7 @@ static void netfs_rreq_unmark_after_write(struct netfs_io_request *rreq,
 }
 
 static void netfs_rreq_copy_terminated(void *priv, ssize_t transferred_or_error,
-				       bool was_async)
+				       bool was_async) /* [DEPRECATED] */
 {
 	struct netfs_io_subrequest *subreq = priv;
 	struct netfs_io_request *rreq = subreq->rreq;
@@ -161,8 +162,8 @@ static void netfs_rreq_copy_terminated(void *priv, ssize_t transferred_or_error,
 }
 
 /*
- * Perform any outstanding writes to the cache.  We inherit a ref from the
- * caller.
+ * [DEPRECATED] Perform any outstanding writes to the cache.  We inherit a ref
+ * from the caller.
  */
 static void netfs_rreq_do_write_to_cache(struct netfs_io_request *rreq)
 {
@@ -222,7 +223,7 @@ static void netfs_rreq_do_write_to_cache(struct netfs_io_request *rreq)
 		netfs_rreq_unmark_after_write(rreq, false);
 }
 
-static void netfs_rreq_write_to_cache_work(struct work_struct *work)
+static void netfs_rreq_write_to_cache_work(struct work_struct *work) /* [DEPRECATED] */
 {
 	struct netfs_io_request *rreq =
 		container_of(work, struct netfs_io_request, work);
@@ -230,7 +231,7 @@ static void netfs_rreq_write_to_cache_work(struct work_struct *work)
 	netfs_rreq_do_write_to_cache(rreq);
 }
 
-static void netfs_rreq_write_to_cache(struct netfs_io_request *rreq)
+static void netfs_rreq_write_to_cache(struct netfs_io_request *rreq) /* [DEPRECATED] */
 {
 	rreq->work.func = netfs_rreq_write_to_cache_work;
 	if (!queue_work(system_unbound_wq, &rreq->work))
@@ -409,7 +410,8 @@ static void netfs_rreq_assess(struct netfs_io_request *rreq, bool was_async)
 	clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
 	wake_up_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS);
 
-	if (test_bit(NETFS_RREQ_COPY_TO_CACHE, &rreq->flags))
+	if (test_bit(NETFS_RREQ_COPY_TO_CACHE, &rreq->flags) &&
+	    test_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags))
 		return netfs_rreq_write_to_cache(rreq);
 
 	netfs_rreq_completed(rreq, was_async);
diff --git a/fs/netfs/main.c b/fs/netfs/main.c
index 5e77618a7940..c5a73c9ed126 100644
--- a/fs/netfs/main.c
+++ b/fs/netfs/main.c
@@ -31,6 +31,7 @@ static const char *netfs_origins[nr__netfs_io_origin] = {
 	[NETFS_READAHEAD]		= "RA",
 	[NETFS_READPAGE]		= "RP",
 	[NETFS_READ_FOR_WRITE]		= "RW",
+	[NETFS_COPY_TO_CACHE]		= "CC",
 	[NETFS_WRITEBACK]		= "WB",
 	[NETFS_WRITETHROUGH]		= "WT",
 	[NETFS_LAUNDER_WRITE]		= "LW",
diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c
index 90051ced8e2a..bc1fc54fb724 100644
--- a/fs/netfs/misc.c
+++ b/fs/netfs/misc.c
@@ -177,13 +177,11 @@ EXPORT_SYMBOL(netfs_clear_inode_writeback);
  */
 void netfs_invalidate_folio(struct folio *folio, size_t offset, size_t length)
 {
-	struct netfs_folio *finfo = NULL;
+	struct netfs_folio *finfo;
 	size_t flen = folio_size(folio);
 
 	_enter("{%lx},%zx,%zx", folio->index, offset, length);
 
-	folio_wait_fscache(folio);
-
 	if (!folio_test_private(folio))
 		return;
 
@@ -248,12 +246,6 @@ bool netfs_release_folio(struct folio *folio, gfp_t gfp)
 
 	if (folio_test_private(folio))
 		return false;
-	if (folio_test_fscache(folio)) {
-		if (current_is_kswapd() || !(gfp & __GFP_FS))
-			return false;
-		folio_wait_fscache(folio);
-	}
-
 	fscache_note_page_release(netfs_i_cookie(ctx));
 	return true;
 }
diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
index 610ceb5bd86c..72b52f070270 100644
--- a/fs/netfs/objects.c
+++ b/fs/netfs/objects.c
@@ -45,8 +45,12 @@ struct netfs_io_request *netfs_alloc_request(struct address_space *mapping,
 	refcount_set(&rreq->ref, 1);
 
 	__set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
-	if (cached)
+	if (cached) {
 		__set_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags);
+		if (test_bit(NETFS_ICTX_USE_PGPRIV2, &ctx->flags))
+			/* Filesystem uses deprecated PG_private_2 marking. */
+			__set_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags);
+	}
 	if (file && file->f_flags & O_NONBLOCK)
 		__set_bit(NETFS_RREQ_NONBLOCK, &rreq->flags);
 	if (rreq->netfs_ops->init_request) {
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index e3cb4923316b..814363d1d7c7 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -81,6 +81,8 @@ static inline void nfs_netfs_put(struct nfs_netfs_io_data *netfs)
 static inline void nfs_netfs_inode_init(struct nfs_inode *nfsi)
 {
 	netfs_inode_init(&nfsi->netfs, &nfs_netfs_ops, false);
+	/* [DEPRECATED] Use PG_private_2 to mark folio being written to the cache. */
+	__set_bit(NETFS_ICTX_USE_PGPRIV2, &nfsi->netfs.flags);
 }
 extern void nfs_netfs_initiate_read(struct nfs_pgio_header *hdr);
 extern void nfs_netfs_read_completion(struct nfs_pgio_header *hdr);
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 6e8562cbcc43..9de27643607f 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -172,9 +172,12 @@ extern void __fscache_invalidate(struct fscache_cookie *, const void *, loff_t,
 extern int __fscache_begin_read_operation(struct netfs_cache_resources *, struct fscache_cookie *);
 extern int __fscache_begin_write_operation(struct netfs_cache_resources *, struct fscache_cookie *);
 
-extern void __fscache_write_to_cache(struct fscache_cookie *, struct address_space *,
-				     loff_t, size_t, loff_t, netfs_io_terminated_t, void *,
-				     bool);
+void __fscache_write_to_cache(struct fscache_cookie *cookie,
+			      struct address_space *mapping,
+			      loff_t start, size_t len, loff_t i_size,
+			      netfs_io_terminated_t term_func,
+			      void *term_func_priv,
+			      bool using_pgpriv2, bool cond);
 extern void __fscache_clear_page_bits(struct address_space *, loff_t, size_t);
 
 /**
@@ -597,7 +600,8 @@ static inline void fscache_clear_page_bits(struct address_space *mapping,
  * @i_size: The new size of the inode
  * @term_func: The function to call upon completion
  * @term_func_priv: The private data for @term_func
- * @caching: If PG_fscache has been set
+ * @using_pgpriv2: If we're using PG_private_2 to mark in-progress write
+ * @caching: If we actually want to do the caching
  *
  * Helper function for a netfs to write dirty data from an inode into the cache
  * object that's backing it.
@@ -608,19 +612,21 @@ static inline void fscache_clear_page_bits(struct address_space *mapping,
  * marked with PG_fscache.
  *
  * If given, @term_func will be called upon completion and supplied with
- * @term_func_priv.  Note that the PG_fscache flags will have been cleared by
- * this point, so the netfs must retain its own pin on the mapping.
+ * @term_func_priv.  Note that if @using_pgpriv2 is set, the PG_private_2 flags
+ * will have been cleared by this point, so the netfs must retain its own pin
+ * on the mapping.
  */
 static inline void fscache_write_to_cache(struct fscache_cookie *cookie,
 					  struct address_space *mapping,
 					  loff_t start, size_t len, loff_t i_size,
 					  netfs_io_terminated_t term_func,
 					  void *term_func_priv,
-					  bool caching)
+					  bool using_pgpriv2, bool caching)
 {
 	if (caching)
 		__fscache_write_to_cache(cookie, mapping, start, len, i_size,
-					 term_func, term_func_priv, caching);
+					 term_func, term_func_priv,
+					 using_pgpriv2, caching);
 	else if (term_func)
 		term_func(term_func_priv, -ENOBUFS, false);
 
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 100cbb261269..f5e9c5f84a0c 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -143,6 +143,8 @@ struct netfs_inode {
 #define NETFS_ICTX_UNBUFFERED	1		/* I/O should not use the pagecache */
 #define NETFS_ICTX_WRITETHROUGH	2		/* Write-through caching */
 #define NETFS_ICTX_NO_WRITE_STREAMING	3	/* Don't engage in write-streaming */
+#define NETFS_ICTX_USE_PGPRIV2	31		/* [DEPRECATED] Use PG_private_2 to mark
+						 * write to cache on read */
 };
 
 /*
@@ -165,16 +167,25 @@ struct netfs_folio {
 	unsigned int		dirty_len;	/* Write-streaming dirty data length */
 };
 #define NETFS_FOLIO_INFO	0x1UL	/* OR'd with folio->private. */
+#define NETFS_FOLIO_COPY_TO_CACHE ((struct netfs_group *)0x356UL) /* Write to the cache only */
 
-static inline struct netfs_folio *netfs_folio_info(struct folio *folio)
+static inline bool netfs_is_folio_info(const void *priv)
 {
-	void *priv = folio_get_private(folio);
+	return (unsigned long)priv & NETFS_FOLIO_INFO;
+}
 
-	if ((unsigned long)priv & NETFS_FOLIO_INFO)
+static inline struct netfs_folio *__netfs_folio_info(const void *priv)
+{
+	if (netfs_is_folio_info(priv))
 		return (struct netfs_folio *)((unsigned long)priv & ~NETFS_FOLIO_INFO);
 	return NULL;
 }
 
+static inline struct netfs_folio *netfs_folio_info(struct folio *folio)
+{
+	return __netfs_folio_info(folio_get_private(folio));
+}
+
 static inline struct netfs_group *netfs_folio_group(struct folio *folio)
 {
 	struct netfs_folio *finfo;
@@ -230,6 +241,7 @@ enum netfs_io_origin {
 	NETFS_READAHEAD,		/* This read was triggered by readahead */
 	NETFS_READPAGE,			/* This read is a synchronous read */
 	NETFS_READ_FOR_WRITE,		/* This read is to prepare a write */
+	NETFS_COPY_TO_CACHE,		/* This write is to copy a read to the cache */
 	NETFS_WRITEBACK,		/* This write was triggered by writepages */
 	NETFS_WRITETHROUGH,		/* This write was made by netfs_perform_write() */
 	NETFS_LAUNDER_WRITE,		/* This is triggered by ->launder_folio() */
@@ -287,6 +299,8 @@ struct netfs_io_request {
 #define NETFS_RREQ_UPLOAD_TO_SERVER	8	/* Need to write to the server */
 #define NETFS_RREQ_NONBLOCK		9	/* Don't block if possible (O_NONBLOCK) */
 #define NETFS_RREQ_BLOCKED		10	/* We blocked */
+#define NETFS_RREQ_USE_PGPRIV2		31	/* [DEPRECATED] Use PG_private_2 to mark
+						 * write to cache on read */
 	const struct netfs_request_ops *netfs_ops;
 	void (*cleanup)(struct netfs_io_request *req);
 };
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index 447a8c21cf57..e03fafb0c1e3 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -24,6 +24,7 @@
 	E_(netfs_read_trace_write_begin,	"WRITEBEGN")
 
 #define netfs_write_traces					\
+	EM(netfs_write_trace_copy_to_cache,	"COPY2CACH")	\
 	EM(netfs_write_trace_dio_write,		"DIO-WRITE")	\
 	EM(netfs_write_trace_launder,		"LAUNDER  ")	\
 	EM(netfs_write_trace_unbuffered_write,	"UNB-WRITE")	\
@@ -34,6 +35,7 @@
 	EM(NETFS_READAHEAD,			"RA")		\
 	EM(NETFS_READPAGE,			"RP")		\
 	EM(NETFS_READ_FOR_WRITE,		"RW")		\
+	EM(NETFS_COPY_TO_CACHE,			"CC")		\
 	EM(NETFS_WRITEBACK,			"WB")		\
 	EM(NETFS_WRITETHROUGH,			"WT")		\
 	EM(NETFS_LAUNDER_WRITE,			"LW")		\
@@ -127,7 +129,9 @@
 	EM(netfs_folio_trace_clear,		"clear")	\
 	EM(netfs_folio_trace_clear_s,		"clear-s")	\
 	EM(netfs_folio_trace_clear_g,		"clear-g")	\
-	EM(netfs_folio_trace_copy_to_cache,	"copy")		\
+	EM(netfs_folio_trace_copy,		"copy")		\
+	EM(netfs_folio_trace_copy_plus,		"copy+")	\
+	EM(netfs_folio_trace_copy_to_cache,	"mark-copy")	\
 	EM(netfs_folio_trace_end_copy,		"end-copy")	\
 	EM(netfs_folio_trace_filled_gaps,	"filled-gaps")	\
 	EM(netfs_folio_trace_kill,		"kill")		\


* [PATCH 05/26] mm: Remove the PG_fscache alias for PG_private_2
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (3 preceding siblings ...)
  2024-03-28 16:33 ` [PATCH 04/26] netfs: Replace PG_fscache by setting folio->private and marking dirty David Howells
@ 2024-03-28 16:33 ` David Howells
  2024-03-28 16:33 ` [PATCH 06/26] netfs: Remove deprecated use of PG_private_2 as a second writeback flag David Howells
                   ` (26 subsequent siblings)
  31 siblings, 0 replies; 62+ messages in thread
From: David Howells @ 2024-03-28 16:33 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: David Howells, linux-mm, Marc Dionne, linux-afs, Paulo Alcantara,
	linux-cifs, Eric Van Hensbergen, Matthew Wilcox, Steve French,
	linux-cachefs, Ilya Dryomov, Shyam Prasad N, Tom Talpey,
	Bharath SM, Ronnie Sahlberg, ceph-devel, Xiubo Li,
	Trond Myklebust, linux-nfs, netdev, v9fs, linux-kernel,
	Steve French, Anna Schumaker, linux-fsdevel, netfs, linux-erofs

Remove the PG_fscache alias for PG_private_2 and use the latter directly.
Use of this flag for marking pages undergoing a write to the cache should
be considered deprecated; the folios should instead be marked dirty and the
write done in ->writepages().

Note that PG_private_2 itself should be considered deprecated and up for
future removal by the MM folks too.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Matthew Wilcox (Oracle) <willy@infradead.org>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Xiubo Li <xiubli@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
cc: Shyam Prasad N <sprasad@microsoft.com>
cc: Tom Talpey <tom@talpey.com>
cc: Bharath SM <bharathsm@microsoft.com>
cc: Trond Myklebust <trond.myklebust@hammerspace.com>
cc: Anna Schumaker <anna@kernel.org>
cc: netfs@lists.linux.dev
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
---
 fs/ceph/addr.c           | 11 +++---
 fs/netfs/buffered_read.c |  4 +-
 fs/netfs/fscache_io.c    |  2 +-
 fs/netfs/io.c            |  2 +-
 fs/nfs/file.c            |  8 ++--
 fs/nfs/fscache.h         |  4 +-
 fs/nfs/write.c           |  4 +-
 fs/smb/client/file.c     | 16 ++++----
 include/linux/netfs.h    | 80 ++--------------------------------------
 mm/filemap.c             |  6 +--
 10 files changed, 33 insertions(+), 104 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 57cbae134b37..75690f969ebc 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -500,7 +500,7 @@ const struct netfs_request_ops ceph_netfs_ops = {
 #ifdef CONFIG_CEPH_FSCACHE
 static void ceph_set_page_fscache(struct page *page)
 {
-	set_page_fscache(page);
+	folio_start_private_2(page_folio(page)); /* [DEPRECATED] */
 }
 
 static void ceph_fscache_write_terminated(void *priv, ssize_t error, bool was_async)
@@ -798,7 +798,7 @@ static int ceph_writepage(struct page *page, struct writeback_control *wbc)
 	    ceph_inode_to_fs_client(inode)->write_congested)
 		return AOP_WRITEPAGE_ACTIVATE;
 
-	wait_on_page_fscache(page);
+	folio_wait_private_2(page_folio(page)); /* [DEPRECATED] */
 
 	err = writepage_nounlock(page, wbc);
 	if (err == -ERESTARTSYS) {
@@ -1073,7 +1073,8 @@ static int ceph_writepages_start(struct address_space *mapping,
 				unlock_page(page);
 				break;
 			}
-			if (PageWriteback(page) || PageFsCache(page)) {
+			if (PageWriteback(page) ||
+			    PagePrivate2(page) /* [DEPRECATED] */) {
 				if (wbc->sync_mode == WB_SYNC_NONE) {
 					doutc(cl, "%p under writeback\n", page);
 					unlock_page(page);
@@ -1081,7 +1082,7 @@ static int ceph_writepages_start(struct address_space *mapping,
 				}
 				doutc(cl, "waiting on writeback %p\n", page);
 				wait_on_page_writeback(page);
-				wait_on_page_fscache(page);
+				folio_wait_private_2(page_folio(page)); /* [DEPRECATED] */
 			}
 
 			if (!clear_page_dirty_for_io(page)) {
@@ -1511,7 +1512,7 @@ static int ceph_write_begin(struct file *file, struct address_space *mapping,
 	if (r < 0)
 		return r;
 
-	folio_wait_fscache(folio);
+	folio_wait_private_2(folio); /* [DEPRECATED] */
 	WARN_ON_ONCE(!folio_test_locked(folio));
 	*pagep = &folio->page;
 	return 0;
diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 6d49319c82c6..b3fd6e1fa322 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -70,7 +70,7 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
 				if (!folio_started && test_bit(NETFS_SREQ_COPY_TO_CACHE,
 							       &subreq->flags)) {
 					trace_netfs_folio(folio, netfs_folio_trace_copy_to_cache);
-					folio_start_fscache(folio);
+					folio_start_private_2(folio);
 					folio_started = true;
 				}
 			} else {
@@ -506,7 +506,7 @@ int netfs_write_begin(struct netfs_inode *ctx,
 
 have_folio:
 	if (test_bit(NETFS_ICTX_USE_PGPRIV2, &ctx->flags)) {
-		ret = folio_wait_fscache_killable(folio);
+		ret = folio_wait_private_2_killable(folio);
 		if (ret < 0)
 			goto error;
 	}
diff --git a/fs/netfs/fscache_io.c b/fs/netfs/fscache_io.c
index 5028f2ae30da..38637e5c9b57 100644
--- a/fs/netfs/fscache_io.c
+++ b/fs/netfs/fscache_io.c
@@ -183,7 +183,7 @@ void __fscache_clear_page_bits(struct address_space *mapping,
 
 		rcu_read_lock();
 		xas_for_each(&xas, page, last) {
-			end_page_fscache(page);
+			folio_end_private_2(page_folio(page));
 		}
 		rcu_read_unlock();
 	}
diff --git a/fs/netfs/io.c b/fs/netfs/io.c
index b3b9827a9709..60a19f96e0ce 100644
--- a/fs/netfs/io.c
+++ b/fs/netfs/io.c
@@ -129,7 +129,7 @@ static void netfs_rreq_unmark_after_write(struct netfs_io_request *rreq,
 				continue;
 			unlocked = folio_next_index(folio) - 1;
 			trace_netfs_folio(folio, netfs_folio_trace_end_copy);
-			folio_end_fscache(folio);
+			folio_end_private_2(folio);
 			have_unlocked = true;
 		}
 	}
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 407c6e15afe2..6bd127e6683d 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -433,7 +433,7 @@ static void nfs_invalidate_folio(struct folio *folio, size_t offset,
 		return;
 	/* Cancel any unstarted writes on this page */
 	nfs_wb_folio_cancel(inode, folio);
-	folio_wait_fscache(folio);
+	folio_wait_private_2(folio); /* [DEPRECATED] */
 	trace_nfs_invalidate_folio(inode, folio);
 }
 
@@ -500,7 +500,7 @@ static int nfs_launder_folio(struct folio *folio)
 	dfprintk(PAGECACHE, "NFS: launder_folio(%ld, %llu)\n",
 		inode->i_ino, folio_pos(folio));
 
-	folio_wait_fscache(folio);
+	folio_wait_private_2(folio); /* [DEPRECATED] */
 	ret = nfs_wb_folio(inode, folio);
 	trace_nfs_launder_folio_done(inode, folio, ret);
 	return ret;
@@ -593,8 +593,8 @@ static vm_fault_t nfs_vm_page_mkwrite(struct vm_fault *vmf)
 	sb_start_pagefault(inode->i_sb);
 
 	/* make sure the cache has finished storing the page */
-	if (folio_test_fscache(folio) &&
-	    folio_wait_fscache_killable(folio) < 0) {
+	if (folio_test_private_2(folio) && /* [DEPRECATED] */
+	    folio_wait_private_2_killable(folio) < 0) {
 		ret = VM_FAULT_RETRY;
 		goto out;
 	}
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 814363d1d7c7..fbed0027996f 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -103,10 +103,10 @@ extern int nfs_netfs_read_folio(struct file *file, struct folio *folio);
 
 static inline bool nfs_fscache_release_folio(struct folio *folio, gfp_t gfp)
 {
-	if (folio_test_fscache(folio)) {
+	if (folio_test_private_2(folio)) { /* [DEPRECATED] */
 		if (current_is_kswapd() || !(gfp & __GFP_FS))
 			return false;
-		folio_wait_fscache(folio);
+		folio_wait_private_2(folio);
 	}
 	fscache_note_page_release(netfs_i_cookie(netfs_inode(folio->mapping->host)));
 	return true;
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 5de85d725fb9..2329cbb0e446 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -2120,10 +2120,10 @@ int nfs_migrate_folio(struct address_space *mapping, struct folio *dst,
 	if (folio_test_private(src))
 		return -EBUSY;
 
-	if (folio_test_fscache(src)) {
+	if (folio_test_private_2(src)) { /* [DEPRECATED] */
 		if (mode == MIGRATE_ASYNC)
 			return -EBUSY;
-		folio_wait_fscache(src);
+		folio_wait_private_2(src);
 	}
 
 	return migrate_folio(mapping, dst, src, mode);
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 16aadce492b2..59da572d3384 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2953,12 +2953,12 @@ static ssize_t cifs_writepages_begin(struct address_space *mapping,
 	}
 
 	if (folio_test_writeback(folio) ||
-	    folio_test_fscache(folio)) {
+	    folio_test_private_2(folio)) { /* [DEPRECATED] */
 		folio_unlock(folio);
 		if (wbc->sync_mode != WB_SYNC_NONE) {
 			folio_wait_writeback(folio);
 #ifdef CONFIG_CIFS_FSCACHE
-			folio_wait_fscache(folio);
+			folio_wait_private_2(folio);
 #endif
 			goto lock_again;
 		}
@@ -4431,8 +4431,8 @@ static vm_fault_t cifs_page_mkwrite(struct vm_fault *vmf)
 	 * be modified.  We then assume the entire folio will need writing back.
 	 */
 #ifdef CONFIG_CIFS_FSCACHE
-	if (folio_test_fscache(folio) &&
-	    folio_wait_fscache_killable(folio) < 0)
+	if (folio_test_private_2(folio) && /* [DEPRECATED] */
+	    folio_wait_private_2_killable(folio) < 0)
 		return VM_FAULT_RETRY;
 #endif
 
@@ -4898,10 +4898,10 @@ static bool cifs_release_folio(struct folio *folio, gfp_t gfp)
 {
 	if (folio_test_private(folio))
 		return 0;
-	if (folio_test_fscache(folio)) {
+	if (folio_test_private_2(folio)) { /* [DEPRECATED] */
 		if (current_is_kswapd() || !(gfp & __GFP_FS))
 			return false;
-		folio_wait_fscache(folio);
+		folio_wait_private_2(folio);
 	}
 	fscache_note_page_release(cifs_inode_cookie(folio->mapping->host));
 	return true;
@@ -4910,7 +4910,7 @@ static bool cifs_release_folio(struct folio *folio, gfp_t gfp)
 static void cifs_invalidate_folio(struct folio *folio, size_t offset,
 				 size_t length)
 {
-	folio_wait_fscache(folio);
+	folio_wait_private_2(folio); /* [DEPRECATED] */
 }
 
 static int cifs_launder_folio(struct folio *folio)
@@ -4930,7 +4930,7 @@ static int cifs_launder_folio(struct folio *folio)
 	if (folio_clear_dirty_for_io(folio))
 		rc = cifs_writepage_locked(&folio->page, &wbc);
 
-	folio_wait_fscache(folio);
+	folio_wait_private_2(folio); /* [DEPRECATED] */
 	return rc;
 }
 
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index f5e9c5f84a0c..f36a6d8163d1 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -21,94 +21,22 @@
 
 enum netfs_sreq_ref_trace;
 
-/*
- * Overload PG_private_2 to give us PG_fscache - this is used to indicate that
- * a page is currently backed by a local disk cache
- */
-#define folio_test_fscache(folio)	folio_test_private_2(folio)
-#define PageFsCache(page)		PagePrivate2((page))
-#define SetPageFsCache(page)		SetPagePrivate2((page))
-#define ClearPageFsCache(page)		ClearPagePrivate2((page))
-#define TestSetPageFsCache(page)	TestSetPagePrivate2((page))
-#define TestClearPageFsCache(page)	TestClearPagePrivate2((page))
-
 /**
- * folio_start_fscache - Start an fscache write on a folio.
+ * folio_start_private_2 - Start an fscache write on a folio.  [DEPRECATED]
  * @folio: The folio.
  *
  * Call this function before writing a folio to a local cache.  Starting a
  * second write before the first one finishes is not allowed.
+ *
+ * Note that this should no longer be used.
  */
-static inline void folio_start_fscache(struct folio *folio)
+static inline void folio_start_private_2(struct folio *folio)
 {
 	VM_BUG_ON_FOLIO(folio_test_private_2(folio), folio);
 	folio_get(folio);
 	folio_set_private_2(folio);
 }
 
-/**
- * folio_end_fscache - End an fscache write on a folio.
- * @folio: The folio.
- *
- * Call this function after the folio has been written to the local cache.
- * This will wake any sleepers waiting on this folio.
- */
-static inline void folio_end_fscache(struct folio *folio)
-{
-	folio_end_private_2(folio);
-}
-
-/**
- * folio_wait_fscache - Wait for an fscache write on this folio to end.
- * @folio: The folio.
- *
- * If this folio is currently being written to a local cache, wait for
- * the write to finish.  Another write may start after this one finishes,
- * unless the caller holds the folio lock.
- */
-static inline void folio_wait_fscache(struct folio *folio)
-{
-	folio_wait_private_2(folio);
-}
-
-/**
- * folio_wait_fscache_killable - Wait for an fscache write on this folio to end.
- * @folio: The folio.
- *
- * If this folio is currently being written to a local cache, wait
- * for the write to finish or for a fatal signal to be received.
- * Another write may start after this one finishes, unless the caller
- * holds the folio lock.
- *
- * Return:
- * - 0 if successful.
- * - -EINTR if a fatal signal was encountered.
- */
-static inline int folio_wait_fscache_killable(struct folio *folio)
-{
-	return folio_wait_private_2_killable(folio);
-}
-
-static inline void set_page_fscache(struct page *page)
-{
-	folio_start_fscache(page_folio(page));
-}
-
-static inline void end_page_fscache(struct page *page)
-{
-	folio_end_private_2(page_folio(page));
-}
-
-static inline void wait_on_page_fscache(struct page *page)
-{
-	folio_wait_private_2(page_folio(page));
-}
-
-static inline int wait_on_page_fscache_killable(struct page *page)
-{
-	return folio_wait_private_2_killable(page_folio(page));
-}
-
 /* Marks used on xarray-based buffers */
 #define NETFS_BUF_PUT_MARK	XA_MARK_0	/* - Page needs putting  */
 #define NETFS_BUF_PAGECACHE_MARK XA_MARK_1	/* - Page needs wb/dirty flag wrangling */
diff --git a/mm/filemap.c b/mm/filemap.c
index 7437b2bd75c1..25983f0f96e3 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1540,7 +1540,7 @@ EXPORT_SYMBOL(folio_end_private_2);
  * folio_wait_private_2 - Wait for PG_private_2 to be cleared on a folio.
  * @folio: The folio to wait on.
  *
- * Wait for PG_private_2 (aka PG_fscache) to be cleared on a folio.
+ * Wait for PG_private_2 to be cleared on a folio.
  */
 void folio_wait_private_2(struct folio *folio)
 {
@@ -1553,8 +1553,8 @@ EXPORT_SYMBOL(folio_wait_private_2);
  * folio_wait_private_2_killable - Wait for PG_private_2 to be cleared on a folio.
  * @folio: The folio to wait on.
  *
- * Wait for PG_private_2 (aka PG_fscache) to be cleared on a folio or until a
- * fatal signal is received by the calling task.
+ * Wait for PG_private_2 to be cleared on a folio or until a fatal signal is
+ * received by the calling task.
  *
  * Return:
  * - 0 if successful.



* [PATCH 06/26] netfs: Remove deprecated use of PG_private_2 as a second writeback flag
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (4 preceding siblings ...)
  2024-03-28 16:33 ` [PATCH 05/26] mm: Remove the PG_fscache alias for PG_private_2 David Howells
@ 2024-03-28 16:33 ` David Howells
  2024-03-28 16:33 ` [PATCH 07/26] netfs: Make netfs_io_request::subreq_counter an atomic_t David Howells
                   ` (25 subsequent siblings)
  31 siblings, 0 replies; 62+ messages in thread
From: David Howells @ 2024-03-28 16:33 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: Paulo Alcantara, Tom Talpey, Shyam Prasad N, linux-cifs,
	linux-fsdevel, v9fs, ceph-devel, linux-nfs, Matthew Wilcox,
	linux-kernel, David Howells, Steve French, linux-cachefs,
	linux-mm, netdev, Marc Dionne, netfs, Ilya Dryomov,
	Eric Van Hensbergen, linux-erofs, linux-afs

Remove the deprecated use of PG_private_2 in netfslib.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Matthew Wilcox (Oracle) <willy@infradead.org>
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
---
 fs/ceph/addr.c           |  19 +-----
 fs/netfs/buffered_read.c |   8 +--
 fs/netfs/io.c            | 144 ---------------------------------------
 3 files changed, 2 insertions(+), 169 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 75690f969ebc..2d0f13537c85 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -498,11 +498,6 @@ const struct netfs_request_ops ceph_netfs_ops = {
 };
 
 #ifdef CONFIG_CEPH_FSCACHE
-static void ceph_set_page_fscache(struct page *page)
-{
-	folio_start_private_2(page_folio(page)); /* [DEPRECATED] */
-}
-
 static void ceph_fscache_write_terminated(void *priv, ssize_t error, bool was_async)
 {
 	struct inode *inode = priv;
@@ -520,10 +515,6 @@ static void ceph_fscache_write_to_cache(struct inode *inode, u64 off, u64 len, b
 			       ceph_fscache_write_terminated, inode, true, caching);
 }
 #else
-static inline void ceph_set_page_fscache(struct page *page)
-{
-}
-
 static inline void ceph_fscache_write_to_cache(struct inode *inode, u64 off, u64 len, bool caching)
 {
 }
@@ -715,8 +706,6 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
 		len = wlen;
 
 	set_page_writeback(page);
-	if (caching)
-		ceph_set_page_fscache(page);
 	ceph_fscache_write_to_cache(inode, page_off, len, caching);
 
 	if (IS_ENCRYPTED(inode)) {
@@ -798,8 +787,6 @@ static int ceph_writepage(struct page *page, struct writeback_control *wbc)
 	    ceph_inode_to_fs_client(inode)->write_congested)
 		return AOP_WRITEPAGE_ACTIVATE;
 
-	folio_wait_private_2(page_folio(page)); /* [DEPRECATED] */
-
 	err = writepage_nounlock(page, wbc);
 	if (err == -ERESTARTSYS) {
 		/* direct memory reclaimer was killed by SIGKILL. return 0
@@ -1073,8 +1060,7 @@ static int ceph_writepages_start(struct address_space *mapping,
 				unlock_page(page);
 				break;
 			}
-			if (PageWriteback(page) ||
-			    PagePrivate2(page) /* [DEPRECATED] */) {
+			if (PageWriteback(page)) {
 				if (wbc->sync_mode == WB_SYNC_NONE) {
 					doutc(cl, "%p under writeback\n", page);
 					unlock_page(page);
@@ -1082,7 +1068,6 @@ static int ceph_writepages_start(struct address_space *mapping,
 				}
 				doutc(cl, "waiting on writeback %p\n", page);
 				wait_on_page_writeback(page);
-				folio_wait_private_2(page_folio(page)); /* [DEPRECATED] */
 			}
 
 			if (!clear_page_dirty_for_io(page)) {
@@ -1267,8 +1252,6 @@ static int ceph_writepages_start(struct address_space *mapping,
 			}
 
 			set_page_writeback(page);
-			if (caching)
-				ceph_set_page_fscache(page);
 			len += thp_size(page);
 		}
 		ceph_fscache_write_to_cache(inode, offset, len, caching);
diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index b3fd6e1fa322..1622cce535a3 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -464,7 +464,7 @@ int netfs_write_begin(struct netfs_inode *ctx,
 	if (!netfs_is_cache_enabled(ctx) &&
 	    netfs_skip_folio_read(folio, pos, len, false)) {
 		netfs_stat(&netfs_n_rh_write_zskip);
-		goto have_folio_no_wait;
+		goto have_folio;
 	}
 
 	rreq = netfs_alloc_request(mapping, file,
@@ -505,12 +505,6 @@ int netfs_write_begin(struct netfs_inode *ctx,
 	netfs_put_request(rreq, false, netfs_rreq_trace_put_return);
 
 have_folio:
-	if (test_bit(NETFS_ICTX_USE_PGPRIV2, &ctx->flags)) {
-		ret = folio_wait_private_2_killable(folio);
-		if (ret < 0)
-			goto error;
-	}
-have_folio_no_wait:
 	*_folio = folio;
 	_leave(" = 0");
 	return 0;
diff --git a/fs/netfs/io.c b/fs/netfs/io.c
index 60a19f96e0ce..2641238aae82 100644
--- a/fs/netfs/io.c
+++ b/fs/netfs/io.c
@@ -98,146 +98,6 @@ static void netfs_rreq_completed(struct netfs_io_request *rreq, bool was_async)
 	netfs_put_request(rreq, was_async, netfs_rreq_trace_put_complete);
 }
 
-/*
- * [DEPRECATED] Deal with the completion of writing the data to the cache.  We
- * have to clear the PG_fscache bits on the folios involved and release the
- * caller's ref.
- *
- * May be called in softirq mode and we inherit a ref from the caller.
- */
-static void netfs_rreq_unmark_after_write(struct netfs_io_request *rreq,
-					  bool was_async)
-{
-	struct netfs_io_subrequest *subreq;
-	struct folio *folio;
-	pgoff_t unlocked = 0;
-	bool have_unlocked = false;
-
-	rcu_read_lock();
-
-	list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
-		XA_STATE(xas, &rreq->mapping->i_pages, subreq->start / PAGE_SIZE);
-
-		xas_for_each(&xas, folio, (subreq->start + subreq->len - 1) / PAGE_SIZE) {
-			if (xas_retry(&xas, folio))
-				continue;
-
-			/* We might have multiple writes from the same huge
-			 * folio, but we mustn't unlock a folio more than once.
-			 */
-			if (have_unlocked && folio->index <= unlocked)
-				continue;
-			unlocked = folio_next_index(folio) - 1;
-			trace_netfs_folio(folio, netfs_folio_trace_end_copy);
-			folio_end_private_2(folio);
-			have_unlocked = true;
-		}
-	}
-
-	rcu_read_unlock();
-	netfs_rreq_completed(rreq, was_async);
-}
-
-static void netfs_rreq_copy_terminated(void *priv, ssize_t transferred_or_error,
-				       bool was_async) /* [DEPRECATED] */
-{
-	struct netfs_io_subrequest *subreq = priv;
-	struct netfs_io_request *rreq = subreq->rreq;
-
-	if (IS_ERR_VALUE(transferred_or_error)) {
-		netfs_stat(&netfs_n_rh_write_failed);
-		trace_netfs_failure(rreq, subreq, transferred_or_error,
-				    netfs_fail_copy_to_cache);
-	} else {
-		netfs_stat(&netfs_n_rh_write_done);
-	}
-
-	trace_netfs_sreq(subreq, netfs_sreq_trace_write_term);
-
-	/* If we decrement nr_copy_ops to 0, the ref belongs to us. */
-	if (atomic_dec_and_test(&rreq->nr_copy_ops))
-		netfs_rreq_unmark_after_write(rreq, was_async);
-
-	netfs_put_subrequest(subreq, was_async, netfs_sreq_trace_put_terminated);
-}
-
-/*
- * [DEPRECATED] Perform any outstanding writes to the cache.  We inherit a ref
- * from the caller.
- */
-static void netfs_rreq_do_write_to_cache(struct netfs_io_request *rreq)
-{
-	struct netfs_cache_resources *cres = &rreq->cache_resources;
-	struct netfs_io_subrequest *subreq, *next, *p;
-	struct iov_iter iter;
-	int ret;
-
-	trace_netfs_rreq(rreq, netfs_rreq_trace_copy);
-
-	/* We don't want terminating writes trying to wake us up whilst we're
-	 * still going through the list.
-	 */
-	atomic_inc(&rreq->nr_copy_ops);
-
-	list_for_each_entry_safe(subreq, p, &rreq->subrequests, rreq_link) {
-		if (!test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags)) {
-			list_del_init(&subreq->rreq_link);
-			netfs_put_subrequest(subreq, false,
-					     netfs_sreq_trace_put_no_copy);
-		}
-	}
-
-	list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
-		/* Amalgamate adjacent writes */
-		while (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) {
-			next = list_next_entry(subreq, rreq_link);
-			if (next->start != subreq->start + subreq->len)
-				break;
-			subreq->len += next->len;
-			list_del_init(&next->rreq_link);
-			netfs_put_subrequest(next, false,
-					     netfs_sreq_trace_put_merged);
-		}
-
-		ret = cres->ops->prepare_write(cres, &subreq->start, &subreq->len,
-					       subreq->len, rreq->i_size, true);
-		if (ret < 0) {
-			trace_netfs_failure(rreq, subreq, ret, netfs_fail_prepare_write);
-			trace_netfs_sreq(subreq, netfs_sreq_trace_write_skip);
-			continue;
-		}
-
-		iov_iter_xarray(&iter, ITER_SOURCE, &rreq->mapping->i_pages,
-				subreq->start, subreq->len);
-
-		atomic_inc(&rreq->nr_copy_ops);
-		netfs_stat(&netfs_n_rh_write);
-		netfs_get_subrequest(subreq, netfs_sreq_trace_get_copy_to_cache);
-		trace_netfs_sreq(subreq, netfs_sreq_trace_write);
-		cres->ops->write(cres, subreq->start, &iter,
-				 netfs_rreq_copy_terminated, subreq);
-	}
-
-	/* If we decrement nr_copy_ops to 0, the usage ref belongs to us. */
-	if (atomic_dec_and_test(&rreq->nr_copy_ops))
-		netfs_rreq_unmark_after_write(rreq, false);
-}
-
-static void netfs_rreq_write_to_cache_work(struct work_struct *work) /* [DEPRECATED] */
-{
-	struct netfs_io_request *rreq =
-		container_of(work, struct netfs_io_request, work);
-
-	netfs_rreq_do_write_to_cache(rreq);
-}
-
-static void netfs_rreq_write_to_cache(struct netfs_io_request *rreq) /* [DEPRECATED] */
-{
-	rreq->work.func = netfs_rreq_write_to_cache_work;
-	if (!queue_work(system_unbound_wq, &rreq->work))
-		BUG();
-}
-
 /*
  * Handle a short read.
  */
@@ -410,10 +270,6 @@ static void netfs_rreq_assess(struct netfs_io_request *rreq, bool was_async)
 	clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
 	wake_up_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS);
 
-	if (test_bit(NETFS_RREQ_COPY_TO_CACHE, &rreq->flags) &&
-	    test_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags))
-		return netfs_rreq_write_to_cache(rreq);
-
 	netfs_rreq_completed(rreq, was_async);
 }
 



* [PATCH 07/26] netfs: Make netfs_io_request::subreq_counter an atomic_t
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (5 preceding siblings ...)
  2024-03-28 16:33 ` [PATCH 06/26] netfs: Remove deprecated use of PG_private_2 as a second writeback flag David Howells
@ 2024-03-28 16:33 ` David Howells
  2024-03-28 16:34 ` [PATCH 08/26] netfs: Use subreq_counter to allocate subreq debug_index values David Howells
                   ` (24 subsequent siblings)
  31 siblings, 0 replies; 62+ messages in thread
From: David Howells @ 2024-03-28 16:33 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: Paulo Alcantara, Tom Talpey, Shyam Prasad N, linux-cifs,
	linux-fsdevel, v9fs, ceph-devel, linux-nfs, Matthew Wilcox,
	linux-kernel, David Howells, Steve French, linux-cachefs,
	linux-mm, netdev, Marc Dionne, netfs, Ilya Dryomov,
	Eric Van Hensbergen, linux-erofs, linux-afs

Make netfs_io_request::subreq_counter, which is used to generate values for
netfs_io_subrequest::debug_index, into an atomic_t so that it can be
incremented from the retry thread at the same time as the application
thread issuing writes.
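
As a minimal illustration of the hazard being closed (a sketch, not part
of the change itself): with a plain integer, the retry thread and the
application thread can both read the old counter value and hand out the
same debug_index, whereas atomic_inc_return() gives each caller a
distinct value.

	/* racy: concurrent callers may observe the same counter value */
	subreq->debug_index = wreq->subreq_counter++;

	/* safe: atomic_inc_return() returns a unique value to each caller */
	subreq->debug_index = atomic_inc_return(&wreq->subreq_counter);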

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/output.c     | 2 +-
 include/linux/netfs.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/netfs/output.c b/fs/netfs/output.c
index 625eb68f3e5a..fbdbb4f78234 100644
--- a/fs/netfs/output.c
+++ b/fs/netfs/output.c
@@ -37,7 +37,7 @@ struct netfs_io_subrequest *netfs_create_write_request(struct netfs_io_request *
 		subreq->source	= dest;
 		subreq->start	= start;
 		subreq->len	= len;
-		subreq->debug_index = wreq->subreq_counter++;
+		subreq->debug_index = atomic_inc_return(&wreq->subreq_counter);
 
 		switch (subreq->source) {
 		case NETFS_UPLOAD_TO_SERVER:
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index f36a6d8163d1..ddafc6ebff42 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -202,7 +202,7 @@ struct netfs_io_request {
 	unsigned int		debug_id;
 	unsigned int		rsize;		/* Maximum read size (0 for none) */
 	unsigned int		wsize;		/* Maximum write size (0 for none) */
-	unsigned int		subreq_counter;	/* Next subreq->debug_index */
+	atomic_t		subreq_counter;	/* Next subreq->debug_index */
 	atomic_t		nr_outstanding;	/* Number of ops in progress */
 	atomic_t		nr_copy_ops;	/* Number of copy-to-cache ops in progress */
 	size_t			submitted;	/* Amount submitted for I/O so far */



* [PATCH 08/26] netfs: Use subreq_counter to allocate subreq debug_index values
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (6 preceding siblings ...)
  2024-03-28 16:33 ` [PATCH 07/26] netfs: Make netfs_io_request::subreq_counter an atomic_t David Howells
@ 2024-03-28 16:34 ` David Howells
  2024-03-28 16:34 ` [PATCH 09/26] mm: Provide a means of invalidation without using launder_folio David Howells
                   ` (23 subsequent siblings)
  31 siblings, 0 replies; 62+ messages in thread
From: David Howells @ 2024-03-28 16:34 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: Paulo Alcantara, Tom Talpey, Shyam Prasad N, linux-cifs,
	linux-fsdevel, v9fs, ceph-devel, linux-nfs, Matthew Wilcox,
	linux-kernel, David Howells, Steve French, linux-cachefs,
	linux-mm, netdev, Marc Dionne, netfs, Ilya Dryomov,
	Eric Van Hensbergen, linux-erofs, linux-afs

Use the subreq_counter in netfs_io_request to allocate subrequest
debug_index values in read ops as well as write ops.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/io.c      | 7 ++-----
 fs/netfs/objects.c | 1 +
 fs/netfs/output.c  | 1 -
 3 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/netfs/io.c b/fs/netfs/io.c
index 2641238aae82..8de581ac0cfb 100644
--- a/fs/netfs/io.c
+++ b/fs/netfs/io.c
@@ -501,8 +501,7 @@ netfs_rreq_prepare_read(struct netfs_io_request *rreq,
  * Slice off a piece of a read request and submit an I/O request for it.
  */
 static bool netfs_rreq_submit_slice(struct netfs_io_request *rreq,
-				    struct iov_iter *io_iter,
-				    unsigned int *_debug_index)
+				    struct iov_iter *io_iter)
 {
 	struct netfs_io_subrequest *subreq;
 	enum netfs_io_source source;
@@ -511,7 +510,6 @@ static bool netfs_rreq_submit_slice(struct netfs_io_request *rreq,
 	if (!subreq)
 		return false;
 
-	subreq->debug_index	= (*_debug_index)++;
 	subreq->start		= rreq->start + rreq->submitted;
 	subreq->len		= io_iter->count;
 
@@ -565,7 +563,6 @@ static bool netfs_rreq_submit_slice(struct netfs_io_request *rreq,
 int netfs_begin_read(struct netfs_io_request *rreq, bool sync)
 {
 	struct iov_iter io_iter;
-	unsigned int debug_index = 0;
 	int ret;
 
 	_enter("R=%x %llx-%llx",
@@ -596,7 +593,7 @@ int netfs_begin_read(struct netfs_io_request *rreq, bool sync)
 		if (rreq->origin == NETFS_DIO_READ &&
 		    rreq->start + rreq->submitted >= rreq->i_size)
 			break;
-		if (!netfs_rreq_submit_slice(rreq, &io_iter, &debug_index))
+		if (!netfs_rreq_submit_slice(rreq, &io_iter))
 			break;
 		if (test_bit(NETFS_RREQ_BLOCKED, &rreq->flags) &&
 		    test_bit(NETFS_RREQ_NONBLOCK, &rreq->flags))
diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
index 72b52f070270..8acc03a64059 100644
--- a/fs/netfs/objects.c
+++ b/fs/netfs/objects.c
@@ -152,6 +152,7 @@ struct netfs_io_subrequest *netfs_alloc_subrequest(struct netfs_io_request *rreq
 		INIT_LIST_HEAD(&subreq->rreq_link);
 		refcount_set(&subreq->ref, 2);
 		subreq->rreq = rreq;
+		subreq->debug_index = atomic_inc_return(&rreq->subreq_counter);
 		netfs_get_request(rreq, netfs_rreq_trace_get_subreq);
 		netfs_stat(&netfs_n_rh_sreq);
 	}
diff --git a/fs/netfs/output.c b/fs/netfs/output.c
index fbdbb4f78234..e586396d6b72 100644
--- a/fs/netfs/output.c
+++ b/fs/netfs/output.c
@@ -37,7 +37,6 @@ struct netfs_io_subrequest *netfs_create_write_request(struct netfs_io_request *
 		subreq->source	= dest;
 		subreq->start	= start;
 		subreq->len	= len;
-		subreq->debug_index = atomic_inc_return(&wreq->subreq_counter);
 
 		switch (subreq->source) {
 		case NETFS_UPLOAD_TO_SERVER:



* [PATCH 09/26] mm: Provide a means of invalidation without using launder_folio
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (7 preceding siblings ...)
  2024-03-28 16:34 ` [PATCH 08/26] netfs: Use subreq_counter to allocate subreq debug_index values David Howells
@ 2024-03-28 16:34 ` David Howells
  2024-04-15 11:41   ` Jeff Layton
  2024-03-28 16:34 ` [PATCH 10/26] cifs: Use alternative invalidation to " David Howells
                   ` (22 subsequent siblings)
  31 siblings, 1 reply; 62+ messages in thread
From: David Howells @ 2024-03-28 16:34 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: David Howells, linux-mm, Marc Dionne, linux-afs, Paulo Alcantara,
	linux-cifs, Miklos Szeredi, Matthew Wilcox, Christoph Hellwig,
	Steve French, linux-cachefs, Ilya Dryomov, devel, Shyam Prasad N,
	Christian Brauner, Tom Talpey, Alexander Viro, ceph-devel,
	Eric Van Hensbergen, Trond Myklebust, linux-nfs, netdev, v9fs,
	linux-kernel, linux-fsdevel, netfs, Andrew Morton, linux-erofs

Implement a replacement for launder_folio.  The key feature of
invalidate_inode_pages2() is that it locks each folio individually, unmaps
it to prevent mmap'd accesses interfering and calls the ->launder_folio()
address_space op to flush it.  This has problems: firstly, each folio is
written individually as one or more small writes; secondly, adjacent folios
cannot easily be batched into the same write; thirdly, it's yet another op
for filesystems to implement.

Instead, use the invalidate lock to make anyone wanting to add a folio to
the inode wait, then unmap all the folios if we have mmaps, conditionally
use ->writepages() to flush any dirty data back, and finally discard all
the pages.

The invalidate lock prevents ->read_iter(), ->write_iter() and faulting
through mmap all from adding pages for the duration.
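
For illustration, here is a minimal sketch of how a filesystem might call
the new helper from its cache-zapping path (the myfs_* names are
hypothetical; only filemap_invalidate_inode() comes from this patch):

	/*
	 * Flush and discard everything in the inode's pagecache.  Passing
	 * flush=true writes dirty folios back via ->writepages() before they
	 * are discarded; passing false just invalidates them.
	 */
	static int myfs_zap_cache(struct inode *inode)
	{
		return filemap_invalidate_inode(inode, true);
	}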

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Miklos Szeredi <miklos@szeredi.hu>
cc: Trond Myklebust <trond.myklebust@hammerspace.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Andrew Morton <akpm@linux-foundation.org>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Christian Brauner <brauner@kernel.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-mm@kvack.org
cc: linux-fsdevel@vger.kernel.org
cc: netfs@lists.linux.dev
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: devel@lists.orangefs.org
---
 include/linux/pagemap.h |  1 +
 mm/filemap.c            | 46 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 2df35e65557d..4eb3d4177a53 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -40,6 +40,7 @@ int filemap_fdatawait_keep_errors(struct address_space *mapping);
 int filemap_fdatawait_range(struct address_space *, loff_t lstart, loff_t lend);
 int filemap_fdatawait_range_keep_errors(struct address_space *mapping,
 		loff_t start_byte, loff_t end_byte);
+int filemap_invalidate_inode(struct inode *inode, bool flush);
 
 static inline int filemap_fdatawait(struct address_space *mapping)
 {
diff --git a/mm/filemap.c b/mm/filemap.c
index 25983f0f96e3..087f685107a5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4134,6 +4134,52 @@ bool filemap_release_folio(struct folio *folio, gfp_t gfp)
 }
 EXPORT_SYMBOL(filemap_release_folio);
 
+/**
+ * filemap_invalidate_inode - Invalidate/forcibly write back an inode's pagecache
+ * @inode: The inode to flush
+ * @flush: Set to write back rather than simply invalidate.
+ *
+ * Invalidate all the folios on an inode, possibly writing them back first.
+ * Whilst the operation is undertaken, the invalidate lock is held to prevent
+ * new folios from being installed.
+ */
+int filemap_invalidate_inode(struct inode *inode, bool flush)
+{
+	struct address_space *mapping = inode->i_mapping;
+
+	if (!mapping || !mapping->nrpages)
+		goto out;
+
+	/* Prevent new folios from being added to the inode. */
+	filemap_invalidate_lock(mapping);
+
+	if (!mapping->nrpages)
+		goto unlock;
+
+	unmap_mapping_pages(mapping, 0, ULONG_MAX, false);
+
+	/* Write back the data if we're asked to. */
+	if (flush) {
+		struct writeback_control wbc = {
+			.sync_mode	= WB_SYNC_ALL,
+			.nr_to_write	= LONG_MAX,
+			.range_start	= 0,
+			.range_end	= LLONG_MAX,
+		};
+
+		filemap_fdatawrite_wbc(mapping, &wbc);
+	}
+
+	/* Wait for writeback to complete on all folios and discard. */
+	truncate_inode_pages_range(mapping, 0, LLONG_MAX);
+
+unlock:
+	filemap_invalidate_unlock(mapping);
+out:
+	return filemap_check_errors(mapping);
+}
+EXPORT_SYMBOL(filemap_invalidate_inode);
+
 #ifdef CONFIG_CACHESTAT_SYSCALL
 /**
  * filemap_cachestat() - compute the page cache statistics of a mapping



* [PATCH 10/26] cifs: Use alternative invalidation to using launder_folio
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (8 preceding siblings ...)
  2024-03-28 16:34 ` [PATCH 09/26] mm: Provide a means of invalidation without using launder_folio David Howells
@ 2024-03-28 16:34 ` David Howells
  2024-03-28 16:34 ` [PATCH 11/26] 9p: " David Howells
                   ` (21 subsequent siblings)
  31 siblings, 0 replies; 62+ messages in thread
From: David Howells @ 2024-03-28 16:34 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: David Howells, linux-mm, Marc Dionne, linux-afs, Paulo Alcantara,
	linux-cifs, Matthew Wilcox, Steve French, linux-cachefs,
	Ilya Dryomov, Shyam Prasad N, Shyam Prasad N, Tom Talpey,
	ceph-devel, Eric Van Hensbergen, linux-nfs, Rohith Surabattula,
	netdev, v9fs, linux-kernel, Steve French, linux-fsdevel, netfs,
	linux-erofs

Use writepages-based flushing invalidation instead of
invalidate_inode_pages2() and ->launder_folio().  This will allow
->launder_folio() to be removed eventually.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/smb/client/cifsfs.h |   1 -
 fs/smb/client/file.c   | 122 -----------------------------------------
 fs/smb/client/inode.c  |  25 ++-------
 3 files changed, 5 insertions(+), 143 deletions(-)

diff --git a/fs/smb/client/cifsfs.h b/fs/smb/client/cifsfs.h
index ca55d01117c8..1ab7e5998c58 100644
--- a/fs/smb/client/cifsfs.h
+++ b/fs/smb/client/cifsfs.h
@@ -69,7 +69,6 @@ extern int cifs_revalidate_file_attr(struct file *filp);
 extern int cifs_revalidate_dentry_attr(struct dentry *);
 extern int cifs_revalidate_file(struct file *filp);
 extern int cifs_revalidate_dentry(struct dentry *);
-extern int cifs_invalidate_mapping(struct inode *inode);
 extern int cifs_revalidate_mapping(struct inode *inode);
 extern int cifs_zap_mapping(struct inode *inode);
 extern int cifs_getattr(struct mnt_idmap *, const struct path *,
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 59da572d3384..f92d4d42e87e 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2576,64 +2576,6 @@ struct cifs_writedata *cifs_writedata_alloc(work_func_t complete)
 	return wdata;
 }
 
-static int cifs_partialpagewrite(struct page *page, unsigned from, unsigned to)
-{
-	struct address_space *mapping = page->mapping;
-	loff_t offset = (loff_t)page->index << PAGE_SHIFT;
-	char *write_data;
-	int rc = -EFAULT;
-	int bytes_written = 0;
-	struct inode *inode;
-	struct cifsFileInfo *open_file;
-
-	if (!mapping || !mapping->host)
-		return -EFAULT;
-
-	inode = page->mapping->host;
-
-	offset += (loff_t)from;
-	write_data = kmap(page);
-	write_data += from;
-
-	if ((to > PAGE_SIZE) || (from > to)) {
-		kunmap(page);
-		return -EIO;
-	}
-
-	/* racing with truncate? */
-	if (offset > mapping->host->i_size) {
-		kunmap(page);
-		return 0; /* don't care */
-	}
-
-	/* check to make sure that we are not extending the file */
-	if (mapping->host->i_size - offset < (loff_t)to)
-		to = (unsigned)(mapping->host->i_size - offset);
-
-	rc = cifs_get_writable_file(CIFS_I(mapping->host), FIND_WR_ANY,
-				    &open_file);
-	if (!rc) {
-		bytes_written = cifs_write(open_file, open_file->pid,
-					   write_data, to - from, &offset);
-		cifsFileInfo_put(open_file);
-		/* Does mm or vfs already set times? */
-		simple_inode_init_ts(inode);
-		if ((bytes_written > 0) && (offset))
-			rc = 0;
-		else if (bytes_written < 0)
-			rc = bytes_written;
-		else
-			rc = -EFAULT;
-	} else {
-		cifs_dbg(FYI, "No writable handle for write page rc=%d\n", rc);
-		if (!is_retryable_error(rc))
-			rc = -EIO;
-	}
-
-	kunmap(page);
-	return rc;
-}
-
 /*
  * Extend the region to be written back to include subsequent contiguously
  * dirty pages if possible, but don't sleep while doing so.
@@ -3047,47 +2989,6 @@ static int cifs_writepages(struct address_space *mapping,
 	return ret;
 }
 
-static int
-cifs_writepage_locked(struct page *page, struct writeback_control *wbc)
-{
-	int rc;
-	unsigned int xid;
-
-	xid = get_xid();
-/* BB add check for wbc flags */
-	get_page(page);
-	if (!PageUptodate(page))
-		cifs_dbg(FYI, "ppw - page not up to date\n");
-
-	/*
-	 * Set the "writeback" flag, and clear "dirty" in the radix tree.
-	 *
-	 * A writepage() implementation always needs to do either this,
-	 * or re-dirty the page with "redirty_page_for_writepage()" in
-	 * the case of a failure.
-	 *
-	 * Just unlocking the page will cause the radix tree tag-bits
-	 * to fail to update with the state of the page correctly.
-	 */
-	set_page_writeback(page);
-retry_write:
-	rc = cifs_partialpagewrite(page, 0, PAGE_SIZE);
-	if (is_retryable_error(rc)) {
-		if (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN)
-			goto retry_write;
-		redirty_page_for_writepage(wbc, page);
-	} else if (rc != 0) {
-		SetPageError(page);
-		mapping_set_error(page->mapping, rc);
-	} else {
-		SetPageUptodate(page);
-	}
-	end_page_writeback(page);
-	put_page(page);
-	free_xid(xid);
-	return rc;
-}
-
 static int cifs_write_end(struct file *file, struct address_space *mapping,
 			loff_t pos, unsigned len, unsigned copied,
 			struct page *page, void *fsdata)
@@ -4913,27 +4814,6 @@ static void cifs_invalidate_folio(struct folio *folio, size_t offset,
 	folio_wait_private_2(folio); /* [DEPRECATED] */
 }
 
-static int cifs_launder_folio(struct folio *folio)
-{
-	int rc = 0;
-	loff_t range_start = folio_pos(folio);
-	loff_t range_end = range_start + folio_size(folio);
-	struct writeback_control wbc = {
-		.sync_mode = WB_SYNC_ALL,
-		.nr_to_write = 0,
-		.range_start = range_start,
-		.range_end = range_end,
-	};
-
-	cifs_dbg(FYI, "Launder page: %lu\n", folio->index);
-
-	if (folio_clear_dirty_for_io(folio))
-		rc = cifs_writepage_locked(&folio->page, &wbc);
-
-	folio_wait_private_2(folio); /* [DEPRECATED] */
-	return rc;
-}
-
 void cifs_oplock_break(struct work_struct *work)
 {
 	struct cifsFileInfo *cfile = container_of(work, struct cifsFileInfo,
@@ -5112,7 +4992,6 @@ const struct address_space_operations cifs_addr_ops = {
 	.release_folio = cifs_release_folio,
 	.direct_IO = cifs_direct_io,
 	.invalidate_folio = cifs_invalidate_folio,
-	.launder_folio = cifs_launder_folio,
 	.migrate_folio = filemap_migrate_folio,
 	/*
 	 * TODO: investigate and if useful we could add an is_dirty_writeback
@@ -5135,6 +5014,5 @@ const struct address_space_operations cifs_addr_ops_smallbuf = {
 	.dirty_folio = netfs_dirty_folio,
 	.release_folio = cifs_release_folio,
 	.invalidate_folio = cifs_invalidate_folio,
-	.launder_folio = cifs_launder_folio,
 	.migrate_folio = filemap_migrate_folio,
 };
diff --git a/fs/smb/client/inode.c b/fs/smb/client/inode.c
index 91b07ef9e25c..468ea2312a1a 100644
--- a/fs/smb/client/inode.c
+++ b/fs/smb/client/inode.c
@@ -2430,24 +2430,6 @@ cifs_dentry_needs_reval(struct dentry *dentry)
 	return false;
 }
 
-/*
- * Zap the cache. Called when invalid_mapping flag is set.
- */
-int
-cifs_invalidate_mapping(struct inode *inode)
-{
-	int rc = 0;
-
-	if (inode->i_mapping && inode->i_mapping->nrpages != 0) {
-		rc = invalidate_inode_pages2(inode->i_mapping);
-		if (rc)
-			cifs_dbg(VFS, "%s: invalidate inode %p failed with rc %d\n",
-				 __func__, inode, rc);
-	}
-
-	return rc;
-}
-
 /**
  * cifs_wait_bit_killable - helper for functions that are sleeping on bit locks
  *
@@ -2484,9 +2466,12 @@ cifs_revalidate_mapping(struct inode *inode)
 		if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RW_CACHE)
 			goto skip_invalidate;
 
-		rc = cifs_invalidate_mapping(inode);
-		if (rc)
+		rc = filemap_invalidate_inode(inode, true);
+		if (rc) {
+			cifs_dbg(VFS, "%s: invalidate inode %p failed with rc %d\n",
+				 __func__, inode, rc);
 			set_bit(CIFS_INO_INVALID_MAPPING, flags);
+		}
 	}
 
 skip_invalidate:



* [PATCH 11/26] 9p: Use alternative invalidation to using launder_folio
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (9 preceding siblings ...)
  2024-03-28 16:34 ` [PATCH 10/26] cifs: Use alternative invalidation to " David Howells
@ 2024-03-28 16:34 ` David Howells
  2024-04-15 11:43   ` Jeff Layton
  2024-04-16 23:03   ` David Howells
  2024-03-28 16:34 ` [PATCH 12/26] afs: " David Howells
                   ` (20 subsequent siblings)
  31 siblings, 2 replies; 62+ messages in thread
From: David Howells @ 2024-03-28 16:34 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: Latchesar Ionkov, Christian Schoenebeck, David Howells, linux-mm,
	Marc Dionne, linux-afs, Paulo Alcantara, linux-cifs,
	Matthew Wilcox, Steve French, linux-cachefs, Ilya Dryomov,
	Shyam Prasad N, Tom Talpey, ceph-devel, Eric Van Hensbergen,
	linux-nfs, netdev, v9fs, linux-kernel, linux-fsdevel, netfs,
	linux-erofs

Use writepages-based flushing invalidation instead of
invalidate_inode_pages2() and ->launder_folio().  This will allow
->launder_folio() to be removed eventually.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: v9fs@lists.linux.dev
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/9p/vfs_addr.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 047855033d32..5a943c122d83 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -89,7 +89,6 @@ static int v9fs_init_request(struct netfs_io_request *rreq, struct file *file)
 	bool writing = (rreq->origin == NETFS_READ_FOR_WRITE ||
 			rreq->origin == NETFS_WRITEBACK ||
 			rreq->origin == NETFS_WRITETHROUGH ||
-			rreq->origin == NETFS_LAUNDER_WRITE ||
 			rreq->origin == NETFS_UNBUFFERED_WRITE ||
 			rreq->origin == NETFS_DIO_WRITE);
 
@@ -141,7 +140,6 @@ const struct address_space_operations v9fs_addr_operations = {
 	.dirty_folio		= netfs_dirty_folio,
 	.release_folio		= netfs_release_folio,
 	.invalidate_folio	= netfs_invalidate_folio,
-	.launder_folio		= netfs_launder_folio,
 	.direct_IO		= noop_direct_IO,
 	.writepages		= netfs_writepages,
 };



* [PATCH 12/26] afs: Use alternative invalidation to using launder_folio
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (10 preceding siblings ...)
  2024-03-28 16:34 ` [PATCH 11/26] 9p: " David Howells
@ 2024-03-28 16:34 ` David Howells
  2024-03-28 16:34 ` [PATCH 13/26] netfs: Remove ->launder_folio() support David Howells
                   ` (19 subsequent siblings)
  31 siblings, 0 replies; 62+ messages in thread
From: David Howells @ 2024-03-28 16:34 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: Paulo Alcantara, Tom Talpey, Shyam Prasad N, linux-cifs,
	linux-fsdevel, v9fs, ceph-devel, linux-nfs, Matthew Wilcox,
	linux-kernel, David Howells, Steve French, linux-cachefs,
	linux-mm, netdev, Marc Dionne, netfs, Ilya Dryomov,
	Eric Van Hensbergen, linux-erofs, linux-afs

Use writepages-based flushing invalidation instead of
invalidate_inode_pages2() and ->launder_folio().  This will allow
->launder_folio() to be removed eventually.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-afs@lists.infradead.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/afs/file.c       |  1 -
 fs/afs/internal.h   |  1 -
 fs/afs/validation.c |  4 ++--
 fs/afs/write.c      | 10 +++-------
 4 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index ef2cc8f565d2..dfd8f60f5e1f 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -54,7 +54,6 @@ const struct address_space_operations afs_file_aops = {
 	.read_folio	= netfs_read_folio,
 	.readahead	= netfs_readahead,
 	.dirty_folio	= netfs_dirty_folio,
-	.launder_folio	= netfs_launder_folio,
 	.release_folio	= netfs_release_folio,
 	.invalidate_folio = netfs_invalidate_folio,
 	.migrate_folio	= filemap_migrate_folio,
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 6ce5a612937c..b93aa026daa4 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -916,7 +916,6 @@ struct afs_operation {
 			loff_t	pos;
 			loff_t	size;
 			loff_t	i_size;
-			bool	laundering;	/* Laundering page, PG_writeback not set */
 		} store;
 		struct {
 			struct iattr	*attr;
diff --git a/fs/afs/validation.c b/fs/afs/validation.c
index 32a53fc8dfb2..1d8bbc46f734 100644
--- a/fs/afs/validation.c
+++ b/fs/afs/validation.c
@@ -365,9 +365,9 @@ static void afs_zap_data(struct afs_vnode *vnode)
 	 * written back in a regular file and completely discard the pages in a
 	 * directory or symlink */
 	if (S_ISREG(vnode->netfs.inode.i_mode))
-		invalidate_remote_inode(&vnode->netfs.inode);
+		filemap_invalidate_inode(&vnode->netfs.inode, true);
 	else
-		invalidate_inode_pages2(vnode->netfs.inode.i_mapping);
+		filemap_invalidate_inode(&vnode->netfs.inode, false);
 }
 
 /*
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 74402d95a884..1bc26466eb72 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -75,8 +75,7 @@ static void afs_store_data_success(struct afs_operation *op)
 	op->ctime = op->file[0].scb.status.mtime_client;
 	afs_vnode_commit_status(op, &op->file[0]);
 	if (!afs_op_error(op)) {
-		if (!op->store.laundering)
-			afs_pages_written_back(vnode, op->store.pos, op->store.size);
+		afs_pages_written_back(vnode, op->store.pos, op->store.size);
 		afs_stat_v(vnode, n_stores);
 		atomic_long_add(op->store.size, &afs_v2net(vnode)->n_store_bytes);
 	}
@@ -91,8 +90,7 @@ static const struct afs_operation_ops afs_store_data_operation = {
 /*
  * write to a file
  */
-static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter, loff_t pos,
-			  bool laundering)
+static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter, loff_t pos)
 {
 	struct afs_operation *op;
 	struct afs_wb_key *wbk = NULL;
@@ -123,7 +121,6 @@ static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter, loff_t
 	op->file[0].modification = true;
 	op->store.pos = pos;
 	op->store.size = size;
-	op->store.laundering = laundering;
 	op->flags |= AFS_OPERATION_UNINTR;
 	op->ops = &afs_store_data_operation;
 
@@ -168,8 +165,7 @@ static void afs_upload_to_server(struct netfs_io_subrequest *subreq)
 	       subreq->rreq->debug_id, subreq->debug_index, subreq->io_iter.count);
 
 	trace_netfs_sreq(subreq, netfs_sreq_trace_submit);
-	ret = afs_store_data(vnode, &subreq->io_iter, subreq->start,
-			     subreq->rreq->origin == NETFS_LAUNDER_WRITE);
+	ret = afs_store_data(vnode, &subreq->io_iter, subreq->start);
 	netfs_write_subrequest_terminated(subreq, ret < 0 ? ret : subreq->len,
 					  false);
 }



* [PATCH 13/26] netfs: Remove ->launder_folio() support
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (11 preceding siblings ...)
  2024-03-28 16:34 ` [PATCH 12/26] afs: " David Howells
@ 2024-03-28 16:34 ` David Howells
  2024-03-28 16:34 ` [PATCH 14/26] netfs: Use mempools for allocating requests and subrequests David Howells
                   ` (18 subsequent siblings)
  31 siblings, 0 replies; 62+ messages in thread
From: David Howells @ 2024-03-28 16:34 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: Latchesar Ionkov, Christian Schoenebeck, David Howells, linux-mm,
	Marc Dionne, linux-afs, Paulo Alcantara, linux-cifs,
	Matthew Wilcox, Steve French, linux-cachefs, Ilya Dryomov, devel,
	Shyam Prasad N, Tom Talpey, ceph-devel, Eric Van Hensbergen,
	linux-nfs, netdev, v9fs, linux-kernel, Steve French,
	linux-fsdevel, netfs, linux-erofs

Remove support for ->launder_folio() from netfslib and expect filesystems
to use filemap_invalidate_inode() instead.  netfs_launder_folio() can then
be removed.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Steve French <sfrench@samba.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-mm@kvack.org
cc: linux-fsdevel@vger.kernel.org
cc: netfs@lists.linux.dev
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: devel@lists.orangefs.org
---
 fs/netfs/buffered_write.c    | 74 ------------------------------------
 fs/netfs/main.c              |  1 -
 include/linux/netfs.h        |  2 -
 include/trace/events/netfs.h |  3 --
 4 files changed, 80 deletions(-)

diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index 576a68b7887e..624d8859c2fa 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -1199,77 +1199,3 @@ int netfs_writepages(struct address_space *mapping,
 	return ret;
 }
 EXPORT_SYMBOL(netfs_writepages);
-
-/*
- * Deal with the disposition of a laundered folio.
- */
-static void netfs_cleanup_launder_folio(struct netfs_io_request *wreq)
-{
-	if (wreq->error) {
-		pr_notice("R=%08x Laundering error %d\n", wreq->debug_id, wreq->error);
-		mapping_set_error(wreq->mapping, wreq->error);
-	}
-}
-
-/**
- * netfs_launder_folio - Clean up a dirty folio that's being invalidated
- * @folio: The folio to clean
- *
- * This is called to write back a folio that's being invalidated when an inode
- * is getting torn down.  Ideally, writepages would be used instead.
- */
-int netfs_launder_folio(struct folio *folio)
-{
-	struct netfs_io_request *wreq;
-	struct address_space *mapping = folio->mapping;
-	struct netfs_folio *finfo = netfs_folio_info(folio);
-	struct netfs_group *group = netfs_folio_group(folio);
-	struct bio_vec bvec;
-	unsigned long long i_size = i_size_read(mapping->host);
-	unsigned long long start = folio_pos(folio);
-	size_t offset = 0, len;
-	int ret = 0;
-
-	if (finfo) {
-		offset = finfo->dirty_offset;
-		start += offset;
-		len = finfo->dirty_len;
-	} else {
-		len = folio_size(folio);
-	}
-	len = min_t(unsigned long long, len, i_size - start);
-
-	wreq = netfs_alloc_request(mapping, NULL, start, len, NETFS_LAUNDER_WRITE);
-	if (IS_ERR(wreq)) {
-		ret = PTR_ERR(wreq);
-		goto out;
-	}
-
-	if (!folio_clear_dirty_for_io(folio))
-		goto out_put;
-
-	trace_netfs_folio(folio, netfs_folio_trace_launder);
-
-	_debug("launder %llx-%llx", start, start + len - 1);
-
-	/* Speculatively write to the cache.  We have to fix this up later if
-	 * the store fails.
-	 */
-	wreq->cleanup = netfs_cleanup_launder_folio;
-
-	bvec_set_folio(&bvec, folio, len, offset);
-	iov_iter_bvec(&wreq->iter, ITER_SOURCE, &bvec, 1, len);
-	if (group != NETFS_FOLIO_COPY_TO_CACHE)
-		__set_bit(NETFS_RREQ_UPLOAD_TO_SERVER, &wreq->flags);
-	ret = netfs_begin_write(wreq, true, netfs_write_trace_launder);
-
-out_put:
-	folio_detach_private(folio);
-	netfs_put_group(group);
-	kfree(finfo);
-	netfs_put_request(wreq, false, netfs_rreq_trace_put_return);
-out:
-	_leave(" = %d", ret);
-	return ret;
-}
-EXPORT_SYMBOL(netfs_launder_folio);
diff --git a/fs/netfs/main.c b/fs/netfs/main.c
index c5a73c9ed126..844efbb2e7a2 100644
--- a/fs/netfs/main.c
+++ b/fs/netfs/main.c
@@ -34,7 +34,6 @@ static const char *netfs_origins[nr__netfs_io_origin] = {
 	[NETFS_COPY_TO_CACHE]		= "CC",
 	[NETFS_WRITEBACK]		= "WB",
 	[NETFS_WRITETHROUGH]		= "WT",
-	[NETFS_LAUNDER_WRITE]		= "LW",
 	[NETFS_UNBUFFERED_WRITE]	= "UW",
 	[NETFS_DIO_READ]		= "DR",
 	[NETFS_DIO_WRITE]		= "DW",
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index ddafc6ebff42..3af589dabd7f 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -172,7 +172,6 @@ enum netfs_io_origin {
 	NETFS_COPY_TO_CACHE,		/* This write is to copy a read to the cache */
 	NETFS_WRITEBACK,		/* This write was triggered by writepages */
 	NETFS_WRITETHROUGH,		/* This write was made by netfs_perform_write() */
-	NETFS_LAUNDER_WRITE,		/* This is triggered by ->launder_folio() */
 	NETFS_UNBUFFERED_WRITE,		/* This is an unbuffered write */
 	NETFS_DIO_READ,			/* This is a direct I/O read */
 	NETFS_DIO_WRITE,		/* This is a direct I/O write */
@@ -352,7 +351,6 @@ int netfs_unpin_writeback(struct inode *inode, struct writeback_control *wbc);
 void netfs_clear_inode_writeback(struct inode *inode, const void *aux);
 void netfs_invalidate_folio(struct folio *folio, size_t offset, size_t length);
 bool netfs_release_folio(struct folio *folio, gfp_t gfp);
-int netfs_launder_folio(struct folio *folio);
 
 /* VMA operations API. */
 vm_fault_t netfs_page_mkwrite(struct vm_fault *vmf, struct netfs_group *netfs_group);
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index e03fafb0c1e3..30769103638f 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -26,7 +26,6 @@
 #define netfs_write_traces					\
 	EM(netfs_write_trace_copy_to_cache,	"COPY2CACH")	\
 	EM(netfs_write_trace_dio_write,		"DIO-WRITE")	\
-	EM(netfs_write_trace_launder,		"LAUNDER  ")	\
 	EM(netfs_write_trace_unbuffered_write,	"UNB-WRITE")	\
 	EM(netfs_write_trace_writeback,		"WRITEBACK")	\
 	E_(netfs_write_trace_writethrough,	"WRITETHRU")
@@ -38,7 +37,6 @@
 	EM(NETFS_COPY_TO_CACHE,			"CC")		\
 	EM(NETFS_WRITEBACK,			"WB")		\
 	EM(NETFS_WRITETHROUGH,			"WT")		\
-	EM(NETFS_LAUNDER_WRITE,			"LW")		\
 	EM(NETFS_UNBUFFERED_WRITE,		"UW")		\
 	EM(NETFS_DIO_READ,			"DR")		\
 	E_(NETFS_DIO_WRITE,			"DW")
@@ -135,7 +133,6 @@
 	EM(netfs_folio_trace_end_copy,		"end-copy")	\
 	EM(netfs_folio_trace_filled_gaps,	"filled-gaps")	\
 	EM(netfs_folio_trace_kill,		"kill")		\
-	EM(netfs_folio_trace_launder,		"launder")	\
 	EM(netfs_folio_trace_mkwrite,		"mkwrite")	\
 	EM(netfs_folio_trace_mkwrite_plus,	"mkwrite+")	\
 	EM(netfs_folio_trace_read_gaps,		"read-gaps")	\



* [PATCH 14/26] netfs: Use mempools for allocating requests and subrequests
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (12 preceding siblings ...)
  2024-03-28 16:34 ` [PATCH 13/26] netfs: Remove ->launder_folio() support David Howells
@ 2024-03-28 16:34 ` David Howells
  2024-03-28 16:34 ` [PATCH 15/26] mm: Export writeback_iter() David Howells
                   ` (17 subsequent siblings)
  31 siblings, 0 replies; 62+ messages in thread
From: David Howells @ 2024-03-28 16:34 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: Paulo Alcantara, Tom Talpey, Shyam Prasad N, linux-cifs,
	linux-fsdevel, v9fs, ceph-devel, linux-nfs, Matthew Wilcox,
	linux-kernel, David Howells, Steve French, linux-cachefs,
	linux-mm, netdev, Marc Dionne, netfs, Ilya Dryomov,
	Eric Van Hensbergen, linux-erofs, linux-afs

Use mempools for allocating requests and subrequests in an effort to make
sure that allocation always succeeds, so that we can always make progress
when performing writeback.
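
As a rough sketch of how a filesystem could build on this (the myfs_*
names are hypothetical; only the ->request_pool pointer and the default
netfs pools come from this patch), a filesystem that embeds struct
netfs_io_request in a larger private request struct can point netfslib at
its own slab-backed mempool instead of the default one:

	static struct kmem_cache *myfs_request_slab;
	static mempool_t myfs_request_pool;

	static const struct netfs_request_ops myfs_req_ops = {
		.request_pool	= &myfs_request_pool,
		/* ... the usual read/write ops ... */
	};

	static int __init myfs_init_request_pool(void)
	{
		/* Objects must begin with struct netfs_io_request. */
		myfs_request_slab = kmem_cache_create("myfs_request",
						      sizeof(struct myfs_io_request),
						      0, SLAB_HWCACHE_ALIGN, NULL);
		if (!myfs_request_slab)
			return -ENOMEM;
		return mempool_init_slab_pool(&myfs_request_pool, 100,
					      myfs_request_slab);
	}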

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
---
 fs/netfs/internal.h   |  2 ++
 fs/netfs/main.c       | 51 ++++++++++++++++++++++++++++++++-----
 fs/netfs/objects.c    | 59 ++++++++++++++++++++++++++++---------------
 include/linux/netfs.h |  5 ++--
 4 files changed, 89 insertions(+), 28 deletions(-)

diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index 156ab138e224..c67da478cd2b 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -37,6 +37,8 @@ int netfs_begin_read(struct netfs_io_request *rreq, bool sync);
 extern unsigned int netfs_debug;
 extern struct list_head netfs_io_requests;
 extern spinlock_t netfs_proc_lock;
+extern mempool_t netfs_request_pool;
+extern mempool_t netfs_subrequest_pool;
 
 #ifdef CONFIG_PROC_FS
 static inline void netfs_proc_add_rreq(struct netfs_io_request *rreq)
diff --git a/fs/netfs/main.c b/fs/netfs/main.c
index 844efbb2e7a2..4805b9377364 100644
--- a/fs/netfs/main.c
+++ b/fs/netfs/main.c
@@ -7,6 +7,7 @@
 
 #include <linux/module.h>
 #include <linux/export.h>
+#include <linux/mempool.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
 #include "internal.h"
@@ -23,6 +24,11 @@ unsigned netfs_debug;
 module_param_named(debug, netfs_debug, uint, S_IWUSR | S_IRUGO);
 MODULE_PARM_DESC(netfs_debug, "Netfs support debugging mask");
 
+static struct kmem_cache *netfs_request_slab;
+static struct kmem_cache *netfs_subrequest_slab;
+mempool_t netfs_request_pool;
+mempool_t netfs_subrequest_pool;
+
 #ifdef CONFIG_PROC_FS
 LIST_HEAD(netfs_io_requests);
 DEFINE_SPINLOCK(netfs_proc_lock);
@@ -98,25 +104,54 @@ static int __init netfs_init(void)
 {
 	int ret = -ENOMEM;
 
+	netfs_request_slab = kmem_cache_create("netfs_request",
+					       sizeof(struct netfs_io_request), 0,
+					       SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT,
+					       NULL);
+	if (!netfs_request_slab)
+		goto error_req;
+
+	if (mempool_init_slab_pool(&netfs_request_pool, 100, netfs_request_slab) < 0)
+		goto error_reqpool;
+
+	netfs_subrequest_slab = kmem_cache_create("netfs_subrequest",
+						  sizeof(struct netfs_io_subrequest), 0,
+						  SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT,
+						  NULL);
+	if (!netfs_subrequest_slab)
+		goto error_subreq;
+
+	if (mempool_init_slab_pool(&netfs_subrequest_pool, 100, netfs_subrequest_slab) < 0)
+		goto error_subreqpool;
+
 	if (!proc_mkdir("fs/netfs", NULL))
-		goto error;
+		goto error_proc;
 	if (!proc_create_seq("fs/netfs/requests", S_IFREG | 0444, NULL,
 			     &netfs_requests_seq_ops))
-		goto error_proc;
+		goto error_procfile;
 #ifdef CONFIG_FSCACHE_STATS
 	if (!proc_create_single("fs/netfs/stats", S_IFREG | 0444, NULL,
 				netfs_stats_show))
-		goto error_proc;
+		goto error_procfile;
 #endif
 
 	ret = fscache_init();
 	if (ret < 0)
-		goto error_proc;
+		goto error_fscache;
 	return 0;
 
-error_proc:
+error_fscache:
+error_procfile:
 	remove_proc_entry("fs/netfs", NULL);
-error:
+error_proc:
+	mempool_exit(&netfs_subrequest_pool);
+error_subreqpool:
+	kmem_cache_destroy(netfs_subrequest_slab);
+error_subreq:
+	mempool_exit(&netfs_request_pool);
+error_reqpool:
+	kmem_cache_destroy(netfs_request_slab);
+error_req:
 	return ret;
 }
 fs_initcall(netfs_init);
@@ -125,5 +160,9 @@ static void __exit netfs_exit(void)
 {
 	fscache_exit();
 	remove_proc_entry("fs/netfs", NULL);
+	mempool_exit(&netfs_subrequest_pool);
+	kmem_cache_destroy(netfs_subrequest_slab);
+	mempool_exit(&netfs_request_pool);
+	kmem_cache_destroy(netfs_request_slab);
 }
 module_exit(netfs_exit);
diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
index 8acc03a64059..1a4e2ce735ce 100644
--- a/fs/netfs/objects.c
+++ b/fs/netfs/objects.c
@@ -6,6 +6,8 @@
  */
 
 #include <linux/slab.h>
+#include <linux/mempool.h>
+#include <linux/delay.h>
 #include "internal.h"
 
 /*
@@ -20,17 +22,22 @@ struct netfs_io_request *netfs_alloc_request(struct address_space *mapping,
 	struct inode *inode = file ? file_inode(file) : mapping->host;
 	struct netfs_inode *ctx = netfs_inode(inode);
 	struct netfs_io_request *rreq;
+	mempool_t *mempool = ctx->ops->request_pool ?: &netfs_request_pool;
+	struct kmem_cache *cache = mempool->pool_data;
 	bool is_unbuffered = (origin == NETFS_UNBUFFERED_WRITE ||
 			      origin == NETFS_DIO_READ ||
 			      origin == NETFS_DIO_WRITE);
 	bool cached = !is_unbuffered && netfs_is_cache_enabled(ctx);
 	int ret;
 
-	rreq = kzalloc(ctx->ops->io_request_size ?: sizeof(struct netfs_io_request),
-		       GFP_KERNEL);
-	if (!rreq)
-		return ERR_PTR(-ENOMEM);
+	for (;;) {
+		rreq = mempool_alloc(mempool, GFP_KERNEL);
+		if (rreq)
+			break;
+		msleep(10);
+	}
 
+	memset(rreq, 0, kmem_cache_size(cache));
 	rreq->start	= start;
 	rreq->len	= len;
 	rreq->upper_len	= len;
@@ -56,7 +63,7 @@ struct netfs_io_request *netfs_alloc_request(struct address_space *mapping,
 	if (rreq->netfs_ops->init_request) {
 		ret = rreq->netfs_ops->init_request(rreq, file);
 		if (ret < 0) {
-			kfree(rreq);
+			mempool_free(rreq, rreq->netfs_ops->request_pool ?: &netfs_request_pool);
 			return ERR_PTR(ret);
 		}
 	}
@@ -88,6 +95,14 @@ void netfs_clear_subrequests(struct netfs_io_request *rreq, bool was_async)
 	}
 }
 
+static void netfs_free_request_rcu(struct rcu_head *rcu)
+{
+	struct netfs_io_request *rreq = container_of(rcu, struct netfs_io_request, rcu);
+
+	mempool_free(rreq, rreq->netfs_ops->request_pool ?: &netfs_request_pool);
+	netfs_stat_d(&netfs_n_rh_rreq);
+}
+
 static void netfs_free_request(struct work_struct *work)
 {
 	struct netfs_io_request *rreq =
@@ -110,8 +125,7 @@ static void netfs_free_request(struct work_struct *work)
 		}
 		kvfree(rreq->direct_bv);
 	}
-	kfree_rcu(rreq, rcu);
-	netfs_stat_d(&netfs_n_rh_rreq);
+	call_rcu(&rreq->rcu, netfs_free_request_rcu);
 }
 
 void netfs_put_request(struct netfs_io_request *rreq, bool was_async,
@@ -143,20 +157,25 @@ void netfs_put_request(struct netfs_io_request *rreq, bool was_async,
 struct netfs_io_subrequest *netfs_alloc_subrequest(struct netfs_io_request *rreq)
 {
 	struct netfs_io_subrequest *subreq;
-
-	subreq = kzalloc(rreq->netfs_ops->io_subrequest_size ?:
-			 sizeof(struct netfs_io_subrequest),
-			 GFP_KERNEL);
-	if (subreq) {
-		INIT_WORK(&subreq->work, NULL);
-		INIT_LIST_HEAD(&subreq->rreq_link);
-		refcount_set(&subreq->ref, 2);
-		subreq->rreq = rreq;
-		subreq->debug_index = atomic_inc_return(&rreq->subreq_counter);
-		netfs_get_request(rreq, netfs_rreq_trace_get_subreq);
-		netfs_stat(&netfs_n_rh_sreq);
+	mempool_t *mempool = rreq->netfs_ops->subrequest_pool ?: &netfs_subrequest_pool;
+	struct kmem_cache *cache = mempool->pool_data;
+
+	for (;;) {
+		subreq = mempool_alloc(rreq->netfs_ops->subrequest_pool ?: &netfs_subrequest_pool,
+				       GFP_KERNEL);
+		if (subreq)
+			break;
+		msleep(10);
 	}
 
+	memset(subreq, 0, kmem_cache_size(cache));
+	INIT_WORK(&subreq->work, NULL);
+	INIT_LIST_HEAD(&subreq->rreq_link);
+	refcount_set(&subreq->ref, 2);
+	subreq->rreq = rreq;
+	subreq->debug_index = atomic_inc_return(&rreq->subreq_counter);
+	netfs_get_request(rreq, netfs_rreq_trace_get_subreq);
+	netfs_stat(&netfs_n_rh_sreq);
 	return subreq;
 }
 
@@ -178,7 +197,7 @@ static void netfs_free_subrequest(struct netfs_io_subrequest *subreq,
 	trace_netfs_sreq(subreq, netfs_sreq_trace_free);
 	if (rreq->netfs_ops->free_subrequest)
 		rreq->netfs_ops->free_subrequest(subreq);
-	kfree(subreq);
+	mempool_free(subreq, rreq->netfs_ops->subrequest_pool ?: &netfs_subrequest_pool);
 	netfs_stat_d(&netfs_n_rh_sreq);
 	netfs_put_request(rreq, was_async, netfs_rreq_trace_put_subreq);
 }
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 3af589dabd7f..0b6c2c2d3c23 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -20,6 +20,7 @@
 #include <linux/uio.h>
 
 enum netfs_sreq_ref_trace;
+typedef struct mempool_s mempool_t;
 
 /**
  * folio_start_private_2 - Start an fscache write on a folio.  [DEPRECATED]
@@ -236,8 +237,8 @@ struct netfs_io_request {
  * Operations the network filesystem can/must provide to the helpers.
  */
 struct netfs_request_ops {
-	unsigned int	io_request_size;	/* Alloc size for netfs_io_request struct */
-	unsigned int	io_subrequest_size;	/* Alloc size for netfs_io_subrequest struct */
+	mempool_t *request_pool;
+	mempool_t *subrequest_pool;
 	int (*init_request)(struct netfs_io_request *rreq, struct file *file);
 	void (*free_request)(struct netfs_io_request *rreq);
 	void (*free_subrequest)(struct netfs_io_subrequest *rreq);


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 15/26] mm: Export writeback_iter()
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (13 preceding siblings ...)
  2024-03-28 16:34 ` [PATCH 14/26] netfs: Use mempools for allocating requests and subrequests David Howells
@ 2024-03-28 16:34 ` David Howells
  2024-04-03  8:59   ` Christoph Hellwig
  2024-04-03 10:10   ` David Howells
  2024-03-28 16:34 ` [PATCH 16/26] netfs: Switch to using unsigned long long rather than loff_t David Howells
                   ` (16 subsequent siblings)
  31 siblings, 2 replies; 62+ messages in thread
From: David Howells @ 2024-03-28 16:34 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: David Howells, linux-mm, Marc Dionne, Christoph Hellwig,
	Paulo Alcantara, linux-cifs, Matthew Wilcox, linux-afs,
	Steve French, linux-cachefs, Ilya Dryomov, Shyam Prasad N,
	Tom Talpey, ceph-devel, Eric Van Hensbergen, linux-nfs, netdev,
	v9fs, linux-kernel, linux-fsdevel, netfs, linux-erofs

Export writeback_iter() so that it can be used by netfslib when it is built
as a module.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Matthew Wilcox (Oracle) <willy@infradead.org>
cc: Christoph Hellwig <hch@lst.de>
cc: linux-mm@kvack.org
---
 mm/page-writeback.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 3e19b87049db..9df160a1cf9e 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2546,6 +2546,7 @@ struct folio *writeback_iter(struct address_space *mapping,
 	folio_batch_release(&wbc->fbatch);
 	return NULL;
 }
+EXPORT_SYMBOL(writeback_iter);
 
 /**
  * write_cache_pages - walk the list of dirty pages of the given address space and write all of them.


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 16/26] netfs: Switch to using unsigned long long rather than loff_t
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (14 preceding siblings ...)
  2024-03-28 16:34 ` [PATCH 15/26] mm: Export writeback_iter() David Howells
@ 2024-03-28 16:34 ` David Howells
  2024-03-28 16:34 ` [PATCH 17/26] netfs: Fix writethrough-mode error handling David Howells
                   ` (15 subsequent siblings)
  31 siblings, 0 replies; 62+ messages in thread
From: David Howells @ 2024-03-28 16:34 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: David Howells, linux-mm, Marc Dionne, linux-afs, Paulo Alcantara,
	linux-cifs, Eric Van Hensbergen, Matthew Wilcox, Steve French,
	linux-cachefs, Ilya Dryomov, Shyam Prasad N, Tom Talpey,
	ceph-devel, Xiubo Li, linux-nfs, netdev, v9fs, linux-kernel,
	linux-fsdevel, netfs, linux-erofs

Switch to using unsigned long long rather than loff_t in netfslib to avoid
problems with the sign flipping in the maths when we're dealing with the
byte at position 0x7fffffffffffffff.
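
As a purely illustrative example (not taken from the patch) of the problem
being avoided:

	/* loff_t is a signed 64-bit type, so forming the exclusive end of a
	 * span that reaches the last representable byte flips negative,
	 * breaking range checks:
	 */
	loff_t start = 0x7fffffffffffffffLL;	/* the last byte */
	loff_t end   = start + 1;		/* overflows to a negative value */

	/* With unsigned long long the same arithmetic is well defined and
	 * comparisons such as "pos < end" keep behaving sensibly:
	 */
	unsigned long long ustart = 0x7fffffffffffffffULL;
	unsigned long long uend   = ustart + 1;	/* 0x8000000000000000 > ustart */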

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Xiubo Li <xiubli@redhat.com>
cc: netfs@lists.linux.dev
cc: ceph-devel@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 fs/cachefiles/io.c           |  2 +-
 fs/ceph/addr.c               |  2 +-
 fs/netfs/buffered_read.c     |  4 +++-
 fs/netfs/buffered_write.c    |  2 +-
 fs/netfs/io.c                |  6 +++---
 fs/netfs/main.c              |  2 +-
 fs/netfs/output.c            |  4 ++--
 include/linux/netfs.h        | 16 +++++++++-------
 include/trace/events/netfs.h |  6 +++---
 9 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 1d685357e67f..5ba5c7814fe4 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -493,7 +493,7 @@ cachefiles_do_prepare_read(struct netfs_cache_resources *cres,
  * boundary as appropriate.
  */
 static enum netfs_io_source cachefiles_prepare_read(struct netfs_io_subrequest *subreq,
-						    loff_t i_size)
+						    unsigned long long i_size)
 {
 	return cachefiles_do_prepare_read(&subreq->rreq->cache_resources,
 					  subreq->start, &subreq->len, i_size,
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 2d0f13537c85..3bd58eaea231 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -193,7 +193,7 @@ static void ceph_netfs_expand_readahead(struct netfs_io_request *rreq)
 	 * block, but do not exceed the file size, unless the original
 	 * request already exceeds it.
 	 */
-	new_end = min(round_up(end, lo->stripe_unit), rreq->i_size);
+	new_end = umin(round_up(end, lo->stripe_unit), rreq->i_size);
 	if (new_end > end && new_end <= rreq->start + max_len)
 		rreq->len = new_end - rreq->start;
 
diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 1622cce535a3..47603f08680e 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -130,7 +130,9 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
 }
 
 static void netfs_cache_expand_readahead(struct netfs_io_request *rreq,
-					 loff_t *_start, size_t *_len, loff_t i_size)
+					 unsigned long long *_start,
+					 unsigned long long *_len,
+					 unsigned long long i_size)
 {
 	struct netfs_cache_resources *cres = &rreq->cache_resources;
 
diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index 624d8859c2fa..8e4a3fb287e3 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -663,7 +663,7 @@ static void netfs_pages_written_back(struct netfs_io_request *wreq)
 	last = (wreq->start + wreq->len - 1) / PAGE_SIZE;
 	xas_for_each(&xas, folio, last) {
 		WARN(!folio_test_writeback(folio),
-		     "bad %zx @%llx page %lx %lx\n",
+		     "bad %llx @%llx page %lx %lx\n",
 		     wreq->len, wreq->start, folio->index, last);
 
 		if ((finfo = netfs_folio_info(folio))) {
diff --git a/fs/netfs/io.c b/fs/netfs/io.c
index 8de581ac0cfb..6cfecfcd02e1 100644
--- a/fs/netfs/io.c
+++ b/fs/netfs/io.c
@@ -476,7 +476,7 @@ netfs_rreq_prepare_read(struct netfs_io_request *rreq,
 
 set:
 	if (subreq->len > rreq->len)
-		pr_warn("R=%08x[%u] SREQ>RREQ %zx > %zx\n",
+		pr_warn("R=%08x[%u] SREQ>RREQ %zx > %llx\n",
 			rreq->debug_id, subreq->debug_index,
 			subreq->len, rreq->len);
 
@@ -513,7 +513,7 @@ static bool netfs_rreq_submit_slice(struct netfs_io_request *rreq,
 	subreq->start		= rreq->start + rreq->submitted;
 	subreq->len		= io_iter->count;
 
-	_debug("slice %llx,%zx,%zx", subreq->start, subreq->len, rreq->submitted);
+	_debug("slice %llx,%zx,%llx", subreq->start, subreq->len, rreq->submitted);
 	list_add_tail(&subreq->rreq_link, &rreq->subrequests);
 
 	/* Call out to the cache to find out what it can do with the remaining
@@ -588,7 +588,7 @@ int netfs_begin_read(struct netfs_io_request *rreq, bool sync)
 	atomic_set(&rreq->nr_outstanding, 1);
 	io_iter = rreq->io_iter;
 	do {
-		_debug("submit %llx + %zx >= %llx",
+		_debug("submit %llx + %llx >= %llx",
 		       rreq->start, rreq->submitted, rreq->i_size);
 		if (rreq->origin == NETFS_DIO_READ &&
 		    rreq->start + rreq->submitted >= rreq->i_size)
diff --git a/fs/netfs/main.c b/fs/netfs/main.c
index 4805b9377364..5f0f438e5d21 100644
--- a/fs/netfs/main.c
+++ b/fs/netfs/main.c
@@ -62,7 +62,7 @@ static int netfs_requests_seq_show(struct seq_file *m, void *v)
 
 	rreq = list_entry(v, struct netfs_io_request, proc_link);
 	seq_printf(m,
-		   "%08x %s %3d %2lx %4d %3d @%04llx %zx/%zx",
+		   "%08x %s %3d %2lx %4d %3d @%04llx %llx/%llx",
 		   rreq->debug_id,
 		   netfs_origins[rreq->origin],
 		   refcount_read(&rreq->ref),
diff --git a/fs/netfs/output.c b/fs/netfs/output.c
index e586396d6b72..85374322f10f 100644
--- a/fs/netfs/output.c
+++ b/fs/netfs/output.c
@@ -439,7 +439,7 @@ static void netfs_submit_writethrough(struct netfs_io_request *wreq, bool final)
  */
 int netfs_advance_writethrough(struct netfs_io_request *wreq, size_t copied, bool to_page_end)
 {
-	_enter("ic=%zu sb=%zu ws=%u cp=%zu tp=%u",
+	_enter("ic=%zu sb=%llu ws=%u cp=%zu tp=%u",
 	       wreq->iter.count, wreq->submitted, wreq->wsize, copied, to_page_end);
 
 	wreq->iter.count += copied;
@@ -457,7 +457,7 @@ int netfs_end_writethrough(struct netfs_io_request *wreq, struct kiocb *iocb)
 {
 	int ret = -EIOCBQUEUED;
 
-	_enter("ic=%zu sb=%zu ws=%u",
+	_enter("ic=%zu sb=%llu ws=%u",
 	       wreq->iter.count, wreq->submitted, wreq->wsize);
 
 	if (wreq->submitted < wreq->io_iter.count)
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 0b6c2c2d3c23..88269681d4fc 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -149,7 +149,7 @@ struct netfs_io_subrequest {
 	struct work_struct	work;
 	struct list_head	rreq_link;	/* Link in rreq->subrequests */
 	struct iov_iter		io_iter;	/* Iterator for this subrequest */
-	loff_t			start;		/* Where to start the I/O */
+	unsigned long long	start;		/* Where to start the I/O */
 	size_t			len;		/* Size of the I/O */
 	size_t			transferred;	/* Amount of data transferred */
 	refcount_t		ref;
@@ -205,15 +205,15 @@ struct netfs_io_request {
 	atomic_t		subreq_counter;	/* Next subreq->debug_index */
 	atomic_t		nr_outstanding;	/* Number of ops in progress */
 	atomic_t		nr_copy_ops;	/* Number of copy-to-cache ops in progress */
-	size_t			submitted;	/* Amount submitted for I/O so far */
-	size_t			len;		/* Length of the request */
 	size_t			upper_len;	/* Length can be extended to here */
+	unsigned long long	submitted;	/* Amount submitted for I/O so far */
+	unsigned long long	len;		/* Length of the request */
 	size_t			transferred;	/* Amount to be indicated as transferred */
 	short			error;		/* 0 or error that occurred */
 	enum netfs_io_origin	origin;		/* Origin of the request */
 	bool			direct_bv_unpin; /* T if direct_bv[] must be unpinned */
-	loff_t			i_size;		/* Size of the file */
-	loff_t			start;		/* Start position */
+	unsigned long long	i_size;		/* Size of the file */
+	unsigned long long	start;		/* Start position */
 	pgoff_t			no_unlock_folio; /* Don't unlock this folio after read */
 	refcount_t		ref;
 	unsigned long		flags;
@@ -294,13 +294,15 @@ struct netfs_cache_ops {
 
 	/* Expand readahead request */
 	void (*expand_readahead)(struct netfs_cache_resources *cres,
-				 loff_t *_start, size_t *_len, loff_t i_size);
+				 unsigned long long *_start,
+				 unsigned long long *_len,
+				 unsigned long long i_size);
 
 	/* Prepare a read operation, shortening it to a cached/uncached
 	 * boundary as appropriate.
 	 */
 	enum netfs_io_source (*prepare_read)(struct netfs_io_subrequest *subreq,
-					     loff_t i_size);
+					     unsigned long long i_size);
 
 	/* Prepare a write operation, working out what part of the write we can
 	 * actually do.
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index 30769103638f..7126d2ea459c 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -280,7 +280,7 @@ TRACE_EVENT(netfs_sreq,
 		    __entry->start	= sreq->start;
 			   ),
 
-	    TP_printk("R=%08x[%u] %s %s f=%02x s=%llx %zx/%zx e=%d",
+	    TP_printk("R=%08x[%x] %s %s f=%02x s=%llx %zx/%zx e=%d",
 		      __entry->rreq, __entry->index,
 		      __print_symbolic(__entry->source, netfs_sreq_sources),
 		      __print_symbolic(__entry->what, netfs_sreq_traces),
@@ -320,7 +320,7 @@ TRACE_EVENT(netfs_failure,
 		    __entry->start	= sreq ? sreq->start : 0;
 			   ),
 
-	    TP_printk("R=%08x[%d] %s f=%02x s=%llx %zx/%zx %s e=%d",
+	    TP_printk("R=%08x[%x] %s f=%02x s=%llx %zx/%zx %s e=%d",
 		      __entry->rreq, __entry->index,
 		      __print_symbolic(__entry->source, netfs_sreq_sources),
 		      __entry->flags,
@@ -436,7 +436,7 @@ TRACE_EVENT(netfs_write,
 		    __field(unsigned int,		cookie		)
 		    __field(enum netfs_write_trace,	what		)
 		    __field(unsigned long long,		start		)
-		    __field(size_t,			len		)
+		    __field(unsigned long long,		len		)
 			     ),
 
 	    TP_fast_assign(


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 17/26] netfs: Fix writethrough-mode error handling
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (15 preceding siblings ...)
  2024-03-28 16:34 ` [PATCH 16/26] netfs: Switch to using unsigned long long rather than loff_t David Howells
@ 2024-03-28 16:34 ` David Howells
  2024-04-15 12:40   ` Jeff Layton
  2024-04-17  9:04   ` David Howells
  2024-03-28 16:34 ` [PATCH 18/26] netfs: Add some write-side stats and clean up some stat names David Howells
                   ` (14 subsequent siblings)
  31 siblings, 2 replies; 62+ messages in thread
From: David Howells @ 2024-03-28 16:34 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: Paulo Alcantara, Tom Talpey, Shyam Prasad N, linux-cifs,
	linux-fsdevel, v9fs, ceph-devel, linux-nfs, Matthew Wilcox,
	linux-kernel, David Howells, Steve French, linux-cachefs,
	linux-mm, netdev, Marc Dionne, netfs, Ilya Dryomov,
	Eric Van Hensbergen, linux-erofs, linux-afs

Fix the error return in netfs_perform_write() when acting in writethrough
mode so that any cached error is returned in the case that
netfs_end_writethrough() returns 0.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/buffered_write.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index 8e4a3fb287e3..db4ad158948b 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -188,7 +188,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
 	enum netfs_how_to_modify howto;
 	enum netfs_folio_trace trace;
 	unsigned int bdp_flags = (iocb->ki_flags & IOCB_SYNC) ? 0: BDP_ASYNC;
-	ssize_t written = 0, ret;
+	ssize_t written = 0, ret, ret2;
 	loff_t i_size, pos = iocb->ki_pos, from, to;
 	size_t max_chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;
 	bool maybe_trouble = false;
@@ -409,10 +409,12 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
 
 out:
 	if (unlikely(wreq)) {
-		ret = netfs_end_writethrough(wreq, iocb);
+		ret2 = netfs_end_writethrough(wreq, iocb);
 		wbc_detach_inode(&wbc);
-		if (ret == -EIOCBQUEUED)
-			return ret;
+		if (ret2 == -EIOCBQUEUED)
+			return ret2;
+		if (ret == 0)
+			ret = ret2;
 	}
 
 	iocb->ki_pos += written;


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 18/26] netfs: Add some write-side stats and clean up some stat names
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (16 preceding siblings ...)
  2024-03-28 16:34 ` [PATCH 17/26] netfs: Fix writethrough-mode error handling David Howells
@ 2024-03-28 16:34 ` David Howells
  2024-03-28 16:34 ` [PATCH 19/26] netfs: New writeback implementation David Howells
                   ` (13 subsequent siblings)
  31 siblings, 0 replies; 62+ messages in thread
From: David Howells @ 2024-03-28 16:34 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: Paulo Alcantara, Tom Talpey, Shyam Prasad N, linux-cifs,
	linux-fsdevel, v9fs, ceph-devel, linux-nfs, Matthew Wilcox,
	linux-kernel, David Howells, Steve French, linux-cachefs,
	linux-mm, netdev, Marc Dionne, netfs, Ilya Dryomov,
	Eric Van Hensbergen, linux-erofs, linux-afs

Add some write-side stats to count buffered writes, buffered writethrough,
and writepages calls.

Whilst we're at it, clean up the naming on some of the existing stats
counters and organise the output into two sets.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/buffered_read.c  |  2 +-
 fs/netfs/buffered_write.c |  3 +++
 fs/netfs/direct_write.c   |  2 +-
 fs/netfs/internal.h       |  7 +++++--
 fs/netfs/stats.c          | 17 ++++++++++++-----
 5 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 47603f08680e..a6bb03bea920 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -282,7 +282,7 @@ int netfs_read_folio(struct file *file, struct folio *folio)
 	if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS)
 		goto discard;
 
-	netfs_stat(&netfs_n_rh_readpage);
+	netfs_stat(&netfs_n_rh_read_folio);
 	trace_netfs_read(rreq, rreq->start, rreq->len, netfs_read_trace_readpage);
 
 	/* Set up the output buffer */
diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index db4ad158948b..244d67a43972 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -215,6 +215,9 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
 		if (!is_sync_kiocb(iocb))
 			wreq->iocb = iocb;
 		wreq->cleanup = netfs_cleanup_buffered_write;
+		netfs_stat(&netfs_n_wh_writethrough);
+	} else {
+		netfs_stat(&netfs_n_wh_buffered_write);
 	}
 
 	do {
diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index bee047e20f5d..37c91188107b 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -143,7 +143,7 @@ ssize_t netfs_unbuffered_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		return 0;
 
 	trace_netfs_write_iter(iocb, from);
-	netfs_stat(&netfs_n_rh_dio_write);
+	netfs_stat(&netfs_n_wh_dio_write);
 
 	ret = netfs_start_io_direct(inode);
 	if (ret < 0)
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index c67da478cd2b..58289cc65e25 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -106,9 +106,8 @@ int netfs_end_writethrough(struct netfs_io_request *wreq, struct kiocb *iocb);
  */
 #ifdef CONFIG_NETFS_STATS
 extern atomic_t netfs_n_rh_dio_read;
-extern atomic_t netfs_n_rh_dio_write;
 extern atomic_t netfs_n_rh_readahead;
-extern atomic_t netfs_n_rh_readpage;
+extern atomic_t netfs_n_rh_read_folio;
 extern atomic_t netfs_n_rh_rreq;
 extern atomic_t netfs_n_rh_sreq;
 extern atomic_t netfs_n_rh_download;
@@ -125,6 +124,10 @@ extern atomic_t netfs_n_rh_write_begin;
 extern atomic_t netfs_n_rh_write_done;
 extern atomic_t netfs_n_rh_write_failed;
 extern atomic_t netfs_n_rh_write_zskip;
+extern atomic_t netfs_n_wh_buffered_write;
+extern atomic_t netfs_n_wh_writethrough;
+extern atomic_t netfs_n_wh_dio_write;
+extern atomic_t netfs_n_wh_writepages;
 extern atomic_t netfs_n_wh_wstream_conflict;
 extern atomic_t netfs_n_wh_upload;
 extern atomic_t netfs_n_wh_upload_done;
diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c
index deeba9f9dcf5..0892768eea32 100644
--- a/fs/netfs/stats.c
+++ b/fs/netfs/stats.c
@@ -10,9 +10,8 @@
 #include "internal.h"
 
 atomic_t netfs_n_rh_dio_read;
-atomic_t netfs_n_rh_dio_write;
 atomic_t netfs_n_rh_readahead;
-atomic_t netfs_n_rh_readpage;
+atomic_t netfs_n_rh_read_folio;
 atomic_t netfs_n_rh_rreq;
 atomic_t netfs_n_rh_sreq;
 atomic_t netfs_n_rh_download;
@@ -29,6 +28,10 @@ atomic_t netfs_n_rh_write_begin;
 atomic_t netfs_n_rh_write_done;
 atomic_t netfs_n_rh_write_failed;
 atomic_t netfs_n_rh_write_zskip;
+atomic_t netfs_n_wh_buffered_write;
+atomic_t netfs_n_wh_writethrough;
+atomic_t netfs_n_wh_dio_write;
+atomic_t netfs_n_wh_writepages;
 atomic_t netfs_n_wh_wstream_conflict;
 atomic_t netfs_n_wh_upload;
 atomic_t netfs_n_wh_upload_done;
@@ -39,13 +42,17 @@ atomic_t netfs_n_wh_write_failed;
 
 int netfs_stats_show(struct seq_file *m, void *v)
 {
-	seq_printf(m, "Netfs  : DR=%u DW=%u RA=%u RP=%u WB=%u WBZ=%u\n",
+	seq_printf(m, "Netfs  : DR=%u RA=%u RF=%u WB=%u WBZ=%u\n",
 		   atomic_read(&netfs_n_rh_dio_read),
-		   atomic_read(&netfs_n_rh_dio_write),
 		   atomic_read(&netfs_n_rh_readahead),
-		   atomic_read(&netfs_n_rh_readpage),
+		   atomic_read(&netfs_n_rh_read_folio),
 		   atomic_read(&netfs_n_rh_write_begin),
 		   atomic_read(&netfs_n_rh_write_zskip));
+	seq_printf(m, "Netfs  : BW=%u WT=%u DW=%u WP=%u\n",
+		   atomic_read(&netfs_n_wh_buffered_write),
+		   atomic_read(&netfs_n_wh_writethrough),
+		   atomic_read(&netfs_n_wh_dio_write),
+		   atomic_read(&netfs_n_wh_writepages));
 	seq_printf(m, "Netfs  : ZR=%u sh=%u sk=%u\n",
 		   atomic_read(&netfs_n_rh_zero),
 		   atomic_read(&netfs_n_rh_short_read),


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 19/26] netfs: New writeback implementation
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (17 preceding siblings ...)
  2024-03-28 16:34 ` [PATCH 18/26] netfs: Add some write-side stats and clean up some stat names David Howells
@ 2024-03-28 16:34 ` David Howells
  2024-03-29 10:34   ` Naveen Mamindlapalli
  2024-03-30  1:03   ` Vadim Fedorenko
  2024-03-28 16:34 ` [PATCH 20/26] netfs, afs: Implement helpers for new write code David Howells
                   ` (12 subsequent siblings)
  31 siblings, 2 replies; 62+ messages in thread
From: David Howells @ 2024-03-28 16:34 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: Latchesar Ionkov, Christian Schoenebeck, David Howells, linux-mm,
	Marc Dionne, linux-afs, Paulo Alcantara, linux-cifs,
	Matthew Wilcox, Steve French, linux-cachefs, Ilya Dryomov,
	Shyam Prasad N, Tom Talpey, ceph-devel, Eric Van Hensbergen,
	linux-nfs, netdev, v9fs, linux-kernel, linux-fsdevel, netfs,
	linux-erofs

The current netfslib writeback implementation creates writeback requests of
contiguous folio data and then separately tiles subrequests over the space
twice, once for the server and once for the cache.  This creates a few
issues:

 (1) Every time there's a discontiguity or a change between writing to only
     one destination and writing to both, it must create a new request.
     This makes it harder to do vectored writes.

 (2) The folios don't have the writeback mark removed until the end of the
     request - and a request could be hundreds of megabytes.

 (3) In future, I want to support a larger cache granularity, which will
     require aggregation of some folios that contain unmodified data (which
     only need to go to the cache) and some which contain modifications
     (which need to be uploaded and stored to the cache) - but, currently,
     these are treated as discontiguous.

There's also a move to get everyone to use writeback_iter() to extract
writable folios from the pagecache.  That said, currently writeback_iter()
has some issues that make it less than ideal:

 (1) there's no way to cancel the iteration, even if you find a "temporary"
     error that means the current folio and all subsequent folios are going
     to fail;

 (2) there's no way to filter the folios being written back - something
     that will impact Ceph with its ordered snap system;

 (3) and if you get a folio you can't immediately deal with (say you need
     to flush the preceding writes), you are left with a folio hanging in
     the locked state for the duration, when really we should unlock it and
     relock it later.

In this new implementation, I use writeback_iter() to pump folios,
progressively creating two parallel, but separate streams and cleaning up
the finished folios as the subrequests complete.  Either or both streams
can contain gaps, and the subrequests in each stream can be of variable
size, don't need to align with each other and don't need to align with the
folios.

Indeed, subrequests can cross folio boundaries, may cover several folios or
a folio may be spanned by multiple subrequests, e.g.:

         +---+---+-----+-----+---+----------+
Folios:  |   |   |     |     |   |          |
         +---+---+-----+-----+---+----------+

           +------+------+     +----+----+
Upload:    |      |      |.....|    |    |
           +------+------+     +----+----+

         +------+------+------+------+------+
Cache:   |      |      |      |      |      |
         +------+------+------+------+------+

The progressive subrequest construction permits the algorithm to be
preparing both the next upload to the server and the next write to the
cache whilst the previous ones are already in progress.  Throttling can be
applied to control the rate of production of subrequests - and, in any
case, we probably want to write them to the server in ascending order,
particularly if the file will be extended.

Content crypto can also be prepared at the same time as the subrequests and
run asynchronously, with the prepped requests being stalled until the
crypto catches up with them.  This might also be useful for transport
crypto, but that happens at a lower layer, so probably would be harder to
pull off.

The algorithm is split into three parts:

 (1) The issuer.  This walks through the data, packaging it up, encrypting
     it and creating subrequests.  The part of this that generates
     subrequests only deals with file positions and spans and so is usable
     for DIO/unbuffered writes as well as buffered writes.

 (2) The collector. This asynchronously collects completed subrequests,
     unlocks folios, frees crypto buffers and performs any retries.  This
     runs in a work queue so that the issuer can return to the caller for
     writeback (so that the VM can have its kswapd thread back) or async
     writes.

 (3) The retryer.  This pauses the issuer, waits for all outstanding
     subrequests to complete and then goes through the failed subrequests
     to reissue them.  This may involve reprepping them (with cifs, the
     credits must be renegotiated, and a subrequest may need splitting),
     and doing RMW for content crypto if there's a conflicting change on
     the server.

[!] Note that some of the functions are prefixed with "new_" to avoid
clashes with existing functions.  These will be renamed in a later patch
that cuts over to the new algorithm.
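
To make the division of labour concrete, here is a heavily simplified,
hypothetical skeleton of the issuer/collector split (the "toy_" names are
invented for illustration and don't correspond to the structures added
below; the retry side is only indicated in comments):

	/* Needs <linux/list.h>, <linux/slab.h>, <linux/spinlock.h>,
	 * <linux/workqueue.h>.
	 */
	struct toy_subreq {
		struct list_head	link;		/* Link in toy_request::subrequests */
		unsigned long long	start;		/* File position of this span */
		size_t			len;
		bool			done;		/* Set by the completion handler */
		bool			need_retry;	/* Transient failure: reissue later */
	};

	struct toy_request {
		struct list_head	subrequests;	/* Issuer appends; collector pops the front */
		spinlock_t		lock;
		struct work_struct	work;		/* The collector */
	};

	/* Issuer: package a span of data into a subrequest and start the I/O
	 * (error handling elided).
	 */
	static void toy_issue(struct toy_request *req, unsigned long long start,
			      size_t len)
	{
		struct toy_subreq *sub = kzalloc(sizeof(*sub), GFP_KERNEL);

		if (!sub)
			return;
		sub->start = start;
		sub->len = len;
		spin_lock(&req->lock);
		list_add_tail(&sub->link, &req->subrequests);
		spin_unlock(&req->lock);
		/* ...hand 'sub' to the transport; completion calls toy_done()... */
	}

	/* Completion handler: note the result and kick the collector. */
	static void toy_done(struct toy_request *req, struct toy_subreq *sub,
			     bool retry)
	{
		sub->need_retry = retry;
		smp_store_release(&sub->done, true);
		queue_work(system_unbound_wq, &req->work);
	}

	/* Collector (runs on a workqueue): pop completed subrequests off the
	 * front of the list in order, cleaning up as it goes.  Anything left
	 * flagged need_retry would be reissued once outstanding I/O has
	 * quiesced.
	 */
	static void toy_collect(struct work_struct *work)
	{
		struct toy_request *req = container_of(work, struct toy_request, work);
		struct toy_subreq *sub;

		spin_lock(&req->lock);
		while ((sub = list_first_entry_or_null(&req->subrequests,
						       struct toy_subreq, link)) &&
		       smp_load_acquire(&sub->done) && !sub->need_retry) {
			list_del(&sub->link);
			spin_unlock(&req->lock);
			/* ...unlock and clean up the folios this span covered... */
			kfree(sub);
			spin_lock(&req->lock);
		}
		spin_unlock(&req->lock);
	}

	static void toy_init(struct toy_request *req)
	{
		INIT_LIST_HEAD(&req->subrequests);
		spin_lock_init(&req->lock);
		INIT_WORK(&req->work, toy_collect);
	}

The real collector below additionally has to juggle two streams that
complete at different rates, cope with gaps and discontiguities between
spans, and renegotiate subrequest sizes before reissuing anything.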

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/Makefile            |   4 +-
 fs/netfs/buffered_write.c    |   4 -
 fs/netfs/internal.h          |  27 ++
 fs/netfs/objects.c           |  17 +
 fs/netfs/write_collect.c     | 808 +++++++++++++++++++++++++++++++++++
 fs/netfs/write_issue.c       | 673 +++++++++++++++++++++++++++++
 include/linux/netfs.h        |  68 ++-
 include/trace/events/netfs.h | 232 +++++++++-
 8 files changed, 1824 insertions(+), 9 deletions(-)
 create mode 100644 fs/netfs/write_collect.c
 create mode 100644 fs/netfs/write_issue.c

diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile
index d4d1d799819e..1eb86e34b5a9 100644
--- a/fs/netfs/Makefile
+++ b/fs/netfs/Makefile
@@ -11,7 +11,9 @@ netfs-y := \
 	main.o \
 	misc.o \
 	objects.o \
-	output.o
+	output.o \
+	write_collect.o \
+	write_issue.o
 
 netfs-$(CONFIG_NETFS_STATS) += stats.o
 
diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index 244d67a43972..621532dacef5 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -74,16 +74,12 @@ static enum netfs_how_to_modify netfs_how_to_modify(struct netfs_inode *ctx,
 
 	if (file->f_mode & FMODE_READ)
 		goto no_write_streaming;
-	if (test_bit(NETFS_ICTX_NO_WRITE_STREAMING, &ctx->flags))
-		goto no_write_streaming;
 
 	if (netfs_is_cache_enabled(ctx)) {
 		/* We don't want to get a streaming write on a file that loses
 		 * caching service temporarily because the backing store got
 		 * culled.
 		 */
-		if (!test_bit(NETFS_ICTX_NO_WRITE_STREAMING, &ctx->flags))
-			set_bit(NETFS_ICTX_NO_WRITE_STREAMING, &ctx->flags);
 		goto no_write_streaming;
 	}
 
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index 58289cc65e25..5d3f74a70fa7 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -153,6 +153,33 @@ static inline void netfs_stat_d(atomic_t *stat)
 #define netfs_stat_d(x) do {} while(0)
 #endif
 
+/*
+ * write_collect.c
+ */
+int netfs_folio_written_back(struct folio *folio);
+void netfs_write_collection_worker(struct work_struct *work);
+void netfs_wake_write_collector(struct netfs_io_request *wreq, bool was_async);
+
+/*
+ * write_issue.c
+ */
+struct netfs_io_request *netfs_create_write_req(struct address_space *mapping,
+						struct file *file,
+						loff_t start,
+						enum netfs_io_origin origin);
+void netfs_reissue_write(struct netfs_io_stream *stream,
+			 struct netfs_io_subrequest *subreq);
+int netfs_advance_write(struct netfs_io_request *wreq,
+			struct netfs_io_stream *stream,
+			loff_t start, size_t len, bool to_eof);
+struct netfs_io_request *new_netfs_begin_writethrough(struct kiocb *iocb, size_t len);
+int new_netfs_advance_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
+				   struct folio *folio, size_t copied, bool to_page_end,
+				   struct folio **writethrough_cache);
+int new_netfs_end_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
+			       struct folio *writethrough_cache);
+int netfs_unbuffered_write(struct netfs_io_request *wreq, bool may_wait, size_t len);
+
 /*
  * Miscellaneous functions.
  */
diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
index 1a4e2ce735ce..c90d482b1650 100644
--- a/fs/netfs/objects.c
+++ b/fs/netfs/objects.c
@@ -47,6 +47,10 @@ struct netfs_io_request *netfs_alloc_request(struct address_space *mapping,
 	rreq->inode	= inode;
 	rreq->i_size	= i_size_read(inode);
 	rreq->debug_id	= atomic_inc_return(&debug_ids);
+	rreq->wsize	= INT_MAX;
+	spin_lock_init(&rreq->lock);
+	INIT_LIST_HEAD(&rreq->io_streams[0].subrequests);
+	INIT_LIST_HEAD(&rreq->io_streams[1].subrequests);
 	INIT_LIST_HEAD(&rreq->subrequests);
 	INIT_WORK(&rreq->work, NULL);
 	refcount_set(&rreq->ref, 1);
@@ -85,6 +89,8 @@ void netfs_get_request(struct netfs_io_request *rreq, enum netfs_rreq_ref_trace
 void netfs_clear_subrequests(struct netfs_io_request *rreq, bool was_async)
 {
 	struct netfs_io_subrequest *subreq;
+	struct netfs_io_stream *stream;
+	int s;
 
 	while (!list_empty(&rreq->subrequests)) {
 		subreq = list_first_entry(&rreq->subrequests,
@@ -93,6 +99,17 @@ void netfs_clear_subrequests(struct netfs_io_request *rreq, bool was_async)
 		netfs_put_subrequest(subreq, was_async,
 				     netfs_sreq_trace_put_clear);
 	}
+
+	for (s = 0; s < ARRAY_SIZE(rreq->io_streams); s++) {
+		stream = &rreq->io_streams[s];
+		while (!list_empty(&stream->subrequests)) {
+			subreq = list_first_entry(&stream->subrequests,
+						  struct netfs_io_subrequest, rreq_link);
+			list_del(&subreq->rreq_link);
+			netfs_put_subrequest(subreq, was_async,
+					     netfs_sreq_trace_put_clear);
+		}
+	}
 }
 
 static void netfs_free_request_rcu(struct rcu_head *rcu)
diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c
new file mode 100644
index 000000000000..5e2ca8b25af0
--- /dev/null
+++ b/fs/netfs/write_collect.c
@@ -0,0 +1,808 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Network filesystem write subrequest result collection, assessment
+ * and retrying.
+ *
+ * Copyright (C) 2024 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <linux/export.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/pagemap.h>
+#include <linux/slab.h>
+#include "internal.h"
+
+/* Notes made in the collector */
+#define HIT_PENDING		0x01	/* A front op was still pending */
+#define SOME_EMPTY		0x02	/* One or more streams are empty */
+#define ALL_EMPTY		0x04	/* All streams are empty */
+#define MAYBE_DISCONTIG		0x08	/* A front op may be discontiguous (rounded to PAGE_SIZE) */
+#define NEED_REASSESS		0x10	/* Need to loop round and reassess */
+#define REASSESS_DISCONTIG	0x20	/* Reassess discontiguity if contiguity advances */
+#define MADE_PROGRESS		0x40	/* Made progress cleaning up a stream or the folio set */
+#define BUFFERED		0x80	/* The pagecache needs cleaning up */
+#define NEED_RETRY		0x100	/* A front op requests retrying */
+#define SAW_FAILURE		0x200	/* One or more streams hit a permanent failure */
+
+/*
+ * Successful completion of write of a folio to the server and/or cache.  Note
+ * that we are not allowed to lock the folio here on pain of deadlocking with
+ * truncate.
+ */
+int netfs_folio_written_back(struct folio *folio)
+{
+	enum netfs_folio_trace why = netfs_folio_trace_clear;
+	struct netfs_folio *finfo;
+	struct netfs_group *group = NULL;
+	int gcount = 0;
+
+	if ((finfo = netfs_folio_info(folio))) {
+		/* Streaming writes cannot be redirtied whilst under writeback,
+		 * so discard the streaming record.
+		 */
+		folio_detach_private(folio);
+		group = finfo->netfs_group;
+		gcount++;
+		kfree(finfo);
+		why = netfs_folio_trace_clear_s;
+		goto end_wb;
+	}
+
+	if ((group = netfs_folio_group(folio))) {
+		if (group == NETFS_FOLIO_COPY_TO_CACHE) {
+			why = netfs_folio_trace_clear_cc;
+			if (group == NETFS_FOLIO_COPY_TO_CACHE)
+				folio_detach_private(folio);
+			else
+				why = netfs_folio_trace_redirtied;
+			goto end_wb;
+		}
+
+		/* Need to detach the group pointer if the page didn't get
+		 * redirtied.  If it has been redirtied, then it must be within
+		 * the same group.
+		 */
+		why = netfs_folio_trace_redirtied;
+		if (!folio_test_dirty(folio)) {
+			if (!folio_test_dirty(folio)) {
+				folio_detach_private(folio);
+				gcount++;
+				why = netfs_folio_trace_clear_g;
+			}
+		}
+	}
+
+end_wb:
+	trace_netfs_folio(folio, why);
+	folio_end_writeback(folio);
+	return gcount;
+}
+
+/*
+ * Get hold of a folio we have under writeback.  We don't want to get the
+ * refcount on it.
+ */
+static struct folio *netfs_writeback_lookup_folio(struct netfs_io_request *wreq, loff_t pos)
+{
+	XA_STATE(xas, &wreq->mapping->i_pages, pos / PAGE_SIZE);
+	struct folio *folio;
+
+	rcu_read_lock();
+
+	for (;;) {
+		xas_reset(&xas);
+		folio = xas_load(&xas);
+		if (xas_retry(&xas, folio))
+			continue;
+
+		if (!folio || xa_is_value(folio))
+			kdebug("R=%08x: folio %lx (%llx) not present",
+			       wreq->debug_id, xas.xa_index, pos / PAGE_SIZE);
+		BUG_ON(!folio || xa_is_value(folio));
+
+		if (folio == xas_reload(&xas))
+			break;
+	}
+
+	rcu_read_unlock();
+
+	if (WARN_ONCE(!folio_test_writeback(folio),
+		      "R=%08x: folio %lx is not under writeback\n",
+		      wreq->debug_id, folio->index)) {
+		trace_netfs_folio(folio, netfs_folio_trace_not_under_wback);
+	}
+	return folio;
+}
+
+/*
+ * Unlock any folios we've finished with.
+ */
+static void netfs_writeback_unlock_folios(struct netfs_io_request *wreq,
+					  unsigned long long collected_to,
+					  unsigned int *notes)
+{
+	for (;;) {
+		struct folio *folio;
+		struct netfs_folio *finfo;
+		unsigned long long fpos, fend;
+		size_t fsize, flen;
+
+		folio = netfs_writeback_lookup_folio(wreq, wreq->cleaned_to);
+
+		fpos = folio_pos(folio);
+		fsize = folio_size(folio);
+		finfo = netfs_folio_info(folio);
+		flen = finfo ? finfo->dirty_offset + finfo->dirty_len : fsize;
+
+		fend = min_t(unsigned long long, fpos + flen, wreq->i_size);
+
+		trace_netfs_collect_folio(wreq, folio, fend, collected_to);
+
+		if (fpos + fsize > wreq->contiguity) {
+			trace_netfs_collect_contig(wreq, fpos + fsize,
+						   netfs_contig_trace_unlock);
+			wreq->contiguity = fpos + fsize;
+		}
+
+		/* Unlock any folio we've transferred all of. */
+		if (collected_to < fend)
+			break;
+
+		wreq->nr_group_rel += netfs_folio_written_back(folio);
+		wreq->cleaned_to = fpos + fsize;
+		*notes |= MADE_PROGRESS;
+
+		if (fpos + fsize >= collected_to)
+			break;
+	}
+}
+
+/*
+ * Perform retries on the streams that need it.
+ */
+static void netfs_retry_write_stream(struct netfs_io_request *wreq,
+				     struct netfs_io_stream *stream)
+{
+	struct list_head *next;
+
+	_enter("R=%x[%x:]", wreq->debug_id, stream->stream_nr);
+
+	if (unlikely(stream->failed))
+		return;
+
+	/* If there's no renegotiation to do, just resend each failed subreq. */
+	if (!stream->prepare_write) {
+		struct netfs_io_subrequest *subreq;
+
+		list_for_each_entry(subreq, &stream->subrequests, rreq_link) {
+			if (test_bit(NETFS_SREQ_FAILED, &subreq->flags))
+				break;
+			if (__test_and_clear_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags)) {
+				__set_bit(NETFS_SREQ_RETRYING, &subreq->flags);
+				netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
+				netfs_reissue_write(stream, subreq);
+			}
+		}
+		return;
+	}
+
+	if (list_empty(&stream->subrequests))
+		return;
+	next = stream->subrequests.next;
+
+	do {
+		struct netfs_io_subrequest *subreq = NULL, *from, *to, *tmp;
+		unsigned long long start, len;
+		size_t part;
+		bool boundary = false;
+
+		/* Go through the stream and find the next span of contiguous
+		 * data that we then rejig (cifs, for example, needs the wsize
+		 * renegotiating) and reissue.
+		 */
+		from = list_entry(next, struct netfs_io_subrequest, rreq_link);
+		to = from;
+		start = from->start + from->transferred;
+		len   = from->len   - from->transferred;
+
+		if (test_bit(NETFS_SREQ_FAILED, &from->flags) ||
+		    !test_bit(NETFS_SREQ_NEED_RETRY, &from->flags))
+			return;
+
+		list_for_each_continue(next, &stream->subrequests) {
+			subreq = list_entry(next, struct netfs_io_subrequest, rreq_link);
+			if (subreq->start + subreq->transferred != start + len ||
+			    test_bit(NETFS_SREQ_BOUNDARY, &subreq->flags) ||
+			    !test_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags))
+				break;
+			to = subreq;
+			len += to->len;
+		}
+
+		/* Work through the sublist. */
+		subreq = from;
+		list_for_each_entry_from(subreq, &stream->subrequests, rreq_link) {
+			if (!len)
+				break;
+			/* Renegotiate max_len (wsize) */
+			trace_netfs_sreq(subreq, netfs_sreq_trace_retry);
+			__clear_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
+			__set_bit(NETFS_SREQ_RETRYING, &subreq->flags);
+			stream->prepare_write(subreq);
+
+			part = min(len, subreq->max_len);
+			subreq->len = part;
+			subreq->start = start;
+			subreq->transferred = 0;
+			len -= part;
+			start += part;
+			if (len && subreq == to &&
+			    __test_and_clear_bit(NETFS_SREQ_BOUNDARY, &to->flags))
+				boundary = true;
+
+			netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
+			netfs_reissue_write(stream, subreq);
+			if (subreq == to)
+				break;
+		}
+
+		/* If we managed to use fewer subreqs, we can discard the
+		 * excess; if we used the same number, then we're done.
+		 */
+		if (!len) {
+			if (subreq == to)
+				continue;
+			list_for_each_entry_safe_from(subreq, tmp,
+						      &stream->subrequests, rreq_link) {
+				trace_netfs_sreq(subreq, netfs_sreq_trace_discard);
+				list_del(&subreq->rreq_link);
+				netfs_put_subrequest(subreq, false, netfs_sreq_trace_put_done);
+				if (subreq == to)
+					break;
+			}
+			continue;
+		}
+
+		/* We ran out of subrequests, so we need to allocate some more
+		 * and insert them after.
+		 */
+		do {
+			subreq = netfs_alloc_subrequest(wreq);
+			subreq->source		= to->source;
+			subreq->start		= start;
+			subreq->max_len		= len;
+			subreq->max_nr_segs	= INT_MAX;
+			subreq->debug_index	= atomic_inc_return(&wreq->subreq_counter);
+			subreq->stream_nr	= to->stream_nr;
+			__set_bit(NETFS_SREQ_RETRYING, &subreq->flags);
+
+			trace_netfs_sreq_ref(wreq->debug_id, subreq->debug_index,
+					     refcount_read(&subreq->ref),
+					     netfs_sreq_trace_new);
+			netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
+
+			list_add(&subreq->rreq_link, &to->rreq_link);
+			to = list_next_entry(to, rreq_link);
+			trace_netfs_sreq(subreq, netfs_sreq_trace_retry);
+
+			switch (stream->source) {
+			case NETFS_UPLOAD_TO_SERVER:
+				netfs_stat(&netfs_n_wh_upload);
+				subreq->max_len = min(len, wreq->wsize);
+				break;
+			case NETFS_WRITE_TO_CACHE:
+				netfs_stat(&netfs_n_wh_write);
+				break;
+			default:
+				WARN_ON_ONCE(1);
+			}
+
+			stream->prepare_write(subreq);
+
+			part = min(len, subreq->max_len);
+			subreq->len = subreq->transferred + part;
+			len -= part;
+			start += part;
+			if (!len && boundary) {
+				__set_bit(NETFS_SREQ_BOUNDARY, &to->flags);
+				boundary = false;
+			}
+
+			netfs_reissue_write(stream, subreq);
+			if (!len)
+				break;
+
+		} while (len);
+
+	} while (!list_is_head(next, &stream->subrequests));
+}
+
+/*
+ * Perform retries on the streams that need it.  If we're doing content
+ * encryption and the server copy changed due to a third-party write, we may
+ * need to do an RMW cycle and also rewrite the data to the cache.
+ */
+static void netfs_retry_writes(struct netfs_io_request *wreq)
+{
+	struct netfs_io_subrequest *subreq;
+	struct netfs_io_stream *stream;
+	int s;
+
+	/* Wait for all outstanding I/O to quiesce before performing retries as
+	 * we may need to renegotiate the I/O sizes.
+	 */
+	for (s = 0; s < NR_IO_STREAMS; s++) {
+		stream = &wreq->io_streams[s];
+		if (!stream->active)
+			continue;
+
+		list_for_each_entry(subreq, &stream->subrequests, rreq_link) {
+			wait_on_bit(&subreq->flags, NETFS_SREQ_IN_PROGRESS,
+				    TASK_UNINTERRUPTIBLE);
+		}
+	}
+
+	// TODO: Enc: Fetch changed partial pages
+	// TODO: Enc: Reencrypt content if needed.
+	// TODO: Enc: Wind back transferred point.
+	// TODO: Enc: Mark cache pages for retry.
+
+	for (s = 0; s < NR_IO_STREAMS; s++) {
+		stream = &wreq->io_streams[s];
+		if (stream->need_retry) {
+			stream->need_retry = false;
+			netfs_retry_write_stream(wreq, stream);
+		}
+	}
+}
+
+/*
+ * Collect and assess the results of various write subrequests.  We may need to
+ * retry some of the results - or even do an RMW cycle for content crypto.
+ *
+ * Note that we have a number of parallel, overlapping lists of subrequests,
+ * one to the server and one to the local cache for example, which may not be
+ * the same size or starting position and may not even correspond in boundary
+ * alignment.
+ */
+static void netfs_collect_write_results(struct netfs_io_request *wreq)
+{
+	struct netfs_io_subrequest *front, *remove;
+	struct netfs_io_stream *stream;
+	unsigned long long collected_to;
+	unsigned int notes;
+	int s;
+
+	_enter("%llx-%llx", wreq->start, wreq->start + wreq->len);
+	trace_netfs_collect(wreq);
+	trace_netfs_rreq(wreq, netfs_rreq_trace_collect);
+
+reassess_streams:
+	smp_rmb();
+	collected_to = ULLONG_MAX;
+	if (wreq->origin == NETFS_WRITEBACK)
+		notes = ALL_EMPTY | BUFFERED | MAYBE_DISCONTIG;
+	else if (wreq->origin == NETFS_WRITETHROUGH)
+		notes = ALL_EMPTY | BUFFERED;
+	else
+		notes = ALL_EMPTY;
+
+	/* Remove completed subrequests from the front of the streams and
+	 * advance the completion point on each stream.  We stop when we hit
+	 * something that's in progress.  The issuer thread may be adding stuff
+	 * to the tail whilst we're doing this.
+	 *
+	 * We must not, however, merge in discontiguities that span whole
+	 * folios that aren't under writeback.  This is made more complicated
+	 * by the folios in the gap being of unpredictable sizes - if they even
+	 * exist - but we don't want to look them up.
+	 */
+	for (s = 0; s < NR_IO_STREAMS; s++) {
+		loff_t rstart, rend;
+
+		stream = &wreq->io_streams[s];
+		/* Read active flag before list pointers */
+		if (!smp_load_acquire(&stream->active))
+			continue;
+
+		front = stream->front;
+		while (front) {
+			trace_netfs_collect_sreq(wreq, front);
+			//_debug("sreq [%x] %llx %zx/%zx",
+			//       front->debug_index, front->start, front->transferred, front->len);
+
+			/* Stall if there may be a discontinuity. */
+			rstart = round_down(front->start, PAGE_SIZE);
+			if (rstart > wreq->contiguity) {
+				if (wreq->contiguity > stream->collected_to) {
+					trace_netfs_collect_gap(wreq, stream,
+								wreq->contiguity, 'D');
+					stream->collected_to = wreq->contiguity;
+				}
+				notes |= REASSESS_DISCONTIG;
+				break;
+			}
+			rend = round_up(front->start + front->len, PAGE_SIZE);
+			if (rend > wreq->contiguity) {
+				trace_netfs_collect_contig(wreq, rend,
+							   netfs_contig_trace_collect);
+				wreq->contiguity = rend;
+				if (notes & REASSESS_DISCONTIG)
+					notes |= NEED_REASSESS;
+			}
+			notes &= ~MAYBE_DISCONTIG;
+
+			/* Stall if the front is still undergoing I/O. */
+			if (test_bit(NETFS_SREQ_IN_PROGRESS, &front->flags)) {
+				notes |= HIT_PENDING;
+				break;
+			}
+			smp_rmb(); /* Read counters after I-P flag. */
+
+			if (stream->failed) {
+				stream->collected_to = front->start + front->len;
+				notes |= MADE_PROGRESS | SAW_FAILURE;
+				goto cancel;
+			}
+			if (front->start + front->transferred > stream->collected_to) {
+				stream->collected_to = front->start + front->transferred;
+				stream->transferred = stream->collected_to - wreq->start;
+				notes |= MADE_PROGRESS;
+			}
+			if (test_bit(NETFS_SREQ_FAILED, &front->flags)) {
+				stream->failed = true;
+				stream->error = front->error;
+				if (stream->source == NETFS_UPLOAD_TO_SERVER)
+					mapping_set_error(wreq->mapping, front->error);
+				notes |= NEED_REASSESS | SAW_FAILURE;
+				break;
+			}
+			if (front->transferred < front->len) {
+				stream->need_retry = true;
+				notes |= NEED_RETRY | MADE_PROGRESS;
+				break;
+			}
+
+		cancel:
+			/* Remove if completely consumed. */
+			spin_lock(&wreq->lock);
+
+			remove = front;
+			list_del_init(&front->rreq_link);
+			front = list_first_entry_or_null(&stream->subrequests,
+							 struct netfs_io_subrequest, rreq_link);
+			stream->front = front;
+			if (!front) {
+				unsigned long long jump_to = atomic64_read(&wreq->issued_to);
+
+				if (stream->collected_to < jump_to) {
+					trace_netfs_collect_gap(wreq, stream, jump_to, 'A');
+					stream->collected_to = jump_to;
+				}
+			}
+
+			spin_unlock(&wreq->lock);
+			netfs_put_subrequest(remove, false,
+					     notes & SAW_FAILURE ?
+					     netfs_sreq_trace_put_cancel :
+					     netfs_sreq_trace_put_done);
+		}
+
+		if (front)
+			notes &= ~ALL_EMPTY;
+		else
+			notes |= SOME_EMPTY;
+
+		if (stream->collected_to < collected_to)
+			collected_to = stream->collected_to;
+	}
+
+	if (collected_to != ULLONG_MAX && collected_to > wreq->collected_to)
+		wreq->collected_to = collected_to;
+
+	/* If we have an empty stream, we need to jump it forward over any gap
+	 * otherwise the collection point will never advance.
+	 *
+	 * Note that the issuer always adds to the stream with the lowest
+	 * so-far submitted start, so if we see two consecutive subreqs in one
+	 * stream with nothing between them in another stream, then the second
+	 * stream has a gap that can be jumped.
+	 */
+	if (notes & SOME_EMPTY) {
+		unsigned long long jump_to = wreq->start + wreq->len;
+
+		for (s = 0; s < NR_IO_STREAMS; s++) {
+			stream = &wreq->io_streams[s];
+			if (stream->active &&
+			    stream->front &&
+			    stream->front->start < jump_to)
+				jump_to = stream->front->start;
+		}
+
+		for (s = 0; s < NR_IO_STREAMS; s++) {
+			stream = &wreq->io_streams[s];
+			if (stream->active &&
+			    !stream->front &&
+			    stream->collected_to < jump_to) {
+				trace_netfs_collect_gap(wreq, stream, jump_to, 'B');
+				stream->collected_to = jump_to;
+			}
+		}
+	}
+
+	for (s = 0; s < NR_IO_STREAMS; s++) {
+		stream = &wreq->io_streams[s];
+		if (stream->active)
+			trace_netfs_collect_stream(wreq, stream);
+	}
+
+	trace_netfs_collect_state(wreq, wreq->collected_to, notes);
+
+	/* Unlock any folios that we have now finished with. */
+	if (notes & BUFFERED) {
+		unsigned long long clean_to = min(wreq->collected_to, wreq->contiguity);
+
+		if (wreq->cleaned_to < clean_to)
+			netfs_writeback_unlock_folios(wreq, clean_to, &notes);
+	} else {
+		wreq->cleaned_to = wreq->collected_to;
+	}
+
+	// TODO: Discard encryption buffers
+
+	/* If all streams are discontiguous with the last folio we cleared, we
+	 * may need to skip a set of folios.
+	 */
+	if ((notes & (MAYBE_DISCONTIG | ALL_EMPTY)) == MAYBE_DISCONTIG) {
+		unsigned long long jump_to = ULLONG_MAX;
+
+		for (s = 0; s < NR_IO_STREAMS; s++) {
+			stream = &wreq->io_streams[s];
+			if (stream->active && stream->front &&
+			    stream->front->start < jump_to)
+				jump_to = stream->front->start;
+		}
+
+		trace_netfs_collect_contig(wreq, jump_to, netfs_contig_trace_jump);
+		wreq->contiguity = jump_to;
+		wreq->cleaned_to = jump_to;
+		wreq->collected_to = jump_to;
+		for (s = 0; s < NR_IO_STREAMS; s++) {
+			stream = &wreq->io_streams[s];
+			if (stream->collected_to < jump_to)
+				stream->collected_to = jump_to;
+		}
+		//cond_resched();
+		notes |= MADE_PROGRESS;
+		goto reassess_streams;
+	}
+
+	if (notes & NEED_RETRY)
+		goto need_retry;
+	if ((notes & MADE_PROGRESS) && test_bit(NETFS_RREQ_PAUSE, &wreq->flags)) {
+		trace_netfs_rreq(wreq, netfs_rreq_trace_unpause);
+		clear_bit_unlock(NETFS_RREQ_PAUSE, &wreq->flags);
+		wake_up_bit(&wreq->flags, NETFS_RREQ_PAUSE);
+	}
+
+	if (notes & NEED_REASSESS) {
+		//cond_resched();
+		goto reassess_streams;
+	}
+	if (notes & MADE_PROGRESS) {
+		//cond_resched();
+		goto reassess_streams;
+	}
+
+out:
+	netfs_put_group_many(wreq->group, wreq->nr_group_rel);
+	wreq->nr_group_rel = 0;
+	_leave(" = %x", notes);
+	return;
+
+need_retry:
+	/* Okay...  We're going to have to retry one or both streams.  Note
+	 * that any partially completed op will have had any wholly transferred
+	 * folios removed from it.
+	 */
+	_debug("retry");
+	netfs_retry_writes(wreq);
+	goto out;
+}
+
+/*
+ * Perform the collection of subrequests, folios and encryption buffers.
+ */
+void netfs_write_collection_worker(struct work_struct *work)
+{
+	struct netfs_io_request *wreq = container_of(work, struct netfs_io_request, work);
+	struct netfs_inode *ictx = netfs_inode(wreq->inode);
+	size_t transferred;
+	int s;
+
+	_enter("R=%x", wreq->debug_id);
+
+	netfs_see_request(wreq, netfs_rreq_trace_see_work);
+	if (!test_bit(NETFS_RREQ_IN_PROGRESS, &wreq->flags)) {
+		netfs_put_request(wreq, false, netfs_rreq_trace_put_work);
+		return;
+	}
+
+	netfs_collect_write_results(wreq);
+
+	/* We're done when the app thread has finished posting subreqs and all
+	 * the queues in all the streams are empty.
+	 */
+	if (!test_bit(NETFS_RREQ_ALL_QUEUED, &wreq->flags)) {
+		netfs_put_request(wreq, false, netfs_rreq_trace_put_work);
+		return;
+	}
+	smp_rmb(); /* Read ALL_QUEUED before lists. */
+
+	transferred = LONG_MAX;
+	for (s = 0; s < NR_IO_STREAMS; s++) {
+		struct netfs_io_stream *stream = &wreq->io_streams[s];
+		if (!stream->active)
+			continue;
+		if (!list_empty(&stream->subrequests)) {
+			netfs_put_request(wreq, false, netfs_rreq_trace_put_work);
+			return;
+		}
+		if (stream->transferred < transferred)
+			transferred = stream->transferred;
+	}
+
+	/* Okay, declare that all I/O is complete. */
+	wreq->transferred = transferred;
+	trace_netfs_rreq(wreq, netfs_rreq_trace_write_done);
+
+	if (wreq->io_streams[1].active &&
+	    wreq->io_streams[1].failed) {
+		/* Cache write failure doesn't prevent writeback completion
+		 * unless we're in disconnected mode.
+		 */
+		ictx->ops->invalidate_cache(wreq);
+	}
+
+	if (wreq->cleanup)
+		wreq->cleanup(wreq);
+
+	if (wreq->origin == NETFS_DIO_WRITE &&
+	    wreq->mapping->nrpages) {
+		/* mmap may have got underfoot and we may now have folios
+		 * locally covering the region we just wrote.  Attempt to
+		 * discard the folios, but leave in place any modified locally.
+		 * ->write_iter() is prevented from interfering by the DIO
+		 * counter.
+		 */
+		pgoff_t first = wreq->start >> PAGE_SHIFT;
+		pgoff_t last = (wreq->start + wreq->transferred - 1) >> PAGE_SHIFT;
+		invalidate_inode_pages2_range(wreq->mapping, first, last);
+	}
+
+	if (wreq->origin == NETFS_DIO_WRITE)
+		inode_dio_end(wreq->inode);
+
+	_debug("finished");
+	trace_netfs_rreq(wreq, netfs_rreq_trace_wake_ip);
+	clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &wreq->flags);
+	wake_up_bit(&wreq->flags, NETFS_RREQ_IN_PROGRESS);
+
+	if (wreq->iocb) {
+		wreq->iocb->ki_pos += wreq->transferred;
+		if (wreq->iocb->ki_complete)
+			wreq->iocb->ki_complete(
+				wreq->iocb, wreq->error ? wreq->error : wreq->transferred);
+		wreq->iocb = VFS_PTR_POISON;
+	}
+
+	netfs_clear_subrequests(wreq, false);
+	netfs_put_request(wreq, false, netfs_rreq_trace_put_work_complete);
+}
+
+/*
+ * Wake the collection work item.
+ */
+void netfs_wake_write_collector(struct netfs_io_request *wreq, bool was_async)
+{
+	if (!work_pending(&wreq->work)) {
+		netfs_get_request(wreq, netfs_rreq_trace_get_work);
+		if (!queue_work(system_unbound_wq, &wreq->work))
+			netfs_put_request(wreq, was_async, netfs_rreq_trace_put_work_nq);
+	}
+}
+
+/**
+ * new_netfs_write_subrequest_terminated - Note the termination of a write operation.
+ * @_op: The I/O subrequest that has terminated.
+ * @transferred_or_error: The amount of data transferred or an error code.
+ * @was_async: The termination was asynchronous
+ *
+ * This tells the library that a contributory write I/O operation has
+ * terminated, one way or another, and that it should collect the results.
+ *
+ * The caller indicates in @transferred_or_error the outcome of the operation,
+ * supplying a positive value to indicate the number of bytes transferred or a
+ * negative error code.  The library will look after reissuing I/O operations
+ * as appropriate and writing downloaded data to the cache.
+ *
+ * If @was_async is true, the caller might be running in softirq or interrupt
+ * context and we can't sleep.
+ *
+ * When this is called, ownership of the subrequest is transferred back to the
+ * library, along with a ref.
+ *
+ * Note that %_op is a void* so that the function can be passed directly as a
+ * netfs_io_terminated_t termination callback without the need for a casting
+ * wrapper.
+ */
+void new_netfs_write_subrequest_terminated(void *_op, ssize_t transferred_or_error,
+					   bool was_async)
+{
+	struct netfs_io_subrequest *subreq = _op;
+	struct netfs_io_request *wreq = subreq->rreq;
+	struct netfs_io_stream *stream = &wreq->io_streams[subreq->stream_nr];
+
+	_enter("%x[%x] %zd", wreq->debug_id, subreq->debug_index, transferred_or_error);
+
+	switch (subreq->source) {
+	case NETFS_UPLOAD_TO_SERVER:
+		netfs_stat(&netfs_n_wh_upload_done);
+		break;
+	case NETFS_WRITE_TO_CACHE:
+		netfs_stat(&netfs_n_wh_write_done);
+		break;
+	case NETFS_INVALID_WRITE:
+		break;
+	default:
+		BUG();
+	}
+
+	if (IS_ERR_VALUE(transferred_or_error)) {
+		subreq->error = transferred_or_error;
+		if (subreq->error == -EAGAIN)
+			set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
+		else
+			set_bit(NETFS_SREQ_FAILED, &subreq->flags);
+		trace_netfs_failure(wreq, subreq, transferred_or_error, netfs_fail_write);
+
+		switch (subreq->source) {
+		case NETFS_WRITE_TO_CACHE:
+			netfs_stat(&netfs_n_wh_write_failed);
+			break;
+		case NETFS_UPLOAD_TO_SERVER:
+			netfs_stat(&netfs_n_wh_upload_failed);
+			break;
+		default:
+			break;
+		}
+		trace_netfs_rreq(wreq, netfs_rreq_trace_set_pause);
+		set_bit(NETFS_RREQ_PAUSE, &wreq->flags);
+	} else {
+		if (WARN(transferred_or_error > subreq->len - subreq->transferred,
+			 "Subreq excess write: R=%x[%x] %zd > %zu - %zu",
+			 wreq->debug_id, subreq->debug_index,
+			 transferred_or_error, subreq->len, subreq->transferred))
+			transferred_or_error = subreq->len - subreq->transferred;
+
+		subreq->error = 0;
+		subreq->transferred += transferred_or_error;
+
+		if (subreq->transferred < subreq->len)
+			set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
+	}
+
+	trace_netfs_sreq(subreq, netfs_sreq_trace_terminated);
+
+	clear_bit_unlock(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
+	wake_up_bit(&subreq->flags, NETFS_SREQ_IN_PROGRESS);
+
+	/* If we are at the head of the queue, wake up the collector,
+	 * transferring a ref to it if we were the ones to do so.
+	 */
+	if (list_is_first(&subreq->rreq_link, &stream->subrequests))
+		netfs_wake_write_collector(wreq, was_async);
+
+	netfs_put_subrequest(subreq, was_async, netfs_sreq_trace_put_terminated);
+}
+EXPORT_SYMBOL(new_netfs_write_subrequest_terminated);
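
As a minimal sketch of the shape a filesystem's ->issue_write() is expected to
take when reporting back through this interface (my_fs_issue_write() and
my_fs_send() are invented names, with my_fs_send() assumed to return the number
of bytes written or a negative error; the real afs and 9p implementations
follow in later patches of this series):

	#include <linux/netfs.h>

	/* Hypothetical example only: my_fs_send() is an assumed transport helper. */
	static void my_fs_issue_write(struct netfs_io_subrequest *subreq)
	{
		ssize_t ret;

		/* Write [start + transferred, start + len) from subreq->io_iter. */
		ret = my_fs_send(subreq->rreq->netfs_priv,
				 subreq->start + subreq->transferred,
				 &subreq->io_iter);

		/* Positive: bytes written; negative: error (-EAGAIN asks for a
		 * retry).  A short count causes the library to set
		 * NETFS_SREQ_NEED_RETRY and reissue the remainder.
		 */
		netfs_write_subrequest_terminated(subreq, ret, false);
	}
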
diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c
new file mode 100644
index 000000000000..e0fb472898f5
--- /dev/null
+++ b/fs/netfs/write_issue.c
@@ -0,0 +1,673 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Network filesystem high-level (buffered) writeback.
+ *
+ * Copyright (C) 2024 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ *
+ * To support network filesystems with local caching, we manage a situation
+ * that can be envisioned like the following:
+ *
+ *               +---+---+-----+-----+---+----------+
+ *    Folios:    |   |   |     |     |   |          |
+ *               +---+---+-----+-----+---+----------+
+ *
+ *                 +------+------+     +----+----+
+ *    Upload:      |      |      |.....|    |    |
+ *  (Stream 0)     +------+------+     +----+----+
+ *
+ *               +------+------+------+------+------+
+ *    Cache:     |      |      |      |      |      |
+ *  (Stream 1)   +------+------+------+------+------+
+ *
+ * Where we have a sequence of folios of varying sizes that we need to overlay
+ * with multiple parallel streams of I/O requests, where the I/O requests in a
+ * stream may also be of various sizes (in cifs, for example, the sizes are
+ * negotiated with the server; in something like ceph, they may represent the
+ * sizes of storage objects).
+ *
+ * The sequence in each stream may contain gaps and noncontiguous subrequests
+ * may be glued together into single vectored write RPCs.
+ */
+
+#include <linux/export.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/pagemap.h>
+#include "internal.h"
+
+/*
+ * Kill all dirty folios in the event of an unrecoverable error, starting with
+ * a locked folio we've already obtained from writeback_iter().
+ */
+static void netfs_kill_dirty_pages(struct address_space *mapping,
+				   struct writeback_control *wbc,
+				   struct folio *folio)
+{
+	int error = 0;
+
+	do {
+		enum netfs_folio_trace why = netfs_folio_trace_kill;
+		struct netfs_group *group = NULL;
+		struct netfs_folio *finfo = NULL;
+		void *priv;
+
+		priv = folio_detach_private(folio);
+		if (priv) {
+			finfo = __netfs_folio_info(priv);
+			if (finfo) {
+				/* Kill folio from streaming write. */
+				group = finfo->netfs_group;
+				why = netfs_folio_trace_kill_s;
+			} else {
+				group = priv;
+				if (group == NETFS_FOLIO_COPY_TO_CACHE) {
+					/* Kill copy-to-cache folio */
+					why = netfs_folio_trace_kill_cc;
+					group = NULL;
+				} else {
+					/* Kill folio with group */
+					why = netfs_folio_trace_kill_g;
+				}
+			}
+		}
+
+		trace_netfs_folio(folio, why);
+
+		folio_start_writeback(folio);
+		folio_unlock(folio);
+		folio_end_writeback(folio);
+
+		netfs_put_group(group);
+		kfree(finfo);
+
+	} while ((folio = writeback_iter(mapping, wbc, folio, &error)));
+}
+
+/*
+ * Create a write request and set it up appropriately for the origin type.
+ */
+struct netfs_io_request *netfs_create_write_req(struct address_space *mapping,
+						struct file *file,
+						loff_t start,
+						enum netfs_io_origin origin)
+{
+	struct netfs_io_request *wreq;
+	struct netfs_inode *ictx;
+
+	wreq = netfs_alloc_request(mapping, file, start, 0, origin);
+	if (IS_ERR(wreq))
+		return wreq;
+
+	_enter("R=%x", wreq->debug_id);
+
+	ictx = netfs_inode(wreq->inode);
+	if (test_bit(NETFS_RREQ_WRITE_TO_CACHE, &wreq->flags))
+		fscache_begin_write_operation(&wreq->cache_resources, netfs_i_cookie(ictx));
+
+	wreq->contiguity = wreq->start;
+	wreq->cleaned_to = wreq->start;
+	INIT_WORK(&wreq->work, netfs_write_collection_worker);
+
+	wreq->io_streams[0].stream_nr		= 0;
+	wreq->io_streams[0].source		= NETFS_UPLOAD_TO_SERVER;
+	wreq->io_streams[0].prepare_write	= ictx->ops->prepare_write;
+	wreq->io_streams[0].issue_write		= ictx->ops->issue_write;
+	wreq->io_streams[0].collected_to	= start;
+	wreq->io_streams[0].transferred		= LONG_MAX;
+
+	wreq->io_streams[1].stream_nr		= 1;
+	wreq->io_streams[1].source		= NETFS_WRITE_TO_CACHE;
+	wreq->io_streams[1].collected_to	= start;
+	wreq->io_streams[1].transferred		= LONG_MAX;
+	if (fscache_resources_valid(&wreq->cache_resources)) {
+		wreq->io_streams[1].avail	= true;
+		wreq->io_streams[1].prepare_write = wreq->cache_resources.ops->prepare_write_subreq;
+		wreq->io_streams[1].issue_write = wreq->cache_resources.ops->issue_write;
+	}
+
+	return wreq;
+}
+
+/**
+ * netfs_prepare_write_failed - Note write preparation failed
+ * @subreq: The subrequest to mark
+ *
+ * Mark a subrequest to note that preparation for write failed.
+ */
+void netfs_prepare_write_failed(struct netfs_io_subrequest *subreq)
+{
+	__set_bit(NETFS_SREQ_FAILED, &subreq->flags);
+	trace_netfs_sreq(subreq, netfs_sreq_trace_prep_failed);
+}
+EXPORT_SYMBOL(netfs_prepare_write_failed);
+
+/*
+ * Prepare a write subrequest.  We need to allocate a new subrequest
+ * if we don't have one.
+ */
+static void netfs_prepare_write(struct netfs_io_request *wreq,
+				struct netfs_io_stream *stream,
+				loff_t start)
+{
+	struct netfs_io_subrequest *subreq;
+
+	subreq = netfs_alloc_subrequest(wreq);
+	subreq->source		= stream->source;
+	subreq->start		= start;
+	subreq->max_len		= ULONG_MAX;
+	subreq->max_nr_segs	= INT_MAX;
+	subreq->stream_nr	= stream->stream_nr;
+
+	_enter("R=%x[%x]", wreq->debug_id, subreq->debug_index);
+
+	trace_netfs_sreq_ref(wreq->debug_id, subreq->debug_index,
+			     refcount_read(&subreq->ref),
+			     netfs_sreq_trace_new);
+
+	trace_netfs_sreq(subreq, netfs_sreq_trace_prepare);
+
+	switch (stream->source) {
+	case NETFS_UPLOAD_TO_SERVER:
+		netfs_stat(&netfs_n_wh_upload);
+		subreq->max_len = wreq->wsize;
+		break;
+	case NETFS_WRITE_TO_CACHE:
+		netfs_stat(&netfs_n_wh_write);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		break;
+	}
+
+	if (stream->prepare_write)
+		stream->prepare_write(subreq);
+
+	__set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
+
+	/* We add to the end of the list whilst the collector may be walking
+	 * the list.  The collector only walks forwards and uses the lock to
+	 * remove entries from the front.
+	 */
+	spin_lock(&wreq->lock);
+	list_add_tail(&subreq->rreq_link, &stream->subrequests);
+	if (list_is_first(&subreq->rreq_link, &stream->subrequests)) {
+		stream->front = subreq;
+		if (!stream->active) {
+			stream->collected_to = stream->front->start;
+			/* Write list pointers before active flag */
+			smp_store_release(&stream->active, true);
+		}
+	}
+
+	spin_unlock(&wreq->lock);
+
+	stream->construct = subreq;
+}
+
+/*
+ * Set the I/O iterator for the filesystem/cache to use and dispatch the I/O
+ * operation.  The operation may be asynchronous and should call
+ * netfs_write_subrequest_terminated() when complete.
+ */
+static void netfs_do_issue_write(struct netfs_io_stream *stream,
+				 struct netfs_io_subrequest *subreq)
+{
+	struct netfs_io_request *wreq = subreq->rreq;
+
+	_enter("R=%x[%x],%zx", wreq->debug_id, subreq->debug_index, subreq->len);
+
+	if (test_bit(NETFS_SREQ_FAILED, &subreq->flags))
+		return netfs_write_subrequest_terminated(subreq, subreq->error, false);
+
+	// TODO: Use encrypted buffer
+	if (test_bit(NETFS_RREQ_USE_IO_ITER, &wreq->flags)) {
+		subreq->io_iter = wreq->io_iter;
+		iov_iter_advance(&subreq->io_iter,
+				 subreq->start + subreq->transferred - wreq->start);
+		iov_iter_truncate(&subreq->io_iter,
+				 subreq->len - subreq->transferred);
+	} else {
+		iov_iter_xarray(&subreq->io_iter, ITER_SOURCE, &wreq->mapping->i_pages,
+				subreq->start + subreq->transferred,
+				subreq->len   - subreq->transferred);
+	}
+
+	trace_netfs_sreq(subreq, netfs_sreq_trace_submit);
+	stream->issue_write(subreq);
+}
+
+void netfs_reissue_write(struct netfs_io_stream *stream,
+			 struct netfs_io_subrequest *subreq)
+{
+	__set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
+	netfs_do_issue_write(stream, subreq);
+}
+
+static void netfs_issue_write(struct netfs_io_request *wreq,
+			      struct netfs_io_stream *stream)
+{
+	struct netfs_io_subrequest *subreq = stream->construct;
+
+	if (!subreq)
+		return;
+	stream->construct = NULL;
+
+	if (subreq->start + subreq->len > wreq->start + wreq->submitted)
+		wreq->len = wreq->submitted = subreq->start + subreq->len - wreq->start;
+	netfs_do_issue_write(stream, subreq);
+}
+
+/*
+ * Add data to the write subrequest, dispatching each as we fill it up or if it
+ * is discontiguous with the previous.  We only fill one part at a time so that
+ * we can avoid overrunning the credits obtained (cifs) and try to parallelise
+ * content-crypto preparation with network writes.
+ */
+int netfs_advance_write(struct netfs_io_request *wreq,
+			struct netfs_io_stream *stream,
+			loff_t start, size_t len, bool to_eof)
+{
+	struct netfs_io_subrequest *subreq = stream->construct;
+	size_t part;
+
+	if (!stream->avail) {
+		_leave("no write");
+		return len;
+	}
+
+	_enter("R=%x[%x]", wreq->debug_id, subreq ? subreq->debug_index : 0);
+
+	if (subreq && start != subreq->start + subreq->len) {
+		netfs_issue_write(wreq, stream);
+		subreq = NULL;
+	}
+
+	if (!stream->construct)
+		netfs_prepare_write(wreq, stream, start);
+	subreq = stream->construct;
+
+	part = min(subreq->max_len - subreq->len, len);
+	_debug("part %zx/%zx %zx/%zx", subreq->len, subreq->max_len, part, len);
+	subreq->len += part;
+	subreq->nr_segs++;
+
+	if (subreq->len >= subreq->max_len ||
+	    subreq->nr_segs >= subreq->max_nr_segs ||
+	    to_eof) {
+		netfs_issue_write(wreq, stream);
+		subreq = NULL;
+	}
+
+	return part;
+}
+
+/*
+ * Write some of a pending folio's data back to the server.
+ */
+static int netfs_write_folio(struct netfs_io_request *wreq,
+			     struct writeback_control *wbc,
+			     struct folio *folio)
+{
+	struct netfs_io_stream *upload = &wreq->io_streams[0];
+	struct netfs_io_stream *cache  = &wreq->io_streams[1];
+	struct netfs_io_stream *stream;
+	struct netfs_group *fgroup; /* TODO: Use this with ceph */
+	struct netfs_folio *finfo;
+	size_t fsize = folio_size(folio), flen = fsize, foff = 0;
+	loff_t fpos = folio_pos(folio);
+	bool to_eof = false, streamw = false;
+	bool debug = false;
+
+	_enter("");
+
+	if (fpos >= wreq->i_size) {
+		/* mmap beyond eof. */
+		_debug("beyond eof");
+		folio_start_writeback(folio);
+		folio_unlock(folio);
+		wreq->nr_group_rel += netfs_folio_written_back(folio);
+		netfs_put_group_many(wreq->group, wreq->nr_group_rel);
+		wreq->nr_group_rel = 0;
+		return 0;
+	}
+
+	fgroup = netfs_folio_group(folio);
+	finfo = netfs_folio_info(folio);
+	if (finfo) {
+		foff = finfo->dirty_offset;
+		flen = foff + finfo->dirty_len;
+		streamw = true;
+	}
+
+	if (wreq->origin == NETFS_WRITETHROUGH) {
+		to_eof = false;
+		if (flen > wreq->i_size - fpos)
+			flen = wreq->i_size - fpos;
+	} else if (flen > wreq->i_size - fpos) {
+		flen = wreq->i_size - fpos;
+		if (!streamw)
+			folio_zero_segment(folio, flen, fsize);
+		to_eof = true;
+	} else if (flen == wreq->i_size - fpos) {
+		to_eof = true;
+	}
+	flen -= foff;
+
+	_debug("folio %zx %zx %zx", foff, flen, fsize);
+
+	/* Deal with discontinuities in the stream of dirty pages.  These can
+	 * arise from a number of sources:
+	 *
+	 * (1) Intervening non-dirty pages from random-access writes, multiple
+	 *     flushers writing back different parts simultaneously and manual
+	 *     syncing.
+	 *
+	 * (2) Partially-written pages from write-streaming.
+	 *
+	 * (3) Pages that belong to a different write-back group (eg.  Ceph
+	 *     snapshots).
+	 *
+	 * (4) Actually-clean pages that were marked for write to the cache
+	 *     when they were read.  Note that these appear as a special
+	 *     write-back group.
+	 */
+	if (fgroup == NETFS_FOLIO_COPY_TO_CACHE) {
+		netfs_issue_write(wreq, upload);
+	} else if (fgroup != wreq->group) {
+		/* We can't write this page to the server yet. */
+		kdebug("wrong group");
+		folio_redirty_for_writepage(wbc, folio);
+		folio_unlock(folio);
+		netfs_issue_write(wreq, upload);
+		netfs_issue_write(wreq, cache);
+		return 0;
+	}
+
+	if (foff > 0)
+		netfs_issue_write(wreq, upload);
+	if (streamw)
+		netfs_issue_write(wreq, cache);
+
+	/* Flip the page to the writeback state and unlock.  If we're called
+	 * from write-through, then the page has already been put into the wb
+	 * state.
+	 */
+	if (wreq->origin == NETFS_WRITEBACK)
+		folio_start_writeback(folio);
+	folio_unlock(folio);
+
+	if (fgroup == NETFS_FOLIO_COPY_TO_CACHE) {
+		if (!fscache_resources_valid(&wreq->cache_resources)) {
+			trace_netfs_folio(folio, netfs_folio_trace_cancel_copy);
+			netfs_issue_write(wreq, upload);
+			netfs_folio_written_back(folio);
+			return 0;
+		}
+		trace_netfs_folio(folio, netfs_folio_trace_store_copy);
+	} else if (!upload->construct) {
+		trace_netfs_folio(folio, netfs_folio_trace_store);
+	} else {
+		trace_netfs_folio(folio, netfs_folio_trace_store_plus);
+	}
+
+	/* Move the submission point forward to allow for write-streaming data
+	 * not starting at the front of the page.  We don't do write-streaming
+	 * with the cache as the cache requires DIO alignment.
+	 *
+	 * Also skip uploading for data that's been read and just needs copying
+	 * to the cache.
+	 */
+	for (int s = 0; s < NR_IO_STREAMS; s++) {
+		stream = &wreq->io_streams[s];
+		stream->submit_max_len = fsize;
+		stream->submit_off = foff;
+		stream->submit_len = flen;
+		if ((stream->source == NETFS_WRITE_TO_CACHE && streamw) ||
+		    (stream->source == NETFS_UPLOAD_TO_SERVER &&
+		     fgroup == NETFS_FOLIO_COPY_TO_CACHE)) {
+			stream->submit_off = UINT_MAX;
+			stream->submit_len = 0;
+			stream->submit_max_len = 0;
+		}
+	}
+
+	/* Attach the folio to one or more subrequests.  For a big folio, we
+	 * could end up with thousands of subrequests if the wsize is small -
+	 * but we might need to wait during the creation of subrequests for
+	 * network resources (eg. SMB credits).
+	 */
+	for (;;) {
+		ssize_t part;
+		size_t lowest_off = ULONG_MAX;
+		int choose_s = -1;
+
+		/* Always add to the lowest-submitted stream first. */
+		for (int s = 0; s < NR_IO_STREAMS; s++) {
+			stream = &wreq->io_streams[s];
+			if (stream->submit_len > 0 &&
+			    stream->submit_off < lowest_off) {
+				lowest_off = stream->submit_off;
+				choose_s = s;
+			}
+		}
+
+		if (choose_s < 0)
+			break;
+		stream = &wreq->io_streams[choose_s];
+
+		part = netfs_advance_write(wreq, stream, fpos + stream->submit_off,
+					   stream->submit_len, to_eof);
+		atomic64_set(&wreq->issued_to, fpos + stream->submit_off);
+		stream->submit_off += part;
+		stream->submit_max_len -= part;
+		if (part > stream->submit_len)
+			stream->submit_len = 0;
+		else
+			stream->submit_len -= part;
+		if (part > 0)
+			debug = true;
+	}
+
+	atomic64_set(&wreq->issued_to, fpos + fsize);
+
+	if (!debug)
+		kdebug("R=%x: No submit", wreq->debug_id);
+
+	if (flen < fsize)
+		for (int s = 0; s < NR_IO_STREAMS; s++)
+			netfs_issue_write(wreq, &wreq->io_streams[s]);
+
+	_leave(" = 0");
+	return 0;
+}
+
+/*
+ * Write some of the pending data back to the server
+ */
+int new_netfs_writepages(struct address_space *mapping,
+			 struct writeback_control *wbc)
+{
+	struct netfs_inode *ictx = netfs_inode(mapping->host);
+	struct netfs_io_request *wreq = NULL;
+	struct folio *folio;
+	int error = 0;
+
+	if (wbc->sync_mode == WB_SYNC_ALL)
+		mutex_lock(&ictx->wb_lock);
+	else if (!mutex_trylock(&ictx->wb_lock))
+		return 0;
+
+	/* Need the first folio to be able to set up the op. */
+	folio = writeback_iter(mapping, wbc, NULL, &error);
+	if (!folio)
+		goto out;
+
+	wreq = netfs_create_write_req(mapping, NULL, folio_pos(folio), NETFS_WRITEBACK);
+	if (IS_ERR(wreq)) {
+		error = PTR_ERR(wreq);
+		goto couldnt_start;
+	}
+
+	trace_netfs_write(wreq, netfs_write_trace_writeback);
+	netfs_stat(&netfs_n_wh_writepages);
+
+	do {
+		_debug("wbiter %lx %llx", folio->index, wreq->start + wreq->submitted);
+
+		/* It appears we don't have to handle cyclic writeback wrapping. */
+		WARN_ON_ONCE(wreq && folio_pos(folio) < wreq->start + wreq->submitted);
+
+		if (netfs_folio_group(folio) != NETFS_FOLIO_COPY_TO_CACHE &&
+		    unlikely(!test_bit(NETFS_RREQ_UPLOAD_TO_SERVER, &wreq->flags))) {
+			set_bit(NETFS_RREQ_UPLOAD_TO_SERVER, &wreq->flags);
+			wreq->netfs_ops->begin_writeback(wreq);
+		}
+
+		error = netfs_write_folio(wreq, wbc, folio);
+		if (error < 0)
+			break;
+	} while ((folio = writeback_iter(mapping, wbc, folio, &error)));
+
+	for (int s = 0; s < NR_IO_STREAMS; s++)
+		netfs_issue_write(wreq, &wreq->io_streams[s]);
+	smp_wmb(); /* Write lists before ALL_QUEUED. */
+	set_bit(NETFS_RREQ_ALL_QUEUED, &wreq->flags);
+
+	mutex_unlock(&ictx->wb_lock);
+
+	netfs_put_request(wreq, false, netfs_rreq_trace_put_return);
+	_leave(" = %d", error);
+	return error;
+
+couldnt_start:
+	netfs_kill_dirty_pages(mapping, wbc, folio);
+out:
+	mutex_unlock(&ictx->wb_lock);
+	_leave(" = %d", error);
+	return error;
+}
+EXPORT_SYMBOL(new_netfs_writepages);
+
+/*
+ * Begin a write operation for writing through the pagecache.
+ */
+struct netfs_io_request *new_netfs_begin_writethrough(struct kiocb *iocb, size_t len)
+{
+	struct netfs_io_request *wreq = NULL;
+	struct netfs_inode *ictx = netfs_inode(file_inode(iocb->ki_filp));
+
+	mutex_lock(&ictx->wb_lock);
+
+	wreq = netfs_create_write_req(iocb->ki_filp->f_mapping, iocb->ki_filp,
+				      iocb->ki_pos, NETFS_WRITETHROUGH);
+	if (IS_ERR(wreq)) {
+		mutex_unlock(&ictx->wb_lock);
+		return wreq;
+	}
+
+	wreq->io_streams[0].avail = true;
+	trace_netfs_write(wreq, netfs_write_trace_writethrough);
+	return wreq;
+}
+
+/*
+ * Advance the state of the write operation used when writing through the
+ * pagecache.  Data has been copied into the pagecache that we need to append
+ * to the request.  If we've added more than wsize then we need to create a new
+ * subrequest.
+ */
+int new_netfs_advance_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
+				   struct folio *folio, size_t copied, bool to_page_end,
+				   struct folio **writethrough_cache)
+{
+	_enter("R=%x ic=%zu ws=%u cp=%zu tp=%u",
+	       wreq->debug_id, wreq->iter.count, wreq->wsize, copied, to_page_end);
+
+	if (!*writethrough_cache) {
+		if (folio_test_dirty(folio))
+			/* Sigh.  mmap. */
+			folio_clear_dirty_for_io(folio);
+
+		/* We can make multiple writes to the folio... */
+		folio_start_writeback(folio);
+		if (wreq->len == 0)
+			trace_netfs_folio(folio, netfs_folio_trace_wthru);
+		else
+			trace_netfs_folio(folio, netfs_folio_trace_wthru_plus);
+		*writethrough_cache = folio;
+	}
+
+	wreq->len += copied;
+	if (!to_page_end)
+		return 0;
+
+	*writethrough_cache = NULL;
+	return netfs_write_folio(wreq, wbc, folio);
+}
+
+/*
+ * End a write operation used when writing through the pagecache.
+ */
+int new_netfs_end_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
+			       struct folio *writethrough_cache)
+{
+	struct netfs_inode *ictx = netfs_inode(wreq->inode);
+	int ret;
+
+	_enter("R=%x", wreq->debug_id);
+
+	if (writethrough_cache)
+		netfs_write_folio(wreq, wbc, writethrough_cache);
+
+	netfs_issue_write(wreq, &wreq->io_streams[0]);
+	netfs_issue_write(wreq, &wreq->io_streams[1]);
+	smp_wmb(); /* Write lists before ALL_QUEUED. */
+	set_bit(NETFS_RREQ_ALL_QUEUED, &wreq->flags);
+
+	mutex_unlock(&ictx->wb_lock);
+
+	ret = wreq->error;
+	netfs_put_request(wreq, false, netfs_rreq_trace_put_return);
+	return ret;
+}
+
+/*
+ * Write data to the server without going through the pagecache and without
+ * writing it to the local cache.
+ */
+int netfs_unbuffered_write(struct netfs_io_request *wreq, bool may_wait, size_t len)
+{
+	struct netfs_io_stream *upload = &wreq->io_streams[0];
+	ssize_t part;
+	loff_t start = wreq->start;
+	int error = 0;
+
+	_enter("%zx", len);
+
+	if (wreq->origin == NETFS_DIO_WRITE)
+		inode_dio_begin(wreq->inode);
+
+	while (len) {
+		// TODO: Prepare content encryption
+
+		_debug("unbuffered %zx", len);
+		part = netfs_advance_write(wreq, upload, start, len, false);
+		start += part;
+		len -= part;
+		if (test_bit(NETFS_RREQ_PAUSE, &wreq->flags)) {
+			trace_netfs_rreq(wreq, netfs_rreq_trace_wait_pause);
+			wait_on_bit(&wreq->flags, NETFS_RREQ_PAUSE, TASK_UNINTERRUPTIBLE);
+		}
+		if (test_bit(NETFS_RREQ_FAILED, &wreq->flags))
+			break;
+	}
+
+	netfs_issue_write(wreq, upload);
+
+	smp_wmb(); /* Write lists before ALL_QUEUED. */
+	set_bit(NETFS_RREQ_ALL_QUEUED, &wreq->flags);
+	if (list_empty(&upload->subrequests))
+		netfs_wake_write_collector(wreq, false);
+
+	_leave(" = %d", error);
+	return error;
+}
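
The splitting rule in netfs_advance_write() above can be modelled with a
standalone userspace program.  The numbers below are invented purely for
illustration (a 1 MiB dirty span against a pretend wsize of 256 KiB yields four
upload subrequests), and the model ignores segment limits and discontiguities;
it only shows that each subrequest is filled with part = min(max_len - len,
remaining) and issued once it reaches max_len or the data runs out:

	#include <stdio.h>
	#include <stddef.h>

	int main(void)
	{
		unsigned long long start = 0x100000;	/* made-up file position */
		size_t remaining = 1024 * 1024;		/* 1 MiB of dirty data */
		size_t max_len = 256 * 1024;		/* pretend wsize = 256 KiB */
		size_t len = 0;				/* fill level of current subreq */
		int subreq = 0;

		while (remaining) {
			size_t part = max_len - len < remaining ?
				max_len - len : remaining;

			len += part;
			remaining -= part;
			if (len >= max_len || !remaining) {
				printf("subreq %d: start=%#llx len=%#zx\n",
				       subreq++, start, len);
				start += len;
				len = 0;
			}
		}
		return 0;
	}
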
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 88269681d4fc..42dba05a428b 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -64,6 +64,7 @@ struct netfs_inode {
 #if IS_ENABLED(CONFIG_FSCACHE)
 	struct fscache_cookie	*cache;
 #endif
+	struct mutex		wb_lock;	/* Writeback serialisation */
 	loff_t			remote_i_size;	/* Size of the remote file */
 	loff_t			zero_point;	/* Size after which we assume there's no data
 						 * on the server */
@@ -71,7 +72,6 @@ struct netfs_inode {
 #define NETFS_ICTX_ODIRECT	0		/* The file has DIO in progress */
 #define NETFS_ICTX_UNBUFFERED	1		/* I/O should not use the pagecache */
 #define NETFS_ICTX_WRITETHROUGH	2		/* Write-through caching */
-#define NETFS_ICTX_NO_WRITE_STREAMING	3	/* Don't engage in write-streaming */
 #define NETFS_ICTX_USE_PGPRIV2	31		/* [DEPRECATED] Use PG_private_2 to mark
 						 * write to cache on read */
 };
@@ -126,6 +126,33 @@ static inline struct netfs_group *netfs_folio_group(struct folio *folio)
 	return priv;
 }
 
+/*
+ * Stream of I/O subrequests going to a particular destination, such as the
+ * server or the local cache.  This is mainly intended for writing where we may
+ * have to write to multiple destinations concurrently.
+ */
+struct netfs_io_stream {
+	/* Submission tracking */
+	struct netfs_io_subrequest *construct;	/* Op being constructed */
+	unsigned int		submit_off;	/* Folio offset we're submitting from */
+	unsigned int		submit_len;	/* Amount of data left to submit */
+	unsigned int		submit_max_len;	/* Amount I/O can be rounded up to */
+	void (*prepare_write)(struct netfs_io_subrequest *subreq);
+	void (*issue_write)(struct netfs_io_subrequest *subreq);
+	/* Collection tracking */
+	struct list_head	subrequests;	/* Contributory I/O operations */
+	struct netfs_io_subrequest *front;	/* Op being collected */
+	unsigned long long	collected_to;	/* Position we've collected results to */
+	size_t			transferred;	/* The amount transferred from this stream */
+	enum netfs_io_source	source;		/* Where to read from/write to */
+	unsigned short		error;		/* Aggregate error for the stream */
+	unsigned char		stream_nr;	/* Index of stream in parent table */
+	bool			avail;		/* T if stream is available */
+	bool			active;		/* T if stream is active */
+	bool			need_retry;	/* T if this stream needs retrying */
+	bool			failed;		/* T if this stream failed */
+};
+
 /*
  * Resources required to do operations on a cache.
  */
@@ -150,13 +177,16 @@ struct netfs_io_subrequest {
 	struct list_head	rreq_link;	/* Link in rreq->subrequests */
 	struct iov_iter		io_iter;	/* Iterator for this subrequest */
 	unsigned long long	start;		/* Where to start the I/O */
+	size_t			max_len;	/* Maximum size of the I/O */
 	size_t			len;		/* Size of the I/O */
 	size_t			transferred;	/* Amount of data transferred */
 	refcount_t		ref;
 	short			error;		/* 0 or error that occurred */
 	unsigned short		debug_index;	/* Index in list (for debugging output) */
+	unsigned int		nr_segs;	/* Number of segs in io_iter */
 	unsigned int		max_nr_segs;	/* 0 or max number of segments in an iterator */
 	enum netfs_io_source	source;		/* Where to read from/write to */
+	unsigned char		stream_nr;	/* I/O stream this belongs to */
 	unsigned long		flags;
 #define NETFS_SREQ_COPY_TO_CACHE	0	/* Set if should copy the data to the cache */
 #define NETFS_SREQ_CLEAR_TAIL		1	/* Set if the rest of the read should be cleared */
@@ -164,6 +194,11 @@ struct netfs_io_subrequest {
 #define NETFS_SREQ_SEEK_DATA_READ	3	/* Set if ->read() should SEEK_DATA first */
 #define NETFS_SREQ_NO_PROGRESS		4	/* Set if we didn't manage to read any data */
 #define NETFS_SREQ_ONDEMAND		5	/* Set if it's from on-demand read mode */
+#define NETFS_SREQ_BOUNDARY		6	/* Set if ends on hard boundary (eg. ceph object) */
+#define NETFS_SREQ_IN_PROGRESS		8	/* Unlocked when the subrequest completes */
+#define NETFS_SREQ_NEED_RETRY		9	/* Set if the filesystem requests a retry */
+#define NETFS_SREQ_RETRYING		10	/* Set if we're retrying */
+#define NETFS_SREQ_FAILED		11	/* Set if the subreq failed unretryably */
 };
 
 enum netfs_io_origin {
@@ -194,6 +229,9 @@ struct netfs_io_request {
 	struct netfs_cache_resources cache_resources;
 	struct list_head	proc_link;	/* Link in netfs_iorequests */
 	struct list_head	subrequests;	/* Contributory I/O operations */
+	struct netfs_io_stream	io_streams[2];	/* Streams of parallel I/O operations */
+#define NR_IO_STREAMS 2 //wreq->nr_io_streams
+	struct netfs_group	*group;		/* Writeback group being written back */
 	struct iov_iter		iter;		/* Unencrypted-side iterator */
 	struct iov_iter		io_iter;	/* I/O (Encrypted-side) iterator */
 	void			*netfs_priv;	/* Private data for the netfs */
@@ -203,6 +241,8 @@ struct netfs_io_request {
 	unsigned int		rsize;		/* Maximum read size (0 for none) */
 	unsigned int		wsize;		/* Maximum write size (0 for none) */
 	atomic_t		subreq_counter;	/* Next subreq->debug_index */
+	unsigned int		nr_group_rel;	/* Number of refs to release on ->group */
+	spinlock_t		lock;		/* Lock for queuing subreqs */
 	atomic_t		nr_outstanding;	/* Number of ops in progress */
 	atomic_t		nr_copy_ops;	/* Number of copy-to-cache ops in progress */
 	size_t			upper_len;	/* Length can be extended to here */
@@ -214,6 +254,10 @@ struct netfs_io_request {
 	bool			direct_bv_unpin; /* T if direct_bv[] must be unpinned */
 	unsigned long long	i_size;		/* Size of the file */
 	unsigned long long	start;		/* Start position */
+	atomic64_t		issued_to;	/* Write issuer folio cursor */
+	unsigned long long	contiguity;	/* Tracking for gaps in the writeback sequence */
+	unsigned long long	collected_to;	/* Point we've collected to */
+	unsigned long long	cleaned_to;	/* Position we've cleaned folios to */
 	pgoff_t			no_unlock_folio; /* Don't unlock this folio after read */
 	refcount_t		ref;
 	unsigned long		flags;
@@ -227,6 +271,9 @@ struct netfs_io_request {
 #define NETFS_RREQ_UPLOAD_TO_SERVER	8	/* Need to write to the server */
 #define NETFS_RREQ_NONBLOCK		9	/* Don't block if possible (O_NONBLOCK) */
 #define NETFS_RREQ_BLOCKED		10	/* We blocked */
+#define NETFS_RREQ_PAUSE		11	/* Pause subrequest generation */
+#define NETFS_RREQ_USE_IO_ITER		12	/* Use ->io_iter rather than ->i_pages */
+#define NETFS_RREQ_ALL_QUEUED		13	/* All subreqs are now queued */
 #define NETFS_RREQ_USE_PGPRIV2		31	/* [DEPRECATED] Use PG_private_2 to mark
 						 * write to cache on read */
 	const struct netfs_request_ops *netfs_ops;
@@ -258,6 +305,9 @@ struct netfs_request_ops {
 	/* Write request handling */
 	void (*create_write_requests)(struct netfs_io_request *wreq,
 				      loff_t start, size_t len);
+	void (*begin_writeback)(struct netfs_io_request *wreq);
+	void (*prepare_write)(struct netfs_io_subrequest *subreq);
+	void (*issue_write)(struct netfs_io_subrequest *subreq);
 	void (*invalidate_cache)(struct netfs_io_request *wreq);
 };
 
@@ -292,6 +342,9 @@ struct netfs_cache_ops {
 		     netfs_io_terminated_t term_func,
 		     void *term_func_priv);
 
+	/* Write data to the cache from a netfs subrequest. */
+	void (*issue_write)(struct netfs_io_subrequest *subreq);
+
 	/* Expand readahead request */
 	void (*expand_readahead)(struct netfs_cache_resources *cres,
 				 unsigned long long *_start,
@@ -304,6 +357,13 @@ struct netfs_cache_ops {
 	enum netfs_io_source (*prepare_read)(struct netfs_io_subrequest *subreq,
 					     unsigned long long i_size);
 
+	/* Prepare a write subrequest, working out if we're allowed to do it
+	 * and finding out the maximum amount of data to gather before
+	 * attempting to submit.  If we're not permitted to do it, the
+	 * subrequest should be marked failed.
+	 */
+	void (*prepare_write_subreq)(struct netfs_io_subrequest *subreq);
+
 	/* Prepare a write operation, working out what part of the write we can
 	 * actually do.
 	 */
@@ -349,6 +409,8 @@ int netfs_write_begin(struct netfs_inode *, struct file *,
 		      struct folio **, void **fsdata);
 int netfs_writepages(struct address_space *mapping,
 		     struct writeback_control *wbc);
+int new_netfs_writepages(struct address_space *mapping,
+			struct writeback_control *wbc);
 bool netfs_dirty_folio(struct address_space *mapping, struct folio *folio);
 int netfs_unpin_writeback(struct inode *inode, struct writeback_control *wbc);
 void netfs_clear_inode_writeback(struct inode *inode, const void *aux);
@@ -372,8 +434,11 @@ size_t netfs_limit_iter(const struct iov_iter *iter, size_t start_offset,
 struct netfs_io_subrequest *netfs_create_write_request(
 	struct netfs_io_request *wreq, enum netfs_io_source dest,
 	loff_t start, size_t len, work_func_t worker);
+void netfs_prepare_write_failed(struct netfs_io_subrequest *subreq);
 void netfs_write_subrequest_terminated(void *_op, ssize_t transferred_or_error,
 				       bool was_async);
+void new_netfs_write_subrequest_terminated(void *_op, ssize_t transferred_or_error,
+					   bool was_async);
 void netfs_queue_write_request(struct netfs_io_subrequest *subreq);
 
 int netfs_start_io_read(struct inode *inode);
@@ -415,6 +480,7 @@ static inline void netfs_inode_init(struct netfs_inode *ctx,
 #if IS_ENABLED(CONFIG_FSCACHE)
 	ctx->cache = NULL;
 #endif
+	mutex_init(&ctx->wb_lock);
 	/* ->releasepage() drives zero_point */
 	if (use_zero_point) {
 		ctx->zero_point = ctx->remote_i_size;
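
The new netfs_cache_ops hooks above carry the contract that a backend which
cannot service a write must mark the subrequest failed with
netfs_prepare_write_failed().  A rough sketch of what a cache backend's hooks
might look like, under that contract, is below; mycache_may_write(),
MYCACHE_MAX_IO and mycache_submit() are placeholders only, and the real
cachefiles implementation arrives in a later patch of this series:

	/* Hypothetical cache backend hooks; all mycache_* names are invented. */
	static void mycache_prepare_write_subreq(struct netfs_io_subrequest *subreq)
	{
		if (!mycache_may_write(subreq->rreq))
			return netfs_prepare_write_failed(subreq);

		/* Cap how much data the library may gather into this subrequest. */
		subreq->max_len = min_t(size_t, subreq->max_len, MYCACHE_MAX_IO);
	}

	static void mycache_issue_write(struct netfs_io_subrequest *subreq)
	{
		/* Must end up calling netfs_write_subrequest_terminated(). */
		mycache_submit(subreq);
	}
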
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index 7126d2ea459c..e7700172ae7e 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -44,14 +44,18 @@
 #define netfs_rreq_traces					\
 	EM(netfs_rreq_trace_assess,		"ASSESS ")	\
 	EM(netfs_rreq_trace_copy,		"COPY   ")	\
+	EM(netfs_rreq_trace_collect,		"COLLECT")	\
 	EM(netfs_rreq_trace_done,		"DONE   ")	\
 	EM(netfs_rreq_trace_free,		"FREE   ")	\
 	EM(netfs_rreq_trace_redirty,		"REDIRTY")	\
 	EM(netfs_rreq_trace_resubmit,		"RESUBMT")	\
+	EM(netfs_rreq_trace_set_pause,		"PAUSE  ")	\
 	EM(netfs_rreq_trace_unlock,		"UNLOCK ")	\
 	EM(netfs_rreq_trace_unmark,		"UNMARK ")	\
 	EM(netfs_rreq_trace_wait_ip,		"WAIT-IP")	\
+	EM(netfs_rreq_trace_wait_pause,		"WT-PAUS")	\
 	EM(netfs_rreq_trace_wake_ip,		"WAKE-IP")	\
+	EM(netfs_rreq_trace_unpause,		"UNPAUSE")	\
 	E_(netfs_rreq_trace_write_done,		"WR-DONE")
 
 #define netfs_sreq_sources					\
@@ -64,11 +68,15 @@
 	E_(NETFS_INVALID_WRITE,			"INVL")
 
 #define netfs_sreq_traces					\
+	EM(netfs_sreq_trace_discard,		"DSCRD")	\
 	EM(netfs_sreq_trace_download_instead,	"RDOWN")	\
+	EM(netfs_sreq_trace_fail,		"FAIL ")	\
 	EM(netfs_sreq_trace_free,		"FREE ")	\
 	EM(netfs_sreq_trace_limited,		"LIMIT")	\
 	EM(netfs_sreq_trace_prepare,		"PREP ")	\
+	EM(netfs_sreq_trace_prep_failed,	"PRPFL")	\
 	EM(netfs_sreq_trace_resubmit_short,	"SHORT")	\
+	EM(netfs_sreq_trace_retry,		"RETRY")	\
 	EM(netfs_sreq_trace_submit,		"SUBMT")	\
 	EM(netfs_sreq_trace_terminated,		"TERM ")	\
 	EM(netfs_sreq_trace_write,		"WRITE")	\
@@ -88,6 +96,7 @@
 #define netfs_rreq_ref_traces					\
 	EM(netfs_rreq_trace_get_for_outstanding,"GET OUTSTND")	\
 	EM(netfs_rreq_trace_get_subreq,		"GET SUBREQ ")	\
+	EM(netfs_rreq_trace_get_work,		"GET WORK   ")	\
 	EM(netfs_rreq_trace_put_complete,	"PUT COMPLT ")	\
 	EM(netfs_rreq_trace_put_discard,	"PUT DISCARD")	\
 	EM(netfs_rreq_trace_put_failed,		"PUT FAILED ")	\
@@ -95,6 +104,8 @@
 	EM(netfs_rreq_trace_put_return,		"PUT RETURN ")	\
 	EM(netfs_rreq_trace_put_subreq,		"PUT SUBREQ ")	\
 	EM(netfs_rreq_trace_put_work,		"PUT WORK   ")	\
+	EM(netfs_rreq_trace_put_work_complete,	"PUT WORK CP")	\
+	EM(netfs_rreq_trace_put_work_nq,	"PUT WORK NQ")	\
 	EM(netfs_rreq_trace_see_work,		"SEE WORK   ")	\
 	E_(netfs_rreq_trace_new,		"NEW        ")
 
@@ -103,11 +114,14 @@
 	EM(netfs_sreq_trace_get_resubmit,	"GET RESUBMIT")	\
 	EM(netfs_sreq_trace_get_short_read,	"GET SHORTRD")	\
 	EM(netfs_sreq_trace_new,		"NEW        ")	\
+	EM(netfs_sreq_trace_put_cancel,		"PUT CANCEL ")	\
 	EM(netfs_sreq_trace_put_clear,		"PUT CLEAR  ")	\
 	EM(netfs_sreq_trace_put_discard,	"PUT DISCARD")	\
+	EM(netfs_sreq_trace_put_done,		"PUT DONE   ")	\
 	EM(netfs_sreq_trace_put_failed,		"PUT FAILED ")	\
 	EM(netfs_sreq_trace_put_merged,		"PUT MERGED ")	\
 	EM(netfs_sreq_trace_put_no_copy,	"PUT NO COPY")	\
+	EM(netfs_sreq_trace_put_oom,		"PUT OOM    ")	\
 	EM(netfs_sreq_trace_put_wip,		"PUT WIP    ")	\
 	EM(netfs_sreq_trace_put_work,		"PUT WORK   ")	\
 	E_(netfs_sreq_trace_put_terminated,	"PUT TERM   ")
@@ -124,7 +138,9 @@
 	EM(netfs_streaming_filled_page,		"mod-streamw-f") \
 	EM(netfs_streaming_cont_filled_page,	"mod-streamw-f+") \
 	/* The rest are for writeback */			\
+	EM(netfs_folio_trace_cancel_copy,	"cancel-copy")	\
 	EM(netfs_folio_trace_clear,		"clear")	\
+	EM(netfs_folio_trace_clear_cc,		"clear-cc")	\
 	EM(netfs_folio_trace_clear_s,		"clear-s")	\
 	EM(netfs_folio_trace_clear_g,		"clear-g")	\
 	EM(netfs_folio_trace_copy,		"copy")		\
@@ -133,16 +149,26 @@
 	EM(netfs_folio_trace_end_copy,		"end-copy")	\
 	EM(netfs_folio_trace_filled_gaps,	"filled-gaps")	\
 	EM(netfs_folio_trace_kill,		"kill")		\
+	EM(netfs_folio_trace_kill_cc,		"kill-cc")	\
+	EM(netfs_folio_trace_kill_g,		"kill-g")	\
+	EM(netfs_folio_trace_kill_s,		"kill-s")	\
 	EM(netfs_folio_trace_mkwrite,		"mkwrite")	\
 	EM(netfs_folio_trace_mkwrite_plus,	"mkwrite+")	\
+	EM(netfs_folio_trace_not_under_wback,	"!wback")	\
 	EM(netfs_folio_trace_read_gaps,		"read-gaps")	\
 	EM(netfs_folio_trace_redirty,		"redirty")	\
 	EM(netfs_folio_trace_redirtied,		"redirtied")	\
 	EM(netfs_folio_trace_store,		"store")	\
+	EM(netfs_folio_trace_store_copy,	"store-copy")	\
 	EM(netfs_folio_trace_store_plus,	"store+")	\
 	EM(netfs_folio_trace_wthru,		"wthru")	\
 	E_(netfs_folio_trace_wthru_plus,	"wthru+")
 
+#define netfs_collect_contig_traces				\
+	EM(netfs_contig_trace_collect,		"Collect")	\
+	EM(netfs_contig_trace_jump,		"-->JUMP-->")	\
+	E_(netfs_contig_trace_unlock,		"Unlock")
+
 #ifndef __NETFS_DECLARE_TRACE_ENUMS_ONCE_ONLY
 #define __NETFS_DECLARE_TRACE_ENUMS_ONCE_ONLY
 
@@ -159,6 +185,7 @@ enum netfs_failure { netfs_failures } __mode(byte);
 enum netfs_rreq_ref_trace { netfs_rreq_ref_traces } __mode(byte);
 enum netfs_sreq_ref_trace { netfs_sreq_ref_traces } __mode(byte);
 enum netfs_folio_trace { netfs_folio_traces } __mode(byte);
+enum netfs_collect_contig_trace { netfs_collect_contig_traces } __mode(byte);
 
 #endif
 
@@ -180,6 +207,7 @@ netfs_failures;
 netfs_rreq_ref_traces;
 netfs_sreq_ref_traces;
 netfs_folio_traces;
+netfs_collect_contig_traces;
 
 /*
  * Now redefine the EM() and E_() macros to map the enums to the strings that
@@ -413,16 +441,18 @@ TRACE_EVENT(netfs_write_iter,
 		    __field(unsigned long long,		start		)
 		    __field(size_t,			len		)
 		    __field(unsigned int,		flags		)
+		    __field(unsigned int,		ino		)
 			     ),
 
 	    TP_fast_assign(
 		    __entry->start	= iocb->ki_pos;
 		    __entry->len	= iov_iter_count(from);
+		    __entry->ino	= iocb->ki_filp->f_inode->i_ino;
 		    __entry->flags	= iocb->ki_flags;
 			   ),
 
-	    TP_printk("WRITE-ITER s=%llx l=%zx f=%x",
-		      __entry->start, __entry->len, __entry->flags)
+	    TP_printk("WRITE-ITER i=%x s=%llx l=%zx f=%x",
+		      __entry->ino, __entry->start, __entry->len, __entry->flags)
 	    );
 
 TRACE_EVENT(netfs_write,
@@ -434,6 +464,7 @@ TRACE_EVENT(netfs_write,
 	    TP_STRUCT__entry(
 		    __field(unsigned int,		wreq		)
 		    __field(unsigned int,		cookie		)
+		    __field(unsigned int,		ino		)
 		    __field(enum netfs_write_trace,	what		)
 		    __field(unsigned long long,		start		)
 		    __field(unsigned long long,		len		)
@@ -444,18 +475,213 @@ TRACE_EVENT(netfs_write,
 		    struct fscache_cookie *__cookie = netfs_i_cookie(__ctx);
 		    __entry->wreq	= wreq->debug_id;
 		    __entry->cookie	= __cookie ? __cookie->debug_id : 0;
+		    __entry->ino	= wreq->inode->i_ino;
 		    __entry->what	= what;
 		    __entry->start	= wreq->start;
 		    __entry->len	= wreq->len;
 			   ),
 
-	    TP_printk("R=%08x %s c=%08x by=%llx-%llx",
+	    TP_printk("R=%08x %s c=%08x i=%x by=%llx-%llx",
 		      __entry->wreq,
 		      __print_symbolic(__entry->what, netfs_write_traces),
 		      __entry->cookie,
+		      __entry->ino,
 		      __entry->start, __entry->start + __entry->len - 1)
 	    );
 
+TRACE_EVENT(netfs_collect,
+	    TP_PROTO(const struct netfs_io_request *wreq),
+
+	    TP_ARGS(wreq),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,		wreq		)
+		    __field(unsigned int,		len		)
+		    __field(unsigned long long,		transferred	)
+		    __field(unsigned long long,		start		)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->wreq	= wreq->debug_id;
+		    __entry->start	= wreq->start;
+		    __entry->len	= wreq->len;
+		    __entry->transferred = wreq->transferred;
+			   ),
+
+	    TP_printk("R=%08x s=%llx-%llx",
+		      __entry->wreq,
+		      __entry->start + __entry->transferred,
+		      __entry->start + __entry->len)
+	    );
+
+TRACE_EVENT(netfs_collect_contig,
+	    TP_PROTO(const struct netfs_io_request *wreq, unsigned long long to,
+		     enum netfs_collect_contig_trace type),
+
+	    TP_ARGS(wreq, to, type),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,		wreq)
+		    __field(enum netfs_collect_contig_trace, type)
+		    __field(unsigned long long,		contiguity)
+		    __field(unsigned long long,		to)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->wreq	= wreq->debug_id;
+		    __entry->type	= type;
+		    __entry->contiguity	= wreq->contiguity;
+		    __entry->to		= to;
+			   ),
+
+	    TP_printk("R=%08x %llx -> %llx %s",
+		      __entry->wreq,
+		      __entry->contiguity,
+		      __entry->to,
+		      __print_symbolic(__entry->type, netfs_collect_contig_traces))
+	    );
+
+TRACE_EVENT(netfs_collect_sreq,
+	    TP_PROTO(const struct netfs_io_request *wreq,
+		     const struct netfs_io_subrequest *subreq),
+
+	    TP_ARGS(wreq, subreq),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,		wreq		)
+		    __field(unsigned int,		subreq		)
+		    __field(unsigned int,		stream		)
+		    __field(unsigned int,		len		)
+		    __field(unsigned int,		transferred	)
+		    __field(unsigned long long,		start		)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->wreq	= wreq->debug_id;
+		    __entry->subreq	= subreq->debug_index;
+		    __entry->stream	= subreq->stream_nr;
+		    __entry->start	= subreq->start;
+		    __entry->len	= subreq->len;
+		    __entry->transferred = subreq->transferred;
+			   ),
+
+	    TP_printk("R=%08x[%u:%02x] s=%llx t=%x/%x",
+		      __entry->wreq, __entry->stream, __entry->subreq,
+		      __entry->start, __entry->transferred, __entry->len)
+	    );
+
+TRACE_EVENT(netfs_collect_folio,
+	    TP_PROTO(const struct netfs_io_request *wreq,
+		     const struct folio *folio,
+		     unsigned long long fend,
+		     unsigned long long collected_to),
+
+	    TP_ARGS(wreq, folio, fend, collected_to),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,	wreq		)
+		    __field(unsigned long,	index		)
+		    __field(unsigned long long,	fend		)
+		    __field(unsigned long long,	cleaned_to	)
+		    __field(unsigned long long,	collected_to	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->wreq	= wreq->debug_id;
+		    __entry->index	= folio->index;
+		    __entry->fend	= fend;
+		    __entry->cleaned_to	= wreq->cleaned_to;
+		    __entry->collected_to = collected_to;
+			   ),
+
+	    TP_printk("R=%08x ix=%05lx r=%llx-%llx t=%llx/%llx",
+		      __entry->wreq, __entry->index,
+		      (unsigned long long)__entry->index * PAGE_SIZE, __entry->fend,
+		      __entry->cleaned_to, __entry->collected_to)
+	    );
+
+TRACE_EVENT(netfs_collect_state,
+	    TP_PROTO(const struct netfs_io_request *wreq,
+		     unsigned long long collected_to,
+		     unsigned int notes),
+
+	    TP_ARGS(wreq, collected_to, notes),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,	wreq		)
+		    __field(unsigned int,	notes		)
+		    __field(unsigned long long,	collected_to	)
+		    __field(unsigned long long,	cleaned_to	)
+		    __field(unsigned long long,	contiguity	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->wreq	= wreq->debug_id;
+		    __entry->notes	= notes;
+		    __entry->collected_to = collected_to;
+		    __entry->cleaned_to	= wreq->cleaned_to;
+		    __entry->contiguity = wreq->contiguity;
+			   ),
+
+	    TP_printk("R=%08x cto=%llx fto=%llx ctg=%llx n=%x",
+		      __entry->wreq, __entry->collected_to,
+		      __entry->cleaned_to, __entry->contiguity,
+		      __entry->notes)
+	    );
+
+TRACE_EVENT(netfs_collect_gap,
+	    TP_PROTO(const struct netfs_io_request *wreq,
+		     const struct netfs_io_stream *stream,
+		     unsigned long long jump_to, char type),
+
+	    TP_ARGS(wreq, stream, jump_to, type),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,	wreq)
+		    __field(unsigned char,	stream)
+		    __field(unsigned char,	type)
+		    __field(unsigned long long,	from)
+		    __field(unsigned long long,	to)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->wreq	= wreq->debug_id;
+		    __entry->stream	= stream->stream_nr;
+		    __entry->from	= stream->collected_to;
+		    __entry->to		= jump_to;
+		    __entry->type	= type;
+			   ),
+
+	    TP_printk("R=%08x[%x:] %llx->%llx %c",
+		      __entry->wreq, __entry->stream,
+		      __entry->from, __entry->to, __entry->type)
+	    );
+
+TRACE_EVENT(netfs_collect_stream,
+	    TP_PROTO(const struct netfs_io_request *wreq,
+		     const struct netfs_io_stream *stream),
+
+	    TP_ARGS(wreq, stream),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,	wreq)
+		    __field(unsigned char,	stream)
+		    __field(unsigned long long,	collected_to)
+		    __field(unsigned long long,	front)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->wreq	= wreq->debug_id;
+		    __entry->stream	= stream->stream_nr;
+		    __entry->collected_to = stream->collected_to;
+		    __entry->front	= stream->front ? stream->front->start : UINT_MAX;
+			   ),
+
+	    TP_printk("R=%08x[%x:] cto=%llx frn=%llx",
+		      __entry->wreq, __entry->stream,
+		      __entry->collected_to, __entry->front)
+	    );
+
 #undef EM
 #undef E_
 #endif /* _TRACE_NETFS_H */


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 20/26] netfs, afs: Implement helpers for new write code
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (18 preceding siblings ...)
  2024-03-28 16:34 ` [PATCH 19/26] netfs: New writeback implementation David Howells
@ 2024-03-28 16:34 ` David Howells
  2024-03-28 16:34 ` [PATCH 21/26] netfs, 9p: " David Howells
                   ` (11 subsequent siblings)
  31 siblings, 0 replies; 62+ messages in thread
From: David Howells @ 2024-03-28 16:34 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: Paulo Alcantara, Tom Talpey, Shyam Prasad N, linux-cifs,
	linux-fsdevel, v9fs, ceph-devel, linux-nfs, Matthew Wilcox,
	linux-kernel, David Howells, Steve French, linux-cachefs,
	linux-mm, netdev, Marc Dionne, netfs, Ilya Dryomov,
	Eric Van Hensbergen, linux-erofs, linux-afs

Implement the helpers for the new write code in afs.  There's now an
optional ->prepare_write() that allows the filesystem to set the parameters
for the next write, such as maximum size and maximum segment count, and an
->issue_write() that is called to initiate an (asynchronous) write
operation.
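
For orientation, the generic shape of that wiring looks like the sketch below;
myfs_* is a made-up prefix and the afs hunks that follow are the authoritative
version:

	const struct netfs_request_ops myfs_req_ops = {
		/* ...existing read-side ops... */
		.begin_writeback	= myfs_begin_writeback,	/* mark the upload stream available */
		.prepare_write		= myfs_prepare_write,	/* cap subreq->max_len / max_nr_segs */
		.issue_write		= myfs_issue_write,	/* send subreq->io_iter, then report completion */
	};
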

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-afs@lists.infradead.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/afs/file.c     |  3 +++
 fs/afs/internal.h |  3 +++
 fs/afs/write.c    | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 52 insertions(+)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index dfd8f60f5e1f..db9ebae84fa2 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -400,6 +400,9 @@ const struct netfs_request_ops afs_req_ops = {
 	.update_i_size		= afs_update_i_size,
 	.invalidate_cache	= afs_netfs_invalidate_cache,
 	.create_write_requests	= afs_create_write_requests,
+	.begin_writeback	= afs_begin_writeback,
+	.prepare_write		= afs_prepare_write,
+	.issue_write		= afs_issue_write,
 };
 
 static void afs_add_open_mmap(struct afs_vnode *vnode)
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index b93aa026daa4..dcf0ae0323d3 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -1598,6 +1598,9 @@ extern int afs_check_volume_status(struct afs_volume *, struct afs_operation *);
 /*
  * write.c
  */
+void afs_prepare_write(struct netfs_io_subrequest *subreq);
+void afs_issue_write(struct netfs_io_subrequest *subreq);
+void afs_begin_writeback(struct netfs_io_request *wreq);
 extern int afs_writepages(struct address_space *, struct writeback_control *);
 extern int afs_fsync(struct file *, loff_t, loff_t, int);
 extern vm_fault_t afs_page_mkwrite(struct vm_fault *vmf);
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 1bc26466eb72..89b073881cac 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -194,6 +194,52 @@ void afs_create_write_requests(struct netfs_io_request *wreq, loff_t start, size
 		netfs_queue_write_request(subreq);
 }
 
+/*
+ * Writeback calls this when it finds a folio that needs uploading.  This isn't
+ * called if writeback only has copy-to-cache to deal with.
+ */
+void afs_begin_writeback(struct netfs_io_request *wreq)
+{
+	wreq->io_streams[0].avail = true;
+}
+
+/*
+ * Prepare a subrequest to write to the server.  This sets the max_len
+ * parameter.
+ */
+void afs_prepare_write(struct netfs_io_subrequest *subreq)
+{
+	//if (test_bit(NETFS_SREQ_RETRYING, &subreq->flags))
+	//	subreq->max_len = 512 * 1024;
+	//else
+	subreq->max_len = 256 * 1024 * 1024;
+}
+
+/*
+ * Issue a subrequest to write to the server.
+ */
+void afs_issue_write(struct netfs_io_subrequest *subreq)
+{
+	struct afs_vnode *vnode = AFS_FS_I(subreq->rreq->inode);
+	ssize_t ret;
+
+	_enter("%x[%x],%zx",
+	       subreq->rreq->debug_id, subreq->debug_index, subreq->io_iter.count);
+
+#if 0 // Error injection
+	if (subreq->debug_index == 3)
+		return netfs_write_subrequest_terminated(subreq, -ENOANO, false);
+
+	if (!test_bit(NETFS_SREQ_RETRYING, &subreq->flags)) {
+		set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
+		return netfs_write_subrequest_terminated(subreq, -EAGAIN, false);
+	}
+#endif
+
+	ret = afs_store_data(vnode, &subreq->io_iter, subreq->start);
+	netfs_write_subrequest_terminated(subreq, ret < 0 ? ret : subreq->len, false);
+}
+
 /*
  * write some of the pending data back to the server
  */


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 21/26] netfs, 9p: Implement helpers for new write code
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (19 preceding siblings ...)
  2024-03-28 16:34 ` [PATCH 20/26] netfs, afs: Implement helpers for new write code David Howells
@ 2024-03-28 16:34 ` David Howells
  2024-03-28 16:34 ` [PATCH 22/26] netfs, cachefiles: " David Howells
                   ` (10 subsequent siblings)
  31 siblings, 0 replies; 62+ messages in thread
From: David Howells @ 2024-03-28 16:34 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: Latchesar Ionkov, Christian Schoenebeck, David Howells, linux-mm,
	Marc Dionne, linux-afs, Paulo Alcantara, linux-cifs,
	Matthew Wilcox, Steve French, linux-cachefs, Ilya Dryomov,
	Shyam Prasad N, Tom Talpey, ceph-devel, Eric Van Hensbergen,
	linux-nfs, netdev, v9fs, linux-kernel, linux-fsdevel, netfs,
	linux-erofs

Implement the helpers for the new write code in 9p.  There's now an
optional ->prepare_write() that allows the filesystem to set the parameters
for the next write, such as maximum size and maximum segment count, and an
->issue_write() that is called to initiate an (asynchronous) write
operation.
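
One connection that is easy to miss from the diff alone: the wsize computed
here from the negotiated msize (and iounit, if set) is what netfs_prepare_write()
in patch 19 copies into subreq->max_len for the upload stream, so each write
RPC is automatically kept within the transport limit.  Schematically (this just
restates lines from this patch and patch 19, with no new identifiers):

	/* In v9fs_begin_writeback() (this patch): */
	wreq->wsize = fid->clnt->msize - P9_IOHDRSZ;
	if (fid->iounit)
		wreq->wsize = min(wreq->wsize, fid->iounit);

	/* In netfs_prepare_write() (patch 19), for NETFS_UPLOAD_TO_SERVER: */
	subreq->max_len = wreq->wsize;
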

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: v9fs@lists.linux.dev
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/9p/vfs_addr.c        | 48 ++++++++++++++++++++++++++++++++++++++++
 include/net/9p/client.h |  2 ++
 net/9p/Kconfig          |  1 +
 net/9p/client.c         | 49 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 100 insertions(+)

diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 5a943c122d83..07d03efdd594 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -26,6 +26,40 @@
 #include "cache.h"
 #include "fid.h"
 
+/*
+ * Writeback calls this when it finds a folio that needs uploading.  This isn't
+ * called if writeback only has copy-to-cache to deal with.
+ */
+static void v9fs_begin_writeback(struct netfs_io_request *wreq)
+{
+	struct p9_fid *fid;
+
+	fid = v9fs_fid_find_inode(wreq->inode, true, INVALID_UID, true);
+	if (!fid) {
+		WARN_ONCE(1, "folio expected an open fid inode->i_ino=%lx\n",
+			  wreq->inode->i_ino);
+		return;
+	}
+
+	wreq->wsize = fid->clnt->msize - P9_IOHDRSZ;
+	if (fid->iounit)
+		wreq->wsize = min(wreq->wsize, fid->iounit);
+	wreq->netfs_priv = fid;
+	wreq->io_streams[0].avail = true;
+}
+
+/*
+ * Issue a subrequest to write to the server.
+ */
+static void v9fs_issue_write(struct netfs_io_subrequest *subreq)
+{
+	struct p9_fid *fid = subreq->rreq->netfs_priv;
+	int err, len;
+
+	len = p9_client_write(fid, subreq->start, &subreq->io_iter, &err);
+	netfs_write_subrequest_terminated(subreq, len ?: err, false);
+}
+
 static void v9fs_upload_to_server(struct netfs_io_subrequest *subreq)
 {
 	struct p9_fid *fid = subreq->rreq->netfs_priv;
@@ -92,6 +126,14 @@ static int v9fs_init_request(struct netfs_io_request *rreq, struct file *file)
 			rreq->origin == NETFS_UNBUFFERED_WRITE ||
 			rreq->origin == NETFS_DIO_WRITE);
 
+#if 0 // TODO: Cut over
+	if (rreq->origin == NETFS_WRITEBACK)
+		return 0; /* We don't get the write handle until we find we
+			   * have actually dirty data and not just
+			   * copy-to-cache data.
+			   */
+#endif
+
 	if (file) {
 		fid = file->private_data;
 		if (!fid)
@@ -103,6 +145,10 @@ static int v9fs_init_request(struct netfs_io_request *rreq, struct file *file)
 			goto no_fid;
 	}
 
+	rreq->wsize = fid->clnt->msize - P9_IOHDRSZ;
+	if (fid->iounit)
+		rreq->wsize = min(rreq->wsize, fid->iounit);
+
 	/* we might need to read from a fid that was opened write-only
 	 * for read-modify-write of page cache, use the writeback fid
 	 * for that */
@@ -131,6 +177,8 @@ const struct netfs_request_ops v9fs_req_ops = {
 	.init_request		= v9fs_init_request,
 	.free_request		= v9fs_free_request,
 	.issue_read		= v9fs_issue_read,
+	.begin_writeback	= v9fs_begin_writeback,
+	.issue_write		= v9fs_issue_write,
 	.create_write_requests	= v9fs_create_write_requests,
 };
 
diff --git a/include/net/9p/client.h b/include/net/9p/client.h
index 78ebcf782ce5..4f785098c67a 100644
--- a/include/net/9p/client.h
+++ b/include/net/9p/client.h
@@ -207,6 +207,8 @@ int p9_client_read(struct p9_fid *fid, u64 offset, struct iov_iter *to, int *err
 int p9_client_read_once(struct p9_fid *fid, u64 offset, struct iov_iter *to,
 		int *err);
 int p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err);
+struct netfs_io_subrequest;
+void p9_client_write_subreq(struct netfs_io_subrequest *subreq);
 int p9_client_readdir(struct p9_fid *fid, char *data, u32 count, u64 offset);
 int p9dirent_read(struct p9_client *clnt, char *buf, int len,
 		  struct p9_dirent *dirent);
diff --git a/net/9p/Kconfig b/net/9p/Kconfig
index 00ebce9e5a65..bcdab9c23b40 100644
--- a/net/9p/Kconfig
+++ b/net/9p/Kconfig
@@ -5,6 +5,7 @@
 
 menuconfig NET_9P
 	tristate "Plan 9 Resource Sharing Support (9P2000)"
+	select NETFS_SUPPORT
 	help
 	  If you say Y here, you will get experimental support for
 	  Plan 9 resource sharing via the 9P2000 protocol.
diff --git a/net/9p/client.c b/net/9p/client.c
index e265a0ca6bdd..844aca4fe4d8 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -18,6 +18,7 @@
 #include <linux/sched/signal.h>
 #include <linux/uaccess.h>
 #include <linux/uio.h>
+#include <linux/netfs.h>
 #include <net/9p/9p.h>
 #include <linux/parser.h>
 #include <linux/seq_file.h>
@@ -1661,6 +1662,54 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
 }
 EXPORT_SYMBOL(p9_client_write);
 
+void
+p9_client_write_subreq(struct netfs_io_subrequest *subreq)
+{
+	struct netfs_io_request *wreq = subreq->rreq;
+	struct p9_fid *fid = wreq->netfs_priv;
+	struct p9_client *clnt = fid->clnt;
+	struct p9_req_t *req;
+	unsigned long long start = subreq->start + subreq->transferred;
+	size_t len = subreq->len - subreq->transferred;
+	int written, err;
+
+	p9_debug(P9_DEBUG_9P, ">>> TWRITE fid %d offset %llu len %zd\n",
+		 fid->fid, start, len);
+
+	/* Don't bother zerocopy for small IO (< 1024) */
+	if (clnt->trans_mod->zc_request && len > 1024) {
+		req = p9_client_zc_rpc(clnt, P9_TWRITE, NULL, &subreq->io_iter,
+				       0, wreq->len, P9_ZC_HDR_SZ, "dqd",
+				       fid->fid, start, len);
+	} else {
+		req = p9_client_rpc(clnt, P9_TWRITE, "dqV", fid->fid,
+				    start, len, &subreq->io_iter);
+	}
+	if (IS_ERR(req)) {
+		netfs_write_subrequest_terminated(subreq, PTR_ERR(req), false);
+		return;
+	}
+
+	err = p9pdu_readf(&req->rc, clnt->proto_version, "d", &written);
+	if (err) {
+		trace_9p_protocol_dump(clnt, &req->rc);
+		p9_req_put(clnt, req);
+		netfs_write_subrequest_terminated(subreq, err, false);
+		return;
+	}
+
+	if (written > len) {
+		pr_err("bogus RWRITE count (%d > %lu)\n", written, len);
+		written = len;
+	}
+
+	p9_debug(P9_DEBUG_9P, "<<< RWRITE count %zd\n", len);
+
+	p9_req_put(clnt, req);
+	netfs_write_subrequest_terminated(subreq, written, false);
+}
+EXPORT_SYMBOL(p9_client_write_subreq);
+
 struct p9_wstat *p9_client_stat(struct p9_fid *fid)
 {
 	int err;


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 22/26] netfs, cachefiles: Implement helpers for new write code
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (20 preceding siblings ...)
  2024-03-28 16:34 ` [PATCH 21/26] netfs, 9p: " David Howells
@ 2024-03-28 16:34 ` David Howells
  2024-03-28 16:34 ` [PATCH 23/26] netfs: Cut over to using new writeback code David Howells
                   ` (9 subsequent siblings)
  31 siblings, 0 replies; 62+ messages in thread
From: David Howells @ 2024-03-28 16:34 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: Paulo Alcantara, Tom Talpey, Shyam Prasad N, linux-cifs,
	linux-fsdevel, v9fs, ceph-devel, linux-nfs, Matthew Wilcox,
	linux-kernel, David Howells, Steve French, linux-cachefs,
	linux-mm, netdev, Marc Dionne, netfs, Ilya Dryomov,
	Eric Van Hensbergen, linux-erofs, linux-afs

Implement the helpers for the new write code in cachefiles.  There's now
an optional ->prepare_write_subreq() that allows the cache to set the
parameters for the next write, such as the maximum size and maximum
segment count, and an ->issue_write() that is called to initiate an
(asynchronous) write to the cache.
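
Stripped of the DIO alignment handling, the two hooks added below boil
down to the following (a condensed sketch for orientation, not a drop-in
replacement; the example_* names are placeholders, while the limits and
helpers are the ones used by cachefiles):

/* Cap the subrequest before it is issued; fail the preparation if the
 * cache file cannot (yet) be written to.
 */
static void example_prepare_write_subreq(struct netfs_io_subrequest *subreq)
{
	struct netfs_cache_resources *cres = &subreq->rreq->cache_resources;

	subreq->max_len = ULONG_MAX;		/* no length cap */
	subreq->max_nr_segs = BIO_MAX_VECS;	/* per-bio segment cap */

	if (!fscache_wait_for_operation(cres, FSCACHE_WANT_WRITE))
		netfs_prepare_write_failed(subreq);
}

/* Start the asynchronous write; completion is reported by passing
 * netfs_write_subrequest_terminated() as the termination callback.
 */
static void example_issue_write_to_cache(struct netfs_io_subrequest *subreq)
{
	cachefiles_write(&subreq->rreq->cache_resources,
			 subreq->start, &subreq->io_iter,
			 netfs_write_subrequest_terminated, subreq);
}

Note that netfs_write_subrequest_terminated() deliberately takes a void *
so that it can be used directly as a termination callback without a
casting wrapper.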

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-erofs@lists.ozlabs.org
cc: linux-fsdevel@vger.kernel.org
---
 fs/cachefiles/io.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 5ba5c7814fe4..437b24b0fd1c 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -622,6 +622,77 @@ static int cachefiles_prepare_write(struct netfs_cache_resources *cres,
 	return ret;
 }
 
+static void cachefiles_prepare_write_subreq(struct netfs_io_subrequest *subreq)
+{
+	struct netfs_io_request *wreq = subreq->rreq;
+	struct netfs_cache_resources *cres = &wreq->cache_resources;
+
+	_enter("W=%x[%x] %llx", wreq->debug_id, subreq->debug_index, subreq->start);
+
+	subreq->max_len = ULONG_MAX;
+	subreq->max_nr_segs = BIO_MAX_VECS;
+
+	if (!cachefiles_cres_file(cres)) {
+		if (!fscache_wait_for_operation(cres, FSCACHE_WANT_WRITE))
+			return netfs_prepare_write_failed(subreq);
+		if (!cachefiles_cres_file(cres))
+			return netfs_prepare_write_failed(subreq);
+	}
+}
+
+static void cachefiles_issue_write(struct netfs_io_subrequest *subreq)
+{
+	struct netfs_io_request *wreq = subreq->rreq;
+	struct netfs_cache_resources *cres = &wreq->cache_resources;
+	struct cachefiles_object *object = cachefiles_cres_object(cres);
+	struct cachefiles_cache *cache = object->volume->cache;
+	const struct cred *saved_cred;
+	size_t off, pre, post, len = subreq->len;
+	loff_t start = subreq->start;
+	int ret;
+
+	_enter("W=%x[%x] %llx-%llx",
+	       wreq->debug_id, subreq->debug_index, start, start + len - 1);
+
+	/* We need to start on the cache granularity boundary */
+	off = start & (CACHEFILES_DIO_BLOCK_SIZE - 1);
+	if (off) {
+		pre = CACHEFILES_DIO_BLOCK_SIZE - off;
+		if (pre >= len) {
+			netfs_write_subrequest_terminated(subreq, len, false);
+			return;
+		}
+		subreq->transferred += pre;
+		start += pre;
+		len -= pre;
+		iov_iter_advance(&subreq->io_iter, pre);
+	}
+
+	/* We also need to end on the cache granularity boundary */
+	post = len & (CACHEFILES_DIO_BLOCK_SIZE - 1);
+	if (post) {
+		len -= post;
+		if (len == 0) {
+			netfs_write_subrequest_terminated(subreq, post, false);
+			return;
+		}
+		iov_iter_truncate(&subreq->io_iter, len);
+	}
+
+	cachefiles_begin_secure(cache, &saved_cred);
+	ret = __cachefiles_prepare_write(object, cachefiles_cres_file(cres),
+					 &start, &len, len, true);
+	cachefiles_end_secure(cache, saved_cred);
+	if (ret < 0) {
+		netfs_write_subrequest_terminated(subreq, ret, false);
+		return;
+	}
+
+	cachefiles_write(&subreq->rreq->cache_resources,
+			 subreq->start, &subreq->io_iter,
+			 netfs_write_subrequest_terminated, subreq);
+}
+
 /*
  * Clean up an operation.
  */
@@ -638,8 +709,10 @@ static const struct netfs_cache_ops cachefiles_netfs_cache_ops = {
 	.end_operation		= cachefiles_end_operation,
 	.read			= cachefiles_read,
 	.write			= cachefiles_write,
+	.issue_write		= cachefiles_issue_write,
 	.prepare_read		= cachefiles_prepare_read,
 	.prepare_write		= cachefiles_prepare_write,
+	.prepare_write_subreq	= cachefiles_prepare_write_subreq,
 	.prepare_ondemand_read	= cachefiles_prepare_ondemand_read,
 	.query_occupancy	= cachefiles_query_occupancy,
 };
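
To make the granularity trimming in cachefiles_issue_write() above
concrete, take CACHEFILES_DIO_BLOCK_SIZE as 4096 and a subrequest with
start = 4352 and len = 8448 (covering bytes 4352-12799):

 - off = 4352 & 4095 = 256, so pre = 4096 - 256 = 3840: the unaligned head
   is skipped, subreq->transferred and the iterator advance by 3840, and
   the region becomes start = 8192, len = 4608.

 - post = 4608 & 4095 = 512: the iterator is truncated, leaving len = 4096.

The region handed to __cachefiles_prepare_write() is therefore 8192-12287,
the only whole DIO block inside the original range; the partial head and
tail are not written to the cache by this subrequest.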


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 23/26] netfs: Cut over to using new writeback code
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (21 preceding siblings ...)
  2024-03-28 16:34 ` [PATCH 22/26] netfs, cachefiles: " David Howells
@ 2024-03-28 16:34 ` David Howells
  2024-03-28 16:34 ` [PATCH 24/26] netfs: Remove the old " David Howells
                   ` (8 subsequent siblings)
  31 siblings, 0 replies; 62+ messages in thread
From: David Howells @ 2024-03-28 16:34 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: Latchesar Ionkov, Christian Schoenebeck, David Howells, linux-mm,
	Marc Dionne, linux-afs, Paulo Alcantara, linux-cifs,
	Matthew Wilcox, Steve French, linux-cachefs, Ilya Dryomov,
	Shyam Prasad N, Tom Talpey, ceph-devel, Eric Van Hensbergen,
	linux-nfs, netdev, v9fs, linux-kernel, linux-fsdevel, netfs,
	linux-erofs

Cut over to using the new writeback code.  The old code is #ifdef'd out or
otherwise removed from compilation to avoid conflicts and will be removed
in a future patch.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/9p/vfs_addr.c          |  6 ++---
 fs/afs/file.c             |  3 +--
 fs/afs/internal.h         |  1 -
 fs/afs/write.c            |  2 ++
 fs/netfs/Makefile         |  1 -
 fs/netfs/buffered_write.c | 46 +++++++++++++++++++++------------------
 fs/netfs/direct_write.c   | 26 ++++++++++++----------
 fs/netfs/internal.h       | 21 +++++-------------
 fs/netfs/write_collect.c  |  8 +++----
 fs/netfs/write_issue.c    | 18 +++++++--------
 include/linux/netfs.h     |  9 --------
 11 files changed, 63 insertions(+), 78 deletions(-)

diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 07d03efdd594..4845e655bc39 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -60,6 +60,7 @@ static void v9fs_issue_write(struct netfs_io_subrequest *subreq)
 	netfs_write_subrequest_terminated(subreq, len ?: err, false);
 }
 
+#if 0 // TODO: Remove
 static void v9fs_upload_to_server(struct netfs_io_subrequest *subreq)
 {
 	struct p9_fid *fid = subreq->rreq->netfs_priv;
@@ -91,6 +92,7 @@ static void v9fs_create_write_requests(struct netfs_io_request *wreq, loff_t sta
 	if (subreq)
 		netfs_queue_write_request(subreq);
 }
+#endif
 
 /**
  * v9fs_issue_read - Issue a read from 9P
@@ -121,18 +123,15 @@ static int v9fs_init_request(struct netfs_io_request *rreq, struct file *file)
 {
 	struct p9_fid *fid;
 	bool writing = (rreq->origin == NETFS_READ_FOR_WRITE ||
-			rreq->origin == NETFS_WRITEBACK ||
 			rreq->origin == NETFS_WRITETHROUGH ||
 			rreq->origin == NETFS_UNBUFFERED_WRITE ||
 			rreq->origin == NETFS_DIO_WRITE);
 
-#if 0 // TODO: Cut over
 	if (rreq->origin == NETFS_WRITEBACK)
 		return 0; /* We don't get the write handle until we find we
 			   * have actually dirty data and not just
 			   * copy-to-cache data.
 			   */
-#endif
 
 	if (file) {
 		fid = file->private_data;
@@ -179,7 +178,6 @@ const struct netfs_request_ops v9fs_req_ops = {
 	.issue_read		= v9fs_issue_read,
 	.begin_writeback	= v9fs_begin_writeback,
 	.issue_write		= v9fs_issue_write,
-	.create_write_requests	= v9fs_create_write_requests,
 };
 
 const struct address_space_operations v9fs_addr_operations = {
diff --git a/fs/afs/file.c b/fs/afs/file.c
index db9ebae84fa2..8f983e3ecae7 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -353,7 +353,7 @@ static int afs_init_request(struct netfs_io_request *rreq, struct file *file)
 	if (file)
 		rreq->netfs_priv = key_get(afs_file_key(file));
 	rreq->rsize = 256 * 1024;
-	rreq->wsize = 256 * 1024;
+	rreq->wsize = 256 * 1024 * 1024;
 	return 0;
 }
 
@@ -399,7 +399,6 @@ const struct netfs_request_ops afs_req_ops = {
 	.issue_read		= afs_issue_read,
 	.update_i_size		= afs_update_i_size,
 	.invalidate_cache	= afs_netfs_invalidate_cache,
-	.create_write_requests	= afs_create_write_requests,
 	.begin_writeback	= afs_begin_writeback,
 	.prepare_write		= afs_prepare_write,
 	.issue_write		= afs_issue_write,
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index dcf0ae0323d3..887245f9336d 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -1605,7 +1605,6 @@ extern int afs_writepages(struct address_space *, struct writeback_control *);
 extern int afs_fsync(struct file *, loff_t, loff_t, int);
 extern vm_fault_t afs_page_mkwrite(struct vm_fault *vmf);
 extern void afs_prune_wb_keys(struct afs_vnode *);
-void afs_create_write_requests(struct netfs_io_request *wreq, loff_t start, size_t len);
 
 /*
  * xattr.c
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 89b073881cac..0ead204c84cb 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -156,6 +156,7 @@ static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter, loff_t
 	return afs_put_operation(op);
 }
 
+#if 0 // TODO: Remove
 static void afs_upload_to_server(struct netfs_io_subrequest *subreq)
 {
 	struct afs_vnode *vnode = AFS_FS_I(subreq->rreq->inode);
@@ -193,6 +194,7 @@ void afs_create_write_requests(struct netfs_io_request *wreq, loff_t start, size
 	if (subreq)
 		netfs_queue_write_request(subreq);
 }
+#endif
 
 /*
  * Writeback calls this when it finds a folio that needs uploading.  This isn't
diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile
index 1eb86e34b5a9..8e6781e0b10b 100644
--- a/fs/netfs/Makefile
+++ b/fs/netfs/Makefile
@@ -11,7 +11,6 @@ netfs-y := \
 	main.o \
 	misc.o \
 	objects.o \
-	output.o \
 	write_collect.o \
 	write_issue.o
 
diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index 621532dacef5..945e646cd2db 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -26,8 +26,6 @@ enum netfs_how_to_modify {
 	NETFS_FLUSH_CONTENT,		/* Flush incompatible content. */
 };
 
-static void netfs_cleanup_buffered_write(struct netfs_io_request *wreq);
-
 static void netfs_set_group(struct folio *folio, struct netfs_group *netfs_group)
 {
 	void *priv = folio_get_private(folio);
@@ -180,7 +178,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
 	};
 	struct netfs_io_request *wreq = NULL;
 	struct netfs_folio *finfo;
-	struct folio *folio;
+	struct folio *folio, *writethrough = NULL;
 	enum netfs_how_to_modify howto;
 	enum netfs_folio_trace trace;
 	unsigned int bdp_flags = (iocb->ki_flags & IOCB_SYNC) ? 0: BDP_ASYNC;
@@ -210,7 +208,6 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
 		}
 		if (!is_sync_kiocb(iocb))
 			wreq->iocb = iocb;
-		wreq->cleanup = netfs_cleanup_buffered_write;
 		netfs_stat(&netfs_n_wh_writethrough);
 	} else {
 		netfs_stat(&netfs_n_wh_buffered_write);
@@ -254,6 +251,15 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
 		offset = pos & (flen - 1);
 		part = min_t(size_t, flen - offset, part);
 
+		/* Wait for writeback to complete.  The writeback engine owns
+		 * the info in folio->private and may change it until it
+		 * removes the WB mark.
+		 */
+		if (folio_wait_writeback_killable(folio)) {
+			ret = written ? -EINTR : -ERESTARTSYS;
+			goto error_folio_unlock;
+		}
+
 		if (signal_pending(current)) {
 			ret = written ? -EINTR : -ERESTARTSYS;
 			goto error_folio_unlock;
@@ -328,6 +334,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
 				maybe_trouble = true;
 				iov_iter_revert(iter, copied);
 				copied = 0;
+				folio_unlock(folio);
 				goto retry;
 			}
 			netfs_set_group(folio, netfs_group);
@@ -383,23 +390,16 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
 
 		if (likely(!wreq)) {
 			folio_mark_dirty(folio);
+			folio_unlock(folio);
 		} else {
-			if (folio_test_dirty(folio))
-				/* Sigh.  mmap. */
-				folio_clear_dirty_for_io(folio);
-			/* We make multiple writes to the folio... */
-			if (!folio_test_writeback(folio)) {
-				folio_start_writeback(folio);
-				if (wreq->iter.count == 0)
-					trace_netfs_folio(folio, netfs_folio_trace_wthru);
-				else
-					trace_netfs_folio(folio, netfs_folio_trace_wthru_plus);
-			}
-			netfs_advance_writethrough(wreq, copied,
-						   offset + copied == flen);
+			if (pos > wreq->i_size)
+				wreq->i_size = pos;
+			netfs_advance_writethrough(wreq, &wbc, folio, copied,
+						   offset + copied == flen,
+						   &writethrough);
+			/* Folio unlocked */
 		}
 	retry:
-		folio_unlock(folio);
 		folio_put(folio);
 		folio = NULL;
 
@@ -408,7 +408,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
 
 out:
 	if (unlikely(wreq)) {
-		ret2 = netfs_end_writethrough(wreq, iocb);
+		ret2 = netfs_end_writethrough(wreq, &wbc, writethrough);
 		wbc_detach_inode(&wbc);
 		if (ret2 == -EIOCBQUEUED)
 			return ret2;
@@ -530,11 +530,13 @@ vm_fault_t netfs_page_mkwrite(struct vm_fault *vmf, struct netfs_group *netfs_gr
 
 	sb_start_pagefault(inode->i_sb);
 
-	if (folio_wait_writeback_killable(folio))
+	if (folio_lock_killable(folio) < 0)
 		goto out;
 
-	if (folio_lock_killable(folio) < 0)
+	if (folio_wait_writeback_killable(folio)) {
+		ret = VM_FAULT_LOCKED;
 		goto out;
+	}
 
 	/* Can we see a streaming write here? */
 	if (WARN_ON(!folio_test_uptodate(folio))) {
@@ -574,6 +576,7 @@ vm_fault_t netfs_page_mkwrite(struct vm_fault *vmf, struct netfs_group *netfs_gr
 }
 EXPORT_SYMBOL(netfs_page_mkwrite);
 
+#if 0 // TODO: Remove
 /*
  * Kill all the pages in the given range
  */
@@ -1200,3 +1203,4 @@ int netfs_writepages(struct address_space *mapping,
 	return ret;
 }
 EXPORT_SYMBOL(netfs_writepages);
+#endif
diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index 37c91188107b..330ba7cb3f10 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -34,6 +34,7 @@ static ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov
 	unsigned long long start = iocb->ki_pos;
 	unsigned long long end = start + iov_iter_count(iter);
 	ssize_t ret, n;
+	size_t len = iov_iter_count(iter);
 	bool async = !is_sync_kiocb(iocb);
 
 	_enter("");
@@ -46,13 +47,17 @@ static ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov
 
 	_debug("uw %llx-%llx", start, end);
 
-	wreq = netfs_alloc_request(iocb->ki_filp->f_mapping, iocb->ki_filp,
-				   start, end - start,
-				   iocb->ki_flags & IOCB_DIRECT ?
-				   NETFS_DIO_WRITE : NETFS_UNBUFFERED_WRITE);
+	wreq = netfs_create_write_req(iocb->ki_filp->f_mapping, iocb->ki_filp, start,
+				      iocb->ki_flags & IOCB_DIRECT ?
+				      NETFS_DIO_WRITE : NETFS_UNBUFFERED_WRITE);
 	if (IS_ERR(wreq))
 		return PTR_ERR(wreq);
 
+	wreq->io_streams[0].avail = true;
+	trace_netfs_write(wreq, (iocb->ki_flags & IOCB_DIRECT ?
+				 netfs_write_trace_dio_write :
+				 netfs_write_trace_unbuffered_write));
+
 	{
 		/* If this is an async op and we're not using a bounce buffer,
 		 * we have to save the source buffer as the iterator is only
@@ -63,7 +68,7 @@ static ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov
 		 * request.
 		 */
 		if (async || user_backed_iter(iter)) {
-			n = netfs_extract_user_iter(iter, wreq->len, &wreq->iter, 0);
+			n = netfs_extract_user_iter(iter, len, &wreq->iter, 0);
 			if (n < 0) {
 				ret = n;
 				goto out;
@@ -71,7 +76,6 @@ static ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov
 			wreq->direct_bv = (struct bio_vec *)wreq->iter.bvec;
 			wreq->direct_bv_count = n;
 			wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
-			wreq->len = iov_iter_count(&wreq->iter);
 		} else {
 			wreq->iter = *iter;
 		}
@@ -79,6 +83,8 @@ static ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov
 		wreq->io_iter = wreq->iter;
 	}
 
+	__set_bit(NETFS_RREQ_USE_IO_ITER, &wreq->flags);
+
 	/* Copy the data into the bounce buffer and encrypt it. */
 	// TODO
 
@@ -87,10 +93,7 @@ static ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov
 	if (async)
 		wreq->iocb = iocb;
 	wreq->cleanup = netfs_cleanup_dio_write;
-	ret = netfs_begin_write(wreq, is_sync_kiocb(iocb),
-				iocb->ki_flags & IOCB_DIRECT ?
-				netfs_write_trace_dio_write :
-				netfs_write_trace_unbuffered_write);
+	ret = netfs_unbuffered_write(wreq, is_sync_kiocb(iocb), iov_iter_count(&wreq->io_iter));
 	if (ret < 0) {
 		_debug("begin = %zd", ret);
 		goto out;
@@ -100,9 +103,8 @@ static ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov
 		trace_netfs_rreq(wreq, netfs_rreq_trace_wait_ip);
 		wait_on_bit(&wreq->flags, NETFS_RREQ_IN_PROGRESS,
 			    TASK_UNINTERRUPTIBLE);
-
+		smp_rmb(); /* Read error/transferred after RIP flag */
 		ret = wreq->error;
-		_debug("waited = %zd", ret);
 		if (ret == 0) {
 			ret = wreq->transferred;
 			iocb->ki_pos += ret;
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index 5d3f74a70fa7..95e281a8af78 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -92,15 +92,6 @@ static inline void netfs_see_request(struct netfs_io_request *rreq,
 	trace_netfs_rreq_ref(rreq->debug_id, refcount_read(&rreq->ref), what);
 }
 
-/*
- * output.c
- */
-int netfs_begin_write(struct netfs_io_request *wreq, bool may_wait,
-		      enum netfs_write_trace what);
-struct netfs_io_request *netfs_begin_writethrough(struct kiocb *iocb, size_t len);
-int netfs_advance_writethrough(struct netfs_io_request *wreq, size_t copied, bool to_page_end);
-int netfs_end_writethrough(struct netfs_io_request *wreq, struct kiocb *iocb);
-
 /*
  * stats.c
  */
@@ -172,12 +163,12 @@ void netfs_reissue_write(struct netfs_io_stream *stream,
 int netfs_advance_write(struct netfs_io_request *wreq,
 			struct netfs_io_stream *stream,
 			loff_t start, size_t len, bool to_eof);
-struct netfs_io_request *new_netfs_begin_writethrough(struct kiocb *iocb, size_t len);
-int new_netfs_advance_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
-				   struct folio *folio, size_t copied, bool to_page_end,
-				   struct folio **writethrough_cache);
-int new_netfs_end_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
-			       struct folio *writethrough_cache);
+struct netfs_io_request *netfs_begin_writethrough(struct kiocb *iocb, size_t len);
+int netfs_advance_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
+			       struct folio *folio, size_t copied, bool to_page_end,
+			       struct folio **writethrough_cache);
+int netfs_end_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
+			   struct folio *writethrough_cache);
 int netfs_unbuffered_write(struct netfs_io_request *wreq, bool may_wait, size_t len);
 
 /*
diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c
index 5e2ca8b25af0..bea939ab0830 100644
--- a/fs/netfs/write_collect.c
+++ b/fs/netfs/write_collect.c
@@ -714,7 +714,7 @@ void netfs_wake_write_collector(struct netfs_io_request *wreq, bool was_async)
 }
 
 /**
- * new_netfs_write_subrequest_terminated - Note the termination of a write operation.
+ * netfs_write_subrequest_terminated - Note the termination of a write operation.
  * @_op: The I/O request that has terminated.
  * @transferred_or_error: The amount of data transferred or an error code.
  * @was_async: The termination was asynchronous
@@ -736,8 +736,8 @@ void netfs_wake_write_collector(struct netfs_io_request *wreq, bool was_async)
  * Note that %_op is a void* so that the function can be passed to
  * kiocb::term_func without the need for a casting wrapper.
  */
-void new_netfs_write_subrequest_terminated(void *_op, ssize_t transferred_or_error,
-					   bool was_async)
+void netfs_write_subrequest_terminated(void *_op, ssize_t transferred_or_error,
+				       bool was_async)
 {
 	struct netfs_io_subrequest *subreq = _op;
 	struct netfs_io_request *wreq = subreq->rreq;
@@ -805,4 +805,4 @@ void new_netfs_write_subrequest_terminated(void *_op, ssize_t transferred_or_err
 
 	netfs_put_subrequest(subreq, was_async, netfs_sreq_trace_put_terminated);
 }
-EXPORT_SYMBOL(new_netfs_write_subrequest_terminated);
+EXPORT_SYMBOL(netfs_write_subrequest_terminated);
diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c
index e0fb472898f5..61e6208de235 100644
--- a/fs/netfs/write_issue.c
+++ b/fs/netfs/write_issue.c
@@ -485,8 +485,8 @@ static int netfs_write_folio(struct netfs_io_request *wreq,
 /*
  * Write some of the pending data back to the server
  */
-int new_netfs_writepages(struct address_space *mapping,
-			 struct writeback_control *wbc)
+int netfs_writepages(struct address_space *mapping,
+		     struct writeback_control *wbc)
 {
 	struct netfs_inode *ictx = netfs_inode(mapping->host);
 	struct netfs_io_request *wreq = NULL;
@@ -547,12 +547,12 @@ int new_netfs_writepages(struct address_space *mapping,
 	_leave(" = %d", error);
 	return error;
 }
-EXPORT_SYMBOL(new_netfs_writepages);
+EXPORT_SYMBOL(netfs_writepages);
 
 /*
  * Begin a write operation for writing through the pagecache.
  */
-struct netfs_io_request *new_netfs_begin_writethrough(struct kiocb *iocb, size_t len)
+struct netfs_io_request *netfs_begin_writethrough(struct kiocb *iocb, size_t len)
 {
 	struct netfs_io_request *wreq = NULL;
 	struct netfs_inode *ictx = netfs_inode(file_inode(iocb->ki_filp));
@@ -575,9 +575,9 @@ struct netfs_io_request *new_netfs_begin_writethrough(struct kiocb *iocb, size_t
  * to the request.  If we've added more than wsize then we need to create a new
  * subrequest.
  */
-int new_netfs_advance_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
-				   struct folio *folio, size_t copied, bool to_page_end,
-				   struct folio **writethrough_cache)
+int netfs_advance_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
+			       struct folio *folio, size_t copied, bool to_page_end,
+			       struct folio **writethrough_cache)
 {
 	_enter("R=%x ic=%zu ws=%u cp=%zu tp=%u",
 	       wreq->debug_id, wreq->iter.count, wreq->wsize, copied, to_page_end);
@@ -607,8 +607,8 @@ int new_netfs_advance_writethrough(struct netfs_io_request *wreq, struct writeba
 /*
  * End a write operation used when writing through the pagecache.
  */
-int new_netfs_end_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
-			       struct folio *writethrough_cache)
+int netfs_end_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
+			   struct folio *writethrough_cache)
 {
 	struct netfs_inode *ictx = netfs_inode(wreq->inode);
 	int ret;
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 42dba05a428b..c2ba364041b0 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -303,8 +303,6 @@ struct netfs_request_ops {
 	void (*update_i_size)(struct inode *inode, loff_t i_size);
 
 	/* Write request handling */
-	void (*create_write_requests)(struct netfs_io_request *wreq,
-				      loff_t start, size_t len);
 	void (*begin_writeback)(struct netfs_io_request *wreq);
 	void (*prepare_write)(struct netfs_io_subrequest *subreq);
 	void (*issue_write)(struct netfs_io_subrequest *subreq);
@@ -409,8 +407,6 @@ int netfs_write_begin(struct netfs_inode *, struct file *,
 		      struct folio **, void **fsdata);
 int netfs_writepages(struct address_space *mapping,
 		     struct writeback_control *wbc);
-int new_netfs_writepages(struct address_space *mapping,
-			struct writeback_control *wbc);
 bool netfs_dirty_folio(struct address_space *mapping, struct folio *folio);
 int netfs_unpin_writeback(struct inode *inode, struct writeback_control *wbc);
 void netfs_clear_inode_writeback(struct inode *inode, const void *aux);
@@ -431,14 +427,9 @@ ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len,
 				iov_iter_extraction_t extraction_flags);
 size_t netfs_limit_iter(const struct iov_iter *iter, size_t start_offset,
 			size_t max_size, size_t max_segs);
-struct netfs_io_subrequest *netfs_create_write_request(
-	struct netfs_io_request *wreq, enum netfs_io_source dest,
-	loff_t start, size_t len, work_func_t worker);
 void netfs_prepare_write_failed(struct netfs_io_subrequest *subreq);
 void netfs_write_subrequest_terminated(void *_op, ssize_t transferred_or_error,
 				       bool was_async);
-void new_netfs_write_subrequest_terminated(void *_op, ssize_t transferred_or_error,
-					   bool was_async);
 void netfs_queue_write_request(struct netfs_io_subrequest *subreq);
 
 int netfs_start_io_read(struct inode *inode);


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 24/26] netfs: Remove the old writeback code
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (22 preceding siblings ...)
  2024-03-28 16:34 ` [PATCH 23/26] netfs: Cut over to using new writeback code David Howells
@ 2024-03-28 16:34 ` David Howells
  2024-04-15 12:20   ` Jeff Layton
  2024-04-17 10:36   ` David Howells
  2024-03-28 16:34 ` [PATCH 25/26] netfs: Miscellaneous tidy ups David Howells
                   ` (7 subsequent siblings)
  31 siblings, 2 replies; 62+ messages in thread
From: David Howells @ 2024-03-28 16:34 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: Latchesar Ionkov, Christian Schoenebeck, David Howells, linux-mm,
	Marc Dionne, linux-afs, Paulo Alcantara, linux-cifs,
	Matthew Wilcox, Steve French, linux-cachefs, Ilya Dryomov,
	Shyam Prasad N, Tom Talpey, ceph-devel, Eric Van Hensbergen,
	linux-nfs, netdev, v9fs, linux-kernel, linux-fsdevel, netfs,
	linux-erofs

Remove the old writeback code.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/9p/vfs_addr.c          |  34 ---
 fs/afs/write.c            |  40 ---
 fs/netfs/buffered_write.c | 629 --------------------------------------
 fs/netfs/direct_write.c   |   2 +-
 fs/netfs/output.c         | 477 -----------------------------
 5 files changed, 1 insertion(+), 1181 deletions(-)
 delete mode 100644 fs/netfs/output.c

diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 4845e655bc39..a97ceb105cd8 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -60,40 +60,6 @@ static void v9fs_issue_write(struct netfs_io_subrequest *subreq)
 	netfs_write_subrequest_terminated(subreq, len ?: err, false);
 }
 
-#if 0 // TODO: Remove
-static void v9fs_upload_to_server(struct netfs_io_subrequest *subreq)
-{
-	struct p9_fid *fid = subreq->rreq->netfs_priv;
-	int err, len;
-
-	trace_netfs_sreq(subreq, netfs_sreq_trace_submit);
-	len = p9_client_write(fid, subreq->start, &subreq->io_iter, &err);
-	netfs_write_subrequest_terminated(subreq, len ?: err, false);
-}
-
-static void v9fs_upload_to_server_worker(struct work_struct *work)
-{
-	struct netfs_io_subrequest *subreq =
-		container_of(work, struct netfs_io_subrequest, work);
-
-	v9fs_upload_to_server(subreq);
-}
-
-/*
- * Set up write requests for a writeback slice.  We need to add a write request
- * for each write we want to make.
- */
-static void v9fs_create_write_requests(struct netfs_io_request *wreq, loff_t start, size_t len)
-{
-	struct netfs_io_subrequest *subreq;
-
-	subreq = netfs_create_write_request(wreq, NETFS_UPLOAD_TO_SERVER,
-					    start, len, v9fs_upload_to_server_worker);
-	if (subreq)
-		netfs_queue_write_request(subreq);
-}
-#endif
-
 /**
  * v9fs_issue_read - Issue a read from 9P
  * @subreq: The read to make
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 0ead204c84cb..6ef7d4cbc008 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -156,46 +156,6 @@ static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter, loff_t
 	return afs_put_operation(op);
 }
 
-#if 0 // TODO: Remove
-static void afs_upload_to_server(struct netfs_io_subrequest *subreq)
-{
-	struct afs_vnode *vnode = AFS_FS_I(subreq->rreq->inode);
-	ssize_t ret;
-
-	_enter("%x[%x],%zx",
-	       subreq->rreq->debug_id, subreq->debug_index, subreq->io_iter.count);
-
-	trace_netfs_sreq(subreq, netfs_sreq_trace_submit);
-	ret = afs_store_data(vnode, &subreq->io_iter, subreq->start);
-	netfs_write_subrequest_terminated(subreq, ret < 0 ? ret : subreq->len,
-					  false);
-}
-
-static void afs_upload_to_server_worker(struct work_struct *work)
-{
-	struct netfs_io_subrequest *subreq =
-		container_of(work, struct netfs_io_subrequest, work);
-
-	afs_upload_to_server(subreq);
-}
-
-/*
- * Set up write requests for a writeback slice.  We need to add a write request
- * for each write we want to make.
- */
-void afs_create_write_requests(struct netfs_io_request *wreq, loff_t start, size_t len)
-{
-	struct netfs_io_subrequest *subreq;
-
-	_enter("%x,%llx-%llx", wreq->debug_id, start, start + len);
-
-	subreq = netfs_create_write_request(wreq, NETFS_UPLOAD_TO_SERVER,
-					    start, len, afs_upload_to_server_worker);
-	if (subreq)
-		netfs_queue_write_request(subreq);
-}
-#endif
-
 /*
  * Writeback calls this when it finds a folio that needs uploading.  This isn't
  * called if writeback only has copy-to-cache to deal with.
diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index 945e646cd2db..2da9905abec9 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -575,632 +575,3 @@ vm_fault_t netfs_page_mkwrite(struct vm_fault *vmf, struct netfs_group *netfs_gr
 	return ret;
 }
 EXPORT_SYMBOL(netfs_page_mkwrite);
-
-#if 0 // TODO: Remove
-/*
- * Kill all the pages in the given range
- */
-static void netfs_kill_pages(struct address_space *mapping,
-			     loff_t start, loff_t len)
-{
-	struct folio *folio;
-	pgoff_t index = start / PAGE_SIZE;
-	pgoff_t last = (start + len - 1) / PAGE_SIZE, next;
-
-	_enter("%llx-%llx", start, start + len - 1);
-
-	do {
-		_debug("kill %lx (to %lx)", index, last);
-
-		folio = filemap_get_folio(mapping, index);
-		if (IS_ERR(folio)) {
-			next = index + 1;
-			continue;
-		}
-
-		next = folio_next_index(folio);
-
-		trace_netfs_folio(folio, netfs_folio_trace_kill);
-		folio_clear_uptodate(folio);
-		folio_end_writeback(folio);
-		folio_lock(folio);
-		generic_error_remove_folio(mapping, folio);
-		folio_unlock(folio);
-		folio_put(folio);
-
-	} while (index = next, index <= last);
-
-	_leave("");
-}
-
-/*
- * Redirty all the pages in a given range.
- */
-static void netfs_redirty_pages(struct address_space *mapping,
-				loff_t start, loff_t len)
-{
-	struct folio *folio;
-	pgoff_t index = start / PAGE_SIZE;
-	pgoff_t last = (start + len - 1) / PAGE_SIZE, next;
-
-	_enter("%llx-%llx", start, start + len - 1);
-
-	do {
-		_debug("redirty %llx @%llx", len, start);
-
-		folio = filemap_get_folio(mapping, index);
-		if (IS_ERR(folio)) {
-			next = index + 1;
-			continue;
-		}
-
-		next = folio_next_index(folio);
-		trace_netfs_folio(folio, netfs_folio_trace_redirty);
-		filemap_dirty_folio(mapping, folio);
-		folio_end_writeback(folio);
-		folio_put(folio);
-	} while (index = next, index <= last);
-
-	balance_dirty_pages_ratelimited(mapping);
-
-	_leave("");
-}
-
-/*
- * Completion of write to server
- */
-static void netfs_pages_written_back(struct netfs_io_request *wreq)
-{
-	struct address_space *mapping = wreq->mapping;
-	struct netfs_folio *finfo;
-	struct netfs_group *group = NULL;
-	struct folio *folio;
-	pgoff_t last;
-	int gcount = 0;
-
-	XA_STATE(xas, &mapping->i_pages, wreq->start / PAGE_SIZE);
-
-	_enter("%llx-%llx", wreq->start, wreq->start + wreq->len);
-
-	rcu_read_lock();
-
-	last = (wreq->start + wreq->len - 1) / PAGE_SIZE;
-	xas_for_each(&xas, folio, last) {
-		WARN(!folio_test_writeback(folio),
-		     "bad %llx @%llx page %lx %lx\n",
-		     wreq->len, wreq->start, folio->index, last);
-
-		if ((finfo = netfs_folio_info(folio))) {
-			/* Streaming writes cannot be redirtied whilst under
-			 * writeback, so discard the streaming record.
-			 */
-			folio_detach_private(folio);
-			group = finfo->netfs_group;
-			gcount++;
-			trace_netfs_folio(folio, netfs_folio_trace_clear_s);
-			kfree(finfo);
-		} else if ((group = netfs_folio_group(folio))) {
-			/* Need to detach the group pointer if the page didn't
-			 * get redirtied.  If it has been redirtied, then it
-			 * must be within the same group.
-			 */
-			if (folio_test_dirty(folio)) {
-				trace_netfs_folio(folio, netfs_folio_trace_redirtied);
-				goto end_wb;
-			}
-			if (folio_trylock(folio)) {
-				if (!folio_test_dirty(folio)) {
-					folio_detach_private(folio);
-					gcount++;
-					if (group == NETFS_FOLIO_COPY_TO_CACHE)
-						trace_netfs_folio(folio,
-								  netfs_folio_trace_end_copy);
-					else
-						trace_netfs_folio(folio, netfs_folio_trace_clear_g);
-				} else {
-					trace_netfs_folio(folio, netfs_folio_trace_redirtied);
-				}
-				folio_unlock(folio);
-				goto end_wb;
-			}
-
-			xas_pause(&xas);
-			rcu_read_unlock();
-			folio_lock(folio);
-			if (!folio_test_dirty(folio)) {
-				folio_detach_private(folio);
-				gcount++;
-				trace_netfs_folio(folio, netfs_folio_trace_clear_g);
-			} else {
-				trace_netfs_folio(folio, netfs_folio_trace_redirtied);
-			}
-			folio_unlock(folio);
-			rcu_read_lock();
-		} else {
-			trace_netfs_folio(folio, netfs_folio_trace_clear);
-		}
-	end_wb:
-		xas_advance(&xas, folio_next_index(folio) - 1);
-		folio_end_writeback(folio);
-	}
-
-	rcu_read_unlock();
-	netfs_put_group_many(group, gcount);
-	_leave("");
-}
-
-/*
- * Deal with the disposition of the folios that are under writeback to close
- * out the operation.
- */
-static void netfs_cleanup_buffered_write(struct netfs_io_request *wreq)
-{
-	struct address_space *mapping = wreq->mapping;
-
-	_enter("");
-
-	switch (wreq->error) {
-	case 0:
-		netfs_pages_written_back(wreq);
-		break;
-
-	default:
-		pr_notice("R=%08x Unexpected error %d\n", wreq->debug_id, wreq->error);
-		fallthrough;
-	case -EACCES:
-	case -EPERM:
-	case -ENOKEY:
-	case -EKEYEXPIRED:
-	case -EKEYREJECTED:
-	case -EKEYREVOKED:
-	case -ENETRESET:
-	case -EDQUOT:
-	case -ENOSPC:
-		netfs_redirty_pages(mapping, wreq->start, wreq->len);
-		break;
-
-	case -EROFS:
-	case -EIO:
-	case -EREMOTEIO:
-	case -EFBIG:
-	case -ENOENT:
-	case -ENOMEDIUM:
-	case -ENXIO:
-		netfs_kill_pages(mapping, wreq->start, wreq->len);
-		break;
-	}
-
-	if (wreq->error)
-		mapping_set_error(mapping, wreq->error);
-	if (wreq->netfs_ops->done)
-		wreq->netfs_ops->done(wreq);
-}
-
-/*
- * Extend the region to be written back to include subsequent contiguously
- * dirty pages if possible, but don't sleep while doing so.
- *
- * If this page holds new content, then we can include filler zeros in the
- * writeback.
- */
-static void netfs_extend_writeback(struct address_space *mapping,
-				   struct netfs_group *group,
-				   struct xa_state *xas,
-				   long *_count,
-				   loff_t start,
-				   loff_t max_len,
-				   size_t *_len,
-				   size_t *_top)
-{
-	struct netfs_folio *finfo;
-	struct folio_batch fbatch;
-	struct folio *folio;
-	unsigned int i;
-	pgoff_t index = (start + *_len) / PAGE_SIZE;
-	size_t len;
-	void *priv;
-	bool stop = true;
-
-	folio_batch_init(&fbatch);
-
-	do {
-		/* Firstly, we gather up a batch of contiguous dirty pages
-		 * under the RCU read lock - but we can't clear the dirty flags
-		 * there if any of those pages are mapped.
-		 */
-		rcu_read_lock();
-
-		xas_for_each(xas, folio, ULONG_MAX) {
-			stop = true;
-			if (xas_retry(xas, folio))
-				continue;
-			if (xa_is_value(folio))
-				break;
-			if (folio->index != index) {
-				xas_reset(xas);
-				break;
-			}
-
-			if (!folio_try_get_rcu(folio)) {
-				xas_reset(xas);
-				continue;
-			}
-
-			/* Has the folio moved or been split? */
-			if (unlikely(folio != xas_reload(xas))) {
-				folio_put(folio);
-				xas_reset(xas);
-				break;
-			}
-
-			if (!folio_trylock(folio)) {
-				folio_put(folio);
-				xas_reset(xas);
-				break;
-			}
-			if (!folio_test_dirty(folio) ||
-			    folio_test_writeback(folio)) {
-				folio_unlock(folio);
-				folio_put(folio);
-				xas_reset(xas);
-				break;
-			}
-
-			stop = false;
-			len = folio_size(folio);
-			priv = folio_get_private(folio);
-			if ((const struct netfs_group *)priv != group) {
-				stop = true;
-				finfo = netfs_folio_info(folio);
-				if (!finfo ||
-				    finfo->netfs_group != group ||
-				    finfo->dirty_offset > 0) {
-					folio_unlock(folio);
-					folio_put(folio);
-					xas_reset(xas);
-					break;
-				}
-				len = finfo->dirty_len;
-			}
-
-			*_top += folio_size(folio);
-			index += folio_nr_pages(folio);
-			*_count -= folio_nr_pages(folio);
-			*_len += len;
-			if (*_len >= max_len || *_count <= 0)
-				stop = true;
-
-			if (!folio_batch_add(&fbatch, folio))
-				break;
-			if (stop)
-				break;
-		}
-
-		xas_pause(xas);
-		rcu_read_unlock();
-
-		/* Now, if we obtained any folios, we can shift them to being
-		 * writable and mark them for caching.
-		 */
-		if (!folio_batch_count(&fbatch))
-			break;
-
-		for (i = 0; i < folio_batch_count(&fbatch); i++) {
-			folio = fbatch.folios[i];
-			if (group == NETFS_FOLIO_COPY_TO_CACHE)
-				trace_netfs_folio(folio, netfs_folio_trace_copy_plus);
-			else
-				trace_netfs_folio(folio, netfs_folio_trace_store_plus);
-
-			if (!folio_clear_dirty_for_io(folio))
-				BUG();
-			folio_start_writeback(folio);
-			folio_unlock(folio);
-		}
-
-		folio_batch_release(&fbatch);
-		cond_resched();
-	} while (!stop);
-}
-
-/*
- * Synchronously write back the locked page and any subsequent non-locked dirty
- * pages.
- */
-static ssize_t netfs_write_back_from_locked_folio(struct address_space *mapping,
-						  struct writeback_control *wbc,
-						  struct netfs_group *group,
-						  struct xa_state *xas,
-						  struct folio *folio,
-						  unsigned long long start,
-						  unsigned long long end)
-{
-	struct netfs_io_request *wreq;
-	struct netfs_folio *finfo;
-	struct netfs_inode *ctx = netfs_inode(mapping->host);
-	unsigned long long i_size = i_size_read(&ctx->inode);
-	size_t len, max_len;
-	long count = wbc->nr_to_write;
-	int ret;
-
-	_enter(",%lx,%llx-%llx", folio->index, start, end);
-
-	wreq = netfs_alloc_request(mapping, NULL, start, folio_size(folio),
-				   group == NETFS_FOLIO_COPY_TO_CACHE ?
-				   NETFS_COPY_TO_CACHE : NETFS_WRITEBACK);
-	if (IS_ERR(wreq)) {
-		folio_unlock(folio);
-		return PTR_ERR(wreq);
-	}
-
-	if (!folio_clear_dirty_for_io(folio))
-		BUG();
-	folio_start_writeback(folio);
-
-	count -= folio_nr_pages(folio);
-
-	/* Find all consecutive lockable dirty pages that have contiguous
-	 * written regions, stopping when we find a page that is not
-	 * immediately lockable, is not dirty or is missing, or we reach the
-	 * end of the range.
-	 */
-	if (group == NETFS_FOLIO_COPY_TO_CACHE)
-		trace_netfs_folio(folio, netfs_folio_trace_copy);
-	else
-		trace_netfs_folio(folio, netfs_folio_trace_store);
-
-	len = wreq->len;
-	finfo = netfs_folio_info(folio);
-	if (finfo) {
-		start += finfo->dirty_offset;
-		if (finfo->dirty_offset + finfo->dirty_len != len) {
-			len = finfo->dirty_len;
-			goto cant_expand;
-		}
-		len = finfo->dirty_len;
-	}
-
-	if (start < i_size) {
-		/* Trim the write to the EOF; the extra data is ignored.  Also
-		 * put an upper limit on the size of a single storedata op.
-		 */
-		max_len = 65536 * 4096;
-		max_len = min_t(unsigned long long, max_len, end - start + 1);
-		max_len = min_t(unsigned long long, max_len, i_size - start);
-
-		if (len < max_len)
-			netfs_extend_writeback(mapping, group, xas, &count, start,
-					       max_len, &len, &wreq->upper_len);
-	}
-
-cant_expand:
-	len = min_t(unsigned long long, len, i_size - start);
-
-	/* We now have a contiguous set of dirty pages, each with writeback
-	 * set; the first page is still locked at this point, but all the rest
-	 * have been unlocked.
-	 */
-	folio_unlock(folio);
-	wreq->start = start;
-	wreq->len = len;
-
-	if (start < i_size) {
-		_debug("write back %zx @%llx [%llx]", len, start, i_size);
-
-		/* Speculatively write to the cache.  We have to fix this up
-		 * later if the store fails.
-		 */
-		wreq->cleanup = netfs_cleanup_buffered_write;
-
-		iov_iter_xarray(&wreq->iter, ITER_SOURCE, &mapping->i_pages, start,
-				wreq->upper_len);
-		if (group != NETFS_FOLIO_COPY_TO_CACHE) {
-			__set_bit(NETFS_RREQ_UPLOAD_TO_SERVER, &wreq->flags);
-			ret = netfs_begin_write(wreq, true, netfs_write_trace_writeback);
-		} else {
-			ret = netfs_begin_write(wreq, true, netfs_write_trace_copy_to_cache);
-		}
-		if (ret == 0 || ret == -EIOCBQUEUED)
-			wbc->nr_to_write -= len / PAGE_SIZE;
-	} else {
-		_debug("write discard %zx @%llx [%llx]", len, start, i_size);
-
-		/* The dirty region was entirely beyond the EOF. */
-		netfs_pages_written_back(wreq);
-		ret = 0;
-	}
-
-	netfs_put_request(wreq, false, netfs_rreq_trace_put_return);
-	_leave(" = 1");
-	return 1;
-}
-
-/*
- * Write a region of pages back to the server
- */
-static ssize_t netfs_writepages_begin(struct address_space *mapping,
-				      struct writeback_control *wbc,
-				      struct netfs_group *group,
-				      struct xa_state *xas,
-				      unsigned long long *_start,
-				      unsigned long long end)
-{
-	const struct netfs_folio *finfo;
-	struct folio *folio;
-	unsigned long long start = *_start;
-	ssize_t ret;
-	void *priv;
-	int skips = 0;
-
-	_enter("%llx,%llx,", start, end);
-
-search_again:
-	/* Find the first dirty page in the group. */
-	rcu_read_lock();
-
-	for (;;) {
-		folio = xas_find_marked(xas, end / PAGE_SIZE, PAGECACHE_TAG_DIRTY);
-		if (xas_retry(xas, folio) || xa_is_value(folio))
-			continue;
-		if (!folio)
-			break;
-
-		if (!folio_try_get_rcu(folio)) {
-			xas_reset(xas);
-			continue;
-		}
-
-		if (unlikely(folio != xas_reload(xas))) {
-			folio_put(folio);
-			xas_reset(xas);
-			continue;
-		}
-
-		/* Skip any dirty folio that's not in the group of interest. */
-		priv = folio_get_private(folio);
-		if ((const struct netfs_group *)priv == NETFS_FOLIO_COPY_TO_CACHE) {
-			group = NETFS_FOLIO_COPY_TO_CACHE;
-		} else if ((const struct netfs_group *)priv != group) {
-			finfo = __netfs_folio_info(priv);
-			if (!finfo || finfo->netfs_group != group) {
-				folio_put(folio);
-				continue;
-			}
-		}
-
-		xas_pause(xas);
-		break;
-	}
-	rcu_read_unlock();
-	if (!folio)
-		return 0;
-
-	start = folio_pos(folio); /* May regress with THPs */
-
-	_debug("wback %lx", folio->index);
-
-	/* At this point we hold neither the i_pages lock nor the page lock:
-	 * the page may be truncated or invalidated (changing page->mapping to
-	 * NULL), or even swizzled back from swapper_space to tmpfs file
-	 * mapping
-	 */
-lock_again:
-	if (wbc->sync_mode != WB_SYNC_NONE) {
-		ret = folio_lock_killable(folio);
-		if (ret < 0)
-			return ret;
-	} else {
-		if (!folio_trylock(folio))
-			goto search_again;
-	}
-
-	if (folio->mapping != mapping ||
-	    !folio_test_dirty(folio)) {
-		start += folio_size(folio);
-		folio_unlock(folio);
-		goto search_again;
-	}
-
-	if (folio_test_writeback(folio)) {
-		folio_unlock(folio);
-		if (wbc->sync_mode != WB_SYNC_NONE) {
-			folio_wait_writeback(folio);
-			goto lock_again;
-		}
-
-		start += folio_size(folio);
-		if (wbc->sync_mode == WB_SYNC_NONE) {
-			if (skips >= 5 || need_resched()) {
-				ret = 0;
-				goto out;
-			}
-			skips++;
-		}
-		goto search_again;
-	}
-
-	ret = netfs_write_back_from_locked_folio(mapping, wbc, group, xas,
-						 folio, start, end);
-out:
-	if (ret > 0)
-		*_start = start + ret;
-	_leave(" = %zd [%llx]", ret, *_start);
-	return ret;
-}
-
-/*
- * Write a region of pages back to the server
- */
-static int netfs_writepages_region(struct address_space *mapping,
-				   struct writeback_control *wbc,
-				   struct netfs_group *group,
-				   unsigned long long *_start,
-				   unsigned long long end)
-{
-	ssize_t ret;
-
-	XA_STATE(xas, &mapping->i_pages, *_start / PAGE_SIZE);
-
-	do {
-		ret = netfs_writepages_begin(mapping, wbc, group, &xas,
-					     _start, end);
-		if (ret > 0 && wbc->nr_to_write > 0)
-			cond_resched();
-	} while (ret > 0 && wbc->nr_to_write > 0);
-
-	return ret > 0 ? 0 : ret;
-}
-
-/*
- * write some of the pending data back to the server
- */
-int netfs_writepages(struct address_space *mapping,
-		     struct writeback_control *wbc)
-{
-	struct netfs_group *group = NULL;
-	loff_t start, end;
-	int ret;
-
-	_enter("");
-
-	/* We have to be careful as we can end up racing with setattr()
-	 * truncating the pagecache since the caller doesn't take a lock here
-	 * to prevent it.
-	 */
-
-	if (wbc->range_cyclic && mapping->writeback_index) {
-		start = mapping->writeback_index * PAGE_SIZE;
-		ret = netfs_writepages_region(mapping, wbc, group,
-					      &start, LLONG_MAX);
-		if (ret < 0)
-			goto out;
-
-		if (wbc->nr_to_write <= 0) {
-			mapping->writeback_index = start / PAGE_SIZE;
-			goto out;
-		}
-
-		start = 0;
-		end = mapping->writeback_index * PAGE_SIZE;
-		mapping->writeback_index = 0;
-		ret = netfs_writepages_region(mapping, wbc, group, &start, end);
-		if (ret == 0)
-			mapping->writeback_index = start / PAGE_SIZE;
-	} else if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) {
-		start = 0;
-		ret = netfs_writepages_region(mapping, wbc, group,
-					      &start, LLONG_MAX);
-		if (wbc->nr_to_write > 0 && ret == 0)
-			mapping->writeback_index = start / PAGE_SIZE;
-	} else {
-		start = wbc->range_start;
-		ret = netfs_writepages_region(mapping, wbc, group,
-					      &start, wbc->range_end);
-	}
-
-out:
-	_leave(" = %d", ret);
-	return ret;
-}
-EXPORT_SYMBOL(netfs_writepages);
-#endif
diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index 330ba7cb3f10..e4a9cf7cd234 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -37,7 +37,7 @@ static ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov
 	size_t len = iov_iter_count(iter);
 	bool async = !is_sync_kiocb(iocb);
 
-	_enter("");
+	_enter("%lx", iov_iter_count(iter));
 
 	/* We're going to need a bounce buffer if what we transmit is going to
 	 * be different in some way to the source buffer, e.g. because it gets
diff --git a/fs/netfs/output.c b/fs/netfs/output.c
deleted file mode 100644
index 85374322f10f..000000000000
--- a/fs/netfs/output.c
+++ /dev/null
@@ -1,477 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/* Network filesystem high-level write support.
- *
- * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells (dhowells@redhat.com)
- */
-
-#include <linux/fs.h>
-#include <linux/mm.h>
-#include <linux/pagemap.h>
-#include <linux/slab.h>
-#include <linux/writeback.h>
-#include <linux/pagevec.h>
-#include "internal.h"
-
-/**
- * netfs_create_write_request - Create a write operation.
- * @wreq: The write request this is storing from.
- * @dest: The destination type
- * @start: Start of the region this write will modify
- * @len: Length of the modification
- * @worker: The worker function to handle the write(s)
- *
- * Allocate a write operation, set it up and add it to the list on a write
- * request.
- */
-struct netfs_io_subrequest *netfs_create_write_request(struct netfs_io_request *wreq,
-						       enum netfs_io_source dest,
-						       loff_t start, size_t len,
-						       work_func_t worker)
-{
-	struct netfs_io_subrequest *subreq;
-
-	subreq = netfs_alloc_subrequest(wreq);
-	if (subreq) {
-		INIT_WORK(&subreq->work, worker);
-		subreq->source	= dest;
-		subreq->start	= start;
-		subreq->len	= len;
-
-		switch (subreq->source) {
-		case NETFS_UPLOAD_TO_SERVER:
-			netfs_stat(&netfs_n_wh_upload);
-			break;
-		case NETFS_WRITE_TO_CACHE:
-			netfs_stat(&netfs_n_wh_write);
-			break;
-		default:
-			BUG();
-		}
-
-		subreq->io_iter = wreq->io_iter;
-		iov_iter_advance(&subreq->io_iter, subreq->start - wreq->start);
-		iov_iter_truncate(&subreq->io_iter, subreq->len);
-
-		trace_netfs_sreq_ref(wreq->debug_id, subreq->debug_index,
-				     refcount_read(&subreq->ref),
-				     netfs_sreq_trace_new);
-		atomic_inc(&wreq->nr_outstanding);
-		list_add_tail(&subreq->rreq_link, &wreq->subrequests);
-		trace_netfs_sreq(subreq, netfs_sreq_trace_prepare);
-	}
-
-	return subreq;
-}
-EXPORT_SYMBOL(netfs_create_write_request);
-
-/*
- * Process a completed write request once all the component operations have
- * been completed.
- */
-static void netfs_write_terminated(struct netfs_io_request *wreq, bool was_async)
-{
-	struct netfs_io_subrequest *subreq;
-	struct netfs_inode *ctx = netfs_inode(wreq->inode);
-	size_t transferred = 0;
-
-	_enter("R=%x[]", wreq->debug_id);
-
-	trace_netfs_rreq(wreq, netfs_rreq_trace_write_done);
-
-	list_for_each_entry(subreq, &wreq->subrequests, rreq_link) {
-		if (subreq->error || subreq->transferred == 0)
-			break;
-		transferred += subreq->transferred;
-		if (subreq->transferred < subreq->len)
-			break;
-	}
-	wreq->transferred = transferred;
-
-	list_for_each_entry(subreq, &wreq->subrequests, rreq_link) {
-		if (!subreq->error)
-			continue;
-		switch (subreq->source) {
-		case NETFS_UPLOAD_TO_SERVER:
-			/* Depending on the type of failure, this may prevent
-			 * writeback completion unless we're in disconnected
-			 * mode.
-			 */
-			if (!wreq->error)
-				wreq->error = subreq->error;
-			break;
-
-		case NETFS_WRITE_TO_CACHE:
-			/* Failure doesn't prevent writeback completion unless
-			 * we're in disconnected mode.
-			 */
-			if (subreq->error != -ENOBUFS)
-				ctx->ops->invalidate_cache(wreq);
-			break;
-
-		default:
-			WARN_ON_ONCE(1);
-			if (!wreq->error)
-				wreq->error = -EIO;
-			return;
-		}
-	}
-
-	wreq->cleanup(wreq);
-
-	if (wreq->origin == NETFS_DIO_WRITE &&
-	    wreq->mapping->nrpages) {
-		pgoff_t first = wreq->start >> PAGE_SHIFT;
-		pgoff_t last = (wreq->start + wreq->transferred - 1) >> PAGE_SHIFT;
-		invalidate_inode_pages2_range(wreq->mapping, first, last);
-	}
-
-	if (wreq->origin == NETFS_DIO_WRITE)
-		inode_dio_end(wreq->inode);
-
-	_debug("finished");
-	trace_netfs_rreq(wreq, netfs_rreq_trace_wake_ip);
-	clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &wreq->flags);
-	wake_up_bit(&wreq->flags, NETFS_RREQ_IN_PROGRESS);
-
-	if (wreq->iocb) {
-		wreq->iocb->ki_pos += transferred;
-		if (wreq->iocb->ki_complete)
-			wreq->iocb->ki_complete(
-				wreq->iocb, wreq->error ? wreq->error : transferred);
-	}
-
-	netfs_clear_subrequests(wreq, was_async);
-	netfs_put_request(wreq, was_async, netfs_rreq_trace_put_complete);
-}
-
-/*
- * Deal with the completion of writing the data to the cache.
- */
-void netfs_write_subrequest_terminated(void *_op, ssize_t transferred_or_error,
-				       bool was_async)
-{
-	struct netfs_io_subrequest *subreq = _op;
-	struct netfs_io_request *wreq = subreq->rreq;
-	unsigned int u;
-
-	_enter("%x[%x] %zd", wreq->debug_id, subreq->debug_index, transferred_or_error);
-
-	switch (subreq->source) {
-	case NETFS_UPLOAD_TO_SERVER:
-		netfs_stat(&netfs_n_wh_upload_done);
-		break;
-	case NETFS_WRITE_TO_CACHE:
-		netfs_stat(&netfs_n_wh_write_done);
-		break;
-	case NETFS_INVALID_WRITE:
-		break;
-	default:
-		BUG();
-	}
-
-	if (IS_ERR_VALUE(transferred_or_error)) {
-		subreq->error = transferred_or_error;
-		trace_netfs_failure(wreq, subreq, transferred_or_error,
-				    netfs_fail_write);
-		goto failed;
-	}
-
-	if (WARN(transferred_or_error > subreq->len - subreq->transferred,
-		 "Subreq excess write: R%x[%x] %zd > %zu - %zu",
-		 wreq->debug_id, subreq->debug_index,
-		 transferred_or_error, subreq->len, subreq->transferred))
-		transferred_or_error = subreq->len - subreq->transferred;
-
-	subreq->error = 0;
-	subreq->transferred += transferred_or_error;
-
-	if (iov_iter_count(&subreq->io_iter) != subreq->len - subreq->transferred)
-		pr_warn("R=%08x[%u] ITER POST-MISMATCH %zx != %zx-%zx %x\n",
-			wreq->debug_id, subreq->debug_index,
-			iov_iter_count(&subreq->io_iter), subreq->len,
-			subreq->transferred, subreq->io_iter.iter_type);
-
-	if (subreq->transferred < subreq->len)
-		goto incomplete;
-
-	__clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags);
-out:
-	trace_netfs_sreq(subreq, netfs_sreq_trace_terminated);
-
-	/* If we decrement nr_outstanding to 0, the ref belongs to us. */
-	u = atomic_dec_return(&wreq->nr_outstanding);
-	if (u == 0)
-		netfs_write_terminated(wreq, was_async);
-	else if (u == 1)
-		wake_up_var(&wreq->nr_outstanding);
-
-	netfs_put_subrequest(subreq, was_async, netfs_sreq_trace_put_terminated);
-	return;
-
-incomplete:
-	if (transferred_or_error == 0) {
-		if (__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) {
-			subreq->error = -ENODATA;
-			goto failed;
-		}
-	} else {
-		__clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags);
-	}
-
-	__set_bit(NETFS_SREQ_SHORT_IO, &subreq->flags);
-	set_bit(NETFS_RREQ_INCOMPLETE_IO, &wreq->flags);
-	goto out;
-
-failed:
-	switch (subreq->source) {
-	case NETFS_WRITE_TO_CACHE:
-		netfs_stat(&netfs_n_wh_write_failed);
-		set_bit(NETFS_RREQ_INCOMPLETE_IO, &wreq->flags);
-		break;
-	case NETFS_UPLOAD_TO_SERVER:
-		netfs_stat(&netfs_n_wh_upload_failed);
-		set_bit(NETFS_RREQ_FAILED, &wreq->flags);
-		wreq->error = subreq->error;
-		break;
-	default:
-		break;
-	}
-	goto out;
-}
-EXPORT_SYMBOL(netfs_write_subrequest_terminated);
-
-static void netfs_write_to_cache_op(struct netfs_io_subrequest *subreq)
-{
-	struct netfs_io_request *wreq = subreq->rreq;
-	struct netfs_cache_resources *cres = &wreq->cache_resources;
-
-	trace_netfs_sreq(subreq, netfs_sreq_trace_submit);
-
-	cres->ops->write(cres, subreq->start, &subreq->io_iter,
-			 netfs_write_subrequest_terminated, subreq);
-}
-
-static void netfs_write_to_cache_op_worker(struct work_struct *work)
-{
-	struct netfs_io_subrequest *subreq =
-		container_of(work, struct netfs_io_subrequest, work);
-
-	netfs_write_to_cache_op(subreq);
-}
-
-/**
- * netfs_queue_write_request - Queue a write request for attention
- * @subreq: The write request to be queued
- *
- * Queue the specified write request for processing by a worker thread.  We
- * pass the caller's ref on the request to the worker thread.
- */
-void netfs_queue_write_request(struct netfs_io_subrequest *subreq)
-{
-	if (!queue_work(system_unbound_wq, &subreq->work))
-		netfs_put_subrequest(subreq, false, netfs_sreq_trace_put_wip);
-}
-EXPORT_SYMBOL(netfs_queue_write_request);
-
-/*
- * Set up a op for writing to the cache.
- */
-static void netfs_set_up_write_to_cache(struct netfs_io_request *wreq)
-{
-	struct netfs_cache_resources *cres = &wreq->cache_resources;
-	struct netfs_io_subrequest *subreq;
-	struct netfs_inode *ctx = netfs_inode(wreq->inode);
-	struct fscache_cookie *cookie = netfs_i_cookie(ctx);
-	loff_t start = wreq->start;
-	size_t len = wreq->len;
-	int ret;
-
-	if (!fscache_cookie_enabled(cookie)) {
-		clear_bit(NETFS_RREQ_WRITE_TO_CACHE, &wreq->flags);
-		return;
-	}
-
-	_debug("write to cache");
-	ret = fscache_begin_write_operation(cres, cookie);
-	if (ret < 0)
-		return;
-
-	ret = cres->ops->prepare_write(cres, &start, &len, wreq->upper_len,
-				       i_size_read(wreq->inode), true);
-	if (ret < 0)
-		return;
-
-	subreq = netfs_create_write_request(wreq, NETFS_WRITE_TO_CACHE, start, len,
-					    netfs_write_to_cache_op_worker);
-	if (!subreq)
-		return;
-
-	netfs_write_to_cache_op(subreq);
-}
-
-/*
- * Begin the process of writing out a chunk of data.
- *
- * We are given a write request that holds a series of dirty regions and
- * (partially) covers a sequence of folios, all of which are present.  The
- * pages must have been marked as writeback as appropriate.
- *
- * We need to perform the following steps:
- *
- * (1) If encrypting, create an output buffer and encrypt each block of the
- *     data into it, otherwise the output buffer will point to the original
- *     folios.
- *
- * (2) If the data is to be cached, set up a write op for the entire output
- *     buffer to the cache, if the cache wants to accept it.
- *
- * (3) If the data is to be uploaded (ie. not merely cached):
- *
- *     (a) If the data is to be compressed, create a compression buffer and
- *         compress the data into it.
- *
- *     (b) For each destination we want to upload to, set up write ops to write
- *         to that destination.  We may need multiple writes if the data is not
- *         contiguous or the span exceeds wsize for a server.
- */
-int netfs_begin_write(struct netfs_io_request *wreq, bool may_wait,
-		      enum netfs_write_trace what)
-{
-	struct netfs_inode *ctx = netfs_inode(wreq->inode);
-
-	_enter("R=%x %llx-%llx f=%lx",
-	       wreq->debug_id, wreq->start, wreq->start + wreq->len - 1,
-	       wreq->flags);
-
-	trace_netfs_write(wreq, what);
-	if (wreq->len == 0 || wreq->iter.count == 0) {
-		pr_err("Zero-sized write [R=%x]\n", wreq->debug_id);
-		return -EIO;
-	}
-
-	if (wreq->origin == NETFS_DIO_WRITE)
-		inode_dio_begin(wreq->inode);
-
-	wreq->io_iter = wreq->iter;
-
-	/* ->outstanding > 0 carries a ref */
-	netfs_get_request(wreq, netfs_rreq_trace_get_for_outstanding);
-	atomic_set(&wreq->nr_outstanding, 1);
-
-	/* Start the encryption/compression going.  We can do that in the
-	 * background whilst we generate a list of write ops that we want to
-	 * perform.
-	 */
-	// TODO: Encrypt or compress the region as appropriate
-
-	/* We need to write all of the region to the cache */
-	if (test_bit(NETFS_RREQ_WRITE_TO_CACHE, &wreq->flags))
-		netfs_set_up_write_to_cache(wreq);
-
-	/* However, we don't necessarily write all of the region to the server.
-	 * Caching of reads is being managed this way also.
-	 */
-	if (test_bit(NETFS_RREQ_UPLOAD_TO_SERVER, &wreq->flags))
-		ctx->ops->create_write_requests(wreq, wreq->start, wreq->len);
-
-	if (atomic_dec_and_test(&wreq->nr_outstanding))
-		netfs_write_terminated(wreq, false);
-
-	if (!may_wait)
-		return -EIOCBQUEUED;
-
-	wait_on_bit(&wreq->flags, NETFS_RREQ_IN_PROGRESS,
-		    TASK_UNINTERRUPTIBLE);
-	return wreq->error;
-}
-
-/*
- * Begin a write operation for writing through the pagecache.
- */
-struct netfs_io_request *netfs_begin_writethrough(struct kiocb *iocb, size_t len)
-{
-	struct netfs_io_request *wreq;
-	struct file *file = iocb->ki_filp;
-
-	wreq = netfs_alloc_request(file->f_mapping, file, iocb->ki_pos, len,
-				   NETFS_WRITETHROUGH);
-	if (IS_ERR(wreq))
-		return wreq;
-
-	trace_netfs_write(wreq, netfs_write_trace_writethrough);
-
-	__set_bit(NETFS_RREQ_UPLOAD_TO_SERVER, &wreq->flags);
-	iov_iter_xarray(&wreq->iter, ITER_SOURCE, &wreq->mapping->i_pages, wreq->start, 0);
-	wreq->io_iter = wreq->iter;
-
-	/* ->outstanding > 0 carries a ref */
-	netfs_get_request(wreq, netfs_rreq_trace_get_for_outstanding);
-	atomic_set(&wreq->nr_outstanding, 1);
-	return wreq;
-}
-
-static void netfs_submit_writethrough(struct netfs_io_request *wreq, bool final)
-{
-	struct netfs_inode *ictx = netfs_inode(wreq->inode);
-	unsigned long long start;
-	size_t len;
-
-	if (!test_bit(NETFS_RREQ_UPLOAD_TO_SERVER, &wreq->flags))
-		return;
-
-	start = wreq->start + wreq->submitted;
-	len = wreq->iter.count - wreq->submitted;
-	if (!final) {
-		len /= wreq->wsize; /* Round to number of maximum packets */
-		len *= wreq->wsize;
-	}
-
-	ictx->ops->create_write_requests(wreq, start, len);
-	wreq->submitted += len;
-}
-
-/*
- * Advance the state of the write operation used when writing through the
- * pagecache.  Data has been copied into the pagecache that we need to append
- * to the request.  If we've added more than wsize then we need to create a new
- * subrequest.
- */
-int netfs_advance_writethrough(struct netfs_io_request *wreq, size_t copied, bool to_page_end)
-{
-	_enter("ic=%zu sb=%llu ws=%u cp=%zu tp=%u",
-	       wreq->iter.count, wreq->submitted, wreq->wsize, copied, to_page_end);
-
-	wreq->iter.count += copied;
-	wreq->io_iter.count += copied;
-	if (to_page_end && wreq->io_iter.count - wreq->submitted >= wreq->wsize)
-		netfs_submit_writethrough(wreq, false);
-
-	return wreq->error;
-}
-
-/*
- * End a write operation used when writing through the pagecache.
- */
-int netfs_end_writethrough(struct netfs_io_request *wreq, struct kiocb *iocb)
-{
-	int ret = -EIOCBQUEUED;
-
-	_enter("ic=%zu sb=%llu ws=%u",
-	       wreq->iter.count, wreq->submitted, wreq->wsize);
-
-	if (wreq->submitted < wreq->io_iter.count)
-		netfs_submit_writethrough(wreq, true);
-
-	if (atomic_dec_and_test(&wreq->nr_outstanding))
-		netfs_write_terminated(wreq, false);
-
-	if (is_sync_kiocb(iocb)) {
-		wait_on_bit(&wreq->flags, NETFS_RREQ_IN_PROGRESS,
-			    TASK_UNINTERRUPTIBLE);
-		ret = wreq->error;
-	}
-
-	netfs_put_request(wreq, false, netfs_rreq_trace_put_return);
-	return ret;
-}
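
For readers skimming the deletion above: the old path drove completion with a per-request nr_outstanding counter.  The issuer holds an initial count of one, each subrequest completion decrements the counter in netfs_write_subrequest_terminated(), and the decrement that reaches zero runs netfs_write_terminated().  The userspace sketch below only illustrates that counting pattern; the sketch_* names are invented for the example and are not part of the netfs API.

/* Standalone illustration of the nr_outstanding pattern; not kernel code. */
#include <stdatomic.h>
#include <stdio.h>

struct sketch_request {
	atomic_int nr_outstanding;	/* 1 for the issuer + 1 per subrequest */
	int error;
};

static void sketch_terminate(struct sketch_request *req)
{
	printf("request complete, error=%d\n", req->error);
}

/* Rough analogue of netfs_write_subrequest_terminated(). */
static void sketch_subreq_done(struct sketch_request *req, int error)
{
	if (error && !req->error)
		req->error = error;
	if (atomic_fetch_sub(&req->nr_outstanding, 1) == 1)
		sketch_terminate(req);	/* we dropped the count to zero */
}

int main(void)
{
	struct sketch_request req = { .error = 0 };
	int i;

	atomic_init(&req.nr_outstanding, 1);	/* issuer's own count */

	for (i = 0; i < 3; i++)			/* issue three subrequests */
		atomic_fetch_add(&req.nr_outstanding, 1);
	for (i = 0; i < 3; i++)			/* ...and complete them */
		sketch_subreq_done(&req, 0);

	sketch_subreq_done(&req, 0);	/* issuer's own decrement; last one terminates */
	return 0;
}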


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 25/26] netfs: Miscellaneous tidy ups
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (23 preceding siblings ...)
  2024-03-28 16:34 ` [PATCH 24/26] netfs: Remove the old " David Howells
@ 2024-03-28 16:34 ` David Howells
  2024-03-28 16:34 ` [PATCH 26/26] netfs, afs: Use writeback retry to deal with alternate keys David Howells
                   ` (6 subsequent siblings)
  31 siblings, 0 replies; 62+ messages in thread
From: David Howells @ 2024-03-28 16:34 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: Paulo Alcantara, Tom Talpey, Shyam Prasad N, linux-cifs,
	linux-fsdevel, v9fs, ceph-devel, linux-nfs, Matthew Wilcox,
	linux-kernel, David Howells, Steve French, linux-cachefs,
	linux-mm, netdev, Marc Dionne, netfs, Ilya Dryomov,
	Eric Van Hensbergen, linux-erofs, linux-afs

Do a few miscellaneous tidy ups:

 (1) Add a qualifier into a file banner comment.

 (2) Put the writeback folio traces back into alphabetical order.

 (3) Remove some unused folio traces.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/buffered_write.c    | 2 +-
 include/trace/events/netfs.h | 6 +-----
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index 2da9905abec9..1eff9413eb1b 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -1,5 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0-only
-/* Network filesystem high-level write support.
+/* Network filesystem high-level buffered write support.
  *
  * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
  * Written by David Howells (dhowells@redhat.com)
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index e7700172ae7e..4ba553a6d71b 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -141,12 +141,9 @@
 	EM(netfs_folio_trace_cancel_copy,	"cancel-copy")	\
 	EM(netfs_folio_trace_clear,		"clear")	\
 	EM(netfs_folio_trace_clear_cc,		"clear-cc")	\
-	EM(netfs_folio_trace_clear_s,		"clear-s")	\
 	EM(netfs_folio_trace_clear_g,		"clear-g")	\
-	EM(netfs_folio_trace_copy,		"copy")		\
-	EM(netfs_folio_trace_copy_plus,		"copy+")	\
+	EM(netfs_folio_trace_clear_s,		"clear-s")	\
 	EM(netfs_folio_trace_copy_to_cache,	"mark-copy")	\
-	EM(netfs_folio_trace_end_copy,		"end-copy")	\
 	EM(netfs_folio_trace_filled_gaps,	"filled-gaps")	\
 	EM(netfs_folio_trace_kill,		"kill")		\
 	EM(netfs_folio_trace_kill_cc,		"kill-cc")	\
@@ -156,7 +153,6 @@
 	EM(netfs_folio_trace_mkwrite_plus,	"mkwrite+")	\
 	EM(netfs_folio_trace_not_under_wback,	"!wback")	\
 	EM(netfs_folio_trace_read_gaps,		"read-gaps")	\
-	EM(netfs_folio_trace_redirty,		"redirty")	\
 	EM(netfs_folio_trace_redirtied,		"redirtied")	\
 	EM(netfs_folio_trace_store,		"store")	\
 	EM(netfs_folio_trace_store_copy,	"store-copy")	\


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 26/26] netfs, afs: Use writeback retry to deal with alternate keys
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (24 preceding siblings ...)
  2024-03-28 16:34 ` [PATCH 25/26] netfs: Miscellaneous tidy ups David Howells
@ 2024-03-28 16:34 ` David Howells
  2024-04-01 13:53   ` Simon Horman
  2024-04-02  8:32   ` David Howells
  2024-04-02  8:46 ` [PATCH 19/26] netfs: New writeback implementation David Howells
                   ` (5 subsequent siblings)
  31 siblings, 2 replies; 62+ messages in thread
From: David Howells @ 2024-03-28 16:34 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: Paulo Alcantara, Tom Talpey, Shyam Prasad N, linux-cifs,
	linux-fsdevel, v9fs, ceph-devel, linux-nfs, Matthew Wilcox,
	linux-kernel, David Howells, Steve French, linux-cachefs,
	linux-mm, netdev, Marc Dionne, netfs, Ilya Dryomov,
	Eric Van Hensbergen, linux-erofs, linux-afs

Use a hook in the new writeback code's retry algorithm to rotate the keys
once all the outstanding subreqs have failed rather than doing it
separately on each subreq.
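
A condensed userspace sketch of that idea follows.  The real hook is the ->retry_request() op added in the diff below and implemented for afs as afs_retry_request(); everything prefixed sketch_ here is invented for illustration and is not the kernel API.

#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct sketch_key { const char *name; bool usable; };

struct sketch_stream {
	bool failed;
	int front_error;	/* error reported by the front subrequest */
};

/* Find the next usable key after 'from' (NULL means start from the head). */
static const struct sketch_key *sketch_next_key(const struct sketch_key *keys,
						size_t nr,
						const struct sketch_key *from)
{
	size_t i = from ? (size_t)(from - keys) + 1 : 0;

	for (; i < nr; i++)
		if (keys[i].usable)
			return &keys[i];
	return NULL;
}

/* Retry hook: only rotate the key for authentication-class failures. */
static void sketch_retry_request(struct sketch_stream *stream,
				 const struct sketch_key *keys, size_t nr,
				 const struct sketch_key **current)
{
	switch (stream->front_error) {
	case -EACCES:
	case -EPERM:
		*current = sketch_next_key(keys, nr, *current);
		if (!*current)
			stream->failed = true;	/* ran out of keys */
		break;
	default:
		break;				/* not an auth error */
	}
}

int main(void)
{
	struct sketch_key keys[] = { { "expired", false }, { "fresh", true } };
	const struct sketch_key *cur = NULL;
	struct sketch_stream stream = { .failed = false, .front_error = -EACCES };

	sketch_retry_request(&stream, keys, 2, &cur);
	printf("retry with key \"%s\", stream failed=%d\n",
	       cur ? cur->name : "none", stream.failed);
	return 0;
}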

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-afs@lists.infradead.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/afs/file.c            |   1 +
 fs/afs/internal.h        |   1 +
 fs/afs/write.c           | 175 +++++++++++++++++++--------------------
 fs/netfs/write_collect.c |   9 +-
 include/linux/netfs.h    |   2 +
 5 files changed, 96 insertions(+), 92 deletions(-)
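
As a wiring note: the include/linux/netfs.h hunk below adds a ->retry_request() member to struct netfs_request_ops, so a filesystem that wants the callback points that member at its handler.  A fragmentary, hypothetical example (the example_* names are made up; only the field names come from this patch):

static const struct netfs_request_ops example_req_ops = {
	.begin_writeback	= example_begin_writeback,
	.prepare_write		= example_prepare_write,
	.issue_write		= example_issue_write,
	.retry_request		= example_retry_request,	/* new in this patch */
	.invalidate_cache	= example_invalidate_cache,
};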

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 8f983e3ecae7..c3f0c45ae9a9 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -368,6 +368,7 @@ static int afs_check_write_begin(struct file *file, loff_t pos, unsigned len,
 static void afs_free_request(struct netfs_io_request *rreq)
 {
 	key_put(rreq->netfs_priv);
+	afs_put_wb_key(rreq->netfs_priv2);
 }
 
 static void afs_update_i_size(struct inode *inode, loff_t new_i_size)
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 887245f9336d..6e1d3c4daf72 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -1601,6 +1601,7 @@ extern int afs_check_volume_status(struct afs_volume *, struct afs_operation *);
 void afs_prepare_write(struct netfs_io_subrequest *subreq);
 void afs_issue_write(struct netfs_io_subrequest *subreq);
 void afs_begin_writeback(struct netfs_io_request *wreq);
+void afs_retry_request(struct netfs_io_request *wreq, struct netfs_io_stream *stream);
 extern int afs_writepages(struct address_space *, struct writeback_control *);
 extern int afs_fsync(struct file *, loff_t, loff_t, int);
 extern vm_fault_t afs_page_mkwrite(struct vm_fault *vmf);
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 6ef7d4cbc008..838db2e94388 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -29,43 +29,39 @@ static void afs_pages_written_back(struct afs_vnode *vnode, loff_t start, unsign
 
 /*
  * Find a key to use for the writeback.  We cached the keys used to author the
- * writes on the vnode.  *_wbk will contain the last writeback key used or NULL
- * and we need to start from there if it's set.
+ * writes on the vnode.  wreq->netfs_priv2 will contain the last writeback key
+ * record used or NULL and we need to start from there if it's set.
+ * wreq->netfs_priv will be set to the key itself or NULL.
  */
-static int afs_get_writeback_key(struct afs_vnode *vnode,
-				 struct afs_wb_key **_wbk)
+static void afs_get_writeback_key(struct netfs_io_request *wreq)
 {
-	struct afs_wb_key *wbk = NULL;
-	struct list_head *p;
-	int ret = -ENOKEY, ret2;
+	struct afs_wb_key *wbk, *old = wreq->netfs_priv2;
+	struct afs_vnode *vnode = AFS_FS_I(wreq->inode);
+
+	key_put(wreq->netfs_priv);
+	wreq->netfs_priv = NULL;
+	wreq->netfs_priv2 = NULL;
 
 	spin_lock(&vnode->wb_lock);
-	if (*_wbk)
-		p = (*_wbk)->vnode_link.next;
+	if (old)
+		wbk = list_next_entry(old, vnode_link);
 	else
-		p = vnode->wb_keys.next;
+		wbk = list_first_entry(&vnode->wb_keys, struct afs_wb_key, vnode_link);
 
-	while (p != &vnode->wb_keys) {
-		wbk = list_entry(p, struct afs_wb_key, vnode_link);
+	list_for_each_entry_from(wbk, &vnode->wb_keys, vnode_link) {
 		_debug("wbk %u", key_serial(wbk->key));
-		ret2 = key_validate(wbk->key);
-		if (ret2 == 0) {
+		if (key_validate(wbk->key) == 0) {
 			refcount_inc(&wbk->usage);
+			wreq->netfs_priv = key_get(wbk->key);
+			wreq->netfs_priv2 = wbk;
 			_debug("USE WB KEY %u", key_serial(wbk->key));
 			break;
 		}
-
-		wbk = NULL;
-		if (ret == -ENOKEY)
-			ret = ret2;
-		p = p->next;
 	}
 
 	spin_unlock(&vnode->wb_lock);
-	if (*_wbk)
-		afs_put_wb_key(*_wbk);
-	*_wbk = wbk;
-	return 0;
+
+	afs_put_wb_key(old);
 }
 
 static void afs_store_data_success(struct afs_operation *op)
@@ -88,72 +84,83 @@ static const struct afs_operation_ops afs_store_data_operation = {
 };
 
 /*
- * write to a file
+ * Prepare a subrequest to write to the server.  This sets the max_len
+ * parameter.
  */
-static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter, loff_t pos)
+void afs_prepare_write(struct netfs_io_subrequest *subreq)
+{
+	//if (test_bit(NETFS_SREQ_RETRYING, &subreq->flags))
+	//	subreq->max_len = 512 * 1024;
+	//else
+	subreq->max_len = 256 * 1024 * 1024;
+}
+
+/*
+ * Issue a subrequest to write to the server.
+ */
+void afs_issue_write(struct netfs_io_subrequest *subreq)
 {
+	struct netfs_io_request *wreq = subreq->rreq;
 	struct afs_operation *op;
-	struct afs_wb_key *wbk = NULL;
-	loff_t size = iov_iter_count(iter);
+	struct afs_vnode *vnode = AFS_FS_I(wreq->inode);
+	unsigned long long pos = subreq->start + subreq->transferred;
+	size_t len = subreq->len - subreq->transferred;
 	int ret = -ENOKEY;
 
-	_enter("%s{%llx:%llu.%u},%llx,%llx",
+	_enter("R=%x[%x],%s{%llx:%llu.%u},%llx,%zx",
+	       wreq->debug_id, subreq->debug_index,
 	       vnode->volume->name,
 	       vnode->fid.vid,
 	       vnode->fid.vnode,
 	       vnode->fid.unique,
-	       size, pos);
+	       pos, len);
 
-	ret = afs_get_writeback_key(vnode, &wbk);
-	if (ret) {
-		_leave(" = %d [no keys]", ret);
-		return ret;
-	}
+#if 0 // Error injection
+	if (subreq->debug_index == 3)
+		return netfs_write_subrequest_terminated(subreq, -ENOANO, false);
 
-	op = afs_alloc_operation(wbk->key, vnode->volume);
-	if (IS_ERR(op)) {
-		afs_put_wb_key(wbk);
-		return -ENOMEM;
+	if (!test_bit(NETFS_SREQ_RETRYING, &subreq->flags)) {
+		set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
+		return netfs_write_subrequest_terminated(subreq, -EAGAIN, false);
 	}
+#endif
+
+	op = afs_alloc_operation(wreq->netfs_priv, vnode->volume);
+	if (IS_ERR(op))
+		return netfs_write_subrequest_terminated(subreq, -EAGAIN, false);
 
 	afs_op_set_vnode(op, 0, vnode);
-	op->file[0].dv_delta = 1;
+	op->file[0].dv_delta	= 1;
 	op->file[0].modification = true;
-	op->store.pos = pos;
-	op->store.size = size;
-	op->flags |= AFS_OPERATION_UNINTR;
-	op->ops = &afs_store_data_operation;
+	op->store.pos		= pos;
+	op->store.size		= len,
+	op->flags		|= AFS_OPERATION_UNINTR;
+	op->ops			= &afs_store_data_operation;
 
-try_next_key:
 	afs_begin_vnode_operation(op);
 
-	op->store.write_iter = iter;
-	op->store.i_size = max(pos + size, vnode->netfs.remote_i_size);
-	op->mtime = inode_get_mtime(&vnode->netfs.inode);
+	op->store.write_iter	= &subreq->io_iter;
+	op->store.i_size	= umax(pos + len, vnode->netfs.remote_i_size);
+	op->mtime		= inode_get_mtime(&vnode->netfs.inode);
 
 	afs_wait_for_operation(op);
-
-	switch (afs_op_error(op)) {
+	ret = afs_put_operation(op);
+	switch (ret) {
 	case -EACCES:
 	case -EPERM:
 	case -ENOKEY:
 	case -EKEYEXPIRED:
 	case -EKEYREJECTED:
 	case -EKEYREVOKED:
-		_debug("next");
-
-		ret = afs_get_writeback_key(vnode, &wbk);
-		if (ret == 0) {
-			key_put(op->key);
-			op->key = key_get(wbk->key);
-			goto try_next_key;
-		}
+		/* If there are more keys we can try, use the retry algorithm
+		 * to rotate the keys.
+		 */
+		if (wreq->netfs_priv2)
+			set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
 		break;
 	}
 
-	afs_put_wb_key(wbk);
-	_leave(" = %d", afs_op_error(op));
-	return afs_put_operation(op);
+	netfs_write_subrequest_terminated(subreq, ret < 0 ? ret : subreq->len, false);
 }
 
 /*
@@ -162,44 +169,32 @@ static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter, loff_t
  */
 void afs_begin_writeback(struct netfs_io_request *wreq)
 {
+	afs_get_writeback_key(wreq);
 	wreq->io_streams[0].avail = true;
 }
 
 /*
- * Prepare a subrequest to write to the server.  This sets the max_len
- * parameter.
- */
-void afs_prepare_write(struct netfs_io_subrequest *subreq)
-{
-	//if (test_bit(NETFS_SREQ_RETRYING, &subreq->flags))
-	//	subreq->max_len = 512 * 1024;
-	//else
-	subreq->max_len = 256 * 1024 * 1024;
-}
-
-/*
- * Issue a subrequest to write to the server.
+ * Prepare to retry the writes in request.  Use this to try rotating the
+ * available writeback keys.
  */
-void afs_issue_write(struct netfs_io_subrequest *subreq)
+void afs_retry_request(struct netfs_io_request *wreq, struct netfs_io_stream *stream)
 {
-	struct afs_vnode *vnode = AFS_FS_I(subreq->rreq->inode);
-	ssize_t ret;
-
-	_enter("%x[%x],%zx",
-	       subreq->rreq->debug_id, subreq->debug_index, subreq->io_iter.count);
-
-#if 0 // Error injection
-	if (subreq->debug_index == 3)
-		return netfs_write_subrequest_terminated(subreq, -ENOANO, false);
+	struct netfs_io_subrequest *subreq =
+		list_first_entry(&stream->subrequests,
+				 struct netfs_io_subrequest, rreq_link);
 
-	if (!test_bit(NETFS_SREQ_RETRYING, &subreq->flags)) {
-		set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
-		return netfs_write_subrequest_terminated(subreq, -EAGAIN, false);
+	switch (subreq->error) {
+	case -EACCES:
+	case -EPERM:
+	case -ENOKEY:
+	case -EKEYEXPIRED:
+	case -EKEYREJECTED:
+	case -EKEYREVOKED:
+		afs_get_writeback_key(wreq);
+		if (!wreq->netfs_priv)
+			stream->failed = true;
+		break;
 	}
-#endif
-
-	ret = afs_store_data(vnode, &subreq->io_iter, subreq->start);
-	netfs_write_subrequest_terminated(subreq, ret < 0 ? ret : subreq->len, false);
 }
 
 /*
diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c
index bea939ab0830..7ff15e2d7270 100644
--- a/fs/netfs/write_collect.c
+++ b/fs/netfs/write_collect.c
@@ -168,6 +168,13 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq,
 
 	_enter("R=%x[%x:]", wreq->debug_id, stream->stream_nr);
 
+	if (list_empty(&stream->subrequests))
+		return;
+
+	if (stream->source == NETFS_UPLOAD_TO_SERVER &&
+	    wreq->netfs_ops->retry_request)
+		wreq->netfs_ops->retry_request(wreq, stream);
+
 	if (unlikely(stream->failed))
 		return;
 
@@ -187,8 +194,6 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq,
 		return;
 	}
 
-	if (list_empty(&stream->subrequests))
-		return;
 	next = stream->subrequests.next;
 
 	do {
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index c2ba364041b0..298552f5122c 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -235,6 +235,7 @@ struct netfs_io_request {
 	struct iov_iter		iter;		/* Unencrypted-side iterator */
 	struct iov_iter		io_iter;	/* I/O (Encrypted-side) iterator */
 	void			*netfs_priv;	/* Private data for the netfs */
+	void			*netfs_priv2;	/* Private data for the netfs */
 	struct bio_vec		*direct_bv;	/* DIO buffer list (when handling iovec-iter) */
 	unsigned int		direct_bv_count; /* Number of elements in direct_bv[] */
 	unsigned int		debug_id;
@@ -306,6 +307,7 @@ struct netfs_request_ops {
 	void (*begin_writeback)(struct netfs_io_request *wreq);
 	void (*prepare_write)(struct netfs_io_subrequest *subreq);
 	void (*issue_write)(struct netfs_io_subrequest *subreq);
+	void (*retry_request)(struct netfs_io_request *wreq, struct netfs_io_stream *stream);
 	void (*invalidate_cache)(struct netfs_io_request *wreq);
 };
 


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* RE: [PATCH 19/26] netfs: New writeback implementation
  2024-03-28 16:34 ` [PATCH 19/26] netfs: New writeback implementation David Howells
@ 2024-03-29 10:34   ` Naveen Mamindlapalli
  2024-03-30  1:06     ` Vadim Fedorenko
  2024-03-30  1:03   ` Vadim Fedorenko
  1 sibling, 1 reply; 62+ messages in thread
From: Naveen Mamindlapalli @ 2024-03-29 10:34 UTC (permalink / raw)
  To: David Howells, Christian Brauner, Jeff Layton, Gao Xiang,
	Dominique Martinet
  Cc: Latchesar Ionkov, Christian Schoenebeck, linux-mm, Marc Dionne,
	linux-afs, Paulo Alcantara, linux-cifs, Matthew Wilcox,
	Steve French, linux-cachefs, Ilya Dryomov, Shyam Prasad N,
	Tom Talpey, ceph-devel, Eric Van Hensbergen, linux-nfs, netdev,
	v9fs, linux-kernel, linux-fsdevel, netfs,
	 linux-erofs@lists.ozlabs.org

> -----Original Message-----
> From: David Howells <dhowells@redhat.com>
> Sent: Thursday, March 28, 2024 10:04 PM
> To: Christian Brauner <christian@brauner.io>; Jeff Layton <jlayton@kernel.org>;
> Gao Xiang <hsiangkao@linux.alibaba.com>; Dominique Martinet
> <asmadeus@codewreck.org>
> Cc: David Howells <dhowells@redhat.com>; Matthew Wilcox
> <willy@infradead.org>; Steve French <smfrench@gmail.com>; Marc Dionne
> <marc.dionne@auristor.com>; Paulo Alcantara <pc@manguebit.com>; Shyam
> Prasad N <sprasad@microsoft.com>; Tom Talpey <tom@talpey.com>; Eric Van
> Hensbergen <ericvh@kernel.org>; Ilya Dryomov <idryomov@gmail.com>;
> netfs@lists.linux.dev; linux-cachefs@redhat.com; linux-afs@lists.infradead.org;
> linux-cifs@vger.kernel.org; linux-nfs@vger.kernel.org; ceph-
> devel@vger.kernel.org; v9fs@lists.linux.dev; linux-erofs@lists.ozlabs.org; linux-
> fsdevel@vger.kernel.org; linux-mm@kvack.org; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; Latchesar Ionkov <lucho@ionkov.net>; Christian
> Schoenebeck <linux_oss@crudebyte.com>
> Subject: [PATCH 19/26] netfs: New writeback implementation
> 
> The current netfslib writeback implementation creates writeback requests of
> contiguous folio data and then separately tiles subrequests over the space
> twice, once for the server and once for the cache.  This creates a few
> issues:
> 
>  (1) Every time there's a discontiguity or a change between writing to only
>      one destination or writing to both, it must create a new request.
>      This makes it harder to do vectored writes.
> 
>  (2) The folios don't have the writeback mark removed until the end of the
>      request - and a request could be hundreds of megabytes.
> 
>  (3) In future, I want to support a larger cache granularity, which will
>      require aggregation of some folios that contain unmodified data (which
>      only need to go to the cache) and some which contain modifications
>      (which need to be uploaded and stored to the cache) - but, currently,
>      these are treated as discontiguous.
> 
> There's also a move to get everyone to use writeback_iter() to extract
> writable folios from the pagecache.  That said, currently writeback_iter()
> has some issues that make it less than ideal:
> 
>  (1) there's no way to cancel the iteration, even if you find a "temporary"
>      error that means the current folio and all subsequent folios are going
>      to fail;
> 
>  (2) there's no way to filter the folios being written back - something
>      that will impact Ceph with its ordered snap system;
> 
>  (3) and if you get a folio you can't immediately deal with (say you need
>      to flush the preceding writes), you are left with a folio hanging in
>      the locked state for the duration, when really we should unlock it and
>      relock it later.
> 
> In this new implementation, I use writeback_iter() to pump folios,
> progressively creating two parallel, but separate streams and cleaning up
> the finished folios as the subrequests complete.  Either or both streams
> can contain gaps, and the subrequests in each stream can be of variable
> size, don't need to align with each other and don't need to align with the
> folios.
> 
> Indeed, subrequests can cross folio boundaries, may cover several folios or
> a folio may be spanned by multiple subrequests, e.g.:
> 
>          +---+---+-----+-----+---+----------+
> Folios:  |   |   |     |     |   |          |
>          +---+---+-----+-----+---+----------+
> 
>            +------+------+     +----+----+
> Upload:    |      |      |.....|    |    |
>            +------+------+     +----+----+
> 
>          +------+------+------+------+------+
> Cache:   |      |      |      |      |      |
>          +------+------+------+------+------+
> 
> The progressive subrequest construction permits the algorithm to be
> preparing both the next upload to the server and the next write to the
> cache whilst the previous ones are already in progress.  Throttling can be
> applied to control the rate of production of subrequests - and, in any
> case, we probably want to write them to the server in ascending order,
> particularly if the file will be extended.
> 
> Content crypto can also be prepared at the same time as the subrequests and
> run asynchronously, with the prepped requests being stalled until the
> crypto catches up with them.  This might also be useful for transport
> crypto, but that happens at a lower layer, so probably would be harder to
> pull off.
> 
> The algorithm is split into three parts:
> 
>  (1) The issuer.  This walks through the data, packaging it up, encrypting
>      it and creating subrequests.  The part of this that generates
>      subrequests only deals with file positions and spans and so is usable
>      for DIO/unbuffered writes as well as buffered writes.
> 
>  (2) The collector. This asynchronously collects completed subrequests,
>      unlocks folios, frees crypto buffers and performs any retries.  This
>      runs in a work queue so that the issuer can return to the caller for
>      writeback (so that the VM can have its kswapd thread back) or async
>      writes.
> 
>  (3) The retryer.  This pauses the issuer, waits for all outstanding
>      subrequests to complete and then goes through the failed subrequests
>      to reissue them.  This may involve reprepping them (with cifs, the
>      credits must be renegotiated, and a subrequest may need splitting),
>      and doing RMW for content crypto if there's a conflicting change on
>      the server.
> 
> [!] Note that some of the functions are prefixed with "new_" to avoid
> clashes with existing functions.  These will be renamed in a later patch
> that cuts over to the new algorithm.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: Eric Van Hensbergen <ericvh@kernel.org>
> cc: Latchesar Ionkov <lucho@ionkov.net>
> cc: Dominique Martinet <asmadeus@codewreck.org>
> cc: Christian Schoenebeck <linux_oss@crudebyte.com>
> cc: Marc Dionne <marc.dionne@auristor.com>
> cc: v9fs@lists.linux.dev
> cc: linux-afs@lists.infradead.org
> cc: netfs@lists.linux.dev
> cc: linux-fsdevel@vger.kernel.org
> ---
>  fs/netfs/Makefile            |   4 +-
>  fs/netfs/buffered_write.c    |   4 -
>  fs/netfs/internal.h          |  27 ++
>  fs/netfs/objects.c           |  17 +
>  fs/netfs/write_collect.c     | 808 +++++++++++++++++++++++++++++++++++
>  fs/netfs/write_issue.c       | 673 +++++++++++++++++++++++++++++
>  include/linux/netfs.h        |  68 ++-
>  include/trace/events/netfs.h | 232 +++++++++-
>  8 files changed, 1824 insertions(+), 9 deletions(-)
>  create mode 100644 fs/netfs/write_collect.c
>  create mode 100644 fs/netfs/write_issue.c
> 
> diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile
> index d4d1d799819e..1eb86e34b5a9 100644
> --- a/fs/netfs/Makefile
> +++ b/fs/netfs/Makefile
> @@ -11,7 +11,9 @@ netfs-y := \
>  	main.o \
>  	misc.o \
>  	objects.o \
> -	output.o
> +	output.o \
> +	write_collect.o \
> +	write_issue.o
> 
>  netfs-$(CONFIG_NETFS_STATS) += stats.o
> 
> diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
> index 244d67a43972..621532dacef5 100644
> --- a/fs/netfs/buffered_write.c
> +++ b/fs/netfs/buffered_write.c
> @@ -74,16 +74,12 @@ static enum netfs_how_to_modify netfs_how_to_modify(struct netfs_inode *ctx,
> 
>  	if (file->f_mode & FMODE_READ)
>  		goto no_write_streaming;
> -	if (test_bit(NETFS_ICTX_NO_WRITE_STREAMING, &ctx->flags))
> -		goto no_write_streaming;
> 
>  	if (netfs_is_cache_enabled(ctx)) {
>  		/* We don't want to get a streaming write on a file that loses
>  		 * caching service temporarily because the backing store got
>  		 * culled.
>  		 */
> -		if (!test_bit(NETFS_ICTX_NO_WRITE_STREAMING, &ctx->flags))
> -			set_bit(NETFS_ICTX_NO_WRITE_STREAMING, &ctx->flags);
>  		goto no_write_streaming;
>  	}
> 
> diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
> index 58289cc65e25..5d3f74a70fa7 100644
> --- a/fs/netfs/internal.h
> +++ b/fs/netfs/internal.h
> @@ -153,6 +153,33 @@ static inline void netfs_stat_d(atomic_t *stat)
>  #define netfs_stat_d(x) do {} while(0)
>  #endif
> 
> +/*
> + * write_collect.c
> + */
> +int netfs_folio_written_back(struct folio *folio);
> +void netfs_write_collection_worker(struct work_struct *work);
> +void netfs_wake_write_collector(struct netfs_io_request *wreq, bool was_async);
> +
> +/*
> + * write_issue.c
> + */
> +struct netfs_io_request *netfs_create_write_req(struct address_space *mapping,
> +						struct file *file,
> +						loff_t start,
> +						enum netfs_io_origin origin);
> +void netfs_reissue_write(struct netfs_io_stream *stream,
> +			 struct netfs_io_subrequest *subreq);
> +int netfs_advance_write(struct netfs_io_request *wreq,
> +			struct netfs_io_stream *stream,
> +			loff_t start, size_t len, bool to_eof);
> +struct netfs_io_request *new_netfs_begin_writethrough(struct kiocb *iocb, size_t len);
> +int new_netfs_advance_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
> +				   struct folio *folio, size_t copied, bool to_page_end,
> +				   struct folio **writethrough_cache);
> +int new_netfs_end_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
> +			       struct folio *writethrough_cache);
> +int netfs_unbuffered_write(struct netfs_io_request *wreq, bool may_wait, size_t len);
> +
>  /*
>   * Miscellaneous functions.
>   */
> diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
> index 1a4e2ce735ce..c90d482b1650 100644
> --- a/fs/netfs/objects.c
> +++ b/fs/netfs/objects.c
> @@ -47,6 +47,10 @@ struct netfs_io_request *netfs_alloc_request(struct address_space *mapping,
>  	rreq->inode	= inode;
>  	rreq->i_size	= i_size_read(inode);
>  	rreq->debug_id	= atomic_inc_return(&debug_ids);
> +	rreq->wsize	= INT_MAX;
> +	spin_lock_init(&rreq->lock);
> +	INIT_LIST_HEAD(&rreq->io_streams[0].subrequests);
> +	INIT_LIST_HEAD(&rreq->io_streams[1].subrequests);
>  	INIT_LIST_HEAD(&rreq->subrequests);
>  	INIT_WORK(&rreq->work, NULL);
>  	refcount_set(&rreq->ref, 1);
> @@ -85,6 +89,8 @@ void netfs_get_request(struct netfs_io_request *rreq, enum netfs_rreq_ref_trace
>  void netfs_clear_subrequests(struct netfs_io_request *rreq, bool was_async)
>  {
>  	struct netfs_io_subrequest *subreq;
> +	struct netfs_io_stream *stream;
> +	int s;
> 
>  	while (!list_empty(&rreq->subrequests)) {
>  		subreq = list_first_entry(&rreq->subrequests,
> @@ -93,6 +99,17 @@ void netfs_clear_subrequests(struct netfs_io_request *rreq, bool was_async)
>  		netfs_put_subrequest(subreq, was_async,
>  				     netfs_sreq_trace_put_clear);
>  	}
> +
> +	for (s = 0; s < ARRAY_SIZE(rreq->io_streams); s++) {
> +		stream = &rreq->io_streams[s];
> +		while (!list_empty(&stream->subrequests)) {
> +			subreq = list_first_entry(&stream->subrequests,
> +						  struct netfs_io_subrequest, rreq_link);
> +			list_del(&subreq->rreq_link);
> +			netfs_put_subrequest(subreq, was_async,
> +					     netfs_sreq_trace_put_clear);
> +		}
> +	}
>  }
> 
>  static void netfs_free_request_rcu(struct rcu_head *rcu)
> diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c
> new file mode 100644
> index 000000000000..5e2ca8b25af0
> --- /dev/null
> +++ b/fs/netfs/write_collect.c
> @@ -0,0 +1,808 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Network filesystem write subrequest result collection, assessment
> + * and retrying.
> + *
> + * Copyright (C) 2024 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells (dhowells@redhat.com)
> + */
> +
> +#include <linux/export.h>
> +#include <linux/fs.h>
> +#include <linux/mm.h>
> +#include <linux/pagemap.h>
> +#include <linux/slab.h>
> +#include "internal.h"
> +
> +/* Notes made in the collector */
> +#define HIT_PENDING		0x01	/* A front op was still pending */
> +#define SOME_EMPTY		0x02	/* One or more streams are empty */
> +#define ALL_EMPTY		0x04	/* All streams are empty */
> +#define MAYBE_DISCONTIG		0x08	/* A front op may be discontiguous (rounded to PAGE_SIZE) */
> +#define NEED_REASSESS		0x10	/* Need to loop round and reassess */
> +#define REASSESS_DISCONTIG	0x20	/* Reassess discontiguity if contiguity advances */
> +#define MADE_PROGRESS		0x40	/* Made progress cleaning up a stream or the folio set */
> +#define BUFFERED		0x80	/* The pagecache needs cleaning up */
> +#define NEED_RETRY		0x100	/* A front op requests retrying */
> +#define SAW_FAILURE		0x200	/* One stream or hit a permanent failure */
> +
> +/*
> + * Successful completion of write of a folio to the server and/or cache.  Note
> + * that we are not allowed to lock the folio here on pain of deadlocking with
> + * truncate.
> + */
> +int netfs_folio_written_back(struct folio *folio)
> +{
> +	enum netfs_folio_trace why = netfs_folio_trace_clear;
> +	struct netfs_folio *finfo;
> +	struct netfs_group *group = NULL;
> +	int gcount = 0;

Reverse xmas tree order missing in multiple functions.
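
(For reference, "reverse xmas tree" is the convention, common in netdev reviews, of ordering a function's local variable declarations from the longest line to the shortest.  Applied to the declaration block quoted just above, it would read, purely as an illustration:)

	enum netfs_folio_trace why = netfs_folio_trace_clear;
	struct netfs_group *group = NULL;
	struct netfs_folio *finfo;
	int gcount = 0;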

> +
> +	if ((finfo = netfs_folio_info(folio))) {
> +		/* Streaming writes cannot be redirtied whilst under writeback,
> +		 * so discard the streaming record.
> +		 */
> +		folio_detach_private(folio);
> +		group = finfo->netfs_group;
> +		gcount++;
> +		kfree(finfo);
> +		why = netfs_folio_trace_clear_s;
> +		goto end_wb;
> +	}
> +
> +	if ((group = netfs_folio_group(folio))) {
> +		if (group == NETFS_FOLIO_COPY_TO_CACHE) {
> +			why = netfs_folio_trace_clear_cc;
> +			if (group == NETFS_FOLIO_COPY_TO_CACHE)
> +				folio_detach_private(folio);
> +			else
> +				why = netfs_folio_trace_redirtied;
> +			goto end_wb;
> +		}
> +
> +		/* Need to detach the group pointer if the page didn't get
> +		 * redirtied.  If it has been redirtied, then it must be within
> +		 * the same group.
> +		 */
> +		why = netfs_folio_trace_redirtied;
> +		if (!folio_test_dirty(folio)) {
> +			if (!folio_test_dirty(folio)) {
> +				folio_detach_private(folio);
> +				gcount++;
> +				why = netfs_folio_trace_clear_g;
> +			}
> +		}
> +	}
> +
> +end_wb:
> +	trace_netfs_folio(folio, why);
> +	folio_end_writeback(folio);
> +	return gcount;
> +}
> +
> +/*
> + * Get hold of a folio we have under writeback.  We don't want to get the
> + * refcount on it.
> + */
> +static struct folio *netfs_writeback_lookup_folio(struct netfs_io_request *wreq, loff_t pos)
> +{
> +	XA_STATE(xas, &wreq->mapping->i_pages, pos / PAGE_SIZE);
> +	struct folio *folio;
> +
> +	rcu_read_lock();
> +
> +	for (;;) {
> +		xas_reset(&xas);
> +		folio = xas_load(&xas);
> +		if (xas_retry(&xas, folio))
> +			continue;
> +
> +		if (!folio || xa_is_value(folio))
> +			kdebug("R=%08x: folio %lx (%llx) not present",
> +			       wreq->debug_id, xas.xa_index, pos / PAGE_SIZE);
> +		BUG_ON(!folio || xa_is_value(folio));
> +
> +		if (folio == xas_reload(&xas))
> +			break;
> +	}
> +
> +	rcu_read_unlock();
> +
> +	if (WARN_ONCE(!folio_test_writeback(folio),
> +		      "R=%08x: folio %lx is not under writeback\n",
> +		      wreq->debug_id, folio->index)) {
> +		trace_netfs_folio(folio, netfs_folio_trace_not_under_wback);
> +	}
> +	return folio;
> +}
> +
> +/*
> + * Unlock any folios we've finished with.
> + */
> +static void netfs_writeback_unlock_folios(struct netfs_io_request *wreq,
> +					  unsigned long long collected_to,
> +					  unsigned int *notes)
> +{
> +	for (;;) {
> +		struct folio *folio;
> +		struct netfs_folio *finfo;
> +		unsigned long long fpos, fend;
> +		size_t fsize, flen;
> +
> +		folio = netfs_writeback_lookup_folio(wreq, wreq->cleaned_to);
> +
> +		fpos = folio_pos(folio);
> +		fsize = folio_size(folio);
> +		finfo = netfs_folio_info(folio);
> +		flen = finfo ? finfo->dirty_offset + finfo->dirty_len : fsize;
> +
> +		fend = min_t(unsigned long long, fpos + flen, wreq->i_size);
> +
> +		trace_netfs_collect_folio(wreq, folio, fend, collected_to);
> +
> +		if (fpos + fsize > wreq->contiguity) {
> +			trace_netfs_collect_contig(wreq, fpos + fsize,
> +						   netfs_contig_trace_unlock);
> +			wreq->contiguity = fpos + fsize;
> +		}
> +
> +		/* Unlock any folio we've transferred all of. */
> +		if (collected_to < fend)
> +			break;
> +
> +		wreq->nr_group_rel += netfs_folio_written_back(folio);
> +		wreq->cleaned_to = fpos + fsize;
> +		*notes |= MADE_PROGRESS;
> +
> +		if (fpos + fsize >= collected_to)
> +			break;
> +	}
> +}
> +
> +/*
> + * Perform retries on the streams that need it.
> + */
> +static void netfs_retry_write_stream(struct netfs_io_request *wreq,
> +				     struct netfs_io_stream *stream)
> +{
> +	struct list_head *next;
> +
> +	_enter("R=%x[%x:]", wreq->debug_id, stream->stream_nr);
> +
> +	if (unlikely(stream->failed))
> +		return;
> +
> +	/* If there's no renegotiation to do, just resend each failed subreq. */
> +	if (!stream->prepare_write) {
> +		struct netfs_io_subrequest *subreq;
> +
> +		list_for_each_entry(subreq, &stream->subrequests, rreq_link) {
> +			if (test_bit(NETFS_SREQ_FAILED, &subreq->flags))
> +				break;
> +			if (__test_and_clear_bit(NETFS_SREQ_NEED_RETRY,
> &subreq->flags)) {
> +				__set_bit(NETFS_SREQ_RETRYING, &subreq-
> >flags);
> +				netfs_get_subrequest(subreq,
> netfs_sreq_trace_get_resubmit);
> +				netfs_reissue_write(stream, subreq);
> +			}
> +		}
> +		return;
> +	}
> +
> +	if (list_empty(&stream->subrequests))
> +		return;
> +	next = stream->subrequests.next;
> +
> +	do {
> +		struct netfs_io_subrequest *subreq = NULL, *from, *to, *tmp;
> +		unsigned long long start, len;
> +		size_t part;
> +		bool boundary = false;
> +
> +		/* Go through the stream and find the next span of contiguous
> +		 * data that we then rejig (cifs, for example, needs the wsize
> +		 * renegotiating) and reissue.
> +		 */
> +		from = list_entry(next, struct netfs_io_subrequest, rreq_link);
> +		to = from;
> +		start = from->start + from->transferred;
> +		len   = from->len   - from->transferred;
> +
> +		if (test_bit(NETFS_SREQ_FAILED, &from->flags) ||
> +		    !test_bit(NETFS_SREQ_NEED_RETRY, &from->flags))
> +			return;
> +
> +		list_for_each_continue(next, &stream->subrequests) {
> +			subreq = list_entry(next, struct netfs_io_subrequest,
> rreq_link);
> +			if (subreq->start + subreq->transferred != start + len ||
> +			    test_bit(NETFS_SREQ_BOUNDARY, &subreq->flags)
> ||
> +			    !test_bit(NETFS_SREQ_NEED_RETRY, &subreq-
> >flags))
> +				break;
> +			to = subreq;
> +			len += to->len;
> +		}
> +
> +		/* Work through the sublist. */
> +		subreq = from;
> +		list_for_each_entry_from(subreq, &stream->subrequests,
> rreq_link) {
> +			if (!len)
> +				break;
> +			/* Renegotiate max_len (wsize) */
> +			trace_netfs_sreq(subreq, netfs_sreq_trace_retry);
> +			__clear_bit(NETFS_SREQ_NEED_RETRY, &subreq-
> >flags);
> +			__set_bit(NETFS_SREQ_RETRYING, &subreq->flags);
> +			stream->prepare_write(subreq);
> +
> +			part = min(len, subreq->max_len);
> +			subreq->len = part;
> +			subreq->start = start;
> +			subreq->transferred = 0;
> +			len -= part;
> +			start += part;
> +			if (len && subreq == to &&
> +			    __test_and_clear_bit(NETFS_SREQ_BOUNDARY, &to-
> >flags))
> +				boundary = true;
> +
> +			netfs_get_subrequest(subreq,
> netfs_sreq_trace_get_resubmit);
> +			netfs_reissue_write(stream, subreq);
> +			if (subreq == to)
> +				break;
> +		}
> +
> +		/* If we managed to use fewer subreqs, we can discard the
> +		 * excess; if we used the same number, then we're done.
> +		 */
> +		if (!len) {
> +			if (subreq == to)
> +				continue;
> +			list_for_each_entry_safe_from(subreq, tmp,
> +						      &stream->subrequests,
> rreq_link) {
> +				trace_netfs_sreq(subreq,
> netfs_sreq_trace_discard);
> +				list_del(&subreq->rreq_link);
> +				netfs_put_subrequest(subreq, false,
> netfs_sreq_trace_put_done);
> +				if (subreq == to)
> +					break;
> +			}
> +			continue;
> +		}
> +
> +		/* We ran out of subrequests, so we need to allocate some more
> +		 * and insert them after.
> +		 */
> +		do {
> +			subreq = netfs_alloc_subrequest(wreq);
> +			subreq->source		= to->source;
> +			subreq->start		= start;
> +			subreq->max_len		= len;
> +			subreq->max_nr_segs	= INT_MAX;
> +			subreq->debug_index	= atomic_inc_return(&wreq-
> >subreq_counter);
> +			subreq->stream_nr	= to->stream_nr;
> +			__set_bit(NETFS_SREQ_RETRYING, &subreq->flags);
> +
> +			trace_netfs_sreq_ref(wreq->debug_id, subreq-
> >debug_index,
> +					     refcount_read(&subreq->ref),
> +					     netfs_sreq_trace_new);
> +			netfs_get_subrequest(subreq,
> netfs_sreq_trace_get_resubmit);
> +
> +			list_add(&subreq->rreq_link, &to->rreq_link);
> +			to = list_next_entry(to, rreq_link);
> +			trace_netfs_sreq(subreq, netfs_sreq_trace_retry);
> +
> +			switch (stream->source) {
> +			case NETFS_UPLOAD_TO_SERVER:
> +				netfs_stat(&netfs_n_wh_upload);
> +				subreq->max_len = min(len, wreq->wsize);
> +				break;
> +			case NETFS_WRITE_TO_CACHE:
> +				netfs_stat(&netfs_n_wh_write);
> +				break;
> +			default:
> +				WARN_ON_ONCE(1);
> +			}
> +
> +			stream->prepare_write(subreq);
> +
> +			part = min(len, subreq->max_len);
> +			subreq->len = subreq->transferred + part;
> +			len -= part;
> +			start += part;
> +			if (!len && boundary) {
> +				__set_bit(NETFS_SREQ_BOUNDARY, &to-
> >flags);
> +				boundary = false;
> +			}
> +
> +			netfs_reissue_write(stream, subreq);
> +			if (!len)
> +				break;
> +
> +		} while (len);
> +
> +	} while (!list_is_head(next, &stream->subrequests));
> +}
> +
> +/*
> + * Perform retries on the streams that need it.  If we're doing content
> + * encryption and the server copy changed due to a third-party write, we may
> + * need to do an RMW cycle and also rewrite the data to the cache.
> + */
> +static void netfs_retry_writes(struct netfs_io_request *wreq)
> +{
> +	struct netfs_io_subrequest *subreq;
> +	struct netfs_io_stream *stream;
> +	int s;
> +
> +	/* Wait for all outstanding I/O to quiesce before performing retries as
> +	 * we may need to renegotiate the I/O sizes.
> +	 */
> +	for (s = 0; s < NR_IO_STREAMS; s++) {
> +		stream = &wreq->io_streams[s];
> +		if (!stream->active)
> +			continue;
> +
> +		list_for_each_entry(subreq, &stream->subrequests, rreq_link) {
> +			wait_on_bit(&subreq->flags,
> NETFS_SREQ_IN_PROGRESS,
> +				    TASK_UNINTERRUPTIBLE);
> +		}
> +	}
> +
> +	// TODO: Enc: Fetch changed partial pages
> +	// TODO: Enc: Reencrypt content if needed.
> +	// TODO: Enc: Wind back transferred point.
> +	// TODO: Enc: Mark cache pages for retry.
> +
> +	for (s = 0; s < NR_IO_STREAMS; s++) {
> +		stream = &wreq->io_streams[s];
> +		if (stream->need_retry) {
> +			stream->need_retry = false;
> +			netfs_retry_write_stream(wreq, stream);
> +		}
> +	}
> +}
> +
> +/*
> + * Collect and assess the results of various write subrequests.  We may need to
> + * retry some of the results - or even do an RMW cycle for content crypto.
> + *
> + * Note that we have a number of parallel, overlapping lists of subrequests,
> + * one to the server and one to the local cache for example, which may not be
> + * the same size or starting position and may not even correspond in boundary
> + * alignment.
> + */
> +static void netfs_collect_write_results(struct netfs_io_request *wreq)
> +{
> +	struct netfs_io_subrequest *front, *remove;
> +	struct netfs_io_stream *stream;
> +	unsigned long long collected_to;
> +	unsigned int notes;
> +	int s;
> +
> +	_enter("%llx-%llx", wreq->start, wreq->start + wreq->len);
> +	trace_netfs_collect(wreq);
> +	trace_netfs_rreq(wreq, netfs_rreq_trace_collect);
> +
> +reassess_streams:
> +	smp_rmb();
> +	collected_to = ULLONG_MAX;
> +	if (wreq->origin == NETFS_WRITEBACK)
> +		notes = ALL_EMPTY | BUFFERED | MAYBE_DISCONTIG;
> +	else if (wreq->origin == NETFS_WRITETHROUGH)
> +		notes = ALL_EMPTY | BUFFERED;
> +	else
> +		notes = ALL_EMPTY;
> +
> +	/* Remove completed subrequests from the front of the streams and
> +	 * advance the completion point on each stream.  We stop when we hit
> +	 * something that's in progress.  The issuer thread may be adding stuff
> +	 * to the tail whilst we're doing this.
> +	 *
> +	 * We must not, however, merge in discontiguities that span whole
> +	 * folios that aren't under writeback.  This is made more complicated
> +	 * by the folios in the gap being of unpredictable sizes - if they even
> +	 * exist - but we don't want to look them up.
> +	 */
> +	for (s = 0; s < NR_IO_STREAMS; s++) {
> +		loff_t rstart, rend;
> +
> +		stream = &wreq->io_streams[s];
> +		/* Read active flag before list pointers */
> +		if (!smp_load_acquire(&stream->active))
> +			continue;
> +
> +		front = stream->front;
> +		while (front) {
> +			trace_netfs_collect_sreq(wreq, front);
> +			//_debug("sreq [%x] %llx %zx/%zx",
> +			//       front->debug_index, front->start, front->transferred,
> front->len);
> +
> +			/* Stall if there may be a discontinuity. */
> +			rstart = round_down(front->start, PAGE_SIZE);
> +			if (rstart > wreq->contiguity) {
> +				if (wreq->contiguity > stream->collected_to) {
> +					trace_netfs_collect_gap(wreq, stream,
> +								wreq->contiguity,
> 'D');
> +					stream->collected_to = wreq->contiguity;
> +				}
> +				notes |= REASSESS_DISCONTIG;
> +				break;
> +			}
> +			rend = round_up(front->start + front->len, PAGE_SIZE);
> +			if (rend > wreq->contiguity) {
> +				trace_netfs_collect_contig(wreq, rend,
> +
> netfs_contig_trace_collect);
> +				wreq->contiguity = rend;
> +				if (notes & REASSESS_DISCONTIG)
> +					notes |= NEED_REASSESS;
> +			}
> +			notes &= ~MAYBE_DISCONTIG;
> +
> +			/* Stall if the front is still undergoing I/O. */
> +			if (test_bit(NETFS_SREQ_IN_PROGRESS, &front-
> >flags)) {
> +				notes |= HIT_PENDING;
> +				break;
> +			}
> +			smp_rmb(); /* Read counters after I-P flag. */
> +
> +			if (stream->failed) {
> +				stream->collected_to = front->start + front->len;
> +				notes |= MADE_PROGRESS | SAW_FAILURE;
> +				goto cancel;
> +			}
> +			if (front->start + front->transferred > stream-
> >collected_to) {
> +				stream->collected_to = front->start + front-
> >transferred;
> +				stream->transferred = stream->collected_to -
> wreq->start;
> +				notes |= MADE_PROGRESS;
> +			}
> +			if (test_bit(NETFS_SREQ_FAILED, &front->flags)) {
> +				stream->failed = true;
> +				stream->error = front->error;
> +				if (stream->source ==
> NETFS_UPLOAD_TO_SERVER)
> +					mapping_set_error(wreq->mapping, front-
> >error);
> +				notes |= NEED_REASSESS | SAW_FAILURE;
> +				break;
> +			}
> +			if (front->transferred < front->len) {
> +				stream->need_retry = true;
> +				notes |= NEED_RETRY | MADE_PROGRESS;
> +				break;
> +			}
> +
> +		cancel:
> +			/* Remove if completely consumed. */
> +			spin_lock(&wreq->lock);
> +
> +			remove = front;
> +			list_del_init(&front->rreq_link);
> +			front = list_first_entry_or_null(&stream->subrequests,
> +							 struct
> netfs_io_subrequest, rreq_link);
> +			stream->front = front;
> +			if (!front) {
> +				unsigned long long jump_to =
> atomic64_read(&wreq->issued_to);
> +
> +				if (stream->collected_to < jump_to) {
> +					trace_netfs_collect_gap(wreq, stream,
> jump_to, 'A');
> +					stream->collected_to = jump_to;
> +				}
> +			}
> +
> +			spin_unlock(&wreq->lock);
> +			netfs_put_subrequest(remove, false,
> +					     notes & SAW_FAILURE ?
> +					     netfs_sreq_trace_put_cancel :
> +					     netfs_sreq_trace_put_done);
> +		}
> +
> +		if (front)
> +			notes &= ~ALL_EMPTY;
> +		else
> +			notes |= SOME_EMPTY;
> +
> +		if (stream->collected_to < collected_to)
> +			collected_to = stream->collected_to;
> +	}
> +
> +	if (collected_to != ULLONG_MAX && collected_to > wreq->collected_to)
> +		wreq->collected_to = collected_to;
> +
> +	/* If we have an empty stream, we need to jump it forward over any gap
> +	 * otherwise the collection point will never advance.
> +	 *
> +	 * Note that the issuer always adds to the stream with the lowest
> +	 * so-far submitted start, so if we see two consecutive subreqs in one
> +	 * stream with nothing between then in another stream, then the second
> +	 * stream has a gap that can be jumped.
> +	 */
> +	if (notes & SOME_EMPTY) {
> +		unsigned long long jump_to = wreq->start + wreq->len;
> +
> +		for (s = 0; s < NR_IO_STREAMS; s++) {
> +			stream = &wreq->io_streams[s];
> +			if (stream->active &&
> +			    stream->front &&
> +			    stream->front->start < jump_to)
> +				jump_to = stream->front->start;
> +		}
> +
> +		for (s = 0; s < NR_IO_STREAMS; s++) {
> +			stream = &wreq->io_streams[s];
> +			if (stream->active &&
> +			    !stream->front &&
> +			    stream->collected_to < jump_to) {
> +				trace_netfs_collect_gap(wreq, stream, jump_to,
> 'B');
> +				stream->collected_to = jump_to;
> +			}
> +		}
> +	}
> +
> +	for (s = 0; s < NR_IO_STREAMS; s++) {
> +		stream = &wreq->io_streams[s];
> +		if (stream->active)
> +			trace_netfs_collect_stream(wreq, stream);
> +	}
> +
> +	trace_netfs_collect_state(wreq, wreq->collected_to, notes);
> +
> +	/* Unlock any folios that we have now finished with. */
> +	if (notes & BUFFERED) {
> +		unsigned long long clean_to = min(wreq->collected_to, wreq-
> >contiguity);
> +
> +		if (wreq->cleaned_to < clean_to)
> +			netfs_writeback_unlock_folios(wreq, clean_to, &notes);
> +	} else {
> +		wreq->cleaned_to = wreq->collected_to;
> +	}
> +
> +	// TODO: Discard encryption buffers
> +
> +	/* If all streams are discontiguous with the last folio we cleared, we
> +	 * may need to skip a set of folios.
> +	 */
> +	if ((notes & (MAYBE_DISCONTIG | ALL_EMPTY)) ==
> MAYBE_DISCONTIG) {
> +		unsigned long long jump_to = ULLONG_MAX;
> +
> +		for (s = 0; s < NR_IO_STREAMS; s++) {
> +			stream = &wreq->io_streams[s];
> +			if (stream->active && stream->front &&
> +			    stream->front->start < jump_to)
> +				jump_to = stream->front->start;
> +		}
> +
> +		trace_netfs_collect_contig(wreq, jump_to,
> netfs_contig_trace_jump);
> +		wreq->contiguity = jump_to;
> +		wreq->cleaned_to = jump_to;
> +		wreq->collected_to = jump_to;
> +		for (s = 0; s < NR_IO_STREAMS; s++) {
> +			stream = &wreq->io_streams[s];
> +			if (stream->collected_to < jump_to)
> +				stream->collected_to = jump_to;
> +		}
> +		//cond_resched();
> +		notes |= MADE_PROGRESS;
> +		goto reassess_streams;
> +	}
> +
> +	if (notes & NEED_RETRY)
> +		goto need_retry;
> +	if ((notes & MADE_PROGRESS) && test_bit(NETFS_RREQ_PAUSE,
> &wreq->flags)) {
> +		trace_netfs_rreq(wreq, netfs_rreq_trace_unpause);
> +		clear_bit_unlock(NETFS_RREQ_PAUSE, &wreq->flags);
> +		wake_up_bit(&wreq->flags, NETFS_RREQ_PAUSE);
> +	}
> +
> +	if (notes & NEED_REASSESS) {
> +		//cond_resched();
> +		goto reassess_streams;
> +	}
> +	if (notes & MADE_PROGRESS) {
> +		//cond_resched();
> +		goto reassess_streams;
> +	}
> +
> +out:
> +	netfs_put_group_many(wreq->group, wreq->nr_group_rel);
> +	wreq->nr_group_rel = 0;
> +	_leave(" = %x", notes);
> +	return;
> +
> +need_retry:
> +	/* Okay...  We're going to have to retry one or both streams.  Note
> +	 * that any partially completed op will have had any wholly transferred
> +	 * folios removed from it.
> +	 */
> +	_debug("retry");
> +	netfs_retry_writes(wreq);
> +	goto out;
> +}
> +
> +/*
> + * Perform the collection of subrequests, folios and encryption buffers.
> + */
> +void netfs_write_collection_worker(struct work_struct *work)
> +{
> +	struct netfs_io_request *wreq = container_of(work, struct
> netfs_io_request, work);
> +	struct netfs_inode *ictx = netfs_inode(wreq->inode);
> +	size_t transferred;
> +	int s;
> +
> +	_enter("R=%x", wreq->debug_id);
> +
> +	netfs_see_request(wreq, netfs_rreq_trace_see_work);
> +	if (!test_bit(NETFS_RREQ_IN_PROGRESS, &wreq->flags)) {
> +		netfs_put_request(wreq, false, netfs_rreq_trace_put_work);
> +		return;
> +	}
> +
> +	netfs_collect_write_results(wreq);
> +
> +	/* We're done when the app thread has finished posting subreqs and all
> +	 * the queues in all the streams are empty.
> +	 */
> +	if (!test_bit(NETFS_RREQ_ALL_QUEUED, &wreq->flags)) {
> +		netfs_put_request(wreq, false, netfs_rreq_trace_put_work);
> +		return;
> +	}
> +	smp_rmb(); /* Read ALL_QUEUED before lists. */
> +
> +	transferred = LONG_MAX;
> +	for (s = 0; s < NR_IO_STREAMS; s++) {
> +		struct netfs_io_stream *stream = &wreq->io_streams[s];
> +		if (!stream->active)
> +			continue;
> +		if (!list_empty(&stream->subrequests)) {
> +			netfs_put_request(wreq, false,
> netfs_rreq_trace_put_work);
> +			return;
> +		}
> +		if (stream->transferred < transferred)
> +			transferred = stream->transferred;
> +	}
> +
> +	/* Okay, declare that all I/O is complete. */
> +	wreq->transferred = transferred;
> +	trace_netfs_rreq(wreq, netfs_rreq_trace_write_done);
> +
> +	if (wreq->io_streams[1].active &&
> +	    wreq->io_streams[1].failed) {
> +		/* Cache write failure doesn't prevent writeback completion
> +		 * unless we're in disconnected mode.
> +		 */
> +		ictx->ops->invalidate_cache(wreq);
> +	}
> +
> +	if (wreq->cleanup)
> +		wreq->cleanup(wreq);
> +
> +	if (wreq->origin == NETFS_DIO_WRITE &&
> +	    wreq->mapping->nrpages) {
> +		/* mmap may have got underfoot and we may now have folios
> +		 * locally covering the region we just wrote.  Attempt to
> +		 * discard the folios, but leave in place any modified locally.
> +		 * ->write_iter() is prevented from interfering by the DIO
> +		 * counter.
> +		 */
> +		pgoff_t first = wreq->start >> PAGE_SHIFT;
> +		pgoff_t last = (wreq->start + wreq->transferred - 1) >> PAGE_SHIFT;
> +		invalidate_inode_pages2_range(wreq->mapping, first, last);
> +	}
> +
> +	if (wreq->origin == NETFS_DIO_WRITE)
> +		inode_dio_end(wreq->inode);
> +
> +	_debug("finished");
> +	trace_netfs_rreq(wreq, netfs_rreq_trace_wake_ip);
> +	clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &wreq->flags);
> +	wake_up_bit(&wreq->flags, NETFS_RREQ_IN_PROGRESS);
> +
> +	if (wreq->iocb) {
> +		wreq->iocb->ki_pos += wreq->transferred;
> +		if (wreq->iocb->ki_complete)
> +			wreq->iocb->ki_complete(
> +				wreq->iocb, wreq->error ? wreq->error : wreq->transferred);
> +		wreq->iocb = VFS_PTR_POISON;
> +	}
> +
> +	netfs_clear_subrequests(wreq, false);
> +	netfs_put_request(wreq, false, netfs_rreq_trace_put_work_complete);
> +}
> +
> +/*
> + * Wake the collection work item.
> + */
> +void netfs_wake_write_collector(struct netfs_io_request *wreq, bool was_async)
> +{
> +	if (!work_pending(&wreq->work)) {
> +		netfs_get_request(wreq, netfs_rreq_trace_get_work);
> +		if (!queue_work(system_unbound_wq, &wreq->work))
> +			netfs_put_request(wreq, was_async, netfs_rreq_trace_put_work_nq);
> +	}
> +}
> +
> +/**
> + * new_netfs_write_subrequest_terminated - Note the termination of a write operation.
> + * @_op: The I/O request that has terminated.
> + * @transferred_or_error: The amount of data transferred or an error code.
> + * @was_async: The termination was asynchronous
> + *
> + * This tells the library that a contributory write I/O operation has
> + * terminated, one way or another, and that it should collect the results.
> + *
> + * The caller indicates in @transferred_or_error the outcome of the operation,
> + * supplying a positive value to indicate the number of bytes transferred or a
> + * negative error code.  The library will look after reissuing I/O operations
> + * as appropriate and writing downloaded data to the cache.
> + *
> + * If @was_async is true, the caller might be running in softirq or interrupt
> + * context and we can't sleep.
> + *
> + * When this is called, ownership of the subrequest is transferred back to the
> + * library, along with a ref.
> + *
> + * Note that %_op is a void* so that the function can be passed to
> + * kiocb::term_func without the need for a casting wrapper.
> + */
> +void new_netfs_write_subrequest_terminated(void *_op, ssize_t transferred_or_error,
> +					   bool was_async)
> +{
> +	struct netfs_io_subrequest *subreq = _op;
> +	struct netfs_io_request *wreq = subreq->rreq;
> +	struct netfs_io_stream *stream = &wreq->io_streams[subreq->stream_nr];
> +
> +	_enter("%x[%x] %zd", wreq->debug_id, subreq->debug_index, transferred_or_error);
> +
> +	switch (subreq->source) {
> +	case NETFS_UPLOAD_TO_SERVER:
> +		netfs_stat(&netfs_n_wh_upload_done);
> +		break;
> +	case NETFS_WRITE_TO_CACHE:
> +		netfs_stat(&netfs_n_wh_write_done);
> +		break;
> +	case NETFS_INVALID_WRITE:
> +		break;
> +	default:
> +		BUG();
> +	}
> +
> +	if (IS_ERR_VALUE(transferred_or_error)) {
> +		subreq->error = transferred_or_error;
> +		if (subreq->error == -EAGAIN)
> +			set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
> +		else
> +			set_bit(NETFS_SREQ_FAILED, &subreq->flags);
> +		trace_netfs_failure(wreq, subreq, transferred_or_error, netfs_fail_write);
> +
> +		switch (subreq->source) {
> +		case NETFS_WRITE_TO_CACHE:
> +			netfs_stat(&netfs_n_wh_write_failed);
> +			break;
> +		case NETFS_UPLOAD_TO_SERVER:
> +			netfs_stat(&netfs_n_wh_upload_failed);
> +			break;
> +		default:
> +			break;
> +		}
> +		trace_netfs_rreq(wreq, netfs_rreq_trace_set_pause);
> +		set_bit(NETFS_RREQ_PAUSE, &wreq->flags);
> +	} else {
> +		if (WARN(transferred_or_error > subreq->len - subreq->transferred,
> +			 "Subreq excess write: R=%x[%x] %zd > %zu - %zu",
> +			 wreq->debug_id, subreq->debug_index,
> +			 transferred_or_error, subreq->len, subreq->transferred))
> +			transferred_or_error = subreq->len - subreq->transferred;
> +
> +		subreq->error = 0;
> +		subreq->transferred += transferred_or_error;
> +
> +		if (subreq->transferred < subreq->len)
> +			set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
> +	}
> +
> +	trace_netfs_sreq(subreq, netfs_sreq_trace_terminated);
> +
> +	clear_bit_unlock(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
> +	wake_up_bit(&subreq->flags, NETFS_SREQ_IN_PROGRESS);
> +
> +	/* If we are at the head of the queue, wake up the collector,
> +	 * transferring a ref to it if we were the ones to do so.
> +	 */
> +	if (list_is_first(&subreq->rreq_link, &stream->subrequests))
> +		netfs_wake_write_collector(wreq, was_async);
> +
> +	netfs_put_subrequest(subreq, was_async, netfs_sreq_trace_put_terminated);
> +}
> +EXPORT_SYMBOL(new_netfs_write_subrequest_terminated);
> diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c
> new file mode 100644
> index 000000000000..e0fb472898f5
> --- /dev/null
> +++ b/fs/netfs/write_issue.c
> @@ -0,0 +1,673 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Network filesystem high-level (buffered) writeback.
> + *
> + * Copyright (C) 2024 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells (dhowells@redhat.com)
> + *
> + *
> + * To support network filesystems with local caching, we manage a situation
> + * that can be envisioned like the following:
> + *
> + *               +---+---+-----+-----+---+----------+
> + *    Folios:    |   |   |     |     |   |          |
> + *               +---+---+-----+-----+---+----------+
> + *
> + *                 +------+------+     +----+----+
> + *    Upload:      |      |      |.....|    |    |
> + *  (Stream 0)     +------+------+     +----+----+
> + *
> + *               +------+------+------+------+------+
> + *    Cache:     |      |      |      |      |      |
> + *  (Stream 1)   +------+------+------+------+------+
> + *
> + * Where we have a sequence of folios of varying sizes that we need to overlay
> + * with multiple parallel streams of I/O requests, where the I/O requests in a
> + * stream may also be of various sizes (in cifs, for example, the sizes are
> + * negotiated with the server; in something like ceph, they may represent the
> + * sizes of storage objects).
> + *
> + * The sequence in each stream may contain gaps and noncontiguous subrequests
> + * may be glued together into single vectored write RPCs.
> + */
> +
> +#include <linux/export.h>
> +#include <linux/fs.h>
> +#include <linux/mm.h>
> +#include <linux/pagemap.h>
> +#include "internal.h"
> +
> +/*
> + * Kill all dirty folios in the event of an unrecoverable error, starting with
> + * a locked folio we've already obtained from writeback_iter().
> + */
> +static void netfs_kill_dirty_pages(struct address_space *mapping,
> +				   struct writeback_control *wbc,
> +				   struct folio *folio)
> +{
> +	int error = 0;
> +
> +	do {
> +		enum netfs_folio_trace why = netfs_folio_trace_kill;
> +		struct netfs_group *group = NULL;
> +		struct netfs_folio *finfo = NULL;
> +		void *priv;
> +
> +		priv = folio_detach_private(folio);
> +		if (priv) {
> +			finfo = __netfs_folio_info(priv);
> +			if (finfo) {
> +				/* Kill folio from streaming write. */
> +				group = finfo->netfs_group;
> +				why = netfs_folio_trace_kill_s;
> +			} else {
> +				group = priv;
> +				if (group == NETFS_FOLIO_COPY_TO_CACHE) {
> +					/* Kill copy-to-cache folio */
> +					why = netfs_folio_trace_kill_cc;
> +					group = NULL;
> +				} else {
> +					/* Kill folio with group */
> +					why = netfs_folio_trace_kill_g;
> +				}
> +			}
> +		}
> +
> +		trace_netfs_folio(folio, why);
> +
> +		folio_start_writeback(folio);
> +		folio_unlock(folio);
> +		folio_end_writeback(folio);
> +
> +		netfs_put_group(group);
> +		kfree(finfo);
> +
> +	} while ((folio = writeback_iter(mapping, wbc, folio, &error)));
> +}
> +
> +/*
> + * Create a write request and set it up appropriately for the origin type.
> + */
> +struct netfs_io_request *netfs_create_write_req(struct address_space *mapping,
> +						struct file *file,
> +						loff_t start,
> +						enum netfs_io_origin origin)
> +{
> +	struct netfs_io_request *wreq;
> +	struct netfs_inode *ictx;
> +
> +	wreq = netfs_alloc_request(mapping, file, start, 0, origin);
> +	if (IS_ERR(wreq))
> +		return wreq;
> +
> +	_enter("R=%x", wreq->debug_id);
> +
> +	ictx = netfs_inode(wreq->inode);
> +	if (test_bit(NETFS_RREQ_WRITE_TO_CACHE, &wreq->flags))
> +		fscache_begin_write_operation(&wreq->cache_resources, netfs_i_cookie(ictx));
> +
> +	wreq->contiguity = wreq->start;
> +	wreq->cleaned_to = wreq->start;
> +	INIT_WORK(&wreq->work, netfs_write_collection_worker);
> +
> +	wreq->io_streams[0].stream_nr		= 0;
> +	wreq->io_streams[0].source		= NETFS_UPLOAD_TO_SERVER;
> +	wreq->io_streams[0].prepare_write	= ictx->ops->prepare_write;
> +	wreq->io_streams[0].issue_write		= ictx->ops->issue_write;
> +	wreq->io_streams[0].collected_to	= start;
> +	wreq->io_streams[0].transferred		= LONG_MAX;
> +
> +	wreq->io_streams[1].stream_nr		= 1;
> +	wreq->io_streams[1].source		= NETFS_WRITE_TO_CACHE;
> +	wreq->io_streams[1].collected_to	= start;
> +	wreq->io_streams[1].transferred		= LONG_MAX;
> +	if (fscache_resources_valid(&wreq->cache_resources)) {
> +		wreq->io_streams[1].avail	= true;
> +		wreq->io_streams[1].prepare_write = wreq->cache_resources.ops->prepare_write_subreq;
> +		wreq->io_streams[1].issue_write = wreq->cache_resources.ops->issue_write;
> +	}
> +
> +	return wreq;
> +}
> +
> +/**
> + * netfs_prepare_write_failed - Note write preparation failed
> + * @subreq: The subrequest to mark
> + *
> + * Mark a subrequest to note that preparation for write failed.
> + */
> +void netfs_prepare_write_failed(struct netfs_io_subrequest *subreq)
> +{
> +	__set_bit(NETFS_SREQ_FAILED, &subreq->flags);
> +	trace_netfs_sreq(subreq, netfs_sreq_trace_prep_failed);
> +}
> +EXPORT_SYMBOL(netfs_prepare_write_failed);
> +
> +/*
> + * Prepare a write subrequest.  We need to allocate a new subrequest
> + * if we don't have one.
> + */
> +static void netfs_prepare_write(struct netfs_io_request *wreq,
> +				struct netfs_io_stream *stream,
> +				loff_t start)
> +{
> +	struct netfs_io_subrequest *subreq;
> +
> +	subreq = netfs_alloc_subrequest(wreq);
> +	subreq->source		= stream->source;
> +	subreq->start		= start;
> +	subreq->max_len		= ULONG_MAX;
> +	subreq->max_nr_segs	= INT_MAX;
> +	subreq->stream_nr	= stream->stream_nr;
> +
> +	_enter("R=%x[%x]", wreq->debug_id, subreq->debug_index);
> +
> +	trace_netfs_sreq_ref(wreq->debug_id, subreq->debug_index,
> +			     refcount_read(&subreq->ref),
> +			     netfs_sreq_trace_new);
> +
> +	trace_netfs_sreq(subreq, netfs_sreq_trace_prepare);
> +
> +	switch (stream->source) {
> +	case NETFS_UPLOAD_TO_SERVER:
> +		netfs_stat(&netfs_n_wh_upload);
> +		subreq->max_len = wreq->wsize;
> +		break;
> +	case NETFS_WRITE_TO_CACHE:
> +		netfs_stat(&netfs_n_wh_write);
> +		break;
> +	default:
> +		WARN_ON_ONCE(1);
> +		break;
> +	}
> +
> +	if (stream->prepare_write)
> +		stream->prepare_write(subreq);
> +
> +	__set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
> +
> +	/* We add to the end of the list whilst the collector may be walking
> +	 * the list.  The collector only goes nextwards and uses the lock to
> +	 * remove entries off of the front.
> +	 */
> +	spin_lock(&wreq->lock);
> +	list_add_tail(&subreq->rreq_link, &stream->subrequests);
> +	if (list_is_first(&subreq->rreq_link, &stream->subrequests)) {
> +		stream->front = subreq;
> +		if (!stream->active) {
> +			stream->collected_to = stream->front->start;
> +			/* Write list pointers before active flag */
> +			smp_store_release(&stream->active, true);
> +		}
> +	}
> +
> +	spin_unlock(&wreq->lock);
> +
> +	stream->construct = subreq;
> +}
> +
> +/*
> + * Set the I/O iterator for the filesystem/cache to use and dispatch the I/O
> + * operation.  The operation may be asynchronous and should call
> + * netfs_write_subrequest_terminated() when complete.
> + */
> +static void netfs_do_issue_write(struct netfs_io_stream *stream,
> +				 struct netfs_io_subrequest *subreq)
> +{
> +	struct netfs_io_request *wreq = subreq->rreq;
> +
> +	_enter("R=%x[%x],%zx", wreq->debug_id, subreq->debug_index, subreq->len);
> +
> +	if (test_bit(NETFS_SREQ_FAILED, &subreq->flags))
> +		return netfs_write_subrequest_terminated(subreq, subreq->error, false);
> +
> +	// TODO: Use encrypted buffer
> +	if (test_bit(NETFS_RREQ_USE_IO_ITER, &wreq->flags)) {
> +		subreq->io_iter = wreq->io_iter;
> +		iov_iter_advance(&subreq->io_iter,
> +				 subreq->start + subreq->transferred - wreq->start);
> +		iov_iter_truncate(&subreq->io_iter,
> +				 subreq->len - subreq->transferred);
> +	} else {
> +		iov_iter_xarray(&subreq->io_iter, ITER_SOURCE, &wreq->mapping->i_pages,
> +				subreq->start + subreq->transferred,
> +				subreq->len   - subreq->transferred);
> +	}
> +
> +	trace_netfs_sreq(subreq, netfs_sreq_trace_submit);
> +	stream->issue_write(subreq);
> +}
> +
> +void netfs_reissue_write(struct netfs_io_stream *stream,
> +			 struct netfs_io_subrequest *subreq)
> +{
> +	__set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
> +	netfs_do_issue_write(stream, subreq);
> +}
> +
> +static void netfs_issue_write(struct netfs_io_request *wreq,
> +			      struct netfs_io_stream *stream)
> +{
> +	struct netfs_io_subrequest *subreq = stream->construct;
> +
> +	if (!subreq)
> +		return;
> +	stream->construct = NULL;
> +
> +	if (subreq->start + subreq->len > wreq->start + wreq->submitted)
> +		wreq->len = wreq->submitted = subreq->start + subreq->len - wreq->start;
> +	netfs_do_issue_write(stream, subreq);
> +}
> +
> +/*
> + * Add data to the write subrequest, dispatching each as we fill it up or if it
> + * is discontiguous with the previous.  We only fill one part at a time so that
> + * we can avoid overrunning the credits obtained (cifs) and try to parallelise
> + * content-crypto preparation with network writes.
> + */
> +int netfs_advance_write(struct netfs_io_request *wreq,
> +			struct netfs_io_stream *stream,
> +			loff_t start, size_t len, bool to_eof)
> +{
> +	struct netfs_io_subrequest *subreq = stream->construct;
> +	size_t part;
> +
> +	if (!stream->avail) {
> +		_leave("no write");
> +		return len;
> +	}
> +
> +	_enter("R=%x[%x]", wreq->debug_id, subreq ? subreq->debug_index : 0);
> +
> +	if (subreq && start != subreq->start + subreq->len) {
> +		netfs_issue_write(wreq, stream);
> +		subreq = NULL;
> +	}
> +
> +	if (!stream->construct)
> +		netfs_prepare_write(wreq, stream, start);
> +	subreq = stream->construct;
> +
> +	part = min(subreq->max_len - subreq->len, len);
> +	_debug("part %zx/%zx %zx/%zx", subreq->len, subreq->max_len, part, len);
> +	subreq->len += part;
> +	subreq->nr_segs++;
> +
> +	if (subreq->len >= subreq->max_len ||
> +	    subreq->nr_segs >= subreq->max_nr_segs ||
> +	    to_eof) {
> +		netfs_issue_write(wreq, stream);
> +		subreq = NULL;
> +	}
> +
> +	return part;
> +}
> +
> +/*
> + * Write some of a pending folio's data back to the server.
> + */
> +static int netfs_write_folio(struct netfs_io_request *wreq,
> +			     struct writeback_control *wbc,
> +			     struct folio *folio)
> +{
> +	struct netfs_io_stream *upload = &wreq->io_streams[0];
> +	struct netfs_io_stream *cache  = &wreq->io_streams[1];
> +	struct netfs_io_stream *stream;
> +	struct netfs_group *fgroup; /* TODO: Use this with ceph */
> +	struct netfs_folio *finfo;
> +	size_t fsize = folio_size(folio), flen = fsize, foff = 0;
> +	loff_t fpos = folio_pos(folio);
> +	bool to_eof = false, streamw = false;
> +	bool debug = false;
> +
> +	_enter("");
> +
> +	if (fpos >= wreq->i_size) {
> +		/* mmap beyond eof. */
> +		_debug("beyond eof");
> +		folio_start_writeback(folio);
> +		folio_unlock(folio);
> +		wreq->nr_group_rel += netfs_folio_written_back(folio);
> +		netfs_put_group_many(wreq->group, wreq->nr_group_rel);
> +		wreq->nr_group_rel = 0;
> +		return 0;
> +	}
> +
> +	fgroup = netfs_folio_group(folio);
> +	finfo = netfs_folio_info(folio);
> +	if (finfo) {
> +		foff = finfo->dirty_offset;
> +		flen = foff + finfo->dirty_len;
> +		streamw = true;
> +	}
> +
> +	if (wreq->origin == NETFS_WRITETHROUGH) {
> +		to_eof = false;
> +		if (flen > wreq->i_size - fpos)
> +			flen = wreq->i_size - fpos;
> +	} else if (flen > wreq->i_size - fpos) {
> +		flen = wreq->i_size - fpos;
> +		if (!streamw)
> +			folio_zero_segment(folio, flen, fsize);
> +		to_eof = true;
> +	} else if (flen == wreq->i_size - fpos) {
> +		to_eof = true;
> +	}
> +	flen -= foff;
> +
> +	_debug("folio %zx %zx %zx", foff, flen, fsize);
> +
> +	/* Deal with discontinuities in the stream of dirty pages.  These can
> +	 * arise from a number of sources:
> +	 *
> +	 * (1) Intervening non-dirty pages from random-access writes, multiple
> +	 *     flushers writing back different parts simultaneously and manual
> +	 *     syncing.
> +	 *
> +	 * (2) Partially-written pages from write-streaming.
> +	 *
> +	 * (3) Pages that belong to a different write-back group (eg.  Ceph
> +	 *     snapshots).
> +	 *
> +	 * (4) Actually-clean pages that were marked for write to the cache
> +	 *     when they were read.  Note that these appear as a special
> +	 *     write-back group.
> +	 */
> +	if (fgroup == NETFS_FOLIO_COPY_TO_CACHE) {
> +		netfs_issue_write(wreq, upload);
> +	} else if (fgroup != wreq->group) {
> +		/* We can't write this page to the server yet. */
> +		kdebug("wrong group");
> +		folio_redirty_for_writepage(wbc, folio);
> +		folio_unlock(folio);
> +		netfs_issue_write(wreq, upload);
> +		netfs_issue_write(wreq, cache);
> +		return 0;
> +	}
> +
> +	if (foff > 0)
> +		netfs_issue_write(wreq, upload);
> +	if (streamw)
> +		netfs_issue_write(wreq, cache);
> +
> +	/* Flip the page to the writeback state and unlock.  If we're called
> +	 * from write-through, then the page has already been put into the wb
> +	 * state.
> +	 */
> +	if (wreq->origin == NETFS_WRITEBACK)
> +		folio_start_writeback(folio);
> +	folio_unlock(folio);
> +
> +	if (fgroup == NETFS_FOLIO_COPY_TO_CACHE) {
> +		if (!fscache_resources_valid(&wreq->cache_resources)) {
> +			trace_netfs_folio(folio, netfs_folio_trace_cancel_copy);
> +			netfs_issue_write(wreq, upload);
> +			netfs_folio_written_back(folio);
> +			return 0;
> +		}
> +		trace_netfs_folio(folio, netfs_folio_trace_store_copy);
> +	} else if (!upload->construct) {
> +		trace_netfs_folio(folio, netfs_folio_trace_store);
> +	} else {
> +		trace_netfs_folio(folio, netfs_folio_trace_store_plus);
> +	}
> +
> +	/* Move the submission point forward to allow for write-streaming data
> +	 * not starting at the front of the page.  We don't do write-streaming
> +	 * with the cache as the cache requires DIO alignment.
> +	 *
> +	 * Also skip uploading for data that's been read and just needs copying
> +	 * to the cache.
> +	 */
> +	for (int s = 0; s < NR_IO_STREAMS; s++) {
> +		stream = &wreq->io_streams[s];
> +		stream->submit_max_len = fsize;
> +		stream->submit_off = foff;
> +		stream->submit_len = flen;
> +		if ((stream->source == NETFS_WRITE_TO_CACHE && streamw) ||
> +		    (stream->source == NETFS_UPLOAD_TO_SERVER &&
> +		     fgroup == NETFS_FOLIO_COPY_TO_CACHE)) {
> +			stream->submit_off = UINT_MAX;
> +			stream->submit_len = 0;
> +			stream->submit_max_len = 0;
> +		}
> +	}
> +
> +	/* Attach the folio to one or more subrequests.  For a big folio, we
> +	 * could end up with thousands of subrequests if the wsize is small -
> +	 * but we might need to wait during the creation of subrequests for
> +	 * network resources (eg. SMB credits).
> +	 */
> +	for (;;) {
> +		ssize_t part;
> +		size_t lowest_off = ULONG_MAX;
> +		int choose_s = -1;
> +
> +		/* Always add to the lowest-submitted stream first. */
> +		for (int s = 0; s < NR_IO_STREAMS; s++) {
> +			stream = &wreq->io_streams[s];
> +			if (stream->submit_len > 0 &&
> +			    stream->submit_off < lowest_off) {
> +				lowest_off = stream->submit_off;
> +				choose_s = s;
> +			}
> +		}
> +
> +		if (choose_s < 0)
> +			break;
> +		stream = &wreq->io_streams[choose_s];
> +
> +		part = netfs_advance_write(wreq, stream, fpos + stream->submit_off,
> +					   stream->submit_len, to_eof);
> +		atomic64_set(&wreq->issued_to, fpos + stream->submit_off);
> +		stream->submit_off += part;
> +		stream->submit_max_len -= part;
> +		if (part > stream->submit_len)
> +			stream->submit_len = 0;
> +		else
> +			stream->submit_len -= part;
> +		if (part > 0)
> +			debug = true;
> +	}
> +
> +	atomic64_set(&wreq->issued_to, fpos + fsize);
> +
> +	if (!debug)
> +		kdebug("R=%x: No submit", wreq->debug_id);
> +
> +	if (flen < fsize)
> +		for (int s = 0; s < NR_IO_STREAMS; s++)
> +			netfs_issue_write(wreq, &wreq->io_streams[s]);
> +
> +	_leave(" = 0");
> +	return 0;
> +}
> +
> +/*
> + * Write some of the pending data back to the server
> + */
> +int new_netfs_writepages(struct address_space *mapping,
> +			 struct writeback_control *wbc)
> +{
> +	struct netfs_inode *ictx = netfs_inode(mapping->host);
> +	struct netfs_io_request *wreq = NULL;
> +	struct folio *folio;
> +	int error = 0;
> +
> +	if (wbc->sync_mode == WB_SYNC_ALL)
> +		mutex_lock(&ictx->wb_lock);
> +	else if (!mutex_trylock(&ictx->wb_lock))
> +		return 0;
> +
> +	/* Need the first folio to be able to set up the op. */
> +	folio = writeback_iter(mapping, wbc, NULL, &error);
> +	if (!folio)
> +		goto out;
> +
> +	wreq = netfs_create_write_req(mapping, NULL, folio_pos(folio), NETFS_WRITEBACK);
> +	if (IS_ERR(wreq)) {
> +		error = PTR_ERR(wreq);
> +		goto couldnt_start;
> +	}
> +
> +	trace_netfs_write(wreq, netfs_write_trace_writeback);
> +	netfs_stat(&netfs_n_wh_writepages);
> +
> +	do {
> +		_debug("wbiter %lx %llx", folio->index, wreq->start + wreq->submitted);
> +
> +		/* It appears we don't have to handle cyclic writeback wrapping. */
> +		WARN_ON_ONCE(wreq && folio_pos(folio) < wreq->start + wreq->submitted);
> +
> +		if (netfs_folio_group(folio) != NETFS_FOLIO_COPY_TO_CACHE &&
> +		    unlikely(!test_bit(NETFS_RREQ_UPLOAD_TO_SERVER, &wreq->flags))) {
> +			set_bit(NETFS_RREQ_UPLOAD_TO_SERVER, &wreq->flags);
> +			wreq->netfs_ops->begin_writeback(wreq);
> +		}
> +
> +		error = netfs_write_folio(wreq, wbc, folio);
> +		if (error < 0)
> +			break;
> +	} while ((folio = writeback_iter(mapping, wbc, folio, &error)));
> +
> +	for (int s = 0; s < NR_IO_STREAMS; s++)
> +		netfs_issue_write(wreq, &wreq->io_streams[s]);
> +	smp_wmb(); /* Write lists before ALL_QUEUED. */
> +	set_bit(NETFS_RREQ_ALL_QUEUED, &wreq->flags);
> +
> +	mutex_unlock(&ictx->wb_lock);
> +
> +	netfs_put_request(wreq, false, netfs_rreq_trace_put_return);
> +	_leave(" = %d", error);
> +	return error;
> +
> +couldnt_start:
> +	netfs_kill_dirty_pages(mapping, wbc, folio);
> +out:
> +	mutex_unlock(&ictx->wb_lock);
> +	_leave(" = %d", error);
> +	return error;
> +}
> +EXPORT_SYMBOL(new_netfs_writepages);
> +
> +/*
> + * Begin a write operation for writing through the pagecache.
> + */
> +struct netfs_io_request *new_netfs_begin_writethrough(struct kiocb *iocb, size_t len)
> +{
> +	struct netfs_io_request *wreq = NULL;
> +	struct netfs_inode *ictx = netfs_inode(file_inode(iocb->ki_filp));
> +
> +	mutex_lock(&ictx->wb_lock);
> +
> +	wreq = netfs_create_write_req(iocb->ki_filp->f_mapping, iocb->ki_filp,
> +				      iocb->ki_pos, NETFS_WRITETHROUGH);
> +	if (IS_ERR(wreq))
> +		mutex_unlock(&ictx->wb_lock);
> +
> +	wreq->io_streams[0].avail = true;
> +	trace_netfs_write(wreq, netfs_write_trace_writethrough);

The IS_ERR() path unlocks the mutex but doesn't return, so execution falls through and dereferences wreq below; a return after the mutex_unlock() looks to be missing.
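
Something along these lines, perhaps (untested sketch):

	if (IS_ERR(wreq)) {
		mutex_unlock(&ictx->wb_lock);
		return wreq;
	}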

Thanks,
Naveen

> +	return wreq;
> +}
> +
> +/*
> + * Advance the state of the write operation used when writing through the
> + * pagecache.  Data has been copied into the pagecache that we need to append
> + * to the request.  If we've added more than wsize then we need to create a new
> + * subrequest.
> + */
> +int new_netfs_advance_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
> +				   struct folio *folio, size_t copied, bool to_page_end,
> +				   struct folio **writethrough_cache)
> +{
> +	_enter("R=%x ic=%zu ws=%u cp=%zu tp=%u",
> +	       wreq->debug_id, wreq->iter.count, wreq->wsize, copied, to_page_end);
> +
> +	if (!*writethrough_cache) {
> +		if (folio_test_dirty(folio))
> +			/* Sigh.  mmap. */
> +			folio_clear_dirty_for_io(folio);
> +
> +		/* We can make multiple writes to the folio... */
> +		folio_start_writeback(folio);
> +		if (wreq->len == 0)
> +			trace_netfs_folio(folio, netfs_folio_trace_wthru);
> +		else
> +			trace_netfs_folio(folio, netfs_folio_trace_wthru_plus);
> +		*writethrough_cache = folio;
> +	}
> +
> +	wreq->len += copied;
> +	if (!to_page_end)
> +		return 0;
> +
> +	*writethrough_cache = NULL;
> +	return netfs_write_folio(wreq, wbc, folio);
> +}
> +
> +/*
> + * End a write operation used when writing through the pagecache.
> + */
> +int new_netfs_end_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
> +			       struct folio *writethrough_cache)
> +{
> +	struct netfs_inode *ictx = netfs_inode(wreq->inode);
> +	int ret;
> +
> +	_enter("R=%x", wreq->debug_id);
> +
> +	if (writethrough_cache)
> +		netfs_write_folio(wreq, wbc, writethrough_cache);
> +
> +	netfs_issue_write(wreq, &wreq->io_streams[0]);
> +	netfs_issue_write(wreq, &wreq->io_streams[1]);
> +	smp_wmb(); /* Write lists before ALL_QUEUED. */
> +	set_bit(NETFS_RREQ_ALL_QUEUED, &wreq->flags);
> +
> +	mutex_unlock(&ictx->wb_lock);
> +
> +	ret = wreq->error;
> +	netfs_put_request(wreq, false, netfs_rreq_trace_put_return);
> +	return ret;
> +}
> +
> +/*
> + * Write data to the server without going through the pagecache and without
> + * writing it to the local cache.
> + */
> +int netfs_unbuffered_write(struct netfs_io_request *wreq, bool may_wait, size_t len)
> +{
> +	struct netfs_io_stream *upload = &wreq->io_streams[0];
> +	ssize_t part;
> +	loff_t start = wreq->start;
> +	int error = 0;
> +
> +	_enter("%zx", len);
> +
> +	if (wreq->origin == NETFS_DIO_WRITE)
> +		inode_dio_begin(wreq->inode);
> +
> +	while (len) {
> +		// TODO: Prepare content encryption
> +
> +		_debug("unbuffered %zx", len);
> +		part = netfs_advance_write(wreq, upload, start, len, false);
> +		start += part;
> +		len -= part;
> +		if (test_bit(NETFS_RREQ_PAUSE, &wreq->flags)) {
> +			trace_netfs_rreq(wreq, netfs_rreq_trace_wait_pause);
> +			wait_on_bit(&wreq->flags, NETFS_RREQ_PAUSE, TASK_UNINTERRUPTIBLE);
> +		}
> +		if (test_bit(NETFS_RREQ_FAILED, &wreq->flags))
> +			break;
> +	}
> +
> +	netfs_issue_write(wreq, upload);
> +
> +	smp_wmb(); /* Write lists before ALL_QUEUED. */
> +	set_bit(NETFS_RREQ_ALL_QUEUED, &wreq->flags);
> +	if (list_empty(&upload->subrequests))
> +		netfs_wake_write_collector(wreq, false);
> +
> +	_leave(" = %d", error);
> +	return error;
> +}
> diff --git a/include/linux/netfs.h b/include/linux/netfs.h
> index 88269681d4fc..42dba05a428b 100644
> --- a/include/linux/netfs.h
> +++ b/include/linux/netfs.h
> @@ -64,6 +64,7 @@ struct netfs_inode {
>  #if IS_ENABLED(CONFIG_FSCACHE)
>  	struct fscache_cookie	*cache;
>  #endif
> +	struct mutex		wb_lock;	/* Writeback serialisation */
>  	loff_t			remote_i_size;	/* Size of the remote file */
>  	loff_t			zero_point;	/* Size after which we assume there's no data
>  						 * on the server */
> @@ -71,7 +72,6 @@ struct netfs_inode {
>  #define NETFS_ICTX_ODIRECT	0		/* The file has DIO in progress */
>  #define NETFS_ICTX_UNBUFFERED	1		/* I/O should not use the pagecache */
>  #define NETFS_ICTX_WRITETHROUGH	2		/* Write-through caching */
> -#define NETFS_ICTX_NO_WRITE_STREAMING	3	/* Don't engage in write-streaming */
>  #define NETFS_ICTX_USE_PGPRIV2	31		/* [DEPRECATED] Use PG_private_2 to mark
>  						 * write to cache on read */
>  };
> @@ -126,6 +126,33 @@ static inline struct netfs_group *netfs_folio_group(struct folio *folio)
>  	return priv;
>  }
> 
> +/*
> + * Stream of I/O subrequests going to a particular destination, such as the
> + * server or the local cache.  This is mainly intended for writing where we may
> + * have to write to multiple destinations concurrently.
> + */
> +struct netfs_io_stream {
> +	/* Submission tracking */
> +	struct netfs_io_subrequest *construct;	/* Op being constructed */
> +	unsigned int		submit_off;	/* Folio offset we're submitting from */
> +	unsigned int		submit_len;	/* Amount of data left to submit */
> +	unsigned int		submit_max_len;	/* Amount I/O can be rounded up to */
> +	void (*prepare_write)(struct netfs_io_subrequest *subreq);
> +	void (*issue_write)(struct netfs_io_subrequest *subreq);
> +	/* Collection tracking */
> +	struct list_head	subrequests;	/* Contributory I/O operations */
> +	struct netfs_io_subrequest *front;	/* Op being collected */
> +	unsigned long long	collected_to;	/* Position we've collected results to */
> +	size_t			transferred;	/* The amount transferred from this stream */
> +	enum netfs_io_source	source;		/* Where to read from/write to */
> +	unsigned short		error;		/* Aggregate error for the stream */
> +	unsigned char		stream_nr;	/* Index of stream in parent table */
> +	bool			avail;		/* T if stream is available */
> +	bool			active;		/* T if stream is active */
> +	bool			need_retry;	/* T if this stream needs retrying */
> +	bool			failed;		/* T if this stream failed */
> +};
> +
>  /*
>   * Resources required to do operations on a cache.
>   */
> @@ -150,13 +177,16 @@ struct netfs_io_subrequest {
>  	struct list_head	rreq_link;	/* Link in rreq->subrequests */
>  	struct iov_iter		io_iter;	/* Iterator for this subrequest */
>  	unsigned long long	start;		/* Where to start the I/O */
> +	size_t			max_len;	/* Maximum size of the I/O */
>  	size_t			len;		/* Size of the I/O */
>  	size_t			transferred;	/* Amount of data transferred */
>  	refcount_t		ref;
>  	short			error;		/* 0 or error that occurred */
>  	unsigned short		debug_index;	/* Index in list (for debugging output) */
> +	unsigned int		nr_segs;	/* Number of segs in io_iter */
>  	unsigned int		max_nr_segs;	/* 0 or max number of segments in an iterator */
>  	enum netfs_io_source	source;		/* Where to read from/write to */
> +	unsigned char		stream_nr;	/* I/O stream this belongs to */
>  	unsigned long		flags;
>  #define NETFS_SREQ_COPY_TO_CACHE	0	/* Set if should copy the data to the cache */
>  #define NETFS_SREQ_CLEAR_TAIL		1	/* Set if the rest of the read should be cleared */
> @@ -164,6 +194,11 @@ struct netfs_io_subrequest {
>  #define NETFS_SREQ_SEEK_DATA_READ	3	/* Set if ->read() should SEEK_DATA first */
>  #define NETFS_SREQ_NO_PROGRESS		4	/* Set if we didn't manage to read any data */
>  #define NETFS_SREQ_ONDEMAND		5	/* Set if it's from on-demand read mode */
> +#define NETFS_SREQ_BOUNDARY		6	/* Set if ends on hard boundary (eg. ceph object) */
> +#define NETFS_SREQ_IN_PROGRESS		8	/* Unlocked when the subrequest completes */
> +#define NETFS_SREQ_NEED_RETRY		9	/* Set if the filesystem requests a retry */
> +#define NETFS_SREQ_RETRYING		10	/* Set if we're retrying */
> +#define NETFS_SREQ_FAILED		11	/* Set if the subreq failed unretryably */
>  };
> 
>  enum netfs_io_origin {
> @@ -194,6 +229,9 @@ struct netfs_io_request {
>  	struct netfs_cache_resources cache_resources;
>  	struct list_head	proc_link;	/* Link in netfs_iorequests */
>  	struct list_head	subrequests;	/* Contributory I/O operations */
> +	struct netfs_io_stream	io_streams[2];	/* Streams of parallel I/O operations */
> +#define NR_IO_STREAMS 2 //wreq->nr_io_streams
> +	struct netfs_group	*group;		/* Writeback group being written back */
>  	struct iov_iter		iter;		/* Unencrypted-side iterator */
>  	struct iov_iter		io_iter;	/* I/O (Encrypted-side) iterator */
>  	void			*netfs_priv;	/* Private data for the netfs */
> @@ -203,6 +241,8 @@ struct netfs_io_request {
>  	unsigned int		rsize;		/* Maximum read size (0 for none) */
>  	unsigned int		wsize;		/* Maximum write size (0 for none) */
>  	atomic_t		subreq_counter;	/* Next subreq->debug_index */
> +	unsigned int		nr_group_rel;	/* Number of refs to release on ->group */
> +	spinlock_t		lock;		/* Lock for queuing subreqs */
>  	atomic_t		nr_outstanding;	/* Number of ops in progress */
>  	atomic_t		nr_copy_ops;	/* Number of copy-to-cache ops in progress */
>  	size_t			upper_len;	/* Length can be extended to here */
> @@ -214,6 +254,10 @@ struct netfs_io_request {
>  	bool			direct_bv_unpin; /* T if direct_bv[] must be unpinned */
>  	unsigned long long	i_size;		/* Size of the file */
>  	unsigned long long	start;		/* Start position */
> +	atomic64_t		issued_to;	/* Write issuer folio cursor */
> +	unsigned long long	contiguity;	/* Tracking for gaps in the writeback sequence */
> +	unsigned long long	collected_to;	/* Point we've collected to */
> +	unsigned long long	cleaned_to;	/* Position we've cleaned folios to */
>  	pgoff_t			no_unlock_folio; /* Don't unlock this folio after read */
>  	refcount_t		ref;
>  	unsigned long		flags;
> @@ -227,6 +271,9 @@ struct netfs_io_request {
>  #define NETFS_RREQ_UPLOAD_TO_SERVER	8	/* Need to write to the server */
>  #define NETFS_RREQ_NONBLOCK		9	/* Don't block if possible (O_NONBLOCK) */
>  #define NETFS_RREQ_BLOCKED		10	/* We blocked */
> +#define NETFS_RREQ_PAUSE		11	/* Pause subrequest generation */
> +#define NETFS_RREQ_USE_IO_ITER		12	/* Use ->io_iter rather than ->i_pages */
> +#define NETFS_RREQ_ALL_QUEUED		13	/* All subreqs are now queued */
>  #define NETFS_RREQ_USE_PGPRIV2		31	/* [DEPRECATED] Use PG_private_2 to mark
>  						 * write to cache on read */
>  	const struct netfs_request_ops *netfs_ops;
> @@ -258,6 +305,9 @@ struct netfs_request_ops {
>  	/* Write request handling */
>  	void (*create_write_requests)(struct netfs_io_request *wreq,
>  				      loff_t start, size_t len);
> +	void (*begin_writeback)(struct netfs_io_request *wreq);
> +	void (*prepare_write)(struct netfs_io_subrequest *subreq);
> +	void (*issue_write)(struct netfs_io_subrequest *subreq);
>  	void (*invalidate_cache)(struct netfs_io_request *wreq);
>  };
> 
> @@ -292,6 +342,9 @@ struct netfs_cache_ops {
>  		     netfs_io_terminated_t term_func,
>  		     void *term_func_priv);
> 
> +	/* Write data to the cache from a netfs subrequest. */
> +	void (*issue_write)(struct netfs_io_subrequest *subreq);
> +
>  	/* Expand readahead request */
>  	void (*expand_readahead)(struct netfs_cache_resources *cres,
>  				 unsigned long long *_start,
> @@ -304,6 +357,13 @@ struct netfs_cache_ops {
>  	enum netfs_io_source (*prepare_read)(struct netfs_io_subrequest *subreq,
>  					     unsigned long long i_size);
> 
> +	/* Prepare a write subrequest, working out if we're allowed to do it
> +	 * and finding out the maximum amount of data to gather before
> +	 * attempting to submit.  If we're not permitted to do it, the
> +	 * subrequest should be marked failed.
> +	 */
> +	void (*prepare_write_subreq)(struct netfs_io_subrequest *subreq);
> +
>  	/* Prepare a write operation, working out what part of the write we can
>  	 * actually do.
>  	 */
> @@ -349,6 +409,8 @@ int netfs_write_begin(struct netfs_inode *, struct file *,
>  		      struct folio **, void **fsdata);
>  int netfs_writepages(struct address_space *mapping,
>  		     struct writeback_control *wbc);
> +int new_netfs_writepages(struct address_space *mapping,
> +			struct writeback_control *wbc);
>  bool netfs_dirty_folio(struct address_space *mapping, struct folio *folio);
>  int netfs_unpin_writeback(struct inode *inode, struct writeback_control *wbc);
>  void netfs_clear_inode_writeback(struct inode *inode, const void *aux);
> @@ -372,8 +434,11 @@ size_t netfs_limit_iter(const struct iov_iter *iter, size_t start_offset,
>  struct netfs_io_subrequest *netfs_create_write_request(
>  	struct netfs_io_request *wreq, enum netfs_io_source dest,
>  	loff_t start, size_t len, work_func_t worker);
> +void netfs_prepare_write_failed(struct netfs_io_subrequest *subreq);
>  void netfs_write_subrequest_terminated(void *_op, ssize_t transferred_or_error,
>  				       bool was_async);
> +void new_netfs_write_subrequest_terminated(void *_op, ssize_t transferred_or_error,
> +					   bool was_async);
>  void netfs_queue_write_request(struct netfs_io_subrequest *subreq);
> 
>  int netfs_start_io_read(struct inode *inode);
> @@ -415,6 +480,7 @@ static inline void netfs_inode_init(struct netfs_inode *ctx,
>  #if IS_ENABLED(CONFIG_FSCACHE)
>  	ctx->cache = NULL;
>  #endif
> +	mutex_init(&ctx->wb_lock);
>  	/* ->releasepage() drives zero_point */
>  	if (use_zero_point) {
>  		ctx->zero_point = ctx->remote_i_size;
> diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
> index 7126d2ea459c..e7700172ae7e 100644
> --- a/include/trace/events/netfs.h
> +++ b/include/trace/events/netfs.h
> @@ -44,14 +44,18 @@
>  #define netfs_rreq_traces					\
>  	EM(netfs_rreq_trace_assess,		"ASSESS ")	\
>  	EM(netfs_rreq_trace_copy,		"COPY   ")	\
> +	EM(netfs_rreq_trace_collect,		"COLLECT")	\
>  	EM(netfs_rreq_trace_done,		"DONE   ")	\
>  	EM(netfs_rreq_trace_free,		"FREE   ")	\
>  	EM(netfs_rreq_trace_redirty,		"REDIRTY")	\
>  	EM(netfs_rreq_trace_resubmit,		"RESUBMT")	\
> +	EM(netfs_rreq_trace_set_pause,		"PAUSE  ")	\
>  	EM(netfs_rreq_trace_unlock,		"UNLOCK ")	\
>  	EM(netfs_rreq_trace_unmark,		"UNMARK ")	\
>  	EM(netfs_rreq_trace_wait_ip,		"WAIT-IP")	\
> +	EM(netfs_rreq_trace_wait_pause,		"WT-PAUS")	\
>  	EM(netfs_rreq_trace_wake_ip,		"WAKE-IP")	\
> +	EM(netfs_rreq_trace_unpause,		"UNPAUSE")	\
>  	E_(netfs_rreq_trace_write_done,		"WR-DONE")
> 
>  #define netfs_sreq_sources					\
> @@ -64,11 +68,15 @@
>  	E_(NETFS_INVALID_WRITE,			"INVL")
> 
>  #define netfs_sreq_traces					\
> +	EM(netfs_sreq_trace_discard,		"DSCRD")	\
>  	EM(netfs_sreq_trace_download_instead,	"RDOWN")	\
> +	EM(netfs_sreq_trace_fail,		"FAIL ")	\
>  	EM(netfs_sreq_trace_free,		"FREE ")	\
>  	EM(netfs_sreq_trace_limited,		"LIMIT")	\
>  	EM(netfs_sreq_trace_prepare,		"PREP ")	\
> +	EM(netfs_sreq_trace_prep_failed,	"PRPFL")	\
>  	EM(netfs_sreq_trace_resubmit_short,	"SHORT")	\
> +	EM(netfs_sreq_trace_retry,		"RETRY")	\
>  	EM(netfs_sreq_trace_submit,		"SUBMT")	\
>  	EM(netfs_sreq_trace_terminated,		"TERM ")	\
>  	EM(netfs_sreq_trace_write,		"WRITE")	\
> @@ -88,6 +96,7 @@
>  #define netfs_rreq_ref_traces					\
>  	EM(netfs_rreq_trace_get_for_outstanding,"GET OUTSTND")	\
>  	EM(netfs_rreq_trace_get_subreq,		"GET SUBREQ ")	\
> +	EM(netfs_rreq_trace_get_work,		"GET WORK   ")	\
>  	EM(netfs_rreq_trace_put_complete,	"PUT COMPLT ")	\
>  	EM(netfs_rreq_trace_put_discard,	"PUT DISCARD")	\
>  	EM(netfs_rreq_trace_put_failed,		"PUT FAILED ")	\
> @@ -95,6 +104,8 @@
>  	EM(netfs_rreq_trace_put_return,		"PUT RETURN ")	\
>  	EM(netfs_rreq_trace_put_subreq,		"PUT SUBREQ ")	\
>  	EM(netfs_rreq_trace_put_work,		"PUT WORK   ")	\
> +	EM(netfs_rreq_trace_put_work_complete,	"PUT WORK CP")	\
> +	EM(netfs_rreq_trace_put_work_nq,	"PUT WORK NQ")	\
>  	EM(netfs_rreq_trace_see_work,		"SEE WORK   ")	\
>  	E_(netfs_rreq_trace_new,		"NEW        ")
> 
> @@ -103,11 +114,14 @@
>  	EM(netfs_sreq_trace_get_resubmit,	"GET RESUBMIT")	\
>  	EM(netfs_sreq_trace_get_short_read,	"GET SHORTRD")	\
>  	EM(netfs_sreq_trace_new,		"NEW        ")	\
> +	EM(netfs_sreq_trace_put_cancel,		"PUT CANCEL ")	\
>  	EM(netfs_sreq_trace_put_clear,		"PUT CLEAR  ")	\
>  	EM(netfs_sreq_trace_put_discard,	"PUT DISCARD")	\
> +	EM(netfs_sreq_trace_put_done,		"PUT DONE   ")	\
>  	EM(netfs_sreq_trace_put_failed,		"PUT FAILED ")	\
>  	EM(netfs_sreq_trace_put_merged,		"PUT MERGED ")	\
>  	EM(netfs_sreq_trace_put_no_copy,	"PUT NO COPY")	\
> +	EM(netfs_sreq_trace_put_oom,		"PUT OOM    ")	\
>  	EM(netfs_sreq_trace_put_wip,		"PUT WIP    ")	\
>  	EM(netfs_sreq_trace_put_work,		"PUT WORK   ")	\
>  	E_(netfs_sreq_trace_put_terminated,	"PUT TERM   ")
> @@ -124,7 +138,9 @@
>  	EM(netfs_streaming_filled_page,		"mod-streamw-f") \
>  	EM(netfs_streaming_cont_filled_page,	"mod-streamw-f+") \
>  	/* The rest are for writeback */			\
> +	EM(netfs_folio_trace_cancel_copy,	"cancel-copy")	\
>  	EM(netfs_folio_trace_clear,		"clear")	\
> +	EM(netfs_folio_trace_clear_cc,		"clear-cc")	\
>  	EM(netfs_folio_trace_clear_s,		"clear-s")	\
>  	EM(netfs_folio_trace_clear_g,		"clear-g")	\
>  	EM(netfs_folio_trace_copy,		"copy")		\
> @@ -133,16 +149,26 @@
>  	EM(netfs_folio_trace_end_copy,		"end-copy")	\
>  	EM(netfs_folio_trace_filled_gaps,	"filled-gaps")	\
>  	EM(netfs_folio_trace_kill,		"kill")		\
> +	EM(netfs_folio_trace_kill_cc,		"kill-cc")	\
> +	EM(netfs_folio_trace_kill_g,		"kill-g")	\
> +	EM(netfs_folio_trace_kill_s,		"kill-s")	\
>  	EM(netfs_folio_trace_mkwrite,		"mkwrite")	\
>  	EM(netfs_folio_trace_mkwrite_plus,	"mkwrite+")	\
> +	EM(netfs_folio_trace_not_under_wback,	"!wback")	\
>  	EM(netfs_folio_trace_read_gaps,		"read-gaps")	\
>  	EM(netfs_folio_trace_redirty,		"redirty")	\
>  	EM(netfs_folio_trace_redirtied,		"redirtied")	\
>  	EM(netfs_folio_trace_store,		"store")	\
> +	EM(netfs_folio_trace_store_copy,	"store-copy")	\
>  	EM(netfs_folio_trace_store_plus,	"store+")	\
>  	EM(netfs_folio_trace_wthru,		"wthru")	\
>  	E_(netfs_folio_trace_wthru_plus,	"wthru+")
> 
> +#define netfs_collect_contig_traces				\
> +	EM(netfs_contig_trace_collect,		"Collect")	\
> +	EM(netfs_contig_trace_jump,		"-->JUMP-->")	\
> +	E_(netfs_contig_trace_unlock,		"Unlock")
> +
>  #ifndef __NETFS_DECLARE_TRACE_ENUMS_ONCE_ONLY
>  #define __NETFS_DECLARE_TRACE_ENUMS_ONCE_ONLY
> 
> @@ -159,6 +185,7 @@ enum netfs_failure { netfs_failures } __mode(byte);
>  enum netfs_rreq_ref_trace { netfs_rreq_ref_traces } __mode(byte);
>  enum netfs_sreq_ref_trace { netfs_sreq_ref_traces } __mode(byte);
>  enum netfs_folio_trace { netfs_folio_traces } __mode(byte);
> +enum netfs_collect_contig_trace { netfs_collect_contig_traces } __mode(byte);
> 
>  #endif
> 
> @@ -180,6 +207,7 @@ netfs_failures;
>  netfs_rreq_ref_traces;
>  netfs_sreq_ref_traces;
>  netfs_folio_traces;
> +netfs_collect_contig_traces;
> 
>  /*
>   * Now redefine the EM() and E_() macros to map the enums to the strings that
> @@ -413,16 +441,18 @@ TRACE_EVENT(netfs_write_iter,
>  		    __field(unsigned long long,		start		)
>  		    __field(size_t,			len		)
>  		    __field(unsigned int,		flags		)
> +		    __field(unsigned int,		ino		)
>  			     ),
> 
>  	    TP_fast_assign(
>  		    __entry->start	= iocb->ki_pos;
>  		    __entry->len	= iov_iter_count(from);
> +		    __entry->ino	= iocb->ki_filp->f_inode->i_ino;
>  		    __entry->flags	= iocb->ki_flags;
>  			   ),
> 
> -	    TP_printk("WRITE-ITER s=%llx l=%zx f=%x",
> -		      __entry->start, __entry->len, __entry->flags)
> +	    TP_printk("WRITE-ITER i=%x s=%llx l=%zx f=%x",
> +		      __entry->ino, __entry->start, __entry->len, __entry->flags)
>  	    );
> 
>  TRACE_EVENT(netfs_write,
> @@ -434,6 +464,7 @@ TRACE_EVENT(netfs_write,
>  	    TP_STRUCT__entry(
>  		    __field(unsigned int,		wreq		)
>  		    __field(unsigned int,		cookie		)
> +		    __field(unsigned int,		ino		)
>  		    __field(enum netfs_write_trace,	what		)
>  		    __field(unsigned long long,		start		)
>  		    __field(unsigned long long,		len		)
> @@ -444,18 +475,213 @@ TRACE_EVENT(netfs_write,
>  		    struct fscache_cookie *__cookie = netfs_i_cookie(__ctx);
>  		    __entry->wreq	= wreq->debug_id;
>  		    __entry->cookie	= __cookie ? __cookie->debug_id : 0;
> +		    __entry->ino	= wreq->inode->i_ino;
>  		    __entry->what	= what;
>  		    __entry->start	= wreq->start;
>  		    __entry->len	= wreq->len;
>  			   ),
> 
> -	    TP_printk("R=%08x %s c=%08x by=%llx-%llx",
> +	    TP_printk("R=%08x %s c=%08x i=%x by=%llx-%llx",
>  		      __entry->wreq,
>  		      __print_symbolic(__entry->what, netfs_write_traces),
>  		      __entry->cookie,
> +		      __entry->ino,
>  		      __entry->start, __entry->start + __entry->len - 1)
>  	    );
> 
> +TRACE_EVENT(netfs_collect,
> +	    TP_PROTO(const struct netfs_io_request *wreq),
> +
> +	    TP_ARGS(wreq),
> +
> +	    TP_STRUCT__entry(
> +		    __field(unsigned int,		wreq		)
> +		    __field(unsigned int,		len		)
> +		    __field(unsigned long long,		transferred	)
> +		    __field(unsigned long long,		start		)
> +			     ),
> +
> +	    TP_fast_assign(
> +		    __entry->wreq	= wreq->debug_id;
> +		    __entry->start	= wreq->start;
> +		    __entry->len	= wreq->len;
> +		    __entry->transferred = wreq->transferred;
> +			   ),
> +
> +	    TP_printk("R=%08x s=%llx-%llx",
> +		      __entry->wreq,
> +		      __entry->start + __entry->transferred,
> +		      __entry->start + __entry->len)
> +	    );
> +
> +TRACE_EVENT(netfs_collect_contig,
> +	    TP_PROTO(const struct netfs_io_request *wreq, unsigned long long to,
> +		     enum netfs_collect_contig_trace type),
> +
> +	    TP_ARGS(wreq, to, type),
> +
> +	    TP_STRUCT__entry(
> +		    __field(unsigned int,		wreq)
> +		    __field(enum netfs_collect_contig_trace, type)
> +		    __field(unsigned long long,		contiguity)
> +		    __field(unsigned long long,		to)
> +			     ),
> +
> +	    TP_fast_assign(
> +		    __entry->wreq	= wreq->debug_id;
> +		    __entry->type	= type;
> +		    __entry->contiguity	= wreq->contiguity;
> +		    __entry->to		= to;
> +			   ),
> +
> +	    TP_printk("R=%08x %llx -> %llx %s",
> +		      __entry->wreq,
> +		      __entry->contiguity,
> +		      __entry->to,
> +		      __print_symbolic(__entry->type, netfs_collect_contig_traces))
> +	    );
> +
> +TRACE_EVENT(netfs_collect_sreq,
> +	    TP_PROTO(const struct netfs_io_request *wreq,
> +		     const struct netfs_io_subrequest *subreq),
> +
> +	    TP_ARGS(wreq, subreq),
> +
> +	    TP_STRUCT__entry(
> +		    __field(unsigned int,		wreq		)
> +		    __field(unsigned int,		subreq		)
> +		    __field(unsigned int,		stream		)
> +		    __field(unsigned int,		len		)
> +		    __field(unsigned int,		transferred	)
> +		    __field(unsigned long long,		start		)
> +			     ),
> +
> +	    TP_fast_assign(
> +		    __entry->wreq	= wreq->debug_id;
> +		    __entry->subreq	= subreq->debug_index;
> +		    __entry->stream	= subreq->stream_nr;
> +		    __entry->start	= subreq->start;
> +		    __entry->len	= subreq->len;
> +		    __entry->transferred = subreq->transferred;
> +			   ),
> +
> +	    TP_printk("R=%08x[%u:%02x] s=%llx t=%x/%x",
> +		      __entry->wreq, __entry->stream, __entry->subreq,
> +		      __entry->start, __entry->transferred, __entry->len)
> +	    );
> +
> +TRACE_EVENT(netfs_collect_folio,
> +	    TP_PROTO(const struct netfs_io_request *wreq,
> +		     const struct folio *folio,
> +		     unsigned long long fend,
> +		     unsigned long long collected_to),
> +
> +	    TP_ARGS(wreq, folio, fend, collected_to),
> +
> +	    TP_STRUCT__entry(
> +		    __field(unsigned int,	wreq		)
> +		    __field(unsigned long,	index		)
> +		    __field(unsigned long long,	fend		)
> +		    __field(unsigned long long,	cleaned_to	)
> +		    __field(unsigned long long,	collected_to	)
> +			     ),
> +
> +	    TP_fast_assign(
> +		    __entry->wreq	= wreq->debug_id;
> +		    __entry->index	= folio->index;
> +		    __entry->fend	= fend;
> +		    __entry->cleaned_to	= wreq->cleaned_to;
> +		    __entry->collected_to = collected_to;
> +			   ),
> +
> +	    TP_printk("R=%08x ix=%05lx r=%llx-%llx t=%llx/%llx",
> +		      __entry->wreq, __entry->index,
> +		      (unsigned long long)__entry->index * PAGE_SIZE, __entry->fend,
> +		      __entry->cleaned_to, __entry->collected_to)
> +	    );
> +
> +TRACE_EVENT(netfs_collect_state,
> +	    TP_PROTO(const struct netfs_io_request *wreq,
> +		     unsigned long long collected_to,
> +		     unsigned int notes),
> +
> +	    TP_ARGS(wreq, collected_to, notes),
> +
> +	    TP_STRUCT__entry(
> +		    __field(unsigned int,	wreq		)
> +		    __field(unsigned int,	notes		)
> +		    __field(unsigned long long,	collected_to	)
> +		    __field(unsigned long long,	cleaned_to	)
> +		    __field(unsigned long long,	contiguity	)
> +			     ),
> +
> +	    TP_fast_assign(
> +		    __entry->wreq	= wreq->debug_id;
> +		    __entry->notes	= notes;
> +		    __entry->collected_to = collected_to;
> +		    __entry->cleaned_to	= wreq->cleaned_to;
> +		    __entry->contiguity = wreq->contiguity;
> +			   ),
> +
> +	    TP_printk("R=%08x cto=%llx fto=%llx ctg=%llx n=%x",
> +		      __entry->wreq, __entry->collected_to,
> +		      __entry->cleaned_to, __entry->contiguity,
> +		      __entry->notes)
> +	    );
> +
> +TRACE_EVENT(netfs_collect_gap,
> +	    TP_PROTO(const struct netfs_io_request *wreq,
> +		     const struct netfs_io_stream *stream,
> +		     unsigned long long jump_to, char type),
> +
> +	    TP_ARGS(wreq, stream, jump_to, type),
> +
> +	    TP_STRUCT__entry(
> +		    __field(unsigned int,	wreq)
> +		    __field(unsigned char,	stream)
> +		    __field(unsigned char,	type)
> +		    __field(unsigned long long,	from)
> +		    __field(unsigned long long,	to)
> +			     ),
> +
> +	    TP_fast_assign(
> +		    __entry->wreq	= wreq->debug_id;
> +		    __entry->stream	= stream->stream_nr;
> +		    __entry->from	= stream->collected_to;
> +		    __entry->to		= jump_to;
> +		    __entry->type	= type;
> +			   ),
> +
> +	    TP_printk("R=%08x[%x:] %llx->%llx %c",
> +		      __entry->wreq, __entry->stream,
> +		      __entry->from, __entry->to, __entry->type)
> +	    );
> +
> +TRACE_EVENT(netfs_collect_stream,
> +	    TP_PROTO(const struct netfs_io_request *wreq,
> +		     const struct netfs_io_stream *stream),
> +
> +	    TP_ARGS(wreq, stream),
> +
> +	    TP_STRUCT__entry(
> +		    __field(unsigned int,	wreq)
> +		    __field(unsigned char,	stream)
> +		    __field(unsigned long long,	collected_to)
> +		    __field(unsigned long long,	front)
> +			     ),
> +
> +	    TP_fast_assign(
> +		    __entry->wreq	= wreq->debug_id;
> +		    __entry->stream	= stream->stream_nr;
> +		    __entry->collected_to = stream->collected_to;
> +		    __entry->front	= stream->front ? stream->front->start : UINT_MAX;
> +			   ),
> +
> +	    TP_printk("R=%08x[%x:] cto=%llx frn=%llx",
> +		      __entry->wreq, __entry->stream,
> +		      __entry->collected_to, __entry->front)
> +	    );
> +
>  #undef EM
>  #undef E_
>  #endif /* _TRACE_NETFS_H */
> 


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 19/26] netfs: New writeback implementation
  2024-03-28 16:34 ` [PATCH 19/26] netfs: New writeback implementation David Howells
  2024-03-29 10:34   ` Naveen Mamindlapalli
@ 2024-03-30  1:03   ` Vadim Fedorenko
  1 sibling, 0 replies; 62+ messages in thread
From: Vadim Fedorenko @ 2024-03-30  1:03 UTC (permalink / raw)
  To: David Howells, Christian Brauner, Jeff Layton, Gao Xiang,
	Dominique Martinet
  Cc: Latchesar Ionkov, Christian Schoenebeck, linux-mm, Marc Dionne,
	linux-afs, Paulo Alcantara, linux-cifs, Matthew Wilcox,
	Steve French, linux-cachefs, Ilya Dryomov, Shyam Prasad N,
	Tom Talpey, ceph-devel, Eric Van Hensbergen, linux-nfs, netdev,
	v9fs, linux-kernel, linux-fsdevel, netfs, linux-erofs

On 28/03/2024 16:34, David Howells wrote:
> The current netfslib writeback implementation creates writeback requests of
> contiguous folio data and then separately tiles subrequests over the space
> twice, once for the server and once for the cache.  This creates a few
> issues:
> 
>   (1) Every time there's a discontiguity or a change between writing to only
>       one destination or writing to both, it must create a new request.
>       This makes it harder to do vectored writes.
> 
>   (2) The folios don't have the writeback mark removed until the end of the
>       request - and a request could be hundreds of megabytes.
> 
>   (3) In future, I want to support a larger cache granularity, which will
>       require aggregation of some folios that contain unmodified data (which
>       only need to go to the cache) and some which contain modifications
>       (which need to be uploaded and stored to the cache) - but, currently,
>       these are treated as discontiguous.
> 
> There's also a move to get everyone to use writeback_iter() to extract
> writable folios from the pagecache.  That said, currently writeback_iter()
> has some issues that make it less than ideal:
> 
>   (1) there's no way to cancel the iteration, even if you find a "temporary"
>       error that means the current folio and all subsequent folios are going
>       to fail;
> 
>   (2) there's no way to filter the folios being written back - something
>       that will impact Ceph with its ordered snap system;
> 
>   (3) and if you get a folio you can't immediately deal with (say you need
>       to flush the preceding writes), you are left with a folio hanging in
>       the locked state for the duration, when really we should unlock it and
>       relock it later.
> 
> In this new implementation, I use writeback_iter() to pump folios,
> progressively creating two parallel, but separate streams and cleaning up
> the finished folios as the subrequests complete.  Either or both streams
> can contain gaps, and the subrequests in each stream can be of variable
> size, don't need to align with each other and don't need to align with the
> folios.
> 
> Indeed, subrequests can cross folio boundaries, may cover several folios or
> a folio may be spanned by multiple subrequests, e.g.:
> 
>           +---+---+-----+-----+---+----------+
> Folios:  |   |   |     |     |   |          |
>           +---+---+-----+-----+---+----------+
> 
>             +------+------+     +----+----+
> Upload:    |      |      |.....|    |    |
>             +------+------+     +----+----+
> 
>           +------+------+------+------+------+
> Cache:   |      |      |      |      |      |
>           +------+------+------+------+------+
> 
> The progressive subrequest construction permits the algorithm to be
> preparing both the next upload to the server and the next write to the
> cache whilst the previous ones are already in progress.  Throttling can be
> applied to control the rate of production of subrequests - and, in any
> case, we probably want to write them to the server in ascending order,
> particularly if the file will be extended.
> 
> Content crypto can also be prepared at the same time as the subrequests and
> run asynchronously, with the prepped requests being stalled until the
> crypto catches up with them.  This might also be useful for transport
> crypto, but that happens at a lower layer, so probably would be harder to
> pull off.
> 
> The algorithm is split into three parts:
> 
>   (1) The issuer.  This walks through the data, packaging it up, encrypting
>       it and creating subrequests.  The part of this that generates
>       subrequests only deals with file positions and spans and so is usable
>       for DIO/unbuffered writes as well as buffered writes.
> 
>   (2) The collector. This asynchronously collects completed subrequests,
>       unlocks folios, frees crypto buffers and performs any retries.  This
>       runs in a work queue so that the issuer can return to the caller for
>       writeback (so that the VM can have its kswapd thread back) or async
>       writes.
> 
>   (3) The retryer.  This pauses the issuer, waits for all outstanding
>       subrequests to complete and then goes through the failed subrequests
>       to reissue them.  This may involve reprepping them (with cifs, the
>       credits must be renegotiated, and a subrequest may need splitting),
>       and doing RMW for content crypto if there's a conflicting change on
>       the server.
> 
> [!] Note that some of the functions are prefixed with "new_" to avoid
> clashes with existing functions.  These will be renamed in a later patch
> that cuts over to the new algorithm.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: Eric Van Hensbergen <ericvh@kernel.org>
> cc: Latchesar Ionkov <lucho@ionkov.net>
> cc: Dominique Martinet <asmadeus@codewreck.org>
> cc: Christian Schoenebeck <linux_oss@crudebyte.com>
> cc: Marc Dionne <marc.dionne@auristor.com>
> cc: v9fs@lists.linux.dev
> cc: linux-afs@lists.infradead.org
> cc: netfs@lists.linux.dev
> cc: linux-fsdevel@vger.kernel.org

[..snip..]
> +/*
> + * Begin a write operation for writing through the pagecache.
> + */
> +struct netfs_io_request *new_netfs_begin_writethrough(struct kiocb *iocb, size_t len)
> +{
> +	struct netfs_io_request *wreq = NULL;
> +	struct netfs_inode *ictx = netfs_inode(file_inode(iocb->ki_filp));
> +
> +	mutex_lock(&ictx->wb_lock);
> +
> +	wreq = netfs_create_write_req(iocb->ki_filp->f_mapping, iocb->ki_filp,
> +				      iocb->ki_pos, NETFS_WRITETHROUGH);
> +	if (IS_ERR(wreq))
> +		mutex_unlock(&ictx->wb_lock);
> +
> +	wreq->io_streams[0].avail = true;

If IS_ERR(wreq) is true, execution falls through and this dereference is
invalid.

> +	trace_netfs_write(wreq, netfs_write_trace_writethrough);

not sure whether the trace function call is still needed in the error case

> +	return wreq;
> +}
> +
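The two comments above amount to the same thing: on error the lock should be
dropped and the function should return before wreq is ever dereferenced, which
also keeps the trace call on the success path only.  A minimal sketch of one
possible shape for that (not the actual fix; it only reuses the identifiers
visible in the quoted hunk):

struct netfs_io_request *new_netfs_begin_writethrough(struct kiocb *iocb, size_t len)
{
        struct netfs_io_request *wreq;
        struct netfs_inode *ictx = netfs_inode(file_inode(iocb->ki_filp));

        mutex_lock(&ictx->wb_lock);

        wreq = netfs_create_write_req(iocb->ki_filp->f_mapping, iocb->ki_filp,
                                      iocb->ki_pos, NETFS_WRITETHROUGH);
        if (IS_ERR(wreq)) {
                /* Drop the lock and bail out before touching wreq. */
                mutex_unlock(&ictx->wb_lock);
                return wreq;
        }

        wreq->io_streams[0].avail = true;
        trace_netfs_write(wreq, netfs_write_trace_writethrough);
        return wreq;
}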

[..snip..]




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 19/26] netfs: New writeback implementation
  2024-03-29 10:34   ` Naveen Mamindlapalli
@ 2024-03-30  1:06     ` Vadim Fedorenko
  2024-03-30  1:06       ` Vadim Fedorenko
  0 siblings, 1 reply; 62+ messages in thread
From: Vadim Fedorenko @ 2024-03-30  1:06 UTC (permalink / raw)
  To: Naveen Mamindlapalli, David Howells, Christian Brauner,
	Jeff Layton, Gao Xiang, Dominique Martinet
  Cc: Latchesar Ionkov, Christian Schoenebeck, linux-mm, Marc Dionne,
	linux-afs, Paulo Alcantara, linux-cifs, Matthew Wilcox,
	Steve French, linux-cachefs, Ilya Dryomov, Shyam Prasad N,
	Tom Talpey, ceph-devel, Eric Van Hensbergen, linux-nfs, netdev,
	v9fs, linux-kernel, linux-fsdevel, netfs,
	 linux-erofs@lists.ozlabs.org

On 29/03/2024 10:34, Naveen Mamindlapalli wrote:
>> -----Original Message-----
>> From: David Howells <dhowells@redhat.com>
>> Sent: Thursday, March 28, 2024 10:04 PM
>> To: Christian Brauner <christian@brauner.io>; Jeff Layton <jlayton@kernel.org>;
>> Gao Xiang <hsiangkao@linux.alibaba.com>; Dominique Martinet
>> <asmadeus@codewreck.org>
>> Cc: David Howells <dhowells@redhat.com>; Matthew Wilcox
>> <willy@infradead.org>; Steve French <smfrench@gmail.com>; Marc Dionne
>> <marc.dionne@auristor.com>; Paulo Alcantara <pc@manguebit.com>; Shyam
>> Prasad N <sprasad@microsoft.com>; Tom Talpey <tom@talpey.com>; Eric Van
>> Hensbergen <ericvh@kernel.org>; Ilya Dryomov <idryomov@gmail.com>;
>> netfs@lists.linux.dev; linux-cachefs@redhat.com; linux-afs@lists.infradead.org;
>> linux-cifs@vger.kernel.org; linux-nfs@vger.kernel.org; ceph-
>> devel@vger.kernel.org; v9fs@lists.linux.dev; linux-erofs@lists.ozlabs.org; linux-
>> fsdevel@vger.kernel.org; linux-mm@kvack.org; netdev@vger.kernel.org; linux-
>> kernel@vger.kernel.org; Latchesar Ionkov <lucho@ionkov.net>; Christian
>> Schoenebeck <linux_oss@crudebyte.com>
>> Subject: [PATCH 19/26] netfs: New writeback implementation
>>
>> The current netfslib writeback implementation creates writeback requests of
>> contiguous folio data and then separately tiles subrequests over the space
>> twice, once for the server and once for the cache.  This creates a few
>> issues:
>>
>>   (1) Every time there's a discontiguity or a change between writing to only
>>       one destination or writing to both, it must create a new request.
>>       This makes it harder to do vectored writes.
>>
>>   (2) The folios don't have the writeback mark removed until the end of the
>>       request - and a request could be hundreds of megabytes.
>>
>>   (3) In future, I want to support a larger cache granularity, which will
>>       require aggregation of some folios that contain unmodified data (which
>>       only need to go to the cache) and some which contain modifications
>>       (which need to be uploaded and stored to the cache) - but, currently,
>>       these are treated as discontiguous.
>>
>> There's also a move to get everyone to use writeback_iter() to extract
>> writable folios from the pagecache.  That said, currently writeback_iter()
>> has some issues that make it less than ideal:
>>
>>   (1) there's no way to cancel the iteration, even if you find a "temporary"
>>       error that means the current folio and all subsequent folios are going
>>       to fail;
>>
>>   (2) there's no way to filter the folios being written back - something
>>       that will impact Ceph with its ordered snap system;
>>
>>   (3) and if you get a folio you can't immediately deal with (say you need
>>       to flush the preceding writes), you are left with a folio hanging in
>>       the locked state for the duration, when really we should unlock it and
>>       relock it later.
>>
>> In this new implementation, I use writeback_iter() to pump folios,
>> progressively creating two parallel, but separate streams and cleaning up
>> the finished folios as the subrequests complete.  Either or both streams
>> can contain gaps, and the subrequests in each stream can be of variable
>> size, don't need to align with each other and don't need to align with the
>> folios.
>>
>> Indeed, subrequests can cross folio boundaries, may cover several folios or
>> a folio may be spanned by multiple subrequests, e.g.:
>>
>>          +---+---+-----+-----+---+----------+
>> Folios:  |   |   |     |     |   |          |
>>          +---+---+-----+-----+---+----------+
>>
>>            +------+------+     +----+----+
>> Upload:    |      |      |.....|    |    |
>>            +------+------+     +----+----+
>>
>>          +------+------+------+------+------+
>> Cache:   |      |      |      |      |      |
>>          +------+------+------+------+------+
>>
>> The progressive subrequest construction permits the algorithm to be
>> preparing both the next upload to the server and the next write to the
>> cache whilst the previous ones are already in progress.  Throttling can be
>> applied to control the rate of production of subrequests - and, in any
>> case, we probably want to write them to the server in ascending order,
>> particularly if the file will be extended.
>>
>> Content crypto can also be prepared at the same time as the subrequests and
>> run asynchronously, with the prepped requests being stalled until the
>> crypto catches up with them.  This might also be useful for transport
>> crypto, but that happens at a lower layer, so probably would be harder to
>> pull off.
>>
>> The algorithm is split into three parts:
>>
>>   (1) The issuer.  This walks through the data, packaging it up, encrypting
>>       it and creating subrequests.  The part of this that generates
>>       subrequests only deals with file positions and spans and so is usable
>>       for DIO/unbuffered writes as well as buffered writes.
>>
>>   (2) The collector. This asynchronously collects completed subrequests,
>>       unlocks folios, frees crypto buffers and performs any retries.  This
>>       runs in a work queue so that the issuer can return to the caller for
>>       writeback (so that the VM can have its kswapd thread back) or async
>>       writes.
>>
>>   (3) The retryer.  This pauses the issuer, waits for all outstanding
>>       subrequests to complete and then goes through the failed subrequests
>>       to reissue them.  This may involve reprepping them (with cifs, the
>>       credits must be renegotiated, and a subrequest may need splitting),
>>       and doing RMW for content crypto if there's a conflicting change on
>>       the server.
>>
>> [!] Note that some of the functions are prefixed with "new_" to avoid
>> clashes with existing functions.  These will be renamed in a later patch
>> that cuts over to the new algorithm.
>>
>> Signed-off-by: David Howells <dhowells@redhat.com>
>> cc: Jeff Layton <jlayton@kernel.org>
>> cc: Eric Van Hensbergen <ericvh@kernel.org>
>> cc: Latchesar Ionkov <lucho@ionkov.net>
>> cc: Dominique Martinet <asmadeus@codewreck.org>
>> cc: Christian Schoenebeck <linux_oss@crudebyte.com>
>> cc: Marc Dionne <marc.dionne@auristor.com>
>> cc: v9fs@lists.linux.dev
>> cc: linux-afs@lists.infradead.org
>> cc: netfs@lists.linux.dev
>> cc: linux-fsdevel@vger.kernel.org
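
As a way of picturing the three-part split described above, here is a very
rough, illustrative sketch of what the issuer side of a ->writepages() driven
by writeback_iter() could look like.  issue_folio_to_streams() is a stand-in
name invented for this note, not a function from the patch, and folio
locking/writeback-flag handling is omitted:

static int example_writepages(struct address_space *mapping,
                              struct writeback_control *wbc)
{
        struct folio *folio = NULL;
        int error = 0;

        /* Issuer: pump dirty folios and carve them into upload and/or
         * cache subrequests; spans need not align with folio edges. */
        while ((folio = writeback_iter(mapping, wbc, folio, &error)))
                issue_folio_to_streams(mapping->host, folio);

        /* Completion (unlocking folios, retrying failures) is not done
         * here: the collector runs later from a workqueue as the
         * subrequests finish. */
        return error;
}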

[..snip..]

>> +/*
>> + * Begin a write operation for writing through the pagecache.
>> + */
>> +struct netfs_io_request *new_netfs_begin_writethrough(struct kiocb *iocb, size_t len)
>> +{
>> +	struct netfs_io_request *wreq = NULL;
>> +	struct netfs_inode *ictx = netfs_inode(file_inode(iocb->ki_filp));
>> +
>> +	mutex_lock(&ictx->wb_lock);
>> +
>> +	wreq = netfs_create_write_req(iocb->ki_filp->f_mapping, iocb->ki_filp,
>> +				      iocb->ki_pos, NETFS_WRITETHROUGH);
>> +	if (IS_ERR(wreq))
>> +		mutex_unlock(&ictx->wb_lock);
>> +
>> +	wreq->io_streams[0].avail = true;
>> +	trace_netfs_write(wreq, netfs_write_trace_writethrough);
> 
> Missing mutex_unlock() before return.
> 

mutex_unlock() happens in new_netfs_end_writethrough()
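
For readers following along: the pattern here is a begin/end pair in which
new_netfs_begin_writethrough() takes ictx->wb_lock and holds it across the
whole write-through, with new_netfs_end_writethrough() (not shown in this
hunk) dropping it on the way out.  A schematic caller, with the end call's
argument list assumed since it is not quoted here:

        /* 'len' is the number of bytes the caller intends to write. */
        wreq = new_netfs_begin_writethrough(iocb, len);
        if (IS_ERR(wreq))
                return PTR_ERR(wreq);           /* wb_lock already dropped */

        /* ... copy data into the pagecache, attaching folios to wreq ... */

        return new_netfs_end_writethrough(wreq, iocb); /* drops wb_lock */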

> Thanks,
> Naveen
> 


^ permalink raw reply	[flat|nested] 62+ messages in thread

X-sender: <netdev+bounces-83486-peter.schumann=secunet.com@vger.kernel.org>
X-Receiver: <peter.schumann@secunet.com> ORCPT=rfc822;peter.schumann@secunet.com NOTIFY=NEVER; X-ExtendedProps=BQAMAAIAAAUAWAAXAEgAAACdOWm+FoEIR7KnWe1lIx8pQ049U2NodW1hbm4gUGV0ZXIsT1U9VXNlcnMsT1U9TWlncmF0aW9uLERDPXNlY3VuZXQsREM9ZGUFAGwAAgAADwA2AAAATWljcm9zb2Z0LkV4Y2hhbmdlLlRyYW5zcG9ydC5NYWlsUmVjaXBpZW50LkRpc3BsYXlOYW1lDwAPAAAAU2NodW1hbm4sIFBldGVyBQA8AAIAAAUAHQAPAAwAAABtYngtZXNzZW4tMDEFAA4AEQAuyVP5XtO9RYbNJlr9VbVbBQALABcAvgAAAEOSGd+Q7QVIkVZ3ffGxE8RDTj1EQjQsQ049RGF0YWJhc2VzLENOPUV4Y2hhbmdlIEFkbWluaXN0cmF0aXZlIEdyb3VwIChGWURJQk9IRjIzU1BETFQpLENOPUFkbWluaXN0cmF0aXZlIEdyb3VwcyxDTj1zZWN1bmV0LENOPU1pY3Jvc29mdCBFeGNoYW5nZSxDTj1TZXJ2aWNlcyxDTj1Db25maWd1cmF0aW9uLERDPXNlY3VuZXQsREM9ZGUFABIADwBgAAAAL289c2VjdW5ldC9vdT1FeGNoYW5nZSBBZG1pbmlzdHJhdGl2ZSBHcm91cCAoRllESUJPSEYyM1NQRExUKS9jbj1SZWNpcGllbnRzL2NuPVBldGVyIFNjaHVtYW5uNWU3BQBHAAIAAAUARgAHAAMAAAAFAEMAAgAABQAWAAIAAAUAagAJAAEAAAAAAAAABQAVABYAAgAAAA8ANQAAAE1pY3Jvc29mdC5FeGNoYW5nZS5UcmFuc3BvcnQuRGlyZWN0b3J5RGF0YS5Jc1Jlc291cmNlAgAABQAUABEAnTlpvhaBCEeyp1ntZSMfKQUAIwACAAEFACIADwAxAAAAQXV0b1Jlc3BvbnNlU3VwcHJlc3M6IDANClRyYW5zbWl0SGlzdG9yeTogRmFsc2UNCg8ALwAAAE1pY3Jvc29mdC5FeGNoYW5nZS5UcmFuc3BvcnQuRXhwYW5zaW9uR3JvdXBUeXBlDwAVAAAATWVtYmVyc0dyb3VwRXhwYW5zaW9uBQAmAAIAAQ==
X-CreatedBy: MSExchange15
X-HeloDomain: a.mx.secunet.com
X-ExtendedProps: BQBjAAoA3kymlidQ3AgFAGEACAABAAAABQA3AAIAAA8APAAAAE1pY3Jvc29mdC5FeGNoYW5nZS5UcmFuc3BvcnQuTWFpbFJlY2lwaWVudC5Pcmdhbml6YXRpb25TY29wZREAAAAAAAAAAAAAAAAAAAAAAAUASQACAAEFAAQAFCABAAAAGgAAAHBldGVyLnNjaHVtYW5uQHNlY3VuZXQuY29tBQAGAAIAAQ8AKgAAAE1pY3Jvc29mdC5FeGNoYW5nZS5UcmFuc3BvcnQuUmVzdWJtaXRDb3VudAcAAwAAAA8ACQAAAENJQXVkaXRlZAIAAQUAAgAHAAEAAAAFAAMABwAAAAAABQAFAAIAAQUAYgAKADEAAADQigAABQBkAA8AAwAAAEh1YgUAKQACAAEPAD8AAABNaWNyb3NvZnQuRXhjaGFuZ2UuVHJhbnNwb3J0LkRpcmVjdG9yeURhdGEuTWFpbERlbGl2ZXJ5UHJpb3JpdHkPAAMAAABMb3c=
X-Source: SMTP:Default MBX-ESSEN-02
X-SourceIPAddress: 62.96.220.36
X-EndOfInjectedXHeaders: 28032
Received: from cas-essen-02.secunet.de (10.53.40.202) by
 mbx-essen-02.secunet.de (10.53.40.198) with Microsoft SMTP Server
 (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id
 15.1.2507.37; Sat, 30 Mar 2024 02:06:35 +0100
Received: from a.mx.secunet.com (62.96.220.36) by cas-essen-02.secunet.de
 (10.53.40.202) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.35 via Frontend
 Transport; Sat, 30 Mar 2024 02:06:35 +0100
Received: from localhost (localhost [127.0.0.1])
	by a.mx.secunet.com (Postfix) with ESMTP id 3EE60201E5
	for <peter.schumann@secunet.com>; Sat, 30 Mar 2024 02:06:35 +0100 (CET)
X-Virus-Scanned: by secunet
X-Spam-Flag: NO
X-Spam-Score: -2.751
X-Spam-Level:
X-Spam-Status: No, score=-2.751 tagged_above=-999 required=2.1
	tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1,
	DKIM_VALID_AU=-0.1, HEADER_FROM_DIFFERENT_DOMAINS=0.249,
	MAILING_LIST_MULTI=-1, RCVD_IN_DNSWL_NONE=-0.0001,
	SPF_HELO_NONE=0.001, SPF_PASS=-0.001]
	autolearn=unavailable autolearn_force=no
Authentication-Results: a.mx.secunet.com (amavisd-new);
	dkim=pass (1024-bit key) header.d=linux.dev
Received: from a.mx.secunet.com ([127.0.0.1])
	by localhost (a.mx.secunet.com [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id 6tvewKec8aDo for <peter.schumann@secunet.com>;
	Sat, 30 Mar 2024 02:06:31 +0100 (CET)
Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=147.75.48.161; helo=sy.mirrors.kernel.org; envelope-from=netdev+bounces-83486-peter.schumann=secunet.com@vger.kernel.org; receiver=peter.schumann@secunet.com 
DKIM-Filter: OpenDKIM Filter v2.11.0 a.mx.secunet.com 617F120519
Received: from sy.mirrors.kernel.org (sy.mirrors.kernel.org [147.75.48.161])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by a.mx.secunet.com (Postfix) with ESMTPS id 617F120519
	for <peter.schumann@secunet.com>; Sat, 30 Mar 2024 02:06:31 +0100 (CET)
Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by sy.mirrors.kernel.org (Postfix) with ESMTPS id 2491AB2154C
	for <peter.schumann@secunet.com>; Sat, 30 Mar 2024 01:06:27 +0000 (UTC)
Received: from localhost.localdomain (localhost.localdomain [127.0.0.1])
	by smtp.subspace.kernel.org (Postfix) with ESMTP id 7E78810FF;
	Sat, 30 Mar 2024 01:06:22 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="VugHsG2Y"
X-Original-To: netdev@vger.kernel.org
Received: from out-173.mta0.migadu.com (out-173.mta0.migadu.com [91.218.175.173])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 762E77E8
	for <netdev@vger.kernel.org>; Sat, 30 Mar 2024 01:06:19 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.173
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1711760782; cv=none; b=dASfLrrkRxmD6WmYvcvyTFgLXAgqW4qcP8FwVw/FT8ajSayU1k2jNzB6oEhlAk4YxWiFWUStYosH2VKaROBs5wKHQh4Rsxe59gs4L4KuJN+VlHKDa1iIm9ShtgGS6jAthHnsiMpAE+me1GueQZILnQSEjyu5ZoBpE9mg1Ojzukk=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1711760782; c=relaxed/simple;
	bh=0Tab6a8G72hAdQQTifWJdxmg84+y/tA9VvkWPFI9VQA=;
	h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From:
	 In-Reply-To:Content-Type; b=J9cTiVFLYlA2DR+fBNIsoiV/11LbFxExG+qAmCsON2fksIZjZEAFWqHQx2zJk8Dqn3t/Quqw4LH8Yjb2qqlthM0L82RcciykTG9EQ9SPWlqiRoPPhuerZSz/amNX1IgyImsufFdXk4+oiQpzCA0LsWzVgTdTI9x4oenmDhjahZI=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=VugHsG2Y; arc=none smtp.client-ip=91.218.175.173
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev
Message-ID: <08dd01e3-c45e-47d9-bcde-55f7d1edc480@linux.dev>
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1;
	t=1711760777;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=BFgFwDTUylEnErMgQV0Ufr9/Ufnl/0omKSnqywixpVg=;
	b=VugHsG2Y5vYBP20CvcOMmGaEI/5A/PvLnTsQTtTA0ZebTuac4nORyH8iRZe/CwFs5RLRhJ
	4Ih/prbwbd8/OSA0Wv9Z9Z9JdeLOJUf8/vLW1xeGCG/2qNeI4CXYcIw3EixotT7o6oviEg
	ZM4gfY/Y4bUjm5TsY8pyZBWQLZ0Jv74=
Date: Fri, 29 Mar 2024 18:06:09 -0700
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
List-Id: <netdev.vger.kernel.org>
List-Subscribe: <mailto:netdev+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:netdev+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Subject: Re: [PATCH 19/26] netfs: New writeback implementation
Content-Language: en-US
To: Naveen Mamindlapalli <naveenm@marvell.com>,
 David Howells <dhowells@redhat.com>, Christian Brauner
 <christian@brauner.io>, Jeff Layton <jlayton@kernel.org>,
 Gao Xiang <hsiangkao@linux.alibaba.com>,
 Dominique Martinet <asmadeus@codewreck.org>
Cc: Matthew Wilcox <willy@infradead.org>, Steve French <smfrench@gmail.com>,
 Marc Dionne <marc.dionne@auristor.com>, Paulo Alcantara <pc@manguebit.com>,
 Shyam Prasad N <sprasad@microsoft.com>, Tom Talpey <tom@talpey.com>,
 Eric Van Hensbergen <ericvh@kernel.org>, Ilya Dryomov <idryomov@gmail.com>,
 "netfs@lists.linux.dev" <netfs@lists.linux.dev>,
 "linux-cachefs@redhat.com" <linux-cachefs@redhat.com>,
 "linux-afs@lists.infradead.org" <linux-afs@lists.infradead.org>,
 "linux-cifs@vger.kernel.org" <linux-cifs@vger.kernel.org>,
 "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
 "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
 "v9fs@lists.linux.dev" <v9fs@lists.linux.dev>,
 "linux-erofs@lists.ozlabs.org" <linux-erofs@lists.ozlabs.org>,
 "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
 "linux-mm@kvack.org" <linux-mm@kvack.org>,
 "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
 Latchesar Ionkov <lucho@ionkov.net>,
 Christian Schoenebeck <linux_oss@crudebyte.com>
References: <20240328163424.2781320-1-dhowells@redhat.com>
 <20240328163424.2781320-20-dhowells@redhat.com>
 <SJ2PR18MB5635A86C024316BC5E57B79EA23A2@SJ2PR18MB5635.namprd18.prod.outlook.com>
X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers.
From: Vadim Fedorenko <vadim.fedorenko@linux.dev>
In-Reply-To: <SJ2PR18MB5635A86C024316BC5E57B79EA23A2@SJ2PR18MB5635.namprd18.prod.outlook.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Migadu-Flow: FLOW_OUT
Return-Path: netdev+bounces-83486-peter.schumann=secunet.com@vger.kernel.org
X-MS-Exchange-Organization-OriginalArrivalTime: 30 Mar 2024 01:06:35.2990
 (UTC)
X-MS-Exchange-Organization-Network-Message-Id: 407005da-cf0f-43eb-e721-08dc5055a526
X-MS-Exchange-Organization-OriginalClientIPAddress: 62.96.220.36
X-MS-Exchange-Organization-OriginalServerIPAddress: 10.53.40.202
X-MS-Exchange-Organization-Cross-Premises-Headers-Processed: cas-essen-02.secunet.de
X-MS-Exchange-Organization-OrderedPrecisionLatencyInProgress: LSRV=mbx-essen-02.secunet.de:TOTAL-HUB=25969.715|SMR=0.330(SMRDE=0.005|SMRC=0.324(SMRCL=0.103|X-SMRCR=0.323))|CAT=0.079(CATOS=0.001
 |CATRESL=0.033(CATRESLP2R=0.026)|CATORES=0.043(CATRS=0.043(CATRS-Index
 Routing Agent=0.041
 )))|QDM=5760.243|SMSC=0.016|SMS=2.529(SMSMBXD-INC=2.522)|UNK=0.001|QDM=5821.174|SMSC=0.014
 |SMS=3.274(SMSMBXD-INC=3.267)|QDM=8272.090|SMSC=0.022|SMS=4.740(SMSMBXD-INC=4.719
 )|QDM=6095.229|UNK=0.001|CAT=0.027(CATRESL=0.026(CATRESLP2R=0.012))|QDM=5.054|UNK=0.004
 |CAT=0.022(CATRESL=0.020(CATRESLP2R=0.009))|QDM=5.047|UNK=0.004|CAT=0.005(CATRESL=0.004
 (CATRESLP2R=0.002));2024-03-30T08:19:25.041Z
X-MS-Exchange-Forest-ArrivalHubServer: mbx-essen-02.secunet.de
X-MS-Exchange-Organization-AuthSource: cas-essen-02.secunet.de
X-MS-Exchange-Organization-AuthAs: Anonymous
X-MS-Exchange-Organization-FromEntityHeader: Internet
X-MS-Exchange-Organization-OriginalSize: 15012
X-MS-Exchange-Organization-HygienePolicy: Standard
X-MS-Exchange-Organization-MessageLatency: SRV=cas-essen-02.secunet.de:TOTAL-FE=0.025|SMR=0.022(SMRPI=0.020(SMRPI-FrontendProxyAgent=0.020))|SMS=0.002
X-MS-Exchange-Organization-Recipient-Limit-Verified: True
X-MS-Exchange-Organization-TotalRecipientCount: 1
X-MS-Exchange-Organization-Rules-Execution-History: 0b0cf904-14ac-4724-8bdf-482ee6223cf2%%%fd34672d-751c-45ae-a963-ed177fcabe23%%%d8080257-b0c3-47b4-b0db-23bc0c8ddb3c%%%95e591a2-5d7d-4afa-b1d0-7573d6c0a5d9%%%f7d0f6bc-4dcc-4876-8c5d-b3d6ddbb3d55%%%16355082-c50b-4214-9c7d-d39575f9f79b
X-MS-Exchange-Forest-RulesExecuted: mbx-essen-02
X-MS-Exchange-Organization-RulesExecuted: mbx-essen-02
X-MS-Exchange-Forest-IndexAgent-0: AQ0CZW4AAZIQAAAPAAADH4sIAAAAAAAEAJ1Ze29bx7E/lMSnHnYedp
 LeB/bmj1aKKdpxgqKRVUOp7da+iJ3AdpoCQSAsD5fkqQ7Psuchmr35
 HBfod7kf7v5mZnd5SFmpUUGmlruzs/P8zez6/776NlP3v7p774u79+
 /d/1J9fu/kiy/76oW+NCZTz/UsyUapnus0TdQit6U52e89fKiO6efb
 PJkkmU7Vc1MUemJ4kpf/mNvZiXqsL5ORemoXJk0LdTqayugsN6OpLg
 exnT1k6lcmK0/U62mVFyO97OPUPJ6q+7/rKy/SvS/Vd8+Z9rU9UY+m
 eVKUic7UH3JdZSZXp7GfOhvK1CCxDx+o/zbjsfpGL0ubqdO/pjw4uz
 B5ZtKBzScPHzDPP2mr/oK9E3U6LejvhbZnaZJVbwY6TYZ6qFnWB+qx
 hTmSv1WGRCyTzJS8/1QXMz0yVXEW25FZ5Ca+YO68+Ch+J0M8AMuynJ
 qF+iFJY/tGGC+SNF2eJdk4B389EpnVq9JcGtjYZDDTaTEb8+hsMtNJ
 GpjlsXqc2CwzwmmGicGIJ850RcayuaP9TlepVV+nsc5KnWt1Oo/PZr
 BCZYaJF+7VdKlnzOm7XBd6pF7g4DkPz2ZJnNvCjj3taztTr3U6N0t1
 WtrZWcljt/gkT2L1Z50xr6cmK4YmnyDSTg0WLqdrzlHP0qVWj/Olnd
 lLdZqMZFRXlNnAD+MCDivKYiBuG5nLB4qHx7GOp2Zct7Vf0WHTmoGF
 p9ucgOZygnBayeW3Z29bis18KikAEUx6leDyq+tFNbkNi/bvqR4W9f
 OY67i4hq/QzGZnF5dawu8B2QXU19EyP5m+SvKNLmG2Qufqmc0uyPpp
 FU/tWcLfBuAM94Q8lCzGusnMENEPajri3BZIibwameGyNLV8r4Z/NT
 FS/sfvvn796Kn6/Ku793/7k7jxRL1ADizypDRDKKKS2Tw1MwCELnE2
 bRcYmBoVVzkCv5R9yNNrd6k4N7o0RY0gN8hi2FnZMfOLbVYmk8pWhR
 rbNLFqpEutdDZSyMlMFWaOzChNulRlkoJTUQ1XLC6BQCBTxVzHkm7l
 IolNX9ksNuDnVk1OhMRzbZ4DdKCgUlIESbUamwWzSooCx5wEzZU6/P
 xIPQErkmVmiEdufkNbRknh9EjKpQJ7reIpEtmooSkXhOdkgAQ4V1rI
 kC4dQ/qxmUHIwpuZmAy7a8RDW077KinVrCpKJyS4Z3CVM8Ogxos1me
 kL6IEtU52PyECwqVWX8LtFKoorikFdrftH7Fa2fwHi7De0+ZI1rLkO
 SEb+AxCATQV1UyYwZNcxDWuSOOHUMZtdh++xrdIRrKKmVTaCOBQHam
 YmmgJ1XaovjtSzTI2rssrh0WdqAZQkZYpqPrd5CaapBoLl4kc1yXVW
 YQYe6KvFNAFCE4hvyJTkMN9kkuNEsfZYFXYWdC+BVBySOsmg4syOkn
 ECbTkoD5nrmusQlpnBOsSaWPoMcXXEijNvEcZzFZ4xH1/UmAn3wA4m
 quapBTiOhJN4b/2IYzWsyr7Px3TZr/EDWQFdoW/JUQM2RS1QkXCDel
 JLJKeFhV3Jw6ySKZWheKcYxfeqqIXDOf7mh0c0b96UuY6lJNO6HqbB
 omM0JCzyHH3KKuNg5kIno5rsVzgzuymEZiNKNoqDKMIpwIEHPJOpBD
 Uk3UxVn5+ZRewsSVAU2thI1NIZ7IM+qQgOY7W0lRonHK6flmaGINP5
 8tOaSU2eM3SQCEZnhbjCgaGgF0d7mhJMFRTyfqFgT0ws0rruI6vGqK
 gPNpLxquDjJC0d1jl2Q0MIQTYrIT3n5zFbqpxunEHiUiYQNsNL6hEq
 JSbKKWyAQyyBBKKjyPRcFcsCmj/YSENSytmHQkI7Xek7LAqwSGYzM0
 oEpskTwv6wgPBERDG9oXVaFVMJCzRt2Ot0McVRn7eQtVIzLoWTP5Ew
 lUiTbD3QVWrjC9KhJHj0+D6qvIcXVEqQBSmFGQrClFGoymgbBRIUXM
 MJP5+CXb5KE8BRSQhL6Lte6AifrsuOeTWbO69Jfs5zCwAqiuSS7MWY
 zmi/AC2KXZqatE+JHYof9ALRrGBHxClCj+iruVQ8igk0x8UUBvCxJq
 FZr5XoASBwSdn3JKEAo0JD1cUzl2qMXPJANdHzou8r8RovLBpkstvJ
 mwBXgNJLwC8lP/Mqkr8Dt6WaeFRDVz/JxKfMwbIkdMa1dL6uiGp1Z4
 xA3F9XksRHS+x7iaFFlYFMBorMEIwxdwwFYRqC1BnL5szPxxjRDbml
 QMeOUrVE6U3LZB4gDYAxmAzqYON/7uAW5v/VP2UkX+WWxoxOlPpZrf
 7VP39eMf35XzxkY5tsXNHdWc3dWQn2PZeck5Uw4c+Afn4OM5tSvRP7
 t2hS2/HWP3KTo7JxclWmt/75Fw8JzW0tO2uxRVmBcK9i7hrmJp8lpS
 SZTicWWT+dSdV2GU6ZS0nKGUZkGYqkq+i+iNf60kDBAOIIXEZSc4Pe
 IIUQDjAvE2qXUZSlqugUaThaUlp64bnG5rYsUxJCElSifD5PE8kxyv
 PcSjVklEECY//I6UidUS21uJHr0xk6WzrJCuT3gi2GqzpBq+vQnBJT
 M9tQlXYXMRpGkooLj0NEutTH1L2BSzJ2mIaM47qFbIRpsMuMVun/CN
 JTdY3z5Vwqu3QvQ+OsTw2PWKzQ1KxTx/4WXPTQn1ck2xLX+dxmMG9K
 XaTDH+Y4B8OwS6ovyg3AutYLi128RHyRg88Dn5m/bMySybQM8qJwjK
 tUyhaa2IK62xonqQVcxqdwH27upBi6X7uASVO9hBVR+WtuCD22vwCI
 kSvY0o7Ha21fLXwhV4HgQDXP2Gu5MeyYK1cg2sbtWO71Wej0gkyb22
 oiFqOGuY/t8YWWik3BD+TMWKf1DkUK8KoQ1tzDB4gYcsdIXAs4wYWX
 grbeQa9dDTPXihRifY6muS0Sbrulo55rP7KkfFWE0iU/5JDHz769W2
 XDajw2q8sTxRG9I9HfjaW33atiiyih29dArLUeZn55VaJHdU3qLb00
 LEWoQ2O4qPDxJoKIQsAnCM8NwxJBW1IFdL6q9zlVxrVcI2Jws8N5lS
 FbsIHL4GTOLTCp8mx1/YDIORmoxm51UTysM/nzc2bA90nCzItioecj
 ji9AIdEf8Y2ZjLLJ7cqFkMxJ+ixXsTfXyJ+iJi/yVtNB5D5qxW1VIl
 EZc64JFgZDsf3q5WFizXpIU6e+7pv1njY3fP56omuC5UubXpLcDCPc
 6hE0HnJg0kNXf+PyjEwYkQZ85x/SzsxMbEkN9qjvLtS10kSHcOPE6U
 spdFSPGumu6NiXz39gq8Tr2CmA654ysDZOk5gT0b1h2M1uW9B85Zsf
 /+sn9cIy5NO9jq5r1oF4lcUu43KG0XHyhnKFNP8UbfT5p9ztXdpEUD
 hOdUGgKR3iG3rkghyBi8BBsaoLsAzAfSRRzN064qF0l3S5zFfhoci6
 MrtYgd5Kh1doOM3oGPh4PFy++wt6HJ/807fuQOifYP/J82ugf4e3wE
 B79YX8l1/Hac/qOf/dnxBpX+2N+xfetwP5215ew+IvvggHqrc+NG/w
 uOaFdr+33/txMCiyZD4Y/ETfsOvO3c/kr/pM/cFMBAS5a7Fz9y7AmR
 Je4mo4sHrI8CzuykD6Q5H1PLHnPj8/o0CX2SGddc4nOZaHbtdFYuOh
 +ow++3x1OpceIDWZvITc+R93HIHXdSfByX9Tv1cvvv/mmwfXkWcIBh
 wUl29AWZs7pBrphiTG8cOL5Bxz86Mjx2vFcVaV5s05VaPDXxOn44eL
 IX+9SupEkoPkAVMMQFKvH3T8cHw+0wyR6DTrK/0Vv1/+CbvmVCBfPH
 n9x1fnP7x89vrJ66cvv/3+T0+PamYB7h0+e3X+5OXLQxLy6OjqIaKn
 1N130vT4IdzhbtQ/3vtpoC9RNaA+PGBqR9N7mXExwcZgCfqqNnMuNP
 VY4SPxq56j0lBYrol3BEBEzBpXqwdMu9/boPEtJEJ+FZbordeD8ojz
 hN7psgtuQdz/Szqe+70o2oq2txpRD78YRNs7jWazETUj/LZbUXs/Om
 hGNztRtx29txO1mlGrHXU7jeiDqIvV7aiLyU7Uw5YD/EatrWgHHDpR
 1In2Me5GPdB3G9GHNOhsRc2dqNmKOu1G9H7UYQ697WgHZ4Hm38CH+H
 e2oxY2dqIDHHQb24mmu0VkG6cTcSvaa9JMi8lIQvDvEP8uxpjhE/d2
 GtHHNGhDFyHrRrtQ/Ga07Tf2sNqItvHvVtSRScgGbtuk2o6cCG7bUa
 cX7bUa0XtsIpkUwfyhZJBbfAp/JZN+7KRty9f/YOZNJ3YXM79iPiyt
 o/mEacLMjme7miHDNuun/Ls/1DGJdqHOR1GnzvZ9ctO2bGmRhdur1W
 h3Y6ZLZuk0yVDCEwIf7Eb7TWJL4on8bZINbKOtxjZ9sk9l9SD6CAPo
 3mFXHkQ32mzeDvm03Yt2m86qpKl83YW/eKZJIdGW8bZzwU742mS9wK
 0RbcEprcYHjShqsGphHgEMZTHAXsxwEG7vRDewHadg7Gda/NnkGNhi
 gT/YYdtiHpNbnB2QOZwi8gThoZEcyoYl8baI5qDNEQ6esiRM8Nlt3B
 Y+zJ9ExXaRR6RlO+wzf7KYSCsSykGwFafkTQkbkZAPdcYX+UG5H30i
 cgYtRHhP3GVH3xLJxSD70UdtNm8QDEsiOSlLGQF92yzwXofz0dthj+
 WhwPZkuyGMg6gyaDbeFyPIoWK0louc3XAui7EtEopSImcwNUwhu2AN
 BrH3rmMrhmXV2t3ow5BxMhDD0kzNy8JBnNj0abvr1ZQl+Mjn1A6nTN
 fPN8WGcuhWtB9kEIIOh434osV4ImEjgyazFcMeRDclMuvGF3BDkItP
 OxxsDNcd76+2DIK7d8lfuy2/He4TYwZd6pEm6nPM7Av+i1OaHqyCQe
 rzlIzRzRYHrazit+fHAoNbnGiyPaQhywnJv25F/2i0ov9tNKNEJIKH
 96J9UQsGZqHaknMhuLZ8UgbbS8Jtc6S0vCVkiz9wt8fhucXwIILIXo
 yxhM/d6GDbRWJnx2dwXd1w7hYXkyaLGiJXCDrR+x7+d/xMd4NGNN1m
 AQ44rQNwitnYWp2WF5U3tkVN4eBT56DltRBKsXSTJRRuHtUka/eDVU
 OGhdRk1boBS5hty7sAmXcrYAMbaqee38gwSZQQL11X8FshslrRe/Uk
 lr1XsVbwANVDysgu41PLezbEO1sMUXFDmAh6NT26h/BYj5aDLjtaCI
 JTZJWxQQrmzbo8bZ+40iM0V9lzWxJIhAmgItywKjG24/EyALmHkwNB
 IJDtsT2hQs8jaDe6Hawne1sOg6lo4Fz+KhjzYYA0h9keG/gU0agnZ7
 FJ90IUrcdMb4MPs/oAbcK+g4erfRk1a1caPTfJjq53T90tB5n1LsNN
 As82ekB2/Wb3EY5b7/J4fq1F6pC0m82Xm6z1ayuy9qp1gh1+hVUxix
 hTGgoW4xMCdZafDdjdpl7MIR/QgCGfAgnytKgg4JRbG1s4JG7I5C5n
 685qTBt7Lo9a6FvbFK4fNWmytbM2eVt23WZLNl0LGVZbjUbUpWakW5
 s8lnGL/Os6xL3oYxjBT4LDx5jf4+Z6/Swa96hr7vjmCwQ3muRfZ0+q
 G43oP/3XvehWm33N3LqcYgfhq6vY1A67ON+LPmwBGf4f/nChsmgpAA
 ABCtYZPD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0idXRmLTE2
 Ij8+DQo8RW1haWxTZXQ+DQogIDxWZXJzaW9uPjE1LjAuMC4wPC9WZX
 JzaW9uPg0KICA8RW1haWxzPg0KICAgIDxFbWFpbCBTdGFydEluZGV4
 PSIxMDUiIFBvc2l0aW9uPSJPdGhlciI+DQogICAgICA8RW1haWxTdH
 Jpbmc+ZGhvd2VsbHNAcmVkaGF0LmNvbTwvRW1haWxTdHJpbmc+DQog
 ICAgPC9FbWFpbD4NCiAgICA8RW1haWwgU3RhcnRJbmRleD0iMTk3Ii
 BQb3NpdGlvbj0iT3RoZXIiPg0KICAgICAgPEVtYWlsU3RyaW5nPmNo
 cmlzdGlhbkBicmF1bmVyLmlvPC9FbWFpbFN0cmluZz4NCiAgICA8L0
 VtYWlsPg0KICAgIDxFbWFpbCBTdGFydEluZGV4PSIyMzMiIFBvc2l0
 aW9uPSJPdGhlciI+DQogICAgICA8RW1haWxTdHJpbmc+amxheXRvbk
 BrZXJuZWwub3JnPC9FbWFpbFN0cmluZz4NCiAgICA8L0VtYWlsPg0K
 ICAgIDxFbWFpbCBTdGFydEluZGV4PSIyNjkiIFBvc2l0aW9uPSJPdG
 hlciI+DQogICAgICA8RW1haWxTdHJpbmc+aHNpYW5na2FvQGxpbnV4
 LmFsaWJhYmEuY29tPC9FbWFpbFN0cmluZz4NCiAgICA8L0VtYWlsPg
 0KICAgIDxFbWFpbCBTdGFydEluZGV4PSIzMjMiIFBvc2l0aW9uPSJP
 dGhlciI+DQogICAgICA8RW1haWxTdHJpbmc+YXNtYWRldXNAY29kZX
 dyZWNrLm9yZzwvRW1haWxTdHJpbmc+DQogICAgPC9FbWFpbD4NCiAg
 ICA8RW1haWwgU3RhcnRJbmRleD0iNDEyIiBQb3NpdGlvbj0iT3RoZX
 IiPg0KICAgICAgPEVtYWlsU3RyaW5nPndpbGx5QGluZnJhZGVhZC5v
 cmc8L0VtYWlsU3RyaW5nPg0KICAgIDwvRW1haWw+DQogICAgPEVtYW
 lsIFN0YXJ0SW5kZXg9IjQ0OCIgUG9zaXRpb249Ik90aGVyIj4NCiAg
 ICAgIDxFbWFpbFN0cmluZz5zbWZyZW5jaEBnbWFpbC5jb208L0VtYW
 lsU3RyaW5nPg0KICAgIDwvRW1haWw+DQogICAgPEVtYWlsIFN0YXJ0
 SW5kZXg9IjQ4NiIgUG9zaXRpb249Ik90aGVyIj4NCiAgICAgIDxFbW
 FpbFN0cmluZz5tYXJjLmRpb25uZUBhdXJpc3Rvci5jb208L0VtYWls
 U3RyaW5nPg0KICAgIDwvRW1haWw+DQogICAgPEVtYWlsIFN0YXJ0SW
 5kZXg9IjUzMCIgUG9zaXRpb249Ik90aGVyIj4NCiAgICAgIDxFbWFp
 bFN0cmluZz5wY0BtYW5ndWViaXQuY29tPC9FbWFpbFN0cmluZz4NCi
 AgICA8L0VtYWlsPg0KICAgIDxFbWFpbCBTdGFydEluZGV4PSI1Njki
 IFBvc2l0aW9uPSJPdGhlciI+DQogICAgICA8RW1haWxTdHJpbmc+c3
 ByYXNhZEBtaWNyb3NvZnQuY29tPC9FbWFpbFN0cmluZz4NCiAgICA8
 L0VtYWlsPg0KICAgIDxFbWFpbCBTdGFydEluZGV4PSI2MDUiIFBvc2
 l0aW9uPSJPdGhlciI+DQogICAgICA8RW1haWxTdHJpbmc+dG9tQHRh
 bHBleS5jb208L0VtYWlsU3RyaW5nPg0KICAgIDwvRW1haWw+DQogIC
 AgPEVtYWlsIFN0YXJ0SW5kZXg9IjY0NyIgUG9zaXRpb249Ik90aGVy
 Ij4NCiAgICAgIDxFbWFpbFN0cmluZz5lcmljdmhAa2VybmVsLm9yZz
 wvRW1haWxTdHJpbmc+DQogICAgPC9FbWFpbD4NCiAgICA8RW1haWwg
 U3RhcnRJbmRleD0iNjgxIiBQb3NpdGlvbj0iT3RoZXIiPg0KICAgIC
 AgPEVtYWlsU3RyaW5nPmlkcnlvbW92QGdtYWlsLmNvbTwvRW1haWxT
 dHJpbmc+DQogICAgPC9FbWFpbD4NCiAgICA8RW1haWwgU3RhcnRJbm
 RleD0iNzA2IiBQb3NpdGlvbj0iT3RoZXIiPg0KICAgICAgPEVtYWls
 U3RyaW5nPm5ldGZzQGxpc3RzLmxpbnV4LmRldjwvRW1haWxTdHJpbm
 c+DQogICAgPC9FbWFpbD4NCiAgICA8RW1haWwgU3RhcnRJbmRleD0i
 NzI5IiBQb3NpdGlvbj0iT3RoZXIiPg0KICAgICAgPEVtYWlsU3RyaW
 5nPmxpbnV4LWNhY2hlZnNAcmVkaGF0LmNvbTwvRW1haWxTdHJpbmc+
 DQogICAgPC9FbWFpbD4NCiAgICA8RW1haWwgU3RhcnRJbmRleD0iNz
 U1IiBQb3NpdGlvbj0iT3RoZXIiPg0KICAgICAgPEVtYWlsU3RyaW5n
 PmxpbnV4LWFmc0BsaXN0cy5pbmZyYWRlYWQub3JnPC9FbWFpbFN0cm
 luZz4NCiAgICA8L0VtYWlsPg0KICAgIDxFbWFpbCBTdGFydEluZGV4
 PSI3OTAiIFBvc2l0aW9uPSJPdGhlciI+DQogICAgICA8RW1haWxTdH
 Jpbmc+bGludXgtY2lmc0B2Z2VyLmtlcm5lbC5vcmc8L0VtYWlsU3Ry
 aW5nPg0KICAgIDwvRW1haWw+DQogICAgPEVtYWlsIFN0YXJ0SW5kZX
 g9IjgxOCIgUG9zaXRpb249Ik90aGVyIj4NCiAgICAgIDxFbWFpbFN0
 cmluZz5saW51eC1uZnNAdmdlci5rZXJuZWwub3JnPC9FbWFpbFN0cm
 luZz4NCiAgICA8L0VtYWlsPg0KICAgIDxFbWFpbCBTdGFydEluZGV4
 PSI4NTUiIFBvc2l0aW9uPSJPdGhlciI+DQogICAgICA8RW1haWxTdH
 Jpbmc+ZGV2ZWxAdmdlci5rZXJuZWwub3JnPC9FbWFpbFN0cmluZz4N
 CiAgICA8L0VtYWlsPg0KICAgIDxFbWFpbCBTdGFydEluZGV4PSI4Nz
 giIFBvc2l0aW9uPSJPdGhlciI+DQogICAgICA8RW1haWxTdHJpbmc+
 djlmc0BsaXN0cy5saW51eC5kZXY8L0VtYWlsU3RyaW5nPg0KICAgID
 wvRW1haWw+DQogICAgPEVtYWlsIFN0YXJ0SW5kZXg9IjkwMCIgUG9z
 aXRpb249Ik90aGVyIj4NCiAgICAgIDxFbWFpbFN0cmluZz5saW51eC
 1lcm9mc0BsaXN0cy5vemxhYnMub3JnPC9FbWFpbFN0cmluZz4NCiAg
 ICA8L0VtYWlsPg0KICAgIDxFbWFpbCBTdGFydEluZGV4PSI5NDEiIF
 Bvc2l0aW9uPSJPdGhlciI+DQogICAgICA8RW1haWxTdHJpbmc+ZnNk
 ZXZlbEB2Z2VyLmtlcm5lbC5vcmc8L0VtYWlsU3RyaW5nPg0KICAgID
 wvRW1haWw+DQogICAgPEVtYWlsIFN0YXJ0SW5kZXg9Ijk2NiIgUG9z
 aXRpb249Ik90aGVyIj4NCiAgICAgIDxFbWFpbFN0cmluZz5saW51eC
 1tbUBrdmFjay5vcmc8L0VtYWlsU3RyaW5nPg0KICAgIDwvRW1haWw+
 DQogICAgPEVtYWlsIFN0YXJ0SW5kZXg9Ijk4NiIgUG9zaXRpb249Ik
 90aGVyIj4NCiAgICAgIDxFbWFpbFN0cmluZz5uZXRkZXZAdmdlci5r
 ZXJuZWwub3JnPC9FbWFpbFN0cmluZz4NCiAgICA8L0VtYWlsPg0KIC
 AgIDxFbWFpbCBTdGFydEluZGV4PSIxMDIxIiBQb3NpdGlvbj0iT3Ro
 ZXIiPg0KICAgICAgPEVtYWlsU3RyaW5nPmtlcm5lbEB2Z2VyLmtlcm
 5lbC5vcmc8L0VtYWlsU3RyaW5nPg0KICAgIDwvRW1haWw+DQogICAg
 PEVtYWlsIFN0YXJ0SW5kZXg9IjEwNjMiIFBvc2l0aW9uPSJPdGhlci
 I+DQogICAgICA8RW1haWxTdHJpbmc+bHVjaG9AaW9ua292Lm5ldDwv
 RW1haWxTdHJpbmc+DQogICAgPC9FbWFpbD4NCiAgICA8RW1haWwgU3
 RhcnRJbmRleD0iMTEwOSIgUG9zaXRpb249Ik90aGVyIj4NCiAgICAg
 IDxFbWFpbFN0cmluZz5saW51eF9vc3NAY3J1ZGVieXRlLmNvbTwvRW
 1haWxTdHJpbmc+DQogICAgPC9FbWFpbD4NCiAgPC9FbWFpbHM+DQo8
 L0VtYWlsU2V0PgEOzgFSZXRyaWV2ZXJPcGVyYXRvciwxMCwwO1JldH
 JpZXZlck9wZXJhdG9yLDExLDM7UG9zdERvY1BhcnNlck9wZXJhdG9y
 LDEwLDI7UG9zdERvY1BhcnNlck9wZXJhdG9yLDExLDA7UG9zdFdvcm
 RCcmVha2VyRGlhZ25vc3RpY09wZXJhdG9yLDEwLDY7UG9zdFdvcmRC
 cmVha2VyRGlhZ25vc3RpY09wZXJhdG9yLDExLDA7VHJhbnNwb3J0V3
 JpdGVyUHJvZHVjZXIsMjAsOQ==
X-MS-Exchange-Forest-IndexAgent: 1 7753
X-MS-Exchange-Forest-EmailMessageHash: E2F20B10
X-MS-Exchange-Forest-Language: en
X-MS-Exchange-Organization-Processed-By-Journaling: Journal Agent
X-MS-Exchange-Organization-Transport-Properties: DeliveryPriority=Low
X-MS-Exchange-Organization-Prioritization: 2:RC:REDACTED-015ab0b5b79b98c9c07fb4b31a88d5d2@secunet.com:12/10|SR
X-MS-Exchange-Organization-IncludeInSla: False:RecipientCountThresholdExceeded

On 29/03/2024 10:34, Naveen Mamindlapalli wrote:
>> -----Original Message-----
>> From: David Howells <dhowells@redhat.com>
>> Sent: Thursday, March 28, 2024 10:04 PM
>> To: Christian Brauner <christian@brauner.io>; Jeff Layton <jlayton@kernel.org>;
>> Gao Xiang <hsiangkao@linux.alibaba.com>; Dominique Martinet
>> <asmadeus@codewreck.org>
>> Cc: David Howells <dhowells@redhat.com>; Matthew Wilcox
>> <willy@infradead.org>; Steve French <smfrench@gmail.com>; Marc Dionne
>> <marc.dionne@auristor.com>; Paulo Alcantara <pc@manguebit.com>; Shyam
>> Prasad N <sprasad@microsoft.com>; Tom Talpey <tom@talpey.com>; Eric Van
>> Hensbergen <ericvh@kernel.org>; Ilya Dryomov <idryomov@gmail.com>;
>> netfs@lists.linux.dev; linux-cachefs@redhat.com; linux-afs@lists.infradead.org;
>> linux-cifs@vger.kernel.org; linux-nfs@vger.kernel.org; ceph-
>> devel@vger.kernel.org; v9fs@lists.linux.dev; linux-erofs@lists.ozlabs.org; linux-
>> fsdevel@vger.kernel.org; linux-mm@kvack.org; netdev@vger.kernel.org; linux-
>> kernel@vger.kernel.org; Latchesar Ionkov <lucho@ionkov.net>; Christian
>> Schoenebeck <linux_oss@crudebyte.com>
>> Subject: [PATCH 19/26] netfs: New writeback implementation
>>
>> The current netfslib writeback implementation creates writeback requests of
>> contiguous folio data and then separately tiles subrequests over the space
>> twice, once for the server and once for the cache.  This creates a few
>> issues:
>>
>>   (1) Every time there's a discontiguity or a change between writing to only
>>       one destination or writing to both, it must create a new request.
>>       This makes it harder to do vectored writes.
>>
>>   (2) The folios don't have the writeback mark removed until the end of the
>>       request - and a request could be hundreds of megabytes.
>>
>>   (3) In future, I want to support a larger cache granularity, which will
>>       require aggregation of some folios that contain unmodified data (which
>>       only need to go to the cache) and some which contain modifications
>>       (which need to be uploaded and stored to the cache) - but, currently,
>>       these are treated as discontiguous.
>>
>> There's also a move to get everyone to use writeback_iter() to extract
>> writable folios from the pagecache.  That said, currently writeback_iter()
>> has some issues that make it less than ideal:
>>
>>   (1) there's no way to cancel the iteration, even if you find a "temporary"
>>       error that means the current folio and all subsequent folios are going
>>       to fail;
>>
>>   (2) there's no way to filter the folios being written back - something
>>       that will impact Ceph with it's ordered snap system;
>>
>>   (3) and if you get a folio you can't immediately deal with (say you need
>>       to flush the preceding writes), you are left with a folio hanging in
>>       the locked state for the duration, when really we should unlock it and
>>       relock it later.
>>
>> In this new implementation, I use writeback_iter() to pump folios,
>> progressively creating two parallel, but separate streams and cleaning up
>> the finished folios as the subrequests complete.  Either or both streams
>> can contain gaps, and the subrequests in each stream can be of variable
>> size, don't need to align with each other and don't need to align with the
>> folios.
>>
>> Indeed, subrequests can cross folio boundaries, may cover several folios or
>> a folio may be spanned by multiple folios, e.g.:
>>
>>           +---+---+-----+-----+---+----------+
>> Folios:  |   |   |     |     |   |          |
>>           +---+---+-----+-----+---+----------+
>>
>>             +------+------+     +----+----+
>> Upload:    |      |      |.....|    |    |
>>             +------+------+     +----+----+
>>
>>           +------+------+------+------+------+
>> Cache:   |      |      |      |      |      |
>>           +------+------+------+------+------+
>>
>> The progressive subrequest construction permits the algorithm to be
>> preparing both the next upload to the server and the next write to the
>> cache whilst the previous ones are already in progress.  Throttling can be
>> applied to control the rate of production of subrequests - and, in any
>> case, we probably want to write them to the server in ascending order,
>> particularly if the file will be extended.
>>
>> Content crypto can also be prepared at the same time as the subrequests and
>> run asynchronously, with the prepped requests being stalled until the
>> crypto catches up with them.  This might also be useful for transport
>> crypto, but that happens at a lower layer, so probably would be harder to
>> pull off.
>>
>> The algorithm is split into three parts:
>>
>>   (1) The issuer.  This walks through the data, packaging it up, encrypting
>>       it and creating subrequests.  The part of this that generates
>>       subrequests only deals with file positions and spans and so is usable
>>       for DIO/unbuffered writes as well as buffered writes.
>>
>>   (2) The collector. This asynchronously collects completed subrequests,
>>       unlocks folios, frees crypto buffers and performs any retries.  This
>>       runs in a work queue so that the issuer can return to the caller for
>>       writeback (so that the VM can have its kswapd thread back) or async
>>       writes.
>>
>>   (3) The retryer.  This pauses the issuer, waits for all outstanding
>>       subrequests to complete and then goes through the failed subrequests
>>       to reissue them.  This may involve reprepping them (with cifs, the
>>       credits must be renegotiated, and a subrequest may need splitting),
>>       and doing RMW for content crypto if there's a conflicting change on
>>       the server.
>>
>> [!] Note that some of the functions are prefixed with "new_" to avoid
>> clashes with existing functions.  These will be renamed in a later patch
>> that cuts over to the new algorithm.
>>
>> Signed-off-by: David Howells <dhowells@redhat.com>
>> cc: Jeff Layton <jlayton@kernel.org>
>> cc: Eric Van Hensbergen <ericvh@kernel.org>
>> cc: Latchesar Ionkov <lucho@ionkov.net>
>> cc: Dominique Martinet <asmadeus@codewreck.org>
>> cc: Christian Schoenebeck <linux_oss@crudebyte.com>
>> cc: Marc Dionne <marc.dionne@auristor.com>
>> cc: v9fs@lists.linux.dev
>> cc: linux-afs@lists.infradead.org
>> cc: netfs@lists.linux.dev
>> cc: linux-fsdevel@vger.kernel.org

[..snip..]

>> +/*
>> + * Begin a write operation for writing through the pagecache.
>> + */
>> +struct netfs_io_request *new_netfs_begin_writethrough(struct kiocb *iocb, size_t
>> len)
>> +{
>> +	struct netfs_io_request *wreq = NULL;
>> +	struct netfs_inode *ictx = netfs_inode(file_inode(iocb->ki_filp));
>> +
>> +	mutex_lock(&ictx->wb_lock);
>> +
>> +	wreq = netfs_create_write_req(iocb->ki_filp->f_mapping, iocb->ki_filp,
>> +				      iocb->ki_pos, NETFS_WRITETHROUGH);
>> +	if (IS_ERR(wreq))
>> +		mutex_unlock(&ictx->wb_lock);
>> +
>> +	wreq->io_streams[0].avail = true;
>> +	trace_netfs_write(wreq, netfs_write_trace_writethrough);
> 
> Missing mutex_unlock() before return.
> 

mutex_unlock() happens in new_netfs_end_writethrough()

> Thanks,
> Naveen
> 


X-sender: <netdev+bounces-83486-steffen.klassert=secunet.com@vger.kernel.org>
X-Receiver: <steffen.klassert@secunet.com> ORCPT=rfc822;steffen.klassert@secunet.com
X-CreatedBy: MSExchange15
X-HeloDomain: mbx-dresden-01.secunet.de
X-ExtendedProps: BQBjAAoANE2mlidQ3AgFADcAAgAADwA8AAAATWljcm9zb2Z0LkV4Y2hhbmdlLlRyYW5zcG9ydC5NYWlsUmVjaXBpZW50Lk9yZ2FuaXphdGlvblNjb3BlEQAAAAAAAAAAAAAAAAAAAAAADwA/AAAATWljcm9zb2Z0LkV4Y2hhbmdlLlRyYW5zcG9ydC5EaXJlY3RvcnlEYXRhLk1haWxEZWxpdmVyeVByaW9yaXR5DwADAAAATG93
X-Source: SMTP:Default MBX-ESSEN-02
X-SourceIPAddress: 10.53.40.199
X-EndOfInjectedXHeaders: 15568
Received: from mbx-dresden-01.secunet.de (10.53.40.199) by
 mbx-essen-02.secunet.de (10.53.40.198) with Microsoft SMTP Server
 (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id
 15.1.2507.37; Sat, 30 Mar 2024 02:06:35 +0100
Received: from a.mx.secunet.com (62.96.220.36) by cas-essen-01.secunet.de
 (10.53.40.201) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.35 via Frontend
 Transport; Sat, 30 Mar 2024 02:06:34 +0100
Received: from localhost (localhost [127.0.0.1])
	by a.mx.secunet.com (Postfix) with ESMTP id B2D8120847
	for <steffen.klassert@secunet.com>; Sat, 30 Mar 2024 02:06:34 +0100 (CET)
X-Virus-Scanned: by secunet
X-Spam-Flag: NO
X-Spam-Score: -2.751
X-Spam-Level:
X-Spam-Status: No, score=-2.751 tagged_above=-999 required=2.1
	tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1,
	DKIM_VALID_AU=-0.1, HEADER_FROM_DIFFERENT_DOMAINS=0.249,
	MAILING_LIST_MULTI=-1, RCVD_IN_DNSWL_NONE=-0.0001,
	SPF_HELO_NONE=0.001, SPF_PASS=-0.001] autolearn=ham autolearn_force=no
Authentication-Results: a.mx.secunet.com (amavisd-new);
	dkim=pass (1024-bit key) header.d=linux.dev
Received: from a.mx.secunet.com ([127.0.0.1])
	by localhost (a.mx.secunet.com [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id rTuZ69sK9a_2 for <steffen.klassert@secunet.com>;
	Sat, 30 Mar 2024 02:06:31 +0100 (CET)
Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=147.75.199.223; helo=ny.mirrors.kernel.org; envelope-from=netdev+bounces-83486-steffen.klassert=secunet.com@vger.kernel.org; receiver=steffen.klassert@secunet.com 
DKIM-Filter: OpenDKIM Filter v2.11.0 a.mx.secunet.com 29E82201E5
Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org [147.75.199.223])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by a.mx.secunet.com (Postfix) with ESMTPS id 29E82201E5
	for <steffen.klassert@secunet.com>; Sat, 30 Mar 2024 02:06:31 +0100 (CET)
Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by ny.mirrors.kernel.org (Postfix) with ESMTPS id E8EA61C21430
	for <steffen.klassert@secunet.com>; Sat, 30 Mar 2024 01:06:29 +0000 (UTC)
Received: from localhost.localdomain (localhost.localdomain [127.0.0.1])
	by smtp.subspace.kernel.org (Postfix) with ESMTP id EB85617D2;
	Sat, 30 Mar 2024 01:06:22 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="VugHsG2Y"
X-Original-To: netdev@vger.kernel.org
Received: from out-173.mta0.migadu.com (out-173.mta0.migadu.com [91.218.175.173])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 762E77E8
	for <netdev@vger.kernel.org>; Sat, 30 Mar 2024 01:06:19 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.173
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1711760782; cv=none; b=dASfLrrkRxmD6WmYvcvyTFgLXAgqW4qcP8FwVw/FT8ajSayU1k2jNzB6oEhlAk4YxWiFWUStYosH2VKaROBs5wKHQh4Rsxe59gs4L4KuJN+VlHKDa1iIm9ShtgGS6jAthHnsiMpAE+me1GueQZILnQSEjyu5ZoBpE9mg1Ojzukk=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1711760782; c=relaxed/simple;
	bh=0Tab6a8G72hAdQQTifWJdxmg84+y/tA9VvkWPFI9VQA=;
	h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From:
	 In-Reply-To:Content-Type; b=J9cTiVFLYlA2DR+fBNIsoiV/11LbFxExG+qAmCsON2fksIZjZEAFWqHQx2zJk8Dqn3t/Quqw4LH8Yjb2qqlthM0L82RcciykTG9EQ9SPWlqiRoPPhuerZSz/amNX1IgyImsufFdXk4+oiQpzCA0LsWzVgTdTI9x4oenmDhjahZI=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=VugHsG2Y; arc=none smtp.client-ip=91.218.175.173
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev
Message-ID: <08dd01e3-c45e-47d9-bcde-55f7d1edc480@linux.dev>
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1;
	t=1711760777;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=BFgFwDTUylEnErMgQV0Ufr9/Ufnl/0omKSnqywixpVg=;
	b=VugHsG2Y5vYBP20CvcOMmGaEI/5A/PvLnTsQTtTA0ZebTuac4nORyH8iRZe/CwFs5RLRhJ
	4Ih/prbwbd8/OSA0Wv9Z9Z9JdeLOJUf8/vLW1xeGCG/2qNeI4CXYcIw3EixotT7o6oviEg
	ZM4gfY/Y4bUjm5TsY8pyZBWQLZ0Jv74=
Date: Fri, 29 Mar 2024 18:06:09 -0700
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
List-Id: <netdev.vger.kernel.org>
List-Subscribe: <mailto:netdev+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:netdev+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Subject: Re: [PATCH 19/26] netfs: New writeback implementation
Content-Language: en-US
To: Naveen Mamindlapalli <naveenm@marvell.com>,
 David Howells <dhowells@redhat.com>, Christian Brauner
 <christian@brauner.io>, Jeff Layton <jlayton@kernel.org>,
 Gao Xiang <hsiangkao@linux.alibaba.com>,
 Dominique Martinet <asmadeus@codewreck.org>
Cc: Matthew Wilcox <willy@infradead.org>, Steve French <smfrench@gmail.com>,
 Marc Dionne <marc.dionne@auristor.com>, Paulo Alcantara <pc@manguebit.com>,
 Shyam Prasad N <sprasad@microsoft.com>, Tom Talpey <tom@talpey.com>,
 Eric Van Hensbergen <ericvh@kernel.org>, Ilya Dryomov <idryomov@gmail.com>,
 "netfs@lists.linux.dev" <netfs@lists.linux.dev>,
 "linux-cachefs@redhat.com" <linux-cachefs@redhat.com>,
 "linux-afs@lists.infradead.org" <linux-afs@lists.infradead.org>,
 "linux-cifs@vger.kernel.org" <linux-cifs@vger.kernel.org>,
 "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
 "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
 "v9fs@lists.linux.dev" <v9fs@lists.linux.dev>,
 "linux-erofs@lists.ozlabs.org" <linux-erofs@lists.ozlabs.org>,
 "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
 "linux-mm@kvack.org" <linux-mm@kvack.org>,
 "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
 Latchesar Ionkov <lucho@ionkov.net>,
 Christian Schoenebeck <linux_oss@crudebyte.com>
References: <20240328163424.2781320-1-dhowells@redhat.com>
 <20240328163424.2781320-20-dhowells@redhat.com>
 <SJ2PR18MB5635A86C024316BC5E57B79EA23A2@SJ2PR18MB5635.namprd18.prod.outlook.com>
X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers.
From: Vadim Fedorenko <vadim.fedorenko@linux.dev>
In-Reply-To: <SJ2PR18MB5635A86C024316BC5E57B79EA23A2@SJ2PR18MB5635.namprd18.prod.outlook.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Migadu-Flow: FLOW_OUT
Return-Path: netdev+bounces-83486-steffen.klassert=secunet.com@vger.kernel.org
X-MS-Exchange-Organization-OriginalArrivalTime: 30 Mar 2024 01:06:34.7524
 (UTC)
X-MS-Exchange-Organization-Network-Message-Id: 87894eac-5cd8-4523-8273-08dc5055a4d3
X-MS-Exchange-Organization-OriginalClientIPAddress: 62.96.220.36
X-MS-Exchange-Organization-OriginalServerIPAddress: 10.53.40.201
X-MS-Exchange-Organization-Cross-Premises-Headers-Processed: cas-essen-01.secunet.de
X-MS-Exchange-Organization-OrderedPrecisionLatencyInProgress: LSRV=cas-essen-01.secunet.de:TOTAL-FE=0.012|SMR=0.012(SMRPI=0.009(SMRPI-FrontendProxyAgent=0.009));2024-03-30T01:06:34.764Z
X-MS-Exchange-Forest-ArrivalHubServer: mbx-essen-02.secunet.de
X-MS-Exchange-Organization-AuthSource: cas-essen-01.secunet.de
X-MS-Exchange-Organization-AuthAs: Anonymous
X-MS-Exchange-Organization-OriginalSize: 15019
X-MS-Exchange-Organization-Transport-Properties: DeliveryPriority=Low
X-MS-Exchange-Organization-Prioritization: 2:ShadowRedundancy
X-MS-Exchange-Organization-IncludeInSla: False:ShadowRedundancy

On 29/03/2024 10:34, Naveen Mamindlapalli wrote:
>> -----Original Message-----
>> From: David Howells <dhowells@redhat.com>
>> Sent: Thursday, March 28, 2024 10:04 PM
>> To: Christian Brauner <christian@brauner.io>; Jeff Layton <jlayton@kernel.org>;
>> Gao Xiang <hsiangkao@linux.alibaba.com>; Dominique Martinet
>> <asmadeus@codewreck.org>
>> Cc: David Howells <dhowells@redhat.com>; Matthew Wilcox
>> <willy@infradead.org>; Steve French <smfrench@gmail.com>; Marc Dionne
>> <marc.dionne@auristor.com>; Paulo Alcantara <pc@manguebit.com>; Shyam
>> Prasad N <sprasad@microsoft.com>; Tom Talpey <tom@talpey.com>; Eric Van
>> Hensbergen <ericvh@kernel.org>; Ilya Dryomov <idryomov@gmail.com>;
>> netfs@lists.linux.dev; linux-cachefs@redhat.com; linux-afs@lists.infradead.org;
>> linux-cifs@vger.kernel.org; linux-nfs@vger.kernel.org; ceph-
>> devel@vger.kernel.org; v9fs@lists.linux.dev; linux-erofs@lists.ozlabs.org; linux-
>> fsdevel@vger.kernel.org; linux-mm@kvack.org; netdev@vger.kernel.org; linux-
>> kernel@vger.kernel.org; Latchesar Ionkov <lucho@ionkov.net>; Christian
>> Schoenebeck <linux_oss@crudebyte.com>
>> Subject: [PATCH 19/26] netfs: New writeback implementation
>>
>> The current netfslib writeback implementation creates writeback requests of
>> contiguous folio data and then separately tiles subrequests over the space
>> twice, once for the server and once for the cache.  This creates a few
>> issues:
>>
>>   (1) Every time there's a discontiguity or a change between writing to only
>>       one destination or writing to both, it must create a new request.
>>       This makes it harder to do vectored writes.
>>
>>   (2) The folios don't have the writeback mark removed until the end of the
>>       request - and a request could be hundreds of megabytes.
>>
>>   (3) In future, I want to support a larger cache granularity, which will
>>       require aggregation of some folios that contain unmodified data (which
>>       only need to go to the cache) and some which contain modifications
>>       (which need to be uploaded and stored to the cache) - but, currently,
>>       these are treated as discontiguous.
>>
>> There's also a move to get everyone to use writeback_iter() to extract
>> writable folios from the pagecache.  That said, currently writeback_iter()
>> has some issues that make it less than ideal:
>>
>>   (1) there's no way to cancel the iteration, even if you find a "temporary"
>>       error that means the current folio and all subsequent folios are going
>>       to fail;
>>
>>   (2) there's no way to filter the folios being written back - something
>>       that will impact Ceph with it's ordered snap system;
>>
>>   (3) and if you get a folio you can't immediately deal with (say you need
>>       to flush the preceding writes), you are left with a folio hanging in
>>       the locked state for the duration, when really we should unlock it and
>>       relock it later.
>>
>> In this new implementation, I use writeback_iter() to pump folios,
>> progressively creating two parallel, but separate streams and cleaning up
>> the finished folios as the subrequests complete.  Either or both streams
>> can contain gaps, and the subrequests in each stream can be of variable
>> size, don't need to align with each other and don't need to align with the
>> folios.
>>
>> Indeed, subrequests can cross folio boundaries, may cover several folios or
>> a folio may be spanned by multiple subrequests, e.g.:
>>
>>          +---+---+-----+-----+---+----------+
>> Folios:  |   |   |     |     |   |          |
>>          +---+---+-----+-----+---+----------+
>>
>>            +------+------+     +----+----+
>> Upload:    |      |      |.....|    |    |
>>            +------+------+     +----+----+
>>
>>          +------+------+------+------+------+
>> Cache:   |      |      |      |      |      |
>>          +------+------+------+------+------+
>>
>> The progressive subrequest construction permits the algorithm to be
>> preparing both the next upload to the server and the next write to the
>> cache whilst the previous ones are already in progress.  Throttling can be
>> applied to control the rate of production of subrequests - and, in any
>> case, we probably want to write them to the server in ascending order,
>> particularly if the file will be extended.
>>
>> Content crypto can also be prepared at the same time as the subrequests and
>> run asynchronously, with the prepped requests being stalled until the
>> crypto catches up with them.  This might also be useful for transport
>> crypto, but that happens at a lower layer, so probably would be harder to
>> pull off.
>>
>> The algorithm is split into three parts:
>>
>>   (1) The issuer.  This walks through the data, packaging it up, encrypting
>>       it and creating subrequests.  The part of this that generates
>>       subrequests only deals with file positions and spans and so is usable
>>       for DIO/unbuffered writes as well as buffered writes.
>>
>>   (2) The collector. This asynchronously collects completed subrequests,
>>       unlocks folios, frees crypto buffers and performs any retries.  This
>>       runs in a work queue so that the issuer can return to the caller for
>>       writeback (so that the VM can have its kswapd thread back) or async
>>       writes.
>>
>>   (3) The retryer.  This pauses the issuer, waits for all outstanding
>>       subrequests to complete and then goes through the failed subrequests
>>       to reissue them.  This may involve reprepping them (with cifs, the
>>       credits must be renegotiated, and a subrequest may need splitting),
>>       and doing RMW for content crypto if there's a conflicting change on
>>       the server.
>>
>> [!] Note that some of the functions are prefixed with "new_" to avoid
>> clashes with existing functions.  These will be renamed in a later patch
>> that cuts over to the new algorithm.
>>
>> Signed-off-by: David Howells <dhowells@redhat.com>
>> cc: Jeff Layton <jlayton@kernel.org>
>> cc: Eric Van Hensbergen <ericvh@kernel.org>
>> cc: Latchesar Ionkov <lucho@ionkov.net>
>> cc: Dominique Martinet <asmadeus@codewreck.org>
>> cc: Christian Schoenebeck <linux_oss@crudebyte.com>
>> cc: Marc Dionne <marc.dionne@auristor.com>
>> cc: v9fs@lists.linux.dev
>> cc: linux-afs@lists.infradead.org
>> cc: netfs@lists.linux.dev
>> cc: linux-fsdevel@vger.kernel.org

[..snip..]

>> +/*
>> + * Begin a write operation for writing through the pagecache.
>> + */
>> +struct netfs_io_request *new_netfs_begin_writethrough(struct kiocb *iocb, size_t
>> len)
>> +{
>> +	struct netfs_io_request *wreq = NULL;
>> +	struct netfs_inode *ictx = netfs_inode(file_inode(iocb->ki_filp));
>> +
>> +	mutex_lock(&ictx->wb_lock);
>> +
>> +	wreq = netfs_create_write_req(iocb->ki_filp->f_mapping, iocb->ki_filp,
>> +				      iocb->ki_pos, NETFS_WRITETHROUGH);
>> +	if (IS_ERR(wreq))
>> +		mutex_unlock(&ictx->wb_lock);
>> +
>> +	wreq->io_streams[0].avail = true;
>> +	trace_netfs_write(wreq, netfs_write_trace_writethrough);
> 
> Missing mutex_unlock() before return.
> 

mutex_unlock() happens in new_netfs_end_writethrough()

> Thanks,
> Naveen
> 



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 26/26] netfs, afs: Use writeback retry to deal with alternate keys
  2024-03-28 16:34 ` [PATCH 26/26] netfs, afs: Use writeback retry to deal with alternate keys David Howells
@ 2024-04-01 13:53   ` Simon Horman
  2024-04-02  8:32   ` David Howells
  1 sibling, 0 replies; 62+ messages in thread
From: Simon Horman @ 2024-04-01 13:53 UTC (permalink / raw)
  To: David Howells
  Cc: Dominique Martinet, linux-mm, Marc Dionne, linux-afs,
	Paulo Alcantara, linux-cifs, Matthew Wilcox, Steve French,
	linux-cachefs, Gao Xiang, Ilya Dryomov, Shyam Prasad N,
	Tom Talpey, ceph-devel, Eric Van Hensbergen, Christian Brauner,
	linux-nfs, netdev, v9fs, Jeff Layton, linux-kernel,
	linux-fsdevel, netfs, linux-erofs

On Thu, Mar 28, 2024 at 04:34:18PM +0000, David Howells wrote:

...

> +void afs_issue_write(struct netfs_io_subrequest *subreq)
>  {
> +	struct netfs_io_request *wreq = subreq->rreq;
>  	struct afs_operation *op;
> -	struct afs_wb_key *wbk = NULL;
> -	loff_t size = iov_iter_count(iter);
> +	struct afs_vnode *vnode = AFS_FS_I(wreq->inode);
> +	unsigned long long pos = subreq->start + subreq->transferred;
> +	size_t len = subreq->len - subreq->transferred;
>  	int ret = -ENOKEY;
>  
> -	_enter("%s{%llx:%llu.%u},%llx,%llx",
> +	_enter("R=%x[%x],%s{%llx:%llu.%u},%llx,%zx",
> +	       wreq->debug_id, subreq->debug_index,
>  	       vnode->volume->name,
>  	       vnode->fid.vid,
>  	       vnode->fid.vnode,
>  	       vnode->fid.unique,
> -	       size, pos);
> +	       pos, len);
>  
> -	ret = afs_get_writeback_key(vnode, &wbk);
> -	if (ret) {
> -		_leave(" = %d [no keys]", ret);
> -		return ret;
> -	}
> +#if 0 // Error injection
> +	if (subreq->debug_index == 3)
> +		return netfs_write_subrequest_terminated(subreq, -ENOANO, false);
>  
> -	op = afs_alloc_operation(wbk->key, vnode->volume);
> -	if (IS_ERR(op)) {
> -		afs_put_wb_key(wbk);
> -		return -ENOMEM;
> +	if (!test_bit(NETFS_SREQ_RETRYING, &subreq->flags)) {
> +		set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
> +		return netfs_write_subrequest_terminated(subreq, -EAGAIN, false);
>  	}
> +#endif
> +
> +	op = afs_alloc_operation(wreq->netfs_priv, vnode->volume);
> +	if (IS_ERR(op))
> +		return netfs_write_subrequest_terminated(subreq, -EAGAIN, false);
>  
>  	afs_op_set_vnode(op, 0, vnode);
> -	op->file[0].dv_delta = 1;
> +	op->file[0].dv_delta	= 1;
>  	op->file[0].modification = true;
> -	op->store.pos = pos;
> -	op->store.size = size;
> -	op->flags |= AFS_OPERATION_UNINTR;
> -	op->ops = &afs_store_data_operation;
> +	op->store.pos		= pos;
> +	op->store.size		= len,

nit: this is probably more intuitively written using len;

> +	op->flags		|= AFS_OPERATION_UNINTR;
> +	op->ops			= &afs_store_data_operation;
>  
> -try_next_key:
>  	afs_begin_vnode_operation(op);
>  
> -	op->store.write_iter = iter;
> -	op->store.i_size = max(pos + size, vnode->netfs.remote_i_size);
> -	op->mtime = inode_get_mtime(&vnode->netfs.inode);
> +	op->store.write_iter	= &subreq->io_iter;
> +	op->store.i_size	= umax(pos + len, vnode->netfs.remote_i_size);
> +	op->mtime		= inode_get_mtime(&vnode->netfs.inode);
>  
>  	afs_wait_for_operation(op);
> -
> -	switch (afs_op_error(op)) {
> +	ret = afs_put_operation(op);
> +	switch (ret) {
>  	case -EACCES:
>  	case -EPERM:
>  	case -ENOKEY:
>  	case -EKEYEXPIRED:
>  	case -EKEYREJECTED:
>  	case -EKEYREVOKED:
> -		_debug("next");
> -
> -		ret = afs_get_writeback_key(vnode, &wbk);
> -		if (ret == 0) {
> -			key_put(op->key);
> -			op->key = key_get(wbk->key);
> -			goto try_next_key;
> -		}
> +		/* If there are more keys we can try, use the retry algorithm
> +		 * to rotate the keys.
> +		 */
> +		if (wreq->netfs_priv2)
> +			set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
>  		break;
>  	}
>  
> -	afs_put_wb_key(wbk);
> -	_leave(" = %d", afs_op_error(op));
> -	return afs_put_operation(op);
> +	netfs_write_subrequest_terminated(subreq, ret < 0 ? ret : subreq->len, false);
>  }
>  
>  /*

...

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 26/26] netfs, afs: Use writeback retry to deal with alternate keys
  2024-03-28 16:34 ` [PATCH 26/26] netfs, afs: Use writeback retry to deal with alternate keys David Howells
  2024-04-01 13:53   ` Simon Horman
@ 2024-04-02  8:32   ` David Howells
  2024-04-10 17:38     ` Simon Horman
  2024-04-11  7:09     ` David Howells
  1 sibling, 2 replies; 62+ messages in thread
From: David Howells @ 2024-04-02  8:32 UTC (permalink / raw)
  To: Simon Horman
  Cc: Dominique Martinet, dhowells, linux-mm, Marc Dionne, linux-afs,
	Paulo Alcantara, linux-cifs, Matthew Wilcox, Steve French,
	linux-cachefs, Gao Xiang, Ilya Dryomov, Shyam Prasad N,
	Tom Talpey, ceph-devel, Eric Van Hensbergen, Christian Brauner,
	linux-nfs, netdev, v9fs, Jeff Layton, linux-kernel,
	linux-fsdevel, netfs, linux-erofs

Simon Horman <horms@kernel.org> wrote:

> > +	op->store.size		= len,
> 
> nit: this is probably more intuitively written using len;

I'm not sure it makes a difference, but switching 'size' to 'len' in kafs is a
separate thing that doesn't need to be part of this patchset.

David


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 19/26] netfs: New writeback implementation
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (25 preceding siblings ...)
  2024-03-28 16:34 ` [PATCH 26/26] netfs, afs: Use writeback retry to deal with alternate keys David Howells
@ 2024-04-02  8:46 ` David Howells
  2024-04-02 10:48 ` [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache Christian Brauner
                   ` (4 subsequent siblings)
  31 siblings, 0 replies; 62+ messages in thread
From: David Howells @ 2024-04-02  8:46 UTC (permalink / raw)
  To: Dan Carpenter, Naveen Mamindlapalli, Vadim Fedorenko
  Cc: Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck,
	dhowells, linux-mm, Marc Dionne, linux-afs, Paulo Alcantara,
	linux-cifs, Matthew Wilcox, Steve French, linux-cachefs,
	Gao Xiang, Ilya Dryomov, Shyam Prasad N, Tom Talpey, ceph-devel,
	Eric Van Hensbergen, Christian Brauner, linux-nfs, netdev, v9fs,
	Jeff Layton, linux-kernel, linux-fsdevel, netfs, linux-erofs

David Howells <dhowells@redhat.com> wrote:

> +struct netfs_io_request *new_netfs_begin_writethrough(struct kiocb *iocb, size_t len)
> +{
> +	struct netfs_io_request *wreq = NULL;
> +	struct netfs_inode *ictx = netfs_inode(file_inode(iocb->ki_filp));
> +
> +	mutex_lock(&ictx->wb_lock);
> +
> +	wreq = netfs_create_write_req(iocb->ki_filp->f_mapping, iocb->ki_filp,
> +				      iocb->ki_pos, NETFS_WRITETHROUGH);
> +	if (IS_ERR(wreq))
> +		mutex_unlock(&ictx->wb_lock);

This needs a "return wreq;" adding and appropriate braces.  Thanks to those
who pointed it out.
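
For reference, a minimal sketch of the shape of that fix (not the final
patch, just the hunk quoted above with the error path completed):

	wreq = netfs_create_write_req(iocb->ki_filp->f_mapping, iocb->ki_filp,
				      iocb->ki_pos, NETFS_WRITETHROUGH);
	if (IS_ERR(wreq)) {
		mutex_unlock(&ictx->wb_lock);
		return wreq;	/* don't fall through and dereference an ERR_PTR */
	}

	wreq->io_streams[0].avail = true;
	trace_netfs_write(wreq, netfs_write_trace_writethrough);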

David


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (26 preceding siblings ...)
  2024-04-02  8:46 ` [PATCH 19/26] netfs: New writeback implementation David Howells
@ 2024-04-02 10:48 ` Christian Brauner
  2024-04-04  7:51 ` [PATCH 21/26] netfs, 9p: Implement helpers for new write code David Howells
                   ` (3 subsequent siblings)
  31 siblings, 0 replies; 62+ messages in thread
From: Christian Brauner @ 2024-04-02 10:48 UTC (permalink / raw)
  To: David Howells
  Cc: Dominique Martinet, linux-mm, Marc Dionne, linux-afs,
	Paulo Alcantara, linux-cifs, Matthew Wilcox, Steve French,
	linux-cachefs, Ilya Dryomov, Shyam Prasad N, linux-nfs,
	Tom Talpey, ceph-devel, Eric Van Hensbergen, Christian Brauner,
	netdev, v9fs, Jeff Layton, linux-kernel, linux-fsdevel, netfs,
	linux-erofs

On Thu, 28 Mar 2024 16:33:52 +0000, David Howells wrote:
> The primary purpose of these patches is to rework the netfslib writeback
> implementation such that pages read from the cache are written to the cache
> through ->writepages(), thereby allowing the fscache page flag to be
> retired.
> 
> The reworking also:
> 
> [...]

Pulled from netfs-writeback which contains the minor fixes pointed out.

---

Applied to the vfs.netfs branch of the vfs/vfs.git tree.
Patches in the vfs.netfs branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.netfs

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 15/26] mm: Export writeback_iter()
  2024-03-28 16:34 ` [PATCH 15/26] mm: Export writeback_iter() David Howells
@ 2024-04-03  8:59   ` Christoph Hellwig
  2024-04-03 10:10   ` David Howells
  1 sibling, 0 replies; 62+ messages in thread
From: Christoph Hellwig @ 2024-04-03  8:59 UTC (permalink / raw)
  To: David Howells
  Cc: Dominique Martinet, linux-mm, Marc Dionne, linux-afs,
	Paulo Alcantara, linux-cifs, Matthew Wilcox, Christoph Hellwig,
	Steve French, linux-cachefs, Gao Xiang, Ilya Dryomov,
	Shyam Prasad N, Tom Talpey, ceph-devel, Eric Van Hensbergen,
	Christian Brauner, linux-nfs, netdev, v9fs, Jeff Layton,
	linux-kernel, linux-fsdevel, netfs, linux-erofs

On Thu, Mar 28, 2024 at 04:34:07PM +0000, David Howells wrote:
> Export writeback_iter() so that it can be used by netfslib as a module.

EXPORT_SYMBOL_GPL, please.
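
For illustration, the requested change amounts to switching the export
macro on writeback_iter() (a sketch of the amendment; the export sits
next to the function in mm/page-writeback.c):

	-EXPORT_SYMBOL(writeback_iter);
	+EXPORT_SYMBOL_GPL(writeback_iter);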


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 15/26] mm: Export writeback_iter()
  2024-03-28 16:34 ` [PATCH 15/26] mm: Export writeback_iter() David Howells
  2024-04-03  8:59   ` Christoph Hellwig
@ 2024-04-03 10:10   ` David Howells
  2024-04-03 10:14     ` Christoph Hellwig
  2024-04-03 10:55     ` David Howells
  1 sibling, 2 replies; 62+ messages in thread
From: David Howells @ 2024-04-03 10:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dominique Martinet, dhowells, linux-mm, Marc Dionne, linux-afs,
	Paulo Alcantara, linux-cifs, Matthew Wilcox, Steve French,
	linux-cachefs, Gao Xiang, Ilya Dryomov, Shyam Prasad N,
	Tom Talpey, ceph-devel, Eric Van Hensbergen, Christian Brauner,
	linux-nfs, netdev, v9fs, Jeff Layton, linux-kernel,
	linux-fsdevel, netfs, linux-erofs

Christoph Hellwig <hch@lst.de> wrote:

> On Thu, Mar 28, 2024 at 04:34:07PM +0000, David Howells wrote:
> > Export writeback_iter() so that it can be used by netfslib as a module.
> 
> EXPORT_SYMBOL_GPL, please.

That depends.  You put a comment on write_cache_pages() saying that people
should use writeback_iter() instead.  w_c_p() is not marked GPL.  Is it your
intention to get rid of it?

David


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 15/26] mm: Export writeback_iter()
  2024-04-03 10:10   ` David Howells
@ 2024-04-03 10:14     ` Christoph Hellwig
  2024-04-03 10:55     ` David Howells
  1 sibling, 0 replies; 62+ messages in thread
From: Christoph Hellwig @ 2024-04-03 10:14 UTC (permalink / raw)
  To: David Howells
  Cc: Dominique Martinet, linux-mm, Marc Dionne, linux-afs,
	Paulo Alcantara, linux-cifs, Matthew Wilcox, Christoph Hellwig,
	Steve French, linux-cachefs, Gao Xiang, Ilya Dryomov,
	Shyam Prasad N, Tom Talpey, ceph-devel, Eric Van Hensbergen,
	Christian Brauner, linux-nfs, netdev, v9fs, Jeff Layton,
	linux-kernel, linux-fsdevel, netfs, linux-erofs

On Wed, Apr 03, 2024 at 11:10:47AM +0100, David Howells wrote:
> That depends.  You put a comment on write_cache_pages() saying that people
> should use writeback_iter() instead.  w_c_p() is not marked GPL.  Is it your
> intention to get rid of it?

Yes.  If you think you're not a derivative work of Linux you have no
business using either one.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 15/26] mm: Export writeback_iter()
  2024-04-03 10:10   ` David Howells
  2024-04-03 10:14     ` Christoph Hellwig
@ 2024-04-03 10:55     ` David Howells
  2024-04-03 12:41       ` Christoph Hellwig
  2024-04-03 12:58       ` David Howells
  1 sibling, 2 replies; 62+ messages in thread
From: David Howells @ 2024-04-03 10:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dominique Martinet, dhowells, linux-mm, Marc Dionne, linux-afs,
	Paulo Alcantara, linux-cifs, Matthew Wilcox, Steve French,
	linux-cachefs, Gao Xiang, Ilya Dryomov, Shyam Prasad N,
	Tom Talpey, ceph-devel, Eric Van Hensbergen, Christian Brauner,
	linux-nfs, netdev, v9fs, Jeff Layton, linux-kernel,
	linux-fsdevel, netfs, linux-erofs

Christoph Hellwig <hch@lst.de> wrote:

> On Wed, Apr 03, 2024 at 11:10:47AM +0100, David Howells wrote:
> > That depends.  You put a comment on write_cache_pages() saying that people
> > should use writeback_iter() instead.  w_c_p() is not marked GPL.  Is it your
> > intention to get rid of it?
> 
> Yes.  If you think you're not a derivate work of Linux you have no
> business using either one.

So why are we bothering with EXPORT_SYMBOL at all?  Why don't you just send a
patch replacing all of them with EXPORT_SYMBOL_GPL()?

David


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 15/26] mm: Export writeback_iter()
  2024-04-03 10:55     ` David Howells
@ 2024-04-03 12:41       ` Christoph Hellwig
  2024-04-03 12:58       ` David Howells
  1 sibling, 0 replies; 62+ messages in thread
From: Christoph Hellwig @ 2024-04-03 12:41 UTC (permalink / raw)
  To: David Howells
  Cc: Dominique Martinet, linux-mm, Marc Dionne, linux-afs,
	Paulo Alcantara, linux-cifs, Matthew Wilcox, Christoph Hellwig,
	Steve French, linux-cachefs, Gao Xiang, Ilya Dryomov,
	Shyam Prasad N, Tom Talpey, ceph-devel, Eric Van Hensbergen,
	Christian Brauner, linux-nfs, netdev, v9fs, Jeff Layton,
	linux-kernel, linux-fsdevel, netfs, linux-erofs

On Wed, Apr 03, 2024 at 11:55:00AM +0100, David Howells wrote:
> So why are we bothering with EXPORT_SYMBOL at all?  Why don't you just send a
> patch replacing all of them with EXPORT_SYMBOL_GPL()?

Not my business.  But if you want to side track this let me just put this
in here:

NAK to the non-GPL EXPORT of writeback_iter().


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 15/26] mm: Export writeback_iter()
  2024-04-03 10:55     ` David Howells
  2024-04-03 12:41       ` Christoph Hellwig
@ 2024-04-03 12:58       ` David Howells
  2024-04-05  6:53         ` Christoph Hellwig
  2024-04-05 10:15         ` Christian Brauner
  1 sibling, 2 replies; 62+ messages in thread
From: David Howells @ 2024-04-03 12:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dominique Martinet, dhowells, linux-mm, Marc Dionne, linux-afs,
	Paulo Alcantara, linux-cifs, Matthew Wilcox, Steve French,
	linux-cachefs, Gao Xiang, Ilya Dryomov, Shyam Prasad N,
	Tom Talpey, ceph-devel, Eric Van Hensbergen, Christian Brauner,
	linux-nfs, netdev, v9fs, Jeff Layton, linux-kernel,
	linux-fsdevel, netfs, linux-erofs

Christoph Hellwig <hch@lst.de> wrote:

> > So why are we bothering with EXPORT_SYMBOL at all?  Why don't you just
> > send a patch replacing all of them with EXPORT_SYMBOL_GPL()?
> 
> Not my business.

Clearly it is as you're gradually replacing APIs with stuff that is GPL'd.

> But if you want to side track this let me just put this in here:
> 
> NAK to the non-GPL EXPORT of writeback_iter().

Very well, I'll switch that export to GPL.  Christian, if you can amend that
patch in your tree?

David


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 21/26] netfs, 9p: Implement helpers for new write code
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (27 preceding siblings ...)
  2024-04-02 10:48 ` [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache Christian Brauner
@ 2024-04-04  7:51 ` David Howells
  2024-04-04  8:01 ` David Howells
                   ` (2 subsequent siblings)
  31 siblings, 0 replies; 62+ messages in thread
From: David Howells @ 2024-04-04  7:51 UTC (permalink / raw)
  Cc: Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck,
	dhowells, linux-mm, Marc Dionne, linux-afs, Paulo Alcantara,
	linux-cifs, Matthew Wilcox, Steve French, linux-cachefs,
	Gao Xiang, Ilya Dryomov, Shyam Prasad N, Tom Talpey, ceph-devel,
	Eric Van Hensbergen, Christian Brauner, linux-nfs, netdev, v9fs,
	Jeff Layton, linux-kernel, linux-fsdevel, netfs, linux-erofs

David Howells <dhowells@redhat.com> wrote:

> +	size_t len = subreq->len - subreq->transferred;

This actually needs to be 'int len' because of the varargs packet formatter.

David


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 21/26] netfs, 9p: Implement helpers for new write code
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (28 preceding siblings ...)
  2024-04-04  7:51 ` [PATCH 21/26] netfs, 9p: Implement helpers for new write code David Howells
@ 2024-04-04  8:01 ` David Howells
  2024-04-08 15:53 ` [PATCH 23/26] netfs: Cut over to using new writeback code David Howells
  2024-04-15 12:49 ` [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache Jeff Layton
  31 siblings, 0 replies; 62+ messages in thread
From: David Howells @ 2024-04-04  8:01 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck,
	dhowells, linux-mm, Marc Dionne, linux-afs, Paulo Alcantara,
	linux-cifs, Matthew Wilcox, Steve French, linux-cachefs,
	Gao Xiang, Ilya Dryomov, Shyam Prasad N, Tom Talpey, ceph-devel,
	Eric Van Hensbergen, linux-nfs, netdev, v9fs, Jeff Layton,
	linux-kernel, linux-fsdevel, netfs, linux-erofs

David Howells <dhowells@redhat.com> wrote:

> > +	size_t len = subreq->len - subreq->transferred;
> 
> This actually needs to be 'int len' because of the varargs packet formatter.

I think the attached change is what's required.

David
---
diff --git a/net/9p/client.c b/net/9p/client.c
index 844aca4fe4d8..04af2a7bf54b 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -1670,10 +1670,10 @@ p9_client_write_subreq(struct netfs_io_subrequest *subreq)
 	struct p9_client *clnt = fid->clnt;
 	struct p9_req_t *req;
 	unsigned long long start = subreq->start + subreq->transferred;
-	size_t len = subreq->len - subreq->transferred;
-	int written, err;
+	int written, len = subreq->len - subreq->transferred;
+	int err;
 
-	p9_debug(P9_DEBUG_9P, ">>> TWRITE fid %d offset %llu len %zd\n",
+	p9_debug(P9_DEBUG_9P, ">>> TWRITE fid %d offset %llu len %d\n",
 		 fid->fid, start, len);
 
 	/* Don't bother zerocopy for small IO (< 1024) */
@@ -1699,11 +1699,11 @@ p9_client_write_subreq(struct netfs_io_subrequest *subreq)
 	}
 
 	if (written > len) {
-		pr_err("bogus RWRITE count (%d > %lu)\n", written, len);
+		pr_err("bogus RWRITE count (%d > %u)\n", written, len);
 		written = len;
 	}
 
-	p9_debug(P9_DEBUG_9P, "<<< RWRITE count %zd\n", len);
+	p9_debug(P9_DEBUG_9P, "<<< RWRITE count %d\n", len);
 
 	p9_req_put(clnt, req);
 	netfs_write_subrequest_terminated(subreq, written, false);


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH 15/26] mm: Export writeback_iter()
  2024-04-03 12:58       ` David Howells
@ 2024-04-05  6:53         ` Christoph Hellwig
  2024-04-05 10:15         ` Christian Brauner
  1 sibling, 0 replies; 62+ messages in thread
From: Christoph Hellwig @ 2024-04-05  6:53 UTC (permalink / raw)
  To: David Howells
  Cc: Dominique Martinet, linux-mm, Marc Dionne, linux-afs,
	Paulo Alcantara, linux-cifs, Matthew Wilcox, Christoph Hellwig,
	Steve French, linux-cachefs, Gao Xiang, Ilya Dryomov,
	Shyam Prasad N, Tom Talpey, ceph-devel, Eric Van Hensbergen,
	Christian Brauner, linux-nfs, netdev, v9fs, Jeff Layton,
	linux-kernel, linux-fsdevel, netfs, linux-erofs

On Wed, Apr 03, 2024 at 01:58:15PM +0100, David Howells wrote:
> Very well, I'll switch that export to GPL.  Christian, if you can amend that
> patch in your tree?

Thanks!


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 15/26] mm: Export writeback_iter()
  2024-04-03 12:58       ` David Howells
  2024-04-05  6:53         ` Christoph Hellwig
@ 2024-04-05 10:15         ` Christian Brauner
  1 sibling, 0 replies; 62+ messages in thread
From: Christian Brauner @ 2024-04-05 10:15 UTC (permalink / raw)
  To: David Howells
  Cc: Dominique Martinet, linux-mm, Marc Dionne, linux-afs,
	Paulo Alcantara, linux-cifs, Matthew Wilcox, Christoph Hellwig,
	Steve French, linux-cachefs, Gao Xiang, Ilya Dryomov,
	Shyam Prasad N, Tom Talpey, ceph-devel, Eric Van Hensbergen,
	Christian Brauner, linux-nfs, netdev, v9fs, Jeff Layton,
	linux-kernel, linux-fsdevel, netfs, linux-erofs

On Wed, Apr 03, 2024 at 01:58:15PM +0100, David Howells wrote:
> Christoph Hellwig <hch@lst.de> wrote:
> 
> > > So why are we bothering with EXPORT_SYMBOL at all?  Why don't you just
> > > send a patch replacing all of them with EXPORT_SYMBOL_GPL()?
> > 
> > Not my business.
> 
> Clearly it is as you're gradually replacing APIs with stuff that is GPL'd.
> 
> > But if you want to side track this let me just put this in here:
> > 
> > NAK to the non-GPL EXPORT of writeback_iter().
> 
> Very well, I'll switch that export to GPL.  Christian, if you can amend that
> patch in your tree?

Sorted yesterday night!

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 23/26] netfs: Cut over to using new writeback code
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (29 preceding siblings ...)
  2024-04-04  8:01 ` David Howells
@ 2024-04-08 15:53 ` David Howells
  2024-04-15 12:49 ` [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache Jeff Layton
  31 siblings, 0 replies; 62+ messages in thread
From: David Howells @ 2024-04-08 15:53 UTC (permalink / raw)
  To: Christian Brauner, Matthew Wilcox
  Cc: Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck,
	dhowells, linux-mm, Marc Dionne, linux-afs, Paulo Alcantara,
	linux-cifs, Steve French, linux-cachefs, Gao Xiang, Ilya Dryomov,
	Shyam Prasad N, Tom Talpey, ceph-devel, Eric Van Hensbergen,
	linux-nfs, netdev, v9fs, Jeff Layton, linux-kernel,
	linux-fsdevel, netfs, linux-erofs

David Howells <dhowells@redhat.com> wrote:

> +		/* Wait for writeback to complete.  The writeback engine owns
> +		 * the info in folio->private and may change it until it
> +		 * removes the WB mark.
> +		 */
> +		if (folio_wait_writeback_killable(folio)) {
> +			ret = written ? -EINTR : -ERESTARTSYS;
> +			goto error_folio_unlock;
> +		}
> +

It turns out that this really kills performance with fio with as many jobs as
cpus.  It's taking up to around 8x longer to complete a pwrite() on average
and perf shows that 30% of the CPU cycles are being spent in contention on the
i_rwsem.

The reason this was added here is that writeback cannot take the folio lock in
order to clean up folio->private without risking deadlock vs the truncation
routines (IIRC).

I can mitigate this by skipping the wait if folio->private is not set and if
we're not going to attach anything there (see attached).  Note that if
writeout is ongoing and there is nothing attached to ->private, then we should
not be engaging write-streaming mode and attaching a new netfs_folio (and if
we did, we'd flush the page and wait for it anyway).

The other possibility is if we have a writeback group to set.  This only
applies to ceph for the moment and is something that will need dealing with
if/when ceph is made to use this code.

David
---

diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index 1eff9413eb1b..279b296f8014 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -255,7 +255,8 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
 		 * the info in folio->private and may change it until it
 		 * removes the WB mark.
 		 */
-		if (folio_wait_writeback_killable(folio)) {
+		if (folio_get_private(folio) &&
+		    folio_wait_writeback_killable(folio)) {
 			ret = written ? -EINTR : -ERESTARTSYS;
 			goto error_folio_unlock;
 		}


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH 26/26] netfs, afs: Use writeback retry to deal with alternate keys
  2024-04-02  8:32   ` David Howells
@ 2024-04-10 17:38     ` Simon Horman
  2024-04-11  7:09     ` David Howells
  1 sibling, 0 replies; 62+ messages in thread
From: Simon Horman @ 2024-04-10 17:38 UTC (permalink / raw)
  To: David Howells
  Cc: Dominique Martinet, linux-mm, Marc Dionne, linux-afs,
	Paulo Alcantara, linux-cifs, Matthew Wilcox, Steve French,
	linux-cachefs, Gao Xiang, Ilya Dryomov, Shyam Prasad N,
	Tom Talpey, ceph-devel, Eric Van Hensbergen, Christian Brauner,
	linux-nfs, netdev, v9fs, Jeff Layton, linux-kernel,
	linux-fsdevel, netfs, linux-erofs

On Tue, Apr 02, 2024 at 09:32:37AM +0100, David Howells wrote:
> Simon Horman <horms@kernel.org> wrote:
> 
> > > +	op->store.size		= len,
> > 
> > nit: this is probably more intuitively written using len;
> 
> I'm not sure it makes a difference, but switching 'size' to 'len' in kafs is a
> separate thing that doesn't need to be part of this patchset.

Sorry, I meant using ';' rather than ',' at the end of the line.
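
In other words, taking the hunk in question as a sketch: the trailing
comma turns the two statements into one comma-operator expression, so it
still compiles and behaves the same here, but it reads as a typo:

	op->store.pos		= pos;
	op->store.size		= len,		/* as posted: comma operator */
	op->flags		|= AFS_OPERATION_UNINTR;

	/* intended */
	op->store.size		= len;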

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 26/26] netfs, afs: Use writeback retry to deal with alternate keys
  2024-04-02  8:32   ` David Howells
  2024-04-10 17:38     ` Simon Horman
@ 2024-04-11  7:09     ` David Howells
  1 sibling, 0 replies; 62+ messages in thread
From: David Howells @ 2024-04-11  7:09 UTC (permalink / raw)
  To: Simon Horman
  Cc: Dominique Martinet, dhowells, linux-mm, Marc Dionne, linux-afs,
	Paulo Alcantara, linux-cifs, Matthew Wilcox, Steve French,
	linux-cachefs, Gao Xiang, Ilya Dryomov, Shyam Prasad N,
	Tom Talpey, ceph-devel, Eric Van Hensbergen, Christian Brauner,
	linux-nfs, netdev, v9fs, Jeff Layton, linux-kernel,
	linux-fsdevel, netfs, linux-erofs

Simon Horman <horms@kernel.org> wrote:

> On Tue, Apr 02, 2024 at 09:32:37AM +0100, David Howells wrote:
> > Simon Horman <horms@kernel.org> wrote:
> > 
> > > > +	op->store.size		= len,
> > > 
> > > nit: this is probably more intuitively written using len;
> > 
> > I'm not sure it makes a difference, but switching 'size' to 'len' in kafs is a
> > separate thing that doesn't need to be part of this patchset.
> 
> Sorry, I meant using ';' rather than ',' at the end of the line.

Ah, yes.  That makes a lot more sense!

David


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 01/26] cifs: Fix duplicate fscache cookie warnings
  2024-03-28 16:33 ` [PATCH 01/26] cifs: Fix duplicate fscache cookie warnings David Howells
@ 2024-04-15 11:25   ` Jeff Layton
  2024-04-15 13:03   ` David Howells
  1 sibling, 0 replies; 62+ messages in thread
From: Jeff Layton @ 2024-04-15 11:25 UTC (permalink / raw)
  To: David Howells, Christian Brauner, Gao Xiang, Dominique Martinet
  Cc: linux-mm, Marc Dionne, linux-afs, Paulo Alcantara, linux-cifs,
	Matthew Wilcox, Steve French, linux-cachefs, Ilya Dryomov,
	Shyam Prasad N, Shyam Prasad N, Tom Talpey, ceph-devel,
	Eric Van Hensbergen, linux-nfs, Rohith Surabattula, netdev, v9fs,
	linux-kernel, Steve French, linux-fsdevel, netfs, linux-erofs

On Thu, 2024-03-28 at 16:33 +0000, David Howells wrote:
> fscache emits a lot of duplicate cookie warnings with cifs because the
> index key for the fscache cookies does not include everything that the
> cifs_find_inode() function does.  The latter is used with iget5_locked() to
> distinguish between inodes in the local inode cache.
> 
> Fix this by adding the creation time and file type to the fscache cookie
> key.
> 
> Additionally, add a couple of comments to note that if one is changed the
> other must be also.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Steve French <sfrench@samba.org>
> cc: Shyam Prasad N <nspmangalore@gmail.com>
> cc: Rohith Surabattula <rohiths.msft@gmail.com>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: linux-cifs@vger.kernel.org
> cc: netfs@lists.linux.dev
> cc: linux-fsdevel@vger.kernel.org
> ---
>  fs/smb/client/fscache.c | 16 +++++++++++++++-
>  fs/smb/client/inode.c   |  2 ++
>  2 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/smb/client/fscache.c b/fs/smb/client/fscache.c
> index c4a3cb736881..340efce8f052 100644
> --- a/fs/smb/client/fscache.c
> +++ b/fs/smb/client/fscache.c
> @@ -12,6 +12,16 @@
>  #include "cifs_fs_sb.h"
>  #include "cifsproto.h"
>  
> +/*
> + * Key for fscache inode.  [!] Contents must match comparisons in cifs_find_inode().
> + */
> +struct cifs_fscache_inode_key {
> +
> +	__le64  uniqueid;	/* server inode number */
> +	__le64  createtime;	/* creation time on server */
> +	u8	type;		/* S_IFMT file type */
> +} __packed;
> +

Interesting. So the uniqueid of the inode is not unique within the fs?
Or are the clients mounting shares that span multiple filesystems?
Or, are we looking at a situation where the uniqueid is being quickly
reused for new inodes after the original inode is unlinked?

Should we be mixing in an fsid (or whatever the Windows equivalent is)?

>  static void cifs_fscache_fill_volume_coherency(
>  	struct cifs_tcon *tcon,
>  	struct cifs_fscache_volume_coherency_data *cd)
> @@ -97,15 +107,19 @@ void cifs_fscache_release_super_cookie(struct cifs_tcon *tcon)
>  void cifs_fscache_get_inode_cookie(struct inode *inode)
>  {
>  	struct cifs_fscache_inode_coherency_data cd;
> +	struct cifs_fscache_inode_key key;
>  	struct cifsInodeInfo *cifsi = CIFS_I(inode);
>  	struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
>  	struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb);
>  
> +	key.uniqueid	= cpu_to_le64(cifsi->uniqueid);
> +	key.createtime	= cpu_to_le64(cifsi->createtime);
> +	key.type	= (inode->i_mode & S_IFMT) >> 12;
>  	cifs_fscache_fill_coherency(&cifsi->netfs.inode, &cd);
>  
>  	cifsi->netfs.cache =
>  		fscache_acquire_cookie(tcon->fscache, 0,
> -				       &cifsi->uniqueid, sizeof(cifsi->uniqueid),
> +				       &key, sizeof(key),
>  				       &cd, sizeof(cd),
>  				       i_size_read(&cifsi->netfs.inode));
>  	if (cifsi->netfs.cache)
> diff --git a/fs/smb/client/inode.c b/fs/smb/client/inode.c
> index d28ab0af6049..91b07ef9e25c 100644
> --- a/fs/smb/client/inode.c
> +++ b/fs/smb/client/inode.c
> @@ -1351,6 +1351,8 @@ cifs_find_inode(struct inode *inode, void *opaque)
>  {
>  	struct cifs_fattr *fattr = opaque;
>  
> +	/* [!] The compared values must be the same in struct cifs_fscache_inode_key. */
> +
>  	/* don't match inode with different uniqueid */
>  	if (CIFS_I(inode)->uniqueid != fattr->cf_uniqueid)
>  		return 0;
> 

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 03/26] netfs: Update i_blocks when write committed to pagecache
  2024-03-28 16:33 ` [PATCH 03/26] netfs: Update i_blocks when write committed to pagecache David Howells
@ 2024-04-15 11:28   ` Jeff Layton
  2024-04-16 22:47   ` David Howells
  1 sibling, 0 replies; 62+ messages in thread
From: Jeff Layton @ 2024-04-15 11:28 UTC (permalink / raw)
  To: David Howells, Christian Brauner, Gao Xiang, Dominique Martinet
  Cc: linux-mm, Marc Dionne, linux-afs, Paulo Alcantara, linux-cifs,
	Matthew Wilcox, Steve French, linux-cachefs, Ilya Dryomov,
	Shyam Prasad N, Shyam Prasad N, Tom Talpey, ceph-devel,
	Eric Van Hensbergen, linux-nfs, Rohith Surabattula, netdev, v9fs,
	linux-kernel, Steve French, linux-fsdevel, netfs, linux-erofs

On Thu, 2024-03-28 at 16:33 +0000, David Howells wrote:
> Update i_blocks when i_size is updated when we finish making a write to the
> pagecache to reflect the amount of space we think will be consumed.
> 

Umm ok, but why? I get that the i_size and i_blocks would be out of sync
until we get back new attrs from the server, but is that a problem? I'm
mainly curious as to what's paying attention to the i_blocks during this
window.
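
To make the estimate in netfs_update_i_size() below concrete (numbers
purely illustrative, assuming SECTOR_SIZE = 512): with an old i_size of
1000 and a 2000-byte copy ending at pos = 3000,

	gap = 512 - (1000 & 511)                /* = 24 */
	add = DIV_ROUND_UP(2000 - 24, 512)      /* = 4  */
	i_blocks = min(DIV_ROUND_UP(3000, 512)  /* = 6  */, i_blocks + 4)

i.e. roughly one extra block per newly-dirtied 512-byte sector, capped at
the number of sectors implied by the new i_size.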

> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Steve French <sfrench@samba.org>
> cc: Shyam Prasad N <nspmangalore@gmail.com>
> cc: Rohith Surabattula <rohiths.msft@gmail.com>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: linux-cifs@vger.kernel.org
> cc: netfs@lists.linux.dev
> cc: linux-fsdevel@vger.kernel.org
> cc: linux-mm@kvack.org
> ---
>  fs/netfs/buffered_write.c | 45 +++++++++++++++++++++++++++++----------
>  1 file changed, 34 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
> index 9a0d32e4b422..c194655a6dcf 100644
> --- a/fs/netfs/buffered_write.c
> +++ b/fs/netfs/buffered_write.c
> @@ -130,6 +130,37 @@ static struct folio *netfs_grab_folio_for_write(struct address_space *mapping,
>  				   mapping_gfp_mask(mapping));
>  }
>  
> +/*
> + * Update i_size and estimate the update to i_blocks to reflect the additional
> + * data written into the pagecache until we can find out from the server what
> + * the values actually are.
> + */
> +static void netfs_update_i_size(struct netfs_inode *ctx, struct inode *inode,
> +				loff_t i_size, loff_t pos, size_t copied)
> +{
> +	blkcnt_t add;
> +	size_t gap;
> +
> +	if (ctx->ops->update_i_size) {
> +		ctx->ops->update_i_size(inode, pos);
> +		return;
> +	}
> +
> +	i_size_write(inode, pos);
> +#if IS_ENABLED(CONFIG_FSCACHE)
> +	fscache_update_cookie(ctx->cache, NULL, &pos);
> +#endif
> +
> +	gap = SECTOR_SIZE - (i_size & (SECTOR_SIZE - 1));
> +	if (copied > gap) {
> +		add = DIV_ROUND_UP(copied - gap, SECTOR_SIZE);
> +
> +		inode->i_blocks = min_t(blkcnt_t,
> +					DIV_ROUND_UP(pos, SECTOR_SIZE),
> +					inode->i_blocks + add);
> +	}
> +}
> +
>  /**
>   * netfs_perform_write - Copy data into the pagecache.
>   * @iocb: The operation parameters
> @@ -352,18 +383,10 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
>  		trace_netfs_folio(folio, trace);
>  
>  		/* Update the inode size if we moved the EOF marker */
> -		i_size = i_size_read(inode);
>  		pos += copied;
> -		if (pos > i_size) {
> -			if (ctx->ops->update_i_size) {
> -				ctx->ops->update_i_size(inode, pos);
> -			} else {
> -				i_size_write(inode, pos);
> -#if IS_ENABLED(CONFIG_FSCACHE)
> -				fscache_update_cookie(ctx->cache, NULL, &pos);
> -#endif
> -			}
> -		}
> +		i_size = i_size_read(inode);
> +		if (pos > i_size)
> +			netfs_update_i_size(ctx, inode, i_size, pos, copied);
>  		written += copied;
>  
>  		if (likely(!wreq)) {
> 

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 09/26] mm: Provide a means of invalidation without using launder_folio
  2024-03-28 16:34 ` [PATCH 09/26] mm: Provide a means of invalidation without using launder_folio David Howells
@ 2024-04-15 11:41   ` Jeff Layton
  0 siblings, 0 replies; 62+ messages in thread
From: Jeff Layton @ 2024-04-15 11:41 UTC (permalink / raw)
  To: David Howells, Christian Brauner, Gao Xiang, Dominique Martinet
  Cc: linux-mm, Marc Dionne, linux-afs, Paulo Alcantara, linux-cifs,
	Miklos Szeredi, Matthew Wilcox, Christoph Hellwig, Steve French,
	linux-cachefs, Ilya Dryomov, devel, Shyam Prasad N,
	Christian Brauner, Tom Talpey, Alexander Viro, ceph-devel,
	Eric Van Hensbergen, Trond Myklebust, linux-nfs, netdev, v9fs,
	linux-kernel, linux-fsdevel, netfs, Andrew Morton, linux-erofs

On Thu, 2024-03-28 at 16:34 +0000, David Howells wrote:
> Implement a replacement for launder_folio.  The key feature of
> invalidate_inode_pages2() is that it locks each folio individually, unmaps
> it to prevent mmap'd accesses interfering and calls the ->launder_folio()
> address_space op to flush it.  This has problems: firstly, each folio is
> written individually as one or more small writes; secondly, adjacent folios
> cannot be added so easily into the laundry; thirdly, it's yet another op to
> implement.
> 
> Instead, use the invalidate lock to cause anyone wanting to add a folio to
> the inode to wait, then unmap all the folios if we have mmaps, then,
> conditionally, use ->writepages() to flush any dirty data back and then
> discard all pages.
> 
> The invalidate lock prevents ->read_iter(), ->write_iter() and faulting
> through mmap all from adding pages for the duration.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Matthew Wilcox <willy@infradead.org>
> cc: Miklos Szeredi <miklos@szeredi.hu>
> cc: Trond Myklebust <trond.myklebust@hammerspace.com>
> cc: Christoph Hellwig <hch@lst.de>
> cc: Andrew Morton <akpm@linux-foundation.org>
> cc: Alexander Viro <viro@zeniv.linux.org.uk>
> cc: Christian Brauner <brauner@kernel.org>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: linux-mm@kvack.org
> cc: linux-fsdevel@vger.kernel.org
> cc: netfs@lists.linux.dev
> cc: v9fs@lists.linux.dev
> cc: linux-afs@lists.infradead.org
> cc: ceph-devel@vger.kernel.org
> cc: linux-cifs@vger.kernel.org
> cc: linux-nfs@vger.kernel.org
> cc: devel@lists.orangefs.org
> ---
>  include/linux/pagemap.h |  1 +
>  mm/filemap.c            | 46 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 47 insertions(+)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 2df35e65557d..4eb3d4177a53 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -40,6 +40,7 @@ int filemap_fdatawait_keep_errors(struct address_space *mapping);
>  int filemap_fdatawait_range(struct address_space *, loff_t lstart, loff_t lend);
>  int filemap_fdatawait_range_keep_errors(struct address_space *mapping,
>  		loff_t start_byte, loff_t end_byte);
> +int filemap_invalidate_inode(struct inode *inode, bool flush);
>  
>  static inline int filemap_fdatawait(struct address_space *mapping)
>  {
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 25983f0f96e3..087f685107a5 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -4134,6 +4134,52 @@ bool filemap_release_folio(struct folio *folio, gfp_t gfp)
>  }
>  EXPORT_SYMBOL(filemap_release_folio);
>  
> +/**
> + * filemap_invalidate_inode - Invalidate/forcibly write back an inode's pagecache
> + * @inode: The inode to flush
> + * @flush: Set to write back rather than simply invalidate.
> + *
> + * Invalidate all the folios on an inode, possibly writing them back first.
> + * Whilst the operation is undertaken, the invalidate lock is held to prevent
> + * new folios from being installed.
> + */
> +int filemap_invalidate_inode(struct inode *inode, bool flush)
> +{
> +	struct address_space *mapping = inode->i_mapping;
> +
> +	if (!mapping || !mapping->nrpages)
> +		goto out;
> +
> +	/* Prevent new folios from being added to the inode. */
> +	filemap_invalidate_lock(mapping);
> +
> +	if (!mapping->nrpages)
> +		goto unlock;
> +
> +	unmap_mapping_pages(mapping, 0, ULONG_MAX, false);
> +
> +	/* Write back the data if we're asked to. */
> +	if (flush) {
> +		struct writeback_control wbc = {
> +			.sync_mode	= WB_SYNC_ALL,
> +			.nr_to_write	= LONG_MAX,
> +			.range_start	= 0,
> +			.range_end	= LLONG_MAX,
> +		};
> +
> +		filemap_fdatawrite_wbc(mapping, &wbc);
> +	}
> +
> +	/* Wait for writeback to complete on all folios and discard. */
> +	truncate_inode_pages_range(mapping, 0, LLONG_MAX);
> +
> +unlock:
> +	filemap_invalidate_unlock(mapping);
> +out:
> +	return filemap_check_errors(mapping);
> +}
> +EXPORT_SYMBOL(filemap_invalidate_inode);
> +
>  #ifdef CONFIG_CACHESTAT_SYSCALL
>  /**
>   * filemap_cachestat() - compute the page cache statistics of a mapping
> 
> 

I'd have liked to have seen the first caller of this function too.
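
For what it's worth, a minimal sketch of what a call site could look like
(hypothetical usage based on the signature above, not taken from this
series):

	/* Write back any dirty pagecache for this inode and wait for it,
	 * then discard all folios, with the invalidate lock held to keep
	 * new folios from being added meanwhile. */
	ret = filemap_invalidate_inode(inode, true /* flush */);
	if (ret)
		return ret;

	/* Or, to invalidate without writing anything back first: */
	filemap_invalidate_inode(inode, false);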
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 11/26] 9p: Use alternative invalidation to using launder_folio
  2024-03-28 16:34 ` [PATCH 11/26] 9p: " David Howells
@ 2024-04-15 11:43   ` Jeff Layton
  2024-04-16 23:03   ` David Howells
  1 sibling, 0 replies; 62+ messages in thread
From: Jeff Layton @ 2024-04-15 11:43 UTC (permalink / raw)
  To: David Howells, Christian Brauner, Gao Xiang, Dominique Martinet
  Cc: Latchesar Ionkov, Christian Schoenebeck, linux-mm, Marc Dionne,
	linux-afs, Paulo Alcantara, linux-cifs, Matthew Wilcox,
	Steve French, linux-cachefs, Ilya Dryomov, Shyam Prasad N,
	Tom Talpey, ceph-devel, Eric Van Hensbergen, linux-nfs, netdev,
	v9fs, linux-kernel, linux-fsdevel, netfs, linux-erofs

On Thu, 2024-03-28 at 16:34 +0000, David Howells wrote:
> Use writepages-based flushing invalidation instead of
> invalidate_inode_pages2() and ->launder_folio().  This will allow
> ->launder_folio() to be removed eventually.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Eric Van Hensbergen <ericvh@kernel.org>
> cc: Latchesar Ionkov <lucho@ionkov.net>
> cc: Dominique Martinet <asmadeus@codewreck.org>
> cc: Christian Schoenebeck <linux_oss@crudebyte.com>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: v9fs@lists.linux.dev
> cc: netfs@lists.linux.dev
> cc: linux-fsdevel@vger.kernel.org
> ---
>  fs/9p/vfs_addr.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
> index 047855033d32..5a943c122d83 100644
> --- a/fs/9p/vfs_addr.c
> +++ b/fs/9p/vfs_addr.c
> @@ -89,7 +89,6 @@ static int v9fs_init_request(struct netfs_io_request *rreq, struct file *file)
>  	bool writing = (rreq->origin == NETFS_READ_FOR_WRITE ||
>  			rreq->origin == NETFS_WRITEBACK ||
>  			rreq->origin == NETFS_WRITETHROUGH ||
> -			rreq->origin == NETFS_LAUNDER_WRITE ||
>  			rreq->origin == NETFS_UNBUFFERED_WRITE ||
>  			rreq->origin == NETFS_DIO_WRITE);
>  
> @@ -141,7 +140,6 @@ const struct address_space_operations v9fs_addr_operations = {
>  	.dirty_folio		= netfs_dirty_folio,
>  	.release_folio		= netfs_release_folio,
>  	.invalidate_folio	= netfs_invalidate_folio,
> -	.launder_folio		= netfs_launder_folio,
>  	.direct_IO		= noop_direct_IO,
>  	.writepages		= netfs_writepages,
>  };
> 

Shouldn't this include a call to filemap_invalidate_inode? Is just
removing launder_folio enough to do this?

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 24/26] netfs: Remove the old writeback code
  2024-03-28 16:34 ` [PATCH 24/26] netfs: Remove the old " David Howells
@ 2024-04-15 12:20   ` Jeff Layton
  2024-04-17 10:36   ` David Howells
  1 sibling, 0 replies; 62+ messages in thread
From: Jeff Layton @ 2024-04-15 12:20 UTC (permalink / raw)
  To: David Howells, Christian Brauner, Gao Xiang, Dominique Martinet
  Cc: Latchesar Ionkov, Christian Schoenebeck, linux-mm, Marc Dionne,
	linux-afs, Paulo Alcantara, linux-cifs, Matthew Wilcox,
	Steve French, linux-cachefs, Ilya Dryomov, Shyam Prasad N,
	Tom Talpey, ceph-devel, Eric Van Hensbergen, linux-nfs, netdev,
	v9fs, linux-kernel, linux-fsdevel, netfs, linux-erofs

On Thu, 2024-03-28 at 16:34 +0000, David Howells wrote:
> Remove the old writeback code.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: Eric Van Hensbergen <ericvh@kernel.org>
> cc: Latchesar Ionkov <lucho@ionkov.net>
> cc: Dominique Martinet <asmadeus@codewreck.org>
> cc: Christian Schoenebeck <linux_oss@crudebyte.com>
> cc: Marc Dionne <marc.dionne@auristor.com>
> cc: v9fs@lists.linux.dev
> cc: linux-afs@lists.infradead.org
> cc: netfs@lists.linux.dev
> cc: linux-fsdevel@vger.kernel.org
> ---
>  fs/9p/vfs_addr.c          |  34 ---
>  fs/afs/write.c            |  40 ---
>  fs/netfs/buffered_write.c | 629 --------------------------------------
>  fs/netfs/direct_write.c   |   2 +-
>  fs/netfs/output.c         | 477 -----------------------------
>  5 files changed, 1 insertion(+), 1181 deletions(-)
>  delete mode 100644 fs/netfs/output.c
> 
> diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
> index 4845e655bc39..a97ceb105cd8 100644
> --- a/fs/9p/vfs_addr.c
> +++ b/fs/9p/vfs_addr.c
> @@ -60,40 +60,6 @@ static void v9fs_issue_write(struct netfs_io_subrequest *subreq)
>  	netfs_write_subrequest_terminated(subreq, len ?: err, false);
>  }
>  
> -#if 0 // TODO: Remove

#23 and #24 should probably be merged. I don't see any reason to do the
two-step of ifdef'ing out the code and then removing it. Just go for it
at this point in the series.

> -static void v9fs_upload_to_server(struct netfs_io_subrequest *subreq)
> -{
> -	struct p9_fid *fid = subreq->rreq->netfs_priv;
> -	int err, len;
> -
> -	trace_netfs_sreq(subreq, netfs_sreq_trace_submit);
> -	len = p9_client_write(fid, subreq->start, &subreq->io_iter, &err);
> -	netfs_write_subrequest_terminated(subreq, len ?: err, false);
> -}
> -
> -static void v9fs_upload_to_server_worker(struct work_struct *work)
> -{
> -	struct netfs_io_subrequest *subreq =
> -		container_of(work, struct netfs_io_subrequest, work);
> -
> -	v9fs_upload_to_server(subreq);
> -}
> -
> -/*
> - * Set up write requests for a writeback slice.  We need to add a write request
> - * for each write we want to make.
> - */
> -static void v9fs_create_write_requests(struct netfs_io_request *wreq, loff_t start, size_t len)
> -{
> -	struct netfs_io_subrequest *subreq;
> -
> -	subreq = netfs_create_write_request(wreq, NETFS_UPLOAD_TO_SERVER,
> -					    start, len, v9fs_upload_to_server_worker);
> -	if (subreq)
> -		netfs_queue_write_request(subreq);
> -}
> -#endif
> -
>  /**
>   * v9fs_issue_read - Issue a read from 9P
>   * @subreq: The read to make
> diff --git a/fs/afs/write.c b/fs/afs/write.c
> index 0ead204c84cb..6ef7d4cbc008 100644
> --- a/fs/afs/write.c
> +++ b/fs/afs/write.c
> @@ -156,46 +156,6 @@ static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter, loff_t
>  	return afs_put_operation(op);
>  }
>  
> -#if 0 // TODO: Remove
> -static void afs_upload_to_server(struct netfs_io_subrequest *subreq)
> -{
> -	struct afs_vnode *vnode = AFS_FS_I(subreq->rreq->inode);
> -	ssize_t ret;
> -
> -	_enter("%x[%x],%zx",
> -	       subreq->rreq->debug_id, subreq->debug_index, subreq->io_iter.count);
> -
> -	trace_netfs_sreq(subreq, netfs_sreq_trace_submit);
> -	ret = afs_store_data(vnode, &subreq->io_iter, subreq->start);
> -	netfs_write_subrequest_terminated(subreq, ret < 0 ? ret : subreq->len,
> -					  false);
> -}
> -
> -static void afs_upload_to_server_worker(struct work_struct *work)
> -{
> -	struct netfs_io_subrequest *subreq =
> -		container_of(work, struct netfs_io_subrequest, work);
> -
> -	afs_upload_to_server(subreq);
> -}
> -
> -/*
> - * Set up write requests for a writeback slice.  We need to add a write request
> - * for each write we want to make.
> - */
> -void afs_create_write_requests(struct netfs_io_request *wreq, loff_t start, size_t len)
> -{
> -	struct netfs_io_subrequest *subreq;
> -
> -	_enter("%x,%llx-%llx", wreq->debug_id, start, start + len);
> -
> -	subreq = netfs_create_write_request(wreq, NETFS_UPLOAD_TO_SERVER,
> -					    start, len, afs_upload_to_server_worker);
> -	if (subreq)
> -		netfs_queue_write_request(subreq);
> -}
> -#endif
> -
>  /*
>   * Writeback calls this when it finds a folio that needs uploading.  This isn't
>   * called if writeback only has copy-to-cache to deal with.
> diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
> index 945e646cd2db..2da9905abec9 100644
> --- a/fs/netfs/buffered_write.c
> +++ b/fs/netfs/buffered_write.c
> @@ -575,632 +575,3 @@ vm_fault_t netfs_page_mkwrite(struct vm_fault *vmf, struct netfs_group *netfs_gr
>  	return ret;
>  }
>  EXPORT_SYMBOL(netfs_page_mkwrite);
> -
> -#if 0 // TODO: Remove
> -/*
> - * Kill all the pages in the given range
> - */
> -static void netfs_kill_pages(struct address_space *mapping,
> -			     loff_t start, loff_t len)
> -{
> -	struct folio *folio;
> -	pgoff_t index = start / PAGE_SIZE;
> -	pgoff_t last = (start + len - 1) / PAGE_SIZE, next;
> -
> -	_enter("%llx-%llx", start, start + len - 1);
> -
> -	do {
> -		_debug("kill %lx (to %lx)", index, last);
> -
> -		folio = filemap_get_folio(mapping, index);
> -		if (IS_ERR(folio)) {
> -			next = index + 1;
> -			continue;
> -		}
> -
> -		next = folio_next_index(folio);
> -
> -		trace_netfs_folio(folio, netfs_folio_trace_kill);
> -		folio_clear_uptodate(folio);
> -		folio_end_writeback(folio);
> -		folio_lock(folio);
> -		generic_error_remove_folio(mapping, folio);
> -		folio_unlock(folio);
> -		folio_put(folio);
> -
> -	} while (index = next, index <= last);
> -
> -	_leave("");
> -}
> -
> -/*
> - * Redirty all the pages in a given range.
> - */
> -static void netfs_redirty_pages(struct address_space *mapping,
> -				loff_t start, loff_t len)
> -{
> -	struct folio *folio;
> -	pgoff_t index = start / PAGE_SIZE;
> -	pgoff_t last = (start + len - 1) / PAGE_SIZE, next;
> -
> -	_enter("%llx-%llx", start, start + len - 1);
> -
> -	do {
> -		_debug("redirty %llx @%llx", len, start);
> -
> -		folio = filemap_get_folio(mapping, index);
> -		if (IS_ERR(folio)) {
> -			next = index + 1;
> -			continue;
> -		}
> -
> -		next = folio_next_index(folio);
> -		trace_netfs_folio(folio, netfs_folio_trace_redirty);
> -		filemap_dirty_folio(mapping, folio);
> -		folio_end_writeback(folio);
> -		folio_put(folio);
> -	} while (index = next, index <= last);
> -
> -	balance_dirty_pages_ratelimited(mapping);
> -
> -	_leave("");
> -}
> -
> -/*
> - * Completion of write to server
> - */
> -static void netfs_pages_written_back(struct netfs_io_request *wreq)
> -{
> -	struct address_space *mapping = wreq->mapping;
> -	struct netfs_folio *finfo;
> -	struct netfs_group *group = NULL;
> -	struct folio *folio;
> -	pgoff_t last;
> -	int gcount = 0;
> -
> -	XA_STATE(xas, &mapping->i_pages, wreq->start / PAGE_SIZE);
> -
> -	_enter("%llx-%llx", wreq->start, wreq->start + wreq->len);
> -
> -	rcu_read_lock();
> -
> -	last = (wreq->start + wreq->len - 1) / PAGE_SIZE;
> -	xas_for_each(&xas, folio, last) {
> -		WARN(!folio_test_writeback(folio),
> -		     "bad %llx @%llx page %lx %lx\n",
> -		     wreq->len, wreq->start, folio->index, last);
> -
> -		if ((finfo = netfs_folio_info(folio))) {
> -			/* Streaming writes cannot be redirtied whilst under
> -			 * writeback, so discard the streaming record.
> -			 */
> -			folio_detach_private(folio);
> -			group = finfo->netfs_group;
> -			gcount++;
> -			trace_netfs_folio(folio, netfs_folio_trace_clear_s);
> -			kfree(finfo);
> -		} else if ((group = netfs_folio_group(folio))) {
> -			/* Need to detach the group pointer if the page didn't
> -			 * get redirtied.  If it has been redirtied, then it
> -			 * must be within the same group.
> -			 */
> -			if (folio_test_dirty(folio)) {
> -				trace_netfs_folio(folio, netfs_folio_trace_redirtied);
> -				goto end_wb;
> -			}
> -			if (folio_trylock(folio)) {
> -				if (!folio_test_dirty(folio)) {
> -					folio_detach_private(folio);
> -					gcount++;
> -					if (group == NETFS_FOLIO_COPY_TO_CACHE)
> -						trace_netfs_folio(folio,
> -								  netfs_folio_trace_end_copy);
> -					else
> -						trace_netfs_folio(folio, netfs_folio_trace_clear_g);
> -				} else {
> -					trace_netfs_folio(folio, netfs_folio_trace_redirtied);
> -				}
> -				folio_unlock(folio);
> -				goto end_wb;
> -			}
> -
> -			xas_pause(&xas);
> -			rcu_read_unlock();
> -			folio_lock(folio);
> -			if (!folio_test_dirty(folio)) {
> -				folio_detach_private(folio);
> -				gcount++;
> -				trace_netfs_folio(folio, netfs_folio_trace_clear_g);
> -			} else {
> -				trace_netfs_folio(folio, netfs_folio_trace_redirtied);
> -			}
> -			folio_unlock(folio);
> -			rcu_read_lock();
> -		} else {
> -			trace_netfs_folio(folio, netfs_folio_trace_clear);
> -		}
> -	end_wb:
> -		xas_advance(&xas, folio_next_index(folio) - 1);
> -		folio_end_writeback(folio);
> -	}
> -
> -	rcu_read_unlock();
> -	netfs_put_group_many(group, gcount);
> -	_leave("");
> -}
> -
> -/*
> - * Deal with the disposition of the folios that are under writeback to close
> - * out the operation.
> - */
> -static void netfs_cleanup_buffered_write(struct netfs_io_request *wreq)
> -{
> -	struct address_space *mapping = wreq->mapping;
> -
> -	_enter("");
> -
> -	switch (wreq->error) {
> -	case 0:
> -		netfs_pages_written_back(wreq);
> -		break;
> -
> -	default:
> -		pr_notice("R=%08x Unexpected error %d\n", wreq->debug_id, wreq->error);
> -		fallthrough;
> -	case -EACCES:
> -	case -EPERM:
> -	case -ENOKEY:
> -	case -EKEYEXPIRED:
> -	case -EKEYREJECTED:
> -	case -EKEYREVOKED:
> -	case -ENETRESET:
> -	case -EDQUOT:
> -	case -ENOSPC:
> -		netfs_redirty_pages(mapping, wreq->start, wreq->len);
> -		break;
> -
> -	case -EROFS:
> -	case -EIO:
> -	case -EREMOTEIO:
> -	case -EFBIG:
> -	case -ENOENT:
> -	case -ENOMEDIUM:
> -	case -ENXIO:
> -		netfs_kill_pages(mapping, wreq->start, wreq->len);
> -		break;
> -	}
> -
> -	if (wreq->error)
> -		mapping_set_error(mapping, wreq->error);
> -	if (wreq->netfs_ops->done)
> -		wreq->netfs_ops->done(wreq);
> -}
> -
> -/*
> - * Extend the region to be written back to include subsequent contiguously
> - * dirty pages if possible, but don't sleep while doing so.
> - *
> - * If this page holds new content, then we can include filler zeros in the
> - * writeback.
> - */
> -static void netfs_extend_writeback(struct address_space *mapping,
> -				   struct netfs_group *group,
> -				   struct xa_state *xas,
> -				   long *_count,
> -				   loff_t start,
> -				   loff_t max_len,
> -				   size_t *_len,
> -				   size_t *_top)
> -{
> -	struct netfs_folio *finfo;
> -	struct folio_batch fbatch;
> -	struct folio *folio;
> -	unsigned int i;
> -	pgoff_t index = (start + *_len) / PAGE_SIZE;
> -	size_t len;
> -	void *priv;
> -	bool stop = true;
> -
> -	folio_batch_init(&fbatch);
> -
> -	do {
> -		/* Firstly, we gather up a batch of contiguous dirty pages
> -		 * under the RCU read lock - but we can't clear the dirty flags
> -		 * there if any of those pages are mapped.
> -		 */
> -		rcu_read_lock();
> -
> -		xas_for_each(xas, folio, ULONG_MAX) {
> -			stop = true;
> -			if (xas_retry(xas, folio))
> -				continue;
> -			if (xa_is_value(folio))
> -				break;
> -			if (folio->index != index) {
> -				xas_reset(xas);
> -				break;
> -			}
> -
> -			if (!folio_try_get_rcu(folio)) {
> -				xas_reset(xas);
> -				continue;
> -			}
> -
> -			/* Has the folio moved or been split? */
> -			if (unlikely(folio != xas_reload(xas))) {
> -				folio_put(folio);
> -				xas_reset(xas);
> -				break;
> -			}
> -
> -			if (!folio_trylock(folio)) {
> -				folio_put(folio);
> -				xas_reset(xas);
> -				break;
> -			}
> -			if (!folio_test_dirty(folio) ||
> -			    folio_test_writeback(folio)) {
> -				folio_unlock(folio);
> -				folio_put(folio);
> -				xas_reset(xas);
> -				break;
> -			}
> -
> -			stop = false;
> -			len = folio_size(folio);
> -			priv = folio_get_private(folio);
> -			if ((const struct netfs_group *)priv != group) {
> -				stop = true;
> -				finfo = netfs_folio_info(folio);
> -				if (!finfo ||
> -				    finfo->netfs_group != group ||
> -				    finfo->dirty_offset > 0) {
> -					folio_unlock(folio);
> -					folio_put(folio);
> -					xas_reset(xas);
> -					break;
> -				}
> -				len = finfo->dirty_len;
> -			}
> -
> -			*_top += folio_size(folio);
> -			index += folio_nr_pages(folio);
> -			*_count -= folio_nr_pages(folio);
> -			*_len += len;
> -			if (*_len >= max_len || *_count <= 0)
> -				stop = true;
> -
> -			if (!folio_batch_add(&fbatch, folio))
> -				break;
> -			if (stop)
> -				break;
> -		}
> -
> -		xas_pause(xas);
> -		rcu_read_unlock();
> -
> -		/* Now, if we obtained any folios, we can shift them to being
> -		 * writable and mark them for caching.
> -		 */
> -		if (!folio_batch_count(&fbatch))
> -			break;
> -
> -		for (i = 0; i < folio_batch_count(&fbatch); i++) {
> -			folio = fbatch.folios[i];
> -			if (group == NETFS_FOLIO_COPY_TO_CACHE)
> -				trace_netfs_folio(folio, netfs_folio_trace_copy_plus);
> -			else
> -				trace_netfs_folio(folio, netfs_folio_trace_store_plus);
> -
> -			if (!folio_clear_dirty_for_io(folio))
> -				BUG();
> -			folio_start_writeback(folio);
> -			folio_unlock(folio);
> -		}
> -
> -		folio_batch_release(&fbatch);
> -		cond_resched();
> -	} while (!stop);
> -}
> -
> -/*
> - * Synchronously write back the locked page and any subsequent non-locked dirty
> - * pages.
> - */
> -static ssize_t netfs_write_back_from_locked_folio(struct address_space *mapping,
> -						  struct writeback_control *wbc,
> -						  struct netfs_group *group,
> -						  struct xa_state *xas,
> -						  struct folio *folio,
> -						  unsigned long long start,
> -						  unsigned long long end)
> -{
> -	struct netfs_io_request *wreq;
> -	struct netfs_folio *finfo;
> -	struct netfs_inode *ctx = netfs_inode(mapping->host);
> -	unsigned long long i_size = i_size_read(&ctx->inode);
> -	size_t len, max_len;
> -	long count = wbc->nr_to_write;
> -	int ret;
> -
> -	_enter(",%lx,%llx-%llx", folio->index, start, end);
> -
> -	wreq = netfs_alloc_request(mapping, NULL, start, folio_size(folio),
> -				   group == NETFS_FOLIO_COPY_TO_CACHE ?
> -				   NETFS_COPY_TO_CACHE : NETFS_WRITEBACK);
> -	if (IS_ERR(wreq)) {
> -		folio_unlock(folio);
> -		return PTR_ERR(wreq);
> -	}
> -
> -	if (!folio_clear_dirty_for_io(folio))
> -		BUG();
> -	folio_start_writeback(folio);
> -
> -	count -= folio_nr_pages(folio);
> -
> -	/* Find all consecutive lockable dirty pages that have contiguous
> -	 * written regions, stopping when we find a page that is not
> -	 * immediately lockable, is not dirty or is missing, or we reach the
> -	 * end of the range.
> -	 */
> -	if (group == NETFS_FOLIO_COPY_TO_CACHE)
> -		trace_netfs_folio(folio, netfs_folio_trace_copy);
> -	else
> -		trace_netfs_folio(folio, netfs_folio_trace_store);
> -
> -	len = wreq->len;
> -	finfo = netfs_folio_info(folio);
> -	if (finfo) {
> -		start += finfo->dirty_offset;
> -		if (finfo->dirty_offset + finfo->dirty_len != len) {
> -			len = finfo->dirty_len;
> -			goto cant_expand;
> -		}
> -		len = finfo->dirty_len;
> -	}
> -
> -	if (start < i_size) {
> -		/* Trim the write to the EOF; the extra data is ignored.  Also
> -		 * put an upper limit on the size of a single storedata op.
> -		 */
> -		max_len = 65536 * 4096;
> -		max_len = min_t(unsigned long long, max_len, end - start + 1);
> -		max_len = min_t(unsigned long long, max_len, i_size - start);
> -
> -		if (len < max_len)
> -			netfs_extend_writeback(mapping, group, xas, &count, start,
> -					       max_len, &len, &wreq->upper_len);
> -	}
> -
> -cant_expand:
> -	len = min_t(unsigned long long, len, i_size - start);
> -
> -	/* We now have a contiguous set of dirty pages, each with writeback
> -	 * set; the first page is still locked at this point, but all the rest
> -	 * have been unlocked.
> -	 */
> -	folio_unlock(folio);
> -	wreq->start = start;
> -	wreq->len = len;
> -
> -	if (start < i_size) {
> -		_debug("write back %zx @%llx [%llx]", len, start, i_size);
> -
> -		/* Speculatively write to the cache.  We have to fix this up
> -		 * later if the store fails.
> -		 */
> -		wreq->cleanup = netfs_cleanup_buffered_write;
> -
> -		iov_iter_xarray(&wreq->iter, ITER_SOURCE, &mapping->i_pages, start,
> -				wreq->upper_len);
> -		if (group != NETFS_FOLIO_COPY_TO_CACHE) {
> -			__set_bit(NETFS_RREQ_UPLOAD_TO_SERVER, &wreq->flags);
> -			ret = netfs_begin_write(wreq, true, netfs_write_trace_writeback);
> -		} else {
> -			ret = netfs_begin_write(wreq, true, netfs_write_trace_copy_to_cache);
> -		}
> -		if (ret == 0 || ret == -EIOCBQUEUED)
> -			wbc->nr_to_write -= len / PAGE_SIZE;
> -	} else {
> -		_debug("write discard %zx @%llx [%llx]", len, start, i_size);
> -
> -		/* The dirty region was entirely beyond the EOF. */
> -		netfs_pages_written_back(wreq);
> -		ret = 0;
> -	}
> -
> -	netfs_put_request(wreq, false, netfs_rreq_trace_put_return);
> -	_leave(" = 1");
> -	return 1;
> -}
> -
> -/*
> - * Write a region of pages back to the server
> - */
> -static ssize_t netfs_writepages_begin(struct address_space *mapping,
> -				      struct writeback_control *wbc,
> -				      struct netfs_group *group,
> -				      struct xa_state *xas,
> -				      unsigned long long *_start,
> -				      unsigned long long end)
> -{
> -	const struct netfs_folio *finfo;
> -	struct folio *folio;
> -	unsigned long long start = *_start;
> -	ssize_t ret;
> -	void *priv;
> -	int skips = 0;
> -
> -	_enter("%llx,%llx,", start, end);
> -
> -search_again:
> -	/* Find the first dirty page in the group. */
> -	rcu_read_lock();
> -
> -	for (;;) {
> -		folio = xas_find_marked(xas, end / PAGE_SIZE, PAGECACHE_TAG_DIRTY);
> -		if (xas_retry(xas, folio) || xa_is_value(folio))
> -			continue;
> -		if (!folio)
> -			break;
> -
> -		if (!folio_try_get_rcu(folio)) {
> -			xas_reset(xas);
> -			continue;
> -		}
> -
> -		if (unlikely(folio != xas_reload(xas))) {
> -			folio_put(folio);
> -			xas_reset(xas);
> -			continue;
> -		}
> -
> -		/* Skip any dirty folio that's not in the group of interest. */
> -		priv = folio_get_private(folio);
> -		if ((const struct netfs_group *)priv == NETFS_FOLIO_COPY_TO_CACHE) {
> -			group = NETFS_FOLIO_COPY_TO_CACHE;
> -		} else if ((const struct netfs_group *)priv != group) {
> -			finfo = __netfs_folio_info(priv);
> -			if (!finfo || finfo->netfs_group != group) {
> -				folio_put(folio);
> -				continue;
> -			}
> -		}
> -
> -		xas_pause(xas);
> -		break;
> -	}
> -	rcu_read_unlock();
> -	if (!folio)
> -		return 0;
> -
> -	start = folio_pos(folio); /* May regress with THPs */
> -
> -	_debug("wback %lx", folio->index);
> -
> -	/* At this point we hold neither the i_pages lock nor the page lock:
> -	 * the page may be truncated or invalidated (changing page->mapping to
> -	 * NULL), or even swizzled back from swapper_space to tmpfs file
> -	 * mapping
> -	 */
> -lock_again:
> -	if (wbc->sync_mode != WB_SYNC_NONE) {
> -		ret = folio_lock_killable(folio);
> -		if (ret < 0)
> -			return ret;
> -	} else {
> -		if (!folio_trylock(folio))
> -			goto search_again;
> -	}
> -
> -	if (folio->mapping != mapping ||
> -	    !folio_test_dirty(folio)) {
> -		start += folio_size(folio);
> -		folio_unlock(folio);
> -		goto search_again;
> -	}
> -
> -	if (folio_test_writeback(folio)) {
> -		folio_unlock(folio);
> -		if (wbc->sync_mode != WB_SYNC_NONE) {
> -			folio_wait_writeback(folio);
> -			goto lock_again;
> -		}
> -
> -		start += folio_size(folio);
> -		if (wbc->sync_mode == WB_SYNC_NONE) {
> -			if (skips >= 5 || need_resched()) {
> -				ret = 0;
> -				goto out;
> -			}
> -			skips++;
> -		}
> -		goto search_again;
> -	}
> -
> -	ret = netfs_write_back_from_locked_folio(mapping, wbc, group, xas,
> -						 folio, start, end);
> -out:
> -	if (ret > 0)
> -		*_start = start + ret;
> -	_leave(" = %zd [%llx]", ret, *_start);
> -	return ret;
> -}
> -
> -/*
> - * Write a region of pages back to the server
> - */
> -static int netfs_writepages_region(struct address_space *mapping,
> -				   struct writeback_control *wbc,
> -				   struct netfs_group *group,
> -				   unsigned long long *_start,
> -				   unsigned long long end)
> -{
> -	ssize_t ret;
> -
> -	XA_STATE(xas, &mapping->i_pages, *_start / PAGE_SIZE);
> -
> -	do {
> -		ret = netfs_writepages_begin(mapping, wbc, group, &xas,
> -					     _start, end);
> -		if (ret > 0 && wbc->nr_to_write > 0)
> -			cond_resched();
> -	} while (ret > 0 && wbc->nr_to_write > 0);
> -
> -	return ret > 0 ? 0 : ret;
> -}
> -
> -/*
> - * write some of the pending data back to the server
> - */
> -int netfs_writepages(struct address_space *mapping,
> -		     struct writeback_control *wbc)
> -{
> -	struct netfs_group *group = NULL;
> -	loff_t start, end;
> -	int ret;
> -
> -	_enter("");
> -
> -	/* We have to be careful as we can end up racing with setattr()
> -	 * truncating the pagecache since the caller doesn't take a lock here
> -	 * to prevent it.
> -	 */
> -
> -	if (wbc->range_cyclic && mapping->writeback_index) {
> -		start = mapping->writeback_index * PAGE_SIZE;
> -		ret = netfs_writepages_region(mapping, wbc, group,
> -					      &start, LLONG_MAX);
> -		if (ret < 0)
> -			goto out;
> -
> -		if (wbc->nr_to_write <= 0) {
> -			mapping->writeback_index = start / PAGE_SIZE;
> -			goto out;
> -		}
> -
> -		start = 0;
> -		end = mapping->writeback_index * PAGE_SIZE;
> -		mapping->writeback_index = 0;
> -		ret = netfs_writepages_region(mapping, wbc, group, &start, end);
> -		if (ret == 0)
> -			mapping->writeback_index = start / PAGE_SIZE;
> -	} else if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) {
> -		start = 0;
> -		ret = netfs_writepages_region(mapping, wbc, group,
> -					      &start, LLONG_MAX);
> -		if (wbc->nr_to_write > 0 && ret == 0)
> -			mapping->writeback_index = start / PAGE_SIZE;
> -	} else {
> -		start = wbc->range_start;
> -		ret = netfs_writepages_region(mapping, wbc, group,
> -					      &start, wbc->range_end);
> -	}
> -
> -out:
> -	_leave(" = %d", ret);
> -	return ret;
> -}
> -EXPORT_SYMBOL(netfs_writepages);
> -#endif
> diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
> index 330ba7cb3f10..e4a9cf7cd234 100644
> --- a/fs/netfs/direct_write.c
> +++ b/fs/netfs/direct_write.c
> @@ -37,7 +37,7 @@ static ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov
>  	size_t len = iov_iter_count(iter);
>  	bool async = !is_sync_kiocb(iocb);
>  
> -	_enter("");
> +	_enter("%lx", iov_iter_count(iter));
>  
>  	/* We're going to need a bounce buffer if what we transmit is going to
>  	 * be different in some way to the source buffer, e.g. because it gets
> diff --git a/fs/netfs/output.c b/fs/netfs/output.c
> deleted file mode 100644
> index 85374322f10f..000000000000
> --- a/fs/netfs/output.c
> +++ /dev/null
> @@ -1,477 +0,0 @@
> -// SPDX-License-Identifier: GPL-2.0-only
> -/* Network filesystem high-level write support.
> - *
> - * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
> - * Written by David Howells (dhowells@redhat.com)
> - */
> -
> -#include <linux/fs.h>
> -#include <linux/mm.h>
> -#include <linux/pagemap.h>
> -#include <linux/slab.h>
> -#include <linux/writeback.h>
> -#include <linux/pagevec.h>
> -#include "internal.h"
> -
> -/**
> - * netfs_create_write_request - Create a write operation.
> - * @wreq: The write request this is storing from.
> - * @dest: The destination type
> - * @start: Start of the region this write will modify
> - * @len: Length of the modification
> - * @worker: The worker function to handle the write(s)
> - *
> - * Allocate a write operation, set it up and add it to the list on a write
> - * request.
> - */
> -struct netfs_io_subrequest *netfs_create_write_request(struct netfs_io_request *wreq,
> -						       enum netfs_io_source dest,
> -						       loff_t start, size_t len,
> -						       work_func_t worker)
> -{
> -	struct netfs_io_subrequest *subreq;
> -
> -	subreq = netfs_alloc_subrequest(wreq);
> -	if (subreq) {
> -		INIT_WORK(&subreq->work, worker);
> -		subreq->source	= dest;
> -		subreq->start	= start;
> -		subreq->len	= len;
> -
> -		switch (subreq->source) {
> -		case NETFS_UPLOAD_TO_SERVER:
> -			netfs_stat(&netfs_n_wh_upload);
> -			break;
> -		case NETFS_WRITE_TO_CACHE:
> -			netfs_stat(&netfs_n_wh_write);
> -			break;
> -		default:
> -			BUG();
> -		}
> -
> -		subreq->io_iter = wreq->io_iter;
> -		iov_iter_advance(&subreq->io_iter, subreq->start - wreq->start);
> -		iov_iter_truncate(&subreq->io_iter, subreq->len);
> -
> -		trace_netfs_sreq_ref(wreq->debug_id, subreq->debug_index,
> -				     refcount_read(&subreq->ref),
> -				     netfs_sreq_trace_new);
> -		atomic_inc(&wreq->nr_outstanding);
> -		list_add_tail(&subreq->rreq_link, &wreq->subrequests);
> -		trace_netfs_sreq(subreq, netfs_sreq_trace_prepare);
> -	}
> -
> -	return subreq;
> -}
> -EXPORT_SYMBOL(netfs_create_write_request);
> -
> -/*
> - * Process a completed write request once all the component operations have
> - * been completed.
> - */
> -static void netfs_write_terminated(struct netfs_io_request *wreq, bool was_async)
> -{
> -	struct netfs_io_subrequest *subreq;
> -	struct netfs_inode *ctx = netfs_inode(wreq->inode);
> -	size_t transferred = 0;
> -
> -	_enter("R=%x[]", wreq->debug_id);
> -
> -	trace_netfs_rreq(wreq, netfs_rreq_trace_write_done);
> -
> -	list_for_each_entry(subreq, &wreq->subrequests, rreq_link) {
> -		if (subreq->error || subreq->transferred == 0)
> -			break;
> -		transferred += subreq->transferred;
> -		if (subreq->transferred < subreq->len)
> -			break;
> -	}
> -	wreq->transferred = transferred;
> -
> -	list_for_each_entry(subreq, &wreq->subrequests, rreq_link) {
> -		if (!subreq->error)
> -			continue;
> -		switch (subreq->source) {
> -		case NETFS_UPLOAD_TO_SERVER:
> -			/* Depending on the type of failure, this may prevent
> -			 * writeback completion unless we're in disconnected
> -			 * mode.
> -			 */
> -			if (!wreq->error)
> -				wreq->error = subreq->error;
> -			break;
> -
> -		case NETFS_WRITE_TO_CACHE:
> -			/* Failure doesn't prevent writeback completion unless
> -			 * we're in disconnected mode.
> -			 */
> -			if (subreq->error != -ENOBUFS)
> -				ctx->ops->invalidate_cache(wreq);
> -			break;
> -
> -		default:
> -			WARN_ON_ONCE(1);
> -			if (!wreq->error)
> -				wreq->error = -EIO;
> -			return;
> -		}
> -	}
> -
> -	wreq->cleanup(wreq);
> -
> -	if (wreq->origin == NETFS_DIO_WRITE &&
> -	    wreq->mapping->nrpages) {
> -		pgoff_t first = wreq->start >> PAGE_SHIFT;
> -		pgoff_t last = (wreq->start + wreq->transferred - 1) >> PAGE_SHIFT;
> -		invalidate_inode_pages2_range(wreq->mapping, first, last);
> -	}
> -
> -	if (wreq->origin == NETFS_DIO_WRITE)
> -		inode_dio_end(wreq->inode);
> -
> -	_debug("finished");
> -	trace_netfs_rreq(wreq, netfs_rreq_trace_wake_ip);
> -	clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &wreq->flags);
> -	wake_up_bit(&wreq->flags, NETFS_RREQ_IN_PROGRESS);
> -
> -	if (wreq->iocb) {
> -		wreq->iocb->ki_pos += transferred;
> -		if (wreq->iocb->ki_complete)
> -			wreq->iocb->ki_complete(
> -				wreq->iocb, wreq->error ? wreq->error : transferred);
> -	}
> -
> -	netfs_clear_subrequests(wreq, was_async);
> -	netfs_put_request(wreq, was_async, netfs_rreq_trace_put_complete);
> -}
> -
> -/*
> - * Deal with the completion of writing the data to the cache.
> - */
> -void netfs_write_subrequest_terminated(void *_op, ssize_t transferred_or_error,
> -				       bool was_async)
> -{
> -	struct netfs_io_subrequest *subreq = _op;
> -	struct netfs_io_request *wreq = subreq->rreq;
> -	unsigned int u;
> -
> -	_enter("%x[%x] %zd", wreq->debug_id, subreq->debug_index, transferred_or_error);
> -
> -	switch (subreq->source) {
> -	case NETFS_UPLOAD_TO_SERVER:
> -		netfs_stat(&netfs_n_wh_upload_done);
> -		break;
> -	case NETFS_WRITE_TO_CACHE:
> -		netfs_stat(&netfs_n_wh_write_done);
> -		break;
> -	case NETFS_INVALID_WRITE:
> -		break;
> -	default:
> -		BUG();
> -	}
> -
> -	if (IS_ERR_VALUE(transferred_or_error)) {
> -		subreq->error = transferred_or_error;
> -		trace_netfs_failure(wreq, subreq, transferred_or_error,
> -				    netfs_fail_write);
> -		goto failed;
> -	}
> -
> -	if (WARN(transferred_or_error > subreq->len - subreq->transferred,
> -		 "Subreq excess write: R%x[%x] %zd > %zu - %zu",
> -		 wreq->debug_id, subreq->debug_index,
> -		 transferred_or_error, subreq->len, subreq->transferred))
> -		transferred_or_error = subreq->len - subreq->transferred;
> -
> -	subreq->error = 0;
> -	subreq->transferred += transferred_or_error;
> -
> -	if (iov_iter_count(&subreq->io_iter) != subreq->len - subreq->transferred)
> -		pr_warn("R=%08x[%u] ITER POST-MISMATCH %zx != %zx-%zx %x\n",
> -			wreq->debug_id, subreq->debug_index,
> -			iov_iter_count(&subreq->io_iter), subreq->len,
> -			subreq->transferred, subreq->io_iter.iter_type);
> -
> -	if (subreq->transferred < subreq->len)
> -		goto incomplete;
> -
> -	__clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags);
> -out:
> -	trace_netfs_sreq(subreq, netfs_sreq_trace_terminated);
> -
> -	/* If we decrement nr_outstanding to 0, the ref belongs to us. */
> -	u = atomic_dec_return(&wreq->nr_outstanding);
> -	if (u == 0)
> -		netfs_write_terminated(wreq, was_async);
> -	else if (u == 1)
> -		wake_up_var(&wreq->nr_outstanding);
> -
> -	netfs_put_subrequest(subreq, was_async, netfs_sreq_trace_put_terminated);
> -	return;
> -
> -incomplete:
> -	if (transferred_or_error == 0) {
> -		if (__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) {
> -			subreq->error = -ENODATA;
> -			goto failed;
> -		}
> -	} else {
> -		__clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags);
> -	}
> -
> -	__set_bit(NETFS_SREQ_SHORT_IO, &subreq->flags);
> -	set_bit(NETFS_RREQ_INCOMPLETE_IO, &wreq->flags);
> -	goto out;
> -
> -failed:
> -	switch (subreq->source) {
> -	case NETFS_WRITE_TO_CACHE:
> -		netfs_stat(&netfs_n_wh_write_failed);
> -		set_bit(NETFS_RREQ_INCOMPLETE_IO, &wreq->flags);
> -		break;
> -	case NETFS_UPLOAD_TO_SERVER:
> -		netfs_stat(&netfs_n_wh_upload_failed);
> -		set_bit(NETFS_RREQ_FAILED, &wreq->flags);
> -		wreq->error = subreq->error;
> -		break;
> -	default:
> -		break;
> -	}
> -	goto out;
> -}
> -EXPORT_SYMBOL(netfs_write_subrequest_terminated);
> -
> -static void netfs_write_to_cache_op(struct netfs_io_subrequest *subreq)
> -{
> -	struct netfs_io_request *wreq = subreq->rreq;
> -	struct netfs_cache_resources *cres = &wreq->cache_resources;
> -
> -	trace_netfs_sreq(subreq, netfs_sreq_trace_submit);
> -
> -	cres->ops->write(cres, subreq->start, &subreq->io_iter,
> -			 netfs_write_subrequest_terminated, subreq);
> -}
> -
> -static void netfs_write_to_cache_op_worker(struct work_struct *work)
> -{
> -	struct netfs_io_subrequest *subreq =
> -		container_of(work, struct netfs_io_subrequest, work);
> -
> -	netfs_write_to_cache_op(subreq);
> -}
> -
> -/**
> - * netfs_queue_write_request - Queue a write request for attention
> - * @subreq: The write request to be queued
> - *
> - * Queue the specified write request for processing by a worker thread.  We
> - * pass the caller's ref on the request to the worker thread.
> - */
> -void netfs_queue_write_request(struct netfs_io_subrequest *subreq)
> -{
> -	if (!queue_work(system_unbound_wq, &subreq->work))
> -		netfs_put_subrequest(subreq, false, netfs_sreq_trace_put_wip);
> -}
> -EXPORT_SYMBOL(netfs_queue_write_request);
> -
> -/*
> - * Set up a op for writing to the cache.
> - */
> -static void netfs_set_up_write_to_cache(struct netfs_io_request *wreq)
> -{
> -	struct netfs_cache_resources *cres = &wreq->cache_resources;
> -	struct netfs_io_subrequest *subreq;
> -	struct netfs_inode *ctx = netfs_inode(wreq->inode);
> -	struct fscache_cookie *cookie = netfs_i_cookie(ctx);
> -	loff_t start = wreq->start;
> -	size_t len = wreq->len;
> -	int ret;
> -
> -	if (!fscache_cookie_enabled(cookie)) {
> -		clear_bit(NETFS_RREQ_WRITE_TO_CACHE, &wreq->flags);
> -		return;
> -	}
> -
> -	_debug("write to cache");
> -	ret = fscache_begin_write_operation(cres, cookie);
> -	if (ret < 0)
> -		return;
> -
> -	ret = cres->ops->prepare_write(cres, &start, &len, wreq->upper_len,
> -				       i_size_read(wreq->inode), true);
> -	if (ret < 0)
> -		return;
> -
> -	subreq = netfs_create_write_request(wreq, NETFS_WRITE_TO_CACHE, start, len,
> -					    netfs_write_to_cache_op_worker);
> -	if (!subreq)
> -		return;
> -
> -	netfs_write_to_cache_op(subreq);
> -}
> -
> -/*
> - * Begin the process of writing out a chunk of data.
> - *
> - * We are given a write request that holds a series of dirty regions and
> - * (partially) covers a sequence of folios, all of which are present.  The
> - * pages must have been marked as writeback as appropriate.
> - *
> - * We need to perform the following steps:
> - *
> - * (1) If encrypting, create an output buffer and encrypt each block of the
> - *     data into it, otherwise the output buffer will point to the original
> - *     folios.
> - *
> - * (2) If the data is to be cached, set up a write op for the entire output
> - *     buffer to the cache, if the cache wants to accept it.
> - *
> - * (3) If the data is to be uploaded (ie. not merely cached):
> - *
> - *     (a) If the data is to be compressed, create a compression buffer and
> - *         compress the data into it.
> - *
> - *     (b) For each destination we want to upload to, set up write ops to write
> - *         to that destination.  We may need multiple writes if the data is not
> - *         contiguous or the span exceeds wsize for a server.
> - */
> -int netfs_begin_write(struct netfs_io_request *wreq, bool may_wait,
> -		      enum netfs_write_trace what)
> -{
> -	struct netfs_inode *ctx = netfs_inode(wreq->inode);
> -
> -	_enter("R=%x %llx-%llx f=%lx",
> -	       wreq->debug_id, wreq->start, wreq->start + wreq->len - 1,
> -	       wreq->flags);
> -
> -	trace_netfs_write(wreq, what);
> -	if (wreq->len == 0 || wreq->iter.count == 0) {
> -		pr_err("Zero-sized write [R=%x]\n", wreq->debug_id);
> -		return -EIO;
> -	}
> -
> -	if (wreq->origin == NETFS_DIO_WRITE)
> -		inode_dio_begin(wreq->inode);
> -
> -	wreq->io_iter = wreq->iter;
> -
> -	/* ->outstanding > 0 carries a ref */
> -	netfs_get_request(wreq, netfs_rreq_trace_get_for_outstanding);
> -	atomic_set(&wreq->nr_outstanding, 1);
> -
> -	/* Start the encryption/compression going.  We can do that in the
> -	 * background whilst we generate a list of write ops that we want to
> -	 * perform.
> -	 */
> -	// TODO: Encrypt or compress the region as appropriate
> -
> -	/* We need to write all of the region to the cache */
> -	if (test_bit(NETFS_RREQ_WRITE_TO_CACHE, &wreq->flags))
> -		netfs_set_up_write_to_cache(wreq);
> -
> -	/* However, we don't necessarily write all of the region to the server.
> -	 * Caching of reads is being managed this way also.
> -	 */
> -	if (test_bit(NETFS_RREQ_UPLOAD_TO_SERVER, &wreq->flags))
> -		ctx->ops->create_write_requests(wreq, wreq->start, wreq->len);
> -
> -	if (atomic_dec_and_test(&wreq->nr_outstanding))
> -		netfs_write_terminated(wreq, false);
> -
> -	if (!may_wait)
> -		return -EIOCBQUEUED;
> -
> -	wait_on_bit(&wreq->flags, NETFS_RREQ_IN_PROGRESS,
> -		    TASK_UNINTERRUPTIBLE);
> -	return wreq->error;
> -}
> -
> -/*
> - * Begin a write operation for writing through the pagecache.
> - */
> -struct netfs_io_request *netfs_begin_writethrough(struct kiocb *iocb, size_t len)
> -{
> -	struct netfs_io_request *wreq;
> -	struct file *file = iocb->ki_filp;
> -
> -	wreq = netfs_alloc_request(file->f_mapping, file, iocb->ki_pos, len,
> -				   NETFS_WRITETHROUGH);
> -	if (IS_ERR(wreq))
> -		return wreq;
> -
> -	trace_netfs_write(wreq, netfs_write_trace_writethrough);
> -
> -	__set_bit(NETFS_RREQ_UPLOAD_TO_SERVER, &wreq->flags);
> -	iov_iter_xarray(&wreq->iter, ITER_SOURCE, &wreq->mapping->i_pages, wreq->start, 0);
> -	wreq->io_iter = wreq->iter;
> -
> -	/* ->outstanding > 0 carries a ref */
> -	netfs_get_request(wreq, netfs_rreq_trace_get_for_outstanding);
> -	atomic_set(&wreq->nr_outstanding, 1);
> -	return wreq;
> -}
> -
> -static void netfs_submit_writethrough(struct netfs_io_request *wreq, bool final)
> -{
> -	struct netfs_inode *ictx = netfs_inode(wreq->inode);
> -	unsigned long long start;
> -	size_t len;
> -
> -	if (!test_bit(NETFS_RREQ_UPLOAD_TO_SERVER, &wreq->flags))
> -		return;
> -
> -	start = wreq->start + wreq->submitted;
> -	len = wreq->iter.count - wreq->submitted;
> -	if (!final) {
> -		len /= wreq->wsize; /* Round to number of maximum packets */
> -		len *= wreq->wsize;
> -	}
> -
> -	ictx->ops->create_write_requests(wreq, start, len);
> -	wreq->submitted += len;
> -}
> -
> -/*
> - * Advance the state of the write operation used when writing through the
> - * pagecache.  Data has been copied into the pagecache that we need to append
> - * to the request.  If we've added more than wsize then we need to create a new
> - * subrequest.
> - */
> -int netfs_advance_writethrough(struct netfs_io_request *wreq, size_t copied, bool to_page_end)
> -{
> -	_enter("ic=%zu sb=%llu ws=%u cp=%zu tp=%u",
> -	       wreq->iter.count, wreq->submitted, wreq->wsize, copied, to_page_end);
> -
> -	wreq->iter.count += copied;
> -	wreq->io_iter.count += copied;
> -	if (to_page_end && wreq->io_iter.count - wreq->submitted >= wreq->wsize)
> -		netfs_submit_writethrough(wreq, false);
> -
> -	return wreq->error;
> -}
> -
> -/*
> - * End a write operation used when writing through the pagecache.
> - */
> -int netfs_end_writethrough(struct netfs_io_request *wreq, struct kiocb *iocb)
> -{
> -	int ret = -EIOCBQUEUED;
> -
> -	_enter("ic=%zu sb=%llu ws=%u",
> -	       wreq->iter.count, wreq->submitted, wreq->wsize);
> -
> -	if (wreq->submitted < wreq->io_iter.count)
> -		netfs_submit_writethrough(wreq, true);
> -
> -	if (atomic_dec_and_test(&wreq->nr_outstanding))
> -		netfs_write_terminated(wreq, false);
> -
> -	if (is_sync_kiocb(iocb)) {
> -		wait_on_bit(&wreq->flags, NETFS_RREQ_IN_PROGRESS,
> -			    TASK_UNINTERRUPTIBLE);
> -		ret = wreq->error;
> -	}
> -
> -	netfs_put_request(wreq, false, netfs_rreq_trace_put_return);
> -	return ret;
> -}
> 

-- 
Jeff Layton <jlayton@kernel.org>


* Re: [PATCH 17/26] netfs: Fix writethrough-mode error handling
  2024-03-28 16:34 ` [PATCH 17/26] netfs: Fix writethrough-mode error handling David Howells
@ 2024-04-15 12:40   ` Jeff Layton
  2024-04-17  9:04   ` David Howells
  1 sibling, 0 replies; 62+ messages in thread
From: Jeff Layton @ 2024-04-15 12:40 UTC (permalink / raw)
  To: David Howells, Christian Brauner, Gao Xiang, Dominique Martinet
  Cc: Paulo Alcantara, linux-cifs, Shyam Prasad N, linux-mm,
	linux-fsdevel, v9fs, ceph-devel, linux-nfs, Matthew Wilcox,
	linux-kernel, Tom Talpey, Steve French, linux-cachefs, netdev,
	Marc Dionne, netfs, Ilya Dryomov, Eric Van Hensbergen,
	linux-erofs, linux-afs

On Thu, 2024-03-28 at 16:34 +0000, David Howells wrote:
> Fix the error return in netfs_perform_write() acting in writethrough-mode
> to return any cached error in the case that netfs_end_writethrough()
> returns 0.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: netfs@lists.linux.dev
> cc: linux-fsdevel@vger.kernel.org
> ---
>  fs/netfs/buffered_write.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
> index 8e4a3fb287e3..db4ad158948b 100644
> --- a/fs/netfs/buffered_write.c
> +++ b/fs/netfs/buffered_write.c
> @@ -188,7 +188,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
>  	enum netfs_how_to_modify howto;
>  	enum netfs_folio_trace trace;
>  	unsigned int bdp_flags = (iocb->ki_flags & IOCB_SYNC) ? 0: BDP_ASYNC;
> -	ssize_t written = 0, ret;
> +	ssize_t written = 0, ret, ret2;
>  	loff_t i_size, pos = iocb->ki_pos, from, to;
>  	size_t max_chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;
>  	bool maybe_trouble = false;
> @@ -409,10 +409,12 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
>  
>  out:
>  	if (unlikely(wreq)) {
> -		ret = netfs_end_writethrough(wreq, iocb);
> +		ret2 = netfs_end_writethrough(wreq, iocb);
>  		wbc_detach_inode(&wbc);
> -		if (ret == -EIOCBQUEUED)
> -			return ret;
> +		if (ret2 == -EIOCBQUEUED)
> +			return ret2;
> +		if (ret == 0)
> +			ret = ret2;
>  	}
>  
>  	iocb->ki_pos += written;
> 

Should this be merged independently? It looks like a bug that's present
now.

Reviewed-by: Jeff Layton <jlayton@kernel.org>


* Re: [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache
  2024-03-28 16:33 [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache David Howells
                   ` (30 preceding siblings ...)
  2024-04-08 15:53 ` [PATCH 23/26] netfs: Cut over to using new writeback code David Howells
@ 2024-04-15 12:49 ` Jeff Layton
  31 siblings, 0 replies; 62+ messages in thread
From: Jeff Layton @ 2024-04-15 12:49 UTC (permalink / raw)
  To: David Howells, Christian Brauner, Gao Xiang, Dominique Martinet
  Cc: Paulo Alcantara, linux-cifs, Shyam Prasad N, linux-mm,
	linux-fsdevel, v9fs, ceph-devel, linux-nfs, Matthew Wilcox,
	linux-kernel, Tom Talpey, Steve French, linux-cachefs, netdev,
	Marc Dionne, netfs, Ilya Dryomov, Eric Van Hensbergen,
	linux-erofs, linux-afs

On Thu, 2024-03-28 at 16:33 +0000, David Howells wrote:
> Hi Christian, Willy,
> 
> The primary purpose of these patches is to rework the netfslib writeback
> implementation such that pages read from the cache are written to the cache
> through ->writepages(), thereby allowing the fscache page flag to be
> retired.
> 
> The reworking also:
> 
>  (1) builds on top of the new writeback_iter() infrastructure;
> 
>  (2) makes it possible to use vectored write RPCs as discontiguous streams
>      of pages can be accommodated;
> 
>  (3) makes it easier to do simultaneous content crypto and stream division.
> 
>  (4) provides support for retrying writes and re-dividing a stream;
> 
>  (5) replaces the ->launder_folio() op, so that ->writepages() is used
>      instead;
> 
>  (6) uses mempools to allocate the netfs_io_request and netfs_io_subrequest
>      structs to avoid allocation failure in the writeback path.
> 
> Some code that uses the fscache page flag is retained for compatibility
> purposes with nfs and ceph.  The code is switched to using the synonymous
> private_2 label instead and marked with deprecation comments.  I have a
> separate set of patches that convert cifs to use this code.
> 
> -~-
> 
> In this new implementation, writeback_iter() is used to pump folios,
> progressively creating two parallel, but separate streams.  Either or both
> streams can contain gaps, and the subrequests in each stream can be of
> variable size, don't need to align with each other and don't need to align
> with the folios.  (Note that more streams can be added if we have multiple
> servers to duplicate data to).
> 
> Indeed, subrequests can cross folio boundaries, may cover several folios or
> a folio may be spanned by multiple subrequests, e.g.:
> 
>          +---+---+-----+-----+---+----------+
> Folios:  |   |   |     |     |   |          |
>          +---+---+-----+-----+---+----------+
> 
>            +------+------+     +----+----+
> Upload:    |      |      |.....|    |    |
>            +------+------+     +----+----+
> 
>          +------+------+------+------+------+
> Cache:   |      |      |      |      |      |
>          +------+------+------+------+------+
> 
> Data that got read from the server that needs copying to the cache is
> stored in folios that are marked dirty and have folio->private set to a
> special value.
> 
> The progressive subrequest construction permits the algorithm to be
> preparing both the next upload to the server and the next write to the
> cache whilst the previous ones are already in progress.  Throttling can be
> applied to control the rate of production of subrequests - and, in any
> case, we probably want to write them to the server in ascending order,
> particularly if the file will be extended.
> 
> Content crypto can also be prepared at the same time as the subrequests and
> run asynchronously, with the prepped requests being stalled until the
> crypto catches up with them.  This might also be useful for transport
> crypto, but that happens at a lower layer, so probably would be harder to
> pull off.
> 
> The algorithm is split into three parts:
> 
>  (1) The issuer.  This walks through the data, packaging it up, encrypting
>      it and creating subrequests.  The part of this that generates
>      subrequests only deals with file positions and spans and so is usable
>      for DIO/unbuffered writes as well as buffered writes.
> 
>  (2) The collector.  This asynchronously collects completed subrequests,
>      unlocks folios, frees crypto buffers and performs any retries.  This
>      runs in a work queue so that the issuer can return to the caller for
>      writeback (so that the VM can have its kswapd thread back) or async
>      writes.
> 
>      Collection is slightly complex as the collector has to work out where
>      discontiguities happen in the folio list so that it doesn't try and
>      collect folios that weren't included in the write out.
> 
>  (3) The retryer.  This pauses the issuer, waits for all outstanding
>      subrequests to complete and then goes through the failed subrequests
>      to reissue them.  This may involve reprepping them (with cifs, the
>      credits must be renegotiated and a subrequest may need splitting), and
>      doing RMW for content crypto if there's a conflicting change on the
>      server.
> 
> David
> 
> David Howells (26):
>   cifs: Fix duplicate fscache cookie warnings
>   9p: Clean up some kdoc and unused var warnings.
>   netfs: Update i_blocks when write committed to pagecache
>   netfs: Replace PG_fscache by setting folio->private and marking dirty
>   mm: Remove the PG_fscache alias for PG_private_2
>   netfs: Remove deprecated use of PG_private_2 as a second writeback
>     flag
>   netfs: Make netfs_io_request::subreq_counter an atomic_t
>   netfs: Use subreq_counter to allocate subreq debug_index values
>   mm: Provide a means of invalidation without using launder_folio
>   cifs: Use alternative invalidation to using launder_folio
>   9p: Use alternative invalidation to using launder_folio
>   afs: Use alternative invalidation to using launder_folio
>   netfs: Remove ->launder_folio() support
>   netfs: Use mempools for allocating requests and subrequests
>   mm: Export writeback_iter()
>   netfs: Switch to using unsigned long long rather than loff_t
>   netfs: Fix writethrough-mode error handling
>   netfs: Add some write-side stats and clean up some stat names
>   netfs: New writeback implementation
>   netfs, afs: Implement helpers for new write code
>   netfs, 9p: Implement helpers for new write code
>   netfs, cachefiles: Implement helpers for new write code
>   netfs: Cut over to using new writeback code
>   netfs: Remove the old writeback code
>   netfs: Miscellaneous tidy ups
>   netfs, afs: Use writeback retry to deal with alternate keys
> 
>  fs/9p/vfs_addr.c             |  60 +--
>  fs/9p/vfs_inode_dotl.c       |   4 -
>  fs/afs/file.c                |   8 +-
>  fs/afs/internal.h            |   6 +-
>  fs/afs/validation.c          |   4 +-
>  fs/afs/write.c               | 187 ++++----
>  fs/cachefiles/io.c           |  75 +++-
>  fs/ceph/addr.c               |  24 +-
>  fs/ceph/inode.c              |   2 +
>  fs/netfs/Makefile            |   3 +-
>  fs/netfs/buffered_read.c     |  40 +-
>  fs/netfs/buffered_write.c    | 832 ++++-------------------------------
>  fs/netfs/direct_write.c      |  30 +-
>  fs/netfs/fscache_io.c        |  14 +-
>  fs/netfs/internal.h          |  55 ++-
>  fs/netfs/io.c                | 155 +------
>  fs/netfs/main.c              |  55 ++-
>  fs/netfs/misc.c              |  10 +-
>  fs/netfs/objects.c           |  81 +++-
>  fs/netfs/output.c            | 478 --------------------
>  fs/netfs/stats.c             |  17 +-
>  fs/netfs/write_collect.c     | 813 ++++++++++++++++++++++++++++++++++
>  fs/netfs/write_issue.c       | 673 ++++++++++++++++++++++++++++
>  fs/nfs/file.c                |   8 +-
>  fs/nfs/fscache.h             |   6 +-
>  fs/nfs/write.c               |   4 +-
>  fs/smb/client/cifsfs.h       |   1 -
>  fs/smb/client/file.c         | 136 +-----
>  fs/smb/client/fscache.c      |  16 +-
>  fs/smb/client/inode.c        |  27 +-
>  include/linux/fscache.h      |  22 +-
>  include/linux/netfs.h        | 196 +++++----
>  include/linux/pagemap.h      |   1 +
>  include/net/9p/client.h      |   2 +
>  include/trace/events/netfs.h | 249 ++++++++++-
>  mm/filemap.c                 |  52 ++-
>  mm/page-writeback.c          |   1 +
>  net/9p/Kconfig               |   1 +
>  net/9p/client.c              |  49 +++
>  net/9p/trans_fd.c            |   1 -
>  40 files changed, 2492 insertions(+), 1906 deletions(-)
>  delete mode 100644 fs/netfs/output.c
>  create mode 100644 fs/netfs/write_collect.c
>  create mode 100644 fs/netfs/write_issue.c
> 

This all looks pretty reasonable. There is at least one bugfix that
looks like it ought to go in independently (#17). #19 is huge, complex
and hard to review. That will need some cycles in -next, I think. In any
case, on any patches where I didn't send comments, you can add:

    Reviewed-by: Jeff Layton <jlayton@kernel.org>


* Re: [PATCH 01/26] cifs: Fix duplicate fscache cookie warnings
  2024-03-28 16:33 ` [PATCH 01/26] cifs: Fix duplicate fscache cookie warnings David Howells
  2024-04-15 11:25   ` Jeff Layton
@ 2024-04-15 13:03   ` David Howells
  2024-04-15 22:51     ` Steve French
  2024-04-16 22:40     ` David Howells
  1 sibling, 2 replies; 62+ messages in thread
From: David Howells @ 2024-04-15 13:03 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Dominique Martinet, dhowells, linux-mm, Marc Dionne, linux-afs,
	Paulo Alcantara, linux-cifs, Matthew Wilcox, Steve French,
	linux-cachefs, Gao Xiang, Ilya Dryomov, Shyam Prasad N,
	Shyam Prasad N, Tom Talpey, ceph-devel, Eric Van Hensbergen,
	Christian Brauner, linux-nfs, Rohith Surabattula, netdev, v9fs,
	linux-kernel, Steve French, linux-fsdevel, netfs, linux-erofs

Jeff Layton <jlayton@kernel.org> wrote:

> > +struct cifs_fscache_inode_key {
> > +
> > +	__le64  uniqueid;	/* server inode number */
> > +	__le64  createtime;	/* creation time on server */
> > +	u8	type;		/* S_IFMT file type */
> > +} __packed;
> > +
> 
> Interesting. So the uniqueid of the inode is not unique within the fs?
> Or are the clients mounting shares that span multiple filesystems?
> Or, are we looking at a situation where the uniqueid is being quickly
> reused for new inodes after the original inode is unlinked?

The problem is that it's not unique over time.  creat(); unlink(); creat();
may yield a repeat of the uniqueid.  It's like i_ino in that respect.
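
To make that concrete, here is a minimal sketch (illustrative only, not code
from the patch) of why keying the cookie on the (uniqueid, createtime) pair
still tells the two files apart when the server hands out the same uniqueid
again:

    #include <linux/types.h>

    /* Sketch: two files that recycled the same server inode number still
     * produce different cookie keys because their creation times differ.
     */
    struct example_inode_key {
            __le64 uniqueid;        /* server inode number - may be reused */
            __le64 createtime;      /* differs for the recreated file */
            u8     type;            /* S_IFMT file type */
    };

    static bool example_key_matches(const struct example_inode_key *a,
                                    const struct example_inode_key *b)
    {
            return a->uniqueid == b->uniqueid &&
                   a->createtime == b->createtime &&
                   a->type == b->type;
    }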

David



* Re: [PATCH 01/26] cifs: Fix duplicate fscache cookie warnings
  2024-04-15 13:03   ` David Howells
@ 2024-04-15 22:51     ` Steve French
  2024-04-16 22:40     ` David Howells
  1 sibling, 0 replies; 62+ messages in thread
From: Steve French @ 2024-04-15 22:51 UTC (permalink / raw)
  To: David Howells
  Cc: Dominique Martinet, linux-mm, Marc Dionne, linux-afs,
	Paulo Alcantara, linux-cifs, Matthew Wilcox, linux-cachefs,
	Gao Xiang, Ilya Dryomov, Shyam Prasad N, Shyam Prasad N,
	Tom Talpey, ceph-devel, Eric Van Hensbergen, Christian Brauner,
	linux-nfs, Rohith Surabattula, netdev, v9fs, Jeff Layton,
	linux-kernel, Steve French, linux-fsdevel, netfs, linux-erofs

Should this be merged independently (and sooner? in rc5?)

On Mon, Apr 15, 2024 at 8:04 AM David Howells <dhowells@redhat.com> wrote:
>
> Jeff Layton <jlayton@kernel.org> wrote:
>
> > > +struct cifs_fscache_inode_key {
> > > +
> > > +   __le64  uniqueid;       /* server inode number */
> > > +   __le64  createtime;     /* creation time on server */
> > > +   u8      type;           /* S_IFMT file type */
> > > +} __packed;
> > > +
> >
> > Interesting. So the uniqueid of the inode is not unique within the fs?
> > Or are the clients mounting shares that span multiple filesystems?
> > Or, are we looking at a situation where the uniqueid is being quickly
> > reused for new inodes after the original inode is unlinked?
>
> The problem is that it's not unique over time.  creat(); unlink(); creat();
> may yield a repeat of the uniqueid.  It's like i_ino in that respect.
>
> David
>


-- 
Thanks,

Steve


* Re: [PATCH 01/26] cifs: Fix duplicate fscache cookie warnings
  2024-04-15 13:03   ` David Howells
  2024-04-15 22:51     ` Steve French
@ 2024-04-16 22:40     ` David Howells
  1 sibling, 0 replies; 62+ messages in thread
From: David Howells @ 2024-04-16 22:40 UTC (permalink / raw)
  To: Steve French
  Cc: Dominique Martinet, dhowells, linux-mm, Marc Dionne, linux-afs,
	Paulo Alcantara, linux-cifs, Matthew Wilcox, linux-cachefs,
	Gao Xiang, Ilya Dryomov, Shyam Prasad N, Shyam Prasad N,
	Tom Talpey, ceph-devel, Eric Van Hensbergen, Christian Brauner,
	linux-nfs, Rohith Surabattula, netdev, v9fs, Jeff Layton,
	linux-kernel, Steve French, linux-fsdevel, netfs, linux-erofs

Steve French <smfrench@gmail.com> wrote:

> Should this be merged independently (and sooner? in rc5?)

It's already upstream through the cifs tree and I've dropped it from my
branch.

David



* Re: [PATCH 03/26] netfs: Update i_blocks when write committed to pagecache
  2024-03-28 16:33 ` [PATCH 03/26] netfs: Update i_blocks when write committed to pagecache David Howells
  2024-04-15 11:28   ` Jeff Layton
@ 2024-04-16 22:47   ` David Howells
  1 sibling, 0 replies; 62+ messages in thread
From: David Howells @ 2024-04-16 22:47 UTC (permalink / raw)
  To: Jeff Layton, Steve French
  Cc: Dominique Martinet, dhowells, linux-mm, Marc Dionne, linux-afs,
	Paulo Alcantara, linux-cifs, Matthew Wilcox, linux-cachefs,
	Gao Xiang, Ilya Dryomov, Shyam Prasad N, Shyam Prasad N,
	Tom Talpey, ceph-devel, Eric Van Hensbergen, Christian Brauner,
	linux-nfs, Rohith Surabattula, netdev, v9fs, linux-kernel,
	Steve French, linux-fsdevel, netfs, linux-erofs

Jeff Layton <jlayton@kernel.org> wrote:

> > Update i_blocks when i_size is updated when we finish making a write to the
> > pagecache to reflect the amount of space we think will be consumed.
> > 
> 
> Umm ok, but why? I get that the i_size and i_blocks would be out of sync
> until we get back new attrs from the server, but is that a problem? I'm
> mainly curious as to what's paying attention to the i_blocks during this
> window.

This is taking over from a cifs patch that does the same thing - but in code
that is removed by my cifs-netfs branch, so I should probably let Steve speak
to that.  I think the problem with cifs is that these fields aren't properly
updated until the close occurs and the server is consulted.

    commit dbfdff402d89854126658376cbcb08363194d3cd
    Author: Steve French <stfrench@microsoft.com>
    Date:   Thu Feb 22 00:26:52 2024 -0600

    smb3: update allocation size more accurately on write completion

    Changes to allocation size are approximated for extending writes of cached
    files until the server returns the actual value (on SMB3 close or query info
    for example), but it was setting the estimated value for number of blocks
    to larger than the file size even if the file is likely sparse which
    breaks various xfstests (e.g. generic/129, 130, 221, 228).
    
    When i_size and i_blocks are updated in write completion do not increase
    allocation size more than what was written (rounded up to 512 bytes).
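
As a rough illustration of the approximation that commit describes (a sketch
only, not code from either patch): the block count grows by the newly written
span rounded up to 512-byte units, rather than by the whole allocation:

    #include <linux/kernel.h>       /* DIV_ROUND_UP */
    #include <linux/types.h>

    /* Sketch: estimate i_blocks after a cached write adds @written bytes,
     * counted in the 512-byte units that stat() reports.
     */
    static inline u64 estimate_i_blocks(u64 old_blocks, size_t written)
    {
            return old_blocks + DIV_ROUND_UP(written, 512);
    }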

David



* Re: [PATCH 11/26] 9p: Use alternative invalidation to using launder_folio
  2024-03-28 16:34 ` [PATCH 11/26] 9p: " David Howells
  2024-04-15 11:43   ` Jeff Layton
@ 2024-04-16 23:03   ` David Howells
  1 sibling, 0 replies; 62+ messages in thread
From: David Howells @ 2024-04-16 23:03 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck,
	dhowells, linux-mm, Marc Dionne, linux-afs, Paulo Alcantara,
	linux-cifs, Matthew Wilcox, Steve French, linux-cachefs,
	Gao Xiang, Ilya Dryomov, Shyam Prasad N, Tom Talpey, ceph-devel,
	Eric Van Hensbergen, Christian Brauner, linux-nfs, netdev, v9fs,
	linux-kernel, linux-fsdevel, netfs, linux-erofs

Jeff Layton <jlayton@kernel.org> wrote:

> Shouldn't this include a call to filemap_invalidate_inode? Is just
> removing launder_folio enough to do this?

Good point.  netfs_unbuffered_write_iter() calls kiocb_invalidate_pages() -
which uses invalidate_inode_pages2_range() to discard the pagecache.  It
should probably use filemap_invalidate_inode() instead.
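
For illustration, a sketch of the direction being suggested; it assumes the
filemap_invalidate_inode() helper added earlier in this series takes the
inode, a flush flag and a byte range, which may not match the final
signature:

    #include <linux/fs.h>
    #include <linux/pagemap.h>

    /* Sketch only: drop the pagecache over the span of an unbuffered write
     * with the new helper rather than invalidate_inode_pages2_range().  The
     * filemap_invalidate_inode() signature used here is an assumption.
     */
    static int sketch_invalidate_for_dio(struct inode *inode, loff_t pos,
                                         size_t len)
    {
            return filemap_invalidate_inode(inode, true /* flush dirty data */,
                                            pos, pos + len - 1);
    }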

David



* Re: [PATCH 17/26] netfs: Fix writethrough-mode error handling
  2024-03-28 16:34 ` [PATCH 17/26] netfs: Fix writethrough-mode error handling David Howells
  2024-04-15 12:40   ` Jeff Layton
@ 2024-04-17  9:04   ` David Howells
  1 sibling, 0 replies; 62+ messages in thread
From: David Howells @ 2024-04-17  9:04 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Dominique Martinet, dhowells, linux-mm, Marc Dionne, linux-afs,
	Paulo Alcantara, linux-cifs, Matthew Wilcox, Steve French,
	linux-cachefs, Gao Xiang, Ilya Dryomov, Shyam Prasad N,
	Tom Talpey, ceph-devel, Eric Van Hensbergen, Christian Brauner,
	linux-nfs, netdev, v9fs, linux-kernel, linux-fsdevel, netfs,
	linux-erofs

Jeff Layton <jlayton@kernel.org> wrote:

> Should this be merged independently? It looks like a bug that's present
> now.

Yes.  I've just posted that as a separate fix for Christian to pick up.  I
still need to keep it in this set, though, until it is upstream.

David



* Re: [PATCH 24/26] netfs: Remove the old writeback code
  2024-03-28 16:34 ` [PATCH 24/26] netfs: Remove the old " David Howells
  2024-04-15 12:20   ` Jeff Layton
@ 2024-04-17 10:36   ` David Howells
  1 sibling, 0 replies; 62+ messages in thread
From: David Howells @ 2024-04-17 10:36 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck,
	dhowells, linux-mm, Marc Dionne, linux-afs, Paulo Alcantara,
	linux-cifs, Matthew Wilcox, Steve French, linux-cachefs,
	Gao Xiang, Ilya Dryomov, Shyam Prasad N, Tom Talpey, ceph-devel,
	Eric Van Hensbergen, Christian Brauner, linux-nfs, netdev, v9fs,
	linux-kernel, linux-fsdevel, netfs, linux-erofs

Jeff Layton <jlayton@kernel.org> wrote:

> #23 and #24 should probably be merged. I don't see any reason to do the
> two-step of ifdef'ing out the code and then removing it. Just go for it
> at this point in the series.

I would prefer to keep the ~500 line patch that's rearranging the plumbing
separate from the ~1200 line patch that just deletes a load of lines to make
the cutover patch easier to review.  I guess that comes down to a matter of
preference.

David


