linux-cifs.vger.kernel.org archive mirror
* [RFC][RFC PATCH 0/7] cifs: In-progress conversion to use iov_iters and netfslib
@ 2022-01-25 13:57 David Howells
  2022-01-25 13:57 ` [RFC PATCH 1/7] cifs: Transition from ->readpages() to ->readahead() David Howells
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: David Howells @ 2022-01-25 13:57 UTC (permalink / raw)
  To: smfrench, nspmangalore
  Cc: Matthew Wilcox, linux-cachefs, Jeff Layton, linux-cifs, dhowells,
	jlayton, linux-cifs, linux-cachefs, linux-fsdevel


Hi Steve,

Okay, I've had a go at crudely splitting up my conversion of cifs to use
netfslib into separate patches and I thought I'd post it for you and Shyam
to have a look over:

 (1) The conversion from ->readpages() to ->readahead().

 (2) A patch that does some random miscellaneous bits.

 (3) Change the I/O paths to use an iterator all the way to the socket
     instead of a page list.  Note that cifs won't compile from this patch
     until patch 6.

 (4) Replace cifs's writepages implementation with the one from afs and
     make it deal with variable rsize and stuff like that.  This sets up
     iterators rather than page lists.

     This also makes direct/unbuffered write use an iterator.  This
     probably requires more massaging to make it handle credits.

 (5) Modify cifs_readahead() to hand an iterator down.

 (6) Make direct and unbuffered reads hand an iterator down.  Note that the
     iterator refers to the original buffers and bounce pages aren't used.

 (7) Make cifs use netfslib for reading.

As stated, patches 3, 4 and 5 don't compile because the pagelist struct
members disappeared to make way for the iterators.  This avoids duplicating
various functions in the transport and transport security code.  I'm not
sure how best to deal with this - maybe by setting up bvecs instead of
pagelists at the top level and then I can hand a bvec-class iter down.
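
To illustrate the bvec idea mentioned above, here is a minimal userspace
sketch, not kernel code.  It assumes a 4096-byte page and uses a hypothetical
struct sketch_bvec that only mimics the shape of the kernel's struct bio_vec
(page, offset, length).  It shows how a byte range that starts partway into
its first page can be described as a flat array of segments, which is the
information the old rq_offset/rq_pagesz/rq_tailsz fields carried.  All names
prefixed with sketch_ are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for the example */

/* Hypothetical stand-in for a bio_vec: which page, where in it, how much. */
struct sketch_bvec {
	size_t page;	/* index of the page within the request */
	size_t offset;	/* byte offset into that page */
	size_t len;	/* number of bytes used in that page */
};

/*
 * Describe `total` bytes starting `first_offset` bytes into the first page
 * as an array of segments.  The caller must pass first_offset smaller than
 * SKETCH_PAGE_SIZE.  Returns the number of segments written, at most `max`.
 */
static size_t sketch_build_bvecs(size_t first_offset, size_t total,
				 struct sketch_bvec *bv, size_t max)
{
	size_t n = 0, off = first_offset;

	while (total && n < max) {
		size_t part = SKETCH_PAGE_SIZE - off;

		if (part > total)
			part = total;	/* short final segment */
		bv[n].page = n;
		bv[n].offset = off;
		bv[n].len = part;
		total -= part;
		off = 0;		/* later pages start at byte 0 */
		n++;
	}
	return n;
}

/* Sum the segment lengths, i.e. what an iterator over them would count. */
static size_t sketch_bvec_total(const struct sketch_bvec *bv, size_t n)
{
	size_t sum = 0, i;

	for (i = 0; i < n; i++)
		sum += bv[i].len;
	return sum;
}
```

With an array like this built once at the top level, the lower layers would
only need the array and a byte count rather than separate page-size and
tail-size fields.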

The patches can also be found here.  Note that this requires some of the
patches from my netfs-lib branch.

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=cifs-experimental

David
---
David Howells (7):
      cifs: Transition from ->readpages() to ->readahead()
      cifs: Miscellaneous bits
      cifs: Change the I/O paths to use an iterator rather than a page list
      cifs: Make cifs_writepages() hand an iterator down
      cifs: Make cifs_readahead() pass an iterator down
      cifs: Get direct I/O and unbuffered I/O working with iterators
      cifs: Use netfslib to handle reads


 fs/cifs/Kconfig        |    1 +
 fs/cifs/cifsencrypt.c  |   40 +-
 fs/cifs/cifsfs.c       |    8 +-
 fs/cifs/cifsfs.h       |    6 +-
 fs/cifs/cifsglob.h     |   34 +-
 fs/cifs/cifsproto.h    |   11 +-
 fs/cifs/cifssmb.c      |  233 +++--
 fs/cifs/connect.c      |   18 +-
 fs/cifs/file.c         | 1930 ++++++++++------------------------------
 fs/cifs/fscache.c      |   31 -
 fs/cifs/fscache.h      |   52 --
 fs/cifs/inode.c        |   17 +-
 fs/cifs/misc.c         |  109 ---
 fs/cifs/smb2ops.c      |  365 ++++----
 fs/cifs/smb2pdu.c      |   27 +-
 fs/cifs/transport.c    |   37 +-
 fs/netfs/read_helper.c |    7 +-
 17 files changed, 888 insertions(+), 2038 deletions(-)




* [RFC PATCH 1/7] cifs: Transition from ->readpages() to ->readahead()
  2022-01-25 13:57 [RFC][RFC PATCH 0/7] cifs: In-progress conversion to use iov_iters and netfslib David Howells
@ 2022-01-25 13:57 ` David Howells
  2022-01-25 14:20   ` Matthew Wilcox
  2022-01-25 14:57   ` David Howells
  2022-01-25 13:57 ` [RFC PATCH 2/7] cifs: Miscellaneous bits David Howells
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 20+ messages in thread
From: David Howells @ 2022-01-25 13:57 UTC (permalink / raw)
  To: smfrench, nspmangalore
  Cc: Matthew Wilcox, Jeff Layton, linux-cifs, linux-cachefs, dhowells,
	jlayton, linux-cifs, linux-cachefs, linux-fsdevel

Transition the cifs filesystem from using the old ->readpages() method to
using the new ->readahead() method.

For the moment, this removes any invocation of fscache to read data from
the local cache, leaving that to another patch.
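
The new cifs_readahead() chops the readahead window into read requests of at
most rsize bytes.  The arithmetic can be sketched in userspace as follows.
This is an illustration only, assuming a 4096-byte page; the sketch_ names are
hypothetical and do not exist in the kernel.  It mirrors the
min_t(size_t, rsize / PAGE_SIZE, readahead_count(ractl)) calculation in the
patch and the early exit taken when rsize cannot hold a whole page.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for the example */

/*
 * Number of pages to put in the next read: as many whole pages as fit in
 * rsize, capped by the number of pages still left in the window.
 */
static size_t sketch_next_batch(size_t rsize, size_t pages_left)
{
	size_t fit = rsize / SKETCH_PAGE_SIZE;

	return fit < pages_left ? fit : pages_left;
}

/*
 * Count the read requests needed to cover a window of `window_pages` pages.
 * Returns 0 if the window is empty or if rsize is smaller than one page, in
 * which case the loop gives up just as the patch does when nr_pages is 0.
 */
static size_t sketch_count_reads(size_t rsize, size_t window_pages)
{
	size_t reads = 0;

	while (window_pages) {
		size_t n = sketch_next_batch(rsize, window_pages);

		if (!n)
			return 0;
		window_pages -= n;
		reads++;
	}
	return reads;
}
```

For example, with a 64KiB rsize a 33-page window splits into two 16-page
reads followed by a single-page read.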

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <smfrench@gmail.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-cachefs@redhat.com
---

 fs/cifs/file.c |  169 +++++++++++---------------------------------------------
 1 file changed, 33 insertions(+), 136 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 015fd415e5ee..1cce7e5b2334 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -4269,8 +4269,6 @@ cifs_readv_complete(struct work_struct *work)
 	for (i = 0; i < rdata->nr_pages; i++) {
 		struct page *page = rdata->pages[i];
 
-		lru_cache_add(page);
-
 		if (rdata->result == 0 ||
 		    (rdata->result == -EAGAIN && got_bytes)) {
 			flush_dcache_page(page);
@@ -4340,7 +4338,6 @@ readpages_fill_pages(struct TCP_Server_Info *server,
 			 * fill them until the writes are flushed.
 			 */
 			zero_user(page, 0, PAGE_SIZE);
-			lru_cache_add(page);
 			flush_dcache_page(page);
 			SetPageUptodate(page);
 			unlock_page(page);
@@ -4350,7 +4347,6 @@ readpages_fill_pages(struct TCP_Server_Info *server,
 			continue;
 		} else {
 			/* no need to hold page hostage */
-			lru_cache_add(page);
 			unlock_page(page);
 			put_page(page);
 			rdata->pages[i] = NULL;
@@ -4393,92 +4389,16 @@ cifs_readpages_copy_into_pages(struct TCP_Server_Info *server,
 	return readpages_fill_pages(server, rdata, iter, iter->count);
 }
 
-static int
-readpages_get_pages(struct address_space *mapping, struct list_head *page_list,
-		    unsigned int rsize, struct list_head *tmplist,
-		    unsigned int *nr_pages, loff_t *offset, unsigned int *bytes)
-{
-	struct page *page, *tpage;
-	unsigned int expected_index;
-	int rc;
-	gfp_t gfp = readahead_gfp_mask(mapping);
-
-	INIT_LIST_HEAD(tmplist);
-
-	page = lru_to_page(page_list);
-
-	/*
-	 * Lock the page and put it in the cache. Since no one else
-	 * should have access to this page, we're safe to simply set
-	 * PG_locked without checking it first.
-	 */
-	__SetPageLocked(page);
-	rc = add_to_page_cache_locked(page, mapping,
-				      page->index, gfp);
-
-	/* give up if we can't stick it in the cache */
-	if (rc) {
-		__ClearPageLocked(page);
-		return rc;
-	}
-
-	/* move first page to the tmplist */
-	*offset = (loff_t)page->index << PAGE_SHIFT;
-	*bytes = PAGE_SIZE;
-	*nr_pages = 1;
-	list_move_tail(&page->lru, tmplist);
-
-	/* now try and add more pages onto the request */
-	expected_index = page->index + 1;
-	list_for_each_entry_safe_reverse(page, tpage, page_list, lru) {
-		/* discontinuity ? */
-		if (page->index != expected_index)
-			break;
-
-		/* would this page push the read over the rsize? */
-		if (*bytes + PAGE_SIZE > rsize)
-			break;
-
-		__SetPageLocked(page);
-		rc = add_to_page_cache_locked(page, mapping, page->index, gfp);
-		if (rc) {
-			__ClearPageLocked(page);
-			break;
-		}
-		list_move_tail(&page->lru, tmplist);
-		(*bytes) += PAGE_SIZE;
-		expected_index++;
-		(*nr_pages)++;
-	}
-	return rc;
-}
-
-static int cifs_readpages(struct file *file, struct address_space *mapping,
-	struct list_head *page_list, unsigned num_pages)
+static void cifs_readahead(struct readahead_control *ractl)
 {
 	int rc;
-	int err = 0;
-	struct list_head tmplist;
-	struct cifsFileInfo *open_file = file->private_data;
-	struct cifs_sb_info *cifs_sb = CIFS_FILE_SB(file);
+	struct cifsFileInfo *open_file = ractl->file->private_data;
+	struct cifs_sb_info *cifs_sb = CIFS_FILE_SB(ractl->file);
 	struct TCP_Server_Info *server;
 	pid_t pid;
 	unsigned int xid;
 
 	xid = get_xid();
-	/*
-	 * Reads as many pages as possible from fscache. Returns -ENOBUFS
-	 * immediately if the cookie is negative
-	 *
-	 * After this point, every page in the list might have PG_fscache set,
-	 * so we will need to clean that up off of every page we don't use.
-	 */
-	rc = cifs_readpages_from_fscache(mapping->host, mapping, page_list,
-					 &num_pages);
-	if (rc == 0) {
-		free_xid(xid);
-		return rc;
-	}
 
 	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
 		pid = open_file->pid;
@@ -4489,39 +4409,32 @@ static int cifs_readpages(struct file *file, struct address_space *mapping,
 	server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses);
 
 	cifs_dbg(FYI, "%s: file=%p mapping=%p num_pages=%u\n",
-		 __func__, file, mapping, num_pages);
+		 __func__, ractl->file, ractl->mapping, readahead_count(ractl));
 
 	/*
-	 * Start with the page at end of list and move it to private
-	 * list. Do the same with any following pages until we hit
-	 * the rsize limit, hit an index discontinuity, or run out of
-	 * pages. Issue the async read and then start the loop again
-	 * until the list is empty.
-	 *
-	 * Note that list order is important. The page_list is in
-	 * the order of declining indexes. When we put the pages in
-	 * the rdata->pages, then we want them in increasing order.
+	 * Chop the readahead request up into rsize-sized read requests.
 	 */
-	while (!list_empty(page_list) && !err) {
-		unsigned int i, nr_pages, bytes, rsize;
-		loff_t offset;
-		struct page *page, *tpage;
+	while (readahead_count(ractl) - ractl->_batch_count) {
+		unsigned int i, nr_pages, got, rsize;
+		struct page *page;
 		struct cifs_readdata *rdata;
 		struct cifs_credits credits_on_stack;
 		struct cifs_credits *credits = &credits_on_stack;
 
 		if (open_file->invalidHandle) {
 			rc = cifs_reopen_file(open_file, true);
-			if (rc == -EAGAIN)
-				continue;
-			else if (rc)
+			if (rc) {
+				if (rc == -EAGAIN)
+					continue;
 				break;
+			}
 		}
 
 		rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->rsize,
 						   &rsize, credits);
 		if (rc)
 			break;
+		nr_pages = min_t(size_t, rsize / PAGE_SIZE, readahead_count(ractl));
 
 		/*
 		 * Give up immediately if rsize is too small to read an entire
@@ -4529,16 +4442,7 @@ static int cifs_readpages(struct file *file, struct address_space *mapping,
 		 * reach this point however since we set ra_pages to 0 when the
 		 * rsize is smaller than a cache page.
 		 */
-		if (unlikely(rsize < PAGE_SIZE)) {
-			add_credits_and_wake_if(server, credits, 0);
-			free_xid(xid);
-			return 0;
-		}
-
-		nr_pages = 0;
-		err = readpages_get_pages(mapping, page_list, rsize, &tmplist,
-					 &nr_pages, &offset, &bytes);
-		if (!nr_pages) {
+		if (unlikely(!nr_pages)) {
 			add_credits_and_wake_if(server, credits, 0);
 			break;
 		}
@@ -4546,36 +4450,31 @@ static int cifs_readpages(struct file *file, struct address_space *mapping,
 		rdata = cifs_readdata_alloc(nr_pages, cifs_readv_complete);
 		if (!rdata) {
 			/* best to give up if we're out of mem */
-			list_for_each_entry_safe(page, tpage, &tmplist, lru) {
-				list_del(&page->lru);
-				lru_cache_add(page);
-				unlock_page(page);
-				put_page(page);
-			}
-			rc = -ENOMEM;
 			add_credits_and_wake_if(server, credits, 0);
 			break;
 		}
 
-		rdata->cfile = cifsFileInfo_get(open_file);
-		rdata->server = server;
-		rdata->mapping = mapping;
-		rdata->offset = offset;
-		rdata->bytes = bytes;
-		rdata->pid = pid;
-		rdata->pagesz = PAGE_SIZE;
-		rdata->tailsz = PAGE_SIZE;
+		got = __readahead_batch(ractl, rdata->pages, nr_pages);
+		if (got != nr_pages) {
+			pr_warn("__readahead_batch() returned %u/%u\n",
+				got, nr_pages);
+			nr_pages = got;
+		}
+
+		rdata->nr_pages = nr_pages;
+		rdata->bytes	= readahead_batch_length(ractl);
+		rdata->cfile	= cifsFileInfo_get(open_file);
+		rdata->server	= server;
+		rdata->mapping	= ractl->mapping;
+		rdata->offset	= readahead_pos(ractl);
+		rdata->pid	= pid;
+		rdata->pagesz	= PAGE_SIZE;
+		rdata->tailsz	= PAGE_SIZE;
 		rdata->read_into_pages = cifs_readpages_read_into_pages;
 		rdata->copy_into_pages = cifs_readpages_copy_into_pages;
-		rdata->credits = credits_on_stack;
-
-		list_for_each_entry_safe(page, tpage, &tmplist, lru) {
-			list_del(&page->lru);
-			rdata->pages[rdata->nr_pages++] = page;
-		}
+		rdata->credits	= credits_on_stack;
 
 		rc = adjust_credits(server, &rdata->credits, rdata->bytes);
-
 		if (!rc) {
 			if (rdata->cfile->invalidHandle)
 				rc = -EAGAIN;
@@ -4587,7 +4486,6 @@ static int cifs_readpages(struct file *file, struct address_space *mapping,
 			add_credits_and_wake_if(server, &rdata->credits, 0);
 			for (i = 0; i < rdata->nr_pages; i++) {
 				page = rdata->pages[i];
-				lru_cache_add(page);
 				unlock_page(page);
 				put_page(page);
 			}
@@ -4600,7 +4498,6 @@ static int cifs_readpages(struct file *file, struct address_space *mapping,
 	}
 
 	free_xid(xid);
-	return rc;
 }
 
 /*
@@ -4905,7 +4802,7 @@ void cifs_oplock_break(struct work_struct *work)
  * In the non-cached mode (mount with cache=none), we shunt off direct read and write requests
  * so this method should never be called.
  *
- * Direct IO is not yet supported in the cached mode. 
+ * Direct IO is not yet supported in the cached mode.
  */
 static ssize_t
 cifs_direct_io(struct kiocb *iocb, struct iov_iter *iter)
@@ -4987,7 +4884,7 @@ static int cifs_set_page_dirty(struct page *page)
 
 const struct address_space_operations cifs_addr_ops = {
 	.readpage = cifs_readpage,
-	.readpages = cifs_readpages,
+	.readahead = cifs_readahead,
 	.writepage = cifs_writepage,
 	.writepages = cifs_writepages,
 	.write_begin = cifs_write_begin,




* [RFC PATCH 2/7] cifs: Miscellaneous bits
  2022-01-25 13:57 [RFC][RFC PATCH 0/7] cifs: In-progress conversion to use iov_iters and netfslib David Howells
  2022-01-25 13:57 ` [RFC PATCH 1/7] cifs: Transition from ->readpages() to ->readahead() David Howells
@ 2022-01-25 13:57 ` David Howells
  2022-01-25 13:57 ` [RFC PATCH 3/7] cifs: Change the I/O paths to use an iterator rather than a page list David Howells
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: David Howells @ 2022-01-25 13:57 UTC (permalink / raw)
  To: smfrench, nspmangalore
  Cc: dhowells, jlayton, linux-cifs, linux-cachefs, linux-fsdevel


---

 fs/cifs/connect.c |    2 +-
 fs/cifs/file.c    |    8 +++++++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 11a22a30ee14..ed210d774a21 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -162,7 +162,7 @@ static void cifs_resolve_server(struct work_struct *work)
 	mutex_unlock(&server->srv_mutex);
 }
 
-/**
+/*
  * Mark all sessions and tcons for reconnect.
  *
  * @server needs to be previously set to CifsNeedReconnect.
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 1cce7e5b2334..24722fe75def 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -4205,13 +4205,19 @@ cifs_page_mkwrite(struct vm_fault *vmf)
 {
 	struct page *page = vmf->page;
 
+	/* Wait for the page to be written to the cache before we allow it to
+	 * be modified.  We then assume the entire page will need writing back.
+	 */
 #ifdef CONFIG_CIFS_FSCACHE
 	if (PageFsCache(page) &&
 	    wait_on_page_fscache_killable(page) < 0)
 		return VM_FAULT_RETRY;
 #endif
 
-	lock_page(page);
+	wait_on_page_writeback(page);
+
+	if (lock_page_killable(page) < 0)
+		return VM_FAULT_RETRY;
 	return VM_FAULT_LOCKED;
 }
 




* [RFC PATCH 3/7] cifs: Change the I/O paths to use an iterator rather than a page list
  2022-01-25 13:57 [RFC][RFC PATCH 0/7] cifs: In-progress conversion to use iov_iters and netfslib David Howells
  2022-01-25 13:57 ` [RFC PATCH 1/7] cifs: Transition from ->readpages() to ->readahead() David Howells
  2022-01-25 13:57 ` [RFC PATCH 2/7] cifs: Miscellaneous bits David Howells
@ 2022-01-25 13:57 ` David Howells
  2022-01-31  5:06   ` Rohith Surabattula
  2022-02-14 16:06   ` David Howells
  2022-01-25 13:58 ` [RFC PATCH 4/7] cifs: Make cifs_writepages() hand an iterator down David Howells
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 20+ messages in thread
From: David Howells @ 2022-01-25 13:57 UTC (permalink / raw)
  To: smfrench, nspmangalore
  Cc: dhowells, jlayton, linux-cifs, linux-cachefs, linux-fsdevel


---

 fs/cifs/cifsencrypt.c |   40 +++--
 fs/cifs/cifsfs.c      |    2 
 fs/cifs/cifsfs.h      |    3 
 fs/cifs/cifsglob.h    |   28 +---
 fs/cifs/cifsproto.h   |   10 +
 fs/cifs/cifssmb.c     |  224 +++++++++++++++++++-----------
 fs/cifs/connect.c     |   16 ++
 fs/cifs/misc.c        |   19 ---
 fs/cifs/smb2ops.c     |  365 ++++++++++++++++++++++++-------------------------
 fs/cifs/smb2pdu.c     |   12 --
 fs/cifs/transport.c   |   37 +----
 11 files changed, 379 insertions(+), 377 deletions(-)

diff --git a/fs/cifs/cifsencrypt.c b/fs/cifs/cifsencrypt.c
index 0912d8bbbac1..69bbf3d6c4d4 100644
--- a/fs/cifs/cifsencrypt.c
+++ b/fs/cifs/cifsencrypt.c
@@ -24,12 +24,27 @@
 #include "../smbfs_common/arc4.h"
 #include <crypto/aead.h>
 
+static ssize_t cifs_signature_scan(struct iov_iter *i, const void *p,
+				   size_t len, size_t off, void *priv)
+{
+	struct shash_desc *shash = priv;
+	int rc;
+
+	rc = crypto_shash_update(shash, p, len);
+	if (rc) {
+		cifs_dbg(VFS, "%s: Could not update with payload\n", __func__);
+		return rc;
+	}
+
+	return len;
+}
+
 int __cifs_calc_signature(struct smb_rqst *rqst,
 			struct TCP_Server_Info *server, char *signature,
 			struct shash_desc *shash)
 {
 	int i;
-	int rc;
+	ssize_t rc;
 	struct kvec *iov = rqst->rq_iov;
 	int n_vec = rqst->rq_nvec;
 	int is_smb2 = server->vals->header_preamble_size == 0;
@@ -62,25 +77,10 @@ int __cifs_calc_signature(struct smb_rqst *rqst,
 		}
 	}
 
-	/* now hash over the rq_pages array */
-	for (i = 0; i < rqst->rq_npages; i++) {
-		void *kaddr;
-		unsigned int len, offset;
-
-		rqst_page_get_length(rqst, i, &len, &offset);
-
-		kaddr = (char *) kmap(rqst->rq_pages[i]) + offset;
-
-		rc = crypto_shash_update(shash, kaddr, len);
-		if (rc) {
-			cifs_dbg(VFS, "%s: Could not update with payload\n",
-				 __func__);
-			kunmap(rqst->rq_pages[i]);
-			return rc;
-		}
-
-		kunmap(rqst->rq_pages[i]);
-	}
+	rc = iov_iter_scan(&rqst->rq_iter, iov_iter_count(&rqst->rq_iter),
+			   cifs_signature_scan, shash);
+	if (rc < 0)
+		return rc;
 
 	rc = crypto_shash_final(shash, signature);
 	if (rc)
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 199edac0cb59..a56cb9c8c5ff 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -935,7 +935,7 @@ cifs_loose_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 	ssize_t rc;
 	struct inode *inode = file_inode(iocb->ki_filp);
 
-	if (iocb->ki_filp->f_flags & O_DIRECT)
+	if (iocb->ki_flags & IOCB_DIRECT)
 		return cifs_user_readv(iocb, iter);
 
 	rc = cifs_revalidate_mapping(inode);
diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index 15a5c5db038b..1c77bbc0815f 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -110,6 +110,9 @@ extern int cifs_file_strict_mmap(struct file * , struct vm_area_struct *);
 extern const struct file_operations cifs_dir_ops;
 extern int cifs_dir_open(struct inode *inode, struct file *file);
 extern int cifs_readdir(struct file *file, struct dir_context *ctx);
+extern void cifs_pages_written_back(struct inode *inode, loff_t start, unsigned int len);
+extern void cifs_pages_write_failed(struct inode *inode, loff_t start, unsigned int len);
+extern void cifs_pages_write_redirty(struct inode *inode, loff_t start, unsigned int len);
 
 /* Functions related to dir entries */
 extern const struct dentry_operations cifs_dentry_ops;
diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 0a4085ced40f..3a4fed645636 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -191,11 +191,8 @@ struct cifs_cred {
 struct smb_rqst {
 	struct kvec	*rq_iov;	/* array of kvecs */
 	unsigned int	rq_nvec;	/* number of kvecs in array */
-	struct page	**rq_pages;	/* pointer to array of page ptrs */
-	unsigned int	rq_offset;	/* the offset to the 1st page */
-	unsigned int	rq_npages;	/* number pages in array */
-	unsigned int	rq_pagesz;	/* page size to use */
-	unsigned int	rq_tailsz;	/* length of last page */
+	struct iov_iter	rq_iter;	/* Data iterator */
+	struct xarray	rq_buffer;	/* Page buffer for encryption */
 };
 
 struct mid_q_entry;
@@ -1323,28 +1320,18 @@ struct cifs_readdata {
 	struct address_space		*mapping;
 	struct cifs_aio_ctx		*ctx;
 	__u64				offset;
+	ssize_t				got_bytes;
 	unsigned int			bytes;
-	unsigned int			got_bytes;
 	pid_t				pid;
 	int				result;
 	struct work_struct		work;
-	int (*read_into_pages)(struct TCP_Server_Info *server,
-				struct cifs_readdata *rdata,
-				unsigned int len);
-	int (*copy_into_pages)(struct TCP_Server_Info *server,
-				struct cifs_readdata *rdata,
-				struct iov_iter *iter);
+	struct iov_iter			iter;
 	struct kvec			iov[2];
 	struct TCP_Server_Info		*server;
 #ifdef CONFIG_CIFS_SMB_DIRECT
 	struct smbd_mr			*mr;
 #endif
-	unsigned int			pagesz;
-	unsigned int			page_offset;
-	unsigned int			tailsz;
 	struct cifs_credits		credits;
-	unsigned int			nr_pages;
-	struct page			**pages;
 };
 
 /* asynchronous write support */
@@ -1356,6 +1343,8 @@ struct cifs_writedata {
 	struct work_struct		work;
 	struct cifsFileInfo		*cfile;
 	struct cifs_aio_ctx		*ctx;
+	struct iov_iter			iter;
+	struct bio_vec			*bv;
 	__u64				offset;
 	pid_t				pid;
 	unsigned int			bytes;
@@ -1364,12 +1353,7 @@ struct cifs_writedata {
 #ifdef CONFIG_CIFS_SMB_DIRECT
 	struct smbd_mr			*mr;
 #endif
-	unsigned int			pagesz;
-	unsigned int			page_offset;
-	unsigned int			tailsz;
 	struct cifs_credits		credits;
-	unsigned int			nr_pages;
-	struct page			**pages;
 };
 
 /*
diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index d3701295402d..1b143f0a03c0 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -242,6 +242,9 @@ extern int cifs_read_page_from_socket(struct TCP_Server_Info *server,
 					unsigned int page_offset,
 					unsigned int to_read);
 extern int cifs_setup_cifs_sb(struct cifs_sb_info *cifs_sb);
+extern int cifs_read_iter_from_socket(struct TCP_Server_Info *server,
+				      struct iov_iter *iter,
+				      unsigned int to_read);
 extern int cifs_match_super(struct super_block *, void *);
 extern int cifs_mount(struct cifs_sb_info *cifs_sb, struct smb3_fs_context *ctx);
 extern void cifs_umount(struct cifs_sb_info *);
@@ -575,10 +578,7 @@ int cifs_readv_receive(struct TCP_Server_Info *server, struct mid_q_entry *mid);
 int cifs_async_writev(struct cifs_writedata *wdata,
 		      void (*release)(struct kref *kref));
 void cifs_writev_complete(struct work_struct *work);
-struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages,
-						work_func_t complete);
-struct cifs_writedata *cifs_writedata_direct_alloc(struct page **pages,
-						work_func_t complete);
+struct cifs_writedata *cifs_writedata_alloc(work_func_t complete);
 void cifs_writedata_release(struct kref *refcount);
 int cifs_query_mf_symlink(unsigned int xid, struct cifs_tcon *tcon,
 			  struct cifs_sb_info *cifs_sb,
@@ -602,8 +602,6 @@ int cifs_alloc_hash(const char *name, struct crypto_shash **shash,
 		    struct sdesc **sdesc);
 void cifs_free_hash(struct crypto_shash **shash, struct sdesc **sdesc);
 
-extern void rqst_page_get_length(struct smb_rqst *rqst, unsigned int page,
-				unsigned int *len, unsigned int *offset);
 struct cifs_chan *
 cifs_ses_find_chan(struct cifs_ses *ses, struct TCP_Server_Info *server);
 int cifs_try_adding_channels(struct cifs_sb_info *cifs_sb, struct cifs_ses *ses);
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index 071e2f21a7db..38e7276352e2 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -24,6 +24,7 @@
 #include <linux/task_io_accounting_ops.h>
 #include <linux/uaccess.h>
 #include "cifspdu.h"
+#include "cifsfs.h"
 #include "cifsglob.h"
 #include "cifsacl.h"
 #include "cifsproto.h"
@@ -1388,11 +1389,11 @@ int
 cifs_discard_remaining_data(struct TCP_Server_Info *server)
 {
 	unsigned int rfclen = server->pdu_size;
-	int remaining = rfclen + server->vals->header_preamble_size -
+	size_t remaining = rfclen + server->vals->header_preamble_size -
 		server->total_read;
 
 	while (remaining > 0) {
-		int length;
+		ssize_t length;
 
 		length = cifs_discard_from_socket(server,
 				min_t(size_t, remaining,
@@ -1539,10 +1540,15 @@ cifs_readv_receive(struct TCP_Server_Info *server, struct mid_q_entry *mid)
 		return cifs_readv_discard(server, mid);
 	}
 
-	length = rdata->read_into_pages(server, rdata, data_len);
-	if (length < 0)
-		return length;
-
+#ifdef CONFIG_CIFS_SMB_DIRECT
+	if (rdata->mr)
+		length = data_len; /* An RDMA read is already done. */
+	else
+#endif
+		length = cifs_read_iter_from_socket(server, &rdata->iter,
+						    data_len);
+	if (length > 0)
+		rdata->got_bytes += length;
 	server->total_read += length;
 
 	cifs_dbg(FYI, "total_read=%u buflen=%u remaining=%u\n",
@@ -1566,11 +1572,7 @@ cifs_readv_callback(struct mid_q_entry *mid)
 	struct TCP_Server_Info *server = tcon->ses->server;
 	struct smb_rqst rqst = { .rq_iov = rdata->iov,
 				 .rq_nvec = 2,
-				 .rq_pages = rdata->pages,
-				 .rq_offset = rdata->page_offset,
-				 .rq_npages = rdata->nr_pages,
-				 .rq_pagesz = rdata->pagesz,
-				 .rq_tailsz = rdata->tailsz };
+				 .rq_iter = rdata->iter };
 	struct cifs_credits credits = { .value = 1, .instance = 0 };
 
 	cifs_dbg(FYI, "%s: mid=%llu state=%d result=%d bytes=%u\n",
@@ -1925,10 +1927,93 @@ cifs_writedata_release(struct kref *refcount)
 	if (wdata->cfile)
 		cifsFileInfo_put(wdata->cfile);
 
-	kvfree(wdata->pages);
 	kfree(wdata);
 }
 
+/*
+ * Completion of write to server.
+ */
+void cifs_pages_written_back(struct inode *inode, loff_t start, unsigned int len)
+{
+	struct address_space *mapping = inode->i_mapping;
+	struct folio *folio;
+	pgoff_t end;
+
+	XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);
+
+	rcu_read_lock();
+
+	end = (start + len - 1) / PAGE_SIZE;
+	xas_for_each(&xas, folio, end) {
+		if (!folio_test_writeback(folio)) {
+			pr_err("bad %x @%llx page %lx %lx\n",
+			       len, start, folio_index(folio), end);
+			BUG();
+		}
+
+		folio_detach_private(folio);
+		folio_end_writeback(folio);
+	}
+
+	rcu_read_unlock();
+}
+
+/*
+ * Failure of write to server.
+ */
+void cifs_pages_write_failed(struct inode *inode, loff_t start, unsigned int len)
+{
+	struct address_space *mapping = inode->i_mapping;
+	struct folio *folio;
+	pgoff_t end;
+
+	XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);
+
+	rcu_read_lock();
+
+	end = (start + len - 1) / PAGE_SIZE;
+	xas_for_each(&xas, folio, end) {
+		if (!folio_test_writeback(folio)) {
+			pr_err("bad %x @%llx page %lx %lx\n",
+			       len, start, folio_index(folio), end);
+			BUG();
+		}
+
+		folio_set_error(folio);
+		folio_end_writeback(folio);
+	}
+
+	rcu_read_unlock();
+}
+
+/*
+ * Redirty pages after a temporary failure.
+ */
+void cifs_pages_write_redirty(struct inode *inode, loff_t start, unsigned int len)
+{
+	struct address_space *mapping = inode->i_mapping;
+	struct folio *folio;
+	pgoff_t end;
+
+	XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);
+
+	rcu_read_lock();
+
+	end = (start + len - 1) / PAGE_SIZE;
+	xas_for_each(&xas, folio, end) {
+		if (!folio_test_writeback(folio)) {
+			pr_err("bad %x @%llx page %lx %lx\n",
+			       len, start, folio_index(folio), end);
+			BUG();
+		}
+
+		filemap_dirty_folio(folio->mapping, folio);
+		folio_end_writeback(folio);
+	}
+
+	rcu_read_unlock();
+}
+
 /*
  * Write failed with a retryable error. Resend the write request. It's also
  * possible that the page was redirtied so re-clean the page.
@@ -1936,51 +2021,56 @@ cifs_writedata_release(struct kref *refcount)
 static void
 cifs_writev_requeue(struct cifs_writedata *wdata)
 {
-	int i, rc = 0;
+	int rc = 0;
 	struct inode *inode = d_inode(wdata->cfile->dentry);
 	struct TCP_Server_Info *server;
-	unsigned int rest_len;
+	unsigned int rest_len = wdata->bytes;
+	loff_t fpos = wdata->offset;
 
 	server = tlink_tcon(wdata->cfile->tlink)->ses->server;
-	i = 0;
-	rest_len = wdata->bytes;
 	do {
 		struct cifs_writedata *wdata2;
-		unsigned int j, nr_pages, wsize, tailsz, cur_len;
+		unsigned int wsize, cur_len;
 
 		wsize = server->ops->wp_retry_size(inode);
 		if (wsize < rest_len) {
-			nr_pages = wsize / PAGE_SIZE;
-			if (!nr_pages) {
+			if (wsize < PAGE_SIZE) {
 				rc = -ENOTSUPP;
 				break;
 			}
-			cur_len = nr_pages * PAGE_SIZE;
-			tailsz = PAGE_SIZE;
+			cur_len = min(round_down(wsize, PAGE_SIZE), rest_len);
 		} else {
-			nr_pages = DIV_ROUND_UP(rest_len, PAGE_SIZE);
 			cur_len = rest_len;
-			tailsz = rest_len - (nr_pages - 1) * PAGE_SIZE;
 		}
 
-		wdata2 = cifs_writedata_alloc(nr_pages, cifs_writev_complete);
+		wdata2 = cifs_writedata_alloc(cifs_writev_complete);
 		if (!wdata2) {
 			rc = -ENOMEM;
 			break;
 		}
 
-		for (j = 0; j < nr_pages; j++) {
-			wdata2->pages[j] = wdata->pages[i + j];
-			lock_page(wdata2->pages[j]);
-			clear_page_dirty_for_io(wdata2->pages[j]);
-		}
-
 		wdata2->sync_mode = wdata->sync_mode;
-		wdata2->nr_pages = nr_pages;
-		wdata2->offset = page_offset(wdata2->pages[0]);
-		wdata2->pagesz = PAGE_SIZE;
-		wdata2->tailsz = tailsz;
-		wdata2->bytes = cur_len;
+		wdata2->offset	= fpos;
+		wdata2->bytes	= cur_len;
+		wdata2->iter	= wdata->iter;
+
+		iov_iter_advance(&wdata2->iter, fpos - wdata->offset);
+		iov_iter_truncate(&wdata2->iter, wdata2->bytes);
+
+#if 0
+		if (iov_iter_is_xarray(&wdata2->iter)) {
+			/* TODO: Check for pages having been redirtied and
+			 * clean them.  We can do this by walking the xarray.
+			 * If it's not an xarray, then it's a DIO and we
+			 * shouldn't be mucking around with the page bits.
+			 */
+			for (j = 0; j < nr_pages; j++) {
+				wdata2->pages[j] = wdata->pages[i + j];
+				lock_page(wdata2->pages[j]);
+				clear_page_dirty_for_io(wdata2->pages[j]);
+			}
+		}
+#endif
 
 		rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY,
 					    &wdata2->cfile);
@@ -1995,33 +2085,25 @@ cifs_writev_requeue(struct cifs_writedata *wdata)
 						       cifs_writedata_release);
 		}
 
-		for (j = 0; j < nr_pages; j++) {
-			unlock_page(wdata2->pages[j]);
-			if (rc != 0 && !is_retryable_error(rc)) {
-				SetPageError(wdata2->pages[j]);
-				end_page_writeback(wdata2->pages[j]);
-				put_page(wdata2->pages[j]);
-			}
-		}
+		if (iov_iter_is_xarray(&wdata2->iter))
+			cifs_pages_written_back(inode, wdata2->offset, wdata2->bytes);
 
 		kref_put(&wdata2->refcount, cifs_writedata_release);
 		if (rc) {
 			if (is_retryable_error(rc))
 				continue;
-			i += nr_pages;
+			fpos += cur_len;
+			rest_len -= cur_len;
 			break;
 		}
 
+		fpos += cur_len;
 		rest_len -= cur_len;
-		i += nr_pages;
-	} while (i < wdata->nr_pages);
+	} while (rest_len > 0);
 
-	/* cleanup remaining pages from the original wdata */
-	for (; i < wdata->nr_pages; i++) {
-		SetPageError(wdata->pages[i]);
-		end_page_writeback(wdata->pages[i]);
-		put_page(wdata->pages[i]);
-	}
+	/* Clean up remaining pages from the original wdata */
+	if (iov_iter_is_xarray(&wdata->iter))
+		cifs_pages_written_back(inode, fpos, rest_len);
 
 	if (rc != 0 && !is_retryable_error(rc))
 		mapping_set_error(inode->i_mapping, rc);
@@ -2034,7 +2116,6 @@ cifs_writev_complete(struct work_struct *work)
 	struct cifs_writedata *wdata = container_of(work,
 						struct cifs_writedata, work);
 	struct inode *inode = d_inode(wdata->cfile->dentry);
-	int i = 0;
 
 	if (wdata->result == 0) {
 		spin_lock(&inode->i_lock);
@@ -2045,40 +2126,25 @@ cifs_writev_complete(struct work_struct *work)
 	} else if (wdata->sync_mode == WB_SYNC_ALL && wdata->result == -EAGAIN)
 		return cifs_writev_requeue(wdata);
 
-	for (i = 0; i < wdata->nr_pages; i++) {
-		struct page *page = wdata->pages[i];
-		if (wdata->result == -EAGAIN)
-			__set_page_dirty_nobuffers(page);
-		else if (wdata->result < 0)
-			SetPageError(page);
-		end_page_writeback(page);
-		cifs_readpage_to_fscache(inode, page);
-		put_page(page);
-	}
+	if (wdata->result == -EAGAIN)
+		cifs_pages_write_redirty(inode, wdata->offset, wdata->bytes);
+	else if (wdata->result < 0)
+		cifs_pages_write_failed(inode, wdata->offset, wdata->bytes);
+	else
+		cifs_pages_written_back(inode, wdata->offset, wdata->bytes);
+
 	if (wdata->result != -EAGAIN)
 		mapping_set_error(inode->i_mapping, wdata->result);
 	kref_put(&wdata->refcount, cifs_writedata_release);
 }
 
 struct cifs_writedata *
-cifs_writedata_alloc(unsigned int nr_pages, work_func_t complete)
-{
-	struct page **pages =
-		kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
-	if (pages)
-		return cifs_writedata_direct_alloc(pages, complete);
-
-	return NULL;
-}
-
-struct cifs_writedata *
-cifs_writedata_direct_alloc(struct page **pages, work_func_t complete)
+cifs_writedata_alloc(work_func_t complete)
 {
 	struct cifs_writedata *wdata;
 
 	wdata = kzalloc(sizeof(*wdata), GFP_NOFS);
 	if (wdata != NULL) {
-		wdata->pages = pages;
 		kref_init(&wdata->refcount);
 		INIT_LIST_HEAD(&wdata->list);
 		init_completion(&wdata->done);
@@ -2186,11 +2252,7 @@ cifs_async_writev(struct cifs_writedata *wdata,
 
 	rqst.rq_iov = iov;
 	rqst.rq_nvec = 2;
-	rqst.rq_pages = wdata->pages;
-	rqst.rq_offset = wdata->page_offset;
-	rqst.rq_npages = wdata->nr_pages;
-	rqst.rq_pagesz = wdata->pagesz;
-	rqst.rq_tailsz = wdata->tailsz;
+	rqst.rq_iter = wdata->iter;
 
 	cifs_dbg(FYI, "async write at %llu %u bytes\n",
 		 wdata->offset, wdata->bytes);
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index ed210d774a21..d0851c9881b3 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -704,6 +704,22 @@ cifs_read_page_from_socket(struct TCP_Server_Info *server, struct page *page,
 	return cifs_readv_from_socket(server, &smb_msg);
 }
 
+int
+cifs_read_iter_from_socket(struct TCP_Server_Info *server, struct iov_iter *iter,
+			   unsigned int to_read)
+{
+	struct msghdr smb_msg;
+	int ret;
+
+	smb_msg.msg_iter = *iter;
+	if (smb_msg.msg_iter.count > to_read)
+		smb_msg.msg_iter.count = to_read;
+	ret = cifs_readv_from_socket(server, &smb_msg);
+	if (ret > 0)
+		iov_iter_advance(iter, ret);
+	return ret;
+}
+
 static bool
 is_smb_response(struct TCP_Server_Info *server, unsigned char type)
 {
diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c
index 56598f7dbe00..f5fe5720456a 100644
--- a/fs/cifs/misc.c
+++ b/fs/cifs/misc.c
@@ -1122,25 +1122,6 @@ cifs_free_hash(struct crypto_shash **shash, struct sdesc **sdesc)
 	*shash = NULL;
 }
 
-/**
- * rqst_page_get_length - obtain the length and offset for a page in smb_rqst
- * @rqst: The request descriptor
- * @page: The index of the page to query
- * @len: Where to store the length for this page:
- * @offset: Where to store the offset for this page
- */
-void rqst_page_get_length(struct smb_rqst *rqst, unsigned int page,
-				unsigned int *len, unsigned int *offset)
-{
-	*len = rqst->rq_pagesz;
-	*offset = (page == 0) ? rqst->rq_offset : 0;
-
-	if (rqst->rq_npages == 1 || page == rqst->rq_npages-1)
-		*len = rqst->rq_tailsz;
-	else if (page == 0)
-		*len = rqst->rq_pagesz - rqst->rq_offset;
-}
-
 void extract_unc_hostname(const char *unc, const char **h, size_t *len)
 {
 	const char *end;
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index af5d0830bc8a..e1649ac194db 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -4406,15 +4406,30 @@ fill_transform_hdr(struct smb2_transform_hdr *tr_hdr, unsigned int orig_len,
 static inline void smb2_sg_set_buf(struct scatterlist *sg, const void *buf,
 				   unsigned int buflen)
 {
-	void *addr;
+	struct page *page;
+
 	/*
 	 * VMAP_STACK (at least) puts stack into the vmalloc address space
 	 */
 	if (is_vmalloc_addr(buf))
-		addr = vmalloc_to_page(buf);
+		page = vmalloc_to_page(buf);
 	else
-		addr = virt_to_page(buf);
-	sg_set_page(sg, addr, buflen, offset_in_page(buf));
+		page = virt_to_page(buf);
+	sg_set_page(sg, page, buflen, offset_in_page(buf));
+}
+
+struct cifs_init_sg_priv {
+	struct scatterlist *sg;
+	unsigned int idx;
+};
+
+static ssize_t cifs_init_sg_scan(struct iov_iter *i, const void *p,
+				 size_t len, size_t off, void *_priv)
+{
+	struct cifs_init_sg_priv *priv = _priv;
+
+	smb2_sg_set_buf(&priv->sg[priv->idx++], p, len);
+	return len;
 }
 
 /* Assumes the first rqst has a transform header as the first iov.
@@ -4426,43 +4441,46 @@ static inline void smb2_sg_set_buf(struct scatterlist *sg, const void *buf,
 static struct scatterlist *
 init_sg(int num_rqst, struct smb_rqst *rqst, u8 *sign)
 {
+	struct cifs_init_sg_priv priv;
 	unsigned int sg_len;
-	struct scatterlist *sg;
 	unsigned int i;
 	unsigned int j;
-	unsigned int idx = 0;
+	ssize_t rc;
 	int skip;
 
 	sg_len = 1;
-	for (i = 0; i < num_rqst; i++)
-		sg_len += rqst[i].rq_nvec + rqst[i].rq_npages;
+	for (i = 0; i < num_rqst; i++) {
+		unsigned int np = iov_iter_npages(&rqst[i].rq_iter, INT_MAX);
+		sg_len += rqst[i].rq_nvec + np;
+	}
 
-	sg = kmalloc_array(sg_len, sizeof(struct scatterlist), GFP_KERNEL);
-	if (!sg)
+	priv.idx = 0;
+	priv.sg = kmalloc_array(sg_len, sizeof(struct scatterlist), GFP_KERNEL);
+	if (!priv.sg)
 		return NULL;
 
-	sg_init_table(sg, sg_len);
+	sg_init_table(priv.sg, sg_len);
 	for (i = 0; i < num_rqst; i++) {
+		struct iov_iter *iter = &rqst[i].rq_iter;
+		size_t count = iov_iter_count(iter);
+
 		for (j = 0; j < rqst[i].rq_nvec; j++) {
 			/*
 			 * The first rqst has a transform header where the
 			 * first 20 bytes are not part of the encrypted blob
 			 */
 			skip = (i == 0) && (j == 0) ? 20 : 0;
-			smb2_sg_set_buf(&sg[idx++],
+			smb2_sg_set_buf(&priv.sg[priv.idx++],
 					rqst[i].rq_iov[j].iov_base + skip,
 					rqst[i].rq_iov[j].iov_len - skip);
-			}
-
-		for (j = 0; j < rqst[i].rq_npages; j++) {
-			unsigned int len, offset;
-
-			rqst_page_get_length(&rqst[i], j, &len, &offset);
-			sg_set_page(&sg[idx++], rqst[i].rq_pages[j], len, offset);
 		}
+
+		rc = iov_iter_scan(iter, count, cifs_init_sg_scan, &priv);
+		iov_iter_revert(iter, count);
+		WARN_ON(rc < 0);
 	}
-	smb2_sg_set_buf(&sg[idx], sign, SMB2_SIGNATURE_SIZE);
-	return sg;
+	smb2_sg_set_buf(&priv.sg[priv.idx], sign, SMB2_SIGNATURE_SIZE);
+	return priv.sg;
 }
 
 static int
@@ -4599,18 +4617,30 @@ crypt_message(struct TCP_Server_Info *server, int num_rqst,
 	return rc;
 }
 
+/*
+ * Clear a read buffer, discarding the folios which have XA_MARK_0 set.
+ */
+static void cifs_clear_xarray_buffer(struct xarray *buffer)
+{
+	struct folio *folio;
+	XA_STATE(xas, buffer, 0);
+
+	rcu_read_lock();
+	xas_for_each_marked(&xas, folio, ULONG_MAX, XA_MARK_0) {
+		folio_put(folio);
+	}
+	rcu_read_unlock();
+	xa_destroy(buffer);
+}
+
 void
 smb3_free_compound_rqst(int num_rqst, struct smb_rqst *rqst)
 {
-	int i, j;
+	int i;
 
-	for (i = 0; i < num_rqst; i++) {
-		if (rqst[i].rq_pages) {
-			for (j = rqst[i].rq_npages - 1; j >= 0; j--)
-				put_page(rqst[i].rq_pages[j]);
-			kfree(rqst[i].rq_pages);
-		}
-	}
+	for (i = 0; i < num_rqst; i++)
+		if (!xa_empty(&rqst[i].rq_buffer))
+			cifs_clear_xarray_buffer(&rqst[i].rq_buffer);
 }
 
 /*
@@ -4630,50 +4660,51 @@ static int
 smb3_init_transform_rq(struct TCP_Server_Info *server, int num_rqst,
 		       struct smb_rqst *new_rq, struct smb_rqst *old_rq)
 {
-	struct page **pages;
 	struct smb2_transform_hdr *tr_hdr = new_rq[0].rq_iov[0].iov_base;
-	unsigned int npages;
+	struct page *page;
 	unsigned int orig_len = 0;
 	int i, j;
 	int rc = -ENOMEM;
 
 	for (i = 1; i < num_rqst; i++) {
-		npages = old_rq[i - 1].rq_npages;
-		pages = kmalloc_array(npages, sizeof(struct page *),
-				      GFP_KERNEL);
-		if (!pages)
-			goto err_free;
-
-		new_rq[i].rq_pages = pages;
-		new_rq[i].rq_npages = npages;
-		new_rq[i].rq_offset = old_rq[i - 1].rq_offset;
-		new_rq[i].rq_pagesz = old_rq[i - 1].rq_pagesz;
-		new_rq[i].rq_tailsz = old_rq[i - 1].rq_tailsz;
-		new_rq[i].rq_iov = old_rq[i - 1].rq_iov;
-		new_rq[i].rq_nvec = old_rq[i - 1].rq_nvec;
-
-		orig_len += smb_rqst_len(server, &old_rq[i - 1]);
-
-		for (j = 0; j < npages; j++) {
-			pages[j] = alloc_page(GFP_KERNEL|__GFP_HIGHMEM);
-			if (!pages[j])
-				goto err_free;
-		}
-
-		/* copy pages form the old */
-		for (j = 0; j < npages; j++) {
-			char *dst, *src;
-			unsigned int offset, len;
-
-			rqst_page_get_length(&new_rq[i], j, &len, &offset);
-
-			dst = (char *) kmap(new_rq[i].rq_pages[j]) + offset;
-			src = (char *) kmap(old_rq[i - 1].rq_pages[j]) + offset;
+		struct smb_rqst *old = &old_rq[i - 1];
+		struct smb_rqst *new = &new_rq[i];
+		struct xarray *buffer = &new->rq_buffer;
+		unsigned int npages;
+		size_t size = iov_iter_count(&old->rq_iter), seg, copied = 0;
+
+		xa_init(buffer);
+
+		if (size > 0) {
+			npages = DIV_ROUND_UP(size, PAGE_SIZE);
+			for (j = 0; j < npages; j++) {
+				void *o;
+
+				rc = -ENOMEM;
+				page = alloc_page(GFP_KERNEL|__GFP_HIGHMEM);
+				if (!page)
+					goto err_free;
+				page->index = j;
+				o = xa_store(buffer, j, page, GFP_KERNEL);
+				if (xa_is_err(o)) {
+					rc = xa_err(o);
+					put_page(page);
+					goto err_free;
+				}
 
-			memcpy(dst, src, len);
-			kunmap(new_rq[i].rq_pages[j]);
-			kunmap(old_rq[i - 1].rq_pages[j]);
+				seg = min(size - copied, PAGE_SIZE);
+				if (copy_page_from_iter(page, 0, seg, &old->rq_iter) != seg) {
+					rc = -EFAULT;
+					goto err_free;
+				}
+				copied += seg;
+			}
+			iov_iter_xarray(&new->rq_iter, iov_iter_rw(&old->rq_iter),
+					buffer, 0, size);
 		}
+		new->rq_iov = old->rq_iov;
+		new->rq_nvec = old->rq_nvec;
+		orig_len += smb_rqst_len(server, new);
 	}
 
 	/* fill the 1st iov with a transform header */
@@ -4701,12 +4732,12 @@ smb3_is_transform_hdr(void *buf)
 
 static int
 decrypt_raw_data(struct TCP_Server_Info *server, char *buf,
-		 unsigned int buf_data_size, struct page **pages,
-		 unsigned int npages, unsigned int page_data_size,
+		 unsigned int buf_data_size, struct iov_iter *iter,
 		 bool is_offloaded)
 {
 	struct kvec iov[2];
 	struct smb_rqst rqst = {NULL};
+	size_t iter_size = 0;
 	int rc;
 
 	iov[0].iov_base = buf;
@@ -4716,10 +4747,10 @@ decrypt_raw_data(struct TCP_Server_Info *server, char *buf,
 
 	rqst.rq_iov = iov;
 	rqst.rq_nvec = 2;
-	rqst.rq_pages = pages;
-	rqst.rq_npages = npages;
-	rqst.rq_pagesz = PAGE_SIZE;
-	rqst.rq_tailsz = (page_data_size % PAGE_SIZE) ? : PAGE_SIZE;
+	if (iter) {
+		rqst.rq_iter = *iter;
+		iter_size = iov_iter_count(iter);
+	}
 
 	rc = crypt_message(server, 1, &rqst, 0);
 	cifs_dbg(FYI, "Decrypt message returned %d\n", rc);
@@ -4730,73 +4761,37 @@ decrypt_raw_data(struct TCP_Server_Info *server, char *buf,
 	memmove(buf, iov[1].iov_base, buf_data_size);
 
 	if (!is_offloaded)
-		server->total_read = buf_data_size + page_data_size;
+		server->total_read = buf_data_size + iter_size;
 
 	return rc;
 }
 
 static int
-read_data_into_pages(struct TCP_Server_Info *server, struct page **pages,
-		     unsigned int npages, unsigned int len)
+cifs_copy_pages_to_iter(struct xarray *pages, unsigned int data_size,
+			unsigned int skip, struct iov_iter *iter)
 {
-	int i;
-	int length;
+	struct page *page;
+	unsigned long index;
 
-	for (i = 0; i < npages; i++) {
-		struct page *page = pages[i];
-		size_t n;
+	xa_for_each(pages, index, page) {
+		size_t n, len = min_t(unsigned int, PAGE_SIZE - skip, data_size);
 
-		n = len;
-		if (len >= PAGE_SIZE) {
-			/* enough data to fill the page */
-			n = PAGE_SIZE;
-			len -= n;
-		} else {
-			zero_user(page, len, PAGE_SIZE - len);
-			len = 0;
+		n = copy_page_to_iter(page, skip, len, iter);
+		if (n != len) {
+			cifs_dbg(VFS, "%s: short copy to iter\n", __func__);
+			return -EIO;
 		}
-		length = cifs_read_page_from_socket(server, page, 0, n);
-		if (length < 0)
-			return length;
-		server->total_read += length;
+		data_size -= n;
+		skip = 0;
 	}
 
 	return 0;
 }
 
-static int
-init_read_bvec(struct page **pages, unsigned int npages, unsigned int data_size,
-	       unsigned int cur_off, struct bio_vec **page_vec)
-{
-	struct bio_vec *bvec;
-	int i;
-
-	bvec = kcalloc(npages, sizeof(struct bio_vec), GFP_KERNEL);
-	if (!bvec)
-		return -ENOMEM;
-
-	for (i = 0; i < npages; i++) {
-		bvec[i].bv_page = pages[i];
-		bvec[i].bv_offset = (i == 0) ? cur_off : 0;
-		bvec[i].bv_len = min_t(unsigned int, PAGE_SIZE, data_size);
-		data_size -= bvec[i].bv_len;
-	}
-
-	if (data_size != 0) {
-		cifs_dbg(VFS, "%s: something went wrong\n", __func__);
-		kfree(bvec);
-		return -EIO;
-	}
-
-	*page_vec = bvec;
-	return 0;
-}
-
 static int
 handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
-		 char *buf, unsigned int buf_len, struct page **pages,
-		 unsigned int npages, unsigned int page_data_size,
-		 bool is_offloaded)
+		 char *buf, unsigned int buf_len, struct xarray *pages,
+		 unsigned int pages_len, bool is_offloaded)
 {
 	unsigned int data_offset;
 	unsigned int data_len;
@@ -4805,9 +4800,6 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
 	unsigned int pad_len;
 	struct cifs_readdata *rdata = mid->callback_data;
 	struct smb2_hdr *shdr = (struct smb2_hdr *)buf;
-	struct bio_vec *bvec = NULL;
-	struct iov_iter iter;
-	struct kvec iov;
 	int length;
 	bool use_rdma_mr = false;
 
@@ -4896,7 +4888,7 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
 			return 0;
 		}
 
-		if (data_len > page_data_size - pad_len) {
+		if (data_len > pages_len - pad_len) {
 			/* data_len is corrupt -- discard frame */
 			rdata->result = -EIO;
 			if (is_offloaded)
@@ -4906,8 +4898,9 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
 			return 0;
 		}
 
-		rdata->result = init_read_bvec(pages, npages, page_data_size,
-					       cur_off, &bvec);
+		/* Copy the data to the output I/O iterator. */
+		rdata->result = cifs_copy_pages_to_iter(pages, pages_len,
+							cur_off, &rdata->iter);
 		if (rdata->result != 0) {
 			if (is_offloaded)
 				mid->mid_state = MID_RESPONSE_MALFORMED;
@@ -4915,14 +4908,15 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
 				dequeue_mid(mid, rdata->result);
 			return 0;
 		}
+		rdata->got_bytes = pages_len;
 
-		iov_iter_bvec(&iter, WRITE, bvec, npages, data_len);
 	} else if (buf_len >= data_offset + data_len) {
 		/* read response payload is in buf */
-		WARN_ONCE(npages > 0, "read data can be either in buf or in pages");
-		iov.iov_base = buf + data_offset;
-		iov.iov_len = data_len;
-		iov_iter_kvec(&iter, WRITE, &iov, 1, data_len);
+		WARN_ONCE(pages && !xa_empty(pages),
+			  "read data can be either in buf or in pages");
+		length = copy_to_iter(buf + data_offset, data_len, &rdata->iter);
+		if (length < 0)
+			return length;
 	} else {
 		/* read response payload cannot be in both buf and pages */
 		WARN_ONCE(1, "buf can not contain only a part of read data");
@@ -4934,13 +4928,6 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
 		return 0;
 	}
 
-	length = rdata->copy_into_pages(server, rdata, &iter);
-
-	kfree(bvec);
-
-	if (length < 0)
-		return length;
-
 	if (is_offloaded)
 		mid->mid_state = MID_RESPONSE_RECEIVED;
 	else
@@ -4951,9 +4938,8 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
 struct smb2_decrypt_work {
 	struct work_struct decrypt;
 	struct TCP_Server_Info *server;
-	struct page **ppages;
+	struct xarray buffer;
 	char *buf;
-	unsigned int npages;
 	unsigned int len;
 };
 
@@ -4962,11 +4948,13 @@ static void smb2_decrypt_offload(struct work_struct *work)
 {
 	struct smb2_decrypt_work *dw = container_of(work,
 				struct smb2_decrypt_work, decrypt);
-	int i, rc;
+	int rc;
 	struct mid_q_entry *mid;
+	struct iov_iter iter;
 
+	iov_iter_xarray(&iter, READ, &dw->buffer, 0, dw->len);
 	rc = decrypt_raw_data(dw->server, dw->buf, dw->server->vals->read_rsp_size,
-			      dw->ppages, dw->npages, dw->len, true);
+			      &iter, true);
 	if (rc) {
 		cifs_dbg(VFS, "error decrypting rc=%d\n", rc);
 		goto free_pages;
@@ -4980,7 +4968,7 @@ static void smb2_decrypt_offload(struct work_struct *work)
 		mid->decrypted = true;
 		rc = handle_read_data(dw->server, mid, dw->buf,
 				      dw->server->vals->read_rsp_size,
-				      dw->ppages, dw->npages, dw->len,
+				      &dw->buffer, dw->len,
 				      true);
 		if (rc >= 0) {
 #ifdef CONFIG_CIFS_STATS2
@@ -5012,10 +5000,7 @@ static void smb2_decrypt_offload(struct work_struct *work)
 	}
 
 free_pages:
-	for (i = dw->npages-1; i >= 0; i--)
-		put_page(dw->ppages[i]);
-
-	kfree(dw->ppages);
+	cifs_clear_xarray_buffer(&dw->buffer);
 	cifs_small_buf_release(dw->buf);
 	kfree(dw);
 }
@@ -5025,47 +5010,66 @@ static int
 receive_encrypted_read(struct TCP_Server_Info *server, struct mid_q_entry **mid,
 		       int *num_mids)
 {
+	struct page *page;
 	char *buf = server->smallbuf;
 	struct smb2_transform_hdr *tr_hdr = (struct smb2_transform_hdr *)buf;
-	unsigned int npages;
-	struct page **pages;
-	unsigned int len;
+	struct iov_iter iter;
+	unsigned int len, npages;
 	unsigned int buflen = server->pdu_size;
 	int rc;
 	int i = 0;
 	struct smb2_decrypt_work *dw;
 
+	dw = kzalloc(sizeof(struct smb2_decrypt_work), GFP_KERNEL);
+	if (!dw)
+		return -ENOMEM;
+	xa_init(&dw->buffer);
+	INIT_WORK(&dw->decrypt, smb2_decrypt_offload);
+	dw->server = server;
+
 	*num_mids = 1;
 	len = min_t(unsigned int, buflen, server->vals->read_rsp_size +
 		sizeof(struct smb2_transform_hdr)) - HEADER_SIZE(server) + 1;
 
 	rc = cifs_read_from_socket(server, buf + HEADER_SIZE(server) - 1, len);
 	if (rc < 0)
-		return rc;
+		goto free_dw;
 	server->total_read += rc;
 
 	len = le32_to_cpu(tr_hdr->OriginalMessageSize) -
 		server->vals->read_rsp_size;
+	dw->len = len;
 	npages = DIV_ROUND_UP(len, PAGE_SIZE);
 
-	pages = kmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
-	if (!pages) {
-		rc = -ENOMEM;
-		goto discard_data;
-	}
-
+	rc = -ENOMEM;
 	for (; i < npages; i++) {
-		pages[i] = alloc_page(GFP_KERNEL|__GFP_HIGHMEM);
-		if (!pages[i]) {
-			rc = -ENOMEM;
+		void *old;
+
+		page = alloc_page(GFP_KERNEL|__GFP_HIGHMEM);
+		if (!page) {
+			goto discard_data;
+		}
+		page->index = i;
+		old = xa_store(&dw->buffer, i, page, GFP_KERNEL);
+		if (xa_is_err(old)) {
+			rc = xa_err(old);
+			put_page(page);
 			goto discard_data;
 		}
 	}
 
-	/* read read data into pages */
-	rc = read_data_into_pages(server, pages, npages, len);
-	if (rc)
-		goto free_pages;
+	iov_iter_xarray(&iter, READ, &dw->buffer, 0, npages * PAGE_SIZE);
+
+	/* Read the data into the buffer and zero any unused tail space. */
+	rc = cifs_read_iter_from_socket(server, &iter, dw->len);
+	if (rc < 0)
+		goto discard_data;
+
+	server->total_read += rc;
+	if (rc < npages * PAGE_SIZE)
+		iov_iter_zero(npages * PAGE_SIZE - rc, &iter);
+	iov_iter_revert(&iter, npages * PAGE_SIZE);
+	iov_iter_truncate(&iter, dw->len);
 
 	rc = cifs_discard_remaining_data(server);
 	if (rc)
@@ -5078,39 +5082,28 @@ receive_encrypted_read(struct TCP_Server_Info *server, struct mid_q_entry **mid,
 
 	if ((server->min_offload) && (server->in_flight > 1) &&
 	    (server->pdu_size >= server->min_offload)) {
-		dw = kmalloc(sizeof(struct smb2_decrypt_work), GFP_KERNEL);
-		if (dw == NULL)
-			goto non_offloaded_decrypt;
-
 		dw->buf = server->smallbuf;
 		server->smallbuf = (char *)cifs_small_buf_get();
 
-		INIT_WORK(&dw->decrypt, smb2_decrypt_offload);
-
-		dw->npages = npages;
-		dw->server = server;
-		dw->ppages = pages;
-		dw->len = len;
 		queue_work(decrypt_wq, &dw->decrypt);
 		*num_mids = 0; /* worker thread takes care of finding mid */
 		return -1;
 	}
 
-non_offloaded_decrypt:
 	rc = decrypt_raw_data(server, buf, server->vals->read_rsp_size,
-			      pages, npages, len, false);
+			      &iter, false);
 	if (rc)
 		goto free_pages;
 
 	*mid = smb2_find_mid(server, buf);
-	if (*mid == NULL)
+	if (*mid == NULL) {
 		cifs_dbg(FYI, "mid not found\n");
-	else {
+	} else {
 		cifs_dbg(FYI, "mid found\n");
 		(*mid)->decrypted = true;
 		rc = handle_read_data(server, *mid, buf,
 				      server->vals->read_rsp_size,
-				      pages, npages, len, false);
+				      &dw->buffer, dw->len, false);
 		if (rc >= 0) {
 			if (server->ops->is_network_name_deleted) {
 				server->ops->is_network_name_deleted(buf,
@@ -5120,9 +5113,9 @@ receive_encrypted_read(struct TCP_Server_Info *server, struct mid_q_entry **mid,
 	}
 
 free_pages:
-	for (i = i - 1; i >= 0; i--)
-		put_page(pages[i]);
-	kfree(pages);
+	cifs_clear_xarray_buffer(&dw->buffer);
+free_dw:
+	kfree(dw);
 	return rc;
 discard_data:
 	cifs_discard_remaining_data(server);
@@ -5160,7 +5153,7 @@ receive_encrypted_standard(struct TCP_Server_Info *server,
 	server->total_read += length;
 
 	buf_size = pdu_length - sizeof(struct smb2_transform_hdr);
-	length = decrypt_raw_data(server, buf, buf_size, NULL, 0, 0, false);
+	length = decrypt_raw_data(server, buf, buf_size, NULL, false);
 	if (length)
 		return length;
 
@@ -5259,7 +5252,7 @@ smb3_handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid)
 	char *buf = server->large_buf ? server->bigbuf : server->smallbuf;
 
 	return handle_read_data(server, mid, buf, server->pdu_size,
-				NULL, 0, 0, false);
+				NULL, 0, false);
 }
 
 static int
diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index 7e7909b1ae11..ebbea7526ee2 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -4118,11 +4118,7 @@ smb2_readv_callback(struct mid_q_entry *mid)
 	struct cifs_credits credits = { .value = 0, .instance = 0 };
 	struct smb_rqst rqst = { .rq_iov = &rdata->iov[1],
 				 .rq_nvec = 1,
-				 .rq_pages = rdata->pages,
-				 .rq_offset = rdata->page_offset,
-				 .rq_npages = rdata->nr_pages,
-				 .rq_pagesz = rdata->pagesz,
-				 .rq_tailsz = rdata->tailsz };
+				 .rq_iter = rdata->iter };
 
 	WARN_ONCE(rdata->server != mid->server,
 		  "rdata server %p != mid server %p",
@@ -4522,11 +4518,7 @@ smb2_async_writev(struct cifs_writedata *wdata,
 
 	rqst.rq_iov = iov;
 	rqst.rq_nvec = 1;
-	rqst.rq_pages = wdata->pages;
-	rqst.rq_offset = wdata->page_offset;
-	rqst.rq_npages = wdata->nr_pages;
-	rqst.rq_pagesz = wdata->pagesz;
-	rqst.rq_tailsz = wdata->tailsz;
+	rqst.rq_iter = wdata->iter;
 #ifdef CONFIG_CIFS_SMB_DIRECT
 	if (wdata->mr) {
 		iov[0].iov_len += sizeof(struct smbd_buffer_descriptor_v1);
diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index 8540f7c13eae..cb19c43c0009 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -276,26 +276,7 @@ smb_rqst_len(struct TCP_Server_Info *server, struct smb_rqst *rqst)
 	for (i = 0; i < nvec; i++)
 		buflen += iov[i].iov_len;
 
-	/*
-	 * Add in the page array if there is one. The caller needs to make
-	 * sure rq_offset and rq_tailsz are set correctly. If a buffer of
-	 * multiple pages ends at page boundary, rq_tailsz needs to be set to
-	 * PAGE_SIZE.
-	 */
-	if (rqst->rq_npages) {
-		if (rqst->rq_npages == 1)
-			buflen += rqst->rq_tailsz;
-		else {
-			/*
-			 * If there is more than one page, calculate the
-			 * buffer length based on rq_offset and rq_tailsz
-			 */
-			buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) -
-					rqst->rq_offset;
-			buflen += rqst->rq_tailsz;
-		}
-	}
-
+	buflen += iov_iter_count(&rqst->rq_iter);
 	return buflen;
 }
 
@@ -382,23 +363,15 @@ __smb_send_rqst(struct TCP_Server_Info *server, int num_rqst,
 
 		total_len += sent;
 
-		/* now walk the page array and send each page in it */
-		for (i = 0; i < rqst[j].rq_npages; i++) {
-			struct bio_vec bvec;
-
-			bvec.bv_page = rqst[j].rq_pages[i];
-			rqst_page_get_length(&rqst[j], i, &bvec.bv_len,
-					     &bvec.bv_offset);
-
-			iov_iter_bvec(&smb_msg.msg_iter, WRITE,
-				      &bvec, 1, bvec.bv_len);
+		if (iov_iter_count(&rqst[j].rq_iter) > 0) {
+			smb_msg.msg_iter = rqst[j].rq_iter;
 			rc = smb_send_kvec(server, &smb_msg, &sent);
 			if (rc < 0)
 				break;
-
 			total_len += sent;
 		}
-	}
+
+	}
 
 unmask:
 	sigprocmask(SIG_SETMASK, &oldmask, NULL);

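For illustration only (this is not part of the posted patch): a minimal
userspace sketch of the length arithmetic that smb_rqst_len() loses above.
It assumes a simplified stand-in, struct pagelist_model, for the removed
rq_npages/rq_pagesz/rq_offset/rq_tailsz fields; every name in it is
hypothetical.  The point is that the old page-list formula, given
consistent fields, just reproduces the byte count that the iterator now
carries directly via iov_iter_count().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the rq_* page-list fields the patch removes. */
struct pagelist_model {
	unsigned int npages;	/* models rq_npages */
	unsigned int pagesz;	/* models rq_pagesz */
	unsigned int offset;	/* models rq_offset (start within first page) */
	unsigned int tailsz;	/* models rq_tailsz (bytes used in last page) */
};

/* Describe @count bytes starting @offset bytes into the first page. */
static struct pagelist_model pagelist_from_range(unsigned int offset,
						 unsigned int count,
						 unsigned int pagesz)
{
	struct pagelist_model p = { 0, pagesz, offset, 0 };
	unsigned int end = offset + count;

	assert(pagesz > 0 && offset < pagesz);
	if (count == 0)
		return p;
	p.npages = (end + pagesz - 1) / pagesz;
	/* A buffer ending on a page boundary gets tailsz == pagesz. */
	p.tailsz = (p.npages == 1) ? count : end - (p.npages - 1) * pagesz;
	return p;
}

/* The arithmetic that smb_rqst_len() used before the patch. */
static size_t pagelist_len(const struct pagelist_model *p)
{
	if (p->npages == 0)
		return 0;
	if (p->npages == 1)
		return p->tailsz;
	return (size_t)p->pagesz * (p->npages - 1) - p->offset + p->tailsz;
}

/* Convenience wrapper: old-style length of a byte range. */
static size_t pagelist_len_for_range(unsigned int offset, unsigned int count,
				     unsigned int pagesz)
{
	struct pagelist_model p = pagelist_from_range(offset, count, pagesz);

	return pagelist_len(&p);
}
```

Calling pagelist_len_for_range(100, 10000, 4096) describes three pages and
returns the original 10000 bytes, which is the value an iterator set up
over the same range would report without any per-page bookkeeping.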

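Also for illustration only (not part of the posted patch): a userspace
sketch of the clamp-then-advance pattern that cifs_read_iter_from_socket()
uses above.  struct iter_model and struct source_model are hypothetical,
much-simplified stand-ins for struct iov_iter and the socket; the helper
names are likewise made up.  The read works on a clamped copy of the
iterator, and the caller's iterator is advanced only by the number of bytes
actually received.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-simplified stand-in for struct iov_iter. */
struct iter_model {
	char *buf;	/* next byte to fill */
	size_t count;	/* bytes of space remaining */
};

/* Hypothetical byte source standing in for the socket. */
struct source_model {
	const char *data;
	size_t avail;
};

/* Fill as much of @it as the source can supply; returns bytes copied. */
static long source_read(struct source_model *src, struct iter_model *it)
{
	size_t n = it->count < src->avail ? it->count : src->avail;

	memcpy(it->buf, src->data, n);
	src->data += n;
	src->avail -= n;
	return (long)n;
}

/*
 * Read at most @to_read bytes into @it.  Work on a clamped copy, then
 * advance the caller's iterator by what was actually received.
 */
static long read_iter_clamped(struct source_model *src, struct iter_model *it,
			      size_t to_read)
{
	struct iter_model tmp = *it;
	long ret;

	if (tmp.count > to_read)
		tmp.count = to_read;
	ret = source_read(src, &tmp);
	/* The source must never overrun the clamped window. */
	assert(ret < 0 || (size_t)ret <= tmp.count);
	if (ret > 0) {
		it->buf += ret;
		it->count -= (size_t)ret;
	}
	return ret;
}
```

With a 16-byte buffer and a 10-byte source, a call with to_read = 4 copies
four bytes and leaves 12 bytes of space in the caller's iterator; a second
call with a large to_read is limited by the source and copies the remaining
six.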


* [RFC PATCH 4/7] cifs: Make cifs_writepages() hand an iterator down
  2022-01-25 13:57 [RFC][RFC PATCH 0/7] cifs: In-progress conversion to use iov_iters and netfslib David Howells
                   ` (2 preceding siblings ...)
  2022-01-25 13:57 ` [RFC PATCH 3/7] cifs: Change the I/O paths to use an iterator rather than a page list David Howells
@ 2022-01-25 13:58 ` David Howells
  2022-01-25 13:58 ` [RFC PATCH 5/7] cifs: Make cifs_readahead() pass " David Howells
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: David Howells @ 2022-01-25 13:58 UTC (permalink / raw)
  To: smfrench, nspmangalore
  Cc: dhowells, jlayton, linux-cifs, linux-cachefs, linux-fsdevel


---

 fs/cifs/file.c |  725 +++++++++++++++++++++++---------------------------------
 1 file changed, 304 insertions(+), 421 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 24722fe75def..f40e5b938d43 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2262,294 +2262,334 @@ static int cifs_partialpagewrite(struct page *page, unsigned from, unsigned to)
 	return rc;
 }
 
-static struct cifs_writedata *
-wdata_alloc_and_fillpages(pgoff_t tofind, struct address_space *mapping,
-			  pgoff_t end, pgoff_t *index,
-			  unsigned int *found_pages)
+/*
+ * Extend the region to be written back to include subsequent contiguously
+ * dirty pages if possible, but don't sleep while doing so.
+ */
+static void cifs_extend_writeback(struct address_space *mapping,
+				  long *_count,
+				  loff_t start,
+				  loff_t max_len,
+				  unsigned int *_len)
 {
-	struct cifs_writedata *wdata;
-
-	wdata = cifs_writedata_alloc((unsigned int)tofind,
-				     cifs_writev_complete);
-	if (!wdata)
-		return NULL;
+	struct pagevec pvec;
+	struct folio *folio;
+	unsigned int psize;
+	loff_t len = *_len;
+	pgoff_t index = (start + len) / PAGE_SIZE;
+	bool stop = true;
+	unsigned int i;
 
-	*found_pages = find_get_pages_range_tag(mapping, index, end,
-				PAGECACHE_TAG_DIRTY, tofind, wdata->pages);
-	return wdata;
-}
+	XA_STATE(xas, &mapping->i_pages, index);
+	pagevec_init(&pvec);
 
-static unsigned int
-wdata_prepare_pages(struct cifs_writedata *wdata, unsigned int found_pages,
-		    struct address_space *mapping,
-		    struct writeback_control *wbc,
-		    pgoff_t end, pgoff_t *index, pgoff_t *next, bool *done)
-{
-	unsigned int nr_pages = 0, i;
-	struct page *page;
-
-	for (i = 0; i < found_pages; i++) {
-		page = wdata->pages[i];
-		/*
-		 * At this point we hold neither the i_pages lock nor the
-		 * page lock: the page may be truncated or invalidated
-		 * (changing page->mapping to NULL), or even swizzled
-		 * back from swapper_space to tmpfs file mapping
+	do {
+		/* Firstly, we gather up a batch of contiguous dirty pages
+		 * under the RCU read lock - but we can't clear the dirty flags
+		 * there if any of those pages are mapped.
 		 */
+		rcu_read_lock();
 
-		if (nr_pages == 0)
-			lock_page(page);
-		else if (!trylock_page(page))
-			break;
+		xas_for_each(&xas, folio, ULONG_MAX) {
+			stop = true;
+			if (xas_retry(&xas, folio))
+				continue;
+			if (xa_is_value(folio))
+				break;
+			if (folio_index(folio) != index)
+				break;
 
-		if (unlikely(page->mapping != mapping)) {
-			unlock_page(page);
-			break;
-		}
+			if (!folio_try_get_rcu(folio)) {
+				xas_reset(&xas);
+				continue;
+			}
 
-		if (!wbc->range_cyclic && page->index > end) {
-			*done = true;
-			unlock_page(page);
-			break;
-		}
+			/* Has the page moved or been split? */
+			if (unlikely(folio != xas_reload(&xas))) {
+				folio_put(folio);
+				break;
+			}
 
-		if (*next && (page->index != *next)) {
-			/* Not next consecutive page */
-			unlock_page(page);
-			break;
-		}
+			if (!folio_trylock(folio)) {
+				folio_put(folio);
+				break;
+			}
+			if (!folio_test_dirty(folio) || folio_test_writeback(folio)) {
+				folio_unlock(folio);
+				folio_put(folio);
+				break;
+			}
 
-		if (wbc->sync_mode != WB_SYNC_NONE)
-			wait_on_page_writeback(page);
+			psize = folio_size(folio);
+			len += psize;
+			/* Stop only once a limit has been reached. */
+			stop = len >= max_len || *_count <= 0;
 
-		if (PageWriteback(page) ||
-				!clear_page_dirty_for_io(page)) {
-			unlock_page(page);
-			break;
+			index += folio_nr_pages(folio);
+			if (!pagevec_add(&pvec, &folio->page))
+				break;
+			if (stop)
+				break;
 		}
 
-		/*
-		 * This actually clears the dirty bit in the radix tree.
-		 * See cifs_writepage() for more commentary.
+		if (!stop)
+			xas_pause(&xas);
+		rcu_read_unlock();
+
+		/* Now, if we obtained any pages, we can clear their dirty
+		 * flags and mark them as being under writeback.
 		 */
-		set_page_writeback(page);
-		if (page_offset(page) >= i_size_read(mapping->host)) {
-			*done = true;
-			unlock_page(page);
-			end_page_writeback(page);
+		if (!pagevec_count(&pvec))
 			break;
-		}
-
-		wdata->pages[i] = page;
-		*next = page->index + 1;
-		++nr_pages;
-	}
-
-	/* reset index to refind any pages skipped */
-	if (nr_pages == 0)
-		*index = wdata->pages[0]->index + 1;
 
-	/* put any pages we aren't going to use */
-	for (i = nr_pages; i < found_pages; i++) {
-		put_page(wdata->pages[i]);
-		wdata->pages[i] = NULL;
-	}
-
-	return nr_pages;
-}
+		for (i = 0; i < pagevec_count(&pvec); i++) {
+			folio = page_folio(pvec.pages[i]);
+			if (!folio_clear_dirty_for_io(folio))
+				BUG();
+			if (folio_start_writeback(folio))
+				BUG();
 
-static int
-wdata_send_pages(struct cifs_writedata *wdata, unsigned int nr_pages,
-		 struct address_space *mapping, struct writeback_control *wbc)
-{
-	int rc;
-
-	wdata->sync_mode = wbc->sync_mode;
-	wdata->nr_pages = nr_pages;
-	wdata->offset = page_offset(wdata->pages[0]);
-	wdata->pagesz = PAGE_SIZE;
-	wdata->tailsz = min(i_size_read(mapping->host) -
-			page_offset(wdata->pages[nr_pages - 1]),
-			(loff_t)PAGE_SIZE);
-	wdata->bytes = ((nr_pages - 1) * PAGE_SIZE) + wdata->tailsz;
-	wdata->pid = wdata->cfile->pid;
-
-	rc = adjust_credits(wdata->server, &wdata->credits, wdata->bytes);
-	if (rc)
-		return rc;
+			*_count -= folio_nr_pages(folio);
+			folio_unlock(folio);
+		}
 
-	if (wdata->cfile->invalidHandle)
-		rc = -EAGAIN;
-	else
-		rc = wdata->server->ops->async_writev(wdata,
-						      cifs_writedata_release);
+		pagevec_release(&pvec);
+		cond_resched();
+	} while (!stop);
 
-	return rc;
+	*_len = len;
 }
 
-static int cifs_writepages(struct address_space *mapping,
-			   struct writeback_control *wbc)
+/*
+ * Write back the locked page and any subsequent non-locked dirty pages.
+ */
+static ssize_t cifs_write_back_from_locked_folio(struct address_space *mapping,
+						 struct writeback_control *wbc,
+						 struct folio *folio,
+						 loff_t start, loff_t end)
 {
 	struct inode *inode = mapping->host;
-	struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
 	struct TCP_Server_Info *server;
-	bool done = false, scanned = false, range_whole = false;
-	pgoff_t end, index;
 	struct cifs_writedata *wdata;
+	struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
+	struct cifs_credits credits_on_stack;
+	struct cifs_credits *credits = &credits_on_stack;
 	struct cifsFileInfo *cfile = NULL;
-	int rc = 0;
-	int saved_rc = 0;
-	unsigned int xid;
+	unsigned int xid, wsize, len, max_len;
+	loff_t i_size = i_size_read(inode);
+	long count = wbc->nr_to_write;
+	int rc;
 
-	/*
-	 * If wsize is smaller than the page cache size, default to writing
-	 * one page at a time via cifs_writepage
-	 */
-	if (cifs_sb->ctx->wsize < PAGE_SIZE)
-		return generic_writepages(mapping, wbc);
+	if (folio_start_writeback(folio))
+		BUG();
+
+	count -= folio_nr_pages(folio);
+	len = folio_size(folio);
 
 	xid = get_xid();
-	if (wbc->range_cyclic) {
-		index = mapping->writeback_index; /* Start from prev offset */
-		end = -1;
-	} else {
-		index = wbc->range_start >> PAGE_SHIFT;
-		end = wbc->range_end >> PAGE_SHIFT;
-		if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
-			range_whole = true;
-		scanned = true;
-	}
 	server = cifs_pick_channel(cifs_sb_master_tcon(cifs_sb)->ses);
 
-retry:
-	while (!done && index <= end) {
-		unsigned int i, nr_pages, found_pages, wsize;
-		pgoff_t next = 0, tofind, saved_index = index;
-		struct cifs_credits credits_on_stack;
-		struct cifs_credits *credits = &credits_on_stack;
-		int get_file_rc = 0;
+	rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY, &cfile);
+	if (rc) {
+		cifs_dbg(VFS, "No writable handle in writepages rc=%d\n", rc);
+		goto err_xid;
+	}
 
-		if (cfile)
-			cifsFileInfo_put(cfile);
+	rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->wsize,
+					   &wsize, credits);
+	if (rc != 0)
+		goto err_close;
 
-		rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY, &cfile);
+	wdata = cifs_writedata_alloc(cifs_writev_complete);
+	if (!wdata) {
+		rc = -ENOMEM;
+		goto err_uncredit;
+	}
 
-		/* in case of an error store it to return later */
-		if (rc)
-			get_file_rc = rc;
+	wdata->sync_mode = wbc->sync_mode;
+	wdata->offset = folio_pos(folio);
+	wdata->pid = cfile->pid;
+	wdata->credits = credits_on_stack;
+	wdata->cfile = cfile;
+	wdata->server = server;
+	cfile = NULL;
+
+	/* Find all consecutive lockable dirty pages, stopping when we find a
+	 * page that is not immediately lockable, is not dirty or is missing,
+	 * or we reach the end of the range.
+	 */
+	if (start < i_size) {
+		/* Trim the write to the EOF; the extra data is ignored.  Also
+		 * put an upper limit on the size of a single storedata op.
+		 */
+		max_len = wsize;
+		max_len = min_t(unsigned long long, max_len, end - start + 1);
+		max_len = min_t(unsigned long long, max_len, i_size - start);
 
-		rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->wsize,
-						   &wsize, credits);
-		if (rc != 0) {
-			done = true;
-			break;
-		}
+		if (len < max_len)
+			cifs_extend_writeback(mapping, &count, start,
+					      max_len, &len);
+		len = min_t(loff_t, len, max_len);
+	}
 
-		tofind = min((wsize / PAGE_SIZE) - 1, end - index) + 1;
+	wdata->bytes = len;
 
-		wdata = wdata_alloc_and_fillpages(tofind, mapping, end, &index,
-						  &found_pages);
-		if (!wdata) {
-			rc = -ENOMEM;
-			done = true;
-			add_credits_and_wake_if(server, credits, 0);
-			break;
-		}
+	/* We now have a contiguous set of dirty pages, each with writeback
+	 * set; the first page is still locked at this point, but all the rest
+	 * have been unlocked.
+	 */
+	folio_unlock(folio);
 
-		if (found_pages == 0) {
-			kref_put(&wdata->refcount, cifs_writedata_release);
-			add_credits_and_wake_if(server, credits, 0);
-			break;
-		}
+	if (start < i_size) {
+		iov_iter_xarray(&wdata->iter, WRITE, &mapping->i_pages, start, len);
 
-		nr_pages = wdata_prepare_pages(wdata, found_pages, mapping, wbc,
-					       end, &index, &next, &done);
+		rc = adjust_credits(wdata->server, &wdata->credits, wdata->bytes);
+		if (rc)
+			goto err_wdata;
 
-		/* nothing to write? */
-		if (nr_pages == 0) {
-			kref_put(&wdata->refcount, cifs_writedata_release);
-			add_credits_and_wake_if(server, credits, 0);
-			continue;
-		}
+		if (wdata->cfile->invalidHandle)
+			rc = -EAGAIN;
+		else
+			rc = wdata->server->ops->async_writev(wdata,
+							      cifs_writedata_release);
+	} else {
+		/* The dirty region was entirely beyond the EOF. */
+		cifs_pages_written_back(inode, start, len);
+		rc = 0;
+	}
 
-		wdata->credits = credits_on_stack;
-		wdata->cfile = cfile;
-		wdata->server = server;
-		cfile = NULL;
+err_wdata:
+	kref_put(&wdata->refcount, cifs_writedata_release);
+err_uncredit:
+	add_credits_and_wake_if(server, credits, 0);
+err_close:
+	if (cfile)
+		cifsFileInfo_put(cfile);
+err_xid:
+	free_xid(xid);
+	if (rc == 0) {
+		wbc->nr_to_write = count;
+	} else if (is_retryable_error(rc)) {
+		cifs_pages_write_redirty(inode, start, len);
+	} else {
+		cifs_pages_write_failed(inode, start, len);
+		mapping_set_error(mapping, rc);
+	}
+	/* Indication to update ctime and mtime as close is deferred */
+	set_bit(CIFS_INO_MODIFIED_ATTR, &CIFS_I(inode)->flags);
+	return rc;
+}
 
-		if (!wdata->cfile) {
-			cifs_dbg(VFS, "No writable handle in writepages rc=%d\n",
-				 get_file_rc);
-			if (is_retryable_error(get_file_rc))
-				rc = get_file_rc;
-			else
-				rc = -EBADF;
-		} else
-			rc = wdata_send_pages(wdata, nr_pages, mapping, wbc);
+/*
+ * Write a region of pages back to the server.
+ */
+static int cifs_writepages_region(struct address_space *mapping,
+				  struct writeback_control *wbc,
+				  loff_t start, loff_t end, loff_t *_next)
+{
+	struct folio *folio;
+	struct page *head_page;
+	ssize_t ret;
+	int n;
 
-		for (i = 0; i < nr_pages; ++i)
-			unlock_page(wdata->pages[i]);
+	do {
+		pgoff_t index = start / PAGE_SIZE;
 
-		/* send failure -- clean up the mess */
-		if (rc != 0) {
-			add_credits_and_wake_if(server, &wdata->credits, 0);
-			for (i = 0; i < nr_pages; ++i) {
-				if (is_retryable_error(rc))
-					redirty_page_for_writepage(wbc,
-							   wdata->pages[i]);
-				else
-					SetPageError(wdata->pages[i]);
-				end_page_writeback(wdata->pages[i]);
-				put_page(wdata->pages[i]);
+		n = find_get_pages_range_tag(mapping, &index, end / PAGE_SIZE,
+					     PAGECACHE_TAG_DIRTY, 1, &head_page);
+		if (!n)
+			break;
+
+		folio = page_folio(head_page);
+		start = folio_pos(folio); /* May regress with THPs */
+
+		/* At this point we hold neither the i_pages lock nor the
+		 * page lock: the page may be truncated or invalidated
+		 * (changing page->mapping to NULL), or even swizzled
+		 * back from swapper_space to tmpfs file mapping
+		 */
+		if (wbc->sync_mode != WB_SYNC_NONE) {
+			ret = folio_lock_killable(folio);
+			if (ret < 0) {
+				folio_put(folio);
+				return ret;
+			}
+		} else {
+			if (!folio_trylock(folio)) {
+				folio_put(folio);
+				return 0;
 			}
-			if (!is_retryable_error(rc))
-				mapping_set_error(mapping, rc);
 		}
-		kref_put(&wdata->refcount, cifs_writedata_release);
 
-		if (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN) {
-			index = saved_index;
+		if (folio_mapping(folio) != mapping ||
+		    !folio_test_dirty(folio)) {
+			start += folio_size(folio);
+			folio_unlock(folio);
+			folio_put(folio);
 			continue;
 		}
 
-		/* Return immediately if we received a signal during writing */
-		if (is_interrupt_error(rc)) {
-			done = true;
-			break;
+		if (folio_test_writeback(folio)) {
+			folio_unlock(folio);
+			if (wbc->sync_mode != WB_SYNC_NONE)
+				folio_wait_writeback(folio);
+			folio_put(folio);
+			continue;
 		}
 
-		if (rc != 0 && saved_rc == 0)
-			saved_rc = rc;
+		if (!folio_clear_dirty_for_io(folio))
+			BUG();
 
-		wbc->nr_to_write -= nr_pages;
-		if (wbc->nr_to_write <= 0)
-			done = true;
+		ret = cifs_write_back_from_locked_folio(mapping, wbc, folio, start, end);
+		folio_put(folio);
+		if (ret < 0)
+			return ret;
 
-		index = next;
-	}
+		start += ret;
+		cond_resched();
+	} while (wbc->nr_to_write > 0);
 
-	if (!scanned && !done) {
-		/*
-		 * We hit the last page and there is more work to be done: wrap
-		 * back to the start of the file
-		 */
-		scanned = true;
-		index = 0;
-		goto retry;
-	}
+	*_next = start;
+	return 0;
+}
 
-	if (saved_rc != 0)
-		rc = saved_rc;
+/*
+ * Write some of the pending data back to the server
+ */
+static int cifs_writepages(struct address_space *mapping,
+			   struct writeback_control *wbc)
+{
+	loff_t start, next;
+	int ret;
 
-	if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0))
-		mapping->writeback_index = index;
+	/* We have to be careful as we can end up racing with setattr()
+	 * truncating the pagecache since the caller doesn't take a lock here
+	 * to prevent it.
+	 */
 
-	if (cfile)
-		cifsFileInfo_put(cfile);
-	free_xid(xid);
-	/* Indication to update ctime and mtime as close is deferred */
-	set_bit(CIFS_INO_MODIFIED_ATTR, &CIFS_I(inode)->flags);
-	return rc;
+	if (wbc->range_cyclic) {
+		start = mapping->writeback_index * PAGE_SIZE;
+		ret = cifs_writepages_region(mapping, wbc, start, LLONG_MAX, &next);
+		if (ret == 0) {
+			mapping->writeback_index = next / PAGE_SIZE;
+			if (start > 0 && wbc->nr_to_write > 0) {
+				ret = cifs_writepages_region(mapping, wbc, 0,
+							     start, &next);
+				if (ret == 0)
+					mapping->writeback_index =
+						next / PAGE_SIZE;
+			}
+		}
+	} else if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) {
+		ret = cifs_writepages_region(mapping, wbc, 0, LLONG_MAX, &next);
+		if (wbc->nr_to_write > 0 && ret == 0)
+			mapping->writeback_index = next / PAGE_SIZE;
+	} else {
+		ret = cifs_writepages_region(mapping, wbc,
+					     wbc->range_start, wbc->range_end, &next);
+	}
+
+	return ret;
 }
 
 static int
@@ -2608,6 +2648,7 @@ static int cifs_write_end(struct file *file, struct address_space *mapping,
 	struct inode *inode = mapping->host;
 	struct cifsFileInfo *cfile = file->private_data;
 	struct cifs_sb_info *cifs_sb = CIFS_SB(cfile->dentry->d_sb);
+	struct folio *folio = page_folio(page);
 	__u32 pid;
 
 	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
@@ -2618,14 +2659,14 @@ static int cifs_write_end(struct file *file, struct address_space *mapping,
 	cifs_dbg(FYI, "write_end for page %p from pos %lld with %d bytes\n",
 		 page, pos, copied);
 
-	if (PageChecked(page)) {
+	if (folio_test_checked(folio)) {
 		if (copied == len)
-			SetPageUptodate(page);
-		ClearPageChecked(page);
-	} else if (!PageUptodate(page) && copied == PAGE_SIZE)
-		SetPageUptodate(page);
+			folio_mark_uptodate(folio);
+		folio_clear_checked(folio);
+	} else if (!folio_test_uptodate(folio) && copied == PAGE_SIZE)
+		folio_mark_uptodate(folio);
 
-	if (!PageUptodate(page)) {
+	if (!folio_test_uptodate(folio)) {
 		char *page_data;
 		unsigned offset = pos & (PAGE_SIZE - 1);
 		unsigned int xid;
@@ -2782,57 +2823,13 @@ int cifs_flush(struct file *file, fl_owner_t id)
 	return rc;
 }
 
-static int
-cifs_write_allocate_pages(struct page **pages, unsigned long num_pages)
-{
-	int rc = 0;
-	unsigned long i;
-
-	for (i = 0; i < num_pages; i++) {
-		pages[i] = alloc_page(GFP_KERNEL|__GFP_HIGHMEM);
-		if (!pages[i]) {
-			/*
-			 * save number of pages we have already allocated and
-			 * return with ENOMEM error
-			 */
-			num_pages = i;
-			rc = -ENOMEM;
-			break;
-		}
-	}
-
-	if (rc) {
-		for (i = 0; i < num_pages; i++)
-			put_page(pages[i]);
-	}
-	return rc;
-}
-
-static inline
-size_t get_numpages(const size_t wsize, const size_t len, size_t *cur_len)
-{
-	size_t num_pages;
-	size_t clen;
-
-	clen = min_t(const size_t, len, wsize);
-	num_pages = DIV_ROUND_UP(clen, PAGE_SIZE);
-
-	if (cur_len)
-		*cur_len = clen;
-
-	return num_pages;
-}
-
 static void
 cifs_uncached_writedata_release(struct kref *refcount)
 {
-	int i;
 	struct cifs_writedata *wdata = container_of(refcount,
 					struct cifs_writedata, refcount);
 
 	kref_put(&wdata->ctx->refcount, cifs_aio_ctx_release);
-	for (i = 0; i < wdata->nr_pages; i++)
-		put_page(wdata->pages[i]);
 	cifs_writedata_release(refcount);
 }
 
@@ -2858,48 +2855,6 @@ cifs_uncached_writev_complete(struct work_struct *work)
 	kref_put(&wdata->refcount, cifs_uncached_writedata_release);
 }
 
-static int
-wdata_fill_from_iovec(struct cifs_writedata *wdata, struct iov_iter *from,
-		      size_t *len, unsigned long *num_pages)
-{
-	size_t save_len, copied, bytes, cur_len = *len;
-	unsigned long i, nr_pages = *num_pages;
-
-	save_len = cur_len;
-	for (i = 0; i < nr_pages; i++) {
-		bytes = min_t(const size_t, cur_len, PAGE_SIZE);
-		copied = copy_page_from_iter(wdata->pages[i], 0, bytes, from);
-		cur_len -= copied;
-		/*
-		 * If we didn't copy as much as we expected, then that
-		 * may mean we trod into an unmapped area. Stop copying
-		 * at that point. On the next pass through the big
-		 * loop, we'll likely end up getting a zero-length
-		 * write and bailing out of it.
-		 */
-		if (copied < bytes)
-			break;
-	}
-	cur_len = save_len - cur_len;
-	*len = cur_len;
-
-	/*
-	 * If we have no data to send, then that probably means that
-	 * the copy above failed altogether. That's most likely because
-	 * the address in the iovec was bogus. Return -EFAULT and let
-	 * the caller free anything we allocated and bail out.
-	 */
-	if (!cur_len)
-		return -EFAULT;
-
-	/*
-	 * i + 1 now represents the number of pages we actually used in
-	 * the copy phase above.
-	 */
-	*num_pages = i + 1;
-	return 0;
-}
-
 static int
 cifs_resend_wdata(struct cifs_writedata *wdata, struct list_head *wdata_list,
 	struct cifs_aio_ctx *ctx)
@@ -2978,14 +2933,11 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
 {
 	int rc = 0;
 	size_t cur_len;
-	unsigned long nr_pages, num_pages, i;
 	struct cifs_writedata *wdata;
 	struct iov_iter saved_from = *from;
 	loff_t saved_offset = offset;
 	pid_t pid;
 	struct TCP_Server_Info *server;
-	struct page **pagevec;
-	size_t start;
 	unsigned int xid;
 
 	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
@@ -3016,93 +2968,22 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
 
 		cur_len = min_t(const size_t, len, wsize);
 
-		if (ctx->direct_io) {
-			ssize_t result;
-
-			result = iov_iter_get_pages_alloc(
-				from, &pagevec, cur_len, &start);
-			if (result < 0) {
-				cifs_dbg(VFS,
-					 "direct_writev couldn't get user pages (rc=%zd) iter type %d iov_offset %zd count %zd\n",
-					 result, iov_iter_type(from),
-					 from->iov_offset, from->count);
-				dump_stack();
-
-				rc = result;
-				add_credits_and_wake_if(server, credits, 0);
-				break;
-			}
-			cur_len = (size_t)result;
-			iov_iter_advance(from, cur_len);
-
-			nr_pages =
-				(cur_len + start + PAGE_SIZE - 1) / PAGE_SIZE;
-
-			wdata = cifs_writedata_direct_alloc(pagevec,
-					     cifs_uncached_writev_complete);
-			if (!wdata) {
-				rc = -ENOMEM;
-				add_credits_and_wake_if(server, credits, 0);
-				break;
-			}
-
-
-			wdata->page_offset = start;
-			wdata->tailsz =
-				nr_pages > 1 ?
-					cur_len - (PAGE_SIZE - start) -
-					(nr_pages - 2) * PAGE_SIZE :
-					cur_len;
-		} else {
-			nr_pages = get_numpages(wsize, len, &cur_len);
-			wdata = cifs_writedata_alloc(nr_pages,
-					     cifs_uncached_writev_complete);
-			if (!wdata) {
-				rc = -ENOMEM;
-				add_credits_and_wake_if(server, credits, 0);
-				break;
-			}
-
-			rc = cifs_write_allocate_pages(wdata->pages, nr_pages);
-			if (rc) {
-				kvfree(wdata->pages);
-				kfree(wdata);
-				add_credits_and_wake_if(server, credits, 0);
-				break;
-			}
-
-			num_pages = nr_pages;
-			rc = wdata_fill_from_iovec(
-				wdata, from, &cur_len, &num_pages);
-			if (rc) {
-				for (i = 0; i < nr_pages; i++)
-					put_page(wdata->pages[i]);
-				kvfree(wdata->pages);
-				kfree(wdata);
-				add_credits_and_wake_if(server, credits, 0);
-				break;
-			}
-
-			/*
-			 * Bring nr_pages down to the number of pages we
-			 * actually used, and free any pages that we didn't use.
-			 */
-			for ( ; nr_pages > num_pages; nr_pages--)
-				put_page(wdata->pages[nr_pages - 1]);
-
-			wdata->tailsz = cur_len - ((nr_pages - 1) * PAGE_SIZE);
+		wdata = cifs_writedata_alloc(cifs_uncached_writev_complete);
+		if (!wdata) {
+			rc = -ENOMEM;
+			add_credits_and_wake_if(server, credits, 0);
+			break;
 		}
 
 		wdata->sync_mode = WB_SYNC_ALL;
-		wdata->nr_pages = nr_pages;
-		wdata->offset = (__u64)offset;
-		wdata->cfile = cifsFileInfo_get(open_file);
-		wdata->server = server;
-		wdata->pid = pid;
-		wdata->bytes = cur_len;
-		wdata->pagesz = PAGE_SIZE;
-		wdata->credits = credits_on_stack;
-		wdata->ctx = ctx;
+		wdata->offset	= (__u64)offset;
+		wdata->cfile	= cifsFileInfo_get(open_file);
+		wdata->server	= server;
+		wdata->pid	= pid;
+		wdata->bytes	= cur_len;
+		wdata->credits	= credits_on_stack;
+		wdata->iter	= *from;
+		wdata->ctx	= ctx;
 		kref_get(&ctx->refcount);
 
 		rc = adjust_credits(server, &wdata->credits, wdata->bytes);
@@ -3228,7 +3109,6 @@ static ssize_t __cifs_writev(
 	struct cifs_sb_info *cifs_sb;
 	struct cifs_aio_ctx *ctx;
 	struct iov_iter saved_from = *from;
-	size_t len = iov_iter_count(from);
 	int rc;
 
 	/*
@@ -3262,18 +3142,21 @@ static ssize_t __cifs_writev(
 		ctx->iocb = iocb;
 
 	ctx->pos = iocb->ki_pos;
+	ctx->direct_io = direct;
 
-	if (direct) {
-		ctx->direct_io = true;
-		ctx->iter = *from;
-		ctx->len = len;
-	} else {
-		rc = setup_aio_ctx_iter(ctx, from, WRITE);
-		if (rc) {
-			kref_put(&ctx->refcount, cifs_aio_ctx_release);
-			return rc;
-		}
+	/*
+	 * Duplicate the iterator as it may contain references to the calling
+	 * process's virtual memory layout which won't be available in an async
+	 * worker thread.  This also takes a ref on every folio involved and
+	 * attaches them to ctx->bv[].
+	 */
+	rc = extract_iter_to_iter(from, iov_iter_count(from), &ctx->iter, &ctx->bv);
+	if (rc < 0) {
+		kref_put(&ctx->refcount, cifs_aio_ctx_release);
+		return rc;
 	}
+	ctx->npages = rc;
+	ctx->len = iov_iter_count(&ctx->iter);
 
 	/* grab a lock here due to read response handlers can access ctx */
 	mutex_lock(&ctx->aio_mutex);
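
Note on the length calculation in cifs_write_back_from_locked_folio() above:
the size of a single write is bounded by three things: the wsize granted by
wait_mtu_credits(), the end of the requested writeback range, and the current
i_size.  The following is a minimal userspace sketch of that clamping, not the
kernel code.  clamp_write_len() and min_u64() are hypothetical names introduced
for illustration only, and the sketch assumes end >= start.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model; these names are not kernel API. */
static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Upper bound on one write starting at byte 'start', given the granted
 * wsize, the inclusive end of the writeback range and the file size.
 * Returns 0 when 'start' is at or beyond EOF, i.e. nothing needs sending.
 */
static uint64_t clamp_write_len(uint64_t start, uint64_t end,
				uint64_t i_size, uint64_t wsize)
{
	uint64_t max_len;

	if (start >= i_size)
		return 0;

	max_len = wsize;
	max_len = min_u64(max_len, end - start + 1);	/* stay in range */
	max_len = min_u64(max_len, i_size - start);	/* trim to EOF */
	return max_len;
}
```

With a 10000-byte file and a 64KiB wsize, a write starting at offset 0 is
limited to 10000 bytes.  A start at or past EOF yields 0, which corresponds to
the branch in the patch that only calls cifs_pages_written_back().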




* [RFC PATCH 5/7] cifs: Make cifs_readahead() pass an iterator down
  2022-01-25 13:57 [RFC][RFC PATCH 0/7] cifs: In-progress conversion to use iov_iters and netfslib David Howells
                   ` (3 preceding siblings ...)
  2022-01-25 13:58 ` [RFC PATCH 4/7] cifs: Make cifs_writepages() hand an iterator down David Howells
@ 2022-01-25 13:58 ` David Howells
  2022-01-25 13:58 ` [RFC PATCH 6/7] cifs: Get direct I/O and unbuffered I/O working with iterators David Howells
  2022-01-25 13:59 ` [RFC PATCH 7/7] cifs: Use netfslib to handle reads David Howells
  6 siblings, 0 replies; 20+ messages in thread
From: David Howells @ 2022-01-25 13:58 UTC (permalink / raw)
  To: smfrench, nspmangalore
  Cc: dhowells, jlayton, linux-cifs, linux-cachefs, linux-fsdevel


---

 fs/cifs/file.c |  214 +++++++++++++-------------------------------------------
 1 file changed, 50 insertions(+), 164 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index f40e5b938d43..b57f9b492227 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3301,14 +3301,12 @@ cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from)
 	return written;
 }
 
-static struct cifs_readdata *
-cifs_readdata_direct_alloc(struct page **pages, work_func_t complete)
+static struct cifs_readdata *cifs_readdata_alloc(work_func_t complete)
 {
 	struct cifs_readdata *rdata;
 
 	rdata = kzalloc(sizeof(*rdata), GFP_KERNEL);
-	if (rdata != NULL) {
-		rdata->pages = pages;
+	if (rdata) {
 		kref_init(&rdata->refcount);
 		INIT_LIST_HEAD(&rdata->list);
 		init_completion(&rdata->done);
@@ -3318,22 +3316,6 @@ cifs_readdata_direct_alloc(struct page **pages, work_func_t complete)
 	return rdata;
 }
 
-static struct cifs_readdata *
-cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
-{
-	struct page **pages =
-		kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL);
-	struct cifs_readdata *ret = NULL;
-
-	if (pages) {
-		ret = cifs_readdata_direct_alloc(pages, complete);
-		if (!ret)
-			kfree(pages);
-	}
-
-	return ret;
-}
-
 void
 cifs_readdata_release(struct kref *refcount)
 {
@@ -4147,145 +4129,60 @@ int cifs_file_mmap(struct file *file, struct vm_area_struct *vma)
 	return rc;
 }
 
-static void
-cifs_readv_complete(struct work_struct *work)
+/*
+ * Unlock a bunch of folios in the pagecache.
+ */
+static void cifs_unlock_folios(struct address_space *mapping, pgoff_t first, pgoff_t last)
 {
-	unsigned int i, got_bytes;
-	struct cifs_readdata *rdata = container_of(work,
-						struct cifs_readdata, work);
-
-	got_bytes = rdata->got_bytes;
-	for (i = 0; i < rdata->nr_pages; i++) {
-		struct page *page = rdata->pages[i];
-
-		if (rdata->result == 0 ||
-		    (rdata->result == -EAGAIN && got_bytes)) {
-			flush_dcache_page(page);
-			SetPageUptodate(page);
-		} else
-			SetPageError(page);
-
-		unlock_page(page);
-
-		if (rdata->result == 0 ||
-		    (rdata->result == -EAGAIN && got_bytes))
-			cifs_readpage_to_fscache(rdata->mapping->host, page);
-
-		got_bytes -= min_t(unsigned int, PAGE_SIZE, got_bytes);
-
-		put_page(page);
-		rdata->pages[i] = NULL;
-	}
-	kref_put(&rdata->refcount, cifs_readdata_release);
+	struct folio *folio;
+	XA_STATE(xas, &mapping->i_pages, first);
+
+	rcu_read_lock();
+	xas_for_each(&xas, folio, last) {
+		folio_unlock(folio);
+	}
+	rcu_read_unlock();
 }
 
-static int
-readpages_fill_pages(struct TCP_Server_Info *server,
-		     struct cifs_readdata *rdata, struct iov_iter *iter,
-		     unsigned int len)
+static void cifs_readahead_complete(struct work_struct *work)
 {
-	int result = 0;
-	unsigned int i;
-	u64 eof;
-	pgoff_t eof_index;
-	unsigned int nr_pages = rdata->nr_pages;
-	unsigned int page_offset = rdata->page_offset;
+	struct cifs_readdata *rdata = container_of(work,
+						   struct cifs_readdata, work);
+	struct folio *folio;
+	pgoff_t last;
+	bool good = rdata->result == 0 || (rdata->result == -EAGAIN && rdata->got_bytes);
 
-	/* determine the eof that the server (probably) has */
-	eof = CIFS_I(rdata->mapping->host)->server_eof;
-	eof_index = eof ? (eof - 1) >> PAGE_SHIFT : 0;
-	cifs_dbg(FYI, "eof=%llu eof_index=%lu\n", eof, eof_index);
+	XA_STATE(xas, &rdata->mapping->i_pages, rdata->offset / PAGE_SIZE);
 
-	rdata->got_bytes = 0;
-	rdata->tailsz = PAGE_SIZE;
-	for (i = 0; i < nr_pages; i++) {
-		struct page *page = rdata->pages[i];
-		unsigned int to_read = rdata->pagesz;
-		size_t n;
+#if 0
+	if (good)
+		cifs_readpage_to_fscache(rdata->mapping->host, page);
+#endif
 
-		if (i == 0)
-			to_read -= page_offset;
-		else
-			page_offset = 0;
+	if (iov_iter_count(&rdata->iter) > 0)
+		iov_iter_zero(iov_iter_count(&rdata->iter), &rdata->iter);
 
-		n = to_read;
+	last = (rdata->offset + rdata->bytes - 1) / PAGE_SIZE;
 
-		if (len >= to_read) {
-			len -= to_read;
-		} else if (len > 0) {
-			/* enough for partial page, fill and zero the rest */
-			zero_user(page, len + page_offset, to_read - len);
-			n = rdata->tailsz = len;
-			len = 0;
-		} else if (page->index > eof_index) {
-			/*
-			 * The VFS will not try to do readahead past the
-			 * i_size, but it's possible that we have outstanding
-			 * writes with gaps in the middle and the i_size hasn't
-			 * caught up yet. Populate those with zeroed out pages
-			 * to prevent the VFS from repeatedly attempting to
-			 * fill them until the writes are flushed.
-			 */
-			zero_user(page, 0, PAGE_SIZE);
-			flush_dcache_page(page);
-			SetPageUptodate(page);
-			unlock_page(page);
-			put_page(page);
-			rdata->pages[i] = NULL;
-			rdata->nr_pages--;
-			continue;
-		} else {
-			/* no need to hold page hostage */
-			unlock_page(page);
-			put_page(page);
-			rdata->pages[i] = NULL;
-			rdata->nr_pages--;
-			continue;
+	xas_for_each(&xas, folio, last) {
+		if (good) {
+			flush_dcache_folio(folio);
+			folio_mark_uptodate(folio);
 		}
-
-		if (iter)
-			result = copy_page_from_iter(
-					page, page_offset, n, iter);
-#ifdef CONFIG_CIFS_SMB_DIRECT
-		else if (rdata->mr)
-			result = n;
-#endif
-		else
-			result = cifs_read_page_from_socket(
-					server, page, page_offset, n);
-		if (result < 0)
-			break;
-
-		rdata->got_bytes += result;
+		folio_unlock(folio);
 	}
 
-	return rdata->got_bytes > 0 && result != -ECONNABORTED ?
-						rdata->got_bytes : result;
-}
-
-static int
-cifs_readpages_read_into_pages(struct TCP_Server_Info *server,
-			       struct cifs_readdata *rdata, unsigned int len)
-{
-	return readpages_fill_pages(server, rdata, NULL, len);
-}
-
-static int
-cifs_readpages_copy_into_pages(struct TCP_Server_Info *server,
-			       struct cifs_readdata *rdata,
-			       struct iov_iter *iter)
-{
-	return readpages_fill_pages(server, rdata, iter, iter->count);
+	kref_put(&rdata->refcount, cifs_readdata_release);
 }
 
 static void cifs_readahead(struct readahead_control *ractl)
 {
-	int rc;
 	struct cifsFileInfo *open_file = ractl->file->private_data;
 	struct cifs_sb_info *cifs_sb = CIFS_FILE_SB(ractl->file);
 	struct TCP_Server_Info *server;
-	pid_t pid;
 	unsigned int xid;
+	pid_t pid;
+	int rc = 0;
 
 	xid = get_xid();
 
@@ -4294,7 +4191,6 @@ static void cifs_readahead(struct readahead_control *ractl)
 	else
 		pid = current->tgid;
 
-	rc = 0;
 	server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses);
 
 	cifs_dbg(FYI, "%s: file=%p mapping=%p num_pages=%u\n",
@@ -4304,8 +4200,7 @@ static void cifs_readahead(struct readahead_control *ractl)
 	 * Chop the readahead request up into rsize-sized read requests.
 	 */
 	while (readahead_count(ractl) - ractl->_batch_count) {
-		unsigned int i, nr_pages, got, rsize;
-		struct page *page;
+		unsigned int i, nr_pages, rsize;
 		struct cifs_readdata *rdata;
 		struct cifs_credits credits_on_stack;
 		struct cifs_credits *credits = &credits_on_stack;
@@ -4336,33 +4231,28 @@ static void cifs_readahead(struct readahead_control *ractl)
 			break;
 		}
 
-		rdata = cifs_readdata_alloc(nr_pages, cifs_readv_complete);
+		rdata = cifs_readdata_alloc(cifs_readahead_complete);
 		if (!rdata) {
 			/* best to give up if we're out of mem */
 			add_credits_and_wake_if(server, credits, 0);
 			break;
 		}
 
-		got = __readahead_batch(ractl, rdata->pages, nr_pages);
-		if (got != nr_pages) {
-			pr_warn("__readahead_batch() returned %u/%u\n",
-				got, nr_pages);
-			nr_pages = got;
-		}
-
-		rdata->nr_pages = nr_pages;
-		rdata->bytes	= readahead_batch_length(ractl);
+		rdata->offset	= readahead_pos(ractl);
+		rdata->bytes	= nr_pages * PAGE_SIZE;
 		rdata->cfile	= cifsFileInfo_get(open_file);
 		rdata->server	= server;
 		rdata->mapping	= ractl->mapping;
-		rdata->offset	= readahead_pos(ractl);
 		rdata->pid	= pid;
-		rdata->pagesz	= PAGE_SIZE;
-		rdata->tailsz	= PAGE_SIZE;
-		rdata->read_into_pages = cifs_readpages_read_into_pages;
-		rdata->copy_into_pages = cifs_readpages_copy_into_pages;
 		rdata->credits	= credits_on_stack;
 
+		for (i = 0; i < nr_pages; i++)
+			if (!readahead_folio(ractl))
+				BUG();
+
+		iov_iter_xarray(&rdata->iter, READ, &rdata->mapping->i_pages,
+				rdata->offset, rdata->bytes);
+
 		rc = adjust_credits(server, &rdata->credits, rdata->bytes);
 		if (!rc) {
 			if (rdata->cfile->invalidHandle)
@@ -4373,17 +4263,13 @@ static void cifs_readahead(struct readahead_control *ractl)
 
 		if (rc) {
 			add_credits_and_wake_if(server, &rdata->credits, 0);
-			for (i = 0; i < rdata->nr_pages; i++) {
-				page = rdata->pages[i];
-				unlock_page(page);
-				put_page(page);
-			}
+			cifs_unlock_folios(rdata->mapping,
+					   rdata->offset / PAGE_SIZE,
+					   (rdata->offset + rdata->bytes - 1) / PAGE_SIZE);
 			/* Fallback to the readpage in error/reconnect cases */
 			kref_put(&rdata->refcount, cifs_readdata_release);
 			break;
 		}
-
-		kref_put(&rdata->refcount, cifs_readdata_release);
 	}
 
 	free_xid(xid);
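
Note on the page ranges used above: cifs_unlock_folios() and the xarray walk
take page indices rather than byte offsets, so a byte range has to be converted
into an inclusive [first, last] index pair, as the error path in
cifs_readahead() does.  A minimal userspace sketch of that conversion follows.
struct page_range, bytes_to_pages() and SKETCH_PAGE_SIZE are hypothetical names,
a 4096-byte page is assumed, and 'bytes' must be non-zero.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed page size, illustration only */

struct page_range {
	uint64_t first;	/* index of the first page touched */
	uint64_t last;	/* index of the last page touched (inclusive) */
};

/* Map a non-empty byte range onto the inclusive range of page indices. */
static struct page_range bytes_to_pages(uint64_t offset, uint64_t bytes)
{
	struct page_range r;

	r.first = offset / SKETCH_PAGE_SIZE;
	r.last = (offset + bytes - 1) / SKETCH_PAGE_SIZE;
	return r;
}
```

A 16384-byte request at offset 8192 covers page indices 2 to 5.  Note that the
upper bound is a division by the page size, not a rounding of the byte offset.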




* [RFC PATCH 6/7] cifs: Get direct I/O and unbuffered I/O working with iterators
  2022-01-25 13:57 [RFC][RFC PATCH 0/7] cifs: In-progress conversion to use iov_iters and netfslib David Howells
                   ` (4 preceding siblings ...)
  2022-01-25 13:58 ` [RFC PATCH 5/7] cifs: Make cifs_readahead() pass " David Howells
@ 2022-01-25 13:58 ` David Howells
  2022-01-25 13:59 ` [RFC PATCH 7/7] cifs: Use netfslib to handle reads David Howells
  6 siblings, 0 replies; 20+ messages in thread
From: David Howells @ 2022-01-25 13:58 UTC (permalink / raw)
  To: smfrench, nspmangalore
  Cc: dhowells, jlayton, linux-cifs, linux-cachefs, linux-fsdevel


---

 fs/cifs/cifsproto.h |    1 
 fs/cifs/file.c      |  299 ++++++---------------------------------------------
 fs/cifs/misc.c      |   90 ---------------
 3 files changed, 35 insertions(+), 355 deletions(-)

diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index 1b143f0a03c0..fb6bcda46266 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -595,7 +595,6 @@ enum securityEnum cifs_select_sectype(struct TCP_Server_Info *,
 					enum securityEnum);
 struct cifs_aio_ctx *cifs_aio_ctx_alloc(void);
 void cifs_aio_ctx_release(struct kref *refcount);
-int setup_aio_ctx_iter(struct cifs_aio_ctx *ctx, struct iov_iter *iter, int rw);
 void smb2_cached_lease_break(struct work_struct *work);
 
 int cifs_alloc_hash(const char *name, struct crypto_shash **shash,
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index b57f9b492227..f9b9a1562e17 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3321,6 +3321,9 @@ cifs_readdata_release(struct kref *refcount)
 {
 	struct cifs_readdata *rdata = container_of(refcount,
 					struct cifs_readdata, refcount);
+
+	if (rdata->ctx)
+		kref_put(&rdata->ctx->refcount, cifs_aio_ctx_release);
 #ifdef CONFIG_CIFS_SMB_DIRECT
 	if (rdata->mr) {
 		smbd_deregister_mr(rdata->mr);
@@ -3330,85 +3333,9 @@ cifs_readdata_release(struct kref *refcount)
 	if (rdata->cfile)
 		cifsFileInfo_put(rdata->cfile);
 
-	kvfree(rdata->pages);
 	kfree(rdata);
 }
 
-static int
-cifs_read_allocate_pages(struct cifs_readdata *rdata, unsigned int nr_pages)
-{
-	int rc = 0;
-	struct page *page;
-	unsigned int i;
-
-	for (i = 0; i < nr_pages; i++) {
-		page = alloc_page(GFP_KERNEL|__GFP_HIGHMEM);
-		if (!page) {
-			rc = -ENOMEM;
-			break;
-		}
-		rdata->pages[i] = page;
-	}
-
-	if (rc) {
-		unsigned int nr_page_failed = i;
-
-		for (i = 0; i < nr_page_failed; i++) {
-			put_page(rdata->pages[i]);
-			rdata->pages[i] = NULL;
-		}
-	}
-	return rc;
-}
-
-static void
-cifs_uncached_readdata_release(struct kref *refcount)
-{
-	struct cifs_readdata *rdata = container_of(refcount,
-					struct cifs_readdata, refcount);
-	unsigned int i;
-
-	kref_put(&rdata->ctx->refcount, cifs_aio_ctx_release);
-	for (i = 0; i < rdata->nr_pages; i++) {
-		put_page(rdata->pages[i]);
-	}
-	cifs_readdata_release(refcount);
-}
-
-/**
- * cifs_readdata_to_iov - copy data from pages in response to an iovec
- * @rdata:	the readdata response with list of pages holding data
- * @iter:	destination for our data
- *
- * This function copies data from a list of pages in a readdata response into
- * an array of iovecs. It will first calculate where the data should go
- * based on the info in the readdata and then copy the data into that spot.
- */
-static int
-cifs_readdata_to_iov(struct cifs_readdata *rdata, struct iov_iter *iter)
-{
-	size_t remaining = rdata->got_bytes;
-	unsigned int i;
-
-	for (i = 0; i < rdata->nr_pages; i++) {
-		struct page *page = rdata->pages[i];
-		size_t copy = min_t(size_t, remaining, PAGE_SIZE);
-		size_t written;
-
-		if (unlikely(iov_iter_is_pipe(iter))) {
-			void *addr = kmap_atomic(page);
-
-			written = copy_to_iter(addr, copy, iter);
-			kunmap_atomic(addr);
-		} else
-			written = copy_page_to_iter(page, 0, copy, iter);
-		remaining -= written;
-		if (written < copy && iov_iter_count(iter) > 0)
-			break;
-	}
-	return remaining ? -EFAULT : 0;
-}
-
 static void collect_uncached_read_data(struct cifs_aio_ctx *ctx);
 
 static void
@@ -3420,81 +3347,7 @@ cifs_uncached_readv_complete(struct work_struct *work)
 	complete(&rdata->done);
 	collect_uncached_read_data(rdata->ctx);
 	/* the below call can possibly free the last ref to aio ctx */
-	kref_put(&rdata->refcount, cifs_uncached_readdata_release);
-}
-
-static int
-uncached_fill_pages(struct TCP_Server_Info *server,
-		    struct cifs_readdata *rdata, struct iov_iter *iter,
-		    unsigned int len)
-{
-	int result = 0;
-	unsigned int i;
-	unsigned int nr_pages = rdata->nr_pages;
-	unsigned int page_offset = rdata->page_offset;
-
-	rdata->got_bytes = 0;
-	rdata->tailsz = PAGE_SIZE;
-	for (i = 0; i < nr_pages; i++) {
-		struct page *page = rdata->pages[i];
-		size_t n;
-		unsigned int segment_size = rdata->pagesz;
-
-		if (i == 0)
-			segment_size -= page_offset;
-		else
-			page_offset = 0;
-
-
-		if (len <= 0) {
-			/* no need to hold page hostage */
-			rdata->pages[i] = NULL;
-			rdata->nr_pages--;
-			put_page(page);
-			continue;
-		}
-
-		n = len;
-		if (len >= segment_size)
-			/* enough data to fill the page */
-			n = segment_size;
-		else
-			rdata->tailsz = len;
-		len -= n;
-
-		if (iter)
-			result = copy_page_from_iter(
-					page, page_offset, n, iter);
-#ifdef CONFIG_CIFS_SMB_DIRECT
-		else if (rdata->mr)
-			result = n;
-#endif
-		else
-			result = cifs_read_page_from_socket(
-					server, page, page_offset, n);
-		if (result < 0)
-			break;
-
-		rdata->got_bytes += result;
-	}
-
-	return rdata->got_bytes > 0 && result != -ECONNABORTED ?
-						rdata->got_bytes : result;
-}
-
-static int
-cifs_uncached_read_into_pages(struct TCP_Server_Info *server,
-			      struct cifs_readdata *rdata, unsigned int len)
-{
-	return uncached_fill_pages(server, rdata, NULL, len);
-}
-
-static int
-cifs_uncached_copy_into_pages(struct TCP_Server_Info *server,
-			      struct cifs_readdata *rdata,
-			      struct iov_iter *iter)
-{
-	return uncached_fill_pages(server, rdata, iter, iter->count);
+	kref_put(&rdata->refcount, cifs_readdata_release);
 }
 
 static int cifs_resend_rdata(struct cifs_readdata *rdata,
@@ -3565,7 +3418,7 @@ static int cifs_resend_rdata(struct cifs_readdata *rdata,
 	} while (rc == -EAGAIN);
 
 fail:
-	kref_put(&rdata->refcount, cifs_uncached_readdata_release);
+	kref_put(&rdata->refcount, cifs_readdata_release);
 	return rc;
 }
 
@@ -3575,16 +3428,13 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
 		     struct cifs_aio_ctx *ctx)
 {
 	struct cifs_readdata *rdata;
-	unsigned int npages, rsize;
+	unsigned int rsize;
 	struct cifs_credits credits_on_stack;
 	struct cifs_credits *credits = &credits_on_stack;
 	size_t cur_len;
 	int rc;
 	pid_t pid;
 	struct TCP_Server_Info *server;
-	struct page **pagevec;
-	size_t start;
-	struct iov_iter direct_iov = ctx->iter;
 
 	server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses);
 
@@ -3593,9 +3443,6 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
 	else
 		pid = current->tgid;
 
-	if (ctx->direct_io)
-		iov_iter_advance(&direct_iov, offset - ctx->pos);
-
 	do {
 		if (open_file->invalidHandle) {
 			rc = cifs_reopen_file(open_file, true);
@@ -3612,77 +3459,26 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
 
 		cur_len = min_t(const size_t, len, rsize);
 
-		if (ctx->direct_io) {
-			ssize_t result;
-
-			result = iov_iter_get_pages_alloc(
-					&direct_iov, &pagevec,
-					cur_len, &start);
-			if (result < 0) {
-				cifs_dbg(VFS,
-					 "Couldn't get user pages (rc=%zd) iter type %d iov_offset %zd count %zd\n",
-					 result, iov_iter_type(&direct_iov),
-					 direct_iov.iov_offset,
-					 direct_iov.count);
-				dump_stack();
-
-				rc = result;
-				add_credits_and_wake_if(server, credits, 0);
-				break;
-			}
-			cur_len = (size_t)result;
-			iov_iter_advance(&direct_iov, cur_len);
-
-			rdata = cifs_readdata_direct_alloc(
-					pagevec, cifs_uncached_readv_complete);
-			if (!rdata) {
-				add_credits_and_wake_if(server, credits, 0);
-				rc = -ENOMEM;
-				break;
-			}
-
-			npages = (cur_len + start + PAGE_SIZE-1) / PAGE_SIZE;
-			rdata->page_offset = start;
-			rdata->tailsz = npages > 1 ?
-				cur_len-(PAGE_SIZE-start)-(npages-2)*PAGE_SIZE :
-				cur_len;
-
-		} else {
-
-			npages = DIV_ROUND_UP(cur_len, PAGE_SIZE);
-			/* allocate a readdata struct */
-			rdata = cifs_readdata_alloc(npages,
-					    cifs_uncached_readv_complete);
-			if (!rdata) {
-				add_credits_and_wake_if(server, credits, 0);
-				rc = -ENOMEM;
-				break;
-			}
-
-			rc = cifs_read_allocate_pages(rdata, npages);
-			if (rc) {
-				kvfree(rdata->pages);
-				kfree(rdata);
-				add_credits_and_wake_if(server, credits, 0);
-				break;
-			}
-
-			rdata->tailsz = PAGE_SIZE;
+		rdata = cifs_readdata_alloc(cifs_uncached_readv_complete);
+		if (!rdata) {
+			add_credits_and_wake_if(server, credits, 0);
+			rc = -ENOMEM;
+			break;
 		}
 
-		rdata->server = server;
-		rdata->cfile = cifsFileInfo_get(open_file);
-		rdata->nr_pages = npages;
-		rdata->offset = offset;
-		rdata->bytes = cur_len;
-		rdata->pid = pid;
-		rdata->pagesz = PAGE_SIZE;
-		rdata->read_into_pages = cifs_uncached_read_into_pages;
-		rdata->copy_into_pages = cifs_uncached_copy_into_pages;
-		rdata->credits = credits_on_stack;
-		rdata->ctx = ctx;
+		rdata->server	= server;
+		rdata->cfile	= cifsFileInfo_get(open_file);
+		rdata->offset	= offset;
+		rdata->bytes	= cur_len;
+		rdata->pid	= pid;
+		rdata->credits	= credits_on_stack;
+		rdata->ctx	= ctx;
 		kref_get(&ctx->refcount);
 
+		rdata->iter	= ctx->iter;
+		iov_iter_advance(&rdata->iter, offset - ctx->pos);
+		iov_iter_truncate(&rdata->iter, cur_len);
+
 		rc = adjust_credits(server, &rdata->credits, rdata->bytes);
 
 		if (!rc) {
@@ -3694,12 +3490,9 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
 
 		if (rc) {
 			add_credits_and_wake_if(server, &rdata->credits, 0);
-			kref_put(&rdata->refcount,
-				cifs_uncached_readdata_release);
-			if (rc == -EAGAIN) {
-				iov_iter_revert(&direct_iov, cur_len);
+			kref_put(&rdata->refcount, cifs_readdata_release);
+			if (rc == -EAGAIN)
 				continue;
-			}
 			break;
 		}
 
@@ -3746,22 +3539,6 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx)
 				list_del_init(&rdata->list);
 				INIT_LIST_HEAD(&tmp_list);
 
-				/*
-				 * Got a part of data and then reconnect has
-				 * happened -- fill the buffer and continue
-				 * reading.
-				 */
-				if (got_bytes && got_bytes < rdata->bytes) {
-					rc = 0;
-					if (!ctx->direct_io)
-						rc = cifs_readdata_to_iov(rdata, to);
-					if (rc) {
-						kref_put(&rdata->refcount,
-							cifs_uncached_readdata_release);
-						continue;
-					}
-				}
-
 				if (ctx->direct_io) {
 					/*
 					 * Re-use rdata as this is a
@@ -3778,7 +3555,7 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx)
 						&tmp_list, ctx);
 
 					kref_put(&rdata->refcount,
-						cifs_uncached_readdata_release);
+						cifs_readdata_release);
 				}
 
 				list_splice(&tmp_list, &ctx->list);
@@ -3786,8 +3563,6 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx)
 				goto again;
 			} else if (rdata->result)
 				rc = rdata->result;
-			else if (!ctx->direct_io)
-				rc = cifs_readdata_to_iov(rdata, to);
 
 			/* if there was a short read -- discard anything left */
 			if (rdata->got_bytes && rdata->got_bytes < rdata->bytes)
@@ -3796,7 +3571,7 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx)
 			ctx->total_len += rdata->got_bytes;
 		}
 		list_del_init(&rdata->list);
-		kref_put(&rdata->refcount, cifs_uncached_readdata_release);
+		kref_put(&rdata->refcount, cifs_readdata_release);
 	}
 
 	if (!ctx->direct_io)
@@ -3856,7 +3631,10 @@ static ssize_t __cifs_readv(
 	if (!ctx)
 		return -ENOMEM;
 
-	ctx->cfile = cifsFileInfo_get(cfile);
+	ctx->pos	= offset;
+	ctx->direct_io	= direct;
+	ctx->len	= len;
+	ctx->cfile	= cifsFileInfo_get(cfile);
 
 	if (!is_sync_kiocb(iocb))
 		ctx->iocb = iocb;
@@ -3864,19 +3642,12 @@ static ssize_t __cifs_readv(
 	if (iter_is_iovec(to))
 		ctx->should_dirty = true;
 
-	if (direct) {
-		ctx->pos = offset;
-		ctx->direct_io = true;
-		ctx->iter = *to;
-		ctx->len = len;
-	} else {
-		rc = setup_aio_ctx_iter(ctx, to, READ);
-		if (rc) {
-			kref_put(&ctx->refcount, cifs_aio_ctx_release);
-			return rc;
-		}
-		len = ctx->len;
+	rc = extract_iter_to_iter(to, len, &ctx->iter, &ctx->bv);
+	if (rc < 0) {
+		kref_put(&ctx->refcount, cifs_aio_ctx_release);
+		return rc;
 	}
+	ctx->npages = rc;
 
 	/* grab a lock here due to read response handlers can access ctx */
 	mutex_lock(&ctx->aio_mutex);
diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c
index f5fe5720456a..6bbc314ab84c 100644
--- a/fs/cifs/misc.c
+++ b/fs/cifs/misc.c
@@ -974,96 +974,6 @@ cifs_aio_ctx_release(struct kref *refcount)
 	kfree(ctx);
 }
 
-#define CIFS_AIO_KMALLOC_LIMIT (1024 * 1024)
-
-int
-setup_aio_ctx_iter(struct cifs_aio_ctx *ctx, struct iov_iter *iter, int rw)
-{
-	ssize_t rc;
-	unsigned int cur_npages;
-	unsigned int npages = 0;
-	unsigned int i;
-	size_t len;
-	size_t count = iov_iter_count(iter);
-	unsigned int saved_len;
-	size_t start;
-	unsigned int max_pages = iov_iter_npages(iter, INT_MAX);
-	struct page **pages = NULL;
-	struct bio_vec *bv = NULL;
-
-	if (iov_iter_is_kvec(iter)) {
-		memcpy(&ctx->iter, iter, sizeof(*iter));
-		ctx->len = count;
-		iov_iter_advance(iter, count);
-		return 0;
-	}
-
-	if (array_size(max_pages, sizeof(*bv)) <= CIFS_AIO_KMALLOC_LIMIT)
-		bv = kmalloc_array(max_pages, sizeof(*bv), GFP_KERNEL);
-
-	if (!bv) {
-		bv = vmalloc(array_size(max_pages, sizeof(*bv)));
-		if (!bv)
-			return -ENOMEM;
-	}
-
-	if (array_size(max_pages, sizeof(*pages)) <= CIFS_AIO_KMALLOC_LIMIT)
-		pages = kmalloc_array(max_pages, sizeof(*pages), GFP_KERNEL);
-
-	if (!pages) {
-		pages = vmalloc(array_size(max_pages, sizeof(*pages)));
-		if (!pages) {
-			kvfree(bv);
-			return -ENOMEM;
-		}
-	}
-
-	saved_len = count;
-
-	while (count && npages < max_pages) {
-		rc = iov_iter_get_pages(iter, pages, count, max_pages, &start);
-		if (rc < 0) {
-			cifs_dbg(VFS, "Couldn't get user pages (rc=%zd)\n", rc);
-			break;
-		}
-
-		if (rc > count) {
-			cifs_dbg(VFS, "get pages rc=%zd more than %zu\n", rc,
-				 count);
-			break;
-		}
-
-		iov_iter_advance(iter, rc);
-		count -= rc;
-		rc += start;
-		cur_npages = DIV_ROUND_UP(rc, PAGE_SIZE);
-
-		if (npages + cur_npages > max_pages) {
-			cifs_dbg(VFS, "out of vec array capacity (%u vs %u)\n",
-				 npages + cur_npages, max_pages);
-			break;
-		}
-
-		for (i = 0; i < cur_npages; i++) {
-			len = rc > PAGE_SIZE ? PAGE_SIZE : rc;
-			bv[npages + i].bv_page = pages[i];
-			bv[npages + i].bv_offset = start;
-			bv[npages + i].bv_len = len - start;
-			rc -= len;
-			start = 0;
-		}
-
-		npages += cur_npages;
-	}
-
-	kvfree(pages);
-	ctx->bv = bv;
-	ctx->len = saved_len - count;
-	ctx->npages = npages;
-	iov_iter_bvec(&ctx->iter, rw, ctx->bv, npages, ctx->len);
-	return 0;
-}
-
 /**
  * cifs_alloc_hash - allocate hash and hash context together
  * @name: The name of the crypto hash algo
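
For illustration only, the per-request iterator setup in
cifs_send_async_read() above (copy ctx->iter, advance to the request's
offset, truncate to its length) can be modelled in userspace roughly as
follows.  This is a sketch, not kernel code: struct span and the span_*()
helpers are made-up stand-ins for struct iov_iter, iov_iter_advance() and
iov_iter_truncate(), and they only model a single flat buffer.

```c
/* Userspace model only: "struct span" stands in for struct iov_iter. */
#include <assert.h>
#include <stddef.h>

struct span {
	char	*base;	/* start of the remaining buffer */
	size_t	count;	/* bytes remaining */
};

/* Model of iov_iter_advance(): skip n bytes, clamped to what is left. */
static void span_advance(struct span *s, size_t n)
{
	if (n > s->count)
		n = s->count;
	s->base  += n;
	s->count -= n;
}

/* Model of iov_iter_truncate(): cap the remaining length at n bytes. */
static void span_truncate(struct span *s, size_t n)
{
	if (s->count > n)
		s->count = n;
}

/*
 * Carve out the window for one request: copy the whole-I/O span by value,
 * skip to this request's offset relative to the start of the I/O, then
 * clamp to this request's length.  The original span is left untouched,
 * so a retry can simply carve the same window again.
 */
static struct span span_window(const struct span *whole, size_t ctx_pos,
			       size_t offset, size_t cur_len)
{
	struct span w = *whole;

	assert(offset >= ctx_pos);
	span_advance(&w, offset - ctx_pos);
	span_truncate(&w, cur_len);
	return w;
}
```

Because each window is derived afresh from the untouched original, the
model needs no equivalent of the iov_iter_revert() call that the old
-EAGAIN path used.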




* [RFC PATCH 7/7] cifs: Use netfslib to handle reads
  2022-01-25 13:57 [RFC][RFC PATCH 0/7] cifs: In-progress conversion to use iov_iters and netfslib David Howells
                   ` (5 preceding siblings ...)
  2022-01-25 13:58 ` [RFC PATCH 6/7] cifs: Get direct I/O and unbuffered I/O working with iterators David Howells
@ 2022-01-25 13:59 ` David Howells
  2022-02-08  5:59   ` Rohith Surabattula
  2022-02-14 16:33   ` David Howells
  6 siblings, 2 replies; 20+ messages in thread
From: David Howells @ 2022-01-25 13:59 UTC (permalink / raw)
  To: smfrench, nspmangalore
  Cc: dhowells, jlayton, linux-cifs, linux-cachefs, linux-fsdevel


---
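
As a rough illustration of the contract described in the comment above
cifs_req_issue_op() in this patch (a subrequest may be satisfied in
pieces, and each reissue resumes at start + transferred for
len - transferred bytes), here is a small userspace model.  It is only a
sketch under stated assumptions: struct model_subreq, model_issue_op() and
model_drive() are hypothetical names, not netfslib API, and max_io merely
plays the role of an rsize-like limit.

```c
/* Userspace model only; none of these names exist in netfslib. */
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct netfs_read_subrequest. */
struct model_subreq {
	size_t	start;		/* file position of the subrequest */
	size_t	len;		/* total bytes wanted */
	size_t	transferred;	/* bytes obtained so far */
};

/*
 * Stand-in for the filesystem's issue op.  It targets the remaining range
 * but returns at most max_io bytes per call.  The position it would read
 * from is reported through pos_out.  Returns the number of bytes "read".
 */
static size_t model_issue_op(const struct model_subreq *sr, size_t max_io,
			     size_t *pos_out)
{
	size_t remaining = sr->len - sr->transferred;

	*pos_out = sr->start + sr->transferred;
	return remaining < max_io ? remaining : max_io;
}

/*
 * Stand-in for the helper's side: keep reissuing for as long as each call
 * makes progress.  Returns the number of issue calls that were made.
 */
static unsigned int model_drive(struct model_subreq *sr, size_t max_io)
{
	unsigned int calls = 0;

	while (sr->transferred < sr->len) {
		size_t pos, got = model_issue_op(sr, max_io, &pos);

		assert(pos == sr->start + sr->transferred);
		calls++;
		if (got == 0)
			break;	/* no progress, so stop reissuing */
		sr->transferred += got;
	}
	return calls;
}
```

With a 10000-byte subrequest and max_io of 4096, the model issues three
reads of 4096, 4096 and 1808 bytes.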

 fs/cifs/Kconfig        |    1 
 fs/cifs/cifsfs.c       |    6 
 fs/cifs/cifsfs.h       |    3 
 fs/cifs/cifsglob.h     |    6 
 fs/cifs/cifssmb.c      |    9 -
 fs/cifs/file.c         |  824 ++++++++----------------------------------------
 fs/cifs/fscache.c      |   31 --
 fs/cifs/fscache.h      |   52 ---
 fs/cifs/inode.c        |   17 +
 fs/cifs/smb2pdu.c      |   15 +
 fs/netfs/read_helper.c |    7 
 11 files changed, 182 insertions(+), 789 deletions(-)

diff --git a/fs/cifs/Kconfig b/fs/cifs/Kconfig
index 3b7e3b9e4fd2..c47e2d3a101f 100644
--- a/fs/cifs/Kconfig
+++ b/fs/cifs/Kconfig
@@ -2,6 +2,7 @@
 config CIFS
 	tristate "SMB3 and CIFS support (advanced network filesystem)"
 	depends on INET
+	select NETFS_SUPPORT
 	select NLS
 	select CRYPTO
 	select CRYPTO_MD5
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index a56cb9c8c5ff..bd06df3bb24b 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -936,7 +936,7 @@ cifs_loose_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 	struct inode *inode = file_inode(iocb->ki_filp);
 
 	if (iocb->ki_flags & IOCB_DIRECT)
-		return cifs_user_readv(iocb, iter);
+		return netfs_direct_read_iter(iocb, iter);
 
 	rc = cifs_revalidate_mapping(inode);
 	if (rc)
@@ -1314,7 +1314,7 @@ const struct file_operations cifs_file_strict_ops = {
 };
 
 const struct file_operations cifs_file_direct_ops = {
-	.read_iter = cifs_direct_readv,
+	.read_iter = netfs_direct_read_iter,
 	.write_iter = cifs_direct_writev,
 	.open = cifs_open,
 	.release = cifs_close,
@@ -1370,7 +1370,7 @@ const struct file_operations cifs_file_strict_nobrl_ops = {
 };
 
 const struct file_operations cifs_file_direct_nobrl_ops = {
-	.read_iter = cifs_direct_readv,
+	.read_iter = netfs_direct_read_iter,
 	.write_iter = cifs_direct_writev,
 	.open = cifs_open,
 	.release = cifs_close,
diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index 1c77bbc0815f..c7d5c268fc47 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -85,6 +85,7 @@ extern const struct inode_operations cifs_dfs_referral_inode_operations;
 
 
 /* Functions related to files and directories */
+extern const struct netfs_request_ops cifs_req_ops;
 extern const struct file_operations cifs_file_ops;
 extern const struct file_operations cifs_file_direct_ops; /* if directio mnt */
 extern const struct file_operations cifs_file_strict_ops; /* if strictio mnt */
@@ -94,8 +95,6 @@ extern const struct file_operations cifs_file_strict_nobrl_ops;
 extern int cifs_open(struct inode *inode, struct file *file);
 extern int cifs_close(struct inode *inode, struct file *file);
 extern int cifs_closedir(struct inode *inode, struct file *file);
-extern ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to);
-extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
 extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
 extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
 extern ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from);
diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 3a4fed645636..938e4e9827ed 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1313,18 +1313,14 @@ struct cifs_aio_ctx {
 
 /* asynchronous read support */
 struct cifs_readdata {
+	struct netfs_read_subrequest	*subreq;
 	struct kref			refcount;
-	struct list_head		list;
-	struct completion		done;
 	struct cifsFileInfo		*cfile;
-	struct address_space		*mapping;
-	struct cifs_aio_ctx		*ctx;
 	__u64				offset;
 	ssize_t				got_bytes;
 	unsigned int			bytes;
 	pid_t				pid;
 	int				result;
-	struct work_struct		work;
 	struct iov_iter			iter;
 	struct kvec			iov[2];
 	struct TCP_Server_Info		*server;
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index 38e7276352e2..c9fb77a8b31b 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -23,6 +23,7 @@
 #include <linux/swap.h>
 #include <linux/task_io_accounting_ops.h>
 #include <linux/uaccess.h>
+#include <linux/netfs.h>
 #include "cifspdu.h"
 #include "cifsfs.h"
 #include "cifsglob.h"
@@ -1609,7 +1610,13 @@ cifs_readv_callback(struct mid_q_entry *mid)
 		rdata->result = -EIO;
 	}
 
-	queue_work(cifsiod_wq, &rdata->work);
+	if (rdata->result == 0 || rdata->result == -EAGAIN)
+		iov_iter_advance(&rdata->subreq->iter, rdata->got_bytes);
+	netfs_subreq_terminated(rdata->subreq,
+				(rdata->result == 0 || rdata->result == -EAGAIN) ?
+				rdata->got_bytes : rdata->result,
+				false);
+	kref_put(&rdata->refcount, cifs_readdata_release);
 	DeleteMidQEntry(mid);
 	add_credits(server, &credits, 0);
 }
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index f9b9a1562e17..36559de02e37 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -21,6 +21,7 @@
 #include <linux/slab.h>
 #include <linux/swap.h>
 #include <linux/mm.h>
+#include <linux/netfs.h>
 #include <asm/div64.h>
 #include "cifsfs.h"
 #include "cifspdu.h"
@@ -3306,12 +3307,8 @@ static struct cifs_readdata *cifs_readdata_alloc(work_func_t complete)
 	struct cifs_readdata *rdata;
 
 	rdata = kzalloc(sizeof(*rdata), GFP_KERNEL);
-	if (rdata) {
+	if (rdata)
 		kref_init(&rdata->refcount);
-		INIT_LIST_HEAD(&rdata->list);
-		init_completion(&rdata->done);
-		INIT_WORK(&rdata->work, complete);
-	}
 
 	return rdata;
 }
@@ -3322,8 +3319,6 @@ cifs_readdata_release(struct kref *refcount)
 	struct cifs_readdata *rdata = container_of(refcount,
 					struct cifs_readdata, refcount);
 
-	if (rdata->ctx)
-		kref_put(&rdata->ctx->refcount, cifs_aio_ctx_release);
 #ifdef CONFIG_CIFS_SMB_DIRECT
 	if (rdata->mr) {
 		smbd_deregister_mr(rdata->mr);
@@ -3336,370 +3331,6 @@ cifs_readdata_release(struct kref *refcount)
 	kfree(rdata);
 }
 
-static void collect_uncached_read_data(struct cifs_aio_ctx *ctx);
-
-static void
-cifs_uncached_readv_complete(struct work_struct *work)
-{
-	struct cifs_readdata *rdata = container_of(work,
-						struct cifs_readdata, work);
-
-	complete(&rdata->done);
-	collect_uncached_read_data(rdata->ctx);
-	/* the below call can possibly free the last ref to aio ctx */
-	kref_put(&rdata->refcount, cifs_readdata_release);
-}
-
-static int cifs_resend_rdata(struct cifs_readdata *rdata,
-			struct list_head *rdata_list,
-			struct cifs_aio_ctx *ctx)
-{
-	unsigned int rsize;
-	struct cifs_credits credits;
-	int rc;
-	struct TCP_Server_Info *server;
-
-	/* XXX: should we pick a new channel here? */
-	server = rdata->server;
-
-	do {
-		if (rdata->cfile->invalidHandle) {
-			rc = cifs_reopen_file(rdata->cfile, true);
-			if (rc == -EAGAIN)
-				continue;
-			else if (rc)
-				break;
-		}
-
-		/*
-		 * Wait for credits to resend this rdata.
-		 * Note: we are attempting to resend the whole rdata not in
-		 * segments
-		 */
-		do {
-			rc = server->ops->wait_mtu_credits(server, rdata->bytes,
-						&rsize, &credits);
-
-			if (rc)
-				goto fail;
-
-			if (rsize < rdata->bytes) {
-				add_credits_and_wake_if(server, &credits, 0);
-				msleep(1000);
-			}
-		} while (rsize < rdata->bytes);
-		rdata->credits = credits;
-
-		rc = adjust_credits(server, &rdata->credits, rdata->bytes);
-		if (!rc) {
-			if (rdata->cfile->invalidHandle)
-				rc = -EAGAIN;
-			else {
-#ifdef CONFIG_CIFS_SMB_DIRECT
-				if (rdata->mr) {
-					rdata->mr->need_invalidate = true;
-					smbd_deregister_mr(rdata->mr);
-					rdata->mr = NULL;
-				}
-#endif
-				rc = server->ops->async_readv(rdata);
-			}
-		}
-
-		/* If the read was successfully sent, we are done */
-		if (!rc) {
-			/* Add to aio pending list */
-			list_add_tail(&rdata->list, rdata_list);
-			return 0;
-		}
-
-		/* Roll back credits and retry if needed */
-		add_credits_and_wake_if(server, &rdata->credits, 0);
-	} while (rc == -EAGAIN);
-
-fail:
-	kref_put(&rdata->refcount, cifs_readdata_release);
-	return rc;
-}
-
-static int
-cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
-		     struct cifs_sb_info *cifs_sb, struct list_head *rdata_list,
-		     struct cifs_aio_ctx *ctx)
-{
-	struct cifs_readdata *rdata;
-	unsigned int rsize;
-	struct cifs_credits credits_on_stack;
-	struct cifs_credits *credits = &credits_on_stack;
-	size_t cur_len;
-	int rc;
-	pid_t pid;
-	struct TCP_Server_Info *server;
-
-	server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses);
-
-	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
-		pid = open_file->pid;
-	else
-		pid = current->tgid;
-
-	do {
-		if (open_file->invalidHandle) {
-			rc = cifs_reopen_file(open_file, true);
-			if (rc == -EAGAIN)
-				continue;
-			else if (rc)
-				break;
-		}
-
-		rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->rsize,
-						   &rsize, credits);
-		if (rc)
-			break;
-
-		cur_len = min_t(const size_t, len, rsize);
-
-		rdata = cifs_readdata_alloc(cifs_uncached_readv_complete);
-		if (!rdata) {
-			add_credits_and_wake_if(server, credits, 0);
-			rc = -ENOMEM;
-			break;
-		}
-
-		rdata->server	= server;
-		rdata->cfile	= cifsFileInfo_get(open_file);
-		rdata->offset	= offset;
-		rdata->bytes	= cur_len;
-		rdata->pid	= pid;
-		rdata->credits	= credits_on_stack;
-		rdata->ctx	= ctx;
-		kref_get(&ctx->refcount);
-
-		rdata->iter	= ctx->iter;
-		iov_iter_advance(&rdata->iter, offset - ctx->pos);
-		iov_iter_truncate(&rdata->iter, cur_len);
-
-		rc = adjust_credits(server, &rdata->credits, rdata->bytes);
-
-		if (!rc) {
-			if (rdata->cfile->invalidHandle)
-				rc = -EAGAIN;
-			else
-				rc = server->ops->async_readv(rdata);
-		}
-
-		if (rc) {
-			add_credits_and_wake_if(server, &rdata->credits, 0);
-			kref_put(&rdata->refcount, cifs_readdata_release);
-			if (rc == -EAGAIN)
-				continue;
-			break;
-		}
-
-		list_add_tail(&rdata->list, rdata_list);
-		offset += cur_len;
-		len -= cur_len;
-	} while (len > 0);
-
-	return rc;
-}
-
-static void
-collect_uncached_read_data(struct cifs_aio_ctx *ctx)
-{
-	struct cifs_readdata *rdata, *tmp;
-	struct iov_iter *to = &ctx->iter;
-	struct cifs_sb_info *cifs_sb;
-	int rc;
-
-	cifs_sb = CIFS_SB(ctx->cfile->dentry->d_sb);
-
-	mutex_lock(&ctx->aio_mutex);
-
-	if (list_empty(&ctx->list)) {
-		mutex_unlock(&ctx->aio_mutex);
-		return;
-	}
-
-	rc = ctx->rc;
-	/* the loop below should proceed in the order of increasing offsets */
-again:
-	list_for_each_entry_safe(rdata, tmp, &ctx->list, list) {
-		if (!rc) {
-			if (!try_wait_for_completion(&rdata->done)) {
-				mutex_unlock(&ctx->aio_mutex);
-				return;
-			}
-
-			if (rdata->result == -EAGAIN) {
-				/* resend call if it's a retryable error */
-				struct list_head tmp_list;
-				unsigned int got_bytes = rdata->got_bytes;
-
-				list_del_init(&rdata->list);
-				INIT_LIST_HEAD(&tmp_list);
-
-				if (ctx->direct_io) {
-					/*
-					 * Re-use rdata as this is a
-					 * direct I/O
-					 */
-					rc = cifs_resend_rdata(
-						rdata,
-						&tmp_list, ctx);
-				} else {
-					rc = cifs_send_async_read(
-						rdata->offset + got_bytes,
-						rdata->bytes - got_bytes,
-						rdata->cfile, cifs_sb,
-						&tmp_list, ctx);
-
-					kref_put(&rdata->refcount,
-						cifs_readdata_release);
-				}
-
-				list_splice(&tmp_list, &ctx->list);
-
-				goto again;
-			} else if (rdata->result)
-				rc = rdata->result;
-
-			/* if there was a short read -- discard anything left */
-			if (rdata->got_bytes && rdata->got_bytes < rdata->bytes)
-				rc = -ENODATA;
-
-			ctx->total_len += rdata->got_bytes;
-		}
-		list_del_init(&rdata->list);
-		kref_put(&rdata->refcount, cifs_readdata_release);
-	}
-
-	if (!ctx->direct_io)
-		ctx->total_len = ctx->len - iov_iter_count(to);
-
-	/* mask nodata case */
-	if (rc == -ENODATA)
-		rc = 0;
-
-	ctx->rc = (rc == 0) ? (ssize_t)ctx->total_len : rc;
-
-	mutex_unlock(&ctx->aio_mutex);
-
-	if (ctx->iocb && ctx->iocb->ki_complete)
-		ctx->iocb->ki_complete(ctx->iocb, ctx->rc);
-	else
-		complete(&ctx->done);
-}
-
-static ssize_t __cifs_readv(
-	struct kiocb *iocb, struct iov_iter *to, bool direct)
-{
-	size_t len;
-	struct file *file = iocb->ki_filp;
-	struct cifs_sb_info *cifs_sb;
-	struct cifsFileInfo *cfile;
-	struct cifs_tcon *tcon;
-	ssize_t rc, total_read = 0;
-	loff_t offset = iocb->ki_pos;
-	struct cifs_aio_ctx *ctx;
-
-	/*
-	 * iov_iter_get_pages_alloc() doesn't work with ITER_KVEC,
-	 * fall back to data copy read path
-	 * this could be improved by getting pages directly in ITER_KVEC
-	 */
-	if (direct && iov_iter_is_kvec(to)) {
-		cifs_dbg(FYI, "use non-direct cifs_user_readv for kvec I/O\n");
-		direct = false;
-	}
-
-	len = iov_iter_count(to);
-	if (!len)
-		return 0;
-
-	cifs_sb = CIFS_FILE_SB(file);
-	cfile = file->private_data;
-	tcon = tlink_tcon(cfile->tlink);
-
-	if (!tcon->ses->server->ops->async_readv)
-		return -ENOSYS;
-
-	if ((file->f_flags & O_ACCMODE) == O_WRONLY)
-		cifs_dbg(FYI, "attempting read on write only file instance\n");
-
-	ctx = cifs_aio_ctx_alloc();
-	if (!ctx)
-		return -ENOMEM;
-
-	ctx->pos	= offset;
-	ctx->direct_io	= direct;
-	ctx->len	= len;
-	ctx->cfile	= cifsFileInfo_get(cfile);
-
-	if (!is_sync_kiocb(iocb))
-		ctx->iocb = iocb;
-
-	if (iter_is_iovec(to))
-		ctx->should_dirty = true;
-
-	rc = extract_iter_to_iter(to, len, &ctx->iter, &ctx->bv);
-	if (rc < 0) {
-		kref_put(&ctx->refcount, cifs_aio_ctx_release);
-		return rc;
-	}
-	ctx->npages = rc;
-
-	/* grab a lock here due to read response handlers can access ctx */
-	mutex_lock(&ctx->aio_mutex);
-
-	rc = cifs_send_async_read(offset, len, cfile, cifs_sb, &ctx->list, ctx);
-
-	/* if at least one read request send succeeded, then reset rc */
-	if (!list_empty(&ctx->list))
-		rc = 0;
-
-	mutex_unlock(&ctx->aio_mutex);
-
-	if (rc) {
-		kref_put(&ctx->refcount, cifs_aio_ctx_release);
-		return rc;
-	}
-
-	if (!is_sync_kiocb(iocb)) {
-		kref_put(&ctx->refcount, cifs_aio_ctx_release);
-		return -EIOCBQUEUED;
-	}
-
-	rc = wait_for_completion_killable(&ctx->done);
-	if (rc) {
-		mutex_lock(&ctx->aio_mutex);
-		ctx->rc = rc = -EINTR;
-		total_read = ctx->total_len;
-		mutex_unlock(&ctx->aio_mutex);
-	} else {
-		rc = ctx->rc;
-		total_read = ctx->total_len;
-	}
-
-	kref_put(&ctx->refcount, cifs_aio_ctx_release);
-
-	if (total_read) {
-		iocb->ki_pos += total_read;
-		return total_read;
-	}
-	return rc;
-}
-
-ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to)
-{
-	return __cifs_readv(iocb, to, true);
-}
-
-ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
-{
-	return __cifs_readv(iocb, to, false);
-}
-
 ssize_t
 cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to)
 {
@@ -3720,12 +3351,15 @@ cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to)
 	 * pos+len-1.
 	 */
 	if (!CIFS_CACHE_READ(cinode))
-		return cifs_user_readv(iocb, to);
+		return netfs_direct_read_iter(iocb, to);
 
 	if (cap_unix(tcon->ses) &&
 	    (CIFS_UNIX_FCNTL_CAP & le64_to_cpu(tcon->fsUnixInfo.Capability)) &&
-	    ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) == 0))
+	    ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) == 0)) {
+		if (iocb->ki_flags & IOCB_DIRECT)
+			return netfs_direct_read_iter(iocb, to);
 		return generic_file_read_iter(iocb, to);
+	}
 
 	/*
 	 * We need to hold the sem to be sure nobody modifies lock list
@@ -3734,104 +3368,16 @@ cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to)
 	down_read(&cinode->lock_sem);
 	if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(to),
 				     tcon->ses->server->vals->shared_lock_type,
-				     0, NULL, CIFS_READ_OP))
-		rc = generic_file_read_iter(iocb, to);
+				     0, NULL, CIFS_READ_OP)) {
+		if (iocb->ki_flags & IOCB_DIRECT)
+			rc = netfs_direct_read_iter(iocb, to);
+		else
+			rc = generic_file_read_iter(iocb, to);
+	}
 	up_read(&cinode->lock_sem);
 	return rc;
 }
 
-static ssize_t
-cifs_read(struct file *file, char *read_data, size_t read_size, loff_t *offset)
-{
-	int rc = -EACCES;
-	unsigned int bytes_read = 0;
-	unsigned int total_read;
-	unsigned int current_read_size;
-	unsigned int rsize;
-	struct cifs_sb_info *cifs_sb;
-	struct cifs_tcon *tcon;
-	struct TCP_Server_Info *server;
-	unsigned int xid;
-	char *cur_offset;
-	struct cifsFileInfo *open_file;
-	struct cifs_io_parms io_parms = {0};
-	int buf_type = CIFS_NO_BUFFER;
-	__u32 pid;
-
-	xid = get_xid();
-	cifs_sb = CIFS_FILE_SB(file);
-
-	/* FIXME: set up handlers for larger reads and/or convert to async */
-	rsize = min_t(unsigned int, cifs_sb->ctx->rsize, CIFSMaxBufSize);
-
-	if (file->private_data == NULL) {
-		rc = -EBADF;
-		free_xid(xid);
-		return rc;
-	}
-	open_file = file->private_data;
-	tcon = tlink_tcon(open_file->tlink);
-	server = cifs_pick_channel(tcon->ses);
-
-	if (!server->ops->sync_read) {
-		free_xid(xid);
-		return -ENOSYS;
-	}
-
-	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
-		pid = open_file->pid;
-	else
-		pid = current->tgid;
-
-	if ((file->f_flags & O_ACCMODE) == O_WRONLY)
-		cifs_dbg(FYI, "attempting read on write only file instance\n");
-
-	for (total_read = 0, cur_offset = read_data; read_size > total_read;
-	     total_read += bytes_read, cur_offset += bytes_read) {
-		do {
-			current_read_size = min_t(uint, read_size - total_read,
-						  rsize);
-			/*
-			 * For windows me and 9x we do not want to request more
-			 * than it negotiated since it will refuse the read
-			 * then.
-			 */
-			if (!(tcon->ses->capabilities &
-				tcon->ses->server->vals->cap_large_files)) {
-				current_read_size = min_t(uint,
-					current_read_size, CIFSMaxBufSize);
-			}
-			if (open_file->invalidHandle) {
-				rc = cifs_reopen_file(open_file, true);
-				if (rc != 0)
-					break;
-			}
-			io_parms.pid = pid;
-			io_parms.tcon = tcon;
-			io_parms.offset = *offset;
-			io_parms.length = current_read_size;
-			io_parms.server = server;
-			rc = server->ops->sync_read(xid, &open_file->fid, &io_parms,
-						    &bytes_read, &cur_offset,
-						    &buf_type);
-		} while (rc == -EAGAIN);
-
-		if (rc || (bytes_read == 0)) {
-			if (total_read) {
-				break;
-			} else {
-				free_xid(xid);
-				return rc;
-			}
-		} else {
-			cifs_stats_bytes_read(tcon, total_read);
-			*offset += bytes_read;
-		}
-	}
-	free_xid(xid);
-	return total_read;
-}
-
 /*
  * If the page is mmap'ed into a process' page tables, then we need to make
  * sure that it doesn't change while being written back.
@@ -3901,224 +3447,149 @@ int cifs_file_mmap(struct file *file, struct vm_area_struct *vma)
 }
 
 /*
- * Unlock a bunch of folios in the pagecache.
+ * Issue a read operation on behalf of the netfs helper functions.  We're asked
+ * to make a read of a certain size at a point in the file.  We are permitted
+ * to only read a portion of that, but as long as we read something, the netfs
+ * helper will call us again so that we can issue another read.
  */
-static void cifs_unlock_folios(struct address_space *mapping, pgoff_t first, pgoff_t last)
-{
-       struct folio *folio;
-       XA_STATE(xas, &mapping->i_pages, first);
-
-       rcu_read_lock();
-       xas_for_each(&xas, folio, last) {
-               folio_unlock(folio);
-       }
-       rcu_read_unlock();
-}
-
-static void cifs_readahead_complete(struct work_struct *work)
-{
-	struct cifs_readdata *rdata = container_of(work,
-						   struct cifs_readdata, work);
-	struct folio *folio;
-	pgoff_t last;
-	bool good = rdata->result == 0 || (rdata->result == -EAGAIN && rdata->got_bytes);
-
-	XA_STATE(xas, &rdata->mapping->i_pages, rdata->offset / PAGE_SIZE);
-
-#if 0
-	if (good)
-		cifs_readpage_to_fscache(rdata->mapping->host, page);
-#endif
-
-	if (iov_iter_count(&rdata->iter) > 0)
-		iov_iter_zero(iov_iter_count(&rdata->iter), &rdata->iter);
-
-	last = round_down(rdata->offset + rdata->got_bytes - 1, PAGE_SIZE);
-
-	xas_for_each(&xas, folio, last) {
-		if (good) {
-			flush_dcache_folio(folio);
-			folio_mark_uptodate(folio);
-		}
-		folio_unlock(folio);
-	}
-
-	kref_put(&rdata->refcount, cifs_readdata_release);
-}
-
-static void cifs_readahead(struct readahead_control *ractl)
+static void cifs_req_issue_op(struct netfs_read_subrequest *subreq)
 {
-	struct cifsFileInfo *open_file = ractl->file->private_data;
-	struct cifs_sb_info *cifs_sb = CIFS_FILE_SB(ractl->file);
+	struct netfs_read_request *rreq = subreq->rreq;
 	struct TCP_Server_Info *server;
+	struct cifs_readdata *rdata;
+	struct cifsFileInfo *open_file = rreq->netfs_priv;
+	struct cifs_sb_info *cifs_sb = CIFS_SB(rreq->inode->i_sb);
+	struct cifs_credits credits_on_stack, *credits = &credits_on_stack;
 	unsigned int xid;
 	pid_t pid;
 	int rc = 0;
+	unsigned int rsize;
 
 	xid = get_xid();
 
 	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
 		pid = open_file->pid;
 	else
-		pid = current->tgid;
+		pid = current->tgid; /* FIXME: current may be a kworker here */
 
 	server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses);
 
-	cifs_dbg(FYI, "%s: file=%p mapping=%p num_pages=%u\n",
-		 __func__, ractl->file, ractl->mapping, readahead_count(ractl));
-
-	/*
-	 * Chop the readahead request up into rsize-sized read requests.
-	 */
-	while (readahead_count(ractl) - ractl->_batch_count) {
-		unsigned int i, nr_pages, rsize;
-		struct cifs_readdata *rdata;
-		struct cifs_credits credits_on_stack;
-		struct cifs_credits *credits = &credits_on_stack;
+	cifs_dbg(FYI, "%s: op=%08x[%x] mapping=%p len=%zu/%zu\n",
+		 __func__, rreq->debug_id, subreq->debug_index, rreq->mapping,
+		 subreq->transferred, subreq->len);
 
-		if (open_file->invalidHandle) {
+	if (open_file->invalidHandle) {
+		do {
 			rc = cifs_reopen_file(open_file, true);
-			if (rc) {
-				if (rc == -EAGAIN)
-					continue;
-				break;
-			}
-		}
-
-		rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->rsize,
-						   &rsize, credits);
+		} while (rc == -EAGAIN);
 		if (rc)
-			break;
-		nr_pages = min_t(size_t, rsize / PAGE_SIZE, readahead_count(ractl));
-
-		/*
-		 * Give up immediately if rsize is too small to read an entire
-		 * page. The VFS will fall back to readpage. We should never
-		 * reach this point however since we set ra_pages to 0 when the
-		 * rsize is smaller than a cache page.
-		 */
-		if (unlikely(!nr_pages)) {
-			add_credits_and_wake_if(server, credits, 0);
-			break;
-		}
-
-		rdata = cifs_readdata_alloc(cifs_readahead_complete);
-		if (!rdata) {
-			/* best to give up if we're out of mem */
-			add_credits_and_wake_if(server, credits, 0);
-			break;
-		}
+			goto out;
+	}
 
-		rdata->offset	= readahead_pos(ractl);
-		rdata->bytes	= nr_pages * PAGE_SIZE;
-		rdata->cfile	= cifsFileInfo_get(open_file);
-		rdata->server	= server;
-		rdata->mapping	= ractl->mapping;
-		rdata->pid	= pid;
-		rdata->credits	= credits_on_stack;
+	rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->rsize, &rsize, credits);
+	if (rc)
+		goto out;
 
-		for (i = 0; i < nr_pages; i++)
-			if (!readahead_folio(ractl))
-				BUG();
+	rdata = cifs_readdata_alloc(NULL);
+	if (!rdata) {
+		add_credits_and_wake_if(server, credits, 0);
+		rc = -ENOMEM;
+		goto out;
+	}
 
-		iov_iter_xarray(&rdata->iter, READ, &rdata->mapping->i_pages,
-				rdata->offset, rdata->bytes);
+	__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
+	rdata->subreq	= subreq;
+	rdata->cfile	= cifsFileInfo_get(open_file);
+	rdata->server	= server;
+	rdata->offset	= subreq->start + subreq->transferred;
+	rdata->bytes	= subreq->len   - subreq->transferred;
+	rdata->pid	= pid;
+	rdata->credits	= credits_on_stack;
+	rdata->iter	= subreq->iter;
 
-		rc = adjust_credits(server, &rdata->credits, rdata->bytes);
-		if (!rc) {
-			if (rdata->cfile->invalidHandle)
-				rc = -EAGAIN;
-			else
-				rc = server->ops->async_readv(rdata);
-		}
+	rc = adjust_credits(server, &rdata->credits, rdata->bytes);
+	if (!rc) {
+		if (rdata->cfile->invalidHandle)
+			rc = -EAGAIN;
+		else
+			rc = server->ops->async_readv(rdata);
+	}
 
-		if (rc) {
-			add_credits_and_wake_if(server, &rdata->credits, 0);
-			cifs_unlock_folios(rdata->mapping,
-					   rdata->offset / PAGE_SIZE,
-					   (rdata->offset + rdata->bytes - 1) / PAGE_SIZE);
-			/* Fallback to the readpage in error/reconnect cases */
-			kref_put(&rdata->refcount, cifs_readdata_release);
-			break;
-		}
+	if (rc) {
+		add_credits_and_wake_if(server, &rdata->credits, 0);
+		/* Fallback to the readpage in error/reconnect cases */
+		kref_put(&rdata->refcount, cifs_readdata_release);
+		goto out;
 	}
 
+	kref_put(&rdata->refcount, cifs_readdata_release);
+
+out:
 	free_xid(xid);
+	if (rc)
+		netfs_subreq_terminated(subreq, rc, false);
+}
+
+static int cifs_init_rreq(struct netfs_read_request *rreq, struct file *file)
+{
+	rreq->netfs_priv = file->private_data;
+	return 0;
 }
 
 /*
- * cifs_readpage_worker must be called with the page pinned
+ * Expand the size of a readahead to the size of the rsize, if at least as
+ * large as a page, allowing for the possibility that rsize is not pow-2
+ * aligned.
  */
-static int cifs_readpage_worker(struct file *file, struct page *page,
-	loff_t *poffset)
+static void cifs_expand_readahead(struct netfs_read_request *rreq)
 {
-	char *read_data;
-	int rc;
+	struct cifs_sb_info *cifs_sb = CIFS_SB(rreq->inode->i_sb);
+	unsigned int rsize = cifs_sb->ctx->rsize;
+	loff_t misalignment, i_size = i_size_read(rreq->inode);
 
-	/* Is the page cached? */
-	rc = cifs_readpage_from_fscache(file_inode(file), page);
-	if (rc == 0)
-		goto read_complete;
-
-	read_data = kmap(page);
-	/* for reads over a certain size could initiate async read ahead */
-
-	rc = cifs_read(file, read_data, PAGE_SIZE, poffset);
-
-	if (rc < 0)
-		goto io_error;
-	else
-		cifs_dbg(FYI, "Bytes read %d\n", rc);
+	if (rsize < PAGE_SIZE)
+		return;
 
-	/* we do not want atime to be less than mtime, it broke some apps */
-	file_inode(file)->i_atime = current_time(file_inode(file));
-	if (timespec64_compare(&(file_inode(file)->i_atime), &(file_inode(file)->i_mtime)))
-		file_inode(file)->i_atime = file_inode(file)->i_mtime;
+	if (rsize < INT_MAX)
+		rsize = roundup_pow_of_two(rsize);
 	else
-		file_inode(file)->i_atime = current_time(file_inode(file));
+		rsize = ((unsigned int)INT_MAX + 1) / 2;
 
-	if (PAGE_SIZE > rc)
-		memset(read_data + rc, 0, PAGE_SIZE - rc);
-
-	flush_dcache_page(page);
-	SetPageUptodate(page);
-
-	/* send this page to the cache */
-	cifs_readpage_to_fscache(file_inode(file), page);
-
-	rc = 0;
-
-io_error:
-	kunmap(page);
-	unlock_page(page);
+	misalignment = rreq->start & (rsize - 1);
+	if (misalignment) {
+		rreq->start -= misalignment;
+		rreq->len += misalignment;
+	}
 
-read_complete:
-	return rc;
+	rreq->len = round_up(rreq->len, rsize);
+	if (rreq->start < i_size && rreq->len > i_size - rreq->start)
+		rreq->len = i_size - rreq->start;
 }
 
-static int cifs_readpage(struct file *file, struct page *page)
+static void cifs_rreq_done(struct netfs_read_request *rreq)
 {
-	loff_t offset = page_file_offset(page);
-	int rc = -EACCES;
-	unsigned int xid;
+	struct inode *inode = rreq->inode;
 
-	xid = get_xid();
-
-	if (file->private_data == NULL) {
-		rc = -EBADF;
-		free_xid(xid);
-		return rc;
-	}
-
-	cifs_dbg(FYI, "readpage %p at offset %d 0x%x\n",
-		 page, (int)offset, (int)offset);
-
-	rc = cifs_readpage_worker(file, page, &offset);
+	/* we do not want atime to be less than mtime, it broke some apps */
+	inode->i_atime = current_time(inode);
+	if (timespec64_compare(&inode->i_atime, &inode->i_mtime))
+		inode->i_atime = inode->i_mtime;
+	else
+		inode->i_atime = current_time(inode);
+}
 
-	free_xid(xid);
-	return rc;
+static void cifs_req_cleanup(struct address_space *mapping, void *netfs_priv)
+{
 }
 
+const struct netfs_request_ops cifs_req_ops = {
+	.init_rreq		= cifs_init_rreq,
+	.expand_readahead	= cifs_expand_readahead,
+	.issue_op		= cifs_req_issue_op,
+	.done			= cifs_rreq_done,
+	.cleanup		= cifs_req_cleanup,
+};
+
 static int is_inode_writable(struct cifsInodeInfo *cifs_inode)
 {
 	struct cifsFileInfo *open_file;
@@ -4168,34 +3639,20 @@ static int cifs_write_begin(struct file *file, struct address_space *mapping,
 			loff_t pos, unsigned len, unsigned flags,
 			struct page **pagep, void **fsdata)
 {
-	int oncethru = 0;
-	pgoff_t index = pos >> PAGE_SHIFT;
-	loff_t offset = pos & (PAGE_SIZE - 1);
-	loff_t page_start = pos & PAGE_MASK;
-	loff_t i_size;
-	struct page *page;
-	int rc = 0;
+	struct folio *folio;
+	int rc;
 
 	cifs_dbg(FYI, "write_begin from %lld len %d\n", (long long)pos, len);
 
-start:
-	page = grab_cache_page_write_begin(mapping, index, flags);
-	if (!page) {
-		rc = -ENOMEM;
-		goto out;
-	}
-
-	if (PageUptodate(page))
-		goto out;
-
-	/*
-	 * If we write a full page it will be up to date, no need to read from
-	 * the server. If the write is short, we'll end up doing a sync write
-	 * instead.
+	/* Prefetch area to be written into the cache if we're caching this
+	 * file.  We need to do this before we get a lock on the page in case
+	 * there's more than one writer competing for the same cache block.
 	 */
-	if (len == PAGE_SIZE)
-		goto out;
+	rc = netfs_write_begin(file, mapping, pos, len, flags, &folio, fsdata);
+	if (rc < 0)
+		return rc;
 
+#if 0
 	/*
 	 * optimize away the read when we have an oplock, and we're not
 	 * expecting to use any of the data we'd be reading in. That
@@ -4210,34 +3667,17 @@ static int cifs_write_begin(struct file *file, struct address_space *mapping,
 					   offset + len,
 					   PAGE_SIZE);
 			/*
-			 * PageChecked means that the parts of the page
-			 * to which we're not writing are considered up
-			 * to date. Once the data is copied to the
-			 * page, it can be set uptodate.
+			 * Marking a folio checked means that the parts of the
+			 * page to which we're not writing are considered up to
+			 * date. Once the data is copied to the page, it can be
+			 * set uptodate.
 			 */
-			SetPageChecked(page);
+			folio_set_checked(folio);
 			goto out;
 		}
 	}
-
-	if ((file->f_flags & O_ACCMODE) != O_WRONLY && !oncethru) {
-		/*
-		 * might as well read a page, it is fast enough. If we get
-		 * an error, we don't need to return it. cifs_write_end will
-		 * do a sync write instead since PG_uptodate isn't set.
-		 */
-		cifs_readpage_worker(file, page, &page_start);
-		put_page(page);
-		oncethru = 1;
-		goto start;
-	} else {
-		/* we could try using another file handle if there is one -
-		   but how would we lock it to prevent close of that handle
-		   racing with this read? In any case
-		   this will be written out by write_end so is fine */
-	}
-out:
-	*pagep = page;
+#endif
+	*pagep = folio_page(folio, (pos - folio_pos(folio)) / PAGE_SIZE);
 	return rc;
 }
 
@@ -4429,8 +3869,8 @@ static int cifs_set_page_dirty(struct page *page)
 #endif
 
 const struct address_space_operations cifs_addr_ops = {
-	.readpage = cifs_readpage,
-	.readahead = cifs_readahead,
+	.readpage = netfs_readpage,
+	.readahead = netfs_readahead,
 	.writepage = cifs_writepage,
 	.writepages = cifs_writepages,
 	.write_begin = cifs_write_begin,
@@ -4455,7 +3895,7 @@ const struct address_space_operations cifs_addr_ops = {
  * to leave cifs_readpages out of the address space operations.
  */
 const struct address_space_operations cifs_addr_ops_smallbuf = {
-	.readpage = cifs_readpage,
+	.readpage = netfs_readpage,
 	.writepage = cifs_writepage,
 	.writepages = cifs_writepages,
 	.write_begin = cifs_write_begin,
diff --git a/fs/cifs/fscache.c b/fs/cifs/fscache.c
index a7e7e5a97b7f..bb1c3a372de4 100644
--- a/fs/cifs/fscache.c
+++ b/fs/cifs/fscache.c
@@ -134,34 +134,3 @@ void cifs_fscache_release_inode_cookie(struct inode *inode)
 		cifsi->netfs_ctx.cache = NULL;
 	}
 }
-
-/*
- * Retrieve a page from FS-Cache
- */
-int __cifs_readpage_from_fscache(struct inode *inode, struct page *page)
-{
-	cifs_dbg(FYI, "%s: (fsc:%p, p:%p, i:0x%p\n",
-		 __func__, cifs_inode_cookie(inode), page, inode);
-	return -ENOBUFS; // Needs conversion to using netfslib
-}
-
-/*
- * Retrieve a set of pages from FS-Cache
- */
-int __cifs_readpages_from_fscache(struct inode *inode,
-				struct address_space *mapping,
-				struct list_head *pages,
-				unsigned *nr_pages)
-{
-	cifs_dbg(FYI, "%s: (0x%p/%u/0x%p)\n",
-		 __func__, cifs_inode_cookie(inode), *nr_pages, inode);
-	return -ENOBUFS; // Needs conversion to using netfslib
-}
-
-void __cifs_readpage_to_fscache(struct inode *inode, struct page *page)
-{
-	cifs_dbg(FYI, "%s: (fsc: %p, p: %p, i: %p)\n",
-		 __func__, cifs_inode_cookie(inode), page, inode);
-
-	// Needs conversion to using netfslib
-}
diff --git a/fs/cifs/fscache.h b/fs/cifs/fscache.h
index 9f6e42e85d14..fdc03cd7b881 100644
--- a/fs/cifs/fscache.h
+++ b/fs/cifs/fscache.h
@@ -58,14 +58,6 @@ void cifs_fscache_fill_coherency(struct inode *inode,
 }
 
 
-extern int cifs_fscache_release_page(struct page *page, gfp_t gfp);
-extern int __cifs_readpage_from_fscache(struct inode *, struct page *);
-extern int __cifs_readpages_from_fscache(struct inode *,
-					 struct address_space *,
-					 struct list_head *,
-					 unsigned *);
-extern void __cifs_readpage_to_fscache(struct inode *, struct page *);
-
 static inline struct fscache_cookie *cifs_inode_cookie(struct inode *inode)
 {
 	return netfs_i_cookie(inode);
@@ -80,33 +72,6 @@ static inline void cifs_invalidate_cache(struct inode *inode, unsigned int flags
 			   i_size_read(inode), flags);
 }
 
-static inline int cifs_readpage_from_fscache(struct inode *inode,
-					     struct page *page)
-{
-	if (cifs_inode_cookie(inode))
-		return __cifs_readpage_from_fscache(inode, page);
-
-	return -ENOBUFS;
-}
-
-static inline int cifs_readpages_from_fscache(struct inode *inode,
-					      struct address_space *mapping,
-					      struct list_head *pages,
-					      unsigned *nr_pages)
-{
-	if (cifs_inode_cookie(inode))
-		return __cifs_readpages_from_fscache(inode, mapping, pages,
-						     nr_pages);
-	return -ENOBUFS;
-}
-
-static inline void cifs_readpage_to_fscache(struct inode *inode,
-					    struct page *page)
-{
-	if (PageFsCache(page))
-		__cifs_readpage_to_fscache(inode, page);
-}
-
 #else /* CONFIG_CIFS_FSCACHE */
 static inline
 void cifs_fscache_fill_coherency(struct inode *inode,
@@ -123,23 +88,6 @@ static inline void cifs_fscache_unuse_inode_cookie(struct inode *inode, bool upd
 static inline struct fscache_cookie *cifs_inode_cookie(struct inode *inode) { return NULL; }
 static inline void cifs_invalidate_cache(struct inode *inode, unsigned int flags) {}
 
-static inline int
-cifs_readpage_from_fscache(struct inode *inode, struct page *page)
-{
-	return -ENOBUFS;
-}
-
-static inline int cifs_readpages_from_fscache(struct inode *inode,
-					      struct address_space *mapping,
-					      struct list_head *pages,
-					      unsigned *nr_pages)
-{
-	return -ENOBUFS;
-}
-
-static inline void cifs_readpage_to_fscache(struct inode *inode,
-			struct page *page) {}
-
 #endif /* CONFIG_CIFS_FSCACHE */
 
 #endif /* _CIFS_FSCACHE_H */
diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index 7d8b3ceb2af3..b6a9ded9fbb2 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -26,6 +26,19 @@
 #include "fs_context.h"
 #include "cifs_ioctl.h"
 
+/*
+ * Set parameters for the netfs library
+ */
+static void cifs_set_netfs_context(struct inode *inode)
+{
+	struct netfs_i_context *ctx = netfs_i_context(inode);
+	struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
+
+	netfs_i_context_init(inode, &cifs_req_ops);
+	ctx->rsize = cifs_sb->ctx->rsize;
+	ctx->wsize = cifs_sb->ctx->wsize;
+}
+
 static void cifs_set_ops(struct inode *inode)
 {
 	struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
@@ -209,8 +222,10 @@ cifs_fattr_to_inode(struct inode *inode, struct cifs_fattr *fattr)
 
 	if (fattr->cf_flags & CIFS_FATTR_DFS_REFERRAL)
 		inode->i_flags |= S_AUTOMOUNT;
-	if (inode->i_state & I_NEW)
+	if (inode->i_state & I_NEW) {
+		cifs_set_netfs_context(inode);
 		cifs_set_ops(inode);
+	}
 	return 0;
 }
 
diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index ebbea7526ee2..0d76cffb4e75 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -23,6 +23,7 @@
 #include <linux/uuid.h>
 #include <linux/pagemap.h>
 #include <linux/xattr.h>
+#include <linux/netfs.h>
 #include "cifsglob.h"
 #include "cifsacl.h"
 #include "cifsproto.h"
@@ -4185,7 +4186,19 @@ smb2_readv_callback(struct mid_q_entry *mid)
 				     tcon->tid, tcon->ses->Suid,
 				     rdata->offset, rdata->got_bytes);
 
-	queue_work(cifsiod_wq, &rdata->work);
+	if (rdata->result == -ENODATA) {
+		/* We may have got an EOF error because fallocate
+		 * failed to enlarge the file.
+		 */
+		if (rdata->subreq->start < rdata->subreq->rreq->i_size)
+			rdata->result = 0;
+	}
+	if (rdata->result == 0 || rdata->result == -EAGAIN)
+		iov_iter_advance(&rdata->subreq->iter, rdata->got_bytes);
+	netfs_subreq_terminated(rdata->subreq,
+				(rdata->result == 0 || rdata->result == -EAGAIN) ?
+				rdata->got_bytes : rdata->result, false);
+	kref_put(&rdata->refcount, cifs_readdata_release);
 	DeleteMidQEntry(mid);
 	add_credits(server, &credits, 0);
 }
diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index df13c9b22ca8..1fa242140dc4 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -553,8 +553,13 @@ static void netfs_rreq_assess_dio(struct netfs_read_request *rreq)
 	list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
 		if (subreq->error || subreq->transferred == 0)
 			break;
-		for (i = 0; i < subreq->bv_count; i++)
+		for (i = 0; i < subreq->bv_count; i++) {
 			flush_dcache_page(subreq->bv[i].bv_page);
+			// TODO: cifs marks pages in the destination buffer
+			// dirty under some circumstances after a read.  Do we
+			// need to do that too?
+			set_page_dirty(subreq->bv[i].bv_page);
+		}
 		transferred += subreq->transferred;
 		if (subreq->transferred < subreq->len)
 			break;
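
The window arithmetic in cifs_expand_readahead() in the patch above can be modelled in userspace. The following is only a sketch under stated assumptions: struct ra_window, roundup_pow2() and expand_window() are hypothetical names invented for illustration (they stand in for rreq->start/rreq->len and the kernel's roundup_pow_of_two()/round_up()), and page_size and i_size are passed in as plain parameters rather than read from the inode.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the request window (rreq->start, rreq->len). */
struct ra_window {
	uint64_t start;
	uint64_t len;
};

/* Smallest power of two >= v, for v >= 1 (stand-in for roundup_pow_of_two). */
static uint64_t roundup_pow2(uint64_t v)
{
	uint64_t p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

static void expand_window(struct ra_window *w, uint32_t rsize,
			  uint32_t page_size, uint64_t i_size)
{
	uint64_t unit, misalignment;

	assert(page_size > 0);
	if (rsize < page_size)
		return;		/* rsize smaller than a page: leave the window alone */

	/* Pick a power-of-two granule, capped at 2^30 for a huge rsize. */
	if (rsize < INT32_MAX)
		unit = roundup_pow2(rsize);
	else
		unit = ((uint64_t)INT32_MAX + 1) / 2;

	/* Pull the start back to a granule boundary. */
	misalignment = w->start & (unit - 1);
	w->start -= misalignment;
	w->len += misalignment;

	/* Round the length up to a whole number of granules. */
	w->len = (w->len + unit - 1) & ~(unit - 1);

	/* Trim the window so that it does not run past end of file. */
	if (w->start < i_size && w->len > i_size - w->start)
		w->len = i_size - w->start;
}
```

For instance, with rsize 60000 the granule becomes 65536, so a 4096-byte request at offset 5000 in a 10000-byte file is widened to start at 0 and is then trimmed to the 10000 bytes that exist.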




* Re: [RFC PATCH 1/7] cifs: Transition from ->readpages() to ->readahead()
  2022-01-25 13:57 ` [RFC PATCH 1/7] cifs: Transition from ->readpages() to ->readahead() David Howells
@ 2022-01-25 14:20   ` Matthew Wilcox
  2022-01-25 14:57   ` David Howells
  1 sibling, 0 replies; 20+ messages in thread
From: Matthew Wilcox @ 2022-01-25 14:20 UTC (permalink / raw)
  To: David Howells
  Cc: smfrench, nspmangalore, Jeff Layton, linux-cifs, linux-cachefs,
	linux-fsdevel

On Tue, Jan 25, 2022 at 01:57:44PM +0000, David Howells wrote:
> +	while (readahead_count(ractl) - ractl->_batch_count) {

You do understand that prefixing a structure member with an '_' means
"Don't use this", right?  If I could get the compiler to prevent you, I
would.



* Re: [RFC PATCH 1/7] cifs: Transition from ->readpages() to ->readahead()
  2022-01-25 13:57 ` [RFC PATCH 1/7] cifs: Transition from ->readpages() to ->readahead() David Howells
  2022-01-25 14:20   ` Matthew Wilcox
@ 2022-01-25 14:57   ` David Howells
  1 sibling, 0 replies; 20+ messages in thread
From: David Howells @ 2022-01-25 14:57 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: dhowells, smfrench, nspmangalore, Jeff Layton, linux-cifs,
	linux-cachefs, linux-fsdevel

Matthew Wilcox <willy@infradead.org> wrote:

> On Tue, Jan 25, 2022 at 01:57:44PM +0000, David Howells wrote:
> > +	while (readahead_count(ractl) - ractl->_batch_count) {
> 
> You do understand that prefixing a structure member with an '_' means
> "Don't use this", right?  If I could get the compiler to prevent you, I
> would.

Yes, I know.  However, as previously discussed, I think that your
implementation of readahead batching doesn't work right - hence the need to
apply compensation to the values returned by the accessor functions.

Btw, I end up doing this:

		for (i = 0; i < nr_pages; i++)
			if (!readahead_folio(ractl))
				BUG();

in patch 5.  I want to create a batch, but I don't want to be given the array
of addresses of the folios as I'm going to use an xarray-class iterator.
Further, _batch_count at this point is some value related to just the last
folio and not the batch as a whole:-/

(Also, the above won't work if any folios retrieved are larger than a page)

Note that cifs_readahead() is removed in patch 7 and readahead functionality
is offloaded to netfslib, so I'm not sure it's worth spending much time on
fixing.

[I should mention that netfs_readahead() also does:

	while (readahead_folio(ractl))
		;

which could probably be replaced with something better that doesn't keep
taking and dropping the RCU read lock.]

Would you object if I added a function like:

	static inline
	void readahead_commit_batch(struct readahead_control *rac)
	{
		BUG_ON(rac->_batch_count > rac->_nr_pages);
		rac->_nr_pages -= rac->_batch_count;
		rac->_index += rac->_batch_count;
		rac->_batch_count = 0;
	}

It could then be called from both __readahead_folio() and __readahead_batch().
For __readahead_folio(), the duplicate setting of _batch_count should be
optimised away on the path where a folio is returned.  I could then call this
from the loop in cifs before going round again.
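
As a rough illustration of the bookkeeping that such a commit helper would do, here is a userspace model. It is a sketch only: struct ra_model and the ra_model_*() functions are hypothetical stand-ins for the private readahead_control fields, and assert() takes the place of BUG_ON().

```c
#include <assert.h>

/* Hypothetical userspace stand-in for the readahead_control cursor fields. */
struct ra_model {
	unsigned long index;		/* models rac->_index */
	unsigned int nr_pages;		/* models rac->_nr_pages */
	unsigned int batch_count;	/* models rac->_batch_count */
};

/* Fold the pending batch into the cursor, as the proposed helper would. */
static void ra_model_commit_batch(struct ra_model *rac)
{
	assert(rac->batch_count <= rac->nr_pages);	/* stands in for BUG_ON() */
	rac->nr_pages -= rac->batch_count;
	rac->index += rac->batch_count;
	rac->batch_count = 0;
}

/* Pages still to be consumed once the pending batch is accounted for. */
static unsigned int ra_model_remaining(const struct ra_model *rac)
{
	return rac->nr_pages - rac->batch_count;
}
```

In this model, committing leaves the value of ra_model_remaining() unchanged. That is why a loop that commits before going round again could test the accessor value directly instead of subtracting _batch_count by hand.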

I'd also like to consider adding something like:

	static inline
	void readahead_set_batch(struct readahead_control *rac)
	{
		unsigned int i = 0;
		struct page *page;
		XA_STATE(xas, &rac->mapping->i_pages, 0);

		BUG_ON(rac->_batch_count > rac->_nr_pages);
		rac->_nr_pages -= rac->_batch_count;
		rac->_index += rac->_batch_count;
		rac->_batch_count = 0;

		xas_set(&xas, rac->_index);
		rcu_read_lock();
		xas_for_each(&xas, page, rac->_index + rac->_nr_pages - 1) {
			if (xas_retry(&xas, page))
				continue;
			VM_BUG_ON_PAGE(!PageLocked(page), page);
			VM_BUG_ON_PAGE(PageTail(page), page);
			rac->_batch_count += thp_nr_pages(page);
		}
		rcu_read_unlock();
	}

so that netfslib can use it to load all the pages it is given into a batch
without retrieving the page pointers.

David



* Re: [RFC PATCH 3/7] cifs: Change the I/O paths to use an iterator rather than a page list
  2022-01-25 13:57 ` [RFC PATCH 3/7] cifs: Change the I/O paths to use an iterator rather than a page list David Howells
@ 2022-01-31  5:06   ` Rohith Surabattula
  2022-01-31  5:48     ` Shyam Prasad N
  2022-02-14 16:06   ` David Howells
  1 sibling, 1 reply; 20+ messages in thread
From: Rohith Surabattula @ 2022-01-31  5:06 UTC (permalink / raw)
  To: David Howells
  Cc: smfrench, nspmangalore, jlayton, linux-cifs, linux-cachefs,
	linux-fsdevel

Hi David,

After the buffer is copied to the XArray iterator, the "got_bytes" field is
not updated. As a result, a read of less than a page of data fails.
The patch below fixes this.

diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index e1649ac194db..5faf45672891 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -4917,6 +4917,7 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
                length = copy_to_iter(buf + data_offset, data_len, &rdata->iter);
                if (length < 0)
                        return length;
+               rdata->got_bytes = data_len;
        } else {
                /* read response payload cannot be in both buf and pages */
                WARN_ONCE(1, "buf can not contain only a part of read data");
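
The effect of the missing assignment can be illustrated with a small userspace model. This is a sketch under assumptions, not the cifs code: struct read_model, copy_payload() and completion_value() are hypothetical names, a flat buffer stands in for rdata->iter, and the model records the number of bytes actually copied whereas the patch above assigns data_len.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the parts of cifs_readdata that matter here. */
struct read_model {
	char *dst;		/* destination buffer (stands in for rdata->iter) */
	size_t dst_len;		/* space available in dst */
	long got_bytes;		/* models rdata->got_bytes */
	int result;		/* models rdata->result: 0 or a negative errno */
};

/* Copy the in-buffer payload and record how much data arrived. */
static long copy_payload(struct read_model *rd, const char *buf, size_t data_len)
{
	size_t n = data_len < rd->dst_len ? data_len : rd->dst_len;

	assert(rd->dst && buf);
	memcpy(rd->dst, buf, n);
	rd->got_bytes = (long)n;	/* the step the fix adds */
	return (long)n;
}

/* What a completion handler would report: byte count on success, else the error. */
static long completion_value(const struct read_model *rd)
{
	return rd->result == 0 ? rd->got_bytes : rd->result;
}
```

Without the got_bytes assignment, completion_value() would report 0 bytes for a successful short read, which matches the failure described above.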

Regards,
Rohith

On Wed, Jan 26, 2022 at 1:21 AM David Howells <dhowells@redhat.com> wrote:
>
>
> ---
>
>  fs/cifs/cifsencrypt.c |   40 +++--
>  fs/cifs/cifsfs.c      |    2
>  fs/cifs/cifsfs.h      |    3
>  fs/cifs/cifsglob.h    |   28 +---
>  fs/cifs/cifsproto.h   |   10 +
>  fs/cifs/cifssmb.c     |  224 +++++++++++++++++++-----------
>  fs/cifs/connect.c     |   16 ++
>  fs/cifs/misc.c        |   19 ---
>  fs/cifs/smb2ops.c     |  365 ++++++++++++++++++++++++-------------------------
>  fs/cifs/smb2pdu.c     |   12 --
>  fs/cifs/transport.c   |   37 +----
>  11 files changed, 379 insertions(+), 377 deletions(-)
>
> diff --git a/fs/cifs/cifsencrypt.c b/fs/cifs/cifsencrypt.c
> index 0912d8bbbac1..69bbf3d6c4d4 100644
> --- a/fs/cifs/cifsencrypt.c
> +++ b/fs/cifs/cifsencrypt.c
> @@ -24,12 +24,27 @@
>  #include "../smbfs_common/arc4.h"
>  #include <crypto/aead.h>
>
> +static ssize_t cifs_signature_scan(struct iov_iter *i, const void *p,
> +                                  size_t len, size_t off, void *priv)
> +{
> +       struct shash_desc *shash = priv;
> +       int rc;
> +
> +       rc = crypto_shash_update(shash, p, len);
> +       if (rc) {
> +               cifs_dbg(VFS, "%s: Could not update with payload\n", __func__);
> +               return rc;
> +       }
> +
> +       return len;
> +}
> +
>  int __cifs_calc_signature(struct smb_rqst *rqst,
>                         struct TCP_Server_Info *server, char *signature,
>                         struct shash_desc *shash)
>  {
>         int i;
> -       int rc;
> +       ssize_t rc;
>         struct kvec *iov = rqst->rq_iov;
>         int n_vec = rqst->rq_nvec;
>         int is_smb2 = server->vals->header_preamble_size == 0;
> @@ -62,25 +77,10 @@ int __cifs_calc_signature(struct smb_rqst *rqst,
>                 }
>         }
>
> -       /* now hash over the rq_pages array */
> -       for (i = 0; i < rqst->rq_npages; i++) {
> -               void *kaddr;
> -               unsigned int len, offset;
> -
> -               rqst_page_get_length(rqst, i, &len, &offset);
> -
> -               kaddr = (char *) kmap(rqst->rq_pages[i]) + offset;
> -
> -               rc = crypto_shash_update(shash, kaddr, len);
> -               if (rc) {
> -                       cifs_dbg(VFS, "%s: Could not update with payload\n",
> -                                __func__);
> -                       kunmap(rqst->rq_pages[i]);
> -                       return rc;
> -               }
> -
> -               kunmap(rqst->rq_pages[i]);
> -       }
> +       rc = iov_iter_scan(&rqst->rq_iter, iov_iter_count(&rqst->rq_iter),
> +                          cifs_signature_scan, shash);
> +       if (rc < 0)
> +               return rc;
>
>         rc = crypto_shash_final(shash, signature);
>         if (rc)
> diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> index 199edac0cb59..a56cb9c8c5ff 100644
> --- a/fs/cifs/cifsfs.c
> +++ b/fs/cifs/cifsfs.c
> @@ -935,7 +935,7 @@ cifs_loose_read_iter(struct kiocb *iocb, struct iov_iter *iter)
>         ssize_t rc;
>         struct inode *inode = file_inode(iocb->ki_filp);
>
> -       if (iocb->ki_filp->f_flags & O_DIRECT)
> +       if (iocb->ki_flags & IOCB_DIRECT)
>                 return cifs_user_readv(iocb, iter);
>
>         rc = cifs_revalidate_mapping(inode);
> diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
> index 15a5c5db038b..1c77bbc0815f 100644
> --- a/fs/cifs/cifsfs.h
> +++ b/fs/cifs/cifsfs.h
> @@ -110,6 +110,9 @@ extern int cifs_file_strict_mmap(struct file * , struct vm_area_struct *);
>  extern const struct file_operations cifs_dir_ops;
>  extern int cifs_dir_open(struct inode *inode, struct file *file);
>  extern int cifs_readdir(struct file *file, struct dir_context *ctx);
> +extern void cifs_pages_written_back(struct inode *inode, loff_t start, unsigned int len);
> +extern void cifs_pages_write_failed(struct inode *inode, loff_t start, unsigned int len);
> +extern void cifs_pages_write_redirty(struct inode *inode, loff_t start, unsigned int len);
>
>  /* Functions related to dir entries */
>  extern const struct dentry_operations cifs_dentry_ops;
> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> index 0a4085ced40f..3a4fed645636 100644
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -191,11 +191,8 @@ struct cifs_cred {
>  struct smb_rqst {
>         struct kvec     *rq_iov;        /* array of kvecs */
>         unsigned int    rq_nvec;        /* number of kvecs in array */
> -       struct page     **rq_pages;     /* pointer to array of page ptrs */
> -       unsigned int    rq_offset;      /* the offset to the 1st page */
> -       unsigned int    rq_npages;      /* number pages in array */
> -       unsigned int    rq_pagesz;      /* page size to use */
> -       unsigned int    rq_tailsz;      /* length of last page */
> +       struct iov_iter rq_iter;        /* Data iterator */
> +       struct xarray   rq_buffer;      /* Page buffer for encryption */
>  };
>
>  struct mid_q_entry;
> @@ -1323,28 +1320,18 @@ struct cifs_readdata {
>         struct address_space            *mapping;
>         struct cifs_aio_ctx             *ctx;
>         __u64                           offset;
> +       ssize_t                         got_bytes;
>         unsigned int                    bytes;
> -       unsigned int                    got_bytes;
>         pid_t                           pid;
>         int                             result;
>         struct work_struct              work;
> -       int (*read_into_pages)(struct TCP_Server_Info *server,
> -                               struct cifs_readdata *rdata,
> -                               unsigned int len);
> -       int (*copy_into_pages)(struct TCP_Server_Info *server,
> -                               struct cifs_readdata *rdata,
> -                               struct iov_iter *iter);
> +       struct iov_iter                 iter;
>         struct kvec                     iov[2];
>         struct TCP_Server_Info          *server;
>  #ifdef CONFIG_CIFS_SMB_DIRECT
>         struct smbd_mr                  *mr;
>  #endif
> -       unsigned int                    pagesz;
> -       unsigned int                    page_offset;
> -       unsigned int                    tailsz;
>         struct cifs_credits             credits;
> -       unsigned int                    nr_pages;
> -       struct page                     **pages;
>  };
>
>  /* asynchronous write support */
> @@ -1356,6 +1343,8 @@ struct cifs_writedata {
>         struct work_struct              work;
>         struct cifsFileInfo             *cfile;
>         struct cifs_aio_ctx             *ctx;
> +       struct iov_iter                 iter;
> +       struct bio_vec                  *bv;
>         __u64                           offset;
>         pid_t                           pid;
>         unsigned int                    bytes;
> @@ -1364,12 +1353,7 @@ struct cifs_writedata {
>  #ifdef CONFIG_CIFS_SMB_DIRECT
>         struct smbd_mr                  *mr;
>  #endif
> -       unsigned int                    pagesz;
> -       unsigned int                    page_offset;
> -       unsigned int                    tailsz;
>         struct cifs_credits             credits;
> -       unsigned int                    nr_pages;
> -       struct page                     **pages;
>  };
>
>  /*
> diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
> index d3701295402d..1b143f0a03c0 100644
> --- a/fs/cifs/cifsproto.h
> +++ b/fs/cifs/cifsproto.h
> @@ -242,6 +242,9 @@ extern int cifs_read_page_from_socket(struct TCP_Server_Info *server,
>                                         unsigned int page_offset,
>                                         unsigned int to_read);
>  extern int cifs_setup_cifs_sb(struct cifs_sb_info *cifs_sb);
> +extern int cifs_read_iter_from_socket(struct TCP_Server_Info *server,
> +                                     struct iov_iter *iter,
> +                                     unsigned int to_read);
>  extern int cifs_match_super(struct super_block *, void *);
>  extern int cifs_mount(struct cifs_sb_info *cifs_sb, struct smb3_fs_context *ctx);
>  extern void cifs_umount(struct cifs_sb_info *);
> @@ -575,10 +578,7 @@ int cifs_readv_receive(struct TCP_Server_Info *server, struct mid_q_entry *mid);
>  int cifs_async_writev(struct cifs_writedata *wdata,
>                       void (*release)(struct kref *kref));
>  void cifs_writev_complete(struct work_struct *work);
> -struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages,
> -                                               work_func_t complete);
> -struct cifs_writedata *cifs_writedata_direct_alloc(struct page **pages,
> -                                               work_func_t complete);
> +struct cifs_writedata *cifs_writedata_alloc(work_func_t complete);
>  void cifs_writedata_release(struct kref *refcount);
>  int cifs_query_mf_symlink(unsigned int xid, struct cifs_tcon *tcon,
>                           struct cifs_sb_info *cifs_sb,
> @@ -602,8 +602,6 @@ int cifs_alloc_hash(const char *name, struct crypto_shash **shash,
>                     struct sdesc **sdesc);
>  void cifs_free_hash(struct crypto_shash **shash, struct sdesc **sdesc);
>
> -extern void rqst_page_get_length(struct smb_rqst *rqst, unsigned int page,
> -                               unsigned int *len, unsigned int *offset);
>  struct cifs_chan *
>  cifs_ses_find_chan(struct cifs_ses *ses, struct TCP_Server_Info *server);
>  int cifs_try_adding_channels(struct cifs_sb_info *cifs_sb, struct cifs_ses *ses);
> diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
> index 071e2f21a7db..38e7276352e2 100644
> --- a/fs/cifs/cifssmb.c
> +++ b/fs/cifs/cifssmb.c
> @@ -24,6 +24,7 @@
>  #include <linux/task_io_accounting_ops.h>
>  #include <linux/uaccess.h>
>  #include "cifspdu.h"
> +#include "cifsfs.h"
>  #include "cifsglob.h"
>  #include "cifsacl.h"
>  #include "cifsproto.h"
> @@ -1388,11 +1389,11 @@ int
>  cifs_discard_remaining_data(struct TCP_Server_Info *server)
>  {
>         unsigned int rfclen = server->pdu_size;
> -       int remaining = rfclen + server->vals->header_preamble_size -
> +       size_t remaining = rfclen + server->vals->header_preamble_size -
>                 server->total_read;
>
>         while (remaining > 0) {
> -               int length;
> +               ssize_t length;
>
>                 length = cifs_discard_from_socket(server,
>                                 min_t(size_t, remaining,
> @@ -1539,10 +1540,15 @@ cifs_readv_receive(struct TCP_Server_Info *server, struct mid_q_entry *mid)
>                 return cifs_readv_discard(server, mid);
>         }
>
> -       length = rdata->read_into_pages(server, rdata, data_len);
> -       if (length < 0)
> -               return length;
> -
> +#ifdef CONFIG_CIFS_SMB_DIRECT
> +       if (rdata->mr)
> +               length = data_len; /* An RDMA read is already done. */
> +       else
> +#endif
> +               length = cifs_read_iter_from_socket(server, &rdata->iter,
> +                                                   data_len);
> +       if (length > 0)
> +               rdata->got_bytes += length;
>         server->total_read += length;
>
>         cifs_dbg(FYI, "total_read=%u buflen=%u remaining=%u\n",
> @@ -1566,11 +1572,7 @@ cifs_readv_callback(struct mid_q_entry *mid)
>         struct TCP_Server_Info *server = tcon->ses->server;
>         struct smb_rqst rqst = { .rq_iov = rdata->iov,
>                                  .rq_nvec = 2,
> -                                .rq_pages = rdata->pages,
> -                                .rq_offset = rdata->page_offset,
> -                                .rq_npages = rdata->nr_pages,
> -                                .rq_pagesz = rdata->pagesz,
> -                                .rq_tailsz = rdata->tailsz };
> +                                .rq_iter = rdata->iter };
>         struct cifs_credits credits = { .value = 1, .instance = 0 };
>
>         cifs_dbg(FYI, "%s: mid=%llu state=%d result=%d bytes=%u\n",
> @@ -1925,10 +1927,93 @@ cifs_writedata_release(struct kref *refcount)
>         if (wdata->cfile)
>                 cifsFileInfo_put(wdata->cfile);
>
> -       kvfree(wdata->pages);
>         kfree(wdata);
>  }
>
> +/*
> + * Completion of write to server.
> + */
> +void cifs_pages_written_back(struct inode *inode, loff_t start, unsigned int len)
> +{
> +       struct address_space *mapping = inode->i_mapping;
> +       struct folio *folio;
> +       pgoff_t end;
> +
> +       XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);
> +
> +       rcu_read_lock();
> +
> +       end = (start + len - 1) / PAGE_SIZE;
> +       xas_for_each(&xas, folio, end) {
> +               if (!folio_test_writeback(folio)) {
> +                       pr_err("bad %x @%llx page %lx %lx\n",
> +                              len, start, folio_index(folio), end);
> +                       BUG();
> +               }
> +
> +               folio_detach_private(folio);
> +               folio_end_writeback(folio);
> +       }
> +
> +       rcu_read_unlock();
> +}
> +
> +/*
> + * Failure of write to server.
> + */
> +void cifs_pages_write_failed(struct inode *inode, loff_t start, unsigned int len)
> +{
> +       struct address_space *mapping = inode->i_mapping;
> +       struct folio *folio;
> +       pgoff_t end;
> +
> +       XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);
> +
> +       rcu_read_lock();
> +
> +       end = (start + len - 1) / PAGE_SIZE;
> +       xas_for_each(&xas, folio, end) {
> +               if (!folio_test_writeback(folio)) {
> +                       pr_err("bad %x @%llx page %lx %lx\n",
> +                              len, start, folio_index(folio), end);
> +                       BUG();
> +               }
> +
> +               folio_set_error(folio);
> +               folio_end_writeback(folio);
> +       }
> +
> +       rcu_read_unlock();
> +}
> +
> +/*
> + * Redirty pages after a temporary failure.
> + */
> +void cifs_pages_write_redirty(struct inode *inode, loff_t start, unsigned int len)
> +{
> +       struct address_space *mapping = inode->i_mapping;
> +       struct folio *folio;
> +       pgoff_t end;
> +
> +       XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);
> +
> +       rcu_read_lock();
> +
> +       end = (start + len - 1) / PAGE_SIZE;
> +       xas_for_each(&xas, folio, end) {
> +               if (!folio_test_writeback(folio)) {
> +                       pr_err("bad %x @%llx page %lx %lx\n",
> +                              len, start, folio_index(folio), end);
> +                       BUG();
> +               }
> +
> +               filemap_dirty_folio(folio->mapping, folio);
> +               folio_end_writeback(folio);
> +       }
> +
> +       rcu_read_unlock();
> +}
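
All three helpers above walk the same inclusive page-index range, from
start / PAGE_SIZE to (start + len - 1) / PAGE_SIZE.  A minimal userspace
sketch of that arithmetic follows.  The names sketch_page_span and
SKETCH_PAGE_SIZE are hypothetical, and the 4 KiB page size is an
assumption.  Note that with len == 0 the expression start + len - 1 names
the byte before start, so the range is not necessarily empty; the sketch
treats that case explicitly, which is a guess at the intended behaviour
rather than something the patch states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/*
 * Compute the inclusive page-index range covered by the byte range
 * [start, start + len).  Returns the number of pages, or 0 (leaving
 * *first and *last untouched) when len is 0.
 */
size_t sketch_page_span(uint64_t start, uint32_t len,
			uint64_t *first, uint64_t *last)
{
	if (len == 0)
		return 0;
	*first = start / SKETCH_PAGE_SIZE;
	*last = (start + len - 1) / SKETCH_PAGE_SIZE;
	assert(*last >= *first);
	return (size_t)(*last - *first + 1);
}
```

For example, a 2-byte range starting at offset 4095 straddles a page
boundary and so covers page indices 0 and 1.
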
> +
>  /*
>   * Write failed with a retryable error. Resend the write request. It's also
>   * possible that the page was redirtied so re-clean the page.
> @@ -1936,51 +2021,56 @@ cifs_writedata_release(struct kref *refcount)
>  static void
>  cifs_writev_requeue(struct cifs_writedata *wdata)
>  {
> -       int i, rc = 0;
> +       int rc = 0;
>         struct inode *inode = d_inode(wdata->cfile->dentry);
>         struct TCP_Server_Info *server;
> -       unsigned int rest_len;
> +       unsigned int rest_len = wdata->bytes;
> +       loff_t fpos = wdata->offset;
>
>         server = tlink_tcon(wdata->cfile->tlink)->ses->server;
> -       i = 0;
> -       rest_len = wdata->bytes;
>         do {
>                 struct cifs_writedata *wdata2;
> -               unsigned int j, nr_pages, wsize, tailsz, cur_len;
> +               unsigned int wsize, cur_len;
>
>                 wsize = server->ops->wp_retry_size(inode);
>                 if (wsize < rest_len) {
> -                       nr_pages = wsize / PAGE_SIZE;
> -                       if (!nr_pages) {
> +                       if (wsize < PAGE_SIZE) {
>                                 rc = -ENOTSUPP;
>                                 break;
>                         }
> -                       cur_len = nr_pages * PAGE_SIZE;
> -                       tailsz = PAGE_SIZE;
> +                       cur_len = min(round_down(wsize, PAGE_SIZE), rest_len);
>                 } else {
> -                       nr_pages = DIV_ROUND_UP(rest_len, PAGE_SIZE);
>                         cur_len = rest_len;
> -                       tailsz = rest_len - (nr_pages - 1) * PAGE_SIZE;
>                 }
>
> -               wdata2 = cifs_writedata_alloc(nr_pages, cifs_writev_complete);
> +               wdata2 = cifs_writedata_alloc(cifs_writev_complete);
>                 if (!wdata2) {
>                         rc = -ENOMEM;
>                         break;
>                 }
>
> -               for (j = 0; j < nr_pages; j++) {
> -                       wdata2->pages[j] = wdata->pages[i + j];
> -                       lock_page(wdata2->pages[j]);
> -                       clear_page_dirty_for_io(wdata2->pages[j]);
> -               }
> -
>                 wdata2->sync_mode = wdata->sync_mode;
> -               wdata2->nr_pages = nr_pages;
> -               wdata2->offset = page_offset(wdata2->pages[0]);
> -               wdata2->pagesz = PAGE_SIZE;
> -               wdata2->tailsz = tailsz;
> -               wdata2->bytes = cur_len;
> +               wdata2->offset  = fpos;
> +               wdata2->bytes   = cur_len;
> +               wdata2->iter    = wdata->iter;
> +
> +               iov_iter_advance(&wdata2->iter, fpos - wdata->offset);
> +               iov_iter_truncate(&wdata2->iter, wdata2->bytes);
> +
> +#if 0
> +               if (iov_iter_is_xarray(&wdata2->iter)) {
> +                       /* TODO: Check for pages having been redirtied and
> +                        * clean them.  We can do this by walking the xarray.
> +                        * If it's not an xarray, then it's a DIO and we
> +                        * shouldn't be mucking around with the page bits.
> +                        */
> +                       for (j = 0; j < nr_pages; j++) {
> +                               wdata2->pages[j] = wdata->pages[i + j];
> +                               lock_page(wdata2->pages[j]);
> +                               clear_page_dirty_for_io(wdata2->pages[j]);
> +                       }
> +               }
> +#endif
>
>                 rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY,
>                                             &wdata2->cfile);
> @@ -1995,33 +2085,25 @@ cifs_writev_requeue(struct cifs_writedata *wdata)
>                                                        cifs_writedata_release);
>                 }
>
> -               for (j = 0; j < nr_pages; j++) {
> -                       unlock_page(wdata2->pages[j]);
> -                       if (rc != 0 && !is_retryable_error(rc)) {
> -                               SetPageError(wdata2->pages[j]);
> -                               end_page_writeback(wdata2->pages[j]);
> -                               put_page(wdata2->pages[j]);
> -                       }
> -               }
> +               if (iov_iter_is_xarray(&wdata2->iter))
> +                       cifs_pages_written_back(inode, wdata2->offset, wdata2->bytes);
>
>                 kref_put(&wdata2->refcount, cifs_writedata_release);
>                 if (rc) {
>                         if (is_retryable_error(rc))
>                                 continue;
> -                       i += nr_pages;
> +                       fpos += cur_len;
> +                       rest_len -= cur_len;
>                         break;
>                 }
>
> +               fpos += cur_len;
>                 rest_len -= cur_len;
> -               i += nr_pages;
> -       } while (i < wdata->nr_pages);
> +       } while (rest_len > 0);
>
> -       /* cleanup remaining pages from the original wdata */
> -       for (; i < wdata->nr_pages; i++) {
> -               SetPageError(wdata->pages[i]);
> -               end_page_writeback(wdata->pages[i]);
> -               put_page(wdata->pages[i]);
> -       }
> +       /* Clean up remaining pages from the original wdata */
> +       if (iov_iter_is_xarray(&wdata->iter))
> +               cifs_pages_written_back(inode, fpos, rest_len);
>
>         if (rc != 0 && !is_retryable_error(rc))
>                 mapping_set_error(inode->i_mapping, rc);
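
The requeue loop now tracks a file position and a remaining byte count
instead of a page index.  The following is a userspace sketch of just that
bookkeeping, assuming 4 KiB pages; sketch_split_write and struct
sketch_chunk are hypothetical names, and the real function additionally
sets up the iterator, takes a file reference and issues the write.  Unlike
the do/while in the patch, the sketch does nothing at all for a zero-length
range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

struct sketch_chunk {
	uint64_t offset;	/* file position of this resend */
	uint32_t bytes;		/* length of this resend */
};

/*
 * Split [offset, offset + bytes) into resend chunks no larger than wsize.
 * A chunk that does not finish the range is rounded down to a whole number
 * of pages.  Returns the number of chunks written to out[], or -1 if wsize
 * is smaller than a page while more than wsize bytes remain, or if out[]
 * is too small.
 */
int sketch_split_write(uint64_t offset, uint32_t bytes, uint32_t wsize,
		       struct sketch_chunk *out, size_t max_chunks)
{
	uint64_t fpos = offset;
	uint32_t rest_len = bytes;
	size_t n = 0;

	while (rest_len > 0) {
		uint32_t cur_len;

		if (wsize < rest_len) {
			if (wsize < SKETCH_PAGE_SIZE)
				return -1;
			/* round_down(wsize, PAGE_SIZE) */
			cur_len = wsize - (wsize % SKETCH_PAGE_SIZE);
		} else {
			cur_len = rest_len;
		}
		assert(cur_len > 0 && cur_len <= rest_len);

		if (n == max_chunks)
			return -1;
		out[n].offset = fpos;
		out[n].bytes = cur_len;
		n++;

		fpos += cur_len;
		rest_len -= cur_len;
	}
	return (int)n;
}
```

With a 10000-byte write and a retry size of 5000, this yields chunks of
4096, 4096 and 1808 bytes at offsets 0, 4096 and 8192.
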
> @@ -2034,7 +2116,6 @@ cifs_writev_complete(struct work_struct *work)
>         struct cifs_writedata *wdata = container_of(work,
>                                                 struct cifs_writedata, work);
>         struct inode *inode = d_inode(wdata->cfile->dentry);
> -       int i = 0;
>
>         if (wdata->result == 0) {
>                 spin_lock(&inode->i_lock);
> @@ -2045,40 +2126,25 @@ cifs_writev_complete(struct work_struct *work)
>         } else if (wdata->sync_mode == WB_SYNC_ALL && wdata->result == -EAGAIN)
>                 return cifs_writev_requeue(wdata);
>
> -       for (i = 0; i < wdata->nr_pages; i++) {
> -               struct page *page = wdata->pages[i];
> -               if (wdata->result == -EAGAIN)
> -                       __set_page_dirty_nobuffers(page);
> -               else if (wdata->result < 0)
> -                       SetPageError(page);
> -               end_page_writeback(page);
> -               cifs_readpage_to_fscache(inode, page);
> -               put_page(page);
> -       }
> +       if (wdata->result == -EAGAIN)
> +               cifs_pages_write_redirty(inode, wdata->offset, wdata->bytes);
> +       else if (wdata->result < 0)
> +               cifs_pages_write_failed(inode, wdata->offset, wdata->bytes);
> +       else
> +               cifs_pages_written_back(inode, wdata->offset, wdata->bytes);
> +
>         if (wdata->result != -EAGAIN)
>                 mapping_set_error(inode->i_mapping, wdata->result);
>         kref_put(&wdata->refcount, cifs_writedata_release);
>  }
>
>  struct cifs_writedata *
> -cifs_writedata_alloc(unsigned int nr_pages, work_func_t complete)
> -{
> -       struct page **pages =
> -               kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
> -       if (pages)
> -               return cifs_writedata_direct_alloc(pages, complete);
> -
> -       return NULL;
> -}
> -
> -struct cifs_writedata *
> -cifs_writedata_direct_alloc(struct page **pages, work_func_t complete)
> +cifs_writedata_alloc(work_func_t complete)
>  {
>         struct cifs_writedata *wdata;
>
>         wdata = kzalloc(sizeof(*wdata), GFP_NOFS);
>         if (wdata != NULL) {
> -               wdata->pages = pages;
>                 kref_init(&wdata->refcount);
>                 INIT_LIST_HEAD(&wdata->list);
>                 init_completion(&wdata->done);
> @@ -2186,11 +2252,7 @@ cifs_async_writev(struct cifs_writedata *wdata,
>
>         rqst.rq_iov = iov;
>         rqst.rq_nvec = 2;
> -       rqst.rq_pages = wdata->pages;
> -       rqst.rq_offset = wdata->page_offset;
> -       rqst.rq_npages = wdata->nr_pages;
> -       rqst.rq_pagesz = wdata->pagesz;
> -       rqst.rq_tailsz = wdata->tailsz;
> +       rqst.rq_iter = wdata->iter;
>
>         cifs_dbg(FYI, "async write at %llu %u bytes\n",
>                  wdata->offset, wdata->bytes);
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index ed210d774a21..d0851c9881b3 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -704,6 +704,22 @@ cifs_read_page_from_socket(struct TCP_Server_Info *server, struct page *page,
>         return cifs_readv_from_socket(server, &smb_msg);
>  }
>
> +int
> +cifs_read_iter_from_socket(struct TCP_Server_Info *server, struct iov_iter *iter,
> +                          unsigned int to_read)
> +{
> +       struct msghdr smb_msg;
> +       int ret;
> +
> +       smb_msg.msg_iter = *iter;
> +       if (smb_msg.msg_iter.count > to_read)
> +               smb_msg.msg_iter.count = to_read;
> +       ret = cifs_readv_from_socket(server, &smb_msg);
> +       if (ret > 0)
> +               iov_iter_advance(iter, ret);
> +       return ret;
> +}
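
The pattern in cifs_read_iter_from_socket() is: clamp a copy of the
iterator to to_read, let the receive path consume the copy, then advance
the caller's iterator by however many bytes actually arrived.  Here is a
userspace sketch of that pattern.  struct sketch_iter and sketch_recv are
hypothetical stand-ins for struct iov_iter and the socket read, not kernel
APIs, and the sketch has no error return.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct iov_iter: a cursor over one flat buffer. */
struct sketch_iter {
	unsigned char *pos;	/* next byte to fill */
	size_t count;		/* bytes of space left */
};

/*
 * Hypothetical stand-in for the socket read: fills a copy of the cursor
 * from src and reports how many bytes it delivered, which may be fewer
 * than the cursor has room for.
 */
size_t sketch_recv(struct sketch_iter it, const unsigned char *src, size_t avail)
{
	size_t n = it.count < avail ? it.count : avail;

	memcpy(it.pos, src, n);
	return n;
}

/* Read at most to_read bytes, then advance the caller's cursor. */
size_t sketch_read_iter(struct sketch_iter *iter, size_t to_read,
			const unsigned char *src, size_t avail)
{
	struct sketch_iter tmp = *iter;	/* work on a copy, as the patch does */
	size_t got;

	if (tmp.count > to_read)
		tmp.count = to_read;
	got = sketch_recv(tmp, src, avail);
	assert(got <= iter->count);

	/* Equivalent of iov_iter_advance(iter, ret) on the original. */
	iter->pos += got;
	iter->count -= got;
	return got;
}
```

Because only the copy is clamped, the caller's iterator keeps its full
remaining length and can be handed back for the next partial read.
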
> +
>  static bool
>  is_smb_response(struct TCP_Server_Info *server, unsigned char type)
>  {
> diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c
> index 56598f7dbe00..f5fe5720456a 100644
> --- a/fs/cifs/misc.c
> +++ b/fs/cifs/misc.c
> @@ -1122,25 +1122,6 @@ cifs_free_hash(struct crypto_shash **shash, struct sdesc **sdesc)
>         *shash = NULL;
>  }
>
> -/**
> - * rqst_page_get_length - obtain the length and offset for a page in smb_rqst
> - * @rqst: The request descriptor
> - * @page: The index of the page to query
> - * @len: Where to store the length for this page:
> - * @offset: Where to store the offset for this page
> - */
> -void rqst_page_get_length(struct smb_rqst *rqst, unsigned int page,
> -                               unsigned int *len, unsigned int *offset)
> -{
> -       *len = rqst->rq_pagesz;
> -       *offset = (page == 0) ? rqst->rq_offset : 0;
> -
> -       if (rqst->rq_npages == 1 || page == rqst->rq_npages-1)
> -               *len = rqst->rq_tailsz;
> -       else if (page == 0)
> -               *len = rqst->rq_pagesz - rqst->rq_offset;
> -}
> -
>  void extract_unc_hostname(const char *unc, const char **h, size_t *len)
>  {
>         const char *end;
> diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
> index af5d0830bc8a..e1649ac194db 100644
> --- a/fs/cifs/smb2ops.c
> +++ b/fs/cifs/smb2ops.c
> @@ -4406,15 +4406,30 @@ fill_transform_hdr(struct smb2_transform_hdr *tr_hdr, unsigned int orig_len,
>  static inline void smb2_sg_set_buf(struct scatterlist *sg, const void *buf,
>                                    unsigned int buflen)
>  {
> -       void *addr;
> +       struct page *page;
> +
>         /*
>          * VMAP_STACK (at least) puts stack into the vmalloc address space
>          */
>         if (is_vmalloc_addr(buf))
> -               addr = vmalloc_to_page(buf);
> +               page = vmalloc_to_page(buf);
>         else
> -               addr = virt_to_page(buf);
> -       sg_set_page(sg, addr, buflen, offset_in_page(buf));
> +               page = virt_to_page(buf);
> +       sg_set_page(sg, page, buflen, offset_in_page(buf));
> +}
> +
> +struct cifs_init_sg_priv {
> +               struct scatterlist *sg;
> +               unsigned int idx;
> +};
> +
> +static ssize_t cifs_init_sg_scan(struct iov_iter *i, const void *p,
> +                                size_t len, size_t off, void *_priv)
> +{
> +       struct cifs_init_sg_priv *priv = _priv;
> +
> +       smb2_sg_set_buf(&priv->sg[priv->idx++], p, len);
> +       return len;
>  }
>
>  /* Assumes the first rqst has a transform header as the first iov.
> @@ -4426,43 +4441,46 @@ static inline void smb2_sg_set_buf(struct scatterlist *sg, const void *buf,
>  static struct scatterlist *
>  init_sg(int num_rqst, struct smb_rqst *rqst, u8 *sign)
>  {
> +       struct cifs_init_sg_priv priv;
>         unsigned int sg_len;
> -       struct scatterlist *sg;
>         unsigned int i;
>         unsigned int j;
> -       unsigned int idx = 0;
> +       ssize_t rc;
>         int skip;
>
>         sg_len = 1;
> -       for (i = 0; i < num_rqst; i++)
> -               sg_len += rqst[i].rq_nvec + rqst[i].rq_npages;
> +       for (i = 0; i < num_rqst; i++) {
> +               unsigned int np = iov_iter_npages(&rqst[i].rq_iter, INT_MAX);
> +               sg_len += rqst[i].rq_nvec + np;
> +       }
>
> -       sg = kmalloc_array(sg_len, sizeof(struct scatterlist), GFP_KERNEL);
> -       if (!sg)
> +       priv.idx = 0;
> +       priv.sg = kmalloc_array(sg_len, sizeof(struct scatterlist), GFP_KERNEL);
> +       if (!priv.sg)
>                 return NULL;
>
> -       sg_init_table(sg, sg_len);
> +       sg_init_table(priv.sg, sg_len);
>         for (i = 0; i < num_rqst; i++) {
> +               struct iov_iter *iter = &rqst[i].rq_iter;
> +               size_t count = iov_iter_count(iter);
> +
>                 for (j = 0; j < rqst[i].rq_nvec; j++) {
>                         /*
>                          * The first rqst has a transform header where the
>                          * first 20 bytes are not part of the encrypted blob
>                          */
>                         skip = (i == 0) && (j == 0) ? 20 : 0;
> -                       smb2_sg_set_buf(&sg[idx++],
> +                       smb2_sg_set_buf(&priv.sg[priv.idx++],
>                                         rqst[i].rq_iov[j].iov_base + skip,
>                                         rqst[i].rq_iov[j].iov_len - skip);
> -                       }
> -
> -               for (j = 0; j < rqst[i].rq_npages; j++) {
> -                       unsigned int len, offset;
> -
> -                       rqst_page_get_length(&rqst[i], j, &len, &offset);
> -                       sg_set_page(&sg[idx++], rqst[i].rq_pages[j], len, offset);
>                 }
> +
> +               rc = iov_iter_scan(iter, count, cifs_init_sg_scan, &priv);
> +               iov_iter_revert(iter, count);
> +               WARN_ON(rc < 0);
>         }
> -       smb2_sg_set_buf(&sg[idx], sign, SMB2_SIGNATURE_SIZE);
> -       return sg;
> +       smb2_sg_set_buf(&priv.sg[priv.idx], sign, SMB2_SIGNATURE_SIZE);
> +       return priv.sg;
>  }
>
>  static int
> @@ -4599,18 +4617,30 @@ crypt_message(struct TCP_Server_Info *server, int num_rqst,
>         return rc;
>  }
>
> +/*
> + * Clear a read buffer, discarding the folios which have XA_MARK_0 set.
> + */
> +static void cifs_clear_xarray_buffer(struct xarray *buffer)
> +{
> +       struct folio *folio;
> +       XA_STATE(xas, buffer, 0);
> +
> +       rcu_read_lock();
> +       xas_for_each_marked(&xas, folio, ULONG_MAX, XA_MARK_0) {
> +               folio_put(folio);
> +       }
> +       rcu_read_unlock();
> +       xa_destroy(buffer);
> +}
> +
>  void
>  smb3_free_compound_rqst(int num_rqst, struct smb_rqst *rqst)
>  {
> -       int i, j;
> +       int i;
>
> -       for (i = 0; i < num_rqst; i++) {
> -               if (rqst[i].rq_pages) {
> -                       for (j = rqst[i].rq_npages - 1; j >= 0; j--)
> -                               put_page(rqst[i].rq_pages[j]);
> -                       kfree(rqst[i].rq_pages);
> -               }
> -       }
> +       for (i = 0; i < num_rqst; i++)
> +               if (!xa_empty(&rqst[i].rq_buffer))
> +                       cifs_clear_xarray_buffer(&rqst[i].rq_buffer);
>  }
>
>  /*
> @@ -4630,50 +4660,51 @@ static int
>  smb3_init_transform_rq(struct TCP_Server_Info *server, int num_rqst,
>                        struct smb_rqst *new_rq, struct smb_rqst *old_rq)
>  {
> -       struct page **pages;
>         struct smb2_transform_hdr *tr_hdr = new_rq[0].rq_iov[0].iov_base;
> -       unsigned int npages;
> +       struct page *page;
>         unsigned int orig_len = 0;
>         int i, j;
>         int rc = -ENOMEM;
>
>         for (i = 1; i < num_rqst; i++) {
> -               npages = old_rq[i - 1].rq_npages;
> -               pages = kmalloc_array(npages, sizeof(struct page *),
> -                                     GFP_KERNEL);
> -               if (!pages)
> -                       goto err_free;
> -
> -               new_rq[i].rq_pages = pages;
> -               new_rq[i].rq_npages = npages;
> -               new_rq[i].rq_offset = old_rq[i - 1].rq_offset;
> -               new_rq[i].rq_pagesz = old_rq[i - 1].rq_pagesz;
> -               new_rq[i].rq_tailsz = old_rq[i - 1].rq_tailsz;
> -               new_rq[i].rq_iov = old_rq[i - 1].rq_iov;
> -               new_rq[i].rq_nvec = old_rq[i - 1].rq_nvec;
> -
> -               orig_len += smb_rqst_len(server, &old_rq[i - 1]);
> -
> -               for (j = 0; j < npages; j++) {
> -                       pages[j] = alloc_page(GFP_KERNEL|__GFP_HIGHMEM);
> -                       if (!pages[j])
> -                               goto err_free;
> -               }
> -
> -               /* copy pages form the old */
> -               for (j = 0; j < npages; j++) {
> -                       char *dst, *src;
> -                       unsigned int offset, len;
> -
> -                       rqst_page_get_length(&new_rq[i], j, &len, &offset);
> -
> -                       dst = (char *) kmap(new_rq[i].rq_pages[j]) + offset;
> -                       src = (char *) kmap(old_rq[i - 1].rq_pages[j]) + offset;
> +               struct smb_rqst *old = &old_rq[i - 1];
> +               struct smb_rqst *new = &new_rq[i];
> +               struct xarray *buffer = &new->rq_buffer;
> +               unsigned int npages;
> +               size_t size = iov_iter_count(&old->rq_iter), seg, copied = 0;
> +
> +               xa_init(buffer);
> +
> +               if (size > 0) {
> +                       npages = DIV_ROUND_UP(size, PAGE_SIZE);
> +                       for (j = 0; j < npages; j++) {
> +                               void *o;
> +
> +                               rc = -ENOMEM;
> +                               page = alloc_page(GFP_KERNEL|__GFP_HIGHMEM);
> +                               if (!page)
> +                                       goto err_free;
> +                               page->index = j;
> +                               o = xa_store(buffer, j, page, GFP_KERNEL);
> +                               if (xa_is_err(o)) {
> +                                       rc = xa_err(o);
> +                                       put_page(page);
> +                                       goto err_free;
> +                               }
>
> -                       memcpy(dst, src, len);
> -                       kunmap(new_rq[i].rq_pages[j]);
> -                       kunmap(old_rq[i - 1].rq_pages[j]);
> +                               seg = min(size - copied, PAGE_SIZE);
> +                               if (copy_page_from_iter(page, 0, seg, &old->rq_iter) != seg) {
> +                                       rc = -EFAULT;
> +                                       goto err_free;
> +                               }
> +                               copied += seg;
> +                       }
> +                       iov_iter_xarray(&new->rq_iter, iov_iter_rw(&old->rq_iter),
> +                                       buffer, 0, size);
>                 }
> +               new->rq_iov = old->rq_iov;
> +               new->rq_nvec = old->rq_nvec;
> +               orig_len += smb_rqst_len(server, new);
>         }
>
>         /* fill the 1st iov with a transform header */
> @@ -4701,12 +4732,12 @@ smb3_is_transform_hdr(void *buf)
>
>  static int
>  decrypt_raw_data(struct TCP_Server_Info *server, char *buf,
> -                unsigned int buf_data_size, struct page **pages,
> -                unsigned int npages, unsigned int page_data_size,
> +                unsigned int buf_data_size, struct iov_iter *iter,
>                  bool is_offloaded)
>  {
>         struct kvec iov[2];
>         struct smb_rqst rqst = {NULL};
> +       size_t iter_size = 0;
>         int rc;
>
>         iov[0].iov_base = buf;
> @@ -4716,10 +4747,10 @@ decrypt_raw_data(struct TCP_Server_Info *server, char *buf,
>
>         rqst.rq_iov = iov;
>         rqst.rq_nvec = 2;
> -       rqst.rq_pages = pages;
> -       rqst.rq_npages = npages;
> -       rqst.rq_pagesz = PAGE_SIZE;
> -       rqst.rq_tailsz = (page_data_size % PAGE_SIZE) ? : PAGE_SIZE;
> +       if (iter) {
> +               rqst.rq_iter = *iter;
> +               iter_size = iov_iter_count(iter);
> +       }
>
>         rc = crypt_message(server, 1, &rqst, 0);
>         cifs_dbg(FYI, "Decrypt message returned %d\n", rc);
> @@ -4730,73 +4761,37 @@ decrypt_raw_data(struct TCP_Server_Info *server, char *buf,
>         memmove(buf, iov[1].iov_base, buf_data_size);
>
>         if (!is_offloaded)
> -               server->total_read = buf_data_size + page_data_size;
> +               server->total_read = buf_data_size + iter_size;
>
>         return rc;
>  }
>
>  static int
> -read_data_into_pages(struct TCP_Server_Info *server, struct page **pages,
> -                    unsigned int npages, unsigned int len)
> +cifs_copy_pages_to_iter(struct xarray *pages, unsigned int data_size,
> +                       unsigned int skip, struct iov_iter *iter)
>  {
> -       int i;
> -       int length;
> +       struct page *page;
> +       unsigned long index;
>
> -       for (i = 0; i < npages; i++) {
> -               struct page *page = pages[i];
> -               size_t n;
> +       xa_for_each(pages, index, page) {
> +               size_t n, len = min_t(unsigned int, PAGE_SIZE - skip, data_size);
>
> -               n = len;
> -               if (len >= PAGE_SIZE) {
> -                       /* enough data to fill the page */
> -                       n = PAGE_SIZE;
> -                       len -= n;
> -               } else {
> -                       zero_user(page, len, PAGE_SIZE - len);
> -                       len = 0;
> +               n = copy_page_to_iter(page, skip, len, iter);
> +               if (n != len) {
> +                       cifs_dbg(VFS, "%s: something went wrong\n", __func__);
> +                       return -EIO;
>                 }
> -               length = cifs_read_page_from_socket(server, page, 0, n);
> -               if (length < 0)
> -                       return length;
> -               server->total_read += length;
> +               data_size -= n;
> +               skip = 0;
>         }
>
>         return 0;
>  }
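
The skip handling in cifs_copy_pages_to_iter() applies the starting offset
to the first page only and then copies whole pages until data_size is used
up.  Below is a userspace sketch of that loop over a plain array of pages.
sketch_copy_pages is a hypothetical name, and the 8-byte page size is an
assumption chosen purely to keep the example readable.  Unlike the patch,
the sketch stops as soon as data_size reaches zero and reports an error if
the pages run out first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 8u	/* deliberately tiny so the example is readable */

/*
 * Copy data_size bytes out of an array of fixed-size pages into out,
 * starting skip bytes into the first page.  Returns 0 on success or -1
 * if the pages run out before data_size bytes have been copied.
 */
int sketch_copy_pages(unsigned char pages[][SKETCH_PAGE_SIZE], size_t npages,
		      size_t data_size, size_t skip, unsigned char *out)
{
	size_t i;

	assert(skip < SKETCH_PAGE_SIZE);
	for (i = 0; i < npages && data_size > 0; i++) {
		size_t len = SKETCH_PAGE_SIZE - skip;

		if (len > data_size)
			len = data_size;
		memcpy(out, pages[i] + skip, len);
		out += len;
		data_size -= len;
		skip = 0;	/* only the first page has a leading gap */
	}
	return data_size == 0 ? 0 : -1;
}
```

With two pages holding "ABCDEFGH" and "IJKLMNOP", a skip of 2 and a
data_size of 10, the output is "CDEFGHIJKL".
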
>
> -static int
> -init_read_bvec(struct page **pages, unsigned int npages, unsigned int data_size,
> -              unsigned int cur_off, struct bio_vec **page_vec)
> -{
> -       struct bio_vec *bvec;
> -       int i;
> -
> -       bvec = kcalloc(npages, sizeof(struct bio_vec), GFP_KERNEL);
> -       if (!bvec)
> -               return -ENOMEM;
> -
> -       for (i = 0; i < npages; i++) {
> -               bvec[i].bv_page = pages[i];
> -               bvec[i].bv_offset = (i == 0) ? cur_off : 0;
> -               bvec[i].bv_len = min_t(unsigned int, PAGE_SIZE, data_size);
> -               data_size -= bvec[i].bv_len;
> -       }
> -
> -       if (data_size != 0) {
> -               cifs_dbg(VFS, "%s: something went wrong\n", __func__);
> -               kfree(bvec);
> -               return -EIO;
> -       }
> -
> -       *page_vec = bvec;
> -       return 0;
> -}
> -
>  static int
>  handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
> -                char *buf, unsigned int buf_len, struct page **pages,
> -                unsigned int npages, unsigned int page_data_size,
> -                bool is_offloaded)
> +                char *buf, unsigned int buf_len, struct xarray *pages,
> +                unsigned int pages_len, bool is_offloaded)
>  {
>         unsigned int data_offset;
>         unsigned int data_len;
> @@ -4805,9 +4800,6 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
>         unsigned int pad_len;
>         struct cifs_readdata *rdata = mid->callback_data;
>         struct smb2_hdr *shdr = (struct smb2_hdr *)buf;
> -       struct bio_vec *bvec = NULL;
> -       struct iov_iter iter;
> -       struct kvec iov;
>         int length;
>         bool use_rdma_mr = false;
>
> @@ -4896,7 +4888,7 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
>                         return 0;
>                 }
>
> -               if (data_len > page_data_size - pad_len) {
> +               if (data_len > pages_len - pad_len) {
>                         /* data_len is corrupt -- discard frame */
>                         rdata->result = -EIO;
>                         if (is_offloaded)
> @@ -4906,8 +4898,9 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
>                         return 0;
>                 }
>
> -               rdata->result = init_read_bvec(pages, npages, page_data_size,
> -                                              cur_off, &bvec);
> +               /* Copy the data to the output I/O iterator. */
> +               rdata->result = cifs_copy_pages_to_iter(pages, pages_len,
> +                                                       cur_off, &rdata->iter);
>                 if (rdata->result != 0) {
>                         if (is_offloaded)
>                                 mid->mid_state = MID_RESPONSE_MALFORMED;
> @@ -4915,14 +4908,15 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
>                                 dequeue_mid(mid, rdata->result);
>                         return 0;
>                 }
> +               rdata->got_bytes = pages_len;
>
> -               iov_iter_bvec(&iter, WRITE, bvec, npages, data_len);
>         } else if (buf_len >= data_offset + data_len) {
>                 /* read response payload is in buf */
> -               WARN_ONCE(npages > 0, "read data can be either in buf or in pages");
> -               iov.iov_base = buf + data_offset;
> -               iov.iov_len = data_len;
> -               iov_iter_kvec(&iter, WRITE, &iov, 1, data_len);
> +               WARN_ONCE(pages && !xa_empty(pages),
> +                         "read data can be either in buf or in pages");
> +               length = copy_to_iter(buf + data_offset, data_len, &rdata->iter);
> +               if (length < 0)
> +                       return length;
>         } else {
>                 /* read response payload cannot be in both buf and pages */
>                 WARN_ONCE(1, "buf can not contain only a part of read data");
> @@ -4934,13 +4928,6 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
>                 return 0;
>         }
>
> -       length = rdata->copy_into_pages(server, rdata, &iter);
> -
> -       kfree(bvec);
> -
> -       if (length < 0)
> -               return length;
> -
>         if (is_offloaded)
>                 mid->mid_state = MID_RESPONSE_RECEIVED;
>         else
> @@ -4951,9 +4938,8 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
>  struct smb2_decrypt_work {
>         struct work_struct decrypt;
>         struct TCP_Server_Info *server;
> -       struct page **ppages;
> +       struct xarray buffer;
>         char *buf;
> -       unsigned int npages;
>         unsigned int len;
>  };
>
> @@ -4962,11 +4948,13 @@ static void smb2_decrypt_offload(struct work_struct *work)
>  {
>         struct smb2_decrypt_work *dw = container_of(work,
>                                 struct smb2_decrypt_work, decrypt);
> -       int i, rc;
> +       int rc;
>         struct mid_q_entry *mid;
> +       struct iov_iter iter;
>
> +       iov_iter_xarray(&iter, READ, &dw->buffer, 0, dw->len);
>         rc = decrypt_raw_data(dw->server, dw->buf, dw->server->vals->read_rsp_size,
> -                             dw->ppages, dw->npages, dw->len, true);
> +                             &iter, true);
>         if (rc) {
>                 cifs_dbg(VFS, "error decrypting rc=%d\n", rc);
>                 goto free_pages;
> @@ -4980,7 +4968,7 @@ static void smb2_decrypt_offload(struct work_struct *work)
>                 mid->decrypted = true;
>                 rc = handle_read_data(dw->server, mid, dw->buf,
>                                       dw->server->vals->read_rsp_size,
> -                                     dw->ppages, dw->npages, dw->len,
> +                                     &dw->buffer, dw->len,
>                                       true);
>                 if (rc >= 0) {
>  #ifdef CONFIG_CIFS_STATS2
> @@ -5012,10 +5000,7 @@ static void smb2_decrypt_offload(struct work_struct *work)
>         }
>
>  free_pages:
> -       for (i = dw->npages-1; i >= 0; i--)
> -               put_page(dw->ppages[i]);
> -
> -       kfree(dw->ppages);
> +       cifs_clear_xarray_buffer(&dw->buffer);
>         cifs_small_buf_release(dw->buf);
>         kfree(dw);
>  }
> @@ -5025,47 +5010,66 @@ static int
>  receive_encrypted_read(struct TCP_Server_Info *server, struct mid_q_entry **mid,
>                        int *num_mids)
>  {
> +       struct page *page;
>         char *buf = server->smallbuf;
>         struct smb2_transform_hdr *tr_hdr = (struct smb2_transform_hdr *)buf;
> -       unsigned int npages;
> -       struct page **pages;
> -       unsigned int len;
> +       struct iov_iter iter;
> +       unsigned int len, npages;
>         unsigned int buflen = server->pdu_size;
>         int rc;
>         int i = 0;
>         struct smb2_decrypt_work *dw;
>
> +       dw = kzalloc(sizeof(struct smb2_decrypt_work), GFP_KERNEL);
> +       if (!dw)
> +               return -ENOMEM;
> +       xa_init(&dw->buffer);
> +       INIT_WORK(&dw->decrypt, smb2_decrypt_offload);
> +       dw->server = server;
> +
>         *num_mids = 1;
>         len = min_t(unsigned int, buflen, server->vals->read_rsp_size +
>                 sizeof(struct smb2_transform_hdr)) - HEADER_SIZE(server) + 1;
>
>         rc = cifs_read_from_socket(server, buf + HEADER_SIZE(server) - 1, len);
>         if (rc < 0)
> -               return rc;
> +               goto free_dw;
>         server->total_read += rc;
>
>         len = le32_to_cpu(tr_hdr->OriginalMessageSize) -
>                 server->vals->read_rsp_size;
> +       dw->len = len;
>         npages = DIV_ROUND_UP(len, PAGE_SIZE);
>
> -       pages = kmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
> -       if (!pages) {
> -               rc = -ENOMEM;
> -               goto discard_data;
> -       }
> -
> +       rc = -ENOMEM;
>         for (; i < npages; i++) {
> -               pages[i] = alloc_page(GFP_KERNEL|__GFP_HIGHMEM);
> -               if (!pages[i]) {
> -                       rc = -ENOMEM;
> +               void *old;
> +
> +               page = alloc_page(GFP_KERNEL|__GFP_HIGHMEM);
> +               if (!page) {
> +                       goto discard_data;
> +               }
> +               page->index = i;
> +               old = xa_store(&dw->buffer, i, page, GFP_KERNEL);
> +               if (xa_is_err(old)) {
> +                       rc = xa_err(old);
> +                       put_page(page);
>                         goto discard_data;
>                 }
>         }
>
> -       /* read read data into pages */
> -       rc = read_data_into_pages(server, pages, npages, len);
> -       if (rc)
> -               goto free_pages;
> +       iov_iter_xarray(&iter, READ, &dw->buffer, 0, npages * PAGE_SIZE);
> +
> +       /* Read the data into the buffer and clear excess bufferage. */
> +       rc = cifs_read_iter_from_socket(server, &iter, dw->len);
> +       if (rc < 0)
> +               goto discard_data;
> +
> +       server->total_read += rc;
> +       if (rc < npages * PAGE_SIZE)
> +               iov_iter_zero(npages * PAGE_SIZE - rc, &iter);
> +       iov_iter_revert(&iter, npages * PAGE_SIZE);
> +       iov_iter_truncate(&iter, dw->len);
>
>         rc = cifs_discard_remaining_data(server);
>         if (rc)
> @@ -5078,39 +5082,28 @@ receive_encrypted_read(struct TCP_Server_Info *server, struct mid_q_entry **mid,
>
>         if ((server->min_offload) && (server->in_flight > 1) &&
>             (server->pdu_size >= server->min_offload)) {
> -               dw = kmalloc(sizeof(struct smb2_decrypt_work), GFP_KERNEL);
> -               if (dw == NULL)
> -                       goto non_offloaded_decrypt;
> -
>                 dw->buf = server->smallbuf;
>                 server->smallbuf = (char *)cifs_small_buf_get();
>
> -               INIT_WORK(&dw->decrypt, smb2_decrypt_offload);
> -
> -               dw->npages = npages;
> -               dw->server = server;
> -               dw->ppages = pages;
> -               dw->len = len;
>                 queue_work(decrypt_wq, &dw->decrypt);
>                 *num_mids = 0; /* worker thread takes care of finding mid */
>                 return -1;
>         }
>
> -non_offloaded_decrypt:
>         rc = decrypt_raw_data(server, buf, server->vals->read_rsp_size,
> -                             pages, npages, len, false);
> +                             &iter, false);
>         if (rc)
>                 goto free_pages;
>
>         *mid = smb2_find_mid(server, buf);
> -       if (*mid == NULL)
> +       if (*mid == NULL) {
>                 cifs_dbg(FYI, "mid not found\n");
> -       else {
> +       } else {
>                 cifs_dbg(FYI, "mid found\n");
>                 (*mid)->decrypted = true;
>                 rc = handle_read_data(server, *mid, buf,
>                                       server->vals->read_rsp_size,
> -                                     pages, npages, len, false);
> +                                     &dw->buffer, dw->len, false);
>                 if (rc >= 0) {
>                         if (server->ops->is_network_name_deleted) {
>                                 server->ops->is_network_name_deleted(buf,
> @@ -5120,9 +5113,9 @@ receive_encrypted_read(struct TCP_Server_Info *server, struct mid_q_entry **mid,
>         }
>
>  free_pages:
> -       for (i = i - 1; i >= 0; i--)
> -               put_page(pages[i]);
> -       kfree(pages);
> +       cifs_clear_xarray_buffer(&dw->buffer);
> +free_dw:
> +       kfree(dw);
>         return rc;
>  discard_data:
>         cifs_discard_remaining_data(server);
> @@ -5160,7 +5153,7 @@ receive_encrypted_standard(struct TCP_Server_Info *server,
>         server->total_read += length;
>
>         buf_size = pdu_length - sizeof(struct smb2_transform_hdr);
> -       length = decrypt_raw_data(server, buf, buf_size, NULL, 0, 0, false);
> +       length = decrypt_raw_data(server, buf, buf_size, NULL, false);
>         if (length)
>                 return length;
>
> @@ -5259,7 +5252,7 @@ smb3_handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid)
>         char *buf = server->large_buf ? server->bigbuf : server->smallbuf;
>
>         return handle_read_data(server, mid, buf, server->pdu_size,
> -                               NULL, 0, 0, false);
> +                               NULL, 0, false);
>  }
>
>  static int
> diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
> index 7e7909b1ae11..ebbea7526ee2 100644
> --- a/fs/cifs/smb2pdu.c
> +++ b/fs/cifs/smb2pdu.c
> @@ -4118,11 +4118,7 @@ smb2_readv_callback(struct mid_q_entry *mid)
>         struct cifs_credits credits = { .value = 0, .instance = 0 };
>         struct smb_rqst rqst = { .rq_iov = &rdata->iov[1],
>                                  .rq_nvec = 1,
> -                                .rq_pages = rdata->pages,
> -                                .rq_offset = rdata->page_offset,
> -                                .rq_npages = rdata->nr_pages,
> -                                .rq_pagesz = rdata->pagesz,
> -                                .rq_tailsz = rdata->tailsz };
> +                                .rq_iter = rdata->iter };
>
>         WARN_ONCE(rdata->server != mid->server,
>                   "rdata server %p != mid server %p",
> @@ -4522,11 +4518,7 @@ smb2_async_writev(struct cifs_writedata *wdata,
>
>         rqst.rq_iov = iov;
>         rqst.rq_nvec = 1;
> -       rqst.rq_pages = wdata->pages;
> -       rqst.rq_offset = wdata->page_offset;
> -       rqst.rq_npages = wdata->nr_pages;
> -       rqst.rq_pagesz = wdata->pagesz;
> -       rqst.rq_tailsz = wdata->tailsz;
> +       rqst.rq_iter = wdata->iter;
>  #ifdef CONFIG_CIFS_SMB_DIRECT
>         if (wdata->mr) {
>                 iov[0].iov_len += sizeof(struct smbd_buffer_descriptor_v1);
> diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
> index 8540f7c13eae..cb19c43c0009 100644
> --- a/fs/cifs/transport.c
> +++ b/fs/cifs/transport.c
> @@ -276,26 +276,7 @@ smb_rqst_len(struct TCP_Server_Info *server, struct smb_rqst *rqst)
>         for (i = 0; i < nvec; i++)
>                 buflen += iov[i].iov_len;
>
> -       /*
> -        * Add in the page array if there is one. The caller needs to make
> -        * sure rq_offset and rq_tailsz are set correctly. If a buffer of
> -        * multiple pages ends at page boundary, rq_tailsz needs to be set to
> -        * PAGE_SIZE.
> -        */
> -       if (rqst->rq_npages) {
> -               if (rqst->rq_npages == 1)
> -                       buflen += rqst->rq_tailsz;
> -               else {
> -                       /*
> -                        * If there is more than one page, calculate the
> -                        * buffer length based on rq_offset and rq_tailsz
> -                        */
> -                       buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) -
> -                                       rqst->rq_offset;
> -                       buflen += rqst->rq_tailsz;
> -               }
> -       }
> -
> +       buflen += iov_iter_count(&rqst->rq_iter);
>         return buflen;
>  }
>
> @@ -382,23 +363,15 @@ __smb_send_rqst(struct TCP_Server_Info *server, int num_rqst,
>
>                 total_len += sent;
>
> -               /* now walk the page array and send each page in it */
> -               for (i = 0; i < rqst[j].rq_npages; i++) {
> -                       struct bio_vec bvec;
> -
> -                       bvec.bv_page = rqst[j].rq_pages[i];
> -                       rqst_page_get_length(&rqst[j], i, &bvec.bv_len,
> -                                            &bvec.bv_offset);
> -
> -                       iov_iter_bvec(&smb_msg.msg_iter, WRITE,
> -                                     &bvec, 1, bvec.bv_len);
> +               if (iov_iter_count(&rqst[j].rq_iter) > 0) {
> +                       smb_msg.msg_iter = rqst[j].rq_iter;
>                         rc = smb_send_kvec(server, &smb_msg, &sent);
>                         if (rc < 0)
>                                 break;
> -
>                         total_len += sent;
>                 }
> -       }
> +
> +}
>
>  unmask:
>         sigprocmask(SIG_SETMASK, &oldmask, NULL);
>
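
To make the smb_rqst_len() hunk above easier to check, here is a small
userspace sketch (not kernel code) of the page-list length arithmetic
that the patch removes. It suggests why a single iov_iter_count() call
can stand in for it. The names page_list_desc, describe_span,
page_list_len and model_iter are hypothetical stand-ins invented for
this illustration. The convention for tailsz follows the removed
comment: for a single page it is the data length, and for a multi-page
buffer ending on a page boundary it equals the page size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the removed rq_npages/rq_pagesz/... fields. */
struct page_list_desc {
	unsigned int npages;	/* number of pages in the array */
	unsigned int pagesz;	/* size of each page */
	unsigned int offset;	/* data offset within the first page */
	unsigned int tailsz;	/* bytes of data in the last page */
};

/* Describe a span of len bytes starting at offset within the first page. */
static struct page_list_desc describe_span(unsigned int offset, size_t len,
					    unsigned int pagesz)
{
	struct page_list_desc d = { 0, pagesz, offset, 0 };
	size_t end = (size_t)offset + len;

	assert(pagesz > 0 && offset < pagesz);
	if (len == 0)
		return d;

	d.npages = (unsigned int)((end + pagesz - 1) / pagesz);
	if (d.npages == 1)
		d.tailsz = (unsigned int)len;
	else
		d.tailsz = (unsigned int)(end - (size_t)(d.npages - 1) * pagesz);
	return d;
}

/* Length computed the way the removed block in smb_rqst_len() did it. */
static size_t page_list_len(const struct page_list_desc *d)
{
	if (d->npages == 0)
		return 0;
	if (d->npages == 1)
		return d->tailsz;
	return (size_t)d->pagesz * (d->npages - 1) - d->offset + d->tailsz;
}

/* Hypothetical model of an iterator: it simply carries its byte count. */
struct model_iter {
	size_t count;
};

static size_t model_iter_count(const struct model_iter *it)
{
	return it->count;
}
```

In this model, describing a span and then applying the old formula
gives back the span length, which is the same number an iterator over
that span would carry directly. The four bookkeeping fields exist only
to let the length be recomputed.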
>

* Re: [RFC PATCH 3/7] cifs: Change the I/O paths to use an iterator rather than a page list
  2022-01-31  5:06   ` Rohith Surabattula
@ 2022-01-31  5:48     ` Shyam Prasad N
  0 siblings, 0 replies; 20+ messages in thread
From: Shyam Prasad N @ 2022-01-31  5:48 UTC (permalink / raw)
  To: Rohith Surabattula
  Cc: David Howells, Steve French, jlayton, CIFS, linux-cachefs, linux-fsdevel

Looks good to me.
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>

On Mon, Jan 31, 2022 at 10:36 AM Rohith Surabattula
<rohiths.msft@gmail.com> wrote:
>
> Hi David,
>
> After the data is copied from buf into the XArray iterator, the
> "got_bytes" field is not updated. As a result, a read of less than a
> page of data fails.
> The patch below fixes the issue.
>
> diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
> index e1649ac194db..5faf45672891 100644
> --- a/fs/cifs/smb2ops.c
> +++ b/fs/cifs/smb2ops.c
> @@ -4917,6 +4917,7 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
>                 length = copy_to_iter(buf + data_offset, data_len, &rdata->iter);
>                 if (length < 0)
>                         return length;
> +               rdata->got_bytes = data_len;
>         } else {
>                 /* read response payload cannot be in both buf and pages */
>                 WARN_ONCE(1, "buf can not contain only a part of read data");
>
> Regards,
> Rohith
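
For anyone following along, the failure mode described above can be
modelled in userspace. This is only a sketch under assumptions:
model_rdata, model_copy_to_iter and model_handle_inline_read are
hypothetical names, and the destination is a flat buffer rather than a
real iov_iter. One deliberate difference from the proposed fix: the
model records the number of bytes actually copied, whereas the patch
records data_len.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, flattened model of the read descriptor. */
struct model_rdata {
	char	*dst;		/* where the next payload byte goes */
	size_t	space;		/* room left in the destination */
	size_t	got_bytes;	/* bytes the completion path will report */
};

/* Copy up to len bytes into the destination; return the count copied. */
static size_t model_copy_to_iter(const void *src, size_t len,
				 struct model_rdata *rdata)
{
	size_t n = len < rdata->space ? len : rdata->space;

	assert(rdata->dst != NULL);
	memcpy(rdata->dst, src, n);
	rdata->dst += n;
	rdata->space -= n;
	return n;
}

/*
 * Model of the "payload is entirely in the response buffer" branch.
 * With record == 0 it behaves like the code before the fix: the data
 * is copied but got_bytes stays untouched.
 */
static size_t model_handle_inline_read(struct model_rdata *rdata,
					const char *buf, size_t data_offset,
					size_t data_len, int record)
{
	size_t copied = model_copy_to_iter(buf + data_offset, data_len, rdata);

	if (record)
		rdata->got_bytes = copied;
	return copied;
}
```

With record set to 0 the payload lands in the destination but got_bytes
remains 0, so a caller that trusts got_bytes sees an empty read. That
appears to be the behaviour reported for reads smaller than a page,
where the whole payload arrives in buf.
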
>
> On Wed, Jan 26, 2022 at 1:21 AM David Howells <dhowells@redhat.com> wrote:
> >
> >
> > ---
> >
> >  fs/cifs/cifsencrypt.c |   40 +++--
> >  fs/cifs/cifsfs.c      |    2
> >  fs/cifs/cifsfs.h      |    3
> >  fs/cifs/cifsglob.h    |   28 +---
> >  fs/cifs/cifsproto.h   |   10 +
> >  fs/cifs/cifssmb.c     |  224 +++++++++++++++++++-----------
> >  fs/cifs/connect.c     |   16 ++
> >  fs/cifs/misc.c        |   19 ---
> >  fs/cifs/smb2ops.c     |  365 ++++++++++++++++++++++++-------------------------
> >  fs/cifs/smb2pdu.c     |   12 --
> >  fs/cifs/transport.c   |   37 +----
> >  11 files changed, 379 insertions(+), 377 deletions(-)
> >
> > diff --git a/fs/cifs/cifsencrypt.c b/fs/cifs/cifsencrypt.c
> > index 0912d8bbbac1..69bbf3d6c4d4 100644
> > --- a/fs/cifs/cifsencrypt.c
> > +++ b/fs/cifs/cifsencrypt.c
> > @@ -24,12 +24,27 @@
> >  #include "../smbfs_common/arc4.h"
> >  #include <crypto/aead.h>
> >
> > +static ssize_t cifs_signature_scan(struct iov_iter *i, const void *p,
> > +                                  size_t len, size_t off, void *priv)
> > +{
> > +       struct shash_desc *shash = priv;
> > +       int rc;
> > +
> > +       rc = crypto_shash_update(shash, p, len);
> > +       if (rc) {
> > +               cifs_dbg(VFS, "%s: Could not update with payload\n", __func__);
> > +               return rc;
> > +       }
> > +
> > +       return len;
> > +}
> > +
> >  int __cifs_calc_signature(struct smb_rqst *rqst,
> >                         struct TCP_Server_Info *server, char *signature,
> >                         struct shash_desc *shash)
> >  {
> >         int i;
> > -       int rc;
> > +       ssize_t rc;
> >         struct kvec *iov = rqst->rq_iov;
> >         int n_vec = rqst->rq_nvec;
> >         int is_smb2 = server->vals->header_preamble_size == 0;
> > @@ -62,25 +77,10 @@ int __cifs_calc_signature(struct smb_rqst *rqst,
> >                 }
> >         }
> >
> > -       /* now hash over the rq_pages array */
> > -       for (i = 0; i < rqst->rq_npages; i++) {
> > -               void *kaddr;
> > -               unsigned int len, offset;
> > -
> > -               rqst_page_get_length(rqst, i, &len, &offset);
> > -
> > -               kaddr = (char *) kmap(rqst->rq_pages[i]) + offset;
> > -
> > -               rc = crypto_shash_update(shash, kaddr, len);
> > -               if (rc) {
> > -                       cifs_dbg(VFS, "%s: Could not update with payload\n",
> > -                                __func__);
> > -                       kunmap(rqst->rq_pages[i]);
> > -                       return rc;
> > -               }
> > -
> > -               kunmap(rqst->rq_pages[i]);
> > -       }
> > +       rc = iov_iter_scan(&rqst->rq_iter, iov_iter_count(&rqst->rq_iter),
> > +                          cifs_signature_scan, shash);
> > +       if (rc < 0)
> > +               return rc;
> >
> >         rc = crypto_shash_final(shash, signature);
> >         if (rc)
> > diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> > index 199edac0cb59..a56cb9c8c5ff 100644
> > --- a/fs/cifs/cifsfs.c
> > +++ b/fs/cifs/cifsfs.c
> > @@ -935,7 +935,7 @@ cifs_loose_read_iter(struct kiocb *iocb, struct iov_iter *iter)
> >         ssize_t rc;
> >         struct inode *inode = file_inode(iocb->ki_filp);
> >
> > -       if (iocb->ki_filp->f_flags & O_DIRECT)
> > +       if (iocb->ki_flags & IOCB_DIRECT)
> >                 return cifs_user_readv(iocb, iter);
> >
> >         rc = cifs_revalidate_mapping(inode);
> > diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
> > index 15a5c5db038b..1c77bbc0815f 100644
> > --- a/fs/cifs/cifsfs.h
> > +++ b/fs/cifs/cifsfs.h
> > @@ -110,6 +110,9 @@ extern int cifs_file_strict_mmap(struct file * , struct vm_area_struct *);
> >  extern const struct file_operations cifs_dir_ops;
> >  extern int cifs_dir_open(struct inode *inode, struct file *file);
> >  extern int cifs_readdir(struct file *file, struct dir_context *ctx);
> > +extern void cifs_pages_written_back(struct inode *inode, loff_t start, unsigned int len);
> > +extern void cifs_pages_write_failed(struct inode *inode, loff_t start, unsigned int len);
> > +extern void cifs_pages_write_redirty(struct inode *inode, loff_t start, unsigned int len);
> >
> >  /* Functions related to dir entries */
> >  extern const struct dentry_operations cifs_dentry_ops;
> > diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> > index 0a4085ced40f..3a4fed645636 100644
> > --- a/fs/cifs/cifsglob.h
> > +++ b/fs/cifs/cifsglob.h
> > @@ -191,11 +191,8 @@ struct cifs_cred {
> >  struct smb_rqst {
> >         struct kvec     *rq_iov;        /* array of kvecs */
> >         unsigned int    rq_nvec;        /* number of kvecs in array */
> > -       struct page     **rq_pages;     /* pointer to array of page ptrs */
> > -       unsigned int    rq_offset;      /* the offset to the 1st page */
> > -       unsigned int    rq_npages;      /* number pages in array */
> > -       unsigned int    rq_pagesz;      /* page size to use */
> > -       unsigned int    rq_tailsz;      /* length of last page */
> > +       struct iov_iter rq_iter;        /* Data iterator */
> > +       struct xarray   rq_buffer;      /* Page buffer for encryption */
> >  };
> >
> >  struct mid_q_entry;
> > @@ -1323,28 +1320,18 @@ struct cifs_readdata {
> >         struct address_space            *mapping;
> >         struct cifs_aio_ctx             *ctx;
> >         __u64                           offset;
> > +       ssize_t                         got_bytes;
> >         unsigned int                    bytes;
> > -       unsigned int                    got_bytes;
> >         pid_t                           pid;
> >         int                             result;
> >         struct work_struct              work;
> > -       int (*read_into_pages)(struct TCP_Server_Info *server,
> > -                               struct cifs_readdata *rdata,
> > -                               unsigned int len);
> > -       int (*copy_into_pages)(struct TCP_Server_Info *server,
> > -                               struct cifs_readdata *rdata,
> > -                               struct iov_iter *iter);
> > +       struct iov_iter                 iter;
> >         struct kvec                     iov[2];
> >         struct TCP_Server_Info          *server;
> >  #ifdef CONFIG_CIFS_SMB_DIRECT
> >         struct smbd_mr                  *mr;
> >  #endif
> > -       unsigned int                    pagesz;
> > -       unsigned int                    page_offset;
> > -       unsigned int                    tailsz;
> >         struct cifs_credits             credits;
> > -       unsigned int                    nr_pages;
> > -       struct page                     **pages;
> >  };
> >
> >  /* asynchronous write support */
> > @@ -1356,6 +1343,8 @@ struct cifs_writedata {
> >         struct work_struct              work;
> >         struct cifsFileInfo             *cfile;
> >         struct cifs_aio_ctx             *ctx;
> > +       struct iov_iter                 iter;
> > +       struct bio_vec                  *bv;
> >         __u64                           offset;
> >         pid_t                           pid;
> >         unsigned int                    bytes;
> > @@ -1364,12 +1353,7 @@ struct cifs_writedata {
> >  #ifdef CONFIG_CIFS_SMB_DIRECT
> >         struct smbd_mr                  *mr;
> >  #endif
> > -       unsigned int                    pagesz;
> > -       unsigned int                    page_offset;
> > -       unsigned int                    tailsz;
> >         struct cifs_credits             credits;
> > -       unsigned int                    nr_pages;
> > -       struct page                     **pages;
> >  };
> >
> >  /*
> > diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
> > index d3701295402d..1b143f0a03c0 100644
> > --- a/fs/cifs/cifsproto.h
> > +++ b/fs/cifs/cifsproto.h
> > @@ -242,6 +242,9 @@ extern int cifs_read_page_from_socket(struct TCP_Server_Info *server,
> >                                         unsigned int page_offset,
> >                                         unsigned int to_read);
> >  extern int cifs_setup_cifs_sb(struct cifs_sb_info *cifs_sb);
> > +extern int cifs_read_iter_from_socket(struct TCP_Server_Info *server,
> > +                                     struct iov_iter *iter,
> > +                                     unsigned int to_read);
> >  extern int cifs_match_super(struct super_block *, void *);
> >  extern int cifs_mount(struct cifs_sb_info *cifs_sb, struct smb3_fs_context *ctx);
> >  extern void cifs_umount(struct cifs_sb_info *);
> > @@ -575,10 +578,7 @@ int cifs_readv_receive(struct TCP_Server_Info *server, struct mid_q_entry *mid);
> >  int cifs_async_writev(struct cifs_writedata *wdata,
> >                       void (*release)(struct kref *kref));
> >  void cifs_writev_complete(struct work_struct *work);
> > -struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages,
> > -                                               work_func_t complete);
> > -struct cifs_writedata *cifs_writedata_direct_alloc(struct page **pages,
> > -                                               work_func_t complete);
> > +struct cifs_writedata *cifs_writedata_alloc(work_func_t complete);
> >  void cifs_writedata_release(struct kref *refcount);
> >  int cifs_query_mf_symlink(unsigned int xid, struct cifs_tcon *tcon,
> >                           struct cifs_sb_info *cifs_sb,
> > @@ -602,8 +602,6 @@ int cifs_alloc_hash(const char *name, struct crypto_shash **shash,
> >                     struct sdesc **sdesc);
> >  void cifs_free_hash(struct crypto_shash **shash, struct sdesc **sdesc);
> >
> > -extern void rqst_page_get_length(struct smb_rqst *rqst, unsigned int page,
> > -                               unsigned int *len, unsigned int *offset);
> >  struct cifs_chan *
> >  cifs_ses_find_chan(struct cifs_ses *ses, struct TCP_Server_Info *server);
> >  int cifs_try_adding_channels(struct cifs_sb_info *cifs_sb, struct cifs_ses *ses);
> > diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
> > index 071e2f21a7db..38e7276352e2 100644
> > --- a/fs/cifs/cifssmb.c
> > +++ b/fs/cifs/cifssmb.c
> > @@ -24,6 +24,7 @@
> >  #include <linux/task_io_accounting_ops.h>
> >  #include <linux/uaccess.h>
> >  #include "cifspdu.h"
> > +#include "cifsfs.h"
> >  #include "cifsglob.h"
> >  #include "cifsacl.h"
> >  #include "cifsproto.h"
> > @@ -1388,11 +1389,11 @@ int
> >  cifs_discard_remaining_data(struct TCP_Server_Info *server)
> >  {
> >         unsigned int rfclen = server->pdu_size;
> > -       int remaining = rfclen + server->vals->header_preamble_size -
> > +       size_t remaining = rfclen + server->vals->header_preamble_size -
> >                 server->total_read;
> >
> >         while (remaining > 0) {
> > -               int length;
> > +               ssize_t length;
> >
> >                 length = cifs_discard_from_socket(server,
> >                                 min_t(size_t, remaining,
> > @@ -1539,10 +1540,15 @@ cifs_readv_receive(struct TCP_Server_Info *server, struct mid_q_entry *mid)
> >                 return cifs_readv_discard(server, mid);
> >         }
> >
> > -       length = rdata->read_into_pages(server, rdata, data_len);
> > -       if (length < 0)
> > -               return length;
> > -
> > +#ifdef CONFIG_CIFS_SMB_DIRECT
> > +       if (rdata->mr)
> > +               length = data_len; /* An RDMA read is already done. */
> > +       else
> > +#endif
> > +               length = cifs_read_iter_from_socket(server, &rdata->iter,
> > +                                                   data_len);
> > +       if (length > 0)
> > +               rdata->got_bytes += length;
> >         server->total_read += length;
> >
> >         cifs_dbg(FYI, "total_read=%u buflen=%u remaining=%u\n",
> > @@ -1566,11 +1572,7 @@ cifs_readv_callback(struct mid_q_entry *mid)
> >         struct TCP_Server_Info *server = tcon->ses->server;
> >         struct smb_rqst rqst = { .rq_iov = rdata->iov,
> >                                  .rq_nvec = 2,
> > -                                .rq_pages = rdata->pages,
> > -                                .rq_offset = rdata->page_offset,
> > -                                .rq_npages = rdata->nr_pages,
> > -                                .rq_pagesz = rdata->pagesz,
> > -                                .rq_tailsz = rdata->tailsz };
> > +                                .rq_iter = rdata->iter };
> >         struct cifs_credits credits = { .value = 1, .instance = 0 };
> >
> >         cifs_dbg(FYI, "%s: mid=%llu state=%d result=%d bytes=%u\n",
> > @@ -1925,10 +1927,93 @@ cifs_writedata_release(struct kref *refcount)
> >         if (wdata->cfile)
> >                 cifsFileInfo_put(wdata->cfile);
> >
> > -       kvfree(wdata->pages);
> >         kfree(wdata);
> >  }
> >
> > +/*
> > + * Completion of write to server.
> > + */
> > +void cifs_pages_written_back(struct inode *inode, loff_t start, unsigned int len)
> > +{
> > +       struct address_space *mapping = inode->i_mapping;
> > +       struct folio *folio;
> > +       pgoff_t end;
> > +
> > +       XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);
> > +
> > +       rcu_read_lock();
> > +
> > +       end = (start + len - 1) / PAGE_SIZE;
> > +       xas_for_each(&xas, folio, end) {
> > +               if (!folio_test_writeback(folio)) {
> > +                       pr_err("bad %x @%llx page %lx %lx\n",
> > +                              len, start, folio_index(folio), end);
> > +                       BUG();
> > +               }
> > +
> > +               folio_detach_private(folio);
> > +               folio_end_writeback(folio);
> > +       }
> > +
> > +       rcu_read_unlock();
> > +}
> > +
> > +/*
> > + * Failure of write to server.
> > + */
> > +void cifs_pages_write_failed(struct inode *inode, loff_t start, unsigned int len)
> > +{
> > +       struct address_space *mapping = inode->i_mapping;
> > +       struct folio *folio;
> > +       pgoff_t end;
> > +
> > +       XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);
> > +
> > +       rcu_read_lock();
> > +
> > +       end = (start + len - 1) / PAGE_SIZE;
> > +       xas_for_each(&xas, folio, end) {
> > +               if (!folio_test_writeback(folio)) {
> > +                       pr_err("bad %x @%llx page %lx %lx\n",
> > +                              len, start, folio_index(folio), end);
> > +                       BUG();
> > +               }
> > +
> > +               folio_set_error(folio);
> > +               folio_end_writeback(folio);
> > +       }
> > +
> > +       rcu_read_unlock();
> > +}
> > +
> > +/*
> > + * Redirty pages after a temporary failure.
> > + */
> > +void cifs_pages_write_redirty(struct inode *inode, loff_t start, unsigned int len)
> > +{
> > +       struct address_space *mapping = inode->i_mapping;
> > +       struct folio *folio;
> > +       pgoff_t end;
> > +
> > +       XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);
> > +
> > +       rcu_read_lock();
> > +
> > +       end = (start + len - 1) / PAGE_SIZE;
> > +       xas_for_each(&xas, folio, end) {
> > +               if (!folio_test_writeback(folio)) {
> > +                       pr_err("bad %x @%llx page %lx %lx\n",
> > +                              len, start, folio_index(folio), end);
> > +                       BUG();
> > +               }
> > +
> > +               filemap_dirty_folio(folio->mapping, folio);
> > +               folio_end_writeback(folio);
> > +       }
> > +
> > +       rcu_read_unlock();
> > +}
> > +
> >  /*
> >   * Write failed with a retryable error. Resend the write request. It's also
> >   * possible that the page was redirtied so re-clean the page.
> > @@ -1936,51 +2021,56 @@ cifs_writedata_release(struct kref *refcount)
> >  static void
> >  cifs_writev_requeue(struct cifs_writedata *wdata)
> >  {
> > -       int i, rc = 0;
> > +       int rc = 0;
> >         struct inode *inode = d_inode(wdata->cfile->dentry);
> >         struct TCP_Server_Info *server;
> > -       unsigned int rest_len;
> > +       unsigned int rest_len = wdata->bytes;
> > +       loff_t fpos = wdata->offset;
> >
> >         server = tlink_tcon(wdata->cfile->tlink)->ses->server;
> > -       i = 0;
> > -       rest_len = wdata->bytes;
> >         do {
> >                 struct cifs_writedata *wdata2;
> > -               unsigned int j, nr_pages, wsize, tailsz, cur_len;
> > +               unsigned int wsize, cur_len;
> >
> >                 wsize = server->ops->wp_retry_size(inode);
> >                 if (wsize < rest_len) {
> > -                       nr_pages = wsize / PAGE_SIZE;
> > -                       if (!nr_pages) {
> > +                       if (wsize < PAGE_SIZE) {
> >                                 rc = -ENOTSUPP;
> >                                 break;
> >                         }
> > -                       cur_len = nr_pages * PAGE_SIZE;
> > -                       tailsz = PAGE_SIZE;
> > +                       cur_len = min(round_down(wsize, PAGE_SIZE), rest_len);
> >                 } else {
> > -                       nr_pages = DIV_ROUND_UP(rest_len, PAGE_SIZE);
> >                         cur_len = rest_len;
> > -                       tailsz = rest_len - (nr_pages - 1) * PAGE_SIZE;
> >                 }
> >
> > -               wdata2 = cifs_writedata_alloc(nr_pages, cifs_writev_complete);
> > +               wdata2 = cifs_writedata_alloc(cifs_writev_complete);
> >                 if (!wdata2) {
> >                         rc = -ENOMEM;
> >                         break;
> >                 }
> >
> > -               for (j = 0; j < nr_pages; j++) {
> > -                       wdata2->pages[j] = wdata->pages[i + j];
> > -                       lock_page(wdata2->pages[j]);
> > -                       clear_page_dirty_for_io(wdata2->pages[j]);
> > -               }
> > -
> >                 wdata2->sync_mode = wdata->sync_mode;
> > -               wdata2->nr_pages = nr_pages;
> > -               wdata2->offset = page_offset(wdata2->pages[0]);
> > -               wdata2->pagesz = PAGE_SIZE;
> > -               wdata2->tailsz = tailsz;
> > -               wdata2->bytes = cur_len;
> > +               wdata2->offset  = fpos;
> > +               wdata2->bytes   = cur_len;
> > +               wdata2->iter    = wdata->iter;
> > +
> > +               iov_iter_advance(&wdata2->iter, fpos - wdata->offset);
> > +               iov_iter_truncate(&wdata2->iter, wdata2->bytes);
> > +
> > +#if 0
> > +               if (iov_iter_is_xarray(&wdata2->iter)) {
> > +                       /* TODO: Check for pages having been redirtied and
> > +                        * clean them.  We can do this by walking the xarray.
> > +                        * If it's not an xarray, then it's a DIO and we
> > +                        * shouldn't be mucking around with the page bits.
> > +                        */
> > +                       for (j = 0; j < nr_pages; j++) {
> > +                               wdata2->pages[j] = wdata->pages[i + j];
> > +                               lock_page(wdata2->pages[j]);
> > +                               clear_page_dirty_for_io(wdata2->pages[j]);
> > +                       }
> > +               }
> > +#endif
> >
> >                 rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY,
> >                                             &wdata2->cfile);
> > @@ -1995,33 +2085,25 @@ cifs_writev_requeue(struct cifs_writedata *wdata)
> >                                                        cifs_writedata_release);
> >                 }
> >
> > -               for (j = 0; j < nr_pages; j++) {
> > -                       unlock_page(wdata2->pages[j]);
> > -                       if (rc != 0 && !is_retryable_error(rc)) {
> > -                               SetPageError(wdata2->pages[j]);
> > -                               end_page_writeback(wdata2->pages[j]);
> > -                               put_page(wdata2->pages[j]);
> > -                       }
> > -               }
> > +               if (iov_iter_is_xarray(&wdata2->iter))
> > +                       cifs_pages_written_back(inode, wdata2->offset, wdata2->bytes);
> >
> >                 kref_put(&wdata2->refcount, cifs_writedata_release);
> >                 if (rc) {
> >                         if (is_retryable_error(rc))
> >                                 continue;
> > -                       i += nr_pages;
> > +                       fpos += cur_len;
> > +                       rest_len -= cur_len;
> >                         break;
> >                 }
> >
> > +               fpos += cur_len;
> >                 rest_len -= cur_len;
> > -               i += nr_pages;
> > -       } while (i < wdata->nr_pages);
> > +       } while (rest_len > 0);
> >
> > -       /* cleanup remaining pages from the original wdata */
> > -       for (; i < wdata->nr_pages; i++) {
> > -               SetPageError(wdata->pages[i]);
> > -               end_page_writeback(wdata->pages[i]);
> > -               put_page(wdata->pages[i]);
> > -       }
> > +       /* Clean up remaining pages from the original wdata */
> > +       if (iov_iter_is_xarray(&wdata->iter))
> > +               cifs_pages_written_back(inode, fpos, rest_len);
> >
> >         if (rc != 0 && !is_retryable_error(rc))
> >                 mapping_set_error(inode->i_mapping, rc);
> > @@ -2034,7 +2116,6 @@ cifs_writev_complete(struct work_struct *work)
> >         struct cifs_writedata *wdata = container_of(work,
> >                                                 struct cifs_writedata, work);
> >         struct inode *inode = d_inode(wdata->cfile->dentry);
> > -       int i = 0;
> >
> >         if (wdata->result == 0) {
> >                 spin_lock(&inode->i_lock);
> > @@ -2045,40 +2126,25 @@ cifs_writev_complete(struct work_struct *work)
> >         } else if (wdata->sync_mode == WB_SYNC_ALL && wdata->result == -EAGAIN)
> >                 return cifs_writev_requeue(wdata);
> >
> > -       for (i = 0; i < wdata->nr_pages; i++) {
> > -               struct page *page = wdata->pages[i];
> > -               if (wdata->result == -EAGAIN)
> > -                       __set_page_dirty_nobuffers(page);
> > -               else if (wdata->result < 0)
> > -                       SetPageError(page);
> > -               end_page_writeback(page);
> > -               cifs_readpage_to_fscache(inode, page);
> > -               put_page(page);
> > -       }
> > +       if (wdata->result == -EAGAIN)
> > +               cifs_pages_write_redirty(inode, wdata->offset, wdata->bytes);
> > +       else if (wdata->result < 0)
> > +               cifs_pages_write_failed(inode, wdata->offset, wdata->bytes);
> > +       else
> > +               cifs_pages_written_back(inode, wdata->offset, wdata->bytes);
> > +
> >         if (wdata->result != -EAGAIN)
> >                 mapping_set_error(inode->i_mapping, wdata->result);
> >         kref_put(&wdata->refcount, cifs_writedata_release);
> >  }
> >
> >  struct cifs_writedata *
> > -cifs_writedata_alloc(unsigned int nr_pages, work_func_t complete)
> > -{
> > -       struct page **pages =
> > -               kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
> > -       if (pages)
> > -               return cifs_writedata_direct_alloc(pages, complete);
> > -
> > -       return NULL;
> > -}
> > -
> > -struct cifs_writedata *
> > -cifs_writedata_direct_alloc(struct page **pages, work_func_t complete)
> > +cifs_writedata_alloc(work_func_t complete)
> >  {
> >         struct cifs_writedata *wdata;
> >
> >         wdata = kzalloc(sizeof(*wdata), GFP_NOFS);
> >         if (wdata != NULL) {
> > -               wdata->pages = pages;
> >                 kref_init(&wdata->refcount);
> >                 INIT_LIST_HEAD(&wdata->list);
> >                 init_completion(&wdata->done);
> > @@ -2186,11 +2252,7 @@ cifs_async_writev(struct cifs_writedata *wdata,
> >
> >         rqst.rq_iov = iov;
> >         rqst.rq_nvec = 2;
> > -       rqst.rq_pages = wdata->pages;
> > -       rqst.rq_offset = wdata->page_offset;
> > -       rqst.rq_npages = wdata->nr_pages;
> > -       rqst.rq_pagesz = wdata->pagesz;
> > -       rqst.rq_tailsz = wdata->tailsz;
> > +       rqst.rq_iter = wdata->iter;
> >
> >         cifs_dbg(FYI, "async write at %llu %u bytes\n",
> >                  wdata->offset, wdata->bytes);
> > diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> > index ed210d774a21..d0851c9881b3 100644
> > --- a/fs/cifs/connect.c
> > +++ b/fs/cifs/connect.c
> > @@ -704,6 +704,22 @@ cifs_read_page_from_socket(struct TCP_Server_Info *server, struct page *page,
> >         return cifs_readv_from_socket(server, &smb_msg);
> >  }
> >
> > +int
> > +cifs_read_iter_from_socket(struct TCP_Server_Info *server, struct iov_iter *iter,
> > +                          unsigned int to_read)
> > +{
> > +       struct msghdr smb_msg;
> > +       int ret;
> > +
> > +       smb_msg.msg_iter = *iter;
> > +       if (smb_msg.msg_iter.count > to_read)
> > +               smb_msg.msg_iter.count = to_read;
> > +       ret = cifs_readv_from_socket(server, &smb_msg);
> > +       if (ret > 0)
> > +               iov_iter_advance(iter, ret);
> > +       return ret;
> > +}
> > +
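
For anyone following the shape of cifs_read_iter_from_socket() above: it clamps a private copy of the iterator to to_read, reads through that copy, and then advances the caller's iterator by however many bytes actually arrived. Below is a minimal userspace sketch of that pattern. Everything named toy_* is hypothetical and only stands in for struct iov_iter, the socket, and cifs_readv_from_socket(); it is not the kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct iov_iter: a cursor over a flat buffer. */
struct toy_iter {
	char *base;    /* start of the destination buffer */
	size_t pos;    /* bytes already consumed */
	size_t count;  /* bytes still available */
};

/* Hypothetical stand-in for the socket: a byte source that may return short. */
struct toy_sock {
	const char *data;
	size_t avail;
};

/* Receive up to it->count bytes into whichever iterator copy is passed in. */
static long toy_recv(struct toy_sock *s, struct toy_iter *it)
{
	size_t n = it->count < s->avail ? it->count : s->avail;

	memcpy(it->base + it->pos, s->data, n);
	s->data += n;
	s->avail -= n;
	it->pos += n;
	it->count -= n;
	return (long)n;
}

/* Same shape as the quoted helper: clamp a copy, then advance the original. */
static long toy_read_iter(struct toy_sock *s, struct toy_iter *iter,
			  size_t to_read)
{
	struct toy_iter tmp = *iter;	/* private copy, like smb_msg.msg_iter */
	long ret;

	if (tmp.count > to_read)
		tmp.count = to_read;
	ret = toy_recv(s, &tmp);
	if (ret > 0) {
		/* Advance the caller's iterator by the bytes actually read. */
		iter->pos += (size_t)ret;
		iter->count -= (size_t)ret;
	}
	return ret;
}
```

The point of working on a copy is that a short read leaves the caller's iterator positioned exactly after the received bytes, so a second call simply continues from there.
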
> >  static bool
> >  is_smb_response(struct TCP_Server_Info *server, unsigned char type)
> >  {
> > diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c
> > index 56598f7dbe00..f5fe5720456a 100644
> > --- a/fs/cifs/misc.c
> > +++ b/fs/cifs/misc.c
> > @@ -1122,25 +1122,6 @@ cifs_free_hash(struct crypto_shash **shash, struct sdesc **sdesc)
> >         *shash = NULL;
> >  }
> >
> > -/**
> > - * rqst_page_get_length - obtain the length and offset for a page in smb_rqst
> > - * @rqst: The request descriptor
> > - * @page: The index of the page to query
> > - * @len: Where to store the length for this page:
> > - * @offset: Where to store the offset for this page
> > - */
> > -void rqst_page_get_length(struct smb_rqst *rqst, unsigned int page,
> > -                               unsigned int *len, unsigned int *offset)
> > -{
> > -       *len = rqst->rq_pagesz;
> > -       *offset = (page == 0) ? rqst->rq_offset : 0;
> > -
> > -       if (rqst->rq_npages == 1 || page == rqst->rq_npages-1)
> > -               *len = rqst->rq_tailsz;
> > -       else if (page == 0)
> > -               *len = rqst->rq_pagesz - rqst->rq_offset;
> > -}
> > -
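
As a reminder of what the removed rqst_page_get_length() was computing, here is a small userspace sketch of the same per-page arithmetic. The toy_rqst struct is a hypothetical reduction of struct smb_rqst to its four page-geometry fields, and toy_total_len() is my own addition to show the total that, as I understand the patch, the iterator's byte count now carries directly.

```c
#include <assert.h>

/* Hypothetical reduced request descriptor: only the page-geometry fields. */
struct toy_rqst {
	unsigned int npages;	/* number of pages */
	unsigned int offset;	/* data offset within the first page */
	unsigned int pagesz;	/* size of each page */
	unsigned int tailsz;	/* bytes used in the last page */
};

/* Same logic as the removed helper, on the reduced struct. */
static void toy_page_get_length(const struct toy_rqst *r, unsigned int page,
				unsigned int *len, unsigned int *offset)
{
	*len = r->pagesz;
	*offset = (page == 0) ? r->offset : 0;

	if (r->npages == 1 || page == r->npages - 1)
		*len = r->tailsz;
	else if (page == 0)
		*len = r->pagesz - r->offset;
}

/* Sum of the per-page lengths, i.e. the total payload size in bytes. */
static unsigned int toy_total_len(const struct toy_rqst *r)
{
	unsigned int i, len, off, total = 0;

	for (i = 0; i < r->npages; i++) {
		toy_page_get_length(r, i, &len, &off);
		total += len;
	}
	return total;
}
```

With three 4096-byte pages, a first-page offset of 100 and a 500-byte tail, the per-page lengths are 3996, 4096 and 500.
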
> >  void extract_unc_hostname(const char *unc, const char **h, size_t *len)
> >  {
> >         const char *end;
> > diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
> > index af5d0830bc8a..e1649ac194db 100644
> > --- a/fs/cifs/smb2ops.c
> > +++ b/fs/cifs/smb2ops.c
> > @@ -4406,15 +4406,30 @@ fill_transform_hdr(struct smb2_transform_hdr *tr_hdr, unsigned int orig_len,
> >  static inline void smb2_sg_set_buf(struct scatterlist *sg, const void *buf,
> >                                    unsigned int buflen)
> >  {
> > -       void *addr;
> > +       struct page *page;
> > +
> >         /*
> >          * VMAP_STACK (at least) puts stack into the vmalloc address space
> >          */
> >         if (is_vmalloc_addr(buf))
> > -               addr = vmalloc_to_page(buf);
> > +               page = vmalloc_to_page(buf);
> >         else
> > -               addr = virt_to_page(buf);
> > -       sg_set_page(sg, addr, buflen, offset_in_page(buf));
> > +               page = virt_to_page(buf);
> > +       sg_set_page(sg, page, buflen, offset_in_page(buf));
> > +}
> > +
> > +struct cifs_init_sg_priv {
> > +       struct scatterlist *sg;
> > +       unsigned int idx;
> > +};
> > +
> > +static ssize_t cifs_init_sg_scan(struct iov_iter *i, const void *p,
> > +                                size_t len, size_t off, void *_priv)
> > +{
> > +       struct cifs_init_sg_priv *priv = _priv;
> > +
> > +       smb2_sg_set_buf(&priv->sg[priv->idx++], p, len);
> > +       return len;
> >  }
> >
> >  /* Assumes the first rqst has a transform header as the first iov.
> > @@ -4426,43 +4441,46 @@ static inline void smb2_sg_set_buf(struct scatterlist *sg, const void *buf,
> >  static struct scatterlist *
> >  init_sg(int num_rqst, struct smb_rqst *rqst, u8 *sign)
> >  {
> > +       struct cifs_init_sg_priv priv;
> >         unsigned int sg_len;
> > -       struct scatterlist *sg;
> >         unsigned int i;
> >         unsigned int j;
> > -       unsigned int idx = 0;
> > +       ssize_t rc;
> >         int skip;
> >
> >         sg_len = 1;
> > -       for (i = 0; i < num_rqst; i++)
> > -               sg_len += rqst[i].rq_nvec + rqst[i].rq_npages;
> > +       for (i = 0; i < num_rqst; i++) {
> > +               unsigned int np = iov_iter_npages(&rqst[i].rq_iter, INT_MAX);
> > +               sg_len += rqst[i].rq_nvec + np;
> > +       }
> >
> > -       sg = kmalloc_array(sg_len, sizeof(struct scatterlist), GFP_KERNEL);
> > -       if (!sg)
> > +       priv.idx = 0;
> > +       priv.sg = kmalloc_array(sg_len, sizeof(struct scatterlist), GFP_KERNEL);
> > +       if (!priv.sg)
> >                 return NULL;
> >
> > -       sg_init_table(sg, sg_len);
> > +       sg_init_table(priv.sg, sg_len);
> >         for (i = 0; i < num_rqst; i++) {
> > +               struct iov_iter *iter = &rqst[i].rq_iter;
> > +               size_t count = iov_iter_count(iter);
> > +
> >                 for (j = 0; j < rqst[i].rq_nvec; j++) {
> >                         /*
> >                          * The first rqst has a transform header where the
> >                          * first 20 bytes are not part of the encrypted blob
> >                          */
> >                         skip = (i == 0) && (j == 0) ? 20 : 0;
> > -                       smb2_sg_set_buf(&sg[idx++],
> > +                       smb2_sg_set_buf(&priv.sg[priv.idx++],
> >                                         rqst[i].rq_iov[j].iov_base + skip,
> >                                         rqst[i].rq_iov[j].iov_len - skip);
> > -                       }
> > -
> > -               for (j = 0; j < rqst[i].rq_npages; j++) {
> > -                       unsigned int len, offset;
> > -
> > -                       rqst_page_get_length(&rqst[i], j, &len, &offset);
> > -                       sg_set_page(&sg[idx++], rqst[i].rq_pages[j], len, offset);
> >                 }
> > +
> > +               rc = iov_iter_scan(iter, count, cifs_init_sg_scan, &priv);
> > +               iov_iter_revert(iter, count);
> > +               WARN_ON(rc < 0);
> >         }
> > -       smb2_sg_set_buf(&sg[idx], sign, SMB2_SIGNATURE_SIZE);
> > -       return sg;
> > +       smb2_sg_set_buf(&priv.sg[priv.idx], sign, SMB2_SIGNATURE_SIZE);
> > +       return priv.sg;
> >  }
> >
> >  static int
> > @@ -4599,18 +4617,30 @@ crypt_message(struct TCP_Server_Info *server, int num_rqst,
> >         return rc;
> >  }
> >
> > +/*
> > + * Clear a read buffer, discarding the folios which have XA_MARK_0 set.
> > + */
> > +static void cifs_clear_xarray_buffer(struct xarray *buffer)
> > +{
> > +       struct folio *folio;
> > +       XA_STATE(xas, buffer, 0);
> > +
> > +       rcu_read_lock();
> > +       xas_for_each_marked(&xas, folio, ULONG_MAX, XA_MARK_0) {
> > +               folio_put(folio);
> > +       }
> > +       rcu_read_unlock();
> > +       xa_destroy(buffer);
> > +}
> > +
> >  void
> >  smb3_free_compound_rqst(int num_rqst, struct smb_rqst *rqst)
> >  {
> > -       int i, j;
> > +       int i;
> >
> > -       for (i = 0; i < num_rqst; i++) {
> > -               if (rqst[i].rq_pages) {
> > -                       for (j = rqst[i].rq_npages - 1; j >= 0; j--)
> > -                               put_page(rqst[i].rq_pages[j]);
> > -                       kfree(rqst[i].rq_pages);
> > -               }
> > -       }
> > +       for (i = 0; i < num_rqst; i++)
> > +               if (!xa_empty(&rqst[i].rq_buffer))
> > +                       cifs_clear_xarray_buffer(&rqst[i].rq_buffer);
> >  }
> >
> >  /*
> > @@ -4630,50 +4660,51 @@ static int
> >  smb3_init_transform_rq(struct TCP_Server_Info *server, int num_rqst,
> >                        struct smb_rqst *new_rq, struct smb_rqst *old_rq)
> >  {
> > -       struct page **pages;
> >         struct smb2_transform_hdr *tr_hdr = new_rq[0].rq_iov[0].iov_base;
> > -       unsigned int npages;
> > +       struct page *page;
> >         unsigned int orig_len = 0;
> >         int i, j;
> >         int rc = -ENOMEM;
> >
> >         for (i = 1; i < num_rqst; i++) {
> > -               npages = old_rq[i - 1].rq_npages;
> > -               pages = kmalloc_array(npages, sizeof(struct page *),
> > -                                     GFP_KERNEL);
> > -               if (!pages)
> > -                       goto err_free;
> > -
> > -               new_rq[i].rq_pages = pages;
> > -               new_rq[i].rq_npages = npages;
> > -               new_rq[i].rq_offset = old_rq[i - 1].rq_offset;
> > -               new_rq[i].rq_pagesz = old_rq[i - 1].rq_pagesz;
> > -               new_rq[i].rq_tailsz = old_rq[i - 1].rq_tailsz;
> > -               new_rq[i].rq_iov = old_rq[i - 1].rq_iov;
> > -               new_rq[i].rq_nvec = old_rq[i - 1].rq_nvec;
> > -
> > -               orig_len += smb_rqst_len(server, &old_rq[i - 1]);
> > -
> > -               for (j = 0; j < npages; j++) {
> > -                       pages[j] = alloc_page(GFP_KERNEL|__GFP_HIGHMEM);
> > -                       if (!pages[j])
> > -                               goto err_free;
> > -               }
> > -
> > -               /* copy pages form the old */
> > -               for (j = 0; j < npages; j++) {
> > -                       char *dst, *src;
> > -                       unsigned int offset, len;
> > -
> > -                       rqst_page_get_length(&new_rq[i], j, &len, &offset);
> > -
> > -                       dst = (char *) kmap(new_rq[i].rq_pages[j]) + offset;
> > -                       src = (char *) kmap(old_rq[i - 1].rq_pages[j]) + offset;
> > +               struct smb_rqst *old = &old_rq[i - 1];
> > +               struct smb_rqst *new = &new_rq[i];
> > +               struct xarray *buffer = &new->rq_buffer;
> > +               unsigned int npages;
> > +               size_t size = iov_iter_count(&old->rq_iter), seg, copied = 0;
> > +
> > +               xa_init(buffer);
> > +
> > +               if (size > 0) {
> > +                       npages = DIV_ROUND_UP(size, PAGE_SIZE);
> > +                       for (j = 0; j < npages; j++) {
> > +                               void *o;
> > +
> > +                               rc = -ENOMEM;
> > +                               page = alloc_page(GFP_KERNEL|__GFP_HIGHMEM);
> > +                               if (!page)
> > +                                       goto err_free;
> > +                               page->index = j;
> > +                               o = xa_store(buffer, j, page, GFP_KERNEL);
> > +                               if (xa_is_err(o)) {
> > +                                       rc = xa_err(o);
> > +                                       put_page(page);
> > +                                       goto err_free;
> > +                               }
> >
> > -                       memcpy(dst, src, len);
> > -                       kunmap(new_rq[i].rq_pages[j]);
> > -                       kunmap(old_rq[i - 1].rq_pages[j]);
> > +                               seg = min(size - copied, PAGE_SIZE);
> > +                               if (copy_page_from_iter(page, 0, seg, &old->rq_iter) != seg) {
> > +                                       rc = -EFAULT;
> > +                                       goto err_free;
> > +                               }
> > +                               copied += seg;
> > +                       }
> > +                       iov_iter_xarray(&new->rq_iter, iov_iter_rw(&old->rq_iter),
> > +                                       buffer, 0, size);
> >                 }
> > +               new->rq_iov = old->rq_iov;
> > +               new->rq_nvec = old->rq_nvec;
> > +               orig_len += smb_rqst_len(server, new);
> >         }
> >
> >         /* fill the 1st iov with a transform header */
> > @@ -4701,12 +4732,12 @@ smb3_is_transform_hdr(void *buf)
> >
> >  static int
> >  decrypt_raw_data(struct TCP_Server_Info *server, char *buf,
> > -                unsigned int buf_data_size, struct page **pages,
> > -                unsigned int npages, unsigned int page_data_size,
> > +                unsigned int buf_data_size, struct iov_iter *iter,
> >                  bool is_offloaded)
> >  {
> >         struct kvec iov[2];
> >         struct smb_rqst rqst = {NULL};
> > +       size_t iter_size = 0;
> >         int rc;
> >
> >         iov[0].iov_base = buf;
> > @@ -4716,10 +4747,10 @@ decrypt_raw_data(struct TCP_Server_Info *server, char *buf,
> >
> >         rqst.rq_iov = iov;
> >         rqst.rq_nvec = 2;
> > -       rqst.rq_pages = pages;
> > -       rqst.rq_npages = npages;
> > -       rqst.rq_pagesz = PAGE_SIZE;
> > -       rqst.rq_tailsz = (page_data_size % PAGE_SIZE) ? : PAGE_SIZE;
> > +       if (iter) {
> > +               rqst.rq_iter = *iter;
> > +               iter_size = iov_iter_count(iter);
> > +       }
> >
> >         rc = crypt_message(server, 1, &rqst, 0);
> >         cifs_dbg(FYI, "Decrypt message returned %d\n", rc);
> > @@ -4730,73 +4761,37 @@ decrypt_raw_data(struct TCP_Server_Info *server, char *buf,
> >         memmove(buf, iov[1].iov_base, buf_data_size);
> >
> >         if (!is_offloaded)
> > -               server->total_read = buf_data_size + page_data_size;
> > +               server->total_read = buf_data_size + iter_size;
> >
> >         return rc;
> >  }
> >
> >  static int
> > -read_data_into_pages(struct TCP_Server_Info *server, struct page **pages,
> > -                    unsigned int npages, unsigned int len)
> > +cifs_copy_pages_to_iter(struct xarray *pages, unsigned int data_size,
> > +                       unsigned int skip, struct iov_iter *iter)
> >  {
> > -       int i;
> > -       int length;
> > +       struct page *page;
> > +       unsigned long index;
> >
> > -       for (i = 0; i < npages; i++) {
> > -               struct page *page = pages[i];
> > -               size_t n;
> > +       xa_for_each(pages, index, page) {
> > +               size_t n, len = min_t(unsigned int, PAGE_SIZE - skip, data_size);
> >
> > -               n = len;
> > -               if (len >= PAGE_SIZE) {
> > -                       /* enough data to fill the page */
> > -                       n = PAGE_SIZE;
> > -                       len -= n;
> > -               } else {
> > -                       zero_user(page, len, PAGE_SIZE - len);
> > -                       len = 0;
> > +               n = copy_page_to_iter(page, skip, len, iter);
> > +               if (n != len) {
> > +                       cifs_dbg(VFS, "%s: something went wrong\n", __func__);
> > +                       return -EIO;
> >                 }
> > -               length = cifs_read_page_from_socket(server, page, 0, n);
> > -               if (length < 0)
> > -                       return length;
> > -               server->total_read += length;
> > +               data_size -= n;
> > +               skip = 0;
> >         }
> >
> >         return 0;
> >  }
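
To make the skip handling in cifs_copy_pages_to_iter() easier to check, here is a userspace sketch of the same loop over a plain array of pages. The names (toy_copy_pages, TOY_PAGE_SIZE) are hypothetical, the page size is deliberately tiny, and a flat destination buffer stands in for the output iterator. One difference from the quoted code: this sketch stops once data_size reaches zero, whereas the xa_for_each() loop above keeps walking with a zero length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 8u	/* tiny on purpose so the example is easy to trace */

/*
 * Copy data_size bytes out of an array of pages into dst, skipping the first
 * 'skip' bytes of the first page only. Returns the number of bytes copied.
 */
static size_t toy_copy_pages(char pages[][TOY_PAGE_SIZE], size_t npages,
			     size_t data_size, size_t skip, char *dst)
{
	size_t i, copied = 0;

	for (i = 0; i < npages && data_size > 0; i++) {
		size_t len = TOY_PAGE_SIZE - skip;

		if (len > data_size)
			len = data_size;
		memcpy(dst + copied, pages[i] + skip, len);
		copied += len;
		data_size -= len;
		skip = 0;	/* only the first page starts part-way in */
	}
	return copied;
}
```

With two 8-byte pages, a skip of 3 and a data_size of 10, the first page contributes 5 bytes and the second the remaining 5.
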
> >
> > -static int
> > -init_read_bvec(struct page **pages, unsigned int npages, unsigned int data_size,
> > -              unsigned int cur_off, struct bio_vec **page_vec)
> > -{
> > -       struct bio_vec *bvec;
> > -       int i;
> > -
> > -       bvec = kcalloc(npages, sizeof(struct bio_vec), GFP_KERNEL);
> > -       if (!bvec)
> > -               return -ENOMEM;
> > -
> > -       for (i = 0; i < npages; i++) {
> > -               bvec[i].bv_page = pages[i];
> > -               bvec[i].bv_offset = (i == 0) ? cur_off : 0;
> > -               bvec[i].bv_len = min_t(unsigned int, PAGE_SIZE, data_size);
> > -               data_size -= bvec[i].bv_len;
> > -       }
> > -
> > -       if (data_size != 0) {
> > -               cifs_dbg(VFS, "%s: something went wrong\n", __func__);
> > -               kfree(bvec);
> > -               return -EIO;
> > -       }
> > -
> > -       *page_vec = bvec;
> > -       return 0;
> > -}
> > -
> >  static int
> >  handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
> > -                char *buf, unsigned int buf_len, struct page **pages,
> > -                unsigned int npages, unsigned int page_data_size,
> > -                bool is_offloaded)
> > +                char *buf, unsigned int buf_len, struct xarray *pages,
> > +                unsigned int pages_len, bool is_offloaded)
> >  {
> >         unsigned int data_offset;
> >         unsigned int data_len;
> > @@ -4805,9 +4800,6 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
> >         unsigned int pad_len;
> >         struct cifs_readdata *rdata = mid->callback_data;
> >         struct smb2_hdr *shdr = (struct smb2_hdr *)buf;
> > -       struct bio_vec *bvec = NULL;
> > -       struct iov_iter iter;
> > -       struct kvec iov;
> >         int length;
> >         bool use_rdma_mr = false;
> >
> > @@ -4896,7 +4888,7 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
> >                         return 0;
> >                 }
> >
> > -               if (data_len > page_data_size - pad_len) {
> > +               if (data_len > pages_len - pad_len) {
> >                         /* data_len is corrupt -- discard frame */
> >                         rdata->result = -EIO;
> >                         if (is_offloaded)
> > @@ -4906,8 +4898,9 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
> >                         return 0;
> >                 }
> >
> > -               rdata->result = init_read_bvec(pages, npages, page_data_size,
> > -                                              cur_off, &bvec);
> > +               /* Copy the data to the output I/O iterator. */
> > +               rdata->result = cifs_copy_pages_to_iter(pages, pages_len,
> > +                                                       cur_off, &rdata->iter);
> >                 if (rdata->result != 0) {
> >                         if (is_offloaded)
> >                                 mid->mid_state = MID_RESPONSE_MALFORMED;
> > @@ -4915,14 +4908,15 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
> >                                 dequeue_mid(mid, rdata->result);
> >                         return 0;
> >                 }
> > +               rdata->got_bytes = pages_len;
> >
> > -               iov_iter_bvec(&iter, WRITE, bvec, npages, data_len);
> >         } else if (buf_len >= data_offset + data_len) {
> >                 /* read response payload is in buf */
> > -               WARN_ONCE(npages > 0, "read data can be either in buf or in pages");
> > -               iov.iov_base = buf + data_offset;
> > -               iov.iov_len = data_len;
> > -               iov_iter_kvec(&iter, WRITE, &iov, 1, data_len);
> > +               WARN_ONCE(pages && !xa_empty(pages),
> > +                         "read data can be either in buf or in pages");
> > +               length = copy_to_iter(buf + data_offset, data_len, &rdata->iter);
> > +               if (length < 0)
> > +                       return length;
> >         } else {
> >                 /* read response payload cannot be in both buf and pages */
> >                 WARN_ONCE(1, "buf can not contain only a part of read data");
> > @@ -4934,13 +4928,6 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
> >                 return 0;
> >         }
> >
> > -       length = rdata->copy_into_pages(server, rdata, &iter);
> > -
> > -       kfree(bvec);
> > -
> > -       if (length < 0)
> > -               return length;
> > -
> >         if (is_offloaded)
> >                 mid->mid_state = MID_RESPONSE_RECEIVED;
> >         else
> > @@ -4951,9 +4938,8 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
> >  struct smb2_decrypt_work {
> >         struct work_struct decrypt;
> >         struct TCP_Server_Info *server;
> > -       struct page **ppages;
> > +       struct xarray buffer;
> >         char *buf;
> > -       unsigned int npages;
> >         unsigned int len;
> >  };
> >
> > @@ -4962,11 +4948,13 @@ static void smb2_decrypt_offload(struct work_struct *work)
> >  {
> >         struct smb2_decrypt_work *dw = container_of(work,
> >                                 struct smb2_decrypt_work, decrypt);
> > -       int i, rc;
> > +       int rc;
> >         struct mid_q_entry *mid;
> > +       struct iov_iter iter;
> >
> > +       iov_iter_xarray(&iter, READ, &dw->buffer, 0, dw->len);
> >         rc = decrypt_raw_data(dw->server, dw->buf, dw->server->vals->read_rsp_size,
> > -                             dw->ppages, dw->npages, dw->len, true);
> > +                             &iter, true);
> >         if (rc) {
> >                 cifs_dbg(VFS, "error decrypting rc=%d\n", rc);
> >                 goto free_pages;
> > @@ -4980,7 +4968,7 @@ static void smb2_decrypt_offload(struct work_struct *work)
> >                 mid->decrypted = true;
> >                 rc = handle_read_data(dw->server, mid, dw->buf,
> >                                       dw->server->vals->read_rsp_size,
> > -                                     dw->ppages, dw->npages, dw->len,
> > +                                     &dw->buffer, dw->len,
> >                                       true);
> >                 if (rc >= 0) {
> >  #ifdef CONFIG_CIFS_STATS2
> > @@ -5012,10 +5000,7 @@ static void smb2_decrypt_offload(struct work_struct *work)
> >         }
> >
> >  free_pages:
> > -       for (i = dw->npages-1; i >= 0; i--)
> > -               put_page(dw->ppages[i]);
> > -
> > -       kfree(dw->ppages);
> > +       cifs_clear_xarray_buffer(&dw->buffer);
> >         cifs_small_buf_release(dw->buf);
> >         kfree(dw);
> >  }
> > @@ -5025,47 +5010,66 @@ static int
> >  receive_encrypted_read(struct TCP_Server_Info *server, struct mid_q_entry **mid,
> >                        int *num_mids)
> >  {
> > +       struct page *page;
> >         char *buf = server->smallbuf;
> >         struct smb2_transform_hdr *tr_hdr = (struct smb2_transform_hdr *)buf;
> > -       unsigned int npages;
> > -       struct page **pages;
> > -       unsigned int len;
> > +       struct iov_iter iter;
> > +       unsigned int len, npages;
> >         unsigned int buflen = server->pdu_size;
> >         int rc;
> >         int i = 0;
> >         struct smb2_decrypt_work *dw;
> >
> > +       dw = kzalloc(sizeof(struct smb2_decrypt_work), GFP_KERNEL);
> > +       if (!dw)
> > +               return -ENOMEM;
> > +       xa_init(&dw->buffer);
> > +       INIT_WORK(&dw->decrypt, smb2_decrypt_offload);
> > +       dw->server = server;
> > +
> >         *num_mids = 1;
> >         len = min_t(unsigned int, buflen, server->vals->read_rsp_size +
> >                 sizeof(struct smb2_transform_hdr)) - HEADER_SIZE(server) + 1;
> >
> >         rc = cifs_read_from_socket(server, buf + HEADER_SIZE(server) - 1, len);
> >         if (rc < 0)
> > -               return rc;
> > +               goto free_dw;
> >         server->total_read += rc;
> >
> >         len = le32_to_cpu(tr_hdr->OriginalMessageSize) -
> >                 server->vals->read_rsp_size;
> > +       dw->len = len;
> >         npages = DIV_ROUND_UP(len, PAGE_SIZE);
> >
> > -       pages = kmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
> > -       if (!pages) {
> > -               rc = -ENOMEM;
> > -               goto discard_data;
> > -       }
> > -
> > +       rc = -ENOMEM;
> >         for (; i < npages; i++) {
> > -               pages[i] = alloc_page(GFP_KERNEL|__GFP_HIGHMEM);
> > -               if (!pages[i]) {
> > -                       rc = -ENOMEM;
> > +               void *old;
> > +
> > +               page = alloc_page(GFP_KERNEL|__GFP_HIGHMEM);
> > +               if (!page) {
> > +                       goto discard_data;
> > +               }
> > +               page->index = i;
> > +               old = xa_store(&dw->buffer, i, page, GFP_KERNEL);
> > +               if (xa_is_err(old)) {
> > +                       rc = xa_err(old);
> > +                       put_page(page);
> >                         goto discard_data;
> >                 }
> >         }
> >
> > -       /* read read data into pages */
> > -       rc = read_data_into_pages(server, pages, npages, len);
> > -       if (rc)
> > -               goto free_pages;
> > +       iov_iter_xarray(&iter, READ, &dw->buffer, 0, npages * PAGE_SIZE);
> > +
> > +       /* Read the data into the buffer and zero any unused space at the end. */
> > +       rc = cifs_read_iter_from_socket(server, &iter, dw->len);
> > +       if (rc < 0)
> > +               goto discard_data;
> > +
> > +       server->total_read += rc;
> > +       if (rc < npages * PAGE_SIZE)
> > +               iov_iter_zero(npages * PAGE_SIZE - rc, &iter);
> > +       iov_iter_revert(&iter, npages * PAGE_SIZE);
> > +       iov_iter_truncate(&iter, dw->len);
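
The arithmetic in this hunk (round the length up to whole pages, read, then zero whatever is left of the buffer) can be sketched in userspace as below. This is only an illustration under my own assumptions: toy_npages() and toy_zero_tail() are hypothetical names, a flat buffer stands in for the xarray of pages, and the page size is fixed at 4096.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Number of whole pages needed to hold len bytes. */
static size_t toy_npages(size_t len)
{
	return TOY_DIV_ROUND_UP(len, (size_t)TOY_PAGE_SIZE);
}

/*
 * Zero the part of buf (capacity cap bytes) that follows the 'got' bytes
 * actually received. Returns the number of bytes zeroed.
 */
static size_t toy_zero_tail(char *buf, size_t cap, size_t got)
{
	if (got >= cap)
		return 0;
	memset(buf + got, 0, cap - got);
	return cap - got;
}
```

For a 5000-byte payload this gives two pages (8192 bytes of buffer) and a 3192-byte zeroed tail.
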
> >
> >         rc = cifs_discard_remaining_data(server);
> >         if (rc)
> > @@ -5078,39 +5082,28 @@ receive_encrypted_read(struct TCP_Server_Info *server, struct mid_q_entry **mid,
> >
> >         if ((server->min_offload) && (server->in_flight > 1) &&
> >             (server->pdu_size >= server->min_offload)) {
> > -               dw = kmalloc(sizeof(struct smb2_decrypt_work), GFP_KERNEL);
> > -               if (dw == NULL)
> > -                       goto non_offloaded_decrypt;
> > -
> >                 dw->buf = server->smallbuf;
> >                 server->smallbuf = (char *)cifs_small_buf_get();
> >
> > -               INIT_WORK(&dw->decrypt, smb2_decrypt_offload);
> > -
> > -               dw->npages = npages;
> > -               dw->server = server;
> > -               dw->ppages = pages;
> > -               dw->len = len;
> >                 queue_work(decrypt_wq, &dw->decrypt);
> >                 *num_mids = 0; /* worker thread takes care of finding mid */
> >                 return -1;
> >         }
> >
> > -non_offloaded_decrypt:
> >         rc = decrypt_raw_data(server, buf, server->vals->read_rsp_size,
> > -                             pages, npages, len, false);
> > +                             &iter, false);
> >         if (rc)
> >                 goto free_pages;
> >
> >         *mid = smb2_find_mid(server, buf);
> > -       if (*mid == NULL)
> > +       if (*mid == NULL) {
> >                 cifs_dbg(FYI, "mid not found\n");
> > -       else {
> > +       } else {
> >                 cifs_dbg(FYI, "mid found\n");
> >                 (*mid)->decrypted = true;
> >                 rc = handle_read_data(server, *mid, buf,
> >                                       server->vals->read_rsp_size,
> > -                                     pages, npages, len, false);
> > +                                     &dw->buffer, dw->len, false);
> >                 if (rc >= 0) {
> >                         if (server->ops->is_network_name_deleted) {
> >                                 server->ops->is_network_name_deleted(buf,
> > @@ -5120,9 +5113,9 @@ receive_encrypted_read(struct TCP_Server_Info *server, struct mid_q_entry **mid,
> >         }
> >
> >  free_pages:
> > -       for (i = i - 1; i >= 0; i--)
> > -               put_page(pages[i]);
> > -       kfree(pages);
> > +       cifs_clear_xarray_buffer(&dw->buffer);
> > +free_dw:
> > +       kfree(dw);
> >         return rc;
> >  discard_data:
> >         cifs_discard_remaining_data(server);
> > @@ -5160,7 +5153,7 @@ receive_encrypted_standard(struct TCP_Server_Info *server,
> >         server->total_read += length;
> >
> >         buf_size = pdu_length - sizeof(struct smb2_transform_hdr);
> > -       length = decrypt_raw_data(server, buf, buf_size, NULL, 0, 0, false);
> > +       length = decrypt_raw_data(server, buf, buf_size, NULL, false);
> >         if (length)
> >                 return length;
> >
> > @@ -5259,7 +5252,7 @@ smb3_handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid)
> >         char *buf = server->large_buf ? server->bigbuf : server->smallbuf;
> >
> >         return handle_read_data(server, mid, buf, server->pdu_size,
> > -                               NULL, 0, 0, false);
> > +                               NULL, 0, false);
> >  }
> >
> >  static int
> > diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
> > index 7e7909b1ae11..ebbea7526ee2 100644
> > --- a/fs/cifs/smb2pdu.c
> > +++ b/fs/cifs/smb2pdu.c
> > @@ -4118,11 +4118,7 @@ smb2_readv_callback(struct mid_q_entry *mid)
> >         struct cifs_credits credits = { .value = 0, .instance = 0 };
> >         struct smb_rqst rqst = { .rq_iov = &rdata->iov[1],
> >                                  .rq_nvec = 1,
> > -                                .rq_pages = rdata->pages,
> > -                                .rq_offset = rdata->page_offset,
> > -                                .rq_npages = rdata->nr_pages,
> > -                                .rq_pagesz = rdata->pagesz,
> > -                                .rq_tailsz = rdata->tailsz };
> > +                                .rq_iter = rdata->iter };
> >
> >         WARN_ONCE(rdata->server != mid->server,
> >                   "rdata server %p != mid server %p",
> > @@ -4522,11 +4518,7 @@ smb2_async_writev(struct cifs_writedata *wdata,
> >
> >         rqst.rq_iov = iov;
> >         rqst.rq_nvec = 1;
> > -       rqst.rq_pages = wdata->pages;
> > -       rqst.rq_offset = wdata->page_offset;
> > -       rqst.rq_npages = wdata->nr_pages;
> > -       rqst.rq_pagesz = wdata->pagesz;
> > -       rqst.rq_tailsz = wdata->tailsz;
> > +       rqst.rq_iter = wdata->iter;
> >  #ifdef CONFIG_CIFS_SMB_DIRECT
> >         if (wdata->mr) {
> >                 iov[0].iov_len += sizeof(struct smbd_buffer_descriptor_v1);
> > diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
> > index 8540f7c13eae..cb19c43c0009 100644
> > --- a/fs/cifs/transport.c
> > +++ b/fs/cifs/transport.c
> > @@ -276,26 +276,7 @@ smb_rqst_len(struct TCP_Server_Info *server, struct smb_rqst *rqst)
> >         for (i = 0; i < nvec; i++)
> >                 buflen += iov[i].iov_len;
> >
> > -       /*
> > -        * Add in the page array if there is one. The caller needs to make
> > -        * sure rq_offset and rq_tailsz are set correctly. If a buffer of
> > -        * multiple pages ends at page boundary, rq_tailsz needs to be set to
> > -        * PAGE_SIZE.
> > -        */
> > -       if (rqst->rq_npages) {
> > -               if (rqst->rq_npages == 1)
> > -                       buflen += rqst->rq_tailsz;
> > -               else {
> > -                       /*
> > -                        * If there is more than one page, calculate the
> > -                        * buffer length based on rq_offset and rq_tailsz
> > -                        */
> > -                       buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) -
> > -                                       rqst->rq_offset;
> > -                       buflen += rqst->rq_tailsz;
> > -               }
> > -       }
> > -
> > +       buflen += iov_iter_count(&rqst->rq_iter);
> >         return buflen;
> >  }
> >
> > @@ -382,23 +363,15 @@ __smb_send_rqst(struct TCP_Server_Info *server, int num_rqst,
> >
> >                 total_len += sent;
> >
> > -               /* now walk the page array and send each page in it */
> > -               for (i = 0; i < rqst[j].rq_npages; i++) {
> > -                       struct bio_vec bvec;
> > -
> > -                       bvec.bv_page = rqst[j].rq_pages[i];
> > -                       rqst_page_get_length(&rqst[j], i, &bvec.bv_len,
> > -                                            &bvec.bv_offset);
> > -
> > -                       iov_iter_bvec(&smb_msg.msg_iter, WRITE,
> > -                                     &bvec, 1, bvec.bv_len);
> > +               if (iov_iter_count(&rqst[j].rq_iter) > 0) {
> > +                       smb_msg.msg_iter = rqst[j].rq_iter;
> >                         rc = smb_send_kvec(server, &smb_msg, &sent);
> >                         if (rc < 0)
> >                                 break;
> > -
> >                         total_len += sent;
> >                 }
> > -       }
> > +
> > +}
> >
> >  unmask:
> >         sigprocmask(SIG_SETMASK, &oldmask, NULL);
> >
> >
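
For reference, the page-array arithmetic removed from smb_rqst_len() in
the hunk quoted above can be modelled in userspace. This is only a sketch:
struct page_array_desc and page_array_len() are hypothetical names made up
for the illustration, not kernel API, and the fields merely mirror
rq_npages, rq_pagesz, rq_offset and rq_tailsz. In the patch the whole
computation collapses to iov_iter_count(&rqst->rq_iter).

```c
#include <stddef.h>

/* Hypothetical stand-in for the page-list fields removed from struct smb_rqst. */
struct page_array_desc {
	unsigned int npages;	/* mirrors rq_npages */
	unsigned int pagesz;	/* mirrors rq_pagesz */
	unsigned int offset;	/* mirrors rq_offset: start offset in the first page */
	unsigned int tailsz;	/* mirrors rq_tailsz: bytes used in the last page */
};

/* Same arithmetic as the block removed from smb_rqst_len(). */
static size_t page_array_len(const struct page_array_desc *d)
{
	size_t len = 0;

	if (d->npages == 0)
		return 0;

	/* With more than one page, all but the last are full minus the offset. */
	if (d->npages > 1)
		len += (size_t)d->pagesz * (d->npages - 1) - d->offset;

	/* The last (or only) page contributes tailsz bytes. */
	return len + d->tailsz;
}
```

As in the removed code, a single-page buffer contributes only tailsz, so
the caller has to set the tail size with the offset already accounted for.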



-- 
Regards,
Shyam


* Re: [RFC PATCH 7/7] cifs: Use netfslib to handle reads
  2022-01-25 13:59 ` [RFC PATCH 7/7] cifs: Use netfslib to handle reads David Howells
@ 2022-02-08  5:59   ` Rohith Surabattula
  2022-02-14 16:33   ` David Howells
  1 sibling, 0 replies; 20+ messages in thread
From: Rohith Surabattula @ 2022-02-08  5:59 UTC (permalink / raw)
  To: David Howells
  Cc: smfrench, nspmangalore, jlayton, linux-cifs, linux-cachefs,
	linux-fsdevel

Hi David,

I have tested the netfs integration with the fsc mount option enabled,
but I observed that the function "netfs_cache_prepare_read" always
returns "NETFS_DOWNLOAD_FROM_SERVER" because cres->ops (i.e. the
cachefiles operations) is not set.

static enum netfs_read_source
netfs_cache_prepare_read(struct netfs_read_subrequest *subreq,
                         loff_t i_size)
{
        struct netfs_read_request *rreq = subreq->rreq;
        struct netfs_cache_resources *cres = &rreq->cache_resources;

        if (cres->ops)
                return cres->ops->prepare_read(subreq, i_size);
        if (subreq->start >= rreq->i_size)
                return NETFS_FILL_WITH_ZEROES;
        return NETFS_DOWNLOAD_FROM_SERVER;
}

I used the cifs-experimental branch in your repo to test the netfs changes.

Please let me know whether any work still needs to be done for netfs to
integrate with cachefiles.
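
To make the behaviour concrete, here is a small userspace model of the
decision order in the function quoted above. It is a sketch under
assumptions, not the kernel code: every name prefixed model_ or MODEL_ is
hypothetical, the ops pointer stands in for cres->ops, and
model_cache_hit() is an invented backend that always claims to hold the
data.

```c
#include <stddef.h>

/* Userspace model only; all model_/MODEL_ names are hypothetical. */
enum model_read_source {
	MODEL_FILL_WITH_ZEROES,
	MODEL_DOWNLOAD_FROM_SERVER,
	MODEL_READ_FROM_CACHE,
};

struct model_subreq;

struct model_cache_ops {
	enum model_read_source (*prepare_read)(struct model_subreq *subreq,
					       long long i_size);
};

struct model_rreq {
	const struct model_cache_ops *ops;	/* stands in for cres->ops */
	long long i_size;
};

struct model_subreq {
	struct model_rreq *rreq;
	long long start;
};

/* Mirrors the decision order of netfs_cache_prepare_read(). */
static enum model_read_source model_prepare_read(struct model_subreq *subreq,
						 long long i_size)
{
	struct model_rreq *rreq = subreq->rreq;

	/* Only a registered cache backend gets to choose the source. */
	if (rreq->ops)
		return rreq->ops->prepare_read(subreq, i_size);
	if (subreq->start >= rreq->i_size)
		return MODEL_FILL_WITH_ZEROES;
	return MODEL_DOWNLOAD_FROM_SERVER;
}

/* Invented backend that always reports the data as cached. */
static enum model_read_source model_cache_hit(struct model_subreq *subreq,
					      long long i_size)
{
	(void)subreq;
	(void)i_size;
	return MODEL_READ_FROM_CACHE;
}
```

With ops left NULL the model can never reach the cache branch, whatever
the subrequest looks like, which matches what I see with fsc enabled.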

Regards,
Rohith

On Wed, Jan 26, 2022 at 1:24 AM David Howells <dhowells@redhat.com> wrote:
>
>
> ---
>
>  fs/cifs/Kconfig        |    1
>  fs/cifs/cifsfs.c       |    6
>  fs/cifs/cifsfs.h       |    3
>  fs/cifs/cifsglob.h     |    6
>  fs/cifs/cifssmb.c      |    9 -
>  fs/cifs/file.c         |  824 ++++++++----------------------------------------
>  fs/cifs/fscache.c      |   31 --
>  fs/cifs/fscache.h      |   52 ---
>  fs/cifs/inode.c        |   17 +
>  fs/cifs/smb2pdu.c      |   15 +
>  fs/netfs/read_helper.c |    7
>  11 files changed, 182 insertions(+), 789 deletions(-)
>
> diff --git a/fs/cifs/Kconfig b/fs/cifs/Kconfig
> index 3b7e3b9e4fd2..c47e2d3a101f 100644
> --- a/fs/cifs/Kconfig
> +++ b/fs/cifs/Kconfig
> @@ -2,6 +2,7 @@
>  config CIFS
>         tristate "SMB3 and CIFS support (advanced network filesystem)"
>         depends on INET
> +       select NETFS_SUPPORT
>         select NLS
>         select CRYPTO
>         select CRYPTO_MD5
> diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> index a56cb9c8c5ff..bd06df3bb24b 100644
> --- a/fs/cifs/cifsfs.c
> +++ b/fs/cifs/cifsfs.c
> @@ -936,7 +936,7 @@ cifs_loose_read_iter(struct kiocb *iocb, struct iov_iter *iter)
>         struct inode *inode = file_inode(iocb->ki_filp);
>
>         if (iocb->ki_flags & IOCB_DIRECT)
> -               return cifs_user_readv(iocb, iter);
> +               return netfs_direct_read_iter(iocb, iter);
>
>         rc = cifs_revalidate_mapping(inode);
>         if (rc)
> @@ -1314,7 +1314,7 @@ const struct file_operations cifs_file_strict_ops = {
>  };
>
>  const struct file_operations cifs_file_direct_ops = {
> -       .read_iter = cifs_direct_readv,
> +       .read_iter = netfs_direct_read_iter,
>         .write_iter = cifs_direct_writev,
>         .open = cifs_open,
>         .release = cifs_close,
> @@ -1370,7 +1370,7 @@ const struct file_operations cifs_file_strict_nobrl_ops = {
>  };
>
>  const struct file_operations cifs_file_direct_nobrl_ops = {
> -       .read_iter = cifs_direct_readv,
> +       .read_iter = netfs_direct_read_iter,
>         .write_iter = cifs_direct_writev,
>         .open = cifs_open,
>         .release = cifs_close,
> diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
> index 1c77bbc0815f..c7d5c268fc47 100644
> --- a/fs/cifs/cifsfs.h
> +++ b/fs/cifs/cifsfs.h
> @@ -85,6 +85,7 @@ extern const struct inode_operations cifs_dfs_referral_inode_operations;
>
>
>  /* Functions related to files and directories */
> +extern const struct netfs_request_ops cifs_req_ops;
>  extern const struct file_operations cifs_file_ops;
>  extern const struct file_operations cifs_file_direct_ops; /* if directio mnt */
>  extern const struct file_operations cifs_file_strict_ops; /* if strictio mnt */
> @@ -94,8 +95,6 @@ extern const struct file_operations cifs_file_strict_nobrl_ops;
>  extern int cifs_open(struct inode *inode, struct file *file);
>  extern int cifs_close(struct inode *inode, struct file *file);
>  extern int cifs_closedir(struct inode *inode, struct file *file);
> -extern ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to);
> -extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
>  extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
>  extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
>  extern ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from);
> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> index 3a4fed645636..938e4e9827ed 100644
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -1313,18 +1313,14 @@ struct cifs_aio_ctx {
>
>  /* asynchronous read support */
>  struct cifs_readdata {
> +       struct netfs_read_subrequest    *subreq;
>         struct kref                     refcount;
> -       struct list_head                list;
> -       struct completion               done;
>         struct cifsFileInfo             *cfile;
> -       struct address_space            *mapping;
> -       struct cifs_aio_ctx             *ctx;
>         __u64                           offset;
>         ssize_t                         got_bytes;
>         unsigned int                    bytes;
>         pid_t                           pid;
>         int                             result;
> -       struct work_struct              work;
>         struct iov_iter                 iter;
>         struct kvec                     iov[2];
>         struct TCP_Server_Info          *server;
> diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
> index 38e7276352e2..c9fb77a8b31b 100644
> --- a/fs/cifs/cifssmb.c
> +++ b/fs/cifs/cifssmb.c
> @@ -23,6 +23,7 @@
>  #include <linux/swap.h>
>  #include <linux/task_io_accounting_ops.h>
>  #include <linux/uaccess.h>
> +#include <linux/netfs.h>
>  #include "cifspdu.h"
>  #include "cifsfs.h"
>  #include "cifsglob.h"
> @@ -1609,7 +1610,13 @@ cifs_readv_callback(struct mid_q_entry *mid)
>                 rdata->result = -EIO;
>         }
>
> -       queue_work(cifsiod_wq, &rdata->work);
> +       if (rdata->result == 0 || rdata->result == -EAGAIN)
> +               iov_iter_advance(&rdata->subreq->iter, rdata->got_bytes);
> +       netfs_subreq_terminated(rdata->subreq,
> +                               (rdata->result == 0 || rdata->result == -EAGAIN) ?
> +                               rdata->got_bytes : rdata->result,
> +                               false);
> +       kref_put(&rdata->refcount, cifs_readdata_release);
>         DeleteMidQEntry(mid);
>         add_credits(server, &credits, 0);
>  }
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index f9b9a1562e17..36559de02e37 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -21,6 +21,7 @@
>  #include <linux/slab.h>
>  #include <linux/swap.h>
>  #include <linux/mm.h>
> +#include <linux/netfs.h>
>  #include <asm/div64.h>
>  #include "cifsfs.h"
>  #include "cifspdu.h"
> @@ -3306,12 +3307,8 @@ static struct cifs_readdata *cifs_readdata_alloc(work_func_t complete)
>         struct cifs_readdata *rdata;
>
>         rdata = kzalloc(sizeof(*rdata), GFP_KERNEL);
> -       if (rdata) {
> +       if (rdata)
>                 kref_init(&rdata->refcount);
> -               INIT_LIST_HEAD(&rdata->list);
> -               init_completion(&rdata->done);
> -               INIT_WORK(&rdata->work, complete);
> -       }
>
>         return rdata;
>  }
> @@ -3322,8 +3319,6 @@ cifs_readdata_release(struct kref *refcount)
>         struct cifs_readdata *rdata = container_of(refcount,
>                                         struct cifs_readdata, refcount);
>
> -       if (rdata->ctx)
> -               kref_put(&rdata->ctx->refcount, cifs_aio_ctx_release);
>  #ifdef CONFIG_CIFS_SMB_DIRECT
>         if (rdata->mr) {
>                 smbd_deregister_mr(rdata->mr);
> @@ -3336,370 +3331,6 @@ cifs_readdata_release(struct kref *refcount)
>         kfree(rdata);
>  }
>
> -static void collect_uncached_read_data(struct cifs_aio_ctx *ctx);
> -
> -static void
> -cifs_uncached_readv_complete(struct work_struct *work)
> -{
> -       struct cifs_readdata *rdata = container_of(work,
> -                                               struct cifs_readdata, work);
> -
> -       complete(&rdata->done);
> -       collect_uncached_read_data(rdata->ctx);
> -       /* the below call can possibly free the last ref to aio ctx */
> -       kref_put(&rdata->refcount, cifs_readdata_release);
> -}
> -
> -static int cifs_resend_rdata(struct cifs_readdata *rdata,
> -                       struct list_head *rdata_list,
> -                       struct cifs_aio_ctx *ctx)
> -{
> -       unsigned int rsize;
> -       struct cifs_credits credits;
> -       int rc;
> -       struct TCP_Server_Info *server;
> -
> -       /* XXX: should we pick a new channel here? */
> -       server = rdata->server;
> -
> -       do {
> -               if (rdata->cfile->invalidHandle) {
> -                       rc = cifs_reopen_file(rdata->cfile, true);
> -                       if (rc == -EAGAIN)
> -                               continue;
> -                       else if (rc)
> -                               break;
> -               }
> -
> -               /*
> -                * Wait for credits to resend this rdata.
> -                * Note: we are attempting to resend the whole rdata not in
> -                * segments
> -                */
> -               do {
> -                       rc = server->ops->wait_mtu_credits(server, rdata->bytes,
> -                                               &rsize, &credits);
> -
> -                       if (rc)
> -                               goto fail;
> -
> -                       if (rsize < rdata->bytes) {
> -                               add_credits_and_wake_if(server, &credits, 0);
> -                               msleep(1000);
> -                       }
> -               } while (rsize < rdata->bytes);
> -               rdata->credits = credits;
> -
> -               rc = adjust_credits(server, &rdata->credits, rdata->bytes);
> -               if (!rc) {
> -                       if (rdata->cfile->invalidHandle)
> -                               rc = -EAGAIN;
> -                       else {
> -#ifdef CONFIG_CIFS_SMB_DIRECT
> -                               if (rdata->mr) {
> -                                       rdata->mr->need_invalidate = true;
> -                                       smbd_deregister_mr(rdata->mr);
> -                                       rdata->mr = NULL;
> -                               }
> -#endif
> -                               rc = server->ops->async_readv(rdata);
> -                       }
> -               }
> -
> -               /* If the read was successfully sent, we are done */
> -               if (!rc) {
> -                       /* Add to aio pending list */
> -                       list_add_tail(&rdata->list, rdata_list);
> -                       return 0;
> -               }
> -
> -               /* Roll back credits and retry if needed */
> -               add_credits_and_wake_if(server, &rdata->credits, 0);
> -       } while (rc == -EAGAIN);
> -
> -fail:
> -       kref_put(&rdata->refcount, cifs_readdata_release);
> -       return rc;
> -}
> -
> -static int
> -cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
> -                    struct cifs_sb_info *cifs_sb, struct list_head *rdata_list,
> -                    struct cifs_aio_ctx *ctx)
> -{
> -       struct cifs_readdata *rdata;
> -       unsigned int rsize;
> -       struct cifs_credits credits_on_stack;
> -       struct cifs_credits *credits = &credits_on_stack;
> -       size_t cur_len;
> -       int rc;
> -       pid_t pid;
> -       struct TCP_Server_Info *server;
> -
> -       server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses);
> -
> -       if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
> -               pid = open_file->pid;
> -       else
> -               pid = current->tgid;
> -
> -       do {
> -               if (open_file->invalidHandle) {
> -                       rc = cifs_reopen_file(open_file, true);
> -                       if (rc == -EAGAIN)
> -                               continue;
> -                       else if (rc)
> -                               break;
> -               }
> -
> -               rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->rsize,
> -                                                  &rsize, credits);
> -               if (rc)
> -                       break;
> -
> -               cur_len = min_t(const size_t, len, rsize);
> -
> -               rdata = cifs_readdata_alloc(cifs_uncached_readv_complete);
> -               if (!rdata) {
> -                       add_credits_and_wake_if(server, credits, 0);
> -                       rc = -ENOMEM;
> -                       break;
> -               }
> -
> -               rdata->server   = server;
> -               rdata->cfile    = cifsFileInfo_get(open_file);
> -               rdata->offset   = offset;
> -               rdata->bytes    = cur_len;
> -               rdata->pid      = pid;
> -               rdata->credits  = credits_on_stack;
> -               rdata->ctx      = ctx;
> -               kref_get(&ctx->refcount);
> -
> -               rdata->iter     = ctx->iter;
> -               iov_iter_advance(&rdata->iter, offset - ctx->pos);
> -               iov_iter_truncate(&rdata->iter, cur_len);
> -
> -               rc = adjust_credits(server, &rdata->credits, rdata->bytes);
> -
> -               if (!rc) {
> -                       if (rdata->cfile->invalidHandle)
> -                               rc = -EAGAIN;
> -                       else
> -                               rc = server->ops->async_readv(rdata);
> -               }
> -
> -               if (rc) {
> -                       add_credits_and_wake_if(server, &rdata->credits, 0);
> -                       kref_put(&rdata->refcount, cifs_readdata_release);
> -                       if (rc == -EAGAIN)
> -                               continue;
> -                       break;
> -               }
> -
> -               list_add_tail(&rdata->list, rdata_list);
> -               offset += cur_len;
> -               len -= cur_len;
> -       } while (len > 0);
> -
> -       return rc;
> -}
> -
> -static void
> -collect_uncached_read_data(struct cifs_aio_ctx *ctx)
> -{
> -       struct cifs_readdata *rdata, *tmp;
> -       struct iov_iter *to = &ctx->iter;
> -       struct cifs_sb_info *cifs_sb;
> -       int rc;
> -
> -       cifs_sb = CIFS_SB(ctx->cfile->dentry->d_sb);
> -
> -       mutex_lock(&ctx->aio_mutex);
> -
> -       if (list_empty(&ctx->list)) {
> -               mutex_unlock(&ctx->aio_mutex);
> -               return;
> -       }
> -
> -       rc = ctx->rc;
> -       /* the loop below should proceed in the order of increasing offsets */
> -again:
> -       list_for_each_entry_safe(rdata, tmp, &ctx->list, list) {
> -               if (!rc) {
> -                       if (!try_wait_for_completion(&rdata->done)) {
> -                               mutex_unlock(&ctx->aio_mutex);
> -                               return;
> -                       }
> -
> -                       if (rdata->result == -EAGAIN) {
> -                               /* resend call if it's a retryable error */
> -                               struct list_head tmp_list;
> -                               unsigned int got_bytes = rdata->got_bytes;
> -
> -                               list_del_init(&rdata->list);
> -                               INIT_LIST_HEAD(&tmp_list);
> -
> -                               if (ctx->direct_io) {
> -                                       /*
> -                                        * Re-use rdata as this is a
> -                                        * direct I/O
> -                                        */
> -                                       rc = cifs_resend_rdata(
> -                                               rdata,
> -                                               &tmp_list, ctx);
> -                               } else {
> -                                       rc = cifs_send_async_read(
> -                                               rdata->offset + got_bytes,
> -                                               rdata->bytes - got_bytes,
> -                                               rdata->cfile, cifs_sb,
> -                                               &tmp_list, ctx);
> -
> -                                       kref_put(&rdata->refcount,
> -                                               cifs_readdata_release);
> -                               }
> -
> -                               list_splice(&tmp_list, &ctx->list);
> -
> -                               goto again;
> -                       } else if (rdata->result)
> -                               rc = rdata->result;
> -
> -                       /* if there was a short read -- discard anything left */
> -                       if (rdata->got_bytes && rdata->got_bytes < rdata->bytes)
> -                               rc = -ENODATA;
> -
> -                       ctx->total_len += rdata->got_bytes;
> -               }
> -               list_del_init(&rdata->list);
> -               kref_put(&rdata->refcount, cifs_readdata_release);
> -       }
> -
> -       if (!ctx->direct_io)
> -               ctx->total_len = ctx->len - iov_iter_count(to);
> -
> -       /* mask nodata case */
> -       if (rc == -ENODATA)
> -               rc = 0;
> -
> -       ctx->rc = (rc == 0) ? (ssize_t)ctx->total_len : rc;
> -
> -       mutex_unlock(&ctx->aio_mutex);
> -
> -       if (ctx->iocb && ctx->iocb->ki_complete)
> -               ctx->iocb->ki_complete(ctx->iocb, ctx->rc);
> -       else
> -               complete(&ctx->done);
> -}
> -
> -static ssize_t __cifs_readv(
> -       struct kiocb *iocb, struct iov_iter *to, bool direct)
> -{
> -       size_t len;
> -       struct file *file = iocb->ki_filp;
> -       struct cifs_sb_info *cifs_sb;
> -       struct cifsFileInfo *cfile;
> -       struct cifs_tcon *tcon;
> -       ssize_t rc, total_read = 0;
> -       loff_t offset = iocb->ki_pos;
> -       struct cifs_aio_ctx *ctx;
> -
> -       /*
> -        * iov_iter_get_pages_alloc() doesn't work with ITER_KVEC,
> -        * fall back to data copy read path
> -        * this could be improved by getting pages directly in ITER_KVEC
> -        */
> -       if (direct && iov_iter_is_kvec(to)) {
> -               cifs_dbg(FYI, "use non-direct cifs_user_readv for kvec I/O\n");
> -               direct = false;
> -       }
> -
> -       len = iov_iter_count(to);
> -       if (!len)
> -               return 0;
> -
> -       cifs_sb = CIFS_FILE_SB(file);
> -       cfile = file->private_data;
> -       tcon = tlink_tcon(cfile->tlink);
> -
> -       if (!tcon->ses->server->ops->async_readv)
> -               return -ENOSYS;
> -
> -       if ((file->f_flags & O_ACCMODE) == O_WRONLY)
> -               cifs_dbg(FYI, "attempting read on write only file instance\n");
> -
> -       ctx = cifs_aio_ctx_alloc();
> -       if (!ctx)
> -               return -ENOMEM;
> -
> -       ctx->pos        = offset;
> -       ctx->direct_io  = direct;
> -       ctx->len        = len;
> -       ctx->cfile      = cifsFileInfo_get(cfile);
> -
> -       if (!is_sync_kiocb(iocb))
> -               ctx->iocb = iocb;
> -
> -       if (iter_is_iovec(to))
> -               ctx->should_dirty = true;
> -
> -       rc = extract_iter_to_iter(to, len, &ctx->iter, &ctx->bv);
> -       if (rc < 0) {
> -               kref_put(&ctx->refcount, cifs_aio_ctx_release);
> -               return rc;
> -       }
> -       ctx->npages = rc;
> -
> -       /* grab a lock here due to read response handlers can access ctx */
> -       mutex_lock(&ctx->aio_mutex);
> -
> -       rc = cifs_send_async_read(offset, len, cfile, cifs_sb, &ctx->list, ctx);
> -
> -       /* if at least one read request send succeeded, then reset rc */
> -       if (!list_empty(&ctx->list))
> -               rc = 0;
> -
> -       mutex_unlock(&ctx->aio_mutex);
> -
> -       if (rc) {
> -               kref_put(&ctx->refcount, cifs_aio_ctx_release);
> -               return rc;
> -       }
> -
> -       if (!is_sync_kiocb(iocb)) {
> -               kref_put(&ctx->refcount, cifs_aio_ctx_release);
> -               return -EIOCBQUEUED;
> -       }
> -
> -       rc = wait_for_completion_killable(&ctx->done);
> -       if (rc) {
> -               mutex_lock(&ctx->aio_mutex);
> -               ctx->rc = rc = -EINTR;
> -               total_read = ctx->total_len;
> -               mutex_unlock(&ctx->aio_mutex);
> -       } else {
> -               rc = ctx->rc;
> -               total_read = ctx->total_len;
> -       }
> -
> -       kref_put(&ctx->refcount, cifs_aio_ctx_release);
> -
> -       if (total_read) {
> -               iocb->ki_pos += total_read;
> -               return total_read;
> -       }
> -       return rc;
> -}
> -
> -ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to)
> -{
> -       return __cifs_readv(iocb, to, true);
> -}
> -
> -ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
> -{
> -       return __cifs_readv(iocb, to, false);
> -}
> -
>  ssize_t
>  cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to)
>  {
> @@ -3720,12 +3351,15 @@ cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to)
>          * pos+len-1.
>          */
>         if (!CIFS_CACHE_READ(cinode))
> -               return cifs_user_readv(iocb, to);
> +               return netfs_direct_read_iter(iocb, to);
>
>         if (cap_unix(tcon->ses) &&
>             (CIFS_UNIX_FCNTL_CAP & le64_to_cpu(tcon->fsUnixInfo.Capability)) &&
> -           ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) == 0))
> +           ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) == 0)) {
> +               if (iocb->ki_flags & IOCB_DIRECT)
> +                       return netfs_direct_read_iter(iocb, to);
>                 return generic_file_read_iter(iocb, to);
> +       }
>
>         /*
>          * We need to hold the sem to be sure nobody modifies lock list
> @@ -3734,104 +3368,16 @@ cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to)
>         down_read(&cinode->lock_sem);
>         if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(to),
>                                      tcon->ses->server->vals->shared_lock_type,
> -                                    0, NULL, CIFS_READ_OP))
> -               rc = generic_file_read_iter(iocb, to);
> +                                    0, NULL, CIFS_READ_OP)) {
> +               if (iocb->ki_flags & IOCB_DIRECT)
> +                       rc = netfs_direct_read_iter(iocb, to);
> +               else
> +                       rc = generic_file_read_iter(iocb, to);
> +       }
>         up_read(&cinode->lock_sem);
>         return rc;
>  }
>
> -static ssize_t
> -cifs_read(struct file *file, char *read_data, size_t read_size, loff_t *offset)
> -{
> -       int rc = -EACCES;
> -       unsigned int bytes_read = 0;
> -       unsigned int total_read;
> -       unsigned int current_read_size;
> -       unsigned int rsize;
> -       struct cifs_sb_info *cifs_sb;
> -       struct cifs_tcon *tcon;
> -       struct TCP_Server_Info *server;
> -       unsigned int xid;
> -       char *cur_offset;
> -       struct cifsFileInfo *open_file;
> -       struct cifs_io_parms io_parms = {0};
> -       int buf_type = CIFS_NO_BUFFER;
> -       __u32 pid;
> -
> -       xid = get_xid();
> -       cifs_sb = CIFS_FILE_SB(file);
> -
> -       /* FIXME: set up handlers for larger reads and/or convert to async */
> -       rsize = min_t(unsigned int, cifs_sb->ctx->rsize, CIFSMaxBufSize);
> -
> -       if (file->private_data == NULL) {
> -               rc = -EBADF;
> -               free_xid(xid);
> -               return rc;
> -       }
> -       open_file = file->private_data;
> -       tcon = tlink_tcon(open_file->tlink);
> -       server = cifs_pick_channel(tcon->ses);
> -
> -       if (!server->ops->sync_read) {
> -               free_xid(xid);
> -               return -ENOSYS;
> -       }
> -
> -       if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
> -               pid = open_file->pid;
> -       else
> -               pid = current->tgid;
> -
> -       if ((file->f_flags & O_ACCMODE) == O_WRONLY)
> -               cifs_dbg(FYI, "attempting read on write only file instance\n");
> -
> -       for (total_read = 0, cur_offset = read_data; read_size > total_read;
> -            total_read += bytes_read, cur_offset += bytes_read) {
> -               do {
> -                       current_read_size = min_t(uint, read_size - total_read,
> -                                                 rsize);
> -                       /*
> -                        * For windows me and 9x we do not want to request more
> -                        * than it negotiated since it will refuse the read
> -                        * then.
> -                        */
> -                       if (!(tcon->ses->capabilities &
> -                               tcon->ses->server->vals->cap_large_files)) {
> -                               current_read_size = min_t(uint,
> -                                       current_read_size, CIFSMaxBufSize);
> -                       }
> -                       if (open_file->invalidHandle) {
> -                               rc = cifs_reopen_file(open_file, true);
> -                               if (rc != 0)
> -                                       break;
> -                       }
> -                       io_parms.pid = pid;
> -                       io_parms.tcon = tcon;
> -                       io_parms.offset = *offset;
> -                       io_parms.length = current_read_size;
> -                       io_parms.server = server;
> -                       rc = server->ops->sync_read(xid, &open_file->fid, &io_parms,
> -                                                   &bytes_read, &cur_offset,
> -                                                   &buf_type);
> -               } while (rc == -EAGAIN);
> -
> -               if (rc || (bytes_read == 0)) {
> -                       if (total_read) {
> -                               break;
> -                       } else {
> -                               free_xid(xid);
> -                               return rc;
> -                       }
> -               } else {
> -                       cifs_stats_bytes_read(tcon, total_read);
> -                       *offset += bytes_read;
> -               }
> -       }
> -       free_xid(xid);
> -       return total_read;
> -}
> -
>  /*
>   * If the page is mmap'ed into a process' page tables, then we need to make
>   * sure that it doesn't change while being written back.
> @@ -3901,224 +3447,149 @@ int cifs_file_mmap(struct file *file, struct vm_area_struct *vma)
>  }
>
>  /*
> - * Unlock a bunch of folios in the pagecache.
> + * Issue a read operation on behalf of the netfs helper functions.  We're asked
> + * to make a read of a certain size at a point in the file.  We are permitted
> + * to only read a portion of that, but as long as we read something, the netfs
> + * helper will call us again so that we can issue another read.
>   */
> -static void cifs_unlock_folios(struct address_space *mapping, pgoff_t first, pgoff_t last)
> -{
> -       struct folio *folio;
> -       XA_STATE(xas, &mapping->i_pages, first);
> -
> -       rcu_read_lock();
> -       xas_for_each(&xas, folio, last) {
> -               folio_unlock(folio);
> -       }
> -       rcu_read_unlock();
> -}
> -
> -static void cifs_readahead_complete(struct work_struct *work)
> -{
> -       struct cifs_readdata *rdata = container_of(work,
> -                                                  struct cifs_readdata, work);
> -       struct folio *folio;
> -       pgoff_t last;
> -       bool good = rdata->result == 0 || (rdata->result == -EAGAIN && rdata->got_bytes);
> -
> -       XA_STATE(xas, &rdata->mapping->i_pages, rdata->offset / PAGE_SIZE);
> -
> -#if 0
> -       if (good)
> -               cifs_readpage_to_fscache(rdata->mapping->host, page);
> -#endif
> -
> -       if (iov_iter_count(&rdata->iter) > 0)
> -               iov_iter_zero(iov_iter_count(&rdata->iter), &rdata->iter);
> -
> -       last = round_down(rdata->offset + rdata->got_bytes - 1, PAGE_SIZE);
> -
> -       xas_for_each(&xas, folio, last) {
> -               if (good) {
> -                       flush_dcache_folio(folio);
> -                       folio_mark_uptodate(folio);
> -               }
> -               folio_unlock(folio);
> -       }
> -
> -       kref_put(&rdata->refcount, cifs_readdata_release);
> -}
> -
> -static void cifs_readahead(struct readahead_control *ractl)
> +static void cifs_req_issue_op(struct netfs_read_subrequest *subreq)
>  {
> -       struct cifsFileInfo *open_file = ractl->file->private_data;
> -       struct cifs_sb_info *cifs_sb = CIFS_FILE_SB(ractl->file);
> +       struct netfs_read_request *rreq = subreq->rreq;
>         struct TCP_Server_Info *server;
> +       struct cifs_readdata *rdata;
> +       struct cifsFileInfo *open_file = rreq->netfs_priv;
> +       struct cifs_sb_info *cifs_sb = CIFS_SB(rreq->inode->i_sb);
> +       struct cifs_credits credits_on_stack, *credits = &credits_on_stack;
>         unsigned int xid;
>         pid_t pid;
>         int rc = 0;
> +       unsigned int rsize;
>
>         xid = get_xid();
>
>         if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
>                 pid = open_file->pid;
>         else
> -               pid = current->tgid;
> +               pid = current->tgid; // Ummm...  This may be a workqueue
>
>         server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses);
>
> -       cifs_dbg(FYI, "%s: file=%p mapping=%p num_pages=%u\n",
> -                __func__, ractl->file, ractl->mapping, readahead_count(ractl));
> -
> -       /*
> -        * Chop the readahead request up into rsize-sized read requests.
> -        */
> -       while (readahead_count(ractl) - ractl->_batch_count) {
> -               unsigned int i, nr_pages, rsize;
> -               struct cifs_readdata *rdata;
> -               struct cifs_credits credits_on_stack;
> -               struct cifs_credits *credits = &credits_on_stack;
> +       cifs_dbg(FYI, "%s: op=%08x[%x] mapping=%p len=%zu/%zu\n",
> +                __func__, rreq->debug_id, subreq->debug_index, rreq->mapping,
> +                subreq->transferred, subreq->len);
>
> -               if (open_file->invalidHandle) {
> +       if (open_file->invalidHandle) {
> +               do {
>                         rc = cifs_reopen_file(open_file, true);
> -                       if (rc) {
> -                               if (rc == -EAGAIN)
> -                                       continue;
> -                               break;
> -                       }
> -               }
> -
> -               rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->rsize,
> -                                                  &rsize, credits);
> +               } while (rc == -EAGAIN);
>                 if (rc)
> -                       break;
> -               nr_pages = min_t(size_t, rsize / PAGE_SIZE, readahead_count(ractl));
> -
> -               /*
> -                * Give up immediately if rsize is too small to read an entire
> -                * page. The VFS will fall back to readpage. We should never
> -                * reach this point however since we set ra_pages to 0 when the
> -                * rsize is smaller than a cache page.
> -                */
> -               if (unlikely(!nr_pages)) {
> -                       add_credits_and_wake_if(server, credits, 0);
> -                       break;
> -               }
> -
> -               rdata = cifs_readdata_alloc(cifs_readahead_complete);
> -               if (!rdata) {
> -                       /* best to give up if we're out of mem */
> -                       add_credits_and_wake_if(server, credits, 0);
> -                       break;
> -               }
> +                       goto out;
> +       }
>
> -               rdata->offset   = readahead_pos(ractl);
> -               rdata->bytes    = nr_pages * PAGE_SIZE;
> -               rdata->cfile    = cifsFileInfo_get(open_file);
> -               rdata->server   = server;
> -               rdata->mapping  = ractl->mapping;
> -               rdata->pid      = pid;
> -               rdata->credits  = credits_on_stack;
> +       rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->rsize, &rsize, credits);
> +       if (rc)
> +               goto out;
>
> -               for (i = 0; i < nr_pages; i++)
> -                       if (!readahead_folio(ractl))
> -                               BUG();
> +       rdata = cifs_readdata_alloc(NULL);
> +       if (!rdata) {
> +               add_credits_and_wake_if(server, credits, 0);
> +               rc = -ENOMEM;
> +               goto out;
> +       }
>
> -               iov_iter_xarray(&rdata->iter, READ, &rdata->mapping->i_pages,
> -                               rdata->offset, rdata->bytes);
> +       __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> +       rdata->subreq   = subreq;
> +       rdata->cfile    = cifsFileInfo_get(open_file);
> +       rdata->server   = server;
> +       rdata->offset   = subreq->start + subreq->transferred;
> +       rdata->bytes    = subreq->len   - subreq->transferred;
> +       rdata->pid      = pid;
> +       rdata->credits  = credits_on_stack;
> +       rdata->iter     = subreq->iter;
>
> -               rc = adjust_credits(server, &rdata->credits, rdata->bytes);
> -               if (!rc) {
> -                       if (rdata->cfile->invalidHandle)
> -                               rc = -EAGAIN;
> -                       else
> -                               rc = server->ops->async_readv(rdata);
> -               }
> +       rc = adjust_credits(server, &rdata->credits, rdata->bytes);
> +       if (!rc) {
> +               if (rdata->cfile->invalidHandle)
> +                       rc = -EAGAIN;
> +               else
> +                       rc = server->ops->async_readv(rdata);
> +       }
>
> -               if (rc) {
> -                       add_credits_and_wake_if(server, &rdata->credits, 0);
> -                       cifs_unlock_folios(rdata->mapping,
> -                                          rdata->offset / PAGE_SIZE,
> -                                          (rdata->offset + rdata->bytes - 1) / PAGE_SIZE);
> -                       /* Fallback to the readpage in error/reconnect cases */
> -                       kref_put(&rdata->refcount, cifs_readdata_release);
> -                       break;
> -               }
> +       if (rc) {
> +               add_credits_and_wake_if(server, &rdata->credits, 0);
> +               /* Fallback to the readpage in error/reconnect cases */
> +               kref_put(&rdata->refcount, cifs_readdata_release);
> +               goto out;
>         }
>
> +       kref_put(&rdata->refcount, cifs_readdata_release);
> +
> +out:
>         free_xid(xid);
> +       if (rc)
> +               netfs_subreq_terminated(subreq, rc, false);
> +}
> +
> +static int cifs_init_rreq(struct netfs_read_request *rreq, struct file *file)
> +{
> +       rreq->netfs_priv = file->private_data;
> +       return 0;
>  }
>
>  /*
> - * cifs_readpage_worker must be called with the page pinned
> + * Expand the size of a readahead to the size of the rsize, if at least as
> + * large as a page, allowing for the possibility that rsize is not pow-2
> + * aligned.
>   */
> -static int cifs_readpage_worker(struct file *file, struct page *page,
> -       loff_t *poffset)
> +static void cifs_expand_readahead(struct netfs_read_request *rreq)
>  {
> -       char *read_data;
> -       int rc;
> +       struct cifs_sb_info *cifs_sb = CIFS_SB(rreq->inode->i_sb);
> +       unsigned int rsize = cifs_sb->ctx->rsize;
> +       loff_t misalignment, i_size = i_size_read(rreq->inode);
>
> -       /* Is the page cached? */
> -       rc = cifs_readpage_from_fscache(file_inode(file), page);
> -       if (rc == 0)
> -               goto read_complete;
> -
> -       read_data = kmap(page);
> -       /* for reads over a certain size could initiate async read ahead */
> -
> -       rc = cifs_read(file, read_data, PAGE_SIZE, poffset);
> -
> -       if (rc < 0)
> -               goto io_error;
> -       else
> -               cifs_dbg(FYI, "Bytes read %d\n", rc);
> +       if (rsize < PAGE_SIZE)
> +               return;
>
> -       /* we do not want atime to be less than mtime, it broke some apps */
> -       file_inode(file)->i_atime = current_time(file_inode(file));
> -       if (timespec64_compare(&(file_inode(file)->i_atime), &(file_inode(file)->i_mtime)))
> -               file_inode(file)->i_atime = file_inode(file)->i_mtime;
> +       if (rsize < INT_MAX)
> +               rsize = roundup_pow_of_two(rsize);
>         else
> -               file_inode(file)->i_atime = current_time(file_inode(file));
> +               rsize = ((unsigned int)INT_MAX + 1) / 2;
>
> -       if (PAGE_SIZE > rc)
> -               memset(read_data + rc, 0, PAGE_SIZE - rc);
> -
> -       flush_dcache_page(page);
> -       SetPageUptodate(page);
> -
> -       /* send this page to the cache */
> -       cifs_readpage_to_fscache(file_inode(file), page);
> -
> -       rc = 0;
> -
> -io_error:
> -       kunmap(page);
> -       unlock_page(page);
> +       misalignment = rreq->start & (rsize - 1);
> +       if (misalignment) {
> +               rreq->start -= misalignment;
> +               rreq->len += misalignment;
> +       }
>
> -read_complete:
> -       return rc;
> +       rreq->len = round_up(rreq->len, rsize);
> +       if (rreq->start < i_size && rreq->len > i_size - rreq->start)
> +               rreq->len = i_size - rreq->start;
>  }
>
> -static int cifs_readpage(struct file *file, struct page *page)
> +static void cifs_rreq_done(struct netfs_read_request *rreq)
>  {
> -       loff_t offset = page_file_offset(page);
> -       int rc = -EACCES;
> -       unsigned int xid;
> +       struct inode *inode = rreq->inode;
>
> -       xid = get_xid();
> -
> -       if (file->private_data == NULL) {
> -               rc = -EBADF;
> -               free_xid(xid);
> -               return rc;
> -       }
> -
> -       cifs_dbg(FYI, "readpage %p at offset %d 0x%x\n",
> -                page, (int)offset, (int)offset);
> -
> -       rc = cifs_readpage_worker(file, page, &offset);
> +       /* we do not want atime to be less than mtime, it broke some apps */
> +       inode->i_atime = current_time(inode);
> +       if (timespec64_compare(&inode->i_atime, &inode->i_mtime))
> +               inode->i_atime = inode->i_mtime;
> +       else
> +               inode->i_atime = current_time(inode);
> +}
>
> -       free_xid(xid);
> -       return rc;
> +static void cifs_req_cleanup(struct address_space *mapping, void *netfs_priv)
> +{
>  }
>
> +const struct netfs_request_ops cifs_req_ops = {
> +       .init_rreq              = cifs_init_rreq,
> +       .expand_readahead       = cifs_expand_readahead,
> +       .issue_op               = cifs_req_issue_op,
> +       .done                   = cifs_rreq_done,
> +       .cleanup                = cifs_req_cleanup,
> +};
> +
>  static int is_inode_writable(struct cifsInodeInfo *cifs_inode)
>  {
>         struct cifsFileInfo *open_file;
> @@ -4168,34 +3639,20 @@ static int cifs_write_begin(struct file *file, struct address_space *mapping,
>                         loff_t pos, unsigned len, unsigned flags,
>                         struct page **pagep, void **fsdata)
>  {
> -       int oncethru = 0;
> -       pgoff_t index = pos >> PAGE_SHIFT;
> -       loff_t offset = pos & (PAGE_SIZE - 1);
> -       loff_t page_start = pos & PAGE_MASK;
> -       loff_t i_size;
> -       struct page *page;
> -       int rc = 0;
> +       struct folio *folio;
> +       int rc;
>
>         cifs_dbg(FYI, "write_begin from %lld len %d\n", (long long)pos, len);
>
> -start:
> -       page = grab_cache_page_write_begin(mapping, index, flags);
> -       if (!page) {
> -               rc = -ENOMEM;
> -               goto out;
> -       }
> -
> -       if (PageUptodate(page))
> -               goto out;
> -
> -       /*
> -        * If we write a full page it will be up to date, no need to read from
> -        * the server. If the write is short, we'll end up doing a sync write
> -        * instead.
> +       /* Prefetch area to be written into the cache if we're caching this
> +        * file.  We need to do this before we get a lock on the page in case
> +        * there's more than one writer competing for the same cache block.
>          */
> -       if (len == PAGE_SIZE)
> -               goto out;
> +       rc = netfs_write_begin(file, mapping, pos, len, flags, &folio, fsdata);
> +       if (rc < 0)
> +               return rc;
>
> +#if 0
>         /*
>          * optimize away the read when we have an oplock, and we're not
>          * expecting to use any of the data we'd be reading in. That
> @@ -4210,34 +3667,17 @@ static int cifs_write_begin(struct file *file, struct address_space *mapping,
>                                            offset + len,
>                                            PAGE_SIZE);
>                         /*
> -                        * PageChecked means that the parts of the page
> -                        * to which we're not writing are considered up
> -                        * to date. Once the data is copied to the
> -                        * page, it can be set uptodate.
> +                        * Marking a folio checked means that the parts of the
> +                        * page to which we're not writing are considered up to
> +                        * date. Once the data is copied to the page, it can be
> +                        * set uptodate.
>                          */
> -                       SetPageChecked(page);
> +                       folio_set_checked(folio);
>                         goto out;
>                 }
>         }
> -
> -       if ((file->f_flags & O_ACCMODE) != O_WRONLY && !oncethru) {
> -               /*
> -                * might as well read a page, it is fast enough. If we get
> -                * an error, we don't need to return it. cifs_write_end will
> -                * do a sync write instead since PG_uptodate isn't set.
> -                */
> -               cifs_readpage_worker(file, page, &page_start);
> -               put_page(page);
> -               oncethru = 1;
> -               goto start;
> -       } else {
> -               /* we could try using another file handle if there is one -
> -                  but how would we lock it to prevent close of that handle
> -                  racing with this read? In any case
> -                  this will be written out by write_end so is fine */
> -       }
> -out:
> -       *pagep = page;
> +#endif
> +       *pagep = folio_page(folio, (pos - folio_pos(folio)) / PAGE_SIZE);
>         return rc;
>  }
>
> @@ -4429,8 +3869,8 @@ static int cifs_set_page_dirty(struct page *page)
>  #endif
>
>  const struct address_space_operations cifs_addr_ops = {
> -       .readpage = cifs_readpage,
> -       .readahead = cifs_readahead,
> +       .readpage = netfs_readpage,
> +       .readahead = netfs_readahead,
>         .writepage = cifs_writepage,
>         .writepages = cifs_writepages,
>         .write_begin = cifs_write_begin,
> @@ -4455,7 +3895,7 @@ const struct address_space_operations cifs_addr_ops = {
>   * to leave cifs_readpages out of the address space operations.
>   */
>  const struct address_space_operations cifs_addr_ops_smallbuf = {
> -       .readpage = cifs_readpage,
> +       .readpage = netfs_readpage,
>         .writepage = cifs_writepage,
>         .writepages = cifs_writepages,
>         .write_begin = cifs_write_begin,
> diff --git a/fs/cifs/fscache.c b/fs/cifs/fscache.c
> index a7e7e5a97b7f..bb1c3a372de4 100644
> --- a/fs/cifs/fscache.c
> +++ b/fs/cifs/fscache.c
> @@ -134,34 +134,3 @@ void cifs_fscache_release_inode_cookie(struct inode *inode)
>                 cifsi->netfs_ctx.cache = NULL;
>         }
>  }
> -
> -/*
> - * Retrieve a page from FS-Cache
> - */
> -int __cifs_readpage_from_fscache(struct inode *inode, struct page *page)
> -{
> -       cifs_dbg(FYI, "%s: (fsc:%p, p:%p, i:0x%p\n",
> -                __func__, cifs_inode_cookie(inode), page, inode);
> -       return -ENOBUFS; // Needs conversion to using netfslib
> -}
> -
> -/*
> - * Retrieve a set of pages from FS-Cache
> - */
> -int __cifs_readpages_from_fscache(struct inode *inode,
> -                               struct address_space *mapping,
> -                               struct list_head *pages,
> -                               unsigned *nr_pages)
> -{
> -       cifs_dbg(FYI, "%s: (0x%p/%u/0x%p)\n",
> -                __func__, cifs_inode_cookie(inode), *nr_pages, inode);
> -       return -ENOBUFS; // Needs conversion to using netfslib
> -}
> -
> -void __cifs_readpage_to_fscache(struct inode *inode, struct page *page)
> -{
> -       cifs_dbg(FYI, "%s: (fsc: %p, p: %p, i: %p)\n",
> -                __func__, cifs_inode_cookie(inode), page, inode);
> -
> -       // Needs conversion to using netfslib
> -}
> diff --git a/fs/cifs/fscache.h b/fs/cifs/fscache.h
> index 9f6e42e85d14..fdc03cd7b881 100644
> --- a/fs/cifs/fscache.h
> +++ b/fs/cifs/fscache.h
> @@ -58,14 +58,6 @@ void cifs_fscache_fill_coherency(struct inode *inode,
>  }
>
>
> -extern int cifs_fscache_release_page(struct page *page, gfp_t gfp);
> -extern int __cifs_readpage_from_fscache(struct inode *, struct page *);
> -extern int __cifs_readpages_from_fscache(struct inode *,
> -                                        struct address_space *,
> -                                        struct list_head *,
> -                                        unsigned *);
> -extern void __cifs_readpage_to_fscache(struct inode *, struct page *);
> -
>  static inline struct fscache_cookie *cifs_inode_cookie(struct inode *inode)
>  {
>         return netfs_i_cookie(inode);
> @@ -80,33 +72,6 @@ static inline void cifs_invalidate_cache(struct inode *inode, unsigned int flags
>                            i_size_read(inode), flags);
>  }
>
> -static inline int cifs_readpage_from_fscache(struct inode *inode,
> -                                            struct page *page)
> -{
> -       if (cifs_inode_cookie(inode))
> -               return __cifs_readpage_from_fscache(inode, page);
> -
> -       return -ENOBUFS;
> -}
> -
> -static inline int cifs_readpages_from_fscache(struct inode *inode,
> -                                             struct address_space *mapping,
> -                                             struct list_head *pages,
> -                                             unsigned *nr_pages)
> -{
> -       if (cifs_inode_cookie(inode))
> -               return __cifs_readpages_from_fscache(inode, mapping, pages,
> -                                                    nr_pages);
> -       return -ENOBUFS;
> -}
> -
> -static inline void cifs_readpage_to_fscache(struct inode *inode,
> -                                           struct page *page)
> -{
> -       if (PageFsCache(page))
> -               __cifs_readpage_to_fscache(inode, page);
> -}
> -
>  #else /* CONFIG_CIFS_FSCACHE */
>  static inline
>  void cifs_fscache_fill_coherency(struct inode *inode,
> @@ -123,23 +88,6 @@ static inline void cifs_fscache_unuse_inode_cookie(struct inode *inode, bool upd
>  static inline struct fscache_cookie *cifs_inode_cookie(struct inode *inode) { return NULL; }
>  static inline void cifs_invalidate_cache(struct inode *inode, unsigned int flags) {}
>
> -static inline int
> -cifs_readpage_from_fscache(struct inode *inode, struct page *page)
> -{
> -       return -ENOBUFS;
> -}
> -
> -static inline int cifs_readpages_from_fscache(struct inode *inode,
> -                                             struct address_space *mapping,
> -                                             struct list_head *pages,
> -                                             unsigned *nr_pages)
> -{
> -       return -ENOBUFS;
> -}
> -
> -static inline void cifs_readpage_to_fscache(struct inode *inode,
> -                       struct page *page) {}
> -
>  #endif /* CONFIG_CIFS_FSCACHE */
>
>  #endif /* _CIFS_FSCACHE_H */
> diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
> index 7d8b3ceb2af3..b6a9ded9fbb2 100644
> --- a/fs/cifs/inode.c
> +++ b/fs/cifs/inode.c
> @@ -26,6 +26,19 @@
>  #include "fs_context.h"
>  #include "cifs_ioctl.h"
>
> +/*
> + * Set parameters for the netfs library
> + */
> +static void cifs_set_netfs_context(struct inode *inode)
> +{
> +       struct netfs_i_context *ctx = netfs_i_context(inode);
> +       struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
> +
> +       netfs_i_context_init(inode, &cifs_req_ops);
> +       ctx->rsize = cifs_sb->ctx->rsize;
> +       ctx->wsize = cifs_sb->ctx->wsize;
> +}
> +
>  static void cifs_set_ops(struct inode *inode)
>  {
>         struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
> @@ -209,8 +222,10 @@ cifs_fattr_to_inode(struct inode *inode, struct cifs_fattr *fattr)
>
>         if (fattr->cf_flags & CIFS_FATTR_DFS_REFERRAL)
>                 inode->i_flags |= S_AUTOMOUNT;
> -       if (inode->i_state & I_NEW)
> +       if (inode->i_state & I_NEW) {
> +               cifs_set_netfs_context(inode);
>                 cifs_set_ops(inode);
> +       }
>         return 0;
>  }
>
> diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
> index ebbea7526ee2..0d76cffb4e75 100644
> --- a/fs/cifs/smb2pdu.c
> +++ b/fs/cifs/smb2pdu.c
> @@ -23,6 +23,7 @@
>  #include <linux/uuid.h>
>  #include <linux/pagemap.h>
>  #include <linux/xattr.h>
> +#include <linux/netfs.h>
>  #include "cifsglob.h"
>  #include "cifsacl.h"
>  #include "cifsproto.h"
> @@ -4185,7 +4186,19 @@ smb2_readv_callback(struct mid_q_entry *mid)
>                                      tcon->tid, tcon->ses->Suid,
>                                      rdata->offset, rdata->got_bytes);
>
> -       queue_work(cifsiod_wq, &rdata->work);
> +       if (rdata->result == -ENODATA) {
> +               /* We may have got an EOF error because fallocate
> +                * failed to enlarge the file.
> +                */
> +               if (rdata->subreq->start < rdata->subreq->rreq->i_size)
> +                       rdata->result = 0;
> +       }
> +       if (rdata->result == 0 || rdata->result == -EAGAIN)
> +               iov_iter_advance(&rdata->subreq->iter, rdata->got_bytes);
> +       netfs_subreq_terminated(rdata->subreq,
> +                               (rdata->result == 0 || rdata->result == -EAGAIN) ?
> +                               rdata->got_bytes : rdata->result, false);
> +       kref_put(&rdata->refcount, cifs_readdata_release);
>         DeleteMidQEntry(mid);
>         add_credits(server, &credits, 0);
>  }
> diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
> index df13c9b22ca8..1fa242140dc4 100644
> --- a/fs/netfs/read_helper.c
> +++ b/fs/netfs/read_helper.c
> @@ -553,8 +553,13 @@ static void netfs_rreq_assess_dio(struct netfs_read_request *rreq)
>         list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
>                 if (subreq->error || subreq->transferred == 0)
>                         break;
> -               for (i = 0; i < subreq->bv_count; i++)
> +               for (i = 0; i < subreq->bv_count; i++) {
>                         flush_dcache_page(subreq->bv[i].bv_page);
> +                       // TODO: cifs marks pages in the destination buffer
> +                       // dirty under some circumstances after a read.  Do we
> +                       // need to do that too?
> +                       set_page_dirty(subreq->bv[i].bv_page);
> +               }
>                 transferred += subreq->transferred;
>                 if (subreq->transferred < subreq->len)
>                         break;
>
>

* Re: [RFC PATCH 3/7] cifs: Change the I/O paths to use an iterator rather than a page list
  2022-01-25 13:57 ` [RFC PATCH 3/7] cifs: Change the I/O paths to use an iterator rather than a page list David Howells
  2022-01-31  5:06   ` Rohith Surabattula
@ 2022-02-14 16:06   ` David Howells
  1 sibling, 0 replies; 20+ messages in thread
From: David Howells @ 2022-02-14 16:06 UTC (permalink / raw)
  To: Rohith Surabattula
  Cc: dhowells, smfrench, nspmangalore, jlayton, linux-cifs,
	linux-cachefs, linux-fsdevel

Rohith Surabattula <rohiths.msft@gmail.com> wrote:

> After copying the buf to the XArray iterator, "got_bytes" field is not
> updated. As a result, the read of data which is less than page size
> failed.
> Below is the patch to fix the above issue.

Okay, I've folded that in, thanks.

David


* Re: [RFC PATCH 7/7] cifs: Use netfslib to handle reads
  2022-01-25 13:59 ` [RFC PATCH 7/7] cifs: Use netfslib to handle reads David Howells
  2022-02-08  5:59   ` Rohith Surabattula
@ 2022-02-14 16:33   ` David Howells
  2022-02-28 14:14     ` Rohith Surabattula
  2022-02-28 14:28     ` David Howells
  1 sibling, 2 replies; 20+ messages in thread
From: David Howells @ 2022-02-14 16:33 UTC (permalink / raw)
  To: Rohith Surabattula
  Cc: dhowells, smfrench, nspmangalore, jlayton, linux-cifs,
	linux-cachefs, linux-fsdevel

Rohith Surabattula <rohiths.msft@gmail.com> wrote:

> I have tested netfs integration with fsc mount option enabled. But, I
> observed function "netfs_cache_prepare_read" always returns
> "NETFS_DOWNLOAD_FROM_SERVER" because cres->ops(i.e cachefiles
> operations) is not set.

I see it download from the server and write to the cache:

	# cat /proc/fs/fscache/stats 
	...
	IO     : rd=0 wr=4     <---- no reads, four writes made
	RdHelp : DR=0 RA=4 RP=0 WB=0 WBZ=0 rr=0 sr=0
	RdHelp : ZR=0 sh=0 sk=0
	RdHelp : DL=4 ds=4 df=0 di=0
	RdHelp : RD=0 rs=0 rf=0
	RdHelp : WR=4 ws=4 wf=0
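
For anyone scripting this comparison, here is a minimal C sketch of pulling
the rd/wr counters out of the "IO" line shown above.  It is only an
illustration: the sketch_ function names are hypothetical, and the line layout
is inferred solely from the output quoted in this mail, not from the fscache
source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Parse one line of the form "IO     : rd=0 wr=4".  Returns true and fills
 * in *rd and *wr only if the line matches; other lines return false.
 * The layout is an assumption taken from the quoted stats output.
 */
bool sketch_parse_fscache_io(const char *line, unsigned int *rd,
			     unsigned int *wr)
{
	assert(line != NULL && rd != NULL && wr != NULL);

	/* Whitespace in the format matches any run of blanks, including
	 * the indentation added when the output is pasted into a mail.
	 */
	return sscanf(line, " IO : rd=%u wr=%u", rd, wr) == 2;
}

/*
 * Scan a stream (for instance an fopen()'d /proc/fs/fscache/stats) for
 * the first matching IO line.
 */
bool sketch_find_fscache_io(FILE *fp, unsigned int *rd, unsigned int *wr)
{
	char line[256];

	while (fgets(line, sizeof(line), fp)) {
		if (sketch_parse_fscache_io(line, rd, wr))
			return true;
	}
	return false;
}
```

With the IO line quoted above, the parser should report rd=0 and wr=4.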

Turning on the cachefiles_vol_coherency tracepoint, I see:

     kworker/2:2-1040    [002] .....   585.499799: cachefiles_vol_coherency: V=00000003 VOL BAD cmp  B=480004
     kworker/2:2-1040    [002] .....   585.499872: cachefiles_vol_coherency: V=00000003 VOL SET ok   B=480005

every time I unmount and mount again.  One of the fields is different each
time.

Using the netfs tracepoints, I can see the download being made from the server
and then the subsequent write to the cache:

          md5sum-4689    [003] .....   887.382290: netfs_read: R=00000005 READAHEAD c=0000004e ni=86 s=0 20000
          md5sum-4689    [003] .....   887.383076: netfs_read: R=00000005 EXPANDED  c=0000004e ni=86 s=0 400000
          md5sum-4689    [003] .....   887.383252: netfs_sreq: R=00000005[0] PREP  DOWN f=01 s=0 0/400000 e=0
          md5sum-4689    [003] .....   887.383252: netfs_sreq: R=00000005[0] SUBMT DOWN f=01 s=0 0/400000 e=0
           cifsd-4687    [002] .....   887.394926: netfs_sreq: R=00000005[0] TERM  DOWN f=03 s=0 400000/400000 e=0
           cifsd-4687    [002] .....   887.394928: netfs_rreq: R=00000005 ASSESS f=22
           cifsd-4687    [002] .....   887.394928: netfs_rreq: R=00000005 UNLOCK f=22
    kworker/u8:4-776     [000] .....   887.395000: netfs_rreq: R=00000005 WRITE  f=02
    kworker/u8:4-776     [000] .....   887.395005: netfs_sreq: R=00000005[0] WRITE DOWN f=03 s=0 400000/400000 e=0
     kworker/3:2-1001    [003] .....   887.627881: netfs_sreq: R=00000005[0] WTERM DOWN f=03 s=0 400000/400000 e=0
     kworker/3:2-1001    [003] .....   887.628163: netfs_rreq: R=00000005 DONE   f=02
     kworker/3:2-1001    [003] .....   887.628165: netfs_sreq: R=00000005[0] FREE  DOWN f=03 s=0 400000/400000 e=0
    kworker/u8:4-776     [000] .....   887.628216: netfs_rreq: R=00000005 FREE   f=02
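
The READAHEAD to EXPANDED step in the trace above (the length going from 20000
to 400000, which look like hexadecimal byte counts) is consistent with the
arithmetic in cifs_expand_readahead() from the patch.  Below is a userspace
sketch of that arithmetic.  The struct, the sketch_ names and the 4096-byte
page size are assumptions for illustration; an rsize of 0x400000 is inferred
from the trace, and the file size used in the example is made up.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4KiB pages */

/* Hypothetical stand-in for the start/len fields of a read request. */
struct sketch_rreq {
	uint64_t start;
	uint64_t len;
};

/* Smallest power of two >= v; only called with 0 < v < INT_MAX. */
unsigned int sketch_roundup_pow2(unsigned int v)
{
	unsigned int p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* Mirrors the steps of cifs_expand_readahead() in the quoted patch. */
void sketch_expand_readahead(struct sketch_rreq *rreq, unsigned int rsize,
			     uint64_t i_size)
{
	uint64_t misalignment;

	assert(rreq != NULL);

	/* Leave the request alone if rsize is smaller than a page. */
	if (rsize < SKETCH_PAGE_SIZE)
		return;

	/* Allow for rsize not being a power of two. */
	if (rsize < (unsigned int)INT_MAX)
		rsize = sketch_roundup_pow2(rsize);
	else
		rsize = ((unsigned int)INT_MAX + 1) / 2;

	/* Pull the start back to an rsize boundary. */
	misalignment = rreq->start & ((uint64_t)rsize - 1);
	rreq->start -= misalignment;
	rreq->len += misalignment;

	/* Round the length up to a multiple of rsize. */
	rreq->len = (rreq->len + rsize - 1) / rsize * rsize;

	/* Do not extend past the end of the file. */
	if (rreq->start < i_size && rreq->len > i_size - rreq->start)
		rreq->len = i_size - rreq->start;
}
```

For example, start=0 and len=0x20000 with rsize=0x400000 and a (made-up) 16MiB
file comes out as start=0, len=0x400000.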

Can you mount a cifs share with "-o fsc", read a file and then look in
/proc/fs/fscache/cookies and /proc/fs/fscache/stats for me?

David


* Re: [RFC PATCH 7/7] cifs: Use netfslib to handle reads
  2022-02-14 16:33   ` David Howells
@ 2022-02-28 14:14     ` Rohith Surabattula
  2022-02-28 14:28     ` David Howells
  1 sibling, 0 replies; 20+ messages in thread
From: Rohith Surabattula @ 2022-02-28 14:14 UTC (permalink / raw)
  To: David Howells
  Cc: smfrench, nspmangalore, jlayton, linux-cifs, linux-cachefs,
	linux-fsdevel

Hi David,

Below is the trace output when mounted with the fsc option:
              vi-1631    [000] .....  2519.247539: netfs_read: R=00000006 READAHEAD c=00000000 ni=0 s=0 1000
              vi-1631    [000] .....  2519.247540: netfs_read: R=00000006 EXPANDED  c=00000000 ni=0 s=0 1000
              vi-1631    [000] .....  2519.247550: netfs_sreq: R=00000006[0] PREP  DOWN f=00 s=0 0/100000 e=0
              vi-1631    [000] .....  2519.247551: netfs_sreq: R=00000006[0] SUBMT DOWN f=00 s=0 0/100000 e=0
           cifsd-1390    [001] .....  2519.287542: netfs_sreq: R=00000006[0] TERM  DOWN f=02 s=0 100000/100000 e=0
           cifsd-1390    [001] .....  2519.287545: netfs_rreq: R=00000006 ASSESS f=20
           cifsd-1390    [001] .....  2519.287545: netfs_rreq: R=00000006 UNLOCK f=20
           cifsd-1390    [001] .....  2519.287571: netfs_rreq: R=00000006 DONE   f=00
           cifsd-1390    [001] .....  2519.287572: netfs_sreq: R=00000006[0] FREE  DOWN f=02 s=0 100000/100000 e=0
           cifsd-1390    [001] .....  2519.287573: netfs_rreq: R=00000006 FREE   f=00

Mount:
root@netfsvm:/sys/kernel/debug/tracing# sudo mount -t cifs
//netfsstg.file.core.windows.net/testshare on /mnt/testshare type cifs (rw,relatime,vers=3.0,cache=strict,username=netfsstg,uid=0,noforceuid,gid=0,noforcegid,addr=52.239.170.72,file_mode=0777,dir_mode=0777,soft,persistenthandles,nounix,serverino,mapposix,fsc,rsize=1048576,wsize=1048576,bsize=1048576,echo_interval=60,actimeo=1)

I don't see any writes to fscache.  It always downloads from the server.

root@netfsvm:/sys/kernel/debug/tracing# ps -A | grep cache
    450 ?        00:00:00 mkey_cache
   1361 ?        00:00:00 cachefilesd

root@netfsvm:/sys/kernel/debug/tracing# cat /proc/fs/fscache/stats
FS-Cache statistics
Cookies: n=29 v=1 vcol=0 voom=0
Acquire: n=29 ok=29 oom=0
LRU    : n=0 exp=0 rmv=0 drp=0 at=0
Invals : n=0
Updates: n=0 rsz=0 rsn=0
Relinqs: n=0 rtr=0 drop=0
NoSpace: nwr=0 ncr=0 cull=0
IO     : rd=0 wr=0
RdHelp : DR=0 RA=6 RP=0 WB=0 WBZ=7 rr=0 sr=0
RdHelp : ZR=0 sh=0 sk=7
RdHelp : DL=6 ds=6 df=0 di=0
RdHelp : RD=0 rs=0 rf=0
RdHelp : WR=0 ws=0 wf=0

root@netfsvm:/sys/kernel/debug/tracing# cat /proc/fs/fscache/cookies
COOKIE   VOLUME   REF ACT ACC S FL DEF
======== ======== === === === = == ================
00000002 00000001   1   0   0 - 4008 302559bec76a7924, 0a13e961000000000a13e96100000000d01f4719d01f4719
00000003 00000001   1   0   0 - 4000 0000000000640090, 37630162000000003763016200000000e8650f119c49f411
00000004 00000001   1   0   0 - 4000 00000000001800f0, 244e016200000000244e01620000000044975123c042f525
00000005 00000001   1   0   0 - 4000 00000000007000a0, ea92e96100000000ea92e96100000000acee2035acee2035
00000006 00000001   1   0   0 - 4000 00000000007000c0, ad92e96100000000ad92e96100000000407da317407da317
00000007 00000001   1   0   0 - 4000 00000000002800e0, 4aeaf361000000004aeaf3610000000078c77b0d6850dc1f
00000008 00000001   1   0   0 - 4008 0000000000140080, df92136200000000df92136200000000b8e0f30eb8e0f30e
00000009 00000001   1   0   0 - 4008 00000000001400e0, d39d136200000000d39d136200000000f4e6e51bf4e6e51b
0000000a 00000001   1   0   0 - 4008 0000000000140090, d99d136200000000d99d136200000000dcd77d28dcd77d28
0000000b 00000001   1   0   0 - 4008 0000000000540080, cdd21c6200000000cdd21c62000000009c8cd90c9c8cd90c
0000000c 00000001   1   0   0 - 4008 00000000005400c0, cdd21c6200000000cdd21c6200000000f44b440df44b440d
0000000d 00000001   1   0   0 - 4008 00000000005400a0, cdd21c6200000000cdd21c62000000005487b50f5487b50f
0000000e 00000001   1   0   0 - 4008 00000000005400e0, ebd21c6200000000ebd21c6200000000c07c1800c07c1800
0000000f 00000001   1   0   0 - 4008 0000000000540090, ebd21c6200000000ebd21c620000000094fc730094fc7300
00000010 00000001   1   0   0 - 4008 00000000005400d0, ebd21c6200000000ebd21c6200000000bcb78902bcb78902
00000011 00000001   1   0   0 - 4008 00000000005400b0, 29d31c620000000029d31c62000000002c02e8252c02e825
00000012 00000001   1   0   0 - 4008 00000000005400f0, 29d31c620000000029d31c6200000000c83fae26c83fae26
00000013 00000001   1   0   0 - 4008 0000000000540088, 29d31c620000000029d31c6200000000e4fcc328e4fcc328
00000014 00000001   1   0   0 - 4008 00000000005400c8, 3bd31c62000000003bd31c6200000000747b780b747b780b
00000015 00000001   1   0   0 - 4008 00000000005400a8, 3bd31c62000000003bd31c6200000000ecf57e0decf57e0d
00000016 00000001   1   0   0 - 4008 00000000005400e8, b0d51c6200000000b0d51c62000000002005e5092005e509
00000017 00000001   1   0   0 - 4008 0000000000540098, b0d51c6200000000b0d51c620000000034035f0a34035f0a
00000018 00000001   1   0   0 - 4008 00000000005400d8, b0d51c6200000000b0d51c62000000001cfdc00c1cfdc00c
00000019 00000001   1   0   0 - 4008 00000000005400b8, 50d61c620000000050d61c62000000004453d0384453d038
0000001a 00000001   1   0   0 - 4008 00000000005400f8,
50d61c620000000050d61c6200000000d4113b39d4113b39
0000001b 00000001   1   0   0 - 4008 0000000000540084,
51d61c620000000051d61c62000000002042020020420200
0000001c 00000001   1   0   0 - 4008 00000000005400c4,
16d71c620000000016d71c62000000009ceb0d019ceb0d01
0000001d 00000001   1   0   0 - 4008 00000000005400a4,
16d71c620000000016d71c6200000000dcae7801dcae7801
0000001e 00000001   1   0   0 - 4008 00000000005400e4,
16d71c620000000016d71c6200000000ec2af903ec2af903

I have enabled the fscache and cachefiles tracepoints listed below, but
nothing from them shows up in the trace output.
echo 1 >/sys/kernel/debug/tracing/events/fscache/fscache_access/enable
echo 1 >/sys/kernel/debug/tracing/events/fscache/fscache_active/enable
echo 1 >/sys/kernel/debug/tracing/events/cachefiles/cachefiles_coherency/enable
echo 1 >/sys/kernel/debug/tracing/events/cachefiles/cachefiles_read/enable
echo 1 >/sys/kernel/debug/tracing/events/cachefiles/cachefiles_write/enable
echo 1 >/sys/kernel/debug/tracing/events/cachefiles/cachefiles_io_error/enable
echo 1 >/sys/kernel/debug/tracing/events/cachefiles/cachefiles_vfs_error/enable
echo 1 > events/cachefiles/cachefiles_vol_coherency/enable
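
Note that the last command above uses a path relative to the tracing
directory, while the others are absolute. A small helper can enable a list
of tracepoints from one root and report any that do not exist. This is only
a sketch: the function name enable_tracepoints and the TRACEFS variable are
made up here, and the thread does not establish whether a missing
tracepoint is the reason for the empty trace output.

```shell
# Enable the named tracepoints (given as "subsystem/event") under a
# tracefs root and report any that are missing.
# TRACEFS is a hypothetical override; it defaults to the path used in
# this thread.
enable_tracepoints() {
    root=${TRACEFS:-/sys/kernel/debug/tracing}
    rc=0
    for ev in "$@"; do
        f="$root/events/$ev/enable"
        if [ -e "$f" ]; then
            echo 1 > "$f"
        else
            echo "missing: $ev" >&2
            rc=1
        fi
    done
    return $rc
}

# Example (as root):
#   enable_tracepoints fscache/fscache_access cachefiles/cachefiles_read
```

The function returns non-zero if at least one of the requested
tracepoints could not be found.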

Regards,
Rohith

On Mon, Feb 14, 2022 at 10:03 PM David Howells <dhowells@redhat.com> wrote:
>
> Rohith Surabattula <rohiths.msft@gmail.com> wrote:
>
> > I have tested netfs integration with fsc mount option enabled. But, I
> > observed function "netfs_cache_prepare_read" always returns
> > "NETFS_DOWNLOAD_FROM_SERVER" because cres->ops(i.e cachefiles
> > operations) is not set.
>
> I see it download from the server and write to the cache:
>
>         # cat /proc/fs/fscache/stats
>         ...
>         IO     : rd=0 wr=4     <---- no reads, four writes made
>         RdHelp : DR=0 RA=4 RP=0 WB=0 WBZ=0 rr=0 sr=0
>         RdHelp : ZR=0 sh=0 sk=0
>         RdHelp : DL=4 ds=4 df=0 di=0
>         RdHelp : RD=0 rs=0 rf=0
>         RdHelp : WR=4 ws=4 wf=0
>
> Turning on the cachefiles_vol_coherency tracepoint, I see:
>
>      kworker/2:2-1040    [002] .....   585.499799: cachefiles_vol_coherency: V=00000003 VOL BAD cmp  B=480004
>      kworker/2:2-1040    [002] .....   585.499872: cachefiles_vol_coherency: V=00000003 VOL SET ok   B=480005
>
> every time I unmount and mount again.  One of the fields is different each
> time.
>
> Using the netfs tracepoints, I can see the download being made from the server
> and then the subsequent write to the cache:
>
>           md5sum-4689    [003] .....   887.382290: netfs_read: R=00000005 READAHEAD c=0000004e ni=86 s=0 20000
>           md5sum-4689    [003] .....   887.383076: netfs_read: R=00000005 EXPANDED  c=0000004e ni=86 s=0 400000
>           md5sum-4689    [003] .....   887.383252: netfs_sreq: R=00000005[0] PREP  DOWN f=01 s=0 0/400000 e=0
>           md5sum-4689    [003] .....   887.383252: netfs_sreq: R=00000005[0] SUBMT DOWN f=01 s=0 0/400000 e=0
>            cifsd-4687    [002] .....   887.394926: netfs_sreq: R=00000005[0] TERM  DOWN f=03 s=0 400000/400000 e=0
>            cifsd-4687    [002] .....   887.394928: netfs_rreq: R=00000005 ASSESS f=22
>            cifsd-4687    [002] .....   887.394928: netfs_rreq: R=00000005 UNLOCK f=22
>     kworker/u8:4-776     [000] .....   887.395000: netfs_rreq: R=00000005 WRITE  f=02
>     kworker/u8:4-776     [000] .....   887.395005: netfs_sreq: R=00000005[0] WRITE DOWN f=03 s=0 400000/400000 e=0
>      kworker/3:2-1001    [003] .....   887.627881: netfs_sreq: R=00000005[0] WTERM DOWN f=03 s=0 400000/400000 e=0
>      kworker/3:2-1001    [003] .....   887.628163: netfs_rreq: R=00000005 DONE   f=02
>      kworker/3:2-1001    [003] .....   887.628165: netfs_sreq: R=00000005[0] FREE  DOWN f=03 s=0 400000/400000 e=0
>     kworker/u8:4-776     [000] .....   887.628216: netfs_rreq: R=00000005 FREE   f=02
>
> Can you mount a cifs share with "-o fsc", read a file and then look in
> /proc/fs/fscache/cookies and /proc/fs/fscache/stats for me?
>
> David
>

* Re: [RFC PATCH 7/7] cifs: Use netfslib to handle reads
  2022-02-14 16:33   ` David Howells
  2022-02-28 14:14     ` Rohith Surabattula
@ 2022-02-28 14:28     ` David Howells
  2022-03-01  7:21       ` Rohith Surabattula
  2022-03-01  9:39       ` David Howells
  1 sibling, 2 replies; 20+ messages in thread
From: David Howells @ 2022-02-28 14:28 UTC (permalink / raw)
  To: Rohith Surabattula
  Cc: dhowells, smfrench, nspmangalore, jlayton, linux-cifs,
	linux-cachefs, linux-fsdevel

Rohith Surabattula <rohiths.msft@gmail.com> wrote:

> R=00000006 READAHEAD c=00000000 ni=0 s=0 1000
>               vi-1631    [000] .....  2519.247540: netfs_read:

"c=00000000" would indicate that no fscache cookie was allocated for this
inode.
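
To pick such requests out of a longer trace, something like the sketch
below could be used. The function name netfs_reads_without_cookie is made
up for illustration. It matches on the field text only, so it accepts both
single-line trace records and the wrapped form seen in this mail.

```shell
# Print the request IDs (R=...) of trace records whose cookie field is
# c=00000000, i.e. reads issued without an fscache cookie.
# The trace text is read from stdin.
netfs_reads_without_cookie() {
    awk '{
        req = ""; nocookie = 0
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^R=/) req = $i
            if ($i == "c=00000000") nocookie = 1
        }
        if (nocookie && req != "") print req
    }' | sort -u
}

# Example:
#   netfs_reads_without_cookie < /sys/kernel/debug/tracing/trace
```

Each request ID is printed once, even if it appears on both a READAHEAD
and an EXPANDED record.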

> COOKIE   VOLUME   REF ACT ACC S FL DEF
> ======== ======== === === === = == ================
> 00000002 00000001   1   0   0 - 4008 302559bec76a7924,
> 0a13e961000000000a13e96100000000d01f4719d01f4719
> 00000003 00000001   1   0   0 - 4000 0000000000640090,
> 37630162000000003763016200000000e8650f119c49f411

But we can see some cookies have been allocated.

Can you turn on:

  echo 1 >/sys/kernel/debug/tracing/events/fscache/fscache_acquire/enable

David


* Re: [RFC PATCH 7/7] cifs: Use netfslib to handle reads
  2022-02-28 14:28     ` David Howells
@ 2022-03-01  7:21       ` Rohith Surabattula
  2022-03-01  9:39       ` David Howells
  1 sibling, 0 replies; 20+ messages in thread
From: Rohith Surabattula @ 2022-03-01  7:21 UTC (permalink / raw)
  To: David Howells
  Cc: smfrench, nspmangalore, jlayton, linux-cifs, linux-cachefs,
	linux-fsdevel

Hi David,

Below are the traces:
              vi-9189    [001] ..... 64454.731493: fscache_acquire: c=0000001f V=00000001 vr=31 vc=30
              vi-9189    [001] ..... 64454.739242: fscache_acquire: c=00000020 V=00000001 vr=32 vc=31
              vi-9189    [001] ..... 64454.783474: fscache_acquire: c=00000021 V=00000001 vr=33 vc=32
              vi-9189    [001] ..... 64454.794650: netfs_read: R=00000007 READAHEAD c=00000000 ni=0 s=0 1000
              vi-9189    [001] ..... 64454.794652: netfs_read: R=00000007 EXPANDED  c=00000000 ni=0 s=0 1000
              vi-9189    [001] ..... 64454.794661: netfs_sreq: R=00000007[0] PREP  DOWN f=00 s=0 0/100000 e=0
              vi-9189    [001] ..... 64454.794662: netfs_sreq: R=00000007[0] SUBMT DOWN f=00 s=0 0/100000 e=0
           cifsd-1390    [000] ..... 64454.817450: netfs_sreq: R=00000007[0] TERM  DOWN f=02 s=0 100000/100000 e=0
           cifsd-1390    [000] ..... 64454.817451: netfs_rreq: R=00000007 ASSESS f=20
           cifsd-1390    [000] ..... 64454.817452: netfs_rreq: R=00000007 UNLOCK f=20
           cifsd-1390    [000] ..... 64454.817464: netfs_rreq: R=00000007 DONE   f=00
           cifsd-1390    [000] ..... 64454.817464: netfs_sreq: R=00000007[0] FREE  DOWN f=02 s=0 100000/100000 e=0
           cifsd-1390    [000] ..... 64454.817465: netfs_rreq: R=00000007 FREE   f=00

Regards,
Rohith

On Mon, Feb 28, 2022 at 7:58 PM David Howells <dhowells@redhat.com> wrote:
>
> Rohith Surabattula <rohiths.msft@gmail.com> wrote:
>
> > R=00000006 READAHEAD c=00000000 ni=0 s=0 1000
> >               vi-1631    [000] .....  2519.247540: netfs_read:
>
> "c=00000000" would indicate that no fscache cookie was allocated for this
> inode.
>
> > COOKIE   VOLUME   REF ACT ACC S FL DEF
> > ======== ======== === === === = == ================
> > 00000002 00000001   1   0   0 - 4008 302559bec76a7924,
> > 0a13e961000000000a13e96100000000d01f4719d01f4719
> > 00000003 00000001   1   0   0 - 4000 0000000000640090,
> > 37630162000000003763016200000000e8650f119c49f411
>
> But we can see some cookies have been allocated.
>
> Can you turn on:
>
>   echo 1 >/sys/kernel/debug/tracing/events/fscache/fscache_acquire/enable
>
> David
>

* Re: [RFC PATCH 7/7] cifs: Use netfslib to handle reads
  2022-02-28 14:28     ` David Howells
  2022-03-01  7:21       ` Rohith Surabattula
@ 2022-03-01  9:39       ` David Howells
  2022-03-02 13:51         ` Rohith Surabattula
  1 sibling, 1 reply; 20+ messages in thread
From: David Howells @ 2022-03-01  9:39 UTC (permalink / raw)
  To: Rohith Surabattula
  Cc: dhowells, smfrench, nspmangalore, jlayton, linux-cifs,
	linux-cachefs, linux-fsdevel

Btw, do you have any changes on top of my cifs-experimental branch?

Also, what commands are you running to test it?  I see you're using 'vi' - is
it possible that vi is opening the file O_RDWR?
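
One way to check the open flags, assuming strace is available on the test
machine and that the program opens the file via openat(), is to record the
opens and filter them for write access. The function name writable_opens
and the file paths in the usage comment are placeholders.

```shell
# Filter strace output (on stdin) for opens of the given path that ask
# for write access (O_RDWR or O_WRONLY).
#
# Usage sketch (FILE and the output path are placeholders):
#   strace -f -e trace=openat -o /tmp/vi.strace vi /mnt/testshare/FILE
#   writable_opens /mnt/testshare/FILE < /tmp/vi.strace
writable_opens() {
    grep -F "\"$1\"" | grep -E 'O_(RDWR|WRONLY)'
}
```

The path is matched with its surrounding quotes, so opens of other files
in the same directory (an editor's swap file, for example) are not
reported.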

David


* Re: [RFC PATCH 7/7] cifs: Use netfslib to handle reads
  2022-03-01  9:39       ` David Howells
@ 2022-03-02 13:51         ` Rohith Surabattula
  0 siblings, 0 replies; 20+ messages in thread
From: Rohith Surabattula @ 2022-03-02 13:51 UTC (permalink / raw)
  To: David Howells
  Cc: smfrench, nspmangalore, jlayton, linux-cifs, linux-cachefs,
	linux-fsdevel

No, I don't have any private changes on top of your cifs-experimental branch.

Below is the last commit:
commit cf302ba2d441582a060a0ea1aa4af47f09b24f57 (HEAD -> cifs-experimental, origin/cifs-experimental)
Author: David Howells <dhowells@redhat.com>
Date:   Tue Nov 17 15:56:59 2020 +0000

    cifs: Use netfslib to handle reads

Yes, I used vi. I have tried with md5sum as well.

Regards,
Rohith

On Tue, Mar 1, 2022 at 3:09 PM David Howells <dhowells@redhat.com> wrote:
>
> Btw, do you have any changes on top of my cifs-experimental branch?
>
> Also, what commands are you running to test it?  I see you're using 'vi' - is
> it possible that vi is opening the file O_RDWR?
>
> David
>

Thread overview: 20+ messages
2022-01-25 13:57 [RFC][RFC PATCH 0/7] cifs: In-progress conversion to use iov_iters and netfslib David Howells
2022-01-25 13:57 ` [RFC PATCH 1/7] cifs: Transition from ->readpages() to ->readahead() David Howells
2022-01-25 14:20   ` Matthew Wilcox
2022-01-25 14:57   ` David Howells
2022-01-25 13:57 ` [RFC PATCH 2/7] cifs: Miscellaneous bits David Howells
2022-01-25 13:57 ` [RFC PATCH 3/7] cifs: Change the I/O paths to use an iterator rather than a page list David Howells
2022-01-31  5:06   ` Rohith Surabattula
2022-01-31  5:48     ` Shyam Prasad N
2022-02-14 16:06   ` David Howells
2022-01-25 13:58 ` [RFC PATCH 4/7] cifs: Make cifs_writepages() hand an iterator down David Howells
2022-01-25 13:58 ` [RFC PATCH 5/7] cifs: Make cifs_readahead() pass " David Howells
2022-01-25 13:58 ` [RFC PATCH 6/7] cifs: Get direct I/O and unbuffered I/O working with iterators David Howells
2022-01-25 13:59 ` [RFC PATCH 7/7] cifs: Use netfslib to handle reads David Howells
2022-02-08  5:59   ` Rohith Surabattula
2022-02-14 16:33   ` David Howells
2022-02-28 14:14     ` Rohith Surabattula
2022-02-28 14:28     ` David Howells
2022-03-01  7:21       ` Rohith Surabattula
2022-03-01  9:39       ` David Howells
2022-03-02 13:51         ` Rohith Surabattula