* [PATCH v17 00/14] splice, block: Use page pinning and kill ITER_PIPE
@ 2023-03-08 16:52 David Howells
  2023-03-08 16:52 ` [PATCH v17 01/14] splice: Clean up direct_splice_read() a bit David Howells
                   ` (13 more replies)
  0 siblings, 14 replies; 33+ messages in thread
From: David Howells @ 2023-03-08 16:52 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Linus Torvalds, linux-fsdevel, linux-block,
	linux-kernel, linux-mm

Hi Jens, Al, Christoph,

The first half of this patchset kills off ITER_PIPE to avoid a race between
truncate, iov_iter_revert() on the pipe and an as-yet incomplete DMA to a
bio with unpinned/unref'ed pages from an O_DIRECT splice read.  This causes
memory corruption[2].  Instead, we use filemap_splice_read(), which invokes
the buffered file reading code and splices from the pagecache into the
pipe; direct_splice_read(), which bulk-allocates a buffer, reads into it
and then pushes the filled pages into the pipe; or handle it in
filesystem-specific code.

 (1) Simplify the calculations for the number of pages to be reclaimed in
     direct_splice_read().

 (2) Turn do_splice_to() into a helper so that it can be used by overlayfs
     and coda to perform the checks on the lower fs.

 (3) Provide shmem with its own splice_read to handle non-existent pages
     in the pagecache.  We don't want a ->read_folio() as we don't want to
     populate holes, but filemap_get_pages() requires it.

 (4) Provide overlayfs with its own splice_read to call down to a lower
     layer as overlayfs doesn't provide ->read_folio().

 (5) Provide coda with its own splice_read to call down to a lower layer as
     coda doesn't provide ->read_folio().

 (6) Direct ->splice_read to direct_splice_read() in tty, procfs, kernfs
     and random files as they just copy to the output buffer and don't
     splice pages.

 (7) Change generic_file_splice_read() to just switch between
     filemap_splice_read() and direct_splice_read() rather than using
     ITER_PIPE.

 (8) Remove ITER_PIPE and its paraphernalia as generic_file_splice_read()
     was the only user.

The second half of the patchset rolls page-pinning out to the bio struct
and the block layer, using iov_iter_extract_pages() to get pages and noting
with BIO_PAGE_PINNED if the data pages attached to a bio are pinned.  If
the data pages come from a non-user-backed iterator, then the pages are
left unpinned and unref'd, relying on whoever set up the I/O to do the
retaining.

 (9) Don't hold a ref on ZERO_PAGE in iomap_dio_zero().

(10) Fix bio_flagged() so that it doesn't prevent a gcc optimisation.

(11) Make the bio struct carry a pair of flags to indicate the cleanup
     mode.  BIO_NO_PAGE_REF is replaced with BIO_PAGE_REFFED (indicating
     FOLL_GET was used) and BIO_PAGE_PINNED (indicating FOLL_PIN was used)
     is added.

     BIO_PAGE_REFFED will go away, but at the moment fs/direct-io.c sets it
     and this series does not fully address that file.

(12) Add a function, bio_release_page(), to release a page appropriately to
     the cleanup mode indicated by the BIO_PAGE_* flags.

(13) Make bio_iov_iter_get_pages() use iov_iter_extract_pages() to retain
     the pages appropriately and clean them up later.

(14) Make bio_map_user_iov() also use iov_iter_extract_pages().
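
As a rough illustration of the cleanup-mode dispatch described in (11) and
(12), the following userspace sketch models a page release keyed off two
flags.  It is not the kernel code: the MODEL_* names, the flag values and
struct model_page are hypothetical stand-ins, and the counters only imitate
what unpin_user_page() and put_page() would undo.

```c
#include <assert.h>

/* Hypothetical stand-ins for the bio flags named in (11); values are arbitrary. */
enum {
	MODEL_PAGE_PINNED = 1u << 0,	/* pages were retained with FOLL_PIN */
	MODEL_PAGE_REFFED = 1u << 1,	/* pages were retained with FOLL_GET */
};

struct model_page {
	int pincount;	/* models a pin held on the page */
	int refcount;	/* models an ordinary page reference */
};

/* Models the dispatch in (12): undo whatever was used to retain the page. */
static void model_release_page(unsigned int bio_flags, struct model_page *page)
{
	/* Assumption of this model: at most one cleanup mode is set per bio. */
	assert(!((bio_flags & MODEL_PAGE_PINNED) &&
		 (bio_flags & MODEL_PAGE_REFFED)));

	if (bio_flags & MODEL_PAGE_PINNED)
		page->pincount--;	/* stands in for unpin_user_page() */
	else if (bio_flags & MODEL_PAGE_REFFED)
		page->refcount--;	/* stands in for put_page() */
	/* Neither flag set: whoever set up the I/O retains the page. */
}
```

With neither flag set the model does nothing, which corresponds to the
non-user-backed iterator case above where the submitter keeps the pages alive.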

I've pushed the patches here also:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=iov-extract

David

Changes:
========
ver #17)
 - Rename do_splice_to() to vfs_splice_read() and export it so that it can
   be a helper and make overlayfs and coda use it, allowing duplicate
   checks to be removed.

ver #16)
 - The filemap_get_pages() changes are now upstream.
 - filemap_splice_read() and direct_splice_read() are now upstream.
 - iov_iter_extract_pages() is now upstream.

ver #15)
 - Fixed up some errors in overlayfs_splice_read().

ver #14)
 - Some changes to generic_file_buffered_splice_read():
   - Rename to filemap_splice_read() and move to mm/filemap.c.
   - Create a helper, pipe_head_buf().
   - Use init_sync_kiocb().
 - Some changes to generic_file_direct_splice_read():
   - Use alloc_pages_bulk_array() rather than alloc_pages_bulk_list().
   - Use release_pages() instead of __free_page() in a loop.
   - Rename to direct_splice_read().
 - Rearrange the patches to implement filemap_splice_read() and
   direct_splice_read() separately to changing generic_file_splice_read().
 - Don't call generic_file_splice_read() when there isn't a ->read_folio().
 - Insert patches to fix read_folio-less cases:
   - Make tty, procfs, kernfs and (u)random use direct_splice_read().
   - Make overlayfs and coda call down to a lower layer.
   - Give shmem its own splice-read that doesn't insert missing pages.
 - Fixed a min() with mixed type args on some arches.

ver #13)
 - Only use allocation in advance and ITER_BVEC for DIO read-splice.
 - Make buffered read-splice get pages directly from the pagecache.
 - Alter filemap_get_pages() & co. so that it doesn't need an iterator.

ver #12)
 - Added the missing __bitwise on the iov_iter_extraction_t typedef.
 - Rebased on -rc7.
 - Don't specify FOLL_PIN to pin_user_pages_fast().
 - Inserted patch at front to fix race between DIO read and truncation that
   caused memory corruption when iov_iter_revert() got called on an
   ITER_PIPE iterator[2].
 - Inserted a patch after that to remove the now-unused ITER_PIPE and its
   helper functions.
 - Removed the ITER_PIPE bits from iov_iter_extract_pages().

ver #11)
 - Fix iov_iter_extract_kvec_pages() to include the offset into the page in
   the returned starting offset.
 - Use __bitwise for the extraction flags.

ver #10)
 - Fix use of i->kvec in iov_iter_extract_bvec_pages() to be i->bvec.
 - Drop bio_set_cleanup_mode(), open coding it instead.

ver #9)
 - It's now not permitted to use FOLL_PIN outside of mm/, so:
 - Change iov_iter_extract_mode() into iov_iter_extract_will_pin() and
   return true/false instead of FOLL_PIN/0.
 - Drop folio_put_unpin() and page_put_unpin() and instead call
   unpin_user_page() (and put_page()) directly as necessary.
 - Make __bio_release_pages() call bio_release_page() instead of
   unpin_user_page() as there's no BIO_* -> FOLL_* translation to do.
 - Drop the FOLL_* renumbering patch.
 - Change extract_flags to extraction_flags.

ver #8)
 - Import Christoph Hellwig's changes.
   - Split the conversion-to-extraction patch.
   - Drop the extract_flags arg from iov_iter_extract_mode().
   - Don't default bios to BIO_PAGE_REFFED, but set explicitly.
 - Switch FOLL_PIN and FOLL_GET when renumbering so PIN is at bit 0.
 - Switch BIO_PAGE_PINNED and BIO_PAGE_REFFED so PINNED is at bit 0.
 - We should always be using FOLL_PIN (not FOLL_GET) for DIO, so adjust the
   patches for that.

ver #7)
 - For now, drop the parts to pass the I/O direction to iov_iter_*pages*()
   as it turned out to be a lot more complicated, with places not setting
   IOCB_WRITE when they should, for example.
 - Drop all the patches that changed things other than the block layer's
   bio handling.  The netfslib and cifs changes can go into a separate
   patchset.
 - Add support for extracting pages from KVEC-type iterators.
 - When extracting from BVEC/KVEC, skip over empty vecs at the front.

ver #6)
 - Fix write() syscall and co. not setting IOCB_WRITE.
 - Added iocb_is_read() and iocb_is_write() to check IOCB_WRITE.
 - Use op_is_write() in bio_copy_user_iov().
 - Drop the iterator direction checks from smbd_recv().
 - Define FOLL_SOURCE_BUF and FOLL_DEST_BUF and pass them in as part of
   gup_flags to iov_iter_get/extract_pages*().
 - Replace iov_iter_get_pages*2() with iov_iter_get_pages*() and remove.
 - Add back the function to indicate the cleanup mode.
 - Drop the cleanup_mode return arg to iov_iter_extract_pages().
 - Provide a helper to clean up a page.
 - Renumbered FOLL_GET and FOLL_PIN and made BIO_PAGE_REFFED/PINNED have
   the same numerical values, enforced with an assertion.
 - Converted AF_ALG, SCSI vhost, generic DIO, FUSE, splice to pipe, 9P and
   NFS.
 - Added in the patches to make CIFS do top-to-bottom iterators and use
   various of the added extraction functions.
 - Added a pair of work-in-progress patches to make sk_buff fragments store
   FOLL_GET and FOLL_PIN.

ver #5)
 - Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED and split into own patch.
 - Transcribe FOLL_GET/PIN into BIO_PAGE_REFFED/PINNED flags.
 - Add patch to allow bio_flagged() to be combined by gcc.

ver #4)
 - Drop the patch to move the FOLL_* flags to linux/mm_types.h as they're
   no longer referenced by linux/uio.h.
 - Add ITER_SOURCE/DEST cleanup patches.
 - Make iov_iter/netfslib iter extraction patches use ITER_SOURCE/DEST.
 - Allow additional gup_flags to be passed into iov_iter_extract_pages().
 - Add struct bio patch.

ver #3)
 - Switch to using EXPORT_SYMBOL_GPL to prevent indirect 3rd-party access
   to get/pin_user_pages_fast()[1].

ver #2)
 - Rolled the extraction cleanup mode query function into the extraction
   function, returning the indication through the argument list.
 - Fixed patch 4 (extract to scatterlist) to actually use the new
   extraction API.

Link: https://lore.kernel.org/r/Y3zFzdWnWlEJ8X8/@infradead.org/ [1]
Link: https://lore.kernel.org/r/000000000000b0b3c005f3a09383@google.com/ [2]
Link: https://lore.kernel.org/r/166697254399.61150.1256557652599252121.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166722777223.2555743.162508599131141451.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166732024173.3186319.18204305072070871546.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166869687556.3723671.10061142538708346995.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166920902005.1461876.2786264600108839814.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/166997419665.9475.15014699817597102032.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/167305160937.1521586.133299343565358971.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/167344725490.2425628.13771289553670112965.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk/ # v6
Link: https://lore.kernel.org/r/20230120175556.3556978-1-dhowells@redhat.com/ # v7
Link: https://lore.kernel.org/r/20230123173007.325544-1-dhowells@redhat.com/ # v8
Link: https://lore.kernel.org/r/20230124170108.1070389-1-dhowells@redhat.com/ # v9
Link: https://lore.kernel.org/r/20230125210657.2335748-1-dhowells@redhat.com/ # v10
Link: https://lore.kernel.org/r/20230126141626.2809643-1-dhowells@redhat.com/ # v11
Link: https://lore.kernel.org/r/20230207171305.3716974-1-dhowells@redhat.com/ # v12
Link: https://lore.kernel.org/r/20230209102954.528942-1-dhowells@redhat.com/ # v13
Link: https://lore.kernel.org/r/20230214171330.2722188-1-dhowells@redhat.com/ # v14
Link: https://lore.kernel.org/r/20230308143754.1976726-1-dhowells@redhat.com/ # v16

Additional patches that got folded in:

Link: https://lore.kernel.org/r/20230213134619.2198965-1-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20230213153301.2338806-1-dhowells@redhat.com/ # v2
Link: https://lore.kernel.org/r/20230214083710.2547248-1-dhowells@redhat.com/ # v3

Christoph Hellwig (1):
  block: Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED with inverted
    logic

David Howells (13):
  splice: Clean up direct_splice_read() a bit
  splice: Make do_splice_to() generic and export it
  shmem: Implement splice-read
  overlayfs: Implement splice-read
  coda: Implement splice-read
  tty, proc, kernfs, random: Use direct_splice_read()
  splice: Do splice read from a file without using ITER_PIPE
  iov_iter: Kill ITER_PIPE
  iomap: Don't get an reference on ZERO_PAGE for direct I/O block
    zeroing
  block: Fix bio_flagged() so that gcc can better optimise it
  block: Add BIO_PAGE_PINNED and associated infrastructure
  block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages
  block: convert bio_map_user_iov to use iov_iter_extract_pages

 block/bio.c               |  29 +--
 block/blk-map.c           |  22 +-
 block/blk.h               |  12 ++
 drivers/char/random.c     |   4 +-
 drivers/tty/tty_io.c      |   4 +-
 fs/cifs/cifsfs.c          |   8 +-
 fs/cifs/cifsfs.h          |   3 -
 fs/cifs/file.c            |  16 --
 fs/coda/file.c            |  29 ++-
 fs/direct-io.c            |   2 +
 fs/iomap/direct-io.c      |   1 -
 fs/kernfs/file.c          |   2 +-
 fs/overlayfs/file.c       |  23 +-
 fs/proc/inode.c           |   4 +-
 fs/proc/proc_sysctl.c     |   2 +-
 fs/proc_namespace.c       |   6 +-
 fs/splice.c               |  76 +++----
 include/linux/bio.h       |   5 +-
 include/linux/blk_types.h |   3 +-
 include/linux/splice.h    |   3 +
 include/linux/uio.h       |  14 --
 lib/iov_iter.c            | 429 +-------------------------------------
 mm/filemap.c              |   4 +-
 mm/shmem.c                | 124 ++++++++++-
 24 files changed, 274 insertions(+), 551 deletions(-)



* [PATCH v17 01/14] splice: Clean up direct_splice_read() a bit
  2023-03-08 16:52 [PATCH v17 00/14] splice, block: Use page pinning and kill ITER_PIPE David Howells
@ 2023-03-08 16:52 ` David Howells
  2023-03-14 17:30   ` Christoph Hellwig
  2023-03-08 16:52 ` [PATCH v17 02/14] splice: Make do_splice_to() generic and export it David Howells
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: David Howells @ 2023-03-08 16:52 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Linus Torvalds, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Christoph Hellwig, John Hubbard

Do a couple of cleanups to direct_splice_read():

 (1) Cast to struct page **, not void *.

 (2) Simplify the calculation of the number of pages to keep/reclaim in
     direct_splice_read().
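
The arithmetic behind (2) can be sketched in userspace as below.  This is
only an illustration of the keep/release split, assuming a 4096-byte page;
the MODEL_* macros and the two helper names are hypothetical and do not
appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for the illustration */
#define MODEL_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Pages that hold data after a read returning 'ret' bytes (<= 0 keeps none). */
static size_t pages_to_keep(long ret)
{
	return ret > 0 ? MODEL_DIV_ROUND_UP((size_t)ret, MODEL_PAGE_SIZE) : 0;
}

/* Pages in the bulk allocation that the read never touched. */
static size_t pages_to_release(size_t npages, long ret)
{
	size_t keep = pages_to_keep(ret);

	assert(keep <= npages);	/* a read cannot fill more than was allocated */
	return npages - keep;
}
```

A partially filled final page counts as kept, so a read of 4097 bytes into 16
pages keeps 2 pages and releases 14.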

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: David Hildenbrand <david@redhat.com>
cc: John Hubbard <jhubbard@nvidia.com>
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 fs/splice.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 2e76dbb81a8f..abd21a455a2b 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -295,7 +295,7 @@ ssize_t direct_splice_read(struct file *in, loff_t *ppos,
 	struct kiocb kiocb;
 	struct page **pages;
 	ssize_t ret;
-	size_t used, npages, chunk, remain, reclaim;
+	size_t used, npages, chunk, remain, keep = 0;
 	int i;
 
 	/* Work out how much data we can actually add into the pipe */
@@ -309,7 +309,7 @@ ssize_t direct_splice_read(struct file *in, loff_t *ppos,
 	if (!bv)
 		return -ENOMEM;
 
-	pages = (void *)(bv + npages);
+	pages = (struct page **)(bv + npages);
 	npages = alloc_pages_bulk_array(GFP_USER, npages, pages);
 	if (!npages) {
 		kfree(bv);
@@ -332,11 +332,8 @@ ssize_t direct_splice_read(struct file *in, loff_t *ppos,
 	kiocb.ki_pos = *ppos;
 	ret = call_read_iter(in, &kiocb, &to);
 
-	reclaim = npages * PAGE_SIZE;
-	remain = 0;
 	if (ret > 0) {
-		reclaim -= ret;
-		remain = ret;
+		keep = DIV_ROUND_UP(ret, PAGE_SIZE);
 		*ppos = kiocb.ki_pos;
 		file_accessed(in);
 	} else if (ret < 0) {
@@ -349,14 +346,12 @@ ssize_t direct_splice_read(struct file *in, loff_t *ppos,
 	}
 
 	/* Free any pages that didn't get touched at all. */
-	reclaim /= PAGE_SIZE;
-	if (reclaim) {
-		npages -= reclaim;
-		release_pages(pages + npages, reclaim);
-	}
+	if (keep < npages)
+		release_pages(pages + keep, npages - keep);
 
 	/* Push the remaining pages into the pipe. */
-	for (i = 0; i < npages; i++) {
+	remain = ret;
+	for (i = 0; i < keep; i++) {
 		struct pipe_buffer *buf = pipe_head_buf(pipe);
 
 		chunk = min_t(size_t, remain, PAGE_SIZE);



* [PATCH v17 02/14] splice: Make do_splice_to() generic and export it
  2023-03-08 16:52 [PATCH v17 00/14] splice, block: Use page pinning and kill ITER_PIPE David Howells
  2023-03-08 16:52 ` [PATCH v17 01/14] splice: Clean up direct_splice_read() a bit David Howells
@ 2023-03-08 16:52 ` David Howells
  2023-03-14 17:31   ` Christoph Hellwig
                     ` (2 more replies)
  2023-03-08 16:52 ` [PATCH v17 03/14] shmem: Implement splice-read David Howells
                   ` (11 subsequent siblings)
  13 siblings, 3 replies; 33+ messages in thread
From: David Howells @ 2023-03-08 16:52 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Linus Torvalds, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Miklos Szeredi, Christoph Hellwig,
	John Hubbard, linux-unionfs

Rename do_splice_to() to vfs_splice_read() and export it so that it can be
used as a helper when calling down to a lower layer filesystem as it
performs all the necessary checks[1].

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Miklos Szeredi <miklos@szeredi.hu>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: John Hubbard <jhubbard@nvidia.com>
cc: David Hildenbrand <david@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-unionfs@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/CAJfpeguGksS3sCigmRi9hJdUec8qtM9f+_9jC1rJhsXT+dV01w@mail.gmail.com/ [1]
---
 fs/splice.c            | 27 ++++++++++++++++++++-------
 include/linux/splice.h |  3 +++
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index abd21a455a2b..90ccd3666dca 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -851,12 +851,24 @@ static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
 	return out->f_op->splice_write(pipe, out, ppos, len, flags);
 }
 
-/*
- * Attempt to initiate a splice from a file to a pipe.
+/**
+ * vfs_splice_read - Read data from a file and splice it into a pipe
+ * @in:		File to splice from
+ * @ppos:	Input file offset
+ * @pipe:	Pipe to splice to
+ * @len:	Number of bytes to splice
+ * @flags:	Splice modifier flags (SPLICE_F_*)
+ *
+ * Splice the requested amount of data from the input file to the pipe.  This
+ * is synchronous as the caller must hold the pipe lock across the entire
+ * operation.
+ *
+ * If successful, it returns the amount of data spliced, 0 if it hit the EOF or
+ * a hole and a negative error code otherwise.
  */
-static long do_splice_to(struct file *in, loff_t *ppos,
-			 struct pipe_inode_info *pipe, size_t len,
-			 unsigned int flags)
+long vfs_splice_read(struct file *in, loff_t *ppos,
+		     struct pipe_inode_info *pipe, size_t len,
+		     unsigned int flags)
 {
 	unsigned int p_space;
 	int ret;
@@ -879,6 +891,7 @@ static long do_splice_to(struct file *in, loff_t *ppos,
 		return warn_unsupported(in, "read");
 	return in->f_op->splice_read(in, ppos, pipe, len, flags);
 }
+EXPORT_SYMBOL_GPL(vfs_splice_read);
 
 /**
  * splice_direct_to_actor - splices data directly between two non-pipes
@@ -949,7 +962,7 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
 		size_t read_len;
 		loff_t pos = sd->pos, prev_pos = pos;
 
-		ret = do_splice_to(in, &pos, pipe, len, flags);
+		ret = vfs_splice_read(in, &pos, pipe, len, flags);
 		if (unlikely(ret <= 0))
 			goto out_release;
 
@@ -1097,7 +1110,7 @@ long splice_file_to_pipe(struct file *in,
 	pipe_lock(opipe);
 	ret = wait_for_space(opipe, flags);
 	if (!ret)
-		ret = do_splice_to(in, offset, opipe, len, flags);
+		ret = vfs_splice_read(in, offset, opipe, len, flags);
 	pipe_unlock(opipe);
 	if (ret > 0)
 		wakeup_pipe_readers(opipe);
diff --git a/include/linux/splice.h b/include/linux/splice.h
index a55179fd60fc..8f052c3dae95 100644
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -76,6 +76,9 @@ extern ssize_t splice_to_pipe(struct pipe_inode_info *,
 			      struct splice_pipe_desc *);
 extern ssize_t add_to_pipe(struct pipe_inode_info *,
 			      struct pipe_buffer *);
+long vfs_splice_read(struct file *in, loff_t *ppos,
+		     struct pipe_inode_info *pipe, size_t len,
+		     unsigned int flags);
 extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *,
 				      splice_direct_actor *);
 extern long do_splice(struct file *in, loff_t *off_in,



* [PATCH v17 03/14] shmem: Implement splice-read
  2023-03-08 16:52 [PATCH v17 00/14] splice, block: Use page pinning and kill ITER_PIPE David Howells
  2023-03-08 16:52 ` [PATCH v17 01/14] splice: Clean up direct_splice_read() a bit David Howells
  2023-03-08 16:52 ` [PATCH v17 02/14] splice: Make do_splice_to() generic and export it David Howells
@ 2023-03-08 16:52 ` David Howells
  2023-03-08 22:39   ` Linus Torvalds
  2023-03-08 23:42   ` David Howells
  2023-03-08 16:52 ` [PATCH v17 04/14] overlayfs: " David Howells
                   ` (10 subsequent siblings)
  13 siblings, 2 replies; 33+ messages in thread
From: David Howells @ 2023-03-08 16:52 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Linus Torvalds, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Daniel Golle, Guenter Roeck,
	Christoph Hellwig, John Hubbard, Hugh Dickins

The new filemap_splice_read() has an implicit expectation via
filemap_get_pages() that ->read_folio() exists if ->readahead() doesn't
fully populate the pagecache of the file it is reading from[1], potentially
leading to a jump to NULL if this doesn't exist.  shmem, however (and by
extension tmpfs, ramfs and rootfs), doesn't have ->read_folio().

Work around this by equipping shmem with its own splice-read
implementation, based on filemap_splice_read(), but able to paste in
zero_page when there's a page missing.
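
When a page is missing, the piece of the zero page that gets pasted in is
limited to what remains of the current page.  A userspace sketch of that
clamping, assuming a 4096-byte page, follows; struct zero_span and
zero_span_for() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for the illustration */

struct zero_span {
	size_t offset;	/* offset of the span within the page */
	size_t len;	/* number of zero bytes to splice */
};

/* Clamp a request at file position 'fpos' so it stays within one page. */
static struct zero_span zero_span_for(uint64_t fpos, size_t size)
{
	struct zero_span span;
	size_t room;

	span.offset = (size_t)(fpos & (MODEL_PAGE_SIZE - 1));
	room = MODEL_PAGE_SIZE - span.offset;	/* bytes left in this page */
	span.len = size < room ? size : room;

	assert(span.offset + span.len <= MODEL_PAGE_SIZE);
	return span;
}
```

For example, a 10000-byte request at file position 4196 yields an offset of
100 and a length of 3996, so the caller has to loop to cover the rest.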

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Daniel Golle <daniel@makrotopia.org>
cc: Guenter Roeck <groeck7@gmail.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: John Hubbard <jhubbard@nvidia.com>
cc: David Hildenbrand <david@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Hugh Dickins <hughd@google.com>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/Y+pdHFFTk1TTEBsO@makrotopia.org/ [1]
---
 mm/shmem.c | 124 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 123 insertions(+), 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 448f393d8ab2..3cbec1d56112 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2719,6 +2719,128 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	return retval ? retval : error;
 }
 
+static bool zero_pipe_buf_try_steal(struct pipe_inode_info *pipe,
+				    struct pipe_buffer *buf)
+{
+	return false;
+}
+
+static const struct pipe_buf_operations zero_pipe_buf_ops = {
+	.release	= generic_pipe_buf_release,
+	.try_steal	= zero_pipe_buf_try_steal,
+	.get		= generic_pipe_buf_get,
+};
+
+static size_t splice_zeropage_into_pipe(struct pipe_inode_info *pipe,
+					loff_t fpos, size_t size)
+{
+	size_t offset = fpos & ~PAGE_MASK;
+
+	size = min_t(size_t, size, PAGE_SIZE - offset);
+
+	if (!pipe_full(pipe->head, pipe->tail, pipe->max_usage)) {
+		struct pipe_buffer *buf = pipe_head_buf(pipe);
+
+		*buf = (struct pipe_buffer) {
+			.ops	= &zero_pipe_buf_ops,
+			.page	= ZERO_PAGE(0),
+			.offset	= offset,
+			.len	= size,
+		};
+		get_page(buf->page);
+		pipe->head++;
+	}
+
+	return size;
+}
+
+static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
+				      struct pipe_inode_info *pipe,
+				      size_t len, unsigned int flags)
+{
+	struct inode *inode = file_inode(in);
+	struct address_space *mapping = inode->i_mapping;
+	struct folio *folio = NULL;
+	size_t total_spliced = 0, used, npages, n, part;
+	loff_t isize;
+	int error = 0;
+
+	/* Work out how much data we can actually add into the pipe */
+	used = pipe_occupancy(pipe->head, pipe->tail);
+	npages = max_t(ssize_t, pipe->max_usage - used, 0);
+	len = min_t(size_t, len, npages * PAGE_SIZE);
+
+	do {
+		if (*ppos >= i_size_read(inode))
+			break;
+
+		error = shmem_get_folio(inode, *ppos / PAGE_SIZE, &folio, SGP_READ);
+		if (error) {
+			if (error == -EINVAL)
+				error = 0;
+			break;
+		}
+		if (folio) {
+			folio_unlock(folio);
+
+			if (folio_test_hwpoison(folio)) {
+				error = -EIO;
+				break;
+			}
+		}
+
+		/*
+		 * i_size must be checked after we know the pages are Uptodate.
+		 *
+		 * Checking i_size after the check allows us to calculate
+		 * the correct value for "nr", which means the zero-filled
+		 * part of the page is not copied back to userspace (unless
+		 * another truncate extends the file - this is desired though).
+		 */
+		isize = i_size_read(inode);
+		if (unlikely(*ppos >= isize))
+			break;
+		part = min_t(loff_t, isize - *ppos, len);
+
+		if (folio) {
+			/*
+			 * If users can be writing to this page using arbitrary
+			 * virtual addresses, take care about potential aliasing
+			 * before reading the page on the kernel side.
+			 */
+			if (mapping_writably_mapped(mapping))
+				flush_dcache_folio(folio);
+			folio_mark_accessed(folio);
+			/*
+			 * Ok, we have the page, and it's up-to-date, so we can
+			 * now splice it into the pipe.
+			 */
+			n = splice_folio_into_pipe(pipe, folio, *ppos, part);
+			folio_put(folio);
+			folio = NULL;
+		} else {
+			n = splice_zeropage_into_pipe(pipe, *ppos, len);
+		}
+
+		if (!n)
+			break;
+		len -= n;
+		total_spliced += n;
+		*ppos += n;
+		in->f_ra.prev_pos = *ppos;
+		if (pipe_full(pipe->head, pipe->tail, pipe->max_usage))
+			break;
+
+		cond_resched();
+	} while (len);
+
+	if (folio)
+		folio_put(folio);
+
+	file_accessed(in);
+	return total_spliced ? total_spliced : error;
+}
+
 static loff_t shmem_file_llseek(struct file *file, loff_t offset, int whence)
 {
 	struct address_space *mapping = file->f_mapping;
@@ -3938,7 +4060,7 @@ static const struct file_operations shmem_file_operations = {
 	.read_iter	= shmem_file_read_iter,
 	.write_iter	= generic_file_write_iter,
 	.fsync		= noop_fsync,
-	.splice_read	= generic_file_splice_read,
+	.splice_read	= shmem_file_splice_read,
 	.splice_write	= iter_file_splice_write,
 	.fallocate	= shmem_fallocate,
 #endif



* [PATCH v17 04/14] overlayfs: Implement splice-read
  2023-03-08 16:52 [PATCH v17 00/14] splice, block: Use page pinning and kill ITER_PIPE David Howells
                   ` (2 preceding siblings ...)
  2023-03-08 16:52 ` [PATCH v17 03/14] shmem: Implement splice-read David Howells
@ 2023-03-08 16:52 ` David Howells
  2023-03-08 16:52 ` [PATCH v17 05/14] coda: " David Howells
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 33+ messages in thread
From: David Howells @ 2023-03-08 16:52 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Linus Torvalds, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Christoph Hellwig, John Hubbard,
	Miklos Szeredi, linux-unionfs

Implement splice-read for overlayfs by passing the request down a layer
rather than going through generic_file_splice_read() which is going to be
changed to assume that ->read_folio() is present on buffered files.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: John Hubbard <jhubbard@nvidia.com>
cc: David Hildenbrand <david@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Miklos Szeredi <miklos@szeredi.hu>
cc: linux-unionfs@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
---

Notes:
    ver #17)
     - Use vfs_splice_read() helper rather than open-coding checks.
    
    ver #15)
     - Remove redundant FMODE_CAN_ODIRECT check on real file.
     - Do rw_verify_area() on the real file, not the overlay file.
     - Fix a file leak.

 fs/overlayfs/file.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 7c04f033aadd..86197882ff35 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -419,6 +419,27 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	return ret;
 }
 
+static ssize_t ovl_splice_read(struct file *in, loff_t *ppos,
+			       struct pipe_inode_info *pipe, size_t len,
+			       unsigned int flags)
+{
+	const struct cred *old_cred;
+	struct fd real;
+	ssize_t ret;
+
+	ret = ovl_real_fdget(in, &real);
+	if (ret)
+		return ret;
+
+	old_cred = ovl_override_creds(file_inode(in)->i_sb);
+	ret = vfs_splice_read(real.file, ppos, pipe, len, flags);
+	revert_creds(old_cred);
+	ovl_file_accessed(in);
+
+	fdput(real);
+	return ret;
+}
+
 /*
  * Calling iter_file_splice_write() directly from overlay's f_op may deadlock
  * due to lock order inversion between pipe->mutex in iter_file_splice_write()
@@ -695,7 +716,7 @@ const struct file_operations ovl_file_operations = {
 	.fallocate	= ovl_fallocate,
 	.fadvise	= ovl_fadvise,
 	.flush		= ovl_flush,
-	.splice_read    = generic_file_splice_read,
+	.splice_read    = ovl_splice_read,
 	.splice_write   = ovl_splice_write,
 
 	.copy_file_range	= ovl_copy_file_range,



* [PATCH v17 05/14] coda: Implement splice-read
  2023-03-08 16:52 [PATCH v17 00/14] splice, block: Use page pinning and kill ITER_PIPE David Howells
                   ` (3 preceding siblings ...)
  2023-03-08 16:52 ` [PATCH v17 04/14] overlayfs: " David Howells
@ 2023-03-08 16:52 ` David Howells
  2023-03-13 13:28   ` Jan Harkes
  2023-03-08 16:52 ` [PATCH v17 06/14] tty, proc, kernfs, random: Use direct_splice_read() David Howells
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: David Howells @ 2023-03-08 16:52 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Linus Torvalds, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Jan Harkes, Christoph Hellwig,
	John Hubbard, coda, codalist, linux-unionfs

Implement splice-read for coda by passing the request down a layer rather
than going through generic_file_splice_read() which is going to be changed
to assume that ->read_folio() is present on buffered files.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jan Harkes <jaharkes@cs.cmu.edu>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: John Hubbard <jhubbard@nvidia.com>
cc: David Hildenbrand <david@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: coda@cs.cmu.edu
cc: codalist@coda.cs.cmu.edu
cc: linux-unionfs@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
---

Notes:
    ver #17)
     - Use vfs_splice_read() helper rather than open-coding checks.
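
Illustration only, not part of the patch: the bracketing performed by
coda_file_splice_read() (announce the read intent, delegate to the container
file, always signal completion) can be modelled in plain user-space C.  Every
name below (notify_intent, lower_read, layered_splice_read, struct trace) is a
hypothetical stand-in and not a kernel or Coda API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two access-intent notifications. */
enum intent { INTENT_READ, INTENT_READ_FINISH };

/* Records which steps ran so the control flow can be inspected. */
struct trace {
	int begins;
	int finishes;
	int lower_calls;
};

/* Models the upcall: the "begin" notification may fail, "finish" never does. */
static long notify_intent(struct trace *t, enum intent what, int fail)
{
	if (what == INTENT_READ) {
		t->begins++;
		return fail ? -EIO : 0;
	}
	t->finishes++;
	return 0;
}

/* Models the lower layer's splice read: pretend every byte was spliced. */
static long lower_read(size_t len, struct trace *t)
{
	t->lower_calls++;
	return (long)len;
}

/* Begin, delegate, and always finish, even if the begin step failed. */
static long layered_splice_read(size_t len, int fail_intent, struct trace *t)
{
	long ret = notify_intent(t, INTENT_READ, fail_intent);

	if (ret)
		goto finish;
	ret = lower_read(len, t);
finish:
	notify_intent(t, INTENT_READ_FINISH, 0);
	return ret;
}
```

With a zeroed struct trace, layered_splice_read(4096, 0, &t) returns 4096 and
records one begin, one lower read and one finish.  With fail_intent set it
returns -EIO and skips the lower read but still records the finish, which
mirrors the goto finish_read path in the patch.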

 fs/coda/file.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/fs/coda/file.c b/fs/coda/file.c
index 3f3c81e6b1ab..12b26bd13564 100644
--- a/fs/coda/file.c
+++ b/fs/coda/file.c
@@ -23,6 +23,7 @@
 #include <linux/slab.h>
 #include <linux/uaccess.h>
 #include <linux/uio.h>
+#include <linux/splice.h>
 
 #include <linux/coda.h>
 #include "coda_psdev.h"
@@ -94,6 +95,32 @@ coda_file_write_iter(struct kiocb *iocb, struct iov_iter *to)
 	return ret;
 }
 
+static ssize_t
+coda_file_splice_read(struct file *coda_file, loff_t *ppos,
+		      struct pipe_inode_info *pipe,
+		      size_t len, unsigned int flags)
+{
+	struct inode *coda_inode = file_inode(coda_file);
+	struct coda_file_info *cfi = coda_ftoc(coda_file);
+	struct file *in = cfi->cfi_container;
+	loff_t ki_pos = *ppos;
+	ssize_t ret;
+
+	ret = venus_access_intent(coda_inode->i_sb, coda_i2f(coda_inode),
+				  &cfi->cfi_access_intent,
+				  len, ki_pos, CODA_ACCESS_TYPE_READ);
+	if (ret)
+		goto finish_read;
+
+	ret = vfs_splice_read(in, ppos, pipe, len, flags);
+
+finish_read:
+	venus_access_intent(coda_inode->i_sb, coda_i2f(coda_inode),
+			    &cfi->cfi_access_intent,
+			    len, ki_pos, CODA_ACCESS_TYPE_READ_FINISH);
+	return ret;
+}
+
 static void
 coda_vm_open(struct vm_area_struct *vma)
 {
@@ -302,5 +329,5 @@ const struct file_operations coda_file_operations = {
 	.open		= coda_open,
 	.release	= coda_release,
 	.fsync		= coda_fsync,
-	.splice_read	= generic_file_splice_read,
+	.splice_read	= coda_file_splice_read,
 };



* [PATCH v17 06/14] tty, proc, kernfs, random: Use direct_splice_read()
  2023-03-08 16:52 [PATCH v17 00/14] splice, block: Use page pinning and kill ITER_PIPE David Howells
                   ` (4 preceding siblings ...)
  2023-03-08 16:52 ` [PATCH v17 05/14] coda: " David Howells
@ 2023-03-08 16:52 ` David Howells
  2023-03-08 16:52 ` [PATCH v17 07/14] splice: Do splice read from a file without using ITER_PIPE David Howells
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 33+ messages in thread
From: David Howells @ 2023-03-08 16:52 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Linus Torvalds, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Greg Kroah-Hartman, Christoph Hellwig,
	John Hubbard, Miklos Szeredi, Arnd Bergmann

Use direct_splice_read() for tty, procfs, kernfs and random files rather
than going through generic_file_splice_read() as they just copy the file
into the output buffer and don't splice pages.  This avoids the need for
them to have a ->read_folio() to satisfy filemap_splice_read().

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: John Hubbard <jhubbard@nvidia.com>
cc: David Hildenbrand <david@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Miklos Szeredi <miklos@szeredi.hu>
cc: Arnd Bergmann <arnd@arndb.de>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
---
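
Illustration only, not part of the patch: the cover letter describes
direct_splice_read() as bulk-allocating a buffer, reading into it and pushing
the filled pages into the pipe.  The page accounting that implies can be
sketched as a user-space model.  The names (plan_direct_read, struct
splice_plan, MODEL_PAGE_SIZE) and the capping of the request to the pipe's
free slots are assumptions made for the sketch, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for the model; the kernel uses PAGE_SIZE. */
#define MODEL_PAGE_SIZE 4096u

struct splice_plan {
	size_t allocated;	/* pages allocated up front */
	size_t kept;		/* pages holding data, to be pushed into the pipe */
	size_t released;	/* unused pages handed back */
	size_t last_len;	/* bytes of data in the final kept page */
};

/* Number of pages needed to hold the given number of bytes. */
static size_t pages_for(size_t bytes)
{
	return (bytes + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}

/*
 * len:        bytes requested by the caller
 * free_slots: unused pipe slots (one page per slot in this model)
 * nread:      what the underlying read returned (negative means error)
 */
static struct splice_plan plan_direct_read(size_t len, size_t free_slots,
					   long nread)
{
	struct splice_plan p = { 0, 0, 0, 0 };
	size_t got;

	/* Assumption: never ask for more than the pipe has free slots for. */
	if (len > free_slots * MODEL_PAGE_SIZE)
		len = free_slots * MODEL_PAGE_SIZE;
	p.allocated = pages_for(len);

	/* An error or EOF keeps nothing; a short read keeps what it filled. */
	got = nread > 0 ? (size_t)nread : 0;
	if (got > len)
		got = len;
	p.kept = pages_for(got);
	p.released = p.allocated - p.kept;
	if (p.kept)
		p.last_len = got - (p.kept - 1) * MODEL_PAGE_SIZE;
	return p;
}
```

For example, a 10000-byte request that reads only 5000 bytes allocates three
pages in this model, keeps two (the second holding 904 bytes) and releases one.
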
 drivers/char/random.c | 4 ++--
 drivers/tty/tty_io.c  | 4 ++--
 fs/kernfs/file.c      | 2 +-
 fs/proc/inode.c       | 4 ++--
 fs/proc/proc_sysctl.c | 2 +-
 fs/proc_namespace.c   | 6 +++---
 6 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index ce3ccd172cc8..792713616ba8 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1546,7 +1546,7 @@ const struct file_operations random_fops = {
 	.compat_ioctl = compat_ptr_ioctl,
 	.fasync = random_fasync,
 	.llseek = noop_llseek,
-	.splice_read = generic_file_splice_read,
+	.splice_read = direct_splice_read,
 	.splice_write = iter_file_splice_write,
 };
 
@@ -1557,7 +1557,7 @@ const struct file_operations urandom_fops = {
 	.compat_ioctl = compat_ptr_ioctl,
 	.fasync = random_fasync,
 	.llseek = noop_llseek,
-	.splice_read = generic_file_splice_read,
+	.splice_read = direct_splice_read,
 	.splice_write = iter_file_splice_write,
 };
 
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 36fb945fdad4..9d117e579dfb 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -466,7 +466,7 @@ static const struct file_operations tty_fops = {
 	.llseek		= no_llseek,
 	.read_iter	= tty_read,
 	.write_iter	= tty_write,
-	.splice_read	= generic_file_splice_read,
+	.splice_read	= direct_splice_read,
 	.splice_write	= iter_file_splice_write,
 	.poll		= tty_poll,
 	.unlocked_ioctl	= tty_ioctl,
@@ -481,7 +481,7 @@ static const struct file_operations console_fops = {
 	.llseek		= no_llseek,
 	.read_iter	= tty_read,
 	.write_iter	= redirected_tty_write,
-	.splice_read	= generic_file_splice_read,
+	.splice_read	= direct_splice_read,
 	.splice_write	= iter_file_splice_write,
 	.poll		= tty_poll,
 	.unlocked_ioctl	= tty_ioctl,
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index e4a50e4ff0d2..9d23b8141db7 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -1011,7 +1011,7 @@ const struct file_operations kernfs_file_fops = {
 	.release	= kernfs_fop_release,
 	.poll		= kernfs_fop_poll,
 	.fsync		= noop_fsync,
-	.splice_read	= generic_file_splice_read,
+	.splice_read	= direct_splice_read,
 	.splice_write	= iter_file_splice_write,
 };
 
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index f495fdb39151..711f12706469 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -591,7 +591,7 @@ static const struct file_operations proc_iter_file_ops = {
 	.llseek		= proc_reg_llseek,
 	.read_iter	= proc_reg_read_iter,
 	.write		= proc_reg_write,
-	.splice_read	= generic_file_splice_read,
+	.splice_read	= direct_splice_read,
 	.poll		= proc_reg_poll,
 	.unlocked_ioctl	= proc_reg_unlocked_ioctl,
 	.mmap		= proc_reg_mmap,
@@ -617,7 +617,7 @@ static const struct file_operations proc_reg_file_ops_compat = {
 static const struct file_operations proc_iter_file_ops_compat = {
 	.llseek		= proc_reg_llseek,
 	.read_iter	= proc_reg_read_iter,
-	.splice_read	= generic_file_splice_read,
+	.splice_read	= direct_splice_read,
 	.write		= proc_reg_write,
 	.poll		= proc_reg_poll,
 	.unlocked_ioctl	= proc_reg_unlocked_ioctl,
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 5851eb5bc726..e49f99657d1c 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -869,7 +869,7 @@ static const struct file_operations proc_sys_file_operations = {
 	.poll		= proc_sys_poll,
 	.read_iter	= proc_sys_read,
 	.write_iter	= proc_sys_write,
-	.splice_read	= generic_file_splice_read,
+	.splice_read	= direct_splice_read,
 	.splice_write	= iter_file_splice_write,
 	.llseek		= default_llseek,
 };
diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
index 846f9455ae22..492abbbeff5e 100644
--- a/fs/proc_namespace.c
+++ b/fs/proc_namespace.c
@@ -324,7 +324,7 @@ static int mountstats_open(struct inode *inode, struct file *file)
 const struct file_operations proc_mounts_operations = {
 	.open		= mounts_open,
 	.read_iter	= seq_read_iter,
-	.splice_read	= generic_file_splice_read,
+	.splice_read	= direct_splice_read,
 	.llseek		= seq_lseek,
 	.release	= mounts_release,
 	.poll		= mounts_poll,
@@ -333,7 +333,7 @@ const struct file_operations proc_mounts_operations = {
 const struct file_operations proc_mountinfo_operations = {
 	.open		= mountinfo_open,
 	.read_iter	= seq_read_iter,
-	.splice_read	= generic_file_splice_read,
+	.splice_read	= direct_splice_read,
 	.llseek		= seq_lseek,
 	.release	= mounts_release,
 	.poll		= mounts_poll,
@@ -342,7 +342,7 @@ const struct file_operations proc_mountinfo_operations = {
 const struct file_operations proc_mountstats_operations = {
 	.open		= mountstats_open,
 	.read_iter	= seq_read_iter,
-	.splice_read	= generic_file_splice_read,
+	.splice_read	= direct_splice_read,
 	.llseek		= seq_lseek,
 	.release	= mounts_release,
 };



* [PATCH v17 07/14] splice: Do splice read from a file without using ITER_PIPE
  2023-03-08 16:52 [PATCH v17 00/14] splice, block: Use page pinning and kill ITER_PIPE David Howells
                   ` (5 preceding siblings ...)
  2023-03-08 16:52 ` [PATCH v17 06/14] tty, proc, kernfs, random: Use direct_splice_read() David Howells
@ 2023-03-08 16:52 ` David Howells
  2023-03-14 17:32   ` Christoph Hellwig
  2023-03-14 21:52   ` David Howells
  2023-03-08 16:52 ` [PATCH v17 08/14] iov_iter: Kill ITER_PIPE David Howells
                   ` (6 subsequent siblings)
  13 siblings, 2 replies; 33+ messages in thread
From: David Howells @ 2023-03-08 16:52 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Linus Torvalds, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Steve French, Christoph Hellwig,
	John Hubbard, linux-cifs

Make generic_file_splice_read() use filemap_splice_read() and
direct_splice_read() rather than using an ITER_PIPE and call_read_iter().

Make cifs use generic_file_splice_read() rather than providing its own
cifs_splice_read(), which performed the same checks.

Unexport filemap_splice_read().

With this, ITER_PIPE is no longer used.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Steve French <smfrench@gmail.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: David Hildenbrand <david@redhat.com>
cc: John Hubbard <jhubbard@nvidia.com>
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
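
Illustration only, not part of the patch: the dispatch that
generic_file_splice_read() is reduced to in the fs/splice.c hunk can be
modelled as a small pure function.  The types and names here (struct
model_file, enum splice_path, choose_splice_path) are hypothetical stand-ins
for struct file, s_maxbytes and the O_DIRECT flag test.

```c
#include <assert.h>
#include <stddef.h>

/* Which reader the model would hand the request to. */
enum splice_path {
	PATH_NONE,	/* nothing to do: return 0 bytes */
	PATH_DIRECT,	/* models direct_splice_read() */
	PATH_PAGECACHE,	/* models filemap_splice_read() */
};

/* Hypothetical stand-in for the bits of struct file the checks consult. */
struct model_file {
	long long max_bytes;	/* models i_sb->s_maxbytes */
	int o_direct;		/* non-zero if opened with O_DIRECT */
};

/* Same order of checks as the patched function: limits first, then mode. */
static enum splice_path choose_splice_path(const struct model_file *f,
					   long long pos, size_t len)
{
	if (pos >= f->max_bytes)
		return PATH_NONE;
	if (len == 0)
		return PATH_NONE;
	if (f->o_direct)
		return PATH_DIRECT;
	return PATH_PAGECACHE;
}
```

A buffered file with room left below max_bytes takes PATH_PAGECACHE, the same
file opened with O_DIRECT takes PATH_DIRECT, and a zero length or a position
at or beyond max_bytes short-circuits to PATH_NONE.
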
 fs/cifs/cifsfs.c |  8 ++++----
 fs/cifs/cifsfs.h |  3 ---
 fs/cifs/file.c   | 16 ----------------
 fs/splice.c      | 30 +++++++-----------------------
 mm/filemap.c     |  1 -
 5 files changed, 11 insertions(+), 47 deletions(-)

diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index cbcf210d56e4..ba963a26cb19 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -1359,7 +1359,7 @@ const struct file_operations cifs_file_ops = {
 	.fsync = cifs_fsync,
 	.flush = cifs_flush,
 	.mmap  = cifs_file_mmap,
-	.splice_read = cifs_splice_read,
+	.splice_read = generic_file_splice_read,
 	.splice_write = iter_file_splice_write,
 	.llseek = cifs_llseek,
 	.unlocked_ioctl	= cifs_ioctl,
@@ -1379,7 +1379,7 @@ const struct file_operations cifs_file_strict_ops = {
 	.fsync = cifs_strict_fsync,
 	.flush = cifs_flush,
 	.mmap = cifs_file_strict_mmap,
-	.splice_read = cifs_splice_read,
+	.splice_read = generic_file_splice_read,
 	.splice_write = iter_file_splice_write,
 	.llseek = cifs_llseek,
 	.unlocked_ioctl	= cifs_ioctl,
@@ -1417,7 +1417,7 @@ const struct file_operations cifs_file_nobrl_ops = {
 	.fsync = cifs_fsync,
 	.flush = cifs_flush,
 	.mmap  = cifs_file_mmap,
-	.splice_read = cifs_splice_read,
+	.splice_read = generic_file_splice_read,
 	.splice_write = iter_file_splice_write,
 	.llseek = cifs_llseek,
 	.unlocked_ioctl	= cifs_ioctl,
@@ -1435,7 +1435,7 @@ const struct file_operations cifs_file_strict_nobrl_ops = {
 	.fsync = cifs_strict_fsync,
 	.flush = cifs_flush,
 	.mmap = cifs_file_strict_mmap,
-	.splice_read = cifs_splice_read,
+	.splice_read = generic_file_splice_read,
 	.splice_write = iter_file_splice_write,
 	.llseek = cifs_llseek,
 	.unlocked_ioctl	= cifs_ioctl,
diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index 71fe0a0a7992..8b239854e590 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -100,9 +100,6 @@ extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
 extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
 extern ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from);
 extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
-extern ssize_t cifs_splice_read(struct file *in, loff_t *ppos,
-				struct pipe_inode_info *pipe, size_t len,
-				unsigned int flags);
 extern int cifs_flock(struct file *pfile, int cmd, struct file_lock *plock);
 extern int cifs_lock(struct file *, int, struct file_lock *);
 extern int cifs_fsync(struct file *, loff_t, loff_t, int);
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 4d4a2d82636d..321f9b7c84c9 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -5066,19 +5066,3 @@ const struct address_space_operations cifs_addr_ops_smallbuf = {
 	.launder_folio = cifs_launder_folio,
 	.migrate_folio = filemap_migrate_folio,
 };
-
-/*
- * Splice data from a file into a pipe.
- */
-ssize_t cifs_splice_read(struct file *in, loff_t *ppos,
-			 struct pipe_inode_info *pipe, size_t len,
-			 unsigned int flags)
-{
-	if (unlikely(*ppos >= file_inode(in)->i_sb->s_maxbytes))
-		return 0;
-	if (unlikely(!len))
-		return 0;
-	if (in->f_flags & O_DIRECT)
-		return direct_splice_read(in, ppos, pipe, len, flags);
-	return filemap_splice_read(in, ppos, pipe, len, flags);
-}
diff --git a/fs/splice.c b/fs/splice.c
index 90ccd3666dca..f46dd1fb367b 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -387,29 +387,13 @@ ssize_t generic_file_splice_read(struct file *in, loff_t *ppos,
 				 struct pipe_inode_info *pipe, size_t len,
 				 unsigned int flags)
 {
-	struct iov_iter to;
-	struct kiocb kiocb;
-	int ret;
-
-	iov_iter_pipe(&to, ITER_DEST, pipe, len);
-	init_sync_kiocb(&kiocb, in);
-	kiocb.ki_pos = *ppos;
-	ret = call_read_iter(in, &kiocb, &to);
-	if (ret > 0) {
-		*ppos = kiocb.ki_pos;
-		file_accessed(in);
-	} else if (ret < 0) {
-		/* free what was emitted */
-		pipe_discard_from(pipe, to.start_head);
-		/*
-		 * callers of ->splice_read() expect -EAGAIN on
-		 * "can't put anything in there", rather than -EFAULT.
-		 */
-		if (ret == -EFAULT)
-			ret = -EAGAIN;
-	}
-
-	return ret;
+	if (unlikely(*ppos >= file_inode(in)->i_sb->s_maxbytes))
+		return 0;
+	if (unlikely(!len))
+		return 0;
+	if (in->f_flags & O_DIRECT)
+		return direct_splice_read(in, ppos, pipe, len, flags);
+	return filemap_splice_read(in, ppos, pipe, len, flags);
 }
 EXPORT_SYMBOL(generic_file_splice_read);
 
diff --git a/mm/filemap.c b/mm/filemap.c
index 2723104cc06a..3a93515ae2ed 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2967,7 +2967,6 @@ ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
 
 	return total_spliced ? total_spliced : error;
 }
-EXPORT_SYMBOL(filemap_splice_read);
 
 static inline loff_t folio_seek_hole_data(struct xa_state *xas,
 		struct address_space *mapping, struct folio *folio,



* [PATCH v17 08/14] iov_iter: Kill ITER_PIPE
  2023-03-08 16:52 [PATCH v17 00/14] splice, block: Use page pinning and kill ITER_PIPE David Howells
                   ` (6 preceding siblings ...)
  2023-03-08 16:52 ` [PATCH v17 07/14] splice: Do splice read from a file without using ITER_PIPE David Howells
@ 2023-03-08 16:52 ` David Howells
  2023-03-08 16:52 ` [PATCH v17 09/14] iomap: Don't get an reference on ZERO_PAGE for direct I/O block zeroing David Howells
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 33+ messages in thread
From: David Howells @ 2023-03-08 16:52 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Linus Torvalds, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Christoph Hellwig, John Hubbard

The ITER_PIPE-type iterator was only used for generic_file_splice_read(),
but that has now been switched to either pull pages directly from the
pagecache for buffered file splice-reads or to use ITER_BVEC instead for
O_DIRECT file splice-reads.  This leaves ITER_PIPE unused - so remove it.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: David Hildenbrand <david@redhat.com>
cc: John Hubbard <jhubbard@nvidia.com>
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 include/linux/uio.h |  14 --
 lib/iov_iter.c      | 429 +-------------------------------------------
 mm/filemap.c        |   3 +-
 3 files changed, 4 insertions(+), 442 deletions(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 27e3fd942960..74598426edb4 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -11,7 +11,6 @@
 #include <uapi/linux/uio.h>
 
 struct page;
-struct pipe_inode_info;
 
 typedef unsigned int __bitwise iov_iter_extraction_t;
 
@@ -25,7 +24,6 @@ enum iter_type {
 	ITER_IOVEC,
 	ITER_KVEC,
 	ITER_BVEC,
-	ITER_PIPE,
 	ITER_XARRAY,
 	ITER_DISCARD,
 	ITER_UBUF,
@@ -55,15 +53,10 @@ struct iov_iter {
 		const struct kvec *kvec;
 		const struct bio_vec *bvec;
 		struct xarray *xarray;
-		struct pipe_inode_info *pipe;
 		void __user *ubuf;
 	};
 	union {
 		unsigned long nr_segs;
-		struct {
-			unsigned int head;
-			unsigned int start_head;
-		};
 		loff_t xarray_start;
 	};
 };
@@ -101,11 +94,6 @@ static inline bool iov_iter_is_bvec(const struct iov_iter *i)
 	return iov_iter_type(i) == ITER_BVEC;
 }
 
-static inline bool iov_iter_is_pipe(const struct iov_iter *i)
-{
-	return iov_iter_type(i) == ITER_PIPE;
-}
-
 static inline bool iov_iter_is_discard(const struct iov_iter *i)
 {
 	return iov_iter_type(i) == ITER_DISCARD;
@@ -247,8 +235,6 @@ void iov_iter_kvec(struct iov_iter *i, unsigned int direction, const struct kvec
 			unsigned long nr_segs, size_t count);
 void iov_iter_bvec(struct iov_iter *i, unsigned int direction, const struct bio_vec *bvec,
 			unsigned long nr_segs, size_t count);
-void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct pipe_inode_info *pipe,
-			size_t count);
 void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count);
 void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *xarray,
 		     loff_t start, size_t count);
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 274014e4eafe..fad95e4cf372 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -14,8 +14,6 @@
 #include <linux/scatterlist.h>
 #include <linux/instrumented.h>
 
-#define PIPE_PARANOIA /* for now */
-
 /* covers ubuf and kbuf alike */
 #define iterate_buf(i, n, base, len, off, __p, STEP) {		\
 	size_t __maybe_unused off = 0;				\
@@ -186,150 +184,6 @@ static int copyin(void *to, const void __user *from, size_t n)
 	return res;
 }
 
-#ifdef PIPE_PARANOIA
-static bool sanity(const struct iov_iter *i)
-{
-	struct pipe_inode_info *pipe = i->pipe;
-	unsigned int p_head = pipe->head;
-	unsigned int p_tail = pipe->tail;
-	unsigned int p_occupancy = pipe_occupancy(p_head, p_tail);
-	unsigned int i_head = i->head;
-	unsigned int idx;
-
-	if (i->last_offset) {
-		struct pipe_buffer *p;
-		if (unlikely(p_occupancy == 0))
-			goto Bad;	// pipe must be non-empty
-		if (unlikely(i_head != p_head - 1))
-			goto Bad;	// must be at the last buffer...
-
-		p = pipe_buf(pipe, i_head);
-		if (unlikely(p->offset + p->len != abs(i->last_offset)))
-			goto Bad;	// ... at the end of segment
-	} else {
-		if (i_head != p_head)
-			goto Bad;	// must be right after the last buffer
-	}
-	return true;
-Bad:
-	printk(KERN_ERR "idx = %d, offset = %d\n", i_head, i->last_offset);
-	printk(KERN_ERR "head = %d, tail = %d, buffers = %d\n",
-			p_head, p_tail, pipe->ring_size);
-	for (idx = 0; idx < pipe->ring_size; idx++)
-		printk(KERN_ERR "[%p %p %d %d]\n",
-			pipe->bufs[idx].ops,
-			pipe->bufs[idx].page,
-			pipe->bufs[idx].offset,
-			pipe->bufs[idx].len);
-	WARN_ON(1);
-	return false;
-}
-#else
-#define sanity(i) true
-#endif
-
-static struct page *push_anon(struct pipe_inode_info *pipe, unsigned size)
-{
-	struct page *page = alloc_page(GFP_USER);
-	if (page) {
-		struct pipe_buffer *buf = pipe_buf(pipe, pipe->head++);
-		*buf = (struct pipe_buffer) {
-			.ops = &default_pipe_buf_ops,
-			.page = page,
-			.offset = 0,
-			.len = size
-		};
-	}
-	return page;
-}
-
-static void push_page(struct pipe_inode_info *pipe, struct page *page,
-			unsigned int offset, unsigned int size)
-{
-	struct pipe_buffer *buf = pipe_buf(pipe, pipe->head++);
-	*buf = (struct pipe_buffer) {
-		.ops = &page_cache_pipe_buf_ops,
-		.page = page,
-		.offset = offset,
-		.len = size
-	};
-	get_page(page);
-}
-
-static inline int last_offset(const struct pipe_buffer *buf)
-{
-	if (buf->ops == &default_pipe_buf_ops)
-		return buf->len;	// buf->offset is 0 for those
-	else
-		return -(buf->offset + buf->len);
-}
-
-static struct page *append_pipe(struct iov_iter *i, size_t size,
-				unsigned int *off)
-{
-	struct pipe_inode_info *pipe = i->pipe;
-	int offset = i->last_offset;
-	struct pipe_buffer *buf;
-	struct page *page;
-
-	if (offset > 0 && offset < PAGE_SIZE) {
-		// some space in the last buffer; add to it
-		buf = pipe_buf(pipe, pipe->head - 1);
-		size = min_t(size_t, size, PAGE_SIZE - offset);
-		buf->len += size;
-		i->last_offset += size;
-		i->count -= size;
-		*off = offset;
-		return buf->page;
-	}
-	// OK, we need a new buffer
-	*off = 0;
-	size = min_t(size_t, size, PAGE_SIZE);
-	if (pipe_full(pipe->head, pipe->tail, pipe->max_usage))
-		return NULL;
-	page = push_anon(pipe, size);
-	if (!page)
-		return NULL;
-	i->head = pipe->head - 1;
-	i->last_offset = size;
-	i->count -= size;
-	return page;
-}
-
-static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t bytes,
-			 struct iov_iter *i)
-{
-	struct pipe_inode_info *pipe = i->pipe;
-	unsigned int head = pipe->head;
-
-	if (unlikely(bytes > i->count))
-		bytes = i->count;
-
-	if (unlikely(!bytes))
-		return 0;
-
-	if (!sanity(i))
-		return 0;
-
-	if (offset && i->last_offset == -offset) { // could we merge it?
-		struct pipe_buffer *buf = pipe_buf(pipe, head - 1);
-		if (buf->page == page) {
-			buf->len += bytes;
-			i->last_offset -= bytes;
-			i->count -= bytes;
-			return bytes;
-		}
-	}
-	if (pipe_full(pipe->head, pipe->tail, pipe->max_usage))
-		return 0;
-
-	push_page(pipe, page, offset, bytes);
-	i->last_offset = -(offset + bytes);
-	i->head = head;
-	i->count -= bytes;
-	return bytes;
-}
-
 /*
  * fault_in_iov_iter_readable - fault in iov iterator for reading
  * @i: iterator
@@ -433,46 +287,6 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,
 }
 EXPORT_SYMBOL(iov_iter_init);
 
-// returns the offset in partial buffer (if any)
-static inline unsigned int pipe_npages(const struct iov_iter *i, int *npages)
-{
-	struct pipe_inode_info *pipe = i->pipe;
-	int used = pipe->head - pipe->tail;
-	int off = i->last_offset;
-
-	*npages = max((int)pipe->max_usage - used, 0);
-
-	if (off > 0 && off < PAGE_SIZE) { // anon and not full
-		(*npages)++;
-		return off;
-	}
-	return 0;
-}
-
-static size_t copy_pipe_to_iter(const void *addr, size_t bytes,
-				struct iov_iter *i)
-{
-	unsigned int off, chunk;
-
-	if (unlikely(bytes > i->count))
-		bytes = i->count;
-	if (unlikely(!bytes))
-		return 0;
-
-	if (!sanity(i))
-		return 0;
-
-	for (size_t n = bytes; n; n -= chunk) {
-		struct page *page = append_pipe(i, n, &off);
-		chunk = min_t(size_t, n, PAGE_SIZE - off);
-		if (!page)
-			return bytes - n;
-		memcpy_to_page(page, off, addr, chunk);
-		addr += chunk;
-	}
-	return bytes;
-}
-
 static __wsum csum_and_memcpy(void *to, const void *from, size_t len,
 			      __wsum sum, size_t off)
 {
@@ -480,44 +294,10 @@ static __wsum csum_and_memcpy(void *to, const void *from, size_t len,
 	return csum_block_add(sum, next, off);
 }
 
-static size_t csum_and_copy_to_pipe_iter(const void *addr, size_t bytes,
-					 struct iov_iter *i, __wsum *sump)
-{
-	__wsum sum = *sump;
-	size_t off = 0;
-	unsigned int chunk, r;
-
-	if (unlikely(bytes > i->count))
-		bytes = i->count;
-	if (unlikely(!bytes))
-		return 0;
-
-	if (!sanity(i))
-		return 0;
-
-	while (bytes) {
-		struct page *page = append_pipe(i, bytes, &r);
-		char *p;
-
-		if (!page)
-			break;
-		chunk = min_t(size_t, bytes, PAGE_SIZE - r);
-		p = kmap_local_page(page);
-		sum = csum_and_memcpy(p + r, addr + off, chunk, sum, off);
-		kunmap_local(p);
-		off += chunk;
-		bytes -= chunk;
-	}
-	*sump = sum;
-	return off;
-}
-
 size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 {
 	if (WARN_ON_ONCE(i->data_source))
 		return 0;
-	if (unlikely(iov_iter_is_pipe(i)))
-		return copy_pipe_to_iter(addr, bytes, i);
 	if (user_backed_iter(i))
 		might_fault();
 	iterate_and_advance(i, bytes, base, len, off,
@@ -539,42 +319,6 @@ static int copyout_mc(void __user *to, const void *from, size_t n)
 	return n;
 }
 
-static size_t copy_mc_pipe_to_iter(const void *addr, size_t bytes,
-				struct iov_iter *i)
-{
-	size_t xfer = 0;
-	unsigned int off, chunk;
-
-	if (unlikely(bytes > i->count))
-		bytes = i->count;
-	if (unlikely(!bytes))
-		return 0;
-
-	if (!sanity(i))
-		return 0;
-
-	while (bytes) {
-		struct page *page = append_pipe(i, bytes, &off);
-		unsigned long rem;
-		char *p;
-
-		if (!page)
-			break;
-		chunk = min_t(size_t, bytes, PAGE_SIZE - off);
-		p = kmap_local_page(page);
-		rem = copy_mc_to_kernel(p + off, addr + xfer, chunk);
-		chunk -= rem;
-		kunmap_local(p);
-		xfer += chunk;
-		bytes -= chunk;
-		if (rem) {
-			iov_iter_revert(i, rem);
-			break;
-		}
-	}
-	return xfer;
-}
-
 /**
  * _copy_mc_to_iter - copy to iter with source memory error exception handling
  * @addr: source kernel address
@@ -594,9 +338,8 @@ static size_t copy_mc_pipe_to_iter(const void *addr, size_t bytes,
  *   alignment and poison alignment assumptions to avoid re-triggering
  *   hardware exceptions.
  *
- * * ITER_KVEC, ITER_PIPE, and ITER_BVEC can return short copies.
- *   Compare to copy_to_iter() where only ITER_IOVEC attempts might return
- *   a short copy.
+ * * ITER_KVEC and ITER_BVEC can return short copies.  Compare to
+ *   copy_to_iter() where only ITER_IOVEC attempts might return a short copy.
  *
  * Return: number of bytes copied (may be %0)
  */
@@ -604,8 +347,6 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 {
 	if (WARN_ON_ONCE(i->data_source))
 		return 0;
-	if (unlikely(iov_iter_is_pipe(i)))
-		return copy_mc_pipe_to_iter(addr, bytes, i);
 	if (user_backed_iter(i))
 		might_fault();
 	__iterate_and_advance(i, bytes, base, len, off,
@@ -711,8 +452,6 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
 		return 0;
 	if (WARN_ON_ONCE(i->data_source))
 		return 0;
-	if (unlikely(iov_iter_is_pipe(i)))
-		return copy_page_to_iter_pipe(page, offset, bytes, i);
 	page += offset / PAGE_SIZE; // first subpage
 	offset %= PAGE_SIZE;
 	while (1) {
@@ -761,36 +500,8 @@ size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
 }
 EXPORT_SYMBOL(copy_page_from_iter);
 
-static size_t pipe_zero(size_t bytes, struct iov_iter *i)
-{
-	unsigned int chunk, off;
-
-	if (unlikely(bytes > i->count))
-		bytes = i->count;
-	if (unlikely(!bytes))
-		return 0;
-
-	if (!sanity(i))
-		return 0;
-
-	for (size_t n = bytes; n; n -= chunk) {
-		struct page *page = append_pipe(i, n, &off);
-		char *p;
-
-		if (!page)
-			return bytes - n;
-		chunk = min_t(size_t, n, PAGE_SIZE - off);
-		p = kmap_local_page(page);
-		memset(p + off, 0, chunk);
-		kunmap_local(p);
-	}
-	return bytes;
-}
-
 size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
 {
-	if (unlikely(iov_iter_is_pipe(i)))
-		return pipe_zero(bytes, i);
 	iterate_and_advance(i, bytes, base, len, count,
 		clear_user(base, len),
 		memset(base, 0, len)
@@ -821,32 +532,6 @@ size_t copy_page_from_iter_atomic(struct page *page, unsigned offset, size_t byt
 }
 EXPORT_SYMBOL(copy_page_from_iter_atomic);
 
-static void pipe_advance(struct iov_iter *i, size_t size)
-{
-	struct pipe_inode_info *pipe = i->pipe;
-	int off = i->last_offset;
-
-	if (!off && !size) {
-		pipe_discard_from(pipe, i->start_head); // discard everything
-		return;
-	}
-	i->count -= size;
-	while (1) {
-		struct pipe_buffer *buf = pipe_buf(pipe, i->head);
-		if (off) /* make it relative to the beginning of buffer */
-			size += abs(off) - buf->offset;
-		if (size <= buf->len) {
-			buf->len = size;
-			i->last_offset = last_offset(buf);
-			break;
-		}
-		size -= buf->len;
-		i->head++;
-		off = 0;
-	}
-	pipe_discard_from(pipe, i->head + 1); // discard everything past this one
-}
-
 static void iov_iter_bvec_advance(struct iov_iter *i, size_t size)
 {
 	const struct bio_vec *bvec, *end;
@@ -898,8 +583,6 @@ void iov_iter_advance(struct iov_iter *i, size_t size)
 		iov_iter_iovec_advance(i, size);
 	} else if (iov_iter_is_bvec(i)) {
 		iov_iter_bvec_advance(i, size);
-	} else if (iov_iter_is_pipe(i)) {
-		pipe_advance(i, size);
 	} else if (iov_iter_is_discard(i)) {
 		i->count -= size;
 	}
@@ -913,26 +596,6 @@ void iov_iter_revert(struct iov_iter *i, size_t unroll)
 	if (WARN_ON(unroll > MAX_RW_COUNT))
 		return;
 	i->count += unroll;
-	if (unlikely(iov_iter_is_pipe(i))) {
-		struct pipe_inode_info *pipe = i->pipe;
-		unsigned int head = pipe->head;
-
-		while (head > i->start_head) {
-			struct pipe_buffer *b = pipe_buf(pipe, --head);
-			if (unroll < b->len) {
-				b->len -= unroll;
-				i->last_offset = last_offset(b);
-				i->head = head;
-				return;
-			}
-			unroll -= b->len;
-			pipe_buf_release(pipe, b);
-			pipe->head--;
-		}
-		i->last_offset = 0;
-		i->head = head;
-		return;
-	}
 	if (unlikely(iov_iter_is_discard(i)))
 		return;
 	if (unroll <= i->iov_offset) {
@@ -1020,24 +683,6 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction,
 }
 EXPORT_SYMBOL(iov_iter_bvec);
 
-void iov_iter_pipe(struct iov_iter *i, unsigned int direction,
-			struct pipe_inode_info *pipe,
-			size_t count)
-{
-	BUG_ON(direction != READ);
-	WARN_ON(pipe_full(pipe->head, pipe->tail, pipe->ring_size));
-	*i = (struct iov_iter){
-		.iter_type = ITER_PIPE,
-		.data_source = false,
-		.pipe = pipe,
-		.head = pipe->head,
-		.start_head = pipe->head,
-		.last_offset = 0,
-		.count = count
-	};
-}
-EXPORT_SYMBOL(iov_iter_pipe);
-
 /**
  * iov_iter_xarray - Initialise an I/O iterator to use the pages in an xarray
  * @i: The iterator to initialise.
@@ -1162,19 +807,6 @@ bool iov_iter_is_aligned(const struct iov_iter *i, unsigned addr_mask,
 	if (iov_iter_is_bvec(i))
 		return iov_iter_aligned_bvec(i, addr_mask, len_mask);
 
-	if (iov_iter_is_pipe(i)) {
-		size_t size = i->count;
-
-		if (size & len_mask)
-			return false;
-		if (size && i->last_offset > 0) {
-			if (i->last_offset & addr_mask)
-				return false;
-		}
-
-		return true;
-	}
-
 	if (iov_iter_is_xarray(i)) {
 		if (i->count & len_mask)
 			return false;
@@ -1244,14 +876,6 @@ unsigned long iov_iter_alignment(const struct iov_iter *i)
 	if (iov_iter_is_bvec(i))
 		return iov_iter_alignment_bvec(i);
 
-	if (iov_iter_is_pipe(i)) {
-		size_t size = i->count;
-
-		if (size && i->last_offset > 0)
-			return size | i->last_offset;
-		return size;
-	}
-
 	if (iov_iter_is_xarray(i))
 		return (i->xarray_start + i->iov_offset) | i->count;
 
@@ -1303,36 +927,6 @@ static int want_pages_array(struct page ***res, size_t size,
 	return count;
 }
 
-static ssize_t pipe_get_pages(struct iov_iter *i,
-		   struct page ***pages, size_t maxsize, unsigned maxpages,
-		   size_t *start)
-{
-	unsigned int npages, count, off, chunk;
-	struct page **p;
-	size_t left;
-
-	if (!sanity(i))
-		return -EFAULT;
-
-	*start = off = pipe_npages(i, &npages);
-	if (!npages)
-		return -EFAULT;
-	count = want_pages_array(pages, maxsize, off, min(npages, maxpages));
-	if (!count)
-		return -ENOMEM;
-	p = *pages;
-	for (npages = 0, left = maxsize ; npages < count; npages++, left -= chunk) {
-		struct page *page = append_pipe(i, left, &off);
-		if (!page)
-			break;
-		chunk = min_t(size_t, left, PAGE_SIZE - off);
-		get_page(*p++ = page);
-	}
-	if (!npages)
-		return -EFAULT;
-	return maxsize - left;
-}
-
 static ssize_t iter_xarray_populate_pages(struct page **pages, struct xarray *xa,
 					  pgoff_t index, unsigned int nr_pages)
 {
@@ -1482,8 +1076,6 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		}
 		return maxsize;
 	}
-	if (iov_iter_is_pipe(i))
-		return pipe_get_pages(i, pages, maxsize, maxpages, start);
 	if (iov_iter_is_xarray(i))
 		return iter_xarray_get_pages(i, pages, maxsize, maxpages, start);
 	return -EFAULT;
@@ -1573,9 +1165,7 @@ size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate,
 	}
 
 	sum = csum_shift(csstate->csum, csstate->off);
-	if (unlikely(iov_iter_is_pipe(i)))
-		bytes = csum_and_copy_to_pipe_iter(addr, bytes, i, &sum);
-	else iterate_and_advance(i, bytes, base, len, off, ({
+	iterate_and_advance(i, bytes, base, len, off, ({
 		next = csum_and_copy_to_user(addr + off, base, len);
 		sum = csum_block_add(sum, next, off);
 		next ? 0 : len;
@@ -1660,15 +1250,6 @@ int iov_iter_npages(const struct iov_iter *i, int maxpages)
 		return iov_npages(i, maxpages);
 	if (iov_iter_is_bvec(i))
 		return bvec_npages(i, maxpages);
-	if (iov_iter_is_pipe(i)) {
-		int npages;
-
-		if (!sanity(i))
-			return 0;
-
-		pipe_npages(i, &npages);
-		return min(npages, maxpages);
-	}
 	if (iov_iter_is_xarray(i)) {
 		unsigned offset = (i->xarray_start + i->iov_offset) % PAGE_SIZE;
 		int npages = DIV_ROUND_UP(offset + i->count, PAGE_SIZE);
@@ -1681,10 +1262,6 @@ EXPORT_SYMBOL(iov_iter_npages);
 const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
 {
 	*new = *old;
-	if (unlikely(iov_iter_is_pipe(new))) {
-		WARN_ON(1);
-		return NULL;
-	}
 	if (iov_iter_is_bvec(new))
 		return new->bvec = kmemdup(new->bvec,
 				    new->nr_segs * sizeof(struct bio_vec),
diff --git a/mm/filemap.c b/mm/filemap.c
index 3a93515ae2ed..470be06b6096 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2690,8 +2690,7 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
 		if (unlikely(iocb->ki_pos >= i_size_read(inode)))
 			break;
 
-		error = filemap_get_pages(iocb, iter->count, &fbatch,
-					  iov_iter_is_pipe(iter));
+		error = filemap_get_pages(iocb, iter->count, &fbatch, false);
 		if (error < 0)
 			break;
 



* [PATCH v17 09/14] iomap: Don't get an reference on ZERO_PAGE for direct I/O block zeroing
  2023-03-08 16:52 [PATCH v17 00/14] splice, block: Use page pinning and kill ITER_PIPE David Howells
                   ` (7 preceding siblings ...)
  2023-03-08 16:52 ` [PATCH v17 08/14] iov_iter: Kill ITER_PIPE David Howells
@ 2023-03-08 16:52 ` David Howells
  2023-03-08 21:08   ` Dave Chinner
  2023-03-14 17:33   ` Christoph Hellwig
  2023-03-08 16:52 ` [PATCH v17 10/14] block: Fix bio_flagged() so that gcc can better optimise it David Howells
                   ` (4 subsequent siblings)
  13 siblings, 2 replies; 33+ messages in thread
From: David Howells @ 2023-03-08 16:52 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Linus Torvalds, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, John Hubbard

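ZERO_PAGE can't go away, so there's no need to hold an extra reference on
it.  Set BIO_NO_PAGE_REF on the bio instead so that the page isn't put when
the bio's pages are released.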

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: David Hildenbrand <david@redhat.com>
cc: linux-fsdevel@vger.kernel.org
---
 fs/iomap/direct-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index f771001574d0..850fb9870c2f 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -202,7 +202,7 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
 	bio->bi_private = dio;
 	bio->bi_end_io = iomap_dio_bio_end_io;
 
-	get_page(page);
+	bio_set_flag(bio, BIO_NO_PAGE_REF);
 	__bio_add_page(bio, page, len, 0);
 	iomap_dio_submit_bio(iter, dio, bio, pos);
 }



* [PATCH v17 10/14] block: Fix bio_flagged() so that gcc can better optimise it
  2023-03-08 16:52 [PATCH v17 00/14] splice, block: Use page pinning and kill ITER_PIPE David Howells
                   ` (8 preceding siblings ...)
  2023-03-08 16:52 ` [PATCH v17 09/14] iomap: Don't get an reference on ZERO_PAGE for direct I/O block zeroing David Howells
@ 2023-03-08 16:52 ` David Howells
  2023-03-08 16:52 ` [PATCH v17 11/14] block: Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED with inverted logic David Howells
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 33+ messages in thread
From: David Howells @ 2023-03-08 16:52 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Linus Torvalds, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Christoph Hellwig, John Hubbard

Fix bio_flagged() so that multiple instances of it, such as:

	if (bio_flagged(bio, BIO_PAGE_REFFED) ||
	    bio_flagged(bio, BIO_PAGE_PINNED))

can be combined by the gcc optimiser into a single test in assembly
(arguably, this is a compiler optimisation issue[1]).

The missed optimisation stems from bio_flagged() comparing the result of
the bitwise-AND to zero.  This results in an out-of-line bio_release_pages()
being compiled to something like:

   <+0>:     mov    0x14(%rdi),%eax
   <+3>:     test   $0x1,%al
   <+5>:     jne    0xffffffff816dac53 <bio_release_pages+11>
   <+7>:     test   $0x2,%al
   <+9>:     je     0xffffffff816dac5c <bio_release_pages+20>
   <+11>:    movzbl %sil,%esi
   <+15>:    jmp    0xffffffff816daba1 <__bio_release_pages>
   <+20>:    jmp    0xffffffff81d0b800 <__x86_return_thunk>

However, the test is superfluous as the return type is bool.  Removing it
results in:

   <+0>:     testb  $0x3,0x14(%rdi)
   <+4>:     je     0xffffffff816e4af4 <bio_release_pages+15>
   <+6>:     movzbl %sil,%esi
   <+10>:    jmp    0xffffffff816dab7c <__bio_release_pages>
   <+15>:    jmp    0xffffffff81d0b7c0 <__x86_return_thunk>

instead.

Also, the MOVZBL instruction looks unnecessary[2] - I think it's just
're-booling' the mark_dirty parameter.
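
For illustration, the equivalence the change relies on can be checked in
userspace.  This is only a sketch: struct flag_word and the helpers below
are made-up stand-ins rather than kernel code, and whether the two bit tests
actually get merged depends on the compiler and its options.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct bio; only the flags word is modelled. */
struct flag_word {
	unsigned short bi_flags;
};

/* Old form: explicit comparison of the masked value with zero. */
static inline bool flagged_old(const struct flag_word *w, unsigned int bit)
{
	return (w->bi_flags & (1U << bit)) != 0;
}

/* New form: rely on the implicit conversion to bool (nonzero becomes true). */
static inline bool flagged_new(const struct flag_word *w, unsigned int bit)
{
	return w->bi_flags & (1U << bit);
}

/* Two single-bit tests that a compiler may fold into one test of mask 0x3. */
static inline bool needs_release(const struct flag_word *w)
{
	return flagged_new(w, 0) || flagged_new(w, 1);
}
```

Conversion to bool doesn't truncate, so a high bit such as bit 15 still
yields true with the shorter form.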

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: linux-block@vger.kernel.org
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108370 [1]
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108371 [2]
Link: https://lore.kernel.org/r/167391056756.2311931.356007731815807265.stgit@warthog.procyon.org.uk/ # v6
---
 include/linux/bio.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index d766be7152e1..d9d6df62ea57 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -229,7 +229,7 @@ static inline void bio_cnt_set(struct bio *bio, unsigned int count)
 
 static inline bool bio_flagged(struct bio *bio, unsigned int bit)
 {
-	return (bio->bi_flags & (1U << bit)) != 0;
+	return bio->bi_flags & (1U << bit);
 }
 
 static inline void bio_set_flag(struct bio *bio, unsigned int bit)



* [PATCH v17 11/14] block: Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED with inverted logic
  2023-03-08 16:52 [PATCH v17 00/14] splice, block: Use page pinning and kill ITER_PIPE David Howells
                   ` (9 preceding siblings ...)
  2023-03-08 16:52 ` [PATCH v17 10/14] block: Fix bio_flagged() so that gcc can better optimise it David Howells
@ 2023-03-08 16:52 ` David Howells
  2023-03-08 16:52 ` [PATCH v17 12/14] block: Add BIO_PAGE_PINNED and associated infrastructure David Howells
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 33+ messages in thread
From: David Howells @ 2023-03-08 16:52 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Linus Torvalds, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Christoph Hellwig, John Hubbard

From: Christoph Hellwig <hch@lst.de>

Replace BIO_NO_PAGE_REF with a BIO_PAGE_REFFED flag that has the inverted
meaning and is only set when a page reference has been acquired that needs
to be released by bio_release_pages().
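
As a rough userspace model of what the inversion means for a bio whose flags
are still zero (struct mini_bio, the MINI_* bits and both helpers are
hypothetical names used only for this sketch):

```c
#include <stdbool.h>

/* Hypothetical flag bits; each scheme uses bit 0 for its page-ref flag. */
enum { MINI_NO_PAGE_REF = 0 };	/* old scheme: set to suppress the put */
enum { MINI_PAGE_REFFED = 0 };	/* new scheme: set to request the put */

struct mini_bio {
	unsigned short flags;
};

/* Old logic: pages are put unless the bio opted out. */
static bool mini_old_will_put(const struct mini_bio *bio)
{
	return !(bio->flags & (1U << MINI_NO_PAGE_REF));
}

/* New logic: pages are put only if the bio opted in. */
static bool mini_new_will_put(const struct mini_bio *bio)
{
	return bio->flags & (1U << MINI_PAGE_REFFED);
}
```

With the old flag a freshly zeroed bio would have its pages put by default;
with the new flag nothing is put unless the code that took the references
says so.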

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Jan Kara <jack@suse.cz>
cc: Matthew Wilcox <willy@infradead.org>
cc: Logan Gunthorpe <logang@deltatee.com>
cc: linux-block@vger.kernel.org
---

Notes:
    ver #8)
     - Split out from another patch [hch].
     - Don't default to BIO_PAGE_REFFED [hch].
    
    ver #5)
     - Split from patch that uses iov_iter_extract_pages().

 block/bio.c               | 2 +-
 block/blk-map.c           | 1 +
 fs/direct-io.c            | 2 ++
 fs/iomap/direct-io.c      | 1 -
 include/linux/bio.h       | 2 +-
 include/linux/blk_types.h | 2 +-
 6 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index fd11614bba4d..4ff96a0e4091 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1190,7 +1190,6 @@ void bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter)
 	bio->bi_io_vec = (struct bio_vec *)iter->bvec;
 	bio->bi_iter.bi_bvec_done = iter->iov_offset;
 	bio->bi_iter.bi_size = size;
-	bio_set_flag(bio, BIO_NO_PAGE_REF);
 	bio_set_flag(bio, BIO_CLONED);
 }
 
@@ -1335,6 +1334,7 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 		return 0;
 	}
 
+	bio_set_flag(bio, BIO_PAGE_REFFED);
 	do {
 		ret = __bio_iov_iter_get_pages(bio, iter);
 	} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
diff --git a/block/blk-map.c b/block/blk-map.c
index 9137d16cecdc..c77fdb1fbda7 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -281,6 +281,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 	if (blk_queue_pci_p2pdma(rq->q))
 		extraction_flags |= ITER_ALLOW_P2PDMA;
 
+	bio_set_flag(bio, BIO_PAGE_REFFED);
 	while (iov_iter_count(iter)) {
 		struct page **pages, *stack_pages[UIO_FASTIOV];
 		ssize_t bytes;
diff --git a/fs/direct-io.c b/fs/direct-io.c
index ab0d7ea89813..47b90c68b369 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -403,6 +403,8 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
 		bio->bi_end_io = dio_bio_end_aio;
 	else
 		bio->bi_end_io = dio_bio_end_io;
+	/* for now require references for all pages */
+	bio_set_flag(bio, BIO_PAGE_REFFED);
 	sdio->bio = bio;
 	sdio->logical_offset_in_bio = sdio->cur_page_fs_offset;
 }
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 850fb9870c2f..ceeb0a183cea 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -202,7 +202,6 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
 	bio->bi_private = dio;
 	bio->bi_end_io = iomap_dio_bio_end_io;
 
-	bio_set_flag(bio, BIO_NO_PAGE_REF);
 	__bio_add_page(bio, page, len, 0);
 	iomap_dio_submit_bio(iter, dio, bio, pos);
 }
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d9d6df62ea57..b537d03377f0 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -488,7 +488,7 @@ void zero_fill_bio(struct bio *bio);
 
 static inline void bio_release_pages(struct bio *bio, bool mark_dirty)
 {
-	if (!bio_flagged(bio, BIO_NO_PAGE_REF))
+	if (bio_flagged(bio, BIO_PAGE_REFFED))
 		__bio_release_pages(bio, mark_dirty);
 }
 
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 99be590f952f..7daa261f4f98 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -318,7 +318,7 @@ struct bio {
  * bio flags
  */
 enum {
-	BIO_NO_PAGE_REF,	/* don't put release vec pages */
+	BIO_PAGE_REFFED,	/* put pages in bio_release_pages() */
 	BIO_CLONED,		/* doesn't own data */
 	BIO_BOUNCED,		/* bio is a bounce bio */
 	BIO_QUIET,		/* Make BIO Quiet */



* [PATCH v17 12/14] block: Add BIO_PAGE_PINNED and associated infrastructure
  2023-03-08 16:52 [PATCH v17 00/14] splice, block: Use page pinning and kill ITER_PIPE David Howells
                   ` (10 preceding siblings ...)
  2023-03-08 16:52 ` [PATCH v17 11/14] block: Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED with inverted logic David Howells
@ 2023-03-08 16:52 ` David Howells
  2023-03-08 16:52 ` [PATCH v17 13/14] block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages David Howells
  2023-03-08 16:52 ` [PATCH v17 14/14] block: convert bio_map_user_iov " David Howells
  13 siblings, 0 replies; 33+ messages in thread
From: David Howells @ 2023-03-08 16:52 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Linus Torvalds, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Christoph Hellwig, John Hubbard

Add BIO_PAGE_PINNED to indicate that the pages in a bio are pinned
(FOLL_PIN) and that the pin will need removing.
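
The three cleanup cases (pinned, ref'd or neither) can be modelled in
userspace roughly as follows.  This is a sketch under assumptions: struct
toy_page just counts refs and pins, and the decrements stand in for
unpin_user_page() and put_page(); none of the toy_* names exist in the
kernel.

```c
/* Hypothetical page model: plain counters instead of real refcounts. */
struct toy_page {
	int refs;
	int pins;
};

/* Same relative order as in the patch: PINNED before REFFED. */
enum { TOY_PAGE_PINNED, TOY_PAGE_REFFED };

struct toy_bio {
	unsigned short flags;
};

/* Clean up one page in the way the bio's flags indicate. */
static void toy_release_page(const struct toy_bio *bio, struct toy_page *page)
{
	if (bio->flags & (1U << TOY_PAGE_PINNED))
		page->pins--;		/* stands in for unpin_user_page() */
	else if (bio->flags & (1U << TOY_PAGE_REFFED))
		page->refs--;		/* stands in for put_page() */
	/* Neither flag set: the bio holds nothing on the page. */
}
```

As the two flags aren't expected to be set together, an if-else chain is
enough and the pinned case is checked first.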

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Jan Kara <jack@suse.cz>
cc: Matthew Wilcox <willy@infradead.org>
cc: Logan Gunthorpe <logang@deltatee.com>
cc: linux-block@vger.kernel.org
---

Notes:
    ver #10)
     - Drop bio_set_cleanup_mode(), open coding it instead.
    
    ver #9)
     - Only consider pinning in bio_set_cleanup_mode().  Ref'ing pages in
       struct bio is going away.
     - page_put_unpin() is removed; call unpin_user_page() and put_page()
       directly.
     - Use bio_release_page() in __bio_release_pages().
     - BIO_PAGE_PINNED and BIO_PAGE_REFFED can't both be set, so use if-else
       when testing both of them.
    
    ver #8)
     - Move the infrastructure to clean up pinned pages to this patch [hch].
     - Put BIO_PAGE_PINNED before BIO_PAGE_REFFED as the latter should
       probably be removed at some point.  FOLL_PIN can then be renumbered
       first.

 block/bio.c               |  6 +++---
 block/blk.h               | 12 ++++++++++++
 include/linux/bio.h       |  3 ++-
 include/linux/blk_types.h |  1 +
 4 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 4ff96a0e4091..51ae957cc4b6 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1168,7 +1168,7 @@ void __bio_release_pages(struct bio *bio, bool mark_dirty)
 	bio_for_each_segment_all(bvec, bio, iter_all) {
 		if (mark_dirty && !PageCompound(bvec->bv_page))
 			set_page_dirty_lock(bvec->bv_page);
-		put_page(bvec->bv_page);
+		bio_release_page(bio, bvec->bv_page);
 	}
 }
 EXPORT_SYMBOL_GPL(__bio_release_pages);
@@ -1488,8 +1488,8 @@ void bio_set_pages_dirty(struct bio *bio)
  * the BIO and re-dirty the pages in process context.
  *
  * It is expected that bio_check_pages_dirty() will wholly own the BIO from
- * here on.  It will run one put_page() against each page and will run one
- * bio_put() against the BIO.
+ * here on.  It will unpin each page and will run one bio_put() against the
+ * BIO.
  */
 
 static void bio_dirty_fn(struct work_struct *work);
diff --git a/block/blk.h b/block/blk.h
index cc4e8873dfde..d65d96994a94 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -432,6 +432,18 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio,
 		struct page *page, unsigned int len, unsigned int offset,
 		unsigned int max_sectors, bool *same_page);
 
+/*
+ * Clean up a page appropriately, where the page may be pinned, may have a
+ * ref taken on it or neither.
+ */
+static inline void bio_release_page(struct bio *bio, struct page *page)
+{
+	if (bio_flagged(bio, BIO_PAGE_PINNED))
+		unpin_user_page(page);
+	else if (bio_flagged(bio, BIO_PAGE_REFFED))
+		put_page(page);
+}
+
 struct request_queue *blk_alloc_queue(int node_id);
 
 int disk_scan_partitions(struct gendisk *disk, fmode_t mode);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index b537d03377f0..d8c30c791a9a 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -488,7 +488,8 @@ void zero_fill_bio(struct bio *bio);
 
 static inline void bio_release_pages(struct bio *bio, bool mark_dirty)
 {
-	if (bio_flagged(bio, BIO_PAGE_REFFED))
+	if (bio_flagged(bio, BIO_PAGE_REFFED) ||
+	    bio_flagged(bio, BIO_PAGE_PINNED))
 		__bio_release_pages(bio, mark_dirty);
 }
 
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 7daa261f4f98..a0e339ff3d09 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -318,6 +318,7 @@ struct bio {
  * bio flags
  */
 enum {
+	BIO_PAGE_PINNED,	/* Unpin pages in bio_release_pages() */
 	BIO_PAGE_REFFED,	/* put pages in bio_release_pages() */
 	BIO_CLONED,		/* doesn't own data */
 	BIO_BOUNCED,		/* bio is a bounce bio */



* [PATCH v17 13/14] block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages
  2023-03-08 16:52 [PATCH v17 00/14] splice, block: Use page pinning and kill ITER_PIPE David Howells
                   ` (11 preceding siblings ...)
  2023-03-08 16:52 ` [PATCH v17 12/14] block: Add BIO_PAGE_PINNED and associated infrastructure David Howells
@ 2023-03-08 16:52 ` David Howells
  2023-03-08 16:52 ` [PATCH v17 14/14] block: convert bio_map_user_iov " David Howells
  13 siblings, 0 replies; 33+ messages in thread
From: David Howells @ 2023-03-08 16:52 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Linus Torvalds, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Christoph Hellwig, John Hubbard

Convert bio_iov_iter_get_pages() to use iov_iter_extract_pages().  Depending
on the type of the iterator, this will pin the pages or leave them
unaltered rather than getting a ref on them.

The pages need to be pinned for DIO rather than having refs taken on them to
prevent VM copy-on-write from malfunctioning during a concurrent fork() (the
result of the I/O could otherwise end up being affected by/visible to the
child process).
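
A minimal userspace sketch of the flag decision follows.  It assumes that
only user-backed iterators (a single user buffer or a user iovec) get
pinned and that kernel-backed ones are left alone; the sketch_* types and
helpers are invented for illustration and don't mirror the real struct
iov_iter layout.

```c
#include <stdbool.h>

/* Hypothetical iterator kinds, loosely following the ITER_* types. */
enum sketch_iter_type {
	SKETCH_UBUF,	/* single userspace buffer */
	SKETCH_IOVEC,	/* userspace iovec array */
	SKETCH_KVEC,	/* kernel virtual addresses */
	SKETCH_BVEC,	/* kernel bio_vec array */
	SKETCH_XARRAY,	/* pages held in an xarray */
};

enum { SKETCH_PAGE_PINNED };

struct sketch_iter {
	enum sketch_iter_type type;
};

struct sketch_bio {
	unsigned short flags;
};

/* Assumption: extraction pins pages only for user-backed iterators. */
static bool sketch_extract_will_pin(const struct sketch_iter *iter)
{
	return iter->type == SKETCH_UBUF || iter->type == SKETCH_IOVEC;
}

/* Record up front how the pages will need cleaning up later. */
static void sketch_prepare_bio(struct sketch_bio *bio,
			       const struct sketch_iter *iter)
{
	if (sketch_extract_will_pin(iter))
		bio->flags |= 1U << SKETCH_PAGE_PINNED;
}
```

The flag is set before any page is extracted so that the error paths can
release partially extracted pages in the same way as the completion path.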

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Jan Kara <jack@suse.cz>
cc: Matthew Wilcox <willy@infradead.org>
cc: Logan Gunthorpe <logang@deltatee.com>
cc: linux-block@vger.kernel.org
---

Notes:
    ver #10)
     - Drop bio_set_cleanup_mode(), open coding it instead.
    
    ver #8)
     - Split the patch up a bit [hch].
     - We should only be using pinned/non-pinned pages and not ref'd pages,
       so adjust the comments appropriately.
    
    ver #7)
     - Don't treat BIO_PAGE_REFFED/PINNED as being the same as FOLL_GET/PIN.
    
    ver #5)
     - Transcribe the FOLL_* flags returned by iov_iter_extract_pages() to
       BIO_* flags and got rid of bi_cleanup_mode.
     - Replaced BIO_NO_PAGE_REF with BIO_PAGE_REFFED in the preceding patch.

 block/bio.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 51ae957cc4b6..fc98c1c723ca 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1204,7 +1204,7 @@ static int bio_iov_add_page(struct bio *bio, struct page *page,
 	}
 
 	if (same_page)
-		put_page(page);
+		bio_release_page(bio, page);
 	return 0;
 }
 
@@ -1218,7 +1218,7 @@ static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
 			queue_max_zone_append_sectors(q), &same_page) != len)
 		return -EINVAL;
 	if (same_page)
-		put_page(page);
+		bio_release_page(bio, page);
 	return 0;
 }
 
@@ -1229,10 +1229,10 @@ static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
  * @bio: bio to add pages to
  * @iter: iov iterator describing the region to be mapped
  *
- * Pins pages from *iter and appends them to @bio's bvec array. The
- * pages will have to be released using put_page() when done.
- * For multi-segment *iter, this function only adds pages from the
- * next non-empty segment of the iov iterator.
+ * Extracts pages from *iter and appends them to @bio's bvec array.  The pages
+ * will have to be cleaned up in the way indicated by the BIO_PAGE_PINNED flag.
+ * For a multi-segment *iter, this function only adds pages from the next
+ * non-empty segment of the iov iterator.
  */
 static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 {
@@ -1264,9 +1264,9 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	 * result to ensure the bio's total size is correct. The remainder of
 	 * the iov data will be picked up in the next bio iteration.
 	 */
-	size = iov_iter_get_pages(iter, pages,
-				  UINT_MAX - bio->bi_iter.bi_size,
-				  nr_pages, &offset, extraction_flags);
+	size = iov_iter_extract_pages(iter, &pages,
+				      UINT_MAX - bio->bi_iter.bi_size,
+				      nr_pages, extraction_flags, &offset);
 	if (unlikely(size <= 0))
 		return size ? size : -EFAULT;
 
@@ -1299,7 +1299,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	iov_iter_revert(iter, left);
 out:
 	while (i < nr_pages)
-		put_page(pages[i++]);
+		bio_release_page(bio, pages[i++]);
 
 	return ret;
 }
@@ -1334,7 +1334,8 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 		return 0;
 	}
 
-	bio_set_flag(bio, BIO_PAGE_REFFED);
+	if (iov_iter_extract_will_pin(iter))
+		bio_set_flag(bio, BIO_PAGE_PINNED);
 	do {
 		ret = __bio_iov_iter_get_pages(bio, iter);
 	} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));



* [PATCH v17 14/14] block: convert bio_map_user_iov to use iov_iter_extract_pages
  2023-03-08 16:52 [PATCH v17 00/14] splice, block: Use page pinning and kill ITER_PIPE David Howells
                   ` (12 preceding siblings ...)
  2023-03-08 16:52 ` [PATCH v17 13/14] block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages David Howells
@ 2023-03-08 16:52 ` David Howells
  13 siblings, 0 replies; 33+ messages in thread
From: David Howells @ 2023-03-08 16:52 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Linus Torvalds, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Christoph Hellwig, John Hubbard

Convert bio_map_user_iov() to use iov_iter_extract_pages().  Depending on
the type of the iterator, this will pin the pages or leave them unaltered
rather than getting a ref on them.

The pages need to be pinned for DIO rather than having refs taken on them
to prevent VM copy-on-write from malfunctioning during a concurrent fork()
(the result of the I/O could otherwise end up being visible to/affected by
the child process).
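
The calling convention this conversion leans on (pass a preallocated array,
or pass NULL and let the callee allocate one) can be sketched in userspace
like this.  exdemo_extract() is a hypothetical stand-in for
iov_iter_extract_pages(), ints stand in for page pointers, and calloc() and
free() stand in for the kernel allocators.

```c
#include <stddef.h>
#include <stdlib.h>

#define EXDEMO_FAST 8	/* size of the on-stack array, like UIO_FASTIOV */

/*
 * Hypothetical extractor: fills *pages with maxpages entries, allocating
 * the array first if the caller passed *pages == NULL.  Returns the number
 * of entries filled, or -1 on allocation failure.
 */
static int exdemo_extract(int **pages, int maxpages)
{
	if (!*pages) {
		*pages = calloc((size_t)maxpages, sizeof(**pages));
		if (!*pages)
			return -1;
	}
	for (int i = 0; i < maxpages; i++)
		(*pages)[i] = i;
	return maxpages;
}

/* Caller pattern: use the stack array when it is big enough. */
static int exdemo_map(int nr_vecs)
{
	int stack_pages[EXDEMO_FAST];
	int *pages = stack_pages;
	int n;

	if (nr_vecs > EXDEMO_FAST)
		pages = NULL;	/* too big for the stack: callee allocates */

	n = exdemo_extract(&pages, nr_vecs);

	if (pages != stack_pages)
		free(pages);	/* free(NULL) is a no-op on failure */
	return n;
}
```

A single call then covers both the small and the large case, which is what
allows the if-else around two different extraction functions to go away.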

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Jan Kara <jack@suse.cz>
cc: Matthew Wilcox <willy@infradead.org>
cc: Logan Gunthorpe <logang@deltatee.com>
cc: linux-block@vger.kernel.org
---

Notes:
    ver #10)
     - Drop bio_set_cleanup_mode(), open coding it instead.
    
    ver #8)
     - Split the patch up a bit [hch].
     - We should only be using pinned/non-pinned pages and not ref'd pages,
       so adjust the comments appropriately.
    
    ver #7)
     - Don't treat BIO_PAGE_REFFED/PINNED as being the same as FOLL_GET/PIN.
    
    ver #5)
     - Transcribe the FOLL_* flags returned by iov_iter_extract_pages() to
       BIO_* flags and got rid of bi_cleanup_mode.
     - Replaced BIO_NO_PAGE_REF with BIO_PAGE_REFFED in the preceding patch.

 block/blk-map.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/block/blk-map.c b/block/blk-map.c
index c77fdb1fbda7..7b12f4bb4d4c 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -280,22 +280,21 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 
 	if (blk_queue_pci_p2pdma(rq->q))
 		extraction_flags |= ITER_ALLOW_P2PDMA;
+	if (iov_iter_extract_will_pin(iter))
+		bio_set_flag(bio, BIO_PAGE_PINNED);
 
-	bio_set_flag(bio, BIO_PAGE_REFFED);
 	while (iov_iter_count(iter)) {
-		struct page **pages, *stack_pages[UIO_FASTIOV];
+		struct page *stack_pages[UIO_FASTIOV];
+		struct page **pages = stack_pages;
 		ssize_t bytes;
 		size_t offs;
 		int npages;
 
-		if (nr_vecs <= ARRAY_SIZE(stack_pages)) {
-			pages = stack_pages;
-			bytes = iov_iter_get_pages(iter, pages, LONG_MAX,
-						   nr_vecs, &offs, extraction_flags);
-		} else {
-			bytes = iov_iter_get_pages_alloc(iter, &pages,
-						LONG_MAX, &offs, extraction_flags);
-		}
+		if (nr_vecs > ARRAY_SIZE(stack_pages))
+			pages = NULL;
+
+		bytes = iov_iter_extract_pages(iter, &pages, LONG_MAX,
+					       nr_vecs, extraction_flags, &offs);
 		if (unlikely(bytes <= 0)) {
 			ret = bytes ? bytes : -EFAULT;
 			goto out_unmap;
@@ -317,7 +316,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 				if (!bio_add_hw_page(rq->q, bio, page, n, offs,
 						     max_sectors, &same_page)) {
 					if (same_page)
-						put_page(page);
+						bio_release_page(bio, page);
 					break;
 				}
 
@@ -329,7 +328,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 		 * release the pages we didn't map into the bio, if any
 		 */
 		while (j < npages)
-			put_page(pages[j++]);
+			bio_release_page(bio, pages[j++]);
 		if (pages != stack_pages)
 			kvfree(pages);
 		/* couldn't stuff something into bio? */



* Re: [PATCH v17 09/14] iomap: Don't get an reference on ZERO_PAGE for direct I/O block zeroing
  2023-03-08 16:52 ` [PATCH v17 09/14] iomap: Don't get an reference on ZERO_PAGE for direct I/O block zeroing David Howells
@ 2023-03-08 21:08   ` Dave Chinner
  2023-03-14 17:33   ` Christoph Hellwig
  1 sibling, 0 replies; 33+ messages in thread
From: Dave Chinner @ 2023-03-08 21:08 UTC (permalink / raw)
  To: David Howells
  Cc: Jens Axboe, Al Viro, Christoph Hellwig, Matthew Wilcox, Jan Kara,
	Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Linus Torvalds, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, John Hubbard

On Wed, Mar 08, 2023 at 04:52:46PM +0000, David Howells wrote:
> ZERO_PAGE can't go away, no need to hold an extra reference.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> cc: Al Viro <viro@zeniv.linux.org.uk>
> cc: David Hildenbrand <david@redhat.com>
> cc: linux-fsdevel@vger.kernel.org
> ---
>  fs/iomap/direct-io.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index f771001574d0..850fb9870c2f 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -202,7 +202,7 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
>  	bio->bi_private = dio;
>  	bio->bi_end_io = iomap_dio_bio_end_io;
>  
> -	get_page(page);
> +	bio_set_flag(bio, BIO_NO_PAGE_REF);
>  	__bio_add_page(bio, page, len, 0);
>  	iomap_dio_submit_bio(iter, dio, bio, pos);
>  }

Looks fine.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH v17 03/14] shmem: Implement splice-read
  2023-03-08 16:52 ` [PATCH v17 03/14] shmem: Implement splice-read David Howells
@ 2023-03-08 22:39   ` Linus Torvalds
  2023-03-14 16:42     ` Matthew Wilcox
  2023-03-08 23:42   ` David Howells
  1 sibling, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2023-03-08 22:39 UTC (permalink / raw)
  To: David Howells
  Cc: Jens Axboe, Al Viro, Christoph Hellwig, Matthew Wilcox, Jan Kara,
	Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, linux-fsdevel, linux-block, linux-kernel, linux-mm,
	Daniel Golle, Guenter Roeck, Christoph Hellwig, John Hubbard,
	Hugh Dickins

On Wed, Mar 8, 2023 at 8:53 AM David Howells <dhowells@redhat.com> wrote:
>
> The new filemap_splice_read() has an implicit expectation via
> filemap_get_pages() that ->read_folio() exists if ->readahead() doesn't
> fully populate the pagecache of the file it is reading from[1], potentially
> leading to a jump to NULL if this doesn't exist.  shmem, however, (and by
> extension, tmpfs, ramfs and rootfs), doesn't have ->read_folio(),

This patch is the only one in your series that I went "Ugh, that's
really ugly" for.

Do we really want to basically duplicate all of filemap_splice_read()?

I get the feeling that the zeropage case just isn't so important that
we'd need to duplicate filemap_splice_read() just for that, and I
think that the code should either

 (a) just make a silly "read_folio()" for shmfs that just clears the page.

     Ugly but maybe simple and not horrid?

or

 (b) teach filemap_splice_read() that a NULL 'read_folio' function
means "use the zero page"

     That might not be splice() itself, but maybe in
filemap_get_pages() or something.

or

 (c) go even further, and teach read_folio() in general about file
holes, and allow *any* filesystem to read zeroes that way in general
without creating a folio for it.

In a perfect world, if done well, I think shmem_file_read_iter() should
go away, and it could use generic_file_read_iter() too.

I dunno. Maybe shm really is *so* special that this is the right way
to do things, but I did react quite negatively to this patch. So not a
complete NAK, but definitely a "do we _really_ have to do this?"

                       Linus


* Re: [PATCH v17 03/14] shmem: Implement splice-read
  2023-03-08 16:52 ` [PATCH v17 03/14] shmem: Implement splice-read David Howells
  2023-03-08 22:39   ` Linus Torvalds
@ 2023-03-08 23:42   ` David Howells
  1 sibling, 0 replies; 33+ messages in thread
From: David Howells @ 2023-03-08 23:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: dhowells, Jens Axboe, Al Viro, Christoph Hellwig, Matthew Wilcox,
	Jan Kara, Jeff Layton, David Hildenbrand, Jason Gunthorpe,
	Logan Gunthorpe, Hillf Danton, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Daniel Golle, Guenter Roeck,
	Christoph Hellwig, John Hubbard, Hugh Dickins

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> I get the feeling that the zeropage case just isn't so important that we'd
> need to duplicate filemap_splice_read() just for that, and I think that the
> code should either
> 
>  (a) just make a silly "read_folio()" for shmfs that just clears the page.
> 
>      Ugly but maybe simple and not horrid?

The problem with that is that once a page is in the pagecache attached to a
shmem file, we can't get rid of it without deleting or truncating the file...
At least, I think that's the case.  For all other regular filesystems, a page
full of zeros can be flushed/discarded by the LRU.

shmem also has its own function for allocating folios in its pagecache, the
caller of ->read_folio() would probably have to use that.

>  (b) teach filemap_splice_read() that a NULL 'read_folio' function
> means "use the zero page"
> 
>      That might not be splice() itself, but maybe in
> filemap_get_pages() or something.

It would require some special handling in filemap_get_pages() or, probably
better, in filemap_splice_read() since, for shmem, it's only relevant to
splice.  An additional flag could be added to filemap_get_pages() to tell it
to stop at a hole in the pagecache rather than invoking readahead.
filemap_splice_read() would then need to examine the pagecache to work out how
big the hole is and insert the appropriate number of zeropages before calling
back into filemap_get_pages() again.  Possibly it could use SEEK_DATA.
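
As a rough illustration of that loop, here is a userspace toy model only.
The pagecache is reduced to an array of booleans, and struct splice_counts,
find_hole_end() and splice_model() are made-up names for this sketch, not
kernel API.  It shows the shape of "take present pages, measure the hole,
cover it with zero pages, then carry on":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: present[i] says whether page i is in the "pagecache". */
struct splice_counts {
	size_t data_pages;	/* pages taken from the pagecache */
	size_t zero_pages;	/* zero pages inserted to cover holes */
};

/* Hypothetical helper: first index >= start that holds data, or npages. */
static size_t find_hole_end(const bool *present, size_t npages, size_t start)
{
	assert(present != NULL);
	while (start < npages && !present[start])
		start++;
	return start;
}

/*
 * Walk [0, npages): take each present page as-is, and cover each hole
 * with zero pages instead of populating the cache.
 */
static struct splice_counts splice_model(const bool *present, size_t npages)
{
	struct splice_counts c = { 0, 0 };
	size_t i = 0;

	while (i < npages) {
		if (present[i]) {
			c.data_pages++;
			i++;
		} else {
			/* Work out how big the hole is, then skip over it. */
			size_t end = find_hole_end(present, npages, i);

			c.zero_pages += end - i;
			i = end;
		}
	}
	return c;
}
```

In the real thing the hole measurement would come from examining the
pagecache (or something like SEEK_DATA), not from a flat array.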

> or
> 
>  (c) go even further, and teach read_folio() in general about file
> holes, and allow *any* filesystem to read zeroes that way in general
> without creating a folio for it.

Nice idea, but we'd need a way to store a "negative" marker (as opposed to
"unknown") in the pagecache for the filemap code to be able to use it.  This
sort of thing might become easier if xarray gets switched to a maple tree
implementation as that would better allow for caching of a known file hole of
arbitrary size with a single entry.

But for the moment, the filemap code would have to jump through a filesystem's
->readahead or ->read_folio vectors to work out if there's a hole there or not
- but in both cases it must already have allocated the pages it wants to
query.

David


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v17 05/14] coda: Implement splice-read
  2023-03-08 16:52 ` [PATCH v17 05/14] coda: " David Howells
@ 2023-03-13 13:28   ` Jan Harkes
  0 siblings, 0 replies; 33+ messages in thread
From: Jan Harkes @ 2023-03-13 13:28 UTC (permalink / raw)
  To: David Howells
  Cc: Jens Axboe, Al Viro, Christoph Hellwig, Matthew Wilcox, Jan Kara,
	Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Linus Torvalds, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Christoph Hellwig, John Hubbard,
	linux-unionfs

That actually looks better than the original code because this brings
in the experimental read intent hinting which allows userspace to
mediate access to partially cached files.

Jan


On Wed, Mar 08, 2023 at 11:53:19AM -0500, David Howells wrote:
> Implement splice-read for coda by passing the request down a layer rather
> than going through generic_file_splice_read() which is going to be changed
> to assume that ->read_folio() is present on buffered files.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jan Harkes <jaharkes@cs.cmu.edu>

> cc: Jan Harkes <jaharkes@cs.cmu.edu>
> cc: Christoph Hellwig <hch@lst.de>
> cc: Jens Axboe <axboe@kernel.dk>
> cc: Al Viro <viro@zeniv.linux.org.uk>
> cc: John Hubbard <jhubbard@nvidia.com>
> cc: David Hildenbrand <david@redhat.com>
> cc: Matthew Wilcox <willy@infradead.org>
> cc: coda@cs.cmu.edu
> cc: codalist@coda.cs.cmu.edu
> cc: linux-unionfs@vger.kernel.org
> cc: linux-block@vger.kernel.org
> cc: linux-fsdevel@vger.kernel.org
> cc: linux-mm@kvack.org
> ---
> 
> Notes:
>     ver #17)
>      - Use vfs_splice_read() helper rather than open-coding checks.
> 
>  fs/coda/file.c | 29 ++++++++++++++++++++++++++++-
>  1 file changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/coda/file.c b/fs/coda/file.c
> index 3f3c81e6b1ab..12b26bd13564 100644
> --- a/fs/coda/file.c
> +++ b/fs/coda/file.c
> @@ -23,6 +23,7 @@
>  #include <linux/slab.h>
>  #include <linux/uaccess.h>
>  #include <linux/uio.h>
> +#include <linux/splice.h>
>  
>  #include <linux/coda.h>
>  #include "coda_psdev.h"
> @@ -94,6 +95,32 @@ coda_file_write_iter(struct kiocb *iocb, struct iov_iter *to)
>  	return ret;
>  }
>  
> +static ssize_t
> +coda_file_splice_read(struct file *coda_file, loff_t *ppos,
> +		      struct pipe_inode_info *pipe,
> +		      size_t len, unsigned int flags)
> +{
> +	struct inode *coda_inode = file_inode(coda_file);
> +	struct coda_file_info *cfi = coda_ftoc(coda_file);
> +	struct file *in = cfi->cfi_container;
> +	loff_t ki_pos = *ppos;
> +	ssize_t ret;
> +
> +	ret = venus_access_intent(coda_inode->i_sb, coda_i2f(coda_inode),
> +				  &cfi->cfi_access_intent,
> +				  len, ki_pos, CODA_ACCESS_TYPE_READ);
> +	if (ret)
> +		goto finish_read;
> +
> +	ret = vfs_splice_read(in, ppos, pipe, len, flags);
> +
> +finish_read:
> +	venus_access_intent(coda_inode->i_sb, coda_i2f(coda_inode),
> +			    &cfi->cfi_access_intent,
> +			    len, ki_pos, CODA_ACCESS_TYPE_READ_FINISH);
> +	return ret;
> +}
> +
>  static void
>  coda_vm_open(struct vm_area_struct *vma)
>  {
> @@ -302,5 +329,5 @@ const struct file_operations coda_file_operations = {
>  	.open		= coda_open,
>  	.release	= coda_release,
>  	.fsync		= coda_fsync,
> -	.splice_read	= generic_file_splice_read,
> +	.splice_read	= coda_file_splice_read,
>  };
> 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v17 03/14] shmem: Implement splice-read
  2023-03-08 22:39   ` Linus Torvalds
@ 2023-03-14 16:42     ` Matthew Wilcox
  2023-03-14 18:02       ` Linus Torvalds
  2023-03-14 18:26       ` David Howells
  0 siblings, 2 replies; 33+ messages in thread
From: Matthew Wilcox @ 2023-03-14 16:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Howells, Jens Axboe, Al Viro, Christoph Hellwig, Jan Kara,
	Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, linux-fsdevel, linux-block, linux-kernel, linux-mm,
	Daniel Golle, Guenter Roeck, Christoph Hellwig, John Hubbard,
	Hugh Dickins

On Wed, Mar 08, 2023 at 02:39:00PM -0800, Linus Torvalds wrote:
> On Wed, Mar 8, 2023 at 8:53 AM David Howells <dhowells@redhat.com> wrote:
> >
> > The new filemap_splice_read() has an implicit expectation via
> > filemap_get_pages() that ->read_folio() exists if ->readahead() doesn't
> > fully populate the pagecache of the file it is reading from[1], potentially
> > leading to a jump to NULL if this doesn't exist.  shmem, however, (and by
> > extension, tmpfs, ramfs and rootfs), doesn't have ->read_folio(),
> 
> This patch is the only one in your series that I went "Ugh, that's
> really ugly" for.
> 
> Do we really want to basically duplicate all of filemap_splice_read()?
> 
> I get the feeling that the zeropage case just isn't so important that
> we'd need to duplicate filemap_splice_read() just for that, and I
> think that the code should either
> 
>  (a) just make a silly "read_folio()" for shmfs that just clears the page.
> 
>      Ugly but maybe simple and not horrid?

The problem is that we might have swapped out the shmem folio.  So we
don't want to clear the page, but ask swap to fill the page.  The way
that currently works (see shmem_get_folio_gfp()) is to fetch the swap
entry from the page cache, allocate a new folio inside the shmem code,
then replace the swap entry with the new folio.

What I'd like to see is the generic code say "Ah, this is a shmem
inode, so it's special and the xa_value entry is swap information,
not workingset information, so I'll allocate the folio and restore
the folio->private swap information to let the shmem_read_folio
function do its job correctly".

Either that or we completely overhaul the shmem code to store the
location of its swapped data somewhere that's not the page cache.

>  (b) teach filemap_splice_read() that a NULL 'read_folio' function
> means "use the zero page"

Same problem as (a).

>  (c) go even further, and teach read_folio() in general about file
> holes, and allow *any* filesystem to read zeroes that way in general
> without creating a folio for it.

I've had thoughts along those lines in the past.  It's pretty major
surgery, I think.  At the moment, we allocate the pages and add them
to the page cache in a locked state before asking the filesystem to
populate them.  So the fs doesn't even have the file layout (eg the
get_block or iomap info) that would tell it where the holes are until
the page has already been allocated and inserted.  We could of course
free the page and replace it with a special 'THIS_IS_A_HOLE' entry.
It's just never seemed important enough to me to do this surgery.

> in a perfect world, if done well I think shmem_file_read_iter() should
> go away, and it could use generic_file_read_iter too.
> 
> I dunno. Maybe shm really is *so* special that this is the right way
> to do things, but I did react quite negatively to this patch. So not a
> complete NAK, but definitely a "do we _really_ have to do this?"

I'd really like to see shmem have a read_folio implementation.  I
don't know how much work it's going to be.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v17 01/14] splice: Clean up direct_splice_read() a bit
  2023-03-08 16:52 ` [PATCH v17 01/14] splice: Clean up direct_splice_read() a bit David Howells
@ 2023-03-14 17:30   ` Christoph Hellwig
  0 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2023-03-14 17:30 UTC (permalink / raw)
  To: David Howells
  Cc: Jens Axboe, Al Viro, Christoph Hellwig, Matthew Wilcox, Jan Kara,
	Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Linus Torvalds, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Christoph Hellwig, John Hubbard

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v17 02/14] splice: Make do_splice_to() generic and export it
  2023-03-08 16:52 ` [PATCH v17 02/14] splice: Make do_splice_to() generic and export it David Howells
@ 2023-03-14 17:31   ` Christoph Hellwig
  2023-03-14 21:15   ` David Howells
  2023-03-15 16:34   ` [RFC PATCH] splice: Convert longs and some ints into ssize_t David Howells
  2 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2023-03-14 17:31 UTC (permalink / raw)
  To: David Howells
  Cc: Jens Axboe, Al Viro, Christoph Hellwig, Matthew Wilcox, Jan Kara,
	Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Linus Torvalds, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Miklos Szeredi, Christoph Hellwig,
	John Hubbard, linux-unionfs

> -static long do_splice_to(struct file *in, loff_t *ppos,
> -			 struct pipe_inode_info *pipe, size_t len,
> -			 unsigned int flags)
> +long vfs_splice_read(struct file *in, loff_t *ppos,

The (pre-existing) long here is odd given that ->splice_read
returns a ssize_t.  This might be a good time to fix that up.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v17 07/14] splice: Do splice read from a file without using ITER_PIPE
  2023-03-08 16:52 ` [PATCH v17 07/14] splice: Do splice read from a file without using ITER_PIPE David Howells
@ 2023-03-14 17:32   ` Christoph Hellwig
  2023-03-14 21:52   ` David Howells
  1 sibling, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2023-03-14 17:32 UTC (permalink / raw)
  To: David Howells
  Cc: Jens Axboe, Al Viro, Christoph Hellwig, Matthew Wilcox, Jan Kara,
	Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Linus Torvalds, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Steve French, Christoph Hellwig,
	John Hubbard, linux-cifs

On Wed, Mar 08, 2023 at 04:52:44PM +0000, David Howells wrote:
> Make generic_file_splice_read() use filemap_splice_read() and
> direct_splice_read() rather than using an ITER_PIPE and call_read_iter().
> 
> Make cifs use generic_file_splice_read() rather than doing it for itself.

Please split the cifs patch out into a separate one.

Otherwise this looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v17 09/14] iomap: Don't get an reference on ZERO_PAGE for direct I/O block zeroing
  2023-03-08 16:52 ` [PATCH v17 09/14] iomap: Don't get an reference on ZERO_PAGE for direct I/O block zeroing David Howells
  2023-03-08 21:08   ` Dave Chinner
@ 2023-03-14 17:33   ` Christoph Hellwig
  1 sibling, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2023-03-14 17:33 UTC (permalink / raw)
  To: David Howells
  Cc: Jens Axboe, Al Viro, Christoph Hellwig, Matthew Wilcox, Jan Kara,
	Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Linus Torvalds, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, John Hubbard

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v17 03/14] shmem: Implement splice-read
  2023-03-14 16:42     ` Matthew Wilcox
@ 2023-03-14 18:02       ` Linus Torvalds
  2023-03-14 20:08         ` Matthew Wilcox
  2023-03-14 18:26       ` David Howells
  1 sibling, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2023-03-14 18:02 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: David Howells, Jens Axboe, Al Viro, Christoph Hellwig, Jan Kara,
	Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, linux-fsdevel, linux-block, linux-kernel, linux-mm,
	Daniel Golle, Guenter Roeck, Christoph Hellwig, John Hubbard,
	Hugh Dickins

On Tue, Mar 14, 2023 at 9:43 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> The problem is that we might have swapped out the shmem folio.  So we
> don't want to clear the page, but ask swap to fill the page.

Doesn't shmem_swapin_folio() already basically do all that work?

The real oddity with shmem - compared to other filesystems - is that
the xarray has a value entry instead of being a real folio. And yes,
the current filemap code will then just ignore such entries as
"doesn't exist", and so the regular read iterators will all fail on
it.
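
For anyone following along, a small userspace sketch of what "value entry"
means here.  The tagging (integer shifted left by one, bit 0 set) mirrors
the kernel's xa_mk_value()/xa_is_value()/xa_to_value(); the function names
and the lookup_folio() helper below are stand-ins invented for this sketch,
and the slot array is only a toy substitute for the xarray:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for xa_mk_value(): integer shifted left by one, bit 0 set. */
static void *mk_value(unsigned long v)
{
	return (void *)(uintptr_t)((v << 1) | 1);
}

/* Stand-in for xa_is_value(): real pointers are aligned, so bit 0 is 0. */
static bool is_value(const void *entry)
{
	return ((uintptr_t)entry & 1) != 0;
}

/* Stand-in for xa_to_value(): recover the integer from a value entry. */
static unsigned long to_value(const void *entry)
{
	return (unsigned long)((uintptr_t)entry >> 1);
}

/* Toy lookup: like the read path, treat a value entry as "no folio here". */
static void *lookup_folio(void *const *slots, size_t nslots, size_t index)
{
	assert(index < nslots);
	if (slots[index] == NULL || is_value(slots[index]))
		return NULL;
	return slots[index];
}
```

For shmem the integer packed into the entry is the swap entry, which is
why simply ignoring it loses information the read path would need.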

But while filemap_get_read_batch() will stop at a value-folio, I feel
like filemap_create_folio() should be able to turn a value page into a
"real" page. Right now it already allocates said page, but then I
think filemap_add_folio() will return -EEXIST when said entry exists
as a value.

But *if* instead of -EEXIST we could just replace the value with the
(already locked) page, and have some sane way to pass that value
(which is the swap entry data) to readpage(), I think that should just
do it all.

Admittedly I really don't know this area very well, so I may be
*entirely* out to lunch.

But the whole "teach the filemap code to actually react to XA value
entries" would be how I'd solve the hole issue too. So I think there
are commonalities here.

             Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v17 03/14] shmem: Implement splice-read
  2023-03-14 16:42     ` Matthew Wilcox
  2023-03-14 18:02       ` Linus Torvalds
@ 2023-03-14 18:26       ` David Howells
  2023-03-14 19:07         ` Linus Torvalds
  2023-03-14 21:50         ` David Howells
  1 sibling, 2 replies; 33+ messages in thread
From: David Howells @ 2023-03-14 18:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: dhowells, Matthew Wilcox, Jens Axboe, Al Viro, Christoph Hellwig,
	Jan Kara, Jeff Layton, David Hildenbrand, Jason Gunthorpe,
	Logan Gunthorpe, Hillf Danton, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Daniel Golle, Guenter Roeck,
	Christoph Hellwig, John Hubbard, Hugh Dickins

Hi Linus,

Are you okay if we go with my current patch for the moment?

David


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v17 03/14] shmem: Implement splice-read
  2023-03-14 18:26       ` David Howells
@ 2023-03-14 19:07         ` Linus Torvalds
  2023-03-14 19:09           ` Linus Torvalds
  2023-03-14 21:50         ` David Howells
  1 sibling, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2023-03-14 19:07 UTC (permalink / raw)
  To: David Howells
  Cc: Matthew Wilcox, Jens Axboe, Al Viro, Christoph Hellwig, Jan Kara,
	Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, linux-fsdevel, linux-block, linux-kernel, linux-mm,
	Daniel Golle, Guenter Roeck, Christoph Hellwig, John Hubbard,
	Hugh Dickins

On Tue, Mar 14, 2023 at 11:26 AM David Howells <dhowells@redhat.com> wrote:
>
> Are you okay if we go with my current patch for the moment?

I guess.

But please at least stop doing the

     get_page(buf->page);

on the zero-page (which includes using no-op .get and .put functions
in zero_pipe_buf_ops).

Maybe we can do /dev/null some day and actually have a common case for those.

             Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v17 03/14] shmem: Implement splice-read
  2023-03-14 19:07         ` Linus Torvalds
@ 2023-03-14 19:09           ` Linus Torvalds
  0 siblings, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2023-03-14 19:09 UTC (permalink / raw)
  To: David Howells
  Cc: Matthew Wilcox, Jens Axboe, Al Viro, Christoph Hellwig, Jan Kara,
	Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, linux-fsdevel, linux-block, linux-kernel, linux-mm,
	Daniel Golle, Guenter Roeck, Christoph Hellwig, John Hubbard,
	Hugh Dickins

On Tue, Mar 14, 2023 at 12:07 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Maybe we can do /dev/null some day and actually have a common case for those.

/dev/zero, I mean. We already do splice to /dev/null (and splicing
from /dev/null isn't interesting ;)

            Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v17 03/14] shmem: Implement splice-read
  2023-03-14 18:02       ` Linus Torvalds
@ 2023-03-14 20:08         ` Matthew Wilcox
  0 siblings, 0 replies; 33+ messages in thread
From: Matthew Wilcox @ 2023-03-14 20:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Howells, Jens Axboe, Al Viro, Christoph Hellwig, Jan Kara,
	Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, linux-fsdevel, linux-block, linux-kernel, linux-mm,
	Daniel Golle, Guenter Roeck, Christoph Hellwig, John Hubbard,
	Hugh Dickins

On Tue, Mar 14, 2023 at 11:02:40AM -0700, Linus Torvalds wrote:
> On Tue, Mar 14, 2023 at 9:43 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > The problem is that we might have swapped out the shmem folio.  So we
> > don't want to clear the page, but ask swap to fill the page.
> 
> Doesn't shmem_swapin_folio() already basically do all that work?
> 
> The real oddity with shmem - compared to other filesystems - is that
> the xarray has a value entry instead of being a real folio. And yes,
> the current filemap code will then just ignore such entries as
> "doesn't exist", and so the regular read iterators will all fail on
> it.
> 
> But while filemap_get_read_batch() will stop at a value-folio, I feel
> like filemap_create_folio() should be able to turn a value page into a
> "real" page. Right now it already allocates said page, but then I
> think filemap_add_folio() will return -EEXIST when said entry exists
> as a value.
> 
> But *if* instead of -EEXIST we could just replace the value with the
> (already locked) page, and have some sane way to pass that value
> (which is the swap entry data) to readpage(), I think that should just
> do it all.

This was basically what I had in mind:

I don't think this will handle a case like:

Alloc order-0 folio at index 4
Alloc order-0 folio at index 7
Swap out both folios
Alloc order-9 folio at indices 0-511

But I don't see where shmem currently handles that either.  Maybe it
falls back to order-0 folios instead of the crude BUG_ON I put in.
Hugh?
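
To make the problem case concrete, a userspace toy model (not kernel code;
slot_is_value() and count_value_conflicts() are names made up for this
sketch, and a flat array stands in for the xarray).  It counts how many
value entries a folio of a given order would overlap.  With swap entries
at indices 4 and 7, an order-9 folio at 0-511 overlaps two of them, and a
single folio can only carry one swap entry:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: a slot is NULL (empty) or has bit 0 set (value/swap entry). */
static int slot_is_value(const void *entry)
{
	return ((uintptr_t)entry & 1) != 0;
}

/*
 * Count the value entries that an order-'order' folio inserted at 'index'
 * would overlap.  The folio covers the naturally aligned range of
 * 1 << order slots that contains 'index'.
 */
static size_t count_value_conflicts(void *const *slots, size_t nslots,
				    size_t index, unsigned int order)
{
	size_t nr = (size_t)1 << order;
	size_t start = index & ~(nr - 1);
	size_t i, conflicts = 0;

	assert(start + nr <= nslots);
	for (i = start; i < start + nr; i++)
		if (slot_is_value(slots[i]))
			conflicts++;
	return conflicts;
}
```

Anything above one conflict is the situation the BUG_ON in the patch below
is there to catch.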

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 82c1262f396f..30f2502314de 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -114,12 +114,6 @@ int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
 struct folio *shmem_read_folio_gfp(struct address_space *mapping,
 		pgoff_t index, gfp_t gfp);
 
-static inline struct folio *shmem_read_folio(struct address_space *mapping,
-		pgoff_t index)
-{
-	return shmem_read_folio_gfp(mapping, index, mapping_gfp_mask(mapping));
-}
-
 static inline struct page *shmem_read_mapping_page(
 				struct address_space *mapping, pgoff_t index)
 {
diff --git a/mm/filemap.c b/mm/filemap.c
index 57c1b154fb5a..8e4f95c5b65a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -877,6 +877,8 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 					order, gfp);
 		xas_lock_irq(&xas);
 		xas_for_each_conflict(&xas, entry) {
+			if (old)
+				BUG_ON(shmem_mapping(mapping));
 			old = entry;
 			if (!xa_is_value(entry)) {
 				xas_set_err(&xas, -EEXIST);
@@ -885,7 +887,12 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 		}
 
 		if (old) {
-			if (shadowp)
+			if (shmem_mapping(mapping)) {
+				folio_set_swap_entry(folio,
+						radix_to_swp_entry(old));
+				folio_set_swapcache(folio);
+				folio_set_swapbacked(folio);
+			} else if (shadowp)
 				*shadowp = old;
 			/* entry may have been split before we acquired lock */
 			order = xa_get_order(xas.xa, xas.xa_index);
diff --git a/mm/shmem.c b/mm/shmem.c
index 8e60826e4246..ea75c7dcf5ec 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2059,6 +2059,18 @@ int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
 			mapping_gfp_mask(inode->i_mapping), NULL, NULL, NULL);
 }
 
+static int shmem_read_folio(struct file *file, struct folio *folio)
+{
+	if (folio_test_swapcache(folio)) {
+		swap_readpage(&folio->page, true, NULL);
+	} else {
+		folio_zero_segment(folio, 0, folio_size(folio));
+		folio_mark_uptodate(folio);
+		folio_unlock(folio);
+	}
+	return 0;
+}
+
 /*
  * This is like autoremove_wake_function, but it removes the wait queue
  * entry unconditionally - even if something else had already woken the
@@ -2396,7 +2408,8 @@ static int shmem_fadvise_willneed(struct address_space *mapping,
 	xa_for_each_range(&mapping->i_pages, index, folio, start, end) {
 		if (!xa_is_value(folio))
 			continue;
-		folio = shmem_read_folio(mapping, index);
+		folio = shmem_read_folio_gfp(mapping, index,
+						mapping_gfp_mask(mapping));
 		if (!IS_ERR(folio))
 			folio_put(folio);
 	}
@@ -4037,6 +4050,7 @@ static int shmem_error_remove_page(struct address_space *mapping,
 }
 
 const struct address_space_operations shmem_aops = {
+	.read_folio	= shmem_read_folio,
 	.writepage	= shmem_writepage,
 	.dirty_folio	= noop_dirty_folio,
 #ifdef CONFIG_TMPFS

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v17 02/14] splice: Make do_splice_to() generic and export it
  2023-03-08 16:52 ` [PATCH v17 02/14] splice: Make do_splice_to() generic and export it David Howells
  2023-03-14 17:31   ` Christoph Hellwig
@ 2023-03-14 21:15   ` David Howells
  2023-03-15 16:34   ` [RFC PATCH] splice: Convert longs and some ints into ssize_t David Howells
  2 siblings, 0 replies; 33+ messages in thread
From: David Howells @ 2023-03-14 21:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dhowells, Jens Axboe, Al Viro, Matthew Wilcox, Jan Kara,
	Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Linus Torvalds, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Miklos Szeredi, Christoph Hellwig,
	John Hubbard, linux-unionfs

Christoph Hellwig <hch@infradead.org> wrote:

> > -static long do_splice_to(struct file *in, loff_t *ppos,
> > -			 struct pipe_inode_info *pipe, size_t len,
> > -			 unsigned int flags)
> > +long vfs_splice_read(struct file *in, loff_t *ppos,
> 
> The (pre-existing) long here is odd given that ->splice_read
> returns a ssize_t.  This might be a good time to fix that up.

There's a whole lot of places in splice.c that use long rather than ssize_t.
I can produce a separate patch that changes them all, but that should perhaps
be separate to this.

David


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v17 03/14] shmem: Implement splice-read
  2023-03-14 18:26       ` David Howells
  2023-03-14 19:07         ` Linus Torvalds
@ 2023-03-14 21:50         ` David Howells
  1 sibling, 0 replies; 33+ messages in thread
From: David Howells @ 2023-03-14 21:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: dhowells, Matthew Wilcox, Jens Axboe, Al Viro, Christoph Hellwig,
	Jan Kara, Jeff Layton, David Hildenbrand, Jason Gunthorpe,
	Logan Gunthorpe, Hillf Danton, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Daniel Golle, Guenter Roeck,
	Christoph Hellwig, John Hubbard, Hugh Dickins

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> But please at least stop doing the
> 
>      get_page(buf->page);
> 
> on the zero-page (which includes using no-op .get and .put functions
> in zero_pipe_buf_ops).

I'll make the attached change.  It seems to work.

David
---
diff --git a/mm/shmem.c b/mm/shmem.c
index 3cbec1d56112..d9b60ab556fe 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2719,6 +2719,17 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	return retval ? retval : error;
 }
 
+static bool zero_pipe_buf_get(struct pipe_inode_info *pipe,
+			      struct pipe_buffer *buf)
+{
+	return true;
+}
+
+static void zero_pipe_buf_release(struct pipe_inode_info *pipe,
+				  struct pipe_buffer *buf)
+{
+}
+
 static bool zero_pipe_buf_try_steal(struct pipe_inode_info *pipe,
 				    struct pipe_buffer *buf)
 {
@@ -2726,9 +2737,9 @@ static bool zero_pipe_buf_try_steal(struct pipe_inode_info *pipe,
 }
 
 static const struct pipe_buf_operations zero_pipe_buf_ops = {
-	.release	= generic_pipe_buf_release,
+	.release	= zero_pipe_buf_release,
 	.try_steal	= zero_pipe_buf_try_steal,
-	.get		= generic_pipe_buf_get,
+	.get		= zero_pipe_buf_get,
 };
 
 static size_t splice_zeropage_into_pipe(struct pipe_inode_info *pipe,


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v17 07/14] splice: Do splice read from a file without using ITER_PIPE
  2023-03-08 16:52 ` [PATCH v17 07/14] splice: Do splice read from a file without using ITER_PIPE David Howells
  2023-03-14 17:32   ` Christoph Hellwig
@ 2023-03-14 21:52   ` David Howells
  1 sibling, 0 replies; 33+ messages in thread
From: David Howells @ 2023-03-14 21:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dhowells, Jens Axboe, Al Viro, Matthew Wilcox, Jan Kara,
	Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Linus Torvalds, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Steve French, Christoph Hellwig,
	John Hubbard, linux-cifs

Christoph Hellwig <hch@infradead.org> wrote:

> > Make cifs use generic_file_splice_read() rather than doing it for itself.
> 
> Please split the cifs patch out into a separate one.

Okay - I've done that.  It needs to go after this one and the unexport of
filemap_splice_read() needs to go with it (or in a separate follow-up patch).

David


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC PATCH] splice: Convert longs and some ints into ssize_t
  2023-03-08 16:52 ` [PATCH v17 02/14] splice: Make do_splice_to() generic and export it David Howells
  2023-03-14 17:31   ` Christoph Hellwig
  2023-03-14 21:15   ` David Howells
@ 2023-03-15 16:34   ` David Howells
  2 siblings, 0 replies; 33+ messages in thread
From: David Howells @ 2023-03-15 16:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dhowells, Jens Axboe, Al Viro, Matthew Wilcox, Jan Kara,
	Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Linus Torvalds, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Miklos Szeredi, Christoph Hellwig,
	John Hubbard, linux-unionfs

Christoph Hellwig <hch@infradead.org> wrote:

> The (pre-existing) long here is odd given that ->splice_read
> returns a ssize_t.  This might be a good time to fix that up.

Here's a patch to do that.  I'm not sure yet that I've got all the places that
need changing as there are a couple of function pointer-taking functions where
the pointed-to function return value should be changed.

There are a couple of potential bugs fixed here too, where something takes a
size_t length, but counts the data spliced in an int.  iter_to_pipe() for
example.
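
A standalone userspace sketch of that class of bug, under stated
assumptions: int32_t stands in for a 32-bit int and uint64_t for a 64-bit
size_t, and total_narrow()/total_wide() are names invented here, not
anything in fs/splice.c.  Three 1GiB segments sum to more than INT_MAX, so
the narrow total no longer matches the amount actually handled:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Buggy pattern: the result is narrower than the lengths being added up.
 * The out-of-range conversion is implementation-defined; gcc and clang
 * wrap it modulo 2^32.
 */
static int32_t total_narrow(const uint64_t *lens, size_t n)
{
	uint64_t sum = 0;
	size_t i;

	assert(lens != NULL);
	for (i = 0; i < n; i++)
		sum += lens[i];
	return (int32_t)sum;
}

/* Fixed pattern: a 64-bit signed total, as ssize_t is on 64-bit Linux. */
static int64_t total_wide(const uint64_t *lens, size_t n)
{
	uint64_t sum = 0;
	size_t i;

	assert(lens != NULL);
	for (i = 0; i < n; i++)
		sum += lens[i];
	return (int64_t)sum;
}
```

With gcc or clang the narrow total comes out negative for this input,
which a caller would misread as an error code.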

David
---
splice: Convert longs and some ints into ssize_t

Convert 'long' and some 'int' into ssize_t in the code involved in splice.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: John Hubbard <jhubbard@nvidia.com>
cc: David Hildenbrand <david@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/ZBCvdKQskS46qyV3@infradead.org/
---
 drivers/char/mem.c            |    4 -
 drivers/char/virtio_console.c |    7 +-
 fs/splice.c                   |  119 +++++++++++++++++++++---------------------
 include/linux/splice.h        |   24 ++++----
 4 files changed, 78 insertions(+), 76 deletions(-)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index ffb101d349f0..230b72e12c54 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -468,8 +468,8 @@ static ssize_t write_iter_null(struct kiocb *iocb, struct iov_iter *from)
 	return count;
 }
 
-static int pipe_to_null(struct pipe_inode_info *info, struct pipe_buffer *buf,
-			struct splice_desc *sd)
+static ssize_t pipe_to_null(struct pipe_inode_info *info, struct pipe_buffer *buf,
+			    struct splice_desc *sd)
 {
 	return sd->len;
 }
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index d5ac4d955bc8..d38bee859d5c 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -854,11 +854,12 @@ struct sg_list {
 	struct scatterlist *sg;
 };
 
-static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
-			struct splice_desc *sd)
+static ssize_t pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
+			  struct splice_desc *sd)
 {
 	struct sg_list *sgl = sd->u.data;
-	unsigned int offset, len;
+	ssize_t len;
+	size_t offset;
 
 	if (sgl->n == sgl->size)
 		return 0;
diff --git a/fs/splice.c b/fs/splice.c
index f46dd1fb367b..2bfa94d21346 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -186,7 +186,8 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
 	unsigned int tail = pipe->tail;
 	unsigned int head = pipe->head;
 	unsigned int mask = pipe->ring_size - 1;
-	int ret = 0, page_nr = 0;
+	ssize_t ret = 0;
+	int page_nr = 0;
 
 	if (!spd_pages)
 		return 0;
@@ -232,7 +233,7 @@ ssize_t add_to_pipe(struct pipe_inode_info *pipe, struct pipe_buffer *buf)
 	unsigned int head = pipe->head;
 	unsigned int tail = pipe->tail;
 	unsigned int mask = pipe->ring_size - 1;
-	int ret;
+	ssize_t ret;
 
 	if (unlikely(!pipe->readers)) {
 		send_sig(SIGPIPE, current, 0);
@@ -414,8 +415,8 @@ EXPORT_SYMBOL(nosteal_pipe_buf_ops);
  * Send 'sd->len' bytes to socket from 'sd->file' at position 'sd->pos'
  * using sendpage(). Return the number of bytes sent.
  */
-static int pipe_to_sendpage(struct pipe_inode_info *pipe,
-			    struct pipe_buffer *buf, struct splice_desc *sd)
+static ssize_t pipe_to_sendpage(struct pipe_inode_info *pipe,
+				struct pipe_buffer *buf, struct splice_desc *sd)
 {
 	struct file *file = sd->u.file;
 	loff_t pos = sd->pos;
@@ -468,7 +469,7 @@ static int splice_from_pipe_feed(struct pipe_inode_info *pipe, struct splice_des
 	unsigned int head = pipe->head;
 	unsigned int tail = pipe->tail;
 	unsigned int mask = pipe->ring_size - 1;
-	int ret;
+	ssize_t ret;
 
 	while (!pipe_empty(head, tail)) {
 		struct pipe_buffer *buf = &pipe->bufs[tail & mask];
@@ -621,7 +622,7 @@ static void splice_from_pipe_end(struct pipe_inode_info *pipe, struct splice_des
 ssize_t __splice_from_pipe(struct pipe_inode_info *pipe, struct splice_desc *sd,
 			   splice_actor *actor)
 {
-	int ret;
+	ssize_t ret;
 
 	splice_from_pipe_begin(sd);
 	do {
@@ -827,8 +828,8 @@ static int warn_unsupported(struct file *file, const char *op)
 /*
  * Attempt to initiate a splice from pipe to file.
  */
-static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
-			   loff_t *ppos, size_t len, unsigned int flags)
+static ssize_t do_splice_from(struct pipe_inode_info *pipe, struct file *out,
+			      loff_t *ppos, size_t len, unsigned int flags)
 {
 	if (unlikely(!out->f_op->splice_write))
 		return warn_unsupported(out, "write");
@@ -850,12 +851,12 @@ static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
  * If successful, it returns the amount of data spliced, 0 if it hit the EOF or
  * a hole and a negative error code otherwise.
  */
-long vfs_splice_read(struct file *in, loff_t *ppos,
-		     struct pipe_inode_info *pipe, size_t len,
-		     unsigned int flags)
+ssize_t vfs_splice_read(struct file *in, loff_t *ppos,
+			struct pipe_inode_info *pipe, size_t len,
+			unsigned int flags)
 {
 	unsigned int p_space;
-	int ret;
+	ssize_t ret;
 
 	if (unlikely(!(in->f_mode & FMODE_READ)))
 		return -EBADF;
@@ -894,7 +895,7 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
 			       splice_direct_actor *actor)
 {
 	struct pipe_inode_info *pipe;
-	long ret, bytes;
+	ssize_t ret, bytes;
 	size_t len;
 	int i, flags, more;
 
@@ -1007,7 +1008,7 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
 }
 EXPORT_SYMBOL(splice_direct_to_actor);
 
-static int direct_splice_actor(struct pipe_inode_info *pipe,
+static ssize_t direct_splice_actor(struct pipe_inode_info *pipe,
 			       struct splice_desc *sd)
 {
 	struct file *file = sd->u.file;
@@ -1032,8 +1033,8 @@ static int direct_splice_actor(struct pipe_inode_info *pipe,
  *    can splice directly through a process-private pipe.
  *
  */
-long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
-		      loff_t *opos, size_t len, unsigned int flags)
+ssize_t do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
+			 loff_t *opos, size_t len, unsigned int flags)
 {
 	struct splice_desc sd = {
 		.len		= len,
@@ -1043,7 +1044,7 @@ long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
 		.u.file		= out,
 		.opos		= opos,
 	};
-	long ret;
+	ssize_t ret;
 
 	if (unlikely(!(out->f_mode & FMODE_WRITE)))
 		return -EBADF;
@@ -1080,16 +1081,16 @@ static int wait_for_space(struct pipe_inode_info *pipe, unsigned flags)
 	}
 }
 
-static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
-			       struct pipe_inode_info *opipe,
-			       size_t len, unsigned int flags);
+static ssize_t splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
+				   struct pipe_inode_info *opipe,
+				   size_t len, unsigned int flags);
 
-long splice_file_to_pipe(struct file *in,
-			 struct pipe_inode_info *opipe,
-			 loff_t *offset,
-			 size_t len, unsigned int flags)
+ssize_t splice_file_to_pipe(struct file *in,
+			    struct pipe_inode_info *opipe,
+			    loff_t *offset,
+			    size_t len, unsigned int flags)
 {
-	long ret;
+	ssize_t ret;
 
 	pipe_lock(opipe);
 	ret = wait_for_space(opipe, flags);
@@ -1104,13 +1105,13 @@ long splice_file_to_pipe(struct file *in,
 /*
  * Determine where to splice to/from.
  */
-long do_splice(struct file *in, loff_t *off_in, struct file *out,
-	       loff_t *off_out, size_t len, unsigned int flags)
+ssize_t do_splice(struct file *in, loff_t *off_in, struct file *out,
+		  loff_t *off_out, size_t len, unsigned int flags)
 {
 	struct pipe_inode_info *ipipe;
 	struct pipe_inode_info *opipe;
 	loff_t offset;
-	long ret;
+	ssize_t ret;
 
 	if (unlikely(!(in->f_mode & FMODE_READ) ||
 		     !(out->f_mode & FMODE_WRITE)))
@@ -1192,14 +1193,14 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out,
 	return -EINVAL;
 }
 
-static long __do_splice(struct file *in, loff_t __user *off_in,
-			struct file *out, loff_t __user *off_out,
-			size_t len, unsigned int flags)
+static ssize_t __do_splice(struct file *in, loff_t __user *off_in,
+			   struct file *out, loff_t __user *off_out,
+			   size_t len, unsigned int flags)
 {
 	struct pipe_inode_info *ipipe;
 	struct pipe_inode_info *opipe;
 	loff_t offset, *__off_in = NULL, *__off_out = NULL;
-	long ret;
+	ssize_t ret;
 
 	ipipe = get_pipe_info(in, true);
 	opipe = get_pipe_info(out, true);
@@ -1232,16 +1233,16 @@ static long __do_splice(struct file *in, loff_t __user *off_in,
 	return ret;
 }
 
-static int iter_to_pipe(struct iov_iter *from,
-			struct pipe_inode_info *pipe,
-			unsigned flags)
+static ssize_t iter_to_pipe(struct iov_iter *from,
+			    struct pipe_inode_info *pipe,
+			    unsigned flags)
 {
 	struct pipe_buffer buf = {
 		.ops = &user_page_pipe_buf_ops,
 		.flags = flags
 	};
 	size_t total = 0;
-	int ret = 0;
+	ssize_t ret = 0;
 
 	while (iov_iter_count(from)) {
 		struct page *pages[16];
@@ -1257,7 +1258,7 @@ static int iter_to_pipe(struct iov_iter *from,
 
 		n = DIV_ROUND_UP(left + start, PAGE_SIZE);
 		for (i = 0; i < n; i++) {
-			int size = min_t(int, left, PAGE_SIZE - start);
+			size_t size = min_t(int, left, PAGE_SIZE - start);
 
 			buf.page = pages[i];
 			buf.offset = start;
@@ -1279,10 +1280,10 @@ static int iter_to_pipe(struct iov_iter *from,
 	return total ? total : ret;
 }
 
-static int pipe_to_user(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
-			struct splice_desc *sd)
+static ssize_t pipe_to_user(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
+			    struct splice_desc *sd)
 {
-	int n = copy_page_to_iter(buf->page, buf->offset, sd->len, sd->u.data);
+	size_t n = copy_page_to_iter(buf->page, buf->offset, sd->len, sd->u.data);
 	return n == sd->len ? n : -EFAULT;
 }
 
@@ -1290,8 +1291,8 @@ static int pipe_to_user(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
  * For lack of a better implementation, implement vmsplice() to userspace
  * as a simple copy of the pipes pages to the user iov.
  */
-static long vmsplice_to_user(struct file *file, struct iov_iter *iter,
-			     unsigned int flags)
+static ssize_t vmsplice_to_user(struct file *file, struct iov_iter *iter,
+				unsigned int flags)
 {
 	struct pipe_inode_info *pipe = get_pipe_info(file, true);
 	struct splice_desc sd = {
@@ -1299,7 +1300,7 @@ static long vmsplice_to_user(struct file *file, struct iov_iter *iter,
 		.flags = flags,
 		.u.data = iter
 	};
-	long ret = 0;
+	ssize_t ret = 0;
 
 	if (!pipe)
 		return -EBADF;
@@ -1318,12 +1319,12 @@ static long vmsplice_to_user(struct file *file, struct iov_iter *iter,
  * as splice-from-memory, where the regular splice is splice-from-file (or
  * to file). In both cases the output is a pipe, naturally.
  */
-static long vmsplice_to_pipe(struct file *file, struct iov_iter *iter,
-			     unsigned int flags)
+static ssize_t vmsplice_to_pipe(struct file *file, struct iov_iter *iter,
+				unsigned int flags)
 {
 	struct pipe_inode_info *pipe;
-	long ret = 0;
-	unsigned buf_flag = 0;
+	ssize_t ret = 0;
+	unsigned int buf_flag = 0;
 
 	if (flags & SPLICE_F_GIFT)
 		buf_flag = PIPE_BUF_FLAG_GIFT;
@@ -1414,7 +1415,7 @@ SYSCALL_DEFINE6(splice, int, fd_in, loff_t __user *, off_in,
 		size_t, len, unsigned int, flags)
 {
 	struct fd in, out;
-	long error;
+	ssize_t error;
 
 	if (unlikely(!len))
 		return 0;
@@ -1514,15 +1515,15 @@ static int opipe_prep(struct pipe_inode_info *pipe, unsigned int flags)
 /*
  * Splice contents of ipipe to opipe.
  */
-static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
-			       struct pipe_inode_info *opipe,
-			       size_t len, unsigned int flags)
+static ssize_t splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
+				   struct pipe_inode_info *opipe,
+				   size_t len, unsigned int flags)
 {
 	struct pipe_buffer *ibuf, *obuf;
 	unsigned int i_head, o_head;
 	unsigned int i_tail, o_tail;
 	unsigned int i_mask, o_mask;
-	int ret = 0;
+	ssize_t ret = 0;
 	bool input_wakeup = false;
 
 
@@ -1651,15 +1652,15 @@ static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
 /*
  * Link contents of ipipe to opipe.
  */
-static int link_pipe(struct pipe_inode_info *ipipe,
-		     struct pipe_inode_info *opipe,
-		     size_t len, unsigned int flags)
+static ssize_t link_pipe(struct pipe_inode_info *ipipe,
+			 struct pipe_inode_info *opipe,
+			 size_t len, unsigned int flags)
 {
 	struct pipe_buffer *ibuf, *obuf;
 	unsigned int i_head, o_head;
 	unsigned int i_tail, o_tail;
 	unsigned int i_mask, o_mask;
-	int ret = 0;
+	ssize_t ret = 0;
 
 	/*
 	 * Potential ABBA deadlock, work around it by ordering lock
@@ -1742,11 +1743,11 @@ static int link_pipe(struct pipe_inode_info *ipipe,
  * The 'flags' used are the SPLICE_F_* variants, currently the only
  * applicable one is SPLICE_F_NONBLOCK.
  */
-long do_tee(struct file *in, struct file *out, size_t len, unsigned int flags)
+ssize_t do_tee(struct file *in, struct file *out, size_t len, unsigned int flags)
 {
 	struct pipe_inode_info *ipipe = get_pipe_info(in, true);
 	struct pipe_inode_info *opipe = get_pipe_info(out, true);
-	int ret = -EINVAL;
+	ssize_t ret = -EINVAL;
 
 	if (unlikely(!(in->f_mode & FMODE_READ) ||
 		     !(out->f_mode & FMODE_WRITE)))
@@ -1778,7 +1779,7 @@ long do_tee(struct file *in, struct file *out, size_t len, unsigned int flags)
 SYSCALL_DEFINE4(tee, int, fdin, int, fdout, size_t, len, unsigned int, flags)
 {
 	struct fd in, out;
-	int error;
+	ssize_t error;
 
 	if (unlikely(flags & ~SPLICE_F_ALL))
 		return -EINVAL;
diff --git a/include/linux/splice.h b/include/linux/splice.h
index 8f052c3dae95..6af5e197ccd0 100644
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -62,10 +62,10 @@ struct splice_pipe_desc {
 	void (*spd_release)(struct splice_pipe_desc *, unsigned int);
 };
 
-typedef int (splice_actor)(struct pipe_inode_info *, struct pipe_buffer *,
-			   struct splice_desc *);
-typedef int (splice_direct_actor)(struct pipe_inode_info *,
-				  struct splice_desc *);
+typedef ssize_t (splice_actor)(struct pipe_inode_info *, struct pipe_buffer *,
+			       struct splice_desc *);
+typedef ssize_t (splice_direct_actor)(struct pipe_inode_info *,
+				      struct splice_desc *);
 
 extern ssize_t splice_from_pipe(struct pipe_inode_info *, struct file *,
 				loff_t *, size_t, unsigned int,
@@ -76,17 +76,17 @@ extern ssize_t splice_to_pipe(struct pipe_inode_info *,
 			      struct splice_pipe_desc *);
 extern ssize_t add_to_pipe(struct pipe_inode_info *,
 			      struct pipe_buffer *);
-long vfs_splice_read(struct file *in, loff_t *ppos,
-		     struct pipe_inode_info *pipe, size_t len,
-		     unsigned int flags);
+ssize_t vfs_splice_read(struct file *in, loff_t *ppos,
+			struct pipe_inode_info *pipe, size_t len,
+			unsigned int flags);
 extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *,
 				      splice_direct_actor *);
-extern long do_splice(struct file *in, loff_t *off_in,
-		      struct file *out, loff_t *off_out,
-		      size_t len, unsigned int flags);
+extern ssize_t do_splice(struct file *in, loff_t *off_in,
+			 struct file *out, loff_t *off_out,
+			 size_t len, unsigned int flags);
 
-extern long do_tee(struct file *in, struct file *out, size_t len,
-		   unsigned int flags);
+extern ssize_t do_tee(struct file *in, struct file *out, size_t len,
+		      unsigned int flags);
 
 /*
  * for dynamic pipe sizing


end of thread, other threads:[~2023-03-15 16:35 UTC | newest]

Thread overview: 33+ messages
2023-03-08 16:52 [PATCH v17 00/14] splice, block: Use page pinning and kill ITER_PIPE David Howells
2023-03-08 16:52 ` [PATCH v17 01/14] splice: Clean up direct_splice_read() a bit David Howells
2023-03-14 17:30   ` Christoph Hellwig
2023-03-08 16:52 ` [PATCH v17 02/14] splice: Make do_splice_to() generic and export it David Howells
2023-03-14 17:31   ` Christoph Hellwig
2023-03-14 21:15   ` David Howells
2023-03-15 16:34   ` [RFC PATCH] splice: Convert longs and some ints into ssize_t David Howells
2023-03-08 16:52 ` [PATCH v17 03/14] shmem: Implement splice-read David Howells
2023-03-08 22:39   ` Linus Torvalds
2023-03-14 16:42     ` Matthew Wilcox
2023-03-14 18:02       ` Linus Torvalds
2023-03-14 20:08         ` Matthew Wilcox
2023-03-14 18:26       ` David Howells
2023-03-14 19:07         ` Linus Torvalds
2023-03-14 19:09           ` Linus Torvalds
2023-03-14 21:50         ` David Howells
2023-03-08 23:42   ` David Howells
2023-03-08 16:52 ` [PATCH v17 04/14] overlayfs: " David Howells
2023-03-08 16:52 ` [PATCH v17 05/14] coda: " David Howells
2023-03-13 13:28   ` Jan Harkes
2023-03-08 16:52 ` [PATCH v17 06/14] tty, proc, kernfs, random: Use direct_splice_read() David Howells
2023-03-08 16:52 ` [PATCH v17 07/14] splice: Do splice read from a file without using ITER_PIPE David Howells
2023-03-14 17:32   ` Christoph Hellwig
2023-03-14 21:52   ` David Howells
2023-03-08 16:52 ` [PATCH v17 08/14] iov_iter: Kill ITER_PIPE David Howells
2023-03-08 16:52 ` [PATCH v17 09/14] iomap: Don't get an reference on ZERO_PAGE for direct I/O block zeroing David Howells
2023-03-08 21:08   ` Dave Chinner
2023-03-14 17:33   ` Christoph Hellwig
2023-03-08 16:52 ` [PATCH v17 10/14] block: Fix bio_flagged() so that gcc can better optimise it David Howells
2023-03-08 16:52 ` [PATCH v17 11/14] block: Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED with inverted logic David Howells
2023-03-08 16:52 ` [PATCH v17 12/14] block: Add BIO_PAGE_PINNED and associated infrastructure David Howells
2023-03-08 16:52 ` [PATCH v17 13/14] block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages David Howells
2023-03-08 16:52 ` [PATCH v17 14/14] block: convert bio_map_user_iov " David Howells
