* [PATCH v8 00/10] iov_iter: Improve page extraction (pin or just list)
@ 2023-01-23 17:29 David Howells
  2023-01-23 17:29 ` [PATCH v8 01/10] iov_iter: Define flags to qualify page extraction David Howells
                   ` (12 more replies)
  0 siblings, 13 replies; 75+ messages in thread
From: David Howells @ 2023-01-23 17:29 UTC (permalink / raw)
  To: Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel

Hi Al, Christoph,

Here are patches to provide support for extracting pages from an iov_iter
and to use this in the extraction functions in the block layer bio code.

The patches make the following changes:

 (1) Add a function, iov_iter_extract_pages(), to replace
     iov_iter_get_pages*() that gets refs, pins or just lists the pages as
     appropriate to the iterator type.

     Add a function, iov_iter_extract_mode(), that will indicate from the
     iterator type how the cleanup is to be performed, returning FOLL_PIN
     or 0.

 (2) Add a function, folio_put_unpin(), and a wrapper, page_put_unpin(),
     that take a page and the value returned by iov_iter_extract_mode() and
     do the right thing to clean up the page (a rough usage sketch follows
     this list).

 (3) Make the bio struct carry a pair of flags to indicate the cleanup
     mode.  BIO_NO_PAGE_REF is replaced with BIO_PAGE_REFFED (equivalent to
     FOLL_GET) and BIO_PAGE_PINNED (equivalent to FOLL_PIN) is added.

 (4) Add a function, bio_release_page(), to release a page appropriately to
     the cleanup mode indicated by the BIO_PAGE_* flags.

 (5) Make the iter-to-bio code use iov_iter_extract_pages() to retain the
     pages appropriately and clean them up later.

 (6) Fix bio_flagged() so that it doesn't prevent a gcc optimisation.

 (7) Renumber FOLL_PIN and FOLL_GET down so that they're at bits 0 and 1
     and coincident with BIO_PAGE_PINNED and BIO_PAGE_REFFED.  The compiler
     can then optimise on that.  Also, it's probably going to be necessary
     to embed these in the page pointer in sk_buff fragments.  This patch
     can go independently through the mm tree.
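
As a rough illustration (not itself part of the patches), a caller of the new
API might look something like the sketch below.  This assumes the page list
is allocated by iov_iter_extract_pages() and freed with kvfree(), as
iov_iter_get_pages_alloc() does today; apart from the new functions, all
names are placeholders:

	struct page **pages = NULL;
	unsigned int cleanup_mode = iov_iter_extract_mode(iter);
	size_t offset;
	ssize_t len;
	int i;

	len = iov_iter_extract_pages(iter, &pages, maxsize, maxpages,
				     0, &offset);
	if (len <= 0)
		return len;

	/* ... do DMA to/from the extracted pages ... */

	/* Unpins if the pages were pinned (FOLL_PIN); otherwise a no-op. */
	for (i = 0; i < DIV_ROUND_UP(offset + len, PAGE_SIZE); i++)
		page_put_unpin(pages[i], cleanup_mode);
	kvfree(pages);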

I've pushed the patches here also:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=iov-extract

David

Changes:
========
ver #8)
 - Import Christoph Hellwig's changes.
   - Split the conversion-to-extraction patch.
   - Drop the extract_flags arg from iov_iter_extract_mode().
   - Don't default bios to BIO_PAGE_REFFED, but set explicitly.
 - Switch FOLL_PIN and FOLL_GET when renumbering so PIN is at bit 0.
 - Switch BIO_PAGE_PINNED and BIO_PAGE_REFFED so PINNED is at bit 0.
 - We should always be using FOLL_PIN (not FOLL_GET) for DIO, so adjust the
   patches for that.

ver #7)
 - For now, drop the parts to pass the I/O direction to iov_iter_*pages*()
   as it turned out to be a lot more complicated, with places not setting
   IOCB_WRITE when they should, for example.
 - Drop all the patches that changed things other than the block layer's
   bio handling.  The netfslib and cifs changes can go into a separate
   patchset.
 - Add support for extracting pages from KVEC-type iterators.
 - When extracting from BVEC/KVEC, skip over empty vecs at the front.

ver #6)
 - Fix write() syscall and co. not setting IOCB_WRITE.
 - Added iocb_is_read() and iocb_is_write() to check IOCB_WRITE.
 - Use op_is_write() in bio_copy_user_iov().
 - Drop the iterator direction checks from smbd_recv().
 - Define FOLL_SOURCE_BUF and FOLL_DEST_BUF and pass them in as part of
   gup_flags to iov_iter_get/extract_pages*().
 - Replace iov_iter_get_pages*2() with iov_iter_get_pages*() and remove.
 - Add back the function to indicate the cleanup mode.
 - Drop the cleanup_mode return arg to iov_iter_extract_pages().
 - Provide a helper to clean up a page.
 - Renumbered FOLL_GET and FOLL_PIN and made BIO_PAGE_REFFED/PINNED have
   the same numerical values, enforced with an assertion.
 - Converted AF_ALG, SCSI vhost, generic DIO, FUSE, splice to pipe, 9P and
   NFS.
 - Added in the patches to make CIFS do top-to-bottom iterators and use
   various of the added extraction functions.
 - Added a pair of work-in-progress patches to make sk_buff fragments store
   FOLL_GET and FOLL_PIN.

ver #5)
 - Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED and split into own patch.
 - Transcribe FOLL_GET/PIN into BIO_PAGE_REFFED/PINNED flags.
 - Add patch to allow bio_flagged() to be combined by gcc.

ver #4)
 - Drop the patch to move the FOLL_* flags to linux/mm_types.h as they're
   no longer referenced by linux/uio.h.
 - Add ITER_SOURCE/DEST cleanup patches.
 - Make iov_iter/netfslib iter extraction patches use ITER_SOURCE/DEST.
 - Allow additional gup_flags to be passed into iov_iter_extract_pages().
 - Add struct bio patch.

ver #3)
 - Switch to using EXPORT_SYMBOL_GPL to prevent indirect 3rd-party access
   to get/pin_user_pages_fast()[1].

ver #2)
 - Rolled the extraction cleanup mode query function into the extraction
   function, returning the indication through the argument list.
 - Fixed patch 4 (extract to scatterlist) to actually use the new
   extraction API.

Link: https://lore.kernel.org/r/Y3zFzdWnWlEJ8X8/@infradead.org/ [1]
Link: https://lore.kernel.org/r/166697254399.61150.1256557652599252121.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166722777223.2555743.162508599131141451.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166732024173.3186319.18204305072070871546.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166869687556.3723671.10061142538708346995.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166920902005.1461876.2786264600108839814.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/166997419665.9475.15014699817597102032.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/167305160937.1521586.133299343565358971.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/167344725490.2425628.13771289553670112965.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk/ # v6
Link: https://lore.kernel.org/r/20230120175556.3556978-1-dhowells@redhat.com/ # v7

Christoph Hellwig (2):
  iomap: don't get a reference on ZERO_PAGE for direct I/O block
    zeroing
  block: Rename BIO_NO_PAGE_REF to BIO_PAGE_REFFED and invert the
    meaning

David Howells (8):
  iov_iter: Define flags to qualify page extraction.
  iov_iter: Add a function to extract a page list from an iterator
  mm: Provide a helper to drop a pin/ref on a page
  block: Fix bio_flagged() so that gcc can better optimise it
  block: Switch to pinning pages.
  block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages
  block: convert bio_map_user_iov to use iov_iter_extract_pages
  mm: Renumber FOLL_PIN and FOLL_GET down

 block/bio.c               |  33 ++--
 block/blk-map.c           |  25 ++-
 block/blk.h               |  28 ++++
 fs/direct-io.c            |   2 +
 fs/iomap/direct-io.c      |   1 -
 include/linux/bio.h       |   5 +-
 include/linux/blk_types.h |   3 +-
 include/linux/mm.h        |  35 ++--
 include/linux/uio.h       |  29 +++-
 lib/iov_iter.c            | 334 +++++++++++++++++++++++++++++++++++++-
 mm/gup.c                  |  22 +++
 11 files changed, 461 insertions(+), 56 deletions(-)



* [PATCH v8 01/10] iov_iter: Define flags to qualify page extraction.
  2023-01-23 17:29 [PATCH v8 00/10] iov_iter: Improve page extraction (pin or just list) David Howells
@ 2023-01-23 17:29 ` David Howells
  2023-01-23 18:20   ` Christoph Hellwig
  2023-01-24  2:12   ` John Hubbard
  2023-01-23 17:29 ` [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator David Howells
                   ` (11 subsequent siblings)
  12 siblings, 2 replies; 75+ messages in thread
From: David Howells @ 2023-01-23 17:29 UTC (permalink / raw)
  To: Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel

Define flags to qualify page extraction to pass into iov_iter_*_pages*()
rather than passing in FOLL_* flags.

For now only a flag to allow peer-to-peer DMA is supported.
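
As a minimal sketch (not part of the patch; the block layer hunks below show
the real conversions), a caller now selects the iterator-specific flag rather
than a FOLL_* flag.  The variables here are placeholders:

	unsigned int extract_flags = 0;

	if (blk_queue_pci_p2pdma(q))
		extract_flags |= ITER_ALLOW_P2PDMA;

	n = iov_iter_get_pages(iter, pages, maxsize, maxpages, &offset,
			       extract_flags);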

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Christoph Hellwig <hch@infradead.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Logan Gunthorpe <logang@deltatee.com>
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
---

Notes:
    ver #7)
     - Don't use FOLL_* as a parameter, but rather define constants
       specifically to use with iov_iter_*_pages*().
     - Drop the I/O direction constants for now.

 block/bio.c         |  6 +++---
 block/blk-map.c     |  8 ++++----
 include/linux/uio.h |  7 +++++--
 lib/iov_iter.c      | 14 ++++++++------
 4 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index ab59a491a883..a289bbff036f 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1249,7 +1249,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
 	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
 	struct page **pages = (struct page **)bv;
-	unsigned int gup_flags = 0;
+	unsigned int extract_flags = 0;
 	ssize_t size, left;
 	unsigned len, i = 0;
 	size_t offset, trim;
@@ -1264,7 +1264,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
 
 	if (bio->bi_bdev && blk_queue_pci_p2pdma(bio->bi_bdev->bd_disk->queue))
-		gup_flags |= FOLL_PCI_P2PDMA;
+		extract_flags |= ITER_ALLOW_P2PDMA;
 
 	/*
 	 * Each segment in the iov is required to be a block size multiple.
@@ -1275,7 +1275,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	 */
 	size = iov_iter_get_pages(iter, pages,
 				  UINT_MAX - bio->bi_iter.bi_size,
-				  nr_pages, &offset, gup_flags);
+				  nr_pages, &offset, extract_flags);
 	if (unlikely(size <= 0))
 		return size ? size : -EFAULT;
 
diff --git a/block/blk-map.c b/block/blk-map.c
index 19940c978c73..bc111261fc82 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -267,7 +267,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 {
 	unsigned int max_sectors = queue_max_hw_sectors(rq->q);
 	unsigned int nr_vecs = iov_iter_npages(iter, BIO_MAX_VECS);
-	unsigned int gup_flags = 0;
+	unsigned int extract_flags = 0;
 	struct bio *bio;
 	int ret;
 	int j;
@@ -280,7 +280,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 		return -ENOMEM;
 
 	if (blk_queue_pci_p2pdma(rq->q))
-		gup_flags |= FOLL_PCI_P2PDMA;
+		extract_flags |= ITER_ALLOW_P2PDMA;
 
 	while (iov_iter_count(iter)) {
 		struct page **pages, *stack_pages[UIO_FASTIOV];
@@ -291,10 +291,10 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 		if (nr_vecs <= ARRAY_SIZE(stack_pages)) {
 			pages = stack_pages;
 			bytes = iov_iter_get_pages(iter, pages, LONG_MAX,
-						   nr_vecs, &offs, gup_flags);
+						   nr_vecs, &offs, extract_flags);
 		} else {
 			bytes = iov_iter_get_pages_alloc(iter, &pages,
-						LONG_MAX, &offs, gup_flags);
+						LONG_MAX, &offs, extract_flags);
 		}
 		if (unlikely(bytes <= 0)) {
 			ret = bytes ? bytes : -EFAULT;
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 9f158238edba..46d5080314c6 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -252,12 +252,12 @@ void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *
 		     loff_t start, size_t count);
 ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
 		size_t maxsize, unsigned maxpages, size_t *start,
-		unsigned gup_flags);
+		unsigned extract_flags);
 ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages,
 			size_t maxsize, unsigned maxpages, size_t *start);
 ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 		struct page ***pages, size_t maxsize, size_t *start,
-		unsigned gup_flags);
+		unsigned extract_flags);
 ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i, struct page ***pages,
 			size_t maxsize, size_t *start);
 int iov_iter_npages(const struct iov_iter *i, int maxpages);
@@ -360,4 +360,7 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
 	};
 }
 
+/* Flags for iov_iter_get/extract_pages*() */
+#define ITER_ALLOW_P2PDMA	0x01	/* Allow P2PDMA on the extracted pages */
+
 #endif
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index f9a3ff37ecd1..fb04abe7d746 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1432,9 +1432,9 @@ static struct page *first_bvec_segment(const struct iov_iter *i,
 static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		   struct page ***pages, size_t maxsize,
 		   unsigned int maxpages, size_t *start,
-		   unsigned int gup_flags)
+		   unsigned int extract_flags)
 {
-	unsigned int n;
+	unsigned int n, gup_flags = 0;
 
 	if (maxsize > i->count)
 		maxsize = i->count;
@@ -1442,6 +1442,8 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		return 0;
 	if (maxsize > MAX_RW_COUNT)
 		maxsize = MAX_RW_COUNT;
+	if (extract_flags & ITER_ALLOW_P2PDMA)
+		gup_flags |= FOLL_PCI_P2PDMA;
 
 	if (likely(user_backed_iter(i))) {
 		unsigned long addr;
@@ -1495,14 +1497,14 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 
 ssize_t iov_iter_get_pages(struct iov_iter *i,
 		   struct page **pages, size_t maxsize, unsigned maxpages,
-		   size_t *start, unsigned gup_flags)
+		   size_t *start, unsigned extract_flags)
 {
 	if (!maxpages)
 		return 0;
 	BUG_ON(!pages);
 
 	return __iov_iter_get_pages_alloc(i, &pages, maxsize, maxpages,
-					  start, gup_flags);
+					  start, extract_flags);
 }
 EXPORT_SYMBOL_GPL(iov_iter_get_pages);
 
@@ -1515,14 +1517,14 @@ EXPORT_SYMBOL(iov_iter_get_pages2);
 
 ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 		   struct page ***pages, size_t maxsize,
-		   size_t *start, unsigned gup_flags)
+		   size_t *start, unsigned extract_flags)
 {
 	ssize_t len;
 
 	*pages = NULL;
 
 	len = __iov_iter_get_pages_alloc(i, pages, maxsize, ~0U, start,
-					 gup_flags);
+					 extract_flags);
 	if (len <= 0) {
 		kvfree(*pages);
 		*pages = NULL;



* [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator
  2023-01-23 17:29 [PATCH v8 00/10] iov_iter: Improve page extraction (pin or just list) David Howells
  2023-01-23 17:29 ` [PATCH v8 01/10] iov_iter: Define flags to qualify page extraction David Howells
@ 2023-01-23 17:29 ` David Howells
  2023-01-23 18:21   ` Christoph Hellwig
                     ` (2 more replies)
  2023-01-23 17:30 ` [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page David Howells
                   ` (10 subsequent siblings)
  12 siblings, 3 replies; 75+ messages in thread
From: David Howells @ 2023-01-23 17:29 UTC (permalink / raw)
  To: Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig, John Hubbard, David Hildenbrand, linux-mm

Add a function, iov_iter_extract_pages(), to extract a list of pages from
an iterator.  Depending on the type of iterator, the pages may be returned
with a pin added or with nothing done to retain them.

Add a second function, iov_iter_extract_mode(), to determine how the
cleanup should be done.

There are two cases:

 (1) ITER_IOVEC or ITER_UBUF iterator.

     Extracted pages will have pins (FOLL_PIN) obtained on them so that a
     concurrent fork() will forcibly copy the page so that DMA is done
     to/from the parent's buffer and is unavailable to/unaffected by the
     child process.

     iov_iter_extract_mode() will return FOLL_PIN for this case.  The
     caller should use something like folio_put_unpin() to dispose of the
     page.

 (2) Any other sort of iterator.

     No refs or pins are obtained on the pages; the assumption is made that
     the caller will manage page retention.

     iov_iter_extract_mode() will return 0.  The pages don't need
     additional disposal.  (A short usage sketch follows these two cases.)
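
As a brief usage sketch under those assumptions (not part of the patch;
page_put_unpin() is added by a later patch in this series, error handling is
elided and the array size is arbitrary):

	struct page *stack_pages[16], **pages = stack_pages;
	unsigned int cleanup_mode = iov_iter_extract_mode(iter);
	size_t offset;
	ssize_t len;
	int i;

	/* *pages is non-NULL, so the caller-supplied array is filled in. */
	len = iov_iter_extract_pages(iter, &pages, maxsize,
				     ARRAY_SIZE(stack_pages), 0, &offset);

	/* ... transfer the data ... */

	/* Case (1): unpins each page; case (2): does nothing. */
	for (i = 0; i < DIV_ROUND_UP(offset + len, PAGE_SIZE); i++)
		page_put_unpin(pages[i], cleanup_mode);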

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Christoph Hellwig <hch@lst.de>
cc: John Hubbard <jhubbard@nvidia.com>
cc: David Hildenbrand <david@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
---

Notes:
    ver #8)
     - It seems that all DIO is supposed to be done under FOLL_PIN now, and not
       FOLL_GET, so switch to only using pin_user_pages() for user-backed
       iters.
     - Wrap an argument in brackets in the iov_iter_extract_mode() macro.
     - Drop the extract_flags argument to iov_iter_extract_mode() for now
       [hch].
    
    ver #7)
     - Switch to passing in iter-specific flags rather than FOLL_* flags.
     - Drop the direction flags for now.
     - Use ITER_ALLOW_P2PDMA to request FOLL_PCI_P2PDMA.
     - Disallow use of ITER_ALLOW_P2PDMA with non-user-backed iter.
     - Add support for extraction from KVEC-type iters.
     - Use iov_iter_advance() rather than open-coding it.
     - Make BVEC- and KVEC-type skip over initial empty vectors.
    
    ver #6)
     - Add back the function to indicate the cleanup mode.
     - Drop the cleanup_mode return arg to iov_iter_extract_pages().
     - Pass FOLL_SOURCE/DEST_BUF in gup_flags.  Check this against the iter
       data_source.
    
    ver #4)
     - Use ITER_SOURCE/DEST instead of WRITE/READ.
     - Allow additional FOLL_* flags, such as FOLL_PCI_P2PDMA to be passed in.
    
    ver #3)
     - Switch to using EXPORT_SYMBOL_GPL to prevent indirect 3rd-party access
       to get/pin_user_pages_fast()[1].

 include/linux/uio.h |  22 +++
 lib/iov_iter.c      | 320 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 342 insertions(+)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 46d5080314c6..a8165335f8da 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -363,4 +363,26 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
 /* Flags for iov_iter_get/extract_pages*() */
 #define ITER_ALLOW_P2PDMA	0x01	/* Allow P2PDMA on the extracted pages */
 
+ssize_t iov_iter_extract_pages(struct iov_iter *i, struct page ***pages,
+			       size_t maxsize, unsigned int maxpages,
+			       unsigned int extract_flags, size_t *offset0);
+
+/**
+ * iov_iter_extract_mode - Indicate how pages from the iterator will be retained
+ * @iter: The iterator
+ *
+ * Examine the iterator and indicate by returning FOLL_PIN or 0 as to how, if
+ * at all, pages extracted from the iterator will be retained by the extraction
+ * function.
+ *
+ * FOLL_PIN indicates that the pages will have a pin placed in them that the
+ * caller must unpin.  This must be done for DMA/async DIO to force fork()
+ * to forcibly copy a page for the child (the parent must retain the original
+ * page).
+ *
+ * 0 indicates that no measures are taken and that it's up to the caller to
+ * retain the pages.
+ */
+#define iov_iter_extract_mode(iter) (user_backed_iter(iter) ? FOLL_PIN : 0)
+
 #endif
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index fb04abe7d746..57f3e9404160 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1916,3 +1916,323 @@ void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state)
 		i->iov -= state->nr_segs - i->nr_segs;
 	i->nr_segs = state->nr_segs;
 }
+
+/*
+ * Extract a list of contiguous pages from an ITER_PIPE iterator.  This does
+ * not get references of its own on the pages, nor does it get a pin on them.
+ * If there's a partial page, it adds that first and will then allocate and add
+ * pages into the pipe to make up the buffer space to the amount required.
+ *
+ * The caller must hold the pipe locked and only transferring into a pipe is
+ * supported.
+ */
+static ssize_t iov_iter_extract_pipe_pages(struct iov_iter *i,
+					   struct page ***pages, size_t maxsize,
+					   unsigned int maxpages,
+					   unsigned int extract_flags,
+					   size_t *offset0)
+{
+	unsigned int nr, offset, chunk, j;
+	struct page **p;
+	size_t left;
+
+	if (!sanity(i))
+		return -EFAULT;
+
+	offset = pipe_npages(i, &nr);
+	if (!nr)
+		return -EFAULT;
+	*offset0 = offset;
+
+	maxpages = min_t(size_t, nr, maxpages);
+	maxpages = want_pages_array(pages, maxsize, offset, maxpages);
+	if (!maxpages)
+		return -ENOMEM;
+	p = *pages;
+
+	left = maxsize;
+	for (j = 0; j < maxpages; j++) {
+		struct page *page = append_pipe(i, left, &offset);
+		if (!page)
+			break;
+		chunk = min_t(size_t, left, PAGE_SIZE - offset);
+		left -= chunk;
+		*p++ = page;
+	}
+	if (!j)
+		return -EFAULT;
+	return maxsize - left;
+}
+
+/*
+ * Extract a list of contiguous pages from an ITER_XARRAY iterator.  This does not
+ * get references on the pages, nor does it get a pin on them.
+ */
+static ssize_t iov_iter_extract_xarray_pages(struct iov_iter *i,
+					     struct page ***pages, size_t maxsize,
+					     unsigned int maxpages,
+					     unsigned int extract_flags,
+					     size_t *offset0)
+{
+	struct page *page, **p;
+	unsigned int nr = 0, offset;
+	loff_t pos = i->xarray_start + i->iov_offset;
+	pgoff_t index = pos >> PAGE_SHIFT;
+	XA_STATE(xas, i->xarray, index);
+
+	offset = pos & ~PAGE_MASK;
+	*offset0 = offset;
+
+	maxpages = want_pages_array(pages, maxsize, offset, maxpages);
+	if (!maxpages)
+		return -ENOMEM;
+	p = *pages;
+
+	rcu_read_lock();
+	for (page = xas_load(&xas); page; page = xas_next(&xas)) {
+		if (xas_retry(&xas, page))
+			continue;
+
+		/* Has the page moved or been split? */
+		if (unlikely(page != xas_reload(&xas))) {
+			xas_reset(&xas);
+			continue;
+		}
+
+		p[nr++] = find_subpage(page, xas.xa_index);
+		if (nr == maxpages)
+			break;
+	}
+	rcu_read_unlock();
+
+	maxsize = min_t(size_t, nr * PAGE_SIZE - offset, maxsize);
+	iov_iter_advance(i, maxsize);
+	return maxsize;
+}
+
+/*
+ * Extract a list of contiguous pages from an ITER_BVEC iterator.  This does
+ * not get references on the pages, nor does it get a pin on them.
+ */
+static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i,
+					   struct page ***pages, size_t maxsize,
+					   unsigned int maxpages,
+					   unsigned int extract_flags,
+					   size_t *offset0)
+{
+	struct page **p, *page;
+	size_t skip = i->iov_offset, offset;
+	int k;
+
+	for (;;) {
+		if (i->nr_segs == 0)
+			return 0;
+		maxsize = min(maxsize, i->bvec->bv_len - skip);
+		if (maxsize)
+			break;
+		i->iov_offset = 0;
+		i->nr_segs--;
+		i->bvec++;
+		skip = 0;
+	}
+
+	skip += i->bvec->bv_offset;
+	page = i->bvec->bv_page + skip / PAGE_SIZE;
+	offset = skip % PAGE_SIZE;
+	*offset0 = offset;
+
+	maxpages = want_pages_array(pages, maxsize, offset, maxpages);
+	if (!maxpages)
+		return -ENOMEM;
+	p = *pages;
+	for (k = 0; k < maxpages; k++)
+		p[k] = page + k;
+
+	maxsize = min_t(size_t, maxsize, maxpages * PAGE_SIZE - offset);
+	iov_iter_advance(i, maxsize);
+	return maxsize;
+}
+
+/*
+ * Extract a list of virtually contiguous pages from an ITER_KVEC iterator.
+ * This does not get references on the pages, nor does it get a pin on them.
+ */
+static ssize_t iov_iter_extract_kvec_pages(struct iov_iter *i,
+					   struct page ***pages, size_t maxsize,
+					   unsigned int maxpages,
+					   unsigned int extract_flags,
+					   size_t *offset0)
+{
+	struct page **p, *page;
+	const void *kaddr;
+	size_t skip = i->iov_offset, offset, len;
+	int k;
+
+	for (;;) {
+		if (i->nr_segs == 0)
+			return 0;
+		maxsize = min(maxsize, i->kvec->iov_len - skip);
+		if (maxsize)
+			break;
+		i->iov_offset = 0;
+		i->nr_segs--;
+		i->kvec++;
+		skip = 0;
+	}
+
+	offset = skip % PAGE_SIZE;
+	*offset0 = offset;
+	kaddr = i->kvec->iov_base;
+
+	maxpages = want_pages_array(pages, maxsize, offset, maxpages);
+	if (!maxpages)
+		return -ENOMEM;
+	p = *pages;
+
+	kaddr -= offset;
+	len = offset + maxsize;
+	for (k = 0; k < maxpages; k++) {
+		size_t seg = min_t(size_t, len, PAGE_SIZE);
+
+		if (is_vmalloc_or_module_addr(kaddr))
+			page = vmalloc_to_page(kaddr);
+		else
+			page = virt_to_page(kaddr);
+
+		p[k] = page;
+		len -= seg;
+		kaddr += PAGE_SIZE;
+	}
+
+	maxsize = min_t(size_t, maxsize, maxpages * PAGE_SIZE - offset);
+	iov_iter_advance(i, maxsize);
+	return maxsize;
+}
+
+/*
+ * Extract a list of contiguous pages from a user iterator and get a pin on
+ * each of them.  This should only be used if the iterator is user-backed
+ * (IOBUF/UBUF).
+ *
+ * It does not get refs on the pages, but the pages must be unpinned by the
+ * caller once the transfer is complete.
+ *
+ * This is safe to be used where background IO/DMA *is* going to be modifying
+ * the buffer; using a pin rather than a ref forces fork() to give the
+ * child a copy of the page.
+ */
+static ssize_t iov_iter_extract_user_pages(struct iov_iter *i,
+					   struct page ***pages,
+					   size_t maxsize,
+					   unsigned int maxpages,
+					   unsigned int extract_flags,
+					   size_t *offset0)
+{
+	unsigned long addr;
+	unsigned int gup_flags = FOLL_PIN;
+	size_t offset;
+	int res;
+
+	if (i->data_source == ITER_DEST)
+		gup_flags |= FOLL_WRITE;
+	if (extract_flags & ITER_ALLOW_P2PDMA)
+		gup_flags |= FOLL_PCI_P2PDMA;
+	if (i->nofault)
+		gup_flags |= FOLL_NOFAULT;
+
+	addr = first_iovec_segment(i, &maxsize);
+	*offset0 = offset = addr % PAGE_SIZE;
+	addr &= PAGE_MASK;
+	maxpages = want_pages_array(pages, maxsize, offset, maxpages);
+	if (!maxpages)
+		return -ENOMEM;
+	res = pin_user_pages_fast(addr, maxpages, gup_flags, *pages);
+	if (unlikely(res <= 0))
+		return res;
+	maxsize = min_t(size_t, maxsize, res * PAGE_SIZE - offset);
+	iov_iter_advance(i, maxsize);
+	return maxsize;
+}
+
+/**
+ * iov_iter_extract_pages - Extract a list of contiguous pages from an iterator
+ * @i: The iterator to extract from
+ * @pages: Where to return the list of pages
+ * @maxsize: The maximum amount of iterator to extract
+ * @maxpages: The maximum size of the list of pages
+ * @extract_flags: Flags to qualify request
+ * @offset0: Where to return the starting offset into (*@pages)[0]
+ *
+ * Extract a list of contiguous pages from the current point of the iterator,
+ * advancing the iterator.  The maximum number of pages and the maximum amount
+ * of page contents can be set.
+ *
+ * If *@pages is NULL, a page list will be allocated to the required size and
+ * *@pages will be set to its base.  If *@pages is not NULL, it will be assumed
+ * that the caller allocated a page list at least @maxpages in size and this
+ * will be filled in.
+ *
+ * @extract_flags can have ITER_ALLOW_P2PDMA set to request peer-to-peer DMA be
+ * allowed on the pages extracted.
+ *
+ * The iov_iter_extract_mode() function can be used to query how cleanup should
+ * be performed.
+ *
+ * Extra refs or pins on the pages may be obtained as follows:
+ *
+ *  (*) If the iterator is user-backed (ITER_IOVEC/ITER_UBUF), pins will be
+ *      added to the pages, but refs will not be taken.
+ *      iov_iter_extract_mode() will return FOLL_PIN.
+ *
+ *  (*) If the iterator is ITER_PIPE, this must describe a destination for the
+ *      data.  Additional pages may be allocated and added to the pipe (which
+ *      will hold the refs), but pins will not be obtained for the caller.  The
+ *      caller must hold the pipe lock.  iov_iter_extract_mode() will return 0.
+ *
+ *  (*) If the iterator is ITER_KVEC, ITER_BVEC or ITER_XARRAY, the pages are
+ *      merely listed; no extra refs or pins are obtained.
+ *      iov_iter_extract_mode() will return 0.
+ *
+ * Note also:
+ *
+ *  (*) Use with ITER_DISCARD is not supported as that has no content.
+ *
+ * On success, the function sets *@pages to the new pagelist, if allocated, and
+ * sets *offset0 to the offset into the first page.
+ *
+ * It may also return -ENOMEM and -EFAULT.
+ */
+ssize_t iov_iter_extract_pages(struct iov_iter *i,
+			       struct page ***pages,
+			       size_t maxsize,
+			       unsigned int maxpages,
+			       unsigned int extract_flags,
+			       size_t *offset0)
+{
+	maxsize = min_t(size_t, min_t(size_t, maxsize, i->count), MAX_RW_COUNT);
+	if (!maxsize)
+		return 0;
+
+	if (likely(user_backed_iter(i)))
+		return iov_iter_extract_user_pages(i, pages, maxsize,
+						   maxpages, extract_flags,
+						   offset0);
+	if (iov_iter_is_kvec(i))
+		return iov_iter_extract_kvec_pages(i, pages, maxsize,
+						   maxpages, extract_flags,
+						   offset0);
+	if (iov_iter_is_bvec(i))
+		return iov_iter_extract_bvec_pages(i, pages, maxsize,
+						   maxpages, extract_flags,
+						   offset0);
+	if (iov_iter_is_pipe(i))
+		return iov_iter_extract_pipe_pages(i, pages, maxsize,
+						   maxpages, extract_flags,
+						   offset0);
+	if (iov_iter_is_xarray(i))
+		return iov_iter_extract_xarray_pages(i, pages, maxsize,
+						     maxpages, extract_flags,
+						     offset0);
+	return -EFAULT;
+}
+EXPORT_SYMBOL_GPL(iov_iter_extract_pages);



* [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page
  2023-01-23 17:29 [PATCH v8 00/10] iov_iter: Improve page extraction (pin or just list) David Howells
  2023-01-23 17:29 ` [PATCH v8 01/10] iov_iter: Define flags to qualify page extraction David Howells
  2023-01-23 17:29 ` [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator David Howells
@ 2023-01-23 17:30 ` David Howells
  2023-01-23 18:21   ` Christoph Hellwig
                     ` (3 more replies)
  2023-01-23 17:30 ` [PATCH v8 04/10] iomap: don't get a reference on ZERO_PAGE for direct I/O block zeroing David Howells
                   ` (9 subsequent siblings)
  12 siblings, 4 replies; 75+ messages in thread
From: David Howells @ 2023-01-23 17:30 UTC (permalink / raw)
  To: Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig, linux-mm

Provide a helper in the get_user_pages code to drop a pin or a ref on a
page, based on being given FOLL_GET or FOLL_PIN in its flags argument, or
to do nothing if neither is set.
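
For illustration (a sketch, not part of the patch), the helper makes the
release path uniform regardless of how the page was obtained:

	page_put_unpin(page, FOLL_PIN);		/* unpin the page */
	page_put_unpin(page, FOLL_GET);		/* drop a reference */
	page_put_unpin(page, 0);		/* leave the page untouched */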

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Christoph Hellwig <hch@lst.de>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
---
 include/linux/mm.h |  3 +++
 mm/gup.c           | 22 ++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8f857163ac89..3de9d88f8524 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1367,6 +1367,9 @@ static inline bool is_cow_mapping(vm_flags_t flags)
 #define SECTION_IN_PAGE_FLAGS
 #endif
 
+void folio_put_unpin(struct folio *folio, unsigned int flags);
+void page_put_unpin(struct page *page, unsigned int flags);
+
 /*
  * The identification function is mainly used by the buddy allocator for
  * determining if two pages could be buddies. We are not really identifying
diff --git a/mm/gup.c b/mm/gup.c
index f45a3a5be53a..3ee4b4c7e0cb 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -191,6 +191,28 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
 		folio_put_refs(folio, refs);
 }
 
+/**
+ * folio_put_unpin - Unpin/put a folio as appropriate
+ * @folio: The folio to release
+ * @flags: gup flags indicating the mode of release (FOLL_*)
+ *
+ * Release a folio according to the flags.  If FOLL_GET is set, the folio has a
+ * ref dropped; if FOLL_PIN is set, it is unpinned; otherwise it is left
+ * unaltered.
+ */
+void folio_put_unpin(struct folio *folio, unsigned int flags)
+{
+	if (flags & (FOLL_GET | FOLL_PIN))
+		gup_put_folio(folio, 1, flags);
+}
+EXPORT_SYMBOL_GPL(folio_put_unpin);
+
+void page_put_unpin(struct page *page, unsigned int flags)
+{
+	folio_put_unpin(page_folio(page), flags);
+}
+EXPORT_SYMBOL_GPL(page_put_unpin);
+
 /**
  * try_grab_page() - elevate a page's refcount by a flag-dependent amount
  * @page:    pointer to page to be grabbed



* [PATCH v8 04/10] iomap: don't get a reference on ZERO_PAGE for direct I/O block zeroing
  2023-01-23 17:29 [PATCH v8 00/10] iov_iter: Improve page extraction (pin or just list) David Howells
                   ` (2 preceding siblings ...)
  2023-01-23 17:30 ` [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page David Howells
@ 2023-01-23 17:30 ` David Howells
  2023-01-23 18:22   ` Christoph Hellwig
                     ` (2 more replies)
  2023-01-23 17:30 ` [PATCH v8 05/10] block: Fix bio_flagged() so that gcc can better optimise it David Howells
                   ` (8 subsequent siblings)
  12 siblings, 3 replies; 75+ messages in thread
From: David Howells @ 2023-01-23 17:30 UTC (permalink / raw)
  To: Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig

From: Christoph Hellwig <hch@lst.de>

ZERO_PAGE can't go away, no need to hold an extra reference.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Howells <dhowells@redhat.com>
---
 fs/iomap/direct-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 9804714b1751..47db4ead1e74 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -202,7 +202,7 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
 	bio->bi_private = dio;
 	bio->bi_end_io = iomap_dio_bio_end_io;
 
-	get_page(page);
+	bio_set_flag(bio, BIO_NO_PAGE_REF);
 	__bio_add_page(bio, page, len, 0);
 	iomap_dio_submit_bio(iter, dio, bio, pos);
 }



* [PATCH v8 05/10] block: Fix bio_flagged() so that gcc can better optimise it
  2023-01-23 17:29 [PATCH v8 00/10] iov_iter: Improve page extraction (pin or just list) David Howells
                   ` (3 preceding siblings ...)
  2023-01-23 17:30 ` [PATCH v8 04/10] iomap: don't get a reference on ZERO_PAGE for direct I/O block zeroing David Howells
@ 2023-01-23 17:30 ` David Howells
  2023-01-23 17:30 ` [PATCH v8 06/10] block: Rename BIO_NO_PAGE_REF to BIO_PAGE_REFFED and invert the meaning David Howells
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 75+ messages in thread
From: David Howells @ 2023-01-23 17:30 UTC (permalink / raw)
  To: Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig

Fix bio_flagged() so that multiple instances of it, such as:

	if (bio_flagged(bio, BIO_PAGE_REFFED) ||
	    bio_flagged(bio, BIO_PAGE_PINNED))

can be combined by the gcc optimiser into a single test in assembly
(arguably, this is a compiler optimisation issue[1]).

The missed optimisation stems from bio_flagged() comparing the result of
the bitwise-AND to zero.  This results in an out-of-line bio_release_pages()
being compiled to something like:

   <+0>:     mov    0x14(%rdi),%eax
   <+3>:     test   $0x1,%al
   <+5>:     jne    0xffffffff816dac53 <bio_release_pages+11>
   <+7>:     test   $0x2,%al
   <+9>:     je     0xffffffff816dac5c <bio_release_pages+20>
   <+11>:    movzbl %sil,%esi
   <+15>:    jmp    0xffffffff816daba1 <__bio_release_pages>
   <+20>:    jmp    0xffffffff81d0b800 <__x86_return_thunk>

However, the test is superfluous as the return type is bool.  Removing it
results in:

   <+0>:     testb  $0x3,0x14(%rdi)
   <+4>:     je     0xffffffff816e4af4 <bio_release_pages+15>
   <+6>:     movzbl %sil,%esi
   <+10>:    jmp    0xffffffff816dab7c <__bio_release_pages>
   <+15>:    jmp    0xffffffff81d0b7c0 <__x86_return_thunk>

instead.

Also, the MOVZBL instruction looks unnecessary[2] - I think it's just
're-booling' the mark_dirty parameter.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: linux-block@vger.kernel.org
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108370 [1]
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108371 [2]
Link: https://lore.kernel.org/r/167391056756.2311931.356007731815807265.stgit@warthog.procyon.org.uk/ # v6
---
 include/linux/bio.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index c1da63f6c808..10366b8bdb13 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -227,7 +227,7 @@ static inline void bio_cnt_set(struct bio *bio, unsigned int count)
 
 static inline bool bio_flagged(struct bio *bio, unsigned int bit)
 {
-	return (bio->bi_flags & (1U << bit)) != 0;
+	return bio->bi_flags & (1U << bit);
 }
 
 static inline void bio_set_flag(struct bio *bio, unsigned int bit)



* [PATCH v8 06/10] block: Rename BIO_NO_PAGE_REF to BIO_PAGE_REFFED and invert the meaning
  2023-01-23 17:29 [PATCH v8 00/10] iov_iter: Improve page extraction (pin or just list) David Howells
                   ` (4 preceding siblings ...)
  2023-01-23 17:30 ` [PATCH v8 05/10] block: Fix bio_flagged() so that gcc can better optimise it David Howells
@ 2023-01-23 17:30 ` David Howells
  2023-01-23 18:23   ` Christoph Hellwig
  2023-01-23 17:30 ` [PATCH v8 07/10] block: Switch to pinning pages David Howells
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 75+ messages in thread
From: David Howells @ 2023-01-23 17:30 UTC (permalink / raw)
  To: Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig

From: Christoph Hellwig <hch@lst.de>

Replace BIO_NO_PAGE_REF with a BIO_PAGE_REFFED flag that has the inverted
meaning and is only set when a page reference has been acquired that needs
to be released by bio_release_pages().

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Jan Kara <jack@suse.cz>
cc: Matthew Wilcox <willy@infradead.org>
cc: Logan Gunthorpe <logang@deltatee.com>
cc: linux-block@vger.kernel.org
---

Notes:
    ver #8)
     - Don't default to BIO_PAGE_REFFED [hch].
    
    ver #5)
     - Split from patch that uses iov_iter_extract_pages().

 block/bio.c               | 2 +-
 block/blk-map.c           | 1 +
 fs/direct-io.c            | 2 ++
 fs/iomap/direct-io.c      | 1 -
 include/linux/bio.h       | 2 +-
 include/linux/blk_types.h | 2 +-
 6 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index a289bbff036f..40c2b01906da 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1198,7 +1198,6 @@ void bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter)
 	bio->bi_io_vec = (struct bio_vec *)iter->bvec;
 	bio->bi_iter.bi_bvec_done = iter->iov_offset;
 	bio->bi_iter.bi_size = size;
-	bio_set_flag(bio, BIO_NO_PAGE_REF);
 	bio_set_flag(bio, BIO_CLONED);
 }
 
@@ -1343,6 +1342,7 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 		return 0;
 	}
 
+	bio_set_flag(bio, BIO_PAGE_REFFED);
 	do {
 		ret = __bio_iov_iter_get_pages(bio, iter);
 	} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
diff --git a/block/blk-map.c b/block/blk-map.c
index bc111261fc82..a4ada4389d5e 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -282,6 +282,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 	if (blk_queue_pci_p2pdma(rq->q))
 		extract_flags |= ITER_ALLOW_P2PDMA;
 
+	bio_set_flag(bio, BIO_PAGE_REFFED);
 	while (iov_iter_count(iter)) {
 		struct page **pages, *stack_pages[UIO_FASTIOV];
 		ssize_t bytes;
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 03d381377ae1..07810465fc9d 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -403,6 +403,8 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
 		bio->bi_end_io = dio_bio_end_aio;
 	else
 		bio->bi_end_io = dio_bio_end_io;
+	/* for now require references for all pages */
+	bio_set_flag(bio, BIO_PAGE_REFFED);
 	sdio->bio = bio;
 	sdio->logical_offset_in_bio = sdio->cur_page_fs_offset;
 }
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 47db4ead1e74..c0e75900e754 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -202,7 +202,6 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
 	bio->bi_private = dio;
 	bio->bi_end_io = iomap_dio_bio_end_io;
 
-	bio_set_flag(bio, BIO_NO_PAGE_REF);
 	__bio_add_page(bio, page, len, 0);
 	iomap_dio_submit_bio(iter, dio, bio, pos);
 }
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 10366b8bdb13..805957c99147 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -484,7 +484,7 @@ void zero_fill_bio(struct bio *bio);
 
 static inline void bio_release_pages(struct bio *bio, bool mark_dirty)
 {
-	if (!bio_flagged(bio, BIO_NO_PAGE_REF))
+	if (bio_flagged(bio, BIO_PAGE_REFFED))
 		__bio_release_pages(bio, mark_dirty);
 }
 
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 99be590f952f..7daa261f4f98 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -318,7 +318,7 @@ struct bio {
  * bio flags
  */
 enum {
-	BIO_NO_PAGE_REF,	/* don't put release vec pages */
+	BIO_PAGE_REFFED,	/* put pages in bio_release_pages() */
 	BIO_CLONED,		/* doesn't own data */
 	BIO_BOUNCED,		/* bio is a bounce bio */
 	BIO_QUIET,		/* Make BIO Quiet */



* [PATCH v8 07/10] block: Switch to pinning pages.
  2023-01-23 17:29 [PATCH v8 00/10] iov_iter: Improve page extraction (pin or just list) David Howells
                   ` (5 preceding siblings ...)
  2023-01-23 17:30 ` [PATCH v8 06/10] block: Rename BIO_NO_PAGE_REF to BIO_PAGE_REFFED and invert the meaning David Howells
@ 2023-01-23 17:30 ` David Howells
  2023-01-23 18:23   ` Christoph Hellwig
                     ` (2 more replies)
  2023-01-23 17:30 ` [PATCH v8 08/10] block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages David Howells
                   ` (5 subsequent siblings)
  12 siblings, 3 replies; 75+ messages in thread
From: David Howells @ 2023-01-23 17:30 UTC (permalink / raw)
  To: Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig

Add BIO_PAGE_PINNED to indicate that the pages in a bio are pinned
(FOLL_PIN) and that the pin will need removing.
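
A rough sketch of the intended flow (not part of the patch; the helpers are
added to block/blk.h below and first used by a later patch in this series):

	bio_set_cleanup_mode(bio, iter);	/* record FOLL_PIN as BIO_PAGE_PINNED */
	...
	bio_release_page(bio, page);		/* unpin, put or do nothing as recorded */
	...
	bio_release_pages(bio, mark_dirty);	/* bulk cleanup at completion */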

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Jan Kara <jack@suse.cz>
cc: Christoph Hellwig <hch@lst.de>
cc: Matthew Wilcox <willy@infradead.org>
cc: Logan Gunthorpe <logang@deltatee.com>
cc: linux-block@vger.kernel.org
---

Notes:
    ver #8)
     - Move the infrastructure to clean up pinned pages to this patch [hch].
     - Put BIO_PAGE_PINNED before BIO_PAGE_REFFED as the latter should
       probably be removed at some point.  FOLL_PIN can then be renumbered
       first.

 block/bio.c               |  7 ++++---
 block/blk.h               | 28 ++++++++++++++++++++++++++++
 include/linux/bio.h       |  3 ++-
 include/linux/blk_types.h |  1 +
 4 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 40c2b01906da..6f98bcfc0c92 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1170,13 +1170,14 @@ bool bio_add_folio(struct bio *bio, struct folio *folio, size_t len,
 
 void __bio_release_pages(struct bio *bio, bool mark_dirty)
 {
+	unsigned int gup_flags = bio_to_gup_flags(bio);
 	struct bvec_iter_all iter_all;
 	struct bio_vec *bvec;
 
 	bio_for_each_segment_all(bvec, bio, iter_all) {
 		if (mark_dirty && !PageCompound(bvec->bv_page))
 			set_page_dirty_lock(bvec->bv_page);
-		put_page(bvec->bv_page);
+		page_put_unpin(bvec->bv_page, gup_flags);
 	}
 }
 EXPORT_SYMBOL_GPL(__bio_release_pages);
@@ -1496,8 +1497,8 @@ void bio_set_pages_dirty(struct bio *bio)
  * the BIO and re-dirty the pages in process context.
  *
  * It is expected that bio_check_pages_dirty() will wholly own the BIO from
- * here on.  It will run one put_page() against each page and will run one
- * bio_put() against the BIO.
+ * here on.  It will run one page_put_unpin() against each page and will run
+ * one bio_put() against the BIO.
  */
 
 static void bio_dirty_fn(struct work_struct *work);
diff --git a/block/blk.h b/block/blk.h
index 4c3b3325219a..294044d696e0 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -425,6 +425,34 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio,
 		struct page *page, unsigned int len, unsigned int offset,
 		unsigned int max_sectors, bool *same_page);
 
+/*
+ * Set the cleanup mode for a bio from an iterator.
+ */
+static inline void bio_set_cleanup_mode(struct bio *bio, struct iov_iter *iter)
+{
+	unsigned int cleanup_mode = iov_iter_extract_mode(iter);
+
+	if (cleanup_mode & FOLL_GET)
+		bio_set_flag(bio, BIO_PAGE_REFFED);
+	if (cleanup_mode & FOLL_PIN)
+		bio_set_flag(bio, BIO_PAGE_PINNED);
+}
+
+static inline unsigned int bio_to_gup_flags(struct bio *bio)
+{
+	return (bio_flagged(bio, BIO_PAGE_REFFED) ? FOLL_GET : 0) |
+		(bio_flagged(bio, BIO_PAGE_PINNED) ? FOLL_PIN : 0);
+}
+
+/*
+ * Clean up a page appropriately, where the page may be pinned, may have a
+ * ref taken on it or neither.
+ */
+static inline void bio_release_page(struct bio *bio, struct page *page)
+{
+	page_put_unpin(page, bio_to_gup_flags(bio));
+}
+
 struct request_queue *blk_alloc_queue(int node_id);
 
 int disk_scan_partitions(struct gendisk *disk, fmode_t mode, void *owner);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 805957c99147..b2c09997d79c 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -484,7 +484,8 @@ void zero_fill_bio(struct bio *bio);
 
 static inline void bio_release_pages(struct bio *bio, bool mark_dirty)
 {
-	if (bio_flagged(bio, BIO_PAGE_REFFED))
+	if (bio_flagged(bio, BIO_PAGE_REFFED) ||
+	    bio_flagged(bio, BIO_PAGE_PINNED))
 		__bio_release_pages(bio, mark_dirty);
 }
 
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 7daa261f4f98..a0e339ff3d09 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -318,6 +318,7 @@ struct bio {
  * bio flags
  */
 enum {
+	BIO_PAGE_PINNED,	/* Unpin pages in bio_release_pages() */
 	BIO_PAGE_REFFED,	/* put pages in bio_release_pages() */
 	BIO_CLONED,		/* doesn't own data */
 	BIO_BOUNCED,		/* bio is a bounce bio */



* [PATCH v8 08/10] block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages
  2023-01-23 17:29 [PATCH v8 00/10] iov_iter: Improve page extraction (pin or just list) David Howells
                   ` (6 preceding siblings ...)
  2023-01-23 17:30 ` [PATCH v8 07/10] block: Switch to pinning pages David Howells
@ 2023-01-23 17:30 ` David Howells
  2023-01-23 18:23   ` Christoph Hellwig
  2023-01-23 17:30 ` [PATCH v8 09/10] block: convert bio_map_user_iov " David Howells
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 75+ messages in thread
From: David Howells @ 2023-01-23 17:30 UTC (permalink / raw)
  To: Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig

This will pin pages or leave them unaltered, as appropriate to the iterator,
rather than getting a ref on them.

The pages need to be pinned for DIO rather than having refs taken on them to
prevent VM copy-on-write from malfunctioning during a concurrent fork() (the
result of the I/O could otherwise end up being affected by/visible to the
child process).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Jan Kara <jack@suse.cz>
cc: Christoph Hellwig <hch@lst.de>
cc: Matthew Wilcox <willy@infradead.org>
cc: Logan Gunthorpe <logang@deltatee.com>
cc: linux-block@vger.kernel.org
---

Notes:
    ver #8)
     - Split the patch up a bit [hch].
     - We should only be using pinned/non-pinned pages and not ref'd pages,
       so adjust the comments appropriately.
    
    ver #7)
     - Don't treat BIO_PAGE_REFFED/PINNED as being the same as FOLL_GET/PIN.
    
    ver #5)
     - Transcribe the FOLL_* flags returned by iov_iter_extract_pages() to
       BIO_* flags and got rid of bi_cleanup_mode.
     - Replaced BIO_NO_PAGE_REF to BIO_PAGE_REFFED in the preceding patch.

 block/bio.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 6f98bcfc0c92..6dc54bf3ed27 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1213,7 +1213,7 @@ static int bio_iov_add_page(struct bio *bio, struct page *page,
 	}
 
 	if (same_page)
-		put_page(page);
+		bio_release_page(bio, page);
 	return 0;
 }
 
@@ -1227,7 +1227,7 @@ static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
 			queue_max_zone_append_sectors(q), &same_page) != len)
 		return -EINVAL;
 	if (same_page)
-		put_page(page);
+		bio_release_page(bio, page);
 	return 0;
 }
 
@@ -1238,10 +1238,10 @@ static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
  * @bio: bio to add pages to
  * @iter: iov iterator describing the region to be mapped
  *
- * Pins pages from *iter and appends them to @bio's bvec array. The
- * pages will have to be released using put_page() when done.
- * For multi-segment *iter, this function only adds pages from the
- * next non-empty segment of the iov iterator.
+ * Extracts pages from *iter and appends them to @bio's bvec array.  The pages
+ * will have to be cleaned up in the way indicated by the BIO_PAGE_PINNED flag.
+ * For a multi-segment *iter, this function only adds pages from the next
+ * non-empty segment of the iov iterator.
  */
 static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 {
@@ -1273,9 +1273,9 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	 * result to ensure the bio's total size is correct. The remainder of
 	 * the iov data will be picked up in the next bio iteration.
 	 */
-	size = iov_iter_get_pages(iter, pages,
-				  UINT_MAX - bio->bi_iter.bi_size,
-				  nr_pages, &offset, extract_flags);
+	size = iov_iter_extract_pages(iter, &pages,
+				      UINT_MAX - bio->bi_iter.bi_size,
+				      nr_pages, extract_flags, &offset);
 	if (unlikely(size <= 0))
 		return size ? size : -EFAULT;
 
@@ -1308,7 +1308,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	iov_iter_revert(iter, left);
 out:
 	while (i < nr_pages)
-		put_page(pages[i++]);
+		bio_release_page(bio, pages[i++]);
 
 	return ret;
 }
@@ -1343,7 +1343,7 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 		return 0;
 	}
 
-	bio_set_flag(bio, BIO_PAGE_REFFED);
+	bio_set_cleanup_mode(bio, iter);
 	do {
 		ret = __bio_iov_iter_get_pages(bio, iter);
 	} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));



* [PATCH v8 09/10] block: convert bio_map_user_iov to use iov_iter_extract_pages
  2023-01-23 17:29 [PATCH v8 00/10] iov_iter: Improve page extraction (pin or just list) David Howells
                   ` (7 preceding siblings ...)
  2023-01-23 17:30 ` [PATCH v8 08/10] block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages David Howells
@ 2023-01-23 17:30 ` David Howells
  2023-01-23 18:24   ` Christoph Hellwig
  2023-01-23 17:30 ` [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down David Howells
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 75+ messages in thread
From: David Howells @ 2023-01-23 17:30 UTC (permalink / raw)
  To: Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig

This will pin pages or leave them unaltered, as appropriate to the iterator,
rather than getting a ref on them.

The pages need to be pinned for DIO rather than having refs taken on them
to prevent VM copy-on-write from malfunctioning during a concurrent fork()
(the result of the I/O could otherwise end up being visible to/affected by
the child process).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Jan Kara <jack@suse.cz>
cc: Christoph Hellwig <hch@lst.de>
cc: Matthew Wilcox <willy@infradead.org>
cc: Logan Gunthorpe <logang@deltatee.com>
cc: linux-block@vger.kernel.org
---

Notes:
    ver #8)
     - Split the patch up a bit [hch].
     - We should only be using pinned/non-pinned pages and not ref'd pages,
       so adjust the comments appropriately.
    
    ver #7)
     - Don't treat BIO_PAGE_REFFED/PINNED as being the same as FOLL_GET/PIN.
    
    ver #5)
     - Transcribe the FOLL_* flags returned by iov_iter_extract_pages() to
       BIO_* flags and got rid of bi_cleanup_mode.
     - Replaced BIO_NO_PAGE_REF to BIO_PAGE_REFFED in the preceding patch.

 block/blk-map.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/block/blk-map.c b/block/blk-map.c
index a4ada4389d5e..b9b36af3c3f7 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -282,21 +282,19 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 	if (blk_queue_pci_p2pdma(rq->q))
 		extract_flags |= ITER_ALLOW_P2PDMA;
 
-	bio_set_flag(bio, BIO_PAGE_REFFED);
+	bio_set_cleanup_mode(bio, iter);
 	while (iov_iter_count(iter)) {
-		struct page **pages, *stack_pages[UIO_FASTIOV];
+		struct page *stack_pages[UIO_FASTIOV];
+		struct page **pages = stack_pages;
 		ssize_t bytes;
 		size_t offs;
 		int npages;
 
-		if (nr_vecs <= ARRAY_SIZE(stack_pages)) {
-			pages = stack_pages;
-			bytes = iov_iter_get_pages(iter, pages, LONG_MAX,
-						   nr_vecs, &offs, extract_flags);
-		} else {
-			bytes = iov_iter_get_pages_alloc(iter, &pages,
-						LONG_MAX, &offs, extract_flags);
-		}
+		if (nr_vecs > ARRAY_SIZE(stack_pages))
+			pages = NULL;
+
+		bytes = iov_iter_extract_pages(iter, &pages, LONG_MAX,
+					       nr_vecs, extract_flags, &offs);
 		if (unlikely(bytes <= 0)) {
 			ret = bytes ? bytes : -EFAULT;
 			goto out_unmap;
@@ -318,7 +316,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 				if (!bio_add_hw_page(rq->q, bio, page, n, offs,
 						     max_sectors, &same_page)) {
 					if (same_page)
-						put_page(page);
+						bio_release_page(bio, page);
 					break;
 				}
 
@@ -330,7 +328,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 		 * release the pages we didn't map into the bio, if any
 		 */
 		while (j < npages)
-			put_page(pages[j++]);
+			bio_release_page(bio, pages[j++]);
 		if (pages != stack_pages)
 			kvfree(pages);
 		/* couldn't stuff something into bio? */



* [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down
  2023-01-23 17:29 [PATCH v8 00/10] iov_iter: Improve page extraction (pin or just list) David Howells
                   ` (8 preceding siblings ...)
  2023-01-23 17:30 ` [PATCH v8 09/10] block: convert bio_map_user_iov " David Howells
@ 2023-01-23 17:30 ` David Howells
  2023-01-23 18:25   ` Christoph Hellwig
                     ` (2 more replies)
  2023-01-24  2:02 ` [PATCH v8 00/10] iov_iter: Improve page extraction (pin or just list) John Hubbard
                   ` (2 subsequent siblings)
  12 siblings, 3 replies; 75+ messages in thread
From: David Howells @ 2023-01-23 17:30 UTC (permalink / raw)
  To: Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig, linux-mm

Renumber FOLL_PIN and FOLL_GET down to bits 0 and 1 respectively so that
they are coincident with BIO_PAGE_PINNED and BIO_PAGE_REFFED, and also so
that they can be stored in the bottom two bits of a page pointer (something
I'm looking at for zerocopy socket fragments).

(Note that BIO_PAGE_REFFED should probably be got rid of at some point,
hence why FOLL_PIN is at 0.)

Also renumber down the other FOLL_* flags to close the gaps.
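
As an illustration only (this is not part of the patch), packing the cleanup
mode into the low bits of a page pointer could look something like the
following sketch; the helper names are hypothetical:

	/* Hypothetical helpers: stash FOLL_PIN/FOLL_GET in a pointer's low bits. */
	#define FRAG_MODE_MASK	0x3UL

	static inline unsigned long frag_encode(struct page *page,
						unsigned int cleanup_mode)
	{
		return (unsigned long)page | (cleanup_mode & FRAG_MODE_MASK);
	}

	static inline struct page *frag_page(unsigned long frag)
	{
		return (struct page *)(frag & ~FRAG_MODE_MASK);
	}

	static inline unsigned int frag_cleanup_mode(unsigned long frag)
	{
		return frag & FRAG_MODE_MASK;
	}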

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Christoph Hellwig <hch@lst.de>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
---

Notes:
    ver #8)
     - Put FOLL_PIN at bit 0 and FOLL_GET at bit 1 to match BIO_PAGE_*.
     - Renumber the remaining flags down to fill in the gap.

 include/linux/mm.h | 32 +++++++++++++++++---------------
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3de9d88f8524..c95bc4f77e8f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3074,26 +3074,28 @@ static inline vm_fault_t vmf_error(int err)
 struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 			 unsigned int foll_flags);
 
-#define FOLL_WRITE	0x01	/* check pte is writable */
-#define FOLL_TOUCH	0x02	/* mark page accessed */
-#define FOLL_GET	0x04	/* do get_page on page */
-#define FOLL_DUMP	0x08	/* give error on hole if it would be zero */
-#define FOLL_FORCE	0x10	/* get_user_pages read/write w/o permission */
-#define FOLL_NOWAIT	0x20	/* if a disk transfer is needed, start the IO
+#define FOLL_PIN	0x01	/* pages must be released via unpin_user_page */
+#define FOLL_GET	0x02	/* do get_page on page (equivalent to BIO_PAGE_REFFED) */
+#define FOLL_WRITE	0x04	/* check pte is writable */
+#define FOLL_TOUCH	0x08	/* mark page accessed */
+#define FOLL_DUMP	0x10	/* give error on hole if it would be zero */
+#define FOLL_FORCE	0x20	/* get_user_pages read/write w/o permission */
+#define FOLL_NOWAIT	0x40	/* if a disk transfer is needed, start the IO
 				 * and return without waiting upon it */
 #define FOLL_NOFAULT	0x80	/* do not fault in pages */
 #define FOLL_HWPOISON	0x100	/* check page is hwpoisoned */
-#define FOLL_TRIED	0x800	/* a retry, previous pass started an IO */
-#define FOLL_REMOTE	0x2000	/* we are working on non-current tsk/mm */
-#define FOLL_ANON	0x8000	/* don't do file mappings */
-#define FOLL_LONGTERM	0x10000	/* mapping lifetime is indefinite: see below */
-#define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
-#define FOLL_PIN	0x40000	/* pages must be released via unpin_user_page */
-#define FOLL_FAST_ONLY	0x80000	/* gup_fast: prevent fall-back to slow gup */
-#define FOLL_PCI_P2PDMA	0x100000 /* allow returning PCI P2PDMA pages */
-#define FOLL_INTERRUPTIBLE  0x200000 /* allow interrupts from generic signals */
+#define FOLL_TRIED	0x200	/* a retry, previous pass started an IO */
+#define FOLL_REMOTE	0x400	/* we are working on non-current tsk/mm */
+#define FOLL_ANON	0x800	/* don't do file mappings */
+#define FOLL_LONGTERM	0x1000	/* mapping lifetime is indefinite: see below */
+#define FOLL_SPLIT_PMD	0x2000	/* split huge pmd before returning */
+#define FOLL_FAST_ONLY	0x4000	/* gup_fast: prevent fall-back to slow gup */
+#define FOLL_PCI_P2PDMA	0x8000 /* allow returning PCI P2PDMA pages */
+#define FOLL_INTERRUPTIBLE  0x10000 /* allow interrupts from generic signals */
 
 /*
+ * Note that FOLL_PIN is sorted to bit 0 to be coincident with BIO_PAGE_PINNED.
+ *
  * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
  * other. Here is what they mean, and how to use them:
  *


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 01/10] iov_iter: Define flags to qualify page extraction.
  2023-01-23 17:29 ` [PATCH v8 01/10] iov_iter: Define flags to qualify page extraction David Howells
@ 2023-01-23 18:20   ` Christoph Hellwig
  2023-01-24  2:12   ` John Hubbard
  1 sibling, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2023-01-23 18:20 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara,
	Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block,
	linux-kernel

On Mon, Jan 23, 2023 at 05:29:58PM +0000, David Howells wrote:
> Define flags to qualify page extraction to pass into iov_iter_*_pages*()
> rather than passing in FOLL_* flags.
> 
> For now only a flag to allow peer-to-peer DMA is supported.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator
  2023-01-23 17:29 ` [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator David Howells
@ 2023-01-23 18:21   ` Christoph Hellwig
  2023-01-24 14:27   ` David Hildenbrand
  2023-01-24 14:35   ` David Howells
  2 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2023-01-23 18:21 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara,
	Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block,
	linux-kernel, Christoph Hellwig, John Hubbard, David Hildenbrand,
	linux-mm

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page
  2023-01-23 17:30 ` [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page David Howells
@ 2023-01-23 18:21   ` Christoph Hellwig
  2023-01-24  3:03   ` John Hubbard
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2023-01-23 18:21 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara,
	Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block,
	linux-kernel, Christoph Hellwig, linux-mm

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 04/10] iomap: don't get a reference on ZERO_PAGE for direct I/O block zeroing
  2023-01-23 17:30 ` [PATCH v8 04/10] iomap: don't get a reference on ZERO_PAGE for direct I/O block zeroing David Howells
@ 2023-01-23 18:22   ` Christoph Hellwig
  2023-01-24  2:42   ` John Hubbard
  2023-01-24 14:29   ` David Hildenbrand
  2 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2023-01-23 18:22 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara,
	Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block,
	linux-kernel, Christoph Hellwig

On Mon, Jan 23, 2023 at 05:30:01PM +0000, David Howells wrote:
> From: Christoph Hellwig <hch@lst.de>
> 
> ZERO_PAGE can't go away, no need to hold an extra reference.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: David Howells <dhowells@redhat.com>

I did originally attribute this to you as it's just extracted as
an intermediate step from your patch.  But I can live with it being
credited to me if you want.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 06/10] block: Rename BIO_NO_PAGE_REF to BIO_PAGE_REFFED and invert the meaning
  2023-01-23 17:30 ` [PATCH v8 06/10] block: Rename BIO_NO_PAGE_REF to BIO_PAGE_REFFED and invert the meaning David Howells
@ 2023-01-23 18:23   ` Christoph Hellwig
  0 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2023-01-23 18:23 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara,
	Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block,
	linux-kernel, Christoph Hellwig

I think the subject is incorrect.  It's not really renamed but
inverted.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 07/10] block: Switch to pinning pages.
  2023-01-23 17:30 ` [PATCH v8 07/10] block: Switch to pinning pages David Howells
@ 2023-01-23 18:23   ` Christoph Hellwig
  2023-01-24 14:32   ` David Hildenbrand
  2023-01-24 14:47   ` David Howells
  2 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2023-01-23 18:23 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara,
	Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block,
	linux-kernel, Christoph Hellwig

On Mon, Jan 23, 2023 at 05:30:04PM +0000, David Howells wrote:
> Add BIO_PAGE_PINNED to indicate that the pages in a bio are pinned
> (FOLL_PIN) and that the pin will need removing.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 08/10] block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages
  2023-01-23 17:30 ` [PATCH v8 08/10] block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages David Howells
@ 2023-01-23 18:23   ` Christoph Hellwig
  0 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2023-01-23 18:23 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara,
	Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block,
	linux-kernel, Christoph Hellwig

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 09/10] block: convert bio_map_user_iov to use iov_iter_extract_pages
  2023-01-23 17:30 ` [PATCH v8 09/10] block: convert bio_map_user_iov " David Howells
@ 2023-01-23 18:24   ` Christoph Hellwig
  0 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2023-01-23 18:24 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara,
	Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block,
	linux-kernel, Christoph Hellwig

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down
  2023-01-23 17:30 ` [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down David Howells
@ 2023-01-23 18:25   ` Christoph Hellwig
  2023-01-24  3:08   ` John Hubbard
  2023-01-24  7:05   ` David Howells
  2 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2023-01-23 18:25 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara,
	Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block,
	linux-kernel, Christoph Hellwig, linux-mm

On Mon, Jan 23, 2023 at 05:30:07PM +0000, David Howells wrote:
> Renumber FOLL_PIN and FOLL_GET down to bit 0 and 1 respectively so that
> they are coincidentally the same as BIO_PAGE_PINNED and BIO_PAGE_REFFED and
> also so that they can be stored in the bottom two bits of a page pointer
> (something I'm looking at for zerocopy socket fragments).
> 
> (Note that BIO_PAGE_REFFED should probably be got rid of at some point,
> hence why FOLL_PIN is at 0.)
> 
> Also renumber down the other FOLL_* flags to close the gaps.

Taking the risk of having a long bikeshed:  I much prefer (1U << bitnr)
style definitions that make it obvious where there are holes, but
otherwise:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 00/10] iov_iter: Improve page extraction (pin or just list)
  2023-01-23 17:29 [PATCH v8 00/10] iov_iter: Improve page extraction (pin or just list) David Howells
                   ` (9 preceding siblings ...)
  2023-01-23 17:30 ` [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down David Howells
@ 2023-01-24  2:02 ` John Hubbard
  2023-01-24 12:44 ` David Hildenbrand
  2023-01-24 13:44 ` David Howells
  12 siblings, 0 replies; 75+ messages in thread
From: John Hubbard @ 2023-01-24  2:02 UTC (permalink / raw)
  To: David Howells, Al Viro, Christoph Hellwig
  Cc: Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel

On 1/23/23 09:29, David Howells wrote:
> Hi Al, Christoph,
> 
> Here are patches to provide support for extracting pages from an iov_iter
> and to use this in the extraction functions in the block layer bio code.
> 

Hi David,

It's great to see this series. I attempted this a few times but got
caught in a loop of "don't quite see all the pieces, but it almost makes
sense...Al Viro has spotted major problems (again)...squirrel!"; and
repeat. :)

I saw your earlier versions go by and expected that they would end up
being an iov_iter prerequisite to getting Direct IO converted over to
FOLL_PIN. But now it looks like you are actually doing the conversion as
well! That's really excellent.

I've made a first pass through the series and have some minor notes
that I'll send out shortly, but it looks nice overall.

thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 01/10] iov_iter: Define flags to qualify page extraction.
  2023-01-23 17:29 ` [PATCH v8 01/10] iov_iter: Define flags to qualify page extraction David Howells
  2023-01-23 18:20   ` Christoph Hellwig
@ 2023-01-24  2:12   ` John Hubbard
  1 sibling, 0 replies; 75+ messages in thread
From: John Hubbard @ 2023-01-24  2:12 UTC (permalink / raw)
  To: David Howells, Al Viro, Christoph Hellwig
  Cc: Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel

On 1/23/23 09:29, David Howells wrote:
> Define flags to qualify page extraction to pass into iov_iter_*_pages*()
> rather than passing in FOLL_* flags.
> 
> For now only a flag to allow peer-to-peer DMA is supported.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Al Viro <viro@zeniv.linux.org.uk>
> cc: Christoph Hellwig <hch@infradead.org>
> cc: Jens Axboe <axboe@kernel.dk>
> cc: Logan Gunthorpe <logang@deltatee.com>
> cc: linux-fsdevel@vger.kernel.org
> cc: linux-block@vger.kernel.org
> ---
> 
> Notes:
>      ver #7)
>       - Don't use FOLL_* as a parameter, but rather define constants
>         specifically to use with iov_iter_*_pages*().
>       - Drop the I/O direction constants for now.
> 
>   block/bio.c         |  6 +++---
>   block/blk-map.c     |  8 ++++----
>   include/linux/uio.h |  7 +++++--
>   lib/iov_iter.c      | 14 ++++++++------
>   4 files changed, 20 insertions(+), 15 deletions(-)

One possible way this could run into problems is if any callers are
violating the new requirement that flags be limited to those used to
govern iov extraction. In other words, if any callers were passing other
gup flags in (because instead of adding to them, the code is now setting
flags to zero and only looking at the new extract_flags). So I checked
for that and there aren't any.

So this looks good. I will lightly suggest renaming extract_flags to
extraction_flags, because it reads like a boolean: "to extract, or not
to extract flags, that is the question". Of course, once you look at the
code it is clear. But the extra "ion" doesn't overflow the 80 cols
anywhere and it is a nice touch.

Up to you. Either way, please feel free to add:

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,
-- 
John Hubbard
NVIDIA

> 
> diff --git a/block/bio.c b/block/bio.c
> index ab59a491a883..a289bbff036f 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1249,7 +1249,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>   	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
>   	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
>   	struct page **pages = (struct page **)bv;
> -	unsigned int gup_flags = 0;
> +	unsigned int extract_flags = 0;
>   	ssize_t size, left;
>   	unsigned len, i = 0;
>   	size_t offset, trim;
> @@ -1264,7 +1264,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>   	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
>   
>   	if (bio->bi_bdev && blk_queue_pci_p2pdma(bio->bi_bdev->bd_disk->queue))
> -		gup_flags |= FOLL_PCI_P2PDMA;
> +		extract_flags |= ITER_ALLOW_P2PDMA;
>   
>   	/*
>   	 * Each segment in the iov is required to be a block size multiple.
> @@ -1275,7 +1275,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>   	 */
>   	size = iov_iter_get_pages(iter, pages,
>   				  UINT_MAX - bio->bi_iter.bi_size,
> -				  nr_pages, &offset, gup_flags);
> +				  nr_pages, &offset, extract_flags);
>   	if (unlikely(size <= 0))
>   		return size ? size : -EFAULT;
>   
> diff --git a/block/blk-map.c b/block/blk-map.c
> index 19940c978c73..bc111261fc82 100644
> --- a/block/blk-map.c
> +++ b/block/blk-map.c
> @@ -267,7 +267,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
>   {
>   	unsigned int max_sectors = queue_max_hw_sectors(rq->q);
>   	unsigned int nr_vecs = iov_iter_npages(iter, BIO_MAX_VECS);
> -	unsigned int gup_flags = 0;
> +	unsigned int extract_flags = 0;
>   	struct bio *bio;
>   	int ret;
>   	int j;
> @@ -280,7 +280,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
>   		return -ENOMEM;
>   
>   	if (blk_queue_pci_p2pdma(rq->q))
> -		gup_flags |= FOLL_PCI_P2PDMA;
> +		extract_flags |= ITER_ALLOW_P2PDMA;
>   
>   	while (iov_iter_count(iter)) {
>   		struct page **pages, *stack_pages[UIO_FASTIOV];
> @@ -291,10 +291,10 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
>   		if (nr_vecs <= ARRAY_SIZE(stack_pages)) {
>   			pages = stack_pages;
>   			bytes = iov_iter_get_pages(iter, pages, LONG_MAX,
> -						   nr_vecs, &offs, gup_flags);
> +						   nr_vecs, &offs, extract_flags);
>   		} else {
>   			bytes = iov_iter_get_pages_alloc(iter, &pages,
> -						LONG_MAX, &offs, gup_flags);
> +						LONG_MAX, &offs, extract_flags);
>   		}
>   		if (unlikely(bytes <= 0)) {
>   			ret = bytes ? bytes : -EFAULT;
> diff --git a/include/linux/uio.h b/include/linux/uio.h
> index 9f158238edba..46d5080314c6 100644
> --- a/include/linux/uio.h
> +++ b/include/linux/uio.h
> @@ -252,12 +252,12 @@ void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *
>   		     loff_t start, size_t count);
>   ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
>   		size_t maxsize, unsigned maxpages, size_t *start,
> -		unsigned gup_flags);
> +		unsigned extract_flags);
>   ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages,
>   			size_t maxsize, unsigned maxpages, size_t *start);
>   ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
>   		struct page ***pages, size_t maxsize, size_t *start,
> -		unsigned gup_flags);
> +		unsigned extract_flags);
>   ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i, struct page ***pages,
>   			size_t maxsize, size_t *start);
>   int iov_iter_npages(const struct iov_iter *i, int maxpages);
> @@ -360,4 +360,7 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
>   	};
>   }
>   
> +/* Flags for iov_iter_get/extract_pages*() */
> +#define ITER_ALLOW_P2PDMA	0x01	/* Allow P2PDMA on the extracted pages */
> +
>   #endif
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index f9a3ff37ecd1..fb04abe7d746 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -1432,9 +1432,9 @@ static struct page *first_bvec_segment(const struct iov_iter *i,
>   static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
>   		   struct page ***pages, size_t maxsize,
>   		   unsigned int maxpages, size_t *start,
> -		   unsigned int gup_flags)
> +		   unsigned int extract_flags)
>   {
> -	unsigned int n;
> +	unsigned int n, gup_flags = 0;
>   
>   	if (maxsize > i->count)
>   		maxsize = i->count;
> @@ -1442,6 +1442,8 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
>   		return 0;
>   	if (maxsize > MAX_RW_COUNT)
>   		maxsize = MAX_RW_COUNT;
> +	if (extract_flags & ITER_ALLOW_P2PDMA)
> +		gup_flags |= FOLL_PCI_P2PDMA;
>   
>   	if (likely(user_backed_iter(i))) {
>   		unsigned long addr;
> @@ -1495,14 +1497,14 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
>   
>   ssize_t iov_iter_get_pages(struct iov_iter *i,
>   		   struct page **pages, size_t maxsize, unsigned maxpages,
> -		   size_t *start, unsigned gup_flags)
> +		   size_t *start, unsigned extract_flags)
>   {
>   	if (!maxpages)
>   		return 0;
>   	BUG_ON(!pages);
>   
>   	return __iov_iter_get_pages_alloc(i, &pages, maxsize, maxpages,
> -					  start, gup_flags);
> +					  start, extract_flags);
>   }
>   EXPORT_SYMBOL_GPL(iov_iter_get_pages);
>   
> @@ -1515,14 +1517,14 @@ EXPORT_SYMBOL(iov_iter_get_pages2);
>   
>   ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
>   		   struct page ***pages, size_t maxsize,
> -		   size_t *start, unsigned gup_flags)
> +		   size_t *start, unsigned extract_flags)
>   {
>   	ssize_t len;
>   
>   	*pages = NULL;
>   
>   	len = __iov_iter_get_pages_alloc(i, pages, maxsize, ~0U, start,
> -					 gup_flags);
> +					 extract_flags);
>   	if (len <= 0) {
>   		kvfree(*pages);
>   		*pages = NULL;
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 04/10] iomap: don't get a reference on ZERO_PAGE for direct I/O block zeroing
  2023-01-23 17:30 ` [PATCH v8 04/10] iomap: don't get a reference on ZERO_PAGE for direct I/O block zeroing David Howells
  2023-01-23 18:22   ` Christoph Hellwig
@ 2023-01-24  2:42   ` John Hubbard
  2023-01-24  5:59     ` Christoph Hellwig
  2023-01-24 14:29   ` David Hildenbrand
  2 siblings, 1 reply; 75+ messages in thread
From: John Hubbard @ 2023-01-24  2:42 UTC (permalink / raw)
  To: David Howells, Al Viro, Christoph Hellwig
  Cc: Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig

On 1/23/23 09:30, David Howells wrote:
> From: Christoph Hellwig <hch@lst.de>
> 
> ZERO_PAGE can't go away, no need to hold an extra reference.

That statement is true, but...

> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: David Howells <dhowells@redhat.com>
> ---
>   fs/iomap/direct-io.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 9804714b1751..47db4ead1e74 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -202,7 +202,7 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
>   	bio->bi_private = dio;
>   	bio->bi_end_io = iomap_dio_bio_end_io;
>   
> -	get_page(page);
> +	bio_set_flag(bio, BIO_NO_PAGE_REF);

...is it accurate to assume that the entire bio is pointing to the zero
page? I recall working through this area earlier last year, and ended up
just letting the zero page get pinned, and then unpinning upon release,
which is harmless.

I like your approach, if it is possible. I'm just not sure that it's
correct given that bio's can have more than one page.

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page
  2023-01-23 17:30 ` [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page David Howells
  2023-01-23 18:21   ` Christoph Hellwig
@ 2023-01-24  3:03   ` John Hubbard
  2023-01-24 14:28   ` David Hildenbrand
  2023-01-24 14:41   ` David Howells
  3 siblings, 0 replies; 75+ messages in thread
From: John Hubbard @ 2023-01-24  3:03 UTC (permalink / raw)
  To: David Howells, Al Viro, Christoph Hellwig, Jason Gunthorpe
  Cc: Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig, linux-mm

On 1/23/23 09:30, David Howells wrote:
> Provide a helper in the get_user_pages code to drop a pin or a ref on a
> page based on being given FOLL_GET or FOLL_PIN in its flags argument or do
> nothing if neither is set.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Al Viro <viro@zeniv.linux.org.uk>
> cc: Christoph Hellwig <hch@lst.de>
> cc: Matthew Wilcox <willy@infradead.org>
> cc: linux-fsdevel@vger.kernel.org
> cc: linux-block@vger.kernel.org
> cc: linux-mm@kvack.org
> ---
>   include/linux/mm.h |  3 +++
>   mm/gup.c           | 22 ++++++++++++++++++++++
>   2 files changed, 25 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 8f857163ac89..3de9d88f8524 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1367,6 +1367,9 @@ static inline bool is_cow_mapping(vm_flags_t flags)
>   #define SECTION_IN_PAGE_FLAGS
>   #endif
>   
> +void folio_put_unpin(struct folio *folio, unsigned int flags);
> +void page_put_unpin(struct page *page, unsigned int flags);

How about these names instead:

     folio_put_or_unpin()
     page_put_or_unpin()

?

Also, could we please change the name of the flags argument, to
gup_flags?

> +
>   /*
>    * The identification function is mainly used by the buddy allocator for
>    * determining if two pages could be buddies. We are not really identifying
> diff --git a/mm/gup.c b/mm/gup.c
> index f45a3a5be53a..3ee4b4c7e0cb 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -191,6 +191,28 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
>   		folio_put_refs(folio, refs);
>   }
>   
> +/**
> + * folio_put_unpin - Unpin/put a folio as appropriate
> + * @folio: The folio to release
> + * @flags: gup flags indicating the mode of release (FOLL_*)
> + *
> + * Release a folio according to the flags.  If FOLL_GET is set, the folio has a
> + * ref dropped; if FOLL_PIN is set, it is unpinned; otherwise it is left
> + * unaltered.
> + */
> +void folio_put_unpin(struct folio *folio, unsigned int flags)
> +{
> +	if (flags & (FOLL_GET | FOLL_PIN))

Another minor complication is that FOLL_PIN is supposed to be an
internal-to-mm flag. But here (and in another part of the series), it
has leaked into the public API. One approach would be to give up and
just admit that, like FOLL_GET, FOLL_PIN has escaped into the wild.

Another approach would be to use a new set of flags, such as
USE_FOLL_GET and USE_FOLL_PIN.
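
(Hypothetically something like:

	#define USE_FOLL_GET	(1U << 0)	/* release with put_page() */
	#define USE_FOLL_PIN	(1U << 1)	/* release with unpin_user_page() */

i.e. public aliases that get translated to the internal flags at the mm
boundary.)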

But I'm starting to lean toward the first approach: just let FOLL_PIN be
used in this way, treat it as part of the external API at least for this
area (not for gup/pup calls, though).

So after all that thinking out loud, I think this is OK to use FOLL_PIN.

+Cc Jason Gunthorpe, because he is about to split up FOLL_* into public
and internal sets.

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down
  2023-01-23 17:30 ` [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down David Howells
  2023-01-23 18:25   ` Christoph Hellwig
@ 2023-01-24  3:08   ` John Hubbard
  2023-01-24  3:11     ` John Hubbard
  2023-01-24  7:05   ` David Howells
  2 siblings, 1 reply; 75+ messages in thread
From: John Hubbard @ 2023-01-24  3:08 UTC (permalink / raw)
  To: David Howells, Al Viro, Christoph Hellwig
  Cc: Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig, linux-mm

On 1/23/23 09:30, David Howells wrote:
> Renumber FOLL_PIN and FOLL_GET down to bit 0 and 1 respectively so that
> they are coincidentally the same as BIO_PAGE_PINNED and BIO_PAGE_REFFED and
> also so that they can be stored in the bottom two bits of a page pointer
> (something I'm looking at for zerocopy socket fragments).
> 
> (Note that BIO_PAGE_REFFED should probably be got rid of at some point,
> hence why FOLL_PIN is at 0.)
> 
> Also renumber down the other FOLL_* flags to close the gaps.

Should we also get these sorted into internal-to-mm and public sets?
Because Jason (+Cc) again was about to split them apart into
mm/internal.h [1] and that might make that a little cleaner.

Also, I don't think that there is any large readability difference
either way between 0x and 1<<, so whatever you and Christoph settle on
there seems fine.

So either way with those points,

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,
-- 
John Hubbard
NVIDIA

> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Al Viro <viro@zeniv.linux.org.uk>
> cc: Christoph Hellwig <hch@lst.de>
> cc: Matthew Wilcox <willy@infradead.org>
> cc: linux-fsdevel@vger.kernel.org
> cc: linux-mm@kvack.org
> ---
> 
> Notes:
>      ver #8)
>       - Put FOLL_PIN at bit 0 and FOLL_GET at bit 1 to match BIO_PAGE_*.
>       - Renumber the remaining flags down to fill in the gap.
> 
>   include/linux/mm.h | 32 +++++++++++++++++---------------
>   1 file changed, 17 insertions(+), 15 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 3de9d88f8524..c95bc4f77e8f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3074,26 +3074,28 @@ static inline vm_fault_t vmf_error(int err)
>   struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
>   			 unsigned int foll_flags);
>   
> -#define FOLL_WRITE	0x01	/* check pte is writable */
> -#define FOLL_TOUCH	0x02	/* mark page accessed */
> -#define FOLL_GET	0x04	/* do get_page on page */
> -#define FOLL_DUMP	0x08	/* give error on hole if it would be zero */
> -#define FOLL_FORCE	0x10	/* get_user_pages read/write w/o permission */
> -#define FOLL_NOWAIT	0x20	/* if a disk transfer is needed, start the IO
> +#define FOLL_PIN	0x01	/* pages must be released via unpin_user_page */
> +#define FOLL_GET	0x02	/* do get_page on page (equivalent to BIO_PAGE_REFFED) */
> +#define FOLL_WRITE	0x04	/* check pte is writable */
> +#define FOLL_TOUCH	0x08	/* mark page accessed */
> +#define FOLL_DUMP	0x10	/* give error on hole if it would be zero */
> +#define FOLL_FORCE	0x20	/* get_user_pages read/write w/o permission */
> +#define FOLL_NOWAIT	0x40	/* if a disk transfer is needed, start the IO
>   				 * and return without waiting upon it */
>   #define FOLL_NOFAULT	0x80	/* do not fault in pages */
>   #define FOLL_HWPOISON	0x100	/* check page is hwpoisoned */
> -#define FOLL_TRIED	0x800	/* a retry, previous pass started an IO */
> -#define FOLL_REMOTE	0x2000	/* we are working on non-current tsk/mm */
> -#define FOLL_ANON	0x8000	/* don't do file mappings */
> -#define FOLL_LONGTERM	0x10000	/* mapping lifetime is indefinite: see below */
> -#define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
> -#define FOLL_PIN	0x40000	/* pages must be released via unpin_user_page */
> -#define FOLL_FAST_ONLY	0x80000	/* gup_fast: prevent fall-back to slow gup */
> -#define FOLL_PCI_P2PDMA	0x100000 /* allow returning PCI P2PDMA pages */
> -#define FOLL_INTERRUPTIBLE  0x200000 /* allow interrupts from generic signals */
> +#define FOLL_TRIED	0x200	/* a retry, previous pass started an IO */
> +#define FOLL_REMOTE	0x400	/* we are working on non-current tsk/mm */
> +#define FOLL_ANON	0x800	/* don't do file mappings */
> +#define FOLL_LONGTERM	0x1000	/* mapping lifetime is indefinite: see below */
> +#define FOLL_SPLIT_PMD	0x2000	/* split huge pmd before returning */
> +#define FOLL_FAST_ONLY	0x4000	/* gup_fast: prevent fall-back to slow gup */
> +#define FOLL_PCI_P2PDMA	0x8000 /* allow returning PCI P2PDMA pages */
> +#define FOLL_INTERRUPTIBLE  0x10000 /* allow interrupts from generic signals */
>   
>   /*
> + * Note that FOLL_PIN is sorted to bit 0 to be coincident with BIO_PAGE_PINNED.
> + *
>    * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
>    * other. Here is what they mean, and how to use them:
>    *
> 
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down
  2023-01-24  3:08   ` John Hubbard
@ 2023-01-24  3:11     ` John Hubbard
  2023-01-24 13:13       ` Jason Gunthorpe
                         ` (2 more replies)
  0 siblings, 3 replies; 75+ messages in thread
From: John Hubbard @ 2023-01-24  3:11 UTC (permalink / raw)
  To: David Howells, Al Viro, Christoph Hellwig, Jason Gunthorpe
  Cc: Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig, linux-mm

On 1/23/23 19:08, John Hubbard wrote:
> On 1/23/23 09:30, David Howells wrote:
>> Renumber FOLL_PIN and FOLL_GET down to bit 0 and 1 respectively so that
>> they are coincidentally the same as BIO_PAGE_PINNED and BIO_PAGE_REFFED and
>> also so that they can be stored in the bottom two bits of a page pointer
>> (something I'm looking at for zerocopy socket fragments).
>>
>> (Note that BIO_PAGE_REFFED should probably be got rid of at some point,
>> hence why FOLL_PIN is at 0.)
>>
>> Also renumber down the other FOLL_* flags to close the gaps.
> 
> Should we also get these sorted into internal-to-mm and public sets?
> Because Jason (+Cc) again was about to split them apart into
> mm/internal.h [1] and that might make that a little cleaner.

I see that I omitted both Jason's Cc and the reference, so I'll resend.
Sorry for the extra noise.

[1] https://lore.kernel.org/all/fcdb3294-3740-a0e0-b115-12842eb0696d@nvidia.com/

> 
> Also, I don't think that there is any large readability difference
> either way between 0x and 1<<, so whatever you and Christoph settle on
> there seems fine.
> 
> So either way with those points,
> 
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> 
> thanks,

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 04/10] iomap: don't get a reference on ZERO_PAGE for direct I/O block zeroing
  2023-01-24  2:42   ` John Hubbard
@ 2023-01-24  5:59     ` Christoph Hellwig
  2023-01-24  7:03       ` John Hubbard
  0 siblings, 1 reply; 75+ messages in thread
From: Christoph Hellwig @ 2023-01-24  5:59 UTC (permalink / raw)
  To: John Hubbard
  Cc: David Howells, Al Viro, Christoph Hellwig, Matthew Wilcox,
	Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe,
	linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig

On Mon, Jan 23, 2023 at 06:42:28PM -0800, John Hubbard wrote:
> > @@ -202,7 +202,7 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
> >   	bio->bi_private = dio;
> >   	bio->bi_end_io = iomap_dio_bio_end_io;
> > -	get_page(page);
> > +	bio_set_flag(bio, BIO_NO_PAGE_REF);
> 
> ...is it accurate to assume that the entire bio is pointing to the zero
> page? I recall working through this area earlier last year, and ended up
> just letting the zero page get pinned, and then unpinning upon release,
> which is harmless.

Yes, the bio is built 4 lines above what is quoted here, and submitted
right after it.  It only contains the ZERO_PAGE.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 04/10] iomap: don't get a reference on ZERO_PAGE for direct I/O block zeroing
  2023-01-24  5:59     ` Christoph Hellwig
@ 2023-01-24  7:03       ` John Hubbard
  0 siblings, 0 replies; 75+ messages in thread
From: John Hubbard @ 2023-01-24  7:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: David Howells, Al Viro, Matthew Wilcox, Jens Axboe, Jan Kara,
	Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block,
	linux-kernel, Christoph Hellwig

On 1/23/23 21:59, Christoph Hellwig wrote:
> On Mon, Jan 23, 2023 at 06:42:28PM -0800, John Hubbard wrote:
>>> @@ -202,7 +202,7 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
>>>   	bio->bi_private = dio;
>>>   	bio->bi_end_io = iomap_dio_bio_end_io;
>>> -	get_page(page);
>>> +	bio_set_flag(bio, BIO_NO_PAGE_REF);
>>
>> ...is it accurate to assume that the entire bio is pointing to the zero
>> page? I recall working through this area earlier last year, and ended up
>> just letting the zero page get pinned, and then unpinning upon release,
>> which is harmless.
> 
> Yes, the bio is built 4 lines above what is quoted here, and submitted
> right after it.  It only contains the ZERO_PAGE.

OK, yes. All good, then.

thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down
  2023-01-23 17:30 ` [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down David Howells
  2023-01-23 18:25   ` Christoph Hellwig
  2023-01-24  3:08   ` John Hubbard
@ 2023-01-24  7:05   ` David Howells
  2 siblings, 0 replies; 75+ messages in thread
From: David Howells @ 2023-01-24  7:05 UTC (permalink / raw)
  To: John Hubbard
  Cc: dhowells, Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe,
	Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel,
	linux-block, linux-kernel, Christoph Hellwig, linux-mm

John Hubbard <jhubbard@nvidia.com> wrote:

> > Renumber FOLL_PIN and FOLL_GET down to bit 0 and 1 respectively so that
> > they are coincidentally the same as BIO_PAGE_PINNED and BIO_PAGE_REFFED and
> > also so that they can be stored in the bottom two bits of a page pointer
> > (something I'm looking at for zerocopy socket fragments).
> > (Note that BIO_PAGE_REFFED should probably be got rid of at some point,
> > hence why FOLL_PIN is at 0.)
> > Also renumber down the other FOLL_* flags to close the gaps.
> 
> Should we also get these sorted into internal-to-mm and public sets?
> Because Jason (+Cc) again was about to split them apart into
> mm/internal.h [1] and that might make that a little cleaner.

My plan was to push this patch by itself through akpm since it's only an
optimisation and not necessary to the rest of the patches here.

David


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 00/10] iov_iter: Improve page extraction (pin or just list)
  2023-01-23 17:29 [PATCH v8 00/10] iov_iter: Improve page extraction (pin or just list) David Howells
                   ` (10 preceding siblings ...)
  2023-01-24  2:02 ` [PATCH v8 00/10] iov_iter: Improve page extraction (pin or just list) John Hubbard
@ 2023-01-24 12:44 ` David Hildenbrand
  2023-01-24 13:16   ` Christoph Hellwig
  2023-01-24 13:44 ` David Howells
  12 siblings, 1 reply; 75+ messages in thread
From: David Hildenbrand @ 2023-01-24 12:44 UTC (permalink / raw)
  To: David Howells, Al Viro, Christoph Hellwig
  Cc: Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel

On 23.01.23 18:29, David Howells wrote:
> Hi Al, Christoph,
> 
> Here are patches to provide support for extracting pages from an iov_iter
> and to use this in the extraction functions in the block layer bio code.
> 
> The patches make the following changes:
> 
>   (1) Add a function, iov_iter_extract_pages() to replace
>       iov_iter_get_pages*() that gets refs, pins or just lists the pages as
>       appropriate to the iterator type.
> 
>       Add a function, iov_iter_extract_mode() that will indicate from the
>       iterator type how the cleanup is to be performed, returning FOLL_PIN
>       or 0.
> 
>   (2) Add a function, folio_put_unpin(), and a wrapper, page_put_unpin(),
>       that take a page and the return from iov_iter_extract_mode() and do
>       the right thing to clean up the page.
> 
>   (3) Make the bio struct carry a pair of flags to indicate the cleanup
>       mode.  BIO_NO_PAGE_REF is replaced with BIO_PAGE_REFFED (equivalent to
>       FOLL_GET) and BIO_PAGE_PINNED (equivalent to FOLL_PIN) is
>       added.
> 
>   (4) Add a function, bio_release_page(), to release a page appropriately to
>       the cleanup mode indicated by the BIO_PAGE_* flags.
> 
>   (5) Make the iter-to-bio code use iov_iter_extract_pages() to retain the
>       pages appropriately and clean them up later.
> 
>   (6) Fix bio_flagged() so that it doesn't prevent a gcc optimisation.
> 
>   (7) Renumber FOLL_PIN and FOLL_GET down so that they're at bits 0 and 1
>       and coincident with BIO_PAGE_PINNED and BIO_PAGE_REFFED.  The compiler
>       can then optimise on that.  Also, it's probably going to be necessary
>       to embed these in the page pointer in sk_buff fragments.  This patch
>       can go independently through the mm tree.

^ I feel like some of that information might be stale now that you're 
only using FOLL_PIN.

> 
> I've pushed the patches here also:
> 
> 	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=iov-extract

I gave this a quick test and it indeed fixes the last remaining test 
case of my O_DIRECT+fork tests [1] that was still failing on upstream 
(test3).


Once landed upstream, if we feel confident enough (I tend to), we could 
adjust the open() man page to state that O_DIRECT can now be run 
concurrently with fork(). Especially, the following documentation might 
be adjusted:

"O_DIRECT  I/Os  should  never  be run concurrently with the fork(2) 
system call, if the memory buffer is a private mapping (i.e., any 
mapping created with the mmap(2) MAP_PRIVATE flag; this includes  memory 
  allocated  on  the  heap  and statically allocated buffers).  Any such 
I/Os, whether submitted via an asynchronous I/O interface or from 
another thread in the  process, should  be completed before fork(2) is 
called.  Failure to do so can result in data corruption and undefined 
behavior in parent and child processes."


This series does not yet fix vmsplice()+hugetlb ... simply because your 
series does not mess with the vmsplice() implementation I assume ;) Once 
vmsplice() uses FOLL_PIN, all cow tests should be passing as well. Easy 
to test:

$ cd tools/testing/selftests/vm/
$ echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
$ echo 2 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
$ ./cow
...
Bail out! 8 out of 190 tests failed
# Totals: pass:181 fail:8 xfail:0 xpass:0 skip:1 error:0


[1] https://gitlab.com/davidhildenbrand/o_direct_fork_tests

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down
  2023-01-24  3:11     ` John Hubbard
@ 2023-01-24 13:13       ` Jason Gunthorpe
  2023-01-24 13:18         ` Christoph Hellwig
  2023-01-24 13:40       ` David Howells
  2023-01-24 13:46       ` David Howells
  2 siblings, 1 reply; 75+ messages in thread
From: Jason Gunthorpe @ 2023-01-24 13:13 UTC (permalink / raw)
  To: John Hubbard
  Cc: David Howells, Al Viro, Christoph Hellwig, Matthew Wilcox,
	Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe,
	linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig,
	linux-mm

On Mon, Jan 23, 2023 at 07:11:42PM -0800, John Hubbard wrote:
> On 1/23/23 19:08, John Hubbard wrote:
> > On 1/23/23 09:30, David Howells wrote:
> > > Renumber FOLL_PIN and FOLL_GET down to bit 0 and 1 respectively so that
> > > they are coincidentally the same as BIO_PAGE_PINNED and BIO_PAGE_REFFED and
> > > also so that they can be stored in the bottom two bits of a page pointer
> > > (something I'm looking at for zerocopy socket fragments).
> > > 
> > > (Note that BIO_PAGE_REFFED should probably be got rid of at some point,
> > > hence why FOLL_PIN is at 0.)
> > > 
> > > Also renumber down the other FOLL_* flags to close the gaps.
> > 
> > Should we also get these sorted into internal-to-mm and public sets?
> > Because Jason (+Cc) again was about to split them apart into
> > mm/internal.h [1] and that might make that a little cleaner.
> 
> I see that I omitted both Jason's Cc and the reference, so I'll resend.
> Sorry for the extra noise.
> 
> [1] https://lore.kernel.org/all/fcdb3294-3740-a0e0-b115-12842eb0696d@nvidia.com/

Yeah, I already wrote a similar patch, using the 1<< notation, 
splitting the internal/external, and rebasing on the move to
mm_types.. I can certainly drop that patch if we'd rather do this.
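
Roughly that notation, just to illustrate (reusing the bit assignments from
this patch rather than quoting my actual patch):

	#define FOLL_PIN	(1U << 0)	/* pages must be released via unpin_user_page */
	#define FOLL_GET	(1U << 1)	/* do get_page on page */
	#define FOLL_WRITE	(1U << 2)	/* check pte is writable */
	#define FOLL_TOUCH	(1U << 3)	/* mark page accessed */

with any gaps in the numbering then standing out at a glance.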

Though, I'm not so keen on using FOLL_ internal flags inside the block
layer.. Can you stick with the BIO versions of these?

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 00/10] iov_iter: Improve page extraction (pin or just list)
  2023-01-24 12:44 ` David Hildenbrand
@ 2023-01-24 13:16   ` Christoph Hellwig
  2023-01-24 13:22     ` David Hildenbrand
  0 siblings, 1 reply; 75+ messages in thread
From: Christoph Hellwig @ 2023-01-24 13:16 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: David Howells, Al Viro, Christoph Hellwig, Matthew Wilcox,
	Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe,
	linux-fsdevel, linux-block, linux-kernel

On Tue, Jan 24, 2023 at 01:44:21PM +0100, David Hildenbrand wrote:
> Once landed upstream, if we feel confident enough (I tend to), we could
> adjust the open() man page to state that O_DIRECT can now be run
> concurrently with fork(). Especially, the following documentation might be
> adjusted:

Note that while this series converts the two most commonly used
O_DIRECT implementations, there are various other ones that do not
pin the pages yet.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down
  2023-01-24 13:13       ` Jason Gunthorpe
@ 2023-01-24 13:18         ` Christoph Hellwig
  2023-01-24 13:43           ` Jason Gunthorpe
  0 siblings, 1 reply; 75+ messages in thread
From: Christoph Hellwig @ 2023-01-24 13:18 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: John Hubbard, David Howells, Al Viro, Christoph Hellwig,
	Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig, linux-mm

On Tue, Jan 24, 2023 at 09:13:30AM -0400, Jason Gunthorpe wrote:
> Yeah, I already wrote a similar patch, using the 1<< notation, 
> splitting the internal/external, and rebasing on the move to
> mm_types.. I can certainly drop that patch if we'd rather do this.

Given that you are doing more work in that area it might be best
to drop this patch from this series.

> Though, I'm not so keen on using FOLL_ internal flags inside the block
> layer.. Can you stick with the BIO versions of these?

The block layer doesn't really use it - the new helper in iov_iter.c
returns it, and the block layer instantly turns it into an internal
flag.  But maybe it should just return a bool pinned (by reference)
now?
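
I.e. something along these lines, purely as a sketch (the helper name is
hypothetical):

	/* Report whether extraction from this iterator pins the pages. */
	static inline bool iov_iter_extract_will_pin(const struct iov_iter *iter)
	{
		return iov_iter_extract_mode(iter) == FOLL_PIN;
	}

and the block layer would then just set BIO_PAGE_PINNED from that bool.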

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 00/10] iov_iter: Improve page extraction (pin or just list)
  2023-01-24 13:16   ` Christoph Hellwig
@ 2023-01-24 13:22     ` David Hildenbrand
  2023-01-24 13:32       ` Christoph Hellwig
  0 siblings, 1 reply; 75+ messages in thread
From: David Hildenbrand @ 2023-01-24 13:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: David Howells, Al Viro, Matthew Wilcox, Jens Axboe, Jan Kara,
	Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block,
	linux-kernel

On 24.01.23 14:16, Christoph Hellwig wrote:
> On Tue, Jan 24, 2023 at 01:44:21PM +0100, David Hildenbrand wrote:
>> Once landed upstream, if we feel confident enough (I tend to), we could
>> adjust the open() man page to state that O_DIRECT can now be run
>> concurrently with fork(). Especially, the following documentation might be
>> adjusted:
> 
> Note that while this series converts the two most commonly used
> O_DIRECT implementations, there are various other ones that do not
> pin the pages yet.

Thanks for the info ... I assume these are then for other filesystems, 
right? (such that we could adjust the tests to exercise these as well)

... do we have a list (or is it easy to make one)? :)

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 00/10] iov_iter: Improve page extraction (pin or just list)
  2023-01-24 13:22     ` David Hildenbrand
@ 2023-01-24 13:32       ` Christoph Hellwig
  2023-01-24 13:35         ` David Hildenbrand
  0 siblings, 1 reply; 75+ messages in thread
From: Christoph Hellwig @ 2023-01-24 13:32 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Christoph Hellwig, David Howells, Al Viro, Matthew Wilcox,
	Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe,
	linux-fsdevel, linux-block, linux-kernel

On Tue, Jan 24, 2023 at 02:22:36PM +0100, David Hildenbrand wrote:
> > Note that while this series converts the two most commonly used
> > O_DIRECT implementations, there are various other ones that do not
> > pin the pages yet.
> 
> Thanks for the info ... I assume these are then for other filesystems,
> right? (such that we could adjust the tests to exercise these as well)

Yes.  There's the fs/direct-io.c code still used by a few block-based
file systems, and then all the non-block-based file systems as well
(e.g. NFS, cifs).

> ... do we have a list (or is it easy to make one)? :)

fs/direct-io.c is easy, just grep for blockdev_direct_IO.

The others are more complicated to find, but a grep for
iov_iter_get_pages2 and iov_iter_get_pages_alloc2 in fs/ should be a
good approximation.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 00/10] iov_iter: Improve page extraction (pin or just list)
  2023-01-24 13:32       ` Christoph Hellwig
@ 2023-01-24 13:35         ` David Hildenbrand
  0 siblings, 0 replies; 75+ messages in thread
From: David Hildenbrand @ 2023-01-24 13:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: David Howells, Al Viro, Matthew Wilcox, Jens Axboe, Jan Kara,
	Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block,
	linux-kernel

On 24.01.23 14:32, Christoph Hellwig wrote:
> On Tue, Jan 24, 2023 at 02:22:36PM +0100, David Hildenbrand wrote:
>>> Note that while this series converts the two most commonly used
>>> O_DIRECT implementations, there are various other ones that do not
>>> pin the pages yet.
>>
>> Thanks for the info ... I assume these are then for other filesystems,
>> right? (such that we could adjust the tests to exercise these as well)
> 
> Yes.  There's the fs/direct-io.c code still used by a few block-based
> file systems, and then all the non-block-based file systems as well
> (e.g. NFS, cifs).
> 
>> ... do we have a list (or is it easy to make one)? :)
> 
> fs/direct-io.c is easy, just grep for blockdev_direct_IO.
> 
> The others are more complicated to find, but a grep for
> iov_iter_get_pages2 and iov_iter_get_pages_alloc2 in fs/ should be a
> good approximation.

Right, iov_iter_get_pages2() includes vmsplice() that still needs love. 
Thanks!

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down
  2023-01-24  3:11     ` John Hubbard
  2023-01-24 13:13       ` Jason Gunthorpe
@ 2023-01-24 13:40       ` David Howells
  2023-01-24 13:46       ` David Howells
  2 siblings, 0 replies; 75+ messages in thread
From: David Howells @ 2023-01-24 13:40 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: dhowells, John Hubbard, Al Viro, Christoph Hellwig,
	Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig, linux-mm

Jason Gunthorpe <jgg@nvidia.com> wrote:

> Though, I'm not so keen on using FOLL_ internal flags inside the block
> layer.. Can you stick with the BIO versions of these?

It's used in a new iov_iter function and will be used in a bunch of
filesystems including network filesystems.  I'm also looking at using it in
the sk_buff handling.

David


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down
  2023-01-24 13:18         ` Christoph Hellwig
@ 2023-01-24 13:43           ` Jason Gunthorpe
  0 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2023-01-24 13:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: John Hubbard, David Howells, Al Viro, Matthew Wilcox, Jens Axboe,
	Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel,
	linux-block, linux-kernel, Christoph Hellwig, linux-mm

On Tue, Jan 24, 2023 at 05:18:07AM -0800, Christoph Hellwig wrote:
> On Tue, Jan 24, 2023 at 09:13:30AM -0400, Jason Gunthorpe wrote:
> > Yeah, I already wrote a similar patch, using the 1<< notation, 
> > splitting the internal/external, and rebasing on the move to
> > mm_types.. I can certainly drop that patch if we'd rather do this.
> 
> Given that you are doing more work in that area it might be best
> to drop this patch from this series.
> 
> > Though, I'm not so keen on using FOLL_ internal flags inside the block
> > layer.. Can you stick with the BIO versions of these?
> 
> The block layer doesn't really use it - the new helper in iov_iter.c
> returns it, and the block layer instantly turns it into an internal
> flag.  But maybe it should just return a bool pinned (by reference)
> now?

Yes, a bool flag to the new mm helper instead of a FOLL_* seems good
to me

Thanks,
Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 00/10] iov_iter: Improve page extraction (pin or just list)
  2023-01-23 17:29 [PATCH v8 00/10] iov_iter: Improve page extraction (pin or just list) David Howells
                   ` (11 preceding siblings ...)
  2023-01-24 12:44 ` David Hildenbrand
@ 2023-01-24 13:44 ` David Howells
  12 siblings, 0 replies; 75+ messages in thread
From: David Howells @ 2023-01-24 13:44 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: dhowells, Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe,
	Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel,
	linux-block, linux-kernel

If you look here:

	https://lore.kernel.org/r/167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk/

you can see additional patches fixing other users.  Christoph asked if I could
pare down the patchset to the minimum to fix the bio case for the moment.

It will be easier to do the others once iov_iter_extract_pages() is upstream
as the individual bits can go via their respective maintainers.

However, I do want to see about adding cifs iteratorisation in this merge
window as well, which also depends on the iov_iter_extract_pages() function being
added.

David


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down
  2023-01-24  3:11     ` John Hubbard
  2023-01-24 13:13       ` Jason Gunthorpe
  2023-01-24 13:40       ` David Howells
@ 2023-01-24 13:46       ` David Howells
  2023-01-24 13:47         ` Jason Gunthorpe
  2023-01-24 13:57         ` David Howells
  2 siblings, 2 replies; 75+ messages in thread
From: David Howells @ 2023-01-24 13:46 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: dhowells, John Hubbard, Al Viro, Christoph Hellwig,
	Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig, linux-mm

Jason Gunthorpe <jgg@nvidia.com> wrote:

> Yeah, I already wrote a similar patch, using the 1<< notation, 
> splitting the internal/external, and rebasing on the move to
> mm_types.. I can certainly drop that patch if we'd rather do this.

Note that I've already given Andrew a patch to move FOLL_* to mm_types.h.

I'm happy to go with your patch on top of that if you can just renumber
FOLL_PIN to 0 and FOLL_GET to 1.

David


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down
  2023-01-24 13:46       ` David Howells
@ 2023-01-24 13:47         ` Jason Gunthorpe
  2023-01-24 13:57         ` David Howells
  1 sibling, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2023-01-24 13:47 UTC (permalink / raw)
  To: David Howells
  Cc: John Hubbard, Al Viro, Christoph Hellwig, Matthew Wilcox,
	Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe,
	linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig,
	linux-mm

On Tue, Jan 24, 2023 at 01:46:23PM +0000, David Howells wrote:
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > Yeah, I already wrote a similar patch, using the 1<< notation, 
> > splitting the internal/external, and rebasing on the move to
> > mm_types.. I can certainly drop that patch if we'd rather do this.
> 
> Note that I've already given Andrew a patch to move FOLL_* to mm_types.h.
> 
> I'm happy to go with your patch on top of that if you can just renumber
> FOLL_PIN to 0 and FOLL_GET to 1.

I moved FOLL_PIN to internal.h because it is not supposed to be used
outside mm/*

I would prefer to keep it that way.

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down
  2023-01-24 13:46       ` David Howells
  2023-01-24 13:47         ` Jason Gunthorpe
@ 2023-01-24 13:57         ` David Howells
  2023-01-24 14:00           ` Jason Gunthorpe
                             ` (3 more replies)
  1 sibling, 4 replies; 75+ messages in thread
From: David Howells @ 2023-01-24 13:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: dhowells, John Hubbard, Al Viro, Christoph Hellwig,
	Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig, linux-mm

Jason Gunthorpe <jgg@nvidia.com> wrote:

> I moved FOLL_PIN to internal.h because it is not supposed to be used
> outside mm/*
> 
> I would prefer to keep it that way.

I'll need to do something else, then, such as creating a new pair of 'cleanup'
flags:

	[include/linux/mm_types.h]
	#define PAGE_CLEANUP_UNPIN	(1U << 0)
	#define PAGE_CLEANUP_PUT	(1U << 1)

	[mm/gup.h]
	void folio_put_unpin(struct folio *folio, unsigned int cleanup_flags)
	{
		unsigned int gup_flags = 0;

		cleanup_flags &= PAGE_CLEANUP_UNPIN | PAGE_CLEANUP_PUT;
		if (!cleanup_flags)
			return;
		gup_flags |= cleanup_flags & PAGE_CLEANUP_UNPIN ? FOLL_PIN : 0;
		gup_flags |= cleanup_flags & PAGE_CLEANUP_PUT ? FOLL_GET : 0;
		gup_put_folio(folio, 1, gup_flags);
	}
	EXPORT_SYMBOL_GPL(folio_put_unpin);

David


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down
  2023-01-24 13:57         ` David Howells
@ 2023-01-24 14:00           ` Jason Gunthorpe
  2023-01-24 14:02           ` Christoph Hellwig
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2023-01-24 14:00 UTC (permalink / raw)
  To: David Howells
  Cc: John Hubbard, Al Viro, Christoph Hellwig, Matthew Wilcox,
	Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe,
	linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig,
	linux-mm

On Tue, Jan 24, 2023 at 01:57:08PM +0000, David Howells wrote:
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > I moved FOLL_PIN to internal.h because it is not supposed to be used
> > outside mm/*
> > 
> > I would prefer to keep it that way.
> 
> I'll need to do something else, then, such as creating a new pair of 'cleanup'
> flags:
> 
> 	[include/linux/mm_types.h]
> 	#define PAGE_CLEANUP_UNPIN	(1U << 0)
> 	#define PAGE_CLEANUP_PUT	(1U << 1)
> 
> 	[mm/gup.h]
> 	void folio_put_unpin(struct folio *folio, unsigned int cleanup_flags)
> 	{
> 		unsigned int gup_flags = 0;
> 
> 		cleanup_flags &= PAGE_CLEANUP_UNPIN | PAGE_CLEANUP_PUT;
> 		if (!cleanup_flags)
> 			return;
> 		gup_flags |= cleanup_flags & PAGE_CLEANUP_UNPIN ? FOLL_PIN : 0;
> 		gup_flags |= cleanup_flags & PAGE_CLEANUP_PUT ? FOLL_GET : 0;
> 		gup_put_folio(folio, 1, gup_flags);
> 	}
> 	EXPORT_SYMBOL_GPL(folio_put_unpin);


I suggest:

if (cleanup_flags & PAGE_CLEANUP_UNPIN)
   gup_put_folio(folio, 1, true);
else if (cleanup_flags & PAGE_CLEANUP_PUT)
   gup_put_folio(folio, 1, false);

or if you are really counting cycles

if (cleanup_flags & PAGE_CLEANUP_NEEDED)
   gup_put_folio(folio, 1, cleanup_flags & PAGE_CLEANUP_UNPIN)
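
PAGE_CLEANUP_NEEDED isn't defined in the snippet above; a plausible reading,
assuming the two cleanup bits proposed earlier in this subthread, is simply the
union of the two:

	#define PAGE_CLEANUP_NEEDED	(PAGE_CLEANUP_UNPIN | PAGE_CLEANUP_PUT)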

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down
  2023-01-24 13:57         ` David Howells
  2023-01-24 14:00           ` Jason Gunthorpe
@ 2023-01-24 14:02           ` Christoph Hellwig
  2023-01-24 14:11           ` David Howells
  2023-01-24 14:12           ` David Howells
  3 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2023-01-24 14:02 UTC (permalink / raw)
  To: David Howells
  Cc: Jason Gunthorpe, John Hubbard, Al Viro, Christoph Hellwig,
	Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig, linux-mm

On Tue, Jan 24, 2023 at 01:57:08PM +0000, David Howells wrote:
> 	[include/linux/mm_types.h]
> 	#define PAGE_CLEANUP_UNPIN	(1U << 0)
> > 	#define PAGE_CLEANUP_PUT	(1U << 1)

With the latest series we don't need PAGE_CLEANUP_PUT at all.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down
  2023-01-24 13:57         ` David Howells
  2023-01-24 14:00           ` Jason Gunthorpe
  2023-01-24 14:02           ` Christoph Hellwig
@ 2023-01-24 14:11           ` David Howells
  2023-01-24 14:14             ` Jason Gunthorpe
  2023-01-24 14:27             ` David Howells
  2023-01-24 14:12           ` David Howells
  3 siblings, 2 replies; 75+ messages in thread
From: David Howells @ 2023-01-24 14:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: dhowells, John Hubbard, Al Viro, Christoph Hellwig,
	Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig, linux-mm

Jason Gunthorpe <jgg@nvidia.com> wrote:

> if (cleanup_flags & PAGE_CLEANUP_UNPIN)
>    gup_put_folio(folio, 1, true);
> else if (cleanup_flags & PAGE_CLEANUP_PUT)
>    gup_put_folio(folio, 1, false);

gup_put_folio() doesn't take a bool - it takes FOLL_PIN and FOLL_GET:

static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)

though I could do:

	if (cleanup_flags & PAGE_CLEANUP_UNPIN)
		gup_put_folio(folio, 1, FOLL_PIN);
	else if (cleanup_flags & PAGE_CLEANUP_PUT)
		gup_put_folio(folio, 1, FOLL_GET);

But the reason I wrote it like this:

 		gup_flags |= cleanup_flags & PAGE_CLEANUP_UNPIN ? FOLL_PIN : 0;
 		gup_flags |= cleanup_flags & PAGE_CLEANUP_PUT ? FOLL_GET : 0;

is that if PAGE_CLEANUP_UNPIN == FOLL_PIN and PAGE_CLEANUP_PUT == FOLL_GET,
the C compiler optimiser should be able to turn all of that into a single AND
instruction.
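
Put another way, if the constants share values, the pair of conditional ORs
collapses to a single mask; a sketch, assuming PAGE_CLEANUP_UNPIN == FOLL_PIN
and PAGE_CLEANUP_PUT == FOLL_GET (the helper name here is illustrative only):

	static inline unsigned int cleanup_to_gup_flags(unsigned int cleanup_flags)
	{
		/* A single AND when the bit values coincide. */
		return cleanup_flags & (FOLL_PIN | FOLL_GET);
	}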

David


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down
  2023-01-24 13:57         ` David Howells
                             ` (2 preceding siblings ...)
  2023-01-24 14:11           ` David Howells
@ 2023-01-24 14:12           ` David Howells
  2023-01-24 14:13             ` Christoph Hellwig
  2023-01-24 14:25             ` David Howells
  3 siblings, 2 replies; 75+ messages in thread
From: David Howells @ 2023-01-24 14:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dhowells, Jason Gunthorpe, John Hubbard, Al Viro, Matthew Wilcox,
	Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe,
	linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig,
	linux-mm

Christoph Hellwig <hch@infradead.org> wrote:

> On Tue, Jan 24, 2023 at 01:57:08PM +0000, David Howells wrote:
> > 	[include/linux/mm_types.h]
> > 	#define PAGE_CLEANUP_UNPIN	(1U << 0)
> > 	#define PAGE_CLEANUP_PUT	(1U << 1)
> 
> With the latest series we don't need PAGE_CLEANUP_PUT at all.

We will when it comes to skbuffs.

David


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down
  2023-01-24 14:12           ` David Howells
@ 2023-01-24 14:13             ` Christoph Hellwig
  2023-01-24 14:25             ` David Howells
  1 sibling, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2023-01-24 14:13 UTC (permalink / raw)
  To: David Howells
  Cc: Christoph Hellwig, Jason Gunthorpe, John Hubbard, Al Viro,
	Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig, linux-mm

On Tue, Jan 24, 2023 at 02:12:25PM +0000, David Howells wrote:
> > With the latest series we don't need PAGE_CLEANUP_PUT at all.
> 
> We will when it comes to skbuffs.

I'm a little doubtful of that.  But IFF skbuffs need it, we can
just change the interface at that point.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down
  2023-01-24 14:11           ` David Howells
@ 2023-01-24 14:14             ` Jason Gunthorpe
  2023-01-24 14:27             ` David Howells
  1 sibling, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2023-01-24 14:14 UTC (permalink / raw)
  To: David Howells
  Cc: John Hubbard, Al Viro, Christoph Hellwig, Matthew Wilcox,
	Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe,
	linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig,
	linux-mm

On Tue, Jan 24, 2023 at 02:11:50PM +0000, David Howells wrote:
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > if (cleanup_flags & PAGE_CLEANUP_UNPIN)
> >    gup_put_folio(folio, 1, true);
> > else if (cleanup_flags & PAGE_CLEANUP_PUT)
> >    gup_put_folio(folio, 1, false);
> 
> gup_put_folio() doesn't take a bool - it takes FOLL_PIN and FOLL_GET:

My point was to change that to take a 'bool unpin' because FOLL_PIN is
not to be used outside gup.c

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down
  2023-01-24 14:12           ` David Howells
  2023-01-24 14:13             ` Christoph Hellwig
@ 2023-01-24 14:25             ` David Howells
  1 sibling, 0 replies; 75+ messages in thread
From: David Howells @ 2023-01-24 14:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dhowells, Jason Gunthorpe, John Hubbard, Al Viro, Matthew Wilcox,
	Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe,
	linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig,
	linux-mm

Christoph Hellwig <hch@infradead.org> wrote:

> > > With the latest series we don't need PAGE_CLEANUP_PUT at all.
> > 
> > We will when it comes to skbuffs.
> 
> I'm a little doubtful of that.

skbuff fragments will be, and will have to be, a mixture of allocated pages that
need putting and zerocopy buffers that are either pinned or neither ref'd nor
pinned.  I have
posted a patch that works for the limited amount of driverage that I use on my
test machine.

Think network filesystem messages where you have a mixture of protocol bits
generated by the kernel and data provided by direct I/O being sent by zerocopy
(libceph, for example).

David


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator
  2023-01-23 17:29 ` [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator David Howells
  2023-01-23 18:21   ` Christoph Hellwig
@ 2023-01-24 14:27   ` David Hildenbrand
  2023-01-24 14:35   ` David Howells
  2 siblings, 0 replies; 75+ messages in thread
From: David Hildenbrand @ 2023-01-24 14:27 UTC (permalink / raw)
  To: David Howells, Al Viro, Christoph Hellwig
  Cc: Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig, John Hubbard, linux-mm

On 23.01.23 18:29, David Howells wrote:
> Add a function, iov_iter_extract_pages(), to extract a list of pages from
> an iterator.  The pages may be returned with a pin added or nothing,
> depending on the type of iterator.
> 
> Add a second function, iov_iter_extract_mode(), to determine how the
> cleanup should be done.
> 
> There are two cases:
> 
>   (1) ITER_IOVEC or ITER_UBUF iterator.
> 
>       Extracted pages will have pins (FOLL_PIN) obtained on them so that a
>       concurrent fork() will forcibly copy the page so that DMA is done
>       to/from the parent's buffer and is unavailable to/unaffected by the
>       child process.
> 
>       iov_iter_extract_mode() will return FOLL_PIN for this case.  The
>       caller should use something like folio_put_unpin() to dispose of the
>       page.
> 
>   (2) Any other sort of iterator.
> 
>       No refs or pins are obtained on the page, the assumption is made that
>       the caller will manage page retention.
> 
>       iov_iter_extract_mode() will return 0.  The pages don't need
>       additional disposal.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Al Viro <viro@zeniv.linux.org.uk>
> cc: Christoph Hellwig <hch@lst.de>
> cc: John Hubbard <jhubbard@nvidia.com>
> cc: David Hildenbrand <david@redhat.com>
> cc: Matthew Wilcox <willy@infradead.org>
> cc: linux-fsdevel@vger.kernel.org
> cc: linux-mm@kvack.org
> ---
> 
> Notes:
>      ver #8)
>       - It seems that all DIO is supposed to be done under FOLL_PIN now, and not
>         FOLL_GET, so switch to only using pin_user_pages() for user-backed
>         iters.
>       - Wrap an argument in brackets in the iov_iter_extract_mode() macro.
>       - Drop the extract_flags argument to iov_iter_extract_mode() for now
>         [hch].
>      
>      ver #7)
>       - Switch to passing in iter-specific flags rather than FOLL_* flags.
>       - Drop the direction flags for now.
>       - Use ITER_ALLOW_P2PDMA to request FOLL_PCI_P2PDMA.
>       - Disallow use of ITER_ALLOW_P2PDMA with non-user-backed iter.
>       - Add support for extraction from KVEC-type iters.
>       - Use iov_iter_advance() rather than open-coding it.
>       - Make BVEC- and KVEC-type skip over initial empty vectors.
>      
>      ver #6)
>       - Add back the function to indicate the cleanup mode.
>       - Drop the cleanup_mode return arg to iov_iter_extract_pages().
>       - Pass FOLL_SOURCE/DEST_BUF in gup_flags.  Check this against the iter
>         data_source.
>      
>      ver #4)
>       - Use ITER_SOURCE/DEST instead of WRITE/READ.
>       - Allow additional FOLL_* flags, such as FOLL_PCI_P2PDMA to be passed in.
>      
>      ver #3)
>       - Switch to using EXPORT_SYMBOL_GPL to prevent indirect 3rd-party access
>         to get/pin_user_pages_fast()[1].
> 
>   include/linux/uio.h |  22 +++
>   lib/iov_iter.c      | 320 ++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 342 insertions(+)
> 
> diff --git a/include/linux/uio.h b/include/linux/uio.h
> index 46d5080314c6..a8165335f8da 100644
> --- a/include/linux/uio.h
> +++ b/include/linux/uio.h
> @@ -363,4 +363,26 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
>   /* Flags for iov_iter_get/extract_pages*() */
>   #define ITER_ALLOW_P2PDMA	0x01	/* Allow P2PDMA on the extracted pages */
>   
> +ssize_t iov_iter_extract_pages(struct iov_iter *i, struct page ***pages,
> +			       size_t maxsize, unsigned int maxpages,
> +			       unsigned int extract_flags, size_t *offset0);
> +
> +/**
> + * iov_iter_extract_mode - Indicate how pages from the iterator will be retained
> + * @iter: The iterator
> + *
> + * Examine the iterator and indicate by returning FOLL_PIN or 0 as to how, if
> + * at all, pages extracted from the iterator will be retained by the extraction
> + * function.
> + *
> + * FOLL_PIN indicates that the pages will have a pin placed in them that the
> + * caller must unpin.  This must be done for DMA/async DIO to force fork()
> + * to forcibly copy a page for the child (the parent must retain the original
> + * page).
> + *
> + * 0 indicates that no measures are taken and that it's up to the caller to
> + * retain the pages.
> + */
> +#define iov_iter_extract_mode(iter) (user_backed_iter(iter) ? FOLL_PIN : 0)
> +

Does it make sense to move that to the patch where it is needed? (do we 
need it at all anymore?)

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down
  2023-01-24 14:11           ` David Howells
  2023-01-24 14:14             ` Jason Gunthorpe
@ 2023-01-24 14:27             ` David Howells
  2023-01-24 14:31               ` Jason Gunthorpe
  2023-01-24 14:59               ` David Howells
  1 sibling, 2 replies; 75+ messages in thread
From: David Howells @ 2023-01-24 14:27 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: dhowells, John Hubbard, Al Viro, Christoph Hellwig,
	Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig, linux-mm

Jason Gunthorpe <jgg@nvidia.com> wrote:

> My point was to change that to take a 'bool unpin' because FOLL_PIN is
> not to be used outside gup.c

I need a 3-state wrapper.  But if I can't do it in gup.c, then I'll have to do
it elsewhere.  As Christoph says, most of the places will only be pinned or
not-pinned.  The whole point was to avoid generating new constants when
existing constants would do.
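
One possible shape for such a wrapper living outside mm/gup.c, sketched with the
cleanup flags proposed earlier in this thread (the names are illustrative and
not part of the posted series):

	/* Three states: drop a pin, drop a ref, or do nothing. */
	static inline void page_cleanup(struct page *page, unsigned int cleanup_flags)
	{
		if (cleanup_flags & PAGE_CLEANUP_UNPIN)
			unpin_user_page(page);
		else if (cleanup_flags & PAGE_CLEANUP_PUT)
			put_page(page);
	}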

David.


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page
  2023-01-23 17:30 ` [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page David Howells
  2023-01-23 18:21   ` Christoph Hellwig
  2023-01-24  3:03   ` John Hubbard
@ 2023-01-24 14:28   ` David Hildenbrand
  2023-01-24 14:41   ` David Howells
  3 siblings, 0 replies; 75+ messages in thread
From: David Hildenbrand @ 2023-01-24 14:28 UTC (permalink / raw)
  To: David Howells, Al Viro, Christoph Hellwig
  Cc: Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig, linux-mm

On 23.01.23 18:30, David Howells wrote:
> Provide a helper in the get_user_pages code to drop a pin or a ref on a
> page based on being given FOLL_GET or FOLL_PIN in its flags argument or do
> nothing if neither is set.

Does the FOLL_GET part still apply to this patch set?

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 04/10] iomap: don't get an reference on ZERO_PAGE for direct I/O block zeroing
  2023-01-23 17:30 ` [PATCH v8 04/10] iomap: don't get an reference on ZERO_PAGE for direct I/O block zeroing David Howells
  2023-01-23 18:22   ` Christoph Hellwig
  2023-01-24  2:42   ` John Hubbard
@ 2023-01-24 14:29   ` David Hildenbrand
  2 siblings, 0 replies; 75+ messages in thread
From: David Hildenbrand @ 2023-01-24 14:29 UTC (permalink / raw)
  To: David Howells, Al Viro, Christoph Hellwig
  Cc: Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig

On 23.01.23 18:30, David Howells wrote:
> From: Christoph Hellwig <hch@lst.de>
> 
> ZERO_PAGE can't go away, no need to hold an extra reference.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: David Howells <dhowells@redhat.com>
> ---
>   fs/iomap/direct-io.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 9804714b1751..47db4ead1e74 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -202,7 +202,7 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
>   	bio->bi_private = dio;
>   	bio->bi_end_io = iomap_dio_bio_end_io;
>   
> -	get_page(page);
> +	bio_set_flag(bio, BIO_NO_PAGE_REF);
>   	__bio_add_page(bio, page, len, 0);
>   	iomap_dio_submit_bio(iter, dio, bio, pos);
>   }
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down
  2023-01-24 14:27             ` David Howells
@ 2023-01-24 14:31               ` Jason Gunthorpe
  2023-01-24 14:59               ` David Howells
  1 sibling, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2023-01-24 14:31 UTC (permalink / raw)
  To: David Howells
  Cc: John Hubbard, Al Viro, Christoph Hellwig, Matthew Wilcox,
	Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe,
	linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig,
	linux-mm

On Tue, Jan 24, 2023 at 02:27:53PM +0000, David Howells wrote:
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > My point was to change that to take a 'bool unpin' because FOLL_PIN is
> > not to be used outside gup.c
> 
> I need a 3-state wrapper.  But if I can't do it in gup.c, then I'll have to do
> it elsewhere.  As Christoph says, most of the places will only be pinned or
> not-pinned.  The whole point was to avoid generating new constants when
> existing constants would do.

What is the 3rd state?

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 07/10] block: Switch to pinning pages.
  2023-01-23 17:30 ` [PATCH v8 07/10] block: Switch to pinning pages David Howells
  2023-01-23 18:23   ` Christoph Hellwig
@ 2023-01-24 14:32   ` David Hildenbrand
  2023-01-24 14:47   ` David Howells
  2 siblings, 0 replies; 75+ messages in thread
From: David Hildenbrand @ 2023-01-24 14:32 UTC (permalink / raw)
  To: David Howells, Al Viro, Christoph Hellwig
  Cc: Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig

On 23.01.23 18:30, David Howells wrote:
> Add BIO_PAGE_PINNED to indicate that the pages in a bio are pinned
> (FOLL_PIN) and that the pin will need removing.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Al Viro <viro@zeniv.linux.org.uk>
> cc: Jens Axboe <axboe@kernel.dk>
> cc: Jan Kara <jack@suse.cz>
> cc: Christoph Hellwig <hch@lst.de>
> cc: Matthew Wilcox <willy@infradead.org>
> cc: Logan Gunthorpe <logang@deltatee.com>
> cc: linux-block@vger.kernel.org
> ---
> 
> Notes:
>      ver #8)
>       - Move the infrastructure to clean up pinned pages to this patch [hch].
>       - Put BIO_PAGE_PINNED before BIO_PAGE_REFFED as the latter should
>         probably be removed at some point.  FOLL_PIN can then be renumbered
>         first.
> 
>   block/bio.c               |  7 ++++---
>   block/blk.h               | 28 ++++++++++++++++++++++++++++
>   include/linux/bio.h       |  3 ++-
>   include/linux/blk_types.h |  1 +
>   4 files changed, 35 insertions(+), 4 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 40c2b01906da..6f98bcfc0c92 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1170,13 +1170,14 @@ bool bio_add_folio(struct bio *bio, struct folio *folio, size_t len,
>   
>   void __bio_release_pages(struct bio *bio, bool mark_dirty)
>   {
> +	unsigned int gup_flags = bio_to_gup_flags(bio);
>   	struct bvec_iter_all iter_all;
>   	struct bio_vec *bvec;
>   
>   	bio_for_each_segment_all(bvec, bio, iter_all) {
>   		if (mark_dirty && !PageCompound(bvec->bv_page))
>   			set_page_dirty_lock(bvec->bv_page);
> -		put_page(bvec->bv_page);
> +		page_put_unpin(bvec->bv_page, gup_flags);
>   	}
>   }
>   EXPORT_SYMBOL_GPL(__bio_release_pages);
> @@ -1496,8 +1497,8 @@ void bio_set_pages_dirty(struct bio *bio)
>    * the BIO and re-dirty the pages in process context.
>    *
>    * It is expected that bio_check_pages_dirty() will wholly own the BIO from
> - * here on.  It will run one put_page() against each page and will run one
> - * bio_put() against the BIO.
> + * here on.  It will run one page_put_unpin() against each page and will run
> + * one bio_put() against the BIO.
>    */
>   
>   static void bio_dirty_fn(struct work_struct *work);
> diff --git a/block/blk.h b/block/blk.h
> index 4c3b3325219a..294044d696e0 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -425,6 +425,34 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio,
>   		struct page *page, unsigned int len, unsigned int offset,
>   		unsigned int max_sectors, bool *same_page);
>   
> +/*
> + * Set the cleanup mode for a bio from an iterator and the extraction flags.
> + */
> +static inline void bio_set_cleanup_mode(struct bio *bio, struct iov_iter *iter)
> +{
> +	unsigned int cleanup_mode = iov_iter_extract_mode(iter);
> +
> +	if (cleanup_mode & FOLL_GET)
> +		bio_set_flag(bio, BIO_PAGE_REFFED);
> +	if (cleanup_mode & FOLL_PIN)
> +		bio_set_flag(bio, BIO_PAGE_PINNED);

Can FOLL_GET ever happen?

IOW, can't this even be

if (user_backed_iter(iter))
	bio_set_flag(bio, BIO_PAGE_PINNED);

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator
  2023-01-23 17:29 ` [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator David Howells
  2023-01-23 18:21   ` Christoph Hellwig
  2023-01-24 14:27   ` David Hildenbrand
@ 2023-01-24 14:35   ` David Howells
  2023-01-24 14:37     ` David Hildenbrand
  2023-01-24 14:45     ` David Howells
  2 siblings, 2 replies; 75+ messages in thread
From: David Howells @ 2023-01-24 14:35 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: dhowells, Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe,
	Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel,
	linux-block, linux-kernel, Christoph Hellwig, John Hubbard,
	linux-mm

David Hildenbrand <david@redhat.com> wrote:

> > +#define iov_iter_extract_mode(iter) (user_backed_iter(iter) ? FOLL_PIN : 0)
> > +
> 
> Does it make sense to move that to the patch where it is needed? (do we need
> it at all anymore?)

Yes, we need something.  Granted, there are now only two alternatives, not
three: either the pages are going to be pinned or they're not going to be
pinned, but I would rather have a specific function that tells you than just
use, say, user_backed_iter() to make it easier to adjust it later if we need
to - and easier to find the places where we need to adjust.

But if it's preferred we could require people to use user_backed_iter()
instead.

David


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator
  2023-01-24 14:35   ` David Howells
@ 2023-01-24 14:37     ` David Hildenbrand
  2023-01-24 14:45     ` David Howells
  1 sibling, 0 replies; 75+ messages in thread
From: David Hildenbrand @ 2023-01-24 14:37 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara,
	Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block,
	linux-kernel, Christoph Hellwig, John Hubbard, linux-mm

On 24.01.23 15:35, David Howells wrote:
> David Hildenbrand <david@redhat.com> wrote:
> 
>>> +#define iov_iter_extract_mode(iter) (user_backed_iter(iter) ? FOLL_PIN : 0)
>>> +
>>
>> Does it make sense to move that to the patch where it is needed? (do we need
>> it at all anymore?)
> 
> Yes, we need something.  Granted, there are now only two alternatives, not
> three: either the pages are going to be pinned or they're not going to be
> pinned, but I would rather have a specific function that tells you than just
> use, say, user_backed_iter() to make it easier to adjust it later if we need
> to - and easier to find the places where we need to adjust.
> 
> But if it's preferred we could require people to use user_backed_iter()
> instead.

At least reduces the occurrences of FOLL_PIN :)

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page
  2023-01-23 17:30 ` [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page David Howells
                     ` (2 preceding siblings ...)
  2023-01-24 14:28   ` David Hildenbrand
@ 2023-01-24 14:41   ` David Howells
  2023-01-24 14:52     ` Christoph Hellwig
  2023-01-24 15:04     ` David Howells
  3 siblings, 2 replies; 75+ messages in thread
From: David Howells @ 2023-01-24 14:41 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: dhowells, Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe,
	Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel,
	linux-block, linux-kernel, Christoph Hellwig, linux-mm

David Hildenbrand <david@redhat.com> wrote:

> > Provide a helper in the get_user_pages code to drop a pin or a ref on a
> > page based on being given FOLL_GET or FOLL_PIN in its flags argument or do
> > nothing if neither is set.
> 
> Does the FOLL_GET part still apply to this patch set?

Yes.  Christoph insisted that the bio conversion patch be split up.  That
means there's an interval where you can get FOLL_GET from that.  However,
since Jason wants to hide FOLL_PIN, this is going to have to be removed and
the switching done in the bio code for the bio case (until that reduces to
just pinning) and the skbuff cleanup code (when that is eventually done - that
will have the possibility of skbuffs comprising a mix of ref'd, pinned and
unpinned data, albeit in separate fragments; I've posted patches that
illustrate this[1]).

David

https://lore.kernel.org/all/167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk/ [1] Patches 33-34


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator
  2023-01-24 14:35   ` David Howells
  2023-01-24 14:37     ` David Hildenbrand
@ 2023-01-24 14:45     ` David Howells
  2023-01-24 14:52       ` David Hildenbrand
  1 sibling, 1 reply; 75+ messages in thread
From: David Howells @ 2023-01-24 14:45 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: dhowells, Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe,
	Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel,
	linux-block, linux-kernel, Christoph Hellwig, John Hubbard,
	linux-mm

David Hildenbrand <david@redhat.com> wrote:

> At least reduces the occurrences of FOLL_PIN :)

I don't see where the problem is in letting people supply FOLL_PIN or
FOLL_GET.  Why even have pin_user_pages() and get_user_pages() since they end
up at the same place.  They should be inline wrappers, if separate functions
at all.

David


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 07/10] block: Switch to pinning pages.
  2023-01-23 17:30 ` [PATCH v8 07/10] block: Switch to pinning pages David Howells
  2023-01-23 18:23   ` Christoph Hellwig
  2023-01-24 14:32   ` David Hildenbrand
@ 2023-01-24 14:47   ` David Howells
  2023-01-24 14:53     ` Christoph Hellwig
  2023-01-24 15:03     ` David Howells
  2 siblings, 2 replies; 75+ messages in thread
From: David Howells @ 2023-01-24 14:47 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: dhowells, Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe,
	Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel,
	linux-block, linux-kernel, Christoph Hellwig

David Hildenbrand <david@redhat.com> wrote:

> > +static inline void bio_set_cleanup_mode(struct bio *bio, struct iov_iter *iter)
> > +{
> > +	unsigned int cleanup_mode = iov_iter_extract_mode(iter);
> > +
> > +	if (cleanup_mode & FOLL_GET)
> > +		bio_set_flag(bio, BIO_PAGE_REFFED);
> > +	if (cleanup_mode & FOLL_PIN)
> > +		bio_set_flag(bio, BIO_PAGE_PINNED);
> 
> Can FOLL_GET ever happen?

Yes - unless patches 8 and 9 are merged.  I had them as one, but Christoph
split them up.

David


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page
  2023-01-24 14:41   ` David Howells
@ 2023-01-24 14:52     ` Christoph Hellwig
  2023-01-24 14:53       ` David Hildenbrand
  2023-01-24 15:04     ` David Howells
  1 sibling, 1 reply; 75+ messages in thread
From: Christoph Hellwig @ 2023-01-24 14:52 UTC (permalink / raw)
  To: David Howells
  Cc: David Hildenbrand, Al Viro, Christoph Hellwig, Matthew Wilcox,
	Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe,
	linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig,
	linux-mm

On Tue, Jan 24, 2023 at 02:41:33PM +0000, David Howells wrote:
> Yes.  Christoph insisted that the bio conversion patch be split up.  That
> means there's an interval where you can get FOLL_GET from that.

The only place where we have both is in the block layer.  It never gets
set by bio_set_cleanup_mode.

Instead we can just keep using put_page directly for the BIO_PAGE_REFFED
case in the callers of bio_release_page and in bio_release_pages itself,
and then do away with bio_to_gup_flags and bio_release_page entirely.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator
  2023-01-24 14:45     ` David Howells
@ 2023-01-24 14:52       ` David Hildenbrand
  0 siblings, 0 replies; 75+ messages in thread
From: David Hildenbrand @ 2023-01-24 14:52 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara,
	Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block,
	linux-kernel, Christoph Hellwig, John Hubbard, linux-mm

On 24.01.23 15:45, David Howells wrote:
> David Hildenbrand <david@redhat.com> wrote:
> 
>> At least reduces the occurrences of FOLL_PIN :)
> 
> I don't see where the problem is in letting people supply FOLL_PIN or
> FOLL_GET.  Why even have pin_user_pages() and get_user_pages() since they end
> up at the same place.  They should be inline wrappers, if separate functions
> at all.

There once was a proposed goal of restricting FOLL_GET to core-mm and 
allowing other drivers etc. to only use FOLL_PIN. Providing 
pin_user_pages() and the corresponding unpin part seemed very clean to me.

To me that makes perfect sense and will prevent any misuse once we've 
converted everything relevant to FOLL_PIN.


To be precise, I think we could get away without using FOLL_PIN and FOLL_GET 
in this patch set, and it wouldn't really reduce the readability of the 
code ...

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page
  2023-01-24 14:52     ` Christoph Hellwig
@ 2023-01-24 14:53       ` David Hildenbrand
  0 siblings, 0 replies; 75+ messages in thread
From: David Hildenbrand @ 2023-01-24 14:53 UTC (permalink / raw)
  To: Christoph Hellwig, David Howells
  Cc: Al Viro, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig, linux-mm

On 24.01.23 15:52, Christoph Hellwig wrote:
> On Tue, Jan 24, 2023 at 02:41:33PM +0000, David Howells wrote:
>> Yes.  Christoph insisted that the bio conversion patch be split up.  That
>> means there's an interval where you can get FOLL_GET from that.
> 
> The only place where we have both is in the block layer.  It never gets
> set by bio_set_cleanup_mode.
> 
> Instead we can just keep using put_page dirctly for the BIO_PAGE_REFFED
> case in the callers of bio_release_page and in bio_release_pages itself,
> and then do away with bio_to_gup_flags and bio_release_page entirely.
> 

Thank you, just what I had in mind ...

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 07/10] block: Switch to pinning pages.
  2023-01-24 14:47   ` David Howells
@ 2023-01-24 14:53     ` Christoph Hellwig
  2023-01-24 15:03     ` David Howells
  1 sibling, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2023-01-24 14:53 UTC (permalink / raw)
  To: David Howells
  Cc: David Hildenbrand, Al Viro, Christoph Hellwig, Matthew Wilcox,
	Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe,
	linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig

On Tue, Jan 24, 2023 at 02:47:51PM +0000, David Howells wrote:
> > > +static inline void bio_set_cleanup_mode(struct bio *bio, struct iov_iter *iter)
> > > +{
> > > +	unsigned int cleanup_mode = iov_iter_extract_mode(iter);
> > > +
> > > +	if (cleanup_mode & FOLL_GET)
> > > +		bio_set_flag(bio, BIO_PAGE_REFFED);
> > > +	if (cleanup_mode & FOLL_PIN)
> > > +		bio_set_flag(bio, BIO_PAGE_PINNED);
> > 
> > Can FOLL_GET ever happen?
> 
> Yes - unless patches 8 and 9 are merged.  I had them as one, but Christoph
> split them up.

It can't.  Per your latest branch:

#define iov_iter_extract_mode(iter) (user_backed_iter(iter) ? FOLL_PIN : 0)

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down
  2023-01-24 14:27             ` David Howells
  2023-01-24 14:31               ` Jason Gunthorpe
@ 2023-01-24 14:59               ` David Howells
  2023-01-24 15:06                 ` Jason Gunthorpe
  2023-01-24 15:12                 ` David Howells
  1 sibling, 2 replies; 75+ messages in thread
From: David Howells @ 2023-01-24 14:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: dhowells, John Hubbard, Al Viro, Christoph Hellwig,
	Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig, linux-mm

Jason Gunthorpe <jgg@nvidia.com> wrote:

> What is the 3rd state?

Consider a network filesystem message generated for a direct I/O that the
network filesystem does zerocopy on.  You may have an sk_buff that has
fragments from one or more of three different sources:

 (1) Fragments consisting of specifically allocated pages, such as the
     IP/UDP/TCP headers that have refs taken on them.

 (2) Fragments consisting of zerocopy kernel buffers that have neither refs nor
     pins belonging to the sk_buff.

     iov_iter_extract_pages() will not take pins when extracting from, say, an
     XARRAY-type or KVEC-type iterator.  iov_iter_extract_mode() will return
     0.

 (3) Fragments consisting of zerocopy user buffers that have pins taken on
     them belonging to the sk_buff.

     iov_iter_extract_pages() will take pins when extracting from, say, a
     UBUF-type or IOVEC-type iterator.  iov_iter_extract_mode() will return
     FOLL_PIN (at the moment).

So you have three states: Ref'd, pinned and no-retention.
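
For reference, a minimal sketch of a per-fragment cleanup covering the three
states described above; skb_frag_cleanup() is a hypothetical name and the mode
argument is assumed to carry the same FOLL_*-style indication discussed here:

	static void skb_frag_cleanup(struct page *page, unsigned int mode)
	{
		if (mode & FOLL_PIN)		/* pinned zerocopy user buffer */
			unpin_user_page(page);
		else if (mode & FOLL_GET)	/* specifically allocated page */
			put_page(page);
		/* else: zerocopy kernel buffer; nothing belongs to the sk_buff */
	}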

David


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 07/10] block: Switch to pinning pages.
  2023-01-24 14:47   ` David Howells
  2023-01-24 14:53     ` Christoph Hellwig
@ 2023-01-24 15:03     ` David Howells
  2023-01-24 16:44       ` Christoph Hellwig
  1 sibling, 1 reply; 75+ messages in thread
From: David Howells @ 2023-01-24 15:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dhowells, David Hildenbrand, Al Viro, Matthew Wilcox, Jens Axboe,
	Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel,
	linux-block, linux-kernel, Christoph Hellwig

Christoph Hellwig <hch@infradead.org> wrote:

> It can't.  Per your latest branch:

Yes it can.  Patch 6:

--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -282,6 +282,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 	if (blk_queue_pci_p2pdma(rq->q))
 		extract_flags |= ITER_ALLOW_P2PDMA;
 
+	bio_set_flag(bio, BIO_PAGE_REFFED);
 	while (iov_iter_count(iter)) {
 		struct page **pages, *stack_pages[UIO_FASTIOV];
 		ssize_t bytes;

Patch 7:

+static inline unsigned int bio_to_gup_flags(struct bio *bio)
+{
+	return (bio_flagged(bio, BIO_PAGE_REFFED) ? FOLL_GET : 0) |
+		(bio_flagged(bio, BIO_PAGE_PINNED) ? FOLL_PIN : 0);
+}
+
+/*
+ * Clean up a page appropriately, where the page may be pinned, may have a
+ * ref taken on it or neither.
+ */
+static inline void bio_release_page(struct bio *bio, struct page *page)
+{
+	page_put_unpin(page, bio_to_gup_flags(bio));
+}

At patch 8, you can get either FOLL_GET, FOLL_PIN or 0, depending on the path
you go through.

David


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page
  2023-01-24 14:41   ` David Howells
  2023-01-24 14:52     ` Christoph Hellwig
@ 2023-01-24 15:04     ` David Howells
  1 sibling, 0 replies; 75+ messages in thread
From: David Howells @ 2023-01-24 15:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dhowells, David Hildenbrand, Al Viro, Matthew Wilcox, Jens Axboe,
	Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel,
	linux-block, linux-kernel, Christoph Hellwig, linux-mm

Christoph Hellwig <hch@infradead.org> wrote:

> The only place where we have both is in the block layer.  It never gets
> set by bio_set_cleanup_mode.

It gets set directly by patch 6.

David


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down
  2023-01-24 14:59               ` David Howells
@ 2023-01-24 15:06                 ` Jason Gunthorpe
  2023-01-24 15:12                 ` David Howells
  1 sibling, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2023-01-24 15:06 UTC (permalink / raw)
  To: David Howells
  Cc: John Hubbard, Al Viro, Christoph Hellwig, Matthew Wilcox,
	Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe,
	linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig,
	linux-mm

On Tue, Jan 24, 2023 at 02:59:25PM +0000, David Howells wrote:
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > What is the 3rd state?
> 
> Consider a network filesystem message generated for a direct I/O that the
> network filesystem does zerocopy on.  You may have an sk_buff that has
> fragments from one or more of three different sources:
> 
>  (1) Fragments consisting of specifically allocated pages, such as the
>      IP/UDP/TCP headers that have refs taken on them.
> 
>  (2) Fragments consisting of zerocopy kernel buffers that have neither refs nor
>      pins belonging to the sk_buff.
> 
>      iov_iter_extract_pages() will not take pins when extracting from, say, an
>      XARRAY-type or KVEC-type iterator.  iov_iter_extract_mode() will return
>      0.
> 
>  (3) Fragments consisting of zerocopy user buffers that have pins taken on
>      them belonging to the sk_buff.
> 
>      iov_iter_extract_pages() will take pins when extracting from, say, a
>      UBUF-type or IOVEC-type iterator.  iov_iter_extract_mode() will return
>      FOLL_PIN (at the moment).
> 
> So you have three states: Ref'd, pinned and no-retention.

Isn't that this:

if (cleanup_flags & PAGE_CLEANUP_NEEDED)
   gup_put_folio(folio, 1, cleanup_flags & PAGE_CLEANUP_UNPIN)


?

Three states - decr get, decr pinned, do nothing?

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down
  2023-01-24 14:59               ` David Howells
  2023-01-24 15:06                 ` Jason Gunthorpe
@ 2023-01-24 15:12                 ` David Howells
  1 sibling, 0 replies; 75+ messages in thread
From: David Howells @ 2023-01-24 15:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: dhowells, John Hubbard, Al Viro, Christoph Hellwig,
	Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig, linux-mm

Jason Gunthorpe <jgg@nvidia.com> wrote:

> Isn't that this:
> 
> if (cleanup_flags & PAGE_CLEANUP_NEEDED)
>    gup_put_folio(folio, 1, cleanup_flags & PAGE_CLEANUP_UNPIN)
> 
> 
> ?

Yes.  As claimed, that would be three states.

> Three states - decr get, decr pinned, do nothing?

Yes.  Don't worry about it.  I'm not going to do it in gup.c.

David


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 07/10] block: Switch to pinning pages.
  2023-01-24 15:03     ` David Howells
@ 2023-01-24 16:44       ` Christoph Hellwig
  2023-01-24 16:46         ` David Hildenbrand
                           ` (3 more replies)
  0 siblings, 4 replies; 75+ messages in thread
From: Christoph Hellwig @ 2023-01-24 16:44 UTC (permalink / raw)
  To: David Howells
  Cc: Christoph Hellwig, David Hildenbrand, Al Viro, Matthew Wilcox,
	Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe,
	linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig

On Tue, Jan 24, 2023 at 03:03:53PM +0000, David Howells wrote:
> Christoph Hellwig <hch@infradead.org> wrote:
> 
> > It can't.  Per your latest branch:
> 
> Yes it can.  Patch 6:

This never involves the cleanup mode as input.  And as I pointed out
in the other mail, there is no need for the FOLL_ flags on the
cleanup side.  You can just check the bio flag in bio_release_pages
and call either put_page or unpin_user_page on that.  The direct
callers of bio_release_page never mix the pin and get cases anyway.
Let me find some time to code this up if it's easier to understand
that way.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 07/10] block: Switch to pinning pages.
  2023-01-24 16:44       ` Christoph Hellwig
@ 2023-01-24 16:46         ` David Hildenbrand
  2023-01-24 16:59         ` Christoph Hellwig
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 75+ messages in thread
From: David Hildenbrand @ 2023-01-24 16:46 UTC (permalink / raw)
  To: Christoph Hellwig, David Howells
  Cc: Al Viro, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
	Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel,
	Christoph Hellwig

On 24.01.23 17:44, Christoph Hellwig wrote:
> On Tue, Jan 24, 2023 at 03:03:53PM +0000, David Howells wrote:
>> Christoph Hellwig <hch@infradead.org> wrote:
>>
>>> It can't.  Per your latest branch:
>>
>> Yes it can.  Patch 6:
> 
> This never involves the cleanup mode as input.  And as I pointed out
> in the other mail, there is no need for the FOLL_ flags on the
> cleanup side.  You can just check the bio flag in bio_release_apges
> and call either put_page or unpin_user_page on that.  The direct
> callers of bio_release_page never mix them pin and get cases anyway.
> Let me find some time to code this up if it's easier to understand
> that way.

In case this series gets resent (which I assume), it would be great to 
CC linux-mm on the whole thing.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 07/10] block: Switch to pinning pages.
  2023-01-24 16:44       ` Christoph Hellwig
  2023-01-24 16:46         ` David Hildenbrand
@ 2023-01-24 16:59         ` Christoph Hellwig
  2023-01-24 18:37         ` David Howells
  2023-01-24 18:38         ` David Howells
  3 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2023-01-24 16:59 UTC (permalink / raw)
  To: David Howells
  Cc: Christoph Hellwig, David Hildenbrand, Al Viro, Matthew Wilcox,
	Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe,
	linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig

This is the incremental patch.  Doesn't do the FOLL_PIN to bool
conversion for the extra helper yet, and needs to be folded into the
original patches still.


diff --git a/block/bio.c b/block/bio.c
index 6dc54bf3ed27d4..bd8433f3644fd7 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1170,14 +1170,20 @@ bool bio_add_folio(struct bio *bio, struct folio *folio, size_t len,
 
 void __bio_release_pages(struct bio *bio, bool mark_dirty)
 {
-	unsigned int gup_flags = bio_to_gup_flags(bio);
+	bool pinned = bio_flagged(bio, BIO_PAGE_PINNED);
+	bool reffed = bio_flagged(bio, BIO_PAGE_REFFED);
 	struct bvec_iter_all iter_all;
 	struct bio_vec *bvec;
 
 	bio_for_each_segment_all(bvec, bio, iter_all) {
 		if (mark_dirty && !PageCompound(bvec->bv_page))
 			set_page_dirty_lock(bvec->bv_page);
-		page_put_unpin(bvec->bv_page, gup_flags);
+
+		if (pinned)
+			unpin_user_page(bvec->bv_page);
+		/* this can go away once direct-io.c is converted: */
+		else if (reffed)
+			put_page(bvec->bv_page);
 	}
 }
 EXPORT_SYMBOL_GPL(__bio_release_pages);
diff --git a/block/blk.h b/block/blk.h
index 294044d696e09f..a16d4425d2751c 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -430,27 +430,19 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio,
  */
 static inline void bio_set_cleanup_mode(struct bio *bio, struct iov_iter *iter)
 {
-	unsigned int cleanup_mode = iov_iter_extract_mode(iter);
-
-	if (cleanup_mode & FOLL_GET)
-		bio_set_flag(bio, BIO_PAGE_REFFED);
-	if (cleanup_mode & FOLL_PIN)
+	if (iov_iter_extract_mode(iter) & FOLL_PIN)
 		bio_set_flag(bio, BIO_PAGE_PINNED);
 }
 
-static inline unsigned int bio_to_gup_flags(struct bio *bio)
-{
-	return (bio_flagged(bio, BIO_PAGE_REFFED) ? FOLL_GET : 0) |
-		(bio_flagged(bio, BIO_PAGE_PINNED) ? FOLL_PIN : 0);
-}
-
 /*
  * Clean up a page appropriately, where the page may be pinned, may have a
  * ref taken on it or neither.
  */
 static inline void bio_release_page(struct bio *bio, struct page *page)
 {
-	page_put_unpin(page, bio_to_gup_flags(bio));
+	WARN_ON_ONCE(bio_flagged(bio, BIO_PAGE_REFFED));
+	if (bio_flagged(bio, BIO_PAGE_PINNED))
+		unpin_user_page(page);
 }
 
 struct request_queue *blk_alloc_queue(int node_id);

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 07/10] block: Switch to pinning pages.
  2023-01-24 16:44       ` Christoph Hellwig
  2023-01-24 16:46         ` David Hildenbrand
  2023-01-24 16:59         ` Christoph Hellwig
@ 2023-01-24 18:37         ` David Howells
  2023-01-24 18:55           ` Christoph Hellwig
  2023-01-24 18:38         ` David Howells
  3 siblings, 1 reply; 75+ messages in thread
From: David Howells @ 2023-01-24 18:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dhowells, David Hildenbrand, Al Viro, Matthew Wilcox, Jens Axboe,
	Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel,
	linux-block, linux-kernel, Christoph Hellwig

Christoph Hellwig <hch@infradead.org> wrote:

> +	WARN_ON_ONCE(bio_flagged(bio, BIO_PAGE_REFFED));

It's still set by fs/direct-io.c.

David


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 07/10] block: Switch to pinning pages.
  2023-01-24 16:44       ` Christoph Hellwig
                           ` (2 preceding siblings ...)
  2023-01-24 18:37         ` David Howells
@ 2023-01-24 18:38         ` David Howells
  3 siblings, 0 replies; 75+ messages in thread
From: David Howells @ 2023-01-24 18:38 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: dhowells, Christoph Hellwig, Al Viro, Matthew Wilcox, Jens Axboe,
	Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel,
	linux-block, linux-kernel, Christoph Hellwig

David Hildenbrand <david@redhat.com> wrote:

> In case this series gets resent (which I assume), it would be great to CC
> linux-mm on the whole thing.

v9 is already posted, but I hadn't added linux-mm to it.  I dropped all the
bits that touched the mm side of things.

David


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v8 07/10] block: Switch to pinning pages.
  2023-01-24 18:37         ` David Howells
@ 2023-01-24 18:55           ` Christoph Hellwig
  0 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2023-01-24 18:55 UTC (permalink / raw)
  To: David Howells
  Cc: Christoph Hellwig, David Hildenbrand, Al Viro, Matthew Wilcox,
	Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe,
	linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig

On Tue, Jan 24, 2023 at 06:37:17PM +0000, David Howells wrote:
> Christoph Hellwig <hch@infradead.org> wrote:
> 
> > +	WARN_ON_ONCE(bio_flagged(bio, BIO_PAGE_REFFED));
> 
> It's still set by fs/direct-io.c.

But that should never end up in bio_release_page, only
bio_release_pages.


^ permalink raw reply	[flat|nested] 75+ messages in thread

Thread overview: 75+ messages
2023-01-23 17:29 [PATCH v8 00/10] iov_iter: Improve page extraction (pin or just list) David Howells
2023-01-23 17:29 ` [PATCH v8 01/10] iov_iter: Define flags to qualify page extraction David Howells
2023-01-23 18:20   ` Christoph Hellwig
2023-01-24  2:12   ` John Hubbard
2023-01-23 17:29 ` [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator David Howells
2023-01-23 18:21   ` Christoph Hellwig
2023-01-24 14:27   ` David Hildenbrand
2023-01-24 14:35   ` David Howells
2023-01-24 14:37     ` David Hildenbrand
2023-01-24 14:45     ` David Howells
2023-01-24 14:52       ` David Hildenbrand
2023-01-23 17:30 ` [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page David Howells
2023-01-23 18:21   ` Christoph Hellwig
2023-01-24  3:03   ` John Hubbard
2023-01-24 14:28   ` David Hildenbrand
2023-01-24 14:41   ` David Howells
2023-01-24 14:52     ` Christoph Hellwig
2023-01-24 14:53       ` David Hildenbrand
2023-01-24 15:04     ` David Howells
2023-01-23 17:30 ` [PATCH v8 04/10] iomap: don't get an reference on ZERO_PAGE for direct I/O block zeroing David Howells
2023-01-23 18:22   ` Christoph Hellwig
2023-01-24  2:42   ` John Hubbard
2023-01-24  5:59     ` Christoph Hellwig
2023-01-24  7:03       ` John Hubbard
2023-01-24 14:29   ` David Hildenbrand
2023-01-23 17:30 ` [PATCH v8 05/10] block: Fix bio_flagged() so that gcc can better optimise it David Howells
2023-01-23 17:30 ` [PATCH v8 06/10] block: Rename BIO_NO_PAGE_REF to BIO_PAGE_REFFED and invert the meaning David Howells
2023-01-23 18:23   ` Christoph Hellwig
2023-01-23 17:30 ` [PATCH v8 07/10] block: Switch to pinning pages David Howells
2023-01-23 18:23   ` Christoph Hellwig
2023-01-24 14:32   ` David Hildenbrand
2023-01-24 14:47   ` David Howells
2023-01-24 14:53     ` Christoph Hellwig
2023-01-24 15:03     ` David Howells
2023-01-24 16:44       ` Christoph Hellwig
2023-01-24 16:46         ` David Hildenbrand
2023-01-24 16:59         ` Christoph Hellwig
2023-01-24 18:37         ` David Howells
2023-01-24 18:55           ` Christoph Hellwig
2023-01-24 18:38         ` David Howells
2023-01-23 17:30 ` [PATCH v8 08/10] block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages David Howells
2023-01-23 18:23   ` Christoph Hellwig
2023-01-23 17:30 ` [PATCH v8 09/10] block: convert bio_map_user_iov " David Howells
2023-01-23 18:24   ` Christoph Hellwig
2023-01-23 17:30 ` [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down David Howells
2023-01-23 18:25   ` Christoph Hellwig
2023-01-24  3:08   ` John Hubbard
2023-01-24  3:11     ` John Hubbard
2023-01-24 13:13       ` Jason Gunthorpe
2023-01-24 13:18         ` Christoph Hellwig
2023-01-24 13:43           ` Jason Gunthorpe
2023-01-24 13:40       ` David Howells
2023-01-24 13:46       ` David Howells
2023-01-24 13:47         ` Jason Gunthorpe
2023-01-24 13:57         ` David Howells
2023-01-24 14:00           ` Jason Gunthorpe
2023-01-24 14:02           ` Christoph Hellwig
2023-01-24 14:11           ` David Howells
2023-01-24 14:14             ` Jason Gunthorpe
2023-01-24 14:27             ` David Howells
2023-01-24 14:31               ` Jason Gunthorpe
2023-01-24 14:59               ` David Howells
2023-01-24 15:06                 ` Jason Gunthorpe
2023-01-24 15:12                 ` David Howells
2023-01-24 14:12           ` David Howells
2023-01-24 14:13             ` Christoph Hellwig
2023-01-24 14:25             ` David Howells
2023-01-24  7:05   ` David Howells
2023-01-24  2:02 ` [PATCH v8 00/10] iov_iter: Improve page extraction (pin or just list) John Hubbard
2023-01-24 12:44 ` David Hildenbrand
2023-01-24 13:16   ` Christoph Hellwig
2023-01-24 13:22     ` David Hildenbrand
2023-01-24 13:32       ` Christoph Hellwig
2023-01-24 13:35         ` David Hildenbrand
2023-01-24 13:44 ` David Howells
