linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/2] iov_iter: Provide a function to extract/pin/get pages from an iteraor
@ 2022-10-31 14:49 David Howells
  2022-10-31 14:49 ` [RFC PATCH 1/2] iov_iter: Add a function to extract a page list from an iterator David Howells
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: David Howells @ 2022-10-31 14:49 UTC (permalink / raw)
  To: viro
  Cc: linux-mm, John Hubbard, linux-fsdevel, Christoph Hellwig,
	Matthew Wilcox, dhowells, smfrench, torvalds, linux-cifs,
	linux-kernel


Hi Al,

Here's a patch to provide a function to extract a list of pages from an
iterator, getting pins of refs on them as appropriate.

I added a macro by which you can query an iterator to find out how the
extraction function will treat the pages (it returns 0, FOLL_GET or FOLL_PIN
as appropriate).  Note that it's a macro to avoid #inclusion of linux/mm.h in
linux/uio.h.

I've added another crude, incomplete patch to make cifs use it a bit as an
example.  Note that cifs won't work properly with this under all
circumstances, particularly if it decides to split the rdata or wdata record -
but it seems to work for small I/Os.

David
---
David Howells (2):
      iov_iter: Add a function to extract a page list from an iterator
      cifs: Test use of iov_iter_extract_pages() and iov_iter_extract_mode()


 fs/cifs/cifsglob.h  |   2 +
 fs/cifs/file.c      |  93 +++++++++----
 include/linux/uio.h |  26 ++++
 lib/iov_iter.c      | 333 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 427 insertions(+), 27 deletions(-)



^ permalink raw reply	[flat|nested] 5+ messages in thread

* [RFC PATCH 1/2] iov_iter: Add a function to extract a page list from an iterator
  2022-10-31 14:49 [RFC PATCH 0/2] iov_iter: Provide a function to extract/pin/get pages from an iteraor David Howells
@ 2022-10-31 14:49 ` David Howells
  2022-10-31 14:49 ` [RFC PATCH 2/2] cifs: Test use of iov_iter_extract_pages() and iov_iter_extract_mode() David Howells
  2022-10-31 14:52 ` [RFC PATCH 0/2] iov_iter: Provide a function to extract/pin/get pages from an iteraor Matthew Wilcox
  2 siblings, 0 replies; 5+ messages in thread
From: David Howells @ 2022-10-31 14:49 UTC (permalink / raw)
  To: viro
  Cc: Christoph Hellwig, John Hubbard, Matthew Wilcox, linux-fsdevel,
	linux-mm, dhowells, smfrench, torvalds, linux-cifs, linux-kernel

Add a function, iov_iter_extract_pages(), to extract a list of pages from
an iterator.  The pages may be returned with a reference added or a pin
added or neither, depending on the type of iterator and the direction of
transfer.

If the transfer is *into* an ITER_IOVEC or ITER_UBUF iterator, the pages
will have pins obtained on them, but not references, to avoid fork()
CoW'ing the pages incorrectly whilst DMA read is in progess.

If the transfer is *out of* an ITER_IOVEC or ITER_UBUF iterator, the pages
will have references obtained on them, but not pins.  DMA write is assumed
not to change the page contents.

For any other sort of iterator, no refs or pins are obtained on the page,
the assumption being that the caller will manage retention.

An additional function, iov_iter_extract_mode() is provided to indicate
what the mode of retention employed will be for that iterator.  This is
indicated as 0 (no special retention), FOLL_GET (will get page refs) or
FOLL_PIN (will get page pins).  This can be used by the caller to work out
how to dispose of the pages afterwards: nothing, put_page() or
unpin_user_page().

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Christoph Hellwig <hch@lst.de>
cc: John Hubbard <jhubbard@nvidia.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@vger.kernel.org
---

 include/linux/uio.h |   26 ++++
 lib/iov_iter.c      |  333 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 359 insertions(+)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 87fc3d0dda98..1c9b9e8ba777 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -354,4 +354,30 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
 	};
 }
 
+ssize_t iov_iter_extract_pages(struct iov_iter *i, struct page ***pages,
+			       size_t maxsize, unsigned int maxpages,
+			       size_t *offset0);
+
+/**
+ * iov_iter_extract_mode - Indicate how pages from the iterator will be retained
+ * @iter: The iterator
+ *
+ * Examine the indicator and indicate with FOLL_PIN, FOLL_GET or 0 as to how,
+ * if at all, pages extracted from the iterator will be retained by the
+ * extraction function.
+ *
+ * FOLL_GET indicates that the pages will have a reference taken on them that
+ * the caller must put.  This can be done for DMA/async DIO write from a page.
+ *
+ * FOLL_PIN indicates that the pages will have a pin placed in them that the
+ * caller must unpin.  This is must be done for DMA/async DIO read to a page to
+ * avoid CoW problems in fork.
+ *
+ * 0 indicates that no measures are taken and that it's up to the caller to
+ * retain the pages.
+ */
+#define iov_iter_extract_mode(iter)			\
+	(!user_backed_iter(iter) ? 0 :			\
+	 (iter)->data_source ? FOLL_GET : FOLL_PIN)
+
 #endif
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 98e8425b060d..96bfb117f19a 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1898,3 +1898,336 @@ void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state)
 		i->iov -= state->nr_segs - i->nr_segs;
 	i->nr_segs = state->nr_segs;
 }
+
+/*
+ * Extract a list of contiguous pages from an ITER_PIPE iterator.  This does
+ * not get references of its own on the pages, nor does it get a pin on them.
+ * If there's a partial page, it adds that first and will then allocate and add
+ * pages into the pipe to make up the buffer space to the amount required.
+ *
+ * The caller must hold the pipe locked and only transferring into a pipe is
+ * supported.
+ */
+static ssize_t iov_iter_extract_pipe_pages(struct iov_iter *i,
+					   struct page ***pages, size_t maxsize,
+					   unsigned int maxpages,
+					   size_t *offset0)
+{
+	unsigned int nr, offset, chunk, j;
+	struct page **p;
+	size_t left;
+
+	if (!sanity(i))
+		return -EFAULT;
+
+	offset = pipe_npages(i, &nr);
+	if (!nr)
+		return -EFAULT;
+	*offset0 = offset;
+
+	maxpages = min_t(size_t, nr, maxpages);
+	maxpages = want_pages_array(pages, maxsize, offset, maxpages);
+	if (!maxpages)
+		return -ENOMEM;
+	p = *pages;
+
+	left = maxsize;
+	for (j = 0; j < maxpages; j++) {
+		struct page *page = append_pipe(i, left, &offset);
+		if (!page)
+			break;
+		chunk = min_t(size_t, left, PAGE_SIZE - offset);
+		left -= chunk;
+		*p++ = page;
+	}
+	if (!j)
+		return -EFAULT;
+	return maxsize - left;
+}
+
+/*
+ * Extract a list of contiguous pages from an ITER_XARRAY iterator.  This does not
+ * get references on the pages, nor does it get a pin on them.
+ */
+static ssize_t iov_iter_extract_xarray_pages(struct iov_iter *i,
+					     struct page ***pages, size_t maxsize,
+					     unsigned int maxpages,
+					     size_t *offset0)
+{
+	struct page *page, **p;
+	unsigned int nr = 0, offset;
+	loff_t pos = i->xarray_start + i->iov_offset;
+	pgoff_t index = pos >> PAGE_SHIFT;
+	XA_STATE(xas, i->xarray, index);
+
+	offset = pos & ~PAGE_MASK;
+	*offset0 = offset;
+
+	maxpages = want_pages_array(pages, maxsize, offset, maxpages);
+	if (!maxpages)
+		return -ENOMEM;
+	p = *pages;
+
+	rcu_read_lock();
+	for (page = xas_load(&xas); page; page = xas_next(&xas)) {
+		if (xas_retry(&xas, page))
+			continue;
+
+		/* Has the page moved or been split? */
+		if (unlikely(page != xas_reload(&xas))) {
+			xas_reset(&xas);
+			continue;
+		}
+
+		p[nr++] = find_subpage(page, xas.xa_index);
+		if (nr == maxpages)
+			break;
+	}
+	rcu_read_unlock();
+
+	maxsize = min_t(size_t, nr * PAGE_SIZE - offset, maxsize);
+	i->iov_offset += maxsize;
+	i->count -= maxsize;
+	return maxsize;
+}
+
+/*
+ * Extract a list of contiguous pages from an ITER_BVEC iterator.  This does
+ * not get references on the pages, nor does it get a pin on them.
+ */
+static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i,
+					   struct page ***pages, size_t maxsize,
+					   unsigned int maxpages,
+					   size_t *offset0)
+{
+	struct page **p, *page;
+	size_t skip = i->iov_offset, offset;
+	int k;
+
+	maxsize = min(maxsize, i->bvec->bv_len - skip);
+	skip += i->bvec->bv_offset;
+	page = i->bvec->bv_page + skip / PAGE_SIZE;
+	offset = skip % PAGE_SIZE;
+	*offset0 = offset;
+
+	maxpages = want_pages_array(pages, maxsize, offset, maxpages);
+	if (!maxpages)
+		return -ENOMEM;
+	p = *pages;
+	for (k = 0; k < maxpages; k++)
+		p[k] = page + k;
+
+	maxsize = min_t(size_t, maxsize, maxpages * PAGE_SIZE - offset);
+	i->count -= maxsize;
+	i->iov_offset += maxsize;
+	if (i->iov_offset == i->bvec->bv_len) {
+		i->iov_offset = 0;
+		i->bvec++;
+		i->nr_segs--;
+	}
+	return maxsize;
+}
+
+/*
+ * Get the first segment from an ITER_UBUF or ITER_IOVEC iterator.  The
+ * iterator must not be empty.
+ */
+static unsigned long iov_iter_extract_first_user_segment(const struct iov_iter *i,
+							 size_t *size)
+{
+	size_t skip;
+	long k;
+
+	if (iter_is_ubuf(i))
+		return (unsigned long)i->ubuf + i->iov_offset;
+
+	for (k = 0, skip = i->iov_offset; k < i->nr_segs; k++, skip = 0) {
+		size_t len = i->iov[k].iov_len - skip;
+
+		if (unlikely(!len))
+			continue;
+		if (*size > len)
+			*size = len;
+		return (unsigned long)i->iov[k].iov_base + skip;
+	}
+	BUG(); // if it had been empty, we wouldn't get called
+}
+
+/*
+ * Extract a list of contiguous pages from a user iterator and get references
+ * on them.  This should only be used iff the iterator is user-backed
+ * (IOBUF/UBUF) and data is being transferred out of the buffer described by
+ * the iterator (ie. this is the source).
+ *
+ * The pages are returned with incremented refcounts that the caller must undo
+ * once the transfer is complete, but no additional pins are obtained.
+ *
+ * This is only safe to be used where background IO/DMA is not going to be
+ * modifying the buffer, and so won't cause a problem with CoW on fork.
+ */
+static ssize_t iov_iter_extract_user_pages_and_get(struct iov_iter *i,
+						   struct page ***pages,
+						   size_t maxsize,
+						   unsigned int maxpages,
+						   size_t *offset0)
+{
+	unsigned long addr;
+	unsigned int gup_flags = FOLL_GET;
+	size_t offset;
+	int res;
+
+	if (WARN_ON_ONCE(iov_iter_rw(i) != WRITE))
+		return -EFAULT;
+
+	if (i->nofault)
+		gup_flags |= FOLL_NOFAULT;
+
+	addr = iov_iter_extract_first_user_segment(i, &maxsize);
+	*offset0 = offset = addr % PAGE_SIZE;
+	addr &= PAGE_MASK;
+	maxpages = want_pages_array(pages, maxsize, offset, maxpages);
+	if (!maxpages)
+		return -ENOMEM;
+	res = get_user_pages_fast(addr, maxpages, gup_flags, *pages);
+	if (unlikely(res <= 0))
+		return res;
+	maxsize = min_t(size_t, maxsize, res * PAGE_SIZE - offset);
+	iov_iter_advance(i, maxsize);
+	return maxsize;
+}
+
+/*
+ * Extract a list of contiguous pages from a user iterator and get a pin on
+ * each of them.  This should only be used iff the iterator is user-backed
+ * (IOBUF/UBUF) and data is being transferred into the buffer described by the
+ * iterator (ie. this is the destination).
+ *
+ * It does not get refs on the pages, but the pages must be unpinned by the
+ * caller once the transfer is complete.
+ *
+ * This is safe to be used where background IO/DMA *is* going to be modifying
+ * the buffer; using a pin rather than a ref makes sure that CoW happens
+ * correctly in the parent during fork.
+ */
+static ssize_t iov_iter_extract_user_pages_and_pin(struct iov_iter *i,
+						   struct page ***pages,
+						   size_t maxsize,
+						   unsigned int maxpages,
+						   size_t *offset0)
+{
+	unsigned long addr;
+	unsigned int gup_flags = FOLL_PIN | FOLL_WRITE;
+	size_t offset;
+	int res;
+
+	if (WARN_ON_ONCE(iov_iter_rw(i) != READ))
+		return -EFAULT;
+
+	if (i->nofault)
+		gup_flags |= FOLL_NOFAULT;
+
+	addr = first_iovec_segment(i, &maxsize);
+	*offset0 = offset = addr % PAGE_SIZE;
+	addr &= PAGE_MASK;
+	maxpages = want_pages_array(pages, maxsize, offset, maxpages);
+	if (!maxpages)
+		return -ENOMEM;
+	res = pin_user_pages_fast(addr, maxpages, gup_flags, *pages);
+	if (unlikely(res <= 0))
+		return res;
+	maxsize = min_t(size_t, maxsize, res * PAGE_SIZE - offset);
+	iov_iter_advance(i, maxsize);
+	return maxsize;
+}
+
+static ssize_t iov_iter_extract_user_pages(struct iov_iter *i,
+					   struct page ***pages, size_t maxsize,
+					   unsigned int maxpages,
+					   size_t *offset0)
+{
+	switch (iov_iter_extract_mode(i)) {
+	case FOLL_GET:
+		return iov_iter_extract_user_pages_and_get(i, pages, maxsize,
+							   maxpages, offset0);
+	case FOLL_PIN:
+		return iov_iter_extract_user_pages_and_pin(i, pages, maxsize,
+							   maxpages, offset0);
+	default:
+		BUG();
+	}
+}
+
+/**
+ * iov_iter_extract_pages - Extract a list of contiguous pages from an iterator
+ * @i: The iterator to extract from
+ * @pages: Where to return the list of pages
+ * @maxsize: The maximum amount of iterator to extract
+ * @maxpages: The maximum size of the list of pages
+ * @offset0: Where to return the starting offset into (*@pages)[0]
+ *
+ * Extract a list of contiguous pages from the current point of the iterator,
+ * advancing the iterator.  The maximum number of pages and the maximum amount
+ * of page contents can be set.
+ *
+ * If *@pages is NULL, a page list will be allocated to the required size and
+ * *@pages will be set to its base.  If *@pages is not NULL, it will be assumed
+ * that the caller allocated a page list at least @maxpages in size and this
+ * will be filled in.
+ *
+ * Extra refs or pins on the pages may be obtained as follows:
+ *
+ *  (*) If the iterator is user-backed (ITER_IOVEC/ITER_UBUF) and data is to be
+ *      transferred /OUT OF/ the described buffer, refs will be taken on the
+ *      pages, but pins will not be added.  This can be used for DMA from a
+ *      page; it cannot be used for DMA to a page, as it may cause page-COW
+ *      problems in fork.
+ *
+ *  (*) If the iterator is user-backed (ITER_IOVEC/ITER_UBUF) and data is to be
+ *      transferred /INTO/ the described buffer, pins will be added to the
+ *      pages, but refs will not be taken.  This must be used for DMA to a
+ *      page.
+ *
+ *  (*) If the iterator is ITER_PIPE, this must describe a destination for the
+ *      data.  Additional pages may be allocated and added to the pipe (which
+ *      will hold the refs), but neither refs nor pins will be obtained for the
+ *      caller.  The caller must hold the pipe lock.
+ *
+ *  (*) If the iterator is ITER_BVEC or ITER_XARRAY, the pages are merely
+ *      listed; no extra refs or pins are obtained.
+ *
+ * Note also:
+ *
+ *  (*) Use with ITER_KVEC is not supported as that may refer to memory that
+ *      doesn't have associated page structs.
+ *
+ *  (*) Use with ITER_DISCARD is not supported as that has no content.
+ *
+ * On success, the function sets *@pages to the new pagelist, if allocated, and
+ * sets *offset0 to the offset into the first page and returns the amount of
+ * buffer space added represented by the page list.
+ *
+ * It may also return -ENOMEM and -EFAULT.
+ */
+ssize_t iov_iter_extract_pages(struct iov_iter *i, struct page ***pages,
+			       size_t maxsize, unsigned int maxpages,
+			       size_t *offset0)
+{
+	maxsize = min3(maxsize, i->count, MAX_RW_COUNT);
+	if (!maxsize)
+		return 0;
+
+	if (likely(user_backed_iter(i)))
+		return iov_iter_extract_user_pages(i, pages, maxsize,
+						   maxpages, offset0);
+	if (iov_iter_is_bvec(i))
+		return iov_iter_extract_bvec_pages(i, pages, maxsize,
+						   maxpages, offset0);
+	if (iov_iter_is_pipe(i))
+		return iov_iter_extract_pipe_pages(i, pages, maxsize,
+						   maxpages, offset0);
+	if (iov_iter_is_xarray(i))
+		return iov_iter_extract_xarray_pages(i, pages, maxsize,
+						     maxpages, offset0);
+	return -EFAULT;
+}
+EXPORT_SYMBOL(iov_iter_extract_pages);



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [RFC PATCH 2/2] cifs: Test use of iov_iter_extract_pages() and iov_iter_extract_mode()
  2022-10-31 14:49 [RFC PATCH 0/2] iov_iter: Provide a function to extract/pin/get pages from an iteraor David Howells
  2022-10-31 14:49 ` [RFC PATCH 1/2] iov_iter: Add a function to extract a page list from an iterator David Howells
@ 2022-10-31 14:49 ` David Howells
  2022-10-31 14:52 ` [RFC PATCH 0/2] iov_iter: Provide a function to extract/pin/get pages from an iteraor Matthew Wilcox
  2 siblings, 0 replies; 5+ messages in thread
From: David Howells @ 2022-10-31 14:49 UTC (permalink / raw)
  To: viro; +Cc: dhowells, smfrench, torvalds, linux-cifs, linux-kernel

Make cifs use iov_iter_extract_pages() rather than iov_iter_get_pages2*()
so that pages get pins taken on them rather than refs if they're
potentially going to have RDMA read into them.

Note that this is only a test and not complete as I think other refs may
get taken on the pages in other places.  Better, I think, to pass iterators
down as far as possible and avoid extracting the pages at all if we can
manage it.

Two things to note:

 (1) The code has to decide whether put, unpin or abandon the page
     dependent the mode (which is stored in rdata and wdata).  This
     probably isn't necessary in all the cases I've made it check the mode.
     Further, I'm very likely not handling correctly the places where wdata
     or rdata records are split.

 (2) There are a couple of places after calling iov_iter_get_pages2*() in
     the upstream kernel where I think the list of pages and all the pages
     it points to are leaked on error.  I've added some comments at those
     points.
---

 fs/cifs/cifsglob.h |    2 +
 fs/cifs/file.c     |   93 +++++++++++++++++++++++++++++++++++++---------------
 2 files changed, 68 insertions(+), 27 deletions(-)

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 1420acf987f0..86792b15197f 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1446,6 +1446,7 @@ struct cifs_readdata {
 	__u64				offset;
 	unsigned int			bytes;
 	unsigned int			got_bytes;
+	unsigned int			pages_cleanup_mode;	/* How to clean up pages[] */
 	pid_t				pid;
 	int				result;
 	struct work_struct		work;
@@ -1479,6 +1480,7 @@ struct cifs_writedata {
 	struct cifs_aio_ctx		*ctx;
 	__u64				offset;
 	pid_t				pid;
+	unsigned int			pages_cleanup_mode;	/* How to clean up pages[] */
 	unsigned int			bytes;
 	int				result;
 	struct TCP_Server_Info		*server;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 21d41b3c1882..385599c08572 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2366,11 +2366,15 @@ cifs_writev_requeue(struct cifs_writedata *wdata)
 		}
 
 		for (j = 0; j < nr_pages; j++) {
-			unlock_page(wdata2->pages[j]);
+			struct page *page = wdata2->pages[j];
+			unlock_page(page);
 			if (rc != 0 && !is_retryable_error(rc)) {
-				SetPageError(wdata2->pages[j]);
-				end_page_writeback(wdata2->pages[j]);
-				put_page(wdata2->pages[j]);
+				SetPageError(page);
+				end_page_writeback(page);
+				if (wdata->pages_cleanup_mode & FOLL_PIN)
+					unpin_user_page(page);
+				if (wdata->pages_cleanup_mode & FOLL_GET)
+					put_page(page);
 			}
 		}
 
@@ -2388,9 +2392,13 @@ cifs_writev_requeue(struct cifs_writedata *wdata)
 
 	/* cleanup remaining pages from the original wdata */
 	for (; i < wdata->nr_pages; i++) {
-		SetPageError(wdata->pages[i]);
-		end_page_writeback(wdata->pages[i]);
-		put_page(wdata->pages[i]);
+		struct page *page = wdata->pages[i];
+		SetPageError(page);
+		end_page_writeback(page);
+		if (wdata->pages_cleanup_mode & FOLL_PIN)
+			unpin_user_page(page);
+		if (wdata->pages_cleanup_mode & FOLL_GET)
+			put_page(page);
 	}
 
 	if (rc != 0 && !is_retryable_error(rc))
@@ -2424,7 +2432,10 @@ cifs_writev_complete(struct work_struct *work)
 			SetPageError(page);
 		end_page_writeback(page);
 		cifs_readpage_to_fscache(inode, page);
-		put_page(page);
+		if (wdata->pages_cleanup_mode & FOLL_PIN)
+			unpin_user_page(page);
+		if (wdata->pages_cleanup_mode & FOLL_GET)
+			put_page(page);
 	}
 	if (wdata->result != -EAGAIN)
 		mapping_set_error(inode->i_mapping, wdata->result);
@@ -2529,6 +2540,7 @@ wdata_alloc_and_fillpages(pgoff_t tofind, struct address_space *mapping,
 	if (!wdata)
 		return NULL;
 
+	wdata->pages_cleanup_mode = FOLL_GET;
 	*found_pages = find_get_pages_range_tag(mapping, index, end,
 				PAGECACHE_TAG_DIRTY, tofind, wdata->pages);
 	return wdata;
@@ -3089,8 +3101,12 @@ cifs_uncached_writedata_release(struct kref *refcount)
 					struct cifs_writedata, refcount);
 
 	kref_put(&wdata->ctx->refcount, cifs_aio_ctx_release);
-	for (i = 0; i < wdata->nr_pages; i++)
-		put_page(wdata->pages[i]);
+	if (wdata->pages_cleanup_mode & FOLL_PIN)
+		for (i = 0; i < wdata->nr_pages; i++)
+			unpin_user_page(wdata->pages[i]);
+	if (wdata->pages_cleanup_mode & FOLL_GET)
+		for (i = 0; i < wdata->nr_pages; i++)
+			put_page(wdata->pages[i]);
 	cifs_writedata_release(refcount);
 }
 
@@ -3277,11 +3293,12 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
 		if (ctx->direct_io) {
 			ssize_t result;
 
-			result = iov_iter_get_pages_alloc2(
-				from, &pagevec, cur_len, &start);
+			pagevec = NULL;
+			result = iov_iter_extract_pages(
+				from, &pagevec, cur_len, INT_MAX, &start);
 			if (result < 0) {
 				cifs_dbg(VFS,
-					 "direct_writev couldn't get user pages (rc=%zd) iter type %d iov_offset %zd count %zd\n",
+					 "direct_writev couldn't extract user pages (rc=%zd) iter type %d iov_offset %zd count %zd\n",
 					 result, iov_iter_type(from),
 					 from->iov_offset, from->count);
 				dump_stack();
@@ -3298,12 +3315,14 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
 			wdata = cifs_writedata_direct_alloc(pagevec,
 					     cifs_uncached_writev_complete);
 			if (!wdata) {
+				// Just leak pagevec?
 				rc = -ENOMEM;
 				add_credits_and_wake_if(server, credits, 0);
 				break;
 			}
 
 
+			wdata->pages_cleanup_mode = iov_iter_extract_mode(from);
 			wdata->page_offset = start;
 			wdata->tailsz =
 				nr_pages > 1 ?
@@ -3328,6 +3347,7 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
 				break;
 			}
 
+			wdata->pages_cleanup_mode = FOLL_GET;
 			num_pages = nr_pages;
 			rc = wdata_fill_from_iovec(
 				wdata, from, &cur_len, &num_pages);
@@ -3489,9 +3509,9 @@ static ssize_t __cifs_writev(
 	int rc;
 
 	/*
-	 * iov_iter_get_pages_alloc doesn't work with ITER_KVEC.
-	 * In this case, fall back to non-direct write function.
-	 * this could be improved by getting pages directly in ITER_KVEC
+	 * iov_iter_extract_pages_alloc can't work with ITER_KVEC as kvec pages
+	 * must not be got.  In this case, fall back to non-direct write
+	 * function.
 	 */
 	if (direct && iov_iter_is_kvec(from)) {
 		cifs_dbg(FYI, "use non-direct cifs_writev for kvec I/O\n");
@@ -3749,7 +3769,12 @@ cifs_read_allocate_pages(struct cifs_readdata *rdata, unsigned int nr_pages)
 		unsigned int nr_page_failed = i;
 
 		for (i = 0; i < nr_page_failed; i++) {
-			put_page(rdata->pages[i]);
+			struct page *page = rdata->pages[i];
+
+			if (rdata->pages_cleanup_mode & FOLL_PIN)
+				unpin_user_page(page);
+			if (rdata->pages_cleanup_mode & FOLL_GET)
+				put_page(page);
 			rdata->pages[i] = NULL;
 		}
 	}
@@ -3765,7 +3790,12 @@ cifs_uncached_readdata_release(struct kref *refcount)
 
 	kref_put(&rdata->ctx->refcount, cifs_aio_ctx_release);
 	for (i = 0; i < rdata->nr_pages; i++) {
-		put_page(rdata->pages[i]);
+		struct page *page = rdata->pages[i];
+
+		if (rdata->pages_cleanup_mode & FOLL_PIN)
+			unpin_user_page(page);
+		if (rdata->pages_cleanup_mode & FOLL_GET)
+			put_page(page);
 	}
 	cifs_readdata_release(refcount);
 }
@@ -3845,7 +3875,10 @@ uncached_fill_pages(struct TCP_Server_Info *server,
 			/* no need to hold page hostage */
 			rdata->pages[i] = NULL;
 			rdata->nr_pages--;
-			put_page(page);
+			if (rdata->pages_cleanup_mode & FOLL_PIN)
+				unpin_user_page(page);
+			if (rdata->pages_cleanup_mode & FOLL_GET)
+				put_page(page);
 			continue;
 		}
 
@@ -4015,12 +4048,12 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
 		if (ctx->direct_io) {
 			ssize_t result;
 
-			result = iov_iter_get_pages_alloc2(
-					&direct_iov, &pagevec,
-					cur_len, &start);
+			pagevec = NULL;
+			result = iov_iter_extract_pages(&direct_iov, &pagevec,
+							cur_len, INT_MAX, &start);
 			if (result < 0) {
 				cifs_dbg(VFS,
-					 "Couldn't get user pages (rc=%zd) iter type %d iov_offset %zd count %zd\n",
+					 "Couldn't extract user pages (rc=%zd) iter type %d iov_offset %zd count %zd\n",
 					 result, iov_iter_type(&direct_iov),
 					 direct_iov.iov_offset,
 					 direct_iov.count);
@@ -4035,6 +4068,7 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
 			rdata = cifs_readdata_direct_alloc(
 					pagevec, cifs_uncached_readv_complete);
 			if (!rdata) {
+				// Leaks pagevec?
 				add_credits_and_wake_if(server, credits, 0);
 				rc = -ENOMEM;
 				break;
@@ -4045,6 +4079,8 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
 			rdata->tailsz = npages > 1 ?
 				cur_len-(PAGE_SIZE-start)-(npages-2)*PAGE_SIZE :
 				cur_len;
+			rdata->pages_cleanup_mode =
+				iov_iter_extract_mode(&direct_iov);
 
 		} else {
 
@@ -4228,9 +4264,9 @@ static ssize_t __cifs_readv(
 	struct cifs_aio_ctx *ctx;
 
 	/*
-	 * iov_iter_get_pages_alloc() doesn't work with ITER_KVEC,
-	 * fall back to data copy read path
-	 * this could be improved by getting pages directly in ITER_KVEC
+	 * iov_iter_extract_pages() doesn't work with ITER_KVEC as kvec pages
+	 * can't have refs or pins taken on them - so fall back to data copy
+	 * read path
 	 */
 	if (direct && iov_iter_is_kvec(to)) {
 		cifs_dbg(FYI, "use non-direct cifs_user_readv for kvec I/O\n");
@@ -4563,7 +4599,10 @@ cifs_readv_complete(struct work_struct *work)
 
 		got_bytes -= min_t(unsigned int, PAGE_SIZE, got_bytes);
 
-		put_page(page);
+		if (rdata->pages_cleanup_mode & FOLL_PIN)
+			unpin_user_page(page);
+		if (rdata->pages_cleanup_mode & FOLL_GET)
+			put_page(page);
 		rdata->pages[i] = NULL;
 	}
 	kref_put(&rdata->refcount, cifs_readdata_release);



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH 0/2] iov_iter: Provide a function to extract/pin/get pages from an iteraor
  2022-10-31 14:49 [RFC PATCH 0/2] iov_iter: Provide a function to extract/pin/get pages from an iteraor David Howells
  2022-10-31 14:49 ` [RFC PATCH 1/2] iov_iter: Add a function to extract a page list from an iterator David Howells
  2022-10-31 14:49 ` [RFC PATCH 2/2] cifs: Test use of iov_iter_extract_pages() and iov_iter_extract_mode() David Howells
@ 2022-10-31 14:52 ` Matthew Wilcox
  2022-10-31 20:46   ` John Hubbard
  2 siblings, 1 reply; 5+ messages in thread
From: Matthew Wilcox @ 2022-10-31 14:52 UTC (permalink / raw)
  To: David Howells
  Cc: viro, linux-mm, John Hubbard, linux-fsdevel, Christoph Hellwig,
	smfrench, torvalds, linux-cifs, linux-kernel

On Mon, Oct 31, 2022 at 02:49:32PM +0000, David Howells wrote:
> I added a macro by which you can query an iterator to find out how the
> extraction function will treat the pages (it returns 0, FOLL_GET or FOLL_PIN
> as appropriate).  Note that it's a macro to avoid #inclusion of linux/mm.h in
> linux/uio.h.

I'd support moving FOLL_* definitions to mm_types.h along with
FAULT_FLAG_* and VM_FAULT_*.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH 0/2] iov_iter: Provide a function to extract/pin/get pages from an iteraor
  2022-10-31 14:52 ` [RFC PATCH 0/2] iov_iter: Provide a function to extract/pin/get pages from an iteraor Matthew Wilcox
@ 2022-10-31 20:46   ` John Hubbard
  0 siblings, 0 replies; 5+ messages in thread
From: John Hubbard @ 2022-10-31 20:46 UTC (permalink / raw)
  To: Matthew Wilcox, David Howells
  Cc: viro, linux-mm, linux-fsdevel, Christoph Hellwig, smfrench,
	torvalds, linux-cifs, linux-kernel

On 10/31/22 07:52, Matthew Wilcox wrote:
> On Mon, Oct 31, 2022 at 02:49:32PM +0000, David Howells wrote:
>> I added a macro by which you can query an iterator to find out how the
>> extraction function will treat the pages (it returns 0, FOLL_GET or FOLL_PIN
>> as appropriate).  Note that it's a macro to avoid #inclusion of linux/mm.h in
>> linux/uio.h.
> 
> I'd support moving FOLL_* definitions to mm_types.h along with
> FAULT_FLAG_* and VM_FAULT_*.

+1, great idea. The use of FOLL_* without including it's .h directly was
the first thing that jumped out at me when I just started looking at
this. And these really are mm types, so yes.

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-10-31 20:46 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-31 14:49 [RFC PATCH 0/2] iov_iter: Provide a function to extract/pin/get pages from an iteraor David Howells
2022-10-31 14:49 ` [RFC PATCH 1/2] iov_iter: Add a function to extract a page list from an iterator David Howells
2022-10-31 14:49 ` [RFC PATCH 2/2] cifs: Test use of iov_iter_extract_pages() and iov_iter_extract_mode() David Howells
2022-10-31 14:52 ` [RFC PATCH 0/2] iov_iter: Provide a function to extract/pin/get pages from an iteraor Matthew Wilcox
2022-10-31 20:46   ` John Hubbard

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).