All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Howells <dhowells@redhat.com>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jeff Layton <jlayton@kernel.org>,
	Steve French <sfrench@samba.org>,
	Shyam Prasad N <nspmangalore@gmail.com>,
	Rohith Surabattula <rohiths.msft@gmail.com>,
	linux-cachefs@redhat.com, linux-cifs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, dhowells@redhat.com,
	Christoph Hellwig <hch@infradead.org>,
	Matthew Wilcox <willy@infradead.org>,
	Jens Axboe <axboe@kernel.dk>, Jan Kara <jack@suse.cz>,
	Jeff Layton <jlayton@kernel.org>,
	Logan Gunthorpe <logang@deltatee.com>,
	linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH v6 13/34] netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator
Date: Mon, 16 Jan 2023 23:09:34 +0000	[thread overview]
Message-ID: <167391057444.2311931.12321968641492694017.stgit@warthog.procyon.org.uk> (raw)
In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>

Add a function to extract the pages from a user-space supplied iterator
(UBUF- or IOVEC-type) into a BVEC-type iterator, retaining the pages by
getting a ref on them (FOLL_SOURCE_BUF is indicated) or pinning them
(FOLL_DEST_BUF is indicated) as we go.

This is useful in three situations:

 (1) A userspace thread may have a sibling that unmaps or remaps the
     process's VM during the operation, changing the assignment of the
     pages and potentially causing an error.  Retaining the pages keeps
     some pages around, even if this occurs; futher, we find out at the
     point of extraction if EFAULT is going to be incurred.

 (2) Pages might get swapped out/discarded if not retained, so we want to
     retain them to avoid the reload causing a deadlock due to a DIO
     from/to an mmapped region on the same file.

 (3) The iterator may get passed to sendmsg() by the filesystem.  If a
     fault occurs, we may get a short write to a TCP stream that's then
     tricky to recover from.

We don't deal with other types of iterator here, leaving it to other
mechanisms to retain the pages (eg. PG_locked, PG_writeback and the pipe
lock).

Changes:
========
ver #6)
 - Pass in a gup_flags argument to allow FOLL_SOURCE_BUF and FOLL_DEST_BUF
   and other FOLL_* flags to be passed in.
 - Don't pass back the cleanup mode - iov_iter_extract_mode() can be used
   to determine that.

ver #3)
 - Switch to using EXPORT_SYMBOL_GPL to prevent indirect 3rd-party access
   to get/pin_user_pages_fast()[1].

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: linux-cachefs@redhat.com
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org

Link: https://lore.kernel.org/r/Y3zFzdWnWlEJ8X8/@infradead.org/ [1]
Link: https://lore.kernel.org/r/166697255265.61150.6289490555867717077.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166732026503.3186319.12020462741051772825.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166869690376.3723671.8813331570219190705.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166920904810.1461876.11603559311247187100.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/166997422579.9475.12101700945635692496.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/167305164634.1521586.12199658904363317567.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/167344729278.2425628.3277966637577509831.stgit@warthog.procyon.org.uk/ # v5
---

 fs/netfs/Makefile     |    1 
 fs/netfs/iterator.c   |  102 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/netfs.h |    2 +
 3 files changed, 105 insertions(+)
 create mode 100644 fs/netfs/iterator.c

diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile
index f684c0cd1ec5..386d6fb92793 100644
--- a/fs/netfs/Makefile
+++ b/fs/netfs/Makefile
@@ -3,6 +3,7 @@
 netfs-y := \
 	buffered_read.o \
 	io.o \
+	iterator.o \
 	main.o \
 	objects.o
 
diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c
new file mode 100644
index 000000000000..f7f26de1a247
--- /dev/null
+++ b/fs/netfs/iterator.c
@@ -0,0 +1,102 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Iterator helpers.
+ *
+ * Copyright (C) 2022 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <linux/export.h>
+#include <linux/slab.h>
+#include <linux/uio.h>
+#include <linux/netfs.h>
+#include "internal.h"
+
+/**
+ * netfs_extract_user_iter - Extract the pages from a user iterator into a bvec
+ * @orig: The original iterator
+ * @orig_len: The amount of iterator to copy
+ * @new: The iterator to be set up
+ * @gup_flags: Direction indicator and additional flags
+ *
+ * Extract the page fragments from the given amount of the source iterator and
+ * build up a second iterator that refers to all of those bits.  This allows
+ * the original iterator to disposed of.
+ *
+ * @gup_flags should indicate FOLL_SOURCE_BUF or FOLL_DEST_BUF plus any
+ * additional flags needed.
+ *
+ * On success, the number of elements in the bvec is returned, the original
+ * iterator will have been advanced by the amount extracted.
+ *
+ * The iov_iter_extract_mode() function should be used to query how cleanup
+ * should be performed.
+ */
+ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len,
+				struct iov_iter *new, unsigned int gup_flags)
+{
+	struct bio_vec *bv = NULL;
+	struct page **pages;
+	unsigned int cur_npages;
+	unsigned int max_pages;
+	unsigned int npages = 0;
+	unsigned int i;
+	ssize_t ret;
+	size_t count = orig_len, offset, len;
+	size_t bv_size, pg_size;
+
+	if (WARN_ON_ONCE(!iter_is_ubuf(orig) && !iter_is_iovec(orig)))
+		return -EIO;
+
+	max_pages = iov_iter_npages(orig, INT_MAX);
+	bv_size = array_size(max_pages, sizeof(*bv));
+	bv = kvmalloc(bv_size, GFP_KERNEL);
+	if (!bv)
+		return -ENOMEM;
+
+	/* Put the page list at the end of the bvec list storage.  bvec
+	 * elements are larger than page pointers, so as long as we work
+	 * 0->last, we should be fine.
+	 */
+	pg_size = array_size(max_pages, sizeof(*pages));
+	pages = (void *)bv + bv_size - pg_size;
+
+	while (count && npages < max_pages) {
+		ret = iov_iter_extract_pages(orig, &pages, count,
+					     max_pages - npages, gup_flags,
+					     &offset);
+		if (ret < 0) {
+			pr_err("Couldn't get user pages (rc=%zd)\n", ret);
+			break;
+		}
+
+		if (ret > count) {
+			pr_err("get_pages rc=%zd more than %zu\n", ret, count);
+			break;
+		}
+
+		count -= ret;
+		ret += offset;
+		cur_npages = DIV_ROUND_UP(ret, PAGE_SIZE);
+
+		if (npages + cur_npages > max_pages) {
+			pr_err("Out of bvec array capacity (%u vs %u)\n",
+			       npages + cur_npages, max_pages);
+			break;
+		}
+
+		for (i = 0; i < cur_npages; i++) {
+			len = ret > PAGE_SIZE ? PAGE_SIZE : ret;
+			bv[npages + i].bv_page	 = *pages++;
+			bv[npages + i].bv_offset = offset;
+			bv[npages + i].bv_len	 = len - offset;
+			ret -= len;
+			offset = 0;
+		}
+
+		npages += cur_npages;
+	}
+
+	iov_iter_bvec(new, orig->data_source, bv, npages, orig_len - count);
+	return npages;
+}
+EXPORT_SYMBOL_GPL(netfs_extract_user_iter);
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 4c76ddfb6a67..a45757dd382d 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -296,6 +296,8 @@ void netfs_get_subrequest(struct netfs_io_subrequest *subreq,
 void netfs_put_subrequest(struct netfs_io_subrequest *subreq,
 			  bool was_async, enum netfs_sreq_ref_trace what);
 void netfs_stats_show(struct seq_file *);
+ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len,
+				struct iov_iter *new, unsigned int gup_flags);
 
 /**
  * netfs_inode - Get the netfs inode context from the inode



  parent reply	other threads:[~2023-01-16 23:12 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-16 23:07 [PATCH v6 00/34] iov_iter: Improve page extraction (ref, pin or just list) David Howells
2023-01-16 23:08 ` [PATCH v6 01/34] vfs: Unconditionally set IOCB_WRITE in call_write_iter() David Howells
2023-01-17  7:52   ` Christoph Hellwig
2023-01-18 22:11     ` Al Viro
2023-01-19  5:44       ` Christoph Hellwig
2023-01-19 11:34       ` David Howells
2023-01-19 16:48         ` Christoph Hellwig
2023-01-19 21:14         ` David Howells
2023-01-17  8:28   ` David Howells
2023-01-17  8:44     ` Christoph Hellwig
2023-01-17 11:11   ` David Howells
2023-01-17 11:11     ` Christoph Hellwig
2023-01-18 22:05   ` Al Viro
2023-01-19  5:41     ` Christoph Hellwig
2023-01-19 10:01   ` David Howells
2023-01-19 16:46     ` Christoph Hellwig
2023-01-19 11:06   ` David Howells
2023-01-16 23:08 ` [PATCH v6 02/34] iov_iter: Use IOCB/IOMAP_WRITE/op_is_write rather than iterator direction David Howells
2023-01-17  7:55   ` Christoph Hellwig
2023-01-18 22:18     ` Al Viro
2023-01-16 23:08 ` [PATCH v6 03/34] iov_iter: Pass I/O direction into iov_iter_get_pages*() David Howells
2023-01-17  7:57   ` Christoph Hellwig
2023-01-17  8:07     ` David Hildenbrand
2023-01-17  8:09       ` Christoph Hellwig
2023-01-18 23:03     ` Al Viro
2023-01-19  0:15       ` Al Viro
2023-01-19  2:11         ` Al Viro
2023-01-19  5:47           ` Christoph Hellwig
2023-01-19  5:47         ` Christoph Hellwig
2023-01-17  8:44   ` David Howells
2023-01-17  8:46     ` Christoph Hellwig
2023-01-17  8:47     ` David Hildenbrand
2023-01-16 23:08 ` [PATCH v6 04/34] iov_iter: Remove iov_iter_get_pages2/pages_alloc2() David Howells
2023-01-16 23:08 ` [PATCH v6 05/34] iov_iter: Change the direction macros into an enum David Howells
2023-01-18 23:14   ` Al Viro
2023-01-18 23:17   ` David Howells
2023-01-18 23:19     ` Al Viro
2023-01-18 23:24     ` David Howells
2023-01-16 23:08 ` [PATCH v6 06/34] iov_iter: Use the direction in the iterator functions David Howells
2023-01-17  7:58   ` Christoph Hellwig
2023-01-18 22:28     ` Al Viro
2023-01-19  5:45       ` Christoph Hellwig
2023-01-18 23:15   ` Al Viro
2023-01-16 23:08 ` [PATCH v6 07/34] iov_iter: Add a function to extract a page list from an iterator David Howells
2023-01-17  8:01   ` Christoph Hellwig
2023-01-17  8:19   ` David Howells
2023-01-16 23:08 ` [PATCH v6 08/34] mm: Provide a helper to drop a pin/ref on a page David Howells
2023-01-17  8:02   ` Christoph Hellwig
2023-01-17  8:21   ` David Howells
2023-01-16 23:09 ` [PATCH v6 09/34] bio: Rename BIO_NO_PAGE_REF to BIO_PAGE_REFFED and invert the meaning David Howells
2023-01-17  8:02   ` Christoph Hellwig
2023-01-16 23:09 ` [PATCH v6 10/34] mm, block: Make BIO_PAGE_REFFED/PINNED the same as FOLL_GET/PIN numerically David Howells
2023-01-17  8:03   ` Christoph Hellwig
2023-01-16 23:09 ` [PATCH v6 11/34] iov_iter, block: Make bio structs pin pages rather than ref'ing if appropriate David Howells
2023-01-17  8:07   ` Christoph Hellwig
2023-01-17  8:26   ` David Howells
2023-01-17  8:44     ` Christoph Hellwig
2023-01-16 23:09 ` [PATCH v6 12/34] bio: Fix bio_flagged() so that gcc can better optimise it David Howells
2023-01-17  8:07   ` Christoph Hellwig
2023-01-16 23:09 ` David Howells [this message]
2023-01-16 23:09 ` [PATCH v6 14/34] netfs: Add a function to extract an iterator into a scatterlist David Howells
2023-01-16 23:09 ` [PATCH v6 15/34] af_alg: Pin pages rather than ref'ing if appropriate David Howells
2023-01-16 23:09 ` [PATCH v6 16/34] af_alg: [RFC] Use netfs_extract_iter_to_sg() to create scatterlists David Howells
2023-01-16 23:10 ` [PATCH v6 17/34] scsi: [RFC] Use netfs_extract_iter_to_sg() David Howells
2023-01-16 23:10 ` [PATCH v6 18/34] dio: Pin pages rather than ref'ing if appropriate David Howells
2023-01-19  5:04   ` Al Viro
2023-01-19  5:51     ` Christoph Hellwig
2023-01-16 23:10 ` [PATCH v6 19/34] fuse: " David Howells
2023-01-16 23:10 ` [PATCH v6 20/34] vfs: Make splice use iov_iter_extract_pages() David Howells
2023-01-19  2:31   ` Al Viro
2023-01-16 23:10 ` [PATCH v6 21/34] 9p: Pin pages rather than ref'ing if appropriate David Howells
2023-01-19  2:52   ` Al Viro
2023-01-19 16:44   ` David Howells
2023-01-19 16:51     ` Christoph Hellwig
2023-01-16 23:10 ` [PATCH v6 22/34] nfs: " David Howells
2023-01-16 23:10 ` [PATCH v6 23/34] cifs: Implement splice_read to pass down ITER_BVEC not ITER_PIPE David Howells
2023-01-16 23:10 ` [PATCH v6 24/34] cifs: Add a function to build an RDMA SGE list from an iterator David Howells
2023-01-16 23:11 ` [PATCH v6 25/34] cifs: Add a function to Hash the contents of " David Howells
2023-01-16 23:11 ` [PATCH v6 26/34] cifs: Add some helper functions David Howells
2023-01-16 23:11 ` [PATCH v6 27/34] cifs: Add a function to read into an iter from a socket David Howells
2023-01-16 23:11 ` [PATCH v6 28/34] cifs: Change the I/O paths to use an iterator rather than a page list David Howells
2023-01-16 23:11 ` [PATCH v6 29/34] cifs: Build the RDMA SGE list directly from an iterator David Howells
2023-01-16 23:11 ` [PATCH v6 30/34] cifs: Remove unused code David Howells
2023-01-16 23:11 ` [PATCH v6 31/34] cifs: Fix problem with encrypted RDMA data read David Howells
2023-01-19 16:25   ` Stefan Metzmacher
2023-01-16 23:11 ` [PATCH v6 32/34] cifs: DIO to/from KVEC-type iterators should now work David Howells
2023-01-16 23:12 ` [PATCH v6 33/34] net: [RFC][WIP] Mark each skb_frags as to how they should be cleaned up David Howells
2023-01-16 23:12 ` [PATCH v6 34/34] net: [RFC][WIP] Make __zerocopy_sg_from_iter() correctly pin or leave pages unref'd David Howells
2023-01-17  7:46 ` [PATCH v6 00/34] iov_iter: Improve page extraction (ref, pin or just list) Christoph Hellwig
2023-01-18 14:00 ` [PATCH v6 09/34] bio: Rename BIO_NO_PAGE_REF to BIO_PAGE_REFFED and invert the meaning David Howells
2023-01-18 14:09   ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=167391057444.2311931.12321968641492694017.stgit@warthog.procyon.org.uk \
    --to=dhowells@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=jlayton@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-cachefs@redhat.com \
    --cc=linux-cifs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=logang@deltatee.com \
    --cc=nspmangalore@gmail.com \
    --cc=rohiths.msft@gmail.com \
    --cc=sfrench@samba.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.