All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	David Sterba <dsterba@suse.com>,
	Miklos Szeredi <miklos@szeredi.hu>,
	Anton Altaparmakov <anton@tuxera.com>,
	David Howells <dhowells@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	Pavel Begunkov <asml.silence@gmail.com>
Subject: Re: [RFC][PATCHSET] iov_iter work
Date: Sun, 6 Jun 2021 23:29:59 +0000	[thread overview]
Message-ID: <YL1ad7+I30xnCto8@zeniv-ca.linux.org.uk> (raw)
In-Reply-To: <CAHk-=wj6ZiTgqbeCPtzP+5tgHjur6Amag66YWub_2DkGpP9h-Q@mail.gmail.com>

On Sun, Jun 06, 2021 at 03:05:49PM -0700, Linus Torvalds wrote:

> The end result certainly looks cleaner than before, although I can't
> say that it's pretty. It's still haitry and grotty, but at least
> _some_ of the hairiness has been removed (yay for
> _copy_from_iter_full() being *much* simpler now, and
> iterate_all_kinds() being gone).
> 
> I also have to admit to despising what iov_iter_init() ends up looking
> like. Not because I think that using those initializer assignments to
> set it is wrong, but simply because the initializers are basically
> identical except for "iter_type".

TBH, I wonder if we still need that these days.  That is to say,
do we even call iov_iter_init() when uaccess_kernel() is true?
We certainly used to do that, but...  We have

* copy_from_kernel_nofault(); calls __copy_from_user_inatomic() under
KERNEL_DS and pagefault_disable(), and that would better *not* fuck with
iov_iter inside.

* copy_to_kernel_nofault(); similar, with s/_from_/_to_/.

* strncpy_from_kernel_nofault(); ditto, with __get_user() instead of
__copy_..._user_inatomic().

* unaligned trap handling in kernel mode on sh and itanic

* arm dumping insns in oops handler

* 3 turds in arm oabi compat: sys_semtimedop_time32(), sys_epoll_wait()
and sys_fcntl64() (for [GS]ETLK... fcntls) called under KERNEL_DS.
The last one should simply call fcntl_[gs]etlk() and be done with
that, not that it was going to play with any iov_iter anyway.
epoll_wait() is currently not using iov_iter (it does use __put_user()
and not in a pretty way, but that's a separate story).  And
for semtimedop_time32(), we do not have any iov_iter in sight *and*
I would rather get rid of KERNEL_DS there completely - all it takes
is a variant of do_semtimedop() with tsops already copied into the
kernel.  In any case, none of those do iov_iter_init().

	I have *not* checked if any kernel threads might be playing
with iov_iter_init() outside of kthread_use_mm(); some might, but
IMO any such place would be a good candidate for conversion to
explicit iov_iter_kvec()...

	BTW, speaking of initializers...  Pavel, could you check if
the following breaks anything?  Unless I'm misreading __io_import_fixed(),
looks like that's what it's trying to do...

diff --git a/fs/io_uring.c b/fs/io_uring.c
index f46acbbeed57..9bd2da9a4c3d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2773,57 +2773,14 @@ static int __io_import_fixed(struct io_kiocb *req, int rw, struct iov_iter *iter
 {
 	size_t len = req->rw.len;
 	u64 buf_end, buf_addr = req->rw.addr;
-	size_t offset;
 
 	if (unlikely(check_add_overflow(buf_addr, (u64)len, &buf_end)))
 		return -EFAULT;
 	/* not inside the mapped region */
 	if (unlikely(buf_addr < imu->ubuf || buf_end > imu->ubuf_end))
 		return -EFAULT;
-
-	/*
-	 * May not be a start of buffer, set size appropriately
-	 * and advance us to the beginning.
-	 */
-	offset = buf_addr - imu->ubuf;
-	iov_iter_bvec(iter, rw, imu->bvec, imu->nr_bvecs, offset + len);
-
-	if (offset) {
-		/*
-		 * Don't use iov_iter_advance() here, as it's really slow for
-		 * using the latter parts of a big fixed buffer - it iterates
-		 * over each segment manually. We can cheat a bit here, because
-		 * we know that:
-		 *
-		 * 1) it's a BVEC iter, we set it up
-		 * 2) all bvecs are PAGE_SIZE in size, except potentially the
-		 *    first and last bvec
-		 *
-		 * So just find our index, and adjust the iterator afterwards.
-		 * If the offset is within the first bvec (or the whole first
-		 * bvec, just use iov_iter_advance(). This makes it easier
-		 * since we can just skip the first segment, which may not
-		 * be PAGE_SIZE aligned.
-		 */
-		const struct bio_vec *bvec = imu->bvec;
-
-		if (offset <= bvec->bv_len) {
-			iov_iter_advance(iter, offset);
-		} else {
-			unsigned long seg_skip;
-
-			/* skip first vec */
-			offset -= bvec->bv_len;
-			seg_skip = 1 + (offset >> PAGE_SHIFT);
-
-			iter->bvec = bvec + seg_skip;
-			iter->nr_segs -= seg_skip;
-			iter->count -= bvec->bv_len + offset;
-			iter->iov_offset = offset & ~PAGE_MASK;
-		}
-	}
-
-	return 0;
+	return import_pagevec(rw, buf_addr, len, imu->ubuf,
+			      imu->nr_bvecs, imu->bvec, iter);
 }
 
 static int io_import_fixed(struct io_kiocb *req, int rw, struct iov_iter *iter)
diff --git a/include/linux/uio.h b/include/linux/uio.h
index fd88d9911dad..f6291e981d07 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -299,5 +299,8 @@ ssize_t __import_iovec(int type, const struct iovec __user *uvec,
 		 struct iov_iter *i, bool compat);
 int import_single_range(int type, void __user *buf, size_t len,
 		 struct iovec *iov, struct iov_iter *i);
+int import_pagevec(int rw, unsigned long from, size_t len,
+		unsigned long base, unsigned nr_pages,
+		struct bio_vec *bvec, struct iov_iter *i);
 
 #endif
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 11b39bd1d1ab..4a771fcb529b 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1982,3 +1982,21 @@ int import_single_range(int rw, void __user *buf, size_t len,
 	return 0;
 }
 EXPORT_SYMBOL(import_single_range);
+
+int import_pagevec(int rw, unsigned long from, size_t len,
+		unsigned long base, unsigned nr_pages,
+		struct bio_vec *bvec, struct iov_iter *i)
+
+{
+	unsigned long skip_pages = (from >> PAGE_SHIFT) - (base >> PAGE_SHIFT);
+
+	*i = (struct iov_iter){
+		.iter_type = ITER_BVEC,
+		.data_source = rw,
+		.bvec = bvec + skip_pages,
+		.nr_segs = nr_pages - skip_pages,
+		.iov_offset = skip_pages ? from & ~PAGE_MASK : from - base,
+		.count = len
+	};
+	return 0;
+}
-- 
2.11.0


  parent reply	other threads:[~2021-06-06 23:30 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-06 19:07 [RFC][PATCHSET] iov_iter work Al Viro
2021-06-06 19:10 ` [RFC PATCH 01/37] ntfs_copy_from_user_iter(): don't bother with copying iov_iter Al Viro
2021-06-06 19:10   ` [RFC PATCH 02/37] generic_perform_write()/iomap_write_actor(): saner logics for short copy Al Viro
2021-06-06 19:10   ` [RFC PATCH 03/37] fuse_fill_write_pages(): don't bother with iov_iter_single_seg_count() Al Viro
2021-06-06 19:10   ` [RFC PATCH 04/37] iov_iter: Remove iov_iter_for_each_range() Al Viro
2021-06-06 19:10   ` [RFC PATCH 05/37] teach copy_page_to_iter() to handle compound pages Al Viro
2021-06-06 19:10   ` [RFC PATCH 06/37] copy_page_to_iter(): fix ITER_DISCARD case Al Viro
2021-06-06 19:10   ` [RFC PATCH 07/37] [xarray] iov_iter_fault_in_readable() should do nothing in xarray case Al Viro
2021-06-06 19:10   ` [RFC PATCH 08/37] iov_iter_advance(): use consistent semantics for move past the end Al Viro
2021-06-06 19:10   ` [RFC PATCH 09/37] iov_iter: switch ..._full() variants of primitives to use of iov_iter_revert() Al Viro
2021-06-06 19:10   ` [RFC PATCH 10/37] iov_iter: reorder handling of flavours in primitives Al Viro
2021-06-06 19:10   ` [RFC PATCH 11/37] iov_iter_advance(): don't modify ->iov_offset for ITER_DISCARD Al Viro
2021-06-06 19:10   ` [RFC PATCH 12/37] iov_iter: separate direction from flavour Al Viro
2021-06-06 19:10   ` [RFC PATCH 13/37] iov_iter: optimize iov_iter_advance() for iovec and kvec Al Viro
2021-06-06 19:10   ` [RFC PATCH 14/37] sanitize iov_iter_fault_in_readable() Al Viro
2021-06-06 19:10   ` [RFC PATCH 15/37] iov_iter_alignment(): don't bother with iterate_all_kinds() Al Viro
2021-06-06 19:10   ` [RFC PATCH 16/37] iov_iter_gap_alignment(): get rid of iterate_all_kinds() Al Viro
2021-06-09 13:01     ` Qian Cai
2021-06-09 18:06       ` Al Viro
2021-06-06 19:10   ` [RFC PATCH 17/37] get rid of iterate_all_kinds() in iov_iter_get_pages()/iov_iter_get_pages_alloc() Al Viro
2021-06-06 19:10   ` [RFC PATCH 18/37] iov_iter_npages(): don't bother with iterate_all_kinds() Al Viro
2021-06-06 19:10   ` [RFC PATCH 19/37] [xarray] iov_iter_npages(): just use DIV_ROUND_UP() Al Viro
2021-06-06 19:10   ` [RFC PATCH 20/37] iov_iter: replace iov_iter_copy_from_user_atomic() with iterator-advancing variant Al Viro
2021-06-06 19:10   ` [RFC PATCH 21/37] csum_and_copy_to_iter(): massage into form closer to csum_and_copy_from_iter() Al Viro
2021-06-06 19:10   ` [RFC PATCH 22/37] iterate_and_advance(): get rid of magic in case when n is 0 Al Viro
2021-06-06 19:10   ` [RFC PATCH 23/37] iov_iter: massage iterate_iovec and iterate_kvec to logics similar to iterate_bvec Al Viro
2021-06-06 19:10   ` [RFC PATCH 24/37] iov_iter: unify iterate_iovec and iterate_kvec Al Viro
2021-06-06 19:10   ` [RFC PATCH 25/37] iterate_bvec(): expand bvec.h macro forest, massage a bit Al Viro
2021-06-06 19:10   ` [RFC PATCH 26/37] iov_iter: teach iterate_{bvec,xarray}() about possible short copies Al Viro
2021-06-06 19:10   ` [RFC PATCH 27/37] iov_iter: get rid of separate bvec and xarray callbacks Al Viro
2021-06-06 19:10   ` [RFC PATCH 28/37] iov_iter: make the amount already copied available to iterator callbacks Al Viro
2021-06-06 19:10   ` [RFC PATCH 29/37] iov_iter: make iterator callbacks use base and len instead of iovec Al Viro
2021-06-06 19:10   ` [RFC PATCH 30/37] pull handling of ->iov_offset into iterate_{iovec,bvec,xarray} Al Viro
2021-06-06 19:10   ` [RFC PATCH 31/37] iterate_xarray(): only of the first iteration we might get offset != 0 Al Viro
2021-06-06 19:10   ` [RFC PATCH 32/37] copy_page_to_iter(): don't bother with kmap_atomic() for bvec/kvec cases Al Viro
2021-06-06 19:10   ` [RFC PATCH 33/37] copy_page_from_iter(): don't need kmap_atomic() for kvec/bvec cases Al Viro
2021-06-06 19:10   ` [RFC PATCH 34/37] iov_iter: clean csum_and_copy_...() primitives up a bit Al Viro
2021-06-06 19:10   ` [RFC PATCH 35/37] pipe_zero(): we don't need no stinkin' kmap_atomic() Al Viro
2021-06-06 19:10   ` [RFC PATCH 36/37] clean up copy_mc_pipe_to_iter() Al Viro
2021-06-06 19:10   ` [RFC PATCH 37/37] csum_and_copy_to_pipe_iter(): leave handling of csum_state to caller Al Viro
2021-06-06 22:05 ` [RFC][PATCHSET] iov_iter work Linus Torvalds
2021-06-06 22:46   ` Linus Torvalds
2021-06-07  9:28     ` Christoph Hellwig
2021-06-07 14:43       ` Al Viro
2021-06-07 15:59         ` Christoph Hellwig
2021-06-07 21:07           ` Al Viro
2021-06-07 22:01             ` Linus Torvalds
2021-06-07 23:35               ` Linus Torvalds
2021-06-08  5:25                 ` Christoph Hellwig
2021-06-08 11:27                 ` Al Viro
2021-06-06 23:29   ` Al Viro [this message]
2021-06-07 10:38     ` Pavel Begunkov
2021-06-08 14:43 ` David Laight
2021-06-10 14:29 ` Qian Cai
2021-06-10 15:35   ` Al Viro
2021-06-10 15:48     ` Al Viro
2021-06-10 19:08     ` Qian Cai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YL1ad7+I30xnCto8@zeniv-ca.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=anton@tuxera.com \
    --cc=asml.silence@gmail.com \
    --cc=dhowells@redhat.com \
    --cc=dsterba@suse.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=torvalds@linux-foundation.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.