linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>,
	Matthew Wilcox <willy@infradead.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [RFC] what to do with IOCB_DSYNC?
Date: Sun, 22 May 2022 19:50:55 -0600	[thread overview]
Message-ID: <9c3a6ad4-cdb5-8e0d-9b01-c2825ea891ad@kernel.dk> (raw)
In-Reply-To: <22de2da8-7db0-68c5-2c85-d752a67f9604@kernel.dk>

[-- Attachment #1: Type: text/plain, Size: 1432 bytes --]

On 5/22/22 7:28 PM, Jens Axboe wrote:
> On 5/22/22 7:22 PM, Jens Axboe wrote:
>> On 5/22/22 6:42 PM, Al Viro wrote:
>>> On Sun, May 22, 2022 at 02:03:35PM -0600, Jens Axboe wrote:
>>>
>>>> Right, I'm saying it's not _immediately_ clear which cases are what when
>>>> reading the code.
>>>>
>>>>> up a while ago.  And no, turning that into indirect calls ended up with
>>>>> arseloads of overhead, more's the pity...
>>>>
>>>> It's a shame, since indirect calls make for nicer code, but it's always
>>>> been slower and these days even more so.
>>>>
>>>>> Anyway, at the moment I have something that builds; hadn't tried to
>>>>> boot it yet.
>>>>
>>>> Nice!
>>>
>>> Boots and survives LTP and xfstests...  Current variant is in
>>> vfs.git#work.iov_iter (head should be at 27fa77a9829c).  I have *not*
>>> looked into the code generation in primitives; the likely/unlikely on
>>> those cascades of ifs need rethinking.
>>
>> I noticed too. Haven't fiddled much in iov_iter.c, but for uio.h I had
>> the below. iov_iter.c is a worse "offender" though, with 53 unlikely and
>> 22 likely annotations...
> 
> Here it is...

Few more, most notably making sure that dio dirties reads even if they
are not of the iovec type.

Last two just add a helper for import_ubuf() and then adopts it for
io_uring send/recv which is also a hot path. The single range read/write
can be converted too, but that needs a bit more work...

-- 
Jens Axboe

[-- Attachment #2: 0004-io_uring-use-import_ubuf-for-send-recv.patch --]
[-- Type: text/x-patch, Size: 1713 bytes --]

From d05b59dc10e6c7c74da439561831f744196eb412 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@kernel.dk>
Date: Sun, 22 May 2022 19:49:36 -0600
Subject: [PATCH 4/4] io_uring: use import_ubuf() for send/recv

We just have the single buffer, no point in using ITER_IOVEC when we
can be using the more efficient ITER_UBUF instead.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/io_uring.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 73f53c208df2..91255f2326e5 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5272,7 +5272,6 @@ static int io_send(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_sr_msg *sr = &req->sr_msg;
 	struct msghdr msg;
-	struct iovec iov;
 	struct socket *sock;
 	unsigned flags;
 	int min_ret = 0;
@@ -5282,7 +5281,7 @@ static int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	if (unlikely(!sock))
 		return -ENOTSOCK;
 
-	ret = import_single_range(WRITE, sr->buf, sr->len, &iov, &msg.msg_iter);
+	ret = import_ubuf(WRITE, sr->buf, sr->len, &msg.msg_iter);
 	if (unlikely(ret))
 		return ret;
 
@@ -5521,7 +5520,6 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 	struct msghdr msg;
 	void __user *buf = sr->buf;
 	struct socket *sock;
-	struct iovec iov;
 	unsigned flags;
 	int ret, min_ret = 0;
 	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
@@ -5537,7 +5535,7 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 		buf = u64_to_user_ptr(kbuf->addr);
 	}
 
-	ret = import_single_range(READ, buf, sr->len, &iov, &msg.msg_iter);
+	ret = import_ubuf(READ, buf, sr->len, &msg.msg_iter);
 	if (unlikely(ret))
 		goto out_free;
 
-- 
2.35.1


[-- Attachment #3: 0003-iov-add-import_ubuf.patch --]
[-- Type: text/x-patch, Size: 1589 bytes --]

From 5786b1db0bf43313092e61797f639290079b2709 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@kernel.dk>
Date: Sun, 22 May 2022 19:47:29 -0600
Subject: [PATCH 3/4] iov: add import_ubuf()

Like import_single_range(), but for ITER_UBUF.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/uio.h |  1 +
 lib/iov_iter.c      | 11 +++++++++++
 2 files changed, 12 insertions(+)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index d10c19a650a8..4e473ea90b20 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -331,6 +331,7 @@ ssize_t __import_iovec(int type, const struct iovec __user *uvec,
 		 struct iov_iter *i, bool compat);
 int import_single_range(int type, void __user *buf, size_t len,
 		 struct iovec *iov, struct iov_iter *i);
+int import_ubuf(int type, void __user *buf, size_t len, struct iov_iter *i);
 
 static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
 			void __user *buf, size_t count)
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index aa41394174eb..348412757335 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -2167,6 +2167,17 @@ int import_single_range(int rw, void __user *buf, size_t len,
 }
 EXPORT_SYMBOL(import_single_range);
 
+int import_ubuf(int rw, void __user *buf, size_t len, struct iov_iter *i)
+{
+	if (len > MAX_RW_COUNT)
+		len = MAX_RW_COUNT;
+	if (unlikely(!access_ok(buf, len)))
+		return -EFAULT;
+
+	iov_iter_ubuf(i, rw, buf, len);
+	return 0;
+}
+
 /**
  * iov_iter_restore() - Restore a &struct iov_iter to the same state as when
  *     iov_iter_save_state() was called.
-- 
2.35.1


[-- Attachment #4: 0002-Switch-iter_is_iovec-checks-for-user-memory-with-ite.patch --]
[-- Type: text/x-patch, Size: 2714 bytes --]

From cc8bfd64c1a8d754230809a56bf23bcadeabd63f Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@kernel.dk>
Date: Sun, 22 May 2022 19:37:33 -0600
Subject: [PATCH 2/4] Switch iter_is_iovec() checks for user memory with
 iter_is_user()

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/fops.c         | 7 +++----
 fs/direct-io.c       | 2 +-
 fs/iomap/direct-io.c | 2 +-
 fs/nfs/direct.c      | 2 +-
 4 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index b1f7c4111458..15788a36983a 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -77,8 +77,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
 
 	if (iov_iter_rw(iter) == READ) {
 		bio_init(&bio, bdev, vecs, nr_pages, REQ_OP_READ);
-		if (iter_is_iovec(iter))
-			should_dirty = true;
+		should_dirty = iter_is_user(iter);
 	} else {
 		bio_init(&bio, bdev, vecs, nr_pages, dio_bio_write_op(iocb));
 	}
@@ -217,7 +216,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 	}
 
 	dio->size = 0;
-	if (is_read && iter_is_iovec(iter))
+	if (is_read && iter_is_user(iter))
 		dio->flags |= DIO_SHOULD_DIRTY;
 
 	blk_start_plug(&plug);
@@ -346,7 +345,7 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 	dio->size = bio->bi_iter.bi_size;
 
 	if (is_read) {
-		if (iter_is_iovec(iter)) {
+		if (iter_is_user(iter)) {
 			dio->flags |= DIO_SHOULD_DIRTY;
 			bio_set_pages_dirty(bio);
 		}
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 56dc5a7ad119..5cfa53e0783f 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -1246,7 +1246,7 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 	spin_lock_init(&dio->bio_lock);
 	dio->refcount = 1;
 
-	dio->should_dirty = iter_is_iovec(iter) && iov_iter_rw(iter) == READ;
+	dio->should_dirty = iter_is_user(iter) && iov_iter_rw(iter) == READ;
 	sdio.iter = iter;
 	sdio.final_block_in_request = end >> blkbits;
 
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 33b94e33189a..7d212bde101a 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -523,7 +523,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 			iomi.flags |= IOMAP_NOWAIT;
 		}
 
-		if (iter_is_iovec(iter))
+		if (iter_is_user(iter))
 			dio->flags |= IOMAP_DIO_DIRTY;
 	} else {
 		iomi.flags |= IOMAP_WRITE;
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 11c566d8769f..b6125d5a25c6 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -481,7 +481,7 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter,
 	if (!is_sync_kiocb(iocb))
 		dreq->iocb = iocb;
 
-	if (iter_is_iovec(iter))
+	if (iter_is_user(iter))
 		dreq->flags = NFS_ODIRECT_SHOULD_DIRTY;
 
 	if (!swap)
-- 
2.35.1


[-- Attachment #5: 0001-iov-add-iter_is_user.patch --]
[-- Type: text/x-patch, Size: 2015 bytes --]

From 992e03e0bd87801c35f5c6d8855396fc481387b8 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@kernel.dk>
Date: Sun, 22 May 2022 19:34:45 -0600
Subject: [PATCH 1/4] iov: add iter_is_user()

Returns true if the iter is holding user memory. Use that for the
might_fault() checks.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/uio.h | 5 +++++
 lib/iov_iter.c      | 6 +++---
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 52baa3c89505..d10c19a650a8 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -112,6 +112,11 @@ static inline unsigned char iov_iter_rw(const struct iov_iter *i)
 	return i->data_source ? WRITE : READ;
 }
 
+static inline bool iter_is_user(const struct iov_iter *i)
+{
+	return iter_is_iovec(i) || iter_is_ubuf(i);
+}
+
 /*
  * Total number of bytes covered by an iovec.
  *
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 08296769f5c6..aa41394174eb 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -767,7 +767,7 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 {
 	if (unlikely(iov_iter_is_pipe(i)))
 		return copy_pipe_to_iter(addr, bytes, i);
-	if (iter_is_iovec(i) || iter_is_ubuf(i))
+	if (iter_is_user(i))
 		might_fault();
 	iterate_and_advance(i, bytes, base, len, off,
 		copyout(base, addr + off, len),
@@ -849,7 +849,7 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 {
 	if (unlikely(iov_iter_is_pipe(i)))
 		return copy_mc_pipe_to_iter(addr, bytes, i);
-	if (iter_is_iovec(i) || iter_is_ubuf(i))
+	if (iter_is_user(i))
 		might_fault();
 	__iterate_and_advance(i, bytes, base, len, off,
 		copyout_mc(base, addr + off, len),
@@ -867,7 +867,7 @@ size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
 		WARN_ON(1);
 		return 0;
 	}
-	if (iter_is_iovec(i) || iter_is_ubuf(i))
+	if (iter_is_user(i))
 		might_fault();
 	iterate_and_advance(i, bytes, base, len, off,
 		copyin(addr + off, base, len),
-- 
2.35.1


  reply	other threads:[~2022-05-23  1:51 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-21  0:46 [RFC] what to do with IOCB_DSYNC? Al Viro
2021-06-21 13:59 ` Christoph Hellwig
2021-06-21 14:03   ` Matthew Wilcox
2021-06-21 14:09     ` Christoph Hellwig
2021-06-21 14:16       ` Matthew Wilcox
2021-06-21 14:22         ` Christoph Hellwig
2021-06-21 14:32           ` Al Viro
2021-06-21 14:35             ` Christoph Hellwig
2021-06-21 15:22               ` Jens Axboe
2022-05-21 17:48               ` Al Viro
2022-05-21 19:03                 ` Jens Axboe
2022-05-21 22:14                   ` Jens Axboe
2022-05-22  7:45                     ` Christoph Hellwig
2022-05-22 10:23                       ` Matthew Wilcox
2022-05-22 10:36                         ` Al Viro
2022-05-22 11:15                           ` Matthew Wilcox
2022-05-22 11:45                             ` Christoph Hellwig
2022-05-22 12:39                               ` Jens Axboe
2022-05-22 12:48                                 ` Al Viro
2022-05-22 13:02                                   ` Jens Axboe
2022-05-22 13:07                                     ` Al Viro
2022-05-22 13:09                                       ` Jens Axboe
2022-05-22 18:06                                         ` Jens Axboe
2022-05-22 18:25                                           ` Al Viro
2022-05-22 18:29                                             ` Jens Axboe
2022-05-22 18:39                                               ` Al Viro
2022-05-22 18:48                                                 ` Jens Axboe
2022-05-22 19:04                                                   ` Al Viro
2022-05-22 20:03                                                     ` Jens Axboe
2022-05-23  0:42                                                       ` Al Viro
2022-05-23  1:22                                                         ` Jens Axboe
2022-05-23  1:28                                                           ` Jens Axboe
2022-05-23  1:50                                                             ` Jens Axboe [this message]
2022-05-23  2:43                                                               ` Jens Axboe
2022-05-23 14:22                                                                 ` Al Viro
2022-05-23 14:34                                                                   ` Jens Axboe
2022-05-23 14:47                                                                     ` Al Viro
2022-05-23 15:12                                                                       ` Jens Axboe
2022-05-23 15:44                                                                         ` Jens Axboe
2022-05-23 15:49                                                                           ` Matthew Wilcox
2022-05-23 15:55                                                                             ` Jens Axboe
2022-05-23 16:03                                                                               ` Jens Axboe
2022-05-26 14:46                                                                                 ` Jason A. Donenfeld
2022-05-27 10:09                                                                                   ` Ingo Molnar
2022-05-27 10:15                                                                                     ` Jason A. Donenfeld
2022-05-27 14:45                                                                                       ` Samuel Neves
2022-05-27 10:25                                                                                     ` Ingo Molnar
2022-05-27 10:36                                                                                       ` Borislav Petkov
2022-05-28 20:54                                                                                         ` Sedat Dilek
2022-05-28 20:38                                                                                       ` Sedat Dilek
2022-05-28 20:39                                                                                         ` Sedat Dilek
2022-05-23 16:15                                                                         ` Al Viro
2022-05-25 14:34                                                         ` Matthew Wilcox
2022-05-26 23:19                                                     ` Al Viro
2022-05-27 14:51                                                       ` Jens Axboe
2022-05-22 12:21                             ` Al Viro
2022-05-22  7:43                 ` Christoph Hellwig
2022-05-22 12:41                   ` Al Viro
2022-05-22 12:51                     ` Christoph Hellwig
2021-06-21 14:22       ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9c3a6ad4-cdb5-8e0d-9b01-c2825ea891ad@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).