From: Jens Axboe <axboe@kernel.dk>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>,
Matthew Wilcox <willy@infradead.org>,
linux-fsdevel@vger.kernel.org
Subject: Re: [RFC] what to do with IOCB_DSYNC?
Date: Sun, 22 May 2022 19:50:55 -0600 [thread overview]
Message-ID: <9c3a6ad4-cdb5-8e0d-9b01-c2825ea891ad@kernel.dk> (raw)
In-Reply-To: <22de2da8-7db0-68c5-2c85-d752a67f9604@kernel.dk>
[-- Attachment #1: Type: text/plain, Size: 1432 bytes --]
On 5/22/22 7:28 PM, Jens Axboe wrote:
> On 5/22/22 7:22 PM, Jens Axboe wrote:
>> On 5/22/22 6:42 PM, Al Viro wrote:
>>> On Sun, May 22, 2022 at 02:03:35PM -0600, Jens Axboe wrote:
>>>
>>>> Right, I'm saying it's not _immediately_ clear which cases are what when
>>>> reading the code.
>>>>
>>>>> up a while ago. And no, turning that into indirect calls ended up with
>>>>> arseloads of overhead, more's the pity...
>>>>
>>>> It's a shame, since indirect calls make for nicer code, but it's always
>>>> been slower and these days even more so.
>>>>
>>>>> Anyway, at the moment I have something that builds; hadn't tried to
>>>>> boot it yet.
>>>>
>>>> Nice!
>>>
>>> Boots and survives LTP and xfstests... Current variant is in
>>> vfs.git#work.iov_iter (head should be at 27fa77a9829c). I have *not*
>>> looked into the code generation in primitives; the likely/unlikely on
>>> those cascades of ifs need rethinking.
>>
>> I noticed too. Haven't fiddled much in iov_iter.c, but for uio.h I had
>> the below. iov_iter.c is a worse "offender" though, with 53 unlikely and
>> 22 likely annotations...
>
> Here it is...
Few more, most notably making sure that dio dirties reads even if they
are not of the iovec type.
Last two just add a helper for import_ubuf() and then adopts it for
io_uring send/recv which is also a hot path. The single range read/write
can be converted too, but that needs a bit more work...
--
Jens Axboe
[-- Attachment #2: 0004-io_uring-use-import_ubuf-for-send-recv.patch --]
[-- Type: text/x-patch, Size: 1713 bytes --]
From d05b59dc10e6c7c74da439561831f744196eb412 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@kernel.dk>
Date: Sun, 22 May 2022 19:49:36 -0600
Subject: [PATCH 4/4] io_uring: use import_ubuf() for send/recv
We just have the single buffer, no point in using ITER_IOVEC when we
can be using the more efficient ITER_UBUF instead.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
fs/io_uring.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 73f53c208df2..91255f2326e5 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5272,7 +5272,6 @@ static int io_send(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_sr_msg *sr = &req->sr_msg;
struct msghdr msg;
- struct iovec iov;
struct socket *sock;
unsigned flags;
int min_ret = 0;
@@ -5282,7 +5281,7 @@ static int io_send(struct io_kiocb *req, unsigned int issue_flags)
if (unlikely(!sock))
return -ENOTSOCK;
- ret = import_single_range(WRITE, sr->buf, sr->len, &iov, &msg.msg_iter);
+ ret = import_ubuf(WRITE, sr->buf, sr->len, &msg.msg_iter);
if (unlikely(ret))
return ret;
@@ -5521,7 +5520,6 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
struct msghdr msg;
void __user *buf = sr->buf;
struct socket *sock;
- struct iovec iov;
unsigned flags;
int ret, min_ret = 0;
bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
@@ -5537,7 +5535,7 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
buf = u64_to_user_ptr(kbuf->addr);
}
- ret = import_single_range(READ, buf, sr->len, &iov, &msg.msg_iter);
+ ret = import_ubuf(READ, buf, sr->len, &msg.msg_iter);
if (unlikely(ret))
goto out_free;
--
2.35.1
[-- Attachment #3: 0003-iov-add-import_ubuf.patch --]
[-- Type: text/x-patch, Size: 1589 bytes --]
From 5786b1db0bf43313092e61797f639290079b2709 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@kernel.dk>
Date: Sun, 22 May 2022 19:47:29 -0600
Subject: [PATCH 3/4] iov: add import_ubuf()
Like import_single_range(), but for ITER_UBUF.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
include/linux/uio.h | 1 +
lib/iov_iter.c | 11 +++++++++++
2 files changed, 12 insertions(+)
diff --git a/include/linux/uio.h b/include/linux/uio.h
index d10c19a650a8..4e473ea90b20 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -331,6 +331,7 @@ ssize_t __import_iovec(int type, const struct iovec __user *uvec,
struct iov_iter *i, bool compat);
int import_single_range(int type, void __user *buf, size_t len,
struct iovec *iov, struct iov_iter *i);
+int import_ubuf(int type, void __user *buf, size_t len, struct iov_iter *i);
static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
void __user *buf, size_t count)
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index aa41394174eb..348412757335 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -2167,6 +2167,17 @@ int import_single_range(int rw, void __user *buf, size_t len,
}
EXPORT_SYMBOL(import_single_range);
+int import_ubuf(int rw, void __user *buf, size_t len, struct iov_iter *i)
+{
+ if (len > MAX_RW_COUNT)
+ len = MAX_RW_COUNT;
+ if (unlikely(!access_ok(buf, len)))
+ return -EFAULT;
+
+ iov_iter_ubuf(i, rw, buf, len);
+ return 0;
+}
+
/**
* iov_iter_restore() - Restore a &struct iov_iter to the same state as when
* iov_iter_save_state() was called.
--
2.35.1
[-- Attachment #4: 0002-Switch-iter_is_iovec-checks-for-user-memory-with-ite.patch --]
[-- Type: text/x-patch, Size: 2714 bytes --]
From cc8bfd64c1a8d754230809a56bf23bcadeabd63f Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@kernel.dk>
Date: Sun, 22 May 2022 19:37:33 -0600
Subject: [PATCH 2/4] Switch iter_is_iovec() checks for user memory with
iter_is_user()
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
block/fops.c | 7 +++----
fs/direct-io.c | 2 +-
fs/iomap/direct-io.c | 2 +-
fs/nfs/direct.c | 2 +-
4 files changed, 6 insertions(+), 7 deletions(-)
diff --git a/block/fops.c b/block/fops.c
index b1f7c4111458..15788a36983a 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -77,8 +77,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
if (iov_iter_rw(iter) == READ) {
bio_init(&bio, bdev, vecs, nr_pages, REQ_OP_READ);
- if (iter_is_iovec(iter))
- should_dirty = true;
+ should_dirty = iter_is_user(iter);
} else {
bio_init(&bio, bdev, vecs, nr_pages, dio_bio_write_op(iocb));
}
@@ -217,7 +216,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
}
dio->size = 0;
- if (is_read && iter_is_iovec(iter))
+ if (is_read && iter_is_user(iter))
dio->flags |= DIO_SHOULD_DIRTY;
blk_start_plug(&plug);
@@ -346,7 +345,7 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
dio->size = bio->bi_iter.bi_size;
if (is_read) {
- if (iter_is_iovec(iter)) {
+ if (iter_is_user(iter)) {
dio->flags |= DIO_SHOULD_DIRTY;
bio_set_pages_dirty(bio);
}
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 56dc5a7ad119..5cfa53e0783f 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -1246,7 +1246,7 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
spin_lock_init(&dio->bio_lock);
dio->refcount = 1;
- dio->should_dirty = iter_is_iovec(iter) && iov_iter_rw(iter) == READ;
+ dio->should_dirty = iter_is_user(iter) && iov_iter_rw(iter) == READ;
sdio.iter = iter;
sdio.final_block_in_request = end >> blkbits;
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 33b94e33189a..7d212bde101a 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -523,7 +523,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
iomi.flags |= IOMAP_NOWAIT;
}
- if (iter_is_iovec(iter))
+ if (iter_is_user(iter))
dio->flags |= IOMAP_DIO_DIRTY;
} else {
iomi.flags |= IOMAP_WRITE;
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 11c566d8769f..b6125d5a25c6 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -481,7 +481,7 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter,
if (!is_sync_kiocb(iocb))
dreq->iocb = iocb;
- if (iter_is_iovec(iter))
+ if (iter_is_user(iter))
dreq->flags = NFS_ODIRECT_SHOULD_DIRTY;
if (!swap)
--
2.35.1
[-- Attachment #5: 0001-iov-add-iter_is_user.patch --]
[-- Type: text/x-patch, Size: 2015 bytes --]
From 992e03e0bd87801c35f5c6d8855396fc481387b8 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@kernel.dk>
Date: Sun, 22 May 2022 19:34:45 -0600
Subject: [PATCH 1/4] iov: add iter_is_user()
Returns true if the iter is holding user memory. Use that for the
might_fault() checks.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
include/linux/uio.h | 5 +++++
lib/iov_iter.c | 6 +++---
2 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 52baa3c89505..d10c19a650a8 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -112,6 +112,11 @@ static inline unsigned char iov_iter_rw(const struct iov_iter *i)
return i->data_source ? WRITE : READ;
}
+static inline bool iter_is_user(const struct iov_iter *i)
+{
+ return iter_is_iovec(i) || iter_is_ubuf(i);
+}
+
/*
* Total number of bytes covered by an iovec.
*
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 08296769f5c6..aa41394174eb 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -767,7 +767,7 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
{
if (unlikely(iov_iter_is_pipe(i)))
return copy_pipe_to_iter(addr, bytes, i);
- if (iter_is_iovec(i) || iter_is_ubuf(i))
+ if (iter_is_user(i))
might_fault();
iterate_and_advance(i, bytes, base, len, off,
copyout(base, addr + off, len),
@@ -849,7 +849,7 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
{
if (unlikely(iov_iter_is_pipe(i)))
return copy_mc_pipe_to_iter(addr, bytes, i);
- if (iter_is_iovec(i) || iter_is_ubuf(i))
+ if (iter_is_user(i))
might_fault();
__iterate_and_advance(i, bytes, base, len, off,
copyout_mc(base, addr + off, len),
@@ -867,7 +867,7 @@ size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
WARN_ON(1);
return 0;
}
- if (iter_is_iovec(i) || iter_is_ubuf(i))
+ if (iter_is_user(i))
might_fault();
iterate_and_advance(i, bytes, base, len, off,
copyin(addr + off, base, len),
--
2.35.1
next prev parent reply other threads:[~2022-05-23 1:51 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-21 0:46 [RFC] what to do with IOCB_DSYNC? Al Viro
2021-06-21 13:59 ` Christoph Hellwig
2021-06-21 14:03 ` Matthew Wilcox
2021-06-21 14:09 ` Christoph Hellwig
2021-06-21 14:16 ` Matthew Wilcox
2021-06-21 14:22 ` Christoph Hellwig
2021-06-21 14:32 ` Al Viro
2021-06-21 14:35 ` Christoph Hellwig
2021-06-21 15:22 ` Jens Axboe
2022-05-21 17:48 ` Al Viro
2022-05-21 19:03 ` Jens Axboe
2022-05-21 22:14 ` Jens Axboe
2022-05-22 7:45 ` Christoph Hellwig
2022-05-22 10:23 ` Matthew Wilcox
2022-05-22 10:36 ` Al Viro
2022-05-22 11:15 ` Matthew Wilcox
2022-05-22 11:45 ` Christoph Hellwig
2022-05-22 12:39 ` Jens Axboe
2022-05-22 12:48 ` Al Viro
2022-05-22 13:02 ` Jens Axboe
2022-05-22 13:07 ` Al Viro
2022-05-22 13:09 ` Jens Axboe
2022-05-22 18:06 ` Jens Axboe
2022-05-22 18:25 ` Al Viro
2022-05-22 18:29 ` Jens Axboe
2022-05-22 18:39 ` Al Viro
2022-05-22 18:48 ` Jens Axboe
2022-05-22 19:04 ` Al Viro
2022-05-22 20:03 ` Jens Axboe
2022-05-23 0:42 ` Al Viro
2022-05-23 1:22 ` Jens Axboe
2022-05-23 1:28 ` Jens Axboe
2022-05-23 1:50 ` Jens Axboe [this message]
2022-05-23 2:43 ` Jens Axboe
2022-05-23 14:22 ` Al Viro
2022-05-23 14:34 ` Jens Axboe
2022-05-23 14:47 ` Al Viro
2022-05-23 15:12 ` Jens Axboe
2022-05-23 15:44 ` Jens Axboe
2022-05-23 15:49 ` Matthew Wilcox
2022-05-23 15:55 ` Jens Axboe
2022-05-23 16:03 ` Jens Axboe
2022-05-26 14:46 ` Jason A. Donenfeld
2022-05-27 10:09 ` Ingo Molnar
2022-05-27 10:15 ` Jason A. Donenfeld
2022-05-27 14:45 ` Samuel Neves
2022-05-27 10:25 ` Ingo Molnar
2022-05-27 10:36 ` Borislav Petkov
2022-05-28 20:54 ` Sedat Dilek
2022-05-28 20:38 ` Sedat Dilek
2022-05-28 20:39 ` Sedat Dilek
2022-05-23 16:15 ` Al Viro
2022-05-25 14:34 ` Matthew Wilcox
2022-05-26 23:19 ` Al Viro
2022-05-27 14:51 ` Jens Axboe
2022-05-22 12:21 ` Al Viro
2022-05-22 7:43 ` Christoph Hellwig
2022-05-22 12:41 ` Al Viro
2022-05-22 12:51 ` Christoph Hellwig
2021-06-21 14:22 ` Al Viro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9c3a6ad4-cdb5-8e0d-9b01-c2825ea891ad@kernel.dk \
--to=axboe@kernel.dk \
--cc=hch@lst.de \
--cc=linux-fsdevel@vger.kernel.org \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).