From: Jens Axboe <axboe@kernel.dk>
To: Jann Horn <jannh@google.com>
Cc: linux-aio@kvack.org, linux-block@vger.kernel.org,
Linux API <linux-api@vger.kernel.org>,
hch@lst.de, jmoyer@redhat.com, Avi Kivity <avi@scylladb.com>
Subject: Re: [PATCH 12/18] io_uring: add support for pre-mapped user IO buffers
Date: Tue, 29 Jan 2019 16:14:56 -0700
Message-ID: <379a631f-e55b-cc31-f84a-ace73fd66ea1@kernel.dk>
In-Reply-To: <CAG48ez1m22z0Pqi0cT=UZdsA14SCM5579T1d40cZdyD6KqBw_g@mail.gmail.com>
On 1/29/19 4:08 PM, Jann Horn wrote:
> On Wed, Jan 30, 2019 at 12:06 AM Jens Axboe <axboe@kernel.dk> wrote:
>> On 1/29/19 4:03 PM, Jann Horn wrote:
>>> On Tue, Jan 29, 2019 at 11:56 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>> On 1/29/19 3:44 PM, Jann Horn wrote:
>>>>> On Tue, Jan 29, 2019 at 8:27 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>> If we have fixed user buffers, we can map them into the kernel when we
>>>>>> set up the io_context. That avoids the need to do get_user_pages() for
>>>>>> each and every IO.
>>>>>>
>>>>>> To utilize this feature, the application must call io_uring_register()
>>>>>> after having set up an io_uring context, passing in
>>>>>> IORING_REGISTER_BUFFERS as the opcode. The argument must be a pointer
>>>>>> to an iovec array, and the nr_args should contain how many iovecs the
>>>>>> application wishes to map.
>>>>>>
>>>>>> If successful, these buffers are now mapped into the kernel, eligible
>>>>>> for IO. To use these fixed buffers, the application must use the
>>>>>> IORING_OP_READ_FIXED and IORING_OP_WRITE_FIXED opcodes, and then
>>>>>> set sqe->index to the desired buffer index. sqe->addr..sqe->addr+sqe->len
>>>>>> must point to somewhere inside the indexed buffer.
>>>>>>
>>>>>> The application may register buffers throughout the lifetime of the
>>>>>> io_uring context. It can call io_uring_register() with
>>>>>> IORING_UNREGISTER_BUFFERS as the opcode to unregister the current set of
>>>>>> buffers, and then register a new set. The application need not
>>>>>> unregister buffers explicitly before shutting down the io_uring context.
>>> [...]
>>>>>> + imu = &ctx->user_bufs[index];
>>>>>> + buf_addr = READ_ONCE(sqe->addr);
>>>>>> + if (buf_addr < imu->ubuf || buf_addr + len > imu->ubuf + imu->len)
>>>>>
>>>>> This can wrap around if `buf_addr` or `len` is very big, right? Then
>>>>> you e.g. get past the first check because `buf_addr` is sufficiently
>>>>> big, and get past the second check because `buf_addr + len` wraps
>>>>> around and becomes small.
>>>>
>>>> Good point. I wonder if we have a verification helper for something like
>>>> this?
>>>
>>> check_add_overflow() exists, I guess that might help a bit. I don't
>>> think I've seen a more specific helper for this situation.
>>
>> Hmm, not super appropriate. How about something ala:
>>
>> if (buf_addr + len < buf_addr)
>> ... overflow ...
>>
>> ?
>
> Sure, sounds good.
Just folded in this incremental, which should fix all the issues outlined
in your email.
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7364feebafed..d42541357969 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -751,7 +751,7 @@ static int io_import_fixed(struct io_ring_ctx *ctx, int rw,
{
size_t len = READ_ONCE(sqe->len);
struct io_mapped_ubuf *imu;
- int buf_index, index;
+ unsigned index, buf_index;
size_t offset;
u64 buf_addr;
@@ -763,9 +763,12 @@ static int io_import_fixed(struct io_ring_ctx *ctx, int rw,
if (unlikely(buf_index >= ctx->nr_user_bufs))
return -EFAULT;
- index = array_index_nospec(buf_index, ctx->sq_entries);
+ index = array_index_nospec(buf_index, ctx->nr_user_bufs);
imu = &ctx->user_bufs[index];
buf_addr = READ_ONCE(sqe->addr);
+
+ if (buf_addr + len < buf_addr)
+ return -EFAULT;
if (buf_addr < imu->ubuf || buf_addr + len > imu->ubuf + imu->len)
return -EFAULT;
@@ -1602,6 +1605,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, struct sqe_submit *sqes,
static int io_sq_thread(void *data)
{
+ struct io_uring_sqe lsqe[IO_IOPOLL_BATCH];
struct sqe_submit sqes[IO_IOPOLL_BATCH];
struct io_ring_ctx *ctx = data;
struct mm_struct *cur_mm = NULL;
@@ -1701,6 +1705,14 @@ static int io_sq_thread(void *data)
i = 0;
all_fixed = true;
do {
+ /*
+ * Ensure sqe is stable between checking if we need
+ * user access, and actually importing the iovec
+ * further down the stack.
+ */
+ memcpy(&lsqe[i], sqes[i].sqe, sizeof(lsqe[i]));
+ sqes[i].sqe = &lsqe[i];
+
if (all_fixed && io_sqe_needs_user(sqes[i].sqe))
all_fixed = false;
@@ -2081,7 +2093,7 @@ static int io_copy_iov(struct io_ring_ctx *ctx, struct iovec *dst,
struct iovec __user *src;
#ifdef CONFIG_COMPAT
- if (in_compat_syscall()) {
+ if (ctx->compat) {
struct compat_iovec __user *ciovs;
struct compat_iovec ciov;
@@ -2103,7 +2115,6 @@ static int io_copy_iov(struct io_ring_ctx *ctx, struct iovec *dst,
static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
unsigned nr_args)
{
- struct vm_area_struct **vmas = NULL;
struct page **pages = NULL;
int i, j, got_pages = 0;
int ret = -EINVAL;
@@ -2138,7 +2149,7 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
* submitted if they are wrong.
*/
ret = -EFAULT;
- if (!iov.iov_base)
+ if (!iov.iov_base || !iov.iov_len)
goto err;
/* arbitrary limit, but we need something */
@@ -2155,14 +2166,10 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
goto err;
if (!pages || nr_pages > got_pages) {
- kfree(vmas);
kfree(pages);
pages = kmalloc_array(nr_pages, sizeof(struct page *),
GFP_KERNEL);
- vmas = kmalloc_array(nr_pages,
- sizeof(struct vma_area_struct *),
- GFP_KERNEL);
- if (!pages || !vmas) {
+ if (!pages) {
io_unaccount_mem(ctx, nr_pages);
goto err;
}
@@ -2176,32 +2183,18 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
goto err;
}
- down_write(&current->mm->mmap_sem);
- pret = get_user_pages_longterm(ubuf, nr_pages, FOLL_WRITE,
- pages, vmas);
- if (pret == nr_pages) {
- /* don't support file backed memory */
- for (j = 0; j < nr_pages; j++) {
- struct vm_area_struct *vma = vmas[j];
+ down_read(&current->mm->mmap_sem);
+ pret = get_user_pages_longterm(ubuf, nr_pages,
+ FOLL_WRITE | FOLL_ANON, pages,
+ NULL);
+ up_read(&current->mm->mmap_sem);
- if (vma->vm_file) {
- ret = -EOPNOTSUPP;
- break;
- }
- }
- } else {
- ret = pret < 0 ? pret : -EFAULT;
- }
- up_write(&current->mm->mmap_sem);
- if (ret) {
- /*
- * if we did partial map, or found file backed vmas,
- * release any pages we did get
- */
+ if (pret != nr_pages) {
if (pret > 0) {
for (j = 0; j < pret; j++)
put_page(pages[j]);
}
+ ret = pret < 0 ? pret : -EFAULT;
io_unaccount_mem(ctx, nr_pages);
goto err;
}
@@ -2224,12 +2217,10 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
imu->nr_bvecs = nr_pages;
}
kfree(pages);
- kfree(vmas);
ctx->nr_user_bufs = nr_args;
return 0;
err:
kfree(pages);
- kfree(vmas);
io_sqe_buffer_unregister(ctx);
return ret;
}
--
Jens Axboe