From: Jens Axboe <axboe@kernel.dk>
To: Jann Horn <jannh@google.com>
Cc: linux-aio@kvack.org, linux-block@vger.kernel.org,
	Linux API <linux-api@vger.kernel.org>,
	hch@lst.de, jmoyer@redhat.com, Avi Kivity <avi@scylladb.com>
Subject: Re: [PATCH 12/18] io_uring: add support for pre-mapped user IO buffers
Date: Tue, 29 Jan 2019 16:14:56 -0700
Message-ID: <379a631f-e55b-cc31-f84a-ace73fd66ea1@kernel.dk>
In-Reply-To: <CAG48ez1m22z0Pqi0cT=UZdsA14SCM5579T1d40cZdyD6KqBw_g@mail.gmail.com>

On 1/29/19 4:08 PM, Jann Horn wrote:
> On Wed, Jan 30, 2019 at 12:06 AM Jens Axboe <axboe@kernel.dk> wrote:
>> On 1/29/19 4:03 PM, Jann Horn wrote:
>>> On Tue, Jan 29, 2019 at 11:56 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>> On 1/29/19 3:44 PM, Jann Horn wrote:
>>>>> On Tue, Jan 29, 2019 at 8:27 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>> If we have fixed user buffers, we can map them into the kernel when we
>>>>>> set up the io_uring context. That avoids the need to do get_user_pages()
>>>>>> for each and every IO.
>>>>>>
>>>>>> To utilize this feature, the application must call io_uring_register()
>>>>>> after having set up an io_uring context, passing in
>>>>>> IORING_REGISTER_BUFFERS as the opcode. The argument must be a pointer
>>>>>> to an iovec array, and the nr_args should contain how many iovecs the
>>>>>> application wishes to map.
>>>>>>
>>>>>> If successful, these buffers are now mapped into the kernel and
>>>>>> eligible for IO. To use these fixed buffers, the application must use the
>>>>>> IORING_OP_READ_FIXED and IORING_OP_WRITE_FIXED opcodes, and then
>>>>>> set sqe->index to the desired buffer index. The range sqe->addr ..
>>>>>> sqe->addr + sqe->len must fall inside the indexed buffer.
>>>>>>
>>>>>> The application may register buffers throughout the lifetime of the
>>>>>> io_uring context. It can call io_uring_register() with
>>>>>> IORING_UNREGISTER_BUFFERS as the opcode to unregister the current set of
>>>>>> buffers, and then register a new set. The application need not
>>>>>> unregister buffers explicitly before shutting down the io_uring context.
>>> [...]
>>>>>> +       imu = &ctx->user_bufs[index];
>>>>>> +       buf_addr = READ_ONCE(sqe->addr);
>>>>>> +       if (buf_addr < imu->ubuf || buf_addr + len > imu->ubuf + imu->len)
>>>>>
>>>>> This can wrap around if `buf_addr` or `len` is very big, right? Then
>>>>> you e.g. get past the first check because `buf_addr` is sufficiently
>>>>> big, and get past the second check because `buf_addr + len` wraps
>>>>> around and becomes small.
>>>>
>>>> Good point. I wonder if we have a verification helper for something like
>>>> this?
>>>
>>> check_add_overflow() exists, I guess that might help a bit. I don't
>>> think I've seen a more specific helper for this situation.
>>
>> Hmm, not super appropriate. How about something a la:
>>
>> if (buf_addr + len < buf_addr)
>>     ... overflow ...
>>
>> ?
> 
> Sure, sounds good.
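
To make the wrap concrete, here's a standalone userspace sketch (made-up
addresses and lengths, not kernel code); the added guard is equivalent
to a check_add_overflow() on buf_addr + len:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t ubuf = 0x100000, ulen = 0x1000;	/* registered buffer */
	uint64_t buf_addr = UINT64_MAX - 0x10;		/* huge sqe->addr */
	uint64_t len = 0x200;				/* sum wraps past zero */

	/* the original two checks, fooled by the wrap: */
	int passes = !(buf_addr < ubuf || buf_addr + len > ubuf + ulen);

	/* the added guard: */
	int wraps = buf_addr + len < buf_addr;

	printf("passes bounds checks: %d, wraps: %d\n", passes, wraps);
	return 0;
}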

Just folded in this incremental, which should fix all the issues outlined
in your email.
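
The lsqe[] copy below closes a shared-memory double fetch: the sqe lives
in a ring shared with the application, so the kernel must validate and
act on a private snapshot, never on a second read of the shared copy. A
minimal userspace illustration of the pattern (illustrative names, not
the kernel's):

#include <stdio.h>
#include <string.h>

struct sqe { unsigned opcode; };

static int needs_user(unsigned opcode) { return opcode != 0; }

static void submit_one(volatile struct sqe *shared)
{
	struct sqe local;

	/*
	 * BAD: needs_user(shared->opcode) now, and shared->opcode again
	 * at import time -- the application can flip it in between.
	 * GOOD: snapshot once, then check and import from the copy.
	 */
	memcpy(&local, (const void *)shared, sizeof(local));

	if (needs_user(local.opcode))
		printf("opcode %u needs user memory access\n", local.opcode);
}

int main(void)
{
	volatile struct sqe ring_slot = { .opcode = 1 };

	submit_one(&ring_slot);
	return 0;
}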


diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7364feebafed..d42541357969 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -751,7 +751,7 @@ static int io_import_fixed(struct io_ring_ctx *ctx, int rw,
 {
 	size_t len = READ_ONCE(sqe->len);
 	struct io_mapped_ubuf *imu;
-	int buf_index, index;
+	unsigned index, buf_index;
 	size_t offset;
 	u64 buf_addr;
 
@@ -763,9 +763,12 @@ static int io_import_fixed(struct io_ring_ctx *ctx, int rw,
 	if (unlikely(buf_index >= ctx->nr_user_bufs))
 		return -EFAULT;
 
-	index = array_index_nospec(buf_index, ctx->sq_entries);
+	index = array_index_nospec(buf_index, ctx->nr_user_bufs);
 	imu = &ctx->user_bufs[index];
 	buf_addr = READ_ONCE(sqe->addr);
+
+	if (buf_addr + len < buf_addr)
+		return -EFAULT;
 	if (buf_addr < imu->ubuf || buf_addr + len > imu->ubuf + imu->len)
 		return -EFAULT;
 
@@ -1602,6 +1605,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, struct sqe_submit *sqes,
 
 static int io_sq_thread(void *data)
 {
+	struct io_uring_sqe lsqe[IO_IOPOLL_BATCH];
 	struct sqe_submit sqes[IO_IOPOLL_BATCH];
 	struct io_ring_ctx *ctx = data;
 	struct mm_struct *cur_mm = NULL;
@@ -1701,6 +1705,14 @@ static int io_sq_thread(void *data)
 		i = 0;
 		all_fixed = true;
 		do {
+			/*
+			 * Ensure sqe is stable between checking if we need
+			 * user access, and actually importing the iovec
+			 * further down the stack.
+			 */
+			memcpy(&lsqe[i], sqes[i].sqe, sizeof(lsqe[i]));
+			sqes[i].sqe = &lsqe[i];
+
 			if (all_fixed && io_sqe_needs_user(sqes[i].sqe))
 				all_fixed = false;
 
@@ -2081,7 +2093,7 @@ static int io_copy_iov(struct io_ring_ctx *ctx, struct iovec *dst,
 	struct iovec __user *src;
 
 #ifdef CONFIG_COMPAT
-	if (in_compat_syscall()) {
+	if (ctx->compat) {
 		struct compat_iovec __user *ciovs;
 		struct compat_iovec ciov;
 
@@ -2103,7 +2115,6 @@ static int io_copy_iov(struct io_ring_ctx *ctx, struct iovec *dst,
 static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
 				  unsigned nr_args)
 {
-	struct vm_area_struct **vmas = NULL;
 	struct page **pages = NULL;
 	int i, j, got_pages = 0;
 	int ret = -EINVAL;
@@ -2138,7 +2149,7 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
 		 * submitted if they are wrong.
 		 */
 		ret = -EFAULT;
-		if (!iov.iov_base)
+		if (!iov.iov_base || !iov.iov_len)
 			goto err;
 
 		/* arbitrary limit, but we need something */
@@ -2155,14 +2166,10 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
 			goto err;
 
 		if (!pages || nr_pages > got_pages) {
-			kfree(vmas);
 			kfree(pages);
 			pages = kmalloc_array(nr_pages, sizeof(struct page *),
 						GFP_KERNEL);
-			vmas = kmalloc_array(nr_pages,
-					sizeof(struct vma_area_struct *),
-					GFP_KERNEL);
-			if (!pages || !vmas) {
+			if (!pages) {
 				io_unaccount_mem(ctx, nr_pages);
 				goto err;
 			}
@@ -2176,32 +2183,18 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
 			goto err;
 		}
 
-		down_write(&current->mm->mmap_sem);
-		pret = get_user_pages_longterm(ubuf, nr_pages, FOLL_WRITE,
-						pages, vmas);
-		if (pret == nr_pages) {
-			/* don't support file backed memory */
-			for (j = 0; j < nr_pages; j++) {
-				struct vm_area_struct *vma = vmas[j];
+		down_read(&current->mm->mmap_sem);
+		pret = get_user_pages_longterm(ubuf, nr_pages,
+						FOLL_WRITE | FOLL_ANON, pages,
+						NULL);
+		up_read(&current->mm->mmap_sem);
 
-				if (vma->vm_file) {
-					ret = -EOPNOTSUPP;
-					break;
-				}
-			}
-		} else {
-			ret = pret < 0 ? pret : -EFAULT;
-		}
-		up_write(&current->mm->mmap_sem);
-		if (ret) {
-			/*
-			 * if we did partial map, or found file backed vmas,
-			 * release any pages we did get
-			 */
+		if (pret != nr_pages) {
 			if (pret > 0) {
 				for (j = 0; j < pret; j++)
 					put_page(pages[j]);
 			}
+			ret = pret < 0 ? pret : -EFAULT;
 			io_unaccount_mem(ctx, nr_pages);
 			goto err;
 		}
@@ -2224,12 +2217,10 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
 		imu->nr_bvecs = nr_pages;
 	}
 	kfree(pages);
-	kfree(vmas);
 	ctx->nr_user_bufs = nr_args;
 	return 0;
 err:
 	kfree(pages);
-	kfree(vmas);
 	io_sqe_buffer_unregister(ctx);
 	return ret;
 }
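
For reference, the application-side registration flow the commit message
describes looks roughly like the sketch below. The raw syscall wrapper,
the x86-64 syscall number (427), and the opcode value (0) are the ones
from the ABI as finally merged; at the time of this posting they were
still in flux, so treat them as placeholders.

#include <stdio.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef __NR_io_uring_register
#define __NR_io_uring_register	427	/* x86-64, merged ABI */
#endif
#define IORING_REGISTER_BUFFERS	0

static int io_uring_register(int fd, unsigned opcode, void *arg,
			     unsigned nr_args)
{
	return (int) syscall(__NR_io_uring_register, fd, opcode, arg, nr_args);
}

/* ring_fd would come from a prior io_uring_setup() call */
static int register_one_buffer(int ring_fd)
{
	static char buf[65536];
	struct iovec iov = {
		.iov_base = buf,		/* must be non-NULL ... */
		.iov_len = sizeof(buf),		/* ... with non-zero length */
	};

	/*
	 * Pins and maps the buffer once, up front.  Per IO, the app then
	 * uses IORING_OP_READ_FIXED/WRITE_FIXED with the buffer index in
	 * the sqe, keeping sqe->addr .. sqe->addr + sqe->len inside this
	 * registered range.
	 */
	return io_uring_register(ring_fd, IORING_REGISTER_BUFFERS, &iov, 1);
}

int main(void)
{
	int ring_fd = -1;	/* placeholder for io_uring_setup() */

	if (register_one_buffer(ring_fd) < 0)
		perror("io_uring_register");
	return 0;
}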

-- 
Jens Axboe

