linux-man.vger.kernel.org archive mirror
From: Jens Axboe <axboe@kernel.dk>
To: Jann Horn <jannh@google.com>
Cc: linux-aio@kvack.org, linux-block@vger.kernel.org,
	linux-man <linux-man@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	hch@lst.de, jmoyer@redhat.com, Avi Kivity <avi@scylladb.com>
Subject: Re: [PATCH 12/18] io_uring: add support for pre-mapped user IO buffers
Date: Mon, 28 Jan 2019 18:25:49 -0700
Message-ID: <512f5084-3185-f7dd-80a6-b418ca6534db@kernel.dk>
In-Reply-To: <CAG48ez0JjCDtAwbYRKcEOjsMDWmEWKUYHo4nLEDbXmQ9N9Ca=w@mail.gmail.com>

On 1/28/19 5:36 PM, Jann Horn wrote:
> On Tue, Jan 29, 2019 at 12:50 AM Jens Axboe <axboe@kernel.dk> wrote:
>> On 1/28/19 4:35 PM, Jann Horn wrote:
>>> On Mon, Jan 28, 2019 at 10:36 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>> If we have fixed user buffers, we can map them into the kernel when we
>>>> set up the io_context. That avoids the need to do get_user_pages() for
>>>> each and every IO.
>>> [...]
>>>> +static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
>>>> +                              void __user *arg, unsigned nr_args)
>>>> +{
>>>> +       int ret;
>>>> +
>>>> +       /* Drop our initial ref and wait for the ctx to be fully idle */
>>>> +       percpu_ref_put(&ctx->refs);
>>>
>>> The line above drops a reference that you just got in the caller...
>>
>> Right
>>
>>>> +       percpu_ref_kill(&ctx->refs);
>>>> +       wait_for_completion(&ctx->ctx_done);
>>>> +
>>>> +       switch (opcode) {
>>>> +       case IORING_REGISTER_BUFFERS:
>>>> +               ret = io_sqe_buffer_register(ctx, arg, nr_args);
>>>> +               break;
>>>> +       case IORING_UNREGISTER_BUFFERS:
>>>> +               ret = -EINVAL;
>>>> +               if (arg || nr_args)
>>>> +                       break;
>>>> +               ret = io_sqe_buffer_unregister(ctx);
>>>> +               break;
>>>> +       default:
>>>> +               ret = -EINVAL;
>>>> +               break;
>>>> +       }
>>>> +
>>>> +       /* bring the ctx back to life */
>>>> +       reinit_completion(&ctx->ctx_done);
>>>> +       percpu_ref_resurrect(&ctx->refs);
>>>> +       percpu_ref_get(&ctx->refs);
>>>
>>> And then this line takes a reference that the caller will immediately
>>> drop again? Why?
>>
>> Just want to keep it symmetric and avoid having weird "this function drops
>> a reference" use cases.
>>
>>>
>>>> +       return ret;
>>>> +}
>>>> +
>>>> +SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode,
>>>> +               void __user *, arg, unsigned int, nr_args)
>>>> +{
>>>> +       struct io_ring_ctx *ctx;
>>>> +       long ret = -EBADF;
>>>> +       struct fd f;
>>>> +
>>>> +       f = fdget(fd);
>>>> +       if (!f.file)
>>>> +               return -EBADF;
>>>> +
>>>> +       ret = -EOPNOTSUPP;
>>>> +       if (f.file->f_op != &io_uring_fops)
>>>> +               goto out_fput;
>>>> +
>>>> +       ret = -ENXIO;
>>>> +       ctx = f.file->private_data;
>>>> +       if (!percpu_ref_tryget(&ctx->refs))
>>>> +               goto out_fput;
>>>
>>> If you are holding the uring_lock of a ctx that can be accessed
>>> through a file descriptor (which you do just after this point), you
>>> know that the percpu_ref isn't zero, right? Why are you doing the
>>> tryget here?
>>
>> Not sure I follow... We don't hold the lock at this point. I guess your
>> point is that since the descriptor is open (or we'd fail the above
>> check), then there's no point doing the tryget variant here? That's
>> strictly true, that could just be a get().
> 
> As far as I can tell, you could do the following without breaking anything:
> 
> ========================
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 6916dc3222cf..c2d82765eefe 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -2485,7 +2485,6 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
>         int ret;
> 
>         /* Drop our initial ref and wait for the ctx to be fully idle */
> -       percpu_ref_put(&ctx->refs);
>         percpu_ref_kill(&ctx->refs);
>         wait_for_completion(&ctx->ctx_done);
> 
> @@ -2516,7 +2515,6 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
>         /* bring the ctx back to life */
>         reinit_completion(&ctx->ctx_done);
>         percpu_ref_resurrect(&ctx->refs);
> -       percpu_ref_get(&ctx->refs);
>         return ret;
>  }
> 
> @@ -2535,17 +2533,13 @@ SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode,
>         if (f.file->f_op != &io_uring_fops)
>                 goto out_fput;
> 
> -       ret = -ENXIO;
>         ctx = f.file->private_data;
> -       if (!percpu_ref_tryget(&ctx->refs))
> -               goto out_fput;
> 
>         ret = -EBUSY;
>         if (mutex_trylock(&ctx->uring_lock)) {
>                 ret = __io_uring_register(ctx, opcode, arg, nr_args);
>                 mutex_unlock(&ctx->uring_lock);
>         }
> -       io_ring_drop_ctx_refs(ctx, 1);
>  out_fput:
>         fdput(f);
>         return ret;
> ========================
> 
> The two functions that can drop the initial ref of the percpu refcount are:
> 
> 1. io_ring_ctx_wait_and_kill(); this is only used on ->release() or on
> setup failure, meaning that as long as you have a reference to the
> file from fget()/fdget(), io_ring_ctx_wait_and_kill() can't have been
> called on your context
> 2. __io_uring_register(); this temporarily kills the percpu refcount
> and resurrects it, all under ctx->uring_lock, meaning that as long as
> you're holding ctx->uring_lock, __io_uring_register() can't have
> killed the percpu refcount
> 
> Therefore, I think that as long as you're in sys_io_uring_register and
> holding the ctx->uring_lock, you know that the percpu refcount is
> alive, and bumping and dropping non-initial references has no effect.
> 
> Perhaps this makes more sense when you view the percpu refcount as a
> read/write lock - percpu_ref_tryget() takes a read lock, the
> percpu_ref_kill() dance takes a write lock.

This looks good, I'll fold it in. Thanks!

-- 
Jens Axboe
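
The read/write-lock view of the percpu refcount described in the quoted
review can be illustrated with a small user-space model. This is only a
sketch, not kernel code: struct toy_ref and the toy_ref_*() helpers are
hypothetical names, the model is single-threaded with one plain counter
instead of per-CPU counters, and where the real code sleeps in
wait_for_completion(&ctx->ctx_done) the model only offers a check for
idleness.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy, single-threaded stand-in for the kernel's struct percpu_ref. */
struct toy_ref {
	long count;	/* all references, including the initial one */
	bool dead;	/* true between kill and resurrect */
};

static void toy_ref_init(struct toy_ref *r)
{
	r->count = 1;	/* the initial reference */
	r->dead = false;
}

/* "Read lock": fails only once the count has dropped to zero. */
static bool toy_ref_tryget(struct toy_ref *r)
{
	if (r->count == 0)
		return false;
	r->count++;
	return true;
}

/* Drop one reference taken by init, tryget or resurrect. */
static void toy_ref_put(struct toy_ref *r)
{
	assert(r->count > 0);
	r->count--;
}

/* First half of the "write lock": drop the initial reference. */
static void toy_ref_kill(struct toy_ref *r)
{
	assert(!r->dead);
	r->dead = true;
	toy_ref_put(r);
}

/* The kernel code would sleep until this holds; the model only checks. */
static bool toy_ref_is_idle(const struct toy_ref *r)
{
	return r->dead && r->count == 0;
}

/* Release the "write lock": restore the initial reference. */
static void toy_ref_resurrect(struct toy_ref *r)
{
	assert(r->dead);
	r->dead = false;
	r->count++;
}
```

In this model, a toy_ref_tryget()/toy_ref_put() pair on a live ref leaves
the count unchanged, which is the sense in which bumping and dropping
non-initial references has no effect. After toy_ref_kill() with no other
references held, toy_ref_tryget() fails until toy_ref_resurrect() runs.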


