From: Jann Horn <jannh@google.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
	linux-aio@kvack.org, linux-block@vger.kernel.org,
	linux-man <linux-man@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	hch@lst.de, jmoyer@redhat.com, Avi Kivity <avi@scylladb.com>
Subject: Re: [PATCH 05/18] Add io_uring IO interface
Date: Tue, 29 Jan 2019 00:59:32 +0100	[thread overview]
Message-ID: <CAG48ez17NW0GJVRC6dFcHZTgQifFz5og1XCUbXkHKhr6f=j74Q@mail.gmail.com> (raw)
In-Reply-To: <e9326a77-54c5-e2b8-d9e5-663261462597@kernel.dk>

On Tue, Jan 29, 2019 at 12:47 AM Jens Axboe <axboe@kernel.dk> wrote:
> On 1/28/19 3:32 PM, Jann Horn wrote:
> > On Mon, Jan 28, 2019 at 10:35 PM Jens Axboe <axboe@kernel.dk> wrote:
> >> The submission queue (SQ) and completion queue (CQ) rings are shared
> >> between the application and the kernel. This eliminates the need to
> >> copy data back and forth to submit and complete IO.
> >>
> >> IO submissions use the io_uring_sqe data structure, and completions
> >> are generated in the form of io_uring_cqe data structures. The SQ
> >> ring is an index into the io_uring_sqe array, which makes it possible
> >> to submit a batch of IOs without them being contiguous in the ring.
> >> The CQ ring is always contiguous, as completion events are inherently
> >> unordered, and hence any io_uring_cqe entry can point back to an
> >> arbitrary submission.
> >>
> >> Two new system calls are added for this:
> >>
> >> io_uring_setup(entries, params)
> >>         Sets up a context for doing async IO. On success, returns a file
> >>         descriptor that the application can mmap to gain access to the
> >>         SQ ring, CQ ring, and io_uring_sqes.
> >>
> >> io_uring_enter(fd, to_submit, min_complete, flags, sigset, sigsetsize)
> >>         Initiates IO against the rings mapped to this fd, or waits for
> >>         them to complete, or both. The behavior is controlled by the
> >>         parameters passed in. If 'to_submit' is non-zero, then we'll
> >>         try and submit new IO. If IORING_ENTER_GETEVENTS is set, the
> >>         kernel will wait for 'min_complete' events, if they aren't
> >>         already available. It's valid to set IORING_ENTER_GETEVENTS
> >>         and 'min_complete' == 0 at the same time; this allows the
> >>         kernel to return already completed events without waiting
> >>         for them. This is useful only for polling, as for IRQ
> >>         driven IO, the application can just check the CQ ring
> >>         without entering the kernel.
> >>
> >> With this setup, it's possible to do async IO with a single system
> >> call. Future developments will enable polled IO with this interface,
> >> and polled submission as well. The latter will enable an application
> >> to do IO without doing ANY system calls at all.
> >>
> >> For IRQ driven IO, an application only needs to enter the kernel for
> >> completions if it wants to wait for them to occur.
> >>
> >> Each io_uring is backed by a workqueue, to support buffered async IO
> >> as well. We will only punt to an async context if the command would
> >> need to wait for IO on the device side. Any data that can be accessed
> >> directly in the page cache is served inline. This avoids the slowness
> >> issue of the usual thread pools, since cached data is accessed as
> >> quickly as with a sync interface.
> >>
> >> Sample application: http://git.kernel.dk/cgit/fio/plain/t/io_uring.c
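
A minimal user-space sketch of driving the two system calls described
above (not taken from the patch; the syscall numbers, the
<linux/io_uring.h> uapi header, struct io_uring_params and
IORING_ENTER_GETEVENTS are assumed to match what this series introduces):

/* Sketch only: set up a ring, then reap/wait for completions without
 * submitting anything. */
#include <linux/io_uring.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static int ring_setup(unsigned entries, struct io_uring_params *p)
{
        memset(p, 0, sizeof(*p));
        /* On success this returns an fd; the SQ ring, CQ ring and SQE
         * array are then mmap()ed using the offsets reported in *p. */
        return syscall(__NR_io_uring_setup, entries, p);
}

static int ring_wait(int ring_fd, unsigned min_complete)
{
        /* to_submit == 0: submit nothing.  With min_complete == 0 this
         * only returns completions that are already available. */
        return syscall(__NR_io_uring_enter, ring_fd, 0, min_complete,
                       IORING_ENTER_GETEVENTS, NULL, 0);
}
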
> > [...]
> >> +static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
> >> +                     bool force_nonblock)
> >> +{
> >> +       struct kiocb *kiocb = &req->rw;
> >> +       int ret;
> >> +
> >> +       kiocb->ki_filp = fget(sqe->fd);
> >> +       if (unlikely(!kiocb->ki_filp))
> >> +               return -EBADF;
> >> +       kiocb->ki_pos = sqe->off;
> >> +       kiocb->ki_flags = iocb_flags(kiocb->ki_filp);
> >> +       kiocb->ki_hint = ki_hint_validate(file_write_hint(kiocb->ki_filp));
> >> +       if (sqe->ioprio) {
> >> +               ret = ioprio_check_cap(sqe->ioprio);
> >> +               if (ret)
> >> +                       goto out_fput;
> >> +
> >> +               kiocb->ki_ioprio = sqe->ioprio;
> >> +       } else
> >> +               kiocb->ki_ioprio = get_current_ioprio();
> >> +
> >> +       ret = kiocb_set_rw_flags(kiocb, sqe->rw_flags);
> >> +       if (unlikely(ret))
> >> +               goto out_fput;
> >> +       if (force_nonblock) {
> >> +               kiocb->ki_flags |= IOCB_NOWAIT;
> >> +               req->flags |= REQ_F_FORCE_NONBLOCK;
> >> +       }
> >> +       if (kiocb->ki_flags & IOCB_HIPRI) {
> >> +               ret = -EINVAL;
> >> +               goto out_fput;
> >> +       }
> >> +
> >> +       kiocb->ki_complete = io_complete_rw;
> >> +       return 0;
> >> +out_fput:
> >> +       fput(kiocb->ki_filp);
> >> +       return ret;
> >> +}
> > [...]
> >> +static ssize_t io_read(struct io_kiocb *req, const struct io_uring_sqe *sqe,
> >> +                      bool force_nonblock)
> >> +{
> >> +       struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
> >> +       struct kiocb *kiocb = &req->rw;
> >> +       struct iov_iter iter;
> >> +       struct file *file;
> >> +       ssize_t ret;
> >> +
> >> +       ret = io_prep_rw(req, sqe, force_nonblock);
> >> +       if (ret)
> >> +               return ret;
> >> +       file = kiocb->ki_filp;
> >> +
> >> +       ret = -EBADF;
> >> +       if (unlikely(!(file->f_mode & FMODE_READ)))
> >> +               goto out_fput;
> >> +       ret = -EINVAL;
> >> +       if (unlikely(!file->f_op->read_iter))
> >> +               goto out_fput;
> >> +
> >> +       ret = io_import_iovec(req->ctx, READ, sqe, &iovec, &iter);
> >> +       if (ret)
> >> +               goto out_fput;
> >> +
> >> +       ret = rw_verify_area(READ, file, &kiocb->ki_pos, iov_iter_count(&iter));
> >> +       if (!ret) {
> >> +               ssize_t ret2;
> >> +
> >> +               /* Catch -EAGAIN return for forced non-blocking submission */
> >> +               ret2 = call_read_iter(file, kiocb, &iter);
> >> +               if (!force_nonblock || ret2 != -EAGAIN)
> >> +                       io_rw_done(kiocb, ret2);
> >> +               else
> >> +                       ret = -EAGAIN;
> >> +       }
> >> +       kfree(iovec);
> >> +out_fput:
> >> +       if (unlikely(ret))
> >> +               fput(file);
> >> +       return ret;
> >> +}
> > [...]
> >> +static int __io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
> >> +                          struct sqe_submit *s, bool force_nonblock)
> >> +{
> >> +       const struct io_uring_sqe *sqe = s->sqe;
> >> +       ssize_t ret;
> >> +
> >> +       if (unlikely(s->index >= ctx->sq_entries))
> >> +               return -EINVAL;
> >> +       req->user_data = sqe->user_data;
> >> +
> >> +       ret = -EINVAL;
> >> +       switch (sqe->opcode) {
> >> +       case IORING_OP_NOP:
> >> +               ret = io_nop(req, sqe);
> >> +               break;
> >> +       case IORING_OP_READV:
> >> +               ret = io_read(req, sqe, force_nonblock);
> >> +               break;
> >> +       case IORING_OP_WRITEV:
> >> +               ret = io_write(req, sqe, force_nonblock);
> >> +               break;
> >> +       default:
> >> +               ret = -EINVAL;
> >> +               break;
> >> +       }
> >> +
> >> +       return ret;
> >> +}
> >> +
> >> +static void io_sq_wq_submit_work(struct work_struct *work)
> >> +{
> >> +       struct io_kiocb *req = container_of(work, struct io_kiocb, work);
> >> +       struct sqe_submit *s = &req->submit;
> >> +       u64 user_data = s->sqe->user_data;
> >> +       struct io_ring_ctx *ctx = req->ctx;
> >> +       mm_segment_t old_fs = get_fs();
> >> +       struct files_struct *old_files;
> >> +       int ret;
> >> +
> >> +        /* Ensure we clear previously set forced non-block flag */
> >> +       req->flags &= ~REQ_F_FORCE_NONBLOCK;
> >> +
> >> +       old_files = current->files;
> >> +       current->files = ctx->sqo_files;
> >
> > I think you're not supposed to twiddle with current->files without
> > holding task_lock(current).
>
> 'current' is the work queue item in this case, do we need to protect
> against anything else? I can add the locking around the assignments
> (both places).

Stuff like proc_fd_link() uses get_files_struct(), which grabs a
reference to your current files_struct protected only by task_lock();
and it doesn't use anything like READ_ONCE(), so even if the object
lifetime is not a problem, get_files_struct() could potentially crash
due to a double-read (reading task->files twice and assuming that the
result will be the same). As far as I can tell, this procfs code also
works on kernel threads.
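
To make that concrete, here is a sketch of what the write side in
io_sq_wq_submit_work() would need at a minimum (this only covers the
locking; it doesn't address the lifetime questions below):

        /* Sketch only: publish the files_struct switch under task_lock(),
         * so that readers like get_files_struct(), which hold task_lock()
         * and do a plain load of task->files, can't race with an unlocked
         * plain store here. */
        task_lock(current);
        old_files = current->files;
        current->files = ctx->sqo_files;
        task_unlock(current);

        /* ... submission work ... */

        task_lock(current);
        current->files = old_files;
        task_unlock(current);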

> >> +       if (!mmget_not_zero(ctx->sqo_mm)) {
> >> +               ret = -EFAULT;
> >> +               goto err;
> >> +       }
> >> +
> >> +       use_mm(ctx->sqo_mm);
> >> +       set_fs(USER_DS);
> >> +
> >> +       ret = __io_submit_sqe(ctx, req, s, false);
> >> +
> >> +       set_fs(old_fs);
> >> +       unuse_mm(ctx->sqo_mm);
> >> +       mmput(ctx->sqo_mm);
> >> +err:
> >> +       if (ret) {
> >> +               io_cqring_add_event(ctx, user_data, ret, 0);
> >> +               io_free_req(req);
> >> +       }
> >> +       current->files = old_files;
> >> +}
> > [...]
> >> +static int io_sq_offload_start(struct io_ring_ctx *ctx)
> >> +{
> >> +       int ret;
> >> +
> >> +       ctx->sqo_mm = current->mm;
> >
> > What keeps this thing alive?
>
> I think we're dealing with the same thing as the files below; I'll
> defer to that.
>
> >> +       /*
> >> +        * This is safe since 'current' has the fd installed, and if that gets
> >> +        * closed on exit, then fops->release() is invoked which waits for the
> >> +        * async contexts to flush and exit before exiting.
> >> +        */
> >> +       ret = -EBADF;
> >> +       ctx->sqo_files = current->files;
> >> +       if (!ctx->sqo_files)
> >> +               goto err;
> >
> > That's gnarly. Adding Al Viro to the thread.
> >
> > I think you misunderstand the semantics of f_op->release. The ->flush
> > handler is invoked whenever a file descriptor is closed through
> > filp_close() (via deletion of the files_struct, sys_close(),
> > sys_dup2(), ...), so if you had used that one, _maybe_ this would
> > work. But the ->release handler only runs when the _last_ reference to
> > a struct file has been dropped - so you can, for example, fork() a
> > child, then exit() in the parent, and the ->release handler isn't
> > invoked. So I don't see how this can work.
>
> The anonfd is CLOEXEC. The idea is exactly that it only runs when the
> last reference to the file has been dropped. Not sure why you think I
> need ->flush() here?

Can't I just use fcntl(fd, F_SETFD, 0) to clear the CLOEXEC flag?
Or send the fd via SCM_RIGHTS?
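
To illustrate: FD_CLOEXEC only has an effect across execve(); a plain
fork() duplicates the descriptor regardless, so a child can end up
holding the last reference long after the task that supplied ->files
and ->mm has exited. A hypothetical user-space sketch (ring_fd being
the descriptor returned by io_uring_setup()):

        #include <fcntl.h>
        #include <unistd.h>

        static void outlive_the_parent(int ring_fd)
        {
                fcntl(ring_fd, F_SETFD, 0);   /* clears FD_CLOEXEC, so even
                                                 the exec case is defeated */
                if (fork() == 0) {
                        /* The child inherits ring_fd either way and keeps
                         * the struct file alive; ->release() won't run
                         * until the child closes it or exits. */
                        pause();
                }
                _exit(0);   /* parent (the io_uring creator) exits here */
        }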

> > But even if you had abused ->flush for this instead: close_files()
> > currently has a comment in it that claims that "this is the last
> > reference to the files structure"; this change would make that claim
> > untrue.
>
> Let me see if I can explain my intent better than that comment... We
> know the parent who set up the io_uring instance will be around for as
> long as io_uring instance persists.

That's the part that I think is wrong: As far as I can tell, the
parent can go away and you won't notice.

Also, note that "the parent" is different things for ->files and ->mm.
You can have a multithreaded process whose threads don't have the same
->files, or multiple processes that share ->files without sharing ->mm,
...
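
For example, clone() lets callers pick the two independently, so the
task whose ->mm was captured need not be the task whose ->files was
captured. A sketch (worker() and the stacks are hypothetical):

        #define _GNU_SOURCE
        #include <sched.h>
        #include <signal.h>

        #define STACK_SIZE (1024 * 1024)
        static char stack_a[STACK_SIZE], stack_b[STACK_SIZE];

        extern int worker(void *arg);   /* hypothetical worker function */

        static void spawn_workers(void)
        {
                /* Shares the address space (->mm) but not the fd table. */
                clone(worker, stack_a + STACK_SIZE, CLONE_VM | SIGCHLD, NULL);

                /* Shares the fd table (->files) but not the address space. */
                clone(worker, stack_b + STACK_SIZE, CLONE_FILES | SIGCHLD, NULL);
        }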

> When we are tearing down the
> io_uring, then we wait for any async contexts (like the one above) to
> exit. Once they are exited, it's safe to proceed with the exit and
> teardown ->files[].

But you only do that teardown on ->release, right? And ->release
doesn't have much to do with the process lifetime.

> That's the idea... Not trying to be clever, some of this dates back to
> the aio weirdness where it was impossible to have cross references like
> this, since it would lead to teardown deadlocks with how exit_aio() is
> used. I can probably grab a struct files reference above, but currently
> I don't see why it's needed.
