From: Jann Horn <jannh@google.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org,
	"David S. Miller" <davem@davemloft.net>,
	Network Development <netdev@vger.kernel.org>
Subject: Re: [PATCH 1/3] io_uring: add support for async work inheriting files table
Date: Thu, 24 Oct 2019 22:31:07 +0200
Message-ID: <CAG48ez0K_wtHA4DSWjz4TjohHkMTGo2pTpDVMZPQWD2gtrqZJw@mail.gmail.com>
In-Reply-To: <c3fb07d4-223c-8835-5c22-68367e957a4f@kernel.dk>

On Thu, Oct 24, 2019 at 9:41 PM Jens Axboe <axboe@kernel.dk> wrote:
> On 10/18/19 12:50 PM, Jann Horn wrote:
> > On Fri, Oct 18, 2019 at 8:16 PM Jens Axboe <axboe@kernel.dk> wrote:
> >> On 10/18/19 12:06 PM, Jann Horn wrote:
> >>> But actually, by the way: Is this whole files_struct thing creating a
> >>> reference loop? The files_struct has a reference to the uring file,
> >>> and the uring file has ACCEPT work that has a reference to the
> >>> files_struct. If the task gets killed and the accept work blocks, the
> >>> entire files_struct will stay alive, right?
> >>
> >> Yes, for the lifetime of the request, it does create a loop. So if the
> >> application goes away, I think you're right, the files_struct will stay.
> >> And so will the io_uring, for that matter, as we depend on the closing
> >> of the files to do the final reap.
> >>
> >> Hmm, not sure how best to handle that, to be honest. We need some way to
> >> break the loop, if the request never finishes.
> >
> > A wacky and dubious approach would be to, instead of taking a
> > reference to the files_struct, abuse f_op->flush() to synchronously
> > flush out pending requests with references to the files_struct... But
> > it's probably a bad idea, given that in f_op->flush(), you can't
> > easily tell which files_struct the close is coming from. I suppose you
> > could keep a list of (fdtable, fd) pairs through which ACCEPT requests
> > have come in and then let f_op->flush() probe whether the file
> > pointers are gone from them...
>
> Got back to this after finishing the io-wq stuff, which we need for the
> cancel.
>
> Here's an updated patch:
>
> http://git.kernel.dk/cgit/linux-block/commit/?h=for-5.5/io_uring-test&id=1ea847edc58d6a54ca53001ad0c656da57257570
>
> that seems to work for me (lightly tested), we correctly find and cancel
> work that is holding on to the file table.
>
> The full series sits on top of my for-5.5/io_uring-wq branch, and can be
> viewed here:
>
> http://git.kernel.dk/cgit/linux-block/log/?h=for-5.5/io_uring-test
>
> Let me know what you think!

Ah, I didn't realize that the second argument to f_op->flush is a
pointer to the files_struct. That's neat.


Security: There is no guarantee that ->flush() will run after the last
io_uring_enter() finishes. You can race like this, with threads A and
B in one process and C in another one:

A: sends uring fd to C via unix domain socket
A: starts syscall io_uring_enter(fd, ...)
A: calls fdget(fd), takes reference to file
B: starts syscall close(fd)
B: fd table entry is removed
B: f_op->flush is invoked and finds no pending transactions
B: syscall close() returns
A: continues io_uring_enter(), grabbing current->files
A: io_uring_enter() returns
A and B: exit
worker: use-after-free access to files_struct

I think the solution to this would be (unless you're fine with adding
some broad global read-write mutex) something like this in
__io_queue_sqe(), where "fd" and "f" are the variables from
io_uring_enter(), plumbed through the stack somehow:

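if (req->flags & REQ_F_NEED_FILES) {
  rcu_read_lock();  /* fcheck() must run under RCU */
  spin_lock_irq(&ctx->inflight_lock);
  /* only grab current->files if fd still refers to the uring file */
  if (fcheck(fd) == f) {
    list_add(&req->inflight_list,
      &ctx->inflight_list);
    req->work.files = current->files;
    ret = 0;
  } else {
    /* fd was closed or replaced while we were in io_uring_enter() */
    ret = -EBADF;
  }
  spin_unlock_irq(&ctx->inflight_lock);
  rcu_read_unlock();
  if (ret)
    goto put_req;
}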
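
To make the ordering argument concrete, here is a rough single-threaded
userspace model of that check. It is only a sketch: every model_* name is
made up for illustration and none of this is io_uring code. Locking is
reduced to comments marking where ctx->inflight_lock would be held. The
point is that close() clears the fd slot before flush looks at the
inflight list, so a submitter that rechecks the slot under the same lock
either gets onto the list in time to be cancelled, or is refused.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel objects. */
#define MODEL_MAX_FDS 8

struct model_file { int id; };

struct model_files {                      /* models a files_struct */
    struct model_file *fd[MODEL_MAX_FDS]; /* slots that fcheck() would read */
};

struct model_req {
    struct model_files *files;            /* models req->work.files */
    struct model_req *next;               /* models the inflight linkage */
};

struct model_ctx {
    struct model_req *inflight;           /* models ctx->inflight_list */
};

/*
 * Models the proposed check: register the request only if fd still
 * refers to f. In the kernel this would run under ctx->inflight_lock.
 */
int model_grab_files(struct model_ctx *ctx, struct model_files *files,
                     int fd, const struct model_file *f,
                     struct model_req *req)
{
    if (fd < 0 || fd >= MODEL_MAX_FDS || files->fd[fd] != f)
        return -EBADF;

    req->files = files;
    req->next = ctx->inflight;
    ctx->inflight = req;
    return 0;
}

/*
 * Models close(fd): the slot is cleared first, then the flush step drops
 * every inflight request that refers to this table. Returns how many
 * requests were cancelled.
 */
int model_close_fd(struct model_ctx *ctx, struct model_files *files, int fd)
{
    struct model_req **pp = &ctx->inflight;
    int cancelled = 0;

    files->fd[fd] = NULL;                 /* fd table entry is removed */

    /* flush: in the kernel this walk would hold ctx->inflight_lock */
    while (*pp) {
        struct model_req *req = *pp;

        if (req->files == files) {
            *pp = req->next;              /* unlink and drop the reference */
            req->files = NULL;
            req->next = NULL;
            cancelled++;
        } else {
            pp = &req->next;
        }
    }
    return cancelled;
}
```

In this model, a request registered before model_close_fd() is found and
cancelled by it, and a model_grab_files() call that runs after it fails
with -EBADF without ever touching req->files.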


Minor note: If a process uses dup() to duplicate the uring fd, then
closes the duplicated fd, that will cause work cancellations - but I
guess that's fine?


Style nit: I find it a bit confusing to name both the list head and
the list member heads "inflight_list". Maybe name them "inflight_list"
and "inflight_entry", or something like that?


Correctness: Why is the wait in io_uring_flush() TASK_INTERRUPTIBLE?
Shouldn't it be TASK_UNINTERRUPTIBLE? If someone sends a signal to the
task while it's at that schedule(), it's just going to loop back
around and retry what it was doing already, right?


Security + Correctness: If there is more than one io_wqe, it seems to
me that io_uring_flush() calls io_wq_cancel_work(), which calls
io_wqe_cancel_work(), which may return IO_WQ_CANCEL_OK if the first
request it looks at is pending. In that case, io_wq_cancel_work() will
immediately return, and io_uring_flush() will also immediately return.
It looks like any other requests will continue running?
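
To illustrate what I mean, here is a small userspace model, not the io-wq
code. The model_* types and both function names are invented for this
sketch, and each "queue" stands in for one io_wqe. The first helper stops
as soon as one queue reports a successful cancel, which is the behavior I
think I'm seeing; the second keeps going until no queue has matching work
left.

```c
#include <stddef.h>

#define MODEL_QUEUE_LEN 4

/* Hypothetical result codes, loosely mirroring IO_WQ_CANCEL_OK. */
enum model_cancel { MODEL_CANCEL_OK, MODEL_CANCEL_NOTFOUND };

struct model_work {
    const void *files;   /* which file table this work item holds on to */
};

struct model_queue {     /* stands in for one io_wqe */
    struct model_work *slot[MODEL_QUEUE_LEN];
};

/* Cancel at most one matching work item on a single queue. */
enum model_cancel model_queue_cancel_one(struct model_queue *q,
                                         const void *files)
{
    size_t i;

    for (i = 0; i < MODEL_QUEUE_LEN; i++) {
        struct model_work *w = q->slot[i];

        if (w && w->files == files) {
            q->slot[i] = NULL;
            return MODEL_CANCEL_OK;
        }
    }
    return MODEL_CANCEL_NOTFOUND;
}

/* Early-return variant: stops after the first successful cancel. */
int model_cancel_first_match(struct model_queue *qs, size_t nr,
                             const void *files)
{
    size_t i;

    for (i = 0; i < nr; i++) {
        if (model_queue_cancel_one(&qs[i], files) == MODEL_CANCEL_OK)
            return 1;
    }
    return 0;
}

/* Exhaustive variant: drains every matching item from every queue. */
int model_cancel_all_matches(struct model_queue *qs, size_t nr,
                             const void *files)
{
    size_t i;
    int cancelled = 0;

    for (i = 0; i < nr; i++) {
        while (model_queue_cancel_one(&qs[i], files) == MODEL_CANCEL_OK)
            cancelled++;
    }
    return cancelled;
}
```

With one matching item on each of two queues, the early-return variant
cancels the first and leaves the second in place, still pointing at the
file table. A flush that needs all such work gone would want something
shaped like the exhaustive variant, or a retry loop around the first.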
