From: Jann Horn <jannh@google.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-aio@kvack.org, linux-block@vger.kernel.org,
Linux API <linux-api@vger.kernel.org>,
hch@lst.de, jmoyer@redhat.com, Avi Kivity <avi@scylladb.com>,
Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH 14/19] io_uring: add file set registration
Date: Fri, 8 Feb 2019 21:26:01 +0100 [thread overview]
Message-ID: <CAG48ez2Fp10zQ4Y-x4RJh3Cu=vABu9w8ZrAJThvkh6pr74ZiQw@mail.gmail.com> (raw)
In-Reply-To: <20190208173423.27014-15-axboe@kernel.dk>
On Fri, Feb 8, 2019 at 6:35 PM Jens Axboe <axboe@kernel.dk> wrote:
> We normally have to fget/fput for each IO we do on a file. Even with
> the batching we do, the cost of the atomic inc/dec of the file usage
> count adds up.
>
> This adds IORING_REGISTER_FILES, and IORING_UNREGISTER_FILES opcodes
> for the io_uring_register(2) system call. The arguments passed in must
> be an array of __s32 holding file descriptors, and nr_args should hold
> the number of file descriptors the application wishes to pin for the
> duration of the io_uring instance (or until IORING_UNREGISTER_FILES is
> called).
>
> When used, the application must set IOSQE_FIXED_FILE in the sqe->flags
> member. Then, instead of setting sqe->fd to the real fd, it sets sqe->fd
> to the index in the array passed in to IORING_REGISTER_FILES.
>
> Files are automatically unregistered when the io_uring instance is torn
> down. An application need only unregister if it wishes to register a new
> set of fds.
I think the overall concept here is still broken: You're giving the
user_files to the GC, and I think the GC can drop their refcounts, but
I don't see you actually getting feedback from the GC anywhere that
would let the GC break your references? E.g. in io_prep_rw() you grab
file pointers from ctx->user_files after simply checking
ctx->nr_user_files, and there is no path from the GC that touches
those fields. As far as I can tell, the GC is just going to go through
unix_destruct_scm() and drop references on your files, causing
use-after-free.
But the unix GC is complicated, and maybe I'm just missing something...
> +static void __io_sqe_files_unregister(struct io_ring_ctx *ctx)
> +{
> +#if defined(CONFIG_UNIX)
> + if (ctx->ring_sock) {
> + struct sock *sock = ctx->ring_sock->sk;
> + struct sk_buff *skb;
> +
> + while ((skb = skb_dequeue(&sock->sk_receive_queue)) != NULL)
> + kfree_skb(skb);
> + }
> +#else
> + int i;
> +
> + for (i = 0; i < ctx->nr_user_files; i++)
> + fput(ctx->user_files[i]);
> +#endif
> +}
> +
> +static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
> +{
> + if (!ctx->user_files)
> + return -ENXIO;
> +
> + __io_sqe_files_unregister(ctx);
> + kfree(ctx->user_files);
> + ctx->user_files = NULL;
> + return 0;
> +}
> +
> +#if defined(CONFIG_UNIX)
> +static int __io_sqe_files_scm(struct io_ring_ctx *ctx, int nr, int offset)
> +{
> + struct scm_fp_list *fpl;
> + struct sk_buff *skb;
> + int i;
> +
> + fpl = kzalloc(sizeof(*fpl), GFP_KERNEL);
> + if (!fpl)
> + return -ENOMEM;
> +
> + skb = alloc_skb(0, GFP_KERNEL);
> + if (!skb) {
> + kfree(fpl);
> + return -ENOMEM;
> + }
> +
> + skb->sk = ctx->ring_sock->sk;
> + skb->destructor = unix_destruct_scm;
> +
> + fpl->user = get_uid(ctx->user);
> + for (i = 0; i < nr; i++) {
> + fpl->fp[i] = get_file(ctx->user_files[i + offset]);
> + unix_inflight(fpl->user, fpl->fp[i]);
> + fput(fpl->fp[i]);
This pattern is almost always superfluous. You increment the file's
refcount, maybe insert the file into a list (essentially), and drop
the file's refcount back down. You're already holding a stable
reference, and you're not temporarily lending that to anyone else.
> + }
> +
> + fpl->max = fpl->count = nr;
> + UNIXCB(skb).fp = fpl;
> + skb_queue_head(&ctx->ring_sock->sk->sk_receive_queue, skb);
> + return 0;
> +}
> +
> +/*
> + * If UNIX sockets are enabled, fd passing can cause a reference cycle which
> + * causes regular reference counting to break down. We rely on the UNIX
> + * garbage collection to take care of this problem for us.
> + */
> +static int io_sqe_files_scm(struct io_ring_ctx *ctx)
> +{
> + unsigned left, total;
> + int ret = 0;
> +
> + total = 0;
> + left = ctx->nr_user_files;
> + while (left) {
> + unsigned this_files = min_t(unsigned, left, SCM_MAX_FD);
> + int ret;
> +
> + ret = __io_sqe_files_scm(ctx, this_files, total);
> + if (ret)
> + break;
> + left -= this_files;
> + total += this_files;
> + }
> +
> + return ret;
> +}
> +#else
> +static int io_sqe_files_scm(struct io_ring_ctx *ctx)
> +{
> + return 0;
> +}
> +#endif
> +
> +static int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
> + unsigned nr_args)
> +{
> + __s32 __user *fds = (__s32 __user *) arg;
> + int fd, ret = 0;
> + unsigned i;
> +
> + if (ctx->user_files)
> + return -EBUSY;
> + if (!nr_args)
> + return -EINVAL;
> + if (nr_args > IORING_MAX_FIXED_FILES)
> + return -EMFILE;
> +
> + ctx->user_files = kcalloc(nr_args, sizeof(struct file *), GFP_KERNEL);
> + if (!ctx->user_files)
> + return -ENOMEM;
> +
> + for (i = 0; i < nr_args; i++) {
> + ret = -EFAULT;
> + if (copy_from_user(&fd, &fds[i], sizeof(fd)))
> + break;
> +
> + ctx->user_files[i] = fget(fd);
> +
> + ret = -EBADF;
> + if (!ctx->user_files[i])
> + break;
> + /*
> + * Don't allow io_uring instances to be registered. If UNIX
> + * isn't enabled, then this causes a reference cycle and this
> + * instance can never get freed. If UNIX is enabled we'll
> + * handle it just fine, but there's still no point in allowing
> + * a ring fd as it doesn't suppor regular read/write anyway.
nit: support
> + */
> + if (ctx->user_files[i]->f_op == &io_uring_fops) {
> + fput(ctx->user_files[i]);
> + break;
> + }
> + ctx->nr_user_files++;
> + ret = 0;
> + }
> +
> + if (!ret)
> + ret = io_sqe_files_scm(ctx);
> + if (ret)
> + io_sqe_files_unregister(ctx);
> +
> + return ret;
> +}
> +
> static int io_sq_offload_start(struct io_ring_ctx *ctx)
> {
> int ret;
> @@ -1521,14 +1708,16 @@ static void io_ring_ctx_free(struct io_ring_ctx *ctx)
> destroy_workqueue(ctx->sqo_wq);
> if (ctx->sqo_mm)
> mmdrop(ctx->sqo_mm);
> +
> + io_iopoll_reap_events(ctx);
> + io_sqe_buffer_unregister(ctx);
> + io_sqe_files_unregister(ctx);
> +
> #if defined(CONFIG_UNIX)
> if (ctx->ring_sock)
> sock_release(ctx->ring_sock);
> #endif
>
> - io_iopoll_reap_events(ctx);
> - io_sqe_buffer_unregister(ctx);
next prev parent reply other threads:[~2019-02-08 20:26 UTC|newest]
Thread overview: 71+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-02-08 17:34 [PATCHSET v13] io_uring IO interface Jens Axboe
2019-02-08 17:34 ` [PATCH 01/19] fs: add an iopoll method to struct file_operations Jens Axboe
2019-02-09 9:20 ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 02/19] block: wire up block device iopoll method Jens Axboe
2019-02-09 9:22 ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 03/19] block: add bio_set_polled() helper Jens Axboe
2019-02-09 9:24 ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 04/19] iomap: wire up the iopoll method Jens Axboe
2019-02-09 9:25 ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 05/19] Add io_uring IO interface Jens Axboe
2019-02-08 22:12 ` Jann Horn
2019-02-09 4:15 ` Jens Axboe
2019-02-12 21:42 ` Jann Horn
2019-02-12 22:03 ` Jens Axboe
2019-02-12 22:06 ` Jens Axboe
2019-02-12 22:40 ` Jann Horn
2019-02-12 22:45 ` Jens Axboe
2019-02-12 22:52 ` Jens Axboe
2019-02-12 22:57 ` Jann Horn
2019-02-12 23:00 ` Jens Axboe
2019-02-12 23:11 ` Jann Horn
2019-02-12 23:19 ` Jens Axboe
2019-02-12 23:28 ` Jann Horn
2019-02-12 23:46 ` Jens Axboe
2019-02-12 23:53 ` Jens Axboe
2019-02-13 0:07 ` Andy Lutomirski
2019-02-13 0:14 ` Jann Horn
2019-02-13 0:24 ` Jens Axboe
2019-02-09 9:35 ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 06/19] io_uring: add fsync support Jens Axboe
2019-02-08 22:36 ` Jann Horn
2019-02-08 23:31 ` Jens Axboe
2019-02-09 9:37 ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 07/19] io_uring: support for IO polling Jens Axboe
2019-02-09 9:39 ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 08/19] fs: add fget_many() and fput_many() Jens Axboe
2019-02-09 9:41 ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 09/19] io_uring: use fget/fput_many() for file references Jens Axboe
2019-02-09 9:42 ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 10/19] io_uring: batch io_kiocb allocation Jens Axboe
2019-02-09 9:43 ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 11/19] block: implement bio helper to add iter bvec pages to bio Jens Axboe
2019-02-09 9:45 ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 12/19] io_uring: add support for pre-mapped user IO buffers Jens Axboe
2019-02-08 22:54 ` Jann Horn
2019-02-08 23:38 ` Jens Axboe
2019-02-09 16:50 ` Jens Axboe
2019-02-09 9:48 ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 13/19] net: split out functions related to registering inflight socket files Jens Axboe
2019-02-08 19:49 ` David Miller
2019-02-08 19:51 ` Jens Axboe
2019-02-09 9:49 ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 14/19] io_uring: add file set registration Jens Axboe
2019-02-08 20:26 ` Jann Horn [this message]
2019-02-09 0:16 ` Jens Axboe
2019-02-09 9:50 ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 15/19] io_uring: add submission polling Jens Axboe
2019-02-09 9:53 ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 16/19] io_uring: add io_kiocb ref count Jens Axboe
2019-02-08 17:34 ` [PATCH 17/19] io_uring: add support for IORING_OP_POLL Jens Axboe
2019-02-08 17:34 ` [PATCH 18/19] io_uring: allow workqueue item to handle multiple buffered requests Jens Axboe
2019-02-08 17:34 ` [PATCH 19/19] io_uring: add io_uring_event cache hit information Jens Axboe
2019-02-09 21:13 [PATCHSET v14] io_uring IO interface Jens Axboe
2019-02-09 21:13 ` [PATCH 14/19] io_uring: add file set registration Jens Axboe
2019-02-09 23:52 ` Matt Mullins
2019-02-10 0:47 ` Jens Axboe
[not found] ` <60e4c6a489549daad1fb2c5e8eee5496c668d79a.camel@fb.com>
2019-02-10 2:34 ` Jens Axboe
2019-02-10 2:57 ` Jens Axboe
2019-02-10 19:55 ` Matt Mullins
2019-02-11 19:00 [PATCHSET v15] io_uring IO interface Jens Axboe
2019-02-11 19:00 ` [PATCH 14/19] io_uring: add file set registration Jens Axboe
2019-02-19 16:12 ` Jann Horn
2019-02-22 22:29 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAG48ez2Fp10zQ4Y-x4RJh3Cu=vABu9w8ZrAJThvkh6pr74ZiQw@mail.gmail.com' \
--to=jannh@google.com \
--cc=avi@scylladb.com \
--cc=axboe@kernel.dk \
--cc=hch@lst.de \
--cc=jmoyer@redhat.com \
--cc=linux-aio@kvack.org \
--cc=linux-api@vger.kernel.org \
--cc=linux-block@vger.kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).