All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Jann Horn <jannh@google.com>
Cc: linux-aio@kvack.org, linux-block@vger.kernel.org,
	Linux API <linux-api@vger.kernel.org>,
	hch@lst.de, jmoyer@redhat.com, Avi Kivity <avi@scylladb.com>,
	Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH 14/19] io_uring: add file set registration
Date: Fri, 8 Feb 2019 17:16:05 -0700	[thread overview]
Message-ID: <7a003733-a1aa-0f55-324d-4bf0360a5df9@kernel.dk> (raw)
In-Reply-To: <CAG48ez2Fp10zQ4Y-x4RJh3Cu=vABu9w8ZrAJThvkh6pr74ZiQw@mail.gmail.com>

On 2/8/19 1:26 PM, Jann Horn wrote:
> On Fri, Feb 8, 2019 at 6:35 PM Jens Axboe <axboe@kernel.dk> wrote:
>> We normally have to fget/fput for each IO we do on a file. Even with
>> the batching we do, the cost of the atomic inc/dec of the file usage
>> count adds up.
>>
>> This adds IORING_REGISTER_FILES, and IORING_UNREGISTER_FILES opcodes
>> for the io_uring_register(2) system call. The arguments passed in must
>> be an array of __s32 holding file descriptors, and nr_args should hold
>> the number of file descriptors the application wishes to pin for the
>> duration of the io_uring instance (or until IORING_UNREGISTER_FILES is
>> called).
>>
>> When used, the application must set IOSQE_FIXED_FILE in the sqe->flags
>> member. Then, instead of setting sqe->fd to the real fd, it sets sqe->fd
>> to the index in the array passed in to IORING_REGISTER_FILES.
>>
>> Files are automatically unregistered when the io_uring instance is torn
>> down. An application need only unregister if it wishes to register a new
>> set of fds.
> 
> I think the overall concept here is still broken: You're giving the
> user_files to the GC, and I think the GC can drop their refcounts, but
> I don't see you actually getting feedback from the GC anywhere that
> would let the GC break your references? E.g. in io_prep_rw() you grab
> file pointers from ctx->user_files after simply checking
> ctx->nr_user_files, and there is no path from the GC that touches
> those fields. As far as I can tell, the GC is just going to go through
> unix_destruct_scm() and drop references on your files, causing
> use-after-free.
> 
> But the unix GC is complicated, and maybe I'm just missing something...

Only when the skb is released, which is either done when the io_uring
is torn down (and then definitely safe), or if the socket is released,
which is again also at a safe time.

>> +static void __io_sqe_files_unregister(struct io_ring_ctx *ctx)
>> +{
>> +#if defined(CONFIG_UNIX)
>> +       if (ctx->ring_sock) {
>> +               struct sock *sock = ctx->ring_sock->sk;
>> +               struct sk_buff *skb;
>> +
>> +               while ((skb = skb_dequeue(&sock->sk_receive_queue)) != NULL)
>> +                       kfree_skb(skb);
>> +       }
>> +#else
>> +       int i;
>> +
>> +       for (i = 0; i < ctx->nr_user_files; i++)
>> +               fput(ctx->user_files[i]);
>> +#endif
>> +}
>> +
>> +static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
>> +{
>> +       if (!ctx->user_files)
>> +               return -ENXIO;
>> +
>> +       __io_sqe_files_unregister(ctx);
>> +       kfree(ctx->user_files);
>> +       ctx->user_files = NULL;
>> +       return 0;
>> +}
>> +
>> +#if defined(CONFIG_UNIX)
>> +static int __io_sqe_files_scm(struct io_ring_ctx *ctx, int nr, int offset)
>> +{
>> +       struct scm_fp_list *fpl;
>> +       struct sk_buff *skb;
>> +       int i;
>> +
>> +       fpl = kzalloc(sizeof(*fpl), GFP_KERNEL);
>> +       if (!fpl)
>> +               return -ENOMEM;
>> +
>> +       skb = alloc_skb(0, GFP_KERNEL);
>> +       if (!skb) {
>> +               kfree(fpl);
>> +               return -ENOMEM;
>> +       }
>> +
>> +       skb->sk = ctx->ring_sock->sk;
>> +       skb->destructor = unix_destruct_scm;
>> +
>> +       fpl->user = get_uid(ctx->user);
>> +       for (i = 0; i < nr; i++) {
>> +               fpl->fp[i] = get_file(ctx->user_files[i + offset]);
>> +               unix_inflight(fpl->user, fpl->fp[i]);
>> +               fput(fpl->fp[i]);
> 
> This pattern is almost always superfluous. You increment the file's
> refcount, maybe insert the file into a list (essentially), and drop
> the file's refcount back down. You're already holding a stable
> reference, and you're not temporarily lending that to anyone else.

Actually, this is me messing up. The fput() should be done AFTER
adding to the socket. I'll fix that.

-- 
Jens Axboe


WARNING: multiple messages have this Message-ID (diff)
From: Jens Axboe <axboe@kernel.dk>
To: Jann Horn <jannh@google.com>
Cc: linux-aio@kvack.org, linux-block@vger.kernel.org,
	Linux API <linux-api@vger.kernel.org>,
	hch@lst.de, jmoyer@redhat.com, Avi Kivity <avi@scylladb.com>,
	Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH 14/19] io_uring: add file set registration
Date: Fri, 8 Feb 2019 17:16:05 -0700	[thread overview]
Message-ID: <7a003733-a1aa-0f55-324d-4bf0360a5df9@kernel.dk> (raw)
In-Reply-To: <CAG48ez2Fp10zQ4Y-x4RJh3Cu=vABu9w8ZrAJThvkh6pr74ZiQw@mail.gmail.com>

On 2/8/19 1:26 PM, Jann Horn wrote:
> On Fri, Feb 8, 2019 at 6:35 PM Jens Axboe <axboe@kernel.dk> wrote:
>> We normally have to fget/fput for each IO we do on a file. Even with
>> the batching we do, the cost of the atomic inc/dec of the file usage
>> count adds up.
>>
>> This adds IORING_REGISTER_FILES, and IORING_UNREGISTER_FILES opcodes
>> for the io_uring_register(2) system call. The arguments passed in must
>> be an array of __s32 holding file descriptors, and nr_args should hold
>> the number of file descriptors the application wishes to pin for the
>> duration of the io_uring instance (or until IORING_UNREGISTER_FILES is
>> called).
>>
>> When used, the application must set IOSQE_FIXED_FILE in the sqe->flags
>> member. Then, instead of setting sqe->fd to the real fd, it sets sqe->fd
>> to the index in the array passed in to IORING_REGISTER_FILES.
>>
>> Files are automatically unregistered when the io_uring instance is torn
>> down. An application need only unregister if it wishes to register a new
>> set of fds.
> 
> I think the overall concept here is still broken: You're giving the
> user_files to the GC, and I think the GC can drop their refcounts, but
> I don't see you actually getting feedback from the GC anywhere that
> would let the GC break your references? E.g. in io_prep_rw() you grab
> file pointers from ctx->user_files after simply checking
> ctx->nr_user_files, and there is no path from the GC that touches
> those fields. As far as I can tell, the GC is just going to go through
> unix_destruct_scm() and drop references on your files, causing
> use-after-free.
> 
> But the unix GC is complicated, and maybe I'm just missing something...

Only when the skb is released, which is either done when the io_uring
is torn down (and then definitely safe), or if the socket is released,
which is again also at a safe time.

>> +static void __io_sqe_files_unregister(struct io_ring_ctx *ctx)
>> +{
>> +#if defined(CONFIG_UNIX)
>> +       if (ctx->ring_sock) {
>> +               struct sock *sock = ctx->ring_sock->sk;
>> +               struct sk_buff *skb;
>> +
>> +               while ((skb = skb_dequeue(&sock->sk_receive_queue)) != NULL)
>> +                       kfree_skb(skb);
>> +       }
>> +#else
>> +       int i;
>> +
>> +       for (i = 0; i < ctx->nr_user_files; i++)
>> +               fput(ctx->user_files[i]);
>> +#endif
>> +}
>> +
>> +static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
>> +{
>> +       if (!ctx->user_files)
>> +               return -ENXIO;
>> +
>> +       __io_sqe_files_unregister(ctx);
>> +       kfree(ctx->user_files);
>> +       ctx->user_files = NULL;
>> +       return 0;
>> +}
>> +
>> +#if defined(CONFIG_UNIX)
>> +static int __io_sqe_files_scm(struct io_ring_ctx *ctx, int nr, int offset)
>> +{
>> +       struct scm_fp_list *fpl;
>> +       struct sk_buff *skb;
>> +       int i;
>> +
>> +       fpl = kzalloc(sizeof(*fpl), GFP_KERNEL);
>> +       if (!fpl)
>> +               return -ENOMEM;
>> +
>> +       skb = alloc_skb(0, GFP_KERNEL);
>> +       if (!skb) {
>> +               kfree(fpl);
>> +               return -ENOMEM;
>> +       }
>> +
>> +       skb->sk = ctx->ring_sock->sk;
>> +       skb->destructor = unix_destruct_scm;
>> +
>> +       fpl->user = get_uid(ctx->user);
>> +       for (i = 0; i < nr; i++) {
>> +               fpl->fp[i] = get_file(ctx->user_files[i + offset]);
>> +               unix_inflight(fpl->user, fpl->fp[i]);
>> +               fput(fpl->fp[i]);
> 
> This pattern is almost always superfluous. You increment the file's
> refcount, maybe insert the file into a list (essentially), and drop
> the file's refcount back down. You're already holding a stable
> reference, and you're not temporarily lending that to anyone else.

Actually, this is me messing up. The fput() should be done AFTER
adding to the socket. I'll fix that.

-- 
Jens Axboe

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

  reply	other threads:[~2019-02-09  0:16 UTC|newest]

Thread overview: 140+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-08 17:34 [PATCHSET v13] io_uring IO interface Jens Axboe
2019-02-08 17:34 ` Jens Axboe
2019-02-08 17:34 ` [PATCH 01/19] fs: add an iopoll method to struct file_operations Jens Axboe
2019-02-08 17:34   ` Jens Axboe
2019-02-09  9:20   ` Hannes Reinecke
2019-02-09  9:20     ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 02/19] block: wire up block device iopoll method Jens Axboe
2019-02-08 17:34   ` Jens Axboe
2019-02-09  9:22   ` Hannes Reinecke
2019-02-09  9:22     ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 03/19] block: add bio_set_polled() helper Jens Axboe
2019-02-08 17:34   ` Jens Axboe
2019-02-09  9:24   ` Hannes Reinecke
2019-02-09  9:24     ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 04/19] iomap: wire up the iopoll method Jens Axboe
2019-02-08 17:34   ` Jens Axboe
2019-02-09  9:25   ` Hannes Reinecke
2019-02-09  9:25     ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 05/19] Add io_uring IO interface Jens Axboe
2019-02-08 17:34   ` Jens Axboe
2019-02-08 22:12   ` Jann Horn
2019-02-08 22:12     ` Jann Horn
2019-02-09  4:15     ` Jens Axboe
2019-02-09  4:15       ` Jens Axboe
2019-02-12 21:42       ` Jann Horn
2019-02-12 21:42         ` Jann Horn
2019-02-12 22:03         ` Jens Axboe
2019-02-12 22:03           ` Jens Axboe
2019-02-12 22:06           ` Jens Axboe
2019-02-12 22:06             ` Jens Axboe
2019-02-12 22:40             ` Jann Horn
2019-02-12 22:40               ` Jann Horn
2019-02-12 22:45               ` Jens Axboe
2019-02-12 22:45                 ` Jens Axboe
2019-02-12 22:52                 ` Jens Axboe
2019-02-12 22:52                   ` Jens Axboe
2019-02-12 22:57                   ` Jann Horn
2019-02-12 22:57                     ` Jann Horn
2019-02-12 23:00                     ` Jens Axboe
2019-02-12 23:00                       ` Jens Axboe
2019-02-12 23:11                       ` Jann Horn
2019-02-12 23:11                         ` Jann Horn
2019-02-12 23:19                         ` Jens Axboe
2019-02-12 23:19                           ` Jens Axboe
2019-02-12 23:28                           ` Jann Horn
2019-02-12 23:28                             ` Jann Horn
2019-02-12 23:46                             ` Jens Axboe
2019-02-12 23:46                               ` Jens Axboe
2019-02-12 23:53                               ` Jens Axboe
2019-02-12 23:53                                 ` Jens Axboe
2019-02-13  0:07                                 ` Andy Lutomirski
2019-02-13  0:07                                   ` Andy Lutomirski
2019-02-13  0:14                                   ` Jann Horn
2019-02-13  0:14                                     ` Jann Horn
2019-02-13  0:24                                   ` Jens Axboe
2019-02-13  0:24                                     ` Jens Axboe
2019-02-09  9:35   ` Hannes Reinecke
2019-02-09  9:35     ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 06/19] io_uring: add fsync support Jens Axboe
2019-02-08 17:34   ` Jens Axboe
2019-02-08 22:36   ` Jann Horn
2019-02-08 22:36     ` Jann Horn
2019-02-08 23:31     ` Jens Axboe
2019-02-08 23:31       ` Jens Axboe
2019-02-09  9:37   ` Hannes Reinecke
2019-02-09  9:37     ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 07/19] io_uring: support for IO polling Jens Axboe
2019-02-08 17:34   ` Jens Axboe
2019-02-09  9:39   ` Hannes Reinecke
2019-02-09  9:39     ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 08/19] fs: add fget_many() and fput_many() Jens Axboe
2019-02-08 17:34   ` Jens Axboe
2019-02-09  9:41   ` Hannes Reinecke
2019-02-09  9:41     ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 09/19] io_uring: use fget/fput_many() for file references Jens Axboe
2019-02-08 17:34   ` Jens Axboe
2019-02-09  9:42   ` Hannes Reinecke
2019-02-09  9:42     ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 10/19] io_uring: batch io_kiocb allocation Jens Axboe
2019-02-08 17:34   ` Jens Axboe
2019-02-09  9:43   ` Hannes Reinecke
2019-02-09  9:43     ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 11/19] block: implement bio helper to add iter bvec pages to bio Jens Axboe
2019-02-08 17:34   ` Jens Axboe
2019-02-09  9:45   ` Hannes Reinecke
2019-02-09  9:45     ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 12/19] io_uring: add support for pre-mapped user IO buffers Jens Axboe
2019-02-08 17:34   ` Jens Axboe
2019-02-08 22:54   ` Jann Horn
2019-02-08 22:54     ` Jann Horn
2019-02-08 23:38     ` Jens Axboe
2019-02-08 23:38       ` Jens Axboe
2019-02-09 16:50       ` Jens Axboe
2019-02-09 16:50         ` Jens Axboe
2019-02-09  9:48   ` Hannes Reinecke
2019-02-09  9:48     ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 13/19] net: split out functions related to registering inflight socket files Jens Axboe
2019-02-08 17:34   ` Jens Axboe
2019-02-08 19:49   ` David Miller
2019-02-08 19:49     ` David Miller
2019-02-08 19:51     ` Jens Axboe
2019-02-09  9:49   ` Hannes Reinecke
2019-02-09  9:49     ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 14/19] io_uring: add file set registration Jens Axboe
2019-02-08 17:34   ` Jens Axboe
2019-02-08 20:26   ` Jann Horn
2019-02-08 20:26     ` Jann Horn
2019-02-09  0:16     ` Jens Axboe [this message]
2019-02-09  0:16       ` Jens Axboe
2019-02-09  9:50   ` Hannes Reinecke
2019-02-09  9:50     ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 15/19] io_uring: add submission polling Jens Axboe
2019-02-08 17:34   ` Jens Axboe
2019-02-09  9:53   ` Hannes Reinecke
2019-02-09  9:53     ` Hannes Reinecke
2019-02-08 17:34 ` [PATCH 16/19] io_uring: add io_kiocb ref count Jens Axboe
2019-02-08 17:34   ` Jens Axboe
2019-02-08 17:34 ` [PATCH 17/19] io_uring: add support for IORING_OP_POLL Jens Axboe
2019-02-08 17:34   ` Jens Axboe
2019-02-08 17:34 ` [PATCH 18/19] io_uring: allow workqueue item to handle multiple buffered requests Jens Axboe
2019-02-08 17:34   ` Jens Axboe
2019-02-08 17:34 ` [PATCH 19/19] io_uring: add io_uring_event cache hit information Jens Axboe
2019-02-08 17:34   ` Jens Axboe
2019-02-09 21:13 [PATCHSET v14] io_uring IO interface Jens Axboe
2019-02-09 21:13 ` [PATCH 14/19] io_uring: add file set registration Jens Axboe
2019-02-09 21:13   ` Jens Axboe
2019-02-09 23:52   ` Matt Mullins
2019-02-10  0:47     ` Jens Axboe
2019-02-10  0:47       ` Jens Axboe
2019-02-10  1:11       ` Matt Mullins
2019-02-10  2:34         ` Jens Axboe
2019-02-10  2:34           ` Jens Axboe
2019-02-10  2:57           ` Jens Axboe
2019-02-10  2:57             ` Jens Axboe
2019-02-10 19:55             ` Matt Mullins
2019-02-11 19:00 [PATCHSET v15] io_uring IO interface Jens Axboe
2019-02-11 19:00 ` [PATCH 14/19] io_uring: add file set registration Jens Axboe
2019-02-11 19:00   ` Jens Axboe
2019-02-19 16:12   ` Jann Horn
2019-02-19 16:12     ` Jann Horn
2019-02-22 22:29     ` Jens Axboe
2019-02-22 22:29       ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7a003733-a1aa-0f55-324d-4bf0360a5df9@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=avi@scylladb.com \
    --cc=hch@lst.de \
    --cc=jannh@google.com \
    --cc=jmoyer@redhat.com \
    --cc=linux-aio@kvack.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.