linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jann Horn <jannh@google.com>,
	linux-aio@kvack.org, linux-block@vger.kernel.org,
	Linux API <linux-api@vger.kernel.org>,
	hch@lst.de, jmoyer@redhat.com, avi@scylladb.com,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 13/18] io_uring: add file set registration
Date: Thu, 7 Feb 2019 11:45:40 -0700	[thread overview]
Message-ID: <73e23146-2138-5a46-46ed-9c7f1f912a04@kernel.dk> (raw)
In-Reply-To: <20190207040058.GW2217@ZenIV.linux.org.uk>

On 2/6/19 9:00 PM, Al Viro wrote:
> On Wed, Feb 06, 2019 at 06:41:00AM -0700, Jens Axboe wrote:
>> On 2/5/19 5:56 PM, Al Viro wrote:
>>> On Tue, Feb 05, 2019 at 12:08:25PM -0700, Jens Axboe wrote:
>>>> Proof is in the pudding, here's the main commit introducing io_uring
>>>> and now wiring it up to the AF_UNIX garbage collection:
>>>>
>>>> http://git.kernel.dk/cgit/linux-block/commit/?h=io_uring&id=158e6f42b67d0abe9ee84886b96ca8c4b3d3dfd5
>>>>
>>>> How does that look?
>>>
>>> In a word - wrong.  Some theory: garbage collector assumes that there is
>>> a subset of file references such that
>>> 	* for all files with such references there's an associated unix_sock.
>>> 	* all such references are stored in SCM_RIGHTS datagrams that can be
>>> found by the garbage collector (currently: for data-bearing AF_UNIX sockets -
>>> queued SCM_RIGHTS datagrams, for listeners - SCM_RIGHTS datagrams sent via
>>> yet-to-be-accepted connections).
>>> 	* there is an efficient way to count those references for given file
>>> (->inflight of the corresponding unix_sock).
>>> 	* removal of those references would render the graph acyclic.
>>> 	* file can _NOT_ be subject to syscalls unless there are references
>>> to it outside of that subset.
>>
>> IOW, we cannot use fget() for registering files, and we still need fget/fput
>> in the fast path to retain safe use of the file. If I'm understanding you
>> correctly?
> 
> No.  *ALL* references (inflight and not) are the same for file->f_count.
> unix_inflight() does not grab a new reference to file; it only says that
> reference passed to it by the caller is now an in-flight one.
> 
> OK, braindump time:

[snip]

This is great info, and I think it belongs in Documentation/ somewhere.
Not sure I've ever seen such a good and detailed dump of this before.

> What you are about to add is *ANOTHER* kind of loops - references
> to files in the "registered" set are pinned down by owning io_uring.
> 
> That would invalidate just about every assumption made the garbage
> collector - even if you forbid to register io_uring itself, you
> still can register both ends of AF_UNIX socket pair, then pass
> io_uring in SCM_RIGHTS over that, then close all descriptors involved.
> From the garbage collector point of view all sockets have external
> references, so there's nothing to collect.  In fact those external
> references are only reachable if you have a reachable reference
> to io_uring, so we get a leak.
> 
> To make it work:
> 	* have unix_sock created for each io_uring (as your code does)
> 	* do *NOT* have unix_inflight() done at that point - it's
> completely wrong there.
> 	* file set registration becomes
> 		* create and populate SCM_RIGHTS, with the same
> fget()+grab an extra reference + unix_inflight() sequence.
> Don't forget to have skb->destructor set to unix_destruct_scm
> or equivalent thereof.
> 		* remember UNIXCB(skb).fp - that'll give you your
> array of struct file *, to use in lookups.
> 		* queue it into your unix_sock
> 		* do _one_ fput() for everything you've grabbed,
> dropping one of two references you've taken.
> 	* unregistering is simply skb_dequeue() + kfree_skb().
> 	* in ->release() you do sock_release(); it'll do
> everything you need (including unregistering the set, etc.)

This is genius! I implemented this and it works. I've verified that the
previous test app failed to release due to the loop, and with this in
place, once the GC kicks in, the io_uring is released appropriately.

> The hairiest part is the file set registration, of course -
> there's almost certainly a helper or two buried in that thing;
> simply exposing all the required net/unix/af_unix.c bits is
> ucking fugly.

Outside of the modification to unix_get_socket(), the only change I had
to make was to ensure that unix_destruct_scm() is available to io_uring.
No other changes needed.

> I'm not sure what you propose for non-registered descriptors -
> _any_ struct file reference that outlives the return from syscall
> stored in some io_uring-attached data structure is has exact same
> loop (and leak) problem.  And if you mean to have it dropped before
> return from syscall, I'm afraid I don't follow you.  How would
> that be done?
> 
> Again, "io_uring descriptor can't be used in those requests" does
> not help at all - use a socket instead, pass the io_uring fd over
> it in SCM_RIGHTS and you are back to square 1.

I wasn't proposing to fput() before return, otherwise I can't hang on to
that file *.

Right now for async punt, we don't release the reference, and then we
fput() when IO completes. According to what you're saying here, that's
not good enough. Correct me if I'm wrong, but what if we:

1) For non-sock/io_uring fds, the current approach is sufficient
2) Disallow io_uring fd, doesn't make sense anyway

That leaves the socket fd, which is problematic. Should be solvable by
allocating an skb and marking that file inflight?

-- 
Jens Axboe


  parent reply	other threads:[~2019-02-07 18:45 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20190129192702.3605-1-axboe@kernel.dk>
     [not found] ` <20190129192702.3605-14-axboe@kernel.dk>
2019-01-30  1:29   ` [PATCH 13/18] io_uring: add file set registration Jann Horn
2019-01-30 15:35     ` Jens Axboe
2019-02-04  2:56     ` Al Viro
2019-02-05  2:19       ` Jens Axboe
2019-02-05 17:57         ` Jens Axboe
2019-02-05 19:08           ` Jens Axboe
2019-02-06  0:27             ` Jens Axboe
2019-02-06  1:01               ` Al Viro
2019-02-06 17:56                 ` Jens Axboe
2019-02-07  4:05                   ` Al Viro
2019-02-07 16:14                     ` Jens Axboe
2019-02-07 16:30                       ` Al Viro
2019-02-07 16:35                         ` Jens Axboe
2019-02-07 16:51                         ` Al Viro
2019-02-06  0:56             ` Al Viro
2019-02-06 13:41               ` Jens Axboe
2019-02-07  4:00                 ` Al Viro
2019-02-07  9:22                   ` Miklos Szeredi
2019-02-07 13:31                     ` Al Viro
2019-02-07 14:20                       ` Miklos Szeredi
2019-02-07 15:20                         ` Al Viro
2019-02-07 15:27                           ` Miklos Szeredi
2019-02-07 16:26                             ` Al Viro
2019-02-07 19:08                               ` Miklos Szeredi
2019-02-07 18:45                   ` Jens Axboe [this message]
2019-02-07 18:58                     ` Jens Axboe
2019-02-11 15:55                     ` Jonathan Corbet
2019-02-11 17:35                       ` Al Viro
2019-02-11 20:33                         ` Jonathan Corbet
2019-01-23 15:35 [PATCHSET v7] io_uring IO interface Jens Axboe
2019-01-23 15:35 ` [PATCH 13/18] io_uring: add file set registration Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=73e23146-2138-5a46-46ed-9c7f1f912a04@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=avi@scylladb.com \
    --cc=hch@lst.de \
    --cc=jannh@google.com \
    --cc=jmoyer@redhat.com \
    --cc=linux-aio@kvack.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).