From: Jens Axboe <axboe@kernel.dk>
To: Pavel Begunkov <asml.silence@gmail.com>,
	io-uring <io-uring@vger.kernel.org>
Cc: Jann Horn <jannh@google.com>
Subject: Re: [PATCH for-next] io_uring: ensure IOSQE_ASYNC file table grabbing works, with SQPOLL
Date: Thu, 10 Sep 2020 17:04:55 -0600
Message-ID: <9661c330-62eb-4619-cece-2eddf8cc5d6d@kernel.dk>
In-Reply-To: <eefc2ece-0beb-c27a-2785-19cf1d6aab92@kernel.dk>

On 9/10/20 4:11 PM, Jens Axboe wrote:
> On 9/10/20 3:01 PM, Jens Axboe wrote:
>> On 9/10/20 12:18 PM, Jens Axboe wrote:
>>> On 9/10/20 7:11 AM, Jens Axboe wrote:
>>>> On 9/10/20 6:37 AM, Pavel Begunkov wrote:
>>>>> On 09/09/2020 19:07, Jens Axboe wrote:
>>>>>> On 9/9/20 9:48 AM, Pavel Begunkov wrote:
>>>>>>> On 09/09/2020 16:10, Jens Axboe wrote:
>>>>>>>> On 9/9/20 1:09 AM, Pavel Begunkov wrote:
>>>>>>>>> On 09/09/2020 01:54, Jens Axboe wrote:
>>>>>>>>>> On 9/8/20 3:22 PM, Jens Axboe wrote:
>>>>>>>>>>> On 9/8/20 2:58 PM, Pavel Begunkov wrote:
>>>>>>>>>>>> On 08/09/2020 20:48, Jens Axboe wrote:
>>>>>>>>>>>>> Fd instantiating commands like IORING_OP_ACCEPT now work with SQPOLL, but
>>>>>>>>>>>>> we have an error in grabbing that if IOSQE_ASYNC is set. Ensure we assign
>>>>>>>>>>>>> the ring fd/file appropriately so we can defer grabbing them.
>>>>>>>>>>>>
>>>>>>>>>>>> IIRC, for fcheck() in io_grab_files() to work it should be under fdget(),
>>>>>>>>>>>> that isn't the case with SQPOLL threads. Am I mistaken?
>>>>>>>>>>>>
>>>>>>>>>>>> And it looks strange that the following snippet will effectively disable
>>>>>>>>>>>> such requests.
>>>>>>>>>>>>
>>>>>>>>>>>> fd = dup(ring_fd)
>>>>>>>>>>>> close(ring_fd)
>>>>>>>>>>>> ring_fd = fd
>>>>>>>>>>>
>>>>>>>>>>> Not disagreeing with that, I think my initial posting made it clear
>>>>>>>>>>> it was a hack. Just piled it in there for easier testing in terms
>>>>>>>>>>> of functionality.
>>>>>>>>>>>
>>>>>>>>>>> But the next question is how to do this right...
>>>>>>>>>>
>>>>>>>>>> Looking at this a bit more, and I don't necessarily think there's a
>>>>>>>>>> better option. If you dup+close, then it just won't work. We have no
>>>>>>>>>> way of knowing if the 'fd' changed, but we can detect if it was closed
>>>>>>>>>> and then we'll end up just EBADF'ing the requests.
>>>>>>>>>>
>>>>>>>>>> So right now the answer is that we can support this just fine with
>>>>>>>>>> SQPOLL, but you better not dup and close the original fd. Which is not
>>>>>>>>>> ideal, but better than NOT being able to support it.
>>>>>>>>>>
>>>>>>>>>> Only other option I see is to provide an io_uring_register()
>>>>>>>>>> command to update the fd/file associated with it. Which may be useful,
>>>>>>>>>> it allows a process to indeed do this, if it absolutely has to.
>>>>>>>>>
>>>>>>>>> Let's put aside such dirty hacks, at least until someone actually
>>>>>>>>> needs it. Ideally, for many reasons I'd prefer to get rid of
>>>>>>>>
>>>>>>>> But it is actually needed, otherwise we're even more in a limbo state of
>>>>>>>> "SQPOLL works for most things now, just not all". And this isn't that
>>>>>>>> hard to make right - on the flush() side, we just need to park/stall the
>>>>>>>
>>>>>>> I understand that it isn't hard, but I just don't want to expose it to
>>>>>>> the userspace, a) because it's a userspace API, so it probably couldn't be
>>>>>>> killed in the future, b) it works around the kernel's problems, and so
>>>>>>> shouldn't really be exposed to the userspace in normal circumstances.
>>>>>>>
>>>>>>> And it's not generic enough because of a possible "many fds -> single
>>>>>>> file" mapping, and there will be a lot of questions and problems.
>>>>>>>
>>>>>>> e.g. if a process shares an io_uring with another process, then
>>>>>>> dup()+close() would require not only this hook but also additional
>>>>>>> inter-process synchronisation. And so on.
>>>>>>
>>>>>> I think you're blowing this out of proportion. Just to restate the
>>>>>
>>>>> I just think that if there is a potentially cleaner solution without
>>>>> involving userspace, we should try to look for it first, even if it
>>>>> would take more time. That was the point.
>>>>
>>>> Regardless of whether or not we can eliminate that need, at least it'll
>>>> be a relaxing of the restriction, not an increase of it. It'll never
>>>> hurt to do an extra system call for the case where you're swapping fds.
>>>> I do get your point, I just don't think it's a big deal.
>>>
>>> BTW, I don't see how we can ever get rid of a need to enter the kernel,
>>> we'd need some chance at grabbing the updated ->files, for instance.
>>> Might be possible to hold a reference to the task and grab it from
>>> there, though feels a bit iffy to hold a task reference from the ring on
>>> the task that holds a reference to the ring. Haven't looked too close,
>>> should work though as this won't hold a file/files reference, it's just
>>> a freeing reference.
>>
>> Sort of half assed attempt...
>>
>> Idea is to assign a ->files sequence before we grab files, and then
>> compare with the current one once we need to use the files. If they
>> mismatch, we -ECANCELED the request.
>>
>> For SQPOLL, don't grab ->files upfront, grab a reference to the task
>> instead. Use the task reference to assign files when we need it.
>>
>> Adding Jann to help poke holes in this scheme. I'd be surprised if it's
>> solid as-is, but hopefully we can build on this idea and get rid of the
>> fcheck().
> 
> Split it into two, to make it easier to reason about. Added a few
> comments, etc.
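
For readers following along, the sequence idea quoted above (snapshot a
->files sequence before grabbing files, compare it when the files are
actually used, and -ECANCELED on mismatch) can be modelled in plain
userspace C. This is only a rough sketch, not the code from the patches:
struct model_files, struct model_req and every function name below are
hypothetical, and where the real counter lives and what bumps it are
assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a task's file table (not the kernel's files_struct). */
struct model_files {
	uint64_t sequence;	/* assumed to be bumped whenever the table changes */
};

/* Hypothetical stand-in for a request that will need the file table later. */
struct model_req {
	uint64_t files_seq;	/* sequence value snapshotted at prep time */
};

/* Record the current sequence before the files are grabbed. */
static void model_req_prep(struct model_req *req, const struct model_files *files)
{
	assert(req && files);
	req->files_seq = files->sequence;
}

/* Assumption: any event that invalidates the snapshot bumps the sequence. */
static void model_files_changed(struct model_files *files)
{
	assert(files);
	files->sequence++;
}

/* Compare at use time: 0 if the snapshot is still current, else -ECANCELED. */
static int model_req_use_files(const struct model_req *req,
			       const struct model_files *files)
{
	assert(req && files);
	if (req->files_seq != files->sequence)
		return -ECANCELED;
	return 0;
}
```

A request prepared and used with no intervening model_files_changed()
call returns 0; once the sequence has been bumped, the same request
returns -ECANCELED instead of touching stale files.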

Pushed to a temp branch, made a few more edits (forgot to wire up open).

https://git.kernel.dk/cgit/linux-block/log/?h=io_uring-files_struct
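
The dup()+close() case raised earlier in the thread can be reproduced
with ordinary file descriptors. The sketch below only uses POSIX calls
(dup, close, fcntl); swap_fd() and fd_is_open() are hypothetical helper
names, and any open fd can stand in for the ring fd. It shows why a
stored fd number goes stale: after the swap the old number is closed,
while the new number refers to the same open file.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* True if 'fd' currently names an open file descriptor in this process. */
static bool fd_is_open(int fd)
{
	return fcntl(fd, F_GETFD) != -1;
}

/*
 * The swap from the thread: fd = dup(ring_fd); close(ring_fd); ring_fd = fd.
 * Returns the new descriptor, or -1 if dup() failed (old_fd is left open).
 */
static int swap_fd(int old_fd)
{
	int new_fd;

	assert(old_fd >= 0);
	new_fd = dup(old_fd);	/* new number, same underlying open file */
	if (new_fd < 0)
		return -1;
	close(old_fd);		/* the old number is now invalid */
	return new_fd;
}
```

For example, with fd = open("/dev/null", O_RDONLY), calling
nfd = swap_fd(fd) leaves fd_is_open(fd) false and fd_is_open(nfd) true,
so anything that remembered only the old number would now see EBADF.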

-- 
Jens Axboe

