io-uring.vger.kernel.org archive mirror
From: Jann Horn <jannh@google.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: io-uring <io-uring@vger.kernel.org>,
	Glauber Costa <glauber@scylladb.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Pavel Begunkov <asml.silence@gmail.com>
Subject: Re: [PATCH 7/9] io_uring: add per-task callback handler
Date: Thu, 20 Feb 2020 23:23:57 +0100
Message-ID: <CAG48ez37KerMukJ6zU=VQPtHsxo29S7TxqcqvU=Bs7Lfxtfdcg@mail.gmail.com>
In-Reply-To: <b78cd45a-9e6f-04ec-d096-d6e1f6cec8bd@kernel.dk>

On Thu, Feb 20, 2020 at 11:14 PM Jens Axboe <axboe@kernel.dk> wrote:
> On 2/20/20 3:02 PM, Jann Horn wrote:
> > On Thu, Feb 20, 2020 at 9:32 PM Jens Axboe <axboe@kernel.dk> wrote:
> >>
> >> For poll requests, it's not uncommon to link a read (or write) after
> >> the poll to execute immediately after the file is marked as ready.
> >> Since the poll completion is called inside the waitqueue wake up handler,
> >> we have to punt that linked request to async context. This slows down
> >> the processing, and actually means it's faster to not use a link for this
> >> use case.
> >>
> >> We also run into problems if the completion_lock is contended, as we're
> >> doing a different lock ordering than the issue side is. Hence we have
> >> to do trylock for completion, and if that fails, go async. Poll removal
> >> needs to go async as well, for the same reason.
> >>
> >> eventfd notification needs special case as well, to avoid stack blowing
> >> recursion or deadlocks.
> >>
> >> These are all deficiencies that were inherited from the aio poll
> >> implementation, but I think we can do better. When a poll completes,
> >> simply queue it up in the task poll list. When the task completes the
> >> list, we can run dependent links inline as well. This means we never
> >> have to go async, and we can remove a bunch of code associated with
> >> that, and optimizations to try and make that run faster. The diffstat
> >> speaks for itself.
> > [...]
> >> -static void io_poll_trigger_evfd(struct io_wq_work **workptr)
> >> +static void io_poll_task_func(struct callback_head *cb)
> >>  {
> >> -       struct io_kiocb *req = container_of(*workptr, struct io_kiocb, work);
> >> +       struct io_kiocb *req = container_of(cb, struct io_kiocb, sched_work);
> >> +       struct io_kiocb *nxt = NULL;
> >>
> > [...]
> >> +       io_poll_task_handler(req, &nxt);
> >> +       if (nxt)
> >> +               __io_queue_sqe(nxt, NULL);
> >
> > This can now get here from anywhere that calls schedule(), right?
> > Which means that this might almost double the required kernel stack
> > size, if one codepath exists that calls schedule() while near the
> > bottom of the stack and another codepath exists that goes from here
> > through the VFS and again uses a big amount of stack space? This is a
> > somewhat ugly suggestion, but I wonder whether it'd make sense to
> > check whether we've consumed over 25% of stack space, or something
> > like that, and if so, directly punt the request.
>
> Right, it'll increase the stack usage. Not against adding some
> safeguard that punts if we're too deep in, though I'd have to look how to
> even do that... Looks like stack_not_used(), though it's not clear to me
> how efficient that is?

No, I don't think you want to do that... at least on x86-64, I think
something vaguely like this should do the job:

/* Assumes a downward-growing stack; 'task' must be the running task. */
unsigned long cur_stack = (unsigned long)__builtin_frame_address(0);
unsigned long begin = (unsigned long)task_stack_page(task);
unsigned long end   = begin + THREAD_SIZE;

/* Bail out if we're off the task stack or more than 25% of it is used. */
if (cur_stack < begin || cur_stack >= end ||
    cur_stack < begin + THREAD_SIZE * 3 / 4)
  [bailout]

But since stacks grow in different directions depending on the
architecture and so on, it might have to be an arch-specific thing...
I'm not sure.
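
To make the arithmetic concrete, here is a rough user-space sketch of the same check that can be compiled and tested outside the kernel. The helper name stack_too_deep() and its grows_down parameter are made up for illustration and are not existing kernel interfaces; in the kernel the direction would presumably come from the architecture (for example via CONFIG_STACK_GROWSUP) rather than from a runtime argument.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): report whether 'cur' lies
 * outside the stack region [begin, begin + size) or whether more than
 * 25% of that region is already in use.
 *
 * grows_down == true  models a stack that starts at the high end and
 *                     grows toward 'begin' (as on x86-64).
 * grows_down == false models a stack that starts at 'begin' and grows
 *                     toward higher addresses.
 */
static bool stack_too_deep(uintptr_t cur, uintptr_t begin, uintptr_t size,
                           bool grows_down)
{
    uintptr_t end = begin + size;
    uintptr_t used;

    /* An address outside the region is treated as "bail out". */
    if (cur < begin || cur >= end)
        return true;

    /* Bytes consumed so far, measured from the end the stack starts at. */
    used = grows_down ? end - cur : cur - begin;

    return used > size / 4;
}
```

For the downward-growing case, used > size / 4 is the same condition as cur_stack < begin + THREAD_SIZE*3/4 above, just written in terms of bytes consumed so that the upward-growing case reads symmetrically.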

> > Also, can we recursively hit this point? Even if __io_queue_sqe()
> > doesn't *want* to block, the code it calls into might still block on a
> > mutex or something like that, at which point the mutex code would call
> > into schedule(), which would then again hit sched_out_update() and get
> > here, right? As far as I can tell, this could cause unbounded
> > recursion.
>
> The sched_work items are pruned before being run, so that can't happen.

And is it impossible for new ones to be added in the meantime if a
second poll operation completes in the background just when we're
entering __io_queue_sqe()?
