All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pavel Begunkov <asml.silence@gmail.com>
To: Hao Xu <haoxu@linux.alibaba.com>, Jens Axboe <axboe@kernel.dk>
Cc: io-uring@vger.kernel.org, Joseph Qi <joseph.qi@linux.alibaba.com>
Subject: Re: [PATCH 3/8] io_uring: add a limited tw list for irq completion work
Date: Thu, 30 Sep 2021 10:02:07 +0100	[thread overview]
Message-ID: <b61b4f03-55af-b8bc-f177-9807803e8b4e@gmail.com> (raw)
In-Reply-To: <5c1ffe9e-23e7-3b50-b48c-6a87c2c7d1ba@linux.alibaba.com>

On 9/29/21 12:38 PM, Hao Xu wrote:
> 在 2021/9/28 下午7:29, Pavel Begunkov 写道:
[...]
>>>   @@ -2132,12 +2136,16 @@ static void tctx_task_work(struct callback_head *cb)
>>>       while (1) {
>>>           struct io_wq_work_node *node;
>>>   -        if (!tctx->task_list.first && locked)
>>> +        if (!tctx->prior_task_list.first &&
>>> +            !tctx->task_list.first && locked)
>>>               io_submit_flush_completions(ctx);
>>>             spin_lock_irq(&tctx->task_lock);
>>> -        node = tctx->task_list.first;
>>> +        wq_list_merge(&tctx->prior_task_list, &tctx->task_list);
>>> +        node = tctx->prior_task_list.first;
>>
>> I find all this accounting expensive, sure I'll see it for my BPF tests.
> May I ask how do you evaluate the overhead with BPF here?

It's a custom branch and apparently would need some thinking on how
to apply your stuff on top, because of yet another list in [1]. In
short, the case in mind spins inside of tctx_task_work() doing one
request at a time.
I think would be easier if I try it out myself.

[1] https://github.com/isilence/linux/commit/d6285a9817eb26aa52ad54a79584512d7efa82fd

>>
>> How about
>> 1) remove MAX_EMERGENCY_TW_RATIO and all the counters,
>> prior_nr and others.
>>
>> 2) rely solely on list merging
>>
>> So, when it enters an iteration of the loop it finds a set of requests
>> to run, it first executes all priority ones of that set and then the
>> rest (just by the fact that you merged the lists and execute all from
>> them).
>>
>> It solves the problem of total starvation of non-prio requests, e.g.
>> if new completions coming as fast as you complete previous ones. One
>> downside is that prio requests coming while we execute a previous
>> batch will be executed only after a previous batch of non-prio
>> requests, I don't think it's much of a problem but interesting to
>> see numbers.
> hmm, this probably doesn't solve the starvation, since there may be
> a number of priority TWs ahead of non-prio TWs in one iteration, in the
> case of submitting many sqes in one io_submit_sqes. That's why I keep
> just 1/3 priority TWs there.

I don't think it's a problem, they should be fast enough and we have
a forward progress guarantees for non-prio. IMHO that should be enough.


>>
>>
>>>           INIT_WQ_LIST(&tctx->task_list);
>>> +        INIT_WQ_LIST(&tctx->prior_task_list);
>>> +        tctx->nr = tctx->prior_nr = 0;
>>>           if (!node)
>>>               tctx->task_running = false;
>>>           spin_unlock_irq(&tctx->task_lock);
>>> @@ -2166,7 +2174,7 @@ static void tctx_task_work(struct callback_head *cb)
>>>       ctx_flush_and_put(ctx, &locked);
>>>   }
>>>   -static void io_req_task_work_add(struct io_kiocb *req)
>>> +static void io_req_task_work_add(struct io_kiocb *req, bool emergency)
>>
>> It think "priority" instead of "emergency" will be more accurate
>>
>>>   {
>>>       struct task_struct *tsk = req->task;
>>>       struct io_uring_task *tctx = tsk->io_uring;
>>> @@ -2178,7 +2186,13 @@ static void io_req_task_work_add(struct io_kiocb *req)
>>>       WARN_ON_ONCE(!tctx);
>>>         spin_lock_irqsave(&tctx->task_lock, flags);
>>> -    wq_list_add_tail(&req->io_task_work.node, &tctx->task_list);
>>> +    if (emergency && tctx->prior_nr * MAX_EMERGENCY_TW_RATIO < tctx->nr) {
>>> +        wq_list_add_tail(&req->io_task_work.node, &tctx->prior_task_list);
>>> +        tctx->prior_nr++;
>>> +    } else {
>>> +        wq_list_add_tail(&req->io_task_work.node, &tctx->task_list);
>>> +    }
>>> +    tctx->nr++;
>>>       running = tctx->task_running;
>>>       if (!running)
>>>           tctx->task_running = true;
>>
>>
>>
> 

-- 
Pavel Begunkov

  reply	other threads:[~2021-09-30  9:02 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-27  6:17 [PATCH 0/6] task_work optimization Hao Xu
2021-09-27  6:17 ` [PATCH 1/8] io-wq: code clean for io_wq_add_work_after() Hao Xu
2021-09-28 11:08   ` Pavel Begunkov
2021-09-29  7:36     ` Hao Xu
2021-09-29 11:23       ` Pavel Begunkov
2021-09-27  6:17 ` [PATCH 2/8] io-wq: add helper to merge two wq_lists Hao Xu
2021-09-27 10:17   ` Hao Xu
2021-09-28 11:10   ` Pavel Begunkov
2021-09-28 16:48     ` Hao Xu
2021-09-29 11:23       ` Pavel Begunkov
2021-09-27  6:17 ` [PATCH 3/8] io_uring: add a limited tw list for irq completion work Hao Xu
2021-09-28 11:29   ` Pavel Begunkov
2021-09-28 16:55     ` Hao Xu
2021-09-29 11:25       ` Pavel Begunkov
2021-09-29 11:38     ` Hao Xu
2021-09-30  9:02       ` Pavel Begunkov [this message]
2021-09-30  3:21     ` Hao Xu
2021-09-27  6:17 ` [PATCH 4/8] io_uring: add helper for task work execution code Hao Xu
2021-09-27  6:17 ` [PATCH 5/8] io_uring: split io_req_complete_post() and add a helper Hao Xu
2021-09-27  6:17 ` [PATCH 6/8] io_uring: move up io_put_kbuf() and io_put_rw_kbuf() Hao Xu
2021-09-27  6:17 ` [PATCH 7/8] io_uring: add tw_ctx for io_uring_task Hao Xu
2021-09-27  6:17 ` [PATCH 8/8] io_uring: batch completion in prior_task_list Hao Xu
2021-09-27  6:21 ` [PATCH 0/6] task_work optimization Hao Xu
2021-09-27 10:51 [PATCH v2 0/8] " Hao Xu
2021-09-27 10:51 ` [PATCH 3/8] io_uring: add a limited tw list for irq completion work Hao Xu
2021-09-29 12:31   ` Hao Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b61b4f03-55af-b8bc-f177-9807803e8b4e@gmail.com \
    --to=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=haoxu@linux.alibaba.com \
    --cc=io-uring@vger.kernel.org \
    --cc=joseph.qi@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.