Subject: Re: [PATCH 2/2] io_uring: add irq completion work to the head of task_list
From: Hao Xu <haoxu@linux.alibaba.com>
To: Pavel Begunkov, Jens Axboe
Cc: io-uring@vger.kernel.org, Joseph Qi
Date: Wed, 25 Aug 2021 11:19:33 +0800
In-Reply-To: <50876fd1-9e8a-baf4-e76e-7232eaae45d9@gmail.com>
References: <20210823183648.163361-1-haoxu@linux.alibaba.com>
 <20210823183648.163361-3-haoxu@linux.alibaba.com>
 <50876fd1-9e8a-baf4-e76e-7232eaae45d9@gmail.com>

On 2021/8/24 8:57 PM, Pavel Begunkov wrote:
> On 8/23/21 7:36 PM, Hao Xu wrote:
>> Now we have a lot of task_work users, some of which just complete a
>> req and generate a cqe. Let's put such work at the head of the
>> task_list so that it is handled quickly, reducing average request
>> latency. An explanatory case:
>>
>> original timeline:
>>     submit_sqe-->irq-->add completion task_work
>>     -->run heavy work0~n-->run completion task_work
>> new timeline:
>>     submit_sqe-->irq-->add completion task_work
>>     -->run completion task_work-->run heavy work0~n
>
> Might be good. There are not so many hot tw users:
> poll, queuing linked requests, and the new IRQ. Could be
> BPF in the future.

Async buffered reads as well, since buffered reads are a hot operation.

> So, for the test case I'd think about some heavy-ish
> submissions linked to your IRQ req. For instance,
> keeping a large QD of
>
>     read(IRQ-based) -> linked read_pipe(PAGE_SIZE);
>
> and running it for a while, so they get completely
> out of sync and the tw works really mix up. It reads
> from pipes of size <= PAGE_SIZE, so it completes inline,
> but the copy takes enough time.

Thanks Pavel. Previously I tried direct read --> buffered read (async
buffered read) and didn't see much difference; I'll try the case you
suggest above.
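A rough liburing sketch of that workload, for reference (the device
path, queue depth, and the single round of submissions are illustrative
assumptions on my part, not from the patch or from Pavel's mail):

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <liburing.h>

#define QD	8
#define PAGE_SZ	4096

int main(void)
{
	/* O_DIRECT requires an aligned buffer */
	static char dbuf[QD][PAGE_SZ] __attribute__((aligned(4096)));
	static char pbuf[QD][PAGE_SZ];
	struct io_uring ring;
	int pipefd[2], fd, i;

	/* O_DIRECT block-device reads complete from the IRQ path */
	fd = open("/dev/sda", O_RDONLY | O_DIRECT);
	if (fd < 0 || pipe(pipefd) < 0 ||
	    io_uring_queue_init(2 * QD, &ring, 0) < 0)
		return 1;

	/* pre-fill the pipe so each linked read finds data and its
	 * copy completes inline, as described above */
	for (i = 0; i < QD; i++)
		write(pipefd[1], pbuf[i], PAGE_SZ);

	for (i = 0; i < QD; i++) {
		struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

		io_uring_prep_read(sqe, fd, dbuf[i], PAGE_SZ, i * PAGE_SZ);
		sqe->flags |= IOSQE_IO_LINK;	/* pipe read runs after it */

		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_read(sqe, pipefd[0], pbuf[i], PAGE_SZ, 0);
	}
	io_uring_submit(&ring);

	for (i = 0; i < 2 * QD; i++) {
		struct io_uring_cqe *cqe;

		if (io_uring_wait_cqe(&ring, &cqe) < 0)
			break;
		io_uring_cqe_seen(&ring, cqe);
	}
	io_uring_queue_exit(&ring);
	return 0;
}

A real benchmark would run this in a loop, refilling the pipe as
completions are reaped, so the two kinds of task_work really interleave.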
>
> One thing is that Jens specifically wanted tw's to
> be in FIFO order, where IRQ based will be in LIFO.
> I don't think it's a real problem though, the
> completion handler should be brief enough.

In my latest code, the IRQ-based tw entries are FIFO among themselves
as well; the ordering is only LIFO between the IRQ-based tw and the
other tw (a small user-space model of this patch's ordering follows the
diff below):

    timeline:  tw1 tw2 irq1 irq2
    task_list: irq1 irq2 tw1 tw2

>
>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>> ---
>>  fs/io-wq.h    |  9 +++++++++
>>  fs/io_uring.c | 21 ++++++++++++---------
>>  2 files changed, 21 insertions(+), 9 deletions(-)
>>
>> diff --git a/fs/io-wq.h b/fs/io-wq.h
>> index 308af3928424..51b4408fd177 100644
>> --- a/fs/io-wq.h
>> +++ b/fs/io-wq.h
>> @@ -41,6 +41,15 @@ static inline void wq_list_add_after(struct io_wq_work_node *node,
>>  	list->last = node;
>>  }
>>  
>> +static inline void wq_list_add_head(struct io_wq_work_node *node,
>> +				    struct io_wq_work_list *list)
>> +{
>> +	node->next = list->first;
>> +	list->first = node;
>> +	if (!node->next)
>> +		list->last = node;
>> +}
>> +
>>  static inline void wq_list_add_tail(struct io_wq_work_node *node,
>>  				    struct io_wq_work_list *list)
>>  {
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 8172f5f893ad..954cd8583945 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -2050,7 +2050,7 @@ static void tctx_task_work(struct callback_head *cb)
>>  	ctx_flush_and_put(ctx);
>>  }
>>  
>> -static void io_req_task_work_add(struct io_kiocb *req)
>> +static void io_req_task_work_add(struct io_kiocb *req, bool emergency)
>>  {
>>  	struct task_struct *tsk = req->task;
>>  	struct io_uring_task *tctx = tsk->io_uring;
>> @@ -2062,7 +2062,10 @@ static void io_req_task_work_add(struct io_kiocb *req)
>>  	WARN_ON_ONCE(!tctx);
>>  
>>  	spin_lock_irqsave(&tctx->task_lock, flags);
>> -	wq_list_add_tail(&req->io_task_work.node, &tctx->task_list);
>> +	if (emergency)
>> +		wq_list_add_head(&req->io_task_work.node, &tctx->task_list);
>> +	else
>> +		wq_list_add_tail(&req->io_task_work.node, &tctx->task_list);
>>  	running = tctx->task_running;
>>  	if (!running)
>>  		tctx->task_running = true;
>> @@ -2122,19 +2125,19 @@ static void io_req_task_queue_fail(struct io_kiocb *req, int ret)
>>  {
>>  	req->result = ret;
>>  	req->io_task_work.func = io_req_task_cancel;
>> -	io_req_task_work_add(req);
>> +	io_req_task_work_add(req, true);
>>  }
>>  
>>  static void io_req_task_queue(struct io_kiocb *req)
>>  {
>>  	req->io_task_work.func = io_req_task_submit;
>> -	io_req_task_work_add(req);
>> +	io_req_task_work_add(req, false);
>>  }
>>  
>>  static void io_req_task_queue_reissue(struct io_kiocb *req)
>>  {
>>  	req->io_task_work.func = io_queue_async_work;
>> -	io_req_task_work_add(req);
>> +	io_req_task_work_add(req, false);
>>  }
>>  
>>  static inline void io_queue_next(struct io_kiocb *req)
>> @@ -2249,7 +2252,7 @@ static inline void io_put_req_deferred(struct io_kiocb *req)
>>  {
>>  	if (req_ref_put_and_test(req)) {
>>  		req->io_task_work.func = io_free_req;
>> -		io_req_task_work_add(req);
>> +		io_req_task_work_add(req, false);
>>  	}
>>  }
>>  
>> @@ -2564,7 +2567,7 @@ static void io_complete_rw(struct kiocb *kiocb, long res, long res2)
>>  		return;
>>  	req->result = res;
>>  	req->io_task_work.func = io_req_task_complete;
>> -	io_req_task_work_add(req);
>> +	io_req_task_work_add(req, true);
>>  }
>>  
>>  static void io_complete_rw_iopoll(struct kiocb *kiocb, long res, long res2)
>> @@ -4881,7 +4884,7 @@ static int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *poll,
>>  	 * of executing it. We can't safely execute it anyway, as we may not
>>  	 * have the needed state needed for it anyway.
>>  	 */
>> -	io_req_task_work_add(req);
>> +	io_req_task_work_add(req, false);
>>  	return 1;
>>  }
>>  
>> @@ -6430,7 +6433,7 @@ static enum hrtimer_restart io_link_timeout_fn(struct hrtimer *timer)
>>  	spin_unlock_irqrestore(&ctx->timeout_lock, flags);
>>  
>>  	req->io_task_work.func = io_req_task_link_timeout;
>> -	io_req_task_work_add(req);
>> +	io_req_task_work_add(req, false);
>>  	return HRTIMER_NORESTART;
>>  }
>>  
>>
>
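To make the ordering concrete, here is the small user-space model
mentioned above (the list types and wq_list_add_head() are taken from
the patch; wq_list_add_tail() is a simplified equivalent, and the demo
harness is mine):

#include <stdio.h>
#include <stddef.h>

struct io_wq_work_node {
	struct io_wq_work_node *next;
};

struct io_wq_work_list {
	struct io_wq_work_node *first;
	struct io_wq_work_node *last;
};

/* simplified version of the kernel helper: append at the tail */
static void wq_list_add_tail(struct io_wq_work_node *node,
			     struct io_wq_work_list *list)
{
	node->next = NULL;
	if (!list->first)
		list->first = node;
	else
		list->last->next = node;
	list->last = node;
}

/* same logic as the patch: push at the head */
static void wq_list_add_head(struct io_wq_work_node *node,
			     struct io_wq_work_list *list)
{
	node->next = list->first;
	list->first = node;
	if (!node->next)
		list->last = node;
}

struct demo {
	struct io_wq_work_node node;	/* must be first for the cast */
	const char *name;
};

int main(void)
{
	struct io_wq_work_list list = { NULL, NULL };
	struct demo tw1 = { .name = "tw1" }, tw2 = { .name = "tw2" };
	struct demo irq1 = { .name = "irq1" }, irq2 = { .name = "irq2" };
	struct io_wq_work_node *n;

	/* ordinary task_work is appended at the tail... */
	wq_list_add_tail(&tw1.node, &list);
	wq_list_add_tail(&tw2.node, &list);
	/* ...IRQ completion work (emergency == true) jumps the queue */
	wq_list_add_head(&irq1.node, &list);
	wq_list_add_head(&irq2.node, &list);

	/* prints "irq2 irq1 tw1 tw2": FIFO among tail-added entries but
	 * LIFO among head-added ones, which is the caveat Pavel raises;
	 * the "latest code" mentioned above instead keeps IRQ entries
	 * FIFO among themselves, i.e. irq1 irq2 tw1 tw2 */
	for (n = list.first; n; n = n->next)
		printf("%s ", ((struct demo *)n)->name);
	printf("\n");
	return 0;
}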