All of lore.kernel.org
 help / color / mirror / Atom feed
From: Guo Xuenan <guoxuenan@huawei.com>
To: Jens Axboe <axboe@kernel.dk>, <asml.silence@gmail.com>,
	<io-uring@vger.kernel.org>
Cc: <houtao1@huawei.com>, <yi.zhang@huawei.com>,
	<chengzhihao1@huawei.com>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] io_uring: add a schedule condition in io_submit_sqes
Date: Mon, 23 May 2022 22:45:26 +0800	[thread overview]
Message-ID: <a2b0340c-7bf7-a00e-6338-aca8ca02a1e2@huawei.com> (raw)
In-Reply-To: <00772002-8df8-3a41-6e6c-20e3854ad3f0@kernel.dk>

Hi Jens

On 2022/5/22 10:42, Jens Axboe wrote:
> On 5/21/22 8:33 AM, Guo Xuenan wrote:
>> when set up sq ring size with IORING_MAX_ENTRIES, io_submit_sqes may
>> looping ~32768 times which may trigger soft lockups. add need_resched
>> condition to avoid this bad situation.
>>
>> set sq ring size 32768 and using io_sq_thread to perform stress test
>> as follows:
>> watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [iou-sqp-600:601]
>> Kernel panic - not syncing: softlockup: hung tasks
>> CPU: 2 PID: 601 Comm: iou-sqp-600 Tainted: G L 5.18.0-rc7+ #3
>> Hardware name: linux,dummy-virt (DT)
>> Call trace:
>>   dump_backtrace+0x218/0x228
>>   show_stack+0x20/0x68
>>   dump_stack_lvl+0x68/0x84
>>   dump_stack+0x1c/0x38
>>   panic+0x1ec/0x3ec
>>   watchdog_timer_fn+0x28c/0x300
>>   __hrtimer_run_queues+0x1d8/0x498
>>   hrtimer_interrupt+0x238/0x558
>>   arch_timer_handler_virt+0x48/0x60
>>   handle_percpu_devid_irq+0xdc/0x270
>>   generic_handle_domain_irq+0x50/0x70
>>   gic_handle_irq+0x8c/0x4bc
>>   call_on_irq_stack+0x2c/0x38
>>   do_interrupt_handler+0xc4/0xc8
>>   el1_interrupt+0x48/0xb0
>>   el1h_64_irq_handler+0x18/0x28
>>   el1h_64_irq+0x74/0x78
>>   console_unlock+0x5d0/0x908
>>   vprintk_emit+0x21c/0x470
>>   vprintk_default+0x40/0x50
>>   vprintk+0xd0/0x128
>>   _printk+0xb4/0xe8
>>   io_issue_sqe+0x1784/0x2908
>>   io_submit_sqes+0x538/0x2880
>>   io_sq_thread+0x328/0x7b0
>>   ret_from_fork+0x10/0x20
>> SMP: stopping secondary CPUs
>> Kernel Offset: 0x40f1e8600000 from 0xffff800008000000
>> PHYS_OFFSET: 0xfffffa8c80000000
>> CPU features: 0x110,0000cf09,00001006
>> Memory Limit: none
>> ---[ end Kernel panic - not syncing: softlockup: hung tasks ]---
>>
>> Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
>> ---
>>   fs/io_uring.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 92ac50f139cd..d897c6798f00 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -7864,7 +7864,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
>>   			if (!(ctx->flags & IORING_SETUP_SUBMIT_ALL))
>>   				break;
>>   		}
>> -	} while (submitted < nr);
>> +	} while (submitted < nr && !need_resched());
>>   
>>   	if (unlikely(submitted != nr)) {
>>   		int ref_used = (submitted == -EAGAIN) ? 0 : submitted;
> This is wrong, you'll potentially end up doing random short submits for
> non-sqpoll as well.
Sorry, Indeed, this is not a good solution. Since, the function 
io_submit_sqes
not only called by io_sq_thread, it also called by syscall 
io_uring_enter sending
large amounts of requests, will also trigger soft lockup.
> sqpoll already supports capping how many it submits in one go, it just
> doesn't do it if it's only running one ring. As simple as the below,
> with 1024 pulled out of thin air. Would be great if you could experiment
> and submit a v2 based on this principle instead. Might still need a
yes, Jens, your patch sloved sq-poll-thread problem, but the problem may 
not
completely solved; when using syscall io_uring_enter to subimit large 
amounts

of requests.So in my opinion How about 1) add cond_resched() in the 
while cycle

part of io_submit_sqes ?. OR 2) set macro IORING_MAX_ENTRIES smaller? (i'm

curious about the value,why we set it with 32768)

> cond_resched() carefully placed in io_sq_thread().
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index e0823f58f795..3830d7b493b9 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -7916,7 +7916,8 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
>   	unsigned int to_submit;
>   	int ret = 0;
>   
> -	to_submit = io_sqring_entries(ctx);
> +	/* cap at 1024 to avoid doing too much in one submit round */
> +	to_submit = min(io_sqring_entries(ctx), 1024U);
Yes, it works.;)
>   	/* if we're handling multiple rings, cap submit size for fairness */
>   	if (cap_entries && to_submit > IORING_SQPOLL_CAP_ENTRIES_VALUE)
>   		to_submit = IORING_SQPOLL_CAP_ENTRIES_VALUE;
>

  reply	other threads:[~2022-05-23 14:45 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-21 14:33 [PATCH] io_uring: add a schedule condition in io_submit_sqes Guo Xuenan
2022-05-22  2:42 ` Jens Axboe
2022-05-23 14:45   ` Guo Xuenan [this message]
2022-05-23 16:27     ` Jens Axboe
2022-05-24  6:58       ` Guo Xuenan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a2b0340c-7bf7-a00e-6338-aca8ca02a1e2@huawei.com \
    --to=guoxuenan@huawei.com \
    --cc=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=chengzhihao1@huawei.com \
    --cc=houtao1@huawei.com \
    --cc=io-uring@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=yi.zhang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.