* [QUESTION] blk_mq_freeze_queue in elevator_init_mq
@ 2021-11-17 3:37 yangerkun
2021-11-17 4:11 ` Bart Van Assche
2021-11-17 8:06 ` Ming Lei
From: yangerkun @ 2021-11-17 3:37 UTC (permalink / raw)
To: ming.lei, damien.lemoal, axboe, miquel.raynal, richard, vigneshr
Cc: linux-block, linux-mtd, yangerkun, yi.zhang, yebin10, houtao1
Nowadays we are seeing a boot regression when enabling lots of mtdblocks,
compared with 4.4. The main reason is that the blk_mq_freeze_queue in
elevator_init_mq waits for an RCU grace period to make sure no I/O will
happen during blk_mq_init_sched.
Other modules like loop meet this problem too, and it has been fixed with the
following patches:
2112f5c1330a loop: Select I/O scheduler 'none' from inside add_disk()
90b7198001f2 blk-mq: Introduce the BLK_MQ_F_NO_SCHED_BY_DEFAULT flag
They change the default I/O scheduler for loop to 'none', so there is no need
to call blk_mq_freeze_queue and blk_mq_init_sched. But that seems inappropriate
for mtdblocks: mtdblocks can use 'mq-deadline' to help optimize random writes
with the help of mtdblock's cache. Once changed to 'none', we may see a
regression for random writes.
commit 737eb78e82d52d35df166d29af32bf61992de71d
Author: Damien Le Moal <damien.lemoal@wdc.com>
Date: Thu Sep 5 18:51:33 2019 +0900
block: Delay default elevator initialization
...
Additionally, to make sure that the elevator initialization is never
done while requests are in-flight (there should be none when the device
driver calls device_add_disk()), freeze and quiesce the device request
queue before calling blk_mq_init_sched() in elevator_init_mq().
...
This commit added blk_mq_freeze_queue in elevator_init_mq, which tries to
make sure there are no in-flight requests while we go through
blk_mq_init_sched. But are there any drivers that can leave I/O alive while
we go through elevator_init_mq? If not, maybe we can just remove this logic
to fix the regression...
* Re: [QUESTION] blk_mq_freeze_queue in elevator_init_mq
From: Bart Van Assche @ 2021-11-17 4:11 UTC (permalink / raw)
To: yangerkun, ming.lei, damien.lemoal, axboe, miquel.raynal,
richard, vigneshr
Cc: linux-block, linux-mtd, yi.zhang, yebin10, houtao1
On 11/16/21 19:37, yangerkun wrote:
> This commit added blk_mq_freeze_queue in elevator_init_mq, which tries to
> make sure there are no in-flight requests while we go through
> blk_mq_init_sched. But are there any drivers that can leave I/O alive while
> we go through elevator_init_mq? If not, maybe we can just remove this logic
> to fix the regression...
Does this untested patch help? Please note that I'm not recommending
integrating this patch into the upstream kernel, but if it helps it can be a
building block of a solution.
Thanks,
Bart.
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3ab34c4f20da..b85dcb72a579 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -167,6 +167,7 @@ void blk_freeze_queue_start(struct request_queue *q)
 		mutex_unlock(&q->mq_freeze_lock);
 		if (queue_is_mq(q))
 			blk_mq_run_hw_queues(q, false);
+		synchronize_rcu_expedited();
 	} else {
 		mutex_unlock(&q->mq_freeze_lock);
 	}
* Re: [QUESTION] blk_mq_freeze_queue in elevator_init_mq
From: Ming Lei @ 2021-11-17 8:06 UTC (permalink / raw)
To: yangerkun
Cc: damien.lemoal, axboe, miquel.raynal, richard, vigneshr,
linux-block, linux-mtd, yi.zhang, yebin10, houtao1
On Wed, Nov 17, 2021 at 11:37:13AM +0800, yangerkun wrote:
> Nowadays we are seeing a boot regression when enabling lots of mtdblocks,
What is your boot regression? Any dmesg log?
> compared with 4.4. The main reason is that the blk_mq_freeze_queue in
> elevator_init_mq waits for an RCU grace period to make sure no I/O will
> happen during blk_mq_init_sched.
There isn't an RCU grace period implied in the blk_mq_freeze_queue() called
from elevator_init_mq(), because the .q_usage_counter works in atomic mode
at that time.
>
> Other modules like loop meet this problem too, and it has been fixed with the
Again, what is the problem?
> following patches:
>
> 2112f5c1330a loop: Select I/O scheduler 'none' from inside add_disk()
> 90b7198001f2 blk-mq: Introduce the BLK_MQ_F_NO_SCHED_BY_DEFAULT flag
>
> They change the default I/O scheduler for loop to 'none', so there is no need
> to call blk_mq_freeze_queue and blk_mq_init_sched. But that seems inappropriate
> for mtdblocks: mtdblocks can use 'mq-deadline' to help optimize random writes
> with the help of mtdblock's cache. Once changed to 'none', we may see a
> regression for random writes.
>
> commit 737eb78e82d52d35df166d29af32bf61992de71d
> Author: Damien Le Moal <damien.lemoal@wdc.com>
> Date: Thu Sep 5 18:51:33 2019 +0900
>
> block: Delay default elevator initialization
>
> ...
>
> Additionally, to make sure that the elevator initialization is never
> done while requests are in-flight (there should be none when the device
> driver calls device_add_disk()), freeze and quiesce the device request
> queue before calling blk_mq_init_sched() in elevator_init_mq().
> ...
>
> This commit added blk_mq_freeze_queue in elevator_init_mq, which tries to
> make sure there are no in-flight requests while we go through
> blk_mq_init_sched. But are there any drivers that can leave I/O alive while
> we go through elevator_init_mq? If not, maybe we can just remove this logic
> to fix the regression...
SCSI should have passthrough requests at that moment.
Thanks,
Ming
* Re: [QUESTION] blk_mq_freeze_queue in elevator_init_mq
From: yangerkun @ 2021-11-17 8:37 UTC (permalink / raw)
To: Ming Lei
Cc: damien.lemoal, axboe, miquel.raynal, richard, vigneshr,
linux-block, linux-mtd, yi.zhang, yebin10, houtao1
On 2021/11/17 16:06, Ming Lei wrote:
> On Wed, Nov 17, 2021 at 11:37:13AM +0800, yangerkun wrote:
>> Nowadays we are seeing a boot regression when enabling lots of mtdblocks,
>
> What is your boot regression? Any dmesg log?
>
>> compared with 4.4. The main reason is that the blk_mq_freeze_queue in
>> elevator_init_mq waits for an RCU grace period to make sure no I/O will
>> happen during blk_mq_init_sched.
>
> There isn't an RCU grace period implied in the blk_mq_freeze_queue() called
> from elevator_init_mq(), because the .q_usage_counter works in atomic mode
> at that time.
Emm... it's my fault. What adds the RCU grace period is blk_mq_quiesce_queue...
>
>>
>> Other modules like loop meet this problem too, and it has been fixed with the
>
> Again, what is the problem?
>
>> following patches:
>>
>> 2112f5c1330a loop: Select I/O scheduler 'none' from inside add_disk()
>> 90b7198001f2 blk-mq: Introduce the BLK_MQ_F_NO_SCHED_BY_DEFAULT flag
>>
>> They change the default I/O scheduler for loop to 'none', so there is no need
>> to call blk_mq_freeze_queue and blk_mq_init_sched. But that seems inappropriate
>> for mtdblocks: mtdblocks can use 'mq-deadline' to help optimize random writes
>> with the help of mtdblock's cache. Once changed to 'none', we may see a
>> regression for random writes.
>>
>> commit 737eb78e82d52d35df166d29af32bf61992de71d
>> Author: Damien Le Moal <damien.lemoal@wdc.com>
>> Date: Thu Sep 5 18:51:33 2019 +0900
>>
>> block: Delay default elevator initialization
>>
>> ...
>>
>> Additionally, to make sure that the elevator initialization is never
>> done while requests are in-flight (there should be none when the device
>> driver calls device_add_disk()), freeze and quiesce the device request
>> queue before calling blk_mq_init_sched() in elevator_init_mq().
>> ...
>>
>> This commit added blk_mq_freeze_queue in elevator_init_mq, which tries to
>> make sure there are no in-flight requests while we go through
>> blk_mq_init_sched. But are there any drivers that can leave I/O alive while
>> we go through elevator_init_mq? If not, maybe we can just remove this logic
>> to fix the regression...
>
> SCSI should have passthrough requests at that moment.
>
>
>
> Thanks,
> Ming
>
>
* Re: [QUESTION] blk_mq_freeze_queue in elevator_init_mq
From: yangerkun @ 2021-11-17 8:39 UTC (permalink / raw)
To: Bart Van Assche, ming.lei, damien.lemoal, axboe, miquel.raynal,
richard, vigneshr
Cc: linux-block, linux-mtd, yi.zhang, yebin10, houtao1
On 2021/11/17 12:11, Bart Van Assche wrote:
> On 11/16/21 19:37, yangerkun wrote:
>> This commit added blk_mq_freeze_queue in elevator_init_mq, which tries to
>> make sure there are no in-flight requests while we go through
>> blk_mq_init_sched. But are there any drivers that can leave I/O alive while
>> we go through elevator_init_mq? If not, maybe we can just remove this logic
>> to fix the regression...
>
> Does this untested patch help? Please note that I'm not recommending
> integrating this patch into the upstream kernel, but if it helps it can be a
> building block of a solution.
>
> Thanks,
>
> Bart.
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 3ab34c4f20da..b85dcb72a579 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -167,6 +167,7 @@ void blk_freeze_queue_start(struct request_queue *q)
>  		mutex_unlock(&q->mq_freeze_lock);
>  		if (queue_is_mq(q))
>  			blk_mq_run_hw_queues(q, false);
> +		synchronize_rcu_expedited();
>  	} else {
>  		mutex_unlock(&q->mq_freeze_lock);
>  	}
Sorry, it's blk_mq_quiesce_queue which actually introduces the RCU grace
period... I have tried synchronize_rcu_expedited in blk_mq_quiesce_queue,
which seems useless...
* Re: [QUESTION] blk_mq_freeze_queue in elevator_init_mq
From: yangerkun @ 2021-11-17 9:00 UTC (permalink / raw)
To: Ming Lei
Cc: damien.lemoal, axboe, miquel.raynal, richard, vigneshr,
linux-block, linux-mtd, yi.zhang, yebin10, houtao1
On 2021/11/17 16:06, Ming Lei wrote:
> On Wed, Nov 17, 2021 at 11:37:13AM +0800, yangerkun wrote:
>> Nowadays we are seeing a boot regression when enabling lots of mtdblocks,
>
> What is your boot regression? Any dmesg log?
The result is that when booting a 5.10 kernel, compared with 4.4, 5.10
consumes about 1.6s more...
>
>> compared with 4.4. The main reason is that the blk_mq_freeze_queue in
>> elevator_init_mq waits for an RCU grace period to make sure no I/O will
>> happen during blk_mq_init_sched.
>
> There isn't an RCU grace period implied in the blk_mq_freeze_queue() called
> from elevator_init_mq(), because the .q_usage_counter works in atomic mode
> at that time.
>
>>
>> Other modules like loop meet this problem too, and it has been fixed with the
>
> Again, what is the problem?
commit 2112f5c1330a671fa852051d85cb9eadc05d7eb7
Author: Bart Van Assche <bvanassche@acm.org>
Date: Thu Aug 5 10:42:00 2021 -0700
loop: Select I/O scheduler 'none' from inside add_disk()
...
e.g. via a udev rule. This approach has an advantage compared to changing
the I/O scheduler from userspace from 'mq-deadline' into 'none', namely
that synchronize_rcu() does not get called.
Actually, we have met this problem too before
(https://www.spinics.net/lists/linux-block/msg70660.html).
>
>> following patches:
>>
>> 2112f5c1330a loop: Select I/O scheduler 'none' from inside add_disk()
>> 90b7198001f2 blk-mq: Introduce the BLK_MQ_F_NO_SCHED_BY_DEFAULT flag
>>
>> They change the default I/O scheduler for loop to 'none', so there is no need
>> to call blk_mq_freeze_queue and blk_mq_init_sched. But that seems inappropriate
>> for mtdblocks: mtdblocks can use 'mq-deadline' to help optimize random writes
>> with the help of mtdblock's cache. Once changed to 'none', we may see a
>> regression for random writes.
>>
>> commit 737eb78e82d52d35df166d29af32bf61992de71d
>> Author: Damien Le Moal <damien.lemoal@wdc.com>
>> Date: Thu Sep 5 18:51:33 2019 +0900
>>
>> block: Delay default elevator initialization
>>
>> ...
>>
>> Additionally, to make sure that the elevator initialization is never
>> done while requests are in-flight (there should be none when the device
>> driver calls device_add_disk()), freeze and quiesce the device request
>> queue before calling blk_mq_init_sched() in elevator_init_mq().
>> ...
>>
>> This commit added blk_mq_freeze_queue in elevator_init_mq, which tries to
>> make sure there are no in-flight requests while we go through
>> blk_mq_init_sched. But are there any drivers that can leave I/O alive while
>> we go through elevator_init_mq? If not, maybe we can just remove this logic
>> to fix the regression...
>
> SCSI should have passthrough requests at that moment.
>
>
>
> Thanks,
> Ming
>
>
* Re: [QUESTION] blk_mq_freeze_queue in elevator_init_mq
From: Ming Lei @ 2021-11-17 10:19 UTC (permalink / raw)
To: yangerkun
Cc: damien.lemoal, axboe, miquel.raynal, richard, vigneshr,
linux-block, linux-mtd, yi.zhang, yebin10, houtao1, ming.lei
On Wed, Nov 17, 2021 at 05:00:22PM +0800, yangerkun wrote:
>
>
> On 2021/11/17 16:06, Ming Lei wrote:
> > On Wed, Nov 17, 2021 at 11:37:13AM +0800, yangerkun wrote:
> > > Nowadays we are seeing a boot regression when enabling lots of mtdblocks,
> >
> > What is your boot regression? Any dmesg log?
>
> The result is that when booting a 5.10 kernel, compared with 4.4, 5.10
> consumes about 1.6s more...
OK, I understand the issue now, and please try the attached patch
which depends on the following one:
https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/commit/?h=block-5.16&id=2a19b28f7929866e1cec92a3619f4de9f2d20005
diff --git a/block/elevator.c b/block/elevator.c
index 1f39f6e8ebb9..19a78d5516ba 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -694,12 +694,18 @@ void elevator_init_mq(struct request_queue *q)
 	if (!e)
 		return;
 
+	/*
+	 * We are called before adding disk, when there isn't any FS I/O,
+	 * so freezing queue plus canceling dispatch work is enough to
+	 * drain any dispatch activities originated from passthrough
+	 * requests, then no need to quiesce queue which may add long boot
+	 * latency, especially when lots of disks are involved.
+	 */
 	blk_mq_freeze_queue(q);
-	blk_mq_quiesce_queue(q);
+	blk_mq_cancel_work_sync(q);
 
 	err = blk_mq_init_sched(q, e);
 
-	blk_mq_unquiesce_queue(q);
 	blk_mq_unfreeze_queue(q);
 
 	if (err) {
thanks,
Ming
* Re: [QUESTION] blk_mq_freeze_queue in elevator_init_mq
From: yangerkun @ 2021-11-18 13:12 UTC (permalink / raw)
To: Ming Lei
Cc: damien.lemoal, axboe, miquel.raynal, richard, vigneshr,
linux-block, linux-mtd, yi.zhang, yebin10, houtao1
On 2021/11/17 18:19, Ming Lei wrote:
> On Wed, Nov 17, 2021 at 05:00:22PM +0800, yangerkun wrote:
>>
>>
>> On 2021/11/17 16:06, Ming Lei wrote:
>>> On Wed, Nov 17, 2021 at 11:37:13AM +0800, yangerkun wrote:
>>>> Nowadays we are seeing a boot regression when enabling lots of mtdblocks,
>>>
>>> What is your boot regression? Any dmesg log?
>>
>> The result is that when booting a 5.10 kernel, compared with 4.4, 5.10
>> consumes about 1.6s more...
>
> OK, I understand the issue now, and please try the attached patch
> which depends on the following one:
Hi, this patch can help solve the problem!
>
> https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/commit/?h=block-5.16&id=2a19b28f7929866e1cec92a3619f4de9f2d20005
>
>
> diff --git a/block/elevator.c b/block/elevator.c
> index 1f39f6e8ebb9..19a78d5516ba 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -694,12 +694,18 @@ void elevator_init_mq(struct request_queue *q)
>  	if (!e)
>  		return;
> 
> +	/*
> +	 * We are called before adding disk, when there isn't any FS I/O,
> +	 * so freezing queue plus canceling dispatch work is enough to
> +	 * drain any dispatch activities originated from passthrough
> +	 * requests, then no need to quiesce queue which may add long boot
> +	 * latency, especially when lots of disks are involved.
> +	 */
>  	blk_mq_freeze_queue(q);
> -	blk_mq_quiesce_queue(q);
> +	blk_mq_cancel_work_sync(q);
> 
>  	err = blk_mq_init_sched(q, e);
> 
> -	blk_mq_unquiesce_queue(q);
>  	blk_mq_unfreeze_queue(q);
> 
>  	if (err) {
>
>
>
> thanks,
> Ming
>
>