linux-block.vger.kernel.org archive mirror
* [QUESTION] blk_mq_freeze_queue in elevator_init_mq
@ 2021-11-17  3:37 yangerkun
  2021-11-17  4:11 ` Bart Van Assche
  2021-11-17  8:06 ` Ming Lei
  0 siblings, 2 replies; 8+ messages in thread
From: yangerkun @ 2021-11-17  3:37 UTC (permalink / raw)
  To: ming.lei, damien.lemoal, axboe, miquel.raynal, richard, vigneshr
  Cc: linux-block, linux-mtd, yangerkun, yi.zhang, yebin10, houtao1

Nowadays we see a boot regression when enabling lots of mtdblocks,
compared with 4.4. The main reason is that blk_mq_freeze_queue in
elevator_init_mq waits for an RCU grace period, in order to make sure
no I/O happens while blk_mq_init_sched runs.

Other modules like loop hit this problem too; it has been fixed with
the following patches:

  2112f5c1330a loop: Select I/O scheduler 'none' from inside add_disk()
  90b7198001f2 blk-mq: Introduce the BLK_MQ_F_NO_SCHED_BY_DEFAULT flag

They change the default I/O scheduler for loop to 'none', so there is
no need to call blk_mq_freeze_queue and blk_mq_init_sched. But that
seems inappropriate for mtdblocks: mtdblocks can use 'mq-deadline' to
help optimize random writes with the help of mtdblock's cache. Once
changed to 'none', we may see a regression for random writes.
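
For reference, what those two patches do is roughly the following
(a simplified sketch of the commits above, not verbatim):

/* drivers/block/loop.c: loop opts out of a default scheduler. */
lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_STACKING |
		    BLK_MQ_F_NO_SCHED_BY_DEFAULT;

/* block/elevator.c: the default-elevator choice honors the flag. */
static struct elevator_type *elevator_get_default(struct request_queue *q)
{
	if (q->tag_set && q->tag_set->flags & BLK_MQ_F_NO_SCHED_BY_DEFAULT)
		return NULL;	/* stay on 'none'; skip blk_mq_init_sched */

	if (q->nr_hw_queues != 1)
		return NULL;

	return elevator_get(q, "mq-deadline", false);
}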

commit 737eb78e82d52d35df166d29af32bf61992de71d
Author: Damien Le Moal <damien.lemoal@wdc.com>
Date:   Thu Sep 5 18:51:33 2019 +0900

     block: Delay default elevator initialization

     ...

     Additionally, to make sure that the elevator initialization is never
     done while requests are in-flight (there should be none when the device
     driver calls device_add_disk()), freeze and quiesce the device request
     queue before calling blk_mq_init_sched() in elevator_init_mq().
     ...

This commit added blk_mq_freeze_queue in elevator_init_mq to try to
make sure there is no in-flight request while we go through
blk_mq_init_sched. But are there any drivers that can have I/O alive
while we go through elevator_init_mq? If not, maybe we can just remove
this logic to fix the regression...
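
For orientation, the tail of elevator_init_mq() around v5.10 looks
roughly like this (simplified sketch; see block/elevator.c for the
exact code):

void elevator_init_mq(struct request_queue *q)
{
	struct elevator_type *e;
	int err;

	if (!elv_support_iosched(q))
		return;

	if (unlikely(q->elevator))
		return;

	e = elevator_get_default(q);	/* NULL for loop => stays 'none' */
	if (!e)
		return;

	blk_mq_freeze_queue(q);		/* this freeze+quiesce pair is */
	blk_mq_quiesce_queue(q);	/* what is being questioned     */

	err = blk_mq_init_sched(q, e);

	blk_mq_unquiesce_queue(q);
	blk_mq_unfreeze_queue(q);

	if (err) {
		pr_warn("\"%s\" elevator initialization failed, "
			"falling back to \"none\"\n", e->elevator_name);
		elevator_put(e);
	}
}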


* Re: [QUESTION] blk_mq_freeze_queue in elevator_init_mq
  2021-11-17  3:37 [QUESTION] blk_mq_freeze_queue in elevator_init_mq yangerkun
@ 2021-11-17  4:11 ` Bart Van Assche
  2021-11-17  8:39   ` yangerkun
  2021-11-17  8:06 ` Ming Lei
  1 sibling, 1 reply; 8+ messages in thread
From: Bart Van Assche @ 2021-11-17  4:11 UTC (permalink / raw)
  To: yangerkun, ming.lei, damien.lemoal, axboe, miquel.raynal,
	richard, vigneshr
  Cc: linux-block, linux-mtd, yi.zhang, yebin10, houtao1

On 11/16/21 19:37, yangerkun wrote:
> This commit added blk_mq_freeze_queue in elevator_init_mq to try to
> make sure there is no in-flight request while we go through
> blk_mq_init_sched. But are there any drivers that can have I/O alive
> while we go through elevator_init_mq? If not, maybe we can just remove
> this logic to fix the regression...

Does this untested patch help? Please note that I'm not recommending
integrating this patch into the upstream kernel, but if it helps it can
be a building block of a solution.

Thanks,

Bart.

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3ab34c4f20da..b85dcb72a579 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -167,6 +167,7 @@ void blk_freeze_queue_start(struct request_queue *q)
  		mutex_unlock(&q->mq_freeze_lock);
  		if (queue_is_mq(q))
  			blk_mq_run_hw_queues(q, false);
+		synchronize_rcu_expedited();
  	} else {
  		mutex_unlock(&q->mq_freeze_lock);
  	}


* Re: [QUESTION] blk_mq_freeze_queue in elevator_init_mq
  2021-11-17  3:37 [QUESTION] blk_mq_freeze_queue in elevator_init_mq yangerkun
  2021-11-17  4:11 ` Bart Van Assche
@ 2021-11-17  8:06 ` Ming Lei
  2021-11-17  8:37   ` yangerkun
  2021-11-17  9:00   ` yangerkun
  1 sibling, 2 replies; 8+ messages in thread
From: Ming Lei @ 2021-11-17  8:06 UTC (permalink / raw)
  To: yangerkun
  Cc: damien.lemoal, axboe, miquel.raynal, richard, vigneshr,
	linux-block, linux-mtd, yi.zhang, yebin10, houtao1

On Wed, Nov 17, 2021 at 11:37:13AM +0800, yangerkun wrote:
> Nowadays we see a boot regression when enabling lots of mtdblocks,

What is your boot regression? Any dmesg log?

> compared with 4.4. The main reason is that blk_mq_freeze_queue in
> elevator_init_mq waits for an RCU grace period, in order to make sure
> no I/O happens while blk_mq_init_sched runs.

There isn't an RCU grace period implied in the blk_mq_freeze_queue() called
from elevator_init_mq(), because the .q_usage_counter works in atomic
mode at that time.
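
A minimal sketch of that distinction, using the generic percpu_ref API
(names here are illustrative, not blk-mq code):

#include <linux/percpu-refcount.h>

/*
 * Freezing is essentially percpu_ref_kill() plus waiting for the ref
 * to reach zero.  Only a ref in percpu mode has to wait out an RCU
 * grace period before its per-CPU counters can be summed and observed
 * as zero; a ref still in atomic mode drops to zero as soon as the
 * last user is gone, with no grace-period wait.
 */
static void sketch_release(struct percpu_ref *ref)
{
	/* invoked once the last reference is dropped */
}

static int sketch_init_atomic(struct percpu_ref *ref)
{
	/* start in atomic mode, like q->q_usage_counter before add_disk */
	return percpu_ref_init(ref, sketch_release,
			       PERCPU_REF_INIT_ATOMIC, GFP_KERNEL);
}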

> 
> Other modules like loop hit this problem too; it has been fixed with

Again, what is the problem?

> the following patches:
> 
>  2112f5c1330a loop: Select I/O scheduler 'none' from inside add_disk()
>  90b7198001f2 blk-mq: Introduce the BLK_MQ_F_NO_SCHED_BY_DEFAULT flag
> 
> They change the default I/O scheduler for loop to 'none', so there is
> no need to call blk_mq_freeze_queue and blk_mq_init_sched. But that
> seems inappropriate for mtdblocks: mtdblocks can use 'mq-deadline' to
> help optimize random writes with the help of mtdblock's cache. Once
> changed to 'none', we may see a regression for random writes.
> 
> commit 737eb78e82d52d35df166d29af32bf61992de71d
> Author: Damien Le Moal <damien.lemoal@wdc.com>
> Date:   Thu Sep 5 18:51:33 2019 +0900
> 
>     block: Delay default elevator initialization
> 
>     ...
> 
>     Additionally, to make sure that the elevator initialization is never
>     done while requests are in-flight (there should be none when the device
>     driver calls device_add_disk()), freeze and quiesce the device request
>     queue before calling blk_mq_init_sched() in elevator_init_mq().
>     ...
> 
> This commit added blk_mq_freeze_queue in elevator_init_mq to try to
> make sure there is no in-flight request while we go through
> blk_mq_init_sched. But are there any drivers that can have I/O alive
> while we go through elevator_init_mq? If not, maybe we can just remove
> this logic to fix the regression...

SCSI should have passthrough requests at that moment.



Thanks,
Ming



* Re: [QUESTION] blk_mq_freeze_queue in elevator_init_mq
  2021-11-17  8:06 ` Ming Lei
@ 2021-11-17  8:37   ` yangerkun
  2021-11-17  9:00   ` yangerkun
  1 sibling, 0 replies; 8+ messages in thread
From: yangerkun @ 2021-11-17  8:37 UTC (permalink / raw)
  To: Ming Lei
  Cc: damien.lemoal, axboe, miquel.raynal, richard, vigneshr,
	linux-block, linux-mtd, yi.zhang, yebin10, houtao1



On 2021/11/17 16:06, Ming Lei wrote:
> On Wed, Nov 17, 2021 at 11:37:13AM +0800, yangerkun wrote:
>> Nowadays we see a boot regression when enabling lots of mtdblocks,
> 
> What is your boot regression? Any dmesg log?
> 
>> compared with 4.4. The main reason is that blk_mq_freeze_queue in
>> elevator_init_mq waits for an RCU grace period, in order to make sure
>> no I/O happens while blk_mq_init_sched runs.
> 
> There isn't an RCU grace period implied in the blk_mq_freeze_queue() called
> from elevator_init_mq(), because the .q_usage_counter works in atomic
> mode at that time.

Emm... It's my fault. What adds the RCU grace period is blk_mq_quiesce_queue...
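
For reference, blk_mq_quiesce_queue() around v5.10 is roughly the
sketch below; the synchronize_rcu()/synchronize_srcu() call is the
grace-period wait, paid once per queue, which adds up when many
mtdblocks are initialized one after another:

void blk_mq_quiesce_queue(struct request_queue *q)
{
	struct blk_mq_hw_ctx *hctx;
	unsigned int i;
	bool rcu = false;

	blk_mq_quiesce_queue_nowait(q);

	queue_for_each_hw_ctx(q, hctx, i) {
		if (hctx->flags & BLK_MQ_F_BLOCKING)
			synchronize_srcu(hctx->srcu);	/* blocking hctxs use SRCU */
		else
			rcu = true;
	}
	if (rcu)
		synchronize_rcu();	/* the grace-period wait in question */
}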

> 
>>
>> Other modules like loop hit this problem too; it has been fixed with
> 
> Again, what is the problem?
> 
>> the following patches:
>>
>>   2112f5c1330a loop: Select I/O scheduler 'none' from inside add_disk()
>>   90b7198001f2 blk-mq: Introduce the BLK_MQ_F_NO_SCHED_BY_DEFAULT flag
>>
>> They change the default I/O scheduler for loop to 'none', so there is
>> no need to call blk_mq_freeze_queue and blk_mq_init_sched. But that
>> seems inappropriate for mtdblocks: mtdblocks can use 'mq-deadline' to
>> help optimize random writes with the help of mtdblock's cache. Once
>> changed to 'none', we may see a regression for random writes.
>>
>> commit 737eb78e82d52d35df166d29af32bf61992de71d
>> Author: Damien Le Moal <damien.lemoal@wdc.com>
>> Date:   Thu Sep 5 18:51:33 2019 +0900
>>
>>      block: Delay default elevator initialization
>>
>>      ...
>>
>>      Additionally, to make sure that the elevator initialization is never
>>      done while requests are in-flight (there should be none when the device
>>      driver calls device_add_disk()), freeze and quiesce the device request
>>      queue before calling blk_mq_init_sched() in elevator_init_mq().
>>      ...
>>
>> This commit added blk_mq_freeze_queue in elevator_init_mq to try to
>> make sure there is no in-flight request while we go through
>> blk_mq_init_sched. But are there any drivers that can have I/O alive
>> while we go through elevator_init_mq? If not, maybe we can just remove
>> this logic to fix the regression...
> 
> SCSI should have passthrough requests at that moment.
> 
> 
> 
> Thanks,
> Ming
> 
> 


* Re: [QUESTION] blk_mq_freeze_queue in elevator_init_mq
  2021-11-17  4:11 ` Bart Van Assche
@ 2021-11-17  8:39   ` yangerkun
  0 siblings, 0 replies; 8+ messages in thread
From: yangerkun @ 2021-11-17  8:39 UTC (permalink / raw)
  To: Bart Van Assche, ming.lei, damien.lemoal, axboe, miquel.raynal,
	richard, vigneshr
  Cc: linux-block, linux-mtd, yi.zhang, yebin10, houtao1



On 2021/11/17 12:11, Bart Van Assche wrote:
> On 11/16/21 19:37, yangerkun wrote:
>> This commit added blk_mq_freeze_queue in elevator_init_mq to try to
>> make sure there is no in-flight request while we go through
>> blk_mq_init_sched. But are there any drivers that can have I/O alive
>> while we go through elevator_init_mq? If not, maybe we can just remove
>> this logic to fix the regression...
> 
> Does this untested patch help? Please note that I'm not recommending
> integrating this patch into the upstream kernel, but if it helps it can
> be a building block of a solution.
> 
> Thanks,
> 
> Bart.
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 3ab34c4f20da..b85dcb72a579 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -167,6 +167,7 @@ void blk_freeze_queue_start(struct request_queue *q)
>           mutex_unlock(&q->mq_freeze_lock);
>           if (queue_is_mq(q))
>               blk_mq_run_hw_queues(q, false);
> +        synchronize_rcu_expedited();
>       } else {
>           mutex_unlock(&q->mq_freeze_lock);
>       }

Sorry, it's blk_mq_quiesce_queue that actually introduces the RCU
grace period... I have tried synchronize_rcu_expedited in
blk_mq_quiesce_queue, and it seems useless...


* Re: [QUESTION] blk_mq_freeze_queue in elevator_init_mq
  2021-11-17  8:06 ` Ming Lei
  2021-11-17  8:37   ` yangerkun
@ 2021-11-17  9:00   ` yangerkun
  2021-11-17 10:19     ` Ming Lei
  1 sibling, 1 reply; 8+ messages in thread
From: yangerkun @ 2021-11-17  9:00 UTC (permalink / raw)
  To: Ming Lei
  Cc: damien.lemoal, axboe, miquel.raynal, richard, vigneshr,
	linux-block, linux-mtd, yi.zhang, yebin10, houtao1



On 2021/11/17 16:06, Ming Lei wrote:
> On Wed, Nov 17, 2021 at 11:37:13AM +0800, yangerkun wrote:
>> Nowadays we see a boot regression when enabling lots of mtdblocks,
> 
> What is your boot regression? Any dmesg log?

The result is that when booting with a 5.10 kernel, compared with 4.4,
5.10 consumes about 1.6s more...
> 
>> compared with 4.4. The main reason is that blk_mq_freeze_queue in
>> elevator_init_mq waits for an RCU grace period, in order to make sure
>> no I/O happens while blk_mq_init_sched runs.
> 
> There isn't an RCU grace period implied in the blk_mq_freeze_queue() called
> from elevator_init_mq(), because the .q_usage_counter works in atomic
> mode at that time.
> 
>>
>> Other modules like loop hit this problem too; it has been fixed with
> 
> Again, what is the problem?

commit 2112f5c1330a671fa852051d85cb9eadc05d7eb7
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Thu Aug 5 10:42:00 2021 -0700

     loop: Select I/O scheduler 'none' from inside add_disk()

     ...
     e.g. via a udev rule. This approach has an advantage compared to
     changing the I/O scheduler from userspace from 'mq-deadline' into
     'none', namely that synchronize_rcu() does not get called.

Actually, we have met this problem before, too
(https://www.spinics.net/lists/linux-block/msg70660.html).


> 
>> the following patches:
>>
>>   2112f5c1330a loop: Select I/O scheduler 'none' from inside add_disk()
>>   90b7198001f2 blk-mq: Introduce the BLK_MQ_F_NO_SCHED_BY_DEFAULT flag
>>
>> They change the default I/O scheduler for loop to 'none', so there is
>> no need to call blk_mq_freeze_queue and blk_mq_init_sched. But that
>> seems inappropriate for mtdblocks: mtdblocks can use 'mq-deadline' to
>> help optimize random writes with the help of mtdblock's cache. Once
>> changed to 'none', we may see a regression for random writes.
>>
>> commit 737eb78e82d52d35df166d29af32bf61992de71d
>> Author: Damien Le Moal <damien.lemoal@wdc.com>
>> Date:   Thu Sep 5 18:51:33 2019 +0900
>>
>>      block: Delay default elevator initialization
>>
>>      ...
>>
>>      Additionally, to make sure that the elevator initialization is never
>>      done while requests are in-flight (there should be none when the device
>>      driver calls device_add_disk()), freeze and quiesce the device request
>>      queue before calling blk_mq_init_sched() in elevator_init_mq().
>>      ...
>>
>> This commit added blk_mq_freeze_queue in elevator_init_mq to try to
>> make sure there is no in-flight request while we go through
>> blk_mq_init_sched. But are there any drivers that can have I/O alive
>> while we go through elevator_init_mq? If not, maybe we can just remove
>> this logic to fix the regression...
> 
> SCSI should have passthrough requests at that moment.
> 
> 
> 
> Thanks,
> Ming
> 
> 


* Re: [QUESTION] blk_mq_freeze_queue in elevator_init_mq
  2021-11-17  9:00   ` yangerkun
@ 2021-11-17 10:19     ` Ming Lei
  2021-11-18 13:12       ` yangerkun
  0 siblings, 1 reply; 8+ messages in thread
From: Ming Lei @ 2021-11-17 10:19 UTC (permalink / raw)
  To: yangerkun
  Cc: damien.lemoal, axboe, miquel.raynal, richard, vigneshr,
	linux-block, linux-mtd, yi.zhang, yebin10, houtao1, ming.lei

On Wed, Nov 17, 2021 at 05:00:22PM +0800, yangerkun wrote:
> 
> 
> On 2021/11/17 16:06, Ming Lei wrote:
> > On Wed, Nov 17, 2021 at 11:37:13AM +0800, yangerkun wrote:
> > > Nowadays we see a boot regression when enabling lots of mtdblocks,
> > 
> > What is your boot regression? Any dmesg log?
> 
> The result is that when booting with a 5.10 kernel, compared with 4.4,
> 5.10 consumes about 1.6s more...

OK, I understand the issue now, and please try the attached patch
which depends on the following one:

https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/commit/?h=block-5.16&id=2a19b28f7929866e1cec92a3619f4de9f2d20005


diff --git a/block/elevator.c b/block/elevator.c
index 1f39f6e8ebb9..19a78d5516ba 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -694,12 +694,18 @@ void elevator_init_mq(struct request_queue *q)
 	if (!e)
 		return;
 
+	/*
+	 * We are called before adding disk, when there isn't any FS I/O,
+	 * so freezing queue plus canceling dispatch work is enough to
+	 * drain any dispatch activities originated from passthrough
+	 * requests, then no need to quiesce queue which may add long boot
+	 * latency, especially when lots of disks are involved.
+	 */
 	blk_mq_freeze_queue(q);
-	blk_mq_quiesce_queue(q);
+	blk_mq_cancel_work_sync(q);
 
 	err = blk_mq_init_sched(q, e);
 
-	blk_mq_unquiesce_queue(q);
 	blk_mq_unfreeze_queue(q);
 
 	if (err) {
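
For completeness, the blk_mq_cancel_work_sync() helper added by the
commit linked above is roughly the following (sketch; see the linked
commit for the exact code):

void blk_mq_cancel_work_sync(struct request_queue *q)
{
	if (queue_is_mq(q)) {
		struct blk_mq_hw_ctx *hctx;
		int i;

		cancel_delayed_work_sync(&q->requeue_work);

		queue_for_each_hw_ctx(q, hctx, i)
			cancel_delayed_work_sync(&hctx->run_work);
	}
}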



thanks,
Ming



* Re: [QUESTION] blk_mq_freeze_queue in elevator_init_mq
  2021-11-17 10:19     ` Ming Lei
@ 2021-11-18 13:12       ` yangerkun
  0 siblings, 0 replies; 8+ messages in thread
From: yangerkun @ 2021-11-18 13:12 UTC (permalink / raw)
  To: Ming Lei
  Cc: damien.lemoal, axboe, miquel.raynal, richard, vigneshr,
	linux-block, linux-mtd, yi.zhang, yebin10, houtao1



On 2021/11/17 18:19, Ming Lei wrote:
> On Wed, Nov 17, 2021 at 05:00:22PM +0800, yangerkun wrote:
>>
>>
>> On 2021/11/17 16:06, Ming Lei wrote:
>>> On Wed, Nov 17, 2021 at 11:37:13AM +0800, yangerkun wrote:
>>>> Nowadays we see a boot regression when enabling lots of mtdblocks,
>>>
>>> What is your boot regression? Any dmesg log?
>>
>> The result is that when booting with a 5.10 kernel, compared with 4.4,
>> 5.10 consumes about 1.6s more...
> 
> OK, I understand the issue now, and please try the attached patch
> which depends on the following one:

Hi, this patch can help solve the problem!

> 
> https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/commit/?h=block-5.16&id=2a19b28f7929866e1cec92a3619f4de9f2d20005
> 
> 
> diff --git a/block/elevator.c b/block/elevator.c
> index 1f39f6e8ebb9..19a78d5516ba 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -694,12 +694,18 @@ void elevator_init_mq(struct request_queue *q)
>   	if (!e)
>   		return;
>   
> +	/*
> +	 * We are called before adding disk, when there isn't any FS I/O,
> +	 * so freezing queue plus canceling dispatch work is enough to
> +	 * drain any dispatch activities originated from passthrough
> +	 * requests, then no need to quiesce queue which may add long boot
> +	 * latency, especially when lots of disks are involved.
> +	 */
>   	blk_mq_freeze_queue(q);
> -	blk_mq_quiesce_queue(q);
> +	blk_mq_cancel_work_sync(q);
>   
>   	err = blk_mq_init_sched(q, e);
>   
> -	blk_mq_unquiesce_queue(q);
>   	blk_mq_unfreeze_queue(q);
>   
>   	if (err) {
> 
> 
> 
> thanks,
> Ming
> 
> 


end of thread, other threads:[~2021-11-18 13:12 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-17  3:37 [QUESTION] blk_mq_freeze_queue in elevator_init_mq yangerkun
2021-11-17  4:11 ` Bart Van Assche
2021-11-17  8:39   ` yangerkun
2021-11-17  8:06 ` Ming Lei
2021-11-17  8:37   ` yangerkun
2021-11-17  9:00   ` yangerkun
2021-11-17 10:19     ` Ming Lei
2021-11-18 13:12       ` yangerkun
