linux-block.vger.kernel.org archive mirror
* [PATCH] blk-mq: plug request for shared sbitmap
@ 2021-05-14  2:20 Ming Lei
  2021-05-14 14:59 ` Jens Axboe
  2021-05-18  9:44 ` John Garry
  0 siblings, 2 replies; 15+ messages in thread
From: Ming Lei @ 2021-05-14  2:20 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig
  Cc: linux-block, Ming Lei, Yanhui Ma, John Garry, Bart Van Assche,
	kashyap.desai

In case of shared sbitmap, requests are no longer held in the plug list
since commit 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per
tagset"). This makes request merging from the flushed plug list and
batched submission impossible, and so causes a performance regression.

Yanhui reports a performance regression when running a sequential IO
test (libaio, 16 jobs, queue depth 8 per job) in a VM, where the VM disk
is emulated with an image stored on xfs/megaraid_sas.

Fix the issue by restoring the original behavior of holding requests
in the plug list.

Cc: Yanhui Ma <yama@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: kashyap.desai@broadcom.com
Fixes: 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per tagset")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ae7f5ee41cd3..baf7a9546068 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2236,8 +2236,9 @@ blk_qc_t blk_mq_submit_bio(struct bio *bio)
 		/* Bypass scheduler for flush requests */
 		blk_insert_flush(rq);
 		blk_mq_run_hw_queue(data.hctx, true);
-	} else if (plug && (q->nr_hw_queues == 1 || q->mq_ops->commit_rqs ||
-				!blk_queue_nonrot(q))) {
+	} else if (plug && (q->nr_hw_queues == 1 ||
+		   blk_mq_is_sbitmap_shared(rq->mq_hctx->flags) ||
+		   q->mq_ops->commit_rqs || !blk_queue_nonrot(q))) {
 		/*
 		 * Use plugging if we have a ->commit_rqs() hook as well, as
 		 * we know the driver uses bd->last in a smart fashion.
-- 
2.29.2



* Re: [PATCH] blk-mq: plug request for shared sbitmap
  2021-05-14  2:20 [PATCH] blk-mq: plug request for shared sbitmap Ming Lei
@ 2021-05-14 14:59 ` Jens Axboe
  2021-05-18  9:44 ` John Garry
  1 sibling, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2021-05-14 14:59 UTC (permalink / raw)
  To: Ming Lei, Christoph Hellwig
  Cc: linux-block, Yanhui Ma, John Garry, Bart Van Assche, kashyap.desai

On 5/13/21 8:20 PM, Ming Lei wrote:
> In case of shared sbitmap, requests are no longer held in the plug list
> since commit 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per
> tagset"). This makes request merging from the flushed plug list and
> batched submission impossible, and so causes a performance regression.
> 
> Yanhui reports a performance regression when running a sequential IO
> test (libaio, 16 jobs, queue depth 8 per job) in a VM, where the VM disk
> is emulated with an image stored on xfs/megaraid_sas.
> 
> Fix the issue by restoring the original behavior of holding requests
> in the plug list.

Applied, thanks.

-- 
Jens Axboe



* Re: [PATCH] blk-mq: plug request for shared sbitmap
  2021-05-14  2:20 [PATCH] blk-mq: plug request for shared sbitmap Ming Lei
  2021-05-14 14:59 ` Jens Axboe
@ 2021-05-18  9:44 ` John Garry
  2021-05-18 11:16   ` Ming Lei
  1 sibling, 1 reply; 15+ messages in thread
From: John Garry @ 2021-05-18  9:44 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, Christoph Hellwig
  Cc: linux-block, Yanhui Ma, Bart Van Assche, kashyap.desai, chenxiang

On 14/05/2021 03:20, Ming Lei wrote:
> In case of shared sbitmap, requests are no longer held in the plug list
> since commit 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per
> tagset"). This makes request merging from the flushed plug list and
> batched submission impossible, and so causes a performance regression.
> 
> Yanhui reports a performance regression when running a sequential IO
> test (libaio, 16 jobs, queue depth 8 per job) in a VM, where the VM disk
> is emulated with an image stored on xfs/megaraid_sas.
> 
> Fix the issue by restoring the original behavior of holding requests
> in the plug list.

Hi Ming,

While testing v5.13-rc2, I noticed that this patch made the hang I was 
seeing disappear:
https://lore.kernel.org/linux-scsi/3d72d64d-314f-9d34-e039-7e508b2abe1b@huawei.com/

I don't think that problem is solved, though.

So I wonder about throughput performance (I had hoped to test before it 
was merged). I only have 6x SAS SSDs at hand, but I see some significant 
changes (good and bad) with mq-deadline on hisi_sas:
Before: 620K IOPS (read), 300K IOPS (randread)
After:  430K IOPS (read), 460-490K IOPS (randread)

With the none IO sched it is always about 450K (read) and 500K (randread).

Do you guys have any figures? Are my results as expected?

Thanks,
John

> 
> Cc: Yanhui Ma <yama@redhat.com>
> Cc: John Garry <john.garry@huawei.com>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: kashyap.desai@broadcom.com
> Fixes: 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per tagset")
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>   block/blk-mq.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index ae7f5ee41cd3..baf7a9546068 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2236,8 +2236,9 @@ blk_qc_t blk_mq_submit_bio(struct bio *bio)
>   		/* Bypass scheduler for flush requests */
>   		blk_insert_flush(rq);
>   		blk_mq_run_hw_queue(data.hctx, true);
> -	} else if (plug && (q->nr_hw_queues == 1 || q->mq_ops->commit_rqs ||
> -				!blk_queue_nonrot(q))) {
> +	} else if (plug && (q->nr_hw_queues == 1 ||
> +		   blk_mq_is_sbitmap_shared(rq->mq_hctx->flags) ||
> +		   q->mq_ops->commit_rqs || !blk_queue_nonrot(q))) {
>   		/*
>   		 * Use plugging if we have a ->commit_rqs() hook as well, as
>   		 * we know the driver uses bd->last in a smart fashion.
> 



* Re: [PATCH] blk-mq: plug request for shared sbitmap
  2021-05-18  9:44 ` John Garry
@ 2021-05-18 11:16   ` Ming Lei
  2021-05-18 11:42     ` John Garry
  2021-05-18 11:54     ` Hannes Reinecke
  0 siblings, 2 replies; 15+ messages in thread
From: Ming Lei @ 2021-05-18 11:16 UTC (permalink / raw)
  To: John Garry
  Cc: Jens Axboe, Christoph Hellwig, linux-block, Yanhui Ma,
	Bart Van Assche, kashyap.desai, chenxiang

On Tue, May 18, 2021 at 10:44:43AM +0100, John Garry wrote:
> On 14/05/2021 03:20, Ming Lei wrote:
> > In case of shared sbitmap, requests are no longer held in the plug list
> > since commit 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per
> > tagset"). This makes request merging from the flushed plug list and
> > batched submission impossible, and so causes a performance regression.
> > 
> > Yanhui reports a performance regression when running a sequential IO
> > test (libaio, 16 jobs, queue depth 8 per job) in a VM, where the VM disk
> > is emulated with an image stored on xfs/megaraid_sas.
> > 
> > Fix the issue by restoring the original behavior of holding requests
> > in the plug list.
> 
> Hi Ming,
> 
> Since testing v5.13-rc2, I noticed that this patch made the hang I was
> seeing disappear:
> https://lore.kernel.org/linux-scsi/3d72d64d-314f-9d34-e039-7e508b2abe1b@huawei.com/
> 
> I don't think that problem is solved, though.

This kind of hang or lockup is usually related to CPU utilization, and
this patch may reduce CPU utilization in the submission context.

> 
> So I wonder about throughput performance (I had hoped to test before it was
> merged). I only have 6x SAS SSDs at hand, but I see some significant changes
> (good and bad) for mq-deadline for hisi_sas:
> Before 620K (read), 300K IOPs (randread)
> After 430K (read), 460-490K IOPs (randread)

'Before 620K' could be caused by the queue being busy when batching
submission isn't applied, which increases the chance of merging. This
patch applies batching submission, so the queue no longer stays busy enough.

BTW, what is the sdev queue depth and the shost can_queue for your HiSilicon SAS?
 
> 
> none IO sched is always about 450K (read) and 500K (randread)
> 
> Do you guys have any figures? Are my results as expected?

In Yanhui's virt workload (qemu, libaio, dio, high queue depth, single
job), the patch can improve throughput a lot (>50%) when running
sequential writes (dio, libaio, 16 jobs) to XFS. And it is observed that
IO merging is recovered to the level seen with host tags disabled.
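
For reference, a rough fio job along the lines of that test might look
like the following (the target directory and file size here are my
assumptions, not Yanhui's exact script):

[seq-write]
rw=write
ioengine=libaio
direct=1
bs=4k
iodepth=8
numjobs=16
size=4g
directory=/mnt/xfs
group_reporting=1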

Thanks,
Ming



* Re: [PATCH] blk-mq: plug request for shared sbitmap
  2021-05-18 11:16   ` Ming Lei
@ 2021-05-18 11:42     ` John Garry
  2021-05-18 12:00       ` Ming Lei
  2021-05-18 11:54     ` Hannes Reinecke
  1 sibling, 1 reply; 15+ messages in thread
From: John Garry @ 2021-05-18 11:42 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, linux-block, Yanhui Ma,
	Bart Van Assche, kashyap.desai, chenxiang

On 18/05/2021 12:16, Ming Lei wrote:
> On Tue, May 18, 2021 at 10:44:43AM +0100, John Garry wrote:
>> On 14/05/2021 03:20, Ming Lei wrote:
>>> In case of shared sbitmap, requests are no longer held in the plug list
>>> since commit 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per
>>> tagset"). This makes request merging from the flushed plug list and
>>> batched submission impossible, and so causes a performance regression.
>>>
>>> Yanhui reports a performance regression when running a sequential IO
>>> test (libaio, 16 jobs, queue depth 8 per job) in a VM, where the VM disk
>>> is emulated with an image stored on xfs/megaraid_sas.
>>>
>>> Fix the issue by restoring the original behavior of holding requests
>>> in the plug list.
>>
>> Hi Ming,
>>
>> Since testing v5.13-rc2, I noticed that this patch made the hang I was
>> seeing disappear:
>> https://lore.kernel.org/linux-scsi/3d72d64d-314f-9d34-e039-7e508b2abe1b@huawei.com/
>>
>> I don't think that problem is solved, though.
> 
> This kind of hang or lockup is usually related with cpu utilization, and
> this patch may reduce cpu utilization in submission context.

Right.

> 
>>
>> So I wonder about throughput performance (I had hoped to test before it was
>> merged). I only have 6x SAS SSDs at hand, but I see some significant changes
>> (good and bad) for mq-deadline for hisi_sas:
>> Before 620K (read), 300K IOPs (randread)
>> After 430K (read), 460-490K IOPs (randread)
> 
> 'Before 620K' could be caused by busy queue when batching submission isn't
> applied, so merge chance is increased. This patch applies batching
> submission, so queue becomes not busy enough.
> 
> BTW, what is the queue depth of sdev and can_queue of shost for your hisilision SAS?

The sdev queue depth is 64 (see hisi_sas_slave_configure()) and the host 
depth is 4096 - 96 (for reserved tags) = 4000.

IIRC, megaraid_sas and mpt3sas use 256 for the SAS sdev queue depth.
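
If it helps to compare against other setups, both values can be read
from sysfs at runtime (the host number and disk name below are only
examples):

cat /sys/block/sda/device/queue_depth
cat /sys/class/scsi_host/host0/can_queue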

>   
>>
>> none IO sched is always about 450K (read) and 500K (randread)
>>
>> Do you guys have any figures? Are my results as expected?
> 
> In yanhui's virt workload(qemu, libaio, dio, high queue depth, single
> job), the patch can improve throughput much(>50%) when running
> sequential write(dio, libaio, 16 jobs) to XFS. And it is observed that
> IO merge is recovered to level of disabling host tags.
> 

Thanks,
John


* Re: [PATCH] blk-mq: plug request for shared sbitmap
  2021-05-18 11:16   ` Ming Lei
  2021-05-18 11:42     ` John Garry
@ 2021-05-18 11:54     ` Hannes Reinecke
  2021-05-18 12:37       ` John Garry
  1 sibling, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2021-05-18 11:54 UTC (permalink / raw)
  To: Ming Lei, John Garry
  Cc: Jens Axboe, Christoph Hellwig, linux-block, Yanhui Ma,
	Bart Van Assche, kashyap.desai, chenxiang

On 5/18/21 1:16 PM, Ming Lei wrote:
> On Tue, May 18, 2021 at 10:44:43AM +0100, John Garry wrote:
>> On 14/05/2021 03:20, Ming Lei wrote:
>>> In case of shared sbitmap, requests are no longer held in the plug list
>>> since commit 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per
>>> tagset"). This makes request merging from the flushed plug list and
>>> batched submission impossible, and so causes a performance regression.
>>>
>>> Yanhui reports a performance regression when running a sequential IO
>>> test (libaio, 16 jobs, queue depth 8 per job) in a VM, where the VM disk
>>> is emulated with an image stored on xfs/megaraid_sas.
>>>
>>> Fix the issue by restoring the original behavior of holding requests
>>> in the plug list.
>>
>> Hi Ming,
>>
>> Since testing v5.13-rc2, I noticed that this patch made the hang I was
>> seeing disappear:
>> https://lore.kernel.org/linux-scsi/3d72d64d-314f-9d34-e039-7e508b2abe1b@huawei.com/
>>
>> I don't think that problem is solved, though.
> 
> This kind of hang or lockup is usually related with cpu utilization, and
> this patch may reduce cpu utilization in submission context.
> 
>>
>> So I wonder about throughput performance (I had hoped to test before it was
>> merged). I only have 6x SAS SSDs at hand, but I see some significant changes
>> (good and bad) for mq-deadline for hisi_sas:
>> Before 620K (read), 300K IOPs (randread)
>> After 430K (read), 460-490K IOPs (randread)
> 
> 'Before 620K' could be caused by busy queue when batching submission isn't
> applied, so merge chance is increased. This patch applies batching
> submission, so queue becomes not busy enough.
> 
> BTW, what is the queue depth of sdev and can_queue of shost for your hisilision SAS?
>  
>>
>> none IO sched is always about 450K (read) and 500K (randread)
>>
>> Do you guys have any figures? Are my results as expected?
> 
> In yanhui's virt workload(qemu, libaio, dio, high queue depth, single
> job), the patch can improve throughput much(>50%) when running
> sequential write(dio, libaio, 16 jobs) to XFS. And it is observed that
> IO merge is recovered to level of disabling host tags.
> 
I've found a good testcase for this by using null_blk:

modprobe null_blk submit_queues=32 queue_mode=2 irqmode=0 bs=4096
hw_queue_depth=2048 completion_nsec=1 nr_devices=4 shared_tags=1
shared_tag_bitmap=1

and using a simple fio job with libaio and rw=read and numjobs=32 will
do the trick:

[nullb]
rw=read
size=16g
ioengine=libaio
direct=1
buffered=0
group_reporting=1
bs=4096
numjobs=32

(Of course, the 'numjobs' and 'submit_queues' parameters would need to be
adjusted to your machine.)
Omitting the 'shared_tag_bitmap' module parameter yields around 12M
IOPS; adding it causes a rather dramatic drop to 172k IOPS.
With this patchset it's back to the expected IOPS value.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		        Kernel Storage Architect
hare@suse.de			               +49 911 74053 688
SUSE Software Solutions Germany GmbH, 90409 Nürnberg
GF: F. Imendörffer, HRB 36809 (AG Nürnberg)


* Re: [PATCH] blk-mq: plug request for shared sbitmap
  2021-05-18 11:42     ` John Garry
@ 2021-05-18 12:00       ` Ming Lei
  2021-05-18 12:51         ` John Garry
  0 siblings, 1 reply; 15+ messages in thread
From: Ming Lei @ 2021-05-18 12:00 UTC (permalink / raw)
  To: John Garry
  Cc: Jens Axboe, Christoph Hellwig, linux-block, Yanhui Ma,
	Bart Van Assche, kashyap.desai, chenxiang

On Tue, May 18, 2021 at 12:42:22PM +0100, John Garry wrote:
> On 18/05/2021 12:16, Ming Lei wrote:
> > On Tue, May 18, 2021 at 10:44:43AM +0100, John Garry wrote:
> > > On 14/05/2021 03:20, Ming Lei wrote:
> > > > In case of shared sbitmap, requests are no longer held in the plug list
> > > > since commit 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per
> > > > tagset"). This makes request merging from the flushed plug list and
> > > > batched submission impossible, and so causes a performance regression.
> > > > 
> > > > Yanhui reports a performance regression when running a sequential IO
> > > > test (libaio, 16 jobs, queue depth 8 per job) in a VM, where the VM disk
> > > > is emulated with an image stored on xfs/megaraid_sas.
> > > > 
> > > > Fix the issue by restoring the original behavior of holding requests
> > > > in the plug list.
> > > 
> > > Hi Ming,
> > > 
> > > Since testing v5.13-rc2, I noticed that this patch made the hang I was
> > > seeing disappear:
> > > https://lore.kernel.org/linux-scsi/3d72d64d-314f-9d34-e039-7e508b2abe1b@huawei.com/
> > > 
> > > I don't think that problem is solved, though.
> > 
> > This kind of hang or lockup is usually related with cpu utilization, and
> > this patch may reduce cpu utilization in submission context.
> 
> Right.
> 
> > 
> > > 
> > > So I wonder about throughput performance (I had hoped to test before it was
> > > merged). I only have 6x SAS SSDs at hand, but I see some significant changes
> > > (good and bad) for mq-deadline for hisi_sas:
> > > Before 620K (read), 300K IOPs (randread)
> > > After 430K (read), 460-490K IOPs (randread)
> > 
> > 'Before 620K' could be caused by busy queue when batching submission isn't
> > applied, so merge chance is increased. This patch applies batching
> > submission, so queue becomes not busy enough.
> > 
> > BTW, what is the queue depth of sdev and can_queue of shost for your hisilision SAS?
> 
> sdev queue depth is 64 (see hisi_sas_slave_configure()) and host depth is
> 4096 - 96 (for reserved tags) = 4000

OK, so that queue depth should get IO merged if there are too many
requests queued.

What is the result of the same read test without shared tags? Still 430K?
And what is your exact read test script? And how many CPU cores are in
your system?
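
BTW, a quick way to see how much merging actually happens during the run
is to watch the merge counters while the test is in flight, something
like (the device name is just an example):

iostat -x 1 /dev/sda

and look at the rrqm/s column.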


Thanks,
Ming



* Re: [PATCH] blk-mq: plug request for shared sbitmap
  2021-05-18 11:54     ` Hannes Reinecke
@ 2021-05-18 12:37       ` John Garry
  2021-05-18 13:22         ` Hannes Reinecke
  0 siblings, 1 reply; 15+ messages in thread
From: John Garry @ 2021-05-18 12:37 UTC (permalink / raw)
  To: Hannes Reinecke, Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, linux-block, Yanhui Ma,
	Bart Van Assche, kashyap.desai, chenxiang

On 18/05/2021 12:54, Hannes Reinecke wrote:
>> In yanhui's virt workload(qemu, libaio, dio, high queue depth, single
>> job), the patch can improve throughput much(>50%) when running
>> sequential write(dio, libaio, 16 jobs) to XFS. And it is observed that
>> IO merge is recovered to level of disabling host tags.
>>
> I've found a good testcase for this by using null_blk:
> 
> modprobe null_blk submit_queues=32 queue_mode=2 irqmode=0 bs=4096
> hw_queue_depth=2048 completion_nsec=1 nr_devices=4 shared_tags=1
> shared_tag_bitmap=1
> 
> and using a simple fio job with libaio and rw=read and numjobs=32 will
> do the trick:
> 
> [nullb]
> rw=read
> size=16g
> ioengine=libaio
> direct=1
> buffered=0
> group_reporting=1
> bs=4096
> numjobs=32
> 
> (of course, the 'numjobs' and 'submit_queues' parameter would need to be
> adjusted to your machine).
> Omitting the 'shared_tag_bitmap' module parameter would yield around 12M
> iops; adding it would see a rather dramatic drop to 172k iops.
> With this patchset it's back to the expected iops value.

I will consider running this test myself, but do you mean that you get 
~12M with shared_tag_bitmap=1 and this patch?

I would always expect shared_tag_bitmap=1 to give a drop for null_blk, 
as we move from per-submit-queue driver tags to all submit queues sharing 
the same driver tagset.

And I am not sure if you are changing the IO scheduler from the default.

Thanks,
John


* Re: [PATCH] blk-mq: plug request for shared sbitmap
  2021-05-18 12:00       ` Ming Lei
@ 2021-05-18 12:51         ` John Garry
  2021-05-18 16:01           ` John Garry
  0 siblings, 1 reply; 15+ messages in thread
From: John Garry @ 2021-05-18 12:51 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, linux-block, Yanhui Ma,
	Bart Van Assche, kashyap.desai, chenxiang

On 18/05/2021 13:00, Ming Lei wrote:
>>> 'Before 620K' could be caused by busy queue when batching submission isn't
>>> applied, so merge chance is increased. This patch applies batching
>>> submission, so queue becomes not busy enough.
>>>
>>> BTW, what is the queue depth of sdev and can_queue of shost for your hisilision SAS?
>> sdev queue depth is 64 (see hisi_sas_slave_configure()) and host depth is
>> 4096 - 96 (for reserved tags) = 4000
> OK, so queue depth should get IO merged if there are too many requests
> queued.
> 
> What is the same read test result without shared tags? still 430K?

I never left a driver switch in place to disable it.

I can forward-port "reply-map" support, which is not too difficult and I 
will let you know the result.

> And what is your exact read test script? And how many cpu cores in
> your system?

64 CPUs, 16x hw queues.

[global]
rw=read
direct=1
ioengine=libaio
iodepth=128
numjobs=1
bs=4k
group_reporting=1
runtime=4500
loops = 10000

[job1]
filename=/dev/sda:
[job1]
filename=/dev/sdc:
[job1]
filename=/dev/sde:
[job1]
filename=/dev/sdf:
[job1]
filename=/dev/sdg:
[job1]
filename=/dev/sdh:

Thanks,
John


* Re: [PATCH] blk-mq: plug request for shared sbitmap
  2021-05-18 12:37       ` John Garry
@ 2021-05-18 13:22         ` Hannes Reinecke
  0 siblings, 0 replies; 15+ messages in thread
From: Hannes Reinecke @ 2021-05-18 13:22 UTC (permalink / raw)
  To: John Garry, Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, linux-block, Yanhui Ma,
	Bart Van Assche, kashyap.desai, chenxiang

On 5/18/21 2:37 PM, John Garry wrote:
> On 18/05/2021 12:54, Hannes Reinecke wrote:
>>> In yanhui's virt workload(qemu, libaio, dio, high queue depth, single
>>> job), the patch can improve throughput much(>50%) when running
>>> sequential write(dio, libaio, 16 jobs) to XFS. And it is observed that
>>> IO merge is recovered to level of disabling host tags.
>>>
>> I've found a good testcase for this by using null_blk:
>>
>> modprobe null_blk submit_queues=32 queue_mode=2 irqmode=0 bs=4096
>> hw_queue_depth=2048 completion_nsec=1 nr_devices=4 shared_tags=1
>> shared_tag_bitmap=1
>>
>> and using a simple fio job with libaio and rw=read and numjobs=32 will
>> do the trick:
>>
>> [nullb]
>> rw=read
>> size=16g
>> ioengine=libaio
>> direct=1
>> buffered=0
>> group_reporting=1
>> bs=4096
>> numjobs=32
>>
>> (of course, the 'numjobs' and 'submit_queues' parameter would need to be
>> adjusted to your machine).
>> Omitting the 'shared_tag_bitmap' module parameter would yield around 12M
>> iops; adding it would see a rather dramatic drop to 172k iops.
>> With this patchset it's back to the expected iops value.
> 
> I will consider running this test myself, but do you mean that you get
> ~12M with shared_tag_bitmap=1 and this patch?
> 
> I would always expect shared_tag_bitmap=1 to give a drop for null_blk,
> as we move from per-submit queue driver tag to all submit queues sharing
> the same driver tagset.
> 
There is a slight drop (less than 1M IOPS), but that is to be expected;
however, as there is rather high fluctuation in the numbers, I didn't
include it.

But nothing as dramatic as without this patchset :-)

> And I am not sure if you are fixing the IO sched from default.
> 
No, I'm not; I'm using whatever is there (i.e. mq-deadline).

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		        Kernel Storage Architect
hare@suse.de			               +49 911 74053 688
SUSE Software Solutions Germany GmbH, 90409 Nürnberg
GF: F. Imendörffer, HRB 36809 (AG Nürnberg)


* Re: [PATCH] blk-mq: plug request for shared sbitmap
  2021-05-18 12:51         ` John Garry
@ 2021-05-18 16:01           ` John Garry
  2021-05-19  0:21             ` Ming Lei
  0 siblings, 1 reply; 15+ messages in thread
From: John Garry @ 2021-05-18 16:01 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, linux-block, Yanhui Ma,
	Bart Van Assche, kashyap.desai, chenxiang

On 18/05/2021 13:51, John Garry wrote:
> n 18/05/2021 13:00, Ming Lei wrote:
>>>> 'Before 620K' could be caused by busy queue when batching submission 
>>>> isn't
>>>> applied, so merge chance is increased. This patch applies batching
>>>> submission, so queue becomes not busy enough.
>>>>
>>>> BTW, what is the queue depth of sdev and can_queue of shost for your 
>>>> hisilision SAS?
>>> sdev queue depth is 64 (see hisi_sas_slave_configure()) and host 
>>> depth is
>>> 4096 - 96 (for reserved tags) = 4000
>> OK, so queue depth should get IO merged if there are too many requests
>> queued.
>>
>> What is the same read test result without shared tags? still 430K?
> 
> I never left a driver switch in place to disable it.
> 
> I can forward-port "reply-map" support, which is not too difficult and I 
> will let you know the result.

The 'after' results are similar to those without shared sbitmap, i.e. 
using reply-map:

reply-map:
450K (read), 430K IOPs (randread)

For reference, with shared sbitmap:
Before 620K (read), 300K IOPs (randread)
After  460K (read), 430K (randread)*

These are all mq-deadline.

* I mixed up the read and randread results earlier by accident

> 
>> And what is your exact read test script? And how many cpu cores in
>> your system?
> 

Thanks,
John


* Re: [PATCH] blk-mq: plug request for shared sbitmap
  2021-05-18 16:01           ` John Garry
@ 2021-05-19  0:21             ` Ming Lei
  2021-05-19  8:41               ` John Garry
  0 siblings, 1 reply; 15+ messages in thread
From: Ming Lei @ 2021-05-19  0:21 UTC (permalink / raw)
  To: John Garry
  Cc: Jens Axboe, Christoph Hellwig, linux-block, Yanhui Ma,
	Bart Van Assche, kashyap.desai, chenxiang

On Tue, May 18, 2021 at 05:01:16PM +0100, John Garry wrote:
> On 18/05/2021 13:51, John Garry wrote:
> > n 18/05/2021 13:00, Ming Lei wrote:
> > > > > 'Before 620K' could be caused by busy queue when batching
> > > > > submission isn't
> > > > > applied, so merge chance is increased. This patch applies batching
> > > > > submission, so queue becomes not busy enough.
> > > > > 
> > > > > BTW, what is the queue depth of sdev and can_queue of shost
> > > > > for your hisilision SAS?
> > > > sdev queue depth is 64 (see hisi_sas_slave_configure()) and host
> > > > depth is
> > > > 4096 - 96 (for reserved tags) = 4000
> > > OK, so queue depth should get IO merged if there are too many requests
> > > queued.
> > > 
> > > What is the same read test result without shared tags? still 430K?
> > 
> > I never left a driver switch in place to disable it.
> > 
> > I can forward-port "reply-map" support, which is not too difficult and I
> > will let you know the result.
> 
> The 'after' results are similar to without shared sbitmap, i.e using
> reply-map:
> 
> reply-map:
> 450K (read), 430K IOPs (randread)

OK, that is the expected result. With shared sbitmap, IO merging gets
improved when batching submission is bypassed; meanwhile, IOPS of random
IO drops because CPU utilization is increased.

So that isn't a regression; let's live with this awkward situation, :-(


Thanks,
Ming



* Re: [PATCH] blk-mq: plug request for shared sbitmap
  2021-05-19  0:21             ` Ming Lei
@ 2021-05-19  8:41               ` John Garry
  2021-05-20  1:23                 ` Ming Lei
  0 siblings, 1 reply; 15+ messages in thread
From: John Garry @ 2021-05-19  8:41 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, linux-block, Yanhui Ma,
	Bart Van Assche, kashyap.desai, chenxiang

On 19/05/2021 01:21, Ming Lei wrote:
>> The 'after' results are similar to without shared sbitmap, i.e using
>> reply-map:
>>
>> reply-map:
>> 450K (read), 430K IOPs (randread)
> OK, that is expected result. After shared sbitmap, IO merge gets improved
> when batching submission is bypassed, meantime IOPS of random IO drops
> because cpu utilization is increased.
> 
> So that isn't a regression, let's live with this awkward situation,:-(

Well, at least we have ~parity with the non-shared sbitmap now. And we 
also know higher performance is possible for the "read" (vs "randread") 
scenario, FWIW.

BTW, recently we have seen 2x optimisation/improvement for shared 
sbitmap which were under/related to the nr_hw_queues == 1 check - this 
patch and the change of the default IO sched.

I am wondering how you detected/analyzed this issue, and whether we need 
to audit other nr_hw_queues == 1 checks? I did a quick scan, and the 
only possible thing I see is the other q->nr_hw_queues > 1 check for 
direct issue in blk_mq_submit_bio() - I suspect you know more about that 
topic.
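
As a quick way to enumerate the remaining checks, something like the
following can be run against the tree (the exact hits will of course
vary with the kernel version):

git grep -n 'nr_hw_queues == 1' block/
git grep -n 'nr_hw_queues > 1' block/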

Thanks,
John



* Re: [PATCH] blk-mq: plug request for shared sbitmap
  2021-05-19  8:41               ` John Garry
@ 2021-05-20  1:23                 ` Ming Lei
  2021-05-20  8:21                   ` John Garry
  0 siblings, 1 reply; 15+ messages in thread
From: Ming Lei @ 2021-05-20  1:23 UTC (permalink / raw)
  To: John Garry
  Cc: Jens Axboe, Christoph Hellwig, linux-block, Yanhui Ma,
	Bart Van Assche, kashyap.desai, chenxiang

On Wed, May 19, 2021 at 09:41:01AM +0100, John Garry wrote:
> On 19/05/2021 01:21, Ming Lei wrote:
> > > The 'after' results are similar to without shared sbitmap, i.e using
> > > reply-map:
> > > 
> > > reply-map:
> > > 450K (read), 430K IOPs (randread)
> > OK, that is expected result. After shared sbitmap, IO merge gets improved
> > when batching submission is bypassed, meantime IOPS of random IO drops
> > because cpu utilization is increased.
> > 
> > So that isn't a regression, let's live with this awkward situation,:-(
> 
> Well at least we have ~ parity with non-shared sbitmap now. And also know
> higher performance is possible for "read" (vs "randread") scenario, FWIW.

NVMe too, but I have never seen it become true, :-)

> 
> BTW, recently we have seen 2x optimisation/improvement for shared sbitmap
> which were under/related to nr_hw_queues == 1 check - this patch and the
> changing of the default IO sched.

You mean you saw a 2X improvement on your HiSilicon SAS compared with
non-shared sbitmap? In Yanhui's virt test, we just bring the performance
back to the non-shared sbitmap level.

> 
> I am wondering how you detected/analyzed this issue, and whether we need to
> audit other nr_hw_queues == 1 checks? I did a quick scan, and the only
> possible thing I see is the other q->nr_hw_queues > 1 check for direct issue
> in blk_mq_subit_bio() - I suspect you know more about that topic.

IMO, the direct issue code path is fine.

For slow devices, we won't use the 'none' scheduler, so that code path
can't be reached.

For fast devices, direct issue is often a win.

Also, it looks like your test with 'none' has shown that the perf is fine.

Thanks,
Ming



* Re: [PATCH] blk-mq: plug request for shared sbitmap
  2021-05-20  1:23                 ` Ming Lei
@ 2021-05-20  8:21                   ` John Garry
  0 siblings, 0 replies; 15+ messages in thread
From: John Garry @ 2021-05-20  8:21 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, linux-block, Yanhui Ma,
	Bart Van Assche, kashyap.desai, chenxiang

On 20/05/2021 02:23, Ming Lei wrote:
>> BTW, recently we have seen 2x optimisation/improvement for shared sbitmap
>> which were under/related to nr_hw_queues == 1 check - this patch and the
>> changing of the default IO sched.
> You mean you saw 2X improvement in your hisilicon SAS compared with
> non-shared sbitmap? In Yanhui's virt test, we just bring back the perf
> to non-shared sbitmap's level.
> 

Sorry, I meant that one improvement is using mq-deadline by default, and 
the other improvement is the change in this patch. I didn't mean a 
doubling in throughput.

Thanks,
John


