* [PATCH] blk-mq: plug request for shared sbitmap
@ 2021-05-14  2:20 Ming Lei
  2021-05-14 14:59 ` Jens Axboe
  2021-05-18  9:44 ` John Garry

From: Ming Lei @ 2021-05-14  2:20 UTC (permalink / raw)
To: Jens Axboe, Christoph Hellwig
Cc: linux-block, Ming Lei, Yanhui Ma, John Garry, Bart Van Assche, kashyap.desai

In case of shared sbitmap, requests are no longer held in the plug
list since commit 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap
per tagset"). This makes request merging from the flush plug list and
batched submission impossible, causing a performance regression.

Yanhui reports a performance regression when running a sequential IO
test (libaio, 16 jobs, queue depth 8 per job) in a VM, where the VM
disk is emulated with an image stored on xfs/megaraid_sas.

Fix the issue by restoring the original behavior of holding requests
in the plug list.

Cc: Yanhui Ma <yama@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: kashyap.desai@broadcom.com
Fixes: 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per tagset")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ae7f5ee41cd3..baf7a9546068 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2236,8 +2236,9 @@ blk_qc_t blk_mq_submit_bio(struct bio *bio)
 		/* Bypass scheduler for flush requests */
 		blk_insert_flush(rq);
 		blk_mq_run_hw_queue(data.hctx, true);
-	} else if (plug && (q->nr_hw_queues == 1 || q->mq_ops->commit_rqs ||
-			!blk_queue_nonrot(q))) {
+	} else if (plug && (q->nr_hw_queues == 1 ||
+		   blk_mq_is_sbitmap_shared(rq->mq_hctx->flags) ||
+		   q->mq_ops->commit_rqs || !blk_queue_nonrot(q))) {
 		/*
 		 * Use plugging if we have a ->commit_rqs() hook as well, as
 		 * we know the driver uses bd->last in a smart fashion.
-- 
2.29.2
* Re: [PATCH] blk-mq: plug request for shared sbitmap
From: Jens Axboe @ 2021-05-14 14:59 UTC (permalink / raw)
To: Ming Lei, Christoph Hellwig
Cc: linux-block, Yanhui Ma, John Garry, Bart Van Assche, kashyap.desai

On 5/13/21 8:20 PM, Ming Lei wrote:
> In case of shared sbitmap, requests are no longer held in the plug
> list since commit 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap
> per tagset"). This makes request merging from the flush plug list and
> batched submission impossible, causing a performance regression.
>
> Yanhui reports a performance regression when running a sequential IO
> test (libaio, 16 jobs, queue depth 8 per job) in a VM, where the VM
> disk is emulated with an image stored on xfs/megaraid_sas.
>
> Fix the issue by restoring the original behavior of holding requests
> in the plug list.

Applied, thanks.

-- 
Jens Axboe
* Re: [PATCH] blk-mq: plug request for shared sbitmap
From: John Garry @ 2021-05-18  9:44 UTC (permalink / raw)
To: Ming Lei, Jens Axboe, Christoph Hellwig
Cc: linux-block, Yanhui Ma, Bart Van Assche, kashyap.desai, chenxiang

On 14/05/2021 03:20, Ming Lei wrote:
> In case of shared sbitmap, requests are no longer held in the plug
> list since commit 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap
> per tagset"). This makes request merging from the flush plug list and
> batched submission impossible, causing a performance regression.
>
> Yanhui reports a performance regression when running a sequential IO
> test (libaio, 16 jobs, queue depth 8 per job) in a VM, where the VM
> disk is emulated with an image stored on xfs/megaraid_sas.
>
> Fix the issue by restoring the original behavior of holding requests
> in the plug list.

Hi Ming,

Since testing v5.13-rc2, I noticed that this patch made the hang I was
seeing disappear:
https://lore.kernel.org/linux-scsi/3d72d64d-314f-9d34-e039-7e508b2abe1b@huawei.com/

I don't think that problem is solved, though.

So I wonder about throughput performance (I had hoped to test before it
was merged). I only have 6x SAS SSDs at hand, but I see some
significant changes (good and bad) for mq-deadline for hisi_sas:

Before: 620K (read), 300K IOPS (randread)
After:  430K (read), 460-490K IOPS (randread)

The none IO sched is always about 450K (read) and 500K (randread).

Do you guys have any figures? Are my results as expected?

Thanks,
John

> Cc: Yanhui Ma <yama@redhat.com>
> Cc: John Garry <john.garry@huawei.com>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: kashyap.desai@broadcom.com
> Fixes: 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per tagset")
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-mq.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index ae7f5ee41cd3..baf7a9546068 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2236,8 +2236,9 @@ blk_qc_t blk_mq_submit_bio(struct bio *bio)
>  		/* Bypass scheduler for flush requests */
>  		blk_insert_flush(rq);
>  		blk_mq_run_hw_queue(data.hctx, true);
> -	} else if (plug && (q->nr_hw_queues == 1 || q->mq_ops->commit_rqs ||
> -			!blk_queue_nonrot(q))) {
> +	} else if (plug && (q->nr_hw_queues == 1 ||
> +		   blk_mq_is_sbitmap_shared(rq->mq_hctx->flags) ||
> +		   q->mq_ops->commit_rqs || !blk_queue_nonrot(q))) {
>  		/*
>  		 * Use plugging if we have a ->commit_rqs() hook as well, as
>  		 * we know the driver uses bd->last in a smart fashion.
* Re: [PATCH] blk-mq: plug request for shared sbitmap
From: Ming Lei @ 2021-05-18 11:16 UTC (permalink / raw)
To: John Garry
Cc: Jens Axboe, Christoph Hellwig, linux-block, Yanhui Ma, Bart Van Assche, kashyap.desai, chenxiang

On Tue, May 18, 2021 at 10:44:43AM +0100, John Garry wrote:
> On 14/05/2021 03:20, Ming Lei wrote:
> > In case of shared sbitmap, requests are no longer held in the plug
> > list since commit 32bc15afed04 ("blk-mq: Facilitate a shared
> > sbitmap per tagset"). This makes request merging from the flush
> > plug list and batched submission impossible, causing a performance
> > regression.
> >
> > Yanhui reports a performance regression when running a sequential
> > IO test (libaio, 16 jobs, queue depth 8 per job) in a VM, where the
> > VM disk is emulated with an image stored on xfs/megaraid_sas.
> >
> > Fix the issue by restoring the original behavior of holding
> > requests in the plug list.
>
> Hi Ming,
>
> Since testing v5.13-rc2, I noticed that this patch made the hang I
> was seeing disappear:
> https://lore.kernel.org/linux-scsi/3d72d64d-314f-9d34-e039-7e508b2abe1b@huawei.com/
>
> I don't think that problem is solved, though.

This kind of hang or lockup is usually related to CPU utilization, and
this patch may reduce CPU utilization in the submission context.

> So I wonder about throughput performance (I had hoped to test before
> it was merged). I only have 6x SAS SSDs at hand, but I see some
> significant changes (good and bad) for mq-deadline for hisi_sas:
> Before: 620K (read), 300K IOPS (randread)
> After:  430K (read), 460-490K IOPS (randread)

The 'Before 620K' could be caused by a busy queue when batched
submission isn't applied, so the chance of merging is increased. This
patch applies batched submission, so the queue no longer becomes busy
enough.

BTW, what is the queue depth of the sdev and can_queue of the shost
for your HiSilicon SAS?

> The none IO sched is always about 450K (read) and 500K (randread).
>
> Do you guys have any figures? Are my results as expected?

In Yanhui's virt workload (qemu, libaio, dio, high queue depth, single
job), the patch can improve throughput much (>50%) when running
sequential write (dio, libaio, 16 jobs) to XFS. And it is observed
that IO merging is recovered to the level of disabling host tags.

Thanks,
Ming
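[Editor's note: the virt workload described above is only characterized loosely (qemu, libaio, dio, 16 jobs, sequential write to XFS). A fio job approximating it might look like the sketch below; the block size, file size, and target directory are assumptions, not taken from the report.]

```ini
; Rough sketch of the reported VM workload - illustrative values only
[global]
rw=write          ; sequential write, as described
direct=1          ; dio
ioengine=libaio
iodepth=8         ; "8 depth for each job" from the original report
numjobs=16        ; "16 jobs"
bs=4k             ; assumed
size=1g           ; assumed
directory=/mnt/xfs ; assumed mount point of the XFS filesystem
group_reporting=1

[seqwrite]
```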
* Re: [PATCH] blk-mq: plug request for shared sbitmap
From: John Garry @ 2021-05-18 11:42 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, Christoph Hellwig, linux-block, Yanhui Ma, Bart Van Assche, kashyap.desai, chenxiang

On 18/05/2021 12:16, Ming Lei wrote:
> On Tue, May 18, 2021 at 10:44:43AM +0100, John Garry wrote:
>> On 14/05/2021 03:20, Ming Lei wrote:
>>> In case of shared sbitmap, requests are no longer held in the plug
>>> list since commit 32bc15afed04 ("blk-mq: Facilitate a shared
>>> sbitmap per tagset"). This makes request merging from the flush
>>> plug list and batched submission impossible, causing a performance
>>> regression.
>>>
>>> Yanhui reports a performance regression when running a sequential
>>> IO test (libaio, 16 jobs, queue depth 8 per job) in a VM, where the
>>> VM disk is emulated with an image stored on xfs/megaraid_sas.
>>>
>>> Fix the issue by restoring the original behavior of holding
>>> requests in the plug list.
>>
>> Hi Ming,
>>
>> Since testing v5.13-rc2, I noticed that this patch made the hang I
>> was seeing disappear:
>> https://lore.kernel.org/linux-scsi/3d72d64d-314f-9d34-e039-7e508b2abe1b@huawei.com/
>>
>> I don't think that problem is solved, though.
>
> This kind of hang or lockup is usually related to CPU utilization,
> and this patch may reduce CPU utilization in the submission context.

Right.

>> So I wonder about throughput performance (I had hoped to test before
>> it was merged). I only have 6x SAS SSDs at hand, but I see some
>> significant changes (good and bad) for mq-deadline for hisi_sas:
>> Before: 620K (read), 300K IOPS (randread)
>> After:  430K (read), 460-490K IOPS (randread)
>
> The 'Before 620K' could be caused by a busy queue when batched
> submission isn't applied, so the chance of merging is increased. This
> patch applies batched submission, so the queue no longer becomes busy
> enough.
>
> BTW, what is the queue depth of the sdev and can_queue of the shost
> for your HiSilicon SAS?

The sdev queue depth is 64 (see hisi_sas_slave_configure()) and the
host depth is 4096 - 96 (for reserved tags) = 4000.

IIRC, megaraid_sas and mpt3sas use 256 for the SAS sdev queue depth.

>> The none IO sched is always about 450K (read) and 500K (randread).
>>
>> Do you guys have any figures? Are my results as expected?
>
> In Yanhui's virt workload (qemu, libaio, dio, high queue depth,
> single job), the patch can improve throughput much (>50%) when
> running sequential write (dio, libaio, 16 jobs) to XFS. And it is
> observed that IO merging is recovered to the level of disabling host
> tags.

Thanks,
John
* Re: [PATCH] blk-mq: plug request for shared sbitmap
From: Ming Lei @ 2021-05-18 12:00 UTC (permalink / raw)
To: John Garry
Cc: Jens Axboe, Christoph Hellwig, linux-block, Yanhui Ma, Bart Van Assche, kashyap.desai, chenxiang

On Tue, May 18, 2021 at 12:42:22PM +0100, John Garry wrote:
> On 18/05/2021 12:16, Ming Lei wrote:
> > On Tue, May 18, 2021 at 10:44:43AM +0100, John Garry wrote:
> > > So I wonder about throughput performance (I had hoped to test
> > > before it was merged). I only have 6x SAS SSDs at hand, but I see
> > > some significant changes (good and bad) for mq-deadline for
> > > hisi_sas:
> > > Before: 620K (read), 300K IOPS (randread)
> > > After:  430K (read), 460-490K IOPS (randread)
> >
> > The 'Before 620K' could be caused by a busy queue when batched
> > submission isn't applied, so the chance of merging is increased.
> > This patch applies batched submission, so the queue no longer
> > becomes busy enough.
> >
> > BTW, what is the queue depth of the sdev and can_queue of the shost
> > for your HiSilicon SAS?
>
> The sdev queue depth is 64 (see hisi_sas_slave_configure()) and the
> host depth is 4096 - 96 (for reserved tags) = 4000.

OK, so that queue depth should get IO merged if too many requests are
queued.

What is the result of the same read test without shared tags? Still
430K?

And what is your exact read test script? And how many CPU cores are in
your system?

Thanks,
Ming
* Re: [PATCH] blk-mq: plug request for shared sbitmap
From: John Garry @ 2021-05-18 12:51 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, Christoph Hellwig, linux-block, Yanhui Ma, Bart Van Assche, kashyap.desai, chenxiang

On 18/05/2021 13:00, Ming Lei wrote:
>>> The 'Before 620K' could be caused by a busy queue when batched
>>> submission isn't applied, so the chance of merging is increased.
>>> This patch applies batched submission, so the queue no longer
>>> becomes busy enough.
>>>
>>> BTW, what is the queue depth of the sdev and can_queue of the shost
>>> for your HiSilicon SAS?
>> The sdev queue depth is 64 (see hisi_sas_slave_configure()) and the
>> host depth is 4096 - 96 (for reserved tags) = 4000.
> OK, so that queue depth should get IO merged if too many requests are
> queued.
>
> What is the result of the same read test without shared tags? Still
> 430K?

I never left a driver switch in place to disable it.

I can forward-port "reply-map" support, which is not too difficult,
and I will let you know the result.

> And what is your exact read test script? And how many CPU cores are
> in your system?

64 CPUs, 16x hw queues.

[global]
rw=read
direct=1
ioengine=libaio
iodepth=128
numjobs=1
bs=4k
group_reporting=1
runtime=4500
loops=10000

[job1]
filename=/dev/sda:
[job1]
filename=/dev/sdc:
[job1]
filename=/dev/sde:
[job1]
filename=/dev/sdf:
[job1]
filename=/dev/sdg:
[job1]
filename=/dev/sdh:

Thanks,
John
* Re: [PATCH] blk-mq: plug request for shared sbitmap
From: John Garry @ 2021-05-18 16:01 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, Christoph Hellwig, linux-block, Yanhui Ma, Bart Van Assche, kashyap.desai, chenxiang

On 18/05/2021 13:51, John Garry wrote:
> On 18/05/2021 13:00, Ming Lei wrote:
>>>> The 'Before 620K' could be caused by a busy queue when batched
>>>> submission isn't applied, so the chance of merging is increased.
>>>> This patch applies batched submission, so the queue no longer
>>>> becomes busy enough.
>>>>
>>>> BTW, what is the queue depth of the sdev and can_queue of the
>>>> shost for your HiSilicon SAS?
>>> The sdev queue depth is 64 (see hisi_sas_slave_configure()) and the
>>> host depth is 4096 - 96 (for reserved tags) = 4000.
>> OK, so that queue depth should get IO merged if too many requests
>> are queued.
>>
>> What is the result of the same read test without shared tags? Still
>> 430K?
>
> I never left a driver switch in place to disable it.
>
> I can forward-port "reply-map" support, which is not too difficult,
> and I will let you know the result.

The 'after' results are similar to without the shared sbitmap, i.e.
using reply-map:

reply-map:
450K (read), 430K IOPS (randread)

For reference, with the shared sbitmap:
Before: 620K (read), 300K IOPS (randread)
After:  460K (read), 430K (randread)*

These are all mq-deadline.

* I mixed up the read and randread results earlier by accident.

>> And what is your exact read test script? And how many CPU cores are
>> in your system?

Thanks,
John
* Re: [PATCH] blk-mq: plug request for shared sbitmap
From: Ming Lei @ 2021-05-19  0:21 UTC (permalink / raw)
To: John Garry
Cc: Jens Axboe, Christoph Hellwig, linux-block, Yanhui Ma, Bart Van Assche, kashyap.desai, chenxiang

On Tue, May 18, 2021 at 05:01:16PM +0100, John Garry wrote:
> On 18/05/2021 13:51, John Garry wrote:
> > On 18/05/2021 13:00, Ming Lei wrote:
> > > > > The 'Before 620K' could be caused by a busy queue when
> > > > > batched submission isn't applied, so the chance of merging is
> > > > > increased. This patch applies batched submission, so the
> > > > > queue no longer becomes busy enough.
> > > > >
> > > > > BTW, what is the queue depth of the sdev and can_queue of the
> > > > > shost for your HiSilicon SAS?
> > > > The sdev queue depth is 64 (see hisi_sas_slave_configure()) and
> > > > the host depth is 4096 - 96 (for reserved tags) = 4000.
> > > OK, so that queue depth should get IO merged if too many requests
> > > are queued.
> > >
> > > What is the result of the same read test without shared tags?
> > > Still 430K?
> >
> > I never left a driver switch in place to disable it.
> >
> > I can forward-port "reply-map" support, which is not too difficult,
> > and I will let you know the result.
>
> The 'after' results are similar to without the shared sbitmap, i.e.
> using reply-map:
>
> reply-map:
> 450K (read), 430K IOPS (randread)

OK, that is the expected result. After shared sbitmap, IO merging gets
improved when batching submission is bypassed; meantime the IOPS of
random IO drops because CPU utilization is increased.

So that isn't a regression, let's live with this awkward
situation, :-(

Thanks,
Ming
* Re: [PATCH] blk-mq: plug request for shared sbitmap
From: John Garry @ 2021-05-19  8:41 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, Christoph Hellwig, linux-block, Yanhui Ma, Bart Van Assche, kashyap.desai, chenxiang

On 19/05/2021 01:21, Ming Lei wrote:
>> The 'after' results are similar to without the shared sbitmap, i.e.
>> using reply-map:
>>
>> reply-map:
>> 450K (read), 430K IOPS (randread)
> OK, that is the expected result. After shared sbitmap, IO merging
> gets improved when batching submission is bypassed; meantime the IOPS
> of random IO drops because CPU utilization is increased.
>
> So that isn't a regression, let's live with this awkward
> situation, :-(

Well, at least we have ~parity with the non-shared sbitmap now. And we
also know higher performance is possible for the "read" (vs
"randread") scenario, FWIW.

BTW, recently we have seen 2x optimisations/improvements for the
shared sbitmap which were under/related to the nr_hw_queues == 1
check - this patch and the changing of the default IO sched. I am
wondering how you detected/analyzed this issue, and whether we need to
audit other nr_hw_queues == 1 checks?

I did a quick scan, and the only possible thing I see is the other
q->nr_hw_queues > 1 check for direct issue in blk_mq_submit_bio() - I
suspect you know more about that topic.

Thanks,
John
* Re: [PATCH] blk-mq: plug request for shared sbitmap
From: Ming Lei @ 2021-05-20  1:23 UTC (permalink / raw)
To: John Garry
Cc: Jens Axboe, Christoph Hellwig, linux-block, Yanhui Ma, Bart Van Assche, kashyap.desai, chenxiang

On Wed, May 19, 2021 at 09:41:01AM +0100, John Garry wrote:
> On 19/05/2021 01:21, Ming Lei wrote:
> > > The 'after' results are similar to without the shared sbitmap,
> > > i.e. using reply-map:
> > >
> > > reply-map:
> > > 450K (read), 430K IOPS (randread)
> > OK, that is the expected result. After shared sbitmap, IO merging
> > gets improved when batching submission is bypassed; meantime the
> > IOPS of random IO drops because CPU utilization is increased.
> >
> > So that isn't a regression, let's live with this awkward
> > situation, :-(
>
> Well, at least we have ~parity with the non-shared sbitmap now. And
> we also know higher performance is possible for the "read" (vs
> "randread") scenario, FWIW.

NVMe too, but I have never seen it become true, :-)

> BTW, recently we have seen 2x optimisations/improvements for the
> shared sbitmap which were under/related to the nr_hw_queues == 1
> check - this patch and the changing of the default IO sched.

You mean you saw a 2X improvement on your HiSilicon SAS compared with
the non-shared sbitmap? In Yanhui's virt test, we just bring back the
perf to the non-shared sbitmap's level.

> I am wondering how you detected/analyzed this issue, and whether we
> need to audit other nr_hw_queues == 1 checks? I did a quick scan, and
> the only possible thing I see is the other q->nr_hw_queues > 1 check
> for direct issue in blk_mq_submit_bio() - I suspect you know more
> about that topic.

IMO, the direct issue code path is fine. For slow devices, we won't
use none, so the code path can't be reached. For fast devices, direct
issue is often a win. Also it looks like your test with none has shown
that perf is fine.

Thanks,
Ming
* Re: [PATCH] blk-mq: plug request for shared sbitmap
From: John Garry @ 2021-05-20  8:21 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, Christoph Hellwig, linux-block, Yanhui Ma, Bart Van Assche, kashyap.desai, chenxiang

On 20/05/2021 02:23, Ming Lei wrote:
>> BTW, recently we have seen 2x optimisations/improvements for the
>> shared sbitmap which were under/related to the nr_hw_queues == 1
>> check - this patch and the changing of the default IO sched.
> You mean you saw a 2X improvement on your HiSilicon SAS compared with
> the non-shared sbitmap? In Yanhui's virt test, we just bring back the
> perf to the non-shared sbitmap's level.

Sorry, I meant that one improvement is using mq-deadline by default,
and the other improvement is the change in this patch. I didn't mean a
doubling in throughput.

Thanks,
John
* Re: [PATCH] blk-mq: plug request for shared sbitmap
From: Hannes Reinecke @ 2021-05-18 11:54 UTC (permalink / raw)
To: Ming Lei, John Garry
Cc: Jens Axboe, Christoph Hellwig, linux-block, Yanhui Ma, Bart Van Assche, kashyap.desai, chenxiang

On 5/18/21 1:16 PM, Ming Lei wrote:
> On Tue, May 18, 2021 at 10:44:43AM +0100, John Garry wrote:
>> On 14/05/2021 03:20, Ming Lei wrote:
>>> In case of shared sbitmap, requests are no longer held in the plug
>>> list since commit 32bc15afed04 ("blk-mq: Facilitate a shared
>>> sbitmap per tagset"). This makes request merging from the flush
>>> plug list and batched submission impossible, causing a performance
>>> regression.
>>>
>>> Yanhui reports a performance regression when running a sequential
>>> IO test (libaio, 16 jobs, queue depth 8 per job) in a VM, where the
>>> VM disk is emulated with an image stored on xfs/megaraid_sas.
>>>
>>> Fix the issue by restoring the original behavior of holding
>>> requests in the plug list.
>>
>> Hi Ming,
>>
>> Since testing v5.13-rc2, I noticed that this patch made the hang I
>> was seeing disappear:
>> https://lore.kernel.org/linux-scsi/3d72d64d-314f-9d34-e039-7e508b2abe1b@huawei.com/
>>
>> I don't think that problem is solved, though.
>
> This kind of hang or lockup is usually related to CPU utilization,
> and this patch may reduce CPU utilization in the submission context.
>
>> So I wonder about throughput performance (I had hoped to test before
>> it was merged). I only have 6x SAS SSDs at hand, but I see some
>> significant changes (good and bad) for mq-deadline for hisi_sas:
>> Before: 620K (read), 300K IOPS (randread)
>> After:  430K (read), 460-490K IOPS (randread)
>
> The 'Before 620K' could be caused by a busy queue when batched
> submission isn't applied, so the chance of merging is increased. This
> patch applies batched submission, so the queue no longer becomes busy
> enough.
>
> BTW, what is the queue depth of the sdev and can_queue of the shost
> for your HiSilicon SAS?
>
>> The none IO sched is always about 450K (read) and 500K (randread).
>>
>> Do you guys have any figures? Are my results as expected?
>
> In Yanhui's virt workload (qemu, libaio, dio, high queue depth,
> single job), the patch can improve throughput much (>50%) when
> running sequential write (dio, libaio, 16 jobs) to XFS. And it is
> observed that IO merging is recovered to the level of disabling host
> tags.

I've found a good testcase for this by using null_blk:

modprobe null_blk submit_queues=32 queue_mode=2 irqmode=0 bs=4096 \
	hw_queue_depth=2048 completion_nsec=1 nr_devices=4 \
	shared_tags=1 shared_tag_bitmap=1

and using a simple fio job with libaio and rw=read and numjobs=32 will
do the trick:

[nullb]
rw=read
size=16g
ioengine=libaio
direct=1
buffered=0
group_reporting=1
bs=4096
numjobs=32

(of course, the 'numjobs' and 'submit_queues' parameters would need to
be adjusted to your machine).

Omitting the 'shared_tag_bitmap' module parameter would yield around
12M iops; adding it would see a rather dramatic drop to 172k iops.
With this patchset it's back to the expected iops value.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions Germany GmbH, 90409 Nürnberg
GF: F. Imendörffer, HRB 36809 (AG Nürnberg)
* Re: [PATCH] blk-mq: plug request for shared sbitmap
From: John Garry @ 2021-05-18 12:37 UTC (permalink / raw)
To: Hannes Reinecke, Ming Lei
Cc: Jens Axboe, Christoph Hellwig, linux-block, Yanhui Ma, Bart Van Assche, kashyap.desai, chenxiang

On 18/05/2021 12:54, Hannes Reinecke wrote:
>> In Yanhui's virt workload (qemu, libaio, dio, high queue depth,
>> single job), the patch can improve throughput much (>50%) when
>> running sequential write (dio, libaio, 16 jobs) to XFS. And it is
>> observed that IO merging is recovered to the level of disabling host
>> tags.
>>
> I've found a good testcase for this by using null_blk:
>
> modprobe null_blk submit_queues=32 queue_mode=2 irqmode=0 bs=4096
> hw_queue_depth=2048 completion_nsec=1 nr_devices=4 shared_tags=1
> shared_tag_bitmap=1
>
> and using a simple fio job with libaio and rw=read and numjobs=32
> will do the trick:
>
> [nullb]
> rw=read
> size=16g
> ioengine=libaio
> direct=1
> buffered=0
> group_reporting=1
> bs=4096
> numjobs=32
>
> (of course, the 'numjobs' and 'submit_queues' parameters would need
> to be adjusted to your machine).
> Omitting the 'shared_tag_bitmap' module parameter would yield around
> 12M iops; adding it would see a rather dramatic drop to 172k iops.
> With this patchset it's back to the expected iops value.

I will consider running this test myself, but do you mean that you get
~12M with shared_tag_bitmap=1 and this patch?

I would always expect shared_tag_bitmap=1 to give a drop for null_blk,
as we move from per-submit-queue driver tags to all submit queues
sharing the same driver tagset.

And I am not sure if you are fixing the IO sched from the default.

Thanks,
John
* Re: [PATCH] blk-mq: plug request for shared sbitmap
From: Hannes Reinecke @ 2021-05-18 13:22 UTC (permalink / raw)
To: John Garry, Ming Lei
Cc: Jens Axboe, Christoph Hellwig, linux-block, Yanhui Ma, Bart Van Assche, kashyap.desai, chenxiang

On 5/18/21 2:37 PM, John Garry wrote:
> On 18/05/2021 12:54, Hannes Reinecke wrote:
>> I've found a good testcase for this by using null_blk:
>>
>> modprobe null_blk submit_queues=32 queue_mode=2 irqmode=0 bs=4096
>> hw_queue_depth=2048 completion_nsec=1 nr_devices=4 shared_tags=1
>> shared_tag_bitmap=1
>>
>> and using a simple fio job with libaio and rw=read and numjobs=32
>> will do the trick:
>>
>> [nullb]
>> rw=read
>> size=16g
>> ioengine=libaio
>> direct=1
>> buffered=0
>> group_reporting=1
>> bs=4096
>> numjobs=32
>>
>> (of course, the 'numjobs' and 'submit_queues' parameters would need
>> to be adjusted to your machine).
>> Omitting the 'shared_tag_bitmap' module parameter would yield around
>> 12M iops; adding it would see a rather dramatic drop to 172k iops.
>> With this patchset it's back to the expected iops value.
>
> I will consider running this test myself, but do you mean that you
> get ~12M with shared_tag_bitmap=1 and this patch?
>
> I would always expect shared_tag_bitmap=1 to give a drop for
> null_blk, as we move from per-submit-queue driver tags to all submit
> queues sharing the same driver tagset.

There is a slight drop (less than 1M iops), but that is to be
expected; however, as there is a rather high fluctuation in the
numbers I didn't include it. But nothing as dramatic as without this
patchset :-)

> And I am not sure if you are fixing the IO sched from the default.

No, I don't; I'm using whatever is there (i.e. mq-deadline).

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions Germany GmbH, 90409 Nürnberg
GF: F. Imendörffer, HRB 36809 (AG Nürnberg)
end of thread, other threads: [~2021-05-20  8:23 UTC | newest]

Thread overview: 15+ messages
-- links below jump to the message on this page --
2021-05-14  2:20 [PATCH] blk-mq: plug request for shared sbitmap Ming Lei
2021-05-14 14:59 ` Jens Axboe
2021-05-18  9:44 ` John Garry
2021-05-18 11:16   ` Ming Lei
2021-05-18 11:42     ` John Garry
2021-05-18 12:00       ` Ming Lei
2021-05-18 12:51         ` John Garry
2021-05-18 16:01           ` John Garry
2021-05-19  0:21             ` Ming Lei
2021-05-19  8:41               ` John Garry
2021-05-20  1:23                 ` Ming Lei
2021-05-20  8:21                   ` John Garry
2021-05-18 11:54   ` Hannes Reinecke
2021-05-18 12:37     ` John Garry
2021-05-18 13:22       ` Hannes Reinecke