* [PATCH] blk-mq: set default elevator as deadline in case of hctx shared tagset
From: Ming Lei @ 2021-04-06 3:19 UTC
To: Jens Axboe
Cc: linux-block, Ming Lei, Yanhui Ma, John Garry, Hannes Reinecke

Yanhui found that write performance is degraded a lot after applying
hctx shared tagset on one test machine with megaraid_sas. It turns out
to be caused by the none scheduler, which became the default elevator
as a result of the hctx shared tagset patchset.

Given that more SCSI HBAs will apply hctx shared tagset, a similar
performance issue exists for them too.

So keep the previous behavior by still using mq-deadline as the default
for queues which apply hctx shared tagset, just like before.

Fixes: 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per tagset")
Reported-by: Yanhui Ma <yama@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/elevator.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/elevator.c b/block/elevator.c
index 293c5c81397a..440699c28119 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -621,7 +621,8 @@ static inline bool elv_support_iosched(struct request_queue *q)
  */
 static struct elevator_type *elevator_get_default(struct request_queue *q)
 {
-	if (q->nr_hw_queues != 1)
+	if (q->nr_hw_queues != 1 &&
+	    !blk_mq_is_sbitmap_shared(q->tag_set->flags))
 		return NULL;
 
 	return elevator_get(q, "mq-deadline", false);
-- 
2.29.2
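The changed condition in elevator_get_default() can be read as a plain function of two inputs. The sketch below is a hypothetical userspace model, not kernel code: the names model_default_elevator() and enum model_sched are made up for illustration, and the shared_sbitmap flag stands in for blk_mq_is_sbitmap_shared(q->tag_set->flags). It also assumes mq-deadline is available; the real elevator_get() call can still return NULL, for example if that scheduler is not registered.

```c
#include <assert.h>
#include <stdbool.h>

/* Possible outcomes of the default-elevator choice in this model. */
enum model_sched {
	MODEL_SCHED_NONE,        /* kernel: elevator_get_default() returns NULL */
	MODEL_SCHED_MQ_DEADLINE, /* kernel: elevator_get(q, "mq-deadline", false) */
};

/*
 * Hypothetical userspace model of elevator_get_default() after the patch.
 * shared_sbitmap stands in for blk_mq_is_sbitmap_shared(q->tag_set->flags).
 */
enum model_sched model_default_elevator(unsigned int nr_hw_queues,
					bool shared_sbitmap)
{
	assert(nr_hw_queues >= 1);

	/* Several hw queues, each with its own tags: no default scheduler. */
	if (nr_hw_queues != 1 && !shared_sbitmap)
		return MODEL_SCHED_NONE;

	/*
	 * Single hw queue, or several hw queues sharing one host-wide tag
	 * bitmap: default to mq-deadline, as before the shared tagset change.
	 */
	return MODEL_SCHED_MQ_DEADLINE;
}
```

In this model a single-queue device and a multi-queue host using a shared tagset both get mq-deadline, while a multi-queue device with per-hctx tags (the case John raises in his reply) still defaults to none.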

* Re: [PATCH] blk-mq: set default elevator as deadline in case of hctx shared tagset
From: Martin K. Petersen @ 2021-04-06 3:49 UTC
To: Ming Lei
Cc: Jens Axboe, linux-block, Yanhui Ma, John Garry, Hannes Reinecke

Ming,

> Given that more SCSI HBAs will apply hctx shared tagset, a similar
> performance issue exists for them too.
>
> So keep the previous behavior by still using mq-deadline as the
> default for queues which apply hctx shared tagset, just like before.

Seems sensible to me.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

* Re: [PATCH] blk-mq: set default elevator as deadline in case of hctx shared tagset
From: Bart Van Assche @ 2021-04-06 17:54 UTC
To: Ming Lei, Jens Axboe
Cc: linux-block, Yanhui Ma, John Garry, Hannes Reinecke

On 4/5/21 8:19 PM, Ming Lei wrote:
> Yanhui found that write performance is degraded a lot after applying
> hctx shared tagset on one test machine with megaraid_sas. It turns out
> to be caused by the none scheduler, which became the default elevator
> as a result of the hctx shared tagset patchset.
>
> Given that more SCSI HBAs will apply hctx shared tagset, a similar
> performance issue exists for them too.
>
> So keep the previous behavior by still using mq-deadline as the default
> for queues which apply hctx shared tagset, just like before.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>

* Re: [PATCH] blk-mq: set default elevator as deadline in case of hctx shared tagset
From: John Garry @ 2021-04-06 22:25 UTC
To: Ming Lei, Jens Axboe
Cc: linux-block, Yanhui Ma, Hannes Reinecke

On 06/04/2021 04:19, Ming Lei wrote:

Hi Ming,

> Yanhui found that write performance is degraded a lot after applying
> hctx shared tagset on one test machine with megaraid_sas. It turns out
> to be caused by the none scheduler, which became the default elevator
> as a result of the hctx shared tagset patchset.
>
> Given that more SCSI HBAs will apply hctx shared tagset, a similar
> performance issue exists for them too.
>
> So keep the previous behavior by still using mq-deadline as the default
> for queues which apply hctx shared tagset, just like before.

I think that there are some SCSI HBAs which have nr_hw_queues > 1 and don't
use shared sbitmap - do you think that they want this as well (without
knowing it)?

IIRC, the upcoming Broadcom SCSI HBA driver does this.

Thanks,
John

>
> Fixes: 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per tagset")
> Reported-by: Yanhui Ma <yama@redhat.com>
> Cc: John Garry <john.garry@huawei.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/elevator.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/block/elevator.c b/block/elevator.c
> index 293c5c81397a..440699c28119 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -621,7 +621,8 @@ static inline bool elv_support_iosched(struct request_queue *q)
>   */
>  static struct elevator_type *elevator_get_default(struct request_queue *q)
>  {
> -	if (q->nr_hw_queues != 1)
> +	if (q->nr_hw_queues != 1 &&
> +	    !blk_mq_is_sbitmap_shared(q->tag_set->flags))
>  		return NULL;
>  
>  	return elevator_get(q, "mq-deadline", false);
>

* Re: [PATCH] blk-mq: set default elevator as deadline in case of hctx shared tagset
From: Ming Lei @ 2021-04-07 0:48 UTC
To: John Garry
Cc: Jens Axboe, linux-block, Yanhui Ma, Hannes Reinecke

On Tue, Apr 06, 2021 at 11:25:08PM +0100, John Garry wrote:
> On 06/04/2021 04:19, Ming Lei wrote:
>
> Hi Ming,
>
> > Yanhui found that write performance is degraded a lot after applying
> > hctx shared tagset on one test machine with megaraid_sas. It turns out
> > to be caused by the none scheduler, which became the default elevator
> > as a result of the hctx shared tagset patchset.
> >
> > Given that more SCSI HBAs will apply hctx shared tagset, a similar
> > performance issue exists for them too.
> >
> > So keep the previous behavior by still using mq-deadline as the default
> > for queues which apply hctx shared tagset, just like before.
>
> I think that there are some SCSI HBAs which have nr_hw_queues > 1 and don't
> use shared sbitmap - do you think that they want this as well (without
> knowing it)?

I don't know, but none has been used for them since the beginning, so that
is not a regression of shared tagset, whereas this one really is.

Thanks,
Ming

* Re: [PATCH] blk-mq: set default elevator as deadline in case of hctx shared tagset
From: John Garry @ 2021-04-07 8:04 UTC
To: Ming Lei
Cc: Jens Axboe, linux-block, Yanhui Ma, Hannes Reinecke, Kashyap Desai

Reviewed-by: John Garry <john.garry@huawei.com>


> On Tue, Apr 06, 2021 at 11:25:08PM +0100, John Garry wrote:
>> On 06/04/2021 04:19, Ming Lei wrote:
>>
>> Hi Ming,
>>
>>> Yanhui found that write performance is degraded a lot after applying
>>> hctx shared tagset on one test machine with megaraid_sas. It turns out
>>> to be caused by the none scheduler, which became the default elevator
>>> as a result of the hctx shared tagset patchset.
>>>
>>> Given that more SCSI HBAs will apply hctx shared tagset, a similar
>>> performance issue exists for them too.
>>>
>>> So keep the previous behavior by still using mq-deadline as the default
>>> for queues which apply hctx shared tagset, just like before.
>> I think that there are some SCSI HBAs which have nr_hw_queues > 1 and don't
>> use shared sbitmap - do you think that they want this as well (without
>> knowing it)?
> I don't know, but none has been used for them since the beginning, so that
> is not a regression of shared tagset, whereas this one really is.

It seems fine to revert to previous behavior when host_tagset is set. I
didn't check the results for this recently, but for the original shared
tagset patchset [0] I had:

none sched:        2132K IOPS
mq-deadline sched: 2145K IOPS

A quick audit of other SCSI HBA drivers in drivers/scsi which set
nr_hw_queues and don't set host_tagset gives lpfc, qla2xxx, qedi
(nr_hw_queues seems to be getting unset), storvsc_drv (host_tagset might
be getting set), virtio_scsi, and then mpi3mr.

Thanks,
John

[0] https://lore.kernel.org/linux-scsi/1597850436-116171-1-git-send-email-john.garry@huawei.com/
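To put John's two figures side by side, the gap between the schedulers on his setup is well under one percent. A minimal sketch of that arithmetic; relative_diff_percent() is a hypothetical helper name, and the inputs are simply the IOPS numbers quoted above.

```c
#include <assert.h>

/*
 * Relative difference of candidate vs. baseline, in percent.
 * Hypothetical helper; inputs are throughput figures such as IOPS.
 */
double relative_diff_percent(double baseline, double candidate)
{
	assert(baseline > 0.0);
	return (candidate - baseline) / baseline * 100.0;
}
```

relative_diff_percent(2132.0, 2145.0) evaluates to about 0.61, i.e. mq-deadline was roughly 0.6% ahead of none on that run, a far smaller gap than the 40~70% sequential-write drop Yanhui reports later in the thread.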

* Re: [PATCH] blk-mq: set default elevator as deadline in case of hctx shared tagset
From: Ming Lei @ 2021-04-07 10:14 UTC
To: John Garry
Cc: Jens Axboe, linux-block, Yanhui Ma, Hannes Reinecke, Kashyap Desai

On Wed, Apr 07, 2021 at 09:04:30AM +0100, John Garry wrote:
> Reviewed-by: John Garry <john.garry@huawei.com>
>
>
> > On Tue, Apr 06, 2021 at 11:25:08PM +0100, John Garry wrote:
> > > On 06/04/2021 04:19, Ming Lei wrote:
> > >
> > > Hi Ming,
> > >
> > > > Yanhui found that write performance is degraded a lot after applying
> > > > hctx shared tagset on one test machine with megaraid_sas. It turns out
> > > > to be caused by the none scheduler, which became the default elevator
> > > > as a result of the hctx shared tagset patchset.
> > > >
> > > > Given that more SCSI HBAs will apply hctx shared tagset, a similar
> > > > performance issue exists for them too.
> > > >
> > > > So keep the previous behavior by still using mq-deadline as the default
> > > > for queues which apply hctx shared tagset, just like before.
> > > I think that there are some SCSI HBAs which have nr_hw_queues > 1 and don't
> > > use shared sbitmap - do you think that they want this as well (without
> > > knowing it)?
> > I don't know, but none has been used for them since the beginning, so that
> > is not a regression of shared tagset, whereas this one really is.
>
> It seems fine to revert to previous behavior when host_tagset is set. I
> didn't check the results for this recently, but for the original shared
> tagset patchset [0] I had:
>
> none sched:        2132K IOPS
> mq-deadline sched: 2145K IOPS

BTW, Yanhui reported that sequential write on virtio-scsi drops by 40~70%
in a VM, where virtio-scsi is backed by a file image on XFS over
megaraid_sas. And the disk is actually an SSD, not an HDD. It could be
worse with megaraid_sas HDDs.

Same drop is observed on virtio-blk too.

I haven't found a simple reproducer on the host side yet, but the
performance data is pretty stable in the VM IO workload.

Thanks,
Ming

* RE: [PATCH] blk-mq: set default elevator as deadline in case of hctx shared tagset
From: Kashyap Desai @ 2021-04-14 8:21 UTC
To: Ming Lei, John Garry
Cc: Jens Axboe, linux-block, Yanhui Ma, Hannes Reinecke

>
> On Wed, Apr 07, 2021 at 09:04:30AM +0100, John Garry wrote:
> > Reviewed-by: John Garry <john.garry@huawei.com>
> >
> >
> > > On Tue, Apr 06, 2021 at 11:25:08PM +0100, John Garry wrote:
> > > > On 06/04/2021 04:19, Ming Lei wrote:
> > > >
> > > > Hi Ming,
> > > >
> > > > > Yanhui found that write performance is degraded a lot after
> > > > > applying hctx shared tagset on one test machine with
> > > > > megaraid_sas. It turns out to be caused by the none scheduler,
> > > > > which became the default elevator as a result of the hctx
> > > > > shared tagset patchset.
> > > > >
> > > > > Given that more SCSI HBAs will apply hctx shared tagset, a
> > > > > similar performance issue exists for them too.
> > > > >
> > > > > So keep the previous behavior by still using mq-deadline as the
> > > > > default for queues which apply hctx shared tagset, just like
> > > > > before.
> > > > I think that there are some SCSI HBAs which have nr_hw_queues > 1
> > > > and don't use shared sbitmap - do you think that they want this as
> > > > well (without knowing it)?

John - I have noted this and am discussing it internally.
This patch fixing the shared host tag behavior is good (and required to
keep the earlier behavior intact), but for <mpi3mr>, which is a true
multi-hardware-queue interface, I will update later.
In general, most OS vendors recommend <mq-deadline> for rotational media
and <none> for non-rotational media. We would like to go with this
method in the <mpi3mr> driver.

> > > I don't know, but none has been used for them since the beginning,
> > > so that is not a regression of shared tagset, whereas this one
> > > really is.
> >
> > It seems fine to revert to previous behavior when host_tagset is set.
> > I didn't check the results for this recently, but for the original
> > shared tagset patchset [0] I had:
> >
> > none sched:        2132K IOPS
> > mq-deadline sched: 2145K IOPS

On my local setup I also did not see much difference.

>
> BTW, Yanhui reported that sequential write on virtio-scsi drops by
> 40~70% in a VM, where virtio-scsi is backed by a file image on XFS
> over megaraid_sas. And the disk is actually an SSD, not an HDD. It
> could be worse with megaraid_sas HDDs.

Ming - with the old megaraid_sas driver (without the host tagset patch),
does just toggling the io-scheduler from <none> to <mq-deadline>
(through sysfs) also give a similar performance drop?

I think the performance drop with the <none> io scheduler might be due
to missing bio merges compared to mq-deadline. It may not be linked to
the shared host tag IO path.
Usually bio merging does not help a sequential workload if the back-end
is enterprise SSDs/NVMe, but that is not always true. It is difficult to
find one io-scheduler that benefits every setup and workload.

I would like to reproduce a similar drop locally. I will check with you
and Yanhui about how to reproduce it (for my future reference, and I
want to have a similar test in my performance BST).

Kashyap

>
> Same drop is observed on virtio-blk too.
>
> I haven't found a simple reproducer on the host side yet, but the
> performance data is pretty stable in the VM IO workload.
>
>
> Thanks,
> Ming
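Kashyap's experiment (switching the scheduler through sysfs) and the vendor guidance he mentions both revolve around two standard sysfs attributes: /sys/block/<dev>/queue/scheduler, which lists the available schedulers with the active one in square brackets, and /sys/block/<dev>/queue/rotational (1 for rotational media, 0 otherwise). The sketch below only parses strings already read from those files; the function names are hypothetical, and the rotational-to-scheduler mapping encodes the rule of thumb Kashyap describes, not a kernel default.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Extract the active scheduler from the contents of
 * /sys/block/<dev>/queue/scheduler, e.g. "[mq-deadline] kyber none".
 * Returns 0 on success, -1 if there is no bracketed entry or it does
 * not fit into out[] (including the terminating NUL).
 */
int active_scheduler(const char *sysfs_line, char *out, size_t out_len)
{
	const char *open, *close;
	size_t len;

	assert(sysfs_line != NULL && out != NULL);

	open = strchr(sysfs_line, '[');
	close = open ? strchr(open, ']') : NULL;
	if (!open || !close || out_len == 0)
		return -1;

	len = (size_t)(close - open - 1);	/* characters between [ and ] */
	if (len >= out_len)
		return -1;

	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}

/*
 * Rule of thumb from the mail above: mq-deadline for rotational media,
 * none for non-rotational media. "rotational" mirrors the value of
 * /sys/block/<dev>/queue/rotational.
 */
const char *vendor_recommended_scheduler(int rotational)
{
	return rotational ? "mq-deadline" : "none";
}
```

Switching the scheduler is done by writing a name back to the same attribute, e.g. `echo mq-deadline > /sys/block/sdX/queue/scheduler`, where sdX is a placeholder device name.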

* Re: [PATCH] blk-mq: set default elevator as deadline in case of hctx shared tagset
From: Ming Lei @ 2021-04-14 10:48 UTC
To: Kashyap Desai
Cc: John Garry, Jens Axboe, linux-block, Yanhui Ma, Hannes Reinecke

On Wed, Apr 14, 2021 at 01:51:01PM +0530, Kashyap Desai wrote:
> >
> > On Wed, Apr 07, 2021 at 09:04:30AM +0100, John Garry wrote:
> > > Reviewed-by: John Garry <john.garry@huawei.com>
> > >
> > >
> > > > On Tue, Apr 06, 2021 at 11:25:08PM +0100, John Garry wrote:
> > > > > On 06/04/2021 04:19, Ming Lei wrote:
> > > > >
> > > > > Hi Ming,
> > > > >
> > > > > > Yanhui found that write performance is degraded a lot after
> > > > > > applying hctx shared tagset on one test machine with
> > > > > > megaraid_sas. It turns out to be caused by the none scheduler,
> > > > > > which became the default elevator as a result of the hctx
> > > > > > shared tagset patchset.
> > > > > >
> > > > > > Given that more SCSI HBAs will apply hctx shared tagset, a
> > > > > > similar performance issue exists for them too.
> > > > > >
> > > > > > So keep the previous behavior by still using mq-deadline as the
> > > > > > default for queues which apply hctx shared tagset, just like
> > > > > > before.
> > > > > I think that there are some SCSI HBAs which have nr_hw_queues > 1
> > > > > and don't use shared sbitmap - do you think that they want this as
> > > > > well (without knowing it)?
>
> John - I have noted this and am discussing it internally.
> This patch fixing the shared host tag behavior is good (and required to
> keep the earlier behavior intact), but for <mpi3mr>, which is a true
> multi-hardware-queue interface, I will update later.
> In general, most OS vendors recommend <mq-deadline> for rotational media
> and <none> for non-rotational media. We would like to go with this
> method in the <mpi3mr> driver.
>
> > > > I don't know, but none has been used for them since the beginning,
> > > > so that is not a regression of shared tagset, whereas this one
> > > > really is.
> > >
> > > It seems fine to revert to previous behavior when host_tagset is set.
> > > I didn't check the results for this recently, but for the original
> > > shared tagset patchset [0] I had:
> > >
> > > none sched:        2132K IOPS
> > > mq-deadline sched: 2145K IOPS
>
> On my local setup I also did not see much difference.
>
> >
> > BTW, Yanhui reported that sequential write on virtio-scsi drops by
> > 40~70% in a VM, where virtio-scsi is backed by a file image on XFS
> > over megaraid_sas. And the disk is actually an SSD, not an HDD. It
> > could be worse with megaraid_sas HDDs.
>
> Ming - with the old megaraid_sas driver (without the host tagset patch),
> does just toggling the io-scheduler from <none> to <mq-deadline>
> (through sysfs) also give a similar performance drop?

The default io sched for the old megaraid_sas is mq-deadline, which
performs very well in Yanhui's virt workloads. With none, IO performance
drops a lot with the new driver (shared tags).

The disk is an INTEL SSDSC2CT06.

>
> I think the performance drop with the <none> io scheduler might be due
> to missing bio merges compared to mq-deadline. It may not be linked to
> the shared host tag IO path.
> Usually bio merging does not help a sequential workload if the back-end
> is enterprise SSDs/NVMe, but that is not always true. It is difficult to
> find one io-scheduler that benefits every setup and workload.

BTW, with mq-deadline & shared tags, CPU utilization is increased by ~20%
in some VM fio tests.

Thanks,
Ming
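Kashyap's hypothesis is that the drop with none comes from fewer bio merges. The toy model below illustrates why merging matters for a sequential stream: an I/O that starts exactly where the previous request ends is folded into it, up to a size cap, so fewer and larger requests are dispatched. It is a simplified sketch with hypothetical names (struct toy_io, toy_back_merge()); it does not reproduce the code of mq-deadline or of the no-scheduler path, and the block layer can still merge some bios without an elevator (for example via plugging).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified I/O descriptor: start sector and length. */
struct toy_io {
	unsigned long long sector;
	unsigned int nr_sectors;
};

/*
 * Toy back-merge pass over ios[0..n-1] in submission order. An I/O that
 * starts where the previous request ends is merged into it, as long as
 * the result stays within max_sectors. Merged requests are compacted in
 * place at the front of ios[]; the return value is how many remain.
 */
size_t toy_back_merge(struct toy_io *ios, size_t n, unsigned int max_sectors)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		if (out > 0) {
			struct toy_io *last = &ios[out - 1];

			/* Contiguous and still under the size cap: back merge. */
			if (last->sector + last->nr_sectors == ios[i].sector &&
			    last->nr_sectors + ios[i].nr_sectors <= max_sectors) {
				last->nr_sectors += ios[i].nr_sectors;
				continue;
			}
		}
		/* Otherwise start a new request. */
		ios[out++] = ios[i];
	}
	return out;
}
```

For example, eight contiguous 8-sector writes collapse into one 64-sector request with a 256-sector cap, or into four 16-sector requests with a 16-sector cap, while non-contiguous I/Os are dispatched unchanged.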

* Re: [PATCH] blk-mq: set default elevator as deadline in case of hctx shared tagset
From: Ming Lei @ 2021-04-08 8:36 UTC
To: Jens Axboe
Cc: linux-block, Yanhui Ma, John Garry, Hannes Reinecke

Hello Jens,

On Tue, Apr 06, 2021 at 11:19:33AM +0800, Ming Lei wrote:
> Yanhui found that write performance is degraded a lot after applying
> hctx shared tagset on one test machine with megaraid_sas. It turns out
> to be caused by the none scheduler, which became the default elevator
> as a result of the hctx shared tagset patchset.
>
> Given that more SCSI HBAs will apply hctx shared tagset, a similar
> performance issue exists for them too.
>
> So keep the previous behavior by still using mq-deadline as the default
> for queues which apply hctx shared tagset, just like before.
>
> Fixes: 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per tagset")
> Reported-by: Yanhui Ma <yama@redhat.com>
> Cc: John Garry <john.garry@huawei.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

Any chance of getting it into 5.12, if you are fine with it?

Thanks,
Ming

* Re: [PATCH] blk-mq: set default elevator as deadline in case of hctx shared tagset
From: Jens Axboe @ 2021-04-08 15:57 UTC
To: Ming Lei
Cc: linux-block, Yanhui Ma, John Garry, Hannes Reinecke

On 4/8/21 2:36 AM, Ming Lei wrote:
> Hello Jens,
>
> On Tue, Apr 06, 2021 at 11:19:33AM +0800, Ming Lei wrote:
>> Yanhui found that write performance is degraded a lot after applying
>> hctx shared tagset on one test machine with megaraid_sas. It turns out
>> to be caused by the none scheduler, which became the default elevator
>> as a result of the hctx shared tagset patchset.
>>
>> Given that more SCSI HBAs will apply hctx shared tagset, a similar
>> performance issue exists for them too.
>>
>> So keep the previous behavior by still using mq-deadline as the default
>> for queues which apply hctx shared tagset, just like before.
>>
>> Fixes: 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per tagset")
>> Reported-by: Yanhui Ma <yama@redhat.com>
>> Cc: John Garry <john.garry@huawei.com>
>> Cc: Hannes Reinecke <hare@suse.de>
>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>
> Any chance of getting it into 5.12, if you are fine with it?

Let's just queue it through stable when it's in Linus's tree. This isn't
a new regression, so there should be no need to expedite it into the
current release.

-- 
Jens Axboe