linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC] scsi: core: remove unsed 'restarts' from scsi_device
@ 2022-11-18 11:30 Yu Kuai
  2022-11-19  1:11 ` Bart Van Assche
  2022-11-26  8:54 ` Yu Kuai
  0 siblings, 2 replies; 13+ messages in thread
From: Yu Kuai @ 2022-11-18 11:30 UTC (permalink / raw)
  To: ming.lei, jejb, martin.petersen
  Cc: linux-scsi, linux-kernel, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

During code review, I found that 'restarts' is not useful anymore after
the following commits:

1) commit ab3cee3762e5 ("blk-mq: In blk_mq_dispatch_rq_list() "no budget"
is a reason to kick")
2) commit d3b38596875d ("blk-mq: run queue no matter whether the request
is the last request")
3) commit 673235f91531 ("scsi: core: Fix race between handling STS_RESOURCE
and completion")

Now that if get budget ever failed, block layer will make sure to
trigger new run queue for the hctx. Hence there is no need to run queue
from scsi layer in this case.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/scsi/scsi_lib.c    | 35 -----------------------------------
 include/scsi/scsi_device.h |  1 -
 2 files changed, 36 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 56f641ba1261..f6325a0f80fb 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -507,24 +507,6 @@ static void scsi_run_queue_async(struct scsi_device *sdev)
 	if (scsi_target(sdev)->single_lun ||
 	    !list_empty(&sdev->host->starved_list)) {
 		kblockd_schedule_work(&sdev->requeue_work);
-	} else {
-		/*
-		 * smp_mb() present in sbitmap_queue_clear() or implied in
-		 * .end_io is for ordering writing .device_busy in
-		 * scsi_device_unbusy() and reading sdev->restarts.
-		 */
-		int old = atomic_read(&sdev->restarts);
-
-		/*
-		 * ->restarts has to be kept as non-zero if new budget
-		 *  contention occurs.
-		 *
-		 *  No need to run queue when either another re-run
-		 *  queue wins in updating ->restarts or a new budget
-		 *  contention occurs.
-		 */
-		if (old && atomic_cmpxchg(&sdev->restarts, old, 0) == old)
-			blk_mq_run_hw_queues(sdev->request_queue, true);
 	}
 }
 
@@ -1666,23 +1648,6 @@ static int scsi_mq_get_budget(struct request_queue *q)
 	if (token >= 0)
 		return token;
 
-	atomic_inc(&sdev->restarts);
-
-	/*
-	 * Orders atomic_inc(&sdev->restarts) and atomic_read(&sdev->device_busy).
-	 * .restarts must be incremented before .device_busy is read because the
-	 * code in scsi_run_queue_async() depends on the order of these operations.
-	 */
-	smp_mb__after_atomic();
-
-	/*
-	 * If all in-flight requests originated from this LUN are completed
-	 * before reading .device_busy, sdev->device_busy will be observed as
-	 * zero, then blk_mq_delay_run_hw_queues() will dispatch this request
-	 * soon. Otherwise, completion of one of these requests will observe
-	 * the .restarts flag, and the request queue will be run for handling
-	 * this request, see scsi_end_request().
-	 */
 	if (unlikely(scsi_device_busy(sdev) == 0 &&
 				!scsi_device_blocked(sdev)))
 		blk_mq_delay_run_hw_queues(sdev->request_queue, SCSI_QUEUE_DELAY);
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 24bdbf7999ab..66345de80897 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -115,7 +115,6 @@ struct scsi_device {
 	struct sbitmap budget_map;
 	atomic_t device_blocked;	/* Device returned QUEUE_FULL. */
 
-	atomic_t restarts;
 	spinlock_t list_lock;
 	struct list_head starved_entry;
 	unsigned short queue_depth;	/* How deep of a queue we want */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC] scsi: core: remove unsed 'restarts' from scsi_device
  2022-11-18 11:30 [PATCH RFC] scsi: core: remove unsed 'restarts' from scsi_device Yu Kuai
@ 2022-11-19  1:11 ` Bart Van Assche
  2022-11-19  5:57   ` Yu Kuai
  2022-11-26  8:54 ` Yu Kuai
  1 sibling, 1 reply; 13+ messages in thread
From: Bart Van Assche @ 2022-11-19  1:11 UTC (permalink / raw)
  To: Yu Kuai, ming.lei, jejb, martin.petersen
  Cc: linux-scsi, linux-kernel, yukuai3, yi.zhang

On 11/18/22 03:30, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>

Regarding the subject: unsed -> unused?


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC] scsi: core: remove unsed 'restarts' from scsi_device
  2022-11-19  1:11 ` Bart Van Assche
@ 2022-11-19  5:57   ` Yu Kuai
  0 siblings, 0 replies; 13+ messages in thread
From: Yu Kuai @ 2022-11-19  5:57 UTC (permalink / raw)
  To: Bart Van Assche, Yu Kuai, ming.lei, jejb, martin.petersen
  Cc: linux-scsi, linux-kernel, yi.zhang, yukuai (C)

Hi,

在 2022/11/19 9:11, Bart Van Assche 写道:
> On 11/18/22 03:30, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
> 
> Regarding the subject: unsed -> unused?

Yes, sorry for the spelling mistake.
> 
> .
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC] scsi: core: remove unsed 'restarts' from scsi_device
  2022-11-18 11:30 [PATCH RFC] scsi: core: remove unsed 'restarts' from scsi_device Yu Kuai
  2022-11-19  1:11 ` Bart Van Assche
@ 2022-11-26  8:54 ` Yu Kuai
  2022-11-27  9:45   ` Ming Lei
                     ` (2 more replies)
  1 sibling, 3 replies; 13+ messages in thread
From: Yu Kuai @ 2022-11-26  8:54 UTC (permalink / raw)
  To: Yu Kuai, ming.lei, jejb, martin.petersen
  Cc: linux-scsi, linux-kernel, yi.zhang, yukuai (C)

Hi,

在 2022/11/18 19:30, Yu Kuai 写道:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> During code review, I found that 'restarts' is not useful anymore after
> the following commits:
> 
> 1) commit ab3cee3762e5 ("blk-mq: In blk_mq_dispatch_rq_list() "no budget"
> is a reason to kick")
> 2) commit d3b38596875d ("blk-mq: run queue no matter whether the request
> is the last request")
> 3) commit 673235f91531 ("scsi: core: Fix race between handling STS_RESOURCE
> and completion")
> 
> Now that if get budget ever failed, block layer will make sure to
> trigger new run queue for the hctx. Hence there is no need to run queue
> from scsi layer in this case.
> 

Does anyone has suggestions about this patch?

More info why I tried to remove this:

while testing megaraid with 4 nvme with none elevator, the default
queue_depth is 128, while I test it with fio 128 jobs and 1 iodepth,
bw is about 4Gib/s, however, if I test with 128 jobs and 2 iodepth,
bw is decreased to about 0.8Gib/s, and with this patch applied,
bw can stay 4Gib/s in the later case.

Thanks,
Kuai


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC] scsi: core: remove unsed 'restarts' from scsi_device
  2022-11-26  8:54 ` Yu Kuai
@ 2022-11-27  9:45   ` Ming Lei
  2022-11-28  2:49     ` Yu Kuai
  2022-11-28  2:26   ` Jason Yan
  2022-11-28  3:27   ` Ming Lei
  2 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2022-11-27  9:45 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jejb, martin.petersen, linux-scsi, linux-kernel, yi.zhang, yukuai (C)

On Sat, Nov 26, 2022 at 04:54:46PM +0800, Yu Kuai wrote:
> Hi,
> 
> 在 2022/11/18 19:30, Yu Kuai 写道:
> > From: Yu Kuai <yukuai3@huawei.com>
> > 
> > During code review, I found that 'restarts' is not useful anymore after
> > the following commits:
> > 
> > 1) commit ab3cee3762e5 ("blk-mq: In blk_mq_dispatch_rq_list() "no budget"
> > is a reason to kick")
> > 2) commit d3b38596875d ("blk-mq: run queue no matter whether the request
> > is the last request")
> > 3) commit 673235f91531 ("scsi: core: Fix race between handling STS_RESOURCE
> > and completion")
> > 
> > Now that if get budget ever failed, block layer will make sure to
> > trigger new run queue for the hctx. Hence there is no need to run queue
> > from scsi layer in this case.
> > 
> 
> Does anyone has suggestions about this patch?
> 
> More info why I tried to remove this:
> 
> while testing megaraid with 4 nvme with none elevator, the default
> queue_depth is 128, while I test it with fio 128 jobs and 1 iodepth,
> bw is about 4Gib/s, however, if I test with 128 jobs and 2 iodepth,
> bw is decreased to about 0.8Gib/s, and with this patch applied,
> bw can stay 4Gib/s in the later case.

I will look at this patch next week.

Can you investigate a bit the reason why perf boost is from this patch?

Thanks,
Ming


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC] scsi: core: remove unsed 'restarts' from scsi_device
  2022-11-26  8:54 ` Yu Kuai
  2022-11-27  9:45   ` Ming Lei
@ 2022-11-28  2:26   ` Jason Yan
  2022-11-28  3:27   ` Ming Lei
  2 siblings, 0 replies; 13+ messages in thread
From: Jason Yan @ 2022-11-28  2:26 UTC (permalink / raw)
  To: Yu Kuai, ming.lei, jejb, martin.petersen
  Cc: linux-scsi, linux-kernel, yi.zhang, yukuai (C)

On 2022/11/26 16:54, Yu Kuai wrote:
> Hi,
> 
> 在 2022/11/18 19:30, Yu Kuai 写道:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> During code review, I found that 'restarts' is not useful anymore after
>> the following commits:
>>
>> 1) commit ab3cee3762e5 ("blk-mq: In blk_mq_dispatch_rq_list() "no budget"
>> is a reason to kick")
>> 2) commit d3b38596875d ("blk-mq: run queue no matter whether the request
>> is the last request")
>> 3) commit 673235f91531 ("scsi: core: Fix race between handling 
>> STS_RESOURCE
>> and completion")
>>
>> Now that if get budget ever failed, block layer will make sure to
>> trigger new run queue for the hctx. Hence there is no need to run queue
>> from scsi layer in this case.
>>
> 
> Does anyone has suggestions about this patch?
> 
> More info why I tried to remove this:
> 
> while testing megaraid with 4 nvme with none elevator, the default
> queue_depth is 128, while I test it with fio 128 jobs and 1 iodepth,
> bw is about 4Gib/s, however, if I test with 128 jobs and 2 iodepth,
> bw is decreased to about 0.8Gib/s, and with this patch applied,
> bw can stay 4Gib/s in the later case.
> 

Hi Yu Kuai,

This information should be included in the commit message.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC] scsi: core: remove unsed 'restarts' from scsi_device
  2022-11-27  9:45   ` Ming Lei
@ 2022-11-28  2:49     ` Yu Kuai
  0 siblings, 0 replies; 13+ messages in thread
From: Yu Kuai @ 2022-11-28  2:49 UTC (permalink / raw)
  To: Ming Lei, Yu Kuai
  Cc: jejb, martin.petersen, linux-scsi, linux-kernel, yi.zhang, yukuai (C)

[-- Attachment #1: Type: text/plain, Size: 2225 bytes --]

Hi, Ming

在 2022/11/27 17:45, Ming Lei 写道:
> On Sat, Nov 26, 2022 at 04:54:46PM +0800, Yu Kuai wrote:
>> Hi,
>>
>> 在 2022/11/18 19:30, Yu Kuai 写道:
>>> From: Yu Kuai <yukuai3@huawei.com>
>>>
>>> During code review, I found that 'restarts' is not useful anymore after
>>> the following commits:
>>>
>>> 1) commit ab3cee3762e5 ("blk-mq: In blk_mq_dispatch_rq_list() "no budget"
>>> is a reason to kick")
>>> 2) commit d3b38596875d ("blk-mq: run queue no matter whether the request
>>> is the last request")
>>> 3) commit 673235f91531 ("scsi: core: Fix race between handling STS_RESOURCE
>>> and completion")
>>>
>>> Now that if get budget ever failed, block layer will make sure to
>>> trigger new run queue for the hctx. Hence there is no need to run queue
>>> from scsi layer in this case.
>>>
>>
>> Does anyone has suggestions about this patch?
>>
>> More info why I tried to remove this:
>>
>> while testing megaraid with 4 nvme with none elevator, the default
>> queue_depth is 128, while I test it with fio 128 jobs and 1 iodepth,
>> bw is about 4Gib/s, however, if I test with 128 jobs and 2 iodepth,
>> bw is decreased to about 0.8Gib/s, and with this patch applied,
>> bw can stay 4Gib/s in the later case.
> 
> I will look at this patch next week.
> 
> Can you investigate a bit the reason why perf boost is from this patch?

test cmd:

without this patch, perf will show that on cpu time is wasted on
mod_delayed_work_on:

-    3.44%  swapper          [kernel.vmlinux]            [k] 
mod_delayed_work_on
    - mod_delayed_work_on
       - 3.44% kblockd_mod_delayed_work_on
            __blk_mq_delay_run_hw_queue
                  scsi_run_queue_async
                  scsi_end_request
                  scsi_io_completion
                  scsi_finish_command
                + scsi_complete
+    1.05%  ksoftirqd/30     [kernel.vmlinux]            [k] 
mod_delayed_work_on
+    0.54%  fio              [kernel.vmlinux]            [k] 
mod_delayed_work_on
      0.36%  ksoftirqd/22     [kernel.vmlinux]            [k] 
mod_delayed_work_on
      0.35%  ksoftirqd/1      [kernel.vmlinux]            [k] 
mod_delayed_work_on

I also attched the flamegraph.

Thanks,
Kuai
> 
o> Thanks,
> Ming
> 
> 
> .
> 

[-- Attachment #2: 6.1.0-rc6.svg --]
[-- Type: image/svg+xml, Size: 828715 bytes --]

[-- Attachment #3: 6.1.0-rc6-patched.svg --]
[-- Type: image/svg+xml, Size: 590303 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC] scsi: core: remove unsed 'restarts' from scsi_device
  2022-11-26  8:54 ` Yu Kuai
  2022-11-27  9:45   ` Ming Lei
  2022-11-28  2:26   ` Jason Yan
@ 2022-11-28  3:27   ` Ming Lei
  2022-11-28  3:35     ` Yu Kuai
  2 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2022-11-28  3:27 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jejb, martin.petersen, linux-scsi, linux-kernel, yi.zhang, yukuai (C)

On Sat, Nov 26, 2022 at 04:54:46PM +0800, Yu Kuai wrote:
> Hi,
> 
> 在 2022/11/18 19:30, Yu Kuai 写道:
> > From: Yu Kuai <yukuai3@huawei.com>
> > 
> > During code review, I found that 'restarts' is not useful anymore after
> > the following commits:
> > 
> > 1) commit ab3cee3762e5 ("blk-mq: In blk_mq_dispatch_rq_list() "no budget"
> > is a reason to kick")
> > 2) commit d3b38596875d ("blk-mq: run queue no matter whether the request
> > is the last request")
> > 3) commit 673235f91531 ("scsi: core: Fix race between handling STS_RESOURCE
> > and completion")
> > 
> > Now that if get budget ever failed, block layer will make sure to
> > trigger new run queue for the hctx. Hence there is no need to run queue
> > from scsi layer in this case.
> > 

But scsi_run_queue_async() needs to run all hw queue because budget is
actually LUN/request queue wide.

> 
> Does anyone has suggestions about this patch?
> 
> More info why I tried to remove this:
> 
> while testing megaraid with 4 nvme with none elevator, the default
> queue_depth is 128, while I test it with fio 128 jobs and 1 iodepth,
> bw is about 4Gib/s, however, if I test with 128 jobs and 2 iodepth,
> bw is decreased to about 0.8Gib/s, and with this patch applied,
> bw can stay 4Gib/s in the later case.

What is .can_queue and nr_hw_queues in your setting?



thanks,
Ming


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC] scsi: core: remove unsed 'restarts' from scsi_device
  2022-11-28  3:27   ` Ming Lei
@ 2022-11-28  3:35     ` Yu Kuai
  2022-11-28  3:38       ` Yu Kuai
  2022-11-28  4:12       ` Ming Lei
  0 siblings, 2 replies; 13+ messages in thread
From: Yu Kuai @ 2022-11-28  3:35 UTC (permalink / raw)
  To: Ming Lei, Yu Kuai
  Cc: jejb, martin.petersen, linux-scsi, linux-kernel, yi.zhang, yukuai (C)



在 2022/11/28 11:27, Ming Lei 写道:
> On Sat, Nov 26, 2022 at 04:54:46PM +0800, Yu Kuai wrote:
>> Hi,
>>
>> 在 2022/11/18 19:30, Yu Kuai 写道:
>>> From: Yu Kuai <yukuai3@huawei.com>
>>>
>>> During code review, I found that 'restarts' is not useful anymore after
>>> the following commits:
>>>
>>> 1) commit ab3cee3762e5 ("blk-mq: In blk_mq_dispatch_rq_list() "no budget"
>>> is a reason to kick")
>>> 2) commit d3b38596875d ("blk-mq: run queue no matter whether the request
>>> is the last request")
>>> 3) commit 673235f91531 ("scsi: core: Fix race between handling STS_RESOURCE
>>> and completion")
>>>
>>> Now that if get budget ever failed, block layer will make sure to
>>> trigger new run queue for the hctx. Hence there is no need to run queue
>>> from scsi layer in this case.
>>>
> 
> But scsi_run_queue_async() needs to run all hw queue because budget is
> actually LUN/request queue wide.

Why the hw queue need to run if get budget never failed in this hw
queue?

> 
>>
>> Does anyone has suggestions about this patch?
>>
>> More info why I tried to remove this:
>>
>> while testing megaraid with 4 nvme with none elevator, the default
>> queue_depth is 128, while I test it with fio 128 jobs and 1 iodepth,
>> bw is about 4Gib/s, however, if I test with 128 jobs and 2 iodepth,
>> bw is decreased to about 0.8Gib/s, and with this patch applied,
>> bw can stay 4Gib/s in the later case.
> 
> What is .can_queue and nr_hw_queues in your setting?
test cmd:
fio -name=0 -ioengine=libaio -direct=1 -group_reporting=1 -randseed=2022 
-rwmixread=70 -refill_buffers -filename=/dev/sdg -numjobs=128 -size=1TB 
-runtime=60s -bs=4k -iodepth=2 -rw=randwrite

test environment:
arm64 Kunpeng-920, 128 cpu
megaraid with 4 NVMEs, 128 hctx and queue_depth is 128

> 
> 
> 
> thanks,
> Ming
> 
> .
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC] scsi: core: remove unsed 'restarts' from scsi_device
  2022-11-28  3:35     ` Yu Kuai
@ 2022-11-28  3:38       ` Yu Kuai
  2022-11-28  4:12       ` Ming Lei
  1 sibling, 0 replies; 13+ messages in thread
From: Yu Kuai @ 2022-11-28  3:38 UTC (permalink / raw)
  To: Yu Kuai, Ming Lei
  Cc: jejb, martin.petersen, linux-scsi, linux-kernel, yi.zhang, yukuai (C)



在 2022/11/28 11:35, Yu Kuai 写道:

> 
> Why the hw queue need to run if get budget never failed in this hw
> queue?

And by the way, after Jan's patch "blk-mq: Improve performance of non-mq
IO schedulers with multiple HW queues", scsi_run_queue_async() can only
garantee to run hw queue for the current cpu, not all the hw queues.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC] scsi: core: remove unsed 'restarts' from scsi_device
  2022-11-28  3:35     ` Yu Kuai
  2022-11-28  3:38       ` Yu Kuai
@ 2022-11-28  4:12       ` Ming Lei
  2022-11-28  6:08         ` Yu Kuai
  1 sibling, 1 reply; 13+ messages in thread
From: Ming Lei @ 2022-11-28  4:12 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jejb, martin.petersen, linux-scsi, linux-kernel, yi.zhang, yukuai (C)

On Mon, Nov 28, 2022 at 11:35:18AM +0800, Yu Kuai wrote:
> 
> 
> 在 2022/11/28 11:27, Ming Lei 写道:
> > On Sat, Nov 26, 2022 at 04:54:46PM +0800, Yu Kuai wrote:
> > > Hi,
> > > 
> > > 在 2022/11/18 19:30, Yu Kuai 写道:
> > > > From: Yu Kuai <yukuai3@huawei.com>
> > > > 
> > > > During code review, I found that 'restarts' is not useful anymore after
> > > > the following commits:
> > > > 
> > > > 1) commit ab3cee3762e5 ("blk-mq: In blk_mq_dispatch_rq_list() "no budget"
> > > > is a reason to kick")
> > > > 2) commit d3b38596875d ("blk-mq: run queue no matter whether the request
> > > > is the last request")
> > > > 3) commit 673235f91531 ("scsi: core: Fix race between handling STS_RESOURCE
> > > > and completion")
> > > > 
> > > > Now that if get budget ever failed, block layer will make sure to
> > > > trigger new run queue for the hctx. Hence there is no need to run queue
> > > > from scsi layer in this case.
> > > > 
> > 
> > But scsi_run_queue_async() needs to run all hw queue because budget is
> > actually LUN/request queue wide.
> 
> Why the hw queue need to run if get budget never failed in this hw
> queue?

Because all hw queues share the queue wide budget, and once budget
is available, all hw queues are re-run, and the hw queue won't be
scheduled actually if there is nothing to run, see
blk_mq_run_hw_queue().

> 
> > 
> > > 
> > > Does anyone has suggestions about this patch?
> > > 
> > > More info why I tried to remove this:
> > > 
> > > while testing megaraid with 4 nvme with none elevator, the default
> > > queue_depth is 128, while I test it with fio 128 jobs and 1 iodepth,
> > > bw is about 4Gib/s, however, if I test with 128 jobs and 2 iodepth,
> > > bw is decreased to about 0.8Gib/s, and with this patch applied,
> > > bw can stay 4Gib/s in the later case.
> > 
> > What is .can_queue and nr_hw_queues in your setting?
> test cmd:
> fio -name=0 -ioengine=libaio -direct=1 -group_reporting=1 -randseed=2022
> -rwmixread=70 -refill_buffers -filename=/dev/sdg -numjobs=128 -size=1TB
> -runtime=60s -bs=4k -iodepth=2 -rw=randwrite
> 
> test environment:
> arm64 Kunpeng-920, 128 cpu
> megaraid with 4 NVMEs, 128 hctx and queue_depth is 128

From your setting, megaraid should sets ->host_tagset, that said there
is only 128 tags for all 4 NVMEs(128 hw queue shares the all 128 tags
too).

That looks one really bad setting.

BTW, why do you drive nvme via megaraid instead nvme driver?

> And by the way, after Jan's patch "blk-mq: Improve performance of non-mq
> IO schedulers with multiple HW queues", scsi_run_queue_async() can only
> garantee to run hw queue for the current cpu, not all the hw queues.

That isn't true, each hctx is still run in case of none & kyber scheduler.

thanks,
Ming


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC] scsi: core: remove unsed 'restarts' from scsi_device
  2022-11-28  4:12       ` Ming Lei
@ 2022-11-28  6:08         ` Yu Kuai
  2022-11-29  2:15           ` Yu Kuai
  0 siblings, 1 reply; 13+ messages in thread
From: Yu Kuai @ 2022-11-28  6:08 UTC (permalink / raw)
  To: Ming Lei, Yu Kuai
  Cc: jejb, martin.petersen, linux-scsi, linux-kernel, yi.zhang, yukuai (C)

Hi,

在 2022/11/28 12:12, Ming Lei 写道:
> On Mon, Nov 28, 2022 at 11:35:18AM +0800, Yu Kuai wrote:
>>
>>
>> 在 2022/11/28 11:27, Ming Lei 写道:
>>> On Sat, Nov 26, 2022 at 04:54:46PM +0800, Yu Kuai wrote:
>>>> Hi,
>>>>
>>>> 在 2022/11/18 19:30, Yu Kuai 写道:
>>>>> From: Yu Kuai <yukuai3@huawei.com>
>>>>>
>>>>> During code review, I found that 'restarts' is not useful anymore after
>>>>> the following commits:
>>>>>
>>>>> 1) commit ab3cee3762e5 ("blk-mq: In blk_mq_dispatch_rq_list() "no budget"
>>>>> is a reason to kick")
>>>>> 2) commit d3b38596875d ("blk-mq: run queue no matter whether the request
>>>>> is the last request")
>>>>> 3) commit 673235f91531 ("scsi: core: Fix race between handling STS_RESOURCE
>>>>> and completion")
>>>>>
>>>>> Now that if get budget ever failed, block layer will make sure to
>>>>> trigger new run queue for the hctx. Hence there is no need to run queue
>>>>> from scsi layer in this case.
>>>>>
>>>
>>> But scsi_run_queue_async() needs to run all hw queue because budget is
>>> actually LUN/request queue wide.
>>
>> Why the hw queue need to run if get budget never failed in this hw
>> queue?
> 
> Because all hw queues share the queue wide budget, and once budget
> is available, all hw queues are re-run, and the hw queue won't be
> scheduled actually if there is nothing to run, see
> blk_mq_run_hw_queue().

I still don't get it why all hw queues should be re-run, is this just
for performance or fixing a bug? And I'm not sure this behavior is good
for performance.

> 
>>
>>>
>>>>
>>>> Does anyone has suggestions about this patch?
>>>>
>>>> More info why I tried to remove this:
>>>>
>>>> while testing megaraid with 4 nvme with none elevator, the default
>>>> queue_depth is 128, while I test it with fio 128 jobs and 1 iodepth,
>>>> bw is about 4Gib/s, however, if I test with 128 jobs and 2 iodepth,
>>>> bw is decreased to about 0.8Gib/s, and with this patch applied,
>>>> bw can stay 4Gib/s in the later case.
>>>
>>> What is .can_queue and nr_hw_queues in your setting?
>> test cmd:
>> fio -name=0 -ioengine=libaio -direct=1 -group_reporting=1 -randseed=2022
>> -rwmixread=70 -refill_buffers -filename=/dev/sdg -numjobs=128 -size=1TB
>> -runtime=60s -bs=4k -iodepth=2 -rw=randwrite
>>
>> test environment:
>> arm64 Kunpeng-920, 128 cpu
>> megaraid with 4 NVMEs, 128 hctx and queue_depth is 128
> 
>>From your setting, megaraid should sets ->host_tagset, that said there
> is only 128 tags for all 4 NVMEs(128 hw queue shares the all 128 tags
> too).
> 
> That looks one really bad setting.

It's right that is bad, but the point here is to triggered get budget
failed frequently. If I set queue_depth to 2048, and I use 128 numjobs
with 32 iodpeth, performance is still bad.
> 
> BTW, why do you drive nvme via megaraid instead nvme driver?
> 
>> And by the way, after Jan's patch "blk-mq: Improve performance of non-mq
>> IO schedulers with multiple HW queues", scsi_run_queue_async() can only
>> garantee to run hw queue for the current cpu, not all the hw queues.
> 
> That isn't true, each hctx is still run in case of none & kyber scheduler.

Yes, but current default hctx shared elevator is deadline.
> 
> thanks,
> Ming
> 
> .
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC] scsi: core: remove unsed 'restarts' from scsi_device
  2022-11-28  6:08         ` Yu Kuai
@ 2022-11-29  2:15           ` Yu Kuai
  0 siblings, 0 replies; 13+ messages in thread
From: Yu Kuai @ 2022-11-29  2:15 UTC (permalink / raw)
  To: Yu Kuai, Ming Lei, Jan Kara
  Cc: jejb, martin.petersen, linux-scsi, linux-kernel, yi.zhang, yukuai (C)

Hi,

> Hi,
> 
> 在 2022/11/28 12:12, Ming Lei 写道:

>>
>> BTW, why do you drive nvme via megaraid instead nvme driver?
>>
>>> And by the way, after Jan's patch "blk-mq: Improve performance of non-mq
>>> IO schedulers with multiple HW queues", scsi_run_queue_async() can only
>>> garantee to run hw queue for the current cpu, not all the hw queues.
>>
>> That isn't true, each hctx is still run in case of none & kyber 
>> scheduler.

And I really suspect that why Jan's patch can improve performance is
because it avoid many run queues from scsi_run_queue_async().

> 
> Yes, but current default hctx shared elevator is deadline.
>>
>> thanks,
>> Ming
>>
>> .
>>
> 
> .
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-11-29  2:15 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-18 11:30 [PATCH RFC] scsi: core: remove unsed 'restarts' from scsi_device Yu Kuai
2022-11-19  1:11 ` Bart Van Assche
2022-11-19  5:57   ` Yu Kuai
2022-11-26  8:54 ` Yu Kuai
2022-11-27  9:45   ` Ming Lei
2022-11-28  2:49     ` Yu Kuai
2022-11-28  2:26   ` Jason Yan
2022-11-28  3:27   ` Ming Lei
2022-11-28  3:35     ` Yu Kuai
2022-11-28  3:38       ` Yu Kuai
2022-11-28  4:12       ` Ming Lei
2022-11-28  6:08         ` Yu Kuai
2022-11-29  2:15           ` Yu Kuai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).