* [PATCH v2 0/2] Improve mq-deadline performance in HDD
@ 2022-07-22  9:51 Wang You
  2022-07-22  9:51 ` [PATCH v2 1/2] block: Introduce nr_sched_batch sys interface Wang You
  2022-07-22  9:51 ` [PATCH v2 2/2] block/mq-deadline: Prioritize first request Wang You
  0 siblings, 2 replies; 8+ messages in thread
From: Wang You @ 2022-07-22  9:51 UTC (permalink / raw)
  To: axboe, bvanassche
  Cc: linux-block, linux-kernel, hch, jaegeuk, fio, ming.lei, wangyoua,
	wangxiaohua

Hi Jens,

We already know that batching requests can improve the performance of
some storage devices, but testing shows that it can also reduce the
performance of some low- and medium-speed devices when a scheduler
(especially mq-deadline) is used, such as rotational disks behind a
RAID controller.

To balance the performance difference between devices, a more
sophisticated mechanism may be needed to control the batch size
(sometimes 1 is better), but that is obviously not easy.

At the same time, I noticed that when the mq-deadline scheduler is
outside a batch and has run out of higher-sectored requests,
deadline_fifo_request is used to pick a request from the fifo and start
a new batch. Selecting a request from the fifo tends to honour that
request's deadline, but the request is not always actually urgent.

So, when nothing has expired, can the selection strategy for the first
request of a new batch be changed? On devices with rotational
characteristics, a proper sector access order is beneficial to
performance.

This patch series includes the following two parts:

- Added a temporary sysfs interface, nr_sched_batch, to control the
  number of batched requests (a short usage sketch follows this list).

- Added a deadline_head_request function that, when appropriate, selects
  the first request from the sector-sorted red-black tree instead of
  taking one from the fifo.
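
For reference, the new knob is meant to be driven from user space along
these lines (minimal sketch; the device name sdb is only an example):

  # By default nr_sched_batch follows nr_requests.
  cat /sys/block/sdb/queue/nr_sched_batch

  # Disable batching on a slow rotational device; any value in
  # [1, nr_requests] is accepted.
  echo 1 > /sys/block/sdb/queue/nr_sched_batch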

Thanks,

Wang.

Wang You (2):
  block: Introduce nr_sched_batch sys interface
  block/mq-deadline: Prioritize first request

 block/blk-mq-sched.c   |  4 +++-
 block/blk-sysfs.c      | 32 +++++++++++++++++++++++++++++++
 block/mq-deadline.c    | 43 +++++++++++++++++++++++++++++++++++++++---
 include/linux/blkdev.h |  1 +
 4 files changed, 76 insertions(+), 4 deletions(-)

-- 
2.27.0






* [PATCH v2 1/2] block: Introduce nr_sched_batch sys interface
  2022-07-22  9:51 [PATCH v2 0/2] Improve mq-deadline performance in HDD Wang You
@ 2022-07-22  9:51 ` Wang You
  2022-07-22 17:35   ` Bart Van Assche
  2022-07-22  9:51 ` [PATCH v2 2/2] block/mq-deadline: Prioritize first request Wang You
  1 sibling, 1 reply; 8+ messages in thread
From: Wang You @ 2022-07-22  9:51 UTC (permalink / raw)
  To: axboe, bvanassche
  Cc: linux-block, linux-kernel, hch, jaegeuk, fio, ming.lei, wangyoua,
	wangxiaohua

This patch adds an nr_sched_batch interface under /sys/block/sdx/queue/
that can be used to set the number of batched requests. The default
value is nr_requests, and it keeps following nr_requests until it is
changed explicitly.

Signed-off-by: Wang You <wangyoua@uniontech.com>
---
 block/blk-mq-sched.c   |  4 +++-
 block/blk-sysfs.c      | 32 ++++++++++++++++++++++++++++++++
 include/linux/blkdev.h |  1 +
 3 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index a4f7c101b53b..92798a0c03bd 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -100,7 +100,7 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 	if (hctx->dispatch_busy)
 		max_dispatch = 1;
 	else
-		max_dispatch = hctx->queue->nr_requests;
+		max_dispatch = q->nr_sched_batch;
 
 	do {
 		struct request *rq;
@@ -567,6 +567,7 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 		blk_queue_flag_clear(QUEUE_FLAG_SQ_SCHED, q);
 		q->elevator = NULL;
 		q->nr_requests = q->tag_set->queue_depth;
+		q->nr_sched_batch = q->nr_requests;
 		return 0;
 	}
 
@@ -577,6 +578,7 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 	 */
 	q->nr_requests = 2 * min_t(unsigned int, q->tag_set->queue_depth,
 				   BLKDEV_DEFAULT_RQ);
+	q->nr_sched_batch = q->nr_requests;
 
 	if (blk_mq_is_shared_tags(flags)) {
 		ret = blk_mq_init_sched_shared_tags(q);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 9b905e9443e4..34fbfb22e3e1 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -70,6 +70,7 @@ queue_requests_store(struct request_queue *q, const char *page, size_t count)
 {
 	unsigned long nr;
 	int ret, err;
+	unsigned long prev_nr_request = q->nr_requests;
 
 	if (!queue_is_mq(q))
 		return -EINVAL;
@@ -85,6 +86,35 @@ queue_requests_store(struct request_queue *q, const char *page, size_t count)
 	if (err)
 		return err;
 
+	if (q->nr_sched_batch == prev_nr_request || q->nr_sched_batch > nr)
+		q->nr_sched_batch = nr;
+
+	return ret;
+}
+
+static ssize_t elv_nr_batch_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(q->nr_sched_batch, page);
+}
+
+static ssize_t elv_nr_batch_store(struct request_queue *q, const char *page,
+				  size_t count)
+{
+	unsigned long nr;
+	int ret;
+
+	if (!queue_is_mq(q))
+		return -EINVAL;
+
+	ret = queue_var_store(&nr, page, count);
+	if (ret < 0)
+		return ret;
+
+	if (nr > q->nr_requests || nr < 1)
+		return -EINVAL;
+
+	q->nr_sched_batch = nr;
+
 	return ret;
 }
 
@@ -573,6 +603,7 @@ QUEUE_RO_ENTRY(queue_max_segments, "max_segments");
 QUEUE_RO_ENTRY(queue_max_integrity_segments, "max_integrity_segments");
 QUEUE_RO_ENTRY(queue_max_segment_size, "max_segment_size");
 QUEUE_RW_ENTRY(elv_iosched, "scheduler");
+QUEUE_RW_ENTRY(elv_nr_batch, "nr_sched_batch");
 
 QUEUE_RO_ENTRY(queue_logical_block_size, "logical_block_size");
 QUEUE_RO_ENTRY(queue_physical_block_size, "physical_block_size");
@@ -632,6 +663,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_max_integrity_segments_entry.attr,
 	&queue_max_segment_size_entry.attr,
 	&elv_iosched_entry.attr,
+	&elv_nr_batch_entry.attr,
 	&queue_hw_sector_size_entry.attr,
 	&queue_logical_block_size_entry.attr,
 	&queue_physical_block_size_entry.attr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2f7b43444c5f..e38eba4161a3 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -422,6 +422,7 @@ struct request_queue {
 	 * queue settings
 	 */
 	unsigned long		nr_requests;	/* Max # of requests */
+	unsigned long nr_sched_batch;
 
 	unsigned int		dma_pad_mask;
 	unsigned int		dma_alignment;
-- 
2.27.0





* [PATCH v2 2/2] block/mq-deadline: Prioritize first request
  2022-07-22  9:51 [PATCH v2 0/2] Improve mq-deadline performance in HDD Wang You
  2022-07-22  9:51 ` [PATCH v2 1/2] block: Introduce nr_sched_batch sys interface Wang You
@ 2022-07-22  9:51 ` Wang You
  2022-07-22 17:47   ` Bart Van Assche
  1 sibling, 1 reply; 8+ messages in thread
From: Wang You @ 2022-07-22  9:51 UTC (permalink / raw)
  To: axboe, bvanassche
  Cc: linux-block, linux-kernel, hch, jaegeuk, fio, ming.lei, wangyoua,
	wangxiaohua

The function deadline_head_request selects the request at the head of
the mq-deadline scheduler's sector-sorted red-black tree. Dispatching
such a request moves the disk access position back to the lowest pending
sector, which prevents the head from swinging back and forth.

- Batching requests in the scheduler may reduce or even eliminate its
  ability to merge and sort requests, so I sometimes set the batch size
  to 1.

- This patch may increase the risk of requests expiring; I don't know
  whether a stricter expiry check is necessary.

- I tested some disks (mainly rotational disks and some SSDs) with the
  fio tool (using sync, direct, etc. parameters). The results show an
  increase in the disks' small-block sequential read and write
  performance. Does this imply that changing nr_sched_batch is
  reasonable?

- Later testing on different hardware showed that the RAID controller
  probably plays an important role, while the performance of a single
  disk did not improve as expected, so I'm not sure whether this patch
  really has the desired effect.

Thanks,

Wang.

The following is all of the test data:

The test hardware is:
Kunpeng-920, HW-SAS3508+(MG04ACA400N * 2), RAID0.

The test command is:
fio -ioengine=psync -lockmem=1G -buffered=0 -time_based=1 -direct=1
-iodepth=1 -thread -bs=512B -size=110g -numjobs=16 -runtime=300
-group_reporting -name=read -filename=/dev/sdb14
-ioscheduler=mq-deadline -rw=read[,write,rw]

The following is the test data:
origin/master:
read iops: 152421       write iops: 136959      rw iops: 54593,54581

nr_sched_batch = 1:
read iops: 166449       write iops: 139477      rw iops: 55363,55355

nr_sched_batch = 1, use deadline_head_request:
read iops: 171177       write iops: 184431      rw iops: 56178,56169


- The test hardware is:
Hygon C86, MG04ACA400N

The test command is:
fio -ioengine=psync -lockmem=1G -buffered=0 -time_based=1 -direct=1 -iodepth=1 
-thread -bs=512B -size=110g -numjobs=32 -runtime=300 -group_reporting 
-name=read -filename=/dev/sdc -ioscheduler=mq-deadline -rw=read[,write,rw]

The following is the test data:
origin/master:
read iops: 15463	write iops: 5949	rw iops: 574,576

nr_sched_batch = 1:
read iops: 15082	write iops: 6283	rw iops: 783,786

nr_sched_batch = 1, use deadline_head_request:
read iops: 15368	write iops: 6575	rw iops: 907,906


- The test hardware is:
Kunpeng-920, HW-SAS3508 + Samsung SSD 780, RAID0.

The test command is:
fio -ioengine=psync -lockmem=1G -buffered=0 -time_based=1 -direct=1 -iodepth=1 
-thread -bs=512B -size=110g -numjobs=16 -runtime=300 -group_reporting 
-name=read -filename=/dev/sda -ioscheduler=mq-deadline -rw=read[,write,rw]

The following is the test data:
origin/master:
read iops: 115399       write iops: 136801      rw iops: 58082,58084

nr_sched_batch = 1, use deadline_head_request:
read iops: 136473       write iops: 184646      rw iops: 56460,56454

Signed-off-by: Wang You <wangyoua@uniontech.com>
---
 block/mq-deadline.c | 43 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 40 insertions(+), 3 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 1a9e835e816c..4660dd4a16f6 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -344,6 +344,36 @@ deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 	return rq;
 }
 
+static inline struct request *
+deadline_head_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
+		      int data_dir)
+{
+	struct rb_node *node = rb_first(&per_prio->sort_list[data_dir]);
+	struct request *rq;
+	unsigned long flags;
+
+	if (!node)
+		return NULL;
+
+	rq = rb_entry_rq(node);
+	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q))
+		return rq;
+
+	/*
+	 * Look for a write request that can be dispatched, that is one with
+	 * an unlocked target zone.
+	 */
+	spin_lock_irqsave(&dd->zone_lock, flags);
+	while (rq) {
+		if (blk_req_can_dispatch_to_zone(rq))
+			break;
+		rq = deadline_latter_request(rq);
+	}
+	spin_unlock_irqrestore(&dd->zone_lock, flags);
+
+	return rq;
+}
+
 /*
  * Returns true if and only if @rq started after @latest_start where
  * @latest_start is in jiffies.
@@ -429,13 +459,20 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
 	 * we are not running a batch, find best request for selected data_dir
 	 */
 	next_rq = deadline_next_request(dd, per_prio, data_dir);
-	if (deadline_check_fifo(per_prio, data_dir) || !next_rq) {
+	if (deadline_check_fifo(per_prio, data_dir)) {
 		/*
 		 * A deadline has expired, the last request was in the other
-		 * direction, or we have run out of higher-sectored requests.
-		 * Start again from the request with the earliest expiry time.
+		 * direction. Start again from the request with the earliest
+		 * expiry time.
 		 */
 		rq = deadline_fifo_request(dd, per_prio, data_dir);
+	} else if (!next_rq) {
+		/*
+		 * There is no operation expired, and we have run out of
+		 * higher-sectored requests. Look for the sector at the head
+		 * which may reduce disk seek consumption.
+		 */
+		rq = deadline_head_request(dd, per_prio, data_dir);
 	} else {
 		/*
 		 * The last req was the same dir and we have a next request in
-- 
2.27.0





* Re: [PATCH v2 1/2] block: Introduce nr_sched_batch sys interface
  2022-07-22  9:51 ` [PATCH v2 1/2] block: Introduce nr_sched_batch sys interface Wang You
@ 2022-07-22 17:35   ` Bart Van Assche
  2022-07-23 10:04     ` Wang You
  0 siblings, 1 reply; 8+ messages in thread
From: Bart Van Assche @ 2022-07-22 17:35 UTC (permalink / raw)
  To: Wang You, axboe
  Cc: linux-block, linux-kernel, hch, jaegeuk, fio, ming.lei, wangxiaohua

On 7/22/22 02:51, Wang You wrote:
> This patch adds an nr_sched_batch interface under /sys/block/sdx/queue/
> that can be used to set the number of batched requests. The default
> value is nr_requests, and it keeps following nr_requests until it is
> changed explicitly.

Do we need this new parameter? How does the performance of reducing 
nr_requests compare to configuring nr_sched_batch?
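
In other words, comparing something like the following (values are
illustrative only):

  echo 64 > /sys/block/sdb/queue/nr_requests     # shrink the whole scheduler queue
with
  echo 64 > /sys/block/sdb/queue/nr_sched_batch  # keep nr_requests, cap only the dispatch batch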

Thanks,

Bart.


* Re: [PATCH v2 2/2] block/mq-deadline: Prioritize first request
  2022-07-22  9:51 ` [PATCH v2 2/2] block/mq-deadline: Prioritize first request Wang You
@ 2022-07-22 17:47   ` Bart Van Assche
  2022-07-23 10:59     ` Wang You
  0 siblings, 1 reply; 8+ messages in thread
From: Bart Van Assche @ 2022-07-22 17:47 UTC (permalink / raw)
  To: Wang You, axboe
  Cc: linux-block, linux-kernel, hch, jaegeuk, fio, ming.lei, wangxiaohua

On 7/22/22 02:51, Wang You wrote:
> The test hardware is:
> Kunpeng-920, HW-SAS3508+(MG04ACA400N * 2), RAID0.

What is MG04ACA400N? The test results suggest that it is an SSD but this 
is something that should be mentioned explicitly.

> - The test hardware is:
> Hygon C86, MG04ACA400N

What is MG04ACA400N?

> The test command is:
> fio -ioengine=psync -lockmem=1G -buffered=0 -time_based=1 -direct=1 -iodepth=1
> -thread -bs=512B -size=110g -numjobs=32 -runtime=300 -group_reporting
> -name=read -filename=/dev/sdc -ioscheduler=mq-deadline -rw=read[,write,rw]
> 
> The following is the test data:
> origin/master:
> read iops: 15463	write iops: 5949	rw iops: 574,576
> 
> nr_sched_batch = 1:
> read iops: 15082	write iops: 6283	rw iops: 783,786
> 
> nr_sched_batch = 1, use deadline_head_request:
> read iops: 15368	write iops: 6575	rw iops: 907,906

The above results are low enough such that these could come from a hard 
disk. However, the test results are hard to interpret since the I/O 
pattern is neither perfectly sequential nor perfectly random (32 
sequential jobs). Please provide separate measurements for sequential 
and random I/O.

The above results show that this patch makes reading from a hard disk 
slower. Isn't the primary use case of mq-deadline to make reading from 
hard disks faster? So why should these two patches be applied if these 
slow down reading from a hard disk?

Thanks,

Bart.


* Re: [PATCH v2 1/2] block: Introduce nr_sched_batch sys interface
  2022-07-22 17:35   ` Bart Van Assche
@ 2022-07-23 10:04     ` Wang You
  0 siblings, 0 replies; 8+ messages in thread
From: Wang You @ 2022-07-23 10:04 UTC (permalink / raw)
  To: bvanassche
  Cc: axboe, fio, hch, jaegeuk, linux-block, linux-kernel, ming.lei,
	wangxiaohua, wangyoua

> Do we need this new parameter? How does the performance of reducing 
> nr_requests compare to configuring nr_sched_batch?

It seems I can't get convincing data under the current test conditions.
Sorry, please ignore this unless I can produce valid evidence.

I will reconsider the effect of this modification on a single disk; from
recent testing it seems that the RAID controller in the path has a large
impact.

Thanks,

Wang.




* Re: [PATCH v2 2/2] block/mq-deadline: Prioritize first request
  2022-07-22 17:47   ` Bart Van Assche
@ 2022-07-23 10:59     ` Wang You
  2022-07-25  1:48       ` Bart Van Assche
  0 siblings, 1 reply; 8+ messages in thread
From: Wang You @ 2022-07-23 10:59 UTC (permalink / raw)
  To: bvanassche
  Cc: axboe, fio, hch, jaegeuk, linux-block, linux-kernel, ming.lei,
	wangxiaohua, wangyoua

> What is MG04ACA400N?

It is a Toshiba 7200 RPM hard drive.

> The above results are low enough such that these could come from a hard 
> disk. However, the test results are hard to interpret since the I/O 
> pattern is neither perfectly sequential nor perfectly random (32 
> sequential jobs). Please provide separate measurements for sequential 
> and random I/O.

> The above results show that this patch makes reading from a hard disk 
> slower. Isn't the primary use case of mq-deadline to make reading from 
> hard disks faster? So why should these two patches be applied if these 
> slow down reading from a hard disk?

The behaviour of the MG04ACA400N behind the RAID controller is obviously
different from that of the single disk, especially for reads. I did not
expect this; the data obtained through the RAID controller made me
mistakenly think the same would apply to a bare HDD.

I will re-analyze the impact of this patch on HDDs later; please ignore
it for now.

Also, may I ask: when using fio or other tools, how should testing be
done to get more accurate and convincing data, such as the perfectly
sequential and perfectly random I/O you mentioned above? fio's
multi-threaded tests produce neither perfectly sequential nor perfectly
random patterns, but a single thread dispatches too slowly to exercise
the elevator's merging and sorting ability.
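
For example, the closest split I can think of would be runs like the
following (illustrative only, reusing the device and block size from
above):

fio -ioengine=psync -buffered=0 -direct=1 -time_based=1 -iodepth=1 -thread
-bs=512B -runtime=300 -numjobs=1 -name=seq -filename=/dev/sdc
-ioscheduler=mq-deadline -rw=read

fio -ioengine=psync -buffered=0 -direct=1 -time_based=1 -iodepth=1 -thread
-bs=512B -runtime=300 -numjobs=32 -group_reporting -name=rand
-filename=/dev/sdc -ioscheduler=mq-deadline -rw=randread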

Thanks,

Wang.




* Re: [PATCH v2 2/2] block/mq-deadline: Prioritize first request
  2022-07-23 10:59     ` Wang You
@ 2022-07-25  1:48       ` Bart Van Assche
  0 siblings, 0 replies; 8+ messages in thread
From: Bart Van Assche @ 2022-07-25  1:48 UTC (permalink / raw)
  To: Wang You
  Cc: axboe, fio, hch, jaegeuk, linux-block, linux-kernel, ming.lei,
	wangxiaohua

On 7/23/22 03:59, Wang You wrote:
> Also, may I ask: when using fio or other tools, how should testing be
> done to get more accurate and convincing data, such as the perfectly
> sequential and perfectly random I/O you mentioned above? fio's
> multi-threaded tests produce neither perfectly sequential nor perfectly
> random patterns, but a single thread dispatches too slowly to exercise
> the elevator's merging and sorting ability.

I'm not sure that there is agreement about for which data patterns to 
measure performance to conclude that certain code changes improve 
performance of an I/O scheduler.

Thanks,

Bart.

