From: Yufen Yu <yuyufen@huawei.com>
To: song@kernel.org
Cc: linux-raid@vger.kernel.org, neilb@suse.com,
	guoqing.jiang@cloud.ionos.com, colyli@suse.de, xni@redhat.com,
	houtao1@huawei.com, yuyufen@huawei.com
Subject: [PATCH v3 01/11] md/raid5: add CONFIG_MD_RAID456_STRIPE_SHIFT to set STRIPE_SIZE
Date: Wed, 27 May 2020 21:19:23 +0800
Message-ID: <20200527131933.34400-2-yuyufen@huawei.com>
In-Reply-To: <20200527131933.34400-1-yuyufen@huawei.com>

In RAID5, if the size of an issued bio is bigger than STRIPE_SIZE, it will
be split into units of STRIPE_SIZE and processed one by one. Even for a
bio smaller than STRIPE_SIZE, RAID5 still requests at least STRIPE_SIZE of
data from disk.
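
For reference, the splitting described above corresponds roughly to the
per-stripe loop in raid5_make_request(); the following is a heavily
simplified sketch, not the verbatim kernel code:

  sector_t logical_sector, last_sector;

  /* round the start down to a STRIPE_SECTORS boundary */
  logical_sector = bi->bi_iter.bi_sector & ~((sector_t)STRIPE_SECTORS - 1);
  last_sector = bio_end_sector(bi);

  /* handle the bio one STRIPE_SECTORS-sized piece at a time */
  for (; logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
          /* get_active_stripe() + add_stripe_bio() for this piece */
  }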

Nowadays, STRIPE_SIZE is equal to PAGE_SIZE. Since filesystems usually
issue bios in units of 4KB, there is no problem when PAGE_SIZE is 4KB.
But with a 64KB PAGE_SIZE, a bio from the filesystem may request only 4KB
of data while RAID5 issues at least STRIPE_SIZE (64KB) of IO each time.
That wastes disk bandwidth and xor computation.

To avoid this waste, add a new CONFIG option to adjust STRIPE_SIZE. The
default value is 4096. Users can also set a value bigger than 4KB for
special cases, such as when the issued IO size is known to be larger
than 4KB.
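
For instance, on an arm64 kernel with 64KB PAGE_SIZE where the workload is
known to issue IO of 64KB or more, the option could be set in .config like
this (the value 16 is just an example):

  CONFIG_MD_RAID456_STRIPE_SHIFT=16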

To evaluate the new feature, we create a raid5 device '/dev/md5' with
4 SSD disks and test it on an arm64 machine with 64KB PAGE_SIZE.
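
The array was created along the following lines (device names are only
illustrative):

  mdadm --create /dev/md5 --level=5 --raid-devices=4 \
        /dev/sdb /dev/sdc /dev/sdd /dev/sde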

1) We format /dev/md5 with mkfs.ext4, mount it with the default options
 on the /mnt directory, and then test it with dbench using the command:
 dbench -D /mnt -t 1000 10. The results are:

 'STRIPE_SIZE = 64KB'

  Operation      Count    AvgLat    MaxLat
  ----------------------------------------
  NTCreateX    9805011     0.021    64.728
  Close        7202525     0.001     0.120
  Rename        415213     0.051    44.681
  Unlink       1980066     0.079    93.147
  Deltree          240     1.793     6.516
  Mkdir            120     0.004     0.007
  Qpathinfo    8887512     0.007    37.114
  Qfileinfo    1557262     0.001     0.030
  Qfsinfo      1629582     0.012     0.152
  Sfileinfo     798756     0.040    57.641
  Find         3436004     0.019    57.782
  WriteX       4887239     0.021    57.638
  ReadX        15370483     0.005    37.818
  LockX          31934     0.003     0.022
  UnlockX        31933     0.001     0.021
  Flush         687205    13.302   530.088

 Throughput 307.799 MB/sec  10 clients  10 procs  max_latency=530.091 ms
 -------------------------------------------------------

 'STRIPE_SIZE = 4KB'

  Operation      Count    AvgLat    MaxLat
  ----------------------------------------
  NTCreateX    11999166     0.021    36.380
  Close        8814128     0.001     0.122
  Rename        508113     0.051    29.169
  Unlink       2423242     0.070    38.141
  Deltree          300     1.885     7.155
  Mkdir            150     0.004     0.006
  Qpathinfo    10875921     0.007    35.485
  Qfileinfo    1905837     0.001     0.032
  Qfsinfo      1994304     0.012     0.125
  Sfileinfo     977450     0.029    26.489
  Find         4204952     0.019     9.361
  WriteX       5981890     0.019    27.804
  ReadX        18809742     0.004    33.491
  LockX          39074     0.003     0.025
  UnlockX        39074     0.001     0.014
  Flush         841022    10.712   458.848

 Throughput 376.777 MB/sec  10 clients  10 procs  max_latency=458.852 ms
 -------------------------------------------------------

 It shows that setting STRIPE_SIZE to 4KB gives higher throughput
 (376.777 vs 307.799 MB/sec) and lower max latency (458.852 vs
 530.091 ms) than setting it to 64KB.

 2) We evaluate IO throughput for /dev/md5 with fio, using the following
 job definitions:

 [4KB randwrite]
 direct=1
 numjobs=2
 iodepth=64
 ioengine=libaio
 filename=/dev/md5
 bs=4KB
 rw=randwrite

 [1MB write]
 direct=1
 numjobs=2
 iodepth=64
 ioengine=libaio
 filename=/dev/md5
 bs=1MB
 rw=write

 The results are as follows:

                 | STRIPE_SIZE(64KB) | STRIPE_SIZE(4KB)
  ---------------+-------------------+-----------------
  4KB randwrite  |       15MB/s      |      100MB/s
  ---------------+-------------------+-----------------
  1MB write      |     1000MB/s      |      700MB/s

 The results show that when the issued IO size is large (the 1MB
 sequential write case), 64KB STRIPE_SIZE gives much higher throughput.
 But for 4KB randwrite, i.e. when the IOs issued to the device are
 small, 4KB STRIPE_SIZE performs better.

Thus, we provide a config option to set STRIPE_SIZE when PAGE_SIZE is
bigger than 4096. Normally, the default value (4096) gives relatively
good performance. But if every issued IO is bigger than 4096, setting a
larger value may give better performance.
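
As a quick sanity check of the resulting sizes (using the 'value << 12'
definition from this patch):

  CONFIG_MD_RAID456_STRIPE_SHIFT=1  -> CONFIG_STRIPE_SIZE = 1 << 12 = 4096
  CONFIG_MD_RAID456_STRIPE_SHIFT=16 -> CONFIG_STRIPE_SIZE = 16 << 12 = 65536
  STRIPE_SIZE = min(CONFIG_STRIPE_SIZE, PAGE_SIZE), so a value bigger than
  PAGE_SIZE is clamped back down to PAGE_SIZE.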

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
 drivers/md/Kconfig | 21 +++++++++++++++++++++
 drivers/md/raid5.h |  4 +++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index d6d5ab23c088..629324f92c42 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -157,6 +157,27 @@ config MD_RAID456
 
 	  If unsure, say Y.
 
+config MD_RAID456_STRIPE_SHIFT
+	int "RAID4/RAID5/RAID6 stripe size shift"
+	default "1"
+	depends on MD_RAID456
+	help
+	  When set to 'N', the stripe size will be 'N << 12', i.e.
+	  N * 4096 bytes, which is always a multiple of 4KB.
+
+	  The default value is 1, which means the default stripe size
+	  is 4096 bytes. Only set a bigger value when PAGE_SIZE is
+	  bigger than 4096; in that case you can set it to 2 (8KB),
+	  4 (16KB) or 16 (64KB).
+
+	  Set a large value, e.g. 16 on arm64 with 64KB PAGE_SIZE, only
+	  when you know that the size of each IO issued to the raid
+	  device is larger than 4096. Otherwise keep the default.
+
+	  Normally the default value gives good performance.
+	  Only change this value if you know what you are doing.
+
+
 config MD_MULTIPATH
 	tristate "Multipath I/O support"
 	depends on BLK_DEV_MD
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index f90e0704bed9..b25f107dafc7 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -472,7 +472,9 @@ struct disk_info {
  */
 
 #define NR_STRIPES		256
-#define STRIPE_SIZE		PAGE_SIZE
+#define CONFIG_STRIPE_SIZE	(CONFIG_MD_RAID456_STRIPE_SHIFT << 12)
+#define STRIPE_SIZE		\
+	(CONFIG_STRIPE_SIZE > PAGE_SIZE ? PAGE_SIZE : CONFIG_STRIPE_SIZE)
 #define STRIPE_SHIFT		(PAGE_SHIFT - 9)
 #define STRIPE_SECTORS		(STRIPE_SIZE>>9)
 #define	IO_THRESHOLD		1
-- 
2.21.3
