From: "Finlayson, James M CIV (USA)" <james.m.finlayson4.civ@mail.mil>
To: "'linux-raid@vger.kernel.org'" <linux-raid@vger.kernel.org>
Cc: 'Gal Ofri' <gal.ofri@volumez.com>,
"Finlayson, James M CIV (USA)" <james.m.finlayson4.civ@mail.mil>
Subject: RE: [Non-DoD Source] Re: Can't get RAID5/RAID6 NVMe randomread IOPS - AMD ROME what am I missing?????
Date: Thu, 5 Aug 2021 19:52:01 +0000 [thread overview]
Message-ID: <5EAED86C53DED2479E3E145969315A2385856B25@UMECHPA7B.easf.csd.disa.mil> (raw)
In-Reply-To: <5EAED86C53DED2479E3E145969315A2385856AF7@UMECHPA7B.easf.csd.disa.mil>
Sorry - again, I sent HTML instead of plain text the first time.
Resending - the mailing list bounced the original.
All,
Sorry for the delay - both work and life got in the way. Here is some feedback:
BLUF: with the 5.14-rc3 kernel our SA built, md0 (a 10+1+1 RAID5) hit 5.332M IOPS at 20.3GiB/s and md1 (also a 10+1+1 RAID5) hit 5.892M IOPS at 22.5GiB/s - the best hero numbers I've ever seen for mdraid RAID5 random-read IOPS. I think the kernel patch is good. Prior numbers were socket0 1.263M IOPS at 4934MiB/s and socket1 1.071M IOPS at 4183MiB/s. I'm willing to help push this as hard as we can until we hit a bottleneck outside of our control.
I still need to verify the raw IOPS. Admittedly this is a different server and I didn't do any regression testing before installing the new kernel, but my raw numbers were socket0 13.2M IOPS and socket1 13.5M IOPS, versus socket0 16.0M IOPS and socket1 13.5M IOPS before. There appears to be a regression in the socket0 "hero run", but because this is a different server I don't know whether I have a configuration management issue from my zeal to test this patch or whether it is a real regression. I was so excited to have the attention of kernel developers who needed my help that I borrowed another system, because I didn't want to tear apart my "Frankenstein's monster" of a 32-partition mdraid/LVM setup. If I can switch kernels and reboot before work and life get back in the way, I'll follow up.
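For context, both arrays are plain mdraid RAID5 sets, built roughly along these lines (a sketch only - the device names, chunk size, and spare layout below are illustrative assumptions, not the exact commands I ran):

# 11-member RAID5 (10 data + 1 parity equivalent) plus one hot spare per socket
mdadm --create /dev/md0 --level=5 --raid-devices=11 --spare-devices=1 --chunk=64 /dev/nvme{0..11}n1
mdadm --create /dev/md1 --level=5 --raid-devices=11 --spare-devices=1 --chunk=64 /dev/nvme{12..23}n1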
I think I'll have to give myself the action to run this to ground on the other server next week. Without a doubt the mdraid lock improvement is worth taking forward; I just need to either find my error or point a finger at a regression, since my raw hero numbers got worse. I tend to see one socket outrun the other - the way HPE allocates the NVMe drives to PCIe root complexes is not how I'd do it, so the drives are unbalanced (they sit on 4 different root complexes on socket 0 and 3 on socket 1), and one would think socket 0 would always be faster for hero runs. An NPS4 NUMA mapping is the best way to show it:
[root@gremlin04 hornet05]# cat *nps4
#filename=/dev/nvme0n1 0
#filename=/dev/nvme1n1 0
#filename=/dev/nvme2n1 1
#filename=/dev/nvme3n1 1
#filename=/dev/nvme4n1 2
#filename=/dev/nvme5n1 2
#filename=/dev/nvme6n1 2
#filename=/dev/nvme7n1 2
#filename=/dev/nvme8n1 3
#filename=/dev/nvme9n1 3
#filename=/dev/nvme10n1 3
#filename=/dev/nvme11n1 3
#filename=/dev/nvme12n1 4
#filename=/dev/nvme13n1 4
#filename=/dev/nvme14n1 4
#filename=/dev/nvme15n1 4
#filename=/dev/nvme17n1 5
#filename=/dev/nvme18n1 5
#filename=/dev/nvme19n1 5
#filename=/dev/nvme20n1 5
#filename=/dev/nvme21n1 6
#filename=/dev/nvme22n1 6
#filename=/dev/nvme23n1 6
#filename=/dev/nvme24n1 6
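In case it helps anyone reproduce the layout, a per-drive mapping like the one above can be pulled straight from sysfs; this is just a sketch (the single-namespace /dev/nvmeXn1 naming and the exact sysfs path are assumptions on my part):

# emit one "#filename=/dev/<namespace> <numa node>" line per NVMe controller
for c in /sys/class/nvme/nvme*; do
    node=$(cat "$c/device/numa_node")                 # NUMA node of the PCIe parent device
    echo "#filename=/dev/$(basename "$c")n1 $node"    # assumes a single namespace per drive
done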
fio fiojim.hpdl385.nps1
socket0: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
...
socket1: (g=1): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
...
socket0-md: (g=2): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
...
socket1-md: (g=3): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
...
fio-3.26
Starting 256 processes
Jobs: 128 (f=128): [_(128),r(128)][1.5%][r=42.8GiB/s][r=11.2M IOPS][eta 10h:40m:00s]
socket0: (groupid=0, jobs=64): err= 0: pid=522428: Thu Aug 5 19:33:05 2021
read: IOPS=13.2M, BW=50.2GiB/s (53.9GB/s)(14.7TiB/300005msec)
slat (nsec): min=1312, max=8308.1k, avg=2206.72, stdev=1505.92
clat (usec): min=14, max=42033, avg=619.56, stdev=671.45
lat (usec): min=19, max=42045, avg=621.83, stdev=671.46
clat percentiles (usec):
| 1.00th=[ 113], 5.00th=[ 149], 10.00th=[ 180], 20.00th=[ 229],
| 30.00th=[ 273], 40.00th=[ 310], 50.00th=[ 351], 60.00th=[ 408],
| 70.00th=[ 578], 80.00th=[ 938], 90.00th=[ 1467], 95.00th=[ 1909],
| 99.00th=[ 3163], 99.50th=[ 4178], 99.90th=[ 5800], 99.95th=[ 6390],
| 99.99th=[ 8455]
bw ( MiB/s): min=28741, max=61365, per=18.56%, avg=51489.80, stdev=82.09, samples=38016
iops : min=7357916, max=15709528, avg=13181362.22, stdev=21013.83, samples=38016
lat (usec) : 20=0.01%, 50=0.02%, 100=0.42%, 250=24.52%, 500=42.21%
lat (usec) : 750=7.94%, 1000=6.34%
lat (msec) : 2=14.26%, 4=3.74%, 10=0.54%, 20=0.01%, 50=0.01%
cpu : usr=14.58%, sys=47.48%, ctx=291912925, majf=0, minf=10492
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=3949519687,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=128
socket1: (groupid=1, jobs=64): err= 0: pid=522492: Thu Aug 5 19:33:05 2021
read: IOPS=13.6M, BW=51.8GiB/s (55.7GB/s)(15.2TiB/300004msec)
slat (nsec): min=1323, max=4335.7k, avg=2242.27, stdev=1608.25
clat (usec): min=14, max=41341, avg=600.15, stdev=726.62
lat (usec): min=20, max=41358, avg=602.46, stdev=726.64
clat percentiles (usec):
| 1.00th=[ 115], 5.00th=[ 151], 10.00th=[ 184], 20.00th=[ 231],
| 30.00th=[ 269], 40.00th=[ 306], 50.00th=[ 347], 60.00th=[ 400],
| 70.00th=[ 506], 80.00th=[ 799], 90.00th=[ 1303], 95.00th=[ 1909],
| 99.00th=[ 3589], 99.50th=[ 4424], 99.90th=[ 7111], 99.95th=[ 7767],
| 99.99th=[10290]
bw ( MiB/s): min=28663, max=71847, per=21.11%, avg=53145.09, stdev=111.29, samples=38016
iops : min=7337860, max=18392866, avg=13605117.00, stdev=28491.19, samples=38016
lat (usec) : 20=0.01%, 50=0.02%, 100=0.36%, 250=24.52%, 500=44.77%
lat (usec) : 750=8.90%, 1000=6.37%
lat (msec) : 2=10.52%, 4=3.87%, 10=0.66%, 20=0.01%, 50=0.01%
cpu : usr=14.86%, sys=49.40%, ctx=282634154, majf=0, minf=10276
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=4076360454,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=128
socket0-md: (groupid=2, jobs=64): err= 0: pid=524061: Thu Aug 5 19:33:05 2021
read: IOPS=5332k, BW=20.3GiB/s (21.8GB/s)(6102GiB/300002msec)
slat (nsec): min=1633, max=17043k, avg=11123.38, stdev=8694.61
clat (usec): min=186, max=18705, avg=1524.87, stdev=115.29
lat (usec): min=200, max=18743, avg=1536.08, stdev=115.90
clat percentiles (usec):
| 1.00th=[ 1270], 5.00th=[ 1336], 10.00th=[ 1369], 20.00th=[ 1418],
| 30.00th=[ 1467], 40.00th=[ 1500], 50.00th=[ 1532], 60.00th=[ 1549],
| 70.00th=[ 1582], 80.00th=[ 1631], 90.00th=[ 1680], 95.00th=[ 1713],
| 99.00th=[ 1795], 99.50th=[ 1811], 99.90th=[ 1893], 99.95th=[ 1926],
| 99.99th=[ 2089]
bw ( MiB/s): min=19030, max=21969, per=100.00%, avg=20843.43, stdev= 5.35, samples=38272
iops : min=4871687, max=5624289, avg=5335900.01, stdev=1370.43, samples=38272
lat (usec) : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=99.97%, 4=0.02%, 10=0.01%, 20=0.01%
cpu : usr=5.56%, sys=77.91%, ctx=8118, majf=0, minf=9018
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=1599503201,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=128
socket1-md: (groupid=3, jobs=64): err= 0: pid=524125: Thu Aug 5 19:33:05 2021
read: IOPS=5892k, BW=22.5GiB/s (24.1GB/s)(6743GiB/300002msec)
slat (nsec): min=1663, max=1274.1k, avg=9896.09, stdev=7939.50
clat (usec): min=236, max=11102, avg=1379.86, stdev=148.64
lat (usec): min=239, max=11110, avg=1389.84, stdev=149.54
clat percentiles (usec):
| 1.00th=[ 1106], 5.00th=[ 1172], 10.00th=[ 1205], 20.00th=[ 1254],
| 30.00th=[ 1287], 40.00th=[ 1336], 50.00th=[ 1369], 60.00th=[ 1401],
| 70.00th=[ 1434], 80.00th=[ 1500], 90.00th=[ 1582], 95.00th=[ 1663],
| 99.00th=[ 1811], 99.50th=[ 1860], 99.90th=[ 1942], 99.95th=[ 1958],
| 99.99th=[ 2040]
bw ( MiB/s): min=20982, max=24535, per=-82.15%, avg=23034.61, stdev=15.46, samples=38272
iops : min=5371404, max=6281119, avg=5896843.14, stdev=3958.21, samples=38272
lat (usec) : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=99.97%, 4=0.02%, 10=0.01%, 20=0.01%
cpu : usr=6.55%, sys=74.98%, ctx=9833, majf=0, minf=8956
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=1767618924,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=128
Run status group 0 (all jobs):
READ: bw=50.2GiB/s (53.9GB/s), 50.2GiB/s-50.2GiB/s (53.9GB/s-53.9GB/s), io=14.7TiB (16.2TB), run=300005-300005msec
Run status group 1 (all jobs):
READ: bw=51.8GiB/s (55.7GB/s), 51.8GiB/s-51.8GiB/s (55.7GB/s-55.7GB/s), io=15.2TiB (16.7TB), run=300004-300004msec
Run status group 2 (all jobs):
READ: bw=20.3GiB/s (21.8GB/s), 20.3GiB/s-20.3GiB/s (21.8GB/s-21.8GB/s), io=6102GiB (6552GB), run=300002-300002msec
Run status group 3 (all jobs):
READ: bw=22.5GiB/s (24.1GB/s), 22.5GiB/s-22.5GiB/s (24.1GB/s-24.1GB/s), io=6743GiB (7240GB), run=300002-300002msec
Disk stats (read/write):
nvme0n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme1n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme2n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme3n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme4n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme5n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme6n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme7n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme8n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme9n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme10n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme11n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme12n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme13n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme14n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme15n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme17n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme18n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme19n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme20n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme21n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme22n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme23n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme24n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
md0: ios=1599378656/0, merge=0/0, ticks=391992721/0, in_queue=391992721, util=100.00%
md1: ios=1767484212/0, merge=0/0, ticks=427666887/0, in_queue=427666887, util=100.00%
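For completeness, the job file (fiojim.hpdl385.nps1) is just the usual grouped fio layout; a minimal sketch of its shape is below - the filenames, NUMA pinning, and whether the groups are stonewalled are illustrative assumptions, not my exact file:

[global]
ioengine=libaio
direct=1
rw=randread
bs=4k
iodepth=128
numjobs=64
runtime=300
time_based=1
group_reporting=1

# group 0: raw drives attached to socket 0 (abbreviated device list)
[socket0]
numa_cpu_nodes=0
filename=/dev/nvme0n1:/dev/nvme1n1:/dev/nvme2n1

# group 1: raw drives attached to socket 1
[socket1]
new_group
numa_cpu_nodes=1
filename=/dev/nvme12n1:/dev/nvme13n1:/dev/nvme14n1

# group 2: RAID5 array built from socket-0 drives
[socket0-md]
new_group
numa_cpu_nodes=0
filename=/dev/md0

# group 3: RAID5 array built from socket-1 drives
[socket1-md]
new_group
numa_cpu_nodes=1
filename=/dev/md1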
From: Gal Ofri <gal.ofri@volumez.com>
Sent: Wednesday, July 28, 2021 5:43 AM
To: Finlayson, James M CIV (USA) <james.m.finlayson4.civ@mail.mil>; 'linux-raid@vger.kernel.org' <linux-raid@vger.kernel.org>
Subject: [Non-DoD Source] Re: Can't get RAID5/RAID6 NVMe randomread IOPS - AMD ROME what am I missing?????
________________________________________
A recent commit raised the limit on raid5/6 read iops.
It's available in 5.14.
See https://github.com/torvalds/linux/commit/97ae27252f4962d0fcc38ee1d9f913d817a2024e
commit 97ae27252f4962d0fcc38ee1d9f913d817a2024e
Author: Gal Ofri <gal.ofri@storing.io>
Date: Mon Jun 7 14:07:03 2021 +0300
md/raid5: avoid device_lock in read_one_chunk()
Please do share if you reach more iops in your env than described in the commit.
Cheers,
Gal,
Volumez (formerly storing.io)