From: "Finlayson, James M CIV (USA)" <james.m.finlayson4.civ@mail.mil>
To: "'linux-raid@vger.kernel.org'" <linux-raid@vger.kernel.org>
Cc: 'Gal Ofri' <gal.ofri@volumez.com>,
	"Finlayson, James M CIV (USA)" <james.m.finlayson4.civ@mail.mil>
Subject: RE: [Non-DoD Source] Re: Can't get RAID5/RAID6 NVMe randomread IOPS - AMD ROME what am I missing?????
Date: Thu, 5 Aug 2021 19:52:01 +0000
Message-ID: <5EAED86C53DED2479E3E145969315A2385856B25@UMECHPA7B.easf.csd.disa.mil>
In-Reply-To: <5EAED86C53DED2479E3E145969315A2385856AF7@UMECHPA7B.easf.csd.disa.mil>

Sorry, again - I sent HTML instead of plain text.

Resend - mailing list bounce
All,
Sorry for the delay - both work and life got in the way. Here is some feedback:

BLUF (bottom line up front): with the 5.14-rc3 kernel our SA built, md0 (a 10+1+1 RAID5) delivered 5.332M IOPS / 20.3GiB/s, and md1 (also a 10+1+1 RAID5) delivered 5.892M IOPS / 22.5GiB/s - the best hero numbers I've ever seen for mdraid RAID5 IOPS. I think the kernel patch is good. Prior results were socket0 1.263M IOPS / 4934MiB/s and socket1 1.071M IOPS / 4183MiB/s. I'm willing to help push this as hard as we can until we hit a bottleneck outside of our control.
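
For context, a 10+1+1 layout like md0 can be created along these lines. This is a notional sketch, not the exact command used to build these arrays; the device names, chunk size, and device ordering are all assumptions:

# Notional sketch only - 11 active devices (10 data + 1 parity) plus 1 hot spare.
# Device names and chunk size are assumptions, not the actual build command.
mdadm --create /dev/md0 --level=5 --raid-devices=11 --spare-devices=1 \
      --chunk=64K /dev/nvme{0..11}n1

That hands mdadm 12 devices: 11 active members plus one hot spare.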

I need to verify the raw IOPS - admittedly this is a different server and I didn't do any regression testing before the kernel swap, but my raw numbers were socket0 13.2M IOPS and socket1 13.5M IOPS, versus a prior socket0 16.0M IOPS and socket1 13.5M IOPS. There appears to be a regression in the socket0 "hero run", but since this is a different server I don't know whether I created a configuration management issue in my zealousness to test this patch or whether we have a real regression. I was so excited to have the attention of kernel developers who needed my help that I borrowed another system, because I didn't want to tear apart my "Frankenstein's monster" 32-partition mdraid LVM mess. If I can switch kernels and reboot before work and life get back in the way, I'll follow up.

I think I might have to give myself the action to run this to ground next week on the other server. Without a doubt the mdraid lock improvement is worth taking forward; I either have to find my error or explain why my raw hero numbers got worse. I tend to see one socket outrun the other. The way HPE allocates the NVMe drives to PCIe root complexes is not how I'd do it, so the drives are unbalanced across the root complexes (drives sit in four different root complexes on socket 0 and three on socket 1), which is why one would think socket0 would always be faster for hero runs. An NPS4 NUMA mapping is the best way to show it (a notional fio pinning fragment follows the map):
[root@gremlin04 hornet05]# cat *nps4
#filename=/dev/nvme0n1 0
#filename=/dev/nvme1n1 0
#filename=/dev/nvme2n1 1
#filename=/dev/nvme3n1 1
#filename=/dev/nvme4n1 2
#filename=/dev/nvme5n1 2
#filename=/dev/nvme6n1 2
#filename=/dev/nvme7n1 2
#filename=/dev/nvme8n1 3
#filename=/dev/nvme9n1 3
#filename=/dev/nvme10n1 3
#filename=/dev/nvme11n1 3
#filename=/dev/nvme12n1 4
#filename=/dev/nvme13n1 4
#filename=/dev/nvme14n1 4
#filename=/dev/nvme15n1 4
#filename=/dev/nvme17n1 5
#filename=/dev/nvme18n1 5
#filename=/dev/nvme19n1 5
#filename=/dev/nvme20n1 5
#filename=/dev/nvme21n1 6
#filename=/dev/nvme22n1 6
#filename=/dev/nvme23n1 6
#filename=/dev/nvme24n1 6
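
I use that mapping to keep each fio job on the node that owns its drive. A hedged fragment of what that looks like with a libnuma-enabled fio build - the job names here are made up, and the node numbers follow the NPS4 map above:

# Hypothetical fio fragment; requires fio built with libnuma support.
# Pin each job's CPUs and memory to the NUMA node that owns its namespace.
[nvme0-node0]
filename=/dev/nvme0n1
numa_cpu_nodes=0
numa_mem_policy=bind:0

[nvme12-node4]
filename=/dev/nvme12n1
numa_cpu_nodes=4
numa_mem_policy=bind:4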


fio fiojim.hpdl385.nps1
socket0: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
...
socket1: (g=1): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
...
socket0-md: (g=2): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
...
socket1-md: (g=3): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
...
fio-3.26
Starting 256 processes
Jobs: 128 (f=128): [_(128),r(128)][1.5%][r=42.8GiB/s][r=11.2M IOPS][eta 10h:40m:00s]        
socket0: (groupid=0, jobs=64): err= 0: pid=522428: Thu Aug  5 19:33:05 2021
  read: IOPS=13.2M, BW=50.2GiB/s (53.9GB/s)(14.7TiB/300005msec)
    slat (nsec): min=1312, max=8308.1k, avg=2206.72, stdev=1505.92
    clat (usec): min=14, max=42033, avg=619.56, stdev=671.45
     lat (usec): min=19, max=42045, avg=621.83, stdev=671.46
    clat percentiles (usec):
     |  1.00th=[  113],  5.00th=[  149], 10.00th=[  180], 20.00th=[  229],
     | 30.00th=[  273], 40.00th=[  310], 50.00th=[  351], 60.00th=[  408],
     | 70.00th=[  578], 80.00th=[  938], 90.00th=[ 1467], 95.00th=[ 1909],
     | 99.00th=[ 3163], 99.50th=[ 4178], 99.90th=[ 5800], 99.95th=[ 6390],
     | 99.99th=[ 8455]
   bw (  MiB/s): min=28741, max=61365, per=18.56%, avg=51489.80, stdev=82.09, samples=38016
   iops        : min=7357916, max=15709528, avg=13181362.22, stdev=21013.83, samples=38016
  lat (usec)   : 20=0.01%, 50=0.02%, 100=0.42%, 250=24.52%, 500=42.21%
  lat (usec)   : 750=7.94%, 1000=6.34%
  lat (msec)   : 2=14.26%, 4=3.74%, 10=0.54%, 20=0.01%, 50=0.01%
  cpu          : usr=14.58%, sys=47.48%, ctx=291912925, majf=0, minf=10492
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=3949519687,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128
socket1: (groupid=1, jobs=64): err= 0: pid=522492: Thu Aug  5 19:33:05 2021
  read: IOPS=13.6M, BW=51.8GiB/s (55.7GB/s)(15.2TiB/300004msec)
    slat (nsec): min=1323, max=4335.7k, avg=2242.27, stdev=1608.25
    clat (usec): min=14, max=41341, avg=600.15, stdev=726.62
     lat (usec): min=20, max=41358, avg=602.46, stdev=726.64
    clat percentiles (usec):
     |  1.00th=[  115],  5.00th=[  151], 10.00th=[  184], 20.00th=[  231],
     | 30.00th=[  269], 40.00th=[  306], 50.00th=[  347], 60.00th=[  400],
     | 70.00th=[  506], 80.00th=[  799], 90.00th=[ 1303], 95.00th=[ 1909],
     | 99.00th=[ 3589], 99.50th=[ 4424], 99.90th=[ 7111], 99.95th=[ 7767],
     | 99.99th=[10290]
   bw (  MiB/s): min=28663, max=71847, per=21.11%, avg=53145.09, stdev=111.29, samples=38016
   iops        : min=7337860, max=18392866, avg=13605117.00, stdev=28491.19, samples=38016
  lat (usec)   : 20=0.01%, 50=0.02%, 100=0.36%, 250=24.52%, 500=44.77%
  lat (usec)   : 750=8.90%, 1000=6.37%
  lat (msec)   : 2=10.52%, 4=3.87%, 10=0.66%, 20=0.01%, 50=0.01%
  cpu          : usr=14.86%, sys=49.40%, ctx=282634154, majf=0, minf=10276
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=4076360454,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128
socket0-md: (groupid=2, jobs=64): err= 0: pid=524061: Thu Aug  5 19:33:05 2021
  read: IOPS=5332k, BW=20.3GiB/s (21.8GB/s)(6102GiB/300002msec)
    slat (nsec): min=1633, max=17043k, avg=11123.38, stdev=8694.61
    clat (usec): min=186, max=18705, avg=1524.87, stdev=115.29
     lat (usec): min=200, max=18743, avg=1536.08, stdev=115.90
    clat percentiles (usec):
     |  1.00th=[ 1270],  5.00th=[ 1336], 10.00th=[ 1369], 20.00th=[ 1418],
     | 30.00th=[ 1467], 40.00th=[ 1500], 50.00th=[ 1532], 60.00th=[ 1549],
     | 70.00th=[ 1582], 80.00th=[ 1631], 90.00th=[ 1680], 95.00th=[ 1713],
     | 99.00th=[ 1795], 99.50th=[ 1811], 99.90th=[ 1893], 99.95th=[ 1926],
     | 99.99th=[ 2089]
   bw (  MiB/s): min=19030, max=21969, per=100.00%, avg=20843.43, stdev= 5.35, samples=38272
   iops        : min=4871687, max=5624289, avg=5335900.01, stdev=1370.43, samples=38272
  lat (usec)   : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=99.97%, 4=0.02%, 10=0.01%, 20=0.01%
  cpu          : usr=5.56%, sys=77.91%, ctx=8118, majf=0, minf=9018
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=1599503201,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128
socket1-md: (groupid=3, jobs=64): err= 0: pid=524125: Thu Aug  5 19:33:05 2021
  read: IOPS=5892k, BW=22.5GiB/s (24.1GB/s)(6743GiB/300002msec)
    slat (nsec): min=1663, max=1274.1k, avg=9896.09, stdev=7939.50
    clat (usec): min=236, max=11102, avg=1379.86, stdev=148.64
     lat (usec): min=239, max=11110, avg=1389.84, stdev=149.54
    clat percentiles (usec):
     |  1.00th=[ 1106],  5.00th=[ 1172], 10.00th=[ 1205], 20.00th=[ 1254],
     | 30.00th=[ 1287], 40.00th=[ 1336], 50.00th=[ 1369], 60.00th=[ 1401],
     | 70.00th=[ 1434], 80.00th=[ 1500], 90.00th=[ 1582], 95.00th=[ 1663],
     | 99.00th=[ 1811], 99.50th=[ 1860], 99.90th=[ 1942], 99.95th=[ 1958],
     | 99.99th=[ 2040]
   bw (  MiB/s): min=20982, max=24535, per=-82.15%, avg=23034.61, stdev=15.46, samples=38272
   iops        : min=5371404, max=6281119, avg=5896843.14, stdev=3958.21, samples=38272
  lat (usec)   : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=99.97%, 4=0.02%, 10=0.01%, 20=0.01%
  cpu          : usr=6.55%, sys=74.98%, ctx=9833, majf=0, minf=8956
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=1767618924,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
   READ: bw=50.2GiB/s (53.9GB/s), 50.2GiB/s-50.2GiB/s (53.9GB/s-53.9GB/s), io=14.7TiB (16.2TB), run=300005-300005msec

Run status group 1 (all jobs):
   READ: bw=51.8GiB/s (55.7GB/s), 51.8GiB/s-51.8GiB/s (55.7GB/s-55.7GB/s), io=15.2TiB (16.7TB), run=300004-300004msec

Run status group 2 (all jobs):
   READ: bw=20.3GiB/s (21.8GB/s), 20.3GiB/s-20.3GiB/s (21.8GB/s-21.8GB/s), io=6102GiB (6552GB), run=300002-300002msec

Run status group 3 (all jobs):
   READ: bw=22.5GiB/s (24.1GB/s), 22.5GiB/s-22.5GiB/s (24.1GB/s-24.1GB/s), io=6743GiB (7240GB), run=300002-300002msec

Disk stats (read/write):
  nvme0n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme1n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme2n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme3n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme4n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme5n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme6n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme7n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme8n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme9n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme10n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme11n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme12n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme13n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme14n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme15n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme17n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme18n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme19n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme20n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme21n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme22n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme23n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme24n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  md0: ios=1599378656/0, merge=0/0, ticks=391992721/0, in_queue=391992721, util=100.00%
  md1: ios=1767484212/0, merge=0/0, ticks=427666887/0, in_queue=427666887, util=100.00%
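
In case anyone wants to replay the workload: the job file above (fiojim.hpdl385.nps1) is essentially four groups of 64 jobs each. Here is a reconstruction inferred from the output - not my literal file, and the per-device filename lists are omitted:

# Inferred reconstruction of fiojim.hpdl385.nps1, not the literal file.
# The real file lists the per-socket NVMe namespaces explicitly.
[global]
rw=randread
bs=4096
ioengine=libaio
iodepth=128
direct=1
runtime=300
time_based=1
group_reporting=1

[socket0]
numjobs=64
# filename= one socket-0 NVMe namespace per job (omitted here)

[socket1]
new_group=1
numjobs=64
# filename= socket-1 namespaces (omitted here)

[socket0-md]
new_group=1
numjobs=64
filename=/dev/md0

[socket1-md]
new_group=1
numjobs=64
filename=/dev/md1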

From: Gal Ofri <gal.ofri@volumez.com> 
Sent: Wednesday, July 28, 2021 5:43 AM
To: Finlayson, James M CIV (USA) <james.m.finlayson4.civ@mail.mil>; 'linux-raid@vger.kernel.org' <linux-raid@vger.kernel.org>
Subject: [Non-DoD Source] Re: Can't get RAID5/RAID6 NVMe randomread IOPS - AMD ROME what am I missing?????


A recent commit raised the limit on raid5/6 read iops.
It's available in 5.14.
See https://github.com/torvalds/linux/commit/97ae27252f4962d0fcc38ee1d9f913d817a2024e
commit 97ae27252f4962d0fcc38ee1d9f913d817a2024e
Author: Gal Ofri <gal.ofri@storing.io>
Date:   Mon Jun 7 14:07:03 2021 +0300
    md/raid5: avoid device_lock in read_one_chunk()

Please do share if you reach more iops in your env than described in the commit.

Cheers,
Gal, 
Volumez (formerly storing.io)
