From: pg@mdraid.list.sabi.co.UK (Peter Grandi)
To: list Linux RAID <linux-raid@vger.kernel.org>
Subject: Re: [Non-DoD Source] Can't get RAID5/RAID6 NVMe randomread IOPS - AMD ROME what am I missing?????
Date: Fri, 30 Jul 2021 15:17:32 +0200
Message-ID: <24835.64492.166839.611174@cyme.ty.sabi.co.uk>
In-Reply-To: <CD53203D-6A23-40F8-9FD4-A60019F67B37@gmail.com>

>>> On Fri, 30 Jul 2021 16:45:32 +0800, Miao Wang
>>> <shankerwangmiao@gmail.com> said:

> [...] was also stuck in a similar problem and finally gave
> up. Since it is very difficult to find such environment with
> so many fast nvme drives, I wonder if you have any interest in
> ZFS. [...]

Or Btrfs or the new 'bcachefs', which is OK for simple
configurations (RAID10-like).

But part of the issue here with MD RAID is that in theory it is
mostly a translation layer like 'loop', but it is also sort of a
virtual block device, and weird things happen as IO requests
get reshaped and requeued.

My impression, which I mentioned in a previous message, is that
the critical detail is probably:


>> Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
>> nvme0n1       1317510.00    0.00 5270044.00      0.00     0.00     0.00   0.00   0.00    0.31    0.00 411.95     4.00     0.00   0.00 100.40
>> [...]
>> Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
>> nvme0n1       114589.00    0.00 458356.00      0.00     0.00     0.00   0.00   0.00    0.29    0.00  33.54     4.00     0.00   0.01 100.00

> The obvious difference is the factor of 10 in "aqu-sz", which
> corresponds to the factor of 10 in "r/s" and "rkB/s".
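
As a rough sanity check of that correspondence, Little's law says
that the mean number of requests in flight is about the request
rate times the mean latency, so "aqu-sz" should be close to "r/s"
times "r_await". The sketch below plugs in the figures from the two
'iostat' samples quoted above; the dictionary keys and the function
name are only illustrative, and it assumes "r_await" is reported in
milliseconds.

```python
# Little's law: mean queue depth ~= request rate (1/s) * mean latency (s).
# Figures copied from the two iostat samples for nvme0n1 quoted above.
samples = {
    "first sample": {"r_s": 1317510.00, "r_await_ms": 0.31, "aqu_sz": 411.95},
    "second sample": {"r_s": 114589.00, "r_await_ms": 0.29, "aqu_sz": 33.54},
}


def predicted_queue_depth(r_s, r_await_ms):
    """Estimate the average queue size from read rate and read latency."""
    return r_s * (r_await_ms / 1000.0)


for name, s in samples.items():
    est = predicted_queue_depth(s["r_s"], s["r_await_ms"])
    print(f"{name}: predicted aqu-sz {est:.1f}, reported {s['aqu_sz']:.2f}")
```

With nearly identical "r_await" in both samples, the read rate can
only scale with the number of requests kept in flight, which is why
the queue size looks like the thing to investigate.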

That may happen because the test is run directly on the 'md[01]'
block device, which can do odd things. Counterintuitively, a much
bigger 'aqu-sz' and thus much better speed might be achieved by
running the test on a suitable filesystem on top of the 'md[01]'
device.
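
To compare the two cases, the same random read job can be pointed
first at the block device and then at a file on a filesystem
mounted from it. The helper below only builds the two 'fio' command
lines and does not run them; the device path, the mount point, the
file size, the queue depth and the job count are all hypothetical
placeholders, not values taken from this thread.

```python
import shlex


def fio_randread_cmd(target, is_file, iodepth=128, numjobs=16):
    """Build a 4KiB random-read fio command line for a device or a file."""
    cmd = [
        "fio",
        "--name=randread",
        f"--filename={target}",
        "--rw=randread",
        "--bs=4k",
        "--direct=1",
        "--ioengine=libaio",
        f"--iodepth={iodepth}",
        f"--numjobs={numjobs}",
        "--runtime=60",
        "--time_based",
        "--group_reporting",
    ]
    if is_file:
        # A regular file needs an explicit size; 16G is an arbitrary placeholder.
        cmd.append("--size=16G")
    return cmd


# Hypothetical targets: the raw MD device, and a file on a filesystem on it.
print(shlex.join(fio_randread_cmd("/dev/md0", is_file=False)))
print(shlex.join(fio_randread_cmd("/mnt/md0fs/fio.dat", is_file=True)))
```

Watching "aqu-sz" in 'iostat -x' on the member devices during each
run would show whether the filesystem case keeps more requests in
flight.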

With ZFS there is a good chance that the same could happen, since
striping is integrated within ZFS, especially on highly parallel
workloads.

There is however a big caveat: the test measures IOPS with 4KiB
blocks, and ZFS in COW mode does not work well with that
(especially for writes, but also for reads, if compression and
checksumming are enabled, for RAIDz), so I think that it should be
run with COW disabled, or perhaps on a 'zvol'.
