From: Song Liu <songliubraving@fb.com>
To: Vishal Verma <vverma@digitalocean.com>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: Regarding Raid-6 array
Date: Mon, 1 Feb 2021 22:00:41 +0000
Message-ID: <EE769930-E732-453A-A3A7-CBB0F6967F2B@fb.com>
In-Reply-To: <CAPgOLid8qb3igOttaZx1dSwPRpvHDaFbiqn+mFAaYZDaepijag@mail.gmail.com>
CC the list
Hi Vishal,
> On Jan 27, 2021, at 9:04 PM, Vishal Verma <vverma@digitalocean.com> wrote:
>
> Hello Song,
>
> This is Vishal Verma, Performance Engineer at DigitalOcean.
>
> I was recently playing with our RAID stack built on 6x NVMe drives, assessing its performance against a RAID-10 array.
> I understand that with RAID-10 there is no parity work, so its write performance looks really nice.
>
> But I was not sure about the RAID-6 side, specifically the read-modify-write (RMW) behavior.
> I was running a 100% sequential 128K fio workload against the RAID-6 array with O_DIRECT and noticed how the drives performed:
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.18    0.00    2.36    0.00    0.00   97.46
>
> Device      r/s      w/s     rkB/s       wkB/s    rrqm/s     wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
> nvme1n1  546.00  8330.00  69768.00   994900.00  16896.00  240311.00  96.87  96.65     0.38     0.56    0.00    127.78    119.44   0.10  84.60
> nvme0n1  513.00  8333.50  65544.00   982314.00  15873.00  237245.00  96.87  96.61     0.35     0.32    0.00    127.77    117.88   0.10  86.80
> nvme4n1  480.00  8795.50  61440.00  1045290.00  14880.00  252535.00  96.88  96.63     0.35     0.28    0.00    128.00    118.84   0.09  87.00
> nvme3n1  513.00  8425.50  65664.00  1012554.00  15903.00  244868.50  96.88  96.67     0.37     0.70    0.01    128.00    120.18   0.09  84.20
> nvme5n1  497.00  8618.00  63496.00  1011158.00  15377.00  244201.50  96.87  96.59     0.36     0.64    0.00    127.76    117.33   0.10  88.20
> nvme2n1  529.00  8306.50  67712.00   998920.00  16399.00  241243.50  96.88  96.67     0.39     0.37    0.00    128.00    120.26   0.09  83.60
>
>
> I was surprised by the volume of reads the drives were doing even though it was a 100% write test.
> I understand that for every partial-stripe write IO, the RAID-6 md array has to first read P, Q, and the old data, then calculate the new P and Q, and finally write out the data and parity.
>
> However, do we expect the drives to read that much, i.e., ~65MB/s (6%) worth of reads for every 1GB/s (100%) of writes?
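(For reference, the workload quoted above corresponds to a fio job along these lines. The md device path, ioengine, and queue depth are assumptions; only the 128K, 100% sequential, O_DIRECT write pattern comes from the report.)

    ; sketch of the reported workload; /dev/md0, libaio, and iodepth=32
    ; are assumed, not taken from the report
    [seq-write]
    filename=/dev/md0
    rw=write
    bs=128k
    direct=1
    ioengine=libaio
    iodepth=32
    time_based
    runtime=60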
6% of reads for a sequential write workload is not surprising. In the
worst case, a 4kB write from the upper layer costs RAID-6 three reads
and three writes: read the old data, old P, and old Q, then write the
new data, new P, and new Q. The raid layer will optimize full-stripe
writes to avoid the reads, but it is common for the upper layer to
issue non-full-stripe writes.
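To make the arithmetic concrete: with 6 drives (4 data + 2 parity) and
the mdadm default 512K chunk (assuming this array uses it), a full
stripe is 4 x 512K = 2MB, so an isolated 128K write is well below a
full stripe and takes the RMW path unless the stripe cache merges
neighboring writes into full stripes. Below is a minimal sketch of
the P half of that update; it is illustrative only, not md's internal
interface, and Q is updated analogously with the data delta first
multiplied by the disk's GF(2^8) coefficient.

    #include <stddef.h>

    /* P_new = P_old ^ D_old ^ D_new, applied byte by byte.
     * p:     old P block, updated in place
     * old_d: old contents of the data block being rewritten
     * new_d: new contents of that data block
     */
    static void rmw_p_update(unsigned char *p, const unsigned char *old_d,
                             const unsigned char *new_d, size_t len)
    {
            size_t i;

            for (i = 0; i < len; i++)
                    p[i] ^= old_d[i] ^ new_d[i];
    }

Feeding this update is exactly the extra read traffic you measured:
the old data and old P (and old Q, for the Reed-Solomon syndrome) all
have to come off the disks before the new blocks can be written.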
Thanks,
Song
[...]