From: Marcin Wanat <marcin.wanat@gmail.com>
To: linux-raid@vger.kernel.org
Subject: Re: Slow initial resync in RAID6 with 36 SAS drives
Date: Wed, 25 Aug 2021 12:06:09 +0200 [thread overview]
Message-ID: <CAFDAVzmjGYsdgx0Yyn3n8NWVpAZQqmhBSneZY9fagV5PGTrgGw@mail.gmail.com> (raw)
In-Reply-To: <CAFDAVznKiKC7YrCTJ4oj6NimXrhnY-=PUnJhFopw6Ur5LvOCjg@mail.gmail.com>
On Thu, Aug 19, 2021 at 11:28 AM Marcin Wanat <marcin.wanat@gmail.com> wrote:
>
> Sorry, this will be a long email with everything I found relevant.
> I have an mdraid6 array of 36 SAS HDDs, each capable of
> >200 MB/s, but I cannot get more than 38 MB/s of resync speed on a
> fast system (48 cores/96 GB RAM) with no other load.
I have done a bit more research on a server with 24 NVMe drives and
found that the resync speed bottleneck affects RAID6 arrays with more
than 16 drives:
# mdadm --create --verbose /dev/md0 --level=6 --raid-devices=16 \
    /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 /dev/nvme5n1 \
    /dev/nvme6n1 /dev/nvme7n1 /dev/nvme8n1 /dev/nvme9n1 /dev/nvme10n1 \
    /dev/nvme11n1 /dev/nvme12n1 /dev/nvme13n1 /dev/nvme14n1 /dev/nvme15n1 \
    /dev/nvme16n1
# iostat -dx 5
Device     r/s      w/s   rkB/s      wkB/s  rrqm/s    wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
nvme0n1    0.00     0.00  0.00       0.00   0.00      0.00     0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00   0.00
nvme1n1    342.60   0.40  161311.20  0.90   39996.60  0.00    99.15   0.00     2.88     0.00    0.99    470.84      2.25   2.51  86.04
nvme4n1    342.60   0.40  161311.20  0.90   39996.60  0.00    99.15   0.00     2.89     0.00    0.99    470.84      2.25   2.51  86.06
nvme5n1    342.60   0.40  161311.20  0.90   39996.60  0.00    99.15   0.00     2.89     0.00    0.99    470.84      2.25   2.51  86.14
nvme10n1   342.60   0.40  161311.20  0.90   39996.60  0.00    99.15   0.00     2.90     0.00    0.99    470.84      2.25   2.51  86.20
nvme9n1    342.60   0.40  161311.20  0.90   39996.60  0.00    99.15   0.00     2.91     0.00    1.00    470.84      2.25   2.53  86.76
nvme13n1   342.60   0.40  161311.20  0.90   39996.60  0.00    99.15   0.00     2.93     0.00    1.00    470.84      2.25   2.54  87.00
nvme12n1   342.60   0.40  161311.20  0.90   39996.60  0.00    99.15   0.00     2.94     0.00    1.01    470.84      2.25   2.54  87.08
nvme8n1    342.60   0.40  161311.20  0.90   39996.60  0.00    99.15   0.00     2.93     0.00    1.00    470.84      2.25   2.54  87.02
nvme14n1   342.60   0.40  161311.20  0.90   39996.60  0.00    99.15   0.00     2.96     0.00    1.01    470.84      2.25   2.56  87.64
nvme22n1   0.00     0.00  0.00       0.00   0.00      0.00     0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00   0.00
nvme17n1   0.00     0.00  0.00       0.00   0.00      0.00     0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00   0.00
nvme16n1   342.60   0.40  161311.20  0.90   39996.60  0.00    99.15   0.00     3.05     0.00    1.04    470.84      2.25   2.58  88.56
nvme19n1   0.00     0.00  0.00       0.00   0.00      0.00     0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00   0.00
nvme2n1    342.60   0.40  161311.20  0.90   39996.60  0.00    99.15   0.00     2.94     0.00    1.01    470.84      2.25   2.54  87.20
nvme6n1    342.60   0.40  161311.20  0.90   39996.60  0.00    99.15   0.00     2.95     0.00    1.01    470.84      2.25   2.55  87.52
nvme7n1    342.60   0.40  161311.20  0.90   39996.60  0.00    99.15   0.00     2.94     0.00    1.01    470.84      2.25   2.54  87.22
nvme21n1   0.00     0.00  0.00       0.00   0.00      0.00     0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00   0.00
nvme11n1   342.60   0.40  161311.20  0.90   39996.60  0.00    99.15   0.00     2.96     0.00    1.02    470.84      2.25   2.56  87.72
nvme15n1   342.60   0.40  161311.20  0.90   39996.60  0.00    99.15   0.00     2.99     0.00    1.02    470.84      2.25   2.53  86.84
nvme23n1   0.00     0.00  0.00       0.00   0.00      0.00     0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00   0.00
nvme18n1   0.00     0.00  0.00       0.00   0.00      0.00     0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00   0.00
nvme3n1    342.60   0.40  161311.20  0.90   39996.60  0.00    99.15   0.00     2.97     0.00    1.02    470.84      2.25   2.53  86.66
nvme20n1   0.00     0.00  0.00       0.00   0.00      0.00     0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00   0.00
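As a quick consistency check on the numbers above (a small sketch, not part of the original report), per-drive throughput is simply IOPS times average request size:

```python
# iostat self-check for the 16-drive run:
# rkB/s should equal r/s * rareq-sz.
r_per_s = 342.60     # r/s per member drive
rareq_kb = 470.84    # average read request size in KB
rkb_per_s = r_per_s * rareq_kb
print(round(rkb_per_s, 1))  # ~161310 kB/s, matching the rkB/s column
```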
As you can see, each member drive does ~342 IOPS with a ~470 KB average
read request size (rareq-sz). But when I create a RAID6 with 17 or more
drives:
# mdadm --create --verbose /dev/md0 --level=6 --raid-devices=17 \
    /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 /dev/nvme5n1 \
    /dev/nvme6n1 /dev/nvme7n1 /dev/nvme8n1 /dev/nvme9n1 /dev/nvme10n1 \
    /dev/nvme11n1 /dev/nvme12n1 /dev/nvme13n1 /dev/nvme14n1 /dev/nvme15n1 \
    /dev/nvme16n1 /dev/nvme17n1
# iostat -dx 5
Device     r/s       w/s   rkB/s     wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
nvme0n1    0.00      0.00  0.00      0.00   0.00    0.00     0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00   0.00
nvme1n1    21484.20  0.40  85936.80  0.90   0.00    0.00     0.00   0.00     0.04     0.00    0.82      4.00      2.25   0.05  99.16
nvme4n1    21484.00  0.40  85936.00  0.90   0.00    0.00     0.00   0.00     0.03     0.00    0.74      4.00      2.25   0.05  99.16
nvme5n1    21484.00  0.40  85936.00  0.90   0.00    0.00     0.00   0.00     0.04     0.00    0.84      4.00      2.25   0.05  99.16
nvme10n1   21483.80  0.40  85935.20  0.90   0.00    0.00     0.00   0.00     0.03     0.00    0.65      4.00      2.25   0.04  83.64
nvme9n1    21483.80  0.40  85935.20  0.90   0.00    0.00     0.00   0.00     0.03     0.00    0.67      4.00      2.25   0.04  85.86
nvme13n1   21483.60  0.40  85934.40  0.90   0.00    0.00     0.00   0.00     0.03     0.00    0.63      4.00      2.25   0.04  83.66
nvme12n1   21483.60  0.40  85934.40  0.90   0.00    0.00     0.00   0.00     0.03     0.00    0.65      4.00      2.25   0.04  83.66
nvme8n1    21483.60  0.40  85934.40  0.90   0.00    0.00     0.00   0.00     0.04     0.00    0.81      4.00      2.25   0.05  99.22
nvme14n1   21481.80  0.40  85927.20  0.90   0.00    0.00     0.00   0.00     0.03     0.00    0.67      4.00      2.25   0.04  83.66
nvme22n1   0.00      0.00  0.00      0.00   0.00    0.00     0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00   0.00
nvme17n1   21482.00  0.40  85928.00  0.90   0.00    0.00     0.00   0.00     0.02     0.00    0.49      4.00      2.25   0.03  67.12
nvme16n1   21481.60  0.40  85926.40  0.90   0.00    0.00     0.00   0.00     0.03     0.00    0.75      4.00      2.25   0.04  83.66
nvme19n1   0.00      0.00  0.00      0.00   0.00    0.00     0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00   0.00
nvme2n1    21481.60  0.40  85926.40  0.90   0.00    0.00     0.00   0.00     0.04     0.00    0.95      4.00      2.25   0.05  99.26
nvme6n1    21481.60  0.40  85926.40  0.90   0.00    0.00     0.00   0.00     0.04     0.00    0.91      4.00      2.25   0.05  99.26
nvme7n1    21481.60  0.40  85926.40  0.90   0.00    0.00     0.00   0.00     0.04     0.00    0.87      4.00      2.25   0.05  99.24
nvme21n1   0.00      0.00  0.00      0.00   0.00    0.00     0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00   0.00
nvme11n1   21481.20  0.40  85924.80  0.90   0.00    0.00     0.00   0.00     0.03     0.00    0.75      4.00      2.25   0.04  83.66
nvme15n1   21480.20  0.40  85920.80  0.90   0.00    0.00     0.00   0.00     0.04     0.00    0.80      4.00      2.25   0.04  83.66
nvme23n1   0.00      0.00  0.00      0.00   0.00    0.00     0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00   0.00
nvme18n1   0.00      0.00  0.00      0.00   0.00    0.00     0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00   0.00
nvme3n1    21480.40  0.40  85921.60  0.90   0.00    0.00     0.00   0.00     0.05     0.00    1.02      4.00      2.25   0.05  99.26
nvme20n1   0.00      0.00  0.00      0.00   0.00    0.00     0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00   0.00
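The same arithmetic on the 17-drive run shows where the resync speed ends up (again just a sketch of the numbers above):

```python
# 17-drive run: requests collapse to 4 KB reads, so per-drive
# bandwidth collapses with them despite the much higher IOPS.
iops = 21484.20   # r/s per member drive
req_kb = 4.00     # rareq-sz in KB
mb_per_s = iops * req_kb / 1024
print(round(mb_per_s, 1))  # ~83.9 MB/s per drive, i.e. the ~85 MB/s resync
```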
rareq-sz drops to 4 KB, per-drive IOPS increase to ~21,480, and resync
speed drops to 85 MB/s.
Why does this happen? Could someone point me to the part of the mdraid
kernel code responsible for this limitation? And would it be safe to
change it and recompile the kernel on a machine with 512 GB+ of RAM?
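For reference, these are the stock userspace knobs for resync speed (standard sysctl/sysfs names; the values below are only examples, and none of these obviously explains the request-size cliff described above):

```shell
# Inspect the global md resync rate limits (KB/s).
sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max

# Inspect and raise the raid4/5/6 stripe cache for this array.
cat /sys/block/md0/md/stripe_cache_size
# Memory cost is stripe_cache_size * 4 KiB * nr_disks,
# e.g. 8192 * 4 KiB * 36 disks = 1152 MiB.
echo 8192 > /sys/block/md0/md/stripe_cache_size
```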
Regards,
Marcin Wanat
Thread overview: 9+ messages
2021-08-19 9:28 Slow initial resync in RAID6 with 36 SAS drives Marcin Wanat
2021-08-25 10:06 ` Marcin Wanat [this message]
2021-08-25 10:28 ` [Non-DoD Source] " Finlayson, James M CIV (USA)
2021-09-01 1:22 ` antlists
2021-09-01 1:50 ` Guoqing Jiang
2021-09-01 5:19 ` Song Liu
2021-09-03 0:58 ` Song Liu
2021-09-03 2:56 ` Jens Axboe
2021-09-04 15:24 ` Marcin Wanat