* Nr_requests mdraid
@ 2020-10-21 20:29 Finlayson, James M CIV (USA)
2020-10-28 18:39 ` Vitaly Mayatskih
0 siblings, 1 reply; 6+ messages in thread
From: Finlayson, James M CIV (USA) @ 2020-10-21 20:29 UTC (permalink / raw)
To: linux-raid
All,
I'm working on creating raid5 or raid6 arrays of 800K IOPS NVMe drives. Each drive performs well at a queue depth of 128, and I set it to 1023 where allowed. To max out the queue depth on each RAID member, I'd like to set the sysfs nr_requests on the md device to something greater than 128, e.g. #raid members * 128. Even though /sys/block/md127/queue/nr_requests is mode 644, any attempt to change nr_requests as root fails with "write error: Invalid argument". When I'm hitting the md device with random reads, my NVMe drives are 100% utilized but only doing 160K IOPS because they are given no queue depth.
Am I doing something silly?
Regards,
Jim Finlayson
[root@hp-dl325 ~]# cd /sys/block/md127/queue
[root@hp-dl325 queue]# ls
add_random discard_zeroes_data io_timeout max_segments optimal_io_size unpriv_sgio
chunk_sectors fua logical_block_size max_segment_size physical_block_size wbt_lat_usec
dax hw_sector_size max_discard_segments minimum_io_size read_ahead_kb write_cache
discard_granularity io_poll max_hw_sectors_kb nomerges rotational write_same_max_bytes
discard_max_bytes io_poll_delay max_integrity_segments nr_requests rq_affinity write_zeroes_max_bytes
discard_max_hw_bytes iostats max_sectors_kb nr_zones scheduler zoned
[root@hp-dl325 queue]# cat nr_requests
128
[root@hp-dl325 queue]# ls -l nr_requests
-rw-r--r-- 1 root root 4096 Oct 21 18:55 nr_requests
[root@hp-dl325 queue]# echo 1023 > nr_requests
-bash: echo: write error: Invalid argument
[root@hp-dl325 queue]# echo 128 > nr_requests
-bash: echo: write error: Invalid argument
[root@hp-dl325 queue]# pwd
/sys/block/md127/queue
[root@hp-dl325 queue]#
[root@hp-dl325 queue]# mdadm --version
mdadm - v4.1 - 2018-10-01
[root@hp-dl325 queue]# uname -r
4.18.0-193.el8.x86_64
[root@hp-dl325 queue]# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.2 (Ootpa)
[root@hp-dl325 queue]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0]
md126 : active raid5 nvme9n1[10] nvme8n1[8] nvme7n1[7] nvme6n1[6] nvme5n1[5] nvme4n1[4] nvme3n1[3] nvme2n1[2] nvme1n1[1] nvme0n1[0]
16877177856 blocks super 1.2 level 5, 512k chunk, algorithm 2 [10/10] [UUUUUUUUUU]
bitmap: 0/14 pages [0KB], 65536KB chunk
md127 : active raid5 nvme18n1[9] nvme17n1[7] nvme16n1[6] nvme15n1[5] nvme14n1[4] nvme13n1[3] nvme12n1[2] nvme11n1[1] nvme10n1[0]
15001935872 blocks super 1.2 level 5, 512k chunk, algorithm 2 [9/9] [UUUUUUUUU]
bitmap: 0/14 pages [0KB], 65536KB chunk
unused devices: <none>
[root@hp-dl325 queue]# cat /etc/udev/rules.d/99-local.rules
SUBSYSTEM=="block", ACTION=="add|change", KERNEL=="md*", ATTR{md/sync_speed_max}="2000000",ATTR{md/group_thread_cnt}="16", ATTR{md/stripe_cache_size}="1024" ATTR{queue/nomerges}="2", ATTR{queue/nr_requests}="1023", ATTR{queue/rotational}="0", ATTR{queue/rq_affinity}="2", ATTR{queue/scheduler}="none", ATTR{queue/add_random}="0", ATTR{queue/max_sectors_kb}="4096"
SUBSYSTEM=="block", ACTION=="add|change", KERNEL=="nvme*[0-9]n*[0-9]", ATTRS{model}=="*PM1725a*", ATTR{queue/nomerges}="2", ATTR{queue/nr_requests}="1023", ATTR{queue/rotational}="0", ATTR{queue/rq_affinity}="2", ATTR{queue/scheduler}="none", ATTR{queue/add_random}="0", ATTR{queue/max_sectors_kb}="4096"
* Re: Nr_requests mdraid
2020-10-21 20:29 Nr_requests mdraid Finlayson, James M CIV (USA)
@ 2020-10-28 18:39 ` Vitaly Mayatskih
0 siblings, 0 replies; 6+ messages in thread
From: Vitaly Mayatskih @ 2020-10-28 18:39 UTC (permalink / raw)
To: Finlayson, James M CIV (USA); +Cc: linux-raid
On Thu, Oct 22, 2020 at 2:56 AM Finlayson, James M CIV (USA)
<james.m.finlayson4.civ@mail.mil> wrote:
>
> All,
> I'm working on creating raid5 or raid6 arrays of 800K IOPS NVMe drives. Each drive performs well at a queue depth of 128, and I set it to 1023 where allowed. To max out the queue depth on each RAID member, I'd like to set the sysfs nr_requests on the md device to something greater than 128, e.g. #raid members * 128. Even though /sys/block/md127/queue/nr_requests is mode 644, any attempt to change nr_requests as root fails with "write error: Invalid argument". When I'm hitting the md device with random reads, my NVMe drives are 100% utilized but only doing 160K IOPS because they are given no queue depth.
>
> Am I doing something silly?
It only works for blk-mq block devices. MD is not blk-mq.
You can exchange simplicity for performance: instead of creating one
RAID-5/6 array, partition the drives into N equal-sized partitions,
create N RAID-5/6 arrays using one partition from every disk, then
stripe them into a top-level RAID-0. That would be RAID-5+0 (or 6+0).
It is awful, but it simulates multiqueue and performs better under
parallel loads, especially for writes (on RAID-5/6).
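Roughly, for nine drives and e.g. four partitions per drive (device names
and the partition count are only illustrative):

# carve each member into 4 equal partitions
for d in /dev/nvme1{0..8}n1; do
    parted -s "$d" mklabel gpt mkpart p1 0% 25% mkpart p2 25% 50% mkpart p3 50% 75% mkpart p4 75% 100%
done

# one RAID-5 per partition index, built from the same slot on every drive
for i in 1 2 3 4; do
    mdadm --create /dev/md/r5_$i --level=5 --raid-devices=9 /dev/nvme1{0..8}n1p$i
done

# stripe the four RAID-5s into the top-level RAID-0 (RAID-5+0)
mdadm --create /dev/md/r50 --level=0 --raid-devices=4 /dev/md/r5_1 /dev/md/r5_2 /dev/md/r5_3 /dev/md/r5_4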
--
wbr, Vitaly
* Re: Nr_requests mdraid
@ 2020-11-16 16:51 Finlayson, James M CIV (USA)
2020-11-20 10:42 ` Nikolay Kichukov
2020-11-20 13:20 ` Vitaly Mayatskih
0 siblings, 2 replies; 6+ messages in thread
From: Finlayson, James M CIV (USA) @ 2020-11-16 16:51 UTC (permalink / raw)
To: 'linux-raid@vger.kernel.org'
On Wed, Oct 28, 2020 at 6:39 PM Vitaly Mayatskih <v.mayatskih@gmail.com> wrote:
>
>On Thu, Oct 22, 2020 at 2:56 AM Finlayson, James M CIV (USA) <james.m.finlayson4.civ@mail.mil> wrote:
>>
>> All,
>> I'm working on creating raid5 or raid6 arrays of 800K IOPS NVMe drives. Each
>> drive performs well at a queue depth of 128, and I set it to 1023 where allowed.
>> To max out the queue depth on each RAID member, I'd like to set the sysfs
>> nr_requests on the md device to something greater than 128, e.g. #raid members
>> * 128. Even though /sys/block/md127/queue/nr_requests is mode 644, any attempt
>> to change nr_requests as root fails with "write error: Invalid argument". When
>> I'm hitting the md device with random reads, my NVMe drives are 100% utilized
>> but only doing 160K IOPS because they are given no queue depth.
>> Am I doing something silly?
>
>It only works for blk-mq block devices. MD is not blk-mq.
>
>You can exchange simplicity for performance: instead of creating one
>RAID-5/6 array you can partition drives in N equal sized partitions,
>create N RAID-5/6 arrays using one partition from every disk, then
>stripe them into top-level RAID-0. So that would be RAID-5+0 (or 6+0).
>
>It is awful, but simulates multiqueue and performs better in parallel
>loads. Especially for writes (on RAID-5/6).
>
>
>--
>wbr, Vitaly
Vitaly,
Thank you for the tip. After creating 32 partitions per SSD and running 64 9+1 (2 in reality) stripes, my raid5 performance is up to 11.4M 4K random read IOPS, out of the 17M the box is capable of. I'm happy with that, because I can't NUMA-pin the raid stripes the way I would the individual SSDs. However, when I add the RAID0 striping to make the "RAID50 from hell", performance drops to 7.1M 4K random read IOPS. Any suggestions? The top-level RAID50, again, won't let me generate the queue depth.
Thanks in advance,
Jim
* Re: Nr_requests mdraid
2020-11-16 16:51 Finlayson, James M CIV (USA)
@ 2020-11-20 10:42 ` Nikolay Kichukov
2020-11-20 13:04 ` Vitaly Mayatskih
2020-11-20 13:20 ` Vitaly Mayatskih
1 sibling, 1 reply; 6+ messages in thread
From: Nikolay Kichukov @ 2020-11-20 10:42 UTC (permalink / raw)
To: Finlayson, James M CIV (USA), 'linux-raid@vger.kernel.org'
Hello all,
On Mon, 2020-11-16 at 16:51 +0000, Finlayson, James M CIV (USA) wrote:
> On Wed, Oct 28, 2020 at 6:39 PM Vitaly Mayatskih <
> v.mayatskih@gmail.com> wrote
> >
> > On Thu, Oct 22, 2020 at 2:56 AM Finlayson, James M CIV (USA) <
> > james.m.finlayson4.civ@mail.mil> wrote:
> > >
> > > All,
> > > I'm working on creating raid5 or raid6 arrays of 800K IOPS NVMe drives.
> > > Each drive performs well at a queue depth of 128, and I set it to 1023
> > > where allowed. To max out the queue depth on each RAID member, I'd like
> > > to set the sysfs nr_requests on the md device to something greater than
> > > 128, e.g. #raid members * 128. Even though
> > > /sys/block/md127/queue/nr_requests is mode 644, any attempt to change
> > > nr_requests as root fails with "write error: Invalid argument". When I'm
> > > hitting the md device with random reads, my NVMe drives are 100% utilized
> > > but only doing 160K IOPS because they are given no queue depth.
> > > Am I doing something silly?
> >
> > It only works for blk-mq block devices. MD is not blk-mq.
Would it be possible to implement something similar to the use_blk_mq
option of dm_mod in md_mod?
> >
> > You can exchange simplicity for performance: instead of creating one
> > RAID-5/6 array you can partition drives in N equal sized partitions,
> > create N RAID-5/6 arrays using one partition from every disk, then
> > stripe them into top-level RAID-0. So that would be RAID-5+0 (or
> > 6+0).
> >
> > It is awful, but simulates multiqueue and performs better in
> > parallel
> > loads. Especially for writes (on RAID-5/6).
> >
> >
> > --
> > wbr, Vitaly
>
> Vitaly,
> Thank you for the tip. After creating 32 partitions per SSD and running 64
> 9+1 (2 in reality) stripes, my raid5 performance is up to 11.4M 4K random
> read IOPS, out of the 17M the box is capable of. I'm happy with that,
> because I can't NUMA-pin the raid stripes the way I would the individual
> SSDs. However, when I add the RAID0 striping to make the "RAID50 from
> hell", performance drops to 7.1M 4K random read IOPS. Any suggestions?
> The top-level RAID50, again, won't let me generate the queue depth.
>
> Thanks in advance,
> Jim
* Re: Nr_requests mdraid
2020-11-16 16:51 Finlayson, James M CIV (USA)
2020-11-20 10:42 ` Nikolay Kichukov
@ 2020-11-20 13:20 ` Vitaly Mayatskih
1 sibling, 0 replies; 6+ messages in thread
From: Vitaly Mayatskih @ 2020-11-20 13:20 UTC (permalink / raw)
To: Finlayson, James M CIV (USA); +Cc: linux-raid
On Mon, Nov 16, 2020 at 12:27 PM Finlayson, James M CIV (USA)
<james.m.finlayson4.civ@mail.mil> wrote:
> Thank you for the tip. After creating 32 partitions per SSD and running 64 9+1 (2 in reality) stripes, my raid5 performance is up to 11.4M 4K random read IOPS, out of the 17M the box is capable of. I'm happy with that, because I can't NUMA-pin the raid stripes the way I would the individual SSDs. However, when I add the RAID0 striping to make the "RAID50 from hell", performance drops to 7.1M 4K random read IOPS. Any suggestions? The top-level RAID50, again, won't let me generate the queue depth.
This is not currently supported for non-blk-mq disks. The best you
can do is recompile your kernel with a bigger BLKDEV_MAX_RQ.
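Roughly, against a 4.18-era source tree (the define has moved and been
renamed in newer kernels, so check your version):

# the default lives in include/linux/blkdev.h on 4.18-era kernels (BLKDEV_MAX_RQ, 128)
grep -n 'define BLKDEV_MAX_RQ' include/linux/blkdev.h
# raise it in that header (e.g. 128 -> 1024), then rebuild and install
make -j"$(nproc)" && make modules_install && make install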
--
wbr, Vitaly