linux-raid.vger.kernel.org archive mirror
* Re: Nr_requests mdraid
@ 2020-11-16 16:51 Finlayson, James M CIV (USA)
  2020-11-20 10:42 ` Nikolay Kichukov
  2020-11-20 13:20 ` Vitaly Mayatskih
  0 siblings, 2 replies; 6+ messages in thread
From: Finlayson, James M CIV (USA) @ 2020-11-16 16:51 UTC (permalink / raw)
  To: 'linux-raid@vger.kernel.org'

On Wed, Oct 28, 2020 at 6:39 PM Vitaly Mayatskih <v.mayatskih@gmail.com> wrote:
>
>On Thu, Oct 22, 2020 at 2:56 AM Finlayson, James M CIV (USA) <james.m.finlayson4.civ@mail.mil> wrote:
>> 
>> All,
>> I'm working on creating raid5 or raid6 arrays of 800K IOPS NVMe drives.  Each of
>> the drives performs well with a queue depth of 128, and I set it to 1023 where the
>> device allows it.  To max out the queue depth on each RAID member, I'd like to set
>> the sysfs nr_requests on the md device to something greater than 128, like
>> #raid members * 128.  Even though /sys/block/md127/queue/nr_requests is mode 644,
>> when I try to change nr_requests in any way as root, I get "write error: invalid
>> argument".  When I'm hitting the md device with random reads, my NVMe drives are
>> 100% utilized but only doing 160K IOPS because they have no queue depth.
>> Am I doing something silly?
>
>It only works for blk-mq block devices. MD is not blk-mq.
>
>You can exchange simplicity for performance: instead of creating one
>RAID-5/6 array, you can partition the drives into N equal-sized partitions,
>create N RAID-5/6 arrays using one partition from every disk, then
>stripe them into a top-level RAID-0. So that would be RAID-5+0 (or 6+0).
>
>It is awful, but it simulates multiqueue and performs better under
>parallel loads, especially for writes (on RAID-5/6).
>
>
>-- 
>wbr, Vitaly

Vitaly,
Thank you for the tip.  My raid5 performance, after creating 32 partitions per SSD and running 64 9+1 (2 in reality) stripes, is up to 11.4M 4K random read IOPS out of the 17M the box is capable of, which I'm happy with because I can't NUMA-pin the raid stripes as I would the individual SSDs themselves.  However, when I add the RAID0 striping on top to make the "RAID50 from hell", my performance drops to 7.1M 4K random read IOPS.  Any suggestions?  The top-level RAID50, again, won't let me generate the queue depth.
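For the NUMA point: when testing the raw SSDs, each fio job can be pinned to its drive's node, which is the placement that gets lost once everything sits behind one md device.  A rough illustration with made-up device and node numbers (assumes fio built with libnuma):

# find a drive's node with e.g. lstopo from hwloc, then pin CPU and memory to it
fio --name=node0 --filename=/dev/nvme0n1 --rw=randread --bs=4k --direct=1 \
    --ioengine=libaio --iodepth=128 --numjobs=8 \
    --numa_cpu_nodes=0 --numa_mem_policy=bind:0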

Thanks in advance,
Jim





* Re: Nr_requests mdraid
  2020-11-16 16:51 Nr_requests mdraid Finlayson, James M CIV (USA)
@ 2020-11-20 10:42 ` Nikolay Kichukov
  2020-11-20 13:04   ` Vitaly Mayatskih
  2020-11-20 13:20 ` Vitaly Mayatskih
  1 sibling, 1 reply; 6+ messages in thread
From: Nikolay Kichukov @ 2020-11-20 10:42 UTC (permalink / raw)
  To: Finlayson, James M CIV (USA), 'linux-raid@vger.kernel.org'

Hello all,

On Mon, 2020-11-16 at 16:51 +0000, Finlayson, James M CIV (USA) wrote:
> On Wed, Oct 28, 2020 at 6:39 PM Vitaly Mayatskih <v.mayatskih@gmail.com> wrote:
> > 
> > On Thu, Oct 22, 2020 at 2:56 AM Finlayson, James M CIV (USA) <james.m.finlayson4.civ@mail.mil> wrote:
> > > 
> > > All,
> > > I'm working on creating raid5 or raid6 arrays of 800K IOPS NVMe
> > > drives.  Each of the drives performs well with a queue depth of 128,
> > > and I set it to 1023 where the device allows it.  To max out the
> > > queue depth on each RAID member, I'd like to set the sysfs
> > > nr_requests on the md device to something greater than 128, like
> > > #raid members * 128.  Even though /sys/block/md127/queue/nr_requests
> > > is mode 644, when I try to change nr_requests in any way as root, I
> > > get "write error: invalid argument".  When I'm hitting the md device
> > > with random reads, my NVMe drives are 100% utilized but only doing
> > > 160K IOPS because they have no queue depth.
> > > Am I doing something silly?
> > 
> > It only works for blk-mq block devices. MD is not blk-mq.

Would it be possible to implement something similar to dm_mod's use_blk_mq
parameter in md_mod?

> > 
> > You can exchange simplicity for performance: instead of creating one
> > RAID-5/6 array, you can partition the drives into N equal-sized
> > partitions, create N RAID-5/6 arrays using one partition from every
> > disk, then stripe them into a top-level RAID-0. So that would be
> > RAID-5+0 (or 6+0).
> > 
> > It is awful, but it simulates multiqueue and performs better under
> > parallel loads, especially for writes (on RAID-5/6).
> > 
> > 
> > -- 
> > wbr, Vitaly
> 
> Vitaly,
> Thank you for the tip.  My raid5 performance, after creating 32
> partitions per SSD and running 64 9+1 (2 in reality) stripes, is up to
> 11.4M 4K random read IOPS out of the 17M the box is capable of, which
> I'm happy with because I can't NUMA-pin the raid stripes as I would the
> individual SSDs themselves.  However, when I add the RAID0 striping on
> top to make the "RAID50 from hell", my performance drops to 7.1M 4K
> random read IOPS.  Any suggestions?  The top-level RAID50, again, won't
> let me generate the queue depth.
> 
> Thanks in advance,
> Jim
> 
> 
> 




* Re: Nr_requests mdraid
  2020-11-20 10:42 ` Nikolay Kichukov
@ 2020-11-20 13:04   ` Vitaly Mayatskih
  0 siblings, 0 replies; 6+ messages in thread
From: Vitaly Mayatskih @ 2020-11-20 13:04 UTC (permalink / raw)
  To: Nikolay Kichukov; +Cc: Finlayson, James M CIV (USA), linux-raid

On Fri, Nov 20, 2020 at 5:43 AM Nikolay Kichukov <nikolay@oldum.net> wrote:

> Would it be possible to implement something similar to the use_blk_mq of
> dm_mod on md_mod?

One day, one day... Converting it to blk-mq with nr_queues=1 would be
more or less easy.
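For reference, the dm_mod switch being compared against looks like this on kernels that still carry the legacy request path (RHEL 8's 4.18, for instance); it only governs request-based targets such as multipath, and the paths and values here are an assumed sketch rather than a recommendation:

cat /sys/module/dm_mod/parameters/use_blk_mq        # current setting (Y/N)
# make blk-mq the default for request-based DM at module load time:
echo "options dm_mod use_blk_mq=Y" > /etc/modprobe.d/dm-mq.conf
# or on the kernel command line: dm_mod.use_blk_mq=Y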


* Re: Nr_requests mdraid
  2020-11-16 16:51 Nr_requests mdraid Finlayson, James M CIV (USA)
  2020-11-20 10:42 ` Nikolay Kichukov
@ 2020-11-20 13:20 ` Vitaly Mayatskih
  1 sibling, 0 replies; 6+ messages in thread
From: Vitaly Mayatskih @ 2020-11-20 13:20 UTC (permalink / raw)
  To: Finlayson, James M CIV (USA); +Cc: linux-raid

On Mon, Nov 16, 2020 at 12:27 PM Finlayson, James M CIV (USA)
<james.m.finlayson4.civ@mail.mil> wrote:

> Thank you for the tip.  My raid5 performance, after creating 32 partitions per SSD and running 64 9+1 (2 in reality) stripes, is up to 11.4M 4K random read IOPS out of the 17M the box is capable of, which I'm happy with because I can't NUMA-pin the raid stripes as I would the individual SSDs themselves.  However, when I add the RAID0 striping on top to make the "RAID50 from hell", my performance drops to 7.1M 4K random read IOPS.  Any suggestions?  The top-level RAID50, again, won't let me generate the queue depth.

This is not currently supported for non-blk-mq devices. The best you
can do is recompile your kernel with a larger BLKDEV_MAX_RQ.
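A rough, untested sketch of that route, assuming a 4.18-era source tree where the default queue depth is the BLKDEV_MAX_RQ constant in include/linux/blkdev.h:

cd /usr/src/linux                                        # assumed source location
grep -n 'define BLKDEV_MAX_RQ' include/linux/blkdev.h    # e.g. "#define BLKDEV_MAX_RQ 128"
sed -i 's/\(#define BLKDEV_MAX_RQ\)[[:space:]]*128/\1 1024/' include/linux/blkdev.h
make -j"$(nproc)" && make modules_install && make install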

-- 
wbr, Vitaly


* Re: Nr_requests mdraid
  2020-10-21 20:29 Finlayson, James M CIV (USA)
@ 2020-10-28 18:39 ` Vitaly Mayatskih
  0 siblings, 0 replies; 6+ messages in thread
From: Vitaly Mayatskih @ 2020-10-28 18:39 UTC (permalink / raw)
  To: Finlayson, James M CIV (USA); +Cc: linux-raid

On Thu, Oct 22, 2020 at 2:56 AM Finlayson, James M CIV (USA)
<james.m.finlayson4.civ@mail.mil> wrote:
>
> All,
> I'm working on creating raid5 or raid6 arrays of 800K IOPS NVMe drives.  Each of the drives performs well with a queue depth of 128, and I set it to 1023 where the device allows it.  To max out the queue depth on each RAID member, I'd like to set the sysfs nr_requests on the md device to something greater than 128, like #raid members * 128.  Even though /sys/block/md127/queue/nr_requests is mode 644, when I try to change nr_requests in any way as root, I get "write error: invalid argument".  When I'm hitting the md device with random reads, my NVMe drives are 100% utilized but only doing 160K IOPS because they have no queue depth.
>
> Am I doing something silly?

It only works for blk-mq block devices. MD is not blk-mq.
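An easy way to see the difference in sysfs (device names are examples; the mq directory only exists for blk-mq request queues):

ls -d /sys/block/nvme0n1/mq            # present: the NVMe namespaces are blk-mq
ls -d /sys/block/md127/mq              # "No such file or directory": md is bio-based
cat /sys/block/md127/queue/scheduler   # shows only "none"; no I/O scheduler can be attached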

You can exchange simplicity for performance: instead of creating one
RAID-5/6 array, you can partition the drives into N equal-sized partitions,
create N RAID-5/6 arrays using one partition from every disk, then
stripe them into a top-level RAID-0. So that would be RAID-5+0 (or 6+0).

It is awful, but it simulates multiqueue and performs better under
parallel loads, especially for writes (on RAID-5/6).
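A minimal sketch of that layout, with made-up device names and only N=4 slices per drive (the follow-up in this thread used 32 per SSD); adjust counts and devices to taste:

N=4
for d in /dev/nvme{0..9}n1; do
    parted -s "$d" mklabel gpt
    for i in $(seq 1 "$N"); do
        parted -s "$d" mkpart "r$i" "$(( (i-1)*100/N ))%" "$(( i*100/N ))%"
    done
done
# one RAID-5 across the i-th slice of every disk
for i in $(seq 1 "$N"); do
    mdadm --create "/dev/md/r5_$i" --level=5 --raid-devices=10 /dev/nvme{0..9}n1p"$i"
done
# stripe the RAID-5 arrays into the top-level RAID-0
mdadm --create /dev/md/r50 --level=0 --raid-devices="$N" /dev/md/r5_{1..4}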


-- 
wbr, Vitaly


* Nr_requests mdraid
@ 2020-10-21 20:29 Finlayson, James M CIV (USA)
  2020-10-28 18:39 ` Vitaly Mayatskih
  0 siblings, 1 reply; 6+ messages in thread
From: Finlayson, James M CIV (USA) @ 2020-10-21 20:29 UTC (permalink / raw)
  To: linux-raid

All,
I'm working on creating raid5 or raid6 arrays of 800K IOPS NVMe drives.  Each of the drives performs well with a queue depth of 128, and I set it to 1023 where the device allows it.  To max out the queue depth on each RAID member, I'd like to set the sysfs nr_requests on the md device to something greater than 128, like #raid members * 128.  Even though /sys/block/md127/queue/nr_requests is mode 644, when I try to change nr_requests in any way as root, I get "write error: invalid argument".  When I'm hitting the md device with random reads, my NVMe drives are 100% utilized but only doing 160K IOPS because they have no queue depth.

Am I doing something silly?   
Regards,
Jim Finlayson

[root@hp-dl325 ~]# cd /sys/block/md127/queue
[root@hp-dl325 queue]# ls
add_random            discard_zeroes_data  io_timeout              max_segments      optimal_io_size      unpriv_sgio
chunk_sectors         fua                  logical_block_size      max_segment_size  physical_block_size  wbt_lat_usec
dax                   hw_sector_size       max_discard_segments    minimum_io_size   read_ahead_kb        write_cache
discard_granularity   io_poll              max_hw_sectors_kb       nomerges          rotational           write_same_max_bytes
discard_max_bytes     io_poll_delay        max_integrity_segments  nr_requests       rq_affinity          write_zeroes_max_bytes
discard_max_hw_bytes  iostats              max_sectors_kb          nr_zones          scheduler            zoned
[root@hp-dl325 queue]# cat nr_requests
128
[root@hp-dl325 queue]# ls -l nr_requests
-rw-r--r-- 1 root root 4096 Oct 21 18:55 nr_requests
[root@hp-dl325 queue]# echo 1023 > nr_requests
-bash: echo: write error: Invalid argument
[root@hp-dl325 queue]# echo 128 > nr_requests
-bash: echo: write error: Invalid argument
[root@hp-dl325 queue]# pwd
/sys/block/md127/queue
[root@hp-dl325 queue]#

[root@hp-dl325 queue]# mdadm --version
mdadm - v4.1 - 2018-10-01
[root@hp-dl325 queue]# uname -r
4.18.0-193.el8.x86_64
[root@hp-dl325 queue]# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.2 (Ootpa)
[root@hp-dl325 queue]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0]
md126 : active raid5 nvme9n1[10] nvme8n1[8] nvme7n1[7] nvme6n1[6] nvme5n1[5] nvme4n1[4] nvme3n1[3] nvme2n1[2] nvme1n1[1] nvme0n1[0]
      16877177856 blocks super 1.2 level 5, 512k chunk, algorithm 2 [10/10] [UUUUUUUUUU]
      bitmap: 0/14 pages [0KB], 65536KB chunk

md127 : active raid5 nvme18n1[9] nvme17n1[7] nvme16n1[6] nvme15n1[5] nvme14n1[4] nvme13n1[3] nvme12n1[2] nvme11n1[1] nvme10n1[0]
      15001935872 blocks super 1.2 level 5, 512k chunk, algorithm 2 [9/9] [UUUUUUUUU]
      bitmap: 0/14 pages [0KB], 65536KB chunk

unused devices: <none>

[root@hp-dl325 queue]# cat /etc/udev/rules.d/99-local.rules
SUBSYSTEM=="block", ACTION=="add|change", KERNEL=="md*", ATTR{md/sync_speed_max}="2000000", ATTR{md/group_thread_cnt}="16", ATTR{md/stripe_cache_size}="1024", ATTR{queue/nomerges}="2", ATTR{queue/nr_requests}="1023", ATTR{queue/rotational}="0", ATTR{queue/rq_affinity}="2", ATTR{queue/scheduler}="none", ATTR{queue/add_random}="0", ATTR{queue/max_sectors_kb}="4096"
SUBSYSTEM=="block", ACTION=="add|change", KERNEL=="nvme*[0-9]n*[0-9]", ATTRS{model}=="*PM1725a*", ATTR{queue/nomerges}="2", ATTR{queue/nr_requests}="1023", ATTR{queue/rotational}="0", ATTR{queue/rq_affinity}="2", ATTR{queue/scheduler}="none", ATTR{queue/add_random}="0", ATTR{queue/max_sectors_kb}="4096"
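A quick sanity check (device names assumed) of what those rules actually managed to set on each device:

grep . /sys/block/nvme0n1/queue/nr_requests /sys/block/md127/queue/nr_requests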

