* NVMe array support
@ 2017-11-22 17:56 Kevin M. Hildebrand
       [not found] ` <329VkVTTB6304S04.1511378401@web04.cms.usa.net>
  2017-11-23  8:10 ` Christoph Hellwig
  0 siblings, 2 replies; 5+ messages in thread
From: Kevin M. Hildebrand @ 2017-11-22 17:56 UTC (permalink / raw)


I've got eight Samsung PM1725a NVMe drives that I'm trying to combine
into an array so I can aggregate the performance of multiple drives.
My initial experiments have yielded abysmal performance in most
cases.  I've tried creating RAID 0 arrays with MD RAID, ZFS, and
a few others, and most of the time I'm getting somewhere around the
performance of a single drive, even though I've got more than one.
The only way I can get decent performance is when writing to the
array in direct mode (O_DIRECT).  I've been using fio, fdt, and dd
for running tests.  Has anyone successfully created software arrays
of NVMe drives and been able to get usable performance from them?
The drives are all in a Dell R940 server, which has 4 Skylake CPUs,
and all of the drives are connected to a single CPU with full PCI
bandwidth.
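
A minimal sketch of this kind of O_DIRECT write test (not the actual
commands used; the device path and parameters are assumptions):

  # Sequential write straight to the block device, bypassing the page cache:
  dd if=/dev/zero of=/dev/md0 bs=1M count=10240 oflag=direct

  # Equivalent fio job, also with direct I/O:
  fio --name=seq-direct --filename=/dev/md0 --rw=write --bs=1M --size=10G \
      --ioengine=libaio --iodepth=32 --direct=1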

Sorry if this isn't the right place to send this message; I'm having a
hard time finding anyone who's doing this.

If anyone's doing this successfully, I'd love to hear more about your
configuration.

Thanks!
Kevin

--
Kevin Hildebrand
University of Maryland
Division of IT


* NVMe array support
       [not found] ` <329VkVTTB6304S04.1511378401@web04.cms.usa.net>
@ 2017-11-22 20:09   ` Kevin M. Hildebrand
       [not found]     ` <040VkwDHn2448S07.1511406519@web07.cms.usa.net>
  0 siblings, 1 reply; 5+ messages in thread
From: Kevin M. Hildebrand @ 2017-11-22 20:09 UTC (permalink / raw)


You're using Linux MD RAID?  Have you been able to get good
performance with something other than "fio -direct"?

I have a RAID 0 with eight elements (see below for details).

Running fio on an individual drive in direct mode gives me okay
performance for that drive: around 1.8-1.9GB/s sequential write.
Running fio on an individual drive in buffered mode gives me wildly
variable performance according to fio, but iostat shows similar rates
to the drive, around 1.8-1.9GB/s.

Running fio to the array in direct mode gives me performance for the
array at around 12GB/s, which is reasonable, and approximately what
I'd expect.
Running fio to the array in buffered mode also gives varied
performance according to fio, but iostat shows write rates to the
array at around 2GB/s, barely better than a single drive.
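
A sketch of these two kinds of runs against the array (not the actual
jobs; device path, block size, queue depth, and sizes are assumptions):

  # Direct I/O to the raw MD device (the case that gets ~12GB/s here):
  fio --name=md-direct --filename=/dev/md0 --rw=write --bs=1M --size=20G \
      --ioengine=libaio --iodepth=32 --direct=1

  # The same workload through the page cache (buffered), for comparison:
  fio --name=md-buffered --filename=/dev/md0 --rw=write --bs=1M --size=20G \
      --ioengine=psync --direct=0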

If I put a filesystem (ext4, for example, though I've also tried
others...) on top of the array and run fio with multiple files and
multiple threads, I get slightly better performance in buffered mode,
but nowhere near the 12-14GB/s I'm looking for.  Playing with CPU
affinity helps a little too, but still nowhere near what I need.
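
A sketch of such a multi-file, multi-job buffered run against a
filesystem on the array (mount point, sizes, job count, and CPU list
are assumptions):

  # Eight buffered writer jobs, one file each, restricted to CPUs 0-7:
  fio --name=fs-buffered --directory=/mnt/md0 --rw=write --bs=1M --size=8G \
      --numjobs=8 --ioengine=psync --direct=0 --cpus_allowed=0-7 \
      --group_reporting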

Running fdt or GridFTP or other actual applications, I am able to get
no better than around 2GB/s, which again is around the speed of a
single drive.

Thanks,
Kevin

# mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Wed Nov 22 14:57:35 2017
        Raid Level : raid0
        Array Size : 12501458944 (11922.32 GiB 12801.49 GB)
      Raid Devices : 8
     Total Devices : 8
       Persistence : Superblock is persistent

       Update Time : Wed Nov 22 14:57:35 2017
             State : clean
    Active Devices : 8
   Working Devices : 8
    Failed Devices : 0
     Spare Devices : 0

        Chunk Size : 512K

Consistency Policy : none

              Name : XXX
              UUID : 2a2234a4:78d2bbb2:9e1b3031:022b3315
            Events : 0

    Number   Major   Minor   RaidDevice State
       0     259        2        0      active sync   /dev/nvme0n1
       1     259        7        1      active sync   /dev/nvme1n1
       2     259        5        2      active sync   /dev/nvme2n1
       3     259        1        3      active sync   /dev/nvme3n1
       4     259        4        4      active sync   /dev/nvme4n1
       5     259        3        5      active sync   /dev/nvme5n1
       6     259        0        6      active sync   /dev/nvme6n1
       7     259        6        7      active sync   /dev/nvme7n1
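
The exact mdadm command used to build this array isn't shown here; a
creation command consistent with the output above would look roughly
like this:

  mdadm --create /dev/md0 --level=0 --raid-devices=8 --chunk=512K \
      /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 \
      /dev/nvme4n1 /dev/nvme5n1 /dev/nvme6n1 /dev/nvme7n1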



On Wed, Nov 22, 2017 at 2:20 PM, Joshua Mora <joshua_mora@usa.net> wrote:
> Hi Kevin.
> I did, and they are great.
> I get to max them out for both reads and writes.
> I have used the 1.6TB ones (so ~3.4GB/s seq read and ~2.2GB/s seq write
> with 128k record length).  You don't need a large iodepth.
> For instance, I tested RAID 10 with 4 drives and tested surprise removal
> while I was doing writes.
> I'm using an AMD EPYC-based platform, leveraging the many PCIe lanes it
> has.
> You want to use 1 core for every 2 NVMe drives to max them out for large
> record lengths.
> You will need more cores for 4k record lengths.
>
> Joshua
>
>
> [quoted original message trimmed; see the original post above]


* NVMe array support
  2017-11-22 17:56 NVMe array support Kevin M. Hildebrand
       [not found] ` <329VkVTTB6304S04.1511378401@web04.cms.usa.net>
@ 2017-11-23  8:10 ` Christoph Hellwig
  2017-11-27 13:44   ` Kevin M. Hildebrand
  1 sibling, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2017-11-23  8:10 UTC (permalink / raw)


On Wed, Nov 22, 2017 at 12:56:39PM -0500, Kevin M. Hildebrand wrote:
> I've got eight Samsung PM1725a NVMe drives that I'm trying to combine
> into an array so I can aggregate the performance of multiple drives.
> My initial experiments have yielded abysmal performance in most
> cases.  I've tried creating RAID 0 arrays with MD RAID, ZFS, and

If you want to use non-GPL-licensed out-of-tree modules you are on your
own; please don't waste our time.


* NVMe array support
       [not found]     ` <040VkwDHn2448S07.1511406519@web07.cms.usa.net>
@ 2017-11-27 13:41       ` Kevin M. Hildebrand
  0 siblings, 0 replies; 5+ messages in thread
From: Kevin M. Hildebrand @ 2017-11-27 13:41 UTC (permalink / raw)


I have indeed looked at the IRQ affinity.  At the moment, without
doing anything special, and even with irqbalance running, it appears
that IRQs are well spread across all of the CPU cores.
I've checked on both of my test boxes, one running kernel 4.14.1 and
the other running 3.10.0-693.5.2 (both Red Hat 7.4 systems).
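
A quick way to sanity-check that spread (a sketch; N is a placeholder
for an actual vector number):

  # One line per NVMe interrupt vector, with per-CPU counts in the columns:
  grep nvme /proc/interrupts

  # For a given vector number N, list the CPUs it is allowed to fire on:
  cat /proc/irq/N/smp_affinity_list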

As I originally mentioned, I am able to get good performance with
multiple fio jobs running in direct mode, but that's only good for
benchmarks.  I'm looking for others who are able to get good
real-world performance out of their arrays using buffered mode.
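
One way to run that kind of test is one fio process per drive, each
pinned to its own core; a sketch only (device names, sizes, and CPU
numbers are assumptions):

  # One fio process per drive, each pinned to a different core, in parallel:
  fio --name=nvme0 --filename=/dev/nvme0n1 --cpus_allowed=0 --rw=write \
      --bs=1M --size=10G --ioengine=libaio --iodepth=32 --direct=1 &
  fio --name=nvme1 --filename=/dev/nvme1n1 --cpus_allowed=1 --rw=write \
      --bs=1M --size=10G --ioengine=libaio --iodepth=32 --direct=1 &
  # ...repeat for the remaining drives, then wait for all of them:
  wait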

Do you have a filesystem on your arrays (if so, which one?), and are
you able to get anywhere close to your measured performance when using
other applications?

Thanks!
Kevin



On Wed, Nov 22, 2017 at 10:08 PM, Joshua Mora <joshua_mora@usa.net> wrote:
> Did you play with IRQ affinity on the NVMe drives?
> By default the interrupts may all go to a single core.
> You have to spread them across several cores.
>
> I get 13.6GB/s read with 4 NVMe drives.
> I get 52GB/s read with 16 NVMe drives.
>
> I get 9M IOPS with 16 NVMe drives using kernel mode.
>
> I am assuming you are running multiple jobs, not a single one, and
> using cpus_allowed to pin each fio job to a different core.
>
> All the tests I do are direct, not buffered through memory.
>
> Joshua
>
> [earlier quoted messages trimmed; see the full exchange above]


* NVMe array support
  2017-11-23  8:10 ` Christoph Hellwig
@ 2017-11-27 13:44   ` Kevin M. Hildebrand
  0 siblings, 0 replies; 5+ messages in thread
From: Kevin M. Hildebrand @ 2017-11-27 13:44 UTC (permalink / raw)


Thanks for your comment.  I mentioned ZFS only as a possible test
case, to try to determine whether it was MD RAID that was the
performance bottleneck or something else.  I'm not otherwise planning
to use ZFS on these arrays.

Kevin

On Thu, Nov 23, 2017 at 3:10 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Wed, Nov 22, 2017 at 12:56:39PM -0500, Kevin M. Hildebrand wrote:
>> I've got eight Samsung PM1725a NVMe drives that I'm trying to combine
>> into an array so I can aggregate the performance of multiple drives.
>> My initial experiments have yielded abysmal performance in most
>> cases.  I've tried creating RAID 0 arrays with MD RAID, ZFS, and
>
> If you want to use non-GPL-licensed out-of-tree modules you are on your
> own; please don't waste our time.

