* NVMf (NVME over fabrics) Performance
@ 2016-09-19  5:33 Kirubakaran Kaliannan
From: Kirubakaran Kaliannan @ 2016-09-19  5:33 UTC (permalink / raw)


Hi All,

I am working on measuring NVMf performance numbers (with Mellanox
ConnectX-3 Pro (40Gbps) NICs and Intel P3600 NVMe drives) on my two
servers, each with 32 CPUs and 64GB RAM.
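
For context, the initiator side is connected over RDMA with nvme-cli,
roughly like this (just a sketch; the address and NQN below are
placeholders, not my actual values):

  # discover the target's subsystems over RDMA (placeholder address/port)
  nvme discover -t rdma -a 10.0.0.2 -s 4420
  # connect to the exported subsystem (placeholder NQN)
  nvme connect -t rdma -a 10.0.0.2 -s 4420 -n testnqn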

These are the read IOPS numbers I am getting for 4K I/Os:

1 null block device using NVMf = 600K
2 null block devices using NVMf = 600K (not growing linearly per device)

1 Intel NVMe device through NVMf = 450K
2 Intel NVMe devices through NVMf = 470K (there is no increase in IOPS
beyond 500K by adding more devices)

I installed a 4.7-rc2 based kernel
(git://git.infradead.org/nvme-fabrics.git).
CPU/RAM is not the bottleneck.
The Mellanox card is 40Gbps; 600K IOPS of 4K I/O uses only ~2400 MB/s
(600K x 4KB), so it still has ~2000 MB/s of bandwidth headroom.

With local NVMe (on the same server) I do see a linear increase in
performance as I add more devices.
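
For reference, this is roughly how I set up the null block devices and run
the 4K random-read measurement (a sketch; device names, queue counts and
job counts are illustrative, not my exact command lines):

  # target side: two null block devices in multi-queue mode
  modprobe null_blk nr_devices=2 queue_mode=2 submit_queues=16

  # initiator side: 4K random reads against the NVMf-attached device
  fio --name=4kread --filename=/dev/nvme0n1 --rw=randread --bs=4k \
      --ioengine=libaio --direct=1 --iodepth=32 --numjobs=16 \
      --time_based --runtime=60 --group_reporting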

Questions:

Can you please share any available NVMf performance numbers?
Is there any configuration required to make performance scale linearly as
more devices are added?
I am looking for your direction/suggestions on achieving the maximum NVMf
performance numbers.

Thanks,
-kiru


* NVMf (NVME over fabrics) Performance
@ 2016-09-21 18:17 ` Sagi Grimberg
From: Sagi Grimberg @ 2016-09-21 18:17 UTC (permalink / raw)


> Hi All,

Hey Kiru,

> I am working on measuring the NVMf (with Mellanox ConnectX-3 pro(40Gps)
> and Intel P3600) performance numbers on my 2 servers with 32 CPU each
> (64GB RAM).
>
> These are the numbers I am getting from IOPS perspective for Read, and for
> 4K size I/Os
>
> 1 NULL block-devices using NVMf = 600K
> 2 NULL block-device using NVMf = 600k (not growing linearly per device)

Can you check whether either side is CPU-bound (it shouldn't be)?

Are all cores active in the target system?

Is irqbalance running?

Do you have the register_always modparam turned on in nvme-rdma? Can you
try without it?
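
(For reference, something like this should show whether irqbalance is
active and what nvme-rdma is currently using; just a sketch, assuming
nvme-rdma is loaded as a module:)

  # is irqbalance running?
  systemctl status irqbalance    # or: pgrep -a irqbalance
  # current nvme-rdma registration setting
  cat /sys/module/nvme_rdma/parameters/register_always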

> 1 Intel NVME device through NVMf= 450K
> 2 Intel NVME device through NVMf = 470K (There is no increase in IOPs
> beyond 500K by adding more devices)

Can you try the latest code in 4.8-rc7?

As a second experiment, can you try with this patch applied (submitted to
linux-rdma recently)?
---
  drivers/infiniband/core/device.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 760ef603a468..15f4bdf89fe1 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -999,8 +999,7 @@ static int __init ib_core_init(void)
  		return -ENOMEM;

  	ib_comp_wq = alloc_workqueue("ib-comp-wq",
-			WQ_UNBOUND | WQ_HIGHPRI | WQ_MEM_RECLAIM,
-			WQ_UNBOUND_MAX_ACTIVE);
+			WQ_HIGHPRI | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
  	if (!ib_comp_wq) {
  		ret = -ENOMEM;
  		goto err;
-- 


* NVMf (NVME over fabrics) Performance
@ 2016-09-22  8:53   ` Kirubakaran Kaliannan
From: Kirubakaran Kaliannan @ 2016-09-22  8:53 UTC (permalink / raw)


Hi Sagi,

Thanks for the suggestions; here is what I have (re)tried:

1. Neither target nor initiator is CPU-bound (CPU is at least 70% idle,
and load is distributed across 16 CPUs).
2. Yes, irqbalance is running.
3. register_always was already true; I have upgraded to 4.8-rc7.
4. Applied the fix that you suggested.

Still, the max IOPS (for the null device) that I get is 600K with NVMf!

Thanks
-kiru


* NVMf (NVME over fabrics) Performance
@ 2016-09-23 21:10     ` Sagi Grimberg
From: Sagi Grimberg @ 2016-09-23 21:10 UTC (permalink / raw)



> Hi Sagi,

Hey Kiru,

> Thanks for the suggestion, here is what I have (re)tried
>
> 1. Both target and initiator is not blocked by CPU (CPU is atleast 70% idle,
> and load is distributed across 16 CPU's)

The distribution makes sense as we're multi-queue.

> 2. Yes, irqbalancer is running.

I'd advise turning it off when testing performance. I've never
really seen irqbalance actually help anything...

> 3. register_always is true already, I have upgraded to 4.8.rc7.

OK, this is a bit tricky, but having it on will usually hurt your
4K read performance. It's the correct thing to do, but with it off you
should be able to get better performance.

Some background:
nvme-rdma (like iser, srp and others) can optimize 4K reads (or reads
that fit in a single page) by skipping memory registration and sending
a global rkey, which is good for performance but exposes host memory
to the target (which can abuse it if buggy/malicious).

Since you mentioned you are using ConnectX-3 devices, it makes sense
that it really slows things down, because ConnectX-3 devices have a severe
fencing strategy for memory registrations. There are some devices
that have better performance even with small registrations on...

So, I suggest using register_always=N when testing small 4k reads.
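
(Roughly: disconnect, reload nvme-rdma with the parameter off, and
reconnect; a sketch only, the address and NQN are placeholders and your
setup may load the module differently:)

  nvme disconnect -n testnqn
  modprobe -r nvme_rdma
  modprobe nvme_rdma register_always=N
  nvme connect -t rdma -a 10.0.0.2 -s 4420 -n testnqn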

Do you see the same with 4k writes btw?

