* bad IOPS when running multiple btest/fio in parallel
@ 2018-10-12 4:44 Yao Lin
2018-10-12 14:39 ` Keith Busch
2018-10-12 15:49 ` Bart Van Assche
0 siblings, 2 replies; 6+ messages in thread
From: Yao Lin @ 2018-10-12 4:44 UTC (permalink / raw)
Today I changed to a much simpler setup and the same issue persists.

I directly connected two PCs (identical hardware) with a pair of 100G RDMA NICs. I created a null block device on the target PC and configured it as the NVMeOF target, so there is no switch or SSD in this setup. This is also a single fio instance, not the four parallel fio instances I mentioned earlier.

Running the fio test against that null block device from the host, the best IOPS I can reach is 1550K. That is the best result after trying many different queue depths, job counts, and CPU affinity settings. Running the same fio test locally on the target, I get 2250K IOPS (jumping to 3650K when I increase the number of threads).

So it seems to me that the Linux NVMe stack is quite good and can support 100+ Gb/s of throughput, but the same cannot be said of the NVMeOF stack. Is any tuning possible?
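As a sanity check (assuming fio is issuing 4 KiB I/Os, the common default block size; the thread does not state the block size), the IOPS numbers above convert to wire throughput like this:

```shell
# bits/s = IOPS * 4096 bytes * 8 bits; divide by 1e9 for Gb/s.
awk 'BEGIN { printf "%.1f\n", 1550000 * 4096 * 8 / 1e9 }'   # host over NVMeOF -> 50.8
awk 'BEGIN { printf "%.1f\n", 3650000 * 4096 * 8 / 1e9 }'   # local on target  -> 119.6
```

At 4 KiB, the fabric path would sit near 50 Gb/s while the local path exceeds 100 Gb/s, which is consistent with the gap described above.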
* bad IOPS when running multiple btest/fio in parallel
2018-10-12 4:44 bad IOPS when running multiple btest/fio in parallel Yao Lin
@ 2018-10-12 14:39 ` Keith Busch
2018-10-12 15:37 ` [EXT] " Yao Lin
2018-10-12 15:49 ` Bart Van Assche
1 sibling, 1 reply; 6+ messages in thread
From: Keith Busch @ 2018-10-12 14:39 UTC (permalink / raw)
On Fri, Oct 12, 2018@04:44:22AM +0000, Yao Lin wrote:
> Today I changed to a much simpler setup and the same issue persists.
>
> I directly connected two PCs (identical hardware) with a pair of 100G RDMA NICs. I created a null block device on the target PC and configured it as the NVMeOF target, so there is no switch or SSD in this setup. This is also a single fio instance, not the four parallel fio instances I mentioned earlier.
>
> Running the fio test against that null block device from the host, the best IOPS I can reach is 1550K. That is the best result after trying many different queue depths, job counts, and CPU affinity settings. Running the same fio test locally on the target, I get 2250K IOPS (jumping to 3650K when I increase the number of threads).
>
> So it seems to me that the Linux NVMe stack is quite good and can support 100+ Gb/s of throughput, but the same cannot be said of the NVMeOF stack. Is any tuning possible?
Are you sure it's the software stack? You need to check your CPU
utilization to see whether that could be the limiting factor.
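One way to check this (assuming the sysstat package is installed) is to watch per-CPU utilization while the fio run is in flight; a single saturated core can cap IOPS even when the average load looks low:

```shell
# Per-CPU utilization, 1-second samples for 10 seconds. Look for any
# single core pinned near 100% (often the one taking NIC interrupts),
# rather than trusting the all-CPU average.
mpstat -P ALL 1 10
```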
* [EXT] Re: bad IOPS when running multiple btest/fio in parallel
2018-10-12 14:39 ` Keith Busch
@ 2018-10-12 15:37 ` Yao Lin
0 siblings, 0 replies; 6+ messages in thread
From: Yao Lin @ 2018-10-12 15:37 UTC (permalink / raw)
I monitored the CPU usage during all these tests. I have a powerful CPU (an i9-7940X), and none of its cores ever reaches 80% load.
-----Original Message-----
From: Keith Busch [mailto:keith.busch@intel.com]
Sent: Friday, October 12, 2018 7:39 AM
To: Yao Lin <yaolin at marvell.com>
Cc: linux-nvme at lists.infradead.org
Subject: [EXT] Re: bad IOPS when running multiple btest/fio in parallel
On Fri, Oct 12, 2018@04:44:22AM +0000, Yao Lin wrote:
> Today I changed to a much simpler setup and the same issue persists.
>
> I directly connected two PCs (identical hardware) with a pair of 100G RDMA NICs. I created a null block device on the target PC and configured it as the NVMeOF target, so there is no switch or SSD in this setup. This is also a single fio instance, not the four parallel fio instances I mentioned earlier.
>
> Running the fio test against that null block device from the host, the best IOPS I can reach is 1550K. That is the best result after trying many different queue depths, job counts, and CPU affinity settings. Running the same fio test locally on the target, I get 2250K IOPS (jumping to 3650K when I increase the number of threads).
>
> So it seems to me that the Linux NVMe stack is quite good and can support 100+ Gb/s of throughput, but the same cannot be said of the NVMeOF stack. Is any tuning possible?
Are you sure it's the software stack? You need to check your CPU utilization to see whether that could be the limiting factor.
* bad IOPS when running multiple btest/fio in parallel
2018-10-12 4:44 bad IOPS when running multiple btest/fio in parallel Yao Lin
2018-10-12 14:39 ` Keith Busch
@ 2018-10-12 15:49 ` Bart Van Assche
2018-10-12 16:02 ` [EXT] " Yao Lin
1 sibling, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2018-10-12 15:49 UTC (permalink / raw)
On Fri, 2018-10-12@04:44 +0000, Yao Lin wrote:
> Today I changed to a much simpler setup and the same issue persists.
>
> I directly connected two PCs (identical hardware) with a pair of 100G RDMA NICs.
> I created a null block device on the target PC and configured it as the
> NVMeOF target, so there is no switch or SSD in this setup. This is also
> a single fio instance, not the four parallel fio instances I mentioned earlier.
>
> Running the fio test against that null block device from the host, the best
> IOPS I can reach is 1550K. That is the best result after trying many
> different queue depths, job counts, and CPU affinity settings. Running the
> same fio test locally on the target, I get 2250K IOPS (jumping to 3650K
> when I increase the number of threads).
>
> So it seems to me that the Linux NVMe stack is quite good and can support
> 100+ Gb/s of throughput, but the same cannot be said of the NVMeOF stack.
> Is any tuning possible?
Many high-speed network adapters need multiple connections between
initiator and target to achieve line rate (typically 2-4 connections).
From the NVMeOF initiator driver:
set->nr_hw_queues = nctrl->queue_count - 1;
I think the "queue_count" parameter can be configured when creating a
connection. From the drivers/nvme/host/fabrics.c source file:
static const match_table_t opt_tokens = {
[ ... ]
{ NVMF_OPT_NR_IO_QUEUES, "nr_io_queues=%d" },
[ ... ]
};
Have you tried modifying the nr_io_queues parameter? Have you verified
whether the 100G NICs you are using allocate multiple MSI-X vectors, and
whether each vector has been assigned to a different CPU?
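Both checks can be sketched roughly as follows (the NQN, address, and the "mlx5" IRQ-name pattern below are placeholders for illustration; the queue-count flag is spelled as nvme-cli accepts it):

```shell
# Connect with an explicit I/O queue count (placeholder NQN/address):
nvme connect -t rdma -a 10.0.0.2 -s 4420 -n testnqn --nr-io-queues=8

# Count the MSI-X vectors the NIC allocated (placeholder driver name):
grep -c mlx5 /proc/interrupts

# Show which CPU(s) each of those vectors is pinned to:
for irq in $(grep mlx5 /proc/interrupts | cut -d: -f1); do
    echo "IRQ $irq -> CPUs $(cat /proc/irq/$irq/smp_affinity_list)"
done
```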
Bart.
* [EXT] Re: bad IOPS when running multiple btest/fio in parallel
2018-10-12 15:49 ` Bart Van Assche
@ 2018-10-12 16:02 ` Yao Lin
2018-10-15 7:50 ` Sagi Grimberg
0 siblings, 1 reply; 6+ messages in thread
From: Yao Lin @ 2018-10-12 16:02 UTC (permalink / raw)
Thanks Bart. In my original post, I listed the performance of two different 100G NICs. I worked with the engineers for the NIC that performs better. Their driver does support a large number of IRQs, which are assigned to all 28 CPUs in a round-robin manner. But even with this design, that NIC can only hit 76 Gb/s for RoCEv2 traffic.

I haven't gotten a response from the other NIC vendor yet. Their RoCEv2 throughput has never exceeded 55 Gb/s. I will take a look at the source code.
-----Original Message-----
From: Bart Van Assche [mailto:bvanassche@acm.org]
Sent: Friday, October 12, 2018 8:49 AM
To: Yao Lin ; linux-nvme at lists.infradead.org
Subject: [EXT] Re: bad IOPS when running multiple btest/fio in parallel
On Fri, 2018-10-12@04:44 +0000, Yao Lin wrote:
> Today I changed to a much simpler setup and the same issue persists.
>
> I directly connected two PCs (identical hardware) with a pair of 100G RDMA NICs.
> I created a null block device on the target PC and configured it as the
> NVMeOF target, so there is no switch or SSD in this setup. This is also
> a single fio instance, not the four parallel fio instances I mentioned earlier.
>
> Running the fio test against that null block device from the host, the best
> IOPS I can reach is 1550K. That is the best result after trying many
> different queue depths, job counts, and CPU affinity settings. Running the
> same fio test locally on the target, I get 2250K IOPS (jumping to 3650K
> when I increase the number of threads).
>
> So it seems to me that the Linux NVMe stack is quite good and can support
> 100+ Gb/s of throughput, but the same cannot be said of the NVMeOF stack.
> Is any tuning possible?
Many high-speed network adapters need multiple connections between initiator and target to achieve line rate (typically 2-4 connections).
From the NVMeOF initiator driver:
set->nr_hw_queues = nctrl->queue_count - 1;
I think the "queue_count" parameter can be configured when creating a connection. From the drivers/nvme/host/fabrics.c source file:
static const match_table_t opt_tokens = {
[ ... ]
{ NVMF_OPT_NR_IO_QUEUES, "nr_io_queues=%d" },
[ ... ]
};
Have you tried modifying the nr_io_queues parameter? Have you verified whether the 100G NICs you are using allocate multiple MSI-X vectors, and whether each vector has been assigned to a different CPU?
Bart.
* [EXT] Re: bad IOPS when running multiple btest/fio in parallel
2018-10-12 16:02 ` [EXT] " Yao Lin
@ 2018-10-15 7:50 ` Sagi Grimberg
0 siblings, 0 replies; 6+ messages in thread
From: Sagi Grimberg @ 2018-10-15 7:50 UTC (permalink / raw)
> Thanks Bart. In my original post, I listed the performance of two different 100G NICs. I worked with the engineers for the NIC that performs better. Their driver does support a large number of IRQs, which are assigned to all 28 CPUs in a round-robin manner. But even with this design, that NIC can only hit 76 Gb/s for RoCEv2 traffic.
>
> I haven't gotten a response from the other NIC vendor yet. Their RoCEv2 throughput has never exceeded 55 Gb/s. I will take a look at the source code.
What kernel version are you running?

Do you happen to be running the irqbalance daemon?
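For reference, a rough sketch of how one might check for irqbalance and pin the NIC interrupts by hand (the IRQ range below is a placeholder; irqbalance can rewrite affinities behind your back, which often hurts RDMA latency and throughput):

```shell
# Is irqbalance running?
pgrep -a irqbalance

# If so, stop it so that manual affinity settings stick:
systemctl stop irqbalance

# Pin one NIC IRQ per CPU core (IRQs 50-77 are illustrative only):
cpu=0
for irq in $(seq 50 77); do
    echo "$cpu" > /proc/irq/$irq/smp_affinity_list
    cpu=$((cpu + 1))
done
```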
Thread overview: 6+ messages
2018-10-12 4:44 bad IOPS when running multiple btest/fio in parallel Yao Lin
2018-10-12 14:39 ` Keith Busch
2018-10-12 15:37 ` [EXT] " Yao Lin
2018-10-12 15:49 ` Bart Van Assche
2018-10-12 16:02 ` [EXT] " Yao Lin
2018-10-15 7:50 ` Sagi Grimberg