linux-nvme.lists.infradead.org archive mirror
* [PATCH v2 0/5] nvmet-rdma/srpt: SRQ per completion vector
@ 2020-03-18 15:02 Max Gurtovoy
From: Max Gurtovoy @ 2020-03-18 15:02 UTC
  To: linux-nvme, sagi, hch, loberman, bvanassche, linux-rdma
  Cc: rgirase, vladimirk, shlomin, leonro, dledford, jgg, oren, kbusch,
	Max Gurtovoy, idanb

This set is a renewed version of the feature for the NVMEoF/RDMA target. In
this series I've also implemented it for the SRP target, which had a
similar implementation (SRQ per HCA), following previous requests from the
community. The goal is to save resources (by sharing them) and to exploit
the locality of completions to get the best performance with Shared
Receive Queues (SRQs). We'll create an SRQ per completion vector (rather
than per device) using a new API (a basic SRQ pool, also added in this
patchset) and associate each created QP/CQ/channel with the appropriate
SRQ. This also reduces lock contention on the single per-device SRQ used
by today's solution.

For NVMEoF, my test environment included 4 initiators (CX5, CX5, CX4,
CX3) connected through a switch to the NVMEoF target (CX5), which exposed
4 subsystems (1 namespace per subsystem) through 2 ports. Each initiator
was connected to a unique subsystem backed by a different null_blk device.
I used the RoCE link layer. For SRP, I tested with 1 server using a RoCE
loopback connection (those results are not included below). Hopefully I'll
get a Tested-by signature and feedback from Laurence and Rupesh on the SRP
part during the review process.

The results below were collected a while ago using NVMEoF.

Configuration:
 - irqbalance stopped on each server
 - set_irq_affinity.sh run on each interface
 - 2 initiators run traffic through port 1
 - 2 initiators run traffic through port 2
 - register_always=N set on the initiators
 - fio with 12 jobs, iodepth 128

Memory consumption calculation for recv buffers (target):
 - Multiple SRQ: SRQ_size * comp_num * ib_devs_num * inline_buffer_size
 - Single SRQ: SRQ_size * 1 * ib_devs_num * inline_buffer_size
 - MQ: RQ_size * CPU_num * ctrl_num * inline_buffer_size

Cases:
 1. Multiple SRQ with 1024 entries:
    - Mem = 1024 * 24 * 2 * 4k = 192MiB (constant; does not depend on the number of initiators)
 2. Multiple SRQ with 256 entries:
    - Mem = 256 * 24 * 2 * 4k = 48MiB (constant; does not depend on the number of initiators)
 3. MQ:
    - Mem = 256 * 24 * 8 * 4k = 192MiB (grows with every newly created controller)
 4. Single SRQ (current SRQ implementation):
    - Mem = 4096 * 1 * 2 * 4k = 32MiB (constant; does not depend on the number of initiators)

Results (target CPU utilization in parentheses):

BS    Case 1 read    Case 2 read    Case 3 read    Case 4 read
----  -------------  -------------  -------------  -------------
1k    5.88M (80%)    5.45M (72%)    6.77M (91%)    2.2M (72%)
2k    3.56M (65%)    3.45M (59%)    3.72M (64%)    2.12M (59%)
4k    1.8M (33%)     1.87M (32%)    1.88M (32%)    1.59M (34%)

BS    Case 1 write   Case 2 write   Case 3 write   Case 4 write
----  -------------  -------------  -------------  -------------
1k    5.42M (63%)    5.14M (55%)    7.75M (82%)    2.14M (74%)
2k    4.15M (56%)    4.14M (51%)    4.16M (52%)    2.08M (73%)
4k    2.17M (28%)    2.17M (27%)    2.16M (28%)    1.62M (24%)


Case 2 shows a clear performance improvement over Case 4, with resource usage of the
same order. Compared with Case 3, Case 2 consumes fewer resources (memory and CPU) at
the cost of a small performance loss.
There is still an open question about the 1k performance difference between Case 1 and
Case 3, but I guess we can investigate and improve it incrementally.

Thanks to Idan Burstein and Oren Duer for suggesting this nice feature.

Changes from v1:
 - rename srq_set to srq_pool (Leon)
 - changed srpt to use ib_alloc_cq (patch 4/5)
 - removed caching of comp_vector in ib_cq
 - minor fixes from Leon's review

Max Gurtovoy (5):
  IB/core: add a simple SRQ pool per PD
  nvmet-rdma: add srq pointer to rdma_cmd
  nvmet-rdma: use SRQ per completion vector
  RDMA/srpt: use ib_alloc_cq instead of ib_alloc_cq_any
  RDMA/srpt: use SRQ per completion vector

 drivers/infiniband/core/Makefile      |   2 +-
 drivers/infiniband/core/srq_pool.c    |  75 +++++++++++++
 drivers/infiniband/core/verbs.c       |   3 +
 drivers/infiniband/ulp/srpt/ib_srpt.c | 187 +++++++++++++++++++++++--------
 drivers/infiniband/ulp/srpt/ib_srpt.h |  28 ++++-
 drivers/nvme/target/rdma.c            | 203 ++++++++++++++++++++++++++--------
 include/rdma/ib_verbs.h               |   4 +
 include/rdma/srq_pool.h               |  18 +++
 8 files changed, 419 insertions(+), 101 deletions(-)
 create mode 100644 drivers/infiniband/core/srq_pool.c
 create mode 100644 include/rdma/srq_pool.h

-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme



Thread overview: 28+ messages
2020-03-18 15:02 [PATCH v2 0/5] nvmet-rdma/srpt: SRQ per completion vector Max Gurtovoy
2020-03-18 15:02 ` [PATCH v2 1/5] IB/core: add a simple SRQ pool per PD Max Gurtovoy
2020-03-20  5:59   ` Sagi Grimberg
2020-03-20 13:21     ` Max Gurtovoy
2020-03-20 14:27     ` Leon Romanovsky
2020-03-18 15:02 ` [PATCH v2 2/5] nvmet-rdma: add srq pointer to rdma_cmd Max Gurtovoy
2020-03-18 23:32   ` Jason Gunthorpe
2020-03-19  8:48     ` Max Gurtovoy
2020-03-19  9:14       ` Leon Romanovsky
2020-03-19 10:55         ` Max Gurtovoy
2020-03-19 11:54       ` Jason Gunthorpe
2020-03-19 14:08         ` Konstantin Ryabitsev
2020-03-19 21:58         ` Konstantin Ryabitsev
2020-03-19  4:05   ` Bart Van Assche
2020-03-18 15:02 ` [PATCH v2 3/5] nvmet-rdma: use SRQ per completion vector Max Gurtovoy
2020-03-19  4:09   ` Bart Van Assche
2020-03-19  9:15     ` Max Gurtovoy
2020-03-19 11:56       ` Jason Gunthorpe
2020-03-19 12:48         ` Max Gurtovoy
2020-03-19 13:53           ` Jason Gunthorpe
2020-03-19 14:49             ` Bart Van Assche
     [not found]               ` <50dd8f5d-d092-54bc-236d-1e702fb95240@mellanox.com>
     [not found]                 ` <6e3cc1c4-b24e-f607-42b3-5b83dd8c312c@mellanox.com>
2020-03-19 16:27                   ` Max Gurtovoy
2020-03-20  5:47   ` Sagi Grimberg
2020-03-18 15:02 ` [PATCH v2 4/5] RDMA/srpt: use ib_alloc_cq instead of ib_alloc_cq_any Max Gurtovoy
2020-03-19  4:15   ` Bart Van Assche
2020-03-18 15:02 ` [PATCH v2 5/5] RDMA/srpt: use SRQ per completion vector Max Gurtovoy
2020-03-19  4:20   ` Bart Van Assche
2020-03-19  4:02 ` [PATCH v2 0/5] nvmet-rdma/srpt: " Bart Van Assche
