[Bug Report] Limitation on number of QPs for single process observed
From: Sun, Mingbao @ 2020-07-24 11:04 UTC
  To: linux-rdma; +Cc: Sun, Ao, Gan, Ping, Zhang, Libin, Cai, Yanxiu


Hi,

Information of the Systems:

HOST lehi-dirt (server side)

lehi-dirt:~ # cat /etc/os-release 
NAME="SLES"
VERSION="12-SP4"
VERSION_ID="12.4"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP4"
ID="sles"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:12:sp4"

lehi-dirt:~ # uname -r
4.12.14-95.48-default

lehi-dirt:~ # ibv_devinfo
hca_id: mlx5_bond_0
        transport:                      InfiniBand (0)
        fw_ver:                         14.26.6000
        node_guid:                      506b:4b03:00b1:7cae
        sys_image_guid:                 506b:4b03:00b1:7cae
        vendor_id:                      0x02c9
        vendor_part_id:                 4117
        hw_ver:                         0x0
        board_id:                       DEL2810000034
        phys_port_cnt:                  1
        Device ports:
                port:   1
                        state:          PORT_ACTIVE (4)
                        max_mtu:        4096 (5)
                        active_mtu:     4096 (5)
                        sm_lid:         0
                        port_lid:       0
                        port_lmc:       0x00
                        link_layer:     Ethernet

HOST murray-dirt (client side)

murray-dirt:~ # cat /etc/os-release 
NAME="SLES"
VERSION="12-SP4"
VERSION_ID="12.4"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP4"
ID="sles"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:12:sp4"

murray-dirt:~ # uname -r
4.12.14-95.48-default

murray-dirt:~ # ibv_devinfo
hca_id: mlx5_bond_0
        transport:                      InfiniBand (0)
        fw_ver:                         14.26.6000
        node_guid:                      506b:4b03:00b1:7ca6
        sys_image_guid:                 506b:4b03:00b1:7ca6
        vendor_id:                      0x02c9
        vendor_part_id:                 4117
        hw_ver:                         0x0
        board_id:                       DEL2810000034
        phys_port_cnt:                  1
        Device ports:
                port:   1
                        state:          PORT_ACTIVE (4)
                        max_mtu:        4096 (5)
                        active_mtu:     4096 (5)
                        sm_lid:         0
                        port_lid:       0
                        port_lmc:       0x00
                        link_layer:     Ethernet

Way to reproduce the bug:

              Use a single user-space process as the RDMA client to create more than 339 QPs (through the rdma_connect API from librdmacm) to a given RDMA server.
              Only 339 QPs can be created: during creation of the 340th QP, rdma_create_ep fails at the client side with "Cannot allocate memory". A minimal sketch of the client loop follows.
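In essence, the client loop does the following (a minimal sketch against the standard librdmacm API, not the actual ib_perf.exe source; the address, port, QP attributes, and file name are placeholders):

/* qp_loop.c - minimal sketch of the client-side reproduction loop.
 * Build: gcc qp_loop.c -o qp_loop -lrdmacm -libverbs */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <rdma/rdma_cma.h>

#define MAX_QPS 1024

int main(void)
{
    struct rdma_addrinfo hints, *res;
    struct ibv_qp_init_attr attr;
    static struct rdma_cm_id *ids[MAX_QPS];
    int i;

    memset(&hints, 0, sizeof(hints));
    hints.ai_port_space = RDMA_PS_TCP;
    if (rdma_getaddrinfo("192.168.219.7", "10001", &hints, &res))
        return 1;

    for (i = 0; i < MAX_QPS; i++) {
        memset(&attr, 0, sizeof(attr));
        attr.cap.max_send_wr = attr.cap.max_recv_wr = 1;
        attr.cap.max_send_sge = attr.cap.max_recv_sge = 1;
        attr.qp_type = IBV_QPT_RC;

        /* rdma_create_ep() allocates a new cm_id plus its QP (and
         * CQs, since none are supplied); this is the call that
         * returns ENOMEM at the 340th iteration. */
        if (rdma_create_ep(&ids[i], res, NULL, &attr)) {
            fprintf(stderr, "qp [%d] rdma_create_ep: %s\n",
                    i, strerror(errno));
            break;
        }
        if (rdma_connect(ids[i], NULL)) {
            fprintf(stderr, "qp [%d] rdma_connect: %s\n",
                    i, strerror(errno));
            break;
        }
        printf("qp [%d] created.\n", i);
    }
    rdma_freeaddrinfo(res);
    return 0;
}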



              Following are some of the logs generated by our test tool ib_perf.exe:

(1) 1 server and 1 client: the client cannot create the 340th QP; rdma_create_ep fails with "Cannot allocate memory".


lehi-dirt:/home/admin/NVMe_OF_test # ib_perf.exe --server-ip 192.168.219.7 --server-port 10001 -s --qp-num 1024
qp [0] local 192.168.219.7:10001 peer 192.168.219.8:42196 created.
qp [1] local 192.168.219.7:10001 peer 192.168.219.8:50411 created.
qp [2] local 192.168.219.7:10001 peer 192.168.219.8:44152 created.
......
qp [337] local 192.168.219.7:10001 peer 192.168.219.8:46325 created.
qp [338] local 192.168.219.7:10001 peer 192.168.219.8:60163 created.


murray-dirt:/home/admin # ib_perf.exe --server-ip 192.168.219.7 --server-port 10001 -c --qp-num 1024
qp [0] local 192.168.219.8:42196 peer 192.168.219.7:10001 created.
qp [1] local 192.168.219.8:50411 peer 192.168.219.7:10001 created.
qp [2] local 192.168.219.8:44152 peer 192.168.219.7:10001 created.
......
qp [337] local 192.168.219.8:46325 peer 192.168.219.7:10001 created.
qp [338] local 192.168.219.8:60163 peer 192.168.219.7:10001 created.
ERR_DBG:/mnt/linux-dev-framework-master/apps/ib_perf/perf_frmwk.c(599)-create_connections_client:
rdma_create_ep failed: Cannot allocate memory



(2) 1 server and 2 clients: the server cannot create the 340th QP; rdma_get_request fails with "Cannot allocate memory".
At the client side, rdma_create_ep succeeds for the 340th QP, but the subsequent rdma_connect fails with "Connection refused" (see the second client log below).

lehi-dirt:~ # ib_perf.exe --server-ip 192.168.219.7 --server-port 10001 -s --qp-num 1024
qp [0] local 192.168.219.7:10001 peer 192.168.219.8:37360 created.
qp [1] local 192.168.219.7:10001 peer 192.168.219.8:35951 created.
......
qp [337] local 192.168.219.7:10001 peer 192.168.219.8:50314 created.
qp [338] local 192.168.219.7:10001 peer 192.168.219.8:42648 created.
ERR_DBG:/mnt/linux-dev-framework-master/apps/ib_perf/perf_frmwk.c(515)-create_connections_server:
rdma_get_request: Cannot allocate memory

murray-dirt:/home/admin # ib_perf.exe --server-ip 192.168.219.7 --server-port 10001 -c --qp-num 200
qp [0] local 192.168.219.8:37360 peer 192.168.219.7:10001 created.
qp [1] local 192.168.219.8:35951 peer 192.168.219.7:10001 created.
......
qp [198] local 192.168.219.8:59660 peer 192.168.219.7:10001 created.
qp [199] local 192.168.219.8:48077 peer 192.168.219.7:10001 created.
200 connection(s) created in total

murray-dirt:/home/admin # ib_perf.exe --server-ip 192.168.219.7 --server-port 10001 -c --qp-num 200
qp [0] local 192.168.219.8:45772 peer 192.168.219.7:10001 created.
qp [1] local 192.168.219.8:58067 peer 192.168.219.7:10001 created.
......
qp [137] local 192.168.219.8:50314 peer 192.168.219.7:10001 created.
qp [138] local 192.168.219.8:42648 peer 192.168.219.7:10001 created.
ERR_DBG:/mnt/linux-dev-framework-master/apps/ib_perf/perf_frmwk.c(630)-create_connections_client:
rdma_connect: Connection refused
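
The server-side accept path that fails in this scenario looks roughly like the following (again a hedged sketch of the standard librdmacm calls, not the actual perf_frmwk.c; address and port are placeholders):

/* qp_server.c - minimal sketch of the server-side accept loop.
 * Build: gcc qp_server.c -o qp_server -lrdmacm -libverbs */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <rdma/rdma_cma.h>

int main(void)
{
    struct rdma_addrinfo hints, *res;
    struct ibv_qp_init_attr attr;
    struct rdma_cm_id *listen_id, *id;
    int i = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_flags = RAI_PASSIVE;
    hints.ai_port_space = RDMA_PS_TCP;
    if (rdma_getaddrinfo("192.168.219.7", "10001", &hints, &res))
        return 1;

    memset(&attr, 0, sizeof(attr));
    attr.cap.max_send_wr = attr.cap.max_recv_wr = 1;
    attr.cap.max_send_sge = attr.cap.max_recv_sge = 1;
    attr.qp_type = IBV_QPT_RC;

    /* On the passive side, rdma_create_ep() only sets up the
     * listening id; per-connection QPs are allocated later. */
    if (rdma_create_ep(&listen_id, res, NULL, &attr))
        return 1;
    rdma_freeaddrinfo(res);
    if (rdma_listen(listen_id, 1024))
        return 1;

    for (;;) {
        /* rdma_get_request() returns a new cm_id with a QP already
         * created from the stored attributes; this is where ENOMEM
         * surfaces at the 340th QP. */
        if (rdma_get_request(listen_id, &id)) {
            fprintf(stderr, "rdma_get_request: %s\n", strerror(errno));
            break;
        }
        if (rdma_accept(id, NULL)) {
            fprintf(stderr, "rdma_accept: %s\n", strerror(errno));
            break;
        }
        printf("qp [%d] created.\n", i++);
    }
    return 0;
}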



(3) The NVMe-oF target runs as the server and two ib_perf.exe instances run as clients (each creating 200 QPs): all 400 QPs are created successfully.

murray-dirt:/home/admin # ib_perf.exe --server-ip 169.254.85.7 --server-port 4420 -c --qp-num 200
qp [0] local 169.254.85.8:53907 peer 169.254.85.7:4420 created.
qp [1] local 169.254.85.8:57988 peer 169.254.85.7:4420 created.
......
qp [198] local 169.254.85.8:58852 peer 169.254.85.7:4420 created.
qp [199] local 169.254.85.8:33436 peer 169.254.85.7:4420 created.
200 connection(s) created in total



murray-dirt:/home/admin # ib_perf.exe --server-ip 169.254.85.7 --server-port 4420 -c --qp-num 200
qp [0] local 169.254.85.8:50105 peer 169.254.85.7:4420 created.
qp [1] local 169.254.85.8:44136 peer 169.254.85.7:4420 created.
......
qp [198] local 169.254.85.8:53581 peer 169.254.85.7:4420 created.
qp [199] local 169.254.85.8:50082 peer 169.254.85.7:4420 created.
200 connection(s) created in total
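
For anyone investigating the limit, the device's advertised verbs capabilities can be read back with ibv_query_device() to check whether 339 corresponds to any hardware cap (an illustrative diagnostic sketch, not part of the tests above):

/* query_caps.c - print the device's advertised QP/CQ limits.
 * Build: gcc query_caps.c -o query_caps -libverbs */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num;
    struct ibv_device **list = ibv_get_device_list(&num);
    struct ibv_device_attr attr;

    if (!list || num == 0)
        return 1;
    struct ibv_context *ctx = ibv_open_device(list[0]);
    if (!ctx || ibv_query_device(ctx, &attr))
        return 1;
    printf("%s: max_qp=%d max_cq=%d max_qp_wr=%d\n",
           ibv_get_device_name(list[0]),
           attr.max_qp, attr.max_cq, attr.max_qp_wr);
    ibv_close_device(ctx);
    ibv_free_device_list(list);
    return 0;
}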



Thanks,
Tyler
