* [Bug Report] Limitation on number of QPs for single process observed
From: Sun, Mingbao @ 2020-07-24 11:04 UTC
To: linux-rdma; +Cc: Sun, Ao, Gan, Ping, Zhang, Libin, Cai, Yanxiu
Hi,
System information:
HOST lehi-dirt (server side)
lehi-dirt:~ # cat /etc/os-release
NAME="SLES"
VERSION="12-SP4"
VERSION_ID="12.4"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP4"
ID="sles"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:12:sp4"
lehi-dirt:~ # uname -r
4.12.14-95.48-default
lehi-dirt:~ # ibv_devinfo
hca_id: mlx5_bond_0
        transport:        InfiniBand (0)
        fw_ver:           14.26.6000
        node_guid:        506b:4b03:00b1:7cae
        sys_image_guid:   506b:4b03:00b1:7cae
        vendor_id:        0x02c9
        vendor_part_id:   4117
        hw_ver:           0x0
        board_id:         DEL2810000034
        phys_port_cnt:    1
        Device ports:
                port:     1
                        state:        PORT_ACTIVE (4)
                        max_mtu:      4096 (5)
                        active_mtu:   4096 (5)
                        sm_lid:       0
                        port_lid:     0
                        port_lmc:     0x00
                        link_layer:   Ethernet
HOST murray-dirt (client side)
murray-dirt:~ # cat /etc/os-release
NAME="SLES"
VERSION="12-SP4"
VERSION_ID="12.4"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP4"
ID="sles"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:12:sp4"
murray-dirt:~ # uname -r
4.12.14-95.48-default
murray-dirt:~ # ibv_devinfo
hca_id: mlx5_bond_0
        transport:        InfiniBand (0)
        fw_ver:           14.26.6000
        node_guid:        506b:4b03:00b1:7ca6
        sys_image_guid:   506b:4b03:00b1:7ca6
        vendor_id:        0x02c9
        vendor_part_id:   4117
        hw_ver:           0x0
        board_id:         DEL2810000034
        phys_port_cnt:    1
        Device ports:
                port:     1
                        state:        PORT_ACTIVE (4)
                        max_mtu:      4096 (5)
                        active_mtu:   4096 (5)
                        sm_lid:       0
                        port_lid:     0
                        port_lmc:     0x00
                        link_layer:   Ethernet
Steps to reproduce the bug:
Use a single user-space process as the RDMA client to create more than 339 QPs (through the librdmacm APIs rdma_create_ep and rdma_connect) to a given RDMA server.
The problem we found is that only 339 QPs can be created: during creation of the 340th QP, rdma_create_ep fails with "Cannot allocate memory" on the client side.
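For reference, the client side follows roughly the pattern below. This is a minimal, hypothetical sketch (not the actual ib_perf.exe source); the server IP and port are taken from the logs, and the QP capabilities are placeholder values. It should build with something like: gcc repro_client.c -lrdmacm -libverbs

/* repro_client.c - hypothetical minimal reproducer for the client side.
 * Creates N QPs to one server synchronously via librdmacm. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <rdma/rdma_cma.h>

int main(void)
{
        int n = 1024, i;
        struct rdma_addrinfo hints, *res;
        struct ibv_qp_init_attr attr;
        struct rdma_cm_id **ids = calloc(n, sizeof(*ids));

        memset(&hints, 0, sizeof(hints));
        hints.ai_port_space = RDMA_PS_TCP;
        if (rdma_getaddrinfo("192.168.219.7", "10001", &hints, &res)) {
                perror("rdma_getaddrinfo");
                return 1;
        }

        memset(&attr, 0, sizeof(attr));
        attr.cap.max_send_wr = attr.cap.max_recv_wr = 1;    /* placeholder caps */
        attr.cap.max_send_sge = attr.cap.max_recv_sge = 1;
        attr.qp_type = IBV_QPT_RC;
        attr.sq_sig_all = 1;

        for (i = 0; i < n; i++) {
                /* rdma_create_ep allocates the QP here; in our environment
                 * the call for i == 339 (the 340th QP) fails with ENOMEM. */
                if (rdma_create_ep(&ids[i], res, NULL, &attr)) {
                        perror("rdma_create_ep");
                        break;
                }
                if (rdma_connect(ids[i], NULL)) {
                        perror("rdma_connect");
                        break;
                }
                printf("qp [%d] created\n", i);
        }
        rdma_freeaddrinfo(res);
        return 0;
}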
Following are some of the logs generated by our test tool ib_perf.exe:
(1) 1 server and 1 client: the client cannot create the 340th QP; it fails at rdma_create_ep (Cannot allocate memory).
lehi-dirt:/home/admin/NVMe_OF_test # ib_perf.exe --server-ip 192.168.219.7 --server-port 10001 -s --qp-num 1024
qp [0] local 192.168.219.7:10001 peer 192.168.219.8:42196 created.
qp [1] local 192.168.219.7:10001 peer 192.168.219.8:50411 created.
qp [2] local 192.168.219.7:10001 peer 192.168.219.8:44152 created.
......
qp [337] local 192.168.219.7:10001 peer 192.168.219.8:46325 created.
qp [338] local 192.168.219.7:10001 peer 192.168.219.8:60163 created.
murray-dirt:/home/admin # ib_perf.exe --server-ip 192.168.219.7 --server-port 10001 -c --qp-num 1024
qp [0] local 192.168.219.8:42196 peer 192.168.219.7:10001 created.
qp [1] local 192.168.219.8:50411 peer 192.168.219.7:10001 created.
qp [2] local 192.168.219.8:44152 peer 192.168.219.7:10001 created.
......
qp [337] local 192.168.219.8:46325 peer 192.168.219.7:10001 created.
qp [338] local 192.168.219.8:60163 peer 192.168.219.7:10001 created.
ERR_DBG:/mnt/linux-dev-framework-master/apps/ib_perf/perf_frmwk.c(599)-create_connections_client:
rdma_create_ep failed: Cannot allocate memory
(2) 1 server and 2 clients: the server cannot create the 340th QP; it fails at rdma_get_request (Cannot allocate memory).
In this case rdma_create_ep returned success on the client side for the 340th QP, and the subsequent rdma_connect fails with "Connection refused" (see the second client's log below).
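The server side's accept loop presumably looks roughly like the sketch below (again hypothetical, not the actual ib_perf.exe source; same placeholder values as in the client sketch). If the listening rdma_cm_id carries QP attributes, rdma_get_request allocates the QP for each incoming connection, which would be where the 340th allocation fails in this scenario.

/* repro_server.c - hypothetical minimal sketch of the server side. */
#include <stdio.h>
#include <string.h>
#include <rdma/rdma_cma.h>

int main(void)
{
        struct rdma_addrinfo hints, *res;
        struct ibv_qp_init_attr attr;
        struct rdma_cm_id *listen_id, *id;
        int i = 0;

        memset(&hints, 0, sizeof(hints));
        hints.ai_flags = RAI_PASSIVE;             /* listening side */
        hints.ai_port_space = RDMA_PS_TCP;
        if (rdma_getaddrinfo("192.168.219.7", "10001", &hints, &res)) {
                perror("rdma_getaddrinfo");
                return 1;
        }

        memset(&attr, 0, sizeof(attr));
        attr.cap.max_send_wr = attr.cap.max_recv_wr = 1;    /* placeholder caps */
        attr.cap.max_send_sge = attr.cap.max_recv_sge = 1;
        attr.qp_type = IBV_QPT_RC;

        if (rdma_create_ep(&listen_id, res, NULL, &attr) ||
            rdma_listen(listen_id, 1024)) {
                perror("listen setup");
                return 1;
        }

        for (;;) {
                /* Since the listening id carries QP attributes, the QP for
                 * the incoming connection is allocated inside
                 * rdma_get_request; with 2 clients, the 340th QP overall
                 * fails here with ENOMEM. */
                if (rdma_get_request(listen_id, &id)) {
                        perror("rdma_get_request");
                        break;
                }
                if (rdma_accept(id, NULL)) {
                        perror("rdma_accept");
                        break;
                }
                printf("qp [%d] accepted\n", i++);
        }
        rdma_freeaddrinfo(res);
        return 0;
}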
lehi-dirt:~ # ib_perf.exe --server-ip 192.168.219.7 --server-port 10001 -s --qp-num 1024
qp [0] local 192.168.219.7:10001 peer 192.168.219.8:37360 created.
qp [1] local 192.168.219.7:10001 peer 192.168.219.8:35951 created.
......
qp [337] local 192.168.219.7:10001 peer 192.168.219.8:50314 created.
qp [338] local 192.168.219.7:10001 peer 192.168.219.8:42648 created.
ERR_DBG:/mnt/linux-dev-framework-master/apps/ib_perf/perf_frmwk.c(515)-create_connections_server:
rdma_get_request: Cannot allocate memory
murray-dirt:/home/admin # ib_perf.exe --server-ip 192.168.219.7 --server-port 10001 -c --qp-num 200
qp [0] local 192.168.219.8:37360 peer 192.168.219.7:10001 created.
qp [1] local 192.168.219.8:35951 peer 192.168.219.7:10001 created.
......
qp [198] local 192.168.219.8:59660 peer 192.168.219.7:10001 created.
qp [199] local 192.168.219.8:48077 peer 192.168.219.7:10001 created.
200 connection(s) created in total
murray-dirt:/home/admin # ib_perf.exe --server-ip 192.168.219.7 --server-port 10001 -c --qp-num 200
qp [0] local 192.168.219.8:45772 peer 192.168.219.7:10001 created.
qp [1] local 192.168.219.8:58067 peer 192.168.219.7:10001 created.
......
qp [137] local 192.168.219.8:50314 peer 192.168.219.7:10001 created.
qp [138] local 192.168.219.8:42648 peer 192.168.219.7:10001 created.
ERR_DBG:/mnt/linux-dev-framework-master/apps/ib_perf/perf_frmwk.c(630)-create_connections_client:
rdma_connect: Connection refused
(3) An NVMe-oF target runs as the server, and 2 ib_perf.exe instances run as clients (each creating 200 QPs): all connections succeed.
murray-dirt:/home/admin # ib_perf.exe --server-ip 169.254.85.7 --server-port 4420 -c --qp-num 200
qp [0] local 169.254.85.8:53907 peer 169.254.85.7:4420 created.
qp [1] local 169.254.85.8:57988 peer 169.254.85.7:4420 created.
......
qp [198] local 169.254.85.8:58852 peer 169.254.85.7:4420 created.
qp [199] local 169.254.85.8:33436 peer 169.254.85.7:4420 created.
200 connection(s) created in total
murray-dirt:/home/admin # ib_perf.exe --server-ip 169.254.85.7 --server-port 4420 -c --qp-num 200
qp [0] local 169.254.85.8:50105 peer 169.254.85.7:4420 created.
qp [1] local 169.254.85.8:44136 peer 169.254.85.7:4420 created.
......
qp [198] local 169.254.85.8:53581 peer 169.254.85.7:4420 created.
qp [199] local 169.254.85.8:50082 peer 169.254.85.7:4420 created.
200 connection(s) created in total
Thanks,
Tyler