Linux-RDMA Archive on lore.kernel.org
From: "Sun, Mingbao" <Tyler.Sun@dell.com>
To: "linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
Cc: "Sun, Ao" <Ao.Sun@dell.com>, "Gan, Ping" <Ping.Gan@dell.com>,
	"Zhang, Libin" <Libin.Zhang@dell.com>,
	"Cai, Yanxiu" <Yanxiu.Cai@dell.com>
Subject: [Bug Report] Limitation on number of QPs for single process observed
Date: Fri, 24 Jul 2020 11:04:57 +0000
Message-ID: <BN8PR19MB260932E68B6399764F79CDA6F5770@BN8PR19MB2609.namprd19.prod.outlook.com>


Hi,

System information:

HOST lehi-dirt (server side)

lehi-dirt:~ # cat /etc/os-release 
NAME="SLES"
VERSION="12-SP4"
VERSION_ID="12.4"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP4"
ID="sles"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:12:sp4"

lehi-dirt:~ # uname -r
4.12.14-95.48-default

lehi-dirt:~ # ibv_devinfo 
hca_id: mlx5_bond_0
        transport:              InfiniBand (0)
        fw_ver:                 14.26.6000
        node_guid:              506b:4b03:00b1:7cae
        sys_image_guid:         506b:4b03:00b1:7cae
        vendor_id:              0x02c9
        vendor_part_id:         4117
        hw_ver:                 0x0
        board_id:               DEL2810000034
        phys_port_cnt:          1
        Device ports:
                port:   1
                        state:          PORT_ACTIVE (4)
                        max_mtu:        4096 (5)
                        active_mtu:     4096 (5)
                        sm_lid:         0
                        port_lid:       0
                        port_lmc:       0x00
                        link_layer:     Ethernet

HOST murray-dirt (client side)

murray-dirt:~ # cat /etc/os-release 
NAME="SLES"
VERSION="12-SP4"
VERSION_ID="12.4"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP4"
ID="sles"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:12:sp4"

murray-dirt:~ # uname -r
4.12.14-95.48-default

murray-dirt:~ # ibv_devinfo 
hca_id: mlx5_bond_0
        transport:              InfiniBand (0)
        fw_ver:                 14.26.6000
        node_guid:              506b:4b03:00b1:7ca6
        sys_image_guid:         506b:4b03:00b1:7ca6
        vendor_id:              0x02c9
        vendor_part_id:         4117
        hw_ver:                 0x0
        board_id:               DEL2810000034
        phys_port_cnt:          1
        Device ports:
                port:   1
                        state:          PORT_ACTIVE (4)
                        max_mtu:        4096 (5)
                        active_mtu:     4096 (5)
                        sm_lid:         0
                        port_lid:       0
                        port_lmc:       0x00
                        link_layer:     Ethernet

Steps to reproduce:

Use a single user-space process as the RDMA client and create more than 339 QPs to a given RDMA server (through rdma_connect from librdmacm).
Only 339 QPs can be created: when creating the 340th QP, rdma_create_ep fails on the client side with "Cannot allocate memory".



The following logs were produced by our test tool, ib_perf.exe:

(1) One server and one client: the client cannot create the 340th QP; rdma_create_ep fails with "Cannot allocate memory".


              
lehi-dirt:/home/admin/NVMe_OF_test # ib_perf.exe --server-ip 192.168.219.7 --server-port 10001 -s --qp-num 1024
qp [0] local 192.168.219.7:10001 peer 192.168.219.8:42196 created.
qp [1] local 192.168.219.7:10001 peer 192.168.219.8:50411 created.
qp [2] local 192.168.219.7:10001 peer 192.168.219.8:44152 created.
......
qp [337] local 192.168.219.7:10001 peer 192.168.219.8:46325 created.
qp [338] local 192.168.219.7:10001 peer 192.168.219.8:60163 created.


              
murray-dirt:/home/admin # ib_perf.exe --server-ip 192.168.219.7 --server-port 10001 -c --qp-num 1024
qp [0] local 192.168.219.8:42196 peer 192.168.219.7:10001 created.
qp [1] local 192.168.219.8:50411 peer 192.168.219.7:10001 created.
qp [2] local 192.168.219.8:44152 peer 192.168.219.7:10001 created.
......
qp [337] local 192.168.219.8:46325 peer 192.168.219.7:10001 created.
qp [338] local 192.168.219.8:60163 peer 192.168.219.7:10001 created.
ERR_DBG:/mnt/linux-dev-framework-master/apps/ib_perf/perf_frmwk.c(599)-create_connections_client:
rdma_create_ep failed: Cannot allocate memory



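(2) One server and two clients (200 QPs each): the server cannot create the 340th QP; rdma_get_request fails with "Cannot allocate memory".
On the client side, rdma_create_ep succeeds for the 340th QP (qp [139] of the second client), which then fails in rdma_connect with "Connection refused".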

              
lehi-dirt:~ # ib_perf.exe --server-ip 192.168.219.7 --server-port 10001 -s --qp-num 1024
qp [0] local 192.168.219.7:10001 peer 192.168.219.8:37360 created.
qp [1] local 192.168.219.7:10001 peer 192.168.219.8:35951 created.
......
qp [337] local 192.168.219.7:10001 peer 192.168.219.8:50314 created.
qp [338] local 192.168.219.7:10001 peer 192.168.219.8:42648 created.
ERR_DBG:/mnt/linux-dev-framework-master/apps/ib_perf/perf_frmwk.c(515)-create_connections_server:
rdma_get_request: Cannot allocate memory

              
murray-dirt:/home/admin # ib_perf.exe --server-ip 192.168.219.7 --server-port 10001 -c --qp-num 200
qp [0] local 192.168.219.8:37360 peer 192.168.219.7:10001 created.
qp [1] local 192.168.219.8:35951 peer 192.168.219.7:10001 created.
......
qp [198] local 192.168.219.8:59660 peer 192.168.219.7:10001 created.
qp [199] local 192.168.219.8:48077 peer 192.168.219.7:10001 created.
200 connection(s) created in total

              
murray-dirt:/home/admin # ib_perf.exe --server-ip 192.168.219.7 --server-port 10001 -c --qp-num 200
qp [0] local 192.168.219.8:45772 peer 192.168.219.7:10001 created.
qp [1] local 192.168.219.8:58067 peer 192.168.219.7:10001 created.
......
qp [137] local 192.168.219.8:50314 peer 192.168.219.7:10001 created.
qp [138] local 192.168.219.8:42648 peer 192.168.219.7:10001 created.
ERR_DBG:/mnt/linux-dev-framework-master/apps/ib_perf/perf_frmwk.c(630)-create_connections_client:
rdma_connect: Connection refused



(3) The NVMe-oF target runs as the server and two ib_perf.exe clients each create 200 QPs: all 400 connections are created successfully.

murray-dirt:/home/admin # ib_perf.exe --server-ip 169.254.85.7 --server-port 4420 -c --qp-num 200
qp [0] local 169.254.85.8:53907 peer 169.254.85.7:4420 created.
qp [1] local 169.254.85.8:57988 peer 169.254.85.7:4420 created.
......
qp [198] local 169.254.85.8:58852 peer 169.254.85.7:4420 created.
qp [199] local 169.254.85.8:33436 peer 169.254.85.7:4420 created.
200 connection(s) created in total



murray-dirt:/home/admin # ib_perf.exe --server-ip 169.254.85.7 --server-port 4420 -c --qp-num 200
qp [0] local 169.254.85.8:50105 peer 169.254.85.7:4420 created.
qp [1] local 169.254.85.8:44136 peer 169.254.85.7:4420 created.
......
qp [198] local 169.254.85.8:53581 peer 169.254.85.7:4420 created.
qp [199] local 169.254.85.8:50082 peer 169.254.85.7:4420 created.
200 connection(s) created in total



Thanks,
Tyler
