* SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma.
@ 2021-08-31 13:42 Mark Ruijter
  2021-09-01 12:52 ` Sagi Grimberg
  2021-09-02 21:36 ` Max Gurtovoy
  0 siblings, 2 replies; 8+ messages in thread
From: Mark Ruijter @ 2021-08-31 13:42 UTC (permalink / raw)
To: linux-nvme

When I connect an SPDK initiator it will try to connect using 1024 connections.
The Linux target is unable to handle this situation and returns an error.

Aug 28 14:22:56 crashme kernel: [169366.627010] infiniband mlx5_0: create_qp:2789:(pid 33755): Create QP type 2 failed
Aug 28 14:22:56 crashme kernel: [169366.627913] nvmet_rdma: failed to create_qp ret= -12
Aug 28 14:22:56 crashme kernel: [169366.628498] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).

It is really easy to reproduce the problem, even without the SPDK initiator. Just type:

nvme connect --transport=rdma --queue-size=1024 --nqn=SOME.NQN --traddr=SOME.IP --trsvcid=XXXX

While a Linux initiator attempts to set up 64 connections, SPDK attempts to create 1024 connections.
The result is that anything that relies on SPDK, like VMware 7.x for example, won't be able to connect.
Forcing the queues to be restricted to a depth of 256 solves some of it; in that case SPDK and VMware seem to connect.
See the code section below. Sadly, VMware declares the path to be dead afterwards. I guess this 'fix' needs more work. ;-(

I noticed that someone reported this problem on the SPDK list:
https://github.com/spdk/spdk/issues/1719

Thanks,

Mark

---
static int
nvmet_rdma_parse_cm_connect_req(struct rdma_conn_param *conn,
		struct nvmet_rdma_queue *queue)
{
	struct nvme_rdma_cm_req *req;

	req = (struct nvme_rdma_cm_req *)conn->private_data;
	if (!req || conn->private_data_len == 0)
		return NVME_RDMA_CM_INVALID_LEN;

	if (le16_to_cpu(req->recfmt) != NVME_RDMA_CM_FMT_1_0)
		return NVME_RDMA_CM_INVALID_RECFMT;

	queue->host_qid = le16_to_cpu(req->qid);

	/*
	 * req->hsqsize corresponds to our recv queue size plus 1
	 * req->hrqsize corresponds to our send queue size
	 */
	queue->recv_queue_size = le16_to_cpu(req->hsqsize) + 1;
	queue->send_queue_size = le16_to_cpu(req->hrqsize);
	if (!queue->host_qid && queue->recv_queue_size > NVME_AQ_DEPTH) {
		pr_info("MARK nvmet_rdma_parse_cm_connect_req return %i",
			NVME_RDMA_CM_INVALID_HSQSIZE);
		return NVME_RDMA_CM_INVALID_HSQSIZE;
	}

+	if (queue->recv_queue_size > 256)
+		queue->recv_queue_size = 256;
+	if (queue->send_queue_size > 256)
+		queue->send_queue_size = 256;
+	pr_info("MARK queue->recv_queue_size = %i", queue->recv_queue_size);
+	pr_info("MARK queue->send_queue_size = %i", queue->send_queue_size);

	/* XXX: Should we enforce some kind of max for IO queues? */
	return 0;
}

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
* Re: SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma.
  2021-08-31 13:42 SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma Mark Ruijter
@ 2021-09-01 12:52 ` Sagi Grimberg
  2021-09-01 14:51   ` Mark Ruijter
  2021-09-02 21:36 ` Max Gurtovoy
  1 sibling, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2021-09-01 12:52 UTC (permalink / raw)
To: Mark Ruijter, linux-nvme

> When I connect an SPDK initiator it will try to connect using 1024 connections.
> The linux target is unable to handle this situation and return an error.
>
> Aug 28 14:22:56 crashme kernel: [169366.627010] infiniband mlx5_0: create_qp:2789:(pid 33755): Create QP type 2 failed
> Aug 28 14:22:56 crashme kernel: [169366.627913] nvmet_rdma: failed to create_qp ret= -12
> Aug 28 14:22:56 crashme kernel: [169366.628498] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).

It seems that the target is trying to open a queue pair that is larger than supported. Which device are you using?
* Re: SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma.
  2021-09-01 12:52 ` Sagi Grimberg
@ 2021-09-01 14:51 ` Mark Ruijter
  2021-09-01 14:58   ` Sagi Grimberg
  0 siblings, 1 reply; 8+ messages in thread
From: Mark Ruijter @ 2021-09-01 14:51 UTC (permalink / raw)
To: Sagi Grimberg, linux-nvme

Hi Sagi,

I am using VMware 7.x as the initiator with RDMA.
The target system is running Ubuntu 20.04.3 LTS with kernel 5.11.22+.

The device that is exported is an LVM volume; however, I also tested with a file-backed loop device.
Connecting with SPDK seems to be the problem and, as reported on the SPDK mailing list, it can be used to reproduce the issue when VMware is not available:

./perf -q 64 -P 1 -s 4096 -w read -t 300 -c 0x1 -o 4096 -r 'trtype:RDMA adrfam:IPv4 traddr:169.254.85.8 trsvcid:4420'

This seems to produce a similar result:

nvme connect --transport=rdma --queue-size=1024 --nqn=testnqn_1 --traddr=169.254.85.8 --trsvcid=4420

I hope this helps,

--Mark

On 01/09/2021, 14:52, "Sagi Grimberg" <sagi@grimberg.me> wrote:

> When I connect an SPDK initiator it will try to connect using 1024 connections.
> The linux target is unable to handle this situation and return an error.
>
> Aug 28 14:22:56 crashme kernel: [169366.627010] infiniband mlx5_0: create_qp:2789:(pid 33755): Create QP type 2 failed
> Aug 28 14:22:56 crashme kernel: [169366.627913] nvmet_rdma: failed to create_qp ret= -12
> Aug 28 14:22:56 crashme kernel: [169366.628498] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).

Seems that the target is trying to open a queue-pair that is larger than supported, which device are you using?
* Re: SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma.
  2021-09-01 14:51 ` Mark Ruijter
@ 2021-09-01 14:58 ` Sagi Grimberg
  2021-09-01 15:08   ` Mark Ruijter
  0 siblings, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2021-09-01 14:58 UTC (permalink / raw)
To: Mark Ruijter, linux-nvme

> Hi Sagi,
>
> I am using VMware 7.x as initiator with RDMA.
> The target system is running Ubuntu 20.04.3 LTS with kernel 5.11.22+.
>
> The device that is exported is an LVM volume, however I also tested with a file backed loop device.
> Connecting with SPDK seems to be the problem and as reported on the SPDK mailing-list it can be used to reproduce the issue when VMWare is not available.
> ./perf -q 64 -P 1 -s 4096 -w read -t 300 -c 0x1 -o 4096 -r 'trtype:RDMA adrfam:IPv4 traddr:169.254.85.8 trsvcid:4420'
>
> This seems to produce a similar result:
> nvme connect --transport=rdma --queue-size=1024 --nqn=testnqn_1 --traddr=169.254.85.8 --trsvcid=4420
>
> I hope this helps,

I meant: which RDMA device are you using? That device is failing the QP creation.
* Re: SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma.
  2021-09-01 14:58 ` Sagi Grimberg
@ 2021-09-01 15:08 ` Mark Ruijter
  0 siblings, 0 replies; 8+ messages in thread
From: Mark Ruijter @ 2021-09-01 15:08 UTC (permalink / raw)
To: Sagi Grimberg, linux-nvme

The device is a Mellanox ConnectX-6 controller.
VMware can connect to an SPDK target started on the exact same Ubuntu target system.

4: enp129s0f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4096 qdisc mq state UP group default qlen 25000
    link/ether b8:ce:f6:92:b7:b6 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.34/24 brd 192.168.100.255 scope global enp129s0f0np0
       valid_lft forever preferred_lft forever

81:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
81:00.1 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
--
# cd /sys/kernel/config/nvmet/ports/
# ls
11290  11291  21290  21291
# cd 11290
# cat *
ipv4
192.168.100.34
not specified
4420
rdma
--

On 01/09/2021, 16:58, "Sagi Grimberg" <sagi@grimberg.me> wrote:

> Hi Sagi,
>
> I am using VMware 7.x as initiator with RDMA.
> The target system is running Ubuntu 20.04.3 LTS with kernel 5.11.22+.
>
> The device that is exported is an LVM volume, however I also tested with a file backed loop device.
> Connecting with SPDK seems to be the problem and as reported on the SPDK mailing-list it can be used to reproduce the issue when VMWare is not available.
> ./perf -q 64 -P 1 -s 4096 -w read -t 300 -c 0x1 -o 4096 -r 'trtype:RDMA adrfam:IPv4 traddr:169.254.85.8 trsvcid:4420'
>
> This seems to produce a similar result:
> nvme connect --transport=rdma --queue-size=1024 --nqn=testnqn_1 --traddr=169.254.85.8 --trsvcid=4420
>
> I hope this helps,

I meant which rdma device are you using? that device is failing the qp creation.
* Re: SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma. 2021-08-31 13:42 SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma Mark Ruijter 2021-09-01 12:52 ` Sagi Grimberg @ 2021-09-02 21:36 ` Max Gurtovoy 2021-09-06 9:12 ` Mark Ruijter 1 sibling, 1 reply; 8+ messages in thread From: Max Gurtovoy @ 2021-09-02 21:36 UTC (permalink / raw) To: Mark Ruijter, linux-nvme On 8/31/2021 4:42 PM, Mark Ruijter wrote: > When I connect an SPDK initiator it will try to connect using 1024 connections. > The linux target is unable to handle this situation and return an error. > > Aug 28 14:22:56 crashme kernel: [169366.627010] infiniband mlx5_0: create_qp:2789:(pid 33755): Create QP type 2 failed > Aug 28 14:22:56 crashme kernel: [169366.627913] nvmet_rdma: failed to create_qp ret= -12 > Aug 28 14:22:56 crashme kernel: [169366.628498] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12). > > It is really easy to reproduce the problem, even when not using the SPDK initiator. > > Just type: > nvme connect --transport=rdma --queue-size=1024 --nqn=SOME.NQN --traddr=SOME.IP --trsvcid=XXXX > While a linux initiator attempts to setup 64 connections, SPDK attempts to create 1024 connections. 1024 connections or is it the queue depth ? how many cores you have in initiator ? can you give more details on the systems ? > > The result is that anything which relies on SPDK, like VMware 7.x for example, won't be able to connect. > Forcing the queues to be restricted to 256 QD solves some of it. In this case SPDK and VMware seem to connect. > See the code section below. Sadly, VMware declares the path to be dead afterwards. I guess this 'fix' needs more work. 
;-( > > In noticed that someone reported this problem on the SPDK list: > https://github.com/spdk/spdk/issues/1719 > > Thanks, > > Mark > > --- > static int > nvmet_rdma_parse_cm_connect_req(struct rdma_conn_param *conn, > struct nvmet_rdma_queue *queue) > { > struct nvme_rdma_cm_req *req; > > req = (struct nvme_rdma_cm_req *)conn->private_data; > if (!req || conn->private_data_len == 0) > return NVME_RDMA_CM_INVALID_LEN; > > if (le16_to_cpu(req->recfmt) != NVME_RDMA_CM_FMT_1_0) > return NVME_RDMA_CM_INVALID_RECFMT; > > queue->host_qid = le16_to_cpu(req->qid); > > /* > * req->hsqsize corresponds to our recv queue size plus 1 > * req->hrqsize corresponds to our send queue size > */ > queue->recv_queue_size = le16_to_cpu(req->hsqsize) + 1; > queue->send_queue_size = le16_to_cpu(req->hrqsize); > if (!queue->host_qid && queue->recv_queue_size > NVME_AQ_DEPTH) { > pr_info("MARK nvmet_rdma_parse_cm_connect_req return %i", NVME_RDMA_CM_INVALID_HSQSIZE); > return NVME_RDMA_CM_INVALID_HSQSIZE; > } > > + if (queue->recv_queue_size > 256) > + queue->recv_queue_size = 256; > + if (queue->send_queue_size > 256) > + queue->send_queue_size = 256; > + pr_info("MARK queue->recv_queue_size = %i", queue->recv_queue_size); > + pr_info("MARK queue->send_queue_size = %i", queue->send_queue_size); > > /* XXX: Should we enforce some kind of max for IO queues? */ > return 0; > } > > > > _______________________________________________ > Linux-nvme mailing list > Linux-nvme@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-nvme _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma.
  2021-09-02 21:36 ` Max Gurtovoy
@ 2021-09-06  9:12 ` Mark Ruijter
  2021-09-07 14:25   ` Max Gurtovoy
  0 siblings, 1 reply; 8+ messages in thread
From: Mark Ruijter @ 2021-09-06 9:12 UTC (permalink / raw)
To: Max Gurtovoy, linux-nvme

Hi Max,

The system I use has dual AMD EPYC 7452 32-core processors.
MemTotal: 197784196 kB

It has a single dual-port ConnectX-6 card.
81:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
81:00.1 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]

The problem is not related to hardware. VMware works flawlessly with the SPDK target on this system.

The kernel target fails like this:
target/rdma.c -> infiniband/cma.c -> infiniband/verbs.c -> infiniband/hw/mlx5/qp.c
nvmet_rdma_cm_accept -> rdma_accept -> ib_create_named_qp -> create_kernel_qp ->
returns -12 -> mlx5_0: create_qp:2774:(pid 1246): MARK Create QP type 2 failed

The queue size is 1024. The mlx5 driver then enters the function calc_sq_size, where it fails here and returns -ENOMEM:

--
if (qp->sq.wqe_cnt > (1 << MLX5_CAP_GEN(dev->mdev, log_max_qp_sz))) {
	mlx5_ib_dbg(dev, "send queue size (%d * %d / %d -> %d) exceeds limits(%d)\n",
		    attr->cap.max_send_wr, wqe_size, MLX5_SEND_WQE_BB,
		    qp->sq.wqe_cnt,
		    1 << MLX5_CAP_GEN(dev->mdev, log_max_qp_sz));
	return -ENOMEM;
}
--

Sep  5 12:53:45 everest kernel: [  567.691658] MARK enter ib_create_named_qp
Sep  5 12:53:45 everest kernel: [  567.691667] MARK wq_size = 2097152
Sep  5 12:53:46 everest kernel: [  567.692419] MARK create_kernel_qp 0
Sep  5 12:53:46 everest kernel: [  568.204213] MARK enter ib_create_named_qp
Sep  5 12:53:46 everest kernel: [  568.204218] MARK wq_size = 4194304
Sep  5 12:53:46 everest kernel: [  568.204219] MARK 1 send queue size (4097 * 640 / 64 -> 65536) exceeds limits(32768)
Sep  5 12:53:46 everest kernel: [  568.204220] MARK 1 calc_sq_size return ENOMEM

A hack/fix I tested that seems to work, or at least prevents immediate failure, is this:

--- /root/linux-5.11/drivers/nvme/target/rdma.c
+++ rdma.c	2021-09-06 03:05:08.998364562 -0400
@@ -1397,6 +1397,10 @@
 	if (!queue->host_qid && queue->recv_queue_size > NVME_AQ_DEPTH)
 		return NVME_RDMA_CM_INVALID_HSQSIZE;

+	if (queue->send_queue_size > 256) {
+		queue->send_queue_size = 256;
+		pr_info("MARK : reducing the queue->send_queue_size to 256");
+	}
 	/* XXX: Should we enforce some kind of max for IO queues? */

 	return 0;
---

The answer to the question in the code, "Should we enforce some kind of max for IO queues?", seems to be: yes?
Although VMware now discovers and connects to the kernel target, the path is not working and is declared dead.

The volume appears with an nguid since the target does not set the eui64 field.
However, setting it by using a pass-through device does not solve the issue.
When I don't set pass-through, nvme reports this:

esxcli nvme namespace list
Name                                   Controller Number  Namespace ID  Block Size  Capacity in MB
-------------------------------------  -----------------  ------------  ----------  --------------
eui.344337304e8001510025384100000001                 263             1        4096        12207104
uuid.fa8ab2201ffb4429ba1719ca0d5a3405                322             1         512        14649344

When I use pass-through it reports:

[root@vmw01:~] esxcli nvme namespace list
Name                                  Controller Number  Namespace ID  Block Size  Capacity in MB
------------------------------------  -----------------  ------------  ----------  --------------
eui.344337304e8001510025384100000001                263             1        4096        12207104
eui.344337304e7000780025384100000001                324             1         512        14649344

The reason is easy to explain. Without pass-through the kernel target shows this when I query a device with sg_inq:

sg_inq -e -p 0x83 /dev/nvmeXn1 -vvv
VPD INQUIRY: Device Identification page
  Designation descriptor number 1, descriptor length: 52
    designator_type: T10 vendor identification, code_set: ASCII
    associated with the Target device that contains addressed lu
      vendor id: NVMe
      vendor specific: testvg/testlv_79d87ff74dac1b27

With pass-through the kernel target provides this information for the same device:

VPD INQUIRY: Device Identification page
  Designation descriptor number 1, descriptor length: 56
    designator_type: T10 vendor identification, code_set: ASCII
    associated with the Target device that contains addressed lu
      vendor id: NVMe
      vendor specific: SAMSUNG MZWLL12THMLA-00005_S4C7NA0N700078
  Designation descriptor number 2, descriptor length: 20
    designator_type: EUI-64 based, code_set: Binary
    associated with the Addressed logical unit
      EUI-64 based 16 byte identifier
      Identifier extension: 0x344337304e700078
      IEEE Company_id: 0x2538
      Vendor Specific Extension Identifier: 0x410000000103
      [0x344337304e7000780025384100000001]
  Designation descriptor number 3, descriptor length: 40
    designator_type: SCSI name string, code_set: UTF-8
    associated with the Addressed logical unit
      SCSI name string:
      eui.344337304E7000780025384100000001

SPDK returns this for the same device:

VPD INQUIRY: Device Identification page
  Designation descriptor number 1, descriptor length: 48
    designator_type: T10 vendor identification, code_set: ASCII
    associated with the Target device that contains addressed lu
      vendor id: NVMe
      vendor specific: SPDK_Controller1_SPDK00000000000001
  Designation descriptor number 2, descriptor length: 20
    designator_type: EUI-64 based, code_set: Binary
    associated with the Addressed logical unit
      EUI-64 based 16 byte identifier
      Identifier extension: 0xe0e9311590254d4f
      IEEE Company_id: 0x8fa737
      Vendor Specific Extension Identifier: 0xb56897382503
      [0xe0e9311590254d4f8fa737b568973825]
  Designation descriptor number 3, descriptor length: 40
    designator_type: SCSI name string, code_set: UTF-8
    associated with the Addressed logical unit
      SCSI name string:
      eui.E0E9311590254D4F8FA737B568973825

So, the kernel target returns limited information when not using pass-through, which forces VMware to use the nguid.
We could use the nguid to fill the eui64 attribute and always report the extended info like we do with a pass-through device?

-------------------
--- /root/linux-5.11/drivers/nvme/target/admin-cmd.c	2021-02-14 17:32:24.000000000 -0500
+++ admin-cmd.c	2021-09-05 06:18:10.836865874 -0400
@@ -526,6 +526,7 @@
 	id->anagrpid = cpu_to_le32(ns->anagrpid);

 	memcpy(&id->nguid, &ns->nguid, sizeof(id->nguid));
+	memcpy(&id->eui64, &ns->nguid, sizeof(id->eui64));

 	id->lbaf[0].ds = ns->blksize_shift;

--- /root/linux-5.11/drivers/nvme/target/configfs.c	2021-02-14 17:32:24.000000000 -0500
+++ configfs.c	2021-09-05 05:35:35.741619651 -0400
@@ -477,6 +477,7 @@
 	}

 	memcpy(&ns->nguid, nguid, sizeof(nguid));
+	memcpy(&ns->eui64, nguid, sizeof(ns->eui64));
 out_unlock:
 	mutex_unlock(&subsys->lock);
 	return ret ? ret : count;
--------------

Even with pass-through enabled and the kernel target returning all information, the path is immediately reported to be dead.
esxcli storage core path list
rdma.vmnic2:98:03:9b:03:45:10-rdma.unknown-
   UID: rdma.vmnic2:98:03:9b:03:45:10-rdma.unknown-
   Runtime Name: vmhba64:C0:T1:L0
   Device: No associated device
   Device Display Name: No associated device
   Adapter: vmhba64
   Channel: 0
   Target: 1
   LUN: 0
   Plugin: (unclaimed)
   State: dead
   Transport: rdma
   Adapter Identifier: rdma.vmnic2:98:03:9b:03:45:10
   Target Identifier: rdma.unknown
   Adapter Transport Details: Unavailable or path is unclaimed
   Target Transport Details: Unavailable or path is unclaimed
   Maximum IO Size: 131072

This may or may not be a VMware path-checker issue.
Since SPDK does not show this problem, some difference between the kernel target and the SPDK target must exist.
I don't know if the patch I use that limits the queue depth to 256 is to blame.
The path for the exact same device exported with SPDK shows up like this:

rdma.vmnic2:98:03:9b:03:45:10-rdma.unknown-eui.a012ce7696bf47d5be87760d8f78fb8e
   UID: rdma.vmnic2:98:03:9b:03:45:10-rdma.unknown-eui.a012ce7696bf47d5be87760d8f78fb8e
   Runtime Name: vmhba64:C0:T0:L0
   Device: eui.a012ce7696bf47d5be87760d8f78fb8e
   Device Display Name: NVMe RDMA Disk (eui.a012ce7696bf47d5be87760d8f78fb8e)
   Adapter: vmhba64
   Channel: 0
   Target: 0
   LUN: 0
   Plugin: HPP
   State: active
   Transport: rdma
   Adapter Identifier: rdma.vmnic2:98:03:9b:03:45:10
   Target Identifier: rdma.unknown
   Adapter Transport Details: Unavailable or path is unclaimed
   Target Transport Details: Unavailable or path is unclaimed
   Maximum IO Size: 131072

It looks like the connect patch does work, but something else causes VMware not to accept the nvmet-rdma target devices.
Not sure what to make of that. It could still be eui related? See the UID from the nvmet-rdma target.

Thanks,

--Mark

On 02/09/2021, 23:36, "Max Gurtovoy" <mgurtovoy@nvidia.com> wrote:

On 8/31/2021 4:42 PM, Mark Ruijter wrote:
> When I connect an SPDK initiator it will try to connect using 1024 connections.
> The linux target is unable to handle this situation and return an error.
> > Aug 28 14:22:56 crashme kernel: [169366.627010] infiniband mlx5_0: create_qp:2789:(pid 33755): Create QP type 2 failed > Aug 28 14:22:56 crashme kernel: [169366.627913] nvmet_rdma: failed to create_qp ret= -12 > Aug 28 14:22:56 crashme kernel: [169366.628498] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12). > > It is really easy to reproduce the problem, even when not using the SPDK initiator. > > Just type: > nvme connect --transport=rdma --queue-size=1024 --nqn=SOME.NQN --traddr=SOME.IP --trsvcid=XXXX > While a linux initiator attempts to setup 64 connections, SPDK attempts to create 1024 connections. 1024 connections or is it the queue depth ? how many cores you have in initiator ? can you give more details on the systems ? > > The result is that anything which relies on SPDK, like VMware 7.x for example, won't be able to connect. > Forcing the queues to be restricted to 256 QD solves some of it. In this case SPDK and VMware seem to connect. > See the code section below. Sadly, VMware declares the path to be dead afterwards. I guess this 'fix' needs more work. 
;-( > > In noticed that someone reported this problem on the SPDK list: > https://github.com/spdk/spdk/issues/1719 > > Thanks, > > Mark > > --- > static int > nvmet_rdma_parse_cm_connect_req(struct rdma_conn_param *conn, > struct nvmet_rdma_queue *queue) > { > struct nvme_rdma_cm_req *req; > > req = (struct nvme_rdma_cm_req *)conn->private_data; > if (!req || conn->private_data_len == 0) > return NVME_RDMA_CM_INVALID_LEN; > > if (le16_to_cpu(req->recfmt) != NVME_RDMA_CM_FMT_1_0) > return NVME_RDMA_CM_INVALID_RECFMT; > > queue->host_qid = le16_to_cpu(req->qid); > > /* > * req->hsqsize corresponds to our recv queue size plus 1 > * req->hrqsize corresponds to our send queue size > */ > queue->recv_queue_size = le16_to_cpu(req->hsqsize) + 1; > queue->send_queue_size = le16_to_cpu(req->hrqsize); > if (!queue->host_qid && queue->recv_queue_size > NVME_AQ_DEPTH) { > pr_info("MARK nvmet_rdma_parse_cm_connect_req return %i", NVME_RDMA_CM_INVALID_HSQSIZE); > return NVME_RDMA_CM_INVALID_HSQSIZE; > } > > + if (queue->recv_queue_size > 256) > + queue->recv_queue_size = 256; > + if (queue->send_queue_size > 256) > + queue->send_queue_size = 256; > + pr_info("MARK queue->recv_queue_size = %i", queue->recv_queue_size); > + pr_info("MARK queue->send_queue_size = %i", queue->send_queue_size); > > /* XXX: Should we enforce some kind of max for IO queues? */ > return 0; > } > > > > _______________________________________________ > Linux-nvme mailing list > Linux-nvme@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-nvme _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma.
  2021-09-06  9:12 ` Mark Ruijter
@ 2021-09-07 14:25 ` Max Gurtovoy
  0 siblings, 0 replies; 8+ messages in thread
From: Max Gurtovoy @ 2021-09-07 14:25 UTC (permalink / raw)
To: Mark Ruijter, linux-nvme

On 9/6/2021 12:12 PM, Mark Ruijter wrote:
> Hi Max,
>
> The system I use has dual AMD EPYC 7452 32-Core Processors.
> MemTotal: 197784196 kB
>
> It has a single dual port ConnectX-6 card.
> 81:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
> 81:00.1 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
>
> The problem is not related to hardware. Vmware works flawlessly using the SPDK target with this system.
>
> The kernel target fails like this:
> target/rdma.c -> infiniband/cma.c -> infiniband/verbs.c -> infiniband/hw/mlx5/qp.c
> nvmet_rdma_cm_accept -> rdma_accept -> ib_create_named_qp -> create_kernel_qp ->
> returns -12 -> mlx5_0: create_qp:2774:(pid 1246): MARK Create QP type 2 failed)
>
> The queue-size is 1024. The mlx5 driver now entered the function calc_sq_size where it fails here and returns ENOMEM.

OK, I see the issue here. I can reproduce it with a Linux initiator if I set -Q 1024 in the connect command.

We need to fix a few things in the max_qp_wr calculation and add a .get_queue_size op to nvmet_fabrics_ops to solve it completely.

For now you can use a 256 queue size in the SPDK initiator to work around this.

I'll send a fix.
> -- > if (qp->sq.wqe_cnt > (1 << MLX5_CAP_GEN(dev->mdev, log_max_qp_sz))) { > mlx5_ib_dbg(dev, "send queue size (%d * %d / %d -> %d) exceeds limits(%d)\n", > attr->cap.max_send_wr, wqe_size, MLX5_SEND_WQE_BB, > qp->sq.wqe_cnt, > 1 << MLX5_CAP_GEN(dev->mdev, log_max_qp_sz)); > return -ENOMEM; > } > -- > Sep 5 12:53:45 everest kernel: [ 567.691658] MARK enter ib_create_named_qp > Sep 5 12:53:45 everest kernel: [ 567.691667] MARK wq_size = 2097152 > Sep 5 12:53:46 everest kernel: [ 567.692419] MARK create_kernel_qp 0 > Sep 5 12:53:46 everest kernel: [ 568.204213] MARK enter ib_create_named_qp > Sep 5 12:53:46 everest kernel: [ 568.204218] MARK wq_size = 4194304 > Sep 5 12:53:46 everest kernel: [ 568.204219] MARK 1 send queue size (4097 * 640 / 64 -> 65536) exceeds limits(32768) > Sep 5 12:53:46 everest kernel: [ 568.204220] MARK 1 calc_sq_size return ENOMEM > > A hack / fix I tested and that seems to work, or at least prevents immediate failure, is this: > > --- /root/linux-5.11/drivers/nvme/target/rdma.c > +++ rdma.c 2021-09-06 03:05:08.998364562 -0400 > @@ -1397,6 +1397,10 @@ > if (!queue->host_qid && queue->recv_queue_size > NVME_AQ_DEPTH) > return NVME_RDMA_CM_INVALID_HSQSIZE; > > + if ( queue->send_queue_size > 256 ) { > + queue->send_queue_size = 256; > + pr_info("MARK : reducing the queue->send_queue_size to 256"); > + } > /* XXX: Should we enforce some kind of max for IO queues? */ > > return 0; > > --- > > The answer to the question in the code: "Should we enforce some kind of max for IO queues?" seems to be: yes? > Although VMware now discovers and connects to the kernel target the path not working and declared dead. > > The volume appears with a nguid since the target does not set the eui64 field. > However, setting it by using a pass-through device does not solve the issue. 
> > When I don't set pass-through nvme reports this: > esxcli nvme namespace list > Name Controller Number Namespace ID Block Size Capacity in MB > ------------------------------------- ----------------- ------------ ---------- -------------- > eui.344337304e8001510025384100000001 263 1 4096 12207104 > uuid.fa8ab2201ffb4429ba1719ca0d5a3405 322 1 512 14649344 > > When I use pass-through it reports: > [root@vmw01:~] esxcli nvme namespace list > Name Controller Number Namespace ID Block Size Capacity in MB > ------------------------------------ ----------------- ------------ ---------- -------------- > eui.344337304e8001510025384100000001 263 1 4096 12207104 > eui.344337304e7000780025384100000001 324 1 512 14649344 > > The reason is easy to explain. Without pass-through the kernel target shows this when I query a device with sg_inq: > sg_inq -e -p 0x83 /dev/nvmeXn1 -vvv > VPD INQUIRY: Device Identification page > Designation descriptor number 1, descriptor length: 52 > designator_type: T10 vendor identification, code_set: ASCII > associated with the Target device that contains addressed lu > vendor id: NVMe > vendor specific: testvg/testlv_79d87ff74dac1b27 > > With pass-through the kernel target provides this information for the same device: > VPD INQUIRY: Device Identification page > Designation descriptor number 1, descriptor length: 56 > designator_type: T10 vendor identification, code_set: ASCII > associated with the Target device that contains addressed lu > vendor id: NVMe > vendor specific: SAMSUNG MZWLL12THMLA-00005_S4C7NA0N700078 > Designation descriptor number 2, descriptor length: 20 > designator_type: EUI-64 based, code_set: Binary > associated with the Addressed logical unit > EUI-64 based 16 byte identifier > Identifier extension: 0x344337304e700078 > IEEE Company_id: 0x2538 > Vendor Specific Extension Identifier: 0x410000000103 > [0x344337304e7000780025384100000001] > Designation descriptor number 3, descriptor length: 40 > designator_type: SCSI name 
string, code_set: UTF-8 > associated with the Addressed logical unit > SCSI name string: > eui.344337304E7000780025384100000001 > > SPDK returns this for the same device: > > VPD INQUIRY: Device Identification page > Designation descriptor number 1, descriptor length: 48 > designator_type: T10 vendor identification, code_set: ASCII > associated with the Target device that contains addressed lu > vendor id: NVMe > vendor specific: SPDK_Controller1_SPDK00000000000001 > Designation descriptor number 2, descriptor length: 20 > designator_type: EUI-64 based, code_set: Binary > associated with the Addressed logical unit > EUI-64 based 16 byte identifier > Identifier extension: 0xe0e9311590254d4f > IEEE Company_id: 0x8fa737 > Vendor Specific Extension Identifier: 0xb56897382503 > [0xe0e9311590254d4f8fa737b568973825] > Designation descriptor number 3, descriptor length: 40 > designator_type: SCSI name string, code_set: UTF-8 > associated with the Addressed logical unit > SCSI name string: > eui.E0E9311590254D4F8FA737B568973825 > > So, the kernel target returns limited information when not using pass-through which forces VMware to use the nguid. > We could use the nguid to fill the eui64 attribute and always report the extended info like we do with a pass-through device? > > ------------------- > --- /root/linux-5.11/drivers/nvme/target/admin-cmd.c 2021-02-14 17:32:24.000000000 -0500 > +++ admin-cmd.c 2021-09-05 06:18:10.836865874 -0400 > @@ -526,6 +526,7 @@ > id->anagrpid = cpu_to_le32(ns->anagrpid); > > memcpy(&id->nguid, &ns->nguid, sizeof(id->nguid)); > + memcpy(&id->eui64, &ns->nguid, sizeof(id->eui64)); > > id->lbaf[0].ds = ns->blksize_shift; > > --- /root/linux-5.11/drivers/nvme/target/configfs.c 2021-02-14 17:32:24.000000000 -0500 > +++ configfs.c 2021-09-05 05:35:35.741619651 -0400 > @@ -477,6 +477,7 @@ > } > > memcpy(&ns->nguid, nguid, sizeof(nguid)); > + memcpy(&ns->eui64, nguid, sizeof(ns->eui64)); > out_unlock: > mutex_unlock(&subsys->lock); > return ret ? 
ret : count; > -------------- > > Even with pass-through enabled and the kernel target returning all information the path is immediately reported to be dead. > esxcli storage core path list > rdma.vmnic2:98:03:9b:03:45:10-rdma.unknown- > UID: rdma.vmnic2:98:03:9b:03:45:10-rdma.unknown- > Runtime Name: vmhba64:C0:T1:L0 > Device: No associated device > Device Display Name: No associated device > Adapter: vmhba64 > Channel: 0 > Target: 1 > LUN: 0 > Plugin: (unclaimed) > State: dead > Transport: rdma > Adapter Identifier: rdma.vmnic2:98:03:9b:03:45:10 > Target Identifier: rdma.unknown > Adapter Transport Details: Unavailable or path is unclaimed > Target Transport Details: Unavailable or path is unclaimed > Maximum IO Size: 131072 > > This may or may not be a Vmware path-checker issue. > Since SPDK does not show this problem some difference between the kernel target and SPDK target must exist. > I don't know if the patch I use that limits the queue-depth to 256 is to blame. > The path for the exact same device exported with SPDK shows up like this: > > rdma.vmnic2:98:03:9b:03:45:10-rdma.unknown-eui.a012ce7696bf47d5be87760d8f78fb8e > UID: rdma.vmnic2:98:03:9b:03:45:10-rdma.unknown-eui.a012ce7696bf47d5be87760d8f78fb8e > Runtime Name: vmhba64:C0:T0:L0 > Device: eui.a012ce7696bf47d5be87760d8f78fb8e > Device Display Name: NVMe RDMA Disk (eui.a012ce7696bf47d5be87760d8f78fb8e) > Adapter: vmhba64 > Channel: 0 > Target: 0 > LUN: 0 > Plugin: HPP > State: active > Transport: rdma > Adapter Identifier: rdma.vmnic2:98:03:9b:03:45:10 > Target Identifier: rdma.unknown > Adapter Transport Details: Unavailable or path is unclaimed > Target Transport Details: Unavailable or path is unclaimed > Maximum IO Size: 131072 > > It looks like the connect patch does work but something else causes VMware not to accept the nvmet-rdma target devices. > Not sure what to make of that. It could still be eui related? See the UID from the nvmet-rdma target. 
>
> Thanks,
>
> --Mark
>
> On 02/09/2021, 23:36, "Max Gurtovoy" <mgurtovoy@nvidia.com> wrote:
>
> On 8/31/2021 4:42 PM, Mark Ruijter wrote:
> > When I connect an SPDK initiator, it will try to connect using 1024 connections.
> > The Linux target is unable to handle this situation and returns an error.
> >
> > Aug 28 14:22:56 crashme kernel: [169366.627010] infiniband mlx5_0: create_qp:2789:(pid 33755): Create QP type 2 failed
> > Aug 28 14:22:56 crashme kernel: [169366.627913] nvmet_rdma: failed to create_qp ret= -12
> > Aug 28 14:22:56 crashme kernel: [169366.628498] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).
> >
> > It is really easy to reproduce the problem, even when not using the SPDK initiator.
> >
> > Just type:
> > nvme connect --transport=rdma --queue-size=1024 --nqn=SOME.NQN --traddr=SOME.IP --trsvcid=XXXX
> > While a Linux initiator attempts to set up 64 connections, SPDK attempts to create 1024 connections.
>
> 1024 connections, or is it the queue depth?
> How many cores does the initiator have?
> Can you give more details on the systems?
>
> > The result is that anything which relies on SPDK, like VMware 7.x for example, won't be able to connect.
> > Forcing the queues to be restricted to a queue depth of 256 solves some of it. In this case SPDK and VMware seem to connect.
> > See the code section below. Sadly, VMware declares the path to be dead afterwards. I guess this 'fix' needs more work. ;-(
> >
> > I noticed that someone reported this problem on the SPDK list:
> > https://github.com/spdk/spdk/issues/1719
> >
> > Thanks,
> >
> > Mark
> >
> > ---
> > static int
> > nvmet_rdma_parse_cm_connect_req(struct rdma_conn_param *conn,
> > 				struct nvmet_rdma_queue *queue)
> > {
> > 	struct nvme_rdma_cm_req *req;
> >
> > 	req = (struct nvme_rdma_cm_req *)conn->private_data;
> > 	if (!req || conn->private_data_len == 0)
> > 		return NVME_RDMA_CM_INVALID_LEN;
> >
> > 	if (le16_to_cpu(req->recfmt) != NVME_RDMA_CM_FMT_1_0)
> > 		return NVME_RDMA_CM_INVALID_RECFMT;
> >
> > 	queue->host_qid = le16_to_cpu(req->qid);
> >
> > 	/*
> > 	 * req->hsqsize corresponds to our recv queue size plus 1
> > 	 * req->hrqsize corresponds to our send queue size
> > 	 */
> > 	queue->recv_queue_size = le16_to_cpu(req->hsqsize) + 1;
> > 	queue->send_queue_size = le16_to_cpu(req->hrqsize);
> > 	if (!queue->host_qid && queue->recv_queue_size > NVME_AQ_DEPTH) {
> > 		pr_info("MARK nvmet_rdma_parse_cm_connect_req return %i", NVME_RDMA_CM_INVALID_HSQSIZE);
> > 		return NVME_RDMA_CM_INVALID_HSQSIZE;
> > 	}
> >
> > +	if (queue->recv_queue_size > 256)
> > +		queue->recv_queue_size = 256;
> > +	if (queue->send_queue_size > 256)
> > +		queue->send_queue_size = 256;
> > +	pr_info("MARK queue->recv_queue_size = %i", queue->recv_queue_size);
> > +	pr_info("MARK queue->send_queue_size = %i", queue->send_queue_size);
> >
> > 	/* XXX: Should we enforce some kind of max for IO queues? */
> > 	return 0;
> > }
> >
> > _______________________________________________
> > Linux-nvme mailing list
> > Linux-nvme@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-nvme
>

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 8+ messages in thread
end of thread, other threads:[~2021-09-07 14:26 UTC | newest]

Thread overview: 8+ messages:
2021-08-31 13:42 SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma. Mark Ruijter
2021-09-01 12:52 ` Sagi Grimberg
2021-09-01 14:51   ` Mark Ruijter
2021-09-01 14:58     ` Sagi Grimberg
2021-09-01 15:08       ` Mark Ruijter
2021-09-02 21:36 ` Max Gurtovoy
2021-09-06  9:12   ` Mark Ruijter
2021-09-07 14:25     ` Max Gurtovoy