* SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma.
@ 2021-08-31 13:42 Mark Ruijter
2021-09-01 12:52 ` Sagi Grimberg
2021-09-02 21:36 ` Max Gurtovoy
0 siblings, 2 replies; 8+ messages in thread
From: Mark Ruijter @ 2021-08-31 13:42 UTC (permalink / raw)
To: linux-nvme
When I connect an SPDK initiator it will try to connect using 1024 connections.
The Linux target is unable to handle this situation and returns an error.
Aug 28 14:22:56 crashme kernel: [169366.627010] infiniband mlx5_0: create_qp:2789:(pid 33755): Create QP type 2 failed
Aug 28 14:22:56 crashme kernel: [169366.627913] nvmet_rdma: failed to create_qp ret= -12
Aug 28 14:22:56 crashme kernel: [169366.628498] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).
It is really easy to reproduce the problem, even when not using the SPDK initiator.
Just type:
nvme connect --transport=rdma --queue-size=1024 --nqn=SOME.NQN --traddr=SOME.IP --trsvcid=XXXX
While a Linux initiator attempts to set up 64 connections, SPDK attempts to create 1024 connections.
The result is that anything which relies on SPDK, like VMware 7.x for example, won't be able to connect.
Forcing the queues to be restricted to 256 QD solves some of it. In this case SPDK and VMware seem to connect.
See the code section below. Sadly, VMware declares the path to be dead afterwards. I guess this 'fix' needs more work. ;-(
I noticed that someone reported this problem on the SPDK list:
https://github.com/spdk/spdk/issues/1719
Thanks,
Mark
---
static int
nvmet_rdma_parse_cm_connect_req(struct rdma_conn_param *conn,
struct nvmet_rdma_queue *queue)
{
struct nvme_rdma_cm_req *req;
req = (struct nvme_rdma_cm_req *)conn->private_data;
if (!req || conn->private_data_len == 0)
return NVME_RDMA_CM_INVALID_LEN;
if (le16_to_cpu(req->recfmt) != NVME_RDMA_CM_FMT_1_0)
return NVME_RDMA_CM_INVALID_RECFMT;
queue->host_qid = le16_to_cpu(req->qid);
/*
* req->hsqsize corresponds to our recv queue size plus 1
* req->hrqsize corresponds to our send queue size
*/
queue->recv_queue_size = le16_to_cpu(req->hsqsize) + 1;
queue->send_queue_size = le16_to_cpu(req->hrqsize);
if (!queue->host_qid && queue->recv_queue_size > NVME_AQ_DEPTH) {
pr_info("MARK nvmet_rdma_parse_cm_connect_req return %i", NVME_RDMA_CM_INVALID_HSQSIZE);
return NVME_RDMA_CM_INVALID_HSQSIZE;
}
+ if (queue->recv_queue_size > 256)
+ queue->recv_queue_size = 256;
+ if (queue->send_queue_size > 256)
+ queue->send_queue_size = 256;
+ pr_info("MARK queue->recv_queue_size = %i", queue->recv_queue_size);
+ pr_info("MARK queue->send_queue_size = %i", queue->send_queue_size);
/* XXX: Should we enforce some kind of max for IO queues? */
return 0;
}
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
* Re: SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma.
2021-08-31 13:42 SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma Mark Ruijter
@ 2021-09-01 12:52 ` Sagi Grimberg
2021-09-01 14:51 ` Mark Ruijter
2021-09-02 21:36 ` Max Gurtovoy
1 sibling, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2021-09-01 12:52 UTC (permalink / raw)
To: Mark Ruijter, linux-nvme
> When I connect an SPDK initiator it will try to connect using 1024 connections.
> The Linux target is unable to handle this situation and returns an error.
>
> Aug 28 14:22:56 crashme kernel: [169366.627010] infiniband mlx5_0: create_qp:2789:(pid 33755): Create QP type 2 failed
> Aug 28 14:22:56 crashme kernel: [169366.627913] nvmet_rdma: failed to create_qp ret= -12
> Aug 28 14:22:56 crashme kernel: [169366.628498] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).
Seems that the target is trying to open a queue-pair that is larger than
supported, which device are you using?
* Re: SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma.
2021-09-01 12:52 ` Sagi Grimberg
@ 2021-09-01 14:51 ` Mark Ruijter
2021-09-01 14:58 ` Sagi Grimberg
0 siblings, 1 reply; 8+ messages in thread
From: Mark Ruijter @ 2021-09-01 14:51 UTC (permalink / raw)
To: Sagi Grimberg, linux-nvme
Hi Sagi,
I am using VMware 7.x as initiator with RDMA.
The target system is running Ubuntu 20.04.3 LTS with kernel 5.11.22+.
The device that is exported is an LVM volume, however I also tested with a file backed loop device.
Connecting with SPDK seems to be the problem, and as reported on the SPDK mailing list, it can be used to reproduce the issue when VMware is not available.
./perf -q 64 -P 1 -s 4096 -w read -t 300 -c 0x1 -o 4096 -r 'trtype:RDMA adrfam:IPv4 traddr:169.254.85.8 trsvcid:4420'
This seems to produce a similar result:
nvme connect --transport=rdma --queue-size=1024 --nqn=testnqn_1 --traddr=169.254.85.8 --trsvcid=4420
I hope this helps,
--Mark
On 01/09/2021, 14:52, "Sagi Grimberg" <sagi@grimberg.me> wrote:
> When I connect an SPDK initiator it will try to connect using 1024 connections.
> The Linux target is unable to handle this situation and returns an error.
>
> Aug 28 14:22:56 crashme kernel: [169366.627010] infiniband mlx5_0: create_qp:2789:(pid 33755): Create QP type 2 failed
> Aug 28 14:22:56 crashme kernel: [169366.627913] nvmet_rdma: failed to create_qp ret= -12
> Aug 28 14:22:56 crashme kernel: [169366.628498] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).
Seems that the target is trying to open a queue-pair that is larger than
supported, which device are you using?
* Re: SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma.
2021-09-01 14:51 ` Mark Ruijter
@ 2021-09-01 14:58 ` Sagi Grimberg
2021-09-01 15:08 ` Mark Ruijter
0 siblings, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2021-09-01 14:58 UTC (permalink / raw)
To: Mark Ruijter, linux-nvme
> Hi Sagi,
>
> I am using VMware 7.x as initiator with RDMA.
> The target system is running Ubuntu 20.04.3 LTS with kernel 5.11.22+.
>
> The device that is exported is an LVM volume, however I also tested with a file backed loop device.
> Connecting with SPDK seems to be the problem, and as reported on the SPDK mailing list, it can be used to reproduce the issue when VMware is not available.
> ./perf -q 64 -P 1 -s 4096 -w read -t 300 -c 0x1 -o 4096 -r 'trtype:RDMA adrfam:IPv4 traddr:169.254.85.8 trsvcid:4420'
>
> This seems to produce a similar result:
> nvme connect --transport=rdma --queue-size=1024 --nqn=testnqn_1 --traddr=169.254.85.8 --trsvcid=4420
>
> I hope this helps,
I meant which rdma device are you using? that device is failing the
qp creation.
* Re: SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma.
2021-09-01 14:58 ` Sagi Grimberg
@ 2021-09-01 15:08 ` Mark Ruijter
0 siblings, 0 replies; 8+ messages in thread
From: Mark Ruijter @ 2021-09-01 15:08 UTC (permalink / raw)
To: Sagi Grimberg, linux-nvme
The device is a Mellanox ConnectX-6 controller.
Vmware can connect to an SPDK target started on the exact same Ubuntu target system.
4: enp129s0f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4096 qdisc mq state UP group default qlen 25000
link/ether b8:ce:f6:92:b7:b6 brd ff:ff:ff:ff:ff:ff
inet 192.168.100.34/24 brd 192.168.100.255 scope global enp129s0f0np0
valid_lft forever preferred_lft forever
81:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
81:00.1 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
--
# cd /sys/kernel/config/nvmet/ports/
# ls
11290 11291 21290 21291
# cd 11290
# cat *
ipv4
192.168.100.34
not specified
4420
rdma
--
On 01/09/2021, 16:58, "Sagi Grimberg" <sagi@grimberg.me> wrote:
> Hi Sagi,
>
> I am using VMware 7.x as initiator with RDMA.
> The target system is running Ubuntu 20.04.3 LTS with kernel 5.11.22+.
>
> The device that is exported is an LVM volume, however I also tested with a file backed loop device.
> Connecting with SPDK seems to be the problem, and as reported on the SPDK mailing list, it can be used to reproduce the issue when VMware is not available.
> ./perf -q 64 -P 1 -s 4096 -w read -t 300 -c 0x1 -o 4096 -r 'trtype:RDMA adrfam:IPv4 traddr:169.254.85.8 trsvcid:4420'
>
> This seems to produce a similar result:
> nvme connect --transport=rdma --queue-size=1024 --nqn=testnqn_1 --traddr=169.254.85.8 --trsvcid=4420
>
> I hope this helps,
I meant which rdma device are you using? that device is failing the
qp creation.
* Re: SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma.
2021-08-31 13:42 SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma Mark Ruijter
2021-09-01 12:52 ` Sagi Grimberg
@ 2021-09-02 21:36 ` Max Gurtovoy
2021-09-06 9:12 ` Mark Ruijter
1 sibling, 1 reply; 8+ messages in thread
From: Max Gurtovoy @ 2021-09-02 21:36 UTC (permalink / raw)
To: Mark Ruijter, linux-nvme
On 8/31/2021 4:42 PM, Mark Ruijter wrote:
> When I connect an SPDK initiator it will try to connect using 1024 connections.
> The Linux target is unable to handle this situation and returns an error.
>
> Aug 28 14:22:56 crashme kernel: [169366.627010] infiniband mlx5_0: create_qp:2789:(pid 33755): Create QP type 2 failed
> Aug 28 14:22:56 crashme kernel: [169366.627913] nvmet_rdma: failed to create_qp ret= -12
> Aug 28 14:22:56 crashme kernel: [169366.628498] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).
>
> It is really easy to reproduce the problem, even when not using the SPDK initiator.
>
> Just type:
> nvme connect --transport=rdma --queue-size=1024 --nqn=SOME.NQN --traddr=SOME.IP --trsvcid=XXXX
> While a Linux initiator attempts to set up 64 connections, SPDK attempts to create 1024 connections.
1024 connections, or is it the queue depth?
How many cores do you have in the initiator?
Can you give more details on the systems?
>
> The result is that anything which relies on SPDK, like VMware 7.x for example, won't be able to connect.
> Forcing the queues to be restricted to 256 QD solves some of it. In this case SPDK and VMware seem to connect.
> See the code section below. Sadly, VMware declares the path to be dead afterwards. I guess this 'fix' needs more work. ;-(
>
> I noticed that someone reported this problem on the SPDK list:
> https://github.com/spdk/spdk/issues/1719
>
> Thanks,
>
> Mark
>
> ---
> static int
> nvmet_rdma_parse_cm_connect_req(struct rdma_conn_param *conn,
> struct nvmet_rdma_queue *queue)
> {
> struct nvme_rdma_cm_req *req;
>
> req = (struct nvme_rdma_cm_req *)conn->private_data;
> if (!req || conn->private_data_len == 0)
> return NVME_RDMA_CM_INVALID_LEN;
>
> if (le16_to_cpu(req->recfmt) != NVME_RDMA_CM_FMT_1_0)
> return NVME_RDMA_CM_INVALID_RECFMT;
>
> queue->host_qid = le16_to_cpu(req->qid);
>
> /*
> * req->hsqsize corresponds to our recv queue size plus 1
> * req->hrqsize corresponds to our send queue size
> */
> queue->recv_queue_size = le16_to_cpu(req->hsqsize) + 1;
> queue->send_queue_size = le16_to_cpu(req->hrqsize);
> if (!queue->host_qid && queue->recv_queue_size > NVME_AQ_DEPTH) {
> pr_info("MARK nvmet_rdma_parse_cm_connect_req return %i", NVME_RDMA_CM_INVALID_HSQSIZE);
> return NVME_RDMA_CM_INVALID_HSQSIZE;
> }
>
> + if (queue->recv_queue_size > 256)
> + queue->recv_queue_size = 256;
> + if (queue->send_queue_size > 256)
> + queue->send_queue_size = 256;
> + pr_info("MARK queue->recv_queue_size = %i", queue->recv_queue_size);
> + pr_info("MARK queue->send_queue_size = %i", queue->send_queue_size);
>
> /* XXX: Should we enforce some kind of max for IO queues? */
> return 0;
> }
>
>
>
* Re: SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma.
2021-09-02 21:36 ` Max Gurtovoy
@ 2021-09-06 9:12 ` Mark Ruijter
2021-09-07 14:25 ` Max Gurtovoy
0 siblings, 1 reply; 8+ messages in thread
From: Mark Ruijter @ 2021-09-06 9:12 UTC (permalink / raw)
To: Max Gurtovoy, linux-nvme
Hi Max,
The system I use has dual AMD EPYC 7452 32-Core Processors.
MemTotal: 197784196 kB
It has a single dual port ConnectX-6 card.
81:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
81:00.1 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
The problem is not related to hardware. VMware works flawlessly using the SPDK target with this system.
The kernel target fails like this:
target/rdma.c -> infiniband/cma.c -> infiniband/verbs.c -> infiniband/hw/mlx5/qp.c
nvmet_rdma_cm_accept -> rdma_accept -> ib_create_named_qp -> create_kernel_qp ->
returns -12 -> mlx5_0: create_qp:2774:(pid 1246): MARK Create QP type 2 failed)
The queue size is 1024. The mlx5 driver then enters calc_sq_size, where the check below fails and returns -ENOMEM.
--
if (qp->sq.wqe_cnt > (1 << MLX5_CAP_GEN(dev->mdev, log_max_qp_sz))) {
mlx5_ib_dbg(dev, "send queue size (%d * %d / %d -> %d) exceeds limits(%d)\n",
attr->cap.max_send_wr, wqe_size, MLX5_SEND_WQE_BB,
qp->sq.wqe_cnt,
1 << MLX5_CAP_GEN(dev->mdev, log_max_qp_sz));
return -ENOMEM;
}
--
Sep 5 12:53:45 everest kernel: [ 567.691658] MARK enter ib_create_named_qp
Sep 5 12:53:45 everest kernel: [ 567.691667] MARK wq_size = 2097152
Sep 5 12:53:46 everest kernel: [ 567.692419] MARK create_kernel_qp 0
Sep 5 12:53:46 everest kernel: [ 568.204213] MARK enter ib_create_named_qp
Sep 5 12:53:46 everest kernel: [ 568.204218] MARK wq_size = 4194304
Sep 5 12:53:46 everest kernel: [ 568.204219] MARK 1 send queue size (4097 * 640 / 64 -> 65536) exceeds limits(32768)
Sep 5 12:53:46 everest kernel: [ 568.204220] MARK 1 calc_sq_size return ENOMEM
A hack/fix I tested that seems to work, or at least prevents immediate failure, is this:
--- /root/linux-5.11/drivers/nvme/target/rdma.c
+++ rdma.c 2021-09-06 03:05:08.998364562 -0400
@@ -1397,6 +1397,10 @@
if (!queue->host_qid && queue->recv_queue_size > NVME_AQ_DEPTH)
return NVME_RDMA_CM_INVALID_HSQSIZE;
+ if ( queue->send_queue_size > 256 ) {
+ queue->send_queue_size = 256;
+ pr_info("MARK : reducing the queue->send_queue_size to 256");
+ }
/* XXX: Should we enforce some kind of max for IO queues? */
return 0;
---
The answer to the question in the code, "Should we enforce some kind of max for IO queues?", seems to be: yes?
Although VMware now discovers and connects to the kernel target, the path is not working and is declared dead.
The volume appears with a nguid since the target does not set the eui64 field.
However, setting it by using a pass-through device does not solve the issue.
When I don't set pass-through nvme reports this:
esxcli nvme namespace list
Name Controller Number Namespace ID Block Size Capacity in MB
------------------------------------- ----------------- ------------ ---------- --------------
eui.344337304e8001510025384100000001 263 1 4096 12207104
uuid.fa8ab2201ffb4429ba1719ca0d5a3405 322 1 512 14649344
When I use pass-through it reports:
[root@vmw01:~] esxcli nvme namespace list
Name Controller Number Namespace ID Block Size Capacity in MB
------------------------------------ ----------------- ------------ ---------- --------------
eui.344337304e8001510025384100000001 263 1 4096 12207104
eui.344337304e7000780025384100000001 324 1 512 14649344
The reason is easy to explain. Without pass-through the kernel target shows this when I query a device with sg_inq:
sg_inq -e -p 0x83 /dev/nvmeXn1 -vvv
VPD INQUIRY: Device Identification page
Designation descriptor number 1, descriptor length: 52
designator_type: T10 vendor identification, code_set: ASCII
associated with the Target device that contains addressed lu
vendor id: NVMe
vendor specific: testvg/testlv_79d87ff74dac1b27
With pass-through the kernel target provides this information for the same device:
VPD INQUIRY: Device Identification page
Designation descriptor number 1, descriptor length: 56
designator_type: T10 vendor identification, code_set: ASCII
associated with the Target device that contains addressed lu
vendor id: NVMe
vendor specific: SAMSUNG MZWLL12THMLA-00005_S4C7NA0N700078
Designation descriptor number 2, descriptor length: 20
designator_type: EUI-64 based, code_set: Binary
associated with the Addressed logical unit
EUI-64 based 16 byte identifier
Identifier extension: 0x344337304e700078
IEEE Company_id: 0x2538
Vendor Specific Extension Identifier: 0x410000000103
[0x344337304e7000780025384100000001]
Designation descriptor number 3, descriptor length: 40
designator_type: SCSI name string, code_set: UTF-8
associated with the Addressed logical unit
SCSI name string:
eui.344337304E7000780025384100000001
SPDK returns this for the same device:
VPD INQUIRY: Device Identification page
Designation descriptor number 1, descriptor length: 48
designator_type: T10 vendor identification, code_set: ASCII
associated with the Target device that contains addressed lu
vendor id: NVMe
vendor specific: SPDK_Controller1_SPDK00000000000001
Designation descriptor number 2, descriptor length: 20
designator_type: EUI-64 based, code_set: Binary
associated with the Addressed logical unit
EUI-64 based 16 byte identifier
Identifier extension: 0xe0e9311590254d4f
IEEE Company_id: 0x8fa737
Vendor Specific Extension Identifier: 0xb56897382503
[0xe0e9311590254d4f8fa737b568973825]
Designation descriptor number 3, descriptor length: 40
designator_type: SCSI name string, code_set: UTF-8
associated with the Addressed logical unit
SCSI name string:
eui.E0E9311590254D4F8FA737B568973825
So, the kernel target returns limited information when not using pass-through which forces VMware to use the nguid.
We could use the nguid to fill the eui64 attribute and always report the extended info like we do with a pass-through device?
-------------------
--- /root/linux-5.11/drivers/nvme/target/admin-cmd.c 2021-02-14 17:32:24.000000000 -0500
+++ admin-cmd.c 2021-09-05 06:18:10.836865874 -0400
@@ -526,6 +526,7 @@
id->anagrpid = cpu_to_le32(ns->anagrpid);
memcpy(&id->nguid, &ns->nguid, sizeof(id->nguid));
+ memcpy(&id->eui64, &ns->nguid, sizeof(id->eui64));
id->lbaf[0].ds = ns->blksize_shift;
--- /root/linux-5.11/drivers/nvme/target/configfs.c 2021-02-14 17:32:24.000000000 -0500
+++ configfs.c 2021-09-05 05:35:35.741619651 -0400
@@ -477,6 +477,7 @@
}
memcpy(&ns->nguid, nguid, sizeof(nguid));
+ memcpy(&ns->eui64, nguid, sizeof(ns->eui64));
out_unlock:
mutex_unlock(&subsys->lock);
return ret ? ret : count;
--------------
Even with pass-through enabled and the kernel target returning all information, the path is immediately reported as dead.
esxcli storage core path list
rdma.vmnic2:98:03:9b:03:45:10-rdma.unknown-
UID: rdma.vmnic2:98:03:9b:03:45:10-rdma.unknown-
Runtime Name: vmhba64:C0:T1:L0
Device: No associated device
Device Display Name: No associated device
Adapter: vmhba64
Channel: 0
Target: 1
LUN: 0
Plugin: (unclaimed)
State: dead
Transport: rdma
Adapter Identifier: rdma.vmnic2:98:03:9b:03:45:10
Target Identifier: rdma.unknown
Adapter Transport Details: Unavailable or path is unclaimed
Target Transport Details: Unavailable or path is unclaimed
Maximum IO Size: 131072
This may or may not be a VMware path-checker issue.
Since SPDK does not show this problem some difference between the kernel target and SPDK target must exist.
I don't know if the patch I use that limits the queue-depth to 256 is to blame.
The path for the exact same device exported with SPDK shows up like this:
rdma.vmnic2:98:03:9b:03:45:10-rdma.unknown-eui.a012ce7696bf47d5be87760d8f78fb8e
UID: rdma.vmnic2:98:03:9b:03:45:10-rdma.unknown-eui.a012ce7696bf47d5be87760d8f78fb8e
Runtime Name: vmhba64:C0:T0:L0
Device: eui.a012ce7696bf47d5be87760d8f78fb8e
Device Display Name: NVMe RDMA Disk (eui.a012ce7696bf47d5be87760d8f78fb8e)
Adapter: vmhba64
Channel: 0
Target: 0
LUN: 0
Plugin: HPP
State: active
Transport: rdma
Adapter Identifier: rdma.vmnic2:98:03:9b:03:45:10
Target Identifier: rdma.unknown
Adapter Transport Details: Unavailable or path is unclaimed
Target Transport Details: Unavailable or path is unclaimed
Maximum IO Size: 131072
It looks like the connect patch does work, but something else causes VMware not to accept the nvmet-rdma target devices.
Not sure what to make of that. It could still be eui related? See the UID from the nvmet-rdma target.
Thanks,
--Mark
On 02/09/2021, 23:36, "Max Gurtovoy" <mgurtovoy@nvidia.com> wrote:
On 8/31/2021 4:42 PM, Mark Ruijter wrote:
> When I connect an SPDK initiator it will try to connect using 1024 connections.
> The Linux target is unable to handle this situation and returns an error.
>
> Aug 28 14:22:56 crashme kernel: [169366.627010] infiniband mlx5_0: create_qp:2789:(pid 33755): Create QP type 2 failed
> Aug 28 14:22:56 crashme kernel: [169366.627913] nvmet_rdma: failed to create_qp ret= -12
> Aug 28 14:22:56 crashme kernel: [169366.628498] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).
>
> It is really easy to reproduce the problem, even when not using the SPDK initiator.
>
> Just type:
> nvme connect --transport=rdma --queue-size=1024 --nqn=SOME.NQN --traddr=SOME.IP --trsvcid=XXXX
> While a Linux initiator attempts to set up 64 connections, SPDK attempts to create 1024 connections.
1024 connections, or is it the queue depth?
How many cores do you have in the initiator?
Can you give more details on the systems?
>
> The result is that anything which relies on SPDK, like VMware 7.x for example, won't be able to connect.
> Forcing the queues to be restricted to 256 QD solves some of it. In this case SPDK and VMware seem to connect.
> See the code section below. Sadly, VMware declares the path to be dead afterwards. I guess this 'fix' needs more work. ;-(
>
> I noticed that someone reported this problem on the SPDK list:
> https://github.com/spdk/spdk/issues/1719
>
> Thanks,
>
> Mark
>
> ---
> static int
> nvmet_rdma_parse_cm_connect_req(struct rdma_conn_param *conn,
> struct nvmet_rdma_queue *queue)
> {
> struct nvme_rdma_cm_req *req;
>
> req = (struct nvme_rdma_cm_req *)conn->private_data;
> if (!req || conn->private_data_len == 0)
> return NVME_RDMA_CM_INVALID_LEN;
>
> if (le16_to_cpu(req->recfmt) != NVME_RDMA_CM_FMT_1_0)
> return NVME_RDMA_CM_INVALID_RECFMT;
>
> queue->host_qid = le16_to_cpu(req->qid);
>
> /*
> * req->hsqsize corresponds to our recv queue size plus 1
> * req->hrqsize corresponds to our send queue size
> */
> queue->recv_queue_size = le16_to_cpu(req->hsqsize) + 1;
> queue->send_queue_size = le16_to_cpu(req->hrqsize);
> if (!queue->host_qid && queue->recv_queue_size > NVME_AQ_DEPTH) {
> pr_info("MARK nvmet_rdma_parse_cm_connect_req return %i", NVME_RDMA_CM_INVALID_HSQSIZE);
> return NVME_RDMA_CM_INVALID_HSQSIZE;
> }
>
> + if (queue->recv_queue_size > 256)
> + queue->recv_queue_size = 256;
> + if (queue->send_queue_size > 256)
> + queue->send_queue_size = 256;
> + pr_info("MARK queue->recv_queue_size = %i", queue->recv_queue_size);
> + pr_info("MARK queue->send_queue_size = %i", queue->send_queue_size);
>
> /* XXX: Should we enforce some kind of max for IO queues? */
> return 0;
> }
>
>
>
* Re: SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma.
2021-09-06 9:12 ` Mark Ruijter
@ 2021-09-07 14:25 ` Max Gurtovoy
0 siblings, 0 replies; 8+ messages in thread
From: Max Gurtovoy @ 2021-09-07 14:25 UTC (permalink / raw)
To: Mark Ruijter, linux-nvme
On 9/6/2021 12:12 PM, Mark Ruijter wrote:
> Hi Max,
>
> The system I use has dual AMD EPYC 7452 32-Core Processors.
> MemTotal: 197784196 kB
>
> It has a single dual port ConnectX-6 card.
> 81:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
> 81:00.1 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
>
> The problem is not related to hardware. VMware works flawlessly using the SPDK target with this system.
>
> The kernel target fails like this:
> target/rdma.c -> infiniband/cma.c -> infiniband/verbs.c -> infiniband/hw/mlx5/qp.c
> nvmet_rdma_cm_accept -> rdma_accept -> ib_create_named_qp -> create_kernel_qp ->
> returns -12 -> mlx5_0: create_qp:2774:(pid 1246): MARK Create QP type 2 failed)
>
> The queue size is 1024. The mlx5 driver then enters calc_sq_size, where the check below fails and returns -ENOMEM.
Ok, I see the issue here.
I can repro it with a Linux initiator if I set -Q 1024 in the connect command.
We need to fix a few things in the max_qp_wr calculation and add a
.get_queue_size op to nvmet_fabrics_ops to solve it completely.
For now you can use a queue size of 256 in the SPDK initiator to work around this.
I'll send a fix.
> --
> if (qp->sq.wqe_cnt > (1 << MLX5_CAP_GEN(dev->mdev, log_max_qp_sz))) {
> mlx5_ib_dbg(dev, "send queue size (%d * %d / %d -> %d) exceeds limits(%d)\n",
> attr->cap.max_send_wr, wqe_size, MLX5_SEND_WQE_BB,
> qp->sq.wqe_cnt,
> 1 << MLX5_CAP_GEN(dev->mdev, log_max_qp_sz));
> return -ENOMEM;
> }
> --
> Sep 5 12:53:45 everest kernel: [ 567.691658] MARK enter ib_create_named_qp
> Sep 5 12:53:45 everest kernel: [ 567.691667] MARK wq_size = 2097152
> Sep 5 12:53:46 everest kernel: [ 567.692419] MARK create_kernel_qp 0
> Sep 5 12:53:46 everest kernel: [ 568.204213] MARK enter ib_create_named_qp
> Sep 5 12:53:46 everest kernel: [ 568.204218] MARK wq_size = 4194304
> Sep 5 12:53:46 everest kernel: [ 568.204219] MARK 1 send queue size (4097 * 640 / 64 -> 65536) exceeds limits(32768)
> Sep 5 12:53:46 everest kernel: [ 568.204220] MARK 1 calc_sq_size return ENOMEM
>
> A hack/fix I tested that seems to work, or at least prevents immediate failure, is this:
>
> --- /root/linux-5.11/drivers/nvme/target/rdma.c
> +++ rdma.c 2021-09-06 03:05:08.998364562 -0400
> @@ -1397,6 +1397,10 @@
> if (!queue->host_qid && queue->recv_queue_size > NVME_AQ_DEPTH)
> return NVME_RDMA_CM_INVALID_HSQSIZE;
>
> + if ( queue->send_queue_size > 256 ) {
> + queue->send_queue_size = 256;
> + pr_info("MARK : reducing the queue->send_queue_size to 256");
> + }
> /* XXX: Should we enforce some kind of max for IO queues? */
>
> return 0;
>
> ---
>
> The answer to the question in the code, "Should we enforce some kind of max for IO queues?", seems to be: yes?
> Although VMware now discovers and connects to the kernel target, the path is not working and is declared dead.
>
> The volume appears with a nguid since the target does not set the eui64 field.
> However, setting it by using a pass-through device does not solve the issue.
>
> When I don't set pass-through nvme reports this:
> esxcli nvme namespace list
> Name Controller Number Namespace ID Block Size Capacity in MB
> ------------------------------------- ----------------- ------------ ---------- --------------
> eui.344337304e8001510025384100000001 263 1 4096 12207104
> uuid.fa8ab2201ffb4429ba1719ca0d5a3405 322 1 512 14649344
>
> When I use pass-through it reports:
> [root@vmw01:~] esxcli nvme namespace list
> Name Controller Number Namespace ID Block Size Capacity in MB
> ------------------------------------ ----------------- ------------ ---------- --------------
> eui.344337304e8001510025384100000001 263 1 4096 12207104
> eui.344337304e7000780025384100000001 324 1 512 14649344
>
> The reason is easy to explain. Without pass-through the kernel target shows this when I query a device with sg_inq:
> sg_inq -e -p 0x83 /dev/nvmeXn1 -vvv
> VPD INQUIRY: Device Identification page
> Designation descriptor number 1, descriptor length: 52
> designator_type: T10 vendor identification, code_set: ASCII
> associated with the Target device that contains addressed lu
> vendor id: NVMe
> vendor specific: testvg/testlv_79d87ff74dac1b27
>
> With pass-through the kernel target provides this information for the same device:
> VPD INQUIRY: Device Identification page
> Designation descriptor number 1, descriptor length: 56
> designator_type: T10 vendor identification, code_set: ASCII
> associated with the Target device that contains addressed lu
> vendor id: NVMe
> vendor specific: SAMSUNG MZWLL12THMLA-00005_S4C7NA0N700078
> Designation descriptor number 2, descriptor length: 20
> designator_type: EUI-64 based, code_set: Binary
> associated with the Addressed logical unit
> EUI-64 based 16 byte identifier
> Identifier extension: 0x344337304e700078
> IEEE Company_id: 0x2538
> Vendor Specific Extension Identifier: 0x410000000103
> [0x344337304e7000780025384100000001]
> Designation descriptor number 3, descriptor length: 40
> designator_type: SCSI name string, code_set: UTF-8
> associated with the Addressed logical unit
> SCSI name string:
> eui.344337304E7000780025384100000001
>
> SPDK returns this for the same device:
>
> VPD INQUIRY: Device Identification page
> Designation descriptor number 1, descriptor length: 48
> designator_type: T10 vendor identification, code_set: ASCII
> associated with the Target device that contains addressed lu
> vendor id: NVMe
> vendor specific: SPDK_Controller1_SPDK00000000000001
> Designation descriptor number 2, descriptor length: 20
> designator_type: EUI-64 based, code_set: Binary
> associated with the Addressed logical unit
> EUI-64 based 16 byte identifier
> Identifier extension: 0xe0e9311590254d4f
> IEEE Company_id: 0x8fa737
> Vendor Specific Extension Identifier: 0xb56897382503
> [0xe0e9311590254d4f8fa737b568973825]
> Designation descriptor number 3, descriptor length: 40
> designator_type: SCSI name string, code_set: UTF-8
> associated with the Addressed logical unit
> SCSI name string:
> eui.E0E9311590254D4F8FA737B568973825
>
> So, the kernel target returns limited information when not using pass-through which forces VMware to use the nguid.
> We could use the nguid to fill the eui64 attribute and always report the extended info like we do with a pass-through device?
>
> -------------------
> --- /root/linux-5.11/drivers/nvme/target/admin-cmd.c 2021-02-14 17:32:24.000000000 -0500
> +++ admin-cmd.c 2021-09-05 06:18:10.836865874 -0400
> @@ -526,6 +526,7 @@
> id->anagrpid = cpu_to_le32(ns->anagrpid);
>
> memcpy(&id->nguid, &ns->nguid, sizeof(id->nguid));
> + memcpy(&id->eui64, &ns->nguid, sizeof(id->eui64));
>
> id->lbaf[0].ds = ns->blksize_shift;
>
> --- /root/linux-5.11/drivers/nvme/target/configfs.c 2021-02-14 17:32:24.000000000 -0500
> +++ configfs.c 2021-09-05 05:35:35.741619651 -0400
> @@ -477,6 +477,7 @@
> }
>
> memcpy(&ns->nguid, nguid, sizeof(nguid));
> + memcpy(&ns->eui64, nguid, sizeof(ns->eui64));
> out_unlock:
> mutex_unlock(&subsys->lock);
> return ret ? ret : count;
> --------------
>
> Even with pass-through enabled and the kernel target returning all information, the path is immediately reported as dead.
> esxcli storage core path list
> rdma.vmnic2:98:03:9b:03:45:10-rdma.unknown-
> UID: rdma.vmnic2:98:03:9b:03:45:10-rdma.unknown-
> Runtime Name: vmhba64:C0:T1:L0
> Device: No associated device
> Device Display Name: No associated device
> Adapter: vmhba64
> Channel: 0
> Target: 1
> LUN: 0
> Plugin: (unclaimed)
> State: dead
> Transport: rdma
> Adapter Identifier: rdma.vmnic2:98:03:9b:03:45:10
> Target Identifier: rdma.unknown
> Adapter Transport Details: Unavailable or path is unclaimed
> Target Transport Details: Unavailable or path is unclaimed
> Maximum IO Size: 131072
>
> This may or may not be a VMware path-checker issue.
> Since the SPDK target does not show this problem, some difference between the kernel target and the SPDK target must exist.
> I don't know whether the patch I use to limit the queue depth to 256 is to blame.
> The path for the exact same device exported with SPDK shows up like this:
>
> rdma.vmnic2:98:03:9b:03:45:10-rdma.unknown-eui.a012ce7696bf47d5be87760d8f78fb8e
> UID: rdma.vmnic2:98:03:9b:03:45:10-rdma.unknown-eui.a012ce7696bf47d5be87760d8f78fb8e
> Runtime Name: vmhba64:C0:T0:L0
> Device: eui.a012ce7696bf47d5be87760d8f78fb8e
> Device Display Name: NVMe RDMA Disk (eui.a012ce7696bf47d5be87760d8f78fb8e)
> Adapter: vmhba64
> Channel: 0
> Target: 0
> LUN: 0
> Plugin: HPP
> State: active
> Transport: rdma
> Adapter Identifier: rdma.vmnic2:98:03:9b:03:45:10
> Target Identifier: rdma.unknown
> Adapter Transport Details: Unavailable or path is unclaimed
> Target Transport Details: Unavailable or path is unclaimed
> Maximum IO Size: 131072
>
> It looks like the connect patch works, but something else causes VMware not to accept the nvmet-rdma target devices.
> I am not sure what to make of that. It could still be eui-related; note the missing eui suffix in the UID from the nvmet-rdma target.
>
> Thanks,
>
> --Mark
>
> On 02/09/2021, 23:36, "Max Gurtovoy" <mgurtovoy@nvidia.com> wrote:
>
>
> On 8/31/2021 4:42 PM, Mark Ruijter wrote:
> > When I connect an SPDK initiator, it tries to connect using 1024 connections.
> > The Linux target is unable to handle this situation and returns an error.
> >
> > Aug 28 14:22:56 crashme kernel: [169366.627010] infiniband mlx5_0: create_qp:2789:(pid 33755): Create QP type 2 failed
> > Aug 28 14:22:56 crashme kernel: [169366.627913] nvmet_rdma: failed to create_qp ret= -12
> > Aug 28 14:22:56 crashme kernel: [169366.628498] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).
> >
> > It is really easy to reproduce the problem, even when not using the SPDK initiator.
> >
> > Just type:
> > nvme connect --transport=rdma --queue-size=1024 --nqn=SOME.NQN --traddr=SOME.IP --trsvcid=XXXX
> > While a Linux initiator attempts to set up 64 connections, SPDK attempts to create 1024 connections.
>
> 1024 connections, or is it the queue depth?
>
> How many cores do you have in the initiator?
>
> Can you give more details on the systems?
>
> >
> > The result is that anything which relies on SPDK, like VMware 7.x for example, won't be able to connect.
> > Forcing the queues to be restricted to 256 QD solves some of it. In this case SPDK and VMware seem to connect.
> > See the code section below. Sadly, VMware declares the path to be dead afterwards. I guess this 'fix' needs more work. ;-(
> >
> > I noticed that someone reported this problem on the SPDK list:
> > https://github.com/spdk/spdk/issues/1719
> >
> > Thanks,
> >
> > Mark
> >
> > ---
> > static int
> > nvmet_rdma_parse_cm_connect_req(struct rdma_conn_param *conn,
> > struct nvmet_rdma_queue *queue)
> > {
> > struct nvme_rdma_cm_req *req;
> >
> > req = (struct nvme_rdma_cm_req *)conn->private_data;
> > if (!req || conn->private_data_len == 0)
> > return NVME_RDMA_CM_INVALID_LEN;
> >
> > if (le16_to_cpu(req->recfmt) != NVME_RDMA_CM_FMT_1_0)
> > return NVME_RDMA_CM_INVALID_RECFMT;
> >
> > queue->host_qid = le16_to_cpu(req->qid);
> >
> > /*
> > * req->hsqsize corresponds to our recv queue size plus 1
> > * req->hrqsize corresponds to our send queue size
> > */
> > queue->recv_queue_size = le16_to_cpu(req->hsqsize) + 1;
> > queue->send_queue_size = le16_to_cpu(req->hrqsize);
> > if (!queue->host_qid && queue->recv_queue_size > NVME_AQ_DEPTH) {
> > pr_info("MARK nvmet_rdma_parse_cm_connect_req return %i\n", NVME_RDMA_CM_INVALID_HSQSIZE);
> > return NVME_RDMA_CM_INVALID_HSQSIZE;
> > }
> >
> > + if (queue->recv_queue_size > 256)
> > + queue->recv_queue_size = 256;
> > + if (queue->send_queue_size > 256)
> > + queue->send_queue_size = 256;
> > + pr_info("MARK queue->recv_queue_size = %i\n", queue->recv_queue_size);
> > + pr_info("MARK queue->send_queue_size = %i\n", queue->send_queue_size);
> >
> > /* XXX: Should we enforce some kind of max for IO queues? */
> > return 0;
> > }
> >
> >
> >
> > _______________________________________________
> > Linux-nvme mailing list
> > Linux-nvme@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-nvme
>
Thread overview: 8+ messages
2021-08-31 13:42 SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma Mark Ruijter
2021-09-01 12:52 ` Sagi Grimberg
2021-09-01 14:51 ` Mark Ruijter
2021-09-01 14:58 ` Sagi Grimberg
2021-09-01 15:08 ` Mark Ruijter
2021-09-02 21:36 ` Max Gurtovoy
2021-09-06 9:12 ` Mark Ruijter
2021-09-07 14:25 ` Max Gurtovoy