* SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma.
@ 2021-08-31 13:42 Mark Ruijter
  2021-09-01 12:52 ` Sagi Grimberg
  2021-09-02 21:36 ` Max Gurtovoy
  0 siblings, 2 replies; 8+ messages in thread
From: Mark Ruijter @ 2021-08-31 13:42 UTC (permalink / raw)
  To: linux-nvme

When I connect an SPDK initiator it will try to connect using 1024 connections.
The Linux target is unable to handle this situation and returns an error.

Aug 28 14:22:56 crashme kernel: [169366.627010] infiniband mlx5_0: create_qp:2789:(pid 33755): Create QP type 2 failed
Aug 28 14:22:56 crashme kernel: [169366.627913] nvmet_rdma: failed to create_qp ret= -12
Aug 28 14:22:56 crashme kernel: [169366.628498] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).

It is really easy to reproduce the problem, even when not using the SPDK initiator.

Just type:
nvme connect --transport=rdma --queue-size=1024 --nqn=SOME.NQN --traddr=SOME.IP --trsvcid=XXXX
While a Linux initiator attempts to set up 64 connections, SPDK attempts to create 1024 connections.

The result is that anything which relies on SPDK, like VMware 7.x for example, won't be able to connect.
Restricting the queues to a depth of 256 solves some of it; in that case SPDK and VMware seem to connect.
See the code section below. Sadly, VMware declares the path to be dead afterwards. I guess this 'fix' needs more work. ;-(

I noticed that someone reported this problem on the SPDK list:
https://github.com/spdk/spdk/issues/1719

Thanks,

Mark

---
static int
nvmet_rdma_parse_cm_connect_req(struct rdma_conn_param *conn,
                                struct nvmet_rdma_queue *queue)
{
        struct nvme_rdma_cm_req *req;

        req = (struct nvme_rdma_cm_req *)conn->private_data;
        if (!req || conn->private_data_len == 0)
                return NVME_RDMA_CM_INVALID_LEN;

        if (le16_to_cpu(req->recfmt) != NVME_RDMA_CM_FMT_1_0)
                return NVME_RDMA_CM_INVALID_RECFMT;

        queue->host_qid = le16_to_cpu(req->qid);

        /*
         * req->hsqsize corresponds to our recv queue size plus 1
         * req->hrqsize corresponds to our send queue size
         */
        queue->recv_queue_size = le16_to_cpu(req->hsqsize) + 1;
        queue->send_queue_size = le16_to_cpu(req->hrqsize);
        if (!queue->host_qid && queue->recv_queue_size > NVME_AQ_DEPTH) {
                pr_info("MARK nvmet_rdma_parse_cm_connect_req return %i", NVME_RDMA_CM_INVALID_HSQSIZE);
                return NVME_RDMA_CM_INVALID_HSQSIZE;
        }

+        if (queue->recv_queue_size > 256)
+                queue->recv_queue_size = 256;
+        if (queue->send_queue_size > 256)
+                queue->send_queue_size = 256;
+        pr_info("MARK queue->recv_queue_size = %i", queue->recv_queue_size);
+        pr_info("MARK queue->send_queue_size = %i", queue->send_queue_size);

        /* XXX: Should we enforce some kind of max for IO queues? */
        return 0;
}
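
(Aside, for illustration only: a less arbitrary variant of the same clamp could bound the sizes by
what the RDMA device advertises instead of a hard-coded 256. The helper below is a rough, untested
sketch with an invented name; the parse helper as shown does not take the ib_device, so something
like this would have to run where the device is known, e.g. in nvmet_rdma_alloc_queue(). Comparing
against max_qp_wr alone is still optimistic, because one command can consume several work requests.)

--
/*
 * Hypothetical sketch, not in the 5.11 tree: clamp the queue sizes to the
 * device's advertised per-QP work request limit instead of a fixed 256.
 */
static void nvmet_rdma_clamp_queue_size(struct nvmet_rdma_queue *queue,
                                        struct ib_device *ibdev)
{
        int max_wr = ibdev->attrs.max_qp_wr;

        if (queue->recv_queue_size > max_wr) {
                pr_warn("clamping recv_queue_size %d to device limit %d\n",
                        queue->recv_queue_size, max_wr);
                queue->recv_queue_size = max_wr;
        }
        if (queue->send_queue_size > max_wr) {
                pr_warn("clamping send_queue_size %d to device limit %d\n",
                        queue->send_queue_size, max_wr);
                queue->send_queue_size = max_wr;
        }
}
--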



_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma.
  2021-08-31 13:42 SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma Mark Ruijter
@ 2021-09-01 12:52 ` Sagi Grimberg
  2021-09-01 14:51   ` Mark Ruijter
  2021-09-02 21:36 ` Max Gurtovoy
  1 sibling, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2021-09-01 12:52 UTC (permalink / raw)
  To: Mark Ruijter, linux-nvme


> When I connect an SPDK initiator it will try to connect using 1024 connections.
> The linux target is unable to handle this situation and return an error.
> 
> Aug 28 14:22:56 crashme kernel: [169366.627010] infiniband mlx5_0: create_qp:2789:(pid 33755): Create QP type 2 failed
> Aug 28 14:22:56 crashme kernel: [169366.627913] nvmet_rdma: failed to create_qp ret= -12
> Aug 28 14:22:56 crashme kernel: [169366.628498] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).

Seems that the target is trying to open a queue-pair that is larger than
the device supports. Which device are you using?

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma.
  2021-09-01 12:52 ` Sagi Grimberg
@ 2021-09-01 14:51   ` Mark Ruijter
  2021-09-01 14:58     ` Sagi Grimberg
  0 siblings, 1 reply; 8+ messages in thread
From: Mark Ruijter @ 2021-09-01 14:51 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme

Hi Sagi,

I am using VMware 7.x as initiator with RDMA.
The target system is running Ubuntu 20.04.3 LTS with kernel 5.11.22+.

The device that is exported is an LVM volume; however, I also tested with a file-backed loop device.
Connecting with SPDK seems to be the problem, and as reported on the SPDK mailing list, it can be used to reproduce the issue when VMware is not available:
./perf -q 64 -P 1 -s 4096 -w read -t 300 -c 0x1 -o 4096 -r 'trtype:RDMA adrfam:IPv4 traddr:169.254.85.8 trsvcid:4420'

This seems to produce a similar result:
nvme connect --transport=rdma --queue-size=1024 --nqn=testnqn_1 --traddr=169.254.85.8 --trsvcid=4420

I hope this helps,

--Mark

On 01/09/2021, 14:52, "Sagi Grimberg" <sagi@grimberg.me> wrote:


    > When I connect an SPDK initiator it will try to connect using 1024 connections.
    > The linux target is unable to handle this situation and return an error.
    > 
    > Aug 28 14:22:56 crashme kernel: [169366.627010] infiniband mlx5_0: create_qp:2789:(pid 33755): Create QP type 2 failed
    > Aug 28 14:22:56 crashme kernel: [169366.627913] nvmet_rdma: failed to create_qp ret= -12
    > Aug 28 14:22:56 crashme kernel: [169366.628498] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).

    Seems that the target is trying to open a queue-pair that is larger than
    supported, which device are you using?

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma.
  2021-09-01 14:51   ` Mark Ruijter
@ 2021-09-01 14:58     ` Sagi Grimberg
  2021-09-01 15:08       ` Mark Ruijter
  0 siblings, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2021-09-01 14:58 UTC (permalink / raw)
  To: Mark Ruijter, linux-nvme


> Hi Sagi,
> 
> I am using VMware 7.x as initiator with RDMA.
> The target system is running Ubuntu 20.04.3 LTS with kernel 5.11.22+.
> 
> The device that is exported is an LVM volume, however I also tested with a file backed loop device.
> Connecting with SPDK seems to be the problem and as reported on the SPDK mailing-list it can be used to reproduce the issue when VMWare is not available.
> ./perf -q 64 -P 1 -s 4096 -w read -t 300 -c 0x1 -o 4096 -r 'trtype:RDMA adrfam:IPv4 traddr:169.254.85.8 trsvcid:4420'
> 
> This seems to produce a similar result:
> nvme connect --transport=rdma --queue-size=1024 --nqn=testnqn_1 --traddr=169.254.85.8 --trsvcid=4420
> 
> I hope this helps,

I meant: which RDMA device are you using? That device is failing the
QP creation.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma.
  2021-09-01 14:58     ` Sagi Grimberg
@ 2021-09-01 15:08       ` Mark Ruijter
  0 siblings, 0 replies; 8+ messages in thread
From: Mark Ruijter @ 2021-09-01 15:08 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme


The device is a Mellanox ConnectX-6 controller.
Vmware can connect to an SPDK target started on the exact same Ubuntu target system.

4: enp129s0f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4096 qdisc mq state UP group default qlen 25000
    link/ether b8:ce:f6:92:b7:b6 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.34/24 brd 192.168.100.255 scope global enp129s0f0np0
       valid_lft forever preferred_lft forever

81:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
81:00.1 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]

--
# cd /sys/kernel/config/nvmet/ports/
# ls
11290  11291  21290  21291
# cd 11290
# cat *
ipv4
192.168.100.34
not specified
4420
rdma
--

On 01/09/2021, 16:58, "Sagi Grimberg" <sagi@grimberg.me> wrote:


    > Hi Sagi,
    > 
    > I am using VMware 7.x as initiator with RDMA.
    > The target system is running Ubuntu 20.04.3 LTS with kernel 5.11.22+.
    > 
    > The device that is exported is an LVM volume, however I also tested with a file backed loop device.
    > Connecting with SPDK seems to be the problem and as reported on the SPDK mailing-list it can be used to reproduce the issue when VMWare is not available.
    > ./perf -q 64 -P 1 -s 4096 -w read -t 300 -c 0x1 -o 4096 -r 'trtype:RDMA adrfam:IPv4 traddr:169.254.85.8 trsvcid:4420'
    > 
    > This seems to produce a similar result:
    > nvme connect --transport=rdma --queue-size=1024 --nqn=testnqn_1 --traddr=169.254.85.8 --trsvcid=4420
    > 
    > I hope this helps,

    I meant which rdma device are you using? that device is failing the
    qp creation.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma.
  2021-08-31 13:42 SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma Mark Ruijter
  2021-09-01 12:52 ` Sagi Grimberg
@ 2021-09-02 21:36 ` Max Gurtovoy
  2021-09-06  9:12   ` Mark Ruijter
  1 sibling, 1 reply; 8+ messages in thread
From: Max Gurtovoy @ 2021-09-02 21:36 UTC (permalink / raw)
  To: Mark Ruijter, linux-nvme


On 8/31/2021 4:42 PM, Mark Ruijter wrote:
> When I connect an SPDK initiator it will try to connect using 1024 connections.
> The linux target is unable to handle this situation and return an error.
>
> Aug 28 14:22:56 crashme kernel: [169366.627010] infiniband mlx5_0: create_qp:2789:(pid 33755): Create QP type 2 failed
> Aug 28 14:22:56 crashme kernel: [169366.627913] nvmet_rdma: failed to create_qp ret= -12
> Aug 28 14:22:56 crashme kernel: [169366.628498] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).
>
> It is really easy to reproduce the problem, even when not using the SPDK initiator.
>
> Just type:
> nvme connect --transport=rdma --queue-size=1024 --nqn=SOME.NQN --traddr=SOME.IP --trsvcid=XXXX
> While a linux initiator attempts to setup 64 connections, SPDK attempts to create 1024 connections.

1024 connections, or is it the queue depth?

How many cores do you have in the initiator?

Can you give more details on the systems?

>
> The result is that anything which relies on SPDK, like VMware 7.x for example, won't be able to connect.
> Forcing the queues to be restricted to 256 QD solves some of it. In this case SPDK and VMware seem to connect.
> See the code section below. Sadly, VMware declares the path to be dead afterwards. I guess this 'fix' needs more work. ;-(
>
> In noticed that someone reported this problem on the SPDK list:
> https://github.com/spdk/spdk/issues/1719
>
> Thanks,
>
> Mark
>
> ---
> static int
> nvmet_rdma_parse_cm_connect_req(struct rdma_conn_param *conn,
>                                  struct nvmet_rdma_queue *queue)
> {
>          struct nvme_rdma_cm_req *req;
>
>          req = (struct nvme_rdma_cm_req *)conn->private_data;
>          if (!req || conn->private_data_len == 0)
>                  return NVME_RDMA_CM_INVALID_LEN;
>
>          if (le16_to_cpu(req->recfmt) != NVME_RDMA_CM_FMT_1_0)
>                  return NVME_RDMA_CM_INVALID_RECFMT;
>
>          queue->host_qid = le16_to_cpu(req->qid);
>
>          /*
>           * req->hsqsize corresponds to our recv queue size plus 1
>           * req->hrqsize corresponds to our send queue size
>           */
>          queue->recv_queue_size = le16_to_cpu(req->hsqsize) + 1;
>          queue->send_queue_size = le16_to_cpu(req->hrqsize);
>          if (!queue->host_qid && queue->recv_queue_size > NVME_AQ_DEPTH) {
>                  pr_info("MARK nvmet_rdma_parse_cm_connect_req return %i", NVME_RDMA_CM_INVALID_HSQSIZE);
>                  return NVME_RDMA_CM_INVALID_HSQSIZE;
>          }
>
> +        if (queue->recv_queue_size > 256)
> +               queue->recv_queue_size = 256;
> +        if (queue->send_queue_size > 256)
> +               queue->send_queue_size = 256;
> +       pr_info("MARK queue->recv_queue_size = %i", queue->recv_queue_size);
> +       pr_info("MARK queue->send_queue_size = %i", queue->send_queue_size);
>
>          /* XXX: Should we enforce some kind of max for IO queues? */
>          return 0;
> }
>
>
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma.
  2021-09-02 21:36 ` Max Gurtovoy
@ 2021-09-06  9:12   ` Mark Ruijter
  2021-09-07 14:25     ` Max Gurtovoy
  0 siblings, 1 reply; 8+ messages in thread
From: Mark Ruijter @ 2021-09-06  9:12 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme

Hi Max,

The system I use has dual AMD EPYC 7452 32-Core Processors.
MemTotal:       197784196 kB

It has a single dual port ConnectX-6 card.
81:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
81:00.1 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]

The problem is not related to hardware. VMware works flawlessly using the SPDK target with this system.

The kernel target fails like this:
target/rdma.c: nvmet_rdma_cm_accept -> infiniband/cma.c: rdma_accept -> infiniband/verbs.c: ib_create_named_qp -> infiniband/hw/mlx5/qp.c: create_kernel_qp -> returns -12
(mlx5_0: create_qp:2774:(pid 1246): MARK Create QP type 2 failed)

The queue size is 1024. The mlx5 driver then enters calc_sq_size(), where it fails at this check and returns -ENOMEM:
--
if (qp->sq.wqe_cnt > (1 << MLX5_CAP_GEN(dev->mdev, log_max_qp_sz))) {
        mlx5_ib_dbg(dev, "send queue size (%d * %d / %d -> %d) exceeds limits(%d)\n",
                    attr->cap.max_send_wr, wqe_size, MLX5_SEND_WQE_BB,
                    qp->sq.wqe_cnt,
                    1 << MLX5_CAP_GEN(dev->mdev, log_max_qp_sz));
        return -ENOMEM;
}
--
Sep  5 12:53:45 everest kernel: [  567.691658] MARK enter ib_create_named_qp
Sep  5 12:53:45 everest kernel: [  567.691667] MARK wq_size = 2097152
Sep  5 12:53:46 everest kernel: [  567.692419] MARK create_kernel_qp 0
Sep  5 12:53:46 everest kernel: [  568.204213] MARK enter ib_create_named_qp
Sep  5 12:53:46 everest kernel: [  568.204218] MARK wq_size = 4194304
Sep  5 12:53:46 everest kernel: [  568.204219] MARK 1 send queue size (4097 * 640 / 64 -> 65536) exceeds limits(32768)
Sep  5 12:53:46 everest kernel: [  568.204220] MARK 1 calc_sq_size return ENOMEM
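
For reference, the numbers in that last log line work out as follows (assuming, as the wq_size
values above suggest, that calc_sq_size() first rounds the work-queue size up to a power of two):

    4097 WRs * 640 bytes per WR         = 2622080 bytes
    roundup_pow_of_two(2622080)         = 4194304 bytes   ("MARK wq_size")
    4194304 / MLX5_SEND_WQE_BB (64)     = 65536 WQE basic blocks
    1 << log_max_qp_sz (15)             = 32768 basic blocks (device limit)

    65536 > 32768, so calc_sq_size() returns -ENOMEM.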

A hack/fix I tested that seems to work, or at least prevents immediate failure, is this:

--- /root/linux-5.11/drivers/nvme/target/rdma.c	
+++ rdma.c	2021-09-06 03:05:08.998364562 -0400
@@ -1397,6 +1397,10 @@
 	if (!queue->host_qid && queue->recv_queue_size > NVME_AQ_DEPTH)
 		return NVME_RDMA_CM_INVALID_HSQSIZE;
 
+	if ( queue->send_queue_size > 256 ) {
+		queue->send_queue_size = 256;
+		pr_info("MARK : reducing the queue->send_queue_size to 256");
+	}
 	/* XXX: Should we enforce some kind of max for IO queues? */
 
 	return 0;

---

The answer to the question in the code, "Should we enforce some kind of max for IO queues?", seems to be: yes?
Although VMware now discovers and connects to the kernel target, the path is not working and is declared dead.

The volume appears with a nguid since the target does not set the eui64 field. 
However, setting it by using a pass-through device does not solve the issue.

When I don't set pass-through, nvme reports this:
esxcli nvme namespace list
Name                                   Controller Number  Namespace ID  Block Size  Capacity in MB
-------------------------------------  -----------------  ------------  ----------  --------------
eui.344337304e8001510025384100000001                 263             1        4096        12207104
uuid.fa8ab2201ffb4429ba1719ca0d5a3405                322             1         512        14649344

When I use pass-through it reports:
[root@vmw01:~] esxcli nvme namespace list
Name                                  Controller Number  Namespace ID  Block Size  Capacity in MB
------------------------------------  -----------------  ------------  ----------  --------------
eui.344337304e8001510025384100000001                263             1        4096        12207104
eui.344337304e7000780025384100000001                324             1         512        14649344

The reason is easy to explain. Without pass-through the kernel target shows this when I query a device with sg_inq:
sg_inq -e -p 0x83 /dev/nvmeXn1 -vvv
VPD INQUIRY: Device Identification page
  Designation descriptor number 1, descriptor length: 52
    designator_type: T10 vendor identification,  code_set: ASCII
    associated with the Target device that contains addressed lu
      vendor id: NVMe    
      vendor specific: testvg/testlv_79d87ff74dac1b27

With pass-through the kernel target provides this information for the same device:
VPD INQUIRY: Device Identification page
  Designation descriptor number 1, descriptor length: 56
    designator_type: T10 vendor identification,  code_set: ASCII
    associated with the Target device that contains addressed lu
      vendor id: NVMe    
      vendor specific: SAMSUNG MZWLL12THMLA-00005_S4C7NA0N700078
  Designation descriptor number 2, descriptor length: 20
    designator_type: EUI-64 based,  code_set: Binary
    associated with the Addressed logical unit
      EUI-64 based 16 byte identifier
      Identifier extension: 0x344337304e700078
      IEEE Company_id: 0x2538
      Vendor Specific Extension Identifier: 0x410000000103
      [0x344337304e7000780025384100000001]
  Designation descriptor number 3, descriptor length: 40
    designator_type: SCSI name string,  code_set: UTF-8
    associated with the Addressed logical unit
      SCSI name string:
      eui.344337304E7000780025384100000001

SPDK returns this for the same device:

VPD INQUIRY: Device Identification page
  Designation descriptor number 1, descriptor length: 48
    designator_type: T10 vendor identification,  code_set: ASCII
    associated with the Target device that contains addressed lu
      vendor id: NVMe    
      vendor specific: SPDK_Controller1_SPDK00000000000001
  Designation descriptor number 2, descriptor length: 20
    designator_type: EUI-64 based,  code_set: Binary
    associated with the Addressed logical unit
      EUI-64 based 16 byte identifier
      Identifier extension: 0xe0e9311590254d4f
      IEEE Company_id: 0x8fa737
      Vendor Specific Extension Identifier: 0xb56897382503
      [0xe0e9311590254d4f8fa737b568973825]
  Designation descriptor number 3, descriptor length: 40
    designator_type: SCSI name string,  code_set: UTF-8
    associated with the Addressed logical unit
      SCSI name string:
      eui.E0E9311590254D4F8FA737B568973825

So the kernel target returns limited information when not using pass-through, which forces VMware to use the nguid.
We could use the nguid to fill the eui64 attribute and always report the extended info, like we do with a pass-through device? (Note that eui64 is only 8 bytes, so the patch below copies just the first half of the 16-byte nguid.)

-------------------
--- /root/linux-5.11/drivers/nvme/target/admin-cmd.c	2021-02-14 17:32:24.000000000 -0500
+++ admin-cmd.c	2021-09-05 06:18:10.836865874 -0400
@@ -526,6 +526,7 @@
 	id->anagrpid = cpu_to_le32(ns->anagrpid);
 
 	memcpy(&id->nguid, &ns->nguid, sizeof(id->nguid));
+	memcpy(&id->eui64, &ns->nguid, sizeof(id->eui64));
 
 	id->lbaf[0].ds = ns->blksize_shift;

--- /root/linux-5.11/drivers/nvme/target/configfs.c	2021-02-14 17:32:24.000000000 -0500
+++ configfs.c	2021-09-05 05:35:35.741619651 -0400
@@ -477,6 +477,7 @@
 	}
 
 	memcpy(&ns->nguid, nguid, sizeof(nguid));
+	memcpy(&ns->eui64, nguid, sizeof(ns->eui64));
 out_unlock:
 	mutex_unlock(&subsys->lock);
 	return ret ? ret : count;
--------------

Even with pass-through enabled and the kernel target returning all information, the path is immediately reported to be dead.
esxcli storage core path list
rdma.vmnic2:98:03:9b:03:45:10-rdma.unknown-
   UID: rdma.vmnic2:98:03:9b:03:45:10-rdma.unknown-
   Runtime Name: vmhba64:C0:T1:L0
   Device: No associated device
   Device Display Name: No associated device
   Adapter: vmhba64
   Channel: 0
   Target: 1
   LUN: 0
   Plugin: (unclaimed)
   State: dead
   Transport: rdma
   Adapter Identifier: rdma.vmnic2:98:03:9b:03:45:10
   Target Identifier: rdma.unknown
   Adapter Transport Details: Unavailable or path is unclaimed
   Target Transport Details: Unavailable or path is unclaimed
   Maximum IO Size: 131072

This may or may not be a VMware path-checker issue.
Since SPDK does not show this problem, some difference between the kernel target and the SPDK target must exist.
I don't know if the patch I use that limits the queue depth to 256 is to blame.
The path for the exact same device exported with SPDK shows up like this:

rdma.vmnic2:98:03:9b:03:45:10-rdma.unknown-eui.a012ce7696bf47d5be87760d8f78fb8e
   UID: rdma.vmnic2:98:03:9b:03:45:10-rdma.unknown-eui.a012ce7696bf47d5be87760d8f78fb8e
   Runtime Name: vmhba64:C0:T0:L0
   Device: eui.a012ce7696bf47d5be87760d8f78fb8e
   Device Display Name: NVMe RDMA Disk (eui.a012ce7696bf47d5be87760d8f78fb8e)
   Adapter: vmhba64
   Channel: 0
   Target: 0
   LUN: 0
   Plugin: HPP
   State: active
   Transport: rdma
   Adapter Identifier: rdma.vmnic2:98:03:9b:03:45:10
   Target Identifier: rdma.unknown
   Adapter Transport Details: Unavailable or path is unclaimed
   Target Transport Details: Unavailable or path is unclaimed
   Maximum IO Size: 131072

It looks like the connect patch does work, but something else causes VMware not to accept the nvmet-rdma target devices.
Not sure what to make of that. It could still be eui64-related? Note the UID from the nvmet-rdma target above: it lacks the eui suffix that the SPDK path has.

Thanks,

--Mark

On 02/09/2021, 23:36, "Max Gurtovoy" <mgurtovoy@nvidia.com> wrote:


    On 8/31/2021 4:42 PM, Mark Ruijter wrote:
    > When I connect an SPDK initiator it will try to connect using 1024 connections.
    > The linux target is unable to handle this situation and return an error.
    >
    > Aug 28 14:22:56 crashme kernel: [169366.627010] infiniband mlx5_0: create_qp:2789:(pid 33755): Create QP type 2 failed
    > Aug 28 14:22:56 crashme kernel: [169366.627913] nvmet_rdma: failed to create_qp ret= -12
    > Aug 28 14:22:56 crashme kernel: [169366.628498] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).
    >
    > It is really easy to reproduce the problem, even when not using the SPDK initiator.
    >
    > Just type:
    > nvme connect --transport=rdma --queue-size=1024 --nqn=SOME.NQN --traddr=SOME.IP --trsvcid=XXXX
    > While a linux initiator attempts to setup 64 connections, SPDK attempts to create 1024 connections.

    1024 connections or is it the queue depth ?

    how many cores you have in initiator ?

    can you give more details on the systems ?

    >
    > The result is that anything which relies on SPDK, like VMware 7.x for example, won't be able to connect.
    > Forcing the queues to be restricted to 256 QD solves some of it. In this case SPDK and VMware seem to connect.
    > See the code section below. Sadly, VMware declares the path to be dead afterwards. I guess this 'fix' needs more work. ;-(
    >
    > In noticed that someone reported this problem on the SPDK list:
    > https://github.com/spdk/spdk/issues/1719
    >
    > Thanks,
    >
    > Mark
    >
    > ---
    > static int
    > nvmet_rdma_parse_cm_connect_req(struct rdma_conn_param *conn,
    >                                  struct nvmet_rdma_queue *queue)
    > {
    >          struct nvme_rdma_cm_req *req;
    >
    >          req = (struct nvme_rdma_cm_req *)conn->private_data;
    >          if (!req || conn->private_data_len == 0)
    >                  return NVME_RDMA_CM_INVALID_LEN;
    >
    >          if (le16_to_cpu(req->recfmt) != NVME_RDMA_CM_FMT_1_0)
    >                  return NVME_RDMA_CM_INVALID_RECFMT;
    >
    >          queue->host_qid = le16_to_cpu(req->qid);
    >
    >          /*
    >           * req->hsqsize corresponds to our recv queue size plus 1
    >           * req->hrqsize corresponds to our send queue size
    >           */
    >          queue->recv_queue_size = le16_to_cpu(req->hsqsize) + 1;
    >          queue->send_queue_size = le16_to_cpu(req->hrqsize);
    >          if (!queue->host_qid && queue->recv_queue_size > NVME_AQ_DEPTH) {
    >                  pr_info("MARK nvmet_rdma_parse_cm_connect_req return %i", NVME_RDMA_CM_INVALID_HSQSIZE);
    >                  return NVME_RDMA_CM_INVALID_HSQSIZE;
    >          }
    >
    > +        if (queue->recv_queue_size > 256)
    > +               queue->recv_queue_size = 256;
    > +        if (queue->send_queue_size > 256)
    > +               queue->send_queue_size = 256;
    > +       pr_info("MARK queue->recv_queue_size = %i", queue->recv_queue_size);
    > +       pr_info("MARK queue->send_queue_size = %i", queue->send_queue_size);
    >
    >          /* XXX: Should we enforce some kind of max for IO queues? */
    >          return 0;
    > }
    >
    >
    >
    > _______________________________________________
    > Linux-nvme mailing list
    > Linux-nvme@lists.infradead.org
    > http://lists.infradead.org/mailman/listinfo/linux-nvme

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma.
  2021-09-06  9:12   ` Mark Ruijter
@ 2021-09-07 14:25     ` Max Gurtovoy
  0 siblings, 0 replies; 8+ messages in thread
From: Max Gurtovoy @ 2021-09-07 14:25 UTC (permalink / raw)
  To: Mark Ruijter, linux-nvme


On 9/6/2021 12:12 PM, Mark Ruijter wrote:
> Hi Max,
>
> The system I use has dual AMD EPYC 7452 32-Core Processors.
> MemTotal:       197784196 kB
>
> It has a single dual port ConnectX-6 card.
> 81:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
> 81:00.1 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
>
> The problem is not related to hardware. Vmware works flawlessly using the SPDK target with this system.
>
> The kernel target fails like this:
> target/rdma.c                          -> infiniband/cma.c -> infiniband/verbs.c       -> infiniband/hw/mlx5/qp.c
> nvmet_rdma_cm_accept      -> rdma_accept        -> ib_create_named_qp -> create_kernel_qp ->
> returns -12  -> mlx5_0: create_qp:2774:(pid 1246): MARK Create QP type 2 failed)
>
> The queue-size is 1024. The mlx5 driver now entered the function calc_sq_size where it fails here and returns ENOMEM.

Ok, I see the issue here.

I can repro it with a Linux initiator if I set -Q 1024 in the connect command.

We need to fix a few things in the max_qp_wr calculation and add a
.get_queue_size op to nvmet_fabrics_ops to solve it completely.

For now you can use a queue size of 256 in the SPDK initiator to work around this.

I'll send a fix.
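
(For illustration, a rough sketch of the .get_queue_size direction described above, with invented
names; the eventual upstream fix may look quite different. The idea is that each transport reports
the deepest I/O queue it can actually back, and the fabrics core clamps or rejects the host's
requested SQ size against that value.)

--
/* hypothetical transport op -- not the actual patch */
static u16 nvmet_rdma_get_queue_size(const struct nvmet_ctrl *ctrl)
{
        /*
         * A real implementation would derive this from the ib_device
         * limits (attrs.max_qp_wr, the send-queue size limit in WQE
         * basic blocks) and the per-command overhead of the rdma_rw API.
         */
        return 256;
}

static const struct nvmet_fabrics_ops nvmet_rdma_ops = {
        /* ...existing ops... */
        .get_queue_size         = nvmet_rdma_get_queue_size,
};
--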

> --
>   if (qp->sq.wqe_cnt > (1 << MLX5_CAP_GEN(dev->mdev, log_max_qp_sz))) {
>                  mlx5_ib_dbg(dev, "send queue size (%d * %d / %d -> %d) exceeds limits(%d)\n",
>                              attr->cap.max_send_wr, wqe_size, MLX5_SEND_WQE_BB,
>                              qp->sq.wqe_cnt,
>                              1 << MLX5_CAP_GEN(dev->mdev, log_max_qp_sz));
>                  return -ENOMEM;
> }
> --
> Sep  5 12:53:45 everest kernel: [  567.691658] MARK enter ib_create_named_qp
> Sep  5 12:53:45 everest kernel: [  567.691667] MARK wq_size = 2097152
> Sep  5 12:53:46 everest kernel: [  567.692419] MARK create_kernel_qp 0
> Sep  5 12:53:46 everest kernel: [  568.204213] MARK enter ib_create_named_qp
> Sep  5 12:53:46 everest kernel: [  568.204218] MARK wq_size = 4194304
> Sep  5 12:53:46 everest kernel: [  568.204219] MARK 1 send queue size (4097 * 640 / 64 -> 65536) exceeds limits(32768)
> Sep  5 12:53:46 everest kernel: [  568.204220] MARK 1 calc_sq_size return ENOMEM
>
> A hack / fix I tested and that seems to work, or at least prevents immediate failure, is this:
>
> --- /root/linux-5.11/drivers/nvme/target/rdma.c	
> +++ rdma.c	2021-09-06 03:05:08.998364562 -0400
> @@ -1397,6 +1397,10 @@
>   	if (!queue->host_qid && queue->recv_queue_size > NVME_AQ_DEPTH)
>   		return NVME_RDMA_CM_INVALID_HSQSIZE;
>   
> +	if ( queue->send_queue_size > 256 ) {
> +		queue->send_queue_size = 256;
> +		pr_info("MARK : reducing the queue->send_queue_size to 256");
> +	}
>   	/* XXX: Should we enforce some kind of max for IO queues? */
>   
>   	return 0;
>
> ---
>
> The answer to the question in the code: "Should we enforce some kind of max for IO queues?" seems to be: yes?
> Although VMware now discovers and connects to the kernel target the path not working and declared dead.
>
> The volume appears with a nguid since the target does not set the eui64 field.
> However, setting it by using a pass-through device does not solve the issue.
>
> When I don't set pass-through nvme reports this:
> esxcli nvme namespace list
> Name                                   Controller Number  Namespace ID  Block Size  Capacity in MB
> -------------------------------------  -----------------  ------------  ----------  --------------
> eui.344337304e8001510025384100000001                 263             1        4096        12207104
> uuid.fa8ab2201ffb4429ba1719ca0d5a3405                322             1         512        14649344
>
> When I use pass-through it reports:
> [root@vmw01:~] esxcli nvme namespace list
> Name                                  Controller Number  Namespace ID  Block Size  Capacity in MB
> ------------------------------------  -----------------  ------------  ----------  --------------
> eui.344337304e8001510025384100000001                263             1        4096        12207104
> eui.344337304e7000780025384100000001                324             1         512        14649344
>
> The reason is easy to explain. Without pass-through the kernel target shows this when I query a device with sg_inq:
> sg_inq -e -p 0x83 /dev/nvmeXn1 -vvv
> VPD INQUIRY: Device Identification page
>    Designation descriptor number 1, descriptor length: 52
>      designator_type: T10 vendor identification,  code_set: ASCII
>      associated with the Target device that contains addressed lu
>        vendor id: NVMe
>        vendor specific: testvg/testlv_79d87ff74dac1b27
>
> With pass-through the kernel target provides this information for the same device:
> VPD INQUIRY: Device Identification page
>    Designation descriptor number 1, descriptor length: 56
>      designator_type: T10 vendor identification,  code_set: ASCII
>      associated with the Target device that contains addressed lu
>        vendor id: NVMe
>        vendor specific: SAMSUNG MZWLL12THMLA-00005_S4C7NA0N700078
>    Designation descriptor number 2, descriptor length: 20
>      designator_type: EUI-64 based,  code_set: Binary
>      associated with the Addressed logical unit
>        EUI-64 based 16 byte identifier
>        Identifier extension: 0x344337304e700078
>        IEEE Company_id: 0x2538
>        Vendor Specific Extension Identifier: 0x410000000103
>        [0x344337304e7000780025384100000001]
>    Designation descriptor number 3, descriptor length: 40
>      designator_type: SCSI name string,  code_set: UTF-8
>      associated with the Addressed logical unit
>        SCSI name string:
>        eui.344337304E7000780025384100000001
>
> SPDK returns this for the same device:
>
> VPD INQUIRY: Device Identification page
>    Designation descriptor number 1, descriptor length: 48
>      designator_type: T10 vendor identification,  code_set: ASCII
>      associated with the Target device that contains addressed lu
>        vendor id: NVMe
>        vendor specific: SPDK_Controller1_SPDK00000000000001
>    Designation descriptor number 2, descriptor length: 20
>      designator_type: EUI-64 based,  code_set: Binary
>      associated with the Addressed logical unit
>        EUI-64 based 16 byte identifier
>        Identifier extension: 0xe0e9311590254d4f
>        IEEE Company_id: 0x8fa737
>        Vendor Specific Extension Identifier: 0xb56897382503
>        [0xe0e9311590254d4f8fa737b568973825]
>    Designation descriptor number 3, descriptor length: 40
>      designator_type: SCSI name string,  code_set: UTF-8
>      associated with the Addressed logical unit
>        SCSI name string:
>        eui.E0E9311590254D4F8FA737B568973825
>
> So, the kernel target returns limited information when not using pass-through which forces VMware to use the nguid.
> We could use the nguid to fill the eui64 attribute and always report the extended info like we do with a pass-through device?
>
> -------------------
> --- /root/linux-5.11/drivers/nvme/target/admin-cmd.c	2021-02-14 17:32:24.000000000 -0500
> +++ admin-cmd.c	2021-09-05 06:18:10.836865874 -0400
> @@ -526,6 +526,7 @@
>   	id->anagrpid = cpu_to_le32(ns->anagrpid);
>   
>   	memcpy(&id->nguid, &ns->nguid, sizeof(id->nguid));
> +	memcpy(&id->eui64, &ns->nguid, sizeof(id->eui64));
>   
>   	id->lbaf[0].ds = ns->blksize_shift;
>
> --- /root/linux-5.11/drivers/nvme/target/configfs.c	2021-02-14 17:32:24.000000000 -0500
> +++ configfs.c	2021-09-05 05:35:35.741619651 -0400
> @@ -477,6 +477,7 @@
>   	}
>   
>   	memcpy(&ns->nguid, nguid, sizeof(nguid));
> +	memcpy(&ns->eui64, nguid, sizeof(ns->eui64));
>   out_unlock:
>   	mutex_unlock(&subsys->lock);
>   	return ret ? ret : count;
> --------------
>
> Even with pass-through enabled and the kernel target returning all information the path is immediately reported to be dead.
> esxcli storage core path list
> rdma.vmnic2:98:03:9b:03:45:10-rdma.unknown-
>     UID: rdma.vmnic2:98:03:9b:03:45:10-rdma.unknown-
>     Runtime Name: vmhba64:C0:T1:L0
>     Device: No associated device
>     Device Display Name: No associated device
>     Adapter: vmhba64
>     Channel: 0
>     Target: 1
>     LUN: 0
>     Plugin: (unclaimed)
>     State: dead
>     Transport: rdma
>     Adapter Identifier: rdma.vmnic2:98:03:9b:03:45:10
>     Target Identifier: rdma.unknown
>     Adapter Transport Details: Unavailable or path is unclaimed
>     Target Transport Details: Unavailable or path is unclaimed
>     Maximum IO Size: 131072
>
> This may or may not be a Vmware path-checker issue.
> Since SPDK does not show this problem some difference between the kernel target and SPDK target must exist.
> I don't know if the patch I use that limits the queue-depth to 256 is to blame.
> The path for the exact same device exported with SPDK shows up like this:
>
> rdma.vmnic2:98:03:9b:03:45:10-rdma.unknown-eui.a012ce7696bf47d5be87760d8f78fb8e
>     UID: rdma.vmnic2:98:03:9b:03:45:10-rdma.unknown-eui.a012ce7696bf47d5be87760d8f78fb8e
>     Runtime Name: vmhba64:C0:T0:L0
>     Device: eui.a012ce7696bf47d5be87760d8f78fb8e
>     Device Display Name: NVMe RDMA Disk (eui.a012ce7696bf47d5be87760d8f78fb8e)
>     Adapter: vmhba64
>     Channel: 0
>     Target: 0
>     LUN: 0
>     Plugin: HPP
>     State: active
>     Transport: rdma
>     Adapter Identifier: rdma.vmnic2:98:03:9b:03:45:10
>     Target Identifier: rdma.unknown
>     Adapter Transport Details: Unavailable or path is unclaimed
>     Target Transport Details: Unavailable or path is unclaimed
>     Maximum IO Size: 131072
>
> It looks like the connect patch does work but something else causes VMware not to accept the nvmet-rdma target devices.
> Not sure what to make of that. It could still be eui related? See the UID from the nvmet-rdma target.
>
> Thanks,
>
> --Mark
>
> On 02/09/2021, 23:36, "Max Gurtovoy" <mgurtovoy@nvidia.com> wrote:
>
>
>      On 8/31/2021 4:42 PM, Mark Ruijter wrote:
>      > When I connect an SPDK initiator it will try to connect using 1024 connections.
>      > The linux target is unable to handle this situation and return an error.
>      >
>      > Aug 28 14:22:56 crashme kernel: [169366.627010] infiniband mlx5_0: create_qp:2789:(pid 33755): Create QP type 2 failed
>      > Aug 28 14:22:56 crashme kernel: [169366.627913] nvmet_rdma: failed to create_qp ret= -12
>      > Aug 28 14:22:56 crashme kernel: [169366.628498] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).
>      >
>      > It is really easy to reproduce the problem, even when not using the SPDK initiator.
>      >
>      > Just type:
>      > nvme connect --transport=rdma --queue-size=1024 --nqn=SOME.NQN --traddr=SOME.IP --trsvcid=XXXX
>      > While a linux initiator attempts to setup 64 connections, SPDK attempts to create 1024 connections.
>
>      1024 connections or is it the queue depth ?
>
>      how many cores you have in initiator ?
>
>      can you give more details on the systems ?
>
>      >
>      > The result is that anything which relies on SPDK, like VMware 7.x for example, won't be able to connect.
>      > Forcing the queues to be restricted to 256 QD solves some of it. In this case SPDK and VMware seem to connect.
>      > See the code section below. Sadly, VMware declares the path to be dead afterwards. I guess this 'fix' needs more work. ;-(
>      >
>      > In noticed that someone reported this problem on the SPDK list:
>      > https://github.com/spdk/spdk/issues/1719
>      >
>      > Thanks,
>      >
>      > Mark
>      >
>      > ---
>      > static int
>      > nvmet_rdma_parse_cm_connect_req(struct rdma_conn_param *conn,
>      >                                  struct nvmet_rdma_queue *queue)
>      > {
>      >          struct nvme_rdma_cm_req *req;
>      >
>      >          req = (struct nvme_rdma_cm_req *)conn->private_data;
>      >          if (!req || conn->private_data_len == 0)
>      >                  return NVME_RDMA_CM_INVALID_LEN;
>      >
>      >          if (le16_to_cpu(req->recfmt) != NVME_RDMA_CM_FMT_1_0)
>      >                  return NVME_RDMA_CM_INVALID_RECFMT;
>      >
>      >          queue->host_qid = le16_to_cpu(req->qid);
>      >
>      >          /*
>      >           * req->hsqsize corresponds to our recv queue size plus 1
>      >           * req->hrqsize corresponds to our send queue size
>      >           */
>      >          queue->recv_queue_size = le16_to_cpu(req->hsqsize) + 1;
>      >          queue->send_queue_size = le16_to_cpu(req->hrqsize);
>      >          if (!queue->host_qid && queue->recv_queue_size > NVME_AQ_DEPTH) {
>      >                  pr_info("MARK nvmet_rdma_parse_cm_connect_req return %i", NVME_RDMA_CM_INVALID_HSQSIZE);
>      >                  return NVME_RDMA_CM_INVALID_HSQSIZE;
>      >          }
>      >
>      > +        if (queue->recv_queue_size > 256)
>      > +               queue->recv_queue_size = 256;
>      > +        if (queue->send_queue_size > 256)
>      > +               queue->send_queue_size = 256;
>      > +       pr_info("MARK queue->recv_queue_size = %i", queue->recv_queue_size);
>      > +       pr_info("MARK queue->send_queue_size = %i", queue->send_queue_size);
>      >
>      >          /* XXX: Should we enforce some kind of max for IO queues? */
>      >          return 0;
>      > }
>      >
>      >
>      >
>      > _______________________________________________
>      > Linux-nvme mailing list
>      > Linux-nvme@lists.infradead.org
>      > http://lists.infradead.org/mailman/listinfo/linux-nvme
>

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme



Thread overview: 8 messages
2021-08-31 13:42 SPDK initiators (Vmware 7.x) can not connect to nvmet-rdma Mark Ruijter
2021-09-01 12:52 ` Sagi Grimberg
2021-09-01 14:51   ` Mark Ruijter
2021-09-01 14:58     ` Sagi Grimberg
2021-09-01 15:08       ` Mark Ruijter
2021-09-02 21:36 ` Max Gurtovoy
2021-09-06  9:12   ` Mark Ruijter
2021-09-07 14:25     ` Max Gurtovoy
