* [PATCH v4 00/20] NVMeTCP Offload ULP
@ 2021-06-29 12:47 Prabhakar Kushwaha
  2021-06-29 12:47 ` [PATCH v4 01/20] nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP Prabhakar Kushwaha
                   ` (20 more replies)
  0 siblings, 21 replies; 37+ messages in thread
From: Prabhakar Kushwaha @ 2021-06-29 12:47 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, axboe, kbusch
  Cc: davem, kuba, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	prabhakar.pkin, malin1024

With the goal of enabling a generic infrastructure that allows NVMe/TCP
offload devices like NICs to seamlessly plug into the NVMe-oF stack, this
patch series introduces the nvme-tcp-offload ULP host layer: a new
transport type called "tcp-offload" that serves as an abstraction layer
for device-specific nvme-tcp offload drivers.

NVMeTCP offload is a full offload of the NVMeTCP protocol; it covers
both the TCP level and the NVMeTCP level.

The nvme-tcp-offload transport can co-exist with the existing tcp and
other transports. The tcp offload was designed so that stack changes are
kept to a bare minimum: only a new transport is registered.
All other APIs, ops, etc. are identical to the regular tcp transport.
Representing the TCP offload as a new transport allows clear and
manageable differentiation between the connections that should use the
offload path and those that are not offloaded (even on the same device).

The nvme-tcp-offload layers and API compared to nvme-tcp and nvme-rdma:

* NVMe layer: *

       [ nvme/nvme-fabrics/blk-mq ]
             |
        (nvme API and blk-mq API)
             |
             |			 
* Transport layer: *

      [ nvme-rdma ] [ nvme-tcp ] [ nvme-tcp-offload ]
             |        |             |
           (Verbs) 
             |        |             |
             |     (Socket)
             |        |             |
             |        |        (nvme-tcp-offload API)
             |        |             |
             |        |             |
* Transport Driver: *

             |        |             |
      [ RDMA driver ]       
                      |             |
             [ Network driver ]
                                    |
                       [ NVMeTCP Offload driver ]

Upstream plan:
==============
As discussed in RFC V7, "NVMeTCP Offload ULP and QEDN Device Driver"
contains 3 parts:
https://lore.kernel.org/linux-nvme/20210531225222.16992-1-smalin@marvell.com/

This series contains part 1 and part 3, intended for linux-nvme:
- Part 1: The nvme-tcp-offload patches
- Part 3: Marvell's offload device driver (qedn) patches.
          It has a compilation dependency on both Part 1 and Part 2.

Part 2 is already accepted in net-next.git:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=eda1bc65b0dc1b03006e427430ba23746ec44714


Usage:
======
The user interacts with the network device in order to configure the
IP/VLAN - logically similar to the RDMA model.
The NVMeTCP configuration is populated as part of the
nvme connect command.

Example:
Assign IP to the net-device (from any existing Linux tool):

    ip addr add 100.100.0.101/24 dev p1p1

This IP will be used by both net-device and offload-device.

In order to connect from "sw" nvme-tcp through the net-device:

    nvme connect -t tcp -s 4420 -a 100.100.0.100 -n testnqn

In order to connect from "offload" nvme-tcp through the offload-device:

    nvme connect -t tcp_offload -s 4420 -a 100.100.0.100 -n testnqn
	
An alternative approach, as a future enhancement that will not impact
this series, would be to add a new nvme-cli flag that determines whether
"-t tcp" should use the regular nvme-tcp (the default) or
nvme-tcp-offload.
Example:
    nvme connect -t tcp -s 4420 -a 100.100.0.100 -n testnqn -[new flag]


Queue Initialization Design:
============================
The nvme-tcp-offload ULP module shall register with the existing 
nvmf_transport_ops (.name = "tcp_offload"), nvme_ctrl_ops and blk_mq_ops.
The nvme-tcp-offload driver shall register the following ops with the
nvme-tcp-offload ULP:
- claim_dev() - in order to resolve the route to the target according to
                the paired net_dev.
- create_queue() - in order to create an offloaded nvme-tcp queue.

The nvme-tcp-offload ULP module shall manage all the controller level
functionality: it calls claim_dev() and, based on the return value, calls
the relevant driver's create_queue() in order to create the admin queue
and the IO queues. A minimal registration sketch is shown below.
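
For illustration, a minimal registration sketch using the
nvme_tcp_ofld_dev / nvme_tcp_ofld_ops API introduced in patch 1 could look
as follows. The foo_*() callbacks are hypothetical and assumed to be
implemented elsewhere in the offload driver; note that
nvme_tcp_ofld_register_dev() rejects a partially filled ops table, so all
eight callbacks must be provided:

/* Hypothetical "foo" offload driver - sketch only */
static struct nvme_tcp_ofld_ops foo_ofld_ops = {
	.name		= "foo",
	.module		= THIS_MODULE,
	/* All ops below are mandatory; nvme_tcp_ofld_register_dev()
	 * returns -EINVAL if any of them is missing.
	 */
	.claim_dev	= foo_claim_dev,	/* route lookup via the paired net_dev */
	.setup_ctrl	= foo_setup_ctrl,
	.release_ctrl	= foo_release_ctrl,
	.create_queue	= foo_create_queue,	/* TCP connect + icreq/icresp */
	.drain_queue	= foo_drain_queue,
	.destroy_queue	= foo_destroy_queue,
	.poll_queue	= foo_poll_queue,
	.send_req	= foo_send_req,
};

static struct nvme_tcp_ofld_dev foo_ofld_dev = {
	.ops = &foo_ofld_ops,
};

static int foo_probe(void)
{
	/* Adds the device to the ULP device list; the ULP will later call
	 * ->claim_dev() during nvme connect and ->create_queue() for the
	 * admin queue and each IO queue.
	 */
	return nvme_tcp_ofld_register_dev(&foo_ofld_dev);
}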


IO-path Design:
===============
The nvme-tcp-offload transport works at the IO level: the nvme-tcp-offload
ULP module passes the request (the IO) to the nvme-tcp-offload driver,
and later the nvme-tcp-offload driver returns the request completion
(the IO completion).
No additional handling is needed in between; this design reduces CPU
utilization, as described below.

The nvme-tcp-offload driver shall register the following IO-path ops
with the nvme-tcp-offload ULP:
- send_req() - passes the request to the offload driver, which hands it
               to the offload device.
- poll_queue() - polls a given queue for completions.

Once the IO completes, the nvme-tcp-offload driver shall call the
request's done() callback, which invokes the nvme-tcp-offload ULP layer
to complete the request. A sketch of this IO-path contract follows.
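
The sketch below is purely illustrative (the foo_* names are hypothetical);
only the nvme_tcp_ofld_req fields and the ops introduced in patch 1 are
assumed:

/* ULP side, simplified: hand the prepared request to the offload driver */
static blk_status_t foo_dispatch(struct nvme_tcp_ofld_queue *queue,
				 struct nvme_tcp_ofld_req *req)
{
	/* req->nvme_cmd has already been filled via nvme_setup_cmd() */
	if (queue->dev->ops->send_req(req))
		return BLK_STS_IOERR;

	return BLK_STS_OK;
}

/* Offload driver side: report the completion back to the ULP */
static void foo_complete(struct nvme_tcp_ofld_req *req,
			 union nvme_result *result, __le16 status)
{
	/* req->done points to nvme_tcp_ofld_req_done() in the ULP, which
	 * lets the ULP complete the blk-mq request.
	 */
	req->done(req, result, status);
}

For poll queues, the ULP similarly invokes the driver's poll_queue() op
from its polling path.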


Teardown and errors:
====================
In case of an NVMeTCP queue error, the nvme-tcp-offload driver shall
call nvme_tcp_ofld_report_queue_err().
The nvme-tcp-offload driver shall register the following teardown ops
with the nvme-tcp-offload ULP (a short sketch follows the list):
- drain_queue()
- destroy_queue()
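
A minimal sketch of the intended error-reporting and teardown flow, with
hypothetical foo_* names:

/* Offload driver side: report a fatal queue error to the ULP */
static void foo_handle_hw_error(struct nvme_tcp_ofld_queue *queue)
{
	/* The ULP wires this callback to nvme_tcp_ofld_report_queue_err(),
	 * which triggers controller-level error recovery.
	 */
	queue->report_err(queue);
}

/* ULP side, simplified: stop and then free a single queue */
static void foo_stop_and_free_queue(struct nvme_tcp_ofld_queue *queue)
{
	/* drain_queue() blocks until no further completions can arrive
	 * and the HW no longer accesses host memory ...
	 */
	queue->dev->ops->drain_queue(queue);

	/* ... after which the TCP + NVMeTCP connection is closed. */
	queue->dev->ops->destroy_queue(queue);
}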


The Marvell qedn driver:
========================
The new driver will be added under "drivers/nvme/hw" and will be enabled
by the Kconfig "Marvell NVM Express over Fabrics TCP offload".
As part of the qedn init, the driver will register as a PCI device driver
and will work with the Marvell FastLinQ NIC.
As part of the probe, the driver will register with the nvme_tcp_offload
(ULP) and with the qed module (qed_nvmetcp_ops) - similar to the other
"qed_*_ops" which are used by the qede, qedr, qedf and qedi device
drivers. A rough probe sketch is shown below.
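
The following is only an illustrative sketch of that probe flow; the
qedn_example_probe() and qedn_ofld_ops names are hypothetical, and the
qed_nvmetcp_ops acquisition is indicated only by a comment since that
interface lives in the already-merged net-next part:

static int qedn_example_probe(struct pci_dev *pdev,
			      const struct pci_device_id *id)
{
	struct nvme_tcp_ofld_dev *ofld_dev;
	int rc;

	/* 1. Obtain the qed NVMeTCP interface (qed_nvmetcp_ops) from the
	 *    qed core module - same model as qede/qedr/qedf/qedi.
	 */

	/* 2. Register the device with the nvme-tcp-offload ULP. */
	ofld_dev = kzalloc(sizeof(*ofld_dev), GFP_KERNEL);
	if (!ofld_dev)
		return -ENOMEM;

	ofld_dev->ops = &qedn_ofld_ops;	/* filled as in the earlier sketch */

	rc = nvme_tcp_ofld_register_dev(ofld_dev);
	if (rc)
		kfree(ofld_dev);

	return rc;
}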


Changes since RFC v1:
=====================
- nvme-tcp-offload: Fix nvme_tcp_ofld_ops return values.
- nvme-tcp-offload: Remove NVMF_TRTYPE_TCP_OFFLOAD.
- nvme-tcp-offload: Add nvme_tcp_ofld_poll() implementation.
- nvme-tcp-offload: Fix nvme_tcp_ofld_queue_rq() to check map_sg() and 
  send_req() return values.

Changes since RFC v2:
=====================
- nvme-tcp-offload: Fixes in controller and queue level (patches 3-6).
- qedn: Add Marvell's NVMeTCP HW offload device driver init and probe
  (patches 8-11).
  
Changes since RFC v3:
=====================
- nvme-tcp-offload: Add the full implementation of the nvme-tcp-offload layer 
  including the new ops: setup_ctrl(), release_ctrl(), commit_rqs() and new 
  flows (ASYNC and timeout).
- nvme-tcp-offload: Add device maximums: max_hw_sectors, max_segments.
- nvme-tcp-offload: layer design and optimization changes.
- qedn: Add full implementation for the conn level, IO path and error handling.

Changes since RFC v4:
=====================
(Many thanks to Hannes Reinecke for his feedback)
- nvme_tcp_offload: Add num_hw_vectors in order to limit the number of queues.
- nvme_tcp_offload: Add per device private_data.
- nvme_tcp_offload: Fix header digest, data digest and tos initialization.
- qedn: Remove the qedn_global list.
- qedn: Remove the workqueue flow from send_req.
- qedn: Add db_recovery support.

Changes since RFC v5:
=====================
(Many thanks to Sagi Grimberg for his feedback)
- nvme-fabrics: Expose nvmf_check_required_opts() globally (as a new patch).
- nvme_tcp_offload: Remove io-queues BLK_MQ_F_BLOCKING.
- nvme_tcp_offload: Fix the nvme_tcp_ofld_stop_queue (drain_queue) flow.
- nvme_tcp_offload: Fix the nvme_tcp_ofld_free_queue (destroy_queue) flow.
- nvme_tcp_offload: Change rwsem to mutex.
- nvme_tcp_offload: remove redundant fields.
- nvme_tcp_offload: Remove the "new" from setup_ctrl().
- nvme_tcp_offload: Remove the init_req() and commit_rqs() ops.
- nvme_tcp_offload: Minor fixes in nvme_tcp_ofld_create_ctrl() and
  nvme_tcp_ofld_free_queue().
- nvme_tcp_offload: Patch 8 (timeout and async) was squashed into
  patch 7 (io level).
- qedn: Fix the free_queue flow and the destroy_queue flow.
- qedn: Remove version number.

Changes since RFC v6:
=====================
- No changes in nvme_tcp_offload
- qedn: Remove redundant logic in the io-queues core affinity initialization.
- qedn: Remove qedn_validate_cccid_in_range().

Changes since v1:
=====================
- nvme_tcp_offload: Add support for NVME_OPT_HOST_IFACE.
- nvme_tcp_offload: Kconfig fix (thanks to Petr Mladek).
- nvme_tcp_offload: return code fix (thanks to Dan Carpenter).

Changes since v2:
=====================
- nvme_tcp_offload: Fix overly long lines.
- nvme_tcp_offload: use correct terminology for vendor driver.
- qedn: Added qedn driver as part of series.

Changes since v3:
=====================
- nvme_tcp_offload: Rename nvme_tcp_ofld_map_data() to 
  nvme_tcp_ofld_set_sg_host_data().


Arie Gershberg (2):
  nvme-tcp-offload: Add controller level implementation
  nvme-tcp-offload: Add controller level error recovery implementation

Dean Balandin (3):
  nvme-tcp-offload: Add device scan implementation
  nvme-tcp-offload: Add queue level implementation
  nvme-tcp-offload: Add IO level implementation

Nikolay Assa (1):
  qedn: Add qedn_claim_dev API support

Prabhakar Kushwaha (7):
  nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS
    definitions
  nvme-fabrics: Expose nvmf_check_required_opts() globally
  qedn: Add connection-level slowpath functionality
  qedn: Add support of configuring HW filter block
  qedn: Add support of Task and SGL
  qedn: Add support of NVME ICReq & ICResp
  qedn: Add support of ASYNC

Shai Malin (7):
  nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP
  qedn: Add qedn - Marvell's NVMeTCP HW offload device driver
  qedn: Add qedn probe
  qedn: Add IRQ and fast-path resources initializations
  qedn: Add IO level qedn_send_req and fw_cq workqueue
  qedn: Add IO level fastpath functionality
  qedn: Add Connection and IO level recovery flows

 MAINTAINERS                      |   18 +
 drivers/nvme/Kconfig             |    1 +
 drivers/nvme/Makefile            |    1 +
 drivers/nvme/host/Kconfig        |   15 +
 drivers/nvme/host/Makefile       |    3 +
 drivers/nvme/host/fabrics.c      |   12 +-
 drivers/nvme/host/fabrics.h      |    9 +
 drivers/nvme/host/tcp-offload.c  | 1346 ++++++++++++++++++++++++++++++
 drivers/nvme/host/tcp-offload.h  |  207 +++++
 drivers/nvme/hw/Kconfig          |    9 +
 drivers/nvme/hw/Makefile         |    3 +
 drivers/nvme/hw/qedn/Makefile    |    4 +
 drivers/nvme/hw/qedn/qedn.h      |  402 +++++++++
 drivers/nvme/hw/qedn/qedn_conn.c | 1076 ++++++++++++++++++++++++
 drivers/nvme/hw/qedn/qedn_main.c | 1109 ++++++++++++++++++++++++
 drivers/nvme/hw/qedn/qedn_task.c |  873 +++++++++++++++++++
 16 files changed, 5079 insertions(+), 9 deletions(-)
 create mode 100644 drivers/nvme/host/tcp-offload.c
 create mode 100644 drivers/nvme/host/tcp-offload.h
 create mode 100644 drivers/nvme/hw/Kconfig
 create mode 100644 drivers/nvme/hw/Makefile
 create mode 100644 drivers/nvme/hw/qedn/Makefile
 create mode 100644 drivers/nvme/hw/qedn/qedn.h
 create mode 100644 drivers/nvme/hw/qedn/qedn_conn.c
 create mode 100644 drivers/nvme/hw/qedn/qedn_main.c
 create mode 100644 drivers/nvme/hw/qedn/qedn_task.c

-- 
2.24.1



* [PATCH v4 01/20] nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP
  2021-06-29 12:47 [PATCH v4 00/20] NVMeTCP Offload ULP Prabhakar Kushwaha
@ 2021-06-29 12:47 ` Prabhakar Kushwaha
  2021-07-01 13:34   ` Christoph Hellwig
  2021-06-29 12:47 ` [PATCH v4 02/20] nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS definitions Prabhakar Kushwaha
                   ` (19 subsequent siblings)
  20 siblings, 1 reply; 37+ messages in thread
From: Prabhakar Kushwaha @ 2021-06-29 12:47 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, axboe, kbusch
  Cc: davem, kuba, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	prabhakar.pkin, malin1024, Dean Balandin

From: Shai Malin <smalin@marvell.com>

This patch presents the structure of the NVMeTCP offload common layer
driver. This module is added under "drivers/nvme/host/", and future
offload drivers that register with it will be placed under
"drivers/nvme/hw".
This new driver will be enabled by the Kconfig "NVM Express over Fabrics
TCP offload common layer".
No host-mode change is needed in order to support the new transport type.

Each new offload device specific driver will register with this ULP during
its probe function, by filling out the nvme_tcp_ofld_dev->ops and
nvme_tcp_ofld_dev->private_data and calling nvme_tcp_ofld_register_dev()
with the initialized struct.

The internal implementation:
- tcp-offload.h:
  Includes all common structs and ops to be used and shared by offload
  drivers.

- tcp-offload.c:
  Includes the init function which registers the new NVMf transport just
  like any other transport.

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Dean Balandin <dbalandin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
---
 MAINTAINERS                     |   8 ++
 drivers/nvme/host/Kconfig       |  15 +++
 drivers/nvme/host/Makefile      |   3 +
 drivers/nvme/host/tcp-offload.c | 124 ++++++++++++++++++++
 drivers/nvme/host/tcp-offload.h | 199 ++++++++++++++++++++++++++++++++
 5 files changed, 349 insertions(+)
 create mode 100644 drivers/nvme/host/tcp-offload.c
 create mode 100644 drivers/nvme/host/tcp-offload.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 81e1edeceae4..01fbebdc7722 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13093,6 +13093,14 @@ F:	drivers/nvme/host/
 F:	include/linux/nvme.h
 F:	include/uapi/linux/nvme_ioctl.h
 
+NVM EXPRESS TCP OFFLOAD TRANSPORT DRIVERS
+M:	Shai Malin <smalin@marvell.com>
+M:	Ariel Elior <aelior@marvell.com>
+L:	linux-nvme@lists.infradead.org
+S:	Supported
+F:	drivers/nvme/host/tcp-offload.c
+F:	drivers/nvme/host/tcp-offload.h
+
 NVM EXPRESS FC TRANSPORT DRIVERS
 M:	James Smart <james.smart@broadcom.com>
 L:	linux-nvme@lists.infradead.org
diff --git a/drivers/nvme/host/Kconfig b/drivers/nvme/host/Kconfig
index 102292289cdf..1993734d0104 100644
--- a/drivers/nvme/host/Kconfig
+++ b/drivers/nvme/host/Kconfig
@@ -84,3 +84,18 @@ config NVME_TCP
 	  from https://github.com/linux-nvme/nvme-cli.
 
 	  If unsure, say N.
+
+config NVME_TCP_OFFLOAD
+	tristate "NVM Express over Fabrics TCP offload common layer"
+	depends on INET
+	depends on BLK_DEV_NVME
+	select NVME_FABRICS
+	help
+	  This provides support for the NVMe over Fabrics protocol using
+	  the TCP offload transport. This allows you to use remote block devices
+	  exported using the NVMe protocol set.
+
+	  To configure a NVMe over Fabrics controller use the nvme-cli tool
+	  from https://github.com/linux-nvme/nvme-cli.
+
+	  If unsure, say N.
diff --git a/drivers/nvme/host/Makefile b/drivers/nvme/host/Makefile
index cbc509784b2e..3c3fdf83ce38 100644
--- a/drivers/nvme/host/Makefile
+++ b/drivers/nvme/host/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_NVME_FABRICS)		+= nvme-fabrics.o
 obj-$(CONFIG_NVME_RDMA)			+= nvme-rdma.o
 obj-$(CONFIG_NVME_FC)			+= nvme-fc.o
 obj-$(CONFIG_NVME_TCP)			+= nvme-tcp.o
+obj-$(CONFIG_NVME_TCP_OFFLOAD)	+= nvme-tcp-offload.o
 
 nvme-core-y				:= core.o ioctl.o
 nvme-core-$(CONFIG_TRACING)		+= trace.o
@@ -26,3 +27,5 @@ nvme-rdma-y				+= rdma.o
 nvme-fc-y				+= fc.o
 
 nvme-tcp-y				+= tcp.o
+
+nvme-tcp-offload-y		+= tcp-offload.o
diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
new file mode 100644
index 000000000000..10b87f5b875b
--- /dev/null
+++ b/drivers/nvme/host/tcp-offload.c
@@ -0,0 +1,124 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2021 Marvell. All rights reserved.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+/* Kernel includes */
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+/* Driver includes */
+#include "tcp-offload.h"
+
+static LIST_HEAD(nvme_tcp_ofld_devices);
+static DEFINE_MUTEX(nvme_tcp_ofld_devices_mutex);
+
+/**
+ * nvme_tcp_ofld_register_dev() - NVMeTCP Offload Library registration
+ * function.
+ * @dev:	NVMeTCP offload device instance to be registered to the
+ *		common tcp offload instance.
+ *
+ * API function that registers the type of offload device specific driver
+ * being implemented to the common NVMe over TCP offload library. Part of
+ * the overall init sequence of starting up an offload driver.
+ */
+int nvme_tcp_ofld_register_dev(struct nvme_tcp_ofld_dev *dev)
+{
+	struct nvme_tcp_ofld_ops *ops = dev->ops;
+
+	if (!ops->claim_dev ||
+	    !ops->setup_ctrl ||
+	    !ops->release_ctrl ||
+	    !ops->create_queue ||
+	    !ops->drain_queue ||
+	    !ops->destroy_queue ||
+	    !ops->poll_queue ||
+	    !ops->send_req)
+		return -EINVAL;
+
+	mutex_lock(&nvme_tcp_ofld_devices_mutex);
+	list_add_tail(&dev->entry, &nvme_tcp_ofld_devices);
+	mutex_unlock(&nvme_tcp_ofld_devices_mutex);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(nvme_tcp_ofld_register_dev);
+
+/**
+ * nvme_tcp_ofld_unregister_dev() - NVMeTCP Offload Library unregistration
+ * function.
+ * @dev:	NVMeTCP offload device instance to be unregistered from the
+ *		common tcp offload instance.
+ *
+ * API function that unregisters the type of offload device specific driver
+ * being implemented from the common NVMe over TCP offload library.
+ * Part of the overall exit sequence of unloading the implemented driver.
+ */
+void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev)
+{
+	mutex_lock(&nvme_tcp_ofld_devices_mutex);
+	list_del(&dev->entry);
+	mutex_unlock(&nvme_tcp_ofld_devices_mutex);
+}
+EXPORT_SYMBOL_GPL(nvme_tcp_ofld_unregister_dev);
+
+/**
+ * nvme_tcp_ofld_report_queue_err() - NVMeTCP Offload report error event
+ * callback function. Pointed to by nvme_tcp_ofld_queue->report_err.
+ * @queue:	NVMeTCP offload queue instance on which the error has occurred.
+ *
+ * API function that allows the offload device specific driver to report
+ * errors to the common offload layer, to invoke error recovery.
+ */
+int nvme_tcp_ofld_report_queue_err(struct nvme_tcp_ofld_queue *queue)
+{
+	/* Placeholder - invoke error recovery flow */
+
+	return 0;
+}
+
+/**
+ * nvme_tcp_ofld_req_done() - NVMeTCP Offload request done callback
+ * function. Pointed to by nvme_tcp_ofld_req->done.
+ * Handles both NVME_TCP_F_DATA_SUCCESS flag and NVMe CQ.
+ * @req:	NVMeTCP offload request to complete.
+ * @result:     The nvme_result.
+ * @status:     The completion status.
+ *
+ * API function that allows the offload device specific driver to report
+ * request completions to the common offload layer.
+ */
+void nvme_tcp_ofld_req_done(struct nvme_tcp_ofld_req *req,
+			    union nvme_result *result,
+			    __le16 status)
+{
+	/* Placeholder - complete request with/without error */
+}
+
+static struct nvmf_transport_ops nvme_tcp_ofld_transport = {
+	.name		= "tcp_offload",
+	.module		= THIS_MODULE,
+	.required_opts	= NVMF_OPT_TRADDR,
+	.allowed_opts	= NVMF_OPT_TRSVCID | NVMF_OPT_NR_WRITE_QUEUES  |
+			  NVMF_OPT_HOST_TRADDR | NVMF_OPT_CTRL_LOSS_TMO |
+			  NVMF_OPT_RECONNECT_DELAY | NVMF_OPT_HDR_DIGEST |
+			  NVMF_OPT_DATA_DIGEST | NVMF_OPT_NR_POLL_QUEUES |
+			  NVMF_OPT_TOS | NVMF_OPT_HOST_IFACE,
+};
+
+static int __init nvme_tcp_ofld_init_module(void)
+{
+	nvmf_register_transport(&nvme_tcp_ofld_transport);
+
+	return 0;
+}
+
+static void __exit nvme_tcp_ofld_cleanup_module(void)
+{
+	nvmf_unregister_transport(&nvme_tcp_ofld_transport);
+}
+
+module_init(nvme_tcp_ofld_init_module);
+module_exit(nvme_tcp_ofld_cleanup_module);
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/nvme/host/tcp-offload.h b/drivers/nvme/host/tcp-offload.h
new file mode 100644
index 000000000000..bfa759177a07
--- /dev/null
+++ b/drivers/nvme/host/tcp-offload.h
@@ -0,0 +1,199 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2021 Marvell. All rights reserved.
+ */
+
+/* Linux includes */
+#include <linux/dma-mapping.h>
+#include <linux/scatterlist.h>
+#include <linux/types.h>
+#include <linux/nvme-tcp.h>
+
+/* Driver includes */
+#include "nvme.h"
+#include "fabrics.h"
+
+/* Forward declarations */
+struct nvme_tcp_ofld_ops;
+
+/* Representation of an offload device. This is the struct used to register
+ * to the offload layer by the offload device specific driver, during its probe
+ * function.
+ * Allocated by offload device specific driver.
+ */
+struct nvme_tcp_ofld_dev {
+	struct list_head entry;
+	struct nvme_tcp_ofld_ops *ops;
+
+	/* Offload device specific driver context */
+	int num_hw_vectors;
+};
+
+/* Per IO struct holding the nvme_request and command
+ * Allocated by blk-mq.
+ */
+struct nvme_tcp_ofld_req {
+	struct nvme_request req;
+	struct nvme_command nvme_cmd;
+	struct list_head queue_entry;
+	struct nvme_tcp_ofld_queue *queue;
+
+	/* Offload device specific driver context */
+	void *private_data;
+
+	/* async flag is used to distinguish between async and IO flow
+	 * in common send_req() of nvme_tcp_ofld_ops.
+	 */
+	bool async;
+
+	void (*done)(struct nvme_tcp_ofld_req *req,
+		     union nvme_result *result,
+		     __le16 status);
+};
+
+enum nvme_tcp_ofld_queue_flags {
+	NVME_TCP_OFLD_Q_ALLOCATED = 0,
+	NVME_TCP_OFLD_Q_LIVE = 1,
+};
+
+/* Allocated by nvme_tcp_ofld */
+struct nvme_tcp_ofld_queue {
+	/* Offload device associated to this queue */
+	struct nvme_tcp_ofld_dev *dev;
+	struct nvme_tcp_ofld_ctrl *ctrl;
+	unsigned long flags;
+	size_t cmnd_capsule_len;
+
+	u8 hdr_digest;
+	u8 data_digest;
+	u8 tos;
+
+	/* Offload device specific driver context */
+	void *private_data;
+
+	/* Error callback function */
+	int (*report_err)(struct nvme_tcp_ofld_queue *queue);
+};
+
+/* Connectivity (routing) params used for establishing a connection */
+struct nvme_tcp_ofld_ctrl_con_params {
+	struct sockaddr_storage remote_ip_addr;
+
+	/* If NVMF_OPT_HOST_TRADDR is provided it will be set in local_ip_addr
+	 * in nvme_tcp_ofld_create_ctrl().
+	 * If NVMF_OPT_HOST_TRADDR is not provided the local_ip_addr will be
+	 * initialized by claim_dev().
+	 */
+	struct sockaddr_storage local_ip_addr;
+};
+
+/* Allocated by nvme_tcp_ofld */
+struct nvme_tcp_ofld_ctrl {
+	struct nvme_ctrl nctrl;
+	struct list_head list;
+	struct net_device *ndev;
+	struct nvme_tcp_ofld_dev *dev;
+
+	/* admin and IO queues */
+	struct blk_mq_tag_set tag_set;
+	struct blk_mq_tag_set admin_tag_set;
+	struct nvme_tcp_ofld_queue *queues;
+
+	struct work_struct err_work;
+	struct delayed_work connect_work;
+
+	/*
+	 * Each entry in the array indicates the number of queues of
+	 * corresponding type.
+	 */
+	u32 io_queues[HCTX_MAX_TYPES];
+
+	/* Connectivity params */
+	struct nvme_tcp_ofld_ctrl_con_params conn_params;
+
+	/* Offload device driver context */
+	void *private_data;
+};
+
+struct nvme_tcp_ofld_ops {
+	const char *name;
+	struct module *module;
+
+	/* For offload device specific driver to report what opts it supports.
+	 * It could be different than the ULP supported opts due to hardware
+	 * limitations. Also it could be different among different offload
+	 * specific device drivers.
+	 */
+	int required_opts; /* bitmap using enum nvmf_parsing_opts */
+	int allowed_opts; /* bitmap using enum nvmf_parsing_opts */
+
+	/* For offload device specific max num of segments and IO sizes */
+	u32 max_hw_sectors;
+	u32 max_segments;
+
+	/**
+	 * claim_dev: Return True if addr is reachable via offload device.
+	 * @dev: The offload device to check.
+	 * @ctrl: The offload ctrl have the conn_params field. The
+	 * conn_params is to be filled with routing params by the
+	 * offload device specific driver.
+	 */
+	int (*claim_dev)(struct nvme_tcp_ofld_dev *dev,
+			 struct nvme_tcp_ofld_ctrl *ctrl);
+
+	/**
+	 * setup_ctrl: Setup device specific controller structures.
+	 * @ctrl: The offload ctrl.
+	 */
+	int (*setup_ctrl)(struct nvme_tcp_ofld_ctrl *ctrl);
+
+	/**
+	 * release_ctrl: Release/Free device specific controller structures.
+	 * @ctrl: The offload ctrl.
+	 */
+	int (*release_ctrl)(struct nvme_tcp_ofld_ctrl *ctrl);
+
+	/**
+	 * create_queue: Create offload queue and establish TCP + NVMeTCP
+	 * (icreq+icresp) connection. Return true on successful connection.
+	 * Based on nvme_tcp_alloc_queue.
+	 * @queue: The queue itself - used as input and output.
+	 * @qid: The queue ID associated with the requested queue.
+	 * @q_size: The queue depth.
+	 */
+	int (*create_queue)(struct nvme_tcp_ofld_queue *queue, int qid,
+			    size_t queue_size);
+
+	/**
+	 * drain_queue: Drain a given queue - blocking function call.
+	 * Return from this function ensures that no additional
+	 * completions will arrive on this queue and that the HW will
+	 * not access host memory.
+	 * @queue: The queue to drain.
+	 */
+	void (*drain_queue)(struct nvme_tcp_ofld_queue *queue);
+
+	/**
+	 * destroy_queue: Close the TCP + NVMeTCP connection of a given queue
+	 * and make sure it's no longer active (no completions will arrive on the
+	 * queue).
+	 * @queue: The queue to destroy.
+	 */
+	void (*destroy_queue)(struct nvme_tcp_ofld_queue *queue);
+
+	/**
+	 * poll_queue: Poll a given queue for completions.
+	 * @queue: The queue to poll.
+	 */
+	int (*poll_queue)(struct nvme_tcp_ofld_queue *queue);
+
+	/**
+	 * send_req: Dispatch a request. Returns the execution status.
+	 * @req: Ptr to request to be sent.
+	 */
+	int (*send_req)(struct nvme_tcp_ofld_req *req);
+};
+
+/* Exported functions for offload device specific drivers */
+int nvme_tcp_ofld_register_dev(struct nvme_tcp_ofld_dev *dev);
+void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev);
-- 
2.24.1



* [PATCH v4 02/20] nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS definitions
  2021-06-29 12:47 [PATCH v4 00/20] NVMeTCP Offload ULP Prabhakar Kushwaha
  2021-06-29 12:47 ` [PATCH v4 01/20] nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP Prabhakar Kushwaha
@ 2021-06-29 12:47 ` Prabhakar Kushwaha
  2021-06-29 12:47 ` [PATCH v4 03/20] nvme-fabrics: Expose nvmf_check_required_opts() globally Prabhakar Kushwaha
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Prabhakar Kushwaha @ 2021-06-29 12:47 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, axboe, kbusch
  Cc: davem, kuba, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	prabhakar.pkin, malin1024, Arie Gershberg

Move the NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS definitions
to the header file, so they can be used by the different HW device drivers.

NVMeTCP offload devices might have different limitations on the
allowed options; for example, a device that does not support all the
queue types. With tcp and rdma, only the nvme-tcp and nvme-rdma layers
handle those attributes and the HW devices do not impose any limitations
on the allowed options.

An alternative design could be to add separate fields in
nvme_tcp_ofld_ops, such as the max_hw_sectors and max_segments fields
that we already have in this series.
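
As an illustrative sketch (not part of this patch), a device driver that
cannot support, for example, poll queues or TOS could advertise a narrower
mask through the allowed_opts/required_opts fields of nvme_tcp_ofld_ops:

/* Hypothetical offload driver without poll-queue or TOS support */
static struct nvme_tcp_ofld_ops foo_ofld_ops = {
	.name		= "foo",
	.module		= THIS_MODULE,
	.required_opts	= NVMF_OPT_TRADDR,
	.allowed_opts	= NVMF_OPT_TRSVCID | NVMF_OPT_HDR_DIGEST |
			  NVMF_OPT_DATA_DIGEST | NVMF_OPT_NR_WRITE_QUEUES,
	/* ... the mandatory ops callbacks ... */
};

The ULP combines this device mask with the generic NVMF_ALLOWED_OPTS mask
when validating a connect request; see nvme_tcp_ofld_check_dev_opts()
later in the series.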

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Arie Gershberg <agershberg@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/fabrics.c | 7 -------
 drivers/nvme/host/fabrics.h | 7 +++++++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 1e6a7cc056ca..21d67775d2af 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -947,13 +947,6 @@ void nvmf_free_options(struct nvmf_ctrl_options *opts)
 }
 EXPORT_SYMBOL_GPL(nvmf_free_options);
 
-#define NVMF_REQUIRED_OPTS	(NVMF_OPT_TRANSPORT | NVMF_OPT_NQN)
-#define NVMF_ALLOWED_OPTS	(NVMF_OPT_QUEUE_SIZE | NVMF_OPT_NR_IO_QUEUES | \
-				 NVMF_OPT_KATO | NVMF_OPT_HOSTNQN | \
-				 NVMF_OPT_HOST_ID | NVMF_OPT_DUP_CONNECT |\
-				 NVMF_OPT_DISABLE_SQFLOW |\
-				 NVMF_OPT_FAIL_FAST_TMO)
-
 static struct nvme_ctrl *
 nvmf_create_ctrl(struct device *dev, const char *buf)
 {
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index c31dad69a773..38ac7b757d78 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -69,6 +69,13 @@ enum {
 	NVMF_OPT_HOST_IFACE	= 1 << 21,
 };
 
+#define NVMF_REQUIRED_OPTS	(NVMF_OPT_TRANSPORT | NVMF_OPT_NQN)
+#define NVMF_ALLOWED_OPTS	(NVMF_OPT_QUEUE_SIZE | NVMF_OPT_NR_IO_QUEUES | \
+				 NVMF_OPT_KATO | NVMF_OPT_HOSTNQN | \
+				 NVMF_OPT_HOST_ID | NVMF_OPT_DUP_CONNECT |\
+				 NVMF_OPT_DISABLE_SQFLOW |\
+				 NVMF_OPT_FAIL_FAST_TMO)
+
 /**
  * struct nvmf_ctrl_options - Used to hold the options specified
  *			      with the parsing opts enum.
-- 
2.24.1



* [PATCH v4 03/20] nvme-fabrics: Expose nvmf_check_required_opts() globally
  2021-06-29 12:47 [PATCH v4 00/20] NVMeTCP Offload ULP Prabhakar Kushwaha
  2021-06-29 12:47 ` [PATCH v4 01/20] nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP Prabhakar Kushwaha
  2021-06-29 12:47 ` [PATCH v4 02/20] nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS definitions Prabhakar Kushwaha
@ 2021-06-29 12:47 ` Prabhakar Kushwaha
  2021-07-01 13:35   ` Christoph Hellwig
  2021-06-29 12:47 ` [PATCH v4 04/20] nvme-tcp-offload: Add device scan implementation Prabhakar Kushwaha
                   ` (17 subsequent siblings)
  20 siblings, 1 reply; 37+ messages in thread
From: Prabhakar Kushwaha @ 2021-06-29 12:47 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, axboe, kbusch
  Cc: davem, kuba, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	prabhakar.pkin, malin1024

nvmf_check_required_opts() is used to check whether the user-provided
opts include the required_opts; if not, it logs which options are
missing.

It can be leveraged by nvme-tcp-offload to check whether the provided
opts are supported by a specific offload driver.

So expose nvmf_check_required_opts() globally.
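
A simplified sketch of that use, mirroring the nvme_tcp_ofld_check_dev_opts()
helper added later in the series (the foo_* name is hypothetical):

static int foo_check_dev_opts(struct nvmf_ctrl_options *opts,
			      struct nvme_tcp_ofld_ops *ofld_ops)
{
	unsigned int supported = NVMF_ALLOWED_OPTS |
				 ofld_ops->allowed_opts |
				 ofld_ops->required_opts;
	struct nvmf_ctrl_options dev_opts = { .mask = supported };

	/* The user asked for an option this offload device cannot honor;
	 * nvmf_check_required_opts() logs exactly which ones are missing.
	 */
	if (opts->mask & ~supported)
		return nvmf_check_required_opts(&dev_opts, opts->mask);

	return 0;
}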

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/fabrics.c | 5 +++--
 drivers/nvme/host/fabrics.h | 2 ++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 21d67775d2af..7830d2540492 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -864,8 +864,8 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
 	return ret;
 }
 
-static int nvmf_check_required_opts(struct nvmf_ctrl_options *opts,
-		unsigned int required_opts)
+int nvmf_check_required_opts(struct nvmf_ctrl_options *opts,
+			     unsigned int required_opts)
 {
 	if ((opts->mask & required_opts) != required_opts) {
 		int i;
@@ -883,6 +883,7 @@ static int nvmf_check_required_opts(struct nvmf_ctrl_options *opts,
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(nvmf_check_required_opts);
 
 bool nvmf_ip_options_match(struct nvme_ctrl *ctrl,
 		struct nvmf_ctrl_options *opts)
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index 38ac7b757d78..15d9c15ef8a6 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -197,5 +197,7 @@ int nvmf_get_address(struct nvme_ctrl *ctrl, char *buf, int size);
 bool nvmf_should_reconnect(struct nvme_ctrl *ctrl);
 bool nvmf_ip_options_match(struct nvme_ctrl *ctrl,
 		struct nvmf_ctrl_options *opts);
+int nvmf_check_required_opts(struct nvmf_ctrl_options *opts,
+			     unsigned int required_opts);
 
 #endif /* _NVME_FABRICS_H */
-- 
2.24.1



* [PATCH v4 04/20] nvme-tcp-offload: Add device scan implementation
  2021-06-29 12:47 [PATCH v4 00/20] NVMeTCP Offload ULP Prabhakar Kushwaha
                   ` (2 preceding siblings ...)
  2021-06-29 12:47 ` [PATCH v4 03/20] nvme-fabrics: Expose nvmf_check_required_opts() globally Prabhakar Kushwaha
@ 2021-06-29 12:47 ` Prabhakar Kushwaha
  2021-07-01 13:36   ` Christoph Hellwig
  2021-06-29 12:47 ` [PATCH v4 05/20] nvme-tcp-offload: Add controller level implementation Prabhakar Kushwaha
                   ` (16 subsequent siblings)
  20 siblings, 1 reply; 37+ messages in thread
From: Prabhakar Kushwaha @ 2021-06-29 12:47 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, axboe, kbusch
  Cc: davem, kuba, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	prabhakar.pkin, malin1024, Dean Balandin

From: Dean Balandin <dbalandin@marvell.com>

As part of create_ctrl(), it scans the registered devices and calls
the claim_dev op on each of them, to find the first device that matches
the connection params. Once the correct device is found (claim_dev
returns true), we raise the refcnt of that device and return it as the
device to be used for the ctrl currently being created.

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Dean Balandin <dbalandin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/tcp-offload.c | 91 +++++++++++++++++++++++++++++++++
 drivers/nvme/host/tcp-offload.h |  1 +
 2 files changed, 92 insertions(+)

diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
index 10b87f5b875b..d0f4b83549b9 100644
--- a/drivers/nvme/host/tcp-offload.c
+++ b/drivers/nvme/host/tcp-offload.c
@@ -13,6 +13,11 @@
 static LIST_HEAD(nvme_tcp_ofld_devices);
 static DEFINE_MUTEX(nvme_tcp_ofld_devices_mutex);
 
+static inline struct nvme_tcp_ofld_ctrl *to_tcp_ofld_ctrl(struct nvme_ctrl *nc)
+{
+	return container_of(nc, struct nvme_tcp_ofld_ctrl, nctrl);
+}
+
 /**
  * nvme_tcp_ofld_register_dev() - NVMeTCP Offload Library registration
  * function.
@@ -96,6 +101,91 @@ void nvme_tcp_ofld_req_done(struct nvme_tcp_ofld_req *req,
 	/* Placeholder - complete request with/without error */
 }
 
+static struct nvme_tcp_ofld_dev *
+nvme_tcp_ofld_lookup_dev(struct nvme_tcp_ofld_ctrl *ctrl)
+{
+	struct nvme_ctrl *nctrl = &ctrl->nctrl;
+	struct nvme_tcp_ofld_dev *dev;
+	char *iface;
+
+	if (nctrl->opts->mask & NVMF_OPT_HOST_IFACE) {
+		iface = nctrl->opts->host_iface;
+		ctrl->ndev = __dev_get_by_name(&init_net, iface);
+		if (!ctrl->ndev) {
+			pr_err("invalid interface passed: %s\n", iface);
+			return NULL;
+		}
+	}
+
+	mutex_lock(&nvme_tcp_ofld_devices_mutex);
+	list_for_each_entry(dev, &nvme_tcp_ofld_devices, entry) {
+	/* ctrl includes the destination ip, source ip (if provided) and
+	 * network interface (if provided).
+	 */
+		if (dev->ops->claim_dev(dev, ctrl))
+			goto out;
+	}
+
+	dev = NULL;
+out:
+	mutex_unlock(&nvme_tcp_ofld_devices_mutex);
+
+	return dev;
+}
+
+static struct nvme_ctrl *
+nvme_tcp_ofld_create_ctrl(struct device *ndev, struct nvmf_ctrl_options *opts)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl;
+	struct nvme_tcp_ofld_dev *dev;
+	struct nvme_ctrl *nctrl;
+	int rc = 0;
+
+	ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
+	if (!ctrl)
+		return ERR_PTR(-ENOMEM);
+
+	nctrl = &ctrl->nctrl;
+
+	/* Init nvme_tcp_ofld_ctrl & nvme_ctrl params based on received opts */
+
+	/* Find device that can reach the dest addr */
+	dev = nvme_tcp_ofld_lookup_dev(ctrl);
+	if (!dev) {
+		pr_info("no device found for addr %s:%s.\n",
+			opts->traddr, opts->trsvcid);
+		rc = -EINVAL;
+		goto out_free_ctrl;
+	}
+
+	/* Increase driver refcnt */
+	if (!try_module_get(dev->ops->module)) {
+		pr_err("try_module_get failed\n");
+		rc = -ENODEV;
+		goto out_free_ctrl;
+	}
+
+	ctrl->dev = dev;
+
+	if (ctrl->dev->ops->max_hw_sectors)
+		nctrl->max_hw_sectors = ctrl->dev->ops->max_hw_sectors;
+	if (ctrl->dev->ops->max_segments)
+		nctrl->max_segments = ctrl->dev->ops->max_segments;
+
+	/* Init queues */
+
+	/* Call nvme_init_ctrl */
+
+	/* Setup ctrl */
+
+	return nctrl;
+
+out_free_ctrl:
+	kfree(ctrl);
+
+	return ERR_PTR(rc);
+}
+
 static struct nvmf_transport_ops nvme_tcp_ofld_transport = {
 	.name		= "tcp_offload",
 	.module		= THIS_MODULE,
@@ -105,6 +195,7 @@ static struct nvmf_transport_ops nvme_tcp_ofld_transport = {
 			  NVMF_OPT_RECONNECT_DELAY | NVMF_OPT_HDR_DIGEST |
 			  NVMF_OPT_DATA_DIGEST | NVMF_OPT_NR_POLL_QUEUES |
 			  NVMF_OPT_TOS | NVMF_OPT_HOST_IFACE,
+	.create_ctrl	= nvme_tcp_ofld_create_ctrl,
 };
 
 static int __init nvme_tcp_ofld_init_module(void)
diff --git a/drivers/nvme/host/tcp-offload.h b/drivers/nvme/host/tcp-offload.h
index bfa759177a07..d1c2c6171897 100644
--- a/drivers/nvme/host/tcp-offload.h
+++ b/drivers/nvme/host/tcp-offload.h
@@ -8,6 +8,7 @@
 #include <linux/scatterlist.h>
 #include <linux/types.h>
 #include <linux/nvme-tcp.h>
+#include <linux/netdevice.h>
 
 /* Driver includes */
 #include "nvme.h"
-- 
2.24.1



* [PATCH v4 05/20] nvme-tcp-offload: Add controller level implementation
  2021-06-29 12:47 [PATCH v4 00/20] NVMeTCP Offload ULP Prabhakar Kushwaha
                   ` (3 preceding siblings ...)
  2021-06-29 12:47 ` [PATCH v4 04/20] nvme-tcp-offload: Add device scan implementation Prabhakar Kushwaha
@ 2021-06-29 12:47 ` Prabhakar Kushwaha
  2021-06-29 12:47 ` [PATCH v4 06/20] nvme-tcp-offload: Add controller level error recovery implementation Prabhakar Kushwaha
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Prabhakar Kushwaha @ 2021-06-29 12:47 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, axboe, kbusch
  Cc: davem, kuba, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	prabhakar.pkin, malin1024, Arie Gershberg

From: Arie Gershberg <agershberg@marvell.com>

In this patch we implement controller level functionality including:
- create_ctrl.
- delete_ctrl.
- free_ctrl.

The implementation is similar to other nvme fabrics modules, the main
difference being that the nvme-tcp-offload ULP calls the offload specific
claim_dev() op with the given TCP/IP parameters to determine which device
will be used for this controller.
Once found, the offload specific device and controller will be paired and
kept in a controller list managed by the ULP.

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Arie Gershberg <agershberg@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/tcp-offload.c | 487 +++++++++++++++++++++++++++++++-
 1 file changed, 482 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
index d0f4b83549b9..806247722a35 100644
--- a/drivers/nvme/host/tcp-offload.c
+++ b/drivers/nvme/host/tcp-offload.c
@@ -12,6 +12,10 @@
 
 static LIST_HEAD(nvme_tcp_ofld_devices);
 static DEFINE_MUTEX(nvme_tcp_ofld_devices_mutex);
+static LIST_HEAD(nvme_tcp_ofld_ctrl_list);
+static DEFINE_MUTEX(nvme_tcp_ofld_ctrl_mutex);
+static struct blk_mq_ops nvme_tcp_ofld_admin_mq_ops;
+static struct blk_mq_ops nvme_tcp_ofld_mq_ops;
 
 static inline struct nvme_tcp_ofld_ctrl *to_tcp_ofld_ctrl(struct nvme_ctrl *nc)
 {
@@ -133,21 +137,441 @@ nvme_tcp_ofld_lookup_dev(struct nvme_tcp_ofld_ctrl *ctrl)
 	return dev;
 }
 
+static struct blk_mq_tag_set *
+nvme_tcp_ofld_alloc_tagset(struct nvme_ctrl *nctrl, bool admin)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	struct blk_mq_tag_set *set;
+	int rc;
+
+	if (admin) {
+		set = &ctrl->admin_tag_set;
+		memset(set, 0, sizeof(*set));
+		set->ops = &nvme_tcp_ofld_admin_mq_ops;
+		set->queue_depth = NVME_AQ_MQ_TAG_DEPTH;
+		set->reserved_tags = NVMF_RESERVED_TAGS;
+		set->numa_node = nctrl->numa_node;
+		set->flags = BLK_MQ_F_BLOCKING;
+		set->cmd_size = sizeof(struct nvme_tcp_ofld_req);
+		set->driver_data = ctrl;
+		set->nr_hw_queues = 1;
+		set->timeout = NVME_ADMIN_TIMEOUT;
+	} else {
+		set = &ctrl->tag_set;
+		memset(set, 0, sizeof(*set));
+		set->ops = &nvme_tcp_ofld_mq_ops;
+		set->queue_depth = nctrl->sqsize + 1;
+		set->reserved_tags = NVMF_RESERVED_TAGS;
+		set->numa_node = nctrl->numa_node;
+		set->flags = BLK_MQ_F_SHOULD_MERGE;
+		set->cmd_size = sizeof(struct nvme_tcp_ofld_req);
+		set->driver_data = ctrl;
+		set->nr_hw_queues = nctrl->queue_count - 1;
+		set->timeout = NVME_IO_TIMEOUT;
+		set->nr_maps = nctrl->opts->nr_poll_queues ? HCTX_MAX_TYPES : 2;
+	}
+
+	rc = blk_mq_alloc_tag_set(set);
+	if (rc)
+		return ERR_PTR(rc);
+
+	return set;
+}
+
+static int nvme_tcp_ofld_configure_admin_queue(struct nvme_ctrl *nctrl,
+					       bool new)
+{
+	int rc;
+
+	/* Placeholder - alloc_admin_queue */
+	if (new) {
+		nctrl->admin_tagset =
+				nvme_tcp_ofld_alloc_tagset(nctrl, true);
+		if (IS_ERR(nctrl->admin_tagset)) {
+			rc = PTR_ERR(nctrl->admin_tagset);
+			nctrl->admin_tagset = NULL;
+			goto out_destroy_queue;
+		}
+
+		nctrl->fabrics_q = blk_mq_init_queue(nctrl->admin_tagset);
+		if (IS_ERR(nctrl->fabrics_q)) {
+			rc = PTR_ERR(nctrl->fabrics_q);
+			nctrl->fabrics_q = NULL;
+			goto out_free_tagset;
+		}
+
+		nctrl->admin_q = blk_mq_init_queue(nctrl->admin_tagset);
+		if (IS_ERR(nctrl->admin_q)) {
+			rc = PTR_ERR(nctrl->admin_q);
+			nctrl->admin_q = NULL;
+			goto out_cleanup_fabrics_q;
+		}
+	}
+
+	/* Placeholder - nvme_tcp_ofld_start_queue */
+
+	rc = nvme_enable_ctrl(nctrl);
+	if (rc)
+		goto out_stop_queue;
+
+	blk_mq_unquiesce_queue(nctrl->admin_q);
+
+	rc = nvme_init_ctrl_finish(nctrl);
+	if (rc)
+		goto out_quiesce_queue;
+
+	return 0;
+
+out_quiesce_queue:
+	blk_mq_quiesce_queue(nctrl->admin_q);
+	blk_sync_queue(nctrl->admin_q);
+
+out_stop_queue:
+	/* Placeholder - stop offload queue */
+	nvme_cancel_admin_tagset(nctrl);
+
+out_cleanup_fabrics_q:
+	if (new)
+		blk_cleanup_queue(nctrl->fabrics_q);
+out_free_tagset:
+	if (new)
+		blk_mq_free_tag_set(nctrl->admin_tagset);
+out_destroy_queue:
+	/* Placeholder - free admin queue */
+
+	return rc;
+}
+
+static int
+nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new)
+{
+	int rc;
+
+	/* Placeholder - alloc_io_queues */
+
+	if (new) {
+		nctrl->tagset = nvme_tcp_ofld_alloc_tagset(nctrl, false);
+		if (IS_ERR(nctrl->tagset)) {
+			rc = PTR_ERR(nctrl->tagset);
+			nctrl->tagset = NULL;
+			goto out_free_io_queues;
+		}
+
+		nctrl->connect_q = blk_mq_init_queue(nctrl->tagset);
+		if (IS_ERR(nctrl->connect_q)) {
+			rc = PTR_ERR(nctrl->connect_q);
+			nctrl->connect_q = NULL;
+			goto out_free_tag_set;
+		}
+	}
+
+	/* Placeholder - start_io_queues */
+
+	if (!new) {
+		nvme_start_queues(nctrl);
+		if (!nvme_wait_freeze_timeout(nctrl, NVME_IO_TIMEOUT)) {
+			/*
+			 * If we timed out waiting for freeze we are likely to
+			 * be stuck.  Fail the controller initialization just
+			 * to be safe.
+			 */
+			rc = -ENODEV;
+			goto out_wait_freeze_timed_out;
+		}
+		blk_mq_update_nr_hw_queues(nctrl->tagset,
+					   nctrl->queue_count - 1);
+		nvme_unfreeze(nctrl);
+	}
+
+	return 0;
+
+out_wait_freeze_timed_out:
+	nvme_stop_queues(nctrl);
+	nvme_sync_io_queues(nctrl);
+
+	/* Placeholder - Stop IO queues */
+
+	if (new)
+		blk_cleanup_queue(nctrl->connect_q);
+out_free_tag_set:
+	if (new)
+		blk_mq_free_tag_set(nctrl->tagset);
+out_free_io_queues:
+	/* Placeholder - free_io_queues */
+
+	return rc;
+}
+
+static int nvme_tcp_ofld_setup_ctrl(struct nvme_ctrl *nctrl, bool new)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	struct nvmf_ctrl_options *opts = nctrl->opts;
+	int rc = 0;
+
+	rc = ctrl->dev->ops->setup_ctrl(ctrl);
+	if (rc)
+		return rc;
+
+	rc = nvme_tcp_ofld_configure_admin_queue(nctrl, new);
+	if (rc)
+		goto out_release_ctrl;
+
+	if (nctrl->icdoff) {
+		dev_err(nctrl->device, "icdoff is not supported!\n");
+		rc = -EOPNOTSUPP;
+		goto destroy_admin;
+	}
+
+	if (!nvme_ctrl_sgl_supported(nctrl)) {
+		dev_err(nctrl->device, "Mandatory sgls are not supported!\n");
+		rc = -EOPNOTSUPP;
+		goto destroy_admin;
+	}
+
+	if (opts->queue_size > nctrl->sqsize + 1)
+		dev_warn(nctrl->device,
+			 "queue_size %zu > ctrl sqsize %u, clamping down\n",
+			 opts->queue_size, nctrl->sqsize + 1);
+
+	if (nctrl->sqsize + 1 > nctrl->maxcmd) {
+		dev_warn(nctrl->device,
+			 "sqsize %u > ctrl maxcmd %u, clamping down\n",
+			 nctrl->sqsize + 1, nctrl->maxcmd);
+		nctrl->sqsize = nctrl->maxcmd - 1;
+	}
+
+	if (nctrl->queue_count > 1) {
+		rc = nvme_tcp_ofld_configure_io_queues(nctrl, new);
+		if (rc)
+			goto destroy_admin;
+	}
+
+	if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_LIVE)) {
+		/*
+		 * state change failure is ok if we started ctrl delete,
+		 * unless we're during creation of a new controller to
+		 * avoid races with teardown flow.
+		 */
+		WARN_ON_ONCE(nctrl->state != NVME_CTRL_DELETING &&
+			     nctrl->state != NVME_CTRL_DELETING_NOIO);
+		WARN_ON_ONCE(new);
+		rc = -EINVAL;
+		goto destroy_io;
+	}
+
+	nvme_start_ctrl(nctrl);
+
+	return 0;
+
+destroy_io:
+	/* Placeholder - stop and destroy io queues*/
+destroy_admin:
+	/* Placeholder - stop and destroy admin queue*/
+out_release_ctrl:
+	ctrl->dev->ops->release_ctrl(ctrl);
+
+	return rc;
+}
+
+static int
+nvme_tcp_ofld_check_dev_opts(struct nvmf_ctrl_options *opts,
+			     struct nvme_tcp_ofld_ops *ofld_ops)
+{
+	unsigned int nvme_tcp_ofld_opt_mask = NVMF_ALLOWED_OPTS |
+			ofld_ops->allowed_opts | ofld_ops->required_opts;
+	struct nvmf_ctrl_options dev_opts_mask;
+
+	if (opts->mask & ~nvme_tcp_ofld_opt_mask) {
+		pr_warn("One or more nvmf options missing from ofld drvr %s.\n",
+			ofld_ops->name);
+
+		dev_opts_mask.mask = nvme_tcp_ofld_opt_mask;
+
+		return nvmf_check_required_opts(&dev_opts_mask, opts->mask);
+	}
+
+	return 0;
+}
+
+static void nvme_tcp_ofld_free_ctrl(struct nvme_ctrl *nctrl)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	struct nvme_tcp_ofld_dev *dev = ctrl->dev;
+
+	if (list_empty(&ctrl->list))
+		goto free_ctrl;
+
+	ctrl->dev->ops->release_ctrl(ctrl);
+
+	mutex_lock(&nvme_tcp_ofld_ctrl_mutex);
+	list_del(&ctrl->list);
+	mutex_unlock(&nvme_tcp_ofld_ctrl_mutex);
+
+	nvmf_free_options(nctrl->opts);
+free_ctrl:
+	module_put(dev->ops->module);
+	kfree(ctrl->queues);
+	kfree(ctrl);
+}
+
+static void
+nvme_tcp_ofld_teardown_admin_queue(struct nvme_ctrl *ctrl, bool remove)
+{
+	/* Placeholder - teardown_admin_queue */
+}
+
+static void
+nvme_tcp_ofld_teardown_io_queues(struct nvme_ctrl *nctrl, bool remove)
+{
+	/* Placeholder - teardown_io_queues */
+}
+
+static void
+nvme_tcp_ofld_teardown_ctrl(struct nvme_ctrl *nctrl, bool shutdown)
+{
+	/* Placeholder - err_work and connect_work */
+	nvme_tcp_ofld_teardown_io_queues(nctrl, shutdown);
+	blk_mq_quiesce_queue(nctrl->admin_q);
+	if (shutdown)
+		nvme_shutdown_ctrl(nctrl);
+	else
+		nvme_disable_ctrl(nctrl);
+	nvme_tcp_ofld_teardown_admin_queue(nctrl, shutdown);
+}
+
+static void nvme_tcp_ofld_delete_ctrl(struct nvme_ctrl *nctrl)
+{
+	nvme_tcp_ofld_teardown_ctrl(nctrl, true);
+}
+
+static int
+nvme_tcp_ofld_init_request(struct blk_mq_tag_set *set,
+			   struct request *rq,
+			   unsigned int hctx_idx,
+			   unsigned int numa_node)
+{
+	struct nvme_tcp_ofld_req *req = blk_mq_rq_to_pdu(rq);
+
+	/* Placeholder - init request */
+
+	req->done = nvme_tcp_ofld_req_done;
+
+	return 0;
+}
+
+static blk_status_t
+nvme_tcp_ofld_queue_rq(struct blk_mq_hw_ctx *hctx,
+		       const struct blk_mq_queue_data *bd)
+{
+	/* Call nvme_setup_cmd(...) */
+
+	/* Call ops->send_req(...) */
+
+	return BLK_STS_OK;
+}
+
+static struct blk_mq_ops nvme_tcp_ofld_mq_ops = {
+	.queue_rq	= nvme_tcp_ofld_queue_rq,
+	.init_request	= nvme_tcp_ofld_init_request,
+	/*
+	 * All additional ops will be also implemented and registered similar to
+	 * tcp.c
+	 */
+};
+
+static struct blk_mq_ops nvme_tcp_ofld_admin_mq_ops = {
+	.queue_rq	= nvme_tcp_ofld_queue_rq,
+	.init_request	= nvme_tcp_ofld_init_request,
+	/*
+	 * All additional ops will be also implemented and registered similar to
+	 * tcp.c
+	 */
+};
+
+static const struct nvme_ctrl_ops nvme_tcp_ofld_ctrl_ops = {
+	.name			= "tcp_offload",
+	.module			= THIS_MODULE,
+	.flags			= NVME_F_FABRICS,
+	.reg_read32		= nvmf_reg_read32,
+	.reg_read64		= nvmf_reg_read64,
+	.reg_write32		= nvmf_reg_write32,
+	.free_ctrl		= nvme_tcp_ofld_free_ctrl,
+	.delete_ctrl		= nvme_tcp_ofld_delete_ctrl,
+	.get_address		= nvmf_get_address,
+};
+
+static bool
+nvme_tcp_ofld_existing_controller(struct nvmf_ctrl_options *opts)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl;
+	bool found = false;
+
+	mutex_lock(&nvme_tcp_ofld_ctrl_mutex);
+	list_for_each_entry(ctrl, &nvme_tcp_ofld_ctrl_list, list) {
+		found = nvmf_ip_options_match(&ctrl->nctrl, opts);
+		if (found)
+			break;
+	}
+	mutex_unlock(&nvme_tcp_ofld_ctrl_mutex);
+
+	return found;
+}
+
 static struct nvme_ctrl *
 nvme_tcp_ofld_create_ctrl(struct device *ndev, struct nvmf_ctrl_options *opts)
 {
+	struct nvme_tcp_ofld_queue *queue;
 	struct nvme_tcp_ofld_ctrl *ctrl;
 	struct nvme_tcp_ofld_dev *dev;
 	struct nvme_ctrl *nctrl;
-	int rc = 0;
+	int i, rc = 0;
 
 	ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
 	if (!ctrl)
 		return ERR_PTR(-ENOMEM);
 
+	INIT_LIST_HEAD(&ctrl->list);
 	nctrl = &ctrl->nctrl;
+	nctrl->opts = opts;
+	nctrl->queue_count = opts->nr_io_queues + opts->nr_write_queues +
+			     opts->nr_poll_queues + 1;
+	nctrl->sqsize = opts->queue_size - 1;
+	nctrl->kato = opts->kato;
+	if (!(opts->mask & NVMF_OPT_TRSVCID)) {
+		opts->trsvcid =
+			kstrdup(__stringify(NVME_TCP_DISC_PORT), GFP_KERNEL);
+		if (!opts->trsvcid) {
+			rc = -ENOMEM;
+			goto out_free_ctrl;
+		}
+		opts->mask |= NVMF_OPT_TRSVCID;
+	}
+
+	rc = inet_pton_with_scope(&init_net, AF_UNSPEC, opts->traddr,
+				  opts->trsvcid,
+				  &ctrl->conn_params.remote_ip_addr);
+	if (rc) {
+		pr_err("malformed address passed: %s:%s\n",
+		       opts->traddr, opts->trsvcid);
+		goto out_free_ctrl;
+	}
 
-	/* Init nvme_tcp_ofld_ctrl & nvme_ctrl params based on received opts */
+	if (opts->mask & NVMF_OPT_HOST_TRADDR) {
+		rc = inet_pton_with_scope(&init_net, AF_UNSPEC,
+					  opts->host_traddr, NULL,
+					  &ctrl->conn_params.local_ip_addr);
+		if (rc) {
+			pr_err("malformed src address passed: %s\n",
+			       opts->host_traddr);
+			goto out_free_ctrl;
+		}
+	}
+
+	if (!opts->duplicate_connect &&
+	    nvme_tcp_ofld_existing_controller(opts)) {
+		rc = -EALREADY;
+		goto out_free_ctrl;
+	}
 
 	/* Find device that can reach the dest addr */
 	dev = nvme_tcp_ofld_lookup_dev(ctrl);
@@ -165,6 +589,10 @@ nvme_tcp_ofld_create_ctrl(struct device *ndev, struct nvmf_ctrl_options *opts)
 		goto out_free_ctrl;
 	}
 
+	rc = nvme_tcp_ofld_check_dev_opts(opts, dev->ops);
+	if (rc)
+		goto out_module_put;
+
 	ctrl->dev = dev;
 
 	if (ctrl->dev->ops->max_hw_sectors)
@@ -172,14 +600,55 @@ nvme_tcp_ofld_create_ctrl(struct device *ndev, struct nvmf_ctrl_options *opts)
 	if (ctrl->dev->ops->max_segments)
 		nctrl->max_segments = ctrl->dev->ops->max_segments;
 
-	/* Init queues */
+	ctrl->queues = kcalloc(nctrl->queue_count,
+			       sizeof(struct nvme_tcp_ofld_queue),
+			       GFP_KERNEL);
+	if (!ctrl->queues) {
+		rc = -ENOMEM;
+		goto out_module_put;
+	}
+
+	for (i = 0; i < nctrl->queue_count; ++i) {
+		queue = &ctrl->queues[i];
+		queue->ctrl = ctrl;
+		queue->dev = dev;
+		queue->report_err = nvme_tcp_ofld_report_queue_err;
+	}
+
+	rc = nvme_init_ctrl(nctrl, ndev, &nvme_tcp_ofld_ctrl_ops, 0);
+	if (rc)
+		goto out_free_queues;
+
+	if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_CONNECTING)) {
+		WARN_ON_ONCE(1);
+		rc = -EINTR;
+		goto out_uninit_ctrl;
+	}
 
-	/* Call nvme_init_ctrl */
+	rc = nvme_tcp_ofld_setup_ctrl(nctrl, true);
+	if (rc)
+		goto out_uninit_ctrl;
 
-	/* Setup ctrl */
+	dev_info(nctrl->device, "new ctrl: NQN \"%s\", addr %pISp\n",
+		 opts->subsysnqn, &ctrl->conn_params.remote_ip_addr);
+
+	mutex_lock(&nvme_tcp_ofld_ctrl_mutex);
+	list_add_tail(&ctrl->list, &nvme_tcp_ofld_ctrl_list);
+	mutex_unlock(&nvme_tcp_ofld_ctrl_mutex);
 
 	return nctrl;
 
+out_uninit_ctrl:
+	nvme_uninit_ctrl(nctrl);
+	nvme_put_ctrl(nctrl);
+	if (rc > 0)
+		rc = -EIO;
+
+	return ERR_PTR(rc);
+out_free_queues:
+	kfree(ctrl->queues);
+out_module_put:
+	module_put(dev->ops->module);
 out_free_ctrl:
 	kfree(ctrl);
 
@@ -207,7 +676,15 @@ static int __init nvme_tcp_ofld_init_module(void)
 
 static void __exit nvme_tcp_ofld_cleanup_module(void)
 {
+	struct nvme_tcp_ofld_ctrl *ctrl;
+
 	nvmf_unregister_transport(&nvme_tcp_ofld_transport);
+
+	mutex_lock(&nvme_tcp_ofld_ctrl_mutex);
+	list_for_each_entry(ctrl, &nvme_tcp_ofld_ctrl_list, list)
+		nvme_delete_ctrl(&ctrl->nctrl);
+	mutex_unlock(&nvme_tcp_ofld_ctrl_mutex);
+	flush_workqueue(nvme_delete_wq);
 }
 
 module_init(nvme_tcp_ofld_init_module);
-- 
2.24.1



* [PATCH v4 06/20] nvme-tcp-offload: Add controller level error recovery implementation
  2021-06-29 12:47 [PATCH v4 00/20] NVMeTCP Offload ULP Prabhakar Kushwaha
                   ` (4 preceding siblings ...)
  2021-06-29 12:47 ` [PATCH v4 05/20] nvme-tcp-offload: Add controller level implementation Prabhakar Kushwaha
@ 2021-06-29 12:47 ` Prabhakar Kushwaha
  2021-06-29 12:47 ` [PATCH v4 07/20] nvme-tcp-offload: Add queue level implementation Prabhakar Kushwaha
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Prabhakar Kushwaha @ 2021-06-29 12:47 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, axboe, kbusch
  Cc: davem, kuba, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	prabhakar.pkin, malin1024, Arie Gershberg

From: Arie Gershberg <agershberg@marvell.com>

In this patch, we implement controller level error handling and recovery.
Upon an error discovered by the ULP or a controller reset initiated by
nvme-core (using the reset_ctrl workqueue), the ULP will initiate a
controller recovery which includes teardown and re-connect of all queues.

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Arie Gershberg <agershberg@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
---
 drivers/nvme/host/tcp-offload.c | 127 +++++++++++++++++++++++++++++++-
 drivers/nvme/host/tcp-offload.h |   1 +
 2 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
index 806247722a35..3af89a938d1b 100644
--- a/drivers/nvme/host/tcp-offload.c
+++ b/drivers/nvme/host/tcp-offload.c
@@ -72,6 +72,23 @@ void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev)
 }
 EXPORT_SYMBOL_GPL(nvme_tcp_ofld_unregister_dev);
 
+/**
+ * nvme_tcp_ofld_error_recovery() - NVMeTCP Offload library error recovery
+ * function.
+ * @nctrl:	NVMe controller instance to change to resetting.
+ *
+ * API function that changes the controller state to resetting.
+ * Part of the overall controller reset sequence.
+ */
+void nvme_tcp_ofld_error_recovery(struct nvme_ctrl *nctrl)
+{
+	if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_RESETTING))
+		return;
+
+	queue_work(nvme_reset_wq, &to_tcp_ofld_ctrl(nctrl)->err_work);
+}
+EXPORT_SYMBOL_GPL(nvme_tcp_ofld_error_recovery);
+
 /**
  * nvme_tcp_ofld_report_queue_err() - NVMeTCP Offload report error event
  * callback function. Pointed to by nvme_tcp_ofld_queue->report_err.
@@ -82,7 +99,8 @@ EXPORT_SYMBOL_GPL(nvme_tcp_ofld_unregister_dev);
  */
 int nvme_tcp_ofld_report_queue_err(struct nvme_tcp_ofld_queue *queue)
 {
-	/* Placeholder - invoke error recovery flow */
+	pr_err("nvme-tcp-offload queue error\n");
+	nvme_tcp_ofld_error_recovery(&queue->ctrl->nctrl);
 
 	return 0;
 }
@@ -302,6 +320,28 @@ nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new)
 	return rc;
 }
 
+static void nvme_tcp_ofld_reconnect_or_remove(struct nvme_ctrl *nctrl)
+{
+	/* If we are resetting/deleting then do nothing */
+	if (nctrl->state != NVME_CTRL_CONNECTING) {
+		WARN_ON_ONCE(nctrl->state == NVME_CTRL_NEW ||
+			     nctrl->state == NVME_CTRL_LIVE);
+
+		return;
+	}
+
+	if (nvmf_should_reconnect(nctrl)) {
+		dev_info(nctrl->device, "Reconnecting in %d seconds...\n",
+			 nctrl->opts->reconnect_delay);
+		queue_delayed_work(nvme_wq,
+				   &to_tcp_ofld_ctrl(nctrl)->connect_work,
+				   nctrl->opts->reconnect_delay * HZ);
+	} else {
+		dev_info(nctrl->device, "Removing controller...\n");
+		nvme_delete_ctrl(nctrl);
+	}
+}
+
 static int nvme_tcp_ofld_setup_ctrl(struct nvme_ctrl *nctrl, bool new)
 {
 	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
@@ -426,10 +466,63 @@ nvme_tcp_ofld_teardown_io_queues(struct nvme_ctrl *nctrl, bool remove)
 	/* Placeholder - teardown_io_queues */
 }
 
+static void nvme_tcp_ofld_reconnect_ctrl_work(struct work_struct *work)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl =
+				container_of(to_delayed_work(work),
+					     struct nvme_tcp_ofld_ctrl,
+					     connect_work);
+	struct nvme_ctrl *nctrl = &ctrl->nctrl;
+
+	++nctrl->nr_reconnects;
+
+	if (nvme_tcp_ofld_setup_ctrl(nctrl, false))
+		goto requeue;
+
+	dev_info(nctrl->device, "Successfully reconnected (%d attempt)\n",
+		 nctrl->nr_reconnects);
+
+	nctrl->nr_reconnects = 0;
+
+	return;
+
+requeue:
+	dev_info(nctrl->device, "Failed reconnect attempt %d\n",
+		 nctrl->nr_reconnects);
+	nvme_tcp_ofld_reconnect_or_remove(nctrl);
+}
+
+static void nvme_tcp_ofld_error_recovery_work(struct work_struct *work)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl =
+		container_of(work, struct nvme_tcp_ofld_ctrl, err_work);
+	struct nvme_ctrl *nctrl = &ctrl->nctrl;
+
+	nvme_stop_keep_alive(nctrl);
+	nvme_tcp_ofld_teardown_io_queues(nctrl, false);
+	/* unquiesce to fail fast pending requests */
+	nvme_start_queues(nctrl);
+	nvme_tcp_ofld_teardown_admin_queue(nctrl, false);
+	blk_mq_unquiesce_queue(nctrl->admin_q);
+
+	if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_CONNECTING)) {
+		/* state change failure is ok if we started nctrl delete */
+		WARN_ON_ONCE(nctrl->state != NVME_CTRL_DELETING &&
+			     nctrl->state != NVME_CTRL_DELETING_NOIO);
+
+		return;
+	}
+
+	nvme_tcp_ofld_reconnect_or_remove(nctrl);
+}
+
 static void
 nvme_tcp_ofld_teardown_ctrl(struct nvme_ctrl *nctrl, bool shutdown)
 {
-	/* Placeholder - err_work and connect_work */
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+
+	cancel_work_sync(&ctrl->err_work);
+	cancel_delayed_work_sync(&ctrl->connect_work);
 	nvme_tcp_ofld_teardown_io_queues(nctrl, shutdown);
 	blk_mq_quiesce_queue(nctrl->admin_q);
 	if (shutdown)
@@ -444,6 +537,32 @@ static void nvme_tcp_ofld_delete_ctrl(struct nvme_ctrl *nctrl)
 	nvme_tcp_ofld_teardown_ctrl(nctrl, true);
 }
 
+static void nvme_tcp_ofld_reset_ctrl_work(struct work_struct *work)
+{
+	struct nvme_ctrl *nctrl =
+		container_of(work, struct nvme_ctrl, reset_work);
+
+	nvme_stop_ctrl(nctrl);
+	nvme_tcp_ofld_teardown_ctrl(nctrl, false);
+
+	if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_CONNECTING)) {
+		/* state change failure is ok if we started ctrl delete */
+		WARN_ON_ONCE(nctrl->state != NVME_CTRL_DELETING &&
+			     nctrl->state != NVME_CTRL_DELETING_NOIO);
+
+		return;
+	}
+
+	if (nvme_tcp_ofld_setup_ctrl(nctrl, false))
+		goto out_fail;
+
+	return;
+
+out_fail:
+	++nctrl->nr_reconnects;
+	nvme_tcp_ofld_reconnect_or_remove(nctrl);
+}
+
 static int
 nvme_tcp_ofld_init_request(struct blk_mq_tag_set *set,
 			   struct request *rq,
@@ -537,6 +656,10 @@ nvme_tcp_ofld_create_ctrl(struct device *ndev, struct nvmf_ctrl_options *opts)
 			     opts->nr_poll_queues + 1;
 	nctrl->sqsize = opts->queue_size - 1;
 	nctrl->kato = opts->kato;
+	INIT_DELAYED_WORK(&ctrl->connect_work,
+			  nvme_tcp_ofld_reconnect_ctrl_work);
+	INIT_WORK(&ctrl->err_work, nvme_tcp_ofld_error_recovery_work);
+	INIT_WORK(&nctrl->reset_work, nvme_tcp_ofld_reset_ctrl_work);
 	if (!(opts->mask & NVMF_OPT_TRSVCID)) {
 		opts->trsvcid =
 			kstrdup(__stringify(NVME_TCP_DISC_PORT), GFP_KERNEL);
diff --git a/drivers/nvme/host/tcp-offload.h b/drivers/nvme/host/tcp-offload.h
index d1c2c6171897..51fec632b72b 100644
--- a/drivers/nvme/host/tcp-offload.h
+++ b/drivers/nvme/host/tcp-offload.h
@@ -198,3 +198,4 @@ struct nvme_tcp_ofld_ops {
 /* Exported functions for offload device specific drivers */
 int nvme_tcp_ofld_register_dev(struct nvme_tcp_ofld_dev *dev);
 void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev);
+void nvme_tcp_ofld_error_recovery(struct nvme_ctrl *nctrl);
-- 
2.24.1

* [PATCH v4 07/20] nvme-tcp-offload: Add queue level implementation
  2021-06-29 12:47 [PATCH v4 00/20] NVMeTCP Offload ULP Prabhakar Kushwaha
                   ` (5 preceding siblings ...)
  2021-06-29 12:47 ` [PATCH v4 06/20] nvme-tcp-offload: Add controller level error recovery implementation Prabhakar Kushwaha
@ 2021-06-29 12:47 ` Prabhakar Kushwaha
  2021-06-29 12:47 ` [PATCH v4 08/20] nvme-tcp-offload: Add IO " Prabhakar Kushwaha
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Prabhakar Kushwaha @ 2021-06-29 12:47 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, axboe, kbusch
  Cc: davem, kuba, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	prabhakar.pkin, malin1024, Dean Balandin

From: Dean Balandin <dbalandin@marvell.com>

In this patch we implement the queue level functionality.
The implementation is similar to the nvme-tcp module; the main
difference is that we call the offload device specific create_queue
op, which creates the TCP connection and the NVMeTCP connection,
including the icreq+icresp negotiation.
Once create_queue returns successfully, we can move on to the fabrics
connect.
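
For reference, the vendor create_queue op invoked here has the
following shape (a minimal sketch only; the function name is
illustrative and the body is device specific):

static int example_create_queue(struct nvme_tcp_ofld_queue *queue, int qid,
				size_t queue_size)
{
	/* Establish the offloaded TCP connection and complete the NVMeTCP
	 * icreq/icresp negotiation before returning, so the ULP can go on
	 * to the fabrics connect on this queue.
	 */
	return 0;	/* 0 on success, negative errno on failure */
}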

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Dean Balandin <dbalandin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/tcp-offload.c | 420 +++++++++++++++++++++++++++++---
 drivers/nvme/host/tcp-offload.h |   4 +
 2 files changed, 396 insertions(+), 28 deletions(-)

diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
index 3af89a938d1b..26253b107db2 100644
--- a/drivers/nvme/host/tcp-offload.c
+++ b/drivers/nvme/host/tcp-offload.c
@@ -22,6 +22,11 @@ static inline struct nvme_tcp_ofld_ctrl *to_tcp_ofld_ctrl(struct nvme_ctrl *nc)
 	return container_of(nc, struct nvme_tcp_ofld_ctrl, nctrl);
 }
 
+static inline int nvme_tcp_ofld_qid(struct nvme_tcp_ofld_queue *queue)
+{
+	return queue - queue->ctrl->queues;
+}
+
 /**
  * nvme_tcp_ofld_register_dev() - NVMeTCP Offload Library registration
  * function.
@@ -196,19 +201,127 @@ nvme_tcp_ofld_alloc_tagset(struct nvme_ctrl *nctrl, bool admin)
 	return set;
 }
 
+static void __nvme_tcp_ofld_stop_queue(struct nvme_tcp_ofld_queue *queue)
+{
+	queue->dev->ops->drain_queue(queue);
+}
+
+static void nvme_tcp_ofld_stop_queue(struct nvme_ctrl *nctrl, int qid)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	struct nvme_tcp_ofld_queue *queue = &ctrl->queues[qid];
+
+	mutex_lock(&queue->queue_lock);
+	if (test_and_clear_bit(NVME_TCP_OFLD_Q_LIVE, &queue->flags))
+		__nvme_tcp_ofld_stop_queue(queue);
+	mutex_unlock(&queue->queue_lock);
+}
+
+static void nvme_tcp_ofld_stop_io_queues(struct nvme_ctrl *ctrl)
+{
+	int i;
+
+	for (i = 1; i < ctrl->queue_count; i++)
+		nvme_tcp_ofld_stop_queue(ctrl, i);
+}
+
+static void __nvme_tcp_ofld_free_queue(struct nvme_tcp_ofld_queue *queue)
+{
+	queue->dev->ops->destroy_queue(queue);
+}
+
+static void nvme_tcp_ofld_free_queue(struct nvme_ctrl *nctrl, int qid)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	struct nvme_tcp_ofld_queue *queue = &ctrl->queues[qid];
+
+	if (test_and_clear_bit(NVME_TCP_OFLD_Q_ALLOCATED, &queue->flags)) {
+		__nvme_tcp_ofld_free_queue(queue);
+		mutex_destroy(&queue->queue_lock);
+	}
+}
+
+static void
+nvme_tcp_ofld_free_io_queues(struct nvme_ctrl *nctrl)
+{
+	int i;
+
+	for (i = 1; i < nctrl->queue_count; i++)
+		nvme_tcp_ofld_free_queue(nctrl, i);
+}
+
+static void nvme_tcp_ofld_destroy_io_queues(struct nvme_ctrl *nctrl,
+					    bool remove)
+{
+	nvme_tcp_ofld_stop_io_queues(nctrl);
+	if (remove) {
+		blk_cleanup_queue(nctrl->connect_q);
+		blk_mq_free_tag_set(nctrl->tagset);
+	}
+	nvme_tcp_ofld_free_io_queues(nctrl);
+}
+
+static void nvme_tcp_ofld_destroy_admin_queue(struct nvme_ctrl *nctrl,
+					      bool remove)
+{
+	nvme_tcp_ofld_stop_queue(nctrl, 0);
+	if (remove) {
+		blk_cleanup_queue(nctrl->admin_q);
+		blk_cleanup_queue(nctrl->fabrics_q);
+		blk_mq_free_tag_set(nctrl->admin_tagset);
+	}
+	nvme_tcp_ofld_free_queue(nctrl, 0);
+}
+
+static int nvme_tcp_ofld_start_queue(struct nvme_ctrl *nctrl, int qid)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	struct nvme_tcp_ofld_queue *queue = &ctrl->queues[qid];
+	int rc;
+
+	queue = &ctrl->queues[qid];
+	if (qid) {
+		queue->cmnd_capsule_len = nctrl->ioccsz * 16;
+		rc = nvmf_connect_io_queue(nctrl, qid, false);
+	} else {
+		queue->cmnd_capsule_len = sizeof(struct nvme_command) +
+						NVME_TCP_ADMIN_CCSZ;
+		rc = nvmf_connect_admin_queue(nctrl);
+	}
+
+	if (!rc) {
+		set_bit(NVME_TCP_OFLD_Q_LIVE, &queue->flags);
+	} else {
+		if (test_bit(NVME_TCP_OFLD_Q_ALLOCATED, &queue->flags))
+			__nvme_tcp_ofld_stop_queue(queue);
+		dev_err(nctrl->device,
+			"failed to connect queue: %d ret=%d\n", qid, rc);
+	}
+
+	return rc;
+}
+
 static int nvme_tcp_ofld_configure_admin_queue(struct nvme_ctrl *nctrl,
 					       bool new)
 {
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	struct nvme_tcp_ofld_queue *queue = &ctrl->queues[0];
 	int rc;
 
-	/* Placeholder - alloc_admin_queue */
+	mutex_init(&queue->queue_lock);
+
+	rc = ctrl->dev->ops->create_queue(queue, 0, NVME_AQ_DEPTH);
+	if (rc)
+		return rc;
+
+	set_bit(NVME_TCP_OFLD_Q_ALLOCATED, &queue->flags);
 	if (new) {
 		nctrl->admin_tagset =
 				nvme_tcp_ofld_alloc_tagset(nctrl, true);
 		if (IS_ERR(nctrl->admin_tagset)) {
 			rc = PTR_ERR(nctrl->admin_tagset);
 			nctrl->admin_tagset = NULL;
-			goto out_destroy_queue;
+			goto out_free_queue;
 		}
 
 		nctrl->fabrics_q = blk_mq_init_queue(nctrl->admin_tagset);
@@ -226,7 +339,9 @@ static int nvme_tcp_ofld_configure_admin_queue(struct nvme_ctrl *nctrl,
 		}
 	}
 
-	/* Placeholder - nvme_tcp_ofld_start_queue */
+	rc = nvme_tcp_ofld_start_queue(nctrl, 0);
+	if (rc)
+		goto out_cleanup_queue;
 
 	rc = nvme_enable_ctrl(nctrl);
 	if (rc)
@@ -243,19 +358,143 @@ static int nvme_tcp_ofld_configure_admin_queue(struct nvme_ctrl *nctrl,
 out_quiesce_queue:
 	blk_mq_quiesce_queue(nctrl->admin_q);
 	blk_sync_queue(nctrl->admin_q);
-
 out_stop_queue:
-	/* Placeholder - stop offload queue */
+	nvme_tcp_ofld_stop_queue(nctrl, 0);
 	nvme_cancel_admin_tagset(nctrl);
-
+out_cleanup_queue:
+	if (new)
+		blk_cleanup_queue(nctrl->admin_q);
 out_cleanup_fabrics_q:
 	if (new)
 		blk_cleanup_queue(nctrl->fabrics_q);
 out_free_tagset:
 	if (new)
 		blk_mq_free_tag_set(nctrl->admin_tagset);
-out_destroy_queue:
-	/* Placeholder - free admin queue */
+out_free_queue:
+	nvme_tcp_ofld_free_queue(nctrl, 0);
+
+	return rc;
+}
+
+static unsigned int nvme_tcp_ofld_nr_io_queues(struct nvme_ctrl *nctrl)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	struct nvme_tcp_ofld_dev *dev = ctrl->dev;
+	u32 hw_vectors = dev->num_hw_vectors;
+	u32 nr_write_queues, nr_poll_queues;
+	u32 nr_io_queues, nr_total_queues;
+
+	nr_io_queues = min3(nctrl->opts->nr_io_queues, num_online_cpus(),
+			    hw_vectors);
+	nr_write_queues = min3(nctrl->opts->nr_write_queues, num_online_cpus(),
+			       hw_vectors);
+	nr_poll_queues = min3(nctrl->opts->nr_poll_queues, num_online_cpus(),
+			      hw_vectors);
+
+	nr_total_queues = nr_io_queues + nr_write_queues + nr_poll_queues;
+
+	return nr_total_queues;
+}
+
+static void
+nvme_tcp_ofld_set_io_queues(struct nvme_ctrl *nctrl, unsigned int nr_io_queues)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	struct nvmf_ctrl_options *opts = nctrl->opts;
+
+	if (opts->nr_write_queues && opts->nr_io_queues < nr_io_queues) {
+		/*
+		 * separate read/write queues
+		 * hand out dedicated default queues only after we have
+		 * sufficient read queues.
+		 */
+		ctrl->io_queues[HCTX_TYPE_READ] = opts->nr_io_queues;
+		nr_io_queues -= ctrl->io_queues[HCTX_TYPE_READ];
+		ctrl->io_queues[HCTX_TYPE_DEFAULT] =
+			min(opts->nr_write_queues, nr_io_queues);
+		nr_io_queues -= ctrl->io_queues[HCTX_TYPE_DEFAULT];
+	} else {
+		/*
+		 * shared read/write queues
+		 * either no write queues were requested, or we don't have
+		 * sufficient queue count to have dedicated default queues.
+		 */
+		ctrl->io_queues[HCTX_TYPE_DEFAULT] =
+			min(opts->nr_io_queues, nr_io_queues);
+		nr_io_queues -= ctrl->io_queues[HCTX_TYPE_DEFAULT];
+	}
+
+	if (opts->nr_poll_queues && nr_io_queues) {
+		/* map dedicated poll queues only if we have queues left */
+		ctrl->io_queues[HCTX_TYPE_POLL] =
+			min(opts->nr_poll_queues, nr_io_queues);
+	}
+}
+
+static int nvme_tcp_ofld_create_io_queues(struct nvme_ctrl *nctrl)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	int i, rc;
+
+	for (i = 1; i < nctrl->queue_count; i++) {
+		mutex_init(&ctrl->queues[i].queue_lock);
+
+		rc = ctrl->dev->ops->create_queue(&ctrl->queues[i],
+						  i, nctrl->sqsize + 1);
+		if (rc)
+			goto out_free_queues;
+
+		set_bit(NVME_TCP_OFLD_Q_ALLOCATED, &ctrl->queues[i].flags);
+	}
+
+	return 0;
+
+out_free_queues:
+	for (i--; i >= 1; i--)
+		nvme_tcp_ofld_free_queue(nctrl, i);
+
+	return rc;
+}
+
+static int nvme_tcp_ofld_alloc_io_queues(struct nvme_ctrl *nctrl)
+{
+	unsigned int nr_io_queues;
+	int rc;
+
+	nr_io_queues = nvme_tcp_ofld_nr_io_queues(nctrl);
+	rc = nvme_set_queue_count(nctrl, &nr_io_queues);
+	if (rc)
+		return rc;
+
+	nctrl->queue_count = nr_io_queues + 1;
+	if (nctrl->queue_count < 2) {
+		dev_err(nctrl->device,
+			"unable to set any I/O queues\n");
+
+		return -ENOMEM;
+	}
+
+	dev_info(nctrl->device, "creating %d I/O queues.\n", nr_io_queues);
+	nvme_tcp_ofld_set_io_queues(nctrl, nr_io_queues);
+
+	return nvme_tcp_ofld_create_io_queues(nctrl);
+}
+
+static int nvme_tcp_ofld_start_io_queues(struct nvme_ctrl *nctrl)
+{
+	int i, rc = 0;
+
+	for (i = 1; i < nctrl->queue_count; i++) {
+		rc = nvme_tcp_ofld_start_queue(nctrl, i);
+		if (rc)
+			goto out_stop_queues;
+	}
+
+	return 0;
+
+out_stop_queues:
+	for (i--; i >= 1; i--)
+		nvme_tcp_ofld_stop_queue(nctrl, i);
 
 	return rc;
 }
@@ -263,9 +502,10 @@ static int nvme_tcp_ofld_configure_admin_queue(struct nvme_ctrl *nctrl,
 static int
 nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new)
 {
-	int rc;
+	int rc = nvme_tcp_ofld_alloc_io_queues(nctrl);
 
-	/* Placeholder - alloc_io_queues */
+	if (rc)
+		return rc;
 
 	if (new) {
 		nctrl->tagset = nvme_tcp_ofld_alloc_tagset(nctrl, false);
@@ -283,7 +523,9 @@ nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new)
 		}
 	}
 
-	/* Placeholder - start_io_queues */
+	rc = nvme_tcp_ofld_start_io_queues(nctrl);
+	if (rc)
+		goto out_cleanup_connect_q;
 
 	if (!new) {
 		nvme_start_queues(nctrl);
@@ -306,16 +548,16 @@ nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new)
 out_wait_freeze_timed_out:
 	nvme_stop_queues(nctrl);
 	nvme_sync_io_queues(nctrl);
-
-	/* Placeholder - Stop IO queues */
-
+	nvme_tcp_ofld_stop_io_queues(nctrl);
+out_cleanup_connect_q:
+	nvme_cancel_tagset(nctrl);
 	if (new)
 		blk_cleanup_queue(nctrl->connect_q);
 out_free_tag_set:
 	if (new)
 		blk_mq_free_tag_set(nctrl->tagset);
 out_free_io_queues:
-	/* Placeholder - free_io_queues */
+	nvme_tcp_ofld_free_io_queues(nctrl);
 
 	return rc;
 }
@@ -342,6 +584,17 @@ static void nvme_tcp_ofld_reconnect_or_remove(struct nvme_ctrl *nctrl)
 	}
 }
 
+static int
+nvme_tcp_ofld_init_admin_hctx(struct blk_mq_hw_ctx *hctx, void *data,
+			      unsigned int hctx_idx)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = data;
+
+	hctx->driver_data = &ctrl->queues[0];
+
+	return 0;
+}
+
 static int nvme_tcp_ofld_setup_ctrl(struct nvme_ctrl *nctrl, bool new)
 {
 	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
@@ -404,9 +657,19 @@ static int nvme_tcp_ofld_setup_ctrl(struct nvme_ctrl *nctrl, bool new)
 	return 0;
 
 destroy_io:
-	/* Placeholder - stop and destroy io queues*/
+	if (nctrl->queue_count > 1) {
+		nvme_stop_queues(nctrl);
+		nvme_sync_io_queues(nctrl);
+		nvme_tcp_ofld_stop_io_queues(nctrl);
+		nvme_cancel_tagset(nctrl);
+		nvme_tcp_ofld_destroy_io_queues(nctrl, new);
+	}
 destroy_admin:
-	/* Placeholder - stop and destroy admin queue*/
+	blk_mq_quiesce_queue(nctrl->admin_q);
+	blk_sync_queue(nctrl->admin_q);
+	nvme_tcp_ofld_stop_queue(nctrl, 0);
+	nvme_cancel_admin_tagset(nctrl);
+	nvme_tcp_ofld_destroy_admin_queue(nctrl, new);
 out_release_ctrl:
 	ctrl->dev->ops->release_ctrl(ctrl);
 
@@ -455,15 +718,37 @@ static void nvme_tcp_ofld_free_ctrl(struct nvme_ctrl *nctrl)
 }
 
 static void
-nvme_tcp_ofld_teardown_admin_queue(struct nvme_ctrl *ctrl, bool remove)
+nvme_tcp_ofld_teardown_admin_queue(struct nvme_ctrl *nctrl, bool remove)
 {
-	/* Placeholder - teardown_admin_queue */
+	blk_mq_quiesce_queue(nctrl->admin_q);
+	blk_sync_queue(nctrl->admin_q);
+
+	nvme_tcp_ofld_stop_queue(nctrl, 0);
+	nvme_cancel_admin_tagset(nctrl);
+
+	if (remove)
+		blk_mq_unquiesce_queue(nctrl->admin_q);
+
+	nvme_tcp_ofld_destroy_admin_queue(nctrl, remove);
 }
 
 static void
 nvme_tcp_ofld_teardown_io_queues(struct nvme_ctrl *nctrl, bool remove)
 {
-	/* Placeholder - teardown_io_queues */
+	if (nctrl->queue_count <= 1)
+		return;
+
+	blk_mq_quiesce_queue(nctrl->admin_q);
+	nvme_start_freeze(nctrl);
+	nvme_stop_queues(nctrl);
+	nvme_sync_io_queues(nctrl);
+	nvme_tcp_ofld_stop_io_queues(nctrl);
+	nvme_cancel_tagset(nctrl);
+
+	if (remove)
+		nvme_start_queues(nctrl);
+
+	nvme_tcp_ofld_destroy_io_queues(nctrl, remove);
 }
 
 static void nvme_tcp_ofld_reconnect_ctrl_work(struct work_struct *work)
@@ -578,6 +863,12 @@ nvme_tcp_ofld_init_request(struct blk_mq_tag_set *set,
 	return 0;
 }
 
+inline size_t nvme_tcp_ofld_inline_data_size(struct nvme_tcp_ofld_queue *queue)
+{
+	return queue->cmnd_capsule_len - sizeof(struct nvme_command);
+}
+EXPORT_SYMBOL_GPL(nvme_tcp_ofld_inline_data_size);
+
 static blk_status_t
 nvme_tcp_ofld_queue_rq(struct blk_mq_hw_ctx *hctx,
 		       const struct blk_mq_queue_data *bd)
@@ -589,22 +880,95 @@ nvme_tcp_ofld_queue_rq(struct blk_mq_hw_ctx *hctx,
 	return BLK_STS_OK;
 }
 
+static void
+nvme_tcp_ofld_exit_request(struct blk_mq_tag_set *set,
+			   struct request *rq, unsigned int hctx_idx)
+{
+	/*
+	 * Nothing is allocated in nvme_tcp_ofld_init_request,
+	 * hence empty.
+	 */
+}
+
+static int
+nvme_tcp_ofld_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
+			unsigned int hctx_idx)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = data;
+
+	hctx->driver_data = &ctrl->queues[hctx_idx + 1];
+
+	return 0;
+}
+
+static int nvme_tcp_ofld_map_queues(struct blk_mq_tag_set *set)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = set->driver_data;
+	struct nvmf_ctrl_options *opts = ctrl->nctrl.opts;
+
+	if (opts->nr_write_queues && ctrl->io_queues[HCTX_TYPE_READ]) {
+		/* separate read/write queues */
+		set->map[HCTX_TYPE_DEFAULT].nr_queues =
+			ctrl->io_queues[HCTX_TYPE_DEFAULT];
+		set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
+		set->map[HCTX_TYPE_READ].nr_queues =
+			ctrl->io_queues[HCTX_TYPE_READ];
+		set->map[HCTX_TYPE_READ].queue_offset =
+			ctrl->io_queues[HCTX_TYPE_DEFAULT];
+	} else {
+		/* shared read/write queues */
+		set->map[HCTX_TYPE_DEFAULT].nr_queues =
+			ctrl->io_queues[HCTX_TYPE_DEFAULT];
+		set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
+		set->map[HCTX_TYPE_READ].nr_queues =
+			ctrl->io_queues[HCTX_TYPE_DEFAULT];
+		set->map[HCTX_TYPE_READ].queue_offset = 0;
+	}
+	blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
+	blk_mq_map_queues(&set->map[HCTX_TYPE_READ]);
+
+	if (opts->nr_poll_queues && ctrl->io_queues[HCTX_TYPE_POLL]) {
+		/* map dedicated poll queues only if we have queues left */
+		set->map[HCTX_TYPE_POLL].nr_queues =
+				ctrl->io_queues[HCTX_TYPE_POLL];
+		set->map[HCTX_TYPE_POLL].queue_offset =
+			ctrl->io_queues[HCTX_TYPE_DEFAULT] +
+			ctrl->io_queues[HCTX_TYPE_READ];
+		blk_mq_map_queues(&set->map[HCTX_TYPE_POLL]);
+	}
+
+	dev_info(ctrl->nctrl.device,
+		 "mapped %d/%d/%d default/read/poll queues.\n",
+		 ctrl->io_queues[HCTX_TYPE_DEFAULT],
+		 ctrl->io_queues[HCTX_TYPE_READ],
+		 ctrl->io_queues[HCTX_TYPE_POLL]);
+
+	return 0;
+}
+
+static int nvme_tcp_ofld_poll(struct blk_mq_hw_ctx *hctx)
+{
+	/* Placeholder - Implement polling mechanism */
+
+	return 0;
+}
+
 static struct blk_mq_ops nvme_tcp_ofld_mq_ops = {
 	.queue_rq	= nvme_tcp_ofld_queue_rq,
+	.complete	= nvme_complete_rq,
 	.init_request	= nvme_tcp_ofld_init_request,
-	/*
-	 * All additional ops will be also implemented and registered similar to
-	 * tcp.c
-	 */
+	.exit_request	= nvme_tcp_ofld_exit_request,
+	.init_hctx	= nvme_tcp_ofld_init_hctx,
+	.map_queues	= nvme_tcp_ofld_map_queues,
+	.poll		= nvme_tcp_ofld_poll,
 };
 
 static struct blk_mq_ops nvme_tcp_ofld_admin_mq_ops = {
 	.queue_rq	= nvme_tcp_ofld_queue_rq,
+	.complete	= nvme_complete_rq,
 	.init_request	= nvme_tcp_ofld_init_request,
-	/*
-	 * All additional ops will be also implemented and registered similar to
-	 * tcp.c
-	 */
+	.exit_request	= nvme_tcp_ofld_exit_request,
+	.init_hctx	= nvme_tcp_ofld_init_admin_hctx,
 };
 
 static const struct nvme_ctrl_ops nvme_tcp_ofld_ctrl_ops = {
diff --git a/drivers/nvme/host/tcp-offload.h b/drivers/nvme/host/tcp-offload.h
index 51fec632b72b..b3502c01394e 100644
--- a/drivers/nvme/host/tcp-offload.h
+++ b/drivers/nvme/host/tcp-offload.h
@@ -65,6 +65,9 @@ struct nvme_tcp_ofld_queue {
 	unsigned long flags;
 	size_t cmnd_capsule_len;
 
+	/* mutex used during stop_queue */
+	struct mutex queue_lock;
+
 	u8 hdr_digest;
 	u8 data_digest;
 	u8 tos;
@@ -199,3 +202,4 @@ struct nvme_tcp_ofld_ops {
 int nvme_tcp_ofld_register_dev(struct nvme_tcp_ofld_dev *dev);
 void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev);
 void nvme_tcp_ofld_error_recovery(struct nvme_ctrl *nctrl);
+inline size_t nvme_tcp_ofld_inline_data_size(struct nvme_tcp_ofld_queue *queue);
-- 
2.24.1

* [PATCH v4 08/20] nvme-tcp-offload: Add IO level implementation
  2021-06-29 12:47 [PATCH v4 00/20] NVMeTCP Offload ULP Prabhakar Kushwaha
                   ` (6 preceding siblings ...)
  2021-06-29 12:47 ` [PATCH v4 07/20] nvme-tcp-offload: Add queue level implementation Prabhakar Kushwaha
@ 2021-06-29 12:47 ` Prabhakar Kushwaha
  2021-06-29 12:47 ` [PATCH v4 09/20] qedn: Add qedn - Marvell's NVMeTCP HW offload device driver Prabhakar Kushwaha
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Prabhakar Kushwaha @ 2021-06-29 12:47 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, axboe, kbusch
  Cc: davem, kuba, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	prabhakar.pkin, malin1024, Dean Balandin

From: Dean Balandin <dbalandin@marvell.com>

In this patch, we present the IO level functionality.
The nvme-tcp-offload works at the IO level, meaning the
nvme-tcp-offload ULP module passes the request to the nvme-tcp-offload
driver and then waits for the request completion.
No additional handling is needed in between; this design reduces CPU
utilization, as described below.

The nvme-tcp-offload driver registers with the nvme-tcp-offload ULP
using the following IO-path ops:
 - send_req - passes the request to the offload device driver, which
   hands it to the offload device
 - poll_queue

The offload device driver manages the context from which the request
is executed and the request aggregation.
Once the IO has completed, the nvme-tcp-offload driver calls the
request's done() callback, which invokes the nvme-tcp-offload ULP
layer to complete the request.

This patch also adds support for the nvme-tcp-offload timeout and
the nvme-tcp-offload ASYNC flow.
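
As a minimal sketch (not part of this patch), a vendor driver
completing an IO from its completion handler calls back into the ULP
through the request's done callback; the function name below is
illustrative:

static void example_complete_request(struct nvme_tcp_ofld_req *req,
				     union nvme_result *result, __le16 status)
{
	/* For regular requests req->done points to nvme_tcp_ofld_req_done,
	 * and for the AER request to nvme_tcp_ofld_async_req_done.
	 */
	req->done(req, result, status);
}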

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Dean Balandin <dbalandin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
---
 drivers/nvme/host/tcp-offload.c | 181 ++++++++++++++++++++++++++++++--
 drivers/nvme/host/tcp-offload.h |   2 +
 2 files changed, 176 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
index 26253b107db2..501006ec9c97 100644
--- a/drivers/nvme/host/tcp-offload.c
+++ b/drivers/nvme/host/tcp-offload.c
@@ -125,7 +125,30 @@ void nvme_tcp_ofld_req_done(struct nvme_tcp_ofld_req *req,
 			    union nvme_result *result,
 			    __le16 status)
 {
-	/* Placeholder - complete request with/without error */
+	struct request *rq = blk_mq_rq_from_pdu(req);
+
+	if (!nvme_try_complete_req(rq, cpu_to_le16(status << 1), *result))
+		nvme_complete_rq(rq);
+}
+
+/**
+ * nvme_tcp_ofld_async_req_done() - NVMeTCP Offload request done callback
+ * function for async request. Pointed to by nvme_tcp_ofld_req->done.
+ * Handles both NVME_TCP_F_DATA_SUCCESS flag and NVMe CQ.
+ * @req:	NVMeTCP offload request to complete.
+ * @result:     The nvme_result.
+ * @status:     The completion status.
+ *
+ * API function that allows the offload device specific driver to report
+ * request completions to the common offload layer.
+ */
+void nvme_tcp_ofld_async_req_done(struct nvme_tcp_ofld_req *req,
+				  union nvme_result *result, __le16 status)
+{
+	struct nvme_tcp_ofld_queue *queue = req->queue;
+	struct nvme_tcp_ofld_ctrl *ctrl = queue->ctrl;
+
+	nvme_complete_async_event(&ctrl->nctrl, status, result);
 }
 
 static struct nvme_tcp_ofld_dev *
@@ -717,6 +740,57 @@ static void nvme_tcp_ofld_free_ctrl(struct nvme_ctrl *nctrl)
 	kfree(ctrl);
 }
 
+static void nvme_tcp_ofld_set_sg_null(struct nvme_command *c)
+{
+	struct nvme_sgl_desc *sg = &c->common.dptr.sgl;
+
+	sg->addr = 0;
+	sg->length = 0;
+	sg->type = (NVME_TRANSPORT_SGL_DATA_DESC << 4) |
+			NVME_SGL_FMT_TRANSPORT_A;
+}
+
+inline void nvme_tcp_ofld_set_sg_inline(struct nvme_tcp_ofld_queue *queue,
+					struct nvme_command *c, u32 data_len)
+{
+	struct nvme_sgl_desc *sg = &c->common.dptr.sgl;
+
+	sg->addr = cpu_to_le64(queue->ctrl->nctrl.icdoff);
+	sg->length = cpu_to_le32(data_len);
+	sg->type = (NVME_SGL_FMT_DATA_DESC << 4) | NVME_SGL_FMT_OFFSET;
+}
+
+static void nvme_tcp_ofld_set_sg_host_data(struct nvme_command *c,
+					   u32 data_len)
+{
+	struct nvme_sgl_desc *sg = &c->common.dptr.sgl;
+
+	sg->addr = 0;
+	sg->length = cpu_to_le32(data_len);
+	sg->type = (NVME_TRANSPORT_SGL_DATA_DESC << 4) |
+			NVME_SGL_FMT_TRANSPORT_A;
+}
+
+static void nvme_tcp_ofld_submit_async_event(struct nvme_ctrl *arg)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(arg);
+	struct nvme_tcp_ofld_queue *queue = &ctrl->queues[0];
+	struct nvme_tcp_ofld_dev *dev = queue->dev;
+	struct nvme_tcp_ofld_ops *ops = dev->ops;
+
+	ctrl->async_req.nvme_cmd.common.opcode = nvme_admin_async_event;
+	ctrl->async_req.nvme_cmd.common.command_id = NVME_AQ_BLK_MQ_DEPTH;
+	ctrl->async_req.nvme_cmd.common.flags |= NVME_CMD_SGL_METABUF;
+
+	nvme_tcp_ofld_set_sg_null(&ctrl->async_req.nvme_cmd);
+
+	ctrl->async_req.async = true;
+	ctrl->async_req.queue = queue;
+	ctrl->async_req.done = nvme_tcp_ofld_async_req_done;
+
+	ops->send_req(&ctrl->async_req);
+}
+
 static void
 nvme_tcp_ofld_teardown_admin_queue(struct nvme_ctrl *nctrl, bool remove)
 {
@@ -855,9 +929,13 @@ nvme_tcp_ofld_init_request(struct blk_mq_tag_set *set,
 			   unsigned int numa_node)
 {
 	struct nvme_tcp_ofld_req *req = blk_mq_rq_to_pdu(rq);
+	struct nvme_tcp_ofld_ctrl *ctrl = set->driver_data;
+	int qid;
 
-	/* Placeholder - init request */
-
+	qid = (set == &ctrl->tag_set) ? hctx_idx + 1 : 0;
+	req->queue = &ctrl->queues[qid];
+	nvme_req(rq)->ctrl = &ctrl->nctrl;
+	nvme_req(rq)->cmd = &req->nvme_cmd;
 	req->done = nvme_tcp_ofld_req_done;
 
 	return 0;
@@ -873,9 +951,46 @@ static blk_status_t
 nvme_tcp_ofld_queue_rq(struct blk_mq_hw_ctx *hctx,
 		       const struct blk_mq_queue_data *bd)
 {
-	/* Call nvme_setup_cmd(...) */
+	struct nvme_tcp_ofld_req *req = blk_mq_rq_to_pdu(bd->rq);
+	struct nvme_tcp_ofld_queue *queue = hctx->driver_data;
+	struct nvme_tcp_ofld_ctrl *ctrl = queue->ctrl;
+	struct nvme_ns *ns = hctx->queue->queuedata;
+	struct nvme_tcp_ofld_dev *dev = queue->dev;
+	struct nvme_tcp_ofld_ops *ops = dev->ops;
+	struct nvme_command *nvme_cmd;
+	struct request *rq = bd->rq;
+	bool queue_ready;
+	u32 data_len;
+	int rc;
+
+	queue_ready = test_bit(NVME_TCP_OFLD_Q_LIVE, &queue->flags);
+
+	req->async = false;
+
+	if (!nvme_check_ready(&ctrl->nctrl, rq, queue_ready))
+		return nvme_fail_nonready_command(&ctrl->nctrl, rq);
+
+	rc = nvme_setup_cmd(ns, rq);
+	if (unlikely(rc))
+		return rc;
 
-	/* Call ops->send_req(...) */
+	blk_mq_start_request(rq);
+
+	nvme_cmd = &req->nvme_cmd;
+	nvme_cmd->common.flags |= NVME_CMD_SGL_METABUF;
+
+	data_len = blk_rq_nr_phys_segments(rq) ? blk_rq_payload_bytes(rq) : 0;
+	if (!data_len)
+		nvme_tcp_ofld_set_sg_null(&req->nvme_cmd);
+	else if ((rq_data_dir(rq) == WRITE) &&
+		 data_len <= nvme_tcp_ofld_inline_data_size(queue))
+		nvme_tcp_ofld_set_sg_inline(queue, nvme_cmd, data_len);
+	else
+		nvme_tcp_ofld_set_sg_host_data(nvme_cmd, data_len);
+
+	rc = ops->send_req(req);
+	if (unlikely(rc))
+		return rc;
 
 	return BLK_STS_OK;
 }
@@ -948,9 +1063,58 @@ static int nvme_tcp_ofld_map_queues(struct blk_mq_tag_set *set)
 
 static int nvme_tcp_ofld_poll(struct blk_mq_hw_ctx *hctx)
 {
-	/* Placeholder - Implement polling mechanism */
+	struct nvme_tcp_ofld_queue *queue = hctx->driver_data;
+	struct nvme_tcp_ofld_dev *dev = queue->dev;
+	struct nvme_tcp_ofld_ops *ops = dev->ops;
 
-	return 0;
+	return ops->poll_queue(queue);
+}
+
+static void nvme_tcp_ofld_complete_timed_out(struct request *rq)
+{
+	struct nvme_tcp_ofld_req *req = blk_mq_rq_to_pdu(rq);
+	struct nvme_ctrl *nctrl = &req->queue->ctrl->nctrl;
+
+	nvme_tcp_ofld_stop_queue(nctrl, nvme_tcp_ofld_qid(req->queue));
+	if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) {
+		nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD;
+		blk_mq_complete_request(rq);
+	}
+}
+
+static enum blk_eh_timer_return nvme_tcp_ofld_timeout(struct request *rq,
+						      bool reserved)
+{
+	struct nvme_tcp_ofld_req *req = blk_mq_rq_to_pdu(rq);
+	struct nvme_tcp_ofld_ctrl *ctrl = req->queue->ctrl;
+
+	dev_warn(ctrl->nctrl.device,
+		 "queue %d: timeout request %#x type %d\n",
+		 nvme_tcp_ofld_qid(req->queue), rq->tag,
+		 req->nvme_cmd.common.opcode);
+
+	if (ctrl->nctrl.state != NVME_CTRL_LIVE) {
+		/*
+		 * If we are resetting, connecting or deleting we should
+		 * complete immediately because we may block controller
+		 * teardown or setup sequence
+		 * - ctrl disable/shutdown fabrics requests
+		 * - connect requests
+		 * - initialization admin requests
+		 * - I/O requests that entered after unquiescing and
+		 *   the controller stopped responding
+		 *
+		 * All other requests should be cancelled by the error
+		 * recovery work, so it's fine that we fail it here.
+		 */
+		nvme_tcp_ofld_complete_timed_out(rq);
+
+		return BLK_EH_DONE;
+	}
+
+	nvme_tcp_ofld_error_recovery(&ctrl->nctrl);
+
+	return BLK_EH_RESET_TIMER;
 }
 
 static struct blk_mq_ops nvme_tcp_ofld_mq_ops = {
@@ -959,6 +1123,7 @@ static struct blk_mq_ops nvme_tcp_ofld_mq_ops = {
 	.init_request	= nvme_tcp_ofld_init_request,
 	.exit_request	= nvme_tcp_ofld_exit_request,
 	.init_hctx	= nvme_tcp_ofld_init_hctx,
+	.timeout	= nvme_tcp_ofld_timeout,
 	.map_queues	= nvme_tcp_ofld_map_queues,
 	.poll		= nvme_tcp_ofld_poll,
 };
@@ -969,6 +1134,7 @@ static struct blk_mq_ops nvme_tcp_ofld_admin_mq_ops = {
 	.init_request	= nvme_tcp_ofld_init_request,
 	.exit_request	= nvme_tcp_ofld_exit_request,
 	.init_hctx	= nvme_tcp_ofld_init_admin_hctx,
+	.timeout	= nvme_tcp_ofld_timeout,
 };
 
 static const struct nvme_ctrl_ops nvme_tcp_ofld_ctrl_ops = {
@@ -979,6 +1145,7 @@ static const struct nvme_ctrl_ops nvme_tcp_ofld_ctrl_ops = {
 	.reg_read64		= nvmf_reg_read64,
 	.reg_write32		= nvmf_reg_write32,
 	.free_ctrl		= nvme_tcp_ofld_free_ctrl,
+	.submit_async_event     = nvme_tcp_ofld_submit_async_event,
 	.delete_ctrl		= nvme_tcp_ofld_delete_ctrl,
 	.get_address		= nvmf_get_address,
 };
diff --git a/drivers/nvme/host/tcp-offload.h b/drivers/nvme/host/tcp-offload.h
index b3502c01394e..a4c28ddaf3ab 100644
--- a/drivers/nvme/host/tcp-offload.h
+++ b/drivers/nvme/host/tcp-offload.h
@@ -115,6 +115,8 @@ struct nvme_tcp_ofld_ctrl {
 	/* Connectivity params */
 	struct nvme_tcp_ofld_ctrl_con_params conn_params;
 
+	struct nvme_tcp_ofld_req async_req;
+
 	/* Offload device driver context */
 	void *private_data;
 };
-- 
2.24.1

* [PATCH v4 09/20] qedn: Add qedn - Marvell's NVMeTCP HW offload device driver
  2021-06-29 12:47 [PATCH v4 00/20] NVMeTCP Offload ULP Prabhakar Kushwaha
                   ` (7 preceding siblings ...)
  2021-06-29 12:47 ` [PATCH v4 08/20] nvme-tcp-offload: Add IO " Prabhakar Kushwaha
@ 2021-06-29 12:47 ` Prabhakar Kushwaha
  2021-07-01 13:41   ` Christoph Hellwig
  2021-06-29 12:47 ` [PATCH v4 10/20] qedn: Add qedn probe Prabhakar Kushwaha
                   ` (11 subsequent siblings)
  20 siblings, 1 reply; 37+ messages in thread
From: Prabhakar Kushwaha @ 2021-06-29 12:47 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, axboe, kbusch
  Cc: davem, kuba, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	prabhakar.pkin, malin1024, Arie Gershberg

From: Shai Malin <smalin@marvell.com>

This patch presents the skeleton of the qedn driver.
The new driver is added under "drivers/nvme/hw/qedn" and is enabled by
the Kconfig option "Marvell NVM Express over Fabrics TCP offload".

The internal implementation:
- qedn.h:
  Includes all common structs used by the qedn offload device driver.

- qedn_main.c:
  Includes the qedn_init and qedn_cleanup implementation.
  As part of qedn init, the driver registers as a PCI device driver
  for the Marvell FastLinQ NICs.
  As part of the probe, the driver registers with the nvme_tcp_offload
  ULP.

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Arie Gershberg <agershberg@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 MAINTAINERS                      |  10 ++
 drivers/nvme/Kconfig             |   1 +
 drivers/nvme/Makefile            |   1 +
 drivers/nvme/hw/Kconfig          |   8 ++
 drivers/nvme/hw/Makefile         |   3 +
 drivers/nvme/hw/qedn/Makefile    |   5 +
 drivers/nvme/hw/qedn/qedn.h      |  19 +++
 drivers/nvme/hw/qedn/qedn_main.c | 200 +++++++++++++++++++++++++++++++
 8 files changed, 247 insertions(+)
 create mode 100644 drivers/nvme/hw/Kconfig
 create mode 100644 drivers/nvme/hw/Makefile
 create mode 100644 drivers/nvme/hw/qedn/Makefile
 create mode 100644 drivers/nvme/hw/qedn/qedn.h
 create mode 100644 drivers/nvme/hw/qedn/qedn_main.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 01fbebdc7722..207a62b768c5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15006,6 +15006,16 @@ S:	Supported
 F:	drivers/infiniband/hw/qedr/
 F:	include/uapi/rdma/qedr-abi.h
 
+QLOGIC QL4xxx NVME-TCP-OFFLOAD DRIVER
+M:	Shai Malin <smalin@marvell.com>
+M:	Ariel Elior <aelior@marvell.com>
+L:	linux-nvme@lists.infradead.org
+S:	Supported
+W:	http://git.infradead.org/nvme.git
+T:	git://git.infradead.org/nvme.git
+F:	drivers/nvme/hw/qedn/
+F:	include/linux/qed/
+
 QLOGIC QLA1280 SCSI DRIVER
 M:	Michael Reed <mdr@sgi.com>
 L:	linux-scsi@vger.kernel.org
diff --git a/drivers/nvme/Kconfig b/drivers/nvme/Kconfig
index 87ae409a32b9..827c2c9f0ad1 100644
--- a/drivers/nvme/Kconfig
+++ b/drivers/nvme/Kconfig
@@ -3,5 +3,6 @@ menu "NVME Support"
 
 source "drivers/nvme/host/Kconfig"
 source "drivers/nvme/target/Kconfig"
+source "drivers/nvme/hw/Kconfig"
 
 endmenu
diff --git a/drivers/nvme/Makefile b/drivers/nvme/Makefile
index fb42c44609a8..14c569040ef2 100644
--- a/drivers/nvme/Makefile
+++ b/drivers/nvme/Makefile
@@ -2,3 +2,4 @@
 
 obj-y		+= host/
 obj-y		+= target/
+obj-y		+= hw/
\ No newline at end of file
diff --git a/drivers/nvme/hw/Kconfig b/drivers/nvme/hw/Kconfig
new file mode 100644
index 000000000000..374f1f9dbd3d
--- /dev/null
+++ b/drivers/nvme/hw/Kconfig
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config NVME_QEDN
+	tristate "Marvell NVM Express over Fabrics TCP offload"
+	depends on NVME_TCP_OFFLOAD
+	help
+	  This enables the Marvell NVMe TCP offload support (qedn).
+
+	  If unsure, say N.
diff --git a/drivers/nvme/hw/Makefile b/drivers/nvme/hw/Makefile
new file mode 100644
index 000000000000..2f38e0520795
--- /dev/null
+++ b/drivers/nvme/hw/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_NVME_QEDN)		+= qedn/
diff --git a/drivers/nvme/hw/qedn/Makefile b/drivers/nvme/hw/qedn/Makefile
new file mode 100644
index 000000000000..1422cd878680
--- /dev/null
+++ b/drivers/nvme/hw/qedn/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_NVME_QEDN) := qedn.o
+
+qedn-y := qedn_main.o
diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
new file mode 100644
index 000000000000..bcd0748a10fd
--- /dev/null
+++ b/drivers/nvme/hw/qedn/qedn.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2021 Marvell. All rights reserved.
+ */
+
+#ifndef _QEDN_H_
+#define _QEDN_H_
+
+/* Driver includes */
+#include "../../host/tcp-offload.h"
+
+#define QEDN_MODULE_NAME "qedn"
+
+struct qedn_ctx {
+	struct pci_dev *pdev;
+	struct nvme_tcp_ofld_dev qedn_ofld_dev;
+};
+
+#endif /* _QEDN_H_ */
diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
new file mode 100644
index 000000000000..d81b7c65acdf
--- /dev/null
+++ b/drivers/nvme/hw/qedn/qedn_main.c
@@ -0,0 +1,200 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2021 Marvell. All rights reserved.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+ /* Kernel includes */
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+/* Driver includes */
+#include "qedn.h"
+
+#define CHIP_NUM_NVMETCP 0x8194
+
+static struct pci_device_id qedn_pci_tbl[] = {
+	{ PCI_VDEVICE(QLOGIC, CHIP_NUM_NVMETCP), 0 },
+	{0, 0},
+};
+
+static int
+qedn_claim_dev(struct nvme_tcp_ofld_dev *dev,
+	       struct nvme_tcp_ofld_ctrl *ctrl)
+{
+	/* Placeholder - qedn_claim_dev */
+
+	return 0;
+}
+
+static int qedn_setup_ctrl(struct nvme_tcp_ofld_ctrl *ctrl)
+{
+	/* Placeholder - qedn_setup_ctrl */
+
+	return 0;
+}
+
+static int qedn_release_ctrl(struct nvme_tcp_ofld_ctrl *ctrl)
+{
+	/* Placeholder - qedn_release_ctrl */
+
+	return 0;
+}
+
+static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid,
+			     size_t queue_size)
+{
+	/* Placeholder - qedn_create_queue */
+
+	return 0;
+}
+
+static void qedn_drain_queue(struct nvme_tcp_ofld_queue *queue)
+{
+	/* Placeholder - qedn_drain_queue */
+}
+
+static void qedn_destroy_queue(struct nvme_tcp_ofld_queue *queue)
+{
+	/* Placeholder - qedn_destroy_queue */
+}
+
+static int qedn_poll_queue(struct nvme_tcp_ofld_queue *queue)
+{
+	/*
+	 * Poll queue support will be added as part of future
+	 * enhancements.
+	 */
+
+	return 0;
+}
+
+static int qedn_send_req(struct nvme_tcp_ofld_req *req)
+{
+	/* Placeholder - qedn_send_req */
+
+	return 0;
+}
+
+static struct nvme_tcp_ofld_ops qedn_ofld_ops = {
+	.name = "qedn",
+	.module = THIS_MODULE,
+	.required_opts = NVMF_OPT_TRADDR,
+	.allowed_opts = NVMF_OPT_TRSVCID | NVMF_OPT_NR_WRITE_QUEUES |
+			NVMF_OPT_HOST_TRADDR | NVMF_OPT_CTRL_LOSS_TMO |
+			NVMF_OPT_RECONNECT_DELAY | NVMF_OPT_HOST_IFACE,
+		/* These flags will be added as part of future enhancements
+		 *	NVMF_OPT_HDR_DIGEST | NVMF_OPT_DATA_DIGEST |
+		 *	NVMF_OPT_NR_POLL_QUEUES | NVMF_OPT_TOS
+		 */
+	.claim_dev = qedn_claim_dev,
+	.setup_ctrl = qedn_setup_ctrl,
+	.release_ctrl = qedn_release_ctrl,
+	.create_queue = qedn_create_queue,
+	.drain_queue = qedn_drain_queue,
+	.destroy_queue = qedn_destroy_queue,
+	.poll_queue = qedn_poll_queue,
+	.send_req = qedn_send_req,
+};
+
+static void __qedn_remove(struct pci_dev *pdev)
+{
+	struct qedn_ctx *qedn = pci_get_drvdata(pdev);
+
+	pr_notice("Starting qedn_remove\n");
+	nvme_tcp_ofld_unregister_dev(&qedn->qedn_ofld_dev);
+	kfree(qedn);
+	pr_notice("Ending qedn_remove successfully\n");
+}
+
+static void qedn_remove(struct pci_dev *pdev)
+{
+	__qedn_remove(pdev);
+}
+
+static void qedn_shutdown(struct pci_dev *pdev)
+{
+	__qedn_remove(pdev);
+}
+
+static struct qedn_ctx *qedn_alloc_ctx(struct pci_dev *pdev)
+{
+	struct qedn_ctx *qedn = NULL;
+
+	qedn = kzalloc(sizeof(*qedn), GFP_KERNEL);
+	if (!qedn)
+		return NULL;
+
+	qedn->pdev = pdev;
+	pci_set_drvdata(pdev, qedn);
+
+	return qedn;
+}
+
+static int __qedn_probe(struct pci_dev *pdev)
+{
+	struct qedn_ctx *qedn;
+	int rc;
+
+	pr_notice("Starting qedn probe\n");
+
+	qedn = qedn_alloc_ctx(pdev);
+	if (!qedn)
+		return -ENODEV;
+
+	qedn->qedn_ofld_dev.ops = &qedn_ofld_ops;
+	INIT_LIST_HEAD(&qedn->qedn_ofld_dev.entry);
+	rc = nvme_tcp_ofld_register_dev(&qedn->qedn_ofld_dev);
+	if (rc)
+		goto release_qedn;
+
+	return 0;
+release_qedn:
+	kfree(qedn);
+
+	return rc;
+}
+
+static int qedn_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+	return __qedn_probe(pdev);
+}
+
+static struct pci_driver qedn_pci_driver = {
+	.name     = QEDN_MODULE_NAME,
+	.id_table = qedn_pci_tbl,
+	.probe    = qedn_probe,
+	.remove   = qedn_remove,
+	.shutdown = qedn_shutdown,
+};
+
+static int __init qedn_init(void)
+{
+	int rc;
+
+	rc = pci_register_driver(&qedn_pci_driver);
+	if (rc) {
+		pr_err("Failed to register pci driver\n");
+
+		return -EINVAL;
+	}
+
+	pr_notice("driver loaded successfully\n");
+
+	return 0;
+}
+
+static void __exit qedn_cleanup(void)
+{
+	pci_unregister_driver(&qedn_pci_driver);
+	pr_notice("Unloading qedn ended\n");
+}
+
+module_init(qedn_init);
+module_exit(qedn_cleanup);
+
+MODULE_LICENSE("GPL v2");
+MODULE_SOFTDEP("pre: qede nvme-fabrics nvme-tcp-offload");
+MODULE_DESCRIPTION("Marvell 25/50/100G NVMe-TCP Offload Host Driver");
+MODULE_AUTHOR("Marvell");
-- 
2.24.1

* [PATCH v4 10/20] qedn: Add qedn probe
  2021-06-29 12:47 [PATCH v4 00/20] NVMeTCP Offload ULP Prabhakar Kushwaha
                   ` (8 preceding siblings ...)
  2021-06-29 12:47 ` [PATCH v4 09/20] qedn: Add qedn - Marvell's NVMeTCP HW offload device driver Prabhakar Kushwaha
@ 2021-06-29 12:47 ` Prabhakar Kushwaha
  2021-07-01 13:48   ` Christoph Hellwig
  2021-06-29 12:47 ` [PATCH v4 11/20] qedn: Add qedn_claim_dev API support Prabhakar Kushwaha
                   ` (10 subsequent siblings)
  20 siblings, 1 reply; 37+ messages in thread
From: Prabhakar Kushwaha @ 2021-06-29 12:47 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, axboe, kbusch
  Cc: davem, kuba, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	prabhakar.pkin, malin1024, Dean Balandin

From: Shai Malin <smalin@marvell.com>

This patch introduces the functionality of loading and unloading the
physical function.
qedn_probe() loads the offload device PF (physical function) and
initializes the HW and the FW with the PF parameters, using the
qed_nvmetcp_ops HW ops, which are similar to the other "qed_*_ops"
used by the qede, qedr, qedf and qedi device drivers.
qedn_remove() unloads the offload device PF and re-initializes the HW
and the FW with the PF parameters.

The struct qedn_ctx is a per-PF container for PF-specific attributes
and resources.
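
Setup and teardown progress is tracked with per-step QEDN_STATE_* bits,
so a failed probe can funnel into the same __qedn_remove() path and
only undo the steps that actually completed. A condensed sketch of the
idiom (the helper name is illustrative; the calls are the ones used in
the patch below):

static void example_partial_teardown(struct qedn_ctx *qedn)
{
	/* Each setup step sets its bit; remove undoes a step only if
	 * the corresponding bit is still set.
	 */
	if (test_and_clear_bit(QEDN_STATE_CORE_OPEN, &qedn->state))
		qed_ops->common->slowpath_stop(qedn->cdev);

	if (test_and_clear_bit(QEDN_STATE_CORE_PROBED, &qedn->state))
		qed_ops->common->remove(qedn->cdev);
}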

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Dean Balandin <dbalandin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/hw/Kconfig          |   1 +
 drivers/nvme/hw/qedn/qedn.h      |  26 ++++++
 drivers/nvme/hw/qedn/qedn_main.c | 155 ++++++++++++++++++++++++++++++-
 3 files changed, 177 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/hw/Kconfig b/drivers/nvme/hw/Kconfig
index 374f1f9dbd3d..91b1bd6f07d8 100644
--- a/drivers/nvme/hw/Kconfig
+++ b/drivers/nvme/hw/Kconfig
@@ -2,6 +2,7 @@
 config NVME_QEDN
 	tristate "Marvell NVM Express over Fabrics TCP offload"
 	depends on NVME_TCP_OFFLOAD
+	select QED_NVMETCP
 	help
 	  This enables the Marvell NVMe TCP offload support (qedn).
 
diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
index bcd0748a10fd..931efc3afbaa 100644
--- a/drivers/nvme/hw/qedn/qedn.h
+++ b/drivers/nvme/hw/qedn/qedn.h
@@ -6,14 +6,40 @@
 #ifndef _QEDN_H_
 #define _QEDN_H_
 
+#include <linux/qed/qed_if.h>
+#include <linux/qed/qed_nvmetcp_if.h>
+
 /* Driver includes */
 #include "../../host/tcp-offload.h"
 
 #define QEDN_MODULE_NAME "qedn"
 
+#define QEDN_MAX_TASKS_PER_PF (16 * 1024)
+#define QEDN_MAX_CONNS_PER_PF (4 * 1024)
+#define QEDN_FW_CQ_SIZE (4 * 1024)
+#define QEDN_PROTO_CQ_PROD_IDX	0
+#define QEDN_NVMETCP_NUM_FW_CONN_QUEUE_PAGES 2
+
+enum qedn_state {
+	QEDN_STATE_CORE_PROBED = 0,
+	QEDN_STATE_CORE_OPEN,
+	QEDN_STATE_MFW_STATE,
+	QEDN_STATE_REGISTERED_OFFLOAD_DEV,
+	QEDN_STATE_MODULE_REMOVE_ONGOING,
+};
+
 struct qedn_ctx {
 	struct pci_dev *pdev;
+	struct qed_dev *cdev;
+	struct qed_dev_nvmetcp_info dev_info;
 	struct nvme_tcp_ofld_dev qedn_ofld_dev;
+	struct qed_pf_params pf_params;
+
+	/* Accessed with atomic bit ops, used with enum qedn_state */
+	unsigned long state;
+
+	/* Fast path queues */
+	u8 num_fw_cqs;
 };
 
 #endif /* _QEDN_H_ */
diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
index d81b7c65acdf..97591797605e 100644
--- a/drivers/nvme/hw/qedn/qedn_main.c
+++ b/drivers/nvme/hw/qedn/qedn_main.c
@@ -14,6 +14,9 @@
 
 #define CHIP_NUM_NVMETCP 0x8194
 
+const struct qed_nvmetcp_ops *qed_ops;
+
+/* Global context instance */
 static struct pci_device_id qedn_pci_tbl[] = {
 	{ PCI_VDEVICE(QLOGIC, CHIP_NUM_NVMETCP), 0 },
 	{0, 0},
@@ -98,12 +101,109 @@ static struct nvme_tcp_ofld_ops qedn_ofld_ops = {
 	.send_req = qedn_send_req,
 };
 
+static inline void qedn_init_pf_struct(struct qedn_ctx *qedn)
+{
+	/* Placeholder - Initialize qedn fields */
+}
+
+static inline void
+qedn_init_core_probe_params(struct qed_probe_params *probe_params)
+{
+	memset(probe_params, 0, sizeof(*probe_params));
+	probe_params->protocol = QED_PROTOCOL_NVMETCP;
+	probe_params->is_vf = false;
+	probe_params->recov_in_prog = 0;
+}
+
+static inline int qedn_core_probe(struct qedn_ctx *qedn)
+{
+	struct qed_probe_params probe_params;
+	int rc = 0;
+
+	qedn_init_core_probe_params(&probe_params);
+	pr_info("Starting QED probe\n");
+	qedn->cdev = qed_ops->common->probe(qedn->pdev, &probe_params);
+	if (!qedn->cdev) {
+		rc = -ENODEV;
+		pr_err("QED probe failed\n");
+	}
+
+	return rc;
+}
+
+static int qedn_set_nvmetcp_pf_param(struct qedn_ctx *qedn)
+{
+	u32 fw_conn_queue_pages = QEDN_NVMETCP_NUM_FW_CONN_QUEUE_PAGES;
+	struct qed_nvmetcp_pf_params *pf_params;
+
+	pf_params = &qedn->pf_params.nvmetcp_pf_params;
+	memset(pf_params, 0, sizeof(*pf_params));
+	qedn->num_fw_cqs = min_t(u8, qedn->dev_info.num_cqs, num_online_cpus());
+
+	pf_params->num_cons = QEDN_MAX_CONNS_PER_PF;
+	pf_params->num_tasks = QEDN_MAX_TASKS_PER_PF;
+
+	/* Placeholder - Initialize function level queues */
+
+	/* Placeholder - Initialize TCP params */
+
+	/* Queues */
+	pf_params->num_sq_pages_in_ring = fw_conn_queue_pages;
+	pf_params->num_r2tq_pages_in_ring = fw_conn_queue_pages;
+	pf_params->num_uhq_pages_in_ring = fw_conn_queue_pages;
+	pf_params->num_queues = qedn->num_fw_cqs;
+	pf_params->cq_num_entries = QEDN_FW_CQ_SIZE;
+
+	/* the CQ SB pi */
+	pf_params->gl_rq_pi = QEDN_PROTO_CQ_PROD_IDX;
+
+	return 0;
+}
+
+static inline int qedn_slowpath_start(struct qedn_ctx *qedn)
+{
+	struct qed_slowpath_params sp_params = {};
+	int rc = 0;
+
+	/* Start the Slowpath-process */
+	sp_params.int_mode = QED_INT_MODE_MSIX;
+	strscpy(sp_params.name, "qedn NVMeTCP", QED_DRV_VER_STR_SIZE);
+	rc = qed_ops->common->slowpath_start(qedn->cdev, &sp_params);
+	if (rc)
+		pr_err("Cannot start slowpath\n");
+
+	return rc;
+}
+
 static void __qedn_remove(struct pci_dev *pdev)
 {
 	struct qedn_ctx *qedn = pci_get_drvdata(pdev);
+	int rc;
+
+	pr_notice("Starting qedn_remove: abs PF id=%u\n",
+		  qedn->dev_info.common.abs_pf_id);
+
+	if (test_and_set_bit(QEDN_STATE_MODULE_REMOVE_ONGOING, &qedn->state)) {
+		pr_err("Remove already ongoing\n");
+
+		return;
+	}
+
+	if (test_and_clear_bit(QEDN_STATE_REGISTERED_OFFLOAD_DEV, &qedn->state))
+		nvme_tcp_ofld_unregister_dev(&qedn->qedn_ofld_dev);
+
+	if (test_and_clear_bit(QEDN_STATE_MFW_STATE, &qedn->state)) {
+		rc = qed_ops->common->update_drv_state(qedn->cdev, false);
+		if (rc)
+			pr_err("Failed to send drv state to MFW\n");
+	}
+
+	if (test_and_clear_bit(QEDN_STATE_CORE_OPEN, &qedn->state))
+		qed_ops->common->slowpath_stop(qedn->cdev);
+
+	if (test_and_clear_bit(QEDN_STATE_CORE_PROBED, &qedn->state))
+		qed_ops->common->remove(qedn->cdev);
 
-	pr_notice("Starting qedn_remove\n");
-	nvme_tcp_ofld_unregister_dev(&qedn->qedn_ofld_dev);
 	kfree(qedn);
 	pr_notice("Ending qedn_remove successfully\n");
 }
@@ -143,15 +243,52 @@ static int __qedn_probe(struct pci_dev *pdev)
 	if (!qedn)
 		return -ENODEV;
 
+	qedn_init_pf_struct(qedn);
+
+	/* QED probe */
+	rc = qedn_core_probe(qedn);
+	if (rc)
+		goto exit_probe_and_release_mem;
+
+	set_bit(QEDN_STATE_CORE_PROBED, &qedn->state);
+
+	rc = qed_ops->fill_dev_info(qedn->cdev, &qedn->dev_info);
+	if (rc) {
+		pr_err("fill_dev_info failed\n");
+		goto exit_probe_and_release_mem;
+	}
+
+	rc = qedn_set_nvmetcp_pf_param(qedn);
+	if (rc)
+		goto exit_probe_and_release_mem;
+
+	qed_ops->common->update_pf_params(qedn->cdev, &qedn->pf_params);
+	rc = qedn_slowpath_start(qedn);
+	if (rc)
+		goto exit_probe_and_release_mem;
+
+	set_bit(QEDN_STATE_CORE_OPEN, &qedn->state);
+
+	rc = qed_ops->common->update_drv_state(qedn->cdev, true);
+	if (rc) {
+		pr_err("Failed to send drv state to MFW\n");
+		goto exit_probe_and_release_mem;
+	}
+
+	set_bit(QEDN_STATE_MFW_STATE, &qedn->state);
+
 	qedn->qedn_ofld_dev.ops = &qedn_ofld_ops;
 	INIT_LIST_HEAD(&qedn->qedn_ofld_dev.entry);
 	rc = nvme_tcp_ofld_register_dev(&qedn->qedn_ofld_dev);
 	if (rc)
-		goto release_qedn;
+		goto exit_probe_and_release_mem;
+
+	set_bit(QEDN_STATE_REGISTERED_OFFLOAD_DEV, &qedn->state);
 
 	return 0;
-release_qedn:
-	kfree(qedn);
+exit_probe_and_release_mem:
+	__qedn_remove(pdev);
+	pr_err("probe ended with error\n");
 
 	return rc;
 }
@@ -173,6 +310,13 @@ static int __init qedn_init(void)
 {
 	int rc;
 
+	qed_ops = qed_get_nvmetcp_ops();
+	if (!qed_ops) {
+		pr_err("Failed to get QED NVMeTCP ops\n");
+
+		return -EINVAL;
+	}
+
 	rc = pci_register_driver(&qedn_pci_driver);
 	if (rc) {
 		pr_err("Failed to register pci driver\n");
@@ -188,6 +332,7 @@ static int __init qedn_init(void)
 static void __exit qedn_cleanup(void)
 {
 	pci_unregister_driver(&qedn_pci_driver);
+	qed_put_nvmetcp_ops();
 	pr_notice("Unloading qedn ended\n");
 }
 
-- 
2.24.1

* [PATCH v4 11/20] qedn: Add qedn_claim_dev API support
  2021-06-29 12:47 [PATCH v4 00/20] NVMeTCP Offload ULP Prabhakar Kushwaha
                   ` (9 preceding siblings ...)
  2021-06-29 12:47 ` [PATCH v4 10/20] qedn: Add qedn probe Prabhakar Kushwaha
@ 2021-06-29 12:47 ` Prabhakar Kushwaha
  2021-06-29 12:47 ` [PATCH v4 12/20] qedn: Add IRQ and fast-path resources initializations Prabhakar Kushwaha
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Prabhakar Kushwaha @ 2021-06-29 12:47 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, axboe, kbusch
  Cc: davem, kuba, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	prabhakar.pkin, malin1024, Nikolay Assa

From: Nikolay Assa <nassa@marvell.com>

This patch introduces the qedn_claim_dev() network service, which the
offload driver (qedn) uses through the paired net-device (qede).
qedn_claim_dev() returns true if the IP address (IPv4 or IPv6) of the
target server is reachable via the net-device paired with the offload
device.

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Nikolay Assa <nassa@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/hw/qedn/qedn.h      |  4 +++
 drivers/nvme/hw/qedn/qedn_main.c | 55 ++++++++++++++++++++++++++++++--
 2 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
index 931efc3afbaa..0ce1e19d1ba8 100644
--- a/drivers/nvme/hw/qedn/qedn.h
+++ b/drivers/nvme/hw/qedn/qedn.h
@@ -8,6 +8,10 @@
 
 #include <linux/qed/qed_if.h>
 #include <linux/qed/qed_nvmetcp_if.h>
+#include <linux/qed/qed_nvmetcp_ip_services_if.h>
+#include <linux/qed/qed_chain.h>
+#include <linux/qed/storage_common.h>
+#include <linux/qed/nvmetcp_common.h>
 
 /* Driver includes */
 #include "../../host/tcp-offload.h"
diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
index 97591797605e..78bc9fe17e7b 100644
--- a/drivers/nvme/hw/qedn/qedn_main.c
+++ b/drivers/nvme/hw/qedn/qedn_main.c
@@ -22,13 +22,62 @@ static struct pci_device_id qedn_pci_tbl[] = {
 	{0, 0},
 };
 
+static int
+qedn_find_dev(struct nvme_tcp_ofld_dev *dev,
+	      struct nvme_tcp_ofld_ctrl *ctrl)
+{
+	struct nvme_tcp_ofld_ctrl_con_params *conn_params;
+	struct pci_dev *qede_pdev = NULL;
+	struct sockaddr remote_mac_addr;
+	struct net_device *ndev = NULL;
+	u16 vlan_id = 0;
+	int rc = 0;
+
+	conn_params = &ctrl->conn_params;
+
+	/* qedn utilizes host network stack through paired qede device for
+	 * non-offload traffic. First we verify there is valid route to remote
+	 * peer.
+	 */
+	if (conn_params->remote_ip_addr.ss_family == AF_INET) {
+		rc = qed_route_ipv4(&conn_params->local_ip_addr,
+				    &conn_params->remote_ip_addr,
+				    &remote_mac_addr, &ndev);
+	} else if (conn_params->remote_ip_addr.ss_family == AF_INET6) {
+		rc = qed_route_ipv6(&conn_params->local_ip_addr,
+				    &conn_params->remote_ip_addr,
+				    &remote_mac_addr, &ndev);
+	} else {
+		pr_err("address family %d not supported\n",
+		       conn_params->remote_ip_addr.ss_family);
+
+		return false;
+	}
+
+	if (rc)
+		return false;
+
+	if (!ctrl->private_data && ctrl->ndev &&
+	    strcmp(ctrl->ndev->name, ndev->name))
+		return false;
+
+	ctrl->ndev = ndev;
+
+	qed_vlan_get_ndev(&ctrl->ndev, &vlan_id);
+
+	/* route found through ndev - validate this is qede*/
+	qede_pdev = qed_validate_ndev(ctrl->ndev);
+	if (!qede_pdev)
+		return false;
+
+	return true;
+}
+
 static int
 qedn_claim_dev(struct nvme_tcp_ofld_dev *dev,
 	       struct nvme_tcp_ofld_ctrl *ctrl)
 {
-	/* Placeholder - qedn_claim_dev */
-
-	return 0;
+	return qedn_find_dev(dev, ctrl);
 }
 
 static int qedn_setup_ctrl(struct nvme_tcp_ofld_ctrl *ctrl)
-- 
2.24.1



* [PATCH v4 12/20] qedn: Add IRQ and fast-path resources initializations
  2021-06-29 12:47 [PATCH v4 00/20] NVMeTCP Offload ULP Prabhakar Kushwaha
                   ` (10 preceding siblings ...)
  2021-06-29 12:47 ` [PATCH v4 11/20] qedn: Add qedn_claim_dev API support Prabhakar Kushwaha
@ 2021-06-29 12:47 ` Prabhakar Kushwaha
  2021-06-29 12:47 ` [PATCH v4 13/20] qedn: Add connection-level slowpath functionality Prabhakar Kushwaha
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Prabhakar Kushwaha @ 2021-06-29 12:47 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, axboe, kbusch
  Cc: davem, kuba, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	prabhakar.pkin, malin1024

From: Shai Malin <smalin@marvell.com>

This patch adds qedn_fp_queue - a per CPU core element which handles all
of the connections on that CPU core. A qedn_fp_queue handles a group of
connections (NVMeoF QPs) which are processed on the same CPU core and
share the same FW-driver resources, with no need to belong to the same
NVMeoF controller.

The per qedn_fp_queue resources are the FW CQ and the FW status block:
- The FW CQ is used by the FW to notify the driver that the exchange has
  ended, and the FW passes the incoming NVMeoF CQE (if one exists) to the
  driver.
- The FW status block is used by the FW to notify the driver of the
  producer update of the FW CQE chain.

The FW fast-path queues are based on qed_chain.h (a consumption-loop
sketch follows below).
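
As a rough illustration of how the driver consumes these two resources
together, a sketch of a CQ polling loop, assuming the standard qed_chain
consumer helpers and the CQE handler that a later patch adds; this is not
the code introduced here:

/* Sketch only: drain the FW CQ chain up to the producer index the FW
 * posted in the status block (fp_q->cq_prod); endianness handling and
 * locking are omitted.
 */
static void qedn_poll_fw_cq_sketch(struct qedn_fp_queue *fp_q)
{
	u16 prod_idx = READ_ONCE(*fp_q->cq_prod);

	while (qed_chain_get_cons_idx(&fp_q->cq_chain) != prod_idx) {
		struct nvmetcp_fw_cqe *cqe = qed_chain_consume(&fp_q->cq_chain);

		/* Hand the CQE to the IO-path handler (added in a later patch) */
		qedn_io_work_cq(fp_q->qedn, cqe);
	}
}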

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/hw/qedn/qedn.h      |  25 +++
 drivers/nvme/hw/qedn/qedn_main.c | 289 ++++++++++++++++++++++++++++++-
 2 files changed, 311 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
index 0ce1e19d1ba8..edb0836bca87 100644
--- a/drivers/nvme/hw/qedn/qedn.h
+++ b/drivers/nvme/hw/qedn/qedn.h
@@ -24,17 +24,39 @@
 #define QEDN_PROTO_CQ_PROD_IDX	0
 #define QEDN_NVMETCP_NUM_FW_CONN_QUEUE_PAGES 2
 
+#define QEDN_PAGE_SIZE	4096 /* FW page size - Configurable */
+#define QEDN_IRQ_NAME_LEN 24
+#define QEDN_IRQ_NO_FLAGS 0
+
+#define QEDN_TCP_RTO_DEFAULT 280
+
 enum qedn_state {
 	QEDN_STATE_CORE_PROBED = 0,
 	QEDN_STATE_CORE_OPEN,
 	QEDN_STATE_MFW_STATE,
+	QEDN_STATE_NVMETCP_OPEN,
+	QEDN_STATE_IRQ_SET,
+	QEDN_STATE_FP_WORK_THREAD_SET,
 	QEDN_STATE_REGISTERED_OFFLOAD_DEV,
 	QEDN_STATE_MODULE_REMOVE_ONGOING,
 };
 
+/* Per CPU core params */
+struct qedn_fp_queue {
+	struct qed_chain cq_chain;
+	u16 *cq_prod;
+	struct mutex cq_mutex; /* cq handler mutex */
+	struct qedn_ctx	*qedn;
+	struct qed_sb_info *sb_info;
+	unsigned int cpu;
+	u16 sb_id;
+	char irqname[QEDN_IRQ_NAME_LEN];
+};
+
 struct qedn_ctx {
 	struct pci_dev *pdev;
 	struct qed_dev *cdev;
+	struct qed_int_info int_info;
 	struct qed_dev_nvmetcp_info dev_info;
 	struct nvme_tcp_ofld_dev qedn_ofld_dev;
 	struct qed_pf_params pf_params;
@@ -44,6 +66,9 @@ struct qedn_ctx {
 
 	/* Fast path queues */
 	u8 num_fw_cqs;
+	struct qedn_fp_queue *fp_q_arr;
+	struct nvmetcp_glbl_queue_entry *fw_cq_array_virt;
+	dma_addr_t fw_cq_array_phy; /* Physical address of fw_cq_array_virt */
 };
 
 #endif /* _QEDN_H_ */
diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
index 78bc9fe17e7b..6c0f36f7d9d1 100644
--- a/drivers/nvme/hw/qedn/qedn_main.c
+++ b/drivers/nvme/hw/qedn/qedn_main.c
@@ -150,6 +150,104 @@ static struct nvme_tcp_ofld_ops qedn_ofld_ops = {
 	.send_req = qedn_send_req,
 };
 
+/* Fastpath IRQ handler */
+static irqreturn_t qedn_irq_handler(int irq, void *dev_id)
+{
+	/* Placeholder */
+
+	return IRQ_HANDLED;
+}
+
+static void qedn_sync_free_irqs(struct qedn_ctx *qedn)
+{
+	u16 vector_idx;
+	int i;
+
+	for (i = 0; i < qedn->num_fw_cqs; i++) {
+		vector_idx = i * qedn->dev_info.common.num_hwfns +
+			     qed_ops->common->get_affin_hwfn_idx(qedn->cdev);
+		synchronize_irq(qedn->int_info.msix[vector_idx].vector);
+		irq_set_affinity_hint(qedn->int_info.msix[vector_idx].vector,
+				      NULL);
+		free_irq(qedn->int_info.msix[vector_idx].vector,
+			 &qedn->fp_q_arr[i]);
+	}
+
+	qedn->int_info.used_cnt = 0;
+	qed_ops->common->set_fp_int(qedn->cdev, 0);
+}
+
+static int qedn_request_msix_irq(struct qedn_ctx *qedn)
+{
+	struct pci_dev *pdev = qedn->pdev;
+	struct qedn_fp_queue *fp_q = NULL;
+	int i, rc, cpu;
+	u16 vector_idx;
+	u32 vector;
+
+	/* numa-awareness will be added in future enhancements */
+	cpu = cpumask_first(cpu_online_mask);
+	for (i = 0; i < qedn->num_fw_cqs; i++) {
+		fp_q = &qedn->fp_q_arr[i];
+		vector_idx = i * qedn->dev_info.common.num_hwfns +
+			     qed_ops->common->get_affin_hwfn_idx(qedn->cdev);
+		vector = qedn->int_info.msix[vector_idx].vector;
+		sprintf(fp_q->irqname, "qedn_queue_%x.%x.%x_%d",
+			pdev->bus->number, PCI_SLOT(pdev->devfn),
+			PCI_FUNC(pdev->devfn), i);
+		rc = request_irq(vector, qedn_irq_handler, QEDN_IRQ_NO_FLAGS,
+				 fp_q->irqname, fp_q);
+		if (rc) {
+			pr_err("request_irq failed.\n");
+			qedn_sync_free_irqs(qedn);
+
+			return rc;
+		}
+
+		fp_q->cpu = cpu;
+		qedn->int_info.used_cnt++;
+		rc = irq_set_affinity_hint(vector, get_cpu_mask(cpu));
+		cpu = cpumask_next_wrap(cpu, cpu_online_mask, -1, false);
+	}
+
+	return 0;
+}
+
+static int qedn_setup_irq(struct qedn_ctx *qedn)
+{
+	int rc = 0;
+	u8 rval;
+
+	rval = qed_ops->common->set_fp_int(qedn->cdev, qedn->num_fw_cqs);
+	if (rval < qedn->num_fw_cqs) {
+		qedn->num_fw_cqs = rval;
+		if (rval == 0) {
+			pr_err("set_fp_int return 0 IRQs\n");
+
+			return -ENODEV;
+		}
+	}
+
+	rc = qed_ops->common->get_fp_int(qedn->cdev, &qedn->int_info);
+	if (rc) {
+		pr_err("get_fp_int failed\n");
+		goto exit_setup_int;
+	}
+
+	if (qedn->int_info.msix_cnt) {
+		rc = qedn_request_msix_irq(qedn);
+		goto exit_setup_int;
+	} else {
+		pr_err("msix_cnt = 0\n");
+		rc = -EINVAL;
+		goto exit_setup_int;
+	}
+
+exit_setup_int:
+
+	return rc;
+}
+
 static inline void qedn_init_pf_struct(struct qedn_ctx *qedn)
 {
 	/* Placeholder - Initialize qedn fields */
@@ -180,21 +278,174 @@ static inline int qedn_core_probe(struct qedn_ctx *qedn)
 	return rc;
 }
 
+static void qedn_free_function_queues(struct qedn_ctx *qedn)
+{
+	struct qed_sb_info *sb_info = NULL;
+	struct qedn_fp_queue *fp_q;
+	int i;
+
+	/* Free workqueues */
+
+	/* Free the fast path queues*/
+	for (i = 0; i < qedn->num_fw_cqs; i++) {
+		fp_q = &qedn->fp_q_arr[i];
+
+		/* Free SB */
+		sb_info = fp_q->sb_info;
+		if (sb_info->sb_virt) {
+			qed_ops->common->sb_release(qedn->cdev, sb_info,
+						    fp_q->sb_id,
+						    QED_SB_TYPE_STORAGE);
+			dma_free_coherent(&qedn->pdev->dev,
+					  sizeof(*sb_info->sb_virt),
+					  (void *)sb_info->sb_virt,
+					  sb_info->sb_phys);
+			memset(sb_info, 0, sizeof(*sb_info));
+			kfree(sb_info);
+			fp_q->sb_info = NULL;
+		}
+
+		qed_ops->common->chain_free(qedn->cdev, &fp_q->cq_chain);
+	}
+
+	if (qedn->fw_cq_array_virt)
+		dma_free_coherent(&qedn->pdev->dev,
+				  qedn->num_fw_cqs * sizeof(*qedn->fw_cq_array_virt),
+				  qedn->fw_cq_array_virt,
+				  qedn->fw_cq_array_phy);
+	kfree(qedn->fp_q_arr);
+	qedn->fp_q_arr = NULL;
+}
+
+static int qedn_alloc_and_init_sb(struct qedn_ctx *qedn,
+				  struct qed_sb_info *sb_info, u16 sb_id)
+{
+	int rc = 0;
+
+	sb_info->sb_virt = dma_alloc_coherent(&qedn->pdev->dev,
+					      sizeof(struct status_block_e4),
+					      &sb_info->sb_phys, GFP_KERNEL);
+	if (!sb_info->sb_virt) {
+		pr_err("Status block allocation failed\n");
+
+		return -ENOMEM;
+	}
+
+	rc = qed_ops->common->sb_init(qedn->cdev, sb_info, sb_info->sb_virt,
+				      sb_info->sb_phys, sb_id,
+				      QED_SB_TYPE_STORAGE);
+	if (rc) {
+		pr_err("Status block initialization failed\n");
+
+		return rc;
+	}
+
+	return 0;
+}
+
+static int qedn_alloc_function_queues(struct qedn_ctx *qedn)
+{
+	struct qed_chain_init_params chain_params = {};
+	struct status_block_e4 *sb = NULL;
+	struct qedn_fp_queue *fp_q = NULL;
+	int rc = 0, arr_size;
+	u64 cq_phy_addr;
+	int i;
+
+	/* Place holder - IO-path workqueues */
+
+	qedn->fp_q_arr = kcalloc(qedn->num_fw_cqs,
+				 sizeof(struct qedn_fp_queue), GFP_KERNEL);
+	if (!qedn->fp_q_arr)
+		return -ENOMEM;
+
+	arr_size = qedn->num_fw_cqs * sizeof(struct nvmetcp_glbl_queue_entry);
+	qedn->fw_cq_array_virt = dma_alloc_coherent(&qedn->pdev->dev,
+						    arr_size,
+						    &qedn->fw_cq_array_phy,
+						    GFP_KERNEL);
+	if (!qedn->fw_cq_array_virt) {
+		rc = -ENOMEM;
+		goto mem_alloc_failure;
+	}
+
+	/* placeholder - create task pools */
+
+	for (i = 0; i < qedn->num_fw_cqs; i++) {
+		fp_q = &qedn->fp_q_arr[i];
+		mutex_init(&fp_q->cq_mutex);
+
+		/* FW CQ */
+		chain_params.intended_use = QED_CHAIN_USE_TO_CONSUME;
+		chain_params.mode = QED_CHAIN_MODE_PBL;
+		chain_params.cnt_type = QED_CHAIN_CNT_TYPE_U16;
+		chain_params.num_elems = QEDN_FW_CQ_SIZE;
+		/* Placeholder - sizeof(struct nvmetcp_fw_cqe)*/
+		chain_params.elem_size = 64;
+
+		rc = qed_ops->common->chain_alloc(qedn->cdev,
+						  &fp_q->cq_chain,
+						  &chain_params);
+		if (rc) {
+			pr_err("CQ chain pci_alloc_consistent fail\n");
+			goto mem_alloc_failure;
+		}
+
+		cq_phy_addr = qed_chain_get_pbl_phys(&fp_q->cq_chain);
+		qedn->fw_cq_array_virt[i].cq_pbl_addr.hi = PTR_HI(cq_phy_addr);
+		qedn->fw_cq_array_virt[i].cq_pbl_addr.lo = PTR_LO(cq_phy_addr);
+
+		/* SB */
+		fp_q->sb_info = kzalloc(sizeof(*fp_q->sb_info), GFP_KERNEL);
+		if (!fp_q->sb_info) {
+			rc = -ENOMEM;
+			goto mem_alloc_failure;
+		}
+
+		fp_q->sb_id = i;
+		rc = qedn_alloc_and_init_sb(qedn, fp_q->sb_info, fp_q->sb_id);
+		if (rc) {
+			pr_err("SB allocation and initialization failed.\n");
+			goto mem_alloc_failure;
+		}
+
+		sb = fp_q->sb_info->sb_virt;
+		fp_q->cq_prod = (u16 *)&sb->pi_array[QEDN_PROTO_CQ_PROD_IDX];
+		fp_q->qedn = qedn;
+
+		/* Placeholder - Init IO-path workqueue */
+
+		/* Placeholder - Init IO-path resources */
+	}
+
+	return 0;
+
+mem_alloc_failure:
+	pr_err("Function allocation failed\n");
+	qedn_free_function_queues(qedn);
+
+	return rc;
+}
+
 static int qedn_set_nvmetcp_pf_param(struct qedn_ctx *qedn)
 {
 	u32 fw_conn_queue_pages = QEDN_NVMETCP_NUM_FW_CONN_QUEUE_PAGES;
 	struct qed_nvmetcp_pf_params *pf_params;
+	int rc;
 
 	pf_params = &qedn->pf_params.nvmetcp_pf_params;
 	memset(pf_params, 0, sizeof(*pf_params));
 	qedn->num_fw_cqs = min_t(u8, qedn->dev_info.num_cqs, num_online_cpus());
+	pr_info("Num qedn FW CQs %u\n", qedn->num_fw_cqs);
 
 	pf_params->num_cons = QEDN_MAX_CONNS_PER_PF;
 	pf_params->num_tasks = QEDN_MAX_TASKS_PER_PF;
 
-	/* Placeholder - Initialize function level queues */
+	rc = qedn_alloc_function_queues(qedn);
+	if (rc) {
+		pr_err("Global queue allocation failed.\n");
+		goto err_alloc_mem;
+	}
 
-	/* Placeholder - Initialize TCP params */
+	set_bit(QEDN_STATE_FP_WORK_THREAD_SET, &qedn->state);
 
 	/* Queues */
 	pf_params->num_sq_pages_in_ring = fw_conn_queue_pages;
@@ -202,11 +453,14 @@ static int qedn_set_nvmetcp_pf_param(struct qedn_ctx *qedn)
 	pf_params->num_uhq_pages_in_ring = fw_conn_queue_pages;
 	pf_params->num_queues = qedn->num_fw_cqs;
 	pf_params->cq_num_entries = QEDN_FW_CQ_SIZE;
+	pf_params->glbl_q_params_addr = qedn->fw_cq_array_phy;
 
 	/* the CQ SB pi */
 	pf_params->gl_rq_pi = QEDN_PROTO_CQ_PROD_IDX;
 
-	return 0;
+err_alloc_mem:
+
+	return rc;
 }
 
 static inline int qedn_slowpath_start(struct qedn_ctx *qedn)
@@ -241,6 +495,12 @@ static void __qedn_remove(struct pci_dev *pdev)
 	if (test_and_clear_bit(QEDN_STATE_REGISTERED_OFFLOAD_DEV, &qedn->state))
 		nvme_tcp_ofld_unregister_dev(&qedn->qedn_ofld_dev);
 
+	if (test_and_clear_bit(QEDN_STATE_IRQ_SET, &qedn->state))
+		qedn_sync_free_irqs(qedn);
+
+	if (test_and_clear_bit(QEDN_STATE_NVMETCP_OPEN, &qedn->state))
+		qed_ops->stop(qedn->cdev);
+
 	if (test_and_clear_bit(QEDN_STATE_MFW_STATE, &qedn->state)) {
 		rc = qed_ops->common->update_drv_state(qedn->cdev, false);
 		if (rc)
@@ -250,6 +510,9 @@ static void __qedn_remove(struct pci_dev *pdev)
 	if (test_and_clear_bit(QEDN_STATE_CORE_OPEN, &qedn->state))
 		qed_ops->common->slowpath_stop(qedn->cdev);
 
+	if (test_and_clear_bit(QEDN_STATE_FP_WORK_THREAD_SET, &qedn->state))
+		qedn_free_function_queues(qedn);
+
 	if (test_and_clear_bit(QEDN_STATE_CORE_PROBED, &qedn->state))
 		qed_ops->common->remove(qedn->cdev);
 
@@ -318,6 +581,25 @@ static int __qedn_probe(struct pci_dev *pdev)
 
 	set_bit(QEDN_STATE_CORE_OPEN, &qedn->state);
 
+	rc = qedn_setup_irq(qedn);
+	if (rc)
+		goto exit_probe_and_release_mem;
+
+	set_bit(QEDN_STATE_IRQ_SET, &qedn->state);
+
+	/* NVMeTCP start HW PF */
+	rc = qed_ops->start(qedn->cdev,
+			    NULL /* Placeholder for FW IO-path resources */,
+			    qedn,
+			    NULL /* Placeholder for FW Event callback */);
+	if (rc) {
+		rc = -ENODEV;
+		pr_err("Cannot start NVMeTCP Function\n");
+		goto exit_probe_and_release_mem;
+	}
+
+	set_bit(QEDN_STATE_NVMETCP_OPEN, &qedn->state);
+
 	rc = qed_ops->common->update_drv_state(qedn->cdev, true);
 	if (rc) {
 		pr_err("Failed to send drv state to MFW\n");
@@ -326,6 +608,7 @@ static int __qedn_probe(struct pci_dev *pdev)
 
 	set_bit(QEDN_STATE_MFW_STATE, &qedn->state);
 
+	qedn->qedn_ofld_dev.num_hw_vectors = qedn->num_fw_cqs;
 	qedn->qedn_ofld_dev.ops = &qedn_ofld_ops;
 	INIT_LIST_HEAD(&qedn->qedn_ofld_dev.entry);
 	rc = nvme_tcp_ofld_register_dev(&qedn->qedn_ofld_dev);
-- 
2.24.1



* [PATCH v4 13/20] qedn: Add connection-level slowpath functionality
  2021-06-29 12:47 [PATCH v4 00/20] NVMeTCP Offload ULP Prabhakar Kushwaha
                   ` (11 preceding siblings ...)
  2021-06-29 12:47 ` [PATCH v4 12/20] qedn: Add IRQ and fast-path resources initializations Prabhakar Kushwaha
@ 2021-06-29 12:47 ` Prabhakar Kushwaha
  2021-06-29 12:47 ` [PATCH v4 14/20] qedn: Add support of configuring HW filter block Prabhakar Kushwaha
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Prabhakar Kushwaha @ 2021-06-29 12:47 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, axboe, kbusch
  Cc: davem, kuba, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	prabhakar.pkin, malin1024

This patch presents the connection (queue) level slowpath implementation
relevant for the create_queue flow (a condensed sequencing sketch follows
the list below).

The internal implementation:
- Add a per-controller slowpath workqueue via pre_setup_ctrl

- qedn_main.c:
  Includes qedn's implementation of the create_queue op.

- qedn_conn.c includes the main slowpath connection level functions,
  including:
    1. Per-queue resources allocation.
    2. Creating a new connection.
    3. Offloading the connection to the FW for the TCP handshake.
    4. Destroying a connection.
    5. Support of delete and free controller.
    6. TCP port management via qed_fetch_tcp_port and qed_return_tcp_port.
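
Condensed from the create_queue implementation in this patch, the pieces
sequence as follows (a summary of the flow added here, not additional code):

/* The create_queue op marks the CREATE_CONNECTION work action, queues the
 * connection on the controller's slowpath workqueue, and then blocks until
 * the EQ callback reports the connection as established (or times out).
 */
qedn_set_sp_wa(conn_ctx, CREATE_CONNECTION);
qedn_set_con_state(conn_ctx, CONN_STATE_CREATE_CONNECTION);
INIT_WORK(&conn_ctx->sp_wq_entry, qedn_sp_wq_handler);
queue_work(qctrl->sp_wq, &conn_ctx->sp_wq_entry);

/* qedn_sp_wq_handler() -> qedn_prep_and_offload_queue() runs asynchronously;
 * wait here for the FW TCP connect and the NVMeTCP ICReq/ICResp exchange.
 */
rc = qedn_wait_for_conn_est(conn_ctx);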

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/hw/qedn/Makefile    |   5 +-
 drivers/nvme/hw/qedn/qedn.h      | 179 ++++++++++
 drivers/nvme/hw/qedn/qedn_conn.c | 564 +++++++++++++++++++++++++++++++
 drivers/nvme/hw/qedn/qedn_main.c | 210 +++++++++++-
 4 files changed, 948 insertions(+), 10 deletions(-)
 create mode 100644 drivers/nvme/hw/qedn/qedn_conn.c

diff --git a/drivers/nvme/hw/qedn/Makefile b/drivers/nvme/hw/qedn/Makefile
index 1422cd878680..ece84772d317 100644
--- a/drivers/nvme/hw/qedn/Makefile
+++ b/drivers/nvme/hw/qedn/Makefile
@@ -1,5 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0
 
-obj-$(CONFIG_NVME_QEDN) := qedn.o
-
-qedn-y := qedn_main.o
+obj-$(CONFIG_NVME_QEDN) += qedn.o
+qedn-y := qedn_main.o qedn_conn.o
diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
index edb0836bca87..38ab4ff88999 100644
--- a/drivers/nvme/hw/qedn/qedn.h
+++ b/drivers/nvme/hw/qedn/qedn.h
@@ -6,6 +6,7 @@
 #ifndef _QEDN_H_
 #define _QEDN_H_
 
+#include <linux/qed/common_hsi.h>
 #include <linux/qed/qed_if.h>
 #include <linux/qed/qed_nvmetcp_if.h>
 #include <linux/qed/qed_nvmetcp_ip_services_if.h>
@@ -28,7 +29,41 @@
 #define QEDN_IRQ_NAME_LEN 24
 #define QEDN_IRQ_NO_FLAGS 0
 
+/* Destroy connection defines */
+#define QEDN_NON_ABORTIVE_TERMINATION 0
+#define QEDN_ABORTIVE_TERMINATION 1
+
+/*
+ * TCP offload stack default configurations and defines.
+ * Future enhancements will allow controlling the configurable
+ * parameters via devlink.
+ */
 #define QEDN_TCP_RTO_DEFAULT 280
+#define QEDN_TCP_ECN_EN 0
+#define QEDN_TCP_TS_EN 0
+#define QEDN_TCP_DA_EN 0
+#define QEDN_TCP_KA_EN 0
+#define QEDN_TCP_TOS 0
+#define QEDN_TCP_TTL 0xfe
+#define QEDN_TCP_FLOW_LABEL 0
+#define QEDN_TCP_KA_TIMEOUT 7200000
+#define QEDN_TCP_KA_INTERVAL 10000
+#define QEDN_TCP_KA_MAX_PROBE_COUNT 10
+#define QEDN_TCP_MAX_RT_TIME 30000
+#define QEDN_TCP_MAX_CWND 4
+#define QEDN_TCP_RCV_WND_SCALE 2
+#define QEDN_TCP_TS_OPTION_LEN 12
+
+/* SP Work queue defines */
+#define QEDN_SP_WORKQUEUE "qedn_sp_wq"
+#define QEDN_SP_WORKQUEUE_MAX_ACTIVE 1
+
+#define QEDN_HOST_MAX_SQ_SIZE (512)
+#define QEDN_SQ_SIZE (2 * QEDN_HOST_MAX_SQ_SIZE)
+
+/* Timeouts and delay constants */
+#define QEDN_WAIT_CON_ESTABLSH_TMO 10000 /* 10 seconds */
+#define QEDN_RLS_CONS_TMO 5000 /* 5 sec */
 
 enum qedn_state {
 	QEDN_STATE_CORE_PROBED = 0,
@@ -64,6 +99,12 @@ struct qedn_ctx {
 	/* Accessed with atomic bit ops, used with enum qedn_state */
 	unsigned long state;
 
+	u8 local_mac_addr[ETH_ALEN];
+	u16 mtu;
+
+	/* Connections */
+	DECLARE_HASHTABLE(conn_ctx_hash, 16);
+
 	/* Fast path queues */
 	u8 num_fw_cqs;
 	struct qedn_fp_queue *fp_q_arr;
@@ -71,4 +112,142 @@ struct qedn_ctx {
 	dma_addr_t fw_cq_array_phy; /* Physical address of fw_cq_array_virt */
 };
 
+struct qedn_endpoint {
+	/* FW Params */
+	struct qed_chain fw_sq_chain;
+	struct nvmetcp_db_data db_data;
+	void __iomem *p_doorbell;
+
+	/* TCP Params */
+	__be32 dst_addr[4]; /* In network order */
+	__be32 src_addr[4]; /* In network order */
+	u16 src_port;
+	u16 dst_port;
+	u16 vlan_id;
+	u8 src_mac[ETH_ALEN];
+	u8 dst_mac[ETH_ALEN];
+	u8 ip_type;
+};
+
+enum sp_work_agg_action {
+	CREATE_CONNECTION = 0,
+	SEND_ICREQ,
+	HANDLE_ICRESP,
+	DESTROY_CONNECTION,
+};
+
+enum qedn_ctrl_agg_state {
+	QEDN_CTRL_SET_TO_OFLD_CTRL = 0, /* CTRL set to OFLD_CTRL */
+	QEDN_STATE_SP_WORK_THREAD_SET, /* slow path WQ was created */
+	LLH_FILTER, /* LLH filter added */
+	QEDN_RECOVERY,
+	ADMINQ_CONNECTED, /* At least one connection has attempted offload */
+	ERR_FLOW,
+};
+
+enum qedn_ctrl_sp_wq_state {
+	QEDN_CTRL_STATE_UNINITIALIZED = 0,
+	QEDN_CTRL_STATE_FREE_CTRL,
+	QEDN_CTRL_STATE_CTRL_ERR,
+};
+
+/* Any change to this enum requires an update of qedn_conn_state_str */
+enum qedn_conn_state {
+	CONN_STATE_CONN_IDLE = 0,
+	CONN_STATE_CREATE_CONNECTION,
+	CONN_STATE_WAIT_FOR_CONNECT_DONE,
+	CONN_STATE_OFFLOAD_COMPLETE,
+	CONN_STATE_WAIT_FOR_UPDATE_EQE,
+	CONN_STATE_WAIT_FOR_IC_COMP,
+	CONN_STATE_NVMETCP_CONN_ESTABLISHED,
+	CONN_STATE_DESTROY_CONNECTION,
+	CONN_STATE_WAIT_FOR_DESTROY_DONE,
+	CONN_STATE_DESTROY_COMPLETE
+};
+
+struct qedn_ctrl {
+	struct list_head glb_entry;
+	struct list_head pf_entry;
+
+	struct qedn_ctx *qedn;
+	struct nvme_tcp_ofld_queue *queue;
+	struct nvme_tcp_ofld_ctrl *ctrl;
+
+	struct sockaddr remote_mac_addr;
+	u16 vlan_id;
+
+	struct workqueue_struct *sp_wq;
+	enum qedn_ctrl_sp_wq_state sp_wq_state;
+
+	struct work_struct sp_wq_entry;
+
+	struct qedn_llh_filter *llh_filter;
+
+	unsigned long agg_state;
+
+	atomic_t host_num_active_conns;
+};
+
+/* Connection level struct */
+struct qedn_conn_ctx {
+	struct qedn_ctx *qedn;
+	struct nvme_tcp_ofld_queue *queue;
+	struct nvme_tcp_ofld_ctrl *ctrl;
+	u32 conn_handle;
+	u32 fw_cid;
+
+	atomic_t est_conn_indicator;
+	atomic_t destroy_conn_indicator;
+	wait_queue_head_t conn_waitq;
+
+	struct work_struct sp_wq_entry;
+
+	/* Connection aggregative state.
+	 * Can have different states independently.
+	 */
+	unsigned long agg_work_action;
+
+	struct hlist_node hash_node;
+	struct nvmetcp_host_cccid_itid_entry *host_cccid_itid;
+	dma_addr_t host_cccid_itid_phy_addr;
+	struct qedn_endpoint ep;
+	int abrt_flag;
+
+	/* Connection resources - turned on to indicate what resource was
+	 * allocated, to that it can later be released.
+	 * allocated, so that it can later be released.
+	unsigned long resrc_state;
+
+	/* Connection state */
+	spinlock_t conn_state_lock;
+	enum qedn_conn_state state;
+
+	size_t sq_depth;
+
+	/* "dummy" socket */
+	struct socket *sock;
+};
+
+enum qedn_conn_resources_state {
+	QEDN_CONN_RESRC_FW_SQ,
+	QEDN_CONN_RESRC_ACQUIRE_CONN,
+	QEDN_CONN_RESRC_CCCID_ITID_MAP,
+	QEDN_CONN_RESRC_TCP_PORT,
+	QEDN_CONN_RESRC_DB_ADD,
+	QEDN_CONN_RESRC_MAX = 64
+};
+
+struct qedn_conn_ctx *qedn_get_conn_hash(struct qedn_ctx *qedn, u16 icid);
+int qedn_event_cb(void *context, u8 fw_event_code, void *event_ring_data);
+void qedn_sp_wq_handler(struct work_struct *work);
+void qedn_set_sp_wa(struct qedn_conn_ctx *conn_ctx, u32 bit);
+void qedn_clr_sp_wa(struct qedn_conn_ctx *conn_ctx, u32 bit);
+int qedn_initialize_endpoint(struct qedn_endpoint *ep, u8 *local_mac_addr,
+			     struct nvme_tcp_ofld_ctrl *ctrl);
+int qedn_wait_for_conn_est(struct qedn_conn_ctx *conn_ctx);
+int qedn_set_con_state(struct qedn_conn_ctx *conn_ctx,
+		       enum qedn_conn_state new_state);
+void qedn_terminate_connection(struct qedn_conn_ctx *conn_ctx);
+void qedn_cleanp_fw(struct qedn_conn_ctx *conn_ctx);
+
 #endif /* _QEDN_H_ */
diff --git a/drivers/nvme/hw/qedn/qedn_conn.c b/drivers/nvme/hw/qedn/qedn_conn.c
new file mode 100644
index 000000000000..04c45a6fa14b
--- /dev/null
+++ b/drivers/nvme/hw/qedn/qedn_conn.c
@@ -0,0 +1,564 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2021 Marvell. All rights reserved.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+ /* Kernel includes */
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <net/tcp.h>
+
+/* Driver includes */
+#include "qedn.h"
+
+extern const struct qed_nvmetcp_ops *qed_ops;
+
+static const char * const qedn_conn_state_str[] = {
+	"CONN_IDLE",
+	"CREATE_CONNECTION",
+	"WAIT_FOR_CONNECT_DONE",
+	"OFFLOAD_COMPLETE",
+	"WAIT_FOR_UPDATE_EQE",
+	"WAIT_FOR_IC_COMP",
+	"NVMETCP_CONN_ESTABLISHED",
+	"DESTROY_CONNECTION",
+	"WAIT_FOR_DESTROY_DONE",
+	"DESTROY_COMPLETE",
+	NULL
+};
+
+int qedn_set_con_state(struct qedn_conn_ctx *conn_ctx,
+		       enum qedn_conn_state new_state)
+{
+	spin_lock_bh(&conn_ctx->conn_state_lock);
+	conn_ctx->state = new_state;
+	spin_unlock_bh(&conn_ctx->conn_state_lock);
+
+	return 0;
+}
+
+static void qedn_return_tcp_port(struct qedn_conn_ctx *conn_ctx)
+{
+	if (conn_ctx->sock && conn_ctx->sock->sk) {
+		qed_return_tcp_port(conn_ctx->sock);
+		conn_ctx->sock = NULL;
+	}
+
+	conn_ctx->ep.src_port = 0;
+}
+
+int qedn_wait_for_conn_est(struct qedn_conn_ctx *conn_ctx)
+{
+	u32 est_timeout = msecs_to_jiffies(QEDN_WAIT_CON_ESTABLSH_TMO);
+	atomic_t *conn_est_ind = &conn_ctx->est_conn_indicator;
+	int wrc, rc;
+
+	wrc = wait_event_interruptible_timeout(conn_ctx->conn_waitq,
+					       atomic_read(conn_est_ind) > 0,
+					       est_timeout);
+	atomic_set(conn_est_ind, 0);
+	if (!wrc ||
+	    conn_ctx->state != CONN_STATE_NVMETCP_CONN_ESTABLISHED) {
+		rc = -ETIMEDOUT;
+
+		/* If error was prior or during offload, conn_ctx was released.
+		 * If the error was after offload sync has completed, we need to
+		 * terminate the connection ourselves.
+		 */
+		if (conn_ctx &&
+		    conn_ctx->state >= CONN_STATE_WAIT_FOR_CONNECT_DONE &&
+		    conn_ctx->state <= CONN_STATE_NVMETCP_CONN_ESTABLISHED)
+			qedn_terminate_connection(conn_ctx);
+	} else {
+		rc = 0;
+	}
+
+	return rc;
+}
+
+int qedn_fill_ep_addr4(struct qedn_endpoint *ep,
+		       struct nvme_tcp_ofld_ctrl_con_params *conn_params)
+{
+	struct sockaddr_in *raddr, *laddr;
+
+	raddr = (struct sockaddr_in *)&conn_params->remote_ip_addr;
+	laddr = (struct sockaddr_in *)&conn_params->local_ip_addr;
+
+	ep->ip_type = TCP_IPV4;
+	ep->src_port = laddr->sin_port;
+	ep->dst_port = ntohs(raddr->sin_port);
+
+	ep->src_addr[0] = laddr->sin_addr.s_addr;
+	ep->dst_addr[0] = raddr->sin_addr.s_addr;
+
+	return 0;
+}
+
+int qedn_fill_ep_addr6(struct qedn_endpoint *ep,
+		       struct nvme_tcp_ofld_ctrl_con_params *conn_params)
+{
+	struct sockaddr_in6 *raddr6, *laddr6;
+	int i;
+
+	raddr6 = (struct sockaddr_in6 *)&conn_params->remote_ip_addr;
+	laddr6 = (struct sockaddr_in6 *)&conn_params->local_ip_addr;
+
+	ep->ip_type = TCP_IPV6;
+	ep->src_port = laddr6->sin6_port;
+	ep->dst_port = ntohs(raddr6->sin6_port);
+
+	for (i = 0; i < 4; i++) {
+		ep->src_addr[i] = laddr6->sin6_addr.in6_u.u6_addr32[i];
+		ep->dst_addr[i] = raddr6->sin6_addr.in6_u.u6_addr32[i];
+	}
+
+	return 0;
+}
+
+int qedn_initialize_endpoint(struct qedn_endpoint *ep, u8 *local_mac_addr,
+			     struct nvme_tcp_ofld_ctrl *ctrl)
+{
+	struct nvme_tcp_ofld_ctrl_con_params *conn_params = &ctrl->conn_params;
+	struct qedn_ctrl *qctrl;
+
+	qctrl = (struct qedn_ctrl *)ctrl->private_data;
+	if (!qctrl)
+		return -ENODEV;
+
+	ether_addr_copy(ep->dst_mac, qctrl->remote_mac_addr.sa_data);
+	ether_addr_copy(ep->src_mac, local_mac_addr);
+	ep->vlan_id = qctrl->vlan_id;
+	if (conn_params->remote_ip_addr.ss_family == AF_INET)
+		qedn_fill_ep_addr4(ep, conn_params);
+	else
+		qedn_fill_ep_addr6(ep, conn_params);
+
+	return 0;
+}
+
+static void qedn_release_conn_ctx(struct qedn_conn_ctx *conn_ctx)
+{
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	int rc = 0;
+
+	if (test_bit(QEDN_CONN_RESRC_FW_SQ, &conn_ctx->resrc_state)) {
+		qed_ops->common->chain_free(qedn->cdev,
+					    &conn_ctx->ep.fw_sq_chain);
+		clear_bit(QEDN_CONN_RESRC_FW_SQ, &conn_ctx->resrc_state);
+	}
+
+	if (test_bit(QEDN_CONN_RESRC_DB_ADD, &conn_ctx->resrc_state)) {
+		rc = qed_ops->common->db_recovery_del(qedn->cdev,
+						      conn_ctx->ep.p_doorbell,
+						      &conn_ctx->ep.db_data);
+		if (rc)
+			pr_warn("Doorbell recovery del returned error %u\n",
+				rc);
+
+		clear_bit(QEDN_CONN_RESRC_DB_ADD, &conn_ctx->resrc_state);
+	}
+
+	if (test_bit(QEDN_CONN_RESRC_ACQUIRE_CONN, &conn_ctx->resrc_state)) {
+		hash_del(&conn_ctx->hash_node);
+		rc = qed_ops->release_conn(qedn->cdev, conn_ctx->conn_handle);
+		if (rc)
+			pr_warn("Release_conn returned with an error %u\n",
+				rc);
+
+		clear_bit(QEDN_CONN_RESRC_ACQUIRE_CONN, &conn_ctx->resrc_state);
+	}
+
+	if (test_bit(QEDN_CONN_RESRC_CCCID_ITID_MAP, &conn_ctx->resrc_state)) {
+		dma_free_coherent(&qedn->pdev->dev,
+				  conn_ctx->sq_depth *
+				  sizeof(struct nvmetcp_host_cccid_itid_entry),
+				  conn_ctx->host_cccid_itid,
+				  conn_ctx->host_cccid_itid_phy_addr);
+		clear_bit(QEDN_CONN_RESRC_CCCID_ITID_MAP,
+			  &conn_ctx->resrc_state);
+	}
+
+	if (test_bit(QEDN_CONN_RESRC_TCP_PORT, &conn_ctx->resrc_state)) {
+		qedn_return_tcp_port(conn_ctx);
+		clear_bit(QEDN_CONN_RESRC_TCP_PORT,
+			  &conn_ctx->resrc_state);
+	}
+
+	if (conn_ctx->resrc_state)
+		pr_err("Conn resources state isn't 0 as expected 0x%lx\n",
+		       conn_ctx->resrc_state);
+
+	atomic_inc(&conn_ctx->destroy_conn_indicator);
+	qedn_set_con_state(conn_ctx, CONN_STATE_DESTROY_COMPLETE);
+	wake_up_interruptible(&conn_ctx->conn_waitq);
+}
+
+static int qedn_alloc_fw_sq(struct qedn_ctx *qedn,
+			    struct qedn_endpoint *ep)
+{
+	struct qed_chain_init_params params = {
+		.mode           = QED_CHAIN_MODE_PBL,
+		.intended_use   = QED_CHAIN_USE_TO_PRODUCE,
+		.cnt_type       = QED_CHAIN_CNT_TYPE_U16,
+		.num_elems      = QEDN_SQ_SIZE,
+		.elem_size      = sizeof(struct nvmetcp_wqe),
+	};
+	int rc;
+
+	rc = qed_ops->common->chain_alloc(qedn->cdev,
+					   &ep->fw_sq_chain,
+					   &params);
+	if (rc) {
+		pr_err("Failed to allocate SQ chain\n");
+
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static int qedn_nvmetcp_offload_conn(struct qedn_conn_ctx *conn_ctx)
+{
+	struct qed_nvmetcp_params_offload offld_prms = { 0 };
+	struct qedn_endpoint *qedn_ep = &conn_ctx->ep;
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	u8 ts_hdr_size = 0;
+	u32 hdr_size;
+	int rc, i;
+
+	ether_addr_copy(offld_prms.src.mac, qedn_ep->src_mac);
+	ether_addr_copy(offld_prms.dst.mac, qedn_ep->dst_mac);
+	offld_prms.vlan_id = qedn_ep->vlan_id;
+	offld_prms.ecn_en = QEDN_TCP_ECN_EN;
+	offld_prms.timestamp_en =  QEDN_TCP_TS_EN;
+	offld_prms.delayed_ack_en = QEDN_TCP_DA_EN;
+	offld_prms.tcp_keep_alive_en = QEDN_TCP_KA_EN;
+	offld_prms.ip_version = qedn_ep->ip_type;
+
+	offld_prms.src.ip[0] = ntohl(qedn_ep->src_addr[0]);
+	offld_prms.dst.ip[0] = ntohl(qedn_ep->dst_addr[0]);
+	if (qedn_ep->ip_type == TCP_IPV6) {
+		for (i = 1; i < 4; i++) {
+			offld_prms.src.ip[i] = ntohl(qedn_ep->src_addr[i]);
+			offld_prms.dst.ip[i] = ntohl(qedn_ep->dst_addr[i]);
+		}
+	}
+
+	offld_prms.ttl = QEDN_TCP_TTL;
+	offld_prms.tos_or_tc = QEDN_TCP_TOS;
+	offld_prms.dst.port = qedn_ep->dst_port;
+	offld_prms.src.port = qedn_ep->src_port;
+	offld_prms.nvmetcp_cccid_itid_table_addr =
+		conn_ctx->host_cccid_itid_phy_addr;
+	offld_prms.nvmetcp_cccid_max_range = conn_ctx->sq_depth;
+
+	/* Calculate MSS */
+	if (offld_prms.timestamp_en)
+		ts_hdr_size = QEDN_TCP_TS_OPTION_LEN;
+
+	hdr_size = qedn_ep->ip_type == TCP_IPV4 ?
+		   sizeof(struct iphdr) : sizeof(struct ipv6hdr);
+	hdr_size += sizeof(struct tcphdr) + ts_hdr_size;
+
+	offld_prms.mss = qedn->mtu - hdr_size;
+	offld_prms.rcv_wnd_scale = QEDN_TCP_RCV_WND_SCALE;
+	offld_prms.cwnd = QEDN_TCP_MAX_CWND * offld_prms.mss;
+	offld_prms.ka_max_probe_cnt = QEDN_TCP_KA_MAX_PROBE_COUNT;
+	offld_prms.ka_timeout = QEDN_TCP_KA_TIMEOUT;
+	offld_prms.ka_interval = QEDN_TCP_KA_INTERVAL;
+	offld_prms.max_rt_time = QEDN_TCP_MAX_RT_TIME;
+	offld_prms.sq_pbl_addr =
+		(u64)qed_chain_get_pbl_phys(&qedn_ep->fw_sq_chain);
+
+	rc = qed_ops->offload_conn(qedn->cdev,
+				   conn_ctx->conn_handle,
+				   &offld_prms);
+	if (rc)
+		pr_err("offload_conn returned with an error\n");
+
+	return rc;
+}
+
+static int qedn_fetch_tcp_port(struct qedn_conn_ctx *conn_ctx)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl;
+	struct qedn_ctrl *qctrl;
+	int rc = 0;
+
+	ctrl = conn_ctx->ctrl;
+	qctrl = (struct qedn_ctrl *)ctrl->private_data;
+	if (!qctrl)
+		return -ENODEV;
+
+	rc = qed_fetch_tcp_port(ctrl->conn_params.local_ip_addr,
+				&conn_ctx->sock, &conn_ctx->ep.src_port);
+
+	return rc;
+}
+
+static void qedn_decouple_conn(struct qedn_conn_ctx *conn_ctx)
+{
+	struct nvme_tcp_ofld_queue *queue;
+
+	queue = conn_ctx->queue;
+	queue->private_data = NULL;
+}
+
+void qedn_terminate_connection(struct qedn_conn_ctx *conn_ctx)
+{
+	struct qedn_ctrl *qctrl;
+
+	if (!conn_ctx)
+		return;
+
+	qctrl = (struct qedn_ctrl *)conn_ctx->ctrl->private_data;
+	if (!qctrl)
+		return;
+
+	if (test_and_set_bit(DESTROY_CONNECTION, &conn_ctx->agg_work_action))
+		return;
+
+	qedn_set_con_state(conn_ctx, CONN_STATE_DESTROY_CONNECTION);
+	queue_work(qctrl->sp_wq, &conn_ctx->sp_wq_entry);
+}
+
+/* Slowpath EQ Callback */
+int qedn_event_cb(void *context, u8 fw_event_code, void *event_ring_data)
+{
+	struct nvmetcp_connect_done_results *eqe_connect_done;
+	struct nvmetcp_eqe_data *eqe_data;
+	struct nvme_tcp_ofld_ctrl *ctrl;
+	struct qedn_conn_ctx *conn_ctx;
+	struct qedn_ctrl *qctrl;
+	struct qedn_ctx *qedn;
+	u16 icid;
+	int rc;
+
+	if (!context || !event_ring_data) {
+		pr_err("Recv event with ctx NULL\n");
+
+		return -EINVAL;
+	}
+
+	qedn = (struct qedn_ctx *)context;
+
+	if (fw_event_code != NVMETCP_EVENT_TYPE_ASYN_CONNECT_COMPLETE) {
+		eqe_data = (struct nvmetcp_eqe_data *)event_ring_data;
+		icid = le16_to_cpu(eqe_data->icid);
+		pr_err("EQE Type=0x%x icid=0x%x, conn_id=0x%x err-code=0x%x\n",
+		       fw_event_code, eqe_data->icid, eqe_data->conn_id,
+		       eqe_data->error_code);
+	} else {
+		eqe_connect_done =
+			(struct nvmetcp_connect_done_results *)event_ring_data;
+		icid = le16_to_cpu(eqe_connect_done->icid);
+	}
+
+	conn_ctx = qedn_get_conn_hash(qedn, icid);
+	if (!conn_ctx) {
+		pr_err("Connection with icid=0x%x doesn't exist in conn list\n",
+		       icid);
+
+		return -EINVAL;
+	}
+
+	ctrl = conn_ctx->ctrl;
+	qctrl = (struct qedn_ctrl *)ctrl->private_data;
+	if (!qctrl)
+		return -ENODEV;
+
+	switch (fw_event_code) {
+	case NVMETCP_EVENT_TYPE_ASYN_CONNECT_COMPLETE:
+		if (conn_ctx->state != CONN_STATE_WAIT_FOR_CONNECT_DONE) {
+			pr_err("CID=0x%x:ASYN_CONNECT_COMPL:Wrong state %u\n",
+			       conn_ctx->fw_cid, conn_ctx->state);
+		} else {
+			rc = qedn_set_con_state(conn_ctx,
+						CONN_STATE_OFFLOAD_COMPLETE);
+
+			if (rc)
+				return rc;
+
+			/* Placeholder - for ICReq flow */
+		}
+
+		break;
+	case NVMETCP_EVENT_TYPE_ASYN_TERMINATE_DONE:
+		if (conn_ctx->state != CONN_STATE_WAIT_FOR_DESTROY_DONE)
+			pr_err("CID=0x%x:ASYN_TERMINATE_DONE:Wrong state %u\n",
+			       conn_ctx->fw_cid, conn_ctx->state);
+		else
+			queue_work(qctrl->sp_wq, &conn_ctx->sp_wq_entry);
+
+		break;
+	default:
+		pr_err("CID=0x%x - Recv Unknown Event %u\n",
+		       conn_ctx->fw_cid, fw_event_code);
+		break;
+	}
+
+	return 0;
+}
+
+void qedn_prep_db_data(struct qedn_conn_ctx *conn_ctx)
+{
+	struct nvmetcp_db_data *db_data = &conn_ctx->ep.db_data;
+	u8 params = 0;
+
+	params |= DB_DEST_XCM << NVMETCP_DB_DATA_DEST_SHIFT;
+	params |= DB_AGG_CMD_SET << NVMETCP_DB_DATA_AGG_CMD_SHIFT;
+	params |= DQ_XCM_ISCSI_SQ_PROD_CMD << NVMETCP_DB_DATA_AGG_VAL_SEL_SHIFT;
+	params |= 1 << NVMETCP_DB_DATA_BYPASS_EN_SHIFT;
+
+	db_data->params = params;
+	db_data->agg_flags = 0;
+}
+
+static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)
+{
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	size_t dma_size;
+	int rc;
+
+	rc = qedn_alloc_fw_sq(qedn, &conn_ctx->ep);
+	if (rc) {
+		pr_err("Failed to allocate FW SQ\n");
+		goto rel_conn;
+	}
+
+	set_bit(QEDN_CONN_RESRC_FW_SQ, &conn_ctx->resrc_state);
+	rc = qed_ops->acquire_conn(qedn->cdev,
+				   &conn_ctx->conn_handle,
+				   &conn_ctx->fw_cid,
+				   &conn_ctx->ep.p_doorbell);
+	if (rc) {
+		pr_err("Couldn't acquire connection\n");
+		goto rel_conn;
+	}
+
+	hash_add(qedn->conn_ctx_hash, &conn_ctx->hash_node,
+		 conn_ctx->conn_handle);
+	set_bit(QEDN_CONN_RESRC_ACQUIRE_CONN, &conn_ctx->resrc_state);
+
+	/* Placeholder - Allocate task resources and initialize fields */
+
+	rc = qedn_fetch_tcp_port(conn_ctx);
+	if (rc)
+		goto rel_conn;
+
+	set_bit(QEDN_CONN_RESRC_TCP_PORT, &conn_ctx->resrc_state);
+	dma_size = conn_ctx->sq_depth *
+			   sizeof(struct nvmetcp_host_cccid_itid_entry);
+	conn_ctx->host_cccid_itid =
+			dma_alloc_coherent(&qedn->pdev->dev,
+					   dma_size,
+					   &conn_ctx->host_cccid_itid_phy_addr,
+					   GFP_ATOMIC);
+	if (!conn_ctx->host_cccid_itid) {
+		pr_err("CCCID-iTID Map allocation failed\n");
+		goto rel_conn;
+	}
+
+	memset(conn_ctx->host_cccid_itid, 0xFF, dma_size);
+	set_bit(QEDN_CONN_RESRC_CCCID_ITID_MAP, &conn_ctx->resrc_state);
+	rc = qedn_set_con_state(conn_ctx, CONN_STATE_WAIT_FOR_CONNECT_DONE);
+	if (rc)
+		goto rel_conn;
+
+	qedn_prep_db_data(conn_ctx);
+	rc = qed_ops->common->db_recovery_add(qedn->cdev,
+					      conn_ctx->ep.p_doorbell,
+					      &conn_ctx->ep.db_data,
+					      DB_REC_WIDTH_32B, DB_REC_KERNEL);
+	if (rc)
+		goto rel_conn;
+	set_bit(QEDN_CONN_RESRC_DB_ADD, &conn_ctx->resrc_state);
+
+	rc = qedn_nvmetcp_offload_conn(conn_ctx);
+	if (rc) {
+		pr_err("Offload error: CID=0x%x\n", conn_ctx->fw_cid);
+		goto rel_conn;
+	}
+
+	return 0;
+
+rel_conn:
+	pr_err("qedn create queue ended with ERROR\n");
+	qedn_release_conn_ctx(conn_ctx);
+
+	return -EINVAL;
+}
+
+void qedn_cleanp_fw(struct qedn_conn_ctx *conn_ctx)
+{
+	/* Placeholder - task cleanup */
+}
+
+void qedn_destroy_connection(struct qedn_conn_ctx *conn_ctx)
+{
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	int rc;
+
+	qedn_decouple_conn(conn_ctx);
+
+	if (qedn_set_con_state(conn_ctx, CONN_STATE_WAIT_FOR_DESTROY_DONE))
+		return;
+
+	rc = qed_ops->destroy_conn(qedn->cdev, conn_ctx->conn_handle,
+				   conn_ctx->abrt_flag);
+	if (rc)
+		pr_warn("destroy_conn failed - rc %u\n", rc);
+}
+
+void qedn_sp_wq_handler(struct work_struct *work)
+{
+	struct qedn_conn_ctx *conn_ctx;
+	struct qedn_ctx *qedn;
+	int rc;
+
+	conn_ctx = container_of(work, struct qedn_conn_ctx, sp_wq_entry);
+	qedn = conn_ctx->qedn;
+
+	if (conn_ctx->state == CONN_STATE_DESTROY_COMPLETE) {
+		pr_err("Connection already released!\n");
+
+		return;
+	}
+
+	if (conn_ctx->state == CONN_STATE_WAIT_FOR_DESTROY_DONE) {
+		qedn_release_conn_ctx(conn_ctx);
+
+		return;
+	}
+
+	qedn = conn_ctx->qedn;
+	if (test_bit(DESTROY_CONNECTION, &conn_ctx->agg_work_action)) {
+		qedn_destroy_connection(conn_ctx);
+
+		return;
+	}
+
+	if (test_bit(CREATE_CONNECTION, &conn_ctx->agg_work_action)) {
+		qedn_clr_sp_wa(conn_ctx, CREATE_CONNECTION);
+		rc = qedn_prep_and_offload_queue(conn_ctx);
+		if (rc) {
+			pr_err("Error in queue prepare & firmware offload\n");
+
+			return;
+		}
+	}
+}
+
+/* Clear connection aggregative slowpath work action */
+void qedn_clr_sp_wa(struct qedn_conn_ctx *conn_ctx, u32 bit)
+{
+	clear_bit(bit, &conn_ctx->agg_work_action);
+}
+
+/* Set connection aggregative slowpath work action */
+void qedn_set_sp_wa(struct qedn_conn_ctx *conn_ctx, u32 bit)
+{
+	set_bit(bit, &conn_ctx->agg_work_action);
+}
diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
index 6c0f36f7d9d1..bd5618f65c70 100644
--- a/drivers/nvme/hw/qedn/qedn_main.c
+++ b/drivers/nvme/hw/qedn/qedn_main.c
@@ -22,6 +22,15 @@ static struct pci_device_id qedn_pci_tbl[] = {
 	{0, 0},
 };
 
+static bool qedn_matches_qede(struct qedn_ctx *qedn, struct pci_dev *qede_pdev)
+{
+	struct pci_dev *qedn_pdev = qedn->pdev;
+
+	return (qede_pdev->bus->number == qedn_pdev->bus->number &&
+		PCI_SLOT(qede_pdev->devfn) == PCI_SLOT(qedn_pdev->devfn) &&
+		PCI_FUNC(qede_pdev->devfn) == qedn->dev_info.port_id);
+}
+
 static int
 qedn_find_dev(struct nvme_tcp_ofld_dev *dev,
 	      struct nvme_tcp_ofld_ctrl *ctrl)
@@ -29,7 +38,9 @@ qedn_find_dev(struct nvme_tcp_ofld_dev *dev,
 	struct nvme_tcp_ofld_ctrl_con_params *conn_params;
 	struct pci_dev *qede_pdev = NULL;
 	struct sockaddr remote_mac_addr;
+	struct qedn_ctrl *qctrl = NULL;
 	struct net_device *ndev = NULL;
+	struct qedn_ctx *qedn = NULL;
 	u16 vlan_id = 0;
 	int rc = 0;
 
@@ -65,11 +76,24 @@ qedn_find_dev(struct nvme_tcp_ofld_dev *dev,
 
 	qed_vlan_get_ndev(&ctrl->ndev, &vlan_id);
 
+	qctrl = (struct qedn_ctrl *)ctrl->private_data;
+	if (qctrl) {
+		qctrl->remote_mac_addr = remote_mac_addr;
+		qctrl->vlan_id = vlan_id;
+	}
+
 	/* route found through ndev - validate this is qede*/
 	qede_pdev = qed_validate_ndev(ctrl->ndev);
 	if (!qede_pdev)
 		return false;
 
+	qedn = container_of(dev, struct qedn_ctx, qedn_ofld_dev);
+	if (!qedn)
+		return false;
+
+	if (!qedn_matches_qede(qedn, qede_pdev))
+		return false;
+
 	return true;
 }
 
@@ -82,14 +106,73 @@ qedn_claim_dev(struct nvme_tcp_ofld_dev *dev,
 
 static int qedn_setup_ctrl(struct nvme_tcp_ofld_ctrl *ctrl)
 {
-	/* Placeholder - qedn_setup_ctrl */
+	struct nvme_tcp_ofld_dev *dev = ctrl->dev;
+	struct qedn_ctrl *qctrl = NULL;
+	struct qedn_ctx *qedn = NULL;
+	bool new = true;
+	int rc = 0;
+
+	if (ctrl->private_data) {
+		qctrl = (struct qedn_ctrl *)ctrl->private_data;
+		new = false;
+	}
+
+	if (new) {
+		qctrl = kzalloc(sizeof(*qctrl), GFP_KERNEL);
+		if (!qctrl)
+			return -ENOMEM;
+
+		ctrl->private_data = (void *)qctrl;
+		set_bit(QEDN_CTRL_SET_TO_OFLD_CTRL, &qctrl->agg_state);
+
+		qctrl->sp_wq = alloc_workqueue(QEDN_SP_WORKQUEUE,
+					       WQ_MEM_RECLAIM,
+					       QEDN_SP_WORKQUEUE_MAX_ACTIVE);
+		if (!qctrl->sp_wq) {
+			rc = -ENODEV;
+			pr_err("Unable to create slowpath work queue!\n");
+			kfree(qctrl);
+
+			return rc;
+		}
+
+		set_bit(QEDN_STATE_SP_WORK_THREAD_SET, &qctrl->agg_state);
+	}
+
+	if (!qedn_find_dev(dev, ctrl)) {
+		rc = -ENODEV;
+		goto err_out;
+	}
+
+	qedn = container_of(dev, struct qedn_ctx, qedn_ofld_dev);
+	qctrl->qedn = qedn;
+
+	/* Placeholder - setup LLH filter */
 
 	return 0;
+err_out:
+	destroy_workqueue(qctrl->sp_wq);
+	kfree(qctrl);
+
+	return rc;
 }
 
 static int qedn_release_ctrl(struct nvme_tcp_ofld_ctrl *ctrl)
 {
-	/* Placeholder - qedn_release_ctrl */
+	struct qedn_ctrl *qctrl;
+
+	qctrl = (struct qedn_ctrl *)ctrl->private_data;
+	if (!qctrl)
+		return -ENODEV;
+
+	if (test_and_clear_bit(QEDN_STATE_SP_WORK_THREAD_SET,
+			       &qctrl->agg_state))
+		destroy_workqueue(qctrl->sp_wq);
+
+	if (test_and_clear_bit(QEDN_CTRL_SET_TO_OFLD_CTRL, &qctrl->agg_state)) {
+		kfree(qctrl);
+		ctrl->private_data = NULL;
+	}
 
 	return 0;
 }
@@ -97,19 +180,117 @@ static int qedn_release_ctrl(struct nvme_tcp_ofld_ctrl *ctrl)
 static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid,
 			     size_t queue_size)
 {
-	/* Placeholder - qedn_create_queue */
+	struct nvme_tcp_ofld_ctrl *ctrl = queue->ctrl;
+	struct nvme_ctrl *nctrl = &ctrl->nctrl;
+	struct qedn_conn_ctx *conn_ctx;
+	struct qedn_ctrl *qctrl;
+	struct qedn_ctx *qedn;
+	int rc;
+
+	qctrl = (struct qedn_ctrl *)ctrl->private_data;
+	if (!qctrl)
+		return -ENODEV;
+
+	qedn = qctrl->qedn;
+
+	/* Allocate qedn connection context */
+	conn_ctx = kzalloc(sizeof(*conn_ctx), GFP_KERNEL);
+	if (!conn_ctx)
+		return -ENOMEM;
+
+	queue->private_data = conn_ctx;
+	queue->hdr_digest = nctrl->opts->hdr_digest;
+	queue->data_digest = nctrl->opts->data_digest;
+	queue->tos = nctrl->opts->tos;
+
+	conn_ctx->qedn = qedn;
+	conn_ctx->queue = queue;
+	conn_ctx->ctrl = ctrl;
+	conn_ctx->sq_depth = queue_size;
+
+	init_waitqueue_head(&conn_ctx->conn_waitq);
+	atomic_set(&conn_ctx->est_conn_indicator, 0);
+	atomic_set(&conn_ctx->destroy_conn_indicator, 0);
+
+	spin_lock_init(&conn_ctx->conn_state_lock);
+
+	qedn_initialize_endpoint(&conn_ctx->ep, qedn->local_mac_addr, ctrl);
+
+	atomic_inc(&qctrl->host_num_active_conns);
+
+	qedn_set_sp_wa(conn_ctx, CREATE_CONNECTION);
+	qedn_set_con_state(conn_ctx, CONN_STATE_CREATE_CONNECTION);
+	INIT_WORK(&conn_ctx->sp_wq_entry, qedn_sp_wq_handler);
+	queue_work(qctrl->sp_wq, &conn_ctx->sp_wq_entry);
+
+	/* Wait for the connection establishment to complete - this includes the
+	 * FW TCP connection establishment and the NVMeTCP ICReq & ICResp
+	 */
+	rc = qedn_wait_for_conn_est(conn_ctx);
+	if (rc)
+		return -ENXIO;
 
 	return 0;
 }
 
 static void qedn_drain_queue(struct nvme_tcp_ofld_queue *queue)
 {
-	/* Placeholder - qedn_drain_queue */
+	struct qedn_conn_ctx *conn_ctx;
+
+	if (!queue) {
+		pr_err("ctrl has no queues\n");
+
+		return;
+	}
+
+	conn_ctx = (struct qedn_conn_ctx *)queue->private_data;
+	if (!conn_ctx)
+		return;
+
+	qedn_cleanp_fw(conn_ctx);
+}
+
+static inline void
+qedn_queue_wait_for_terminate_complete(struct qedn_conn_ctx *conn_ctx)
+{
+	/* Returns valid non-0 */
+	atomic_t *conn_dest_ind = &conn_ctx->destroy_conn_indicator;
+	u32 dest_timeout = msecs_to_jiffies(QEDN_RLS_CONS_TMO);
+	int wrc, state;
+
+	wrc = wait_event_interruptible_timeout(conn_ctx->conn_waitq,
+					       atomic_read(conn_dest_ind) > 0,
+					       dest_timeout);
+
+	atomic_set(conn_dest_ind, 0);
+
+	spin_lock_bh(&conn_ctx->conn_state_lock);
+	state = conn_ctx->state;
+	spin_unlock_bh(&conn_ctx->conn_state_lock);
+
+	if (!wrc  || state != CONN_STATE_DESTROY_COMPLETE)
+		pr_warn("Timed out waiting for clear-SQ on FW conns");
 }
 
 static void qedn_destroy_queue(struct nvme_tcp_ofld_queue *queue)
 {
-	/* Placeholder - qedn_destroy_queue */
+	struct qedn_conn_ctx *conn_ctx;
+
+	if (!queue) {
+		pr_err("ctrl has no queues\n");
+
+		return;
+	}
+
+	conn_ctx = (struct qedn_conn_ctx *)queue->private_data;
+	if (!conn_ctx)
+		return;
+
+	qedn_terminate_connection(conn_ctx);
+
+	qedn_queue_wait_for_terminate_complete(conn_ctx);
+
+	kfree(conn_ctx);
 }
 
 static int qedn_poll_queue(struct nvme_tcp_ofld_queue *queue)
@@ -150,6 +331,21 @@ static struct nvme_tcp_ofld_ops qedn_ofld_ops = {
 	.send_req = qedn_send_req,
 };
 
+struct qedn_conn_ctx *qedn_get_conn_hash(struct qedn_ctx *qedn, u16 icid)
+{
+	struct qedn_conn_ctx *conn = NULL;
+
+	hash_for_each_possible(qedn->conn_ctx_hash, conn, hash_node, icid) {
+		if (conn->conn_handle == icid)
+			break;
+	}
+
+	if (!conn || conn->conn_handle != icid)
+		return NULL;
+
+	return conn;
+}
+
 /* Fastpath IRQ handler */
 static irqreturn_t qedn_irq_handler(int irq, void *dev_id)
 {
@@ -250,7 +446,7 @@ static int qedn_setup_irq(struct qedn_ctx *qedn)
 
 static inline void qedn_init_pf_struct(struct qedn_ctx *qedn)
 {
-	/* Placeholder - Initialize qedn fields */
+	hash_init(qedn->conn_ctx_hash);
 }
 
 static inline void
@@ -591,7 +787,7 @@ static int __qedn_probe(struct pci_dev *pdev)
 	rc = qed_ops->start(qedn->cdev,
 			    NULL /* Placeholder for FW IO-path resources */,
 			    qedn,
-			    NULL /* Placeholder for FW Event callback */);
+			    qedn_event_cb);
 	if (rc) {
 		rc = -ENODEV;
 		pr_err("Cannot start NVMeTCP Function\n");
-- 
2.24.1



* [PATCH v4 14/20] qedn: Add support of configuring HW filter block
  2021-06-29 12:47 [PATCH v4 00/20] NVMeTCP Offload ULP Prabhakar Kushwaha
                   ` (12 preceding siblings ...)
  2021-06-29 12:47 ` [PATCH v4 13/20] qedn: Add connection-level slowpath functionality Prabhakar Kushwaha
@ 2021-06-29 12:47 ` Prabhakar Kushwaha
  2021-06-29 12:47 ` [PATCH v4 15/20] qedn: Add IO level qedn_send_req and fw_cq workqueue Prabhakar Kushwaha
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Prabhakar Kushwaha @ 2021-06-29 12:47 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, axboe, kbusch
  Cc: davem, kuba, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	prabhakar.pkin, malin1024

The HW filter can be configured to filter TCP packets based on either the
source or the destination TCP port. qedn leverages this feature to route
NVMeTCP traffic.

This patch configures the HW filter block based on the source port of all
received packets, so that they are delivered to the correct qedn PF.
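
A condensed usage sketch of the filter life cycle as wired into
qedn_setup_ctrl() / qedn_release_ctrl() below - one ref-counted filter per
distinct target TCP port on the PF:

/* Add (or take a reference on) the filter for this controller's target port */
__be16 port = qedn_get_in_port(&ctrl->conn_params.remote_ip_addr);
struct qedn_llh_filter *filter = qedn_add_llh_filter(qedn, ntohs(port));

if (!filter)
	return -EFAULT;	/* filter table full or HW configuration failed */

/* ... controller and its queues are created and used ... */

/* Drop the reference; the HW filter is removed when ref_cnt reaches zero */
qedn_dec_llh_filter(qedn, filter);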

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/hw/qedn/qedn.h      |  15 ++++
 drivers/nvme/hw/qedn/qedn_main.c | 113 ++++++++++++++++++++++++++++++-
 2 files changed, 127 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
index 38ab4ff88999..9e16297fa323 100644
--- a/drivers/nvme/hw/qedn/qedn.h
+++ b/drivers/nvme/hw/qedn/qedn.h
@@ -29,6 +29,11 @@
 #define QEDN_IRQ_NAME_LEN 24
 #define QEDN_IRQ_NO_FLAGS 0
 
+/* HW defines */
+
+/* QEDN_MAX_LLH_PORTS will be extended in future */
+#define QEDN_MAX_LLH_PORTS 16
+
 /* Destroy connection defines */
 #define QEDN_NON_ABORTIVE_TERMINATION 0
 #define QEDN_ABORTIVE_TERMINATION 1
@@ -68,6 +73,7 @@
 enum qedn_state {
 	QEDN_STATE_CORE_PROBED = 0,
 	QEDN_STATE_CORE_OPEN,
+	QEDN_STATE_LLH_PORT_FILTER_SET,
 	QEDN_STATE_MFW_STATE,
 	QEDN_STATE_NVMETCP_OPEN,
 	QEDN_STATE_IRQ_SET,
@@ -99,6 +105,8 @@ struct qedn_ctx {
 	/* Accessed with atomic bit ops, used with enum qedn_state */
 	unsigned long state;
 
+	u8 num_llh_filters;
+	struct list_head llh_filter_list;
 	u8 local_mac_addr[ETH_ALEN];
 	u16 mtu;
 
@@ -165,6 +173,12 @@ enum qedn_conn_state {
 	CONN_STATE_DESTROY_COMPLETE
 };
 
+struct qedn_llh_filter {
+	struct list_head entry;
+	u16 port;
+	u16 ref_cnt;
+};
+
 struct qedn_ctrl {
 	struct list_head glb_entry;
 	struct list_head pf_entry;
@@ -249,5 +263,6 @@ int qedn_set_con_state(struct qedn_conn_ctx *conn_ctx,
 		       enum qedn_conn_state new_state);
 void qedn_terminate_connection(struct qedn_conn_ctx *conn_ctx);
 void qedn_cleanp_fw(struct qedn_conn_ctx *conn_ctx);
+__be16 qedn_get_in_port(struct sockaddr_storage *sa);
 
 #endif /* _QEDN_H_ */
diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
index bd5618f65c70..3cadec6d6f5d 100644
--- a/drivers/nvme/hw/qedn/qedn_main.c
+++ b/drivers/nvme/hw/qedn/qedn_main.c
@@ -22,6 +22,86 @@ static struct pci_device_id qedn_pci_tbl[] = {
 	{0, 0},
 };
 
+__be16 qedn_get_in_port(struct sockaddr_storage *sa)
+{
+	return sa->ss_family == AF_INET
+		? ((struct sockaddr_in *)sa)->sin_port
+		: ((struct sockaddr_in6 *)sa)->sin6_port;
+}
+
+struct qedn_llh_filter *qedn_add_llh_filter(struct qedn_ctx *qedn, u16 tcp_port)
+{
+	struct qedn_llh_filter *llh_filter = NULL;
+	struct qedn_llh_filter *llh_tmp = NULL;
+	bool new_filter = 1;
+	int rc = 0;
+
+	/* Check if LLH filter already defined */
+	list_for_each_entry_safe(llh_filter, llh_tmp, &qedn->llh_filter_list,
+				 entry) {
+		if (llh_filter->port == tcp_port) {
+			new_filter = 0;
+			llh_filter->ref_cnt++;
+			break;
+		}
+	}
+
+	if (new_filter) {
+		if (qedn->num_llh_filters >= QEDN_MAX_LLH_PORTS) {
+			pr_err("PF %u reached the max target ports limit; %u filters in use\n",
+			       qedn->dev_info.common.abs_pf_id,
+			       qedn->num_llh_filters);
+
+			return NULL;
+		}
+
+		rc = qed_ops->add_src_tcp_port_filter(qedn->cdev, tcp_port);
+		if (rc) {
+			pr_err("LLH port config failed. port:%u; rc:%u\n",
+			       tcp_port, rc);
+
+			return NULL;
+		}
+
+		llh_filter = kzalloc(sizeof(*llh_filter), GFP_KERNEL);
+		if (!llh_filter) {
+			qed_ops->remove_src_tcp_port_filter(qedn->cdev,
+							    tcp_port);
+
+			return NULL;
+		}
+
+		llh_filter->port = tcp_port;
+		llh_filter->ref_cnt = 1;
+		++qedn->num_llh_filters;
+		list_add_tail(&llh_filter->entry, &qedn->llh_filter_list);
+		set_bit(QEDN_STATE_LLH_PORT_FILTER_SET, &qedn->state);
+	}
+
+	return llh_filter;
+}
+
+void qedn_dec_llh_filter(struct qedn_ctx *qedn,
+			 struct qedn_llh_filter *llh_filter)
+{
+	if (!llh_filter)
+		return;
+
+	llh_filter->ref_cnt--;
+	if (!llh_filter->ref_cnt) {
+		list_del(&llh_filter->entry);
+
+		/* Remove LLH protocol port filter */
+		qed_ops->remove_src_tcp_port_filter(qedn->cdev,
+						    llh_filter->port);
+
+		--qedn->num_llh_filters;
+		kfree(llh_filter);
+		if (!qedn->num_llh_filters)
+			clear_bit(QEDN_STATE_LLH_PORT_FILTER_SET, &qedn->state);
+	}
+}
+
 static bool qedn_matches_qede(struct qedn_ctx *qedn, struct pci_dev *qede_pdev)
 {
 	struct pci_dev *qedn_pdev = qedn->pdev;
@@ -107,8 +187,10 @@ qedn_claim_dev(struct nvme_tcp_ofld_dev *dev,
 static int qedn_setup_ctrl(struct nvme_tcp_ofld_ctrl *ctrl)
 {
 	struct nvme_tcp_ofld_dev *dev = ctrl->dev;
+	struct qedn_llh_filter *llh_filter = NULL;
 	struct qedn_ctrl *qctrl = NULL;
 	struct qedn_ctx *qedn = NULL;
+	__be16 remote_port;
 	bool new = true;
 	int rc = 0;
 
@@ -147,7 +229,22 @@ static int qedn_setup_ctrl(struct nvme_tcp_ofld_ctrl *ctrl)
 	qedn = container_of(dev, struct qedn_ctx, qedn_ofld_dev);
 	qctrl->qedn = qedn;
 
-	/* Placeholder - setup LLH filter */
+	if (qedn->num_llh_filters == 0) {
+		qedn->mtu = ctrl->ndev->mtu;
+		memcpy(qedn->local_mac_addr, ctrl->ndev->dev_addr, ETH_ALEN);
+	}
+
+	remote_port = qedn_get_in_port(&ctrl->conn_params.remote_ip_addr);
+	if (new) {
+		llh_filter = qedn_add_llh_filter(qedn, ntohs(remote_port));
+		if (!llh_filter) {
+			rc = -EFAULT;
+			goto err_out;
+		}
+
+		qctrl->llh_filter = llh_filter;
+		set_bit(LLH_FILTER, &qctrl->agg_state);
+	}
 
 	return 0;
 err_out:
@@ -165,6 +262,12 @@ static int qedn_release_ctrl(struct nvme_tcp_ofld_ctrl *ctrl)
 	if (!qctrl)
 		return -ENODEV;
 
+	if (test_and_clear_bit(LLH_FILTER, &qctrl->agg_state) &&
+	    qctrl->llh_filter) {
+		qedn_dec_llh_filter(qctrl->qedn, qctrl->llh_filter);
+		qctrl->llh_filter = NULL;
+	}
+
 	if (test_and_clear_bit(QEDN_STATE_SP_WORK_THREAD_SET,
 			       &qctrl->agg_state))
 		flush_workqueue(qctrl->sp_wq);
@@ -446,6 +549,8 @@ static int qedn_setup_irq(struct qedn_ctx *qedn)
 
 static inline void qedn_init_pf_struct(struct qedn_ctx *qedn)
 {
+	INIT_LIST_HEAD(&qedn->llh_filter_list);
+	qedn->num_llh_filters = 0;
 	hash_init(qedn->conn_ctx_hash);
 }
 
@@ -688,6 +793,12 @@ static void __qedn_remove(struct pci_dev *pdev)
 		return;
 	}
 
+	if (test_and_clear_bit(QEDN_STATE_LLH_PORT_FILTER_SET, &qedn->state)) {
+		pr_err("LLH port configuration removal. %d filters still set\n",
+		       qedn->num_llh_filters);
+		qed_ops->clear_all_filters(qedn->cdev);
+	}
+
 	if (test_and_clear_bit(QEDN_STATE_REGISTERED_OFFLOAD_DEV, &qedn->state))
 		nvme_tcp_ofld_unregister_dev(&qedn->qedn_ofld_dev);
 
-- 
2.24.1



* [PATCH v4 15/20] qedn: Add IO level qedn_send_req and fw_cq workqueue
  2021-06-29 12:47 [PATCH v4 00/20] NVMeTCP Offload ULP Prabhakar Kushwaha
                   ` (13 preceding siblings ...)
  2021-06-29 12:47 ` [PATCH v4 14/20] qedn: Add support of configuring HW filter block Prabhakar Kushwaha
@ 2021-06-29 12:47 ` Prabhakar Kushwaha
  2021-06-29 12:47 ` [PATCH v4 16/20] qedn: Add support of Task and SGL Prabhakar Kushwaha
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Prabhakar Kushwaha @ 2021-06-29 12:47 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, axboe, kbusch
  Cc: davem, kuba, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	prabhakar.pkin, malin1024

From: Shai Malin <smalin@marvell.com>

This patch presents the IO level skeleton flows:

- qedn_send_req(): processes new requests, similar to nvme_tcp_queue_rq().

- qedn_fw_cq_fp_wq(): processes new FW completions; the flow starts from
  the IRQ handler, and for a single interrupt it processes all the pending
  NVMeoF completions in polling mode (see the hand-off sketch below).
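
A minimal sketch of the intended IRQ-to-workqueue hand-off, assuming the
fastpath interrupt only schedules the per-queue work item; status-block ack
and CPU placement are omitted, so this is an illustration rather than the
exact handler added by this patch:

static irqreturn_t qedn_irq_handler_sketch(int irq, void *dev_id)
{
	/* dev_id is the qedn_fp_queue passed to request_irq() */
	struct qedn_fp_queue *fp_q = dev_id;

	/* Defer CQ processing to the fw_cq workqueue, which drains all
	 * pending NVMeoF completions for this queue in polling mode.
	 */
	queue_work(fp_q->qedn->fw_cq_fp_wq, &fp_q->fw_cq_fp_wq_entry);

	return IRQ_HANDLED;
}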

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
---
 drivers/nvme/hw/qedn/Makefile    |  2 +-
 drivers/nvme/hw/qedn/qedn.h      | 14 ++++++
 drivers/nvme/hw/qedn/qedn_conn.c |  2 +
 drivers/nvme/hw/qedn/qedn_main.c | 84 +++++++++++++++++++++++++++++---
 drivers/nvme/hw/qedn/qedn_task.c | 79 ++++++++++++++++++++++++++++++
 5 files changed, 172 insertions(+), 9 deletions(-)
 create mode 100644 drivers/nvme/hw/qedn/qedn_task.c

diff --git a/drivers/nvme/hw/qedn/Makefile b/drivers/nvme/hw/qedn/Makefile
index ece84772d317..888d466fa5ed 100644
--- a/drivers/nvme/hw/qedn/Makefile
+++ b/drivers/nvme/hw/qedn/Makefile
@@ -1,4 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0
 
 obj-$(CONFIG_NVME_QEDN) += qedn.o
-qedn-y := qedn_main.o qedn_conn.o
+qedn-y := qedn_main.o qedn_conn.o qedn_task.o
\ No newline at end of file
diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
index 9e16297fa323..13d63f420a23 100644
--- a/drivers/nvme/hw/qedn/qedn.h
+++ b/drivers/nvme/hw/qedn/qedn.h
@@ -38,6 +38,8 @@
 #define QEDN_NON_ABORTIVE_TERMINATION 0
 #define QEDN_ABORTIVE_TERMINATION 1
 
+#define QEDN_FW_CQ_FP_WQ_WORKQUEUE "qedn_fw_cq_fp_wq"
+
 /*
  * TCP offload stack default configurations and defines.
  * Future enhancements will allow controlling the configurable
@@ -90,6 +92,7 @@ struct qedn_fp_queue {
 	struct qedn_ctx	*qedn;
 	struct qed_sb_info *sb_info;
 	unsigned int cpu;
+	struct work_struct fw_cq_fp_wq_entry;
 	u16 sb_id;
 	char irqname[QEDN_IRQ_NAME_LEN];
 };
@@ -118,6 +121,7 @@ struct qedn_ctx {
 	struct qedn_fp_queue *fp_q_arr;
 	struct nvmetcp_glbl_queue_entry *fw_cq_array_virt;
 	dma_addr_t fw_cq_array_phy; /* Physical address of fw_cq_array_virt */
+	struct workqueue_struct *fw_cq_fp_wq;
 };
 
 struct qedn_endpoint {
@@ -204,6 +208,12 @@ struct qedn_ctrl {
 
 /* Connection level struct */
 struct qedn_conn_ctx {
+	/* IO path */
+	struct qedn_fp_queue *fp_q;
+	/* mutex for queueing request */
+	struct mutex send_mutex;
+	int qid;
+
 	struct qedn_ctx *qedn;
 	struct nvme_tcp_ofld_queue *queue;
 	struct nvme_tcp_ofld_ctrl *ctrl;
@@ -264,5 +274,9 @@ int qedn_set_con_state(struct qedn_conn_ctx *conn_ctx,
 void qedn_terminate_connection(struct qedn_conn_ctx *conn_ctx);
 void qedn_cleanp_fw(struct qedn_conn_ctx *conn_ctx);
 __be16 qedn_get_in_port(struct sockaddr_storage *sa);
+int qedn_queue_request(struct qedn_conn_ctx *qedn_conn,
+		       struct nvme_tcp_ofld_req *req);
+void qedn_nvme_req_fp_wq_handler(struct work_struct *work);
+void qedn_io_work_cq(struct qedn_ctx *qedn, struct nvmetcp_fw_cqe *cqe);
 
 #endif /* _QEDN_H_ */
diff --git a/drivers/nvme/hw/qedn/qedn_conn.c b/drivers/nvme/hw/qedn/qedn_conn.c
index 04c45a6fa14b..962c88a4f345 100644
--- a/drivers/nvme/hw/qedn/qedn_conn.c
+++ b/drivers/nvme/hw/qedn/qedn_conn.c
@@ -190,6 +190,7 @@ static void qedn_release_conn_ctx(struct qedn_conn_ctx *conn_ctx)
 		pr_err("Conn resources state isn't 0 as expected 0x%lx\n",
 		       conn_ctx->resrc_state);
 
+	mutex_destroy(&conn_ctx->send_mutex);
 	atomic_inc(&conn_ctx->destroy_conn_indicator);
 	qedn_set_con_state(conn_ctx, CONN_STATE_DESTROY_COMPLETE);
 	wake_up_interruptible(&conn_ctx->conn_waitq);
@@ -429,6 +430,7 @@ static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)
 	}
 
 	set_bit(QEDN_CONN_RESRC_FW_SQ, &conn_ctx->resrc_state);
+
 	rc = qed_ops->acquire_conn(qedn->cdev,
 				   &conn_ctx->conn_handle,
 				   &conn_ctx->fw_cid,
diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
index 3cadec6d6f5d..975949ce6fb0 100644
--- a/drivers/nvme/hw/qedn/qedn_main.c
+++ b/drivers/nvme/hw/qedn/qedn_main.c
@@ -310,6 +310,7 @@ static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid,
 	conn_ctx->queue = queue;
 	conn_ctx->ctrl = ctrl;
 	conn_ctx->sq_depth = queue_size;
+	mutex_init(&conn_ctx->send_mutex);
 
 	init_waitqueue_head(&conn_ctx->conn_waitq);
 	atomic_set(&conn_ctx->est_conn_indicator, 0);
@@ -317,6 +318,8 @@ static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid,
 
 	spin_lock_init(&conn_ctx->conn_state_lock);
 
+	conn_ctx->qid = qid;
+
 	qedn_initialize_endpoint(&conn_ctx->ep, qedn->local_mac_addr, ctrl);
 
 	atomic_inc(&qctrl->host_num_active_conns);
@@ -408,9 +411,18 @@ static int qedn_poll_queue(struct nvme_tcp_ofld_queue *queue)
 
 static int qedn_send_req(struct nvme_tcp_ofld_req *req)
 {
-	/* Placeholder - qedn_send_req */
+	struct qedn_conn_ctx *qedn_conn;
+	int rc = 0;
 
-	return 0;
+	qedn_conn = (struct qedn_conn_ctx *)req->queue->private_data;
+	if (unlikely(!qedn_conn))
+		return -ENXIO;
+
+	mutex_lock(&qedn_conn->send_mutex);
+	rc = qedn_queue_request(qedn_conn, req);
+	mutex_unlock(&qedn_conn->send_mutex);
+
+	return rc;
 }
 
 static struct nvme_tcp_ofld_ops qedn_ofld_ops = {
@@ -450,9 +462,58 @@ struct qedn_conn_ctx *qedn_get_conn_hash(struct qedn_ctx *qedn, u16 icid)
 }
 
 /* Fastpath IRQ handler */
+void qedn_fw_cq_fp_handler(struct qedn_fp_queue *fp_q)
+{
+	u16 sb_id, cq_prod_idx, cq_cons_idx;
+	struct qedn_ctx *qedn = fp_q->qedn;
+	struct nvmetcp_fw_cqe *cqe = NULL;
+
+	sb_id = fp_q->sb_id;
+	qed_sb_update_sb_idx(fp_q->sb_info);
+
+	/* rmb - to prevent missing new cqes */
+	rmb();
+
+	/* Read the latest cq_prod from the SB */
+	cq_prod_idx = *fp_q->cq_prod;
+	cq_cons_idx = qed_chain_get_cons_idx(&fp_q->cq_chain);
+
+	while (cq_cons_idx != cq_prod_idx) {
+		cqe = qed_chain_consume(&fp_q->cq_chain);
+		if (likely(cqe))
+			qedn_io_work_cq(qedn, cqe);
+		else
+			pr_err("Failed consuming cqe\n");
+
+		cq_cons_idx = qed_chain_get_cons_idx(&fp_q->cq_chain);
+
+		/* Check if new completions were posted */
+		if (unlikely(cq_prod_idx == cq_cons_idx)) {
+			/* rmb - to prevent missing new cqes */
+			rmb();
+
+			/* Update the latest cq_prod from the SB */
+			cq_prod_idx = *fp_q->cq_prod;
+		}
+	}
+}
+
+static void qedn_fw_cq_fq_wq_handler(struct work_struct *work)
+{
+	struct qedn_fp_queue *fp_q = container_of(work, struct qedn_fp_queue,
+						  fw_cq_fp_wq_entry);
+
+	qedn_fw_cq_fp_handler(fp_q);
+	qed_sb_ack(fp_q->sb_info, IGU_INT_ENABLE, 1);
+}
+
 static irqreturn_t qedn_irq_handler(int irq, void *dev_id)
 {
-	/* Placeholder */
+	struct qedn_fp_queue *fp_q = dev_id;
+	struct qedn_ctx *qedn = fp_q->qedn;
+
+	qed_sb_ack(fp_q->sb_info, IGU_INT_DISABLE, 0);
+	queue_work_on(fp_q->cpu, qedn->fw_cq_fp_wq, &fp_q->fw_cq_fp_wq_entry);
 
 	return IRQ_HANDLED;
 }
@@ -586,6 +647,8 @@ static void qedn_free_function_queues(struct qedn_ctx *qedn)
 	int i;
 
 	/* Free workqueues */
+	destroy_workqueue(qedn->fw_cq_fp_wq);
+	qedn->fw_cq_fp_wq = NULL;
 
 	/* Free the fast path queues*/
 	for (i = 0; i < qedn->num_fw_cqs; i++) {
@@ -653,7 +716,14 @@ static int qedn_alloc_function_queues(struct qedn_ctx *qedn)
 	u64 cq_phy_addr;
 	int i;
 
-	/* Place holder - IO-path workqueues */
+	qedn->fw_cq_fp_wq = alloc_workqueue(QEDN_FW_CQ_FP_WQ_WORKQUEUE,
+					    WQ_HIGHPRI | WQ_MEM_RECLAIM, 0);
+	if (!qedn->fw_cq_fp_wq) {
+		rc = -ENODEV;
+		pr_err("Unable to create fastpath FW CQ workqueue!\n");
+
+		return rc;
+	}
 
 	qedn->fp_q_arr = kcalloc(qedn->num_fw_cqs,
 				 sizeof(struct qedn_fp_queue), GFP_KERNEL);
@@ -681,8 +751,7 @@ static int qedn_alloc_function_queues(struct qedn_ctx *qedn)
 		chain_params.mode = QED_CHAIN_MODE_PBL,
 		chain_params.cnt_type = QED_CHAIN_CNT_TYPE_U16,
 		chain_params.num_elems = QEDN_FW_CQ_SIZE;
-		/* Placeholder - sizeof(struct nvmetcp_fw_cqe)*/
-		chain_params.elem_size = 64;
+		chain_params.elem_size = sizeof(struct nvmetcp_fw_cqe);
 
 		rc = qed_ops->common->chain_alloc(qedn->cdev,
 						  &fp_q->cq_chain,
@@ -711,8 +780,7 @@ static int qedn_alloc_function_queues(struct qedn_ctx *qedn)
 		sb = fp_q->sb_info->sb_virt;
 		fp_q->cq_prod = (u16 *)&sb->pi_array[QEDN_PROTO_CQ_PROD_IDX];
 		fp_q->qedn = qedn;
-
-		/* Placeholder - Init IO-path workqueue */
+		INIT_WORK(&fp_q->fw_cq_fp_wq_entry, qedn_fw_cq_fq_wq_handler);
 
 		/* Placeholder - Init IO-path resources */
 	}
diff --git a/drivers/nvme/hw/qedn/qedn_task.c b/drivers/nvme/hw/qedn/qedn_task.c
new file mode 100644
index 000000000000..a3af55fba95c
--- /dev/null
+++ b/drivers/nvme/hw/qedn/qedn_task.c
@@ -0,0 +1,79 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2021 Marvell. All rights reserved.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+/* Kernel includes */
+#include <linux/kernel.h>
+
+/* Driver includes */
+#include "qedn.h"
+
+int qedn_queue_request(struct qedn_conn_ctx *qedn_conn,
+		       struct nvme_tcp_ofld_req *req)
+{
+	/* Process the request */
+
+	return 0;
+}
+
+struct qedn_task_ctx *qedn_cqe_get_active_task(struct nvmetcp_fw_cqe *cqe)
+{
+	struct regpair *p = &cqe->task_opaque;
+
+	return (struct qedn_task_ctx *)((((u64)(le32_to_cpu(p->hi)) << 32)
+					+ le32_to_cpu(p->lo)));
+}
+
+void qedn_io_work_cq(struct qedn_ctx *qedn, struct nvmetcp_fw_cqe *cqe)
+{
+	struct qedn_task_ctx *qedn_task = NULL;
+	struct qedn_conn_ctx *conn_ctx = NULL;
+	u16 itid;
+	u32 cid;
+
+	conn_ctx = qedn_get_conn_hash(qedn, le16_to_cpu(cqe->conn_id));
+	if (unlikely(!conn_ctx)) {
+		pr_err("CID 0x%x: Failed to fetch conn_ctx from hash\n",
+		       le16_to_cpu(cqe->conn_id));
+
+		return;
+	}
+
+	cid = conn_ctx->fw_cid;
+	itid = le16_to_cpu(cqe->itid);
+	qedn_task = qedn_cqe_get_active_task(cqe);
+	if (unlikely(!qedn_task))
+		return;
+
+	if (likely(cqe->cqe_type == NVMETCP_FW_CQE_TYPE_NORMAL)) {
+		/* Placeholder - verify the connection was established */
+
+		switch (cqe->task_type) {
+		case NVMETCP_TASK_TYPE_HOST_WRITE:
+		case NVMETCP_TASK_TYPE_HOST_READ:
+
+			/* Placeholder - IO flow */
+
+			break;
+
+		case NVMETCP_TASK_TYPE_HOST_READ_NO_CQE:
+
+			/* Placeholder - IO flow */
+
+			break;
+
+		case NVMETCP_TASK_TYPE_INIT_CONN_REQUEST:
+
+			/* Placeholder - ICReq flow */
+
+			break;
+		default:
+			pr_info("Could not identify task type\n");
+		}
+	} else {
+		/* Placeholder - Recovery flows */
+	}
+}
-- 
2.24.1



* [PATCH v4 16/20] qedn: Add support of Task and SGL
  2021-06-29 12:47 [PATCH v4 00/20] NVMeTCP Offload ULP Prabhakar Kushwaha
                   ` (14 preceding siblings ...)
  2021-06-29 12:47 ` [PATCH v4 15/20] qedn: Add IO level qedn_send_req and fw_cq workqueue Prabhakar Kushwaha
@ 2021-06-29 12:47 ` Prabhakar Kushwaha
  2021-06-29 12:47 ` [PATCH v4 17/20] qedn: Add support of NVME ICReq & ICResp Prabhakar Kushwaha
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Prabhakar Kushwaha @ 2021-06-29 12:47 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, axboe, kbusch
  Cc: davem, kuba, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	prabhakar.pkin, malin1024

This patch adds support for Task and SGL, which are used for both
slowpath and fastpath IO. Here, a Task is the IO granule used by the
firmware to perform the IO.

The internal implementation:
- Create the task/SGL resources used by all connections.
- Provide APIs to allocate and free a task.
- Add task support during connection establishment, i.e. slowpath.
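
As an illustration of how the per-queue task pool is meant to be used, here
is a simplified sketch of claiming a free task. Structure and field names
come from this patch; the helper name and the omission of the cccid->itid
mapping update are illustrative simplifications (the complete version is
qedn_get_free_task_from_pool() in qedn_task.c below):

	static struct qedn_task_ctx *claim_task_sketch(struct qedn_conn_ctx *conn_ctx)
	{
		struct qedn_io_resources *io_resrc = &conn_ctx->fp_q->host_resrc;
		struct qedn_task_ctx *task;

		/* Pop a free task from the fp_q-wide pool */
		spin_lock(&io_resrc->resources_lock);
		task = list_first_entry_or_null(&io_resrc->task_free_list,
						struct qedn_task_ctx, entry);
		if (task) {
			list_del(&task->entry);
			io_resrc->num_free_tasks--;
		}
		spin_unlock(&io_resrc->resources_lock);

		if (!task)
			return NULL;

		/* Track the task on the connection's active list */
		spin_lock(&conn_ctx->task_list_lock);
		list_add_tail(&task->entry, &conn_ctx->active_task_list);
		spin_unlock(&conn_ctx->task_list_lock);
		atomic_inc(&conn_ctx->num_active_tasks);

		return task;
	}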

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
---
 drivers/nvme/hw/qedn/qedn.h      |  61 ++++++
 drivers/nvme/hw/qedn/qedn_conn.c |  46 ++++-
 drivers/nvme/hw/qedn/qedn_main.c |  34 +++-
 drivers/nvme/hw/qedn/qedn_task.c | 326 +++++++++++++++++++++++++++++++
 4 files changed, 463 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
index 13d63f420a23..78efa8c02810 100644
--- a/drivers/nvme/hw/qedn/qedn.h
+++ b/drivers/nvme/hw/qedn/qedn.h
@@ -40,6 +40,16 @@
 
 #define QEDN_FW_CQ_FP_WQ_WORKQUEUE "qedn_fw_cq_fp_wq"
 
+/* Protocol defines */
+#define QEDN_MAX_IO_SIZE QED_NVMETCP_MAX_IO_SIZE
+
+#define QEDN_FW_SLOW_IO_MIN_SGE_LIMIT (9700 / 6)
+
+#define QEDN_MAX_HW_SECTORS (QEDN_MAX_IO_SIZE / 512)
+#define QEDN_MAX_SEGMENTS 2048
+
+#define QEDN_INVALID_ITID 0xFFFF
+
 /*
  * TCP offload stack default configurations and defines.
  * Future enhancements will allow controlling the configurable
@@ -84,6 +94,15 @@ enum qedn_state {
 	QEDN_STATE_MODULE_REMOVE_ONGOING,
 };
 
+struct qedn_io_resources {
+	/* Lock for IO resources */
+	spinlock_t resources_lock;
+	struct list_head task_free_list;
+	u32 num_alloc_tasks;
+	u32 num_free_tasks;
+	u32 no_avail_resrc_cnt;
+};
+
 /* Per CPU core params */
 struct qedn_fp_queue {
 	struct qed_chain cq_chain;
@@ -93,6 +112,10 @@ struct qedn_fp_queue {
 	struct qed_sb_info *sb_info;
 	unsigned int cpu;
 	struct work_struct fw_cq_fp_wq_entry;
+
+	/* IO related resources for host */
+	struct qedn_io_resources host_resrc;
+
 	u16 sb_id;
 	char irqname[QEDN_IRQ_NAME_LEN];
 };
@@ -116,12 +139,35 @@ struct qedn_ctx {
 	/* Connections */
 	DECLARE_HASHTABLE(conn_ctx_hash, 16);
 
+	u32 num_tasks_per_pool;
+
 	/* Fast path queues */
 	u8 num_fw_cqs;
 	struct qedn_fp_queue *fp_q_arr;
 	struct nvmetcp_glbl_queue_entry *fw_cq_array_virt;
 	dma_addr_t fw_cq_array_phy; /* Physical address of fw_cq_array_virt */
 	struct workqueue_struct *fw_cq_fp_wq;
+
+	/* Fast Path Tasks */
+	struct qed_nvmetcp_tid	tasks;
+};
+
+struct qedn_task_ctx {
+	struct qedn_conn_ctx *qedn_conn;
+	struct qedn_ctx *qedn;
+	void *fw_task_ctx;
+	struct qedn_fp_queue *fp_q;
+	struct scatterlist *nvme_sg;
+	struct nvme_tcp_ofld_req *req; /* currently processed request */
+	struct list_head entry;
+	spinlock_t lock; /* To protect task resources */
+	bool valid;
+	unsigned long flags; /* Used by qedn_task_flags */
+	u32 task_size;
+	u16 itid;
+	u16 cccid;
+	int req_direction;
+	struct storage_sgl_task_params sgl_task_params;
 };
 
 struct qedn_endpoint {
@@ -219,6 +265,7 @@ struct qedn_conn_ctx {
 	struct nvme_tcp_ofld_ctrl *ctrl;
 	u32 conn_handle;
 	u32 fw_cid;
+	u8 default_cq;
 
 	atomic_t est_conn_indicator;
 	atomic_t destroy_conn_indicator;
@@ -236,6 +283,11 @@ struct qedn_conn_ctx {
 	dma_addr_t host_cccid_itid_phy_addr;
 	struct qedn_endpoint ep;
 	int abrt_flag;
+	/* Spinlock for accessing active_task_list */
+	spinlock_t task_list_lock;
+	struct list_head active_task_list;
+	atomic_t num_active_tasks;
+	atomic_t num_active_fw_tasks;
 
 	/* Connection resources - turned on to indicate what resource was
 	 * allocated, to that it can later be released.
@@ -255,6 +307,7 @@ struct qedn_conn_ctx {
 enum qedn_conn_resources_state {
 	QEDN_CONN_RESRC_FW_SQ,
 	QEDN_CONN_RESRC_ACQUIRE_CONN,
+	QEDN_CONN_RESRC_TASKS,
 	QEDN_CONN_RESRC_CCCID_ITID_MAP,
 	QEDN_CONN_RESRC_TCP_PORT,
 	QEDN_CONN_RESRC_DB_ADD,
@@ -278,5 +331,13 @@ int qedn_queue_request(struct qedn_conn_ctx *qedn_conn,
 		       struct nvme_tcp_ofld_req *req);
 void qedn_nvme_req_fp_wq_handler(struct work_struct *work);
 void qedn_io_work_cq(struct qedn_ctx *qedn, struct nvmetcp_fw_cqe *cqe);
+int qedn_alloc_tasks(struct qedn_conn_ctx *conn_ctx);
+inline int qedn_qid(struct nvme_tcp_ofld_queue *queue);
+void qedn_common_clear_fw_sgl(struct storage_sgl_task_params *sgl_task_params);
+void qedn_return_active_tasks(struct qedn_conn_ctx *conn_ctx);
+struct qedn_task_ctx *
+qedn_get_free_task_from_pool(struct qedn_conn_ctx *conn_ctx, u16 cccid);
+void qedn_destroy_free_tasks(struct qedn_fp_queue *fp_q,
+			     struct qedn_io_resources *io_resrc);
 
 #endif /* _QEDN_H_ */
diff --git a/drivers/nvme/hw/qedn/qedn_conn.c b/drivers/nvme/hw/qedn/qedn_conn.c
index 962c88a4f345..5c5b365df522 100644
--- a/drivers/nvme/hw/qedn/qedn_conn.c
+++ b/drivers/nvme/hw/qedn/qedn_conn.c
@@ -29,6 +29,11 @@ static const char * const qedn_conn_state_str[] = {
 	NULL
 };
 
+inline int qedn_qid(struct nvme_tcp_ofld_queue *queue)
+{
+	return queue - queue->ctrl->queues;
+}
+
 int qedn_set_con_state(struct qedn_conn_ctx *conn_ctx,
 		       enum qedn_conn_state new_state)
 {
@@ -170,6 +175,11 @@ static void qedn_release_conn_ctx(struct qedn_conn_ctx *conn_ctx)
 		clear_bit(QEDN_CONN_RESRC_ACQUIRE_CONN, &conn_ctx->resrc_state);
 	}
 
+	if (test_bit(QEDN_CONN_RESRC_TASKS, &conn_ctx->resrc_state)) {
+		clear_bit(QEDN_CONN_RESRC_TASKS, &conn_ctx->resrc_state);
+		qedn_return_active_tasks(conn_ctx);
+	}
+
 	if (test_bit(QEDN_CONN_RESRC_CCCID_ITID_MAP, &conn_ctx->resrc_state)) {
 		dma_free_coherent(&qedn->pdev->dev,
 				  conn_ctx->sq_depth *
@@ -272,6 +282,7 @@ static int qedn_nvmetcp_offload_conn(struct qedn_conn_ctx *conn_ctx)
 	offld_prms.max_rt_time = QEDN_TCP_MAX_RT_TIME;
 	offld_prms.sq_pbl_addr =
 		(u64)qed_chain_get_pbl_phys(&qedn_ep->fw_sq_chain);
+	offld_prms.default_cq = conn_ctx->default_cq;
 
 	rc = qed_ops->offload_conn(qedn->cdev,
 				   conn_ctx->conn_handle,
@@ -420,7 +431,10 @@ void qedn_prep_db_data(struct qedn_conn_ctx *conn_ctx)
 static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)
 {
 	struct qedn_ctx *qedn = conn_ctx->qedn;
+	struct qedn_io_resources *io_resrc;
+	struct qedn_fp_queue *fp_q;
 	size_t dma_size;
+	u8 qid;
 	int rc;
 
 	rc = qedn_alloc_fw_sq(qedn, &conn_ctx->ep);
@@ -431,6 +445,9 @@ static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)
 
 	set_bit(QEDN_CONN_RESRC_FW_SQ, &conn_ctx->resrc_state);
 
+	atomic_set(&conn_ctx->num_active_tasks, 0);
+	atomic_set(&conn_ctx->num_active_fw_tasks, 0);
+
 	rc = qed_ops->acquire_conn(qedn->cdev,
 				   &conn_ctx->conn_handle,
 				   &conn_ctx->fw_cid,
@@ -444,7 +461,34 @@ static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)
 		 conn_ctx->conn_handle);
 	set_bit(QEDN_CONN_RESRC_ACQUIRE_CONN, &conn_ctx->resrc_state);
 
-	/* Placeholder - Allocate task resources and initialize fields */
+	qid = qedn_qid(conn_ctx->queue);
+
+	/* default_cq is mapped 1:1 with the qid (and with a CPU core)
+	 * assigned to the driver
+	 */
+	conn_ctx->default_cq = qid ? qid - 1 : 0;
+	fp_q = &qedn->fp_q_arr[conn_ctx->default_cq];
+	conn_ctx->fp_q = fp_q;
+	io_resrc = &fp_q->host_resrc;
+
+	/* The first connection on each fp_q will fill task
+	 * resources
+	 */
+	spin_lock(&io_resrc->resources_lock);
+	if (io_resrc->num_alloc_tasks == 0) {
+		rc = qedn_alloc_tasks(conn_ctx);
+		if (rc) {
+			pr_err("Failed allocating tasks: CID=0x%x\n",
+			       conn_ctx->fw_cid);
+			spin_unlock(&io_resrc->resources_lock);
+			goto rel_conn;
+		}
+	}
+	spin_unlock(&io_resrc->resources_lock);
+
+	spin_lock_init(&conn_ctx->task_list_lock);
+	INIT_LIST_HEAD(&conn_ctx->active_task_list);
+	set_bit(QEDN_CONN_RESRC_TASKS, &conn_ctx->resrc_state);
 
 	rc = qedn_fetch_tcp_port(conn_ctx);
 	if (rc)
diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
index 975949ce6fb0..42c8ad6ac2d6 100644
--- a/drivers/nvme/hw/qedn/qedn_main.c
+++ b/drivers/nvme/hw/qedn/qedn_main.c
@@ -29,6 +29,12 @@ __be16 qedn_get_in_port(struct sockaddr_storage *sa)
 		: ((struct sockaddr_in6 *)sa)->sin6_port;
 }
 
+static void qedn_init_io_resc(struct qedn_io_resources *io_resrc)
+{
+	spin_lock_init(&io_resrc->resources_lock);
+	INIT_LIST_HEAD(&io_resrc->task_free_list);
+}
+
 struct qedn_llh_filter *qedn_add_llh_filter(struct qedn_ctx *qedn, u16 tcp_port)
 {
 	struct qedn_llh_filter *llh_filter = NULL;
@@ -436,6 +442,8 @@ static struct nvme_tcp_ofld_ops qedn_ofld_ops = {
 		 *	NVMF_OPT_HDR_DIGEST | NVMF_OPT_DATA_DIGEST |
 		 *	NVMF_OPT_NR_POLL_QUEUES | NVMF_OPT_TOS
 		 */
+	.max_hw_sectors = QEDN_MAX_HW_SECTORS,
+	.max_segments = QEDN_MAX_SEGMENTS,
 	.claim_dev = qedn_claim_dev,
 	.setup_ctrl = qedn_setup_ctrl,
 	.release_ctrl = qedn_release_ctrl,
@@ -640,8 +648,24 @@ static inline int qedn_core_probe(struct qedn_ctx *qedn)
 	return rc;
 }
 
+static void qedn_call_destroy_free_tasks(struct qedn_fp_queue *fp_q,
+					 struct qedn_io_resources *io_resrc)
+{
+	if (list_empty(&io_resrc->task_free_list))
+		return;
+
+	if (io_resrc->num_alloc_tasks != io_resrc->num_free_tasks)
+		pr_err("Task Pool:Not all returned allocated=0x%x, free=0x%x\n",
+		       io_resrc->num_alloc_tasks, io_resrc->num_free_tasks);
+
+	qedn_destroy_free_tasks(fp_q, io_resrc);
+	if (io_resrc->num_free_tasks)
+		pr_err("Expected num_free_tasks to be 0\n");
+}
+
 static void qedn_free_function_queues(struct qedn_ctx *qedn)
 {
+	struct qedn_io_resources *host_resrc;
 	struct qed_sb_info *sb_info = NULL;
 	struct qedn_fp_queue *fp_q;
 	int i;
@@ -653,6 +677,9 @@ static void qedn_free_function_queues(struct qedn_ctx *qedn)
 	/* Free the fast path queues*/
 	for (i = 0; i < qedn->num_fw_cqs; i++) {
 		fp_q = &qedn->fp_q_arr[i];
+		host_resrc = &fp_q->host_resrc;
+
+		qedn_call_destroy_free_tasks(fp_q, host_resrc);
 
 		/* Free SB */
 		sb_info = fp_q->sb_info;
@@ -740,7 +767,8 @@ static int qedn_alloc_function_queues(struct qedn_ctx *qedn)
 		goto mem_alloc_failure;
 	}
 
-	/* placeholder - create task pools */
+	qedn->num_tasks_per_pool =
+		qedn->pf_params.nvmetcp_pf_params.num_tasks / qedn->num_fw_cqs;
 
 	for (i = 0; i < qedn->num_fw_cqs; i++) {
 		fp_q = &qedn->fp_q_arr[i];
@@ -782,7 +810,7 @@ static int qedn_alloc_function_queues(struct qedn_ctx *qedn)
 		fp_q->qedn = qedn;
 		INIT_WORK(&fp_q->fw_cq_fp_wq_entry, qedn_fw_cq_fq_wq_handler);
 
-		/* Placeholder - Init IO-path resources */
+		qedn_init_io_resc(&fp_q->host_resrc);
 	}
 
 	return 0;
@@ -964,7 +992,7 @@ static int __qedn_probe(struct pci_dev *pdev)
 
 	/* NVMeTCP start HW PF */
 	rc = qed_ops->start(qedn->cdev,
-			    NULL /* Placeholder for FW IO-path resources */,
+			    &qedn->tasks,
 			    qedn,
 			    qedn_event_cb);
 	if (rc) {
diff --git a/drivers/nvme/hw/qedn/qedn_task.c b/drivers/nvme/hw/qedn/qedn_task.c
index a3af55fba95c..7b228b5c5169 100644
--- a/drivers/nvme/hw/qedn/qedn_task.c
+++ b/drivers/nvme/hw/qedn/qedn_task.c
@@ -11,6 +11,332 @@
 /* Driver includes */
 #include "qedn.h"
 
+static void qedn_free_nvme_sg(struct qedn_task_ctx *qedn_task)
+{
+	kfree(qedn_task->nvme_sg);
+	qedn_task->nvme_sg = NULL;
+}
+
+static void qedn_free_fw_sgl(struct qedn_task_ctx *qedn_task)
+{
+	struct qedn_ctx *qedn = qedn_task->qedn;
+	dma_addr_t sgl_pa;
+
+	sgl_pa = HILO_DMA_REGPAIR(qedn_task->sgl_task_params.sgl_phys_addr);
+	dma_free_coherent(&qedn->pdev->dev,
+			  QEDN_MAX_SEGMENTS * sizeof(struct nvmetcp_sge),
+			  qedn_task->sgl_task_params.sgl,
+			  sgl_pa);
+	qedn_task->sgl_task_params.sgl = NULL;
+}
+
+static void qedn_destroy_single_task(struct qedn_task_ctx *qedn_task)
+{
+	u16 itid;
+
+	itid = qedn_task->itid;
+	list_del(&qedn_task->entry);
+	qedn_free_nvme_sg(qedn_task);
+	qedn_free_fw_sgl(qedn_task);
+	kfree(qedn_task);
+	qedn_task = NULL;
+}
+
+void qedn_destroy_free_tasks(struct qedn_fp_queue *fp_q,
+			     struct qedn_io_resources *io_resrc)
+{
+	struct qedn_task_ctx *qedn_task, *task_tmp;
+
+	/* Destroy tasks from the free task list */
+	list_for_each_entry_safe(qedn_task, task_tmp,
+				 &io_resrc->task_free_list, entry) {
+		qedn_destroy_single_task(qedn_task);
+		io_resrc->num_free_tasks -= 1;
+	}
+}
+
+static int qedn_alloc_nvme_sg(struct qedn_task_ctx *qedn_task)
+{
+	int rc;
+
+	qedn_task->nvme_sg = kcalloc(QEDN_MAX_SEGMENTS,
+				     sizeof(*qedn_task->nvme_sg), GFP_KERNEL);
+	if (!qedn_task->nvme_sg) {
+		rc = -ENOMEM;
+
+		return rc;
+	}
+
+	return 0;
+}
+
+static int qedn_alloc_fw_sgl(struct qedn_task_ctx *qedn_task)
+{
+	struct nvmetcp_sge **sgl = &qedn_task->sgl_task_params.sgl;
+	struct qedn_ctx *qedn = qedn_task->qedn_conn->qedn;
+	dma_addr_t sgl_phys;
+	u32 sz;
+
+	sz = QEDN_MAX_SEGMENTS * sizeof(struct nvmetcp_sge);
+	*sgl = dma_alloc_coherent(&qedn->pdev->dev, sz, &sgl_phys, GFP_KERNEL);
+	if (!*sgl) {
+		pr_err("Couldn't allocate FW sgl\n");
+
+		return -ENOMEM;
+	}
+
+	DMA_REGPAIR_LE(qedn_task->sgl_task_params.sgl_phys_addr, sgl_phys);
+
+	return 0;
+}
+
+static inline void *qedn_get_fw_task(struct qed_nvmetcp_tid *info, u16 itid)
+{
+	return (void *)(info->blocks[itid / info->num_tids_per_block] +
+			(itid % info->num_tids_per_block) * info->size);
+}
+
+static struct qedn_task_ctx *qedn_alloc_task(struct qedn_conn_ctx *conn_ctx,
+					     u16 itid)
+{
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	struct qedn_task_ctx *qedn_task;
+	void *fw_task_ctx;
+	int rc = 0;
+
+	qedn_task = kzalloc(sizeof(*qedn_task), GFP_KERNEL);
+	if (!qedn_task)
+		return NULL;
+
+	spin_lock_init(&qedn_task->lock);
+	fw_task_ctx = qedn_get_fw_task(&qedn->tasks, itid);
+	if (!fw_task_ctx) {
+		pr_err("iTID: 0x%x; Failed getting fw_task_ctx memory\n", itid);
+		goto release_task;
+	}
+
+	/* No need to memset fw_task_ctx - its done in the HSI func */
+	qedn_task->qedn_conn = conn_ctx;
+	qedn_task->qedn = qedn;
+	qedn_task->fw_task_ctx = fw_task_ctx;
+	qedn_task->valid = 0;
+	qedn_task->flags = 0;
+	qedn_task->itid = itid;
+	rc = qedn_alloc_fw_sgl(qedn_task);
+	if (rc) {
+		pr_err("iTID: 0x%x; Failed allocating FW sgl\n", itid);
+		goto release_task;
+	}
+
+	rc = qedn_alloc_nvme_sg(qedn_task);
+	if (rc) {
+		pr_err("iTID: 0x%x; Failed allocating FW sgl\n", itid);
+		goto release_fw_sgl;
+	}
+
+	return qedn_task;
+
+release_fw_sgl:
+	qedn_free_fw_sgl(qedn_task);
+release_task:
+	kfree(qedn_task);
+
+	return NULL;
+}
+
+int qedn_alloc_tasks(struct qedn_conn_ctx *conn_ctx)
+{
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	struct qedn_task_ctx *qedn_task = NULL;
+	struct qedn_io_resources *io_resrc;
+	u16 itid, start_itid, offset;
+	struct qedn_fp_queue *fp_q;
+	int i, rc;
+
+	fp_q = conn_ctx->fp_q;
+
+	offset = fp_q->sb_id;
+	io_resrc = &fp_q->host_resrc;
+
+	start_itid = qedn->num_tasks_per_pool * offset;
+	for (i = 0; i < qedn->num_tasks_per_pool; ++i) {
+		itid = start_itid + i;
+		qedn_task = qedn_alloc_task(conn_ctx, itid);
+		if (!qedn_task) {
+			pr_err("Failed allocating task\n");
+			rc = -ENOMEM;
+			goto release_tasks;
+		}
+
+		qedn_task->fp_q = fp_q;
+		io_resrc->num_free_tasks += 1;
+		list_add_tail(&qedn_task->entry, &io_resrc->task_free_list);
+	}
+
+	io_resrc->num_alloc_tasks = io_resrc->num_free_tasks;
+
+	return 0;
+
+release_tasks:
+	qedn_destroy_free_tasks(fp_q, io_resrc);
+
+	return rc;
+}
+
+void qedn_common_clear_fw_sgl(struct storage_sgl_task_params *sgl_task_params)
+{
+	u16 sge_cnt = sgl_task_params->num_sges;
+
+	memset(&sgl_task_params->sgl[(sge_cnt - 1)], 0,
+	       sizeof(struct nvmetcp_sge));
+	sgl_task_params->total_buffer_size = 0;
+	sgl_task_params->small_mid_sge = false;
+	sgl_task_params->num_sges = 0;
+}
+
+inline void qedn_host_reset_cccid_itid_entry(struct qedn_conn_ctx *conn_ctx,
+					     u16 cccid)
+{
+	conn_ctx->host_cccid_itid[cccid].itid = cpu_to_le16(QEDN_INVALID_ITID);
+}
+
+inline void qedn_host_set_cccid_itid_entry(struct qedn_conn_ctx *conn_ctx,
+					   u16 cccid, u16 itid)
+{
+	conn_ctx->host_cccid_itid[cccid].itid = cpu_to_le16(itid);
+}
+
+static void qedn_clear_sgl(struct qedn_ctx *qedn,
+			   struct qedn_task_ctx *qedn_task)
+{
+	struct storage_sgl_task_params *sgl_task_params;
+	enum dma_data_direction dma_dir;
+	u32 sge_cnt;
+
+	sgl_task_params = &qedn_task->sgl_task_params;
+	sge_cnt = sgl_task_params->num_sges;
+
+	/* Nothing to do if no SGEs were used */
+	if (!qedn_task->task_size || !sge_cnt)
+		return;
+
+	dma_dir = (qedn_task->req_direction == WRITE ?
+			DMA_TO_DEVICE : DMA_FROM_DEVICE);
+	dma_unmap_sg(&qedn->pdev->dev, qedn_task->nvme_sg, sge_cnt, dma_dir);
+	memset(&qedn_task->nvme_sg[(sge_cnt - 1)], 0,
+	       sizeof(struct scatterlist));
+	qedn_common_clear_fw_sgl(sgl_task_params);
+	qedn_task->task_size = 0;
+}
+
+static void qedn_clear_task(struct qedn_conn_ctx *conn_ctx,
+			    struct qedn_task_ctx *qedn_task)
+{
+	/* Task lock isn't needed since it is no longer in use */
+	qedn_clear_sgl(conn_ctx->qedn, qedn_task);
+	qedn_task->valid = 0;
+	qedn_task->flags = 0;
+
+	atomic_dec(&conn_ctx->num_active_tasks);
+}
+
+void qedn_return_active_tasks(struct qedn_conn_ctx *conn_ctx)
+{
+	struct qedn_fp_queue *fp_q = conn_ctx->fp_q;
+	struct qedn_task_ctx *qedn_task, *task_tmp;
+	struct qedn_io_resources *io_resrc;
+	int num_returned_tasks = 0;
+	int num_active_tasks;
+
+	io_resrc = &fp_q->host_resrc;
+
+	/* Return tasks that aren't "Used by FW" to the pool */
+	list_for_each_entry_safe(qedn_task, task_tmp,
+				 &conn_ctx->active_task_list, entry) {
+		qedn_clear_task(conn_ctx, qedn_task);
+		num_returned_tasks++;
+	}
+
+	if (num_returned_tasks) {
+		spin_lock(&io_resrc->resources_lock);
+		/* Return tasks to FP_Q pool in one shot */
+
+		list_splice_tail_init(&conn_ctx->active_task_list,
+				      &io_resrc->task_free_list);
+		io_resrc->num_free_tasks += num_returned_tasks;
+		spin_unlock(&io_resrc->resources_lock);
+	}
+
+	num_active_tasks = atomic_read(&conn_ctx->num_active_tasks);
+	if (num_active_tasks)
+		pr_err("num_active_tasks is %u after cleanup.\n",
+		       num_active_tasks);
+}
+
+void qedn_return_task_to_pool(struct qedn_conn_ctx *conn_ctx,
+			      struct qedn_task_ctx *qedn_task)
+{
+	struct qedn_fp_queue *fp_q = conn_ctx->fp_q;
+	struct qedn_io_resources *io_resrc;
+	unsigned long lock_flags;
+
+	io_resrc = &fp_q->host_resrc;
+
+	spin_lock_irqsave(&qedn_task->lock, lock_flags);
+	qedn_task->valid = 0;
+	qedn_task->flags = 0;
+	qedn_clear_sgl(conn_ctx->qedn, qedn_task);
+	spin_unlock_irqrestore(&qedn_task->lock, lock_flags);
+
+	spin_lock(&conn_ctx->task_list_lock);
+	list_del(&qedn_task->entry);
+	qedn_host_reset_cccid_itid_entry(conn_ctx, qedn_task->cccid);
+	spin_unlock(&conn_ctx->task_list_lock);
+
+	atomic_dec(&conn_ctx->num_active_tasks);
+	atomic_dec(&conn_ctx->num_active_fw_tasks);
+
+	spin_lock(&io_resrc->resources_lock);
+	list_add_tail(&qedn_task->entry, &io_resrc->task_free_list);
+	io_resrc->num_free_tasks += 1;
+	spin_unlock(&io_resrc->resources_lock);
+}
+
+struct qedn_task_ctx *
+qedn_get_free_task_from_pool(struct qedn_conn_ctx *conn_ctx, u16 cccid)
+{
+	struct qedn_task_ctx *qedn_task = NULL;
+	struct qedn_io_resources *io_resrc;
+	struct qedn_fp_queue *fp_q;
+
+	fp_q = conn_ctx->fp_q;
+	io_resrc = &fp_q->host_resrc;
+
+	spin_lock(&io_resrc->resources_lock);
+	qedn_task = list_first_entry_or_null(&io_resrc->task_free_list,
+					     struct qedn_task_ctx, entry);
+	if (unlikely(!qedn_task)) {
+		spin_unlock(&io_resrc->resources_lock);
+
+		return NULL;
+	}
+	list_del(&qedn_task->entry);
+	io_resrc->num_free_tasks -= 1;
+	spin_unlock(&io_resrc->resources_lock);
+
+	spin_lock(&conn_ctx->task_list_lock);
+	list_add_tail(&qedn_task->entry, &conn_ctx->active_task_list);
+	qedn_host_set_cccid_itid_entry(conn_ctx, cccid, qedn_task->itid);
+	spin_unlock(&conn_ctx->task_list_lock);
+
+	atomic_inc(&conn_ctx->num_active_tasks);
+	qedn_task->cccid = cccid;
+	qedn_task->qedn_conn = conn_ctx;
+	qedn_task->valid = 1;
+
+	return qedn_task;
+}
+
 int qedn_queue_request(struct qedn_conn_ctx *qedn_conn,
 		       struct nvme_tcp_ofld_req *req)
 {
-- 
2.24.1



* [PATCH v4 17/20] qedn: Add support of NVME ICReq & ICResp
  2021-06-29 12:47 [PATCH v4 00/20] NVMeTCP Offload ULP Prabhakar Kushwaha
                   ` (15 preceding siblings ...)
  2021-06-29 12:47 ` [PATCH v4 16/20] qedn: Add support of Task and SGL Prabhakar Kushwaha
@ 2021-06-29 12:47 ` Prabhakar Kushwaha
  2021-06-29 12:47 ` [PATCH v4 18/20] qedn: Add IO level fastpath functionality Prabhakar Kushwaha
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Prabhakar Kushwaha @ 2021-06-29 12:47 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, axboe, kbusch
  Cc: davem, kuba, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	prabhakar.pkin, malin1024

Once a TCP connection is established, the host sends an Initialize
Connection Request (ICReq) PDU to the controller.
The Initialize Connection Response (ICResp) PDU subsequently received
from the controller is processed by the host to establish the connection
and exchange connection configuration parameters.

This patch presents support for generating the ICReq and processing the
ICResp. It also updates the host configuration based on the exchanged
parameters.
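
For reference, a minimal sketch of how the ICReq PDU is filled in before it
is handed to the FW (this mirrors qedn_send_icreq() added below; struct
nvme_tcp_icreq_pdu and the NVME_TCP_* constants come from
<linux/nvme-tcp.h>):

	struct nvme_tcp_icreq_pdu icreq;

	memset(&icreq, 0, sizeof(icreq));
	icreq.hdr.type = nvme_tcp_icreq;
	icreq.hdr.hlen = sizeof(icreq);
	icreq.hdr.pdo = 0;
	icreq.hdr.plen = cpu_to_le32(icreq.hdr.hlen);
	icreq.pfv = cpu_to_le16(conn_ctx->required_params.pfv);	/* NVME_TCP_PFV_1_0 */
	icreq.maxr2t = cpu_to_le32(conn_ctx->required_params.maxr2t);
	icreq.hpda = conn_ctx->required_params.hpda;
	if (conn_ctx->required_params.hdr_digest)
		icreq.digest |= NVME_TCP_HDR_DIGEST_ENABLE;
	if (conn_ctx->required_params.data_digest)
		icreq.digest |= NVME_TCP_DATA_DIGEST_ENABLE;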

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/hw/qedn/qedn.h      |  38 ++++
 drivers/nvme/hw/qedn/qedn_conn.c | 334 ++++++++++++++++++++++++++++++-
 drivers/nvme/hw/qedn/qedn_main.c |  14 ++
 drivers/nvme/hw/qedn/qedn_task.c |   8 +-
 4 files changed, 390 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
index 78efa8c02810..829d474b3ab1 100644
--- a/drivers/nvme/hw/qedn/qedn.h
+++ b/drivers/nvme/hw/qedn/qedn.h
@@ -16,6 +16,7 @@
 
 /* Driver includes */
 #include "../../host/tcp-offload.h"
+#include <linux/nvme-tcp.h>
 
 #define QEDN_MODULE_NAME "qedn"
 
@@ -42,6 +43,8 @@
 
 /* Protocol defines */
 #define QEDN_MAX_IO_SIZE QED_NVMETCP_MAX_IO_SIZE
+#define QEDN_MAX_PDU_SIZE 0x80000 /* 512KB */
+#define QEDN_MAX_OUTSTANDING_R2T_PDUS 0 /* 0 Based == 1 max R2T */
 
 #define QEDN_FW_SLOW_IO_MIN_SGE_LIMIT (9700 / 6)
 
@@ -50,6 +53,13 @@
 
 #define QEDN_INVALID_ITID 0xFFFF
 
+#define QEDN_ICREQ_FW_PAYLOAD (sizeof(struct nvme_tcp_icreq_pdu) -\
+			       QED_NVMETCP_NON_IO_HDR_SIZE)
+#define QEDN_ICREQ_FW_PAYLOAD_START 8
+
+/* The FW will handle the ICReq as CCCID 0 (FW internal design) */
+#define QEDN_ICREQ_CCCID 0
+
 /*
  * TCP offload stack default configurations and defines.
  * Future enhancements will allow controlling the configurable
@@ -120,6 +130,16 @@ struct qedn_fp_queue {
 	char irqname[QEDN_IRQ_NAME_LEN];
 };
 
+struct qedn_negotiation_params {
+	u32 maxh2cdata; /* Negotiation */
+	u32 maxr2t; /* Validation */
+	u16 pfv; /* Validation */
+	bool hdr_digest; /* Negotiation */
+	bool data_digest; /* Negotiation */
+	u8 cpda; /* Negotiation */
+	u8 hpda; /* Validation */
+};
+
 struct qedn_ctx {
 	struct pci_dev *pdev;
 	struct qed_dev *cdev;
@@ -176,6 +196,9 @@ struct qedn_endpoint {
 	struct nvmetcp_db_data db_data;
 	void __iomem *p_doorbell;
 
+	/* Spinlock for accessing FW queue */
+	spinlock_t doorbell_lock;
+
 	/* TCP Params */
 	__be32 dst_addr[4]; /* In network order */
 	__be32 src_addr[4]; /* In network order */
@@ -252,6 +275,12 @@ struct qedn_ctrl {
 	atomic_t host_num_active_conns;
 };
 
+struct qedn_icreq_padding {
+	u32 *buffer;
+	dma_addr_t pa;
+	struct nvmetcp_sge sge;
+};
+
 /* Connection level struct */
 struct qedn_conn_ctx {
 	/* IO path */
@@ -300,6 +329,11 @@ struct qedn_conn_ctx {
 
 	size_t sq_depth;
 
+	struct qedn_negotiation_params required_params;
+	struct qedn_negotiation_params pdu_params;
+	struct nvme_tcp_icresp_pdu icresp;
+	struct qedn_icreq_padding *icreq_pad;
+
 	/* "dummy" socket */
 	struct socket *sock;
 };
@@ -308,6 +342,7 @@ enum qedn_conn_resources_state {
 	QEDN_CONN_RESRC_FW_SQ,
 	QEDN_CONN_RESRC_ACQUIRE_CONN,
 	QEDN_CONN_RESRC_TASKS,
+	QEDN_CONN_RESRC_ICREQ_PAD,
 	QEDN_CONN_RESRC_CCCID_ITID_MAP,
 	QEDN_CONN_RESRC_TCP_PORT,
 	QEDN_CONN_RESRC_DB_ADD,
@@ -339,5 +374,8 @@ struct qedn_task_ctx *
 qedn_get_free_task_from_pool(struct qedn_conn_ctx *conn_ctx, u16 cccid);
 void qedn_destroy_free_tasks(struct qedn_fp_queue *fp_q,
 			     struct qedn_io_resources *io_resrc);
+void qedn_prep_icresp(struct qedn_conn_ctx *conn_ctx,
+		      struct nvmetcp_fw_cqe *cqe);
+void qedn_ring_doorbell(struct qedn_conn_ctx *conn_ctx);
 
 #endif /* _QEDN_H_ */
diff --git a/drivers/nvme/hw/qedn/qedn_conn.c b/drivers/nvme/hw/qedn/qedn_conn.c
index 5c5b365df522..b4c0a1a3e890 100644
--- a/drivers/nvme/hw/qedn/qedn_conn.c
+++ b/drivers/nvme/hw/qedn/qedn_conn.c
@@ -34,6 +34,18 @@ inline int qedn_qid(struct nvme_tcp_ofld_queue *queue)
 	return queue - queue->ctrl->queues;
 }
 
+void qedn_ring_doorbell(struct qedn_conn_ctx *conn_ctx)
+{
+	u16 prod_idx;
+
+	prod_idx = qed_chain_get_prod_idx(&conn_ctx->ep.fw_sq_chain);
+	conn_ctx->ep.db_data.sq_prod = cpu_to_le16(prod_idx);
+
+	/* wmb - Make sure fw idx is coherent */
+	wmb();
+	writel(*(u32 *)&conn_ctx->ep.db_data, conn_ctx->ep.p_doorbell);
+}
+
 int qedn_set_con_state(struct qedn_conn_ctx *conn_ctx,
 		       enum qedn_conn_state new_state)
 {
@@ -143,6 +155,71 @@ int qedn_initialize_endpoint(struct qedn_endpoint *ep, u8 *local_mac_addr,
 	return -1;
 }
 
+static int qedn_alloc_icreq_pad(struct qedn_conn_ctx *conn_ctx)
+{
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	struct qedn_icreq_padding *icreq_pad;
+	u32 *buffer;
+	int rc = 0;
+
+	icreq_pad = kzalloc(sizeof(*icreq_pad), GFP_KERNEL);
+	if (!icreq_pad)
+		return -ENOMEM;
+
+	conn_ctx->icreq_pad = icreq_pad;
+	memset(&icreq_pad->sge, 0, sizeof(icreq_pad->sge));
+	buffer = dma_alloc_coherent(&qedn->pdev->dev,
+				    QEDN_ICREQ_FW_PAYLOAD,
+				    &icreq_pad->pa,
+				    GFP_KERNEL);
+	if (!buffer) {
+		pr_err("Could not allocate icreq_padding SGE buffer.\n");
+		rc = -ENOMEM;
+		goto release_icreq_pad;
+	}
+
+	DMA_REGPAIR_LE(icreq_pad->sge.sge_addr, icreq_pad->pa);
+	icreq_pad->sge.sge_len = cpu_to_le32(QEDN_ICREQ_FW_PAYLOAD);
+	icreq_pad->buffer = buffer;
+	set_bit(QEDN_CONN_RESRC_ICREQ_PAD, &conn_ctx->resrc_state);
+
+	return 0;
+
+release_icreq_pad:
+	kfree(icreq_pad);
+	conn_ctx->icreq_pad = NULL;
+
+	return rc;
+}
+
+static void qedn_free_icreq_pad(struct qedn_conn_ctx *conn_ctx)
+{
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	struct qedn_icreq_padding *icreq_pad;
+	u32 *buffer;
+
+	icreq_pad = conn_ctx->icreq_pad;
+	if (unlikely(!icreq_pad)) {
+		pr_err("null ptr in icreq_pad in conn_ctx\n");
+		goto finally;
+	}
+
+	buffer = icreq_pad->buffer;
+	if (buffer) {
+		dma_free_coherent(&qedn->pdev->dev,
+				  QEDN_ICREQ_FW_PAYLOAD,
+				  (void *)buffer,
+				  icreq_pad->pa);
+		icreq_pad->buffer = NULL;
+	}
+
+	kfree(icreq_pad);
+	conn_ctx->icreq_pad = NULL;
+
+finally:
+	clear_bit(QEDN_CONN_RESRC_ICREQ_PAD, &conn_ctx->resrc_state);
+}
+
 static void qedn_release_conn_ctx(struct qedn_conn_ctx *conn_ctx)
 {
 	struct qedn_ctx *qedn = conn_ctx->qedn;
@@ -175,6 +252,9 @@ static void qedn_release_conn_ctx(struct qedn_conn_ctx *conn_ctx)
 		clear_bit(QEDN_CONN_RESRC_ACQUIRE_CONN, &conn_ctx->resrc_state);
 	}
 
+	if (test_bit(QEDN_CONN_RESRC_ICREQ_PAD, &conn_ctx->resrc_state))
+		qedn_free_icreq_pad(conn_ctx);
+
 	if (test_bit(QEDN_CONN_RESRC_TASKS, &conn_ctx->resrc_state)) {
 		clear_bit(QEDN_CONN_RESRC_TASKS, &conn_ctx->resrc_state);
 		qedn_return_active_tasks(conn_ctx);
@@ -336,6 +416,215 @@ void qedn_terminate_connection(struct qedn_conn_ctx *conn_ctx)
 	queue_work(qctrl->sp_wq, &conn_ctx->sp_wq_entry);
 }
 
+static int qedn_nvmetcp_update_conn(struct qedn_ctx *qedn,
+				    struct qedn_conn_ctx *conn_ctx)
+{
+	struct qedn_negotiation_params *pdu_params = &conn_ctx->pdu_params;
+	struct qed_nvmetcp_params_update *conn_info;
+	int rc;
+
+	conn_info = kzalloc(sizeof(*conn_info), GFP_KERNEL);
+	if (!conn_info)
+		return -ENOMEM;
+
+	conn_info->hdr_digest_en = pdu_params->hdr_digest;
+	conn_info->data_digest_en = pdu_params->data_digest;
+	conn_info->max_recv_pdu_length = QEDN_MAX_PDU_SIZE;
+	conn_info->max_io_size = QEDN_MAX_IO_SIZE;
+	conn_info->max_send_pdu_length = pdu_params->maxh2cdata;
+
+	rc = qed_ops->update_conn(qedn->cdev, conn_ctx->conn_handle, conn_info);
+	if (rc) {
+		pr_err("Could not update connection\n");
+		rc = -ENXIO;
+	}
+
+	kfree(conn_info);
+
+	return rc;
+}
+
+static int qedn_update_ramrod(struct qedn_conn_ctx *conn_ctx)
+{
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	int rc = 0;
+
+	rc = qedn_set_con_state(conn_ctx, CONN_STATE_WAIT_FOR_UPDATE_EQE);
+	if (rc)
+		return rc;
+
+	rc = qedn_nvmetcp_update_conn(qedn, conn_ctx);
+	if (rc)
+		return rc;
+
+	if (conn_ctx->state != CONN_STATE_WAIT_FOR_UPDATE_EQE) {
+		pr_err("cid 0x%x: Unexpected state 0x%x after update ramrod\n",
+		       conn_ctx->fw_cid, conn_ctx->state);
+
+		return -EINVAL;
+	}
+
+	return rc;
+}
+
+static int qedn_send_icreq(struct qedn_conn_ctx *conn_ctx)
+{
+	struct storage_sgl_task_params *sgl_task_params;
+	struct nvmetcp_task_params task_params;
+	struct qedn_task_ctx *qedn_task = NULL;
+	struct nvme_tcp_icreq_pdu icreq;
+	struct nvmetcp_wqe *chain_sqe;
+	struct nvmetcp_wqe local_sqe;
+
+	qedn_task = qedn_get_free_task_from_pool(conn_ctx, QEDN_ICREQ_CCCID);
+	if (!qedn_task)
+		return -EINVAL;
+
+	memset(&icreq, 0, sizeof(icreq));
+	memset(&local_sqe, 0, sizeof(local_sqe));
+
+	/* Initialize ICReq */
+	icreq.hdr.type = nvme_tcp_icreq;
+	icreq.hdr.hlen = sizeof(icreq);
+	icreq.hdr.pdo = 0;
+	icreq.hdr.plen = cpu_to_le32(icreq.hdr.hlen);
+	icreq.pfv = cpu_to_le16(conn_ctx->required_params.pfv);
+	icreq.maxr2t = cpu_to_le32(conn_ctx->required_params.maxr2t);
+	icreq.hpda = conn_ctx->required_params.hpda;
+	if (conn_ctx->required_params.hdr_digest)
+		icreq.digest |= NVME_TCP_HDR_DIGEST_ENABLE;
+	if (conn_ctx->required_params.data_digest)
+		icreq.digest |= NVME_TCP_DATA_DIGEST_ENABLE;
+
+	/* Initialize task params */
+	task_params.opq.lo = cpu_to_le32(((u64)(qedn_task)) & 0xffffffff);
+	task_params.opq.hi = cpu_to_le32(((u64)(qedn_task)) >> 32);
+	task_params.context = qedn_task->fw_task_ctx;
+	task_params.sqe = &local_sqe;
+	task_params.conn_icid = (u16)conn_ctx->conn_handle;
+	task_params.itid = qedn_task->itid;
+	task_params.cq_rss_number = conn_ctx->default_cq;
+	task_params.tx_io_size = QEDN_ICREQ_FW_PAYLOAD;
+	task_params.rx_io_size = 0; /* Rx doesn't use SGL for icresp */
+
+	/* Init SGE for ICReq padding */
+	sgl_task_params = &qedn_task->sgl_task_params;
+	sgl_task_params->total_buffer_size = task_params.tx_io_size;
+	sgl_task_params->small_mid_sge = false;
+	sgl_task_params->num_sges = 1;
+	memcpy(sgl_task_params->sgl, &conn_ctx->icreq_pad->sge,
+	       sizeof(conn_ctx->icreq_pad->sge));
+
+	/* The ICReq is sent in two parts.
+	 * First part: 16 bytes plus the first 8 bytes of icreq.rsvd2[] are
+	 *             sent via the task context initialized above from icreq.
+	 * Second part: the remaining bytes are sent via the SGE, here.
+	 */
+	memcpy(conn_ctx->icreq_pad->buffer,
+	       &icreq.rsvd2[QEDN_ICREQ_FW_PAYLOAD_START],
+	       QEDN_ICREQ_FW_PAYLOAD);
+
+	qed_ops->init_icreq_exchange(&task_params, &icreq,
+				     sgl_task_params,  NULL);
+
+	qedn_set_con_state(conn_ctx, CONN_STATE_WAIT_FOR_IC_COMP);
+	atomic_inc(&conn_ctx->num_active_fw_tasks);
+
+	/* spin_lock - doorbell is accessed by both Rx and response flows */
+	spin_lock(&conn_ctx->ep.doorbell_lock);
+	chain_sqe = qed_chain_produce(&conn_ctx->ep.fw_sq_chain);
+	memcpy(chain_sqe, &local_sqe, sizeof(local_sqe));
+	qedn_ring_doorbell(conn_ctx);
+	spin_unlock(&conn_ctx->ep.doorbell_lock);
+
+	return 0;
+}
+
+void qedn_prep_icresp(struct qedn_conn_ctx *conn_ctx,
+		      struct nvmetcp_fw_cqe *cqe)
+{
+	struct nvmetcp_icresp_mdata *icresp_from_cqe =
+		(struct nvmetcp_icresp_mdata *)&cqe->cqe_data.icresp_mdata;
+	struct nvme_tcp_icresp_pdu *icresp = &conn_ctx->icresp;
+	struct nvme_tcp_ofld_ctrl *ctrl = conn_ctx->ctrl;
+	struct qedn_ctrl *qctrl = NULL;
+
+	qctrl = (struct qedn_ctrl *)ctrl->private_data;
+	if (!qctrl)
+		return;
+
+	icresp->pfv = cpu_to_le16(icresp_from_cqe->pfv);
+	icresp->cpda = icresp_from_cqe->cpda;
+	icresp->digest = icresp_from_cqe->digest;
+	icresp->maxdata = cpu_to_le32(icresp_from_cqe->maxdata);
+
+	qedn_set_sp_wa(conn_ctx, HANDLE_ICRESP);
+	queue_work(qctrl->sp_wq, &conn_ctx->sp_wq_entry);
+}
+
+static int qedn_handle_icresp(struct qedn_conn_ctx *conn_ctx)
+{
+	struct nvme_tcp_icresp_pdu *icresp = &conn_ctx->icresp;
+	struct qedn_negotiation_params *pdu_params;
+	int rc = 0;
+	u16 pfv;
+
+	/* Swapping requirement will be removed in future FW versions */
+	pfv = __swab16(le16_to_cpu(icresp->pfv));
+
+	qedn_free_icreq_pad(conn_ctx);
+
+	/* Validate ICResp */
+	if (pfv != conn_ctx->required_params.pfv) {
+		pr_err("cid %u: unsupported pfv %u\n", conn_ctx->fw_cid, pfv);
+
+		return -EINVAL;
+	}
+
+	if (icresp->cpda > conn_ctx->required_params.cpda) {
+		pr_err("cid %u: unsupported cpda %u\n",
+		       conn_ctx->fw_cid, icresp->cpda);
+
+		return -EINVAL;
+	}
+
+	if ((NVME_TCP_HDR_DIGEST_ENABLE & icresp->digest) !=
+	    conn_ctx->required_params.hdr_digest) {
+		if ((NVME_TCP_HDR_DIGEST_ENABLE & icresp->digest) >
+		    conn_ctx->required_params.hdr_digest) {
+			pr_err("cid 0x%x: invalid header digest bit\n",
+			       conn_ctx->fw_cid);
+		}
+	}
+
+	if ((NVME_TCP_DATA_DIGEST_ENABLE & icresp->digest) !=
+	    conn_ctx->required_params.data_digest) {
+		if ((NVME_TCP_DATA_DIGEST_ENABLE & icresp->digest) >
+		    conn_ctx->required_params.data_digest) {
+			pr_err("cid 0x%x: invalid data digest bit\n",
+			       conn_ctx->fw_cid);
+		}
+	}
+
+	pdu_params = &conn_ctx->pdu_params;
+	memset(pdu_params, 0, sizeof(conn_ctx->pdu_params));
+	/* Swapping requirement will be removed in future FW versions */
+	pdu_params->maxh2cdata = __swab32(le32_to_cpu(icresp->maxdata));
+	/* Clamp the controller's MAXH2CDATA to the driver limit */
+	if (pdu_params->maxh2cdata > QEDN_MAX_PDU_SIZE)
+		pdu_params->maxh2cdata = QEDN_MAX_PDU_SIZE;
+
+	pdu_params->pfv = pfv;
+	pdu_params->cpda = icresp->cpda;
+	pdu_params->hpda = conn_ctx->required_params.hpda;
+	pdu_params->hdr_digest = NVME_TCP_HDR_DIGEST_ENABLE & icresp->digest;
+	pdu_params->data_digest = NVME_TCP_DATA_DIGEST_ENABLE & icresp->digest;
+	pdu_params->maxr2t = conn_ctx->required_params.maxr2t;
+	rc = qedn_update_ramrod(conn_ctx);
+
+	return rc;
+}
+
 /* Slowpath EQ Callback */
 int qedn_event_cb(void *context, u8 fw_event_code, void *event_ring_data)
 {
@@ -393,7 +682,8 @@ int qedn_event_cb(void *context, u8 fw_event_code, void *event_ring_data)
 			if (rc)
 				return rc;
 
-			/* Placeholder - for ICReq flow */
+			qedn_set_sp_wa(conn_ctx, SEND_ICREQ);
+			queue_work(qctrl->sp_wq, &conn_ctx->sp_wq_entry);
 		}
 
 		break;
@@ -445,6 +735,8 @@ static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)
 
 	set_bit(QEDN_CONN_RESRC_FW_SQ, &conn_ctx->resrc_state);
 
+	spin_lock_init(&conn_ctx->ep.doorbell_lock);
+
 	atomic_set(&conn_ctx->num_active_tasks, 0);
 	atomic_set(&conn_ctx->num_active_fw_tasks, 0);
 
@@ -509,6 +801,11 @@ static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)
 
 	memset(conn_ctx->host_cccid_itid, 0xFF, dma_size);
 	set_bit(QEDN_CONN_RESRC_CCCID_ITID_MAP, &conn_ctx->resrc_state);
+
+	rc = qedn_alloc_icreq_pad(conn_ctx);
+	if (rc)
+		goto rel_conn;
+
 	rc = qedn_set_con_state(conn_ctx, CONN_STATE_WAIT_FOR_CONNECT_DONE);
 	if (rc)
 		goto rel_conn;
@@ -581,6 +878,9 @@ void qedn_sp_wq_handler(struct work_struct *work)
 
 	qedn = conn_ctx->qedn;
 	if (test_bit(DESTROY_CONNECTION, &conn_ctx->agg_work_action)) {
+		if (test_bit(HANDLE_ICRESP, &conn_ctx->agg_work_action))
+			qedn_clr_sp_wa(conn_ctx, HANDLE_ICRESP);
+
 		qedn_destroy_connection(conn_ctx);
 
 		return;
@@ -595,6 +895,38 @@ void qedn_sp_wq_handler(struct work_struct *work)
 			return;
 		}
 	}
+
+	if (test_bit(SEND_ICREQ, &conn_ctx->agg_work_action)) {
+		qedn_clr_sp_wa(conn_ctx, SEND_ICREQ);
+		rc = qedn_send_icreq(conn_ctx);
+		if (rc)
+			return;
+
+		return;
+	}
+
+	if (test_bit(HANDLE_ICRESP, &conn_ctx->agg_work_action)) {
+		rc = qedn_handle_icresp(conn_ctx);
+
+		qedn_clr_sp_wa(conn_ctx, HANDLE_ICRESP);
+		if (rc) {
+			pr_err("IC handling returned with 0x%x\n", rc);
+			if (test_and_set_bit(DESTROY_CONNECTION,
+					     &conn_ctx->agg_work_action))
+				return;
+
+			qedn_destroy_connection(conn_ctx);
+
+			return;
+		}
+
+		atomic_inc(&conn_ctx->est_conn_indicator);
+		qedn_set_con_state(conn_ctx,
+				   CONN_STATE_NVMETCP_CONN_ESTABLISHED);
+		wake_up_interruptible(&conn_ctx->conn_waitq);
+
+		return;
+	}
 }
 
 /* Clear connection aggregative slowpath work action */
diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
index 42c8ad6ac2d6..3cf913d527c0 100644
--- a/drivers/nvme/hw/qedn/qedn_main.c
+++ b/drivers/nvme/hw/qedn/qedn_main.c
@@ -286,6 +286,19 @@ static int qedn_release_ctrl(struct nvme_tcp_ofld_ctrl *ctrl)
 	return 0;
 }
 
+static void qedn_set_pdu_params(struct qedn_conn_ctx *conn_ctx)
+{
+	/* Enable digest once supported */
+	conn_ctx->required_params.hdr_digest = 0;
+	conn_ctx->required_params.data_digest = 0;
+
+	conn_ctx->required_params.maxr2t = QEDN_MAX_OUTSTANDING_R2T_PDUS;
+	conn_ctx->required_params.pfv = NVME_TCP_PFV_1_0;
+	conn_ctx->required_params.cpda = 0;
+	conn_ctx->required_params.hpda = 0;
+	conn_ctx->required_params.maxh2cdata = QEDN_MAX_PDU_SIZE;
+}
+
 static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid,
 			     size_t queue_size)
 {
@@ -317,6 +330,7 @@ static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid,
 	conn_ctx->ctrl = ctrl;
 	conn_ctx->sq_depth = queue_size;
 	mutex_init(&conn_ctx->send_mutex);
+	qedn_set_pdu_params(conn_ctx);
 
 	init_waitqueue_head(&conn_ctx->conn_waitq);
 	atomic_set(&conn_ctx->est_conn_indicator, 0);
diff --git a/drivers/nvme/hw/qedn/qedn_task.c b/drivers/nvme/hw/qedn/qedn_task.c
index 7b228b5c5169..e52460f9650e 100644
--- a/drivers/nvme/hw/qedn/qedn_task.c
+++ b/drivers/nvme/hw/qedn/qedn_task.c
@@ -392,9 +392,11 @@ void qedn_io_work_cq(struct qedn_ctx *qedn, struct nvmetcp_fw_cqe *cqe)
 			break;
 
 		case NVMETCP_TASK_TYPE_INIT_CONN_REQUEST:
-
-			/* Placeholder - ICReq flow */
-
+			/* Clear ICReq-padding SGE from SGL */
+			qedn_common_clear_fw_sgl(&qedn_task->sgl_task_params);
+			/* Task is not required for icresp processing */
+			qedn_return_task_to_pool(conn_ctx, qedn_task);
+			qedn_prep_icresp(conn_ctx, cqe);
 			break;
 		default:
 			pr_info("Could not identify task type\n");
-- 
2.24.1



* [PATCH v4 18/20] qedn: Add IO level fastpath functionality
  2021-06-29 12:47 [PATCH v4 00/20] NVMeTCP Offload ULP Prabhakar Kushwaha
                   ` (16 preceding siblings ...)
  2021-06-29 12:47 ` [PATCH v4 17/20] qedn: Add support of NVME ICReq & ICResp Prabhakar Kushwaha
@ 2021-06-29 12:47 ` Prabhakar Kushwaha
  2021-06-29 12:47 ` [PATCH v4 19/20] qedn: Add Connection and IO level recovery flows Prabhakar Kushwaha
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Prabhakar Kushwaha @ 2021-06-29 12:47 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, axboe, kbusch
  Cc: davem, kuba, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	prabhakar.pkin, malin1024

From: Shai Malin <smalin@marvell.com>

This patch presents the IO level functionality of the qedn
nvme-tcp-offload host mode. The qedn_task_ctx structure contains the
various params and state of the current IO, and is mapped 1:1 to the
fw_task_ctx, which is a HW and FW IO context.
A qedn_task is mapped directly to its parent connection.
For every new IO a qedn_task structure is assigned, and the two remain
linked for the entire IO's life span.

The patch includes 2 flows:
  1. Send a new command to the FW:
     The flow is: nvme_tcp_ofld_queue_rq() invokes qedn_send_req(),
     which invokes qedn_queue_request(), which will:
     - Assign the fw_task_ctx.
     - Prepare the Read/Write SG buffer.
     - Initialize the HW and FW context.
     - Pass the IO to the FW.

  2. Process the IO completion:
     The flow is: qedn_irq_handler() invokes qedn_fw_cq_fp_handler(),
     which invokes qedn_io_work_cq(), which will:
     - Process the FW completion.
     - Return the fw_task_ctx to the task pool.
     - Complete the nvme req.
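
The glue between the two flows is the 64-bit opaque value: the driver stores
the qedn_task pointer in the task params at submission time and recovers it
from the FW CQE at completion time. Roughly (taken from
qedn_send_read_cmd()/qedn_send_write_cmd() below and
qedn_cqe_get_active_task() from the earlier IO-skeleton patch):

	/* Submission: stash the driver task pointer in the opaque field */
	task_params.opq.lo = cpu_to_le32(((u64)(qedn_task)) & 0xffffffff);
	task_params.opq.hi = cpu_to_le32(((u64)(qedn_task)) >> 32);

	/* Completion: recover the task pointer from the CQE */
	struct regpair *p = &cqe->task_opaque;
	struct qedn_task_ctx *qedn_task =
		(struct qedn_task_ctx *)(((u64)le32_to_cpu(p->hi) << 32) +
					 le32_to_cpu(p->lo));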

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/hw/qedn/qedn.h      |   5 +
 drivers/nvme/hw/qedn/qedn_conn.c |   1 +
 drivers/nvme/hw/qedn/qedn_main.c |   8 +
 drivers/nvme/hw/qedn/qedn_task.c | 317 ++++++++++++++++++++++++++++++-
 4 files changed, 327 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
index 829d474b3ab1..b36994be65cb 100644
--- a/drivers/nvme/hw/qedn/qedn.h
+++ b/drivers/nvme/hw/qedn/qedn.h
@@ -172,6 +172,10 @@ struct qedn_ctx {
 	struct qed_nvmetcp_tid	tasks;
 };
 
+enum qedn_task_flags {
+	QEDN_TASK_USED_BY_FW,
+};
+
 struct qedn_task_ctx {
 	struct qedn_conn_ctx *qedn_conn;
 	struct qedn_ctx *qedn;
@@ -376,6 +380,7 @@ void qedn_destroy_free_tasks(struct qedn_fp_queue *fp_q,
 			     struct qedn_io_resources *io_resrc);
 void qedn_prep_icresp(struct qedn_conn_ctx *conn_ctx,
 		      struct nvmetcp_fw_cqe *cqe);
+void qedn_swap_bytes(u32 *p, int size);
 void qedn_ring_doorbell(struct qedn_conn_ctx *conn_ctx);
 
 #endif /* _QEDN_H_ */
diff --git a/drivers/nvme/hw/qedn/qedn_conn.c b/drivers/nvme/hw/qedn/qedn_conn.c
index b4c0a1a3e890..ea072eff34a6 100644
--- a/drivers/nvme/hw/qedn/qedn_conn.c
+++ b/drivers/nvme/hw/qedn/qedn_conn.c
@@ -528,6 +528,7 @@ static int qedn_send_icreq(struct qedn_conn_ctx *conn_ctx)
 				     sgl_task_params,  NULL);
 
 	qedn_set_con_state(conn_ctx, CONN_STATE_WAIT_FOR_IC_COMP);
+	set_bit(QEDN_TASK_USED_BY_FW, &qedn_task->flags);
 	atomic_inc(&conn_ctx->num_active_fw_tasks);
 
 	/* spin_lock - doorbell is accessed  both Rx flow and response flow */
diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
index 3cf913d527c0..fb47e315ab03 100644
--- a/drivers/nvme/hw/qedn/qedn_main.c
+++ b/drivers/nvme/hw/qedn/qedn_main.c
@@ -1047,6 +1047,14 @@ static int qedn_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	return __qedn_probe(pdev);
 }
 
+void qedn_swap_bytes(u32 *p, int size)
+{
+	int i;
+
+	for (i = 0; i < size; ++i, ++p)
+		*p = __swab32(*p);
+}
+
 static struct pci_driver qedn_pci_driver = {
 	.name     = QEDN_MODULE_NAME,
 	.id_table = qedn_pci_tbl,
diff --git a/drivers/nvme/hw/qedn/qedn_task.c b/drivers/nvme/hw/qedn/qedn_task.c
index e52460f9650e..dd0b5f31c052 100644
--- a/drivers/nvme/hw/qedn/qedn_task.c
+++ b/drivers/nvme/hw/qedn/qedn_task.c
@@ -11,6 +11,77 @@
 /* Driver includes */
 #include "qedn.h"
 
+extern const struct qed_nvmetcp_ops *qed_ops;
+
+static bool qedn_sgl_has_small_mid_sge(struct nvmetcp_sge *sgl, u16 sge_count)
+{
+	u16 sge_num;
+
+	if (sge_count > 8) {
+		for (sge_num = 0; sge_num < sge_count; sge_num++) {
+			if (le32_to_cpu(sgl[sge_num].sge_len) <
+			    QEDN_FW_SLOW_IO_MIN_SGE_LIMIT)
+				return true; /* small middle SGE found */
+		}
+	}
+
+	return false; /* no small middle SGEs */
+}
+
+static int qedn_init_sgl(struct qedn_ctx *qedn, struct qedn_task_ctx *qedn_task)
+{
+	struct storage_sgl_task_params *sgl_task_params;
+	enum dma_data_direction dma_dir;
+	struct scatterlist *sg;
+	struct request *rq;
+	u16 num_sges;
+	int index;
+	u32 len;
+	int rc;
+
+	sgl_task_params = &qedn_task->sgl_task_params;
+	rq = blk_mq_rq_from_pdu(qedn_task->req);
+	if (qedn_task->task_size == 0) {
+		sgl_task_params->num_sges = 0;
+
+		return 0;
+	}
+
+	/* Convert BIO to scatterlist */
+	num_sges = blk_rq_map_sg(rq->q, rq, qedn_task->nvme_sg);
+	if (qedn_task->req_direction == WRITE)
+		dma_dir = DMA_TO_DEVICE;
+	else
+		dma_dir = DMA_FROM_DEVICE;
+
+	/* DMA map the scatterlist */
+	if (dma_map_sg(&qedn->pdev->dev, qedn_task->nvme_sg,
+		       num_sges, dma_dir) != num_sges) {
+		pr_err("Couldn't map sgl\n");
+		rc = -EPERM;
+
+		return rc;
+	}
+
+	sgl_task_params->total_buffer_size = qedn_task->task_size;
+	sgl_task_params->num_sges = num_sges;
+
+	for_each_sg(qedn_task->nvme_sg, sg, num_sges, index) {
+		DMA_REGPAIR_LE(sgl_task_params->sgl[index].sge_addr,
+			       sg_dma_address(sg));
+		len = sg_dma_len(sg);
+		sgl_task_params->sgl[index].sge_len = cpu_to_le32(len);
+	}
+
+	/* Relevant for Host Write Only */
+	sgl_task_params->small_mid_sge = (qedn_task->req_direction == READ) ?
+		false :
+		qedn_sgl_has_small_mid_sge(sgl_task_params->sgl,
+					   sgl_task_params->num_sges);
+
+	return 0;
+}
+
 static void qedn_free_nvme_sg(struct qedn_task_ctx *qedn_task)
 {
 	kfree(qedn_task->nvme_sg);
@@ -337,12 +408,168 @@ qedn_get_free_task_from_pool(struct qedn_conn_ctx *conn_ctx, u16 cccid)
 	return qedn_task;
 }
 
+int qedn_send_read_cmd(struct qedn_task_ctx *qedn_task,
+		       struct qedn_conn_ctx *conn_ctx)
+{
+	struct nvme_command *nvme_cmd = &qedn_task->req->nvme_cmd;
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	struct nvmetcp_task_params task_params;
+	struct nvme_tcp_cmd_pdu cmd_hdr;
+	struct nvmetcp_wqe *chain_sqe;
+	struct nvmetcp_wqe local_sqe;
+	int rc;
+
+	rc = qedn_init_sgl(qedn, qedn_task);
+	if (rc)
+		return rc;
+
+	task_params.opq.lo = cpu_to_le32(((u64)(qedn_task)) & 0xffffffff);
+	task_params.opq.hi = cpu_to_le32(((u64)(qedn_task)) >> 32);
+
+	/* Initialize task params */
+	task_params.context = qedn_task->fw_task_ctx;
+	task_params.sqe = &local_sqe;
+	task_params.tx_io_size = 0;
+	task_params.rx_io_size = qedn_task->task_size;
+	task_params.conn_icid = (u16)conn_ctx->conn_handle;
+	task_params.itid = qedn_task->itid;
+	task_params.cq_rss_number = conn_ctx->default_cq;
+	task_params.send_write_incapsule = 0;
+
+	cmd_hdr.hdr.type = nvme_tcp_cmd;
+	cmd_hdr.hdr.flags = 0;
+	cmd_hdr.hdr.hlen = sizeof(cmd_hdr);
+	cmd_hdr.hdr.pdo = 0x0;
+	cmd_hdr.hdr.plen = cpu_to_le32(cmd_hdr.hdr.hlen);
+
+	qed_ops->init_read_io(&task_params, &cmd_hdr, nvme_cmd,
+			      &qedn_task->sgl_task_params);
+
+	set_bit(QEDN_TASK_USED_BY_FW, &qedn_task->flags);
+	atomic_inc(&conn_ctx->num_active_fw_tasks);
+
+	spin_lock(&conn_ctx->ep.doorbell_lock);
+	chain_sqe = qed_chain_produce(&conn_ctx->ep.fw_sq_chain);
+	memcpy(chain_sqe, &local_sqe, sizeof(local_sqe));
+	qedn_ring_doorbell(conn_ctx);
+	spin_unlock(&conn_ctx->ep.doorbell_lock);
+
+	return 0;
+}
+
+int qedn_send_write_cmd(struct qedn_task_ctx *qedn_task,
+			struct qedn_conn_ctx *conn_ctx)
+{
+	struct nvme_command *nvme_cmd = &qedn_task->req->nvme_cmd;
+	struct nvmetcp_task_params task_params;
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	struct nvme_tcp_cmd_pdu cmd_hdr;
+	u32 pdu_len = sizeof(cmd_hdr);
+	struct nvmetcp_wqe *chain_sqe;
+	struct nvmetcp_wqe local_sqe;
+	u8 send_write_incapsule;
+	int rc;
+
+	if (qedn_task->task_size <=
+	    nvme_tcp_ofld_inline_data_size(conn_ctx->queue) &&
+	    qedn_task->task_size) {
+		send_write_incapsule = 1;
+		pdu_len += qedn_task->task_size;
+
+		/* Add digest length once supported */
+		cmd_hdr.hdr.pdo = sizeof(cmd_hdr);
+	} else {
+		send_write_incapsule = 0;
+
+		cmd_hdr.hdr.pdo = 0x0;
+	}
+
+	rc = qedn_init_sgl(qedn, qedn_task);
+	if (rc)
+		return rc;
+
+	task_params.host_cccid = cpu_to_le16(qedn_task->cccid);
+	task_params.opq.lo = cpu_to_le32(((u64)(qedn_task)) & 0xffffffff);
+	task_params.opq.hi = cpu_to_le32(((u64)(qedn_task)) >> 32);
+
+	/* Initialize task params */
+	task_params.context = qedn_task->fw_task_ctx;
+	task_params.sqe = &local_sqe;
+	task_params.tx_io_size = qedn_task->task_size;
+	task_params.rx_io_size = 0;
+	task_params.conn_icid = (u16)conn_ctx->conn_handle;
+	task_params.itid = qedn_task->itid;
+	task_params.cq_rss_number = conn_ctx->default_cq;
+	task_params.send_write_incapsule = send_write_incapsule;
+
+	cmd_hdr.hdr.type = nvme_tcp_cmd;
+	cmd_hdr.hdr.flags = 0;
+	cmd_hdr.hdr.hlen = sizeof(cmd_hdr);
+	cmd_hdr.hdr.plen = cpu_to_le32(pdu_len);
+
+	qed_ops->init_write_io(&task_params, &cmd_hdr, nvme_cmd,
+			       &qedn_task->sgl_task_params);
+
+	set_bit(QEDN_TASK_USED_BY_FW, &qedn_task->flags);
+	atomic_inc(&conn_ctx->num_active_fw_tasks);
+
+	spin_lock(&conn_ctx->ep.doorbell_lock);
+	chain_sqe = qed_chain_produce(&conn_ctx->ep.fw_sq_chain);
+	memcpy(chain_sqe, &local_sqe, sizeof(local_sqe));
+	qedn_ring_doorbell(conn_ctx);
+	spin_unlock(&conn_ctx->ep.doorbell_lock);
+
+	return 0;
+}
+
 int qedn_queue_request(struct qedn_conn_ctx *qedn_conn,
 		       struct nvme_tcp_ofld_req *req)
 {
-	/* Process the request */
+	struct qedn_task_ctx *qedn_task;
+	struct request *rq;
+	int rc = 0;
+	u16 cccid;
 
-	return 0;
+	rq = blk_mq_rq_from_pdu(req);
+
+	/* Placeholder - async */
+
+	cccid = rq->tag;
+	qedn_task = qedn_get_free_task_from_pool(qedn_conn, cccid);
+	if (unlikely(!qedn_task)) {
+		pr_err("Not able to allocate task context resource\n");
+
+		return BLK_STS_NOTSUPP;
+	}
+
+	req->private_data = qedn_task;
+	qedn_task->req = req;
+
+	/* Placeholder - handle (req->async) */
+
+	/* Check if there are physical segments in request to determine the
+	 * task size. The logic of nvme_tcp_set_sg_null() will be implemented
+	 * as a part of qedn_set_sg_host_data().
+	 */
+	qedn_task->task_size = blk_rq_nr_phys_segments(rq) ?
+				blk_rq_payload_bytes(rq) : 0;
+	qedn_task->req_direction = rq_data_dir(rq);
+	if (qedn_task->req_direction == WRITE)
+		rc = qedn_send_write_cmd(qedn_task, qedn_conn);
+	else
+		rc = qedn_send_read_cmd(qedn_task, qedn_conn);
+
+	if (unlikely(rc)) {
+		pr_err("Read/Write command failure\n");
+
+		return BLK_STS_TRANSPORT;
+	}
+
+	spin_lock(&qedn_conn->ep.doorbell_lock);
+	qedn_ring_doorbell(qedn_conn);
+	spin_unlock(&qedn_conn->ep.doorbell_lock);
+
+	return BLK_STS_OK;
 }
 
 struct qedn_task_ctx *qedn_cqe_get_active_task(struct nvmetcp_fw_cqe *cqe)
@@ -353,8 +580,75 @@ struct qedn_task_ctx *qedn_cqe_get_active_task(struct nvmetcp_fw_cqe *cqe)
 					+ le32_to_cpu(p->lo)));
 }
 
+static struct nvme_tcp_ofld_req *qedn_decouple_req_task(struct qedn_task_ctx
+							*qedn_task)
+{
+	struct nvme_tcp_ofld_req *ulp_req = qedn_task->req;
+
+	qedn_task->req = NULL;
+	if (ulp_req)
+		ulp_req->private_data = NULL;
+
+	return ulp_req;
+}
+
+static inline int qedn_comp_valid_task(struct qedn_task_ctx *qedn_task,
+				       union nvme_result *result, __le16 status)
+{
+	struct qedn_conn_ctx *conn_ctx = qedn_task->qedn_conn;
+	struct nvme_tcp_ofld_req *req;
+
+	req = qedn_decouple_req_task(qedn_task);
+	qedn_return_task_to_pool(conn_ctx, qedn_task);
+	if (!req) {
+		pr_err("req not found\n");
+
+		return -EINVAL;
+	}
+
+	/* Call request done to complete the request */
+	if (req->done)
+		req->done(req, result, status);
+	else
+		pr_err("request done not set !!!\n");
+
+	return 0;
+}
+
+int qedn_process_nvme_cqe(struct qedn_task_ctx *qedn_task,
+			  struct nvme_completion *cqe)
+{
+	int rc = 0;
+
+	/* CQE arrives swapped
+	 * Swapping requirement will be removed in future FW versions
+	 */
+	qedn_swap_bytes((u32 *)cqe, (sizeof(*cqe) / sizeof(u32)));
+
+	/* Placeholder - async */
+
+	rc = qedn_comp_valid_task(qedn_task, &cqe->result, cqe->status);
+
+	return rc;
+}
+
+int qedn_complete_c2h(struct qedn_task_ctx *qedn_task)
+{
+	int rc = 0;
+
+	__le16 status = cpu_to_le16(NVME_SC_SUCCESS << 1);
+	union nvme_result result = {};
+
+	rc = qedn_comp_valid_task(qedn_task, &result, status);
+
+	return rc;
+}
+
 void qedn_io_work_cq(struct qedn_ctx *qedn, struct nvmetcp_fw_cqe *cqe)
 {
+	int rc = 0;
+
+	struct nvme_completion *nvme_cqe = NULL;
 	struct qedn_task_ctx *qedn_task = NULL;
 	struct qedn_conn_ctx *conn_ctx = NULL;
 	u16 itid;
@@ -381,13 +675,28 @@ void qedn_io_work_cq(struct qedn_ctx *qedn, struct nvmetcp_fw_cqe *cqe)
 		case NVMETCP_TASK_TYPE_HOST_WRITE:
 		case NVMETCP_TASK_TYPE_HOST_READ:
 
-			/* Placeholder - IO flow */
+			/* Verify data digest once supported */
 
+			nvme_cqe = (struct nvme_completion *)
+						&cqe->cqe_data.nvme_cqe;
+			rc = qedn_process_nvme_cqe(qedn_task, nvme_cqe);
+			if (rc) {
+				pr_err("Read/Write completion error\n");
+
+				return;
+			}
 			break;
 
 		case NVMETCP_TASK_TYPE_HOST_READ_NO_CQE:
 
-			/* Placeholder - IO flow */
+			/* Verify data digest once supported */
+
+			rc = qedn_complete_c2h(qedn_task);
+			if (rc) {
+				pr_err("Controller To Host Data Transfer error\n");
+
+				return;
+			}
 
 			break;
 
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v4 19/20] qedn: Add Connection and IO level recovery flows
  2021-06-29 12:47 [PATCH v4 00/20] NVMeTCP Offload ULP Prabhakar Kushwaha
                   ` (17 preceding siblings ...)
  2021-06-29 12:47 ` [PATCH v4 18/20] qedn: Add IO level fastpath functionality Prabhakar Kushwaha
@ 2021-06-29 12:47 ` Prabhakar Kushwaha
  2021-06-29 12:47 ` [PATCH v4 20/20] qedn: Add support of ASYNC Prabhakar Kushwaha
  2021-07-01 13:23 ` [PATCH v4 00/20] NVMeTCP Offload ULP Christoph Hellwig
  20 siblings, 0 replies; 37+ messages in thread
From: Prabhakar Kushwaha @ 2021-06-29 12:47 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, axboe, kbusch
  Cc: davem, kuba, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	prabhakar.pkin, malin1024

From: Shai Malin <smalin@marvell.com>

This patch adds the connection level recovery functionalities, which
chain together as sketched below:
 - conn clear-sq: releases the FW restrictions in order to flush all
   the pending IOs.
 - drain: in case clear-sq is stuck, releases all the device FW
   restrictions in order to flush all the pending IOs.
 - task cleanup: flushes the IO level resources.
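
A rough sketch of the flow (illustrative only; the actual ordering,
locking and timeouts are in qedn_cleanp_fw(), qedn_cleanup_all_fw_tasks()
and qedn_drain() in this patch):

	if (atomic_read(&conn_ctx->num_active_fw_tasks)) {
		/* 1. conn clear-sq: lift the FW restrictions */
		qedn_clear_fw_sq(conn_ctx);
		/* 2. task cleanup: post a cleanup SQE per FW-owned task and
		 *    wait up to QEDN_TASK_CLEANUP_TMO for the completions
		 */
		qedn_cleanup_all_fw_tasks(conn_ctx);
		/* 3. drain: invoked from qedn_cleanup_all_fw_tasks() only if
		 *    the cleanup completions time out, retried up to
		 *    QEDN_DRAIN_MAX_ATTEMPTS times
		 */
	}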

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/hw/qedn/qedn.h      |   8 ++
 drivers/nvme/hw/qedn/qedn_conn.c | 135 ++++++++++++++++++++++++++++++-
 drivers/nvme/hw/qedn/qedn_main.c |   6 ++
 drivers/nvme/hw/qedn/qedn_task.c |  30 ++++++-
 4 files changed, 176 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
index b36994be65cb..5539791fc49b 100644
--- a/drivers/nvme/hw/qedn/qedn.h
+++ b/drivers/nvme/hw/qedn/qedn.h
@@ -41,6 +41,8 @@
 
 #define QEDN_FW_CQ_FP_WQ_WORKQUEUE "qedn_fw_cq_fp_wq"
 
+#define QEDN_DRAIN_MAX_ATTEMPTS 3
+
 /* Protocol defines */
 #define QEDN_MAX_IO_SIZE QED_NVMETCP_MAX_IO_SIZE
 #define QEDN_MAX_PDU_SIZE 0x80000 /* 512KB */
@@ -91,6 +93,8 @@
 /* Timeouts and delay constants */
 #define QEDN_WAIT_CON_ESTABLSH_TMO 10000 /* 10 seconds */
 #define QEDN_RLS_CONS_TMO 5000 /* 5 sec */
+#define QEDN_TASK_CLEANUP_TMO 3000 /* 3 sec */
+#define QEDN_DRAIN_TMO 1000 /* 1 sec */
 
 enum qedn_state {
 	QEDN_STATE_CORE_PROBED = 0,
@@ -173,7 +177,9 @@ struct qedn_ctx {
 };
 
 enum qedn_task_flags {
+	QEDN_TASK_IS_ICREQ,
 	QEDN_TASK_USED_BY_FW,
+	QEDN_TASK_WAIT_FOR_CLEANUP,
 };
 
 struct qedn_task_ctx {
@@ -321,6 +327,8 @@ struct qedn_conn_ctx {
 	struct list_head active_task_list;
 	atomic_t num_active_tasks;
 	atomic_t num_active_fw_tasks;
+	atomic_t task_cleanups_cnt;
+	wait_queue_head_t cleanup_waitq;
 
 	/* Connection resources - turned on to indicate what resource was
 	 * allocated, to that it can later be released.
diff --git a/drivers/nvme/hw/qedn/qedn_conn.c b/drivers/nvme/hw/qedn/qedn_conn.c
index ea072eff34a6..a61e77bf8bd0 100644
--- a/drivers/nvme/hw/qedn/qedn_conn.c
+++ b/drivers/nvme/hw/qedn/qedn_conn.c
@@ -626,6 +626,11 @@ static int qedn_handle_icresp(struct qedn_conn_ctx *conn_ctx)
 	return rc;
 }
 
+void qedn_error_recovery(struct nvme_ctrl *nctrl)
+{
+	nvme_tcp_ofld_error_recovery(nctrl);
+}
+
 /* Slowpath EQ Callback */
 int qedn_event_cb(void *context, u8 fw_event_code, void *event_ring_data)
 {
@@ -688,6 +693,7 @@ int qedn_event_cb(void *context, u8 fw_event_code, void *event_ring_data)
 		}
 
 		break;
+
 	case NVMETCP_EVENT_TYPE_ASYN_TERMINATE_DONE:
 		if (conn_ctx->state != CONN_STATE_WAIT_FOR_DESTROY_DONE)
 			pr_err("CID=0x%x:ASYN_TERMINATE_DONE:Wrong state %u\n",
@@ -696,6 +702,19 @@ int qedn_event_cb(void *context, u8 fw_event_code, void *event_ring_data)
 			queue_work(qctrl->sp_wq, &conn_ctx->sp_wq_entry);
 
 		break;
+
+	case NVMETCP_EVENT_TYPE_ASYN_CLOSE_RCVD:
+	case NVMETCP_EVENT_TYPE_ASYN_ABORT_RCVD:
+	case NVMETCP_EVENT_TYPE_ASYN_MAX_RT_TIME:
+	case NVMETCP_EVENT_TYPE_ASYN_MAX_RT_CNT:
+	case NVMETCP_EVENT_TYPE_ASYN_SYN_RCVD:
+	case NVMETCP_EVENT_TYPE_ASYN_MAX_KA_PROBES_CNT:
+	case NVMETCP_EVENT_TYPE_NVMETCP_CONN_ERROR:
+	case NVMETCP_EVENT_TYPE_TCP_CONN_ERROR:
+		qedn_error_recovery(&conn_ctx->ctrl->nctrl);
+
+		break;
+
 	default:
 		pr_err("CID=0x%x - Recv Unknown Event %u\n",
 		       conn_ctx->fw_cid, fw_event_code);
@@ -835,9 +854,123 @@ static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)
 	return -EINVAL;
 }
 
+static void qedn_cleanup_fw_task(struct qedn_ctx *qedn,
+				 struct qedn_task_ctx *qedn_task)
+{
+	struct qedn_conn_ctx *conn_ctx = qedn_task->qedn_conn;
+	struct nvmetcp_task_params task_params;
+	struct nvmetcp_wqe *chain_sqe;
+	struct nvmetcp_wqe local_sqe;
+	unsigned long lock_flags;
+
+	/* Take lock to prevent race with fastpath, we don't want to
+	 * invoke cleanup flows on tasks that already returned.
+	 */
+	spin_lock_irqsave(&qedn_task->lock, lock_flags);
+	if (!qedn_task->valid) {
+		spin_unlock_irqrestore(&qedn_task->lock, lock_flags);
+
+		return;
+	}
+	/* Skip tasks not used by FW */
+	if (!test_bit(QEDN_TASK_USED_BY_FW, &qedn_task->flags)) {
+		spin_unlock_irqrestore(&qedn_task->lock, lock_flags);
+
+		return;
+	}
+	/* Skip tasks that were already invoked for cleanup */
+	if (unlikely(test_bit(QEDN_TASK_WAIT_FOR_CLEANUP, &qedn_task->flags))) {
+		spin_unlock_irqrestore(&qedn_task->lock, lock_flags);
+
+		return;
+	}
+	set_bit(QEDN_TASK_WAIT_FOR_CLEANUP, &qedn_task->flags);
+	spin_unlock_irqrestore(&qedn_task->lock, lock_flags);
+
+	atomic_inc(&conn_ctx->task_cleanups_cnt);
+
+	task_params.sqe = &local_sqe;
+	task_params.itid = qedn_task->itid;
+	qed_ops->init_task_cleanup(&task_params);
+
+	/* spin_lock - doorbell is accessed by both Rx and response flows */
+	spin_lock(&conn_ctx->ep.doorbell_lock);
+	chain_sqe = qed_chain_produce(&conn_ctx->ep.fw_sq_chain);
+	memcpy(chain_sqe, &local_sqe, sizeof(local_sqe));
+	qedn_ring_doorbell(conn_ctx);
+	spin_unlock(&conn_ctx->ep.doorbell_lock);
+}
+
+inline int qedn_drain(struct qedn_conn_ctx *conn_ctx)
+{
+	u32 drain_timeout = msecs_to_jiffies(QEDN_DRAIN_TMO);
+	atomic_t *clnup_cnt = &conn_ctx->task_cleanups_cnt;
+	int drain_iter = QEDN_DRAIN_MAX_ATTEMPTS;
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	int wrc;
+
+	while (drain_iter) {
+		qed_ops->common->drain(qedn->cdev);
+		msleep(100);
+
+		wrc = wait_event_interruptible_timeout(conn_ctx->cleanup_waitq,
+						       !atomic_read(clnup_cnt),
+						       drain_timeout);
+		if (!wrc) {
+			drain_iter--;
+			continue;
+		}
+
+		return 0;
+	}
+
+	pr_err("CID 0x%x: cleanup after drain failed - need hard reset.\n",
+	       conn_ctx->fw_cid);
+
+	return -EINVAL;
+}
+
+void qedn_cleanup_all_fw_tasks(struct qedn_conn_ctx *conn_ctx)
+{
+	u32 clnup_timeout = msecs_to_jiffies(QEDN_TASK_CLEANUP_TMO);
+	atomic_t *clnup_cnt = &conn_ctx->task_cleanups_cnt;
+	struct qedn_task_ctx *qedn_task, *task_tmp;
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	int wrc;
+
+	list_for_each_entry_safe_reverse(qedn_task, task_tmp,
+					 &conn_ctx->active_task_list, entry) {
+		qedn_cleanup_fw_task(qedn, qedn_task);
+	}
+
+	wrc = wait_event_interruptible_timeout(conn_ctx->cleanup_waitq,
+					       atomic_read(clnup_cnt) == 0,
+					       clnup_timeout);
+	if (!wrc) {
+		if (qedn_drain(conn_ctx))
+			return;
+	}
+}
+
+static void qedn_clear_fw_sq(struct qedn_conn_ctx *conn_ctx)
+{
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	int rc;
+
+	rc = qed_ops->clear_sq(qedn->cdev, conn_ctx->conn_handle);
+	if (rc)
+		pr_warn("clear_sq failed - rc %u\n", rc);
+}
+
 void qedn_cleanp_fw(struct qedn_conn_ctx *conn_ctx)
 {
-	/* Placeholder - task cleanup */
+	if (atomic_read(&conn_ctx->num_active_fw_tasks)) {
+		conn_ctx->abrt_flag = QEDN_ABORTIVE_TERMINATION;
+		qedn_clear_fw_sq(conn_ctx);
+		qedn_cleanup_all_fw_tasks(conn_ctx);
+	} else {
+		conn_ctx->abrt_flag = QEDN_NON_ABORTIVE_TERMINATION;
+	}
 }
 
 void qedn_destroy_connection(struct qedn_conn_ctx *conn_ctx)
diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
index fb47e315ab03..d18cc3624764 100644
--- a/drivers/nvme/hw/qedn/qedn_main.c
+++ b/drivers/nvme/hw/qedn/qedn_main.c
@@ -262,12 +262,17 @@ static int qedn_setup_ctrl(struct nvme_tcp_ofld_ctrl *ctrl)
 
 static int qedn_release_ctrl(struct nvme_tcp_ofld_ctrl *ctrl)
 {
+	struct nvme_ctrl *nctrl = &ctrl->nctrl;
 	struct qedn_ctrl *qctrl;
 
 	qctrl = (struct qedn_ctrl *)ctrl->private_data;
 	if (!qctrl)
 		return -ENODEV;
 
+	if (nctrl->state == NVME_CTRL_CONNECTING ||
+	    nctrl->state == NVME_CTRL_RESETTING)
+		return 0;
+
 	if (test_and_clear_bit(LLH_FILTER, &qctrl->agg_state) &&
 	    qctrl->llh_filter) {
 		qedn_dec_llh_filter(qctrl->qedn, qctrl->llh_filter);
@@ -333,6 +338,7 @@ static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid,
 	qedn_set_pdu_params(conn_ctx);
 
 	init_waitqueue_head(&conn_ctx->conn_waitq);
+	init_waitqueue_head(&conn_ctx->cleanup_waitq);
 	atomic_set(&conn_ctx->est_conn_indicator, 0);
 	atomic_set(&conn_ctx->destroy_conn_indicator, 0);
 
diff --git a/drivers/nvme/hw/qedn/qedn_task.c b/drivers/nvme/hw/qedn/qedn_task.c
index dd0b5f31c052..6664cc587f06 100644
--- a/drivers/nvme/hw/qedn/qedn_task.c
+++ b/drivers/nvme/hw/qedn/qedn_task.c
@@ -324,6 +324,17 @@ void qedn_return_active_tasks(struct qedn_conn_ctx *conn_ctx)
 	/* Return tasks that aren't "Used by FW" to the pool */
 	list_for_each_entry_safe(qedn_task, task_tmp,
 				 &conn_ctx->active_task_list, entry) {
+		/* If we got this far, cleanup was already done, in which case
+		 * we want to return the task to the pool and release it, so
+		 * make sure the cleanup indication is cleared.
+		 */
+		clear_bit(QEDN_TASK_WAIT_FOR_CLEANUP, &qedn_task->flags);
+
+		/* Special handling in case of ICREQ task */
+		if (unlikely(conn_ctx->state ==	CONN_STATE_WAIT_FOR_IC_COMP &&
+			     test_bit(QEDN_TASK_IS_ICREQ, &(qedn_task)->flags)))
+			qedn_common_clear_fw_sgl(&qedn_task->sgl_task_params);
+
 		qedn_clear_task(conn_ctx, qedn_task);
 		num_returned_tasks++;
 	}
@@ -669,7 +680,9 @@ void qedn_io_work_cq(struct qedn_ctx *qedn, struct nvmetcp_fw_cqe *cqe)
 		return;
 
 	if (likely(cqe->cqe_type == NVMETCP_FW_CQE_TYPE_NORMAL)) {
-		/* Placeholder - verify the connection was established */
+		if (unlikely(test_bit(QEDN_TASK_WAIT_FOR_CLEANUP,
+				      &qedn_task->flags)))
+			return;
 
 		switch (cqe->task_type) {
 		case NVMETCP_TASK_TYPE_HOST_WRITE:
@@ -711,6 +724,19 @@ void qedn_io_work_cq(struct qedn_ctx *qedn, struct nvmetcp_fw_cqe *cqe)
 			pr_info("Could not identify task type\n");
 		}
 	} else {
-		/* Placeholder - Recovery flows */
+		if (cqe->cqe_type == NVMETCP_FW_CQE_TYPE_CLEANUP) {
+			clear_bit(QEDN_TASK_WAIT_FOR_CLEANUP,
+				  &qedn_task->flags);
+			qedn_return_task_to_pool(conn_ctx, qedn_task);
+			atomic_dec(&conn_ctx->task_cleanups_cnt);
+			wake_up_interruptible(&conn_ctx->cleanup_waitq);
+
+			return;
+		}
+
+		 /* The else case is NVMETCP_FW_CQE_TYPE_DUMMY, in which we
+		  * don't return the task. The task will be returned during
+		  * NVMETCP_FW_CQE_TYPE_CLEANUP.
+		  */
 	}
 }
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v4 20/20] qedn: Add support of ASYNC
  2021-06-29 12:47 [PATCH v4 00/20] NVMeTCP Offload ULP Prabhakar Kushwaha
                   ` (18 preceding siblings ...)
  2021-06-29 12:47 ` [PATCH v4 19/20] qedn: Add Connection and IO level recovery flows Prabhakar Kushwaha
@ 2021-06-29 12:47 ` Prabhakar Kushwaha
  2021-07-01 13:23 ` [PATCH v4 00/20] NVMeTCP Offload ULP Christoph Hellwig
  20 siblings, 0 replies; 37+ messages in thread
From: Prabhakar Kushwaha @ 2021-06-29 12:47 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, axboe, kbusch
  Cc: davem, kuba, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	prabhakar.pkin, malin1024

This patch implements ASYNC request and response event notification
handling at the qedn driver level.

The NVMe offload layer's ASYNC request is treated like a zero-length
read with a fake CCCID. This CCCID is used to route the ASYNC
notification back to the NVMe offload layer, roughly as sketched below.
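
A minimal sketch of the CCCID handling (illustrative only; the real code
is in qedn_get_free_async_cccid() and qedn_host_reset_cccid_itid_entry()
in this patch):

	/* allocation: pick a free slot and offset it past the admin
	 * queue tags so it cannot collide with a real rq->tag
	 */
	idx = find_first_zero_bit(conn_ctx->async_cccid_idx_map,
				  QEDN_MAX_OUTSTAND_ASYNC);
	set_bit(idx, conn_ctx->async_cccid_idx_map);
	cccid = NVME_AQ_DEPTH + idx;

	/* completion (async tasks only): free the slot again */
	clear_bit(cccid - NVME_AQ_DEPTH, conn_ctx->async_cccid_idx_map);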

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/hw/qedn/qedn.h      |   8 ++
 drivers/nvme/hw/qedn/qedn_main.c |   1 +
 drivers/nvme/hw/qedn/qedn_task.c | 147 +++++++++++++++++++++++++++++--
 3 files changed, 148 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
index 5539791fc49b..b275d6b9bc59 100644
--- a/drivers/nvme/hw/qedn/qedn.h
+++ b/drivers/nvme/hw/qedn/qedn.h
@@ -96,6 +96,9 @@
 #define QEDN_TASK_CLEANUP_TMO 3000 /* 3 sec */
 #define QEDN_DRAIN_TMO 1000 /* 1 sec */
 
+#define QEDN_MAX_OUTSTAND_ASYNC 32
+#define QEDN_INVALID_CCCID (-1)
+
 enum qedn_state {
 	QEDN_STATE_CORE_PROBED = 0,
 	QEDN_STATE_CORE_OPEN,
@@ -178,6 +181,7 @@ struct qedn_ctx {
 
 enum qedn_task_flags {
 	QEDN_TASK_IS_ICREQ,
+	QEDN_TASK_ASYNC,
 	QEDN_TASK_USED_BY_FW,
 	QEDN_TASK_WAIT_FOR_CLEANUP,
 };
@@ -346,6 +350,10 @@ struct qedn_conn_ctx {
 	struct nvme_tcp_icresp_pdu icresp;
 	struct qedn_icreq_padding *icreq_pad;
 
+	DECLARE_BITMAP(async_cccid_idx_map, QEDN_MAX_OUTSTAND_ASYNC);
+	/* Spinlock for fetching pseudo CCCID for async request */
+	spinlock_t async_cccid_bitmap_lock;
+
 	/* "dummy" socket */
 	struct socket *sock;
 };
diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
index d18cc3624764..2aaf1af8f9a7 100644
--- a/drivers/nvme/hw/qedn/qedn_main.c
+++ b/drivers/nvme/hw/qedn/qedn_main.c
@@ -343,6 +343,7 @@ static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid,
 	atomic_set(&conn_ctx->destroy_conn_indicator, 0);
 
 	spin_lock_init(&conn_ctx->conn_state_lock);
+	spin_lock_init(&conn_ctx->async_cccid_bitmap_lock);
 
 	conn_ctx->qid = qid;
 
diff --git a/drivers/nvme/hw/qedn/qedn_task.c b/drivers/nvme/hw/qedn/qedn_task.c
index 6664cc587f06..351afbc750b2 100644
--- a/drivers/nvme/hw/qedn/qedn_task.c
+++ b/drivers/nvme/hw/qedn/qedn_task.c
@@ -266,9 +266,45 @@ void qedn_common_clear_fw_sgl(struct storage_sgl_task_params *sgl_task_params)
 }
 
 inline void qedn_host_reset_cccid_itid_entry(struct qedn_conn_ctx *conn_ctx,
-					     u16 cccid)
+					     u16 cccid, bool async)
 {
 	conn_ctx->host_cccid_itid[cccid].itid = cpu_to_le16(QEDN_INVALID_ITID);
+	if (unlikely(async))
+		clear_bit(cccid - NVME_AQ_DEPTH,
+			  conn_ctx->async_cccid_idx_map);
+}
+
+static int qedn_get_free_idx(struct qedn_conn_ctx *conn_ctx, unsigned int size)
+{
+	int idx;
+
+	spin_lock(&conn_ctx->async_cccid_bitmap_lock);
+	idx = find_first_zero_bit(conn_ctx->async_cccid_idx_map, size);
+	if (unlikely(idx >= size)) {
+		idx = -1;
+		spin_unlock(&conn_ctx->async_cccid_bitmap_lock);
+		goto err_idx;
+	}
+	set_bit(idx, conn_ctx->async_cccid_idx_map);
+	spin_unlock(&conn_ctx->async_cccid_bitmap_lock);
+
+err_idx:
+
+	return idx;
+}
+
+int qedn_get_free_async_cccid(struct qedn_conn_ctx *conn_ctx)
+{
+	int async_cccid;
+
+	async_cccid =
+		qedn_get_free_idx(conn_ctx, QEDN_MAX_OUTSTAND_ASYNC);
+	if (unlikely(async_cccid == QEDN_INVALID_CCCID))
+		pr_err("No available CCCID for Async.\n");
+	else
+		async_cccid += NVME_AQ_DEPTH;
+
+	return async_cccid;
 }
 
 inline void qedn_host_set_cccid_itid_entry(struct qedn_conn_ctx *conn_ctx,
@@ -361,10 +397,12 @@ void qedn_return_task_to_pool(struct qedn_conn_ctx *conn_ctx,
 	struct qedn_fp_queue *fp_q = conn_ctx->fp_q;
 	struct qedn_io_resources *io_resrc;
 	unsigned long lock_flags;
+	bool async;
 
 	io_resrc = &fp_q->host_resrc;
 
 	spin_lock_irqsave(&qedn_task->lock, lock_flags);
+	async = test_bit(QEDN_TASK_ASYNC, &(qedn_task)->flags);
 	qedn_task->valid = 0;
 	qedn_task->flags = 0;
 	qedn_clear_sgl(conn_ctx->qedn, qedn_task);
@@ -372,7 +410,7 @@ void qedn_return_task_to_pool(struct qedn_conn_ctx *conn_ctx,
 
 	spin_lock(&conn_ctx->task_list_lock);
 	list_del(&qedn_task->entry);
-	qedn_host_reset_cccid_itid_entry(conn_ctx, qedn_task->cccid);
+	qedn_host_reset_cccid_itid_entry(conn_ctx, qedn_task->cccid, async);
 	spin_unlock(&conn_ctx->task_list_lock);
 
 	atomic_dec(&conn_ctx->num_active_tasks);
@@ -419,6 +457,60 @@ qedn_get_free_task_from_pool(struct qedn_conn_ctx *conn_ctx, u16 cccid)
 	return qedn_task;
 }
 
+void qedn_send_async_event_cmd(struct qedn_task_ctx *qedn_task,
+			       struct qedn_conn_ctx *conn_ctx)
+{
+	struct nvme_tcp_ofld_req *async_req = qedn_task->req;
+	struct nvme_command *nvme_cmd = &async_req->nvme_cmd;
+	struct storage_sgl_task_params *sgl_task_params;
+	struct nvmetcp_task_params task_params;
+	struct nvme_tcp_cmd_pdu cmd_hdr;
+	struct nvmetcp_wqe *chain_sqe;
+	struct nvmetcp_wqe local_sqe;
+
+	set_bit(QEDN_TASK_ASYNC, &qedn_task->flags);
+	nvme_cmd->common.command_id = qedn_task->cccid;
+	qedn_task->task_size = 0;
+
+	/* Initialize sgl params */
+	sgl_task_params = &qedn_task->sgl_task_params;
+	sgl_task_params->total_buffer_size = 0;
+	sgl_task_params->num_sges = 0;
+	sgl_task_params->small_mid_sge = false;
+
+	task_params.opq.lo = cpu_to_le32(((u64)(qedn_task)) & 0xffffffff);
+	task_params.opq.hi = cpu_to_le32(((u64)(qedn_task)) >> 32);
+
+	/* Initialize task params */
+	task_params.context = qedn_task->fw_task_ctx;
+	task_params.sqe = &local_sqe;
+	task_params.tx_io_size = 0;
+	task_params.rx_io_size = 0;
+	task_params.conn_icid = (u16)conn_ctx->conn_handle;
+	task_params.itid = qedn_task->itid;
+	task_params.cq_rss_number = conn_ctx->default_cq;
+	task_params.send_write_incapsule = 0;
+
+	/* Internal impl. - async is treated like zero len read */
+	cmd_hdr.hdr.type = nvme_tcp_cmd;
+	cmd_hdr.hdr.flags = 0;
+	cmd_hdr.hdr.hlen = sizeof(cmd_hdr);
+	cmd_hdr.hdr.pdo = 0x0;
+	cmd_hdr.hdr.plen = cpu_to_le32(cmd_hdr.hdr.hlen);
+
+	qed_ops->init_read_io(&task_params, &cmd_hdr, nvme_cmd,
+			      &qedn_task->sgl_task_params);
+
+	set_bit(QEDN_TASK_USED_BY_FW, &qedn_task->flags);
+	atomic_inc(&conn_ctx->num_active_fw_tasks);
+
+	spin_lock(&conn_ctx->ep.doorbell_lock);
+	chain_sqe = qed_chain_produce(&conn_ctx->ep.fw_sq_chain);
+	memcpy(chain_sqe, &local_sqe, sizeof(local_sqe));
+	qedn_ring_doorbell(conn_ctx);
+	spin_unlock(&conn_ctx->ep.doorbell_lock);
+}
+
 int qedn_send_read_cmd(struct qedn_task_ctx *qedn_task,
 		       struct qedn_conn_ctx *conn_ctx)
 {
@@ -533,6 +625,21 @@ int qedn_send_write_cmd(struct qedn_task_ctx *qedn_task,
 	return 0;
 }
 
+static void qedn_return_error_req(struct nvme_tcp_ofld_req *req)
+{
+	__le16 status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
+	union nvme_result res = {};
+
+	if (!req)
+		return;
+
+	/* Call request done to complete the request */
+	if (req->done)
+		req->done(req, &res, status);
+	else
+		pr_err("request done not set !!!\n");
+}
+
 int qedn_queue_request(struct qedn_conn_ctx *qedn_conn,
 		       struct nvme_tcp_ofld_req *req)
 {
@@ -543,9 +650,17 @@ int qedn_queue_request(struct qedn_conn_ctx *qedn_conn,
 
 	rq = blk_mq_rq_from_pdu(req);
 
-	/* Placeholder - async */
+	if (unlikely(req->async)) {
+		cccid = qedn_get_free_async_cccid(qedn_conn);
+		if (cccid == QEDN_INVALID_CCCID) {
+			qedn_return_error_req(req);
+
+			return BLK_STS_NOTSUPP;
+		}
+	} else {
+		cccid = rq->tag;
+	}
 
-	cccid = rq->tag;
 	qedn_task = qedn_get_free_task_from_pool(qedn_conn, cccid);
 	if (unlikely(!qedn_task)) {
 		pr_err("Not able to allocate task context resource\n");
@@ -556,7 +671,11 @@ int qedn_queue_request(struct qedn_conn_ctx *qedn_conn,
 	req->private_data = qedn_task;
 	qedn_task->req = req;
 
-	/* Placeholder - handle (req->async) */
+	if (unlikely(req->async)) {
+		qedn_send_async_event_cmd(qedn_task, qedn_conn);
+
+		return BLK_STS_TRANSPORT;
+	}
 
 	/* Check if there are physical segments in request to determine the
 	 * task size. The logic of nvme_tcp_set_sg_null() will be implemented
@@ -629,16 +748,28 @@ static inline int qedn_comp_valid_task(struct qedn_task_ctx *qedn_task,
 int qedn_process_nvme_cqe(struct qedn_task_ctx *qedn_task,
 			  struct nvme_completion *cqe)
 {
+	struct qedn_conn_ctx *conn_ctx = qedn_task->qedn_conn;
+	struct nvme_tcp_ofld_req *req;
 	int rc = 0;
+	bool async;
+
+	async = test_bit(QEDN_TASK_ASYNC, &(qedn_task)->flags);
 
 	/* CQE arrives swapped
 	 * Swapping requirement will be removed in future FW versions
 	 */
 	qedn_swap_bytes((u32 *)cqe, (sizeof(*cqe) / sizeof(u32)));
 
-	/* Placeholder - async */
-
-	rc = qedn_comp_valid_task(qedn_task, &cqe->result, cqe->status);
+	if (unlikely(async)) {
+		qedn_return_task_to_pool(conn_ctx, qedn_task);
+		req = qedn_task->req;
+		if (req->done)
+			req->done(req, &cqe->result, cqe->status);
+		else
+			pr_err("request done not set for async request !!!\n");
+	} else {
+		rc = qedn_comp_valid_task(qedn_task, &cqe->result, cqe->status);
+	}
 
 	return rc;
 }
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 00/20] NVMeTCP Offload ULP
  2021-06-29 12:47 [PATCH v4 00/20] NVMeTCP Offload ULP Prabhakar Kushwaha
                   ` (19 preceding siblings ...)
  2021-06-29 12:47 ` [PATCH v4 20/20] qedn: Add support of ASYNC Prabhakar Kushwaha
@ 2021-07-01 13:23 ` Christoph Hellwig
  2021-07-07 14:58   ` Hannes Reinecke
  20 siblings, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2021-07-01 13:23 UTC (permalink / raw)
  To: Prabhakar Kushwaha
  Cc: linux-nvme, sagi, hch, axboe, kbusch, davem, kuba, smalin,
	aelior, mkalderon, okulkarni, prabhakar.pkin, malin1024

I looked over it a bit (and will send some individual comments), and I
have to say I really dislike how this layer and how the hardware works.

The whole point of NVMe is that we have a nicely standardized PCIe
register level interface.  One that you can trivially hide a TCP offload
under with just a little control plane logic.  But instead we come up with
this gigantic mess.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 01/20] nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP
  2021-06-29 12:47 ` [PATCH v4 01/20] nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP Prabhakar Kushwaha
@ 2021-07-01 13:34   ` Christoph Hellwig
  2021-07-05 15:09     ` Shai Malin
  0 siblings, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2021-07-01 13:34 UTC (permalink / raw)
  To: Prabhakar Kushwaha
  Cc: linux-nvme, sagi, hch, axboe, kbusch, davem, kuba, smalin,
	aelior, mkalderon, okulkarni, prabhakar.pkin, malin1024,
	Dean Balandin

> +/* Kernel includes */


> +/* Driver includes */

I think these comments are a little pointless.

> +int nvme_tcp_ofld_register_dev(struct nvme_tcp_ofld_dev *dev)

Can you spell out offload everywhere?

> +int nvme_tcp_ofld_report_queue_err(struct nvme_tcp_ofld_queue *queue)
> +{
> +	/* Placeholder - invoke error recovery flow */
> +
> +	return 0;
> +}

Please don't add any placeholders like this.  The whole file is
still pretty small with all the patches applied, so no need to split it.

> +static struct nvmf_transport_ops nvme_tcp_ofld_transport = {
> +	.name		= "tcp_offload",
> +	.module		= THIS_MODULE,
> +	.required_opts	= NVMF_OPT_TRADDR,
> +	.allowed_opts	= NVMF_OPT_TRSVCID | NVMF_OPT_NR_WRITE_QUEUES  |
> +			  NVMF_OPT_HOST_TRADDR | NVMF_OPT_CTRL_LOSS_TMO |
> +			  NVMF_OPT_RECONNECT_DELAY | NVMF_OPT_HDR_DIGEST |
> +			  NVMF_OPT_DATA_DIGEST | NVMF_OPT_NR_POLL_QUEUES |
> +			  NVMF_OPT_TOS | NVMF_OPT_HOST_IFACE,
> +};
> +
> +static int __init nvme_tcp_ofld_init_module(void)
> +{
> +	nvmf_register_transport(&nvme_tcp_ofld_transport);
> +
> +	return 0;
> +}
> +
> +static void __exit nvme_tcp_ofld_cleanup_module(void)
> +{
> +	nvmf_unregister_transport(&nvme_tcp_ofld_transport);
> +}

Looking at the final result this doesn't do much.  Assuming we want
to support these kinds of whacky offloads (which I'd rather not do),
the proper way would be to allow registering multiple transport_ops
structures for a given name rather than adding an indirection that duplicates
a whole lot of code.
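
Something like this completely untested sketch is what I have in mind
(qedn_create_ctrl() is made up here, the point is just that both entries
register under the same name and the core tries them in turn):

	/* both register under the same name; nvmf_create_ctrl() would then
	 * try the entries for "tcp" until one ->create_ctrl() succeeds
	 */
	static struct nvmf_transport_ops nvme_tcp_transport = {
		.name		= "tcp",
		.module		= THIS_MODULE,
		.create_ctrl	= nvme_tcp_create_ctrl,	/* software path */
	};

	static struct nvmf_transport_ops qedn_transport = {
		.name		= "tcp",
		.module		= THIS_MODULE,
		.create_ctrl	= qedn_create_ctrl,	/* offload path */
	};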

> +	/* Offload device specific driver context */
> +	int num_hw_vectors;
> +};

As far as I can tell this is about queues, not "vectors" of some kind.

> +struct nvme_tcp_ofld_req {
> +	struct nvme_request req;
> +	struct nvme_command nvme_cmd;
> +	struct list_head queue_entry;
> +	struct nvme_tcp_ofld_queue *queue;
> +
> +	/* Offload device specific driver context */
> +	void *private_data;
> +
> +	/* async flag is used to distinguish between async and IO flow
> +	 * in common send_req() of nvme_tcp_ofld_ops.
> +	 */
> +	bool async;
> +
> +	void (*done)(struct nvme_tcp_ofld_req *req,
> +		     union nvme_result *result,
> +		     __le16 status);

This always points to nvme_tcp_ofld_req_done, why the costly indirection?

> +	/* Error callback function */
> +	int (*report_err)(struct nvme_tcp_ofld_queue *queue);
> +};

This seems to always point to nvme_tcp_ofld_report_queue_err, why the
indirection?


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 03/20] nvme-fabrics: Expose nvmf_check_required_opts() globally
  2021-06-29 12:47 ` [PATCH v4 03/20] nvme-fabrics: Expose nvmf_check_required_opts() globally Prabhakar Kushwaha
@ 2021-07-01 13:35   ` Christoph Hellwig
  2021-07-05 15:10     ` Shai Malin
  0 siblings, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2021-07-01 13:35 UTC (permalink / raw)
  To: Prabhakar Kushwaha
  Cc: linux-nvme, sagi, hch, axboe, kbusch, davem, kuba, smalin,
	aelior, mkalderon, okulkarni, prabhakar.pkin, malin1024

This and the previous patch are pretty much two sides of the same coin
and belong together.  But without the registration indirection they
wouldn't even be needed.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 04/20] nvme-tcp-offload: Add device scan implementation
  2021-06-29 12:47 ` [PATCH v4 04/20] nvme-tcp-offload: Add device scan implementation Prabhakar Kushwaha
@ 2021-07-01 13:36   ` Christoph Hellwig
  2021-07-05 15:10     ` Shai Malin
  0 siblings, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2021-07-01 13:36 UTC (permalink / raw)
  To: Prabhakar Kushwaha
  Cc: linux-nvme, sagi, hch, axboe, kbusch, davem, kuba, smalin,
	aelior, mkalderon, okulkarni, prabhakar.pkin, malin1024,
	Dean Balandin

> +	mutex_lock(&nvme_tcp_ofld_devices_mutex);
> +	list_for_each_entry(dev, &nvme_tcp_ofld_devices, entry) {
> +	/* ctrl includes the destination ip, source ip (if provided) and
> +	 * network interface (if provided).
> +	 */

This is not the normal kernel comment style, and also incorrectly
indented.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 09/20] qedn: Add qedn - Marvell's NVMeTCP HW offload device driver
  2021-06-29 12:47 ` [PATCH v4 09/20] qedn: Add qedn - Marvell's NVMeTCP HW offload device driver Prabhakar Kushwaha
@ 2021-07-01 13:41   ` Christoph Hellwig
  2021-07-05 15:13     ` Shai Malin
  0 siblings, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2021-07-01 13:41 UTC (permalink / raw)
  To: Prabhakar Kushwaha
  Cc: linux-nvme, sagi, hch, axboe, kbusch, davem, kuba, smalin,
	aelior, mkalderon, okulkarni, prabhakar.pkin, malin1024,
	Arie Gershberg

On Tue, Jun 29, 2021 at 03:47:32PM +0300, Prabhakar Kushwaha wrote:
> From: Shai Malin <smalin@marvell.com>
> 
> This patch will present the skeleton of the qedn driver.
> The new driver will be added under "drivers/nvme/hw/qedn" and will be
> enabled by the Kconfig "Marvell NVM Express over Fabrics TCP offload".

I don't see why we need a separate hw/ directory.   nvme-pci.c already
is very much a hardware driver.

> +config NVME_QEDN
> +	tristate "Marvell NVM Express over Fabrics TCP offload"
> +	depends on NVME_TCP_OFFLOAD

I think it also depends on PCI.

This whole patch is a bit pointless.  In general splitting a driver
submission into multiple patches is not very helpful unless the later
patches add pretty much optional and clearly separatable bits.  Otherwise
it just becomes really hard to review.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 10/20] qedn: Add qedn probe
  2021-06-29 12:47 ` [PATCH v4 10/20] qedn: Add qedn probe Prabhakar Kushwaha
@ 2021-07-01 13:48   ` Christoph Hellwig
  2021-07-05 15:13     ` Shai Malin
  0 siblings, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2021-07-01 13:48 UTC (permalink / raw)
  To: Prabhakar Kushwaha
  Cc: linux-nvme, sagi, hch, axboe, kbusch, davem, kuba, smalin,
	aelior, mkalderon, okulkarni, prabhakar.pkin, malin1024,
	Dean Balandin

On Tue, Jun 29, 2021 at 03:47:33PM +0300, Prabhakar Kushwaha wrote:
> From: Shai Malin <smalin@marvell.com>
> 
> This patch introduces the functionality of loading and unloading
> physical function.
> qedn_probe() loads the offload device PF(physical function), and
> initialize the HW and the FW with the PF parameters using the
> HW ops->qed_nvmetcp_ops, which are similar to other "qed_*_ops" which
> are used by the qede, qedr, qedf and qedi device drivers.
> qedn_remove() unloads the offload device PF, re-initialize the HW and
> the FW with the PF parameters.
> 
> The struct qedn_ctx is per PF container for PF-specific attributes and
> resources.
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Dean Balandin <dbalandin@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/nvme/hw/Kconfig          |   1 +
>  drivers/nvme/hw/qedn/qedn.h      |  26 ++++++
>  drivers/nvme/hw/qedn/qedn_main.c | 155 ++++++++++++++++++++++++++++++-
>  3 files changed, 177 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/nvme/hw/Kconfig b/drivers/nvme/hw/Kconfig
> index 374f1f9dbd3d..91b1bd6f07d8 100644
> --- a/drivers/nvme/hw/Kconfig
> +++ b/drivers/nvme/hw/Kconfig
> @@ -2,6 +2,7 @@
>  config NVME_QEDN
>  	tristate "Marvell NVM Express over Fabrics TCP offload"
>  	depends on NVME_TCP_OFFLOAD
> +	select QED_NVMETCP

This makes kconfig unhappy:

WARNING: unmet direct dependencies detected for QED_NVMETCP
  Depends on [n]: NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_QLOGIC [=n]
  Selected by [y]:
  - NVME_QEDN [=y] && NVME_TCP_OFFLOAD [=y]

While we're at unhappy:  without pinpointing it to the pointlessly
split patches, this series generates tons of sparse warnings:


drivers/nvme/hw/qedn/qedn_main.c:17:30: warning: symbol 'qed_ops' was not declared. Should it be static?
drivers/nvme/hw/qedn/qedn_main.c:38:24: warning: symbol 'qedn_add_llh_filter' was not declared. Should it be static?
drivers/nvme/hw/qedn/qedn_main.c:90:6: warning: symbol 'qedn_dec_llh_filter' was not declared. Should it be static?
drivers/nvme/hw/qedn/qedn_main.c:493:6: warning: symbol 'qedn_fw_cq_fp_handler' was not declared. Should it be static?
drivers/nvme/hw/qedn/qedn_main.c:813:58: warning: incorrect type in assignment (different base types)
drivers/nvme/hw/qedn/qedn_main.c:813:58:    expected restricted __le32 [usertype] hi
drivers/nvme/hw/qedn/qedn_main.c:813:58:    got unsigned int [usertype]
drivers/nvme/hw/qedn/qedn_main.c:814:58: warning: incorrect type in assignment (different base types)
drivers/nvme/hw/qedn/qedn_main.c:814:58:    expected restricted __le32 [usertype] lo
drivers/nvme/hw/qedn/qedn_main.c:814:58:    got unsigned int [usertype]
drivers/nvme/hw/qedn/qedn_task.c:358:6: warning: symbol 'qedn_return_task_to_pool' was not declared. Should it be static?
drivers/nvme/hw/qedn/qedn_task.c:422:5: warning: symbol 'qedn_send_read_cmd' was not declared. Should it be static?
drivers/nvme/hw/qedn/qedn_task.c:502:32: warning: incorrect type in assignment (different base types)
drivers/nvme/hw/qedn/qedn_task.c:502:32:    expected unsigned short [usertype] host_cccid
drivers/nvme/hw/qedn/qedn_task.c:502:32:    got restricted __le16 [usertype]
drivers/nvme/hw/qedn/qedn_task.c:471:5: warning: symbol 'qedn_send_write_cmd' was not declared. Should it be static?
drivers/nvme/hw/qedn/qedn_task.c:553:24: warning: incorrect type in return expression (different base types)
drivers/nvme/hw/qedn/qedn_task.c:553:24:    expected int
drivers/nvme/hw/qedn/qedn_task.c:553:24:    got restricted blk_status_t [usertype]
drivers/nvme/hw/qedn/qedn_task.c:576:24: warning: incorrect type in return expression (different base types)
drivers/nvme/hw/qedn/qedn_task.c:576:24:    expected int
drivers/nvme/hw/qedn/qedn_task.c:576:24:    got restricted blk_status_t [usertype]
drivers/nvme/hw/qedn/qedn_task.c:586:22: warning: symbol 'qedn_cqe_get_active_task' was not declared. Should it be static?
drivers/nvme/hw/qedn/qedn_task.c:629:5: warning: symbol 'qedn_process_nvme_cqe' was not declared. Should it be static?
drivers/nvme/hw/qedn/qedn_task.c:646:5: warning: symbol 'qedn_complete_c2h' was not declared. Should it be static?
drivers/nvme/hw/qedn/qedn_task.c: note: in included file (through drivers/nvme/hw/qedn/qedn.h):
drivers/nvme/hw/qedn/../../host/tcp-offload.h:207:45: error: marked inline, but without a definition
drivers/nvme/hw/qedn/qedn_conn.c:107:22: warning: incorrect type in assignment (different base types)
drivers/nvme/hw/qedn/qedn_conn.c:107:22:    expected unsigned short [usertype] src_port
drivers/nvme/hw/qedn/qedn_conn.c:107:22:    got restricted __be16 [usertype] sin_port
drivers/nvme/hw/qedn/qedn_conn.c:98:5: warning: symbol 'qedn_fill_ep_addr4' was not declared. Should it be static?
drivers/nvme/hw/qedn/qedn_conn.c:126:22: warning: incorrect type in assignment (different base types)
drivers/nvme/hw/qedn/qedn_conn.c:126:22:    expected unsigned short [usertype] src_port
drivers/nvme/hw/qedn/qedn_conn.c:126:22:    got restricted __be16 [usertype] sin6_port
drivers/nvme/hw/qedn/qedn_conn.c:116:5: warning: symbol 'qedn_fill_ep_addr6' was not declared. Should it be static?
drivers/nvme/hw/qedn/qedn_conn.c:557:23: warning: cast from restricted __le16
drivers/nvme/hw/qedn/qedn_conn.c:560:27: warning: cast from restricted __le32
drivers/nvme/hw/qedn/qedn_conn.c:629:6: warning: symbol 'qedn_error_recovery' was not declared. Should it be static?
drivers/nvme/hw/qedn/qedn_conn.c:727:6: warning: symbol 'qedn_prep_db_data' was not declared. Should it be static?
drivers/nvme/hw/qedn/qedn_conn.c:933:6: warning: symbol 'qedn_cleanup_all_fw_tasks' was not declared. Should it be static?
drivers/nvme/hw/qedn/qedn_conn.c:976:6: warning: symbol 'qedn_destroy_connection' was not declared. Should it be static?


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 01/20] nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP
  2021-07-01 13:34   ` Christoph Hellwig
@ 2021-07-05 15:09     ` Shai Malin
  2021-07-12 14:39       ` Prabhakar Kushwaha
  2021-07-16  7:45       ` Christoph Hellwig
  0 siblings, 2 replies; 37+ messages in thread
From: Shai Malin @ 2021-07-05 15:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Prabhakar Kushwaha, linux-nvme, Sagi Grimberg, axboe,
	Keith Busch, David Miller, Jakub Kicinski, Shai Malin,
	Ariel Elior, Michal Kalderon, okulkarni, prabhakar.pkin,
	Dean Balandin

On Thu, 1 Jul 2021 at 16:34, Christoph Hellwig <hch@lst.de> wrote:
>
> > +/* Kernel includes */
>
>
> > +/* Driver includes */
>
> I think these comments are a little pointless.

Will be removed.

>
> > +int nvme_tcp_ofld_register_dev(struct nvme_tcp_ofld_dev *dev)
>
> Can you spell out offload everywhere?

Sure.

>
> > +int nvme_tcp_ofld_report_queue_err(struct nvme_tcp_ofld_queue *queue)
> > +{
> > +     /* Placeholder - invoke error recovery flow */
> > +
> > +     return 0;
> > +}
>
> Please don't add any placeholders like this.  The whole file is
> still pretty small with all the patches applied, so no need to split it.

Sure.

>
> > +static struct nvmf_transport_ops nvme_tcp_ofld_transport = {
> > +     .name           = "tcp_offload",
> > +     .module         = THIS_MODULE,
> > +     .required_opts  = NVMF_OPT_TRADDR,
> > +     .allowed_opts   = NVMF_OPT_TRSVCID | NVMF_OPT_NR_WRITE_QUEUES  |
> > +                       NVMF_OPT_HOST_TRADDR | NVMF_OPT_CTRL_LOSS_TMO |
> > +                       NVMF_OPT_RECONNECT_DELAY | NVMF_OPT_HDR_DIGEST |
> > +                       NVMF_OPT_DATA_DIGEST | NVMF_OPT_NR_POLL_QUEUES |
> > +                       NVMF_OPT_TOS | NVMF_OPT_HOST_IFACE,
> > +};
> > +
> > +static int __init nvme_tcp_ofld_init_module(void)
> > +{
> > +     nvmf_register_transport(&nvme_tcp_ofld_transport);
> > +
> > +     return 0;
> > +}
> > +
> > +static void __exit nvme_tcp_ofld_cleanup_module(void)
> > +{
> > +     nvmf_unregister_transport(&nvme_tcp_ofld_transport);
> > +}
>
> Looking at the final result this doesn't do much.  Assuming we want
> to support these kinds of whacky offloads (which I'd rather not do),
> the proper way would be to allow registering multiple transport_ops
> structures for a given name rather than adding an indirection that duplicates
> a whole lot of code.

In that case, would you prefer that we invoke the tcp-offload from
within the tcp flow?
Should it be with the same transport name (“tcp”) or with a different
transport name (“tcp_offload”)?

Also, would you prefer that we register the offload device driver
directly to blk_mq layer or through the tcp-offload layer?

>
> > +     /* Offload device specific driver context */
> > +     int num_hw_vectors;
> > +};
>
> As far as I can tell this is about queues, not "vectors" of some kind.

It's the number of IRQs which the device supports. We can rename it to "num_hw_queues".

>
> > +struct nvme_tcp_ofld_req {
> > +     struct nvme_request req;
> > +     struct nvme_command nvme_cmd;
> > +     struct list_head queue_entry;
> > +     struct nvme_tcp_ofld_queue *queue;
> > +
> > +     /* Offload device specific driver context */
> > +     void *private_data;
> > +
> > +     /* async flag is used to distinguish between async and IO flow
> > +      * in common send_req() of nvme_tcp_ofld_ops.
> > +      */
> > +     bool async;
> > +
> > +     void (*done)(struct nvme_tcp_ofld_req *req,
> > +                  union nvme_result *result,
> > +                  __le16 status);
>
> This always points to nvme_tcp_ofld_req_done, why the costly indirection?

Will be fixed.

>
> > +     /* Error callback function */
> > +     int (*report_err)(struct nvme_tcp_ofld_queue *queue);
> > +};
>
> This seems to always point to nvme_tcp_ofld_report_queue_err, why the
> indirection?

Will be fixed.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 03/20] nvme-fabrics: Expose nvmf_check_required_opts() globally
  2021-07-01 13:35   ` Christoph Hellwig
@ 2021-07-05 15:10     ` Shai Malin
  0 siblings, 0 replies; 37+ messages in thread
From: Shai Malin @ 2021-07-05 15:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Prabhakar Kushwaha, linux-nvme, Sagi Grimberg, axboe,
	Keith Busch, David Miller, Jakub Kicinski, Shai Malin,
	Ariel Elior, Michal Kalderon, okulkarni, prabhakar.pkin

On Thu, 1 Jul 2021 at 16:35, Christoph Hellwig <hch@lst.de> wrote:
> This and the previous patch are pretty much two sides of the same coin
> and belong together.  But without the registration indirection they
> wouldn't even be needed.

Sure.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 04/20] nvme-tcp-offload: Add device scan implementation
  2021-07-01 13:36   ` Christoph Hellwig
@ 2021-07-05 15:10     ` Shai Malin
  0 siblings, 0 replies; 37+ messages in thread
From: Shai Malin @ 2021-07-05 15:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Prabhakar Kushwaha, linux-nvme, Sagi Grimberg, axboe,
	Keith Busch, David Miller, Jakub Kicinski, Shai Malin,
	Ariel Elior, Michal Kalderon, okulkarni, prabhakar.pkin,
	Dean Balandin

On Thu, 1 Jul 2021 at 16:36, Christoph Hellwig <hch@lst.de> wrote:
> > +     mutex_lock(&nvme_tcp_ofld_devices_mutex);
> > +     list_for_each_entry(dev, &nvme_tcp_ofld_devices, entry) {
> > +     /* ctrl includes the destination ip, source ip (if provided) and
> > +      * network interface (if provided).
> > +      */
>
> This is not the normal kernel comment style, and also incorrectly
> indented.

Will be fixed.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 09/20] qedn: Add qedn - Marvell's NVMeTCP HW offload device driver
  2021-07-01 13:41   ` Christoph Hellwig
@ 2021-07-05 15:13     ` Shai Malin
  0 siblings, 0 replies; 37+ messages in thread
From: Shai Malin @ 2021-07-05 15:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Prabhakar Kushwaha, linux-nvme, Sagi Grimberg, axboe,
	Keith Busch, David Miller, Jakub Kicinski, Shai Malin,
	Ariel Elior, Michal Kalderon, okulkarni, prabhakar.pkin,
	Arie Gershberg

On Thu, 1 Jul 2021 at 16:41, Christoph Hellwig <hch@lst.de> wrote:
> On Tue, Jun 29, 2021 at 03:47:32PM +0300, Prabhakar Kushwaha wrote:
> > From: Shai Malin <smalin@marvell.com>
> >
> > This patch will present the skeleton of the qedn driver.
> > The new driver will be added under "drivers/nvme/hw/qedn" and will be
> > enabled by the Kconfig "Marvell NVM Express over Fabrics TCP offload".
>
> I don't see why we need a separate hw/ directory.   nvme-pci.c already
> is very much a hardware driver.

We will restructure and keep only the bare minimum qedn.c under nvme/host.

>
> > +config NVME_QEDN
> > +     tristate "Marvell NVM Express over Fabrics TCP offload"
> > +     depends on NVME_TCP_OFFLOAD
>
> I think it also depends on PCI.

Thanks!

>
> This whole patch is a bit pointless.  In general splitting a driver
> submission into multiple patches is not very helpful unless the later
> patches add pretty much optional and clearly separatable bits.  Otherwise
> it just becomes really hard to review.

Understood. We will improve it.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 10/20] qedn: Add qedn probe
  2021-07-01 13:48   ` Christoph Hellwig
@ 2021-07-05 15:13     ` Shai Malin
  0 siblings, 0 replies; 37+ messages in thread
From: Shai Malin @ 2021-07-05 15:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Prabhakar Kushwaha, linux-nvme, Sagi Grimberg, axboe,
	Keith Busch, David Miller, Jakub Kicinski, Shai Malin,
	Ariel Elior, Michal Kalderon, okulkarni, prabhakar.pkin,
	Dean Balandin

On Thu, 1 Jul 2021 at 16:48, Christoph Hellwig <hch@lst.de> wrote:
> On Tue, Jun 29, 2021 at 03:47:33PM +0300, Prabhakar Kushwaha wrote:
> > From: Shai Malin <smalin@marvell.com>
> >
> > This patch introduces the functionality of loading and unloading
> > physical function.
> > qedn_probe() loads the offload device PF(physical function), and
> > initialize the HW and the FW with the PF parameters using the
> > HW ops->qed_nvmetcp_ops, which are similar to other "qed_*_ops" which
> > are used by the qede, qedr, qedf and qedi device drivers.
> > qedn_remove() unloads the offload device PF, re-initialize the HW and
> > the FW with the PF parameters.
> >
> > The struct qedn_ctx is per PF container for PF-specific attributes and
> > resources.
> >
> > Acked-by: Igor Russkikh <irusskikh@marvell.com>
> > Signed-off-by: Dean Balandin <dbalandin@marvell.com>
> > Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> > Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> > Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> > Signed-off-by: Ariel Elior <aelior@marvell.com>
> > Signed-off-by: Shai Malin <smalin@marvell.com>
> > Reviewed-by: Hannes Reinecke <hare@suse.de>
> > ---
> >  drivers/nvme/hw/Kconfig          |   1 +
> >  drivers/nvme/hw/qedn/qedn.h      |  26 ++++++
> >  drivers/nvme/hw/qedn/qedn_main.c | 155 ++++++++++++++++++++++++++++++-
> >  3 files changed, 177 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/nvme/hw/Kconfig b/drivers/nvme/hw/Kconfig
> > index 374f1f9dbd3d..91b1bd6f07d8 100644
> > --- a/drivers/nvme/hw/Kconfig
> > +++ b/drivers/nvme/hw/Kconfig
> > @@ -2,6 +2,7 @@
> >  config NVME_QEDN
> >       tristate "Marvell NVM Express over Fabrics TCP offload"
> >       depends on NVME_TCP_OFFLOAD
> > +     select QED_NVMETCP
>
> This makes kconfig unhappy:
>
> WARNING: unmet direct dependencies detected for QED_NVMETCP
>   Depends on [n]: NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_QLOGIC [=n]
>   Selected by [y]:
>   - NVME_QEDN [=y] && NVME_TCP_OFFLOAD [=y]
>
> While we're at unhappy:  without pinpointing it to the pointlessly
> split patches, this series generates tons of sparse warnings:
>

Sorry about that. Will be fixed.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 00/20] NVMeTCP Offload ULP
  2021-07-01 13:23 ` [PATCH v4 00/20] NVMeTCP Offload ULP Christoph Hellwig
@ 2021-07-07 14:58   ` Hannes Reinecke
  2021-07-07 15:07     ` Keith Busch
  0 siblings, 1 reply; 37+ messages in thread
From: Hannes Reinecke @ 2021-07-07 14:58 UTC (permalink / raw)
  To: Christoph Hellwig, Prabhakar Kushwaha
  Cc: linux-nvme, sagi, axboe, kbusch, davem, kuba, smalin, aelior,
	mkalderon, okulkarni, prabhakar.pkin, malin1024

On 7/1/21 3:23 PM, Christoph Hellwig wrote:
> I looked over it a bit (and will send some individual comments), and I
> have to say I really dislike how this layer and how the hardware works.
> 
> The whole point of NVMe is that we have a nicely standardized PCIe
> register level interface.  One that you can trivially hide a TCP offload
> under with just a little control plane logic.  But instead we come up with
> this gigantic mess.
> 
I can't really see how this control plane logic should work; how would 
the entire NVMe-oF discovery be abstracted away to hide behind an 
NVMe-PCI device?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 00/20] NVMeTCP Offload ULP
  2021-07-07 14:58   ` Hannes Reinecke
@ 2021-07-07 15:07     ` Keith Busch
  2021-07-07 15:25       ` Hannes Reinecke
  0 siblings, 1 reply; 37+ messages in thread
From: Keith Busch @ 2021-07-07 15:07 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Prabhakar Kushwaha, linux-nvme, sagi, axboe,
	davem, kuba, smalin, aelior, mkalderon, okulkarni,
	prabhakar.pkin, malin1024

On Wed, Jul 07, 2021 at 04:58:44PM +0200, Hannes Reinecke wrote:
> On 7/1/21 3:23 PM, Christoph Hellwig wrote:
> > I looked over it a bit (and will send some individual comments), and I
> > have to say I really dislike how this layer and how the hardware works.
> > 
> > The whole point of NVMe is that we have a nicely standardized PCIe
> > register level interface.  One that you can trivially hide a TCP offload
> > under with just a little control plane logic.  But instead we come up with
> > this gigantic mess.
> > 
> I can't really see how this control plane logic should work; how would the
> entire NVMe-oF discovery be abstracted away to hide behind an NVMe-PCI
> device?

Devices are already doing this. The discovery setup is device specific,
though.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 00/20] NVMeTCP Offload ULP
  2021-07-07 15:07     ` Keith Busch
@ 2021-07-07 15:25       ` Hannes Reinecke
  0 siblings, 0 replies; 37+ messages in thread
From: Hannes Reinecke @ 2021-07-07 15:25 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Prabhakar Kushwaha, linux-nvme, sagi, axboe,
	davem, kuba, smalin, aelior, mkalderon, okulkarni,
	prabhakar.pkin, malin1024

On 7/7/21 5:07 PM, Keith Busch wrote:
> On Wed, Jul 07, 2021 at 04:58:44PM +0200, Hannes Reinecke wrote:
>> On 7/1/21 3:23 PM, Christoph Hellwig wrote:
>>> I looked over it a bit (and will send some individual comments), and I
>>> have to say I really dislike how this layer and the hardware work.
>>>
>>> The whole point of NVMe is that we have a nicely standardized PCIe
>>> register-level interface, one that you can trivially hide a TCP offload
>>> under with just a little control plane logic.  But instead we come up
>>> with this giant mess.
>>>
>> I can't really see how this control plane logic should work; how would the
>> entire NVMe-oF discovery be abstracted away to hide behind an NVMe-PCI
>> device?
> 
> Devices are already doing this. The discovery setup is device specific,
> though.
> 
Oh, grand.
I had hoped we could steer away from this after the horrible experience
we had with iSCSI offload engines.
I'd rather have a standardized way of doing this.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 37+ messages in thread
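
To make the discovery question above concrete, here is a
transport-agnostic sketch of the host-side NVMe-oF discovery sequence,
written as plain user-space C.  The ops structure and every function
name are hypothetical illustrations, not kernel code; only the
well-known discovery NQN string comes from the NVMe-oF specification,
and all addresses are toy values.  Whatever sits underneath -- software
TCP, offloaded TCP, or RDMA -- the host still connects to the discovery
controller, fetches the discovery log page, and then connects to the
subsystems it lists.

/* Hypothetical sketch -- not the kernel's fabrics code. */
#include <stdio.h>

#define NVME_DISC_SUBSYS_NAME "nqn.2014-08.org.nvmexpress.discovery"

struct disc_entry {
	const char *subnqn;
	const char *traddr;
	const char *trsvcid;
};

struct fabric_ops {
	/* connect an admin queue to the controller at traddr/trsvcid */
	int (*connect)(const char *subnqn, const char *traddr,
		       const char *trsvcid);
	/* fetch the discovery log page; returns the number of entries */
	int (*get_disc_log)(struct disc_entry *entries, int max);
};

static int discover_and_connect(struct fabric_ops *ops,
				const char *traddr, const char *trsvcid)
{
	struct disc_entry entries[16];
	int i, n;

	/* Step 1: admin connect to the well-known discovery subsystem. */
	if (ops->connect(NVME_DISC_SUBSYS_NAME, traddr, trsvcid))
		return -1;

	/* Step 2: read the discovery log page. */
	n = ops->get_disc_log(entries, 16);

	/* Step 3: connect to every I/O subsystem the log reports. */
	for (i = 0; i < n; i++)
		ops->connect(entries[i].subnqn, entries[i].traddr,
			     entries[i].trsvcid);

	return 0;
}

/* Toy transport backend, just to make the sketch executable. */
static int toy_connect(const char *subnqn, const char *traddr,
		       const char *trsvcid)
{
	printf("connect %s via %s:%s\n", subnqn, traddr, trsvcid);
	return 0;
}

static int toy_get_disc_log(struct disc_entry *entries, int max)
{
	if (max < 1)
		return 0;
	entries[0] = (struct disc_entry){ "nqn.2016-06.io.toy:subsys0",
					  "192.0.2.10", "4420" };
	return 1;
}

static struct fabric_ops toy_ops = {
	.connect	= toy_connect,
	.get_disc_log	= toy_get_disc_log,
};

int main(void)
{
	return discover_and_connect(&toy_ops, "192.0.2.10", "4420");
}

In this sketch only ops->connect() and ops->get_disc_log() would differ
between an offloaded and a non-offloaded transport; the sequence itself
stays visible to the host.
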

* Re: [PATCH v4 01/20] nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP
  2021-07-05 15:09     ` Shai Malin
@ 2021-07-12 14:39       ` Prabhakar Kushwaha
  2021-07-16  7:45       ` Christoph Hellwig
  1 sibling, 0 replies; 37+ messages in thread
From: Prabhakar Kushwaha @ 2021-07-12 14:39 UTC (permalink / raw)
  To: Christoph Hellwig, linux-nvme, axboe, Sagi Grimberg, Keith Busch
  Cc: Prabhakar Kushwaha, David Miller, Jakub Kicinski, Shai Malin,
	Ariel Elior, Michal Kalderon, Omkar Kulkarni, Dean Balandin,
	Shai Malin

Hi Christoph,

On Mon, Jul 5, 2021 at 8:40 PM Shai Malin <malin1024@gmail.com> wrote:
>
> On Thu, 1 Jul 2021 at 16:34, Christoph Hellwig <hch@lst.de> wrote:
> >

[snip]

>
> >
> > > +static struct nvmf_transport_ops nvme_tcp_ofld_transport = {
> > > +     .name           = "tcp_offload",
> > > +     .module         = THIS_MODULE,
> > > +     .required_opts  = NVMF_OPT_TRADDR,
> > > +     .allowed_opts   = NVMF_OPT_TRSVCID | NVMF_OPT_NR_WRITE_QUEUES  |
> > > +                       NVMF_OPT_HOST_TRADDR | NVMF_OPT_CTRL_LOSS_TMO |
> > > +                       NVMF_OPT_RECONNECT_DELAY | NVMF_OPT_HDR_DIGEST |
> > > +                       NVMF_OPT_DATA_DIGEST | NVMF_OPT_NR_POLL_QUEUES |
> > > +                       NVMF_OPT_TOS | NVMF_OPT_HOST_IFACE,
> > > +};
> > > +
> > > +static int __init nvme_tcp_ofld_init_module(void)
> > > +{
> > > +     nvmf_register_transport(&nvme_tcp_ofld_transport);
> > > +
> > > +     return 0;
> > > +}
> > > +
> > > +static void __exit nvme_tcp_ofld_cleanup_module(void)
> > > +{
> > > +     nvmf_unregister_transport(&nvme_tcp_ofld_transport);
> > > +}
> >
> > Looking at the final result, this doesn't do much.  Assuming we want
> > to support these kinds of whacky offloads (which I'd rather not do),
> > the proper way would be to allow registering multiple transport_ops
> > structures for a given name rather than adding an indirection that
> > duplicates a whole lot of code.
>
> In that case, would you prefer that we invoke the tcp-offload from
> within the tcp flow?
> Should it be with the same transport name (“tcp”) or with a different
> transport name (“tcp_offload”)?
>
> Also, would you prefer that we register the offload device driver
> directly with the blk_mq layer or through the tcp-offload layer?
>

I hope you have had some time to look into the queries above.
We would appreciate your feedback so that we can finalize the design.

--pk

^ permalink raw reply	[flat|nested] 37+ messages in thread
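
To make the layering being asked about concrete, the following is a
minimal user-space sketch of the indirection under discussion: vendor
drivers register their ops with a single tcp-offload ULP, and that ULP
is the only transport the fabrics core sees.  All names here
(ofld_dev_ops, ofld_register_dev, tcp_ofld_create_ctrl, the toy driver)
are hypothetical illustrations, not the APIs from this series.

/* Hypothetical sketch of the indirection model -- not code from the series. */
#include <stdio.h>
#include <string.h>

struct ofld_dev_ops {
	const char *driver;
	/* does this offload device own the route to traddr? */
	int (*claim_dev)(const char *traddr);
	int (*create_queue)(int qid);
};

#define MAX_OFLD_DRIVERS 4
static struct ofld_dev_ops *ofld_drivers[MAX_OFLD_DRIVERS];
static int nr_ofld_drivers;

/* A vendor driver registers with the offload ULP, not with the fabrics core. */
static int ofld_register_dev(struct ofld_dev_ops *ops)
{
	if (nr_ofld_drivers == MAX_OFLD_DRIVERS)
		return -1;
	ofld_drivers[nr_ofld_drivers++] = ops;
	return 0;
}

/* The single "tcp_offload" transport entry point forwards to a vendor driver. */
static int tcp_ofld_create_ctrl(const char *traddr)
{
	int i;

	for (i = 0; i < nr_ofld_drivers; i++) {
		if (!ofld_drivers[i]->claim_dev(traddr))
			continue;
		printf("tcp_offload: using %s for %s\n",
		       ofld_drivers[i]->driver, traddr);
		return ofld_drivers[i]->create_queue(0);
	}
	return -1;	/* no offload device owns this address */
}

/* Toy vendor driver, standing in for a real offload driver. */
static int toy_claim(const char *traddr)
{
	return strncmp(traddr, "192.0.2.", 8) == 0;
}

static int toy_create_queue(int qid)
{
	printf("toy driver: created queue %d\n", qid);
	return 0;
}

static struct ofld_dev_ops toy_ops = {
	.driver		= "toy_offload",
	.claim_dev	= toy_claim,
	.create_queue	= toy_create_queue,
};

int main(void)
{
	ofld_register_dev(&toy_ops);
	return tcp_ofld_create_ctrl("192.0.2.10");
}

In this shape a new vendor driver never touches the fabrics core, but
every connect passes through the extra forwarding layer, which is the
indirection the review comments object to.
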

* Re: [PATCH v4 01/20] nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP
  2021-07-05 15:09     ` Shai Malin
  2021-07-12 14:39       ` Prabhakar Kushwaha
@ 2021-07-16  7:45       ` Christoph Hellwig
  1 sibling, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2021-07-16  7:45 UTC (permalink / raw)
  To: Shai Malin
  Cc: Christoph Hellwig, Prabhakar Kushwaha, linux-nvme, Sagi Grimberg,
	axboe, Keith Busch, David Miller, Jakub Kicinski, Shai Malin,
	Ariel Elior, Michal Kalderon, okulkarni, prabhakar.pkin,
	Dean Balandin

On Mon, Jul 05, 2021 at 06:09:58PM +0300, Shai Malin wrote:
> > > +static void __exit nvme_tcp_ofld_cleanup_module(void)
> > > +{
> > > +     nvmf_unregister_transport(&nvme_tcp_ofld_transport);
> > > +}
> >
> > Looking at the final result, this doesn't do much.  Assuming we want
> > to support these kinds of whacky offloads (which I'd rather not do),
> > the proper way would be to allow registering multiple transport_ops
> > structures for a given name rather than adding an indirection that
> > duplicates a whole lot of code.
> 
> In that case, would you prefer that we invoke the tcp-offload from
> within the tcp flow?
> Should it be with the same transport name (“tcp”) or with a different
> transport name (“tcp_offload”)?
> 
> Also, would you prefer that we register the offload device driver
> directly with the blk_mq layer or through the tcp-offload layer?

As said, we should allow the different offload drivers to register
as the offload transport and just iterate through the multiple instances
of the transport until we find a match, instead of duplicating the
registration infrastructure.

^ permalink raw reply	[flat|nested] 37+ messages in thread
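
A sketch under the same assumptions (hypothetical names, plain
user-space C, not the actual nvme-fabrics registration code) of the
alternative described above: each offload driver registers its own
transport_ops instance under the existing transport name, and the
connect path walks all instances with that name until one claims the
target, so no separate forwarding layer is needed.

/* Hypothetical sketch -- not the kernel's nvmf_register_transport() code. */
#include <stdio.h>
#include <string.h>

struct transport_ops {
	const char *name;			/* e.g. "tcp" */
	const char *provider;			/* which driver registered it */
	int (*match)(const char *traddr);	/* can this instance reach traddr? */
	int (*create_ctrl)(const char *traddr);
};

#define MAX_TRANSPORTS 8
static struct transport_ops *transports[MAX_TRANSPORTS];
static int nr_transports;

static int register_transport(struct transport_ops *ops)
{
	if (nr_transports == MAX_TRANSPORTS)
		return -1;
	transports[nr_transports++] = ops;
	return 0;
}

/* Core connect path: try every instance registered under the same name. */
static int create_ctrl(const char *name, const char *traddr)
{
	int i;

	for (i = 0; i < nr_transports; i++) {
		if (strcmp(transports[i]->name, name))
			continue;
		if (!transports[i]->match(traddr))
			continue;
		printf("using %s provided by %s\n", name,
		       transports[i]->provider);
		return transports[i]->create_ctrl(traddr);
	}
	return -1;	/* nothing registered under this name matched */
}

/* Software instance: matches anything, as a fallback. */
static int sw_match(const char *traddr) { (void)traddr; return 1; }
static int sw_create(const char *traddr)
{
	printf("sw tcp -> %s\n", traddr);
	return 0;
}
static struct transport_ops sw_tcp = {
	.name = "tcp", .provider = "nvme-tcp",
	.match = sw_match, .create_ctrl = sw_create,
};

/* Offload instance: matches only addresses its device owns. */
static int hw_match(const char *traddr)
{
	return strncmp(traddr, "192.0.2.", 8) == 0;
}
static int hw_create(const char *traddr)
{
	printf("hw offload -> %s\n", traddr);
	return 0;
}
static struct transport_ops hw_tcp = {
	.name = "tcp", .provider = "toy-offload",
	.match = hw_match, .create_ctrl = hw_create,
};

int main(void)
{
	register_transport(&hw_tcp);	/* offload first, so it wins on a match */
	register_transport(&sw_tcp);
	create_ctrl("tcp", "192.0.2.10");	/* -> hw offload */
	return create_ctrl("tcp", "203.0.113.5");	/* -> sw tcp fallback */
}

The trade-off is that the duplicate-name check in the registration code
has to be relaxed to allow several instances per name, but no new ULP
layer or ops-forwarding table is needed; whether the shared name should
be "tcp" or "tcp_offload" is exactly the open question in the exchange
above.
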

end of thread, other threads:[~2021-07-16  7:45 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-29 12:47 [PATCH v4 00/20] NVMeTCP Offload ULP Prabhakar Kushwaha
2021-06-29 12:47 ` [PATCH v4 01/20] nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP Prabhakar Kushwaha
2021-07-01 13:34   ` Christoph Hellwig
2021-07-05 15:09     ` Shai Malin
2021-07-12 14:39       ` Prabhakar Kushwaha
2021-07-16  7:45       ` Christoph Hellwig
2021-06-29 12:47 ` [PATCH v4 02/20] nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS definitions Prabhakar Kushwaha
2021-06-29 12:47 ` [PATCH v4 03/20] nvme-fabrics: Expose nvmf_check_required_opts() globally Prabhakar Kushwaha
2021-07-01 13:35   ` Christoph Hellwig
2021-07-05 15:10     ` Shai Malin
2021-06-29 12:47 ` [PATCH v4 04/20] nvme-tcp-offload: Add device scan implementation Prabhakar Kushwaha
2021-07-01 13:36   ` Christoph Hellwig
2021-07-05 15:10     ` Shai Malin
2021-06-29 12:47 ` [PATCH v4 05/20] nvme-tcp-offload: Add controller level implementation Prabhakar Kushwaha
2021-06-29 12:47 ` [PATCH v4 06/20] nvme-tcp-offload: Add controller level error recovery implementation Prabhakar Kushwaha
2021-06-29 12:47 ` [PATCH v4 07/20] nvme-tcp-offload: Add queue level implementation Prabhakar Kushwaha
2021-06-29 12:47 ` [PATCH v4 08/20] nvme-tcp-offload: Add IO " Prabhakar Kushwaha
2021-06-29 12:47 ` [PATCH v4 09/20] qedn: Add qedn - Marvell's NVMeTCP HW offload device driver Prabhakar Kushwaha
2021-07-01 13:41   ` Christoph Hellwig
2021-07-05 15:13     ` Shai Malin
2021-06-29 12:47 ` [PATCH v4 10/20] qedn: Add qedn probe Prabhakar Kushwaha
2021-07-01 13:48   ` Christoph Hellwig
2021-07-05 15:13     ` Shai Malin
2021-06-29 12:47 ` [PATCH v4 11/20] qedn: Add qedn_claim_dev API support Prabhakar Kushwaha
2021-06-29 12:47 ` [PATCH v4 12/20] qedn: Add IRQ and fast-path resources initializations Prabhakar Kushwaha
2021-06-29 12:47 ` [PATCH v4 13/20] qedn: Add connection-level slowpath functionality Prabhakar Kushwaha
2021-06-29 12:47 ` [PATCH v4 14/20] qedn: Add support of configuring HW filter block Prabhakar Kushwaha
2021-06-29 12:47 ` [PATCH v4 15/20] qedn: Add IO level qedn_send_req and fw_cq workqueue Prabhakar Kushwaha
2021-06-29 12:47 ` [PATCH v4 16/20] qedn: Add support of Task and SGL Prabhakar Kushwaha
2021-06-29 12:47 ` [PATCH v4 17/20] qedn: Add support of NVME ICReq & ICResp Prabhakar Kushwaha
2021-06-29 12:47 ` [PATCH v4 18/20] qedn: Add IO level fastpath functionality Prabhakar Kushwaha
2021-06-29 12:47 ` [PATCH v4 19/20] qedn: Add Connection and IO level recovery flows Prabhakar Kushwaha
2021-06-29 12:47 ` [PATCH v4 20/20] qedn: Add support of ASYNC Prabhakar Kushwaha
2021-07-01 13:23 ` [PATCH v4 00/20] NVMeTCP Offload ULP Christoph Hellwig
2021-07-07 14:58   ` Hannes Reinecke
2021-07-07 15:07     ` Keith Busch
2021-07-07 15:25       ` Hannes Reinecke
