* [PATCH 0/7] RFC patch series - NVMeTCP Offload ULP
@ 2020-11-19 14:21 Shai Malin
  2020-11-19 14:21 ` [PATCH 1/7] nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP Shai Malin
                   ` (8 more replies)
  0 siblings, 9 replies; 16+ messages in thread
From: Shai Malin @ 2020-11-19 14:21 UTC (permalink / raw)
  To: linux-nvme, sagi, Erik.Smith, Douglas.Farley, liuw
  Cc: smalin, aelior, agershberg, mkalderon, nassa, dbalandin, malin1024

This patch series introduces the nvme-tcp-offload ULP host layer: a new transport type called "tcp-offload" that serves as an abstraction layer for working with vendor-specific nvme-tcp offload drivers.

The nvme-tcp-offload transport can co-exist with the existing tcp and other transports. The tcp offload was designed so that stack changes are kept to a bare minimum: only registering a new transport. All other APIs, ops, etc. are identical to those of the regular tcp transport.
Representing the TCP offload as a separate transport allows a clear and manageable distinction between connections that should use the offload path and those that should not be offloaded (even on the same device).


Queue Initialization:
=====================
The nvme-tcp-offload ULP module shall register with the existing nvmf_transport_ops (.name = "tcp_offload"), nvme_ctrl_ops and blk_mq_ops.
The nvme-tcp-offload vendor driver shall register with the nvme-tcp-offload ULP the following ops:
 - claim_dev() - in order to resolve the route to the target according to the net_dev.
 - create_queue() - in order to create an offloaded nvme-tcp queue.

The nvme-tcp-offload ULP module shall manage all the controller level functionalities, call claim_dev and, based on the return values, call the relevant vendor driver's create_queue in order to create the admin queue and the IO queues.


IO-path:
========
The nvme-tcp-offload ULP works at the IO level: the nvme-tcp-offload ULP module shall pass the request (the IO) to the nvme-tcp-offload vendor driver, and the vendor driver shall later return the request completion (the IO completion).
No additional handling is needed in between; this design reduces CPU utilization, as described below.

The nvme-tcp-offload vendor driver shall register with the nvme-tcp-offload ULP the following IO-path ops:
 - init_req()
 - map_sg() - in order to map the request sg (similar to nvme_rdma_map_data()).
 - send_req() - in order to pass the request to the vendor driver for handling; the driver shall pass it on to the vendor-specific device.

Once the IO completes, the nvme-tcp-offload vendor driver shall call the request's done() callback, which invokes the nvme-tcp-offload ULP layer to complete the request.


TCP events:
===========
The Marvell HW engine handles all TCP retransmissions and out-of-order (OOO) events.


Teardown and errors:
====================
In case of an NVMeTCP queue error, the nvme-tcp-offload vendor driver shall call nvme_tcp_ofld_report_queue_err().
The nvme-tcp-offload vendor driver shall register with the nvme-tcp-offload ULP the following teardown ops:
 - drain_queue()
 - destroy_queue()
 

The Marvell HW engine:
======================
The Marvell HW engine is capable of offloading the entire TCP/IP layer and managing up to 64K connections, as is already done
with iWARP (by the Marvell qedr driver) and iSCSI (by the Marvell qedi driver).
In addition, the Marvell HW engine offloads the NVMeTCP queue layer and is able to manage the IO level even in the case of TCP retransmissions and OOO events.
The HW engine enables direct data placement (including the data digest CRC calculation and validation) and direct data transmission (including data digest CRC calculation).

 
The series patches:
===================
Patches 1-2     Add the nvme-tcp-offload ULP module, including the upper and lower API.
Patches 3-5     Add nvme-tcp-offload ULP controller level functionalities.
Patch 6         Add nvme-tcp-offload ULP queue level functionalities.
Patch 7         Add nvme-tcp-offload ULP IO level functionalities.


Performance:
============
With this implementation on top of the Marvell qedn driver (using the Marvell fastlinq NIC), we were able to demonstrate an x3 CPU utilization improvement for 4K queued read/write IOs and up to x20 for 512K read/write IOs.
In addition, we were able to demonstrate a latency improvement, specifically a 99.99% tail latency improvement of up to x2-5 (depending on the queue depth).


Future work:
============
 - The Marvell nvme-tcp-offload "qedn" host driver. This driver will interact with the qed core module, in a similar fashion to the existing ethernet (qede), rdma (qedr), iscsi (qedi) and fcoe
(qedf) drivers of the same product-line.
 - The nvme-tcp-offload ULP target abstraction layer.
 - The Marvell nvme-tcp-offload "qednt" target driver.
 
 
Arie Gershberg (3):
  nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS
    definitions
  nvme-tcp-offload: Add controller level implementation
  nvme-tcp-offload: Add controller level error recovery implementation

Dean Balandin (3):
  nvme-tcp-offload: Add device scan implementation
  nvme-tcp-offload: Add queue level implementation
  nvme-tcp-offload: Add IO level implementation

Shai Malin (1):
  nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP

 drivers/nvme/host/Kconfig       |   16 +
 drivers/nvme/host/Makefile      |    3 +
 drivers/nvme/host/fabrics.c     |    6 -
 drivers/nvme/host/fabrics.h     |    6 +
 drivers/nvme/host/tcp-offload.c | 1086 +++++++++++++++++++++++++++++++
 drivers/nvme/host/tcp-offload.h |  184 ++++++
 include/linux/nvme.h            |    1 +
 7 files changed, 1296 insertions(+), 6 deletions(-)
 create mode 100644 drivers/nvme/host/tcp-offload.c
 create mode 100644 drivers/nvme/host/tcp-offload.h

-- 
2.22.0


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* [PATCH 1/7] nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP
  2020-11-19 14:21 [PATCH 0/7] RFC patch series - NVMeTCP Offload ULP Shai Malin
@ 2020-11-19 14:21 ` Shai Malin
  2020-11-26  1:22   ` Sagi Grimberg
  2020-11-26  1:55   ` Sagi Grimberg
  2020-11-19 14:21 ` [PATCH 2/7] nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS definitions Shai Malin
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 16+ messages in thread
From: Shai Malin @ 2020-11-19 14:21 UTC (permalink / raw)
  To: linux-nvme, sagi, Erik.Smith, Douglas.Farley, liuw
  Cc: smalin, aelior, agershberg, mkalderon, nassa, dbalandin, malin1024

This patch presents the structure of the NVMeTCP offload common
layer driver. This module is added under "drivers/nvme/host/", and future
offload drivers which will register to it will be placed under
"drivers/nvme/hw".
This new driver is enabled by the Kconfig option "NVM Express over Fabrics
TCP offload common layer".
In order to support the new transport type, no change is needed in
host mode.

Each new vendor-specific offload driver will register to this ULP during
its probe function, by filling out the nvme_tcp_ofld_dev->ops and
nvme_tcp_ofld_dev->private_data and calling nvme_tcp_ofld_register_dev
with the initialized struct.

The internal implementation:
- tcp-offload.h:
  Includes all common structs and ops to be used and shared by offload
  drivers.

- tcp-offload.c:
  Includes the init function, which registers as an NVMf transport just
  like any other transport.

Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Dean Balandin <dbalandin@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
---
 drivers/nvme/host/Kconfig       |  16 +++
 drivers/nvme/host/Makefile      |   3 +
 drivers/nvme/host/tcp-offload.c | 121 ++++++++++++++++++++++
 drivers/nvme/host/tcp-offload.h | 173 ++++++++++++++++++++++++++++++++
 include/linux/nvme.h            |   1 +
 5 files changed, 314 insertions(+)
 create mode 100644 drivers/nvme/host/tcp-offload.c
 create mode 100644 drivers/nvme/host/tcp-offload.h

diff --git a/drivers/nvme/host/Kconfig b/drivers/nvme/host/Kconfig
index 3ed9786b88d8..c05057e8a7a4 100644
--- a/drivers/nvme/host/Kconfig
+++ b/drivers/nvme/host/Kconfig
@@ -83,3 +83,19 @@ config NVME_TCP
 	  from https://github.com/linux-nvme/nvme-cli.
 
 	  If unsure, say N.
+
+config NVME_TCP_OFFLOAD
+	tristate "NVM Express over Fabrics TCP offload common layer"
+	default m
+	depends on INET
+	depends on BLK_DEV_NVME
+	select NVME_FABRICS
+	help
+	  This provides support for the NVMe over Fabrics protocol using
+	  the TCP offload transport. This allows you to use remote block devices
+	  exported using the NVMe protocol set.
+
+	  To configure a NVMe over Fabrics controller use the nvme-cli tool
+	  from https://github.com/linux-nvme/nvme-cli.
+
+	  If unsure, say N.
diff --git a/drivers/nvme/host/Makefile b/drivers/nvme/host/Makefile
index d7f6a87687b8..0e7ef044cf29 100644
--- a/drivers/nvme/host/Makefile
+++ b/drivers/nvme/host/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_NVME_FABRICS)		+= nvme-fabrics.o
 obj-$(CONFIG_NVME_RDMA)			+= nvme-rdma.o
 obj-$(CONFIG_NVME_FC)			+= nvme-fc.o
 obj-$(CONFIG_NVME_TCP)			+= nvme-tcp.o
+obj-$(CONFIG_NVME_TCP_OFFLOAD)	+= nvme-tcp-offload.o
 
 nvme-core-y				:= core.o
 nvme-core-$(CONFIG_TRACING)		+= trace.o
@@ -26,3 +27,5 @@ nvme-rdma-y				+= rdma.o
 nvme-fc-y				+= fc.o
 
 nvme-tcp-y				+= tcp.o
+
+nvme-tcp-offload-y		+= tcp-offload.o
diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
new file mode 100644
index 000000000000..dfb25da7a564
--- /dev/null
+++ b/drivers/nvme/host/tcp-offload.c
@@ -0,0 +1,121 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright 2020 Marvell. All rights reserved.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+/* Kernel includes */
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+/* Driver includes */
+#include "tcp-offload.h"
+
+static LIST_HEAD(nvme_tcp_ofld_devices);
+static DECLARE_RWSEM(nvme_tcp_ofld_devices_rwsem);
+
+/**
+ * nvme_tcp_ofld_register_dev() - NVMeTCP Offload Library registration
+ * function.
+ * @dev:	NVMeTCP offload device instance to be registered to the
+ *		common tcp offload instance.
+ *
+ * API function that registers the type of vendor specific driver
+ * being implemented to the common NVMe over TCP offload library. Part of
+ * the overall init sequence of starting up an offload driver.
+ */
+int nvme_tcp_ofld_register_dev(struct nvme_tcp_ofld_dev *dev)
+{
+	struct nvme_tcp_ofld_ops *ops = dev->ops;
+
+	if (!ops->claim_dev ||
+	    !ops->create_queue ||
+	    !ops->drain_queue ||
+	    !ops->destroy_queue ||
+	    !ops->init_req ||
+	    !ops->map_sg ||
+	    !ops->send_req)
+		return -EINVAL;
+
+	down_write(&nvme_tcp_ofld_devices_rwsem);
+	list_add_tail(&dev->entry, &nvme_tcp_ofld_devices);
+	up_write(&nvme_tcp_ofld_devices_rwsem);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(nvme_tcp_ofld_register_dev);
+
+/**
+ * nvme_tcp_ofld_unregister_dev() - NVMeTCP Offload Library unregistration
+ * function.
+ * @dev:	NVMeTCP offload device instance to be unregistered from the
+ *		common tcp offload instance.
+ *
+ * API function that unregisters the type of vendor specific driver being
+ * implemented from the common NVMe over TCP offload library.
+ * Part of the overall exit sequence of unloading the implemented driver.
+ */
+void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev)
+{
+	down_write(&nvme_tcp_ofld_devices_rwsem);
+	list_del(&dev->entry);
+	up_write(&nvme_tcp_ofld_devices_rwsem);
+}
+EXPORT_SYMBOL_GPL(nvme_tcp_ofld_unregister_dev);
+
+/**
+ * nvme_tcp_ofld_report_queue_err() - NVMeTCP Offload report error event
+ * callback function. Pointed to by nvme_tcp_ofld_queue->report_err.
+ * @queue:	NVMeTCP offload queue instance on which the error has occurred.
+ *
+ * API function that allows the vendor specific offload driver to report errors
+ * to the common offload layer, to invoke error recovery.
+ */
+void nvme_tcp_ofld_report_queue_err(struct nvme_tcp_ofld_queue *queue)
+{
+	/* Placeholder - invoke error recovery flow */
+}
+
+/**
+ * nvme_tcp_ofld_req_done() - NVMeTCP Offload request done callback
+ * function. Pointed to by nvme_tcp_ofld_req->done.
+ * @req:	NVMeTCP offload request to complete.
+ * @result:     The nvme_result.
+ * @status:     The completion status.
+ *
+ * API function that allows the vendor specific offload driver to report request
+ * completions to the common offload layer.
+ */
+void
+nvme_tcp_ofld_req_done(struct nvme_tcp_ofld_req *req,
+		       union nvme_result *result,
+		       __le16 status)
+{
+	/* Placeholder - complete request with/without error */
+}
+
+static struct nvmf_transport_ops nvme_tcp_ofld_transport = {
+	.name		= "tcp_offload",
+	.module		= THIS_MODULE,
+	.required_opts	= NVMF_OPT_TRADDR,
+	.allowed_opts	= NVMF_OPT_TRSVCID | NVMF_OPT_DISABLE_SQFLOW |
+			  NVMF_OPT_NR_WRITE_QUEUES | NVMF_OPT_HOST_TRADDR |
+			  NVMF_OPT_CTRL_LOSS_TMO | NVMF_OPT_RECONNECT_DELAY |
+			  NVMF_OPT_HDR_DIGEST | NVMF_OPT_DATA_DIGEST |
+			  NVMF_OPT_NR_POLL_QUEUES | NVMF_OPT_TOS,
+};
+
+static int __init nvme_tcp_ofld_init_module(void)
+{
+	int rc = nvmf_register_transport(&nvme_tcp_ofld_transport);
+
+	return rc;
+}
+
+static void __exit nvme_tcp_ofld_cleanup_module(void)
+{
+	nvmf_unregister_transport(&nvme_tcp_ofld_transport);
+}
+
+module_init(nvme_tcp_ofld_init_module);
+module_exit(nvme_tcp_ofld_cleanup_module);
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/nvme/host/tcp-offload.h b/drivers/nvme/host/tcp-offload.h
new file mode 100644
index 000000000000..ed340b33d366
--- /dev/null
+++ b/drivers/nvme/host/tcp-offload.h
@@ -0,0 +1,173 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/*
+ * Copyright 2020 Marvell. All rights reserved.
+ */
+
+/* Linux includes */
+#include <linux/dma-mapping.h>
+#include <linux/scatterlist.h>
+#include <linux/types.h>
+#include <linux/nvme-tcp.h>
+
+/* Driver includes */
+#include "nvme.h"
+#include "fabrics.h"
+
+/* Forward declarations */
+struct nvme_tcp_ofld_ops;
+
+/* Representation of a vendor-specific device. This is the struct used to
+ * register to the offload layer by the vendor-specific driver during its probe
+ * function.
+ * Allocated by vendor-specific driver.
+ */
+struct nvme_tcp_ofld_dev {
+	struct list_head entry;
+	struct nvme_tcp_ofld_ops *ops;
+};
+
+/* Per IO struct holding the nvme_request and command
+ * Allocated by blk-mq.
+ */
+struct nvme_tcp_ofld_req {
+	struct nvme_request req;
+	struct nvme_command nvme_cmd;
+	struct nvme_tcp_ofld_queue *queue;
+
+	/* Vendor specific driver context */
+	void *private_data;
+
+	void (*done)(struct nvme_tcp_ofld_req *req,
+		     union nvme_result *result,
+		     __le16 status);
+};
+
+/* Allocated by nvme_tcp_ofld */
+struct nvme_tcp_ofld_queue {
+	/* Offload device associated to this queue */
+	struct nvme_tcp_ofld_dev *dev;
+	struct nvme_tcp_ofld_ctrl *ctrl;
+
+	/* Vendor specific driver context */
+	void *private_data;
+
+	/* Error callback function */
+	void (*report_err)(struct nvme_tcp_ofld_queue *queue);
+};
+
+/* Connectivity (routing) params used for establishing a connection */
+struct nvme_tcp_ofld_ctrl_con_params {
+	/* Input params */
+	struct sockaddr_storage remote_ip_addr;
+
+	/*
+	 * Can be input or output, depending if host traddr was passed.
+	 * Vendor-specific driver should determine if it should use the passed
+	 * addr or fill it on its own.
+	 */
+	struct sockaddr_storage local_ip_addr;
+
+	/* Output params */
+	struct sockaddr	remote_mac_addr;
+	struct sockaddr	local_mac_addr;
+	u16 vlan_id;
+};
+
+/* Allocated by nvme_tcp_ofld */
+struct nvme_tcp_ofld_ctrl {
+	struct nvme_ctrl nctrl;
+	struct nvme_tcp_ofld_dev *dev;
+
+	/* admin and IO queues */
+	struct blk_mq_tag_set tag_set;
+	struct blk_mq_tag_set admin_tag_set;
+	struct nvme_tcp_ofld_queue *queues;
+
+	/*
+	 * Each entry in the array indicates the number of queues of
+	 * corresponding type.
+	 */
+	u32 queue_type_mapping[HCTX_MAX_TYPES];
+
+	/* Connectivity params */
+	struct nvme_tcp_ofld_ctrl_con_params conn_params;
+
+	/* Vendor specific driver context */
+	void *private_data;
+};
+
+struct nvme_tcp_ofld_ops {
+	const char *name;
+	struct module *module;
+
+	/* For vendor-specific driver to report what opts it supports */
+	int required_opts; /* bitmap using enum nvmf_parsing_opts */
+	int allowed_opts; /* bitmap using enum nvmf_parsing_opts */
+
+	/**
+	 * claim_dev: Return True if addr is reachable via offload device.
+	 * @dev: The offload device to check.
+	 * @conn_params: ptr to routing params to be filled by the lower
+	 *               driver. Input+Output argument.
+	 */
+	bool (*claim_dev)(struct nvme_tcp_ofld_dev *dev,
+			  struct nvme_tcp_ofld_ctrl_con_params *conn_params);
+
+	/**
+	 * create_queue: Create offload queue and establish TCP + NVMeTCP
+	 * (icreq+icresp) connection. Return true on successful connection.
+	 * Based on nvme_tcp_alloc_queue.
+	 * @queue: The queue itself.
+	 * @qid: The queue ID associated with the requested queue.
+	 * @q_size: The queue depth.
+	 */
+	bool (*create_queue)(struct nvme_tcp_ofld_queue *queue, int qid,
+			     size_t q_size);
+
+	/**
+	 * drain_queue: Drain a given queue - Returning from this function
+	 * ensures that no additional completions will arrive on this queue.
+	 * @queue: The queue to drain.
+	 */
+	void (*drain_queue)(struct nvme_tcp_ofld_queue *queue);
+
+	/**
+	 * destroy_queue: Close the TCP + NVMeTCP connection of a given queue
+	 * and make sure it's no longer active (no completions will arrive on the
+	 * queue).
+	 * @queue: The queue to destroy.
+	 */
+	void (*destroy_queue)(struct nvme_tcp_ofld_queue *queue);
+
+	/**
+	 * poll_queue: Poll a given queue for completions.
+	 * @queue: The queue to poll.
+	 */
+	void (*poll_queue)(struct nvme_tcp_ofld_queue *queue);
+
+	/**
+	 * init_req: Initialize vendor-specific params for a new request.
+	 * @req: Ptr to request to be initialized. Input+Output argument.
+	 */
+	void (*init_req)(struct nvme_tcp_ofld_req *req);
+
+	/**
+	 * send_req: Dispatch a request. Returns the execution status.
+	 * @req: Ptr to request to be sent.
+	 */
+	blk_status_t (*send_req)(struct nvme_tcp_ofld_req *req);
+
+	/**
+	 * map_sg: Map a scatter/gather list to DMA addresses. Returns the
+	 * number of SGs entries mapped successfully.
+	 * @dev: The device for which the DMA addresses are to be created.
+	 * @req: The request corresponding to the SGs, to allow vendor-specific
+	 *       driver to initialize additional params if it needs to.
+	 */
+	int (*map_sg)(struct nvme_tcp_ofld_dev *dev,
+		      struct nvme_tcp_ofld_req *req);
+};
+
+/* Exported functions for lower vendor specific offload drivers */
+int nvme_tcp_ofld_register_dev(struct nvme_tcp_ofld_dev *dev);
+void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev);
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index d92535997687..2c4c21f39a2f 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -47,6 +47,7 @@ enum {
 	NVMF_TRTYPE_RDMA	= 1,	/* RDMA */
 	NVMF_TRTYPE_FC		= 2,	/* Fibre Channel */
 	NVMF_TRTYPE_TCP		= 3,	/* TCP/IP */
+	NVMF_TRTYPE_TCP_OFFLOAD = 4,	/* TCP/IP offload */
 	NVMF_TRTYPE_LOOP	= 254,	/* Reserved for host usage */
 	NVMF_TRTYPE_MAX,
 };
-- 
2.22.0




* [PATCH 2/7] nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS definitions
  2020-11-19 14:21 [PATCH 0/7] RFC patch series - NVMeTCP Offload ULP Shai Malin
  2020-11-19 14:21 ` [PATCH 1/7] nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP Shai Malin
@ 2020-11-19 14:21 ` Shai Malin
  2020-11-26  1:27   ` Sagi Grimberg
  2020-11-19 14:21 ` [PATCH 3/7] nvme-tcp-offload: Add device scan implementation Shai Malin
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 16+ messages in thread
From: Shai Malin @ 2020-11-19 14:21 UTC (permalink / raw)
  To: linux-nvme, sagi, Erik.Smith, Douglas.Farley, liuw
  Cc: smalin, aelior, agershberg, mkalderon, nassa, dbalandin, malin1024

From: Arie Gershberg <agershberg@marvell.com>

Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS definitions
to header file, so it can be used by transport modules.

Signed-off-by: Arie Gershberg <agershberg@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
---
 drivers/nvme/host/fabrics.c | 6 ------
 drivers/nvme/host/fabrics.h | 6 ++++++
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 4ec4829d6233..f5af440b1751 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -981,12 +981,6 @@ void nvmf_free_options(struct nvmf_ctrl_options *opts)
 }
 EXPORT_SYMBOL_GPL(nvmf_free_options);
 
-#define NVMF_REQUIRED_OPTS	(NVMF_OPT_TRANSPORT | NVMF_OPT_NQN)
-#define NVMF_ALLOWED_OPTS	(NVMF_OPT_QUEUE_SIZE | NVMF_OPT_NR_IO_QUEUES | \
-				 NVMF_OPT_KATO | NVMF_OPT_HOSTNQN | \
-				 NVMF_OPT_HOST_ID | NVMF_OPT_DUP_CONNECT |\
-				 NVMF_OPT_DISABLE_SQFLOW)
-
 static struct nvme_ctrl *
 nvmf_create_ctrl(struct device *dev, const char *buf)
 {
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index a9c1e3b4585e..0a8712a2462e 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -58,6 +58,12 @@ enum {
 	NVMF_OPT_TOS		= 1 << 19,
 };
 
+#define NVMF_REQUIRED_OPTS	(NVMF_OPT_TRANSPORT | NVMF_OPT_NQN)
+#define NVMF_ALLOWED_OPTS	(NVMF_OPT_QUEUE_SIZE | NVMF_OPT_NR_IO_QUEUES | \
+				 NVMF_OPT_KATO | NVMF_OPT_HOSTNQN | \
+				 NVMF_OPT_HOST_ID | NVMF_OPT_DUP_CONNECT |\
+				 NVMF_OPT_DISABLE_SQFLOW)
+
 /**
  * struct nvmf_ctrl_options - Used to hold the options specified
  *			      with the parsing opts enum.
-- 
2.22.0




* [PATCH 3/7] nvme-tcp-offload: Add device scan implementation
  2020-11-19 14:21 [PATCH 0/7] RFC patch series - NVMeTCP Offload ULP Shai Malin
  2020-11-19 14:21 ` [PATCH 1/7] nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP Shai Malin
  2020-11-19 14:21 ` [PATCH 2/7] nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS definitions Shai Malin
@ 2020-11-19 14:21 ` Shai Malin
  2020-11-19 14:21 ` [PATCH 4/7] nvme-tcp-offload: Add controller level implementation Shai Malin
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Shai Malin @ 2020-11-19 14:21 UTC (permalink / raw)
  To: linux-nvme, sagi, Erik.Smith, Douglas.Farley, liuw
  Cc: smalin, aelior, agershberg, mkalderon, nassa, dbalandin, malin1024

From: Dean Balandin <dbalandin@marvell.com>

This patch implements the create_ctrl skeleton, specifically to
demonstrate how the claim_dev op will be used.

The driver scans the registered devices and calls the claim_dev op on
each of them to find the first device that matches the connection
params. Once the correct device is found (claim_dev returns true), we
raise the refcnt of that device and return it as the device to be used
for the ctrl currently being created.

Signed-off-by: Dean Balandin <dbalandin@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

---
 drivers/nvme/host/tcp-offload.c | 105 ++++++++++++++++++++++++++++++++
 1 file changed, 105 insertions(+)

diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
index dfb25da7a564..f21cb2497ea7 100644
--- a/drivers/nvme/host/tcp-offload.c
+++ b/drivers/nvme/host/tcp-offload.c
@@ -13,6 +13,12 @@
 static LIST_HEAD(nvme_tcp_ofld_devices);
 static DECLARE_RWSEM(nvme_tcp_ofld_devices_rwsem);
 
+static inline struct nvme_tcp_ofld_ctrl *
+to_tcp_ofld_ctrl(struct nvme_ctrl *nctrl)
+{
+	return container_of(nctrl, struct nvme_tcp_ofld_ctrl, nctrl);
+}
+
 /**
  * nvme_tcp_ofld_register_dev() - NVMeTCP Offload Library registration
  * function.
@@ -93,6 +99,104 @@ nvme_tcp_ofld_req_done(struct nvme_tcp_ofld_req *req,
 	/* Placeholder - complete request with/without error */
 }
 
+struct nvme_tcp_ofld_dev *
+nvme_tcp_ofld_lookup_dev(struct nvme_tcp_ofld_ctrl *ctrl)
+{
+	struct nvme_tcp_ofld_dev *dev;
+
+	down_read(&nvme_tcp_ofld_devices_rwsem);
+	list_for_each_entry(dev, &nvme_tcp_ofld_devices, entry) {
+		if (dev->ops->claim_dev(dev, &ctrl->conn_params)) {
+			/* Increase driver refcnt */
+			if (!try_module_get(dev->ops->module)) {
+				pr_err("try_module_get failed\n");
+				dev = NULL;
+			}
+
+			goto out;
+		}
+	}
+
+	dev = NULL;
+out:
+	up_read(&nvme_tcp_ofld_devices_rwsem);
+
+	return dev;
+}
+
+static int nvme_tcp_ofld_setup_ctrl(struct nvme_ctrl *nctrl, bool new)
+{
+	/* Placeholder - validates inputs and creates admin and IO queues */
+
+	return 0;
+}
+
+static struct nvme_ctrl *
+nvme_tcp_ofld_create_ctrl(struct device *ndev, struct nvmf_ctrl_options *opts)
+{
+	struct nvme_tcp_ofld_queue *queue;
+	struct nvme_tcp_ofld_ctrl *ctrl;
+	struct nvme_tcp_ofld_dev *dev;
+	struct nvme_ctrl *nctrl;
+	int i, rc = 0;
+
+	ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
+	if (!ctrl)
+		return ERR_PTR(-ENOMEM);
+
+	nctrl = &ctrl->nctrl;
+
+	/* Init nvme_tcp_ofld_ctrl and nvme_ctrl params based on received
+	 * opts */
+
+	/* Find device that can reach the dest addr */
+	dev = nvme_tcp_ofld_lookup_dev(ctrl);
+	if (!dev) {
+		pr_info("no device found for addr %s:%s.\n",
+			opts->traddr, opts->trsvcid);
+		rc = -EINVAL;
+		goto out_free_ctrl;
+	}
+
+	ctrl->dev = dev;
+	ctrl->queues = kcalloc(nctrl->queue_count,
+			       sizeof(struct nvme_tcp_ofld_queue),
+			       GFP_KERNEL);
+	if (!ctrl->queues) {
+		rc = -ENOMEM;
+		goto out_module_put;
+	}
+
+	for (i = 0; i < nctrl->queue_count; ++i) {
+		queue = &ctrl->queues[i];
+		queue->ctrl = ctrl;
+		queue->dev = dev;
+		queue->report_err = nvme_tcp_ofld_report_queue_err;
+	}
+
+	/* Call nvme_init_ctrl */
+
+	rc = nvme_tcp_ofld_setup_ctrl(nctrl, true);
+	if (rc)
+		goto out_uninit_ctrl;
+
+	return nctrl;
+
+out_uninit_ctrl:
+	nvme_uninit_ctrl(nctrl);
+	nvme_put_ctrl(nctrl);
+	if (rc > 0)
+		rc = -EIO;
+
+	return ERR_PTR(rc);
+out_module_put:
+	module_put(dev->ops->module);
+out_free_ctrl:
+	kfree(ctrl);
+
+	return ERR_PTR(rc);
+}
+
 static struct nvmf_transport_ops nvme_tcp_ofld_transport = {
 	.name		= "tcp_offload",
 	.module		= THIS_MODULE,
@@ -102,6 +206,7 @@ static struct nvmf_transport_ops nvme_tcp_ofld_transport = {
 			  NVMF_OPT_CTRL_LOSS_TMO | NVMF_OPT_RECONNECT_DELAY |
 			  NVMF_OPT_HDR_DIGEST | NVMF_OPT_DATA_DIGEST |
 			  NVMF_OPT_NR_POLL_QUEUES | NVMF_OPT_TOS,
+	.create_ctrl	= nvme_tcp_ofld_create_ctrl,
 };
 
 static int __init nvme_tcp_ofld_init_module(void)
-- 
2.22.0




* [PATCH 4/7] nvme-tcp-offload: Add controller level implementation
  2020-11-19 14:21 [PATCH 0/7] RFC patch series - NVMeTCP Offload ULP Shai Malin
                   ` (2 preceding siblings ...)
  2020-11-19 14:21 ` [PATCH 3/7] nvme-tcp-offload: Add device scan implementation Shai Malin
@ 2020-11-19 14:21 ` Shai Malin
  2020-11-19 14:21 ` [PATCH 5/7] nvme-tcp-offload: Add controller level error recovery implementation Shai Malin
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Shai Malin @ 2020-11-19 14:21 UTC (permalink / raw)
  To: linux-nvme, sagi, Erik.Smith, Douglas.Farley, liuw
  Cc: smalin, aelior, agershberg, mkalderon, nassa, dbalandin, malin1024

From: Arie Gershberg <agershberg@marvell.com>

In this patch we implement controller level functionality including:
- create_ctrl.
- delete_ctrl.
- free_ctrl.

The implementation is similar to that of other nvme fabrics modules, the
main difference being that the nvme-tcp-offload ULP calls the
vendor-specific claim_dev() op with the given TCP/IP parameters to
determine which device will be used for this controller.
Once found, the vendor-specific device and controller will be paired and
kept in a controller list managed by the ULP.

Signed-off-by: Arie Gershberg <agershberg@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
---
 drivers/nvme/host/tcp-offload.c | 436 +++++++++++++++++++++++++++++++-
 drivers/nvme/host/tcp-offload.h |   1 +
 2 files changed, 431 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
index f21cb2497ea7..4002354a8afe 100644
--- a/drivers/nvme/host/tcp-offload.c
+++ b/drivers/nvme/host/tcp-offload.c
@@ -12,6 +12,10 @@
 
 static LIST_HEAD(nvme_tcp_ofld_devices);
 static DECLARE_RWSEM(nvme_tcp_ofld_devices_rwsem);
+static LIST_HEAD(nvme_tcp_ofld_ctrl_list);
+static DECLARE_RWSEM(nvme_tcp_ofld_ctrl_rwsem);
+static struct blk_mq_ops nvme_tcp_ofld_admin_mq_ops;
+static struct blk_mq_ops nvme_tcp_ofld_mq_ops;
 
 static inline struct nvme_tcp_ofld_ctrl *
 to_tcp_ofld_ctrl(struct nvme_ctrl *nctrl)
@@ -124,13 +128,366 @@ nvme_tcp_ofld_lookup_dev(struct nvme_tcp_ofld_ctrl *ctrl)
 	return dev;
 }
 
+static struct blk_mq_tag_set *
+nvme_tcp_ofld_alloc_tagset(struct nvme_ctrl *nctrl, bool admin)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	struct blk_mq_tag_set *set;
+	int rc;
+
+	if (admin) {
+		set = &ctrl->admin_tag_set;
+		memset(set, 0, sizeof(*set));
+		set->ops = &nvme_tcp_ofld_admin_mq_ops;
+		set->queue_depth = NVME_AQ_MQ_TAG_DEPTH;
+		set->reserved_tags = 2; /* connect + keep-alive */
+		set->numa_node = nctrl->numa_node;
+		set->flags = BLK_MQ_F_BLOCKING;
+		set->cmd_size = sizeof(struct nvme_tcp_ofld_req);
+		set->driver_data = ctrl;
+		set->nr_hw_queues = 1;
+		set->timeout = ADMIN_TIMEOUT;
+	} else {
+		set = &ctrl->tag_set;
+		memset(set, 0, sizeof(*set));
+		set->ops = &nvme_tcp_ofld_mq_ops;
+		set->queue_depth = nctrl->sqsize + 1;
+		set->reserved_tags = 1; /* fabric connect */
+		set->numa_node = nctrl->numa_node;
+		set->flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_BLOCKING;
+		set->cmd_size = sizeof(struct nvme_tcp_ofld_req);
+		set->driver_data = ctrl;
+		set->nr_hw_queues = nctrl->queue_count - 1;
+		set->timeout = NVME_IO_TIMEOUT;
+		set->nr_maps = nctrl->opts->nr_poll_queues ?
+							HCTX_MAX_TYPES : 2;
+	}
+
+	rc = blk_mq_alloc_tag_set(set);
+	if (rc)
+		return ERR_PTR(rc);
+
+	return set;
+}
+
+static int nvme_tcp_ofld_start_queue(struct nvme_ctrl *nctrl, int qid)
+{
+	/* Placeholder - start_queue */
+	return 0;
+}
+
+static int nvme_tcp_ofld_configure_admin_queue(struct nvme_ctrl *nctrl,
+					       bool new)
+{
+	int rc;
+
+	/* Placeholder - alloc_admin_queue */
+	if (new) {
+		nctrl->admin_tagset =
+				nvme_tcp_ofld_alloc_tagset(nctrl, true);
+		if (IS_ERR(nctrl->admin_tagset)) {
+			rc = PTR_ERR(nctrl->admin_tagset);
+			nctrl->admin_tagset = NULL;
+			goto out_free_queue;
+		}
+
+		nctrl->fabrics_q = blk_mq_init_queue(nctrl->admin_tagset);
+		if (IS_ERR(nctrl->fabrics_q)) {
+			rc = PTR_ERR(nctrl->fabrics_q);
+			nctrl->fabrics_q = NULL;
+			goto out_free_tagset;
+		}
+
+		nctrl->admin_q = blk_mq_init_queue(nctrl->admin_tagset);
+		if (IS_ERR(nctrl->admin_q)) {
+			rc = PTR_ERR(nctrl->admin_q);
+			nctrl->admin_q = NULL;
+			goto out_cleanup_fabrics_q;
+		}
+	}
+
+	rc = nvme_tcp_ofld_start_queue(nctrl, 0);
+	if (rc)
+		goto out_cleanup_queue;
+
+	rc = nvme_enable_ctrl(nctrl);
+	if (rc)
+		goto out_stop_queue;
+
+	blk_mq_unquiesce_queue(nctrl->admin_q);
+	rc = nvme_init_identify(nctrl);
+	if (rc)
+		goto out_stop_queue;
+
+	return 0;
+
+out_stop_queue:
+	/* Placeholder - stop admin queue */
+out_cleanup_queue:
+	if (new)
+		blk_cleanup_queue(nctrl->admin_q);
+out_cleanup_fabrics_q:
+	if (new)
+		blk_cleanup_queue(nctrl->fabrics_q);
+out_free_tagset:
+	if (new)
+		blk_mq_free_tag_set(nctrl->admin_tagset);
+out_free_queue:
+	/* Placeholder - free admin queue */
+
+	return rc;
+}
+
+static int
+nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new)
+{
+	int rc;
+
+	/* Placeholder - alloc_io_queues */
+
+	if (new) {
+		nctrl->tagset = nvme_tcp_ofld_alloc_tagset(nctrl, false);
+		if (IS_ERR(nctrl->tagset)) {
+			rc = PTR_ERR(nctrl->tagset);
+			nctrl->tagset = NULL;
+			goto out_free_io_queues;
+		}
+
+		nctrl->connect_q = blk_mq_init_queue(nctrl->tagset);
+		if (IS_ERR(nctrl->connect_q)) {
+			rc = PTR_ERR(nctrl->connect_q);
+			nctrl->connect_q = NULL;
+			goto out_free_tag_set;
+		}
+	}
+
+	/* Placeholder - start_io_queues */
+
+	if (!new) {
+		nvme_start_queues(nctrl);
+		nvme_wait_freeze(nctrl);
+		blk_mq_update_nr_hw_queues(nctrl->tagset,
+					   nctrl->queue_count - 1);
+		nvme_unfreeze(nctrl);
+	}
+
+	return 0;
+
+out_free_tag_set:
+	if (new)
+		blk_mq_free_tag_set(nctrl->tagset);
+out_free_io_queues:
+	/* Placeholder - free_io_queues */
+
+	return rc;
+}
+
 static int nvme_tcp_ofld_setup_ctrl(struct nvme_ctrl *nctrl, bool new)
 {
-	/* Placeholder - validates inputs and creates admin and IO queues */
+	struct nvmf_ctrl_options *opts = nctrl->opts;
+	int rc;
+
+	rc = nvme_tcp_ofld_configure_admin_queue(nctrl, new);
+	if (rc)
+		goto destroy_admin;
+
+	if (nctrl->icdoff) {
+		dev_err(nctrl->device, "icdoff is not supported!\n");
+		rc = -EINVAL;
+		goto destroy_admin;
+	}
+
+	if (opts->queue_size > nctrl->sqsize + 1)
+		dev_warn(nctrl->device,
+			 "queue_size %zu > ctrl sqsize %u, clamping down\n",
+			 opts->queue_size, nctrl->sqsize + 1);
+
+	if (nctrl->sqsize + 1 > nctrl->maxcmd) {
+		dev_warn(nctrl->device,
+			 "sqsize %u > ctrl maxcmd %u, clamping down\n",
+			 nctrl->sqsize + 1, nctrl->maxcmd);
+		nctrl->sqsize = nctrl->maxcmd - 1;
+	}
+
+	if (nctrl->queue_count > 1) {
+		rc = nvme_tcp_ofld_configure_io_queues(nctrl, new);
+		if (rc)
+			goto destroy_admin;
+	}
+
+	if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_LIVE)) {
+		/*
+		 * state change failure is ok if we started ctrl delete,
+		 * unless we're during creation of a new controller to
+		 * avoid races with teardown flow.
+		 */
+		WARN_ON_ONCE(nctrl->state != NVME_CTRL_DELETING &&
+			     nctrl->state != NVME_CTRL_DELETING_NOIO);
+		WARN_ON_ONCE(new);
+		rc = -EINVAL;
+		goto destroy_io;
+	}
+
+	nvme_start_ctrl(nctrl);
+	return 0;
+
+destroy_io:
+	/* Placeholder - stop and destroy io queues*/
+destroy_admin:
+	/* Placeholder - stop and destroy admin queue*/
+
+	return rc;
+}
+
+static int
+nvme_tcp_ofld_check_dev_opts(struct nvmf_ctrl_options *opts,
+			     struct nvme_tcp_ofld_ops *ofld_ops)
+{
+	unsigned int nvme_tcp_ofld_opt_mask = NVMF_ALLOWED_OPTS |
+			ofld_ops->allowed_opts | ofld_ops->required_opts;
+	if (opts->mask & ~nvme_tcp_ofld_opt_mask) {
+		pr_warn("One or more of the nvmf options isn't supported by %s.\n",
+			ofld_ops->name);
+		return -EINVAL;
+	}
 
 	return 0;
 }
 
+static void nvme_tcp_ofld_free_ctrl(struct nvme_ctrl *nctrl)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	struct nvme_tcp_ofld_dev *dev = ctrl->dev;
+
+	if (list_empty(&ctrl->list))
+		goto free_ctrl;
+
+	down_write(&nvme_tcp_ofld_ctrl_rwsem);
+	list_del(&ctrl->list);
+	up_write(&nvme_tcp_ofld_ctrl_rwsem);
+
+	nvmf_free_options(nctrl->opts);
+free_ctrl:
+	module_put(dev->ops->module);
+	kfree(ctrl->queues);
+	kfree(ctrl);
+}
+
+static void nvme_tcp_ofld_submit_async_event(struct nvme_ctrl *arg)
+{
+	/* Placeholder - submit_async_event */
+}
+
+static void
+nvme_tcp_ofld_teardown_admin_queue(struct nvme_ctrl *ctrl, bool remove)
+{
+	/* Placeholder - teardown_admin_queue */
+}
+
+static void
+nvme_tcp_ofld_teardown_io_queues(struct nvme_ctrl *nctrl, bool remove)
+{
+	/* Placeholder - teardown_io_queues */
+}
+
+static void
+nvme_tcp_ofld_teardown_ctrl(struct nvme_ctrl *nctrl, bool shutdown)
+{
+	/* Placeholder - err_work and connect_work */
+	nvme_tcp_ofld_teardown_io_queues(nctrl, shutdown);
+	blk_mq_quiesce_queue(nctrl->admin_q);
+	if (shutdown)
+		nvme_shutdown_ctrl(nctrl);
+	else
+		nvme_disable_ctrl(nctrl);
+	nvme_tcp_ofld_teardown_admin_queue(nctrl, shutdown);
+}
+
+static void nvme_tcp_ofld_delete_ctrl(struct nvme_ctrl *nctrl)
+{
+	nvme_tcp_ofld_teardown_ctrl(nctrl, true);
+}
+
+static int
+nvme_tcp_ofld_init_request(struct blk_mq_tag_set *set,
+			   struct request *rq,
+			   unsigned int hctx_idx,
+			   unsigned int numa_node)
+{
+	struct nvme_tcp_ofld_req *req = blk_mq_rq_to_pdu(rq);
+	struct nvme_tcp_ofld_ctrl *ctrl = set->driver_data;
+
+	/* Placeholder - init request */
+
+	req->done = nvme_tcp_ofld_req_done;
+	ctrl->dev->ops->init_req(req);
+
+	return 0;
+}
+
+static blk_status_t
+nvme_tcp_ofld_queue_rq(struct blk_mq_hw_ctx *hctx,
+		       const struct blk_mq_queue_data *bd)
+{
+	struct nvme_tcp_ofld_req *req = blk_mq_rq_to_pdu(bd->rq);
+	struct nvme_tcp_ofld_queue *queue = hctx->driver_data;
+	struct nvme_tcp_ofld_ops *ops = queue->dev->ops;
+
+	/* Call nvme_setup_cmd(...) */
+
+	/* Call ops->map_sg(...) */
+
+	return BLK_STS_OK;
+}
+
+static struct blk_mq_ops nvme_tcp_ofld_mq_ops = {
+	.queue_rq	= nvme_tcp_ofld_queue_rq,
+	.init_request	= nvme_tcp_ofld_init_request,
+	/*
+	 * All additional ops will be also implemented and registered similar to
+	 * tcp.c
+	 */
+};
+
+static struct blk_mq_ops nvme_tcp_ofld_admin_mq_ops = {
+	.queue_rq	= nvme_tcp_ofld_queue_rq,
+	.init_request	= nvme_tcp_ofld_init_request,
+	/*
+	 * All additional ops will be also implemented and registered similar to
+	 * tcp.c
+	 */
+};
+
+static const struct nvme_ctrl_ops nvme_tcp_ofld_ctrl_ops = {
+	.name			= "tcp_offload",
+	.module			= THIS_MODULE,
+	.flags			= NVME_F_FABRICS,
+	.reg_read32		= nvmf_reg_read32,
+	.reg_read64		= nvmf_reg_read64,
+	.reg_write32		= nvmf_reg_write32,
+	.free_ctrl		= nvme_tcp_ofld_free_ctrl,
+	.submit_async_event	= nvme_tcp_ofld_submit_async_event,
+	.delete_ctrl		= nvme_tcp_ofld_delete_ctrl,
+	.get_address		= nvmf_get_address,
+};
+
+static bool
+nvme_tcp_ofld_existing_controller(struct nvmf_ctrl_options *opts)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl;
+	bool found = false;
+
+	down_read(&nvme_tcp_ofld_ctrl_rwsem);
+	list_for_each_entry(ctrl, &nvme_tcp_ofld_ctrl_list, list) {
+		found = nvmf_ip_options_match(&ctrl->nctrl, opts);
+		if (found)
+			break;
+	}
+	up_read(&nvme_tcp_ofld_ctrl_rwsem);
+
+	return found;
+}
+
 static struct nvme_ctrl *
 nvme_tcp_ofld_create_ctrl(struct device *ndev, struct nvmf_ctrl_options *opts)
 {
@@ -144,10 +501,48 @@ nvme_tcp_ofld_create_ctrl(struct device *ndev, struct nvmf_ctrl_options *opts)
 	if (!ctrl)
 		return ERR_PTR(-ENOMEM);
 
-	/*
-	 * Init nvme_tcp_ofld_ctrl and nvme_ctrl params based on received
-	 * opts
-	 */
+	INIT_LIST_HEAD(&ctrl->list);
+	nctrl = &ctrl->nctrl;
+	nctrl->opts = opts;
+	nctrl->queue_count = opts->nr_io_queues + opts->nr_write_queues +
+			     opts->nr_poll_queues + 1;
+	nctrl->sqsize = opts->queue_size - 1;
+	nctrl->kato = opts->kato;
+	if (!(opts->mask & NVMF_OPT_TRSVCID)) {
+		opts->trsvcid =
+			kstrdup(__stringify(NVME_TCP_DISC_PORT), GFP_KERNEL);
+		if (!opts->trsvcid) {
+			rc = -ENOMEM;
+			goto out_free_ctrl;
+		}
+		opts->mask |= NVMF_OPT_TRSVCID;
+	}
+
+	rc = inet_pton_with_scope(&init_net, AF_UNSPEC, opts->traddr,
+				  opts->trsvcid,
+				  &ctrl->conn_params.remote_ip_addr);
+	if (rc) {
+		pr_err("malformed address passed: %s:%s\n",
+		       opts->traddr, opts->trsvcid);
+		goto out_free_ctrl;
+	}
+
+	if (opts->mask & NVMF_OPT_HOST_TRADDR) {
+		rc = inet_pton_with_scope(&init_net, AF_UNSPEC,
+					  opts->host_traddr, NULL,
+					  &ctrl->conn_params.local_ip_addr);
+		if (rc) {
+			pr_err("malformed src address passed: %s\n",
+			       opts->host_traddr);
+			goto out_free_ctrl;
+		}
+	}
+
+	if (!opts->duplicate_connect &&
+	    nvme_tcp_ofld_existing_controller(opts)) {
+		rc = -EALREADY;
+		goto out_free_ctrl;
+	}
 
 	/* Find device that can reach the dest addr */
 	dev = nvme_tcp_ofld_lookup_dev(ctrl);
@@ -158,6 +553,10 @@ nvme_tcp_ofld_create_ctrl(struct device *ndev, struct nvmf_ctrl_options *opts)
 		goto out_free_ctrl;
 	}
 
+	rc = nvme_tcp_ofld_check_dev_opts(opts, dev->ops);
+	if (rc)
+		goto out_module_put;
+
 	ctrl->dev = dev;
 	ctrl->queues = kcalloc(nctrl->queue_count,
 			       sizeof(struct nvme_tcp_ofld_queue),
@@ -174,12 +573,27 @@ nvme_tcp_ofld_create_ctrl(struct device *ndev, struct nvmf_ctrl_options *opts)
 		queue->report_err = nvme_tcp_ofld_report_queue_err;
 	}
 
-	/* Call nvme_init_ctrl */
+	rc = nvme_init_ctrl(nctrl, ndev, &nvme_tcp_ofld_ctrl_ops, 0);
+	if (rc)
+		goto out_free_queues;
+
+	if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_CONNECTING)) {
+		WARN_ON_ONCE(1);
+		rc = -EINTR;
+		goto out_uninit_ctrl;
+	}
 
 	rc = nvme_tcp_ofld_setup_ctrl(nctrl, true);
 	if (rc)
 		goto out_uninit_ctrl;
 
+	dev_info(nctrl->device, "new ctrl: NQN \"%s\", addr %pISp\n",
+		 opts->subsysnqn, &ctrl->conn_params.remote_ip_addr);
+
+	down_write(&nvme_tcp_ofld_ctrl_rwsem);
+	list_add_tail(&ctrl->list, &nvme_tcp_ofld_ctrl_list);
+	up_write(&nvme_tcp_ofld_ctrl_rwsem);
+
 	return nctrl;
 
 out_uninit_ctrl:
@@ -189,6 +603,8 @@ nvme_tcp_ofld_create_ctrl(struct device *ndev, struct nvmf_ctrl_options *opts)
 		rc = -EIO;
 
 	return ERR_PTR(rc);
+out_free_queues:
+	kfree(ctrl->queues);
 out_module_put:
 	module_put(dev->ops->module);
 out_free_ctrl:
@@ -218,7 +634,15 @@ static int __init nvme_tcp_ofld_init_module(void)
 
 static void __exit nvme_tcp_ofld_cleanup_module(void)
 {
+	struct nvme_tcp_ofld_ctrl *ctrl;
+
 	nvmf_unregister_transport(&nvme_tcp_ofld_transport);
+
+	down_write(&nvme_tcp_ofld_ctrl_rwsem);
+	list_for_each_entry(ctrl, &nvme_tcp_ofld_ctrl_list, list)
+		nvme_delete_ctrl(&ctrl->nctrl);
+	up_write(&nvme_tcp_ofld_ctrl_rwsem);
+	flush_workqueue(nvme_delete_wq);
 }
 
 module_init(nvme_tcp_ofld_init_module);
diff --git a/drivers/nvme/host/tcp-offload.h b/drivers/nvme/host/tcp-offload.h
index ed340b33d366..d465feeee957 100644
--- a/drivers/nvme/host/tcp-offload.h
+++ b/drivers/nvme/host/tcp-offload.h
@@ -76,6 +76,7 @@ struct nvme_tcp_ofld_ctrl_con_params {
 /* Allocated by nvme_tcp_ofld */
 struct nvme_tcp_ofld_ctrl {
 	struct nvme_ctrl nctrl;
+	struct list_head list;
 	struct nvme_tcp_ofld_dev *dev;
 
 	/* admin and IO queues */
-- 
2.22.0


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/7] nvme-tcp-offload: Add controller level error recovery implementation
  2020-11-19 14:21 [PATCH 0/7] RFC patch series - NVMeTCP Offload ULP Shai Malin
                   ` (3 preceding siblings ...)
  2020-11-19 14:21 ` [PATCH 4/7] nvme-tcp-offload: Add controller level implementation Shai Malin
@ 2020-11-19 14:21 ` Shai Malin
  2020-11-19 14:21 ` [PATCH 6/7] nvme-tcp-offload: Add queue level implementation Shai Malin
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Shai Malin @ 2020-11-19 14:21 UTC (permalink / raw)
  To: linux-nvme, sagi, Erik.Smith, Douglas.Farley, liuw
  Cc: smalin, aelior, agershberg, mkalderon, nassa, dbalandin, malin1024

From: Arie Gershberg <agershberg@marvell.com>

In this patch, we implement controller-level error handling and recovery.
Upon an error detected by the ULP, or a controller reset initiated by
nvme-core (via the reset_ctrl workqueue), the ULP initiates controller
recovery, which includes teardown and re-connect of all queues.
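
The gating described above (recovery proceeds only if the controller wins the state transition to RESETTING, which then queues err_work) can be modeled outside the kernel. The following is a hypothetical userspace sketch, not the kernel API: `ctrl`, `change_state()` and `error_recovery()` are illustrative stand-ins for `nvme_ctrl`, `nvme_change_ctrl_state()` and `nvme_tcp_ofld_error_recovery()`.

```c
#include <assert.h>
#include <stdbool.h>

enum ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_CONNECTING, CTRL_DELETING };

struct ctrl {
	enum ctrl_state state;
	int err_work_queued;	/* counts scheduled recovery works */
};

/* Stand-in for nvme_change_ctrl_state(): RESETTING is only reachable
 * from LIVE, so concurrent error reports cannot double-trigger recovery. */
static bool change_state(struct ctrl *c, enum ctrl_state new_state)
{
	if (new_state == CTRL_RESETTING && c->state != CTRL_LIVE)
		return false;
	c->state = new_state;
	return true;
}

/* Stand-in for nvme_tcp_ofld_error_recovery(): bail out unless the
 * state transition wins; otherwise schedule the recovery work. */
static void error_recovery(struct ctrl *c)
{
	if (!change_state(c, CTRL_RESETTING))
		return;
	c->err_work_queued++;
}
```

A second error reported while recovery is already pending is silently ignored, which mirrors why the kernel helper returns early when the state change fails.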

Signed-off-by: Arie Gershberg <agershberg@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
---
 drivers/nvme/host/tcp-offload.c | 113 +++++++++++++++++++++++++++++++-
 drivers/nvme/host/tcp-offload.h |   4 ++
 2 files changed, 116 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
index 4002354a8afe..74f708674345 100644
--- a/drivers/nvme/host/tcp-offload.c
+++ b/drivers/nvme/host/tcp-offload.c
@@ -72,6 +72,15 @@ void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev)
 }
 EXPORT_SYMBOL_GPL(nvme_tcp_ofld_unregister_dev);
 
+void nvme_tcp_ofld_error_recovery(struct nvme_ctrl *nctrl)
+{
+	if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_RESETTING))
+		return;
+
+	queue_work(nvme_reset_wq, &to_tcp_ofld_ctrl(nctrl)->err_work);
+}
+EXPORT_SYMBOL_GPL(nvme_tcp_ofld_error_recovery);
+
 /**
  * nvme_tcp_ofld_report_queue_err() - NVMeTCP Offload report error event
  * callback function. Pointed to by nvme_tcp_ofld_queue->report_err.
@@ -282,6 +291,27 @@ nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new)
 	return rc;
 }
 
+static void nvme_tcp_ofld_reconnect_or_remove(struct nvme_ctrl *nctrl)
+{
+	/* If we are resetting/deleting then do nothing */
+	if (nctrl->state != NVME_CTRL_CONNECTING) {
+		WARN_ON_ONCE(nctrl->state == NVME_CTRL_NEW ||
+			     nctrl->state == NVME_CTRL_LIVE);
+		return;
+	}
+
+	if (nvmf_should_reconnect(nctrl)) {
+		dev_info(nctrl->device, "Reconnecting in %d seconds...\n",
+			 nctrl->opts->reconnect_delay);
+		queue_delayed_work(nvme_wq,
+				   &to_tcp_ofld_ctrl(nctrl)->connect_work,
+				   nctrl->opts->reconnect_delay * HZ);
+	} else {
+		dev_info(nctrl->device, "Removing controller...\n");
+		nvme_delete_ctrl(nctrl);
+	}
+}
+
 static int nvme_tcp_ofld_setup_ctrl(struct nvme_ctrl *nctrl, bool new)
 {
 	struct nvmf_ctrl_options *opts = nctrl->opts;
@@ -390,10 +420,62 @@ nvme_tcp_ofld_teardown_io_queues(struct nvme_ctrl *nctrl, bool remove)
 	/* Placeholder - teardown_io_queues */
 }
 
+static void nvme_tcp_ofld_reconnect_ctrl_work(struct work_struct *work)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl =
+				container_of(to_delayed_work(work),
+					     struct nvme_tcp_ofld_ctrl,
+					     connect_work);
+	struct nvme_ctrl *nctrl = &ctrl->nctrl;
+
+	++nctrl->nr_reconnects;
+
+	if (nvme_tcp_ofld_setup_ctrl(nctrl, false))
+		goto requeue;
+
+	dev_info(nctrl->device, "Successfully reconnected (%d attempt)\n",
+		 nctrl->nr_reconnects);
+
+	nctrl->nr_reconnects = 0;
+
+	return;
+
+requeue:
+	dev_info(nctrl->device, "Failed reconnect attempt %d\n",
+		 nctrl->nr_reconnects);
+	nvme_tcp_ofld_reconnect_or_remove(nctrl);
+}
+
+static void nvme_tcp_ofld_error_recovery_work(struct work_struct *work)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl =
+		container_of(work, struct nvme_tcp_ofld_ctrl, err_work);
+	struct nvme_ctrl *nctrl = &ctrl->nctrl;
+
+	nvme_stop_keep_alive(nctrl);
+	nvme_tcp_ofld_teardown_io_queues(nctrl, false);
+	/* unquiesce to fail fast pending requests */
+	nvme_start_queues(nctrl);
+	nvme_tcp_ofld_teardown_admin_queue(nctrl, false);
+	blk_mq_unquiesce_queue(nctrl->admin_q);
+
+	if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_CONNECTING)) {
+		/* state change failure is ok if we started nctrl delete */
+		WARN_ON_ONCE(nctrl->state != NVME_CTRL_DELETING &&
+			     nctrl->state != NVME_CTRL_DELETING_NOIO);
+		return;
+	}
+
+	nvme_tcp_ofld_reconnect_or_remove(nctrl);
+}
+
 static void
 nvme_tcp_ofld_teardown_ctrl(struct nvme_ctrl *nctrl, bool shutdown)
 {
-	/* Placeholder - err_work and connect_work */
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+
+	cancel_work_sync(&ctrl->err_work);
+	cancel_delayed_work_sync(&ctrl->connect_work);
 	nvme_tcp_ofld_teardown_io_queues(nctrl, shutdown);
 	blk_mq_quiesce_queue(nctrl->admin_q);
 	if (shutdown)
@@ -408,6 +490,31 @@ static void nvme_tcp_ofld_delete_ctrl(struct nvme_ctrl *nctrl)
 	nvme_tcp_ofld_teardown_ctrl(nctrl, true);
 }
 
+static void nvme_tcp_ofld_reset_ctrl_work(struct work_struct *work)
+{
+	struct nvme_ctrl *nctrl =
+		container_of(work, struct nvme_ctrl, reset_work);
+
+	nvme_stop_ctrl(nctrl);
+	nvme_tcp_ofld_teardown_ctrl(nctrl, false);
+
+	if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_CONNECTING)) {
+		/* state change failure is ok if we started ctrl delete */
+		WARN_ON_ONCE(nctrl->state != NVME_CTRL_DELETING &&
+			     nctrl->state != NVME_CTRL_DELETING_NOIO);
+		return;
+	}
+
+	if (nvme_tcp_ofld_setup_ctrl(nctrl, false))
+		goto out_fail;
+
+	return;
+
+out_fail:
+	++nctrl->nr_reconnects;
+	nvme_tcp_ofld_reconnect_or_remove(nctrl);
+}
+
 static int
 nvme_tcp_ofld_init_request(struct blk_mq_tag_set *set,
 			   struct request *rq,
@@ -508,6 +615,10 @@ nvme_tcp_ofld_create_ctrl(struct device *ndev, struct nvmf_ctrl_options *opts)
 			     opts->nr_poll_queues + 1;
 	nctrl->sqsize = opts->queue_size - 1;
 	nctrl->kato = opts->kato;
+	INIT_DELAYED_WORK(&ctrl->connect_work,
+			  nvme_tcp_ofld_reconnect_ctrl_work);
+	INIT_WORK(&ctrl->err_work, nvme_tcp_ofld_error_recovery_work);
+	INIT_WORK(&nctrl->reset_work, nvme_tcp_ofld_reset_ctrl_work);
 	if (!(opts->mask & NVMF_OPT_TRSVCID)) {
 		opts->trsvcid =
 			kstrdup(__stringify(NVME_TCP_DISC_PORT), GFP_KERNEL);
diff --git a/drivers/nvme/host/tcp-offload.h b/drivers/nvme/host/tcp-offload.h
index d465feeee957..d4eb62873de1 100644
--- a/drivers/nvme/host/tcp-offload.h
+++ b/drivers/nvme/host/tcp-offload.h
@@ -84,6 +84,9 @@ struct nvme_tcp_ofld_ctrl {
 	struct blk_mq_tag_set admin_tag_set;
 	struct nvme_tcp_ofld_queue *queues;
 
+	struct work_struct err_work;
+	struct delayed_work connect_work;
+
 	/*
 	 * Each entry in the array indicates the number of queues of
 	 * corresponding type.
@@ -172,3 +175,4 @@ struct nvme_tcp_ofld_ops {
 /* Exported functions for lower vendor specific offload drivers */
 int nvme_tcp_ofld_register_dev(struct nvme_tcp_ofld_dev *dev);
 void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev);
+void nvme_tcp_ofld_error_recovery(struct nvme_ctrl *nctrl);
-- 
2.22.0



* [PATCH 6/7] nvme-tcp-offload: Add queue level implementation
  2020-11-19 14:21 [PATCH 0/7] RFC patch series - NVMeTCP Offload ULP Shai Malin
                   ` (4 preceding siblings ...)
  2020-11-19 14:21 ` [PATCH 5/7] nvme-tcp-offload: Add controller level error recovery implementation Shai Malin
@ 2020-11-19 14:21 ` Shai Malin
  2020-11-19 14:21 ` [PATCH 7/7] nvme-tcp-offload: Add IO " Shai Malin
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Shai Malin @ 2020-11-19 14:21 UTC (permalink / raw)
  To: linux-nvme, sagi, Erik.Smith, Douglas.Farley, liuw
  Cc: smalin, aelior, agershberg, mkalderon, nassa, dbalandin, malin1024

From: Dean Balandin <dbalandin@marvell.com>

In this patch we implement queue-level functionality.
The implementation is similar to the nvme-tcp module, the main
difference being that we call the vendor-specific create_queue op,
which creates the TCP connection and the NVMeTCP connection,
including the icreq+icresp negotiation.
Once create_queue returns successfully, we can move on to the fabrics
connect.
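
The queue lifecycle this patch introduces (vendor create_queue marks the queue allocated, the fabrics connect marks it live, and stop clears the live bit again) can be sketched as a small userspace model. All names below are illustrative stand-ins for the `NVME_TCP_OFLD_Q_ALLOCATED`/`NVME_TCP_OFLD_Q_LIVE` flag handling, not kernel code:

```c
#include <assert.h>

enum { Q_ALLOCATED = 1 << 0, Q_LIVE = 1 << 1 };

struct queue {
	unsigned long flags;
};

/* Models a successful vendor create_queue op: queue becomes allocated. */
static int create_queue(struct queue *q)
{
	q->flags |= Q_ALLOCATED;
	return 0;
}

/* Models nvmf connect on an allocated queue: queue becomes live. */
static int start_queue(struct queue *q)
{
	if (!(q->flags & Q_ALLOCATED))
		return -1;	/* cannot connect a queue never created */
	q->flags |= Q_LIVE;
	return 0;
}

/* Models drain + destroy: the queue is no longer live but its
 * resources are still tracked until a separate free step. */
static void stop_queue(struct queue *q)
{
	q->flags &= ~Q_LIVE;
}
```

Keeping two separate bits lets teardown paths decide independently whether a queue needs to be disconnected, freed, or both, as the error labels in this patch do.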

Signed-off-by: Dean Balandin <dbalandin@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
---
 drivers/nvme/host/tcp-offload.c | 308 +++++++++++++++++++++++++++++---
 drivers/nvme/host/tcp-offload.h |   6 +
 2 files changed, 294 insertions(+), 20 deletions(-)

diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
index 74f708674345..baf38526ccb9 100644
--- a/drivers/nvme/host/tcp-offload.c
+++ b/drivers/nvme/host/tcp-offload.c
@@ -23,6 +23,11 @@ to_tcp_ofld_ctrl(struct nvme_ctrl *nctrl)
 	return container_of(nctrl, struct nvme_tcp_ofld_ctrl, nctrl);
 }
 
+static inline int nvme_tcp_ofld_qid(struct nvme_tcp_ofld_queue *queue)
+{
+	return queue - queue->ctrl->queues;
+}
+
 /**
  * nvme_tcp_ofld_register_dev() - NVMeTCP Offload Library registration
  * function.
@@ -179,18 +184,97 @@ nvme_tcp_ofld_alloc_tagset(struct nvme_ctrl *nctrl, bool admin)
 	return set;
 }
 
+static bool nvme_tcp_ofld_poll_q(struct nvme_tcp_ofld_queue *queue)
+{
+	/* Placeholder - implement logic to determine if poll queue */
+
+	return false;
+}
+
+static void __nvme_tcp_ofld_stop_queue(struct nvme_tcp_ofld_queue *queue)
+{
+	queue->dev->ops->drain_queue(queue);
+	queue->dev->ops->destroy_queue(queue);
+
+	/* Placeholder - additional cleanup such as cancel_work_sync io_work */
+	clear_bit(NVME_TCP_OFLD_Q_LIVE, &queue->flags);
+}
+
+static void nvme_tcp_ofld_stop_queue(struct nvme_ctrl *nctrl, int qid)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	struct nvme_tcp_ofld_queue *queue = &ctrl->queues[qid];
+
+	if (!test_and_clear_bit(NVME_TCP_OFLD_Q_LIVE, &queue->flags))
+		return;
+
+	__nvme_tcp_ofld_stop_queue(queue);
+}
+
+static void nvme_tcp_ofld_free_queue(struct nvme_ctrl *nctrl, int qid)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	struct nvme_tcp_ofld_queue *queue = &ctrl->queues[qid];
+
+	if (!test_and_clear_bit(NVME_TCP_OFLD_Q_ALLOCATED, &queue->flags))
+		return;
+
+	/* Placeholder - additional queue cleanup */
+}
+
+static void
+nvme_tcp_ofld_terminate_admin_queue(struct nvme_ctrl *nctrl, bool remove)
+{
+	nvme_tcp_ofld_stop_queue(nctrl, 0);
+	if (remove) {
+		if (nctrl->admin_q && !blk_queue_dead(nctrl->admin_q))
+			blk_cleanup_queue(nctrl->admin_q);
+
+		if (nctrl->fabrics_q)
+			blk_cleanup_queue(nctrl->fabrics_q);
+
+		if (nctrl->admin_tagset)
+			blk_mq_free_tag_set(nctrl->admin_tagset);
+	}
+}
+
 static int nvme_tcp_ofld_start_queue(struct nvme_ctrl *nctrl, int qid)
 {
-	/* Placeholder - start_queue */
-	return 0;
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	struct nvme_tcp_ofld_queue *queue = &ctrl->queues[qid];
+	int rc;
+
+	if (qid)
+		rc = nvmf_connect_io_queue(nctrl, qid,
+					   nvme_tcp_ofld_poll_q(queue));
+	else
+		rc = nvmf_connect_admin_queue(nctrl);
+
+	if (!rc) {
+		set_bit(NVME_TCP_OFLD_Q_LIVE, &queue->flags);
+	} else {
+		if (test_bit(NVME_TCP_OFLD_Q_ALLOCATED, &queue->flags))
+			__nvme_tcp_ofld_stop_queue(queue);
+		dev_err(nctrl->device,
+			"failed to connect queue: %d ret=%d\n", qid, rc);
+	}
+
+	return rc;
 }
 
 static int nvme_tcp_ofld_configure_admin_queue(struct nvme_ctrl *nctrl,
 					       bool new)
 {
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	struct nvme_tcp_ofld_queue *queue = &ctrl->queues[0];
 	int rc;
 
-	/* Placeholder - alloc_admin_queue */
+	rc = ctrl->dev->ops->create_queue(queue, 0, NVME_AQ_DEPTH);
+	if (rc)
+		return rc;
+
+	set_bit(NVME_TCP_OFLD_Q_ALLOCATED, &queue->flags);
 	if (new) {
 		nctrl->admin_tagset =
 				nvme_tcp_ofld_alloc_tagset(nctrl, true);
@@ -231,7 +315,7 @@ static int nvme_tcp_ofld_configure_admin_queue(struct nvme_ctrl *nctrl,
 	return 0;
 
 out_stop_queue:
-	/* Placeholder - stop admin queue */
+	nvme_tcp_ofld_stop_queue(nctrl, 0);
 out_cleanup_queue:
 	if (new)
 		blk_cleanup_queue(nctrl->admin_q);
@@ -242,7 +326,116 @@ static int nvme_tcp_ofld_configure_admin_queue(struct nvme_ctrl *nctrl,
 	if (new)
 		blk_mq_free_tag_set(nctrl->admin_tagset);
 out_free_queue:
-	/* Placeholder - free admin queue */
+	nvme_tcp_ofld_free_queue(nctrl, 0);
+
+	return rc;
+}
+
+static unsigned int nvme_tcp_ofld_nr_io_queues(struct nvme_ctrl *nctrl)
+{
+	unsigned int nr_io_queues;
+
+	nr_io_queues = min(nctrl->opts->nr_io_queues, num_online_cpus());
+	nr_io_queues += min(nctrl->opts->nr_write_queues, num_online_cpus());
+	nr_io_queues += min(nctrl->opts->nr_poll_queues, num_online_cpus());
+
+	return nr_io_queues;
+}
+
+static void
+nvme_tcp_ofld_set_io_queues(struct nvme_ctrl *nctrl, unsigned int nr_io_queues)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	struct nvmf_ctrl_options *opts = nctrl->opts;
+
+	if (opts->nr_write_queues && opts->nr_io_queues < nr_io_queues) {
+		/*
+		 * separate read/write queues
+		 * hand out dedicated default queues only after we have
+		 * sufficient read queues.
+		 */
+		ctrl->queue_type_mapping[HCTX_TYPE_READ] = opts->nr_io_queues;
+		nr_io_queues -= ctrl->queue_type_mapping[HCTX_TYPE_READ];
+		ctrl->queue_type_mapping[HCTX_TYPE_DEFAULT] =
+			min(opts->nr_write_queues, nr_io_queues);
+		nr_io_queues -= ctrl->queue_type_mapping[HCTX_TYPE_DEFAULT];
+	} else {
+		/*
+		 * shared read/write queues
+		 * either no write queues were requested, or we don't have
+		 * sufficient queue count to have dedicated default queues.
+		 */
+		ctrl->queue_type_mapping[HCTX_TYPE_DEFAULT] =
+			min(opts->nr_io_queues, nr_io_queues);
+		nr_io_queues -= ctrl->queue_type_mapping[HCTX_TYPE_DEFAULT];
+	}
+}
+
+static void
+nvme_tcp_ofld_terminate_io_queues(struct nvme_ctrl *nctrl, int start_from)
+{
+	int i;
+
+	/* adminq will be ignored because of the loop condition */
+	for (i = start_from; i >= 1; i--) {
+		nvme_tcp_ofld_stop_queue(nctrl, i);
+	}
+}
+
+static int __nvme_tcp_ofld_alloc_io_queues(struct nvme_ctrl *nctrl)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	int i, rc;
+
+	for (i = 1; i < nctrl->queue_count; i++) {
+		rc = ctrl->dev->ops->create_queue(
+			&ctrl->queues[i], i, nctrl->sqsize + 1);
+		if (rc)
+			goto out_free_queues;
+	}
+
+	return 0;
+
+out_free_queues:
+	nvme_tcp_ofld_terminate_io_queues(nctrl, --i);
+
+	return rc;
+}
+
+static int nvme_tcp_ofld_alloc_io_queues(struct nvme_ctrl *nctrl)
+{
+	unsigned int nr_io_queues;
+	int rc;
+
+	nr_io_queues = nvme_tcp_ofld_nr_io_queues(nctrl);
+	rc = nvme_set_queue_count(nctrl, &nr_io_queues);
+	if (rc)
+		return rc;
+
+	nctrl->queue_count = nr_io_queues + 1;
+	if (nctrl->queue_count < 2)
+		return 0;
+
+	dev_info(nctrl->device, "creating %d I/O queues.\n", nr_io_queues);
+	nvme_tcp_ofld_set_io_queues(nctrl, nr_io_queues);
+
+	return __nvme_tcp_ofld_alloc_io_queues(nctrl);
+}
+
+static int nvme_tcp_ofld_start_io_queues(struct nvme_ctrl *nctrl)
+{
+	int i, rc = 0;
+
+	for (i = 1; i < nctrl->queue_count; i++) {
+		rc = nvme_tcp_ofld_start_queue(nctrl, i);
+		if (rc)
+			goto terminate_queues;
+	}
+
+	return 0;
+
+terminate_queues:
+	nvme_tcp_ofld_terminate_io_queues(nctrl, --i);
 
 	return rc;
 }
@@ -250,9 +443,10 @@ static int nvme_tcp_ofld_configure_admin_queue(struct nvme_ctrl *nctrl,
 static int
 nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new)
 {
-	int rc;
+	int rc = nvme_tcp_ofld_alloc_io_queues(nctrl);
 
-	/* Placeholder - alloc_io_queues */
+	if (rc)
+		return rc;
 
 	if (new) {
 		nctrl->tagset = nvme_tcp_ofld_alloc_tagset(nctrl, false);
@@ -270,7 +464,9 @@ nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new)
 		}
 	}
 
-	/* Placeholder - start_io_queues */
+	rc = nvme_tcp_ofld_start_io_queues(nctrl);
+	if (rc)
+		goto out_cleanup_connect_q;
 
 	if (!new) {
 		nvme_start_queues(nctrl);
@@ -282,11 +478,14 @@ nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new)
 
 	return 0;
 
+out_cleanup_connect_q:
+	if (new && nctrl->connect_q)
+		blk_cleanup_queue(nctrl->connect_q);
 out_free_tag_set:
 	if (new)
 		blk_mq_free_tag_set(nctrl->tagset);
 out_free_io_queues:
-	/* Placeholder - free_io_queues */
+	nvme_tcp_ofld_terminate_io_queues(nctrl, nctrl->queue_count);
 
 	return rc;
 }
@@ -362,9 +561,9 @@ static int nvme_tcp_ofld_setup_ctrl(struct nvme_ctrl *nctrl, bool new)
 	return 0;
 
 destroy_io:
-	/* Placeholder - stop and destroy io queues*/
+	nvme_tcp_ofld_terminate_io_queues(nctrl, nctrl->queue_count);
 destroy_admin:
-	/* Placeholder - stop and destroy admin queue*/
+	nvme_tcp_ofld_terminate_admin_queue(nctrl, new);
 
 	return rc;
 }
@@ -525,7 +724,6 @@ nvme_tcp_ofld_init_request(struct blk_mq_tag_set *set,
 	struct nvme_tcp_ofld_ctrl *ctrl = set->driver_data;
 
 	/* Placeholder - init request */
-
 	req->done = nvme_tcp_ofld_req_done;
 	ctrl->dev->ops->init_req(req);
 
@@ -547,22 +745,92 @@ nvme_tcp_ofld_queue_rq(struct blk_mq_hw_ctx *hctx,
 	return BLK_STS_OK;
 }
 
+static void
+nvme_tcp_ofld_exit_request(struct blk_mq_tag_set *set,
+			   struct request *rq, unsigned int hctx_idx)
+{
+	/* Placeholder */
+}
+
+static int
+nvme_tcp_ofld_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
+			unsigned int hctx_idx)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = data;
+	struct nvme_tcp_ofld_queue *queue = &ctrl->queues[hctx_idx + 1];
+
+	hctx->driver_data = queue;
+	return 0;
+}
+
+static int nvme_tcp_ofld_map_queues(struct blk_mq_tag_set *set)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = set->driver_data;
+	struct nvmf_ctrl_options *opts = ctrl->nctrl.opts;
+
+	if (opts->nr_write_queues && ctrl->queue_type_mapping[HCTX_TYPE_READ]) {
+		/* separate read/write queues */
+		set->map[HCTX_TYPE_DEFAULT].nr_queues =
+			ctrl->queue_type_mapping[HCTX_TYPE_DEFAULT];
+		set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
+		set->map[HCTX_TYPE_READ].nr_queues =
+			ctrl->queue_type_mapping[HCTX_TYPE_READ];
+		set->map[HCTX_TYPE_READ].queue_offset =
+			ctrl->queue_type_mapping[HCTX_TYPE_DEFAULT];
+	} else {
+		/* shared read/write queues */
+		set->map[HCTX_TYPE_DEFAULT].nr_queues =
+			ctrl->queue_type_mapping[HCTX_TYPE_DEFAULT];
+		set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
+		set->map[HCTX_TYPE_READ].nr_queues =
+			ctrl->queue_type_mapping[HCTX_TYPE_DEFAULT];
+		set->map[HCTX_TYPE_READ].queue_offset = 0;
+	}
+	blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
+	blk_mq_map_queues(&set->map[HCTX_TYPE_READ]);
+
+	if (opts->nr_poll_queues && ctrl->queue_type_mapping[HCTX_TYPE_POLL]) {
+		/* map dedicated poll queues only if we have queues left */
+		set->map[HCTX_TYPE_POLL].nr_queues =
+				ctrl->queue_type_mapping[HCTX_TYPE_POLL];
+		set->map[HCTX_TYPE_POLL].queue_offset =
+			ctrl->queue_type_mapping[HCTX_TYPE_DEFAULT] +
+			ctrl->queue_type_mapping[HCTX_TYPE_READ];
+		blk_mq_map_queues(&set->map[HCTX_TYPE_POLL]);
+	}
+
+	dev_info(ctrl->nctrl.device,
+		"mapped %d/%d/%d default/read/poll queues.\n",
+		ctrl->queue_type_mapping[HCTX_TYPE_DEFAULT],
+		ctrl->queue_type_mapping[HCTX_TYPE_READ],
+		ctrl->queue_type_mapping[HCTX_TYPE_POLL]);
+
+	return 0;
+}
+
+static int nvme_tcp_ofld_poll(struct blk_mq_hw_ctx *hctx)
+{
+	/* Placeholder - Implement polling mechanism */
+
+	return 0;
+}
+
 static struct blk_mq_ops nvme_tcp_ofld_mq_ops = {
 	.queue_rq	= nvme_tcp_ofld_queue_rq,
 	.init_request	= nvme_tcp_ofld_init_request,
-	/*
-	 * All additional ops will be also implemented and registered similar to
-	 * tcp.c
-	 */
+	.complete	= nvme_complete_rq,
+	.exit_request	= nvme_tcp_ofld_exit_request,
+	.init_hctx	= nvme_tcp_ofld_init_hctx,
+	.map_queues	= nvme_tcp_ofld_map_queues,
+	.poll		= nvme_tcp_ofld_poll,
 };
 
 static struct blk_mq_ops nvme_tcp_ofld_admin_mq_ops = {
 	.queue_rq	= nvme_tcp_ofld_queue_rq,
 	.init_request	= nvme_tcp_ofld_init_request,
-	/*
-	 * All additional ops will be also implemented and registered similar to
-	 * tcp.c
-	 */
+	.complete	= nvme_complete_rq,
+	.exit_request	= nvme_tcp_ofld_exit_request,
+	.init_hctx	= nvme_tcp_ofld_init_hctx,
 };
 
 static const struct nvme_ctrl_ops nvme_tcp_ofld_ctrl_ops = {
diff --git a/drivers/nvme/host/tcp-offload.h b/drivers/nvme/host/tcp-offload.h
index d4eb62873de1..4802924f3dd8 100644
--- a/drivers/nvme/host/tcp-offload.h
+++ b/drivers/nvme/host/tcp-offload.h
@@ -42,11 +42,17 @@ struct nvme_tcp_ofld_req {
 		     __le16 status);
 };
 
+enum nvme_tcp_ofld_queue_flags {
+	NVME_TCP_OFLD_Q_ALLOCATED = 0,
+	NVME_TCP_OFLD_Q_LIVE = 1,
+};
+
 /* Allocated by nvme_tcp_ofld */
 struct nvme_tcp_ofld_queue {
 	/* Offload device associated to this queue */
 	struct nvme_tcp_ofld_dev *dev;
 	struct nvme_tcp_ofld_ctrl *ctrl;
+	unsigned long flags;
 
 	/* Vendor specific driver context */
 	void *private_data;
-- 
2.22.0


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 7/7] nvme-tcp-offload: Add IO level implementation
  2020-11-19 14:21 [PATCH 0/7] RFC patch series - NVMeTCP Offload ULP Shai Malin
                   ` (5 preceding siblings ...)
  2020-11-19 14:21 ` [PATCH 6/7] nvme-tcp-offload: Add queue level implementation Shai Malin
@ 2020-11-19 14:21 ` Shai Malin
  2020-11-24 10:41 ` [PATCH 0/7] RFC patch series - NVMeTCP Offload ULP Max Gurtovoy
  2020-11-26  1:26 ` Sagi Grimberg
  8 siblings, 0 replies; 16+ messages in thread
From: Shai Malin @ 2020-11-19 14:21 UTC (permalink / raw)
  To: linux-nvme, sagi, Erik.Smith, Douglas.Farley, liuw
  Cc: smalin, aelior, agershberg, mkalderon, nassa, dbalandin, malin1024

From: Dean Balandin <dbalandin@marvell.com>

In this patch, we present the IO level functionality.
The nvme-tcp-offload shall work at the IO level, meaning the nvme-tcp-offload ULP module shall pass the request to the nvme-tcp-offload vendor driver and expect the request completion.
No additional handling is needed in between; this design will reduce CPU utilization, as we describe below.

The nvme-tcp-offload vendor driver shall register with the nvme-tcp-offload ULP, providing the following IO-path ops:
 - init_req - in order to initialize vendor-specific params for a new request.
 - map_sg - in order to map the request sg (similar to nvme_rdma_map_data).
 - send_req - in order to pass the request to the offload driver, which shall pass it to the vendor-specific device.

The vendor driver will manage the context from which the request will be executed and the request aggregations.
Once the IO has completed, the nvme-tcp-offload vendor driver shall call command.done(), which shall invoke the nvme-tcp-offload ULP layer to complete the request.

Signed-off-by: Dean Balandin <dbalandin@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
---
 drivers/nvme/host/tcp-offload.c | 67 ++++++++++++++++++++++++++++++---
 1 file changed, 62 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
index baf38526ccb9..6163f8360072 100644
--- a/drivers/nvme/host/tcp-offload.c
+++ b/drivers/nvme/host/tcp-offload.c
@@ -114,7 +114,10 @@ nvme_tcp_ofld_req_done(struct nvme_tcp_ofld_req *req,
 		       union nvme_result *result,
 		       __le16 status)
 {
-	/* Placeholder - complete request with/without error */
+	struct request *rq = blk_mq_rq_from_pdu(req);
+
+	if (!nvme_end_request(rq, cpu_to_le16(status << 1), *result))
+		nvme_complete_rq(rq);
 }
 
 struct nvme_tcp_ofld_dev *
@@ -722,8 +725,10 @@ nvme_tcp_ofld_init_request(struct blk_mq_tag_set *set,
 {
 	struct nvme_tcp_ofld_req *req = blk_mq_rq_to_pdu(rq);
 	struct nvme_tcp_ofld_ctrl *ctrl = set->driver_data;
+	int qid = (set == &ctrl->tag_set) ? hctx_idx + 1 : 0;
 
-	/* Placeholder - init request */
+	req->queue = &ctrl->queues[qid];
+	nvme_req(rq)->ctrl = &ctrl->nctrl;
 	req->done = nvme_tcp_ofld_req_done;
 	ctrl->dev->ops->init_req(req);
 
@@ -736,11 +741,25 @@ nvme_tcp_ofld_queue_rq(struct blk_mq_hw_ctx *hctx,
 {
 	struct nvme_tcp_ofld_req *req = blk_mq_rq_to_pdu(bd->rq);
 	struct nvme_tcp_ofld_queue *queue = hctx->driver_data;
-	struct nvme_tcp_ofld_ops *ops = queue->dev->ops;
+	struct nvme_tcp_ofld_ctrl *ctrl = queue->ctrl;
+	struct nvme_ns *ns = hctx->queue->queuedata;
+	struct nvme_tcp_ofld_dev *dev = queue->dev;
+	struct nvme_tcp_ofld_ops *ops = dev->ops;
+	struct request *rq = bd->rq;
+	bool queue_ready;
+	int rc;
 
-	/* Call nvme_setup_cmd(...) */
+	queue_ready = test_bit(NVME_TCP_OFLD_Q_LIVE, &queue->flags);
+	if (!nvmf_check_ready(&ctrl->nctrl, rq, queue_ready))
+		return nvmf_fail_nonready_command(&ctrl->nctrl, rq);
 
-	/* Call ops->map_sg(...) */
+	rc = nvme_setup_cmd(ns, rq, &req->nvme_cmd);
+	if (rc)
+		return rc;
+
+	blk_mq_start_request(rq);
+	ops->map_sg(dev, req);
+	ops->send_req(req);
 
 	return BLK_STS_OK;
 }
@@ -815,6 +834,42 @@ static int nvme_tcp_ofld_poll(struct blk_mq_hw_ctx *hctx)
 	return 0;
 }
 
+static enum blk_eh_timer_return
+nvme_tcp_ofld_timeout(struct request *rq, bool reserved)
+{
+	struct nvme_tcp_ofld_req *req = blk_mq_rq_to_pdu(rq);
+	struct nvme_tcp_ofld_ctrl *ctrl = req->queue->ctrl;
+
+	/* Restart the timer if a controller reset is already scheduled. Any
+	 * timed out request would be handled before entering the connecting
+	 * state.
+	 */
+	if (ctrl->nctrl.state == NVME_CTRL_RESETTING)
+		return BLK_EH_RESET_TIMER;
+
+	dev_warn(ctrl->nctrl.device,
+		"queue %d: timeout request %#x type %d\n",
+		nvme_tcp_ofld_qid(req->queue), rq->tag,
+		req->nvme_cmd.common.opcode);
+
+	if (ctrl->nctrl.state != NVME_CTRL_LIVE) {
+		/*
+		 * Teardown immediately if the controller times out while starting
+		 * or we have already started error recovery. All outstanding
+		 * requests are completed on shutdown, so we return BLK_EH_DONE.
+		 */
+		flush_work(&ctrl->err_work);
+		nvme_tcp_ofld_teardown_io_queues(&ctrl->nctrl, false);
+		nvme_tcp_ofld_teardown_admin_queue(&ctrl->nctrl, false);
+		return BLK_EH_DONE;
+	}
+
+	dev_warn(ctrl->nctrl.device, "starting error recovery\n");
+	nvme_tcp_ofld_error_recovery(&ctrl->nctrl);
+
+	return BLK_EH_RESET_TIMER;
+}
+
 static struct blk_mq_ops nvme_tcp_ofld_mq_ops = {
 	.queue_rq	= nvme_tcp_ofld_queue_rq,
 	.init_request	= nvme_tcp_ofld_init_request,
@@ -822,6 +877,7 @@ static struct blk_mq_ops nvme_tcp_ofld_mq_ops = {
 	.exit_request	= nvme_tcp_ofld_exit_request,
 	.init_hctx	= nvme_tcp_ofld_init_hctx,
 	.map_queues	= nvme_tcp_ofld_map_queues,
+	.timeout	= nvme_tcp_ofld_timeout,
 	.poll		= nvme_tcp_ofld_poll,
 };
 
@@ -831,6 +887,7 @@ static struct blk_mq_ops nvme_tcp_ofld_admin_mq_ops = {
 	.complete	= nvme_complete_rq,
 	.exit_request	= nvme_tcp_ofld_exit_request,
 	.init_hctx	= nvme_tcp_ofld_init_hctx,
+	.timeout	= nvme_tcp_ofld_timeout,
 };
 
 static const struct nvme_ctrl_ops nvme_tcp_ofld_ctrl_ops = {
-- 
2.22.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/7] RFC patch series - NVMeTCP Offload ULP
  2020-11-19 14:21 [PATCH 0/7] RFC patch series - NVMeTCP Offload ULP Shai Malin
                   ` (6 preceding siblings ...)
  2020-11-19 14:21 ` [PATCH 7/7] nvme-tcp-offload: Add IO " Shai Malin
@ 2020-11-24 10:41 ` Max Gurtovoy
  2020-11-26  1:26 ` Sagi Grimberg
  8 siblings, 0 replies; 16+ messages in thread
From: Max Gurtovoy @ 2020-11-24 10:41 UTC (permalink / raw)
  To: Shai Malin, linux-nvme, sagi, Erik.Smith, Douglas.Farley, liuw
  Cc: aelior, agershberg, mkalderon, nassa, dbalandin, malin1024


On 11/19/2020 4:21 PM, Shai Malin wrote:
> This patch series introduces the nvme-tcp-offload ULP host layer, which will be a new transport type called "tcp-offload" and will serve as an abstraction layer to work with vendor specific nvme-tcp offload drivers.
>
> The nvme-tcp-offload transport can co-exist with the existing tcp and other transports. The tcp offload was designed so that stack changes are kept to a bare minimum: only registering new transports. All other APIs, ops etc. are identical to the regular tcp transport.
> Representing the TCP offload as a new transport allows clear and manageable differentiation between the connections which should use the offload path and those that are not offloaded (even on the same device).

Why can't we extend the current NVMe-TCP driver and register vendor ops 
to it, instead of duplicating the entire driver?

AFAIU, only the IO path logic is vendor specific but all the rest is the 
same.




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/7] nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP
  2020-11-19 14:21 ` [PATCH 1/7] nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP Shai Malin
@ 2020-11-26  1:22   ` Sagi Grimberg
  2020-11-26  1:55   ` Sagi Grimberg
  1 sibling, 0 replies; 16+ messages in thread
From: Sagi Grimberg @ 2020-11-26  1:22 UTC (permalink / raw)
  To: Shai Malin, linux-nvme, Erik.Smith, Douglas.Farley, liuw
  Cc: aelior, agershberg, mkalderon, nassa, dbalandin, malin1024


> diff --git a/include/linux/nvme.h b/include/linux/nvme.h
> index d92535997687..2c4c21f39a2f 100644
> --- a/include/linux/nvme.h
> +++ b/include/linux/nvme.h
> @@ -47,6 +47,7 @@ enum {
>   	NVMF_TRTYPE_RDMA	= 1,	/* RDMA */
>   	NVMF_TRTYPE_FC		= 2,	/* Fibre Channel */
>   	NVMF_TRTYPE_TCP		= 3,	/* TCP/IP */
> +	NVMF_TRTYPE_TCP_OFFLOAD = 4,	/* TCP/IP offload */
>   	NVMF_TRTYPE_LOOP	= 254,	/* Reserved for host usage */
>   	NVMF_TRTYPE_MAX,
>   };

We cannot do that because this is not a spec'd transport...


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/7] RFC patch series - NVMeTCP Offload ULP
  2020-11-19 14:21 [PATCH 0/7] RFC patch series - NVMeTCP Offload ULP Shai Malin
                   ` (7 preceding siblings ...)
  2020-11-24 10:41 ` [PATCH 0/7] RFC patch series - NVMeTCP Offload ULP Max Gurtovoy
@ 2020-11-26  1:26 ` Sagi Grimberg
  2020-12-01 22:21   ` [EXT] " Shai Malin
  8 siblings, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2020-11-26  1:26 UTC (permalink / raw)
  To: Shai Malin, linux-nvme, Erik.Smith, Douglas.Farley, liuw
  Cc: aelior, agershberg, mkalderon, nassa, dbalandin, malin1024

You should tag this set with RFC, this is not in consideration
for inclusion...


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/7] nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS definitions
  2020-11-19 14:21 ` [PATCH 2/7] nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS definitions Shai Malin
@ 2020-11-26  1:27   ` Sagi Grimberg
  0 siblings, 0 replies; 16+ messages in thread
From: Sagi Grimberg @ 2020-11-26  1:27 UTC (permalink / raw)
  To: Shai Malin, linux-nvme, Erik.Smith, Douglas.Farley, liuw
  Cc: aelior, agershberg, mkalderon, nassa, dbalandin, malin1024

This looks reasonable...


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/7] nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP
  2020-11-19 14:21 ` [PATCH 1/7] nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP Shai Malin
  2020-11-26  1:22   ` Sagi Grimberg
@ 2020-11-26  1:55   ` Sagi Grimberg
  2020-12-01 22:38     ` [EXT] " Shai Malin
  1 sibling, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2020-11-26  1:55 UTC (permalink / raw)
  To: Shai Malin, linux-nvme, Erik.Smith, Douglas.Farley, liuw
  Cc: aelior, agershberg, mkalderon, nassa, dbalandin, malin1024



On 11/19/20 6:21 AM, Shai Malin wrote:
> This patch will present the structure for the NVMeTCP offload common
> layer driver. This module is added under "drivers/nvme/host/" and future
> offload drivers which will register to it will be placed under
> "drivers/nvme/hw".
> This new driver will be enabled by the Kconfig "NVM Express over Fabrics
> TCP offload common layer".
> In order to support the new transport type, for host mode, no change is
> needed.
> 
> Each new vendor-specific offload driver will register to this ULP during
> its probe function, by filling out the nvme_tcp_ofld_dev->ops and
> nvme_tcp_ofld_dev->private_data and calling nvme_tcp_ofld_register_dev
> with the initialized struct.
> 
> The internal implementation:
> - tcp-offload.h:
>    Includes all common structs and ops to be used and shared by offload
>    drivers.
> 
> - tcp-offload.c:
>    Includes the init function which registers as a NVMf transport just
>    like any other transport.
> 
> Signed-off-by: Shai Malin <smalin@marvell.com>
> Signed-off-by: Dean Balandin <dbalandin@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> ---
>   drivers/nvme/host/Kconfig       |  16 +++
>   drivers/nvme/host/Makefile      |   3 +
>   drivers/nvme/host/tcp-offload.c | 121 ++++++++++++++++++++++
>   drivers/nvme/host/tcp-offload.h | 173 ++++++++++++++++++++++++++++++++
>   include/linux/nvme.h            |   1 +
>   5 files changed, 314 insertions(+)
>   create mode 100644 drivers/nvme/host/tcp-offload.c
>   create mode 100644 drivers/nvme/host/tcp-offload.h
> 
> diff --git a/drivers/nvme/host/Kconfig b/drivers/nvme/host/Kconfig
> index 3ed9786b88d8..c05057e8a7a4 100644
> --- a/drivers/nvme/host/Kconfig
> +++ b/drivers/nvme/host/Kconfig
> @@ -83,3 +83,19 @@ config NVME_TCP
>   	  from https://github.com/linux-nvme/nvme-cli.
>   
>   	  If unsure, say N.
> +
> +config NVME_TCP_OFFLOAD
> +	tristate "NVM Express over Fabrics TCP offload common layer"
> +	default m
> +	depends on INET
> +	depends on BLK_DEV_NVME
> +	select NVME_FABRICS
> +	help
> +	  This provides support for the NVMe over Fabrics protocol using
> +	  the TCP offload transport. This allows you to use remote block devices
> +	  exported using the NVMe protocol set.
> +
> +	  To configure a NVMe over Fabrics controller use the nvme-cli tool
> +	  from https://github.com/linux-nvme/nvme-cli.
> +
> +	  If unsure, say N.
> diff --git a/drivers/nvme/host/Makefile b/drivers/nvme/host/Makefile
> index d7f6a87687b8..0e7ef044cf29 100644
> --- a/drivers/nvme/host/Makefile
> +++ b/drivers/nvme/host/Makefile
> @@ -8,6 +8,7 @@ obj-$(CONFIG_NVME_FABRICS)		+= nvme-fabrics.o
>   obj-$(CONFIG_NVME_RDMA)			+= nvme-rdma.o
>   obj-$(CONFIG_NVME_FC)			+= nvme-fc.o
>   obj-$(CONFIG_NVME_TCP)			+= nvme-tcp.o
> +obj-$(CONFIG_NVME_TCP_OFFLOAD)	+= nvme-tcp-offload.o
>   
>   nvme-core-y				:= core.o
>   nvme-core-$(CONFIG_TRACING)		+= trace.o
> @@ -26,3 +27,5 @@ nvme-rdma-y				+= rdma.o
>   nvme-fc-y				+= fc.o
>   
>   nvme-tcp-y				+= tcp.o
> +
> +nvme-tcp-offload-y		+= tcp-offload.o
> diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
> new file mode 100644
> index 000000000000..dfb25da7a564
> --- /dev/null
> +++ b/drivers/nvme/host/tcp-offload.c
> @@ -0,0 +1,121 @@
> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> +/*
> + * Copyright 2020 Marvell. All rights reserved.
> + */
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +/* Kernel includes */
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +
> +/* Driver includes */
> +#include "tcp-offload.h"
> +
> +static LIST_HEAD(nvme_tcp_ofld_devices);
> +static DECLARE_RWSEM(nvme_tcp_ofld_devices_rwsem);
> +
> +/**
> + * nvme_tcp_ofld_register_dev() - NVMeTCP Offload Library registration
> + * function.
> + * @dev:	NVMeTCP offload device instance to be registered to the
> + *		common tcp offload instance.
> + *
> + * API function that registers the type of vendor specific driver
> + * being implemented to the common NVMe over TCP offload library. Part of
> + * the overall init sequence of starting up an offload driver.
> + */
> +int nvme_tcp_ofld_register_dev(struct nvme_tcp_ofld_dev *dev)
> +{
> +	struct nvme_tcp_ofld_ops *ops = dev->ops;
> +
> +	if (!ops->claim_dev ||
> +	    !ops->create_queue ||
> +	    !ops->drain_queue ||
> +	    !ops->destroy_queue ||
> +	    !ops->init_req ||
> +	    !ops->map_sg ||
> +	    !ops->send_req)
> +		return -EINVAL;
> +
> +	down_write(&nvme_tcp_ofld_devices_rwsem);
> +	list_add_tail(&dev->entry, &nvme_tcp_ofld_devices);
> +	up_write(&nvme_tcp_ofld_devices_rwsem);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(nvme_tcp_ofld_register_dev);
> +
> +/**
> + * nvme_tcp_ofld_unregister_dev() - NVMeTCP Offload Library unregistration
> + * function.
> + * @dev:	NVMeTCP offload device instance to be unregistered from the
> + *		common tcp offload instance.
> + *
> + * API function that unregisters the type of vendor specific driver being
> + * implemented from the common NVMe over TCP offload library.
> + * Part of the overall exit sequence of unloading the implemented driver.
> + */
> +void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev)
> +{
> +	down_write(&nvme_tcp_ofld_devices_rwsem);
> +	list_del(&dev->entry);
> +	up_write(&nvme_tcp_ofld_devices_rwsem);
> +}
> +EXPORT_SYMBOL_GPL(nvme_tcp_ofld_unregister_dev);
> +
> +/**
> + * nvme_tcp_ofld_report_queue_err() - NVMeTCP Offload report error event
> + * callback function. Pointed to by nvme_tcp_ofld_queue->report_err.
> + * @queue:	NVMeTCP offload queue instance on which the error has occurred.
> + *
> + * API function that allows the vendor specific offload driver to report errors
> + * to the common offload layer, to invoke error recovery.
> + */
> +void nvme_tcp_ofld_report_queue_err(struct nvme_tcp_ofld_queue *queue)
> +{
> +	/* Placeholder - invoke error recovery flow */
> +}
> +
> +/**
> + * nvme_tcp_ofld_req_done() - NVMeTCP Offload request done callback
> + * function. Pointed to by nvme_tcp_ofld_req->done.
> + * @req:	NVMeTCP offload request to complete.
> + * @result:     The nvme_result.
> + * @status:     The completion status.
> + *
> + * API function that allows the vendor specific offload driver to report request
> + * completions to the common offload layer.
> + */
> +void
> +nvme_tcp_ofld_req_done(struct nvme_tcp_ofld_req *req,
> +		       union nvme_result *result,
> +		       __le16 status)
> +{
> +	/* Placeholder - complete request with/without error */
> +}
> +
> +static struct nvmf_transport_ops nvme_tcp_ofld_transport = {
> +	.name		= "tcp_offload",
> +	.module		= THIS_MODULE,
> +	.required_opts	= NVMF_OPT_TRADDR,
> +	.allowed_opts	= NVMF_OPT_TRSVCID | NVMF_OPT_DISABLE_SQFLOW |
> +			  NVMF_OPT_NR_WRITE_QUEUES | NVMF_OPT_HOST_TRADDR |
> +			  NVMF_OPT_CTRL_LOSS_TMO | NVMF_OPT_RECONNECT_DELAY |
> +			  NVMF_OPT_HDR_DIGEST | NVMF_OPT_DATA_DIGEST |
> +			  NVMF_OPT_NR_POLL_QUEUES | NVMF_OPT_TOS,
> +};
> +
> +static int __init nvme_tcp_ofld_init_module(void)
> +{
> +	nvmf_register_transport(&nvme_tcp_ofld_transport);
> +
> +	return 0;
> +}
> +
> +static void __exit nvme_tcp_ofld_cleanup_module(void)
> +{
> +	nvmf_unregister_transport(&nvme_tcp_ofld_transport);
> +}
> +
> +module_init(nvme_tcp_ofld_init_module);
> +module_exit(nvme_tcp_ofld_cleanup_module);
> +MODULE_LICENSE("GPL v2");
> diff --git a/drivers/nvme/host/tcp-offload.h b/drivers/nvme/host/tcp-offload.h
> new file mode 100644
> index 000000000000..ed340b33d366
> --- /dev/null
> +++ b/drivers/nvme/host/tcp-offload.h
> @@ -0,0 +1,173 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
> +/*
> + * Copyright 2020 Marvell. All rights reserved.
> + */
> +
> +/* Linux includes */
> +#include <linux/dma-mapping.h>
> +#include <linux/scatterlist.h>
> +#include <linux/types.h>
> +#include <linux/nvme-tcp.h>
> +
> +/* Driver includes */
> +#include "nvme.h"
> +#include "fabrics.h"
> +
> +/* Forward declarations */
> +struct nvme_tcp_ofld_ops;
> +
> +/* Representation of a vendor-specific device. This is the struct used to
> + * register to the offload layer by the vendor-specific driver during its probe
> + * function.
> + * Allocated by vendor-specific driver.
> + */
> +struct nvme_tcp_ofld_dev {
> +	struct list_head entry;
> +	struct nvme_tcp_ofld_ops *ops;
> +};

No name? corresponding netdev?

> +
> +/* Per IO struct holding the nvme_request and command
> + * Allocated by blk-mq.
> + */
> +struct nvme_tcp_ofld_req {
> +	struct nvme_request req;
> +	struct nvme_command nvme_cmd;
> +	struct nvme_tcp_ofld_queue *queue;
> +
> +	/* Vendor specific driver context */
> +	void *private_data;
> +
> +	void (*done)(struct nvme_tcp_ofld_req *req,
> +		     union nvme_result *result,
> +		     __le16 status);

Where is the sgl?

> +};
> +
> +/* Allocated by nvme_tcp_ofld */
> +struct nvme_tcp_ofld_queue {
> +	/* Offload device associated to this queue */
> +	struct nvme_tcp_ofld_dev *dev;
> +	struct nvme_tcp_ofld_ctrl *ctrl;
> +
> +	/* Vendor specific driver context */
> +	void *private_data;
> +
> +	/* Error callback function */
> +	void (*report_err)(struct nvme_tcp_ofld_queue *queue);

Where is the actual error?

> +};
> +
> +/* Connectivity (routing) params used for establishing a connection */
> +struct nvme_tcp_ofld_ctrl_con_params {
> +	/* Input params */
> +	struct sockaddr_storage remote_ip_addr;
> +
> +	/*
> +	 * Can be input or output, depending if host traddr was passed.
> +	 * Vendor-specific driver should determine if it should use the passed
> +	 * addr or fill it on its own.
> +	 */
> +	struct sockaddr_storage local_ip_addr;

Why is this needed? Comment is unclear to me.

> +
> +	/* Output params */
> +	struct sockaddr	remote_mac_addr;
> +	struct sockaddr	local_mac_addr;
> +	u16 vlan_id;

And why are these needed for a tcp driver?

> +};
> +
> +/* Allocated by nvme_tcp_ofld */
> +struct nvme_tcp_ofld_ctrl {
> +	struct nvme_ctrl nctrl;
> +	struct nvme_tcp_ofld_dev *dev;
> +
> +	/* admin and IO queues */
> +	struct blk_mq_tag_set tag_set;
> +	struct blk_mq_tag_set admin_tag_set;
> +	struct nvme_tcp_ofld_queue *queues;
> +
> +	/*
> +	 * Each entry in the array indicates the number of queues of
> +	 * corresponding type.
> +	 */
> +	u32 queue_type_mapping[HCTX_MAX_TYPES];
> +
> +	/* Connectivity params */
> +	struct nvme_tcp_ofld_ctrl_con_params conn_params;
> +
> +	/* Vendor specific driver context */
> +	void *private_data;
> +};
> +
> +struct nvme_tcp_ofld_ops {
> +	const char *name;
> +	struct module *module;
> +
> +	/* For vendor-specific driver to report what opts it supports */
> +	int required_opts; /* bitmap using enum nvmf_parsing_opts */
> +	int allowed_opts; /* bitmap using enum nvmf_parsing_opts */

Why is this needed?

> +
> +	/**
> +	 * claim_dev: Return True if addr is reachable via offload device.
> +	 * @dev: The offload device to check.
> +	 * @conn_params: ptr to routing params to be filled by the lower
> +	 *               driver. Input+Output argument.
> +	 */
> +	bool (*claim_dev)(struct nvme_tcp_ofld_dev *dev,
> +			  struct nvme_tcp_ofld_ctrl_con_params *conn_params);
> +
> +	/**
> +	 * create_queue: Create offload queue and establish TCP + NVMeTCP
> +	 * (icreq+icresp) connection. Return true on successful connection.
> +	 * Based on nvme_tcp_alloc_queue.
> +	 * @queue: The queue itself.
> +	 * @qid: The queue ID associated with the requested queue.
> +	 * @q_size: The queue depth.
> +	 */
> +	bool (*create_queue)(struct nvme_tcp_ofld_queue *queue, int qid,
> +			     size_t q_size);

Why is this passing the queue and not getting the queue back from
create? anyways bool is not a good error propagation...

> +
> +	/**
> +	 * drain_queue: Drain a given queue - Returning from this function
> +	 * ensures that no additional completions will arrive on this queue.
> +	 * @queue: The queue to drain.
> +	 */
> +	void (*drain_queue)(struct nvme_tcp_ofld_queue *queue);
> +
> +	/**
> +	 * destroy_queue: Close the TCP + NVMeTCP connection of a given queue
> +	 * and make sure its no longer active (no completions will arrive on the
> +	 * queue).
> +	 * @queue: The queue to destroy.
> +	 */
> +	void (*destroy_queue)(struct nvme_tcp_ofld_queue *queue);
> +
> +	/**
> +	 * poll_queue: Poll a given queue for completions.
> +	 * @queue: The queue to poll.
> +	 */
> +	void (*poll_queue)(struct nvme_tcp_ofld_queue *queue);

This should return how much was processed.

> +
> +	/**
> +	 * init_req: Initialize vendor-specific params for a new request.
> +	 * @req: Ptr to request to be initialized. Input+Output argument.
> +	 */
> +	void (*init_req)(struct nvme_tcp_ofld_req *req);

void? what does it do?

> +
> +	/**
> +	 * send_req: Dispatch a request. Returns the execution status.
> +	 * @req: Ptr to request to be sent.
> +	 */
> +	blk_status_t (*send_req)(struct nvme_tcp_ofld_req *req);

This should return a normal int not a blk_status_t.

> +
> +	/**
> +	 * map_sg: Map a scatter/gather list to DMA addresses Returns the
> +	 * number of SGs entries mapped successfully.
> +	 * @dev: The device for which the DMA addresses are to be created.
> +	 * @req: The request corresponding to the SGs, to allow vendor-specific
> +	 *       driver to initialize additional params if it needs to.
> +	 */
> +	int (*map_sg)(struct nvme_tcp_ofld_dev *dev,
> +		      struct nvme_tcp_ofld_req *req);

really weird interface map_sg that doesn't receive nor return an sg...


^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [EXT] Re: [PATCH 0/7] RFC patch series - NVMeTCP Offload ULP
  2020-11-26  1:26 ` Sagi Grimberg
@ 2020-12-01 22:21   ` Shai Malin
  0 siblings, 0 replies; 16+ messages in thread
From: Shai Malin @ 2020-12-01 22:21 UTC (permalink / raw)
  To: linux-nvme

On 11/26/2020 3:27 AM, Sagi Grimberg wrote:
> You should tag this set with RFC, this is not in consideration for inclusion...

Understood.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [EXT] Re: [PATCH 1/7] nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP
  2020-11-26  1:55   ` Sagi Grimberg
@ 2020-12-01 22:38     ` Shai Malin
  0 siblings, 0 replies; 16+ messages in thread
From: Shai Malin @ 2020-12-01 22:38 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme, Erik.Smith, Douglas.Farley, liuw
  Cc: Ariel Elior, Arie Gershberg, Michal Kalderon, Nikolay Assa,
	Dean Balandin, malin1024

On 11/26/2020 3:55 AM, Sagi Grimberg wrote:
> On 11/19/20 6:21 AM, Shai Malin wrote:
> > This patch will present the structure for the NVMeTCP offload common
> > layer driver. This module is added under "drivers/nvme/host/" and
> > future offload drivers which will register to it will be placed under
> > "drivers/nvme/hw".
> > This new driver will be enabled by the Kconfig "NVM Express over
> > Fabrics TCP offload common layer".
> > In order to support the new transport type, for host mode, no change
> > is needed.
> >
> > Each new vendor-specific offload driver will register to this ULP
> > during its probe function, by filling out the nvme_tcp_ofld_dev->ops
> > and nvme_tcp_ofld_dev->private_data and calling
> > nvme_tcp_ofld_register_dev with the initialized struct.
> >
> > The internal implementation:
> > - tcp-offload.h:
> >    Includes all common structs and ops to be used and shared by offload
> >    drivers.
> >
> > - tcp-offload.c:
> >    Includes the init function which registers as a NVMf transport just
> >    like any other transport.
> >
> > Signed-off-by: Shai Malin <smalin@marvell.com>
> > Signed-off-by: Dean Balandin <dbalandin@marvell.com>
> > Signed-off-by: Ariel Elior <aelior@marvell.com>
> > Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> > ---
> >   drivers/nvme/host/Kconfig       |  16 +++
> >   drivers/nvme/host/Makefile      |   3 +
> >   drivers/nvme/host/tcp-offload.c | 121 ++++++++++++++++++++++
> >   drivers/nvme/host/tcp-offload.h | 173
> ++++++++++++++++++++++++++++++++
> >   include/linux/nvme.h            |   1 +
> >   5 files changed, 314 insertions(+)
> >   create mode 100644 drivers/nvme/host/tcp-offload.c
> >   create mode 100644 drivers/nvme/host/tcp-offload.h
> >
> > diff --git a/drivers/nvme/host/Kconfig b/drivers/nvme/host/Kconfig
> > index 3ed9786b88d8..c05057e8a7a4 100644
> > --- a/drivers/nvme/host/Kconfig
> > +++ b/drivers/nvme/host/Kconfig
> > @@ -83,3 +83,19 @@ config NVME_TCP
> >   	  from https://github.com/linux-nvme/nvme-cli.
> >
> >   	  If unsure, say N.
> > +
> > +config NVME_TCP_OFFLOAD
> > +	tristate "NVM Express over Fabrics TCP offload common layer"
> > +	default m
> > +	depends on INET
> > +	depends on BLK_DEV_NVME
> > +	select NVME_FABRICS
> > +	help
> > +	  This provides support for the NVMe over Fabrics protocol using
> > +	  the TCP offload transport. This allows you to use remote block devices
> > +	  exported using the NVMe protocol set.
> > +
> > +	  To configure a NVMe over Fabrics controller use the nvme-cli tool
> > +	  from https://github.com/linux-nvme/nvme-cli.
> > +
> > +	  If unsure, say N.
> > diff --git a/drivers/nvme/host/Makefile b/drivers/nvme/host/Makefile
> > index d7f6a87687b8..0e7ef044cf29 100644
> > --- a/drivers/nvme/host/Makefile
> > +++ b/drivers/nvme/host/Makefile
> > @@ -8,6 +8,7 @@ obj-$(CONFIG_NVME_FABRICS)		+= nvme-fabrics.o
> >   obj-$(CONFIG_NVME_RDMA)			+= nvme-rdma.o
> >   obj-$(CONFIG_NVME_FC)			+= nvme-fc.o
> >   obj-$(CONFIG_NVME_TCP)			+= nvme-tcp.o
> > +obj-$(CONFIG_NVME_TCP_OFFLOAD)	+= nvme-tcp-offload.o
> >
> >   nvme-core-y				:= core.o
> >   nvme-core-$(CONFIG_TRACING)		+= trace.o
> > @@ -26,3 +27,5 @@ nvme-rdma-y				+= rdma.o
> >   nvme-fc-y				+= fc.o
> >
> >   nvme-tcp-y				+= tcp.o
> > +
> > +nvme-tcp-offload-y		+= tcp-offload.o
> > diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
> > new file mode 100644
> > index 000000000000..dfb25da7a564
> > --- /dev/null
> > +++ b/drivers/nvme/host/tcp-offload.c
> > @@ -0,0 +1,121 @@
> > +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> > +/*
> > + * Copyright 2020 Marvell. All rights reserved.
> > + */
> > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > +/* Kernel includes */
> > +#include <linux/kernel.h>
> > +#include <linux/module.h>
> > +
> > +/* Driver includes */
> > +#include "tcp-offload.h"
> > +
> > +static LIST_HEAD(nvme_tcp_ofld_devices);
> > +static DECLARE_RWSEM(nvme_tcp_ofld_devices_rwsem);
> > +
> > +/**
> > + * nvme_tcp_ofld_register_dev() - NVMeTCP Offload Library registration
> > + * function.
> > + * @dev:	NVMeTCP offload device instance to be registered to the
> > + *		common tcp offload instance.
> > + *
> > + * API function that registers the type of vendor specific driver
> > + * being implemented to the common NVMe over TCP offload library. Part of
> > + * the overall init sequence of starting up an offload driver.
> > + */
> > +int nvme_tcp_ofld_register_dev(struct nvme_tcp_ofld_dev *dev)
> > +{
> > +	struct nvme_tcp_ofld_ops *ops = dev->ops;
> > +
> > +	if (!ops->claim_dev ||
> > +	    !ops->create_queue ||
> > +	    !ops->drain_queue ||
> > +	    !ops->destroy_queue ||
> > +	    !ops->init_req ||
> > +	    !ops->map_sg ||
> > +	    !ops->send_req)
> > +		return -EINVAL;
> > +
> > +	down_write(&nvme_tcp_ofld_devices_rwsem);
> > +	list_add_tail(&dev->entry, &nvme_tcp_ofld_devices);
> > +	up_write(&nvme_tcp_ofld_devices_rwsem);
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(nvme_tcp_ofld_register_dev);
> > +
> > +/**
> > + * nvme_tcp_ofld_unregister_dev() - NVMeTCP Offload Library unregistration
> > + * function.
> > + * @dev:	NVMeTCP offload device instance to be unregistered from the
> > + *		common tcp offload instance.
> > + *
> > + * API function that unregisters the type of vendor specific driver being
> > + * implemented from the common NVMe over TCP offload library.
> > + * Part of the overall exit sequence of unloading the implemented driver.
> > + */
> > +void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev)
> > +{
> > +	down_write(&nvme_tcp_ofld_devices_rwsem);
> > +	list_del(&dev->entry);
> > +	up_write(&nvme_tcp_ofld_devices_rwsem);
> > +}
> > +EXPORT_SYMBOL_GPL(nvme_tcp_ofld_unregister_dev);
> > +
> > +/**
> > + * nvme_tcp_ofld_report_queue_err() - NVMeTCP Offload report error event
> > + * callback function. Pointed to by nvme_tcp_ofld_queue->report_err.
> > + * @queue:	NVMeTCP offload queue instance on which the error has
> > + *		occurred.
> > + *
> > + * API function that allows the vendor specific offload driver to report
> > + * errors to the common offload layer, to invoke error recovery.
> > + */
> > +void nvme_tcp_ofld_report_queue_err(struct nvme_tcp_ofld_queue *queue)
> > +{
> > +	/* Placeholder - invoke error recovery flow */
> > +}
> > +
> > +/**
> > + * nvme_tcp_ofld_req_done() - NVMeTCP Offload request done callback
> > + * function. Pointed to by nvme_tcp_ofld_req->done.
> > + * @req:	NVMeTCP offload request to complete.
> > + * @result:     The nvme_result.
> > + * @status:     The completion status.
> > + *
> > + * API function that allows the vendor specific offload driver to report
> > + * request completions to the common offload layer.
> > + */
> > +void
> > +nvme_tcp_ofld_req_done(struct nvme_tcp_ofld_req *req,
> > +		       union nvme_result *result,
> > +		       __le16 status)
> > +{
> > +	/* Placeholder - complete request with/without error */
> > +}
> > +
> > +static struct nvmf_transport_ops nvme_tcp_ofld_transport = {
> > +	.name		= "tcp_offload",
> > +	.module		= THIS_MODULE,
> > +	.required_opts	= NVMF_OPT_TRADDR,
> > +	.allowed_opts	= NVMF_OPT_TRSVCID | NVMF_OPT_DISABLE_SQFLOW |
> > +			  NVMF_OPT_NR_WRITE_QUEUES | NVMF_OPT_HOST_TRADDR |
> > +			  NVMF_OPT_CTRL_LOSS_TMO | NVMF_OPT_RECONNECT_DELAY |
> > +			  NVMF_OPT_HDR_DIGEST | NVMF_OPT_DATA_DIGEST |
> > +			  NVMF_OPT_NR_POLL_QUEUES | NVMF_OPT_TOS,
> > +};
> > +
> > +static int __init nvme_tcp_ofld_init_module(void)
> > +{
> > +	nvmf_register_transport(&nvme_tcp_ofld_transport);
> > +
> > +	return 0;
> > +}
> > +
> > +static void __exit nvme_tcp_ofld_cleanup_module(void)
> > +{
> > +	nvmf_unregister_transport(&nvme_tcp_ofld_transport);
> > +}
> > +
> > +module_init(nvme_tcp_ofld_init_module);
> > +module_exit(nvme_tcp_ofld_cleanup_module);
> > +MODULE_LICENSE("GPL v2");
> > diff --git a/drivers/nvme/host/tcp-offload.h
> > b/drivers/nvme/host/tcp-offload.h new file mode 100644 index
> > 000000000000..ed340b33d366
> > --- /dev/null
> > +++ b/drivers/nvme/host/tcp-offload.h
> > @@ -0,0 +1,173 @@
> > +/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
> > +/*
> > + * Copyright 2020 Marvell. All rights reserved.
> > + */
> > +
> > +/* Linux includes */
> > +#include <linux/dma-mapping.h>
> > +#include <linux/scatterlist.h>
> > +#include <linux/types.h>
> > +#include <linux/nvme-tcp.h>
> > +
> > +/* Driver includes */
> > +#include "nvme.h"
> > +#include "fabrics.h"
> > +
> > +/* Forward declarations */
> > +struct nvme_tcp_ofld_ops;
> > +
> > +/* Representation of a vendor-specific device. This is the struct used to
> > + * register to the offload layer by the vendor-specific driver during its
> > + * probe function.
> > + * Allocated by vendor-specific driver.
> > + */
> > +struct nvme_tcp_ofld_dev {
> > +	struct list_head entry;
> > +	struct nvme_tcp_ofld_ops *ops;
> > +};
> 
> No name? corresponding netdev?

The name exists in nvme_tcp_ofld_ops (same as in nvmf_transport_ops).
The netdev is not needed because the op claim_dev() implements the upper 
layer usage of netdev.

> 
> > +
> > +/* Per IO struct holding the nvme_request and command
> > + * Allocated by blk-mq.
> > + */
> > +struct nvme_tcp_ofld_req {
> > +	struct nvme_request req;
> > +	struct nvme_command nvme_cmd;
> > +	struct nvme_tcp_ofld_queue *queue;
> > +
> > +	/* Vendor specific driver context */
> > +	void *private_data;
> > +
> > +	void (*done)(struct nvme_tcp_ofld_req *req,
> > +		     union nvme_result *result,
> > +		     __le16 status);
> 
> Where is the sgl?

In our design, we don't see a need for the upper layer to hold the sg; we 
suggest the sg be owned by the vendor driver, which will convert it into 
the relevant HW HSI.
This is demonstrated in patch 7 (Add IO level implementation) in 
nvme_tcp_ofld_queue_rq().

> 
> > +};
> > +
> > +/* Allocated by nvme_tcp_ofld */
> > +struct nvme_tcp_ofld_queue {
> > +	/* Offload device associated to this queue */
> > +	struct nvme_tcp_ofld_dev *dev;
> > +	struct nvme_tcp_ofld_ctrl *ctrl;
> > +
> > +	/* Vendor specific driver context */
> > +	void *private_data;
> > +
> > +	/* Error callback function */
> > +	void (*report_err)(struct nvme_tcp_ofld_queue *queue);
> 
> Where is the actual error?

Will be added in V2.

> 
> > +};
> > +
> > +/* Connectivity (routing) params used for establishing a connection */
> > +struct nvme_tcp_ofld_ctrl_con_params {
> > +	/* Input params */
> > +	struct sockaddr_storage remote_ip_addr;
> > +
> > +	/*
> > +	 * Can be input or output, depending on whether a host traddr was
> > +	 * passed. The vendor-specific driver should determine whether to use
> > +	 * the passed addr or fill it on its own.
> > +	 */
> > +	struct sockaddr_storage local_ip_addr;
> 
> Why is this needed? Comment is unclear to me.

I will rephrase the comment.

If NVMF_OPT_HOST_TRADDR is provided it will be set in local_ip_addr as 
presented in patch 4 (Add controller level implementation) in 
nvme_tcp_ofld_create_ctrl().
If NVMF_OPT_HOST_TRADDR is not provided it will be initialized by claim_dev().
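
This split can be sketched in plain (userspace) C. The struct and flag
below are condensed, hypothetical stand-ins for illustration only; the real
code uses struct sockaddr_storage and the NVMF_OPT_HOST_TRADDR flag from
fabrics.h:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the NVMF_OPT_HOST_TRADDR bit */
#define OPT_HOST_TRADDR (1 << 0)

/* Condensed stand-in for nvme_tcp_ofld_ctrl_con_params */
struct con_params {
	char local_ip_addr[64];
	bool local_addr_set;	/* false: claim_dev() must resolve it */
};

/* Sketch of the controller-create step: honor a user-supplied host
 * traddr; otherwise leave the field for claim_dev() to fill in.
 */
static void fill_local_addr(struct con_params *cp, int opt_mask,
			    const char *host_traddr)
{
	if (opt_mask & OPT_HOST_TRADDR) {
		strncpy(cp->local_ip_addr, host_traddr,
			sizeof(cp->local_ip_addr) - 1);
		cp->local_ip_addr[sizeof(cp->local_ip_addr) - 1] = '\0';
		cp->local_addr_set = true;
	} else {
		cp->local_addr_set = false;
	}
}
```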

> 
> > +
> > +	/* Output params */
> > +	struct sockaddr	remote_mac_addr;
> > +	struct sockaddr	local_mac_addr;
> > +	u16 vlan_id;
> 
> And why are these needed for a tcp driver?

The offload device offloads all the layers, not only the tcp layer. 
claim_dev() queries all the relevant parameters, including the mac and the 
vlan, from the net_dev in order to pass those to the offload device as part 
of create_queue().

> 
> > +};
> > +
> > +/* Allocated by nvme_tcp_ofld */
> > +struct nvme_tcp_ofld_ctrl {
> > +	struct nvme_ctrl nctrl;
> > +	struct nvme_tcp_ofld_dev *dev;
> > +
> > +	/* admin and IO queues */
> > +	struct blk_mq_tag_set tag_set;
> > +	struct blk_mq_tag_set admin_tag_set;
> > +	struct nvme_tcp_ofld_queue *queues;
> > +
> > +	/*
> > +	 * Each entry in the array indicates the number of queues of
> > +	 * corresponding type.
> > +	 */
> > +	u32 queue_type_mapping[HCTX_MAX_TYPES];
> > +
> > +	/* Connectivity params */
> > +	struct nvme_tcp_ofld_ctrl_con_params conn_params;
> > +
> > +	/* Vendor specific driver context */
> > +	void *private_data;
> > +};
> > +
> > +struct nvme_tcp_ofld_ops {
> > +	const char *name;
> > +	struct module *module;
> > +
> > +	/* For vendor-specific driver to report what opts it supports */
> > +	int required_opts; /* bitmap using enum nvmf_parsing_opts */
> > +	int allowed_opts; /* bitmap using enum nvmf_parsing_opts */
> 
> Why is this needed?

This was added in order to allow each vendor specific driver to support only 
a subset of the opts, while the nvme-tcp-offload ULP verifies, per 
controller, that the requested opts are supported.
The implementation is demonstrated in patch 4 (Add controller level 
implementation) in nvme_tcp_ofld_check_dev_opts().
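
The per-controller verification reduces to a bitmap subset test. A rough
plain-C sketch, with made-up bit values (the real bits are the NVMF_OPT_*
flags from drivers/nvme/host/fabrics.h):

```c
#include <stdbool.h>

/* Hypothetical opt bits for illustration only */
#define OPT_TRADDR      (1 << 0)
#define OPT_TRSVCID     (1 << 1)
#define OPT_HDR_DIGEST  (1 << 2)
#define OPT_DATA_DIGEST (1 << 3)

/* Every opt requested for this controller must appear in the vendor
 * driver's allowed set; any bit outside it fails the check.
 */
static bool ofld_check_dev_opts(int requested_opts, int dev_allowed_opts)
{
	return (requested_opts & ~dev_allowed_opts) == 0;
}
```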

> 
> > +
> > +	/**
> > +	 * claim_dev: Return True if addr is reachable via offload device.
> > +	 * @dev: The offload device to check.
> > +	 * @conn_params: ptr to routing params to be filled by the lower
> > +	 *               driver. Input+Output argument.
> > +	 */
> > +	bool (*claim_dev)(struct nvme_tcp_ofld_dev *dev,
> > +			  struct nvme_tcp_ofld_ctrl_con_params *conn_params);
> > +
> > +	/**
> > +	 * create_queue: Create offload queue and establish TCP + NVMeTCP
> > +	 * (icreq+icresp) connection. Return true on successful connection.
> > +	 * Based on nvme_tcp_alloc_queue.
> > +	 * @queue: The queue itself.
> > +	 * @qid: The queue ID associated with the requested queue.
> > +	 * @q_size: The queue depth.
> > +	 */
> > +	bool (*create_queue)(struct nvme_tcp_ofld_queue *queue, int qid,
> > +			     size_t q_size);
> 
> Why is this passing the queue and not getting the queue back from create?

The queue includes all the attributes which are needed by the offload device 
in order to create the connection.
The vendor driver will set queue->private_data as the output of this 
function.
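
The contract can be sketched in plain C with condensed, hypothetical
stand-ins for the structs in tcp-offload.h (the vendor context and its
allocation scheme below are invented for illustration):

```c
#include <stdbool.h>
#include <stddef.h>

/* Condensed stand-in for nvme_tcp_ofld_queue */
struct ofld_queue {
	int qid;
	size_t q_size;
	void *private_data;	/* output: vendor driver's connection ctx */
};

/* Hypothetical vendor-private connection context */
struct vendor_conn {
	int hw_cid;		/* made-up HW connection id */
};

static struct vendor_conn conn_pool[64];	/* stand-in allocator */

/* The ULP allocates the queue and passes it in with its attributes;
 * the vendor driver attaches its own context via private_data.
 * (Per the thread, V2 will return int instead of bool.)
 */
static bool vendor_create_queue(struct ofld_queue *q, int qid, size_t q_size)
{
	if (qid < 0 || qid >= 64)
		return false;

	q->qid = qid;
	q->q_size = q_size;
	conn_pool[qid].hw_cid = qid + 0x100;	/* made-up mapping */
	q->private_data = &conn_pool[qid];

	return true;
}
```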

> anyways bool is not a good error propagation...

Will be updated in V2.

> 
> > +
> > +	/**
> > +	 * drain_queue: Drain a given queue - Returning from this function
> > +	 * ensures that no additional completions will arrive on this queue.
> > +	 * @queue: The queue to drain.
> > +	 */
> > +	void (*drain_queue)(struct nvme_tcp_ofld_queue *queue);
> > +
> > +	/**
> > +	 * destroy_queue: Close the TCP + NVMeTCP connection of a given queue
> > +	 * and make sure it's no longer active (no completions will arrive on
> > +	 * the queue).
> > +	 * @queue: The queue to destroy.
> > +	 */
> > +	void (*destroy_queue)(struct nvme_tcp_ofld_queue *queue);
> > +
> > +	/**
> > +	 * poll_queue: Poll a given queue for completions.
> > +	 * @queue: The queue to poll.
> > +	 */
> > +	void (*poll_queue)(struct nvme_tcp_ofld_queue *queue);
> 
> This should return how much was processed.

Will be updated in V2.

> 
> > +
> > +	/**
> > +	 * init_req: Initialize vendor-specific params for a new request.
> > +	 * @req: Ptr to request to be initialized. Input+Output argument.
> > +	 */
> > +	void (*init_req)(struct nvme_tcp_ofld_req *req);
> 
> void? what does it do?

Will be changed to int in V2.

This function will include the vendor driver implementation of 
nvme_tcp_init_request().

> 
> > +
> > +	/**
> > +	 * send_req: Dispatch a request. Returns the execution status.
> > +	 * @req: Ptr to request to be sent.
> > +	 */
> > +	blk_status_t (*send_req)(struct nvme_tcp_ofld_req *req);
> 
> This should return a normal int not a blk_status_t.

Will be changed to int in V2.

> 
> > +
> > +	/**
> > +	 * map_sg: Map a scatter/gather list to DMA addresses. Returns the
> > +	 * number of SG entries mapped successfully.
> > +	 * @dev: The device for which the DMA addresses are to be created.
> > +	 * @req: The request corresponding to the SGs, to allow the
> > +	 *       vendor-specific driver to initialize additional params
> > +	 *       if it needs to.
> > +	 */
> > +	int (*map_sg)(struct nvme_tcp_ofld_dev *dev,
> > +		      struct nvme_tcp_ofld_req *req);
> 
> really weird interface map_sg that doesn't receive nor return an sg...

The suggested usage of the sg is only within the vendor driver. We can 
change map_sg() to return the sg, but I don't see how it would be used in 
the upper layer.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: [PATCH 0/7] RFC patch series - NVMeTCP Offload ULP
       [not found] <PH0PR18MB38457E1674A23D869414FECACCFA0@PH0PR18MB3845.namprd18.prod.outlook.com>
@ 2020-11-26  1:19 ` Sagi Grimberg
  0 siblings, 0 replies; 16+ messages in thread
From: Sagi Grimberg @ 2020-11-26  1:19 UTC (permalink / raw)
  To: Shai Malin, linux-nvme, Max Gurtovoy, Erik.Smith, Douglas.Farley, liuw
  Cc: Ariel Elior, Arie Gershberg, Michal Kalderon, Nikolay Assa,
	Dean Balandin, malin1024


>>> This patch series introduces the nvme-tcp-offload ULP host layer, which will
>>> be a new transport type called "tcp-offload" and will serve as an abstraction
>>> layer to work with vendor specific nvme-tcp offload drivers.
>>>
>>> The nvme-tcp-offload transport can co-exist with the existing tcp and other
>>> transports. The tcp offload was designed so that stack changes are kept to a
>>> bare minimum: only registering new transports. All other APIs, ops etc. are
>>> identical to the regular tcp transport.
>>> Representing the TCP offload as a new transport allows clear and
>>> manageable differentiation between the connections which should use the
>>> offload path and those that are not offloaded (even on the same device).
>>
>> why can't we extend the current NVMe-TCP driver and register vendor ops
>> to it, instead of duplicating the entire driver?
>>
>> AFAIU, only the IO path logic is vendor specific but all the rest is the same.
>>
> 
> The reasons it's separated into a new transport (and a new module):
> 1. The offload we are adding is an offload of the entire tcp layer and most of
>      the nvme-tcp layer above it. When this offload is active, there are no tcp
>      connections in the stack, and no tcp sockets (just like the existing in kernel
>      tcp offload models for iWARP and iSCSI). From the nvme stack perspective it
>      really is a separate transport.
> 2. The offload is not just IO path, but also the entire control plane, including
>      tcp retransmissions, connection establishment and connection error handling.
> 3. Keeping the transports separate would allow each of them to evolve without
>      worrying about breaking each other.
> 4. If significant code duplication exists anywhere, we can solve it with
>      tcp_common functions between the transports.
> 

Hey, sorry it took me a while to get to this...

I'll try to give my PoV. Past experience with the suggestion
of layering a TCP storage driver to support both offload
and SW (e.g. iscsi) resulted in a high degree of difficulty
changing things without breaking something (on both ends, both
sw and offload). This goes for both the data plane and the
control plane.

I think that layering would result in indirect interfaces that
can result in a higher degree of duplication. Hence I think that
this proposed approach would be a better way to go (yet to be proven
though).



end of thread, other threads:[~2020-12-01 22:39 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-19 14:21 [PATCH 0/7] RFC patch series - NVMeTCP Offload ULP Shai Malin
2020-11-19 14:21 ` [PATCH 1/7] nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP Shai Malin
2020-11-26  1:22   ` Sagi Grimberg
2020-11-26  1:55   ` Sagi Grimberg
2020-12-01 22:38     ` [EXT] " Shai Malin
2020-11-19 14:21 ` [PATCH 2/7] nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS definitions Shai Malin
2020-11-26  1:27   ` Sagi Grimberg
2020-11-19 14:21 ` [PATCH 3/7] nvme-tcp-offload: Add device scan implementation Shai Malin
2020-11-19 14:21 ` [PATCH 4/7] nvme-tcp-offload: Add controller level implementation Shai Malin
2020-11-19 14:21 ` [PATCH 5/7] nvme-tcp-offload: Add controller level error recovery implementation Shai Malin
2020-11-19 14:21 ` [PATCH 6/7] nvme-tcp-offload: Add queue level implementation Shai Malin
2020-11-19 14:21 ` [PATCH 7/7] nvme-tcp-offload: Add IO " Shai Malin
2020-11-24 10:41 ` [PATCH 0/7] RFC patch series - NVMeTCP Offload ULP Max Gurtovoy
2020-11-26  1:26 ` Sagi Grimberg
2020-12-01 22:21   ` [EXT] " Shai Malin
     [not found] <PH0PR18MB38457E1674A23D869414FECACCFA0@PH0PR18MB3845.namprd18.prod.outlook.com>
2020-11-26  1:19 ` Sagi Grimberg
