Netdev Archive on lore.kernel.org
* [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver
@ 2021-04-29 19:08 Shai Malin
  2021-04-29 19:09 ` [RFC PATCH v4 01/27] qed: Add NVMeTCP Offload PF Level FW and HW HSI Shai Malin
                   ` (27 more replies)
  0 siblings, 28 replies; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:08 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S. Miller <davem@davemloft.net>, Jakub Kicinski, smalin,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024

With the goal of enabling a generic infrastructure that allows NVMe/TCP 
offload devices like NICs to seamlessly plug into the NVMe-oF stack, this 
patch series introduces the nvme-tcp-offload ULP host layer, which will 
be a new transport type called "tcp-offload" and will serve as an 
abstraction layer to work with vendor specific nvme-tcp offload drivers.

NVMeTCP offload is a full offload of the NVMeTCP protocol; this includes
both the TCP level and the NVMeTCP level.

The nvme-tcp-offload transport can co-exist with the existing tcp and 
other transports. The tcp offload was designed so that stack changes are 
kept to a bare minimum: only registering new transports. 
All other APIs, ops etc. are identical to the regular tcp transport.
Representing the TCP offload as a new transport allows clear and manageable
differentiation between the connections which should use the offload path
and those that are not offloaded (even on the same device).
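
As a rough illustration of how a separate transport name keeps the two
paths apart, here is a minimal userspace sketch. It is not the kernel
fabrics code: struct transport, register_transport() and
lookup_transport() are hypothetical stand-ins, and only the transport
names "tcp" and "tcp_offload" come from this cover letter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TRANSPORTS 8

/* Hypothetical, simplified stand-in for a fabrics transport entry. */
struct transport {
	const char *name;
	int offloaded;	/* 1 if connections use the offload path */
};

static const struct transport *registry[MAX_TRANSPORTS];
static size_t nr_transports;

/* Add a transport to the toy registry; returns 0 on success. */
static int register_transport(const struct transport *t)
{
	assert(t && t->name);

	if (nr_transports == MAX_TRANSPORTS)
		return -1;

	registry[nr_transports++] = t;

	return 0;
}

/* Resolve the transport a connection asked for by name. */
static const struct transport *lookup_transport(const char *name)
{
	size_t i;

	for (i = 0; i < nr_transports; i++)
		if (strcmp(registry[i]->name, name) == 0)
			return registry[i];

	return NULL;
}

/* The regular and the offloaded transport co-exist side by side. */
static const struct transport tcp_transport = { "tcp", 0 };
static const struct transport tcp_ofld_transport = { "tcp_offload", 1 };
```

In this sketch a connection that names "tcp" resolves to the regular
entry and one that names "tcp_offload" resolves to the offloaded entry,
so both kinds can be used on the same device.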


The nvme-tcp-offload layers and API compared to nvme-tcp and nvme-rdma:

* NVMe layer: *

       [ nvme/nvme-fabrics/blk-mq ]
             |
        (nvme API and blk-mq API)
             |
             |			 
* Vendor agnostic transport layer: *

      [ nvme-rdma ] [ nvme-tcp ] [ nvme-tcp-offload ]
             |        |             |
           (Verbs) 
             |        |             |
             |     (Socket)
             |        |             |
             |        |        (nvme-tcp-offload API)
             |        |             |
             |        |             |
* Vendor Specific Driver: *

             |        |             |
           [ qedr ]       
                      |             |
                   [ qede ]
                                    |
                                  [ qedn ]


Performance:
============
With this implementation on top of the Marvell qedn driver (using the
Marvell FastLinQ NIC), we were able to demonstrate the following CPU 
utilization improvement:

On AMD EPYC 7402, 2.80GHz, 28 cores:
- For 16K queued read IOs, 16jobs, 4qd (50Gbps line rate): 
  Improved the CPU utilization from 15.1% with NVMeTCP SW to 4.7% with 
  NVMeTCP offload.

On Intel(R) Xeon(R) Gold 5122 CPU, 3.60GHz, 16 cores: 
- For 512K queued read IOs, 16jobs, 4qd (25Gbps line rate): 
  Improved the CPU utilization from 16.3% with NVMeTCP SW to 1.1% with 
  NVMeTCP offload.

In addition, we were able to demonstrate the following latency improvement:
- For 200K read IOPS (16 jobs, 16 qd, with fio rate limiter):
  Improved the average latency from 105 usec with NVMeTCP SW to 39 usec 
  with NVMeTCP offload.
  
  Improved the 99.99th percentile tail latency from 570 usec with NVMeTCP
  SW to 91 usec with NVMeTCP offload.

The end-to-end offload latency was measured with fio running against a
null device back end.
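
To put the figures above in relative terms, the following minimal sketch
computes the percentage reduction for each measurement. The helper
reduction_pct() and the measurements[] table are hypothetical and not
part of this series; the input values are copied from the numbers quoted
above.

```c
#include <assert.h>

/* Hypothetical helper: relative reduction, in percent, when moving from
 * the NVMeTCP SW baseline to the NVMeTCP offload measurement.
 */
static double reduction_pct(double sw, double offload)
{
	assert(sw > 0.0);

	return (sw - offload) / sw * 100.0;
}

/* One row per measurement quoted in this cover letter. */
struct measurement {
	const char *what;
	double sw;
	double offload;
};

static const struct measurement measurements[] = {
	{ "CPU utilization, AMD EPYC 7402, 16K reads (%)",    15.1,  4.7 },
	{ "CPU utilization, Xeon Gold 5122, 512K reads (%)",  16.3,  1.1 },
	{ "average latency, 200K read IOPS (usec)",          105.0, 39.0 },
	{ "99.99 tail latency, 200K read IOPS (usec)",       570.0, 91.0 },
};
```

With these inputs the CPU utilization drops by roughly 69% and 93%, and
the average and tail latency drop by roughly 63% and 84%.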


Upstream plan:
==============
Following this RFC, the series will be sent in a modular way so that changes 
in each part will not impact the previous part.

- Part 1 (Patches 1-7):
  The qed infrastructure, will be sent to 'netdev@vger.kernel.org'.

- Part 2 (Patches 8-15):
  The nvme-tcp-offload patches, will be sent to 
  'linux-nvme@lists.infradead.org'.

- Part 3 (Patches 16-27):
  The qedn patches, will be sent to 'linux-nvme@lists.infradead.org'.
 

Queue Initialization Design:
============================
The nvme-tcp-offload ULP module shall register with the existing 
nvmf_transport_ops (.name = "tcp_offload"), nvme_ctrl_ops and blk_mq_ops.
The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
with the following ops:
- claim_dev() - in order to resolve the route to the target according to
                the paired net_dev.
- create_queue() - in order to create offloaded nvme-tcp queue.

The nvme-tcp-offload ULP module shall manage all the controller level
functionalities, call claim_dev and based on the return values shall call
the relevant module create_queue in order to create the admin queue and
the IO queues.
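
The flow described above can be sketched in userspace C as follows. This
is only an illustration under simplifying assumptions: the struct names,
the helper names and the op signatures are hypothetical (the real
definitions are in drivers/nvme/host/tcp-offload.h of this series), and
the toy_* functions stand in for a vendor driver. Only the op names
claim_dev() and create_queue() are taken from the text.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct ofld_dev;

/* Hypothetical, simplified queue-initialization ops. */
struct ofld_ops {
	/* Return 0 if this device has a route to the target address. */
	int (*claim_dev)(struct ofld_dev *dev, const char *traddr);
	/* Create one offloaded queue; qid 0 is the admin queue. */
	int (*create_queue)(struct ofld_dev *dev, int qid, size_t q_size);
};

struct ofld_dev {
	const struct ofld_ops *ops;
	int queues_created;
};

/* ULP side: pick the first registered device that claims the target. */
static struct ofld_dev *ofld_lookup_dev(struct ofld_dev **devs, size_t n,
					const char *traddr)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (devs[i]->ops->claim_dev(devs[i], traddr) == 0)
			return devs[i];

	return NULL;
}

/* ULP side: create the admin queue, then nr_io_queues IO queues. */
static int ofld_setup_queues(struct ofld_dev *dev, int nr_io_queues,
			     size_t q_size)
{
	int qid, rc;

	assert(dev && dev->ops);

	for (qid = 0; qid <= nr_io_queues; qid++) {
		rc = dev->ops->create_queue(dev, qid, q_size);
		if (rc)
			return rc;
		dev->queues_created++;
	}

	return 0;
}

/* Toy vendor driver: claims exactly one (documentation) address. */
static int toy_claim_dev(struct ofld_dev *dev, const char *traddr)
{
	(void)dev;

	return strcmp(traddr, "192.0.2.1") == 0 ? 0 : -ENODEV;
}

static int toy_create_queue(struct ofld_dev *dev, int qid, size_t q_size)
{
	(void)dev;
	(void)qid;

	return q_size ? 0 : -EINVAL;
}

static const struct ofld_ops toy_ops = {
	.claim_dev = toy_claim_dev,
	.create_queue = toy_create_queue,
};
```

The point of the split is that the ULP owns the controller-level
sequencing while the vendor driver only answers "is this target mine?"
and "create this queue".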


IO-path Design:
===============
The nvme-tcp-offload shall work at the IO-level - the nvme-tcp-offload 
ULP module shall pass the request (the IO) to the nvme-tcp-offload vendor
driver and later, the nvme-tcp-offload vendor driver returns the request
completion (the IO completion).
No additional handling is needed in between; this design reduces the
CPU utilization, as shown in the Performance section above.

The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
with the following IO-path ops:
- init_req()
- send_req() - in order to pass the request to the handling of the
               offload driver that shall pass it to the vendor specific device.
- poll_queue()

Once the IO completes, the nvme-tcp-offload vendor driver shall call 
command.done() that will invoke the nvme-tcp-offload ULP layer to
complete the request.
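
A minimal userspace sketch of this request/completion hand-off is shown
below. The types, signatures and ulp_*/toy_* helpers are hypothetical
simplifications and not the API of this series; only the op names
init_req(), send_req(), poll_queue() and the done() callback come from
the description above. The toy device completes requests only when it is
polled, which is an assumption made to keep the example short.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct ofld_req;

/* Hypothetical, simplified IO-path ops. */
struct ofld_io_ops {
	int (*init_req)(struct ofld_req *req);
	int (*send_req)(struct ofld_req *req);
	int (*poll_queue)(void *queue);
};

struct ofld_req {
	void *queue;
	int status;
	int completed;
	/* Called by the vendor driver once the IO has completed. */
	void (*done)(struct ofld_req *req, int status);
};

/* ULP side: completion handler installed in every request. */
static void ulp_req_done(struct ofld_req *req, int status)
{
	req->status = status;
	req->completed = 1;
}

/* ULP side: one-time request setup. */
static int ulp_init_req(const struct ofld_io_ops *ops, struct ofld_req *req)
{
	req->done = ulp_req_done;

	return ops->init_req(req);
}

/* ULP side: pass the IO to the vendor driver, nothing else in between. */
static int ulp_queue_rq(const struct ofld_io_ops *ops, struct ofld_req *req)
{
	return ops->send_req(req);
}

/* Toy vendor driver with a tiny queue of outstanding requests. */
#define TOY_QUEUE_DEPTH 4

struct toy_queue {
	struct ofld_req *pending[TOY_QUEUE_DEPTH];
	int nr_pending;
};

static int toy_init_req(struct ofld_req *req)
{
	req->status = 0;
	req->completed = 0;

	return 0;
}

static int toy_send_req(struct ofld_req *req)
{
	struct toy_queue *q = req->queue;

	if (q->nr_pending == TOY_QUEUE_DEPTH)
		return -EBUSY;

	q->pending[q->nr_pending++] = req;

	return 0;
}

/* Complete everything outstanding; returns the number of completions. */
static int toy_poll_queue(void *queue)
{
	struct toy_queue *q = queue;
	int done = 0;

	while (q->nr_pending > 0) {
		struct ofld_req *req = q->pending[--q->nr_pending];

		assert(req->done);
		req->done(req, 0);
		done++;
	}

	return done;
}

static const struct ofld_io_ops toy_io_ops = {
	.init_req = toy_init_req,
	.send_req = toy_send_req,
	.poll_queue = toy_poll_queue,
};
```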


TCP events:
===========
The Marvell FastLinQ NIC HW engine handles all the TCP re-transmissions
and OOO events.


Teardown and errors:
====================
In case of an NVMeTCP queue error, the nvme-tcp-offload vendor driver shall
call nvme_tcp_ofld_report_queue_err().
The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
with the following teardown ops:
- drain_queue()
- destroy_queue()
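
One way to picture the teardown ops is the userspace sketch below. All
types and helpers here are hypothetical; ulp_report_queue_err() merely
stands in for nvme_tcp_ofld_report_queue_err() and only records the
error, whereas the real ULP would start error recovery. Calling
drain_queue() before destroy_queue() is an assumption of this sketch,
based on the order in which the ops are listed.

```c
#include <assert.h>
#include <stdbool.h>

struct ofld_queue;

/* Hypothetical, simplified teardown ops. */
struct ofld_teardown_ops {
	void (*drain_queue)(struct ofld_queue *queue);
	void (*destroy_queue)(struct ofld_queue *queue);
};

struct ofld_queue {
	const struct ofld_teardown_ops *ops;
	int inflight;		/* requests still owned by the device */
	bool live;		/* queue resources are allocated */
	bool err_reported;	/* vendor driver reported a queue error */
};

/* Stand-in for the ULP error hook called by the vendor driver. */
static void ulp_report_queue_err(struct ofld_queue *queue)
{
	queue->err_reported = true;
}

/* ULP side: drain so nothing is in flight, then release the queue. */
static void ulp_teardown_queue(struct ofld_queue *queue)
{
	if (!queue->live)
		return;

	queue->ops->drain_queue(queue);
	queue->ops->destroy_queue(queue);
}

/* Toy vendor driver. */
static void toy_drain_queue(struct ofld_queue *queue)
{
	queue->inflight = 0;
}

static void toy_destroy_queue(struct ofld_queue *queue)
{
	/* Destroying a queue with requests in flight would be a bug. */
	assert(queue->inflight == 0);
	queue->live = false;
}

static const struct ofld_teardown_ops toy_teardown_ops = {
	.drain_queue = toy_drain_queue,
	.destroy_queue = toy_destroy_queue,
};
```

Keeping the error report separate from the teardown ops lets the vendor
driver signal a problem from its own context while the ULP decides when
the queue is actually drained and destroyed.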


The Marvell FastLinQ NIC HW engine:
====================================
The Marvell NIC HW engine is capable of offloading the entire TCP/IP
stack and managing up to 64K connections per PF. Use cases that are
already implemented and upstream include iWARP (by the Marvell qedr
driver) and iSCSI (by the Marvell qedi driver).
In addition, the Marvell NIC HW engine offloads the NVMeTCP queue layer
and is able to manage the IO level also in case of TCP re-transmissions
and OOO events.
The HW engine enables direct data placement (including the data digest CRC
calculation and validation) and direct data transmission (including data
digest CRC calculation).


The Marvell qedn driver:
========================
The new driver will be added under "drivers/nvme/hw" and will be enabled
by the Kconfig "Marvell NVM Express over Fabrics TCP offload".
As part of the qedn init, the driver will register as a PCI device driver
and will work with the Marvell FastLinQ NIC.
As part of the probe, the driver will register to the nvme_tcp_offload
(ULP) and to the qed module (qed_nvmetcp_ops) - similar to other
"qed_*_ops" which are used by the qede, qedr, qedf and qedi device
drivers.
  

QEDN Future work:
=================
- Support extended HW resources.
- Digest support.
- Devlink support for device configuration and TCP offload configurations.
- Statistics

 
Long term future work:
======================
- The nvme-tcp-offload ULP target abstraction layer.
- The Marvell nvme-tcp-offload "qednt" target driver.


Changes since RFC v1:
=====================
- Fix nvme_tcp_ofld_ops return values.
- Remove NVMF_TRTYPE_TCP_OFFLOAD.
- Add nvme_tcp_ofld_poll() implementation.
- Fix nvme_tcp_ofld_queue_rq() to check map_sg() and send_req() return
  values.

Changes since RFC v2:
=====================
- Add qedn - Marvell's NVMeTCP HW offload vendor driver init and probe
  (patches 8-11).
- Fixes in controller and queue level (patches 3-6).
  
Changes since RFC v3:
=====================
- Add the full implementation of the nvme-tcp-offload layer including the 
  new ops: setup_ctrl(), release_ctrl(), commit_rqs() and new flows (ASYNC
  and timeout).
- Add nvme-tcp-offload device maximums: max_hw_sectors, max_segments.
- Add nvme-tcp-offload layer design and optimization changes.
- Add the qedn full implementation for the conn level, IO path and error 
  handling.
- Add qed support for the new AHP HW. 


Arie Gershberg (3):
  nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS
    definitions
  nvme-tcp-offload: Add controller level implementation
  nvme-tcp-offload: Add controller level error recovery implementation

Dean Balandin (3):
  nvme-tcp-offload: Add device scan implementation
  nvme-tcp-offload: Add queue level implementation
  nvme-tcp-offload: Add IO level implementation

Nikolay Assa (2):
  qed: Add IP services APIs support
  qedn: Add qedn_claim_dev API support

Omkar Kulkarni (1):
  qed: Add qed-NVMeTCP personality

Prabhakar Kushwaha (6):
  qed: Add support of HW filter block
  qedn: Add connection-level slowpath functionality
  qedn: Add support of configuring HW filter block
  qedn: Add support of Task and SGL
  qedn: Add support of NVME ICReq & ICResp
  qedn: Add support of ASYNC

Shai Malin (12):
  qed: Add NVMeTCP Offload PF Level FW and HW HSI
  qed: Add NVMeTCP Offload Connection Level FW and HW HSI
  qed: Add NVMeTCP Offload IO Level FW and HW HSI
  qed: Add NVMeTCP Offload IO Level FW Initializations
  nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP
  nvme-tcp-offload: Add Timeout and ASYNC Support
  qedn: Add qedn - Marvell's NVMeTCP HW offload vendor driver
  qedn: Add qedn probe
  qedn: Add IRQ and fast-path resources initializations
  qedn: Add IO level nvme_req and fw_cq workqueues
  qedn: Add IO level fastpath functionality
  qedn: Add Connection and IO level recovery flows

 MAINTAINERS                                   |   10 +
 drivers/net/ethernet/qlogic/Kconfig           |    3 +
 drivers/net/ethernet/qlogic/qed/Makefile      |    5 +
 drivers/net/ethernet/qlogic/qed/qed.h         |   16 +
 drivers/net/ethernet/qlogic/qed/qed_cxt.c     |   32 +
 drivers/net/ethernet/qlogic/qed/qed_cxt.h     |    1 +
 drivers/net/ethernet/qlogic/qed/qed_dev.c     |  151 +-
 drivers/net/ethernet/qlogic/qed/qed_hsi.h     |    4 +-
 drivers/net/ethernet/qlogic/qed/qed_ll2.c     |   31 +-
 drivers/net/ethernet/qlogic/qed/qed_mcp.c     |    3 +
 drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c |    3 +-
 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c |  868 +++++++++++
 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h |  114 ++
 .../qlogic/qed/qed_nvmetcp_fw_funcs.c         |  372 +++++
 .../qlogic/qed/qed_nvmetcp_fw_funcs.h         |   43 +
 .../qlogic/qed/qed_nvmetcp_ip_services.c      |  239 +++
 drivers/net/ethernet/qlogic/qed/qed_ooo.c     |    5 +-
 drivers/net/ethernet/qlogic/qed/qed_sp.h      |    5 +
 .../net/ethernet/qlogic/qed/qed_sp_commands.c |    1 +
 drivers/nvme/Kconfig                          |    1 +
 drivers/nvme/Makefile                         |    1 +
 drivers/nvme/host/Kconfig                     |   16 +
 drivers/nvme/host/Makefile                    |    3 +
 drivers/nvme/host/fabrics.c                   |    7 -
 drivers/nvme/host/fabrics.h                   |    7 +
 drivers/nvme/host/tcp-offload.c               | 1330 +++++++++++++++++
 drivers/nvme/host/tcp-offload.h               |  209 +++
 drivers/nvme/hw/Kconfig                       |    9 +
 drivers/nvme/hw/Makefile                      |    3 +
 drivers/nvme/hw/qedn/Makefile                 |    4 +
 drivers/nvme/hw/qedn/qedn.h                   |  435 ++++++
 drivers/nvme/hw/qedn/qedn_conn.c              |  999 +++++++++++++
 drivers/nvme/hw/qedn/qedn_main.c              | 1153 ++++++++++++++
 drivers/nvme/hw/qedn/qedn_task.c              |  977 ++++++++++++
 include/linux/qed/common_hsi.h                |    1 +
 include/linux/qed/nvmetcp_common.h            |  616 ++++++++
 include/linux/qed/qed_if.h                    |   22 +
 include/linux/qed/qed_nvmetcp_if.h            |  244 +++
 .../linux/qed/qed_nvmetcp_ip_services_if.h    |   29 +
 39 files changed, 7947 insertions(+), 25 deletions(-)
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.c
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.h
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c
 create mode 100644 drivers/nvme/host/tcp-offload.c
 create mode 100644 drivers/nvme/host/tcp-offload.h
 create mode 100644 drivers/nvme/hw/Kconfig
 create mode 100644 drivers/nvme/hw/Makefile
 create mode 100644 drivers/nvme/hw/qedn/Makefile
 create mode 100644 drivers/nvme/hw/qedn/qedn.h
 create mode 100644 drivers/nvme/hw/qedn/qedn_conn.c
 create mode 100644 drivers/nvme/hw/qedn/qedn_main.c
 create mode 100644 drivers/nvme/hw/qedn/qedn_task.c
 create mode 100644 include/linux/qed/nvmetcp_common.h
 create mode 100644 include/linux/qed/qed_nvmetcp_if.h
 create mode 100644 include/linux/qed/qed_nvmetcp_ip_services_if.h

-- 
2.22.0



* [RFC PATCH v4 01/27] qed: Add NVMeTCP Offload PF Level FW and HW HSI
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-05-01 16:50   ` Hannes Reinecke
  2021-04-29 19:09 ` [RFC PATCH v4 02/27] qed: Add NVMeTCP Offload Connection " Shai Malin
                   ` (26 subsequent siblings)
  27 siblings, 1 reply; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S. Miller <davem@davemloft.net>, Jakub Kicinski, smalin,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024, Dean Balandin

This patch introduces the NVMeTCP device and PF level HSI and HSI
functionality in order to initialize and interact with the HW device.

This patch is based on the qede, qedr, qedi, qedf drivers HSI.

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Dean Balandin <dbalandin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
---
 drivers/net/ethernet/qlogic/Kconfig           |   3 +
 drivers/net/ethernet/qlogic/qed/Makefile      |   2 +
 drivers/net/ethernet/qlogic/qed/qed.h         |   3 +
 drivers/net/ethernet/qlogic/qed/qed_hsi.h     |   1 +
 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c | 282 ++++++++++++++++++
 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h |  51 ++++
 drivers/net/ethernet/qlogic/qed/qed_sp.h      |   2 +
 include/linux/qed/common_hsi.h                |   1 +
 include/linux/qed/nvmetcp_common.h            |  54 ++++
 include/linux/qed/qed_if.h                    |  22 ++
 include/linux/qed/qed_nvmetcp_if.h            |  72 +++++
 11 files changed, 493 insertions(+)
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h
 create mode 100644 include/linux/qed/nvmetcp_common.h
 create mode 100644 include/linux/qed/qed_nvmetcp_if.h

diff --git a/drivers/net/ethernet/qlogic/Kconfig b/drivers/net/ethernet/qlogic/Kconfig
index 6b5ddb07ee83..98f430905ffa 100644
--- a/drivers/net/ethernet/qlogic/Kconfig
+++ b/drivers/net/ethernet/qlogic/Kconfig
@@ -110,6 +110,9 @@ config QED_RDMA
 config QED_ISCSI
 	bool
 
+config QED_NVMETCP
+	bool
+
 config QED_FCOE
 	bool
 
diff --git a/drivers/net/ethernet/qlogic/qed/Makefile b/drivers/net/ethernet/qlogic/qed/Makefile
index 8251755ec18c..7cb0db67ba5b 100644
--- a/drivers/net/ethernet/qlogic/qed/Makefile
+++ b/drivers/net/ethernet/qlogic/qed/Makefile
@@ -28,6 +28,8 @@ qed-$(CONFIG_QED_ISCSI) += qed_iscsi.o
 qed-$(CONFIG_QED_LL2) += qed_ll2.o
 qed-$(CONFIG_QED_OOO) += qed_ooo.o
 
+qed-$(CONFIG_QED_NVMETCP) += qed_nvmetcp.o
+
 qed-$(CONFIG_QED_RDMA) +=	\
 	qed_iwarp.o		\
 	qed_rdma.o		\
diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h
index a20cb8a0c377..91d4635009ab 100644
--- a/drivers/net/ethernet/qlogic/qed/qed.h
+++ b/drivers/net/ethernet/qlogic/qed/qed.h
@@ -240,6 +240,7 @@ enum QED_FEATURE {
 	QED_VF,
 	QED_RDMA_CNQ,
 	QED_ISCSI_CQ,
+	QED_NVMETCP_CQ = QED_ISCSI_CQ,
 	QED_FCOE_CQ,
 	QED_VF_L2_QUE,
 	QED_MAX_FEATURES,
@@ -592,6 +593,7 @@ struct qed_hwfn {
 	struct qed_ooo_info		*p_ooo_info;
 	struct qed_rdma_info		*p_rdma_info;
 	struct qed_iscsi_info		*p_iscsi_info;
+	struct qed_nvmetcp_info		*p_nvmetcp_info;
 	struct qed_fcoe_info		*p_fcoe_info;
 	struct qed_pf_params		pf_params;
 
@@ -828,6 +830,7 @@ struct qed_dev {
 		struct qed_eth_cb_ops		*eth;
 		struct qed_fcoe_cb_ops		*fcoe;
 		struct qed_iscsi_cb_ops		*iscsi;
+		struct qed_nvmetcp_cb_ops	*nvmetcp;
 	} protocol_ops;
 	void				*ops_cookie;
 
diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
index 559df9f4d656..24472f6a83c2 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
@@ -20,6 +20,7 @@
 #include <linux/qed/fcoe_common.h>
 #include <linux/qed/eth_common.h>
 #include <linux/qed/iscsi_common.h>
+#include <linux/qed/nvmetcp_common.h>
 #include <linux/qed/iwarp_common.h>
 #include <linux/qed/rdma_common.h>
 #include <linux/qed/roce_common.h>
diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
new file mode 100644
index 000000000000..da3b5002d216
--- /dev/null
+++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
@@ -0,0 +1,282 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause)
+/* Copyright 2021 Marvell. All rights reserved. */
+
+#include <linux/types.h>
+#include <asm/byteorder.h>
+#include <asm/param.h>
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/etherdevice.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/stddef.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/list.h>
+#include <linux/qed/qed_nvmetcp_if.h>
+#include "qed.h"
+#include "qed_cxt.h"
+#include "qed_dev_api.h"
+#include "qed_hsi.h"
+#include "qed_hw.h"
+#include "qed_int.h"
+#include "qed_nvmetcp.h"
+#include "qed_ll2.h"
+#include "qed_mcp.h"
+#include "qed_sp.h"
+#include "qed_reg_addr.h"
+
+static int qed_nvmetcp_async_event(struct qed_hwfn *p_hwfn, u8 fw_event_code,
+				   u16 echo, union event_ring_data *data,
+				   u8 fw_return_code)
+{
+	if (p_hwfn->p_nvmetcp_info->event_cb) {
+		struct qed_nvmetcp_info *p_nvmetcp = p_hwfn->p_nvmetcp_info;
+
+		return p_nvmetcp->event_cb(p_nvmetcp->event_context,
+					 fw_event_code, data);
+	} else {
+		DP_NOTICE(p_hwfn, "nvmetcp async completion is not set\n");
+
+		return -EINVAL;
+	}
+}
+
+static int qed_sp_nvmetcp_func_start(struct qed_hwfn *p_hwfn,
+				     enum spq_mode comp_mode,
+				     struct qed_spq_comp_cb *p_comp_addr,
+				     void *event_context,
+				     nvmetcp_event_cb_t async_event_cb)
+{
+	struct nvmetcp_init_ramrod_params *p_ramrod = NULL;
+	struct qed_nvmetcp_pf_params *p_params = NULL;
+	struct scsi_init_func_queues *p_queue = NULL;
+	struct nvmetcp_spe_func_init *p_init = NULL;
+	struct qed_sp_init_data init_data = {};
+	struct qed_spq_entry *p_ent = NULL;
+	int rc = 0;
+	u16 val;
+	u8 i;
+
+	/* Get SPQ entry */
+	init_data.cid = qed_spq_get_cid(p_hwfn);
+	init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;
+	init_data.comp_mode = comp_mode;
+	init_data.p_comp_data = p_comp_addr;
+
+	rc = qed_sp_init_request(p_hwfn, &p_ent,
+				 NVMETCP_RAMROD_CMD_ID_INIT_FUNC,
+				 PROTOCOLID_NVMETCP, &init_data);
+	if (rc)
+		return rc;
+
+	p_ramrod = &p_ent->ramrod.nvmetcp_init;
+	p_init = &p_ramrod->nvmetcp_init_spe;
+	p_params = &p_hwfn->pf_params.nvmetcp_pf_params;
+	p_queue = &p_init->q_params;
+
+	p_init->num_sq_pages_in_ring = p_params->num_sq_pages_in_ring;
+	p_init->num_r2tq_pages_in_ring = p_params->num_r2tq_pages_in_ring;
+	p_init->num_uhq_pages_in_ring = p_params->num_uhq_pages_in_ring;
+	p_init->ll2_rx_queue_id = RESC_START(p_hwfn, QED_LL2_RAM_QUEUE) +
+					p_params->ll2_ooo_queue_id;
+
+	SET_FIELD(p_init->flags, NVMETCP_SPE_FUNC_INIT_NVMETCP_MODE, 1);
+
+	p_init->func_params.log_page_size = ilog2(PAGE_SIZE);
+	p_init->func_params.num_tasks = cpu_to_le16(p_params->num_tasks);
+	p_init->debug_flags = p_params->debug_mode;
+
+	DMA_REGPAIR_LE(p_queue->glbl_q_params_addr,
+		       p_params->glbl_q_params_addr);
+
+	p_queue->cq_num_entries = cpu_to_le16(QED_NVMETCP_FW_CQ_SIZE);
+	p_queue->num_queues = p_params->num_queues;
+	val = RESC_START(p_hwfn, QED_CMDQS_CQS);
+	p_queue->queue_relative_offset = cpu_to_le16((u16)val);
+	p_queue->cq_sb_pi = p_params->gl_rq_pi;
+
+	for (i = 0; i < p_params->num_queues; i++) {
+		val = qed_get_igu_sb_id(p_hwfn, i);
+		p_queue->cq_cmdq_sb_num_arr[i] = cpu_to_le16(val);
+	}
+
+	SET_FIELD(p_queue->q_validity,
+		  SCSI_INIT_FUNC_QUEUES_CMD_VALID, 0);
+	p_queue->cmdq_num_entries = 0;
+	p_queue->bdq_resource_id = (u8)RESC_START(p_hwfn, QED_BDQ);
+
+	/* p_ramrod->tcp_init.min_rto = cpu_to_le16(p_params->min_rto); */
+	p_ramrod->tcp_init.two_msl_timer = cpu_to_le32(QED_TCP_TWO_MSL_TIMER);
+	p_ramrod->tcp_init.tx_sws_timer = cpu_to_le16(QED_TCP_SWS_TIMER);
+	p_init->half_way_close_timeout = cpu_to_le16(QED_TCP_HALF_WAY_CLOSE_TIMEOUT);
+	p_ramrod->tcp_init.max_fin_rt = QED_TCP_MAX_FIN_RT;
+
+	SET_FIELD(p_ramrod->nvmetcp_init_spe.params,
+		  NVMETCP_SPE_FUNC_INIT_MAX_SYN_RT, QED_TCP_MAX_FIN_RT);
+
+	p_hwfn->p_nvmetcp_info->event_context = event_context;
+	p_hwfn->p_nvmetcp_info->event_cb = async_event_cb;
+
+	qed_spq_register_async_cb(p_hwfn, PROTOCOLID_NVMETCP,
+				  qed_nvmetcp_async_event);
+
+	return qed_spq_post(p_hwfn, p_ent, NULL);
+}
+
+static int qed_sp_nvmetcp_func_stop(struct qed_hwfn *p_hwfn,
+				    enum spq_mode comp_mode,
+				    struct qed_spq_comp_cb *p_comp_addr)
+{
+	struct qed_spq_entry *p_ent = NULL;
+	struct qed_sp_init_data init_data;
+	int rc;
+
+	/* Get SPQ entry */
+	memset(&init_data, 0, sizeof(init_data));
+	init_data.cid = qed_spq_get_cid(p_hwfn);
+	init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;
+	init_data.comp_mode = comp_mode;
+	init_data.p_comp_data = p_comp_addr;
+
+	rc = qed_sp_init_request(p_hwfn, &p_ent,
+				 NVMETCP_RAMROD_CMD_ID_DESTROY_FUNC,
+				 PROTOCOLID_NVMETCP, &init_data);
+	if (rc)
+		return rc;
+
+	rc = qed_spq_post(p_hwfn, p_ent, NULL);
+
+	qed_spq_unregister_async_cb(p_hwfn, PROTOCOLID_NVMETCP);
+
+	return rc;
+}
+
+static int qed_fill_nvmetcp_dev_info(struct qed_dev *cdev,
+				     struct qed_dev_nvmetcp_info *info)
+{
+	struct qed_hwfn *hwfn = QED_AFFIN_HWFN(cdev);
+	int rc;
+
+	memset(info, 0, sizeof(*info));
+	rc = qed_fill_dev_info(cdev, &info->common);
+
+	info->port_id = MFW_PORT(hwfn);
+	info->num_cqs = FEAT_NUM(hwfn, QED_NVMETCP_CQ);
+
+	return rc;
+}
+
+static void qed_register_nvmetcp_ops(struct qed_dev *cdev,
+				     struct qed_nvmetcp_cb_ops *ops,
+				     void *cookie)
+{
+	cdev->protocol_ops.nvmetcp = ops;
+	cdev->ops_cookie = cookie;
+}
+
+static int qed_nvmetcp_stop(struct qed_dev *cdev)
+{
+	int rc;
+
+	if (!(cdev->flags & QED_FLAG_STORAGE_STARTED)) {
+		DP_NOTICE(cdev, "nvmetcp already stopped\n");
+
+		return 0;
+	}
+
+	if (!hash_empty(cdev->connections)) {
+		DP_NOTICE(cdev,
+			  "Can't stop nvmetcp - not all connections were returned\n");
+
+		return -EINVAL;
+	}
+
+	/* Stop the nvmetcp */
+	rc = qed_sp_nvmetcp_func_stop(QED_AFFIN_HWFN(cdev), QED_SPQ_MODE_EBLOCK,
+				      NULL);
+	cdev->flags &= ~QED_FLAG_STORAGE_STARTED;
+
+	return rc;
+}
+
+static int qed_nvmetcp_start(struct qed_dev *cdev,
+			     struct qed_nvmetcp_tid *tasks,
+			     void *event_context,
+			     nvmetcp_event_cb_t async_event_cb)
+{
+	struct qed_tid_mem *tid_info;
+	int rc;
+
+	if (cdev->flags & QED_FLAG_STORAGE_STARTED) {
+		DP_NOTICE(cdev, "nvmetcp already started\n");
+
+		return 0;
+	}
+
+	rc = qed_sp_nvmetcp_func_start(QED_AFFIN_HWFN(cdev),
+				       QED_SPQ_MODE_EBLOCK, NULL,
+				       event_context, async_event_cb);
+	if (rc) {
+		DP_NOTICE(cdev, "Failed to start nvmetcp\n");
+
+		return rc;
+	}
+
+	cdev->flags |= QED_FLAG_STORAGE_STARTED;
+	hash_init(cdev->connections);
+
+	if (!tasks)
+		return 0;
+
+	tid_info = kzalloc(sizeof(*tid_info), GFP_KERNEL);
+
+	if (!tid_info) {
+		qed_nvmetcp_stop(cdev);
+
+		return -ENOMEM;
+	}
+
+	rc = qed_cxt_get_tid_mem_info(QED_AFFIN_HWFN(cdev), tid_info);
+	if (rc) {
+		DP_NOTICE(cdev, "Failed to gather task information\n");
+		qed_nvmetcp_stop(cdev);
+		kfree(tid_info);
+
+		return rc;
+	}
+
+	/* Fill task information */
+	tasks->size = tid_info->tid_size;
+	tasks->num_tids_per_block = tid_info->num_tids_per_block;
+	memcpy(tasks->blocks, tid_info->blocks,
+	       MAX_TID_BLOCKS_NVMETCP * sizeof(u8 *));
+
+	kfree(tid_info);
+
+	return 0;
+}
+
+static const struct qed_nvmetcp_ops qed_nvmetcp_ops_pass = {
+	.common = &qed_common_ops_pass,
+	.ll2 = &qed_ll2_ops_pass,
+	.fill_dev_info = &qed_fill_nvmetcp_dev_info,
+	.register_ops = &qed_register_nvmetcp_ops,
+	.start = &qed_nvmetcp_start,
+	.stop = &qed_nvmetcp_stop,
+
+	/* Placeholder - Connection level ops */
+};
+
+const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void)
+{
+	return &qed_nvmetcp_ops_pass;
+}
+EXPORT_SYMBOL(qed_get_nvmetcp_ops);
+
+void qed_put_nvmetcp_ops(void)
+{
+}
+EXPORT_SYMBOL(qed_put_nvmetcp_ops);
diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h
new file mode 100644
index 000000000000..774b46ade408
--- /dev/null
+++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h
@@ -0,0 +1,51 @@
+/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */
+/* Copyright 2021 Marvell. All rights reserved. */
+
+#ifndef _QED_NVMETCP_H
+#define _QED_NVMETCP_H
+
+#include <linux/types.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/qed/tcp_common.h>
+#include <linux/qed/qed_nvmetcp_if.h>
+#include <linux/qed/qed_chain.h>
+#include "qed.h"
+#include "qed_hsi.h"
+#include "qed_mcp.h"
+#include "qed_sp.h"
+
+#define QED_NVMETCP_FW_CQ_SIZE (4 * 1024)
+
+/* tcp parameters */
+#define QED_TCP_TWO_MSL_TIMER 4000
+#define QED_TCP_HALF_WAY_CLOSE_TIMEOUT 10
+#define QED_TCP_MAX_FIN_RT 2
+#define QED_TCP_SWS_TIMER 5000
+
+struct qed_nvmetcp_info {
+	spinlock_t lock; /* Connection resources. */
+	struct list_head free_list;
+	u16 max_num_outstanding_tasks;
+	void *event_context;
+	nvmetcp_event_cb_t event_cb;
+};
+
+#if IS_ENABLED(CONFIG_QED_NVMETCP)
+int qed_nvmetcp_alloc(struct qed_hwfn *p_hwfn);
+void qed_nvmetcp_setup(struct qed_hwfn *p_hwfn);
+void qed_nvmetcp_free(struct qed_hwfn *p_hwfn);
+
+#else /* IS_ENABLED(CONFIG_QED_NVMETCP) */
+static inline int qed_nvmetcp_alloc(struct qed_hwfn *p_hwfn)
+{
+	return -EINVAL;
+}
+
+static inline void qed_nvmetcp_setup(struct qed_hwfn *p_hwfn) {}
+static inline void qed_nvmetcp_free(struct qed_hwfn *p_hwfn) {}
+
+#endif /* IS_ENABLED(CONFIG_QED_NVMETCP) */
+
+#endif
diff --git a/drivers/net/ethernet/qlogic/qed/qed_sp.h b/drivers/net/ethernet/qlogic/qed/qed_sp.h
index 993f1357b6fc..525159e747a5 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_sp.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_sp.h
@@ -100,6 +100,8 @@ union ramrod_data {
 	struct iscsi_spe_conn_mac_update iscsi_conn_mac_update;
 	struct iscsi_spe_conn_termination iscsi_conn_terminate;
 
+	struct nvmetcp_init_ramrod_params nvmetcp_init;
+
 	struct vf_start_ramrod_data vf_start;
 	struct vf_stop_ramrod_data vf_stop;
 };
diff --git a/include/linux/qed/common_hsi.h b/include/linux/qed/common_hsi.h
index 977807e1be53..59c5e5866607 100644
--- a/include/linux/qed/common_hsi.h
+++ b/include/linux/qed/common_hsi.h
@@ -703,6 +703,7 @@ enum mf_mode {
 /* Per-protocol connection types */
 enum protocol_type {
 	PROTOCOLID_ISCSI,
+	PROTOCOLID_NVMETCP = PROTOCOLID_ISCSI,
 	PROTOCOLID_FCOE,
 	PROTOCOLID_ROCE,
 	PROTOCOLID_CORE,
diff --git a/include/linux/qed/nvmetcp_common.h b/include/linux/qed/nvmetcp_common.h
new file mode 100644
index 000000000000..e9ccfc07041d
--- /dev/null
+++ b/include/linux/qed/nvmetcp_common.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */
+/* Copyright 2021 Marvell. All rights reserved. */
+
+#ifndef __NVMETCP_COMMON__
+#define __NVMETCP_COMMON__
+
+#include "tcp_common.h"
+
+/* NVMeTCP firmware function init parameters */
+struct nvmetcp_spe_func_init {
+	__le16 half_way_close_timeout;
+	u8 num_sq_pages_in_ring;
+	u8 num_r2tq_pages_in_ring;
+	u8 num_uhq_pages_in_ring;
+	u8 ll2_rx_queue_id;
+	u8 flags;
+#define NVMETCP_SPE_FUNC_INIT_COUNTERS_EN_MASK 0x1
+#define NVMETCP_SPE_FUNC_INIT_COUNTERS_EN_SHIFT 0
+#define NVMETCP_SPE_FUNC_INIT_NVMETCP_MODE_MASK 0x1
+#define NVMETCP_SPE_FUNC_INIT_NVMETCP_MODE_SHIFT 1
+#define NVMETCP_SPE_FUNC_INIT_RESERVED0_MASK 0x3F
+#define NVMETCP_SPE_FUNC_INIT_RESERVED0_SHIFT 2
+	u8 debug_flags;
+	__le16 reserved1;
+	u8 params;
+#define NVMETCP_SPE_FUNC_INIT_MAX_SYN_RT_MASK	0xF
+#define NVMETCP_SPE_FUNC_INIT_MAX_SYN_RT_SHIFT	0
+#define NVMETCP_SPE_FUNC_INIT_RESERVED1_MASK	0xF
+#define NVMETCP_SPE_FUNC_INIT_RESERVED1_SHIFT	4
+	u8 reserved2[5];
+	struct scsi_init_func_params func_params;
+	struct scsi_init_func_queues q_params;
+};
+
+/* NVMeTCP init params passed by driver to FW in NVMeTCP init ramrod. */
+struct nvmetcp_init_ramrod_params {
+	struct nvmetcp_spe_func_init nvmetcp_init_spe;
+	struct tcp_init_params tcp_init;
+};
+
+/* NVMeTCP Ramrod Command IDs */
+enum nvmetcp_ramrod_cmd_id {
+	NVMETCP_RAMROD_CMD_ID_UNUSED = 0,
+	NVMETCP_RAMROD_CMD_ID_INIT_FUNC = 1,
+	NVMETCP_RAMROD_CMD_ID_DESTROY_FUNC = 2,
+	MAX_NVMETCP_RAMROD_CMD_ID
+};
+
+struct nvmetcp_glbl_queue_entry {
+	struct regpair cq_pbl_addr;
+	struct regpair reserved;
+};
+
+#endif /* __NVMETCP_COMMON__ */
diff --git a/include/linux/qed/qed_if.h b/include/linux/qed/qed_if.h
index 68d17a4fbf20..524f57821ba2 100644
--- a/include/linux/qed/qed_if.h
+++ b/include/linux/qed/qed_if.h
@@ -542,6 +542,26 @@ struct qed_iscsi_pf_params {
 	u8 bdq_pbl_num_entries[3];
 };
 
+struct qed_nvmetcp_pf_params {
+	u64 glbl_q_params_addr;
+	u16 cq_num_entries;
+
+	u16 num_cons;
+	u16 num_tasks;
+
+	u8 num_sq_pages_in_ring;
+	u8 num_r2tq_pages_in_ring;
+	u8 num_uhq_pages_in_ring;
+
+	u8 num_queues;
+	u8 gl_rq_pi;
+	u8 gl_cmd_pi;
+	u8 debug_mode;
+	u8 ll2_ooo_queue_id;
+
+	u16 min_rto;
+};
+
 struct qed_rdma_pf_params {
 	/* Supplied to QED during resource allocation (may affect the ILT and
 	 * the doorbell BAR).
@@ -560,6 +580,7 @@ struct qed_pf_params {
 	struct qed_eth_pf_params eth_pf_params;
 	struct qed_fcoe_pf_params fcoe_pf_params;
 	struct qed_iscsi_pf_params iscsi_pf_params;
+	struct qed_nvmetcp_pf_params nvmetcp_pf_params;
 	struct qed_rdma_pf_params rdma_pf_params;
 };
 
@@ -662,6 +683,7 @@ enum qed_sb_type {
 enum qed_protocol {
 	QED_PROTOCOL_ETH,
 	QED_PROTOCOL_ISCSI,
+	QED_PROTOCOL_NVMETCP = QED_PROTOCOL_ISCSI,
 	QED_PROTOCOL_FCOE,
 };
 
diff --git a/include/linux/qed/qed_nvmetcp_if.h b/include/linux/qed/qed_nvmetcp_if.h
new file mode 100644
index 000000000000..abc1f41862e3
--- /dev/null
+++ b/include/linux/qed/qed_nvmetcp_if.h
@@ -0,0 +1,72 @@
+/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */
+/* Copyright 2021 Marvell. All rights reserved. */
+
+#ifndef _QED_NVMETCP_IF_H
+#define _QED_NVMETCP_IF_H
+#include <linux/types.h>
+#include <linux/qed/qed_if.h>
+
+#define QED_NVMETCP_MAX_IO_SIZE	0x800000
+
+typedef int (*nvmetcp_event_cb_t) (void *context,
+				   u8 fw_event_code, void *fw_handle);
+
+struct qed_dev_nvmetcp_info {
+	struct qed_dev_info common;
+
+	u8 port_id;  /* Physical port */
+	u8 num_cqs;
+};
+
+#define MAX_TID_BLOCKS_NVMETCP (512)
+struct qed_nvmetcp_tid {
+	u32 size;		/* In bytes per task */
+	u32 num_tids_per_block;
+	u8 *blocks[MAX_TID_BLOCKS_NVMETCP];
+};
+
+struct qed_nvmetcp_cb_ops {
+	struct qed_common_cb_ops common;
+};
+
+/**
+ * struct qed_nvmetcp_ops - qed NVMeTCP operations.
+ * @common:		common operations pointer
+ * @ll2:		light L2 operations pointer
+ * @fill_dev_info:	fills NVMeTCP specific information
+ *			@param cdev
+ *			@param info
+ *			@return 0 on success, otherwise error value.
+ * @register_ops:	register nvmetcp operations
+ *			@param cdev
+ *			@param ops - specified using qed_nvmetcp_cb_ops
+ *			@param cookie - driver private
+ * @start:		nvmetcp in FW
+ *			@param cdev
+ *			@param tasks - qed will fill information about tasks
+ *			return 0 on success, otherwise error value.
+ * @stop:		nvmetcp in FW
+ *			@param cdev
+ *			return 0 on success, otherwise error value.
+ */
+struct qed_nvmetcp_ops {
+	const struct qed_common_ops *common;
+
+	const struct qed_ll2_ops *ll2;
+
+	int (*fill_dev_info)(struct qed_dev *cdev,
+			     struct qed_dev_nvmetcp_info *info);
+
+	void (*register_ops)(struct qed_dev *cdev,
+			     struct qed_nvmetcp_cb_ops *ops, void *cookie);
+
+	int (*start)(struct qed_dev *cdev,
+		     struct qed_nvmetcp_tid *tasks,
+		     void *event_context, nvmetcp_event_cb_t async_event_cb);
+
+	int (*stop)(struct qed_dev *cdev);
+};
+
+const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void);
+void qed_put_nvmetcp_ops(void);
+#endif
-- 
2.22.0



* [RFC PATCH v4 02/27] qed: Add NVMeTCP Offload Connection Level FW and HW HSI
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
  2021-04-29 19:09 ` [RFC PATCH v4 01/27] qed: Add NVMeTCP Offload PF Level FW and HW HSI Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-05-01 17:28   ` Hannes Reinecke
  2021-04-29 19:09 ` [RFC PATCH v4 03/27] qed: Add qed-NVMeTCP personality Shai Malin
                   ` (25 subsequent siblings)
  27 siblings, 1 reply; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net  --cc=Jakub
	Kicinski, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	malin1024

This patch introduces the connection level NVMeTCP HSI, together with the
functionality needed to initialize and interact with the HW device at the
connection level.

This includes:
- Connection offload: offload a TCP connection to the FW.
- Connection update: update the ICReq-ICResp params.
- Connection clear SQ: flush the outstanding IOs in the FW.
- Connection termination: terminate the TCP connection and flush the FW.
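
For orientation, the four operations above, plus the acquire/release pair
that brackets them, might be driven in the order sketched below. This is a
userspace mock and not driver code: struct mock_dev, struct mock_nvmetcp_ops,
the trimmed parameter structures and the opaque_fid value of 1 are all
hypothetical stand-ins, the doorbell argument is omitted, and the teardown
order (clear SQ, then terminate, then release) is an assumption inferred from
the op descriptions rather than something this patch mandates.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct qed_dev: it only records call order. */
struct mock_dev {
	char log[16];
	size_t n;
	uint32_t next_icid;
};

/* Trimmed, made-up stand-ins for the qed_nvmetcp_params_* structures. */
struct mock_params_offload { uint16_t mss; uint8_t ip_version; };
struct mock_params_update { uint32_t max_io_size; int hdr_digest_en; };

/* Op names follow the patch; signatures are simplified (no doorbell). */
struct mock_nvmetcp_ops {
	int (*acquire_conn)(struct mock_dev *cdev, uint32_t *handle,
			    uint32_t *fw_cid);
	int (*release_conn)(struct mock_dev *cdev, uint32_t handle);
	int (*offload_conn)(struct mock_dev *cdev, uint32_t handle,
			    struct mock_params_offload *info);
	int (*update_conn)(struct mock_dev *cdev, uint32_t handle,
			   struct mock_params_update *info);
	int (*destroy_conn)(struct mock_dev *cdev, uint32_t handle,
			    uint8_t abrt_conn);
	int (*clear_sq)(struct mock_dev *cdev, uint32_t handle);
};

/* Append one letter per call; the last byte stays 0 as a terminator. */
static void note(struct mock_dev *cdev, char c)
{
	if (cdev->n + 1 < sizeof(cdev->log))
		cdev->log[cdev->n++] = c;
}

static int mock_acquire(struct mock_dev *cdev, uint32_t *handle,
			uint32_t *fw_cid)
{
	*handle = cdev->next_icid++;
	*fw_cid = (1u << 16) | *handle;	/* opaque_fid of 1 is made up */
	note(cdev, 'A');
	return 0;
}

static int mock_release(struct mock_dev *cdev, uint32_t handle)
{
	(void)handle;
	note(cdev, 'R');
	return 0;
}

static int mock_offload(struct mock_dev *cdev, uint32_t handle,
			struct mock_params_offload *info)
{
	(void)handle; (void)info;
	note(cdev, 'O');
	return 0;
}

static int mock_update(struct mock_dev *cdev, uint32_t handle,
		       struct mock_params_update *info)
{
	(void)handle; (void)info;
	note(cdev, 'U');
	return 0;
}

static int mock_destroy(struct mock_dev *cdev, uint32_t handle,
			uint8_t abrt_conn)
{
	(void)handle; (void)abrt_conn;
	note(cdev, 'D');
	return 0;
}

static int mock_clear_sq(struct mock_dev *cdev, uint32_t handle)
{
	(void)handle;
	note(cdev, 'C');
	return 0;
}

static const struct mock_nvmetcp_ops mock_ops = {
	.acquire_conn = mock_acquire, .release_conn = mock_release,
	.offload_conn = mock_offload, .update_conn = mock_update,
	.destroy_conn = mock_destroy, .clear_sq = mock_clear_sq,
};

/* Walk one connection through its assumed lifecycle. */
static int run_lifecycle(const struct mock_nvmetcp_ops *ops,
			 struct mock_dev *cdev)
{
	struct mock_params_offload offl = { .mss = 1460, .ip_version = 4 };
	struct mock_params_update upd = { .max_io_size = 0x800000 };
	uint32_t handle, fw_cid;
	int rc;

	rc = ops->acquire_conn(cdev, &handle, &fw_cid);
	if (rc)
		return rc;

	rc = ops->offload_conn(cdev, handle, &offl);
	if (rc)
		goto release;

	/* After ICReq/ICResp the negotiated limits are pushed to the FW. */
	rc = ops->update_conn(cdev, handle, &upd);

	/* ... I/O would run here ... */

	ops->clear_sq(cdev, handle);
	ops->destroy_conn(cdev, handle, 0);	/* 0: graceful termination */
release:
	ops->release_conn(cdev, handle);
	return rc;
}
```

Calling run_lifecycle(&mock_ops, &dev) on a zero-initialized struct mock_dev
leaves one letter per invoked op in dev.log, which makes the ordering easy to
inspect.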

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
---
 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c | 580 +++++++++++++++++-
 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h |  63 ++
 drivers/net/ethernet/qlogic/qed/qed_sp.h      |   3 +
 include/linux/qed/nvmetcp_common.h            | 143 +++++
 include/linux/qed/qed_nvmetcp_if.h            |  94 +++
 5 files changed, 881 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
index da3b5002d216..79bd1cc6677f 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
@@ -259,6 +259,578 @@ static int qed_nvmetcp_start(struct qed_dev *cdev,
 	return 0;
 }
 
+static struct qed_hash_nvmetcp_con *qed_nvmetcp_get_hash(struct qed_dev *cdev,
+							 u32 handle)
+{
+	struct qed_hash_nvmetcp_con *hash_con = NULL;
+
+	if (!(cdev->flags & QED_FLAG_STORAGE_STARTED))
+		return NULL;
+
+	hash_for_each_possible(cdev->connections, hash_con, node, handle) {
+		if (hash_con->con->icid == handle)
+			break;
+	}
+
+	if (!hash_con || hash_con->con->icid != handle)
+		return NULL;
+
+	return hash_con;
+}
+
+static int qed_sp_nvmetcp_conn_offload(struct qed_hwfn *p_hwfn,
+				       struct qed_nvmetcp_conn *p_conn,
+				       enum spq_mode comp_mode,
+				       struct qed_spq_comp_cb *p_comp_addr)
+{
+	struct nvmetcp_spe_conn_offload *p_ramrod = NULL;
+	struct tcp_offload_params_opt2 *p_tcp2 = NULL;
+	struct qed_sp_init_data init_data = { 0 };
+	struct qed_spq_entry *p_ent = NULL;
+	dma_addr_t r2tq_pbl_addr;
+	dma_addr_t xhq_pbl_addr;
+	dma_addr_t uhq_pbl_addr;
+	u16 physical_q;
+	int rc = 0;
+	u32 dval;
+	u8 i;
+
+	/* Get SPQ entry */
+	init_data.cid = p_conn->icid;
+	init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;
+	init_data.comp_mode = comp_mode;
+	init_data.p_comp_data = p_comp_addr;
+
+	rc = qed_sp_init_request(p_hwfn, &p_ent,
+				 NVMETCP_RAMROD_CMD_ID_OFFLOAD_CONN,
+				 PROTOCOLID_NVMETCP, &init_data);
+	if (rc)
+		return rc;
+
+	p_ramrod = &p_ent->ramrod.nvmetcp_conn_offload;
+
+	/* Transmission PQ is the first of the PF */
+	physical_q = qed_get_cm_pq_idx(p_hwfn, PQ_FLAGS_OFLD);
+	p_conn->physical_q0 = cpu_to_le16(physical_q);
+	p_ramrod->nvmetcp.physical_q0 = cpu_to_le16(physical_q);
+
+	/* nvmetcp Pure-ACK PQ */
+	physical_q = qed_get_cm_pq_idx(p_hwfn, PQ_FLAGS_ACK);
+	p_conn->physical_q1 = cpu_to_le16(physical_q);
+	p_ramrod->nvmetcp.physical_q1 = cpu_to_le16(physical_q);
+
+	p_ramrod->conn_id = cpu_to_le16(p_conn->conn_id);
+
+	DMA_REGPAIR_LE(p_ramrod->nvmetcp.sq_pbl_addr, p_conn->sq_pbl_addr);
+
+	r2tq_pbl_addr = qed_chain_get_pbl_phys(&p_conn->r2tq);
+	DMA_REGPAIR_LE(p_ramrod->nvmetcp.r2tq_pbl_addr, r2tq_pbl_addr);
+
+	xhq_pbl_addr = qed_chain_get_pbl_phys(&p_conn->xhq);
+	DMA_REGPAIR_LE(p_ramrod->nvmetcp.xhq_pbl_addr, xhq_pbl_addr);
+
+	uhq_pbl_addr = qed_chain_get_pbl_phys(&p_conn->uhq);
+	DMA_REGPAIR_LE(p_ramrod->nvmetcp.uhq_pbl_addr, uhq_pbl_addr);
+
+	p_ramrod->nvmetcp.flags = p_conn->offl_flags;
+	p_ramrod->nvmetcp.default_cq = p_conn->default_cq;
+	p_ramrod->nvmetcp.initial_ack = 0;
+
+	DMA_REGPAIR_LE(p_ramrod->nvmetcp.nvmetcp.cccid_itid_table_addr,
+		       p_conn->nvmetcp_cccid_itid_table_addr);
+	p_ramrod->nvmetcp.nvmetcp.cccid_max_range =
+		 cpu_to_le16(p_conn->nvmetcp_cccid_max_range);
+
+	p_tcp2 = &p_ramrod->tcp;
+
+	qed_set_fw_mac_addr(&p_tcp2->remote_mac_addr_hi,
+			    &p_tcp2->remote_mac_addr_mid,
+			    &p_tcp2->remote_mac_addr_lo, p_conn->remote_mac);
+	qed_set_fw_mac_addr(&p_tcp2->local_mac_addr_hi,
+			    &p_tcp2->local_mac_addr_mid,
+			    &p_tcp2->local_mac_addr_lo, p_conn->local_mac);
+
+	p_tcp2->vlan_id = cpu_to_le16(p_conn->vlan_id);
+	p_tcp2->flags = cpu_to_le16(p_conn->tcp_flags);
+
+	p_tcp2->ip_version = p_conn->ip_version;
+	for (i = 0; i < 4; i++) {
+		dval = p_conn->remote_ip[i];
+		p_tcp2->remote_ip[i] = cpu_to_le32(dval);
+		dval = p_conn->local_ip[i];
+		p_tcp2->local_ip[i] = cpu_to_le32(dval);
+	}
+
+	p_tcp2->flow_label = cpu_to_le32(p_conn->flow_label);
+	p_tcp2->ttl = p_conn->ttl;
+	p_tcp2->tos_or_tc = p_conn->tos_or_tc;
+	p_tcp2->remote_port = cpu_to_le16(p_conn->remote_port);
+	p_tcp2->local_port = cpu_to_le16(p_conn->local_port);
+	p_tcp2->mss = cpu_to_le16(p_conn->mss);
+	p_tcp2->rcv_wnd_scale = p_conn->rcv_wnd_scale;
+	p_tcp2->connect_mode = p_conn->connect_mode;
+	p_tcp2->cwnd = cpu_to_le32(p_conn->cwnd);
+	p_tcp2->ka_max_probe_cnt = p_conn->ka_max_probe_cnt;
+	p_tcp2->ka_timeout = cpu_to_le32(p_conn->ka_timeout);
+	p_tcp2->max_rt_time = cpu_to_le32(p_conn->max_rt_time);
+	p_tcp2->ka_interval = cpu_to_le32(p_conn->ka_interval);
+
+	return qed_spq_post(p_hwfn, p_ent, NULL);
+}
+
+static int qed_sp_nvmetcp_conn_update(struct qed_hwfn *p_hwfn,
+				      struct qed_nvmetcp_conn *p_conn,
+				      enum spq_mode comp_mode,
+				      struct qed_spq_comp_cb *p_comp_addr)
+{
+	struct nvmetcp_conn_update_ramrod_params *p_ramrod = NULL;
+	struct qed_spq_entry *p_ent = NULL;
+	struct qed_sp_init_data init_data;
+	int rc = -EINVAL;
+	u32 dval;
+
+	/* Get SPQ entry */
+	memset(&init_data, 0, sizeof(init_data));
+	init_data.cid = p_conn->icid;
+	init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;
+	init_data.comp_mode = comp_mode;
+	init_data.p_comp_data = p_comp_addr;
+
+	rc = qed_sp_init_request(p_hwfn, &p_ent,
+				 NVMETCP_RAMROD_CMD_ID_UPDATE_CONN,
+				 PROTOCOLID_NVMETCP, &init_data);
+	if (rc)
+		return rc;
+
+	p_ramrod = &p_ent->ramrod.nvmetcp_conn_update;
+	p_ramrod->conn_id = cpu_to_le16(p_conn->conn_id);
+	p_ramrod->flags = p_conn->update_flag;
+	p_ramrod->max_seq_size = cpu_to_le32(p_conn->max_seq_size);
+	dval = p_conn->max_recv_pdu_length;
+	p_ramrod->max_recv_pdu_length = cpu_to_le32(dval);
+	dval = p_conn->max_send_pdu_length;
+	p_ramrod->max_send_pdu_length = cpu_to_le32(dval);
+	dval = p_conn->first_seq_length;
+	p_ramrod->first_seq_length = cpu_to_le32(dval);
+
+	return qed_spq_post(p_hwfn, p_ent, NULL);
+}
+
+static int qed_sp_nvmetcp_conn_terminate(struct qed_hwfn *p_hwfn,
+					 struct qed_nvmetcp_conn *p_conn,
+					 enum spq_mode comp_mode,
+					 struct qed_spq_comp_cb *p_comp_addr)
+{
+	struct nvmetcp_spe_conn_termination *p_ramrod = NULL;
+	struct qed_spq_entry *p_ent = NULL;
+	struct qed_sp_init_data init_data;
+	int rc = -EINVAL;
+
+	/* Get SPQ entry */
+	memset(&init_data, 0, sizeof(init_data));
+	init_data.cid = p_conn->icid;
+	init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;
+	init_data.comp_mode = comp_mode;
+	init_data.p_comp_data = p_comp_addr;
+
+	rc = qed_sp_init_request(p_hwfn, &p_ent,
+				 NVMETCP_RAMROD_CMD_ID_TERMINATION_CONN,
+				 PROTOCOLID_NVMETCP, &init_data);
+	if (rc)
+		return rc;
+
+	p_ramrod = &p_ent->ramrod.nvmetcp_conn_terminate;
+	p_ramrod->conn_id = cpu_to_le16(p_conn->conn_id);
+	p_ramrod->abortive = p_conn->abortive_dsconnect;
+
+	return qed_spq_post(p_hwfn, p_ent, NULL);
+}
+
+static int qed_sp_nvmetcp_conn_clear_sq(struct qed_hwfn *p_hwfn,
+					struct qed_nvmetcp_conn *p_conn,
+					enum spq_mode comp_mode,
+					struct qed_spq_comp_cb *p_comp_addr)
+{
+	struct qed_spq_entry *p_ent = NULL;
+	struct qed_sp_init_data init_data;
+	int rc = -EINVAL;
+
+	/* Get SPQ entry */
+	memset(&init_data, 0, sizeof(init_data));
+	init_data.cid = p_conn->icid;
+	init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;
+	init_data.comp_mode = comp_mode;
+	init_data.p_comp_data = p_comp_addr;
+
+	rc = qed_sp_init_request(p_hwfn, &p_ent,
+				 NVMETCP_RAMROD_CMD_ID_CLEAR_SQ,
+				 PROTOCOLID_NVMETCP, &init_data);
+	if (rc)
+		return rc;
+
+	return qed_spq_post(p_hwfn, p_ent, NULL);
+}
+
+static void __iomem *qed_nvmetcp_get_db_addr(struct qed_hwfn *p_hwfn, u32 cid)
+{
+	return (u8 __iomem *)p_hwfn->doorbells +
+			     qed_db_addr(cid, DQ_DEMS_LEGACY);
+}
+
+static int qed_nvmetcp_allocate_connection(struct qed_hwfn *p_hwfn,
+					   struct qed_nvmetcp_conn **p_out_conn)
+{
+	struct qed_chain_init_params params = {
+		.mode		= QED_CHAIN_MODE_PBL,
+		.intended_use	= QED_CHAIN_USE_TO_CONSUME_PRODUCE,
+		.cnt_type	= QED_CHAIN_CNT_TYPE_U16,
+	};
+	struct qed_nvmetcp_pf_params *p_params = NULL;
+	struct qed_nvmetcp_conn *p_conn = NULL;
+	int rc = 0;
+
+	/* Try finding a free connection that can be used */
+	spin_lock_bh(&p_hwfn->p_nvmetcp_info->lock);
+	if (!list_empty(&p_hwfn->p_nvmetcp_info->free_list))
+		p_conn = list_first_entry(&p_hwfn->p_nvmetcp_info->free_list,
+					  struct qed_nvmetcp_conn, list_entry);
+	if (p_conn) {
+		list_del(&p_conn->list_entry);
+		spin_unlock_bh(&p_hwfn->p_nvmetcp_info->lock);
+		*p_out_conn = p_conn;
+
+		return 0;
+	}
+	spin_unlock_bh(&p_hwfn->p_nvmetcp_info->lock);
+
+	/* Need to allocate a new connection */
+	p_params = &p_hwfn->pf_params.nvmetcp_pf_params;
+
+	p_conn = kzalloc(sizeof(*p_conn), GFP_KERNEL);
+	if (!p_conn)
+		return -ENOMEM;
+
+	params.num_elems = p_params->num_r2tq_pages_in_ring *
+			   QED_CHAIN_PAGE_SIZE / sizeof(struct nvmetcp_wqe);
+	params.elem_size = sizeof(struct nvmetcp_wqe);
+
+	rc = qed_chain_alloc(p_hwfn->cdev, &p_conn->r2tq, &params);
+	if (rc)
+		goto nomem_r2tq;
+
+	params.num_elems = p_params->num_uhq_pages_in_ring *
+			   QED_CHAIN_PAGE_SIZE / sizeof(struct iscsi_uhqe);
+	params.elem_size = sizeof(struct iscsi_uhqe);
+
+	rc = qed_chain_alloc(p_hwfn->cdev, &p_conn->uhq, &params);
+	if (rc)
+		goto nomem_uhq;
+
+	params.elem_size = sizeof(struct iscsi_xhqe);
+
+	rc = qed_chain_alloc(p_hwfn->cdev, &p_conn->xhq, &params);
+	if (rc)
+		goto nomem;
+
+	p_conn->free_on_delete = true;
+	*p_out_conn = p_conn;
+
+	return 0;
+
+nomem:
+	qed_chain_free(p_hwfn->cdev, &p_conn->uhq);
+nomem_uhq:
+	qed_chain_free(p_hwfn->cdev, &p_conn->r2tq);
+nomem_r2tq:
+	kfree(p_conn);
+
+	return -ENOMEM;
+}
+
+static int qed_nvmetcp_acquire_connection(struct qed_hwfn *p_hwfn,
+					  struct qed_nvmetcp_conn **p_out_conn)
+{
+	struct qed_nvmetcp_conn *p_conn = NULL;
+	int rc = 0;
+	u32 icid;
+
+	spin_lock_bh(&p_hwfn->p_nvmetcp_info->lock);
+	rc = qed_cxt_acquire_cid(p_hwfn, PROTOCOLID_NVMETCP, &icid);
+	spin_unlock_bh(&p_hwfn->p_nvmetcp_info->lock);
+
+	if (rc)
+		return rc;
+
+	rc = qed_nvmetcp_allocate_connection(p_hwfn, &p_conn);
+	if (rc) {
+		spin_lock_bh(&p_hwfn->p_nvmetcp_info->lock);
+		qed_cxt_release_cid(p_hwfn, icid);
+		spin_unlock_bh(&p_hwfn->p_nvmetcp_info->lock);
+
+		return rc;
+	}
+
+	p_conn->icid = icid;
+	p_conn->conn_id = (u16)icid;
+	p_conn->fw_cid = (p_hwfn->hw_info.opaque_fid << 16) | icid;
+	*p_out_conn = p_conn;
+
+	return rc;
+}
+
+static void qed_nvmetcp_release_connection(struct qed_hwfn *p_hwfn,
+					   struct qed_nvmetcp_conn *p_conn)
+{
+	spin_lock_bh(&p_hwfn->p_nvmetcp_info->lock);
+	list_add_tail(&p_conn->list_entry, &p_hwfn->p_nvmetcp_info->free_list);
+	qed_cxt_release_cid(p_hwfn, p_conn->icid);
+	spin_unlock_bh(&p_hwfn->p_nvmetcp_info->lock);
+}
+
+static void qed_nvmetcp_free_connection(struct qed_hwfn *p_hwfn,
+					struct qed_nvmetcp_conn *p_conn)
+{
+	qed_chain_free(p_hwfn->cdev, &p_conn->xhq);
+	qed_chain_free(p_hwfn->cdev, &p_conn->uhq);
+	qed_chain_free(p_hwfn->cdev, &p_conn->r2tq);
+
+	kfree(p_conn);
+}
+
+int qed_nvmetcp_alloc(struct qed_hwfn *p_hwfn)
+{
+	struct qed_nvmetcp_info *p_nvmetcp_info;
+
+	p_nvmetcp_info = kzalloc(sizeof(*p_nvmetcp_info), GFP_KERNEL);
+	if (!p_nvmetcp_info)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&p_nvmetcp_info->free_list);
+
+	p_hwfn->p_nvmetcp_info = p_nvmetcp_info;
+
+	return 0;
+}
+
+void qed_nvmetcp_setup(struct qed_hwfn *p_hwfn)
+{
+	spin_lock_init(&p_hwfn->p_nvmetcp_info->lock);
+}
+
+void qed_nvmetcp_free(struct qed_hwfn *p_hwfn)
+{
+	struct qed_nvmetcp_conn *p_conn = NULL;
+
+	if (!p_hwfn->p_nvmetcp_info)
+		return;
+
+	while (!list_empty(&p_hwfn->p_nvmetcp_info->free_list)) {
+		p_conn = list_first_entry(&p_hwfn->p_nvmetcp_info->free_list,
+					  struct qed_nvmetcp_conn, list_entry);
+		if (p_conn) {
+			list_del(&p_conn->list_entry);
+			qed_nvmetcp_free_connection(p_hwfn, p_conn);
+		}
+	}
+
+	kfree(p_hwfn->p_nvmetcp_info);
+	p_hwfn->p_nvmetcp_info = NULL;
+}
+
+static int qed_nvmetcp_acquire_conn(struct qed_dev *cdev,
+				    u32 *handle,
+				    u32 *fw_cid, void __iomem **p_doorbell)
+{
+	struct qed_hash_nvmetcp_con *hash_con;
+	int rc;
+
+	/* Allocate a hashed connection */
+	hash_con = kzalloc(sizeof(*hash_con), GFP_ATOMIC);
+	if (!hash_con)
+		return -ENOMEM;
+
+	/* Acquire the connection */
+	rc = qed_nvmetcp_acquire_connection(QED_AFFIN_HWFN(cdev),
+					    &hash_con->con);
+	if (rc) {
+		DP_NOTICE(cdev, "Failed to acquire Connection\n");
+		kfree(hash_con);
+
+		return rc;
+	}
+
+	/* Add the connection to the hash table */
+	*handle = hash_con->con->icid;
+	*fw_cid = hash_con->con->fw_cid;
+	hash_add(cdev->connections, &hash_con->node, *handle);
+
+	if (p_doorbell)
+		*p_doorbell = qed_nvmetcp_get_db_addr(QED_AFFIN_HWFN(cdev),
+						      *handle);
+
+	return 0;
+}
+
+static int qed_nvmetcp_release_conn(struct qed_dev *cdev, u32 handle)
+{
+	struct qed_hash_nvmetcp_con *hash_con;
+
+	hash_con = qed_nvmetcp_get_hash(cdev, handle);
+	if (!hash_con) {
+		DP_NOTICE(cdev, "Failed to find connection for handle %d\n",
+			  handle);
+
+		return -EINVAL;
+	}
+
+	hlist_del(&hash_con->node);
+	qed_nvmetcp_release_connection(QED_AFFIN_HWFN(cdev), hash_con->con);
+	kfree(hash_con);
+
+	return 0;
+}
+
+static int qed_nvmetcp_offload_conn(struct qed_dev *cdev, u32 handle,
+				    struct qed_nvmetcp_params_offload *conn_info)
+{
+	struct qed_hash_nvmetcp_con *hash_con;
+	struct qed_nvmetcp_conn *con;
+
+	hash_con = qed_nvmetcp_get_hash(cdev, handle);
+	if (!hash_con) {
+		DP_NOTICE(cdev, "Failed to find connection for handle %d\n",
+			  handle);
+
+		return -EINVAL;
+	}
+
+	/* Update the connection with information from the params */
+	con = hash_con->con;
+
+	/* FW initializations */
+	con->layer_code = NVMETCP_SLOW_PATH_LAYER_CODE;
+	con->sq_pbl_addr = conn_info->sq_pbl_addr;
+	con->nvmetcp_cccid_max_range = conn_info->nvmetcp_cccid_max_range;
+	con->nvmetcp_cccid_itid_table_addr = conn_info->nvmetcp_cccid_itid_table_addr;
+	con->default_cq = conn_info->default_cq;
+
+	SET_FIELD(con->offl_flags, NVMETCP_CONN_OFFLOAD_PARAMS_TARGET_MODE, 0);
+	SET_FIELD(con->offl_flags, NVMETCP_CONN_OFFLOAD_PARAMS_NVMETCP_MODE, 1);
+	SET_FIELD(con->offl_flags, NVMETCP_CONN_OFFLOAD_PARAMS_TCP_ON_CHIP_1B, 1);
+
+	/* Networking and TCP stack initializations */
+	ether_addr_copy(con->local_mac, conn_info->src.mac);
+	ether_addr_copy(con->remote_mac, conn_info->dst.mac);
+	memcpy(con->local_ip, conn_info->src.ip, sizeof(con->local_ip));
+	memcpy(con->remote_ip, conn_info->dst.ip, sizeof(con->remote_ip));
+	con->local_port = conn_info->src.port;
+	con->remote_port = conn_info->dst.port;
+	con->vlan_id = conn_info->vlan_id;
+
+	if (conn_info->timestamp_en)
+		SET_FIELD(con->tcp_flags, TCP_OFFLOAD_PARAMS_OPT2_TS_EN, 1);
+
+	if (conn_info->delayed_ack_en)
+		SET_FIELD(con->tcp_flags, TCP_OFFLOAD_PARAMS_OPT2_DA_EN, 1);
+
+	if (conn_info->tcp_keep_alive_en)
+		SET_FIELD(con->tcp_flags, TCP_OFFLOAD_PARAMS_OPT2_KA_EN, 1);
+
+	if (conn_info->ecn_en)
+		SET_FIELD(con->tcp_flags, TCP_OFFLOAD_PARAMS_OPT2_ECN_EN, 1);
+
+	con->ip_version = conn_info->ip_version;
+	con->flow_label = QED_TCP_FLOW_LABEL;
+	con->ka_max_probe_cnt = conn_info->ka_max_probe_cnt;
+	con->ka_timeout = conn_info->ka_timeout;
+	con->ka_interval = conn_info->ka_interval;
+	con->max_rt_time = conn_info->max_rt_time;
+	con->ttl = conn_info->ttl;
+	con->tos_or_tc = conn_info->tos_or_tc;
+	con->mss = conn_info->mss;
+	con->cwnd = conn_info->cwnd;
+	con->rcv_wnd_scale = conn_info->rcv_wnd_scale;
+	con->connect_mode = 0; /* TCP_CONNECT_ACTIVE */
+
+	return qed_sp_nvmetcp_conn_offload(QED_AFFIN_HWFN(cdev), con,
+					 QED_SPQ_MODE_EBLOCK, NULL);
+}
+
+static int qed_nvmetcp_update_conn(struct qed_dev *cdev,
+				   u32 handle,
+				   struct qed_nvmetcp_params_update *conn_info)
+{
+	struct qed_hash_nvmetcp_con *hash_con;
+	struct qed_nvmetcp_conn *con;
+
+	hash_con = qed_nvmetcp_get_hash(cdev, handle);
+	if (!hash_con) {
+		DP_NOTICE(cdev, "Failed to find connection for handle %d\n",
+			  handle);
+
+		return -EINVAL;
+	}
+
+	/* Update the connection with information from the params */
+	con = hash_con->con;
+
+	SET_FIELD(con->update_flag,
+		  ISCSI_CONN_UPDATE_RAMROD_PARAMS_INITIAL_R2T, 0);
+	SET_FIELD(con->update_flag,
+		  ISCSI_CONN_UPDATE_RAMROD_PARAMS_IMMEDIATE_DATA, 1);
+
+	if (conn_info->hdr_digest_en)
+		SET_FIELD(con->update_flag, ISCSI_CONN_UPDATE_RAMROD_PARAMS_HD_EN, 1);
+
+	if (conn_info->data_digest_en)
+		SET_FIELD(con->update_flag, ISCSI_CONN_UPDATE_RAMROD_PARAMS_DD_EN, 1);
+
+	/* Placeholder - initialize pfv, cpda, hpda */
+
+	con->max_seq_size = conn_info->max_io_size;
+	con->max_recv_pdu_length = conn_info->max_recv_pdu_length;
+	con->max_send_pdu_length = conn_info->max_send_pdu_length;
+	con->first_seq_length = conn_info->max_io_size;
+
+	return qed_sp_nvmetcp_conn_update(QED_AFFIN_HWFN(cdev), con,
+					QED_SPQ_MODE_EBLOCK, NULL);
+}
+
+static int qed_nvmetcp_clear_conn_sq(struct qed_dev *cdev, u32 handle)
+{
+	struct qed_hash_nvmetcp_con *hash_con;
+
+	hash_con = qed_nvmetcp_get_hash(cdev, handle);
+	if (!hash_con) {
+		DP_NOTICE(cdev, "Failed to find connection for handle %d\n",
+			  handle);
+
+		return -EINVAL;
+	}
+
+	return qed_sp_nvmetcp_conn_clear_sq(QED_AFFIN_HWFN(cdev), hash_con->con,
+					    QED_SPQ_MODE_EBLOCK, NULL);
+}
+
+static int qed_nvmetcp_destroy_conn(struct qed_dev *cdev,
+				    u32 handle, u8 abrt_conn)
+{
+	struct qed_hash_nvmetcp_con *hash_con;
+
+	hash_con = qed_nvmetcp_get_hash(cdev, handle);
+	if (!hash_con) {
+		DP_NOTICE(cdev, "Failed to find connection for handle %d\n",
+			  handle);
+
+		return -EINVAL;
+	}
+
+	hash_con->con->abortive_dsconnect = abrt_conn;
+
+	return qed_sp_nvmetcp_conn_terminate(QED_AFFIN_HWFN(cdev), hash_con->con,
+					   QED_SPQ_MODE_EBLOCK, NULL);
+}
+
 static const struct qed_nvmetcp_ops qed_nvmetcp_ops_pass = {
 	.common = &qed_common_ops_pass,
 	.ll2 = &qed_ll2_ops_pass,
@@ -266,8 +838,12 @@ static const struct qed_nvmetcp_ops qed_nvmetcp_ops_pass = {
 	.register_ops = &qed_register_nvmetcp_ops,
 	.start = &qed_nvmetcp_start,
 	.stop = &qed_nvmetcp_stop,
-
-	/* Placeholder - Connection level ops */
+	.acquire_conn = &qed_nvmetcp_acquire_conn,
+	.release_conn = &qed_nvmetcp_release_conn,
+	.offload_conn = &qed_nvmetcp_offload_conn,
+	.update_conn = &qed_nvmetcp_update_conn,
+	.destroy_conn = &qed_nvmetcp_destroy_conn,
+	.clear_sq = &qed_nvmetcp_clear_conn_sq,
 };
 
 const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h
index 774b46ade408..749169f0bdb1 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h
@@ -19,6 +19,7 @@
 #define QED_NVMETCP_FW_CQ_SIZE (4 * 1024)
 
 /* tcp parameters */
+#define QED_TCP_FLOW_LABEL 0
 #define QED_TCP_TWO_MSL_TIMER 4000
 #define QED_TCP_HALF_WAY_CLOSE_TIMEOUT 10
 #define QED_TCP_MAX_FIN_RT 2
@@ -32,6 +33,68 @@ struct qed_nvmetcp_info {
 	nvmetcp_event_cb_t event_cb;
 };
 
+struct qed_hash_nvmetcp_con {
+	struct hlist_node node;
+	struct qed_nvmetcp_conn *con;
+};
+
+struct qed_nvmetcp_conn {
+	struct list_head list_entry;
+	bool free_on_delete;
+
+	u16 conn_id;
+	u32 icid;
+	u32 fw_cid;
+
+	u8 layer_code;
+	u8 offl_flags;
+	u8 connect_mode;
+
+	dma_addr_t sq_pbl_addr;
+	struct qed_chain r2tq;
+	struct qed_chain xhq;
+	struct qed_chain uhq;
+
+	u8 local_mac[6];
+	u8 remote_mac[6];
+	u8 ip_version;
+	u8 ka_max_probe_cnt;
+
+	u16 vlan_id;
+	u16 tcp_flags;
+	u32 remote_ip[4];
+	u32 local_ip[4];
+
+	u32 flow_label;
+	u32 ka_timeout;
+	u32 ka_interval;
+	u32 max_rt_time;
+
+	u8 ttl;
+	u8 tos_or_tc;
+	u16 remote_port;
+	u16 local_port;
+	u16 mss;
+	u8 rcv_wnd_scale;
+	u32 rcv_wnd;
+	u32 cwnd;
+
+	u8 update_flag;
+	u8 default_cq;
+	u8 abortive_dsconnect;
+
+	u32 max_seq_size;
+	u32 max_recv_pdu_length;
+	u32 max_send_pdu_length;
+	u32 first_seq_length;
+
+	u16 physical_q0;
+	u16 physical_q1;
+
+	u16 nvmetcp_cccid_max_range;
+	dma_addr_t nvmetcp_cccid_itid_table_addr;
+};
+
 #if IS_ENABLED(CONFIG_QED_NVMETCP)
 int qed_nvmetcp_alloc(struct qed_hwfn *p_hwfn);
 void qed_nvmetcp_setup(struct qed_hwfn *p_hwfn);
diff --git a/drivers/net/ethernet/qlogic/qed/qed_sp.h b/drivers/net/ethernet/qlogic/qed/qed_sp.h
index 525159e747a5..60ff3222bf55 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_sp.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_sp.h
@@ -101,6 +101,9 @@ union ramrod_data {
 	struct iscsi_spe_conn_termination iscsi_conn_terminate;
 
 	struct nvmetcp_init_ramrod_params nvmetcp_init;
+	struct nvmetcp_spe_conn_offload nvmetcp_conn_offload;
+	struct nvmetcp_conn_update_ramrod_params nvmetcp_conn_update;
+	struct nvmetcp_spe_conn_termination nvmetcp_conn_terminate;
 
 	struct vf_start_ramrod_data vf_start;
 	struct vf_stop_ramrod_data vf_stop;
diff --git a/include/linux/qed/nvmetcp_common.h b/include/linux/qed/nvmetcp_common.h
index e9ccfc07041d..c8836b71b866 100644
--- a/include/linux/qed/nvmetcp_common.h
+++ b/include/linux/qed/nvmetcp_common.h
@@ -6,6 +6,8 @@
 
 #include "tcp_common.h"
 
+#define NVMETCP_SLOW_PATH_LAYER_CODE (6)
+
 /* NVMeTCP firmware function init parameters */
 struct nvmetcp_spe_func_init {
 	__le16 half_way_close_timeout;
@@ -43,6 +45,10 @@ enum nvmetcp_ramrod_cmd_id {
 	NVMETCP_RAMROD_CMD_ID_UNUSED = 0,
 	NVMETCP_RAMROD_CMD_ID_INIT_FUNC = 1,
 	NVMETCP_RAMROD_CMD_ID_DESTROY_FUNC = 2,
+	NVMETCP_RAMROD_CMD_ID_OFFLOAD_CONN = 3,
+	NVMETCP_RAMROD_CMD_ID_UPDATE_CONN = 4,
+	NVMETCP_RAMROD_CMD_ID_TERMINATION_CONN = 5,
+	NVMETCP_RAMROD_CMD_ID_CLEAR_SQ = 6,
 	MAX_NVMETCP_RAMROD_CMD_ID
 };
 
@@ -51,4 +57,141 @@ struct nvmetcp_glbl_queue_entry {
 	struct regpair reserved;
 };
 
+/* NVMeTCP conn level EQEs */
+enum nvmetcp_eqe_opcode {
+	NVMETCP_EVENT_TYPE_INIT_FUNC = 0, /* Response after init Ramrod */
+	NVMETCP_EVENT_TYPE_DESTROY_FUNC, /* Response after destroy Ramrod */
+	NVMETCP_EVENT_TYPE_OFFLOAD_CONN,/* Response after option 2 offload Ramrod */
+	NVMETCP_EVENT_TYPE_UPDATE_CONN, /* Response after update Ramrod */
+	NVMETCP_EVENT_TYPE_CLEAR_SQ, /* Response after clear sq Ramrod */
+	NVMETCP_EVENT_TYPE_TERMINATE_CONN, /* Response after termination Ramrod */
+	NVMETCP_EVENT_TYPE_RESERVED0,
+	NVMETCP_EVENT_TYPE_RESERVED1,
+	NVMETCP_EVENT_TYPE_ASYN_CONNECT_COMPLETE, /* Connect completed (A-syn EQE) */
+	NVMETCP_EVENT_TYPE_ASYN_TERMINATE_DONE, /* Termination completed (A-syn EQE) */
+	NVMETCP_EVENT_TYPE_START_OF_ERROR_TYPES = 10, /* Separate EQs from err EQs */
+	NVMETCP_EVENT_TYPE_ASYN_ABORT_RCVD, /* TCP RST packet receive (A-syn EQE) */
+	NVMETCP_EVENT_TYPE_ASYN_CLOSE_RCVD, /* TCP FIN packet receive (A-syn EQE) */
+	NVMETCP_EVENT_TYPE_ASYN_SYN_RCVD, /* TCP SYN+ACK packet receive (A-syn EQE) */
+	NVMETCP_EVENT_TYPE_ASYN_MAX_RT_TIME, /* TCP max retransmit time (A-syn EQE) */
+	NVMETCP_EVENT_TYPE_ASYN_MAX_RT_CNT, /* TCP max retransmit count (A-syn EQE) */
+	NVMETCP_EVENT_TYPE_ASYN_MAX_KA_PROBES_CNT, /* TCP ka probes count (A-syn EQE) */
+	NVMETCP_EVENT_TYPE_ASYN_FIN_WAIT2, /* TCP fin wait 2 (A-syn EQE) */
+	NVMETCP_EVENT_TYPE_NVMETCP_CONN_ERROR, /* NVMeTCP error response (A-syn EQE) */
+	NVMETCP_EVENT_TYPE_TCP_CONN_ERROR, /* NVMeTCP error - tcp error (A-syn EQE) */
+	MAX_NVMETCP_EQE_OPCODE
+};
+
+struct nvmetcp_conn_offload_section {
+	struct regpair cccid_itid_table_addr; /* CCCID to iTID table address */
+	__le16 cccid_max_range; /* CCCID max value - used for validation */
+	__le16 reserved[3];
+};
+
+/* NVMe TCP connection offload params passed by driver to FW in NVMeTCP offload ramrod */
+struct nvmetcp_conn_offload_params {
+	struct regpair sq_pbl_addr;
+	struct regpair r2tq_pbl_addr;
+	struct regpair xhq_pbl_addr;
+	struct regpair uhq_pbl_addr;
+	__le16 physical_q0;
+	__le16 physical_q1;
+	u8 flags;
+#define NVMETCP_CONN_OFFLOAD_PARAMS_TCP_ON_CHIP_1B_MASK 0x1
+#define NVMETCP_CONN_OFFLOAD_PARAMS_TCP_ON_CHIP_1B_SHIFT 0
+#define NVMETCP_CONN_OFFLOAD_PARAMS_TARGET_MODE_MASK 0x1
+#define NVMETCP_CONN_OFFLOAD_PARAMS_TARGET_MODE_SHIFT 1
+#define NVMETCP_CONN_OFFLOAD_PARAMS_RESTRICTED_MODE_MASK 0x1
+#define NVMETCP_CONN_OFFLOAD_PARAMS_RESTRICTED_MODE_SHIFT 2
+#define NVMETCP_CONN_OFFLOAD_PARAMS_NVMETCP_MODE_MASK 0x1
+#define NVMETCP_CONN_OFFLOAD_PARAMS_NVMETCP_MODE_SHIFT 3
+#define NVMETCP_CONN_OFFLOAD_PARAMS_RESERVED1_MASK 0xF
+#define NVMETCP_CONN_OFFLOAD_PARAMS_RESERVED1_SHIFT 4
+	u8 default_cq;
+	__le16 reserved0;
+	__le32 reserved1;
+	__le32 initial_ack;
+
+	struct nvmetcp_conn_offload_section nvmetcp; /* NVMe/TCP section */
+};
+
+/* NVMe TCP and TCP connection offload params passed by driver to FW in NVMeTCP offload ramrod. */
+struct nvmetcp_spe_conn_offload {
+	__le16 reserved;
+	__le16 conn_id;
+	__le32 fw_cid;
+	struct nvmetcp_conn_offload_params nvmetcp;
+	struct tcp_offload_params_opt2 tcp;
+};
+
+/* NVMeTCP connection update params passed by driver to FW in NVMETCP update ramrod. */
+struct nvmetcp_conn_update_ramrod_params {
+	__le16 reserved0;
+	__le16 conn_id;
+	__le32 reserved1;
+	u8 flags;
+#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_HD_EN_MASK 0x1
+#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_HD_EN_SHIFT 0
+#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_DD_EN_MASK 0x1
+#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_DD_EN_SHIFT 1
+#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED0_MASK 0x1
+#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED0_SHIFT 2
+#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED1_MASK 0x1
+#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED1_DATA_SHIFT 3
+#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED2_MASK 0x1
+#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED2_SHIFT 4
+#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED3_MASK 0x1
+#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED3_SHIFT 5
+#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED4_MASK 0x1
+#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED4_SHIFT 6
+#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED5_MASK 0x1
+#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED5_SHIFT 7
+	u8 reserved3[3];
+	__le32 max_seq_size;
+	__le32 max_send_pdu_length;
+	__le32 max_recv_pdu_length;
+	__le32 first_seq_length;
+	__le32 reserved4[5];
+};
+
+/* NVMeTCP connection termination request */
+struct nvmetcp_spe_conn_termination {
+	__le16 reserved0;
+	__le16 conn_id;
+	__le32 reserved1;
+	u8 abortive;
+	u8 reserved2[7];
+	struct regpair reserved3;
+	struct regpair reserved4;
+};
+
+struct nvmetcp_dif_flags {
+	u8 flags;
+};
+
+enum nvmetcp_wqe_type {
+	NVMETCP_WQE_TYPE_NORMAL,
+	NVMETCP_WQE_TYPE_TASK_CLEANUP,
+	NVMETCP_WQE_TYPE_MIDDLE_PATH,
+	NVMETCP_WQE_TYPE_IC,
+	MAX_NVMETCP_WQE_TYPE
+};
+
+struct nvmetcp_wqe {
+	__le16 task_id;
+	u8 flags;
+#define NVMETCP_WQE_WQE_TYPE_MASK 0x7 /* [use nvmetcp_wqe_type] */
+#define NVMETCP_WQE_WQE_TYPE_SHIFT 0
+#define NVMETCP_WQE_NUM_SGES_MASK 0xF
+#define NVMETCP_WQE_NUM_SGES_SHIFT 3
+#define NVMETCP_WQE_RESPONSE_MASK 0x1
+#define NVMETCP_WQE_RESPONSE_SHIFT 7
+	struct nvmetcp_dif_flags prot_flags;
+	__le32 contlen_cdbsize;
+#define NVMETCP_WQE_CONT_LEN_MASK 0xFFFFFF
+#define NVMETCP_WQE_CONT_LEN_SHIFT 0
+#define NVMETCP_WQE_CDB_SIZE_OR_NVMETCP_CMD_MASK 0xFF
+#define NVMETCP_WQE_CDB_SIZE_OR_NVMETCP_CMD_SHIFT 24
+};
+
 #endif /* __NVMETCP_COMMON__ */
diff --git a/include/linux/qed/qed_nvmetcp_if.h b/include/linux/qed/qed_nvmetcp_if.h
index abc1f41862e3..96263e3cfa1e 100644
--- a/include/linux/qed/qed_nvmetcp_if.h
+++ b/include/linux/qed/qed_nvmetcp_if.h
@@ -25,6 +25,50 @@ struct qed_nvmetcp_tid {
 	u8 *blocks[MAX_TID_BLOCKS_NVMETCP];
 };
 
+struct qed_nvmetcp_id_params {
+	u8 mac[ETH_ALEN];
+	u32 ip[4];
+	u16 port;
+};
+
+struct qed_nvmetcp_params_offload {
+	/* FW initializations */
+	dma_addr_t sq_pbl_addr;
+	dma_addr_t nvmetcp_cccid_itid_table_addr;
+	u16 nvmetcp_cccid_max_range;
+	u8 default_cq;
+
+	/* Networking and TCP stack initializations */
+	struct qed_nvmetcp_id_params src;
+	struct qed_nvmetcp_id_params dst;
+	u32 ka_timeout;
+	u32 ka_interval;
+	u32 max_rt_time;
+	u32 cwnd;
+	u16 mss;
+	u16 vlan_id;
+	bool timestamp_en;
+	bool delayed_ack_en;
+	bool tcp_keep_alive_en;
+	bool ecn_en;
+	u8 ip_version;
+	u8 ka_max_probe_cnt;
+	u8 ttl;
+	u8 tos_or_tc;
+	u8 rcv_wnd_scale;
+};
+
+struct qed_nvmetcp_params_update {
+	u32 max_io_size;
+	u32 max_recv_pdu_length;
+	u32 max_send_pdu_length;
+
+	/* Placeholder: pfv, cpda, hpda */
+
+	bool hdr_digest_en;
+	bool data_digest_en;
+};
+
 struct qed_nvmetcp_cb_ops {
 	struct qed_common_cb_ops common;
 };
@@ -48,6 +92,38 @@ struct qed_nvmetcp_cb_ops {
  * @stop:		nvmetcp in FW
  *			@param cdev
  *			return 0 on success, otherwise error value.
+ * @acquire_conn:	acquire a new nvmetcp connection
+ *			@param cdev
+ *			@param handle - qed will fill handle that should be
+ *				used henceforth as identifier of the
+ *				connection.
+ *			@param p_doorbell - qed will fill the address of the
+ *				doorbell.
+ *			@return 0 on success, otherwise error value.
+ * @release_conn:	release a previously acquired nvmetcp connection
+ *			@param cdev
+ *			@param handle - the connection handle.
+ *			@return 0 on success, otherwise error value.
+ * @offload_conn:	configures an offloaded connection
+ *			@param cdev
+ *			@param handle - the connection handle.
+ *			@param conn_info - the configuration to use for the
+ *				offload.
+ *			@return 0 on success, otherwise error value.
+ * @update_conn:	updates an offloaded connection
+ *			@param cdev
+ *			@param handle - the connection handle.
+ *			@param conn_info - the configuration to use for the
+ *				update.
+ *			@return 0 on success, otherwise error value.
+ * @destroy_conn:	stops an offloaded connection
+ *			@param cdev
+ *			@param handle - the connection handle.
+ *			@return 0 on success, otherwise error value.
+ * @clear_sq:		clear all tasks in sq
+ *			@param cdev
+ *			@param handle - the connection handle.
+ *			@return 0 on success, otherwise error value.
  */
 struct qed_nvmetcp_ops {
 	const struct qed_common_ops *common;
@@ -65,6 +141,24 @@ struct qed_nvmetcp_ops {
 		     void *event_context, nvmetcp_event_cb_t async_event_cb);
 
 	int (*stop)(struct qed_dev *cdev);
+
+	int (*acquire_conn)(struct qed_dev *cdev,
+			    u32 *handle,
+			    u32 *fw_cid, void __iomem **p_doorbell);
+
+	int (*release_conn)(struct qed_dev *cdev, u32 handle);
+
+	int (*offload_conn)(struct qed_dev *cdev,
+			    u32 handle,
+			    struct qed_nvmetcp_params_offload *conn_info);
+
+	int (*update_conn)(struct qed_dev *cdev,
+			   u32 handle,
+			   struct qed_nvmetcp_params_update *conn_info);
+
+	int (*destroy_conn)(struct qed_dev *cdev, u32 handle, u8 abrt_conn);
+
+	int (*clear_sq)(struct qed_dev *cdev, u32 handle);
 };
 
 const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void);
-- 
2.22.0
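
Note for readers: the connection ops declared above take a handle that
acquire_conn fills in and every later call reuses. The sketch below is
a user-space mock of that calling pattern, not qed code. The mock_
names, the state enum, the table size and the error values are all
assumptions made for illustration; only the op names and the
handle-based style come from the patch. update_conn is left out for
brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical states; the real qed state tracking is not shown here. */
enum mock_conn_state {
	MOCK_CONN_FREE,
	MOCK_CONN_ACQUIRED,
	MOCK_CONN_OFFLOADED,
};

#define MOCK_MAX_CONNS 4	/* hypothetical limit */

static enum mock_conn_state mock_conns[MOCK_MAX_CONNS];

/* Same shape as ->acquire_conn(): fills *handle for all later calls. */
static int mock_acquire_conn(uint32_t *handle)
{
	uint32_t i;

	for (i = 0; i < MOCK_MAX_CONNS; i++) {
		if (mock_conns[i] == MOCK_CONN_FREE) {
			mock_conns[i] = MOCK_CONN_ACQUIRED;
			*handle = i;
			return 0;
		}
	}
	return -ENOMEM;
}

/* Only an acquired, not yet offloaded connection may be offloaded. */
static int mock_offload_conn(uint32_t handle)
{
	if (handle >= MOCK_MAX_CONNS ||
	    mock_conns[handle] != MOCK_CONN_ACQUIRED)
		return -EINVAL;
	mock_conns[handle] = MOCK_CONN_OFFLOADED;
	return 0;
}

/* Stops an offloaded connection; the handle stays acquired. */
static int mock_destroy_conn(uint32_t handle)
{
	if (handle >= MOCK_MAX_CONNS ||
	    mock_conns[handle] != MOCK_CONN_OFFLOADED)
		return -EINVAL;
	mock_conns[handle] = MOCK_CONN_ACQUIRED;
	return 0;
}

/* Returns the handle to the free pool. */
static int mock_release_conn(uint32_t handle)
{
	if (handle >= MOCK_MAX_CONNS ||
	    mock_conns[handle] != MOCK_CONN_ACQUIRED)
		return -EINVAL;
	mock_conns[handle] = MOCK_CONN_FREE;
	return 0;
}

/* acquire -> offload -> destroy -> release, unwinding on failure. */
int mock_conn_lifecycle(void)
{
	uint32_t handle;
	int rc;

	rc = mock_acquire_conn(&handle);
	if (rc)
		return rc;

	rc = mock_offload_conn(handle);
	if (rc)
		goto release;

	rc = mock_destroy_conn(handle);
release:
	mock_release_conn(handle);
	return rc;
}
```

The goto-based unwind mirrors the usual kernel error-handling idiom: a
handle that was acquired is always released, whether or not the
offload step succeeded.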



* [RFC PATCH v4 03/27] qed: Add qed-NVMeTCP personality
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
  2021-04-29 19:09 ` [RFC PATCH v4 01/27] qed: Add NVMeTCP Offload PF Level FW and HW HSI Shai Malin
  2021-04-29 19:09 ` [RFC PATCH v4 02/27] qed: Add NVMeTCP Offload Connection " Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-05-02 11:11   ` Hannes Reinecke
  2021-04-29 19:09 ` [RFC PATCH v4 04/27] qed: Add support of HW filter block Shai Malin
                   ` (24 subsequent siblings)
  27 siblings, 1 reply; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net  --cc=Jakub
	Kicinski, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	malin1024

From: Omkar Kulkarni <okulkarni@marvell.com>

This patch adds the qed NVMeTCP personality in order to support the
NVMeTCP qed functionality and manage the shared HW device resources.
The same design is used with Eth (qede), RDMA (qedr), iSCSI (qedi) and
FCoE (qedf).

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
---
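Note for readers: most hunks in this patch place NVMeTCP next to an
existing iSCSI or storage check. The sketch below groups those checks
into three predicates as a reading aid. It is a user-space mock: the
enum is a trimmed copy of the personality list and every mock_ name is
hypothetical, not part of qed.

```c
#include <assert.h>
#include <stdbool.h>

/* Trimmed personality list; ordering follows the qed.h hunk below. */
enum mock_pci_personality {
	MOCK_PCI_ETH,
	MOCK_PCI_FCOE,
	MOCK_PCI_ISCSI,
	MOCK_PCI_NVMETCP,
	MOCK_PCI_ETH_ROCE,
};

/* Storage PFs are bound to a single engine (qed_llh_set_engine_affin). */
bool mock_is_storage(enum mock_pci_personality p)
{
	return p == MOCK_PCI_FCOE || p == MOCK_PCI_ISCSI ||
	       p == MOCK_PCI_NVMETCP;
}

/* The TCP-based offloads share the OOO queue and TCP search setup. */
bool mock_is_tcp_offload(enum mock_pci_personality p)
{
	return p == MOCK_PCI_ISCSI || p == MOCK_PCI_NVMETCP;
}

/* Unlike iSCSI, NVMeTCP skips the LLH MAC filter in qed_ll2_start(). */
bool mock_wants_llh_mac_filter(enum mock_pci_personality p)
{
	return p != MOCK_PCI_NVMETCP;
}
```

The third predicate is the one place in this patch where NVMeTCP
diverges from iSCSI rather than following it.
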
 drivers/net/ethernet/qlogic/qed/qed.h         |  3 ++
 drivers/net/ethernet/qlogic/qed/qed_cxt.c     | 32 ++++++++++++++
 drivers/net/ethernet/qlogic/qed/qed_cxt.h     |  1 +
 drivers/net/ethernet/qlogic/qed/qed_dev.c     | 44 ++++++++++++++++---
 drivers/net/ethernet/qlogic/qed/qed_hsi.h     |  3 +-
 drivers/net/ethernet/qlogic/qed/qed_ll2.c     | 31 ++++++++-----
 drivers/net/ethernet/qlogic/qed/qed_mcp.c     |  3 ++
 drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c |  3 +-
 drivers/net/ethernet/qlogic/qed/qed_ooo.c     |  5 ++-
 .../net/ethernet/qlogic/qed/qed_sp_commands.c |  1 +
 10 files changed, 108 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h
index 91d4635009ab..7ae648c4edba 100644
--- a/drivers/net/ethernet/qlogic/qed/qed.h
+++ b/drivers/net/ethernet/qlogic/qed/qed.h
@@ -200,6 +200,7 @@ enum qed_pci_personality {
 	QED_PCI_ETH,
 	QED_PCI_FCOE,
 	QED_PCI_ISCSI,
+	QED_PCI_NVMETCP,
 	QED_PCI_ETH_ROCE,
 	QED_PCI_ETH_IWARP,
 	QED_PCI_ETH_RDMA,
@@ -285,6 +286,8 @@ struct qed_hw_info {
 	((dev)->hw_info.personality == QED_PCI_FCOE)
 #define QED_IS_ISCSI_PERSONALITY(dev)					\
 	((dev)->hw_info.personality == QED_PCI_ISCSI)
+#define QED_IS_NVMETCP_PERSONALITY(dev)					\
+	((dev)->hw_info.personality == QED_PCI_NVMETCP)
 
 	/* Resource Allocation scheme results */
 	u32				resc_start[QED_MAX_RESC];
diff --git a/drivers/net/ethernet/qlogic/qed/qed_cxt.c b/drivers/net/ethernet/qlogic/qed/qed_cxt.c
index 0a22f8ce9a2c..6cef75723e38 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_cxt.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_cxt.c
@@ -2106,6 +2106,30 @@ int qed_cxt_set_pf_params(struct qed_hwfn *p_hwfn, u32 rdma_tasks)
 		}
 		break;
 	}
+	case QED_PCI_NVMETCP:
+	{
+		struct qed_nvmetcp_pf_params *p_params;
+
+		p_params = &p_hwfn->pf_params.nvmetcp_pf_params;
+
+		if (p_params->num_cons && p_params->num_tasks) {
+			qed_cxt_set_proto_cid_count(p_hwfn,
+						    PROTOCOLID_NVMETCP,
+						    p_params->num_cons,
+						    0);
+
+			qed_cxt_set_proto_tid_count(p_hwfn,
+						    PROTOCOLID_NVMETCP,
+						    QED_CTX_NVMETCP_TID_SEG,
+						    0,
+						    p_params->num_tasks,
+						    true);
+		} else {
+			DP_INFO(p_hwfn->cdev,
+				"NvmeTCP personality used without setting params!\n");
+		}
+		break;
+	}
 	default:
 		return -EINVAL;
 	}
@@ -2132,6 +2156,10 @@ int qed_cxt_get_tid_mem_info(struct qed_hwfn *p_hwfn,
 		proto = PROTOCOLID_ISCSI;
 		seg = QED_CXT_ISCSI_TID_SEG;
 		break;
+	case QED_PCI_NVMETCP:
+		proto = PROTOCOLID_NVMETCP;
+		seg = QED_CTX_NVMETCP_TID_SEG;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -2458,6 +2486,10 @@ int qed_cxt_get_task_ctx(struct qed_hwfn *p_hwfn,
 		proto = PROTOCOLID_ISCSI;
 		seg = QED_CXT_ISCSI_TID_SEG;
 		break;
+	case QED_PCI_NVMETCP:
+		proto = PROTOCOLID_NVMETCP;
+		seg = QED_CTX_NVMETCP_TID_SEG;
+		break;
 	default:
 		return -EINVAL;
 	}
diff --git a/drivers/net/ethernet/qlogic/qed/qed_cxt.h b/drivers/net/ethernet/qlogic/qed/qed_cxt.h
index 056e79620a0e..8f1a77cb33f6 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_cxt.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_cxt.h
@@ -51,6 +51,7 @@ int qed_cxt_get_tid_mem_info(struct qed_hwfn *p_hwfn,
 			     struct qed_tid_mem *p_info);
 
 #define QED_CXT_ISCSI_TID_SEG	PROTOCOLID_ISCSI
+#define QED_CTX_NVMETCP_TID_SEG PROTOCOLID_NVMETCP
 #define QED_CXT_ROCE_TID_SEG	PROTOCOLID_ROCE
 #define QED_CXT_FCOE_TID_SEG	PROTOCOLID_FCOE
 enum qed_cxt_elem_type {
diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index d2f5855b2ea7..d3f8cc42d07e 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -37,6 +37,7 @@
 #include "qed_sriov.h"
 #include "qed_vf.h"
 #include "qed_rdma.h"
+#include "qed_nvmetcp.h"
 
 static DEFINE_SPINLOCK(qm_lock);
 
@@ -667,7 +668,8 @@ qed_llh_set_engine_affin(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt)
 	}
 
 	/* Storage PF is bound to a single engine while L2 PF uses both */
-	if (QED_IS_FCOE_PERSONALITY(p_hwfn) || QED_IS_ISCSI_PERSONALITY(p_hwfn))
+	if (QED_IS_FCOE_PERSONALITY(p_hwfn) || QED_IS_ISCSI_PERSONALITY(p_hwfn) ||
+	    QED_IS_NVMETCP_PERSONALITY(p_hwfn))
 		eng = cdev->fir_affin ? QED_ENG1 : QED_ENG0;
 	else			/* L2_PERSONALITY */
 		eng = QED_BOTH_ENG;
@@ -1164,6 +1166,9 @@ void qed_llh_remove_mac_filter(struct qed_dev *cdev,
 	if (!test_bit(QED_MF_LLH_MAC_CLSS, &cdev->mf_bits))
 		goto out;
 
+	if (QED_IS_NVMETCP_PERSONALITY(p_hwfn))
+		return;
+
 	ether_addr_copy(filter.mac.addr, mac_addr);
 	rc = qed_llh_shadow_remove_filter(cdev, ppfid, &filter, &filter_idx,
 					  &ref_cnt);
@@ -1381,6 +1386,11 @@ void qed_resc_free(struct qed_dev *cdev)
 			qed_ooo_free(p_hwfn);
 		}
 
+		if (p_hwfn->hw_info.personality == QED_PCI_NVMETCP) {
+			qed_nvmetcp_free(p_hwfn);
+			qed_ooo_free(p_hwfn);
+		}
+
 		if (QED_IS_RDMA_PERSONALITY(p_hwfn) && rdma_info) {
 			qed_spq_unregister_async_cb(p_hwfn, rdma_info->proto);
 			qed_rdma_info_free(p_hwfn);
@@ -1423,6 +1433,7 @@ static u32 qed_get_pq_flags(struct qed_hwfn *p_hwfn)
 		flags |= PQ_FLAGS_OFLD;
 		break;
 	case QED_PCI_ISCSI:
+	case QED_PCI_NVMETCP:
 		flags |= PQ_FLAGS_ACK | PQ_FLAGS_OOO | PQ_FLAGS_OFLD;
 		break;
 	case QED_PCI_ETH_ROCE:
@@ -2269,6 +2280,12 @@ int qed_resc_alloc(struct qed_dev *cdev)
 							PROTOCOLID_ISCSI,
 							NULL);
 			n_eqes += 2 * num_cons;
+		} else if (p_hwfn->hw_info.personality == QED_PCI_NVMETCP) {
+			num_cons =
+			    qed_cxt_get_proto_cid_count(p_hwfn,
+							PROTOCOLID_NVMETCP,
+							NULL);
+			n_eqes += 2 * num_cons;
 		}
 
 		if (n_eqes > 0xFFFF) {
@@ -2313,6 +2330,15 @@ int qed_resc_alloc(struct qed_dev *cdev)
 				goto alloc_err;
 		}
 
+		if (p_hwfn->hw_info.personality == QED_PCI_NVMETCP) {
+			rc = qed_nvmetcp_alloc(p_hwfn);
+			if (rc)
+				goto alloc_err;
+			rc = qed_ooo_alloc(p_hwfn);
+			if (rc)
+				goto alloc_err;
+		}
+
 		if (QED_IS_RDMA_PERSONALITY(p_hwfn)) {
 			rc = qed_rdma_info_alloc(p_hwfn);
 			if (rc)
@@ -2393,6 +2419,11 @@ void qed_resc_setup(struct qed_dev *cdev)
 			qed_iscsi_setup(p_hwfn);
 			qed_ooo_setup(p_hwfn);
 		}
+
+		if (p_hwfn->hw_info.personality == QED_PCI_NVMETCP) {
+			qed_nvmetcp_setup(p_hwfn);
+			qed_ooo_setup(p_hwfn);
+		}
 	}
 }
 
@@ -2854,7 +2885,8 @@ static int qed_hw_init_pf(struct qed_hwfn *p_hwfn,
 
 	/* Protocol Configuration */
 	STORE_RT_REG(p_hwfn, PRS_REG_SEARCH_TCP_RT_OFFSET,
-		     (p_hwfn->hw_info.personality == QED_PCI_ISCSI) ? 1 : 0);
+		     ((p_hwfn->hw_info.personality == QED_PCI_ISCSI) ||
+			 (p_hwfn->hw_info.personality == QED_PCI_NVMETCP)) ? 1 : 0);
 	STORE_RT_REG(p_hwfn, PRS_REG_SEARCH_FCOE_RT_OFFSET,
 		     (p_hwfn->hw_info.personality == QED_PCI_FCOE) ? 1 : 0);
 	STORE_RT_REG(p_hwfn, PRS_REG_SEARCH_ROCE_RT_OFFSET, 0);
@@ -3531,7 +3563,7 @@ static void qed_hw_set_feat(struct qed_hwfn *p_hwfn)
 					       RESC_NUM(p_hwfn,
 							QED_CMDQS_CQS));
 
-	if (QED_IS_ISCSI_PERSONALITY(p_hwfn))
+	if (QED_IS_ISCSI_PERSONALITY(p_hwfn) || QED_IS_NVMETCP_PERSONALITY(p_hwfn))
 		feat_num[QED_ISCSI_CQ] = min_t(u32, sb_cnt.cnt,
 					       RESC_NUM(p_hwfn,
 							QED_CMDQS_CQS));
@@ -3734,7 +3766,8 @@ int qed_hw_get_dflt_resc(struct qed_hwfn *p_hwfn,
 		break;
 	case QED_BDQ:
 		if (p_hwfn->hw_info.personality != QED_PCI_ISCSI &&
-		    p_hwfn->hw_info.personality != QED_PCI_FCOE)
+		    p_hwfn->hw_info.personality != QED_PCI_FCOE &&
+		    p_hwfn->hw_info.personality != QED_PCI_NVMETCP)
 			*p_resc_num = 0;
 		else
 			*p_resc_num = 1;
@@ -3755,7 +3788,8 @@ int qed_hw_get_dflt_resc(struct qed_hwfn *p_hwfn,
 			*p_resc_start = 0;
 		else if (p_hwfn->cdev->num_ports_in_engine == 4)
 			*p_resc_start = p_hwfn->port_id;
-		else if (p_hwfn->hw_info.personality == QED_PCI_ISCSI)
+		else if (p_hwfn->hw_info.personality == QED_PCI_ISCSI ||
+			 p_hwfn->hw_info.personality == QED_PCI_NVMETCP)
 			*p_resc_start = p_hwfn->port_id;
 		else if (p_hwfn->hw_info.personality == QED_PCI_FCOE)
 			*p_resc_start = p_hwfn->port_id + 2;
diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
index 24472f6a83c2..9c9ec8f53ef8 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
@@ -12148,7 +12148,8 @@ struct public_func {
 #define FUNC_MF_CFG_PROTOCOL_ISCSI              0x00000010
 #define FUNC_MF_CFG_PROTOCOL_FCOE               0x00000020
 #define FUNC_MF_CFG_PROTOCOL_ROCE               0x00000030
-#define FUNC_MF_CFG_PROTOCOL_MAX	0x00000030
+#define FUNC_MF_CFG_PROTOCOL_NVMETCP    0x00000040
+#define FUNC_MF_CFG_PROTOCOL_MAX	0x00000040
 
 #define FUNC_MF_CFG_MIN_BW_MASK		0x0000ff00
 #define FUNC_MF_CFG_MIN_BW_SHIFT	8
diff --git a/drivers/net/ethernet/qlogic/qed/qed_ll2.c b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
index 49783f365079..88bfcdcd4a4c 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_ll2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
@@ -960,7 +960,8 @@ static int qed_sp_ll2_rx_queue_start(struct qed_hwfn *p_hwfn,
 
 	if (test_bit(QED_MF_LL2_NON_UNICAST, &p_hwfn->cdev->mf_bits) &&
 	    p_ramrod->main_func_queue && conn_type != QED_LL2_TYPE_ROCE &&
-	    conn_type != QED_LL2_TYPE_IWARP) {
+	    conn_type != QED_LL2_TYPE_IWARP &&
+	    !QED_IS_NVMETCP_PERSONALITY(p_hwfn)) {
 		p_ramrod->mf_si_bcast_accept_all = 1;
 		p_ramrod->mf_si_mcast_accept_all = 1;
 	} else {
@@ -1049,6 +1050,8 @@ static int qed_sp_ll2_tx_queue_start(struct qed_hwfn *p_hwfn,
 	case QED_LL2_TYPE_OOO:
 		if (p_hwfn->hw_info.personality == QED_PCI_ISCSI)
 			p_ramrod->conn_type = PROTOCOLID_ISCSI;
+		else if (p_hwfn->hw_info.personality == QED_PCI_NVMETCP)
+			p_ramrod->conn_type = PROTOCOLID_NVMETCP;
 		else
 			p_ramrod->conn_type = PROTOCOLID_IWARP;
 		break;
@@ -1634,7 +1637,8 @@ int qed_ll2_establish_connection(void *cxt, u8 connection_handle)
 	if (rc)
 		goto out;
 
-	if (!QED_IS_RDMA_PERSONALITY(p_hwfn))
+	if (!QED_IS_RDMA_PERSONALITY(p_hwfn) &&
+	    !QED_IS_NVMETCP_PERSONALITY(p_hwfn))
 		qed_wr(p_hwfn, p_ptt, PRS_REG_USE_LIGHT_L2, 1);
 
 	qed_ll2_establish_connection_ooo(p_hwfn, p_ll2_conn);
@@ -2376,7 +2380,8 @@ static int qed_ll2_start_ooo(struct qed_hwfn *p_hwfn,
 static bool qed_ll2_is_storage_eng1(struct qed_dev *cdev)
 {
 	return (QED_IS_FCOE_PERSONALITY(QED_LEADING_HWFN(cdev)) ||
-		QED_IS_ISCSI_PERSONALITY(QED_LEADING_HWFN(cdev))) &&
+		QED_IS_ISCSI_PERSONALITY(QED_LEADING_HWFN(cdev)) ||
+		QED_IS_NVMETCP_PERSONALITY(QED_LEADING_HWFN(cdev))) &&
 		(QED_AFFIN_HWFN(cdev) != QED_LEADING_HWFN(cdev));
 }
 
@@ -2402,11 +2407,13 @@ static int qed_ll2_stop(struct qed_dev *cdev)
 
 	if (cdev->ll2->handle == QED_LL2_UNUSED_HANDLE)
 		return 0;
+	if (!QED_IS_NVMETCP_PERSONALITY(p_hwfn))
+		qed_llh_remove_mac_filter(cdev, 0, cdev->ll2_mac_address);
 
 	qed_llh_remove_mac_filter(cdev, 0, cdev->ll2_mac_address);
 	eth_zero_addr(cdev->ll2_mac_address);
 
-	if (QED_IS_ISCSI_PERSONALITY(p_hwfn))
+	if (QED_IS_ISCSI_PERSONALITY(p_hwfn) || QED_IS_NVMETCP_PERSONALITY(p_hwfn))
 		qed_ll2_stop_ooo(p_hwfn);
 
 	/* In CMT mode, LL2 is always started on engine 0 for a storage PF */
@@ -2442,6 +2449,7 @@ static int __qed_ll2_start(struct qed_hwfn *p_hwfn,
 		conn_type = QED_LL2_TYPE_FCOE;
 		break;
 	case QED_PCI_ISCSI:
+	case QED_PCI_NVMETCP:
 		conn_type = QED_LL2_TYPE_ISCSI;
 		break;
 	case QED_PCI_ETH_ROCE:
@@ -2567,7 +2575,7 @@ static int qed_ll2_start(struct qed_dev *cdev, struct qed_ll2_params *params)
 		}
 	}
 
-	if (QED_IS_ISCSI_PERSONALITY(p_hwfn)) {
+	if (QED_IS_ISCSI_PERSONALITY(p_hwfn) || QED_IS_NVMETCP_PERSONALITY(p_hwfn)) {
 		DP_VERBOSE(cdev, QED_MSG_STORAGE, "Starting OOO LL2 queue\n");
 		rc = qed_ll2_start_ooo(p_hwfn, params);
 		if (rc) {
@@ -2576,10 +2584,13 @@ static int qed_ll2_start(struct qed_dev *cdev, struct qed_ll2_params *params)
 		}
 	}
 
-	rc = qed_llh_add_mac_filter(cdev, 0, params->ll2_mac_address);
-	if (rc) {
-		DP_NOTICE(cdev, "Failed to add an LLH filter\n");
-		goto err3;
+	if (!QED_IS_NVMETCP_PERSONALITY(p_hwfn)) {
+		rc = qed_llh_add_mac_filter(cdev, 0, params->ll2_mac_address);
+		if (rc) {
+			DP_NOTICE(cdev, "Failed to add an LLH filter\n");
+			goto err3;
+		}
+
 	}
 
 	ether_addr_copy(cdev->ll2_mac_address, params->ll2_mac_address);
@@ -2587,7 +2598,7 @@ static int qed_ll2_start(struct qed_dev *cdev, struct qed_ll2_params *params)
 	return 0;
 
 err3:
-	if (QED_IS_ISCSI_PERSONALITY(p_hwfn))
+	if (QED_IS_ISCSI_PERSONALITY(p_hwfn) || QED_IS_NVMETCP_PERSONALITY(p_hwfn))
 		qed_ll2_stop_ooo(p_hwfn);
 err2:
 	if (b_is_storage_eng1)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.c b/drivers/net/ethernet/qlogic/qed/qed_mcp.c
index cd882c453394..4387292c37e2 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_mcp.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_mcp.c
@@ -2446,6 +2446,9 @@ qed_mcp_get_shmem_proto(struct qed_hwfn *p_hwfn,
 	case FUNC_MF_CFG_PROTOCOL_ISCSI:
 		*p_proto = QED_PCI_ISCSI;
 		break;
+	case FUNC_MF_CFG_PROTOCOL_NVMETCP:
+		*p_proto = QED_PCI_NVMETCP;
+		break;
 	case FUNC_MF_CFG_PROTOCOL_FCOE:
 		*p_proto = QED_PCI_FCOE;
 		break;
diff --git a/drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c b/drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c
index 3e3192a3ad9b..6190adf965bc 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c
@@ -1306,7 +1306,8 @@ int qed_mfw_process_tlv_req(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt)
 	}
 
 	if ((tlv_group & QED_MFW_TLV_ISCSI) &&
-	    p_hwfn->hw_info.personality != QED_PCI_ISCSI) {
+	    p_hwfn->hw_info.personality != QED_PCI_ISCSI &&
+	    p_hwfn->hw_info.personality != QED_PCI_NVMETCP) {
 		DP_VERBOSE(p_hwfn, QED_MSG_SP,
 			   "Skipping iSCSI TLVs for non-iSCSI function\n");
 		tlv_group &= ~QED_MFW_TLV_ISCSI;
diff --git a/drivers/net/ethernet/qlogic/qed/qed_ooo.c b/drivers/net/ethernet/qlogic/qed/qed_ooo.c
index 88353aa404dc..d37bb2463f98 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_ooo.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_ooo.c
@@ -16,7 +16,7 @@
 #include "qed_ll2.h"
 #include "qed_ooo.h"
 #include "qed_cxt.h"
-
+#include "qed_nvmetcp.h"
 static struct qed_ooo_archipelago
 *qed_ooo_seek_archipelago(struct qed_hwfn *p_hwfn,
 			  struct qed_ooo_info
@@ -85,6 +85,9 @@ int qed_ooo_alloc(struct qed_hwfn *p_hwfn)
 	case QED_PCI_ISCSI:
 		proto = PROTOCOLID_ISCSI;
 		break;
+	case QED_PCI_NVMETCP:
+		proto = PROTOCOLID_NVMETCP;
+		break;
 	case QED_PCI_ETH_RDMA:
 	case QED_PCI_ETH_IWARP:
 		proto = PROTOCOLID_IWARP;
diff --git a/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c b/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c
index aa71adcf31ee..60b3876387a9 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c
@@ -385,6 +385,7 @@ int qed_sp_pf_start(struct qed_hwfn *p_hwfn,
 		p_ramrod->personality = PERSONALITY_FCOE;
 		break;
 	case QED_PCI_ISCSI:
+	case QED_PCI_NVMETCP:
 		p_ramrod->personality = PERSONALITY_ISCSI;
 		break;
 	case QED_PCI_ETH_ROCE:
-- 
2.22.0



* [RFC PATCH v4 04/27] qed: Add support of HW filter block
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
                   ` (2 preceding siblings ...)
  2021-04-29 19:09 ` [RFC PATCH v4 03/27] qed: Add qed-NVMeTCP personality Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-05-02 11:13   ` Hannes Reinecke
  2021-04-29 19:09 ` [RFC PATCH v4 05/27] qed: Add NVMeTCP Offload IO Level FW and HW HSI Shai Malin
                   ` (23 subsequent siblings)
  27 siblings, 1 reply; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net  --cc=Jakub
	Kicinski, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	malin1024

From: Prabhakar Kushwaha <pkushwaha@marvell.com>

This patch introduces the functionality of the HW filter block.
It adds and removes filters based on the source and destination TCP ports.

It also adds functionality to clear all filters at once.

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
---
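Note for readers: the wrappers added in this patch pass one real port
and QED_LLH_DONT_CARE for the other. The sketch below mocks that
calling convention with a small in-memory table. It is a user-space
illustration only: the table, its size, the -ENOSPC return and all
mock_ names are assumptions, and the reference counting done by the
real shadow filters is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MOCK_LLH_DONT_CARE 0	/* same value as QED_LLH_DONT_CARE */
#define MOCK_NUM_FILTERS   4	/* hypothetical table size */

struct mock_port_filter {
	int in_use;
	uint16_t src_port;
	uint16_t dst_port;
};

static struct mock_port_filter mock_filters[MOCK_NUM_FILTERS];

/* Claim the first free slot; MOCK_LLH_DONT_CARE marks the unused port. */
static int mock_add_filter(uint16_t src_port, uint16_t dst_port)
{
	int i;

	for (i = 0; i < MOCK_NUM_FILTERS; i++) {
		if (!mock_filters[i].in_use) {
			mock_filters[i].in_use = 1;
			mock_filters[i].src_port = src_port;
			mock_filters[i].dst_port = dst_port;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Free the first slot holding exactly this (src, dst) pair, if any. */
static void mock_remove_filter(uint16_t src_port, uint16_t dst_port)
{
	int i;

	for (i = 0; i < MOCK_NUM_FILTERS; i++) {
		if (mock_filters[i].in_use &&
		    mock_filters[i].src_port == src_port &&
		    mock_filters[i].dst_port == dst_port) {
			memset(&mock_filters[i], 0, sizeof(mock_filters[i]));
			return;
		}
	}
}

int mock_add_src_tcp_port_filter(uint16_t src_port)
{
	return mock_add_filter(src_port, MOCK_LLH_DONT_CARE);
}

int mock_add_dst_tcp_port_filter(uint16_t dest_port)
{
	return mock_add_filter(MOCK_LLH_DONT_CARE, dest_port);
}

void mock_remove_src_tcp_port_filter(uint16_t src_port)
{
	mock_remove_filter(src_port, MOCK_LLH_DONT_CARE);
}

void mock_remove_dst_tcp_port_filter(uint16_t dest_port)
{
	mock_remove_filter(MOCK_LLH_DONT_CARE, dest_port);
}

/* Drop every entry at once, like the clear_all_filters op. */
void mock_clear_all_filters(void)
{
	memset(mock_filters, 0, sizeof(mock_filters));
}

/* Number of slots currently in use. */
int mock_count_filters(void)
{
	int i, n = 0;

	for (i = 0; i < MOCK_NUM_FILTERS; i++)
		n += mock_filters[i].in_use;
	return n;
}
```
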
 drivers/net/ethernet/qlogic/qed/qed.h         |  10 ++
 drivers/net/ethernet/qlogic/qed/qed_dev.c     | 107 ++++++++++++++++++
 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c |   5 +
 include/linux/qed/qed_nvmetcp_if.h            |  24 ++++
 4 files changed, 146 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h
index 7ae648c4edba..c2305ff5bdc6 100644
--- a/drivers/net/ethernet/qlogic/qed/qed.h
+++ b/drivers/net/ethernet/qlogic/qed/qed.h
@@ -49,6 +49,8 @@ extern const struct qed_common_ops qed_common_ops_pass;
 #define QED_MIN_WIDS		(4)
 #define QED_PF_DEMS_SIZE        (4)
 
+#define QED_LLH_DONT_CARE 0
+
 /* cau states */
 enum qed_coalescing_mode {
 	QED_COAL_MODE_DISABLE,
@@ -1005,4 +1007,12 @@ int qed_mfw_fill_tlv_data(struct qed_hwfn *hwfn,
 void qed_hw_info_set_offload_tc(struct qed_hw_info *p_info, u8 tc);
 
 void qed_periodic_db_rec_start(struct qed_hwfn *p_hwfn);
+
+int qed_llh_add_src_tcp_port_filter(struct qed_dev *cdev, u16 src_port);
+int qed_llh_add_dst_tcp_port_filter(struct qed_dev *cdev, u16 dest_port);
+
+void qed_llh_remove_src_tcp_port_filter(struct qed_dev *cdev, u16 src_port);
+void qed_llh_remove_dst_tcp_port_filter(struct qed_dev *cdev, u16 dest_port);
+
+void qed_llh_clear_all_filters(struct qed_dev *cdev);
 #endif /* _QED_H */
diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index d3f8cc42d07e..12382e0c0419 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -5360,3 +5360,110 @@ void qed_set_fw_mac_addr(__le16 *fw_msb,
 	((u8 *)fw_lsb)[0] = mac[5];
 	((u8 *)fw_lsb)[1] = mac[4];
 }
+
+static int qed_llh_shadow_remove_all_filters(struct qed_dev *cdev, u8 ppfid)
+{
+	struct qed_llh_info *p_llh_info = cdev->p_llh_info;
+	struct qed_llh_filter_info *p_filters;
+	int rc;
+
+	rc = qed_llh_shadow_sanity(cdev, ppfid, 0, "remove_all");
+	if (rc)
+		return rc;
+
+	p_filters = p_llh_info->pp_filters[ppfid];
+	memset(p_filters, 0, NIG_REG_LLH_FUNC_FILTER_EN_SIZE *
+	       sizeof(*p_filters));
+
+	return 0;
+}
+
+int qed_abs_ppfid(struct qed_dev *cdev, u8 rel_ppfid, u8 *p_abs_ppfid)
+{
+	struct qed_llh_info *p_llh_info = cdev->p_llh_info;
+
+	if (rel_ppfid >= p_llh_info->num_ppfid) {
+		DP_NOTICE(cdev,
+			  "rel_ppfid %d is not valid, available indices are 0..%hhu\n",
+			  rel_ppfid, p_llh_info->num_ppfid - 1);
+
+		return -EINVAL;
+	}
+
+	*p_abs_ppfid = p_llh_info->ppfid_array[rel_ppfid];
+
+	return 0;
+}
+
+void qed_llh_clear_ppfid_filters(struct qed_dev *cdev, u8 ppfid)
+{
+	struct qed_hwfn *p_hwfn = QED_LEADING_HWFN(cdev);
+	struct qed_ptt *p_ptt = qed_ptt_acquire(p_hwfn);
+	u8 filter_idx, abs_ppfid;
+	int rc = 0;
+
+	if (!p_ptt)
+		return;
+
+	if (!test_bit(QED_MF_LLH_PROTO_CLSS, &cdev->mf_bits) &&
+	    !test_bit(QED_MF_LLH_MAC_CLSS, &cdev->mf_bits))
+		goto out;
+
+	rc = qed_abs_ppfid(cdev, ppfid, &abs_ppfid);
+	if (rc)
+		goto out;
+
+	rc = qed_llh_shadow_remove_all_filters(cdev, ppfid);
+	if (rc)
+		goto out;
+
+	for (filter_idx = 0; filter_idx < NIG_REG_LLH_FUNC_FILTER_EN_SIZE;
+	     filter_idx++) {
+		rc = qed_llh_remove_filter(p_hwfn, p_ptt,
+					   abs_ppfid, filter_idx);
+		if (rc)
+			goto out;
+	}
+out:
+	qed_ptt_release(p_hwfn, p_ptt);
+}
+
+int qed_llh_add_src_tcp_port_filter(struct qed_dev *cdev, u16 src_port)
+{
+	return qed_llh_add_protocol_filter(cdev, 0,
+					   QED_LLH_FILTER_TCP_SRC_PORT,
+					   src_port, QED_LLH_DONT_CARE);
+}
+
+void qed_llh_remove_src_tcp_port_filter(struct qed_dev *cdev, u16 src_port)
+{
+	qed_llh_remove_protocol_filter(cdev, 0,
+				       QED_LLH_FILTER_TCP_SRC_PORT,
+				       src_port, QED_LLH_DONT_CARE);
+}
+
+int qed_llh_add_dst_tcp_port_filter(struct qed_dev *cdev, u16 dest_port)
+{
+	return qed_llh_add_protocol_filter(cdev, 0,
+					   QED_LLH_FILTER_TCP_DEST_PORT,
+					   QED_LLH_DONT_CARE, dest_port);
+}
+
+void qed_llh_remove_dst_tcp_port_filter(struct qed_dev *cdev, u16 dest_port)
+{
+	qed_llh_remove_protocol_filter(cdev, 0,
+				       QED_LLH_FILTER_TCP_DEST_PORT,
+				       QED_LLH_DONT_CARE, dest_port);
+}
+
+void qed_llh_clear_all_filters(struct qed_dev *cdev)
+{
+	u8 ppfid;
+
+	if (!test_bit(QED_MF_LLH_PROTO_CLSS, &cdev->mf_bits) &&
+	    !test_bit(QED_MF_LLH_MAC_CLSS, &cdev->mf_bits))
+		return;
+
+	for (ppfid = 0; ppfid < cdev->p_llh_info->num_ppfid; ppfid++)
+		qed_llh_clear_ppfid_filters(cdev, ppfid);
+}
diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
index 79bd1cc6677f..1e2eb6dcbd6e 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
@@ -844,6 +844,11 @@ static const struct qed_nvmetcp_ops qed_nvmetcp_ops_pass = {
 	.update_conn = &qed_nvmetcp_update_conn,
 	.destroy_conn = &qed_nvmetcp_destroy_conn,
 	.clear_sq = &qed_nvmetcp_clear_conn_sq,
+	.add_src_tcp_port_filter = &qed_llh_add_src_tcp_port_filter,
+	.remove_src_tcp_port_filter = &qed_llh_remove_src_tcp_port_filter,
+	.add_dst_tcp_port_filter = &qed_llh_add_dst_tcp_port_filter,
+	.remove_dst_tcp_port_filter = &qed_llh_remove_dst_tcp_port_filter,
+	.clear_all_filters = &qed_llh_clear_all_filters
 };
 
 const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void)
diff --git a/include/linux/qed/qed_nvmetcp_if.h b/include/linux/qed/qed_nvmetcp_if.h
index 96263e3cfa1e..686f924238e3 100644
--- a/include/linux/qed/qed_nvmetcp_if.h
+++ b/include/linux/qed/qed_nvmetcp_if.h
@@ -124,6 +124,20 @@ struct qed_nvmetcp_cb_ops {
  *			@param cdev
  *			@param handle - the connection handle.
  *			@return 0 on success, otherwise error value.
+ * @add_src_tcp_port_filter: Add source tcp port filter
+ *			@param cdev
+ *			@param src_port
+ * @remove_src_tcp_port_filter: Remove source tcp port filter
+ *			@param cdev
+ *			@param src_port
+ * @add_dst_tcp_port_filter: Add destination tcp port filter
+ *			@param cdev
+ *			@param dest_port
+ * @remove_dst_tcp_port_filter: Remove destination tcp port filter
+ *			@param cdev
+ *			@param dest_port
+ * @clear_all_filters: Clear all filters.
+ *			@param cdev
  */
 struct qed_nvmetcp_ops {
 	const struct qed_common_ops *common;
@@ -159,6 +173,16 @@ struct qed_nvmetcp_ops {
 	int (*destroy_conn)(struct qed_dev *cdev, u32 handle, u8 abrt_conn);
 
 	int (*clear_sq)(struct qed_dev *cdev, u32 handle);
+
+	int (*add_src_tcp_port_filter)(struct qed_dev *cdev, u16 src_port);
+
+	void (*remove_src_tcp_port_filter)(struct qed_dev *cdev, u16 src_port);
+
+	int (*add_dst_tcp_port_filter)(struct qed_dev *cdev, u16 dest_port);
+
+	void (*remove_dst_tcp_port_filter)(struct qed_dev *cdev, u16 dest_port);
+
+	void (*clear_all_filters)(struct qed_dev *cdev);
 };
 
 const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void);
-- 
2.22.0



* [RFC PATCH v4 05/27] qed: Add NVMeTCP Offload IO Level FW and HW HSI
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
                   ` (3 preceding siblings ...)
  2021-04-29 19:09 ` [RFC PATCH v4 04/27] qed: Add support of HW filter block Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-05-02 11:22   ` Hannes Reinecke
  2021-04-29 19:09 ` [RFC PATCH v4 06/27] qed: Add NVMeTCP Offload IO Level FW Initializations Shai Malin
                   ` (22 subsequent siblings)
  27 siblings, 1 reply; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net  --cc=Jakub
	Kicinski, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	malin1024

This patch introduces the NVMeTCP Offload FW and HW HSI in order
to initialize the IO level configuration into a per IO HW
resource ("task") as part of the IO path flow.

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
---
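Note for readers: the HSI structures in this patch pack several fields
into one byte or word and describe each field with a _MASK/_SHIFT
pair. The sketch below shows how such a pair is typically used, taking
the error_pdu_opcode_reserved byte of struct nvmetcp_eqe_data as the
example. The four defines are copied from the patch; the MOCK_ macros
and the mock_pack_error_pdu() helper are hypothetical stand-ins
written for this note, in the spirit of GET_FIELD()/SET_FIELD() style
helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Mask/shift pairs copied from struct nvmetcp_eqe_data in this patch. */
#define NVMETCP_EQE_DATA_ERROR_PDU_OPCODE_MASK        0x3F
#define NVMETCP_EQE_DATA_ERROR_PDU_OPCODE_SHIFT       0
#define NVMETCP_EQE_DATA_ERROR_PDU_OPCODE_VALID_MASK  0x1
#define NVMETCP_EQE_DATA_ERROR_PDU_OPCODE_VALID_SHIFT 6

/* Hypothetical local helpers: extract or overwrite one named field. */
#define MOCK_GET_FIELD(value, name) \
	(((value) >> name##_SHIFT) & name##_MASK)

#define MOCK_SET_FIELD(value, name, flag)				\
	do {								\
		(value) &= ~(name##_MASK << name##_SHIFT);		\
		(value) |= ((flag) & name##_MASK) << name##_SHIFT;	\
	} while (0)

/* Build the packed byte from an opcode and a validity flag. */
uint8_t mock_pack_error_pdu(uint8_t opcode, int valid)
{
	uint8_t v = 0;

	MOCK_SET_FIELD(v, NVMETCP_EQE_DATA_ERROR_PDU_OPCODE, opcode);
	MOCK_SET_FIELD(v, NVMETCP_EQE_DATA_ERROR_PDU_OPCODE_VALID,
		       valid ? 1 : 0);
	return v;
}
```

Masking the value before shifting means an out-of-range opcode is
truncated to the six-bit field instead of spilling into the valid bit.
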
 include/linux/qed/nvmetcp_common.h | 418 ++++++++++++++++++++++++++++-
 include/linux/qed/qed_nvmetcp_if.h |  37 +++
 2 files changed, 454 insertions(+), 1 deletion(-)

diff --git a/include/linux/qed/nvmetcp_common.h b/include/linux/qed/nvmetcp_common.h
index c8836b71b866..dda7a785c321 100644
--- a/include/linux/qed/nvmetcp_common.h
+++ b/include/linux/qed/nvmetcp_common.h
@@ -7,6 +7,7 @@
 #include "tcp_common.h"
 
 #define NVMETCP_SLOW_PATH_LAYER_CODE (6)
+#define NVMETCP_WQE_NUM_SGES_SLOWIO (0xf)
 
 /* NVMeTCP firmware function init parameters */
 struct nvmetcp_spe_func_init {
@@ -194,4 +195,419 @@ struct nvmetcp_wqe {
 #define NVMETCP_WQE_CDB_SIZE_OR_NVMETCP_CMD_SHIFT 24
 };
 
-#endif /* __NVMETCP_COMMON__ */
+struct nvmetcp_host_cccid_itid_entry {
+	__le16 itid;
+};
+
+struct nvmetcp_connect_done_results {
+	__le16 icid;
+	__le16 conn_id;
+	struct tcp_ulp_connect_done_params params;
+};
+
+struct nvmetcp_eqe_data {
+	__le16 icid;
+	__le16 conn_id;
+	__le16 reserved;
+	u8 error_code;
+	u8 error_pdu_opcode_reserved;
+#define NVMETCP_EQE_DATA_ERROR_PDU_OPCODE_MASK 0x3F
+#define NVMETCP_EQE_DATA_ERROR_PDU_OPCODE_SHIFT  0
+#define NVMETCP_EQE_DATA_ERROR_PDU_OPCODE_VALID_MASK  0x1
+#define NVMETCP_EQE_DATA_ERROR_PDU_OPCODE_VALID_SHIFT  6
+#define NVMETCP_EQE_DATA_RESERVED0_MASK 0x1
+#define NVMETCP_EQE_DATA_RESERVED0_SHIFT 7
+};
+
+enum nvmetcp_task_type {
+	NVMETCP_TASK_TYPE_HOST_WRITE,
+	NVMETCP_TASK_TYPE_HOST_READ,
+	NVMETCP_TASK_TYPE_INIT_CONN_REQUEST,
+	NVMETCP_TASK_TYPE_RESERVED0,
+	NVMETCP_TASK_TYPE_CLEANUP,
+	NVMETCP_TASK_TYPE_HOST_READ_NO_CQE,
+	MAX_NVMETCP_TASK_TYPE
+};
+
+struct nvmetcp_db_data {
+	u8 params;
+#define NVMETCP_DB_DATA_DEST_MASK 0x3 /* destination of doorbell (use enum db_dest) */
+#define NVMETCP_DB_DATA_DEST_SHIFT 0
+#define NVMETCP_DB_DATA_AGG_CMD_MASK 0x3 /* aggregative command to CM (use enum db_agg_cmd_sel) */
+#define NVMETCP_DB_DATA_AGG_CMD_SHIFT 2
+#define NVMETCP_DB_DATA_BYPASS_EN_MASK 0x1 /* enable QM bypass */
+#define NVMETCP_DB_DATA_BYPASS_EN_SHIFT 4
+#define NVMETCP_DB_DATA_RESERVED_MASK 0x1
+#define NVMETCP_DB_DATA_RESERVED_SHIFT 5
+#define NVMETCP_DB_DATA_AGG_VAL_SEL_MASK 0x3 /* aggregative value selection */
+#define NVMETCP_DB_DATA_AGG_VAL_SEL_SHIFT 6
+	u8 agg_flags; /* bit for every DQ counter flags in CM context that DQ can increment */
+	__le16 sq_prod;
+};
+
+struct nvmetcp_fw_cqe_error_bitmap {
+	u8 cqe_error_status_bits;
+#define CQE_ERROR_BITMAP_DIF_ERR_BITS_MASK 0x7
+#define CQE_ERROR_BITMAP_DIF_ERR_BITS_SHIFT 0
+#define CQE_ERROR_BITMAP_DATA_DIGEST_ERR_MASK 0x1
+#define CQE_ERROR_BITMAP_DATA_DIGEST_ERR_SHIFT 3
+#define CQE_ERROR_BITMAP_RCV_ON_INVALID_CONN_MASK 0x1
+#define CQE_ERROR_BITMAP_RCV_ON_INVALID_CONN_SHIFT 4
+};
+
+struct nvmetcp_nvmf_cqe {
+	__le32 reserved[4];
+};
+
+struct nvmetcp_fw_cqe_data {
+	struct nvmetcp_nvmf_cqe nvme_cqe;
+	struct regpair task_opaque;
+	__le32 reserved[6];
+};
+
+struct nvmetcp_fw_cqe {
+	__le16 conn_id;
+	u8 cqe_type;
+	struct nvmetcp_fw_cqe_error_bitmap error_bitmap;
+	__le16 itid;
+	u8 task_type;
+	u8 fw_dbg_field;
+	u8 caused_conn_err;
+	u8 reserved0[3];
+	__le32 reserved1;
+	struct nvmetcp_nvmf_cqe nvme_cqe;
+	struct regpair task_opaque;
+	__le32 reserved[6];
+};
+
+enum nvmetcp_fw_cqes_type {
+	NVMETCP_FW_CQE_TYPE_NORMAL = 1,
+	NVMETCP_FW_CQE_TYPE_RESERVED0,
+	NVMETCP_FW_CQE_TYPE_RESERVED1,
+	NVMETCP_FW_CQE_TYPE_CLEANUP,
+	NVMETCP_FW_CQE_TYPE_DUMMY,
+	MAX_NVMETCP_FW_CQES_TYPE
+};
+
+struct ystorm_nvmetcp_task_state {
+	struct scsi_cached_sges data_desc;
+	struct scsi_sgl_params sgl_params;
+	__le32 resrved0;
+	__le32 buffer_offset;
+	__le16 cccid;
+	struct nvmetcp_dif_flags dif_flags;
+	u8 flags;
+#define YSTORM_NVMETCP_TASK_STATE_LOCAL_COMP_MASK 0x1
+#define YSTORM_NVMETCP_TASK_STATE_LOCAL_COMP_SHIFT 0
+#define YSTORM_NVMETCP_TASK_STATE_SLOW_IO_MASK 0x1
+#define YSTORM_NVMETCP_TASK_STATE_SLOW_IO_SHIFT 1
+#define YSTORM_NVMETCP_TASK_STATE_SET_DIF_OFFSET_MASK 0x1
+#define YSTORM_NVMETCP_TASK_STATE_SET_DIF_OFFSET_SHIFT 2
+#define YSTORM_NVMETCP_TASK_STATE_SEND_W_RSP_MASK 0x1
+#define YSTORM_NVMETCP_TASK_STATE_SEND_W_RSP_SHIFT 3
+};
+
+struct ystorm_nvmetcp_task_rxmit_opt {
+	__le32 reserved[4];
+};
+
+struct nvmetcp_task_hdr {
+	__le32 reg[18];
+};
+
+struct nvmetcp_task_hdr_aligned {
+	struct nvmetcp_task_hdr task_hdr;
+	__le32 reserved[2];	/* HSI_COMMENT: Align to QREG */
+};
+
+struct e5_tdif_task_context {
+	__le32 reserved[16];
+};
+
+struct e5_rdif_task_context {
+	__le32 reserved[12];
+};
+
+struct ystorm_nvmetcp_task_st_ctx {
+	struct ystorm_nvmetcp_task_state state;
+	struct ystorm_nvmetcp_task_rxmit_opt rxmit_opt;
+	struct nvmetcp_task_hdr_aligned pdu_hdr;
+};
+
+struct mstorm_nvmetcp_task_st_ctx {
+	struct scsi_cached_sges data_desc;
+	struct scsi_sgl_params sgl_params;
+	__le32 rem_task_size;
+	__le32 data_buffer_offset;
+	u8 task_type;
+	struct nvmetcp_dif_flags dif_flags;
+	__le16 dif_task_icid;
+	struct regpair reserved0;
+	__le32 expected_itt;
+	__le32 reserved1;
+};
+
+struct nvmetcp_reg1 {
+	__le32 reg1_map;
+#define NVMETCP_REG1_NUM_SGES_MASK 0xF
+#define NVMETCP_REG1_NUM_SGES_SHIFT 0
+#define NVMETCP_REG1_RESERVED1_MASK 0xFFFFFFF
+#define NVMETCP_REG1_RESERVED1_SHIFT 4
+};
+
+struct ustorm_nvmetcp_task_st_ctx {
+	__le32 rem_rcv_len;
+	__le32 exp_data_transfer_len;
+	__le32 exp_data_sn;
+	struct regpair reserved0;
+	struct nvmetcp_reg1 reg1;
+	u8 flags2;
+#define USTORM_NVMETCP_TASK_ST_CTX_AHS_EXIST_MASK 0x1
+#define USTORM_NVMETCP_TASK_ST_CTX_AHS_EXIST_SHIFT 0
+#define USTORM_NVMETCP_TASK_ST_CTX_RESERVED1_MASK 0x7F
+#define USTORM_NVMETCP_TASK_ST_CTX_RESERVED1_SHIFT 1
+	struct nvmetcp_dif_flags dif_flags;
+	__le16 reserved3;
+	__le16 tqe_opaque[2];
+	__le32 reserved5;
+	__le32 nvme_tcp_opaque_lo;
+	__le32 nvme_tcp_opaque_hi;
+	u8 task_type;
+	u8 error_flags;
+#define USTORM_NVMETCP_TASK_ST_CTX_DATA_DIGEST_ERROR_MASK 0x1
+#define USTORM_NVMETCP_TASK_ST_CTX_DATA_DIGEST_ERROR_SHIFT 0
+#define USTORM_NVMETCP_TASK_ST_CTX_DATA_TRUNCATED_ERROR_MASK 0x1
+#define USTORM_NVMETCP_TASK_ST_CTX_DATA_TRUNCATED_ERROR_SHIFT 1
+#define USTORM_NVMETCP_TASK_ST_CTX_UNDER_RUN_ERROR_MASK 0x1
+#define USTORM_NVMETCP_TASK_ST_CTX_UNDER_RUN_ERROR_SHIFT 2
+#define USTORM_NVMETCP_TASK_ST_CTX_NVME_TCP_MASK 0x1
+#define USTORM_NVMETCP_TASK_ST_CTX_NVME_TCP_SHIFT 3
+	u8 flags;
+#define USTORM_NVMETCP_TASK_ST_CTX_CQE_WRITE_MASK 0x3
+#define USTORM_NVMETCP_TASK_ST_CTX_CQE_WRITE_SHIFT 0
+#define USTORM_NVMETCP_TASK_ST_CTX_LOCAL_COMP_MASK 0x1
+#define USTORM_NVMETCP_TASK_ST_CTX_LOCAL_COMP_SHIFT 2
+#define USTORM_NVMETCP_TASK_ST_CTX_Q0_R2TQE_WRITE_MASK 0x1
+#define USTORM_NVMETCP_TASK_ST_CTX_Q0_R2TQE_WRITE_SHIFT 3
+#define USTORM_NVMETCP_TASK_ST_CTX_TOTAL_DATA_ACKED_DONE_MASK 0x1
+#define USTORM_NVMETCP_TASK_ST_CTX_TOTAL_DATA_ACKED_DONE_SHIFT 4
+#define USTORM_NVMETCP_TASK_ST_CTX_HQ_SCANNED_DONE_MASK 0x1
+#define USTORM_NVMETCP_TASK_ST_CTX_HQ_SCANNED_DONE_SHIFT 5
+#define USTORM_NVMETCP_TASK_ST_CTX_R2T2RECV_DONE_MASK 0x1
+#define USTORM_NVMETCP_TASK_ST_CTX_R2T2RECV_DONE_SHIFT 6
+	u8 cq_rss_number;
+};
+
+struct e5_ystorm_nvmetcp_task_ag_ctx {
+	u8 reserved /* cdu_validation */;
+	u8 byte1 /* state_and_core_id */;
+	__le16 word0 /* icid */;
+	u8 flags0;
+	u8 flags1;
+	u8 flags2;
+	u8 flags3;
+	__le32 TTT;
+	u8 byte2;
+	u8 byte3;
+	u8 byte4;
+	u8 e4_reserved7;
+};
+
+struct e5_mstorm_nvmetcp_task_ag_ctx {
+	u8 cdu_validation;
+	u8 byte1;
+	__le16 task_cid;
+	u8 flags0;
+#define E5_MSTORM_NVMETCP_TASK_AG_CTX_CONNECTION_TYPE_MASK 0xF
+#define E5_MSTORM_NVMETCP_TASK_AG_CTX_CONNECTION_TYPE_SHIFT 0
+#define E5_MSTORM_NVMETCP_TASK_AG_CTX_EXIST_IN_QM0_MASK 0x1
+#define E5_MSTORM_NVMETCP_TASK_AG_CTX_EXIST_IN_QM0_SHIFT 4
+#define E5_MSTORM_NVMETCP_TASK_AG_CTX_CONN_CLEAR_SQ_FLAG_MASK 0x1
+#define E5_MSTORM_NVMETCP_TASK_AG_CTX_CONN_CLEAR_SQ_FLAG_SHIFT 5
+#define E5_MSTORM_NVMETCP_TASK_AG_CTX_VALID_MASK 0x1
+#define E5_MSTORM_NVMETCP_TASK_AG_CTX_VALID_SHIFT 6
+#define E5_MSTORM_NVMETCP_TASK_AG_CTX_TASK_CLEANUP_FLAG_MASK 0x1
+#define E5_MSTORM_NVMETCP_TASK_AG_CTX_TASK_CLEANUP_FLAG_SHIFT 7
+	u8 flags1;
+#define E5_MSTORM_NVMETCP_TASK_AG_CTX_TASK_CLEANUP_CF_MASK 0x3
+#define E5_MSTORM_NVMETCP_TASK_AG_CTX_TASK_CLEANUP_CF_SHIFT 0
+#define E5_MSTORM_NVMETCP_TASK_AG_CTX_CF1_MASK 0x3
+#define E5_MSTORM_NVMETCP_TASK_AG_CTX_CF1_SHIFT 2
+#define E5_MSTORM_NVMETCP_TASK_AG_CTX_CF2_MASK 0x3
+#define E5_MSTORM_NVMETCP_TASK_AG_CTX_CF2_SHIFT 4
+#define E5_MSTORM_NVMETCP_TASK_AG_CTX_TASK_CLEANUP_CF_EN_MASK 0x1
+#define E5_MSTORM_NVMETCP_TASK_AG_CTX_TASK_CLEANUP_CF_EN_SHIFT 6
+#define E5_MSTORM_NVMETCP_TASK_AG_CTX_CF1EN_MASK 0x1
+#define E5_MSTORM_NVMETCP_TASK_AG_CTX_CF1EN_SHIFT 7
+	u8 flags2;
+	u8 flags3;
+	__le32 reg0;
+	u8 byte2;
+	u8 byte3;
+	u8 byte4;
+	u8 e4_reserved7;
+};
+
+struct e5_ustorm_nvmetcp_task_ag_ctx {
+	u8 reserved;
+	u8 state_and_core_id;
+	__le16 icid;
+	u8 flags0;
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_CONNECTION_TYPE_MASK 0xF
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_CONNECTION_TYPE_SHIFT 0
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_EXIST_IN_QM0_MASK 0x1
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_EXIST_IN_QM0_SHIFT 4
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_CONN_CLEAR_SQ_FLAG_MASK 0x1
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_CONN_CLEAR_SQ_FLAG_SHIFT 5
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_HQ_SCANNED_CF_MASK 0x3
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_HQ_SCANNED_CF_SHIFT 6
+	u8 flags1;
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_RESERVED1_MASK 0x3
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_RESERVED1_SHIFT 0
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_R2T2RECV_MASK 0x3
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_R2T2RECV_SHIFT 2
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_CF3_MASK 0x3
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_CF3_SHIFT 4
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_DIF_ERROR_CF_MASK 0x3
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_DIF_ERROR_CF_SHIFT 6
+	u8 flags2;
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_HQ_SCANNED_CF_EN_MASK 0x1
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_HQ_SCANNED_CF_EN_SHIFT 0
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_DISABLE_DATA_ACKED_MASK 0x1
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_DISABLE_DATA_ACKED_SHIFT 1
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_R2T2RECV_EN_MASK 0x1
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_R2T2RECV_EN_SHIFT 2
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_CF3EN_MASK 0x1
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_CF3EN_SHIFT 3
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_DIF_ERROR_CF_EN_MASK 0x1
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_DIF_ERROR_CF_EN_SHIFT 4
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_CMP_DATA_TOTAL_EXP_EN_MASK 0x1
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_CMP_DATA_TOTAL_EXP_EN_SHIFT 5
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_RULE1EN_MASK 0x1
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_RULE1EN_SHIFT 6
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_CMP_CONT_RCV_EXP_EN_MASK 0x1
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_CMP_CONT_RCV_EXP_EN_SHIFT 7
+	u8 flags3;
+	u8 flags4;
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_E4_RESERVED5_MASK 0x3
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_E4_RESERVED5_SHIFT 0
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_E4_RESERVED6_MASK 0x1
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_E4_RESERVED6_SHIFT 2
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_E4_RESERVED7_MASK 0x1
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_E4_RESERVED7_SHIFT 3
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_DIF_ERROR_TYPE_MASK 0xF
+#define E5_USTORM_NVMETCP_TASK_AG_CTX_DIF_ERROR_TYPE_SHIFT 4
+	u8 byte2;
+	u8 byte3;
+	u8 e4_reserved8;
+	__le32 dif_err_intervals;
+	__le32 dif_error_1st_interval;
+	__le32 rcv_cont_len;
+	__le32 exp_cont_len;
+	__le32 total_data_acked;
+	__le32 exp_data_acked;
+	__le16 word1;
+	__le16 next_tid;
+	__le32 hdr_residual_count;
+	__le32 exp_r2t_sn;
+};
+
+struct e5_nvmetcp_task_context {
+	struct ystorm_nvmetcp_task_st_ctx ystorm_st_context;
+	struct e5_ystorm_nvmetcp_task_ag_ctx ystorm_ag_context;
+	struct regpair ystorm_ag_padding[2];
+	struct e5_tdif_task_context tdif_context;
+	struct e5_mstorm_nvmetcp_task_ag_ctx mstorm_ag_context;
+	struct regpair mstorm_ag_padding[2];
+	struct e5_ustorm_nvmetcp_task_ag_ctx ustorm_ag_context;
+	struct regpair ustorm_ag_padding[2];
+	struct mstorm_nvmetcp_task_st_ctx mstorm_st_context;
+	struct regpair mstorm_st_padding[2];
+	struct ustorm_nvmetcp_task_st_ctx ustorm_st_context;
+	struct regpair ustorm_st_padding[2];
+	struct e5_rdif_task_context rdif_context;
+};
+
+/* NVMe TCP common header in network order */
+struct nvmetcp_common_hdr {
+	u8 pdo;
+	u8 hlen;
+	u8 flags;
+#define NVMETCP_COMMON_HDR_HDGSTF_MASK 0x1
+#define NVMETCP_COMMON_HDR_HDGSTF_SHIFT 0
+#define NVMETCP_COMMON_HDR_DDGSTF_MASK 0x1
+#define NVMETCP_COMMON_HDR_DDGSTF_SHIFT 1
+#define NVMETCP_COMMON_HDR_LAST_PDU_MASK 0x1
+#define NVMETCP_COMMON_HDR_LAST_PDU_SHIFT 2
+#define NVMETCP_COMMON_HDR_SUCCESS_MASK 0x1
+#define NVMETCP_COMMON_HDR_SUCCESS_SHIFT 3
+#define NVMETCP_COMMON_HDR_RESERVED_MASK 0xF
+#define NVMETCP_COMMON_HDR_RESERVED_SHIFT 4
+	u8 pdu_type;
+	__le32 plen_swapped;
+};
+
+/* We don't need the entire 128 Bytes of the ICReq, hence passing only 16
+ * Bytes to the FW in network order.
+ */
+struct nvmetcp_icreq_hdr_psh {
+	__le16 pfv;
+	u8 hpda;
+	u8 digest;
+#define NVMETCP_ICREQ_HDR_PSH_16B_HDGST_EN_MASK 0x1
+#define NVMETCP_ICREQ_HDR_PSH_16B_HDGST_EN_SHIFT 0
+#define NVMETCP_ICREQ_HDR_PSH_16B_DDGST_EN_MASK 0x1
+#define NVMETCP_ICREQ_HDR_PSH_16B_DDGST_EN_SHIFT 1
+#define NVMETCP_ICREQ_HDR_PSH_16B_RESERVED1_MASK 0x3F
+#define NVMETCP_ICREQ_HDR_PSH_16B_RESERVED1_SHIFT 2
+	__le32 maxr2t;
+	u8 reserved[8];
+};
+
+struct nvmetcp_cmd_capsule_hdr_psh {
+	__le32 raw_swapped[16];
+};
+
+struct nvmetcp_cmd_capsule_hdr {
+	struct nvmetcp_common_hdr chdr;
+	struct nvmetcp_cmd_capsule_hdr_psh pshdr;
+};
+
+struct nvmetcp_data_hdr {
+	__le32 data[6];
+};
+
+struct nvmetcp_h2c_hdr_psh {
+	__le16 ttag_swapped;
+	__le16 command_id_swapped;
+	__le32 data_offset_swapped;
+	__le32 data_length_swapped;
+	__le32 reserved1;
+};
+
+struct nvmetcp_h2c_hdr {
+	struct nvmetcp_common_hdr chdr;
+	struct nvmetcp_h2c_hdr_psh pshdr;
+};
+
+/* We don't need the entire 128 Bytes of the ICResp, hence passing only 16
+ * Bytes to the FW in network order.
+ */
+struct nvmetcp_icresp_hdr_psh {
+	u8 digest;
+#define NVMETCP_ICRESP_HDR_PSH_16B_HDGST_EN_MASK 0x1
+#define NVMETCP_ICRESP_HDR_PSH_16B_HDGST_EN_SHIFT 0
+#define NVMETCP_ICRESP_HDR_PSH_16B_DDGST_EN_MASK 0x1
+#define NVMETCP_ICRESP_HDR_PSH_16B_DDGST_EN_SHIFT 1
+	u8 cpda;
+	__le16 pfv_swapped;
+	__le32 maxdata_swapped;
+	__le16 reserved2[4];
+};
+
+struct nvmetcp_init_conn_req_hdr {
+	struct nvmetcp_common_hdr chdr;
+	struct nvmetcp_icreq_hdr_psh pshdr;
+};
+
+#endif /* __NVMETCP_COMMON__ */
diff --git a/include/linux/qed/qed_nvmetcp_if.h b/include/linux/qed/qed_nvmetcp_if.h
index 686f924238e3..04e90dc42c12 100644
--- a/include/linux/qed/qed_nvmetcp_if.h
+++ b/include/linux/qed/qed_nvmetcp_if.h
@@ -5,6 +5,8 @@
 #define _QED_NVMETCP_IF_H
 #include <linux/types.h>
 #include <linux/qed/qed_if.h>
+#include <linux/qed/storage_common.h>
+#include <linux/qed/nvmetcp_common.h>
 
 #define QED_NVMETCP_MAX_IO_SIZE	0x800000
 
@@ -73,6 +75,41 @@ struct qed_nvmetcp_cb_ops {
 	struct qed_common_cb_ops common;
 };
 
+struct nvmetcp_sge {
+	struct regpair sge_addr; /* SGE address */
+	__le32 sge_len; /* SGE length */
+	__le32 reserved;
+};
+
+/* IO path HSI function SGL params */
+struct storage_sgl_task_params {
+	struct nvmetcp_sge *sgl;
+	struct regpair sgl_phys_addr;
+	u32 total_buffer_size;
+	u16 num_sges;
+	bool small_mid_sge;
+};
+
+/* IO path HSI function FW task context params */
+struct nvmetcp_task_params {
+	void *context; /* Output parameter - set/filled by the HSI function */
+	struct nvmetcp_wqe *sqe;
+	u32 tx_io_size; /* in bytes (Without DIF, if exists) */
+	u32 rx_io_size; /* in bytes (Without DIF, if exists) */
+	u16 conn_icid;
+	u16 itid;
+	struct regpair opq; /* qedn_task_ctx address */
+	u16 host_cccid;
+	u8 cq_rss_number;
+	bool send_write_incapsule;
+};
+
+/* IO path HSI function FW conn level input params */
+
+struct nvmetcp_conn_params {
+	u32 max_burst_length;
+};
+
 /**
  * struct qed_nvmetcp_ops - qed NVMeTCP operations.
  * @common:		common operations pointer
-- 
2.22.0
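
The HSI structures above describe every sub-byte field as a
`<NAME>_MASK` / `<NAME>_SHIFT` pair. As a rough illustration of how such
pairs are consumed, here is a minimal userspace sketch. The two macro
pairs are copied from ustorm_nvmetcp_task_st_ctx in this patch; the
SET_FIELD/GET_FIELD definitions and demo_pack_flags() are stand-ins
written for this example and are only assumed to mirror the shape of the
driver's own helpers.

```c
#include <stdint.h>

/* Copied from the patch: two fields of ustorm_nvmetcp_task_st_ctx.flags. */
#define USTORM_NVMETCP_TASK_ST_CTX_CQE_WRITE_MASK 0x3
#define USTORM_NVMETCP_TASK_ST_CTX_CQE_WRITE_SHIFT 0
#define USTORM_NVMETCP_TASK_ST_CTX_LOCAL_COMP_MASK 0x1
#define USTORM_NVMETCP_TASK_ST_CTX_LOCAL_COMP_SHIFT 2

/* Stand-in helpers (assumption): token-paste NAME into NAME_MASK/NAME_SHIFT. */
#define GET_FIELD(value, name) \
	(((value) >> (name##_SHIFT)) & (name##_MASK))
#define SET_FIELD(value, name, flag)					\
	do {								\
		/* Clear the field, then OR in the masked new value. */	\
		(value) &= ~((uint64_t)(name##_MASK) << (name##_SHIFT));	\
		(value) |= ((uint64_t)(flag) & (name##_MASK)) << (name##_SHIFT); \
	} while (0)

/* Hypothetical helper: build the u8 "flags" byte from two logical fields. */
static uint8_t demo_pack_flags(uint8_t cqe_write, int local_comp)
{
	uint8_t flags = 0;

	SET_FIELD(flags, USTORM_NVMETCP_TASK_ST_CTX_CQE_WRITE, cqe_write);
	SET_FIELD(flags, USTORM_NVMETCP_TASK_ST_CTX_LOCAL_COMP,
		  local_comp ? 1 : 0);
	return flags;
}
```

Keeping the mask unshifted and applying the shift at the use site means
one macro pair serves both the setter and the getter.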



* [RFC PATCH v4 06/27] qed: Add NVMeTCP Offload IO Level FW Initializations
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
                   ` (4 preceding siblings ...)
  2021-04-29 19:09 ` [RFC PATCH v4 05/27] qed: Add NVMeTCP Offload IO Level FW and HW HSI Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-05-02 11:24   ` Hannes Reinecke
  2021-04-29 19:09 ` [RFC PATCH v4 07/27] qed: Add IP services APIs support Shai Malin
                   ` (21 subsequent siblings)
  27 siblings, 1 reply; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S. Miller <davem@davemloft.net>, Jakub Kicinski, smalin,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024

This patch introduces the NVMeTCP FW initializations, which are used
to initialize the IO level configuration into a per-IO HW
resource ("task") as part of the IO path flow.

This includes:
- Write IO FW initialization
- Read IO FW initialization
- IC-Req and IC-Resp FW exchange
- FW cleanup flow (flush IO)
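
For orientation, the length bookkeeping that the read/write
initialization performs can be sketched as below. This is a simplified
model of init_rw_nvmetcp_task() from this patch, not the driver code:
struct demo_rw_lens and demo_rw_lengths() are hypothetical names, values
are kept in host byte order, and the little-endian conversion applied to
the real context fields is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the three lengths written to the task context. */
struct demo_rw_lens {
	uint32_t rem_rcv_len;           /* remaining bytes expected on receive */
	uint32_t exp_data_transfer_len; /* clamped to the connection burst size */
	uint32_t exp_data_acked;        /* non-zero only for in-capsule writes */
};

static struct demo_rw_lens demo_rw_lengths(uint32_t task_size,
					   uint32_t max_burst_length,
					   bool is_write,
					   bool send_write_incapsule)
{
	struct demo_rw_lens l;

	/* The remaining receive length starts out as the full task size. */
	l.rem_rcv_len = task_size;

	/* Never expect more than the task itself carries. */
	l.exp_data_transfer_len = max_burst_length > task_size ?
				  task_size : max_burst_length;

	/* Only a write sent in-capsule expects its data to be acked. */
	l.exp_data_acked = (is_write && send_write_incapsule) ? task_size : 0;

	return l;
}
```

For example, a 4 KiB in-capsule write on a connection with a 64 KiB
burst limit yields exp_data_transfer_len = 4096 and
exp_data_acked = 4096, while any read yields exp_data_acked = 0.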

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
---
 drivers/net/ethernet/qlogic/qed/Makefile      |   5 +-
 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c |   7 +-
 .../qlogic/qed/qed_nvmetcp_fw_funcs.c         | 372 ++++++++++++++++++
 .../qlogic/qed/qed_nvmetcp_fw_funcs.h         |  43 ++
 include/linux/qed/nvmetcp_common.h            |   3 +
 include/linux/qed/qed_nvmetcp_if.h            |  17 +
 6 files changed, 445 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.c
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.h

diff --git a/drivers/net/ethernet/qlogic/qed/Makefile b/drivers/net/ethernet/qlogic/qed/Makefile
index 7cb0db67ba5b..0d9c2fe0245d 100644
--- a/drivers/net/ethernet/qlogic/qed/Makefile
+++ b/drivers/net/ethernet/qlogic/qed/Makefile
@@ -28,7 +28,10 @@ qed-$(CONFIG_QED_ISCSI) += qed_iscsi.o
 qed-$(CONFIG_QED_LL2) += qed_ll2.o
 qed-$(CONFIG_QED_OOO) += qed_ooo.o
 
-qed-$(CONFIG_QED_NVMETCP) += qed_nvmetcp.o
+qed-$(CONFIG_QED_NVMETCP) +=	\
+	qed_nvmetcp.o		\
+	qed_nvmetcp_fw_funcs.o	\
+	qed_nvmetcp_ip_services.o
 
 qed-$(CONFIG_QED_RDMA) +=	\
 	qed_iwarp.o		\
diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
index 1e2eb6dcbd6e..434363f8b5c0 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
@@ -27,6 +27,7 @@
 #include "qed_mcp.h"
 #include "qed_sp.h"
 #include "qed_reg_addr.h"
+#include "qed_nvmetcp_fw_funcs.h"
 
 static int qed_nvmetcp_async_event(struct qed_hwfn *p_hwfn, u8 fw_event_code,
 				   u16 echo, union event_ring_data *data,
@@ -848,7 +849,11 @@ static const struct qed_nvmetcp_ops qed_nvmetcp_ops_pass = {
 	.remove_src_tcp_port_filter = &qed_llh_remove_src_tcp_port_filter,
 	.add_dst_tcp_port_filter = &qed_llh_add_dst_tcp_port_filter,
 	.remove_dst_tcp_port_filter = &qed_llh_remove_dst_tcp_port_filter,
-	.clear_all_filters = &qed_llh_clear_all_filters
+	.clear_all_filters = &qed_llh_clear_all_filters,
+	.init_read_io = &init_nvmetcp_host_read_task,
+	.init_write_io = &init_nvmetcp_host_write_task,
+	.init_icreq_exchange = &init_nvmetcp_init_conn_req_task,
+	.init_task_cleanup = &init_cleanup_task_nvmetcp
 };
 
 const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.c b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.c
new file mode 100644
index 000000000000..8485ad678284
--- /dev/null
+++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.c
@@ -0,0 +1,372 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause)
+/* Copyright 2021 Marvell. All rights reserved. */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/types.h>
+#include <asm/byteorder.h>
+#include <linux/qed/common_hsi.h>
+#include <linux/qed/storage_common.h>
+#include <linux/qed/nvmetcp_common.h>
+#include <linux/qed/qed_nvmetcp_if.h>
+#include "qed_nvmetcp_fw_funcs.h"
+
+#define NVMETCP_NUM_SGES_IN_CACHE 0x4
+
+bool nvmetcp_is_slow_sgl(u16 num_sges, bool small_mid_sge)
+{
+	return (num_sges > SCSI_NUM_SGES_SLOW_SGL_THR && small_mid_sge);
+}
+
+void init_scsi_sgl_context(struct scsi_sgl_params *ctx_sgl_params,
+			   struct scsi_cached_sges *ctx_data_desc,
+			   struct storage_sgl_task_params *sgl_params)
+{
+	u8 num_sges_to_init = (u8)(sgl_params->num_sges > NVMETCP_NUM_SGES_IN_CACHE ?
+				   NVMETCP_NUM_SGES_IN_CACHE : sgl_params->num_sges);
+	u8 sge_index;
+
+	/* sgl params */
+	ctx_sgl_params->sgl_addr.lo = cpu_to_le32(sgl_params->sgl_phys_addr.lo);
+	ctx_sgl_params->sgl_addr.hi = cpu_to_le32(sgl_params->sgl_phys_addr.hi);
+	ctx_sgl_params->sgl_total_length = cpu_to_le32(sgl_params->total_buffer_size);
+	ctx_sgl_params->sgl_num_sges = cpu_to_le16(sgl_params->num_sges);
+
+	for (sge_index = 0; sge_index < num_sges_to_init; sge_index++) {
+		ctx_data_desc->sge[sge_index].sge_addr.lo =
+			cpu_to_le32(sgl_params->sgl[sge_index].sge_addr.lo);
+		ctx_data_desc->sge[sge_index].sge_addr.hi =
+			cpu_to_le32(sgl_params->sgl[sge_index].sge_addr.hi);
+		ctx_data_desc->sge[sge_index].sge_len =
+			cpu_to_le32(sgl_params->sgl[sge_index].sge_len);
+	}
+}
+
+static inline u32 calc_rw_task_size(struct nvmetcp_task_params *task_params,
+				    enum nvmetcp_task_type task_type)
+{
+	u32 io_size;
+
+	if (task_type == NVMETCP_TASK_TYPE_HOST_WRITE)
+		io_size = task_params->tx_io_size;
+	else
+		io_size = task_params->rx_io_size;
+
+	if (unlikely(!io_size))
+		return 0;
+
+	return io_size;
+}
+
+static inline void init_sqe(struct nvmetcp_task_params *task_params,
+			    struct storage_sgl_task_params *sgl_task_params,
+			    enum nvmetcp_task_type task_type)
+{
+	if (!task_params->sqe)
+		return;
+
+	memset(task_params->sqe, 0, sizeof(*task_params->sqe));
+	task_params->sqe->task_id = cpu_to_le16(task_params->itid);
+
+	switch (task_type) {
+	case NVMETCP_TASK_TYPE_HOST_WRITE: {
+		u32 buf_size = 0;
+		u32 num_sges = 0;
+
+		SET_FIELD(task_params->sqe->contlen_cdbsize,
+			  NVMETCP_WQE_CDB_SIZE_OR_NVMETCP_CMD, 1);
+		SET_FIELD(task_params->sqe->flags, NVMETCP_WQE_WQE_TYPE,
+			  NVMETCP_WQE_TYPE_NORMAL);
+		if (task_params->tx_io_size) {
+			if (task_params->send_write_incapsule)
+				buf_size = calc_rw_task_size(task_params, task_type);
+
+			if (nvmetcp_is_slow_sgl(sgl_task_params->num_sges,
+						sgl_task_params->small_mid_sge))
+				num_sges = NVMETCP_WQE_NUM_SGES_SLOWIO;
+			else
+				num_sges = min((u16)sgl_task_params->num_sges,
+					       (u16)SCSI_NUM_SGES_SLOW_SGL_THR);
+		}
+		SET_FIELD(task_params->sqe->flags, NVMETCP_WQE_NUM_SGES, num_sges);
+		SET_FIELD(task_params->sqe->contlen_cdbsize, NVMETCP_WQE_CONT_LEN, buf_size);
+	} break;
+
+	case NVMETCP_TASK_TYPE_HOST_READ: {
+		SET_FIELD(task_params->sqe->flags, NVMETCP_WQE_WQE_TYPE,
+			  NVMETCP_WQE_TYPE_NORMAL);
+		SET_FIELD(task_params->sqe->contlen_cdbsize,
+			  NVMETCP_WQE_CDB_SIZE_OR_NVMETCP_CMD, 1);
+	} break;
+
+	case NVMETCP_TASK_TYPE_INIT_CONN_REQUEST: {
+		SET_FIELD(task_params->sqe->flags, NVMETCP_WQE_WQE_TYPE,
+			  NVMETCP_WQE_TYPE_MIDDLE_PATH);
+
+		if (task_params->tx_io_size) {
+			SET_FIELD(task_params->sqe->contlen_cdbsize, NVMETCP_WQE_CONT_LEN,
+				  task_params->tx_io_size);
+			SET_FIELD(task_params->sqe->flags, NVMETCP_WQE_NUM_SGES,
+				  min((u16)sgl_task_params->num_sges,
+				      (u16)SCSI_NUM_SGES_SLOW_SGL_THR));
+		}
+	} break;
+
+	case NVMETCP_TASK_TYPE_CLEANUP:
+		SET_FIELD(task_params->sqe->flags, NVMETCP_WQE_WQE_TYPE,
+			  NVMETCP_WQE_TYPE_TASK_CLEANUP);
+		break;
+	default:
+		break;
+	}
+}
+
+/* The following function initializes the NVMeTCP task params */
+static inline void
+init_nvmetcp_task_params(struct e5_nvmetcp_task_context *context,
+			 struct nvmetcp_task_params *task_params,
+			 enum nvmetcp_task_type task_type)
+{
+	context->ystorm_st_context.state.cccid = cpu_to_le16(task_params->host_cccid);
+	SET_FIELD(context->ustorm_st_context.error_flags, USTORM_NVMETCP_TASK_ST_CTX_NVME_TCP, 1);
+	context->ustorm_st_context.nvme_tcp_opaque_lo = cpu_to_le32(task_params->opq.lo);
+	context->ustorm_st_context.nvme_tcp_opaque_hi = cpu_to_le32(task_params->opq.hi);
+}
+
+/* The following function initializes default values to all tasks */
+static inline void
+init_default_nvmetcp_task(struct nvmetcp_task_params *task_params, void *pdu_header,
+			  enum nvmetcp_task_type task_type)
+{
+	struct e5_nvmetcp_task_context *context = task_params->context;
+	const u8 val_byte = context->mstorm_ag_context.cdu_validation;
+	u8 dw_index;
+
+	memset(context, 0, sizeof(*context));
+
+	init_nvmetcp_task_params(context, task_params,
+				 (enum nvmetcp_task_type)task_type);
+
+	if (task_type == NVMETCP_TASK_TYPE_HOST_WRITE ||
+	    task_type == NVMETCP_TASK_TYPE_HOST_READ) {
+		for (dw_index = 0; dw_index < QED_NVMETCP_CMD_HDR_SIZE / 4; dw_index++)
+			context->ystorm_st_context.pdu_hdr.task_hdr.reg[dw_index] =
+				cpu_to_le32(((u32 *)pdu_header)[dw_index]);
+	} else {
+		for (dw_index = 0; dw_index < QED_NVMETCP_CMN_HDR_SIZE / 4; dw_index++)
+			context->ystorm_st_context.pdu_hdr.task_hdr.reg[dw_index] =
+				cpu_to_le32(((u32 *)pdu_header)[dw_index]);
+	}
+
+	/* M-Storm Context: */
+	context->mstorm_ag_context.cdu_validation = val_byte;
+	context->mstorm_st_context.task_type = (u8)(task_type);
+	context->mstorm_ag_context.task_cid = cpu_to_le16(task_params->conn_icid);
+
+	/* Ustorm Context: */
+	SET_FIELD(context->ustorm_ag_context.flags1, E5_USTORM_NVMETCP_TASK_AG_CTX_R2T2RECV, 1);
+	context->ustorm_st_context.task_type = (u8)(task_type);
+	context->ustorm_st_context.cq_rss_number = task_params->cq_rss_number;
+	context->ustorm_ag_context.icid = cpu_to_le16(task_params->conn_icid);
+}
+
+/* The following function initializes the U-Storm Task Contexts */
+static inline void
+init_ustorm_task_contexts(struct ustorm_nvmetcp_task_st_ctx *ustorm_st_context,
+			  struct e5_ustorm_nvmetcp_task_ag_ctx *ustorm_ag_context,
+			  u32 remaining_recv_len,
+			  u32 expected_data_transfer_len, u8 num_sges,
+			  bool tx_dif_conn_err_en)
+{
+	/* Remaining data to be received in bytes. Used in validations */
+	ustorm_st_context->rem_rcv_len = cpu_to_le32(remaining_recv_len);
+	ustorm_ag_context->exp_data_acked = cpu_to_le32(expected_data_transfer_len);
+	ustorm_st_context->exp_data_transfer_len = cpu_to_le32(expected_data_transfer_len);
+	SET_FIELD(ustorm_st_context->reg1.reg1_map, NVMETCP_REG1_NUM_SGES, num_sges);
+	SET_FIELD(ustorm_ag_context->flags2, E5_USTORM_NVMETCP_TASK_AG_CTX_DIF_ERROR_CF_EN,
+		  tx_dif_conn_err_en ? 1 : 0);
+}
+
+/* The following function initializes Local Completion Contexts: */
+static inline void
+set_local_completion_context(struct e5_nvmetcp_task_context *context)
+{
+	SET_FIELD(context->ystorm_st_context.state.flags,
+		  YSTORM_NVMETCP_TASK_STATE_LOCAL_COMP, 1);
+	SET_FIELD(context->ustorm_st_context.flags,
+		  USTORM_NVMETCP_TASK_ST_CTX_LOCAL_COMP, 1);
+}
+
+/* Common Fastpath task init function: */
+static inline void
+init_rw_nvmetcp_task(struct nvmetcp_task_params *task_params,
+		     enum nvmetcp_task_type task_type,
+		     struct nvmetcp_conn_params *conn_params, void *pdu_header,
+		     struct storage_sgl_task_params *sgl_task_params)
+{
+	struct e5_nvmetcp_task_context *context = task_params->context;
+	u32 task_size = calc_rw_task_size(task_params, task_type);
+	u32 exp_data_transfer_len = conn_params->max_burst_length;
+	bool slow_io = false;
+	u8 num_sges = 0;
+
+	init_default_nvmetcp_task(task_params, pdu_header, task_type);
+
+	/* Tx/Rx: */
+	if (task_params->tx_io_size) {
+		/* if data to transmit: */
+		init_scsi_sgl_context(&context->ystorm_st_context.state.sgl_params,
+				      &context->ystorm_st_context.state.data_desc,
+				      sgl_task_params);
+		slow_io = nvmetcp_is_slow_sgl(sgl_task_params->num_sges,
+					      sgl_task_params->small_mid_sge);
+		num_sges =
+			(u8)(!slow_io ? min((u32)sgl_task_params->num_sges,
+					    (u32)SCSI_NUM_SGES_SLOW_SGL_THR) :
+					    NVMETCP_WQE_NUM_SGES_SLOWIO);
+		if (slow_io) {
+			SET_FIELD(context->ystorm_st_context.state.flags,
+				  YSTORM_NVMETCP_TASK_STATE_SLOW_IO, 1);
+		}
+	} else if (task_params->rx_io_size) {
+		/* if data to receive: */
+		init_scsi_sgl_context(&context->mstorm_st_context.sgl_params,
+				      &context->mstorm_st_context.data_desc,
+				      sgl_task_params);
+		num_sges =
+			(u8)(!nvmetcp_is_slow_sgl(sgl_task_params->num_sges,
+						  sgl_task_params->small_mid_sge) ?
+						  min((u32)sgl_task_params->num_sges,
+						      (u32)SCSI_NUM_SGES_SLOW_SGL_THR) :
+						      NVMETCP_WQE_NUM_SGES_SLOWIO);
+		context->mstorm_st_context.rem_task_size = cpu_to_le32(task_size);
+	}
+
+	/* Ustorm context: */
+	if (exp_data_transfer_len > task_size)
+		/* The size of the transmitted task */
+		exp_data_transfer_len = task_size;
+	init_ustorm_task_contexts(&context->ustorm_st_context,
+				  &context->ustorm_ag_context,
+				  /* Remaining Receive length is the Task Size */
+				  task_size,
+				  /* The size of the transmitted task */
+				  exp_data_transfer_len,
+				  /* num_sges */
+				  num_sges,
+				  false);
+
+	/* Set exp_data_acked */
+	if (task_type == NVMETCP_TASK_TYPE_HOST_WRITE) {
+		if (task_params->send_write_incapsule)
+			context->ustorm_ag_context.exp_data_acked = cpu_to_le32(task_size);
+		else
+			context->ustorm_ag_context.exp_data_acked = 0;
+	} else if (task_type == NVMETCP_TASK_TYPE_HOST_READ) {
+		context->ustorm_ag_context.exp_data_acked = 0;
+	}
+
+	context->ustorm_ag_context.exp_cont_len = 0;
+
+	init_sqe(task_params, sgl_task_params, task_type);
+}
+
+static void
+init_common_initiator_read_task(struct nvmetcp_task_params *task_params,
+				struct nvmetcp_conn_params *conn_params,
+				struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,
+				struct storage_sgl_task_params *sgl_task_params)
+{
+	init_rw_nvmetcp_task(task_params, NVMETCP_TASK_TYPE_HOST_READ,
+			     conn_params, cmd_pdu_header, sgl_task_params);
+}
+
+void init_nvmetcp_host_read_task(struct nvmetcp_task_params *task_params,
+				 struct nvmetcp_conn_params *conn_params,
+				 struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,
+				 struct storage_sgl_task_params *sgl_task_params)
+{
+	init_common_initiator_read_task(task_params, conn_params,
+					(void *)cmd_pdu_header, sgl_task_params);
+}
+
+static void
+init_common_initiator_write_task(struct nvmetcp_task_params *task_params,
+				 struct nvmetcp_conn_params *conn_params,
+				 struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,
+				 struct storage_sgl_task_params *sgl_task_params)
+{
+	init_rw_nvmetcp_task(task_params, NVMETCP_TASK_TYPE_HOST_WRITE,
+			     conn_params, cmd_pdu_header, sgl_task_params);
+}
+
+void init_nvmetcp_host_write_task(struct nvmetcp_task_params *task_params,
+				  struct nvmetcp_conn_params *conn_params,
+				  struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,
+				  struct storage_sgl_task_params *sgl_task_params)
+{
+	init_common_initiator_write_task(task_params, conn_params,
+					 (void *)cmd_pdu_header,
+					 sgl_task_params);
+}
+
+static void
+init_common_login_request_task(struct nvmetcp_task_params *task_params,
+			       void *login_req_pdu_header,
+			       struct storage_sgl_task_params *tx_sgl_task_params,
+			       struct storage_sgl_task_params *rx_sgl_task_params)
+{
+	struct e5_nvmetcp_task_context *context = task_params->context;
+
+	init_default_nvmetcp_task(task_params, (void *)login_req_pdu_header,
+				  NVMETCP_TASK_TYPE_INIT_CONN_REQUEST);
+
+	/* Ustorm Context: */
+	init_ustorm_task_contexts(&context->ustorm_st_context,
+				  &context->ustorm_ag_context,
+
+				  /* Remaining Receive length is the Task Size */
+				  task_params->rx_io_size ?
+				  rx_sgl_task_params->total_buffer_size : 0,
+
+				  /* The size of the transmitted task */
+				  task_params->tx_io_size ?
+				  tx_sgl_task_params->total_buffer_size : 0,
+				  0, /* num_sges */
+				  0); /* tx_dif_conn_err_en */
+
+	/* SGL context: */
+	if (task_params->tx_io_size)
+		init_scsi_sgl_context(&context->ystorm_st_context.state.sgl_params,
+				      &context->ystorm_st_context.state.data_desc,
+				      tx_sgl_task_params);
+	if (task_params->rx_io_size)
+		init_scsi_sgl_context(&context->mstorm_st_context.sgl_params,
+				      &context->mstorm_st_context.data_desc,
+				      rx_sgl_task_params);
+
+	context->mstorm_st_context.rem_task_size =
+		cpu_to_le32(task_params->rx_io_size ?
+				 rx_sgl_task_params->total_buffer_size : 0);
+
+	init_sqe(task_params, tx_sgl_task_params, NVMETCP_TASK_TYPE_INIT_CONN_REQUEST);
+}
+
+/* The following function initializes the ICReq task in Host mode: */
+void init_nvmetcp_init_conn_req_task(struct nvmetcp_task_params *task_params,
+				     struct nvmetcp_init_conn_req_hdr *init_conn_req_pdu_hdr,
+				     struct storage_sgl_task_params *tx_sgl_task_params,
+				     struct storage_sgl_task_params *rx_sgl_task_params)
+{
+	init_common_login_request_task(task_params, init_conn_req_pdu_hdr,
+				       tx_sgl_task_params, rx_sgl_task_params);
+}
+
+void init_cleanup_task_nvmetcp(struct nvmetcp_task_params *task_params)
+{
+	init_sqe(task_params, NULL, NVMETCP_TASK_TYPE_CLEANUP);
+}
diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.h b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.h
new file mode 100644
index 000000000000..3a8c74356c4c
--- /dev/null
+++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */
+/* Copyright 2021 Marvell. All rights reserved. */
+
+#ifndef _QED_NVMETCP_FW_FUNCS_H
+#define _QED_NVMETCP_FW_FUNCS_H
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/types.h>
+#include <asm/byteorder.h>
+#include <linux/qed/common_hsi.h>
+#include <linux/qed/storage_common.h>
+#include <linux/qed/nvmetcp_common.h>
+#include <linux/qed/qed_nvmetcp_if.h>
+
+#if IS_ENABLED(CONFIG_QED_NVMETCP)
+
+void init_nvmetcp_host_read_task(struct nvmetcp_task_params *task_params,
+				 struct nvmetcp_conn_params *conn_params,
+				 struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,
+				 struct storage_sgl_task_params *sgl_task_params);
+
+void init_nvmetcp_host_write_task(struct nvmetcp_task_params *task_params,
+				  struct nvmetcp_conn_params *conn_params,
+				  struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,
+				  struct storage_sgl_task_params *sgl_task_params);
+
+void init_nvmetcp_init_conn_req_task(struct nvmetcp_task_params *task_params,
+				     struct nvmetcp_init_conn_req_hdr *init_conn_req_pdu_hdr,
+				     struct storage_sgl_task_params *tx_sgl_task_params,
+				     struct storage_sgl_task_params *rx_sgl_task_params);
+
+void init_cleanup_task_nvmetcp(struct nvmetcp_task_params *task_params);
+
+#else /* IS_ENABLED(CONFIG_QED_NVMETCP) */
+
+#endif /* IS_ENABLED(CONFIG_QED_NVMETCP) */
+
+#endif /* _QED_NVMETCP_FW_FUNCS_H */
diff --git a/include/linux/qed/nvmetcp_common.h b/include/linux/qed/nvmetcp_common.h
index dda7a785c321..c0023bb185dd 100644
--- a/include/linux/qed/nvmetcp_common.h
+++ b/include/linux/qed/nvmetcp_common.h
@@ -9,6 +9,9 @@
 #define NVMETCP_SLOW_PATH_LAYER_CODE (6)
 #define NVMETCP_WQE_NUM_SGES_SLOWIO (0xf)
 
+#define QED_NVMETCP_CMD_HDR_SIZE 72
+#define QED_NVMETCP_CMN_HDR_SIZE 24
+
 /* NVMeTCP firmware function init parameters */
 struct nvmetcp_spe_func_init {
 	__le16 half_way_close_timeout;
diff --git a/include/linux/qed/qed_nvmetcp_if.h b/include/linux/qed/qed_nvmetcp_if.h
index 04e90dc42c12..d971be84f804 100644
--- a/include/linux/qed/qed_nvmetcp_if.h
+++ b/include/linux/qed/qed_nvmetcp_if.h
@@ -220,6 +220,23 @@ struct qed_nvmetcp_ops {
 	void (*remove_dst_tcp_port_filter)(struct qed_dev *cdev, u16 dest_port);
 
 	void (*clear_all_filters)(struct qed_dev *cdev);
+
+	void (*init_read_io)(struct nvmetcp_task_params *task_params,
+			     struct nvmetcp_conn_params *conn_params,
+			     struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,
+			     struct storage_sgl_task_params *sgl_task_params);
+
+	void (*init_write_io)(struct nvmetcp_task_params *task_params,
+			      struct nvmetcp_conn_params *conn_params,
+			      struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,
+			      struct storage_sgl_task_params *sgl_task_params);
+
+	void (*init_icreq_exchange)(struct nvmetcp_task_params *task_params,
+				    struct nvmetcp_init_conn_req_hdr *init_conn_req_pdu_hdr,
+				    struct storage_sgl_task_params *tx_sgl_task_params,
+				    struct storage_sgl_task_params *rx_sgl_task_params);
+
+	void (*init_task_cleanup)(struct nvmetcp_task_params *task_params);
 };
 
 const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void);
-- 
2.22.0



* [RFC PATCH v4 07/27] qed: Add IP services APIs support
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
                   ` (5 preceding siblings ...)
  2021-04-29 19:09 ` [RFC PATCH v4 06/27] qed: Add NVMeTCP Offload IO Level FW Initializations Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-05-02 11:26   ` Hannes Reinecke
  2021-04-29 19:09 ` [RFC PATCH v4 08/27] nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP Shai Malin
                   ` (20 subsequent siblings)
  27 siblings, 1 reply; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net  --cc=Jakub
	Kicinski, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	malin1024, Nikolay Assa

From: Nikolay Assa <nassa@marvell.com>

This patch introduces APIs which the NVMeTCP Offload device (qedn)
will use through the paired net-device (qede).
It includes APIs for:
- ipv4/ipv6 routing
- get VLAN from net-device
- TCP ports reservation

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Nikolay Assa <nassa@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
---
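Note (below the cut line, not part of the commit): the TCP port
reservation described above binds a socket to port 0 and reads back the
port the stack chose. A minimal userspace sketch of the same idea follows,
assuming POSIX sockets in place of the in-kernel sock_create(),
kernel_bind() and kernel_getsockname() calls. The helper names
fetch_tcp_port() and return_tcp_port() are hypothetical and only mirror
the naming in this patch.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical userspace analogue of qed_fetch_tcp_port(): bind a TCP
 * socket to port 0 so the stack picks a free port, then read it back.
 * The open socket keeps the port reserved until it is closed.
 */
static int fetch_tcp_port(int *fd, uint16_t *port)
{
	struct sockaddr_in sa;
	socklen_t len = sizeof(sa);

	*fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (*fd < 0)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sa.sin_port = 0;	/* let the stack choose */

	if (bind(*fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
	    getsockname(*fd, (struct sockaddr *)&sa, &len) < 0) {
		close(*fd);
		*fd = -1;
		return -1;
	}

	/* sin_port is in network byte order, as with qed_get_in_port() */
	*port = ntohs(sa.sin_port);
	return 0;
}

/* Release the reservation by closing the socket. */
static void return_tcp_port(int fd)
{
	if (fd >= 0)
		close(fd);
}
```

The sketch resets the caller's descriptor on failure, so a later
return_tcp_port() on the error path is harmless.
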
 .../qlogic/qed/qed_nvmetcp_ip_services.c      | 239 ++++++++++++++++++
 .../linux/qed/qed_nvmetcp_ip_services_if.h    |  29 +++
 2 files changed, 268 insertions(+)
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c
 create mode 100644 include/linux/qed/qed_nvmetcp_ip_services_if.h

diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c
new file mode 100644
index 000000000000..2904b1a0830a
--- /dev/null
+++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c
@@ -0,0 +1,239 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause)
+/*
+ * Copyright 2021 Marvell. All rights reserved.
+ */
+
+#include <linux/types.h>
+#include <asm/byteorder.h>
+#include <asm/param.h>
+#include <linux/delay.h>
+#include <linux/pci.h>
+#include <linux/dma-mapping.h>
+#include <linux/etherdevice.h>
+#include <linux/kernel.h>
+#include <linux/stddef.h>
+#include <linux/errno.h>
+
+#include <net/tcp.h>
+
+#include <linux/qed/qed_nvmetcp_ip_services_if.h>
+
+#define QED_IP_RESOL_TIMEOUT  4
+
+int qed_route_ipv4(struct sockaddr_storage *local_addr,
+		   struct sockaddr_storage *remote_addr,
+		   struct sockaddr *hardware_address,
+		   struct net_device **ndev)
+{
+	struct neighbour *neigh = NULL;
+	__be32 *loc_ip, *rem_ip;
+	struct rtable *rt;
+	int rc = -ENXIO;
+	int retry;
+
+	loc_ip = &((struct sockaddr_in *)local_addr)->sin_addr.s_addr;
+	rem_ip = &((struct sockaddr_in *)remote_addr)->sin_addr.s_addr;
+	*ndev = NULL;
+	rt = ip_route_output(&init_net, *rem_ip, *loc_ip, 0/*tos*/, 0/*oif*/);
+	if (IS_ERR(rt)) {
+		pr_err("lookup route failed\n");
+		rc = PTR_ERR(rt);
+		goto return_err;
+	}
+
+	neigh = dst_neigh_lookup(&rt->dst, rem_ip);
+	if (!neigh) {
+		rc = -ENOMEM;
+		ip_rt_put(rt);
+		goto return_err;
+	}
+
+	*ndev = rt->dst.dev;
+	ip_rt_put(rt);
+
+	/* If not resolved, kick-off state machine towards resolution */
+	if (!(neigh->nud_state & NUD_VALID))
+		neigh_event_send(neigh, NULL);
+
+	/* query neighbor until resolved or timeout */
+	retry = QED_IP_RESOL_TIMEOUT;
+	while (!(neigh->nud_state & NUD_VALID) && retry > 0) {
+		msleep(1000);
+		retry--;
+	}
+
+	if (neigh->nud_state & NUD_VALID) {
+		/* copy resolved MAC address */
+		neigh_ha_snapshot(hardware_address->sa_data, neigh, *ndev);
+
+		hardware_address->sa_family = (*ndev)->type;
+		rc = 0;
+	}
+
+	neigh_release(neigh);
+	if (!(*loc_ip)) {
+		*loc_ip = inet_select_addr(*ndev, *rem_ip, RT_SCOPE_UNIVERSE);
+		local_addr->ss_family = AF_INET;
+	}
+
+return_err:
+
+	return rc;
+}
+EXPORT_SYMBOL(qed_route_ipv4);
+
+int qed_route_ipv6(struct sockaddr_storage *local_addr,
+		   struct sockaddr_storage *remote_addr,
+		   struct sockaddr *hardware_address,
+		   struct net_device **ndev)
+{
+	struct neighbour *neigh = NULL;
+	struct dst_entry *dst;
+	struct flowi6 fl6;
+	int rc = -ENXIO;
+	int retry;
+
+	memset(&fl6, 0, sizeof(fl6));
+	fl6.saddr = ((struct sockaddr_in6 *)local_addr)->sin6_addr;
+	fl6.daddr = ((struct sockaddr_in6 *)remote_addr)->sin6_addr;
+
+	dst = ip6_route_output(&init_net, NULL, &fl6);
+	if (!dst || dst->error) {
+		if (dst) {
+			pr_err("lookup route failed %d\n", dst->error);
+			dst_release(dst);
+		}
+
+		goto out;
+	}
+
+	neigh = dst_neigh_lookup(dst, &fl6.daddr);
+	if (neigh) {
+		*ndev = ip6_dst_idev(dst)->dev;
+
+		/* If not resolved, kick-off state machine towards resolution */
+		if (!(neigh->nud_state & NUD_VALID))
+			neigh_event_send(neigh, NULL);
+
+		/* query neighbor until resolved or timeout */
+		retry = QED_IP_RESOL_TIMEOUT;
+		while (!(neigh->nud_state & NUD_VALID) && retry > 0) {
+			msleep(1000);
+			retry--;
+		}
+
+		if (neigh->nud_state & NUD_VALID) {
+			neigh_ha_snapshot((u8 *)hardware_address->sa_data, neigh, *ndev);
+
+			hardware_address->sa_family = (*ndev)->type;
+			rc = 0;
+		}
+
+		neigh_release(neigh);
+
+		if (ipv6_addr_any(&fl6.saddr)) {
+			if (ipv6_dev_get_saddr(dev_net(*ndev), *ndev,
+					       &fl6.daddr, 0, &fl6.saddr)) {
+				pr_err("Unable to find source IP address\n");
+				goto out;
+			}
+
+			local_addr->ss_family = AF_INET6;
+			((struct sockaddr_in6 *)local_addr)->sin6_addr =
+								fl6.saddr;
+		}
+	}
+
+	dst_release(dst);
+
+out:
+
+	return rc;
+}
+EXPORT_SYMBOL(qed_route_ipv6);
+
+void qed_vlan_get_ndev(struct net_device **ndev, u16 *vlan_id)
+{
+	if (is_vlan_dev(*ndev)) {
+		*vlan_id = vlan_dev_vlan_id(*ndev);
+		*ndev = vlan_dev_real_dev(*ndev);
+	}
+}
+EXPORT_SYMBOL(qed_vlan_get_ndev);
+
+struct pci_dev *qed_validate_ndev(struct net_device *ndev)
+{
+	struct pci_dev *pdev = NULL;
+	struct net_device *upper;
+
+	for_each_pci_dev(pdev) {
+		if (pdev && pdev->driver &&
+		    !strcmp(pdev->driver->name, "qede")) {
+			upper = pci_get_drvdata(pdev);
+			if (upper->ifindex == ndev->ifindex)
+				return pdev;
+		}
+	}
+
+	return NULL;
+}
+EXPORT_SYMBOL(qed_validate_ndev);
+
+__be16 qed_get_in_port(struct sockaddr_storage *sa)
+{
+	return sa->ss_family == AF_INET
+		? ((struct sockaddr_in *)sa)->sin_port
+		: ((struct sockaddr_in6 *)sa)->sin6_port;
+}
+EXPORT_SYMBOL(qed_get_in_port);
+
+int qed_fetch_tcp_port(struct sockaddr_storage local_ip_addr,
+		       struct socket **sock, u16 *port)
+{
+	struct sockaddr_storage sa;
+	int rc = 0;
+
+	rc = sock_create(local_ip_addr.ss_family, SOCK_STREAM, IPPROTO_TCP, sock);
+	if (rc) {
+		pr_warn("failed to create socket: %d\n", rc);
+		goto err;
+	}
+
+	(*sock)->sk->sk_allocation = GFP_KERNEL;
+	sk_set_memalloc((*sock)->sk);
+
+	rc = kernel_bind(*sock, (struct sockaddr *)&local_ip_addr,
+			 sizeof(local_ip_addr));
+
+	if (rc) {
+		pr_warn("failed to bind socket: %d\n", rc);
+		goto err_sock;
+	}
+
+	rc = kernel_getsockname(*sock, (struct sockaddr *)&sa);
+	if (rc < 0) {
+		pr_warn("getsockname() failed: %d\n", rc);
+		goto err_sock;
+	}
+
+	*port = ntohs(qed_get_in_port(&sa));
+
+	return 0;
+
+err_sock:
+	sock_release(*sock);
+	*sock = NULL;
+err:
+
+	return rc;
+}
+EXPORT_SYMBOL(qed_fetch_tcp_port);
+
+void qed_return_tcp_port(struct socket *sock)
+{
+	if (sock && sock->sk) {
+		tcp_set_state(sock->sk, TCP_CLOSE);
+		sock_release(sock);
+	}
+}
+EXPORT_SYMBOL(qed_return_tcp_port);
diff --git a/include/linux/qed/qed_nvmetcp_ip_services_if.h b/include/linux/qed/qed_nvmetcp_ip_services_if.h
new file mode 100644
index 000000000000..3604aee53796
--- /dev/null
+++ b/include/linux/qed/qed_nvmetcp_ip_services_if.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */
+/*
+ * Copyright 2021 Marvell. All rights reserved.
+ */
+
+#ifndef _QED_IP_SERVICES_IF_H
+#define _QED_IP_SERVICES_IF_H
+
+#include <linux/types.h>
+#include <net/route.h>
+#include <net/ip6_route.h>
+#include <linux/inetdevice.h>
+
+int qed_route_ipv4(struct sockaddr_storage *local_addr,
+		   struct sockaddr_storage *remote_addr,
+		   struct sockaddr *hardware_address,
+		   struct net_device **ndev);
+int qed_route_ipv6(struct sockaddr_storage *local_addr,
+		   struct sockaddr_storage *remote_addr,
+		   struct sockaddr *hardware_address,
+		   struct net_device **ndev);
+void qed_vlan_get_ndev(struct net_device **ndev, u16 *vlan_id);
+struct pci_dev *qed_validate_ndev(struct net_device *ndev);
+void qed_return_tcp_port(struct socket *sock);
+int qed_fetch_tcp_port(struct sockaddr_storage local_ip_addr,
+		       struct socket **sock, u16 *port);
+__be16 qed_get_in_port(struct sockaddr_storage *sa);
+
+#endif /* _QED_IP_SERVICES_IF_H */
-- 
2.22.0



* [RFC PATCH v4 08/27] nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
                   ` (6 preceding siblings ...)
  2021-04-29 19:09 ` [RFC PATCH v4 07/27] qed: Add IP services APIs support Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-05-01 12:18   ` Hannes Reinecke
  2021-04-29 19:09 ` [RFC PATCH v4 09/27] nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS definitions Shai Malin
                   ` (19 subsequent siblings)
  27 siblings, 1 reply; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net  --cc=Jakub
	Kicinski, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	malin1024, Dean Balandin

This patch presents the structure of the NVMeTCP offload common
layer driver. The module is added under "drivers/nvme/host/", and future
offload drivers that register to it will be placed under
"drivers/nvme/hw".
The new driver is enabled by the Kconfig option "NVM Express over Fabrics
TCP offload common layer".
In order to support the new transport type, for host mode, no change is
needed.

Each new vendor-specific offload driver will register to this ULP during
its probe function, by filling out the nvme_tcp_ofld_dev->ops and
nvme_tcp_ofld_dev->private_data and calling nvme_tcp_ofld_register_dev
with the initialized struct.

The internal implementation:
- tcp-offload.h:
  Includes all common structs and ops to be used and shared by offload
  drivers.

- tcp-offload.c:
  Includes the init function which registers as a NVMf transport just
  like any other transport.

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Dean Balandin <dbalandin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
---
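Note (below the cut line, not part of the commit): the registration flow
described above can be modelled in userspace as follows. This is only a
sketch under stated assumptions: the demo_* types and functions are
hypothetical stand-ins for nvme_tcp_ofld_dev, nvme_tcp_ofld_ops and
nvme_tcp_ofld_register_dev(), the ops table is trimmed to two callbacks,
and the rwsem locking is left out.

```c
#include <errno.h>
#include <stddef.h>

/* Userspace model only: a trimmed-down stand-in for nvme_tcp_ofld_ops. */
struct demo_ofld_dev;

struct demo_ofld_ops {
	const char *name;
	int (*claim_dev)(struct demo_ofld_dev *dev, const char *traddr);
	int (*send_req)(void *req);
};

struct demo_ofld_dev {
	struct demo_ofld_dev *next;
	const struct demo_ofld_ops *ops;
};

static struct demo_ofld_dev *demo_devices;	/* registered devices */

/* Reject a device whose driver left a mandatory op unset, mirroring the
 * checks in nvme_tcp_ofld_register_dev(), then link it into the list.
 * The NULL check on ops itself is an addition of this sketch.
 */
static int demo_register_dev(struct demo_ofld_dev *dev)
{
	const struct demo_ofld_ops *ops = dev->ops;

	if (!ops || !ops->claim_dev || !ops->send_req)
		return -EINVAL;

	dev->next = demo_devices;
	demo_devices = dev;
	return 0;
}

/* Unlink a previously registered device; no-op if it is not listed. */
static void demo_unregister_dev(struct demo_ofld_dev *dev)
{
	struct demo_ofld_dev **pp;

	for (pp = &demo_devices; *pp; pp = &(*pp)->next) {
		if (*pp == dev) {
			*pp = dev->next;
			dev->next = NULL;
			return;
		}
	}
}

/* Example vendor callbacks (stubs). */
static int demo_claim(struct demo_ofld_dev *dev, const char *traddr)
{
	(void)dev;
	return traddr != NULL;
}

static int demo_send(void *req)
{
	(void)req;
	return 0;
}
```

A vendor driver would fill in the ops table once and call the register
function from its probe path, then call the unregister function on remove.
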
 drivers/nvme/host/Kconfig       |  16 +++
 drivers/nvme/host/Makefile      |   3 +
 drivers/nvme/host/tcp-offload.c | 126 +++++++++++++++++++
 drivers/nvme/host/tcp-offload.h | 206 ++++++++++++++++++++++++++++++++
 4 files changed, 351 insertions(+)
 create mode 100644 drivers/nvme/host/tcp-offload.c
 create mode 100644 drivers/nvme/host/tcp-offload.h

diff --git a/drivers/nvme/host/Kconfig b/drivers/nvme/host/Kconfig
index a44d49d63968..6e869e94e67f 100644
--- a/drivers/nvme/host/Kconfig
+++ b/drivers/nvme/host/Kconfig
@@ -84,3 +84,19 @@ config NVME_TCP
 	  from https://github.com/linux-nvme/nvme-cli.
 
 	  If unsure, say N.
+
+config NVME_TCP_OFFLOAD
+	tristate "NVM Express over Fabrics TCP offload common layer"
+	default m
+	depends on INET
+	depends on BLK_DEV_NVME
+	select NVME_FABRICS
+	help
+	  This provides support for the NVMe over Fabrics protocol using
+	  the TCP offload transport. This allows you to use remote block devices
+	  exported using the NVMe protocol set.
+
+	  To configure a NVMe over Fabrics controller use the nvme-cli tool
+	  from https://github.com/linux-nvme/nvme-cli.
+
+	  If unsure, say N.
diff --git a/drivers/nvme/host/Makefile b/drivers/nvme/host/Makefile
index d7f6a87687b8..0e7ef044cf29 100644
--- a/drivers/nvme/host/Makefile
+++ b/drivers/nvme/host/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_NVME_FABRICS)		+= nvme-fabrics.o
 obj-$(CONFIG_NVME_RDMA)			+= nvme-rdma.o
 obj-$(CONFIG_NVME_FC)			+= nvme-fc.o
 obj-$(CONFIG_NVME_TCP)			+= nvme-tcp.o
+obj-$(CONFIG_NVME_TCP_OFFLOAD)	+= nvme-tcp-offload.o
 
 nvme-core-y				:= core.o
 nvme-core-$(CONFIG_TRACING)		+= trace.o
@@ -26,3 +27,5 @@ nvme-rdma-y				+= rdma.o
 nvme-fc-y				+= fc.o
 
 nvme-tcp-y				+= tcp.o
+
+nvme-tcp-offload-y		+= tcp-offload.o
diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
new file mode 100644
index 000000000000..711232eba339
--- /dev/null
+++ b/drivers/nvme/host/tcp-offload.c
@@ -0,0 +1,126 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2021 Marvell. All rights reserved.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+/* Kernel includes */
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+/* Driver includes */
+#include "tcp-offload.h"
+
+static LIST_HEAD(nvme_tcp_ofld_devices);
+static DECLARE_RWSEM(nvme_tcp_ofld_devices_rwsem);
+
+/**
+ * nvme_tcp_ofld_register_dev() - NVMeTCP Offload Library registration
+ * function.
+ * @dev:	NVMeTCP offload device instance to be registered to the
+ *		common tcp offload instance.
+ *
+ * API function that registers the type of vendor specific driver
+ * being implemented to the common NVMe over TCP offload library. Part of
+ * the overall init sequence of starting up an offload driver.
+ */
+int nvme_tcp_ofld_register_dev(struct nvme_tcp_ofld_dev *dev)
+{
+	struct nvme_tcp_ofld_ops *ops = dev->ops;
+
+	if (!ops->claim_dev ||
+	    !ops->setup_ctrl ||
+	    !ops->release_ctrl ||
+	    !ops->create_queue ||
+	    !ops->drain_queue ||
+	    !ops->destroy_queue ||
+	    !ops->poll_queue ||
+	    !ops->init_req ||
+	    !ops->send_req ||
+	    !ops->commit_rqs)
+		return -EINVAL;
+
+	down_write(&nvme_tcp_ofld_devices_rwsem);
+	list_add_tail(&dev->entry, &nvme_tcp_ofld_devices);
+	up_write(&nvme_tcp_ofld_devices_rwsem);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(nvme_tcp_ofld_register_dev);
+
+/**
+ * nvme_tcp_ofld_unregister_dev() - NVMeTCP Offload Library unregistration
+ * function.
+ * @dev:	NVMeTCP offload device instance to be unregistered from the
+ *		common tcp offload instance.
+ *
+ * API function that unregisters the type of vendor specific driver being
+ * implemented from the common NVMe over TCP offload library.
+ * Part of the overall exit sequence of unloading the implemented driver.
+ */
+void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev)
+{
+	down_write(&nvme_tcp_ofld_devices_rwsem);
+	list_del(&dev->entry);
+	up_write(&nvme_tcp_ofld_devices_rwsem);
+}
+EXPORT_SYMBOL_GPL(nvme_tcp_ofld_unregister_dev);
+
+/**
+ * nvme_tcp_ofld_report_queue_err() - NVMeTCP Offload report error event
+ * callback function. Pointed to by nvme_tcp_ofld_queue->report_err.
+ * @queue:	NVMeTCP offload queue instance on which the error has occurred.
+ *
+ * API function that allows the vendor specific offload driver to report errors
+ * to the common offload layer, to invoke error recovery.
+ */
+int nvme_tcp_ofld_report_queue_err(struct nvme_tcp_ofld_queue *queue)
+{
+	/* Placeholder - invoke error recovery flow */
+
+	return 0;
+}
+
+/**
+ * nvme_tcp_ofld_req_done() - NVMeTCP Offload request done callback
+ * function. Pointed to by nvme_tcp_ofld_req->done.
+ * Handles both NVME_TCP_F_DATA_SUCCESS flag and NVMe CQ.
+ * @req:	NVMeTCP offload request to complete.
+ * @result:     The nvme_result.
+ * @status:     The completion status.
+ *
+ * API function that allows the vendor specific offload driver to report request
+ * completions to the common offload layer.
+ */
+void nvme_tcp_ofld_req_done(struct nvme_tcp_ofld_req *req,
+			    union nvme_result *result,
+			    __le16 status)
+{
+	/* Placeholder - complete request with/without error */
+}
+
+static struct nvmf_transport_ops nvme_tcp_ofld_transport = {
+	.name		= "tcp_offload",
+	.module		= THIS_MODULE,
+	.required_opts	= NVMF_OPT_TRADDR,
+	.allowed_opts	= NVMF_OPT_TRSVCID | NVMF_OPT_NR_WRITE_QUEUES  |
+			  NVMF_OPT_HOST_TRADDR | NVMF_OPT_CTRL_LOSS_TMO |
+			  NVMF_OPT_RECONNECT_DELAY | NVMF_OPT_HDR_DIGEST |
+			  NVMF_OPT_DATA_DIGEST | NVMF_OPT_NR_POLL_QUEUES |
+			  NVMF_OPT_TOS,
+};
+
+static int __init nvme_tcp_ofld_init_module(void)
+{
+	nvmf_register_transport(&nvme_tcp_ofld_transport);
+
+	return 0;
+}
+
+static void __exit nvme_tcp_ofld_cleanup_module(void)
+{
+	nvmf_unregister_transport(&nvme_tcp_ofld_transport);
+}
+
+module_init(nvme_tcp_ofld_init_module);
+module_exit(nvme_tcp_ofld_cleanup_module);
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/nvme/host/tcp-offload.h b/drivers/nvme/host/tcp-offload.h
new file mode 100644
index 000000000000..9fd270240eaa
--- /dev/null
+++ b/drivers/nvme/host/tcp-offload.h
@@ -0,0 +1,206 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2021 Marvell. All rights reserved.
+ */
+
+/* Linux includes */
+#include <linux/dma-mapping.h>
+#include <linux/scatterlist.h>
+#include <linux/types.h>
+#include <linux/nvme-tcp.h>
+
+/* Driver includes */
+#include "nvme.h"
+#include "fabrics.h"
+
+/* Forward declarations */
+struct nvme_tcp_ofld_ops;
+
+/* Representation of a vendor-specific device. This is the struct used to
+ * register to the offload layer by the vendor-specific driver during its probe
+ * function.
+ * Allocated by vendor-specific driver.
+ */
+struct nvme_tcp_ofld_dev {
+	struct list_head entry;
+	struct pci_dev *qede_pdev;
+	struct net_device *ndev;
+	struct nvme_tcp_ofld_ops *ops;
+};
+
+/* Per IO struct holding the nvme_request and command
+ * Allocated by blk-mq.
+ */
+struct nvme_tcp_ofld_req {
+	struct nvme_request req;
+	struct nvme_command nvme_cmd;
+	struct list_head queue_entry;
+	struct nvme_tcp_ofld_queue *queue;
+	struct request *rq;
+
+	/* Vendor specific driver context */
+	void *private_data;
+
+	bool async;
+	bool last;
+
+	void (*done)(struct nvme_tcp_ofld_req *req,
+		     union nvme_result *result,
+		     __le16 status);
+};
+
+enum nvme_tcp_ofld_queue_flags {
+	NVME_TCP_OFLD_Q_ALLOCATED = 0,
+	NVME_TCP_OFLD_Q_LIVE = 1,
+};
+
+/* Allocated by nvme_tcp_ofld */
+struct nvme_tcp_ofld_queue {
+	/* Offload device associated to this queue */
+	struct nvme_tcp_ofld_dev *dev;
+	struct nvme_tcp_ofld_ctrl *ctrl;
+	unsigned long flags;
+	size_t cmnd_capsule_len;
+
+	/* Vendor specific driver context */
+	void *private_data;
+
+	/* Error callback function */
+	int (*report_err)(struct nvme_tcp_ofld_queue *queue);
+};
+
+/* Connectivity (routing) params used for establishing a connection */
+struct nvme_tcp_ofld_ctrl_con_params {
+	/* Input params */
+	struct sockaddr_storage remote_ip_addr;
+
+	/* If NVMF_OPT_HOST_TRADDR is provided it will be set in local_ip_addr
+	 * in nvme_tcp_ofld_create_ctrl().
+	 * If NVMF_OPT_HOST_TRADDR is not provided the local_ip_addr will be
+	 * initialized by claim_dev().
+	 */
+	struct sockaddr_storage local_ip_addr;
+
+	/* Output params */
+	struct sockaddr	remote_mac_addr;
+	struct sockaddr	local_mac_addr;
+	u16 vlan_id;
+};
+
+/* Allocated by nvme_tcp_ofld */
+struct nvme_tcp_ofld_ctrl {
+	struct nvme_ctrl nctrl;
+	struct list_head list;
+	struct nvme_tcp_ofld_dev *dev;
+
+	/* admin and IO queues */
+	struct blk_mq_tag_set tag_set;
+	struct blk_mq_tag_set admin_tag_set;
+	struct nvme_tcp_ofld_queue *queues;
+
+	struct work_struct err_work;
+	struct delayed_work connect_work;
+
+	/*
+	 * Each entry in the array indicates the number of queues of
+	 * corresponding type.
+	 */
+	u32 queue_type_mapping[HCTX_MAX_TYPES];
+	u32 io_queues[HCTX_MAX_TYPES];
+
+	/* Connectivity params */
+	struct nvme_tcp_ofld_ctrl_con_params conn_params;
+
+	/* Vendor specific driver context */
+	void *private_data;
+};
+
+struct nvme_tcp_ofld_ops {
+	const char *name;
+	struct module *module;
+
+	/* For vendor-specific driver to report what opts it supports */
+	int required_opts; /* bitmap using enum nvmf_parsing_opts */
+	int allowed_opts; /* bitmap using enum nvmf_parsing_opts */
+
+	/* For vendor-specific max num of segments and IO sizes */
+	u32 max_hw_sectors;
+	u32 max_segments;
+
+	/**
+	 * claim_dev: Return True if addr is reachable via offload device.
+	 * @dev: The offload device to check.
+	 * @conn_params: ptr to routing params to be filled by the lower
+	 *               driver. Input+Output argument.
+	 */
+	int (*claim_dev)(struct nvme_tcp_ofld_dev *dev,
+			 struct nvme_tcp_ofld_ctrl_con_params *conn_params);
+
+	/**
+	 * setup_ctrl: Setup device specific controller structures.
+	 * @ctrl: The offload ctrl.
+	 * @new: is new setup.
+	 */
+	int (*setup_ctrl)(struct nvme_tcp_ofld_ctrl *ctrl, bool new);
+
+	/**
+	 * release_ctrl: Release/Free device specific controller structures.
+	 * @ctrl: The offload ctrl.
+	 */
+	int (*release_ctrl)(struct nvme_tcp_ofld_ctrl *ctrl);
+
+	/**
+	 * create_queue: Create offload queue and establish TCP + NVMeTCP
+	 * (icreq+icresp) connection. Return true on successful connection.
+	 * Based on nvme_tcp_alloc_queue.
+	 * @queue: The queue itself - used as input and output.
+	 * @qid: The queue ID associated with the requested queue.
+	 * @q_size: The queue depth.
+	 */
+	int (*create_queue)(struct nvme_tcp_ofld_queue *queue, int qid,
+			    size_t q_size);
+
+	/**
+	 * drain_queue: Drain a given queue - Returning from this function
+	 * ensures that no additional completions will arrive on this queue.
+	 * @queue: The queue to drain.
+	 */
+	void (*drain_queue)(struct nvme_tcp_ofld_queue *queue);
+
+	/**
+	 * destroy_queue: Close the TCP + NVMeTCP connection of a given queue
+	 * and make sure it's no longer active (no completions will arrive on the
+	 * queue).
+	 * @queue: The queue to destroy.
+	 */
+	void (*destroy_queue)(struct nvme_tcp_ofld_queue *queue);
+
+	/**
+	 * poll_queue: Poll a given queue for completions.
+	 * @queue: The queue to poll.
+	 */
+	int (*poll_queue)(struct nvme_tcp_ofld_queue *queue);
+
+	/**
+	 * init_req: Initialize vendor-specific params for a new request.
+	 * @req: Ptr to request to be initialized. Input+Output argument.
+	 */
+	int (*init_req)(struct nvme_tcp_ofld_req *req);
+
+	/**
+	 * send_req: Dispatch a request. Returns the execution status.
+	 * @req: Ptr to request to be sent.
+	 */
+	int (*send_req)(struct nvme_tcp_ofld_req *req);
+
+	/**
+	 * commit_rqs: Serves the purpose of kicking the hardware in case of
+	 * errors, otherwise it would have been kicked by the last request.
+	 * @queue: The queue whose pending requests should be committed.
+	 */
+	void (*commit_rqs)(struct nvme_tcp_ofld_queue *queue);
+};
+
+/* Exported functions for lower vendor specific offload drivers */
+int nvme_tcp_ofld_register_dev(struct nvme_tcp_ofld_dev *dev);
+void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev);
-- 
2.22.0



* [RFC PATCH v4 09/27] nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS definitions
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
                   ` (7 preceding siblings ...)
  2021-04-29 19:09 ` [RFC PATCH v4 08/27] nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-05-01 12:19   ` Hannes Reinecke
  2021-04-29 19:09 ` [RFC PATCH v4 10/27] nvme-tcp-offload: Add device scan implementation Shai Malin
                   ` (18 subsequent siblings)
  27 siblings, 1 reply; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net  --cc=Jakub
	Kicinski, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	malin1024, Arie Gershberg

From: Arie Gershberg <agershberg@marvell.com>

Move the NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS definitions
to the header file, so they can be used by transport modules.

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Arie Gershberg <agershberg@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
---
 drivers/nvme/host/fabrics.c | 7 -------
 drivers/nvme/host/fabrics.h | 7 +++++++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 604ab0e5a2ad..55d7125c8483 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -1001,13 +1001,6 @@ void nvmf_free_options(struct nvmf_ctrl_options *opts)
 }
 EXPORT_SYMBOL_GPL(nvmf_free_options);
 
-#define NVMF_REQUIRED_OPTS	(NVMF_OPT_TRANSPORT | NVMF_OPT_NQN)
-#define NVMF_ALLOWED_OPTS	(NVMF_OPT_QUEUE_SIZE | NVMF_OPT_NR_IO_QUEUES | \
-				 NVMF_OPT_KATO | NVMF_OPT_HOSTNQN | \
-				 NVMF_OPT_HOST_ID | NVMF_OPT_DUP_CONNECT |\
-				 NVMF_OPT_DISABLE_SQFLOW |\
-				 NVMF_OPT_FAIL_FAST_TMO)
-
 static struct nvme_ctrl *
 nvmf_create_ctrl(struct device *dev, const char *buf)
 {
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index 888b108d87a4..b7627e8dcaaf 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -68,6 +68,13 @@ enum {
 	NVMF_OPT_FAIL_FAST_TMO	= 1 << 20,
 };
 
+#define NVMF_REQUIRED_OPTS	(NVMF_OPT_TRANSPORT | NVMF_OPT_NQN)
+#define NVMF_ALLOWED_OPTS	(NVMF_OPT_QUEUE_SIZE | NVMF_OPT_NR_IO_QUEUES | \
+				 NVMF_OPT_KATO | NVMF_OPT_HOSTNQN | \
+				 NVMF_OPT_HOST_ID | NVMF_OPT_DUP_CONNECT |\
+				 NVMF_OPT_DISABLE_SQFLOW |\
+				 NVMF_OPT_FAIL_FAST_TMO)
+
 /**
  * struct nvmf_ctrl_options - Used to hold the options specified
  *			      with the parsing opts enum.
-- 
2.22.0



* [RFC PATCH v4 10/27] nvme-tcp-offload: Add device scan implementation
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
                   ` (8 preceding siblings ...)
  2021-04-29 19:09 ` [RFC PATCH v4 09/27] nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS definitions Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-05-01 12:25   ` Hannes Reinecke
  2021-04-29 19:09 ` [RFC PATCH v4 11/27] nvme-tcp-offload: Add controller level implementation Shai Malin
                   ` (17 subsequent siblings)
  27 siblings, 1 reply; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net  --cc=Jakub
	Kicinski, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	malin1024, Dean Balandin

From: Dean Balandin <dbalandin@marvell.com>

As part of create_ctrl(), the ULP scans the registered devices and calls
the claim_dev op on each of them, to find the first device that matches
the connection params. Once the correct device is found (claim_dev
returns true), we raise the refcnt of that device and return it
as the device to be used for the ctrl currently being created.

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Dean Balandin <dbalandin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
---
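Note (below the cut line, not part of the commit): the scan described
above is a first-match walk over the registered devices. A minimal
userspace sketch follows, assuming a plain singly linked list and an
integer counter in place of the kernel list, the rwsem and
try_module_get(). All demo_* names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct demo_dev {
	struct demo_dev *next;
	const char *reachable;	/* the one address this mock device serves */
	int refcnt;		/* stands in for the module refcount */
};

/* Mock claim_dev(): non-zero when traddr is reachable through dev. */
static int demo_claim_dev(const struct demo_dev *dev, const char *traddr)
{
	return strcmp(dev->reachable, traddr) == 0;
}

/* Walk the registered devices in order and return the first one that
 * claims the address, taking a reference on it; NULL if none does.
 */
static struct demo_dev *demo_lookup_dev(struct demo_dev *head,
					const char *traddr)
{
	struct demo_dev *dev;

	for (dev = head; dev; dev = dev->next) {
		if (demo_claim_dev(dev, traddr)) {
			dev->refcnt++;
			return dev;
		}
	}

	return NULL;
}
```

Because the walk stops at the first claimant, registration order decides
which device wins when more than one can reach the same address.
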
 drivers/nvme/host/tcp-offload.c | 94 +++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)

diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
index 711232eba339..aa7cc239abf2 100644
--- a/drivers/nvme/host/tcp-offload.c
+++ b/drivers/nvme/host/tcp-offload.c
@@ -13,6 +13,11 @@
 static LIST_HEAD(nvme_tcp_ofld_devices);
 static DECLARE_RWSEM(nvme_tcp_ofld_devices_rwsem);
 
+static inline struct nvme_tcp_ofld_ctrl *to_tcp_ofld_ctrl(struct nvme_ctrl *nctrl)
+{
+	return container_of(nctrl, struct nvme_tcp_ofld_ctrl, nctrl);
+}
+
 /**
  * nvme_tcp_ofld_register_dev() - NVMeTCP Offload Library registration
  * function.
@@ -98,6 +103,94 @@ void nvme_tcp_ofld_req_done(struct nvme_tcp_ofld_req *req,
 	/* Placeholder - complete request with/without error */
 }
 
+struct nvme_tcp_ofld_dev *
+nvme_tcp_ofld_lookup_dev(struct nvme_tcp_ofld_ctrl *ctrl)
+{
+	struct nvme_tcp_ofld_dev *dev;
+
+	down_read(&nvme_tcp_ofld_devices_rwsem);
+	list_for_each_entry(dev, &nvme_tcp_ofld_devices, entry) {
+		if (dev->ops->claim_dev(dev, &ctrl->conn_params)) {
+			/* Increase driver refcnt */
+			if (!try_module_get(dev->ops->module)) {
+				pr_err("try_module_get failed\n");
+				dev = NULL;
+			}
+
+			goto out;
+		}
+	}
+
+	dev = NULL;
+out:
+	up_read(&nvme_tcp_ofld_devices_rwsem);
+
+	return dev;
+}
+
+static int nvme_tcp_ofld_setup_ctrl(struct nvme_ctrl *nctrl, bool new)
+{
+	/* Placeholder - validates inputs and creates admin and IO queues */
+
+	return 0;
+}
+
+static struct nvme_ctrl *
+nvme_tcp_ofld_create_ctrl(struct device *ndev, struct nvmf_ctrl_options *opts)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl;
+	struct nvme_tcp_ofld_dev *dev;
+	struct nvme_ctrl *nctrl;
+	int rc = 0;
+
+	ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
+	if (!ctrl)
+		return ERR_PTR(-ENOMEM);
+
+	nctrl = &ctrl->nctrl;
+
+	/* Init nvme_tcp_ofld_ctrl and nvme_ctrl params based on received opts */
+
+	/* Find device that can reach the dest addr */
+	dev = nvme_tcp_ofld_lookup_dev(ctrl);
+	if (!dev) {
+		pr_info("no device found for addr %s:%s.\n",
+			opts->traddr, opts->trsvcid);
+		rc = -EINVAL;
+		goto out_free_ctrl;
+	}
+
+	ctrl->dev = dev;
+
+	if (ctrl->dev->ops->max_hw_sectors)
+		nctrl->max_hw_sectors = ctrl->dev->ops->max_hw_sectors;
+	if (ctrl->dev->ops->max_segments)
+		nctrl->max_segments = ctrl->dev->ops->max_segments;
+
+	/* Init queues */
+
+	/* Call nvme_init_ctrl */
+
+	rc = ctrl->dev->ops->setup_ctrl(ctrl, true);
+	if (rc)
+		goto out_module_put;
+
+	rc = nvme_tcp_ofld_setup_ctrl(nctrl, true);
+	if (rc)
+		goto out_uninit_ctrl;
+
+	return nctrl;
+
+out_uninit_ctrl:
+	ctrl->dev->ops->release_ctrl(ctrl);
+out_module_put:
+	module_put(dev->ops->module);
+out_free_ctrl:
+	kfree(ctrl);
+
+	return ERR_PTR(rc);
+}
+
 static struct nvmf_transport_ops nvme_tcp_ofld_transport = {
 	.name		= "tcp_offload",
 	.module		= THIS_MODULE,
@@ -107,6 +200,7 @@ static struct nvmf_transport_ops nvme_tcp_ofld_transport = {
 			  NVMF_OPT_RECONNECT_DELAY | NVMF_OPT_HDR_DIGEST |
 			  NVMF_OPT_DATA_DIGEST | NVMF_OPT_NR_POLL_QUEUES |
 			  NVMF_OPT_TOS,
+	.create_ctrl	= nvme_tcp_ofld_create_ctrl,
 };
 
 static int __init nvme_tcp_ofld_init_module(void)
-- 
2.22.0



* [RFC PATCH v4 11/27] nvme-tcp-offload: Add controller level implementation
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
                   ` (9 preceding siblings ...)
  2021-04-29 19:09 ` [RFC PATCH v4 10/27] nvme-tcp-offload: Add device scan implementation Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-04-29 19:09 ` [RFC PATCH v4 12/27] nvme-tcp-offload: Add controller level error recovery implementation Shai Malin
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net  --cc=Jakub
	Kicinski, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	malin1024, Arie Gershberg

From: Arie Gershberg <agershberg@marvell.com>

In this patch we implement controller level functionality including:
- create_ctrl.
- delete_ctrl.
- free_ctrl.

The implementation is similar to other nvme fabrics modules. The main
difference is that the nvme-tcp-offload ULP calls the vendor specific
claim_dev() op with the given TCP/IP parameters to determine which device
will be used for this controller.
Once a device is found, the vendor specific device and the controller are
paired and kept in a controller list managed by the ULP.
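
For readers following the claim_dev() flow described above, here is a
minimal user-space sketch of the idea. It is not the code from this patch:
the struct names, the prefix-matching claim callback and pair_ctrl() are
hypothetical stand-ins, and locking, module refcounting and the queue setup
that happens before the controller is added to the list are all left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the ULP structures. */
struct conn_params {
	const char *remote_ip;	/* target address as text, illustration only */
};

struct ofld_dev;

struct ofld_ops {
	const char *name;
	/* Returns nonzero if this device can reach the given address. */
	int (*claim_dev)(struct ofld_dev *dev, const struct conn_params *p);
};

struct ofld_dev {
	const struct ofld_ops *ops;
	const char *reach_prefix;	/* hypothetical: addresses this device serves */
	struct ofld_dev *next;
};

struct ofld_ctrl {
	struct conn_params params;
	struct ofld_dev *dev;
	struct ofld_ctrl *next;
};

/* Toy vendor callback: claim the address if it starts with our prefix. */
static int prefix_claim_dev(struct ofld_dev *dev, const struct conn_params *p)
{
	size_t len = strlen(dev->reach_prefix);

	return strncmp(p->remote_ip, dev->reach_prefix, len) == 0;
}

/* Ask each registered device in turn; the first one that claims wins. */
static struct ofld_dev *lookup_dev(struct ofld_dev *devices,
				   const struct conn_params *p)
{
	for (struct ofld_dev *d = devices; d; d = d->next)
		if (d->ops->claim_dev(d, p))
			return d;

	return NULL;
}

/* Pair the controller with its device and record it in the ctrl list. */
static int pair_ctrl(struct ofld_ctrl **ctrl_list, struct ofld_dev *devices,
		     struct ofld_ctrl *ctrl)
{
	struct ofld_dev *dev = lookup_dev(devices, &ctrl->params);

	if (!dev)
		return -1;	/* no device can reach the target */

	ctrl->dev = dev;
	ctrl->next = *ctrl_list;
	*ctrl_list = ctrl;

	return 0;
}
```

The point of the sketch is only that device selection is delegated to the
vendor op, so the ULP itself needs no knowledge of routing or of which NIC
owns which address.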

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Arie Gershberg <agershberg@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
---
 drivers/nvme/host/tcp-offload.c | 467 +++++++++++++++++++++++++++++++-
 1 file changed, 459 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
index aa7cc239abf2..59e1955e02ec 100644
--- a/drivers/nvme/host/tcp-offload.c
+++ b/drivers/nvme/host/tcp-offload.c
@@ -12,6 +12,10 @@
 
 static LIST_HEAD(nvme_tcp_ofld_devices);
 static DECLARE_RWSEM(nvme_tcp_ofld_devices_rwsem);
+static LIST_HEAD(nvme_tcp_ofld_ctrl_list);
+static DECLARE_RWSEM(nvme_tcp_ofld_ctrl_rwsem);
+static struct blk_mq_ops nvme_tcp_ofld_admin_mq_ops;
+static struct blk_mq_ops nvme_tcp_ofld_mq_ops;
 
 static inline struct nvme_tcp_ofld_ctrl *to_tcp_ofld_ctrl(struct nvme_ctrl *nctrl)
 {
@@ -128,28 +132,430 @@ nvme_tcp_ofld_lookup_dev(struct nvme_tcp_ofld_ctrl *ctrl)
 	return dev;
 }
 
+static struct blk_mq_tag_set *
+nvme_tcp_ofld_alloc_tagset(struct nvme_ctrl *nctrl, bool admin)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	struct blk_mq_tag_set *set;
+	int rc;
+
+	if (admin) {
+		set = &ctrl->admin_tag_set;
+		memset(set, 0, sizeof(*set));
+		set->ops = &nvme_tcp_ofld_admin_mq_ops;
+		set->queue_depth = NVME_AQ_MQ_TAG_DEPTH;
+		set->reserved_tags = NVMF_RESERVED_TAGS;
+		set->numa_node = nctrl->numa_node;
+		set->flags = BLK_MQ_F_BLOCKING;
+		set->cmd_size = sizeof(struct nvme_tcp_ofld_req);
+		set->driver_data = ctrl;
+		set->nr_hw_queues = 1;
+		set->timeout = NVME_ADMIN_TIMEOUT;
+	} else {
+		set = &ctrl->tag_set;
+		memset(set, 0, sizeof(*set));
+		set->ops = &nvme_tcp_ofld_mq_ops;
+		set->queue_depth = nctrl->sqsize + 1;
+		set->reserved_tags = NVMF_RESERVED_TAGS;
+		set->numa_node = nctrl->numa_node;
+		set->flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_BLOCKING;
+		set->cmd_size = sizeof(struct nvme_tcp_ofld_req);
+		set->driver_data = ctrl;
+		set->nr_hw_queues = nctrl->queue_count - 1;
+		set->timeout = NVME_IO_TIMEOUT;
+		set->nr_maps = nctrl->opts->nr_poll_queues ? HCTX_MAX_TYPES : 2;
+	}
+
+	rc = blk_mq_alloc_tag_set(set);
+	if (rc)
+		return ERR_PTR(rc);
+
+	return set;
+}
+
+static int nvme_tcp_ofld_configure_admin_queue(struct nvme_ctrl *nctrl,
+					       bool new)
+{
+	int rc;
+
+	/* Placeholder - alloc_admin_queue */
+	if (new) {
+		nctrl->admin_tagset =
+				nvme_tcp_ofld_alloc_tagset(nctrl, true);
+		if (IS_ERR(nctrl->admin_tagset)) {
+			rc = PTR_ERR(nctrl->admin_tagset);
+			nctrl->admin_tagset = NULL;
+			goto out_free_queue;
+		}
+
+		nctrl->fabrics_q = blk_mq_init_queue(nctrl->admin_tagset);
+		if (IS_ERR(nctrl->fabrics_q)) {
+			rc = PTR_ERR(nctrl->fabrics_q);
+			nctrl->fabrics_q = NULL;
+			goto out_free_tagset;
+		}
+
+		nctrl->admin_q = blk_mq_init_queue(nctrl->admin_tagset);
+		if (IS_ERR(nctrl->admin_q)) {
+			rc = PTR_ERR(nctrl->admin_q);
+			nctrl->admin_q = NULL;
+			goto out_cleanup_fabrics_q;
+		}
+	}
+
+	/* Placeholder - nvme_tcp_ofld_start_queue */
+
+	rc = nvme_enable_ctrl(nctrl);
+	if (rc)
+		goto out_stop_queue;
+
+	blk_mq_unquiesce_queue(nctrl->admin_q);
+
+	rc = nvme_init_identify(nctrl);
+	if (rc)
+		goto out_quiesce_queue;
+
+	return 0;
+
+out_quiesce_queue:
+	blk_mq_quiesce_queue(nctrl->admin_q);
+	blk_sync_queue(nctrl->admin_q);
+
+out_stop_queue:
+	/* Placeholder - stop offload queue */
+	nvme_cancel_admin_tagset(nctrl);
+
+out_cleanup_fabrics_q:
+	if (new)
+		blk_cleanup_queue(nctrl->fabrics_q);
+out_free_tagset:
+	if (new)
+		blk_mq_free_tag_set(nctrl->admin_tagset);
+out_free_queue:
+	/* Placeholder - free admin queue */
+
+	return rc;
+}
+
+static int
+nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new)
+{
+	int rc;
+
+	/* Placeholder - alloc_io_queues */
+
+	if (new) {
+		nctrl->tagset = nvme_tcp_ofld_alloc_tagset(nctrl, false);
+		if (IS_ERR(nctrl->tagset)) {
+			rc = PTR_ERR(nctrl->tagset);
+			nctrl->tagset = NULL;
+			goto out_free_io_queues;
+		}
+
+		nctrl->connect_q = blk_mq_init_queue(nctrl->tagset);
+		if (IS_ERR(nctrl->connect_q)) {
+			rc = PTR_ERR(nctrl->connect_q);
+			nctrl->connect_q = NULL;
+			goto out_free_tag_set;
+		}
+	}
+
+	/* Placeholder - start_io_queues */
+
+	if (!new) {
+		nvme_start_queues(nctrl);
+		if (!nvme_wait_freeze_timeout(nctrl, NVME_IO_TIMEOUT)) {
+			/*
+			 * If we timed out waiting for freeze we are likely to
+			 * be stuck.  Fail the controller initialization just
+			 * to be safe.
+			 */
+			rc = -ENODEV;
+			goto out_wait_freeze_timed_out;
+		}
+		blk_mq_update_nr_hw_queues(nctrl->tagset, nctrl->queue_count - 1);
+		nvme_unfreeze(nctrl);
+	}
+
+	return 0;
+
+out_wait_freeze_timed_out:
+	nvme_stop_queues(nctrl);
+	nvme_sync_io_queues(nctrl);
+
+	/* Placeholder - Stop IO queues */
+
+	if (new)
+		blk_cleanup_queue(nctrl->connect_q);
+out_free_tag_set:
+	if (new)
+		blk_mq_free_tag_set(nctrl->tagset);
+out_free_io_queues:
+	/* Placeholder - free_io_queues */
+
+	return rc;
+}
+
 static int nvme_tcp_ofld_setup_ctrl(struct nvme_ctrl *nctrl, bool new)
 {
-	/* Placeholder - validates inputs and creates admin and IO queues */
+	struct nvmf_ctrl_options *opts = nctrl->opts;
+	int rc;
+
+	rc = nvme_tcp_ofld_configure_admin_queue(nctrl, new);
+	if (rc)
+		return rc;
+
+	if (nctrl->icdoff) {
+		dev_err(nctrl->device, "icdoff is not supported!\n");
+		rc = -EINVAL;
+		goto destroy_admin;
+	}
+
+	if (opts->queue_size > nctrl->sqsize + 1)
+		dev_warn(nctrl->device,
+			 "queue_size %zu > ctrl sqsize %u, clamping down\n",
+			 opts->queue_size, nctrl->sqsize + 1);
+
+	if (nctrl->sqsize + 1 > nctrl->maxcmd) {
+		dev_warn(nctrl->device,
+			 "sqsize %u > ctrl maxcmd %u, clamping down\n",
+			 nctrl->sqsize + 1, nctrl->maxcmd);
+		nctrl->sqsize = nctrl->maxcmd - 1;
+	}
+
+	if (nctrl->queue_count > 1) {
+		rc = nvme_tcp_ofld_configure_io_queues(nctrl, new);
+		if (rc)
+			goto destroy_admin;
+	}
+
+	if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_LIVE)) {
+		/*
+		 * state change failure is ok if we started ctrl delete,
+		 * unless we're during creation of a new controller to
+		 * avoid races with teardown flow.
+		 */
+		WARN_ON_ONCE(nctrl->state != NVME_CTRL_DELETING &&
+			     nctrl->state != NVME_CTRL_DELETING_NOIO);
+		WARN_ON_ONCE(new);
+		rc = -EINVAL;
+		goto destroy_io;
+	}
+
+	nvme_start_ctrl(nctrl);
+
+	return 0;
+
+destroy_io:
+	/* Placeholder - stop and destroy io queues*/
+destroy_admin:
+	/* Placeholder - stop and destroy admin queue*/
+
+	return rc;
+}
+
+static int
+nvme_tcp_ofld_check_dev_opts(struct nvmf_ctrl_options *opts,
+			     struct nvme_tcp_ofld_ops *ofld_ops)
+{
+	unsigned int nvme_tcp_ofld_opt_mask = NVMF_ALLOWED_OPTS |
+			ofld_ops->allowed_opts | ofld_ops->required_opts;
+	if (opts->mask & ~nvme_tcp_ofld_opt_mask) {
+		pr_warn("One or more of the nvmf options isn't supported by %s.\n",
+			ofld_ops->name);
+
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static void nvme_tcp_ofld_free_ctrl(struct nvme_ctrl *nctrl)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	struct nvme_tcp_ofld_dev *dev = ctrl->dev;
+
+	if (list_empty(&ctrl->list))
+		goto free_ctrl;
+
+	down_write(&nvme_tcp_ofld_ctrl_rwsem);
+	ctrl->dev->ops->release_ctrl(ctrl);
+	list_del(&ctrl->list);
+	up_write(&nvme_tcp_ofld_ctrl_rwsem);
+
+	nvmf_free_options(nctrl->opts);
+free_ctrl:
+	module_put(dev->ops->module);
+	kfree(ctrl->queues);
+	kfree(ctrl);
+}
+
+static void nvme_tcp_ofld_submit_async_event(struct nvme_ctrl *arg)
+{
+	/* Placeholder - submit_async_event */
+}
+
+static void
+nvme_tcp_ofld_teardown_admin_queue(struct nvme_ctrl *ctrl, bool remove)
+{
+	/* Placeholder - teardown_admin_queue */
+}
+
+static void
+nvme_tcp_ofld_teardown_io_queues(struct nvme_ctrl *nctrl, bool remove)
+{
+	/* Placeholder - teardown_io_queues */
+}
+
+static void
+nvme_tcp_ofld_teardown_ctrl(struct nvme_ctrl *nctrl, bool shutdown)
+{
+	/* Placeholder - err_work and connect_work */
+	nvme_tcp_ofld_teardown_io_queues(nctrl, shutdown);
+	blk_mq_quiesce_queue(nctrl->admin_q);
+	if (shutdown)
+		nvme_shutdown_ctrl(nctrl);
+	else
+		nvme_disable_ctrl(nctrl);
+	nvme_tcp_ofld_teardown_admin_queue(nctrl, shutdown);
+}
+
+static void nvme_tcp_ofld_delete_ctrl(struct nvme_ctrl *nctrl)
+{
+	nvme_tcp_ofld_teardown_ctrl(nctrl, true);
+}
+
+static int
+nvme_tcp_ofld_init_request(struct blk_mq_tag_set *set,
+			   struct request *rq,
+			   unsigned int hctx_idx,
+			   unsigned int numa_node)
+{
+	struct nvme_tcp_ofld_req *req = blk_mq_rq_to_pdu(rq);
+	struct nvme_tcp_ofld_ctrl *ctrl = set->driver_data;
+
+	/* Placeholder - init request */
+
+	req->done = nvme_tcp_ofld_req_done;
+	ctrl->dev->ops->init_req(req);
 
 	return 0;
 }
 
+static blk_status_t
+nvme_tcp_ofld_queue_rq(struct blk_mq_hw_ctx *hctx,
+		       const struct blk_mq_queue_data *bd)
+{
+	/* Call nvme_setup_cmd(...) */
+
+	/* Call ops->send_req(...) */
+
+	return BLK_STS_OK;
+}
+
+static struct blk_mq_ops nvme_tcp_ofld_mq_ops = {
+	.queue_rq	= nvme_tcp_ofld_queue_rq,
+	.init_request	= nvme_tcp_ofld_init_request,
+	/*
+	 * All additional ops will be also implemented and registered similar to
+	 * tcp.c
+	 */
+};
+
+static struct blk_mq_ops nvme_tcp_ofld_admin_mq_ops = {
+	.queue_rq	= nvme_tcp_ofld_queue_rq,
+	.init_request	= nvme_tcp_ofld_init_request,
+	/*
+	 * All additional ops will be also implemented and registered similar to
+	 * tcp.c
+	 */
+};
+
+static const struct nvme_ctrl_ops nvme_tcp_ofld_ctrl_ops = {
+	.name			= "tcp_offload",
+	.module			= THIS_MODULE,
+	.flags			= NVME_F_FABRICS,
+	.reg_read32		= nvmf_reg_read32,
+	.reg_read64		= nvmf_reg_read64,
+	.reg_write32		= nvmf_reg_write32,
+	.free_ctrl		= nvme_tcp_ofld_free_ctrl,
+	.submit_async_event	= nvme_tcp_ofld_submit_async_event,
+	.delete_ctrl		= nvme_tcp_ofld_delete_ctrl,
+	.get_address		= nvmf_get_address,
+};
+
+static bool
+nvme_tcp_ofld_existing_controller(struct nvmf_ctrl_options *opts)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl;
+	bool found = false;
+
+	down_read(&nvme_tcp_ofld_ctrl_rwsem);
+	list_for_each_entry(ctrl, &nvme_tcp_ofld_ctrl_list, list) {
+		found = nvmf_ip_options_match(&ctrl->nctrl, opts);
+		if (found)
+			break;
+	}
+	up_read(&nvme_tcp_ofld_ctrl_rwsem);
+
+	return found;
+}
+
 static struct nvme_ctrl *
 nvme_tcp_ofld_create_ctrl(struct device *ndev, struct nvmf_ctrl_options *opts)
 {
+	struct nvme_tcp_ofld_queue *queue;
 	struct nvme_tcp_ofld_ctrl *ctrl;
 	struct nvme_tcp_ofld_dev *dev;
 	struct nvme_ctrl *nctrl;
-	int rc = 0;
+	int i, rc = 0;
 
 	ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
 	if (!ctrl)
 		return ERR_PTR(-ENOMEM);
 
+	INIT_LIST_HEAD(&ctrl->list);
 	nctrl = &ctrl->nctrl;
+	nctrl->opts = opts;
+	nctrl->queue_count = opts->nr_io_queues + opts->nr_write_queues +
+			     opts->nr_poll_queues + 1;
+	nctrl->sqsize = opts->queue_size - 1;
+	nctrl->kato = opts->kato;
+	if (!(opts->mask & NVMF_OPT_TRSVCID)) {
+		opts->trsvcid =
+			kstrdup(__stringify(NVME_TCP_DISC_PORT), GFP_KERNEL);
+		if (!opts->trsvcid) {
+			rc = -ENOMEM;
+			goto out_free_ctrl;
+		}
+		opts->mask |= NVMF_OPT_TRSVCID;
+	}
 
-	/* Init nvme_tcp_ofld_ctrl and nvme_ctrl params based on received opts */
+	rc = inet_pton_with_scope(&init_net, AF_UNSPEC, opts->traddr,
+				  opts->trsvcid,
+				  &ctrl->conn_params.remote_ip_addr);
+	if (rc) {
+		pr_err("malformed address passed: %s:%s\n",
+		       opts->traddr, opts->trsvcid);
+		goto out_free_ctrl;
+	}
+
+	if (opts->mask & NVMF_OPT_HOST_TRADDR) {
+		rc = inet_pton_with_scope(&init_net, AF_UNSPEC,
+					  opts->host_traddr, NULL,
+					  &ctrl->conn_params.local_ip_addr);
+		if (rc) {
+			pr_err("malformed src address passed: %s\n",
+			       opts->host_traddr);
+			goto out_free_ctrl;
+		}
+	}
+
+	if (!opts->duplicate_connect &&
+	    nvme_tcp_ofld_existing_controller(opts)) {
+		rc = -EALREADY;
+		goto out_free_ctrl;
+	}
 
 	/* Find device that can reach the dest addr */
 	dev = nvme_tcp_ofld_lookup_dev(ctrl);
@@ -160,6 +566,10 @@ nvme_tcp_ofld_create_ctrl(struct device *ndev, struct nvmf_ctrl_options *opts)
 		goto out_free_ctrl;
 	}
 
+	rc = nvme_tcp_ofld_check_dev_opts(opts, dev->ops);
+	if (rc)
+		goto out_module_put;
+
 	ctrl->dev = dev;
 
 	if (ctrl->dev->ops->max_hw_sectors)
@@ -167,22 +577,55 @@ nvme_tcp_ofld_create_ctrl(struct device *ndev, struct nvmf_ctrl_options *opts)
 	if (ctrl->dev->ops->max_segments)
 		nctrl->max_segments = ctrl->dev->ops->max_segments;
 
-	/* Init queues */
+	ctrl->queues = kcalloc(nctrl->queue_count,
+			       sizeof(struct nvme_tcp_ofld_queue),
+			       GFP_KERNEL);
+	if (!ctrl->queues) {
+		rc = -ENOMEM;
+		goto out_module_put;
+	}
 
-	/* Call nvme_init_ctrl */
+	for (i = 0; i < nctrl->queue_count; ++i) {
+		queue = &ctrl->queues[i];
+		queue->ctrl = ctrl;
+		queue->dev = dev;
+		queue->report_err = nvme_tcp_ofld_report_queue_err;
+	}
+
+	rc = nvme_init_ctrl(nctrl, ndev, &nvme_tcp_ofld_ctrl_ops, 0);
+	if (rc)
+		goto out_free_queues;
+
+	if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_CONNECTING)) {
+		WARN_ON_ONCE(1);
+		rc = -EINTR;
+		goto out_uninit_ctrl;
+	}
 
 	rc = ctrl->dev->ops->setup_ctrl(ctrl, true);
 	if (rc)
-		goto out_module_put;
+		goto out_uninit_ctrl;
 
 	rc = nvme_tcp_ofld_setup_ctrl(nctrl, true);
 	if (rc)
-		goto out_uninit_ctrl;
+		goto out_release_ctrl;
+
+	dev_info(nctrl->device, "new ctrl: NQN \"%s\", addr %pISp\n",
+		 opts->subsysnqn, &ctrl->conn_params.remote_ip_addr);
+
+	down_write(&nvme_tcp_ofld_ctrl_rwsem);
+	list_add_tail(&ctrl->list, &nvme_tcp_ofld_ctrl_list);
+	up_write(&nvme_tcp_ofld_ctrl_rwsem);
 
 	return nctrl;
 
-out_uninit_ctrl:
+out_release_ctrl:
 	ctrl->dev->ops->release_ctrl(ctrl);
+out_uninit_ctrl:
+	nvme_uninit_ctrl(nctrl);
+	nvme_put_ctrl(nctrl);
+out_free_queues:
+	kfree(ctrl->queues);
 out_module_put:
 	module_put(dev->ops->module);
 out_free_ctrl:
@@ -212,7 +655,15 @@ static int __init nvme_tcp_ofld_init_module(void)
 
 static void __exit nvme_tcp_ofld_cleanup_module(void)
 {
+	struct nvme_tcp_ofld_ctrl *ctrl;
+
 	nvmf_unregister_transport(&nvme_tcp_ofld_transport);
+
+	down_write(&nvme_tcp_ofld_ctrl_rwsem);
+	list_for_each_entry(ctrl, &nvme_tcp_ofld_ctrl_list, list)
+		nvme_delete_ctrl(&ctrl->nctrl);
+	up_write(&nvme_tcp_ofld_ctrl_rwsem);
+	flush_workqueue(nvme_delete_wq);
 }
 
 module_init(nvme_tcp_ofld_init_module);
-- 
2.22.0



* [RFC PATCH v4 12/27] nvme-tcp-offload: Add controller level error recovery implementation
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
                   ` (10 preceding siblings ...)
  2021-04-29 19:09 ` [RFC PATCH v4 11/27] nvme-tcp-offload: Add controller level implementation Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-05-01 16:29   ` Hannes Reinecke
  2021-04-29 19:09 ` [RFC PATCH v4 13/27] nvme-tcp-offload: Add queue level implementation Shai Malin
                   ` (15 subsequent siblings)
  27 siblings, 1 reply; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net  --cc=Jakub
	Kicinski, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	malin1024, Arie Gershberg

From: Arie Gershberg <agershberg@marvell.com>

In this patch, we implement controller level error handling and recovery.
When the ULP discovers an error, or when the nvme-core initiates a
controller reset (using the reset_ctrl workqueue), the ULP initiates a
controller recovery, which includes teardown and re-connect of all queues.
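
The recovery sequence described above can be pictured as a small state
machine. The following user-space sketch is a simplified model, not the
kernel code: the enum values, struct model_ctrl and the helper names are
made up for illustration, the set of allowed state transitions is reduced,
and the retry rule (retry while fewer than max_reconnects attempts were
made, or forever when max_reconnects is -1) is an assumption modeled on how
the fabrics reconnect limit is commonly applied.

```c
#include <assert.h>
#include <stdbool.h>

enum ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_CONNECTING, CTRL_DELETING };
enum recovery_action { ACT_NONE, ACT_RECONNECT, ACT_REMOVE };

/* Hypothetical, reduced view of a controller for this model only. */
struct model_ctrl {
	enum ctrl_state state;
	int nr_reconnects;
	int max_reconnects;	/* -1 means retry forever */
};

/* Error reported: only a LIVE controller starts recovery in this model. */
static bool start_error_recovery(struct model_ctrl *c)
{
	if (c->state != CTRL_LIVE)
		return false;

	c->state = CTRL_RESETTING;
	return true;
}

/* Decide between scheduling another reconnect and removing the ctrl. */
static enum recovery_action reconnect_or_remove(struct model_ctrl *c)
{
	if (c->state != CTRL_CONNECTING)
		return ACT_NONE;	/* resetting or deleting: do nothing */

	if (c->max_reconnects == -1 || c->nr_reconnects < c->max_reconnects)
		return ACT_RECONNECT;

	c->state = CTRL_DELETING;
	return ACT_REMOVE;
}

/* Recovery work: tear down queues (elided), then move to CONNECTING. */
static enum recovery_action error_recovery_work(struct model_ctrl *c)
{
	if (c->state != CTRL_RESETTING)
		return ACT_NONE;

	c->state = CTRL_CONNECTING;
	return reconnect_or_remove(c);
}

/* One reconnect attempt; 'ok' stands in for the result of the setup calls. */
static enum recovery_action reconnect_attempt(struct model_ctrl *c, bool ok)
{
	++c->nr_reconnects;
	if (ok) {
		c->state = CTRL_LIVE;
		c->nr_reconnects = 0;
		return ACT_NONE;
	}

	return reconnect_or_remove(c);
}
```

In the model, a controller with max_reconnects of 2 gets one more retry
after the first failed attempt and is removed after the second.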

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Arie Gershberg <agershberg@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
---
 drivers/nvme/host/tcp-offload.c | 138 +++++++++++++++++++++++++++++++-
 drivers/nvme/host/tcp-offload.h |   1 +
 2 files changed, 137 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
index 59e1955e02ec..9082b11c133f 100644
--- a/drivers/nvme/host/tcp-offload.c
+++ b/drivers/nvme/host/tcp-offload.c
@@ -74,6 +74,23 @@ void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev)
 }
 EXPORT_SYMBOL_GPL(nvme_tcp_ofld_unregister_dev);
 
+/**
+ * nvme_tcp_ofld_error_recovery() - NVMeTCP Offload Library error recovery
+ * function.
+ * @nctrl:	NVMe controller instance to change to resetting.
+ *
+ * API function that changes the controller state to resetting.
+ * Part of the overall controller reset sequence.
+ */
+void nvme_tcp_ofld_error_recovery(struct nvme_ctrl *nctrl)
+{
+	if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_RESETTING))
+		return;
+
+	queue_work(nvme_reset_wq, &to_tcp_ofld_ctrl(nctrl)->err_work);
+}
+EXPORT_SYMBOL_GPL(nvme_tcp_ofld_error_recovery);
+
 /**
  * nvme_tcp_ofld_report_queue_err() - NVMeTCP Offload report error event
  * callback function. Pointed to by nvme_tcp_ofld_queue->report_err.
@@ -84,7 +101,8 @@ EXPORT_SYMBOL_GPL(nvme_tcp_ofld_unregister_dev);
  */
 int nvme_tcp_ofld_report_queue_err(struct nvme_tcp_ofld_queue *queue)
 {
-	/* Placeholder - invoke error recovery flow */
+	pr_err("nvme-tcp-offload queue error\n");
+	nvme_tcp_ofld_error_recovery(&queue->ctrl->nctrl);
 
 	return 0;
 }
@@ -296,6 +314,28 @@ nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new)
 	return rc;
 }
 
+static void nvme_tcp_ofld_reconnect_or_remove(struct nvme_ctrl *nctrl)
+{
+	/* If we are resetting/deleting then do nothing */
+	if (nctrl->state != NVME_CTRL_CONNECTING) {
+		WARN_ON_ONCE(nctrl->state == NVME_CTRL_NEW ||
+			     nctrl->state == NVME_CTRL_LIVE);
+
+		return;
+	}
+
+	if (nvmf_should_reconnect(nctrl)) {
+		dev_info(nctrl->device, "Reconnecting in %d seconds...\n",
+			 nctrl->opts->reconnect_delay);
+		queue_delayed_work(nvme_wq,
+				   &to_tcp_ofld_ctrl(nctrl)->connect_work,
+				   nctrl->opts->reconnect_delay * HZ);
+	} else {
+		dev_info(nctrl->device, "Removing controller...\n");
+		nvme_delete_ctrl(nctrl);
+	}
+}
+
 static int nvme_tcp_ofld_setup_ctrl(struct nvme_ctrl *nctrl, bool new)
 {
 	struct nvmf_ctrl_options *opts = nctrl->opts;
@@ -407,10 +447,68 @@ nvme_tcp_ofld_teardown_io_queues(struct nvme_ctrl *nctrl, bool remove)
 	/* Placeholder - teardown_io_queues */
 }
 
+static void nvme_tcp_ofld_reconnect_ctrl_work(struct work_struct *work)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl =
+				container_of(to_delayed_work(work),
+					     struct nvme_tcp_ofld_ctrl,
+					     connect_work);
+	struct nvme_ctrl *nctrl = &ctrl->nctrl;
+
+	++nctrl->nr_reconnects;
+
+	if (ctrl->dev->ops->setup_ctrl(ctrl, false))
+		goto requeue;
+
+	if (nvme_tcp_ofld_setup_ctrl(nctrl, false))
+		goto release_and_requeue;
+
+	dev_info(nctrl->device, "Successfully reconnected (%d attempt)\n",
+		 nctrl->nr_reconnects);
+
+	nctrl->nr_reconnects = 0;
+
+	return;
+
+release_and_requeue:
+	ctrl->dev->ops->release_ctrl(ctrl);
+requeue:
+	dev_info(nctrl->device, "Failed reconnect attempt %d\n",
+		 nctrl->nr_reconnects);
+	nvme_tcp_ofld_reconnect_or_remove(nctrl);
+}
+
+static void nvme_tcp_ofld_error_recovery_work(struct work_struct *work)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl =
+		container_of(work, struct nvme_tcp_ofld_ctrl, err_work);
+	struct nvme_ctrl *nctrl = &ctrl->nctrl;
+
+	nvme_stop_keep_alive(nctrl);
+	nvme_tcp_ofld_teardown_io_queues(nctrl, false);
+	/* unquiesce to fail fast pending requests */
+	nvme_start_queues(nctrl);
+	nvme_tcp_ofld_teardown_admin_queue(nctrl, false);
+	blk_mq_unquiesce_queue(nctrl->admin_q);
+
+	if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_CONNECTING)) {
+		/* state change failure is ok if we started nctrl delete */
+		WARN_ON_ONCE(nctrl->state != NVME_CTRL_DELETING &&
+			     nctrl->state != NVME_CTRL_DELETING_NOIO);
+
+		return;
+	}
+
+	nvme_tcp_ofld_reconnect_or_remove(nctrl);
+}
+
 static void
 nvme_tcp_ofld_teardown_ctrl(struct nvme_ctrl *nctrl, bool shutdown)
 {
-	/* Placeholder - err_work and connect_work */
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+
+	cancel_work_sync(&ctrl->err_work);
+	cancel_delayed_work_sync(&ctrl->connect_work);
 	nvme_tcp_ofld_teardown_io_queues(nctrl, shutdown);
 	blk_mq_quiesce_queue(nctrl->admin_q);
 	if (shutdown)
@@ -425,6 +523,38 @@ static void nvme_tcp_ofld_delete_ctrl(struct nvme_ctrl *nctrl)
 	nvme_tcp_ofld_teardown_ctrl(nctrl, true);
 }
 
+static void nvme_tcp_ofld_reset_ctrl_work(struct work_struct *work)
+{
+	struct nvme_ctrl *nctrl =
+		container_of(work, struct nvme_ctrl, reset_work);
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+
+	nvme_stop_ctrl(nctrl);
+	nvme_tcp_ofld_teardown_ctrl(nctrl, false);
+
+	if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_CONNECTING)) {
+		/* state change failure is ok if we started ctrl delete */
+		WARN_ON_ONCE(nctrl->state != NVME_CTRL_DELETING &&
+			     nctrl->state != NVME_CTRL_DELETING_NOIO);
+
+		return;
+	}
+
+	if (ctrl->dev->ops->setup_ctrl(ctrl, false))
+		goto out_fail;
+
+	if (nvme_tcp_ofld_setup_ctrl(nctrl, false))
+		goto release_ctrl;
+
+	return;
+
+release_ctrl:
+	ctrl->dev->ops->release_ctrl(ctrl);
+out_fail:
+	++nctrl->nr_reconnects;
+	nvme_tcp_ofld_reconnect_or_remove(nctrl);
+}
+
 static int
 nvme_tcp_ofld_init_request(struct blk_mq_tag_set *set,
 			   struct request *rq,
@@ -521,6 +651,10 @@ nvme_tcp_ofld_create_ctrl(struct device *ndev, struct nvmf_ctrl_options *opts)
 			     opts->nr_poll_queues + 1;
 	nctrl->sqsize = opts->queue_size - 1;
 	nctrl->kato = opts->kato;
+	INIT_DELAYED_WORK(&ctrl->connect_work,
+			  nvme_tcp_ofld_reconnect_ctrl_work);
+	INIT_WORK(&ctrl->err_work, nvme_tcp_ofld_error_recovery_work);
+	INIT_WORK(&nctrl->reset_work, nvme_tcp_ofld_reset_ctrl_work);
 	if (!(opts->mask & NVMF_OPT_TRSVCID)) {
 		opts->trsvcid =
 			kstrdup(__stringify(NVME_TCP_DISC_PORT), GFP_KERNEL);
diff --git a/drivers/nvme/host/tcp-offload.h b/drivers/nvme/host/tcp-offload.h
index 9fd270240eaa..b23b1d7ea6fa 100644
--- a/drivers/nvme/host/tcp-offload.h
+++ b/drivers/nvme/host/tcp-offload.h
@@ -204,3 +204,4 @@ struct nvme_tcp_ofld_ops {
 /* Exported functions for lower vendor specific offload drivers */
 int nvme_tcp_ofld_register_dev(struct nvme_tcp_ofld_dev *dev);
 void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev);
+void nvme_tcp_ofld_error_recovery(struct nvme_ctrl *nctrl);
-- 
2.22.0



* [RFC PATCH v4 13/27] nvme-tcp-offload: Add queue level implementation
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
                   ` (11 preceding siblings ...)
  2021-04-29 19:09 ` [RFC PATCH v4 12/27] nvme-tcp-offload: Add controller level error recovery implementation Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-05-01 16:36   ` Hannes Reinecke
  2021-04-29 19:09 ` [RFC PATCH v4 14/27] nvme-tcp-offload: Add IO " Shai Malin
                   ` (14 subsequent siblings)
  27 siblings, 1 reply; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net  --cc=Jakub
	Kicinski, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	malin1024, Dean Balandin

From: Dean Balandin <dbalandin@marvell.com>

In this patch we implement queue level functionality.
The implementation is similar to the nvme-tcp module. The main
difference is that we call the vendor specific create_queue op, which
creates the TCP connection and the NVMeTCP connection, including the
icreq+icresp negotiation.
Once create_queue returns successfully, we can move on to the fabrics
connect.
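
The two-step bring-up described above (vendor create_queue first, fabrics
connect second) can be sketched as a small flag-based lifecycle. This is an
illustrative user-space model only: struct model_queue, the Q_* flags and
the helper names are hypothetical, the booleans stand in for the results of
the vendor op and of the fabrics connect, and plain bit operations replace
the atomic bitops a kernel implementation would use.

```c
#include <assert.h>
#include <stdbool.h>

enum { Q_ALLOCATED = 1u << 0, Q_LIVE = 1u << 1 };

/* Hypothetical, reduced view of an offload queue for this model only. */
struct model_queue {
	unsigned int flags;
	int stop_calls;	/* counts vendor drain/destroy invocations */
};

/* Step 1: vendor create_queue (TCP connection plus icreq/icresp). */
static int queue_create(struct model_queue *q, bool vendor_ok)
{
	if (!vendor_ok)
		return -1;

	q->flags |= Q_ALLOCATED;
	return 0;
}

/* Step 2: fabrics connect; on failure the offload queue is torn down. */
static int queue_start(struct model_queue *q, bool connect_ok)
{
	if (connect_ok) {
		q->flags |= Q_LIVE;
		return 0;
	}

	if (q->flags & Q_ALLOCATED)
		q->stop_calls++;	/* drain + destroy in the vendor driver */

	return -1;
}

/* Stop is a no-op unless the queue is LIVE, so repeated calls are safe. */
static void queue_stop(struct model_queue *q)
{
	if (!(q->flags & Q_LIVE))
		return;

	q->flags &= ~Q_LIVE;
	q->stop_calls++;
}
```

Keeping ALLOCATED and LIVE as separate flags is what lets the error paths
call the stop helper unconditionally: a queue that never reached LIVE is
simply skipped.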

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Dean Balandin <dbalandin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
---
 drivers/nvme/host/tcp-offload.c | 415 ++++++++++++++++++++++++++++++--
 drivers/nvme/host/tcp-offload.h |   2 +-
 2 files changed, 390 insertions(+), 27 deletions(-)

diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
index 9082b11c133f..8ddce2257100 100644
--- a/drivers/nvme/host/tcp-offload.c
+++ b/drivers/nvme/host/tcp-offload.c
@@ -22,6 +22,11 @@ static inline struct nvme_tcp_ofld_ctrl *to_tcp_ofld_ctrl(struct nvme_ctrl *nctr
 	return container_of(nctrl, struct nvme_tcp_ofld_ctrl, nctrl);
 }
 
+static inline int nvme_tcp_ofld_qid(struct nvme_tcp_ofld_queue *queue)
+{
+	return queue - queue->ctrl->queues;
+}
+
 /**
  * nvme_tcp_ofld_register_dev() - NVMeTCP Offload Library registration
  * function.
@@ -191,12 +196,94 @@ nvme_tcp_ofld_alloc_tagset(struct nvme_ctrl *nctrl, bool admin)
 	return set;
 }
 
+static void __nvme_tcp_ofld_stop_queue(struct nvme_tcp_ofld_queue *queue)
+{
+	queue->dev->ops->drain_queue(queue);
+	queue->dev->ops->destroy_queue(queue);
+}
+
+static void nvme_tcp_ofld_stop_queue(struct nvme_ctrl *nctrl, int qid)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	struct nvme_tcp_ofld_queue *queue = &ctrl->queues[qid];
+
+	if (!test_and_clear_bit(NVME_TCP_OFLD_Q_LIVE, &queue->flags))
+		return;
+
+	__nvme_tcp_ofld_stop_queue(queue);
+}
+
+static void nvme_tcp_ofld_stop_io_queues(struct nvme_ctrl *ctrl)
+{
+	int i;
+
+	for (i = 1; i < ctrl->queue_count; i++)
+		nvme_tcp_ofld_stop_queue(ctrl, i);
+}
+
+static void nvme_tcp_ofld_free_queue(struct nvme_ctrl *nctrl, int qid)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	struct nvme_tcp_ofld_queue *queue = &ctrl->queues[qid];
+
+	if (!test_and_clear_bit(NVME_TCP_OFLD_Q_ALLOCATED, &queue->flags))
+		return;
+
+	queue = &ctrl->queues[qid];
+	queue->ctrl = NULL;
+	queue->dev = NULL;
+	queue->report_err = NULL;
+}
+
+static void nvme_tcp_ofld_destroy_admin_queue(struct nvme_ctrl *nctrl, bool remove)
+{
+	nvme_tcp_ofld_stop_queue(nctrl, 0);
+	if (remove) {
+		blk_cleanup_queue(nctrl->admin_q);
+		blk_cleanup_queue(nctrl->fabrics_q);
+		blk_mq_free_tag_set(nctrl->admin_tagset);
+	}
+}
+
+static int nvme_tcp_ofld_start_queue(struct nvme_ctrl *nctrl, int qid)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	struct nvme_tcp_ofld_queue *queue = &ctrl->queues[qid];
+	int rc;
+
+	queue = &ctrl->queues[qid];
+	if (qid) {
+		queue->cmnd_capsule_len = nctrl->ioccsz * 16;
+		rc = nvmf_connect_io_queue(nctrl, qid, false);
+	} else {
+		queue->cmnd_capsule_len = sizeof(struct nvme_command) + NVME_TCP_ADMIN_CCSZ;
+		rc = nvmf_connect_admin_queue(nctrl);
+	}
+
+	if (!rc) {
+		set_bit(NVME_TCP_OFLD_Q_LIVE, &queue->flags);
+	} else {
+		if (test_bit(NVME_TCP_OFLD_Q_ALLOCATED, &queue->flags))
+			__nvme_tcp_ofld_stop_queue(queue);
+		dev_err(nctrl->device,
+			"failed to connect queue: %d ret=%d\n", qid, rc);
+	}
+
+	return rc;
+}
+
 static int nvme_tcp_ofld_configure_admin_queue(struct nvme_ctrl *nctrl,
 					       bool new)
 {
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	struct nvme_tcp_ofld_queue *queue = &ctrl->queues[0];
 	int rc;
 
-	/* Placeholder - alloc_admin_queue */
+	rc = ctrl->dev->ops->create_queue(queue, 0, NVME_AQ_DEPTH);
+	if (rc)
+		return rc;
+
+	set_bit(NVME_TCP_OFLD_Q_ALLOCATED, &queue->flags);
 	if (new) {
 		nctrl->admin_tagset =
 				nvme_tcp_ofld_alloc_tagset(nctrl, true);
@@ -221,7 +308,9 @@ static int nvme_tcp_ofld_configure_admin_queue(struct nvme_ctrl *nctrl,
 		}
 	}
 
-	/* Placeholder - nvme_tcp_ofld_start_queue */
+	rc = nvme_tcp_ofld_start_queue(nctrl, 0);
+	if (rc)
+		goto out_cleanup_queue;
 
 	rc = nvme_enable_ctrl(nctrl);
 	if (rc)
@@ -238,11 +327,12 @@ static int nvme_tcp_ofld_configure_admin_queue(struct nvme_ctrl *nctrl,
 out_quiesce_queue:
 	blk_mq_quiesce_queue(nctrl->admin_q);
 	blk_sync_queue(nctrl->admin_q);
-
 out_stop_queue:
-	/* Placeholder - stop offload queue */
+	nvme_tcp_ofld_stop_queue(nctrl, 0);
 	nvme_cancel_admin_tagset(nctrl);
-
+out_cleanup_queue:
+	if (new)
+		blk_cleanup_queue(nctrl->admin_q);
 out_cleanup_fabrics_q:
 	if (new)
 		blk_cleanup_queue(nctrl->fabrics_q);
@@ -250,7 +340,127 @@ static int nvme_tcp_ofld_configure_admin_queue(struct nvme_ctrl *nctrl,
 	if (new)
 		blk_mq_free_tag_set(nctrl->admin_tagset);
 out_free_queue:
-	/* Placeholder - free admin queue */
+	nvme_tcp_ofld_free_queue(nctrl, 0);
+
+	return rc;
+}
+
+static unsigned int nvme_tcp_ofld_nr_io_queues(struct nvme_ctrl *nctrl)
+{
+	unsigned int nr_io_queues;
+
+	nr_io_queues = min(nctrl->opts->nr_io_queues, num_online_cpus());
+	nr_io_queues += min(nctrl->opts->nr_write_queues, num_online_cpus());
+	nr_io_queues += min(nctrl->opts->nr_poll_queues, num_online_cpus());
+
+	return nr_io_queues;
+}
+
+static void
+nvme_tcp_ofld_set_io_queues(struct nvme_ctrl *nctrl, unsigned int nr_io_queues)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	struct nvmf_ctrl_options *opts = nctrl->opts;
+
+	if (opts->nr_write_queues && opts->nr_io_queues < nr_io_queues) {
+		/*
+		 * separate read/write queues
+		 * hand out dedicated default queues only after we have
+		 * sufficient read queues.
+		 */
+		ctrl->io_queues[HCTX_TYPE_READ] = opts->nr_io_queues;
+		nr_io_queues -= ctrl->io_queues[HCTX_TYPE_READ];
+		ctrl->io_queues[HCTX_TYPE_DEFAULT] =
+			min(opts->nr_write_queues, nr_io_queues);
+		nr_io_queues -= ctrl->io_queues[HCTX_TYPE_DEFAULT];
+	} else {
+		/*
+		 * shared read/write queues
+		 * either no write queues were requested, or we don't have
+		 * sufficient queue count to have dedicated default queues.
+		 */
+		ctrl->io_queues[HCTX_TYPE_DEFAULT] =
+			min(opts->nr_io_queues, nr_io_queues);
+		nr_io_queues -= ctrl->io_queues[HCTX_TYPE_DEFAULT];
+	}
+
+	if (opts->nr_poll_queues && nr_io_queues) {
+		/* map dedicated poll queues only if we have queues left */
+		ctrl->io_queues[HCTX_TYPE_POLL] =
+			min(opts->nr_poll_queues, nr_io_queues);
+	}
+}
+
+static void
+nvme_tcp_ofld_terminate_io_queues(struct nvme_ctrl *nctrl, int start_from)
+{
+	int i;
+
+	/* admin-q will be ignored because of the loop condition */
+	for (i = start_from; i >= 1; i--)
+		nvme_tcp_ofld_stop_queue(nctrl, i);
+}
+
+static int nvme_tcp_ofld_create_io_queues(struct nvme_ctrl *nctrl)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	int i, rc;
+
+	for (i = 1; i < nctrl->queue_count; i++) {
+		rc = ctrl->dev->ops->create_queue(&ctrl->queues[i],
+						  i, nctrl->sqsize + 1);
+		if (rc)
+			goto out_free_queues;
+
+		set_bit(NVME_TCP_OFLD_Q_ALLOCATED, &ctrl->queues[i].flags);
+	}
+
+	return 0;
+
+out_free_queues:
+	nvme_tcp_ofld_terminate_io_queues(nctrl, --i);
+
+	return rc;
+}
+
+static int nvme_tcp_ofld_alloc_io_queues(struct nvme_ctrl *nctrl)
+{
+	unsigned int nr_io_queues;
+	int rc;
+
+	nr_io_queues = nvme_tcp_ofld_nr_io_queues(nctrl);
+	rc = nvme_set_queue_count(nctrl, &nr_io_queues);
+	if (rc)
+		return rc;
+
+	nctrl->queue_count = nr_io_queues + 1;
+	if (nctrl->queue_count < 2) {
+		dev_err(nctrl->device,
+			"unable to set any I/O queues\n");
+
+		return -ENOMEM;
+	}
+
+	dev_info(nctrl->device, "creating %d I/O queues.\n", nr_io_queues);
+	nvme_tcp_ofld_set_io_queues(nctrl, nr_io_queues);
+
+	return nvme_tcp_ofld_create_io_queues(nctrl);
+}
+
+static int nvme_tcp_ofld_start_io_queues(struct nvme_ctrl *nctrl)
+{
+	int i, rc = 0;
+
+	for (i = 1; i < nctrl->queue_count; i++) {
+		rc = nvme_tcp_ofld_start_queue(nctrl, i);
+		if (rc)
+			goto terminate_queues;
+	}
+
+	return 0;
+
+terminate_queues:
+	nvme_tcp_ofld_terminate_io_queues(nctrl, --i);
 
 	return rc;
 }
@@ -258,9 +468,10 @@ static int nvme_tcp_ofld_configure_admin_queue(struct nvme_ctrl *nctrl,
 static int
 nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new)
 {
-	int rc;
+	int rc = nvme_tcp_ofld_alloc_io_queues(nctrl);
 
-	/* Placeholder - alloc_io_queues */
+	if (rc)
+		return rc;
 
 	if (new) {
 		nctrl->tagset = nvme_tcp_ofld_alloc_tagset(nctrl, false);
@@ -278,7 +489,9 @@ nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new)
 		}
 	}
 
-	/* Placeholder - start_io_queues */
+	rc = nvme_tcp_ofld_start_io_queues(nctrl);
+	if (rc)
+		goto out_cleanup_connect_q;
 
 	if (!new) {
 		nvme_start_queues(nctrl);
@@ -300,16 +513,16 @@ nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new)
 out_wait_freeze_timed_out:
 	nvme_stop_queues(nctrl);
 	nvme_sync_io_queues(nctrl);
-
-	/* Placeholder - Stop IO queues */
-
+	nvme_tcp_ofld_stop_io_queues(nctrl);
+out_cleanup_connect_q:
+	nvme_cancel_tagset(nctrl);
 	if (new)
 		blk_cleanup_queue(nctrl->connect_q);
 out_free_tag_set:
 	if (new)
 		blk_mq_free_tag_set(nctrl->tagset);
 out_free_io_queues:
-	/* Placeholder - free_io_queues */
+	nvme_tcp_ofld_terminate_io_queues(nctrl, nctrl->queue_count - 1);
 
 	return rc;
 }
@@ -336,6 +549,26 @@ static void nvme_tcp_ofld_reconnect_or_remove(struct nvme_ctrl *nctrl)
 	}
 }
 
+static int
+nvme_tcp_ofld_init_admin_hctx(struct blk_mq_hw_ctx *hctx, void *data,
+			      unsigned int hctx_idx)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = data;
+
+	hctx->driver_data = &ctrl->queues[0];
+
+	return 0;
+}
+
+static void nvme_tcp_ofld_destroy_io_queues(struct nvme_ctrl *nctrl, bool remove)
+{
+	nvme_tcp_ofld_stop_io_queues(nctrl);
+	if (remove) {
+		blk_cleanup_queue(nctrl->connect_q);
+		blk_mq_free_tag_set(nctrl->tagset);
+	}
+}
+
 static int nvme_tcp_ofld_setup_ctrl(struct nvme_ctrl *nctrl, bool new)
 {
 	struct nvmf_ctrl_options *opts = nctrl->opts;
@@ -387,9 +620,19 @@ static int nvme_tcp_ofld_setup_ctrl(struct nvme_ctrl *nctrl, bool new)
 	return 0;
 
 destroy_io:
-	/* Placeholder - stop and destroy io queues*/
+	if (nctrl->queue_count > 1) {
+		nvme_stop_queues(nctrl);
+		nvme_sync_io_queues(nctrl);
+		nvme_tcp_ofld_stop_io_queues(nctrl);
+		nvme_cancel_tagset(nctrl);
+		nvme_tcp_ofld_destroy_io_queues(nctrl, new);
+	}
 destroy_admin:
-	/* Placeholder - stop and destroy admin queue*/
+	blk_mq_quiesce_queue(nctrl->admin_q);
+	blk_sync_queue(nctrl->admin_q);
+	nvme_tcp_ofld_stop_queue(nctrl, 0);
+	nvme_cancel_admin_tagset(nctrl);
+	nvme_tcp_ofld_destroy_admin_queue(nctrl, new);
 
 	return rc;
 }
@@ -410,6 +653,18 @@ nvme_tcp_ofld_check_dev_opts(struct nvmf_ctrl_options *opts,
 	return 0;
 }
 
+static void nvme_tcp_ofld_free_ctrl_queues(struct nvme_ctrl *nctrl)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
+	int i;
+
+	for (i = 0; i < nctrl->queue_count; ++i)
+		nvme_tcp_ofld_free_queue(nctrl, i);
+
+	kfree(ctrl->queues);
+	ctrl->queues = NULL;
+}
+
 static void nvme_tcp_ofld_free_ctrl(struct nvme_ctrl *nctrl)
 {
 	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
@@ -419,6 +674,7 @@ static void nvme_tcp_ofld_free_ctrl(struct nvme_ctrl *nctrl)
 		goto free_ctrl;
 
 	down_write(&nvme_tcp_ofld_ctrl_rwsem);
+	nvme_tcp_ofld_free_ctrl_queues(nctrl);
 	ctrl->dev->ops->release_ctrl(ctrl);
 	list_del(&ctrl->list);
 	up_write(&nvme_tcp_ofld_ctrl_rwsem);
@@ -436,15 +692,37 @@ static void nvme_tcp_ofld_submit_async_event(struct nvme_ctrl *arg)
 }
 
 static void
-nvme_tcp_ofld_teardown_admin_queue(struct nvme_ctrl *ctrl, bool remove)
+nvme_tcp_ofld_teardown_admin_queue(struct nvme_ctrl *nctrl, bool remove)
 {
-	/* Placeholder - teardown_admin_queue */
+	blk_mq_quiesce_queue(nctrl->admin_q);
+	blk_sync_queue(nctrl->admin_q);
+
+	nvme_tcp_ofld_stop_queue(nctrl, 0);
+	nvme_cancel_admin_tagset(nctrl);
+
+	if (remove)
+		blk_mq_unquiesce_queue(nctrl->admin_q);
+
+	nvme_tcp_ofld_destroy_admin_queue(nctrl, remove);
 }
 
 static void
 nvme_tcp_ofld_teardown_io_queues(struct nvme_ctrl *nctrl, bool remove)
 {
-	/* Placeholder - teardown_io_queues */
+	if (nctrl->queue_count <= 1)
+		return;
+
+	blk_mq_quiesce_queue(nctrl->admin_q);
+	nvme_start_freeze(nctrl);
+	nvme_stop_queues(nctrl);
+	nvme_sync_io_queues(nctrl);
+	nvme_tcp_ofld_stop_io_queues(nctrl);
+	nvme_cancel_tagset(nctrl);
+
+	if (remove)
+		nvme_start_queues(nctrl);
+
+	nvme_tcp_ofld_destroy_io_queues(nctrl, remove);
 }
 
 static void nvme_tcp_ofld_reconnect_ctrl_work(struct work_struct *work)
@@ -572,6 +850,17 @@ nvme_tcp_ofld_init_request(struct blk_mq_tag_set *set,
 	return 0;
 }
 
+inline size_t nvme_tcp_ofld_inline_data_size(struct nvme_tcp_ofld_queue *queue)
+{
+	return queue->cmnd_capsule_len - sizeof(struct nvme_command);
+}
+EXPORT_SYMBOL_GPL(nvme_tcp_ofld_inline_data_size);
+
+static void nvme_tcp_ofld_commit_rqs(struct blk_mq_hw_ctx *hctx)
+{
+	/* Call ops->commit_rqs */
+}
+
 static blk_status_t
 nvme_tcp_ofld_queue_rq(struct blk_mq_hw_ctx *hctx,
 		       const struct blk_mq_queue_data *bd)
@@ -583,22 +872,96 @@ nvme_tcp_ofld_queue_rq(struct blk_mq_hw_ctx *hctx,
 	return BLK_STS_OK;
 }
 
+static void
+nvme_tcp_ofld_exit_request(struct blk_mq_tag_set *set,
+			   struct request *rq, unsigned int hctx_idx)
+{
+	/*
+	 * Nothing is allocated in nvme_tcp_ofld_init_request,
+	 * hence empty.
+	 */
+}
+
+static int
+nvme_tcp_ofld_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
+			unsigned int hctx_idx)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = data;
+
+	hctx->driver_data = &ctrl->queues[hctx_idx + 1];
+
+	return 0;
+}
+
+static int nvme_tcp_ofld_map_queues(struct blk_mq_tag_set *set)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = set->driver_data;
+	struct nvmf_ctrl_options *opts = ctrl->nctrl.opts;
+
+	if (opts->nr_write_queues && ctrl->io_queues[HCTX_TYPE_READ]) {
+		/* separate read/write queues */
+		set->map[HCTX_TYPE_DEFAULT].nr_queues =
+			ctrl->io_queues[HCTX_TYPE_DEFAULT];
+		set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
+		set->map[HCTX_TYPE_READ].nr_queues =
+			ctrl->io_queues[HCTX_TYPE_READ];
+		set->map[HCTX_TYPE_READ].queue_offset =
+			ctrl->io_queues[HCTX_TYPE_DEFAULT];
+	} else {
+		/* shared read/write queues */
+		set->map[HCTX_TYPE_DEFAULT].nr_queues =
+			ctrl->io_queues[HCTX_TYPE_DEFAULT];
+		set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
+		set->map[HCTX_TYPE_READ].nr_queues =
+			ctrl->io_queues[HCTX_TYPE_DEFAULT];
+		set->map[HCTX_TYPE_READ].queue_offset = 0;
+	}
+	blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
+	blk_mq_map_queues(&set->map[HCTX_TYPE_READ]);
+
+	if (opts->nr_poll_queues && ctrl->io_queues[HCTX_TYPE_POLL]) {
+		/* map dedicated poll queues only if we have queues left */
+		set->map[HCTX_TYPE_POLL].nr_queues =
+				ctrl->io_queues[HCTX_TYPE_POLL];
+		set->map[HCTX_TYPE_POLL].queue_offset =
+			ctrl->io_queues[HCTX_TYPE_DEFAULT] +
+			ctrl->io_queues[HCTX_TYPE_READ];
+		blk_mq_map_queues(&set->map[HCTX_TYPE_POLL]);
+	}
+
+	dev_info(ctrl->nctrl.device,
+		 "mapped %d/%d/%d default/read/poll queues.\n",
+		 ctrl->io_queues[HCTX_TYPE_DEFAULT],
+		 ctrl->io_queues[HCTX_TYPE_READ],
+		 ctrl->io_queues[HCTX_TYPE_POLL]);
+
+	return 0;
+}
+
+static int nvme_tcp_ofld_poll(struct blk_mq_hw_ctx *hctx)
+{
+	/* Placeholder - Implement polling mechanism */
+
+	return 0;
+}
+
 static struct blk_mq_ops nvme_tcp_ofld_mq_ops = {
 	.queue_rq	= nvme_tcp_ofld_queue_rq,
+	.commit_rqs     = nvme_tcp_ofld_commit_rqs,
+	.complete	= nvme_complete_rq,
 	.init_request	= nvme_tcp_ofld_init_request,
-	/*
-	 * All additional ops will be also implemented and registered similar to
-	 * tcp.c
-	 */
+	.exit_request	= nvme_tcp_ofld_exit_request,
+	.init_hctx	= nvme_tcp_ofld_init_hctx,
+	.map_queues	= nvme_tcp_ofld_map_queues,
+	.poll		= nvme_tcp_ofld_poll,
 };
 
 static struct blk_mq_ops nvme_tcp_ofld_admin_mq_ops = {
 	.queue_rq	= nvme_tcp_ofld_queue_rq,
+	.complete	= nvme_complete_rq,
 	.init_request	= nvme_tcp_ofld_init_request,
-	/*
-	 * All additional ops will be also implemented and registered similar to
-	 * tcp.c
-	 */
+	.exit_request	= nvme_tcp_ofld_exit_request,
+	.init_hctx	= nvme_tcp_ofld_init_admin_hctx,
 };
 
 static const struct nvme_ctrl_ops nvme_tcp_ofld_ctrl_ops = {
diff --git a/drivers/nvme/host/tcp-offload.h b/drivers/nvme/host/tcp-offload.h
index b23b1d7ea6fa..d82645fcf9da 100644
--- a/drivers/nvme/host/tcp-offload.h
+++ b/drivers/nvme/host/tcp-offload.h
@@ -105,7 +105,6 @@ struct nvme_tcp_ofld_ctrl {
 	 * Each entry in the array indicates the number of queues of
 	 * corresponding type.
 	 */
-	u32 queue_type_mapping[HCTX_MAX_TYPES];
 	u32 io_queues[HCTX_MAX_TYPES];
 
 	/* Connectivity params */
@@ -205,3 +204,4 @@ struct nvme_tcp_ofld_ops {
 int nvme_tcp_ofld_register_dev(struct nvme_tcp_ofld_dev *dev);
 void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev);
 void nvme_tcp_ofld_error_recovery(struct nvme_ctrl *nctrl);
+inline size_t nvme_tcp_ofld_inline_data_size(struct nvme_tcp_ofld_queue *queue);
-- 
2.22.0



* [RFC PATCH v4 14/27] nvme-tcp-offload: Add IO level implementation
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
                   ` (12 preceding siblings ...)
  2021-04-29 19:09 ` [RFC PATCH v4 13/27] nvme-tcp-offload: Add queue level implementation Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-05-01 16:38   ` Hannes Reinecke
  2021-04-29 19:09 ` [RFC PATCH v4 15/27] nvme-tcp-offload: Add Timeout and ASYNC Support Shai Malin
                   ` (13 subsequent siblings)
  27 siblings, 1 reply; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net  --cc=Jakub
	Kicinski, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	malin1024, Dean Balandin

From: Dean Balandin <dbalandin@marvell.com>

In this patch, we present the IO level functionality.
The nvme-tcp-offload shall work on the IO-level, meaning the
nvme-tcp-offload ULP module shall pass the request to the nvme-tcp-offload
vendor driver and shall wait for the request completion.
No additional handling is needed in between; this design will reduce the
CPU utilization as we will describe below.

The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
with the following IO-path ops:
 - init_req
 - send_req - in order to pass the request to the handling of the offload
   driver that shall pass it to the vendor specific device
 - poll_queue

The vendor driver will manage the context from which the request will be
executed and the request aggregations.
Once the IO completes, the nvme-tcp-offload vendor driver shall call
command.done(), which shall invoke the nvme-tcp-offload ULP layer to
complete the request.
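
The hand-off described above can be modelled in plain userspace C. This
is only a sketch under stated assumptions: the toy_* names and the
simplified structs are hypothetical stand-ins and are not the definitions
from tcp-offload.h. It shows the ULP installing a done callback, the
vendor driver receiving the request through send_req, and the vendor
driver reporting completion back through done().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the structs in tcp-offload.h. */
struct toy_req;

typedef void (*toy_done_fn)(struct toy_req *req, int status);

struct toy_req {
	toy_done_fn done;	/* installed by the ULP, called by the vendor */
	int status;		/* filled in on completion */
	int completed;		/* set to 1 once done() has run */
};

struct toy_ops {
	int (*init_req)(struct toy_req *req);
	int (*send_req)(struct toy_req *req);
	int (*poll_queue)(void *queue);
};

/* ULP side: completion handler installed in every request. */
static void toy_ulp_req_done(struct toy_req *req, int status)
{
	req->status = status;
	req->completed = 1;
}

/* Vendor side: a toy driver that completes each request immediately. */
static int toy_init_req(struct toy_req *req)
{
	(void)req;
	return 0;
}

static int toy_send_req(struct toy_req *req)
{
	assert(req->done != NULL);
	req->done(req, 0);	/* report success back to the ULP */
	return 0;
}

static int toy_poll_queue(void *queue)
{
	(void)queue;
	return 0;		/* nothing to poll in this model */
}

const struct toy_ops toy_vendor_ops = {
	.init_req = toy_init_req,
	.send_req = toy_send_req,
	.poll_queue = toy_poll_queue,
};

/* ULP side: prepare a request and pass it to the vendor driver. */
int toy_ulp_submit(const struct toy_ops *ops, struct toy_req *req)
{
	int rc;

	req->done = toy_ulp_req_done;
	req->completed = 0;

	rc = ops->init_req(req);
	if (rc)
		return rc;

	return ops->send_req(req);
}
```

In this model the ULP does no work between send_req and done(), which is
the property the design relies on.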

This patch also contains the initial definition of nvme_tcp_ofld_queue_rq().
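
As an illustration of the SGL descriptor choice made in
nvme_tcp_ofld_queue_rq() in the diff below, here is a small userspace
model. The toy_* names are hypothetical; only the decision order (no
data, small write that fits in the capsule, everything else) follows the
patch. The 64-byte constant is the size of an NVMe submission queue
entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* An NVMe submission queue entry (struct nvme_command) is 64 bytes. */
#define TOY_NVME_CMD_SIZE 64

/* Hypothetical labels for the three descriptor setups in the patch. */
enum toy_sgl_kind {
	TOY_SGL_NULL,	/* no data: nvme_tcp_ofld_set_sg_null() */
	TOY_SGL_INLINE,	/* in-capsule data: nvme_tcp_ofld_set_sg_inline() */
	TOY_SGL_MAPPED,	/* everything else: nvme_tcp_ofld_map_data() */
};

/* Room left for in-capsule data once the command itself is counted. */
size_t toy_inline_data_size(size_t cmnd_capsule_len)
{
	assert(cmnd_capsule_len >= TOY_NVME_CMD_SIZE);
	return cmnd_capsule_len - TOY_NVME_CMD_SIZE;
}

/* Same decision order as the if/else chain in nvme_tcp_ofld_queue_rq(). */
enum toy_sgl_kind toy_pick_sgl(int is_write, uint32_t data_len,
			       size_t cmnd_capsule_len)
{
	if (!data_len)
		return TOY_SGL_NULL;

	if (is_write && data_len <= toy_inline_data_size(cmnd_capsule_len))
		return TOY_SGL_INLINE;

	return TOY_SGL_MAPPED;
}
```

Only writes can be sent in-capsule, so a read of the same size always
takes the mapped path.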

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Dean Balandin <dbalandin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
---
 drivers/nvme/host/tcp-offload.c | 95 ++++++++++++++++++++++++++++++---
 1 file changed, 87 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
index 8ddce2257100..0cdf5a432208 100644
--- a/drivers/nvme/host/tcp-offload.c
+++ b/drivers/nvme/host/tcp-offload.c
@@ -127,7 +127,10 @@ void nvme_tcp_ofld_req_done(struct nvme_tcp_ofld_req *req,
 			    union nvme_result *result,
 			    __le16 status)
 {
-	/* Placeholder - complete request with/without error */
+	struct request *rq = blk_mq_rq_from_pdu(req);
+
+	if (!nvme_try_complete_req(rq, cpu_to_le16(status << 1), *result))
+		nvme_complete_rq(rq);
 }
 
 struct nvme_tcp_ofld_dev *
@@ -686,6 +689,34 @@ static void nvme_tcp_ofld_free_ctrl(struct nvme_ctrl *nctrl)
 	kfree(ctrl);
 }
 
+static void nvme_tcp_ofld_set_sg_null(struct nvme_command *c)
+{
+	struct nvme_sgl_desc *sg = &c->common.dptr.sgl;
+
+	sg->addr = 0;
+	sg->length = 0;
+	sg->type = (NVME_TRANSPORT_SGL_DATA_DESC << 4) | NVME_SGL_FMT_TRANSPORT_A;
+}
+
+inline void nvme_tcp_ofld_set_sg_inline(struct nvme_tcp_ofld_queue *queue,
+					struct nvme_command *c, u32 data_len)
+{
+	struct nvme_sgl_desc *sg = &c->common.dptr.sgl;
+
+	sg->addr = cpu_to_le64(queue->ctrl->nctrl.icdoff);
+	sg->length = cpu_to_le32(data_len);
+	sg->type = (NVME_SGL_FMT_DATA_DESC << 4) | NVME_SGL_FMT_OFFSET;
+}
+
+void nvme_tcp_ofld_map_data(struct nvme_command *c, u32 data_len)
+{
+	struct nvme_sgl_desc *sg = &c->common.dptr.sgl;
+
+	sg->addr = 0;
+	sg->length = cpu_to_le32(data_len);
+	sg->type = (NVME_TRANSPORT_SGL_DATA_DESC << 4) | NVME_SGL_FMT_TRANSPORT_A;
+}
+
 static void nvme_tcp_ofld_submit_async_event(struct nvme_ctrl *arg)
 {
 	/* Placeholder - submit_async_event */
@@ -841,9 +872,11 @@ nvme_tcp_ofld_init_request(struct blk_mq_tag_set *set,
 {
 	struct nvme_tcp_ofld_req *req = blk_mq_rq_to_pdu(rq);
 	struct nvme_tcp_ofld_ctrl *ctrl = set->driver_data;
+	int qid;
 
-	/* Placeholder - init request */
-
+	qid = (set == &ctrl->tag_set) ? hctx_idx + 1 : 0;
+	req->queue = &ctrl->queues[qid];
+	nvme_req(rq)->ctrl = &ctrl->nctrl;
 	req->done = nvme_tcp_ofld_req_done;
 	ctrl->dev->ops->init_req(req);
 
@@ -858,16 +891,60 @@ EXPORT_SYMBOL_GPL(nvme_tcp_ofld_inline_data_size);
 
 static void nvme_tcp_ofld_commit_rqs(struct blk_mq_hw_ctx *hctx)
 {
-	/* Call ops->commit_rqs */
+	struct nvme_tcp_ofld_queue *queue = hctx->driver_data;
+	struct nvme_tcp_ofld_dev *dev = queue->dev;
+	struct nvme_tcp_ofld_ops *ops = dev->ops;
+
+	ops->commit_rqs(queue);
 }
 
 static blk_status_t
 nvme_tcp_ofld_queue_rq(struct blk_mq_hw_ctx *hctx,
 		       const struct blk_mq_queue_data *bd)
 {
-	/* Call nvme_setup_cmd(...) */
+	struct nvme_tcp_ofld_req *req = blk_mq_rq_to_pdu(bd->rq);
+	struct nvme_tcp_ofld_queue *queue = hctx->driver_data;
+	struct nvme_tcp_ofld_ctrl *ctrl = queue->ctrl;
+	struct nvme_ns *ns = hctx->queue->queuedata;
+	struct nvme_tcp_ofld_dev *dev = queue->dev;
+	struct nvme_tcp_ofld_ops *ops = dev->ops;
+	struct nvme_command *nvme_cmd;
+	struct request *rq;
+	bool queue_ready;
+	u32 data_len;
+	int rc;
+
+	queue_ready = test_bit(NVME_TCP_OFLD_Q_LIVE, &queue->flags);
+
+	req->rq = bd->rq;
+	req->async = false;
+	rq = req->rq;
+
+	if (!nvmf_check_ready(&ctrl->nctrl, req->rq, queue_ready))
+		return nvmf_fail_nonready_command(&ctrl->nctrl, req->rq);
+
+	rc = nvme_setup_cmd(ns, req->rq, &req->nvme_cmd);
+	if (unlikely(rc))
+		return rc;
 
-	/* Call ops->send_req(...) */
+	blk_mq_start_request(req->rq);
+	req->last = bd->last;
+
+	nvme_cmd = &req->nvme_cmd;
+	nvme_cmd->common.flags |= NVME_CMD_SGL_METABUF;
+
+	data_len = blk_rq_nr_phys_segments(rq) ? blk_rq_payload_bytes(rq) : 0;
+	if (!data_len)
+		nvme_tcp_ofld_set_sg_null(&req->nvme_cmd);
+	else if ((rq_data_dir(rq) == WRITE) &&
+		 data_len <= nvme_tcp_ofld_inline_data_size(queue))
+		nvme_tcp_ofld_set_sg_inline(queue, nvme_cmd, data_len);
+	else
+		nvme_tcp_ofld_map_data(nvme_cmd, data_len);
+
+	rc = ops->send_req(req);
+	if (unlikely(rc))
+		return rc;
 
 	return BLK_STS_OK;
 }
@@ -940,9 +1017,11 @@ static int nvme_tcp_ofld_map_queues(struct blk_mq_tag_set *set)
 
 static int nvme_tcp_ofld_poll(struct blk_mq_hw_ctx *hctx)
 {
-	/* Placeholder - Implement polling mechanism */
+	struct nvme_tcp_ofld_queue *queue = hctx->driver_data;
+	struct nvme_tcp_ofld_dev *dev = queue->dev;
+	struct nvme_tcp_ofld_ops *ops = dev->ops;
 
-	return 0;
+	return ops->poll_queue(queue);
 }
 
 static struct blk_mq_ops nvme_tcp_ofld_mq_ops = {
-- 
2.22.0



* [RFC PATCH v4 15/27] nvme-tcp-offload: Add Timeout and ASYNC Support
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
                   ` (13 preceding siblings ...)
  2021-04-29 19:09 ` [RFC PATCH v4 14/27] nvme-tcp-offload: Add IO " Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-05-01 16:45   ` Hannes Reinecke
  2021-04-29 19:09 ` [RFC PATCH v4 16/27] qedn: Add qedn - Marvell's NVMeTCP HW offload vendor driver Shai Malin
                   ` (12 subsequent siblings)
  27 siblings, 1 reply; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net  --cc=Jakub
	Kicinski, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	malin1024

In this patch, we present the nvme-tcp-offload timeout support,
nvme_tcp_ofld_timeout(), and the ASYNC support,
nvme_tcp_ofld_submit_async_event().
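
The timeout policy in nvme_tcp_ofld_timeout() can be summarised with a
small userspace model. The toy_* types and field names are hypothetical;
only the two branches follow the diff below. If the controller is not
live, the timed-out request is completed on the spot so that teardown or
setup is not blocked. Otherwise error recovery is scheduled and the
timer is re-armed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced model of the controller states involved. */
enum toy_ctrl_state {
	TOY_CTRL_LIVE,
	TOY_CTRL_RESETTING,
	TOY_CTRL_CONNECTING,
	TOY_CTRL_DELETING,
};

/* Mirrors the two return values used by the patch. */
enum toy_eh_return {
	TOY_EH_DONE,		/* request was completed by the handler */
	TOY_EH_RESET_TIMER,	/* keep waiting, recovery will cancel it */
};

struct toy_ctrl {
	enum toy_ctrl_state state;
	int recovery_scheduled;
};

struct toy_rq {
	int started;
	int completed;
	int aborted;	/* models status = NVME_SC_HOST_ABORTED_CMD */
};

enum toy_eh_return toy_timeout(struct toy_ctrl *ctrl, struct toy_rq *rq)
{
	assert(ctrl != NULL && rq != NULL);

	if (ctrl->state != TOY_CTRL_LIVE) {
		/* Complete immediately, but only if still in flight. */
		if (rq->started && !rq->completed) {
			rq->aborted = 1;
			rq->completed = 1;
		}
		return TOY_EH_DONE;
	}

	/* Live controller: let error recovery cancel the request. */
	ctrl->recovery_scheduled = 1;
	return TOY_EH_RESET_TIMER;
}
```

The model leaves out the queue stop that the real handler performs
before completing the request.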

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
---
 drivers/nvme/host/tcp-offload.c | 85 ++++++++++++++++++++++++++++++++-
 drivers/nvme/host/tcp-offload.h |  2 +
 2 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
index 0cdf5a432208..1d62f921f109 100644
--- a/drivers/nvme/host/tcp-offload.c
+++ b/drivers/nvme/host/tcp-offload.c
@@ -133,6 +133,26 @@ void nvme_tcp_ofld_req_done(struct nvme_tcp_ofld_req *req,
 		nvme_complete_rq(rq);
 }
 
+/**
+ * nvme_tcp_ofld_async_req_done() - NVMeTCP Offload request done callback
+ * function for async request. Pointed to by nvme_tcp_ofld_req->done.
+ * Handles both NVME_TCP_F_DATA_SUCCESS flag and NVMe CQ.
+ * @req:	NVMeTCP offload request to complete.
+ * @result:     The nvme_result.
+ * @status:     The completion status.
+ *
+ * API function that allows the vendor specific offload driver to report request
+ * completions to the common offload layer.
+ */
+void nvme_tcp_ofld_async_req_done(struct nvme_tcp_ofld_req *req,
+				  union nvme_result *result, __le16 status)
+{
+	struct nvme_tcp_ofld_queue *queue = req->queue;
+	struct nvme_tcp_ofld_ctrl *ctrl = queue->ctrl;
+
+	nvme_complete_async_event(&ctrl->nctrl, status, result);
+}
+
 struct nvme_tcp_ofld_dev *
 nvme_tcp_ofld_lookup_dev(struct nvme_tcp_ofld_ctrl *ctrl)
 {
@@ -719,7 +739,23 @@ void nvme_tcp_ofld_map_data(struct nvme_command *c, u32 data_len)
 
 static void nvme_tcp_ofld_submit_async_event(struct nvme_ctrl *arg)
 {
-	/* Placeholder - submit_async_event */
+	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(arg);
+	struct nvme_tcp_ofld_queue *queue = &ctrl->queues[0];
+	struct nvme_tcp_ofld_dev *dev = queue->dev;
+	struct nvme_tcp_ofld_ops *ops = dev->ops;
+
+	ctrl->async_req.nvme_cmd.common.opcode = nvme_admin_async_event;
+	ctrl->async_req.nvme_cmd.common.command_id = NVME_AQ_BLK_MQ_DEPTH;
+	ctrl->async_req.nvme_cmd.common.flags |= NVME_CMD_SGL_METABUF;
+
+	nvme_tcp_ofld_set_sg_null(&ctrl->async_req.nvme_cmd);
+
+	ctrl->async_req.async = true;
+	ctrl->async_req.queue = queue;
+	ctrl->async_req.last = true;
+	ctrl->async_req.done = nvme_tcp_ofld_async_req_done;
+
+	ops->send_req(&ctrl->async_req);
 }
 
 static void
@@ -1024,6 +1060,51 @@ static int nvme_tcp_ofld_poll(struct blk_mq_hw_ctx *hctx)
 	return ops->poll_queue(queue);
 }
 
+static void nvme_tcp_ofld_complete_timed_out(struct request *rq)
+{
+	struct nvme_tcp_ofld_req *req = blk_mq_rq_to_pdu(rq);
+	struct nvme_ctrl *nctrl = &req->queue->ctrl->nctrl;
+
+	nvme_tcp_ofld_stop_queue(nctrl, nvme_tcp_ofld_qid(req->queue));
+	if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) {
+		nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD;
+		blk_mq_complete_request(rq);
+	}
+}
+
+static enum blk_eh_timer_return nvme_tcp_ofld_timeout(struct request *rq, bool reserved)
+{
+	struct nvme_tcp_ofld_req *req = blk_mq_rq_to_pdu(rq);
+	struct nvme_tcp_ofld_ctrl *ctrl = req->queue->ctrl;
+
+	dev_warn(ctrl->nctrl.device,
+		 "queue %d: timeout request %#x type %d\n",
+		 nvme_tcp_ofld_qid(req->queue), rq->tag, req->nvme_cmd.common.opcode);
+
+	if (ctrl->nctrl.state != NVME_CTRL_LIVE) {
+		/*
+		 * If we are resetting, connecting or deleting we should
+		 * complete immediately because we may block controller
+		 * teardown or setup sequence
+		 * - ctrl disable/shutdown fabrics requests
+		 * - connect requests
+		 * - initialization admin requests
+		 * - I/O requests that entered after unquiescing and
+		 *   the controller stopped responding
+		 *
+		 * All other requests should be cancelled by the error
+		 * recovery work, so it's fine that we fail it here.
+		 */
+		nvme_tcp_ofld_complete_timed_out(rq);
+
+		return BLK_EH_DONE;
+	}
+
+	nvme_tcp_ofld_error_recovery(&ctrl->nctrl);
+
+	return BLK_EH_RESET_TIMER;
+}
+
 static struct blk_mq_ops nvme_tcp_ofld_mq_ops = {
 	.queue_rq	= nvme_tcp_ofld_queue_rq,
 	.commit_rqs     = nvme_tcp_ofld_commit_rqs,
@@ -1031,6 +1112,7 @@ static struct blk_mq_ops nvme_tcp_ofld_mq_ops = {
 	.init_request	= nvme_tcp_ofld_init_request,
 	.exit_request	= nvme_tcp_ofld_exit_request,
 	.init_hctx	= nvme_tcp_ofld_init_hctx,
+	.timeout	= nvme_tcp_ofld_timeout,
 	.map_queues	= nvme_tcp_ofld_map_queues,
 	.poll		= nvme_tcp_ofld_poll,
 };
@@ -1041,6 +1123,7 @@ static struct blk_mq_ops nvme_tcp_ofld_admin_mq_ops = {
 	.init_request	= nvme_tcp_ofld_init_request,
 	.exit_request	= nvme_tcp_ofld_exit_request,
 	.init_hctx	= nvme_tcp_ofld_init_admin_hctx,
+	.timeout	= nvme_tcp_ofld_timeout,
 };
 
 static const struct nvme_ctrl_ops nvme_tcp_ofld_ctrl_ops = {
diff --git a/drivers/nvme/host/tcp-offload.h b/drivers/nvme/host/tcp-offload.h
index d82645fcf9da..275a7e2d9d8a 100644
--- a/drivers/nvme/host/tcp-offload.h
+++ b/drivers/nvme/host/tcp-offload.h
@@ -110,6 +110,8 @@ struct nvme_tcp_ofld_ctrl {
 	/* Connectivity params */
 	struct nvme_tcp_ofld_ctrl_con_params conn_params;
 
+	struct nvme_tcp_ofld_req async_req;
+
 	/* Vendor specific driver context */
 	void *private_data;
 };
-- 
2.22.0



* [RFC PATCH v4 16/27] qedn: Add qedn - Marvell's NVMeTCP HW offload vendor driver
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
                   ` (14 preceding siblings ...)
  2021-04-29 19:09 ` [RFC PATCH v4 15/27] nvme-tcp-offload: Add Timeout and ASYNC Support Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-05-02 11:27   ` Hannes Reinecke
  2021-04-29 19:09 ` [RFC PATCH v4 17/27] qedn: Add qedn probe Shai Malin
                   ` (11 subsequent siblings)
  27 siblings, 1 reply; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net  --cc=Jakub
	Kicinski, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	malin1024, Arie Gershberg

This patch will present the skeleton of the qedn driver.
The new driver will be added under "drivers/nvme/hw/qedn" and will be
enabled by the Kconfig "Marvell NVM Express over Fabrics TCP offload".

The internal implementation:
- qedn.h:
  Includes all common structs to be used by the qedn vendor driver.

- qedn_main.c
  Includes the qedn_init and qedn_cleanup implementation.
  As part of the qedn init, the driver will register as a PCI driver and
  will work with the Marvell FastLinQ NICs.
  As part of the probe, the driver will register to the nvme_tcp_offload
  (ULP).

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Arie Gershberg <agershberg@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
---
 MAINTAINERS                      |  10 ++
 drivers/nvme/Kconfig             |   1 +
 drivers/nvme/Makefile            |   1 +
 drivers/nvme/hw/Kconfig          |   8 ++
 drivers/nvme/hw/Makefile         |   3 +
 drivers/nvme/hw/qedn/Makefile    |   5 +
 drivers/nvme/hw/qedn/qedn.h      |  19 +++
 drivers/nvme/hw/qedn/qedn_main.c | 201 +++++++++++++++++++++++++++++++
 8 files changed, 248 insertions(+)
 create mode 100644 drivers/nvme/hw/Kconfig
 create mode 100644 drivers/nvme/hw/Makefile
 create mode 100644 drivers/nvme/hw/qedn/Makefile
 create mode 100644 drivers/nvme/hw/qedn/qedn.h
 create mode 100644 drivers/nvme/hw/qedn/qedn_main.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 0d85ae9e61e2..2d6f50523db8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14688,6 +14688,16 @@ S:	Supported
 F:	drivers/infiniband/hw/qedr/
 F:	include/uapi/rdma/qedr-abi.h
 
+QLOGIC QL4xxx NVME-TCP-OFFLOAD DRIVER
+M:	Shai Malin <smalin@marvell.com>
+M:	Ariel Elior <aelior@marvell.com>
+L:	linux-nvme@lists.infradead.org
+S:	Supported
+W:	http://git.infradead.org/nvme.git
+T:	git://git.infradead.org/nvme.git
+F:	drivers/nvme/hw/qedn/
+F:	include/linux/qed/
+
 QLOGIC QLA1280 SCSI DRIVER
 M:	Michael Reed <mdr@sgi.com>
 L:	linux-scsi@vger.kernel.org
diff --git a/drivers/nvme/Kconfig b/drivers/nvme/Kconfig
index 87ae409a32b9..827c2c9f0ad1 100644
--- a/drivers/nvme/Kconfig
+++ b/drivers/nvme/Kconfig
@@ -3,5 +3,6 @@ menu "NVME Support"
 
 source "drivers/nvme/host/Kconfig"
 source "drivers/nvme/target/Kconfig"
+source "drivers/nvme/hw/Kconfig"
 
 endmenu
diff --git a/drivers/nvme/Makefile b/drivers/nvme/Makefile
index fb42c44609a8..14c569040ef2 100644
--- a/drivers/nvme/Makefile
+++ b/drivers/nvme/Makefile
@@ -2,3 +2,4 @@
 
 obj-y		+= host/
 obj-y		+= target/
+obj-y		+= hw/
\ No newline at end of file
diff --git a/drivers/nvme/hw/Kconfig b/drivers/nvme/hw/Kconfig
new file mode 100644
index 000000000000..374f1f9dbd3d
--- /dev/null
+++ b/drivers/nvme/hw/Kconfig
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config NVME_QEDN
+	tristate "Marvell NVM Express over Fabrics TCP offload"
+	depends on NVME_TCP_OFFLOAD
+	help
+	  This enables the Marvell NVMe TCP offload support (qedn).
+
+	  If unsure, say N.
diff --git a/drivers/nvme/hw/Makefile b/drivers/nvme/hw/Makefile
new file mode 100644
index 000000000000..7780ea5ab520
--- /dev/null
+++ b/drivers/nvme/hw/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+obj-$(CONFIG_NVME_QEDN)		+= qedn/
diff --git a/drivers/nvme/hw/qedn/Makefile b/drivers/nvme/hw/qedn/Makefile
new file mode 100644
index 000000000000..cb169bbaae18
--- /dev/null
+++ b/drivers/nvme/hw/qedn/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+obj-$(CONFIG_NVME_QEDN) := qedn.o
+
+qedn-y := qedn_main.o
\ No newline at end of file
diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
new file mode 100644
index 000000000000..bcd0748a10fd
--- /dev/null
+++ b/drivers/nvme/hw/qedn/qedn.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2021 Marvell. All rights reserved.
+ */
+
+#ifndef _QEDN_H_
+#define _QEDN_H_
+
+/* Driver includes */
+#include "../../host/tcp-offload.h"
+
+#define QEDN_MODULE_NAME "qedn"
+
+struct qedn_ctx {
+	struct pci_dev *pdev;
+	struct nvme_tcp_ofld_dev qedn_ofld_dev;
+};
+
+#endif /* _QEDN_H_ */
diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
new file mode 100644
index 000000000000..31d6d86d6eb7
--- /dev/null
+++ b/drivers/nvme/hw/qedn/qedn_main.c
@@ -0,0 +1,201 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2021 Marvell. All rights reserved.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+ /* Kernel includes */
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+/* Driver includes */
+#include "qedn.h"
+
+#define CHIP_NUM_AHP_NVMETCP 0x8194
+
+static struct pci_device_id qedn_pci_tbl[] = {
+	{ PCI_VDEVICE(QLOGIC, CHIP_NUM_AHP_NVMETCP), 0 },
+	{0, 0},
+};
+
+static int
+qedn_claim_dev(struct nvme_tcp_ofld_dev *dev,
+	       struct nvme_tcp_ofld_ctrl_con_params *conn_params)
+{
+	/* Placeholder - qedn_claim_dev */
+
+	return 0;
+}
+
+static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid,
+			     size_t q_size)
+{
+	/* Placeholder - qedn_create_queue */
+
+	return 0;
+}
+
+static void qedn_drain_queue(struct nvme_tcp_ofld_queue *queue)
+{
+	/* Placeholder - qedn_drain_queue */
+}
+
+static void qedn_destroy_queue(struct nvme_tcp_ofld_queue *queue)
+{
+	/* Placeholder - qedn_destroy_queue */
+}
+
+static int qedn_poll_queue(struct nvme_tcp_ofld_queue *queue)
+{
+	/*
+	 * Poll queue support will be added as part of future
+	 * enhancements.
+	 */
+
+	return 0;
+}
+
+static int qedn_init_req(struct nvme_tcp_ofld_req *req)
+{
+	/*
+	 * tcp-offload layer taking care of everything,
+	 * Nothing required from qedn driver, hence empty.
+	 */
+
+	return 0;
+}
+
+static void qedn_commit_rqs(struct nvme_tcp_ofld_queue *queue)
+{
+	/* Placeholder - queue work */
+}
+
+static int qedn_send_req(struct nvme_tcp_ofld_req *req)
+{
+	/* Placeholder - qedn_send_req */
+
+	return 0;
+}
+
+static struct nvme_tcp_ofld_ops qedn_ofld_ops = {
+	.name = "qedn",
+	.module = THIS_MODULE,
+	.required_opts = NVMF_OPT_TRADDR,
+	.allowed_opts = NVMF_OPT_TRSVCID | NVMF_OPT_NR_WRITE_QUEUES |
+			NVMF_OPT_HOST_TRADDR | NVMF_OPT_CTRL_LOSS_TMO |
+			NVMF_OPT_RECONNECT_DELAY,
+		/* These flags will be added as part of future enhancements
+		 *	NVMF_OPT_HDR_DIGEST | NVMF_OPT_DATA_DIGEST |
+		 *	NVMF_OPT_NR_POLL_QUEUES | NVMF_OPT_TOS
+		 */
+	.claim_dev = qedn_claim_dev,
+	.create_queue = qedn_create_queue,
+	.drain_queue = qedn_drain_queue,
+	.destroy_queue = qedn_destroy_queue,
+	.poll_queue = qedn_poll_queue,
+	.init_req = qedn_init_req,
+	.send_req = qedn_send_req,
+	.commit_rqs = qedn_commit_rqs,
+};
+
+static void __qedn_remove(struct pci_dev *pdev)
+{
+	struct qedn_ctx *qedn = pci_get_drvdata(pdev);
+
+	pr_notice("Starting qedn_remove\n");
+	nvme_tcp_ofld_unregister_dev(&qedn->qedn_ofld_dev);
+	kfree(qedn);
+	pr_notice("Ending qedn_remove successfully\n");
+}
+
+static void qedn_remove(struct pci_dev *pdev)
+{
+	__qedn_remove(pdev);
+}
+
+static void qedn_shutdown(struct pci_dev *pdev)
+{
+	__qedn_remove(pdev);
+}
+
+static struct qedn_ctx *qedn_alloc_ctx(struct pci_dev *pdev)
+{
+	struct qedn_ctx *qedn = NULL;
+
+	qedn = kzalloc(sizeof(*qedn), GFP_KERNEL);
+	if (!qedn)
+		return NULL;
+
+	qedn->pdev = pdev;
+	pci_set_drvdata(pdev, qedn);
+
+	return qedn;
+}
+
+static int __qedn_probe(struct pci_dev *pdev)
+{
+	struct qedn_ctx *qedn;
+	int rc;
+
+	pr_notice("Starting qedn probe\n");
+
+	qedn = qedn_alloc_ctx(pdev);
+	if (!qedn)
+		return -ENODEV;
+
+	qedn->qedn_ofld_dev.ops = &qedn_ofld_ops;
+	INIT_LIST_HEAD(&qedn->qedn_ofld_dev.entry);
+	rc = nvme_tcp_ofld_register_dev(&qedn->qedn_ofld_dev);
+	if (rc)
+		goto release_qedn;
+
+	return 0;
+release_qedn:
+	kfree(qedn);
+
+	return rc;
+}
+
+static int qedn_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+	return __qedn_probe(pdev);
+}
+
+static struct pci_driver qedn_pci_driver = {
+	.name     = QEDN_MODULE_NAME,
+	.id_table = qedn_pci_tbl,
+	.probe    = qedn_probe,
+	.remove   = qedn_remove,
+	.shutdown = qedn_shutdown,
+};
+
+static int __init qedn_init(void)
+{
+	int rc;
+
+	rc = pci_register_driver(&qedn_pci_driver);
+	if (rc) {
+		pr_err("Failed to register pci driver\n");
+
+		return -EINVAL;
+	}
+
+	pr_notice("driver loaded successfully\n");
+
+	return 0;
+}
+
+static void __exit qedn_cleanup(void)
+{
+	pci_unregister_driver(&qedn_pci_driver);
+	pr_notice("Unloading qedn ended\n");
+}
+
+module_init(qedn_init);
+module_exit(qedn_cleanup);
+
+MODULE_LICENSE("GPL v2");
+MODULE_SOFTDEP("pre: qede nvme-fabrics nvme-tcp-offload");
+MODULE_DESCRIPTION("Marvell 25/50/100G NVMe-TCP Offload Host Driver");
+MODULE_AUTHOR("Marvell");
-- 
2.22.0



* [RFC PATCH v4 17/27] qedn: Add qedn probe
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
                   ` (15 preceding siblings ...)
  2021-04-29 19:09 ` [RFC PATCH v4 16/27] qedn: Add qedn - Marvell's NVMeTCP HW offload vendor driver Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-05-02 11:28   ` Hannes Reinecke
  2021-04-29 19:09 ` [RFC PATCH v4 18/27] qedn: Add qedn_claim_dev API support Shai Malin
                   ` (10 subsequent siblings)
  27 siblings, 1 reply; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net  --cc=Jakub
	Kicinski, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	malin1024, Dean Balandin

This patch introduces the functionality of loading and unloading the
physical function (PF).
qedn_probe() loads the offload device PF and initializes the HW and
the FW with the PF parameters using the HW ops (qed_nvmetcp_ops), which
are similar to the other "qed_*_ops" used by the qede, qedr, qedf and
qedi device drivers.
qedn_remove() unloads the offload device PF and re-initializes the HW
and the FW with the PF parameters.

The struct qedn_ctx is a per-PF container for PF-specific attributes
and resources.
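
The probe/remove pairing in this patch relies on one state bit per
completed probe stage, so that a single remove routine can unwind a
partially completed probe. Below is a minimal userspace sketch of that
idea, not the driver code: the demo_* names are hypothetical, the stage
list is shortened, and the bit helper is a non-atomic stand-in for the
kernel's test_and_clear_bit().

```c
#include <assert.h>

/* Hypothetical stand-ins for a few qedn_state bits (illustration only). */
enum demo_state {
	DEMO_CORE_PROBED = 0,
	DEMO_CORE_OPEN,
	DEMO_MFW_STATE,
	DEMO_REGISTERED,
};

struct demo_ctx {
	unsigned long state;	/* one bit per completed probe stage */
	int teardown_calls;	/* number of undo steps actually executed */
};

/* Non-atomic userspace analogue of the kernel's test_and_clear_bit(). */
static int demo_test_and_clear(unsigned long *word, int bit)
{
	unsigned long mask = 1UL << bit;
	int was_set = (*word & mask) != 0;

	*word &= ~mask;
	return was_set;
}

/* Undo only the stages whose bit is set, in reverse order of setup. */
static void demo_remove(struct demo_ctx *ctx)
{
	if (demo_test_and_clear(&ctx->state, DEMO_REGISTERED))
		ctx->teardown_calls++;	/* would unregister the offload dev */
	if (demo_test_and_clear(&ctx->state, DEMO_MFW_STATE))
		ctx->teardown_calls++;	/* would notify the MFW */
	if (demo_test_and_clear(&ctx->state, DEMO_CORE_OPEN))
		ctx->teardown_calls++;	/* would stop the slowpath */
	if (demo_test_and_clear(&ctx->state, DEMO_CORE_PROBED))
		ctx->teardown_calls++;	/* would remove the core device */
}

/*
 * Complete the first 'stages_ok' stages. If a later stage "fails",
 * unwind through demo_remove(), as __qedn_probe() does on its error path.
 */
static int demo_probe(struct demo_ctx *ctx, int stages_ok)
{
	int stage;

	for (stage = DEMO_CORE_PROBED; stage <= DEMO_REGISTERED; stage++) {
		if (stage >= stages_ok) {
			demo_remove(ctx);
			return -1;
		}
		ctx->state |= 1UL << stage;
	}

	return 0;
}
```

With this layout the error path and the normal remove path share one
routine, and stages that were never reached are skipped.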

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Dean Balandin <dbalandin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
---
 drivers/nvme/hw/Kconfig          |   1 +
 drivers/nvme/hw/qedn/qedn.h      |  49 ++++++++
 drivers/nvme/hw/qedn/qedn_main.c | 191 ++++++++++++++++++++++++++++++-
 3 files changed, 236 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/hw/Kconfig b/drivers/nvme/hw/Kconfig
index 374f1f9dbd3d..91b1bd6f07d8 100644
--- a/drivers/nvme/hw/Kconfig
+++ b/drivers/nvme/hw/Kconfig
@@ -2,6 +2,7 @@
 config NVME_QEDN
 	tristate "Marvell NVM Express over Fabrics TCP offload"
 	depends on NVME_TCP_OFFLOAD
+	select QED_NVMETCP
 	help
 	  This enables the Marvell NVMe TCP offload support (qedn).
 
diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
index bcd0748a10fd..c1ac17eabcb7 100644
--- a/drivers/nvme/hw/qedn/qedn.h
+++ b/drivers/nvme/hw/qedn/qedn.h
@@ -6,14 +6,63 @@
 #ifndef _QEDN_H_
 #define _QEDN_H_
 
+#include <linux/qed/qed_if.h>
+#include <linux/qed/qed_nvmetcp_if.h>
+
 /* Driver includes */
 #include "../../host/tcp-offload.h"
 
+#define QEDN_MAJOR_VERSION		8
+#define QEDN_MINOR_VERSION		62
+#define QEDN_REVISION_VERSION		10
+#define QEDN_ENGINEERING_VERSION	0
+#define DRV_MODULE_VERSION __stringify(QEDN_MAJOR_VERSION) "."	\
+		__stringify(QEDN_MINOR_VERSION) "."		\
+		__stringify(QEDN_REVISION_VERSION) "."		\
+		__stringify(QEDN_ENGINEERING_VERSION)
+
 #define QEDN_MODULE_NAME "qedn"
 
+#define QEDN_MAX_TASKS_PER_PF (16 * 1024)
+#define QEDN_MAX_CONNS_PER_PF (4 * 1024)
+#define QEDN_FW_CQ_SIZE (4 * 1024)
+#define QEDN_PROTO_CQ_PROD_IDX	0
+#define QEDN_NVMETCP_NUM_FW_CONN_QUEUE_PAGES 2
+
+enum qedn_state {
+	QEDN_STATE_CORE_PROBED = 0,
+	QEDN_STATE_CORE_OPEN,
+	QEDN_STATE_GL_PF_LIST_ADDED,
+	QEDN_STATE_MFW_STATE,
+	QEDN_STATE_REGISTERED_OFFLOAD_DEV,
+	QEDN_STATE_MODULE_REMOVE_ONGOING,
+};
+
 struct qedn_ctx {
 	struct pci_dev *pdev;
+	struct qed_dev *cdev;
+	struct qed_dev_nvmetcp_info dev_info;
 	struct nvme_tcp_ofld_dev qedn_ofld_dev;
+	struct qed_pf_params pf_params;
+
+	/* Global PF list entry */
+	struct list_head gl_pf_entry;
+
+	/* Accessed with atomic bit ops, used with enum qedn_state */
+	unsigned long state;
+
+	/* Fast path queues */
+	u8 num_fw_cqs;
+};
+
+struct qedn_global {
+	struct list_head qedn_pf_list;
+
+	/* Host mode */
+	struct list_head ctrl_list;
+
+	/* Mutex for accessing the global struct */
+	struct mutex glb_mutex;
 };
 
 #endif /* _QEDN_H_ */
diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
index 31d6d86d6eb7..e3e8e3676b79 100644
--- a/drivers/nvme/hw/qedn/qedn_main.c
+++ b/drivers/nvme/hw/qedn/qedn_main.c
@@ -14,6 +14,10 @@
 
 #define CHIP_NUM_AHP_NVMETCP 0x8194
 
+const struct qed_nvmetcp_ops *qed_ops;
+
+/* Global context instance */
+struct qedn_global qedn_glb;
 static struct pci_device_id qedn_pci_tbl[] = {
 	{ PCI_VDEVICE(QLOGIC, CHIP_NUM_AHP_NVMETCP), 0 },
 	{0, 0},
@@ -99,12 +103,132 @@ static struct nvme_tcp_ofld_ops qedn_ofld_ops = {
 	.commit_rqs = qedn_commit_rqs,
 };
 
+static inline void qedn_init_pf_struct(struct qedn_ctx *qedn)
+{
+	/* Placeholder - Initialize qedn fields */
+}
+
+static inline void
+qedn_init_core_probe_params(struct qed_probe_params *probe_params)
+{
+	memset(probe_params, 0, sizeof(*probe_params));
+	probe_params->protocol = QED_PROTOCOL_NVMETCP;
+	probe_params->is_vf = false;
+	probe_params->recov_in_prog = 0;
+}
+
+static inline int qedn_core_probe(struct qedn_ctx *qedn)
+{
+	struct qed_probe_params probe_params;
+	int rc = 0;
+
+	qedn_init_core_probe_params(&probe_params);
+	pr_info("Starting QED probe\n");
+	qedn->cdev = qed_ops->common->probe(qedn->pdev, &probe_params);
+	if (!qedn->cdev) {
+		rc = -ENODEV;
+		pr_err("QED probe failed\n");
+	}
+
+	return rc;
+}
+
+static void qedn_add_pf_to_gl_list(struct qedn_ctx *qedn)
+{
+	mutex_lock(&qedn_glb.glb_mutex);
+	list_add_tail(&qedn->gl_pf_entry, &qedn_glb.qedn_pf_list);
+	mutex_unlock(&qedn_glb.glb_mutex);
+}
+
+static void qedn_remove_pf_from_gl_list(struct qedn_ctx *qedn)
+{
+	mutex_lock(&qedn_glb.glb_mutex);
+	list_del_init(&qedn->gl_pf_entry);
+	mutex_unlock(&qedn_glb.glb_mutex);
+}
+
+static int qedn_set_nvmetcp_pf_param(struct qedn_ctx *qedn)
+{
+	u32 fw_conn_queue_pages = QEDN_NVMETCP_NUM_FW_CONN_QUEUE_PAGES;
+	struct qed_nvmetcp_pf_params *pf_params;
+
+	pf_params = &qedn->pf_params.nvmetcp_pf_params;
+	memset(pf_params, 0, sizeof(*pf_params));
+	qedn->num_fw_cqs = min_t(u8, qedn->dev_info.num_cqs, num_online_cpus());
+
+	pf_params->num_cons = QEDN_MAX_CONNS_PER_PF;
+	pf_params->num_tasks = QEDN_MAX_TASKS_PER_PF;
+
+	/* Placeholder - Initialize function level queues */
+
+	/* Placeholder - Initialize TCP params */
+
+	/* Queues */
+	pf_params->num_sq_pages_in_ring = fw_conn_queue_pages;
+	pf_params->num_r2tq_pages_in_ring = fw_conn_queue_pages;
+	pf_params->num_uhq_pages_in_ring = fw_conn_queue_pages;
+	pf_params->num_queues = qedn->num_fw_cqs;
+	pf_params->cq_num_entries = QEDN_FW_CQ_SIZE;
+
+	/* the CQ SB pi */
+	pf_params->gl_rq_pi = QEDN_PROTO_CQ_PROD_IDX;
+
+	return 0;
+}
+
+static inline int qedn_slowpath_start(struct qedn_ctx *qedn)
+{
+	struct qed_slowpath_params sp_params = {};
+	int rc = 0;
+
+	/* Start the Slowpath-process */
+	sp_params.int_mode = QED_INT_MODE_MSIX;
+	sp_params.drv_major = QEDN_MAJOR_VERSION;
+	sp_params.drv_minor = QEDN_MINOR_VERSION;
+	sp_params.drv_rev = QEDN_REVISION_VERSION;
+	sp_params.drv_eng = QEDN_ENGINEERING_VERSION;
+	strscpy(sp_params.name, "qedn NVMeTCP", QED_DRV_VER_STR_SIZE);
+	rc = qed_ops->common->slowpath_start(qedn->cdev, &sp_params);
+	if (rc)
+		pr_err("Cannot start slowpath\n");
+
+	return rc;
+}
+
 static void __qedn_remove(struct pci_dev *pdev)
 {
 	struct qedn_ctx *qedn = pci_get_drvdata(pdev);
+	int rc;
+
+	pr_notice("qedn remove started: abs PF id=%u\n",
+		  qedn->dev_info.common.abs_pf_id);
+
+	if (test_and_set_bit(QEDN_STATE_MODULE_REMOVE_ONGOING, &qedn->state)) {
+		pr_err("Remove already ongoing\n");
+
+		return;
+	}
+
+	if (test_and_clear_bit(QEDN_STATE_REGISTERED_OFFLOAD_DEV, &qedn->state))
+		nvme_tcp_ofld_unregister_dev(&qedn->qedn_ofld_dev);
+
+	if (test_and_clear_bit(QEDN_STATE_GL_PF_LIST_ADDED, &qedn->state))
+		qedn_remove_pf_from_gl_list(qedn);
+	else
+		pr_err("Failed to remove from global PF list\n");
+
+	if (test_and_clear_bit(QEDN_STATE_MFW_STATE, &qedn->state)) {
+		rc = qed_ops->common->update_drv_state(qedn->cdev, false);
+		if (rc)
+			pr_err("Failed to send drv state to MFW\n");
+	}
+
+	if (test_and_clear_bit(QEDN_STATE_CORE_OPEN, &qedn->state))
+		qed_ops->common->slowpath_stop(qedn->cdev);
+
+	if (test_and_clear_bit(QEDN_STATE_CORE_PROBED, &qedn->state))
+		qed_ops->common->remove(qedn->cdev);
 
-	pr_notice("Starting qedn_remove\n");
-	nvme_tcp_ofld_unregister_dev(&qedn->qedn_ofld_dev);
 	kfree(qedn);
 	pr_notice("Ending qedn_remove successfully\n");
 }
@@ -144,15 +268,55 @@ static int __qedn_probe(struct pci_dev *pdev)
 	if (!qedn)
 		return -ENODEV;
 
+	qedn_init_pf_struct(qedn);
+
+	/* QED probe */
+	rc = qedn_core_probe(qedn);
+	if (rc)
+		goto exit_probe_and_release_mem;
+
+	set_bit(QEDN_STATE_CORE_PROBED, &qedn->state);
+
+	rc = qed_ops->fill_dev_info(qedn->cdev, &qedn->dev_info);
+	if (rc) {
+		pr_err("fill_dev_info failed\n");
+		goto exit_probe_and_release_mem;
+	}
+
+	qedn_add_pf_to_gl_list(qedn);
+	set_bit(QEDN_STATE_GL_PF_LIST_ADDED, &qedn->state);
+
+	rc = qedn_set_nvmetcp_pf_param(qedn);
+	if (rc)
+		goto exit_probe_and_release_mem;
+
+	qed_ops->common->update_pf_params(qedn->cdev, &qedn->pf_params);
+	rc = qedn_slowpath_start(qedn);
+	if (rc)
+		goto exit_probe_and_release_mem;
+
+	set_bit(QEDN_STATE_CORE_OPEN, &qedn->state);
+
+	rc = qed_ops->common->update_drv_state(qedn->cdev, true);
+	if (rc) {
+		pr_err("Failed to send drv state to MFW\n");
+		goto exit_probe_and_release_mem;
+	}
+
+	set_bit(QEDN_STATE_MFW_STATE, &qedn->state);
+
 	qedn->qedn_ofld_dev.ops = &qedn_ofld_ops;
 	INIT_LIST_HEAD(&qedn->qedn_ofld_dev.entry);
 	rc = nvme_tcp_ofld_register_dev(&qedn->qedn_ofld_dev);
 	if (rc)
-		goto release_qedn;
+		goto exit_probe_and_release_mem;
+
+	set_bit(QEDN_STATE_REGISTERED_OFFLOAD_DEV, &qedn->state);
 
 	return 0;
-release_qedn:
-	kfree(qedn);
+exit_probe_and_release_mem:
+	__qedn_remove(pdev);
+	pr_err("probe ended with error\n");
 
 	return rc;
 }
@@ -170,10 +334,26 @@ static struct pci_driver qedn_pci_driver = {
 	.shutdown = qedn_shutdown,
 };
 
+static inline void qedn_init_global_contxt(void)
+{
+	INIT_LIST_HEAD(&qedn_glb.qedn_pf_list);
+	INIT_LIST_HEAD(&qedn_glb.ctrl_list);
+	mutex_init(&qedn_glb.glb_mutex);
+}
+
 static int __init qedn_init(void)
 {
 	int rc;
 
+	qedn_init_global_contxt();
+
+	qed_ops = qed_get_nvmetcp_ops();
+	if (!qed_ops) {
+		pr_err("Failed to get QED NVMeTCP ops\n");
+
+		return -EINVAL;
+	}
+
 	rc = pci_register_driver(&qedn_pci_driver);
 	if (rc) {
 		pr_err("Failed to register pci driver\n");
@@ -189,6 +369,7 @@ static int __init qedn_init(void)
 static void __exit qedn_cleanup(void)
 {
 	pci_unregister_driver(&qedn_pci_driver);
+	qed_put_nvmetcp_ops();
 	pr_notice("Unloading qedn ended\n");
 }
 
-- 
2.22.0



* [RFC PATCH v4 18/27] qedn: Add qedn_claim_dev API support
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
                   ` (16 preceding siblings ...)
  2021-04-29 19:09 ` [RFC PATCH v4 17/27] qedn: Add qedn probe Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-05-02 11:29   ` Hannes Reinecke
  2021-04-29 19:09 ` [RFC PATCH v4 19/27] qedn: Add IRQ and fast-path resources initializations Shai Malin
                   ` (9 subsequent siblings)
  27 siblings, 1 reply; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net  --cc=Jakub
	Kicinski, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	malin1024, Nikolay Assa

From: Nikolay Assa <nassa@marvell.com>

This patch introduces the qedn_claim_dev() network service, which the
offload device (qedn) uses through the paired net-device (qede).
qedn_claim_dev() returns true if the IP address (IPv4 or IPv6) of the
target server is reachable via the net-device that is paired with the
offload device.
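
The claim decision starts by dispatching on the address family of the
remote address. Below is a minimal userspace sketch of that step only.
The demo_* names and the route callbacks are hypothetical stand-ins
for the route lookups (qed_route_ipv4()/qed_route_ipv6() in the patch),
and the later check that the resolved net-device is a qede device is
omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical route lookup: returns 0 when a route to 'remote' exists. */
typedef int (*demo_route_fn)(const struct sockaddr_storage *remote);

static int demo_route_ok(const struct sockaddr_storage *remote)
{
	(void)remote;
	return 0;
}

static int demo_route_fail(const struct sockaddr_storage *remote)
{
	(void)remote;
	return -1;
}

/*
 * Claim the device only if the remote address family is supported and
 * the matching route lookup succeeds.
 */
static bool demo_claim_dev(const struct sockaddr_storage *remote,
			   demo_route_fn route_v4, demo_route_fn route_v6)
{
	int rc;

	switch (remote->ss_family) {
	case AF_INET:
		rc = route_v4(remote);
		break;
	case AF_INET6:
		rc = route_v6(remote);
		break;
	default:
		return false;	/* unsupported address family */
	}

	return rc == 0;
}
```

Passing the lookups as callbacks is only a convenience for exercising
the sketch; the patch calls the qed helpers directly.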

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Nikolay Assa <nassa@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
---
 drivers/nvme/hw/qedn/qedn.h      |  4 +++
 drivers/nvme/hw/qedn/qedn_main.c | 42 ++++++++++++++++++++++++++++++--
 2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
index c1ac17eabcb7..7efe2366eb7c 100644
--- a/drivers/nvme/hw/qedn/qedn.h
+++ b/drivers/nvme/hw/qedn/qedn.h
@@ -8,6 +8,10 @@
 
 #include <linux/qed/qed_if.h>
 #include <linux/qed/qed_nvmetcp_if.h>
+#include <linux/qed/qed_nvmetcp_ip_services_if.h>
+#include <linux/qed/qed_chain.h>
+#include <linux/qed/storage_common.h>
+#include <linux/qed/nvmetcp_common.h>
 
 /* Driver includes */
 #include "../../host/tcp-offload.h"
diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
index e3e8e3676b79..52007d35622d 100644
--- a/drivers/nvme/hw/qedn/qedn_main.c
+++ b/drivers/nvme/hw/qedn/qedn_main.c
@@ -27,9 +27,47 @@ static int
 qedn_claim_dev(struct nvme_tcp_ofld_dev *dev,
 	       struct nvme_tcp_ofld_ctrl_con_params *conn_params)
 {
-	/* Placeholder - qedn_claim_dev */
+	struct pci_dev *qede_pdev = NULL;
+	struct net_device *ndev = NULL;
+	u16 vlan_id = 0;
+	int rc = 0;
 
-	return 0;
+	/* qedn utilizes host network stack through paired qede device for
+	 * non-offload traffic. First we verify there is valid route to remote
+	 * peer.
+	 */
+	if (conn_params->remote_ip_addr.ss_family == AF_INET) {
+		rc = qed_route_ipv4(&conn_params->local_ip_addr,
+				    &conn_params->remote_ip_addr,
+				    &conn_params->remote_mac_addr,
+				    &ndev);
+	} else if (conn_params->remote_ip_addr.ss_family == AF_INET6) {
+		rc = qed_route_ipv6(&conn_params->local_ip_addr,
+				    &conn_params->remote_ip_addr,
+				    &conn_params->remote_mac_addr,
+				    &ndev);
+	} else {
+		pr_err("address family %d not supported\n",
+		       conn_params->remote_ip_addr.ss_family);
+
+		return false;
+	}
+
+	if (rc)
+		return false;
+
+	qed_vlan_get_ndev(&ndev, &vlan_id);
+	conn_params->vlan_id = vlan_id;
+
+	/* route found through ndev - validate this is qede */
+	qede_pdev = qed_validate_ndev(ndev);
+	if (!qede_pdev)
+		return false;
+
+	dev->qede_pdev = qede_pdev;
+	dev->ndev = ndev;
+
+	return true;
 }
 
 static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid,
-- 
2.22.0



* [RFC PATCH v4 19/27] qedn: Add IRQ and fast-path resources initializations
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
                   ` (17 preceding siblings ...)
  2021-04-29 19:09 ` [RFC PATCH v4 18/27] qedn: Add qedn_claim_dev API support Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-05-02 11:32   ` Hannes Reinecke
  2021-04-29 19:09 ` [RFC PATCH v4 20/27] qedn: Add connection-level slowpath functionality Shai Malin
                   ` (8 subsequent siblings)
  27 siblings, 1 reply; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net  --cc=Jakub
	Kicinski, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	malin1024

This patch adds qedn_fp_queue, a per-CPU-core element which handles all
of the connections on that CPU core.
Each qedn_fp_queue handles a group of connections (NVMeoF QPs) which
are served on the same CPU core. These connections share the same
FW-driver resources and do not need to belong to the same NVMeoF
controller.

The per-qedn_fp_queue resources are the FW CQ and the FW status block:
- The FW CQ is used by the FW to notify the driver that the exchange
  has ended; the FW also passes the incoming NVMeoF CQE (if one exists)
  to the driver.
- The FW status block is used by the FW to notify the driver of the
  producer update of the FW CQE chain.

The FW fast-path queues are based on qed_chain.h.
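
Each fast-path queue is tied to one MSI-X vector and one CPU. Below is
a minimal sketch of the two index computations involved. The vector
formula mirrors the expression used in qedn_request_msix_irq(); the CPU
helper is a simplification that assumes CPUs 0..num_cpus-1 are all
online, whereas the patch walks cpu_online_mask. The demo_* names are
hypothetical.

```c
#include <assert.h>

/* MSI-X table index for a queue: queue * num_hwfns + affin_hwfn_idx. */
static unsigned int demo_vector_idx(unsigned int queue,
				    unsigned int num_hwfns,
				    unsigned int affin_hwfn_idx)
{
	return queue * num_hwfns + affin_hwfn_idx;
}

/*
 * Round-robin CPU for a queue. Assumes a dense online CPU set
 * (0..num_cpus-1) and num_cpus > 0; this is a simplification.
 */
static unsigned int demo_queue_cpu(unsigned int queue, unsigned int num_cpus)
{
	return queue % num_cpus;
}
```

For example, with two HW functions and an affinitized function index of
1, queue 3 maps to vector index 7 under this formula.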

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
---
 drivers/nvme/hw/qedn/qedn.h      |  26 +++
 drivers/nvme/hw/qedn/qedn_main.c | 287 ++++++++++++++++++++++++++++++-
 2 files changed, 310 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
index 7efe2366eb7c..5d4d04d144e4 100644
--- a/drivers/nvme/hw/qedn/qedn.h
+++ b/drivers/nvme/hw/qedn/qedn.h
@@ -33,18 +33,41 @@
 #define QEDN_PROTO_CQ_PROD_IDX	0
 #define QEDN_NVMETCP_NUM_FW_CONN_QUEUE_PAGES 2
 
+#define QEDN_PAGE_SIZE	4096 /* FW page size - Configurable */
+#define QEDN_IRQ_NAME_LEN 24
+#define QEDN_IRQ_NO_FLAGS 0
+
+/* TCP defines */
+#define QEDN_TCP_RTO_DEFAULT 280
+
 enum qedn_state {
 	QEDN_STATE_CORE_PROBED = 0,
 	QEDN_STATE_CORE_OPEN,
 	QEDN_STATE_GL_PF_LIST_ADDED,
 	QEDN_STATE_MFW_STATE,
+	QEDN_STATE_NVMETCP_OPEN,
+	QEDN_STATE_IRQ_SET,
+	QEDN_STATE_FP_WORK_THREAD_SET,
 	QEDN_STATE_REGISTERED_OFFLOAD_DEV,
 	QEDN_STATE_MODULE_REMOVE_ONGOING,
 };
 
+/* Per CPU core params */
+struct qedn_fp_queue {
+	struct qed_chain cq_chain;
+	u16 *cq_prod;
+	struct mutex cq_mutex; /* cq handler mutex */
+	struct qedn_ctx	*qedn;
+	struct qed_sb_info *sb_info;
+	unsigned int cpu;
+	u16 sb_id;
+	char irqname[QEDN_IRQ_NAME_LEN];
+};
+
 struct qedn_ctx {
 	struct pci_dev *pdev;
 	struct qed_dev *cdev;
+	struct qed_int_info int_info;
 	struct qed_dev_nvmetcp_info dev_info;
 	struct nvme_tcp_ofld_dev qedn_ofld_dev;
 	struct qed_pf_params pf_params;
@@ -57,6 +80,9 @@ struct qedn_ctx {
 
 	/* Fast path queues */
 	u8 num_fw_cqs;
+	struct qedn_fp_queue *fp_q_arr;
+	struct nvmetcp_glbl_queue_entry *fw_cq_array_virt;
+	dma_addr_t fw_cq_array_phy; /* Physical address of fw_cq_array_virt */
 };
 
 struct qedn_global {
diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
index 52007d35622d..0135a1f490da 100644
--- a/drivers/nvme/hw/qedn/qedn_main.c
+++ b/drivers/nvme/hw/qedn/qedn_main.c
@@ -141,6 +141,104 @@ static struct nvme_tcp_ofld_ops qedn_ofld_ops = {
 	.commit_rqs = qedn_commit_rqs,
 };
 
+/* Fastpath IRQ handler */
+static irqreturn_t qedn_irq_handler(int irq, void *dev_id)
+{
+	/* Placeholder */
+
+	return IRQ_HANDLED;
+}
+
+static void qedn_sync_free_irqs(struct qedn_ctx *qedn)
+{
+	u16 vector_idx;
+	int i;
+
+	for (i = 0; i < qedn->num_fw_cqs; i++) {
+		vector_idx = i * qedn->dev_info.common.num_hwfns +
+			     qed_ops->common->get_affin_hwfn_idx(qedn->cdev);
+		synchronize_irq(qedn->int_info.msix[vector_idx].vector);
+		irq_set_affinity_hint(qedn->int_info.msix[vector_idx].vector,
+				      NULL);
+		free_irq(qedn->int_info.msix[vector_idx].vector,
+			 &qedn->fp_q_arr[i]);
+	}
+
+	qedn->int_info.used_cnt = 0;
+	qed_ops->common->set_fp_int(qedn->cdev, 0);
+}
+
+static int qedn_request_msix_irq(struct qedn_ctx *qedn)
+{
+	struct pci_dev *pdev = qedn->pdev;
+	struct qedn_fp_queue *fp_q = NULL;
+	int i, rc, cpu;
+	u16 vector_idx;
+	u32 vector;
+
+	/* numa-awareness will be added in future enhancements */
+	cpu = cpumask_first(cpu_online_mask);
+	for (i = 0; i < qedn->num_fw_cqs; i++) {
+		fp_q = &qedn->fp_q_arr[i];
+		vector_idx = i * qedn->dev_info.common.num_hwfns +
+			     qed_ops->common->get_affin_hwfn_idx(qedn->cdev);
+		vector = qedn->int_info.msix[vector_idx].vector;
+		sprintf(fp_q->irqname, "qedn_queue_%x.%x.%x_%d",
+			pdev->bus->number, PCI_SLOT(pdev->devfn),
+			PCI_FUNC(pdev->devfn), i);
+		rc = request_irq(vector, qedn_irq_handler, QEDN_IRQ_NO_FLAGS,
+				 fp_q->irqname, fp_q);
+		if (rc) {
+			pr_err("request_irq failed.\n");
+			qedn_sync_free_irqs(qedn);
+
+			return rc;
+		}
+
+		fp_q->cpu = cpu;
+		qedn->int_info.used_cnt++;
+		rc = irq_set_affinity_hint(vector, get_cpu_mask(cpu));
+		cpu = cpumask_next_wrap(cpu, cpu_online_mask, -1, false);
+	}
+
+	return 0;
+}
+
+static int qedn_setup_irq(struct qedn_ctx *qedn)
+{
+	int rc = 0;
+	u8 rval;
+
+	rval = qed_ops->common->set_fp_int(qedn->cdev, qedn->num_fw_cqs);
+	if (rval < qedn->num_fw_cqs) {
+		qedn->num_fw_cqs = rval;
+		if (rval == 0) {
+			pr_err("set_fp_int return 0 IRQs\n");
+
+			return -ENODEV;
+		}
+	}
+
+	rc = qed_ops->common->get_fp_int(qedn->cdev, &qedn->int_info);
+	if (rc) {
+		pr_err("get_fp_int failed\n");
+		goto exit_setup_int;
+	}
+
+	if (qedn->int_info.msix_cnt) {
+		rc = qedn_request_msix_irq(qedn);
+		goto exit_setup_int;
+	} else {
+		pr_err("msix_cnt = 0\n");
+		rc = -EINVAL;
+		goto exit_setup_int;
+	}
+
+exit_setup_int:
+
+	return rc;
+}
+
 static inline void qedn_init_pf_struct(struct qedn_ctx *qedn)
 {
 	/* Placeholder - Initialize qedn fields */
@@ -185,21 +283,173 @@ static void qedn_remove_pf_from_gl_list(struct qedn_ctx *qedn)
 	mutex_unlock(&qedn_glb.glb_mutex);
 }
 
+static void qedn_free_function_queues(struct qedn_ctx *qedn)
+{
+	struct qed_sb_info *sb_info = NULL;
+	struct qedn_fp_queue *fp_q;
+	int i;
+
+	/* Free workqueues */
+
+	/* Free the fast path queues*/
+	for (i = 0; i < qedn->num_fw_cqs; i++) {
+		fp_q = &qedn->fp_q_arr[i];
+
+		/* Free SB */
+		sb_info = fp_q->sb_info;
+		if (sb_info->sb_virt) {
+			qed_ops->common->sb_release(qedn->cdev, sb_info,
+						    fp_q->sb_id,
+						    QED_SB_TYPE_STORAGE);
+			dma_free_coherent(&qedn->pdev->dev,
+					  sizeof(*sb_info->sb_virt),
+					  (void *)sb_info->sb_virt,
+					  sb_info->sb_phys);
+			memset(sb_info, 0, sizeof(*sb_info));
+			kfree(sb_info);
+			fp_q->sb_info = NULL;
+		}
+
+		qed_ops->common->chain_free(qedn->cdev, &fp_q->cq_chain);
+	}
+
+	if (qedn->fw_cq_array_virt)
+		dma_free_coherent(&qedn->pdev->dev,
+				  qedn->num_fw_cqs * sizeof(u64),
+				  qedn->fw_cq_array_virt,
+				  qedn->fw_cq_array_phy);
+	kfree(qedn->fp_q_arr);
+	qedn->fp_q_arr = NULL;
+}
+
+static int qedn_alloc_and_init_sb(struct qedn_ctx *qedn,
+				  struct qed_sb_info *sb_info, u16 sb_id)
+{
+	int rc = 0;
+
+	sb_info->sb_virt = dma_alloc_coherent(&qedn->pdev->dev,
+					      sizeof(struct status_block_e4),
+					      &sb_info->sb_phys, GFP_KERNEL);
+	if (!sb_info->sb_virt) {
+		pr_err("Status block allocation failed\n");
+
+		return -ENOMEM;
+	}
+
+	rc = qed_ops->common->sb_init(qedn->cdev, sb_info, sb_info->sb_virt,
+				      sb_info->sb_phys, sb_id,
+				      QED_SB_TYPE_STORAGE);
+	if (rc) {
+		pr_err("Status block initialization failed\n");
+
+		return rc;
+	}
+
+	return 0;
+}
+
+static int qedn_alloc_function_queues(struct qedn_ctx *qedn)
+{
+	struct qed_chain_init_params chain_params = {};
+	struct status_block_e4 *sb = NULL;  /* To change to status_block_e4 */
+	struct qedn_fp_queue *fp_q = NULL;
+	int rc = 0, arr_size;
+	u64 cq_phy_addr;
+	int i;
+
+	/* Place holder - IO-path workqueues */
+
+	qedn->fp_q_arr = kcalloc(qedn->num_fw_cqs,
+				 sizeof(struct qedn_fp_queue), GFP_KERNEL);
+	if (!qedn->fp_q_arr)
+		return -ENOMEM;
+
+	arr_size = qedn->num_fw_cqs * sizeof(struct nvmetcp_glbl_queue_entry);
+	qedn->fw_cq_array_virt = dma_alloc_coherent(&qedn->pdev->dev,
+						    arr_size,
+						    &qedn->fw_cq_array_phy,
+						    GFP_KERNEL);
+	if (!qedn->fw_cq_array_virt) {
+		rc = -ENOMEM;
+		goto mem_alloc_failure;
+	}
+
+	/* placeholder - create task pools */
+
+	for (i = 0; i < qedn->num_fw_cqs; i++) {
+		fp_q = &qedn->fp_q_arr[i];
+		mutex_init(&fp_q->cq_mutex);
+
+		/* FW CQ */
+		chain_params.intended_use = QED_CHAIN_USE_TO_CONSUME;
+		chain_params.mode = QED_CHAIN_MODE_PBL;
+		chain_params.cnt_type = QED_CHAIN_CNT_TYPE_U16;
+		chain_params.num_elems = QEDN_FW_CQ_SIZE;
+		chain_params.elem_size = 64; /*Placeholder - sizeof(struct nvmetcp_fw_cqe)*/
+
+		rc = qed_ops->common->chain_alloc(qedn->cdev,
+						  &fp_q->cq_chain,
+						  &chain_params);
+		if (rc) {
+			pr_err("CQ chain pci_alloc_consistent fail\n");
+			goto mem_alloc_failure;
+		}
+
+		cq_phy_addr = qed_chain_get_pbl_phys(&fp_q->cq_chain);
+		qedn->fw_cq_array_virt[i].cq_pbl_addr.hi = PTR_HI(cq_phy_addr);
+		qedn->fw_cq_array_virt[i].cq_pbl_addr.lo = PTR_LO(cq_phy_addr);
+
+		/* SB */
+		fp_q->sb_info = kzalloc(sizeof(*fp_q->sb_info), GFP_KERNEL);
+		if (!fp_q->sb_info)
+			goto mem_alloc_failure;
+
+		fp_q->sb_id = i;
+		rc = qedn_alloc_and_init_sb(qedn, fp_q->sb_info, fp_q->sb_id);
+		if (rc) {
+			pr_err("SB allocation and initialization failed.\n");
+			goto mem_alloc_failure;
+		}
+
+		sb = fp_q->sb_info->sb_virt;
+		fp_q->cq_prod = (u16 *)&sb->pi_array[QEDN_PROTO_CQ_PROD_IDX];
+		fp_q->qedn = qedn;
+
+		/* Placeholder - Init IO-path workqueue */
+
+		/* Placeholder - Init IO-path resources */
+	}
+
+	return 0;
+
+mem_alloc_failure:
+	pr_err("Function allocation failed\n");
+	qedn_free_function_queues(qedn);
+
+	return rc;
+}
+
 static int qedn_set_nvmetcp_pf_param(struct qedn_ctx *qedn)
 {
 	u32 fw_conn_queue_pages = QEDN_NVMETCP_NUM_FW_CONN_QUEUE_PAGES;
 	struct qed_nvmetcp_pf_params *pf_params;
+	int rc;
 
 	pf_params = &qedn->pf_params.nvmetcp_pf_params;
 	memset(pf_params, 0, sizeof(*pf_params));
 	qedn->num_fw_cqs = min_t(u8, qedn->dev_info.num_cqs, num_online_cpus());
+	pr_info("Num qedn CPU cores is %u\n", qedn->num_fw_cqs);
 
 	pf_params->num_cons = QEDN_MAX_CONNS_PER_PF;
 	pf_params->num_tasks = QEDN_MAX_TASKS_PER_PF;
 
-	/* Placeholder - Initialize function level queues */
+	rc = qedn_alloc_function_queues(qedn);
+	if (rc) {
+		pr_err("Global queue allocation failed.\n");
+		goto err_alloc_mem;
+	}
 
-	/* Placeholder - Initialize TCP params */
+	set_bit(QEDN_STATE_FP_WORK_THREAD_SET, &qedn->state);
 
 	/* Queues */
 	pf_params->num_sq_pages_in_ring = fw_conn_queue_pages;
@@ -207,11 +457,14 @@ static int qedn_set_nvmetcp_pf_param(struct qedn_ctx *qedn)
 	pf_params->num_uhq_pages_in_ring = fw_conn_queue_pages;
 	pf_params->num_queues = qedn->num_fw_cqs;
 	pf_params->cq_num_entries = QEDN_FW_CQ_SIZE;
+	pf_params->glbl_q_params_addr = qedn->fw_cq_array_phy;
 
 	/* the CQ SB pi */
 	pf_params->gl_rq_pi = QEDN_PROTO_CQ_PROD_IDX;
 
-	return 0;
+err_alloc_mem:
+
+	return rc;
 }
 
 static inline int qedn_slowpath_start(struct qedn_ctx *qedn)
@@ -255,6 +508,12 @@ static void __qedn_remove(struct pci_dev *pdev)
 	else
 		pr_err("Failed to remove from global PF list\n");
 
+	if (test_and_clear_bit(QEDN_STATE_IRQ_SET, &qedn->state))
+		qedn_sync_free_irqs(qedn);
+
+	if (test_and_clear_bit(QEDN_STATE_NVMETCP_OPEN, &qedn->state))
+		qed_ops->stop(qedn->cdev);
+
 	if (test_and_clear_bit(QEDN_STATE_MFW_STATE, &qedn->state)) {
 		rc = qed_ops->common->update_drv_state(qedn->cdev, false);
 		if (rc)
@@ -264,6 +523,9 @@ static void __qedn_remove(struct pci_dev *pdev)
 	if (test_and_clear_bit(QEDN_STATE_CORE_OPEN, &qedn->state))
 		qed_ops->common->slowpath_stop(qedn->cdev);
 
+	if (test_and_clear_bit(QEDN_STATE_FP_WORK_THREAD_SET, &qedn->state))
+		qedn_free_function_queues(qedn);
+
 	if (test_and_clear_bit(QEDN_STATE_CORE_PROBED, &qedn->state))
 		qed_ops->common->remove(qedn->cdev);
 
@@ -335,6 +597,25 @@ static int __qedn_probe(struct pci_dev *pdev)
 
 	set_bit(QEDN_STATE_CORE_OPEN, &qedn->state);
 
+	rc = qedn_setup_irq(qedn);
+	if (rc)
+		goto exit_probe_and_release_mem;
+
+	set_bit(QEDN_STATE_IRQ_SET, &qedn->state);
+
+	/* NVMeTCP start HW PF */
+	rc = qed_ops->start(qedn->cdev,
+			    NULL /* Placeholder for FW IO-path resources */,
+			    qedn,
+			    NULL /* Placeholder for FW Event callback */);
+	if (rc) {
+		rc = -ENODEV;
+		pr_err("Cannot start NVMeTCP Function\n");
+		goto exit_probe_and_release_mem;
+	}
+
+	set_bit(QEDN_STATE_NVMETCP_OPEN, &qedn->state);
+
 	rc = qed_ops->common->update_drv_state(qedn->cdev, true);
 	if (rc) {
 		pr_err("Failed to send drv state to MFW\n");
-- 
2.22.0



* [RFC PATCH v4 20/27] qedn: Add connection-level slowpath functionality
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
                   ` (18 preceding siblings ...)
  2021-04-29 19:09 ` [RFC PATCH v4 19/27] qedn: Add IRQ and fast-path resources initializations Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-05-02 11:37   ` Hannes Reinecke
  2021-04-29 19:09 ` [RFC PATCH v4 21/27] qedn: Add support of configuring HW filter block Shai Malin
                   ` (7 subsequent siblings)
  27 siblings, 1 reply; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net  --cc=Jakub
	Kicinski, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	malin1024

From: Prabhakar Kushwaha <pkushwaha@marvell.com>

This patch presents the connection (queue) level slowpath
implementation relevant to the create_queue flow.

The internal implementation:
- Add a per-controller slowpath workqueue via pre_setup_ctrl.

- qedn_main.c:
  Includes qedn's implementation of the create_queue op.

- qedn_conn.c includes the main slowpath connection-level functions,
  including:
    1. Per-queue resource allocation.
    2. Creating a new connection.
    3. Offloading the connection to the FW for the TCP handshake.
    4. Destroying a connection.
    5. Support for deleting and freeing a controller.
    6. TCP port management via qed_fetch_tcp_port and qed_return_tcp_port.
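
The slowpath steps listed above are driven by per-connection actions
(see enum sp_work_agg_action in qedn.h). Below is a minimal userspace
sketch of one way such actions could be queued as bits and drained by
a work handler. The action names follow the enum, but the demo_*
types, the draining order, and the "destroy cancels other pending
actions" policy are assumptions made for illustration, not the
behavior of qedn_conn.c.

```c
#include <assert.h>

#define DEMO_LOG_MAX 16

/* Names follow enum sp_work_agg_action; the rest is hypothetical. */
enum demo_sp_action {
	DEMO_CREATE_CONNECTION = 0,
	DEMO_SEND_ICREQ,
	DEMO_HANDLE_ICRESP,
	DEMO_DESTROY_CONNECTION,
	DEMO_NUM_ACTIONS,
};

struct demo_conn {
	unsigned long pending;		/* one bit per queued action */
	int log[DEMO_LOG_MAX];		/* actions in the order handled */
	int log_len;
};

/* Mark an action as pending; a real driver would also queue the work. */
static void demo_queue_action(struct demo_conn *c, enum demo_sp_action a)
{
	c->pending |= 1UL << a;
}

/*
 * Work handler: drain pending actions, lowest bit first. Assumed
 * policy: a pending destroy cancels every other pending action.
 */
static void demo_sp_work(struct demo_conn *c)
{
	int a;

	if (c->pending & (1UL << DEMO_DESTROY_CONNECTION))
		c->pending = 1UL << DEMO_DESTROY_CONNECTION;

	for (a = 0; a < DEMO_NUM_ACTIONS; a++) {
		if (!(c->pending & (1UL << a)))
			continue;

		c->pending &= ~(1UL << a);
		if (c->log_len < DEMO_LOG_MAX)
			c->log[c->log_len++] = a;
	}
}
```

Aggregating actions into a bitmask lets several requests that arrive
before the work runs be handled in a single pass.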

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
---
 drivers/nvme/hw/qedn/Makefile    |   5 +-
 drivers/nvme/hw/qedn/qedn.h      | 173 ++++++++++-
 drivers/nvme/hw/qedn/qedn_conn.c | 508 +++++++++++++++++++++++++++++++
 drivers/nvme/hw/qedn/qedn_main.c | 208 ++++++++++++-
 4 files changed, 883 insertions(+), 11 deletions(-)
 create mode 100644 drivers/nvme/hw/qedn/qedn_conn.c

diff --git a/drivers/nvme/hw/qedn/Makefile b/drivers/nvme/hw/qedn/Makefile
index cb169bbaae18..d8b343afcd16 100644
--- a/drivers/nvme/hw/qedn/Makefile
+++ b/drivers/nvme/hw/qedn/Makefile
@@ -1,5 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
-obj-$(CONFIG_NVME_QEDN) := qedn.o
-
-qedn-y := qedn_main.o
\ No newline at end of file
+obj-$(CONFIG_NVME_QEDN) += qedn.o
+qedn-y := qedn_main.o qedn_conn.o
diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
index 5d4d04d144e4..ed0d43163da2 100644
--- a/drivers/nvme/hw/qedn/qedn.h
+++ b/drivers/nvme/hw/qedn/qedn.h
@@ -6,6 +6,7 @@
 #ifndef _QEDN_H_
 #define _QEDN_H_
 
+#include <linux/qed/common_hsi.h>
 #include <linux/qed/qed_if.h>
 #include <linux/qed/qed_nvmetcp_if.h>
 #include <linux/qed/qed_nvmetcp_ip_services_if.h>
@@ -37,8 +38,41 @@
 #define QEDN_IRQ_NAME_LEN 24
 #define QEDN_IRQ_NO_FLAGS 0
 
-/* TCP defines */
+/* Destroy connection defines */
+#define QEDN_NON_ABORTIVE_TERMINATION 0
+#define QEDN_ABORTIVE_TERMINATION 1
+
+/*
+ * TCP offload stack default configurations and defines.
+ * Future enhancements will allow controlling the configurable
+ * parameters via devlink.
+ */
 #define QEDN_TCP_RTO_DEFAULT 280
+#define QEDN_TCP_ECN_EN 0
+#define QEDN_TCP_TS_EN 0
+#define QEDN_TCP_DA_EN 0
+#define QEDN_TCP_KA_EN 0
+#define QEDN_TCP_TOS 0
+#define QEDN_TCP_TTL 0xfe
+#define QEDN_TCP_FLOW_LABEL 0
+#define QEDN_TCP_KA_TIMEOUT 7200000
+#define QEDN_TCP_KA_INTERVAL 10000
+#define QEDN_TCP_KA_MAX_PROBE_COUNT 10
+#define QEDN_TCP_MAX_RT_TIME 1200
+#define QEDN_TCP_MAX_CWND 4
+#define QEDN_TCP_RCV_WND_SCALE 2
+#define QEDN_TCP_TS_OPTION_LEN 12
+
+/* SP Work queue defines */
+#define QEDN_SP_WORKQUEUE "qedn_sp_wq"
+#define QEDN_SP_WORKQUEUE_MAX_ACTIVE 1
+
+#define QEDN_HOST_MAX_SQ_SIZE (512)
+#define QEDN_SQ_SIZE (2 * QEDN_HOST_MAX_SQ_SIZE)
+
+/* Timeouts and delay constants */
+#define QEDN_WAIT_CON_ESTABLSH_TMO 10000 /* 10 seconds */
+#define QEDN_RLS_CONS_TMO 5000 /* 5 sec */
 
 enum qedn_state {
 	QEDN_STATE_CORE_PROBED = 0,
@@ -78,6 +112,12 @@ struct qedn_ctx {
 	/* Accessed with atomic bit ops, used with enum qedn_state */
 	unsigned long state;
 
+	u8 local_mac_addr[ETH_ALEN];
+	u16 mtu;
+
+	/* Connections */
+	DECLARE_HASHTABLE(conn_ctx_hash, 16);
+
 	/* Fast path queues */
 	u8 num_fw_cqs;
 	struct qedn_fp_queue *fp_q_arr;
@@ -85,6 +125,126 @@ struct qedn_ctx {
 	dma_addr_t fw_cq_array_phy; /* Physical address of fw_cq_array_virt */
 };
 
+struct qedn_endpoint {
+	/* FW Params */
+	struct qed_chain fw_sq_chain;
+	void __iomem *p_doorbell;
+
+	/* TCP Params */
+	__be32 dst_addr[4]; /* In network order */
+	__be32 src_addr[4]; /* In network order */
+	u16 src_port;
+	u16 dst_port;
+	u16 vlan_id;
+	u8 src_mac[ETH_ALEN];
+	u8 dst_mac[ETH_ALEN];
+	u8 ip_type;
+};
+
+enum sp_work_agg_action {
+	CREATE_CONNECTION = 0,
+	SEND_ICREQ,
+	HANDLE_ICRESP,
+	DESTROY_CONNECTION,
+};
+
+enum qedn_ctrl_agg_state {
+	QEDN_CTRL_SET_TO_OFLD_CTRL = 0, /* CTRL set to OFLD_CTRL */
+	QEDN_STATE_SP_WORK_THREAD_SET, /* slowpath WQ was created */
+	LLH_FILTER, /* LLH filter added */
+	QEDN_RECOVERY,
+	ADMINQ_CONNECTED, /* At least one connection has attempted offload */
+	ERR_FLOW,
+};
+
+enum qedn_ctrl_sp_wq_state {
+	QEDN_CTRL_STATE_UNINITIALIZED = 0,
+	QEDN_CTRL_STATE_FREE_CTRL,
+	QEDN_CTRL_STATE_CTRL_ERR,
+};
+
+/* Any change to this enum requires an update of qedn_conn_state_str */
+enum qedn_conn_state {
+	CONN_STATE_CONN_IDLE = 0,
+	CONN_STATE_CREATE_CONNECTION,
+	CONN_STATE_WAIT_FOR_CONNECT_DONE,
+	CONN_STATE_OFFLOAD_COMPLETE,
+	CONN_STATE_WAIT_FOR_UPDATE_EQE,
+	CONN_STATE_WAIT_FOR_IC_COMP,
+	CONN_STATE_NVMETCP_CONN_ESTABLISHED,
+	CONN_STATE_DESTROY_CONNECTION,
+	CONN_STATE_WAIT_FOR_DESTROY_DONE,
+	CONN_STATE_DESTROY_COMPLETE
+};
+
+struct qedn_ctrl {
+	struct list_head glb_entry;
+	struct list_head pf_entry;
+
+	struct qedn_ctx *qedn;
+	struct nvme_tcp_ofld_queue *queue;
+	struct nvme_tcp_ofld_ctrl *ctrl;
+
+	struct workqueue_struct *sp_wq;
+	enum qedn_ctrl_sp_wq_state sp_wq_state;
+
+	struct work_struct sp_wq_entry;
+
+	struct qedn_llh_filter *llh_filter;
+
+	unsigned long agg_state;
+
+	atomic_t host_num_active_conns;
+};
+
+/* Connection level struct */
+struct qedn_conn_ctx {
+	struct qedn_ctx *qedn;
+	struct nvme_tcp_ofld_queue *queue;
+	struct nvme_tcp_ofld_ctrl *ctrl;
+	u32 conn_handle;
+	u32 fw_cid;
+
+	atomic_t est_conn_indicator;
+	atomic_t destroy_conn_indicator;
+	wait_queue_head_t conn_waitq;
+
+	struct work_struct sp_wq_entry;
+
+	/* Connection aggregative state.
+	 * Can have different states independently.
+	 */
+	unsigned long agg_work_action;
+
+	struct hlist_node hash_node;
+	struct nvmetcp_host_cccid_itid_entry *host_cccid_itid;
+	dma_addr_t host_cccid_itid_phy_addr;
+	struct qedn_endpoint ep;
+	int abrt_flag;
+
+	/* Connection resources - turned on to indicate what resource was
+	 * allocated, so that it can later be released.
+	 */
+	unsigned long resrc_state;
+
+	/* Connection state */
+	spinlock_t conn_state_lock;
+	enum qedn_conn_state state;
+
+	size_t sq_depth;
+
+	/* "dummy" socket */
+	struct socket *sock;
+};
+
+enum qedn_conn_resources_state {
+	QEDN_CONN_RESRC_FW_SQ,
+	QEDN_CONN_RESRC_ACQUIRE_CONN,
+	QEDN_CONN_RESRC_CCCID_ITID_MAP,
+	QEDN_CONN_RESRC_TCP_PORT,
+	QEDN_CONN_RESRC_MAX = 64
+};
+
 struct qedn_global {
 	struct list_head qedn_pf_list;
 
@@ -95,4 +255,15 @@ struct qedn_global {
 	struct mutex glb_mutex;
 };
 
+struct qedn_conn_ctx *qedn_get_conn_hash(struct qedn_ctx *qedn, u16 icid);
+int qedn_event_cb(void *context, u8 fw_event_code, void *event_ring_data);
+void qedn_sp_wq_handler(struct work_struct *work);
+void qedn_set_sp_wa(struct qedn_conn_ctx *conn_ctx, u32 bit);
+void qedn_clr_sp_wa(struct qedn_conn_ctx *conn_ctx, u32 bit);
+int qedn_initialize_endpoint(struct qedn_endpoint *ep, u8 *local_mac_addr,
+			     struct nvme_tcp_ofld_ctrl_con_params *conn_params);
+int qedn_wait_for_conn_est(struct qedn_conn_ctx *conn_ctx);
+int qedn_set_con_state(struct qedn_conn_ctx *conn_ctx, enum qedn_conn_state new_state);
+void qedn_terminate_connection(struct qedn_conn_ctx *conn_ctx, int abrt_flag);
+
 #endif /* _QEDN_H_ */
diff --git a/drivers/nvme/hw/qedn/qedn_conn.c b/drivers/nvme/hw/qedn/qedn_conn.c
new file mode 100644
index 000000000000..9bfc0a5f0cdb
--- /dev/null
+++ b/drivers/nvme/hw/qedn/qedn_conn.c
@@ -0,0 +1,508 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2021 Marvell. All rights reserved.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+/* Kernel includes */
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <net/tcp.h>
+
+/* Driver includes */
+#include "qedn.h"
+
+extern const struct qed_nvmetcp_ops *qed_ops;
+
+static const char * const qedn_conn_state_str[] = {
+	"CONN_IDLE",
+	"CREATE_CONNECTION",
+	"WAIT_FOR_CONNECT_DONE",
+	"OFFLOAD_COMPLETE",
+	"WAIT_FOR_UPDATE_EQE",
+	"WAIT_FOR_IC_COMP",
+	"NVMETCP_CONN_ESTABLISHED",
+	"DESTROY_CONNECTION",
+	"WAIT_FOR_DESTROY_DONE",
+	"DESTROY_COMPLETE",
+	NULL
+};
+
+int qedn_set_con_state(struct qedn_conn_ctx *conn_ctx, enum qedn_conn_state new_state)
+{
+	spin_lock_bh(&conn_ctx->conn_state_lock);
+	conn_ctx->state = new_state;
+	spin_unlock_bh(&conn_ctx->conn_state_lock);
+
+	return 0;
+}
+
+static void qedn_return_tcp_port(struct qedn_conn_ctx *conn_ctx)
+{
+	if (conn_ctx->sock && conn_ctx->sock->sk) {
+		qed_return_tcp_port(conn_ctx->sock);
+		conn_ctx->sock = NULL;
+	}
+
+	conn_ctx->ep.src_port = 0;
+}
+
+int qedn_wait_for_conn_est(struct qedn_conn_ctx *conn_ctx)
+{
+	int wrc, rc;
+
+	wrc = wait_event_interruptible_timeout(conn_ctx->conn_waitq,
+					       atomic_read(&conn_ctx->est_conn_indicator) > 0,
+					       msecs_to_jiffies(QEDN_WAIT_CON_ESTABLSH_TMO));
+	atomic_set(&conn_ctx->est_conn_indicator, 0);
+	if (!wrc ||
+	    conn_ctx->state != CONN_STATE_NVMETCP_CONN_ESTABLISHED) {
+		rc = -ETIMEDOUT;
+
+		/* If error was prior to or during offload, conn_ctx was released.
+		 * If the error was after offload sync has completed, we need to
+		 * terminate the connection ourselves.
+		 */
+		if (conn_ctx &&
+		    conn_ctx->state >= CONN_STATE_WAIT_FOR_CONNECT_DONE &&
+		    conn_ctx->state <= CONN_STATE_NVMETCP_CONN_ESTABLISHED)
+			qedn_terminate_connection(conn_ctx,
+						  QEDN_ABORTIVE_TERMINATION);
+	} else {
+		rc = 0;
+	}
+
+	return rc;
+}
+
+int qedn_fill_ep_addr4(struct qedn_endpoint *ep,
+		       struct nvme_tcp_ofld_ctrl_con_params *conn_params)
+{
+	struct sockaddr_in *raddr = (struct sockaddr_in *)&conn_params->remote_ip_addr;
+	struct sockaddr_in *laddr = (struct sockaddr_in *)&conn_params->local_ip_addr;
+
+	ep->ip_type = TCP_IPV4;
+	ep->src_port = laddr->sin_port;
+	ep->dst_port = ntohs(raddr->sin_port);
+
+	ep->src_addr[0] = laddr->sin_addr.s_addr;
+	ep->dst_addr[0] = raddr->sin_addr.s_addr;
+
+	return 0;
+}
+
+int qedn_fill_ep_addr6(struct qedn_endpoint *ep,
+		       struct nvme_tcp_ofld_ctrl_con_params *conn_params)
+{
+	struct sockaddr_in6 *raddr6 = (struct sockaddr_in6 *)&conn_params->remote_ip_addr;
+	struct sockaddr_in6 *laddr6 = (struct sockaddr_in6 *)&conn_params->local_ip_addr;
+	int i;
+
+	ep->ip_type = TCP_IPV6;
+	ep->src_port = laddr6->sin6_port;
+	ep->dst_port = ntohs(raddr6->sin6_port);
+
+	for (i = 0; i < 4; i++) {
+		ep->src_addr[i] = laddr6->sin6_addr.in6_u.u6_addr32[i];
+		ep->dst_addr[i] = raddr6->sin6_addr.in6_u.u6_addr32[i];
+	}
+
+	return 0;
+}
+
+int qedn_initialize_endpoint(struct qedn_endpoint *ep, u8 *local_mac_addr,
+			     struct nvme_tcp_ofld_ctrl_con_params *conn_params)
+{
+	ether_addr_copy(ep->dst_mac, conn_params->remote_mac_addr.sa_data);
+	ether_addr_copy(ep->src_mac, local_mac_addr);
+	ep->vlan_id = conn_params->vlan_id;
+	if (conn_params->remote_ip_addr.ss_family == AF_INET)
+		qedn_fill_ep_addr4(ep, conn_params);
+	else
+		qedn_fill_ep_addr6(ep, conn_params);
+
+	return 0;
+}
+
+static void qedn_release_conn_ctx(struct qedn_conn_ctx *conn_ctx)
+{
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	int rc = 0;
+
+	if (test_bit(QEDN_CONN_RESRC_FW_SQ, &conn_ctx->resrc_state)) {
+		qed_ops->common->chain_free(qedn->cdev,
+					    &conn_ctx->ep.fw_sq_chain);
+		clear_bit(QEDN_CONN_RESRC_FW_SQ, &conn_ctx->resrc_state);
+	}
+
+	if (test_bit(QEDN_CONN_RESRC_ACQUIRE_CONN, &conn_ctx->resrc_state)) {
+		hash_del(&conn_ctx->hash_node);
+		rc = qed_ops->release_conn(qedn->cdev, conn_ctx->conn_handle);
+		if (rc)
+			pr_warn("Release_conn returned with an error %d\n",
+				rc);
+
+		clear_bit(QEDN_CONN_RESRC_ACQUIRE_CONN, &conn_ctx->resrc_state);
+	}
+
+	if (test_bit(QEDN_CONN_RESRC_CCCID_ITID_MAP, &conn_ctx->resrc_state)) {
+		dma_free_coherent(&qedn->pdev->dev,
+				  conn_ctx->sq_depth *
+				  sizeof(struct nvmetcp_host_cccid_itid_entry),
+				  conn_ctx->host_cccid_itid,
+				  conn_ctx->host_cccid_itid_phy_addr);
+		clear_bit(QEDN_CONN_RESRC_CCCID_ITID_MAP,
+			  &conn_ctx->resrc_state);
+	}
+
+	if (test_bit(QEDN_CONN_RESRC_TCP_PORT, &conn_ctx->resrc_state)) {
+		qedn_return_tcp_port(conn_ctx);
+		clear_bit(QEDN_CONN_RESRC_TCP_PORT,
+			  &conn_ctx->resrc_state);
+	}
+
+	if (conn_ctx->resrc_state)
+		pr_err("Conn resources state isn't 0 as expected 0x%lx\n",
+		       conn_ctx->resrc_state);
+
+	atomic_inc(&conn_ctx->destroy_conn_indicator);
+	qedn_set_con_state(conn_ctx, CONN_STATE_DESTROY_COMPLETE);
+	wake_up_interruptible(&conn_ctx->conn_waitq);
+}
+
+static int qedn_alloc_fw_sq(struct qedn_ctx *qedn,
+			    struct qedn_endpoint *ep)
+{
+	struct qed_chain_init_params params = {
+		.mode           = QED_CHAIN_MODE_PBL,
+		.intended_use   = QED_CHAIN_USE_TO_PRODUCE,
+		.cnt_type       = QED_CHAIN_CNT_TYPE_U16,
+		.num_elems      = QEDN_SQ_SIZE,
+		.elem_size      = sizeof(struct nvmetcp_wqe),
+	};
+	int rc;
+
+	rc = qed_ops->common->chain_alloc(qedn->cdev,
+					   &ep->fw_sq_chain,
+					   &params);
+	if (rc) {
+		pr_err("Failed to allocate SQ chain\n");
+
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static int qedn_nvmetcp_offload_conn(struct qedn_conn_ctx *conn_ctx)
+{
+	struct qed_nvmetcp_params_offload offld_prms = { 0 };
+	struct qedn_endpoint *qedn_ep = &conn_ctx->ep;
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	u8 ts_hdr_size = 0;
+	u32 hdr_size;
+	int rc, i;
+
+	ether_addr_copy(offld_prms.src.mac, qedn_ep->src_mac);
+	ether_addr_copy(offld_prms.dst.mac, qedn_ep->dst_mac);
+	offld_prms.vlan_id = qedn_ep->vlan_id;
+	offld_prms.ecn_en = QEDN_TCP_ECN_EN;
+	offld_prms.timestamp_en = QEDN_TCP_TS_EN;
+	offld_prms.delayed_ack_en = QEDN_TCP_DA_EN;
+	offld_prms.tcp_keep_alive_en = QEDN_TCP_KA_EN;
+	offld_prms.ip_version = qedn_ep->ip_type;
+
+	offld_prms.src.ip[0] = ntohl(qedn_ep->src_addr[0]);
+	offld_prms.dst.ip[0] = ntohl(qedn_ep->dst_addr[0]);
+	if (qedn_ep->ip_type == TCP_IPV6) {
+		for (i = 1; i < 4; i++) {
+			offld_prms.src.ip[i] = ntohl(qedn_ep->src_addr[i]);
+			offld_prms.dst.ip[i] = ntohl(qedn_ep->dst_addr[i]);
+		}
+	}
+
+	offld_prms.ttl = QEDN_TCP_TTL;
+	offld_prms.tos_or_tc = QEDN_TCP_TOS;
+	offld_prms.dst.port = qedn_ep->dst_port;
+	offld_prms.src.port = qedn_ep->src_port;
+	offld_prms.nvmetcp_cccid_itid_table_addr =
+		conn_ctx->host_cccid_itid_phy_addr;
+	offld_prms.nvmetcp_cccid_max_range = conn_ctx->sq_depth;
+
+	/* Calculate MSS */
+	if (offld_prms.timestamp_en)
+		ts_hdr_size = QEDN_TCP_TS_OPTION_LEN;
+
+	hdr_size = qedn_ep->ip_type == TCP_IPV4 ?
+		   sizeof(struct iphdr) : sizeof(struct ipv6hdr);
+	hdr_size += sizeof(struct tcphdr) + ts_hdr_size;
+
+	offld_prms.mss = qedn->mtu - hdr_size;
+	offld_prms.rcv_wnd_scale = QEDN_TCP_RCV_WND_SCALE;
+	offld_prms.cwnd = QEDN_TCP_MAX_CWND * offld_prms.mss;
+	offld_prms.ka_max_probe_cnt = QEDN_TCP_KA_MAX_PROBE_COUNT;
+	offld_prms.ka_timeout = QEDN_TCP_KA_TIMEOUT;
+	offld_prms.ka_interval = QEDN_TCP_KA_INTERVAL;
+	offld_prms.max_rt_time = QEDN_TCP_MAX_RT_TIME;
+	offld_prms.sq_pbl_addr =
+		(u64)qed_chain_get_pbl_phys(&qedn_ep->fw_sq_chain);
+
+	rc = qed_ops->offload_conn(qedn->cdev,
+				   conn_ctx->conn_handle,
+				   &offld_prms);
+	if (rc)
+		pr_err("offload_conn returned with an error\n");
+
+	return rc;
+}
+
+static int qedn_fetch_tcp_port(struct qedn_conn_ctx *conn_ctx)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl;
+	struct qedn_ctrl *qctrl;
+	int rc = 0;
+
+	ctrl = conn_ctx->ctrl;
+	qctrl = (struct qedn_ctrl *)ctrl->private_data;
+
+	rc = qed_fetch_tcp_port(ctrl->conn_params.local_ip_addr,
+				&conn_ctx->sock, &conn_ctx->ep.src_port);
+
+	return rc;
+}
+
+static void qedn_decouple_conn(struct qedn_conn_ctx *conn_ctx)
+{
+	struct nvme_tcp_ofld_queue *queue;
+
+	queue = conn_ctx->queue;
+	queue->private_data = NULL;
+}
+
+void qedn_terminate_connection(struct qedn_conn_ctx *conn_ctx, int abrt_flag)
+{
+	struct qedn_ctrl *qctrl;
+
+	if (!conn_ctx)
+		return;
+
+	qctrl = (struct qedn_ctrl *)conn_ctx->ctrl->private_data;
+
+	if (test_and_set_bit(DESTROY_CONNECTION, &conn_ctx->agg_work_action))
+		return;
+
+	qedn_set_con_state(conn_ctx, CONN_STATE_DESTROY_CONNECTION);
+	conn_ctx->abrt_flag = abrt_flag;
+
+	queue_work(qctrl->sp_wq, &conn_ctx->sp_wq_entry);
+}
+
+/* Slowpath EQ Callback */
+int qedn_event_cb(void *context, u8 fw_event_code, void *event_ring_data)
+{
+	struct nvmetcp_connect_done_results *eqe_connect_done;
+	struct nvmetcp_eqe_data *eqe_data;
+	struct nvme_tcp_ofld_ctrl *ctrl;
+	struct qedn_conn_ctx *conn_ctx;
+	struct qedn_ctrl *qctrl;
+	struct qedn_ctx *qedn;
+	u16 icid;
+	int rc;
+
+	if (!context || !event_ring_data) {
+		pr_err("Recv event with ctx NULL\n");
+
+		return -EINVAL;
+	}
+
+	qedn = (struct qedn_ctx *)context;
+
+	if (fw_event_code != NVMETCP_EVENT_TYPE_ASYN_CONNECT_COMPLETE) {
+		eqe_data = (struct nvmetcp_eqe_data *)event_ring_data;
+		icid = le16_to_cpu(eqe_data->icid);
+		pr_err("EQE Info: icid=0x%x, conn_id=0x%x, err-code=0x%x, err-pdu-opcode-reserved=0x%x\n",
+		       eqe_data->icid, eqe_data->conn_id,
+		       eqe_data->error_code,
+		       eqe_data->error_pdu_opcode_reserved);
+	} else {
+		eqe_connect_done =
+			(struct nvmetcp_connect_done_results *)event_ring_data;
+		icid = le16_to_cpu(eqe_connect_done->icid);
+	}
+
+	conn_ctx = qedn_get_conn_hash(qedn, icid);
+	if (!conn_ctx) {
+		pr_err("Connection with icid=0x%x doesn't exist in conn list\n", icid);
+
+		return -EINVAL;
+	}
+
+	ctrl = conn_ctx->ctrl;
+	qctrl = (struct qedn_ctrl *)ctrl->private_data;
+
+	switch (fw_event_code) {
+	case NVMETCP_EVENT_TYPE_ASYN_CONNECT_COMPLETE:
+		if (conn_ctx->state != CONN_STATE_WAIT_FOR_CONNECT_DONE) {
+			pr_err("CID=0x%x - ASYN_CONNECT_COMPLETE: Unexpected connection state %u\n",
+			       conn_ctx->fw_cid, conn_ctx->state);
+		} else {
+			rc = qedn_set_con_state(conn_ctx, CONN_STATE_OFFLOAD_COMPLETE);
+
+			if (rc)
+				return rc;
+
+			/* Placeholder - for ICReq flow */
+		}
+
+		break;
+	case NVMETCP_EVENT_TYPE_ASYN_TERMINATE_DONE:
+		if (conn_ctx->state != CONN_STATE_WAIT_FOR_DESTROY_DONE)
+			pr_err("CID=0x%x - ASYN_TERMINATE_DONE: Unexpected connection state %u\n",
+			       conn_ctx->fw_cid, conn_ctx->state);
+		else
+			queue_work(qctrl->sp_wq, &conn_ctx->sp_wq_entry);
+
+		break;
+	default:
+		pr_err("CID=0x%x - Recv Unknown Event %u\n", conn_ctx->fw_cid, fw_event_code);
+		break;
+	}
+
+	return 0;
+}
+
+static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)
+{
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	size_t dma_size;
+	int rc;
+
+	rc = qedn_alloc_fw_sq(qedn, &conn_ctx->ep);
+	if (rc) {
+		pr_err("Failed to allocate FW SQ\n");
+		goto rel_conn;
+	}
+
+	set_bit(QEDN_CONN_RESRC_FW_SQ, &conn_ctx->resrc_state);
+	rc = qed_ops->acquire_conn(qedn->cdev,
+				   &conn_ctx->conn_handle,
+				   &conn_ctx->fw_cid,
+				   &conn_ctx->ep.p_doorbell);
+	if (rc) {
+		pr_err("Couldn't acquire connection\n");
+		goto rel_conn;
+	}
+
+	hash_add(qedn->conn_ctx_hash, &conn_ctx->hash_node,
+		 conn_ctx->conn_handle);
+	set_bit(QEDN_CONN_RESRC_ACQUIRE_CONN, &conn_ctx->resrc_state);
+
+	/* Placeholder - Allocate task resources and initialize fields */
+
+	rc = qedn_fetch_tcp_port(conn_ctx);
+	if (rc)
+		goto rel_conn;
+
+	set_bit(QEDN_CONN_RESRC_TCP_PORT, &conn_ctx->resrc_state);
+	dma_size = conn_ctx->sq_depth *
+			   sizeof(struct nvmetcp_host_cccid_itid_entry);
+	conn_ctx->host_cccid_itid =
+			dma_alloc_coherent(&qedn->pdev->dev,
+					   dma_size,
+					   &conn_ctx->host_cccid_itid_phy_addr,
+					   GFP_ATOMIC);
+	if (!conn_ctx->host_cccid_itid) {
+		pr_err("CCCID-iTID Map allocation failed\n");
+		goto rel_conn;
+	}
+
+	memset(conn_ctx->host_cccid_itid, 0xFF, dma_size);
+	set_bit(QEDN_CONN_RESRC_CCCID_ITID_MAP, &conn_ctx->resrc_state);
+	rc = qedn_set_con_state(conn_ctx, CONN_STATE_WAIT_FOR_CONNECT_DONE);
+	if (rc)
+		goto rel_conn;
+
+	rc = qedn_nvmetcp_offload_conn(conn_ctx);
+	if (rc) {
+		pr_err("Offload error: CID=0x%x\n", conn_ctx->fw_cid);
+		goto rel_conn;
+	}
+
+	return 0;
+
+rel_conn:
+	pr_err("qedn create queue ended with ERROR\n");
+	qedn_release_conn_ctx(conn_ctx);
+
+	return -EINVAL;
+}
+
+void qedn_destroy_connection(struct qedn_conn_ctx *conn_ctx)
+{
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	int rc;
+
+	qedn_decouple_conn(conn_ctx);
+
+	if (qedn_set_con_state(conn_ctx, CONN_STATE_WAIT_FOR_DESTROY_DONE))
+		return;
+
+	/* Placeholder - task cleanup */
+
+	rc = qed_ops->destroy_conn(qedn->cdev, conn_ctx->conn_handle,
+				   conn_ctx->abrt_flag);
+	if (rc)
+		pr_warn("destroy_conn failed - rc %d\n", rc);
+}
+
+void qedn_sp_wq_handler(struct work_struct *work)
+{
+	struct qedn_conn_ctx *conn_ctx;
+	struct qedn_ctx *qedn;
+	int rc;
+
+	conn_ctx = container_of(work, struct qedn_conn_ctx, sp_wq_entry);
+	qedn = conn_ctx->qedn;
+
+	if (conn_ctx->state == CONN_STATE_DESTROY_COMPLETE) {
+		pr_err("Connection already released!\n");
+
+		return;
+	}
+
+	if (conn_ctx->state == CONN_STATE_WAIT_FOR_DESTROY_DONE) {
+		qedn_release_conn_ctx(conn_ctx);
+
+		return;
+	}
+
+	qedn = conn_ctx->qedn;
+	if (test_bit(DESTROY_CONNECTION, &conn_ctx->agg_work_action)) {
+		qedn_destroy_connection(conn_ctx);
+
+		return;
+	}
+
+	if (test_bit(CREATE_CONNECTION, &conn_ctx->agg_work_action)) {
+		qedn_clr_sp_wa(conn_ctx, CREATE_CONNECTION);
+		rc = qedn_prep_and_offload_queue(conn_ctx);
+		if (rc) {
+			pr_err("Error in queue prepare & firmware offload\n");
+
+			return;
+		}
+	}
+}
+
+/* Clear connection aggregative slowpath work action */
+void qedn_clr_sp_wa(struct qedn_conn_ctx *conn_ctx, u32 bit)
+{
+	clear_bit(bit, &conn_ctx->agg_work_action);
+}
+
+/* Set connection aggregative slowpath work action */
+void qedn_set_sp_wa(struct qedn_conn_ctx *conn_ctx, u32 bit)
+{
+	set_bit(bit, &conn_ctx->agg_work_action);
+}
diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
index 0135a1f490da..70beb84b9793 100644
--- a/drivers/nvme/hw/qedn/qedn_main.c
+++ b/drivers/nvme/hw/qedn/qedn_main.c
@@ -23,6 +23,38 @@ static struct pci_device_id qedn_pci_tbl[] = {
 	{0, 0},
 };
 
+static bool qedn_matches_qede(struct qedn_ctx *qedn, struct pci_dev *qede_pdev)
+{
+	struct pci_dev *qedn_pdev = qedn->pdev;
+
+	return (qede_pdev->bus->number == qedn_pdev->bus->number &&
+		PCI_SLOT(qede_pdev->devfn) == PCI_SLOT(qedn_pdev->devfn) &&
+		PCI_FUNC(qede_pdev->devfn) == qedn->dev_info.port_id);
+}
+
+static struct qedn_ctx *qedn_get_pf_from_pdev(struct pci_dev *qede_pdev)
+{
+	struct list_head *pf_list = NULL;
+	struct qedn_ctx *qedn = NULL;
+	int rc;
+
+	pf_list = &qedn_glb.qedn_pf_list;
+	if (!pf_list) {
+		pr_err("Failed fetching pf list for nvmet_add_port\n");
+		rc = -EFAULT;
+		goto pf_list_err;
+	}
+
+	list_for_each_entry(qedn, pf_list, gl_pf_entry) {
+		if (qedn_matches_qede(qedn, qede_pdev))
+			return qedn;
+	}
+
+pf_list_err:
+
+	return NULL;
+}
+
 static int
 qedn_claim_dev(struct nvme_tcp_ofld_dev *dev,
 	       struct nvme_tcp_ofld_ctrl_con_params *conn_params)
@@ -70,22 +102,167 @@ qedn_claim_dev(struct nvme_tcp_ofld_dev *dev,
 	return true;
 }
 
-static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid,
-			     size_t q_size)
+static int qedn_setup_ctrl(struct nvme_tcp_ofld_ctrl *ctrl, bool new)
 {
-	/* Placeholder - qedn_create_queue */
+	struct nvme_tcp_ofld_dev *dev = ctrl->dev;
+	struct qedn_ctrl *qctrl = NULL;
+	struct qedn_ctx *qedn = NULL;
+	int rc = 0;
+
+	if (new) {
+		qctrl = kzalloc(sizeof(*qctrl), GFP_KERNEL);
+		if (!qctrl)
+			return -ENOMEM;
+
+		ctrl->private_data = (void *)qctrl;
+		set_bit(QEDN_CTRL_SET_TO_OFLD_CTRL, &qctrl->agg_state);
+
+		qctrl->sp_wq = alloc_workqueue(QEDN_SP_WORKQUEUE, WQ_MEM_RECLAIM,
+					       QEDN_SP_WORKQUEUE_MAX_ACTIVE);
+		if (!qctrl->sp_wq) {
+			rc = -ENODEV;
+			pr_err("Unable to create slowpath work queue!\n");
+			kfree(qctrl);
+
+			return rc;
+		}
+
+		set_bit(QEDN_STATE_SP_WORK_THREAD_SET, &qctrl->agg_state);
+	}
+
+	qedn = qedn_get_pf_from_pdev(dev->qede_pdev);
+	if (!qedn) {
+		pr_err("Failed locating QEDN for ip=%pIS\n",
+		       &ctrl->conn_params.local_ip_addr);
+		rc = -EFAULT;
+		goto err_out;
+	}
+
+	qctrl->qedn = qedn;
+
+	/* Placeholder - setup LLH filter */
+
+	return 0;
+
+err_out:
+	flush_workqueue(qctrl->sp_wq);
+	kfree(qctrl);
+
+	return rc;
+}
+
+static int qedn_release_ctrl(struct nvme_tcp_ofld_ctrl *ctrl)
+{
+	struct qedn_ctrl *qctrl = (struct qedn_ctrl *)ctrl->private_data;
+
+	if (test_and_clear_bit(QEDN_STATE_SP_WORK_THREAD_SET, &qctrl->agg_state))
+		flush_workqueue(qctrl->sp_wq);
+
+	if (test_and_clear_bit(QEDN_CTRL_SET_TO_OFLD_CTRL, &qctrl->agg_state)) {
+		qctrl->agg_state = 0;
+		kfree(qctrl);
+		ctrl->private_data = NULL;
+	}
+
+	kfree(ctrl);
+
+	return 0;
+}
+
+static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid, size_t q_size)
+{
+	struct nvme_tcp_ofld_ctrl *ctrl = queue->ctrl;
+	struct qedn_conn_ctx *conn_ctx;
+	struct qedn_ctrl *qctrl;
+	struct qedn_ctx *qedn;
+	int rc;
+
+	qctrl = (struct qedn_ctrl *)ctrl->private_data;
+	qedn = qctrl->qedn;
+
+	/* Allocate qedn connection context */
+	conn_ctx = kzalloc(sizeof(*conn_ctx), GFP_KERNEL);
+	if (!conn_ctx)
+		return -ENOMEM;
+
+	queue->private_data = conn_ctx;
+	conn_ctx->qedn = qedn;
+	conn_ctx->queue = queue;
+	conn_ctx->ctrl = ctrl;
+	conn_ctx->sq_depth = q_size;
+
+	init_waitqueue_head(&conn_ctx->conn_waitq);
+	atomic_set(&conn_ctx->est_conn_indicator, 0);
+	atomic_set(&conn_ctx->destroy_conn_indicator, 0);
+
+	spin_lock_init(&conn_ctx->conn_state_lock);
+
+	qedn_initialize_endpoint(&conn_ctx->ep, qedn->local_mac_addr,
+				 &ctrl->conn_params);
+
+	atomic_inc(&qctrl->host_num_active_conns);
+
+	qedn_set_sp_wa(conn_ctx, CREATE_CONNECTION);
+	qedn_set_con_state(conn_ctx, CONN_STATE_CREATE_CONNECTION);
+	INIT_WORK(&conn_ctx->sp_wq_entry, qedn_sp_wq_handler);
+	queue_work(qctrl->sp_wq, &conn_ctx->sp_wq_entry);
+
+	/* Wait for the connection establishment to complete - this includes the
+	 * FW TCP connection establishment and the NVMeTCP ICReq & ICResp
+	 */
+	rc = qedn_wait_for_conn_est(conn_ctx);
+	if (rc)
+		return -ENXIO;
 
 	return 0;
 }
 
 static void qedn_drain_queue(struct nvme_tcp_ofld_queue *queue)
 {
-	/* Placeholder - qedn_drain_queue */
+	/* No queue drain is required */
+}
+
+#define ATOMIC_READ_DESTROY_IND atomic_read(&conn_ctx->destroy_conn_indicator)
+#define TERMINATE_TIMEOUT msecs_to_jiffies(QEDN_RLS_CONS_TMO)
+static inline void
+qedn_queue_wait_for_terminate_complete(struct qedn_conn_ctx *conn_ctx)
+{
+	/* Returns valid non-0 */
+	int wrc, state;
+
+	wrc = wait_event_interruptible_timeout(conn_ctx->conn_waitq,
+					       ATOMIC_READ_DESTROY_IND > 0,
+					       TERMINATE_TIMEOUT);
+
+	atomic_set(&conn_ctx->destroy_conn_indicator, 0);
+
+	spin_lock_bh(&conn_ctx->conn_state_lock);
+	state = conn_ctx->state;
+	spin_unlock_bh(&conn_ctx->conn_state_lock);
+
+	if (!wrc || state != CONN_STATE_DESTROY_COMPLETE)
+		pr_warn("Timed out waiting for clear-SQ on FW conns\n");
 }
 
 static void qedn_destroy_queue(struct nvme_tcp_ofld_queue *queue)
 {
-	/* Placeholder - qedn_destroy_queue */
+	struct qedn_conn_ctx *conn_ctx;
+
+	if (!queue) {
+		pr_err("ctrl has no queues\n");
+
+		return;
+	}
+
+	conn_ctx = (struct qedn_conn_ctx *)queue->private_data;
+	if (!conn_ctx)
+		return;
+
+	qedn_terminate_connection(conn_ctx, QEDN_ABORTIVE_TERMINATION);
+
+	qedn_queue_wait_for_terminate_complete(conn_ctx);
+
+	kfree(conn_ctx);
 }
 
 static int qedn_poll_queue(struct nvme_tcp_ofld_queue *queue)
@@ -132,6 +309,8 @@ static struct nvme_tcp_ofld_ops qedn_ofld_ops = {
 		 *	NVMF_OPT_NR_POLL_QUEUES | NVMF_OPT_TOS
 		 */
 	.claim_dev = qedn_claim_dev,
+	.setup_ctrl = qedn_setup_ctrl,
+	.release_ctrl = qedn_release_ctrl,
 	.create_queue = qedn_create_queue,
 	.drain_queue = qedn_drain_queue,
 	.destroy_queue = qedn_destroy_queue,
@@ -141,6 +320,21 @@ static struct nvme_tcp_ofld_ops qedn_ofld_ops = {
 	.commit_rqs = qedn_commit_rqs,
 };
 
+struct qedn_conn_ctx *qedn_get_conn_hash(struct qedn_ctx *qedn, u16 icid)
+{
+	struct qedn_conn_ctx *conn = NULL;
+
+	hash_for_each_possible(qedn->conn_ctx_hash, conn, hash_node, icid) {
+		if (conn->conn_handle == icid)
+			break;
+	}
+
+	if (!conn || conn->conn_handle != icid)
+		return NULL;
+
+	return conn;
+}
+
 /* Fastpath IRQ handler */
 static irqreturn_t qedn_irq_handler(int irq, void *dev_id)
 {
@@ -241,7 +435,7 @@ static int qedn_setup_irq(struct qedn_ctx *qedn)
 
 static inline void qedn_init_pf_struct(struct qedn_ctx *qedn)
 {
-	/* Placeholder - Initialize qedn fields */
+	hash_init(qedn->conn_ctx_hash);
 }
 
 static inline void
@@ -607,7 +801,7 @@ static int __qedn_probe(struct pci_dev *pdev)
 	rc = qed_ops->start(qedn->cdev,
 			    NULL /* Placeholder for FW IO-path resources */,
 			    qedn,
-			    NULL /* Placeholder for FW Event callback */);
+			    qedn_event_cb);
 	if (rc) {
 		rc = -ENODEV;
 		pr_err("Cannot start NVMeTCP Function\n");
-- 
2.22.0



* [RFC PATCH v4 21/27] qedn: Add support of configuring HW filter block
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
                   ` (19 preceding siblings ...)
  2021-04-29 19:09 ` [RFC PATCH v4 20/27] qedn: Add connection-level slowpath functionality Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-05-02 11:38   ` Hannes Reinecke
  2021-04-29 19:09 ` [RFC PATCH v4 22/27] qedn: Add IO level nvme_req and fw_cq workqueues Shai Malin
                   ` (6 subsequent siblings)
  27 siblings, 1 reply; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net  --cc=Jakub
	Kicinski, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	malin1024

From: Prabhakar Kushwaha <pkushwaha@marvell.com>

The HW filter can be configured to filter TCP packets based on either
the source or the target TCP port. QEDN leverages this feature to route
NVMeTCP traffic.

This patch configures the HW filter block based on the source port so
that all received packets are delivered to the correct QEDN PF.

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
---
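Note for reviewers (not part of the commit): the filter bookkeeping in
qedn_add_llh_filter() and qedn_dec_llh_filter() is a reference-counted
list keyed by TCP port, capped at QEDN_MAX_LLH_PORTS entries. Here is a
user-space sketch of that scheme. The demo_* names are hypothetical, a
singly linked list stands in for the kernel list_head, and the comments
mark where the driver would call into qed to program the HW.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_PORTS 16	/* same value as QEDN_MAX_LLH_PORTS */

struct demo_filter {
	struct demo_filter *next;
	uint16_t port;
	uint16_t ref_cnt;
};

struct demo_pf {
	struct demo_filter *filters;
	unsigned int num_filters;
};

/* Take a reference on the filter for @port, creating it on first use. */
static struct demo_filter *demo_filter_get(struct demo_pf *pf, uint16_t port)
{
	struct demo_filter *f;

	for (f = pf->filters; f; f = f->next) {
		if (f->port == port) {
			f->ref_cnt++;
			return f;
		}
	}

	if (pf->num_filters >= DEMO_MAX_PORTS)
		return NULL;

	f = calloc(1, sizeof(*f));
	if (!f)
		return NULL;

	/* The driver would program the HW filter for this port here. */
	f->port = port;
	f->ref_cnt = 1;
	f->next = pf->filters;
	pf->filters = f;
	pf->num_filters++;

	return f;
}

/* Drop a reference; the last put unlinks and frees the filter. */
static void demo_filter_put(struct demo_pf *pf, struct demo_filter *f)
{
	struct demo_filter **pp;

	if (!f)
		return;

	assert(f->ref_cnt > 0);
	if (--f->ref_cnt)
		return;

	for (pp = &pf->filters; *pp; pp = &(*pp)->next) {
		if (*pp == f) {
			*pp = f->next;
			break;
		}
	}

	/* The driver would remove the HW filter for this port here. */
	pf->num_filters--;
	free(f);
}
```

Two controllers connecting to the same target port therefore share one
HW filter, and the filter is removed only when the last one releases it.
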
 drivers/nvme/hw/qedn/qedn.h      |  15 +++++
 drivers/nvme/hw/qedn/qedn_main.c | 108 ++++++++++++++++++++++++++++++-
 2 files changed, 122 insertions(+), 1 deletion(-)
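
Note for reviewers (not part of the commit): qedn_get_in_port() returns
the port in network byte order, and qedn_setup_ctrl() converts it with
ntohs() before handing it to the filter code. A small user-space sketch
of the same extraction, using the standard socket headers, is shown
below. The demo_* names are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Returns the port in network byte order, like qedn_get_in_port(). */
static in_port_t demo_get_in_port(const struct sockaddr_storage *sa)
{
	assert(sa->ss_family == AF_INET || sa->ss_family == AF_INET6);

	return sa->ss_family == AF_INET
		? ((const struct sockaddr_in *)sa)->sin_port
		: ((const struct sockaddr_in6 *)sa)->sin6_port;
}

/* Example helper: build an IPv4 sockaddr_storage for a host-order port. */
static struct sockaddr_storage demo_make_v4(uint16_t host_port)
{
	struct sockaddr_storage ss;
	struct sockaddr_in *sin = (struct sockaddr_in *)&ss;

	memset(&ss, 0, sizeof(ss));
	sin->sin_family = AF_INET;
	sin->sin_port = htons(host_port);

	return ss;
}
```

The caller is responsible for the ntohs() conversion, which keeps the
helper usable wherever a network-order value is wanted.
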

diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
index ed0d43163da2..c15cac37ec1e 100644
--- a/drivers/nvme/hw/qedn/qedn.h
+++ b/drivers/nvme/hw/qedn/qedn.h
@@ -38,6 +38,11 @@
 #define QEDN_IRQ_NAME_LEN 24
 #define QEDN_IRQ_NO_FLAGS 0
 
+/* HW defines */
+
+/* QEDN_MAX_LLH_PORTS will be extended in future */
+#define QEDN_MAX_LLH_PORTS 16
+
 /* Destroy connection defines */
 #define QEDN_NON_ABORTIVE_TERMINATION 0
 #define QEDN_ABORTIVE_TERMINATION 1
@@ -78,6 +83,7 @@ enum qedn_state {
 	QEDN_STATE_CORE_PROBED = 0,
 	QEDN_STATE_CORE_OPEN,
 	QEDN_STATE_GL_PF_LIST_ADDED,
+	QEDN_STATE_LLH_PORT_FILTER_SET,
 	QEDN_STATE_MFW_STATE,
 	QEDN_STATE_NVMETCP_OPEN,
 	QEDN_STATE_IRQ_SET,
@@ -112,6 +118,8 @@ struct qedn_ctx {
 	/* Accessed with atomic bit ops, used with enum qedn_state */
 	unsigned long state;
 
+	u8 num_llh_filters;
+	struct list_head llh_filter_list;
 	u8 local_mac_addr[ETH_ALEN];
 	u16 mtu;
 
@@ -177,6 +185,12 @@ enum qedn_conn_state {
 	CONN_STATE_DESTROY_COMPLETE
 };
 
+struct qedn_llh_filter {
+	struct list_head entry;
+	u16 port;
+	u16 ref_cnt;
+};
+
 struct qedn_ctrl {
 	struct list_head glb_entry;
 	struct list_head pf_entry;
@@ -265,5 +279,6 @@ int qedn_initialize_endpoint(struct qedn_endpoint *ep, u8 *local_mac_addr,
 int qedn_wait_for_conn_est(struct qedn_conn_ctx *conn_ctx);
 int qedn_set_con_state(struct qedn_conn_ctx *conn_ctx, enum qedn_conn_state new_state);
 void qedn_terminate_connection(struct qedn_conn_ctx *conn_ctx, int abrt_flag);
+__be16 qedn_get_in_port(struct sockaddr_storage *sa);
 
 #endif /* _QEDN_H_ */
diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
index 70beb84b9793..8b5714e7e2bb 100644
--- a/drivers/nvme/hw/qedn/qedn_main.c
+++ b/drivers/nvme/hw/qedn/qedn_main.c
@@ -23,6 +23,81 @@ static struct pci_device_id qedn_pci_tbl[] = {
 	{0, 0},
 };
 
+__be16 qedn_get_in_port(struct sockaddr_storage *sa)
+{
+	return sa->ss_family == AF_INET
+		? ((struct sockaddr_in *)sa)->sin_port
+		: ((struct sockaddr_in6 *)sa)->sin6_port;
+}
+
+struct qedn_llh_filter *qedn_add_llh_filter(struct qedn_ctx *qedn, u16 tcp_port)
+{
+	struct qedn_llh_filter *llh_filter = NULL;
+	struct qedn_llh_filter *llh_tmp = NULL;
+	bool new_filter = 1;
+	int rc = 0;
+
+	/* Check if LLH filter already defined */
+	list_for_each_entry_safe(llh_filter, llh_tmp, &qedn->llh_filter_list, entry) {
+		if (llh_filter->port == tcp_port) {
+			new_filter = 0;
+			llh_filter->ref_cnt++;
+			break;
+		}
+	}
+
+	if (new_filter) {
+		if (qedn->num_llh_filters >= QEDN_MAX_LLH_PORTS) {
+			pr_err("PF %u reached the max target ports limit %u\n",
+			       qedn->dev_info.common.abs_pf_id,
+			       qedn->num_llh_filters);
+
+			return NULL;
+		}
+
+		rc = qed_ops->add_src_tcp_port_filter(qedn->cdev, tcp_port);
+		if (rc) {
+			pr_err("LLH port configuration failed. port:%u; rc:%d\n", tcp_port, rc);
+
+			return NULL;
+		}
+
+		llh_filter = kzalloc(sizeof(*llh_filter), GFP_KERNEL);
+		if (!llh_filter) {
+			qed_ops->remove_src_tcp_port_filter(qedn->cdev, tcp_port);
+
+			return NULL;
+		}
+
+		llh_filter->port = tcp_port;
+		llh_filter->ref_cnt = 1;
+		++qedn->num_llh_filters;
+		list_add_tail(&llh_filter->entry, &qedn->llh_filter_list);
+		set_bit(QEDN_STATE_LLH_PORT_FILTER_SET, &qedn->state);
+	}
+
+	return llh_filter;
+}
+
+void qedn_dec_llh_filter(struct qedn_ctx *qedn, struct qedn_llh_filter *llh_filter)
+{
+	if (!llh_filter)
+		return;
+
+	llh_filter->ref_cnt--;
+	if (!llh_filter->ref_cnt) {
+		list_del(&llh_filter->entry);
+
+		/* Remove LLH protocol port filter */
+		qed_ops->remove_src_tcp_port_filter(qedn->cdev, llh_filter->port);
+
+		--qedn->num_llh_filters;
+		kfree(llh_filter);
+		if (!qedn->num_llh_filters)
+			clear_bit(QEDN_STATE_LLH_PORT_FILTER_SET, &qedn->state);
+	}
+}
+
 static bool qedn_matches_qede(struct qedn_ctx *qedn, struct pci_dev *qede_pdev)
 {
 	struct pci_dev *qedn_pdev = qedn->pdev;
@@ -105,8 +180,10 @@ qedn_claim_dev(struct nvme_tcp_ofld_dev *dev,
 static int qedn_setup_ctrl(struct nvme_tcp_ofld_ctrl *ctrl, bool new)
 {
 	struct nvme_tcp_ofld_dev *dev = ctrl->dev;
+	struct qedn_llh_filter *llh_filter = NULL;
 	struct qedn_ctrl *qctrl = NULL;
 	struct qedn_ctx *qedn = NULL;
+	__be16 remote_port;
 	int rc = 0;
 
 	if (new) {
@@ -140,7 +217,22 @@ static int qedn_setup_ctrl(struct nvme_tcp_ofld_ctrl *ctrl, bool new)
 
 	qctrl->qedn = qedn;
 
-	/* Placeholder - setup LLH filter */
+	if (qedn->num_llh_filters == 0) {
+		qedn->mtu = dev->ndev->mtu;
+		memcpy(qedn->local_mac_addr, dev->ndev->dev_addr, ETH_ALEN);
+	}
+
+	remote_port = qedn_get_in_port(&ctrl->conn_params.remote_ip_addr);
+	if (new) {
+		llh_filter = qedn_add_llh_filter(qedn, ntohs(remote_port));
+		if (!llh_filter) {
+			rc = -EFAULT;
+			goto err_out;
+		}
+
+		qctrl->llh_filter = llh_filter;
+		set_bit(LLH_FILTER, &qctrl->agg_state);
+	}
 
 	return 0;
 
@@ -155,6 +247,12 @@ static int qedn_release_ctrl(struct nvme_tcp_ofld_ctrl *ctrl)
 {
 	struct qedn_ctrl *qctrl = (struct qedn_ctrl *)ctrl->private_data;
 
+	if (test_and_clear_bit(LLH_FILTER, &qctrl->agg_state) &&
+	    qctrl->llh_filter) {
+		qedn_dec_llh_filter(qctrl->qedn, qctrl->llh_filter);
+		qctrl->llh_filter = NULL;
+	}
+
 	if (test_and_clear_bit(QEDN_STATE_SP_WORK_THREAD_SET, &qctrl->agg_state))
 		flush_workqueue(qctrl->sp_wq);
 
@@ -435,6 +533,8 @@ static int qedn_setup_irq(struct qedn_ctx *qedn)
 
 static inline void qedn_init_pf_struct(struct qedn_ctx *qedn)
 {
+	INIT_LIST_HEAD(&qedn->llh_filter_list);
+	qedn->num_llh_filters = 0;
 	hash_init(qedn->conn_ctx_hash);
 }
 
@@ -694,6 +794,12 @@ static void __qedn_remove(struct pci_dev *pdev)
 		return;
 	}
 
+	if (test_and_clear_bit(QEDN_STATE_LLH_PORT_FILTER_SET, &qedn->state)) {
+		pr_err("LLH port configuration removal. %d filters still set\n",
+		       qedn->num_llh_filters);
+		qed_ops->clear_all_filters(qedn->cdev);
+	}
+
 	if (test_and_clear_bit(QEDN_STATE_REGISTERED_OFFLOAD_DEV, &qedn->state))
 		nvme_tcp_ofld_unregister_dev(&qedn->qedn_ofld_dev);
 
-- 
2.22.0



* [RFC PATCH v4 22/27] qedn: Add IO level nvme_req and fw_cq workqueues
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
                   ` (20 preceding siblings ...)
  2021-04-29 19:09 ` [RFC PATCH v4 21/27] qedn: Add support of configuring HW filter block Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-05-02 11:42   ` Hannes Reinecke
  2021-04-29 19:09 ` [RFC PATCH v4 23/27] qedn: Add support of Task and SGL Shai Malin
                   ` (5 subsequent siblings)
  27 siblings, 1 reply; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net  --cc=Jakub
	Kicinski, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	malin1024

This patch introduces the IO level workqueues:

- qedn_nvme_req_fp_wq(): processes new requests, similar to
			 nvme_tcp_io_work(). The flow starts from
			 send_req() and aggregates all the requests
			 on this CPU core.

- qedn_fw_cq_fp_wq():   processes new FW completions. The flow starts
			from the IRQ handler, and for a single interrupt
			it processes all the pending NVMeoF completions
			in polling mode.
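
For illustration only, the "drain everything pending for one interrupt"
behaviour of the FW completion flow can be sketched in plain user-space C.
This is not the driver code: the sketch_* names, the ring size and the
payload type are hypothetical, and the memory barriers, status block
handling and workqueue scheduling of the real handler are left out.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CQ_SIZE 8u	/* hypothetical ring size (power of two) */

/* Hypothetical completion ring shared by a producer and a consumer. */
struct sketch_cq {
	uint32_t entries[SKETCH_CQ_SIZE];
	uint16_t prod;	/* advanced by the (simulated) firmware */
	uint16_t cons;	/* advanced by the host handler */
};

/* Simulated firmware side: post one completion. */
static void sketch_cq_post(struct sketch_cq *cq, uint32_t val)
{
	/* The ring must not be full. */
	assert((uint16_t)(cq->prod - cq->cons) < SKETCH_CQ_SIZE);

	cq->entries[cq->prod % SKETCH_CQ_SIZE] = val;
	cq->prod++;
}

/*
 * Host side: consume every pending completion and add its payload to
 * *sum. Returns the number of completions handled.
 */
static unsigned int sketch_cq_drain(struct sketch_cq *cq, uint64_t *sum)
{
	unsigned int handled = 0;
	uint16_t prod = cq->prod;	/* snapshot of the producer index */

	while (cq->cons != prod) {
		*sum += cq->entries[cq->cons % SKETCH_CQ_SIZE];
		cq->cons++;
		handled++;

		/* Caught up: re-read the producer to catch late arrivals. */
		if (cq->cons == prod)
			prod = cq->prod;
	}

	return handled;
}
```

Re-reading the producer index only once the consumer has caught up keeps
the common case cheap while still picking up completions that were posted
during processing.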

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
---
 drivers/nvme/hw/qedn/Makefile    |   2 +-
 drivers/nvme/hw/qedn/qedn.h      |  29 +++++++
 drivers/nvme/hw/qedn/qedn_conn.c |   3 +
 drivers/nvme/hw/qedn/qedn_main.c | 114 +++++++++++++++++++++++--
 drivers/nvme/hw/qedn/qedn_task.c | 138 +++++++++++++++++++++++++++++++
 5 files changed, 278 insertions(+), 8 deletions(-)
 create mode 100644 drivers/nvme/hw/qedn/qedn_task.c

diff --git a/drivers/nvme/hw/qedn/Makefile b/drivers/nvme/hw/qedn/Makefile
index d8b343afcd16..c7d838a61ae6 100644
--- a/drivers/nvme/hw/qedn/Makefile
+++ b/drivers/nvme/hw/qedn/Makefile
@@ -1,4 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
 obj-$(CONFIG_NVME_QEDN) += qedn.o
-qedn-y := qedn_main.o qedn_conn.o
+qedn-y := qedn_main.o qedn_conn.o qedn_task.o
\ No newline at end of file
diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
index c15cac37ec1e..bd9a250cb2f5 100644
--- a/drivers/nvme/hw/qedn/qedn.h
+++ b/drivers/nvme/hw/qedn/qedn.h
@@ -47,6 +47,9 @@
 #define QEDN_NON_ABORTIVE_TERMINATION 0
 #define QEDN_ABORTIVE_TERMINATION 1
 
+#define QEDN_FW_CQ_FP_WQ_WORKQUEUE "qedn_fw_cq_fp_wq"
+#define QEDN_NVME_REQ_FP_WQ_WORKQUEUE "qedn_nvme_req_fp_wq"
+
 /*
  * TCP offload stack default configurations and defines.
  * Future enhancements will allow controlling the configurable
@@ -100,6 +103,7 @@ struct qedn_fp_queue {
 	struct qedn_ctx	*qedn;
 	struct qed_sb_info *sb_info;
 	unsigned int cpu;
+	struct work_struct fw_cq_fp_wq_entry;
 	u16 sb_id;
 	char irqname[QEDN_IRQ_NAME_LEN];
 };
@@ -131,6 +135,8 @@ struct qedn_ctx {
 	struct qedn_fp_queue *fp_q_arr;
 	struct nvmetcp_glbl_queue_entry *fw_cq_array_virt;
 	dma_addr_t fw_cq_array_phy; /* Physical address of fw_cq_array_virt */
+	struct workqueue_struct *nvme_req_fp_wq;
+	struct workqueue_struct *fw_cq_fp_wq;
 };
 
 struct qedn_endpoint {
@@ -213,6 +219,25 @@ struct qedn_ctrl {
 
 /* Connection level struct */
 struct qedn_conn_ctx {
+	/* IO path */
+	struct workqueue_struct	*nvme_req_fp_wq; /* ptr to qedn->nvme_req_fp_wq */
+	struct nvme_tcp_ofld_req *req; /* currently processed request */
+
+	struct list_head host_pend_req_list;
+	/* Spinlock to access pending request list */
+	spinlock_t nvme_req_lock;
+	unsigned int cpu;
+
+	/* Entry for registering to nvme_req_fp_wq */
+	struct work_struct nvme_req_fp_wq_entry;
+	/*
+	 * Mutex for accessing qedn_process_req as it can be called
+	 * from multiple places: queue_rq, async, self requeued
+	 */
+	struct mutex nvme_req_mutex;
+	struct qedn_fp_queue *fp_q;
+	int qid;
+
 	struct qedn_ctx *qedn;
 	struct nvme_tcp_ofld_queue *queue;
 	struct nvme_tcp_ofld_ctrl *ctrl;
@@ -280,5 +305,9 @@ int qedn_wait_for_conn_est(struct qedn_conn_ctx *conn_ctx);
 int qedn_set_con_state(struct qedn_conn_ctx *conn_ctx, enum qedn_conn_state new_state);
 void qedn_terminate_connection(struct qedn_conn_ctx *conn_ctx, int abrt_flag);
 __be16 qedn_get_in_port(struct sockaddr_storage *sa);
+inline int qedn_validate_cccid_in_range(struct qedn_conn_ctx *conn_ctx, u16 cccid);
+void qedn_queue_request(struct qedn_conn_ctx *qedn_conn, struct nvme_tcp_ofld_req *req);
+void qedn_nvme_req_fp_wq_handler(struct work_struct *work);
+void qedn_io_work_cq(struct qedn_ctx *qedn, struct nvmetcp_fw_cqe *cqe);
 
 #endif /* _QEDN_H_ */
diff --git a/drivers/nvme/hw/qedn/qedn_conn.c b/drivers/nvme/hw/qedn/qedn_conn.c
index 9bfc0a5f0cdb..90d8aa36d219 100644
--- a/drivers/nvme/hw/qedn/qedn_conn.c
+++ b/drivers/nvme/hw/qedn/qedn_conn.c
@@ -385,6 +385,9 @@ static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)
 	}
 
 	set_bit(QEDN_CONN_RESRC_FW_SQ, &conn_ctx->resrc_state);
+	INIT_LIST_HEAD(&conn_ctx->host_pend_req_list);
+	spin_lock_init(&conn_ctx->nvme_req_lock);
+
 	rc = qed_ops->acquire_conn(qedn->cdev,
 				   &conn_ctx->conn_handle,
 				   &conn_ctx->fw_cid,
diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
index 8b5714e7e2bb..38f23dbb03a5 100644
--- a/drivers/nvme/hw/qedn/qedn_main.c
+++ b/drivers/nvme/hw/qedn/qedn_main.c
@@ -267,6 +267,18 @@ static int qedn_release_ctrl(struct nvme_tcp_ofld_ctrl *ctrl)
 	return 0;
 }
 
+static void qedn_set_ctrl_io_cpus(struct qedn_conn_ctx *conn_ctx, int qid)
+{
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	struct qedn_fp_queue *fp_q = NULL;
+	int index;
+
+	index = qid ? (qid - 1) % qedn->num_fw_cqs : 0;
+	fp_q = &qedn->fp_q_arr[index];
+
+	conn_ctx->cpu = fp_q->cpu;
+}
+
 static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid, size_t q_size)
 {
 	struct nvme_tcp_ofld_ctrl *ctrl = queue->ctrl;
@@ -288,6 +300,7 @@ static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid, size_t
 	conn_ctx->queue = queue;
 	conn_ctx->ctrl = ctrl;
 	conn_ctx->sq_depth = q_size;
+	qedn_set_ctrl_io_cpus(conn_ctx, qid);
 
 	init_waitqueue_head(&conn_ctx->conn_waitq);
 	atomic_set(&conn_ctx->est_conn_indicator, 0);
@@ -295,6 +308,10 @@ static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid, size_t
 
 	spin_lock_init(&conn_ctx->conn_state_lock);
 
+	INIT_WORK(&conn_ctx->nvme_req_fp_wq_entry, qedn_nvme_req_fp_wq_handler);
+	conn_ctx->nvme_req_fp_wq = qedn->nvme_req_fp_wq;
+	conn_ctx->qid = qid;
+
 	qedn_initialize_endpoint(&conn_ctx->ep, qedn->local_mac_addr,
 				 &ctrl->conn_params);
 
@@ -356,6 +373,7 @@ static void qedn_destroy_queue(struct nvme_tcp_ofld_queue *queue)
 	if (!conn_ctx)
 		return;
 
+	cancel_work_sync(&conn_ctx->nvme_req_fp_wq_entry);
 	qedn_terminate_connection(conn_ctx, QEDN_ABORTIVE_TERMINATION);
 
 	qedn_queue_wait_for_terminate_complete(conn_ctx);
@@ -385,12 +403,24 @@ static int qedn_init_req(struct nvme_tcp_ofld_req *req)
 
 static void qedn_commit_rqs(struct nvme_tcp_ofld_queue *queue)
 {
-	/* Placeholder - queue work */
+	struct qedn_conn_ctx *conn_ctx;
+
+	conn_ctx = (struct qedn_conn_ctx *)queue->private_data;
+
+	if (!list_empty(&conn_ctx->host_pend_req_list))
+		queue_work_on(conn_ctx->cpu, conn_ctx->nvme_req_fp_wq,
+			      &conn_ctx->nvme_req_fp_wq_entry);
 }
 
 static int qedn_send_req(struct nvme_tcp_ofld_req *req)
 {
-	/* Placeholder - qedn_send_req */
+	struct qedn_conn_ctx *qedn_conn = (struct qedn_conn_ctx *)req->queue->private_data;
+
+	/* Under the assumption that the cccid/tag will be in the range of 0 to sq_depth-1. */
+	if (!req->async && qedn_validate_cccid_in_range(qedn_conn, req->rq->tag))
+		return BLK_STS_NOTSUPP;
+
+	qedn_queue_request(qedn_conn, req);
 
 	return 0;
 }
@@ -434,9 +464,59 @@ struct qedn_conn_ctx *qedn_get_conn_hash(struct qedn_ctx *qedn, u16 icid)
 }
 
 /* Fastpath IRQ handler */
+void qedn_fw_cq_fp_handler(struct qedn_fp_queue *fp_q)
+{
+	u16 sb_id, cq_prod_idx, cq_cons_idx;
+	struct qedn_ctx *qedn = fp_q->qedn;
+	struct nvmetcp_fw_cqe *cqe = NULL;
+
+	sb_id = fp_q->sb_id;
+	qed_sb_update_sb_idx(fp_q->sb_info);
+
+	/* rmb - to prevent missing new cqes */
+	rmb();
+
+	/* Read the latest cq_prod from the SB */
+	cq_prod_idx = *fp_q->cq_prod;
+	cq_cons_idx = qed_chain_get_cons_idx(&fp_q->cq_chain);
+
+	while (cq_cons_idx != cq_prod_idx) {
+		cqe = qed_chain_consume(&fp_q->cq_chain);
+		if (likely(cqe))
+			qedn_io_work_cq(qedn, cqe);
+		else
+			pr_err("Failed consuming cqe\n");
+
+		cq_cons_idx = qed_chain_get_cons_idx(&fp_q->cq_chain);
+
+		/* Check if new completions were posted */
+		if (unlikely(cq_prod_idx == cq_cons_idx)) {
+			/* rmb - to prevent missing new cqes */
+			rmb();
+
+			/* Update the latest cq_prod from the SB */
+			cq_prod_idx = *fp_q->cq_prod;
+		}
+	}
+}
+
+static void qedn_fw_cq_fq_wq_handler(struct work_struct *work)
+{
+	struct qedn_fp_queue *fp_q = container_of(work, struct qedn_fp_queue, fw_cq_fp_wq_entry);
+
+	qedn_fw_cq_fp_handler(fp_q);
+	qed_sb_ack(fp_q->sb_info, IGU_INT_ENABLE, 1);
+}
+
 static irqreturn_t qedn_irq_handler(int irq, void *dev_id)
 {
-	/* Placeholder */
+	struct qedn_fp_queue *fp_q = dev_id;
+	struct qedn_ctx *qedn = fp_q->qedn;
+
+	fp_q->cpu = smp_processor_id();
+
+	qed_sb_ack(fp_q->sb_info, IGU_INT_DISABLE, 0);
+	queue_work_on(fp_q->cpu, qedn->fw_cq_fp_wq, &fp_q->fw_cq_fp_wq_entry);
 
 	return IRQ_HANDLED;
 }
@@ -584,6 +664,11 @@ static void qedn_free_function_queues(struct qedn_ctx *qedn)
 	int i;
 
 	/* Free workqueues */
+	destroy_workqueue(qedn->fw_cq_fp_wq);
+	qedn->fw_cq_fp_wq = NULL;
+
+	destroy_workqueue(qedn->nvme_req_fp_wq);
+	qedn->nvme_req_fp_wq = NULL;
 
 	/* Free the fast path queues*/
 	for (i = 0; i < qedn->num_fw_cqs; i++) {
@@ -651,7 +736,23 @@ static int qedn_alloc_function_queues(struct qedn_ctx *qedn)
 	u64 cq_phy_addr;
 	int i;
 
-	/* Place holder - IO-path workqueues */
+	qedn->fw_cq_fp_wq = alloc_workqueue(QEDN_FW_CQ_FP_WQ_WORKQUEUE,
+					    WQ_HIGHPRI | WQ_MEM_RECLAIM, 0);
+	if (!qedn->fw_cq_fp_wq) {
+		rc = -ENODEV;
+		pr_err("Unable to create fastpath FW CQ workqueue!\n");
+
+		return rc;
+	}
+
+	qedn->nvme_req_fp_wq = alloc_workqueue(QEDN_NVME_REQ_FP_WQ_WORKQUEUE,
+					       WQ_HIGHPRI | WQ_MEM_RECLAIM, 1);
+	if (!qedn->nvme_req_fp_wq) {
+		rc = -ENODEV;
+		pr_err("Unable to create fastpath qedn nvme workqueue!\n");
+
+		return rc;
+	}
 
 	qedn->fp_q_arr = kcalloc(qedn->num_fw_cqs,
 				 sizeof(struct qedn_fp_queue), GFP_KERNEL);
@@ -679,7 +780,7 @@ static int qedn_alloc_function_queues(struct qedn_ctx *qedn)
 		chain_params.mode = QED_CHAIN_MODE_PBL,
 		chain_params.cnt_type = QED_CHAIN_CNT_TYPE_U16,
 		chain_params.num_elems = QEDN_FW_CQ_SIZE;
-		chain_params.elem_size = 64; /*Placeholder - sizeof(struct nvmetcp_fw_cqe)*/
+		chain_params.elem_size = sizeof(struct nvmetcp_fw_cqe);
 
 		rc = qed_ops->common->chain_alloc(qedn->cdev,
 						  &fp_q->cq_chain,
@@ -708,8 +809,7 @@ static int qedn_alloc_function_queues(struct qedn_ctx *qedn)
 		sb = fp_q->sb_info->sb_virt;
 		fp_q->cq_prod = (u16 *)&sb->pi_array[QEDN_PROTO_CQ_PROD_IDX];
 		fp_q->qedn = qedn;
-
-		/* Placeholder - Init IO-path workqueue */
+		INIT_WORK(&fp_q->fw_cq_fp_wq_entry, qedn_fw_cq_fq_wq_handler);
 
 		/* Placeholder - Init IO-path resources */
 	}
diff --git a/drivers/nvme/hw/qedn/qedn_task.c b/drivers/nvme/hw/qedn/qedn_task.c
new file mode 100644
index 000000000000..d3474188efdc
--- /dev/null
+++ b/drivers/nvme/hw/qedn/qedn_task.c
@@ -0,0 +1,138 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2021 Marvell. All rights reserved.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+/* Kernel includes */
+#include <linux/kernel.h>
+
+/* Driver includes */
+#include "qedn.h"
+
+inline int qedn_validate_cccid_in_range(struct qedn_conn_ctx *conn_ctx, u16 cccid)
+{
+	int rc = 0;
+
+	if (unlikely(cccid >= conn_ctx->sq_depth)) {
+		pr_err("cccid 0x%x out of range (>= sq depth)\n", cccid);
+		rc = -EINVAL;
+	}
+
+	return rc;
+}
+
+static bool qedn_process_req(struct qedn_conn_ctx *qedn_conn)
+{
+	return true;
+}
+
+/* The WQ handler can be called from 3 flows:
+ *	1. queue_rq.
+ *	2. async.
+ *	3. self requeued
+ * Try to send requests from the pending list. If processing a request has
+ * failed, re-register to the workqueue.
+ * If there are no additional pending requests - exit the handler.
+ */
+void qedn_nvme_req_fp_wq_handler(struct work_struct *work)
+{
+	struct qedn_conn_ctx *qedn_conn;
+	bool more = false;
+
+	qedn_conn = container_of(work, struct qedn_conn_ctx, nvme_req_fp_wq_entry);
+	do {
+		if (mutex_trylock(&qedn_conn->nvme_req_mutex)) {
+			more = qedn_process_req(qedn_conn);
+			qedn_conn->req = NULL;
+			mutex_unlock(&qedn_conn->nvme_req_mutex);
+		}
+	} while (more);
+
+	if (!list_empty(&qedn_conn->host_pend_req_list))
+		queue_work_on(qedn_conn->cpu, qedn_conn->nvme_req_fp_wq,
+			      &qedn_conn->nvme_req_fp_wq_entry);
+}
+
+void qedn_queue_request(struct qedn_conn_ctx *qedn_conn, struct nvme_tcp_ofld_req *req)
+{
+	bool empty, res = false;
+
+	spin_lock(&qedn_conn->nvme_req_lock);
+	empty = list_empty(&qedn_conn->host_pend_req_list) && !qedn_conn->req;
+	list_add_tail(&req->queue_entry, &qedn_conn->host_pend_req_list);
+	spin_unlock(&qedn_conn->nvme_req_lock);
+
+	/* attempt workqueue bypass */
+	if (qedn_conn->cpu == smp_processor_id() && empty &&
+	    mutex_trylock(&qedn_conn->nvme_req_mutex)) {
+		res = qedn_process_req(qedn_conn);
+		qedn_conn->req = NULL;
+		mutex_unlock(&qedn_conn->nvme_req_mutex);
+		if (res || list_empty(&qedn_conn->host_pend_req_list))
+			return;
+	} else if (req->last) {
+		queue_work_on(qedn_conn->cpu, qedn_conn->nvme_req_fp_wq,
+			      &qedn_conn->nvme_req_fp_wq_entry);
+	}
+}
+
+struct qedn_task_ctx *qedn_cqe_get_active_task(struct nvmetcp_fw_cqe *cqe)
+{
+	struct regpair *p = &cqe->task_opaque;
+
+	return (struct qedn_task_ctx *)((((u64)(le32_to_cpu(p->hi)) << 32)
+					+ le32_to_cpu(p->lo)));
+}
+
+void qedn_io_work_cq(struct qedn_ctx *qedn, struct nvmetcp_fw_cqe *cqe)
+{
+	struct qedn_task_ctx *qedn_task = NULL;
+	struct qedn_conn_ctx *conn_ctx = NULL;
+	u16 itid;
+	u32 cid;
+
+	conn_ctx = qedn_get_conn_hash(qedn, le16_to_cpu(cqe->conn_id));
+	if (unlikely(!conn_ctx)) {
+		pr_err("CID 0x%x: Failed to fetch conn_ctx from hash\n",
+		       le16_to_cpu(cqe->conn_id));
+
+		return;
+	}
+
+	cid = conn_ctx->fw_cid;
+	itid = le16_to_cpu(cqe->itid);
+	qedn_task = qedn_cqe_get_active_task(cqe);
+	if (unlikely(!qedn_task))
+		return;
+
+	if (likely(cqe->cqe_type == NVMETCP_FW_CQE_TYPE_NORMAL)) {
+		/* Placeholder - verify the connection was established */
+
+		switch (cqe->task_type) {
+		case NVMETCP_TASK_TYPE_HOST_WRITE:
+		case NVMETCP_TASK_TYPE_HOST_READ:
+
+			/* Placeholder - IO flow */
+
+			break;
+
+		case NVMETCP_TASK_TYPE_HOST_READ_NO_CQE:
+
+			/* Placeholder - IO flow */
+
+			break;
+
+		case NVMETCP_TASK_TYPE_INIT_CONN_REQUEST:
+
+			/* Placeholder - ICReq flow */
+
+			break;
+		default:
+			pr_info("Could not identify task type\n");
+		}
+	} else {
+		/* Placeholder - Recovery flows */
+	}
+}
-- 
2.22.0



* [RFC PATCH v4 23/27] qedn: Add support of Task and SGL
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
                   ` (21 preceding siblings ...)
  2021-04-29 19:09 ` [RFC PATCH v4 22/27] qedn: Add IO level nvme_req and fw_cq workqueues Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-05-02 11:48   ` Hannes Reinecke
  2021-04-29 19:09 ` [RFC PATCH v4 24/27] qedn: Add support of NVME ICReq & ICResp Shai Malin
                   ` (4 subsequent siblings)
  27 siblings, 1 reply; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net  --cc=Jakub
	Kicinski, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	malin1024

From: Prabhakar Kushwaha <pkushwaha@marvell.com>

This patch adds support for Task and SGL, which are used for
slowpath and fastpath IO. Here, a Task is the IO granule used
by the firmware to perform tasks.

The internal implementation:
- Create task/sgl resources used by all connections.
- Provide APIs to allocate and free tasks.
- Add task support during connection establishment, i.e. slowpath.
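
As a rough illustration of the allocate/free API described above, a task
pool backed by a free list might look like the user-space sketch below.
It is not the driver implementation: the sketch_* names and the pool size
are hypothetical, the free list is a simple singly linked stack rather
than a struct list_head, and the locking that the driver performs around
its task_free_list is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_POOL_TASKS 4	/* hypothetical number of tasks per pool */

/* Hypothetical task descriptor, identified by its task id (itid). */
struct sketch_task {
	struct sketch_task *next;	/* link in the free list */
	unsigned short itid;
	int in_use;
};

struct sketch_pool {
	struct sketch_task tasks[SKETCH_POOL_TASKS];
	struct sketch_task *free_head;
	unsigned int num_free;
};

/* Put every task on the free list; tasks[0] ends up at the head. */
static void sketch_pool_init(struct sketch_pool *pool)
{
	int i;

	pool->free_head = NULL;
	pool->num_free = 0;

	for (i = SKETCH_POOL_TASKS - 1; i >= 0; i--) {
		pool->tasks[i].itid = (unsigned short)i;
		pool->tasks[i].in_use = 0;
		pool->tasks[i].next = pool->free_head;
		pool->free_head = &pool->tasks[i];
		pool->num_free++;
	}
}

/* Take a task from the free list; returns NULL when the pool is empty. */
static struct sketch_task *sketch_task_get(struct sketch_pool *pool)
{
	struct sketch_task *task = pool->free_head;

	if (!task)
		return NULL;

	pool->free_head = task->next;
	task->next = NULL;
	task->in_use = 1;
	pool->num_free--;

	return task;
}

/* Return a previously acquired task to the free list. */
static void sketch_task_put(struct sketch_pool *pool, struct sketch_task *task)
{
	assert(task->in_use);

	task->in_use = 0;
	task->next = pool->free_head;
	pool->free_head = task;
	pool->num_free++;
}
```

A caller that receives NULL from sketch_task_get() has to wait or retry,
which is the situation an "insist" style allocation with a timeout is
meant to cover.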

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
---
 drivers/nvme/hw/qedn/qedn.h      |  66 +++++
 drivers/nvme/hw/qedn/qedn_conn.c |  43 +++-
 drivers/nvme/hw/qedn/qedn_main.c |  34 ++-
 drivers/nvme/hw/qedn/qedn_task.c | 411 +++++++++++++++++++++++++++++++
 4 files changed, 550 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
index bd9a250cb2f5..880ca245b02c 100644
--- a/drivers/nvme/hw/qedn/qedn.h
+++ b/drivers/nvme/hw/qedn/qedn.h
@@ -50,6 +50,21 @@
 #define QEDN_FW_CQ_FP_WQ_WORKQUEUE "qedn_fw_cq_fp_wq"
 #define QEDN_NVME_REQ_FP_WQ_WORKQUEUE "qedn_nvme_req_fp_wq"
 
+/* Protocol defines */
+#define QEDN_MAX_IO_SIZE QED_NVMETCP_MAX_IO_SIZE
+
+#define QEDN_SGE_BUFF_SIZE 4096
+#define QEDN_MAX_SGES_PER_TASK DIV_ROUND_UP(QEDN_MAX_IO_SIZE, QEDN_SGE_BUFF_SIZE)
+#define QEDN_FW_SGE_SIZE sizeof(struct nvmetcp_sge)
+#define QEDN_MAX_FW_SGL_SIZE ((QEDN_MAX_SGES_PER_TASK) * QEDN_FW_SGE_SIZE)
+#define QEDN_FW_SLOW_IO_MIN_SGE_LIMIT (9700 / 6)
+
+#define QEDN_MAX_HW_SECTORS (QEDN_MAX_IO_SIZE / 512)
+#define QEDN_MAX_SEGMENTS QEDN_MAX_SGES_PER_TASK
+
+#define QEDN_TASK_INSIST_TMO 1000 /* 1 sec */
+#define QEDN_INVALID_ITID 0xFFFF
+
 /*
  * TCP offload stack default configurations and defines.
  * Future enhancements will allow controlling the configurable
@@ -95,6 +110,15 @@ enum qedn_state {
 	QEDN_STATE_MODULE_REMOVE_ONGOING,
 };
 
+struct qedn_io_resources {
+	/* Lock for IO resources */
+	spinlock_t resources_lock;
+	struct list_head task_free_list;
+	u32 num_alloc_tasks;
+	u32 num_free_tasks;
+	u32 no_avail_resrc_cnt;
+};
+
 /* Per CPU core params */
 struct qedn_fp_queue {
 	struct qed_chain cq_chain;
@@ -104,6 +128,10 @@ struct qedn_fp_queue {
 	struct qed_sb_info *sb_info;
 	unsigned int cpu;
 	struct work_struct fw_cq_fp_wq_entry;
+
+	/* IO related resources for host */
+	struct qedn_io_resources host_resrc;
+
 	u16 sb_id;
 	char irqname[QEDN_IRQ_NAME_LEN];
 };
@@ -130,6 +158,8 @@ struct qedn_ctx {
 	/* Connections */
 	DECLARE_HASHTABLE(conn_ctx_hash, 16);
 
+	u32 num_tasks_per_pool;
+
 	/* Fast path queues */
 	u8 num_fw_cqs;
 	struct qedn_fp_queue *fp_q_arr;
@@ -137,6 +167,27 @@ struct qedn_ctx {
 	dma_addr_t fw_cq_array_phy; /* Physical address of fw_cq_array_virt */
 	struct workqueue_struct *nvme_req_fp_wq;
 	struct workqueue_struct *fw_cq_fp_wq;
+
+	/* Fast Path Tasks */
+	struct qed_nvmetcp_tid	tasks;
+};
+
+struct qedn_task_ctx {
+	struct qedn_conn_ctx *qedn_conn;
+	struct qedn_ctx *qedn;
+	void *fw_task_ctx;
+	struct qedn_fp_queue *fp_q;
+	struct scatterlist *nvme_sg;
+	struct nvme_tcp_ofld_req *req; /* currently processed request */
+	struct list_head entry;
+	spinlock_t lock; /* To protect task resources */
+	bool valid;
+	unsigned long flags; /* Used by qedn_task_flags */
+	u32 task_size;
+	u16 itid;
+	u16 cccid;
+	int req_direction;
+	struct storage_sgl_task_params sgl_task_params;
 };
 
 struct qedn_endpoint {
@@ -243,6 +294,7 @@ struct qedn_conn_ctx {
 	struct nvme_tcp_ofld_ctrl *ctrl;
 	u32 conn_handle;
 	u32 fw_cid;
+	u8 default_cq;
 
 	atomic_t est_conn_indicator;
 	atomic_t destroy_conn_indicator;
@@ -260,6 +312,11 @@ struct qedn_conn_ctx {
 	dma_addr_t host_cccid_itid_phy_addr;
 	struct qedn_endpoint ep;
 	int abrt_flag;
+	/* Spinlock for accessing active_task_list */
+	spinlock_t task_list_lock;
+	struct list_head active_task_list;
+	atomic_t num_active_tasks;
+	atomic_t num_active_fw_tasks;
 
 	/* Connection resources - turned on to indicate what resource was
 	 * allocated, to that it can later be released.
@@ -279,6 +336,7 @@ struct qedn_conn_ctx {
 enum qedn_conn_resources_state {
 	QEDN_CONN_RESRC_FW_SQ,
 	QEDN_CONN_RESRC_ACQUIRE_CONN,
+	QEDN_CONN_RESRC_TASKS,
 	QEDN_CONN_RESRC_CCCID_ITID_MAP,
 	QEDN_CONN_RESRC_TCP_PORT,
 	QEDN_CONN_RESRC_MAX = 64
@@ -309,5 +367,13 @@ inline int qedn_validate_cccid_in_range(struct qedn_conn_ctx *conn_ctx, u16 ccci
 void qedn_queue_request(struct qedn_conn_ctx *qedn_conn, struct nvme_tcp_ofld_req *req);
 void qedn_nvme_req_fp_wq_handler(struct work_struct *work);
 void qedn_io_work_cq(struct qedn_ctx *qedn, struct nvmetcp_fw_cqe *cqe);
+int qedn_alloc_tasks(struct qedn_conn_ctx *conn_ctx);
+inline int qedn_qid(struct nvme_tcp_ofld_queue *queue);
+struct qedn_task_ctx *
+	qedn_get_task_from_pool_insist(struct qedn_conn_ctx *conn_ctx, u16 cccid);
+void qedn_common_clear_fw_sgl(struct storage_sgl_task_params *sgl_task_params);
+void qedn_return_active_tasks(struct qedn_conn_ctx *conn_ctx);
+void qedn_destroy_free_tasks(struct qedn_fp_queue *fp_q,
+			     struct qedn_io_resources *io_resrc);
 
 #endif /* _QEDN_H_ */
diff --git a/drivers/nvme/hw/qedn/qedn_conn.c b/drivers/nvme/hw/qedn/qedn_conn.c
index 90d8aa36d219..10a80fbeac43 100644
--- a/drivers/nvme/hw/qedn/qedn_conn.c
+++ b/drivers/nvme/hw/qedn/qedn_conn.c
@@ -29,6 +29,11 @@ static const char * const qedn_conn_state_str[] = {
 	NULL
 };
 
+inline int qedn_qid(struct nvme_tcp_ofld_queue *queue)
+{
+	return queue - queue->ctrl->queues;
+}
+
 int qedn_set_con_state(struct qedn_conn_ctx *conn_ctx, enum qedn_conn_state new_state)
 {
 	spin_lock_bh(&conn_ctx->conn_state_lock);
@@ -146,6 +151,11 @@ static void qedn_release_conn_ctx(struct qedn_conn_ctx *conn_ctx)
 		clear_bit(QEDN_CONN_RESRC_ACQUIRE_CONN, &conn_ctx->resrc_state);
 	}
 
+	if (test_bit(QEDN_CONN_RESRC_TASKS, &conn_ctx->resrc_state)) {
+		clear_bit(QEDN_CONN_RESRC_TASKS, &conn_ctx->resrc_state);
+		qedn_return_active_tasks(conn_ctx);
+	}
+
 	if (test_bit(QEDN_CONN_RESRC_CCCID_ITID_MAP, &conn_ctx->resrc_state)) {
 		dma_free_coherent(&qedn->pdev->dev,
 				  conn_ctx->sq_depth *
@@ -247,6 +257,7 @@ static int qedn_nvmetcp_offload_conn(struct qedn_conn_ctx *conn_ctx)
 	offld_prms.max_rt_time = QEDN_TCP_MAX_RT_TIME;
 	offld_prms.sq_pbl_addr =
 		(u64)qed_chain_get_pbl_phys(&qedn_ep->fw_sq_chain);
+	offld_prms.default_cq = conn_ctx->default_cq;
 
 	rc = qed_ops->offload_conn(qedn->cdev,
 				   conn_ctx->conn_handle,
@@ -375,6 +386,9 @@ int qedn_event_cb(void *context, u8 fw_event_code, void *event_ring_data)
 static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)
 {
 	struct qedn_ctx *qedn = conn_ctx->qedn;
+	struct qedn_io_resources *io_resrc;
+	struct qedn_fp_queue *fp_q;
+	u8 default_cq_idx, qid;
 	size_t dma_size;
 	int rc;
 
@@ -387,6 +401,8 @@ static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)
 	set_bit(QEDN_CONN_RESRC_FW_SQ, &conn_ctx->resrc_state);
 	INIT_LIST_HEAD(&conn_ctx->host_pend_req_list);
 	spin_lock_init(&conn_ctx->nvme_req_lock);
+	atomic_set(&conn_ctx->num_active_tasks, 0);
+	atomic_set(&conn_ctx->num_active_fw_tasks, 0);
 
 	rc = qed_ops->acquire_conn(qedn->cdev,
 				   &conn_ctx->conn_handle,
@@ -401,7 +417,32 @@ static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)
 		 conn_ctx->conn_handle);
 	set_bit(QEDN_CONN_RESRC_ACQUIRE_CONN, &conn_ctx->resrc_state);
 
-	/* Placeholder - Allocate task resources and initialize fields */
+	qid = qedn_qid(conn_ctx->queue);
+	default_cq_idx = qid ? qid - 1 : 0; /* Offset adminq */
+
+	conn_ctx->default_cq = (default_cq_idx % qedn->num_fw_cqs);
+	fp_q = &qedn->fp_q_arr[conn_ctx->default_cq];
+	conn_ctx->fp_q = fp_q;
+	io_resrc = &fp_q->host_resrc;
+
+	/* The first connection on each fp_q will fill task
+	 * resources
+	 */
+	spin_lock(&io_resrc->resources_lock);
+	if (io_resrc->num_alloc_tasks == 0) {
+		rc = qedn_alloc_tasks(conn_ctx);
+		if (rc) {
+			pr_err("Failed allocating tasks: CID=0x%x\n",
+			       conn_ctx->fw_cid);
+			spin_unlock(&io_resrc->resources_lock);
+			goto rel_conn;
+		}
+	}
+	spin_unlock(&io_resrc->resources_lock);
+
+	spin_lock_init(&conn_ctx->task_list_lock);
+	INIT_LIST_HEAD(&conn_ctx->active_task_list);
+	set_bit(QEDN_CONN_RESRC_TASKS, &conn_ctx->resrc_state);
 
 	rc = qedn_fetch_tcp_port(conn_ctx);
 	if (rc)
diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
index 38f23dbb03a5..8d9c19d63480 100644
--- a/drivers/nvme/hw/qedn/qedn_main.c
+++ b/drivers/nvme/hw/qedn/qedn_main.c
@@ -30,6 +30,12 @@ __be16 qedn_get_in_port(struct sockaddr_storage *sa)
 		: ((struct sockaddr_in6 *)sa)->sin6_port;
 }
 
+static void qedn_init_io_resc(struct qedn_io_resources *io_resrc)
+{
+	spin_lock_init(&io_resrc->resources_lock);
+	INIT_LIST_HEAD(&io_resrc->task_free_list);
+}
+
 struct qedn_llh_filter *qedn_add_llh_filter(struct qedn_ctx *qedn, u16 tcp_port)
 {
 	struct qedn_llh_filter *llh_filter = NULL;
@@ -436,6 +442,8 @@ static struct nvme_tcp_ofld_ops qedn_ofld_ops = {
 		 *	NVMF_OPT_HDR_DIGEST | NVMF_OPT_DATA_DIGEST |
 		 *	NVMF_OPT_NR_POLL_QUEUES | NVMF_OPT_TOS
 		 */
+	.max_hw_sectors = QEDN_MAX_HW_SECTORS,
+	.max_segments = QEDN_MAX_SEGMENTS,
 	.claim_dev = qedn_claim_dev,
 	.setup_ctrl = qedn_setup_ctrl,
 	.release_ctrl = qedn_release_ctrl,
@@ -657,8 +665,24 @@ static void qedn_remove_pf_from_gl_list(struct qedn_ctx *qedn)
 	mutex_unlock(&qedn_glb.glb_mutex);
 }
 
+static void qedn_call_destroy_free_tasks(struct qedn_fp_queue *fp_q,
+					 struct qedn_io_resources *io_resrc)
+{
+	if (list_empty(&io_resrc->task_free_list))
+		return;
+
+	if (io_resrc->num_alloc_tasks != io_resrc->num_free_tasks)
+		pr_err("Task Pool: Not all tasks returned, allocated=0x%x, free=0x%x\n",
+		       io_resrc->num_alloc_tasks, io_resrc->num_free_tasks);
+
+	qedn_destroy_free_tasks(fp_q, io_resrc);
+	if (io_resrc->num_free_tasks)
+		pr_err("Expected num_free_tasks to be 0\n");
+}
+
 static void qedn_free_function_queues(struct qedn_ctx *qedn)
 {
+	struct qedn_io_resources *host_resrc;
 	struct qed_sb_info *sb_info = NULL;
 	struct qedn_fp_queue *fp_q;
 	int i;
@@ -673,6 +697,9 @@ static void qedn_free_function_queues(struct qedn_ctx *qedn)
 	/* Free the fast path queues*/
 	for (i = 0; i < qedn->num_fw_cqs; i++) {
 		fp_q = &qedn->fp_q_arr[i];
+		host_resrc = &fp_q->host_resrc;
+
+		qedn_call_destroy_free_tasks(fp_q, host_resrc);
 
 		/* Free SB */
 		sb_info = fp_q->sb_info;
@@ -769,7 +796,8 @@ static int qedn_alloc_function_queues(struct qedn_ctx *qedn)
 		goto mem_alloc_failure;
 	}
 
-	/* placeholder - create task pools */
+	qedn->num_tasks_per_pool =
+		qedn->pf_params.nvmetcp_pf_params.num_tasks / qedn->num_fw_cqs;
 
 	for (i = 0; i < qedn->num_fw_cqs; i++) {
 		fp_q = &qedn->fp_q_arr[i];
@@ -811,7 +839,7 @@ static int qedn_alloc_function_queues(struct qedn_ctx *qedn)
 		fp_q->qedn = qedn;
 		INIT_WORK(&fp_q->fw_cq_fp_wq_entry, qedn_fw_cq_fq_wq_handler);
 
-		/* Placeholder - Init IO-path resources */
+		qedn_init_io_resc(&fp_q->host_resrc);
 	}
 
 	return 0;
@@ -1005,7 +1033,7 @@ static int __qedn_probe(struct pci_dev *pdev)
 
 	/* NVMeTCP start HW PF */
 	rc = qed_ops->start(qedn->cdev,
-			    NULL /* Placeholder for FW IO-path resources */,
+			    &qedn->tasks,
 			    qedn,
 			    qedn_event_cb);
 	if (rc) {
diff --git a/drivers/nvme/hw/qedn/qedn_task.c b/drivers/nvme/hw/qedn/qedn_task.c
index d3474188efdc..54f2f4cba6ea 100644
--- a/drivers/nvme/hw/qedn/qedn_task.c
+++ b/drivers/nvme/hw/qedn/qedn_task.c
@@ -11,6 +11,263 @@
 /* Driver includes */
 #include "qedn.h"
 
+static bool qedn_sgl_has_small_mid_sge(struct nvmetcp_sge *sgl, u16 sge_count)
+{
+	u16 sge_num;
+
+	if (sge_count > 8) {
+		for (sge_num = 0; sge_num < sge_count; sge_num++) {
+			if (le32_to_cpu(sgl[sge_num].sge_len) <
+			    QEDN_FW_SLOW_IO_MIN_SGE_LIMIT)
+				return true; /* small middle SGE found */
+		}
+	}
+
+	return false; /* no small middle SGEs */
+}
+
+static int qedn_init_sgl(struct qedn_ctx *qedn, struct qedn_task_ctx *qedn_task)
+{
+	struct storage_sgl_task_params *sgl_task_params;
+	enum dma_data_direction dma_dir;
+	struct scatterlist *sg;
+	struct request *rq;
+	u16 num_sges;
+	int index;
+	int rc;
+
+	sgl_task_params = &qedn_task->sgl_task_params;
+	rq = blk_mq_rq_from_pdu(qedn_task->req);
+	if (qedn_task->task_size == 0) {
+		sgl_task_params->num_sges = 0;
+
+		return 0;
+	}
+
+	/* Convert BIO to scatterlist */
+	num_sges = blk_rq_map_sg(rq->q, rq, qedn_task->nvme_sg);
+	if (qedn_task->req_direction == WRITE)
+		dma_dir = DMA_TO_DEVICE;
+	else
+		dma_dir = DMA_FROM_DEVICE;
+
+	/* DMA map the scatterlist */
+	if (dma_map_sg(&qedn->pdev->dev, qedn_task->nvme_sg, num_sges, dma_dir) != num_sges) {
+		pr_err("Couldn't map sgl\n");
+		rc = -EPERM;
+
+		return rc;
+	}
+
+	sgl_task_params->total_buffer_size = qedn_task->task_size;
+	sgl_task_params->num_sges = num_sges;
+
+	for_each_sg(qedn_task->nvme_sg, sg, num_sges, index) {
+		DMA_REGPAIR_LE(sgl_task_params->sgl[index].sge_addr, sg_dma_address(sg));
+		sgl_task_params->sgl[index].sge_len = cpu_to_le32(sg_dma_len(sg));
+	}
+
+	/* Relevant for Host Write Only */
+	sgl_task_params->small_mid_sge = (qedn_task->req_direction == READ) ?
+		false :
+		qedn_sgl_has_small_mid_sge(sgl_task_params->sgl,
+					   sgl_task_params->num_sges);
+
+	return 0;
+}
+
+static void qedn_free_nvme_sg(struct qedn_task_ctx *qedn_task)
+{
+	kfree(qedn_task->nvme_sg);
+	qedn_task->nvme_sg = NULL;
+}
+
+static void qedn_free_fw_sgl(struct qedn_task_ctx *qedn_task)
+{
+	struct qedn_ctx *qedn = qedn_task->qedn;
+	dma_addr_t sgl_pa;
+
+	sgl_pa = HILO_DMA_REGPAIR(qedn_task->sgl_task_params.sgl_phys_addr);
+	dma_free_coherent(&qedn->pdev->dev,
+			  QEDN_MAX_FW_SGL_SIZE,
+			  qedn_task->sgl_task_params.sgl,
+			  sgl_pa);
+	qedn_task->sgl_task_params.sgl = NULL;
+}
+
+static void qedn_destroy_single_task(struct qedn_task_ctx *qedn_task)
+{
+	u16 itid;
+
+	itid = qedn_task->itid;
+	list_del(&qedn_task->entry);
+	qedn_free_nvme_sg(qedn_task);
+	qedn_free_fw_sgl(qedn_task);
+	kfree(qedn_task);
+	qedn_task = NULL;
+}
+
+void qedn_destroy_free_tasks(struct qedn_fp_queue *fp_q,
+			     struct qedn_io_resources *io_resrc)
+{
+	struct qedn_task_ctx *qedn_task, *task_tmp;
+
+	/* Destroy tasks from the free task list */
+	list_for_each_entry_safe(qedn_task, task_tmp,
+				 &io_resrc->task_free_list, entry) {
+		qedn_destroy_single_task(qedn_task);
+		io_resrc->num_free_tasks -= 1;
+	}
+}
+
+static int qedn_alloc_nvme_sg(struct qedn_task_ctx *qedn_task)
+{
+	int rc;
+
+	qedn_task->nvme_sg = kcalloc(QEDN_MAX_SGES_PER_TASK,
+				     sizeof(*qedn_task->nvme_sg), GFP_KERNEL);
+	if (!qedn_task->nvme_sg) {
+		rc = -ENOMEM;
+
+		return rc;
+	}
+
+	return 0;
+}
+
+static int qedn_alloc_fw_sgl(struct qedn_task_ctx *qedn_task)
+{
+	struct qedn_ctx *qedn = qedn_task->qedn_conn->qedn;
+	dma_addr_t fw_sgl_phys;
+
+	qedn_task->sgl_task_params.sgl =
+		dma_alloc_coherent(&qedn->pdev->dev, QEDN_MAX_FW_SGL_SIZE,
+				   &fw_sgl_phys, GFP_KERNEL);
+	if (!qedn_task->sgl_task_params.sgl) {
+		pr_err("Couldn't allocate FW sgl\n");
+
+		return -ENOMEM;
+	}
+
+	DMA_REGPAIR_LE(qedn_task->sgl_task_params.sgl_phys_addr, fw_sgl_phys);
+
+	return 0;
+}
+
+static inline void *qedn_get_fw_task(struct qed_nvmetcp_tid *info, u16 itid)
+{
+	return (void *)(info->blocks[itid / info->num_tids_per_block] +
+			(itid % info->num_tids_per_block) * info->size);
+}
+
+static struct qedn_task_ctx *qedn_alloc_task(struct qedn_conn_ctx *conn_ctx, u16 itid)
+{
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	struct qedn_task_ctx *qedn_task;
+	void *fw_task_ctx;
+	int rc = 0;
+
+	qedn_task = kzalloc(sizeof(*qedn_task), GFP_KERNEL);
+	if (!qedn_task)
+		return NULL;
+
+	spin_lock_init(&qedn_task->lock);
+	fw_task_ctx = qedn_get_fw_task(&qedn->tasks, itid);
+	if (!fw_task_ctx) {
+		pr_err("iTID: 0x%x; Failed getting fw_task_ctx memory\n", itid);
+		goto release_task;
+	}
+
+	/* No need to memset fw_task_ctx - it's done in the HSI func */
+	qedn_task->qedn_conn = conn_ctx;
+	qedn_task->qedn = qedn;
+	qedn_task->fw_task_ctx = fw_task_ctx;
+	qedn_task->valid = 0;
+	qedn_task->flags = 0;
+	qedn_task->itid = itid;
+	rc = qedn_alloc_fw_sgl(qedn_task);
+	if (rc) {
+		pr_err("iTID: 0x%x; Failed allocating FW sgl\n", itid);
+		goto release_task;
+	}
+
+	rc = qedn_alloc_nvme_sg(qedn_task);
+	if (rc) {
+		pr_err("iTID: 0x%x; Failed allocating NVMe sg\n", itid);
+		goto release_fw_sgl;
+	}
+
+	return qedn_task;
+
+release_fw_sgl:
+	qedn_free_fw_sgl(qedn_task);
+release_task:
+	kfree(qedn_task);
+
+	return NULL;
+}
+
+int qedn_alloc_tasks(struct qedn_conn_ctx *conn_ctx)
+{
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	struct qedn_task_ctx *qedn_task = NULL;
+	struct qedn_io_resources *io_resrc;
+	u16 itid, start_itid, offset;
+	struct qedn_fp_queue *fp_q;
+	int i, rc;
+
+	fp_q = conn_ctx->fp_q;
+
+	offset = fp_q->sb_id;
+	io_resrc = &fp_q->host_resrc;
+
+	start_itid = qedn->num_tasks_per_pool * offset;
+	for (i = 0; i < qedn->num_tasks_per_pool; ++i) {
+		itid = start_itid + i;
+		qedn_task = qedn_alloc_task(conn_ctx, itid);
+		if (!qedn_task) {
+			pr_err("Failed allocating task\n");
+			rc = -ENOMEM;
+			goto release_tasks;
+		}
+
+		qedn_task->fp_q = fp_q;
+		io_resrc->num_free_tasks += 1;
+		list_add_tail(&qedn_task->entry, &io_resrc->task_free_list);
+	}
+
+	io_resrc->num_alloc_tasks = io_resrc->num_free_tasks;
+
+	return 0;
+
+release_tasks:
+	qedn_destroy_free_tasks(fp_q, io_resrc);
+
+	return rc;
+}
+
+void qedn_common_clear_fw_sgl(struct storage_sgl_task_params *sgl_task_params)
+{
+	u16 sge_cnt = sgl_task_params->num_sges;
+
+	memset(&sgl_task_params->sgl[(sge_cnt - 1)], 0,
+	       sizeof(struct nvmetcp_sge));
+	sgl_task_params->total_buffer_size = 0;
+	sgl_task_params->small_mid_sge = false;
+	sgl_task_params->num_sges = 0;
+}
+
+inline void qedn_host_reset_cccid_itid_entry(struct qedn_conn_ctx *conn_ctx,
+					     u16 cccid)
+{
+	conn_ctx->host_cccid_itid[cccid].itid = cpu_to_le16(QEDN_INVALID_ITID);
+}
+
+inline void qedn_host_set_cccid_itid_entry(struct qedn_conn_ctx *conn_ctx, u16 cccid, u16 itid)
+{
+	conn_ctx->host_cccid_itid[cccid].itid = cpu_to_le16(itid);
+}
+
 inline int qedn_validate_cccid_in_range(struct qedn_conn_ctx *conn_ctx, u16 cccid)
 {
 	int rc = 0;
@@ -23,6 +280,160 @@ inline int qedn_validate_cccid_in_range(struct qedn_conn_ctx *conn_ctx, u16 ccci
 	return rc;
 }
 
+static void qedn_clear_sgl(struct qedn_ctx *qedn,
+			   struct qedn_task_ctx *qedn_task)
+{
+	struct storage_sgl_task_params *sgl_task_params;
+	enum dma_data_direction dma_dir;
+	u32 sge_cnt;
+
+	sgl_task_params = &qedn_task->sgl_task_params;
+	sge_cnt = sgl_task_params->num_sges;
+
+	/* Nothing to do if no SGEs were used */
+	if (!qedn_task->task_size || !sge_cnt)
+		return;
+
+	dma_dir = (qedn_task->req_direction == WRITE ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
+	dma_unmap_sg(&qedn->pdev->dev, qedn_task->nvme_sg, sge_cnt, dma_dir);
+	memset(&qedn_task->nvme_sg[(sge_cnt - 1)], 0, sizeof(struct scatterlist));
+	qedn_common_clear_fw_sgl(sgl_task_params);
+	qedn_task->task_size = 0;
+}
+
+static void qedn_clear_task(struct qedn_conn_ctx *conn_ctx,
+			    struct qedn_task_ctx *qedn_task)
+{
+	/* Task lock isn't needed since it is no longer in use */
+	qedn_clear_sgl(conn_ctx->qedn, qedn_task);
+	qedn_task->valid = 0;
+	qedn_task->flags = 0;
+
+	atomic_dec(&conn_ctx->num_active_tasks);
+}
+
+void qedn_return_active_tasks(struct qedn_conn_ctx *conn_ctx)
+{
+	struct qedn_fp_queue *fp_q = conn_ctx->fp_q;
+	struct qedn_task_ctx *qedn_task, *task_tmp;
+	struct qedn_io_resources *io_resrc;
+	int num_returned_tasks = 0;
+	int num_active_tasks;
+
+	io_resrc = &fp_q->host_resrc;
+
+	/* Return tasks that aren't "Used by FW" to the pool */
+	list_for_each_entry_safe(qedn_task, task_tmp,
+				 &conn_ctx->active_task_list, entry) {
+		qedn_clear_task(conn_ctx, qedn_task);
+		num_returned_tasks++;
+	}
+
+	if (num_returned_tasks) {
+		spin_lock(&io_resrc->resources_lock);
+		/* Return tasks to FP_Q pool in one shot */
+
+		list_splice_tail_init(&conn_ctx->active_task_list,
+				      &io_resrc->task_free_list);
+		io_resrc->num_free_tasks += num_returned_tasks;
+		spin_unlock(&io_resrc->resources_lock);
+	}
+
+	num_active_tasks = atomic_read(&conn_ctx->num_active_tasks);
+	if (num_active_tasks)
+		pr_err("num_active_tasks is %d after cleanup.\n", num_active_tasks);
+}
+
+void qedn_return_task_to_pool(struct qedn_conn_ctx *conn_ctx,
+			      struct qedn_task_ctx *qedn_task)
+{
+	struct qedn_fp_queue *fp_q = conn_ctx->fp_q;
+	struct qedn_io_resources *io_resrc;
+	unsigned long lock_flags;
+
+	io_resrc = &fp_q->host_resrc;
+
+	spin_lock_irqsave(&qedn_task->lock, lock_flags);
+	qedn_task->valid = 0;
+	qedn_task->flags = 0;
+	qedn_clear_sgl(conn_ctx->qedn, qedn_task);
+	spin_unlock_irqrestore(&qedn_task->lock, lock_flags);
+
+	spin_lock(&conn_ctx->task_list_lock);
+	list_del(&qedn_task->entry);
+	qedn_host_reset_cccid_itid_entry(conn_ctx, qedn_task->cccid);
+	spin_unlock(&conn_ctx->task_list_lock);
+
+	atomic_dec(&conn_ctx->num_active_tasks);
+	atomic_dec(&conn_ctx->num_active_fw_tasks);
+
+	spin_lock(&io_resrc->resources_lock);
+	list_add_tail(&qedn_task->entry, &io_resrc->task_free_list);
+	io_resrc->num_free_tasks += 1;
+	spin_unlock(&io_resrc->resources_lock);
+}
+
+struct qedn_task_ctx *
+qedn_get_free_task_from_pool(struct qedn_conn_ctx *conn_ctx, u16 cccid)
+{
+	struct qedn_task_ctx *qedn_task = NULL;
+	struct qedn_io_resources *io_resrc;
+	struct qedn_fp_queue *fp_q;
+
+	fp_q = conn_ctx->fp_q;
+	io_resrc = &fp_q->host_resrc;
+
+	spin_lock(&io_resrc->resources_lock);
+	qedn_task = list_first_entry_or_null(&io_resrc->task_free_list,
+					     struct qedn_task_ctx, entry);
+	if (unlikely(!qedn_task)) {
+		spin_unlock(&io_resrc->resources_lock);
+
+		return NULL;
+	}
+	list_del(&qedn_task->entry);
+	io_resrc->num_free_tasks -= 1;
+	spin_unlock(&io_resrc->resources_lock);
+
+	spin_lock(&conn_ctx->task_list_lock);
+	list_add_tail(&qedn_task->entry, &conn_ctx->active_task_list);
+	qedn_host_set_cccid_itid_entry(conn_ctx, cccid, qedn_task->itid);
+	spin_unlock(&conn_ctx->task_list_lock);
+
+	atomic_inc(&conn_ctx->num_active_tasks);
+	qedn_task->cccid = cccid;
+	qedn_task->qedn_conn = conn_ctx;
+	qedn_task->valid = 1;
+
+	return qedn_task;
+}
+
+struct qedn_task_ctx *
+qedn_get_task_from_pool_insist(struct qedn_conn_ctx *conn_ctx, u16 cccid)
+{
+	struct qedn_task_ctx *qedn_task = NULL;
+	unsigned long timeout;
+
+	qedn_task = qedn_get_free_task_from_pool(conn_ctx, cccid);
+	if (unlikely(!qedn_task)) {
+		timeout = msecs_to_jiffies(QEDN_TASK_INSIST_TMO) + jiffies;
+		while (1) {
+			qedn_task = qedn_get_free_task_from_pool(conn_ctx, cccid);
+			if (likely(qedn_task))
+				break;
+
+			msleep(100);
+			if (time_after(jiffies, timeout)) {
+				pr_err("Failed on timeout of fetching task\n");
+
+				return NULL;
+			}
+		}
+	}
+
+	return qedn_task;
+}
+
 static bool qedn_process_req(struct qedn_conn_ctx *qedn_conn)
 {
 	return true;
-- 
2.22.0



* [RFC PATCH v4 24/27] qedn: Add support of NVME ICReq & ICResp
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
                   ` (22 preceding siblings ...)
  2021-04-29 19:09 ` [RFC PATCH v4 23/27] qedn: Add support of Task and SGL Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-05-02 11:53   ` Hannes Reinecke
  2021-04-29 19:09 ` [RFC PATCH v4 25/27] qedn: Add IO level fastpath functionality Shai Malin
                   ` (3 subsequent siblings)
  27 siblings, 1 reply; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net  --cc=Jakub
	Kicinski, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	malin1024

From: Prabhakar Kushwaha <pkushwaha@marvell.com>

Once a TCP connection is established, the host sends an Initialize
Connection Request (ICReq) PDU to the controller.
The host then processes the Initialize Connection Response (ICResp) PDU
received from the controller to establish the connection and exchange
connection configuration parameters.

This patch adds support for generating the ICReq and processing the
ICResp. It also updates the host configuration based on the exchanged
parameters.
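
The ICResp checks described above can be illustrated with a small
user-space sketch. It is not the driver code: struct sketch_params is a
hypothetical, simplified stand-in for struct qedn_negotiation_params,
sketch_handle_icresp() is an invented name, and wire-format byte
swapping is left out. Unlike qedn_handle_icresp() in this patch, which
only logs an unexpected digest bit, the sketch rejects it; that is an
assumption made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_PDU_SIZE 0x80000u /* mirrors QEDN_MAX_PDU_SIZE (512KB) */

/* Hypothetical, simplified stand-in for struct qedn_negotiation_params */
struct sketch_params {
	uint32_t maxh2cdata;
	uint16_t pfv;
	uint8_t cpda;
	bool hdr_digest;
	bool data_digest;
};

/*
 * Validate the controller's response against what the host requested.
 * Returns 0 and fills *out on success, -1 if the response is rejected.
 */
static int sketch_handle_icresp(const struct sketch_params *req,
				const struct sketch_params *resp,
				struct sketch_params *out)
{
	/* The PDU format version must match exactly */
	if (resp->pfv != req->pfv)
		return -1;

	/* The controller must not demand more data alignment than allowed */
	if (resp->cpda > req->cpda)
		return -1;

	/* A digest may only be enabled if the host asked for it */
	if (resp->hdr_digest && !req->hdr_digest)
		return -1;
	if (resp->data_digest && !req->data_digest)
		return -1;

	*out = *resp;

	/* Clamp the controller's MAXH2CDATA to the local limit */
	if (out->maxh2cdata > SKETCH_MAX_PDU_SIZE)
		out->maxh2cdata = SKETCH_MAX_PDU_SIZE;

	return 0;
}
```

A caller would fill req from the host defaults (as qedn_set_pdu_params()
does for required_params), fill resp from the received ICResp, and use
out as the agreed connection parameters.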

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
---
 drivers/nvme/hw/qedn/qedn.h      |  36 ++++
 drivers/nvme/hw/qedn/qedn_conn.c | 317 ++++++++++++++++++++++++++++++-
 drivers/nvme/hw/qedn/qedn_main.c |  22 +++
 drivers/nvme/hw/qedn/qedn_task.c |   8 +-
 4 files changed, 379 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
index 880ca245b02c..773a57994148 100644
--- a/drivers/nvme/hw/qedn/qedn.h
+++ b/drivers/nvme/hw/qedn/qedn.h
@@ -16,6 +16,7 @@
 
 /* Driver includes */
 #include "../../host/tcp-offload.h"
+#include <linux/nvme-tcp.h>
 
 #define QEDN_MAJOR_VERSION		8
 #define QEDN_MINOR_VERSION		62
@@ -52,6 +53,8 @@
 
 /* Protocol defines */
 #define QEDN_MAX_IO_SIZE QED_NVMETCP_MAX_IO_SIZE
+#define QEDN_MAX_PDU_SIZE 0x80000 /* 512KB */
+#define QEDN_MAX_OUTSTANDING_R2T_PDUS 0 /* 0 Based == 1 max R2T */
 
 #define QEDN_SGE_BUFF_SIZE 4096
 #define QEDN_MAX_SGES_PER_TASK DIV_ROUND_UP(QEDN_MAX_IO_SIZE, QEDN_SGE_BUFF_SIZE)
@@ -65,6 +68,11 @@
 #define QEDN_TASK_INSIST_TMO 1000 /* 1 sec */
 #define QEDN_INVALID_ITID 0xFFFF
 
+#define QEDN_ICREQ_FW_PAYLOAD (sizeof(struct nvme_tcp_icreq_pdu) - \
+			       sizeof(struct nvmetcp_init_conn_req_hdr))
+/* The FW will handle the ICReq as CCCID 0 (FW internal design) */
+#define QEDN_ICREQ_CCCID 0
+
 /*
  * TCP offload stack default configurations and defines.
  * Future enhancements will allow controlling the configurable
@@ -136,6 +144,16 @@ struct qedn_fp_queue {
 	char irqname[QEDN_IRQ_NAME_LEN];
 };
 
+struct qedn_negotiation_params {
+	u32 maxh2cdata; /* Negotiation */
+	u32 maxr2t; /* Validation */
+	u16 pfv; /* Validation */
+	bool hdr_digest; /* Negotiation */
+	bool data_digest; /* Negotiation */
+	u8 cpda; /* Negotiation */
+	u8 hpda; /* Validation */
+};
+
 struct qedn_ctx {
 	struct pci_dev *pdev;
 	struct qed_dev *cdev;
@@ -195,6 +213,9 @@ struct qedn_endpoint {
 	struct qed_chain fw_sq_chain;
 	void __iomem *p_doorbell;
 
+	/* Spinlock for accessing FW queue */
+	spinlock_t doorbell_lock;
+
 	/* TCP Params */
 	__be32 dst_addr[4]; /* In network order */
 	__be32 src_addr[4]; /* In network order */
@@ -268,6 +289,12 @@ struct qedn_ctrl {
 	atomic_t host_num_active_conns;
 };
 
+struct qedn_icreq_padding {
+	u32 *buffer;
+	dma_addr_t pa;
+	struct nvmetcp_sge sge;
+};
+
 /* Connection level struct */
 struct qedn_conn_ctx {
 	/* IO path */
@@ -329,6 +356,11 @@ struct qedn_conn_ctx {
 
 	size_t sq_depth;
 
+	struct qedn_negotiation_params required_params;
+	struct qedn_negotiation_params pdu_params;
+	struct nvmetcp_icresp_hdr_psh icresp;
+	struct qedn_icreq_padding *icreq_pad;
+
 	/* "dummy" socket */
 	struct socket *sock;
 };
@@ -337,6 +369,7 @@ enum qedn_conn_resources_state {
 	QEDN_CONN_RESRC_FW_SQ,
 	QEDN_CONN_RESRC_ACQUIRE_CONN,
 	QEDN_CONN_RESRC_TASKS,
+	QEDN_CONN_RESRC_ICREQ_PAD,
 	QEDN_CONN_RESRC_CCCID_ITID_MAP,
 	QEDN_CONN_RESRC_TCP_PORT,
 	QEDN_CONN_RESRC_MAX = 64
@@ -375,5 +408,8 @@ void qedn_common_clear_fw_sgl(struct storage_sgl_task_params *sgl_task_params);
 void qedn_return_active_tasks(struct qedn_conn_ctx *conn_ctx);
 void qedn_destroy_free_tasks(struct qedn_fp_queue *fp_q,
 			     struct qedn_io_resources *io_resrc);
+void qedn_swap_bytes(u32 *p, int size);
+void qedn_prep_icresp(struct qedn_conn_ctx *conn_ctx, struct nvmetcp_fw_cqe *cqe);
+void qedn_ring_doorbell(struct qedn_conn_ctx *conn_ctx);
 
 #endif /* _QEDN_H_ */
diff --git a/drivers/nvme/hw/qedn/qedn_conn.c b/drivers/nvme/hw/qedn/qedn_conn.c
index 10a80fbeac43..5679354aa0e0 100644
--- a/drivers/nvme/hw/qedn/qedn_conn.c
+++ b/drivers/nvme/hw/qedn/qedn_conn.c
@@ -34,6 +34,25 @@ inline int qedn_qid(struct nvme_tcp_ofld_queue *queue)
 	return queue - queue->ctrl->queues;
 }
 
+void qedn_ring_doorbell(struct qedn_conn_ctx *conn_ctx)
+{
+	struct nvmetcp_db_data dbell = { 0 };
+	u16 prod_idx;
+
+	dbell.agg_flags = 0;
+	dbell.params |= DB_DEST_XCM << NVMETCP_DB_DATA_DEST_SHIFT;
+	dbell.params |= DB_AGG_CMD_SET << NVMETCP_DB_DATA_AGG_CMD_SHIFT;
+	dbell.params |=
+		DQ_XCM_ISCSI_SQ_PROD_CMD << NVMETCP_DB_DATA_AGG_VAL_SEL_SHIFT;
+	dbell.params |= 1 << NVMETCP_DB_DATA_BYPASS_EN_SHIFT;
+	prod_idx = qed_chain_get_prod_idx(&conn_ctx->ep.fw_sq_chain);
+	dbell.sq_prod = cpu_to_le16(prod_idx);
+
+	/* wmb - Make sure fw idx is coherent */
+	wmb();
+	writel(*(u32 *)&dbell, conn_ctx->ep.p_doorbell);
+}
+
 int qedn_set_con_state(struct qedn_conn_ctx *conn_ctx, enum qedn_conn_state new_state)
 {
 	spin_lock_bh(&conn_ctx->conn_state_lock);
@@ -130,6 +149,71 @@ int qedn_initialize_endpoint(struct qedn_endpoint *ep, u8 *local_mac_addr,
 	return -1;
 }
 
+static int qedn_alloc_icreq_pad(struct qedn_conn_ctx *conn_ctx)
+{
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	struct qedn_icreq_padding *icreq_pad;
+	u32 *buffer;
+	int rc = 0;
+
+	icreq_pad = kzalloc(sizeof(*icreq_pad), GFP_KERNEL);
+	if (!icreq_pad)
+		return -ENOMEM;
+
+	conn_ctx->icreq_pad = icreq_pad;
+	memset(&icreq_pad->sge, 0, sizeof(icreq_pad->sge));
+	buffer = dma_alloc_coherent(&qedn->pdev->dev,
+				    QEDN_ICREQ_FW_PAYLOAD,
+				    &icreq_pad->pa,
+				    GFP_KERNEL);
+	if (!buffer) {
+		pr_err("Could not allocate icreq_padding SGE buffer.\n");
+		rc = -ENOMEM;
+		goto release_icreq_pad;
+	}
+
+	DMA_REGPAIR_LE(icreq_pad->sge.sge_addr, icreq_pad->pa);
+	icreq_pad->sge.sge_len = cpu_to_le32(QEDN_ICREQ_FW_PAYLOAD);
+	icreq_pad->buffer = buffer;
+	set_bit(QEDN_CONN_RESRC_ICREQ_PAD, &conn_ctx->resrc_state);
+
+	return 0;
+
+release_icreq_pad:
+	kfree(icreq_pad);
+	conn_ctx->icreq_pad = NULL;
+
+	return rc;
+}
+
+static void qedn_free_icreq_pad(struct qedn_conn_ctx *conn_ctx)
+{
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	struct qedn_icreq_padding *icreq_pad;
+	u32 *buffer;
+
+	icreq_pad = conn_ctx->icreq_pad;
+	if (unlikely(!icreq_pad)) {
+		pr_err("null ptr in icreq_pad in conn_ctx\n");
+		goto finally;
+	}
+
+	buffer = icreq_pad->buffer;
+	if (buffer) {
+		dma_free_coherent(&qedn->pdev->dev,
+				  QEDN_ICREQ_FW_PAYLOAD,
+				  (void *)buffer,
+				  icreq_pad->pa);
+		icreq_pad->buffer = NULL;
+	}
+
+	kfree(icreq_pad);
+	conn_ctx->icreq_pad = NULL;
+
+finally:
+	clear_bit(QEDN_CONN_RESRC_ICREQ_PAD, &conn_ctx->resrc_state);
+}
+
 static void qedn_release_conn_ctx(struct qedn_conn_ctx *conn_ctx)
 {
 	struct qedn_ctx *qedn = conn_ctx->qedn;
@@ -151,6 +235,9 @@ static void qedn_release_conn_ctx(struct qedn_conn_ctx *conn_ctx)
 		clear_bit(QEDN_CONN_RESRC_ACQUIRE_CONN, &conn_ctx->resrc_state);
 	}
 
+	if (test_bit(QEDN_CONN_RESRC_ICREQ_PAD, &conn_ctx->resrc_state))
+		qedn_free_icreq_pad(conn_ctx);
+
 	if (test_bit(QEDN_CONN_RESRC_TASKS, &conn_ctx->resrc_state)) {
 		clear_bit(QEDN_CONN_RESRC_TASKS, &conn_ctx->resrc_state);
 			qedn_return_active_tasks(conn_ctx);
@@ -309,6 +396,194 @@ void qedn_terminate_connection(struct qedn_conn_ctx *conn_ctx, int abrt_flag)
 	queue_work(qctrl->sp_wq, &conn_ctx->sp_wq_entry);
 }
 
+static int qedn_nvmetcp_update_conn(struct qedn_ctx *qedn, struct qedn_conn_ctx *conn_ctx)
+{
+	struct qedn_negotiation_params *pdu_params = &conn_ctx->pdu_params;
+	struct qed_nvmetcp_params_update *conn_info;
+	int rc;
+
+	conn_info = kzalloc(sizeof(*conn_info), GFP_KERNEL);
+	if (!conn_info)
+		return -ENOMEM;
+
+	conn_info->hdr_digest_en = pdu_params->hdr_digest;
+	conn_info->data_digest_en = pdu_params->data_digest;
+	conn_info->max_recv_pdu_length = QEDN_MAX_PDU_SIZE;
+	conn_info->max_io_size = QEDN_MAX_IO_SIZE;
+	conn_info->max_send_pdu_length = pdu_params->maxh2cdata;
+
+	rc = qed_ops->update_conn(qedn->cdev, conn_ctx->conn_handle, conn_info);
+	if (rc) {
+		pr_err("Could not update connection\n");
+		rc = -ENXIO;
+	}
+
+	kfree(conn_info);
+
+	return rc;
+}
+
+static int qedn_update_ramrod(struct qedn_conn_ctx *conn_ctx)
+{
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	int rc = 0;
+
+	rc = qedn_set_con_state(conn_ctx, CONN_STATE_WAIT_FOR_UPDATE_EQE);
+	if (rc)
+		return rc;
+
+	rc = qedn_nvmetcp_update_conn(qedn, conn_ctx);
+	if (rc)
+		return rc;
+
+	if (conn_ctx->state != CONN_STATE_WAIT_FOR_UPDATE_EQE) {
+		pr_err("cid 0x%x: Unexpected state 0x%x after update ramrod\n",
+		       conn_ctx->fw_cid, conn_ctx->state);
+
+		return -EINVAL;
+	}
+
+	return rc;
+}
+
+static int qedn_send_icreq(struct qedn_conn_ctx *conn_ctx)
+{
+	struct nvmetcp_init_conn_req_hdr *icreq_ptr = NULL;
+	struct storage_sgl_task_params *sgl_task_params;
+	struct nvmetcp_task_params task_params;
+	struct qedn_task_ctx *qedn_task = NULL;
+	struct nvme_tcp_icreq_pdu icreq;
+	struct nvmetcp_wqe *chain_sqe;
+	struct nvmetcp_wqe local_sqe;
+
+	qedn_task = qedn_get_task_from_pool_insist(conn_ctx, QEDN_ICREQ_CCCID);
+	if (!qedn_task)
+		return -EINVAL;
+
+	memset(&icreq, 0, sizeof(icreq));
+	memset(&local_sqe, 0, sizeof(local_sqe));
+
+	/* Initialize ICReq */
+	icreq.hdr.type = nvme_tcp_icreq;
+	icreq.hdr.hlen = sizeof(icreq);
+	icreq.hdr.pdo = 0;
+	icreq.hdr.plen = cpu_to_le32(icreq.hdr.hlen);
+	icreq.pfv = cpu_to_le16(conn_ctx->required_params.pfv);
+	icreq.maxr2t = cpu_to_le32(conn_ctx->required_params.maxr2t);
+	icreq.hpda = conn_ctx->required_params.hpda;
+	if (conn_ctx->required_params.hdr_digest)
+		icreq.digest |= NVME_TCP_HDR_DIGEST_ENABLE;
+	if (conn_ctx->required_params.data_digest)
+		icreq.digest |= NVME_TCP_DATA_DIGEST_ENABLE;
+
+	qedn_swap_bytes((u32 *)&icreq,
+			(sizeof(icreq) - QEDN_ICREQ_FW_PAYLOAD) /
+			 sizeof(u32));
+
+	/* Initialize task params */
+	task_params.opq.lo = cpu_to_le32(((u64)(qedn_task)) & 0xffffffff);
+	task_params.opq.hi = cpu_to_le32(((u64)(qedn_task)) >> 32);
+	task_params.context = qedn_task->fw_task_ctx;
+	task_params.sqe = &local_sqe;
+	task_params.conn_icid = (u16)conn_ctx->conn_handle;
+	task_params.itid = qedn_task->itid;
+	task_params.cq_rss_number = conn_ctx->default_cq;
+	task_params.tx_io_size = QEDN_ICREQ_FW_PAYLOAD;
+	task_params.rx_io_size = 0; /* Rx doesn't use SGL for icresp */
+
+	/* Init SGE for ICReq padding */
+	sgl_task_params = &qedn_task->sgl_task_params;
+	sgl_task_params->total_buffer_size = task_params.tx_io_size;
+	sgl_task_params->small_mid_sge = false;
+	sgl_task_params->num_sges = 1;
+	memcpy(sgl_task_params->sgl, &conn_ctx->icreq_pad->sge,
+	       sizeof(conn_ctx->icreq_pad->sge));
+	icreq_ptr = (struct nvmetcp_init_conn_req_hdr *)&icreq;
+
+	qed_ops->init_icreq_exchange(&task_params, icreq_ptr, sgl_task_params,  NULL);
+
+	qedn_set_con_state(conn_ctx, CONN_STATE_WAIT_FOR_IC_COMP);
+	atomic_inc(&conn_ctx->num_active_fw_tasks);
+
+	/* spin_lock - doorbell is accessed  both Rx flow and response flow */
+	spin_lock(&conn_ctx->ep.doorbell_lock);
+	chain_sqe = qed_chain_produce(&conn_ctx->ep.fw_sq_chain);
+	memcpy(chain_sqe, &local_sqe, sizeof(local_sqe));
+	qedn_ring_doorbell(conn_ctx);
+	spin_unlock(&conn_ctx->ep.doorbell_lock);
+
+	return 0;
+}
+
+void qedn_prep_icresp(struct qedn_conn_ctx *conn_ctx, struct nvmetcp_fw_cqe *cqe)
+{
+	struct nvmetcp_icresp_hdr_psh *icresp_from_cqe =
+		(struct nvmetcp_icresp_hdr_psh *)&cqe->nvme_cqe;
+	struct nvme_tcp_ofld_ctrl *ctrl = conn_ctx->ctrl;
+	struct qedn_ctrl *qctrl = NULL;
+
+	qctrl = (struct qedn_ctrl *)ctrl->private_data;
+
+	memcpy(&conn_ctx->icresp, icresp_from_cqe, sizeof(conn_ctx->icresp));
+	qedn_set_sp_wa(conn_ctx, HANDLE_ICRESP);
+	queue_work(qctrl->sp_wq, &conn_ctx->sp_wq_entry);
+}
+
+static int qedn_handle_icresp(struct qedn_conn_ctx *conn_ctx)
+{
+	struct nvmetcp_icresp_hdr_psh *icresp = &conn_ctx->icresp;
+	u16 pfv = __swab16(le16_to_cpu(icresp->pfv_swapped));
+	int rc = 0;
+
+	qedn_free_icreq_pad(conn_ctx);
+
+	/* Validate ICResp */
+	if (pfv != conn_ctx->required_params.pfv) {
+		pr_err("cid %u: unsupported pfv %u\n", conn_ctx->fw_cid, pfv);
+
+		return -EINVAL;
+	}
+
+	if (icresp->cpda > conn_ctx->required_params.cpda) {
+		pr_err("cid %u: unsupported cpda %u\n", conn_ctx->fw_cid, icresp->cpda);
+
+		return -EINVAL;
+	}
+
+	if ((NVME_TCP_HDR_DIGEST_ENABLE & icresp->digest) !=
+	    conn_ctx->required_params.hdr_digest) {
+		if ((NVME_TCP_HDR_DIGEST_ENABLE & icresp->digest) >
+		    conn_ctx->required_params.hdr_digest) {
+			pr_err("cid 0x%x: invalid header digest bit\n", conn_ctx->fw_cid);
+		}
+	}
+
+	if ((NVME_TCP_DATA_DIGEST_ENABLE & icresp->digest) !=
+	    conn_ctx->required_params.data_digest) {
+		if ((NVME_TCP_DATA_DIGEST_ENABLE & icresp->digest) >
+		    conn_ctx->required_params.data_digest) {
+			pr_err("cid 0x%x: invalid data digest bit\n", conn_ctx->fw_cid);
+		}
+	}
+
+	memset(&conn_ctx->pdu_params, 0, sizeof(conn_ctx->pdu_params));
+	conn_ctx->pdu_params.maxh2cdata =
+		__swab32(le32_to_cpu(icresp->maxdata_swapped));
+	/* Clamp the controller's MAXH2CDATA to the driver limit */
+	if (conn_ctx->pdu_params.maxh2cdata > QEDN_MAX_PDU_SIZE)
+		conn_ctx->pdu_params.maxh2cdata = QEDN_MAX_PDU_SIZE;
+
+	conn_ctx->pdu_params.pfv = pfv;
+	conn_ctx->pdu_params.cpda = icresp->cpda;
+	conn_ctx->pdu_params.hpda = conn_ctx->required_params.hpda;
+	conn_ctx->pdu_params.hdr_digest = NVME_TCP_HDR_DIGEST_ENABLE & icresp->digest;
+	conn_ctx->pdu_params.data_digest = NVME_TCP_DATA_DIGEST_ENABLE & icresp->digest;
+	conn_ctx->pdu_params.maxr2t = conn_ctx->required_params.maxr2t;
+	rc = qedn_update_ramrod(conn_ctx);
+
+	return rc;
+}
+
 /* Slowpath EQ Callback */
 int qedn_event_cb(void *context, u8 fw_event_code, void *event_ring_data)
 {
@@ -363,7 +638,8 @@ int qedn_event_cb(void *context, u8 fw_event_code, void *event_ring_data)
 			if (rc)
 				return rc;
 
-			/* Placeholder - for ICReq flow */
+			qedn_set_sp_wa(conn_ctx, SEND_ICREQ);
+			queue_work(qctrl->sp_wq, &conn_ctx->sp_wq_entry);
 		}
 
 		break;
@@ -399,6 +675,7 @@ static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)
 	}
 
 	set_bit(QEDN_CONN_RESRC_FW_SQ, &conn_ctx->resrc_state);
+	spin_lock_init(&conn_ctx->ep.doorbell_lock);
 	INIT_LIST_HEAD(&conn_ctx->host_pend_req_list);
 	spin_lock_init(&conn_ctx->nvme_req_lock);
 	atomic_set(&conn_ctx->num_active_tasks, 0);
@@ -463,6 +740,11 @@ static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)
 
 	memset(conn_ctx->host_cccid_itid, 0xFF, dma_size);
 	set_bit(QEDN_CONN_RESRC_CCCID_ITID_MAP, &conn_ctx->resrc_state);
+
+	rc = qedn_alloc_icreq_pad(conn_ctx);
+	if (rc)
+		goto rel_conn;
+
 	rc = qedn_set_con_state(conn_ctx, CONN_STATE_WAIT_FOR_CONNECT_DONE);
 	if (rc)
 		goto rel_conn;
@@ -523,6 +805,9 @@ void qedn_sp_wq_handler(struct work_struct *work)
 
 	qedn = conn_ctx->qedn;
 	if (test_bit(DESTROY_CONNECTION, &conn_ctx->agg_work_action)) {
+		if (test_bit(HANDLE_ICRESP, &conn_ctx->agg_work_action))
+			qedn_clr_sp_wa(conn_ctx, HANDLE_ICRESP);
+
 		qedn_destroy_connection(conn_ctx);
 
 		return;
@@ -537,6 +822,36 @@ void qedn_sp_wq_handler(struct work_struct *work)
 			return;
 		}
 	}
+
+	if (test_bit(SEND_ICREQ, &conn_ctx->agg_work_action)) {
+		qedn_clr_sp_wa(conn_ctx, SEND_ICREQ);
+		rc = qedn_send_icreq(conn_ctx);
+		if (rc)
+			return;
+
+		return;
+	}
+
+	if (test_bit(HANDLE_ICRESP, &conn_ctx->agg_work_action)) {
+		rc = qedn_handle_icresp(conn_ctx);
+
+		qedn_clr_sp_wa(conn_ctx, HANDLE_ICRESP);
+		if (rc) {
+			pr_err("IC handling returned with 0x%x\n", rc);
+			if (test_and_set_bit(DESTROY_CONNECTION, &conn_ctx->agg_work_action))
+				return;
+
+			qedn_destroy_connection(conn_ctx);
+
+			return;
+		}
+
+		atomic_inc(&conn_ctx->est_conn_indicator);
+		qedn_set_con_state(conn_ctx, CONN_STATE_NVMETCP_CONN_ESTABLISHED);
+		wake_up_interruptible(&conn_ctx->conn_waitq);
+
+		return;
+	}
 }
 
 /* Clear connection aggregative slowpath work action */
diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
index 8d9c19d63480..a6756d7250b7 100644
--- a/drivers/nvme/hw/qedn/qedn_main.c
+++ b/drivers/nvme/hw/qedn/qedn_main.c
@@ -285,6 +285,19 @@ static void qedn_set_ctrl_io_cpus(struct qedn_conn_ctx *conn_ctx, int qid)
 	conn_ctx->cpu = fp_q->cpu;
 }
 
+static void qedn_set_pdu_params(struct qedn_conn_ctx *conn_ctx)
+{
+	/* Enable digest once supported */
+	conn_ctx->required_params.hdr_digest = 0;
+	conn_ctx->required_params.data_digest = 0;
+
+	conn_ctx->required_params.maxr2t = QEDN_MAX_OUTSTANDING_R2T_PDUS;
+	conn_ctx->required_params.pfv = NVME_TCP_PFV_1_0;
+	conn_ctx->required_params.cpda = 0;
+	conn_ctx->required_params.hpda = 0;
+	conn_ctx->required_params.maxh2cdata = QEDN_MAX_PDU_SIZE;
+}
+
 static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid, size_t q_size)
 {
 	struct nvme_tcp_ofld_ctrl *ctrl = queue->ctrl;
@@ -307,6 +320,7 @@ static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid, size_t
 	conn_ctx->ctrl = ctrl;
 	conn_ctx->sq_depth = q_size;
 	qedn_set_ctrl_io_cpus(conn_ctx, qid);
+	qedn_set_pdu_params(conn_ctx);
 
 	init_waitqueue_head(&conn_ctx->conn_waitq);
 	atomic_set(&conn_ctx->est_conn_indicator, 0);
@@ -1073,6 +1087,14 @@ static int qedn_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	return __qedn_probe(pdev);
 }
 
+void qedn_swap_bytes(u32 *p, int size)
+{
+	int i;
+
+	for (i = 0; i < size; ++i, ++p)
+		*p = __swab32(*p);
+}
+
 static struct pci_driver qedn_pci_driver = {
 	.name     = QEDN_MODULE_NAME,
 	.id_table = qedn_pci_tbl,
diff --git a/drivers/nvme/hw/qedn/qedn_task.c b/drivers/nvme/hw/qedn/qedn_task.c
index 54f2f4cba6ea..9cb84883e95e 100644
--- a/drivers/nvme/hw/qedn/qedn_task.c
+++ b/drivers/nvme/hw/qedn/qedn_task.c
@@ -536,9 +536,11 @@ void qedn_io_work_cq(struct qedn_ctx *qedn, struct nvmetcp_fw_cqe *cqe)
 			break;
 
 		case NVMETCP_TASK_TYPE_INIT_CONN_REQUEST:
-
-			/* Placeholder - ICReq flow */
-
+			/* Clear ICReq-padding SGE from SGL */
+			qedn_common_clear_fw_sgl(&qedn_task->sgl_task_params);
+			/* Task is not required for icresp processing */
+			qedn_return_task_to_pool(conn_ctx, qedn_task);
+			qedn_prep_icresp(conn_ctx, cqe);
 			break;
 		default:
 			pr_info("Could not identify task type\n");
-- 
2.22.0



* [RFC PATCH v4 25/27] qedn: Add IO level fastpath functionality
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
                   ` (23 preceding siblings ...)
  2021-04-29 19:09 ` [RFC PATCH v4 24/27] qedn: Add support of NVME ICReq & ICResp Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-05-02 11:54   ` Hannes Reinecke
  2021-04-29 19:09 ` [RFC PATCH v4 26/27] qedn: Add Connection and IO level recovery flows Shai Malin
                   ` (2 subsequent siblings)
  27 siblings, 1 reply; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net  --cc=Jakub
	Kicinski, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	malin1024

This patch presents the IO level functionality of the qedn
nvme-tcp-offload host mode. The qedn_task_ctx structure contains
various params and the state of the current IO, and is mapped 1:1 to the
fw_task_ctx, which is a HW and FW IO context.
A qedn_task is mapped directly to its parent connection.
For every new IO a qedn_task structure is assigned, and the two stay
linked for the entire life span of the IO.

The patch includes 2 flows:
  1. Send new command to the FW:
     The flow is: nvme_tcp_ofld_queue_rq() which invokes qedn_send_req()
     which invokes qedn_queue_request() which will:
     - Assign fw_task_ctx.
     - Prepare the Read/Write SG buffer.
     - Initialize the HW and FW context.
     - Pass the IO to the FW.

  2. Process the IO completion:
     The flow is: qedn_irq_handler() which invokes qedn_fw_cq_fp_handler()
     which invokes qedn_io_work_cq() which will:
     - Process the FW completion.
     - Return the fw_task_ctx to the task pool.
     - Complete the nvme req.
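
The task life cycle behind the two flows above (take a task from the
pool for a new IO, hand it back on completion) can be sketched in
user-space C as follows. All names (sketch_task, sketch_pool,
sketch_get_task, sketch_return_task) are hypothetical. Locking is
omitted, and the free list is a simple LIFO, whereas the driver appends
to task_free_list with list_add_tail() under resources_lock. The
cccid_to_itid array loosely mirrors host_cccid_itid.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_POOL_SIZE 4
#define SKETCH_INVALID_ITID 0xFFFF /* mirrors QEDN_INVALID_ITID */

struct sketch_task {
	uint16_t itid;            /* fixed task id, assigned once at init */
	uint16_t cccid;           /* command id of the IO using the task */
	int valid;                /* 1 while the task is bound to an IO */
	struct sketch_task *next; /* free-list link */
};

struct sketch_pool {
	struct sketch_task tasks[SKETCH_POOL_SIZE];
	struct sketch_task *free_head;
	unsigned int num_free;
	uint16_t cccid_to_itid[SKETCH_POOL_SIZE];
};

/* Put every task on the free list and invalidate the cccid map */
static void sketch_pool_init(struct sketch_pool *pool)
{
	pool->free_head = NULL;
	pool->num_free = 0;
	for (int i = SKETCH_POOL_SIZE - 1; i >= 0; i--) {
		pool->tasks[i].itid = (uint16_t)i;
		pool->tasks[i].cccid = 0;
		pool->tasks[i].valid = 0;
		pool->tasks[i].next = pool->free_head;
		pool->free_head = &pool->tasks[i];
		pool->num_free++;
		pool->cccid_to_itid[i] = SKETCH_INVALID_ITID;
	}
}

/* Flow 1: bind a free task to a new IO; NULL if the pool is empty */
static struct sketch_task *sketch_get_task(struct sketch_pool *pool,
					   uint16_t cccid)
{
	struct sketch_task *task = pool->free_head;

	if (!task || cccid >= SKETCH_POOL_SIZE)
		return NULL;

	pool->free_head = task->next;
	pool->num_free--;
	task->next = NULL;
	task->cccid = cccid;
	task->valid = 1;
	pool->cccid_to_itid[cccid] = task->itid;

	return task;
}

/* Flow 2: on completion, unbind the task and return it to the pool */
static void sketch_return_task(struct sketch_pool *pool,
			       struct sketch_task *task)
{
	pool->cccid_to_itid[task->cccid] = SKETCH_INVALID_ITID;
	task->valid = 0;
	task->next = pool->free_head;
	pool->free_head = task;
	pool->num_free++;
}
```

In the driver the same pairing is done by qedn_get_free_task_from_pool()
and qedn_return_task_to_pool(), which additionally track the active task
list per connection and the number of tasks owned by the FW.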

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
---
 drivers/nvme/hw/qedn/qedn.h      |   4 +
 drivers/nvme/hw/qedn/qedn_conn.c |   1 +
 drivers/nvme/hw/qedn/qedn_task.c | 269 ++++++++++++++++++++++++++++++-
 3 files changed, 272 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
index 773a57994148..065e4324e30c 100644
--- a/drivers/nvme/hw/qedn/qedn.h
+++ b/drivers/nvme/hw/qedn/qedn.h
@@ -190,6 +190,10 @@ struct qedn_ctx {
 	struct qed_nvmetcp_tid	tasks;
 };
 
+enum qedn_task_flags {
+	QEDN_TASK_USED_BY_FW,
+};
+
 struct qedn_task_ctx {
 	struct qedn_conn_ctx *qedn_conn;
 	struct qedn_ctx *qedn;
diff --git a/drivers/nvme/hw/qedn/qedn_conn.c b/drivers/nvme/hw/qedn/qedn_conn.c
index 5679354aa0e0..fa8d414eb888 100644
--- a/drivers/nvme/hw/qedn/qedn_conn.c
+++ b/drivers/nvme/hw/qedn/qedn_conn.c
@@ -503,6 +503,7 @@ static int qedn_send_icreq(struct qedn_conn_ctx *conn_ctx)
 	qed_ops->init_icreq_exchange(&task_params, icreq_ptr, sgl_task_params,  NULL);
 
 	qedn_set_con_state(conn_ctx, CONN_STATE_WAIT_FOR_IC_COMP);
+	set_bit(QEDN_TASK_USED_BY_FW, &qedn_task->flags);
 	atomic_inc(&conn_ctx->num_active_fw_tasks);
 
 	/* spin_lock - doorbell is accessed  both Rx flow and response flow */
diff --git a/drivers/nvme/hw/qedn/qedn_task.c b/drivers/nvme/hw/qedn/qedn_task.c
index 9cb84883e95e..13d9fb6ed5b6 100644
--- a/drivers/nvme/hw/qedn/qedn_task.c
+++ b/drivers/nvme/hw/qedn/qedn_task.c
@@ -11,6 +11,8 @@
 /* Driver includes */
 #include "qedn.h"
 
+extern const struct qed_nvmetcp_ops *qed_ops;
+
 static bool qedn_sgl_has_small_mid_sge(struct nvmetcp_sge *sgl, u16 sge_count)
 {
 	u16 sge_num;
@@ -434,8 +436,194 @@ qedn_get_task_from_pool_insist(struct qedn_conn_ctx *conn_ctx, u16 cccid)
 	return qedn_task;
 }
 
+int qedn_send_read_cmd(struct qedn_task_ctx *qedn_task, struct qedn_conn_ctx *conn_ctx)
+{
+	struct nvme_command *nvme_cmd = &qedn_task->req->nvme_cmd;
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	struct nvmetcp_cmd_capsule_hdr cmd_hdr;
+	struct nvmetcp_task_params task_params;
+	struct nvmetcp_conn_params conn_params;
+	struct nvmetcp_wqe *chain_sqe;
+	struct nvmetcp_wqe local_sqe;
+	int rc;
+	int i;
+
+	rc = qedn_init_sgl(qedn, qedn_task);
+	if (rc)
+		return rc;
+
+	task_params.opq.lo = cpu_to_le32(((u64)(qedn_task)) & 0xffffffff);
+	task_params.opq.hi = cpu_to_le32(((u64)(qedn_task)) >> 32);
+
+	/* Initialize task params */
+	task_params.context = qedn_task->fw_task_ctx;
+	task_params.sqe = &local_sqe;
+	task_params.tx_io_size = 0;
+	task_params.rx_io_size = qedn_task->task_size;
+	task_params.conn_icid = (u16)conn_ctx->conn_handle;
+	task_params.itid = qedn_task->itid;
+	task_params.cq_rss_number = conn_ctx->default_cq;
+	task_params.send_write_incapsule = 0;
+
+	/* Initialize conn params */
+	conn_params.max_burst_length = QEDN_MAX_IO_SIZE;
+
+	cmd_hdr.chdr.pdu_type = nvme_tcp_cmd;
+	cmd_hdr.chdr.flags = 0;
+	cmd_hdr.chdr.hlen = sizeof(cmd_hdr);
+	cmd_hdr.chdr.pdo = 0x0;
+	cmd_hdr.chdr.plen_swapped = cpu_to_le32(__swab32(cmd_hdr.chdr.hlen));
+
+	for (i = 0; i < 16; i++)
+		cmd_hdr.pshdr.raw_swapped[i] = cpu_to_le32(__swab32(((u32 *)nvme_cmd)[i]));
+
+	qed_ops->init_read_io(&task_params, &conn_params, &cmd_hdr, &qedn_task->sgl_task_params);
+
+	set_bit(QEDN_TASK_USED_BY_FW, &qedn_task->flags);
+	atomic_inc(&conn_ctx->num_active_fw_tasks);
+
+	spin_lock(&conn_ctx->ep.doorbell_lock);
+	chain_sqe = qed_chain_produce(&conn_ctx->ep.fw_sq_chain);
+	memcpy(chain_sqe, &local_sqe, sizeof(local_sqe));
+	qedn_ring_doorbell(conn_ctx);
+	spin_unlock(&conn_ctx->ep.doorbell_lock);
+
+	return 0;
+}
+
+int qedn_send_write_cmd(struct qedn_task_ctx *qedn_task, struct qedn_conn_ctx *conn_ctx)
+{
+	struct nvme_command *nvme_cmd = &qedn_task->req->nvme_cmd;
+	struct nvmetcp_task_params task_params;
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	struct nvmetcp_cmd_capsule_hdr cmd_hdr;
+	struct nvmetcp_conn_params conn_params;
+	u32 pdu_len = sizeof(cmd_hdr);
+	struct nvmetcp_wqe *chain_sqe;
+	struct nvmetcp_wqe local_sqe;
+	u8 send_write_incapsule;
+	int rc;
+	int i;
+
+	if (qedn_task->task_size <= nvme_tcp_ofld_inline_data_size(conn_ctx->queue) &&
+	    qedn_task->task_size) {
+		send_write_incapsule = 1;
+		pdu_len += qedn_task->task_size;
+
+		/* Add digest length once supported */
+		cmd_hdr.chdr.pdo = sizeof(cmd_hdr);
+	} else {
+		send_write_incapsule = 0;
+
+		cmd_hdr.chdr.pdo = 0x0;
+	}
+
+	rc = qedn_init_sgl(qedn, qedn_task);
+	if (rc)
+		return rc;
+
+	task_params.host_cccid = cpu_to_le16(qedn_task->cccid);
+	task_params.opq.lo = cpu_to_le32(((u64)(qedn_task)) & 0xffffffff);
+	task_params.opq.hi = cpu_to_le32(((u64)(qedn_task)) >> 32);
+
+	/* Initialize task params */
+	task_params.context = qedn_task->fw_task_ctx;
+	task_params.sqe = &local_sqe;
+	task_params.tx_io_size = qedn_task->task_size;
+	task_params.rx_io_size = 0;
+	task_params.conn_icid = (u16)conn_ctx->conn_handle;
+	task_params.itid = qedn_task->itid;
+	task_params.cq_rss_number = conn_ctx->default_cq;
+	task_params.send_write_incapsule = send_write_incapsule;
+
+	/* Initialize conn params */
+
+	cmd_hdr.chdr.pdu_type = nvme_tcp_cmd;
+	cmd_hdr.chdr.flags = 0;
+	cmd_hdr.chdr.hlen = sizeof(cmd_hdr);
+	cmd_hdr.chdr.plen_swapped = cpu_to_le32(__swab32(pdu_len));
+	for (i = 0; i < 16; i++)
+		cmd_hdr.pshdr.raw_swapped[i] = cpu_to_le32(__swab32(((u32 *)nvme_cmd)[i]));
+
+	qed_ops->init_write_io(&task_params, &conn_params, &cmd_hdr, &qedn_task->sgl_task_params);
+
+	set_bit(QEDN_TASK_USED_BY_FW, &qedn_task->flags);
+	atomic_inc(&conn_ctx->num_active_fw_tasks);
+
+	spin_lock(&conn_ctx->ep.doorbell_lock);
+	chain_sqe = qed_chain_produce(&conn_ctx->ep.fw_sq_chain);
+	memcpy(chain_sqe, &local_sqe, sizeof(local_sqe));
+	qedn_ring_doorbell(conn_ctx);
+	spin_unlock(&conn_ctx->ep.doorbell_lock);
+
+	return 0;
+}
+
+static void qedn_fetch_request(struct qedn_conn_ctx *qedn_conn)
+{
+	spin_lock(&qedn_conn->nvme_req_lock);
+	qedn_conn->req = list_first_entry_or_null(&qedn_conn->host_pend_req_list,
+						  struct nvme_tcp_ofld_req, queue_entry);
+	if (qedn_conn->req)
+		list_del(&qedn_conn->req->queue_entry);
+	spin_unlock(&qedn_conn->nvme_req_lock);
+}
+
 static bool qedn_process_req(struct qedn_conn_ctx *qedn_conn)
 {
+	struct qedn_task_ctx *qedn_task;
+	struct nvme_tcp_ofld_req *req;
+	struct request *rq;
+	int rc = 0;
+	u16 cccid;
+
+	qedn_fetch_request(qedn_conn);
+	if (!qedn_conn->req)
+		return false;
+
+	req = qedn_conn->req;
+	rq = blk_mq_rq_from_pdu(req);
+
+	/* Placeholder - async */
+
+	cccid = rq->tag;
+	qedn_task = qedn_get_task_from_pool_insist(qedn_conn, cccid);
+	if (unlikely(!qedn_task)) {
+		pr_err("Not able to allocate task context\n");
+		goto doorbell;
+	}
+
+	req->private_data = qedn_task;
+	qedn_task->req = req;
+
+	/* Placeholder - handle (req->async) */
+
+	/* Check if there are physical segments in request to determine the task size.
+	 * The logic of nvme_tcp_set_sg_null() will be implemented as part of
+	 * qedn_set_sg_host_data().
+	 */
+	qedn_task->task_size = blk_rq_nr_phys_segments(rq) ? blk_rq_payload_bytes(rq) : 0;
+	qedn_task->req_direction = rq_data_dir(rq);
+	if (qedn_task->req_direction == WRITE)
+		rc = qedn_send_write_cmd(qedn_task, qedn_conn);
+	else
+		rc = qedn_send_read_cmd(qedn_task, qedn_conn);
+
+	if (unlikely(rc)) {
+		pr_err("Read/Write command failure\n");
+		goto doorbell;
+	}
+
+	/* Don't ring doorbell if this is not the last request */
+	if (!req->last)
+		return true;
+
+doorbell:
+	/* Always ring doorbell if reached here, in case there were coalesced
+	 * requests which were delayed
+	 */
+	qedn_ring_doorbell(qedn_conn);
+
 	return true;
 }
 
@@ -497,8 +685,71 @@ struct qedn_task_ctx *qedn_cqe_get_active_task(struct nvmetcp_fw_cqe *cqe)
 					+ le32_to_cpu(p->lo)));
 }
 
+static struct nvme_tcp_ofld_req *qedn_decouple_req_task(struct qedn_task_ctx *qedn_task)
+{
+	struct nvme_tcp_ofld_req *ulp_req = qedn_task->req;
+
+	qedn_task->req = NULL;
+	if (ulp_req)
+		ulp_req->private_data = NULL;
+
+	return ulp_req;
+}
+
+static inline int qedn_comp_valid_task(struct qedn_task_ctx *qedn_task,
+				       union nvme_result *result, __le16 status)
+{
+	struct qedn_conn_ctx *conn_ctx = qedn_task->qedn_conn;
+	struct nvme_tcp_ofld_req *req;
+
+	req = qedn_decouple_req_task(qedn_task);
+	qedn_return_task_to_pool(conn_ctx, qedn_task);
+	if (!req) {
+		pr_err("req not found\n");
+
+		return -EINVAL;
+	}
+
+	/* Call request done to complete the request */
+	if (req->done)
+		req->done(req, result, status);
+	else
+		pr_err("request done not set\n");
+
+	return 0;
+}
+
+int qedn_process_nvme_cqe(struct qedn_task_ctx *qedn_task, struct nvme_completion *cqe)
+{
+	int rc = 0;
+
+	/* cqe arrives swapped */
+	qedn_swap_bytes((u32 *)cqe, (sizeof(*cqe) / sizeof(u32)));
+
+	/* Placeholder - async */
+
+	rc = qedn_comp_valid_task(qedn_task, &cqe->result, cqe->status);
+
+	return rc;
+}
+
+int qedn_complete_c2h(struct qedn_task_ctx *qedn_task)
+{
+	int rc = 0;
+
+	__le16 status = cpu_to_le16(NVME_SC_SUCCESS << 1);
+	union nvme_result result = {};
+
+	rc = qedn_comp_valid_task(qedn_task, &result, status);
+
+	return rc;
+}
+
 void qedn_io_work_cq(struct qedn_ctx *qedn, struct nvmetcp_fw_cqe *cqe)
 {
+	int rc = 0;
+
+	struct nvme_completion *nvme_cqe = NULL;
 	struct qedn_task_ctx *qedn_task = NULL;
 	struct qedn_conn_ctx *conn_ctx = NULL;
 	u16 itid;
@@ -525,13 +776,27 @@ void qedn_io_work_cq(struct qedn_ctx *qedn, struct nvmetcp_fw_cqe *cqe)
 		case NVMETCP_TASK_TYPE_HOST_WRITE:
 		case NVMETCP_TASK_TYPE_HOST_READ:
 
-			/* Placeholder - IO flow */
+			/* Verify data digest once supported */
+
+			nvme_cqe = (struct nvme_completion *)&cqe->nvme_cqe;
+			rc = qedn_process_nvme_cqe(qedn_task, nvme_cqe);
+			if (rc) {
+				pr_err("Read/Write completion error\n");
 
+				return;
+			}
 			break;
 
 		case NVMETCP_TASK_TYPE_HOST_READ_NO_CQE:
 
-			/* Placeholder - IO flow */
+			/* Verify data digest once supported */
+
+			rc = qedn_complete_c2h(qedn_task);
+			if (rc) {
+				pr_err("Controller to Host data transfer error\n");
+
+				return;
+			}
 
 			break;
 
-- 
2.22.0



* [RFC PATCH v4 26/27] qedn: Add Connection and IO level recovery flows
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
                   ` (24 preceding siblings ...)
  2021-04-29 19:09 ` [RFC PATCH v4 25/27] qedn: Add IO level fastpath functionality Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-05-02 11:57   ` Hannes Reinecke
  2021-04-29 19:09 ` [RFC PATCH v4 27/27] qedn: Add support of ASYNC Shai Malin
  2021-05-01 16:47 ` [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Hannes Reinecke
  27 siblings, 1 reply; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net  --cc=Jakub
	Kicinski, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	malin1024

This patch adds the connection-level and IO-level recovery flows:
 - conn clear-sq: releases the FW restrictions in order to flush all
   the pending IOs.
 - drain: in case clear-sq is stuck, releases all the device FW
   restrictions in order to flush all the pending IOs.
 - task cleanup: flushes the IO level resources.
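
As a rough illustration of how the cleanup wait and the drain fallback fit
together, here is a minimal user-space sketch. It is not the driver code:
struct sketch_dev and the sketch_* helpers are hypothetical stand-ins (the
real flow uses wait_event_interruptible_timeout() and the qed drain op), and
the simulated device simply reports its cleanups as done after a given number
of drains. The 3000 ms and 1000 ms values mirror QEDN_TASK_CLEANUP_TMO and
QEDN_DRAIN_TMO.

```c
#include <stdbool.h>

#define SKETCH_DRAIN_MAX_ATTEMPTS 3 /* mirrors QEDN_DRAIN_MAX_ATTEMPTS */

/* Simulated device: pending cleanups complete only after enough drains. */
struct sketch_dev {
	int pending_cleanups; /* stands in for task_cleanups_cnt */
	int drains_needed;    /* drains required before cleanups complete */
	int drains_done;      /* drains issued so far */
};

/* Hypothetical drain: counts the call and, once enough drains were
 * issued, marks all pending cleanups as completed.
 */
static void sketch_drain(struct sketch_dev *dev)
{
	dev->drains_done++;
	if (dev->drains_done >= dev->drains_needed)
		dev->pending_cleanups = 0;
}

/* Stand-in for a timed wait: no real sleeping, just checks the counter. */
static bool sketch_wait_cleanups_done(const struct sketch_dev *dev,
				      unsigned int timeout_ms)
{
	(void)timeout_ms;
	return dev->pending_cleanups == 0;
}

/* Wait for the per-task cleanups; if they do not complete in time, fall
 * back to a bounded number of drain attempts. Returns 0 on success and
 * -1 when a hard reset would be needed.
 */
static int sketch_cleanup_all(struct sketch_dev *dev)
{
	int attempt;

	if (sketch_wait_cleanups_done(dev, 3000))
		return 0;

	for (attempt = 0; attempt < SKETCH_DRAIN_MAX_ATTEMPTS; attempt++) {
		sketch_drain(dev);
		if (sketch_wait_cleanups_done(dev, 1000))
			return 0;
	}

	return -1;
}
```

Bounding the drain attempts keeps connection teardown from blocking forever
when the FW never reports the cleanup completions.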

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
---
 drivers/nvme/hw/qedn/qedn.h      |   8 ++
 drivers/nvme/hw/qedn/qedn_conn.c | 133 ++++++++++++++++++++++++++++++-
 drivers/nvme/hw/qedn/qedn_main.c |   1 +
 drivers/nvme/hw/qedn/qedn_task.c |  27 ++++++-
 4 files changed, 166 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
index 065e4324e30c..fed4252392e0 100644
--- a/drivers/nvme/hw/qedn/qedn.h
+++ b/drivers/nvme/hw/qedn/qedn.h
@@ -51,6 +51,8 @@
 #define QEDN_FW_CQ_FP_WQ_WORKQUEUE "qedn_fw_cq_fp_wq"
 #define QEDN_NVME_REQ_FP_WQ_WORKQUEUE "qedn_nvme_req_fp_wq"
 
+#define QEDN_DRAIN_MAX_ATTEMPTS 3
+
 /* Protocol defines */
 #define QEDN_MAX_IO_SIZE QED_NVMETCP_MAX_IO_SIZE
 #define QEDN_MAX_PDU_SIZE 0x80000 /* 512KB */
@@ -104,6 +106,8 @@
 /* Timeouts and delay constants */
 #define QEDN_WAIT_CON_ESTABLSH_TMO 10000 /* 10 seconds */
 #define QEDN_RLS_CONS_TMO 5000 /* 5 sec */
+#define QEDN_TASK_CLEANUP_TMO 3000 /* 3 sec */
+#define QEDN_DRAIN_TMO 1000 /* 1 sec */
 
 enum qedn_state {
 	QEDN_STATE_CORE_PROBED = 0,
@@ -191,7 +195,9 @@ struct qedn_ctx {
 };
 
 enum qedn_task_flags {
+	QEDN_TASK_IS_ICREQ,
 	QEDN_TASK_USED_BY_FW,
+	QEDN_TASK_WAIT_FOR_CLEANUP,
 };
 
 struct qedn_task_ctx {
@@ -348,6 +354,8 @@ struct qedn_conn_ctx {
 	struct list_head active_task_list;
 	atomic_t num_active_tasks;
 	atomic_t num_active_fw_tasks;
+	atomic_t task_cleanups_cnt;
+	wait_queue_head_t cleanup_waitq;
 
 	/* Connection resources - turned on to indicate what resource was
 	 * allocated, to that it can later be released.
diff --git a/drivers/nvme/hw/qedn/qedn_conn.c b/drivers/nvme/hw/qedn/qedn_conn.c
index fa8d414eb888..8af119202b91 100644
--- a/drivers/nvme/hw/qedn/qedn_conn.c
+++ b/drivers/nvme/hw/qedn/qedn_conn.c
@@ -585,6 +585,11 @@ static int qedn_handle_icresp(struct qedn_conn_ctx *conn_ctx)
 	return rc;
 }
 
+void qedn_error_recovery(struct nvme_ctrl *nctrl)
+{
+	nvme_tcp_ofld_error_recovery(nctrl);
+}
+
 /* Slowpath EQ Callback */
 int qedn_event_cb(void *context, u8 fw_event_code, void *event_ring_data)
 {
@@ -644,6 +649,7 @@ int qedn_event_cb(void *context, u8 fw_event_code, void *event_ring_data)
 		}
 
 		break;
+
 	case NVMETCP_EVENT_TYPE_ASYN_TERMINATE_DONE:
 		if (conn_ctx->state != CONN_STATE_WAIT_FOR_DESTROY_DONE)
 			pr_err("CID=0x%x - ASYN_TERMINATE_DONE: Unexpected connection state %u\n",
@@ -652,6 +658,19 @@ int qedn_event_cb(void *context, u8 fw_event_code, void *event_ring_data)
 			queue_work(qctrl->sp_wq, &conn_ctx->sp_wq_entry);
 
 		break;
+
+	case NVMETCP_EVENT_TYPE_ASYN_CLOSE_RCVD:
+	case NVMETCP_EVENT_TYPE_ASYN_ABORT_RCVD:
+	case NVMETCP_EVENT_TYPE_ASYN_MAX_RT_TIME:
+	case NVMETCP_EVENT_TYPE_ASYN_MAX_RT_CNT:
+	case NVMETCP_EVENT_TYPE_ASYN_SYN_RCVD:
+	case NVMETCP_EVENT_TYPE_ASYN_MAX_KA_PROBES_CNT:
+	case NVMETCP_EVENT_TYPE_NVMETCP_CONN_ERROR:
+	case NVMETCP_EVENT_TYPE_TCP_CONN_ERROR:
+		qedn_error_recovery(&conn_ctx->ctrl->nctrl);
+
+		break;
+
 	default:
 		pr_err("CID=0x%x - Recv Unknown Event %u\n", conn_ctx->fw_cid, fw_event_code);
 		break;
@@ -765,8 +784,110 @@ static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)
 	return -EINVAL;
 }
 
+static void qedn_cleanup_fw_task(struct qedn_ctx *qedn, struct qedn_task_ctx *qedn_task)
+{
+	struct qedn_conn_ctx *conn_ctx = qedn_task->qedn_conn;
+	struct nvmetcp_task_params task_params;
+	struct nvmetcp_wqe *chain_sqe;
+	struct nvmetcp_wqe local_sqe;
+	unsigned long lock_flags;
+
+	/* Take lock to prevent race with fastpath, we don't want to
+	 * invoke cleanup flows on tasks that already returned.
+	 */
+	spin_lock_irqsave(&qedn_task->lock, lock_flags);
+	if (!qedn_task->valid) {
+		spin_unlock_irqrestore(&qedn_task->lock, lock_flags);
+
+		return;
+	}
+	/* Skip tasks not used by FW */
+	if (!test_bit(QEDN_TASK_USED_BY_FW, &qedn_task->flags)) {
+		spin_unlock_irqrestore(&qedn_task->lock, lock_flags);
+
+		return;
+	}
+	/* Skip tasks that were already invoked for cleanup */
+	if (unlikely(test_bit(QEDN_TASK_WAIT_FOR_CLEANUP, &qedn_task->flags))) {
+		spin_unlock_irqrestore(&qedn_task->lock, lock_flags);
+
+		return;
+	}
+	set_bit(QEDN_TASK_WAIT_FOR_CLEANUP, &qedn_task->flags);
+	spin_unlock_irqrestore(&qedn_task->lock, lock_flags);
+
+	atomic_inc(&conn_ctx->task_cleanups_cnt);
+
+	task_params.sqe = &local_sqe;
+	task_params.itid = qedn_task->itid;
+	qed_ops->init_task_cleanup(&task_params);
+
+	/* spin_lock - doorbell is accessed by both the Rx flow and the response flow */
+	spin_lock(&conn_ctx->ep.doorbell_lock);
+	chain_sqe = qed_chain_produce(&conn_ctx->ep.fw_sq_chain);
+	memcpy(chain_sqe, &local_sqe, sizeof(local_sqe));
+	qedn_ring_doorbell(conn_ctx);
+	spin_unlock(&conn_ctx->ep.doorbell_lock);
+}
+
+inline int qedn_drain(struct qedn_conn_ctx *conn_ctx)
+{
+	int drain_iter = QEDN_DRAIN_MAX_ATTEMPTS;
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	int wrc;
+
+	while (drain_iter) {
+		qed_ops->common->drain(qedn->cdev);
+		msleep(100);
+
+		wrc = wait_event_interruptible_timeout(conn_ctx->cleanup_waitq,
+						       !atomic_read(&conn_ctx->task_cleanups_cnt),
+						       msecs_to_jiffies(QEDN_DRAIN_TMO));
+		if (!wrc) {
+			drain_iter--;
+			continue;
+		}
+
+		return 0;
+	}
+
+	pr_err("CID 0x%x: cleanup after drain failed - need hard reset.\n", conn_ctx->fw_cid);
+
+	return -EINVAL;
+}
+
+void qedn_cleanup_all_fw_tasks(struct qedn_conn_ctx *conn_ctx)
+{
+	struct qedn_task_ctx *qedn_task, *task_tmp;
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	int wrc;
+
+	list_for_each_entry_safe_reverse(qedn_task, task_tmp, &conn_ctx->active_task_list, entry) {
+		qedn_cleanup_fw_task(qedn, qedn_task);
+	}
+
+	wrc = wait_event_interruptible_timeout(conn_ctx->cleanup_waitq,
+					       atomic_read(&conn_ctx->task_cleanups_cnt) == 0,
+					       msecs_to_jiffies(QEDN_TASK_CLEANUP_TMO));
+	if (!wrc) {
+		if (qedn_drain(conn_ctx))
+			return;
+	}
+}
+
+static void qedn_clear_fw_sq(struct qedn_conn_ctx *conn_ctx)
+{
+	struct qedn_ctx *qedn = conn_ctx->qedn;
+	int rc;
+
+	rc = qed_ops->clear_sq(qedn->cdev, conn_ctx->conn_handle);
+	if (rc)
+		pr_warn("clear_sq failed - rc %d\n", rc);
+}
+
 void qedn_destroy_connection(struct qedn_conn_ctx *conn_ctx)
 {
+	struct nvme_tcp_ofld_req *req, *req_tmp;
 	struct qedn_ctx *qedn = conn_ctx->qedn;
 	int rc;
 
@@ -775,7 +896,17 @@ void qedn_destroy_connection(struct qedn_conn_ctx *conn_ctx)
 	if (qedn_set_con_state(conn_ctx, CONN_STATE_WAIT_FOR_DESTROY_DONE))
 		return;
 
-	/* Placeholder - task cleanup */
+	spin_lock(&conn_ctx->nvme_req_lock);
+	list_for_each_entry_safe(req, req_tmp, &conn_ctx->host_pend_req_list, queue_entry) {
+		list_del(&req->queue_entry);
+	}
+	spin_unlock(&conn_ctx->nvme_req_lock);
+
+	if (atomic_read(&conn_ctx->num_active_fw_tasks)) {
+		conn_ctx->abrt_flag = QEDN_ABORTIVE_TERMINATION;
+		qedn_clear_fw_sq(conn_ctx);
+		qedn_cleanup_all_fw_tasks(conn_ctx);
+	}
 
 	rc = qed_ops->destroy_conn(qedn->cdev, conn_ctx->conn_handle,
 				   conn_ctx->abrt_flag);
diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
index a6756d7250b7..63a4e88d826d 100644
--- a/drivers/nvme/hw/qedn/qedn_main.c
+++ b/drivers/nvme/hw/qedn/qedn_main.c
@@ -323,6 +323,7 @@ static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid, size_t
 	qedn_set_pdu_params(conn_ctx);
 
 	init_waitqueue_head(&conn_ctx->conn_waitq);
+	init_waitqueue_head(&conn_ctx->cleanup_waitq);
 	atomic_set(&conn_ctx->est_conn_indicator, 0);
 	atomic_set(&conn_ctx->destroy_conn_indicator, 0);
 
diff --git a/drivers/nvme/hw/qedn/qedn_task.c b/drivers/nvme/hw/qedn/qedn_task.c
index 13d9fb6ed5b6..4ae6c0f66258 100644
--- a/drivers/nvme/hw/qedn/qedn_task.c
+++ b/drivers/nvme/hw/qedn/qedn_task.c
@@ -327,6 +327,17 @@ void qedn_return_active_tasks(struct qedn_conn_ctx *conn_ctx)
 	/* Return tasks that aren't "Used by FW" to the pool */
 	list_for_each_entry_safe(qedn_task, task_tmp,
 				 &conn_ctx->active_task_list, entry) {
+		/* If we got this far, cleanup was already done
+		 * in which case we want to return the task to the pool and
+		 * release it. So we make sure the cleanup indication is down
+		 */
+		clear_bit(QEDN_TASK_WAIT_FOR_CLEANUP, &qedn_task->flags);
+
+		/* Special handling in case of ICREQ task */
+		if (unlikely(conn_ctx->state ==	CONN_STATE_WAIT_FOR_IC_COMP &&
+			     test_bit(QEDN_TASK_IS_ICREQ, &(qedn_task)->flags)))
+			qedn_common_clear_fw_sgl(&qedn_task->sgl_task_params);
+
 		qedn_clear_task(conn_ctx, qedn_task);
 		num_returned_tasks++;
 	}
@@ -770,7 +781,8 @@ void qedn_io_work_cq(struct qedn_ctx *qedn, struct nvmetcp_fw_cqe *cqe)
 		return;
 
 	if (likely(cqe->cqe_type == NVMETCP_FW_CQE_TYPE_NORMAL)) {
-		/* Placeholder - verify the connection was established */
+		if (unlikely(test_bit(QEDN_TASK_WAIT_FOR_CLEANUP, &qedn_task->flags)))
+			return;
 
 		switch (cqe->task_type) {
 		case NVMETCP_TASK_TYPE_HOST_WRITE:
@@ -811,6 +823,17 @@ void qedn_io_work_cq(struct qedn_ctx *qedn, struct nvmetcp_fw_cqe *cqe)
 			pr_info("Could not identify task type\n");
 		}
 	} else {
-		/* Placeholder - Recovery flows */
+		if (cqe->cqe_type == NVMETCP_FW_CQE_TYPE_CLEANUP) {
+			clear_bit(QEDN_TASK_WAIT_FOR_CLEANUP, &qedn_task->flags);
+			qedn_return_task_to_pool(conn_ctx, qedn_task);
+			atomic_dec(&conn_ctx->task_cleanups_cnt);
+			wake_up_interruptible(&conn_ctx->cleanup_waitq);
+
+			return;
+		}
+
+		/* Otherwise this is NVMETCP_FW_CQE_TYPE_DUMMY, in which case the task
+		 * is not returned here. It is returned on NVMETCP_FW_CQE_TYPE_CLEANUP.
+		 */
 	}
 }
-- 
2.22.0



* [RFC PATCH v4 27/27] qedn: Add support of ASYNC
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
                   ` (25 preceding siblings ...)
  2021-04-29 19:09 ` [RFC PATCH v4 26/27] qedn: Add Connection and IO level recovery flows Shai Malin
@ 2021-04-29 19:09 ` Shai Malin
  2021-05-02 11:59   ` Hannes Reinecke
  2021-05-01 16:47 ` [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Hannes Reinecke
  27 siblings, 1 reply; 81+ messages in thread
From: Shai Malin @ 2021-04-29 19:09 UTC (permalink / raw)
  To: netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net  --cc=Jakub
	Kicinski, smalin, aelior, mkalderon, okulkarni, pkushwaha,
	malin1024

From: Prabhakar Kushwaha <pkushwaha@marvell.com>

This patch implements ASYNC request and response event notification
handling at the qedn driver level.

The NVMe offload layer's ASYNC request is treated like a read with a
fake CCCID. This CCCID is used to route the ASYNC notification back to
the NVMe offload layer.
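
The fake CCCID scheme can be sketched in plain C as follows. This is only an
illustration under stated assumptions: the sketch_* names are hypothetical,
the constants are stand-ins for NVME_AQ_DEPTH and QEDN_MAX_OUTSTAND_ASYNC,
a single 32-bit word replaces the kernel bitmap, and the spinlock that
protects the bitmap in the driver is left out.

```c
#include <stdint.h>

/* Illustrative values, not read from the kernel headers. */
#define SKETCH_AQ_DEPTH      32  /* stands in for NVME_AQ_DEPTH */
#define SKETCH_MAX_ASYNC     32  /* stands in for QEDN_MAX_OUTSTAND_ASYNC */
#define SKETCH_INVALID_CCCID (-1)

struct sketch_conn {
	/* Bit i set means pseudo CCCID (SKETCH_AQ_DEPTH + i) is in use. */
	uint32_t async_map;
};

/* Claim the first free slot and return it offset by the admin queue
 * depth, so that async CCCIDs never collide with regular tags.
 * Returns SKETCH_INVALID_CCCID when every slot is taken.
 */
static int sketch_get_async_cccid(struct sketch_conn *c)
{
	int i;

	for (i = 0; i < SKETCH_MAX_ASYNC; i++) {
		if (!(c->async_map & (UINT32_C(1) << i))) {
			c->async_map |= UINT32_C(1) << i;
			return SKETCH_AQ_DEPTH + i;
		}
	}

	return SKETCH_INVALID_CCCID;
}

/* Release a pseudo CCCID once the async completion was delivered. */
static void sketch_put_async_cccid(struct sketch_conn *c, int cccid)
{
	c->async_map &= ~(UINT32_C(1) << (cccid - SKETCH_AQ_DEPTH));
}
```

The allocator returns an int rather than a 16-bit value so that the invalid
marker stays distinguishable from a valid CCCID at the call site.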

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
---
 drivers/nvme/hw/qedn/qedn.h      |   8 ++
 drivers/nvme/hw/qedn/qedn_main.c |   1 +
 drivers/nvme/hw/qedn/qedn_task.c | 156 +++++++++++++++++++++++++++++--
 3 files changed, 156 insertions(+), 9 deletions(-)

diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
index fed4252392e0..067dc45027d4 100644
--- a/drivers/nvme/hw/qedn/qedn.h
+++ b/drivers/nvme/hw/qedn/qedn.h
@@ -109,6 +109,9 @@
 #define QEDN_TASK_CLEANUP_TMO 3000 /* 3 sec */
 #define QEDN_DRAIN_TMO 1000 /* 1 sec */
 
+#define QEDN_MAX_OUTSTAND_ASYNC 32
+#define QEDN_INVALID_CCCID (-1)
+
 enum qedn_state {
 	QEDN_STATE_CORE_PROBED = 0,
 	QEDN_STATE_CORE_OPEN,
@@ -196,6 +199,7 @@ struct qedn_ctx {
 
 enum qedn_task_flags {
 	QEDN_TASK_IS_ICREQ,
+	QEDN_TASK_ASYNC,
 	QEDN_TASK_USED_BY_FW,
 	QEDN_TASK_WAIT_FOR_CLEANUP,
 };
@@ -373,6 +377,10 @@ struct qedn_conn_ctx {
 	struct nvmetcp_icresp_hdr_psh icresp;
 	struct qedn_icreq_padding *icreq_pad;
 
+	DECLARE_BITMAP(async_cccid_idx_map, QEDN_MAX_OUTSTAND_ASYNC);
+	/* Spinlock for fetching pseudo CCCID for async request */
+	spinlock_t async_cccid_bitmap_lock;
+
 	/* "dummy" socket */
 	struct socket *sock;
 };
diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
index 63a4e88d826d..d2575767be47 100644
--- a/drivers/nvme/hw/qedn/qedn_main.c
+++ b/drivers/nvme/hw/qedn/qedn_main.c
@@ -328,6 +328,7 @@ static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid, size_t
 	atomic_set(&conn_ctx->destroy_conn_indicator, 0);
 
 	spin_lock_init(&conn_ctx->conn_state_lock);
+	spin_lock_init(&conn_ctx->async_cccid_bitmap_lock);
 
 	INIT_WORK(&conn_ctx->nvme_req_fp_wq_entry, qedn_nvme_req_fp_wq_handler);
 	conn_ctx->nvme_req_fp_wq = qedn->nvme_req_fp_wq;
diff --git a/drivers/nvme/hw/qedn/qedn_task.c b/drivers/nvme/hw/qedn/qedn_task.c
index 4ae6c0f66258..9ed6bc6c5c8a 100644
--- a/drivers/nvme/hw/qedn/qedn_task.c
+++ b/drivers/nvme/hw/qedn/qedn_task.c
@@ -259,10 +259,45 @@ void qedn_common_clear_fw_sgl(struct storage_sgl_task_params *sgl_task_params)
 	sgl_task_params->num_sges = 0;
 }
 
-inline void qedn_host_reset_cccid_itid_entry(struct qedn_conn_ctx *conn_ctx,
-					     u16 cccid)
+inline void qedn_host_reset_cccid_itid_entry(struct qedn_conn_ctx *conn_ctx, u16 cccid, bool async)
 {
 	conn_ctx->host_cccid_itid[cccid].itid = cpu_to_le16(QEDN_INVALID_ITID);
+	if (unlikely(async))
+		clear_bit(cccid - NVME_AQ_DEPTH,
+			  conn_ctx->async_cccid_idx_map);
+}
+
+static int qedn_get_free_idx(struct qedn_conn_ctx *conn_ctx, unsigned int size)
+{
+	int idx;
+
+	spin_lock(&conn_ctx->async_cccid_bitmap_lock);
+	idx = find_first_zero_bit(conn_ctx->async_cccid_idx_map, size);
+	if (unlikely(idx >= size)) {
+		idx = -1;
+		spin_unlock(&conn_ctx->async_cccid_bitmap_lock);
+		goto err_idx;
+	}
+	set_bit(idx, conn_ctx->async_cccid_idx_map);
+	spin_unlock(&conn_ctx->async_cccid_bitmap_lock);
+
+err_idx:
+
+	return idx;
+}
+
+int qedn_get_free_async_cccid(struct qedn_conn_ctx *conn_ctx)
+{
+	int async_cccid;
+
+	async_cccid =
+		qedn_get_free_idx(conn_ctx, QEDN_MAX_OUTSTAND_ASYNC);
+	if (unlikely(async_cccid == QEDN_INVALID_CCCID))
+		pr_err("No available CCCID for Async.\n");
+	else
+		async_cccid += NVME_AQ_DEPTH;
+
+	return async_cccid;
 }
 
 inline void qedn_host_set_cccid_itid_entry(struct qedn_conn_ctx *conn_ctx, u16 cccid, u16 itid)
@@ -363,10 +398,12 @@ void qedn_return_task_to_pool(struct qedn_conn_ctx *conn_ctx,
 	struct qedn_fp_queue *fp_q = conn_ctx->fp_q;
 	struct qedn_io_resources *io_resrc;
 	unsigned long lock_flags;
+	bool async;
 
 	io_resrc = &fp_q->host_resrc;
 
 	spin_lock_irqsave(&qedn_task->lock, lock_flags);
+	async = test_bit(QEDN_TASK_ASYNC, &(qedn_task)->flags);
 	qedn_task->valid = 0;
 	qedn_task->flags = 0;
 	qedn_clear_sgl(conn_ctx->qedn, qedn_task);
@@ -374,7 +411,7 @@ void qedn_return_task_to_pool(struct qedn_conn_ctx *conn_ctx,
 
 	spin_lock(&conn_ctx->task_list_lock);
 	list_del(&qedn_task->entry);
-	qedn_host_reset_cccid_itid_entry(conn_ctx, qedn_task->cccid);
+	qedn_host_reset_cccid_itid_entry(conn_ctx, qedn_task->cccid, async);
 	spin_unlock(&conn_ctx->task_list_lock);
 
 	atomic_dec(&conn_ctx->num_active_tasks);
@@ -447,6 +484,67 @@ qedn_get_task_from_pool_insist(struct qedn_conn_ctx *conn_ctx, u16 cccid)
 	return qedn_task;
 }
 
+void qedn_send_async_event_cmd(struct qedn_task_ctx *qedn_task,
+			       struct qedn_conn_ctx *conn_ctx)
+{
+	struct nvme_tcp_ofld_req *async_req = qedn_task->req;
+	struct nvme_command *nvme_cmd = &async_req->nvme_cmd;
+	struct storage_sgl_task_params *sgl_task_params;
+	struct nvmetcp_task_params task_params;
+	struct nvmetcp_cmd_capsule_hdr cmd_hdr;
+	struct nvmetcp_conn_params conn_params;
+	struct nvmetcp_wqe *chain_sqe;
+	struct nvmetcp_wqe local_sqe;
+	int i;
+
+	set_bit(QEDN_TASK_ASYNC, &qedn_task->flags);
+	nvme_cmd->common.command_id = qedn_task->cccid;
+	qedn_task->task_size = 0;
+
+	/* Initialize sgl params */
+	sgl_task_params = &qedn_task->sgl_task_params;
+	sgl_task_params->total_buffer_size = 0;
+	sgl_task_params->num_sges = 0;
+	sgl_task_params->small_mid_sge = false;
+
+	task_params.opq.lo = cpu_to_le32(((u64)(qedn_task)) & 0xffffffff);
+	task_params.opq.hi = cpu_to_le32(((u64)(qedn_task)) >> 32);
+
+	/* Initialize task params */
+	task_params.context = qedn_task->fw_task_ctx;
+	task_params.sqe = &local_sqe;
+	task_params.tx_io_size = 0;
+	task_params.rx_io_size = 0;
+	task_params.conn_icid = (u16)conn_ctx->conn_handle;
+	task_params.itid = qedn_task->itid;
+	task_params.cq_rss_number = conn_ctx->default_cq;
+	task_params.send_write_incapsule = 0;
+
+	/* Initialize conn params */
+	conn_params.max_burst_length = QEDN_MAX_IO_SIZE;
+
+	/* Internal impl. - async is treated like zero len read */
+	cmd_hdr.chdr.pdu_type = nvme_tcp_cmd;
+	cmd_hdr.chdr.flags = 0;
+	cmd_hdr.chdr.hlen = sizeof(cmd_hdr);
+	cmd_hdr.chdr.pdo = 0x0;
+	cmd_hdr.chdr.plen_swapped = cpu_to_le32(__swab32(cmd_hdr.chdr.hlen));
+
+	for (i = 0; i < 16; i++)
+		cmd_hdr.pshdr.raw_swapped[i] = cpu_to_le32(__swab32(((u32 *)nvme_cmd)[i]));
+
+	qed_ops->init_read_io(&task_params, &conn_params, &cmd_hdr, &qedn_task->sgl_task_params);
+
+	set_bit(QEDN_TASK_USED_BY_FW, &qedn_task->flags);
+	atomic_inc(&conn_ctx->num_active_fw_tasks);
+
+	spin_lock(&conn_ctx->ep.doorbell_lock);
+	chain_sqe = qed_chain_produce(&conn_ctx->ep.fw_sq_chain);
+	memcpy(chain_sqe, &local_sqe, sizeof(local_sqe));
+	qedn_ring_doorbell(conn_ctx);
+	spin_unlock(&conn_ctx->ep.doorbell_lock);
+}
+
 int qedn_send_read_cmd(struct qedn_task_ctx *qedn_task, struct qedn_conn_ctx *conn_ctx)
 {
 	struct nvme_command *nvme_cmd = &qedn_task->req->nvme_cmd;
@@ -580,6 +678,24 @@ static void qedn_fetch_request(struct qedn_conn_ctx *qedn_conn)
 	spin_unlock(&qedn_conn->nvme_req_lock);
 }
 
+static void qedn_return_error_req(struct nvme_tcp_ofld_req *req)
+{
+	__le16 status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
+	union nvme_result res = {};
+	struct request *rq;
+
+	if (!req)
+		return;
+
+	rq = blk_mq_rq_from_pdu(req);
+
+	/* Call request done to complete the request */
+	if (req->done)
+		req->done(req, &res, status);
+	else
+		pr_err("request done not set !!!\n");
+}
+
 static bool qedn_process_req(struct qedn_conn_ctx *qedn_conn)
 {
 	struct qedn_task_ctx *qedn_task;
@@ -595,9 +711,16 @@ static bool qedn_process_req(struct qedn_conn_ctx *qedn_conn)
 	req = qedn_conn->req;
 	rq = blk_mq_rq_from_pdu(req);
 
-	/* Placeholder - async */
+	if (unlikely(req->async)) {
+		cccid = qedn_get_free_async_cccid(qedn_conn);
+		if (cccid == (u16)QEDN_INVALID_CCCID) {
+			qedn_return_error_req(req);
+			goto doorbell;
+		}
+	} else {
+		cccid = rq->tag;
+	}
 
-	cccid = rq->tag;
 	qedn_task = qedn_get_task_from_pool_insist(qedn_conn, cccid);
 	if (unlikely(!qedn_task)) {
 		pr_err("Not able to allocate task context\n");
@@ -607,7 +730,10 @@ static bool qedn_process_req(struct qedn_conn_ctx *qedn_conn)
 	req->private_data = qedn_task;
 	qedn_task->req = req;
 
-	/* Placeholder - handle (req->async) */
+	if (unlikely(req->async)) {
+		qedn_send_async_event_cmd(qedn_task, qedn_conn);
+		goto doorbell;
+	}
 
 	/* Check if there are physical segments in request to determine the task size.
 	 * The logic of nvme_tcp_set_sg_null() will be implemented as part of
@@ -732,14 +858,26 @@ static inline int qedn_comp_valid_task(struct qedn_task_ctx *qedn_task,
 
 int qedn_process_nvme_cqe(struct qedn_task_ctx *qedn_task, struct nvme_completion *cqe)
 {
+	struct qedn_conn_ctx *conn_ctx = qedn_task->qedn_conn;
+	struct nvme_tcp_ofld_req *req;
 	int rc = 0;
+	bool async;
+
+	async = test_bit(QEDN_TASK_ASYNC, &(qedn_task)->flags);
 
 	/* cqe arrives swapped */
 	qedn_swap_bytes((u32 *)cqe, (sizeof(*cqe) / sizeof(u32)));
 
-	/* Placeholder - async */
-
-	rc = qedn_comp_valid_task(qedn_task, &cqe->result, cqe->status);
+	if (unlikely(async)) {
+		qedn_return_task_to_pool(conn_ctx, qedn_task);
+		req = qedn_task->req;
+		if (req->done)
+			req->done(req, &cqe->result, cqe->status);
+		else
+			pr_err("request done not set for async request !!!\n");
+	} else {
+		rc = qedn_comp_valid_task(qedn_task, &cqe->result, cqe->status);
+	}
 
 	return rc;
 }
-- 
2.22.0



* Re: [RFC PATCH v4 08/27] nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP
  2021-04-29 19:09 ` [RFC PATCH v4 08/27] nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP Shai Malin
@ 2021-05-01 12:18   ` Hannes Reinecke
  2021-05-03 15:46     ` Shai Malin
  0 siblings, 1 reply; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-01 12:18 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024,
	Dean Balandin

On 4/29/21 9:09 PM, Shai Malin wrote:
> This patch will present the structure for the NVMeTCP offload common
> layer driver. This module is added under "drivers/nvme/host/" and future
> offload drivers which will register to it will be placed under
> "drivers/nvme/hw".
> This new driver will be enabled by the Kconfig "NVM Express over Fabrics
> TCP offload common layer".
> In order to support the new transport type, for host mode, no change is
> needed.
> 
> Each new vendor-specific offload driver will register to this ULP during
> its probe function, by filling out the nvme_tcp_ofld_dev->ops and
> nvme_tcp_ofld_dev->private_data and calling nvme_tcp_ofld_register_dev
> with the initialized struct.
> 
> The internal implementation:
> - tcp-offload.h:
>    Includes all common structs and ops to be used and shared by offload
>    drivers.
> 
> - tcp-offload.c:
>    Includes the init function which registers as a NVMf transport just
>    like any other transport.
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Dean Balandin <dbalandin@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> ---
>   drivers/nvme/host/Kconfig       |  16 +++
>   drivers/nvme/host/Makefile      |   3 +
>   drivers/nvme/host/tcp-offload.c | 126 +++++++++++++++++++
>   drivers/nvme/host/tcp-offload.h | 206 ++++++++++++++++++++++++++++++++
>   4 files changed, 351 insertions(+)
>   create mode 100644 drivers/nvme/host/tcp-offload.c
>   create mode 100644 drivers/nvme/host/tcp-offload.h
> 
It will be tricky to select the correct transport, e.g. when traversing the
discovery log page; the discovery log page only knows about 'tcp' (not
'tcp_offload'), so the offload won't be picked up.
But that can be worked on / fixed later on, as it's arguably a policy
decision.
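
To make the policy question concrete, here is a minimal user-space sketch of
one possible rule. Everything in it is an assumption for illustration:
pick_transport() and its two flags are hypothetical and exist neither in the
patch nor in nvme-cli; only the transport names 'tcp' and 'tcp_offload' come
from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical policy helper: map the transport type string reported by a
 * discovery log entry to the transport name used for the connect.
 */
const char *pick_transport(const char *log_trtype, bool prefer_offload,
			   bool offload_available)
{
	assert(log_trtype != NULL);

	/* Only plain "tcp" entries are candidates for the offload path. */
	if (strcmp(log_trtype, "tcp") != 0)
		return log_trtype;

	/* Assumed policy: offload only when requested and present. */
	if (prefer_offload && offload_available)
		return "tcp_offload";

	return "tcp";
}
```

Whether such a preference belongs in user space or in the kernel is exactly
the open policy decision; the sketch takes no position on that.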

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

* Re: [RFC PATCH v4 09/27] nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS definitions
  2021-04-29 19:09 ` [RFC PATCH v4 09/27] nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS definitions Shai Malin
@ 2021-05-01 12:19   ` Hannes Reinecke
  2021-05-03 15:50     ` Shai Malin
  0 siblings, 1 reply; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-01 12:19 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024,
	Arie Gershberg

On 4/29/21 9:09 PM, Shai Malin wrote:
> From: Arie Gershberg <agershberg@marvell.com>
> 
> Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS definitions
> to header file, so it can be used by transport modules.
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Arie Gershberg <agershberg@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> ---
>   drivers/nvme/host/fabrics.c | 7 -------
>   drivers/nvme/host/fabrics.h | 7 +++++++
>   2 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
> index 604ab0e5a2ad..55d7125c8483 100644
> --- a/drivers/nvme/host/fabrics.c
> +++ b/drivers/nvme/host/fabrics.c
> @@ -1001,13 +1001,6 @@ void nvmf_free_options(struct nvmf_ctrl_options *opts)
>   }
>   EXPORT_SYMBOL_GPL(nvmf_free_options);
>   
> -#define NVMF_REQUIRED_OPTS	(NVMF_OPT_TRANSPORT | NVMF_OPT_NQN)
> -#define NVMF_ALLOWED_OPTS	(NVMF_OPT_QUEUE_SIZE | NVMF_OPT_NR_IO_QUEUES | \
> -				 NVMF_OPT_KATO | NVMF_OPT_HOSTNQN | \
> -				 NVMF_OPT_HOST_ID | NVMF_OPT_DUP_CONNECT |\
> -				 NVMF_OPT_DISABLE_SQFLOW |\
> -				 NVMF_OPT_FAIL_FAST_TMO)
> -
>   static struct nvme_ctrl *
>   nvmf_create_ctrl(struct device *dev, const char *buf)
>   {
> diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
> index 888b108d87a4..b7627e8dcaaf 100644
> --- a/drivers/nvme/host/fabrics.h
> +++ b/drivers/nvme/host/fabrics.h
> @@ -68,6 +68,13 @@ enum {
>   	NVMF_OPT_FAIL_FAST_TMO	= 1 << 20,
>   };
>   
> +#define NVMF_REQUIRED_OPTS	(NVMF_OPT_TRANSPORT | NVMF_OPT_NQN)
> +#define NVMF_ALLOWED_OPTS	(NVMF_OPT_QUEUE_SIZE | NVMF_OPT_NR_IO_QUEUES | \
> +				 NVMF_OPT_KATO | NVMF_OPT_HOSTNQN | \
> +				 NVMF_OPT_HOST_ID | NVMF_OPT_DUP_CONNECT |\
> +				 NVMF_OPT_DISABLE_SQFLOW |\
> +				 NVMF_OPT_FAIL_FAST_TMO)
> +
>   /**
>    * struct nvmf_ctrl_options - Used to hold the options specified
>    *			      with the parsing opts enum.
> 

Why do you need them? None of the other transport drivers use them, so why does this one?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

* Re: [RFC PATCH v4 10/27] nvme-tcp-offload: Add device scan implementation
  2021-04-29 19:09 ` [RFC PATCH v4 10/27] nvme-tcp-offload: Add device scan implementation Shai Malin
@ 2021-05-01 12:25   ` Hannes Reinecke
  2021-05-05 17:52     ` Shai Malin
  0 siblings, 1 reply; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-01 12:25 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024,
	Dean Balandin

On 4/29/21 9:09 PM, Shai Malin wrote:
> From: Dean Balandin <dbalandin@marvell.com>
> 
> As part of create_ctrl(), it scans the registered devices and calls
> the claim_dev op on each of them, to find the first device that matches
> the connection params. Once the correct device is found (claim_dev
> returns true), we raise the refcnt of that device and return that device
> as the device to be used for the ctrl currently being created.
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Dean Balandin <dbalandin@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> ---
>   drivers/nvme/host/tcp-offload.c | 94 +++++++++++++++++++++++++++++++++
>   1 file changed, 94 insertions(+)
> 
> diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
> index 711232eba339..aa7cc239abf2 100644
> --- a/drivers/nvme/host/tcp-offload.c
> +++ b/drivers/nvme/host/tcp-offload.c
> @@ -13,6 +13,11 @@
>   static LIST_HEAD(nvme_tcp_ofld_devices);
>   static DECLARE_RWSEM(nvme_tcp_ofld_devices_rwsem);
>   
> +static inline struct nvme_tcp_ofld_ctrl *to_tcp_ofld_ctrl(struct nvme_ctrl *nctrl)
> +{
> +	return container_of(nctrl, struct nvme_tcp_ofld_ctrl, nctrl);
> +}
> +
>   /**
>    * nvme_tcp_ofld_register_dev() - NVMeTCP Offload Library registration
>    * function.
> @@ -98,6 +103,94 @@ void nvme_tcp_ofld_req_done(struct nvme_tcp_ofld_req *req,
>   	/* Placeholder - complete request with/without error */
>   }
>   
> +struct nvme_tcp_ofld_dev *
> +nvme_tcp_ofld_lookup_dev(struct nvme_tcp_ofld_ctrl *ctrl)
> +{
> +	struct nvme_tcp_ofld_dev *dev;
> +
> +	down_read(&nvme_tcp_ofld_devices_rwsem);
> +	list_for_each_entry(dev, &nvme_tcp_ofld_devices, entry) {
> +		if (dev->ops->claim_dev(dev, &ctrl->conn_params)) {
> +			/* Increase driver refcnt */
> +			if (!try_module_get(dev->ops->module)) {
> +				pr_err("try_module_get failed\n");
> +				dev = NULL;
> +			}
> +
> +			goto out;
> +		}
> +	}
> +
> +	dev = NULL;
> +out:
> +	up_read(&nvme_tcp_ofld_devices_rwsem);
> +
> +	return dev;
> +}
> +
> +static int nvme_tcp_ofld_setup_ctrl(struct nvme_ctrl *nctrl, bool new)
> +{
> +	/* Placeholder - validates inputs and creates admin and IO queues */
> +
> +	return 0;
> +}
> +
> +static struct nvme_ctrl *
> +nvme_tcp_ofld_create_ctrl(struct device *ndev, struct nvmf_ctrl_options *opts)
> +{
> +	struct nvme_tcp_ofld_ctrl *ctrl;
> +	struct nvme_tcp_ofld_dev *dev;
> +	struct nvme_ctrl *nctrl;
> +	int rc = 0;
> +
> +	ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
> +	if (!ctrl)
> +		return ERR_PTR(-ENOMEM);
> +
> +	nctrl = &ctrl->nctrl;
> +
> +	/* Init nvme_tcp_ofld_ctrl and nvme_ctrl params based on received opts */
> +
> +	/* Find device that can reach the dest addr */
> +	dev = nvme_tcp_ofld_lookup_dev(ctrl);
> +	if (!dev) {
> +		pr_info("no device found for addr %s:%s.\n",
> +			opts->traddr, opts->trsvcid);
> +		rc = -EINVAL;
> +		goto out_free_ctrl;
> +	}
> +
> +	ctrl->dev = dev;
> +
> +	if (ctrl->dev->ops->max_hw_sectors)
> +		nctrl->max_hw_sectors = ctrl->dev->ops->max_hw_sectors;
> +	if (ctrl->dev->ops->max_segments)
> +		nctrl->max_segments = ctrl->dev->ops->max_segments;
> +
> +	/* Init queues */
> +
> +	/* Call nvme_init_ctrl */
> +
> +	rc = ctrl->dev->ops->setup_ctrl(ctrl, true);
> +	if (rc)
> +		goto out_module_put;
> +
> +	rc = nvme_tcp_ofld_setup_ctrl(nctrl, true);
> +	if (rc)
> +		goto out_uninit_ctrl;
> +
> +	return nctrl;
> +
> +out_uninit_ctrl:
> +	ctrl->dev->ops->release_ctrl(ctrl);
> +out_module_put:
> +	module_put(dev->ops->module);
> +out_free_ctrl:
> +	kfree(ctrl);
> +
> +	return ERR_PTR(rc);
> +}
> +
>   static struct nvmf_transport_ops nvme_tcp_ofld_transport = {
>   	.name		= "tcp_offload",
>   	.module		= THIS_MODULE,
> @@ -107,6 +200,7 @@ static struct nvmf_transport_ops nvme_tcp_ofld_transport = {
>   			  NVMF_OPT_RECONNECT_DELAY | NVMF_OPT_HDR_DIGEST |
>   			  NVMF_OPT_DATA_DIGEST | NVMF_OPT_NR_POLL_QUEUES |
>   			  NVMF_OPT_TOS,
> +	.create_ctrl	= nvme_tcp_ofld_create_ctrl,
>   };
>   
>   static int __init nvme_tcp_ofld_init_module(void)
> 
I wonder if we shouldn't take the approach from Martin Belanger, and 
introduce a new option 'host_iface' to select the interface to use.
That is, _if_ the nvme-tcp offload driver would present itself as a 
network interface; one might argue that it would put too much 
restriction on the implementations.
But if it does not present itself as a network interface, how do we 
address it? And if it does, wouldn't we be better off to specify the 
interface directly, and not try to imply the interface from the IP address?
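
The difference between the two addressing schemes can be sketched in user
space as follows. This is an illustration under stated assumptions, not the
patch's implementation: struct sketch_dev, its fields, and both lookup
helpers are hypothetical, and the prefix match merely stands in for whatever
a vendor's claim_dev op would really check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a registered offload device. */
struct sketch_dev {
	const char *ifname;	/* assumed: netdev name the device exposes */
	const char *prefix;	/* assumed: address prefix it can reach */
};

/* Implicit selection: the first device that claims the target address. */
const struct sketch_dev *lookup_by_addr(const struct sketch_dev *devs,
					size_t n, const char *traddr)
{
	size_t i;

	assert(traddr != NULL);
	for (i = 0; i < n; i++) {
		if (strncmp(traddr, devs[i].prefix,
			    strlen(devs[i].prefix)) == 0)
			return &devs[i];
	}

	return NULL;
}

/* Explicit selection: the caller names the interface to use. */
const struct sketch_dev *lookup_by_iface(const struct sketch_dev *devs,
					 size_t n, const char *host_iface)
{
	size_t i;

	assert(host_iface != NULL);
	for (i = 0; i < n; i++) {
		if (strcmp(devs[i].ifname, host_iface) == 0)
			return &devs[i];
	}

	return NULL;
}
```

With two devices that can both reach the target, the address-based lookup
always returns the first one in list order, whereas the interface-based
lookup lets the caller pick either.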

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

* Re: [RFC PATCH v4 12/27] nvme-tcp-offload: Add controller level error recovery implementation
  2021-04-29 19:09 ` [RFC PATCH v4 12/27] nvme-tcp-offload: Add controller level error recovery implementation Shai Malin
@ 2021-05-01 16:29   ` Hannes Reinecke
  2021-05-03 15:52     ` Shai Malin
  0 siblings, 1 reply; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-01 16:29 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024,
	Arie Gershberg

On 4/29/21 9:09 PM, Shai Malin wrote:
> From: Arie Gershberg <agershberg@marvell.com>
> 
> In this patch, we implement controller level error handling and recovery.
> Upon an error discovered by the ULP or reset controller initiated by the
> nvme-core (using reset_ctrl workqueue), the ULP will initiate a controller
> recovery which includes teardown and re-connect of all queues.
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Arie Gershberg <agershberg@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> ---
>   drivers/nvme/host/tcp-offload.c | 138 +++++++++++++++++++++++++++++++-
>   drivers/nvme/host/tcp-offload.h |   1 +
>   2 files changed, 137 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
> index 59e1955e02ec..9082b11c133f 100644
> --- a/drivers/nvme/host/tcp-offload.c
> +++ b/drivers/nvme/host/tcp-offload.c
> @@ -74,6 +74,23 @@ void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev)
>   }
>   EXPORT_SYMBOL_GPL(nvme_tcp_ofld_unregister_dev);
>   
> +/**
> + * nvme_tcp_ofld_error_recovery() - NVMeTCP Offload Library error recovery.
> + * function.
> + * @nctrl:	NVMe controller instance to change to resetting.
> + *
> + * API function that changes the controller state to resetting.
> + * Part of the overall controller reset sequence.
> + */
> +void nvme_tcp_ofld_error_recovery(struct nvme_ctrl *nctrl)
> +{
> +	if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_RESETTING))
> +		return;
> +
> +	queue_work(nvme_reset_wq, &to_tcp_ofld_ctrl(nctrl)->err_work);
> +}
> +EXPORT_SYMBOL_GPL(nvme_tcp_ofld_error_recovery);
> +
>   /**
>    * nvme_tcp_ofld_report_queue_err() - NVMeTCP Offload report error event
>    * callback function. Pointed to by nvme_tcp_ofld_queue->report_err.
> @@ -84,7 +101,8 @@ EXPORT_SYMBOL_GPL(nvme_tcp_ofld_unregister_dev);
>    */
>   int nvme_tcp_ofld_report_queue_err(struct nvme_tcp_ofld_queue *queue)
>   {
> -	/* Placeholder - invoke error recovery flow */
> +	pr_err("nvme-tcp-offload queue error\n");
> +	nvme_tcp_ofld_error_recovery(&queue->ctrl->nctrl);
>   
>   	return 0;
>   }
> @@ -296,6 +314,28 @@ nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new)
>   	return rc;
>   }
>   
> +static void nvme_tcp_ofld_reconnect_or_remove(struct nvme_ctrl *nctrl)
> +{
> +	/* If we are resetting/deleting then do nothing */
> +	if (nctrl->state != NVME_CTRL_CONNECTING) {
> +		WARN_ON_ONCE(nctrl->state == NVME_CTRL_NEW ||
> +			     nctrl->state == NVME_CTRL_LIVE);
> +
> +		return;
> +	}
> +
> +	if (nvmf_should_reconnect(nctrl)) {
> +		dev_info(nctrl->device, "Reconnecting in %d seconds...\n",
> +			 nctrl->opts->reconnect_delay);
> +		queue_delayed_work(nvme_wq,
> +				   &to_tcp_ofld_ctrl(nctrl)->connect_work,
> +				   nctrl->opts->reconnect_delay * HZ);
> +	} else {
> +		dev_info(nctrl->device, "Removing controller...\n");
> +		nvme_delete_ctrl(nctrl);
> +	}
> +}
> +
>   static int nvme_tcp_ofld_setup_ctrl(struct nvme_ctrl *nctrl, bool new)
>   {
>   	struct nvmf_ctrl_options *opts = nctrl->opts;
> @@ -407,10 +447,68 @@ nvme_tcp_ofld_teardown_io_queues(struct nvme_ctrl *nctrl, bool remove)
>   	/* Placeholder - teardown_io_queues */
>   }
>   
> +static void nvme_tcp_ofld_reconnect_ctrl_work(struct work_struct *work)
> +{
> +	struct nvme_tcp_ofld_ctrl *ctrl =
> +				container_of(to_delayed_work(work),
> +					     struct nvme_tcp_ofld_ctrl,
> +					     connect_work);
> +	struct nvme_ctrl *nctrl = &ctrl->nctrl;
> +
> +	++nctrl->nr_reconnects;
> +
> +	if (ctrl->dev->ops->setup_ctrl(ctrl, false))
> +		goto requeue;
> +
> +	if (nvme_tcp_ofld_setup_ctrl(nctrl, false))
> +		goto release_and_requeue;
> +
> +	dev_info(nctrl->device, "Successfully reconnected (%d attempt)\n",
> +		 nctrl->nr_reconnects);
> +
> +	nctrl->nr_reconnects = 0;
> +
> +	return;
> +
> +release_and_requeue:
> +	ctrl->dev->ops->release_ctrl(ctrl);
> +requeue:
> +	dev_info(nctrl->device, "Failed reconnect attempt %d\n",
> +		 nctrl->nr_reconnects);
> +	nvme_tcp_ofld_reconnect_or_remove(nctrl);
> +}
> +
> +static void nvme_tcp_ofld_error_recovery_work(struct work_struct *work)
> +{
> +	struct nvme_tcp_ofld_ctrl *ctrl =
> +		container_of(work, struct nvme_tcp_ofld_ctrl, err_work);
> +	struct nvme_ctrl *nctrl = &ctrl->nctrl;
> +
> +	nvme_stop_keep_alive(nctrl);
> +	nvme_tcp_ofld_teardown_io_queues(nctrl, false);
> +	/* unquiesce to fail fast pending requests */
> +	nvme_start_queues(nctrl);
> +	nvme_tcp_ofld_teardown_admin_queue(nctrl, false);
> +	blk_mq_unquiesce_queue(nctrl->admin_q);
> +
> +	if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_CONNECTING)) {
> +		/* state change failure is ok if we started nctrl delete */
> +		WARN_ON_ONCE(nctrl->state != NVME_CTRL_DELETING &&
> +			     nctrl->state != NVME_CTRL_DELETING_NOIO);
> +
> +		return;
> +	}
> +
> +	nvme_tcp_ofld_reconnect_or_remove(nctrl);
> +}
> +
>   static void
>   nvme_tcp_ofld_teardown_ctrl(struct nvme_ctrl *nctrl, bool shutdown)
>   {
> -	/* Placeholder - err_work and connect_work */
> +	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
> +
> +	cancel_work_sync(&ctrl->err_work);
> +	cancel_delayed_work_sync(&ctrl->connect_work);
>   	nvme_tcp_ofld_teardown_io_queues(nctrl, shutdown);
>   	blk_mq_quiesce_queue(nctrl->admin_q);
>   	if (shutdown)
> @@ -425,6 +523,38 @@ static void nvme_tcp_ofld_delete_ctrl(struct nvme_ctrl *nctrl)
>   	nvme_tcp_ofld_teardown_ctrl(nctrl, true);
>   }
>   
> +static void nvme_tcp_ofld_reset_ctrl_work(struct work_struct *work)
> +{
> +	struct nvme_ctrl *nctrl =
> +		container_of(work, struct nvme_ctrl, reset_work);
> +	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
> +
> +	nvme_stop_ctrl(nctrl);
> +	nvme_tcp_ofld_teardown_ctrl(nctrl, false);
> +
> +	if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_CONNECTING)) {
> +		/* state change failure is ok if we started ctrl delete */
> +		WARN_ON_ONCE(nctrl->state != NVME_CTRL_DELETING &&
> +			     nctrl->state != NVME_CTRL_DELETING_NOIO);
> +
> +		return;
> +	}
> +
> +	if (ctrl->dev->ops->setup_ctrl(ctrl, false))
> +		goto out_fail;
> +
> +	if (nvme_tcp_ofld_setup_ctrl(nctrl, false))
> +		goto release_ctrl;
> +
> +	return;
> +
> +release_ctrl:
> +	ctrl->dev->ops->release_ctrl(ctrl);
> +out_fail:
> +	++nctrl->nr_reconnects;
> +	nvme_tcp_ofld_reconnect_or_remove(nctrl);
> +}
> +
>   static int
>   nvme_tcp_ofld_init_request(struct blk_mq_tag_set *set,
>   			   struct request *rq,
> @@ -521,6 +651,10 @@ nvme_tcp_ofld_create_ctrl(struct device *ndev, struct nvmf_ctrl_options *opts)
>   			     opts->nr_poll_queues + 1;
>   	nctrl->sqsize = opts->queue_size - 1;
>   	nctrl->kato = opts->kato;
> +	INIT_DELAYED_WORK(&ctrl->connect_work,
> +			  nvme_tcp_ofld_reconnect_ctrl_work);
> +	INIT_WORK(&ctrl->err_work, nvme_tcp_ofld_error_recovery_work);
> +	INIT_WORK(&nctrl->reset_work, nvme_tcp_ofld_reset_ctrl_work);
>   	if (!(opts->mask & NVMF_OPT_TRSVCID)) {
>   		opts->trsvcid =
>   			kstrdup(__stringify(NVME_TCP_DISC_PORT), GFP_KERNEL);
> diff --git a/drivers/nvme/host/tcp-offload.h b/drivers/nvme/host/tcp-offload.h
> index 9fd270240eaa..b23b1d7ea6fa 100644
> --- a/drivers/nvme/host/tcp-offload.h
> +++ b/drivers/nvme/host/tcp-offload.h
> @@ -204,3 +204,4 @@ struct nvme_tcp_ofld_ops {
>   /* Exported functions for lower vendor specific offload drivers */
>   int nvme_tcp_ofld_register_dev(struct nvme_tcp_ofld_dev *dev);
>   void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev);
> +void nvme_tcp_ofld_error_recovery(struct nvme_ctrl *nctrl);
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

* Re: [RFC PATCH v4 13/27] nvme-tcp-offload: Add queue level implementation
  2021-04-29 19:09 ` [RFC PATCH v4 13/27] nvme-tcp-offload: Add queue level implementation Shai Malin
@ 2021-05-01 16:36   ` Hannes Reinecke
  2021-05-03 15:56     ` Shai Malin
  0 siblings, 1 reply; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-01 16:36 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024,
	Dean Balandin

On 4/29/21 9:09 PM, Shai Malin wrote:
> From: Dean Balandin <dbalandin@marvell.com>
> 
> In this patch we implement queue level functionality.
> The implementation is similar to the nvme-tcp module, the main
> difference being that we call the vendor specific create_queue op which
> creates the TCP connection and the NVMeTCP connection, including the
> icreq+icresp negotiation.
> Once create_queue returns successfully, we can move on to the fabrics
> connect.
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Dean Balandin <dbalandin@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> ---
>   drivers/nvme/host/tcp-offload.c | 415 ++++++++++++++++++++++++++++++--
>   drivers/nvme/host/tcp-offload.h |   2 +-
>   2 files changed, 390 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
> index 9082b11c133f..8ddce2257100 100644
> --- a/drivers/nvme/host/tcp-offload.c
> +++ b/drivers/nvme/host/tcp-offload.c
> @@ -22,6 +22,11 @@ static inline struct nvme_tcp_ofld_ctrl *to_tcp_ofld_ctrl(struct nvme_ctrl *nctr
>   	return container_of(nctrl, struct nvme_tcp_ofld_ctrl, nctrl);
>   }
>   
> +static inline int nvme_tcp_ofld_qid(struct nvme_tcp_ofld_queue *queue)
> +{
> +	return queue - queue->ctrl->queues;
> +}
> +
>   /**
>    * nvme_tcp_ofld_register_dev() - NVMeTCP Offload Library registration
>    * function.
> @@ -191,12 +196,94 @@ nvme_tcp_ofld_alloc_tagset(struct nvme_ctrl *nctrl, bool admin)
>   	return set;
>   }
>   
> +static void __nvme_tcp_ofld_stop_queue(struct nvme_tcp_ofld_queue *queue)
> +{
> +	queue->dev->ops->drain_queue(queue);
> +	queue->dev->ops->destroy_queue(queue);
> +}
> +
> +static void nvme_tcp_ofld_stop_queue(struct nvme_ctrl *nctrl, int qid)
> +{
> +	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
> +	struct nvme_tcp_ofld_queue *queue = &ctrl->queues[qid];
> +
> +	if (!test_and_clear_bit(NVME_TCP_OFLD_Q_LIVE, &queue->flags))
> +		return;
> +
> +	__nvme_tcp_ofld_stop_queue(queue);
> +}
> +
> +static void nvme_tcp_ofld_stop_io_queues(struct nvme_ctrl *ctrl)
> +{
> +	int i;
> +
> +	for (i = 1; i < ctrl->queue_count; i++)
> +		nvme_tcp_ofld_stop_queue(ctrl, i);
> +}
> +
> +static void nvme_tcp_ofld_free_queue(struct nvme_ctrl *nctrl, int qid)
> +{
> +	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
> +	struct nvme_tcp_ofld_queue *queue = &ctrl->queues[qid];
> +
> +	if (!test_and_clear_bit(NVME_TCP_OFLD_Q_ALLOCATED, &queue->flags))
> +		return;
> +
> +	queue = &ctrl->queues[qid];
> +	queue->ctrl = NULL;
> +	queue->dev = NULL;
> +	queue->report_err = NULL;
> +}
> +
> +static void nvme_tcp_ofld_destroy_admin_queue(struct nvme_ctrl *nctrl, bool remove)
> +{
> +	nvme_tcp_ofld_stop_queue(nctrl, 0);
> +	if (remove) {
> +		blk_cleanup_queue(nctrl->admin_q);
> +		blk_cleanup_queue(nctrl->fabrics_q);
> +		blk_mq_free_tag_set(nctrl->admin_tagset);
> +	}
> +}
> +
> +static int nvme_tcp_ofld_start_queue(struct nvme_ctrl *nctrl, int qid)
> +{
> +	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
> +	struct nvme_tcp_ofld_queue *queue = &ctrl->queues[qid];
> +	int rc;
> +
> +	queue = &ctrl->queues[qid];
> +	if (qid) {
> +		queue->cmnd_capsule_len = nctrl->ioccsz * 16;
> +		rc = nvmf_connect_io_queue(nctrl, qid, false);
> +	} else {
> +		queue->cmnd_capsule_len = sizeof(struct nvme_command) + NVME_TCP_ADMIN_CCSZ;
> +		rc = nvmf_connect_admin_queue(nctrl);
> +	}
> +
> +	if (!rc) {
> +		set_bit(NVME_TCP_OFLD_Q_LIVE, &queue->flags);
> +	} else {
> +		if (test_bit(NVME_TCP_OFLD_Q_ALLOCATED, &queue->flags))
> +			__nvme_tcp_ofld_stop_queue(queue);
> +		dev_err(nctrl->device,
> +			"failed to connect queue: %d ret=%d\n", qid, rc);
> +	}
> +
> +	return rc;
> +}
> +
>   static int nvme_tcp_ofld_configure_admin_queue(struct nvme_ctrl *nctrl,
>   					       bool new)
>   {
> +	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
> +	struct nvme_tcp_ofld_queue *queue = &ctrl->queues[0];
>   	int rc;
>   
> -	/* Placeholder - alloc_admin_queue */
> +	rc = ctrl->dev->ops->create_queue(queue, 0, NVME_AQ_DEPTH);
> +	if (rc)
> +		return rc;
> +
> +	set_bit(NVME_TCP_OFLD_Q_ALLOCATED, &queue->flags);
>   	if (new) {
>   		nctrl->admin_tagset =
>   				nvme_tcp_ofld_alloc_tagset(nctrl, true);
> @@ -221,7 +308,9 @@ static int nvme_tcp_ofld_configure_admin_queue(struct nvme_ctrl *nctrl,
>   		}
>   	}
>   
> -	/* Placeholder - nvme_tcp_ofld_start_queue */
> +	rc = nvme_tcp_ofld_start_queue(nctrl, 0);
> +	if (rc)
> +		goto out_cleanup_queue;
>   
>   	rc = nvme_enable_ctrl(nctrl);
>   	if (rc)
> @@ -238,11 +327,12 @@ static int nvme_tcp_ofld_configure_admin_queue(struct nvme_ctrl *nctrl,
>   out_quiesce_queue:
>   	blk_mq_quiesce_queue(nctrl->admin_q);
>   	blk_sync_queue(nctrl->admin_q);
> -
>   out_stop_queue:
> -	/* Placeholder - stop offload queue */
> +	nvme_tcp_ofld_stop_queue(nctrl, 0);
>   	nvme_cancel_admin_tagset(nctrl);
> -
> +out_cleanup_queue:
> +	if (new)
> +		blk_cleanup_queue(nctrl->admin_q);
>   out_cleanup_fabrics_q:
>   	if (new)
>   		blk_cleanup_queue(nctrl->fabrics_q);
> @@ -250,7 +340,127 @@ static int nvme_tcp_ofld_configure_admin_queue(struct nvme_ctrl *nctrl,
>   	if (new)
>   		blk_mq_free_tag_set(nctrl->admin_tagset);
>   out_free_queue:
> -	/* Placeholder - free admin queue */
> +	nvme_tcp_ofld_free_queue(nctrl, 0);
> +
> +	return rc;
> +}
> +
> +static unsigned int nvme_tcp_ofld_nr_io_queues(struct nvme_ctrl *nctrl)
> +{
> +	unsigned int nr_io_queues;
> +
> +	nr_io_queues = min(nctrl->opts->nr_io_queues, num_online_cpus());
> +	nr_io_queues += min(nctrl->opts->nr_write_queues, num_online_cpus());
> +	nr_io_queues += min(nctrl->opts->nr_poll_queues, num_online_cpus());
> +
> +	return nr_io_queues;
> +}
> +

Really? Isn't this hardware-dependent?
I would have expected the hardware to impose some limitations here (# of 
MSIx interrupts or something). Hmm?
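
As a rough user-space sketch of what such a limit could look like: the first
three terms mirror the quoted function, while hw_max_queues is a purely
hypothetical parameter (nothing in the patch exposes such a value) standing
in for whatever the device would report.

```c
#include <assert.h>

/* Small helper; the kernel code uses the min() macro instead. */
unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Sum the requested queue counts, each capped by the number of online
 * CPUs as in the quoted code, then clamp to an assumed hardware limit.
 * A hw_max_queues of 0 is taken to mean "no limit reported".
 */
unsigned int sketch_nr_io_queues(unsigned int nr_io, unsigned int nr_write,
				 unsigned int nr_poll,
				 unsigned int online_cpus,
				 unsigned int hw_max_queues)
{
	unsigned int total;

	assert(online_cpus > 0);

	total = min_u(nr_io, online_cpus);
	total += min_u(nr_write, online_cpus);
	total += min_u(nr_poll, online_cpus);

	if (hw_max_queues && total > hw_max_queues)
		total = hw_max_queues;

	return total;
}
```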

> +static void
> +nvme_tcp_ofld_set_io_queues(struct nvme_ctrl *nctrl, unsigned int nr_io_queues)
> +{
> +	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
> +	struct nvmf_ctrl_options *opts = nctrl->opts;
> +
> +	if (opts->nr_write_queues && opts->nr_io_queues < nr_io_queues) {
> +		/*
> +		 * separate read/write queues
> +		 * hand out dedicated default queues only after we have
> +		 * sufficient read queues.
> +		 */
> +		ctrl->io_queues[HCTX_TYPE_READ] = opts->nr_io_queues;
> +		nr_io_queues -= ctrl->io_queues[HCTX_TYPE_READ];
> +		ctrl->io_queues[HCTX_TYPE_DEFAULT] =
> +			min(opts->nr_write_queues, nr_io_queues);
> +		nr_io_queues -= ctrl->io_queues[HCTX_TYPE_DEFAULT];
> +	} else {
> +		/*
> +		 * shared read/write queues
> +		 * either no write queues were requested, or we don't have
> +		 * sufficient queue count to have dedicated default queues.
> +		 */
> +		ctrl->io_queues[HCTX_TYPE_DEFAULT] =
> +			min(opts->nr_io_queues, nr_io_queues);
> +		nr_io_queues -= ctrl->io_queues[HCTX_TYPE_DEFAULT];
> +	}
> +
> +	if (opts->nr_poll_queues && nr_io_queues) {
> +		/* map dedicated poll queues only if we have queues left */
> +		ctrl->io_queues[HCTX_TYPE_POLL] =
> +			min(opts->nr_poll_queues, nr_io_queues);
> +	}
> +}
> +

Same here.
Poll queues only ever make sense if the hardware can serve specific
queue pairs without interrupts. Which again relates to the number of 
interrupts, and the affinity of those.
Or isn't this a concern with your card?
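
For reference, the split performed by the quoted
nvme_tcp_ofld_set_io_queues() can be traced in user space. The sketch below
follows the same branches; the enum and function names are made up for the
example and the HCTX_TYPE_* indices are replaced by local constants.

```c
#include <assert.h>
#include <stddef.h>

enum { Q_DEFAULT, Q_READ, Q_POLL, Q_TYPES };

unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Distribute the granted queue count over default/read/poll queues. */
void sketch_set_io_queues(unsigned int opt_io, unsigned int opt_write,
			  unsigned int opt_poll, unsigned int granted,
			  unsigned int out[Q_TYPES])
{
	assert(out != NULL);
	out[Q_DEFAULT] = out[Q_READ] = out[Q_POLL] = 0;

	if (opt_write && opt_io < granted) {
		/* Separate read/write queues. */
		out[Q_READ] = opt_io;
		granted -= out[Q_READ];
		out[Q_DEFAULT] = min_u(opt_write, granted);
		granted -= out[Q_DEFAULT];
	} else {
		/* Shared read/write queues. */
		out[Q_DEFAULT] = min_u(opt_io, granted);
		granted -= out[Q_DEFAULT];
	}

	/* Poll queues only get what is left over. */
	if (opt_poll && granted)
		out[Q_POLL] = min_u(opt_poll, granted);
}
```

Note that nothing in this logic consults the device, so poll queues are
handed out purely from the remaining count.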

> +static void
> +nvme_tcp_ofld_terminate_io_queues(struct nvme_ctrl *nctrl, int start_from)
> +{
> +	int i;
> +
> +	/* admin-q will be ignored because of the loop condition */
> +	for (i = start_from; i >= 1; i--)
> +		nvme_tcp_ofld_stop_queue(nctrl, i);
> +}
> +

Loop condition? Care to elaborate?
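
Presumably the comment refers to the "i >= 1" bound, which stops the
descending loop before qid 0. A small user-space sketch (the function name
and the visited[] buffer are made up for the example) records which queue
ids such a loop touches:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Record the queue ids visited by a loop of the form
 * "for (i = start_from; i >= 1; i--)". Returns the number recorded.
 */
size_t sketch_terminate_order(int start_from, int *visited, size_t cap)
{
	size_t n = 0;
	int i;

	assert(visited != NULL);

	/* qid 0 (the admin queue) is never reached because of i >= 1. */
	for (i = start_from; i >= 1 && n < cap; i--)
		visited[n++] = i;

	return n;
}
```

With start_from = 3 the loop visits 3, 2, 1; with start_from = 0, as happens
when creating the very first I/O queue fails, it visits nothing.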

> +static int nvme_tcp_ofld_create_io_queues(struct nvme_ctrl *nctrl)
> +{
> +	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
> +	int i, rc;
> +
> +	for (i = 1; i < nctrl->queue_count; i++) {
> +		rc = ctrl->dev->ops->create_queue(&ctrl->queues[i],
> +						  i, nctrl->sqsize + 1);
> +		if (rc)
> +			goto out_free_queues;
> +
> +		set_bit(NVME_TCP_OFLD_Q_ALLOCATED, &ctrl->queues[i].flags);
> +	}
> +
> +	return 0;
> +
> +out_free_queues:
> +	nvme_tcp_ofld_terminate_io_queues(nctrl, --i);
> +
> +	return rc;
> +}
> +
> +static int nvme_tcp_ofld_alloc_io_queues(struct nvme_ctrl *nctrl)
> +{
> +	unsigned int nr_io_queues;
> +	int rc;
> +
> +	nr_io_queues = nvme_tcp_ofld_nr_io_queues(nctrl);
> +	rc = nvme_set_queue_count(nctrl, &nr_io_queues);
> +	if (rc)
> +		return rc;
> +
> +	nctrl->queue_count = nr_io_queues + 1;
> +	if (nctrl->queue_count < 2) {
> +		dev_err(nctrl->device,
> +			"unable to set any I/O queues\n");
> +
> +		return -ENOMEM;
> +	}
> +
> +	dev_info(nctrl->device, "creating %d I/O queues.\n", nr_io_queues);
> +	nvme_tcp_ofld_set_io_queues(nctrl, nr_io_queues);
> +
> +	return nvme_tcp_ofld_create_io_queues(nctrl);
> +}
> +
> +static int nvme_tcp_ofld_start_io_queues(struct nvme_ctrl *nctrl)
> +{
> +	int i, rc = 0;
> +
> +	for (i = 1; i < nctrl->queue_count; i++) {
> +		rc = nvme_tcp_ofld_start_queue(nctrl, i);
> +		if (rc)
> +			goto terminate_queues;
> +	}
> +
> +	return 0;
> +
> +terminate_queues:
> +	nvme_tcp_ofld_terminate_io_queues(nctrl, --i);
>   
>   	return rc;
>   }
> @@ -258,9 +468,10 @@ static int nvme_tcp_ofld_configure_admin_queue(struct nvme_ctrl *nctrl,
>   static int
>   nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new)
>   {
> -	int rc;
> +	int rc = nvme_tcp_ofld_alloc_io_queues(nctrl);
>   
> -	/* Placeholder - alloc_io_queues */
> +	if (rc)
> +		return rc;
>   
>   	if (new) {
>   		nctrl->tagset = nvme_tcp_ofld_alloc_tagset(nctrl, false);
> @@ -278,7 +489,9 @@ nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new)
>   		}
>   	}
>   
> -	/* Placeholder - start_io_queues */
> +	rc = nvme_tcp_ofld_start_io_queues(nctrl);
> +	if (rc)
> +		goto out_cleanup_connect_q;
>   
>   	if (!new) {
>   		nvme_start_queues(nctrl);
> @@ -300,16 +513,16 @@ nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new)
>   out_wait_freeze_timed_out:
>   	nvme_stop_queues(nctrl);
>   	nvme_sync_io_queues(nctrl);
> -
> -	/* Placeholder - Stop IO queues */
> -
> +	nvme_tcp_ofld_stop_io_queues(nctrl);
> +out_cleanup_connect_q:
> +	nvme_cancel_tagset(nctrl);
>   	if (new)
>   		blk_cleanup_queue(nctrl->connect_q);
>   out_free_tag_set:
>   	if (new)
>   		blk_mq_free_tag_set(nctrl->tagset);
>   out_free_io_queues:
> -	/* Placeholder - free_io_queues */
> +	nvme_tcp_ofld_terminate_io_queues(nctrl, nctrl->queue_count);
>   
>   	return rc;
>   }
> @@ -336,6 +549,26 @@ static void nvme_tcp_ofld_reconnect_or_remove(struct nvme_ctrl *nctrl)
>   	}
>   }
>   
> +static int
> +nvme_tcp_ofld_init_admin_hctx(struct blk_mq_hw_ctx *hctx, void *data,
> +			      unsigned int hctx_idx)
> +{
> +	struct nvme_tcp_ofld_ctrl *ctrl = data;
> +
> +	hctx->driver_data = &ctrl->queues[0];
> +
> +	return 0;
> +}
> +
> +static void nvme_tcp_ofld_destroy_io_queues(struct nvme_ctrl *nctrl, bool remove)
> +{
> +	nvme_tcp_ofld_stop_io_queues(nctrl);
> +	if (remove) {
> +		blk_cleanup_queue(nctrl->connect_q);
> +		blk_mq_free_tag_set(nctrl->tagset);
> +	}
> +}
> +
>   static int nvme_tcp_ofld_setup_ctrl(struct nvme_ctrl *nctrl, bool new)
>   {
>   	struct nvmf_ctrl_options *opts = nctrl->opts;
> @@ -387,9 +620,19 @@ static int nvme_tcp_ofld_setup_ctrl(struct nvme_ctrl *nctrl, bool new)
>   	return 0;
>   
>   destroy_io:
> -	/* Placeholder - stop and destroy io queues*/
> +	if (nctrl->queue_count > 1) {
> +		nvme_stop_queues(nctrl);
> +		nvme_sync_io_queues(nctrl);
> +		nvme_tcp_ofld_stop_io_queues(nctrl);
> +		nvme_cancel_tagset(nctrl);
> +		nvme_tcp_ofld_destroy_io_queues(nctrl, new);
> +	}
>   destroy_admin:
> -	/* Placeholder - stop and destroy admin queue*/
> +	blk_mq_quiesce_queue(nctrl->admin_q);
> +	blk_sync_queue(nctrl->admin_q);
> +	nvme_tcp_ofld_stop_queue(nctrl, 0);
> +	nvme_cancel_admin_tagset(nctrl);
> +	nvme_tcp_ofld_destroy_admin_queue(nctrl, new);
>   
>   	return rc;
>   }
> @@ -410,6 +653,18 @@ nvme_tcp_ofld_check_dev_opts(struct nvmf_ctrl_options *opts,
>   	return 0;
>   }
>   
> +static void nvme_tcp_ofld_free_ctrl_queues(struct nvme_ctrl *nctrl)
> +{
> +	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
> +	int i;
> +
> +	for (i = 0; i < nctrl->queue_count; ++i)
> +		nvme_tcp_ofld_free_queue(nctrl, i);
> +
> +	kfree(ctrl->queues);
> +	ctrl->queues = NULL;
> +}
> +
>   static void nvme_tcp_ofld_free_ctrl(struct nvme_ctrl *nctrl)
>   {
>   	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
> @@ -419,6 +674,7 @@ static void nvme_tcp_ofld_free_ctrl(struct nvme_ctrl *nctrl)
>   		goto free_ctrl;
>   
>   	down_write(&nvme_tcp_ofld_ctrl_rwsem);
> +	nvme_tcp_ofld_free_ctrl_queues(nctrl);
>   	ctrl->dev->ops->release_ctrl(ctrl);
>   	list_del(&ctrl->list);
>   	up_write(&nvme_tcp_ofld_ctrl_rwsem);
> @@ -436,15 +692,37 @@ static void nvme_tcp_ofld_submit_async_event(struct nvme_ctrl *arg)
>   }
>   
>   static void
> -nvme_tcp_ofld_teardown_admin_queue(struct nvme_ctrl *ctrl, bool remove)
> +nvme_tcp_ofld_teardown_admin_queue(struct nvme_ctrl *nctrl, bool remove)
>   {
> -	/* Placeholder - teardown_admin_queue */
> +	blk_mq_quiesce_queue(nctrl->admin_q);
> +	blk_sync_queue(nctrl->admin_q);
> +
> +	nvme_tcp_ofld_stop_queue(nctrl, 0);
> +	nvme_cancel_admin_tagset(nctrl);
> +
> +	if (remove)
> +		blk_mq_unquiesce_queue(nctrl->admin_q);
> +
> +	nvme_tcp_ofld_destroy_admin_queue(nctrl, remove);
>   }
>   
>   static void
>   nvme_tcp_ofld_teardown_io_queues(struct nvme_ctrl *nctrl, bool remove)
>   {
> -	/* Placeholder - teardown_io_queues */
> +	if (nctrl->queue_count <= 1)
> +		return;
> +
> +	blk_mq_quiesce_queue(nctrl->admin_q);
> +	nvme_start_freeze(nctrl);
> +	nvme_stop_queues(nctrl);
> +	nvme_sync_io_queues(nctrl);
> +	nvme_tcp_ofld_stop_io_queues(nctrl);
> +	nvme_cancel_tagset(nctrl);
> +
> +	if (remove)
> +		nvme_start_queues(nctrl);
> +
> +	nvme_tcp_ofld_destroy_io_queues(nctrl, remove);
>   }
>   
>   static void nvme_tcp_ofld_reconnect_ctrl_work(struct work_struct *work)
> @@ -572,6 +850,17 @@ nvme_tcp_ofld_init_request(struct blk_mq_tag_set *set,
>   	return 0;
>   }
>   
> +inline size_t nvme_tcp_ofld_inline_data_size(struct nvme_tcp_ofld_queue *queue)
> +{
> +	return queue->cmnd_capsule_len - sizeof(struct nvme_command);
> +}
> +EXPORT_SYMBOL_GPL(nvme_tcp_ofld_inline_data_size);
> +
> +static void nvme_tcp_ofld_commit_rqs(struct blk_mq_hw_ctx *hctx)
> +{
> +	/* Call ops->commit_rqs */
> +}
> +
>   static blk_status_t
>   nvme_tcp_ofld_queue_rq(struct blk_mq_hw_ctx *hctx,
>   		       const struct blk_mq_queue_data *bd)
> @@ -583,22 +872,96 @@ nvme_tcp_ofld_queue_rq(struct blk_mq_hw_ctx *hctx,
>   	return BLK_STS_OK;
>   }
>   
> +static void
> +nvme_tcp_ofld_exit_request(struct blk_mq_tag_set *set,
> +			   struct request *rq, unsigned int hctx_idx)
> +{
> +	/*
> +	 * Nothing is allocated in nvme_tcp_ofld_init_request,
> +	 * hence empty.
> +	 */
> +}
> +
> +static int
> +nvme_tcp_ofld_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
> +			unsigned int hctx_idx)
> +{
> +	struct nvme_tcp_ofld_ctrl *ctrl = data;
> +
> +	hctx->driver_data = &ctrl->queues[hctx_idx + 1];
> +
> +	return 0;
> +}
> +
> +static int nvme_tcp_ofld_map_queues(struct blk_mq_tag_set *set)
> +{
> +	struct nvme_tcp_ofld_ctrl *ctrl = set->driver_data;
> +	struct nvmf_ctrl_options *opts = ctrl->nctrl.opts;
> +
> +	if (opts->nr_write_queues && ctrl->io_queues[HCTX_TYPE_READ]) {
> +		/* separate read/write queues */
> +		set->map[HCTX_TYPE_DEFAULT].nr_queues =
> +			ctrl->io_queues[HCTX_TYPE_DEFAULT];
> +		set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
> +		set->map[HCTX_TYPE_READ].nr_queues =
> +			ctrl->io_queues[HCTX_TYPE_READ];
> +		set->map[HCTX_TYPE_READ].queue_offset =
> +			ctrl->io_queues[HCTX_TYPE_DEFAULT];
> +	} else {
> +		/* shared read/write queues */
> +		set->map[HCTX_TYPE_DEFAULT].nr_queues =
> +			ctrl->io_queues[HCTX_TYPE_DEFAULT];
> +		set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
> +		set->map[HCTX_TYPE_READ].nr_queues =
> +			ctrl->io_queues[HCTX_TYPE_DEFAULT];
> +		set->map[HCTX_TYPE_READ].queue_offset = 0;
> +	}
> +	blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
> +	blk_mq_map_queues(&set->map[HCTX_TYPE_READ]);
> +
> +	if (opts->nr_poll_queues && ctrl->io_queues[HCTX_TYPE_POLL]) {
> +		/* map dedicated poll queues only if we have queues left */
> +		set->map[HCTX_TYPE_POLL].nr_queues =
> +				ctrl->io_queues[HCTX_TYPE_POLL];
> +		set->map[HCTX_TYPE_POLL].queue_offset =
> +			ctrl->io_queues[HCTX_TYPE_DEFAULT] +
> +			ctrl->io_queues[HCTX_TYPE_READ];
> +		blk_mq_map_queues(&set->map[HCTX_TYPE_POLL]);
> +	}
> +
> +	dev_info(ctrl->nctrl.device,
> +		 "mapped %d/%d/%d default/read/poll queues.\n",
> +		 ctrl->io_queues[HCTX_TYPE_DEFAULT],
> +		 ctrl->io_queues[HCTX_TYPE_READ],
> +		 ctrl->io_queues[HCTX_TYPE_POLL]);
> +
> +	return 0;
> +}
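
For readers following the offset arithmetic in nvme_tcp_ofld_map_queues(), here is a minimal userspace sketch of the same layout rule. It is not the patch code: the TYPE_* enum, struct queue_map and layout_queues() are hypothetical stand-ins for HCTX_TYPE_*, the blk-mq map entries and the function above, and the blk_mq_map_queues() CPU mapping step is left out.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's HCTX_TYPE_* indices. */
enum { TYPE_DEFAULT, TYPE_READ, TYPE_POLL, TYPE_MAX };

struct queue_map {
	unsigned int nr_queues;
	unsigned int queue_offset;
};

/*
 * Reproduce the offset arithmetic of nvme_tcp_ofld_map_queues().
 * io_queues[] is the per-type queue count; nr_write_queues and
 * nr_poll_queues model the corresponding connect options.
 */
static void layout_queues(const unsigned int io_queues[TYPE_MAX],
			  unsigned int nr_write_queues,
			  unsigned int nr_poll_queues,
			  struct queue_map map[TYPE_MAX])
{
	assert(io_queues && map);
	memset(map, 0, TYPE_MAX * sizeof(*map));

	map[TYPE_DEFAULT].nr_queues = io_queues[TYPE_DEFAULT];
	map[TYPE_DEFAULT].queue_offset = 0;

	if (nr_write_queues && io_queues[TYPE_READ]) {
		/* Separate read queues start right after the default ones. */
		map[TYPE_READ].nr_queues = io_queues[TYPE_READ];
		map[TYPE_READ].queue_offset = io_queues[TYPE_DEFAULT];
	} else {
		/* Reads share the default queues. */
		map[TYPE_READ].nr_queues = io_queues[TYPE_DEFAULT];
		map[TYPE_READ].queue_offset = 0;
	}

	if (nr_poll_queues && io_queues[TYPE_POLL]) {
		/* Poll queues come last, after default and read queues. */
		map[TYPE_POLL].nr_queues = io_queues[TYPE_POLL];
		map[TYPE_POLL].queue_offset =
			io_queues[TYPE_DEFAULT] + io_queues[TYPE_READ];
	}
}
```

With 2 default, 4 read and 1 poll queue, the read map starts at offset 2 and the poll map at offset 6; without separate read queues both the default and read maps cover the same range starting at 0.
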
> +
> +static int nvme_tcp_ofld_poll(struct blk_mq_hw_ctx *hctx)
> +{
> +	/* Placeholder - Implement polling mechanism */
> +
> +	return 0;
> +}
> +
>   static struct blk_mq_ops nvme_tcp_ofld_mq_ops = {
>   	.queue_rq	= nvme_tcp_ofld_queue_rq,
> +	.commit_rqs     = nvme_tcp_ofld_commit_rqs,
> +	.complete	= nvme_complete_rq,
>   	.init_request	= nvme_tcp_ofld_init_request,
> -	/*
> -	 * All additional ops will be also implemented and registered similar to
> -	 * tcp.c
> -	 */
> +	.exit_request	= nvme_tcp_ofld_exit_request,
> +	.init_hctx	= nvme_tcp_ofld_init_hctx,
> +	.map_queues	= nvme_tcp_ofld_map_queues,
> +	.poll		= nvme_tcp_ofld_poll,
>   };
>   
>   static struct blk_mq_ops nvme_tcp_ofld_admin_mq_ops = {
>   	.queue_rq	= nvme_tcp_ofld_queue_rq,
> +	.complete	= nvme_complete_rq,
>   	.init_request	= nvme_tcp_ofld_init_request,
> -	/*
> -	 * All additional ops will be also implemented and registered similar to
> -	 * tcp.c
> -	 */
> +	.exit_request	= nvme_tcp_ofld_exit_request,
> +	.init_hctx	= nvme_tcp_ofld_init_admin_hctx,
>   };
>   
>   static const struct nvme_ctrl_ops nvme_tcp_ofld_ctrl_ops = {
> diff --git a/drivers/nvme/host/tcp-offload.h b/drivers/nvme/host/tcp-offload.h
> index b23b1d7ea6fa..d82645fcf9da 100644
> --- a/drivers/nvme/host/tcp-offload.h
> +++ b/drivers/nvme/host/tcp-offload.h
> @@ -105,7 +105,6 @@ struct nvme_tcp_ofld_ctrl {
>   	 * Each entry in the array indicates the number of queues of
>   	 * corresponding type.
>   	 */
> -	u32 queue_type_mapping[HCTX_MAX_TYPES];
>   	u32 io_queues[HCTX_MAX_TYPES];
>   
>   	/* Connectivity params */
> @@ -205,3 +204,4 @@ struct nvme_tcp_ofld_ops {
>   int nvme_tcp_ofld_register_dev(struct nvme_tcp_ofld_dev *dev);
>   void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev);
>   void nvme_tcp_ofld_error_recovery(struct nvme_ctrl *nctrl);
> +inline size_t nvme_tcp_ofld_inline_data_size(struct nvme_tcp_ofld_queue *queue);
> 
Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [RFC PATCH v4 14/27] nvme-tcp-offload: Add IO level implementation
  2021-04-29 19:09 ` [RFC PATCH v4 14/27] nvme-tcp-offload: Add IO " Shai Malin
@ 2021-05-01 16:38   ` Hannes Reinecke
  2021-05-04 16:34     ` Shai Malin
  0 siblings, 1 reply; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-01 16:38 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024,
	Dean Balandin

On 4/29/21 9:09 PM, Shai Malin wrote:
> From: Dean Balandin <dbalandin@marvell.com>
> 
> In this patch, we present the IO level functionality.
> The nvme-tcp-offload shall work on the IO-level, meaning the
> nvme-tcp-offload ULP module shall pass the request to the nvme-tcp-offload
> vendor driver and shall expect for the request compilation.

Request compilation? Not request completion?

> No additional handling is needed in between, this design will reduce the
> CPU utilization as we will describe below.
> 
> The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
> with the following IO-path ops:
>   - init_req
>   - send_req - in order to pass the request to the handling of the offload
>     driver that shall pass it to the vendor specific device
>   - poll_queue
> 
> The vendor driver will manage the context from which the request will be
> executed and the request aggregations.
> Once the IO completes, the nvme-tcp-offload vendor driver shall call
> command.done() that shall invoke the nvme-tcp-offload ULP layer for
> completing the request.
> 
> This patch also contains initial definition of nvme_tcp_ofld_queue_rq().
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Dean Balandin <dbalandin@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> ---
>   drivers/nvme/host/tcp-offload.c | 95 ++++++++++++++++++++++++++++++---
>   1 file changed, 87 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
> index 8ddce2257100..0cdf5a432208 100644
> --- a/drivers/nvme/host/tcp-offload.c
> +++ b/drivers/nvme/host/tcp-offload.c
> @@ -127,7 +127,10 @@ void nvme_tcp_ofld_req_done(struct nvme_tcp_ofld_req *req,
>   			    union nvme_result *result,
>   			    __le16 status)
>   {
> -	/* Placeholder - complete request with/without error */
> +	struct request *rq = blk_mq_rq_from_pdu(req);
> +
> +	if (!nvme_try_complete_req(rq, cpu_to_le16(status << 1), *result))
> +		nvme_complete_rq(rq);
>   }
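
A note on the "status << 1" above, as a sketch only: in the NVMe completion queue entry the status field shares its 16-bit word with the phase tag in bit 0, and as far as I recall nvme_try_complete_req() undoes this by storing le16_to_cpu(status) >> 1. The helpers below are hypothetical names, work on host-endian values and skip the le16 conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a 15-bit status code into a CQE-style status word (bit 0 = phase tag). */
static uint16_t status_to_cqe_word(uint16_t status_code)
{
	assert(status_code <= 0x7fff);
	return (uint16_t)(status_code << 1);
}

/* Recover the status code from the status word, dropping the phase tag bit. */
static uint16_t cqe_word_to_status(uint16_t cqe_word)
{
	return (uint16_t)(cqe_word >> 1);
}
```

The round trip is lossless for any 15-bit status code, which is why the offload layer can hand a plain status code to the core helper after shifting it.
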
>   
>   struct nvme_tcp_ofld_dev *
> @@ -686,6 +689,34 @@ static void nvme_tcp_ofld_free_ctrl(struct nvme_ctrl *nctrl)
>   	kfree(ctrl);
>   }
>   
> +static void nvme_tcp_ofld_set_sg_null(struct nvme_command *c)
> +{
> +	struct nvme_sgl_desc *sg = &c->common.dptr.sgl;
> +
> +	sg->addr = 0;
> +	sg->length = 0;
> +	sg->type = (NVME_TRANSPORT_SGL_DATA_DESC << 4) | NVME_SGL_FMT_TRANSPORT_A;
> +}
> +
> +inline void nvme_tcp_ofld_set_sg_inline(struct nvme_tcp_ofld_queue *queue,
> +					struct nvme_command *c, u32 data_len)
> +{
> +	struct nvme_sgl_desc *sg = &c->common.dptr.sgl;
> +
> +	sg->addr = cpu_to_le64(queue->ctrl->nctrl.icdoff);
> +	sg->length = cpu_to_le32(data_len);
> +	sg->type = (NVME_SGL_FMT_DATA_DESC << 4) | NVME_SGL_FMT_OFFSET;
> +}
> +
> +void nvme_tcp_ofld_map_data(struct nvme_command *c, u32 data_len)
> +{
> +	struct nvme_sgl_desc *sg = &c->common.dptr.sgl;
> +
> +	sg->addr = 0;
> +	sg->length = cpu_to_le32(data_len);
> +	sg->type = (NVME_TRANSPORT_SGL_DATA_DESC << 4) | NVME_SGL_FMT_TRANSPORT_A;
> +}
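
To make the sg->type encoding in the three helpers above concrete, a small sketch: the descriptor type goes in the high nibble and the sub type in the low nibble. The numeric values are what I remember from include/linux/nvme.h and should be treated as assumptions to be checked against the header; sgl_type() is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Descriptor types (high nibble); values assumed from include/linux/nvme.h. */
enum {
	SGL_FMT_DATA_DESC	= 0x00,
	TRANSPORT_SGL_DATA_DESC	= 0x05,
};

/* Descriptor sub types (low nibble); values assumed from include/linux/nvme.h. */
enum {
	SGL_FMT_OFFSET		= 0x01,
	SGL_FMT_TRANSPORT_A	= 0x0a,
};

/* Combine a 4-bit descriptor type and a 4-bit sub type into the type byte. */
static uint8_t sgl_type(uint8_t desc_type, uint8_t sub_type)
{
	assert(desc_type <= 0xf && sub_type <= 0xf);
	return (uint8_t)((desc_type << 4) | sub_type);
}
```

Under these assumed values, the null and mapped-data descriptors both carry the type byte 0x5a and differ only in the length field, while the in-capsule descriptor carries 0x01.
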
> +
>   static void nvme_tcp_ofld_submit_async_event(struct nvme_ctrl *arg)
>   {
>   	/* Placeholder - submit_async_event */
> @@ -841,9 +872,11 @@ nvme_tcp_ofld_init_request(struct blk_mq_tag_set *set,
>   {
>   	struct nvme_tcp_ofld_req *req = blk_mq_rq_to_pdu(rq);
>   	struct nvme_tcp_ofld_ctrl *ctrl = set->driver_data;
> +	int qid;
>   
> -	/* Placeholder - init request */
> -
> +	qid = (set == &ctrl->tag_set) ? hctx_idx + 1 : 0;
> +	req->queue = &ctrl->queues[qid];
> +	nvme_req(rq)->ctrl = &ctrl->nctrl;
>   	req->done = nvme_tcp_ofld_req_done;
>   	ctrl->dev->ops->init_req(req);
>   
> @@ -858,16 +891,60 @@ EXPORT_SYMBOL_GPL(nvme_tcp_ofld_inline_data_size);
>   
>   static void nvme_tcp_ofld_commit_rqs(struct blk_mq_hw_ctx *hctx)
>   {
> -	/* Call ops->commit_rqs */
> +	struct nvme_tcp_ofld_queue *queue = hctx->driver_data;
> +	struct nvme_tcp_ofld_dev *dev = queue->dev;
> +	struct nvme_tcp_ofld_ops *ops = dev->ops;
> +
> +	ops->commit_rqs(queue);
>   }
>   
>   static blk_status_t
>   nvme_tcp_ofld_queue_rq(struct blk_mq_hw_ctx *hctx,
>   		       const struct blk_mq_queue_data *bd)
>   {
> -	/* Call nvme_setup_cmd(...) */
> +	struct nvme_tcp_ofld_req *req = blk_mq_rq_to_pdu(bd->rq);
> +	struct nvme_tcp_ofld_queue *queue = hctx->driver_data;
> +	struct nvme_tcp_ofld_ctrl *ctrl = queue->ctrl;
> +	struct nvme_ns *ns = hctx->queue->queuedata;
> +	struct nvme_tcp_ofld_dev *dev = queue->dev;
> +	struct nvme_tcp_ofld_ops *ops = dev->ops;
> +	struct nvme_command *nvme_cmd;
> +	struct request *rq;
> +	bool queue_ready;
> +	u32 data_len;
> +	int rc;
> +
> +	queue_ready = test_bit(NVME_TCP_OFLD_Q_LIVE, &queue->flags);
> +
> +	req->rq = bd->rq;
> +	req->async = false;
> +	rq = req->rq;
> +
> +	if (!nvmf_check_ready(&ctrl->nctrl, req->rq, queue_ready))
> +		return nvmf_fail_nonready_command(&ctrl->nctrl, req->rq);
> +
> +	rc = nvme_setup_cmd(ns, req->rq, &req->nvme_cmd);
> +	if (unlikely(rc))
> +		return rc;
>   
> -	/* Call ops->send_req(...) */
> +	blk_mq_start_request(req->rq);
> +	req->last = bd->last;
> +
> +	nvme_cmd = &req->nvme_cmd;
> +	nvme_cmd->common.flags |= NVME_CMD_SGL_METABUF;
> +
> +	data_len = blk_rq_nr_phys_segments(rq) ? blk_rq_payload_bytes(rq) : 0;
> +	if (!data_len)
> +		nvme_tcp_ofld_set_sg_null(&req->nvme_cmd);
> +	else if ((rq_data_dir(rq) == WRITE) &&
> +		 data_len <= nvme_tcp_ofld_inline_data_size(queue))
> +		nvme_tcp_ofld_set_sg_inline(queue, nvme_cmd, data_len);
> +	else
> +		nvme_tcp_ofld_map_data(nvme_cmd, data_len);
> +
> +	rc = ops->send_req(req);
> +	if (unlikely(rc))
> +		return rc;
>   
>   	return BLK_STS_OK;
>   }
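
The descriptor selection in nvme_tcp_ofld_queue_rq() boils down to a three-way decision, sketched below in userspace form. pick_sgl() and the sgl_kind enum are hypothetical names; the only facts taken from the patch are the order of the checks and that the inline limit is cmnd_capsule_len minus the size of struct nvme_command (64 bytes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NVME_CMD_SIZE 64u	/* sizeof(struct nvme_command) in the kernel */

enum sgl_kind { SGL_NULL, SGL_INLINE, SGL_TRANSPORT };

/* Same arithmetic as nvme_tcp_ofld_inline_data_size(). */
static size_t inline_data_size(size_t cmnd_capsule_len)
{
	assert(cmnd_capsule_len >= NVME_CMD_SIZE);
	return cmnd_capsule_len - NVME_CMD_SIZE;
}

/* Mirror the if / else-if / else chain in nvme_tcp_ofld_queue_rq(). */
static enum sgl_kind pick_sgl(size_t data_len, bool is_write,
			      size_t cmnd_capsule_len)
{
	if (!data_len)
		return SGL_NULL;	/* no payload at all */
	if (is_write && data_len <= inline_data_size(cmnd_capsule_len))
		return SGL_INLINE;	/* small write: data rides in the capsule */
	return SGL_TRANSPORT;		/* reads and large writes */
}
```

Only writes can go in-capsule; a read of the same size always takes the transport descriptor.
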
> @@ -940,9 +1017,11 @@ static int nvme_tcp_ofld_map_queues(struct blk_mq_tag_set *set)
>   
>   static int nvme_tcp_ofld_poll(struct blk_mq_hw_ctx *hctx)
>   {
> -	/* Placeholder - Implement polling mechanism */
> +	struct nvme_tcp_ofld_queue *queue = hctx->driver_data;
> +	struct nvme_tcp_ofld_dev *dev = queue->dev;
> +	struct nvme_tcp_ofld_ops *ops = dev->ops;
>   
> -	return 0;
> +	return ops->poll_queue(queue);
>   }
>   
>   static struct blk_mq_ops nvme_tcp_ofld_mq_ops = {
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes


* Re: [RFC PATCH v4 15/27] nvme-tcp-offload: Add Timeout and ASYNC Support
  2021-04-29 19:09 ` [RFC PATCH v4 15/27] nvme-tcp-offload: Add Timeout and ASYNC Support Shai Malin
@ 2021-05-01 16:45   ` Hannes Reinecke
  2021-05-04 16:49     ` Shai Malin
  0 siblings, 1 reply; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-01 16:45 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024

On 4/29/21 9:09 PM, Shai Malin wrote:
> In this patch, we present the nvme-tcp-offload timeout support
> nvme_tcp_ofld_timeout() and ASYNC support
> nvme_tcp_ofld_submit_async_event().
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> ---
>   drivers/nvme/host/tcp-offload.c | 85 ++++++++++++++++++++++++++++++++-
>   drivers/nvme/host/tcp-offload.h |  2 +
>   2 files changed, 86 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
> index 0cdf5a432208..1d62f921f109 100644
> --- a/drivers/nvme/host/tcp-offload.c
> +++ b/drivers/nvme/host/tcp-offload.c
> @@ -133,6 +133,26 @@ void nvme_tcp_ofld_req_done(struct nvme_tcp_ofld_req *req,
>   		nvme_complete_rq(rq);
>   }
>   
> +/**
> + * nvme_tcp_ofld_async_req_done() - NVMeTCP Offload request done callback
> + * function for async request. Pointed to by nvme_tcp_ofld_req->done.
> + * Handles both NVME_TCP_F_DATA_SUCCESS flag and NVMe CQ.
> + * @req:	NVMeTCP offload request to complete.
> + * @result:     The nvme_result.
> + * @status:     The completion status.
> + *
> + * API function that allows the vendor specific offload driver to report request
> + * completions to the common offload layer.
> + */
> +void nvme_tcp_ofld_async_req_done(struct nvme_tcp_ofld_req *req,
> +				  union nvme_result *result, __le16 status)
> +{
> +	struct nvme_tcp_ofld_queue *queue = req->queue;
> +	struct nvme_tcp_ofld_ctrl *ctrl = queue->ctrl;
> +
> +	nvme_complete_async_event(&ctrl->nctrl, status, result);
> +}
> +
>   struct nvme_tcp_ofld_dev *
>   nvme_tcp_ofld_lookup_dev(struct nvme_tcp_ofld_ctrl *ctrl)
>   {
> @@ -719,7 +739,23 @@ void nvme_tcp_ofld_map_data(struct nvme_command *c, u32 data_len)
>   
>   static void nvme_tcp_ofld_submit_async_event(struct nvme_ctrl *arg)
>   {
> -	/* Placeholder - submit_async_event */
> +	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(arg);
> +	struct nvme_tcp_ofld_queue *queue = &ctrl->queues[0];
> +	struct nvme_tcp_ofld_dev *dev = queue->dev;
> +	struct nvme_tcp_ofld_ops *ops = dev->ops;
> +
> +	ctrl->async_req.nvme_cmd.common.opcode = nvme_admin_async_event;
> +	ctrl->async_req.nvme_cmd.common.command_id = NVME_AQ_BLK_MQ_DEPTH;
> +	ctrl->async_req.nvme_cmd.common.flags |= NVME_CMD_SGL_METABUF;
> +
> +	nvme_tcp_ofld_set_sg_null(&ctrl->async_req.nvme_cmd);
> +
> +	ctrl->async_req.async = true;
> +	ctrl->async_req.queue = queue;
> +	ctrl->async_req.last = true;
> +	ctrl->async_req.done = nvme_tcp_ofld_async_req_done;
> +
> +	ops->send_req(&ctrl->async_req);
>   }
>   
>   static void
> @@ -1024,6 +1060,51 @@ static int nvme_tcp_ofld_poll(struct blk_mq_hw_ctx *hctx)
>   	return ops->poll_queue(queue);
>   }
>   
> +static void nvme_tcp_ofld_complete_timed_out(struct request *rq)
> +{
> +	struct nvme_tcp_ofld_req *req = blk_mq_rq_to_pdu(rq);
> +	struct nvme_ctrl *nctrl = &req->queue->ctrl->nctrl;
> +
> +	nvme_tcp_ofld_stop_queue(nctrl, nvme_tcp_ofld_qid(req->queue));
> +	if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) {
> +		nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD;
> +		blk_mq_complete_request(rq);
> +	}
> +}
> +
> +static enum blk_eh_timer_return nvme_tcp_ofld_timeout(struct request *rq, bool reserved)
> +{
> +	struct nvme_tcp_ofld_req *req = blk_mq_rq_to_pdu(rq);
> +	struct nvme_tcp_ofld_ctrl *ctrl = req->queue->ctrl;
> +
> +	dev_warn(ctrl->nctrl.device,
> +		 "queue %d: timeout request %#x type %d\n",
> +		 nvme_tcp_ofld_qid(req->queue), rq->tag, req->nvme_cmd.common.opcode);
> +
> +	if (ctrl->nctrl.state != NVME_CTRL_LIVE) {
> +		/*
> +		 * If we are resetting, connecting or deleting we should
> +		 * complete immediately because we may block controller
> +		 * teardown or setup sequence
> +		 * - ctrl disable/shutdown fabrics requests
> +		 * - connect requests
> +		 * - initialization admin requests
> +		 * - I/O requests that entered after unquiescing and
> +		 *   the controller stopped responding
> +		 *
> +		 * All other requests should be cancelled by the error
> +		 * recovery work, so it's fine that we fail it here.
> +		 */
> +		nvme_tcp_ofld_complete_timed_out(rq);
> +
> +		return BLK_EH_DONE;
> +	}

And this particular error code has been causing _so_ _many_ issues
during testing that I'd rather get rid of it altogether.
But probably not your fault, you're just copying what tcp and rdma are doing.

> +
> +	nvme_tcp_ofld_error_recovery(&ctrl->nctrl);
> +
> +	return BLK_EH_RESET_TIMER;
> +}
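
The decision made by nvme_tcp_ofld_timeout() can be summarised as a pure function of the controller state. The sketch below is an illustration under simplified assumptions: the enums and decide_timeout() are hypothetical, the state list is reduced, and the real handler acts on the request and controller instead of returning a description.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced, hypothetical model of the controller states. */
enum ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_CONNECTING, CTRL_DELETING };

/* Hypothetical stand-ins for BLK_EH_DONE and BLK_EH_RESET_TIMER. */
enum eh_return { EH_DONE, EH_RESET_TIMER };

struct timeout_action {
	bool complete_request;		/* stop the queue and fail the request now */
	bool start_error_recovery;	/* kick controller error recovery */
	enum eh_return ret;
};

static struct timeout_action decide_timeout(enum ctrl_state state)
{
	struct timeout_action act = { false, false, EH_RESET_TIMER };

	assert(state <= CTRL_DELETING);

	if (state != CTRL_LIVE) {
		/* Setup or teardown in progress: complete immediately. */
		act.complete_request = true;
		act.ret = EH_DONE;
		return act;
	}

	/* Live controller: let error recovery cancel the request later. */
	act.start_error_recovery = true;
	return act;
}
```
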
> +
>   static struct blk_mq_ops nvme_tcp_ofld_mq_ops = {
>   	.queue_rq	= nvme_tcp_ofld_queue_rq,
>   	.commit_rqs     = nvme_tcp_ofld_commit_rqs,
> @@ -1031,6 +1112,7 @@ static struct blk_mq_ops nvme_tcp_ofld_mq_ops = {
>   	.init_request	= nvme_tcp_ofld_init_request,
>   	.exit_request	= nvme_tcp_ofld_exit_request,
>   	.init_hctx	= nvme_tcp_ofld_init_hctx,
> +	.timeout	= nvme_tcp_ofld_timeout,
>   	.map_queues	= nvme_tcp_ofld_map_queues,
>   	.poll		= nvme_tcp_ofld_poll,
>   };
> @@ -1041,6 +1123,7 @@ static struct blk_mq_ops nvme_tcp_ofld_admin_mq_ops = {
>   	.init_request	= nvme_tcp_ofld_init_request,
>   	.exit_request	= nvme_tcp_ofld_exit_request,
>   	.init_hctx	= nvme_tcp_ofld_init_admin_hctx,
> +	.timeout	= nvme_tcp_ofld_timeout,
>   };
>   
>   static const struct nvme_ctrl_ops nvme_tcp_ofld_ctrl_ops = {
> diff --git a/drivers/nvme/host/tcp-offload.h b/drivers/nvme/host/tcp-offload.h
> index d82645fcf9da..275a7e2d9d8a 100644
> --- a/drivers/nvme/host/tcp-offload.h
> +++ b/drivers/nvme/host/tcp-offload.h
> @@ -110,6 +110,8 @@ struct nvme_tcp_ofld_ctrl {
>   	/* Connectivity params */
>   	struct nvme_tcp_ofld_ctrl_con_params conn_params;
>   
> +	struct nvme_tcp_ofld_req async_req;
> +
>   	/* Vendor specific driver context */
>   	void *private_data;
>   };
> 
So:

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes


* Re: [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver
  2021-04-29 19:08 [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Shai Malin
                   ` (26 preceding siblings ...)
  2021-04-29 19:09 ` [RFC PATCH v4 27/27] qedn: Add support of ASYNC Shai Malin
@ 2021-05-01 16:47 ` Hannes Reinecke
  2021-05-03 15:13   ` Shai Malin
  27 siblings, 1 reply; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-01 16:47 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024

On 4/29/21 9:08 PM, Shai Malin wrote:
> With the goal of enabling a generic infrastructure that allows NVMe/TCP
> offload devices like NICs to seamlessly plug into the NVMe-oF stack, this
> patch series introduces the nvme-tcp-offload ULP host layer, which will
> be a new transport type called "tcp-offload" and will serve as an
> abstraction layer to work with vendor specific nvme-tcp offload drivers.
> 
> NVMeTCP offload is a full offload of the NVMeTCP protocol; this includes
> both the TCP level and the NVMeTCP level.
> 
> The nvme-tcp-offload transport can co-exist with the existing tcp and
> other transports. The tcp offload was designed so that stack changes are
> kept to a bare minimum: only registering new transports.
> All other APIs, ops etc. are identical to the regular tcp transport.
> Representing the TCP offload as a new transport allows clear and manageable
> differentiation between the connections which should use the offload path
> and those that are not offloaded (even on the same device).
> 
> 
> The nvme-tcp-offload layers and API compared to nvme-tcp and nvme-rdma:
> 
> * NVMe layer: *
> 
>         [ nvme/nvme-fabrics/blk-mq ]
>               |
>          (nvme API and blk-mq API)
>               |
>               |			
> * Vendor agnostic transport layer: *
> 
>        [ nvme-rdma ] [ nvme-tcp ] [ nvme-tcp-offload ]
>               |        |             |
>             (Verbs)
>               |        |             |
>               |     (Socket)
>               |        |             |
>               |        |        (nvme-tcp-offload API)
>               |        |             |
>               |        |             |
> * Vendor Specific Driver: *
> 
>               |        |             |
>             [ qedr ]
>                        |             |
>                     [ qede ]
>                                      |
>                                    [ qedn ]
> 
> 
> Performance:
> ============
> With this implementation on top of the Marvell qedn driver (using the
> Marvell FastLinQ NIC), we were able to demonstrate the following CPU
> utilization improvement:
> 
> On AMD EPYC 7402, 2.80GHz, 28 cores:
> - For 16K queued read IOs, 16jobs, 4qd (50Gbps line rate):
>    Improved the CPU utilization from 15.1% with NVMeTCP SW to 4.7% with
>    NVMeTCP offload.
> 
> On Intel(R) Xeon(R) Gold 5122 CPU, 3.60GHz, 16 cores:
> - For 512K queued read IOs, 16jobs, 4qd (25Gbps line rate):
>    Improved the CPU utilization from 16.3% with NVMeTCP SW to 1.1% with
>    NVMeTCP offload.
> 
> In addition, we were able to demonstrate the following latency improvement:
> - For 200K read IOPS (16 jobs, 16 qd, with fio rate limiter):
>    Improved the average latency from 105 usec with NVMeTCP SW to 39 usec
>    with NVMeTCP offload.
>    
>    Improved the 99.99 tail latency from 570 usec with NVMeTCP SW to 91 usec
>    with NVMeTCP offload.
> 
> The end-to-end offload latency was measured from fio while running against
> a null device back end.
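
For comparing the quoted numbers, the relative reduction can be computed as (before - after) / before. The helper below is a hypothetical convenience, not part of the series; applied to the figures above it gives roughly 69% and 93% lower CPU utilization, and roughly 63% and 84% lower average and 99.99 tail latency.

```c
#include <assert.h>

/* Relative reduction in percent when going from 'before' to 'after'. */
static double reduction_pct(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```
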
> 
> 
> Upstream plan:
> ==============
> Following this RFC, the series will be sent in a modular way so that changes
> in each part will not impact the previous part.
> 
> - Part 1 (Patches 1-7):
>    The qed infrastructure, will be sent to 'netdev@vger.kernel.org'.
> 
> - Part 2 (Patches 8-15):
>    The nvme-tcp-offload patches, will be sent to
>    'linux-nvme@lists.infradead.org'.
> 
> - Part 3 (Patches 16-27):
>    The qedn patches, will be sent to 'linux-nvme@lists.infradead.org'.
>   
> 
> Queue Initialization Design:
> ============================
> The nvme-tcp-offload ULP module shall register with the existing
> nvmf_transport_ops (.name = "tcp_offload"), nvme_ctrl_ops and blk_mq_ops.
> The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
> with the following ops:
> - claim_dev() - in order to resolve the route to the target according to
>                  the paired net_dev.
> - create_queue() - in order to create offloaded nvme-tcp queue.
> 
> The nvme-tcp-offload ULP module shall manage all the controller level
> functionalities, call claim_dev and based on the return values shall call
> the relevant module create_queue in order to create the admin queue and
> the IO queues.
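
The queue initialization flow described above could look roughly like the following userspace sketch. Everything except the op names claim_dev and create_queue is hypothetical: the struct layouts, the signatures, the mock callbacks, and the convention that claim_dev returns non-zero when the device can reach the target are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct ofld_dev;

/* Hypothetical signatures; only the op names come from the cover letter. */
struct ofld_ops {
	int (*claim_dev)(struct ofld_dev *dev, const char *traddr);
	int (*create_queue)(struct ofld_dev *dev, int qid);
};

struct ofld_dev {
	const struct ofld_ops *ops;
	int nr_queues;		/* queues created so far (mock bookkeeping) */
};

/* Mock vendor callbacks. Assumption: non-zero from claim_dev means "claimed". */
static int mock_claim_never(struct ofld_dev *dev, const char *traddr)
{
	(void)dev; (void)traddr;
	return 0;
}

static int mock_claim_always(struct ofld_dev *dev, const char *traddr)
{
	(void)dev; (void)traddr;
	return 1;
}

static int mock_create_queue(struct ofld_dev *dev, int qid)
{
	if (qid != dev->nr_queues)
		return -1;	/* the mock expects queues in order */
	dev->nr_queues++;
	return 0;
}

static const struct ofld_ops ops_unreachable = { mock_claim_never, mock_create_queue };
static const struct ofld_ops ops_reachable = { mock_claim_always, mock_create_queue };

/* ULP side: pick the first device that claims the target, then create queues. */
static struct ofld_dev *ulp_setup_queues(struct ofld_dev *devs, size_t nr_devs,
					 const char *traddr, int nr_io_queues)
{
	size_t i;
	int qid;

	assert(devs && traddr && nr_io_queues >= 0);

	for (i = 0; i < nr_devs; i++) {
		struct ofld_dev *dev = &devs[i];

		if (!dev->ops->claim_dev(dev, traddr))
			continue;
		/* qid 0 is the admin queue, 1..nr_io_queues are IO queues. */
		for (qid = 0; qid <= nr_io_queues; qid++)
			if (dev->ops->create_queue(dev, qid))
				return NULL;
		return dev;
	}
	return NULL;	/* no offload device can reach the target */
}
```
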
> 
> 
> IO-path Design:
> ===============
> The nvme-tcp-offload shall work at the IO-level - the nvme-tcp-offload
> ULP module shall pass the request (the IO) to the nvme-tcp-offload vendor
> driver and later, the nvme-tcp-offload vendor driver returns the request
> completion (the IO completion).
> No additional handling is needed in between; this design will reduce the
> CPU utilization as we will describe below.
> 
> The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
> with the following IO-path ops:
> - init_req()
> - send_req() - in order to pass the request to the handling of the
>                 offload driver that shall pass it to the vendor specific device.
> - poll_queue()
> 
> Once the IO completes, the nvme-tcp-offload vendor driver shall call
> command.done() that will invoke the nvme-tcp-offload ULP layer to
> complete the request.
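
As an illustration of the IO path described above (submit through send_req, completion reported back through the request's done callback), here is a minimal userspace sketch. The struct layouts and the single-slot mock "device" are hypothetical; only the op names and the idea that the vendor driver calls done() come from the text, and the real ops carry more state than shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ofld_queue;

struct ofld_req {
	struct ofld_queue *queue;
	void (*done)(struct ofld_req *req, int status);
	bool completed;
	int status;
};

struct ofld_queue {
	struct ofld_req *pending;	/* single-slot mock "hardware" queue */
};

struct ofld_io_ops {
	int (*send_req)(struct ofld_req *req);
	int (*poll_queue)(struct ofld_queue *queue);
};

/* ULP side: completion callback invoked by the vendor driver. */
static void ulp_req_done(struct ofld_req *req, int status)
{
	req->status = status;
	req->completed = true;
}

/* Mock vendor driver: accept one request at a time. */
static int mock_send_req(struct ofld_req *req)
{
	if (req->queue->pending)
		return -1;	/* slot busy */
	req->queue->pending = req;
	return 0;
}

/* Mock vendor driver: complete the pending request, return completions found. */
static int mock_poll_queue(struct ofld_queue *queue)
{
	struct ofld_req *req = queue->pending;

	if (!req)
		return 0;
	queue->pending = NULL;
	req->done(req, 0);	/* report success back to the ULP */
	return 1;
}

static const struct ofld_io_ops mock_io_ops = { mock_send_req, mock_poll_queue };

/* ULP side: hand the request to the vendor driver, no handling in between. */
static int ulp_queue_rq(const struct ofld_io_ops *ops, struct ofld_req *req)
{
	assert(ops && req && req->queue);
	req->done = ulp_req_done;
	req->completed = false;
	return ops->send_req(req);
}
```

The ULP never touches the request between send_req() and done(); all per-IO work happens in the vendor driver, or in hardware in the real case.
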
> 
> 
> TCP events:
> ===========
> The Marvell FastLinQ NIC HW engine handles all the TCP re-transmissions
> and OOO events.
> 
> 
> Teardown and errors:
> ====================
> In case of an NVMeTCP queue error, the nvme-tcp-offload vendor driver shall
> call nvme_tcp_ofld_report_queue_err().
> The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
> with the following teardown ops:
> - drain_queue()
> - destroy_queue()
> 
> 
> The Marvell FastLinQ NIC HW engine:
> ====================================
> The Marvell NIC HW engine is capable of offloading the entire TCP/IP
> stack and managing up to 64K connections per PF. Use cases that are
> already implemented and upstream include iWARP (by the Marvell qedr
> driver) and iSCSI (by the Marvell qedi driver).
> In addition, the Marvell NIC HW engine offloads the NVMeTCP queue layer
> and is able to manage the IO level also in case of TCP re-transmissions
> and OOO events.
> The HW engine enables direct data placement (including the data digest CRC
> calculation and validation) and direct data transmission (including data
> digest CRC calculation).
> 
> 
> The Marvell qedn driver:
> ========================
> The new driver will be added under "drivers/nvme/hw" and will be enabled
> by the Kconfig "Marvell NVM Express over Fabrics TCP offload".
> As part of the qedn init, the driver will register as a PCI device driver
> and will work with the Marvell FastLinQ NIC.
> As part of the probe, the driver will register to the nvme_tcp_offload
> (ULP) and to the qed module (qed_nvmetcp_ops) - similar to other
> "qed_*_ops" which are used by the qede, qedr, qedf and qedi device
> drivers.
>    
> 
> QEDN Future work:
> =================
> - Support extended HW resources.
> - Digest support.
> - Devlink support for device configuration and TCP offload configurations.
> - Statistics
> 
>   
> Long term future work:
> ======================
> - The nvme-tcp-offload ULP target abstraction layer.
> - The Marvell nvme-tcp-offload "qednt" target driver.
> 
> 
> Changes since RFC v1:
> =====================
> - Fix nvme_tcp_ofld_ops return values.
> - Remove NVMF_TRTYPE_TCP_OFFLOAD.
> - Add nvme_tcp_ofld_poll() implementation.
> - Fix nvme_tcp_ofld_queue_rq() to check map_sg() and send_req() return
>    values.
> 
> Changes since RFC v2:
> =====================
> - Add qedn - Marvell's NVMeTCP HW offload vendor driver init and probe
>    (patches 8-11).
> - Fixes in controller and queue level (patches 3-6).
>    
> Changes since RFC v3:
> =====================
> - Add the full implementation of the nvme-tcp-offload layer including the
>    new ops: setup_ctrl(), release_ctrl(), commit_rqs() and new flows (ASYNC
>    and timeout).
> - Add nvme-tcp-offload device maximums: max_hw_sectors, max_segments.
> - Add nvme-tcp-offload layer design and optimization changes.
> - Add the qedn full implementation for the conn level, IO path and error
>    handling.
> - Add qed support for the new AHP HW.
> 
> 
> Arie Gershberg (3):
>    nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS
>      definitions
>    nvme-tcp-offload: Add controller level implementation
>    nvme-tcp-offload: Add controller level error recovery implementation
> 
> Dean Balandin (3):
>    nvme-tcp-offload: Add device scan implementation
>    nvme-tcp-offload: Add queue level implementation
>    nvme-tcp-offload: Add IO level implementation
> 
> Nikolay Assa (2):
>    qed: Add IP services APIs support
>    qedn: Add qedn_claim_dev API support
> 
> Omkar Kulkarni (1):
>    qed: Add qed-NVMeTCP personality
> 
> Prabhakar Kushwaha (6):
>    qed: Add support of HW filter block
>    qedn: Add connection-level slowpath functionality
>    qedn: Add support of configuring HW filter block
>    qedn: Add support of Task and SGL
>    qedn: Add support of NVME ICReq & ICResp
>    qedn: Add support of ASYNC
> 
> Shai Malin (12):
>    qed: Add NVMeTCP Offload PF Level FW and HW HSI
>    qed: Add NVMeTCP Offload Connection Level FW and HW HSI
>    qed: Add NVMeTCP Offload IO Level FW and HW HSI
>    qed: Add NVMeTCP Offload IO Level FW Initializations
>    nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP
>    nvme-tcp-offload: Add Timeout and ASYNC Support
>    qedn: Add qedn - Marvell's NVMeTCP HW offload vendor driver
>    qedn: Add qedn probe
>    qedn: Add IRQ and fast-path resources initializations
>    qedn: Add IO level nvme_req and fw_cq workqueues
>    qedn: Add IO level fastpath functionality
>    qedn: Add Connection and IO level recovery flows
> 
>   MAINTAINERS                                   |   10 +
>   drivers/net/ethernet/qlogic/Kconfig           |    3 +
>   drivers/net/ethernet/qlogic/qed/Makefile      |    5 +
>   drivers/net/ethernet/qlogic/qed/qed.h         |   16 +
>   drivers/net/ethernet/qlogic/qed/qed_cxt.c     |   32 +
>   drivers/net/ethernet/qlogic/qed/qed_cxt.h     |    1 +
>   drivers/net/ethernet/qlogic/qed/qed_dev.c     |  151 +-
>   drivers/net/ethernet/qlogic/qed/qed_hsi.h     |    4 +-
>   drivers/net/ethernet/qlogic/qed/qed_ll2.c     |   31 +-
>   drivers/net/ethernet/qlogic/qed/qed_mcp.c     |    3 +
>   drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c |    3 +-
>   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c |  868 +++++++++++
>   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h |  114 ++
>   .../qlogic/qed/qed_nvmetcp_fw_funcs.c         |  372 +++++
>   .../qlogic/qed/qed_nvmetcp_fw_funcs.h         |   43 +
>   .../qlogic/qed/qed_nvmetcp_ip_services.c      |  239 +++
>   drivers/net/ethernet/qlogic/qed/qed_ooo.c     |    5 +-
>   drivers/net/ethernet/qlogic/qed/qed_sp.h      |    5 +
>   .../net/ethernet/qlogic/qed/qed_sp_commands.c |    1 +
>   drivers/nvme/Kconfig                          |    1 +
>   drivers/nvme/Makefile                         |    1 +
>   drivers/nvme/host/Kconfig                     |   16 +
>   drivers/nvme/host/Makefile                    |    3 +
>   drivers/nvme/host/fabrics.c                   |    7 -
>   drivers/nvme/host/fabrics.h                   |    7 +
>   drivers/nvme/host/tcp-offload.c               | 1330 +++++++++++++++++
>   drivers/nvme/host/tcp-offload.h               |  209 +++
>   drivers/nvme/hw/Kconfig                       |    9 +
>   drivers/nvme/hw/Makefile                      |    3 +
>   drivers/nvme/hw/qedn/Makefile                 |    4 +
>   drivers/nvme/hw/qedn/qedn.h                   |  435 ++++++
>   drivers/nvme/hw/qedn/qedn_conn.c              |  999 +++++++++++++
>   drivers/nvme/hw/qedn/qedn_main.c              | 1153 ++++++++++++++
>   drivers/nvme/hw/qedn/qedn_task.c              |  977 ++++++++++++
>   include/linux/qed/common_hsi.h                |    1 +
>   include/linux/qed/nvmetcp_common.h            |  616 ++++++++
>   include/linux/qed/qed_if.h                    |   22 +
>   include/linux/qed/qed_nvmetcp_if.h            |  244 +++
>   .../linux/qed/qed_nvmetcp_ip_services_if.h    |   29 +
>   39 files changed, 7947 insertions(+), 25 deletions(-)
>   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
>   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h
>   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.c
>   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.h
>   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c
>   create mode 100644 drivers/nvme/host/tcp-offload.c
>   create mode 100644 drivers/nvme/host/tcp-offload.h
>   create mode 100644 drivers/nvme/hw/Kconfig
>   create mode 100644 drivers/nvme/hw/Makefile
>   create mode 100644 drivers/nvme/hw/qedn/Makefile
>   create mode 100644 drivers/nvme/hw/qedn/qedn.h
>   create mode 100644 drivers/nvme/hw/qedn/qedn_conn.c
>   create mode 100644 drivers/nvme/hw/qedn/qedn_main.c
>   create mode 100644 drivers/nvme/hw/qedn/qedn_task.c
>   create mode 100644 include/linux/qed/nvmetcp_common.h
>   create mode 100644 include/linux/qed/qed_nvmetcp_if.h
>   create mode 100644 include/linux/qed/qed_nvmetcp_ip_services_if.h
> 
I would structure this patchset slightly differently, putting the 
NVMe-oF implementation at the start of the patchset; this is where 
you will get most of the comments, and any change there will potentially 
reflect back on the driver implementation, too.

Something to consider for the next round.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

* Re: [RFC PATCH v4 01/27] qed: Add NVMeTCP Offload PF Level FW and HW HSI
  2021-04-29 19:09 ` [RFC PATCH v4 01/27] qed: Add NVMeTCP Offload PF Level FW and HW HSI Shai Malin
@ 2021-05-01 16:50   ` Hannes Reinecke
  2021-05-03 15:23     ` Shai Malin
  0 siblings, 1 reply; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-01 16:50 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024,
	Dean Balandin

On 4/29/21 9:09 PM, Shai Malin wrote:
> This patch introduces the NVMeTCP device and PF level HSI and HSI
> functionality in order to initialize and interact with the HW device.
> 
> This patch is based on the qede, qedr, qedi, qedf drivers HSI.
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Dean Balandin <dbalandin@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> ---
>   drivers/net/ethernet/qlogic/Kconfig           |   3 +
>   drivers/net/ethernet/qlogic/qed/Makefile      |   2 +
>   drivers/net/ethernet/qlogic/qed/qed.h         |   3 +
>   drivers/net/ethernet/qlogic/qed/qed_hsi.h     |   1 +
>   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c | 282 ++++++++++++++++++
>   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h |  51 ++++
>   drivers/net/ethernet/qlogic/qed/qed_sp.h      |   2 +
>   include/linux/qed/common_hsi.h                |   1 +
>   include/linux/qed/nvmetcp_common.h            |  54 ++++
>   include/linux/qed/qed_if.h                    |  22 ++
>   include/linux/qed/qed_nvmetcp_if.h            |  72 +++++
>   11 files changed, 493 insertions(+)
>   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
>   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h
>   create mode 100644 include/linux/qed/nvmetcp_common.h
>   create mode 100644 include/linux/qed/qed_nvmetcp_if.h
> 
> diff --git a/drivers/net/ethernet/qlogic/Kconfig b/drivers/net/ethernet/qlogic/Kconfig
> index 6b5ddb07ee83..98f430905ffa 100644
> --- a/drivers/net/ethernet/qlogic/Kconfig
> +++ b/drivers/net/ethernet/qlogic/Kconfig
> @@ -110,6 +110,9 @@ config QED_RDMA
>   config QED_ISCSI
>   	bool
>   
> +config QED_NVMETCP
> +	bool
> +
>   config QED_FCOE
>   	bool
>   
> diff --git a/drivers/net/ethernet/qlogic/qed/Makefile b/drivers/net/ethernet/qlogic/qed/Makefile
> index 8251755ec18c..7cb0db67ba5b 100644
> --- a/drivers/net/ethernet/qlogic/qed/Makefile
> +++ b/drivers/net/ethernet/qlogic/qed/Makefile
> @@ -28,6 +28,8 @@ qed-$(CONFIG_QED_ISCSI) += qed_iscsi.o
>   qed-$(CONFIG_QED_LL2) += qed_ll2.o
>   qed-$(CONFIG_QED_OOO) += qed_ooo.o
>   
> +qed-$(CONFIG_QED_NVMETCP) += qed_nvmetcp.o
> +
>   qed-$(CONFIG_QED_RDMA) +=	\
>   	qed_iwarp.o		\
>   	qed_rdma.o		\
> diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h
> index a20cb8a0c377..91d4635009ab 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed.h
> +++ b/drivers/net/ethernet/qlogic/qed/qed.h
> @@ -240,6 +240,7 @@ enum QED_FEATURE {
>   	QED_VF,
>   	QED_RDMA_CNQ,
>   	QED_ISCSI_CQ,
> +	QED_NVMETCP_CQ = QED_ISCSI_CQ,
>   	QED_FCOE_CQ,
>   	QED_VF_L2_QUE,
>   	QED_MAX_FEATURES,
> @@ -592,6 +593,7 @@ struct qed_hwfn {
>   	struct qed_ooo_info		*p_ooo_info;
>   	struct qed_rdma_info		*p_rdma_info;
>   	struct qed_iscsi_info		*p_iscsi_info;
> +	struct qed_nvmetcp_info		*p_nvmetcp_info;
>   	struct qed_fcoe_info		*p_fcoe_info;
>   	struct qed_pf_params		pf_params;
>   
> @@ -828,6 +830,7 @@ struct qed_dev {
>   		struct qed_eth_cb_ops		*eth;
>   		struct qed_fcoe_cb_ops		*fcoe;
>   		struct qed_iscsi_cb_ops		*iscsi;
> +		struct qed_nvmetcp_cb_ops	*nvmetcp;
>   	} protocol_ops;
>   	void				*ops_cookie;
>   
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
> index 559df9f4d656..24472f6a83c2 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h
> +++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
> @@ -20,6 +20,7 @@
>   #include <linux/qed/fcoe_common.h>
>   #include <linux/qed/eth_common.h>
>   #include <linux/qed/iscsi_common.h>
> +#include <linux/qed/nvmetcp_common.h>
>   #include <linux/qed/iwarp_common.h>
>   #include <linux/qed/rdma_common.h>
>   #include <linux/qed/roce_common.h>
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
> new file mode 100644
> index 000000000000..da3b5002d216
> --- /dev/null
> +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
> @@ -0,0 +1,282 @@
> +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause)
> +/* Copyright 2021 Marvell. All rights reserved. */
> +
> +#include <linux/types.h>
> +#include <asm/byteorder.h>
> +#include <asm/param.h>
> +#include <linux/delay.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/etherdevice.h>
> +#include <linux/kernel.h>
> +#include <linux/log2.h>
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +#include <linux/stddef.h>
> +#include <linux/string.h>
> +#include <linux/errno.h>
> +#include <linux/list.h>
> +#include <linux/qed/qed_nvmetcp_if.h>
> +#include "qed.h"
> +#include "qed_cxt.h"
> +#include "qed_dev_api.h"
> +#include "qed_hsi.h"
> +#include "qed_hw.h"
> +#include "qed_int.h"
> +#include "qed_nvmetcp.h"
> +#include "qed_ll2.h"
> +#include "qed_mcp.h"
> +#include "qed_sp.h"
> +#include "qed_reg_addr.h"
> +
> +static int qed_nvmetcp_async_event(struct qed_hwfn *p_hwfn, u8 fw_event_code,
> +				   u16 echo, union event_ring_data *data,
> +				   u8 fw_return_code)
> +{
> +	if (p_hwfn->p_nvmetcp_info->event_cb) {
> +		struct qed_nvmetcp_info *p_nvmetcp = p_hwfn->p_nvmetcp_info;
> +
> +		return p_nvmetcp->event_cb(p_nvmetcp->event_context,
> +					 fw_event_code, data);
> +	} else {
> +		DP_NOTICE(p_hwfn, "nvmetcp async completion is not set\n");
> +
> +		return -EINVAL;
> +	}
> +}
> +
> +static int qed_sp_nvmetcp_func_start(struct qed_hwfn *p_hwfn,
> +				     enum spq_mode comp_mode,
> +				     struct qed_spq_comp_cb *p_comp_addr,
> +				     void *event_context,
> +				     nvmetcp_event_cb_t async_event_cb)
> +{
> +	struct nvmetcp_init_ramrod_params *p_ramrod = NULL;
> +	struct qed_nvmetcp_pf_params *p_params = NULL;
> +	struct scsi_init_func_queues *p_queue = NULL;
> +	struct nvmetcp_spe_func_init *p_init = NULL;
> +	struct qed_sp_init_data init_data = {};
> +	struct qed_spq_entry *p_ent = NULL;
> +	int rc = 0;
> +	u16 val;
> +	u8 i;
> +
> +	/* Get SPQ entry */
> +	init_data.cid = qed_spq_get_cid(p_hwfn);
> +	init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;
> +	init_data.comp_mode = comp_mode;
> +	init_data.p_comp_data = p_comp_addr;
> +
> +	rc = qed_sp_init_request(p_hwfn, &p_ent,
> +				 NVMETCP_RAMROD_CMD_ID_INIT_FUNC,
> +				 PROTOCOLID_NVMETCP, &init_data);
> +	if (rc)
> +		return rc;
> +
> +	p_ramrod = &p_ent->ramrod.nvmetcp_init;
> +	p_init = &p_ramrod->nvmetcp_init_spe;
> +	p_params = &p_hwfn->pf_params.nvmetcp_pf_params;
> +	p_queue = &p_init->q_params;
> +
> +	p_init->num_sq_pages_in_ring = p_params->num_sq_pages_in_ring;
> +	p_init->num_r2tq_pages_in_ring = p_params->num_r2tq_pages_in_ring;
> +	p_init->num_uhq_pages_in_ring = p_params->num_uhq_pages_in_ring;
> +	p_init->ll2_rx_queue_id = RESC_START(p_hwfn, QED_LL2_RAM_QUEUE) +
> +					p_params->ll2_ooo_queue_id;
> +
> +	SET_FIELD(p_init->flags, NVMETCP_SPE_FUNC_INIT_NVMETCP_MODE, 1);
> +
> +	p_init->func_params.log_page_size = ilog2(PAGE_SIZE);
> +	p_init->func_params.num_tasks = cpu_to_le16(p_params->num_tasks);
> +	p_init->debug_flags = p_params->debug_mode;
> +
> +	DMA_REGPAIR_LE(p_queue->glbl_q_params_addr,
> +		       p_params->glbl_q_params_addr);
> +
> +	p_queue->cq_num_entries = cpu_to_le16(QED_NVMETCP_FW_CQ_SIZE);
> +	p_queue->num_queues = p_params->num_queues;
> +	val = RESC_START(p_hwfn, QED_CMDQS_CQS);
> +	p_queue->queue_relative_offset = cpu_to_le16((u16)val);
> +	p_queue->cq_sb_pi = p_params->gl_rq_pi;
> +
> +	for (i = 0; i < p_params->num_queues; i++) {
> +		val = qed_get_igu_sb_id(p_hwfn, i);
> +		p_queue->cq_cmdq_sb_num_arr[i] = cpu_to_le16(val);
> +	}
> +
> +	SET_FIELD(p_queue->q_validity,
> +		  SCSI_INIT_FUNC_QUEUES_CMD_VALID, 0);
> +	p_queue->cmdq_num_entries = 0;
> +	p_queue->bdq_resource_id = (u8)RESC_START(p_hwfn, QED_BDQ);
> +
> +	/* p_ramrod->tcp_init.min_rto = cpu_to_le16(p_params->min_rto); */
> +	p_ramrod->tcp_init.two_msl_timer = cpu_to_le32(QED_TCP_TWO_MSL_TIMER);
> +	p_ramrod->tcp_init.tx_sws_timer = cpu_to_le16(QED_TCP_SWS_TIMER);
> +	p_init->half_way_close_timeout = cpu_to_le16(QED_TCP_HALF_WAY_CLOSE_TIMEOUT);
> +	p_ramrod->tcp_init.max_fin_rt = QED_TCP_MAX_FIN_RT;
> +
> +	SET_FIELD(p_ramrod->nvmetcp_init_spe.params,
> +		  NVMETCP_SPE_FUNC_INIT_MAX_SYN_RT, QED_TCP_MAX_FIN_RT);
> +
> +	p_hwfn->p_nvmetcp_info->event_context = event_context;
> +	p_hwfn->p_nvmetcp_info->event_cb = async_event_cb;
> +
> +	qed_spq_register_async_cb(p_hwfn, PROTOCOLID_NVMETCP,
> +				  qed_nvmetcp_async_event);
> +
> +	return qed_spq_post(p_hwfn, p_ent, NULL);
> +}
> +
> +static int qed_sp_nvmetcp_func_stop(struct qed_hwfn *p_hwfn,
> +				    enum spq_mode comp_mode,
> +				    struct qed_spq_comp_cb *p_comp_addr)
> +{
> +	struct qed_spq_entry *p_ent = NULL;
> +	struct qed_sp_init_data init_data;
> +	int rc;
> +
> +	/* Get SPQ entry */
> +	memset(&init_data, 0, sizeof(init_data));
> +	init_data.cid = qed_spq_get_cid(p_hwfn);
> +	init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;
> +	init_data.comp_mode = comp_mode;
> +	init_data.p_comp_data = p_comp_addr;
> +
> +	rc = qed_sp_init_request(p_hwfn, &p_ent,
> +				 NVMETCP_RAMROD_CMD_ID_DESTROY_FUNC,
> +				 PROTOCOLID_NVMETCP, &init_data);
> +	if (rc)
> +		return rc;
> +
> +	rc = qed_spq_post(p_hwfn, p_ent, NULL);
> +
> +	qed_spq_unregister_async_cb(p_hwfn, PROTOCOLID_NVMETCP);
> +
> +	return rc;
> +}
> +
> +static int qed_fill_nvmetcp_dev_info(struct qed_dev *cdev,
> +				     struct qed_dev_nvmetcp_info *info)
> +{
> +	struct qed_hwfn *hwfn = QED_AFFIN_HWFN(cdev);
> +	int rc;
> +
> +	memset(info, 0, sizeof(*info));
> +	rc = qed_fill_dev_info(cdev, &info->common);
> +
> +	info->port_id = MFW_PORT(hwfn);
> +	info->num_cqs = FEAT_NUM(hwfn, QED_NVMETCP_CQ);
> +
> +	return rc;
> +}
> +
> +static void qed_register_nvmetcp_ops(struct qed_dev *cdev,
> +				     struct qed_nvmetcp_cb_ops *ops,
> +				     void *cookie)
> +{
> +	cdev->protocol_ops.nvmetcp = ops;
> +	cdev->ops_cookie = cookie;
> +}
> +
> +static int qed_nvmetcp_stop(struct qed_dev *cdev)
> +{
> +	int rc;
> +
> +	if (!(cdev->flags & QED_FLAG_STORAGE_STARTED)) {
> +		DP_NOTICE(cdev, "nvmetcp already stopped\n");
> +
> +		return 0;
> +	}
> +
> +	if (!hash_empty(cdev->connections)) {
> +		DP_NOTICE(cdev,
> +			  "Can't stop nvmetcp - not all connections were returned\n");
> +
> +		return -EINVAL;
> +	}
> +
> +	/* Stop the nvmetcp */
> +	rc = qed_sp_nvmetcp_func_stop(QED_AFFIN_HWFN(cdev), QED_SPQ_MODE_EBLOCK,
> +				      NULL);
> +	cdev->flags &= ~QED_FLAG_STORAGE_STARTED;
> +
> +	return rc;
> +}
> +
> +static int qed_nvmetcp_start(struct qed_dev *cdev,
> +			     struct qed_nvmetcp_tid *tasks,
> +			     void *event_context,
> +			     nvmetcp_event_cb_t async_event_cb)
> +{
> +	struct qed_tid_mem *tid_info;
> +	int rc;
> +
> +	if (cdev->flags & QED_FLAG_STORAGE_STARTED) {
> +		DP_NOTICE(cdev, "nvmetcp already started;\n");
> +
> +		return 0;
> +	}
> +
> +	rc = qed_sp_nvmetcp_func_start(QED_AFFIN_HWFN(cdev),
> +				       QED_SPQ_MODE_EBLOCK, NULL,
> +				       event_context, async_event_cb);
> +	if (rc) {
> +		DP_NOTICE(cdev, "Failed to start nvmetcp\n");
> +
> +		return rc;
> +	}
> +
> +	cdev->flags |= QED_FLAG_STORAGE_STARTED;
> +	hash_init(cdev->connections);
> +
> +	if (!tasks)
> +		return 0;
> +
> +	tid_info = kzalloc(sizeof(*tid_info), GFP_KERNEL);
> +
> +	if (!tid_info) {
> +		qed_nvmetcp_stop(cdev);
> +
> +		return -ENOMEM;
> +	}
> +
> +	rc = qed_cxt_get_tid_mem_info(QED_AFFIN_HWFN(cdev), tid_info);
> +	if (rc) {
> +		DP_NOTICE(cdev, "Failed to gather task information\n");
> +		qed_nvmetcp_stop(cdev);
> +		kfree(tid_info);
> +
> +		return rc;
> +	}
> +
> +	/* Fill task information */
> +	tasks->size = tid_info->tid_size;
> +	tasks->num_tids_per_block = tid_info->num_tids_per_block;
> +	memcpy(tasks->blocks, tid_info->blocks,
> +	       MAX_TID_BLOCKS_NVMETCP * sizeof(u8 *));
> +
> +	kfree(tid_info);
> +
> +	return 0;
> +}
> +
> +static const struct qed_nvmetcp_ops qed_nvmetcp_ops_pass = {
> +	.common = &qed_common_ops_pass,
> +	.ll2 = &qed_ll2_ops_pass,
> +	.fill_dev_info = &qed_fill_nvmetcp_dev_info,
> +	.register_ops = &qed_register_nvmetcp_ops,
> +	.start = &qed_nvmetcp_start,
> +	.stop = &qed_nvmetcp_stop,
> +
> +	/* Placeholder - Connection level ops */
> +};
> +
> +const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void)
> +{
> +	return &qed_nvmetcp_ops_pass;
> +}
> +EXPORT_SYMBOL(qed_get_nvmetcp_ops);
> +
> +void qed_put_nvmetcp_ops(void)
> +{
> +}
> +EXPORT_SYMBOL(qed_put_nvmetcp_ops);
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h
> new file mode 100644
> index 000000000000..774b46ade408
> --- /dev/null
> +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h
> @@ -0,0 +1,51 @@
> +/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */
> +/* Copyright 2021 Marvell. All rights reserved. */
> +
> +#ifndef _QED_NVMETCP_H
> +#define _QED_NVMETCP_H
> +
> +#include <linux/types.h>
> +#include <linux/list.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/qed/tcp_common.h>
> +#include <linux/qed/qed_nvmetcp_if.h>
> +#include <linux/qed/qed_chain.h>
> +#include "qed.h"
> +#include "qed_hsi.h"
> +#include "qed_mcp.h"
> +#include "qed_sp.h"
> +
> +#define QED_NVMETCP_FW_CQ_SIZE (4 * 1024)
> +
> +/* tcp parameters */
> +#define QED_TCP_TWO_MSL_TIMER 4000
> +#define QED_TCP_HALF_WAY_CLOSE_TIMEOUT 10
> +#define QED_TCP_MAX_FIN_RT 2
> +#define QED_TCP_SWS_TIMER 5000
> +
> +struct qed_nvmetcp_info {
> +	spinlock_t lock; /* Connection resources. */
> +	struct list_head free_list;
> +	u16 max_num_outstanding_tasks;
> +	void *event_context;
> +	nvmetcp_event_cb_t event_cb;
> +};
> +
> +#if IS_ENABLED(CONFIG_QED_NVMETCP)
> +int qed_nvmetcp_alloc(struct qed_hwfn *p_hwfn);
> +void qed_nvmetcp_setup(struct qed_hwfn *p_hwfn);
> +void qed_nvmetcp_free(struct qed_hwfn *p_hwfn);
> +
> +#else /* IS_ENABLED(CONFIG_QED_NVMETCP) */
> +static inline int qed_nvmetcp_alloc(struct qed_hwfn *p_hwfn)
> +{
> +	return -EINVAL;
> +}
> +
> +static inline void qed_nvmetcp_setup(struct qed_hwfn *p_hwfn) {}
> +static inline void qed_nvmetcp_free(struct qed_hwfn *p_hwfn) {}
> +
> +#endif /* IS_ENABLED(CONFIG_QED_NVMETCP) */
> +
> +#endif
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_sp.h b/drivers/net/ethernet/qlogic/qed/qed_sp.h
> index 993f1357b6fc..525159e747a5 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_sp.h
> +++ b/drivers/net/ethernet/qlogic/qed/qed_sp.h
> @@ -100,6 +100,8 @@ union ramrod_data {
>   	struct iscsi_spe_conn_mac_update iscsi_conn_mac_update;
>   	struct iscsi_spe_conn_termination iscsi_conn_terminate;
>   
> +	struct nvmetcp_init_ramrod_params nvmetcp_init;
> +
>   	struct vf_start_ramrod_data vf_start;
>   	struct vf_stop_ramrod_data vf_stop;
>   };
> diff --git a/include/linux/qed/common_hsi.h b/include/linux/qed/common_hsi.h
> index 977807e1be53..59c5e5866607 100644
> --- a/include/linux/qed/common_hsi.h
> +++ b/include/linux/qed/common_hsi.h
> @@ -703,6 +703,7 @@ enum mf_mode {
>   /* Per-protocol connection types */
>   enum protocol_type {
>   	PROTOCOLID_ISCSI,
> +	PROTOCOLID_NVMETCP = PROTOCOLID_ISCSI,
>   	PROTOCOLID_FCOE,
>   	PROTOCOLID_ROCE,
>   	PROTOCOLID_CORE,

Why not a separate Protocol ID?
Don't you expect iSCSI and NVMe-TCP to be run at the same time?
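
To illustrate the concern, here is a minimal standalone sketch of what
aliasing one enumerator to another means in C. It is not the qed code:
the enum and the function name are made up for illustration. Both names
end up with the same value, so any code that only sees the ID cannot
tell the two protocols apart.

```c
#include <assert.h>

/* Hypothetical stand-in for enum protocol_type; not the real qed definition. */
enum demo_protocol {
	DEMO_PROTO_ISCSI,
	DEMO_PROTO_NVMETCP = DEMO_PROTO_ISCSI,	/* alias: same value as iSCSI */
	DEMO_PROTO_FCOE,
};

/* The alias does not consume a new value, so FCoE still follows iSCSI. */
static_assert(DEMO_PROTO_NVMETCP == DEMO_PROTO_ISCSI, "aliased IDs are equal");
static_assert(DEMO_PROTO_FCOE == DEMO_PROTO_ISCSI + 1, "alias takes no slot");

/*
 * A lookup keyed only on the ID cannot distinguish the two protocols.
 * The switch cannot even list both labels, because duplicate case
 * values are a compile error.
 */
static const char *demo_protocol_name(enum demo_protocol id)
{
	switch (id) {
	case DEMO_PROTO_ISCSI:	/* also matches DEMO_PROTO_NVMETCP */
		return "iscsi-or-nvmetcp";
	case DEMO_PROTO_FCOE:
		return "fcoe";
	}
	return "unknown";
}
```

In this sketch, anything registered per ID (for example a single callback
slot per protocol) would be shared by both names. Whether that matters for
qed depends on whether both personalities can be active on the same
function, which is the question above.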

> diff --git a/include/linux/qed/nvmetcp_common.h b/include/linux/qed/nvmetcp_common.h
> new file mode 100644
> index 000000000000..e9ccfc07041d
> --- /dev/null
> +++ b/include/linux/qed/nvmetcp_common.h
> @@ -0,0 +1,54 @@
> +/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */
> +/* Copyright 2021 Marvell. All rights reserved. */
> +
> +#ifndef __NVMETCP_COMMON__
> +#define __NVMETCP_COMMON__
> +
> +#include "tcp_common.h"
> +
> +/* NVMeTCP firmware function init parameters */
> +struct nvmetcp_spe_func_init {
> +	__le16 half_way_close_timeout;
> +	u8 num_sq_pages_in_ring;
> +	u8 num_r2tq_pages_in_ring;
> +	u8 num_uhq_pages_in_ring;
> +	u8 ll2_rx_queue_id;
> +	u8 flags;
> +#define NVMETCP_SPE_FUNC_INIT_COUNTERS_EN_MASK 0x1
> +#define NVMETCP_SPE_FUNC_INIT_COUNTERS_EN_SHIFT 0
> +#define NVMETCP_SPE_FUNC_INIT_NVMETCP_MODE_MASK 0x1
> +#define NVMETCP_SPE_FUNC_INIT_NVMETCP_MODE_SHIFT 1
> +#define NVMETCP_SPE_FUNC_INIT_RESERVED0_MASK 0x3F
> +#define NVMETCP_SPE_FUNC_INIT_RESERVED0_SHIFT 2
> +	u8 debug_flags;
> +	__le16 reserved1;
> +	u8 params;
> +#define NVMETCP_SPE_FUNC_INIT_MAX_SYN_RT_MASK	0xF
> +#define NVMETCP_SPE_FUNC_INIT_MAX_SYN_RT_SHIFT	0
> +#define NVMETCP_SPE_FUNC_INIT_RESERVED1_MASK	0xF
> +#define NVMETCP_SPE_FUNC_INIT_RESERVED1_SHIFT	4
> +	u8 reserved2[5];
> +	struct scsi_init_func_params func_params;
> +	struct scsi_init_func_queues q_params;
> +};
> +
> +/* NVMeTCP init params passed by driver to FW in NVMeTCP init ramrod. */
> +struct nvmetcp_init_ramrod_params {
> +	struct nvmetcp_spe_func_init nvmetcp_init_spe;
> +	struct tcp_init_params tcp_init;
> +};
> +
> +/* NVMeTCP Ramrod Command IDs */
> +enum nvmetcp_ramrod_cmd_id {
> +	NVMETCP_RAMROD_CMD_ID_UNUSED = 0,
> +	NVMETCP_RAMROD_CMD_ID_INIT_FUNC = 1,
> +	NVMETCP_RAMROD_CMD_ID_DESTROY_FUNC = 2,
> +	MAX_NVMETCP_RAMROD_CMD_ID
> +};
> +
> +struct nvmetcp_glbl_queue_entry {
> +	struct regpair cq_pbl_addr;
> +	struct regpair reserved;
> +};
> +
> +#endif /* __NVMETCP_COMMON__ */
> diff --git a/include/linux/qed/qed_if.h b/include/linux/qed/qed_if.h
> index 68d17a4fbf20..524f57821ba2 100644
> --- a/include/linux/qed/qed_if.h
> +++ b/include/linux/qed/qed_if.h
> @@ -542,6 +542,26 @@ struct qed_iscsi_pf_params {
>   	u8 bdq_pbl_num_entries[3];
>   };
>   
> +struct qed_nvmetcp_pf_params {
> +	u64 glbl_q_params_addr;
> +	u16 cq_num_entries;
> +
> +	u16 num_cons;
> +	u16 num_tasks;
> +
> +	u8 num_sq_pages_in_ring;
> +	u8 num_r2tq_pages_in_ring;
> +	u8 num_uhq_pages_in_ring;
> +
> +	u8 num_queues;
> +	u8 gl_rq_pi;
> +	u8 gl_cmd_pi;
> +	u8 debug_mode;
> +	u8 ll2_ooo_queue_id;
> +
> +	u16 min_rto;
> +};
> +
>   struct qed_rdma_pf_params {
>   	/* Supplied to QED during resource allocation (may affect the ILT and
>   	 * the doorbell BAR).
> @@ -560,6 +580,7 @@ struct qed_pf_params {
>   	struct qed_eth_pf_params eth_pf_params;
>   	struct qed_fcoe_pf_params fcoe_pf_params;
>   	struct qed_iscsi_pf_params iscsi_pf_params;
> +	struct qed_nvmetcp_pf_params nvmetcp_pf_params;
>   	struct qed_rdma_pf_params rdma_pf_params;
>   };
>   
> @@ -662,6 +683,7 @@ enum qed_sb_type {
>   enum qed_protocol {
>   	QED_PROTOCOL_ETH,
>   	QED_PROTOCOL_ISCSI,
> +	QED_PROTOCOL_NVMETCP = QED_PROTOCOL_ISCSI,
>   	QED_PROTOCOL_FCOE,
>   };
>   
> diff --git a/include/linux/qed/qed_nvmetcp_if.h b/include/linux/qed/qed_nvmetcp_if.h
> new file mode 100644
> index 000000000000..abc1f41862e3
> --- /dev/null
> +++ b/include/linux/qed/qed_nvmetcp_if.h
> @@ -0,0 +1,72 @@
> +/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */
> +/* Copyright 2021 Marvell. All rights reserved. */
> +
> +#ifndef _QED_NVMETCP_IF_H
> +#define _QED_NVMETCP_IF_H
> +#include <linux/types.h>
> +#include <linux/qed/qed_if.h>
> +
> +#define QED_NVMETCP_MAX_IO_SIZE	0x800000
> +
> +typedef int (*nvmetcp_event_cb_t) (void *context,
> +				   u8 fw_event_code, void *fw_handle);
> +
> +struct qed_dev_nvmetcp_info {
> +	struct qed_dev_info common;
> +
> +	u8 port_id;  /* Physical port */
> +	u8 num_cqs;
> +};
> +
> +#define MAX_TID_BLOCKS_NVMETCP (512)
> +struct qed_nvmetcp_tid {
> +	u32 size;		/* In bytes per task */
> +	u32 num_tids_per_block;
> +	u8 *blocks[MAX_TID_BLOCKS_NVMETCP];
> +};
> +
> +struct qed_nvmetcp_cb_ops {
> +	struct qed_common_cb_ops common;
> +};
> +
> +/**
> + * struct qed_nvmetcp_ops - qed NVMeTCP operations.
> + * @common:		common operations pointer
> + * @ll2:		light L2 operations pointer
> + * @fill_dev_info:	fills NVMeTCP specific information
> + *			@param cdev
> + *			@param info
> + *			@return 0 on success, otherwise error value.
> + * @register_ops:	register nvmetcp operations
> + *			@param cdev
> + *			@param ops - specified using qed_nvmetcp_cb_ops
> + *			@param cookie - driver private
> + * @start:		nvmetcp in FW
> + *			@param cdev
> + *			@param tasks - qed will fill information about tasks
> + *			return 0 on success, otherwise error value.
> + * @stop:		nvmetcp in FW
> + *			@param cdev
> + *			return 0 on success, otherwise error value.
> + */
> +struct qed_nvmetcp_ops {
> +	const struct qed_common_ops *common;
> +
> +	const struct qed_ll2_ops *ll2;
> +
> +	int (*fill_dev_info)(struct qed_dev *cdev,
> +			     struct qed_dev_nvmetcp_info *info);
> +
> +	void (*register_ops)(struct qed_dev *cdev,
> +			     struct qed_nvmetcp_cb_ops *ops, void *cookie);
> +
> +	int (*start)(struct qed_dev *cdev,
> +		     struct qed_nvmetcp_tid *tasks,
> +		     void *event_context, nvmetcp_event_cb_t async_event_cb);
> +
> +	int (*stop)(struct qed_dev *cdev);
> +};
> +
> +const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void);
> +void qed_put_nvmetcp_ops(void);
> +#endif
> 
As mentioned, please rearrange the patchset to have the NVMe-TCP patches 
first, then the driver-specific bits.

Cheers,

Hannes

* Re: [RFC PATCH v4 02/27] qed: Add NVMeTCP Offload Connection Level FW and HW HSI
  2021-04-29 19:09 ` [RFC PATCH v4 02/27] qed: Add NVMeTCP Offload Connection " Shai Malin
@ 2021-05-01 17:28   ` Hannes Reinecke
  2021-05-03 15:25     ` Shai Malin
  0 siblings, 1 reply; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-01 17:28 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024

On 4/29/21 9:09 PM, Shai Malin wrote:
> This patch introduces the NVMeTCP HSI and HSI functionality in order to
> initialize and interact with the HW device as part of the connection level
> HSI.
> 
> This includes:
> - Connection offload: offload a TCP connection to the FW.
> - Connection update: update the ICReq-ICResp params
> - Connection clear SQ: outstanding IOs FW flush.
> - Connection termination: terminate the TCP connection and flush the FW.
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> ---
>   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c | 580 +++++++++++++++++-
>   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h |  63 ++
>   drivers/net/ethernet/qlogic/qed/qed_sp.h      |   3 +
>   include/linux/qed/nvmetcp_common.h            | 143 +++++
>   include/linux/qed/qed_nvmetcp_if.h            |  94 +++
>   5 files changed, 881 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
> index da3b5002d216..79bd1cc6677f 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
> +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
> @@ -259,6 +259,578 @@ static int qed_nvmetcp_start(struct qed_dev *cdev,
>   	return 0;
>   }
>   
> +static struct qed_hash_nvmetcp_con *qed_nvmetcp_get_hash(struct qed_dev *cdev,
> +							 u32 handle)
> +{
> +	struct qed_hash_nvmetcp_con *hash_con = NULL;
> +
> +	if (!(cdev->flags & QED_FLAG_STORAGE_STARTED))
> +		return NULL;
> +
> +	hash_for_each_possible(cdev->connections, hash_con, node, handle) {
> +		if (hash_con->con->icid == handle)
> +			break;
> +	}
> +
> +	if (!hash_con || hash_con->con->icid != handle)
> +		return NULL;
> +
> +	return hash_con;
> +}
> +
> +static int qed_sp_nvmetcp_conn_offload(struct qed_hwfn *p_hwfn,
> +				       struct qed_nvmetcp_conn *p_conn,
> +				       enum spq_mode comp_mode,
> +				       struct qed_spq_comp_cb *p_comp_addr)
> +{
> +	struct nvmetcp_spe_conn_offload *p_ramrod = NULL;
> +	struct tcp_offload_params_opt2 *p_tcp2 = NULL;
> +	struct qed_sp_init_data init_data = { 0 };
> +	struct qed_spq_entry *p_ent = NULL;
> +	dma_addr_t r2tq_pbl_addr;
> +	dma_addr_t xhq_pbl_addr;
> +	dma_addr_t uhq_pbl_addr;
> +	u16 physical_q;
> +	int rc = 0;
> +	u32 dval;
> +	u8 i;
> +
> +	/* Get SPQ entry */
> +	init_data.cid = p_conn->icid;
> +	init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;
> +	init_data.comp_mode = comp_mode;
> +	init_data.p_comp_data = p_comp_addr;
> +
> +	rc = qed_sp_init_request(p_hwfn, &p_ent,
> +				 NVMETCP_RAMROD_CMD_ID_OFFLOAD_CONN,
> +				 PROTOCOLID_NVMETCP, &init_data);
> +	if (rc)
> +		return rc;
> +
> +	p_ramrod = &p_ent->ramrod.nvmetcp_conn_offload;
> +
> +	/* Transmission PQ is the first of the PF */
> +	physical_q = qed_get_cm_pq_idx(p_hwfn, PQ_FLAGS_OFLD);
> +	p_conn->physical_q0 = cpu_to_le16(physical_q);
> +	p_ramrod->nvmetcp.physical_q0 = cpu_to_le16(physical_q);
> +
> +	/* nvmetcp Pure-ACK PQ */
> +	physical_q = qed_get_cm_pq_idx(p_hwfn, PQ_FLAGS_ACK);
> +	p_conn->physical_q1 = cpu_to_le16(physical_q);
> +	p_ramrod->nvmetcp.physical_q1 = cpu_to_le16(physical_q);
> +
> +	p_ramrod->conn_id = cpu_to_le16(p_conn->conn_id);
> +
> +	DMA_REGPAIR_LE(p_ramrod->nvmetcp.sq_pbl_addr, p_conn->sq_pbl_addr);
> +
> +	r2tq_pbl_addr = qed_chain_get_pbl_phys(&p_conn->r2tq);
> +	DMA_REGPAIR_LE(p_ramrod->nvmetcp.r2tq_pbl_addr, r2tq_pbl_addr);
> +
> +	xhq_pbl_addr = qed_chain_get_pbl_phys(&p_conn->xhq);
> +	DMA_REGPAIR_LE(p_ramrod->nvmetcp.xhq_pbl_addr, xhq_pbl_addr);
> +
> +	uhq_pbl_addr = qed_chain_get_pbl_phys(&p_conn->uhq);
> +	DMA_REGPAIR_LE(p_ramrod->nvmetcp.uhq_pbl_addr, uhq_pbl_addr);
> +
> +	p_ramrod->nvmetcp.flags = p_conn->offl_flags;
> +	p_ramrod->nvmetcp.default_cq = p_conn->default_cq;
> +	p_ramrod->nvmetcp.initial_ack = 0;
> +
> +	DMA_REGPAIR_LE(p_ramrod->nvmetcp.nvmetcp.cccid_itid_table_addr,
> +		       p_conn->nvmetcp_cccid_itid_table_addr);
> +	p_ramrod->nvmetcp.nvmetcp.cccid_max_range =
> +		 cpu_to_le16(p_conn->nvmetcp_cccid_max_range);
> +
> +	p_tcp2 = &p_ramrod->tcp;
> +
> +	qed_set_fw_mac_addr(&p_tcp2->remote_mac_addr_hi,
> +			    &p_tcp2->remote_mac_addr_mid,
> +			    &p_tcp2->remote_mac_addr_lo, p_conn->remote_mac);
> +	qed_set_fw_mac_addr(&p_tcp2->local_mac_addr_hi,
> +			    &p_tcp2->local_mac_addr_mid,
> +			    &p_tcp2->local_mac_addr_lo, p_conn->local_mac);
> +
> +	p_tcp2->vlan_id = cpu_to_le16(p_conn->vlan_id);
> +	p_tcp2->flags = cpu_to_le16(p_conn->tcp_flags);
> +
> +	p_tcp2->ip_version = p_conn->ip_version;
> +	for (i = 0; i < 4; i++) {
> +		dval = p_conn->remote_ip[i];
> +		p_tcp2->remote_ip[i] = cpu_to_le32(dval);
> +		dval = p_conn->local_ip[i];
> +		p_tcp2->local_ip[i] = cpu_to_le32(dval);
> +	}
> +

What is this?
Some convoluted way of assigning the IP address in little endian?
Pointless if it's IPv4, as then each part is just one byte.
And if it's for IPv6, what do you do for IPv4?
And isn't there a helper for it?
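
For reference, a minimal userspace sketch of what the loop amounts to and
what a helper could look like. It rests on assumptions that the patch does
not confirm: that the address is held as four host-order 32-bit words, and
that an IPv4 address occupies only the first word. demo_store_le32() is a
made-up stand-in for cpu_to_le32(), and demo_ip_to_fw() is a hypothetical
helper, not an existing kernel function.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_IP_WORDS 4	/* 4 x 32 bit: room for one IPv6 address */

/* Stand-in for cpu_to_le32(): store a host-order value as little-endian bytes. */
static void demo_store_le32(uint8_t out[4], uint32_t host)
{
	out[0] = (uint8_t)(host & 0xff);
	out[1] = (uint8_t)((host >> 8) & 0xff);
	out[2] = (uint8_t)((host >> 16) & 0xff);
	out[3] = (uint8_t)((host >> 24) & 0xff);
}

/*
 * Hypothetical helper: convert all address words in one place, so the
 * caller does not open-code the loop for the remote and local address.
 */
static void demo_ip_to_fw(uint8_t fw[DEMO_IP_WORDS * 4],
			  const uint32_t host_ip[DEMO_IP_WORDS])
{
	int i;

	assert(fw && host_ip);

	for (i = 0; i < DEMO_IP_WORDS; i++)
		demo_store_le32(&fw[i * 4], host_ip[i]);
}
```

On a little-endian host cpu_to_le32() is a no-op, so the conversion only
changes the byte layout on big-endian machines. Under the assumption
above, an IPv4 address would be passed as { addr, 0, 0, 0 }.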

> +	p_tcp2->flow_label = cpu_to_le32(p_conn->flow_label);
> +	p_tcp2->ttl = p_conn->ttl;
> +	p_tcp2->tos_or_tc = p_conn->tos_or_tc;
> +	p_tcp2->remote_port = cpu_to_le16(p_conn->remote_port);
> +	p_tcp2->local_port = cpu_to_le16(p_conn->local_port);
> +	p_tcp2->mss = cpu_to_le16(p_conn->mss);
> +	p_tcp2->rcv_wnd_scale = p_conn->rcv_wnd_scale;
> +	p_tcp2->connect_mode = p_conn->connect_mode;
> +	p_tcp2->cwnd = cpu_to_le32(p_conn->cwnd);
> +	p_tcp2->ka_max_probe_cnt = p_conn->ka_max_probe_cnt;
> +	p_tcp2->ka_timeout = cpu_to_le32(p_conn->ka_timeout);
> +	p_tcp2->max_rt_time = cpu_to_le32(p_conn->max_rt_time);
> +	p_tcp2->ka_interval = cpu_to_le32(p_conn->ka_interval);
> +
> +	return qed_spq_post(p_hwfn, p_ent, NULL);
> +}
> +
> +static int qed_sp_nvmetcp_conn_update(struct qed_hwfn *p_hwfn,
> +				      struct qed_nvmetcp_conn *p_conn,
> +				      enum spq_mode comp_mode,
> +				      struct qed_spq_comp_cb *p_comp_addr)
> +{
> +	struct nvmetcp_conn_update_ramrod_params *p_ramrod = NULL;
> +	struct qed_spq_entry *p_ent = NULL;
> +	struct qed_sp_init_data init_data;
> +	int rc = -EINVAL;
> +	u32 dval;
> +
> +	/* Get SPQ entry */
> +	memset(&init_data, 0, sizeof(init_data));
> +	init_data.cid = p_conn->icid;
> +	init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;
> +	init_data.comp_mode = comp_mode;
> +	init_data.p_comp_data = p_comp_addr;
> +
> +	rc = qed_sp_init_request(p_hwfn, &p_ent,
> +				 NVMETCP_RAMROD_CMD_ID_UPDATE_CONN,
> +				 PROTOCOLID_NVMETCP, &init_data);
> +	if (rc)
> +		return rc;
> +
> +	p_ramrod = &p_ent->ramrod.nvmetcp_conn_update;
> +	p_ramrod->conn_id = cpu_to_le16(p_conn->conn_id);
> +	p_ramrod->flags = p_conn->update_flag;
> +	p_ramrod->max_seq_size = cpu_to_le32(p_conn->max_seq_size);
> +	dval = p_conn->max_recv_pdu_length;
> +	p_ramrod->max_recv_pdu_length = cpu_to_le32(dval);
> +	dval = p_conn->max_send_pdu_length;
> +	p_ramrod->max_send_pdu_length = cpu_to_le32(dval);
> +	dval = p_conn->first_seq_length;
> +	p_ramrod->first_seq_length = cpu_to_le32(dval);
> +
> +	return qed_spq_post(p_hwfn, p_ent, NULL);
> +}
> +
> +static int qed_sp_nvmetcp_conn_terminate(struct qed_hwfn *p_hwfn,
> +					 struct qed_nvmetcp_conn *p_conn,
> +					 enum spq_mode comp_mode,
> +					 struct qed_spq_comp_cb *p_comp_addr)
> +{
> +	struct nvmetcp_spe_conn_termination *p_ramrod = NULL;
> +	struct qed_spq_entry *p_ent = NULL;
> +	struct qed_sp_init_data init_data;
> +	int rc = -EINVAL;
> +
> +	/* Get SPQ entry */
> +	memset(&init_data, 0, sizeof(init_data));
> +	init_data.cid = p_conn->icid;
> +	init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;
> +	init_data.comp_mode = comp_mode;
> +	init_data.p_comp_data = p_comp_addr;
> +
> +	rc = qed_sp_init_request(p_hwfn, &p_ent,
> +				 NVMETCP_RAMROD_CMD_ID_TERMINATION_CONN,
> +				 PROTOCOLID_NVMETCP, &init_data);
> +	if (rc)
> +		return rc;
> +
> +	p_ramrod = &p_ent->ramrod.nvmetcp_conn_terminate;
> +	p_ramrod->conn_id = cpu_to_le16(p_conn->conn_id);
> +	p_ramrod->abortive = p_conn->abortive_dsconnect;
> +
> +	return qed_spq_post(p_hwfn, p_ent, NULL);
> +}
> +
> +static int qed_sp_nvmetcp_conn_clear_sq(struct qed_hwfn *p_hwfn,
> +					struct qed_nvmetcp_conn *p_conn,
> +					enum spq_mode comp_mode,
> +					struct qed_spq_comp_cb *p_comp_addr)
> +{
> +	struct qed_spq_entry *p_ent = NULL;
> +	struct qed_sp_init_data init_data;
> +	int rc = -EINVAL;
> +
> +	/* Get SPQ entry */
> +	memset(&init_data, 0, sizeof(init_data));
> +	init_data.cid = p_conn->icid;
> +	init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;
> +	init_data.comp_mode = comp_mode;
> +	init_data.p_comp_data = p_comp_addr;
> +
> +	rc = qed_sp_init_request(p_hwfn, &p_ent,
> +				 NVMETCP_RAMROD_CMD_ID_CLEAR_SQ,
> +				 PROTOCOLID_NVMETCP, &init_data);
> +	if (rc)
> +		return rc;
> +
> +	return qed_spq_post(p_hwfn, p_ent, NULL);
> +}
> +
> +static void __iomem *qed_nvmetcp_get_db_addr(struct qed_hwfn *p_hwfn, u32 cid)
> +{
> +	return (u8 __iomem *)p_hwfn->doorbells +
> +			     qed_db_addr(cid, DQ_DEMS_LEGACY);
> +}
> +
> +static int qed_nvmetcp_allocate_connection(struct qed_hwfn *p_hwfn,
> +					   struct qed_nvmetcp_conn **p_out_conn)
> +{
> +	struct qed_chain_init_params params = {
> +		.mode		= QED_CHAIN_MODE_PBL,
> +		.intended_use	= QED_CHAIN_USE_TO_CONSUME_PRODUCE,
> +		.cnt_type	= QED_CHAIN_CNT_TYPE_U16,
> +	};
> +	struct qed_nvmetcp_pf_params *p_params = NULL;
> +	struct qed_nvmetcp_conn *p_conn = NULL;
> +	int rc = 0;
> +
> +	/* Try finding a free connection that can be used */
> +	spin_lock_bh(&p_hwfn->p_nvmetcp_info->lock);
> +	if (!list_empty(&p_hwfn->p_nvmetcp_info->free_list))
> +		p_conn = list_first_entry(&p_hwfn->p_nvmetcp_info->free_list,
> +					  struct qed_nvmetcp_conn, list_entry);
> +	if (p_conn) {
> +		list_del(&p_conn->list_entry);
> +		spin_unlock_bh(&p_hwfn->p_nvmetcp_info->lock);
> +		*p_out_conn = p_conn;
> +
> +		return 0;
> +	}
> +	spin_unlock_bh(&p_hwfn->p_nvmetcp_info->lock);
> +
> +	/* Need to allocate a new connection */
> +	p_params = &p_hwfn->pf_params.nvmetcp_pf_params;
> +
> +	p_conn = kzalloc(sizeof(*p_conn), GFP_KERNEL);
> +	if (!p_conn)
> +		return -ENOMEM;
> +
> +	params.num_elems = p_params->num_r2tq_pages_in_ring *
> +			   QED_CHAIN_PAGE_SIZE / sizeof(struct nvmetcp_wqe);
> +	params.elem_size = sizeof(struct nvmetcp_wqe);
> +
> +	rc = qed_chain_alloc(p_hwfn->cdev, &p_conn->r2tq, &params);
> +	if (rc)
> +		goto nomem_r2tq;
> +
> +	params.num_elems = p_params->num_uhq_pages_in_ring *
> +			   QED_CHAIN_PAGE_SIZE / sizeof(struct iscsi_uhqe);
> +	params.elem_size = sizeof(struct iscsi_uhqe);
> +
> +	rc = qed_chain_alloc(p_hwfn->cdev, &p_conn->uhq, &params);
> +	if (rc)
> +		goto nomem_uhq;
> +
> +	params.elem_size = sizeof(struct iscsi_xhqe);
> +
> +	rc = qed_chain_alloc(p_hwfn->cdev, &p_conn->xhq, &params);
> +	if (rc)
> +		goto nomem;
> +
> +	p_conn->free_on_delete = true;
> +	*p_out_conn = p_conn;
> +
> +	return 0;
> +
> +nomem:
> +	qed_chain_free(p_hwfn->cdev, &p_conn->uhq);
> +nomem_uhq:
> +	qed_chain_free(p_hwfn->cdev, &p_conn->r2tq);
> +nomem_r2tq:
> +	kfree(p_conn);
> +
> +	return -ENOMEM;
> +}
> +
> +static int qed_nvmetcp_acquire_connection(struct qed_hwfn *p_hwfn,
> +					  struct qed_nvmetcp_conn **p_out_conn)
> +{
> +	struct qed_nvmetcp_conn *p_conn = NULL;
> +	int rc = 0;
> +	u32 icid;
> +
> +	spin_lock_bh(&p_hwfn->p_nvmetcp_info->lock);
> +	rc = qed_cxt_acquire_cid(p_hwfn, PROTOCOLID_NVMETCP, &icid);
> +	spin_unlock_bh(&p_hwfn->p_nvmetcp_info->lock);
> +
> +	if (rc)
> +		return rc;
> +
> +	rc = qed_nvmetcp_allocate_connection(p_hwfn, &p_conn);
> +	if (rc) {
> +		spin_lock_bh(&p_hwfn->p_nvmetcp_info->lock);
> +		qed_cxt_release_cid(p_hwfn, icid);
> +		spin_unlock_bh(&p_hwfn->p_nvmetcp_info->lock);
> +
> +		return rc;
> +	}
> +
> +	p_conn->icid = icid;
> +	p_conn->conn_id = (u16)icid;
> +	p_conn->fw_cid = (p_hwfn->hw_info.opaque_fid << 16) | icid;
> +	*p_out_conn = p_conn;
> +
> +	return rc;
> +}
> +
> +static void qed_nvmetcp_release_connection(struct qed_hwfn *p_hwfn,
> +					   struct qed_nvmetcp_conn *p_conn)
> +{
> +	spin_lock_bh(&p_hwfn->p_nvmetcp_info->lock);
> +	list_add_tail(&p_conn->list_entry, &p_hwfn->p_nvmetcp_info->free_list);
> +	qed_cxt_release_cid(p_hwfn, p_conn->icid);
> +	spin_unlock_bh(&p_hwfn->p_nvmetcp_info->lock);
> +}
> +
> +static void qed_nvmetcp_free_connection(struct qed_hwfn *p_hwfn,
> +					struct qed_nvmetcp_conn *p_conn)
> +{
> +	qed_chain_free(p_hwfn->cdev, &p_conn->xhq);
> +	qed_chain_free(p_hwfn->cdev, &p_conn->uhq);
> +	qed_chain_free(p_hwfn->cdev, &p_conn->r2tq);
> +
> +	kfree(p_conn);
> +}
> +
> +int qed_nvmetcp_alloc(struct qed_hwfn *p_hwfn)
> +{
> +	struct qed_nvmetcp_info *p_nvmetcp_info;
> +
> +	p_nvmetcp_info = kzalloc(sizeof(*p_nvmetcp_info), GFP_KERNEL);
> +	if (!p_nvmetcp_info)
> +		return -ENOMEM;
> +
> +	INIT_LIST_HEAD(&p_nvmetcp_info->free_list);
> +
> +	p_hwfn->p_nvmetcp_info = p_nvmetcp_info;
> +
> +	return 0;
> +}
> +
> +void qed_nvmetcp_setup(struct qed_hwfn *p_hwfn)
> +{
> +	spin_lock_init(&p_hwfn->p_nvmetcp_info->lock);
> +}
> +
> +void qed_nvmetcp_free(struct qed_hwfn *p_hwfn)
> +{
> +	struct qed_nvmetcp_conn *p_conn = NULL;
> +
> +	if (!p_hwfn->p_nvmetcp_info)
> +		return;
> +
> +	while (!list_empty(&p_hwfn->p_nvmetcp_info->free_list)) {
> +		p_conn = list_first_entry(&p_hwfn->p_nvmetcp_info->free_list,
> +					  struct qed_nvmetcp_conn, list_entry);
> +		if (p_conn) {
> +			list_del(&p_conn->list_entry);
> +			qed_nvmetcp_free_connection(p_hwfn, p_conn);
> +		}
> +	}
> +
> +	kfree(p_hwfn->p_nvmetcp_info);
> +	p_hwfn->p_nvmetcp_info = NULL;
> +}
> +
> +static int qed_nvmetcp_acquire_conn(struct qed_dev *cdev,
> +				    u32 *handle,
> +				    u32 *fw_cid, void __iomem **p_doorbell)
> +{
> +	struct qed_hash_nvmetcp_con *hash_con;
> +	int rc;
> +
> +	/* Allocate a hashed connection */
> +	hash_con = kzalloc(sizeof(*hash_con), GFP_ATOMIC);
> +	if (!hash_con)
> +		return -ENOMEM;
> +
> +	/* Acquire the connection */
> +	rc = qed_nvmetcp_acquire_connection(QED_AFFIN_HWFN(cdev),
> +					    &hash_con->con);
> +	if (rc) {
> +		DP_NOTICE(cdev, "Failed to acquire Connection\n");
> +		kfree(hash_con);
> +
> +		return rc;
> +	}
> +
> +	/* Added the connection to hash table */
> +	*handle = hash_con->con->icid;
> +	*fw_cid = hash_con->con->fw_cid;
> +	hash_add(cdev->connections, &hash_con->node, *handle);
> +
> +	if (p_doorbell)
> +		*p_doorbell = qed_nvmetcp_get_db_addr(QED_AFFIN_HWFN(cdev),
> +						      *handle);
> +
> +	return 0;
> +}
> +
> +static int qed_nvmetcp_release_conn(struct qed_dev *cdev, u32 handle)
> +{
> +	struct qed_hash_nvmetcp_con *hash_con;
> +
> +	hash_con = qed_nvmetcp_get_hash(cdev, handle);
> +	if (!hash_con) {
> +		DP_NOTICE(cdev, "Failed to find connection for handle %d\n",
> +			  handle);
> +
> +		return -EINVAL;
> +	}
> +
> +	hlist_del(&hash_con->node);
> +	qed_nvmetcp_release_connection(QED_AFFIN_HWFN(cdev), hash_con->con);
> +	kfree(hash_con);
> +
> +	return 0;
> +}
> +
> +static int qed_nvmetcp_offload_conn(struct qed_dev *cdev, u32 handle,
> +				    struct qed_nvmetcp_params_offload *conn_info)
> +{
> +	struct qed_hash_nvmetcp_con *hash_con;
> +	struct qed_nvmetcp_conn *con;
> +
> +	hash_con = qed_nvmetcp_get_hash(cdev, handle);
> +	if (!hash_con) {
> +		DP_NOTICE(cdev, "Failed to find connection for handle %d\n",
> +			  handle);
> +
> +		return -EINVAL;
> +	}
> +
> +	/* Update the connection with information from the params */
> +	con = hash_con->con;
> +
> +	/* FW initializations */
> +	con->layer_code = NVMETCP_SLOW_PATH_LAYER_CODE;
> +	con->sq_pbl_addr = conn_info->sq_pbl_addr;
> +	con->nvmetcp_cccid_max_range = conn_info->nvmetcp_cccid_max_range;
> +	con->nvmetcp_cccid_itid_table_addr = conn_info->nvmetcp_cccid_itid_table_addr;
> +	con->default_cq = conn_info->default_cq;
> +
> +	SET_FIELD(con->offl_flags, NVMETCP_CONN_OFFLOAD_PARAMS_TARGET_MODE, 0);
> +	SET_FIELD(con->offl_flags, NVMETCP_CONN_OFFLOAD_PARAMS_NVMETCP_MODE, 1);
> +	SET_FIELD(con->offl_flags, NVMETCP_CONN_OFFLOAD_PARAMS_TCP_ON_CHIP_1B, 1);
> +
> +	/* Networking and TCP stack initializations */
> +	ether_addr_copy(con->local_mac, conn_info->src.mac);
> +	ether_addr_copy(con->remote_mac, conn_info->dst.mac);
> +	memcpy(con->local_ip, conn_info->src.ip, sizeof(con->local_ip));
> +	memcpy(con->remote_ip, conn_info->dst.ip, sizeof(con->remote_ip));
> +	con->local_port = conn_info->src.port;
> +	con->remote_port = conn_info->dst.port;
> +	con->vlan_id = conn_info->vlan_id;
> +
> +	if (conn_info->timestamp_en)
> +		SET_FIELD(con->tcp_flags, TCP_OFFLOAD_PARAMS_OPT2_TS_EN, 1);
> +
> +	if (conn_info->delayed_ack_en)
> +		SET_FIELD(con->tcp_flags, TCP_OFFLOAD_PARAMS_OPT2_DA_EN, 1);
> +
> +	if (conn_info->tcp_keep_alive_en)
> +		SET_FIELD(con->tcp_flags, TCP_OFFLOAD_PARAMS_OPT2_KA_EN, 1);
> +
> +	if (conn_info->ecn_en)
> +		SET_FIELD(con->tcp_flags, TCP_OFFLOAD_PARAMS_OPT2_ECN_EN, 1);
> +
> +	con->ip_version = conn_info->ip_version;
> +	con->flow_label = QED_TCP_FLOW_LABEL;
> +	con->ka_max_probe_cnt = conn_info->ka_max_probe_cnt;
> +	con->ka_timeout = conn_info->ka_timeout;
> +	con->ka_interval = conn_info->ka_interval;
> +	con->max_rt_time = conn_info->max_rt_time;
> +	con->ttl = conn_info->ttl;
> +	con->tos_or_tc = conn_info->tos_or_tc;
> +	con->mss = conn_info->mss;
> +	con->cwnd = conn_info->cwnd;
> +	con->rcv_wnd_scale = conn_info->rcv_wnd_scale;
> +	con->connect_mode = 0; /* TCP_CONNECT_ACTIVE */
> +
> +	return qed_sp_nvmetcp_conn_offload(QED_AFFIN_HWFN(cdev), con,
> +					 QED_SPQ_MODE_EBLOCK, NULL);
> +}
> +
> +static int qed_nvmetcp_update_conn(struct qed_dev *cdev,
> +				   u32 handle,
> +				   struct qed_nvmetcp_params_update *conn_info)
> +{
> +	struct qed_hash_nvmetcp_con *hash_con;
> +	struct qed_nvmetcp_conn *con;
> +
> +	hash_con = qed_nvmetcp_get_hash(cdev, handle);
> +	if (!hash_con) {
> +		DP_NOTICE(cdev, "Failed to find connection for handle %d\n",
> +			  handle);
> +
> +		return -EINVAL;
> +	}
> +
> +	/* Update the connection with information from the params */
> +	con = hash_con->con;
> +
> +	SET_FIELD(con->update_flag,
> +		  ISCSI_CONN_UPDATE_RAMROD_PARAMS_INITIAL_R2T, 0);
> +	SET_FIELD(con->update_flag,
> +		  ISCSI_CONN_UPDATE_RAMROD_PARAMS_IMMEDIATE_DATA, 1);
> +
> +	if (conn_info->hdr_digest_en)
> +		SET_FIELD(con->update_flag, ISCSI_CONN_UPDATE_RAMROD_PARAMS_HD_EN, 1);
> +
> +	if (conn_info->data_digest_en)
> +		SET_FIELD(con->update_flag, ISCSI_CONN_UPDATE_RAMROD_PARAMS_DD_EN, 1);
> +
> +	/* Placeholder - initialize pfv, cpda, hpda */
> +
> +	con->max_seq_size = conn_info->max_io_size;
> +	con->max_recv_pdu_length = conn_info->max_recv_pdu_length;
> +	con->max_send_pdu_length = conn_info->max_send_pdu_length;
> +	con->first_seq_length = conn_info->max_io_size;
> +
> +	return qed_sp_nvmetcp_conn_update(QED_AFFIN_HWFN(cdev), con,
> +					QED_SPQ_MODE_EBLOCK, NULL);
> +}
> +
> +static int qed_nvmetcp_clear_conn_sq(struct qed_dev *cdev, u32 handle)
> +{
> +	struct qed_hash_nvmetcp_con *hash_con;
> +
> +	hash_con = qed_nvmetcp_get_hash(cdev, handle);
> +	if (!hash_con) {
> +		DP_NOTICE(cdev, "Failed to find connection for handle %d\n",
> +			  handle);
> +
> +		return -EINVAL;
> +	}
> +
> +	return qed_sp_nvmetcp_conn_clear_sq(QED_AFFIN_HWFN(cdev), hash_con->con,
> +					    QED_SPQ_MODE_EBLOCK, NULL);
> +}
> +
> +static int qed_nvmetcp_destroy_conn(struct qed_dev *cdev,
> +				    u32 handle, u8 abrt_conn)
> +{
> +	struct qed_hash_nvmetcp_con *hash_con;
> +
> +	hash_con = qed_nvmetcp_get_hash(cdev, handle);
> +	if (!hash_con) {
> +		DP_NOTICE(cdev, "Failed to find connection for handle %d\n",
> +			  handle);
> +
> +		return -EINVAL;
> +	}
> +
> +	hash_con->con->abortive_dsconnect = abrt_conn;
> +
> +	return qed_sp_nvmetcp_conn_terminate(QED_AFFIN_HWFN(cdev), hash_con->con,
> +					   QED_SPQ_MODE_EBLOCK, NULL);
> +}
> +
>   static const struct qed_nvmetcp_ops qed_nvmetcp_ops_pass = {
>   	.common = &qed_common_ops_pass,
>   	.ll2 = &qed_ll2_ops_pass,
> @@ -266,8 +838,12 @@ static const struct qed_nvmetcp_ops qed_nvmetcp_ops_pass = {
>   	.register_ops = &qed_register_nvmetcp_ops,
>   	.start = &qed_nvmetcp_start,
>   	.stop = &qed_nvmetcp_stop,
> -
> -	/* Placeholder - Connection level ops */
> +	.acquire_conn = &qed_nvmetcp_acquire_conn,
> +	.release_conn = &qed_nvmetcp_release_conn,
> +	.offload_conn = &qed_nvmetcp_offload_conn,
> +	.update_conn = &qed_nvmetcp_update_conn,
> +	.destroy_conn = &qed_nvmetcp_destroy_conn,
> +	.clear_sq = &qed_nvmetcp_clear_conn_sq,
>   };
>   
>   const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void)
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h
> index 774b46ade408..749169f0bdb1 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h
> +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h
> @@ -19,6 +19,7 @@
>   #define QED_NVMETCP_FW_CQ_SIZE (4 * 1024)
>   
>   /* tcp parameters */
> +#define QED_TCP_FLOW_LABEL 0
>   #define QED_TCP_TWO_MSL_TIMER 4000
>   #define QED_TCP_HALF_WAY_CLOSE_TIMEOUT 10
>   #define QED_TCP_MAX_FIN_RT 2
> @@ -32,6 +33,68 @@ struct qed_nvmetcp_info {
>   	nvmetcp_event_cb_t event_cb;
>   };
>   
> +struct qed_hash_nvmetcp_con {
> +	struct hlist_node node;
> +	struct qed_nvmetcp_conn *con;
> +};
> +
> +struct qed_nvmetcp_conn {
> +	struct list_head list_entry;
> +	bool free_on_delete;
> +
> +	u16 conn_id;
> +	u32 icid;
> +	u32 fw_cid;
> +
> +	u8 layer_code;
> +	u8 offl_flags;
> +	u8 connect_mode;
> +
> +	dma_addr_t sq_pbl_addr;
> +	struct qed_chain r2tq;
> +	struct qed_chain xhq;
> +	struct qed_chain uhq;
> +
> +	u8 local_mac[6];
> +	u8 remote_mac[6];
> +	u8 ip_version;
> +	u8 ka_max_probe_cnt;
> +
> +	u16 vlan_id;
> +	u16 tcp_flags;
> +	u32 remote_ip[4];
> +	u32 local_ip[4];
> +
> +	u32 flow_label;
> +	u32 ka_timeout;
> +	u32 ka_interval;
> +	u32 max_rt_time;
> +
> +	u8 ttl;
> +	u8 tos_or_tc;
> +	u16 remote_port;
> +	u16 local_port;
> +	u16 mss;
> +	u8 rcv_wnd_scale;
> +	u32 rcv_wnd;
> +	u32 cwnd;
> +
> +	u8 update_flag;
> +	u8 default_cq;
> +	u8 abortive_dsconnect;
> +
> +	u32 max_seq_size;
> +	u32 max_recv_pdu_length;
> +	u32 max_send_pdu_length;
> +	u32 first_seq_length;
> +
> +	u16 physical_q0;
> +	u16 physical_q1;
> +
> +	u16 nvmetcp_cccid_max_range;
> +	dma_addr_t nvmetcp_cccid_itid_table_addr;
> +};
> +
>   #if IS_ENABLED(CONFIG_QED_NVMETCP)
>   int qed_nvmetcp_alloc(struct qed_hwfn *p_hwfn);
>   void qed_nvmetcp_setup(struct qed_hwfn *p_hwfn);
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_sp.h b/drivers/net/ethernet/qlogic/qed/qed_sp.h
> index 525159e747a5..60ff3222bf55 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_sp.h
> +++ b/drivers/net/ethernet/qlogic/qed/qed_sp.h
> @@ -101,6 +101,9 @@ union ramrod_data {
>   	struct iscsi_spe_conn_termination iscsi_conn_terminate;
>   
>   	struct nvmetcp_init_ramrod_params nvmetcp_init;
> +	struct nvmetcp_spe_conn_offload nvmetcp_conn_offload;
> +	struct nvmetcp_conn_update_ramrod_params nvmetcp_conn_update;
> +	struct nvmetcp_spe_conn_termination nvmetcp_conn_terminate;
>   
>   	struct vf_start_ramrod_data vf_start;
>   	struct vf_stop_ramrod_data vf_stop;
> diff --git a/include/linux/qed/nvmetcp_common.h b/include/linux/qed/nvmetcp_common.h
> index e9ccfc07041d..c8836b71b866 100644
> --- a/include/linux/qed/nvmetcp_common.h
> +++ b/include/linux/qed/nvmetcp_common.h
> @@ -6,6 +6,8 @@
>   
>   #include "tcp_common.h"
>   
> +#define NVMETCP_SLOW_PATH_LAYER_CODE (6)
> +
>   /* NVMeTCP firmware function init parameters */
>   struct nvmetcp_spe_func_init {
>   	__le16 half_way_close_timeout;
> @@ -43,6 +45,10 @@ enum nvmetcp_ramrod_cmd_id {
>   	NVMETCP_RAMROD_CMD_ID_UNUSED = 0,
>   	NVMETCP_RAMROD_CMD_ID_INIT_FUNC = 1,
>   	NVMETCP_RAMROD_CMD_ID_DESTROY_FUNC = 2,
> +	NVMETCP_RAMROD_CMD_ID_OFFLOAD_CONN = 3,
> +	NVMETCP_RAMROD_CMD_ID_UPDATE_CONN = 4,
> +	NVMETCP_RAMROD_CMD_ID_TERMINATION_CONN = 5,
> +	NVMETCP_RAMROD_CMD_ID_CLEAR_SQ = 6,
>   	MAX_NVMETCP_RAMROD_CMD_ID
>   };
>   
> @@ -51,4 +57,141 @@ struct nvmetcp_glbl_queue_entry {
>   	struct regpair reserved;
>   };
>   
> +/* NVMeTCP conn level EQEs */
> +enum nvmetcp_eqe_opcode {
> +	NVMETCP_EVENT_TYPE_INIT_FUNC = 0, /* Response after init Ramrod */
> +	NVMETCP_EVENT_TYPE_DESTROY_FUNC, /* Response after destroy Ramrod */
> +	NVMETCP_EVENT_TYPE_OFFLOAD_CONN,/* Response after option 2 offload Ramrod */
> +	NVMETCP_EVENT_TYPE_UPDATE_CONN, /* Response after update Ramrod */
> +	NVMETCP_EVENT_TYPE_CLEAR_SQ, /* Response after clear sq Ramrod */
> +	NVMETCP_EVENT_TYPE_TERMINATE_CONN, /* Response after termination Ramrod */
> +	NVMETCP_EVENT_TYPE_RESERVED0,
> +	NVMETCP_EVENT_TYPE_RESERVED1,
> +	NVMETCP_EVENT_TYPE_ASYN_CONNECT_COMPLETE, /* Connect completed (A-syn EQE) */
> +	NVMETCP_EVENT_TYPE_ASYN_TERMINATE_DONE, /* Termination completed (A-syn EQE) */
> +	NVMETCP_EVENT_TYPE_START_OF_ERROR_TYPES = 10, /* Separate EQs from err EQs */
> +	NVMETCP_EVENT_TYPE_ASYN_ABORT_RCVD, /* TCP RST packet receive (A-syn EQE) */
> +	NVMETCP_EVENT_TYPE_ASYN_CLOSE_RCVD, /* TCP FIN packet receive (A-syn EQE) */
> +	NVMETCP_EVENT_TYPE_ASYN_SYN_RCVD, /* TCP SYN+ACK packet receive (A-syn EQE) */
> +	NVMETCP_EVENT_TYPE_ASYN_MAX_RT_TIME, /* TCP max retransmit time (A-syn EQE) */
> +	NVMETCP_EVENT_TYPE_ASYN_MAX_RT_CNT, /* TCP max retransmit count (A-syn EQE) */
> +	NVMETCP_EVENT_TYPE_ASYN_MAX_KA_PROBES_CNT, /* TCP ka probes count (A-syn EQE) */
> +	NVMETCP_EVENT_TYPE_ASYN_FIN_WAIT2, /* TCP fin wait 2 (A-syn EQE) */
> +	NVMETCP_EVENT_TYPE_NVMETCP_CONN_ERROR, /* NVMeTCP error response (A-syn EQE) */
> +	NVMETCP_EVENT_TYPE_TCP_CONN_ERROR, /* NVMeTCP error - tcp error (A-syn EQE) */
> +	MAX_NVMETCP_EQE_OPCODE
> +};
> +
> +struct nvmetcp_conn_offload_section {
> +	struct regpair cccid_itid_table_addr; /* CCCID to iTID table address */
> +	__le16 cccid_max_range; /* CCCID max value - used for validation */
> +	__le16 reserved[3];
> +};
> +
> +/* NVMe TCP connection offload params passed by driver to FW in NVMeTCP offload ramrod */
> +struct nvmetcp_conn_offload_params {
> +	struct regpair sq_pbl_addr;
> +	struct regpair r2tq_pbl_addr;
> +	struct regpair xhq_pbl_addr;
> +	struct regpair uhq_pbl_addr;
> +	__le16 physical_q0;
> +	__le16 physical_q1;
> +	u8 flags;
> +#define NVMETCP_CONN_OFFLOAD_PARAMS_TCP_ON_CHIP_1B_MASK 0x1
> +#define NVMETCP_CONN_OFFLOAD_PARAMS_TCP_ON_CHIP_1B_SHIFT 0
> +#define NVMETCP_CONN_OFFLOAD_PARAMS_TARGET_MODE_MASK 0x1
> +#define NVMETCP_CONN_OFFLOAD_PARAMS_TARGET_MODE_SHIFT 1
> +#define NVMETCP_CONN_OFFLOAD_PARAMS_RESTRICTED_MODE_MASK 0x1
> +#define NVMETCP_CONN_OFFLOAD_PARAMS_RESTRICTED_MODE_SHIFT 2
> +#define NVMETCP_CONN_OFFLOAD_PARAMS_NVMETCP_MODE_MASK 0x1
> +#define NVMETCP_CONN_OFFLOAD_PARAMS_NVMETCP_MODE_SHIFT 3
> +#define NVMETCP_CONN_OFFLOAD_PARAMS_RESERVED1_MASK 0xF
> +#define NVMETCP_CONN_OFFLOAD_PARAMS_RESERVED1_SHIFT 4
> +	u8 default_cq;
> +	__le16 reserved0;
> +	__le32 reserved1;
> +	__le32 initial_ack;
> +
> +	struct nvmetcp_conn_offload_section nvmetcp; /* NVMe/TCP section */
> +};
> +
> +/* NVMe TCP and TCP connection offload params passed by driver to FW in NVMeTCP offload ramrod. */
> +struct nvmetcp_spe_conn_offload {
> +	__le16 reserved;
> +	__le16 conn_id;
> +	__le32 fw_cid;
> +	struct nvmetcp_conn_offload_params nvmetcp;
> +	struct tcp_offload_params_opt2 tcp;
> +};
> +
> +/* NVMeTCP connection update params passed by driver to FW in NVMETCP update ramrod. */
> +struct nvmetcp_conn_update_ramrod_params {
> +	__le16 reserved0;
> +	__le16 conn_id;
> +	__le32 reserved1;
> +	u8 flags;
> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_HD_EN_MASK 0x1
> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_HD_EN_SHIFT 0
> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_DD_EN_MASK 0x1
> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_DD_EN_SHIFT 1
> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED0_MASK 0x1
> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED0_SHIFT 2
> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED1_MASK 0x1
> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED1_DATA_SHIFT 3
> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED2_MASK 0x1
> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED2_SHIFT 4
> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED3_MASK 0x1
> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED3_SHIFT 5
> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED4_MASK 0x1
> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED4_SHIFT 6
> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED5_MASK 0x1
> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED5_SHIFT 7
> +	u8 reserved3[3];
> +	__le32 max_seq_size;
> +	__le32 max_send_pdu_length;
> +	__le32 max_recv_pdu_length;
> +	__le32 first_seq_length;
> +	__le32 reserved4[5];
> +};
> +
> +/* NVMeTCP connection termination request */
> +struct nvmetcp_spe_conn_termination {
> +	__le16 reserved0;
> +	__le16 conn_id;
> +	__le32 reserved1;
> +	u8 abortive;
> +	u8 reserved2[7];
> +	struct regpair reserved3;
> +	struct regpair reserved4;
> +};
> +
> +struct nvmetcp_dif_flags {
> +	u8 flags;
> +};
> +
> +enum nvmetcp_wqe_type {
> +	NVMETCP_WQE_TYPE_NORMAL,
> +	NVMETCP_WQE_TYPE_TASK_CLEANUP,
> +	NVMETCP_WQE_TYPE_MIDDLE_PATH,
> +	NVMETCP_WQE_TYPE_IC,
> +	MAX_NVMETCP_WQE_TYPE
> +};
> +
> +struct nvmetcp_wqe {
> +	__le16 task_id;
> +	u8 flags;
> +#define NVMETCP_WQE_WQE_TYPE_MASK 0x7 /* [use nvmetcp_wqe_type] */
> +#define NVMETCP_WQE_WQE_TYPE_SHIFT 0
> +#define NVMETCP_WQE_NUM_SGES_MASK 0xF
> +#define NVMETCP_WQE_NUM_SGES_SHIFT 3
> +#define NVMETCP_WQE_RESPONSE_MASK 0x1
> +#define NVMETCP_WQE_RESPONSE_SHIFT 7
> +	struct nvmetcp_dif_flags prot_flags;
> +	__le32 contlen_cdbsize;
> +#define NVMETCP_WQE_CONT_LEN_MASK 0xFFFFFF
> +#define NVMETCP_WQE_CONT_LEN_SHIFT 0
> +#define NVMETCP_WQE_CDB_SIZE_OR_NVMETCP_CMD_MASK 0xFF
> +#define NVMETCP_WQE_CDB_SIZE_OR_NVMETCP_CMD_SHIFT 24
> +};
> +
>   #endif /* __NVMETCP_COMMON__ */
> diff --git a/include/linux/qed/qed_nvmetcp_if.h b/include/linux/qed/qed_nvmetcp_if.h
> index abc1f41862e3..96263e3cfa1e 100644
> --- a/include/linux/qed/qed_nvmetcp_if.h
> +++ b/include/linux/qed/qed_nvmetcp_if.h
> @@ -25,6 +25,50 @@ struct qed_nvmetcp_tid {
>   	u8 *blocks[MAX_TID_BLOCKS_NVMETCP];
>   };
>   
> +struct qed_nvmetcp_id_params {
> +	u8 mac[ETH_ALEN];
> +	u32 ip[4];
> +	u16 port;
> +};
> +
> +struct qed_nvmetcp_params_offload {
> +	/* FW initializations */
> +	dma_addr_t sq_pbl_addr;
> +	dma_addr_t nvmetcp_cccid_itid_table_addr;
> +	u16 nvmetcp_cccid_max_range;
> +	u8 default_cq;
> +
> +	/* Networking and TCP stack initializations */
> +	struct qed_nvmetcp_id_params src;
> +	struct qed_nvmetcp_id_params dst;
> +	u32 ka_timeout;
> +	u32 ka_interval;
> +	u32 max_rt_time;
> +	u32 cwnd;
> +	u16 mss;
> +	u16 vlan_id;
> +	bool timestamp_en;
> +	bool delayed_ack_en;
> +	bool tcp_keep_alive_en;
> +	bool ecn_en;
> +	u8 ip_version;
> +	u8 ka_max_probe_cnt;
> +	u8 ttl;
> +	u8 tos_or_tc;
> +	u8 rcv_wnd_scale;
> +};
> +
> +struct qed_nvmetcp_params_update {
> +	u32 max_io_size;
> +	u32 max_recv_pdu_length;
> +	u32 max_send_pdu_length;
> +
> +	/* Placeholder: pfv, cpda, hpda */
> +
> +	bool hdr_digest_en;
> +	bool data_digest_en;
> +};
> +
>   struct qed_nvmetcp_cb_ops {
>   	struct qed_common_cb_ops common;
>   };
> @@ -48,6 +92,38 @@ struct qed_nvmetcp_cb_ops {
>    * @stop:		nvmetcp in FW
>    *			@param cdev
>    *			return 0 on success, otherwise error value.
> + * @acquire_conn:	acquire a new nvmetcp connection
> + *			@param cdev
> + *			@param handle - qed will fill handle that should be
> + *				used henceforth as identifier of the
> + *				connection.
> + *			@param p_doorbell - qed will fill the address of the
> + *				doorbell.
> + *			@return 0 on success, otherwise error value.
> + * @release_conn:	release a previously acquired nvmetcp connection
> + *			@param cdev
> + *			@param handle - the connection handle.
> + *			@return 0 on success, otherwise error value.
> + * @offload_conn:	configures an offloaded connection
> + *			@param cdev
> + *			@param handle - the connection handle.
> + *			@param conn_info - the configuration to use for the
> + *				offload.
> + *			@return 0 on success, otherwise error value.
> + * @update_conn:	updates an offloaded connection
> + *			@param cdev
> + *			@param handle - the connection handle.
> + *			@param conn_info - the configuration to use for the
> + *				offload.
> + *			@return 0 on success, otherwise error value.
> + * @destroy_conn:	stops an offloaded connection
> + *			@param cdev
> + *			@param handle - the connection handle.
> + *			@return 0 on success, otherwise error value.
> + * @clear_sq:		clear all task in sq
> + *			@param cdev
> + *			@param handle - the connection handle.
> + *			@return 0 on success, otherwise error value.
>    */
>   struct qed_nvmetcp_ops {
>   	const struct qed_common_ops *common;
> @@ -65,6 +141,24 @@ struct qed_nvmetcp_ops {
>   		     void *event_context, nvmetcp_event_cb_t async_event_cb);
>   
>   	int (*stop)(struct qed_dev *cdev);
> +
> +	int (*acquire_conn)(struct qed_dev *cdev,
> +			    u32 *handle,
> +			    u32 *fw_cid, void __iomem **p_doorbell);
> +
> +	int (*release_conn)(struct qed_dev *cdev, u32 handle);
> +
> +	int (*offload_conn)(struct qed_dev *cdev,
> +			    u32 handle,
> +			    struct qed_nvmetcp_params_offload *conn_info);
> +
> +	int (*update_conn)(struct qed_dev *cdev,
> +			   u32 handle,
> +			   struct qed_nvmetcp_params_update *conn_info);
> +
> +	int (*destroy_conn)(struct qed_dev *cdev, u32 handle, u8 abrt_conn);
> +
> +	int (*clear_sq)(struct qed_dev *cdev, u32 handle);
>   };
>   
>   const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void);
> 
Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [RFC PATCH v4 03/27] qed: Add qed-NVMeTCP personality
  2021-04-29 19:09 ` [RFC PATCH v4 03/27] qed: Add qed-NVMeTCP personality Shai Malin
@ 2021-05-02 11:11   ` Hannes Reinecke
  2021-05-03 15:26     ` Shai Malin
  0 siblings, 1 reply; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-02 11:11 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024

On 4/29/21 9:09 PM, Shai Malin wrote:
> From: Omkar Kulkarni <okulkarni@marvell.com>
> 
> This patch adds qed NVMeTCP personality in order to support the NVMeTCP
> qed functionalities and manage the HW device shared resources.
> The same design is used with Eth (qede), RDMA(qedr), iSCSI (qedi) and
> FCoE (qedf).
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> ---
>   drivers/net/ethernet/qlogic/qed/qed.h         |  3 ++
>   drivers/net/ethernet/qlogic/qed/qed_cxt.c     | 32 ++++++++++++++
>   drivers/net/ethernet/qlogic/qed/qed_cxt.h     |  1 +
>   drivers/net/ethernet/qlogic/qed/qed_dev.c     | 44 ++++++++++++++++---
>   drivers/net/ethernet/qlogic/qed/qed_hsi.h     |  3 +-
>   drivers/net/ethernet/qlogic/qed/qed_ll2.c     | 31 ++++++++-----
>   drivers/net/ethernet/qlogic/qed/qed_mcp.c     |  3 ++
>   drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c |  3 +-
>   drivers/net/ethernet/qlogic/qed/qed_ooo.c     |  5 ++-
>   .../net/ethernet/qlogic/qed/qed_sp_commands.c |  1 +
>   10 files changed, 108 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h
> index 91d4635009ab..7ae648c4edba 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed.h
> +++ b/drivers/net/ethernet/qlogic/qed/qed.h
> @@ -200,6 +200,7 @@ enum qed_pci_personality {
>   	QED_PCI_ETH,
>   	QED_PCI_FCOE,
>   	QED_PCI_ISCSI,
> +	QED_PCI_NVMETCP,
>   	QED_PCI_ETH_ROCE,
>   	QED_PCI_ETH_IWARP,
>   	QED_PCI_ETH_RDMA,
> @@ -285,6 +286,8 @@ struct qed_hw_info {
>   	((dev)->hw_info.personality == QED_PCI_FCOE)
>   #define QED_IS_ISCSI_PERSONALITY(dev)					\
>   	((dev)->hw_info.personality == QED_PCI_ISCSI)
> +#define QED_IS_NVMETCP_PERSONALITY(dev)					\
> +	((dev)->hw_info.personality == QED_PCI_NVMETCP)
>   
So you have a distinct PCI personality for NVMe-oF, but not for the 
protocol? Strange.
Why don't you have a distinct NVMe-oF protocol ID?

>   	/* Resource Allocation scheme results */
>   	u32				resc_start[QED_MAX_RESC];
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_cxt.c b/drivers/net/ethernet/qlogic/qed/qed_cxt.c
> index 0a22f8ce9a2c..6cef75723e38 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_cxt.c
> +++ b/drivers/net/ethernet/qlogic/qed/qed_cxt.c
> @@ -2106,6 +2106,30 @@ int qed_cxt_set_pf_params(struct qed_hwfn *p_hwfn, u32 rdma_tasks)
>   		}
>   		break;
>   	}
> +	case QED_PCI_NVMETCP:
> +	{
> +		struct qed_nvmetcp_pf_params *p_params;
> +
> +		p_params = &p_hwfn->pf_params.nvmetcp_pf_params;
> +
> +		if (p_params->num_cons && p_params->num_tasks) {
> +			qed_cxt_set_proto_cid_count(p_hwfn,
> +						    PROTOCOLID_NVMETCP,
> +						    p_params->num_cons,
> +						    0);
> +
> +			qed_cxt_set_proto_tid_count(p_hwfn,
> +						    PROTOCOLID_NVMETCP,
> +						    QED_CTX_NVMETCP_TID_SEG,
> +						    0,
> +						    p_params->num_tasks,
> +						    true);
> +		} else {
> +			DP_INFO(p_hwfn->cdev,
> +				"NvmeTCP personality used without setting params!\n");
> +		}
> +		break;
> +	}
>   	default:
>   		return -EINVAL;
>   	}
> @@ -2132,6 +2156,10 @@ int qed_cxt_get_tid_mem_info(struct qed_hwfn *p_hwfn,
>   		proto = PROTOCOLID_ISCSI;
>   		seg = QED_CXT_ISCSI_TID_SEG;
>   		break;
> +	case QED_PCI_NVMETCP:
> +		proto = PROTOCOLID_NVMETCP;
> +		seg = QED_CTX_NVMETCP_TID_SEG;
> +		break;
>   	default:
>   		return -EINVAL;
>   	}
> @@ -2458,6 +2486,10 @@ int qed_cxt_get_task_ctx(struct qed_hwfn *p_hwfn,
>   		proto = PROTOCOLID_ISCSI;
>   		seg = QED_CXT_ISCSI_TID_SEG;
>   		break;
> +	case QED_PCI_NVMETCP:
> +		proto = PROTOCOLID_NVMETCP;
> +		seg = QED_CTX_NVMETCP_TID_SEG;
> +		break;
>   	default:
>   		return -EINVAL;
>   	}
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_cxt.h b/drivers/net/ethernet/qlogic/qed/qed_cxt.h
> index 056e79620a0e..8f1a77cb33f6 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_cxt.h
> +++ b/drivers/net/ethernet/qlogic/qed/qed_cxt.h
> @@ -51,6 +51,7 @@ int qed_cxt_get_tid_mem_info(struct qed_hwfn *p_hwfn,
>   			     struct qed_tid_mem *p_info);
>   
>   #define QED_CXT_ISCSI_TID_SEG	PROTOCOLID_ISCSI
> +#define QED_CTX_NVMETCP_TID_SEG PROTOCOLID_NVMETCP
>   #define QED_CXT_ROCE_TID_SEG	PROTOCOLID_ROCE
>   #define QED_CXT_FCOE_TID_SEG	PROTOCOLID_FCOE
>   enum qed_cxt_elem_type {
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c
> index d2f5855b2ea7..d3f8cc42d07e 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
> +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
> @@ -37,6 +37,7 @@
>   #include "qed_sriov.h"
>   #include "qed_vf.h"
>   #include "qed_rdma.h"
> +#include "qed_nvmetcp.h"
>   
>   static DEFINE_SPINLOCK(qm_lock);
>   
> @@ -667,7 +668,8 @@ qed_llh_set_engine_affin(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt)
>   	}
>   
>   	/* Storage PF is bound to a single engine while L2 PF uses both */
> -	if (QED_IS_FCOE_PERSONALITY(p_hwfn) || QED_IS_ISCSI_PERSONALITY(p_hwfn))
> +	if (QED_IS_FCOE_PERSONALITY(p_hwfn) || QED_IS_ISCSI_PERSONALITY(p_hwfn) ||
> +	    QED_IS_NVMETCP_PERSONALITY(p_hwfn))
>   		eng = cdev->fir_affin ? QED_ENG1 : QED_ENG0;
>   	else			/* L2_PERSONALITY */
>   		eng = QED_BOTH_ENG;
> @@ -1164,6 +1166,9 @@ void qed_llh_remove_mac_filter(struct qed_dev *cdev,
>   	if (!test_bit(QED_MF_LLH_MAC_CLSS, &cdev->mf_bits))
>   		goto out;
>   
> +	if (QED_IS_NVMETCP_PERSONALITY(p_hwfn))
> +		return;
> +
>   	ether_addr_copy(filter.mac.addr, mac_addr);
>   	rc = qed_llh_shadow_remove_filter(cdev, ppfid, &filter, &filter_idx,
>   					  &ref_cnt);
> @@ -1381,6 +1386,11 @@ void qed_resc_free(struct qed_dev *cdev)
>   			qed_ooo_free(p_hwfn);
>   		}
>   
> +		if (p_hwfn->hw_info.personality == QED_PCI_NVMETCP) {
> +			qed_nvmetcp_free(p_hwfn);
> +			qed_ooo_free(p_hwfn);
> +		}
> +
>   		if (QED_IS_RDMA_PERSONALITY(p_hwfn) && rdma_info) {
>   			qed_spq_unregister_async_cb(p_hwfn, rdma_info->proto);
>   			qed_rdma_info_free(p_hwfn);
> @@ -1423,6 +1433,7 @@ static u32 qed_get_pq_flags(struct qed_hwfn *p_hwfn)
>   		flags |= PQ_FLAGS_OFLD;
>   		break;
>   	case QED_PCI_ISCSI:
> +	case QED_PCI_NVMETCP:
>   		flags |= PQ_FLAGS_ACK | PQ_FLAGS_OOO | PQ_FLAGS_OFLD;
>   		break;
>   	case QED_PCI_ETH_ROCE:
> @@ -2269,6 +2280,12 @@ int qed_resc_alloc(struct qed_dev *cdev)
>   							PROTOCOLID_ISCSI,
>   							NULL);
>   			n_eqes += 2 * num_cons;
> +		} else if (p_hwfn->hw_info.personality == QED_PCI_NVMETCP) {
> +			num_cons =
> +			    qed_cxt_get_proto_cid_count(p_hwfn,
> +							PROTOCOLID_NVMETCP,
> +							NULL);
> +			n_eqes += 2 * num_cons;
>   		}
>   
>   		if (n_eqes > 0xFFFF) {
> @@ -2313,6 +2330,15 @@ int qed_resc_alloc(struct qed_dev *cdev)
>   				goto alloc_err;
>   		}
>   
> +		if (p_hwfn->hw_info.personality == QED_PCI_NVMETCP) {
> +			rc = qed_nvmetcp_alloc(p_hwfn);
> +			if (rc)
> +				goto alloc_err;
> +			rc = qed_ooo_alloc(p_hwfn);
> +			if (rc)
> +				goto alloc_err;
> +		}
> +
>   		if (QED_IS_RDMA_PERSONALITY(p_hwfn)) {
>   			rc = qed_rdma_info_alloc(p_hwfn);
>   			if (rc)
> @@ -2393,6 +2419,11 @@ void qed_resc_setup(struct qed_dev *cdev)
>   			qed_iscsi_setup(p_hwfn);
>   			qed_ooo_setup(p_hwfn);
>   		}
> +
> +		if (p_hwfn->hw_info.personality == QED_PCI_NVMETCP) {
> +			qed_nvmetcp_setup(p_hwfn);
> +			qed_ooo_setup(p_hwfn);
> +		}
>   	}
>   }
>   
> @@ -2854,7 +2885,8 @@ static int qed_hw_init_pf(struct qed_hwfn *p_hwfn,
>   
>   	/* Protocol Configuration */
>   	STORE_RT_REG(p_hwfn, PRS_REG_SEARCH_TCP_RT_OFFSET,
> -		     (p_hwfn->hw_info.personality == QED_PCI_ISCSI) ? 1 : 0);
> +		     ((p_hwfn->hw_info.personality == QED_PCI_ISCSI) ||
> +			 (p_hwfn->hw_info.personality == QED_PCI_NVMETCP)) ? 1 : 0);
>   	STORE_RT_REG(p_hwfn, PRS_REG_SEARCH_FCOE_RT_OFFSET,
>   		     (p_hwfn->hw_info.personality == QED_PCI_FCOE) ? 1 : 0);
>   	STORE_RT_REG(p_hwfn, PRS_REG_SEARCH_ROCE_RT_OFFSET, 0);
> @@ -3531,7 +3563,7 @@ static void qed_hw_set_feat(struct qed_hwfn *p_hwfn)
>   					       RESC_NUM(p_hwfn,
>   							QED_CMDQS_CQS));
>   
> -	if (QED_IS_ISCSI_PERSONALITY(p_hwfn))
> +	if (QED_IS_ISCSI_PERSONALITY(p_hwfn) || QED_IS_NVMETCP_PERSONALITY(p_hwfn))
>   		feat_num[QED_ISCSI_CQ] = min_t(u32, sb_cnt.cnt,
>   					       RESC_NUM(p_hwfn,
>   							QED_CMDQS_CQS));
> @@ -3734,7 +3766,8 @@ int qed_hw_get_dflt_resc(struct qed_hwfn *p_hwfn,
>   		break;
>   	case QED_BDQ:
>   		if (p_hwfn->hw_info.personality != QED_PCI_ISCSI &&
> -		    p_hwfn->hw_info.personality != QED_PCI_FCOE)
> +		    p_hwfn->hw_info.personality != QED_PCI_FCOE &&
> +			p_hwfn->hw_info.personality != QED_PCI_NVMETCP)
>   			*p_resc_num = 0;
>   		else
>   			*p_resc_num = 1;
> @@ -3755,7 +3788,8 @@ int qed_hw_get_dflt_resc(struct qed_hwfn *p_hwfn,
>   			*p_resc_start = 0;
>   		else if (p_hwfn->cdev->num_ports_in_engine == 4)
>   			*p_resc_start = p_hwfn->port_id;
> -		else if (p_hwfn->hw_info.personality == QED_PCI_ISCSI)
> +		else if (p_hwfn->hw_info.personality == QED_PCI_ISCSI ||
> +			 p_hwfn->hw_info.personality == QED_PCI_NVMETCP)
>   			*p_resc_start = p_hwfn->port_id;
>   		else if (p_hwfn->hw_info.personality == QED_PCI_FCOE)
>   			*p_resc_start = p_hwfn->port_id + 2;
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
> index 24472f6a83c2..9c9ec8f53ef8 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h
> +++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
> @@ -12148,7 +12148,8 @@ struct public_func {
>   #define FUNC_MF_CFG_PROTOCOL_ISCSI              0x00000010
>   #define FUNC_MF_CFG_PROTOCOL_FCOE               0x00000020
>   #define FUNC_MF_CFG_PROTOCOL_ROCE               0x00000030
> -#define FUNC_MF_CFG_PROTOCOL_MAX	0x00000030
> +#define FUNC_MF_CFG_PROTOCOL_NVMETCP    0x00000040
> +#define FUNC_MF_CFG_PROTOCOL_MAX	0x00000040
>   
>   #define FUNC_MF_CFG_MIN_BW_MASK		0x0000ff00
>   #define FUNC_MF_CFG_MIN_BW_SHIFT	8
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_ll2.c b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
> index 49783f365079..88bfcdcd4a4c 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_ll2.c
> +++ b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
> @@ -960,7 +960,8 @@ static int qed_sp_ll2_rx_queue_start(struct qed_hwfn *p_hwfn,
>   
>   	if (test_bit(QED_MF_LL2_NON_UNICAST, &p_hwfn->cdev->mf_bits) &&
>   	    p_ramrod->main_func_queue && conn_type != QED_LL2_TYPE_ROCE &&
> -	    conn_type != QED_LL2_TYPE_IWARP) {
> +	    conn_type != QED_LL2_TYPE_IWARP &&
> +		(!QED_IS_NVMETCP_PERSONALITY(p_hwfn))) {
>   		p_ramrod->mf_si_bcast_accept_all = 1;
>   		p_ramrod->mf_si_mcast_accept_all = 1;
>   	} else {
> @@ -1049,6 +1050,8 @@ static int qed_sp_ll2_tx_queue_start(struct qed_hwfn *p_hwfn,
>   	case QED_LL2_TYPE_OOO:
>   		if (p_hwfn->hw_info.personality == QED_PCI_ISCSI)
>   			p_ramrod->conn_type = PROTOCOLID_ISCSI;
> +		else if (p_hwfn->hw_info.personality == QED_PCI_NVMETCP)
> +			p_ramrod->conn_type = PROTOCOLID_NVMETCP;
>   		else
>   			p_ramrod->conn_type = PROTOCOLID_IWARP;
>   		break;
> @@ -1634,7 +1637,8 @@ int qed_ll2_establish_connection(void *cxt, u8 connection_handle)
>   	if (rc)
>   		goto out;
>   
> -	if (!QED_IS_RDMA_PERSONALITY(p_hwfn))
> +	if (!QED_IS_RDMA_PERSONALITY(p_hwfn) &&
> +	    !QED_IS_NVMETCP_PERSONALITY(p_hwfn))
>   		qed_wr(p_hwfn, p_ptt, PRS_REG_USE_LIGHT_L2, 1);
>   
>   	qed_ll2_establish_connection_ooo(p_hwfn, p_ll2_conn);
> @@ -2376,7 +2380,8 @@ static int qed_ll2_start_ooo(struct qed_hwfn *p_hwfn,
>   static bool qed_ll2_is_storage_eng1(struct qed_dev *cdev)
>   {
>   	return (QED_IS_FCOE_PERSONALITY(QED_LEADING_HWFN(cdev)) ||
> -		QED_IS_ISCSI_PERSONALITY(QED_LEADING_HWFN(cdev))) &&
> +		QED_IS_ISCSI_PERSONALITY(QED_LEADING_HWFN(cdev)) ||
> +		QED_IS_NVMETCP_PERSONALITY(QED_LEADING_HWFN(cdev))) &&
>   		(QED_AFFIN_HWFN(cdev) != QED_LEADING_HWFN(cdev));
>   }
>   
> @@ -2402,11 +2407,13 @@ static int qed_ll2_stop(struct qed_dev *cdev)
>   
>   	if (cdev->ll2->handle == QED_LL2_UNUSED_HANDLE)
>   		return 0;
> +	if (!QED_IS_NVMETCP_PERSONALITY(p_hwfn))
> +		qed_llh_remove_mac_filter(cdev, 0, cdev->ll2_mac_address);
>   
>   	qed_llh_remove_mac_filter(cdev, 0, cdev->ll2_mac_address);
>   	eth_zero_addr(cdev->ll2_mac_address);
>   
> -	if (QED_IS_ISCSI_PERSONALITY(p_hwfn))
> +	if (QED_IS_ISCSI_PERSONALITY(p_hwfn) || QED_IS_NVMETCP_PERSONALITY(p_hwfn))
>   		qed_ll2_stop_ooo(p_hwfn);
>   
>   	/* In CMT mode, LL2 is always started on engine 0 for a storage PF */
> @@ -2442,6 +2449,7 @@ static int __qed_ll2_start(struct qed_hwfn *p_hwfn,
>   		conn_type = QED_LL2_TYPE_FCOE;
>   		break;
>   	case QED_PCI_ISCSI:
> +	case QED_PCI_NVMETCP:
>   		conn_type = QED_LL2_TYPE_ISCSI;
>   		break;
>   	case QED_PCI_ETH_ROCE:
> @@ -2567,7 +2575,7 @@ static int qed_ll2_start(struct qed_dev *cdev, struct qed_ll2_params *params)
>   		}
>   	}
>   
> -	if (QED_IS_ISCSI_PERSONALITY(p_hwfn)) {
> +	if (QED_IS_ISCSI_PERSONALITY(p_hwfn) || QED_IS_NVMETCP_PERSONALITY(p_hwfn)) {
>   		DP_VERBOSE(cdev, QED_MSG_STORAGE, "Starting OOO LL2 queue\n");
>   		rc = qed_ll2_start_ooo(p_hwfn, params);
>   		if (rc) {
> @@ -2576,10 +2584,13 @@ static int qed_ll2_start(struct qed_dev *cdev, struct qed_ll2_params *params)
>   		}
>   	}
>   
> -	rc = qed_llh_add_mac_filter(cdev, 0, params->ll2_mac_address);
> -	if (rc) {
> -		DP_NOTICE(cdev, "Failed to add an LLH filter\n");
> -		goto err3;
> +	if (!QED_IS_NVMETCP_PERSONALITY(p_hwfn)) {
> +		rc = qed_llh_add_mac_filter(cdev, 0, params->ll2_mac_address);
> +		if (rc) {
> +			DP_NOTICE(cdev, "Failed to add an LLH filter\n");
> +			goto err3;
> +		}
> +
>   	}
>   
>   	ether_addr_copy(cdev->ll2_mac_address, params->ll2_mac_address);
> @@ -2587,7 +2598,7 @@ static int qed_ll2_start(struct qed_dev *cdev, struct qed_ll2_params *params)
>   	return 0;
>   
>   err3:
> -	if (QED_IS_ISCSI_PERSONALITY(p_hwfn))
> +	if (QED_IS_ISCSI_PERSONALITY(p_hwfn) || QED_IS_NVMETCP_PERSONALITY(p_hwfn))
>   		qed_ll2_stop_ooo(p_hwfn);
>   err2:
>   	if (b_is_storage_eng1)
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.c b/drivers/net/ethernet/qlogic/qed/qed_mcp.c
> index cd882c453394..4387292c37e2 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_mcp.c
> +++ b/drivers/net/ethernet/qlogic/qed/qed_mcp.c
> @@ -2446,6 +2446,9 @@ qed_mcp_get_shmem_proto(struct qed_hwfn *p_hwfn,
>   	case FUNC_MF_CFG_PROTOCOL_ISCSI:
>   		*p_proto = QED_PCI_ISCSI;
>   		break;
> +	case FUNC_MF_CFG_PROTOCOL_NVMETCP:
> +		*p_proto = QED_PCI_NVMETCP;
> +		break;
>   	case FUNC_MF_CFG_PROTOCOL_FCOE:
>   		*p_proto = QED_PCI_FCOE;
>   		break;
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c b/drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c
> index 3e3192a3ad9b..6190adf965bc 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c
> +++ b/drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c
> @@ -1306,7 +1306,8 @@ int qed_mfw_process_tlv_req(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt)
>   	}
>   
>   	if ((tlv_group & QED_MFW_TLV_ISCSI) &&
> -	    p_hwfn->hw_info.personality != QED_PCI_ISCSI) {
> +	    p_hwfn->hw_info.personality != QED_PCI_ISCSI &&
> +		p_hwfn->hw_info.personality != QED_PCI_NVMETCP) {
>   		DP_VERBOSE(p_hwfn, QED_MSG_SP,
>   			   "Skipping iSCSI TLVs for non-iSCSI function\n");
>   		tlv_group &= ~QED_MFW_TLV_ISCSI;
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_ooo.c b/drivers/net/ethernet/qlogic/qed/qed_ooo.c
> index 88353aa404dc..d37bb2463f98 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_ooo.c
> +++ b/drivers/net/ethernet/qlogic/qed/qed_ooo.c
> @@ -16,7 +16,7 @@
>   #include "qed_ll2.h"
>   #include "qed_ooo.h"
>   #include "qed_cxt.h"
> -
> +#include "qed_nvmetcp.h"
>   static struct qed_ooo_archipelago
>   *qed_ooo_seek_archipelago(struct qed_hwfn *p_hwfn,
>   			  struct qed_ooo_info
> @@ -85,6 +85,9 @@ int qed_ooo_alloc(struct qed_hwfn *p_hwfn)
>   	case QED_PCI_ISCSI:
>   		proto = PROTOCOLID_ISCSI;
>   		break;
> +	case QED_PCI_NVMETCP:
> +		proto = PROTOCOLID_NVMETCP;
> +		break;
>   	case QED_PCI_ETH_RDMA:
>   	case QED_PCI_ETH_IWARP:
>   		proto = PROTOCOLID_IWARP;
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c b/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c
> index aa71adcf31ee..60b3876387a9 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c
> +++ b/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c
> @@ -385,6 +385,7 @@ int qed_sp_pf_start(struct qed_hwfn *p_hwfn,
>   		p_ramrod->personality = PERSONALITY_FCOE;
>   		break;
>   	case QED_PCI_ISCSI:
> +	case QED_PCI_NVMETCP:
>   		p_ramrod->personality = PERSONALITY_ISCSI;
>   		break;
>   	case QED_PCI_ETH_ROCE:
> 
As indicated, I do find this mix of 'nvmetcp is nearly iscsi' a bit 
strange. I would have preferred to have distinct types for nvmetcp.
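
To make the "distinct types" suggestion concrete, here is a minimal sketch.
All names are hypothetical stand-ins with a demo_ prefix, not the real qed
definitions, and whether the firmware could accept a separate NVMeTCP
connection type is an assumption this sketch does not settle:

```c
/* Hypothetical, simplified stand-ins; not the real qed definitions. */
enum demo_personality {
	DEMO_PCI_FCOE,
	DEMO_PCI_ISCSI,
	DEMO_PCI_NVMETCP,
};

enum demo_ll2_type {
	DEMO_LL2_TYPE_FCOE,
	DEMO_LL2_TYPE_ISCSI,
	DEMO_LL2_TYPE_NVMETCP,	/* assumed new, distinct type */
	DEMO_LL2_TYPE_INVALID,
};

/* One case per personality, so NVMeTCP never aliases the iSCSI type. */
static enum demo_ll2_type demo_ll2_type_for(enum demo_personality p)
{
	switch (p) {
	case DEMO_PCI_FCOE:
		return DEMO_LL2_TYPE_FCOE;
	case DEMO_PCI_ISCSI:
		return DEMO_LL2_TYPE_ISCSI;
	case DEMO_PCI_NVMETCP:
		return DEMO_LL2_TYPE_NVMETCP;
	}
	return DEMO_LL2_TYPE_INVALID;
}
```

With a mapping like this, checks that currently test for "iSCSI or
NVMeTCP" could key off the type that actually applies.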

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [RFC PATCH v4 04/27] qed: Add support of HW filter block
  2021-04-29 19:09 ` [RFC PATCH v4 04/27] qed: Add support of HW filter block Shai Malin
@ 2021-05-02 11:13   ` Hannes Reinecke
  2021-05-03 15:27     ` Shai Malin
  0 siblings, 1 reply; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-02 11:13 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024

On 4/29/21 9:09 PM, Shai Malin wrote:
> From: Prabhakar Kushwaha <pkushwaha@marvell.com>
> 
> This patch introduces the functionality of the HW filter block.
> It adds and removes filters based on source and target TCP port.
> 
> It also adds functionality to clear all filters at once.
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> ---
>   drivers/net/ethernet/qlogic/qed/qed.h         |  10 ++
>   drivers/net/ethernet/qlogic/qed/qed_dev.c     | 107 ++++++++++++++++++
>   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c |   5 +
>   include/linux/qed/qed_nvmetcp_if.h            |  24 ++++
>   4 files changed, 146 insertions(+)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [RFC PATCH v4 05/27] qed: Add NVMeTCP Offload IO Level FW and HW HSI
  2021-04-29 19:09 ` [RFC PATCH v4 05/27] qed: Add NVMeTCP Offload IO Level FW and HW HSI Shai Malin
@ 2021-05-02 11:22   ` Hannes Reinecke
  2021-05-04 16:25     ` Shai Malin
  0 siblings, 1 reply; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-02 11:22 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024

On 4/29/21 9:09 PM, Shai Malin wrote:
> This patch introduces the NVMeTCP Offload FW and HW HSI in order
> to initialize the IO level configuration into a per IO HW
> resource ("task") as part of the IO path flow.
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> ---
>   include/linux/qed/nvmetcp_common.h | 418 ++++++++++++++++++++++++++++-
>   include/linux/qed/qed_nvmetcp_if.h |  37 +++
>   2 files changed, 454 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/qed/nvmetcp_common.h b/include/linux/qed/nvmetcp_common.h
> index c8836b71b866..dda7a785c321 100644
> --- a/include/linux/qed/nvmetcp_common.h
> +++ b/include/linux/qed/nvmetcp_common.h
> @@ -7,6 +7,7 @@
>   #include "tcp_common.h"
>   
>   #define NVMETCP_SLOW_PATH_LAYER_CODE (6)
> +#define NVMETCP_WQE_NUM_SGES_SLOWIO (0xf)
>   
>   /* NVMeTCP firmware function init parameters */
>   struct nvmetcp_spe_func_init {
> @@ -194,4 +195,419 @@ struct nvmetcp_wqe {
>   #define NVMETCP_WQE_CDB_SIZE_OR_NVMETCP_CMD_SHIFT 24
>   };
>   
> -#endif /* __NVMETCP_COMMON__ */
> +struct nvmetcp_host_cccid_itid_entry {
> +	__le16 itid;
> +};
> +
> +struct nvmetcp_connect_done_results {
> +	__le16 icid;
> +	__le16 conn_id;
> +	struct tcp_ulp_connect_done_params params;
> +};
> +
> +struct nvmetcp_eqe_data {
> +	__le16 icid;
> +	__le16 conn_id;
> +	__le16 reserved;
> +	u8 error_code;
> +	u8 error_pdu_opcode_reserved;
> +#define NVMETCP_EQE_DATA_ERROR_PDU_OPCODE_MASK 0x3F
> +#define NVMETCP_EQE_DATA_ERROR_PDU_OPCODE_SHIFT  0
> +#define NVMETCP_EQE_DATA_ERROR_PDU_OPCODE_VALID_MASK  0x1
> +#define NVMETCP_EQE_DATA_ERROR_PDU_OPCODE_VALID_SHIFT  6
> +#define NVMETCP_EQE_DATA_RESERVED0_MASK 0x1
> +#define NVMETCP_EQE_DATA_RESERVED0_SHIFT 7
> +};
> +
> +enum nvmetcp_task_type {
> +	NVMETCP_TASK_TYPE_HOST_WRITE,
> +	NVMETCP_TASK_TYPE_HOST_READ,
> +	NVMETCP_TASK_TYPE_INIT_CONN_REQUEST,
> +	NVMETCP_TASK_TYPE_RESERVED0,
> +	NVMETCP_TASK_TYPE_CLEANUP,
> +	NVMETCP_TASK_TYPE_HOST_READ_NO_CQE,
> +	MAX_NVMETCP_TASK_TYPE
> +};
> +
> +struct nvmetcp_db_data {
> +	u8 params;
> +#define NVMETCP_DB_DATA_DEST_MASK 0x3 /* destination of doorbell (use enum db_dest) */
> +#define NVMETCP_DB_DATA_DEST_SHIFT 0
> +#define NVMETCP_DB_DATA_AGG_CMD_MASK 0x3 /* aggregative command to CM (use enum db_agg_cmd_sel) */
> +#define NVMETCP_DB_DATA_AGG_CMD_SHIFT 2
> +#define NVMETCP_DB_DATA_BYPASS_EN_MASK 0x1 /* enable QM bypass */
> +#define NVMETCP_DB_DATA_BYPASS_EN_SHIFT 4
> +#define NVMETCP_DB_DATA_RESERVED_MASK 0x1
> +#define NVMETCP_DB_DATA_RESERVED_SHIFT 5
> +#define NVMETCP_DB_DATA_AGG_VAL_SEL_MASK 0x3 /* aggregative value selection */
> +#define NVMETCP_DB_DATA_AGG_VAL_SEL_SHIFT 6
> +	u8 agg_flags; /* bit for every DQ counter flags in CM context that DQ can increment */
> +	__le16 sq_prod;
> +};
> +
> +struct nvmetcp_fw_cqe_error_bitmap {
> +	u8 cqe_error_status_bits;
> +#define CQE_ERROR_BITMAP_DIF_ERR_BITS_MASK 0x7
> +#define CQE_ERROR_BITMAP_DIF_ERR_BITS_SHIFT 0
> +#define CQE_ERROR_BITMAP_DATA_DIGEST_ERR_MASK 0x1
> +#define CQE_ERROR_BITMAP_DATA_DIGEST_ERR_SHIFT 3
> +#define CQE_ERROR_BITMAP_RCV_ON_INVALID_CONN_MASK 0x1
> +#define CQE_ERROR_BITMAP_RCV_ON_INVALID_CONN_SHIFT 4
> +};

Why exactly do you need a struct here?
For just one 8 bit value?
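
As a rough illustration of the alternative, the bitmap could be carried as
a bare u8 and decoded with the mask/shift macros the patch already
defines. The demo_ names below are hypothetical and the struct is only a
shortened stand-in for the start of the CQE; the field layout is copied
from the quoted hunk:

```c
#include <stdint.h>

/* Field layout copied from the quoted nvmetcp_fw_cqe_error_bitmap. */
#define CQE_ERROR_BITMAP_DIF_ERR_BITS_MASK		0x7
#define CQE_ERROR_BITMAP_DIF_ERR_BITS_SHIFT		0
#define CQE_ERROR_BITMAP_DATA_DIGEST_ERR_MASK		0x1
#define CQE_ERROR_BITMAP_DATA_DIGEST_ERR_SHIFT		3
#define CQE_ERROR_BITMAP_RCV_ON_INVALID_CONN_MASK	0x1
#define CQE_ERROR_BITMAP_RCV_ON_INVALID_CONN_SHIFT	4

/* Hypothetical helper: extract one named field from the raw value. */
#define DEMO_GET_FIELD(value, name) \
	(((value) >> name##_SHIFT) & name##_MASK)

/* Hypothetical shortened CQE prefix: the bitmap is a bare u8 member. */
struct demo_fw_cqe_prefix {
	uint16_t conn_id;
	uint8_t cqe_type;
	uint8_t error_bitmap;	/* was: struct nvmetcp_fw_cqe_error_bitmap */
};

/* Returns non-zero when the data digest error bit is set. */
static int demo_cqe_has_digest_error(const struct demo_fw_cqe_prefix *cqe)
{
	return DEMO_GET_FIELD(cqe->error_bitmap,
			      CQE_ERROR_BITMAP_DATA_DIGEST_ERR);
}
```

The on-wire size is the same either way; the only difference is one less
level of member access in the callers.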

> +
> +struct nvmetcp_nvmf_cqe {
> +	__le32 reserved[4];
> +};
> +
> +struct nvmetcp_fw_cqe_data {
> +	struct nvmetcp_nvmf_cqe nvme_cqe;
> +	struct regpair task_opaque;
> +	__le32 reserved[6];
> +};
> +
> +struct nvmetcp_fw_cqe {
> +	__le16 conn_id;
> +	u8 cqe_type;
> +	struct nvmetcp_fw_cqe_error_bitmap error_bitmap;
> +	__le16 itid;
> +	u8 task_type;
> +	u8 fw_dbg_field;
> +	u8 caused_conn_err;
> +	u8 reserved0[3];
> +	__le32 reserved1;
> +	struct nvmetcp_nvmf_cqe nvme_cqe;
> +	struct regpair task_opaque;
> +	__le32 reserved[6];
> +};
> +
> +enum nvmetcp_fw_cqes_type {
> +	NVMETCP_FW_CQE_TYPE_NORMAL = 1,
> +	NVMETCP_FW_CQE_TYPE_RESERVED0,
> +	NVMETCP_FW_CQE_TYPE_RESERVED1,
> +	NVMETCP_FW_CQE_TYPE_CLEANUP,
> +	NVMETCP_FW_CQE_TYPE_DUMMY,
> +	MAX_NVMETCP_FW_CQES_TYPE
> +};
> +
> +struct ystorm_nvmetcp_task_state {
> +	struct scsi_cached_sges data_desc;
> +	struct scsi_sgl_params sgl_params;
> +	__le32 resrved0;
> +	__le32 buffer_offset;
> +	__le16 cccid;
> +	struct nvmetcp_dif_flags dif_flags;
> +	u8 flags;
> +#define YSTORM_NVMETCP_TASK_STATE_LOCAL_COMP_MASK 0x1
> +#define YSTORM_NVMETCP_TASK_STATE_LOCAL_COMP_SHIFT 0
> +#define YSTORM_NVMETCP_TASK_STATE_SLOW_IO_MASK 0x1
> +#define YSTORM_NVMETCP_TASK_STATE_SLOW_IO_SHIFT 1
> +#define YSTORM_NVMETCP_TASK_STATE_SET_DIF_OFFSET_MASK 0x1
> +#define YSTORM_NVMETCP_TASK_STATE_SET_DIF_OFFSET_SHIFT 2
> +#define YSTORM_NVMETCP_TASK_STATE_SEND_W_RSP_MASK 0x1
> +#define YSTORM_NVMETCP_TASK_STATE_SEND_W_RSP_SHIFT 3
> +};
> +
> +struct ystorm_nvmetcp_task_rxmit_opt {
> +	__le32 reserved[4];
> +};
> +
> +struct nvmetcp_task_hdr {
> +	__le32 reg[18];
> +};
> +
> +struct nvmetcp_task_hdr_aligned {
> +	struct nvmetcp_task_hdr task_hdr;
> +	__le32 reserved[2];	/* HSI_COMMENT: Align to QREG */
> +};
> +
> +struct e5_tdif_task_context {
> +	__le32 reserved[16];
> +};
> +
> +struct e5_rdif_task_context {
> +	__le32 reserved[12];
> +};
> +
> +struct ystorm_nvmetcp_task_st_ctx {
> +	struct ystorm_nvmetcp_task_state state;
> +	struct ystorm_nvmetcp_task_rxmit_opt rxmit_opt;
> +	struct nvmetcp_task_hdr_aligned pdu_hdr;
> +};
> +
> +struct mstorm_nvmetcp_task_st_ctx {
> +	struct scsi_cached_sges data_desc;
> +	struct scsi_sgl_params sgl_params;
> +	__le32 rem_task_size;
> +	__le32 data_buffer_offset;
> +	u8 task_type;
> +	struct nvmetcp_dif_flags dif_flags;
> +	__le16 dif_task_icid;
> +	struct regpair reserved0;
> +	__le32 expected_itt;
> +	__le32 reserved1;
> +};
> +
> +struct nvmetcp_reg1 {
> +	__le32 reg1_map;
> +#define NVMETCP_REG1_NUM_SGES_MASK 0xF
> +#define NVMETCP_REG1_NUM_SGES_SHIFT 0
> +#define NVMETCP_REG1_RESERVED1_MASK 0xFFFFFFF
> +#define NVMETCP_REG1_RESERVED1_SHIFT 4
> +};
> +

Similar here; it is 32 bits, true, but still just a single value.

> +struct ustorm_nvmetcp_task_st_ctx {
> +	__le32 rem_rcv_len;
> +	__le32 exp_data_transfer_len;
> +	__le32 exp_data_sn;
> +	struct regpair reserved0;
> +	struct nvmetcp_reg1 reg1;
> +	u8 flags2;
> +#define USTORM_NVMETCP_TASK_ST_CTX_AHS_EXIST_MASK 0x1
> +#define USTORM_NVMETCP_TASK_ST_CTX_AHS_EXIST_SHIFT 0
> +#define USTORM_NVMETCP_TASK_ST_CTX_RESERVED1_MASK 0x7F
> +#define USTORM_NVMETCP_TASK_ST_CTX_RESERVED1_SHIFT 1
> +	struct nvmetcp_dif_flags dif_flags;
> +	__le16 reserved3;
> +	__le16 tqe_opaque[2];
> +	__le32 reserved5;
> +	__le32 nvme_tcp_opaque_lo;
> +	__le32 nvme_tcp_opaque_hi;
> +	u8 task_type;
> +	u8 error_flags;
> +#define USTORM_NVMETCP_TASK_ST_CTX_DATA_DIGEST_ERROR_MASK 0x1
> +#define USTORM_NVMETCP_TASK_ST_CTX_DATA_DIGEST_ERROR_SHIFT 0
> +#define USTORM_NVMETCP_TASK_ST_CTX_DATA_TRUNCATED_ERROR_MASK 0x1
> +#define USTORM_NVMETCP_TASK_ST_CTX_DATA_TRUNCATED_ERROR_SHIFT 1
> +#define USTORM_NVMETCP_TASK_ST_CTX_UNDER_RUN_ERROR_MASK 0x1
> +#define USTORM_NVMETCP_TASK_ST_CTX_UNDER_RUN_ERROR_SHIFT 2
> +#define USTORM_NVMETCP_TASK_ST_CTX_NVME_TCP_MASK 0x1
> +#define USTORM_NVMETCP_TASK_ST_CTX_NVME_TCP_SHIFT 3
> +	u8 flags;
> +#define USTORM_NVMETCP_TASK_ST_CTX_CQE_WRITE_MASK 0x3
> +#define USTORM_NVMETCP_TASK_ST_CTX_CQE_WRITE_SHIFT 0
> +#define USTORM_NVMETCP_TASK_ST_CTX_LOCAL_COMP_MASK 0x1
> +#define USTORM_NVMETCP_TASK_ST_CTX_LOCAL_COMP_SHIFT 2
> +#define USTORM_NVMETCP_TASK_ST_CTX_Q0_R2TQE_WRITE_MASK 0x1
> +#define USTORM_NVMETCP_TASK_ST_CTX_Q0_R2TQE_WRITE_SHIFT 3
> +#define USTORM_NVMETCP_TASK_ST_CTX_TOTAL_DATA_ACKED_DONE_MASK 0x1
> +#define USTORM_NVMETCP_TASK_ST_CTX_TOTAL_DATA_ACKED_DONE_SHIFT 4
> +#define USTORM_NVMETCP_TASK_ST_CTX_HQ_SCANNED_DONE_MASK 0x1
> +#define USTORM_NVMETCP_TASK_ST_CTX_HQ_SCANNED_DONE_SHIFT 5
> +#define USTORM_NVMETCP_TASK_ST_CTX_R2T2RECV_DONE_MASK 0x1
> +#define USTORM_NVMETCP_TASK_ST_CTX_R2T2RECV_DONE_SHIFT 6
> +	u8 cq_rss_number;
> +};
> +
> +struct e5_ystorm_nvmetcp_task_ag_ctx {
> +	u8 reserved /* cdu_validation */;
> +	u8 byte1 /* state_and_core_id */;
> +	__le16 word0 /* icid */;
> +	u8 flags0;
> +	u8 flags1;
> +	u8 flags2;
> +	u8 flags3;
> +	__le32 TTT;
> +	u8 byte2;
> +	u8 byte3;
> +	u8 byte4;
> +	u8 e4_reserved7;
> +};
> +
> +struct e5_mstorm_nvmetcp_task_ag_ctx {
> +	u8 cdu_validation;
> +	u8 byte1;
> +	__le16 task_cid;
> +	u8 flags0;
> +#define E5_MSTORM_NVMETCP_TASK_AG_CTX_CONNECTION_TYPE_MASK 0xF
> +#define E5_MSTORM_NVMETCP_TASK_AG_CTX_CONNECTION_TYPE_SHIFT 0
> +#define E5_MSTORM_NVMETCP_TASK_AG_CTX_EXIST_IN_QM0_MASK 0x1
> +#define E5_MSTORM_NVMETCP_TASK_AG_CTX_EXIST_IN_QM0_SHIFT 4
> +#define E5_MSTORM_NVMETCP_TASK_AG_CTX_CONN_CLEAR_SQ_FLAG_MASK 0x1
> +#define E5_MSTORM_NVMETCP_TASK_AG_CTX_CONN_CLEAR_SQ_FLAG_SHIFT 5
> +#define E5_MSTORM_NVMETCP_TASK_AG_CTX_VALID_MASK 0x1
> +#define E5_MSTORM_NVMETCP_TASK_AG_CTX_VALID_SHIFT 6
> +#define E5_MSTORM_NVMETCP_TASK_AG_CTX_TASK_CLEANUP_FLAG_MASK 0x1
> +#define E5_MSTORM_NVMETCP_TASK_AG_CTX_TASK_CLEANUP_FLAG_SHIFT 7
> +	u8 flags1;
> +#define E5_MSTORM_NVMETCP_TASK_AG_CTX_TASK_CLEANUP_CF_MASK 0x3
> +#define E5_MSTORM_NVMETCP_TASK_AG_CTX_TASK_CLEANUP_CF_SHIFT 0
> +#define E5_MSTORM_NVMETCP_TASK_AG_CTX_CF1_MASK 0x3
> +#define E5_MSTORM_NVMETCP_TASK_AG_CTX_CF1_SHIFT 2
> +#define E5_MSTORM_NVMETCP_TASK_AG_CTX_CF2_MASK 0x3
> +#define E5_MSTORM_NVMETCP_TASK_AG_CTX_CF2_SHIFT 4
> +#define E5_MSTORM_NVMETCP_TASK_AG_CTX_TASK_CLEANUP_CF_EN_MASK 0x1
> +#define E5_MSTORM_NVMETCP_TASK_AG_CTX_TASK_CLEANUP_CF_EN_SHIFT 6
> +#define E5_MSTORM_NVMETCP_TASK_AG_CTX_CF1EN_MASK 0x1
> +#define E5_MSTORM_NVMETCP_TASK_AG_CTX_CF1EN_SHIFT 7
> +	u8 flags2;
> +	u8 flags3;
> +	__le32 reg0;
> +	u8 byte2;
> +	u8 byte3;
> +	u8 byte4;
> +	u8 e4_reserved7;
> +};
> +
> +struct e5_ustorm_nvmetcp_task_ag_ctx {
> +	u8 reserved;
> +	u8 state_and_core_id;
> +	__le16 icid;
> +	u8 flags0;
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_CONNECTION_TYPE_MASK 0xF
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_CONNECTION_TYPE_SHIFT 0
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_EXIST_IN_QM0_MASK 0x1
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_EXIST_IN_QM0_SHIFT 4
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_CONN_CLEAR_SQ_FLAG_MASK 0x1
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_CONN_CLEAR_SQ_FLAG_SHIFT 5
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_HQ_SCANNED_CF_MASK 0x3
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_HQ_SCANNED_CF_SHIFT 6
> +	u8 flags1;
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_RESERVED1_MASK 0x3
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_RESERVED1_SHIFT 0
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_R2T2RECV_MASK 0x3
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_R2T2RECV_SHIFT 2
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_CF3_MASK 0x3
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_CF3_SHIFT 4
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_DIF_ERROR_CF_MASK 0x3
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_DIF_ERROR_CF_SHIFT 6
> +	u8 flags2;
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_HQ_SCANNED_CF_EN_MASK 0x1
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_HQ_SCANNED_CF_EN_SHIFT 0
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_DISABLE_DATA_ACKED_MASK 0x1
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_DISABLE_DATA_ACKED_SHIFT 1
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_R2T2RECV_EN_MASK 0x1
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_R2T2RECV_EN_SHIFT 2
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_CF3EN_MASK 0x1
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_CF3EN_SHIFT 3
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_DIF_ERROR_CF_EN_MASK 0x1
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_DIF_ERROR_CF_EN_SHIFT 4
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_CMP_DATA_TOTAL_EXP_EN_MASK 0x1
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_CMP_DATA_TOTAL_EXP_EN_SHIFT 5
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_RULE1EN_MASK 0x1
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_RULE1EN_SHIFT 6
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_CMP_CONT_RCV_EXP_EN_MASK 0x1
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_CMP_CONT_RCV_EXP_EN_SHIFT 7
> +	u8 flags3;
> +	u8 flags4;
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_E4_RESERVED5_MASK 0x3
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_E4_RESERVED5_SHIFT 0
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_E4_RESERVED6_MASK 0x1
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_E4_RESERVED6_SHIFT 2
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_E4_RESERVED7_MASK 0x1
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_E4_RESERVED7_SHIFT 3
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_DIF_ERROR_TYPE_MASK 0xF
> +#define E5_USTORM_NVMETCP_TASK_AG_CTX_DIF_ERROR_TYPE_SHIFT 4
> +	u8 byte2;
> +	u8 byte3;
> +	u8 e4_reserved8;
> +	__le32 dif_err_intervals;
> +	__le32 dif_error_1st_interval;
> +	__le32 rcv_cont_len;
> +	__le32 exp_cont_len;
> +	__le32 total_data_acked;
> +	__le32 exp_data_acked;
> +	__le16 word1;
> +	__le16 next_tid;
> +	__le32 hdr_residual_count;
> +	__le32 exp_r2t_sn;
> +};
> +
> +struct e5_nvmetcp_task_context {
> +	struct ystorm_nvmetcp_task_st_ctx ystorm_st_context;
> +	struct e5_ystorm_nvmetcp_task_ag_ctx ystorm_ag_context;
> +	struct regpair ystorm_ag_padding[2];
> +	struct e5_tdif_task_context tdif_context;
> +	struct e5_mstorm_nvmetcp_task_ag_ctx mstorm_ag_context;
> +	struct regpair mstorm_ag_padding[2];
> +	struct e5_ustorm_nvmetcp_task_ag_ctx ustorm_ag_context;
> +	struct regpair ustorm_ag_padding[2];
> +	struct mstorm_nvmetcp_task_st_ctx mstorm_st_context;
> +	struct regpair mstorm_st_padding[2];
> +	struct ustorm_nvmetcp_task_st_ctx ustorm_st_context;
> +	struct regpair ustorm_st_padding[2];
> +	struct e5_rdif_task_context rdif_context;
> +};
> +
> +/* NVMe TCP common header in network order */
> +struct nvmetcp_common_hdr {
> +	u8 pdo;
> +	u8 hlen;
> +	u8 flags;
> +#define NVMETCP_COMMON_HDR_HDGSTF_MASK 0x1
> +#define NVMETCP_COMMON_HDR_HDGSTF_SHIFT 0
> +#define NVMETCP_COMMON_HDR_DDGSTF_MASK 0x1
> +#define NVMETCP_COMMON_HDR_DDGSTF_SHIFT 1
> +#define NVMETCP_COMMON_HDR_LAST_PDU_MASK 0x1
> +#define NVMETCP_COMMON_HDR_LAST_PDU_SHIFT 2
> +#define NVMETCP_COMMON_HDR_SUCCESS_MASK 0x1
> +#define NVMETCP_COMMON_HDR_SUCCESS_SHIFT 3
> +#define NVMETCP_COMMON_HDR_RESERVED_MASK 0xF
> +#define NVMETCP_COMMON_HDR_RESERVED_SHIFT 4
> +	u8 pdu_type;
> +	__le32 plen_swapped;
> +};
> +
Why do you need to define this?
Wouldn't it be easier to use the standard 'struct nvme_tcp_hdr' and swap 
the bytes before sending it off?

(Or modify the firmware to do the byte-swapping itself ...)
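
A minimal userspace sketch of the first option, assuming the firmware
simply wants every 32-bit word of the 8-byte common header byte-reversed
(which is what the pdo/hlen/flags/pdu_type ordering above suggests, but
is not confirmed here). demo_nvme_tcp_hdr mirrors the field order of
'struct nvme_tcp_hdr'; all demo_ names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Mirrors the field order of struct nvme_tcp_hdr (userspace stand-in). */
struct demo_nvme_tcp_hdr {
	uint8_t type;
	uint8_t flags;
	uint8_t hlen;
	uint8_t pdo;
	uint32_t plen;	/* little-endian on the wire */
};

/* Reverse the four bytes of a 32-bit value. */
static uint32_t demo_swab32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8) | ((v & 0xff000000u) >> 24);
}

/* Byte-reverse each 32-bit word of the 8-byte common header into out[]. */
static void demo_hdr_to_fw(const struct demo_nvme_tcp_hdr *hdr, uint8_t out[8])
{
	uint32_t words[2];

	memcpy(words, hdr, sizeof(words));
	words[0] = demo_swab32(words[0]);
	words[1] = demo_swab32(words[1]);
	memcpy(out, words, sizeof(words));
}
```

After the swap the first word reads pdo, hlen, flags, type, which matches
the member order of nvmetcp_common_hdr, so the driver could build the
standard header and convert once just before handing it to the device.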

> +/* We don't need the entire 128 Bytes of the ICReq, hence passing only 16
> + * Bytes to the FW in network order.
> + */
> +struct nvmetcp_icreq_hdr_psh {
> +	__le16 pfv;
> +	u8 hpda;
> +	u8 digest;
> +#define NVMETCP_ICREQ_HDR_PSH_16B_HDGST_EN_MASK 0x1
> +#define NVMETCP_ICREQ_HDR_PSH_16B_HDGST_EN_SHIFT 0
> +#define NVMETCP_ICREQ_HDR_PSH_16B_DDGST_EN_MASK 0x1
> +#define NVMETCP_ICREQ_HDR_PSH_16B_DDGST_EN_SHIFT 1
> +#define NVMETCP_ICREQ_HDR_PSH_16B_RESERVED1_MASK 0x3F
> +#define NVMETCP_ICREQ_HDR_PSH_16B_RESERVED1_SHIFT 2
> +	__le32 maxr2t;
> +	u8 reserved[8];
> +};
> +

This is one of those shortcuts that will come back to haunt you
eventually; I would consider updating the firmware to process the
entire ICReq.

> +struct nvmetcp_cmd_capsule_hdr_psh {
> +	__le32 raw_swapped[16];
> +};
> +
> +struct nvmetcp_cmd_capsule_hdr {
> +	struct nvmetcp_common_hdr chdr;
> +	struct nvmetcp_cmd_capsule_hdr_psh pshdr;
> +};
> +
> +struct nvmetcp_data_hdr {
> +	__le32 data[6];
> +};
> +
> +struct nvmetcp_h2c_hdr_psh {
> +	__le16 ttag_swapped;
> +	__le16 command_id_swapped;
> +	__le32 data_offset_swapped;
> +	__le32 data_length_swapped;
> +	__le32 reserved1;
> +};
> +
> +struct nvmetcp_h2c_hdr {
> +	struct nvmetcp_common_hdr chdr;
> +	struct nvmetcp_h2c_hdr_psh pshdr;
> +};
> +
> +/* We don't need the entire 128 Bytes of the ICResp, hence passing only 16
> + * Bytes to the FW in network order.
> + */
> +struct nvmetcp_icresp_hdr_psh {
> +	u8 digest;
> +#define NVMETCP_ICRESP_HDR_PSH_16B_HDGST_EN_MASK 0x1
> +#define NVMETCP_ICRESP_HDR_PSH_16B_HDGST_EN_SHIFT 0
> +#define NVMETCP_ICRESP_HDR_PSH_16B_DDGST_EN_MASK 0x1
> +#define NVMETCP_ICRESP_HDR_PSH_16B_DDGST_EN_SHIFT 1
> +	u8 cpda;
> +	__le16 pfv_swapped;
> +	__le32 maxdata_swapped;
> +	__le16 reserved2[4];
> +};
> +

Same here.

> +struct nvmetcp_init_conn_req_hdr {
> +	struct nvmetcp_common_hdr chdr;
> +	struct nvmetcp_icreq_hdr_psh pshdr;
> +};
> +
> +#endif /* __NVMETCP_COMMON__*/
> diff --git a/include/linux/qed/qed_nvmetcp_if.h b/include/linux/qed/qed_nvmetcp_if.h
> index 686f924238e3..04e90dc42c12 100644
> --- a/include/linux/qed/qed_nvmetcp_if.h
> +++ b/include/linux/qed/qed_nvmetcp_if.h
> @@ -5,6 +5,8 @@
>   #define _QED_NVMETCP_IF_H
>   #include <linux/types.h>
>   #include <linux/qed/qed_if.h>
> +#include <linux/qed/storage_common.h>
> +#include <linux/qed/nvmetcp_common.h>
>   
>   #define QED_NVMETCP_MAX_IO_SIZE	0x800000
>   
> @@ -73,6 +75,41 @@ struct qed_nvmetcp_cb_ops {
>   	struct qed_common_cb_ops common;
>   };
>   
> +struct nvmetcp_sge {
> +	struct regpair sge_addr; /* SGE address */
> +	__le32 sge_len; /* SGE length */
> +	__le32 reserved;
> +};
> +
> +/* IO path HSI function SGL params */
> +struct storage_sgl_task_params {
> +	struct nvmetcp_sge *sgl;
> +	struct regpair sgl_phys_addr;
> +	u32 total_buffer_size;
> +	u16 num_sges;
> +	bool small_mid_sge;
> +};
> +
> +/* IO path HSI function FW task context params */
> +struct nvmetcp_task_params {
> +	void *context; /* Output parameter - set/filled by the HSI function */
> +	struct nvmetcp_wqe *sqe;
> +	u32 tx_io_size; /* in bytes (Without DIF, if exists) */
> +	u32 rx_io_size; /* in bytes (Without DIF, if exists) */
> +	u16 conn_icid;
> +	u16 itid;
> +	struct regpair opq; /* qedn_task_ctx address */
> +	u16 host_cccid;
> +	u8 cq_rss_number;
> +	bool send_write_incapsule;
> +};
> +
> +/* IO path HSI function FW conn level input params */
> +
> +struct nvmetcp_conn_params {
> +	u32 max_burst_length;
> +};
> +
>   /**
>    * struct qed_nvmetcp_ops - qed NVMeTCP operations.
>    * @common:		common operations pointer
> 
Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [RFC PATCH v4 06/27] qed: Add NVMeTCP Offload IO Level FW Initializations
  2021-04-29 19:09 ` [RFC PATCH v4 06/27] qed: Add NVMeTCP Offload IO Level FW Initializations Shai Malin
@ 2021-05-02 11:24   ` Hannes Reinecke
  2021-05-04 16:28     ` Shai Malin
  0 siblings, 1 reply; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-02 11:24 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024

On 4/29/21 9:09 PM, Shai Malin wrote:
> This patch introduces the NVMeTCP FW initializations which is used
> to initialize the IO level configuration into a per IO HW
> resource ("task") as part of the IO path flow.
> 
> This includes:
> - Write IO FW initialization
> - Read IO FW initialization.
> - IC-Req and IC-Resp FW exchange.
> - FW Cleanup flow (Flush IO).
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> ---
>   drivers/net/ethernet/qlogic/qed/Makefile      |   5 +-
>   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c |   7 +-
>   .../qlogic/qed/qed_nvmetcp_fw_funcs.c         | 372 ++++++++++++++++++
>   .../qlogic/qed/qed_nvmetcp_fw_funcs.h         |  43 ++
>   include/linux/qed/nvmetcp_common.h            |   3 +
>   include/linux/qed/qed_nvmetcp_if.h            |  17 +
>   6 files changed, 445 insertions(+), 2 deletions(-)
>   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.c
>   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.h
> 
> diff --git a/drivers/net/ethernet/qlogic/qed/Makefile b/drivers/net/ethernet/qlogic/qed/Makefile
> index 7cb0db67ba5b..0d9c2fe0245d 100644
> --- a/drivers/net/ethernet/qlogic/qed/Makefile
> +++ b/drivers/net/ethernet/qlogic/qed/Makefile
> @@ -28,7 +28,10 @@ qed-$(CONFIG_QED_ISCSI) += qed_iscsi.o
>   qed-$(CONFIG_QED_LL2) += qed_ll2.o
>   qed-$(CONFIG_QED_OOO) += qed_ooo.o
>   
> -qed-$(CONFIG_QED_NVMETCP) += qed_nvmetcp.o
> +qed-$(CONFIG_QED_NVMETCP) +=	\
> +	qed_nvmetcp.o		\
> +	qed_nvmetcp_fw_funcs.o	\
> +	qed_nvmetcp_ip_services.o
>   
>   qed-$(CONFIG_QED_RDMA) +=	\
>   	qed_iwarp.o		\
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
> index 1e2eb6dcbd6e..434363f8b5c0 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
> +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
> @@ -27,6 +27,7 @@
>   #include "qed_mcp.h"
>   #include "qed_sp.h"
>   #include "qed_reg_addr.h"
> +#include "qed_nvmetcp_fw_funcs.h"
>   
>   static int qed_nvmetcp_async_event(struct qed_hwfn *p_hwfn, u8 fw_event_code,
>   				   u16 echo, union event_ring_data *data,
> @@ -848,7 +849,11 @@ static const struct qed_nvmetcp_ops qed_nvmetcp_ops_pass = {
>   	.remove_src_tcp_port_filter = &qed_llh_remove_src_tcp_port_filter,
>   	.add_dst_tcp_port_filter = &qed_llh_add_dst_tcp_port_filter,
>   	.remove_dst_tcp_port_filter = &qed_llh_remove_dst_tcp_port_filter,
> -	.clear_all_filters = &qed_llh_clear_all_filters
> +	.clear_all_filters = &qed_llh_clear_all_filters,
> +	.init_read_io = &init_nvmetcp_host_read_task,
> +	.init_write_io = &init_nvmetcp_host_write_task,
> +	.init_icreq_exchange = &init_nvmetcp_init_conn_req_task,
> +	.init_task_cleanup = &init_cleanup_task_nvmetcp
>   };
>   
>   const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void)
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.c b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.c
> new file mode 100644
> index 000000000000..8485ad678284
> --- /dev/null
> +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.c
> @@ -0,0 +1,372 @@
> +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause)
> +/* Copyright 2021 Marvell. All rights reserved. */
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +#include <linux/kernel.h>
> +#include <linux/list.h>
> +#include <linux/mm.h>
> +#include <linux/types.h>
> +#include <asm/byteorder.h>
> +#include <linux/qed/common_hsi.h>
> +#include <linux/qed/storage_common.h>
> +#include <linux/qed/nvmetcp_common.h>
> +#include <linux/qed/qed_nvmetcp_if.h>
> +#include "qed_nvmetcp_fw_funcs.h"
> +
> +#define NVMETCP_NUM_SGES_IN_CACHE 0x4
> +
> +bool nvmetcp_is_slow_sgl(u16 num_sges, bool small_mid_sge)
> +{
> +	return (num_sges > SCSI_NUM_SGES_SLOW_SGL_THR && small_mid_sge);
> +}
> +
> +void init_scsi_sgl_context(struct scsi_sgl_params *ctx_sgl_params,
> +			   struct scsi_cached_sges *ctx_data_desc,
> +			   struct storage_sgl_task_params *sgl_params)
> +{
> +	u8 num_sges_to_init = (u8)(sgl_params->num_sges > NVMETCP_NUM_SGES_IN_CACHE ?
> +				   NVMETCP_NUM_SGES_IN_CACHE : sgl_params->num_sges);
> +	u8 sge_index;
> +
> +	/* sgl params */
> +	ctx_sgl_params->sgl_addr.lo = cpu_to_le32(sgl_params->sgl_phys_addr.lo);
> +	ctx_sgl_params->sgl_addr.hi = cpu_to_le32(sgl_params->sgl_phys_addr.hi);
> +	ctx_sgl_params->sgl_total_length = cpu_to_le32(sgl_params->total_buffer_size);
> +	ctx_sgl_params->sgl_num_sges = cpu_to_le16(sgl_params->num_sges);
> +
> +	for (sge_index = 0; sge_index < num_sges_to_init; sge_index++) {
> +		ctx_data_desc->sge[sge_index].sge_addr.lo =
> +			cpu_to_le32(sgl_params->sgl[sge_index].sge_addr.lo);
> +		ctx_data_desc->sge[sge_index].sge_addr.hi =
> +			cpu_to_le32(sgl_params->sgl[sge_index].sge_addr.hi);
> +		ctx_data_desc->sge[sge_index].sge_len =
> +			cpu_to_le32(sgl_params->sgl[sge_index].sge_len);
> +	}
> +}
> +
> +static inline u32 calc_rw_task_size(struct nvmetcp_task_params *task_params,
> +				    enum nvmetcp_task_type task_type)
> +{
> +	u32 io_size;
> +
> +	if (task_type == NVMETCP_TASK_TYPE_HOST_WRITE)
> +		io_size = task_params->tx_io_size;
> +	else
> +		io_size = task_params->rx_io_size;
> +
> +	if (unlikely(!io_size))
> +		return 0;
> +
> +	return io_size;
> +}
> +
> +static inline void init_sqe(struct nvmetcp_task_params *task_params,
> +			    struct storage_sgl_task_params *sgl_task_params,
> +			    enum nvmetcp_task_type task_type)
> +{
> +	if (!task_params->sqe)
> +		return;
> +
> +	memset(task_params->sqe, 0, sizeof(*task_params->sqe));
> +	task_params->sqe->task_id = cpu_to_le16(task_params->itid);
> +
> +	switch (task_type) {
> +	case NVMETCP_TASK_TYPE_HOST_WRITE: {
> +		u32 buf_size = 0;
> +		u32 num_sges = 0;
> +
> +		SET_FIELD(task_params->sqe->contlen_cdbsize,
> +			  NVMETCP_WQE_CDB_SIZE_OR_NVMETCP_CMD, 1);
> +		SET_FIELD(task_params->sqe->flags, NVMETCP_WQE_WQE_TYPE,
> +			  NVMETCP_WQE_TYPE_NORMAL);
> +		if (task_params->tx_io_size) {
> +			if (task_params->send_write_incapsule)
> +				buf_size = calc_rw_task_size(task_params, task_type);
> +
> +			if (nvmetcp_is_slow_sgl(sgl_task_params->num_sges,
> +						sgl_task_params->small_mid_sge))
> +				num_sges = NVMETCP_WQE_NUM_SGES_SLOWIO;
> +			else
> +				num_sges = min((u16)sgl_task_params->num_sges,
> +					       (u16)SCSI_NUM_SGES_SLOW_SGL_THR);
> +		}
> +		SET_FIELD(task_params->sqe->flags, NVMETCP_WQE_NUM_SGES, num_sges);
> +		SET_FIELD(task_params->sqe->contlen_cdbsize, NVMETCP_WQE_CONT_LEN, buf_size);
> +	} break;
> +
> +	case NVMETCP_TASK_TYPE_HOST_READ: {
> +		SET_FIELD(task_params->sqe->flags, NVMETCP_WQE_WQE_TYPE,
> +			  NVMETCP_WQE_TYPE_NORMAL);
> +		SET_FIELD(task_params->sqe->contlen_cdbsize,
> +			  NVMETCP_WQE_CDB_SIZE_OR_NVMETCP_CMD, 1);
> +	} break;
> +
> +	case NVMETCP_TASK_TYPE_INIT_CONN_REQUEST: {
> +		SET_FIELD(task_params->sqe->flags, NVMETCP_WQE_WQE_TYPE,
> +			  NVMETCP_WQE_TYPE_MIDDLE_PATH);
> +
> +		if (task_params->tx_io_size) {
> +			SET_FIELD(task_params->sqe->contlen_cdbsize, NVMETCP_WQE_CONT_LEN,
> +				  task_params->tx_io_size);
> +			SET_FIELD(task_params->sqe->flags, NVMETCP_WQE_NUM_SGES,
> +				  min((u16)sgl_task_params->num_sges,
> +				      (u16)SCSI_NUM_SGES_SLOW_SGL_THR));
> +		}
> +	} break;
> +
> +	case NVMETCP_TASK_TYPE_CLEANUP:
> +		SET_FIELD(task_params->sqe->flags, NVMETCP_WQE_WQE_TYPE,
> +			  NVMETCP_WQE_TYPE_TASK_CLEANUP);
> +
> +	default:
> +		break;
> +	}
> +}
> +
> +/* The following function initializes of NVMeTCP task params */
> +static inline void
> +init_nvmetcp_task_params(struct e5_nvmetcp_task_context *context,
> +			 struct nvmetcp_task_params *task_params,
> +			 enum nvmetcp_task_type task_type)
> +{
> +	context->ystorm_st_context.state.cccid = task_params->host_cccid;
> +	SET_FIELD(context->ustorm_st_context.error_flags, USTORM_NVMETCP_TASK_ST_CTX_NVME_TCP, 1);
> +	context->ustorm_st_context.nvme_tcp_opaque_lo = cpu_to_le32(task_params->opq.lo);
> +	context->ustorm_st_context.nvme_tcp_opaque_hi = cpu_to_le32(task_params->opq.hi);
> +}
> +
> +/* The following function initializes default values to all tasks */
> +static inline void
> +init_default_nvmetcp_task(struct nvmetcp_task_params *task_params, void *pdu_header,
> +			  enum nvmetcp_task_type task_type)
> +{
> +	struct e5_nvmetcp_task_context *context = task_params->context;
> +	const u8 val_byte = context->mstorm_ag_context.cdu_validation;
> +	u8 dw_index;
> +
> +	memset(context, 0, sizeof(*context));
> +
> +	init_nvmetcp_task_params(context, task_params,
> +				 (enum nvmetcp_task_type)task_type);
> +
> +	if (task_type == NVMETCP_TASK_TYPE_HOST_WRITE ||
> +	    task_type == NVMETCP_TASK_TYPE_HOST_READ) {
> +		for (dw_index = 0; dw_index < QED_NVMETCP_CMD_HDR_SIZE / 4; dw_index++)
> +			context->ystorm_st_context.pdu_hdr.task_hdr.reg[dw_index] =
> +				cpu_to_le32(((u32 *)pdu_header)[dw_index]);
> +	} else {
> +		for (dw_index = 0; dw_index < QED_NVMETCP_CMN_HDR_SIZE / 4; dw_index++)
> +			context->ystorm_st_context.pdu_hdr.task_hdr.reg[dw_index] =
> +				cpu_to_le32(((u32 *)pdu_header)[dw_index]);
> +	}
> +

And this is what I meant. You are twiddling with the bytes already, so 
why bother with a separate struct at all?
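
As a rough illustration of that point, the quoted loop can be reduced to a
user-space sketch that treats the PDU header as an opaque run of 32-bit
words. The typed header struct plays no part in the copy; only the byte count
matters. sketch_cpu_to_le32() and sketch_copy_pdu_hdr() are hypothetical
names written for this example, and the two sizes are taken from the
nvmetcp_common.h hunk of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Header sizes in bytes, as defined in the quoted nvmetcp_common.h hunk. */
#define QED_NVMETCP_CMD_HDR_SIZE 72
#define QED_NVMETCP_CMN_HDR_SIZE 24

/* Hypothetical portable stand-in for the kernel's cpu_to_le32(). */
static uint32_t sketch_cpu_to_le32(uint32_t v)
{
	uint8_t b[4] = {
		(uint8_t)(v & 0xff),
		(uint8_t)((v >> 8) & 0xff),
		(uint8_t)((v >> 16) & 0xff),
		(uint8_t)((v >> 24) & 0xff),
	};
	uint32_t out;

	/* The in-memory byte order of 'out' is now little-endian. */
	memcpy(&out, b, sizeof(out));
	return out;
}

/*
 * Copy hdr_size bytes of a PDU header into the task context registers,
 * one 32-bit word at a time, converting each word to little-endian.
 */
static void sketch_copy_pdu_hdr(uint32_t *reg, const void *pdu_header,
				size_t hdr_size)
{
	size_t dw_index;
	uint32_t dw;

	assert(hdr_size % 4 == 0);

	for (dw_index = 0; dw_index < hdr_size / 4; dw_index++) {
		/* memcpy avoids unaligned or aliasing reads of the header. */
		memcpy(&dw, (const uint8_t *)pdu_header + dw_index * 4,
		       sizeof(dw));
		reg[dw_index] = sketch_cpu_to_le32(dw);
	}
}
```

A read or write task would pass QED_NVMETCP_CMD_HDR_SIZE (18 words) and every
other task type QED_NVMETCP_CMN_HDR_SIZE (6 words), which matches the two
branches in the quoted function.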

> +	/* M-Storm Context: */
> +	context->mstorm_ag_context.cdu_validation = val_byte;
> +	context->mstorm_st_context.task_type = (u8)(task_type);
> +	context->mstorm_ag_context.task_cid = cpu_to_le16(task_params->conn_icid);
> +
> +	/* Ustorm Context: */
> +	SET_FIELD(context->ustorm_ag_context.flags1, E5_USTORM_NVMETCP_TASK_AG_CTX_R2T2RECV, 1);
> +	context->ustorm_st_context.task_type = (u8)(task_type);
> +	context->ustorm_st_context.cq_rss_number = task_params->cq_rss_number;
> +	context->ustorm_ag_context.icid = cpu_to_le16(task_params->conn_icid);
> +}
> +
> +/* The following function initializes the U-Storm Task Contexts */
> +static inline void
> +init_ustorm_task_contexts(struct ustorm_nvmetcp_task_st_ctx *ustorm_st_context,
> +			  struct e5_ustorm_nvmetcp_task_ag_ctx *ustorm_ag_context,
> +			  u32 remaining_recv_len,
> +			  u32 expected_data_transfer_len, u8 num_sges,
> +			  bool tx_dif_conn_err_en)
> +{
> +	/* Remaining data to be received in bytes. Used in validations*/
> +	ustorm_st_context->rem_rcv_len = cpu_to_le32(remaining_recv_len);
> +	ustorm_ag_context->exp_data_acked = cpu_to_le32(expected_data_transfer_len);
> +	ustorm_st_context->exp_data_transfer_len = cpu_to_le32(expected_data_transfer_len);
> +	SET_FIELD(ustorm_st_context->reg1.reg1_map, NVMETCP_REG1_NUM_SGES, num_sges);
> +	SET_FIELD(ustorm_ag_context->flags2, E5_USTORM_NVMETCP_TASK_AG_CTX_DIF_ERROR_CF_EN,
> +		  tx_dif_conn_err_en ? 1 : 0);
> +}
> +
> +/* The following function initializes Local Completion Contexts: */
> +static inline void
> +set_local_completion_context(struct e5_nvmetcp_task_context *context)
> +{
> +	SET_FIELD(context->ystorm_st_context.state.flags,
> +		  YSTORM_NVMETCP_TASK_STATE_LOCAL_COMP, 1);
> +	SET_FIELD(context->ustorm_st_context.flags,
> +		  USTORM_NVMETCP_TASK_ST_CTX_LOCAL_COMP, 1);
> +}
> +
> +/* Common Fastpath task init function: */
> +static inline void
> +init_rw_nvmetcp_task(struct nvmetcp_task_params *task_params,
> +		     enum nvmetcp_task_type task_type,
> +		     struct nvmetcp_conn_params *conn_params, void *pdu_header,
> +		     struct storage_sgl_task_params *sgl_task_params)
> +{
> +	struct e5_nvmetcp_task_context *context = task_params->context;
> +	u32 task_size = calc_rw_task_size(task_params, task_type);
> +	u32 exp_data_transfer_len = conn_params->max_burst_length;
> +	bool slow_io = false;
> +	u8 num_sges = 0;
> +
> +	init_default_nvmetcp_task(task_params, pdu_header, task_type);
> +
> +	/* Tx/Rx: */
> +	if (task_params->tx_io_size) {
> +		/* if data to transmit: */
> +		init_scsi_sgl_context(&context->ystorm_st_context.state.sgl_params,
> +				      &context->ystorm_st_context.state.data_desc,
> +				      sgl_task_params);
> +		slow_io = nvmetcp_is_slow_sgl(sgl_task_params->num_sges,
> +					      sgl_task_params->small_mid_sge);
> +		num_sges =
> +			(u8)(!slow_io ? min((u32)sgl_task_params->num_sges,
> +					    (u32)SCSI_NUM_SGES_SLOW_SGL_THR) :
> +					    NVMETCP_WQE_NUM_SGES_SLOWIO);
> +		if (slow_io) {
> +			SET_FIELD(context->ystorm_st_context.state.flags,
> +				  YSTORM_NVMETCP_TASK_STATE_SLOW_IO, 1);
> +		}
> +	} else if (task_params->rx_io_size) {
> +		/* if data to receive: */
> +		init_scsi_sgl_context(&context->mstorm_st_context.sgl_params,
> +				      &context->mstorm_st_context.data_desc,
> +				      sgl_task_params);
> +		num_sges =
> +			(u8)(!nvmetcp_is_slow_sgl(sgl_task_params->num_sges,
> +						  sgl_task_params->small_mid_sge) ?
> +						  min((u32)sgl_task_params->num_sges,
> +						      (u32)SCSI_NUM_SGES_SLOW_SGL_THR) :
> +						      NVMETCP_WQE_NUM_SGES_SLOWIO);
> +		context->mstorm_st_context.rem_task_size = cpu_to_le32(task_size);
> +	}
> +
> +	/* Ustorm context: */
> +	if (exp_data_transfer_len > task_size)
> +		/* The size of the transmitted task*/
> +		exp_data_transfer_len = task_size;
> +	init_ustorm_task_contexts(&context->ustorm_st_context,
> +				  &context->ustorm_ag_context,
> +				  /* Remaining Receive length is the Task Size */
> +				  task_size,
> +				  /* The size of the transmitted task */
> +				  exp_data_transfer_len,
> +				  /* num_sges */
> +				  num_sges,
> +				  false);
> +
> +	/* Set exp_data_acked */
> +	if (task_type == NVMETCP_TASK_TYPE_HOST_WRITE) {
> +		if (task_params->send_write_incapsule)
> +			context->ustorm_ag_context.exp_data_acked = task_size;
> +		else
> +			context->ustorm_ag_context.exp_data_acked = 0;
> +	} else if (task_type == NVMETCP_TASK_TYPE_HOST_READ) {
> +		context->ustorm_ag_context.exp_data_acked = 0;
> +	}
> +
> +	context->ustorm_ag_context.exp_cont_len = 0;
> +
> +	init_sqe(task_params, sgl_task_params, task_type);
> +}
> +
> +static void
> +init_common_initiator_read_task(struct nvmetcp_task_params *task_params,
> +				struct nvmetcp_conn_params *conn_params,
> +				struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,
> +				struct storage_sgl_task_params *sgl_task_params)
> +{
> +	init_rw_nvmetcp_task(task_params, NVMETCP_TASK_TYPE_HOST_READ,
> +			     conn_params, cmd_pdu_header, sgl_task_params);
> +}
> +
> +void init_nvmetcp_host_read_task(struct nvmetcp_task_params *task_params,
> +				 struct nvmetcp_conn_params *conn_params,
> +				 struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,
> +				 struct storage_sgl_task_params *sgl_task_params)
> +{
> +	init_common_initiator_read_task(task_params, conn_params,
> +					(void *)cmd_pdu_header, sgl_task_params);
> +}
> +
> +static void
> +init_common_initiator_write_task(struct nvmetcp_task_params *task_params,
> +				 struct nvmetcp_conn_params *conn_params,
> +				 struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,
> +				 struct storage_sgl_task_params *sgl_task_params)
> +{
> +	init_rw_nvmetcp_task(task_params, NVMETCP_TASK_TYPE_HOST_WRITE,
> +			     conn_params, cmd_pdu_header, sgl_task_params);
> +}
> +
> +void init_nvmetcp_host_write_task(struct nvmetcp_task_params *task_params,
> +				  struct nvmetcp_conn_params *conn_params,
> +				  struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,
> +				  struct storage_sgl_task_params *sgl_task_params)
> +{
> +	init_common_initiator_write_task(task_params, conn_params,
> +					 (void *)cmd_pdu_header,
> +					 sgl_task_params);
> +}
> +
> +static void
> +init_common_login_request_task(struct nvmetcp_task_params *task_params,
> +			       void *login_req_pdu_header,
> +			       struct storage_sgl_task_params *tx_sgl_task_params,
> +			       struct storage_sgl_task_params *rx_sgl_task_params)
> +{
> +	struct e5_nvmetcp_task_context *context = task_params->context;
> +
> +	init_default_nvmetcp_task(task_params, (void *)login_req_pdu_header,
> +				  NVMETCP_TASK_TYPE_INIT_CONN_REQUEST);
> +
> +	/* Ustorm Context: */
> +	init_ustorm_task_contexts(&context->ustorm_st_context,
> +				  &context->ustorm_ag_context,
> +
> +				  /* Remaining Receive length is the Task Size */
> +				  task_params->rx_io_size ?
> +				  rx_sgl_task_params->total_buffer_size : 0,
> +
> +				  /* The size of the transmitted task */
> +				  task_params->tx_io_size ?
> +				  tx_sgl_task_params->total_buffer_size : 0,
> +				  0, /* num_sges */
> +				  0); /* tx_dif_conn_err_en */
> +
> +	/* SGL context: */
> +	if (task_params->tx_io_size)
> +		init_scsi_sgl_context(&context->ystorm_st_context.state.sgl_params,
> +				      &context->ystorm_st_context.state.data_desc,
> +				      tx_sgl_task_params);
> +	if (task_params->rx_io_size)
> +		init_scsi_sgl_context(&context->mstorm_st_context.sgl_params,
> +				      &context->mstorm_st_context.data_desc,
> +				      rx_sgl_task_params);
> +
> +	context->mstorm_st_context.rem_task_size =
> +		cpu_to_le32(task_params->rx_io_size ?
> +				 rx_sgl_task_params->total_buffer_size : 0);
> +
> +	init_sqe(task_params, tx_sgl_task_params, NVMETCP_TASK_TYPE_INIT_CONN_REQUEST);
> +}
> +
> +/* The following function initializes Login task in Host mode: */
> +void init_nvmetcp_init_conn_req_task(struct nvmetcp_task_params *task_params,
> +				     struct nvmetcp_init_conn_req_hdr *init_conn_req_pdu_hdr,
> +				     struct storage_sgl_task_params *tx_sgl_task_params,
> +				     struct storage_sgl_task_params *rx_sgl_task_params)
> +{
> +	init_common_login_request_task(task_params, init_conn_req_pdu_hdr,
> +				       tx_sgl_task_params, rx_sgl_task_params);
> +}
> +
> +void init_cleanup_task_nvmetcp(struct nvmetcp_task_params *task_params)
> +{
> +	init_sqe(task_params, NULL, NVMETCP_TASK_TYPE_CLEANUP);
> +}
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.h b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.h
> new file mode 100644
> index 000000000000..3a8c74356c4c
> --- /dev/null
> +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.h
> @@ -0,0 +1,43 @@
> +/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */
> +/* Copyright 2021 Marvell. All rights reserved. */
> +
> +#ifndef _QED_NVMETCP_FW_FUNCS_H
> +#define _QED_NVMETCP_FW_FUNCS_H
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +#include <linux/kernel.h>
> +#include <linux/list.h>
> +#include <linux/mm.h>
> +#include <linux/types.h>
> +#include <asm/byteorder.h>
> +#include <linux/qed/common_hsi.h>
> +#include <linux/qed/storage_common.h>
> +#include <linux/qed/nvmetcp_common.h>
> +#include <linux/qed/qed_nvmetcp_if.h>
> +
> +#if IS_ENABLED(CONFIG_QED_NVMETCP)
> +
> +void init_nvmetcp_host_read_task(struct nvmetcp_task_params *task_params,
> +				 struct nvmetcp_conn_params *conn_params,
> +				 struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,
> +				 struct storage_sgl_task_params *sgl_task_params);
> +
> +void init_nvmetcp_host_write_task(struct nvmetcp_task_params *task_params,
> +				  struct nvmetcp_conn_params *conn_params,
> +				  struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,
> +				  struct storage_sgl_task_params *sgl_task_params);
> +
> +void init_nvmetcp_init_conn_req_task(struct nvmetcp_task_params *task_params,
> +				     struct nvmetcp_init_conn_req_hdr *init_conn_req_pdu_hdr,
> +				     struct storage_sgl_task_params *tx_sgl_task_params,
> +				     struct storage_sgl_task_params *rx_sgl_task_params);
> +
> +void init_cleanup_task_nvmetcp(struct nvmetcp_task_params *task_params);
> +
> +#else /* IS_ENABLED(CONFIG_QED_NVMETCP) */
> +
> +#endif /* IS_ENABLED(CONFIG_QED_NVMETCP) */
> +
> +#endif /* _QED_NVMETCP_FW_FUNCS_H */
> diff --git a/include/linux/qed/nvmetcp_common.h b/include/linux/qed/nvmetcp_common.h
> index dda7a785c321..c0023bb185dd 100644
> --- a/include/linux/qed/nvmetcp_common.h
> +++ b/include/linux/qed/nvmetcp_common.h
> @@ -9,6 +9,9 @@
>   #define NVMETCP_SLOW_PATH_LAYER_CODE (6)
>   #define NVMETCP_WQE_NUM_SGES_SLOWIO (0xf)
>   
> +#define QED_NVMETCP_CMD_HDR_SIZE 72
> +#define QED_NVMETCP_CMN_HDR_SIZE 24
> +
>   /* NVMeTCP firmware function init parameters */
>   struct nvmetcp_spe_func_init {
>   	__le16 half_way_close_timeout;
> diff --git a/include/linux/qed/qed_nvmetcp_if.h b/include/linux/qed/qed_nvmetcp_if.h
> index 04e90dc42c12..d971be84f804 100644
> --- a/include/linux/qed/qed_nvmetcp_if.h
> +++ b/include/linux/qed/qed_nvmetcp_if.h
> @@ -220,6 +220,23 @@ struct qed_nvmetcp_ops {
>   	void (*remove_dst_tcp_port_filter)(struct qed_dev *cdev, u16 dest_port);
>   
>   	void (*clear_all_filters)(struct qed_dev *cdev);
> +
> +	void (*init_read_io)(struct nvmetcp_task_params *task_params,
> +			     struct nvmetcp_conn_params *conn_params,
> +			     struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,
> +			     struct storage_sgl_task_params *sgl_task_params);
> +
> +	void (*init_write_io)(struct nvmetcp_task_params *task_params,
> +			      struct nvmetcp_conn_params *conn_params,
> +			      struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,
> +			      struct storage_sgl_task_params *sgl_task_params);
> +
> +	void (*init_icreq_exchange)(struct nvmetcp_task_params *task_params,
> +				    struct nvmetcp_init_conn_req_hdr *init_conn_req_pdu_hdr,
> +				    struct storage_sgl_task_params *tx_sgl_task_params,
> +				    struct storage_sgl_task_params *rx_sgl_task_params);
> +
> +	void (*init_task_cleanup)(struct nvmetcp_task_params *task_params);
>   };
>   
>   const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void);
> 
Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [RFC PATCH v4 07/27] qed: Add IP services APIs support
  2021-04-29 19:09 ` [RFC PATCH v4 07/27] qed: Add IP services APIs support Shai Malin
@ 2021-05-02 11:26   ` Hannes Reinecke
  2021-05-03 15:44     ` Shai Malin
  0 siblings, 1 reply; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-02 11:26 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024, Nikolay Assa

On 4/29/21 9:09 PM, Shai Malin wrote:
> From: Nikolay Assa <nassa@marvell.com>
> 
> This patch introduces APIs which the NVMeTCP Offload device (qedn)
> will use through the paired net-device (qede).
> It includes APIs for:
> - ipv4/ipv6 routing
> - get VLAN from net-device
> - TCP ports reservation
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Nikolay Assa <nassa@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> ---
>   .../qlogic/qed/qed_nvmetcp_ip_services.c      | 239 ++++++++++++++++++
>   .../linux/qed/qed_nvmetcp_ip_services_if.h    |  29 +++
>   2 files changed, 268 insertions(+)
>   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c
>   create mode 100644 include/linux/qed/qed_nvmetcp_ip_services_if.h
> 
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c
> new file mode 100644
> index 000000000000..2904b1a0830a
> --- /dev/null
> +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c
> @@ -0,0 +1,239 @@
> +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause)
> +/*
> + * Copyright 2021 Marvell. All rights reserved.
> + */
> +
> +#include <linux/types.h>
> +#include <asm/byteorder.h>
> +#include <asm/param.h>
> +#include <linux/delay.h>
> +#include <linux/pci.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/etherdevice.h>
> +#include <linux/kernel.h>
> +#include <linux/stddef.h>
> +#include <linux/errno.h>
> +
> +#include <net/tcp.h>
> +
> +#include <linux/qed/qed_nvmetcp_ip_services_if.h>
> +
> +#define QED_IP_RESOL_TIMEOUT  4
> +
> +int qed_route_ipv4(struct sockaddr_storage *local_addr,
> +		   struct sockaddr_storage *remote_addr,
> +		   struct sockaddr *hardware_address,
> +		   struct net_device **ndev)
> +{
> +	struct neighbour *neigh = NULL;
> +	__be32 *loc_ip, *rem_ip;
> +	struct rtable *rt;
> +	int rc = -ENXIO;
> +	int retry;
> +
> +	loc_ip = &((struct sockaddr_in *)local_addr)->sin_addr.s_addr;
> +	rem_ip = &((struct sockaddr_in *)remote_addr)->sin_addr.s_addr;
> +	*ndev = NULL;
> +	rt = ip_route_output(&init_net, *rem_ip, *loc_ip, 0/*tos*/, 0/*oif*/);
> +	if (IS_ERR(rt)) {
> +		pr_err("lookup route failed\n");
> +		rc = PTR_ERR(rt);
> +		goto return_err;
> +	}
> +
> +	neigh = dst_neigh_lookup(&rt->dst, rem_ip);
> +	if (!neigh) {
> +		rc = -ENOMEM;
> +		ip_rt_put(rt);
> +		goto return_err;
> +	}
> +
> +	*ndev = rt->dst.dev;
> +	ip_rt_put(rt);
> +
> +	/* If not resolved, kick-off state machine towards resolution */
> +	if (!(neigh->nud_state & NUD_VALID))
> +		neigh_event_send(neigh, NULL);
> +
> +	/* query neighbor until resolved or timeout */
> +	retry = QED_IP_RESOL_TIMEOUT;
> +	while (!(neigh->nud_state & NUD_VALID) && retry > 0) {
> +		msleep(1000);
> +		retry--;
> +	}
> +
> +	if (neigh->nud_state & NUD_VALID) {
> +		/* copy resolved MAC address */
> +		neigh_ha_snapshot(hardware_address->sa_data, neigh, *ndev);
> +
> +		hardware_address->sa_family = (*ndev)->type;
> +		rc = 0;
> +	}
> +
> +	neigh_release(neigh);
> +	if (!(*loc_ip)) {
> +		*loc_ip = inet_select_addr(*ndev, *rem_ip, RT_SCOPE_UNIVERSE);
> +		local_addr->ss_family = AF_INET;
> +	}
> +
> +return_err:
> +
> +	return rc;
> +}
> +EXPORT_SYMBOL(qed_route_ipv4);
> +
> +int qed_route_ipv6(struct sockaddr_storage *local_addr,
> +		   struct sockaddr_storage *remote_addr,
> +		   struct sockaddr *hardware_address,
> +		   struct net_device **ndev)
> +{
> +	struct neighbour *neigh = NULL;
> +	struct dst_entry *dst;
> +	struct flowi6 fl6;
> +	int rc = -ENXIO;
> +	int retry;
> +
> +	memset(&fl6, 0, sizeof(fl6));
> +	fl6.saddr = ((struct sockaddr_in6 *)local_addr)->sin6_addr;
> +	fl6.daddr = ((struct sockaddr_in6 *)remote_addr)->sin6_addr;
> +
> +	dst = ip6_route_output(&init_net, NULL, &fl6);
> +	if (!dst || dst->error) {
> +		if (dst) {
> +			pr_err("lookup route failed %d\n", dst->error);
> +			dst_release(dst);
> +		}
> +
> +		goto out;
> +	}
> +
> +	neigh = dst_neigh_lookup(dst, &fl6.daddr);
> +	if (neigh) {
> +		*ndev = ip6_dst_idev(dst)->dev;
> +
> +		/* If not resolved, kick-off state machine towards resolution */
> +		if (!(neigh->nud_state & NUD_VALID))
> +			neigh_event_send(neigh, NULL);
> +
> +		/* query neighbor until resolved or timeout */
> +		retry = QED_IP_RESOL_TIMEOUT;
> +		while (!(neigh->nud_state & NUD_VALID) && retry > 0) {
> +			msleep(1000);
> +			retry--;
> +		}
> +
> +		if (neigh->nud_state & NUD_VALID) {
> +			neigh_ha_snapshot((u8 *)hardware_address->sa_data, neigh, *ndev);
> +
> +			hardware_address->sa_family = (*ndev)->type;
> +			rc = 0;
> +		}
> +
> +		neigh_release(neigh);
> +
> +		if (ipv6_addr_any(&fl6.saddr)) {
> +			if (ipv6_dev_get_saddr(dev_net(*ndev), *ndev,
> +					       &fl6.daddr, 0, &fl6.saddr)) {
> +				pr_err("Unable to find source IP address\n");
> +				goto out;
> +			}
> +
> +			local_addr->ss_family = AF_INET6;
> +			((struct sockaddr_in6 *)local_addr)->sin6_addr =
> +								fl6.saddr;
> +		}
> +	}
> +
> +	dst_release(dst);
> +
> +out:
> +
> +	return rc;
> +}
> +EXPORT_SYMBOL(qed_route_ipv6);
> +
> +void qed_vlan_get_ndev(struct net_device **ndev, u16 *vlan_id)
> +{
> +	if (is_vlan_dev(*ndev)) {
> +		*vlan_id = vlan_dev_vlan_id(*ndev);
> +		*ndev = vlan_dev_real_dev(*ndev);
> +	}
> +}
> +EXPORT_SYMBOL(qed_vlan_get_ndev);
> +
> +struct pci_dev *qed_validate_ndev(struct net_device *ndev)
> +{
> +	struct pci_dev *pdev = NULL;
> +	struct net_device *upper;
> +
> +	for_each_pci_dev(pdev) {
> +		if (pdev && pdev->driver &&
> +		    !strcmp(pdev->driver->name, "qede")) {
> +			upper = pci_get_drvdata(pdev);
> +			if (upper->ifindex == ndev->ifindex)
> +				return pdev;
> +		}
> +	}
> +
> +	return NULL;
> +}
> +EXPORT_SYMBOL(qed_validate_ndev);
> +
> +__be16 qed_get_in_port(struct sockaddr_storage *sa)
> +{
> +	return sa->ss_family == AF_INET
> +		? ((struct sockaddr_in *)sa)->sin_port
> +		: ((struct sockaddr_in6 *)sa)->sin6_port;
> +}
> +EXPORT_SYMBOL(qed_get_in_port);
> +
> +int qed_fetch_tcp_port(struct sockaddr_storage local_ip_addr,
> +		       struct socket **sock, u16 *port)
> +{
> +	struct sockaddr_storage sa;
> +	int rc = 0;
> +
> +	rc = sock_create(local_ip_addr.ss_family, SOCK_STREAM, IPPROTO_TCP, sock);
> +	if (rc) {
> +		pr_warn("failed to create socket: %d\n", rc);
> +		goto err;
> +	}
> +
> +	(*sock)->sk->sk_allocation = GFP_KERNEL;
> +	sk_set_memalloc((*sock)->sk);
> +
> +	rc = kernel_bind(*sock, (struct sockaddr *)&local_ip_addr,
> +			 sizeof(local_ip_addr));
> +
> +	if (rc) {
> +		pr_warn("failed to bind socket: %d\n", rc);
> +		goto err_sock;
> +	}
> +
> +	rc = kernel_getsockname(*sock, (struct sockaddr *)&sa);
> +	if (rc < 0) {
> +		pr_warn("getsockname() failed: %d\n", rc);
> +		goto err_sock;
> +	}
> +
> +	*port = ntohs(qed_get_in_port(&sa));
> +
> +	return 0;
> +
> +err_sock:
> +	sock_release(*sock);
> +	*sock = NULL;
> +err:
> +
> +	return rc;
> +}
> +EXPORT_SYMBOL(qed_fetch_tcp_port);
> +
> +void qed_return_tcp_port(struct socket *sock)
> +{
> +	if (sock && sock->sk) {
> +		tcp_set_state(sock->sk, TCP_CLOSE);
> +		sock_release(sock);
> +	}
> +}
> +EXPORT_SYMBOL(qed_return_tcp_port);
> diff --git a/include/linux/qed/qed_nvmetcp_ip_services_if.h b/include/linux/qed/qed_nvmetcp_ip_services_if.h
> new file mode 100644
> index 000000000000..3604aee53796
> --- /dev/null
> +++ b/include/linux/qed/qed_nvmetcp_ip_services_if.h
> @@ -0,0 +1,29 @@
> +/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */
> +/*
> + * Copyright 2021 Marvell. All rights reserved.
> + */
> +
> +#ifndef _QED_IP_SERVICES_IF_H
> +#define _QED_IP_SERVICES_IF_H
> +
> +#include <linux/types.h>
> +#include <net/route.h>
> +#include <net/ip6_route.h>
> +#include <linux/inetdevice.h>
> +
> +int qed_route_ipv4(struct sockaddr_storage *local_addr,
> +		   struct sockaddr_storage *remote_addr,
> +		   struct sockaddr *hardware_address,
> +		   struct net_device **ndev);
> +int qed_route_ipv6(struct sockaddr_storage *local_addr,
> +		   struct sockaddr_storage *remote_addr,
> +		   struct sockaddr *hardware_address,
> +		   struct net_device **ndev);
> +void qed_vlan_get_ndev(struct net_device **ndev, u16 *vlan_id);
> +struct pci_dev *qed_validate_ndev(struct net_device *ndev);
> +void qed_return_tcp_port(struct socket *sock);
> +int qed_fetch_tcp_port(struct sockaddr_storage local_ip_addr,
> +		       struct socket **sock, u16 *port);
> +__be16 qed_get_in_port(struct sockaddr_storage *sa);
> +
> +#endif /* _QED_IP_SERVICES_IF_H */
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
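
As an illustration of the port helper in the patch above, the following is a minimal user-space sketch of what qed_get_in_port() does: it picks the port field that matches the address family and returns it in network byte order, so the caller still has to apply ntohs(), as qed_fetch_tcp_port() does. The sketch_* names are hypothetical and exist only for this example; the kernel code works on in-kernel sockets rather than on these user-space structures.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* User-space analogue of qed_get_in_port(): return the port, still in
 * network byte order, from an AF_INET or AF_INET6 address.
 */
static in_port_t sketch_get_in_port(const struct sockaddr_storage *sa)
{
	assert(sa->ss_family == AF_INET || sa->ss_family == AF_INET6);

	return sa->ss_family == AF_INET
		? ((const struct sockaddr_in *)sa)->sin_port
		: ((const struct sockaddr_in6 *)sa)->sin6_port;
}

/* Hypothetical helper: build an address of the given family whose port is
 * supplied in host byte order.
 */
static struct sockaddr_storage sketch_make_addr(int family, uint16_t host_port)
{
	struct sockaddr_storage ss;

	memset(&ss, 0, sizeof(ss));
	ss.ss_family = (sa_family_t)family;
	if (family == AF_INET)
		((struct sockaddr_in *)&ss)->sin_port = htons(host_port);
	else
		((struct sockaddr_in6 *)&ss)->sin6_port = htons(host_port);

	return ss;
}
```

Calling sketch_get_in_port() on an address built with sketch_make_addr(AF_INET, 4420) and passing the result through ntohs() gives back 4420; skipping the ntohs() step would give a byte-swapped value on little-endian hosts.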

* Re: [RFC PATCH v4 16/27] qedn: Add qedn - Marvell's NVMeTCP HW offload vendor driver
  2021-04-29 19:09 ` [RFC PATCH v4 16/27] qedn: Add qedn - Marvell's NVMeTCP HW offload vendor driver Shai Malin
@ 2021-05-02 11:27   ` Hannes Reinecke
  2021-05-04 16:52     ` Shai Malin
  0 siblings, 1 reply; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-02 11:27 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024,
	Arie Gershberg

On 4/29/21 9:09 PM, Shai Malin wrote:
> This patch presents the skeleton of the qedn driver.
> The new driver is added under "drivers/nvme/hw/qedn" and is enabled by
> the Kconfig option "Marvell NVM Express over Fabrics TCP offload".
> 
> The internal implementation:
> - qedn.h:
>    Includes all common structs to be used by the qedn vendor driver.
> 
> - qedn_main.c:
>    Includes the qedn_init and qedn_cleanup implementation.
>    As part of the qedn init, the driver registers as a PCI device and
>    works with the Marvell FastLinQ NICs.
>    As part of the probe, the driver registers with the nvme_tcp_offload
>    ULP.
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Arie Gershberg <agershberg@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> ---
>   MAINTAINERS                      |  10 ++
>   drivers/nvme/Kconfig             |   1 +
>   drivers/nvme/Makefile            |   1 +
>   drivers/nvme/hw/Kconfig          |   8 ++
>   drivers/nvme/hw/Makefile         |   3 +
>   drivers/nvme/hw/qedn/Makefile    |   5 +
>   drivers/nvme/hw/qedn/qedn.h      |  19 +++
>   drivers/nvme/hw/qedn/qedn_main.c | 201 +++++++++++++++++++++++++++++++
>   8 files changed, 248 insertions(+)
>   create mode 100644 drivers/nvme/hw/Kconfig
>   create mode 100644 drivers/nvme/hw/Makefile
>   create mode 100644 drivers/nvme/hw/qedn/Makefile
>   create mode 100644 drivers/nvme/hw/qedn/qedn.h
>   create mode 100644 drivers/nvme/hw/qedn/qedn_main.c
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

* Re: [RFC PATCH v4 17/27] qedn: Add qedn probe
  2021-04-29 19:09 ` [RFC PATCH v4 17/27] qedn: Add qedn probe Shai Malin
@ 2021-05-02 11:28   ` Hannes Reinecke
  0 siblings, 0 replies; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-02 11:28 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024,
	Dean Balandin

On 4/29/21 9:09 PM, Shai Malin wrote:
> This patch introduces the functionality of loading and unloading the
> physical function (PF).
> qedn_probe() loads the offload device PF and initializes the HW and
> the FW with the PF parameters using the HW ops->qed_nvmetcp_ops, which
> are similar to the other "qed_*_ops" used by the qede, qedr, qedf and
> qedi device drivers.
> qedn_remove() unloads the offload device PF and re-initializes the HW
> and the FW with the PF parameters.
> 
> The struct qedn_ctx is a per-PF container for PF-specific attributes
> and resources.
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Dean Balandin <dbalandin@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> ---
>   drivers/nvme/hw/Kconfig          |   1 +
>   drivers/nvme/hw/qedn/qedn.h      |  49 ++++++++
>   drivers/nvme/hw/qedn/qedn_main.c | 191 ++++++++++++++++++++++++++++++-
>   3 files changed, 236 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/nvme/hw/Kconfig b/drivers/nvme/hw/Kconfig
> index 374f1f9dbd3d..91b1bd6f07d8 100644
> --- a/drivers/nvme/hw/Kconfig
> +++ b/drivers/nvme/hw/Kconfig
> @@ -2,6 +2,7 @@
>   config NVME_QEDN
>   	tristate "Marvell NVM Express over Fabrics TCP offload"
>   	depends on NVME_TCP_OFFLOAD
> +	select QED_NVMETCP
>   	help
>   	  This enables the Marvell NVMe TCP offload support (qedn).
>   
> diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
> index bcd0748a10fd..c1ac17eabcb7 100644
> --- a/drivers/nvme/hw/qedn/qedn.h
> +++ b/drivers/nvme/hw/qedn/qedn.h
> @@ -6,14 +6,63 @@
>   #ifndef _QEDN_H_
>   #define _QEDN_H_
>   
> +#include <linux/qed/qed_if.h>
> +#include <linux/qed/qed_nvmetcp_if.h>
> +
>   /* Driver includes */
>   #include "../../host/tcp-offload.h"
>   
> +#define QEDN_MAJOR_VERSION		8
> +#define QEDN_MINOR_VERSION		62
> +#define QEDN_REVISION_VERSION		10
> +#define QEDN_ENGINEERING_VERSION	0
> +#define DRV_MODULE_VERSION __stringify(QEDN_MAJOR_VERSION) "."	\
> +		__stringify(QEDN_MINOR_VERSION) "."		\
> +		__stringify(QEDN_REVISION_VERSION) "."		\
> +		__stringify(QEDN_ENGINEERING_VERSION)
> +
>   #define QEDN_MODULE_NAME "qedn"
>   
> +#define QEDN_MAX_TASKS_PER_PF (16 * 1024)
> +#define QEDN_MAX_CONNS_PER_PF (4 * 1024)
> +#define QEDN_FW_CQ_SIZE (4 * 1024)
> +#define QEDN_PROTO_CQ_PROD_IDX	0
> +#define QEDN_NVMETCP_NUM_FW_CONN_QUEUE_PAGES 2
> +
> +enum qedn_state {
> +	QEDN_STATE_CORE_PROBED = 0,
> +	QEDN_STATE_CORE_OPEN,
> +	QEDN_STATE_GL_PF_LIST_ADDED,
> +	QEDN_STATE_MFW_STATE,
> +	QEDN_STATE_REGISTERED_OFFLOAD_DEV,
> +	QEDN_STATE_MODULE_REMOVE_ONGOING,
> +};
> +
>   struct qedn_ctx {
>   	struct pci_dev *pdev;
> +	struct qed_dev *cdev;
> +	struct qed_dev_nvmetcp_info dev_info;
>   	struct nvme_tcp_ofld_dev qedn_ofld_dev;
> +	struct qed_pf_params pf_params;
> +
> +	/* Global PF list entry */
> +	struct list_head gl_pf_entry;
> +
> +	/* Accessed with atomic bit ops, used with enum qedn_state */
> +	unsigned long state;
> +
> +	/* Fast path queues */
> +	u8 num_fw_cqs;
> +};
> +
> +struct qedn_global {
> +	struct list_head qedn_pf_list;
> +
> +	/* Host mode */
> +	struct list_head ctrl_list;
> +
> +	/* Mutex for accessing the global struct */
> +	struct mutex glb_mutex;
>   };
>   
>   #endif /* _QEDN_H_ */
> diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
> index 31d6d86d6eb7..e3e8e3676b79 100644
> --- a/drivers/nvme/hw/qedn/qedn_main.c
> +++ b/drivers/nvme/hw/qedn/qedn_main.c
> @@ -14,6 +14,10 @@
>   
>   #define CHIP_NUM_AHP_NVMETCP 0x8194
>   
> +const struct qed_nvmetcp_ops *qed_ops;
> +
> +/* Global context instance */
> +struct qedn_global qedn_glb;
>   static struct pci_device_id qedn_pci_tbl[] = {
>   	{ PCI_VDEVICE(QLOGIC, CHIP_NUM_AHP_NVMETCP), 0 },
>   	{0, 0},
> @@ -99,12 +103,132 @@ static struct nvme_tcp_ofld_ops qedn_ofld_ops = {
>   	.commit_rqs = qedn_commit_rqs,
>   };
>   
> +static inline void qedn_init_pf_struct(struct qedn_ctx *qedn)
> +{
> +	/* Placeholder - Initialize qedn fields */
> +}
> +
> +static inline void
> +qedn_init_core_probe_params(struct qed_probe_params *probe_params)
> +{
> +	memset(probe_params, 0, sizeof(*probe_params));
> +	probe_params->protocol = QED_PROTOCOL_NVMETCP;
> +	probe_params->is_vf = false;
> +	probe_params->recov_in_prog = 0;
> +}
> +
> +static inline int qedn_core_probe(struct qedn_ctx *qedn)
> +{
> +	struct qed_probe_params probe_params;
> +	int rc = 0;
> +
> +	qedn_init_core_probe_params(&probe_params);
> +	pr_info("Starting QED probe\n");
> +	qedn->cdev = qed_ops->common->probe(qedn->pdev, &probe_params);
> +	if (!qedn->cdev) {
> +		rc = -ENODEV;
> +		pr_err("QED probe failed\n");
> +	}
> +
> +	return rc;
> +}
> +
> +static void qedn_add_pf_to_gl_list(struct qedn_ctx *qedn)
> +{
> +	mutex_lock(&qedn_glb.glb_mutex);
> +	list_add_tail(&qedn->gl_pf_entry, &qedn_glb.qedn_pf_list);
> +	mutex_unlock(&qedn_glb.glb_mutex);
> +}
> +
> +static void qedn_remove_pf_from_gl_list(struct qedn_ctx *qedn)
> +{
> +	mutex_lock(&qedn_glb.glb_mutex);
> +	list_del_init(&qedn->gl_pf_entry);
> +	mutex_unlock(&qedn_glb.glb_mutex);
> +}
> +
> +static int qedn_set_nvmetcp_pf_param(struct qedn_ctx *qedn)
> +{
> +	u32 fw_conn_queue_pages = QEDN_NVMETCP_NUM_FW_CONN_QUEUE_PAGES;
> +	struct qed_nvmetcp_pf_params *pf_params;
> +
> +	pf_params = &qedn->pf_params.nvmetcp_pf_params;
> +	memset(pf_params, 0, sizeof(*pf_params));
> +	qedn->num_fw_cqs = min_t(u8, qedn->dev_info.num_cqs, num_online_cpus());
> +
> +	pf_params->num_cons = QEDN_MAX_CONNS_PER_PF;
> +	pf_params->num_tasks = QEDN_MAX_TASKS_PER_PF;
> +
> +	/* Placeholder - Initialize function level queues */
> +
> +	/* Placeholder - Initialize TCP params */
> +
> +	/* Queues */
> +	pf_params->num_sq_pages_in_ring = fw_conn_queue_pages;
> +	pf_params->num_r2tq_pages_in_ring = fw_conn_queue_pages;
> +	pf_params->num_uhq_pages_in_ring = fw_conn_queue_pages;
> +	pf_params->num_queues = qedn->num_fw_cqs;
> +	pf_params->cq_num_entries = QEDN_FW_CQ_SIZE;
> +
> +	/* the CQ SB pi */
> +	pf_params->gl_rq_pi = QEDN_PROTO_CQ_PROD_IDX;
> +
> +	return 0;
> +}
> +
> +static inline int qedn_slowpath_start(struct qedn_ctx *qedn)
> +{
> +	struct qed_slowpath_params sp_params = {};
> +	int rc = 0;
> +
> +	/* Start the Slowpath-process */
> +	sp_params.int_mode = QED_INT_MODE_MSIX;
> +	sp_params.drv_major = QEDN_MAJOR_VERSION;
> +	sp_params.drv_minor = QEDN_MINOR_VERSION;
> +	sp_params.drv_rev = QEDN_REVISION_VERSION;
> +	sp_params.drv_eng = QEDN_ENGINEERING_VERSION;
> +	strscpy(sp_params.name, "qedn NVMeTCP", QED_DRV_VER_STR_SIZE);
> +	rc = qed_ops->common->slowpath_start(qedn->cdev, &sp_params);
> +	if (rc)
> +		pr_err("Cannot start slowpath\n");
> +
> +	return rc;
> +}
> +
>   static void __qedn_remove(struct pci_dev *pdev)
>   {
>   	struct qedn_ctx *qedn = pci_get_drvdata(pdev);
> +	int rc;
> +
> +	pr_notice("qedn remove started: abs PF id=%u\n",
> +		  qedn->dev_info.common.abs_pf_id);
> +
> +	if (test_and_set_bit(QEDN_STATE_MODULE_REMOVE_ONGOING, &qedn->state)) {
> +		pr_err("Remove already ongoing\n");
> +
> +		return;
> +	}
> +
> +	if (test_and_clear_bit(QEDN_STATE_REGISTERED_OFFLOAD_DEV, &qedn->state))
> +		nvme_tcp_ofld_unregister_dev(&qedn->qedn_ofld_dev);
> +
> +	if (test_and_clear_bit(QEDN_STATE_GL_PF_LIST_ADDED, &qedn->state))
> +		qedn_remove_pf_from_gl_list(qedn);
> +	else
> +		pr_err("Failed to remove from global PF list\n");
> +
> +	if (test_and_clear_bit(QEDN_STATE_MFW_STATE, &qedn->state)) {
> +		rc = qed_ops->common->update_drv_state(qedn->cdev, false);
> +		if (rc)
> +			pr_err("Failed to send drv state to MFW\n");
> +	}
> +
> +	if (test_and_clear_bit(QEDN_STATE_CORE_OPEN, &qedn->state))
> +		qed_ops->common->slowpath_stop(qedn->cdev);
> +
> +	if (test_and_clear_bit(QEDN_STATE_CORE_PROBED, &qedn->state))
> +		qed_ops->common->remove(qedn->cdev);
>   
> -	pr_notice("Starting qedn_remove\n");
> -	nvme_tcp_ofld_unregister_dev(&qedn->qedn_ofld_dev);
>   	kfree(qedn);
>   	pr_notice("Ending qedn_remove successfully\n");
>   }
> @@ -144,15 +268,55 @@ static int __qedn_probe(struct pci_dev *pdev)
>   	if (!qedn)
>   		return -ENODEV;
>   
> +	qedn_init_pf_struct(qedn);
> +
> +	/* QED probe */
> +	rc = qedn_core_probe(qedn);
> +	if (rc)
> +		goto exit_probe_and_release_mem;
> +
> +	set_bit(QEDN_STATE_CORE_PROBED, &qedn->state);
> +
> +	rc = qed_ops->fill_dev_info(qedn->cdev, &qedn->dev_info);
> +	if (rc) {
> +		pr_err("fill_dev_info failed\n");
> +		goto exit_probe_and_release_mem;
> +	}
> +
> +	qedn_add_pf_to_gl_list(qedn);
> +	set_bit(QEDN_STATE_GL_PF_LIST_ADDED, &qedn->state);
> +
> +	rc = qedn_set_nvmetcp_pf_param(qedn);
> +	if (rc)
> +		goto exit_probe_and_release_mem;
> +
> +	qed_ops->common->update_pf_params(qedn->cdev, &qedn->pf_params);
> +	rc = qedn_slowpath_start(qedn);
> +	if (rc)
> +		goto exit_probe_and_release_mem;
> +
> +	set_bit(QEDN_STATE_CORE_OPEN, &qedn->state);
> +
> +	rc = qed_ops->common->update_drv_state(qedn->cdev, true);
> +	if (rc) {
> +		pr_err("Failed to send drv state to MFW\n");
> +		goto exit_probe_and_release_mem;
> +	}
> +
> +	set_bit(QEDN_STATE_MFW_STATE, &qedn->state);
> +
>   	qedn->qedn_ofld_dev.ops = &qedn_ofld_ops;
>   	INIT_LIST_HEAD(&qedn->qedn_ofld_dev.entry);
>   	rc = nvme_tcp_ofld_register_dev(&qedn->qedn_ofld_dev);
>   	if (rc)
> -		goto release_qedn;
> +		goto exit_probe_and_release_mem;
> +
> +	set_bit(QEDN_STATE_REGISTERED_OFFLOAD_DEV, &qedn->state);
>   
>   	return 0;
> -release_qedn:
> -	kfree(qedn);
> +exit_probe_and_release_mem:
> +	__qedn_remove(pdev);
> +	pr_err("probe ended with error\n");
>   
>   	return rc;
>   }
> @@ -170,10 +334,26 @@ static struct pci_driver qedn_pci_driver = {
>   	.shutdown = qedn_shutdown,
>   };
>   
> +static inline void qedn_init_global_contxt(void)
> +{
> +	INIT_LIST_HEAD(&qedn_glb.qedn_pf_list);
> +	INIT_LIST_HEAD(&qedn_glb.ctrl_list);
> +	mutex_init(&qedn_glb.glb_mutex);
> +}
> +
>   static int __init qedn_init(void)
>   {
>   	int rc;
>   
> +	qedn_init_global_contxt();
> +
> +	qed_ops = qed_get_nvmetcp_ops();
> +	if (!qed_ops) {
> +		pr_err("Failed to get QED NVMeTCP ops\n");
> +
> +		return -EINVAL;
> +	}
> +
>   	rc = pci_register_driver(&qedn_pci_driver);
>   	if (rc) {
>   		pr_err("Failed to register pci driver\n");
> @@ -189,6 +369,7 @@ static int __init qedn_init(void)
>   static void __exit qedn_cleanup(void)
>   {
>   	pci_unregister_driver(&qedn_pci_driver);
> +	qed_put_nvmetcp_ops();
>   	pr_notice("Unloading qedn ended\n");
>   }
>   
> 
I do wonder what you need the global list of devices for, but let's see.

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
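
The probe/remove pairing in the patch above records each completed bring-up step as a bit in qedn->state, and the error path of __qedn_probe() simply calls __qedn_remove(), which undoes only the steps whose bits are set. A simplified, single-threaded user-space sketch of that pattern follows. All sketch_* names are hypothetical, plain bit masks stand in for the kernel's atomic set_bit()/test_and_clear_bit(), and the strict newest-first teardown is an assumption of this sketch rather than the exact order the driver uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bring-up steps, listed in the order the probe performs them. */
enum sketch_step {
	SKETCH_CORE_PROBED = 0,
	SKETCH_CORE_OPEN,
	SKETCH_REGISTERED,
	SKETCH_NUM_STEPS,
};

struct sketch_ctx {
	unsigned long state;		/* one bit per completed step */
	int undone[SKETCH_NUM_STEPS];	/* steps, in the order they were undone */
	int num_undone;
};

/* Non-atomic stand-in for test_and_clear_bit(). */
static bool sketch_test_and_clear(unsigned long *state, int bit)
{
	bool was_set = (*state >> bit) & 1UL;

	*state &= ~(1UL << bit);
	return was_set;
}

/* Undo, newest first, only the steps that actually completed. */
static void sketch_remove(struct sketch_ctx *ctx)
{
	for (int bit = SKETCH_NUM_STEPS - 1; bit >= 0; bit--) {
		if (sketch_test_and_clear(&ctx->state, bit)) {
			assert(ctx->num_undone < SKETCH_NUM_STEPS);
			ctx->undone[ctx->num_undone++] = bit;
		}
	}
}

/* Run the steps in order; fail_at names the step that fails
 * (pass SKETCH_NUM_STEPS for a fully successful probe).
 */
static int sketch_probe(struct sketch_ctx *ctx, int fail_at)
{
	for (int bit = 0; bit < SKETCH_NUM_STEPS; bit++) {
		if (bit == fail_at) {
			sketch_remove(ctx);	/* single unwind path */
			return -1;
		}
		ctx->state |= 1UL << bit;
	}

	return 0;
}
```

One consequence of having a single unwind path is that every teardown step must be safe to skip, which is why each one is guarded by its own state bit.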

* Re: [RFC PATCH v4 18/27] qedn: Add qedn_claim_dev API support
  2021-04-29 19:09 ` [RFC PATCH v4 18/27] qedn: Add qedn_claim_dev API support Shai Malin
@ 2021-05-02 11:29   ` Hannes Reinecke
  2021-05-07 13:57     ` Shai Malin
  0 siblings, 1 reply; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-02 11:29 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024, Nikolay Assa

On 4/29/21 9:09 PM, Shai Malin wrote:
> From: Nikolay Assa <nassa@marvell.com>
> 
> This patch introduces the qedn_claim_dev() network service, which the
> offload device (qedn) uses through the paired net-device (qede).
> qedn_claim_dev() returns true if the IP address (IPv4 or IPv6) of the
> target server is reachable via the net-device that is paired with the
> offload device.
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Nikolay Assa <nassa@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> ---
>   drivers/nvme/hw/qedn/qedn.h      |  4 +++
>   drivers/nvme/hw/qedn/qedn_main.c | 42 ++++++++++++++++++++++++++++++--
>   2 files changed, 44 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
> index c1ac17eabcb7..7efe2366eb7c 100644
> --- a/drivers/nvme/hw/qedn/qedn.h
> +++ b/drivers/nvme/hw/qedn/qedn.h
> @@ -8,6 +8,10 @@
>   
>   #include <linux/qed/qed_if.h>
>   #include <linux/qed/qed_nvmetcp_if.h>
> +#include <linux/qed/qed_nvmetcp_ip_services_if.h>
> +#include <linux/qed/qed_chain.h>
> +#include <linux/qed/storage_common.h>
> +#include <linux/qed/nvmetcp_common.h>
>   
>   /* Driver includes */
>   #include "../../host/tcp-offload.h"
> diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
> index e3e8e3676b79..52007d35622d 100644
> --- a/drivers/nvme/hw/qedn/qedn_main.c
> +++ b/drivers/nvme/hw/qedn/qedn_main.c
> @@ -27,9 +27,47 @@ static int
>   qedn_claim_dev(struct nvme_tcp_ofld_dev *dev,
>   	       struct nvme_tcp_ofld_ctrl_con_params *conn_params)
>   {
> -	/* Placeholder - qedn_claim_dev */
> +	struct pci_dev *qede_pdev = NULL;
> +	struct net_device *ndev = NULL;
> +	u16 vlan_id = 0;
> +	int rc = 0;
>   
> -	return 0;
> +	/* qedn utilizes host network stack through paired qede device for
> +	 * non-offload traffic. First we verify there is valid route to remote
> +	 * peer.
> +	 */
> +	if (conn_params->remote_ip_addr.ss_family == AF_INET) {
> +		rc = qed_route_ipv4(&conn_params->local_ip_addr,
> +				    &conn_params->remote_ip_addr,
> +				    &conn_params->remote_mac_addr,
> +				    &ndev);
> +	} else if (conn_params->remote_ip_addr.ss_family == AF_INET6) {
> +		rc = qed_route_ipv6(&conn_params->local_ip_addr,
> +				    &conn_params->remote_ip_addr,
> +				    &conn_params->remote_mac_addr,
> +				    &ndev);
> +	} else {
> +		pr_err("address family %d not supported\n",
> +		       conn_params->remote_ip_addr.ss_family);
> +
> +		return false;
> +	}
> +
> +	if (rc)
> +		return false;
> +
> +	qed_vlan_get_ndev(&ndev, &vlan_id);
> +	conn_params->vlan_id = vlan_id;
> +
> +	/* route found through ndev - validate this is qede */
> +	qede_pdev = qed_validate_ndev(ndev);
> +	if (!qede_pdev)
> +		return false;
> +
> +	dev->qede_pdev = qede_pdev;
> +	dev->ndev = ndev;
> +
> +	return true;
>   }
>   
>   static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid,
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
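
The decision flow of qedn_claim_dev() above can be modelled in user space as three steps: take the net-device that the route lookup returned, unwrap it if it is a VLAN device (as qed_vlan_get_ndev() does), and accept it only if the underlying device is driven by qede (which qed_validate_ndev() checks by driver name). The sketch below is a toy model under those assumptions; struct sketch_netdev and the sketch_* functions are hypothetical stand-ins, not kernel APIs, and the route lookup itself is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for struct net_device. */
struct sketch_netdev {
	const char *driver;		/* name of the driver bound to the device */
	uint16_t vlan_id;		/* meaningful only when real_dev != NULL */
	struct sketch_netdev *real_dev;	/* non-NULL for a VLAN device */
};

/* Mirror of qed_vlan_get_ndev(): for a VLAN device, report its VLAN id and
 * replace *ndev with the real device underneath it.
 */
static void sketch_vlan_get_ndev(struct sketch_netdev **ndev, uint16_t *vlan_id)
{
	if ((*ndev)->real_dev) {
		*vlan_id = (*ndev)->vlan_id;
		*ndev = (*ndev)->real_dev;
	}
}

/* Claim the route only if the device underneath it is driven by qede. */
static bool sketch_claim_dev(struct sketch_netdev *route_dev, uint16_t *vlan_id)
{
	struct sketch_netdev *ndev = route_dev;

	assert(vlan_id);
	*vlan_id = 0;
	if (!ndev)
		return false;	/* no route to the remote peer */

	sketch_vlan_get_ndev(&ndev, vlan_id);

	return ndev->driver && strcmp(ndev->driver, "qede") == 0;
}
```

With a VLAN device (id 100) stacked on a qede device, sketch_claim_dev() returns true and reports VLAN 100; with a device bound to any other driver it returns false, which corresponds to the offload device declining the connection.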

* Re: [RFC PATCH v4 19/27] qedn: Add IRQ and fast-path resources initializations
  2021-04-29 19:09 ` [RFC PATCH v4 19/27] qedn: Add IRQ and fast-path resources initializations Shai Malin
@ 2021-05-02 11:32   ` Hannes Reinecke
  2021-05-05 17:54     ` Shai Malin
  0 siblings, 1 reply; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-02 11:32 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024

On 4/29/21 9:09 PM, Shai Malin wrote:
> This patch adds qedn_fp_queue, a per-CPU-core element which handles
> all of the connections on that CPU core.
> The qedn_fp_queue handles a group of connections (NVMeoF QPs) which
> are processed on the same CPU core and share the same FW-driver
> resources; they need not belong to the same NVMeoF controller.
> 
> The per-qedn_fp_queue resources are the FW CQ and the FW status block:
> - The FW CQ is used by the FW to notify the driver that the exchange
>    has ended, and the FW passes the incoming NVMeoF CQE (if one exists)
>    to the driver.
> - The FW status block is used by the FW to notify the driver of the
>    producer update of the FW CQE chain.
> 
> The FW fast-path queues are based on qed_chain.h
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> ---
>   drivers/nvme/hw/qedn/qedn.h      |  26 +++
>   drivers/nvme/hw/qedn/qedn_main.c | 287 ++++++++++++++++++++++++++++++-
>   2 files changed, 310 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
> index 7efe2366eb7c..5d4d04d144e4 100644
> --- a/drivers/nvme/hw/qedn/qedn.h
> +++ b/drivers/nvme/hw/qedn/qedn.h
> @@ -33,18 +33,41 @@
>   #define QEDN_PROTO_CQ_PROD_IDX	0
>   #define QEDN_NVMETCP_NUM_FW_CONN_QUEUE_PAGES 2
>   
> +#define QEDN_PAGE_SIZE	4096 /* FW page size - Configurable */
> +#define QEDN_IRQ_NAME_LEN 24
> +#define QEDN_IRQ_NO_FLAGS 0
> +
> +/* TCP defines */
> +#define QEDN_TCP_RTO_DEFAULT 280
> +
>   enum qedn_state {
>   	QEDN_STATE_CORE_PROBED = 0,
>   	QEDN_STATE_CORE_OPEN,
>   	QEDN_STATE_GL_PF_LIST_ADDED,
>   	QEDN_STATE_MFW_STATE,
> +	QEDN_STATE_NVMETCP_OPEN,
> +	QEDN_STATE_IRQ_SET,
> +	QEDN_STATE_FP_WORK_THREAD_SET,
>   	QEDN_STATE_REGISTERED_OFFLOAD_DEV,
>   	QEDN_STATE_MODULE_REMOVE_ONGOING,
>   };
>   
> +/* Per CPU core params */
> +struct qedn_fp_queue {
> +	struct qed_chain cq_chain;
> +	u16 *cq_prod;
> +	struct mutex cq_mutex; /* cq handler mutex */
> +	struct qedn_ctx	*qedn;
> +	struct qed_sb_info *sb_info;
> +	unsigned int cpu;
> +	u16 sb_id;
> +	char irqname[QEDN_IRQ_NAME_LEN];
> +};
> +
>   struct qedn_ctx {
>   	struct pci_dev *pdev;
>   	struct qed_dev *cdev;
> +	struct qed_int_info int_info;
>   	struct qed_dev_nvmetcp_info dev_info;
>   	struct nvme_tcp_ofld_dev qedn_ofld_dev;
>   	struct qed_pf_params pf_params;
> @@ -57,6 +80,9 @@ struct qedn_ctx {
>   
>   	/* Fast path queues */
>   	u8 num_fw_cqs;
> +	struct qedn_fp_queue *fp_q_arr;
> +	struct nvmetcp_glbl_queue_entry *fw_cq_array_virt;
> +	dma_addr_t fw_cq_array_phy; /* Physical address of fw_cq_array_virt */
>   };
>   
>   struct qedn_global {
> diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
> index 52007d35622d..0135a1f490da 100644
> --- a/drivers/nvme/hw/qedn/qedn_main.c
> +++ b/drivers/nvme/hw/qedn/qedn_main.c
> @@ -141,6 +141,104 @@ static struct nvme_tcp_ofld_ops qedn_ofld_ops = {
>   	.commit_rqs = qedn_commit_rqs,
>   };
>   
> +/* Fastpath IRQ handler */
> +static irqreturn_t qedn_irq_handler(int irq, void *dev_id)
> +{
> +	/* Placeholder */
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static void qedn_sync_free_irqs(struct qedn_ctx *qedn)
> +{
> +	u16 vector_idx;
> +	int i;
> +
> +	for (i = 0; i < qedn->num_fw_cqs; i++) {
> +		vector_idx = i * qedn->dev_info.common.num_hwfns +
> +			     qed_ops->common->get_affin_hwfn_idx(qedn->cdev);
> +		synchronize_irq(qedn->int_info.msix[vector_idx].vector);
> +		irq_set_affinity_hint(qedn->int_info.msix[vector_idx].vector,
> +				      NULL);
> +		free_irq(qedn->int_info.msix[vector_idx].vector,
> +			 &qedn->fp_q_arr[i]);
> +	}
> +
> +	qedn->int_info.used_cnt = 0;
> +	qed_ops->common->set_fp_int(qedn->cdev, 0);
> +}
> +
> +static int qedn_request_msix_irq(struct qedn_ctx *qedn)
> +{
> +	struct pci_dev *pdev = qedn->pdev;
> +	struct qedn_fp_queue *fp_q = NULL;
> +	int i, rc, cpu;
> +	u16 vector_idx;
> +	u32 vector;
> +
> +	/* numa-awareness will be added in future enhancements */
> +	cpu = cpumask_first(cpu_online_mask);
> +	for (i = 0; i < qedn->num_fw_cqs; i++) {
> +		fp_q = &qedn->fp_q_arr[i];
> +		vector_idx = i * qedn->dev_info.common.num_hwfns +
> +			     qed_ops->common->get_affin_hwfn_idx(qedn->cdev);
> +		vector = qedn->int_info.msix[vector_idx].vector;
> +		snprintf(fp_q->irqname, sizeof(fp_q->irqname),
> +			 "qedn_queue_%x.%x.%x_%d", pdev->bus->number,
> +			 PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn), i);
> +		rc = request_irq(vector, qedn_irq_handler, QEDN_IRQ_NO_FLAGS,
> +				 fp_q->irqname, fp_q);
> +		if (rc) {
> +			pr_err("request_irq failed.\n");
> +			qedn_sync_free_irqs(qedn);
> +
> +			return rc;
> +		}
> +
> +		fp_q->cpu = cpu;
> +		qedn->int_info.used_cnt++;
> +		rc = irq_set_affinity_hint(vector, get_cpu_mask(cpu));
> +		cpu = cpumask_next_wrap(cpu, cpu_online_mask, -1, false);
> +	}
> +
> +	return 0;
> +}
> +

Hah. I knew it.
So you _do_ have a limited number of MSI-X interrupts.
And that should limit the number of queue pairs, too.
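
To make the point about the interrupt budget concrete, here is a small user-space sketch of the arithmetic in the quoted code: qedn_setup_irq() shrinks num_fw_cqs to the number of fast-path interrupts actually granted, and the IRQ helpers map queue i to the MSI-X table entry i * num_hwfns + affin_hwfn_idx. The sketch_* functions are hypothetical and only mirror that arithmetic; they do not touch real interrupts.

```c
#include <assert.h>
#include <errno.h>

/* Clamp the requested number of fast-path queues to the number of vectors
 * that were granted; zero granted vectors is an error, as in
 * qedn_setup_irq().
 */
static int sketch_clamp_queues(unsigned int requested, unsigned int granted,
			       unsigned int *num_queues)
{
	assert(num_queues);

	if (granted == 0)
		return -ENODEV;

	*num_queues = granted < requested ? granted : requested;
	return 0;
}

/* MSI-X table index for a given queue, using the same formula as the quoted
 * code: each queue advances by num_hwfns entries, offset by the index of
 * the affined hardware function.
 */
static unsigned int sketch_vector_idx(unsigned int queue,
				      unsigned int num_hwfns,
				      unsigned int affin_hwfn_idx)
{
	return queue * num_hwfns + affin_hwfn_idx;
}
```

With 8 queues requested and 4 vectors granted, the clamp yields 4 queues; with two hardware functions and an affined index of 1, queue 3 uses table entry 7.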

> +static int qedn_setup_irq(struct qedn_ctx *qedn)
> +{
> +	int rc = 0;
> +	u8 rval;
> +
> +	rval = qed_ops->common->set_fp_int(qedn->cdev, qedn->num_fw_cqs);
> +	if (rval < qedn->num_fw_cqs) {
> +		qedn->num_fw_cqs = rval;
> +		if (rval == 0) {
> +			pr_err("set_fp_int return 0 IRQs\n");
> +
> +			return -ENODEV;
> +		}
> +	}
> +
> +	rc = qed_ops->common->get_fp_int(qedn->cdev, &qedn->int_info);
> +	if (rc) {
> +		pr_err("get_fp_int failed\n");
> +		goto exit_setup_int;
> +	}
> +
> +	if (qedn->int_info.msix_cnt) {
> +		rc = qedn_request_msix_irq(qedn);
> +		goto exit_setup_int;
> +	} else {
> +		pr_err("msix_cnt = 0\n");
> +		rc = -EINVAL;
> +		goto exit_setup_int;
> +	}
> +
> +exit_setup_int:
> +
> +	return rc;
> +}
> +
>   static inline void qedn_init_pf_struct(struct qedn_ctx *qedn)
>   {
>   	/* Placeholder - Initialize qedn fields */
> @@ -185,21 +283,173 @@ static void qedn_remove_pf_from_gl_list(struct qedn_ctx *qedn)
>   	mutex_unlock(&qedn_glb.glb_mutex);
>   }
>   
> +static void qedn_free_function_queues(struct qedn_ctx *qedn)
> +{
> +	struct qed_sb_info *sb_info = NULL;
> +	struct qedn_fp_queue *fp_q;
> +	int i;
> +
> +	/* Free workqueues */
> +
> +	/* Free the fast path queues*/
> +	for (i = 0; i < qedn->num_fw_cqs; i++) {
> +		fp_q = &qedn->fp_q_arr[i];
> +
> +		/* Free SB */
> +		sb_info = fp_q->sb_info;
> +		if (sb_info->sb_virt) {
> +			qed_ops->common->sb_release(qedn->cdev, sb_info,
> +						    fp_q->sb_id,
> +						    QED_SB_TYPE_STORAGE);
> +			dma_free_coherent(&qedn->pdev->dev,
> +					  sizeof(*sb_info->sb_virt),
> +					  (void *)sb_info->sb_virt,
> +					  sb_info->sb_phys);
> +			memset(sb_info, 0, sizeof(*sb_info));
> +			kfree(sb_info);
> +			fp_q->sb_info = NULL;
> +		}
> +
> +		qed_ops->common->chain_free(qedn->cdev, &fp_q->cq_chain);
> +	}
> +
> +	if (qedn->fw_cq_array_virt)
> +		dma_free_coherent(&qedn->pdev->dev,
> +				  qedn->num_fw_cqs * sizeof(*qedn->fw_cq_array_virt),
> +				  qedn->fw_cq_array_virt,
> +				  qedn->fw_cq_array_phy);
> +	kfree(qedn->fp_q_arr);
> +	qedn->fp_q_arr = NULL;
> +}
> +
> +static int qedn_alloc_and_init_sb(struct qedn_ctx *qedn,
> +				  struct qed_sb_info *sb_info, u16 sb_id)
> +{
> +	int rc = 0;
> +
> +	sb_info->sb_virt = dma_alloc_coherent(&qedn->pdev->dev,
> +					      sizeof(struct status_block_e4),
> +					      &sb_info->sb_phys, GFP_KERNEL);
> +	if (!sb_info->sb_virt) {
> +		pr_err("Status block allocation failed\n");
> +
> +		return -ENOMEM;
> +	}
> +
> +	rc = qed_ops->common->sb_init(qedn->cdev, sb_info, sb_info->sb_virt,
> +				      sb_info->sb_phys, sb_id,
> +				      QED_SB_TYPE_STORAGE);
> +	if (rc) {
> +		pr_err("Status block initialization failed\n");
> +
> +		return rc;
> +	}
> +
> +	return 0;
> +}
> +
> +static int qedn_alloc_function_queues(struct qedn_ctx *qedn)
> +{
> +	struct qed_chain_init_params chain_params = {};
> +	struct status_block_e4 *sb = NULL;  /* To change to status_block_e4 */
> +	struct qedn_fp_queue *fp_q = NULL;
> +	int rc = 0, arr_size;
> +	u64 cq_phy_addr;
> +	int i;
> +
> +	/* Place holder - IO-path workqueues */
> +
> +	qedn->fp_q_arr = kcalloc(qedn->num_fw_cqs,
> +				 sizeof(struct qedn_fp_queue), GFP_KERNEL);
> +	if (!qedn->fp_q_arr)
> +		return -ENOMEM;
> +
> +	arr_size = qedn->num_fw_cqs * sizeof(struct nvmetcp_glbl_queue_entry);
> +	qedn->fw_cq_array_virt = dma_alloc_coherent(&qedn->pdev->dev,
> +						    arr_size,
> +						    &qedn->fw_cq_array_phy,
> +						    GFP_KERNEL);
> +	if (!qedn->fw_cq_array_virt) {
> +		rc = -ENOMEM;
> +		goto mem_alloc_failure;
> +	}
> +
> +	/* placeholder - create task pools */
> +
> +	for (i = 0; i < qedn->num_fw_cqs; i++) {
> +		fp_q = &qedn->fp_q_arr[i];
> +		mutex_init(&fp_q->cq_mutex);
> +
> +		/* FW CQ */
> +		chain_params.intended_use = QED_CHAIN_USE_TO_CONSUME;
> +		chain_params.mode = QED_CHAIN_MODE_PBL;
> +		chain_params.cnt_type = QED_CHAIN_CNT_TYPE_U16;
> +		chain_params.num_elems = QEDN_FW_CQ_SIZE;
> +		chain_params.elem_size = 64; /*Placeholder - sizeof(struct nvmetcp_fw_cqe)*/
> +
> +		rc = qed_ops->common->chain_alloc(qedn->cdev,
> +						  &fp_q->cq_chain,
> +						  &chain_params);
> +		if (rc) {
> +			pr_err("CQ chain allocation failed\n");
> +			goto mem_alloc_failure;
> +		}
> +
> +		cq_phy_addr = qed_chain_get_pbl_phys(&fp_q->cq_chain);
> +		qedn->fw_cq_array_virt[i].cq_pbl_addr.hi = PTR_HI(cq_phy_addr);
> +		qedn->fw_cq_array_virt[i].cq_pbl_addr.lo = PTR_LO(cq_phy_addr);
> +
> +		/* SB */
> +		fp_q->sb_info = kzalloc(sizeof(*fp_q->sb_info), GFP_KERNEL);
> +		if (!fp_q->sb_info) {
> +			rc = -ENOMEM;
> +			goto mem_alloc_failure;
> +		}
> +
> +		fp_q->sb_id = i;
> +		rc = qedn_alloc_and_init_sb(qedn, fp_q->sb_info, fp_q->sb_id);
> +		if (rc) {
> +			pr_err("SB allocation and initialization failed.\n");
> +			goto mem_alloc_failure;
> +		}
> +
> +		sb = fp_q->sb_info->sb_virt;
> +		fp_q->cq_prod = (u16 *)&sb->pi_array[QEDN_PROTO_CQ_PROD_IDX];
> +		fp_q->qedn = qedn;
> +
> +		/* Placeholder - Init IO-path workqueue */
> +
> +		/* Placeholder - Init IO-path resources */
> +	}
> +
> +	return 0;
> +
> +mem_alloc_failure:
> +	pr_err("Function allocation failed\n");
> +	qedn_free_function_queues(qedn);
> +
> +	return rc;
> +}
> +
>   static int qedn_set_nvmetcp_pf_param(struct qedn_ctx *qedn)
>   {
>   	u32 fw_conn_queue_pages = QEDN_NVMETCP_NUM_FW_CONN_QUEUE_PAGES;
>   	struct qed_nvmetcp_pf_params *pf_params;
> +	int rc;
>   
>   	pf_params = &qedn->pf_params.nvmetcp_pf_params;
>   	memset(pf_params, 0, sizeof(*pf_params));
>   	qedn->num_fw_cqs = min_t(u8, qedn->dev_info.num_cqs, num_online_cpus());
> +	pr_info("Num qedn CPU cores is %u\n", qedn->num_fw_cqs);
>   
>   	pf_params->num_cons = QEDN_MAX_CONNS_PER_PF;
>   	pf_params->num_tasks = QEDN_MAX_TASKS_PER_PF;
>   
> -	/* Placeholder - Initialize function level queues */
> +	rc = qedn_alloc_function_queues(qedn);
> +	if (rc) {
> +		pr_err("Global queue allocation failed.\n");
> +		goto err_alloc_mem;
> +	}
>   
> -	/* Placeholder - Initialize TCP params */
> +	set_bit(QEDN_STATE_FP_WORK_THREAD_SET, &qedn->state);
>   
>   	/* Queues */
>   	pf_params->num_sq_pages_in_ring = fw_conn_queue_pages;
> @@ -207,11 +457,14 @@ static int qedn_set_nvmetcp_pf_param(struct qedn_ctx *qedn)
>   	pf_params->num_uhq_pages_in_ring = fw_conn_queue_pages;
>   	pf_params->num_queues = qedn->num_fw_cqs;
>   	pf_params->cq_num_entries = QEDN_FW_CQ_SIZE;
> +	pf_params->glbl_q_params_addr = qedn->fw_cq_array_phy;
>   
>   	/* the CQ SB pi */
>   	pf_params->gl_rq_pi = QEDN_PROTO_CQ_PROD_IDX;
>   
> -	return 0;
> +err_alloc_mem:
> +
> +	return rc;
>   }
>   
>   static inline int qedn_slowpath_start(struct qedn_ctx *qedn)
> @@ -255,6 +508,12 @@ static void __qedn_remove(struct pci_dev *pdev)
>   	else
>   		pr_err("Failed to remove from global PF list\n");
>   
> +	if (test_and_clear_bit(QEDN_STATE_IRQ_SET, &qedn->state))
> +		qedn_sync_free_irqs(qedn);
> +
> +	if (test_and_clear_bit(QEDN_STATE_NVMETCP_OPEN, &qedn->state))
> +		qed_ops->stop(qedn->cdev);
> +
>   	if (test_and_clear_bit(QEDN_STATE_MFW_STATE, &qedn->state)) {
>   		rc = qed_ops->common->update_drv_state(qedn->cdev, false);
>   		if (rc)
> @@ -264,6 +523,9 @@ static void __qedn_remove(struct pci_dev *pdev)
>   	if (test_and_clear_bit(QEDN_STATE_CORE_OPEN, &qedn->state))
>   		qed_ops->common->slowpath_stop(qedn->cdev);
>   
> +	if (test_and_clear_bit(QEDN_STATE_FP_WORK_THREAD_SET, &qedn->state))
> +		qedn_free_function_queues(qedn);
> +
>   	if (test_and_clear_bit(QEDN_STATE_CORE_PROBED, &qedn->state))
>   		qed_ops->common->remove(qedn->cdev);
>   
> @@ -335,6 +597,25 @@ static int __qedn_probe(struct pci_dev *pdev)
>   
>   	set_bit(QEDN_STATE_CORE_OPEN, &qedn->state);
>   
> +	rc = qedn_setup_irq(qedn);
> +	if (rc)
> +		goto exit_probe_and_release_mem;
> +
> +	set_bit(QEDN_STATE_IRQ_SET, &qedn->state);
> +
> +	/* NVMeTCP start HW PF */
> +	rc = qed_ops->start(qedn->cdev,
> +			    NULL /* Placeholder for FW IO-path resources */,
> +			    qedn,
> +			    NULL /* Placeholder for FW Event callback */);
> +	if (rc) {
> +		rc = -ENODEV;
> +		pr_err("Cannot start NVMeTCP Function\n");
> +		goto exit_probe_and_release_mem;
> +	}
> +
> +	set_bit(QEDN_STATE_NVMETCP_OPEN, &qedn->state);
> +
>   	rc = qed_ops->common->update_drv_state(qedn->cdev, true);
>   	if (rc) {
>   		pr_err("Failed to send drv state to MFW\n");
> 
So you have a limited number of MSI-x interrupts, but don't limit the 
number of hw queues to that. Why?
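
For illustration only, the clamp this question is about could be modelled in user space as below. The patch derives `num_fw_cqs` from `dev_info.num_cqs` and `num_online_cpus()`; the third input, `msix_vectors`, and the helper name `model_num_fw_cqs()` are assumptions made for the sketch, not fields or functions in the driver.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: choose the number of
 * fast-path queues so that every queue can own one interrupt vector.
 */
static uint8_t model_num_fw_cqs(uint8_t hw_cqs, unsigned int online_cpus,
				unsigned int msix_vectors)
{
	unsigned int n = hw_cqs;

	if (online_cpus < n)
		n = online_cpus;
	if (msix_vectors < n)	/* assumed extra limit, absent from the patch */
		n = msix_vectors;

	return (uint8_t)n;	/* n never exceeds hw_cqs, so it fits in a u8 */
}
```

With 16 hardware CQs, 8 online CPUs and 4 vectors, the sketch returns 4, so each queue could own one vector.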

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [RFC PATCH v4 20/27] qedn: Add connection-level slowpath functionality
  2021-04-29 19:09 ` [RFC PATCH v4 20/27] qedn: Add connection-level slowpath functionality Shai Malin
@ 2021-05-02 11:37   ` Hannes Reinecke
  2021-05-05 17:56     ` Shai Malin
  0 siblings, 1 reply; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-02 11:37 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024

On 4/29/21 9:09 PM, Shai Malin wrote:
> From: Prabhakar Kushwaha <pkushwaha@marvell.com>
> 
> This patch will present the connection (queue) level slowpath
> implementation relevant for create_queue flow.
> 
> The internal implementation:
> - Add per controller slowpath workqueue via pre_setup_ctrl
> 
> - qedn_main.c:
>    Includes qedn's implementation of the create_queue op.
> 
> - qedn_conn.c will include main slowpath connection level functions,
>    including:
>      1. Per-queue resources allocation.
>      2. Creating a new connection.
>      3. Offloading the connection to the FW for TCP handshake.
>      4. Destroy of a connection.
>      5. Support of delete and free controller.
>      6. TCP port management via qed_fetch_tcp_port, qed_return_tcp_port
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> ---
>   drivers/nvme/hw/qedn/Makefile    |   5 +-
>   drivers/nvme/hw/qedn/qedn.h      | 173 ++++++++++-
>   drivers/nvme/hw/qedn/qedn_conn.c | 508 +++++++++++++++++++++++++++++++
>   drivers/nvme/hw/qedn/qedn_main.c | 208 ++++++++++++-
>   4 files changed, 883 insertions(+), 11 deletions(-)
>   create mode 100644 drivers/nvme/hw/qedn/qedn_conn.c
> 

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes


* Re: [RFC PATCH v4 21/27] qedn: Add support of configuring HW filter block
  2021-04-29 19:09 ` [RFC PATCH v4 21/27] qedn: Add support of configuring HW filter block Shai Malin
@ 2021-05-02 11:38   ` Hannes Reinecke
  2021-05-05 17:57     ` Shai Malin
  0 siblings, 1 reply; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-02 11:38 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024

On 4/29/21 9:09 PM, Shai Malin wrote:
> From: Prabhakar Kushwaha <pkushwaha@marvell.com>
> 
> HW filter can be configured to filter TCP packets based on either
> source or target TCP port. QEDN leverages this feature to route
> NVMeTCP traffic.
> 
> This patch configures the HW filter block based on source port so that
> all received packets are delivered to the correct QEDN PF.
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> ---
>   drivers/nvme/hw/qedn/qedn.h      |  15 +++++
>   drivers/nvme/hw/qedn/qedn_main.c | 108 ++++++++++++++++++++++++++++++-
>   2 files changed, 122 insertions(+), 1 deletion(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes


* Re: [RFC PATCH v4 22/27] qedn: Add IO level nvme_req and fw_cq workqueues
  2021-04-29 19:09 ` [RFC PATCH v4 22/27] qedn: Add IO level nvme_req and fw_cq workqueues Shai Malin
@ 2021-05-02 11:42   ` Hannes Reinecke
  2021-05-07 13:56     ` Shai Malin
  0 siblings, 1 reply; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-02 11:42 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024

On 4/29/21 9:09 PM, Shai Malin wrote:
> This patch will present the IO level workqueues:
> 
> - qedn_nvme_req_fp_wq(): process new requests, similar to
> 			 nvme_tcp_io_work(). The flow starts from
> 			 send_req() and will aggregate all the requests
> 			 on this CPU core.
> 
> - qedn_fw_cq_fp_wq():   process new FW completions, the flow starts from
> 			the IRQ handler and for a single interrupt it will
> 			process all the pending NVMeoF Completions under
> 			polling mode.
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> ---
>   drivers/nvme/hw/qedn/Makefile    |   2 +-
>   drivers/nvme/hw/qedn/qedn.h      |  29 +++++++
>   drivers/nvme/hw/qedn/qedn_conn.c |   3 +
>   drivers/nvme/hw/qedn/qedn_main.c | 114 +++++++++++++++++++++++--
>   drivers/nvme/hw/qedn/qedn_task.c | 138 +++++++++++++++++++++++++++++++
>   5 files changed, 278 insertions(+), 8 deletions(-)
>   create mode 100644 drivers/nvme/hw/qedn/qedn_task.c
> 
> diff --git a/drivers/nvme/hw/qedn/Makefile b/drivers/nvme/hw/qedn/Makefile
> index d8b343afcd16..c7d838a61ae6 100644
> --- a/drivers/nvme/hw/qedn/Makefile
> +++ b/drivers/nvme/hw/qedn/Makefile
> @@ -1,4 +1,4 @@
>   # SPDX-License-Identifier: GPL-2.0-only
>   
>   obj-$(CONFIG_NVME_QEDN) += qedn.o
> -qedn-y := qedn_main.o qedn_conn.o
> +qedn-y := qedn_main.o qedn_conn.o qedn_task.o
> \ No newline at end of file
> diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
> index c15cac37ec1e..bd9a250cb2f5 100644
> --- a/drivers/nvme/hw/qedn/qedn.h
> +++ b/drivers/nvme/hw/qedn/qedn.h
> @@ -47,6 +47,9 @@
>   #define QEDN_NON_ABORTIVE_TERMINATION 0
>   #define QEDN_ABORTIVE_TERMINATION 1
>   
> +#define QEDN_FW_CQ_FP_WQ_WORKQUEUE "qedn_fw_cq_fp_wq"
> +#define QEDN_NVME_REQ_FP_WQ_WORKQUEUE "qedn_nvme_req_fp_wq"
> +
>   /*
>    * TCP offload stack default configurations and defines.
>    * Future enhancements will allow controlling the configurable
> @@ -100,6 +103,7 @@ struct qedn_fp_queue {
>   	struct qedn_ctx	*qedn;
>   	struct qed_sb_info *sb_info;
>   	unsigned int cpu;
> +	struct work_struct fw_cq_fp_wq_entry;
>   	u16 sb_id;
>   	char irqname[QEDN_IRQ_NAME_LEN];
>   };
> @@ -131,6 +135,8 @@ struct qedn_ctx {
>   	struct qedn_fp_queue *fp_q_arr;
>   	struct nvmetcp_glbl_queue_entry *fw_cq_array_virt;
>   	dma_addr_t fw_cq_array_phy; /* Physical address of fw_cq_array_virt */
> +	struct workqueue_struct *nvme_req_fp_wq;
> +	struct workqueue_struct *fw_cq_fp_wq;
>   };
>   
>   struct qedn_endpoint {
> @@ -213,6 +219,25 @@ struct qedn_ctrl {
>   
>   /* Connection level struct */
>   struct qedn_conn_ctx {
> +	/* IO path */
> +	struct workqueue_struct	*nvme_req_fp_wq; /* ptr to qedn->nvme_req_fp_wq */
> +	struct nvme_tcp_ofld_req *req; /* currently processed request */
> +
> +	struct list_head host_pend_req_list;
> +	/* Spinlock to access pending request list */
> +	spinlock_t nvme_req_lock;
> +	unsigned int cpu;
> +
> +	/* Entry for registering to nvme_req_fp_wq */
> +	struct work_struct nvme_req_fp_wq_entry;
> +	/*
> +	 * Mutex serializing qedn_process_req, as it can be called
> +	 * from multiple places: queue_rq, async, self requeued
> +	 */
> +	struct mutex nvme_req_mutex;
> +	struct qedn_fp_queue *fp_q;
> +	int qid;
> +
>   	struct qedn_ctx *qedn;
>   	struct nvme_tcp_ofld_queue *queue;
>   	struct nvme_tcp_ofld_ctrl *ctrl;
> @@ -280,5 +305,9 @@ int qedn_wait_for_conn_est(struct qedn_conn_ctx *conn_ctx);
>   int qedn_set_con_state(struct qedn_conn_ctx *conn_ctx, enum qedn_conn_state new_state);
>   void qedn_terminate_connection(struct qedn_conn_ctx *conn_ctx, int abrt_flag);
>   __be16 qedn_get_in_port(struct sockaddr_storage *sa);
> +inline int qedn_validate_cccid_in_range(struct qedn_conn_ctx *conn_ctx, u16 cccid);
> +void qedn_queue_request(struct qedn_conn_ctx *qedn_conn, struct nvme_tcp_ofld_req *req);
> +void qedn_nvme_req_fp_wq_handler(struct work_struct *work);
> +void qedn_io_work_cq(struct qedn_ctx *qedn, struct nvmetcp_fw_cqe *cqe);
>   
>   #endif /* _QEDN_H_ */
> diff --git a/drivers/nvme/hw/qedn/qedn_conn.c b/drivers/nvme/hw/qedn/qedn_conn.c
> index 9bfc0a5f0cdb..90d8aa36d219 100644
> --- a/drivers/nvme/hw/qedn/qedn_conn.c
> +++ b/drivers/nvme/hw/qedn/qedn_conn.c
> @@ -385,6 +385,9 @@ static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)
>   	}
>   
>   	set_bit(QEDN_CONN_RESRC_FW_SQ, &conn_ctx->resrc_state);
> +	INIT_LIST_HEAD(&conn_ctx->host_pend_req_list);
> +	spin_lock_init(&conn_ctx->nvme_req_lock);
> +
>   	rc = qed_ops->acquire_conn(qedn->cdev,
>   				   &conn_ctx->conn_handle,
>   				   &conn_ctx->fw_cid,
> diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
> index 8b5714e7e2bb..38f23dbb03a5 100644
> --- a/drivers/nvme/hw/qedn/qedn_main.c
> +++ b/drivers/nvme/hw/qedn/qedn_main.c
> @@ -267,6 +267,18 @@ static int qedn_release_ctrl(struct nvme_tcp_ofld_ctrl *ctrl)
>   	return 0;
>   }
>   
> +static void qedn_set_ctrl_io_cpus(struct qedn_conn_ctx *conn_ctx, int qid)
> +{
> +	struct qedn_ctx *qedn = conn_ctx->qedn;
> +	struct qedn_fp_queue *fp_q = NULL;
> +	int index;
> +
> +	index = qid ? (qid - 1) % qedn->num_fw_cqs : 0;
> +	fp_q = &qedn->fp_q_arr[index];
> +
> +	conn_ctx->cpu = fp_q->cpu;
> +}
> +
>   static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid, size_t q_size)
>   {
>   	struct nvme_tcp_ofld_ctrl *ctrl = queue->ctrl;
> @@ -288,6 +300,7 @@ static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid, size_t
>   	conn_ctx->queue = queue;
>   	conn_ctx->ctrl = ctrl;
>   	conn_ctx->sq_depth = q_size;
> +	qedn_set_ctrl_io_cpus(conn_ctx, qid);
>   
>   	init_waitqueue_head(&conn_ctx->conn_waitq);
>   	atomic_set(&conn_ctx->est_conn_indicator, 0);
> @@ -295,6 +308,10 @@ static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid, size_t
>   
>   	spin_lock_init(&conn_ctx->conn_state_lock);
>   
> +	INIT_WORK(&conn_ctx->nvme_req_fp_wq_entry, qedn_nvme_req_fp_wq_handler);
> +	conn_ctx->nvme_req_fp_wq = qedn->nvme_req_fp_wq;
> +	conn_ctx->qid = qid;
> +
>   	qedn_initialize_endpoint(&conn_ctx->ep, qedn->local_mac_addr,
>   				 &ctrl->conn_params);
>   
> @@ -356,6 +373,7 @@ static void qedn_destroy_queue(struct nvme_tcp_ofld_queue *queue)
>   	if (!conn_ctx)
>   		return;
>   
> +	cancel_work_sync(&conn_ctx->nvme_req_fp_wq_entry);
>   	qedn_terminate_connection(conn_ctx, QEDN_ABORTIVE_TERMINATION);
>   
>   	qedn_queue_wait_for_terminate_complete(conn_ctx);
> @@ -385,12 +403,24 @@ static int qedn_init_req(struct nvme_tcp_ofld_req *req)
>   
>   static void qedn_commit_rqs(struct nvme_tcp_ofld_queue *queue)
>   {
> -	/* Placeholder - queue work */
> +	struct qedn_conn_ctx *conn_ctx;
> +
> +	conn_ctx = (struct qedn_conn_ctx *)queue->private_data;
> +
> +	if (!list_empty(&conn_ctx->host_pend_req_list))
> +		queue_work_on(conn_ctx->cpu, conn_ctx->nvme_req_fp_wq,
> +			      &conn_ctx->nvme_req_fp_wq_entry);
>   }
>   
>   static int qedn_send_req(struct nvme_tcp_ofld_req *req)
>   {
> -	/* Placeholder - qedn_send_req */
> +	struct qedn_conn_ctx *qedn_conn = (struct qedn_conn_ctx *)req->queue->private_data;
> +
> +	/* Under the assumption that the cccid/tag will be in the range of 0 to sq_depth-1. */
> +	if (!req->async && qedn_validate_cccid_in_range(qedn_conn, req->rq->tag))
> +		return BLK_STS_NOTSUPP;
> +
> +	qedn_queue_request(qedn_conn, req);
>   
>   	return 0;
>   }
> @@ -434,9 +464,59 @@ struct qedn_conn_ctx *qedn_get_conn_hash(struct qedn_ctx *qedn, u16 icid)
>   }
>   
>   /* Fastpath IRQ handler */
> +void qedn_fw_cq_fp_handler(struct qedn_fp_queue *fp_q)
> +{
> +	u16 sb_id, cq_prod_idx, cq_cons_idx;
> +	struct qedn_ctx *qedn = fp_q->qedn;
> +	struct nvmetcp_fw_cqe *cqe = NULL;
> +
> +	sb_id = fp_q->sb_id;
> +	qed_sb_update_sb_idx(fp_q->sb_info);
> +
> +	/* rmb - to prevent missing new cqes */
> +	rmb();
> +
> +	/* Read the latest cq_prod from the SB */
> +	cq_prod_idx = *fp_q->cq_prod;
> +	cq_cons_idx = qed_chain_get_cons_idx(&fp_q->cq_chain);
> +
> +	while (cq_cons_idx != cq_prod_idx) {
> +		cqe = qed_chain_consume(&fp_q->cq_chain);
> +		if (likely(cqe))
> +			qedn_io_work_cq(qedn, cqe);
> +		else
> +			pr_err("Failed consuming cqe\n");
> +
> +		cq_cons_idx = qed_chain_get_cons_idx(&fp_q->cq_chain);
> +
> +		/* Check if new completions were posted */
> +		if (unlikely(cq_prod_idx == cq_cons_idx)) {
> +			/* rmb - to prevent missing new cqes */
> +			rmb();
> +
> +			/* Update the latest cq_prod from the SB */
> +			cq_prod_idx = *fp_q->cq_prod;
> +		}
> +	}
> +}
> +
> +static void qedn_fw_cq_fq_wq_handler(struct work_struct *work)
> +{
> +	struct qedn_fp_queue *fp_q = container_of(work, struct qedn_fp_queue, fw_cq_fp_wq_entry);
> +
> +	qedn_fw_cq_fp_handler(fp_q);
> +	qed_sb_ack(fp_q->sb_info, IGU_INT_ENABLE, 1);
> +}
> +
>   static irqreturn_t qedn_irq_handler(int irq, void *dev_id)
>   {
> -	/* Placeholder */
> +	struct qedn_fp_queue *fp_q = dev_id;
> +	struct qedn_ctx *qedn = fp_q->qedn;
> +
> +	fp_q->cpu = smp_processor_id();
> +
> +	qed_sb_ack(fp_q->sb_info, IGU_INT_DISABLE, 0);
> +	queue_work_on(fp_q->cpu, qedn->fw_cq_fp_wq, &fp_q->fw_cq_fp_wq_entry);
>   
>   	return IRQ_HANDLED;
>   }
> @@ -584,6 +664,11 @@ static void qedn_free_function_queues(struct qedn_ctx *qedn)
>   	int i;
>   
>   	/* Free workqueues */
> +	destroy_workqueue(qedn->fw_cq_fp_wq);
> +	qedn->fw_cq_fp_wq = NULL;
> +
> +	destroy_workqueue(qedn->nvme_req_fp_wq);
> +	qedn->nvme_req_fp_wq = NULL;
>   
>   	/* Free the fast path queues*/
>   	for (i = 0; i < qedn->num_fw_cqs; i++) {
> @@ -651,7 +736,23 @@ static int qedn_alloc_function_queues(struct qedn_ctx *qedn)
>   	u64 cq_phy_addr;
>   	int i;
>   
> -	/* Place holder - IO-path workqueues */
> +	qedn->fw_cq_fp_wq = alloc_workqueue(QEDN_FW_CQ_FP_WQ_WORKQUEUE,
> +					    WQ_HIGHPRI | WQ_MEM_RECLAIM, 0);
> +	if (!qedn->fw_cq_fp_wq) {
> +		rc = -ENODEV;
> +		pr_err("Unable to create fastpath FW CQ workqueue!\n");
> +
> +		return rc;
> +	}
> +
> +	qedn->nvme_req_fp_wq = alloc_workqueue(QEDN_NVME_REQ_FP_WQ_WORKQUEUE,
> +					       WQ_HIGHPRI | WQ_MEM_RECLAIM, 1);
> +	if (!qedn->nvme_req_fp_wq) {
> +		rc = -ENODEV;
> +		pr_err("Unable to create fastpath qedn nvme workqueue!\n");
> +
> +		return rc;
> +	}
>   
>   	qedn->fp_q_arr = kcalloc(qedn->num_fw_cqs,
>   				 sizeof(struct qedn_fp_queue), GFP_KERNEL);

Why don't you use threaded interrupts if you're spinning off a workqueue 
for handling interrupts anyway?
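
As a point of reference for this question, the work that the interrupt path defers is a drain loop over the completion queue. A minimal user-space model of that loop follows; `struct model_cq`, `MODEL_CQ_SIZE` and `model_cq_drain()` are hypothetical names, memory barriers are omitted, and entries are plain integers rather than `struct nvmetcp_fw_cqe`. With a threaded interrupt (`request_threaded_irq()`), a loop of this shape would presumably run in the IRQ thread instead of a work item.

```c
#include <stdint.h>

#define MODEL_CQ_SIZE 8u	/* hypothetical ring size, power of two */

struct model_cq {
	uint16_t entries[MODEL_CQ_SIZE];
	volatile uint16_t prod;	/* advanced by the producer ("firmware") */
	uint16_t cons;		/* owned by the consumer ("driver") */
};

/* Drain all pending entries; returns how many completions were handled. */
static unsigned int model_cq_drain(struct model_cq *cq, uint32_t *sum)
{
	unsigned int handled = 0;
	uint16_t prod = cq->prod;

	while (cq->cons != prod) {
		*sum += cq->entries[cq->cons % MODEL_CQ_SIZE];
		cq->cons++;
		handled++;

		/* Caught up: re-read the producer index for late arrivals. */
		if (cq->cons == prod)
			prod = cq->prod;
	}

	return handled;
}
```

The re-read inside the loop mirrors the second `cq_prod` read in `qedn_fw_cq_fp_handler()`: one invocation keeps going until the producer index stops moving.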

> @@ -679,7 +780,7 @@ static int qedn_alloc_function_queues(struct qedn_ctx *qedn)
>   		chain_params.mode = QED_CHAIN_MODE_PBL,
>   		chain_params.cnt_type = QED_CHAIN_CNT_TYPE_U16,
>   		chain_params.num_elems = QEDN_FW_CQ_SIZE;
> -		chain_params.elem_size = 64; /*Placeholder - sizeof(struct nvmetcp_fw_cqe)*/
> +		chain_params.elem_size = sizeof(struct nvmetcp_fw_cqe);
>   
>   		rc = qed_ops->common->chain_alloc(qedn->cdev,
>   						  &fp_q->cq_chain,
> @@ -708,8 +809,7 @@ static int qedn_alloc_function_queues(struct qedn_ctx *qedn)
>   		sb = fp_q->sb_info->sb_virt;
>   		fp_q->cq_prod = (u16 *)&sb->pi_array[QEDN_PROTO_CQ_PROD_IDX];
>   		fp_q->qedn = qedn;
> -
> -		/* Placeholder - Init IO-path workqueue */
> +		INIT_WORK(&fp_q->fw_cq_fp_wq_entry, qedn_fw_cq_fq_wq_handler);
>   
>   		/* Placeholder - Init IO-path resources */
>   	}
> diff --git a/drivers/nvme/hw/qedn/qedn_task.c b/drivers/nvme/hw/qedn/qedn_task.c
> new file mode 100644
> index 000000000000..d3474188efdc
> --- /dev/null
> +++ b/drivers/nvme/hw/qedn/qedn_task.c
> @@ -0,0 +1,138 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright 2021 Marvell. All rights reserved.
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> + /* Kernel includes */
> +#include <linux/kernel.h>
> +
> +/* Driver includes */
> +#include "qedn.h"
> +
> +inline int qedn_validate_cccid_in_range(struct qedn_conn_ctx *conn_ctx, u16 cccid)
> +{
> +	int rc = 0;
> +
> +	if (unlikely(cccid >= conn_ctx->sq_depth)) {
> +		pr_err("cccid 0x%x out of range (>= sq depth)\n", cccid);
> +		rc = -EINVAL;
> +	}
> +
> +	return rc;
> +}
> +
> +static bool qedn_process_req(struct qedn_conn_ctx *qedn_conn)
> +{
> +	return true;
> +}
> +
> +/* The WQ handler can be called from 3 flows:
> + *	1. queue_rq.
> + *	2. async.
> + *	3. self requeued
> + * Try to send requests from the pending list. If processing a request has
> + * failed, re-register to the workqueue.
> + * If there are no additional pending requests - exit the handler.
> + */
> +void qedn_nvme_req_fp_wq_handler(struct work_struct *work)
> +{
> +	struct qedn_conn_ctx *qedn_conn;
> +	bool more = false;
> +
> +	qedn_conn = container_of(work, struct qedn_conn_ctx, nvme_req_fp_wq_entry);
> +	do {
> +		if (mutex_trylock(&qedn_conn->nvme_req_mutex)) {
> +			more = qedn_process_req(qedn_conn);
> +			qedn_conn->req = NULL;
> +			mutex_unlock(&qedn_conn->nvme_req_mutex);
> +		}
> +	} while (more);
> +
> +	if (!list_empty(&qedn_conn->host_pend_req_list))
> +		queue_work_on(qedn_conn->cpu, qedn_conn->nvme_req_fp_wq,
> +			      &qedn_conn->nvme_req_fp_wq_entry);
> +}
> +
> +void qedn_queue_request(struct qedn_conn_ctx *qedn_conn, struct nvme_tcp_ofld_req *req)
> +{
> +	bool empty, res = false;
> +
> +	spin_lock(&qedn_conn->nvme_req_lock);
> +	empty = list_empty(&qedn_conn->host_pend_req_list) && !qedn_conn->req;
> +	list_add_tail(&req->queue_entry, &qedn_conn->host_pend_req_list);
> +	spin_unlock(&qedn_conn->nvme_req_lock);
> +
> +	/* attempt workqueue bypass */
> +	if (qedn_conn->cpu == smp_processor_id() && empty &&
> +	    mutex_trylock(&qedn_conn->nvme_req_mutex)) {
> +		res = qedn_process_req(qedn_conn);
> +		qedn_conn->req = NULL;
> +		mutex_unlock(&qedn_conn->nvme_req_mutex);
> +		if (res || list_empty(&qedn_conn->host_pend_req_list))
> +			return;
> +	} else if (req->last) {
> +		queue_work_on(qedn_conn->cpu, qedn_conn->nvme_req_fp_wq,
> +			      &qedn_conn->nvme_req_fp_wq_entry);
> +	}
> +}
> +

Queueing a request?
Does wonders for your latency ... Can't you do without?
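
To spell out when a request is actually deferred, the decision made by `qedn_queue_request()` in this patch can be modelled as a pure function. This is a sketch: the enum and `model_pick_submit_path()` are hypothetical names, and the four booleans stand in for the CPU check, the empty-list check, `mutex_trylock()` and `req->last`.

```c
#include <stdbool.h>

enum model_submit_path {
	MODEL_SUBMIT_INLINE,	/* processed in the caller's context */
	MODEL_SUBMIT_QUEUED,	/* handed to the request workqueue */
	MODEL_SUBMIT_PENDING,	/* left on the pending list for a later kick */
};

/* Mirrors the if / else-if structure of qedn_queue_request(). */
static enum model_submit_path
model_pick_submit_path(bool same_cpu, bool list_was_empty,
		       bool lock_acquired, bool last)
{
	if (same_cpu && list_was_empty && lock_acquired)
		return MODEL_SUBMIT_INLINE;

	return last ? MODEL_SUBMIT_QUEUED : MODEL_SUBMIT_PENDING;
}
```

Only the first case avoids the workqueue hop; a request submitted from a CPU other than `conn_ctx->cpu` always takes one of the other two paths.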

> +struct qedn_task_ctx *qedn_cqe_get_active_task(struct nvmetcp_fw_cqe *cqe)
> +{
> +	struct regpair *p = &cqe->task_opaque;
> +
> +	return (struct qedn_task_ctx *)((((u64)(le32_to_cpu(p->hi)) << 32)
> +					+ le32_to_cpu(p->lo)));
> +}
> +
> +void qedn_io_work_cq(struct qedn_ctx *qedn, struct nvmetcp_fw_cqe *cqe)
> +{
> +	struct qedn_task_ctx *qedn_task = NULL;
> +	struct qedn_conn_ctx *conn_ctx = NULL;
> +	u16 itid;
> +	u32 cid;
> +
> +	conn_ctx = qedn_get_conn_hash(qedn, le16_to_cpu(cqe->conn_id));
> +	if (unlikely(!conn_ctx)) {
> +		pr_err("CID 0x%x: Failed to fetch conn_ctx from hash\n",
> +		       le16_to_cpu(cqe->conn_id));
> +
> +		return;
> +	}
> +
> +	cid = conn_ctx->fw_cid;
> +	itid = le16_to_cpu(cqe->itid);
> +	qedn_task = qedn_cqe_get_active_task(cqe);
> +	if (unlikely(!qedn_task))
> +		return;
> +
> +	if (likely(cqe->cqe_type == NVMETCP_FW_CQE_TYPE_NORMAL)) {
> +		/* Placeholder - verify the connection was established */
> +
> +		switch (cqe->task_type) {
> +		case NVMETCP_TASK_TYPE_HOST_WRITE:
> +		case NVMETCP_TASK_TYPE_HOST_READ:
> +
> +			/* Placeholder - IO flow */
> +
> +			break;
> +
> +		case NVMETCP_TASK_TYPE_HOST_READ_NO_CQE:
> +
> +			/* Placeholder - IO flow */
> +
> +			break;
> +
> +		case NVMETCP_TASK_TYPE_INIT_CONN_REQUEST:
> +
> +			/* Placeholder - ICReq flow */
> +
> +			break;
> +		default:
> +			pr_info("Could not identify task type\n");
> +		}
> +	} else {
> +		/* Placeholder - Recovery flows */
> +	}
> +}
> 
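
A side note on `qedn_set_ctrl_io_cpus()` in this patch: the queue-to-CQ mapping can be checked in isolation. The sketch below copies the expression from the patch into a hypothetical user-space helper, `model_qid_to_cq()`; it assumes `num_fw_cqs` is non-zero.

```c
/*
 * Same expression as in qedn_set_ctrl_io_cpus(): qid 0 is the admin queue,
 * I/O queues start at qid 1 and are spread round-robin over the CQs.
 */
static unsigned int model_qid_to_cq(unsigned int qid, unsigned int num_fw_cqs)
{
	return qid ? (qid - 1) % num_fw_cqs : 0;
}
```

With four CQs, qid 0 and qid 1 both map to CQ 0, and qid 5 wraps back to CQ 0 as well.
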
Cheers,

Hannes


* Re: [RFC PATCH v4 23/27] qedn: Add support of Task and SGL
  2021-04-29 19:09 ` [RFC PATCH v4 23/27] qedn: Add support of Task and SGL Shai Malin
@ 2021-05-02 11:48   ` Hannes Reinecke
  2021-05-07 14:00     ` Shai Malin
  0 siblings, 1 reply; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-02 11:48 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024

On 4/29/21 9:09 PM, Shai Malin wrote:
> From: Prabhakar Kushwaha <pkushwaha@marvell.com>
> 
> This patch adds support for Task and SGL, which are used for
> slowpath and fastpath IO. A Task is the IO granule used by the
> firmware to perform work.
> 
> The internal implementation:
> - Create task/sgl resources used by all connections
> - Provide APIs to allocate and free tasks.
> - Add task support during connection establishment, i.e. slowpath
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> ---
>   drivers/nvme/hw/qedn/qedn.h      |  66 +++++
>   drivers/nvme/hw/qedn/qedn_conn.c |  43 +++-
>   drivers/nvme/hw/qedn/qedn_main.c |  34 ++-
>   drivers/nvme/hw/qedn/qedn_task.c | 411 +++++++++++++++++++++++++++++++
>   4 files changed, 550 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
> index bd9a250cb2f5..880ca245b02c 100644
> --- a/drivers/nvme/hw/qedn/qedn.h
> +++ b/drivers/nvme/hw/qedn/qedn.h
> @@ -50,6 +50,21 @@
>   #define QEDN_FW_CQ_FP_WQ_WORKQUEUE "qedn_fw_cq_fp_wq"
>   #define QEDN_NVME_REQ_FP_WQ_WORKQUEUE "qedn_nvme_req_fp_wq"
>   
> +/* Protocol defines */
> +#define QEDN_MAX_IO_SIZE QED_NVMETCP_MAX_IO_SIZE
> +
> +#define QEDN_SGE_BUFF_SIZE 4096
> +#define QEDN_MAX_SGES_PER_TASK DIV_ROUND_UP(QEDN_MAX_IO_SIZE, QEDN_SGE_BUFF_SIZE)
> +#define QEDN_FW_SGE_SIZE sizeof(struct nvmetcp_sge)
> +#define QEDN_MAX_FW_SGL_SIZE ((QEDN_MAX_SGES_PER_TASK) * QEDN_FW_SGE_SIZE)
> +#define QEDN_FW_SLOW_IO_MIN_SGE_LIMIT (9700 / 6)
> +
> +#define QEDN_MAX_HW_SECTORS (QEDN_MAX_IO_SIZE / 512)
> +#define QEDN_MAX_SEGMENTS QEDN_MAX_SGES_PER_TASK
> +
> +#define QEDN_TASK_INSIST_TMO 1000 /* 1 sec */
> +#define QEDN_INVALID_ITID 0xFFFF
> +
>   /*
>    * TCP offload stack default configurations and defines.
>    * Future enhancements will allow controlling the configurable
> @@ -95,6 +110,15 @@ enum qedn_state {
>   	QEDN_STATE_MODULE_REMOVE_ONGOING,
>   };
>   
> +struct qedn_io_resources {
> +	/* Lock for IO resources */
> +	spinlock_t resources_lock;
> +	struct list_head task_free_list;
> +	u32 num_alloc_tasks;
> +	u32 num_free_tasks;
> +	u32 no_avail_resrc_cnt;
> +};
> +
>   /* Per CPU core params */
>   struct qedn_fp_queue {
>   	struct qed_chain cq_chain;
> @@ -104,6 +128,10 @@ struct qedn_fp_queue {
>   	struct qed_sb_info *sb_info;
>   	unsigned int cpu;
>   	struct work_struct fw_cq_fp_wq_entry;
> +
> +	/* IO related resources for host */
> +	struct qedn_io_resources host_resrc;
> +
>   	u16 sb_id;
>   	char irqname[QEDN_IRQ_NAME_LEN];
>   };
> @@ -130,6 +158,8 @@ struct qedn_ctx {
>   	/* Connections */
>   	DECLARE_HASHTABLE(conn_ctx_hash, 16);
>   
> +	u32 num_tasks_per_pool;
> +
>   	/* Fast path queues */
>   	u8 num_fw_cqs;
>   	struct qedn_fp_queue *fp_q_arr;
> @@ -137,6 +167,27 @@ struct qedn_ctx {
>   	dma_addr_t fw_cq_array_phy; /* Physical address of fw_cq_array_virt */
>   	struct workqueue_struct *nvme_req_fp_wq;
>   	struct workqueue_struct *fw_cq_fp_wq;
> +
> +	/* Fast Path Tasks */
> +	struct qed_nvmetcp_tid	tasks;
> +};
> +
> +struct qedn_task_ctx {
> +	struct qedn_conn_ctx *qedn_conn;
> +	struct qedn_ctx *qedn;
> +	void *fw_task_ctx;
> +	struct qedn_fp_queue *fp_q;
> +	struct scatterlist *nvme_sg;
> +	struct nvme_tcp_ofld_req *req; /* currently processed request */
> +	struct list_head entry;
> +	spinlock_t lock; /* To protect task resources */
> +	bool valid;
> +	unsigned long flags; /* Used by qedn_task_flags */
> +	u32 task_size;
> +	u16 itid;
> +	u16 cccid;
> +	int req_direction;
> +	struct storage_sgl_task_params sgl_task_params;
>   };
>   
>   struct qedn_endpoint {
> @@ -243,6 +294,7 @@ struct qedn_conn_ctx {
>   	struct nvme_tcp_ofld_ctrl *ctrl;
>   	u32 conn_handle;
>   	u32 fw_cid;
> +	u8 default_cq;
>   
>   	atomic_t est_conn_indicator;
>   	atomic_t destroy_conn_indicator;
> @@ -260,6 +312,11 @@ struct qedn_conn_ctx {
>   	dma_addr_t host_cccid_itid_phy_addr;
>   	struct qedn_endpoint ep;
>   	int abrt_flag;
> +	/* Spinlock for accessing active_task_list */
> +	spinlock_t task_list_lock;
> +	struct list_head active_task_list;
> +	atomic_t num_active_tasks;
> +	atomic_t num_active_fw_tasks;
>   
>   	/* Connection resources - turned on to indicate what resource was
>   	 * allocated, so that it can later be released.
> @@ -279,6 +336,7 @@ struct qedn_conn_ctx {
>   enum qedn_conn_resources_state {
>   	QEDN_CONN_RESRC_FW_SQ,
>   	QEDN_CONN_RESRC_ACQUIRE_CONN,
> +	QEDN_CONN_RESRC_TASKS,
>   	QEDN_CONN_RESRC_CCCID_ITID_MAP,
>   	QEDN_CONN_RESRC_TCP_PORT,
>   	QEDN_CONN_RESRC_MAX = 64
> @@ -309,5 +367,13 @@ inline int qedn_validate_cccid_in_range(struct qedn_conn_ctx *conn_ctx, u16 ccci
>   void qedn_queue_request(struct qedn_conn_ctx *qedn_conn, struct nvme_tcp_ofld_req *req);
>   void qedn_nvme_req_fp_wq_handler(struct work_struct *work);
>   void qedn_io_work_cq(struct qedn_ctx *qedn, struct nvmetcp_fw_cqe *cqe);
> +int qedn_alloc_tasks(struct qedn_conn_ctx *conn_ctx);
> +inline int qedn_qid(struct nvme_tcp_ofld_queue *queue);
> +struct qedn_task_ctx *
> +	qedn_get_task_from_pool_insist(struct qedn_conn_ctx *conn_ctx, u16 cccid);
> +void qedn_common_clear_fw_sgl(struct storage_sgl_task_params *sgl_task_params);
> +void qedn_return_active_tasks(struct qedn_conn_ctx *conn_ctx);
> +void qedn_destroy_free_tasks(struct qedn_fp_queue *fp_q,
> +			     struct qedn_io_resources *io_resrc);
>   
>   #endif /* _QEDN_H_ */
> diff --git a/drivers/nvme/hw/qedn/qedn_conn.c b/drivers/nvme/hw/qedn/qedn_conn.c
> index 90d8aa36d219..10a80fbeac43 100644
> --- a/drivers/nvme/hw/qedn/qedn_conn.c
> +++ b/drivers/nvme/hw/qedn/qedn_conn.c
> @@ -29,6 +29,11 @@ static const char * const qedn_conn_state_str[] = {
>   	NULL
>   };
>   
> +inline int qedn_qid(struct nvme_tcp_ofld_queue *queue)
> +{
> +	return queue - queue->ctrl->queues;
> +}
> +
>   int qedn_set_con_state(struct qedn_conn_ctx *conn_ctx, enum qedn_conn_state new_state)
>   {
>   	spin_lock_bh(&conn_ctx->conn_state_lock);
> @@ -146,6 +151,11 @@ static void qedn_release_conn_ctx(struct qedn_conn_ctx *conn_ctx)
>   		clear_bit(QEDN_CONN_RESRC_ACQUIRE_CONN, &conn_ctx->resrc_state);
>   	}
>   
> +	if (test_bit(QEDN_CONN_RESRC_TASKS, &conn_ctx->resrc_state)) {
> +		clear_bit(QEDN_CONN_RESRC_TASKS, &conn_ctx->resrc_state);
> +		qedn_return_active_tasks(conn_ctx);
> +	}
> +
>   	if (test_bit(QEDN_CONN_RESRC_CCCID_ITID_MAP, &conn_ctx->resrc_state)) {
>   		dma_free_coherent(&qedn->pdev->dev,
>   				  conn_ctx->sq_depth *
> @@ -247,6 +257,7 @@ static int qedn_nvmetcp_offload_conn(struct qedn_conn_ctx *conn_ctx)
>   	offld_prms.max_rt_time = QEDN_TCP_MAX_RT_TIME;
>   	offld_prms.sq_pbl_addr =
>   		(u64)qed_chain_get_pbl_phys(&qedn_ep->fw_sq_chain);
> +	offld_prms.default_cq = conn_ctx->default_cq;
>   
>   	rc = qed_ops->offload_conn(qedn->cdev,
>   				   conn_ctx->conn_handle,
> @@ -375,6 +386,9 @@ int qedn_event_cb(void *context, u8 fw_event_code, void *event_ring_data)
>   static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)
>   {
>   	struct qedn_ctx *qedn = conn_ctx->qedn;
> +	struct qedn_io_resources *io_resrc;
> +	struct qedn_fp_queue *fp_q;
> +	u8 default_cq_idx, qid;
>   	size_t dma_size;
>   	int rc;
>   
> @@ -387,6 +401,8 @@ static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)
>   	set_bit(QEDN_CONN_RESRC_FW_SQ, &conn_ctx->resrc_state);
>   	INIT_LIST_HEAD(&conn_ctx->host_pend_req_list);
>   	spin_lock_init(&conn_ctx->nvme_req_lock);
> +	atomic_set(&conn_ctx->num_active_tasks, 0);
> +	atomic_set(&conn_ctx->num_active_fw_tasks, 0);
>   
>   	rc = qed_ops->acquire_conn(qedn->cdev,
>   				   &conn_ctx->conn_handle,
> @@ -401,7 +417,32 @@ static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)
>   		 conn_ctx->conn_handle);
>   	set_bit(QEDN_CONN_RESRC_ACQUIRE_CONN, &conn_ctx->resrc_state);
>   
> -	/* Placeholder - Allocate task resources and initialize fields */
> +	qid = qedn_qid(conn_ctx->queue);
> +	default_cq_idx = qid ? qid - 1 : 0; /* Offset adminq */
> +
> +	conn_ctx->default_cq = (default_cq_idx % qedn->num_fw_cqs);
> +	fp_q = &qedn->fp_q_arr[conn_ctx->default_cq];
> +	conn_ctx->fp_q = fp_q;
> +	io_resrc = &fp_q->host_resrc;
> +
> +	/* The first connection on each fp_q will fill task
> +	 * resources
> +	 */
> +	spin_lock(&io_resrc->resources_lock);
> +	if (io_resrc->num_alloc_tasks == 0) {
> +		rc = qedn_alloc_tasks(conn_ctx);
> +		if (rc) {
> +			pr_err("Failed allocating tasks: CID=0x%x\n",
> +			       conn_ctx->fw_cid);
> +			spin_unlock(&io_resrc->resources_lock);
> +			goto rel_conn;
> +		}
> +	}
> +	spin_unlock(&io_resrc->resources_lock);
> +
> +	spin_lock_init(&conn_ctx->task_list_lock);
> +	INIT_LIST_HEAD(&conn_ctx->active_task_list);
> +	set_bit(QEDN_CONN_RESRC_TASKS, &conn_ctx->resrc_state);
>   
>   	rc = qedn_fetch_tcp_port(conn_ctx);
>   	if (rc)
> diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
> index 38f23dbb03a5..8d9c19d63480 100644
> --- a/drivers/nvme/hw/qedn/qedn_main.c
> +++ b/drivers/nvme/hw/qedn/qedn_main.c
> @@ -30,6 +30,12 @@ __be16 qedn_get_in_port(struct sockaddr_storage *sa)
>   		: ((struct sockaddr_in6 *)sa)->sin6_port;
>   }
>   
> +static void qedn_init_io_resc(struct qedn_io_resources *io_resrc)
> +{
> +	spin_lock_init(&io_resrc->resources_lock);
> +	INIT_LIST_HEAD(&io_resrc->task_free_list);
> +}
> +
>   struct qedn_llh_filter *qedn_add_llh_filter(struct qedn_ctx *qedn, u16 tcp_port)
>   {
>   	struct qedn_llh_filter *llh_filter = NULL;
> @@ -436,6 +442,8 @@ static struct nvme_tcp_ofld_ops qedn_ofld_ops = {
>   		 *	NVMF_OPT_HDR_DIGEST | NVMF_OPT_DATA_DIGEST |
>   		 *	NVMF_OPT_NR_POLL_QUEUES | NVMF_OPT_TOS
>   		 */
> +	.max_hw_sectors = QEDN_MAX_HW_SECTORS,
> +	.max_segments = QEDN_MAX_SEGMENTS,
>   	.claim_dev = qedn_claim_dev,
>   	.setup_ctrl = qedn_setup_ctrl,
>   	.release_ctrl = qedn_release_ctrl,
> @@ -657,8 +665,24 @@ static void qedn_remove_pf_from_gl_list(struct qedn_ctx *qedn)
>   	mutex_unlock(&qedn_glb.glb_mutex);
>   }
>   
> +static void qedn_call_destroy_free_tasks(struct qedn_fp_queue *fp_q,
> +					 struct qedn_io_resources *io_resrc)
> +{
> +	if (list_empty(&io_resrc->task_free_list))
> +		return;
> +
> +	if (io_resrc->num_alloc_tasks != io_resrc->num_free_tasks)
> +		pr_err("Task Pool:Not all returned allocated=0x%x, free=0x%x\n",
> +		       io_resrc->num_alloc_tasks, io_resrc->num_free_tasks);
> +
> +	qedn_destroy_free_tasks(fp_q, io_resrc);
> +	if (io_resrc->num_free_tasks)
> +		pr_err("Expected num_free_tasks to be 0\n");
> +}
> +
>   static void qedn_free_function_queues(struct qedn_ctx *qedn)
>   {
> +	struct qedn_io_resources *host_resrc;
>   	struct qed_sb_info *sb_info = NULL;
>   	struct qedn_fp_queue *fp_q;
>   	int i;
> @@ -673,6 +697,9 @@ static void qedn_free_function_queues(struct qedn_ctx *qedn)
>   	/* Free the fast path queues*/
>   	for (i = 0; i < qedn->num_fw_cqs; i++) {
>   		fp_q = &qedn->fp_q_arr[i];
> +		host_resrc = &fp_q->host_resrc;
> +
> +		qedn_call_destroy_free_tasks(fp_q, host_resrc);
>   
>   		/* Free SB */
>   		sb_info = fp_q->sb_info;
> @@ -769,7 +796,8 @@ static int qedn_alloc_function_queues(struct qedn_ctx *qedn)
>   		goto mem_alloc_failure;
>   	}
>   
> -	/* placeholder - create task pools */
> +	qedn->num_tasks_per_pool =
> +		qedn->pf_params.nvmetcp_pf_params.num_tasks / qedn->num_fw_cqs;
>   
>   	for (i = 0; i < qedn->num_fw_cqs; i++) {
>   		fp_q = &qedn->fp_q_arr[i];
> @@ -811,7 +839,7 @@ static int qedn_alloc_function_queues(struct qedn_ctx *qedn)
>   		fp_q->qedn = qedn;
>   		INIT_WORK(&fp_q->fw_cq_fp_wq_entry, qedn_fw_cq_fq_wq_handler);
>   
> -		/* Placeholder - Init IO-path resources */
> +		qedn_init_io_resc(&fp_q->host_resrc);
>   	}
>   
>   	return 0;
> @@ -1005,7 +1033,7 @@ static int __qedn_probe(struct pci_dev *pdev)
>   
>   	/* NVMeTCP start HW PF */
>   	rc = qed_ops->start(qedn->cdev,
> -			    NULL /* Placeholder for FW IO-path resources */,
> +			    &qedn->tasks,
>   			    qedn,
>   			    qedn_event_cb);
>   	if (rc) {
> diff --git a/drivers/nvme/hw/qedn/qedn_task.c b/drivers/nvme/hw/qedn/qedn_task.c
> index d3474188efdc..54f2f4cba6ea 100644
> --- a/drivers/nvme/hw/qedn/qedn_task.c
> +++ b/drivers/nvme/hw/qedn/qedn_task.c
> @@ -11,6 +11,263 @@
>   /* Driver includes */
>   #include "qedn.h"
>   
> +static bool qedn_sgl_has_small_mid_sge(struct nvmetcp_sge *sgl, u16 sge_count)
> +{
> +	u16 sge_num;
> +
> +	if (sge_count > 8) {
> +		for (sge_num = 0; sge_num < sge_count; sge_num++) {
> +			if (le32_to_cpu(sgl[sge_num].sge_len) <
> +			    QEDN_FW_SLOW_IO_MIN_SGE_LIMIT)
> +				return true; /* small middle SGE found */
> +		}
> +	}
> +
> +	return false; /* no small middle SGEs */
> +}
> +
> +static int qedn_init_sgl(struct qedn_ctx *qedn, struct qedn_task_ctx *qedn_task)
> +{
> +	struct storage_sgl_task_params *sgl_task_params;
> +	enum dma_data_direction dma_dir;
> +	struct scatterlist *sg;
> +	struct request *rq;
> +	u16 num_sges;
> +	int index;
> +	int rc;
> +
> +	sgl_task_params = &qedn_task->sgl_task_params;
> +	rq = blk_mq_rq_from_pdu(qedn_task->req);
> +	if (qedn_task->task_size == 0) {
> +		sgl_task_params->num_sges = 0;
> +
> +		return 0;
> +	}
> +
> +	/* Convert BIO to scatterlist */
> +	num_sges = blk_rq_map_sg(rq->q, rq, qedn_task->nvme_sg);
> +	if (qedn_task->req_direction == WRITE)
> +		dma_dir = DMA_TO_DEVICE;
> +	else
> +		dma_dir = DMA_FROM_DEVICE;
> +
> +	/* DMA map the scatterlist */
> +	if (dma_map_sg(&qedn->pdev->dev, qedn_task->nvme_sg, num_sges, dma_dir) != num_sges) {
> +		pr_err("Couldn't map sgl\n");
> +		rc = -EPERM;
> +
> +		return rc;
> +	}
> +
> +	sgl_task_params->total_buffer_size = qedn_task->task_size;
> +	sgl_task_params->num_sges = num_sges;
> +
> +	for_each_sg(qedn_task->nvme_sg, sg, num_sges, index) {
> +		DMA_REGPAIR_LE(sgl_task_params->sgl[index].sge_addr, sg_dma_address(sg));
> +		sgl_task_params->sgl[index].sge_len = cpu_to_le32(sg_dma_len(sg));
> +	}
> +
> +	/* Relevant for Host Write Only */
> +	sgl_task_params->small_mid_sge = (qedn_task->req_direction == READ) ?
> +		false :
> +		qedn_sgl_has_small_mid_sge(sgl_task_params->sgl,
> +					   sgl_task_params->num_sges);
> +
> +	return 0;
> +}
> +
> +static void qedn_free_nvme_sg(struct qedn_task_ctx *qedn_task)
> +{
> +	kfree(qedn_task->nvme_sg);
> +	qedn_task->nvme_sg = NULL;
> +}
> +
> +static void qedn_free_fw_sgl(struct qedn_task_ctx *qedn_task)
> +{
> +	struct qedn_ctx *qedn = qedn_task->qedn;
> +	dma_addr_t sgl_pa;
> +
> +	sgl_pa = HILO_DMA_REGPAIR(qedn_task->sgl_task_params.sgl_phys_addr);
> +	dma_free_coherent(&qedn->pdev->dev,
> +			  QEDN_MAX_FW_SGL_SIZE,
> +			  qedn_task->sgl_task_params.sgl,
> +			  sgl_pa);
> +	qedn_task->sgl_task_params.sgl = NULL;
> +}
> +
> +static void qedn_destroy_single_task(struct qedn_task_ctx *qedn_task)
> +{
> +	u16 itid;
> +
> +	itid = qedn_task->itid;
> +	list_del(&qedn_task->entry);
> +	qedn_free_nvme_sg(qedn_task);
> +	qedn_free_fw_sgl(qedn_task);
> +	kfree(qedn_task);
> +	qedn_task = NULL;
> +}
> +
> +void qedn_destroy_free_tasks(struct qedn_fp_queue *fp_q,
> +			     struct qedn_io_resources *io_resrc)
> +{
> +	struct qedn_task_ctx *qedn_task, *task_tmp;
> +
> +	/* Destroy tasks from the free task list */
> +	list_for_each_entry_safe(qedn_task, task_tmp,
> +				 &io_resrc->task_free_list, entry) {
> +		qedn_destroy_single_task(qedn_task);
> +		io_resrc->num_free_tasks -= 1;
> +	}
> +}
> +
> +static int qedn_alloc_nvme_sg(struct qedn_task_ctx *qedn_task)
> +{
> +	int rc;
> +
> +	qedn_task->nvme_sg = kcalloc(QEDN_MAX_SGES_PER_TASK,
> +				     sizeof(*qedn_task->nvme_sg), GFP_KERNEL);
> +	if (!qedn_task->nvme_sg) {
> +		rc = -ENOMEM;
> +
> +		return rc;
> +	}
> +
> +	return 0;
> +}
> +
> +static int qedn_alloc_fw_sgl(struct qedn_task_ctx *qedn_task)
> +{
> +	struct qedn_ctx *qedn = qedn_task->qedn_conn->qedn;
> +	dma_addr_t fw_sgl_phys;
> +
> +	qedn_task->sgl_task_params.sgl =
> +		dma_alloc_coherent(&qedn->pdev->dev, QEDN_MAX_FW_SGL_SIZE,
> +				   &fw_sgl_phys, GFP_KERNEL);
> +	if (!qedn_task->sgl_task_params.sgl) {
> +		pr_err("Couldn't allocate FW sgl\n");
> +
> +		return -ENOMEM;
> +	}
> +
> +	DMA_REGPAIR_LE(qedn_task->sgl_task_params.sgl_phys_addr, fw_sgl_phys);
> +
> +	return 0;
> +}
> +
> +static inline void *qedn_get_fw_task(struct qed_nvmetcp_tid *info, u16 itid)
> +{
> +	return (void *)(info->blocks[itid / info->num_tids_per_block] +
> +			(itid % info->num_tids_per_block) * info->size);
> +}
> +
> +static struct qedn_task_ctx *qedn_alloc_task(struct qedn_conn_ctx *conn_ctx, u16 itid)
> +{
> +	struct qedn_ctx *qedn = conn_ctx->qedn;
> +	struct qedn_task_ctx *qedn_task;
> +	void *fw_task_ctx;
> +	int rc = 0;
> +
> +	qedn_task = kzalloc(sizeof(*qedn_task), GFP_KERNEL);
> +	if (!qedn_task)
> +		return NULL;
> +

As this is a pool, why don't you use mempools here?
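
For illustration only, here is a minimal user-space sketch of the
reserve-backed allocation pattern that the kernel mempool API
(mempool_create(), mempool_alloc(), mempool_free(), mempool_destroy())
provides: try the normal allocator first, fall back to a preallocated
reserve, and refill the reserve on free. All demo_* names and the
simulate_oom hook are hypothetical and not part of the patch or of the
kernel. The sketch is not thread-safe, and unlike the real
mempool_alloc() it never sleeps waiting for an element to come back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space model of a mempool-style reserve. */
struct demo_pool {
	void **reserve;    /* preallocated elements */
	size_t min_nr;     /* guaranteed reserve size */
	size_t curr_nr;    /* elements currently held in the reserve */
	size_t elem_size;  /* size of one element in bytes */
	bool simulate_oom; /* test hook: pretend the allocator fails */
};

/* Free everything still held in the reserve, then the reserve itself. */
static void demo_pool_destroy(struct demo_pool *pool)
{
	while (pool->curr_nr)
		free(pool->reserve[--pool->curr_nr]);
	free(pool->reserve);
	pool->reserve = NULL;
}

/* Preallocate min_nr elements; returns 0 on success, -1 on failure. */
static int demo_pool_init(struct demo_pool *pool, size_t min_nr,
			  size_t elem_size)
{
	if (!min_nr)
		return -1;

	pool->reserve = calloc(min_nr, sizeof(*pool->reserve));
	if (!pool->reserve)
		return -1;

	pool->min_nr = min_nr;
	pool->curr_nr = 0;
	pool->elem_size = elem_size;
	pool->simulate_oom = false;

	while (pool->curr_nr < min_nr) {
		void *elem = calloc(1, elem_size);

		if (!elem) {
			demo_pool_destroy(pool);
			return -1;
		}
		pool->reserve[pool->curr_nr++] = elem;
	}

	return 0;
}

/* Try the regular allocator first, then fall back to the reserve. */
static void *demo_pool_alloc(struct demo_pool *pool)
{
	void *elem = pool->simulate_oom ? NULL : calloc(1, pool->elem_size);

	if (elem)
		return elem;
	if (pool->curr_nr)
		return pool->reserve[--pool->curr_nr];

	return NULL;
}

/* Refill the reserve if it is below min_nr, otherwise really free. */
static void demo_pool_free(struct demo_pool *pool, void *elem)
{
	if (pool->curr_nr < pool->min_nr)
		pool->reserve[pool->curr_nr++] = elem;
	else
		free(elem);
}
```

Whether this maps onto the qedn task pool is a separate question, since
each task here is bound to a fixed iTID and a FW task context, which a
generic pool would not know about.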

> +	spin_lock_init(&qedn_task->lock);
> +	fw_task_ctx = qedn_get_fw_task(&qedn->tasks, itid);
> +	if (!fw_task_ctx) {
> +		pr_err("iTID: 0x%x; Failed getting fw_task_ctx memory\n", itid);
> +		goto release_task;
> +	}
> +
> +	/* No need to memset fw_task_ctx - its done in the HSI func */
> +	qedn_task->qedn_conn = conn_ctx;
> +	qedn_task->qedn = qedn;
> +	qedn_task->fw_task_ctx = fw_task_ctx;
> +	qedn_task->valid = 0;
> +	qedn_task->flags = 0;
> +	qedn_task->itid = itid;
> +	rc = qedn_alloc_fw_sgl(qedn_task);
> +	if (rc) {
> +		pr_err("iTID: 0x%x; Failed allocating FW sgl\n", itid);
> +		goto release_task;
> +	}
> +
> +	rc = qedn_alloc_nvme_sg(qedn_task);
> +	if (rc) {
> +		pr_err("iTID: 0x%x; Failed allocating NVMe sg\n", itid);
> +		goto release_fw_sgl;
> +	}
> +
> +	return qedn_task;
> +
> +release_fw_sgl:
> +	qedn_free_fw_sgl(qedn_task);
> +release_task:
> +	kfree(qedn_task);
> +
> +	return NULL;
> +}
> +
> +int qedn_alloc_tasks(struct qedn_conn_ctx *conn_ctx)
> +{
> +	struct qedn_ctx *qedn = conn_ctx->qedn;
> +	struct qedn_task_ctx *qedn_task = NULL;
> +	struct qedn_io_resources *io_resrc;
> +	u16 itid, start_itid, offset;
> +	struct qedn_fp_queue *fp_q;
> +	int i, rc;
> +
> +	fp_q = conn_ctx->fp_q;
> +
> +	offset = fp_q->sb_id;
> +	io_resrc = &fp_q->host_resrc;
> +
> +	start_itid = qedn->num_tasks_per_pool * offset;
> +	for (i = 0; i < qedn->num_tasks_per_pool; ++i) {
> +		itid = start_itid + i;
> +		qedn_task = qedn_alloc_task(conn_ctx, itid);
> +		if (!qedn_task) {
> +			pr_err("Failed allocating task\n");
> +			rc = -ENOMEM;
> +			goto release_tasks;
> +		}
> +
> +		qedn_task->fp_q = fp_q;
> +		io_resrc->num_free_tasks += 1;
> +		list_add_tail(&qedn_task->entry, &io_resrc->task_free_list);
> +	}
> +
> +	io_resrc->num_alloc_tasks = io_resrc->num_free_tasks;
> +
> +	return 0;
> +
> +release_tasks:
> +	qedn_destroy_free_tasks(fp_q, io_resrc);
> +
> +	return rc;
> +}
> +
> +void qedn_common_clear_fw_sgl(struct storage_sgl_task_params *sgl_task_params)
> +{
> +	u16 sge_cnt = sgl_task_params->num_sges;
> +
> +	memset(&sgl_task_params->sgl[(sge_cnt - 1)], 0,
> +	       sizeof(struct nvmetcp_sge));
> +	sgl_task_params->total_buffer_size = 0;
> +	sgl_task_params->small_mid_sge = false;
> +	sgl_task_params->num_sges = 0;
> +}
> +
> +inline void qedn_host_reset_cccid_itid_entry(struct qedn_conn_ctx *conn_ctx,
> +					     u16 cccid)
> +{
> +	conn_ctx->host_cccid_itid[cccid].itid = cpu_to_le16(QEDN_INVALID_ITID);
> +}
> +
> +inline void qedn_host_set_cccid_itid_entry(struct qedn_conn_ctx *conn_ctx, u16 cccid, u16 itid)
> +{
> +	conn_ctx->host_cccid_itid[cccid].itid = cpu_to_le16(itid);
> +}
> +
>   inline int qedn_validate_cccid_in_range(struct qedn_conn_ctx *conn_ctx, u16 cccid)
>   {
>   	int rc = 0;
> @@ -23,6 +280,160 @@ inline int qedn_validate_cccid_in_range(struct qedn_conn_ctx *conn_ctx, u16 ccci
>   	return rc;
>   }
>   
> +static void qedn_clear_sgl(struct qedn_ctx *qedn,
> +			   struct qedn_task_ctx *qedn_task)
> +{
> +	struct storage_sgl_task_params *sgl_task_params;
> +	enum dma_data_direction dma_dir;
> +	u32 sge_cnt;
> +
> +	sgl_task_params = &qedn_task->sgl_task_params;
> +	sge_cnt = sgl_task_params->num_sges;
> +
> +	/* Nothing to do if no SGEs were used */
> +	if (!qedn_task->task_size || !sge_cnt)
> +		return;
> +
> +	dma_dir = (qedn_task->req_direction == WRITE ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
> +	dma_unmap_sg(&qedn->pdev->dev, qedn_task->nvme_sg, sge_cnt, dma_dir);
> +	memset(&qedn_task->nvme_sg[(sge_cnt - 1)], 0, sizeof(struct scatterlist));
> +	qedn_common_clear_fw_sgl(sgl_task_params);
> +	qedn_task->task_size = 0;
> +}
> +
> +static void qedn_clear_task(struct qedn_conn_ctx *conn_ctx,
> +			    struct qedn_task_ctx *qedn_task)
> +{
> +	/* Task lock isn't needed since it is no longer in use */
> +	qedn_clear_sgl(conn_ctx->qedn, qedn_task);
> +	qedn_task->valid = 0;
> +	qedn_task->flags = 0;
> +
> +	atomic_dec(&conn_ctx->num_active_tasks);
> +}
> +
> +void qedn_return_active_tasks(struct qedn_conn_ctx *conn_ctx)
> +{
> +	struct qedn_fp_queue *fp_q = conn_ctx->fp_q;
> +	struct qedn_task_ctx *qedn_task, *task_tmp;
> +	struct qedn_io_resources *io_resrc;
> +	int num_returned_tasks = 0;
> +	int num_active_tasks;
> +
> +	io_resrc = &fp_q->host_resrc;
> +
> +	/* Return tasks that aren't "Used by FW" to the pool */
> +	list_for_each_entry_safe(qedn_task, task_tmp,
> +				 &conn_ctx->active_task_list, entry) {
> +		qedn_clear_task(conn_ctx, qedn_task);
> +		num_returned_tasks++;
> +	}
> +
> +	if (num_returned_tasks) {
> +		spin_lock(&io_resrc->resources_lock);
> +		/* Return tasks to FP_Q pool in one shot */
> +
> +		list_splice_tail_init(&conn_ctx->active_task_list,
> +				      &io_resrc->task_free_list);
> +		io_resrc->num_free_tasks += num_returned_tasks;
> +		spin_unlock(&io_resrc->resources_lock);
> +	}
> +
> +	num_active_tasks = atomic_read(&conn_ctx->num_active_tasks);
> +	if (num_active_tasks)
> +		pr_err("num_active_tasks is %u after cleanup.\n", num_active_tasks);
> +}
> +
> +void qedn_return_task_to_pool(struct qedn_conn_ctx *conn_ctx,
> +			      struct qedn_task_ctx *qedn_task)
> +{
> +	struct qedn_fp_queue *fp_q = conn_ctx->fp_q;
> +	struct qedn_io_resources *io_resrc;
> +	unsigned long lock_flags;
> +
> +	io_resrc = &fp_q->host_resrc;
> +
> +	spin_lock_irqsave(&qedn_task->lock, lock_flags);
> +	qedn_task->valid = 0;
> +	qedn_task->flags = 0;
> +	qedn_clear_sgl(conn_ctx->qedn, qedn_task);
> +	spin_unlock_irqrestore(&qedn_task->lock, lock_flags);
> +
> +	spin_lock(&conn_ctx->task_list_lock);
> +	list_del(&qedn_task->entry);
> +	qedn_host_reset_cccid_itid_entry(conn_ctx, qedn_task->cccid);
> +	spin_unlock(&conn_ctx->task_list_lock);
> +
> +	atomic_dec(&conn_ctx->num_active_tasks);
> +	atomic_dec(&conn_ctx->num_active_fw_tasks);
> +
> +	spin_lock(&io_resrc->resources_lock);
> +	list_add_tail(&qedn_task->entry, &io_resrc->task_free_list);
> +	io_resrc->num_free_tasks += 1;
> +	spin_unlock(&io_resrc->resources_lock);
> +}
> +
> +struct qedn_task_ctx *
> +qedn_get_free_task_from_pool(struct qedn_conn_ctx *conn_ctx, u16 cccid)
> +{
> +	struct qedn_task_ctx *qedn_task = NULL;
> +	struct qedn_io_resources *io_resrc;
> +	struct qedn_fp_queue *fp_q;
> +
> +	fp_q = conn_ctx->fp_q;
> +	io_resrc = &fp_q->host_resrc;
> +
> +	spin_lock(&io_resrc->resources_lock);
> +	qedn_task = list_first_entry_or_null(&io_resrc->task_free_list,
> +					     struct qedn_task_ctx, entry);
> +	if (unlikely(!qedn_task)) {
> +		spin_unlock(&io_resrc->resources_lock);
> +
> +		return NULL;
> +	}
> +	list_del(&qedn_task->entry);
> +	io_resrc->num_free_tasks -= 1;
> +	spin_unlock(&io_resrc->resources_lock);
> +
> +	spin_lock(&conn_ctx->task_list_lock);
> +	list_add_tail(&qedn_task->entry, &conn_ctx->active_task_list);
> +	qedn_host_set_cccid_itid_entry(conn_ctx, cccid, qedn_task->itid);
> +	spin_unlock(&conn_ctx->task_list_lock);
> +
> +	atomic_inc(&conn_ctx->num_active_tasks);
> +	qedn_task->cccid = cccid;
> +	qedn_task->qedn_conn = conn_ctx;
> +	qedn_task->valid = 1;
> +
> +	return qedn_task;
> +}
> +
> +struct qedn_task_ctx *
> +qedn_get_task_from_pool_insist(struct qedn_conn_ctx *conn_ctx, u16 cccid)
> +{
> +	struct qedn_task_ctx *qedn_task = NULL;
> +	unsigned long timeout;
> +
> +	qedn_task = qedn_get_free_task_from_pool(conn_ctx, cccid);
> +	if (unlikely(!qedn_task)) {
> +		timeout = msecs_to_jiffies(QEDN_TASK_INSIST_TMO) + jiffies;
> +		while (1) {
> +			qedn_task = qedn_get_free_task_from_pool(conn_ctx, cccid);
> +			if (likely(qedn_task))
> +				break;
> +
> +			msleep(100);
> +			if (time_after(jiffies, timeout)) {
> +				pr_err("Failed on timeout of fetching task\n");
> +
> +				return NULL;
> +			}
> +		}
> +	}
> +
> +	return qedn_task;
> +}
> +
>   static bool qedn_process_req(struct qedn_conn_ctx *qedn_conn)
>   {
>   	return true;
> 
Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

* Re: [RFC PATCH v4 24/27] qedn: Add support of NVME ICReq & ICResp
  2021-04-29 19:09 ` [RFC PATCH v4 24/27] qedn: Add support of NVME ICReq & ICResp Shai Malin
@ 2021-05-02 11:53   ` Hannes Reinecke
  2021-05-05 18:01     ` Shai Malin
  0 siblings, 1 reply; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-02 11:53 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024

On 4/29/21 9:09 PM, Shai Malin wrote:
> From: Prabhakar Kushwaha <pkushwaha@marvell.com>
> 
> Once a TCP connection is established, the host sends an Initialize
> Connection Request (ICReq) PDU to the controller.
> The Initialize Connection Response (ICResp) PDU received from the
> controller is then processed by the host to establish the connection
> and exchange connection configuration parameters.
> 
> This patch adds support for generating the ICReq and processing the
> ICResp. It also updates the host configuration based on the exchanged
> parameters.
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> ---
>   drivers/nvme/hw/qedn/qedn.h      |  36 ++++
>   drivers/nvme/hw/qedn/qedn_conn.c | 317 ++++++++++++++++++++++++++++++-
>   drivers/nvme/hw/qedn/qedn_main.c |  22 +++
>   drivers/nvme/hw/qedn/qedn_task.c |   8 +-
>   4 files changed, 379 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
> index 880ca245b02c..773a57994148 100644
> --- a/drivers/nvme/hw/qedn/qedn.h
> +++ b/drivers/nvme/hw/qedn/qedn.h
> @@ -16,6 +16,7 @@
>   
>   /* Driver includes */
>   #include "../../host/tcp-offload.h"
> +#include <linux/nvme-tcp.h>
>   
>   #define QEDN_MAJOR_VERSION		8
>   #define QEDN_MINOR_VERSION		62
> @@ -52,6 +53,8 @@
>   
>   /* Protocol defines */
>   #define QEDN_MAX_IO_SIZE QED_NVMETCP_MAX_IO_SIZE
> +#define QEDN_MAX_PDU_SIZE 0x80000 /* 512KB */
> +#define QEDN_MAX_OUTSTANDING_R2T_PDUS 0 /* 0 Based == 1 max R2T */
>   
>   #define QEDN_SGE_BUFF_SIZE 4096
>   #define QEDN_MAX_SGES_PER_TASK DIV_ROUND_UP(QEDN_MAX_IO_SIZE, QEDN_SGE_BUFF_SIZE)
> @@ -65,6 +68,11 @@
>   #define QEDN_TASK_INSIST_TMO 1000 /* 1 sec */
>   #define QEDN_INVALID_ITID 0xFFFF
>   
> +#define QEDN_ICREQ_FW_PAYLOAD (sizeof(struct nvme_tcp_icreq_pdu) - \
> +			       sizeof(struct nvmetcp_init_conn_req_hdr))
> +/* The FW will handle the ICReq as CCCID 0 (FW internal design) */
> +#define QEDN_ICREQ_CCCID 0
> +
>   /*
>    * TCP offload stack default configurations and defines.
>    * Future enhancements will allow controlling the configurable
> @@ -136,6 +144,16 @@ struct qedn_fp_queue {
>   	char irqname[QEDN_IRQ_NAME_LEN];
>   };
>   
> +struct qedn_negotiation_params {
> +	u32 maxh2cdata; /* Negotiation */
> +	u32 maxr2t; /* Validation */
> +	u16 pfv; /* Validation */
> +	bool hdr_digest; /* Negotiation */
> +	bool data_digest; /* Negotiation */
> +	u8 cpda; /* Negotiation */
> +	u8 hpda; /* Validation */
> +};
> +
>   struct qedn_ctx {
>   	struct pci_dev *pdev;
>   	struct qed_dev *cdev;
> @@ -195,6 +213,9 @@ struct qedn_endpoint {
>   	struct qed_chain fw_sq_chain;
>   	void __iomem *p_doorbell;
>   
> +	/* Spinlock for accessing FW queue */
> +	spinlock_t doorbell_lock;
> +
>   	/* TCP Params */
>   	__be32 dst_addr[4]; /* In network order */
>   	__be32 src_addr[4]; /* In network order */
> @@ -268,6 +289,12 @@ struct qedn_ctrl {
>   	atomic_t host_num_active_conns;
>   };
>   
> +struct qedn_icreq_padding {
> +	u32 *buffer;
> +	dma_addr_t pa;
> +	struct nvmetcp_sge sge;
> +};
> +
>   /* Connection level struct */
>   struct qedn_conn_ctx {
>   	/* IO path */
> @@ -329,6 +356,11 @@ struct qedn_conn_ctx {
>   
>   	size_t sq_depth;
>   
> +	struct qedn_negotiation_params required_params;
> +	struct qedn_negotiation_params pdu_params;
> +	struct nvmetcp_icresp_hdr_psh icresp;
> +	struct qedn_icreq_padding *icreq_pad;
> +
>   	/* "dummy" socket */
>   	struct socket *sock;
>   };
> @@ -337,6 +369,7 @@ enum qedn_conn_resources_state {
>   	QEDN_CONN_RESRC_FW_SQ,
>   	QEDN_CONN_RESRC_ACQUIRE_CONN,
>   	QEDN_CONN_RESRC_TASKS,
> +	QEDN_CONN_RESRC_ICREQ_PAD,
>   	QEDN_CONN_RESRC_CCCID_ITID_MAP,
>   	QEDN_CONN_RESRC_TCP_PORT,
>   	QEDN_CONN_RESRC_MAX = 64
> @@ -375,5 +408,8 @@ void qedn_common_clear_fw_sgl(struct storage_sgl_task_params *sgl_task_params);
>   void qedn_return_active_tasks(struct qedn_conn_ctx *conn_ctx);
>   void qedn_destroy_free_tasks(struct qedn_fp_queue *fp_q,
>   			     struct qedn_io_resources *io_resrc);
> +void qedn_swap_bytes(u32 *p, int size);
> +void qedn_prep_icresp(struct qedn_conn_ctx *conn_ctx, struct nvmetcp_fw_cqe *cqe);
> +void qedn_ring_doorbell(struct qedn_conn_ctx *conn_ctx);
>   
>   #endif /* _QEDN_H_ */
> diff --git a/drivers/nvme/hw/qedn/qedn_conn.c b/drivers/nvme/hw/qedn/qedn_conn.c
> index 10a80fbeac43..5679354aa0e0 100644
> --- a/drivers/nvme/hw/qedn/qedn_conn.c
> +++ b/drivers/nvme/hw/qedn/qedn_conn.c
> @@ -34,6 +34,25 @@ inline int qedn_qid(struct nvme_tcp_ofld_queue *queue)
>   	return queue - queue->ctrl->queues;
>   }
>   
> +void qedn_ring_doorbell(struct qedn_conn_ctx *conn_ctx)
> +{
> +	struct nvmetcp_db_data dbell = { 0 };
> +	u16 prod_idx;
> +
> +	dbell.agg_flags = 0;
> +	dbell.params |= DB_DEST_XCM << NVMETCP_DB_DATA_DEST_SHIFT;
> +	dbell.params |= DB_AGG_CMD_SET << NVMETCP_DB_DATA_AGG_CMD_SHIFT;
> +	dbell.params |=
> +		DQ_XCM_ISCSI_SQ_PROD_CMD << NVMETCP_DB_DATA_AGG_VAL_SEL_SHIFT;
> +	dbell.params |= 1 << NVMETCP_DB_DATA_BYPASS_EN_SHIFT;
> +	prod_idx = qed_chain_get_prod_idx(&conn_ctx->ep.fw_sq_chain);
> +	dbell.sq_prod = cpu_to_le16(prod_idx);
> +
> +	/* wmb - Make sure fw idx is coherent */
> +	wmb();
> +	writel(*(u32 *)&dbell, conn_ctx->ep.p_doorbell);
> +}
> +
>   int qedn_set_con_state(struct qedn_conn_ctx *conn_ctx, enum qedn_conn_state new_state)
>   {
>   	spin_lock_bh(&conn_ctx->conn_state_lock);
> @@ -130,6 +149,71 @@ int qedn_initialize_endpoint(struct qedn_endpoint *ep, u8 *local_mac_addr,
>   	return -1;
>   }
>   
> +static int qedn_alloc_icreq_pad(struct qedn_conn_ctx *conn_ctx)
> +{
> +	struct qedn_ctx *qedn = conn_ctx->qedn;
> +	struct qedn_icreq_padding *icreq_pad;
> +	u32 *buffer;
> +	int rc = 0;
> +
> +	icreq_pad = kzalloc(sizeof(*icreq_pad), GFP_KERNEL);
> +	if (!icreq_pad)
> +		return -ENOMEM;
> +
> +	conn_ctx->icreq_pad = icreq_pad;
> +	memset(&icreq_pad->sge, 0, sizeof(icreq_pad->sge));
> +	buffer = dma_alloc_coherent(&qedn->pdev->dev,
> +				    QEDN_ICREQ_FW_PAYLOAD,
> +				    &icreq_pad->pa,
> +				    GFP_KERNEL);
> +	if (!buffer) {
> +		pr_err("Could not allocate icreq_padding SGE buffer.\n");
> +		rc =  -ENOMEM;
> +		goto release_icreq_pad;
> +	}
> +
> +	DMA_REGPAIR_LE(icreq_pad->sge.sge_addr, icreq_pad->pa);
> +	icreq_pad->sge.sge_len = cpu_to_le32(QEDN_ICREQ_FW_PAYLOAD);
> +	icreq_pad->buffer = buffer;
> +	set_bit(QEDN_CONN_RESRC_ICREQ_PAD, &conn_ctx->resrc_state);
> +
> +	return 0;
> +
> +release_icreq_pad:
> +	kfree(icreq_pad);
> +	conn_ctx->icreq_pad = NULL;
> +
> +	return rc;
> +}
> +
> +static void qedn_free_icreq_pad(struct qedn_conn_ctx *conn_ctx)
> +{
> +	struct qedn_ctx *qedn = conn_ctx->qedn;
> +	struct qedn_icreq_padding *icreq_pad;
> +	u32 *buffer;
> +
> +	icreq_pad = conn_ctx->icreq_pad;
> +	if (unlikely(!icreq_pad)) {
> +		pr_err("null ptr in icreq_pad in conn_ctx\n");
> +		goto finally;
> +	}
> +
> +	buffer = icreq_pad->buffer;
> +	if (buffer) {
> +		dma_free_coherent(&qedn->pdev->dev,
> +				  QEDN_ICREQ_FW_PAYLOAD,
> +				  (void *)buffer,
> +				  icreq_pad->pa);
> +		icreq_pad->buffer = NULL;
> +	}
> +
> +	kfree(icreq_pad);
> +	conn_ctx->icreq_pad = NULL;
> +
> +finally:
> +	clear_bit(QEDN_CONN_RESRC_ICREQ_PAD, &conn_ctx->resrc_state);
> +}
> +
>   static void qedn_release_conn_ctx(struct qedn_conn_ctx *conn_ctx)
>   {
>   	struct qedn_ctx *qedn = conn_ctx->qedn;
> @@ -151,6 +235,9 @@ static void qedn_release_conn_ctx(struct qedn_conn_ctx *conn_ctx)
>   		clear_bit(QEDN_CONN_RESRC_ACQUIRE_CONN, &conn_ctx->resrc_state);
>   	}
>   
> +	if (test_bit(QEDN_CONN_RESRC_ICREQ_PAD, &conn_ctx->resrc_state))
> +		qedn_free_icreq_pad(conn_ctx);
> +
>   	if (test_bit(QEDN_CONN_RESRC_TASKS, &conn_ctx->resrc_state)) {
>   		clear_bit(QEDN_CONN_RESRC_TASKS, &conn_ctx->resrc_state);
>   		qedn_return_active_tasks(conn_ctx);
> @@ -309,6 +396,194 @@ void qedn_terminate_connection(struct qedn_conn_ctx *conn_ctx, int abrt_flag)
>   	queue_work(qctrl->sp_wq, &conn_ctx->sp_wq_entry);
>   }
>   
> +static int qedn_nvmetcp_update_conn(struct qedn_ctx *qedn, struct qedn_conn_ctx *conn_ctx)
> +{
> +	struct qedn_negotiation_params *pdu_params = &conn_ctx->pdu_params;
> +	struct qed_nvmetcp_params_update *conn_info;
> +	int rc;
> +
> +	conn_info = kzalloc(sizeof(*conn_info), GFP_KERNEL);
> +	if (!conn_info)
> +		return -ENOMEM;
> +
> +	conn_info->hdr_digest_en = pdu_params->hdr_digest;
> +	conn_info->data_digest_en = pdu_params->data_digest;
> +	conn_info->max_recv_pdu_length = QEDN_MAX_PDU_SIZE;
> +	conn_info->max_io_size = QEDN_MAX_IO_SIZE;
> +	conn_info->max_send_pdu_length = pdu_params->maxh2cdata;
> +
> +	rc = qed_ops->update_conn(qedn->cdev, conn_ctx->conn_handle, conn_info);
> +	if (rc) {
> +		pr_err("Could not update connection\n");
> +		rc = -ENXIO;
> +	}
> +
> +	kfree(conn_info);
> +
> +	return rc;
> +}
> +
> +static int qedn_update_ramrod(struct qedn_conn_ctx *conn_ctx)
> +{
> +	struct qedn_ctx *qedn = conn_ctx->qedn;
> +	int rc = 0;
> +
> +	rc = qedn_set_con_state(conn_ctx, CONN_STATE_WAIT_FOR_UPDATE_EQE);
> +	if (rc)
> +		return rc;
> +
> +	rc = qedn_nvmetcp_update_conn(qedn, conn_ctx);
> +	if (rc)
> +		return rc;
> +
> +	if (conn_ctx->state != CONN_STATE_WAIT_FOR_UPDATE_EQE) {
> +		pr_err("cid 0x%x: Unexpected state 0x%x after update ramrod\n",
> +		       conn_ctx->fw_cid, conn_ctx->state);
> +
> +		return -EINVAL;
> +	}
> +
> +	return rc;
> +}
> +
> +static int qedn_send_icreq(struct qedn_conn_ctx *conn_ctx)
> +{
> +	struct nvmetcp_init_conn_req_hdr *icreq_ptr = NULL;
> +	struct storage_sgl_task_params *sgl_task_params;
> +	struct nvmetcp_task_params task_params;
> +	struct qedn_task_ctx *qedn_task = NULL;
> +	struct nvme_tcp_icreq_pdu icreq;
> +	struct nvmetcp_wqe *chain_sqe;
> +	struct nvmetcp_wqe local_sqe;
> +
> +	qedn_task = qedn_get_task_from_pool_insist(conn_ctx, QEDN_ICREQ_CCCID);
> +	if (!qedn_task)
> +		return -EINVAL;
> +
> +	memset(&icreq, 0, sizeof(icreq));
> +	memset(&local_sqe, 0, sizeof(local_sqe));
> +
> +	/* Initialize ICReq */
> +	icreq.hdr.type = nvme_tcp_icreq;
> +	icreq.hdr.hlen = sizeof(icreq);
> +	icreq.hdr.pdo = 0;
> +	icreq.hdr.plen = cpu_to_le32(icreq.hdr.hlen);
> +	icreq.pfv = cpu_to_le16(conn_ctx->required_params.pfv);
> +	icreq.maxr2t = cpu_to_le32(conn_ctx->required_params.maxr2t);
> +	icreq.hpda = conn_ctx->required_params.hpda;
> +	if (conn_ctx->required_params.hdr_digest)
> +		icreq.digest |= NVME_TCP_HDR_DIGEST_ENABLE;
> +	if (conn_ctx->required_params.data_digest)
> +		icreq.digest |= NVME_TCP_DATA_DIGEST_ENABLE;
> +
> +	qedn_swap_bytes((u32 *)&icreq,
> +			(sizeof(icreq) - QEDN_ICREQ_FW_PAYLOAD) /
> +			 sizeof(u32));
> +
> +	/* Initialize task params */
> +	task_params.opq.lo = cpu_to_le32(((u64)(qedn_task)) & 0xffffffff);
> +	task_params.opq.hi = cpu_to_le32(((u64)(qedn_task)) >> 32);
> +	task_params.context = qedn_task->fw_task_ctx;
> +	task_params.sqe = &local_sqe;
> +	task_params.conn_icid = (u16)conn_ctx->conn_handle;
> +	task_params.itid = qedn_task->itid;
> +	task_params.cq_rss_number = conn_ctx->default_cq;
> +	task_params.tx_io_size = QEDN_ICREQ_FW_PAYLOAD;
> +	task_params.rx_io_size = 0; /* Rx doesn't use SGL for icresp */
> +
> +	/* Init SGE for ICReq padding */
> +	sgl_task_params = &qedn_task->sgl_task_params;
> +	sgl_task_params->total_buffer_size = task_params.tx_io_size;
> +	sgl_task_params->small_mid_sge = false;
> +	sgl_task_params->num_sges = 1;
> +	memcpy(sgl_task_params->sgl, &conn_ctx->icreq_pad->sge,
> +	       sizeof(conn_ctx->icreq_pad->sge));
> +	icreq_ptr = (struct nvmetcp_init_conn_req_hdr *)&icreq;
> +
> +	qed_ops->init_icreq_exchange(&task_params, icreq_ptr, sgl_task_params,  NULL);
> +
> +	qedn_set_con_state(conn_ctx, CONN_STATE_WAIT_FOR_IC_COMP);
> +	atomic_inc(&conn_ctx->num_active_fw_tasks);
> +
> +	/* spin_lock - doorbell is accessed  both Rx flow and response flow */
> +	spin_lock(&conn_ctx->ep.doorbell_lock);
> +	chain_sqe = qed_chain_produce(&conn_ctx->ep.fw_sq_chain);
> +	memcpy(chain_sqe, &local_sqe, sizeof(local_sqe));
> +	qedn_ring_doorbell(conn_ctx);
> +	spin_unlock(&conn_ctx->ep.doorbell_lock);
> +
> +	return 0;
> +}
> +

And this is what I meant. You _do_ swab bytes before sending it off to 
the HW, _and_ you use the standard nvme-tcp PDU definitions.
So why do you have your own, byte-swapped versions of the PDUs?

> +void qedn_prep_icresp(struct qedn_conn_ctx *conn_ctx, struct nvmetcp_fw_cqe *cqe)
> +{
> +	struct nvmetcp_icresp_hdr_psh *icresp_from_cqe =
> +		(struct nvmetcp_icresp_hdr_psh *)&cqe->nvme_cqe;
> +	struct nvme_tcp_ofld_ctrl *ctrl = conn_ctx->ctrl;
> +	struct qedn_ctrl *qctrl = NULL;
> +
> +	qctrl = (struct qedn_ctrl *)ctrl->private_data;
> +
> +	memcpy(&conn_ctx->icresp, icresp_from_cqe, sizeof(conn_ctx->icresp));
> +	qedn_set_sp_wa(conn_ctx, HANDLE_ICRESP);
> +	queue_work(qctrl->sp_wq, &conn_ctx->sp_wq_entry);
> +}
> +
> +static int qedn_handle_icresp(struct qedn_conn_ctx *conn_ctx)
> +{
> +	struct nvmetcp_icresp_hdr_psh *icresp = &conn_ctx->icresp;
> +	u16 pfv = __swab16(le16_to_cpu(icresp->pfv_swapped));
> +	int rc = 0;
> +

Again here; you could treat the received icresp as a binary blob,
byteswap it, and then cast it to the standard icresp structure.
Hmm?
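
A minimal user-space sketch of that idea, for illustration only. It
assumes the FW hands back the ICResp with every 32-bit word
byte-reversed (which is what the *_swapped field names suggest, but is
an assumption here), undoes that in one pass over the blob, and then
reads the fields at the offsets used by the standard ICResp layout
(8-byte common header, then pfv, cpda, digest, maxdata). The demo_*
names are hypothetical, and only the first 16 bytes are modelled; the
reserved tail is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ICRESP_LEN 16 /* common header + PSH fields, reserved tail omitted */

/* Hypothetical host-order view of the fields the driver validates. */
struct demo_icresp {
	uint8_t type;
	uint8_t hlen;
	uint32_t plen;
	uint16_t pfv;
	uint8_t cpda;
	uint8_t digest;
	uint32_t maxdata;
};

/*
 * Reverse the byte order inside every 4-byte group. Working on bytes
 * keeps the result independent of host endianness.
 */
static void demo_swap_dwords(uint8_t *buf, size_t len)
{
	size_t i;

	for (i = 0; i + 4 <= len; i += 4) {
		uint8_t t;

		t = buf[i];
		buf[i] = buf[i + 3];
		buf[i + 3] = t;
		t = buf[i + 1];
		buf[i + 1] = buf[i + 2];
		buf[i + 2] = t;
	}
}

/* Little-endian readers, standing in for le16_to_cpu()/le32_to_cpu(). */
static uint16_t demo_get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t demo_get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Un-swap a copy of the FW blob once, then decode in wire order. */
static void demo_parse_icresp(const uint8_t *fw_blob, struct demo_icresp *out)
{
	uint8_t wire[DEMO_ICRESP_LEN];

	memcpy(wire, fw_blob, sizeof(wire));
	demo_swap_dwords(wire, sizeof(wire));

	out->type = wire[0];
	out->hlen = wire[2];
	out->plen = demo_get_le32(&wire[4]);
	out->pfv = demo_get_le16(&wire[8]);
	out->cpda = wire[10];
	out->digest = wire[11];
	out->maxdata = demo_get_le32(&wire[12]);
}
```

With a single swap at the boundary, the validation code would not need
per-field __swab16()/__swab32() calls or driver-private *_swapped
structure definitions.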

> +	qedn_free_icreq_pad(conn_ctx);
> +
> +	/* Validate ICResp */
> +	if (pfv != conn_ctx->required_params.pfv) {
> +		pr_err("cid %u: unsupported pfv %u\n", conn_ctx->fw_cid, pfv);
> +
> +		return -EINVAL;
> +	}
> +
> +	if (icresp->cpda > conn_ctx->required_params.cpda) {
> +		pr_err("cid %u: unsupported cpda %u\n", conn_ctx->fw_cid, icresp->cpda);
> +
> +		return -EINVAL;
> +	}
> +
> +	if ((NVME_TCP_HDR_DIGEST_ENABLE & icresp->digest) !=
> +	    conn_ctx->required_params.hdr_digest) {
> +		if ((NVME_TCP_HDR_DIGEST_ENABLE & icresp->digest) >
> +		    conn_ctx->required_params.hdr_digest) {
> +			pr_err("cid 0x%x: invalid header digest bit\n", conn_ctx->fw_cid);
> +		}
> +	}
> +
> +	if ((NVME_TCP_DATA_DIGEST_ENABLE & icresp->digest) !=
> +	    conn_ctx->required_params.data_digest) {
> +		if ((NVME_TCP_DATA_DIGEST_ENABLE & icresp->digest) >
> +		    conn_ctx->required_params.data_digest) {
> +			pr_err("cid 0x%x: invalid data digest bit\n", conn_ctx->fw_cid);
> +		}
> +	}
> +
> +	memset(&conn_ctx->pdu_params, 0, sizeof(conn_ctx->pdu_params));
> +	conn_ctx->pdu_params.maxh2cdata =
> +		__swab32(le32_to_cpu(icresp->maxdata_swapped));
> +	conn_ctx->pdu_params.maxh2cdata = QEDN_MAX_PDU_SIZE;
> +	if (conn_ctx->pdu_params.maxh2cdata > QEDN_MAX_PDU_SIZE)
> +		conn_ctx->pdu_params.maxh2cdata = QEDN_MAX_PDU_SIZE;
> +
> +	conn_ctx->pdu_params.pfv = pfv;
> +	conn_ctx->pdu_params.cpda = icresp->cpda;
> +	conn_ctx->pdu_params.hpda = conn_ctx->required_params.hpda;
> +	conn_ctx->pdu_params.hdr_digest = NVME_TCP_HDR_DIGEST_ENABLE & icresp->digest;
> +	conn_ctx->pdu_params.data_digest = NVME_TCP_DATA_DIGEST_ENABLE & icresp->digest;
> +	conn_ctx->pdu_params.maxr2t = conn_ctx->required_params.maxr2t;
> +	rc = qedn_update_ramrod(conn_ctx);
> +
> +	return rc;
> +}
> +
>   /* Slowpath EQ Callback */
>   int qedn_event_cb(void *context, u8 fw_event_code, void *event_ring_data)
>   {
> @@ -363,7 +638,8 @@ int qedn_event_cb(void *context, u8 fw_event_code, void *event_ring_data)
>   			if (rc)
>   				return rc;
>   
> -			/* Placeholder - for ICReq flow */
> +			qedn_set_sp_wa(conn_ctx, SEND_ICREQ);
> +			queue_work(qctrl->sp_wq, &conn_ctx->sp_wq_entry);
>   		}
>   
>   		break;
> @@ -399,6 +675,7 @@ static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)
>   	}
>   
>   	set_bit(QEDN_CONN_RESRC_FW_SQ, &conn_ctx->resrc_state);
> +	spin_lock_init(&conn_ctx->ep.doorbell_lock);
>   	INIT_LIST_HEAD(&conn_ctx->host_pend_req_list);
>   	spin_lock_init(&conn_ctx->nvme_req_lock);
>   	atomic_set(&conn_ctx->num_active_tasks, 0);
> @@ -463,6 +740,11 @@ static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)
>   
>   	memset(conn_ctx->host_cccid_itid, 0xFF, dma_size);
>   	set_bit(QEDN_CONN_RESRC_CCCID_ITID_MAP, &conn_ctx->resrc_state);
> +
> +	rc = qedn_alloc_icreq_pad(conn_ctx);
> +	if (rc)
> +		goto rel_conn;
> +
>   	rc = qedn_set_con_state(conn_ctx, CONN_STATE_WAIT_FOR_CONNECT_DONE);
>   	if (rc)
>   		goto rel_conn;
> @@ -523,6 +805,9 @@ void qedn_sp_wq_handler(struct work_struct *work)
>   
>   	qedn = conn_ctx->qedn;
>   	if (test_bit(DESTROY_CONNECTION, &conn_ctx->agg_work_action)) {
> +		if (test_bit(HANDLE_ICRESP, &conn_ctx->agg_work_action))
> +			qedn_clr_sp_wa(conn_ctx, HANDLE_ICRESP);
> +
>   		qedn_destroy_connection(conn_ctx);
>   
>   		return;
> @@ -537,6 +822,36 @@ void qedn_sp_wq_handler(struct work_struct *work)
>   			return;
>   		}
>   	}
> +
> +	if (test_bit(SEND_ICREQ, &conn_ctx->agg_work_action)) {
> +		qedn_clr_sp_wa(conn_ctx, SEND_ICREQ);
> +		rc = qedn_send_icreq(conn_ctx);
> +		if (rc)
> +			return;
> +
> +		return;
> +	}
> +
> +	if (test_bit(HANDLE_ICRESP, &conn_ctx->agg_work_action)) {
> +		rc = qedn_handle_icresp(conn_ctx);
> +
> +		qedn_clr_sp_wa(conn_ctx, HANDLE_ICRESP);
> +		if (rc) {
> +			pr_err("IC handling returned with 0x%x\n", rc);
> +			if (test_and_set_bit(DESTROY_CONNECTION, &conn_ctx->agg_work_action))
> +				return;
> +
> +			qedn_destroy_connection(conn_ctx);
> +
> +			return;
> +		}
> +
> +		atomic_inc(&conn_ctx->est_conn_indicator);
> +		qedn_set_con_state(conn_ctx, CONN_STATE_NVMETCP_CONN_ESTABLISHED);
> +		wake_up_interruptible(&conn_ctx->conn_waitq);
> +
> +		return;
> +	}
>   }
>   
>   /* Clear connection aggregative slowpath work action */
> diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
> index 8d9c19d63480..a6756d7250b7 100644
> --- a/drivers/nvme/hw/qedn/qedn_main.c
> +++ b/drivers/nvme/hw/qedn/qedn_main.c
> @@ -285,6 +285,19 @@ static void qedn_set_ctrl_io_cpus(struct qedn_conn_ctx *conn_ctx, int qid)
>   	conn_ctx->cpu = fp_q->cpu;
>   }
>   
> +static void qedn_set_pdu_params(struct qedn_conn_ctx *conn_ctx)
> +{
> +	/* Enable digest once supported */
> +	conn_ctx->required_params.hdr_digest = 0;
> +	conn_ctx->required_params.data_digest = 0;
> +
> +	conn_ctx->required_params.maxr2t = QEDN_MAX_OUTSTANDING_R2T_PDUS;
> +	conn_ctx->required_params.pfv = NVME_TCP_PFV_1_0;
> +	conn_ctx->required_params.cpda = 0;
> +	conn_ctx->required_params.hpda = 0;
> +	conn_ctx->required_params.maxh2cdata = QEDN_MAX_PDU_SIZE;
> +}
> +
>   static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid, size_t q_size)
>   {
>   	struct nvme_tcp_ofld_ctrl *ctrl = queue->ctrl;
> @@ -307,6 +320,7 @@ static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid, size_t
>   	conn_ctx->ctrl = ctrl;
>   	conn_ctx->sq_depth = q_size;
>   	qedn_set_ctrl_io_cpus(conn_ctx, qid);
> +	qedn_set_pdu_params(conn_ctx);
>   
>   	init_waitqueue_head(&conn_ctx->conn_waitq);
>   	atomic_set(&conn_ctx->est_conn_indicator, 0);
> @@ -1073,6 +1087,14 @@ static int qedn_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>   	return __qedn_probe(pdev);
>   }
>   
> +void qedn_swap_bytes(u32 *p, int size)
> +{
> +	int i;
> +
> +	for (i = 0; i < size; ++i, ++p)
> +		*p = __swab32(*p);
> +}
> +
>   static struct pci_driver qedn_pci_driver = {
>   	.name     = QEDN_MODULE_NAME,
>   	.id_table = qedn_pci_tbl,
> diff --git a/drivers/nvme/hw/qedn/qedn_task.c b/drivers/nvme/hw/qedn/qedn_task.c
> index 54f2f4cba6ea..9cb84883e95e 100644
> --- a/drivers/nvme/hw/qedn/qedn_task.c
> +++ b/drivers/nvme/hw/qedn/qedn_task.c
> @@ -536,9 +536,11 @@ void qedn_io_work_cq(struct qedn_ctx *qedn, struct nvmetcp_fw_cqe *cqe)
>   			break;
>   
>   		case NVMETCP_TASK_TYPE_INIT_CONN_REQUEST:
> -
> -			/* Placeholder - ICReq flow */
> -
> +			/* Clear ICReq-padding SGE from SGL */
> +			qedn_common_clear_fw_sgl(&qedn_task->sgl_task_params);
> +			/* Task is not required for icresp processing */
> +			qedn_return_task_to_pool(conn_ctx, qedn_task);
> +			qedn_prep_icresp(conn_ctx, cqe);
>   			break;
>   		default:
>   			pr_info("Could not identify task type\n");
> 
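One more note on qedn_handle_icresp(): as quoted, maxh2cdata is read from the
ICResp and then overwritten with QEDN_MAX_PDU_SIZE on the next line, so the
clamp that follows can never trigger. Assuming the clamp is what was intended,
the checks could look roughly like the userspace sketch below. The struct,
the function name and the MAX_PDU_SIZE value are placeholders, and unlike the
quoted code (which only logs) the sketch rejects a digest the host did not ask
for:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define HDR_DIGEST_ENABLE  (1u << 0)	/* mirrors NVME_TCP_HDR_DIGEST_ENABLE */
#define DATA_DIGEST_ENABLE (1u << 1)	/* mirrors NVME_TCP_DATA_DIGEST_ENABLE */
#define MAX_PDU_SIZE       0x80000u	/* placeholder for QEDN_MAX_PDU_SIZE */

/* Hypothetical, trimmed-down version of the negotiated PDU parameters */
struct pdu_params {
	uint16_t pfv;
	uint8_t cpda;
	bool hdr_digest;
	bool data_digest;
	uint32_t maxh2cdata;
};

/* Validate the ICResp values against what the host requested */
static int icresp_negotiate(const struct pdu_params *required,
			    uint16_t pfv, uint8_t cpda, uint8_t digest,
			    uint32_t maxdata, struct pdu_params *out)
{
	bool hdr_digest = digest & HDR_DIGEST_ENABLE;
	bool data_digest = digest & DATA_DIGEST_ENABLE;

	if (pfv != required->pfv)
		return -EINVAL;
	if (cpda > required->cpda)
		return -EINVAL;
	/* The target must not enable a digest the host did not request */
	if ((hdr_digest && !required->hdr_digest) ||
	    (data_digest && !required->data_digest))
		return -EINVAL;

	out->pfv = pfv;
	out->cpda = cpda;
	out->hdr_digest = hdr_digest;
	out->data_digest = data_digest;
	/* Clamp the target's MAXH2CDATA to the device limit */
	out->maxh2cdata = maxdata < MAX_PDU_SIZE ? maxdata : MAX_PDU_SIZE;

	return 0;
}
```
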
Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

* Re: [RFC PATCH v4 25/27] qedn: Add IO level fastpath functionality
  2021-04-29 19:09 ` [RFC PATCH v4 25/27] qedn: Add IO level fastpath functionality Shai Malin
@ 2021-05-02 11:54   ` Hannes Reinecke
  2021-05-05 18:04     ` Shai Malin
  0 siblings, 1 reply; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-02 11:54 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024

On 4/29/21 9:09 PM, Shai Malin wrote:
> This patch will present the IO level functionality of qedn
> nvme-tcp-offload host mode. The qedn_task_ctx structure contains
> various params and state of the current IO, and is mapped 1:1 to the
> fw_task_ctx which is a HW and FW IO context.
> A qedn_task is mapped directly to its parent connection.
> For every new IO a qedn_task structure will be assigned and they will be
> linked for the entire IO's life span.
> 
> The patch will include 2 flows:
>    1. Send new command to the FW:
>       The flow is: nvme_tcp_ofld_queue_rq() which invokes qedn_send_req()
>       which invokes qedn_queue_request() which will:
>       - Assign fw_task_ctx.
>       - Prepare the Read/Write SG buffer.
>       - Initialize the HW and FW context.
>       - Pass the IO to the FW.
> 
>    2. Process the IO completion:
>       The flow is: qedn_irq_handler() which invokes qedn_fw_cq_fp_handler()
>       which invokes qedn_io_work_cq() which will:
>       - Process the FW completion.
>       - Return the fw_task_ctx to the task pool.
>       - Complete the nvme req.
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> ---
>   drivers/nvme/hw/qedn/qedn.h      |   4 +
>   drivers/nvme/hw/qedn/qedn_conn.c |   1 +
>   drivers/nvme/hw/qedn/qedn_task.c | 269 ++++++++++++++++++++++++++++++-
>   3 files changed, 272 insertions(+), 2 deletions(-)
> 
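For readers following along, the task lifecycle described in the commit
message (assign a task per IO, return it to the pool on completion) can be
pictured with a small free-list pool. This is only an illustration: the
names, the pool size and the missing locking are my assumptions, not what
qedn_task.c actually does.

```c
#include <stddef.h>

#define POOL_SIZE 4	/* hypothetical; the real pool is sized by the driver */

/* Hypothetical per-IO task: one entry per HW/FW task context */
struct task {
	int itid;		/* index of the matching FW task context */
	void *req;		/* the IO currently bound to this task */
	struct task *next_free;
};

struct task_pool {
	struct task tasks[POOL_SIZE];
	struct task *free_list;
};

/* Put every task on the free list, lowest itid first */
static void pool_init(struct task_pool *p)
{
	p->free_list = NULL;
	for (int i = POOL_SIZE - 1; i >= 0; i--) {
		p->tasks[i].itid = i;
		p->tasks[i].req = NULL;
		p->tasks[i].next_free = p->free_list;
		p->free_list = &p->tasks[i];
	}
}

/* Submission path: bind a free task to the IO, or NULL if none is left */
static struct task *task_get(struct task_pool *p, void *req)
{
	struct task *t = p->free_list;

	if (!t)
		return NULL;

	p->free_list = t->next_free;
	t->next_free = NULL;
	t->req = req;

	return t;
}

/* Completion path: unbind the IO and return the task to the pool */
static void task_put(struct task_pool *p, struct task *t)
{
	t->req = NULL;
	t->next_free = p->free_list;
	p->free_list = t;
}
```

A real driver would need a lock (or per-CPU lists) around the free list,
since submission and completion run in different contexts.
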
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

* Re: [RFC PATCH v4 26/27] qedn: Add Connection and IO level recovery flows
  2021-04-29 19:09 ` [RFC PATCH v4 26/27] qedn: Add Connection and IO level recovery flows Shai Malin
@ 2021-05-02 11:57   ` Hannes Reinecke
  2021-05-05 18:06     ` Shai Malin
  0 siblings, 1 reply; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-02 11:57 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024

On 4/29/21 9:09 PM, Shai Malin wrote:
> This patch will present the connection level functionalities:
>   - conn clear-sq: will release the FW restrictions in order to flush all
>     the pending IOs.
>   - drain: in case clear-sq is stuck, will release all the device FW
>     restrictions in order to flush all the pending IOs.
>   - task cleanup: will flush the IO level resources.
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> ---
>   drivers/nvme/hw/qedn/qedn.h      |   8 ++
>   drivers/nvme/hw/qedn/qedn_conn.c | 133 ++++++++++++++++++++++++++++++-
>   drivers/nvme/hw/qedn/qedn_main.c |   1 +
>   drivers/nvme/hw/qedn/qedn_task.c |  27 ++++++-
>   4 files changed, 166 insertions(+), 3 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

* Re: [RFC PATCH v4 27/27] qedn: Add support of ASYNC
  2021-04-29 19:09 ` [RFC PATCH v4 27/27] qedn: Add support of ASYNC Shai Malin
@ 2021-05-02 11:59   ` Hannes Reinecke
  2021-05-05 18:08     ` Shai Malin
  0 siblings, 1 reply; 81+ messages in thread
From: Hannes Reinecke @ 2021-05-02 11:59 UTC (permalink / raw)
  To: Shai Malin, netdev, linux-nvme, sagi, hch, axboe, kbusch
  Cc: David S . Miller davem @ davemloft . net --cc=Jakub Kicinski,
	aelior, mkalderon, okulkarni, pkushwaha, malin1024

On 4/29/21 9:09 PM, Shai Malin wrote:
> From: Prabhakar Kushwaha <pkushwaha@marvell.com>
> 
> This patch implements ASYNC request and response event notification
> handling at qedn driver level.
> 
> NVME Ofld layer's ASYNC request is treated similarly to a read with a
> fake CCCID. This CCCID is used to route the ASYNC notification back to
> the NVME ofld layer.
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> ---
>   drivers/nvme/hw/qedn/qedn.h      |   8 ++
>   drivers/nvme/hw/qedn/qedn_main.c |   1 +
>   drivers/nvme/hw/qedn/qedn_task.c | 156 +++++++++++++++++++++++++++++--
>   3 files changed, 156 insertions(+), 9 deletions(-)
> 
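As I read the description, the "fake CCCID" idea amounts to reserving one
command ID outside the normal tag range and routing completions on it. A
minimal sketch of that routing, with a made-up queue depth and made-up names
(the actual value and lookup in qedn may well differ):

```c
#include <stdint.h>

#define SQ_DEPTH    128u			/* hypothetical queue depth */
#define ASYNC_CCCID ((uint16_t)SQ_DEPTH)	/* first ID past the normal tags */

enum completion_route {
	ROUTE_IO_REQUEST,	/* regular request, looked up by its tag */
	ROUTE_ASYNC_EVENT,	/* hand the result to the async event path */
	ROUTE_INVALID,
};

/* Decide where a completion goes based only on its command ID */
static enum completion_route route_completion(uint16_t cccid)
{
	if (cccid == ASYNC_CCCID)
		return ROUTE_ASYNC_EVENT;
	if (cccid < SQ_DEPTH)
		return ROUTE_IO_REQUEST;

	return ROUTE_INVALID;
}
```
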
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

* Re: [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver
  2021-05-01 16:47 ` [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver Hannes Reinecke
@ 2021-05-03 15:13   ` Shai Malin
  0 siblings, 0 replies; 81+ messages in thread
From: Shai Malin @ 2021-05-03 15:13 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Shai Malin, netdev, linux-nvme, davem, kuba, sagi, hch, axboe,
	kbusch, Ariel Elior, Michal Kalderon, okulkarni, pkushwaha

On 5/1/21 7:47 PM, Hannes Reinecke wrote:
> On 4/29/21 9:08 PM, Shai Malin wrote:
> > With the goal of enabling a generic infrastructure that allows NVMe/TCP
> > offload devices like NICs to seamlessly plug into the NVMe-oF stack, this
> > patch series introduces the nvme-tcp-offload ULP host layer, which will
> > be a new transport type called "tcp-offload" and will serve as an
> > abstraction layer to work with vendor specific nvme-tcp offload drivers.
> >
> > NVMeTCP offload is a full offload of the NVMeTCP protocol, this includes
> > both the TCP level and the NVMeTCP level.
> >
> > The nvme-tcp-offload transport can co-exist with the existing tcp and
> > other transports. The tcp offload was designed so that stack changes are
> > kept to a bare minimum: only registering new transports.
> > All other APIs, ops etc. are identical to the regular tcp transport.
> > Representing the TCP offload as a new transport allows clear and manageable
> > differentiation between the connections which should use the offload path
> > and those that are not offloaded (even on the same device).
> >
> >
> > The nvme-tcp-offload layers and API compared to nvme-tcp and nvme-rdma:
> >
> > * NVMe layer: *
> >
> >         [ nvme/nvme-fabrics/blk-mq ]
> >               |
> >          (nvme API and blk-mq API)
> >               |
> >               |
> > * Vendor agnostic transport layer: *
> >
> >        [ nvme-rdma ] [ nvme-tcp ] [ nvme-tcp-offload ]
> >               |        |             |
> >             (Verbs)
> >               |        |             |
> >               |     (Socket)
> >               |        |             |
> >               |        |        (nvme-tcp-offload API)
> >               |        |             |
> >               |        |             |
> > * Vendor Specific Driver: *
> >
> >               |        |             |
> >             [ qedr ]
> >                        |             |
> >                     [ qede ]
> >                                      |
> >                                    [ qedn ]
> >
> >
> > Performance:
> > ============
> > With this implementation on top of the Marvell qedn driver (using the
> > Marvell FastLinQ NIC), we were able to demonstrate the following CPU
> > utilization improvement:
> >
> > On AMD EPYC 7402, 2.80GHz, 28 cores:
> > - For 16K queued read IOs, 16jobs, 4qd (50Gbps line rate):
> >    Improved the CPU utilization from 15.1% with NVMeTCP SW to 4.7% with
> >    NVMeTCP offload.
> >
> > On Intel(R) Xeon(R) Gold 5122 CPU, 3.60GHz, 16 cores:
> > - For 512K queued read IOs, 16jobs, 4qd (25Gbps line rate):
> >    Improved the CPU utilization from 16.3% with NVMeTCP SW to 1.1% with
> >    NVMeTCP offload.
> >
> > In addition, we were able to demonstrate the following latency improvement:
> > - For 200K read IOPS (16 jobs, 16 qd, with fio rate limiter):
> >    Improved the average latency from 105 usec with NVMeTCP SW to 39 usec
> >    with NVMeTCP offload.
> >
> >    Improved the 99.99 tail latency from 570 usec with NVMeTCP SW to 91 usec
> >    with NVMeTCP offload.
> >
> > The end-to-end offload latency was measured from fio while running against
> > back end of null device.
> >
> >
> > Upstream plan:
> > ==============
> > Following this RFC, the series will be sent in a modular way so that changes
> > in each part will not impact the previous part.
> >
> > - Part 1 (Patches 1-7):
> >    The qed infrastructure, will be sent to 'netdev@vger.kernel.org'.
> >
> > - Part 2 (Patches 8-15):
> >    The nvme-tcp-offload patches, will be sent to
> >    'linux-nvme@lists.infradead.org'.
> >
> > - Part 3 (Patches 16-27):
> >    The qedn patches, will be sent to 'linux-nvme@lists.infradead.org'.
> >
> >
> > Queue Initialization Design:
> > ============================
> > The nvme-tcp-offload ULP module shall register with the existing
> > nvmf_transport_ops (.name = "tcp_offload"), nvme_ctrl_ops and blk_mq_ops.
> > The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
> > with the following ops:
> > - claim_dev() - in order to resolve the route to the target according to
> >                  the paired net_dev.
> > - create_queue() - in order to create offloaded nvme-tcp queue.
> >
> > The nvme-tcp-offload ULP module shall manage all the controller level
> > functionalities, call claim_dev and based on the return values shall call
> > the relevant module create_queue in order to create the admin queue and
> > the IO queues.
> >
> >
> > IO-path Design:
> > ===============
> > The nvme-tcp-offload shall work at the IO-level - the nvme-tcp-offload
> > ULP module shall pass the request (the IO) to the nvme-tcp-offload vendor
> > driver and later, the nvme-tcp-offload vendor driver returns the request
> > completion (the IO completion).
> > No additional handling is needed in between; this design will reduce the
> > CPU utilization as we will describe below.
> >
> > The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
> > with the following IO-path ops:
> > - init_req()
> > - send_req() - in order to pass the request to the handling of the
> >                 offload driver that shall pass it to the vendor specific device.
> > - poll_queue()
> >
> > Once the IO completes, the nvme-tcp-offload vendor driver shall call
> > command.done() that will invoke the nvme-tcp-offload ULP layer to
> > complete the request.
> >
> >
> > TCP events:
> > ===========
> > The Marvell FastLinQ NIC HW engine handles all the TCP re-transmissions
> > and OOO events.
> >
> >
> > Teardown and errors:
> > ====================
> > In case of NVMeTCP queue error the nvme-tcp-offload vendor driver shall
> > call the nvme_tcp_ofld_report_queue_err.
> > The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
> > with the following teardown ops:
> > - drain_queue()
> > - destroy_queue()
> >
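For reference, the three groups of ops above (queue init, IO path, teardown)
can be pictured as one table of function pointers that the vendor driver
hands to the ULP. The sketch below is illustrative only: every type and
function name in it is hypothetical (the real table lives in
drivers/nvme/host/tcp-offload.h in this series and may differ), and only the
create_queue() argument list is modelled on qedn_create_queue().

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the ULP's device, queue and request objects */
struct ofld_dev { int claimed; };
struct ofld_queue { int qid; size_t size; int created; };
struct ofld_req;

struct ofld_ops {
	/* Queue initialization */
	int (*claim_dev)(struct ofld_dev *dev);
	int (*create_queue)(struct ofld_queue *q, int qid, size_t q_size);
	/* IO path */
	int (*init_req)(struct ofld_req *req);
	int (*send_req)(struct ofld_req *req);
	int (*poll_queue)(struct ofld_queue *q);
	/* Teardown */
	void (*drain_queue)(struct ofld_queue *q);
	void (*destroy_queue)(struct ofld_queue *q);
};

/* ULP side: claim the device, then create the admin (qid 0) and IO queues */
static int ofld_setup_queues(const struct ofld_ops *ops, struct ofld_dev *dev,
			     struct ofld_queue *queues, int nr_queues,
			     size_t q_size)
{
	int rc, qid;

	if (!ops->claim_dev || !ops->create_queue || !ops->destroy_queue)
		return -EINVAL;

	rc = ops->claim_dev(dev);
	if (rc)
		return rc;

	for (qid = 0; qid < nr_queues; qid++) {
		rc = ops->create_queue(&queues[qid], qid, q_size);
		if (rc)
			goto unwind;
	}

	return 0;

unwind:
	/* Destroy only the queues that were created before the failure */
	while (--qid >= 0)
		ops->destroy_queue(&queues[qid]);

	return rc;
}

/* Dummy vendor driver, just enough to exercise the flow */
static int dummy_claim_dev(struct ofld_dev *dev)
{
	dev->claimed = 1;
	return 0;
}

static int dummy_create_queue(struct ofld_queue *q, int qid, size_t q_size)
{
	q->qid = qid;
	q->size = q_size;
	q->created = 1;
	return 0;
}

static void dummy_destroy_queue(struct ofld_queue *q)
{
	q->created = 0;
}

static const struct ofld_ops dummy_ops = {
	.claim_dev = dummy_claim_dev,
	.create_queue = dummy_create_queue,
	.destroy_queue = dummy_destroy_queue,
};
```

The point of the table is that the ULP never needs vendor-specific
knowledge: it only checks that the ops it relies on are present and calls
through them.
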
> >
> > The Marvell FastLinQ NIC HW engine:
> > ====================================
> > The Marvell NIC HW engine is capable of offloading the entire TCP/IP
> > stack and managing up to 64K connections per PF, already implemented and
> > upstream use cases for this include iWARP (by the Marvell qedr driver)
> > and iSCSI (by the Marvell qedi driver).
> > In addition, the Marvell NIC HW engine offloads the NVMeTCP queue layer
> > and is able to manage the IO level also in case of TCP re-transmissions
> > and OOO events.
> > The HW engine enables direct data placement (including the data digest CRC
> > calculation and validation) and direct data transmission (including data
> > digest CRC calculation).
> >
> >
> > The Marvell qedn driver:
> > ========================
> > The new driver will be added under "drivers/nvme/hw" and will be enabled
> > by the Kconfig "Marvell NVM Express over Fabrics TCP offload".
> > As part of the qedn init, the driver will register as a pci device driver
> > and will work with the Marvell FastLinQ NIC.
> > As part of the probe, the driver will register to the nvme_tcp_offload
> > (ULP) and to the qed module (qed_nvmetcp_ops) - similar to other
> > "qed_*_ops" which are used by the qede, qedr, qedf and qedi device
> > drivers.
> >
> >
> > QEDN Future work:
> > =================
> > - Support extended HW resources.
> > - Digest support.
> > - Devlink support for device configuration and TCP offload configurations.
> > - Statistics
> >
> >
> > Long term future work:
> > ======================
> > - The nvme-tcp-offload ULP target abstraction layer.
> > - The Marvell nvme-tcp-offload "qednt" target driver.
> >
> >
> > Changes since RFC v1:
> > =====================
> > - Fix nvme_tcp_ofld_ops return values.
> > - Remove NVMF_TRTYPE_TCP_OFFLOAD.
> > - Add nvme_tcp_ofld_poll() implementation.
> > - Fix nvme_tcp_ofld_queue_rq() to check map_sg() and send_req() return
> >    values.
> >
> > Changes since RFC v2:
> > =====================
> > - Add qedn - Marvell's NVMeTCP HW offload vendor driver init and probe
> >    (patches 8-11).
> > - Fixes in controller and queue level (patches 3-6).
> >
> > Changes since RFC v3:
> > =====================
> > - Add the full implementation of the nvme-tcp-offload layer including the
> >    new ops: setup_ctrl(), release_ctrl(), commit_rqs() and new flows (ASYNC
> >    and timeout).
> > - Add nvme-tcp-offload device maximums: max_hw_sectors, max_segments.
> > - Add nvme-tcp-offload layer design and optimization changes.
> > - Add the qedn full implementation for the conn level, IO path and error
> >    handling.
> > - Add qed support for the new AHP HW.
> >
> >
> > Arie Gershberg (3):
> >    nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS
> >      definitions
> >    nvme-tcp-offload: Add controller level implementation
> >    nvme-tcp-offload: Add controller level error recovery implementation
> >
> > Dean Balandin (3):
> >    nvme-tcp-offload: Add device scan implementation
> >    nvme-tcp-offload: Add queue level implementation
> >    nvme-tcp-offload: Add IO level implementation
> >
> > Nikolay Assa (2):
> >    qed: Add IP services APIs support
> >    qedn: Add qedn_claim_dev API support
> >
> > Omkar Kulkarni (1):
> >    qed: Add qed-NVMeTCP personality
> >
> > Prabhakar Kushwaha (6):
> >    qed: Add support of HW filter block
> >    qedn: Add connection-level slowpath functionality
> >    qedn: Add support of configuring HW filter block
> >    qedn: Add support of Task and SGL
> >    qedn: Add support of NVME ICReq & ICResp
> >    qedn: Add support of ASYNC
> >
> > Shai Malin (12):
> >    qed: Add NVMeTCP Offload PF Level FW and HW HSI
> >    qed: Add NVMeTCP Offload Connection Level FW and HW HSI
> >    qed: Add NVMeTCP Offload IO Level FW and HW HSI
> >    qed: Add NVMeTCP Offload IO Level FW Initializations
> >    nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP
> >    nvme-tcp-offload: Add Timeout and ASYNC Support
> >    qedn: Add qedn - Marvell's NVMeTCP HW offload vendor driver
> >    qedn: Add qedn probe
> >    qedn: Add IRQ and fast-path resources initializations
> >    qedn: Add IO level nvme_req and fw_cq workqueues
> >    qedn: Add IO level fastpath functionality
> >    qedn: Add Connection and IO level recovery flows
> >
> >   MAINTAINERS                                   |   10 +
> >   drivers/net/ethernet/qlogic/Kconfig           |    3 +
> >   drivers/net/ethernet/qlogic/qed/Makefile      |    5 +
> >   drivers/net/ethernet/qlogic/qed/qed.h         |   16 +
> >   drivers/net/ethernet/qlogic/qed/qed_cxt.c     |   32 +
> >   drivers/net/ethernet/qlogic/qed/qed_cxt.h     |    1 +
> >   drivers/net/ethernet/qlogic/qed/qed_dev.c     |  151 +-
> >   drivers/net/ethernet/qlogic/qed/qed_hsi.h     |    4 +-
> >   drivers/net/ethernet/qlogic/qed/qed_ll2.c     |   31 +-
> >   drivers/net/ethernet/qlogic/qed/qed_mcp.c     |    3 +
> >   drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c |    3 +-
> >   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c |  868 +++++++++++
> >   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h |  114 ++
> >   .../qlogic/qed/qed_nvmetcp_fw_funcs.c         |  372 +++++
> >   .../qlogic/qed/qed_nvmetcp_fw_funcs.h         |   43 +
> >   .../qlogic/qed/qed_nvmetcp_ip_services.c      |  239 +++
> >   drivers/net/ethernet/qlogic/qed/qed_ooo.c     |    5 +-
> >   drivers/net/ethernet/qlogic/qed/qed_sp.h      |    5 +
> >   .../net/ethernet/qlogic/qed/qed_sp_commands.c |    1 +
> >   drivers/nvme/Kconfig                          |    1 +
> >   drivers/nvme/Makefile                         |    1 +
> >   drivers/nvme/host/Kconfig                     |   16 +
> >   drivers/nvme/host/Makefile                    |    3 +
> >   drivers/nvme/host/fabrics.c                   |    7 -
> >   drivers/nvme/host/fabrics.h                   |    7 +
> >   drivers/nvme/host/tcp-offload.c               | 1330 +++++++++++++++++
> >   drivers/nvme/host/tcp-offload.h               |  209 +++
> >   drivers/nvme/hw/Kconfig                       |    9 +
> >   drivers/nvme/hw/Makefile                      |    3 +
> >   drivers/nvme/hw/qedn/Makefile                 |    4 +
> >   drivers/nvme/hw/qedn/qedn.h                   |  435 ++++++
> >   drivers/nvme/hw/qedn/qedn_conn.c              |  999 +++++++++++++
> >   drivers/nvme/hw/qedn/qedn_main.c              | 1153 ++++++++++++++
> >   drivers/nvme/hw/qedn/qedn_task.c              |  977 ++++++++++++
> >   include/linux/qed/common_hsi.h                |    1 +
> >   include/linux/qed/nvmetcp_common.h            |  616 ++++++++
> >   include/linux/qed/qed_if.h                    |   22 +
> >   include/linux/qed/qed_nvmetcp_if.h            |  244 +++
> >   .../linux/qed/qed_nvmetcp_ip_services_if.h    |   29 +
> >   39 files changed, 7947 insertions(+), 25 deletions(-)
> >   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
> >   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h
> >   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.c
> >   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.h
> >   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c
> >   create mode 100644 drivers/nvme/host/tcp-offload.c
> >   create mode 100644 drivers/nvme/host/tcp-offload.h
> >   create mode 100644 drivers/nvme/hw/Kconfig
> >   create mode 100644 drivers/nvme/hw/Makefile
> >   create mode 100644 drivers/nvme/hw/qedn/Makefile
> >   create mode 100644 drivers/nvme/hw/qedn/qedn.h
> >   create mode 100644 drivers/nvme/hw/qedn/qedn_conn.c
> >   create mode 100644 drivers/nvme/hw/qedn/qedn_main.c
> >   create mode 100644 drivers/nvme/hw/qedn/qedn_task.c
> >   create mode 100644 include/linux/qed/nvmetcp_common.h
> >   create mode 100644 include/linux/qed/qed_nvmetcp_if.h
> >   create mode 100644 include/linux/qed/qed_nvmetcp_ip_services_if.h
> >
> I would structure this patchset slightly different, in putting the
> NVMe-oF implementation at the start of the patchset; this will be where
> you get most of the comment, and any change there will potentially
> reflect back on the driver implementation, too.
>
> Something to consider for the next round.

Will do. Thanks.

>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke                Kernel Storage Architect
> hare@suse.de                              +49 911 74053 688
> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

* Re: [RFC PATCH v4 01/27] qed: Add NVMeTCP Offload PF Level FW and HW HSI
  2021-05-01 16:50   ` Hannes Reinecke
@ 2021-05-03 15:23     ` Shai Malin
  0 siblings, 0 replies; 81+ messages in thread
From: Shai Malin @ 2021-05-03 15:23 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Shai Malin, netdev, linux-nvme, davem, kuba, sagi, hch, axboe,
	kbusch, Ariel Elior, Michal Kalderon, okulkarni, pkushwaha,
	Dean Balandin

On 5/1/21 7:50 PM, Hannes Reinecke wrote:
> On 4/29/21 9:09 PM, Shai Malin wrote:
> > This patch introduces the NVMeTCP device and PF level HSI and HSI
> > functionality in order to initialize and interact with the HW device.
> >
> > This patch is based on the qede, qedr, qedi, qedf drivers HSI.
> >
> > Acked-by: Igor Russkikh <irusskikh@marvell.com>
> > Signed-off-by: Dean Balandin <dbalandin@marvell.com>
> > Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> > Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> > Signed-off-by: Shai Malin <smalin@marvell.com>
> > Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> > Signed-off-by: Ariel Elior <aelior@marvell.com>
> > ---
> >   drivers/net/ethernet/qlogic/Kconfig           |   3 +
> >   drivers/net/ethernet/qlogic/qed/Makefile      |   2 +
> >   drivers/net/ethernet/qlogic/qed/qed.h         |   3 +
> >   drivers/net/ethernet/qlogic/qed/qed_hsi.h     |   1 +
> >   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c | 282 ++++++++++++++++++
> >   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h |  51 ++++
> >   drivers/net/ethernet/qlogic/qed/qed_sp.h      |   2 +
> >   include/linux/qed/common_hsi.h                |   1 +
> >   include/linux/qed/nvmetcp_common.h            |  54 ++++
> >   include/linux/qed/qed_if.h                    |  22 ++
> >   include/linux/qed/qed_nvmetcp_if.h            |  72 +++++
> >   11 files changed, 493 insertions(+)
> >   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
> >   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h
> >   create mode 100644 include/linux/qed/nvmetcp_common.h
> >   create mode 100644 include/linux/qed/qed_nvmetcp_if.h
> >
> > diff --git a/drivers/net/ethernet/qlogic/Kconfig b/drivers/net/ethernet/qlogic/Kconfig
> > index 6b5ddb07ee83..98f430905ffa 100644
> > --- a/drivers/net/ethernet/qlogic/Kconfig
> > +++ b/drivers/net/ethernet/qlogic/Kconfig
> > @@ -110,6 +110,9 @@ config QED_RDMA
> >   config QED_ISCSI
> >       bool
> >
> > +config QED_NVMETCP
> > +     bool
> > +
> >   config QED_FCOE
> >       bool
> >
> > diff --git a/drivers/net/ethernet/qlogic/qed/Makefile b/drivers/net/ethernet/qlogic/qed/Makefile
> > index 8251755ec18c..7cb0db67ba5b 100644
> > --- a/drivers/net/ethernet/qlogic/qed/Makefile
> > +++ b/drivers/net/ethernet/qlogic/qed/Makefile
> > @@ -28,6 +28,8 @@ qed-$(CONFIG_QED_ISCSI) += qed_iscsi.o
> >   qed-$(CONFIG_QED_LL2) += qed_ll2.o
> >   qed-$(CONFIG_QED_OOO) += qed_ooo.o
> >
> > +qed-$(CONFIG_QED_NVMETCP) += qed_nvmetcp.o
> > +
> >   qed-$(CONFIG_QED_RDMA) +=   \
> >       qed_iwarp.o             \
> >       qed_rdma.o              \
> > diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h
> > index a20cb8a0c377..91d4635009ab 100644
> > --- a/drivers/net/ethernet/qlogic/qed/qed.h
> > +++ b/drivers/net/ethernet/qlogic/qed/qed.h
> > @@ -240,6 +240,7 @@ enum QED_FEATURE {
> >       QED_VF,
> >       QED_RDMA_CNQ,
> >       QED_ISCSI_CQ,
> > +     QED_NVMETCP_CQ = QED_ISCSI_CQ,
> >       QED_FCOE_CQ,
> >       QED_VF_L2_QUE,
> >       QED_MAX_FEATURES,
> > @@ -592,6 +593,7 @@ struct qed_hwfn {
> >       struct qed_ooo_info             *p_ooo_info;
> >       struct qed_rdma_info            *p_rdma_info;
> >       struct qed_iscsi_info           *p_iscsi_info;
> > +     struct qed_nvmetcp_info         *p_nvmetcp_info;
> >       struct qed_fcoe_info            *p_fcoe_info;
> >       struct qed_pf_params            pf_params;
> >
> > @@ -828,6 +830,7 @@ struct qed_dev {
> >               struct qed_eth_cb_ops           *eth;
> >               struct qed_fcoe_cb_ops          *fcoe;
> >               struct qed_iscsi_cb_ops         *iscsi;
> > +             struct qed_nvmetcp_cb_ops       *nvmetcp;
> >       } protocol_ops;
> >       void                            *ops_cookie;
> >
> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
> > index 559df9f4d656..24472f6a83c2 100644
> > --- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h
> > +++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
> > @@ -20,6 +20,7 @@
> >   #include <linux/qed/fcoe_common.h>
> >   #include <linux/qed/eth_common.h>
> >   #include <linux/qed/iscsi_common.h>
> > +#include <linux/qed/nvmetcp_common.h>
> >   #include <linux/qed/iwarp_common.h>
> >   #include <linux/qed/rdma_common.h>
> >   #include <linux/qed/roce_common.h>
> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
> > new file mode 100644
> > index 000000000000..da3b5002d216
> > --- /dev/null
> > +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
> > @@ -0,0 +1,282 @@
> > +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause)
> > +/* Copyright 2021 Marvell. All rights reserved. */
> > +
> > +#include <linux/types.h>
> > +#include <asm/byteorder.h>
> > +#include <asm/param.h>
> > +#include <linux/delay.h>
> > +#include <linux/dma-mapping.h>
> > +#include <linux/etherdevice.h>
> > +#include <linux/kernel.h>
> > +#include <linux/log2.h>
> > +#include <linux/module.h>
> > +#include <linux/pci.h>
> > +#include <linux/stddef.h>
> > +#include <linux/string.h>
> > +#include <linux/errno.h>
> > +#include <linux/list.h>
> > +#include <linux/qed/qed_nvmetcp_if.h>
> > +#include "qed.h"
> > +#include "qed_cxt.h"
> > +#include "qed_dev_api.h"
> > +#include "qed_hsi.h"
> > +#include "qed_hw.h"
> > +#include "qed_int.h"
> > +#include "qed_nvmetcp.h"
> > +#include "qed_ll2.h"
> > +#include "qed_mcp.h"
> > +#include "qed_sp.h"
> > +#include "qed_reg_addr.h"
> > +
> > +static int qed_nvmetcp_async_event(struct qed_hwfn *p_hwfn, u8 fw_event_code,
> > +                                u16 echo, union event_ring_data *data,
> > +                                u8 fw_return_code)
> > +{
> > +     if (p_hwfn->p_nvmetcp_info->event_cb) {
> > +             struct qed_nvmetcp_info *p_nvmetcp = p_hwfn->p_nvmetcp_info;
> > +
> > +             return p_nvmetcp->event_cb(p_nvmetcp->event_context,
> > +                                      fw_event_code, data);
> > +     } else {
> > +             DP_NOTICE(p_hwfn, "nvmetcp async completion is not set\n");
> > +
> > +             return -EINVAL;
> > +     }
> > +}
> > +
> > +static int qed_sp_nvmetcp_func_start(struct qed_hwfn *p_hwfn,
> > +                                  enum spq_mode comp_mode,
> > +                                  struct qed_spq_comp_cb *p_comp_addr,
> > +                                  void *event_context,
> > +                                  nvmetcp_event_cb_t async_event_cb)
> > +{
> > +     struct nvmetcp_init_ramrod_params *p_ramrod = NULL;
> > +     struct qed_nvmetcp_pf_params *p_params = NULL;
> > +     struct scsi_init_func_queues *p_queue = NULL;
> > +     struct nvmetcp_spe_func_init *p_init = NULL;
> > +     struct qed_sp_init_data init_data = {};
> > +     struct qed_spq_entry *p_ent = NULL;
> > +     int rc = 0;
> > +     u16 val;
> > +     u8 i;
> > +
> > +     /* Get SPQ entry */
> > +     init_data.cid = qed_spq_get_cid(p_hwfn);
> > +     init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;
> > +     init_data.comp_mode = comp_mode;
> > +     init_data.p_comp_data = p_comp_addr;
> > +
> > +     rc = qed_sp_init_request(p_hwfn, &p_ent,
> > +                              NVMETCP_RAMROD_CMD_ID_INIT_FUNC,
> > +                              PROTOCOLID_NVMETCP, &init_data);
> > +     if (rc)
> > +             return rc;
> > +
> > +     p_ramrod = &p_ent->ramrod.nvmetcp_init;
> > +     p_init = &p_ramrod->nvmetcp_init_spe;
> > +     p_params = &p_hwfn->pf_params.nvmetcp_pf_params;
> > +     p_queue = &p_init->q_params;
> > +
> > +     p_init->num_sq_pages_in_ring = p_params->num_sq_pages_in_ring;
> > +     p_init->num_r2tq_pages_in_ring = p_params->num_r2tq_pages_in_ring;
> > +     p_init->num_uhq_pages_in_ring = p_params->num_uhq_pages_in_ring;
> > +     p_init->ll2_rx_queue_id = RESC_START(p_hwfn, QED_LL2_RAM_QUEUE) +
> > +                                     p_params->ll2_ooo_queue_id;
> > +
> > +     SET_FIELD(p_init->flags, NVMETCP_SPE_FUNC_INIT_NVMETCP_MODE, 1);
> > +
> > +     p_init->func_params.log_page_size = ilog2(PAGE_SIZE);
> > +     p_init->func_params.num_tasks = cpu_to_le16(p_params->num_tasks);
> > +     p_init->debug_flags = p_params->debug_mode;
> > +
> > +     DMA_REGPAIR_LE(p_queue->glbl_q_params_addr,
> > +                    p_params->glbl_q_params_addr);
> > +
> > +     p_queue->cq_num_entries = cpu_to_le16(QED_NVMETCP_FW_CQ_SIZE);
> > +     p_queue->num_queues = p_params->num_queues;
> > +     val = RESC_START(p_hwfn, QED_CMDQS_CQS);
> > +     p_queue->queue_relative_offset = cpu_to_le16((u16)val);
> > +     p_queue->cq_sb_pi = p_params->gl_rq_pi;
> > +
> > +     for (i = 0; i < p_params->num_queues; i++) {
> > +             val = qed_get_igu_sb_id(p_hwfn, i);
> > +             p_queue->cq_cmdq_sb_num_arr[i] = cpu_to_le16(val);
> > +     }
> > +
> > +     SET_FIELD(p_queue->q_validity,
> > +               SCSI_INIT_FUNC_QUEUES_CMD_VALID, 0);
> > +     p_queue->cmdq_num_entries = 0;
> > +     p_queue->bdq_resource_id = (u8)RESC_START(p_hwfn, QED_BDQ);
> > +
> > +     /* p_ramrod->tcp_init.min_rto = cpu_to_le16(p_params->min_rto); */
> > +     p_ramrod->tcp_init.two_msl_timer = cpu_to_le32(QED_TCP_TWO_MSL_TIMER);
> > +     p_ramrod->tcp_init.tx_sws_timer = cpu_to_le16(QED_TCP_SWS_TIMER);
> > +     p_init->half_way_close_timeout = cpu_to_le16(QED_TCP_HALF_WAY_CLOSE_TIMEOUT);
> > +     p_ramrod->tcp_init.max_fin_rt = QED_TCP_MAX_FIN_RT;
> > +
> > +     SET_FIELD(p_ramrod->nvmetcp_init_spe.params,
> > +               NVMETCP_SPE_FUNC_INIT_MAX_SYN_RT, QED_TCP_MAX_FIN_RT);
> > +
> > +     p_hwfn->p_nvmetcp_info->event_context = event_context;
> > +     p_hwfn->p_nvmetcp_info->event_cb = async_event_cb;
> > +
> > +     qed_spq_register_async_cb(p_hwfn, PROTOCOLID_NVMETCP,
> > +                               qed_nvmetcp_async_event);
> > +
> > +     return qed_spq_post(p_hwfn, p_ent, NULL);
> > +}
> > +
> > +static int qed_sp_nvmetcp_func_stop(struct qed_hwfn *p_hwfn,
> > +                                 enum spq_mode comp_mode,
> > +                                 struct qed_spq_comp_cb *p_comp_addr)
> > +{
> > +     struct qed_spq_entry *p_ent = NULL;
> > +     struct qed_sp_init_data init_data;
> > +     int rc;
> > +
> > +     /* Get SPQ entry */
> > +     memset(&init_data, 0, sizeof(init_data));
> > +     init_data.cid = qed_spq_get_cid(p_hwfn);
> > +     init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;
> > +     init_data.comp_mode = comp_mode;
> > +     init_data.p_comp_data = p_comp_addr;
> > +
> > +     rc = qed_sp_init_request(p_hwfn, &p_ent,
> > +                              NVMETCP_RAMROD_CMD_ID_DESTROY_FUNC,
> > +                              PROTOCOLID_NVMETCP, &init_data);
> > +     if (rc)
> > +             return rc;
> > +
> > +     rc = qed_spq_post(p_hwfn, p_ent, NULL);
> > +
> > +     qed_spq_unregister_async_cb(p_hwfn, PROTOCOLID_NVMETCP);
> > +
> > +     return rc;
> > +}
> > +
> > +static int qed_fill_nvmetcp_dev_info(struct qed_dev *cdev,
> > +                                  struct qed_dev_nvmetcp_info *info)
> > +{
> > +     struct qed_hwfn *hwfn = QED_AFFIN_HWFN(cdev);
> > +     int rc;
> > +
> > +     memset(info, 0, sizeof(*info));
> > +     rc = qed_fill_dev_info(cdev, &info->common);
> > +
> > +     info->port_id = MFW_PORT(hwfn);
> > +     info->num_cqs = FEAT_NUM(hwfn, QED_NVMETCP_CQ);
> > +
> > +     return rc;
> > +}
> > +
> > +static void qed_register_nvmetcp_ops(struct qed_dev *cdev,
> > +                                  struct qed_nvmetcp_cb_ops *ops,
> > +                                  void *cookie)
> > +{
> > +     cdev->protocol_ops.nvmetcp = ops;
> > +     cdev->ops_cookie = cookie;
> > +}
> > +
> > +static int qed_nvmetcp_stop(struct qed_dev *cdev)
> > +{
> > +     int rc;
> > +
> > +     if (!(cdev->flags & QED_FLAG_STORAGE_STARTED)) {
> > +             DP_NOTICE(cdev, "nvmetcp already stopped\n");
> > +
> > +             return 0;
> > +     }
> > +
> > +     if (!hash_empty(cdev->connections)) {
> > +             DP_NOTICE(cdev,
> > +                       "Can't stop nvmetcp - not all connections were returned\n");
> > +
> > +             return -EINVAL;
> > +     }
> > +
> > +     /* Stop the nvmetcp */
> > +     rc = qed_sp_nvmetcp_func_stop(QED_AFFIN_HWFN(cdev), QED_SPQ_MODE_EBLOCK,
> > +                                   NULL);
> > +     cdev->flags &= ~QED_FLAG_STORAGE_STARTED;
> > +
> > +     return rc;
> > +}
> > +
> > +static int qed_nvmetcp_start(struct qed_dev *cdev,
> > +                          struct qed_nvmetcp_tid *tasks,
> > +                          void *event_context,
> > +                          nvmetcp_event_cb_t async_event_cb)
> > +{
> > +     struct qed_tid_mem *tid_info;
> > +     int rc;
> > +
> > +     if (cdev->flags & QED_FLAG_STORAGE_STARTED) {
> > +             DP_NOTICE(cdev, "nvmetcp already started;\n");
> > +
> > +             return 0;
> > +     }
> > +
> > +     rc = qed_sp_nvmetcp_func_start(QED_AFFIN_HWFN(cdev),
> > +                                    QED_SPQ_MODE_EBLOCK, NULL,
> > +                                    event_context, async_event_cb);
> > +     if (rc) {
> > +             DP_NOTICE(cdev, "Failed to start nvmetcp\n");
> > +
> > +             return rc;
> > +     }
> > +
> > +     cdev->flags |= QED_FLAG_STORAGE_STARTED;
> > +     hash_init(cdev->connections);
> > +
> > +     if (!tasks)
> > +             return 0;
> > +
> > +     tid_info = kzalloc(sizeof(*tid_info), GFP_KERNEL);
> > +
> > +     if (!tid_info) {
> > +             qed_nvmetcp_stop(cdev);
> > +
> > +             return -ENOMEM;
> > +     }
> > +
> > +     rc = qed_cxt_get_tid_mem_info(QED_AFFIN_HWFN(cdev), tid_info);
> > +     if (rc) {
> > +             DP_NOTICE(cdev, "Failed to gather task information\n");
> > +             qed_nvmetcp_stop(cdev);
> > +             kfree(tid_info);
> > +
> > +             return rc;
> > +     }
> > +
> > +     /* Fill task information */
> > +     tasks->size = tid_info->tid_size;
> > +     tasks->num_tids_per_block = tid_info->num_tids_per_block;
> > +     memcpy(tasks->blocks, tid_info->blocks,
> > +            MAX_TID_BLOCKS_NVMETCP * sizeof(u8 *));
> > +
> > +     kfree(tid_info);
> > +
> > +     return 0;
> > +}
> > +
> > +static const struct qed_nvmetcp_ops qed_nvmetcp_ops_pass = {
> > +     .common = &qed_common_ops_pass,
> > +     .ll2 = &qed_ll2_ops_pass,
> > +     .fill_dev_info = &qed_fill_nvmetcp_dev_info,
> > +     .register_ops = &qed_register_nvmetcp_ops,
> > +     .start = &qed_nvmetcp_start,
> > +     .stop = &qed_nvmetcp_stop,
> > +
> > +     /* Placeholder - Connection level ops */
> > +};
> > +
> > +const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void)
> > +{
> > +     return &qed_nvmetcp_ops_pass;
> > +}
> > +EXPORT_SYMBOL(qed_get_nvmetcp_ops);
> > +
> > +void qed_put_nvmetcp_ops(void)
> > +{
> > +}
> > +EXPORT_SYMBOL(qed_put_nvmetcp_ops);
> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h
> > new file mode 100644
> > index 000000000000..774b46ade408
> > --- /dev/null
> > +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h
> > @@ -0,0 +1,51 @@
> > +/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */
> > +/* Copyright 2021 Marvell. All rights reserved. */
> > +
> > +#ifndef _QED_NVMETCP_H
> > +#define _QED_NVMETCP_H
> > +
> > +#include <linux/types.h>
> > +#include <linux/list.h>
> > +#include <linux/slab.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/qed/tcp_common.h>
> > +#include <linux/qed/qed_nvmetcp_if.h>
> > +#include <linux/qed/qed_chain.h>
> > +#include "qed.h"
> > +#include "qed_hsi.h"
> > +#include "qed_mcp.h"
> > +#include "qed_sp.h"
> > +
> > +#define QED_NVMETCP_FW_CQ_SIZE (4 * 1024)
> > +
> > +/* tcp parameters */
> > +#define QED_TCP_TWO_MSL_TIMER 4000
> > +#define QED_TCP_HALF_WAY_CLOSE_TIMEOUT 10
> > +#define QED_TCP_MAX_FIN_RT 2
> > +#define QED_TCP_SWS_TIMER 5000
> > +
> > +struct qed_nvmetcp_info {
> > +     spinlock_t lock; /* Connection resources. */
> > +     struct list_head free_list;
> > +     u16 max_num_outstanding_tasks;
> > +     void *event_context;
> > +     nvmetcp_event_cb_t event_cb;
> > +};
> > +
> > +#if IS_ENABLED(CONFIG_QED_NVMETCP)
> > +int qed_nvmetcp_alloc(struct qed_hwfn *p_hwfn);
> > +void qed_nvmetcp_setup(struct qed_hwfn *p_hwfn);
> > +void qed_nvmetcp_free(struct qed_hwfn *p_hwfn);
> > +
> > +#else /* IS_ENABLED(CONFIG_QED_NVMETCP) */
> > +static inline int qed_nvmetcp_alloc(struct qed_hwfn *p_hwfn)
> > +{
> > +     return -EINVAL;
> > +}
> > +
> > +static inline void qed_nvmetcp_setup(struct qed_hwfn *p_hwfn) {}
> > +static inline void qed_nvmetcp_free(struct qed_hwfn *p_hwfn) {}
> > +
> > +#endif /* IS_ENABLED(CONFIG_QED_NVMETCP) */
> > +
> > +#endif
> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_sp.h b/drivers/net/ethernet/qlogic/qed/qed_sp.h
> > index 993f1357b6fc..525159e747a5 100644
> > --- a/drivers/net/ethernet/qlogic/qed/qed_sp.h
> > +++ b/drivers/net/ethernet/qlogic/qed/qed_sp.h
> > @@ -100,6 +100,8 @@ union ramrod_data {
> >       struct iscsi_spe_conn_mac_update iscsi_conn_mac_update;
> >       struct iscsi_spe_conn_termination iscsi_conn_terminate;
> >
> > +     struct nvmetcp_init_ramrod_params nvmetcp_init;
> > +
> >       struct vf_start_ramrod_data vf_start;
> >       struct vf_stop_ramrod_data vf_stop;
> >   };
> > diff --git a/include/linux/qed/common_hsi.h b/include/linux/qed/common_hsi.h
> > index 977807e1be53..59c5e5866607 100644
> > --- a/include/linux/qed/common_hsi.h
> > +++ b/include/linux/qed/common_hsi.h
> > @@ -703,6 +703,7 @@ enum mf_mode {
> >   /* Per-protocol connection types */
> >   enum protocol_type {
> >       PROTOCOLID_ISCSI,
> > +     PROTOCOLID_NVMETCP = PROTOCOLID_ISCSI,
> >       PROTOCOLID_FCOE,
> >       PROTOCOLID_ROCE,
> >       PROTOCOLID_CORE,
>
> Why not a separate Protocol ID?
> Don't you expect iSCSI and NVMe-TCP to be run at the same time?

PROTOCOLID determines the FW resource layout, which is the same for iSCSI
and NVMeTCP.
I will replace both PROTOCOLID_NVMETCP and PROTOCOLID_ISCSI with a single
PROTOCOLID_TCP_ULP.
iSCSI and NVMeTCP can run concurrently on the device, but not on the same PF.
Both the iSCSI and the NVMeTCP PFs will use PROTOCOLID_TCP_ULP.
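
For reference, the aliasing in the patch as posted can be sketched in plain
C. This is a simplified stand-in for enum protocol_type, not the real
common_hsi.h, and same_fw_layout() is a hypothetical helper added only for
illustration. It shows that the alias does not consume a new value, so the
enumerators that follow keep their numbering.

```c
#include <assert.h>

/* Simplified stand-in for enum protocol_type; not the real qed header. */
enum protocol_type {
	PROTOCOLID_ISCSI,
	PROTOCOLID_NVMETCP = PROTOCOLID_ISCSI,	/* alias: same value as iSCSI */
	PROTOCOLID_FCOE,			/* continues after the alias */
	PROTOCOLID_ROCE,
	PROTOCOLID_CORE,
};

/* The alias shares the iSCSI value and leaves later IDs unchanged. */
static_assert(PROTOCOLID_NVMETCP == PROTOCOLID_ISCSI, "alias must match");
static_assert(PROTOCOLID_FCOE == 1, "alias must not shift later IDs");

/*
 * Hypothetical helper: two protocol IDs imply the same FW resource
 * layout exactly when they compare equal.
 */
static int same_fw_layout(enum protocol_type a, enum protocol_type b)
{
	return a == b;
}
```

With a single PROTOCOLID_TCP_ULP enumerator the two names would collapse
into one, and the numeric values would stay as they are in this sketch.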

>
> > diff --git a/include/linux/qed/nvmetcp_common.h b/include/linux/qed/nvmetcp_common.h
> > new file mode 100644
> > index 000000000000..e9ccfc07041d
> > --- /dev/null
> > +++ b/include/linux/qed/nvmetcp_common.h
> > @@ -0,0 +1,54 @@
> > +/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */
> > +/* Copyright 2021 Marvell. All rights reserved. */
> > +
> > +#ifndef __NVMETCP_COMMON__
> > +#define __NVMETCP_COMMON__
> > +
> > +#include "tcp_common.h"
> > +
> > +/* NVMeTCP firmware function init parameters */
> > +struct nvmetcp_spe_func_init {
> > +     __le16 half_way_close_timeout;
> > +     u8 num_sq_pages_in_ring;
> > +     u8 num_r2tq_pages_in_ring;
> > +     u8 num_uhq_pages_in_ring;
> > +     u8 ll2_rx_queue_id;
> > +     u8 flags;
> > +#define NVMETCP_SPE_FUNC_INIT_COUNTERS_EN_MASK 0x1
> > +#define NVMETCP_SPE_FUNC_INIT_COUNTERS_EN_SHIFT 0
> > +#define NVMETCP_SPE_FUNC_INIT_NVMETCP_MODE_MASK 0x1
> > +#define NVMETCP_SPE_FUNC_INIT_NVMETCP_MODE_SHIFT 1
> > +#define NVMETCP_SPE_FUNC_INIT_RESERVED0_MASK 0x3F
> > +#define NVMETCP_SPE_FUNC_INIT_RESERVED0_SHIFT 2
> > +     u8 debug_flags;
> > +     __le16 reserved1;
> > +     u8 params;
> > +#define NVMETCP_SPE_FUNC_INIT_MAX_SYN_RT_MASK        0xF
> > +#define NVMETCP_SPE_FUNC_INIT_MAX_SYN_RT_SHIFT       0
> > +#define NVMETCP_SPE_FUNC_INIT_RESERVED1_MASK 0xF
> > +#define NVMETCP_SPE_FUNC_INIT_RESERVED1_SHIFT        4
> > +     u8 reserved2[5];
> > +     struct scsi_init_func_params func_params;
> > +     struct scsi_init_func_queues q_params;
> > +};
> > +
> > +/* NVMeTCP init params passed by driver to FW in NVMeTCP init ramrod. */
> > +struct nvmetcp_init_ramrod_params {
> > +     struct nvmetcp_spe_func_init nvmetcp_init_spe;
> > +     struct tcp_init_params tcp_init;
> > +};
> > +
> > +/* NVMeTCP Ramrod Command IDs */
> > +enum nvmetcp_ramrod_cmd_id {
> > +     NVMETCP_RAMROD_CMD_ID_UNUSED = 0,
> > +     NVMETCP_RAMROD_CMD_ID_INIT_FUNC = 1,
> > +     NVMETCP_RAMROD_CMD_ID_DESTROY_FUNC = 2,
> > +     MAX_NVMETCP_RAMROD_CMD_ID
> > +};
> > +
> > +struct nvmetcp_glbl_queue_entry {
> > +     struct regpair cq_pbl_addr;
> > +     struct regpair reserved;
> > +};
> > +
> > +#endif /* __NVMETCP_COMMON__ */
> > diff --git a/include/linux/qed/qed_if.h b/include/linux/qed/qed_if.h
> > index 68d17a4fbf20..524f57821ba2 100644
> > --- a/include/linux/qed/qed_if.h
> > +++ b/include/linux/qed/qed_if.h
> > @@ -542,6 +542,26 @@ struct qed_iscsi_pf_params {
> >       u8 bdq_pbl_num_entries[3];
> >   };
> >
> > +struct qed_nvmetcp_pf_params {
> > +     u64 glbl_q_params_addr;
> > +     u16 cq_num_entries;
> > +
> > +     u16 num_cons;
> > +     u16 num_tasks;
> > +
> > +     u8 num_sq_pages_in_ring;
> > +     u8 num_r2tq_pages_in_ring;
> > +     u8 num_uhq_pages_in_ring;
> > +
> > +     u8 num_queues;
> > +     u8 gl_rq_pi;
> > +     u8 gl_cmd_pi;
> > +     u8 debug_mode;
> > +     u8 ll2_ooo_queue_id;
> > +
> > +     u16 min_rto;
> > +};
> > +
> >   struct qed_rdma_pf_params {
> >       /* Supplied to QED during resource allocation (may affect the ILT and
> >        * the doorbell BAR).
> > @@ -560,6 +580,7 @@ struct qed_pf_params {
> >       struct qed_eth_pf_params eth_pf_params;
> >       struct qed_fcoe_pf_params fcoe_pf_params;
> >       struct qed_iscsi_pf_params iscsi_pf_params;
> > +     struct qed_nvmetcp_pf_params nvmetcp_pf_params;
> >       struct qed_rdma_pf_params rdma_pf_params;
> >   };
> >
> > @@ -662,6 +683,7 @@ enum qed_sb_type {
> >   enum qed_protocol {
> >       QED_PROTOCOL_ETH,
> >       QED_PROTOCOL_ISCSI,
> > +     QED_PROTOCOL_NVMETCP = QED_PROTOCOL_ISCSI,
> >       QED_PROTOCOL_FCOE,
> >   };
> >
> > diff --git a/include/linux/qed/qed_nvmetcp_if.h b/include/linux/qed/qed_nvmetcp_if.h
> > new file mode 100644
> > index 000000000000..abc1f41862e3
> > --- /dev/null
> > +++ b/include/linux/qed/qed_nvmetcp_if.h
> > @@ -0,0 +1,72 @@
> > +/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */
> > +/* Copyright 2021 Marvell. All rights reserved. */
> > +
> > +#ifndef _QED_NVMETCP_IF_H
> > +#define _QED_NVMETCP_IF_H
> > +#include <linux/types.h>
> > +#include <linux/qed/qed_if.h>
> > +
> > +#define QED_NVMETCP_MAX_IO_SIZE      0x800000
> > +
> > +typedef int (*nvmetcp_event_cb_t) (void *context,
> > +                                u8 fw_event_code, void *fw_handle);
> > +
> > +struct qed_dev_nvmetcp_info {
> > +     struct qed_dev_info common;
> > +
> > +     u8 port_id;  /* Physical port */
> > +     u8 num_cqs;
> > +};
> > +
> > +#define MAX_TID_BLOCKS_NVMETCP (512)
> > +struct qed_nvmetcp_tid {
> > +     u32 size;               /* In bytes per task */
> > +     u32 num_tids_per_block;
> > +     u8 *blocks[MAX_TID_BLOCKS_NVMETCP];
> > +};
> > +
> > +struct qed_nvmetcp_cb_ops {
> > +     struct qed_common_cb_ops common;
> > +};
> > +
> > +/**
> > + * struct qed_nvmetcp_ops - qed NVMeTCP operations.
> > + * @common:          common operations pointer
> > + * @ll2:             light L2 operations pointer
> > + * @fill_dev_info:   fills NVMeTCP specific information
> > + *                   @param cdev
> > + *                   @param info
> > + *                   @return 0 on success, otherwise error value.
> > + * @register_ops:    register nvmetcp operations
> > + *                   @param cdev
> > + *                   @param ops - specified using qed_nvmetcp_cb_ops
> > + *                   @param cookie - driver private
> > + * @start:           nvmetcp in FW
> > + *                   @param cdev
> > + *                   @param tasks - qed will fill information about tasks
> > + *                   return 0 on success, otherwise error value.
> > + * @stop:            nvmetcp in FW
> > + *                   @param cdev
> > + *                   return 0 on success, otherwise error value.
> > + */
> > +struct qed_nvmetcp_ops {
> > +     const struct qed_common_ops *common;
> > +
> > +     const struct qed_ll2_ops *ll2;
> > +
> > +     int (*fill_dev_info)(struct qed_dev *cdev,
> > +                          struct qed_dev_nvmetcp_info *info);
> > +
> > +     void (*register_ops)(struct qed_dev *cdev,
> > +                          struct qed_nvmetcp_cb_ops *ops, void *cookie);
> > +
> > +     int (*start)(struct qed_dev *cdev,
> > +                  struct qed_nvmetcp_tid *tasks,
> > +                  void *event_context, nvmetcp_event_cb_t async_event_cb);
> > +
> > +     int (*stop)(struct qed_dev *cdev);
> > +};
> > +
> > +const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void);
> > +void qed_put_nvmetcp_ops(void);
> > +#endif
> >
> As mentioned, please rearrange the patchset to have the NVMe-TCP patches
> first, then the driver specific bits.

Sure.

>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke                Kernel Storage Architect
> hare@suse.de                              +49 911 74053 688
> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [RFC PATCH v4 02/27] qed: Add NVMeTCP Offload Connection Level FW and HW HSI
  2021-05-01 17:28   ` Hannes Reinecke
@ 2021-05-03 15:25     ` Shai Malin
  0 siblings, 0 replies; 81+ messages in thread
From: Shai Malin @ 2021-05-03 15:25 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Shai Malin, netdev, linux-nvme, davem, kuba, sagi, hch, axboe,
	kbusch, Ariel Elior, Michal Kalderon, okulkarni, pkushwaha

On 5/1/21 8:28 PM, Hannes Reinecke wrote:
> On 4/29/21 9:09 PM, Shai Malin wrote:
> > This patch introduces the NVMeTCP HSI and HSI functionality in order to
> > initialize and interact with the HW device as part of the connection level
> > HSI.
> >
> > This includes:
> > - Connection offload: offload a TCP connection to the FW.
> > - Connection update: update the ICReq-ICResp params
> > - Connection clear SQ: outstanding IOs FW flush.
> > - Connection termination: terminate the TCP connection and flush the FW.
> >
> > Acked-by: Igor Russkikh <irusskikh@marvell.com>
> > Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> > Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> > Signed-off-by: Shai Malin <smalin@marvell.com>
> > Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> > Signed-off-by: Ariel Elior <aelior@marvell.com>
> > ---
> >   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c | 580 +++++++++++++++++-
> >   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h |  63 ++
> >   drivers/net/ethernet/qlogic/qed/qed_sp.h      |   3 +
> >   include/linux/qed/nvmetcp_common.h            | 143 +++++
> >   include/linux/qed/qed_nvmetcp_if.h            |  94 +++
> >   5 files changed, 881 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
> > index da3b5002d216..79bd1cc6677f 100644
> > --- a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
> > +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
> > @@ -259,6 +259,578 @@ static int qed_nvmetcp_start(struct qed_dev *cdev,
> >       return 0;
> >   }
> >
> > +static struct qed_hash_nvmetcp_con *qed_nvmetcp_get_hash(struct qed_dev *cdev,
> > +                                                      u32 handle)
> > +{
> > +     struct qed_hash_nvmetcp_con *hash_con = NULL;
> > +
> > +     if (!(cdev->flags & QED_FLAG_STORAGE_STARTED))
> > +             return NULL;
> > +
> > +     hash_for_each_possible(cdev->connections, hash_con, node, handle) {
> > +             if (hash_con->con->icid == handle)
> > +                     break;
> > +     }
> > +
> > +     if (!hash_con || hash_con->con->icid != handle)
> > +             return NULL;
> > +
> > +     return hash_con;
> > +}
> > +
> > +static int qed_sp_nvmetcp_conn_offload(struct qed_hwfn *p_hwfn,
> > +                                    struct qed_nvmetcp_conn *p_conn,
> > +                                    enum spq_mode comp_mode,
> > +                                    struct qed_spq_comp_cb *p_comp_addr)
> > +{
> > +     struct nvmetcp_spe_conn_offload *p_ramrod = NULL;
> > +     struct tcp_offload_params_opt2 *p_tcp2 = NULL;
> > +     struct qed_sp_init_data init_data = { 0 };
> > +     struct qed_spq_entry *p_ent = NULL;
> > +     dma_addr_t r2tq_pbl_addr;
> > +     dma_addr_t xhq_pbl_addr;
> > +     dma_addr_t uhq_pbl_addr;
> > +     u16 physical_q;
> > +     int rc = 0;
> > +     u32 dval;
> > +     u8 i;
> > +
> > +     /* Get SPQ entry */
> > +     init_data.cid = p_conn->icid;
> > +     init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;
> > +     init_data.comp_mode = comp_mode;
> > +     init_data.p_comp_data = p_comp_addr;
> > +
> > +     rc = qed_sp_init_request(p_hwfn, &p_ent,
> > +                              NVMETCP_RAMROD_CMD_ID_OFFLOAD_CONN,
> > +                              PROTOCOLID_NVMETCP, &init_data);
> > +     if (rc)
> > +             return rc;
> > +
> > +     p_ramrod = &p_ent->ramrod.nvmetcp_conn_offload;
> > +
> > +     /* Transmission PQ is the first of the PF */
> > +     physical_q = qed_get_cm_pq_idx(p_hwfn, PQ_FLAGS_OFLD);
> > +     p_conn->physical_q0 = cpu_to_le16(physical_q);
> > +     p_ramrod->nvmetcp.physical_q0 = cpu_to_le16(physical_q);
> > +
> > +     /* nvmetcp Pure-ACK PQ */
> > +     physical_q = qed_get_cm_pq_idx(p_hwfn, PQ_FLAGS_ACK);
> > +     p_conn->physical_q1 = cpu_to_le16(physical_q);
> > +     p_ramrod->nvmetcp.physical_q1 = cpu_to_le16(physical_q);
> > +
> > +     p_ramrod->conn_id = cpu_to_le16(p_conn->conn_id);
> > +
> > +     DMA_REGPAIR_LE(p_ramrod->nvmetcp.sq_pbl_addr, p_conn->sq_pbl_addr);
> > +
> > +     r2tq_pbl_addr = qed_chain_get_pbl_phys(&p_conn->r2tq);
> > +     DMA_REGPAIR_LE(p_ramrod->nvmetcp.r2tq_pbl_addr, r2tq_pbl_addr);
> > +
> > +     xhq_pbl_addr = qed_chain_get_pbl_phys(&p_conn->xhq);
> > +     DMA_REGPAIR_LE(p_ramrod->nvmetcp.xhq_pbl_addr, xhq_pbl_addr);
> > +
> > +     uhq_pbl_addr = qed_chain_get_pbl_phys(&p_conn->uhq);
> > +     DMA_REGPAIR_LE(p_ramrod->nvmetcp.uhq_pbl_addr, uhq_pbl_addr);
> > +
> > +     p_ramrod->nvmetcp.flags = p_conn->offl_flags;
> > +     p_ramrod->nvmetcp.default_cq = p_conn->default_cq;
> > +     p_ramrod->nvmetcp.initial_ack = 0;
> > +
> > +     DMA_REGPAIR_LE(p_ramrod->nvmetcp.nvmetcp.cccid_itid_table_addr,
> > +                    p_conn->nvmetcp_cccid_itid_table_addr);
> > +     p_ramrod->nvmetcp.nvmetcp.cccid_max_range =
> > +              cpu_to_le16(p_conn->nvmetcp_cccid_max_range);
> > +
> > +     p_tcp2 = &p_ramrod->tcp;
> > +
> > +     qed_set_fw_mac_addr(&p_tcp2->remote_mac_addr_hi,
> > +                         &p_tcp2->remote_mac_addr_mid,
> > +                         &p_tcp2->remote_mac_addr_lo, p_conn->remote_mac);
> > +     qed_set_fw_mac_addr(&p_tcp2->local_mac_addr_hi,
> > +                         &p_tcp2->local_mac_addr_mid,
> > +                         &p_tcp2->local_mac_addr_lo, p_conn->local_mac);
> > +
> > +     p_tcp2->vlan_id = cpu_to_le16(p_conn->vlan_id);
> > +     p_tcp2->flags = cpu_to_le16(p_conn->tcp_flags);
> > +
> > +     p_tcp2->ip_version = p_conn->ip_version;
> > +     for (i = 0; i < 4; i++) {
> > +             dval = p_conn->remote_ip[i];
> > +             p_tcp2->remote_ip[i] = cpu_to_le32(dval);
> > +             dval = p_conn->local_ip[i];
> > +             p_tcp2->local_ip[i] = cpu_to_le32(dval);
> > +     }
> > +
>
> What is this?
> Some convoluted way of assigning the IP address in little endian?
> Pointless if it's IPv4, as then each bit is just one byte.
> And if it's for IPv6, what do you do for IPv4?
> And isn't there a helper for it?

The endianness conversion here only matters on BE machines.
I haven't found a relevant helper function.
I will rewrite this as a cleaner implementation that handles ipv4 and ipv6
separately.
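
One possible shape for the split, as a userspace sketch only. put_le32()
stands in for the kernel's cpu_to_le32(), fill_fw_ip() and its is_ipv6
argument are hypothetical names, and the sketch assumes that an IPv4
address occupies word 0 with the remaining words zeroed. That layout is an
assumption and is not confirmed by the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Userspace stand-in for cpu_to_le32(): store v as little-endian bytes,
 * independent of host byte order.
 */
static void put_le32(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)(v & 0xff);
	out[1] = (uint8_t)((v >> 8) & 0xff);
	out[2] = (uint8_t)((v >> 16) & 0xff);
	out[3] = (uint8_t)((v >> 24) & 0xff);
}

/*
 * Hypothetical helper: fill a 16-byte FW address field from four host
 * order words. IPv4 converts word 0 only (assumed layout), IPv6 converts
 * all four words. Unused bytes are always zeroed.
 */
static void fill_fw_ip(uint8_t fw_ip[16], const uint32_t ip[4], int is_ipv6)
{
	int words = is_ipv6 ? 4 : 1;
	int i;

	memset(fw_ip, 0, 16);
	for (i = 0; i < words; i++)
		put_le32(&fw_ip[4 * i], ip[i]);
}
```

Keeping the two cases explicit makes it visible that stale data in words
1..3 never reaches the FW for an IPv4 connection.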

>
> > +     p_tcp2->flow_label = cpu_to_le32(p_conn->flow_label);
> > +     p_tcp2->ttl = p_conn->ttl;
> > +     p_tcp2->tos_or_tc = p_conn->tos_or_tc;
> > +     p_tcp2->remote_port = cpu_to_le16(p_conn->remote_port);
> > +     p_tcp2->local_port = cpu_to_le16(p_conn->local_port);
> > +     p_tcp2->mss = cpu_to_le16(p_conn->mss);
> > +     p_tcp2->rcv_wnd_scale = p_conn->rcv_wnd_scale;
> > +     p_tcp2->connect_mode = p_conn->connect_mode;
> > +     p_tcp2->cwnd = cpu_to_le32(p_conn->cwnd);
> > +     p_tcp2->ka_max_probe_cnt = p_conn->ka_max_probe_cnt;
> > +     p_tcp2->ka_timeout = cpu_to_le32(p_conn->ka_timeout);
> > +     p_tcp2->max_rt_time = cpu_to_le32(p_conn->max_rt_time);
> > +     p_tcp2->ka_interval = cpu_to_le32(p_conn->ka_interval);
> > +
> > +     return qed_spq_post(p_hwfn, p_ent, NULL);
> > +}
> > +
> > +static int qed_sp_nvmetcp_conn_update(struct qed_hwfn *p_hwfn,
> > +                                   struct qed_nvmetcp_conn *p_conn,
> > +                                   enum spq_mode comp_mode,
> > +                                   struct qed_spq_comp_cb *p_comp_addr)
> > +{
> > +     struct nvmetcp_conn_update_ramrod_params *p_ramrod = NULL;
> > +     struct qed_spq_entry *p_ent = NULL;
> > +     struct qed_sp_init_data init_data;
> > +     int rc = -EINVAL;
> > +     u32 dval;
> > +
> > +     /* Get SPQ entry */
> > +     memset(&init_data, 0, sizeof(init_data));
> > +     init_data.cid = p_conn->icid;
> > +     init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;
> > +     init_data.comp_mode = comp_mode;
> > +     init_data.p_comp_data = p_comp_addr;
> > +
> > +     rc = qed_sp_init_request(p_hwfn, &p_ent,
> > +                              NVMETCP_RAMROD_CMD_ID_UPDATE_CONN,
> > +                              PROTOCOLID_NVMETCP, &init_data);
> > +     if (rc)
> > +             return rc;
> > +
> > +     p_ramrod = &p_ent->ramrod.nvmetcp_conn_update;
> > +     p_ramrod->conn_id = cpu_to_le16(p_conn->conn_id);
> > +     p_ramrod->flags = p_conn->update_flag;
> > +     p_ramrod->max_seq_size = cpu_to_le32(p_conn->max_seq_size);
> > +     dval = p_conn->max_recv_pdu_length;
> > +     p_ramrod->max_recv_pdu_length = cpu_to_le32(dval);
> > +     dval = p_conn->max_send_pdu_length;
> > +     p_ramrod->max_send_pdu_length = cpu_to_le32(dval);
> > +     dval = p_conn->first_seq_length;
> > +     p_ramrod->first_seq_length = cpu_to_le32(dval);
> > +
> > +     return qed_spq_post(p_hwfn, p_ent, NULL);
> > +}
> > +
> > +static int qed_sp_nvmetcp_conn_terminate(struct qed_hwfn *p_hwfn,
> > +                                      struct qed_nvmetcp_conn *p_conn,
> > +                                      enum spq_mode comp_mode,
> > +                                      struct qed_spq_comp_cb *p_comp_addr)
> > +{
> > +     struct nvmetcp_spe_conn_termination *p_ramrod = NULL;
> > +     struct qed_spq_entry *p_ent = NULL;
> > +     struct qed_sp_init_data init_data;
> > +     int rc = -EINVAL;
> > +
> > +     /* Get SPQ entry */
> > +     memset(&init_data, 0, sizeof(init_data));
> > +     init_data.cid = p_conn->icid;
> > +     init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;
> > +     init_data.comp_mode = comp_mode;
> > +     init_data.p_comp_data = p_comp_addr;
> > +
> > +     rc = qed_sp_init_request(p_hwfn, &p_ent,
> > +                              NVMETCP_RAMROD_CMD_ID_TERMINATION_CONN,
> > +                              PROTOCOLID_NVMETCP, &init_data);
> > +     if (rc)
> > +             return rc;
> > +
> > +     p_ramrod = &p_ent->ramrod.nvmetcp_conn_terminate;
> > +     p_ramrod->conn_id = cpu_to_le16(p_conn->conn_id);
> > +     p_ramrod->abortive = p_conn->abortive_dsconnect;
> > +
> > +     return qed_spq_post(p_hwfn, p_ent, NULL);
> > +}
> > +
> > +static int qed_sp_nvmetcp_conn_clear_sq(struct qed_hwfn *p_hwfn,
> > +                                     struct qed_nvmetcp_conn *p_conn,
> > +                                     enum spq_mode comp_mode,
> > +                                     struct qed_spq_comp_cb *p_comp_addr)
> > +{
> > +     struct qed_spq_entry *p_ent = NULL;
> > +     struct qed_sp_init_data init_data;
> > +     int rc = -EINVAL;
> > +
> > +     /* Get SPQ entry */
> > +     memset(&init_data, 0, sizeof(init_data));
> > +     init_data.cid = p_conn->icid;
> > +     init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;
> > +     init_data.comp_mode = comp_mode;
> > +     init_data.p_comp_data = p_comp_addr;
> > +
> > +     rc = qed_sp_init_request(p_hwfn, &p_ent,
> > +                              NVMETCP_RAMROD_CMD_ID_CLEAR_SQ,
> > +                              PROTOCOLID_NVMETCP, &init_data);
> > +     if (rc)
> > +             return rc;
> > +
> > +     return qed_spq_post(p_hwfn, p_ent, NULL);
> > +}
> > +
> > +static void __iomem *qed_nvmetcp_get_db_addr(struct qed_hwfn *p_hwfn, u32 cid)
> > +{
> > +     return (u8 __iomem *)p_hwfn->doorbells +
> > +                          qed_db_addr(cid, DQ_DEMS_LEGACY);
> > +}
> > +
> > +static int qed_nvmetcp_allocate_connection(struct qed_hwfn *p_hwfn,
> > +                                        struct qed_nvmetcp_conn **p_out_conn)
> > +{
> > +     struct qed_chain_init_params params = {
> > +             .mode           = QED_CHAIN_MODE_PBL,
> > +             .intended_use   = QED_CHAIN_USE_TO_CONSUME_PRODUCE,
> > +             .cnt_type       = QED_CHAIN_CNT_TYPE_U16,
> > +     };
> > +     struct qed_nvmetcp_pf_params *p_params = NULL;
> > +     struct qed_nvmetcp_conn *p_conn = NULL;
> > +     int rc = 0;
> > +
> > +     /* Try finding a free connection that can be used */
> > +     spin_lock_bh(&p_hwfn->p_nvmetcp_info->lock);
> > +     if (!list_empty(&p_hwfn->p_nvmetcp_info->free_list))
> > +             p_conn = list_first_entry(&p_hwfn->p_nvmetcp_info->free_list,
> > +                                       struct qed_nvmetcp_conn, list_entry);
> > +     if (p_conn) {
> > +             list_del(&p_conn->list_entry);
> > +             spin_unlock_bh(&p_hwfn->p_nvmetcp_info->lock);
> > +             *p_out_conn = p_conn;
> > +
> > +             return 0;
> > +     }
> > +     spin_unlock_bh(&p_hwfn->p_nvmetcp_info->lock);
> > +
> > +     /* Need to allocate a new connection */
> > +     p_params = &p_hwfn->pf_params.nvmetcp_pf_params;
> > +
> > +     p_conn = kzalloc(sizeof(*p_conn), GFP_KERNEL);
> > +     if (!p_conn)
> > +             return -ENOMEM;
> > +
> > +     params.num_elems = p_params->num_r2tq_pages_in_ring *
> > +                        QED_CHAIN_PAGE_SIZE / sizeof(struct nvmetcp_wqe);
> > +     params.elem_size = sizeof(struct nvmetcp_wqe);
> > +
> > +     rc = qed_chain_alloc(p_hwfn->cdev, &p_conn->r2tq, &params);
> > +     if (rc)
> > +             goto nomem_r2tq;
> > +
> > +     params.num_elems = p_params->num_uhq_pages_in_ring *
> > +                        QED_CHAIN_PAGE_SIZE / sizeof(struct iscsi_uhqe);
> > +     params.elem_size = sizeof(struct iscsi_uhqe);
> > +
> > +     rc = qed_chain_alloc(p_hwfn->cdev, &p_conn->uhq, &params);
> > +     if (rc)
> > +             goto nomem_uhq;
> > +
> > +     params.elem_size = sizeof(struct iscsi_xhqe);
> > +
> > +     rc = qed_chain_alloc(p_hwfn->cdev, &p_conn->xhq, &params);
> > +     if (rc)
> > +             goto nomem;
> > +
> > +     p_conn->free_on_delete = true;
> > +     *p_out_conn = p_conn;
> > +
> > +     return 0;
> > +
> > +nomem:
> > +     qed_chain_free(p_hwfn->cdev, &p_conn->uhq);
> > +nomem_uhq:
> > +     qed_chain_free(p_hwfn->cdev, &p_conn->r2tq);
> > +nomem_r2tq:
> > +     kfree(p_conn);
> > +
> > +     return -ENOMEM;
> > +}
> > +
> > +static int qed_nvmetcp_acquire_connection(struct qed_hwfn *p_hwfn,
> > +                                       struct qed_nvmetcp_conn **p_out_conn)
> > +{
> > +     struct qed_nvmetcp_conn *p_conn = NULL;
> > +     int rc = 0;
> > +     u32 icid;
> > +
> > +     spin_lock_bh(&p_hwfn->p_nvmetcp_info->lock);
> > +     rc = qed_cxt_acquire_cid(p_hwfn, PROTOCOLID_NVMETCP, &icid);
> > +     spin_unlock_bh(&p_hwfn->p_nvmetcp_info->lock);
> > +
> > +     if (rc)
> > +             return rc;
> > +
> > +     rc = qed_nvmetcp_allocate_connection(p_hwfn, &p_conn);
> > +     if (rc) {
> > +             spin_lock_bh(&p_hwfn->p_nvmetcp_info->lock);
> > +             qed_cxt_release_cid(p_hwfn, icid);
> > +             spin_unlock_bh(&p_hwfn->p_nvmetcp_info->lock);
> > +
> > +             return rc;
> > +     }
> > +
> > +     p_conn->icid = icid;
> > +     p_conn->conn_id = (u16)icid;
> > +     p_conn->fw_cid = (p_hwfn->hw_info.opaque_fid << 16) | icid;
> > +     *p_out_conn = p_conn;
> > +
> > +     return rc;
> > +}
> > +
> > +static void qed_nvmetcp_release_connection(struct qed_hwfn *p_hwfn,
> > +                                        struct qed_nvmetcp_conn *p_conn)
> > +{
> > +     spin_lock_bh(&p_hwfn->p_nvmetcp_info->lock);
> > +     list_add_tail(&p_conn->list_entry, &p_hwfn->p_nvmetcp_info->free_list);
> > +     qed_cxt_release_cid(p_hwfn, p_conn->icid);
> > +     spin_unlock_bh(&p_hwfn->p_nvmetcp_info->lock);
> > +}
> > +
> > +static void qed_nvmetcp_free_connection(struct qed_hwfn *p_hwfn,
> > +                                     struct qed_nvmetcp_conn *p_conn)
> > +{
> > +     qed_chain_free(p_hwfn->cdev, &p_conn->xhq);
> > +     qed_chain_free(p_hwfn->cdev, &p_conn->uhq);
> > +     qed_chain_free(p_hwfn->cdev, &p_conn->r2tq);
> > +
> > +     kfree(p_conn);
> > +}
> > +
> > +int qed_nvmetcp_alloc(struct qed_hwfn *p_hwfn)
> > +{
> > +     struct qed_nvmetcp_info *p_nvmetcp_info;
> > +
> > +     p_nvmetcp_info = kzalloc(sizeof(*p_nvmetcp_info), GFP_KERNEL);
> > +     if (!p_nvmetcp_info)
> > +             return -ENOMEM;
> > +
> > +     INIT_LIST_HEAD(&p_nvmetcp_info->free_list);
> > +
> > +     p_hwfn->p_nvmetcp_info = p_nvmetcp_info;
> > +
> > +     return 0;
> > +}
> > +
> > +void qed_nvmetcp_setup(struct qed_hwfn *p_hwfn)
> > +{
> > +     spin_lock_init(&p_hwfn->p_nvmetcp_info->lock);
> > +}
> > +
> > +void qed_nvmetcp_free(struct qed_hwfn *p_hwfn)
> > +{
> > +     struct qed_nvmetcp_conn *p_conn = NULL;
> > +
> > +     if (!p_hwfn->p_nvmetcp_info)
> > +             return;
> > +
> > +     while (!list_empty(&p_hwfn->p_nvmetcp_info->free_list)) {
> > +             p_conn = list_first_entry(&p_hwfn->p_nvmetcp_info->free_list,
> > +                                       struct qed_nvmetcp_conn, list_entry);
> > +             if (p_conn) {
> > +                     list_del(&p_conn->list_entry);
> > +                     qed_nvmetcp_free_connection(p_hwfn, p_conn);
> > +             }
> > +     }
> > +
> > +     kfree(p_hwfn->p_nvmetcp_info);
> > +     p_hwfn->p_nvmetcp_info = NULL;
> > +}
> > +
> > +static int qed_nvmetcp_acquire_conn(struct qed_dev *cdev,
> > +                                 u32 *handle,
> > +                                 u32 *fw_cid, void __iomem **p_doorbell)
> > +{
> > +     struct qed_hash_nvmetcp_con *hash_con;
> > +     int rc;
> > +
> > +     /* Allocate a hashed connection */
> > +     hash_con = kzalloc(sizeof(*hash_con), GFP_ATOMIC);
> > +     if (!hash_con)
> > +             return -ENOMEM;
> > +
> > +     /* Acquire the connection */
> > +     rc = qed_nvmetcp_acquire_connection(QED_AFFIN_HWFN(cdev),
> > +                                         &hash_con->con);
> > +     if (rc) {
> > +             DP_NOTICE(cdev, "Failed to acquire Connection\n");
> > +             kfree(hash_con);
> > +
> > +             return rc;
> > +     }
> > +
> > +     /* Add the connection to the hash table */
> > +     *handle = hash_con->con->icid;
> > +     *fw_cid = hash_con->con->fw_cid;
> > +     hash_add(cdev->connections, &hash_con->node, *handle);
> > +
> > +     if (p_doorbell)
> > +             *p_doorbell = qed_nvmetcp_get_db_addr(QED_AFFIN_HWFN(cdev),
> > +                                                   *handle);
> > +
> > +     return 0;
> > +}
> > +
> > +static int qed_nvmetcp_release_conn(struct qed_dev *cdev, u32 handle)
> > +{
> > +     struct qed_hash_nvmetcp_con *hash_con;
> > +
> > +     hash_con = qed_nvmetcp_get_hash(cdev, handle);
> > +     if (!hash_con) {
> > +             DP_NOTICE(cdev, "Failed to find connection for handle %d\n",
> > +                       handle);
> > +
> > +             return -EINVAL;
> > +     }
> > +
> > +     hlist_del(&hash_con->node);
> > +     qed_nvmetcp_release_connection(QED_AFFIN_HWFN(cdev), hash_con->con);
> > +     kfree(hash_con);
> > +
> > +     return 0;
> > +}
> > +
> > +static int qed_nvmetcp_offload_conn(struct qed_dev *cdev, u32 handle,
> > +                                 struct qed_nvmetcp_params_offload *conn_info)
> > +{
> > +     struct qed_hash_nvmetcp_con *hash_con;
> > +     struct qed_nvmetcp_conn *con;
> > +
> > +     hash_con = qed_nvmetcp_get_hash(cdev, handle);
> > +     if (!hash_con) {
> > +             DP_NOTICE(cdev, "Failed to find connection for handle %d\n",
> > +                       handle);
> > +
> > +             return -EINVAL;
> > +     }
> > +
> > +     /* Update the connection with information from the params */
> > +     con = hash_con->con;
> > +
> > +     /* FW initializations */
> > +     con->layer_code = NVMETCP_SLOW_PATH_LAYER_CODE;
> > +     con->sq_pbl_addr = conn_info->sq_pbl_addr;
> > +     con->nvmetcp_cccid_max_range = conn_info->nvmetcp_cccid_max_range;
> > +     con->nvmetcp_cccid_itid_table_addr = conn_info->nvmetcp_cccid_itid_table_addr;
> > +     con->default_cq = conn_info->default_cq;
> > +
> > +     SET_FIELD(con->offl_flags, NVMETCP_CONN_OFFLOAD_PARAMS_TARGET_MODE, 0);
> > +     SET_FIELD(con->offl_flags, NVMETCP_CONN_OFFLOAD_PARAMS_NVMETCP_MODE, 1);
> > +     SET_FIELD(con->offl_flags, NVMETCP_CONN_OFFLOAD_PARAMS_TCP_ON_CHIP_1B, 1);
> > +
> > +     /* Networking and TCP stack initializations */
> > +     ether_addr_copy(con->local_mac, conn_info->src.mac);
> > +     ether_addr_copy(con->remote_mac, conn_info->dst.mac);
> > +     memcpy(con->local_ip, conn_info->src.ip, sizeof(con->local_ip));
> > +     memcpy(con->remote_ip, conn_info->dst.ip, sizeof(con->remote_ip));
> > +     con->local_port = conn_info->src.port;
> > +     con->remote_port = conn_info->dst.port;
> > +     con->vlan_id = conn_info->vlan_id;
> > +
> > +     if (conn_info->timestamp_en)
> > +             SET_FIELD(con->tcp_flags, TCP_OFFLOAD_PARAMS_OPT2_TS_EN, 1);
> > +
> > +     if (conn_info->delayed_ack_en)
> > +             SET_FIELD(con->tcp_flags, TCP_OFFLOAD_PARAMS_OPT2_DA_EN, 1);
> > +
> > +     if (conn_info->tcp_keep_alive_en)
> > +             SET_FIELD(con->tcp_flags, TCP_OFFLOAD_PARAMS_OPT2_KA_EN, 1);
> > +
> > +     if (conn_info->ecn_en)
> > +             SET_FIELD(con->tcp_flags, TCP_OFFLOAD_PARAMS_OPT2_ECN_EN, 1);
> > +
> > +     con->ip_version = conn_info->ip_version;
> > +     con->flow_label = QED_TCP_FLOW_LABEL;
> > +     con->ka_max_probe_cnt = conn_info->ka_max_probe_cnt;
> > +     con->ka_timeout = conn_info->ka_timeout;
> > +     con->ka_interval = conn_info->ka_interval;
> > +     con->max_rt_time = conn_info->max_rt_time;
> > +     con->ttl = conn_info->ttl;
> > +     con->tos_or_tc = conn_info->tos_or_tc;
> > +     con->mss = conn_info->mss;
> > +     con->cwnd = conn_info->cwnd;
> > +     con->rcv_wnd_scale = conn_info->rcv_wnd_scale;
> > +     con->connect_mode = 0; /* TCP_CONNECT_ACTIVE */
> > +
> > +     return qed_sp_nvmetcp_conn_offload(QED_AFFIN_HWFN(cdev), con,
> > +                                      QED_SPQ_MODE_EBLOCK, NULL);
> > +}
> > +
> > +static int qed_nvmetcp_update_conn(struct qed_dev *cdev,
> > +                                u32 handle,
> > +                                struct qed_nvmetcp_params_update *conn_info)
> > +{
> > +     struct qed_hash_nvmetcp_con *hash_con;
> > +     struct qed_nvmetcp_conn *con;
> > +
> > +     hash_con = qed_nvmetcp_get_hash(cdev, handle);
> > +     if (!hash_con) {
> > +             DP_NOTICE(cdev, "Failed to find connection for handle %d\n",
> > +                       handle);
> > +
> > +             return -EINVAL;
> > +     }
> > +
> > +     /* Update the connection with information from the params */
> > +     con = hash_con->con;
> > +
> > +     SET_FIELD(con->update_flag,
> > +               ISCSI_CONN_UPDATE_RAMROD_PARAMS_INITIAL_R2T, 0);
> > +     SET_FIELD(con->update_flag,
> > +               ISCSI_CONN_UPDATE_RAMROD_PARAMS_IMMEDIATE_DATA, 1);
> > +
> > +     if (conn_info->hdr_digest_en)
> > +             SET_FIELD(con->update_flag, ISCSI_CONN_UPDATE_RAMROD_PARAMS_HD_EN, 1);
> > +
> > +     if (conn_info->data_digest_en)
> > +             SET_FIELD(con->update_flag, ISCSI_CONN_UPDATE_RAMROD_PARAMS_DD_EN, 1);
> > +
> > +     /* Placeholder - initialize pfv, cpda, hpda */
> > +
> > +     con->max_seq_size = conn_info->max_io_size;
> > +     con->max_recv_pdu_length = conn_info->max_recv_pdu_length;
> > +     con->max_send_pdu_length = conn_info->max_send_pdu_length;
> > +     con->first_seq_length = conn_info->max_io_size;
> > +
> > +     return qed_sp_nvmetcp_conn_update(QED_AFFIN_HWFN(cdev), con,
> > +                                     QED_SPQ_MODE_EBLOCK, NULL);
> > +}
> > +
> > +static int qed_nvmetcp_clear_conn_sq(struct qed_dev *cdev, u32 handle)
> > +{
> > +     struct qed_hash_nvmetcp_con *hash_con;
> > +
> > +     hash_con = qed_nvmetcp_get_hash(cdev, handle);
> > +     if (!hash_con) {
> > +             DP_NOTICE(cdev, "Failed to find connection for handle %d\n",
> > +                       handle);
> > +
> > +             return -EINVAL;
> > +     }
> > +
> > +     return qed_sp_nvmetcp_conn_clear_sq(QED_AFFIN_HWFN(cdev), hash_con->con,
> > +                                         QED_SPQ_MODE_EBLOCK, NULL);
> > +}
> > +
> > +static int qed_nvmetcp_destroy_conn(struct qed_dev *cdev,
> > +                                 u32 handle, u8 abrt_conn)
> > +{
> > +     struct qed_hash_nvmetcp_con *hash_con;
> > +
> > +     hash_con = qed_nvmetcp_get_hash(cdev, handle);
> > +     if (!hash_con) {
> > +             DP_NOTICE(cdev, "Failed to find connection for handle %d\n",
> > +                       handle);
> > +
> > +             return -EINVAL;
> > +     }
> > +
> > +     hash_con->con->abortive_dsconnect = abrt_conn;
> > +
> > +     return qed_sp_nvmetcp_conn_terminate(QED_AFFIN_HWFN(cdev), hash_con->con,
> > +                                        QED_SPQ_MODE_EBLOCK, NULL);
> > +}
> > +
> >   static const struct qed_nvmetcp_ops qed_nvmetcp_ops_pass = {
> >       .common = &qed_common_ops_pass,
> >       .ll2 = &qed_ll2_ops_pass,
> > @@ -266,8 +838,12 @@ static const struct qed_nvmetcp_ops qed_nvmetcp_ops_pass = {
> >       .register_ops = &qed_register_nvmetcp_ops,
> >       .start = &qed_nvmetcp_start,
> >       .stop = &qed_nvmetcp_stop,
> > -
> > -     /* Placeholder - Connection level ops */
> > +     .acquire_conn = &qed_nvmetcp_acquire_conn,
> > +     .release_conn = &qed_nvmetcp_release_conn,
> > +     .offload_conn = &qed_nvmetcp_offload_conn,
> > +     .update_conn = &qed_nvmetcp_update_conn,
> > +     .destroy_conn = &qed_nvmetcp_destroy_conn,
> > +     .clear_sq = &qed_nvmetcp_clear_conn_sq,
> >   };
> >
> >   const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void)
> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h
> > index 774b46ade408..749169f0bdb1 100644
> > --- a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h
> > +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h
> > @@ -19,6 +19,7 @@
> >   #define QED_NVMETCP_FW_CQ_SIZE (4 * 1024)
> >
> >   /* tcp parameters */
> > +#define QED_TCP_FLOW_LABEL 0
> >   #define QED_TCP_TWO_MSL_TIMER 4000
> >   #define QED_TCP_HALF_WAY_CLOSE_TIMEOUT 10
> >   #define QED_TCP_MAX_FIN_RT 2
> > @@ -32,6 +33,68 @@ struct qed_nvmetcp_info {
> >       nvmetcp_event_cb_t event_cb;
> >   };
> >
> > +struct qed_hash_nvmetcp_con {
> > +     struct hlist_node node;
> > +     struct qed_nvmetcp_conn *con;
> > +};
> > +
> > +struct qed_nvmetcp_conn {
> > +     struct list_head list_entry;
> > +     bool free_on_delete;
> > +
> > +     u16 conn_id;
> > +     u32 icid;
> > +     u32 fw_cid;
> > +
> > +     u8 layer_code;
> > +     u8 offl_flags;
> > +     u8 connect_mode;
> > +
> > +     dma_addr_t sq_pbl_addr;
> > +     struct qed_chain r2tq;
> > +     struct qed_chain xhq;
> > +     struct qed_chain uhq;
> > +
> > +     u8 local_mac[6];
> > +     u8 remote_mac[6];
> > +     u8 ip_version;
> > +     u8 ka_max_probe_cnt;
> > +
> > +     u16 vlan_id;
> > +     u16 tcp_flags;
> > +     u32 remote_ip[4];
> > +     u32 local_ip[4];
> > +
> > +     u32 flow_label;
> > +     u32 ka_timeout;
> > +     u32 ka_interval;
> > +     u32 max_rt_time;
> > +
> > +     u8 ttl;
> > +     u8 tos_or_tc;
> > +     u16 remote_port;
> > +     u16 local_port;
> > +     u16 mss;
> > +     u8 rcv_wnd_scale;
> > +     u32 rcv_wnd;
> > +     u32 cwnd;
> > +
> > +     u8 update_flag;
> > +     u8 default_cq;
> > +     u8 abortive_dsconnect;
> > +
> > +     u32 max_seq_size;
> > +     u32 max_recv_pdu_length;
> > +     u32 max_send_pdu_length;
> > +     u32 first_seq_length;
> > +
> > +     u16 physical_q0;
> > +     u16 physical_q1;
> > +
> > +     u16 nvmetcp_cccid_max_range;
> > +     dma_addr_t nvmetcp_cccid_itid_table_addr;
> > +};
> > +
> >   #if IS_ENABLED(CONFIG_QED_NVMETCP)
> >   int qed_nvmetcp_alloc(struct qed_hwfn *p_hwfn);
> >   void qed_nvmetcp_setup(struct qed_hwfn *p_hwfn);
> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_sp.h b/drivers/net/ethernet/qlogic/qed/qed_sp.h
> > index 525159e747a5..60ff3222bf55 100644
> > --- a/drivers/net/ethernet/qlogic/qed/qed_sp.h
> > +++ b/drivers/net/ethernet/qlogic/qed/qed_sp.h
> > @@ -101,6 +101,9 @@ union ramrod_data {
> >       struct iscsi_spe_conn_termination iscsi_conn_terminate;
> >
> >       struct nvmetcp_init_ramrod_params nvmetcp_init;
> > +     struct nvmetcp_spe_conn_offload nvmetcp_conn_offload;
> > +     struct nvmetcp_conn_update_ramrod_params nvmetcp_conn_update;
> > +     struct nvmetcp_spe_conn_termination nvmetcp_conn_terminate;
> >
> >       struct vf_start_ramrod_data vf_start;
> >       struct vf_stop_ramrod_data vf_stop;
> > diff --git a/include/linux/qed/nvmetcp_common.h b/include/linux/qed/nvmetcp_common.h
> > index e9ccfc07041d..c8836b71b866 100644
> > --- a/include/linux/qed/nvmetcp_common.h
> > +++ b/include/linux/qed/nvmetcp_common.h
> > @@ -6,6 +6,8 @@
> >
> >   #include "tcp_common.h"
> >
> > +#define NVMETCP_SLOW_PATH_LAYER_CODE (6)
> > +
> >   /* NVMeTCP firmware function init parameters */
> >   struct nvmetcp_spe_func_init {
> >       __le16 half_way_close_timeout;
> > @@ -43,6 +45,10 @@ enum nvmetcp_ramrod_cmd_id {
> >       NVMETCP_RAMROD_CMD_ID_UNUSED = 0,
> >       NVMETCP_RAMROD_CMD_ID_INIT_FUNC = 1,
> >       NVMETCP_RAMROD_CMD_ID_DESTROY_FUNC = 2,
> > +     NVMETCP_RAMROD_CMD_ID_OFFLOAD_CONN = 3,
> > +     NVMETCP_RAMROD_CMD_ID_UPDATE_CONN = 4,
> > +     NVMETCP_RAMROD_CMD_ID_TERMINATION_CONN = 5,
> > +     NVMETCP_RAMROD_CMD_ID_CLEAR_SQ = 6,
> >       MAX_NVMETCP_RAMROD_CMD_ID
> >   };
> >
> > @@ -51,4 +57,141 @@ struct nvmetcp_glbl_queue_entry {
> >       struct regpair reserved;
> >   };
> >
> > +/* NVMeTCP conn level EQEs */
> > +enum nvmetcp_eqe_opcode {
> > +     NVMETCP_EVENT_TYPE_INIT_FUNC = 0, /* Response after init Ramrod */
> > +     NVMETCP_EVENT_TYPE_DESTROY_FUNC, /* Response after destroy Ramrod */
> > +     NVMETCP_EVENT_TYPE_OFFLOAD_CONN,/* Response after option 2 offload Ramrod */
> > +     NVMETCP_EVENT_TYPE_UPDATE_CONN, /* Response after update Ramrod */
> > +     NVMETCP_EVENT_TYPE_CLEAR_SQ, /* Response after clear sq Ramrod */
> > +     NVMETCP_EVENT_TYPE_TERMINATE_CONN, /* Response after termination Ramrod */
> > +     NVMETCP_EVENT_TYPE_RESERVED0,
> > +     NVMETCP_EVENT_TYPE_RESERVED1,
> > +     NVMETCP_EVENT_TYPE_ASYN_CONNECT_COMPLETE, /* Connect completed (A-syn EQE) */
> > +     NVMETCP_EVENT_TYPE_ASYN_TERMINATE_DONE, /* Termination completed (A-syn EQE) */
> > +     NVMETCP_EVENT_TYPE_START_OF_ERROR_TYPES = 10, /* Separate EQs from err EQs */
> > +     NVMETCP_EVENT_TYPE_ASYN_ABORT_RCVD, /* TCP RST packet receive (A-syn EQE) */
> > +     NVMETCP_EVENT_TYPE_ASYN_CLOSE_RCVD, /* TCP FIN packet receive (A-syn EQE) */
> > +     NVMETCP_EVENT_TYPE_ASYN_SYN_RCVD, /* TCP SYN+ACK packet receive (A-syn EQE) */
> > +     NVMETCP_EVENT_TYPE_ASYN_MAX_RT_TIME, /* TCP max retransmit time (A-syn EQE) */
> > +     NVMETCP_EVENT_TYPE_ASYN_MAX_RT_CNT, /* TCP max retransmit count (A-syn EQE) */
> > +     NVMETCP_EVENT_TYPE_ASYN_MAX_KA_PROBES_CNT, /* TCP ka probes count (A-syn EQE) */
> > +     NVMETCP_EVENT_TYPE_ASYN_FIN_WAIT2, /* TCP fin wait 2 (A-syn EQE) */
> > +     NVMETCP_EVENT_TYPE_NVMETCP_CONN_ERROR, /* NVMeTCP error response (A-syn EQE) */
> > +     NVMETCP_EVENT_TYPE_TCP_CONN_ERROR, /* NVMeTCP error - tcp error (A-syn EQE) */
> > +     MAX_NVMETCP_EQE_OPCODE
> > +};
> > +
> > +struct nvmetcp_conn_offload_section {
> > +     struct regpair cccid_itid_table_addr; /* CCCID to iTID table address */
> > +     __le16 cccid_max_range; /* CCCID max value - used for validation */
> > +     __le16 reserved[3];
> > +};
> > +
> > +/* NVMe TCP connection offload params passed by driver to FW in NVMeTCP offload ramrod */
> > +struct nvmetcp_conn_offload_params {
> > +     struct regpair sq_pbl_addr;
> > +     struct regpair r2tq_pbl_addr;
> > +     struct regpair xhq_pbl_addr;
> > +     struct regpair uhq_pbl_addr;
> > +     __le16 physical_q0;
> > +     __le16 physical_q1;
> > +     u8 flags;
> > +#define NVMETCP_CONN_OFFLOAD_PARAMS_TCP_ON_CHIP_1B_MASK 0x1
> > +#define NVMETCP_CONN_OFFLOAD_PARAMS_TCP_ON_CHIP_1B_SHIFT 0
> > +#define NVMETCP_CONN_OFFLOAD_PARAMS_TARGET_MODE_MASK 0x1
> > +#define NVMETCP_CONN_OFFLOAD_PARAMS_TARGET_MODE_SHIFT 1
> > +#define NVMETCP_CONN_OFFLOAD_PARAMS_RESTRICTED_MODE_MASK 0x1
> > +#define NVMETCP_CONN_OFFLOAD_PARAMS_RESTRICTED_MODE_SHIFT 2
> > +#define NVMETCP_CONN_OFFLOAD_PARAMS_NVMETCP_MODE_MASK 0x1
> > +#define NVMETCP_CONN_OFFLOAD_PARAMS_NVMETCP_MODE_SHIFT 3
> > +#define NVMETCP_CONN_OFFLOAD_PARAMS_RESERVED1_MASK 0xF
> > +#define NVMETCP_CONN_OFFLOAD_PARAMS_RESERVED1_SHIFT 4
> > +     u8 default_cq;
> > +     __le16 reserved0;
> > +     __le32 reserved1;
> > +     __le32 initial_ack;
> > +
> > +     struct nvmetcp_conn_offload_section nvmetcp; /* NVMe/TCP section */
> > +};
> > +
> > +/* NVMe TCP and TCP connection offload params passed by driver to FW in NVMeTCP offload ramrod. */
> > +struct nvmetcp_spe_conn_offload {
> > +     __le16 reserved;
> > +     __le16 conn_id;
> > +     __le32 fw_cid;
> > +     struct nvmetcp_conn_offload_params nvmetcp;
> > +     struct tcp_offload_params_opt2 tcp;
> > +};
> > +
> > +/* NVMeTCP connection update params passed by driver to FW in NVMETCP update ramrod. */
> > +struct nvmetcp_conn_update_ramrod_params {
> > +     __le16 reserved0;
> > +     __le16 conn_id;
> > +     __le32 reserved1;
> > +     u8 flags;
> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_HD_EN_MASK 0x1
> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_HD_EN_SHIFT 0
> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_DD_EN_MASK 0x1
> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_DD_EN_SHIFT 1
> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED0_MASK 0x1
> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED0_SHIFT 2
> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED1_MASK 0x1
> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED1_DATA_SHIFT 3
> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED2_MASK 0x1
> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED2_SHIFT 4
> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED3_MASK 0x1
> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED3_SHIFT 5
> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED4_MASK 0x1
> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED4_SHIFT 6
> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED5_MASK 0x1
> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED5_SHIFT 7
> > +     u8 reserved3[3];
> > +     __le32 max_seq_size;
> > +     __le32 max_send_pdu_length;
> > +     __le32 max_recv_pdu_length;
> > +     __le32 first_seq_length;
> > +     __le32 reserved4[5];
> > +};
> > +
> > +/* NVMeTCP connection termination request */
> > +struct nvmetcp_spe_conn_termination {
> > +     __le16 reserved0;
> > +     __le16 conn_id;
> > +     __le32 reserved1;
> > +     u8 abortive;
> > +     u8 reserved2[7];
> > +     struct regpair reserved3;
> > +     struct regpair reserved4;
> > +};
> > +
> > +struct nvmetcp_dif_flags {
> > +     u8 flags;
> > +};
> > +
> > +enum nvmetcp_wqe_type {
> > +     NVMETCP_WQE_TYPE_NORMAL,
> > +     NVMETCP_WQE_TYPE_TASK_CLEANUP,
> > +     NVMETCP_WQE_TYPE_MIDDLE_PATH,
> > +     NVMETCP_WQE_TYPE_IC,
> > +     MAX_NVMETCP_WQE_TYPE
> > +};
> > +
> > +struct nvmetcp_wqe {
> > +     __le16 task_id;
> > +     u8 flags;
> > +#define NVMETCP_WQE_WQE_TYPE_MASK 0x7 /* [use nvmetcp_wqe_type] */
> > +#define NVMETCP_WQE_WQE_TYPE_SHIFT 0
> > +#define NVMETCP_WQE_NUM_SGES_MASK 0xF
> > +#define NVMETCP_WQE_NUM_SGES_SHIFT 3
> > +#define NVMETCP_WQE_RESPONSE_MASK 0x1
> > +#define NVMETCP_WQE_RESPONSE_SHIFT 7
> > +     struct nvmetcp_dif_flags prot_flags;
> > +     __le32 contlen_cdbsize;
> > +#define NVMETCP_WQE_CONT_LEN_MASK 0xFFFFFF
> > +#define NVMETCP_WQE_CONT_LEN_SHIFT 0
> > +#define NVMETCP_WQE_CDB_SIZE_OR_NVMETCP_CMD_MASK 0xFF
> > +#define NVMETCP_WQE_CDB_SIZE_OR_NVMETCP_CMD_SHIFT 24
> > +};
> > +
> >   #endif /* __NVMETCP_COMMON__ */
> > diff --git a/include/linux/qed/qed_nvmetcp_if.h b/include/linux/qed/qed_nvmetcp_if.h
> > index abc1f41862e3..96263e3cfa1e 100644
> > --- a/include/linux/qed/qed_nvmetcp_if.h
> > +++ b/include/linux/qed/qed_nvmetcp_if.h
> > @@ -25,6 +25,50 @@ struct qed_nvmetcp_tid {
> >       u8 *blocks[MAX_TID_BLOCKS_NVMETCP];
> >   };
> >
> > +struct qed_nvmetcp_id_params {
> > +     u8 mac[ETH_ALEN];
> > +     u32 ip[4];
> > +     u16 port;
> > +};
> > +
> > +struct qed_nvmetcp_params_offload {
> > +     /* FW initializations */
> > +     dma_addr_t sq_pbl_addr;
> > +     dma_addr_t nvmetcp_cccid_itid_table_addr;
> > +     u16 nvmetcp_cccid_max_range;
> > +     u8 default_cq;
> > +
> > +     /* Networking and TCP stack initializations */
> > +     struct qed_nvmetcp_id_params src;
> > +     struct qed_nvmetcp_id_params dst;
> > +     u32 ka_timeout;
> > +     u32 ka_interval;
> > +     u32 max_rt_time;
> > +     u32 cwnd;
> > +     u16 mss;
> > +     u16 vlan_id;
> > +     bool timestamp_en;
> > +     bool delayed_ack_en;
> > +     bool tcp_keep_alive_en;
> > +     bool ecn_en;
> > +     u8 ip_version;
> > +     u8 ka_max_probe_cnt;
> > +     u8 ttl;
> > +     u8 tos_or_tc;
> > +     u8 rcv_wnd_scale;
> > +};
> > +
> > +struct qed_nvmetcp_params_update {
> > +     u32 max_io_size;
> > +     u32 max_recv_pdu_length;
> > +     u32 max_send_pdu_length;
> > +
> > +     /* Placeholder: pfv, cpda, hpda */
> > +
> > +     bool hdr_digest_en;
> > +     bool data_digest_en;
> > +};
> > +
> >   struct qed_nvmetcp_cb_ops {
> >       struct qed_common_cb_ops common;
> >   };
> > @@ -48,6 +92,38 @@ struct qed_nvmetcp_cb_ops {
> >    * @stop:           nvmetcp in FW
> >    *                  @param cdev
> >    *                  return 0 on success, otherwise error value.
> > + * @acquire_conn:    acquire a new nvmetcp connection
> > + *                   @param cdev
> > + *                   @param handle - qed will fill handle that should be
> > + *                           used henceforth as identifier of the
> > + *                           connection.
> > + *                   @param fw_cid - qed will fill the FW connection ID of
> > + *                           the connection.
> > + *                   @param p_doorbell - qed will fill the address of the
> > + *                           doorbell.
> > + *                   @return 0 on success, otherwise error value.
> > + * @release_conn:    release a previously acquired nvmetcp connection
> > + *                   @param cdev
> > + *                   @param handle - the connection handle.
> > + *                   @return 0 on success, otherwise error value.
> > + * @offload_conn:    configures an offloaded connection
> > + *                   @param cdev
> > + *                   @param handle - the connection handle.
> > + *                   @param conn_info - the configuration to use for the
> > + *                           offload.
> > + *                   @return 0 on success, otherwise error value.
> > + * @update_conn:     updates an offloaded connection
> > + *                   @param cdev
> > + *                   @param handle - the connection handle.
> > + *                   @param conn_info - the configuration to use for the
> > + *                           update.
> > + *                   @return 0 on success, otherwise error value.
> > + * @destroy_conn:    stops an offloaded connection
> > + *                   @param cdev
> > + *                   @param handle - the connection handle.
> > + *                   @param abrt_conn - passed to FW as the abortive flag
> > + *                           of the termination ramrod.
> > + *                   @return 0 on success, otherwise error value.
> > + * @clear_sq:                clear all tasks in sq
> > + *                   @param cdev
> > + *                   @param handle - the connection handle.
> > + *                   @return 0 on success, otherwise error value.
> >    */
> >   struct qed_nvmetcp_ops {
> >       const struct qed_common_ops *common;
> > @@ -65,6 +141,24 @@ struct qed_nvmetcp_ops {
> >                    void *event_context, nvmetcp_event_cb_t async_event_cb);
> >
> >       int (*stop)(struct qed_dev *cdev);
> > +
> > +     int (*acquire_conn)(struct qed_dev *cdev,
> > +                         u32 *handle,
> > +                         u32 *fw_cid, void __iomem **p_doorbell);
> > +
> > +     int (*release_conn)(struct qed_dev *cdev, u32 handle);
> > +
> > +     int (*offload_conn)(struct qed_dev *cdev,
> > +                         u32 handle,
> > +                         struct qed_nvmetcp_params_offload *conn_info);
> > +
> > +     int (*update_conn)(struct qed_dev *cdev,
> > +                        u32 handle,
> > +                        struct qed_nvmetcp_params_update *conn_info);
> > +
> > +     int (*destroy_conn)(struct qed_dev *cdev, u32 handle, u8 abrt_conn);
> > +
> > +     int (*clear_sq)(struct qed_dev *cdev, u32 handle);
> >   };
> >
> >   const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void);
> >
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke                Kernel Storage Architect
> hare@suse.de                              +49 911 74053 688
> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
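
For readers following the SET_FIELD() calls and the *_MASK/*_SHIFT pairs in the quoted patch, here is a minimal user-space sketch of how such a pair packs several small values into one flags byte. The mask and shift values are copied from the quoted struct nvmetcp_wqe. The SKETCH_SET_FIELD/SKETCH_GET_FIELD macros and sketch_wqe_flags() are hypothetical stand-ins written for illustration; they are not the qed definitions and may differ from them in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Mask/shift pairs copied from the quoted struct nvmetcp_wqe. */
#define NVMETCP_WQE_WQE_TYPE_MASK   0x7
#define NVMETCP_WQE_WQE_TYPE_SHIFT  0
#define NVMETCP_WQE_NUM_SGES_MASK   0xF
#define NVMETCP_WQE_NUM_SGES_SHIFT  3
#define NVMETCP_WQE_RESPONSE_MASK   0x1
#define NVMETCP_WQE_RESPONSE_SHIFT  7

/* Hypothetical stand-in: clear the field's bits, then OR in the new value. */
#define SKETCH_SET_FIELD(value, name, field)                            \
	do {                                                            \
		(value) &= ~((name##_MASK) << (name##_SHIFT));          \
		(value) |= ((field) & (name##_MASK)) << (name##_SHIFT); \
	} while (0)

/* Hypothetical stand-in: shift the field down and mask off its neighbours. */
#define SKETCH_GET_FIELD(value, name) \
	(((value) >> (name##_SHIFT)) & (name##_MASK))

/* Illustrative helper (not part of the patch): build a WQE flags byte. */
uint8_t sketch_wqe_flags(unsigned int wqe_type, unsigned int num_sges,
			 unsigned int response)
{
	uint8_t flags = 0;

	/* Each value must fit in its field, otherwise bits would be lost. */
	assert(wqe_type <= NVMETCP_WQE_WQE_TYPE_MASK);
	assert(num_sges <= NVMETCP_WQE_NUM_SGES_MASK);
	assert(response <= NVMETCP_WQE_RESPONSE_MASK);

	SKETCH_SET_FIELD(flags, NVMETCP_WQE_WQE_TYPE, wqe_type);
	SKETCH_SET_FIELD(flags, NVMETCP_WQE_NUM_SGES, num_sges);
	SKETCH_SET_FIELD(flags, NVMETCP_WQE_RESPONSE, response);

	return flags;
}
```

With type 2 (NVMETCP_WQE_TYPE_MIDDLE_PATH in the quoted enum), 5 SGEs and the response bit set, the three fields occupy bits 0-2, 3-6 and 7 of the byte, so sketch_wqe_flags(2, 5, 1) yields 0xAA.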


* Re: [RFC PATCH v4 03/27] qed: Add qed-NVMeTCP personality
  2021-05-02 11:11   ` Hannes Reinecke
@ 2021-05-03 15:26     ` Shai Malin
  0 siblings, 0 replies; 81+ messages in thread
From: Shai Malin @ 2021-05-03 15:26 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Shai Malin, netdev, linux-nvme, davem, kuba, sagi, hch, axboe,
	kbusch, Ariel Elior, Michal Kalderon, okulkarni, pkushwaha

On 5/1/21 2:11 PM, Hannes Reinecke wrote:
> On 4/29/21 9:09 PM, Shai Malin wrote:
> > From: Omkar Kulkarni <okulkarni@marvell.com>
> >
> > This patch adds qed NVMeTCP personality in order to support the NVMeTCP
> > qed functionalities and manage the HW device shared resources.
> > The same design is used with Eth (qede), RDMA(qedr), iSCSI (qedi) and
> > FCoE (qedf).
> >
> > Acked-by: Igor Russkikh <irusskikh@marvell.com>
> > Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> > Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> > Signed-off-by: Shai Malin <smalin@marvell.com>
> > Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> > Signed-off-by: Ariel Elior <aelior@marvell.com>
> > ---
> >   drivers/net/ethernet/qlogic/qed/qed.h         |  3 ++
> >   drivers/net/ethernet/qlogic/qed/qed_cxt.c     | 32 ++++++++++++++
> >   drivers/net/ethernet/qlogic/qed/qed_cxt.h     |  1 +
> >   drivers/net/ethernet/qlogic/qed/qed_dev.c     | 44 ++++++++++++++++---
> >   drivers/net/ethernet/qlogic/qed/qed_hsi.h     |  3 +-
> >   drivers/net/ethernet/qlogic/qed/qed_ll2.c     | 31 ++++++++-----
> >   drivers/net/ethernet/qlogic/qed/qed_mcp.c     |  3 ++
> >   drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c |  3 +-
> >   drivers/net/ethernet/qlogic/qed/qed_ooo.c     |  5 ++-
> >   .../net/ethernet/qlogic/qed/qed_sp_commands.c |  1 +
> >   10 files changed, 108 insertions(+), 18 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h
> > index 91d4635009ab..7ae648c4edba 100644
> > --- a/drivers/net/ethernet/qlogic/qed/qed.h
> > +++ b/drivers/net/ethernet/qlogic/qed/qed.h
> > @@ -200,6 +200,7 @@ enum qed_pci_personality {
> >       QED_PCI_ETH,
> >       QED_PCI_FCOE,
> >       QED_PCI_ISCSI,
> > +     QED_PCI_NVMETCP,
> >       QED_PCI_ETH_ROCE,
> >       QED_PCI_ETH_IWARP,
> >       QED_PCI_ETH_RDMA,
> > @@ -285,6 +286,8 @@ struct qed_hw_info {
> >       ((dev)->hw_info.personality == QED_PCI_FCOE)
> >   #define QED_IS_ISCSI_PERSONALITY(dev)                                       \
> >       ((dev)->hw_info.personality == QED_PCI_ISCSI)
> > +#define QED_IS_NVMETCP_PERSONALITY(dev)                                      \
> > +     ((dev)->hw_info.personality == QED_PCI_NVMETCP)
> >
> So you have a distinct PCI personality for NVMe-oF, but not for the
> protocol? Strange.
> Why don't you have a distinct NVMe-oF protocol ID?
>
> >       /* Resource Allocation scheme results */
> >       u32                             resc_start[QED_MAX_RESC];
> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_cxt.c b/drivers/net/ethernet/qlogic/qed/qed_cxt.c
> > index 0a22f8ce9a2c..6cef75723e38 100644
> > --- a/drivers/net/ethernet/qlogic/qed/qed_cxt.c
> > +++ b/drivers/net/ethernet/qlogic/qed/qed_cxt.c
> > @@ -2106,6 +2106,30 @@ int qed_cxt_set_pf_params(struct qed_hwfn *p_hwfn, u32 rdma_tasks)
> >               }
> >               break;
> >       }
> > +     case QED_PCI_NVMETCP:
> > +     {
> > +             struct qed_nvmetcp_pf_params *p_params;
> > +
> > +             p_params = &p_hwfn->pf_params.nvmetcp_pf_params;
> > +
> > +             if (p_params->num_cons && p_params->num_tasks) {
> > +                     qed_cxt_set_proto_cid_count(p_hwfn,
> > +                                                 PROTOCOLID_NVMETCP,
> > +                                                 p_params->num_cons,
> > +                                                 0);
> > +
> > +                     qed_cxt_set_proto_tid_count(p_hwfn,
> > +                                                 PROTOCOLID_NVMETCP,
> > +                                                 QED_CTX_NVMETCP_TID_SEG,
> > +                                                 0,
> > +                                                 p_params->num_tasks,
> > +                                                 true);
> > +             } else {
> > +                     DP_INFO(p_hwfn->cdev,
> > +                             "NvmeTCP personality used without setting params!\n");
> > +             }
> > +             break;
> > +     }
> >       default:
> >               return -EINVAL;
> >       }
> > @@ -2132,6 +2156,10 @@ int qed_cxt_get_tid_mem_info(struct qed_hwfn *p_hwfn,
> >               proto = PROTOCOLID_ISCSI;
> >               seg = QED_CXT_ISCSI_TID_SEG;
> >               break;
> > +     case QED_PCI_NVMETCP:
> > +             proto = PROTOCOLID_NVMETCP;
> > +             seg = QED_CTX_NVMETCP_TID_SEG;
> > +             break;
> >       default:
> >               return -EINVAL;
> >       }
> > @@ -2458,6 +2486,10 @@ int qed_cxt_get_task_ctx(struct qed_hwfn *p_hwfn,
> >               proto = PROTOCOLID_ISCSI;
> >               seg = QED_CXT_ISCSI_TID_SEG;
> >               break;
> > +     case QED_PCI_NVMETCP:
> > +             proto = PROTOCOLID_NVMETCP;
> > +             seg = QED_CTX_NVMETCP_TID_SEG;
> > +             break;
> >       default:
> >               return -EINVAL;
> >       }
> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_cxt.h b/drivers/net/ethernet/qlogic/qed/qed_cxt.h
> > index 056e79620a0e..8f1a77cb33f6 100644
> > --- a/drivers/net/ethernet/qlogic/qed/qed_cxt.h
> > +++ b/drivers/net/ethernet/qlogic/qed/qed_cxt.h
> > @@ -51,6 +51,7 @@ int qed_cxt_get_tid_mem_info(struct qed_hwfn *p_hwfn,
> >                            struct qed_tid_mem *p_info);
> >
> >   #define QED_CXT_ISCSI_TID_SEG       PROTOCOLID_ISCSI
> > +#define QED_CTX_NVMETCP_TID_SEG PROTOCOLID_NVMETCP
> >   #define QED_CXT_ROCE_TID_SEG        PROTOCOLID_ROCE
> >   #define QED_CXT_FCOE_TID_SEG        PROTOCOLID_FCOE
> >   enum qed_cxt_elem_type {
> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c
> > index d2f5855b2ea7..d3f8cc42d07e 100644
> > --- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
> > +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
> > @@ -37,6 +37,7 @@
> >   #include "qed_sriov.h"
> >   #include "qed_vf.h"
> >   #include "qed_rdma.h"
> > +#include "qed_nvmetcp.h"
> >
> >   static DEFINE_SPINLOCK(qm_lock);
> >
> > @@ -667,7 +668,8 @@ qed_llh_set_engine_affin(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt)
> >       }
> >
> >       /* Storage PF is bound to a single engine while L2 PF uses both */
> > -     if (QED_IS_FCOE_PERSONALITY(p_hwfn) || QED_IS_ISCSI_PERSONALITY(p_hwfn))
> > +     if (QED_IS_FCOE_PERSONALITY(p_hwfn) || QED_IS_ISCSI_PERSONALITY(p_hwfn) ||
> > +         QED_IS_NVMETCP_PERSONALITY(p_hwfn))
> >               eng = cdev->fir_affin ? QED_ENG1 : QED_ENG0;
> >       else                    /* L2_PERSONALITY */
> >               eng = QED_BOTH_ENG;
> > @@ -1164,6 +1166,9 @@ void qed_llh_remove_mac_filter(struct qed_dev *cdev,
> >       if (!test_bit(QED_MF_LLH_MAC_CLSS, &cdev->mf_bits))
> >               goto out;
> >
> > +     if (QED_IS_NVMETCP_PERSONALITY(p_hwfn))
> > +             return;
> > +
> >       ether_addr_copy(filter.mac.addr, mac_addr);
> >       rc = qed_llh_shadow_remove_filter(cdev, ppfid, &filter, &filter_idx,
> >                                         &ref_cnt);
> > @@ -1381,6 +1386,11 @@ void qed_resc_free(struct qed_dev *cdev)
> >                       qed_ooo_free(p_hwfn);
> >               }
> >
> > +             if (p_hwfn->hw_info.personality == QED_PCI_NVMETCP) {
> > +                     qed_nvmetcp_free(p_hwfn);
> > +                     qed_ooo_free(p_hwfn);
> > +             }
> > +
> >               if (QED_IS_RDMA_PERSONALITY(p_hwfn) && rdma_info) {
> >                       qed_spq_unregister_async_cb(p_hwfn, rdma_info->proto);
> >                       qed_rdma_info_free(p_hwfn);
> > @@ -1423,6 +1433,7 @@ static u32 qed_get_pq_flags(struct qed_hwfn *p_hwfn)
> >               flags |= PQ_FLAGS_OFLD;
> >               break;
> >       case QED_PCI_ISCSI:
> > +     case QED_PCI_NVMETCP:
> >               flags |= PQ_FLAGS_ACK | PQ_FLAGS_OOO | PQ_FLAGS_OFLD;
> >               break;
> >       case QED_PCI_ETH_ROCE:
> > @@ -2269,6 +2280,12 @@ int qed_resc_alloc(struct qed_dev *cdev)
> >                                                       PROTOCOLID_ISCSI,
> >                                                       NULL);
> >                       n_eqes += 2 * num_cons;
> > +             } else if (p_hwfn->hw_info.personality == QED_PCI_NVMETCP) {
> > +                     num_cons =
> > +                         qed_cxt_get_proto_cid_count(p_hwfn,
> > +                                                     PROTOCOLID_NVMETCP,
> > +                                                     NULL);
> > +                     n_eqes += 2 * num_cons;
> >               }
> >
> >               if (n_eqes > 0xFFFF) {
> > @@ -2313,6 +2330,15 @@ int qed_resc_alloc(struct qed_dev *cdev)
> >                               goto alloc_err;
> >               }
> >
> > +             if (p_hwfn->hw_info.personality == QED_PCI_NVMETCP) {
> > +                     rc = qed_nvmetcp_alloc(p_hwfn);
> > +                     if (rc)
> > +                             goto alloc_err;
> > +                     rc = qed_ooo_alloc(p_hwfn);
> > +                     if (rc)
> > +                             goto alloc_err;
> > +             }
> > +
> >               if (QED_IS_RDMA_PERSONALITY(p_hwfn)) {
> >                       rc = qed_rdma_info_alloc(p_hwfn);
> >                       if (rc)
> > @@ -2393,6 +2419,11 @@ void qed_resc_setup(struct qed_dev *cdev)
> >                       qed_iscsi_setup(p_hwfn);
> >                       qed_ooo_setup(p_hwfn);
> >               }
> > +
> > +             if (p_hwfn->hw_info.personality == QED_PCI_NVMETCP) {
> > +                     qed_nvmetcp_setup(p_hwfn);
> > +                     qed_ooo_setup(p_hwfn);
> > +             }
> >       }
> >   }
> >
> > @@ -2854,7 +2885,8 @@ static int qed_hw_init_pf(struct qed_hwfn *p_hwfn,
> >
> >       /* Protocol Configuration */
> >       STORE_RT_REG(p_hwfn, PRS_REG_SEARCH_TCP_RT_OFFSET,
> > -                  (p_hwfn->hw_info.personality == QED_PCI_ISCSI) ? 1 : 0);
> > +                  ((p_hwfn->hw_info.personality == QED_PCI_ISCSI) ||
> > +                      (p_hwfn->hw_info.personality == QED_PCI_NVMETCP)) ? 1 : 0);
> >       STORE_RT_REG(p_hwfn, PRS_REG_SEARCH_FCOE_RT_OFFSET,
> >                    (p_hwfn->hw_info.personality == QED_PCI_FCOE) ? 1 : 0);
> >       STORE_RT_REG(p_hwfn, PRS_REG_SEARCH_ROCE_RT_OFFSET, 0);
> > @@ -3531,7 +3563,7 @@ static void qed_hw_set_feat(struct qed_hwfn *p_hwfn)
> >                                              RESC_NUM(p_hwfn,
> >                                                       QED_CMDQS_CQS));
> >
> > -     if (QED_IS_ISCSI_PERSONALITY(p_hwfn))
> > +     if (QED_IS_ISCSI_PERSONALITY(p_hwfn) || QED_IS_NVMETCP_PERSONALITY(p_hwfn))
> >               feat_num[QED_ISCSI_CQ] = min_t(u32, sb_cnt.cnt,
> >                                              RESC_NUM(p_hwfn,
> >                                                       QED_CMDQS_CQS));
> > @@ -3734,7 +3766,8 @@ int qed_hw_get_dflt_resc(struct qed_hwfn *p_hwfn,
> >               break;
> >       case QED_BDQ:
> >               if (p_hwfn->hw_info.personality != QED_PCI_ISCSI &&
> > -                 p_hwfn->hw_info.personality != QED_PCI_FCOE)
> > +                 p_hwfn->hw_info.personality != QED_PCI_FCOE &&
> > +                     p_hwfn->hw_info.personality != QED_PCI_NVMETCP)
> >                       *p_resc_num = 0;
> >               else
> >                       *p_resc_num = 1;
> > @@ -3755,7 +3788,8 @@ int qed_hw_get_dflt_resc(struct qed_hwfn *p_hwfn,
> >                       *p_resc_start = 0;
> >               else if (p_hwfn->cdev->num_ports_in_engine == 4)
> >                       *p_resc_start = p_hwfn->port_id;
> > -             else if (p_hwfn->hw_info.personality == QED_PCI_ISCSI)
> > +             else if (p_hwfn->hw_info.personality == QED_PCI_ISCSI ||
> > +                      p_hwfn->hw_info.personality == QED_PCI_NVMETCP)
> >                       *p_resc_start = p_hwfn->port_id;
> >               else if (p_hwfn->hw_info.personality == QED_PCI_FCOE)
> >                       *p_resc_start = p_hwfn->port_id + 2;
> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
> > index 24472f6a83c2..9c9ec8f53ef8 100644
> > --- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h
> > +++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
> > @@ -12148,7 +12148,8 @@ struct public_func {
> >   #define FUNC_MF_CFG_PROTOCOL_ISCSI              0x00000010
> >   #define FUNC_MF_CFG_PROTOCOL_FCOE               0x00000020
> >   #define FUNC_MF_CFG_PROTOCOL_ROCE               0x00000030
> > -#define FUNC_MF_CFG_PROTOCOL_MAX     0x00000030
> > +#define FUNC_MF_CFG_PROTOCOL_NVMETCP    0x00000040
> > +#define FUNC_MF_CFG_PROTOCOL_MAX     0x00000040
> >
> >   #define FUNC_MF_CFG_MIN_BW_MASK             0x0000ff00
> >   #define FUNC_MF_CFG_MIN_BW_SHIFT    8
> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_ll2.c b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
> > index 49783f365079..88bfcdcd4a4c 100644
> > --- a/drivers/net/ethernet/qlogic/qed/qed_ll2.c
> > +++ b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
> > @@ -960,7 +960,8 @@ static int qed_sp_ll2_rx_queue_start(struct qed_hwfn *p_hwfn,
> >
> >       if (test_bit(QED_MF_LL2_NON_UNICAST, &p_hwfn->cdev->mf_bits) &&
> >           p_ramrod->main_func_queue && conn_type != QED_LL2_TYPE_ROCE &&
> > -         conn_type != QED_LL2_TYPE_IWARP) {
> > +         conn_type != QED_LL2_TYPE_IWARP &&
> > +             (!QED_IS_NVMETCP_PERSONALITY(p_hwfn))) {
> >               p_ramrod->mf_si_bcast_accept_all = 1;
> >               p_ramrod->mf_si_mcast_accept_all = 1;
> >       } else {
> > @@ -1049,6 +1050,8 @@ static int qed_sp_ll2_tx_queue_start(struct qed_hwfn *p_hwfn,
> >       case QED_LL2_TYPE_OOO:
> >               if (p_hwfn->hw_info.personality == QED_PCI_ISCSI)
> >                       p_ramrod->conn_type = PROTOCOLID_ISCSI;
> > +             else if (p_hwfn->hw_info.personality == QED_PCI_NVMETCP)
> > +                     p_ramrod->conn_type = PROTOCOLID_NVMETCP;
> >               else
> >                       p_ramrod->conn_type = PROTOCOLID_IWARP;
> >               break;
> > @@ -1634,7 +1637,8 @@ int qed_ll2_establish_connection(void *cxt, u8 connection_handle)
> >       if (rc)
> >               goto out;
> >
> > -     if (!QED_IS_RDMA_PERSONALITY(p_hwfn))
> > +     if (!QED_IS_RDMA_PERSONALITY(p_hwfn) &&
> > +         !QED_IS_NVMETCP_PERSONALITY(p_hwfn))
> >               qed_wr(p_hwfn, p_ptt, PRS_REG_USE_LIGHT_L2, 1);
> >
> >       qed_ll2_establish_connection_ooo(p_hwfn, p_ll2_conn);
> > @@ -2376,7 +2380,8 @@ static int qed_ll2_start_ooo(struct qed_hwfn *p_hwfn,
> >   static bool qed_ll2_is_storage_eng1(struct qed_dev *cdev)
> >   {
> >       return (QED_IS_FCOE_PERSONALITY(QED_LEADING_HWFN(cdev)) ||
> > -             QED_IS_ISCSI_PERSONALITY(QED_LEADING_HWFN(cdev))) &&
> > +             QED_IS_ISCSI_PERSONALITY(QED_LEADING_HWFN(cdev)) ||
> > +             QED_IS_NVMETCP_PERSONALITY(QED_LEADING_HWFN(cdev))) &&
> >               (QED_AFFIN_HWFN(cdev) != QED_LEADING_HWFN(cdev));
> >   }
> >
> > @@ -2402,11 +2407,13 @@ static int qed_ll2_stop(struct qed_dev *cdev)
> >
> >       if (cdev->ll2->handle == QED_LL2_UNUSED_HANDLE)
> >               return 0;
> > +     if (!QED_IS_NVMETCP_PERSONALITY(p_hwfn))
> > +             qed_llh_remove_mac_filter(cdev, 0, cdev->ll2_mac_address);
> >
> >       qed_llh_remove_mac_filter(cdev, 0, cdev->ll2_mac_address);
> >       eth_zero_addr(cdev->ll2_mac_address);
> >
> > -     if (QED_IS_ISCSI_PERSONALITY(p_hwfn))
> > +     if (QED_IS_ISCSI_PERSONALITY(p_hwfn) || QED_IS_NVMETCP_PERSONALITY(p_hwfn))
> >               qed_ll2_stop_ooo(p_hwfn);
> >
> >       /* In CMT mode, LL2 is always started on engine 0 for a storage PF */
> > @@ -2442,6 +2449,7 @@ static int __qed_ll2_start(struct qed_hwfn *p_hwfn,
> >               conn_type = QED_LL2_TYPE_FCOE;
> >               break;
> >       case QED_PCI_ISCSI:
> > +     case QED_PCI_NVMETCP:
> >               conn_type = QED_LL2_TYPE_ISCSI;
> >               break;
> >       case QED_PCI_ETH_ROCE:
> > @@ -2567,7 +2575,7 @@ static int qed_ll2_start(struct qed_dev *cdev, struct qed_ll2_params *params)
> >               }
> >       }
> >
> > -     if (QED_IS_ISCSI_PERSONALITY(p_hwfn)) {
> > +     if (QED_IS_ISCSI_PERSONALITY(p_hwfn) || QED_IS_NVMETCP_PERSONALITY(p_hwfn)) {
> >               DP_VERBOSE(cdev, QED_MSG_STORAGE, "Starting OOO LL2 queue\n");
> >               rc = qed_ll2_start_ooo(p_hwfn, params);
> >               if (rc) {
> > @@ -2576,10 +2584,13 @@ static int qed_ll2_start(struct qed_dev *cdev, struct qed_ll2_params *params)
> >               }
> >       }
> >
> > -     rc = qed_llh_add_mac_filter(cdev, 0, params->ll2_mac_address);
> > -     if (rc) {
> > -             DP_NOTICE(cdev, "Failed to add an LLH filter\n");
> > -             goto err3;
> > +     if (!QED_IS_NVMETCP_PERSONALITY(p_hwfn)) {
> > +             rc = qed_llh_add_mac_filter(cdev, 0, params->ll2_mac_address);
> > +             if (rc) {
> > +                     DP_NOTICE(cdev, "Failed to add an LLH filter\n");
> > +                     goto err3;
> > +             }
> > +
> >       }
> >
> >       ether_addr_copy(cdev->ll2_mac_address, params->ll2_mac_address);
> > @@ -2587,7 +2598,7 @@ static int qed_ll2_start(struct qed_dev *cdev, struct qed_ll2_params *params)
> >       return 0;
> >
> >   err3:
> > -     if (QED_IS_ISCSI_PERSONALITY(p_hwfn))
> > +     if (QED_IS_ISCSI_PERSONALITY(p_hwfn) || QED_IS_NVMETCP_PERSONALITY(p_hwfn))
> >               qed_ll2_stop_ooo(p_hwfn);
> >   err2:
> >       if (b_is_storage_eng1)
> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.c b/drivers/net/ethernet/qlogic/qed/qed_mcp.c
> > index cd882c453394..4387292c37e2 100644
> > --- a/drivers/net/ethernet/qlogic/qed/qed_mcp.c
> > +++ b/drivers/net/ethernet/qlogic/qed/qed_mcp.c
> > @@ -2446,6 +2446,9 @@ qed_mcp_get_shmem_proto(struct qed_hwfn *p_hwfn,
> >       case FUNC_MF_CFG_PROTOCOL_ISCSI:
> >               *p_proto = QED_PCI_ISCSI;
> >               break;
> > +     case FUNC_MF_CFG_PROTOCOL_NVMETCP:
> > +             *p_proto = QED_PCI_NVMETCP;
> > +             break;
> >       case FUNC_MF_CFG_PROTOCOL_FCOE:
> >               *p_proto = QED_PCI_FCOE;
> >               break;
> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c b/drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c
> > index 3e3192a3ad9b..6190adf965bc 100644
> > --- a/drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c
> > +++ b/drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c
> > @@ -1306,7 +1306,8 @@ int qed_mfw_process_tlv_req(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt)
> >       }
> >
> >       if ((tlv_group & QED_MFW_TLV_ISCSI) &&
> > -         p_hwfn->hw_info.personality != QED_PCI_ISCSI) {
> > +         p_hwfn->hw_info.personality != QED_PCI_ISCSI &&
> > +             p_hwfn->hw_info.personality != QED_PCI_NVMETCP) {
> >               DP_VERBOSE(p_hwfn, QED_MSG_SP,
> >                          "Skipping iSCSI TLVs for non-iSCSI function\n");
> >               tlv_group &= ~QED_MFW_TLV_ISCSI;
> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_ooo.c b/drivers/net/ethernet/qlogic/qed/qed_ooo.c
> > index 88353aa404dc..d37bb2463f98 100644
> > --- a/drivers/net/ethernet/qlogic/qed/qed_ooo.c
> > +++ b/drivers/net/ethernet/qlogic/qed/qed_ooo.c
> > @@ -16,7 +16,7 @@
> >   #include "qed_ll2.h"
> >   #include "qed_ooo.h"
> >   #include "qed_cxt.h"
> > -
> > +#include "qed_nvmetcp.h"
> >   static struct qed_ooo_archipelago
> >   *qed_ooo_seek_archipelago(struct qed_hwfn *p_hwfn,
> >                         struct qed_ooo_info
> > @@ -85,6 +85,9 @@ int qed_ooo_alloc(struct qed_hwfn *p_hwfn)
> >       case QED_PCI_ISCSI:
> >               proto = PROTOCOLID_ISCSI;
> >               break;
> > +     case QED_PCI_NVMETCP:
> > +             proto = PROTOCOLID_NVMETCP;
> > +             break;
> >       case QED_PCI_ETH_RDMA:
> >       case QED_PCI_ETH_IWARP:
> >               proto = PROTOCOLID_IWARP;
> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c b/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c
> > index aa71adcf31ee..60b3876387a9 100644
> > --- a/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c
> > +++ b/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c
> > @@ -385,6 +385,7 @@ int qed_sp_pf_start(struct qed_hwfn *p_hwfn,
> >               p_ramrod->personality = PERSONALITY_FCOE;
> >               break;
> >       case QED_PCI_ISCSI:
> > +     case QED_PCI_NVMETCP:
> >               p_ramrod->personality = PERSONALITY_ISCSI;
> >               break;
> >       case QED_PCI_ETH_ROCE:
> >
> As indicated, I do find this mix of 'nvmetcp is nearly iscsi' a bit
> strange. I would have preferred to have distinct types for nvmetcp.
>

PERSONALITY_ determines the FW resource layout, which is the same for iSCSI and
NVMeTCP. I will change PERSONALITY_ISCSI to PERSONALITY_TCP_ULP.
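
To illustrate the intent, here is a minimal sketch with hypothetical enum
names and values (these are stand-ins, not the actual qed definitions). It
assumes the planned PERSONALITY_TCP_ULP rename: both storage-over-TCP PCI
personalities share one FW personality, while the protocol ID stays distinct.

```c
/* Hypothetical stand-ins for the qed enums; names and values are
 * illustrative only and do not match the real driver headers.
 */
enum pci_personality { PCI_ETH, PCI_FCOE, PCI_ISCSI, PCI_NVMETCP };
enum fw_personality { FW_PERSONALITY_ETH, FW_PERSONALITY_FCOE,
		      FW_PERSONALITY_TCP_ULP };
enum protocol_id { PROTO_ETH, PROTO_FCOE, PROTO_ISCSI, PROTO_NVMETCP };

/* The FW personality selects the FW resource layout. iSCSI and NVMeTCP
 * use the same layout, so they map to the same value.
 */
static enum fw_personality fw_personality_of(enum pci_personality p)
{
	switch (p) {
	case PCI_FCOE:
		return FW_PERSONALITY_FCOE;
	case PCI_ISCSI:
	case PCI_NVMETCP:
		return FW_PERSONALITY_TCP_ULP;
	default:
		return FW_PERSONALITY_ETH;
	}
}

/* The protocol ID is still per-protocol, so connection and task context
 * accounting can tell the two ULPs apart.
 */
static enum protocol_id protocol_id_of(enum pci_personality p)
{
	switch (p) {
	case PCI_FCOE:
		return PROTO_FCOE;
	case PCI_ISCSI:
		return PROTO_ISCSI;
	case PCI_NVMETCP:
		return PROTO_NVMETCP;
	default:
		return PROTO_ETH;
	}
}
```

With this split, only the FW-facing personality is shared between the two
ULPs; every per-protocol switch in the driver keeps its own case.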

> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke                Kernel Storage Architect
> hare@suse.de                              +49 911 74053 688
> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [RFC PATCH v4 04/27] qed: Add support of HW filter block
  2021-05-02 11:13   ` Hannes Reinecke
@ 2021-05-03 15:27     ` Shai Malin
  0 siblings, 0 replies; 81+ messages in thread
From: Shai Malin @ 2021-05-03 15:27 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Shai Malin, netdev, linux-nvme, davem, kuba, sagi, hch, axboe,
	kbusch, Ariel Elior, Michal Kalderon, okulkarni, pkushwaha

On 5/1/21 2:13 PM, Hannes Reinecke wrote:
> On 4/29/21 9:09 PM, Shai Malin wrote:
> > From: Prabhakar Kushwaha <pkushwaha@marvell.com>
> >
> > This patch introduces the functionality of HW filter block.
> > It adds and removes filters based on source and target TCP port.
> >
> > It also adds functionality to clear all filters at once.
> >
> > Acked-by: Igor Russkikh <irusskikh@marvell.com>
> > Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> > Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> > Signed-off-by: Shai Malin <smalin@marvell.com>
> > Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> > Signed-off-by: Ariel Elior <aelior@marvell.com>
> > ---
> >   drivers/net/ethernet/qlogic/qed/qed.h         |  10 ++
> >   drivers/net/ethernet/qlogic/qed/qed_dev.c     | 107 ++++++++++++++++++
> >   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c |   5 +
> >   include/linux/qed/qed_nvmetcp_if.h            |  24 ++++
> >   4 files changed, 146 insertions(+)
> >
> Reviewed-by: Hannes Reinecke <hare@suse.de>

Thanks.

>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke                Kernel Storage Architect
> hare@suse.de                              +49 911 74053 688
> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [RFC PATCH v4 07/27] qed: Add IP services APIs support
  2021-05-02 11:26   ` Hannes Reinecke
@ 2021-05-03 15:44     ` Shai Malin
  0 siblings, 0 replies; 81+ messages in thread
From: Shai Malin @ 2021-05-03 15:44 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Shai Malin, netdev, linux-nvme, davem, kuba, sagi, hch, axboe,
	kbusch, Ariel Elior, Michal Kalderon, okulkarni, pkushwaha,
	Nikolay Assa

On 5/2/21 2:26 PM, Hannes Reinecke wrote:
> On 4/29/21 9:09 PM, Shai Malin wrote:
> > From: Nikolay Assa <nassa@marvell.com>
> >
> > This patch introduces APIs which the NVMeTCP Offload device (qedn)
> > will use through the paired net-device (qede).
> > It includes APIs for:
> > - ipv4/ipv6 routing
> > - get VLAN from net-device
> > - TCP ports reservation
> >
> > Acked-by: Igor Russkikh <irusskikh@marvell.com>
> > Signed-off-by: Nikolay Assa <nassa@marvell.com>
> > Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> > Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> > Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> > Signed-off-by: Ariel Elior <aelior@marvell.com>
> > Signed-off-by: Shai Malin <smalin@marvell.com>
> > ---
> >   .../qlogic/qed/qed_nvmetcp_ip_services.c      | 239 ++++++++++++++++++
> >   .../linux/qed/qed_nvmetcp_ip_services_if.h    |  29 +++
> >   2 files changed, 268 insertions(+)
> >   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c
> >   create mode 100644 include/linux/qed/qed_nvmetcp_ip_services_if.h
> >
> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c
> > new file mode 100644
> > index 000000000000..2904b1a0830a
> > --- /dev/null
> > +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c
> > @@ -0,0 +1,239 @@
> > +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause)
> > +/*
> > + * Copyright 2021 Marvell. All rights reserved.
> > + */
> > +
> > +#include <linux/types.h>
> > +#include <asm/byteorder.h>
> > +#include <asm/param.h>
> > +#include <linux/delay.h>
> > +#include <linux/pci.h>
> > +#include <linux/dma-mapping.h>
> > +#include <linux/etherdevice.h>
> > +#include <linux/kernel.h>
> > +#include <linux/stddef.h>
> > +#include <linux/errno.h>
> > +
> > +#include <net/tcp.h>
> > +
> > +#include <linux/qed/qed_nvmetcp_ip_services_if.h>
> > +
> > +#define QED_IP_RESOL_TIMEOUT  4
> > +
> > +int qed_route_ipv4(struct sockaddr_storage *local_addr,
> > +                struct sockaddr_storage *remote_addr,
> > +                struct sockaddr *hardware_address,
> > +                struct net_device **ndev)
> > +{
> > +     struct neighbour *neigh = NULL;
> > +     __be32 *loc_ip, *rem_ip;
> > +     struct rtable *rt;
> > +     int rc = -ENXIO;
> > +     int retry;
> > +
> > +     loc_ip = &((struct sockaddr_in *)local_addr)->sin_addr.s_addr;
> > +     rem_ip = &((struct sockaddr_in *)remote_addr)->sin_addr.s_addr;
> > +     *ndev = NULL;
> > +     rt = ip_route_output(&init_net, *rem_ip, *loc_ip, 0/*tos*/, 0/*oif*/);
> > +     if (IS_ERR(rt)) {
> > +             pr_err("lookup route failed\n");
> > +             rc = PTR_ERR(rt);
> > +             goto return_err;
> > +     }
> > +
> > +     neigh = dst_neigh_lookup(&rt->dst, rem_ip);
> > +     if (!neigh) {
> > +             rc = -ENOMEM;
> > +             ip_rt_put(rt);
> > +             goto return_err;
> > +     }
> > +
> > +     *ndev = rt->dst.dev;
> > +     ip_rt_put(rt);
> > +
> > +     /* If not resolved, kick-off state machine towards resolution */
> > +     if (!(neigh->nud_state & NUD_VALID))
> > +             neigh_event_send(neigh, NULL);
> > +
> > +     /* query neighbor until resolved or timeout */
> > +     retry = QED_IP_RESOL_TIMEOUT;
> > +     while (!(neigh->nud_state & NUD_VALID) && retry > 0) {
> > +             msleep(1000);
> > +             retry--;
> > +     }
> > +
> > +     if (neigh->nud_state & NUD_VALID) {
> > +             /* copy resolved MAC address */
> > +             neigh_ha_snapshot(hardware_address->sa_data, neigh, *ndev);
> > +
> > +             hardware_address->sa_family = (*ndev)->type;
> > +             rc = 0;
> > +     }
> > +
> > +     neigh_release(neigh);
> > +     if (!(*loc_ip)) {
> > +             *loc_ip = inet_select_addr(*ndev, *rem_ip, RT_SCOPE_UNIVERSE);
> > +             local_addr->ss_family = AF_INET;
> > +     }
> > +
> > +return_err:
> > +
> > +     return rc;
> > +}
> > +EXPORT_SYMBOL(qed_route_ipv4);
> > +
> > +int qed_route_ipv6(struct sockaddr_storage *local_addr,
> > +                struct sockaddr_storage *remote_addr,
> > +                struct sockaddr *hardware_address,
> > +                struct net_device **ndev)
> > +{
> > +     struct neighbour *neigh = NULL;
> > +     struct dst_entry *dst;
> > +     struct flowi6 fl6;
> > +     int rc = -ENXIO;
> > +     int retry;
> > +
> > +     memset(&fl6, 0, sizeof(fl6));
> > +     fl6.saddr = ((struct sockaddr_in6 *)local_addr)->sin6_addr;
> > +     fl6.daddr = ((struct sockaddr_in6 *)remote_addr)->sin6_addr;
> > +
> > +     dst = ip6_route_output(&init_net, NULL, &fl6);
> > +     if (!dst || dst->error) {
> > +             if (dst) {
> > +                     pr_err("lookup route failed %d\n", dst->error);
> > +                     dst_release(dst);
> > +             }
> > +
> > +             goto out;
> > +     }
> > +
> > +     neigh = dst_neigh_lookup(dst, &fl6.daddr);
> > +     if (neigh) {
> > +             *ndev = ip6_dst_idev(dst)->dev;
> > +
> > +             /* If not resolved, kick-off state machine towards resolution */
> > +             if (!(neigh->nud_state & NUD_VALID))
> > +                     neigh_event_send(neigh, NULL);
> > +
> > +             /* query neighbor until resolved or timeout */
> > +             retry = QED_IP_RESOL_TIMEOUT;
> > +             while (!(neigh->nud_state & NUD_VALID) && retry > 0) {
> > +                     msleep(1000);
> > +                     retry--;
> > +             }
> > +
> > +             if (neigh->nud_state & NUD_VALID) {
> > +                     neigh_ha_snapshot((u8 *)hardware_address->sa_data, neigh, *ndev);
> > +
> > +                     hardware_address->sa_family = (*ndev)->type;
> > +                     rc = 0;
> > +             }
> > +
> > +             neigh_release(neigh);
> > +
> > +             if (ipv6_addr_any(&fl6.saddr)) {
> > +                     if (ipv6_dev_get_saddr(dev_net(*ndev), *ndev,
> > +                                            &fl6.daddr, 0, &fl6.saddr)) {
> > +                             pr_err("Unable to find source IP address\n");
> > +                             goto out;
> > +                     }
> > +
> > +                     local_addr->ss_family = AF_INET6;
> > +                     ((struct sockaddr_in6 *)local_addr)->sin6_addr =
> > +                                                             fl6.saddr;
> > +             }
> > +     }
> > +
> > +     dst_release(dst);
> > +
> > +out:
> > +
> > +     return rc;
> > +}
> > +EXPORT_SYMBOL(qed_route_ipv6);
> > +
> > +void qed_vlan_get_ndev(struct net_device **ndev, u16 *vlan_id)
> > +{
> > +     if (is_vlan_dev(*ndev)) {
> > +             *vlan_id = vlan_dev_vlan_id(*ndev);
> > +             *ndev = vlan_dev_real_dev(*ndev);
> > +     }
> > +}
> > +EXPORT_SYMBOL(qed_vlan_get_ndev);
> > +
> > +struct pci_dev *qed_validate_ndev(struct net_device *ndev)
> > +{
> > +     struct pci_dev *pdev = NULL;
> > +     struct net_device *upper;
> > +
> > +     for_each_pci_dev(pdev) {
> > +             if (pdev && pdev->driver &&
> > +                 !strcmp(pdev->driver->name, "qede")) {
> > +                     upper = pci_get_drvdata(pdev);
> > +                     if (upper->ifindex == ndev->ifindex)
> > +                             return pdev;
> > +             }
> > +     }
> > +
> > +     return NULL;
> > +}
> > +EXPORT_SYMBOL(qed_validate_ndev);
> > +
> > +__be16 qed_get_in_port(struct sockaddr_storage *sa)
> > +{
> > +     return sa->ss_family == AF_INET
> > +             ? ((struct sockaddr_in *)sa)->sin_port
> > +             : ((struct sockaddr_in6 *)sa)->sin6_port;
> > +}
> > +EXPORT_SYMBOL(qed_get_in_port);
> > +
> > +int qed_fetch_tcp_port(struct sockaddr_storage local_ip_addr,
> > +                    struct socket **sock, u16 *port)
> > +{
> > +     struct sockaddr_storage sa;
> > +     int rc = 0;
> > +
> > +     rc = sock_create(local_ip_addr.ss_family, SOCK_STREAM, IPPROTO_TCP, sock);
> > +     if (rc) {
> > +             pr_warn("failed to create socket: %d\n", rc);
> > +             goto err;
> > +     }
> > +
> > +     (*sock)->sk->sk_allocation = GFP_KERNEL;
> > +     sk_set_memalloc((*sock)->sk);
> > +
> > +     rc = kernel_bind(*sock, (struct sockaddr *)&local_ip_addr,
> > +                      sizeof(local_ip_addr));
> > +
> > +     if (rc) {
> > +             pr_warn("failed to bind socket: %d\n", rc);
> > +             goto err_sock;
> > +     }
> > +
> > +     rc = kernel_getsockname(*sock, (struct sockaddr *)&sa);
> > +     if (rc < 0) {
> > +             pr_warn("getsockname() failed: %d\n", rc);
> > +             goto err_sock;
> > +     }
> > +
> > +     *port = ntohs(qed_get_in_port(&sa));
> > +
> > +     return 0;
> > +
> > +err_sock:
> > +     sock_release(*sock);
> > +     *sock = NULL;
> > +err:
> > +
> > +     return rc;
> > +}
> > +EXPORT_SYMBOL(qed_fetch_tcp_port);
> > +
> > +void qed_return_tcp_port(struct socket *sock)
> > +{
> > +     if (sock && sock->sk) {
> > +             tcp_set_state(sock->sk, TCP_CLOSE);
> > +             sock_release(sock);
> > +     }
> > +}
> > +EXPORT_SYMBOL(qed_return_tcp_port);
> > diff --git a/include/linux/qed/qed_nvmetcp_ip_services_if.h b/include/linux/qed/qed_nvmetcp_ip_services_if.h
> > new file mode 100644
> > index 000000000000..3604aee53796
> > --- /dev/null
> > +++ b/include/linux/qed/qed_nvmetcp_ip_services_if.h
> > @@ -0,0 +1,29 @@
> > +/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */
> > +/*
> > + * Copyright 2021 Marvell. All rights reserved.
> > + */
> > +
> > +#ifndef _QED_IP_SERVICES_IF_H
> > +#define _QED_IP_SERVICES_IF_H
> > +
> > +#include <linux/types.h>
> > +#include <net/route.h>
> > +#include <net/ip6_route.h>
> > +#include <linux/inetdevice.h>
> > +
> > +int qed_route_ipv4(struct sockaddr_storage *local_addr,
> > +                struct sockaddr_storage *remote_addr,
> > +                struct sockaddr *hardware_address,
> > +                struct net_device **ndev);
> > +int qed_route_ipv6(struct sockaddr_storage *local_addr,
> > +                struct sockaddr_storage *remote_addr,
> > +                struct sockaddr *hardware_address,
> > +                struct net_device **ndev);
> > +void qed_vlan_get_ndev(struct net_device **ndev, u16 *vlan_id);
> > +struct pci_dev *qed_validate_ndev(struct net_device *ndev);
> > +void qed_return_tcp_port(struct socket *sock);
> > +int qed_fetch_tcp_port(struct sockaddr_storage local_ip_addr,
> > +                    struct socket **sock, u16 *port);
> > +__be16 qed_get_in_port(struct sockaddr_storage *sa);
> > +
> > +#endif /* _QED_IP_SERVICES_IF_H */
> >
> Reviewed-by: Hannes Reinecke <hare@suse.de>
>

Thanks.

> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke                Kernel Storage Architect
> hare@suse.de                              +49 911 74053 688
> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [RFC PATCH v4 08/27] nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP
  2021-05-01 12:18   ` Hannes Reinecke
@ 2021-05-03 15:46     ` Shai Malin
  0 siblings, 0 replies; 81+ messages in thread
From: Shai Malin @ 2021-05-03 15:46 UTC (permalink / raw)
  To: Hannes Reinecke, linux-nvme
  Cc: Shai Malin, netdev, davem, kuba, sagi, hch, axboe, kbusch,
	Ariel Elior, Michal Kalderon, okulkarni, pkushwaha,
	Dean Balandin

On 5/1/21 3:18 PM, Hannes Reinecke wrote:
> On 4/29/21 9:09 PM, Shai Malin wrote:
> > This patch will present the structure for the NVMeTCP offload common
> > layer driver. This module is added under "drivers/nvme/host/" and future
> > offload drivers which will register to it will be placed under
> > "drivers/nvme/hw".
> > This new driver will be enabled by the Kconfig "NVM Express over Fabrics
> > TCP offload common layer".
> > In order to support the new transport type, for host mode, no change is
> > needed.
> >
> > Each new vendor-specific offload driver will register to this ULP during
> > its probe function, by filling out the nvme_tcp_ofld_dev->ops and
> > nvme_tcp_ofld_dev->private_data and calling nvme_tcp_ofld_register_dev
> > with the initialized struct.
> >
> > The internal implementation:
> > - tcp-offload.h:
> >    Includes all common structs and ops to be used and shared by offload
> >    drivers.
> >
> > - tcp-offload.c:
> >    Includes the init function which registers as a NVMf transport just
> >    like any other transport.
> >
> > Acked-by: Igor Russkikh <irusskikh@marvell.com>
> > Signed-off-by: Dean Balandin <dbalandin@marvell.com>
> > Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> > Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> > Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> > Signed-off-by: Ariel Elior <aelior@marvell.com>
> > Signed-off-by: Shai Malin <smalin@marvell.com>
> > ---
> >   drivers/nvme/host/Kconfig       |  16 +++
> >   drivers/nvme/host/Makefile      |   3 +
> >   drivers/nvme/host/tcp-offload.c | 126 +++++++++++++++++++
> >   drivers/nvme/host/tcp-offload.h | 206 ++++++++++++++++++++++++++++++++
> >   4 files changed, 351 insertions(+)
> >   create mode 100644 drivers/nvme/host/tcp-offload.c
> >   create mode 100644 drivers/nvme/host/tcp-offload.h
> >
> It will be tricky to select the correct transport eg when traversing the
> discovery log page; the discovery log page only knows about 'tcp' (not
> 'tcp_offload'), so the offload won't be picked up.
> But that can be worked on / fixed later on, as it's arguably a policy
> decision.

I agree that the transport selection policy should be improved to allow
additional capabilities; this could be discussed as a new NVMe TPAR.

>
> Reviewed-by: Hannes Reinecke <hare@suse.de>

Thanks.

>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke                Kernel Storage Architect
> hare@suse.de                              +49 911 74053 688
> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [RFC PATCH v4 09/27] nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS definitions
  2021-05-01 12:19   ` Hannes Reinecke
@ 2021-05-03 15:50     ` Shai Malin
  0 siblings, 0 replies; 81+ messages in thread
From: Shai Malin @ 2021-05-03 15:50 UTC (permalink / raw)
  To: Hannes Reinecke, linux-nvme
  Cc: Shai Malin, netdev, davem, kuba, sagi, hch, axboe, kbusch,
	Ariel Elior, Michal Kalderon, okulkarni, pkushwaha,
	Arie Gershberg

On 5/1/21 3:19 PM, Hannes Reinecke wrote:
> On 4/29/21 9:09 PM, Shai Malin wrote:
> > From: Arie Gershberg <agershberg@marvell.com>
> >
> > Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS definitions
> > to header file, so it can be used by transport modules.
> >
> > Acked-by: Igor Russkikh <irusskikh@marvell.com>
> > Signed-off-by: Arie Gershberg <agershberg@marvell.com>
> > Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> > Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> > Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> > Signed-off-by: Ariel Elior <aelior@marvell.com>
> > Signed-off-by: Shai Malin <smalin@marvell.com>
> > ---
> >   drivers/nvme/host/fabrics.c | 7 -------
> >   drivers/nvme/host/fabrics.h | 7 +++++++
> >   2 files changed, 7 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
> > index 604ab0e5a2ad..55d7125c8483 100644
> > --- a/drivers/nvme/host/fabrics.c
> > +++ b/drivers/nvme/host/fabrics.c
> > @@ -1001,13 +1001,6 @@ void nvmf_free_options(struct nvmf_ctrl_options *opts)
> >   }
> >   EXPORT_SYMBOL_GPL(nvmf_free_options);
> >
> > -#define NVMF_REQUIRED_OPTS   (NVMF_OPT_TRANSPORT | NVMF_OPT_NQN)
> > -#define NVMF_ALLOWED_OPTS    (NVMF_OPT_QUEUE_SIZE | NVMF_OPT_NR_IO_QUEUES | \
> > -                              NVMF_OPT_KATO | NVMF_OPT_HOSTNQN | \
> > -                              NVMF_OPT_HOST_ID | NVMF_OPT_DUP_CONNECT |\
> > -                              NVMF_OPT_DISABLE_SQFLOW |\
> > -                              NVMF_OPT_FAIL_FAST_TMO)
> > -
> >   static struct nvme_ctrl *
> >   nvmf_create_ctrl(struct device *dev, const char *buf)
> >   {
> > diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
> > index 888b108d87a4..b7627e8dcaaf 100644
> > --- a/drivers/nvme/host/fabrics.h
> > +++ b/drivers/nvme/host/fabrics.h
> > @@ -68,6 +68,13 @@ enum {
> >       NVMF_OPT_FAIL_FAST_TMO  = 1 << 20,
> >   };
> >
> > +#define NVMF_REQUIRED_OPTS   (NVMF_OPT_TRANSPORT | NVMF_OPT_NQN)
> > +#define NVMF_ALLOWED_OPTS    (NVMF_OPT_QUEUE_SIZE | NVMF_OPT_NR_IO_QUEUES | \
> > +                              NVMF_OPT_KATO | NVMF_OPT_HOSTNQN | \
> > +                              NVMF_OPT_HOST_ID | NVMF_OPT_DUP_CONNECT |\
> > +                              NVMF_OPT_DISABLE_SQFLOW |\
> > +                              NVMF_OPT_FAIL_FAST_TMO)
> > +
> >   /**
> >    * struct nvmf_ctrl_options - Used to hold the options specified
> >    *                        with the parsing opts enum.
> >
>
> Why do you need them? None of the other transport drivers use them, why you?
>

Different HW devices that offload NVMeTCP might have different limitations
on the allowed options, for example a device that does not support all the
queue types.
With tcp and rdma, only the nvme-tcp and nvme-rdma layers handle those
attributes, and the HW devices do not impose any limitations on the allowed
options.

An alternative design would be to add separate fields to nvme_tcp_ofld_ops,
similar to the max_hw_sectors and max_segments fields that we already have
in this series.

Which would you prefer?
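
To make the first option concrete, here is a minimal userspace sketch, not
code from this series: each offload device advertises a mask of the options
it accepts, and the ULP rejects a request that carries anything outside the
required set plus that mask. The OPT_* bit values, struct ofld_dev, its
allowed_opts field and check_dev_opts() are hypothetical names used only for
illustration; the real NVMF_OPT_* definitions are the ones in fabrics.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical option bits, standing in for the NVMF_OPT_* enum. */
#define OPT_TRANSPORT		(1u << 0)
#define OPT_NQN			(1u << 1)
#define OPT_QUEUE_SIZE		(1u << 2)
#define OPT_NR_IO_QUEUES	(1u << 3)
#define OPT_NR_WRITE_QUEUES	(1u << 4)
#define OPT_NR_POLL_QUEUES	(1u << 5)

#define REQUIRED_OPTS		(OPT_TRANSPORT | OPT_NQN)

/* Hypothetical per-device descriptor; the vendor driver fills the mask. */
struct ofld_dev {
	uint32_t allowed_opts;
};

/*
 * Return true when the request carries every required option and nothing
 * outside REQUIRED_OPTS | dev->allowed_opts.
 */
static bool check_dev_opts(const struct ofld_dev *dev, uint32_t requested)
{
	uint32_t permitted;

	assert(dev);
	permitted = REQUIRED_OPTS | dev->allowed_opts;

	if ((requested & REQUIRED_OPTS) != REQUIRED_OPTS)
		return false;

	return (requested & ~permitted) == 0;
}
```

With this shape, a device that cannot serve poll queues simply leaves
OPT_NR_POLL_QUEUES out of its mask, and a request that asks for poll queues
fails the check before any queue is created.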

> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke                Kernel Storage Architect
> hare@suse.de                              +49 911 74053 688
> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [RFC PATCH v4 12/27] nvme-tcp-offload: Add controller level error recovery implementation
  2021-05-01 16:29   ` Hannes Reinecke
@ 2021-05-03 15:52     ` Shai Malin
  0 siblings, 0 replies; 81+ messages in thread
From: Shai Malin @ 2021-05-03 15:52 UTC (permalink / raw)
  To: Hannes Reinecke, linux-nvme
  Cc: Shai Malin, netdev, davem, kuba, sagi, hch, axboe, kbusch,
	Ariel Elior, Michal Kalderon, okulkarni, pkushwaha,
	Arie Gershberg

On 5/1/21 7:29 PM, Hannes Reinecke wrote:
> On 4/29/21 9:09 PM, Shai Malin wrote:
> > From: Arie Gershberg <agershberg@marvell.com>
> >
> > In this patch, we implement controller level error handling and recovery.
> > Upon an error discovered by the ULP or reset controller initiated by the
> > nvme-core (using reset_ctrl workqueue), the ULP will initiate a controller
> > recovery which includes teardown and re-connect of all queues.
> >
> > Acked-by: Igor Russkikh <irusskikh@marvell.com>
> > Signed-off-by: Arie Gershberg <agershberg@marvell.com>
> > Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> > Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> > Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> > Signed-off-by: Ariel Elior <aelior@marvell.com>
> > Signed-off-by: Shai Malin <smalin@marvell.com>
> > ---
> >   drivers/nvme/host/tcp-offload.c | 138 +++++++++++++++++++++++++++++++-
> >   drivers/nvme/host/tcp-offload.h |   1 +
> >   2 files changed, 137 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
> > index 59e1955e02ec..9082b11c133f 100644
> > --- a/drivers/nvme/host/tcp-offload.c
> > +++ b/drivers/nvme/host/tcp-offload.c
> > @@ -74,6 +74,23 @@ void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev)
> >   }
> >   EXPORT_SYMBOL_GPL(nvme_tcp_ofld_unregister_dev);
> >
> > +/**
> > + * nvme_tcp_ofld_error_recovery() - NVMeTCP Offload Library error recovery.
> > + * function.
> > + * @nctrl:   NVMe controller instance to change to resetting.
> > + *
> > + * API function that changes the controller state to resetting.
> > + * Part of the overall controller reset sequence.
> > + */
> > +void nvme_tcp_ofld_error_recovery(struct nvme_ctrl *nctrl)
> > +{
> > +     if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_RESETTING))
> > +             return;
> > +
> > +     queue_work(nvme_reset_wq, &to_tcp_ofld_ctrl(nctrl)->err_work);
> > +}
> > +EXPORT_SYMBOL_GPL(nvme_tcp_ofld_error_recovery);
> > +
> >   /**
> >    * nvme_tcp_ofld_report_queue_err() - NVMeTCP Offload report error event
> >    * callback function. Pointed to by nvme_tcp_ofld_queue->report_err.
> > @@ -84,7 +101,8 @@ EXPORT_SYMBOL_GPL(nvme_tcp_ofld_unregister_dev);
> >    */
> >   int nvme_tcp_ofld_report_queue_err(struct nvme_tcp_ofld_queue *queue)
> >   {
> > -     /* Placeholder - invoke error recovery flow */
> > +     pr_err("nvme-tcp-offload queue error\n");
> > +     nvme_tcp_ofld_error_recovery(&queue->ctrl->nctrl);
> >
> >       return 0;
> >   }
> > @@ -296,6 +314,28 @@ nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new)
> >       return rc;
> >   }
> >
> > +static void nvme_tcp_ofld_reconnect_or_remove(struct nvme_ctrl *nctrl)
> > +{
> > +     /* If we are resetting/deleting then do nothing */
> > +     if (nctrl->state != NVME_CTRL_CONNECTING) {
> > +             WARN_ON_ONCE(nctrl->state == NVME_CTRL_NEW ||
> > +                          nctrl->state == NVME_CTRL_LIVE);
> > +
> > +             return;
> > +     }
> > +
> > +     if (nvmf_should_reconnect(nctrl)) {
> > +             dev_info(nctrl->device, "Reconnecting in %d seconds...\n",
> > +                      nctrl->opts->reconnect_delay);
> > +             queue_delayed_work(nvme_wq,
> > +                                &to_tcp_ofld_ctrl(nctrl)->connect_work,
> > +                                nctrl->opts->reconnect_delay * HZ);
> > +     } else {
> > +             dev_info(nctrl->device, "Removing controller...\n");
> > +             nvme_delete_ctrl(nctrl);
> > +     }
> > +}
> > +
> >   static int nvme_tcp_ofld_setup_ctrl(struct nvme_ctrl *nctrl, bool new)
> >   {
> >       struct nvmf_ctrl_options *opts = nctrl->opts;
> > @@ -407,10 +447,68 @@ nvme_tcp_ofld_teardown_io_queues(struct nvme_ctrl *nctrl, bool remove)
> >       /* Placeholder - teardown_io_queues */
> >   }
> >
> > +static void nvme_tcp_ofld_reconnect_ctrl_work(struct work_struct *work)
> > +{
> > +     struct nvme_tcp_ofld_ctrl *ctrl =
> > +                             container_of(to_delayed_work(work),
> > +                                          struct nvme_tcp_ofld_ctrl,
> > +                                          connect_work);
> > +     struct nvme_ctrl *nctrl = &ctrl->nctrl;
> > +
> > +     ++nctrl->nr_reconnects;
> > +
> > +     if (ctrl->dev->ops->setup_ctrl(ctrl, false))
> > +             goto requeue;
> > +
> > +     if (nvme_tcp_ofld_setup_ctrl(nctrl, false))
> > +             goto release_and_requeue;
> > +
> > +     dev_info(nctrl->device, "Successfully reconnected (%d attempt)\n",
> > +              nctrl->nr_reconnects);
> > +
> > +     nctrl->nr_reconnects = 0;
> > +
> > +     return;
> > +
> > +release_and_requeue:
> > +     ctrl->dev->ops->release_ctrl(ctrl);
> > +requeue:
> > +     dev_info(nctrl->device, "Failed reconnect attempt %d\n",
> > +              nctrl->nr_reconnects);
> > +     nvme_tcp_ofld_reconnect_or_remove(nctrl);
> > +}
> > +
> > +static void nvme_tcp_ofld_error_recovery_work(struct work_struct *work)
> > +{
> > +     struct nvme_tcp_ofld_ctrl *ctrl =
> > +             container_of(work, struct nvme_tcp_ofld_ctrl, err_work);
> > +     struct nvme_ctrl *nctrl = &ctrl->nctrl;
> > +
> > +     nvme_stop_keep_alive(nctrl);
> > +     nvme_tcp_ofld_teardown_io_queues(nctrl, false);
> > +     /* unquiesce to fail fast pending requests */
> > +     nvme_start_queues(nctrl);
> > +     nvme_tcp_ofld_teardown_admin_queue(nctrl, false);
> > +     blk_mq_unquiesce_queue(nctrl->admin_q);
> > +
> > +     if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_CONNECTING)) {
> > +             /* state change failure is ok if we started nctrl delete */
> > +             WARN_ON_ONCE(nctrl->state != NVME_CTRL_DELETING &&
> > +                          nctrl->state != NVME_CTRL_DELETING_NOIO);
> > +
> > +             return;
> > +     }
> > +
> > +     nvme_tcp_ofld_reconnect_or_remove(nctrl);
> > +}
> > +
> >   static void
> >   nvme_tcp_ofld_teardown_ctrl(struct nvme_ctrl *nctrl, bool shutdown)
> >   {
> > -     /* Placeholder - err_work and connect_work */
> > +     struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
> > +
> > +     cancel_work_sync(&ctrl->err_work);
> > +     cancel_delayed_work_sync(&ctrl->connect_work);
> >       nvme_tcp_ofld_teardown_io_queues(nctrl, shutdown);
> >       blk_mq_quiesce_queue(nctrl->admin_q);
> >       if (shutdown)
> > @@ -425,6 +523,38 @@ static void nvme_tcp_ofld_delete_ctrl(struct nvme_ctrl *nctrl)
> >       nvme_tcp_ofld_teardown_ctrl(nctrl, true);
> >   }
> >
> > +static void nvme_tcp_ofld_reset_ctrl_work(struct work_struct *work)
> > +{
> > +     struct nvme_ctrl *nctrl =
> > +             container_of(work, struct nvme_ctrl, reset_work);
> > +     struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
> > +
> > +     nvme_stop_ctrl(nctrl);
> > +     nvme_tcp_ofld_teardown_ctrl(nctrl, false);
> > +
> > +     if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_CONNECTING)) {
> > +             /* state change failure is ok if we started ctrl delete */
> > +             WARN_ON_ONCE(nctrl->state != NVME_CTRL_DELETING &&
> > +                          nctrl->state != NVME_CTRL_DELETING_NOIO);
> > +
> > +             return;
> > +     }
> > +
> > +     if (ctrl->dev->ops->setup_ctrl(ctrl, false))
> > +             goto out_fail;
> > +
> > +     if (nvme_tcp_ofld_setup_ctrl(nctrl, false))
> > +             goto release_ctrl;
> > +
> > +     return;
> > +
> > +release_ctrl:
> > +     ctrl->dev->ops->release_ctrl(ctrl);
> > +out_fail:
> > +     ++nctrl->nr_reconnects;
> > +     nvme_tcp_ofld_reconnect_or_remove(nctrl);
> > +}
> > +
> >   static int
> >   nvme_tcp_ofld_init_request(struct blk_mq_tag_set *set,
> >                          struct request *rq,
> > @@ -521,6 +651,10 @@ nvme_tcp_ofld_create_ctrl(struct device *ndev, struct nvmf_ctrl_options *opts)
> >                            opts->nr_poll_queues + 1;
> >       nctrl->sqsize = opts->queue_size - 1;
> >       nctrl->kato = opts->kato;
> > +     INIT_DELAYED_WORK(&ctrl->connect_work,
> > +                       nvme_tcp_ofld_reconnect_ctrl_work);
> > +     INIT_WORK(&ctrl->err_work, nvme_tcp_ofld_error_recovery_work);
> > +     INIT_WORK(&nctrl->reset_work, nvme_tcp_ofld_reset_ctrl_work);
> >       if (!(opts->mask & NVMF_OPT_TRSVCID)) {
> >               opts->trsvcid =
> >                       kstrdup(__stringify(NVME_TCP_DISC_PORT), GFP_KERNEL);
> > diff --git a/drivers/nvme/host/tcp-offload.h b/drivers/nvme/host/tcp-offload.h
> > index 9fd270240eaa..b23b1d7ea6fa 100644
> > --- a/drivers/nvme/host/tcp-offload.h
> > +++ b/drivers/nvme/host/tcp-offload.h
> > @@ -204,3 +204,4 @@ struct nvme_tcp_ofld_ops {
> >   /* Exported functions for lower vendor specific offload drivers */
> >   int nvme_tcp_ofld_register_dev(struct nvme_tcp_ofld_dev *dev);
> >   void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev);
> > +void nvme_tcp_ofld_error_recovery(struct nvme_ctrl *nctrl);
> >
> Reviewed-by: Hannes Reinecke <hare@suse.de>

Thanks.


>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke                Kernel Storage Architect
> hare@suse.de                              +49 911 74053 688
> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [RFC PATCH v4 13/27] nvme-tcp-offload: Add queue level implementation
  2021-05-01 16:36   ` Hannes Reinecke
@ 2021-05-03 15:56     ` Shai Malin
  0 siblings, 0 replies; 81+ messages in thread
From: Shai Malin @ 2021-05-03 15:56 UTC (permalink / raw)
  To: Hannes Reinecke, linux-nvme
  Cc: Shai Malin, netdev, davem, kuba, sagi, hch, axboe, kbusch,
	Ariel Elior, Michal Kalderon, okulkarni, pkushwaha,
	Dean Balandin

On 5/1/21 7:36 PM, Hannes Reinecke wrote:
> On 4/29/21 9:09 PM, Shai Malin wrote:
> > From: Dean Balandin <dbalandin@marvell.com>
> >
> > In this patch we implement queue level functionality.
> > The implementation is similar to the nvme-tcp module, the main
> > difference being that we call the vendor specific create_queue op which
> > creates the TCP connection, and NVMeTCP connection including
> > icreq+icresp negotiation.
> > Once create_queue returns successfully, we can move on to the fabrics
> > connect.
> >
> > Acked-by: Igor Russkikh <irusskikh@marvell.com>
> > Signed-off-by: Dean Balandin <dbalandin@marvell.com>
> > Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> > Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> > Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> > Signed-off-by: Ariel Elior <aelior@marvell.com>
> > Signed-off-by: Shai Malin <smalin@marvell.com>
> > ---
> >   drivers/nvme/host/tcp-offload.c | 415 ++++++++++++++++++++++++++++++--
> >   drivers/nvme/host/tcp-offload.h |   2 +-
> >   2 files changed, 390 insertions(+), 27 deletions(-)
> >
> > diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c
> > index 9082b11c133f..8ddce2257100 100644
> > --- a/drivers/nvme/host/tcp-offload.c
> > +++ b/drivers/nvme/host/tcp-offload.c
> > @@ -22,6 +22,11 @@ static inline struct nvme_tcp_ofld_ctrl *to_tcp_ofld_ctrl(struct nvme_ctrl *nctr
> >       return container_of(nctrl, struct nvme_tcp_ofld_ctrl, nctrl);
> >   }
> >
> > +static inline int nvme_tcp_ofld_qid(struct nvme_tcp_ofld_queue *queue)
> > +{
> > +     return queue - queue->ctrl->queues;
> > +}
> > +
> >   /**
> >    * nvme_tcp_ofld_register_dev() - NVMeTCP Offload Library registration
> >    * function.
> > @@ -191,12 +196,94 @@ nvme_tcp_ofld_alloc_tagset(struct nvme_ctrl *nctrl, bool admin)
> >       return set;
> >   }
> >
> > +static void __nvme_tcp_ofld_stop_queue(struct nvme_tcp_ofld_queue *queue)
> > +{
> > +     queue->dev->ops->drain_queue(queue);
> > +     queue->dev->ops->destroy_queue(queue);
> > +}
> > +
> > +static void nvme_tcp_ofld_stop_queue(struct nvme_ctrl *nctrl, int qid)
> > +{
> > +     struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
> > +     struct nvme_tcp_ofld_queue *queue = &ctrl->queues[qid];
> > +
> > +     if (!test_and_clear_bit(NVME_TCP_OFLD_Q_LIVE, &queue->flags))
> > +             return;
> > +
> > +     __nvme_tcp_ofld_stop_queue(queue);
> > +}
> > +
> > +static void nvme_tcp_ofld_stop_io_queues(struct nvme_ctrl *ctrl)
> > +{
> > +     int i;
> > +
> > +     for (i = 1; i < ctrl->queue_count; i++)
> > +             nvme_tcp_ofld_stop_queue(ctrl, i);
> > +}
> > +
> > +static void nvme_tcp_ofld_free_queue(struct nvme_ctrl *nctrl, int qid)
> > +{
> > +     struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
> > +     struct nvme_tcp_ofld_queue *queue = &ctrl->queues[qid];
> > +
> > +     if (!test_and_clear_bit(NVME_TCP_OFLD_Q_ALLOCATED, &queue->flags))
> > +             return;
> > +
> > +     queue = &ctrl->queues[qid];
> > +     queue->ctrl = NULL;
> > +     queue->dev = NULL;
> > +     queue->report_err = NULL;
> > +}
> > +
> > +static void nvme_tcp_ofld_destroy_admin_queue(struct nvme_ctrl *nctrl, bool remove)
> > +{
> > +     nvme_tcp_ofld_stop_queue(nctrl, 0);
> > +     if (remove) {
> > +             blk_cleanup_queue(nctrl->admin_q);
> > +             blk_cleanup_queue(nctrl->fabrics_q);
> > +             blk_mq_free_tag_set(nctrl->admin_tagset);
> > +     }
> > +}
> > +
> > +static int nvme_tcp_ofld_start_queue(struct nvme_ctrl *nctrl, int qid)
> > +{
> > +     struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
> > +     struct nvme_tcp_ofld_queue *queue = &ctrl->queues[qid];
> > +     int rc;
> > +
> > +     queue = &ctrl->queues[qid];
> > +     if (qid) {
> > +             queue->cmnd_capsule_len = nctrl->ioccsz * 16;
> > +             rc = nvmf_connect_io_queue(nctrl, qid, false);
> > +     } else {
> > +             queue->cmnd_capsule_len = sizeof(struct nvme_command) + NVME_TCP_ADMIN_CCSZ;
> > +             rc = nvmf_connect_admin_queue(nctrl);
> > +     }
> > +
> > +     if (!rc) {
> > +             set_bit(NVME_TCP_OFLD_Q_LIVE, &queue->flags);
> > +     } else {
> > +             if (test_bit(NVME_TCP_OFLD_Q_ALLOCATED, &queue->flags))
> > +                     __nvme_tcp_ofld_stop_queue(queue);
> > +             dev_err(nctrl->device,
> > +                     "failed to connect queue: %d ret=%d\n", qid, rc);
> > +     }
> > +
> > +     return rc;
> > +}
> > +
> >   static int nvme_tcp_ofld_configure_admin_queue(struct nvme_ctrl *nctrl,
> >                                              bool new)
> >   {
> > +     struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
> > +     struct nvme_tcp_ofld_queue *queue = &ctrl->queues[0];
> >       int rc;
> >
> > -     /* Placeholder - alloc_admin_queue */
> > +     rc = ctrl->dev->ops->create_queue(queue, 0, NVME_AQ_DEPTH);
> > +     if (rc)
> > +             return rc;
> > +
> > +     set_bit(NVME_TCP_OFLD_Q_ALLOCATED, &queue->flags);
> >       if (new) {
> >               nctrl->admin_tagset =
> >                               nvme_tcp_ofld_alloc_tagset(nctrl, true);
> > @@ -221,7 +308,9 @@ static int nvme_tcp_ofld_configure_admin_queue(struct nvme_ctrl *nctrl,
> >               }
> >       }
> >
> > -     /* Placeholder - nvme_tcp_ofld_start_queue */
> > +     rc = nvme_tcp_ofld_start_queue(nctrl, 0);
> > +     if (rc)
> > +             goto out_cleanup_queue;
> >
> >       rc = nvme_enable_ctrl(nctrl);
> >       if (rc)
> > @@ -238,11 +327,12 @@ static int nvme_tcp_ofld_configure_admin_queue(struct nvme_ctrl *nctrl,
> >   out_quiesce_queue:
> >       blk_mq_quiesce_queue(nctrl->admin_q);
> >       blk_sync_queue(nctrl->admin_q);
> > -
> >   out_stop_queue:
> > -     /* Placeholder - stop offload queue */
> > +     nvme_tcp_ofld_stop_queue(nctrl, 0);
> >       nvme_cancel_admin_tagset(nctrl);
> > -
> > +out_cleanup_queue:
> > +     if (new)
> > +             blk_cleanup_queue(nctrl->admin_q);
> >   out_cleanup_fabrics_q:
> >       if (new)
> >               blk_cleanup_queue(nctrl->fabrics_q);
> > @@ -250,7 +340,127 @@ static int nvme_tcp_ofld_configure_admin_queue(struct nvme_ctrl *nctrl,
> >       if (new)
> >               blk_mq_free_tag_set(nctrl->admin_tagset);
> >   out_free_queue:
> > -     /* Placeholder - free admin queue */
> > +     nvme_tcp_ofld_free_queue(nctrl, 0);
> > +
> > +     return rc;
> > +}
> > +
> > +static unsigned int nvme_tcp_ofld_nr_io_queues(struct nvme_ctrl *nctrl)
> > +{
> > +     unsigned int nr_io_queues;
> > +
> > +     nr_io_queues = min(nctrl->opts->nr_io_queues, num_online_cpus());
> > +     nr_io_queues += min(nctrl->opts->nr_write_queues, num_online_cpus());
> > +     nr_io_queues += min(nctrl->opts->nr_poll_queues, num_online_cpus());
> > +
> > +     return nr_io_queues;
> > +}
> > +
>
> Really? Isn't this hardware-dependent?
> I would have expected the hardware to impose some limitations here (# of
> MSIx interrupts or something). Hmm?

Right! This will be added to nvme_tcp_ofld_ops.
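
As a rough sketch of that direction, the queue count could be clamped by a
per-device limit in addition to the CPU count. This is a userspace model and
not the planned patch: max_hw_io_queues is a hypothetical parameter standing
in for whatever limit ends up in nvme_tcp_ofld_ops, online_cpus replaces
num_online_cpus(), and struct queue_opts only mirrors the three counters used
by the quoted nvme_tcp_ofld_nr_io_queues().

```c
#include <assert.h>

/* Stand-in for the three queue counters in struct nvmf_ctrl_options. */
struct queue_opts {
	unsigned int nr_io_queues;
	unsigned int nr_write_queues;
	unsigned int nr_poll_queues;
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Same per-type clamping as the quoted function, followed by a
 * hypothetical hardware cap on the total.
 */
static unsigned int ofld_nr_io_queues(const struct queue_opts *opts,
				      unsigned int online_cpus,
				      unsigned int max_hw_io_queues)
{
	unsigned int nr_io_queues;

	assert(opts);

	nr_io_queues = min_uint(opts->nr_io_queues, online_cpus);
	nr_io_queues += min_uint(opts->nr_write_queues, online_cpus);
	nr_io_queues += min_uint(opts->nr_poll_queues, online_cpus);

	/* Assumed extra step: never ask for more than the device offers. */
	return min_uint(nr_io_queues, max_hw_io_queues);
}
```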

>
> > +static void
> > +nvme_tcp_ofld_set_io_queues(struct nvme_ctrl *nctrl, unsigned int nr_io_queues)
> > +{
> > +     struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
> > +     struct nvmf_ctrl_options *opts = nctrl->opts;
> > +
> > +     if (opts->nr_write_queues && opts->nr_io_queues < nr_io_queues) {
> > +             /*
> > +              * separate read/write queues
> > +              * hand out dedicated default queues only after we have
> > +              * sufficient read queues.
> > +              */
> > +             ctrl->io_queues[HCTX_TYPE_READ] = opts->nr_io_queues;
> > +             nr_io_queues -= ctrl->io_queues[HCTX_TYPE_READ];
> > +             ctrl->io_queues[HCTX_TYPE_DEFAULT] =
> > +                     min(opts->nr_write_queues, nr_io_queues);
> > +             nr_io_queues -= ctrl->io_queues[HCTX_TYPE_DEFAULT];
> > +     } else {
> > +             /*
> > +              * shared read/write queues
> > +              * either no write queues were requested, or we don't have
> > +              * sufficient queue count to have dedicated default queues.
> > +              */
> > +             ctrl->io_queues[HCTX_TYPE_DEFAULT] =
> > +                     min(opts->nr_io_queues, nr_io_queues);
> > +             nr_io_queues -= ctrl->io_queues[HCTX_TYPE_DEFAULT];
> > +     }
> > +
> > +     if (opts->nr_poll_queues && nr_io_queues) {
> > +             /* map dedicated poll queues only if we have queues left */
> > +             ctrl->io_queues[HCTX_TYPE_POLL] =
> > +                     min(opts->nr_poll_queues, nr_io_queues);
> > +     }
> > +}
> > +
>
> Same here.
> Poll queues only ever make sense if the hardware can serve specific
> queue pairs without interrupts. Which again relates to the number of
> interrupts, and the affinity of those.
> Or isn't this a concern with your card?

Right! This will be added to nvme_tcp_ofld_ops.
Our NVMeTCP offload HW supports 256 interrupt lines across the offload devices,
meaning 64-256 per offload device, depending on the number of ports.
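
To illustrate why a hardware limit matters for poll queues: with the split
used by the quoted nvme_tcp_ofld_set_io_queues(), poll queues only receive
what is left after the read and default queues are served. Below is a
userspace sketch of that split. The enum and struct names are stand-ins for
HCTX_TYPE_* and struct nvmf_ctrl_options, and "granted" is assumed to be the
queue count that remains after any device limit has been applied.

```c
#include <assert.h>

enum queue_type { TYPE_DEFAULT, TYPE_READ, TYPE_POLL, TYPE_MAX };

struct queue_opts {
	unsigned int nr_io_queues;
	unsigned int nr_write_queues;
	unsigned int nr_poll_queues;
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Distribute the granted queues over the three queue types. */
static void split_io_queues(const struct queue_opts *opts,
			    unsigned int granted,
			    unsigned int io_queues[TYPE_MAX])
{
	assert(opts && io_queues);

	io_queues[TYPE_DEFAULT] = 0;
	io_queues[TYPE_READ] = 0;
	io_queues[TYPE_POLL] = 0;

	if (opts->nr_write_queues && opts->nr_io_queues < granted) {
		/* Separate read/write queues: read queues are served first. */
		io_queues[TYPE_READ] = opts->nr_io_queues;
		granted -= io_queues[TYPE_READ];
		io_queues[TYPE_DEFAULT] =
			min_uint(opts->nr_write_queues, granted);
		granted -= io_queues[TYPE_DEFAULT];
	} else {
		/* Shared read/write queues. */
		io_queues[TYPE_DEFAULT] =
			min_uint(opts->nr_io_queues, granted);
		granted -= io_queues[TYPE_DEFAULT];
	}

	/* Poll queues are mapped only if something is left. */
	if (opts->nr_poll_queues && granted)
		io_queues[TYPE_POLL] =
			min_uint(opts->nr_poll_queues, granted);
}
```

For example, with nr_io_queues=4, nr_write_queues=2, nr_poll_queues=2 and
only 5 queues granted, the split is 4 read, 1 default and 0 poll queues.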

>
> > +static void
> > +nvme_tcp_ofld_terminate_io_queues(struct nvme_ctrl *nctrl, int start_from)
> > +{
> > +     int i;
> > +
> > +     /* admin-q will be ignored because of the loop condition */
> > +     for (i = start_from; i >= 1; i--)
> > +             nvme_tcp_ofld_stop_queue(nctrl, i);
> > +}
> > +
>
> Loop condition? Care to elaborate?

Similar code (with the same loop condition) exists in the other transports,
e.g. __nvme_tcp_alloc_io_queues(), which calls nvme_tcp_free_queues().
I will rephrase the comment to: "Loop condition will stop before index 0
which is the admin queue."
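
A small userspace sketch of that loop shape, for reference. The stopped[]
array and the return value are illustration only; the quoted function calls
nvme_tcp_ofld_stop_queue() for each index instead.

```c
#include <assert.h>

/*
 * Walk from start_from down to 1 and mark each visited queue.
 * Index 0 (the admin queue) is never reached because the loop
 * stops once i drops below 1. Returns the number of queues visited.
 */
static int terminate_io_queues(int stopped[], int start_from)
{
	int i, visited = 0;

	assert(stopped);

	for (i = start_from; i >= 1; i--) {
		stopped[i] = 1;
		visited++;
	}

	return visited;
}
```

Calling it with start_from = 0, which is what happens when the very first
I/O queue fails to be created, visits nothing.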

>
> > +static int nvme_tcp_ofld_create_io_queues(struct nvme_ctrl *nctrl)
> > +{
> > +     struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
> > +     int i, rc;
> > +
> > +     for (i = 1; i < nctrl->queue_count; i++) {
> > +             rc = ctrl->dev->ops->create_queue(&ctrl->queues[i],
> > +                                               i, nctrl->sqsize + 1);
> > +             if (rc)
> > +                     goto out_free_queues;
> > +
> > +             set_bit(NVME_TCP_OFLD_Q_ALLOCATED, &ctrl->queues[i].flags);
> > +     }
> > +
> > +     return 0;
> > +
> > +out_free_queues:
> > +     nvme_tcp_ofld_terminate_io_queues(nctrl, --i);
> > +
> > +     return rc;
> > +}
> > +
> > +static int nvme_tcp_ofld_alloc_io_queues(struct nvme_ctrl *nctrl)
> > +{
> > +     unsigned int nr_io_queues;
> > +     int rc;
> > +
> > +     nr_io_queues = nvme_tcp_ofld_nr_io_queues(nctrl);
> > +     rc = nvme_set_queue_count(nctrl, &nr_io_queues);
> > +     if (rc)
> > +             return rc;
> > +
> > +     nctrl->queue_count = nr_io_queues + 1;
> > +     if (nctrl->queue_count < 2) {
> > +             dev_err(nctrl->device,
> > +                     "unable to set any I/O queues\n");
> > +
> > +             return -ENOMEM;
> > +     }
> > +
> > +     dev_info(nctrl->device, "creating %d I/O queues.\n", nr_io_queues);
> > +     nvme_tcp_ofld_set_io_queues(nctrl, nr_io_queues);
> > +
> > +     return nvme_tcp_ofld_create_io_queues(nctrl);
> > +}
> > +
> > +static int nvme_tcp_ofld_start_io_queues(struct nvme_ctrl *nctrl)
> > +{
> > +     int i, rc = 0;
> > +
> > +     for (i = 1; i < nctrl->queue_count; i++) {
> > +             rc = nvme_tcp_ofld_start_queue(nctrl, i);
> > +             if (rc)
> > +                     goto terminate_queues;
> > +     }
> > +
> > +     return 0;
> > +
> > +terminate_queues:
> > +     nvme_tcp_ofld_terminate_io_queues(nctrl, --i);
> >
> >       return rc;
> >   }
> > @@ -258,9 +468,10 @@ static int nvme_tcp_ofld_configure_admin_queue(struct nvme_ctrl *nctrl,
> >   static int
> >   nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new)
> >   {
> > -     int rc;
> > +     int rc = nvme_tcp_ofld_alloc_io_queues(nctrl);
> >
> > -     /* Placeholder - alloc_io_queues */
> > +     if (rc)
> > +             return rc;
> >
> >       if (new) {
> >               nctrl->tagset = nvme_tcp_ofld_alloc_tagset(nctrl, false);
> > @@ -278,7 +489,9 @@ nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new)
> >               }
> >       }
> >
> > -     /* Placeholder - start_io_queues */
> > +     rc = nvme_tcp_ofld_start_io_queues(nctrl);
> > +     if (rc)
> > +             goto out_cleanup_connect_q;
> >
> >       if (!new) {
> >               nvme_start_queues(nctrl);
> > @@ -300,16 +513,16 @@ nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new)
> >   out_wait_freeze_timed_out:
> >       nvme_stop_queues(nctrl);
> >       nvme_sync_io_queues(nctrl);
> > -
> > -     /* Placeholder - Stop IO queues */
> > -
> > +     nvme_tcp_ofld_stop_io_queues(nctrl);
> > +out_cleanup_connect_q:
> > +     nvme_cancel_tagset(nctrl);
> >       if (new)
> >               blk_cleanup_queue(nctrl->connect_q);
> >   out_free_tag_set:
> >       if (new)
> >               blk_mq_free_tag_set(nctrl->tagset);
> >   out_free_io_queues:
> > -     /* Placeholder - free_io_queues */
> > +     nvme_tcp_ofld_terminate_io_queues(nctrl, nctrl->queue_count);
> >
> >       return rc;
> >   }
> > @@ -336,6 +549,26 @@ static void nvme_tcp_ofld_reconnect_or_remove(struct nvme_ctrl *nctrl)
> >       }
> >   }
> >
> > +static int
> > +nvme_tcp_ofld_init_admin_hctx(struct blk_mq_hw_ctx *hctx, void *data,
> > +                           unsigned int hctx_idx)
> > +{
> > +     struct nvme_tcp_ofld_ctrl *ctrl = data;
> > +
> > +     hctx->driver_data = &ctrl->queues[0];
> > +
> > +     return 0;
> > +}
> > +
> > +static void nvme_tcp_ofld_destroy_io_queues(struct nvme_ctrl *nctrl, bool remove)
> > +{
> > +     nvme_tcp_ofld_stop_io_queues(nctrl);
> > +     if (remove) {
> > +             blk_cleanup_queue(nctrl->connect_q);
> > +             blk_mq_free_tag_set(nctrl->tagset);
> > +     }
> > +}
> > +
> >   static int nvme_tcp_ofld_setup_ctrl(struct nvme_ctrl *nctrl, bool new)
> >   {
> >       struct nvmf_ctrl_options *opts = nctrl->opts;
> > @@ -387,9 +620,19 @@ static int nvme_tcp_ofld_setup_ctrl(struct nvme_ctrl *nctrl, bool new)
> >       return 0;
> >
> >   destroy_io:
> > -     /* Placeholder - stop and destroy io queues*/
> > +     if (nctrl->queue_count > 1) {
> > +             nvme_stop_queues(nctrl);
> > +             nvme_sync_io_queues(nctrl);
> > +             nvme_tcp_ofld_stop_io_queues(nctrl);
> > +             nvme_cancel_tagset(nctrl);
> > +             nvme_tcp_ofld_destroy_io_queues(nctrl, new);
> > +     }
> >   destroy_admin:
> > -     /* Placeholder - stop and destroy admin queue*/
> > +     blk_mq_quiesce_queue(nctrl->admin_q);
> > +     blk_sync_queue(nctrl->admin_q);
> > +     nvme_tcp_ofld_stop_queue(nctrl, 0);
> > +     nvme_cancel_admin_tagset(nctrl);
> > +     nvme_tcp_ofld_destroy_admin_queue(nctrl, new);
> >
> >       return rc;
> >   }
> > @@ -410,6 +653,18 @@ nvme_tcp_ofld_check_dev_opts(struct nvmf_ctrl_options *opts,
> >       return 0;
> >   }
> >
> > +static void nvme_tcp_ofld_free_ctrl_queues(struct nvme_ctrl *nctrl)
> > +{
> > +     struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
> > +     int i;
> > +
> > +     for (i = 0; i < nctrl->queue_count; ++i)
> > +             nvme_tcp_ofld_free_queue(nctrl, i);
> > +
> > +     kfree(ctrl->queues);
> > +     ctrl->queues = NULL;
> > +}
> > +
> >   static void nvme_tcp_ofld_free_ctrl(struct nvme_ctrl *nctrl)
> >   {
> >       struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
> > @@ -419,6 +674,7 @@ static void nvme_tcp_ofld_free_ctrl(struct nvme_ctrl *nctrl)
> >               goto free_ctrl;
> >
> >       down_write(&nvme_tcp_ofld_ctrl_rwsem);
> > +     nvme_tcp_ofld_free_ctrl_queues(nctrl);
> >       ctrl->dev->ops->release_ctrl(ctrl);
> >       list_del(&ctrl->list);
> >       up_write(&nvme_tcp_ofld_ctrl_rwsem);
> > @@ -436,15 +692,37 @@ static void nvme_tcp_ofld_submit_async_event(struct nvme_ctrl *arg)
> >   }
> >
> >   static void
> > -nvme_tcp_ofld_teardown_admin_queue(struct nvme_ctrl *ctrl, bool remove)
> > +nvme_tcp_ofld_teardown_admin_queue(struct nvme_ctrl *nctrl, bool remove)
> >   {
> > -     /* Placeholder - teardown_admin_queue */
> > +     blk_mq_quiesce_queue(nctrl->admin_q);
> > +     blk_sync_queue(nctrl->admin_q);
> > +
> > +     nvme_tcp_ofld_stop_queue(nctrl, 0);
> > +     nvme_cancel_admin_tagset(nctrl);
> > +
> > +     if (remove)
> > +             blk_mq_unquiesce_queue(nctrl->admin_q);
> > +
> > +     nvme_tcp_ofld_destroy_admin_queue(nctrl, remove);
> >   }
> >
> >   static void
> >   nvme_tcp_ofld_teardown_io_queues(struct nvme_ctrl *nctrl, bool remove)
> >   {
> > -     /* Placeholder - teardown_io_queues */
> > +     if (nctrl->queue_count <= 1)
> > +             return;
> > +
> > +     blk_mq_quiesce_queue(nctrl->admin_q);
> > +     nvme_start_freeze(nctrl);
> > +     nvme_stop_queues(nctrl);
> > +     nvme_sync_io_queues(nctrl);
> > +     nvme_tcp_ofld_stop_io_queues(nctrl);
> > +     nvme_cancel_tagset(nctrl);
> > +
> > +     if (remove)
> > +             nvme_start_queues(nctrl);
> > +
> > +     nvme_tcp_ofld_destroy_io_queues(nctrl, remove);
> >   }
> >
> >   static void nvme_tcp_ofld_reconnect_ctrl_work(struct work_struct *work)
> > @@ -572,6 +850,17 @@ nvme_tcp_ofld_init_request(struct blk_mq_tag_set *set,
> >       return 0;
> >   }
> >
> > +inline size_t nvme_tcp_ofld_inline_data_size(struct nvme_tcp_ofld_queue *queue)
> > +{
> > +     return queue->cmnd_capsule_len - sizeof(struct nvme_command);
> > +}
> > +EXPORT_SYMBOL_GPL(nvme_tcp_ofld_inline_data_size);
> > +
> > +static void nvme_tcp_ofld_commit_rqs(struct blk_mq_hw_ctx *hctx)
> > +{
> > +     /* Call ops->commit_rqs */
> > +}
> > +
> >   static blk_status_t
> >   nvme_tcp_ofld_queue_rq(struct blk_mq_hw_ctx *hctx,
> >                      const struct blk_mq_queue_data *bd)
> > @@ -583,22 +872,96 @@ nvme_tcp_ofld_queue_rq(struct blk_mq_hw_ctx *hctx,
> >       return BLK_STS_OK;
> >   }
> >
> > +static void
> > +nvme_tcp_ofld_exit_request(struct blk_mq_tag_set *set,
> > +                        struct request *rq, unsigned int hctx_idx)
> > +{
> > +     /*
> > +      * Nothing is allocated in nvme_tcp_ofld_init_request,
> > +      * hence empty.
> > +      */
> > +}
> > +
> > +static int
> > +nvme_tcp_ofld_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
> > +                     unsigned int hctx_idx)
> > +{
> > +     struct nvme_tcp_ofld_ctrl *ctrl = data;
> > +
> > +     hctx->driver_data = &ctrl->queues[hctx_idx + 1];
> > +
> > +     return 0;
> > +}
> > +
> > +static int nvme_tcp_ofld_map_queues(struct blk_mq_tag_set *set)
> > +{
> > +     struct nvme_tcp_ofld_ctrl *ctrl = set->driver_data;
> > +     struct nvmf_ctrl_options *opts = ctrl->nctrl.opts;
> > +
> > +     if (opts->nr_write_queues && ctrl->io_queues[HCTX_TYPE_READ]) {
> > +             /* separate read/write queues */
> > +             set->map[HCTX_TYPE_DEFAULT].nr_queues =
> > +                     ctrl->io_queues[HCTX_TYPE_DEFAULT];
> > +             set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
> > +             set->map[HCTX_TYPE_READ].nr_queues =
> > +                     ctrl->io_queues[HCTX_TYPE_READ];
> > +             set->map[HCTX_TYPE_READ].queue_offset =
> > +                     ctrl->io_queues[HCTX_TYPE_DEFAULT];
> > +     } else {
> > +             /* shared read/write queues */
> > +             set->map[HCTX_TYPE_DEFAULT].nr_queues =
> > +                     ctrl->io_queues[HCTX_TYPE_DEFAULT];
> > +             set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
> > +             set->map[HCTX_TYPE_READ].nr_queues =
> > +                     ctrl->io_queues[HCTX_TYPE_DEFAULT];
> > +             set->map[HCTX_TYPE_READ].queue_offset = 0;
> > +     }
> > +     blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
> > +     blk_mq_map_queues(&set->map[HCTX_TYPE_READ]);
> > +
> > +     if (opts->nr_poll_queues && ctrl->io_queues[HCTX_TYPE_POLL]) {
> > +             /* map dedicated poll queues only if we have queues left */
> > +             set->map[HCTX_TYPE_POLL].nr_queues =
> > +                             ctrl->io_queues[HCTX_TYPE_POLL];
> > +             set->map[HCTX_TYPE_POLL].queue_offset =
> > +                     ctrl->io_queues[HCTX_TYPE_DEFAULT] +
> > +                     ctrl->io_queues[HCTX_TYPE_READ];
> > +             blk_mq_map_queues(&set->map[HCTX_TYPE_POLL]);
> > +     }
> > +
> > +     dev_info(ctrl->nctrl.device,
> > +              "mapped %d/%d/%d default/read/poll queues.\n",
> > +              ctrl->io_queues[HCTX_TYPE_DEFAULT],
> > +              ctrl->io_queues[HCTX_TYPE_READ],
> > +              ctrl->io_queues[HCTX_TYPE_POLL]);
> > +
> > +     return 0;
> > +}
> > +
> > +static int nvme_tcp_ofld_poll(struct blk_mq_hw_ctx *hctx)
> > +{
> > +     /* Placeholder - Implement polling mechanism */
> > +
> > +     return 0;
> > +}
> > +
> >   static struct blk_mq_ops nvme_tcp_ofld_mq_ops = {
> >       .queue_rq       = nvme_tcp_ofld_queue_rq,
> > +     .commit_rqs     = nvme_tcp_ofld_commit_rqs,
> > +     .complete       = nvme_complete_rq,
> >       .init_request   = nvme_tcp_ofld_init_request,
> > -     /*
> > -      * All additional ops will be also implemented and registered similar to
> > -      * tcp.c
> > -      */
> > +     .exit_request   = nvme_tcp_ofld_exit_request,
> > +     .init_hctx      = nvme_tcp_ofld_init_hctx,
> > +     .map_queues     = nvme_tcp_ofld_map_queues,
> > +     .poll           = nvme_tcp_ofld_poll,
> >   };
> >
> >   static struct blk_mq_ops nvme_tcp_ofld_admin_mq_ops = {
> >       .queue_rq       = nvme_tcp_ofld_queue_rq,
> > +     .complete       = nvme_complete_rq,
> >       .init_request   = nvme_tcp_ofld_init_request,
> > -     /*
> > -      * All additional ops will be also implemented and registered similar to
> > -      * tcp.c
> > -      */
> > +     .exit_request   = nvme_tcp_ofld_exit_request,
> > +     .init_hctx      = nvme_tcp_ofld_init_admin_hctx,
> >   };
> >
> >   static const struct nvme_ctrl_ops nvme_tcp_ofld_ctrl_ops = {
> > diff --git a/drivers/nvme/host/tcp-offload.h b/drivers/nvme/host/tcp-offload.h
> > index b23b1d7ea6fa..d82645fcf9da 100644
> > --- a/drivers/nvme/host/tcp-offload.h
> > +++ b/drivers/nvme/host/tcp-offload.h
> > @@ -105,7 +105,6 @@ struct nvme_tcp_ofld_ctrl {
> >        * Each entry in the array indicates the number of queues of
> >        * corresponding type.
> >        */
> > -     u32 queue_type_mapping[HCTX_MAX_TYPES];
> >       u32 io_queues[HCTX_MAX_TYPES];
> >
> >       /* Connectivity params */
> > @@ -205,3 +204,4 @@ struct nvme_tcp_ofld_ops {
> >   int nvme_tcp_ofld_register_dev(struct nvme_tcp_ofld_dev *dev);
> >   void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev);
> >   void nvme_tcp_ofld_error_recovery(struct nvme_ctrl *nctrl);
> > +inline size_t nvme_tcp_ofld_inline_data_size(struct nvme_tcp_ofld_queue *queue);
> >
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke                Kernel Storage Architect
> hare@suse.de                              +49 911 74053 688
> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
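
For readers following the quoted nvme_tcp_ofld_map_queues() hunk, here is a minimal user-space sketch of just the offset arithmetic it performs. It assumes nothing beyond what the hunk shows: the default map starts at offset 0, a dedicated read map starts after the default queues, and a poll map starts after the default and read queues. The sketch_* names and the plain struct are hypothetical stand-ins for the kernel's HCTX_TYPE_* values and struct blk_mq_queue_map; the blk_mq_map_queues() CPU mapping is left out. Unlike the patch, the sketch zeroes the poll entry when no poll queues are configured, so that the result is fully defined.

```c
/* Hypothetical stand-ins for the kernel's HCTX_TYPE_* enumerators. */
enum sketch_hctx_type {
	SKETCH_TYPE_DEFAULT,
	SKETCH_TYPE_READ,
	SKETCH_TYPE_POLL,
	SKETCH_MAX_TYPES
};

/* Hypothetical reduction of struct blk_mq_queue_map to the two fields used. */
struct sketch_queue_map {
	unsigned int nr_queues;
	unsigned int queue_offset;
};

/*
 * Fill map[] from the per-type queue counts, following the branches of the
 * quoted hunk. io_queues[] plays the role of ctrl->io_queues[].
 */
static void sketch_map_queues(const unsigned int io_queues[SKETCH_MAX_TYPES],
			      unsigned int nr_write_queues,
			      unsigned int nr_poll_queues,
			      struct sketch_queue_map map[SKETCH_MAX_TYPES])
{
	/* The default map always starts at offset 0. */
	map[SKETCH_TYPE_DEFAULT].nr_queues = io_queues[SKETCH_TYPE_DEFAULT];
	map[SKETCH_TYPE_DEFAULT].queue_offset = 0;

	if (nr_write_queues && io_queues[SKETCH_TYPE_READ]) {
		/* Separate read/write queues: read follows default. */
		map[SKETCH_TYPE_READ].nr_queues = io_queues[SKETCH_TYPE_READ];
		map[SKETCH_TYPE_READ].queue_offset =
			io_queues[SKETCH_TYPE_DEFAULT];
	} else {
		/* Shared read/write queues: read aliases the default range. */
		map[SKETCH_TYPE_READ].nr_queues = io_queues[SKETCH_TYPE_DEFAULT];
		map[SKETCH_TYPE_READ].queue_offset = 0;
	}

	/* Assumption of this sketch only: clear the poll entry by default. */
	map[SKETCH_TYPE_POLL].nr_queues = 0;
	map[SKETCH_TYPE_POLL].queue_offset = 0;

	if (nr_poll_queues && io_queues[SKETCH_TYPE_POLL]) {
		/* Dedicated poll queues follow the default and read queues. */
		map[SKETCH_TYPE_POLL].nr_queues = io_queues[SKETCH_TYPE_POLL];
		map[SKETCH_TYPE_POLL].queue_offset =
			io_queues[SKETCH_TYPE_DEFAULT] +
			io_queues[SKETCH_TYPE_READ];
	}
}
```

With 4 default, 2 read and 1 poll queues (and nr_write_queues and nr_poll_queues both non-zero), the read map starts at offset 4 and the poll map at offset 6. Adding 1 for the admin queue, as nvme_tcp_ofld_init_hctx() does with hctx_idx + 1, gives the index into ctrl->queues[].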


* Re: [RFC PATCH v4 05/27] qed: Add NVMeTCP Offload IO Level FW and HW HSI
  2021-05-02 11:22   ` Hannes Reinecke
@ 2021-05-04 16:25     ` Shai Malin
  0 siblings, 0 replies; 81+ messages in thread
From: Shai Malin @ 2021-05-04 16:25 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Shai Malin, netdev, davem, kuba, linux-nvme, sagi, hch, axboe,
	kbusch, Ariel Elior, Michal Kalderon, okulkarni, pkushwaha

On 5/2/21 2:22 PM, Hannes Reinecke wrote:
> On 4/29/21 9:09 PM, Shai Malin wrote:
> > This patch introduces the NVMeTCP Offload FW and HW HSI in order
> > to initialize the IO level configuration into a per IO HW
> > resource ("task") as part of the IO path flow.
> >
> > Acked-by: Igor Russkikh <irusskikh@marvell.com>
> > Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> > Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> > Signed-off-by: Shai Malin <smalin@marvell.com>
> > Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> > Signed-off-by: Ariel Elior <aelior@marvell.com>
> > ---
> >   include/linux/qed/nvmetcp_common.h | 418 ++++++++++++++++++++++++++++-
> >   include/linux/qed/qed_nvmetcp_if.h |  37 +++
> >   2 files changed, 454 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/qed/nvmetcp_common.h b/include/linux/qed/nvmetcp_common.h
> > index c8836b71b866..dda7a785c321 100644
> > --- a/include/linux/qed/nvmetcp_common.h
> > +++ b/include/linux/qed/nvmetcp_common.h
> > @@ -7,6 +7,7 @@
> >   #include "tcp_common.h"
> >
> >   #define NVMETCP_SLOW_PATH_LAYER_CODE (6)
> > +#define NVMETCP_WQE_NUM_SGES_SLOWIO (0xf)
> >
> >   /* NVMeTCP firmware function init parameters */
> >   struct nvmetcp_spe_func_init {
> > @@ -194,4 +195,419 @@ struct nvmetcp_wqe {
> >   #define NVMETCP_WQE_CDB_SIZE_OR_NVMETCP_CMD_SHIFT 24
> >   };
> >
> > -#endif /* __NVMETCP_COMMON__ */
> > +struct nvmetcp_host_cccid_itid_entry {
> > +     __le16 itid;
> > +};
> > +
> > +struct nvmetcp_connect_done_results {
> > +     __le16 icid;
> > +     __le16 conn_id;
> > +     struct tcp_ulp_connect_done_params params;
> > +};
> > +
> > +struct nvmetcp_eqe_data {
> > +     __le16 icid;
> > +     __le16 conn_id;
> > +     __le16 reserved;
> > +     u8 error_code;
> > +     u8 error_pdu_opcode_reserved;
> > +#define NVMETCP_EQE_DATA_ERROR_PDU_OPCODE_MASK 0x3F
> > +#define NVMETCP_EQE_DATA_ERROR_PDU_OPCODE_SHIFT  0
> > +#define NVMETCP_EQE_DATA_ERROR_PDU_OPCODE_VALID_MASK  0x1
> > +#define NVMETCP_EQE_DATA_ERROR_PDU_OPCODE_VALID_SHIFT  6
> > +#define NVMETCP_EQE_DATA_RESERVED0_MASK 0x1
> > +#define NVMETCP_EQE_DATA_RESERVED0_SHIFT 7
> > +};
> > +
> > +enum nvmetcp_task_type {
> > +     NVMETCP