From: Shai Malin <malin1024@gmail.com>
To: Hannes Reinecke <hare@suse.de>
Cc: Shai Malin <smalin@marvell.com>,
netdev@vger.kernel.org, linux-nvme@lists.infradead.org,
davem@davemloft.net, kuba@kernel.org, sagi@grimberg.me,
hch@lst.de, axboe@fb.com, kbusch@kernel.org,
Ariel Elior <aelior@marvell.com>,
Michal Kalderon <mkalderon@marvell.com>,
okulkarni@marvell.com, pkushwaha@marvell.com
Subject: Re: [RFC PATCH v4 00/27] NVMeTCP Offload ULP and QEDN Device Driver
Date: Mon, 3 May 2021 18:13:54 +0300
Message-ID: <CAKKgK4wp4ps2Qk6_aFhw3VZYU7oEPf-y5iOZQ8g7ELbhFFajBg@mail.gmail.com>
In-Reply-To: <abf99c45-c896-b135-56c5-601e7e6ffa61@suse.de>
On 5/1/21 7:47 PM, Hannes Reinecke wrote:
> On 4/29/21 9:08 PM, Shai Malin wrote:
> > With the goal of enabling a generic infrastructure that allows NVMe/TCP
> > offload devices like NICs to seamlessly plug into the NVMe-oF stack, this
> > patch series introduces the nvme-tcp-offload ULP host layer: a new
> > transport type called "tcp-offload" that serves as an abstraction layer
> > for vendor-specific nvme-tcp offload drivers.
> >
> > NVMeTCP offload is a full offload of the NVMeTCP protocol; it covers
> > both the TCP level and the NVMeTCP level.
> >
> > The nvme-tcp-offload transport can co-exist with the existing tcp and
> > other transports. The tcp offload was designed so that stack changes are
> > kept to a bare minimum: only the registration of the new transport, as
> > sketched below.
> > All other APIs, ops etc. are identical to the regular tcp transport.
> > Representing the TCP offload as a new transport allows a clear and
> > manageable distinction between connections that should use the offload
> > path and those that are not offloaded (even on the same device).
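> >
> > As an illustration, a minimal sketch of that registration, assuming
> > illustrative option flags (the final required/allowed opts may differ):
> >
> >     static struct nvmf_transport_ops nvme_tcp_ofld_transport = {
> >             .name          = "tcp_offload",
> >             .module        = THIS_MODULE,
> >             .required_opts = NVMF_OPT_TRADDR,
> >             .allowed_opts  = NVMF_OPT_TRSVCID | NVMF_OPT_NR_IO_QUEUES,
> >             /* create_ctrl() is the single entry point from the stack. */
> >             .create_ctrl   = nvme_tcp_ofld_create_ctrl,
> >     };
> >
> >     static int __init nvme_tcp_ofld_init_module(void)
> >     {
> >             return nvmf_register_transport(&nvme_tcp_ofld_transport);
> >     }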
> >
> >
> > The nvme-tcp-offload layers and API compared to nvme-tcp and nvme-rdma:
> >
> > * NVMe layer: *
> >
> >        [ nvme/nvme-fabrics/blk-mq ]
> >                     |
> >          (nvme API and blk-mq API)
> >                     |
> >                     |
> > * Vendor agnostic transport layer: *
> >
> >   [ nvme-rdma ]   [ nvme-tcp ]   [ nvme-tcp-offload ]
> >        |               |                  |
> >     (Verbs)            |                  |
> >        |               |                  |
> >        |           (Socket)               |
> >        |               |                  |
> >        |               |      (nvme-tcp-offload API)
> >        |               |                  |
> > * Vendor Specific Driver: *
> >
> >        |               |                  |
> >     [ qedr ]           |                  |
> >                    [ qede ]               |
> >                                       [ qedn ]
> >
> >
> > Performance:
> > ============
> > With this implementation on top of the Marvell qedn driver (using the
> > Marvell FastLinQ NIC), we were able to demonstrate the following CPU
> > utilization improvement:
> >
> > On AMD EPYC 7402, 2.80GHz, 28 cores:
> > - For 16K queued read IOs, 16 jobs, 4 qd (50Gbps line rate):
> >   Improved the CPU utilization from 15.1% with NVMeTCP SW to 4.7% with
> >   NVMeTCP offload.
> >
> > On Intel(R) Xeon(R) Gold 5122 CPU, 3.60GHz, 16 cores:
> > - For 512K queued read IOs, 16 jobs, 4 qd (25Gbps line rate):
> >   Improved the CPU utilization from 16.3% with NVMeTCP SW to 1.1% with
> >   NVMeTCP offload.
> >
> > In addition, we were able to demonstrate the following latency improvement:
> > - For 200K read IOPS (16 jobs, 16 qd, with fio rate limiter):
> > Improved the average latency from 105 usec with NVMeTCP SW to 39 usec
> > with NVMeTCP offload.
> >
> > Improved the 99.99% tail latency from 570 usec with NVMeTCP SW to
> > 91 usec with NVMeTCP offload.
> >
> > The end-to-end offload latency was measured from fio while running
> > against a null-device back end.
> >
> >
> > Upstream plan:
> > ==============
> > Following this RFC, the series will be sent in a modular way so that
> > changes in each part will not impact the previous parts.
> >
> > - Part 1 (Patches 1-7):
> >   The qed infrastructure will be sent to 'netdev@vger.kernel.org'.
> >
> > - Part 2 (Patches 8-15):
> >   The nvme-tcp-offload patches will be sent to
> >   'linux-nvme@lists.infradead.org'.
> >
> > - Part 3 (Patches 16-27):
> >   The qedn patches will be sent to 'linux-nvme@lists.infradead.org'.
> >
> >
> > Queue Initialization Design:
> > ============================
> > The nvme-tcp-offload ULP module shall register with the existing
> > nvmf_transport_ops (.name = "tcp_offload"), nvme_ctrl_ops and blk_mq_ops.
> > The nvme-tcp-offload vendor driver shall register with the
> > nvme-tcp-offload ULP with the following ops:
> > - claim_dev() - in order to resolve the route to the target according to
> >   the paired net_dev.
> > - create_queue() - in order to create an offloaded nvme-tcp queue.
> >
> > The nvme-tcp-offload ULP module shall manage all the controller-level
> > functionality: it shall call claim_dev() and, based on the return values,
> > call the relevant vendor driver's create_queue() in order to create the
> > admin queue and the IO queues (a sketch of these ops follows below).
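> >
> > A minimal sketch of these ops (the op names come from the description
> > above; the signatures are assumptions for illustration):
> >
> >     struct nvme_tcp_ofld_ops {
> >             const char *name;
> >             struct module *module;
> >             /* Resolve the route to the target and check whether the
> >              * paired net_dev belongs to this vendor driver.
> >              */
> >             int (*claim_dev)(struct nvme_tcp_ofld_dev *dev,
> >                              struct nvmf_ctrl_options *opts);
> >             /* Create one offloaded admin or IO queue. */
> >             int (*create_queue)(struct nvme_tcp_ofld_queue *queue,
> >                                 int qid, size_t queue_size);
> >     };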
> >
> >
> > IO-path Design:
> > ===============
> > The nvme-tcp-offload transport shall work at the IO level: the
> > nvme-tcp-offload ULP module shall pass the request (the IO) to the
> > nvme-tcp-offload vendor driver, and later the vendor driver shall return
> > the request completion (the IO completion).
> > No additional handling is needed in between; this design reduces CPU
> > utilization, as the numbers above show.
> >
> > The nvme-tcp-offload vendor driver shall register with the
> > nvme-tcp-offload ULP with the following IO-path ops:
> > - init_req()
> > - send_req() - in order to pass the request to the offload driver, which
> >   shall hand it to the vendor-specific device.
> > - poll_queue()
> >
> > Once the IO completes, the nvme-tcp-offload vendor driver shall call
> > command.done(), which invokes the nvme-tcp-offload ULP layer to complete
> > the request (a sketch of this path follows below).
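> >
> > A sketch of the resulting IO path, under the same naming assumptions
> > (the done() wiring shown here is illustrative, not the final code):
> >
> >     /* Vendor-driver IO-path ops: */
> >     int (*init_req)(struct nvme_tcp_ofld_req *req);
> >     int (*send_req)(struct nvme_tcp_ofld_req *req);
> >     int (*poll_queue)(struct nvme_tcp_ofld_queue *queue);
> >
> >     /* ULP completion callback, invoked by the vendor driver once the
> >      * device reports the IO completion:
> >      */
> >     static void nvme_tcp_ofld_req_done(struct nvme_tcp_ofld_req *req,
> >                                        union nvme_result *result,
> >                                        __le16 status)
> >     {
> >             struct request *rq = blk_mq_rq_from_pdu(req);
> >
> >             if (!nvme_try_complete_req(rq, status, *result))
> >                     nvme_complete_rq(rq);
> >     }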
> >
> >
> > TCP events:
> > ===========
> > The Marvell FastLinQ NIC HW engine handles all the TCP re-transmissions
> > and OOO (out-of-order) events.
> >
> >
> > Teardown and errors:
> > ====================
> > In case of an NVMeTCP queue error, the nvme-tcp-offload vendor driver
> > shall call nvme_tcp_ofld_report_queue_err().
> > The nvme-tcp-offload vendor driver shall also register with the
> > nvme-tcp-offload ULP with the following teardown ops (sketched below):
> > - drain_queue()
> > - destroy_queue()
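> >
> > A sketch of the error flow, again with assumed structure layout and
> > signatures:
> >
> >     /* Vendor driver, on a fatal queue-level error: */
> >     nvme_tcp_ofld_report_queue_err(queue);
> >
> >     /* ULP teardown of that queue, via the registered ops: */
> >     queue->dev->ops->drain_queue(queue);
> >     queue->dev->ops->destroy_queue(queue);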
> >
> >
> > The Marvell FastLinQ NIC HW engine:
> > ====================================
> > The Marvell NIC HW engine is capable of offloading the entire TCP/IP
> > stack and managing up to 64K connections per PF. Already implemented
> > and upstream use cases for this engine include iWARP (by the Marvell
> > qedr driver) and iSCSI (by the Marvell qedi driver).
> > In addition, the Marvell NIC HW engine offloads the NVMeTCP queue layer
> > and is able to manage the IO level even in the presence of TCP
> > re-transmissions and OOO events.
> > The HW engine enables direct data placement (including the data digest
> > CRC calculation and validation) and direct data transmission (including
> > data digest CRC calculation).
> >
> >
> > The Marvell qedn driver:
> > ========================
> > The new driver will be added under "drivers/nvme/hw" and will be enabled
> > by the Kconfig option "Marvell NVM Express over Fabrics TCP offload".
> > As part of the qedn init, the driver will register as a PCI device driver
> > and will work with the Marvell FastLinQ NIC (see the registration sketch
> > below).
> > As part of the probe, the driver will register with the nvme_tcp_offload
> > ULP and with the qed module (qed_nvmetcp_ops) - similar to the other
> > "qed_*_ops" used by the qede, qedr, qedf and qedi device drivers.
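> >
> > A minimal sketch of the PCI registration side (the device ID and the
> > callback names are assumptions for illustration):
> >
> >     static const struct pci_device_id qedn_pci_tbl[] = {
> >             { PCI_VDEVICE(QLOGIC, 0x8090) }, /* illustrative FastLinQ ID */
> >             { /* sentinel */ },
> >     };
> >     MODULE_DEVICE_TABLE(pci, qedn_pci_tbl);
> >
> >     static struct pci_driver qedn_pci_driver = {
> >             .name     = "qedn",
> >             .id_table = qedn_pci_tbl,
> >             .probe    = qedn_probe,  /* registers with the ULP and qed */
> >             .remove   = qedn_remove,
> >     };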
> >
> >
> > QEDN Future work:
> > =================
> > - Support extended HW resources.
> > - Digest support.
> > - Devlink support for device configuration and TCP offload configurations.
> > - Statistics.
> >
> >
> > Long term future work:
> > ======================
> > - The nvme-tcp-offload ULP target abstraction layer.
> > - The Marvell nvme-tcp-offload "qednt" target driver.
> >
> >
> > Changes since RFC v1:
> > =====================
> > - Fix nvme_tcp_ofld_ops return values.
> > - Remove NVMF_TRTYPE_TCP_OFFLOAD.
> > - Add nvme_tcp_ofld_poll() implementation.
> > - Fix nvme_tcp_ofld_queue_rq() to check map_sg() and send_req() return
> > values.
> >
> > Changes since RFC v2:
> > =====================
> > - Add qedn - Marvell's NVMeTCP HW offload vendor driver init and probe
> > (patches 8-11).
> > - Fixes in controller and queue level (patches 3-6).
> >
> > Changes since RFC v3:
> > =====================
> > - Add the full implementation of the nvme-tcp-offload layer including the
> > new ops: setup_ctrl(), release_ctrl(), commit_rqs() and new flows (ASYNC
> > and timeout).
> > - Add nvme-tcp-offload device maximums: max_hw_sectors, max_segments.
> > - Add nvme-tcp-offload layer design and optimization changes.
> > - Add the qedn full implementation for the conn level, IO path and error
> > handling.
> > - Add qed support for the new AHP HW.
> >
> >
> > Arie Gershberg (3):
> > nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS
> > definitions
> > nvme-tcp-offload: Add controller level implementation
> > nvme-tcp-offload: Add controller level error recovery implementation
> >
> > Dean Balandin (3):
> > nvme-tcp-offload: Add device scan implementation
> > nvme-tcp-offload: Add queue level implementation
> > nvme-tcp-offload: Add IO level implementation
> >
> > Nikolay Assa (2):
> > qed: Add IP services APIs support
> > qedn: Add qedn_claim_dev API support
> >
> > Omkar Kulkarni (1):
> > qed: Add qed-NVMeTCP personality
> >
> > Prabhakar Kushwaha (6):
> > qed: Add support of HW filter block
> > qedn: Add connection-level slowpath functionality
> > qedn: Add support of configuring HW filter block
> > qedn: Add support of Task and SGL
> > qedn: Add support of NVME ICReq & ICResp
> > qedn: Add support of ASYNC
> >
> > Shai Malin (12):
> > qed: Add NVMeTCP Offload PF Level FW and HW HSI
> > qed: Add NVMeTCP Offload Connection Level FW and HW HSI
> > qed: Add NVMeTCP Offload IO Level FW and HW HSI
> > qed: Add NVMeTCP Offload IO Level FW Initializations
> > nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP
> > nvme-tcp-offload: Add Timeout and ASYNC Support
> > qedn: Add qedn - Marvell's NVMeTCP HW offload vendor driver
> > qedn: Add qedn probe
> > qedn: Add IRQ and fast-path resources initializations
> > qedn: Add IO level nvme_req and fw_cq workqueues
> > qedn: Add IO level fastpath functionality
> > qedn: Add Connection and IO level recovery flows
> >
> > MAINTAINERS | 10 +
> > drivers/net/ethernet/qlogic/Kconfig | 3 +
> > drivers/net/ethernet/qlogic/qed/Makefile | 5 +
> > drivers/net/ethernet/qlogic/qed/qed.h | 16 +
> > drivers/net/ethernet/qlogic/qed/qed_cxt.c | 32 +
> > drivers/net/ethernet/qlogic/qed/qed_cxt.h | 1 +
> > drivers/net/ethernet/qlogic/qed/qed_dev.c | 151 +-
> > drivers/net/ethernet/qlogic/qed/qed_hsi.h | 4 +-
> > drivers/net/ethernet/qlogic/qed/qed_ll2.c | 31 +-
> > drivers/net/ethernet/qlogic/qed/qed_mcp.c | 3 +
> > drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c | 3 +-
> > drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c | 868 +++++++++++
> > drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h | 114 ++
> > .../qlogic/qed/qed_nvmetcp_fw_funcs.c | 372 +++++
> > .../qlogic/qed/qed_nvmetcp_fw_funcs.h | 43 +
> > .../qlogic/qed/qed_nvmetcp_ip_services.c | 239 +++
> > drivers/net/ethernet/qlogic/qed/qed_ooo.c | 5 +-
> > drivers/net/ethernet/qlogic/qed/qed_sp.h | 5 +
> > .../net/ethernet/qlogic/qed/qed_sp_commands.c | 1 +
> > drivers/nvme/Kconfig | 1 +
> > drivers/nvme/Makefile | 1 +
> > drivers/nvme/host/Kconfig | 16 +
> > drivers/nvme/host/Makefile | 3 +
> > drivers/nvme/host/fabrics.c | 7 -
> > drivers/nvme/host/fabrics.h | 7 +
> > drivers/nvme/host/tcp-offload.c | 1330 +++++++++++++++++
> > drivers/nvme/host/tcp-offload.h | 209 +++
> > drivers/nvme/hw/Kconfig | 9 +
> > drivers/nvme/hw/Makefile | 3 +
> > drivers/nvme/hw/qedn/Makefile | 4 +
> > drivers/nvme/hw/qedn/qedn.h | 435 ++++++
> > drivers/nvme/hw/qedn/qedn_conn.c | 999 +++++++++++++
> > drivers/nvme/hw/qedn/qedn_main.c | 1153 ++++++++++++++
> > drivers/nvme/hw/qedn/qedn_task.c | 977 ++++++++++++
> > include/linux/qed/common_hsi.h | 1 +
> > include/linux/qed/nvmetcp_common.h | 616 ++++++++
> > include/linux/qed/qed_if.h | 22 +
> > include/linux/qed/qed_nvmetcp_if.h | 244 +++
> > .../linux/qed/qed_nvmetcp_ip_services_if.h | 29 +
> > 39 files changed, 7947 insertions(+), 25 deletions(-)
> > create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
> > create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h
> > create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.c
> > create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.h
> > create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c
> > create mode 100644 drivers/nvme/host/tcp-offload.c
> > create mode 100644 drivers/nvme/host/tcp-offload.h
> > create mode 100644 drivers/nvme/hw/Kconfig
> > create mode 100644 drivers/nvme/hw/Makefile
> > create mode 100644 drivers/nvme/hw/qedn/Makefile
> > create mode 100644 drivers/nvme/hw/qedn/qedn.h
> > create mode 100644 drivers/nvme/hw/qedn/qedn_conn.c
> > create mode 100644 drivers/nvme/hw/qedn/qedn_main.c
> > create mode 100644 drivers/nvme/hw/qedn/qedn_task.c
> > create mode 100644 include/linux/qed/nvmetcp_common.h
> > create mode 100644 include/linux/qed/qed_nvmetcp_if.h
> > create mode 100644 include/linux/qed/qed_nvmetcp_ip_services_if.h
> >
> I would structure this patchset slightly differently, putting the
> NVMe-oF implementation at the start of the patchset; this will be where
> you get most of the comments, and any change there will potentially
> reflect back on the driver implementation, too.
>
> Something to consider for the next round.
Will do. Thanks.
>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke Kernel Storage Architect
> hare@suse.de +49 911 74053 688
> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer