Message-ID: <72746760-f045-d7bc-1557-255720d7638d@grimberg.me>
Date: Thu, 23 Feb 2023 17:33:40 +0200
Subject: Re: [PATCH v11 00/25] nvme-tcp receive offloads
From: Sagi Grimberg
To: Aurelien Aptel, linux-nvme@lists.infradead.org, netdev@vger.kernel.org,
 hch@lst.de, kbusch@kernel.org, axboe@fb.com, chaitanyak@nvidia.com,
 davem@davemloft.net, kuba@kernel.org
Cc: aurelien.aptel@gmail.com, smalin@nvidia.com, malin1024@gmail.com,
 ogerlitz@nvidia.com, yorayz@nvidia.com, borisp@nvidia.com
References: <20230203132705.627232-1-aaptel@nvidia.com>
In-Reply-To: <20230203132705.627232-1-aaptel@nvidia.com>

> Hi,
>
> Here is the next iteration of our nvme-tcp receive offload series.
>
> The main changes are in patch 3 (netlink).
>
> Rebased on top of today's net-next
> 8065c0e13f98 ("Merge branch 'yt8531-support'")
>
> The changes are also available through git:
>
> Repo: https://github.com/aaptel/linux.git branch nvme-rx-offload-v11
> Web: https://github.com/aaptel/linux/tree/nvme-rx-offload-v11
>
> The NVMeTCP offload was presented in netdev 0x16 (video now available):
> - https://netdevconf.info/0x16/session.html?NVMeTCP-Offload-%E2%80%93-Implementation-and-Performance-Gains
> - https://youtu.be/W74TR-SNgi4
>
> From: Aurelien Aptel
> From: Shai Malin
> From: Ben Ben-Ishay
> From: Boris Pismenny
> From: Or Gerlitz
> From: Yoray Zack

Hey Aurelien and Co,

I've spent some time today looking at the last iteration of this.
What I cannot understand is how this will ever be used outside of the
kernel nvme-tcp host driver.

It seems that the interface is designed to fit only a kernel consumer,
and a very specific one at that.

Have you considered using more standard interfaces for this, such that
spdk or an io_uring based initiator could use it as well?

To me it appears that:
- ddp limits can be obtained via getsockopt
- sk_add/sk_del can be done via setsockopt
- offloaded DDGST crc can be obtained via something like
  msghdr.msg_control
- Perhaps for setting up the offload per IO, recvmsg would be the
  vehicle, with a new msg flag MSG_RCV_DDP or something, that would
  hide all the details of what the HW needs (the command_id would be
  set somewhere in the msghdr).
- And all of the resync flow would be something that a separate ulp
  socket provider would take care of, similar to how TLS presents
  itself to a tcp application, so the application does not need to be
  aware of it.

(A rough, purely illustrative sketch of what such an interface could
look like is appended at the end of this mail.)

I'm not sure that such an interface could cover everything that is
needed, but what I'm trying to convey is that the current interface
limits the usability for almost anything else. Please correct me if
I'm wrong. Is this designed to also cater to anything else outside of
the kernel nvme-tcp host driver?

> Compatibility
> =============
> * The offload works with bare-metal or SRIOV.
> * The HW can support up to 64K connections per device (assuming no
>   other HW accelerations are used). In this series, we introduce
>   support for up to 4k connections, and we have plans to increase it.
> * SW TLS cannot work together with the NVMeTCP offload as the HW
>   needs to track the NVMeTCP headers in the TCP stream.

Can't say I like that.

> * The ConnectX HW supports HW TLS, but in ConnectX-7 the two features
>   cannot co-exist (and it is not part of this series).
> * The NVMeTCP offload ConnectX-7 HW can support tunneling, but we
>   don’t see the need for this feature yet.
> * NVMe poll queues are not in the scope of this series.

bonding/teaming?

>
> Future Work
> ===========
> * NVMeTCP transmit offload.
> * NVMeTCP host offloads incremental features.
> * NVMeTCP target offload.

Which target? Which host?
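
---

To make the socket-interface suggestion above a bit more concrete, here
is a very rough userspace sketch. Everything in it is invented for
illustration only: SOL_ULP_DDP, the ULP_DDP_* option/cmsg names,
MSG_RCV_DDP and the idea of passing a command_id as ancillary data to
recvmsg() do not exist anywhere today (and real recvmsg() treats
msg_control as output only), so this shows the shape of the idea, not a
concrete proposal.

/*
 * Hypothetical sketch only: every constant and struct below is made up
 * for illustration and does not exist in the kernel or libc.
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <stdint.h>
#include <string.h>

#define SOL_ULP_DDP		299		/* hypothetical socket level */
#define ULP_DDP_LIMITS		1		/* hypothetical: query HW DDP limits */
#define ULP_DDP_SK_ADD		2		/* hypothetical: offload this socket */
#define ULP_DDP_SETUP_CID	3		/* hypothetical: per-IO command_id cmsg */
#define ULP_DDP_DDGST_CRC	4		/* hypothetical: data digest result cmsg */
#define MSG_RCV_DDP		0x01000000	/* hypothetical recvmsg() flag */

struct ulp_ddp_limits {				/* hypothetical */
	uint32_t max_ddp_sgl_len;
	uint32_t io_threshold;
};

static int ddp_recv(int sk, uint16_t command_id, void *buf, size_t len)
{
	struct ulp_ddp_limits limits;
	socklen_t optlen = sizeof(limits);
	int one = 1;

	/* ddp limits via getsockopt */
	if (getsockopt(sk, SOL_ULP_DDP, ULP_DDP_LIMITS, &limits, &optlen))
		return -1;

	/* sk_add via setsockopt (in reality done once per queue, not per IO) */
	if (setsockopt(sk, SOL_ULP_DDP, ULP_DDP_SK_ADD, &one, sizeof(one)))
		return -1;

	/*
	 * Per-IO setup: the command_id rides in the msghdr as ancillary
	 * data, and MSG_RCV_DDP asks the stack to place the payload
	 * directly into buf.  This deliberately departs from today's
	 * recvmsg() semantics, where msg_control is output only.
	 */
	char cbuf[CMSG_SPACE(sizeof(uint16_t)) + CMSG_SPACE(sizeof(uint32_t))];
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = cbuf, .msg_controllen = sizeof(cbuf),
	};
	struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);

	cm->cmsg_level = SOL_ULP_DDP;
	cm->cmsg_type = ULP_DDP_SETUP_CID;
	cm->cmsg_len = CMSG_LEN(sizeof(command_id));
	memcpy(CMSG_DATA(cm), &command_id, sizeof(command_id));

	ssize_t n = recvmsg(sk, &msg, MSG_RCV_DDP);
	if (n < 0)
		return -1;

	/* offloaded DDGST crc result comes back via msg_control */
	uint32_t ddgst_ok = 0;
	for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm))
		if (cm->cmsg_level == SOL_ULP_DDP &&
		    cm->cmsg_type == ULP_DDP_DDGST_CRC)
			memcpy(&ddgst_ok, CMSG_DATA(cm), sizeof(ddgst_ok));

	if (!ddgst_ok) {
		/* HW did not verify the digest for this IO; fall back to SW */
	}

	return (int)n;
}

The analogy is with how kTLS hides behind setsockopt(SOL_TCP, TCP_ULP,
"tls"): the application keeps using plain socket calls and the ULP and
driver details stay out of its sight.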