From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.2 required=3.0 tests=BAYES_00,DKIM_ADSP_DISCARD, DKIM_INVALID,DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1C8E5C433F5 for ; Mon, 13 Sep 2021 07:22:18 +0000 (UTC) Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by mail.kernel.org (Postfix) with ESMTP id 15B6760EE7 for ; Mon, 13 Sep 2021 07:22:17 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 15B6760EE7 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=oktetlabs.ru Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=dpdk.org Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 0E72640151; Mon, 13 Sep 2021 09:22:16 +0200 (CEST) Received: from shelob.oktetlabs.ru (shelob.oktetlabs.ru [91.220.146.113]) by mails.dpdk.org (Postfix) with ESMTP id 322154014F for ; Mon, 13 Sep 2021 09:22:15 +0200 (CEST) Received: from [192.168.38.17] (aros.oktetlabs.ru [192.168.38.17]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by shelob.oktetlabs.ru (Postfix) with ESMTPSA id 82FF97F53A; Mon, 13 Sep 2021 10:22:14 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 shelob.oktetlabs.ru 82FF97F53A DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=oktetlabs.ru; s=default; t=1631517734; bh=6FQPMxOPst9Q1VrKsKxnpJiTgRu7tCdb54zmr7wj2Ro=; h=Subject:To:Cc:References:From:Date:In-Reply-To; b=X/i2Gw5oaVZa/5+1l7UpYFROF6RmJe3UDgMTF7PxSJk2zH26UyWPkYcPVYPrK4Uzm V04Zn3OrhDbZTMJjemLPGncZgsAZYfUog4xFgsXi4CBh+BLLYWn4NHx0qpldl9z6vX ECPFuDAE05+VO+FfDN8pex3MxljewCN3eJTJWges= To: "Xu, Rosen" , Anoob Joseph , "Yigit, Ferruh" , Andrew Rybchenko Cc: "Nicolau, Radu" , "Doherty, Declan" , "hemant.agrawal@nxp.com" , "matan@nvidia.com" , "Ananyev, Konstantin" , "thomas@monjalon.net" , Ankur Dwivedi , Akhil Goyal , "dev@dpdk.org" References: <20210823100259.1619886-1-gakhil@marvell.com> <06b92d82-f8af-fab0-58aa-9b977ec0f3a1@intel.com> From: Andrew Rybchenko Organization: OKTET Labs Message-ID: <011f96fb-0f02-e681-abb4-46257200da35@oktetlabs.ru> Date: Mon, 13 Sep 2021 10:22:14 +0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.13.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [EXT] Re: [PATCH] RFC: ethdev: add reassembly offload X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On 9/13/21 9:56 AM, Xu, Rosen wrote: > Hi, > >> -----Original Message----- >> From: Anoob Joseph >> Sent: Wednesday, September 08, 2021 18:30 >> To: Yigit, Ferruh ; Xu, Rosen ; >> Andrew Rybchenko >> Cc: Nicolau, Radu ; Doherty, Declan >> ; hemant.agrawal@nxp.com; >> matan@nvidia.com; Ananyev, Konstantin ; >> thomas@monjalon.net; Ankur Dwivedi ; >> andrew.rybchenko@oktetlabs.ru; Akhil Goyal ; >> dev@dpdk.org >> Subject: RE: [EXT] Re: [PATCH] RFC: ethdev: add reassembly offload >> >> Hi Ferruh, Rosen, Andrew, >> >> Please see inline. >> >> Thanks, >> Anoob >> >>> Subject: [EXT] Re: [PATCH] RFC: ethdev: add reassembly offload >>> >>> External Email >>> >>> ---------------------------------------------------------------------- >>> On 8/23/2021 11:02 AM, Akhil Goyal wrote: >>>> Reassembly is a costly operation if it is done in software, however, >>>> if it is offloaded to HW, it can considerably save application cycles. >>>> The operation becomes even more costlier if IP fragmants are >>>> encrypted. >>>> >>>> To resolve above two issues, a new offload >>> DEV_RX_OFFLOAD_REASSEMBLY >>>> is introduced in ethdev for devices which can attempt reassembly of >>>> packets in hardware. >>>> rte_eth_dev_info is added with the reassembly capabilities which a >>>> device can support. >>>> Now, if IP fragments are encrypted, reassembly can also be attempted >>>> while doing inline IPsec processing. >>>> This is controlled by a flag in rte_security_ipsec_sa_options to >>>> enable reassembly of encrypted IP fragments in the inline path. >>>> >>>> The resulting reassembled packet would be a typical segmented mbuf >>>> in case of success. >>>> >>>> And if reassembly of fragments is failed or is incomplete (if >>>> fragments do not come before the reass_timeout), the mbuf is updated >>>> with an ol_flag PKT_RX_REASSEMBLY_INCOMPLETE and mbuf is returned >>> as >>>> is. Now application may decide the fate of the packet to wait more >>>> for fragments to come or drop. >>>> >>>> Signed-off-by: Akhil Goyal >>>> --- >>>> lib/ethdev/rte_ethdev.c | 1 + >>>> lib/ethdev/rte_ethdev.h | 18 +++++++++++++++++- >>>> lib/mbuf/rte_mbuf_core.h | 3 ++- >>>> lib/security/rte_security.h | 10 ++++++++++ >>>> 4 files changed, 30 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index >>>> 9d95cd11e1..1ab3a093cf 100644 >>>> --- a/lib/ethdev/rte_ethdev.c >>>> +++ b/lib/ethdev/rte_ethdev.c >>>> @@ -119,6 +119,7 @@ static const struct { >>>> RTE_RX_OFFLOAD_BIT2STR(VLAN_FILTER), >>>> RTE_RX_OFFLOAD_BIT2STR(VLAN_EXTEND), >>>> RTE_RX_OFFLOAD_BIT2STR(JUMBO_FRAME), >>>> + RTE_RX_OFFLOAD_BIT2STR(REASSEMBLY), >>>> RTE_RX_OFFLOAD_BIT2STR(SCATTER), >>>> RTE_RX_OFFLOAD_BIT2STR(TIMESTAMP), >>>> RTE_RX_OFFLOAD_BIT2STR(SECURITY), >>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index >>>> d2b27c351f..e89a4dc1eb 100644 >>>> --- a/lib/ethdev/rte_ethdev.h >>>> +++ b/lib/ethdev/rte_ethdev.h >>>> @@ -1360,6 +1360,7 @@ struct rte_eth_conf { >>>> #define DEV_RX_OFFLOAD_VLAN_FILTER 0x00000200 >>>> #define DEV_RX_OFFLOAD_VLAN_EXTEND 0x00000400 >>>> #define DEV_RX_OFFLOAD_JUMBO_FRAME 0x00000800 >>>> +#define DEV_RX_OFFLOAD_REASSEMBLY 0x00001000 >>> >>> previous '0x00001000' was 'DEV_RX_OFFLOAD_CRC_STRIP', it has been >> long >>> that offload has been removed, but not sure if it cause any problem to >>> re- use it. >>> >>>> #define DEV_RX_OFFLOAD_SCATTER 0x00002000 >>>> /** >>>> * Timestamp is set by the driver in >>> RTE_MBUF_DYNFIELD_TIMESTAMP_NAME >>>> @@ -1477,6 +1478,20 @@ struct rte_eth_dev_portconf { >>>> */ >>>> #define RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID >>> (UINT16_MAX) >>>> >>>> +/** >>>> + * Reassembly capabilities that a device can support. >>>> + * The device which can support reassembly offload should set >>>> + * DEV_RX_OFFLOAD_REASSEMBLY >>>> + */ >>>> +struct rte_eth_reass_capa { >>>> + /** Maximum time in ns that a fragment can wait for further >>> fragments */ >>>> + uint64_t reass_timeout; >>>> + /** Maximum number of fragments that device can reassemble */ >>>> + uint16_t max_frags; >>>> + /** Reserved for future capabilities */ >>>> + uint16_t reserved[3]; >>>> +}; >>>> + >>> >>> I wonder if there is any other hardware around supports reassembly >>> offload, it would be good to get more feedback on the capabilities list. >>> >>>> /** >>>> * Ethernet device associated switch information >>>> */ >>>> @@ -1582,8 +1597,9 @@ struct rte_eth_dev_info { >>>> * embedded managed interconnect/switch. >>>> */ >>>> struct rte_eth_switch_info switch_info; >>>> + /* Reassembly capabilities of a device for reassembly offload */ >>>> + struct rte_eth_reass_capa reass_capa; >>>> >>>> - uint64_t reserved_64s[2]; /**< Reserved for future fields */ >>> >>> Reserved fields were added to be able to update the struct without >>> breaking the ABI, so that a critical change doesn't have to wait until >>> next ABI break release. >>> Since this is ABI break release, we can keep the reserved field and >>> add the new struct. Or this can be an opportunity to get rid of the reserved >> field. >>> >>> Personally I have no objection to get rid of the reserved field, but >>> better to agree on this explicitly. >>> >>>> void *reserved_ptrs[2]; /**< Reserved for future fields */ >>>> }; >>>> >>>> diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h >>>> index >>>> bb38d7f581..cea25c87f7 100644 >>>> --- a/lib/mbuf/rte_mbuf_core.h >>>> +++ b/lib/mbuf/rte_mbuf_core.h >>>> @@ -200,10 +200,11 @@ extern "C" { >>>> #define PKT_RX_OUTER_L4_CKSUM_BAD (1ULL << 21) >>>> #define PKT_RX_OUTER_L4_CKSUM_GOOD (1ULL << 22) >>>> #define PKT_RX_OUTER_L4_CKSUM_INVALID ((1ULL << 21) | (1ULL >>> << 22)) >>>> +#define PKT_RX_REASSEMBLY_INCOMPLETE (1ULL << 23) >>>> >>> >>> Similar comment with Andrew's, what is the expectation from >>> application if this flag exists? Can we drop it to simplify the logic in the >> application? >> >> [Anoob] There can be few cases where hardware/NIC attempts inline >> reassembly but it fails to complete it >> >> 1. Number of fragments is larger than what is supported by the hardware 2. >> Hardware reassembly resources are exhausted (due to limited reassembly >> contexts etc) 3. Reassembly errors such as overlapping fragments 4. Wait >> time exhausted (or reassembly timeout) >> >> In such cases, application would be required to retrieve the original >> fragments so that it can attempt reassembly in software. The incomplete flag >> is useful for 2 purposes basically, 1. Application would need to retrieve the >> time the fragment has already spend in hardware reassembly so that >> software reassembly attempt can compensate for it. Otherwise, reassembly >> timeout across hardware + software will not be accurate Could you clarify how application will find out the time spent in HW. >> 2. Retrieve original >> fragments. With this proposal, an incomplete reassembly would result in a >> chained mbuf but the segments need not be consecutive. To explain bit more, >> >> Suppose we have a packet that is fragmented into 3 fragments, and fragment >> 3 & fragment 1 arrives in that order. Fragment 2 didn't arrive and hardware >> ultimately pushes it. In that case, application would be receiving a >> chained/segmented mbuf with fragment 1 & fragment 3 chained. >> >> Now, this chained mbuf can't be treated like a regular chained mbuf. Each >> fragment would have its IP hdr and there are fragments missing in between. >> The only thing application is expected to do is, retrieve fragments, push it to >> s/w reassembly. It sounds like it conflicts with SCATTER and BUFFER_SPLIT offloads which allow to return chained mbuf's. Don't know if it is good or bad, but anyway it must be documented. > > What you mentioned is error identification. But actually a negotiation about max frame size is needed before datagrams tx/rx. It sounds like it is OK for informational purposes, but right now I don't understand how it could be used by the application. Application still has to support reassembly in SW regardless of the information. >>> >>>> /* add new RX flags here, don't forget to update PKT_FIRST_FREE */ >>>> >>>> -#define PKT_FIRST_FREE (1ULL << 23) >>>> +#define PKT_FIRST_FREE (1ULL << 24) >>>> #define PKT_LAST_FREE (1ULL << 40) >>>> >>>> /* add new TX flags here, don't forget to update PKT_LAST_FREE */ >>>> diff --git a/lib/security/rte_security.h >>>> b/lib/security/rte_security.h index 88d31de0a6..364eeb5cd4 100644 >>>> --- a/lib/security/rte_security.h >>>> +++ b/lib/security/rte_security.h >>>> @@ -181,6 +181,16 @@ struct rte_security_ipsec_sa_options { >>>> * * 0: Disable per session security statistics collection for this SA. >>>> */ >>>> uint32_t stats : 1; >>>> + >>>> + /** Enable reassembly on incoming packets. >>>> + * >>>> + * * 1: Enable driver to try reassembly of encrypted IP packets for >>>> + * this SA, if supported by the driver. This feature will work >>>> + * only if rx_offload DEV_RX_OFFLOAD_REASSEMBLY is set in >>>> + * inline ethernet device. >>>> + * * 0: Disable reassembly of packets (default). >>>> + */ >>>> + uint32_t reass_en : 1; >>>> }; >>>> >>>> /** IPSec security association direction */ >>>> >