From: Andrew Rybchenko
To: Slava Ovsiienko, Thomas Monjalon, olivier.matz@6wind.com
CC: dev@dpdk.org, Matan Azrad, Ori Kam, Yongseok Koh
References: <1571922495-4588-1-git-send-email-viacheslavo@mellanox.com> <1572201636-16374-1-git-send-email-viacheslavo@mellanox.com>
Message-ID: <058d3b3f-d023-9484-36d6-ee190dbedea0@solarflare.com>
Date: Wed, 30 Oct 2019 12:20:17 +0300
Subject: Re: [dpdk-dev] [PATCH v4] ethdev: extend flow metadata

On 10/30/19 11:59 AM, Slava Ovsiienko wrote:
>> -----Original Message-----
>> From: Andrew Rybchenko
>> Sent: Wednesday, October 30, 2019 9:35
>> To: Slava Ovsiienko; Thomas Monjalon; olivier.matz@6wind.com
>> Cc: dev@dpdk.org; Matan Azrad; Ori Kam; Yongseok Koh
>> Subject: Re: [dpdk-dev] [PATCH v4] ethdev: extend flow metadata
>>
>> @Olivier, please take a look at the end of the mail.
>>
>> On 10/29/19 8:19 PM, Slava Ovsiienko wrote:
>>> Hi, Andrew
>>>
>>> Thank you for the review.
>>>
>>>> -----Original Message-----
>>>> From: Andrew Rybchenko
>>>> Sent: Tuesday, October 29, 2019 18:22
>>>> To: Slava Ovsiienko; dev@dpdk.org
>>>> Cc: Thomas Monjalon; Matan Azrad; olivier.matz@6wind.com; Ori Kam; Yongseok Koh
>>>> Subject: Re: [dpdk-dev] [PATCH v4] ethdev: extend flow metadata
>>>>
>>>> On 10/27/19 9:40 PM, Viacheslav Ovsiienko wrote:
>>>>> Currently, metadata can be set on the egress path via the mbuf tx_metadata
>>>>> field with the PKT_TX_METADATA flag, and RTE_FLOW_ITEM_TYPE_META matches metadata.
>>>>> This patch extends the usability of the metadata feature.
>>>>>
>>>>> 1) RTE_FLOW_ACTION_TYPE_SET_META
>>>>>
>>>>> When supporting multiple tables, Tx metadata can also be set by a
>>>>> rule and matched by another rule. This new action allows metadata to
>>>>> be set as a result of a flow match.
>>>>>
>>>>> 2) Metadata on ingress
>>>>>
>>>>> There is also a need to support metadata on ingress. Metadata can be
>>>>> set by the SET_META action and matched by the META item, as on Tx. The
>>>>> final value set by the action will be delivered to the application via the
>>>>> metadata dynamic field of the mbuf, which can be accessed by
>>>>> RTE_FLOW_DYNF_METADATA(). The PKT_RX_DYNF_METADATA flag will be set along
>>>>> with the data.
>>>>>
>>>>> The mbuf dynamic field must be registered by calling
>>>>> rte_flow_dynf_metadata_register() prior to using the SET_META action.
>>>>>
>>>>> The availability of the dynamic mbuf metadata field can be checked with
>>>>> the rte_flow_dynf_metadata_avail() routine.
>>>>>
>>>>> For loopback/hairpin packets, metadata set on Rx/Tx may or may not be
>>>>> propagated to the other path depending on hardware capability.
>>>>>
>>>>> Signed-off-by: Yongseok Koh
>>>>> Signed-off-by: Viacheslav Ovsiienko
>>>> The explanations above lack information about "meta" vs "mark", which may
>>>> also be set on Rx and delivered in another mbuf field.
>>>> It should be explained why one more field is required and the rules defined.
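For reference, with the API this patch introduces an application ends up with two
parallel per-packet channels on Rx: the legacy mark delivered in a fixed mbuf field,
and the new meta delivered in a dynamic field. A rough sketch of how both would be
registered and read (illustration only; the helper names app_register_metadata() and
app_handle_rx() are placeholders, the rte_flow/mbuf symbols are the ones named in the
commit message and the existing mark support):

#include <stdint.h>
#include <rte_mbuf.h>
#include <rte_flow.h>

static int
app_register_metadata(void)
{
	/* Must be done before creating flow rules which use SET_META. */
	if (rte_flow_dynf_metadata_register() < 0)
		return -1;	/* no room for the dynamic field / not supported */
	return 0;
}

static void
app_handle_rx(struct rte_mbuf *m)
{
	/* Legacy MARK path: fixed mbuf field. */
	if (m->ol_flags & PKT_RX_FDIR_ID) {
		uint32_t mark = m->hash.fdir.hi;
		/* ... consume mark ... */
		(void)mark;
	}

	/* New META path: dynamic mbuf field (the endianness of the value
	 * is discussed further below in this mail).
	 */
	if (rte_flow_dynf_metadata_avail() &&
	    (m->ol_flags & PKT_RX_DYNF_METADATA)) {
		uint32_t meta = *RTE_FLOW_DYNF_METADATA(m);
		/* ... consume meta ... */
		(void)meta;
	}
}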
>>> There is some story about the metadata features.
>>> Initially, two metadata-related actions were proposed:
>>>
>>> - RTE_FLOW_ACTION_TYPE_FLAG
>>> - RTE_FLOW_ACTION_TYPE_MARK
>>>
>>> The FLAG action sets a special flag in the packet metadata, the MARK action
>>> stores a specified value in the metadata storage, and, on packet reception,
>>> the PMD puts the flag and value into the mbuf, so the application can see that
>>> the packet was treated inside the flow engine according to the appropriate
>>> rte_flow rule(s). MARK and FLAG are a kind of gateway to transfer per-packet
>>> information from the flow engine to the application via the receiving datapath.
>>> From the datapath point of view, MARK and FLAG are related to the receiving
>>> side only.
>>>
>>> It would be useful to have the same gateway on the transmitting side, and the
>>> item of type RTE_FLOW_ITEM_TYPE_META was proposed for that.
>>> The application can fill the field in the mbuf and this value will be
>>> transferred to some field in the packet metadata inside the flow engine.
>>> It did not matter whether these metadata fields were shared, because the
>>> MARK and META items belonged to different domains (receiving and transmitting)
>>> and could be vendor-specific.
>>>
>>> So far, so good: DPDK provides some entities to control metadata inside the
>>> flow engine and gateways to exchange these values on a per-packet basis via
>>> the datapaths.
>>>
>>> As we can see, the MARK and META means are not symmetric; there is no action
>>> which would allow us to set the META value on the transmitting path. So, the
>>> action of type:
>>> - RTE_FLOW_ACTION_TYPE_SET_META is proposed.
>>>
>>> Next, applications raised new requirements for packet metadata.
>>> The flow engines are getting more complex, internal switches are introduced,
>>> and multiple ports might be supported within the same flow engine namespace.
>>> From the DPDK point of view, it means the packets might be sent on one eth_dev
>>> port and received on another one, while the packet path inside the flow engine
>>> entirely belongs to the same hardware device. The simplest example is SR-IOV
>>> with PF, VFs and the representors.
>>> And there is a brilliant opportunity to provide some out-of-band channel to
>>> transfer some extra data from one port to another one, besides the packet data
>>> itself.
>>>
>>>> The explanations above lack information about "meta" vs "mark", which may
>>>> also be set on Rx and delivered in another mbuf field.
>>>> It should be explained why one more field is required and the rules defined.
>>>> Otherwise we can end up with half of the PMDs supporting mark only, half of
>>>> the PMDs supporting meta only, and applications in an interesting situation
>>>> having to choose which one to use.
>>> There is no "mark" vs "meta". The MARK and META means are kept for
>>> compatibility reasons and the legacy part works exactly as before. The trials
>>> (with rte_flow_validate) are supposed to check whether the PMD supports the
>>> MARK or META feature in the appropriate domain. It depends on the PMD
>>> implementation, configuration and underlying HW/FW/kernel capabilities and
>>> should be resolved at runtime.
>>
>> The trials are a way, but a very tricky way. My imagination draws me pictures
>> of how application code could look in an attempt to use either mark or meta
>> for Rx only, and these pictures are not nice.
> Agree, trials are not the best way.
> For now there is no application trying to choose between mark and meta, because
> they belonged to different domains, and the extension is newly introduced.
> So, the applications using mark will continue to use mark, and the same holds
> for meta.
> The new application definitely will ask for both mark and meta;
> we have the requirements from customers - "give us as many bits to pass through
> as you can".
> Such a new application may simply refuse to work if the metadata features are
> not detected, because it relies on them strongly.
> BTW, the trials are not so complex: rte_flow_validate(mark),
> rte_flow_validate_metadata(), and that's it.

In the assumption that the pattern is supported, and that the fate action used
together with mark is supported as well. Not that easy, but OK.
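For example, a probe for Rx mark support could be shaped roughly like this (a
sketch only: the dummy pattern, the QUEUE fate action and the mark id below are
arbitrary placeholders, and a real application would validate whatever pattern and
fate action it actually intends to use):

#include <stdint.h>
#include <rte_flow.h>

static int
probe_rx_mark(uint16_t port_id)
{
	struct rte_flow_attr attr = { .ingress = 1 };
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_ETH },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action_mark mark = { .id = 1 };
	struct rte_flow_action_queue queue = { .index = 0 };
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark },
		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};

	/* 0 means the PMD would accept such a rule; a negative errno value
	 * (typically -ENOTSUP) means the combination is not supported.
	 */
	return rte_flow_validate(port_id, &attr, pattern, actions, NULL);
}

The same probe, with RTE_FLOW_ACTION_TYPE_SET_META instead of MARK, would cover
the meta side.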
>> Maybe it will look acceptable when mark becomes a dynamic field, since usage
>> of either one or another dynamic field is definitely easier than usage of
>> either a fixed or a dynamic field.
> At least in the PMD datapath it does not look very ugly.
>> Maybe a dynamic field for mark at a fixed offset should be introduced in this
>> release or in the nearest future? It would allow preserving the ABI up to
>> 20.11 and provide a future-proof API.
>> The trick is to register the dynamic meta field at a fixed offset at the start
>> of the day, to be sure that it is guaranteed to succeed.
>> It sounds like it is a transition mechanism from fixed to dynamic fields.
>>
>> [snip]
>>
>>>>> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
>>>>> index 4fee105..b821557 100644
>>>>> --- a/lib/librte_ethdev/rte_flow.h
>>>>> +++ b/lib/librte_ethdev/rte_flow.h
>>>>> @@ -28,6 +28,8 @@
>>>>>  #include
>>>>>  #include
>>>>>  #include
>>>>> +#include
>>>>> +#include
>>>>>
>>>>>  #ifdef __cplusplus
>>>>>  extern "C" {
>>>>> @@ -418,7 +420,8 @@ enum rte_flow_item_type {
>>>>>  	/**
>>>>>  	 * [META]
>>>>>  	 *
>>>>> -	 * Matches a metadata value specified in mbuf metadata field.
>>>>> +	 * Matches a metadata value.
>>>>> +	 *
>>>>>  	 * See struct rte_flow_item_meta.
>>>>>  	 */
>>>>>  	RTE_FLOW_ITEM_TYPE_META,
>>>>> @@ -1263,9 +1266,17 @@ struct rte_flow_item_icmp6_nd_opt_tla_eth {
>>>>>  #endif
>>>>>
>>>>>  /**
>>>>> - * RTE_FLOW_ITEM_TYPE_META.
>>>>> + * @warning
>>>>> + * @b EXPERIMENTAL: this structure may change without prior notice
>>>> Is it allowed to make experimental back?
>>> I think we should remove EXPERIMENTAL here. We do not introduce a new
>>> feature, but just extend the applicability.
>>
>> Agreed.
>>
>>>>>   *
>>>>> - * Matches a specified metadata value.
>>>>> + * RTE_FLOW_ITEM_TYPE_META
>>>>> + *
>>>>> + * Matches a specified metadata value. On egress, metadata can be set either by
>>>>> + * mbuf tx_metadata field with PKT_TX_METADATA flag or
>>>>> + * RTE_FLOW_ACTION_TYPE_SET_META. On ingress, RTE_FLOW_ACTION_TYPE_SET_META sets
>>>>> + * metadata for a packet and the metadata will be reported via mbuf metadata
>>>>> + * dynamic field with PKT_RX_DYNF_METADATA flag. The dynamic mbuf field must be
>>>>> + * registered in advance by rte_flow_dynf_metadata_register().
>>>>>   */
>>>>>  struct rte_flow_item_meta {
>>>>>  	rte_be32_t data;
>>>> [snip]
>>>>
>>>>> @@ -2429,6 +2447,55 @@ struct rte_flow_action_set_mac {
>>>>>  	uint8_t mac_addr[RTE_ETHER_ADDR_LEN];
>>>>>  };
>>>>>
>>>>> +/**
>>>>> + * @warning
>>>>> + * @b EXPERIMENTAL: this structure may change without prior notice
>>>>> + *
>>>>> + * RTE_FLOW_ACTION_TYPE_SET_META
>>>>> + *
>>>>> + * Set metadata. Metadata set by mbuf tx_metadata field with
>>>>> + * PKT_TX_METADATA flag on egress will be overridden by this action. On
>>>>> + * ingress, the metadata will be carried by mbuf metadata dynamic field
>>>>> + * with PKT_RX_DYNF_METADATA flag if set. The dynamic mbuf field must be
>>>>> + * registered in advance by rte_flow_dynf_metadata_register().
>>>>> + *
>>>>> + * Altering partial bits is supported with mask. For bits which have never
>>>>> + * been set, unpredictable value will be seen depending on driver
>>>>> + * implementation. For loopback/hairpin packet, metadata set on Rx/Tx may
>>>>> + * or may not be propagated to the other path depending on HW capability.
>>>>> + *
>>>>> + * RTE_FLOW_ITEM_TYPE_META matches metadata.
>>>>> + */
>>>>> +struct rte_flow_action_set_meta {
>>>>> +	rte_be32_t data;
>>>>> +	rte_be32_t mask;
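For completeness, this is roughly how the new action and item from the hunks above
would be used together, e.g. to set a value in group 0 and match it in group 1 (a
sketch only: the value 0x1234, the group numbers and the omitted attributes/fate
actions are placeholders, the rule creation calls are left out, and the byte order
follows the v4 definition discussed below):

#include <stdint.h>
#include <rte_byteorder.h>
#include <rte_flow.h>

/* Group 0: set the metadata and jump to the next group. */
static struct rte_flow_action_set_meta set_meta = {
	.data = RTE_BE32(0x1234),
	.mask = RTE_BE32(UINT32_MAX),
};
static struct rte_flow_action_jump jump = { .group = 1 };
static struct rte_flow_action group0_actions[] = {
	{ .type = RTE_FLOW_ACTION_TYPE_SET_META, .conf = &set_meta },
	{ .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump },
	{ .type = RTE_FLOW_ACTION_TYPE_END },
};

/* Group 1: match the metadata set above (default item mask assumed). */
static struct rte_flow_item_meta meta_spec = { .data = RTE_BE32(0x1234) };
static struct rte_flow_item group1_pattern[] = {
	{ .type = RTE_FLOW_ITEM_TYPE_META, .spec = &meta_spec },
	{ .type = RTE_FLOW_ITEM_TYPE_END },
};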
>>>> As I understand, tx_metadata is host endian. Just double-checking.
>>>> Is the new dynamic field host endian or big endian?
>>>> I definitely would like to see the motivation in comments for why data/mask
>>>> are big-endian here.
>>> Metadata is an opaque value, endianness does not matter, there is no special
>>> motivation for choosing the endianness. The rte_flow_item_meta structure
>>> provides data with the rte_be32_t type, so the meta-related action does the
>>> same.
>>
>> Endianness of meta in the mbuf and in the flow API should match and it must be
> Yes, and they match (for meta), both are rte_be32_t. OK, will emphasize it in the
> docs.
>
>> documented. Endianness is important if the HW supports fewer bits, since it
>> makes a hint for the application to use the LSBs first if the bit space is
>> sufficient.
>> mark is defined as host-endian (uint32_t) and I think meta should be the same.
>> Otherwise it complicates even more either mark or meta usage, as discussed
>> above.
> mark is mark, meta is meta, these ones are not interrelated (despite they are
> becoming similar). And there is no choice between mark and meta.
> It is not obvious that we should have the same endianness for both.
> There are some applications using these legacy features, so there might be
> compatibility issues as well.

There are a few reasons above to make meta host-endian:
 - match mark endianness (explained above, I still think my reasons are valid);
 - make it easier for an application to use it without endianness conversion,
   if a sequence number is used to allocate metas (similar to mark in OVS)
   and the bit space is less than 32 bits (see the sketch at the end of this mail).

I see no single reason to keep it big-endian, except keeping it as it is.
Since tx_metadata is going away, and metadata was used for Tx only before, it is a
good moment to fix it.

>> Yes, I think that rte_flow_item_meta should be fixed, since both mark and
>> tx_metadata are host-endian.
>>
>> (it says nothing about the HW interface, which is vendor-specific and which
>> vendor PMDs should care about)
> Handling this in the PMD might introduce an extra endianness conversion in the
> datapath, impacting performance. Not nice, IMO.

These concerns should not affect the external interface, since it could be
different for different HW vendors.
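To illustrate the sequence-number point above, a sketch only (next_meta_id and
allocate_meta() are placeholders; the two assignments show the two alternatives
being discussed):

#include <stdint.h>
#include <rte_byteorder.h>

/* The application allocates small, sequential meta values; the HW may only be
 * able to match the N least significant bits of the 32-bit field.
 */
static uint32_t next_meta_id;

static void
allocate_meta(uint32_t *meta_host, rte_be32_t *meta_be)
{
	uint32_t id = ++next_meta_id;

	/* Host-endian definition: use the value as is, the meaningful bits
	 * naturally stay in the least significant part.
	 */
	*meta_host = id;

	/* Big-endian definition (as in v4): the application must convert on
	 * every allocation, otherwise the meaningful bits land in the most
	 * significant byte on little-endian hosts.
	 */
	*meta_be = rte_cpu_to_be_32(id);
}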