From: Adrien Mazarguil
Subject: Re: [RFC] Generic flow director/filtering/classification API
Date: Fri, 8 Jul 2016 15:03:10 +0200
Message-ID: <20160708130310.GD7621@6wind.com>
References: <20160705181646.GO7621@6wind.com> <2EF2F5C0CC56984AA024D0B180335FCB13DEA331@IRSMSX102.ger.corp.intel.com>
In-Reply-To: <2EF2F5C0CC56984AA024D0B180335FCB13DEA331@IRSMSX102.ger.corp.intel.com>
To: "Chandran, Sugesh"
Cc: "dev@dpdk.org", Thomas Monjalon, "Zhang, Helin", "Wu, Jingjing",
 Rasesh Mody, Ajit Khaparde, Rahul Lakkireddy, "Lu, Wenzhuo", Jan Medala,
 John Daley, "Chen, Jing D", "Ananyev, Konstantin", Matej Vido,
 Alejandro Lucero, Sony Chacko, Jerin Jacob, "De Lara Guarch, Pablo",
 Olga Shern

Hi Sugesh,

On Thu, Jul 07, 2016 at 11:15:07PM +0000, Chandran, Sugesh wrote:
> Hi Adrien,
>
> Thank you for proposing this. It would be really useful for applications
> such as OVS-DPDK.
> Please find my comments and questions inline below prefixed with [Sugesh].
> Most of them are from the perspective of enabling these APIs in
> applications such as OVS-DPDK.

Thanks, I'm replying below.

> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Adrien Mazarguil
> > Sent: Tuesday, July 5, 2016 7:17 PM
> > To: dev@dpdk.org
> > Cc: Thomas Monjalon; Zhang, Helin; Wu, Jingjing; Rasesh Mody;
> > Ajit Khaparde; Rahul Lakkireddy; Lu, Wenzhuo; Jan Medala; John Daley;
> > Chen, Jing D; Ananyev, Konstantin; Matej Vido; Alejandro Lucero;
> > Sony Chacko; Jerin Jacob; De Lara Guarch, Pablo; Olga Shern
> > Subject: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
> >
> > Hi All,
> >
> > First, forgive me for this large message, I know our mailboxes already
> > suffer quite a bit from the amount of traffic on this ML.
> >
> > This is not exactly yet another thread about how flow director should be
> > extended, rather about a brand new API to handle filtering and
> > classification for incoming packets in the most PMD-generic and
> > application-friendly fashion we can come up with. Reasons described below.
> >
> > I think this topic is important enough to include both the users of this
> > API as well as PMD maintainers. So far I have CC'ed librte_ether
> > (especially rte_eth_ctrl.h contributors), testpmd and PMD maintainers
> > (with and without a .filter_ctrl implementation), but if you know
> > application maintainers other than testpmd who use FDIR or might be
> > interested in this discussion, feel free to add them.
> >
> > The issues we found with the current approach are already summarized in
> > the following document, but here is a quick summary for TL;DR folks:
> >
> > - PMDs do not expose a common set of filter types and even when they do,
> >   their behavior more or less differs.
> >
> > - Applications need to determine and adapt to device-specific limitations
> >   and quirks on their own, without help from PMDs.
> >
> > - Writing an application that creates flow rules targeting all devices
> >   supported by DPDK is thus difficult, if not impossible.
> >
> > - The current API has too many unspecified areas (particularly regarding
> >   side effects of flow rules) that make PMD implementation tricky.
> >
> > This RFC API handles everything currently supported by .filter_ctrl, the
> > idea being to reimplement all of these to make them fully usable by
> > applications in a more generic and well defined fashion. It has a very
> > small set of mandatory features and an easy method to let applications
> > probe for supported capabilities.
> >
> > The only downside is more work for the software control side of PMDs
> > because they have to adapt to the API instead of the reverse. I think
> > helpers can be added to EAL to assist with this.
> >
> > HTML version:
> >
> >  https://rawgit.com/6WIND/rte_flow/master/rte_flow.html
> >
> > PDF version:
> >
> >  https://rawgit.com/6WIND/rte_flow/master/rte_flow.pdf
> >
> > Related draft header file (for reference while reading the specification):
> >
> >  https://raw.githubusercontent.com/6WIND/rte_flow/master/rte_flow.h
> >
> > Git tree for completeness (latest .rst version can be retrieved from here):
> >
> >  https://github.com/6WIND/rte_flow
> >
> > What follows is the ReST source of the above, for inline comments and
> > discussion. I intend to update that specification accordingly.
> >
> > ========================
> > Generic filter interface
> > ========================
> >
> > .. footer::
> >
> >    v0.6
> >
> > .. contents::
> > .. sectnum::
> > .. raw:: pdf
> >
> >    PageBreak
> >
> > Overview
> > ========
> >
> > DPDK provides several competing interfaces added over time to perform
> > packet matching and related actions such as filtering and classification.
> >
> > They must be extended to implement the features supported by newer devices
> > in order to expose them to applications, however the current design has
> > several drawbacks:
> >
> > - Complicated filter combinations which have not been hard-coded cannot be
> >   expressed.
> > - Prone to API/ABI breakage when new features must be added to an existing
> >   filter type, which frequently happens.
> >
> > From an application point of view:
> >
> > - Having disparate interfaces, all optional and lacking in features does
> >   not make this API easy to use.
> > - Seemingly arbitrary built-in limitations of filter types based on the
> >   device they were initially designed for.
> > - Undefined relationship between different filter types.
> > - High complexity, considerable undocumented and/or undefined behavior.
> >
> > Considering the growing number of devices supported by DPDK, adding a new
> > filter type each time a new feature must be implemented is not sustainable
> > in the long term. Applications not written to target a specific device
> > cannot really benefit from such an API.
> >
> > For these reasons, this document defines an extensible unified API that
> > encompasses and supersedes these legacy filter types.
> >
> > .. raw:: pdf
> >
> >    PageBreak
> >
> > Current API
> > ===========
> >
> > Rationale
> > ---------
> >
> > The reason several competing (and mostly overlapping) filtering APIs are
> > present in DPDK is due to its nature as a thin layer between hardware and
> > software.
> >
> > Each subsequent interface has been added to better match the capabilities
> > and limitations of the latest supported device, which usually happened to
> > need an incompatible configuration approach. Because of this, many ended
> > up device-centric and not usable by applications that were not written for
> > that particular device.
> >
> > This document is not the first attempt to address this proliferation
> > issue; in fact, a lot of work has already been done to create a more
> > generic interface while somewhat keeping compatibility with legacy ones
> > through a common call interface (``rte_eth_dev_filter_ctrl()`` with the
> > ``.filter_ctrl`` PMD callback in ``rte_ethdev.h``).
> >
> > Today, these previously incompatible interfaces are known as filter types
> > (``RTE_ETH_FILTER_*`` from ``enum rte_filter_type`` in ``rte_eth_ctrl.h``).
> >
> > However, while trivial to extend with new types, it only shifted the
> > underlying problem as applications still need to be written for one kind
> > of filter type, which, as described in the following sections, is not
> > necessarily implemented by all PMDs that support filtering.
> >
> > .. raw:: pdf
> >
> >    PageBreak
> >
> > Filter types
> > ------------
> >
> > This section summarizes the capabilities of each filter type.
> >
> > Although the following list is exhaustive, the description of individual
> > types may contain inaccuracies due to the lack of documentation or usage
> > examples.
> >
> > Note: names are prefixed with ``RTE_ETH_FILTER_``.
> >
> > ``MACVLAN``
> > ~~~~~~~~~~~
> >
> > Matching:
> >
> > - L2 source/destination addresses.
> > - Optional 802.1Q VLAN ID.
> > - Masking individual fields on a rule basis is not supported.
> >
> > Action:
> >
> > - Packets are redirected either to a given VF device using its ID or to
> >   the PF.
> >
> > ``ETHERTYPE``
> > ~~~~~~~~~~~~~
> >
> > Matching:
> >
> > - L2 source/destination addresses (optional).
> > - Ethertype (no VLAN ID?).
> > - Masking individual fields on a rule basis is not supported.
> >
> > Action:
> >
> > - Receive packets on a given queue.
> > - Drop packets.
> >
> > ``FLEXIBLE``
> > ~~~~~~~~~~~~
> >
> > Matching:
> >
> > - At most 128 consecutive bytes anywhere in packets.
> > - Masking is supported with byte granularity.
> > - Priorities are supported (relative to this filter type, undefined
> >   otherwise).
> >
> > Action:
> >
> > - Receive packets on a given queue.
> >
> > ``SYN``
> > ~~~~~~~
> >
> > Matching:
> >
> > - TCP SYN packets only.
> > - One high priority bit can be set to give the highest possible priority
> >   to this type when other filters with different types are configured.
> >
> > Action:
> >
> > - Receive packets on a given queue.
> >
> > ``NTUPLE``
> > ~~~~~~~~~~
> >
> > Matching:
> >
> > - Source/destination IPv4 addresses (optional in 2-tuple mode).
> > - Source/destination TCP/UDP port (mandatory in 2 and 5-tuple modes).
> > - L4 protocol (2 and 5-tuple modes).
> > - Masking individual fields is supported.
> > - TCP flags.
> > - Up to 7 levels of priority relative to this filter type, undefined
> >   otherwise.
> > - No IPv6.
> >
> > Action:
> >
> > - Receive packets on a given queue.
> >
> > ``TUNNEL``
> > ~~~~~~~~~~
> >
> > Matching:
> >
> > - Outer L2 source/destination addresses.
> > - Inner L2 source/destination addresses.
> > - Inner VLAN ID.
> > - IPv4/IPv6 source (destination?) address.
> > - Tunnel type to match (VXLAN, GENEVE, TEREDO, NVGRE, IP over GRE, 802.1BR
> >   E-Tag).
> > - Tenant ID for tunneling protocols that have one.
> > - Any combination of the above can be specified.
> > - Masking individual fields on a rule basis is not supported.
> >
> > Action:
> >
> > - Receive packets on a given queue.
> >
> > .. raw:: pdf
> >
> >    PageBreak
> >
> > ``FDIR``
> > ~~~~~~~~
> >
> > Queries:
> >
> > - Device capabilities and limitations.
> > - Device statistics about configured filters (resource usage, collisions).
> > - Device configuration (matching input set and masks).
> >
> > Matching:
> >
> > - Device mode of operation: none (to disable filtering), signature
> >   (hash-based dispatching from masked fields) or perfect (either MAC VLAN
> >   or tunnel).
> > - L2 Ethertype.
> > - Outer L2 destination address (MAC VLAN mode).
> > - Inner L2 destination address, tunnel type (NVGRE, VXLAN) and tunnel ID
> >   (tunnel mode).
> > - IPv4 source/destination addresses, ToS, TTL and protocol fields.
> > - IPv6 source/destination addresses, TC, protocol and hop limits fields.
> > - UDP source/destination IPv4/IPv6 and ports.
> > - TCP source/destination IPv4/IPv6 and ports.
> > - SCTP source/destination IPv4/IPv6, ports and verification tag field.
> > - Note, only one protocol type at once (either only L2 Ethertype, basic
> >   IPv6, IPv4+UDP, IPv4+TCP and so on).
> > - VLAN TCI (extended API).
> > - At most 16 bytes to match in payload (extended API). A global device
> >   look-up table specifies for each possible protocol layer (unknown, raw,
> >   L2, L3, L4) the offset to use for each byte (they do not need to be
> >   contiguous) and the related bitmask.
> > - Whether packet is addressed to PF or VF, in that case its ID can be
> >   matched as well (extended API).
> > - Masking most of the above fields is supported, but simultaneously
> >   affects all filters configured on a device.
> > - Input set can be modified in a similar fashion for a given device to
> >   ignore individual fields of filters (e.g. do not match the destination
> >   address in an IPv4 filter, refer to **RTE_ETH_INPUT_SET_**
> >   macros). Configuring this also affects RSS processing on **i40e**.
> > - Filters can also provide 32 bits of arbitrary data to return as part of
> >   matched packets.
> >
> > Action:
> >
> > - **RTE_ETH_FDIR_ACCEPT**: receive (accept) packet on a given queue.
> > - **RTE_ETH_FDIR_REJECT**: drop packet immediately.
> > - **RTE_ETH_FDIR_PASSTHRU**: similar to accept for the last filter in list,
> >   otherwise process it with subsequent filters.
> > - For accepted packets and if requested by filter, either 32 bits of
> >   arbitrary data and four bytes of matched payload (only in case of flex
> >   bytes matching), or eight bytes of matched payload (flex also) are added
> >   to meta data.
> >
> > .. raw:: pdf
> >
> >    PageBreak
> >
> > ``HASH``
> > ~~~~~~~~
> >
> > Not an actual filter type. Provides and retrieves the global device
> > configuration (per port or entire NIC) for hash functions and their
> > properties.
> >
> > Hash function selection: "default" (keep current), XOR or Toeplitz.
> >
> > This function can be configured per flow type (**RTE_ETH_FLOW_**
> > definitions), supported types are:
> >
> > - Unknown.
> > - Raw.
> > - Fragmented or non-fragmented IPv4.
> > - Non-fragmented IPv4 with L4 (TCP, UDP, SCTP or other).
> > - Fragmented or non-fragmented IPv6.
> > - Non-fragmented IPv6 with L4 (TCP, UDP, SCTP or other).
> > - L2 payload.
> > - IPv6 with extensions.
> > - IPv6 with L4 (TCP, UDP) and extensions.
> >
> > ``L2_TUNNEL``
> > ~~~~~~~~~~~~~
> >
> > Matching:
> >
> > - All packets received on a given port.
> >
> > Action:
> >
> > - Add tunnel encapsulation (VXLAN, GENEVE, TEREDO, NVGRE, IP over GRE,
> >   802.1BR E-Tag) using the provided Ethertype and tunnel ID (only E-Tag
> >   is implemented at the moment).
> > - VF ID to use for tag insertion (currently unused).
> > - Destination pool for tag based forwarding (pools are IDs that can be
> >   assigned to ports, duplication occurs if the same ID is shared by
> >   several ports of the same NIC).
> >
> > .. raw:: pdf
> >
> >    PageBreak
> >
> > Driver support
> > --------------
> >
> > ======== ======= ========= ======== === ====== ====== ==== ==== =========
> > Driver   MACVLAN ETHERTYPE FLEXIBLE SYN NTUPLE TUNNEL FDIR HASH L2_TUNNEL
> > ======== ======= ========= ======== === ====== ====== ==== ==== =========
> > bnx2x
> > cxgbe
> > e1000            yes       yes      yes yes
> > ena
> > enic                                                  yes
> > fm10k
> > i40e     yes     yes                           yes    yes  yes
> > ixgbe            yes                yes yes           yes       yes
> > mlx4
> > mlx5                                                  yes
> > szedata2
> > ======== ======= ========= ======== === ====== ====== ==== ==== =========
> >
> > Flow director
> > -------------
> >
> > Flow director (FDIR) is the name of the most capable filter type, which
> > covers most features offered by others. As such, it is the most widespread
> > in PMDs that support filtering (i.e. all of them besides **e1000**).
> >
> > It is also the only type that allows an arbitrary 32 bits value provided
> > by applications to be attached to a filter and returned with matching
> > packets instead of relying on the destination queue to recognize flows.
> >
> > Unfortunately, even FDIR requires applications to be aware of low-level
> > capabilities and limitations (most of which come directly from **ixgbe**
> > and **i40e**):
> >
> > - Bitmasks are set globally per device (port?), not per filter.
> [Sugesh] This means an application cannot define filters that match on
> arbitrary different offsets? If that's the case, I assume the application
> has to program the bitmask in advance. Otherwise how does the API framework
> deduce this bitmask information from the rules? It's not very clear to me
> how an application passes down the bitmask information for multiple
> filters on the same port.

This is my understanding of how flow director currently works, perhaps
someone more familiar with it can answer this question better than I could.

Let me take an example: if a particular device can only handle a single IPv4
mask common to all flow rules (say only to match destination addresses),
updating that mask to also match the source address affects all defined and
future flow rules simultaneously.

That is how FDIR currently works and I think it is wrong, as it penalizes
devices that do support individual bit-masks per rule, and is a little
awkward from an application point of view.

What I suggest for the new API instead is the ability to specify one
bit-mask per rule, and let the PMD deal with HW limitations by automatically
configuring global bitmasks from the first added rule, then refusing to add
subsequent rules if they specify a conflicting bit-mask. Existing rules
remain unaffected that way, and applications do not have to be extra
cautious.

> > - Configuration state is not expected to be saved by the driver, and
> >   stopping/restarting a port requires the application to perform it again
> >   (API documentation is also unclear about this).
> > - Monolithic approach with ABI issues as soon as a new kind of flow or
> >   combination needs to be supported.
> > - Cryptic global statistics/counters.
> > - Unclear about how priorities are managed; filters seem to be arranged as
> >   a linked list in hardware (possibly related to configuration order).
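
To make the bit-mask handling I suggest in my reply above more concrete,
here is a minimal sketch of what a PMD could do on a device limited to one
global IPv4 mask. All names (``struct port_masks``, ``struct rule_mask``,
``port_add_rule()``) are made up for illustration and are not part of the
draft header; real hardware would of course need more state than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-port state for a device with one global IPv4 mask. */
struct port_masks {
	bool in_use;        /* set once the first rule has been accepted */
	uint32_t ipv4_src;  /* global source address mask */
	uint32_t ipv4_dst;  /* global destination address mask */
	unsigned int rules; /* number of rules relying on that mask */
};

/* Hypothetical per-rule mask, as provided by the application. */
struct rule_mask {
	uint32_t ipv4_src;
	uint32_t ipv4_dst;
};

/*
 * Return 0 when the rule can be added, -1 when its mask conflicts with
 * the one already programmed from an earlier rule.
 */
static int
port_add_rule(struct port_masks *port, const struct rule_mask *mask)
{
	assert(port != NULL && mask != NULL);
	if (!port->in_use) {
		/* First rule: configure the global mask from it. */
		port->ipv4_src = mask->ipv4_src;
		port->ipv4_dst = mask->ipv4_dst;
		port->in_use = true;
	} else if (port->ipv4_src != mask->ipv4_src ||
		   port->ipv4_dst != mask->ipv4_dst) {
		/* Conflicting mask: refuse, existing rules stay untouched. */
		return -1;
	}
	++port->rules;
	return 0;
}
```

With this approach a second rule using the same mask is accepted, while a
rule that also wants to match the source address is rejected instead of
silently changing what earlier rules match.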
> >
> > Packet alteration
> > -----------------
> >
> > One interesting feature is that the L2 tunnel filter type implements the
> > ability to alter incoming packets through a filter (in this case to
> > encapsulate them), thus the **mlx5** flow encap/decap features are not a
> > foreign concept.
> >
> > .. raw:: pdf
> >
> >    PageBreak
> >
> > Proposed API
> > ============
> >
> > Terminology
> > -----------
> >
> > - **Filtering API**: overall framework affecting the fate of selected
> >   packets, covers everything described in this document.
> > - **Matching pattern**: properties to look for in received packets, a
> >   combination of any number of items.
> > - **Pattern item**: part of a pattern that either matches packet data
> >   (protocol header, payload or derived information), or specifies
> >   properties of the pattern itself.
> > - **Actions**: what needs to be done when a packet matches a pattern.
> > - **Flow rule**: this is the result of combining a *matching pattern* with
> >   *actions*.
> > - **Filter rule**: a less generic term than *flow rule*, can otherwise be
> >   used interchangeably.
> > - **Hit**: a flow rule is said to be *hit* when processing a matching
> >   packet.
> >
> > Requirements
> > ------------
> >
> > As described in the previous section, there is a growing need for a common
> > method to configure filtering and related actions in a hardware
> > independent fashion.
> >
> > The filtering API should not disallow any filter combination by design and
> > must remain as simple as possible to use. It can simply be defined as a
> > method to perform one or several actions on selected packets.
> >
> > PMDs are aware of the capabilities of the device they manage and should be
> > responsible for preventing unsupported or conflicting combinations.
> >
> > This approach is fundamentally different as it places most of the burden
> > on the software side of the PMD instead of having device capabilities
> > directly mapped to API functions, then expecting applications to work
> > around ensuing compatibility issues.
> >
> > Requirements for a new API:
> >
> > - Flexible and extensible without causing API/ABI problems for existing
> >   applications.
> > - Should be unambiguous and easy to use.
> > - Support existing filtering features and actions listed in `Filter types`_.
> > - Support packet alteration.
> > - In case of overlapping filters, their priority should be well documented.
> > - Support filter queries (for example to retrieve counters).
> >
> > .. raw:: pdf
> >
> >    PageBreak
> >
> > High level design
> > -----------------
> >
> > The chosen approach to make filtering as generic as possible is by
> > expressing matching patterns through lists of items instead of the flat
> > structures used in DPDK today, enabling combinations that are not
> > predefined and thus being more versatile.
> >
> > Flow rules can have several distinct actions (such as counting,
> > encapsulating, decapsulating before redirecting packets to a particular
> > queue, etc.), instead of relying on several rules to achieve this and
> > having applications deal with hardware implementation details regarding
> > their order.
> >
> > Support for different priority levels on a rule basis is provided, for
> > example in order to force a more specific rule to come before a more
> > generic one for packets matched by both. However, hardware support for
> > more than a single priority level cannot be guaranteed. When supported,
> > the number of available priority levels is usually low, which is why they
> > can also be implemented in software by PMDs (e.g. to simulate missing
> > priority levels by reordering rules).
> >
> > In order to remain as hardware agnostic as possible, by default all rules
> > are considered to have the same priority, which means that the order
> > between overlapping rules (when a packet is matched by several filters) is
> > undefined, packet duplication may even occur as a result.
> >
> > PMDs may refuse to create overlapping rules at a given priority level when
> > they can be detected (e.g. if a pattern matches an existing filter).
> >
> > Thus predictable results for a given priority level can only be achieved
> > with non-overlapping rules, using perfect matching on all protocol layers.
> >
> > Support for multiple actions per rule may be implemented internally on top
> > of non-default hardware priorities, as a result both features may not be
> > simultaneously available to applications.
> >
> > Considering that allowed pattern/actions combinations cannot be known in
> > advance and would result in an impractically large number of capabilities
> > to expose, a method is provided to validate a given rule from the current
> > device configuration state without actually adding it (akin to a "dry run"
> > mode).
> >
> > This enables applications to check if the rule types they need are
> > supported at initialization time, before starting their data path. This
> > method can be used anytime, its only requirement being that the resources
> > needed by a rule must exist (e.g. a target RX queue must be configured
> > first).
> >
> > Each defined rule is associated with an opaque handle managed by the PMD,
> > applications are responsible for keeping it. These can be used for queries
> > and rules management, such as retrieving counters or other data and
> > destroying them.
> >
> > Handles must be destroyed before releasing associated resources such as
> > queues.
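
For illustration, the "dry run" validation and handle life cycle described
above could look like the following sketch. It is deliberately not the draft
API: ``struct port``, ``struct rule``, ``rule_validate()``,
``rule_create()`` and ``rule_destroy()`` are hypothetical names, and a rule
is reduced to the only resource it depends on here, a target RX queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NB_QUEUES 4 /* arbitrary size for this sketch */

/* Hypothetical port state: configured RX queues and live rule count. */
struct port {
	bool queue_ready[NB_QUEUES];
	unsigned int nb_rules;
};

/* Hypothetical rule, reduced to its target RX queue. */
struct rule {
	unsigned int queue;
};

/* Opaque to applications, which only keep the pointer around. */
struct rule_handle {
	struct port *port;
	unsigned int queue;
};

/* "Dry run": check the rule against the current state, change nothing. */
static int
rule_validate(const struct port *port, const struct rule *rule)
{
	assert(port != NULL && rule != NULL);
	if (rule->queue >= NB_QUEUES || !port->queue_ready[rule->queue])
		return -1;
	return 0;
}

/* Perform the same checks as rule_validate(), then allocate a handle. */
static struct rule_handle *
rule_create(struct port *port, const struct rule *rule)
{
	struct rule_handle *handle;

	if (rule_validate(port, rule) != 0)
		return NULL;
	handle = malloc(sizeof(*handle));
	if (handle == NULL)
		return NULL;
	handle->port = port;
	handle->queue = rule->queue;
	++port->nb_rules;
	return handle;
}

/* To be called before the target queue is released. */
static void
rule_destroy(struct rule_handle *handle)
{
	assert(handle != NULL);
	--handle->port->nb_rules;
	free(handle);
}
```

Validation fails as long as the target queue is not configured and succeeds
afterwards without consuming anything, so an application can probe for the
rules it needs at initialization time and only create them later.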
> >
> > Integration
> > -----------
> >
> > To avoid ABI breakage, this new interface will be implemented through the
> > existing filtering control framework (``rte_eth_dev_filter_ctrl()``) using
> > **RTE_ETH_FILTER_GENERIC** as a new filter type.
> >
> > However a public front-end API described in `Rules management`_ will
> > be added as the preferred method to use it.
> >
> > Once discussions with the community have converged to a definite API,
> > legacy filter types should be deprecated and a deadline defined to remove
> > their support entirely.
> >
> > PMDs will have to be gradually converted to **RTE_ETH_FILTER_GENERIC** or
> > drop filtering support entirely. Less maintained PMDs for older hardware
> > may lose support at this point.
> >
> > The notion of filter type will then be deprecated and subsequently dropped
> > to avoid confusion between both frameworks.
> >
> > Implementation details
> > ======================
> >
> > Flow rule
> > ---------
> >
> > A flow rule is the combination of a matching pattern with a list of
> > actions, and is the basis of this API.
> >
> > Priorities
> > ~~~~~~~~~~
> >
> > A priority can be assigned to a matching pattern.
> >
> > The default priority level is 0 and is also the highest. Support for more
> > than a single priority level in hardware is not guaranteed.
> >
> > If a packet is matched by several filters at a given priority level, the
> > outcome is undefined. It can take any path and can even be duplicated.
> >
> > Matching pattern
> > ~~~~~~~~~~~~~~~~
> >
> > A matching pattern comprises any number of items of various types.
> >
> > Items are arranged in a list to form a matching pattern for packets. They
> > fall in two categories:
> >
> > - Protocol matching (ANY, RAW, ETH, IPV4, IPV6, ICMP, UDP, TCP, VXLAN and
> >   so on), usually associated with a specification structure. These must be
> >   stacked in the same order as the protocol layers to match, starting from
> >   L2.
> >
> > - Affecting how the pattern is processed (END, VOID, INVERT, PF, VF,
> >   SIGNATURE and so on), often without a specification structure. Since
> >   they are meta data that does not match packet contents, these can be
> >   specified anywhere within item lists without affecting the protocol
> >   matching items.
> >
> > Most item specifications can be optionally paired with a mask to narrow
> > the specific fields or bits to be matched.
> >
> > - Items are defined with ``struct rte_flow_item``.
> > - Patterns are defined with ``struct rte_flow_pattern``.
> >
> > Example of an item specification matching an Ethernet header:
> >
> > +-----------------------------------------+
> > | Ethernet                                |
> > +==========+=========+====================+
> > | ``spec`` | ``src`` | ``00:01:02:03:04`` |
> > |          +---------+--------------------+
> > |          | ``dst`` | ``00:2a:66:00:01`` |
> > +----------+---------+--------------------+
> > | ``mask`` | ``src`` | ``00:ff:ff:ff:00`` |
> > |          +---------+--------------------+
> > |          | ``dst`` | ``00:00:00:00:ff`` |
> > +----------+---------+--------------------+
> >
> > Non-masked bits stand for any value, Ethernet headers with the following
> > properties are thus matched:
> >
> > - ``src``: ``??:01:02:03:??``
> > - ``dst``: ``??:??:??:??:01``
> >
> > Except for meta types that do not need one, ``spec`` must be a valid
> > pointer to a structure of the related item type. A ``mask`` of the same
> > type can be provided to tell which bits in ``spec`` are to be matched.
> >
> > A mask is normally only needed for ``spec`` fields matching packet data,
> > ignored otherwise. See individual item types for more information.
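
The ``spec``/``mask`` semantics above boil down to a bit-wise comparison
restricted to masked bits. A minimal sketch, assuming fields are handled as
plain byte strings (``masked_match()`` is a hypothetical helper, not
something defined by the draft header):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true when data and spec are identical on all bits set in mask;
 * bits cleared in mask stand for any value.
 */
static bool
masked_match(const uint8_t *data, const uint8_t *spec,
	     const uint8_t *mask, size_t len)
{
	size_t i;

	assert(data != NULL && spec != NULL && mask != NULL);
	for (i = 0; i != len; ++i)
		if ((data[i] ^ spec[i]) & mask[i])
			return false;
	return true;
}
```

Using the ``src`` bytes exactly as written in the table, an address such as
``aa:01:02:03:bb`` is matched because only the three middle bytes are
compared, while ``aa:01:02:04:bb`` is not.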
> >
> > A ``NULL`` mask pointer is allowed and is similar to matching the ``spec``
> > fields supported by hardware with a full mask (all ones); the remaining
> > fields are ignored (all zeroes). There is thus no error checking for
> > unsupported fields.
> >
> > Matching pattern items for packet data must be naturally stacked (ordered
> > from lowest to highest protocol layer), as in the following examples:
> >
> > +--------------+
> > | TCPv4 as L4  |
> > +===+==========+
> > | 0 | Ethernet |
> > +---+----------+
> > | 1 | IPv4     |
> > +---+----------+
> > | 2 | TCP      |
> > +---+----------+
> >
> > +----------------+
> > | TCPv6 in VXLAN |
> > +===+============+
> > | 0 | Ethernet   |
> > +---+------------+
> > | 1 | IPv4       |
> > +---+------------+
> > | 2 | UDP        |
> > +---+------------+
> > | 3 | VXLAN      |
> > +---+------------+
> > | 4 | Ethernet   |
> > +---+------------+
> > | 5 | IPv6       |
> > +---+------------+
> > | 6 | TCP        |
> > +---+------------+
> >
> > +-----------------------------+
> > | TCPv4 as L4 with meta items |
> > +===+=========================+
> > | 0 | VOID                    |
> > +---+-------------------------+
> > | 1 | Ethernet                |
> > +---+-------------------------+
> > | 2 | VOID                    |
> > +---+-------------------------+
> > | 3 | IPv4                    |
> > +---+-------------------------+
> > | 4 | TCP                     |
> > +---+-------------------------+
> > | 5 | VOID                    |
> > +---+-------------------------+
> > | 6 | VOID                    |
> > +---+-------------------------+
> >
> > The above example shows how meta items do not affect packet data matching
> > items, as long as those remain stacked properly. The resulting matching
> > pattern is identical to "TCPv4 as L4".
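
As a sketch of why "TCPv4 as L4 with meta items" and "TCPv4 as L4" end up
identical, a PMD could simply skip VOID entries while walking both lists.
The enum values and the ``patterns_equivalent()`` helper below are
hypothetical and only model item types, not their ``spec``/``mask``; the one
thing taken from the specification is that the end marker has value 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical item types, the end marker being 0 as in the spec. */
enum item_type {
	ITEM_END = 0,
	ITEM_VOID,
	ITEM_ETH,
	ITEM_IPV4,
	ITEM_TCP,
};

/* Advance past placeholder items, which carry no matching information. */
static const enum item_type *
skip_void(const enum item_type *item)
{
	while (*item == ITEM_VOID)
		++item;
	return item;
}

/* Compare two END-terminated patterns while ignoring VOID items. */
static bool
patterns_equivalent(const enum item_type *a, const enum item_type *b)
{
	assert(a != NULL && b != NULL);
	for (;;) {
		a = skip_void(a);
		b = skip_void(b);
		if (*a != *b)
			return false;
		if (*a == ITEM_END)
			return true;
		++a;
		++b;
	}
}
```

Both lists must be terminated by the end marker, otherwise the walk runs
past the end of the arrays.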
> >
> > +----------------+
> > | UDPv6 anywhere |
> > +===+============+
> > | 0 | IPv6       |
> > +---+------------+
> > | 1 | UDP        |
> > +---+------------+
> >
> > If supported by the PMD, omitting one or several protocol layers at the
> > bottom of the stack as in the above example (missing an Ethernet
> > specification) enables hardware to look anywhere in packets.
> >
> > It is unspecified whether the payload of supported encapsulations
> > (e.g. VXLAN inner packet) is matched by such a pattern, which may apply to
> > inner, outer or both packets.
> >
> > +---------------------+
> > | Invalid, missing L3 |
> > +===+=================+
> > | 0 | Ethernet        |
> > +---+-----------------+
> > | 1 | UDP             |
> > +---+-----------------+
> >
> > The above pattern is invalid due to a missing L3 specification between L2
> > and L4. Omitting layers is only allowed at the bottom and at the top of
> > the stack.
> >
> > Meta item types
> > ~~~~~~~~~~~~~~~
> >
> > These do not match packet data but affect how the pattern is processed,
> > most of them do not need a specification structure. This particularity
> > allows them to be specified anywhere without affecting other item types.
> >
> > ``END``
> > ^^^^^^^
> >
> > End marker for item lists. Prevents further processing of items, thereby
> > ending the pattern.
> >
> > - Its numeric value is **0** for convenience.
> > - PMD support is mandatory.
> > - Both ``spec`` and ``mask`` are ignored.
> >
> > +--------------------+
> > | END                |
> > +==========+=========+
> > | ``spec`` | ignored |
> > +----------+---------+
> > | ``mask`` | ignored |
> > +----------+---------+
> >
> > ``VOID``
> > ^^^^^^^^
> >
> > Used as a placeholder for convenience. It is ignored and simply discarded
> > by PMDs.
> >
> > - PMD support is mandatory.
> > - Both ``spec`` and ``mask`` are ignored.
> >
> > +--------------------+
> > | VOID               |
> > +==========+=========+
> > | ``spec`` | ignored |
> > +----------+---------+
> > | ``mask`` | ignored |
> > +----------+---------+
> >
> > One usage example for this type is generating rules that share a common
> > prefix quickly without reallocating memory, only by updating item types:
> >
> > +------------------------+
> > | TCP, UDP or ICMP as L4 |
> > +===+====================+
> > | 0 | Ethernet           |
> > +---+--------------------+
> > | 1 | IPv4               |
> > +---+------+------+------+
> > | 2 | UDP  | VOID | VOID |
> > +---+------+------+------+
> > | 3 | VOID | TCP  | VOID |
> > +---+------+------+------+
> > | 4 | VOID | VOID | ICMP |
> > +---+------+------+------+
> >
> > .. raw:: pdf
> >
> >    PageBreak
> >
> > ``INVERT``
> > ^^^^^^^^^^
> >
> > Inverted matching, i.e. process packets that do not match the pattern.
> >
> > - Both ``spec`` and ``mask`` are ignored.
> >
> > +--------------------+
> > | INVERT             |
> > +==========+=========+
> > | ``spec`` | ignored |
> > +----------+---------+
> > | ``mask`` | ignored |
> > +----------+---------+
> >
> > Usage example in order to match non-TCPv4 packets only:
> >
> > +--------------------+
> > | Anything but TCPv4 |
> > +===+================+
> > | 0 | INVERT         |
> > +---+----------------+
> > | 1 | Ethernet       |
> > +---+----------------+
> > | 2 | IPv4           |
> > +---+----------------+
> > | 3 | TCP            |
> > +---+----------------+
> >
> > ``PF``
> > ^^^^^^
> >
> > Matches packets addressed to the physical function of the device.
> >
> > - Both ``spec`` and ``mask`` are ignored.
> >
> > +--------------------+
> > | PF                 |
> > +==========+=========+
> > | ``spec`` | ignored |
> > +----------+---------+
> > | ``mask`` | ignored |
> > +----------+---------+
> >
> > ``VF``
> > ^^^^^^
> >
> > Matches packets addressed to the given virtual function ID of the device.
> >
> > - Only ``spec`` needs to be defined, ``mask`` is ignored.
> >
> > +----------------------------------------+
> > | VF                                     |
> > +==========+=========+===================+
> > | ``spec`` | ``vf``  | destination VF ID |
> > +----------+---------+-------------------+
> > | ``mask`` | ignored                     |
> > +----------+-----------------------------+
> >
> > ``SIGNATURE``
> > ^^^^^^^^^^^^^
> >
> > Requests hash-based signature dispatching for this rule.
> >
> > Considering this is a global setting on devices that support it, all
> > subsequent filter rules may have to be created with it as well.
> >
> > - Only ``spec`` needs to be defined, ``mask`` is ignored.
> >
> > +--------------------+
> > | SIGNATURE          |
> > +==========+=========+
> > | ``spec`` | TBD     |
> > +----------+---------+
> > | ``mask`` | ignored |
> > +----------+---------+
> >
> > .. raw:: pdf
> >
> >    PageBreak
> >
> > Data matching item types
> > ~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > Most of these are basically protocol header definitions with associated
> > bitmasks. They must be specified (stacked) from lowest to highest protocol
> > layer.
> >
> > The following list is not exhaustive as new protocols will be added in the
> > future.
> >
> > ``ANY``
> > ^^^^^^^
> >
> > Matches any protocol in place of the current layer, a single ANY may also
> > stand for several protocol layers.
> >
> > This is usually specified as the first pattern item when looking for a
> > protocol anywhere in a packet.
> >
> > - A maximum value of **0** requests matching any number of protocol layers
> >   above or equal to the minimum value, a maximum value lower than the
> >   minimum one is otherwise invalid.
> > - Only ``spec`` needs to be defined, ``mask`` is ignored.
> >
> > +-----------------------------------------------------------------------+
> > | ANY                                                                   |
> > +==========+=========+==================================================+
> > | ``spec`` | ``min`` | minimum number of layers covered                 |
> > |          +---------+--------------------------------------------------+
> > |          | ``max`` | maximum number of layers covered, 0 for infinity |
> > +----------+---------+--------------------------------------------------+
> > | ``mask`` | ignored                                                    |
> > +----------+------------------------------------------------------------+
> >
> > Example for VXLAN TCP payload matching regardless of outer L3 (IPv4 or
> > IPv6) and L4 (UDP) both matched by the first ANY specification, and inner
> > L3 (IPv4 or IPv6) matched by the second ANY specification:
> >
> > +----------------------------------+
> > | TCP in VXLAN with wildcards      |
> > +===+==============================+
> > | 0 | Ethernet                     |
> > +---+-----+----------+---------+---+
> > | 1 | ANY | ``spec`` | ``min`` | 2 |
> > |   |     |          +---------+---+
> > |   |     |          | ``max`` | 2 |
> > +---+-----+----------+---------+---+
> > | 2 | VXLAN                        |
> > +---+------------------------------+
> > | 3 | Ethernet                     |
> > +---+-----+----------+---------+---+
> > | 4 | ANY | ``spec`` | ``min`` | 1 |
> > |   |     |          +---------+---+
> > |   |     |          | ``max`` | 1 |
> > +---+-----+----------+---------+---+
> > | 5 | TCP                          |
> > +---+------------------------------+
> >
> > .. raw:: pdf
> >
> >    PageBreak
> >
> > ``RAW``
> > ^^^^^^^
> >
> > Matches a string of a given length at a given offset (in bytes), or
> > anywhere in the payload of the current protocol layer (including L2 header
> > if used as the first item in the stack).
> >
> > This does not increment the protocol layer count as it is not a protocol
> > definition. Subsequent RAW items modulate the first absolute one with
> > relative offsets.
> >
> > - Using **-1** as the ``offset`` of the first RAW item makes its absolute
> >   offset not fixed, i.e. the pattern is searched everywhere.
> > - ``mask`` only affects the pattern.
> >
> > +--------------------------------------------------------------+
> > | RAW                                                          |
> > +==========+=============+=====================================+
> > | ``spec`` | ``offset``  | absolute or relative pattern offset |
> > |          +-------------+-------------------------------------+
> > |          | ``length``  | pattern length                      |
> > |          +-------------+-------------------------------------+
> > |          | ``pattern`` | byte string of the above length     |
> > +----------+-------------+-------------------------------------+
> > | ``mask`` | ``offset``  | ignored                             |
> > |          +-------------+-------------------------------------+
> > |          | ``length``  | ignored                             |
> > |          +-------------+-------------------------------------+
> > |          | ``pattern`` | bitmask with the same byte length   |
> > +----------+-------------+-------------------------------------+
> >
> > Example pattern looking for several strings at various offsets of a UDP
> > payload, using combined RAW items:
> >
> > +------------------------------------------+
> > | UDP payload matching                     |
> > +===+======================================+
> > | 0 | Ethernet                             |
> > +---+--------------------------------------+
> > | 1 | IPv4                                 |
> > +---+--------------------------------------+
> > | 2 | UDP                                  |
> > +---+-----+----------+-------------+-------+
> > | 3 | RAW | ``spec`` | ``offset``  | -1    |
> > |   |     |          +-------------+-------+
> > |   |     |          | ``length``  | 3     |
> > |   |     |          +-------------+-------+
> > |   |     |          | ``pattern`` | "foo" |
> > +---+-----+----------+-------------+-------+
> > | 4 | RAW | ``spec`` | ``offset``  | 20    |
> > |   |     |          +-------------+-------+
> > |   |     |          | ``length``  | 3     |
> > |   |     |          +-------------+-------+
> > |   |     |          | ``pattern`` | "bar" |
> > +---+-----+----------+-------------+-------+
> > | 5 | RAW | ``spec`` | ``offset``  | -30   |
> > |   |     |          +-------------+-------+
> > |   |     |          | ``length``  | 3     |
> > |   |     |          +-------------+-------+
> > |   |     |          | ``pattern`` | "baz" |
> > +---+-----+----------+-------------+-------+
> >
> > This translates to:
> >
> > - Locate "foo" in UDP payload, remember its offset.
> > - Check "bar" at "foo"'s offset plus 20 bytes.
> > - Check "baz" at "foo"'s offset minus 30 bytes.
> >
> > .. raw:: pdf
> >
> >    PageBreak
> >
> > ``ETH``
> > ^^^^^^^
> >
> > Matches an Ethernet header.
> >
> > - ``dst``: destination MAC.
> > - ``src``: source MAC.
> > - ``type``: EtherType.
> > - ``tags``: number of 802.1Q/ad tags defined.
> > - ``tag[]``: 802.1Q/ad tag definitions, innermost first. For each one:
> >
> >   - ``tpid``: Tag protocol identifier.
> >   - ``tci``: Tag control information.
> >
> > ``IPV4``
> > ^^^^^^^^
> >
> > Matches an IPv4 header.
> >
> > - ``src``: source IP address.
> > - ``dst``: destination IP address.
> > - ``tos``: ToS/DSCP field.
> > - ``ttl``: TTL field.
> > - ``proto``: protocol number for the next layer.
> >
> > ``IPV6``
> > ^^^^^^^^
> >
> > Matches an IPv6 header.
> >
> > - ``src``: source IP address.
> > - ``dst``: destination IP address.
> > - ``tc``: traffic class field.
> > - ``nh``: Next header field (protocol).
> > - ``hop_limit``: hop limit field (TTL).
> >
> > ``ICMP``
> > ^^^^^^^^
> >
> > Matches an ICMP header.
> >
> > - TBD.
> >
> > ``UDP``
> > ^^^^^^^
> >
> > Matches a UDP header.
> >
> > - ``sport``: source port.
> > - ``dport``: destination port.
> > - ``length``: UDP length.
> > - ``checksum``: UDP checksum.
> >
> > .. raw:: pdf
> >
> >    PageBreak
> >
> > ``TCP``
> > ^^^^^^^
> >
> > Matches a TCP header.
> >
> > - ``sport``: source port.
> > - ``dport``: destination port.
> > - All other TCP fields and bits.
> >
> > ``VXLAN``
> > ^^^^^^^^^
> >
> > Matches a VXLAN header.
> >
> > - TBD.
> >
> > .. raw:: pdf
> >
> >    PageBreak
> >
> > Actions
> > ~~~~~~~
> >
> > Each possible action is represented by a type. Some have associated
> > configuration structures. Several actions combined in a list can be
> > assigned to a flow rule. That list is not ordered.
> >
> > At least one action must be defined in a filter rule in order to do
> > something with matched packets.
> >
> > - Actions are defined with ``struct rte_flow_action``.
> > - A list of actions is defined with ``struct rte_flow_actions``.
> >
> > They fall in three categories:
> >
> > - Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent
> >   processing matched packets by subsequent flow rules, unless overridden
> >   with PASSTHRU.
> >
> > - Non-terminating actions (PASSTHRU, DUP) that leave matched packets up
> >   for additional processing by subsequent flow rules.
> >
> > - Other non-terminating meta actions that do not affect the fate of
> >   packets (END, VOID, ID, COUNT).
> >
> > When several actions are combined in a flow rule, they should all have
> > different types (e.g. dropping a packet twice is not possible). However
> > considering the VOID type is an exception to this rule, the defined
> > behavior is for PMDs to only take into account the last action of a given
> > type found in the list.
> > PMDs still perform error checking on the entire list.
> >
> > *Note that PASSTHRU is the only action able to override a terminating rule.*
> >
> > .. raw:: pdf
> >
> >    PageBreak
> >
> > Example of an action that redirects packets to queue index 10:
> >
> > +----------------+
> > | QUEUE          |
> > +===========+====+
> > | ``queue`` | 10 |
> > +-----------+----+
> >
> > Action list examples. Their order is not significant; applications must
> > consider all actions to be performed simultaneously:
> >
> > +----------------+
> > | Count and drop |
> > +=======+========+
> > | COUNT |        |
> > +-------+--------+
> > | DROP  |        |
> > +-------+--------+
> >
> > +--------------------------+
> > | Tag, count and redirect  |
> > +=======+===========+======+
> > | ID    | ``id``    | 0x2a |
> > +-------+-----------+------+
> > | COUNT |                  |
> > +-------+-----------+------+
> > | QUEUE | ``queue`` | 10   |
> > +-------+-----------+------+
> >
> > +-----------------------+
> > | Redirect to queue 5   |
> > +=======+===============+
> > | DROP  |               |
> > +-------+-----------+---+
> > | QUEUE | ``queue`` | 5 |
> > +-------+-----------+---+
> >
> > In the above example, considering both actions are performed
> > simultaneously, its end result is that only QUEUE has any effect.
> >
> > +-----------------------+
> > | Redirect to queue 3   |
> > +=======+===========+===+
> > | QUEUE | ``queue`` | 5 |
> > +-------+-----------+---+
> > | VOID  |               |
> > +-------+-----------+---+
> > | QUEUE | ``queue`` | 3 |
> > +-------+-----------+---+
> >
> > As previously described, only the last action of a given type found in the
> > list is taken into account. The above example also shows that VOID is
> > ignored.
> >
> > .. raw:: pdf
> >
> >    PageBreak
> >
> > Action types
> > ~~~~~~~~~~~~
> >
> > Common action types are described in this section. Like pattern item
> > types, this list is not exhaustive as new actions will be added in the
> > future.
> >
> > ``END`` (action)
> > ^^^^^^^^^^^^^^^^
> >
> > End marker for action lists. Prevents further processing of actions,
> > thereby ending the list.
> >
> > - Its numeric value is **0** for convenience.
> > - PMD support is mandatory.
> > - No configurable property.
> >
> > +---------------+
> > | END           |
> > +===============+
> > | no properties |
> > +---------------+
> >
> > ``VOID`` (action)
> > ^^^^^^^^^^^^^^^^^
> >
> > Used as a placeholder for convenience. It is ignored and simply discarded
> > by PMDs.
> >
> > - PMD support is mandatory.
> > - No configurable property.
> >
> > +---------------+
> > | VOID          |
> > +===============+
> > | no properties |
> > +---------------+
> >
> > ``PASSTHRU``
> > ^^^^^^^^^^^^
> >
> > Leaves packets up for additional processing by subsequent flow rules. This
> > is the default when a rule does not contain a terminating action, but can
> > be specified to force a rule to become non-terminating.
> >
> > - No configurable property.
> >
> > +---------------+
> > | PASSTHRU      |
> > +===============+
> > | no properties |
> > +---------------+
> >
> > Example to copy a packet to a queue and continue processing by subsequent
> > flow rules:
> [Sugesh] If a packet gets copied to a queue, it’s a terminating action.
> How is it possible to perform a subsequent action after the packet has
> already moved to the queue? How does it differ from the DUP action?
> Am I missing anything here?

Devices may not support the combination of QUEUE + PASSTHRU (i.e. making
QUEUE non-terminating).
However these same devices may expose the ability to copy a packet to
another (sniffer) queue all while keeping the rule terminating (QUEUE + DUP
but no PASSTHRU).

DUP with two rules, assuming priorities and PASSTHRU are supported:

- pattern X, priority 0; actions: QUEUE 5, PASSTHRU (non-terminating)

- pattern X, priority 1; actions: QUEUE 6 (terminating)

DUP with two actions on a single rule and a single priority:

- pattern X, priority 0; actions: DUP 5, QUEUE 6 (terminating)

If supported, from an application point of view the end result is similar in
both cases (note the second case may be implemented by the PMD using two HW
rules internally).

However the second case does not waste a priority level and clearly states
the intent to the PMD, which makes it more likely to be supported. If HW
supports DUP directly it is even faster since there is a single rule. That
is why I thought having DUP as an action would be useful.

> > +--------------------------+
> > | Copy to queue 8          |
> > +==========+===============+
> > | PASSTHRU |               |
> > +----------+-----------+---+
> > | QUEUE    | ``queue`` | 8 |
> > +----------+-----------+---+
> >
> > ``ID``
> > ^^^^^^
> >
> > Attaches a 32 bit value to packets.
> >
> > +----------------------------------------------+
> > | ID                                           |
> > +========+=====================================+
> > | ``id`` | 32 bit value to return with packets |
> > +--------+-------------------------------------+
> >
> [Sugesh] I assume the application has to program the flow
> with a unique ID and matching packets are stamped with this ID
> when reporting to the software. The uniqueness of ID is NOT
> guaranteed by the API framework. Correct me if I am wrong here.

You are right, if the way I wrote it is not clear enough, I'm open to
suggestions to improve it.
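Since ID uniqueness is left to the application, here is a minimal sketch of
what an application-side table could look like: it hands out 32 bit IDs and
maps the value reported with a packet back to an application object. All
names and the table size are assumptions made for illustration, none of them
are part of the proposed API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ID_TABLE_SIZE 1024 /* arbitrary capacity for this sketch */

/* Hypothetical application-side table, indexed by flow ID. */
struct id_table {
	void *obj[ID_TABLE_SIZE]; /* NULL means the slot is free */
};

/* Reserve a free ID for obj; returns 0 on success, -1 if the table is full. */
static int
id_table_alloc(struct id_table *t, void *obj, uint32_t *id)
{
	uint32_t i;

	assert(t != NULL && obj != NULL && id != NULL);
	for (i = 0; i < ID_TABLE_SIZE; ++i) {
		if (t->obj[i] == NULL) {
			t->obj[i] = obj;
			*id = i;
			return 0;
		}
	}
	return -1;
}

/* Map an ID reported with a packet back to the application object. */
static void *
id_table_lookup(const struct id_table *t, uint32_t id)
{
	assert(t != NULL);
	return id < ID_TABLE_SIZE ? t->obj[id] : NULL;
}

/* Release an ID once the rule using it has been destroyed. */
static void
id_table_free(struct id_table *t, uint32_t id)
{
	assert(t != NULL);
	if (id < ID_TABLE_SIZE)
		t->obj[id] = NULL;
}
```

The value stored in ``id`` is what would be given to the ID action when
creating the rule; the lookup is a plain array access on the receive side.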
> [Sugesh] Is it a limitation to use only 32 bit ID? Is it possible to have a
> 64 bit ID? So that application can use the control plane flow pointer
> itself as an ID. Does it make sense?

I've specified a 32 bit ID for now because this is what FDIR supports and
also what existing devices can report today AFAIK (i40e and mlx5).

We could use 64 bit for future-proofness in a separate action like "ID64"
when at least one device supports it.

To PMD maintainers: please comment if you know devices that support tagging
matching packets with more than 32 bits of user-provided data!

> > .. raw:: pdf
> >
> >    PageBreak
> >
> > ``QUEUE``
> > ^^^^^^^^^
> >
> > Assigns packets to a given queue index.
> >
> > - Terminating by default.
> >
> > +--------------------------------+
> > | QUEUE                          |
> > +===========+====================+
> > | ``queue`` | queue index to use |
> > +-----------+--------------------+
> >
> > ``DROP``
> > ^^^^^^^^
> >
> > Drop packets.
> >
> > - No configurable property.
> > - Terminating by default.
> > - PASSTHRU overrides this action if both are specified.
> >
> > +---------------+
> > | DROP          |
> > +===============+
> > | no properties |
> > +---------------+
> >
> > ``COUNT``
> > ^^^^^^^^^
> >
> [Sugesh] Should we really have to set count action explicitly for every
> rule? IMHO it would be great to be an implicit action. Most applications
> would be interested in the stats of almost all the filters/flows.

I can see why, but no, it must be explicitly requested because you may want
to know in advance when it is not supported. Also considering it is
something else to be done by HW (a separate action), we can assume enabling
this may slow things down a bit.

HW limitations may also prevent you from having as many flow counters as you
want, in which case you probably want to carefully pick which rules have
them.

I think this action is most useful with DROP, VF and PF actions since those
are currently the only ones where SW may not see the related packets.

> > Enables hits counter for this rule.
> >
> > This counter can be retrieved and reset through ``rte_flow_query()``, see
> > ``struct rte_flow_query_count``.
> >
> > - Counters can be retrieved with ``rte_flow_query()``.
> > - No configurable property.
> >
> > +---------------+
> > | COUNT         |
> > +===============+
> > | no properties |
> > +---------------+
> >
> > Query structure to retrieve and reset the flow rule hits counter:
> >
> > +------------------------------------------------+
> > | COUNT query                                    |
> > +===========+=====+==============================+
> > | ``reset`` | in  | reset counter after query    |
> > +-----------+-----+------------------------------+
> > | ``hits``  | out | number of hits for this flow |
> > +-----------+-----+------------------------------+
> >
> > ``DUP``
> > ^^^^^^^
> >
> > Duplicates packets to a given queue index.
> >
> > This is normally combined with QUEUE, however when used alone, it is
> > actually similar to QUEUE + PASSTHRU.
> >
> > - Non-terminating by default.
> >
> > +------------------------------------------------+
> > | DUP                                            |
> > +===========+====================================+
> > | ``queue`` | queue index to duplicate packet to |
> > +-----------+------------------------------------+
> >
> > .. raw:: pdf
> >
> >    PageBreak
> >
> > ``RSS``
> > ^^^^^^^
> >
> > Similar to QUEUE, except RSS is additionally performed on packets to
> > spread them among several queues according to the provided parameters.
> >
> > - Terminating by default.
> >
> > +---------------------------------------------+
> > | RSS                                         |
> > +==============+==============================+
> > | ``rss_conf`` | RSS parameters               |
> > +--------------+------------------------------+
> > | ``queues``   | number of entries in queue[] |
> > +--------------+------------------------------+
> > | ``queue[]``  | queue indices to use         |
> > +--------------+------------------------------+
> >
> > ``PF`` (action)
> > ^^^^^^^^^^^^^^^
> >
> > Redirects packets to the physical function (PF) of the current device.
> >
> > - No configurable property.
> > - Terminating by default.
> >
> > +---------------+
> > | PF            |
> > +===============+
> > | no properties |
> > +---------------+
> >
> > ``VF`` (action)
> > ^^^^^^^^^^^^^^^
> >
> > Redirects packets to the virtual function (VF) of the current device with
> > the specified ID.
> >
> > - Terminating by default.
> >
> > +---------------------------------------+
> > | VF                                    |
> > +========+==============================+
> > | ``id`` | VF ID to redirect packets to |
> > +--------+------------------------------+
> >
> > Planned types
> > ~~~~~~~~~~~~~
> >
> > Other action types are planned but not defined yet. These actions will add
> > the ability to alter matching packets in several ways, such as performing
> > encapsulation/decapsulation of tunnel headers on specific flows.
> >
> > .. raw:: pdf
> >
> >    PageBreak
> >
> > Rules management
> > ----------------
> >
> > A simple API with only four functions is provided to fully manage flows.
> >
> > Each created flow rule is associated with an opaque, PMD-specific handle
> > pointer. The application is responsible for keeping it until the rule is
> > destroyed.
> >
> > Flow rules are defined with ``struct rte_flow``.
> >
> > Validation
> > ~~~~~~~~~~
> >
> > Given that expressing a definite set of device capabilities with this API
> > is not practical, a dedicated function is provided to check if a flow rule
> > is supported and can be created.
> >
> > ::
> >
> >  int
> >  rte_flow_validate(uint8_t port_id,
> >                    const struct rte_flow_pattern *pattern,
> >                    const struct rte_flow_actions *actions);
> >
> > While this function has no effect on the target device, the flow rule is
> > validated against its current configuration state and the returned value
> > should be considered valid by the caller for that state only.
> >
> > The returned value is guaranteed to remain valid only as long as no
> > successful calls to rte_flow_create() or rte_flow_destroy() are made in
> > the meantime and no device parameter affecting flow rules in any way is
> > modified, due to possible collisions or resource limitations (although in
> > such cases ``EINVAL`` should not be returned).
> >
> > Arguments:
> >
> > - ``port_id``: port identifier of Ethernet device.
> > - ``pattern``: pattern specification to check.
> > - ``actions``: actions associated with the flow definition.
> >
> > Return value:
> >
> > - **0** if flow rule is valid and can be created. A negative errno value
> >   otherwise (``rte_errno`` is also set), the following errors are defined.
> > - ``-EINVAL``: unknown or invalid rule specification.
> > - ``-ENOTSUP``: valid but unsupported rule specification (e.g. partial
> >   masks are unsupported).
> > - ``-EEXIST``: collision with an existing rule.
> > - ``-ENOMEM``: not enough resources.
> >
> > .. raw:: pdf
> >
> >    PageBreak
> >
> > Creation
> > ~~~~~~~~
> >
> > Creating a flow rule is similar to validating one, except the rule is
> > actually created.
> >
> > ::
> >
> >  struct rte_flow *
> >  rte_flow_create(uint8_t port_id,
> >                  const struct rte_flow_pattern *pattern,
> >                  const struct rte_flow_actions *actions);
> >
> > Arguments:
> >
> > - ``port_id``: port identifier of Ethernet device.
> > - ``pattern``: pattern specification to add.
> > - ``actions``: actions associated with the flow definition.
> >
> > Return value:
> >
> > A valid flow pointer in case of success, NULL otherwise and ``rte_errno``
> > is set to the positive version of one of the error codes defined for
> > ``rte_flow_validate()``.
> [Sugesh]: Kind of implementation specific query. What if the application
> tries to add duplicate rules? Does the API create a new flow entry for
> every API call?

If an application adds duplicate rules at a given priority level, the second
one may return an error depending on the PMD. Collisions are sometimes
trivial to detect (such as the same pattern twice), others not so much (one
matching an Ethernet header only, the other one matching an IP header only).

Either way if a packet is matched by two rules at a given priority level,
what happens is described in 3.3 (High level design) and 4.4.1 (Priorities).

Applications are responsible for not relying on the PMD to detect these, or
should use a distinct priority level for each rule to make things clear.

However since the number of HW priority levels is finite and possibly small,
they must also make sure not to waste them. My advice is to only use
priority levels when it cannot be proven that rules do not collide.

If all you have is perfect matching rules without wildcards and all of them
match the same number of layers, a single priority level is fine.

> [Sugesh] Another concern is the cost and time of installing these rules
> in the hardware.
> Can we make these APIs time bound (or at least an option to
> set the time limit to execute these APIs), so that
> the application doesn’t have to wait so long when installing and deleting
> flows with slow hardware/NIC. What do you think? Most of the datapath flow
> installations are dynamic and triggered only when there is
> ingress traffic. Delay in flow insertion/deletion has unpredictable
> consequences.

This API is (currently) aimed at the control path only, and must indeed be
assumed to be slow. Creating millions of rules may take quite long as it may
involve syscalls and other time-consuming synchronization on the PMD side.

So currently there is no plan to have rules added from the data path with
time constraints. I think it would be implemented through a different set of
functions anyway.

I do not think adding time limits is practical. Even specifying in the API
that creating a single flow rule must take less than a maximum number of
seconds in order to be effective is too much of a constraint (applications
that create all flows during init may not care after all).

You should consider in any case that modifying flow rules will always be
slower than receiving packets, there is no way around that. Applications
have to live with it and provide a software fallback for incoming packets
while managing flow rules.

Moreover, think about what happens when you hit the maximum number of flow
rules and cannot create any more. Applications need to implement some kind
of fallback in their data path.

Offloading flows in HW is also only useful if they live much longer than the
time taken to create and delete them. Perhaps applications may choose to do
so after detecting long-lived flows such as TCP sessions.

You may have one separate control thread dedicated to managing flows and
keep your normal control thread unaffected by delays. Several threads can
even be dedicated, one per device.
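As an illustration of the long-lived flows idea above, here is a rough
sketch of how an application could decide, from its software fallback path,
when a flow is worth offloading. The structure, function name and both
thresholds are made up for the example and would need tuning per device;
none of this is part of the proposed API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-flow software state kept by the application. */
struct sw_flow {
	uint64_t first_seen; /* time of the first packet, arbitrary ticks */
	uint64_t packets;    /* packets handled by the software path */
	int offloaded;       /* non-zero once a HW rule has been created */
};

/* Arbitrary thresholds, for illustration only. */
#define OFFLOAD_MIN_AGE     1000u
#define OFFLOAD_MIN_PACKETS 64u

/*
 * Account one packet handled in software at time "now".
 * Returns 1 when the flow has lived long enough and seen enough packets
 * to be worth a HW rule, 0 otherwise (including when already offloaded).
 */
static int
sw_flow_update(struct sw_flow *f, uint64_t now)
{
	assert(f != NULL);
	if (f->packets++ == 0)
		f->first_seen = now;
	if (f->offloaded)
		return 0;
	return now - f->first_seen >= OFFLOAD_MIN_AGE &&
	       f->packets >= OFFLOAD_MIN_PACKETS;
}
```

When this returns 1, the data path would hand the flow over to the control
thread for rule creation and keep processing it in software until the rule
is in place.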
> [Sugesh] Another query is on the synchronization part. What if the same
> rules are handled from different threads? Is the application responsible
> for handling the concurrent hardware programming?

As with most (if not all) DPDK APIs, applications are responsible for
managing locking issues, as described in 4.3 (Behavior). Since this is a
control path API and applications usually have a single control thread,
locking should not be necessary in most cases.

Regarding my above comment about using several control threads to manage
different devices, section 4.3 says:

 "There is no provision for reentrancy/multi-thread safety, although nothing
 should prevent different devices from being configured at the same
 time. PMDs may protect their control path functions accordingly."

I'd like to emphasize it is not "per port" but "per device", since in a few
cases a configurable resource is shared by several ports. It may be
difficult for applications to determine which ports are shared by a given
device but this falls outside the scope of this API.

Do you think adding the guarantee that it is always safe to configure two
different ports simultaneously without locking from the application side is
necessary? In which case the PMD would be responsible for locking shared
resources.

> > Destruction
> > ~~~~~~~~~~~
> >
> > Flow rules destruction is not automatic, and a queue should not be
> > released if any are still attached to it. Applications must take care of
> > performing this step before releasing resources.
> >
> > ::
> >
> >  int
> >  rte_flow_destroy(uint8_t port_id,
> >                   struct rte_flow *flow);
> >
> >
> [Sugesh] I would suggest having a clean-up API is really useful as the
> releasing of Queue (is it applicable for releasing of port too?) is not
> guaranteeing the automatic flow destruction.

Would something like rte_flow_flush(port_id) do the trick?
I wanted to emphasize in this first draft that applications should really
keep the flow pointers around in order to manage/destroy them. It is their
responsibility, not the PMD's.

> This way application can initialize the port,
> clean-up all the existing rules and create new rules on a clean slate.

No resource can be released as long as a flow rule is using it (bad things
may happen otherwise). All flow rules must be destroyed first, thus none can
possibly remain after initializing a port. It is assumed that PMDs do
automatic clean up during init if necessary to ensure this.

> > Failure to destroy a flow rule may occur when other flow rules depend on it,
> > and destroying it would result in an inconsistent state.
> >
> > This function is only guaranteed to succeed if flow rules are destroyed in
> > reverse order of their creation.
> >
> > Arguments:
> >
> > - ``port_id``: port identifier of Ethernet device.
> > - ``flow``: flow rule to destroy.
> >
> > Return value:
> >
> > - **0** on success, a negative errno value otherwise and ``rte_errno`` is
> >   set.
> >
> > .. raw:: pdf
> >
> >    PageBreak
> >
> > Query
> > ~~~~~
> >
> > Query an existing flow rule.
> >
> > This function allows retrieving flow-specific data such as counters. Data
> > is gathered by special actions which must be present in the flow rule
> > definition.
> >
> > ::
> >
> >  int
> >  rte_flow_query(uint8_t port_id,
> >                 struct rte_flow *flow,
> >                 enum rte_flow_action_type action,
> >                 void *data);
> >
> > Arguments:
> >
> > - ``port_id``: port identifier of Ethernet device.
> > - ``flow``: flow rule to query.
> > - ``action``: action type to query.
> > - ``data``: pointer to storage for the associated query data type.
> >
> > Return value:
> >
> > - **0** on success, a negative errno value otherwise and ``rte_errno`` is
> >   set.
> >
> > .. raw:: pdf
> >
> >    PageBreak
> >
> > Behavior
> > --------
> >
> > - API operations are synchronous and blocking (``EAGAIN`` cannot be
> >   returned).
> >
> > - There is no provision for reentrancy/multi-thread safety, although nothing
> >   should prevent different devices from being configured at the same
> >   time. PMDs may protect their control path functions accordingly.
> >
> > - Stopping the data path (TX/RX) should not be necessary when managing flow
> >   rules. If this cannot be achieved naturally or with workarounds (such as
> >   temporarily replacing the burst function pointers), an appropriate error
> >   code must be returned (``EBUSY``).
> >
> > - PMDs, not applications, are responsible for maintaining flow rules
> >   configuration when stopping and restarting a port or performing other
> >   actions which may affect them. They can only be destroyed explicitly.
> >
> > .. raw:: pdf
> >
> >    PageBreak
> >
> [Sugesh] Query all the rules for a specific port/queue?? Useful when adding and
> deleting ports and queues dynamically according to the need. I am not sure
> what are the other different usecases for these APIs. But I feel it makes much easier to
> manage flows from the application. What do you think?

Not sure, that seems to fall out of the scope of this API. As described,
applications already store the related rte_flow pointers. Accordingly, they
know how many rules are associated with a given port. They need both a port
ID and a flow rule pointer to destroy them after all.

Perhaps something to convert an existing rte_flow back to a pattern and a
list of actions could be added, however I cannot see an immediate use case
for it.

What you describe seems to be doable through a front-end API, I think
keeping this one as low-level as possible with only basic actions is better
right now. I'll keep your suggestion in mind.
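
For illustration, the application-side bookkeeping described above could look
something like the following. This is only a minimal sketch and not part of
the RFC: the app_* names are hypothetical, struct rte_flow is given a dummy
body only so the example compiles on its own, and mock_flow_destroy() stands
in for the proposed rte_flow_destroy(). Rules are destroyed newest first,
since reverse order of creation is the only order guaranteed to succeed.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Opaque in the RFC; a dummy body is assumed here so the sketch compiles. */
struct rte_flow {
	int id;
};

/* Stand-in for the proposed rte_flow_destroy(); this mock always succeeds. */
static int
mock_flow_destroy(uint8_t port_id, struct rte_flow *flow)
{
	(void)port_id;
	free(flow);
	return 0;
}

/* Hypothetical application bookkeeping: one LIFO list of rules per port. */
struct app_flow_entry {
	struct app_flow_entry *prev; /* rule created just before this one */
	struct rte_flow *flow;
};

struct app_port_flows {
	uint8_t port_id;
	unsigned int count;          /* number of rules currently tracked */
	struct app_flow_entry *last; /* most recently created rule */
};

/* Remember a flow rule handle right after it has been created. */
static int
app_flow_track(struct app_port_flows *pf, struct rte_flow *flow)
{
	struct app_flow_entry *e;

	assert(pf != NULL && flow != NULL);
	e = malloc(sizeof(*e));
	if (e == NULL)
		return -ENOMEM;
	e->prev = pf->last;
	e->flow = flow;
	pf->last = e;
	++pf->count;
	return 0;
}

/* Destroy every tracked rule on a port, newest first. */
static int
app_flow_flush(struct app_port_flows *pf)
{
	while (pf->last != NULL) {
		struct app_flow_entry *e = pf->last;
		int ret = mock_flow_destroy(pf->port_id, e->flow);

		if (ret != 0)
			return ret; /* entry stays tracked for a later retry */
		pf->last = e->prev;
		free(e);
		--pf->count;
	}
	return 0;
}
```

With one such list per port, the application always knows how many rules it
owns on that port and can release them before releasing queues or the port.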
> > Compatibility
> > -------------
> >
> > No known hardware implementation supports all the features described in this
> > document.
> >
> > Unsupported features or combinations are not expected to be fully emulated
> > in software by PMDs for performance reasons. Partially supported features
> > may be completed in software as long as hardware performs most of the work
> > (such as queue redirection and packet recognition).
> >
> > However PMDs are expected to do their best to satisfy application requests
> > by working around hardware limitations as long as doing so does not affect
> > the behavior of existing flow rules.
> >
> > The following sections provide a few examples of such cases, they are based
> > on limitations built into the previous APIs.
> >
> > Global bitmasks
> > ~~~~~~~~~~~~~~~
> >
> > Each flow rule comes with its own, per-layer bitmasks, while hardware may
> > support only a single, device-wide bitmask for a given layer type, so that
> > two IPv4 rules cannot use different bitmasks.
> >
> > The expected behavior in this case is that PMDs automatically configure
> > global bitmasks according to the needs of the first created flow rule.
> >
> > Subsequent rules are allowed only if their bitmasks match those, the
> > ``EEXIST`` error code should be returned otherwise.
> >
> > Unsupported layer types
> > ~~~~~~~~~~~~~~~~~~~~~~~
> >
> > Many protocols can be simulated by crafting patterns with the `RAW`_ type.
> >
> > PMDs can rely on this capability to simulate support for protocols with
> > fixed headers not directly recognized by hardware.
> >
> > ``ANY`` pattern item
> > ~~~~~~~~~~~~~~~~~~~~
> >
> > This pattern item stands for anything, which can be difficult to translate
> > to something hardware would understand, particularly if followed by more
> > specific types.
> >
> > Consider the following pattern:
> >
> > +---+--------------------------------+
> > | 0 | ETHER                          |
> > +---+--------------------------------+
> > | 1 | ANY (``min`` = 1, ``max`` = 1) |
> > +---+--------------------------------+
> > | 2 | TCP                            |
> > +---+--------------------------------+
> >
> > Knowing that TCP does not make sense with something other than IPv4 and IPv6
> > as L3, such a pattern may be translated to two flow rules instead:
> >
> > +---+--------------------+
> > | 0 | ETHER              |
> > +---+--------------------+
> > | 1 | IPV4 (zeroed mask) |
> > +---+--------------------+
> > | 2 | TCP                |
> > +---+--------------------+
> >
> > +---+--------------------+
> > | 0 | ETHER              |
> > +---+--------------------+
> > | 1 | IPV6 (zeroed mask) |
> > +---+--------------------+
> > | 2 | TCP                |
> > +---+--------------------+
> >
> > Note that as soon as a ANY rule covers several layers, this approach may
> > yield a large number of hidden flow rules. It is thus suggested to only
> > support the most common scenarios (anything as L2 and/or L3).
> >
> > .. raw:: pdf
> >
> >    PageBreak
> >
> > Unsupported actions
> > ~~~~~~~~~~~~~~~~~~~
> >
> > - When combined with a `QUEUE`_ action, packet counting (`COUNT`_) and
> >   tagging (`ID`_) may be implemented in software as long as the target queue
> >   is used by a single rule.
> >
> > - A rule specifying both `DUP`_ + `QUEUE`_ may be translated to two hidden
> >   rules combining `QUEUE`_ and `PASSTHRU`_.
> >
> > - When a single target queue is provided, `RSS`_ can also be implemented
> >   through `QUEUE`_.
> >
> > Flow rules priority
> > ~~~~~~~~~~~~~~~~~~~
> >
> > While it would naturally make sense, flow rules cannot be assumed to be
> > processed by hardware in the same order as their creation for several
> > reasons:
> >
> > - They may be managed internally as a tree or a hash table instead of a
> >   list.
> > - Removing a flow rule before adding another one can either put the new rule
> >   at the end of the list or reuse a freed entry.
> > - Duplication may occur when packets are matched by several rules.
> >
> > For overlapping rules (particularly in order to use the `PASSTHRU`_ action)
> > predictable behavior is only guaranteed by using different priority levels.
> >
> > Priority levels are not necessarily implemented in hardware, or may be
> > severely limited (e.g. a single priority bit).
> >
> > For these reasons, priority levels may be implemented purely in software by
> > PMDs.
> >
> > - For devices expecting flow rules to be added in the correct order, PMDs
> >   may destroy and re-create existing rules after adding a new one with
> >   a higher priority.
> >
> > - A configurable number of dummy or empty rules can be created at
> >   initialization time to save high priority slots for later.
> >
> > - In order to save priority levels, PMDs may evaluate whether rules are
> >   likely to collide and adjust their priority accordingly.
> >
> > .. raw:: pdf
> >
> >    PageBreak
> >
> > API migration
> > =============
> >
> > Exhaustive list of deprecated filter types and how to convert them to
> > generic flow rules.
> >
> > ``MACVLAN`` to ``ETH`` → ``VF``, ``PF``
> > ---------------------------------------
> >
> > `MACVLAN`_ can be translated to a basic `ETH`_ flow rule with a `VF
> > (action)`_ or `PF (action)`_ terminating action.
> >
> > +------------------------------------+
> > | MACVLAN                            |
> > +--------------------------+---------+
> > | Pattern                  | Actions |
> > +===+=====+==========+=====+=========+
> > | 0 | ETH | ``spec`` | any | VF,     |
> > |   |     +----------+-----+ PF      |
> > |   |     | ``mask`` | any |         |
> > +---+-----+----------+-----+---------+
> >
> > ``ETHERTYPE`` to ``ETH`` → ``QUEUE``, ``DROP``
> > ----------------------------------------------
> >
> > `ETHERTYPE`_ is basically an `ETH`_ flow rule with `QUEUE`_ or `DROP`_ as
> > a terminating action.
> >
> > +------------------------------------+
> > | ETHERTYPE                          |
> > +--------------------------+---------+
> > | Pattern                  | Actions |
> > +===+=====+==========+=====+=========+
> > | 0 | ETH | ``spec`` | any | QUEUE,  |
> > |   |     +----------+-----+ DROP    |
> > |   |     | ``mask`` | any |         |
> > +---+-----+----------+-----+---------+
> >
> > ``FLEXIBLE`` to ``RAW`` → ``QUEUE``
> > -----------------------------------
> >
> > `FLEXIBLE`_ can be translated to one `RAW`_ pattern with `QUEUE`_ as the
> > terminating action and a defined priority level.
> >
> > +------------------------------------+
> > | FLEXIBLE                           |
> > +--------------------------+---------+
> > | Pattern                  | Actions |
> > +===+=====+==========+=====+=========+
> > | 0 | RAW | ``spec`` | any | QUEUE   |
> > |   |     +----------+-----+         |
> > |   |     | ``mask`` | any |         |
> > +---+-----+----------+-----+---------+
> >
> > ``SYN`` to ``TCP`` → ``QUEUE``
> > ------------------------------
> >
> > `SYN`_ is a `TCP`_ rule with only the ``syn`` bit enabled and masked, and
> > `QUEUE`_ as the terminating action.
> >
> > Priority level can be set to simulate the high priority bit.
> >
> > +---------------------------------------------+
> > | SYN                                         |
> > +-----------------------------------+---------+
> > | Pattern                           | Actions |
> > +===+======+==========+=============+=========+
> > | 0 | ETH  | ``spec`` | N/A         | QUEUE   |
> > |   |      +----------+-------------+         |
> > |   |      | ``mask`` | empty       |         |
> > +---+------+----------+-------------+         |
> > | 1 | IPV4 | ``spec`` | N/A         |         |
> > |   |      +----------+-------------+         |
> > |   |      | ``mask`` | empty       |         |
> > +---+------+----------+-------------+         |
> > | 2 | TCP  | ``spec`` | ``syn`` = 1 |         |
> > |   |      +----------+-------------+         |
> > |   |      | ``mask`` | ``syn`` = 1 |         |
> > +---+------+----------+-------------+---------+
> >
> > ``NTUPLE`` to ``IPV4``, ``TCP``, ``UDP`` → ``QUEUE``
> > ----------------------------------------------------
> >
> > `NTUPLE`_ is similar to specifying an empty L2, `IPV4`_ as L3 with `TCP`_ or
> > `UDP`_ as L4 and `QUEUE`_ as the terminating action.
> >
> > A priority level can be specified as well.
> >
> > +---------------------------------------+
> > | NTUPLE                                |
> > +-----------------------------+---------+
> > | Pattern                     | Actions |
> > +===+======+==========+=======+=========+
> > | 0 | ETH  | ``spec`` | N/A   | QUEUE   |
> > |   |      +----------+-------+         |
> > |   |      | ``mask`` | empty |         |
> > +---+------+----------+-------+         |
> > | 1 | IPV4 | ``spec`` | any   |         |
> > |   |      +----------+-------+         |
> > |   |      | ``mask`` | any   |         |
> > +---+------+----------+-------+         |
> > | 2 | TCP, | ``spec`` | any   |         |
> > |   | UDP  +----------+-------+         |
> > |   |      | ``mask`` | any   |         |
> > +---+------+----------+-------+---------+
> >
> > ``TUNNEL`` to ``ETH``, ``IPV4``, ``IPV6``, ``VXLAN`` (or other) → ``QUEUE``
> > ---------------------------------------------------------------------------
> >
> > `TUNNEL`_ matches common IPv4 and IPv6 L3/L4-based tunnel types.
> >
> > In the following table, `ANY`_ is used to cover the optional L4.
> >
> > +------------------------------------------------+
> > | TUNNEL                                         |
> > +--------------------------------------+---------+
> > | Pattern                              | Actions |
> > +===+=========+==========+=============+=========+
> > | 0 | ETH     | ``spec`` | any         | QUEUE   |
> > |   |         +----------+-------------+         |
> > |   |         | ``mask`` | any         |         |
> > +---+---------+----------+-------------+         |
> > | 1 | IPV4,   | ``spec`` | any         |         |
> > |   | IPV6    +----------+-------------+         |
> > |   |         | ``mask`` | any         |         |
> > +---+---------+----------+-------------+         |
> > | 2 | ANY     | ``spec`` | ``min`` = 0 |         |
> > |   |         |          +-------------+         |
> > |   |         |          | ``max`` = 0 |         |
> > |   |         +----------+-------------+         |
> > |   |         | ``mask`` | N/A         |         |
> > +---+---------+----------+-------------+         |
> > | 3 | VXLAN,  | ``spec`` | any         |         |
> > |   | GENEVE, +----------+-------------+         |
> > |   | TEREDO, | ``mask`` | any         |         |
> > |   | NVGRE,  |          |             |         |
> > |   | GRE,    |          |             |         |
> > |   | ...     |          |             |         |
> > +---+---------+----------+-------------+---------+
> >
> > .. raw:: pdf
> >
> >    PageBreak
> >
> > ``FDIR`` to most item types → ``QUEUE``, ``DROP``, ``PASSTHRU``
> > ---------------------------------------------------------------
> >
> > `FDIR`_ is more complex than any other type, there are several methods to
> > emulate its functionality. It is summarized for the most part in the table
> > below.
> >
> > A few features are intentionally not supported:
> >
> > - The ability to configure the matching input set and masks for the entire
> >   device, PMDs should take care of it automatically according to flow rules.
> >
> > - Returning four or eight bytes of matched data when using flex bytes
> >   filtering. Although a specific action could implement it, it conflicts
> >   with the much more useful 32 bits tagging on devices that support it.
> >
> > - Side effects on RSS processing of the entire device. Flow rules that
> >   conflict with the current device configuration should not be
> >   allowed. Similarly, device configuration should not be allowed when it
> >   affects existing flow rules.
> >
> > - Device modes of operation. "none" is unsupported since filtering cannot be
> >   disabled as long as a flow rule is present.
> >
> > - "MAC VLAN" or "tunnel" perfect matching modes should be automatically set
> >   according to the created flow rules.
> >
> > +----------------------------------------------+
> > | FDIR                                         |
> > +---------------------------------+------------+
> > | Pattern                         | Actions    |
> > +===+============+==========+=====+============+
> > | 0 | ETH,       | ``spec`` | any | QUEUE,     |
> > |   | RAW        +----------+-----+ DROP,      |
> > |   |            | ``mask`` | any | PASSTHRU   |
> > +---+------------+----------+-----+------------+
> > | 1 | IPV4,      | ``spec`` | any | ID         |
> > |   | IPV6       +----------+-----+ (optional) |
> > |   |            | ``mask`` | any |            |
> > +---+------------+----------+-----+            |
> > | 2 | TCP,       | ``spec`` | any |            |
> > |   | UDP,       +----------+-----+            |
> > |   | SCTP       | ``mask`` | any |            |
> > +---+------------+----------+-----+            |
> > | 3 | VF,        | ``spec`` | any |            |
> > |   | PF,        +----------+-----+            |
> > |   | SIGNATURE  | ``mask`` | any |            |
> > |   | (optional) |          |     |            |
> > +---+------------+----------+-----+------------+
> >
> > ``HASH``
> > ~~~~~~~~
> >
> > Hashing configuration is set per rule through the `SIGNATURE`_ item.
> >
> > Since it is usually a global device setting, all flow rules created with
> > this item may have to share the same specification.
> >
> > ``L2_TUNNEL`` to ``VOID`` → ``VXLAN`` (or others)
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > All packets are matched. This type alters incoming packets to encapsulate
> > them in a chosen tunnel type, optionally redirect them to a VF as well.
> >
> > The destination pool for tag based forwarding can be emulated with other
> > flow rules using `DUP`_ as the action.
> >
> > +----------------------------------------+
> > | L2_TUNNEL                              |
> > +---------------------------+------------+
> > | Pattern                   | Actions    |
> > +===+======+==========+=====+============+
> > | 0 | VOID | ``spec`` | N/A | VXLAN,     |
> > |   |      |          |     | GENEVE,    |
> > |   |      |          |     | ...        |
> > |   |      +----------+-----+------------+
> > |   |      | ``mask`` | N/A | VF         |
> > |   |      |          |     | (optional) |
> > +---+------+----------+-----+------------+
> >
> > --
> > Adrien Mazarguil
> > 6WIND

-- 
Adrien Mazarguil
6WIND