From: Adrien Mazarguil
Subject: Re: [RFC] Generic flow director/filtering/classification API
Date: Thu, 7 Jul 2016 12:26:50 +0200
Message-ID: <20160707102650.GU7621@6wind.com>
References: <20160705181646.GO7621@6wind.com>
 <6A0DE07E22DDAD4C9103DF62FEBC09090348E1A7@shsmsx102.ccr.corp.intel.com>
In-Reply-To: <6A0DE07E22DDAD4C9103DF62FEBC09090348E1A7@shsmsx102.ccr.corp.intel.com>
To: "Lu, Wenzhuo"
Cc: "dev@dpdk.org", Thomas Monjalon, "Zhang, Helin", "Wu, Jingjing",
 Rasesh Mody, Ajit Khaparde, Rahul Lakkireddy, Jan Medala, John Daley,
 "Chen, Jing D", "Ananyev, Konstantin", Matej Vido, Alejandro Lucero,
 Sony Chacko, Jerin Jacob, "De Lara Guarch, Pablo", Olga Shern
List-Id: patches and discussions about DPDK

Hi Lu Wenzhuo,

Thanks for your feedback, I'm replying below as well.

On Thu, Jul 07, 2016 at 07:14:18AM +0000, Lu, Wenzhuo wrote:
> Hi Adrien,
> I have some questions, please see inline, thanks.
>
> > -----Original Message-----
> > From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> > Sent: Wednesday, July 6, 2016 2:17 AM
> > To: dev@dpdk.org
> > Cc: Thomas Monjalon; Zhang, Helin; Wu, Jingjing; Rasesh Mody; Ajit Khaparde;
> > Rahul Lakkireddy; Lu, Wenzhuo; Jan Medala; John Daley; Chen, Jing D; Ananyev,
> > Konstantin; Matej Vido; Alejandro Lucero; Sony Chacko; Jerin Jacob; De Lara
> > Guarch, Pablo; Olga Shern
> > Subject: [RFC] Generic flow director/filtering/classification API
> >
> >
> > Requirements for a new API:
> >
> > - Flexible and extensible without causing API/ABI problems for existing
> >   applications.
> > - Should be unambiguous and easy to use.
> > - Support existing filtering features and actions listed in `Filter types`_.
> > - Support packet alteration.
> > - In case of overlapping filters, their priority should be well documented.
> Does that mean we don't guarantee the consistent of priority? The priority
> can be different on different NICs. So the behavior of the actions can be
> different. Right?

No, the intent is precisely to define what happens in order to get a
consistent result across different devices, and document cases with
undefined behavior. There must be no room left for interpretation.

For example, the API must describe what happens when two overlapping filters
(e.g. one matching an Ethernet header, another one matching an IP header)
match a given packet at a given priority level.

It is documented in section 4.1.1 (priorities) as "undefined behavior".
Applications remain free to do it and deal with consequences, at least they
know they cannot expect a consistent outcome, unless they use different
priority levels for both rules, see also 4.4.5 (flow rules priority).

> Seems the users still need to aware the some details of the HW? Do we need
> to add the negotiation for the priority?

Priorities as defined in this document may not be directly mappable to HW
capabilities (e.g.
HW does not support enough priorities, or some corner case makes them not
work as described), in which case the PMD may choose to simulate priorities
(again 4.4.5), as long as the end result follows the specification.

So users need not be aware of HW details; the PMD is, and must perform the
needed workarounds to meet their expectations. Users may only be impacted
by errors while attempting to create rules that are either unsupported or
would cause them (or existing rules) to diverge from the spec.

> > Flow rules can have several distinct actions (such as counting,
> > encapsulating, decapsulating before redirecting packets to a particular
> > queue, etc.), instead of relying on several rules to achieve this and having
> > applications deal with hardware implementation details regarding their
> > order.
> I think normally HW doesn't support several actions in one rule. If a rule
> has several actions, seems HW has to split it to several rules. The order
> can still be a problem.

Note that, except for a very small subset of pattern items and actions,
supporting multiple actions for a given rule is not mandatory, and can be
emulated as you said by splitting them into several rules, each with its
own priority if possible (see 3.3 "high level design").

Also, a rule "action" as defined in this API can be just about anything, for
example combining a queue redirection with 32-bit tagging. FDIR supports
many cases of what can be described as several actions, see 5.7 "FDIR to
most item types → QUEUE, DROP, PASSTHRU".

If you were thinking about having two queue targets for a given rule, then
I agree with you - that is why a rule cannot have more than a single action
of a given type (see 4.1.5 actions), to avoid such abuse from applications.

Applications must use several pass-through rules with different priority
levels if they want to perform a given action several times on a given
packet.
Again, PMD support is not mandatory, as pass-through is optional.

> > ``ETH``
> > ^^^^^^^
> >
> > Matches an Ethernet header.
> >
> > - ``dst``: destination MAC.
> > - ``src``: source MAC.
> > - ``type``: EtherType.
> > - ``tags``: number of 802.1Q/ad tags defined.
> > - ``tag[]``: 802.1Q/ad tag definitions, innermost first. For each one:
> >
> >   - ``tpid``: Tag protocol identifier.
> >   - ``tci``: Tag control information.
> "ETH" means all the parameters, dst, src, type... need to be matched? The
> same question for IPv4, IPv6 ...

Yes, it's basically the description of all Ethernet header fields including
VLAN tags (same for other protocols). Please see the linked draft header
file which should make all of this easier to understand:

 https://raw.githubusercontent.com/6WIND/rte_flow/master/rte_flow.h

> > ``UDP``
> > ^^^^^^^
> >
> > Matches a UDP header.
> >
> > - ``sport``: source port.
> > - ``dport``: destination port.
> > - ``length``: UDP length.
> > - ``checksum``: UDP checksum.
> Why checksum? Do we need to filter the packets by checksum value?

Well, I've decided to include all protocol header fields for completeness
(so the ABI does not need to be broken later when they become necessary, or
require another pattern item), not that all of them make sense in a pattern.

In this specific case, all PMDs I know of must reject a pattern
specification with a nonzero mask for the checksum field, because none of
them support it.

> > ``VOID`` (action)
> > ^^^^^^^^^^^^^^^^^
> >
> > Used as a placeholder for convenience. It is ignored and simply discarded by
> > PMDs.
> Don't understand why we need VOID. If it’s about the format. Why not
> guarantee it in rte layer?

I'm not sure I understand your question about the rte layer, but this type
is fully managed by the PMD and is not supposed to be translated to a
hardware action.

I think it may come in handy in some cases (like the VOID pattern item), so
it is defined just in case.
Should be relatively trivial to support.

Applications may find a use for it when they want to statically define
templates for flow rules and need to leave room in them for some reason.

> > Behavior
> > --------
> >
> > - API operations are synchronous and blocking (``EAGAIN`` cannot be
> >   returned).
> >
> > - There is no provision for reentrancy/multi-thread safety, although nothing
> >   should prevent different devices from being configured at the same
> >   time. PMDs may protect their control path functions accordingly.
> >
> > - Stopping the data path (TX/RX) should not be necessary when managing flow
> >   rules. If this cannot be achieved naturally or with workarounds (such as
> >   temporarily replacing the burst function pointers), an appropriate error
> >   code must be returned (``EBUSY``).
> PMD cannot stop the data path without adding lock. So I think if some rules
> cannot be applied without stopping rx/tx, PMD has to return fail.
> Or let the APP to stop the data path.

Agreed, that is the intent. If the PMD cannot touch flow rules for some
reason even after trying really hard, then it just returns EBUSY.

Perhaps we should write down that applications may get a different outcome
after stopping the data path if they get EBUSY?

> > - PMDs, not applications, are responsible for maintaining flow rules
> >   configuration when stopping and restarting a port or performing other
> >   actions which may affect them. They can only be destroyed explicitly.
> Don’t understand " They can only be destroyed explicitly."

This part says that as long as an application has not called
rte_flow_destroy() on a flow rule, it never disappears, whatever happens to
the port (stopped, restarted). The application is not responsible for
re-creating rules after that.

Note that according to the specification, this may translate to not being
able to stop a port as long as a flow rule is present, depending on how nice
the PMD intends to be with applications.
Implementation can be done in small steps with a minimal amount of code on
the PMD side.

> If a new rule has conflict with an old one, what should we do? Return fail?

That should not happen. If say 100 rules have been created with various
priorities and the port is happily running with them, stopping the port may
require the PMD to destroy them; it then has to automatically re-create all
100 of them exactly as they were when restarting the port.

If re-creating them is not possible for some reason, the port cannot be
restarted as long as rules that cannot be added back haven't been destroyed
by the application. Frankly, this should not happen.

To manage this case, I suggest preventing applications from doing things
that conflict with existing flow rules while the port is stopped (just like
when it is not stopped, as described in 5.7 "FDIR to most item types").

> > ``ANY`` pattern item
> > ~~~~~~~~~~~~~~~~~~~~
> >
> > This pattern item stands for anything, which can be difficult to translate
> > to something hardware would understand, particularly if followed by more
> > specific types.
> >
> > Consider the following pattern:
> >
> > +---+--------------------------------+
> > | 0 | ETHER                          |
> > +---+--------------------------------+
> > | 1 | ANY (``min`` = 1, ``max`` = 1) |
> > +---+--------------------------------+
> > | 2 | TCP                            |
> > +---+--------------------------------+
> >
> > Knowing that TCP does not make sense with something other than IPv4 and IPv6
> > as L3, such a pattern may be translated to two flow rules instead:
> >
> > +---+--------------------+
> > | 0 | ETHER              |
> > +---+--------------------+
> > | 1 | IPV4 (zeroed mask) |
> > +---+--------------------+
> > | 2 | TCP                |
> > +---+--------------------+
> >
> > +---+--------------------+
> > | 0 | ETHER              |
> > +---+--------------------+
> > | 1 | IPV6 (zeroed mask) |
> > +---+--------------------+
> > | 2 | TCP                |
> > +---+--------------------+
> >
> > Note that as soon as a ANY rule covers several layers, this approach may
> > yield a large number of hidden flow rules. It is thus suggested to only
> > support the most common scenarios (anything as L2 and/or L3).
> I think "any" may make things confusing. How about if the NIC doesn't
> support IPv6? Should we return fail for this rule?

In a sense you are right, ANY relies on HW capabilities so you cannot know
that it won't match unsupported protocols. The above example would be
somewhat useless for a conscientious application which should really have
created two separate flow rules (and gotten an error on the IPv6 one).

So an ANY flow rule only able to match v4 packets won't return an error.

ANY can be useful to match outer packets when only a tunnel header and the
inner packet are meaningful to the application. HW that does not recognize
the outer packet is not able to recognize the inner one anyway.

This section only says that PMDs should do their best to make HW match what
they can when faced with ANY.

Also once again, ANY support is not mandatory.
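
To make the ANY expansion discussed above a bit more concrete, here is a
minimal sketch in C of how a PMD might turn an ETHER / ANY / TCP pattern
into one concrete pattern per L3 protocol the hardware can parse, silently
skipping unsupported protocols instead of failing. It assumes the ANY item
stands for exactly one L3 header. None of these names come from the draft
rte_flow.h: enum item_type, the HW_CAP_* bits, MAX_ITEMS and expand_any()
are all hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical item types, loosely modelled on the pattern items above. */
enum item_type { ITEM_END, ITEM_ETH, ITEM_ANY, ITEM_IPV4, ITEM_IPV6, ITEM_TCP };

/* Hypothetical capability bits: which L3 headers the HW can parse. */
#define HW_CAP_IPV4 (1u << 0)
#define HW_CAP_IPV6 (1u << 1)

#define MAX_ITEMS 8

/*
 * Expand the first ANY item of an ITEM_END-terminated pattern into one
 * concrete pattern per supported L3 protocol (assumption: ANY covers a
 * single L3 header). Returns the number of patterns written to out[],
 * 0 if the HW can match nothing, or -1 if the pattern is too long.
 */
static int
expand_any(const enum item_type *pattern, unsigned int hw_l3_caps,
           enum item_type out[2][MAX_ITEMS])
{
    static const struct {
        unsigned int cap;
        enum item_type l3;
    } cand[] = {
        { HW_CAP_IPV4, ITEM_IPV4 },
        { HW_CAP_IPV6, ITEM_IPV6 },
    };
    size_t len = 0;
    int n = 0;

    while (pattern[len] != ITEM_END)
        len++;
    if (len + 1 > MAX_ITEMS)
        return -1;

    for (size_t c = 0; c < sizeof(cand) / sizeof(cand[0]); c++) {
        int replaced = 0;

        if (!(hw_l3_caps & cand[c].cap))
            continue; /* unsupported protocol: skipped, not an error */

        /* Copy the pattern, terminator included, replacing the first ANY. */
        for (size_t i = 0; i <= len; i++) {
            enum item_type t = pattern[i];

            if (t == ITEM_ANY && !replaced) {
                t = cand[c].l3;
                replaced = 1;
            }
            out[n][i] = t;
        }
        n++;
    }
    return n;
}
```

With both capability bits set, the ETHER / ANY / TCP pattern yields the two
hidden rules shown in the quoted tables; with only HW_CAP_IPV4 it yields a
single IPv4 rule and no error, which matches the "won't return an error"
behavior described above.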
> > Flow rules priority
> > ~~~~~~~~~~~~~~~~~~~
> >
> > While it would naturally make sense, flow rules cannot be assumed to be
> > processed by hardware in the same order as their creation for several
> > reasons:
> >
> > - They may be managed internally as a tree or a hash table instead of a
> >   list.
> > - Removing a flow rule before adding another one can either put the new rule
> >   at the end of the list or reuse a freed entry.
> > - Duplication may occur when packets are matched by several rules.
> >
> > For overlapping rules (particularly in order to use the `PASSTHRU`_ action)
> > predictable behavior is only guaranteed by using different priority levels.
> >
> > Priority levels are not necessarily implemented in hardware, or may be
> > severely limited (e.g. a single priority bit).
> >
> > For these reasons, priority levels may be implemented purely in software by
> > PMDs.
> >
> > - For devices expecting flow rules to be added in the correct order, PMDs
> >   may destroy and re-create existing rules after adding a new one with
> >   a higher priority.
> >
> > - A configurable number of dummy or empty rules can be created at
> >   initialization time to save high priority slots for later.
> >
> > - In order to save priority levels, PMDs may evaluate whether rules are
> >   likely to collide and adjust their priority accordingly.
> If there's 3 rules, r1, r2,r3. The rules say the priority is r1 > r2 > r3.
> If PMD can only support r1 > r3 > r2, or doesn't support r2. Should PMD
> apply r1 and r3 or totally not support them all?

Remember that the API lets applications create only one rule at a time. If
all 3 rules are not supported together but individually are, the answer
depends on what the application does:

1. r1 OK, r2 FAIL => application chooses to stop here, thus only r1 works as
   expected (may roll back and remove r1 as a result).

2.
r1 OK, r2 FAIL, r3 OK => application chooses to ignore the fact that r2
   failed and adds r3 anyway, so it should end up with r1 > r3.

Applications should do as described in 1; they need to check for errors if
they want consistency.

This document describes only the basic functions, but may be extended later
with methods to add several flow rules at once, so rules that depend on
others can be added together and a single failure is returned without the
need for a rollback at the application level.

> A generic question, is the parsing supposed to be done by rte or PMD?

Actually, a bit of both. EAL will certainly at least provide helpers to
assist PMDs. This specification defines only the public-facing API for now,
but our goal is really to have something that is not too difficult to
implement both for applications and PMDs.

These helpers can be defined later with the first implementation.

-- 
Adrien Mazarguil
6WIND