From mboxrd@z Thu Jan 1 00:00:00 1970 From: Adrien Mazarguil Subject: Re: [RFC] Generic flow director/filtering/classification API Date: Tue, 19 Jul 2016 15:12:19 +0200 Message-ID: <20160719131219.GK7621@6wind.com> References: <20160705181646.GO7621@6wind.com> <6A0DE07E22DDAD4C9103DF62FEBC09090348E1A7@shsmsx102.ccr.corp.intel.com> <20160707102650.GU7621@6wind.com> <6A0DE07E22DDAD4C9103DF62FEBC090903492563@shsmsx102.ccr.corp.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: "dev@dpdk.org" , Thomas Monjalon , "Zhang, Helin" , "Wu, Jingjing" , Rasesh Mody , Ajit Khaparde , Rahul Lakkireddy , Jan Medala , John Daley , "Chen, Jing D" , "Ananyev, Konstantin" , Matej Vido , Alejandro Lucero , Sony Chacko , Jerin Jacob , "De Lara Guarch, Pablo" , Olga Shern To: "Lu, Wenzhuo" Return-path: Received: from mail-wm0-f51.google.com (mail-wm0-f51.google.com [74.125.82.51]) by dpdk.org (Postfix) with ESMTP id 22384F72 for ; Tue, 19 Jul 2016 15:12:25 +0200 (CEST) Received: by mail-wm0-f51.google.com with SMTP id o80so25917241wme.1 for ; Tue, 19 Jul 2016 06:12:25 -0700 (PDT) Content-Disposition: inline In-Reply-To: <6A0DE07E22DDAD4C9103DF62FEBC090903492563@shsmsx102.ccr.corp.intel.com> List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On Tue, Jul 19, 2016 at 08:11:48AM +0000, Lu, Wenzhuo wrote: > Hi Adrien, > Thanks for your clarification. Most of my questions are clear, but sti= ll something may need to be discussed, comment below. Hi Wenzhuo, Please see below. [...] > > > > Requirements for a new API: > > > > > > > > - Flexible and extensible without causing API/ABI problems for ex= isting > > > > applications. > > > > - Should be unambiguous and easy to use. > > > > - Support existing filtering features and actions listed in `Filt= er types`_. > > > > - Support packet alteration. > > > > - In case of overlapping filters, their priority should be well d= ocumented. > > > Does that mean we don't guarantee the consistent of priority? The p= riority can > > be different on different NICs. So the behavior of the actions can b= e different. > > Right? > >=20 > > No, the intent is precisely to define what happens in order to get a = consistent > > result across different devices, and document cases with undefined be= havior. > > There must be no room left for interpretation. > >=20 > > For example, the API must describe what happens when two overlapping = filters > > (e.g. one matching an Ethernet header, another one matching an IP hea= der) > > match a given packet at a given priority level. > >=20 > > It is documented in section 4.1.1 (priorities) as "undefined behavior= ". > > Applications remain free to do it and deal with consequences, at leas= t they know > > they cannot expect a consistent outcome, unless they use different pr= iority > > levels for both rules, see also 4.4.5 (flow rules priority). > >=20 > > > Seems the users still need to aware the some details of the HW? Do = we need > > to add the negotiation for the priority? > >=20 > > Priorities as defined in this document may not be directly mappable t= o HW > > capabilities (e.g. HW does not support enough priorities, or that som= e corner > > case make them not work as described), in which case the PMD may choo= se to > > simulate priorities (again 4.4.5), as long as the end result follows = the > > specification. > >=20 > > So users must not be aware of some HW details, the PMD does and must > > perform the needed workarounds to suit their expectations. Users may = only be > > impacted by errors while attempting to create rules that are either u= nsupported > > or would cause them (or existing rules) to diverge from the spec. > The problem is sometime the priority of the filters is fixed according = to > > HW's implementation. For example, on ixgbe, n-tuple has a higher > > priority than flow director. As a side note I did not know that N-tuple had a higher priority than flo= w director on ixgbe, priorities among filter types do not seem to be documented at all in DPDK. This is one of the reasons I think we need a generic API to handle flow configuration. So, today an application cannot combine N-tuple and FDIR flow rules and g= et a reliable outcome, unless it is designed for specific devices with a kno= wn behavior. > What's the right behavior of PMD if APP want to create a flow director = rule which has a higher or even equal priority than an existing n-tuple r= ule? Should PMD return fail?=20 First remember applications only deal with the generic API, PMDs are responsible for choosing the most appropriate HW implementation to use according to the requested flow rules (FDIR, N-tuple or anything else). For the specific case of FDIR vs N-tuple, if the underlying HW supports b= oth I do not see why the PMD would create a N-tuple rule. Doesn't FDIR suppor= t everything N-tuple can do and much more? Assuming such a thing happened anyway, that the PMD had to create a rule using a high priority filter type and that the application requests the creation of a rule that can only be done using a lower priority filter ty= pe, but also requested a higher priority for that rule, then yes, it should obviously fail. That is, unless the PMD can perform some kind of workaround to have both. > If so, do we need more fail reasons? According to this RFC, I think we = need return " EEXIST: collision with an existing rule. ", but it's not ve= ry clear, APP doesn't know the problem is priority, maybe more detailed r= eason is helpful. Possibly, I've defined a basic set of errors, there are quite a number of errno values to choose from. However I think we should not define too man= y values. In my opinion the basic set covers every possible failure: - EINVAL: invalid format, rule is broken or cannot be understood by the P= MD anyhow. - ENOTSUP: pattern/actions look fine but something in the requested rule = is not supported and thus cannot be applied. - EEXIST: pattern/actions are fine and could have been applied if only so= me other rule did not prevent the PMD to do it (I see it as the closest th= ing to "ETOOBAD" which unfortunately does not exist). - ENOMEM: like EEXIST, except it is due to the lack of resources not beca= use of another rule. I wasn't sure which of ENOMEM or ENOSPC was better but settled on ENOMEM as it is well known. Still open to debate. Errno values are only useful to get a rough idea of the reason, and anoth= er mechanism is needed to pinpoint the exact problem for debugging/reporting purposes, something like: enum rte_flow_error_type { RTE_FLOW_ERROR_TYPE_NONE, RTE_FLOW_ERROR_TYPE_UNKNOWN, RTE_FLOW_ERROR_TYPE_PRIORITY, RTE_FLOW_ERROR_TYPE_PATTERN, RTE_FLOW_ERROR_TYPE_ACTION, }; struct rte_flow_error { enum rte_flow_error_type type; void *offset; /* Points to the exact pattern item or action. */ const char *message; }; Then either provide an optional struct rte_flow_error pointer to rte_flow_validate(), or a separate function (rte_flow_analyze()?), since processing this may be quite expensive and applications may not care abou= t the exact reason. What do you suggest? > > > > Behavior > > > > -------- > > > > > > > > - API operations are synchronous and blocking (``EAGAIN`` cannot = be > > > > returned). > > > > > > > > - There is no provision for reentrancy/multi-thread safety, altho= ugh nothing > > > > should prevent different devices from being configured at the s= ame > > > > time. PMDs may protect their control path functions accordingly= . > > > > > > > > - Stopping the data path (TX/RX) should not be necessary when man= aging > > flow > > > > rules. If this cannot be achieved naturally or with workarounds= (such as > > > > temporarily replacing the burst function pointers), an appropri= ate error > > > > code must be returned (``EBUSY``). > > > PMD cannot stop the data path without adding lock. So I think if so= me rules > > cannot be applied without stopping rx/tx, PMD has to return fail. > > > Or let the APP to stop the data path. > >=20 > > Agreed, that is the intent. If the PMD cannot touch flow rules for so= me reason > > even after trying really hard, then it just returns EBUSY. > >=20 > > Perhaps we should write down that applications may get a different ou= tcome > > after stopping the data path if they get EBUSY? > Agree, it's better to describe more about the APP. BTW, I checked the b= ehavior of ixgbe/igb, I think we can add/delete filters during runtime. H= opefully we'll not hit too many EBUSY problems on other NICs :) OK, I will add it. > > > > - PMDs, not applications, are responsible for maintaining flow ru= les > > > > configuration when stopping and restarting a port or performing= other > > > > actions which may affect them. They can only be destroyed expli= citly. > > > Don=E2=80=99t understand " They can only be destroyed explicitly." > >=20 > > This part says that as long as an application has not called > > rte_flow_destroy() on a flow rule, it never disappears, whatever happ= ens to the > > port (stopped, restarted). The application is not responsible for re-= creating rules > > after that. > >=20 > > Note that according to the specification, this may translate to not b= eing able to > > stop a port as long as a flow rule is present, depending on how nice = the PMD > > intends to be with applications. Implementation can be done in small = steps with > > minimal amount of code on the PMD side. > Does it mean PMD should store and maintain all the rules? Why not let r= te do that? I think if PMD maintain all the rules, it means every kind of= NIC should have a copy of code for the rules. But if rte do that, only o= ne copy of code need to be maintained, right? I've considered having rules stored in a common format understood at the = RTE level and not specific to each PMD and decided that the opaque rte_flow pointer was a better choice for the following reasons:=20 - Even though flow rules management is done in the control path, processi= ng must be as fast as possible. Letting PMDs store flow rules using their = own internal representation gives them the chance to achieve better performance. - An opaque context managed by PMDs would probably have to be stored somewhere as well anyway. - PMDs may not need to allocate/store anything at all if they exclusively rely on HW state for everything. In my opinion, the generic API has eno= ugh constraints for this to work and maintain consistency between flow rules. Note this is currently how most PMDs implement FDIR and other filter types. - RTE can (and will) provide helpers to avoid most of the code redundancy= , PMDs are free to use them or manage everything by themselves. - Given that the opaque rte_flow pointer associated with a flow rule is t= o be stored by the application, PMDs do not even have to keep references = to them. - The flow rules format described in this specification (pattern / action= s) will be used by applications directly, and will be free to arrange them= in lists, trees or in any other way if they need to keep flow specificatio= ns around for further processing. > When the port is stopped and restarted, rte can reconfigure the rules. = Is the concern that PMD may adjust the sequence of the rules according to= the priority, so every NIC has a different list of rules? But PMD can ad= just them again when rte reconfiguring the rules. What about PMDs able to stop and restart ports without destroying their o= wn flow rules? If we assume flow rules must be destroyed when stopping a por= t, these PMDs are needlessly penalized with slower stop/start cycles. Think about it assuming thousands of flow rules. Thus from an application point of view, whatever happens when stopping an= d restarting a port should not matter. If a flow rule was present before, i= t must still be present afterwards. If the PMD had to destroy flow rules an= d re-create them, it does not actually matter if they differ slightly at th= e HW level, as long as: - Existing opaque flow rule pointers (rte_flow) are still valid to the PM= D and refer to the same rules. - The overall behavior of all rules is the same. The list of rules you think of (patterns / actions) is maintained by applications (not RTE), and only if they need them. RTE would needlessly duplicate this. --=20 Adrien Mazarguil 6WIND