From: John Fastabend
Subject: Re: [RFC] Generic flow director/filtering/classification API
Date: Wed, 10 Aug 2016 09:46:27 -0700
To: Rahul Lakkireddy, dev@dpdk.org, Thomas Monjalon, Helin Zhang,
 Jingjing Wu, Rasesh Mody, Ajit Khaparde, Wenzhuo Lu, Jan Medala,
 John Daley, Jing Chen, Konstantin Ananyev, Matej Vido,
 Alejandro Lucero, Sony Chacko, Jerin Jacob, Pablo de Lara,
 Olga Shern, Kumar A S, Nirranjan Kirubaharan, Indranil Choudhury

On 16-08-10 06:37 AM, Adrien Mazarguil wrote:
> On Tue, Aug 09, 2016 at 02:47:44PM -0700, John Fastabend wrote:
>> On 16-08-04 06:24 AM, Adrien Mazarguil wrote:
>>> On Wed, Aug 03, 2016 at 12:11:56PM -0700, John Fastabend wrote:
> [...]
>>>> The problem is keeping priorities in order and/or possibly breaking
>>>> rules apart (e.g. you have an L2 table and an L3 table) becomes very
>>>> complex to manage at the driver level. I think it's easier for the
>>>> application, which has some context, to do this. The application
>>>> "knows" if it's a router, for example, and will likely be able to
>>>> pack rules better than a PMD will.
>>>
>>> I don't think most applications know they are L2 or L3 routers. They
>>> may not know more than the pattern provided to the PMD, which may
>>> indeed end at an L2 or L3 protocol. If the application simply chooses
>>> a table based on this information, then the PMD could have easily done
>>> the same.
>>
>> But when we start thinking about encap/decap then it's natural to start
>> using this interface to implement various forwarding dataplanes. And one
>> common way to organize a switch is into TEP, router, switch (mac/vlan),
>> ACL tables, etc. In fact we see this topology starting to show up in the
>> NICs now.
>>
>> Further, each table may be "managed" by a different entity, in which
>> case the software will want to manage the physical and virtual networks
>> separately.
>>
>> It doesn't make sense to me to require a software aggregator object to
>> marshal the rules into a flat table only for a PMD to split them apart
>> again.
>
> OK, my point was mostly about handling basic cases easily and making sure
> applications do not have to bother with petty HW details when they do not
> want to, yet still get maximum performance by having the PMD make the
> most appropriate choices automatically.
>
> You've convinced me that in many cases PMDs won't be able to optimize
> efficiently and that conscious applications will know better. The API has
> to provide the ability to do so.
> I think it's fine as long as it is not mandatory.

Great. I also agree making the table feature _not_ mandatory for many use
cases will be helpful. I'm just making sure we get all the use cases I
know of covered.

>>> I understand the issue is what happens when applications really want to
>>> define e.g. L2/L3/L2 rules in this specific order (or any ordering that
>>> cannot be satisfied by HW due to table constraints).
>>>
>>> By exposing tables, in such a case applications should move all rules
>>> from an L2 to an L3 table themselves (assuming this is even supported)
>>> to guarantee ordering between rules, or fail to add them. This is
>>> basically what the PMD could have done, possibly in a more efficient
>>> manner in my opinion.
>>
>> I disagree with the more efficient comment :)
>>
>> If the software layer is working on L2/TEP/ACL/router layers, merging
>> them just to pull them back apart is not going to be more efficient.
>
> Moving flow rules around cannot be efficient by definition. However, I
> think that attempting to describe table capabilities may be as
> complicated as describing HW bit-masking features. Applications may get
> it wrong as a result while a PMD would not make any mistake.
>
> Your use case is valid though: if the application already groups rules,
> then sharing this information with the PMD would make sense from a
> performance standpoint.
>
>>> Let's assume two opposite scenarios for this discussion:
>>>
>>> - App #1 is a command-line interface directly mapped to flow rules,
>>>   which basically gets slow random input from users depending on how
>>>   they want to configure their traffic. All rules differ considerably
>>>   (L2, L3, L4, some with incomplete bit-masks, etc). All in all, few
>>>   but complex rules with specific priorities.
>>
>> Agree with this, and in this case the application should be behind any
>> physical/virtual network and not giving rules like encap/decap/etc. This
>> application either sits on the physical function and "owns" the hardware
>> resource or sits behind a virtual switch.
>>
>>> - App #2 is something like OVS, creating and deleting a large number of
>>>   very specific (without incomplete bit-masks) and mostly identical
>>>   single-priority rules automatically and very frequently.
>>
>> Maybe for OVS, but not all virtual switches are built with flat tables
>> at the bottom like this. Nor is it necessarily optimal.
>>
>> Another application (the one I'm concerned about :) would be built as
>> a pipeline, something like
>>
>>   ACL -> TEP -> ACL -> VEB -> ACL
>>
>> If I have hardware that supports a TEP hardware block, an ACL hardware
>> block, and a VEB block, for example, I don't want to merge my control
>> plane into a single table. The merging in this case is just pure
>> overhead/complexity for no gain.
>
> It could be done by dedicating priority ranges to each item in the
> pipeline, but then it would be clunky. OK then, let's discuss the best
> approach to implement this.
>
> [...]
>>>> It's not about mask vs no mask. The devices with multiple tables that
>>>> I have don't have this mask limitation. It's about how to optimally
>>>> pack the rules and who implements that logic. I think it's best done
>>>> in the application where I have the context.
>>>>
>>>> Is there a way to omit the table field if the PMD is expected to do
>>>> a best effort, and add the table field if the user wants explicit
>>>> control over table mgmt? This would support both models.
>>>> I at least would like to have explicit control over rule population
>>>> in my pipeline for use cases where I'm building a pipeline on top of
>>>> the hardware.
>>>
>>> Yes, that's a possibility. Perhaps the table ID to use could be
>>> specified as a meta pattern item? We'd still need methods to report
>>> how many tables exist and perhaps some way to report their
>>> limitations; these could be added later through a separate set of
>>> functions.
>>
>> Sure, I think a meta pattern item would be fine, or put it in the API
>> call directly, something like
>>
>>   rte_flow_create(port_id, pattern, actions);
>>   rte_flow_create_table(port_id, table_id, pattern, actions);
>
> I suggest using a common method for both cases; either seems fine to me,
> as long as a default table value can be provided (zero) when
> applications do not care.

Works for me, just use zero as the default when the application has no
preference and expects the PMD to do the table mapping.

> Now about table management, I think there is no need to expose table
> capabilities (in case they have different capabilities) but instead
> provide guidelines as part of the specification to encourage application
> writers to group similar rules in tables. As previously discussed, flow
> rule priorities would be specific to the table they are assigned to.

This seems sufficient to me.

> Like flow rules, table priorities could be handled through their index,
> with index 0 having the highest priority. Like flow rule priorities,
> table indices wouldn't have to be contiguous.
>
> If this works for you, how about renaming "tables" to "groups"?

Works for me. And actually I like renaming them "groups" as this seems
more neutral to how the hardware actually implements a group. For
example, I've worked on hardware with multiple Tunnel Endpoint engines
but we exposed it as a single "group" to simplify the user interface.

.John
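
P.S. To make the two models concrete, here is a rough sketch of how I
picture an application using them. Illustrative only: the signatures
follow the two variants discussed above and are not final, the group
names are made up, and the patterns/actions are elided.

  /* Groups pinned to pipeline stages; group 0 stays the "don't care"
   * default per the discussion above.
   */
  enum app_group {
      APP_GROUP_DEFAULT = 0,  /* PMD places the rule wherever it wants */
      APP_GROUP_TEP     = 1,  /* tunnel endpoint engine */
      APP_GROUP_VEB     = 2,  /* MAC/VLAN switching */
  };

  /* Model 1: application has no preference, PMD does the mapping. */
  flow = rte_flow_create(port_id, pattern, actions);

  /* Model 2: explicit placement keeps the ACL -> TEP -> VEB ordering
   * under application control instead of PMD heuristics.
   */
  tep_flow = rte_flow_create_table(port_id, APP_GROUP_TEP,
                                   tep_pattern, tep_actions);
  veb_flow = rte_flow_create_table(port_id, APP_GROUP_VEB,
                                   veb_pattern, veb_actions);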