From mboxrd@z Thu Jan  1 00:00:00 1970
From: John Fastabend <john.fastabend@gmail.com>
Subject: Re: [RFC] Generic flow director/filtering/classification
 API
Date: Wed, 10 Aug 2016 09:35:36 -0700
Message-ID: <57AB57D8.7070507@gmail.com>
References: <20160705181646.GO7621@6wind.com>
 <20160711104141.GA10172@localhost.localdomain>
 <20160721192023.GU7621@6wind.com> <5793DD3E.3080605@gmail.com>
 <57A0E423.2030804@gmail.com> <20160803143049.GF3336@6wind.com>
 <57A233A9.3000006@gmail.com> <20160804130528.GM3336@6wind.com>
 <57AA4A0A.8060809@gmail.com> <20160810110236.GA3336@6wind.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
To: Jerin Jacob <jerin.jacob@caviumnetworks.com>, dev@dpdk.org,
 Thomas Monjalon <thomas.monjalon@6wind.com>,
 Helin Zhang <helin.zhang@intel.com>, Jingjing Wu <jingjing.wu@intel.com>,
 Rasesh Mody <rasesh.mody@qlogic.com>,
 Ajit Khaparde <ajit.khaparde@broadcom.com>,
 Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>,
 Wenzhuo Lu <wenzhuo.lu@intel.com>, Jan Medala <jan@semihalf.com>,
 John Daley <johndale@cisco.com>, Jing Chen <jing.d.chen@intel.com>,
 Konstantin Ananyev <konstantin.ananyev@intel.com>,
 Matej Vido <matejvido@gmail.com>,
 Alejandro Lucero <alejandro.lucero@netronome.com>,
 Sony Chacko <sony.chacko@qlogic.com>,
 Pablo de Lara <pablo.de.lara.guarch@intel.com>,
 Olga Shern <olgas@mellanox.com>
Return-path: <dev-bounces@dpdk.org>
Received: from mail-pa0-f43.google.com (mail-pa0-f43.google.com
 [209.85.220.43]) by dpdk.org (Postfix) with ESMTP id 8506E5913
 for <dev@dpdk.org>; Wed, 10 Aug 2016 18:36:01 +0200 (CEST)
Received: by mail-pa0-f43.google.com with SMTP id ti13so17334865pac.0
 for <dev@dpdk.org>; Wed, 10 Aug 2016 09:36:01 -0700 (PDT)
In-Reply-To: <20160810110236.GA3336@6wind.com>
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

On 16-08-10 04:02 AM, Adrien Mazarguil wrote:
> On Tue, Aug 09, 2016 at 02:24:26PM -0700, John Fastabend wrote:
> [...]
>>> Just an idea, could some kind of meta pattern items specifying time
>>> constraints for a rule address this issue? Say, how long (cycles/ms) the PMD
>>> may take to query/apply/delete the rule. If it cannot be guaranteed, the
>>> rule cannot be created. Applications could mantain statistic counters about
>>> failed rules to determine if performance issues are caused by the inability
>>> to create them.
>>
>> It seems a bit heavy to me to have each PMD driver implementing
>> something like this. But it would be interesting to explore probably
>> after the basic support is implemented though.
> 
> OK, let's keep this for later.
> 
> [...]
>>> Such a pattern could be generated from a separate function before feeding it
>>> to rte_flow_create(), or translated by the PMD afterwards assuming a
>>> separate meta item such as RAW_END exists to signal the end of a RAW layer.
>>> Of course processing this would be more expensive.
>>>
>>
>> Or the supported parse graph could be fetched from the hardware with the
>> values for each protocol so that the programming interface is the same.
>> The well known protocols could keep the 'enum values' in the header
>> rte_flow_item_type enum so that users would not be required to do
>> the parse graph but for new or experimental protocols we could query
>> the parse graph and get the programming pattern matching id for them.
>>
>> The normal flow would be unchanged but we don't get stuck upgrading
>> everything to add our own protocol. So the flow would be,
>>
>>  rte_get_parse_graph(graph);
>>  flow_item_proto = is_my_proto_supported(graph);
>>
>>  pattern = build_flow_match(flow_item_proto, value, mask);
>>  action = build_action();
>>  rte_flow_create(my_port, pattern, action);
>>
>> The only change to the API proposed to support this would be to allow
>> unsupported RTE_FLOW_ values to be pushed to the hardware and define
>> a range of values that are reserved for use by the parse graph discover.
>>
>> This would not have to be any more expensive.
> 
> Makes sense. Unless made entirely out of RAW items however the ensuing
> pattern would not be portable across DPDK ports, instances and versions if
> dumped in binary form for later use.
> 

Right.

> Since those would have be recognized by PMDs and applications regardless of
> the API version, I suggest making generated item types negative (enums are
> signed, let's use that).

That works then the normal positive enums maintain the list of
known/accepted protocols.

> 
> DPDK would have to maintain a list of expended values to avoid collisions
> between PMDs. A method should be provided to release them.

The 'middle layer' could have a non-public API for PMDs to get new
values call it get_flow_type_item_id() or something.

> 
> [...]
>> hmm for performance reasons building an entire graph up using RAW items
>> seems to be a bit heavy. Another alternative to the above parse graph
>> notion would be to allow users to add RAW node definitions at init time
>> and have the PMD give a ID back for those. Then the new node could be
>> used just like any other RTE_FLOW_ITEM_TYPE in a pattern.
>>
>> Something like,
>>
>> 	ret_flow_item_type_foo = rte_create_raw_node(foo_raw_pattern)
>> 	ret_flow_item_type_bar = rte_create_raw_node(bar_raw_pattern)
>>
>> then allow ret_flow_item_type_{foo|bar} to be used in subsequent
>> pattern matching items. And if the hardware can not support this return
>> an error from the initial rte_create_raw_node() API call.
>>
>> Do any either of those proposals sound like reasonable extensions?
> 
> Both seem acceptable in my opinion as they fit in the described API. However
> I think it would be better for this function to spit out a pattern made of
> any number of items instead of a single new item type. That way, existing
> fixed items can be reused as well, the entire pattern may even become
> portable as a result, it could be considered as a method to optimize a
> RAW pattern.
> 
> The RAW approach has the advantage of not requiring much additional code in
> the public API besides a couple of function declarations. A proper full
> blown graph would require a lot more as described in your original
> reply. Not sure which is better.
> 
> Either way they won't be part of the initial specification but it looks like
> they can be added later without affecting the basics.
> 

Right its not needed in initial spec as long as we have a path to get
there and it looks like we have two usable possibilities so that works
for me.


>>> [...]
>>>> The two open items from me are do we need to support adding new variable
>>>> length headers? And how do we handle multiple tables I'll take that up
>>>> in the other thread.
>>>
>>> I think variable length headers may be eventually supported through pattern
>>> tricks or eventually a separate conversion layer.
>>>
>>
>> A parse graph notion would support this naturally though without pattern
>> tricks hence my above suggestions.
> 
> All right, I agree a method to let applications precisely define what they
> want to match can be useful now I understand what you mean by
> "dynamically".
> 
>> Also in the current scheme how would I match an ipv6 option or specific
>> nsh option or mpls tag?
> 
> Ideally through specific pattern items defined for this purpose, which is
> how I thought the API would evolve. Of course it wouldn't be fully dynamic
> and you'd have to wait for a DPDK release that implements them.
> 

The only trouble is if you don't know exactly where the option is in the
list of options (which you wont in general) its a bit hard to get right
with the existing spec as best I can tell. Because RAW patterns
would require you to know where the option is in the list and ANY
pattern wouldn't guarantee a match is in the correct header with stacked
headers. At least if I'm reading the spec correctly it seems to be
an issue.

.John