From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ferruh Yigit
Subject: Re: [RFC v2] Flow classification library
Date: Tue, 23 May 2017 13:58:44 +0100
Message-ID: <2742b205-fec9-3414-7693-e0c2369ae80d@intel.com>
References: <20170420185448.19162-1-ferruh.yigit@intel.com>
 <20170518181203.15244-1-ferruh.yigit@intel.com>
 <8CEF83825BEC744B83065625E567D7C224D5F20E@IRSMSX108.ger.corp.intel.com>
 <7428261c-4276-1d28-63c2-3a51a583cdf9@intel.com>
 <20170523122612.GC1758@6wind.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Cc: "Iremonger, Bernard", "dev@dpdk.org", "Mcnamara, John", "Tahhan, Maryam"
To: Adrien Mazarguil
Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org
 (Postfix) with ESMTP id C5C2828EE; Tue, 23 May 2017 14:58:47 +0200 (CEST)
In-Reply-To: <20170523122612.GC1758@6wind.com>
Content-Language: en-US
List-Id: DPDK patches and discussions
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

On 5/23/2017 1:26 PM, Adrien Mazarguil wrote:
> On Mon, May 22, 2017 at 02:53:28PM +0100, Ferruh Yigit wrote:
>> On 5/19/2017 5:30 PM, Iremonger, Bernard wrote:
>>> Hi Ferruh,
>>>
>>>> -----Original Message-----
>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ferruh Yigit
>>>> Sent: Thursday, May 18, 2017 7:12 PM
>>>> To: dev@dpdk.org
>>>> Cc: Yigit, Ferruh; Mcnamara, John; Tahhan, Maryam
>>>> Subject: [dpdk-dev] [RFC v2] Flow classification library
>>>>
>>>> DPDK works with packets, but some network administration tools work
>>>> based on flow information.
>>>>
>>>> This library is suggested to provide helper APIs to convert packet-based
>>>> information to flow records.
>>>>
>>>> Basically, the library consists of a single API that takes packets, a
>>>> flow definition and an action as parameters and provides flow stats
>>>> based on the action. The application should call the API for all
>>>> received packets.
>>>>
>>>> The library header file has more comments on how the library works and
>>>> on the provided APIs.
>>>>
>>>> Packet-to-flow conversion will cause a performance drop, which is why
>>>> the conversion is done on demand, by an API call provided by this
>>>> library.
>>>>
>>>> The initial implementation in mind is to provide support for the IPFIX
>>>> metering process, but the library is planned to be as generic as
>>>> possible. The flow information provided by this library is not enough
>>>> to implement full IPFIX features, but this is planned as an initial
>>>> step.
>>>>
>>>> Flows are defined using rte_flow, and measurements (actions) are also
>>>> provided by rte_flow. To support more IPFIX measurements, the
>>>> implementation may require extending rte_flow in addition to
>>>> implementing this library.
>>>
>>> Do you know what extensions are needed to the rte_flow code?
>>
>> The extensions may be required in two areas:
>> 1- Defining the flow
>> 2- Available actions
>>
>> For defining the flow, an update may not be required, especially for
>> the first version of the library.
>>
>> But for actions, there may be some updates.
>>
>> The IPFIX RFC defines the Metering Process as [1] (in [2]). This
>> library should provide helper APIs to the metering process.
>>
>> Currently the only action that can be used in rte_flow is COUNT; more
>> actions can be added to help with the "packet header capturing,
>> timestamping, sampling, classifying" tasks of the metering process.
>>
>> The exact list depends on what will be implemented in this release.
>>
>>
>> [1]
>> Metering Process
>>
>> The Metering Process generates Flow Records. Inputs to the
>> process are packet headers, characteristics, and Packet Treatment
>> observed at one or more Observation Points.
>>
>> The Metering Process consists of a set of functions that includes
>> packet header capturing, timestamping, sampling, classifying, and
>> maintaining Flow Records.
>>
>> The maintenance of Flow Records may include creating new records,
>> updating existing ones, computing Flow statistics, deriving
>> further Flow properties, detecting Flow expiration, passing Flow
>> Records to the Exporting Process, and deleting Flow Records.
>>
>> [2]
>> https://tools.ietf.org/html/rfc7011
>
> Since I did not take this into account in my previous answer [3], I now
> understand several of these requirements cannot be met by hardware (at
> least in the near future). Therefore I think it makes sense to leave
> IPFIX and, more generally, the maintenance of software data associated
> with flows to separate libraries, instead of adding many new rte_flow
> actions that can only be implemented in software.
>
> A hybrid solution as described in [3] is still needed regardless, to
> offload flow recognition so that only flows of interest are reprocessed
> in software to compute IPFIX and other data.
>
> You suggested at one point taking flow rules in addition to mbufs as
> input to handle that. Well, that's actually a nice approach.
>
> For this to work, rte_flow_classify would have to use opaque handles
> like rte_flow, provided back by the application when attempting to
> classify traffic. If the handle is not known (e.g. MARK is unsupported),
> a separate API function could take an mbuf as input and spit out the
> related rte_flow_classify object, if any.
>
> To be clear:
>
> 1. Create a classifier object:
>
>    classify = rte_flow_classify_create([some rte_flow pattern],
>        [classify-specific actions list, associated resources]);
>
> 2. Create some flow rule with a MARK action to identify it uniquely.
>    This step might fail and flow can be NULL; that's not an issue:
>
>    flow = rte_flow_create([the same pattern], MARK with id 42);
>
> 3. For each received packet:
>
>    /*
>     * Attempt HW and fall back on SW for flow identification in order
>     * to update classifier flow-related data.
>     */
>    if (flow) {
>        if ((mbuf->ol_flags & PKT_RX_FDIR_ID) && mbuf->hash.fdir.hi == 42)
>            tmp_classify = classify;
>    } else {
>        tmp_classify = rte_flow_classify_lookup([classifier candidates],
>            mbuf);
>    }
>    if (tmp_classify)
>        rte_flow_classify_update(tmp_classify, mbuf);
>
> 4. At some point, retrieve computed data from the classifier object
>    itself:
>
>    rte_flow_classify_stats_get(classify, [output buffer(s)]);
>
> On the RX path, the MARK action can be enough to implement the above.
> When not supported, it could also be emulated through the "sw_fallback"
> bit described in [3]; however, if the above approach is fine, there is
> no need for that.
>
> It's a bit more complicated to benefit from rte_flow on the TX path,
> since no MARK data can be returned. There is currently no other solution
> than doing it all in software anyway.
>
> [3] http://dpdk.org/ml/archives/dev/2017-May/066177.html

Using the MARK action is a good idea indeed, but I believe the software
version needs to be implemented anyway, so the MARK action can be a
next-step optimization.

Another thing is that having 3. and 4. as separate steps means
rte_flow_classify has to maintain the flow records in the library for an
unknown amount of time. The application needs to make the call in step 3
explicitly anyway, so why not return the output buffers immediately and
let the application do the required association of data to the ports?

Mainly, I would like to keep the same API logic as in RFC v2, but I agree
to add two new APIs:

rte_flow_classify_create()
rte_flow_classify_destroy()

Create will return an opaque rte_flow object (as done in rte_flow);
meanwhile it can create some masks in the rte_flow object to help with
parsing the mbuf.

And rte_flow_classify_stats_get() will take the rte_flow object and stats
as parameters. Also, instead of taking an array of filters, it will take
one rte_flow object at a time, to simplify the logic.

I am planning to send an updated RFC (v3) as described above; if there is
no major objection, we can continue the discussion on that RFC.

Thanks,
ferruh