From: Adrien Mazarguil
Subject: Re: [RFC] Generic flow director/filtering/classification API
Date: Wed, 13 Jul 2016 22:03:27 +0200
Message-ID: <20160713200327.GC7621@6wind.com>
In-Reply-To: <2EF2F5C0CC56984AA024D0B180335FCB13DEB236@IRSMSX102.ger.corp.intel.com>
References: <20160705181646.GO7621@6wind.com> <2EF2F5C0CC56984AA024D0B180335FCB13DEA331@IRSMSX102.ger.corp.intel.com> <20160708130310.GD7621@6wind.com> <2EF2F5C0CC56984AA024D0B180335FCB13DEB236@IRSMSX102.ger.corp.intel.com>
To: "Chandran, Sugesh"
Cc: "dev@dpdk.org", Thomas Monjalon, "Zhang, Helin", "Wu, Jingjing", Rasesh Mody, Ajit Khaparde, Rahul Lakkireddy, "Lu, Wenzhuo", Jan Medala, John Daley, "Chen, Jing D", "Ananyev, Konstantin", Matej Vido, Alejandro Lucero, Sony Chacko, Jerin Jacob, "De Lara Guarch, Pablo", Olga Shern

On Mon, Jul 11, 2016 at 10:42:36AM +0000, Chandran, Sugesh wrote:
> Hi Adrien,
>
> Thank you for your response,
> Please see my comments inline.

Hi Sugesh,

Sorry for the delay, please see my answers inline as well.

[...]
> > > > Flow director
> > > > -------------
> > > >
> > > > Flow director (FDIR) is the name of the most capable filter type,
> > > > which covers most features offered by others. As such, it is the
> > > > most widespread in PMDs that support filtering (i.e. all of them
> > > > besides **e1000**).
> > > >
> > > > It is also the only type that allows an arbitrary 32-bit value
> > > > provided by applications to be attached to a filter and returned
> > > > with matching packets instead of relying on the destination queue
> > > > to recognize flows.
> > > >
> > > > Unfortunately, even FDIR requires applications to be aware of
> > > > low-level capabilities and limitations (most of which come directly
> > > > from **ixgbe** and **i40e**):
> > > >
> > > > - Bitmasks are set globally per device (port?), not per filter.
> > > [Sugesh] This means an application cannot define filters that match on
> > > arbitrary different offsets? If that's the case, I assume the
> > > application has to program the bitmask in advance. Otherwise how does
> > > the API framework deduce this bitmask information from the rules? It's
> > > not very clear to me how an application passes down the bitmask
> > > information for multiple filters on the same port.
> >
> > This is my understanding of how flow director currently works, perhaps
> > someone more familiar with it can answer this question better than I
> > could.
> >
> > Let me take an example: if a particular device can only handle a single
> > IPv4 mask common to all flow rules (say only to match destination
> > addresses), updating that mask to also match the source address affects
> > all defined and future flow rules simultaneously.
> >
> > That is how FDIR currently works and I think it is wrong, as it
> > penalizes devices that do support individual bit-masks per rule, and is
> > a little awkward from an application point of view.
> >
> > What I suggest for the new API instead is the ability to specify one
> > bit-mask per rule, and let the PMD deal with HW limitations by
> > automatically configuring global bitmasks from the first added rule,
> > then refusing to add subsequent rules if they specify a conflicting
> > bit-mask. Existing rules remain unaffected that way, and applications do
> > not have to be extra cautious.
> >
> [Sugesh] The issue with that approach is that the hardware simply discards
> the rule when it is a superset of the first one, even though the hardware
> is capable of handling it. How is it guaranteed that the first rule will
> set the bitmask for all the subsequent rules?

Just to clarify, the API only says that new rules cannot affect existing
ones (which I think makes sense from a user's perspective), so as long as
the PMD does whatever is needed to make all rules work together, there
should not be any problem with this approach.

Even if the PMD has to temporarily remove an existing rule and reconfigure
global masks in order to add subsequent rules, it is fine as long as packets
aren't misdirected in the meantime (they may be dropped if there is no other
choice).
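
To make this more concrete, here is how two rules with different per-rule
IPv4 masks could look from the application side (a rough sketch; the
structure below is illustrative only, not an actual item definition from
the RFC):

  #include <stdint.h>

  /* Illustrative only: an IPv4 match item carrying its own mask, as
   * the proposed API would allow (byte-order handling omitted for
   * brevity). */
  struct ipv4_match {
          uint32_t src_addr;
          uint32_t dst_addr;
  };

  /* Rule 1 matches the destination address only. */
  static const struct ipv4_match spec1 = {
          .dst_addr = 0x0a000001, /* 10.0.0.1 */
  };
  static const struct ipv4_match mask1 = {
          .dst_addr = 0xffffffff,
  };

  /* Rule 2 also matches the source address. On HW limited to a single
   * global IPv4 mask, the PMD must either reconfigure that mask to
   * cover both rules or reject rule 2; rule 1 keeps working unchanged
   * either way. */
  static const struct ipv4_match spec2 = {
          .src_addr = 0x0a000002, /* 10.0.0.2 */
          .dst_addr = 0x0a000001, /* 10.0.0.1 */
  };
  static const struct ipv4_match mask2 = {
          .src_addr = 0xffffffff,
          .dst_addr = 0xffffffff,
  };

The important point is that the mask travels with each rule; whether the
device applies it per rule or globally is for the PMD to handle.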

> How about having a CLASSIFIER_TYPE for the classifier? Every port can
> have a set of supported flow types (e.g. L3_TYPE, L4_TYPE,
> L4_TYPE_8BYTE_FLEX, L4_TYPE_16BYTE_FLEX) based on the underlying FDIR
> support. The application can query this and set the type accordingly
> while initializing the port. This way the first rule need not set all the
> bits that may be needed in future rules.

Again from a user's POV, I think doing so would add unwanted HW-specific
complexity.

However this concern can be handled through a different approach. Let's say
a user creates a pattern that only specifies an IP header with a given
bit-mask. In FDIR language this translates to:

- Set the global mask for IPv4 accordingly, remaining global masks all
  zeroed (assumed default value).

- Create an IPv4 flow.

From now on, all rules specifying an IPv4 header must have this exact
bitmask (implicitly or explicitly), otherwise they cannot be created,
i.e. the global bitmask for IPv4 becomes immutable.

Now the user creates a TCPv4 rule (as long as it uses the same IPv4 mask).
To handle this, FDIR would:

- Keep the global immutable mask for IPv4 unchanged, set the global TCP
  mask according to the flow rule.

- Create a TCPv4 flow.

From this point on, like IPv4, subsequent TCP rules must have this exact
bitmask, and so on as each global bitmask becomes immutable.

Basically, only protocol bit-masks affected by existing flow rules are
immutable, others can be changed later. Global flow masks for protocols
become mutable again when no existing flow rule uses them.

Does it look fine to you?
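
In other words, a PMD could implement that behavior along these lines
(purely illustrative sketch, helper names made up, one such object per
protocol in the device context):

  #include <errno.h>
  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  #define PROTO_MASK_LEN 64 /* arbitrary upper bound for this sketch */

  /* One such object per protocol (IPv4, TCP, ...). */
  struct proto_mask {
          uint8_t mask[PROTO_MASK_LEN]; /* current global HW mask */
          unsigned int refcnt; /* flow rules currently relying on it */
  };

  /* Called when creating a rule; len is assumed <= PROTO_MASK_LEN. */
  static int
  proto_mask_acquire(struct proto_mask *pm, const uint8_t *wanted,
                     size_t len)
  {
          if (pm->refcnt == 0)
                  /* First rule for this protocol: program the global
                   * mask from it. */
                  memcpy(pm->mask, wanted, len);
          else if (memcmp(pm->mask, wanted, len))
                  /* Mask is immutable while at least one rule uses it. */
                  return -EEXIST;
          pm->refcnt++;
          return 0;
  }

  /* Called when destroying a rule. */
  static void
  proto_mask_release(struct proto_mask *pm)
  {
          /* Mask becomes mutable again once the last rule is gone. */
          if (--pm->refcnt == 0)
                  memset(pm->mask, 0, sizeof(pm->mask));
  }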

[...]
> > > > +--------------------------+
> > > > | Copy to queue 8          |
> > > > +==========+===============+
> > > > | PASSTHRU |               |
> > > > +----------+-----------+---+
> > > > | QUEUE    | ``queue`` | 8 |
> > > > +----------+-----------+---+
> > > >
> > > > ``ID``
> > > > ^^^^^^
> > > >
> > > > Attaches a 32-bit value to packets.
> > > >
> > > > +----------------------------------------------+
> > > > | ID                                           |
> > > > +========+=====================================+
> > > > | ``id`` | 32 bit value to return with packets |
> > > > +--------+-------------------------------------+
> > > >
> > > [Sugesh] I assume the application has to program the flow with a
> > > unique ID and matching packets are stamped with this ID when reporting
> > > to the software. The uniqueness of the ID is NOT guaranteed by the API
> > > framework. Correct me if I am wrong here.
> >
> > You are right, if the way I wrote it is not clear enough, I'm open to
> > suggestions to improve it.
> [Sugesh] I guess it's fine and would like to confirm the same. Perhaps
> it would be nice to mention that the IDs are application defined.

OK, I will make it clearer.

> > > [Sugesh] Is it a limitation to use only a 32-bit ID? Is it possible to
> > > have a 64-bit ID, so that the application can use the control plane
> > > flow pointer itself as an ID? Does it make sense?
> >
> > I've specified a 32-bit ID for now because this is what FDIR supports
> > and also what existing devices can report today AFAIK (i40e and mlx5).
> >
> > We could use 64 bit for future-proofing in a separate action like
> > "ID64" when at least one device supports it.
> >
> > To PMD maintainers: please comment if you know devices that support
> > tagging matching packets with more than 32 bits of user-provided data!
> [Sugesh] I guess the flow director ID is 64 bit, the XL710 datasheet says
> so. And in the 'rte_mbuf' structure the 64-bit FDIR-ID is shared with the
> RSS hash. This can be a software driver limitation that exposes only 32
> bits, possibly because of cache alignment issues? Since the hardware can
> support 64 bit, I feel it makes sense to support 64 bit as well.

I agree we need 64-bit support, but then we also need a solution for
devices that support only 32 bits. Possible methods I can think of:

- A separate "ID64" action (or an "ID32" one, perhaps with a better name).

- A single ID action with an unlimited number of bytes to return with
  packets (would actually be a string). PMDs can then refuse to create flow
  rules requesting an unsupported number of bytes. Devices supporting fewer
  than 32 bits are also included this way without the need for yet another
  action.

Thoughts?
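
To illustrate the second method, the action configuration could be
something like this (hypothetical structure, nothing final):

  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical configuration for a variable-length ID action. PMDs
   * would refuse to create a rule whose "size" exceeds what the device
   * can return with packets (e.g. 4 bytes on today's HW, 8 on devices
   * with 64-bit tags, fewer on more limited ones). */
  struct flow_action_id {
          size_t size;     /* number of bytes to return with packets */
          uint8_t bytes[]; /* application-defined data */
  };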

[...]
> > > [Sugesh] Another concern is the cost and time of installing these
> > > rules in the hardware. Can we make these APIs time bound (or at least
> > > provide an option to set a time limit to execute these APIs), so that
> > > the application doesn't have to wait so long when installing and
> > > deleting flows with slow hardware/NICs? What do you think? Most of the
> > > datapath flow installations are dynamic and triggered only when there
> > > is ingress traffic. Delays in flow insertion/deletion have
> > > unpredictable consequences.
> >
> > This API is (currently) aimed at the control path only, and must indeed
> > be assumed to be slow. Creating millions of rules may take quite long as
> > it may involve syscalls and other time-consuming synchronization things
> > on the PMD side.
> >
> > So currently there is no plan to have rules added from the data path
> > with time constraints. I think it would be implemented through a
> > different set of functions anyway.
> >
> > I do not think adding time limits is practical, even specifying in the
> > API that creating a single flow rule must take less than a maximum
> > number of seconds in order to be effective is too much of a constraint
> > (applications that create all flows during init may not care after all).
> >
> > You should consider in any case that modifying flow rules will always be
> > slower than receiving packets, there is no way around that. Applications
> > have to live with it and provide a software fallback for incoming
> > packets while managing flow rules.
> >
> > Moreover, think about what happens when you hit the maximum number of
> > flow rules and cannot create any more. Applications need to implement
> > some kind of fallback in their data path.
> >
> > Offloading flows in HW is also only useful if they live much longer than
> > the time taken to create and delete them. Perhaps applications may
> > choose to do so after detecting long lived flows such as TCP sessions.
> >
> > You may have one separate control thread dedicated to manage flows and
> > keep your normal control thread unaffected by delays. Several threads
> > can even be dedicated, one per device.
> [Sugesh] I agree that flow insertion cannot be as fast as the packet
> receiving rate. From the application point of view the problem arises when
> hardware flow insertion takes longer than software flow insertion. At the
> very least, the application has to know the cost of inserting/deleting a
> rule in hardware beforehand; otherwise how can it choose the right flow
> candidates for hardware? My point here is that the application expects
> deterministic behavior from a classifier while inserting and deleting
> rules.

Understood, however it will be difficult to estimate, particularly if a PMD
must rearrange flow rules to make room for a new one due to priority levels
collision or some other HW-related reason. I mean, spent time cannot be
assumed to be constant, even PMDs cannot know in advance because it also
depends on the performance of the host CPU.

Such applications may find it easier to measure elapsed time for the rules
they create, make statistics and extrapolate from this information for
future rules. I do not think the PMD can help much here.
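
For instance, nothing prevents applications from timing insertions
themselves. A rough sketch (rte_rdtsc() and rte_get_tsc_hz() come from
rte_cycles.h, the rte_flow_create() signature is the one drafted in this
RFC and update_insertion_stats() is a made-up application helper):

  #include <rte_cycles.h> /* rte_rdtsc(), rte_get_tsc_hz() */

  /* Wrap rule creation to measure its cost; the application can then
   * make statistics and decide whether offloading future flows is
   * worth it. */
  static struct rte_flow *
  timed_flow_create(uint8_t port_id,
                    const struct rte_flow_pattern *pattern,
                    const struct rte_flow_actions *actions)
  {
          uint64_t start = rte_rdtsc();
          struct rte_flow *flow = rte_flow_create(port_id, pattern,
                                                  actions);
          double usec = (double)(rte_rdtsc() - start) *
                        1e6 / rte_get_tsc_hz();

          /* Hypothetical application helper. */
          update_insertion_stats(usec, flow != NULL);
          return flow;
  }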

> > > [Sugesh] Another query is on the synchronization part. What if the
> > > same rules are handled from different threads? Is the application
> > > responsible for handling the concurrent hardware programming?
> >
> > Like most (if not all) DPDK APIs, applications are responsible for
> > managing locking issues as described in 4.3 (Behavior). Since this is a
> > control path API and applications usually have a single control thread,
> > locking should not be necessary in most cases.
> >
> > Regarding my above comment about using several control threads to
> > manage different devices, section 4.3 says:
> >
> >  "There is no provision for reentrancy/multi-thread safety, although
> >  nothing should prevent different devices from being configured at the
> >  same time. PMDs may protect their control path functions accordingly."
> >
> > I'd like to emphasize it is not "per port" but "per device", since in a
> > few cases a configurable resource is shared by several ports. It may be
> > difficult for applications to determine which ports are shared by a
> > given device but this falls outside the scope of this API.
> >
> > Do you think adding the guarantee that it is always safe to configure
> > two different ports simultaneously without locking from the application
> > side is necessary? In which case the PMD would be responsible for
> > locking shared resources.
> [Sugesh] This would be a little bit complicated when some of the ports
> are not under DPDK itself (what if one port is managed by the kernel?) or
> the ports are tied to different applications. Locking in the PMD helps
> when the ports are accessed by multiple DPDK applications. However what
> if the port itself is not under DPDK?

Well, either we do not care about what happens outside of the DPDK context,
or PMDs must find a way to satisfy everyone. I'm not a fan of locking either
but it would be nice if flow rule configuration could be attempted on
different ports simultaneously without the risk of wrecking anything, so
that applications do not need to care.

Possible cases for a dual port device with global flow rule settings
affecting both ports:

1) Ports 1 & 2 are managed by DPDK: this is the easy case, a rule that
   needs to alter a global setting necessary for an existing rule on any
   port is not allowed (EEXIST). The PMD must maintain a device context
   common to both ports in order for this to work. This context is either
   under lock, or the first port on which a flow rule is created owns all
   future flow rules.

2) Port 1 is managed by DPDK, port 2 by something else, the PMD is aware
   of it and knows that port 2 may modify the global context: no flow
   rules can be created from the DPDK application due to safety issues
   (EBUSY?).

3) Port 1 is managed by DPDK, port 2 by something else, the PMD is aware
   of it and knows that port 2 will not modify flow rules: the PMD should
   not care, no lock necessary.

4) Port 1 is managed by DPDK, port 2 by something else and the PMD is not
   aware of it: either flow rules cannot ever be created at all, or we say
   it is the user's responsibility to make sure this does not happen.

Considering that most control operations performed by DPDK affect the
device regardless of other applications, I think 1) is the only case that
should be defined, otherwise 4), defined as user's responsibility.

> > > > Destruction
> > > > ~~~~~~~~~~~
> > > >
> > > > Flow rules destruction is not automatic, and a queue should not be
> > > > released if any are still attached to it. Applications must take
> > > > care of performing this step before releasing resources.
> > > >
> > > > ::
> > > >
> > > >  int
> > > >  rte_flow_destroy(uint8_t port_id,
> > > >                   struct rte_flow *flow);
> > > >
> > > [Sugesh] I would suggest that having a clean-up API is really useful,
> > > as releasing a queue (is it applicable for releasing a port too?) does
> > > not guarantee automatic flow destruction.
> >
> > Would something like rte_flow_flush(port_id) do the trick? I wanted to
> > emphasize in this first draft that applications should really keep the
> > flow pointers around in order to manage/destroy them. It is their
> > responsibility, not the PMD's.
> [Sugesh] Thanks, I think the flush call will do.

Noted, will add it.
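
For the record, I expect its prototype to simply mirror rte_flow_destroy()
(subject to change):

  /* Destroy all flow rules associated with a port in one call, for
   * applications that want a clean slate without tracking every
   * rte_flow pointer individually. */
  int
  rte_flow_flush(uint8_t port_id);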

> > > This way the application can initialize the port, clean up all the
> > > existing rules and create new rules on a clean slate.
> >
> > No resource can be released as long as a flow rule is using it (bad
> > things may happen otherwise), all flow rules must be destroyed first,
> > thus none can possibly remain after initializing a port. It is assumed
> > that PMDs do automatic clean up during init if necessary to ensure this.
> [Sugesh] That will do.

I will make it more explicit as well.

[...]

-- 
Adrien Mazarguil
6WIND