From: Adrien Mazarguil
Subject: Re: [RFC] Generic flow director/filtering/classification API
Date: Wed, 20 Jul 2016 12:41:15 +0200
Message-ID: <20160720104115.GN7621@6wind.com>
In-Reply-To: <6A0DE07E22DDAD4C9103DF62FEBC090903492A93@shsmsx102.ccr.corp.intel.com>
References: <20160705181646.GO7621@6wind.com>
 <6A0DE07E22DDAD4C9103DF62FEBC09090348E1A7@shsmsx102.ccr.corp.intel.com>
 <20160707102650.GU7621@6wind.com>
 <6A0DE07E22DDAD4C9103DF62FEBC090903492563@shsmsx102.ccr.corp.intel.com>
 <20160719131219.GK7621@6wind.com>
 <6A0DE07E22DDAD4C9103DF62FEBC090903492A93@shsmsx102.ccr.corp.intel.com>
To: "Lu, Wenzhuo"
Cc: dev@dpdk.org, Thomas Monjalon, "Zhang, Helin", "Wu, Jingjing",
 Rasesh Mody, Ajit Khaparde, Rahul Lakkireddy, Jan Medala, John Daley,
 "Chen, Jing D", "Ananyev, Konstantin", Matej Vido, Alejandro Lucero,
 Sony Chacko, Jerin Jacob, "De Lara Guarch, Pablo", Olga Shern

Hi Wenzhuo,

On Wed, Jul 20, 2016 at 02:16:51AM +0000, Lu, Wenzhuo wrote:
[...]
> > So, today an application cannot combine N-tuple and FDIR flow rules
> > and get a reliable outcome, unless it is designed for specific devices
> > with a known behavior.
> > 
> > > What's the right behavior of the PMD if the APP wants to create a
> > > flow director rule which has a higher or even equal priority than an
> > > existing n-tuple rule? Should the PMD return failure?
> > 
> > First remember applications only deal with the generic API, PMDs are
> > responsible for choosing the most appropriate HW implementation to use
> > according to the requested flow rules (FDIR, N-tuple or anything
> > else).
> > 
> > For the specific case of FDIR vs N-tuple, if the underlying HW
> > supports both I do not see why the PMD would create a N-tuple rule.
> > Doesn't FDIR support everything N-tuple can do and much more?
> Talking about the filters, FDIR can cover n-tuple; I think that's why
> i40e only supports FDIR but not n-tuple. But n-tuple has its own
> strength: as we know, at least on Intel NICs, FDIR only supports a
> per-device mask, whereas n-tuple supports per-rule masks.
> As every pattern has both a spec and a mask, we cannot guarantee the
> masks are the same. I think ixgbe will try to use n-tuple first if it
> can, because even when the masks differ, we can support them all.

OK, makes sense. In that case existing rules may indeed prevent subsequent
ones from getting created if their priority is wrong. I do not think there
is a way around that if the application needs this exact ordering.
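To illustrate the per-rule mask point, a rough sketch (the struct layout
and values below are only placeholders for this discussion, not the format
defined by the RFC):

 /* Two rules matching IPv4 destinations with different masks. Since each
  * pattern item embeds its own mask, both rules can coexist; a filter
  * type limited to a single device-wide mask (FDIR on ixgbe/i40e) cannot
  * express this, while N-tuple filters can. */
 struct ipv4_match {
     uint32_t dst_addr;
 };

 static const struct ipv4_match spec_a = { .dst_addr = 0x0a010000 }; /* 10.1.0.0 */
 static const struct ipv4_match mask_a = { .dst_addr = 0xffff0000 }; /* /16 */
 static const struct ipv4_match spec_b = { .dst_addr = 0x0a020300 }; /* 10.2.3.0 */
 static const struct ipv4_match mask_b = { .dst_addr = 0xffffff00 }; /* /24 */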
> > Assuming such a thing happened anyway, that the PMD had to create a
> > rule using a high priority filter type and that the application
> > requests the creation of a rule that can only be done using a lower
> > priority filter type, but also requested a higher priority for that
> > rule, then yes, it should obviously fail.
> > 
> > That is, unless the PMD can perform some kind of workaround to have
> > both.
> > 
> > > If so, do we need more failure reasons? According to this RFC, I
> > > think we need to return "EEXIST: collision with an existing rule.",
> > > but it's not very clear; the APP doesn't know the problem is the
> > > priority, so a more detailed reason would be helpful.
> > 
> > Possibly, I've defined a basic set of errors, there are quite a number
> > of errno values to choose from. However I think we should not define
> > too many values. In my opinion the basic set covers every possible
> > failure:
> > 
> > - EINVAL: invalid format, rule is broken or cannot be understood by
> >   the PMD anyhow.
> > 
> > - ENOTSUP: pattern/actions look fine but something in the requested
> >   rule is not supported and thus cannot be applied.
> > 
> > - EEXIST: pattern/actions are fine and could have been applied if only
> >   some other rule did not prevent the PMD from doing it (I see it as
> >   the closest thing to "ETOOBAD" which unfortunately does not exist).
> > 
> > - ENOMEM: like EEXIST, except it is due to the lack of resources, not
> >   because of another rule. I wasn't sure which of ENOMEM or ENOSPC was
> >   better but settled on ENOMEM as it is well known. Still open to
> >   debate.
> > 
> > Errno values are only useful to get a rough idea of the reason, and
> > another mechanism is needed to pinpoint the exact problem for
> > debugging/reporting purposes, something like:
> > 
> >  enum rte_flow_error_type {
> >      RTE_FLOW_ERROR_TYPE_NONE,
> >      RTE_FLOW_ERROR_TYPE_UNKNOWN,
> >      RTE_FLOW_ERROR_TYPE_PRIORITY,
> >      RTE_FLOW_ERROR_TYPE_PATTERN,
> >      RTE_FLOW_ERROR_TYPE_ACTION,
> >  };
> > 
> >  struct rte_flow_error {
> >      enum rte_flow_error_type type;
> >      void *offset; /* Points to the exact pattern item or action. */
> >      const char *message;
> >  };
> When we are using a CLI and it fails, it normally lets us know which
> parameter is not appropriate. So, I think it's a good idea to have this
> error structure :)

Agreed.

> > Then either provide an optional struct rte_flow_error pointer to
> > rte_flow_validate(), or a separate function (rte_flow_analyze()?),
> > since processing this may be quite expensive and applications may not
> > care about the exact reason.
> Agree the processing may be too expensive, so maybe we can say it's
> optional to return error details. And that's a good question, what the
> APP should do if creating the rule fails. I believe normally it will
> choose to handle the rule by itself, but I think it's not bad to feed
> back more. Even if the APP wants to adjust the rules, that cannot be an
> option for lack of info.

All right then, I'll add it to the specification.

 int
 rte_flow_validate(uint8_t port_id,
                   const struct rte_flow_pattern *pattern,
                   const struct rte_flow_actions *actions,
                   struct rte_flow_error *error);

With error possibly NULL if the application does not care. Is it fine for
you?
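For example, an application could check a failed call as follows. This is
only a sketch: it assumes 0 on success and a negative errno-style value on
failure, which is not settled yet, and uses printf() for brevity (needs
<stdio.h> and <string.h>):

 static int
 check_rule(uint8_t port_id,
            const struct rte_flow_pattern *pattern,
            const struct rte_flow_actions *actions)
 {
     struct rte_flow_error error;
     int ret;

     memset(&error, 0, sizeof(error));
     ret = rte_flow_validate(port_id, pattern, actions, &error);
     if (!ret)
         return 0;
     switch (error.type) {
     case RTE_FLOW_ERROR_TYPE_PATTERN:
         /* offset points to the offending pattern item. */
         printf("bad pattern item at %p: %s\n", error.offset,
                error.message ? error.message : "(no details)");
         break;
     case RTE_FLOW_ERROR_TYPE_ACTION:
         printf("bad action at %p: %s\n", error.offset,
                error.message ? error.message : "(no details)");
         break;
     default:
         /* Fall back on the rough errno-style reason. */
         printf("cannot use rule: %s\n", strerror(-ret));
         break;
     }
     return ret;
 }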
[...]

> > > > > > - PMDs, not applications, are responsible for maintaining flow
> > > > > >   rules configuration when stopping and restarting a port or
> > > > > >   performing other actions which may affect them. They can
> > > > > >   only be destroyed explicitly.
> > > > > Don't understand "They can only be destroyed explicitly."
> > > > 
> > > > This part says that as long as an application has not called
> > > > rte_flow_destroy() on a flow rule, it never disappears, whatever
> > > > happens to the port (stopped, restarted). The application is not
> > > > responsible for re-creating rules after that.
> > > > 
> > > > Note that according to the specification, this may translate to
> > > > not being able to stop a port as long as a flow rule is present,
> > > > depending on how nice the PMD intends to be with applications.
> > > > Implementation can be done in small steps with minimal amount of
> > > > code on the PMD side.
> > > Does it mean the PMD should store and maintain all the rules? Why
> > > not let RTE do that? I think if the PMD maintains all the rules,
> > > every kind of NIC has to have its own copy of code for the rules,
> > > but if RTE does it, only one copy of code needs to be maintained,
> > > right?
> > 
> > I've considered having rules stored in a common format understood at
> > the RTE level and not specific to each PMD and decided that the opaque
> > rte_flow pointer was a better choice for the following reasons:
> > 
> > - Even though flow rules management is done in the control path,
> >   processing must be as fast as possible. Letting PMDs store flow
> >   rules using their own internal representation gives them the chance
> >   to achieve better performance.
> I don't quite understand. I think we're talking about maintaining the
> rules in SW, and I don't think there's anything to optimize according to
> specific NICs. If we need to optimize the code, we should consider the
> CPU, the OS... and some common means. Am I wrong?

Perhaps we were talking about different things, here I was only explaining
why rte_flow (the result of creating a flow rule) should be opaque and
fully managed by the PMD. More on the SW side of things below.

> > - An opaque context managed by PMDs would probably have to be stored
> >   somewhere as well anyway.
> > 
> > - PMDs may not need to allocate/store anything at all if they
> >   exclusively rely on HW state for everything. In my opinion, the
> >   generic API has enough constraints for this to work and maintain
> >   consistency between flow rules. Note this is currently how most PMDs
> >   implement FDIR and other filter types.
> Yes, the rules are stored by HW. But considering stopping/starting the
> device, the rules in HW will be lost, so we have to store the rules in
> SW and re-program them when restarting the device.

Assume a HW capable of keeping flow rules programmed even during a
stop/start cycle (e.g. mlx4/mlx5 may be able to do it from DPDK point of
view), don't you think it is more efficient to standardize on this
behavior and let PMDs restore flow rules for HW that do not support it,
regardless of whether it would be done by RTE or the application (SW)?

> And in the existing code, we store the filters in SW, at least on Intel
> NICs. But I think we cannot reuse that, because considering the priority
> and which category of filter should be chosen, we need a whole new table
> for the generic API. I think that's what's designed now, right?

So I understand you'd want RTE to help your PMD keep track of the flow
rules it created? Nothing wrong with that, all I'm saying is that it
should be entirely optional. RTE should not automatically maintain a
list. PMDs have to call RTE helpers if they need help to maintain a
context. These helpers are not defined in this API yet because it is
difficult to know what will be useful in advance.
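As an illustration only, a PMD whose HW loses rules on stop could keep its
own list and re-program it on start. Everything below is hypothetical
PMD-internal code (pmd_flow, pmd_hw_program_rule() and so on do not
exist), not part of the generic API:

 #include <sys/queue.h>

 /* Hypothetical PMD-internal bookkeeping. */
 struct pmd_flow {
     LIST_ENTRY(pmd_flow) next;
     /* ... HW-specific copy of the rule, re-programmed on start ... */
 };

 static LIST_HEAD(, pmd_flow) pmd_flows = LIST_HEAD_INITIALIZER(pmd_flows);

 static int
 pmd_dev_start(struct rte_eth_dev *dev)
 {
     struct pmd_flow *flow;

     /* Restore every rule the application still owns so that, from its
      * point of view, nothing happened across the stop/start cycle. */
     LIST_FOREACH(flow, &pmd_flows, next)
         if (pmd_hw_program_rule(dev, flow) < 0) /* hypothetical helper */
             return -1;
     return 0;
 }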
> > - RTE can (and will) provide helpers to avoid most of the code
> >   redundancy, PMDs are free to use them or manage everything by
> >   themselves.
> > 
> > - Given that the opaque rte_flow pointer associated with a flow rule
> >   is to be stored by the application, PMDs do not even have to keep
> >   references to them.
> Don't understand. More details?

In an application:

 struct rte_flow *foo = rte_flow_create(...);

In the above example, foo cannot be dereferenced by the application nor
RTE, only the PMD is aware of its contents. This object can only be used
with rte_flow*() functions.

PMDs are thus free to make this object grow as needed when adding internal
features without breaking any kind of public API/ABI.

What I meant is, given that the application is supposed to store foo
somewhere in order to destroy it later, the PMD does not have to keep
track of that pointer, assuming it does not need to access it later on its
own for some reason.
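To make that lifecycle explicit, a sketch from the application side
(assuming create/destroy take the same port_id/pattern/actions parameters
as rte_flow_validate() above, which is still open to change):

 static int
 setup_rule(uint8_t port_id,
            const struct rte_flow_pattern *pattern,
            const struct rte_flow_actions *actions)
 {
     struct rte_flow *flow;

     /* The returned object is the application's only handle on the
      * rule; only the PMD knows what is behind it. */
     flow = rte_flow_create(port_id, pattern, actions);
     if (flow == NULL)
         return -1;
     /* ... the rule remains in effect, including across port
      * stop/start cycles ... */
     /* Rules never disappear on their own; the application destroys
      * them explicitly with the handle it kept. */
     return rte_flow_destroy(port_id, flow);
 }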
> > - The flow rules format described in this specification
> >   (pattern / actions) will be used by applications directly, and they
> >   will be free to arrange them in lists, trees or in any other way if
> >   they need to keep flow specifications around for further processing.
> Who will create the lists, trees or something else? According to the
> previous discussion, I think the APP will program the rules one by one,
> so if the APP organizes the rules into lists, trees..., the PMD doesn't
> know about it.
> And you said "Given that the opaque rte_flow pointer associated with a
> flow rule is to be stored by the application". I'm lost here.

I guess that's because we're discussing two different things, flow rule
specifications and flow rule objects. Let me sum it up:

- Flow rule specifications are the patterns/actions combinations provided
  by applications to rte_flow_create(). Applications can store those as
  needed and organize them as they wish (hash, tree, list). Neither PMDs
  nor RTE will do it for them.

- Flow rule objects (struct rte_flow *) are generated when a flow rule is
  created. Applications must keep these around if they want to manipulate
  them later (i.e. destroy or query existing rules).

Then PMDs *may* need to keep and arrange flow rule objects internally for
management purposes. Could be because HW requires it, detecting
conflicting rules, managing priorities and so on. Possible reasons are not
described in this API because these are thought of as PMD-specific needs.

> > > When the port is stopped and restarted, RTE can reconfigure the
> > > rules. Is the concern that the PMD may adjust the sequence of the
> > > rules according to their priority, so every NIC has a different list
> > > of rules? But the PMD can adjust them again when RTE reconfigures
> > > the rules.
> > 
> > What about PMDs able to stop and restart ports without destroying
> > their own flow rules? If we assume flow rules must be destroyed when
> > stopping a port, these PMDs are needlessly penalized with slower
> > stop/start cycles. Think about it assuming thousands of flow rules.
> I believe the rules maintained in SW should not be destroyed, because
> they're needed to re-program the device when it starts again.

Do we agree that applications should not care? Flow rules configured
before stopping a port must still be there after restarting it.

What we seem to not agree about is that you think RTE should be
responsible for restoring flow rules of devices that lose them when
stopped. I think doing so is unfair to devices for which it is not the
case and not really nice to applications, so my opinion is that the PMD is
responsible for restoring flow rules however it wants. It is free to use
RTE helpers to keep track of them, as long as it's all managed internally.

> > Thus from an application point of view, whatever happens when stopping
> > and restarting a port should not matter. If a flow rule was present
> > before, it must still be present afterwards. If the PMD had to destroy
> > flow rules and re-create them, it does not actually matter if they
> > differ slightly at the HW level, as long as:
> > 
> > - Existing opaque flow rule pointers (rte_flow) are still valid to the
> >   PMD and refer to the same rules.
> > 
> > - The overall behavior of all rules is the same.
> > 
> > The list of rules you think of (patterns / actions) is maintained by
> > applications (not RTE), and only if they need them. RTE would
> > needlessly duplicate this.
> As said before, I need more details to understand this. Maybe an example
> is better :)

The generic format both RTE and applications might understand is the one
described in this API (struct rte_flow_pattern and struct
rte_flow_actions). If we wanted RTE to maintain some sort of per-port
state for flow rule specifications, it would have to be a copy of these
structures arranged somehow (list or something else).

If we consider that PMDs need to keep a context object associated to a
flow rule (the opaque struct rte_flow *), then RTE would most likely have
to store it along with the flow specification.

Such a list may not be useful to applications (list lookups take time), so
they would implement their own redundant method. They might also require
extra room to attach some application context to flow rules. A generic
list cannot plan for it.

Applications know what they want to do with flow rules and are responsible
for managing them efficiently with RTE out of the way.

I'm not sure if this answered your question, if not, please describe a
scenario where an RTE-managed list of flow rules would be mandatory.

-- 
Adrien Mazarguil
6WIND