From: Adrien Mazarguil
Subject: Re: [RFC] Generic flow director/filtering/classification API
Date: Fri, 15 Jul 2016 17:04:02 +0200
Message-ID: <20160715150402.GE7621@6wind.com>
In-Reply-To: <2EF2F5C0CC56984AA024D0B180335FCB13DEE55F@IRSMSX102.ger.corp.intel.com>
To: "Chandran, Sugesh"
Cc: "dev@dpdk.org", Thomas Monjalon, "Zhang, Helin", "Wu, Jingjing",
 Rasesh Mody, Ajit Khaparde, Rahul Lakkireddy, "Lu, Wenzhuo", Jan Medala,
 John Daley, "Chen, Jing D", "Ananyev, Konstantin", Matej Vido,
 Alejandro Lucero, Sony Chacko, Jerin Jacob, "De Lara Guarch, Pablo",
 Olga Shern, "Chilikin, Andrey"

On Fri, Jul 15, 2016 at 09:23:26AM +0000, Chandran, Sugesh wrote:
> Thank you Adrien,
> Please find below some more comments/inputs.
>
> Let me know your thoughts on this.

Thanks, again stripping non-relevant parts.

[...]

> > > > > [Sugesh] Is it a limitation to use only a 32 bit ID? Is it possible
> > > > > to have a 64 bit ID, so that the application can use the control
> > > > > plane flow pointer itself as an ID? Does it make sense?
> > > >
> > > > I've specified a 32 bit ID for now because this is what FDIR
> > > > supports and also what existing devices can report today AFAIK
> > > > (i40e and mlx5).
> > > >
> > > > We could use 64 bit for future-proofing in a separate action like
> > > > "ID64" when at least one device supports it.
> > > >
> > > > To PMD maintainers: please comment if you know devices that support
> > > > tagging matching packets with more than 32 bits of user-provided
> > > > data!
> > > [Sugesh] I guess the flow director ID is 64 bit, the XL710 datasheet
> > > says so. And in the 'rte_mbuf' structure the 64 bit FDIR-ID is shared
> > > with the RSS hash. This can be a software driver limitation that
> > > exposes only 32 bit, possibly because of cache alignment issues.
> > > Since the hardware can support 64 bit, I feel it makes sense to
> > > support 64 bit as well.
> >
> > I agree we need 64 bit support, but then we also need a solution for
> > devices that support only 32 bit. Possible methods I can think of:
> >
> > - A separate "ID64" action (or an "ID32" one, perhaps with a better name).
> >
> > - A single ID action with an unlimited number of bytes to return with
> >   packets (would actually be a string). PMDs can then refuse to create
> >   flow rules requesting an unsupported number of bytes. Devices
> >   supporting fewer than 32 bits are also included this way without the
> >   need for yet another action.
> >
> > Thoughts?
> [Sugesh] I feel the single ID approach is much better.
> But I would say a fixed size ID is easy to handle at upper layers. Say
> the PMD returns a 64 bit ID in which MSBs are masked out, based on how
> many bits the hardware can support. The PMD can refuse the unsupported
> number of bytes when requested. So the size of the ID is going to be a
> parameter to program the flow.
> What do you think?

What you suggest, if I am not mistaken, is:

 struct rte_flow_action_id {
     uint64_t id;
     uint64_t mask; /* either a bit-mask or a prefix/suffix length? */
 };

I think in this case a mask is more versatile than a prefix/suffix length,
as the value itself comes in an unknown endianness (from the PMD's point of
view). It also allows specific bits to be taken into account; for instance
when HW only supports 32 bits, with some black magic the full original
64 bit value can be restored as long as the application only cares about at
most 32 bits anywhere in it.

However I do not think many applications "won't care" about specific bits
in a given value, and having to provide a properly crafted mask will be a
hassle: they will just fill it with ones and hope for the best. As a result
they won't take advantage of this feature, or will stick to 32 bits all the
time, or whatever happens to be the least common denominator.

My previous suggestion was:

 struct rte_flow_action_id {
     uint8_t size; /* number of bytes in id[] */
     uint8_t id[];
 };

It does not solve the issue if an application requests more bytes than
supported; however, as a string, there is no endianness ambiguity and these
bytes are copied as-is to the related mbuf field as if done through
memcpy(), possibly with some padding to fill the entire 64 bit field
(copied bytes thus starting from the MSB on big-endian machines, LSB on
little-endian ones). The value itself remains opaque to the PMD.

One issue is that the flexible array approach makes static initialization
more complicated, as shown in the sketch below. Maybe it is not worth the
trouble since, according to Andrey, even X710 reports at most 32 bits of
user data.

So what should we do? A fixed 32 bit ID for now to keep things simple, then
another action for 64 bits later when necessary?
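For illustration, here is a minimal sketch of what I mean about static
initialization (the *_fixed/*_flex names and the make_id_action() helper
are made up for the comparison, they are not proposed API):

 #include <stdint.h>
 #include <stdlib.h>
 #include <string.h>

 /* Fixed-size variant: a compile-time initializer is trivial. */
 struct rte_flow_action_id_fixed {
     uint64_t id;
     uint64_t mask;
 };

 static const struct rte_flow_action_id_fixed id_fixed = {
     .id = 42,
     .mask = UINT64_MAX, /* application cares about every bit */
 };

 /* Flexible array variant: id[] cannot carry a static initializer in
  * standard C, so the action has to be assembled at run time. */
 struct rte_flow_action_id_flex {
     uint8_t size; /* number of bytes in id[] */
     uint8_t id[];
 };

 static struct rte_flow_action_id_flex *
 make_id_action(const uint8_t *bytes, uint8_t size)
 {
     struct rte_flow_action_id_flex *act = malloc(sizeof(*act) + size);

     if (act == NULL)
         return NULL;
     act->size = size;
     memcpy(act->id, bytes, size); /* opaque bytes, copied as-is */
     return act;
 }

Run-time assembly is not a big deal by itself, it just makes the usual
static const action arrays less convenient to write.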
> > [...]
> > > > > [Sugesh] Another concern is the cost and time of installing these
> > > > > rules in the hardware. Can we make these APIs time bound (or at
> > > > > least have an option to set a time limit to execute these APIs),
> > > > > so that the application doesn't have to wait so long when
> > > > > installing and deleting flows with slow hardware/NICs. What do you
> > > > > think? Most of the datapath flow installations are dynamic and
> > > > > triggered only when there is ingress traffic. Delays in flow
> > > > > insertion/deletion have unpredictable consequences.
> > > >
> > > > This API is (currently) aimed at the control path only, and must
> > > > indeed be assumed to be slow. Creating millions of rules may take
> > > > quite a while as it may involve syscalls and other time-consuming
> > > > synchronization things on the PMD side.
> > > >
> > > > So currently there is no plan to have rules added from the data path
> > > > with time constraints. I think it would be implemented through a
> > > > different set of functions anyway.
> > > >
> > > > I do not think adding time limits is practical; even specifying in
> > > > the API that creating a single flow rule must take less than a
> > > > maximum number of seconds in order to be effective is too much of a
> > > > constraint (applications that create all flows during init may not
> > > > care after all).
> > > >
> > > > You should consider in any case that modifying flow rules will
> > > > always be slower than receiving packets, there is no way around
> > > > that. Applications have to live with it and provide a software
> > > > fallback for incoming packets while managing flow rules.
> > > >
> > > > Moreover, think about what happens when you hit the maximum number
> > > > of flow rules and cannot create any more. Applications need to
> > > > implement some kind of fallback in their data path.
> > > >
> > > > Offloading flows in HW is also only useful if they live much longer
> > > > than the time taken to create and delete them. Perhaps applications
> > > > may choose to do so after detecting long-lived flows such as TCP
> > > > sessions.
> > > >
> > > > You may have one separate control thread dedicated to managing flows
> > > > and keep your normal control thread unaffected by delays. Several
> > > > threads can even be dedicated, one per device.
> > > [Sugesh] I agree that flow insertion cannot be as fast as the packet
> > > receiving rate. From an application point of view the problem arises
> > > when hardware flow insertion takes longer than software flow
> > > insertion. At least the application has to know the cost of
> > > inserting/deleting a rule in hardware beforehand, otherwise how can
> > > the application choose the right flow candidates for hardware? My
> > > point here is that the application expects deterministic behavior from
> > > a classifier while inserting and deleting rules.
> >
> > Understood, however it will be difficult to estimate, particularly if a
> > PMD must rearrange flow rules to make room for a new one due to priority
> > level collisions or some other HW-related reason. I mean, spent time
> > cannot be assumed to be constant, even PMDs cannot know in advance
> > because it also depends on the performance of the host CPU.
> >
> > Such applications may find it easier to measure elapsed time for the
> > rules they create, make statistics and extrapolate from this information
> > for future rules. I do not think the PMD can help much here.
> [Sugesh] From an application point of view this can be an issue.
> There is even a security concern when we program a short-lived flow.
> Let's consider the case:
>
> 1) The control plane programs the hardware with a queue termination flow.
> 2) The software dataplane is programmed to treat the packets from that
>    specific queue accordingly.
> 3) The flow is removed from the hardware (let's consider this a long wait
>    process), or there is even a chance that the hardware takes more time
>    to report the status than to remove it physically. Now the packets in
>    the queue are no longer considered matched/flow hit, because the
>    software dataplane update is yet to happen.
> We need a way to sync between the software datapath and the classifier
> APIs even though they are both programmed from a different control thread.
>
> Are we saying these APIs are only meant for user-defined static flows?

No, that is definitely not the intent. These are good points.

With the specified API, applications may have to adapt their logic and take
extra precautions in order to remain on the safe side at all times.

For your above example, the application cannot assume a rule is
added/deleted as long as the PMD has not completed the related operation,
which means keeping the SW rule/fallback in place in the meantime. This
should address the security concern as long as, after removing a rule,
packets end up in a default queue entirely processed by SW. Obviously this
may worsen response time.

The ID action can help with this. By knowing which rule a received packet
is associated with, processing can be temporarily offloaded by another
thread without much complexity (see the sketch below).

I think applications have to implement SW fallbacks all the time, as even
some sort of guarantee on the flow rule processing time may not be enough
to avoid misdirected packets and related security issues.

Let's wait for applications to start using this API, then consider an extra
set of asynchronous / real-time functions when the need arises. It should
not impact the way rules are specified.
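To illustrate, a rough and untested sketch of such a fallback based on the
32 bit FDIR ID currently reported through the mbuf (PKT_RX_FDIR_ID /
hash.fdir.hi); the app_rule table, MAX_RULES and the
handle_by_sw()/fast_path() helpers are hypothetical application code:

 #include <stdint.h>
 #include <rte_mbuf.h>
 #include <rte_ethdev.h>

 #define MAX_RULES 1024

 /* Hypothetical application-side state for each HW rule. */
 struct app_rule {
     volatile int removal_pending; /* set before rte_flow_destroy() */
     /* ... application data ... */
 };

 static struct app_rule rules[MAX_RULES];

 /* Placeholder SW slow path and HW-assisted fast path. */
 static void handle_by_sw(struct rte_mbuf *m) { rte_pktmbuf_free(m); }
 static void fast_path(struct rte_mbuf *m, struct app_rule *r)
 { (void)r; rte_pktmbuf_free(m); }

 static void
 rx_burst_with_fallback(uint8_t port, uint16_t queue)
 {
     struct rte_mbuf *pkts[32];
     uint16_t n = rte_eth_rx_burst(port, queue, pkts, 32);
     uint16_t i;

     for (i = 0; i != n; ++i) {
         struct rte_mbuf *m = pkts[i];
         uint32_t id;

         if (!(m->ol_flags & PKT_RX_FDIR_ID)) {
             /* No rule matched (or rule already gone): SW path. */
             handle_by_sw(m);
             continue;
         }
         id = m->hash.fdir.hi; /* 32 bit ID reported by HW today */
         if (id >= MAX_RULES || rules[id].removal_pending) {
             /* Rule being added/removed: keep the SW fallback. */
             handle_by_sw(m);
             continue;
         }
         fast_path(m, &rules[id]);
     }
 }

Until the PMD has completed rte_flow_destroy() (plus whatever time is
needed to drain the queues), packets simply keep hitting the SW path.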
> > > > > [Sugesh] Another query is on the synchronization part. What if the
> > > > > same rules are handled from different threads? Is the application
> > > > > responsible for handling the concurrent hardware programming?
> > > >
> > > > Like most (if not all) DPDK APIs, applications are responsible for
> > > > managing locking issues as described in 4.3 (Behavior). Since this is
> > > > a control path API and applications usually have a single control
> > > > thread, locking should not be necessary in most cases.
> > > >
> > > > Regarding my above comment about using several control threads to
> > > > manage different devices, section 4.3 says:
> > > >
> > > >  "There is no provision for reentrancy/multi-thread safety, although
> > > >  nothing should prevent different devices from being configured at
> > > >  the same time. PMDs may protect their control path functions
> > > >  accordingly."
> > > >
> > > > I'd like to emphasize it is not "per port" but "per device", since
> > > > in a few cases a configurable resource is shared by several ports.
> > > > It may be difficult for applications to determine which ports are
> > > > shared by a given device but this falls outside the scope of this API.
> > > >
> > > > Do you think adding the guarantee that it is always safe to
> > > > configure two different ports simultaneously without locking from
> > > > the application side is necessary? In which case the PMD would be
> > > > responsible for locking shared resources.
> > > [Sugesh] This would be a little bit complicated when some of the ports
> > > are not under DPDK itself (what if one port is managed by the kernel?)
> > > or the ports are tied to different applications. Locking in the PMD
> > > helps when the ports are accessed by multiple DPDK applications.
> > > However, what if the port itself is not under DPDK?
> >
> > Well, either we do not care about what happens outside of the DPDK
> > context, or PMDs must find a way to satisfy everyone. I'm not a fan of
> > locking either, but it would be nice if flow rule configuration could be
> > attempted on different ports simultaneously without the risk of wrecking
> > anything, so that applications do not need to care.
> >
> > Possible cases for a dual port device with global flow rule settings
> > affecting both ports:
> >
> > 1) Ports 1 & 2 are managed by DPDK: this is the easy case. A rule that
> >    needs to alter a global setting necessary for an existing rule on any
> >    port is not allowed (EEXIST). The PMD must maintain a device context
> >    common to both ports in order for this to work. This context is
> >    either under lock, or the first port on which a flow rule is created
> >    owns all future flow rules.
> >
> > 2) Port 1 is managed by DPDK, port 2 by something else, the PMD is aware
> >    of it and knows that port 2 may modify the global context: no flow
> >    rules can be created from the DPDK application due to safety issues
> >    (EBUSY?).
> >
> > 3) Port 1 is managed by DPDK, port 2 by something else, the PMD is aware
> >    of it and knows that port 2 will not modify flow rules: the PMD should
> >    not care, no lock necessary.
> >
> > 4) Port 1 is managed by DPDK, port 2 by something else and the PMD is not
> >    aware of it: either flow rules can never be created at all, or we say
> >    it is the user's responsibility to make sure this does not happen.
> >
> > Considering that most control operations performed by DPDK affect the
> > device regardless of other applications, I think 1) is the only case that
> > should be defined, otherwise 4), defined as the user's responsibility.
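To make 1) a bit more concrete, a rough sketch of the kind of device
context a PMD could share between both ports, here with the "under lock"
variant (all names are made up, this is not an implementation proposal):

 #include <errno.h>
 #include <rte_spinlock.h>

 /* Hypothetical PMD-internal context shared by both ports of a device,
  * lock initialized with rte_spinlock_init() at probe time. */
 struct pmd_dev_ctx {
     rte_spinlock_t lock;     /* serializes flow rule configuration */
     unsigned int rule_count; /* rules installed across both ports */
     int global_setting;      /* some setting shared by both ports */
 };

 /* Hypothetical per-port private data pointing back to the device. */
 struct pmd_port_priv {
     struct pmd_dev_ctx *dev; /* same pointer in both ports' private data */
     int port_id;
 };

 static int
 pmd_flow_create(struct pmd_port_priv *priv, int required_global_setting)
 {
     struct pmd_dev_ctx *dev = priv->dev;
     int ret = 0;

     rte_spinlock_lock(&dev->lock);
     if (dev->rule_count != 0 &&
         dev->global_setting != required_global_setting) {
         /* Would alter a global setting needed by existing rules. */
         ret = -EEXIST;
     } else {
         dev->global_setting = required_global_setting;
         dev->rule_count++;
     }
     rte_spinlock_unlock(&dev->lock);
     return ret;
 }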
No more comments on this part? What do you suggest?

-- 
Adrien Mazarguil
6WIND