From: Jamal Hadi Salim <jhs@mojatatu.com>
To: John Fastabend <john.fastabend@gmail.com>,
	Toshiaki Makita <toshiaki.makita1@gmail.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <jakub.kicinski@netronome.com>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Jiri Pirko <jiri@resnulli.us>,
	Pablo Neira Ayuso <pablo@netfilter.org>,
	Jozsef Kadlecsik <kadlec@netfilter.org>,
	Florian Westphal <fw@strlen.de>,
	Pravin B Shelar <pshelar@ovn.org>
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org,
	William Tu <u9012063@gmail.com>,
	Stanislav Fomichev <sdf@fomichev.me>
Subject: Re: [RFC PATCH v2 bpf-next 00/15] xdp_flow: Flow offload to XDP
Date: Thu, 24 Oct 2019 13:05:53 -0400
Message-ID: <ff34648b-3c12-7a8c-381a-6f4838a202f4@mojatatu.com>
In-Reply-To: <5db12ac278d9f_549d2affde7825b85c@john-XPS-13-9370.notmuch>

On 2019-10-24 12:38 a.m., John Fastabend wrote:
> Jamal Hadi Salim wrote:
>>

[..]
> Correct, sorry was not entirely precise. I've written tooling on top of
> the netlink API to do what is needed and it worked out just fine.
> 
> I think it would be interesting (in this context of flower vs XDP vs
> u32, etc.) to build a flow API that abstracts tc vs XDP away and leverages
> the correct lower level mechanics as needed. Easier said than done
> of course.


So, IMO, the choice is usability vs performance vs expressability.
Pick 2 of 3.
Some context..

Usability:
Flower is intended for humans, so usability is higher priority.
Somewhere along that journey we lost track of reality - now all the
freaking drivers are exposing very highly performant interfaces
abstracted as flower. I was worried this is where this XDP interface
was heading when I saw this.

Expressability:
Flower: You want to add another tuple, sure change kernel
code, user code, driver code. Wait 3 years before the rest
of the world catches up.
u32: none of the above. In fact I can express flower using
u32.
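
To make the "express flower using u32" point concrete, here is a minimal
sketch of the idea, not the kernel's u32 code. The struct word_key type,
match_keys() and the tcp_to_10_0_0_1 table are hypothetical names made up
for illustration. It assumes offsets are relative to the start of an IPv4
header and that a flower-style "ip_proto tcp dst_ip 10.0.0.1" match is
wanted:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical key: compare one 32-bit word at a byte offset under a mask. */
struct word_key {
    size_t   off;   /* byte offset from the start of the IPv4 header */
    uint32_t mask;  /* bits that matter, after a big-endian load */
    uint32_t val;   /* expected value of those bits */
};

/* Load a big-endian 32-bit word without alignment assumptions. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Return 1 if every key matches, 0 otherwise (including short headers). */
static int match_keys(const uint8_t *hdr, size_t len,
                      const struct word_key *keys, size_t nkeys)
{
    for (size_t i = 0; i < nkeys; i++) {
        if (keys[i].off + 4 > len)
            return 0;
        if ((load_be32(hdr + keys[i].off) & keys[i].mask) != keys[i].val)
            return 0;
    }
    return 1;
}

/* "ip_proto tcp dst_ip 10.0.0.1" written as two word keys:
 * the protocol is byte 9, i.e. bits 23..16 of the word at offset 8;
 * the destination address is the whole word at offset 16. */
static const struct word_key tcp_to_10_0_0_1[] = {
    { 8,  0x00ff0000u, 0x00060000u },
    { 16, 0xffffffffu, 0x0a000001u },
};
```

Adding another "tuple" here is one more table row, with no change to the
matching code.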

Performance:
I think flower does well on egress when the flow cache
is already collected; on ingress those memcmps are
not cheap.
u32: you can organize your tables to make it performant
for your traffic patterns.

Back to your comment:
XDP should make choices that prioritize expressability and performance.

u32 will be a good choice because of its use of hierarchies of
tables for expression (and tables being close relatives of ebpf maps).
The embedded parse/match in u32 could use some refinements. Maybe in
modern machines we should work on 64 bit words instead of 32, etc.
Note: it doesn't have to be u32  _as long as the two requirements
are met_.
A human friendly "wrapper" API (if you want your 15 tuples by all means)
can be made on top. For machines give them the power to do more.

The third requirement I would have is to allow for other ways of
doing these classification/actions; sort of what tc does - allowing
many different implementations for different classifiers to coexist.
It may be u64 today but for some other use case you may need a different
classifier (and yes OVS can move theirs down there too).

> But flower itself is not so old.

It is out in the wild already.

>>
>> Summary: there is value in what Toshiaki is doing.
>>
>> I am disappointed that given a flexible canvas like XDP, we are still
>> going after something like flower... if someone was using u32 as the
>> abstraction it will justify it a lot more in my mind.
>> Tying it to OVS as well is not doing it justice.
> 
> William Tu worked on doing OVS natively in XDP at one point and
> could provide more input on the pain points. But seems easier to just
> modify OVS vs adding kernel shim code to take tc to xdp IMO.
> 

Will be good to hear William's pain points (there may be a paper out
there).

I don't think any of this should be done to cater for OVS. We need
a low level interface that is both expressive and performant.
OVS can ride on top of it. Human friendly interfaces can be
written on top.

Note also ebpf maps can be shared between tc and XDP.

> Agree but still first packets happen and introducing latency spikes
> when we have a better solution around should be avoided.
> 

Certainly susceptible to attacks (re: old route cache)

But:
If you want to allow people choice - then we can't put
obstacles in front of people who want to do silly things. Just don't
force everyone else to use your shit.

>> Hashes are good for datapath use cases but not when you consider
>> a holistic access where you have to worry about control aspect.
> 
> What's the "right" data structure?

From a datapath perspective, hash tables are fine. You can shard
them by having hierarchies, give them more buckets, use some clever
traffic specific keying algorithm etc.
From a control path perspective, there are challenges. If I want
to (for example) dump based on a partial key filter - that interface
becomes a linked list (i.e. I iterate the whole hash table matching
things). A trie would be better in that case.
In my world, when you have hundreds of thousands or millions of
flow entries that you need to retrieve for whatever reasons
every few seconds - this is a big deal.
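
A minimal sketch of that contrast, under stated assumptions: flow keys are
reduced to a single 32-bit value, the "partial key filter" is a leading-bit
prefix, and the dump is reduced to a count. All names here (struct
trie_node, trie_count_prefix(), scan_count_prefix()) are hypothetical and
not taken from tc, OVS or the bpf map code:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical binary trie over 32-bit flow keys, one bit per level. */
struct trie_node {
    struct trie_node *child[2];
    int is_entry;              /* 1 if a flow key ends at this node */
};

static struct trie_node *trie_new(void)
{
    return calloc(1, sizeof(struct trie_node));
}

/* Insert a full 32-bit key; returns 0 on success, -1 if allocation fails. */
static int trie_insert(struct trie_node *root, uint32_t key)
{
    struct trie_node *n = root;
    for (int bit = 31; bit >= 0; bit--) {
        int b = (key >> bit) & 1;
        if (!n->child[b]) {
            n->child[b] = trie_new();
            if (!n->child[b])
                return -1;
        }
        n = n->child[b];
    }
    n->is_entry = 1;
    return 0;
}

/* Count every entry in the subtree below n. */
static size_t trie_count(const struct trie_node *n)
{
    if (!n)
        return 0;
    return (size_t)n->is_entry + trie_count(n->child[0]) +
           trie_count(n->child[1]);
}

/* Partial-key query: descend plen bits, then visit only that subtree. */
static size_t trie_count_prefix(const struct trie_node *root,
                                uint32_t prefix, int plen)
{
    const struct trie_node *n = root;
    for (int bit = 31; bit > 31 - plen && n; bit--)
        n = n->child[(prefix >> bit) & 1];
    return trie_count(n);
}

/* Hash-table-style equivalent: every stored key has to be examined. */
static size_t scan_count_prefix(const uint32_t *keys, size_t nkeys,
                                uint32_t prefix, int plen)
{
    uint32_t mask = plen ? ~0u << (32 - plen) : 0;
    size_t hits = 0;
    for (size_t i = 0; i < nkeys; i++)
        if ((keys[i] & mask) == (prefix & mask))
            hits++;
    return hits;
}
```

The trie query touches plen nodes plus the matching subtree; the scan
touches every entry no matter how few match.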

> We can build it in XDP if
> its useful/generic. tc flower doesn't implement the same data
> structures as ovs kmod as far as I know.

Generic is key. Freedom is key. OVS is not that. If someone wants
to do a performant 2 tuple hardcoded classifier, let it be.
Let 1000 flowers (garden variety, not tc variety) bloom.

cheers,
jamal
