bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: John Fastabend <john.fastabend@gmail.com>
To: Toshiaki Makita <toshiaki.makita1@gmail.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <jakub.kicinski@netronome.com>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Jiri Pirko <jiri@resnulli.us>,
	Pablo Neira Ayuso <pablo@netfilter.org>,
	Jozsef Kadlecsik <kadlec@netfilter.org>,
	Florian Westphal <fw@strlen.de>,
	Pravin B Shelar <pshelar@ovn.org>
Cc: Toshiaki Makita <toshiaki.makita1@gmail.com>,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	William Tu <u9012063@gmail.com>,
	Stanislav Fomichev <sdf@fomichev.me>
Subject: RE: [RFC PATCH v2 bpf-next 00/15] xdp_flow: Flow offload to XDP
Date: Fri, 18 Oct 2019 08:22:41 -0700	[thread overview]
Message-ID: <5da9d8c125fd4_31cf2adc704105c456@john-XPS-13-9370.notmuch> (raw)
In-Reply-To: <20191018040748.30593-1-toshiaki.makita1@gmail.com>

Toshiaki Makita wrote:
> This is a PoC for an idea to offload flow, i.e. TC flower and nftables,
> to XDP.
> 

I've only read the cover letter so far but...

> * Motivation
> 
> The purpose is to speed up flow based network features like TC flower and
> nftables by making use of XDP.
> 
> I chose flow feature because my current interest is in OVS. OVS uses TC
> flower to offload flow tables to hardware, so if TC can offload flows to
> XDP, OVS also can be offloaded to XDP.

This adds a non-trivial amount of code and complexity so I'm
critical of the usefulness of being able to offload TC flower to
XDP when userspace can simply load an XDP program.

Why does OVS use tc flower at all if XDP is about 5x faster using
your measurements below? Rather than spend energy adding code to
a use case that as far as I can tell is narrowly focused on offload
support can we enumerate what is missing on XDP side that blocks
OVS from using it directly? Additionally for hardware that can
do XDP/BPF offload you will get the hardware offload for free.

Yes I know XDP is bytecode and you can't "offload" bytecode into
a flow based interface likely backed by a tcam but IMO that doesn't
mean we should leak complexity into the kernel network stack to
fix this. Use the tc-flower for offload only (it has support for
this) if you must and use the best (in terms of Mpps) software
interface for your software bits. And if you want auto-magic
offload support build hardware with BPF offload support.

In addition by using XDP natively any extra latency overhead from
bouncing calls through multiple layers would be removed.

> 
> When TC flower filter is offloaded to XDP, the received packets are
> handled by XDP first, and if their protocol or something is not
> supported by the eBPF program, the program returns XDP_PASS and packets
> are passed to upper layer TC.
> 
> The packet processing flow will be like this when this mechanism,
> xdp_flow, is used with OVS.

Same as obove just cross out the 'TC flower' box and add support
for your missing features to 'XDP prog' box. Now you have less
code to maintain and less bugs and aren't pushing packets through
multiple hops in a call chain.

> 
>  +-------------+
>  | openvswitch |
>  |    kmod     |
>  +-------------+
>         ^
>         | if not match in filters (flow key or action not supported by TC)
>  +-------------+
>  |  TC flower  |
>  +-------------+
>         ^
>         | if not match in flow tables (flow key or action not supported by XDP)
>  +-------------+
>  |  XDP prog   |
>  +-------------+
>         ^
>         | incoming packets
> 
> Of course we can directly use TC flower without OVS to speed up TC.

huh? TC flower is part of TC so not sure what 'speed up TC' means. I
guess this means using tc flower offload to xdp prog would speed up
general tc flower usage as well?

But again if we are concerned about Mpps metrics just write the XDP
program directly.

> 
> This is useful especially when the device does not support HW-offload.
> Such interfaces include virtual interfaces like veth.

I disagree, use XDP directly.

> 
> 
> * How to use

[...]

> * Performance

[...]

> Tested single core/single flow with 3 kinds of configurations.
> (spectre_v2 disabled)
> - xdp_flow: hw-offload=true, flow-offload-xdp on
> - TC:       hw-offload=true, flow-offload-xdp off (software TC)
> - ovs kmod: hw-offload=false
> 
>  xdp_flow  TC        ovs kmod
>  --------  --------  --------
>  5.2 Mpps  1.2 Mpps  1.1 Mpps
> 
> So xdp_flow drop rate is roughly 4x-5x faster than software TC or ovs kmod.

+1 yep the main point of using XDP ;)

> 
> OTOH the time to add a flow increases with xdp_flow.
> 
> ping latency of first packet when veth1 does XDP_PASS instead of DROP:
> 
>  xdp_flow  TC        ovs kmod
>  --------  --------  --------
>  22ms      6ms       0.6ms
> 
> xdp_flow does a lot of work to emulate TC behavior including UMH
> transaction and multiple bpf map update from UMH which I think increases
> the latency.

And this is IMO sinks why we would adopt this. A native XDP solution would
as far as I can tell not suffer this latency. So in short, we add lots
of code that needs to be maintained, in my opinion it adds complexity,
and finally I can't see what XDP is missing today (with the code we
already have upstream!) to block doing an implementation without any
changes.

> 
> 
> * Implementation
> 
 
[...]

> 
> * About OVS AF_XDP netdev

[...]
 
> * About alternative userland (ovs-vswitchd etc.) implementation
> 
> Maybe a similar logic can be implemented in ovs-vswitchd offload
> mechanism, instead of adding code to kernel. I just thought offloading
> TC is more generic and allows wider usage with direct TC command.
> 
> For example, considering that OVS inserts a flow to kernel only when
> flow miss happens in kernel, we can in advance add offloaded flows via
> tc filter to avoid flow insertion latency for certain sensitive flows.
> TC flower usage without using OVS is also possible.

I argue to cut tc filter out entirely and then I think non of this
is needed.

> 
> Also as written above nftables can be offloaded to XDP with this
> mechanism as well.

Or same argument use XDP directly.

> 
> Another way to achieve this from userland is to add notifications in
> flow_offload kernel code to inform userspace of flow addition and
> deletion events, and listen them by a deamon which in turn loads eBPF
> programs, attach them to XDP, and modify eBPF maps. Although this may
> open up more use cases, I'm not thinking this is the best solution
> because it requires emulation of kernel behavior as an offload engine
> but flow related code is heavily changing which is difficult to follow
> from out of tree.

So if everything was already in XDP why would we need these
notifications? I think a way to poll on a map from user space would
be a great idea e.g. everytime my XDP program adds a flow to my
hash map wake up my userspace agent with some ctx on what was added or
deleted so I can do some control plane logic.

[...]

Lots of code churn...

>  24 files changed, 2864 insertions(+), 30 deletions(-)

Thanks,
John

  parent reply	other threads:[~2019-10-18 15:22 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-18  4:07 [RFC PATCH v2 bpf-next 00/15] xdp_flow: Flow offload to XDP Toshiaki Makita
2019-10-18  4:07 ` [RFC PATCH v2 bpf-next 01/15] xdp_flow: Add skeleton of XDP based flow offload driver Toshiaki Makita
2019-10-18  4:07 ` [RFC PATCH v2 bpf-next 02/15] xdp_flow: Add skeleton bpf program for XDP Toshiaki Makita
2019-10-18  4:07 ` [RFC PATCH v2 bpf-next 03/15] bpf: Add API to get program from id Toshiaki Makita
2019-10-18  4:07 ` [RFC PATCH v2 bpf-next 04/15] xdp: Export dev_check_xdp and dev_change_xdp Toshiaki Makita
2019-10-18  4:07 ` [RFC PATCH v2 bpf-next 05/15] xdp_flow: Attach bpf prog to XDP in kernel after UMH loaded program Toshiaki Makita
2019-10-18  4:07 ` [RFC PATCH v2 bpf-next 06/15] xdp_flow: Prepare flow tables in bpf Toshiaki Makita
2019-10-18  4:07 ` [RFC PATCH v2 bpf-next 07/15] xdp_flow: Add flow entry insertion/deletion logic in UMH Toshiaki Makita
2019-10-18  4:07 ` [RFC PATCH v2 bpf-next 08/15] xdp_flow: Add flow handling and basic actions in bpf prog Toshiaki Makita
2019-10-18  4:07 ` [RFC PATCH v2 bpf-next 09/15] xdp_flow: Implement flow replacement/deletion logic in xdp_flow kmod Toshiaki Makita
2019-10-18  4:07 ` [RFC PATCH v2 bpf-next 10/15] xdp_flow: Add netdev feature for enabling flow offload to XDP Toshiaki Makita
2019-10-18  4:07 ` [RFC PATCH v2 bpf-next 11/15] xdp_flow: Implement redirect action Toshiaki Makita
2019-10-18  4:07 ` [RFC PATCH v2 bpf-next 12/15] xdp_flow: Implement vlan_push action Toshiaki Makita
2019-10-18  4:07 ` [RFC PATCH v2 bpf-next 13/15] bpf, selftest: Add test for xdp_flow Toshiaki Makita
2019-10-18  4:07 ` [RFC PATCH v2 bpf-next 14/15] i40e: prefetch xdp->data before running XDP prog Toshiaki Makita
2019-10-18  4:07 ` [RFC PATCH v2 bpf-next 15/15] bpf, hashtab: Compare keys in long Toshiaki Makita
2019-10-18 15:22 ` John Fastabend [this message]
2019-10-21  7:31   ` [RFC PATCH v2 bpf-next 00/15] xdp_flow: Flow offload to XDP Toshiaki Makita
2019-10-22 16:54     ` John Fastabend
2019-10-22 17:45       ` Toke Høiland-Jørgensen
2019-10-24  4:27         ` John Fastabend
2019-10-24 10:13           ` Toke Høiland-Jørgensen
2019-10-27 13:19             ` Toshiaki Makita
2019-10-27 15:21               ` Toke Høiland-Jørgensen
2019-10-28  3:16                 ` David Ahern
2019-10-28  8:36                   ` Toke Høiland-Jørgensen
2019-10-28 10:08                     ` Jesper Dangaard Brouer
2019-10-28 19:07                       ` David Ahern
2019-10-28 19:05                     ` David Ahern
2019-10-31  0:18                 ` Toshiaki Makita
2019-10-31 12:12                   ` Toke Høiland-Jørgensen
2019-11-11  7:32                     ` Toshiaki Makita
2019-11-12 16:53                       ` Toke Høiland-Jørgensen
2019-11-14 10:11                         ` Toshiaki Makita
2019-11-14 12:41                           ` Toke Høiland-Jørgensen
2019-11-18  6:41                             ` Toshiaki Makita
2019-11-18 10:20                               ` Toke Høiland-Jørgensen
2019-11-22  5:42                                 ` Toshiaki Makita
2019-11-22 11:54                                   ` Toke Høiland-Jørgensen
2019-11-25 10:18                                     ` Toshiaki Makita
2019-11-25 13:03                                       ` Toke Høiland-Jørgensen
2019-11-18 10:28                               ` Toke Høiland-Jørgensen
2019-10-27 13:13         ` Toshiaki Makita
2019-10-27 15:24           ` Toke Høiland-Jørgensen
2019-10-27 19:17             ` David Miller
2019-10-31  0:32               ` Toshiaki Makita
2019-11-12 17:50                 ` William Tu
2019-11-14 10:06                   ` Toshiaki Makita
2019-11-14 17:09                     ` William Tu
2019-11-15 13:16                       ` Toke Høiland-Jørgensen
2019-11-12 17:38             ` William Tu
2019-10-23 14:11       ` Jamal Hadi Salim
2019-10-24  4:38         ` John Fastabend
2019-10-24 17:05           ` Jamal Hadi Salim
2019-10-27 13:27         ` Toshiaki Makita
2019-10-27 13:06       ` Toshiaki Makita
2019-10-21 11:23 ` Björn Töpel
2019-10-21 11:47   ` Toshiaki Makita

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5da9d8c125fd4_31cf2adc704105c456@john-XPS-13-9370.notmuch \
    --to=john.fastabend@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=fw@strlen.de \
    --cc=hawk@kernel.org \
    --cc=jakub.kicinski@netronome.com \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=kadlec@netfilter.org \
    --cc=kafai@fb.com \
    --cc=netdev@vger.kernel.org \
    --cc=pablo@netfilter.org \
    --cc=pshelar@ovn.org \
    --cc=sdf@fomichev.me \
    --cc=songliubraving@fb.com \
    --cc=toshiaki.makita1@gmail.com \
    --cc=u9012063@gmail.com \
    --cc=xiyou.wangcong@gmail.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).