BPF Archive on lore.kernel.org
 help / color / Atom feed
From: Toshiaki Makita <toshiaki.makita1@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <jakub.kicinski@netronome.com>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Jiri Pirko <jiri@resnulli.us>,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	William Tu <u9012063@gmail.com>
Subject: Re: [RFC PATCH bpf-next 00/14] xdp_flow: Flow offload to XDP
Date: Wed, 14 Aug 2019 16:33:56 +0900
Message-ID: <f6160572-8fa8-0199-8d81-6159dd4cd5ff@gmail.com> (raw)
In-Reply-To: <20190814014445.3dnduyrass5jycr5@ast-mbp>

Hi Alexei, thank you for taking a look!

On 2019/08/14 10:44, Alexei Starovoitov wrote:
> On Tue, Aug 13, 2019 at 09:05:44PM +0900, Toshiaki Makita wrote:
>> This is a rough PoC for an idea to offload TC flower to XDP.
> ...
>>   xdp_flow  TC        ovs kmod
>>   --------  --------  --------
>>   4.0 Mpps  1.1 Mpps  1.1 Mpps
> Is xdp_flow limited to 4 Mpps due to veth or something else?

Looking at perf, accumulation of each layer's overhead resulted in the number.
With XDP prog which only redirects packets and does not touch the data,
the drop rate is 10 Mpps. In this case the main overhead is XDP's redirect processing
and handling of 2 XDP progs (in veth and i40e).
In the xdp_flow test the overhead additionally includes flow key parse in XDP prog
and hash table lookup (including jhash calculation) which resulted in 4 Mpps.

>> So xdp_flow drop rate is roughly 4x faster than software TC or ovs kmod.
>> OTOH the time to add a flow increases with xdp_flow.
>> ping latency of first packet when veth1 does XDP_PASS instead of DROP:
>>   xdp_flow  TC        ovs kmod
>>   --------  --------  --------
>>   25ms      12ms      0.6ms
>> xdp_flow does a lot of work to emulate TC behavior including UMH
>> transaction and multiple bpf map update from UMH which I think increases
>> the latency.
> make sense, but why vanilla TC is so slow ?

No ideas. At least TC requires additional syscall to insert a flow compared to ovs kmod,
but 12 ms looks too long for that.

>> * Implementation
>> xdp_flow makes use of UMH to load an eBPF program for XDP, similar to
>> bpfilter. The difference is that xdp_flow does not generate the eBPF
>> program dynamically but a prebuilt program is embedded in UMH. This is
>> mainly because flow insertion is considerably frequent. If we generate
>> and load an eBPF program on each insertion of a flow, the latency of the
>> first packet of ping in above test will incease, which I want to avoid.
> I think UMH approach is a good fit for this.
> Clearly the same algorithm can be done as kernel code or kernel module, but
> bpfilter-like UMH is a safer approach.
>> - patch 9
>>   Add tc-offload-xdp netdev feature and hooks to call xdp_flow kmod in
>>   TC flower offload code.
> The hook into UMH from TC looks simple. Do you expect the same interface to be
> reused from OVS ?

Do you mean openvswitch kernel module by OVS?
If so, no, at this point. TC hook is simple because I reused flow offload mechanism.
OVS kmod does not have offload interface and ovs-vswitchd is using TC for offload.
I wanted to reuse this mechanism for offloading to XDP, so using TC.

>> * About alternative userland (ovs-vswitchd etc.) implementation
>> Maybe a similar logic can be implemented in ovs-vswitchd offload
>> mechanism, instead of adding code to kernel. I just thought offloading
>> TC is more generic and allows wider usage with direct TC command.
>> For example, considering that OVS inserts a flow to kernel only when
>> flow miss happens in kernel, we can in advance add offloaded flows via
>> tc filter to avoid flow insertion latency for certain sensitive flows.
>> TC flower usage without using OVS is also possible.
>> Also as written above nftables can be offloaded to XDP with this
>> mechanism as well.
> Makes sense to me.
>>    bpf, hashtab: Compare keys in long
> 3Mpps vs 4Mpps just from this patch ?
> or combined with i40 prefech patch ?


>>   drivers/net/ethernet/intel/i40e/i40e_txrx.c  |    1 +
> Could you share "perf report" for just hash tab optimization
> and for i40 ?

Sure, I'll get some more data and post them.

> I haven't seen memcmp to be bottle neck in hash tab.
> What is the the of the key?

typo of "size of the key"? IIRC 64 bytes.

Toshiaki Makita

  reply index

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-13 12:05 Toshiaki Makita
2019-08-13 12:05 ` [RFC PATCH bpf-next 01/14] xdp_flow: Add skeleton of XDP based TC offload driver Toshiaki Makita
2019-08-13 12:05 ` [RFC PATCH bpf-next 02/14] xdp_flow: Add skeleton bpf program for XDP Toshiaki Makita
2019-08-13 12:05 ` [RFC PATCH bpf-next 03/14] bpf: Add API to get program from id Toshiaki Makita
2019-08-13 12:05 ` [RFC PATCH bpf-next 04/14] xdp_flow: Attach bpf prog to XDP in kernel after UMH loaded program Toshiaki Makita
2019-08-13 12:05 ` [RFC PATCH bpf-next 05/14] xdp_flow: Prepare flow tables in bpf Toshiaki Makita
2019-08-13 12:05 ` [RFC PATCH bpf-next 06/14] xdp_flow: Add flow entry insertion/deletion logic in UMH Toshiaki Makita
2019-08-13 12:05 ` [RFC PATCH bpf-next 07/14] xdp_flow: Add flow handling and basic actions in bpf prog Toshiaki Makita
2019-08-13 12:05 ` [RFC PATCH bpf-next 08/14] xdp_flow: Implement flow replacement/deletion logic in xdp_flow kmod Toshiaki Makita
2019-08-13 12:05 ` [RFC PATCH bpf-next 09/14] xdp_flow: Add netdev feature for enabling TC flower offload to XDP Toshiaki Makita
2019-08-13 12:05 ` [RFC PATCH bpf-next 10/14] xdp_flow: Implement redirect action Toshiaki Makita
2019-08-13 12:05 ` [RFC PATCH bpf-next 11/14] xdp_flow: Implement vlan_push action Toshiaki Makita
2019-08-13 12:05 ` [RFC PATCH bpf-next 12/14] bpf, selftest: Add test for xdp_flow Toshiaki Makita
2019-08-13 12:05 ` [RFC PATCH bpf-next 13/14] i40e: prefetch xdp->data before running XDP prog Toshiaki Makita
2019-08-13 12:05 ` [RFC PATCH bpf-next 14/14] bpf, hashtab: Compare keys in long Toshiaki Makita
2019-08-14  1:44 ` [RFC PATCH bpf-next 00/14] xdp_flow: Flow offload to XDP Alexei Starovoitov
2019-08-14  7:33   ` Toshiaki Makita [this message]
2019-08-15 10:59     ` Toshiaki Makita
2019-08-14 17:07 ` Stanislav Fomichev
2019-08-15 10:26   ` Toshiaki Makita
2019-08-15 15:21     ` Stanislav Fomichev
2019-08-15 19:22       ` Jakub Kicinski
2019-08-16  1:28         ` Toshiaki Makita
2019-08-16 18:52           ` Jakub Kicinski
2019-08-17 14:01             ` Toshiaki Makita
2019-08-19 18:15               ` Jakub Kicinski
2019-08-21  8:49                 ` Toshiaki Makita
2019-08-21 18:38                   ` Jakub Kicinski
2019-08-16 15:59         ` Stanislav Fomichev
2019-08-16 16:20           ` Stanislav Fomichev
2019-08-16  1:09       ` Toshiaki Makita
2019-08-16 15:35         ` Stanislav Fomichev
2019-08-17 14:10           ` Toshiaki Makita
2019-08-15 15:46 ` William Tu
2019-08-16  1:38   ` Toshiaki Makita

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f6160572-8fa8-0199-8d81-6159dd4cd5ff@gmail.com \
    --to=toshiaki.makita1@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=hawk@kernel.org \
    --cc=jakub.kicinski@netronome.com \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=netdev@vger.kernel.org \
    --cc=songliubraving@fb.com \
    --cc=u9012063@gmail.com \
    --cc=xiyou.wangcong@gmail.com \
    --cc=yhs@fb.com \


* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

BPF Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/bpf/0 bpf/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 bpf bpf/ https://lore.kernel.org/bpf \
	public-inbox-index bpf

Example config snippet for mirrors

Newsgroup available over NNTP:

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git