From: Jason Wang <jasowang@redhat.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: David Ahern <dsahern@gmail.com>,
	Jesper Dangaard Brouer <jbrouer@redhat.com>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	ast@kernel.org, daniel@iogearbox.net, mst@redhat.com,
	Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Subject: Re: [RFC PATCH net-next V2 0/6] XDP rx handler
Date: Thu, 16 Aug 2018 12:21:18 +0800	[thread overview]
Message-ID: <e459ee4a-7689-7b20-e2e3-c31944306396@redhat.com> (raw)
In-Reply-To: <20180816024913.57ykirt7eqrjntfy@ast-mbp.dhcp.thefacebook.com>



On 2018-08-16 10:49, Alexei Starovoitov wrote:
> On Wed, Aug 15, 2018 at 03:04:35PM +0800, Jason Wang wrote:
>>>> 3 Deliver XDP buff to userspace through macvtap.
>>> I think I'm getting what you're trying to achieve.
>>> You actually don't want any bpf programs in there at all.
>>> You want macvlan builtin logic to act on raw packet frames.
>> The built-in logic is just used to find the destination macvlan
>> device. It could be done through another bpf program instead. Compared
>> with inventing lots of generic infrastructure in the kernel with a
>> specific userspace API, built-in logic has its own advantages:
>>
>> - support hundreds or even thousands of macvlans
> are you saying an xdp bpf program cannot handle thousands of macvlans?

Correct me if I'm wrong, but that works well only when all the macvlans 
require similar logic. Consider the case where each macvlan wants its 
own specific logic: is it practical to have thousands of different 
policies and actions in a single BPF program? With the XDP rx handler, 
the root device does not need to care about them; each macvlan cares 
only about itself. This is similar to how a qdisc can be attached to 
each stacked device.
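
To make the contrast concrete, here is a minimal sketch of what the 
single-program approach would look like on the root device: look up the 
destination MAC, find that macvlan's slot, and tail-call into its 
policy program. The map names, sizes and the policy programs themselves 
are made up for illustration.

/* Hypothetical sketch: one XDP program on the root device dispatching
 * to per-macvlan policy programs via tail calls. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>

struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 4096);
        __type(key, unsigned char[ETH_ALEN]);   /* macvlan MAC address */
        __type(value, __u32);                   /* slot in the prog array */
} mac_to_slot SEC(".maps");

struct {
        __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
        __uint(max_entries, 4096);
        __type(key, __u32);
        __type(value, __u32);
} policies SEC(".maps");

SEC("xdp")
int xdp_dispatch(struct xdp_md *ctx)
{
        void *data = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;
        struct ethhdr *eth = data;
        __u32 *slot;

        if ((void *)(eth + 1) > data_end)
                return XDP_DROP;

        slot = bpf_map_lookup_elem(&mac_to_slot, eth->h_dest);
        if (slot)
                bpf_tail_call(ctx, &policies, *slot);

        return XDP_PASS;        /* no policy installed: fall through */
}

char _license[] SEC("license") = "GPL";

This works, but every macvlan's policy now lives behind maps owned by 
the root device, which is what leads to the management problem below.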

>
>> - using existing tools to configure the network
>> - immunity to topology changes
> what do you mean specifically?

Take the above example again: when a macvlan is created or deleted, we 
need to notify and update the policies in the root device. That 
requires a userspace control program to monitor those changes and 
notify the BPF program through maps. Unless the BPF program is written 
for one specific configuration and setup, this is not an easy task.
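
For illustration, the control program would need something like the 
following rough sketch: an rtnetlink listener that reacts to link 
add/remove events. watch_links() is a made-up name and the map-update 
calls themselves are elided.

/* Hypothetical userspace watcher keeping the BPF maps in sync with
 * macvlan creation and deletion. */
#include <linux/rtnetlink.h>
#include <sys/socket.h>
#include <unistd.h>

static int watch_links(void)
{
        struct sockaddr_nl sa = {
                .nl_family = AF_NETLINK,
                .nl_groups = RTMGRP_LINK,       /* link change events */
        };
        char buf[8192];
        int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

        if (fd < 0 || bind(fd, (struct sockaddr *)&sa, sizeof(sa)))
                return -1;

        for (;;) {
                ssize_t len = recv(fd, buf, sizeof(buf), 0);
                struct nlmsghdr *nh;

                if (len <= 0)
                        break;
                for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
                     nh = NLMSG_NEXT(nh, len)) {
                        if (nh->nlmsg_type == RTM_NEWLINK)
                                ;       /* install the new macvlan's policy */
                        else if (nh->nlmsg_type == RTM_DELLINK)
                                ;       /* remove its entries from the maps */
                }
        }
        close(fd);
        return -1;
}

With the XDP rx handler none of this machinery would be needed, since 
attaching or detaching a program follows the device lifetime.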

>
>> Besides the usage for containers, we can implement a macvtap RX
>> handler which allows fast packet forwarding to userspace.
> and try to reinvent af_xdp? the motivation for the patchset still escapes me.

No; macvtap is already used for forwarding packets to a VM. This just 
tries to deliver the XDP buff to the VM instead of an skb. The same 
idea was applied to TUN/TAP and showed impressive improvements.
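
For orientation, the object handed over instead of an skb is small; as 
of net-next around this time it looks like this:

struct xdp_buff {
        void *data;                     /* start of packet data */
        void *data_end;                 /* end of packet data */
        void *data_meta;                /* metadata area, if present */
        void *data_hard_start;          /* start of the headroom */
        struct xdp_rxq_info *rxq;       /* originating receive queue */
};

Delivering this directly avoids allocating and initializing an skb on 
the way to the VM.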

>
>> Actually, the idea is not limited to macvlan but applies to any
>> device that is based on an rx handler. Consider the case of bonding:
>> this allows setting a very simple XDP program on the slaves while
>> keeping a single main-logic XDP program on the bond, instead of
>> duplicating it in all the slaves.
> I think such a mixed environment of hardcoded in-kernel things like
> bond combined with xdp programs will be difficult to manage and debug.
> How is an admin supposed to debug it?

Well, we already have the in-kernel XDP_TX routine. Debugging this 
should be no harder than that.
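
For comparison, the smallest consumer of that routine is a program that 
bounces every frame back out the interface it arrived on:

/* Minimal XDP program exercising the in-kernel XDP_TX path. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_bounce(struct xdp_md *ctx)
{
        return XDP_TX;
}

char _license[] SEC("license") = "GPL";

Debugging the rx handler path should be comparable to debugging how a 
driver implements XDP_TX today.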

>   Say something in the chain of
> nic -> native xdp -> bond with your xdp rx -> veth -> xdp prog -> consumer
> is dropping a packet. If all forwarding decisions are done by bpf progs
> the progs will have a packet tracing facility (like cilium does) to
> show packet flow end-to-end. It works brilliantly, like traceroute within a host.

Does this work well for a veth pair too? If yes, it should work for the 
rx handler as well, unless it has some hard-coded logic like "the 
packet goes to veth, so I'm sure it will be delivered to its peer". The 
idea of this series is not to forbid forwarding decisions made by bpf 
progs; if the code did that by accident, we can introduce a flag to 
enable/disable the XDP rx handler.

And I believe redirection is only one part of XDP usage; we may still 
want things like XDP_TX.

> But when you have things like macvlan, bond, bridge in the middle
> that can also act on packet, the admin will have a hard time.

I admit it may require help from the admin, but it gives us more 
flexibility.

>
> Essentially what you're proposing is to make all kernel builtin packet
> steering/forwarding facilities understand raw xdp frames.

Probably not; this series, at least, focuses only on the rx handler, 
and fewer than 10 devices use that.

> That's a lot of code,
> and at the end of the chain you'd need a fast xdp frame consumer,
> otherwise the perf benefits are lost.

The performance benefit is lost, but it is still no worse than the skb 
path. And besides redirection, we do have other consumers like XDP_TX.

>   If that consumer is an xdp bpf program,
> why bother with an xdp-fied macvlan or bond?

For macvlan, we may want different policies for different devices. For 
bond, we don't want to duplicate the XDP logic in each slave, and only 
the bond knows which slave can be used for XDP_TX.

>   If that consumer is the tcp stack,
> then forwarding via an xdp-fied bond is no faster than via an skb-based bond.
>

Yes.

Thanks


Thread overview: 27+ messages
2018-08-13  3:17 [RFC PATCH net-next V2 0/6] XDP rx handler Jason Wang
2018-08-13  3:17 ` [RFC PATCH net-next V2 1/6] net: core: factor out generic XDP check and process routine Jason Wang
2018-08-13  3:17 ` [RFC PATCH net-next V2 2/6] net: core: generic XDP support for stacked device Jason Wang
2018-08-13  3:17 ` [RFC PATCH net-next V2 3/6] net: core: introduce XDP rx handler Jason Wang
2018-08-13  3:17 ` [RFC PATCH net-next V2 4/6] macvlan: count the number of vlan in source mode Jason Wang
2018-08-13  3:17 ` [RFC PATCH net-next V2 5/6] macvlan: basic XDP support Jason Wang
2018-08-13  3:17 ` [RFC PATCH net-next V2 6/6] virtio-net: support XDP rx handler Jason Wang
2018-08-14  9:22   ` Jesper Dangaard Brouer
2018-08-14 13:01     ` Jason Wang
2018-08-14  0:32 ` [RFC PATCH net-next V2 0/6] " Alexei Starovoitov
2018-08-14  7:59   ` Jason Wang
2018-08-14 10:17     ` Jesper Dangaard Brouer
2018-08-14 13:20       ` Jason Wang
2018-08-14 14:03         ` David Ahern
2018-08-15  0:29           ` Jason Wang
2018-08-15  5:35             ` Alexei Starovoitov
2018-08-15  7:04               ` Jason Wang
2018-08-16  2:49                 ` Alexei Starovoitov
2018-08-16  4:21                   ` Jason Wang [this message]
2018-08-15 17:17             ` David Ahern
2018-08-16  3:34               ` Jason Wang
2018-08-16  4:05                 ` Alexei Starovoitov
2018-08-16  4:24                   ` Jason Wang
2018-08-17 21:15                 ` David Ahern
2018-08-20  6:34                   ` Jason Wang
2018-09-05 17:20                     ` David Ahern
2018-09-06  5:12                       ` Jason Wang
