From: Nikolay Aleksandrov <nikolay@nvidia.com>
To: Felix Fietkau <nbd@nbd.name>, <netdev@vger.kernel.org>
Subject: Re: [RFC 2/2] net: bridge: add a software fast-path implementation
Date: Fri, 11 Feb 2022 10:50:40 +0200
Message-ID: <0b4318af-4c12-bd5a-ae32-165c70af65b2@nvidia.com>
In-Reply-To: <e8f1e8f5-8417-84a8-61c3-793fa7803ac6@nbd.name>

On 10/02/2022 18:53, Felix Fietkau wrote:
> 
> On 10.02.22 16:02, Nikolay Aleksandrov wrote:
>> Hi Felix,
>> that looks kind of familiar. :) I've been thinking about a similar optimization for
>> quite some time and generally love the idea, but I thought we'd allow this to be
>> implemented via an eBPF flow speedup with some bridge helpers. There's also a lot of
>> low-hanging fruit for optimization in the bridge's fast path.
>>
>> Also, from your commit message it seems you don't need to store this in the bridge at
>> all, but can use the notifications that others currently use and program these flows
>> in the interested driver. I think it'd be better to do the software flow cache via
>> ebpf, and do the hardware offload in the specific driver.
> To be honest, I have no idea how to handle this in a clean way in the driver, because this offloading path crosses several driver/subsystem boundaries.
> 
> Right now we have support for a packet processing engine (PPE) in the MT7622 SoC, which can handle offloading IPv4 NAT/routing and IPv6 routing.
> The hardware can also handle forwarding of src-MAC/dst-MAC tuples, but that is currently unused because it's not needed for ethernet-only forwarding.
> 
> When adding WLAN to the mix, it gets more complex. The PPE has an output port that connects to a special block called Wireless Ethernet Dispatch, which can be configured to intercept DMA between the WLAN driver (mt76) and a PCIe device with MT7615 or MT7915 in order to inject extra packets.
> 
> I already have working NAT/routing offload support for this, which I will post soon. In order to figure out the path to WLAN, the offloading code calls the .ndo_fill_forward_path op, which mac80211 supports.
> This allows the mt76 driver to fill in the required metadata, which gets stored in the PPE flowtable.
> 
> On MT7622, traffic can only flow from ethernet to WLAN in this manner; on newer SoCs, offloading can work in the other direction as well.
> 

That's really not a bridge problem, and not an argument for adding so much new infrastructure
that we would then have to maintain and fix. I'd prefer all that complexity to be kept where
it is needed, especially for something that (minus the offload/hw support) can already be
done much more efficiently with existing tools (flow marking/matching and offloading
using xdp/ebpf).
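
Just to illustrate what I mean, here's a minimal, untested sketch of such an xdp flow
cache. The map name, key layout and control plane are all made up for illustration, and
VLAN parsing is elided:

/*
 * Sketch only: hash map keyed on (ingress ifindex, dst MAC, VLAN) whose
 * value is the egress ifindex. The map would be populated from user-space
 * (e.g. from FDB notifications) once the bridge has made the forwarding
 * decision; a lookup miss falls back to the normal bridge path.
 */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>

struct br_flow_key {
    __u32 ifindex;          /* ingress port */
    __u8  dmac[ETH_ALEN];   /* destination MAC */
    __u16 vlan;             /* 0 = untagged */
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 4096);
    __type(key, struct br_flow_key);
    __type(value, __u32);   /* egress ifindex */
} br_flow_cache SEC(".maps");

SEC("xdp")
int br_fast_path(struct xdp_md *ctx)
{
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    struct ethhdr *eth = data;
    struct br_flow_key key = {};
    __u32 *egress;

    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;

    key.ifindex = ctx->ingress_ifindex;
    __builtin_memcpy(key.dmac, eth->h_dest, ETH_ALEN);
    /* VLAN tag parsing elided for brevity */

    egress = bpf_map_lookup_elem(&br_flow_cache, &key);
    if (!egress)
        return XDP_PASS;    /* slow path: normal bridge forwarding */

    return bpf_redirect(*egress, 0);
}

char _license[] SEC("license") = "GPL";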

> So when looking at fdb entries and flows between them, the ethernet driver will have to figure out:
> - which port number to use in the DSA tag on the ethernet side
> - which VLAN to use on the ethernet side, with the extra gotcha that ingress traffic will be tagged, but egress won't
> - which entries sit behind mac80211 vifs that support offloading through WED.
> 
> I would also need to add a way to push the notifications through DSA to the ethernet driver, because there is a switch in between that is not involved in the offloading path (the PPE handles DSA tagging/untagging).
> 

Add the absolute minimum infrastructure (if any at all) on the bridge side to achieve it.
As I mentioned above, caching flows can already be achieved by using ebpf, with some
extra care and maybe help from user-space. We don't need to maintain such complex and
very fragile new infrastructure. The new "fast" path is far from ideal: you've taken care
of a few cases only, there are many more that can and should affect it, and any new
features which get added will have to take it into account. It will be a big headache to
get correct in the first place and to maintain in the future, while at the same time we
can already do it through ebpf, and we can even make it easily available if new ebpf
helpers are accepted. I don't see any value in adding this to the bridge; a flow cache
done through xdp would be much faster for the software case.
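
And a rough idea of the user-space side, assuming recent libbpf (again an untested
sketch, names made up): a loader that attaches the program above to a bridge port and
hands back the map fd, so a small daemon watching FDB notifications could insert and
flush entries:

#include <bpf/libbpf.h>
#include <bpf/bpf.h>

int br_fast_path_attach(int ifindex, int *map_fd)
{
    struct bpf_object *obj;
    struct bpf_program *prog;

    /* load the compiled object from the sketch above */
    obj = bpf_object__open_file("br_fast_path.o", NULL);
    if (!obj || bpf_object__load(obj))
        return -1;

    prog = bpf_object__find_program_by_name(obj, "br_fast_path");
    *map_fd = bpf_object__find_map_fd_by_name(obj, "br_flow_cache");
    if (!prog || *map_fd < 0)
        return -1;

    /* XDP attach flags (skb/drv mode selection) elided */
    return bpf_xdp_attach(ifindex, bpf_program__fd(prog), 0, NULL);
}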

> By the way, all of this is needed for offloading a fairly standard configuration on these devices, so not just for weird exotic settings.
> 
> If I let the bridge code track flows, I can easily handle this by using the same kind of infrastructure that the netfilter flowtable uses. If I push this to the driver, it becomes a lot more complex and messy, in my opinion...
> 
> - Felix

I've seen these arguments many times over: it'll be easier to do device-specific
(in most cases unrelated to the bridge) feature Y in the bridge, so let's stick it there,
even though it doesn't fit the bridge model and similar functionality can already be
achieved by re-using existing tools.
I'm sure there are many ways to achieve that flow tracking. You can think about a
netfilter solution to get these flows (maybe through some nftables/ebtables rules), or
about marking the flows at ingress and picking them up at egress (a rough sketch of that
idea follows below); surely there are many more ways for your device to track the flows
that go through the bridge and offload them.
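
Something like this hypothetical tc classifier, for example (also untested; the flow key
is just a stand-in, a real one would parse the headers as in the xdp sketch above):

#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 4096);
    __type(key, __u64);     /* placeholder flow key */
    __type(value, __u32);   /* mark identifying the offloaded flow */
} flow_marks SEC(".maps");

SEC("tc")
int br_mark_ingress(struct __sk_buff *skb)
{
    __u64 key = skb->hash;  /* stand-in for a real parsed flow key */
    __u32 *mark;

    /* mark known flows so egress rules or the driver can pick them up */
    mark = bpf_map_lookup_elem(&flow_marks, &key);
    if (mark)
        skb->mark = *mark;

    return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";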

Thanks,
 Nik



