* [RFC net-next 0/9] net: bridge: Forward offloading
From: Tobias Waldekranz @ 2021-04-26 17:04 UTC (permalink / raw)
  To: davem, kuba
  Cc: andrew, vivien.didelot, f.fainelli, olteanv, roopa, nikolay,
	jiri, idosch, stephen, netdev, bridge

## Overview

   vlan1   vlan2
       \   /
   .-----------.
   |    br0    |
   '-----------'
   /   /   \   \
swp0 swp1 swp2 eth0
  :   :   :
  (hwdom 1)

Up to this point, switchdevs have been trusted with offloading
forwarding between bridge ports, e.g. forwarding a unicast from swp0
to swp1 or flooding a broadcast from swp2 to swp1 and swp0. This
series extends forward offloading to include some new classes of
traffic:

- Locally originating flows, i.e. packets that ingress on br0 that are
  to be forwarded to one or several of the ports swp{0,1,2}. Notably
  this also includes routed flows, e.g. a packet ingressing swp0 on
  VLAN 1 which is then routed over to VLAN 2 by the CPU and then
  forwarded to swp1 is "locally originating" from br0's point of view.

- Flows originating from "foreign" interfaces, i.e. an interface that
  is not offloaded by a particular switchdev instance. This includes
  ports belonging to other switchdev instances. A typical example
  would be flows from eth0 towards swp{0,1,2}.

The bridge still looks up its FDB/MDB as usual and then notifies the
switchdev driver that a particular skb should be offloaded if it
matches one of the classes above. It does so by using the _accel
version of dev_queue_xmit, supplying its own netdev as the
"subordinate" device. The driver can react to the presence of the
subordinate in its .ndo_select_queue in whatever way it needs to make
sure to forward the skb in much the same way that it would for packets
ingressing on regular ports.
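The signalling described above can be sketched as a small userspace model. This is only an illustration, not code from the series: struct net_device_model, struct skb_model, the fwd_offload flag and xmit_accel() are hypothetical stand-ins, loosely modelled on dev_queue_xmit_accel() and .ndo_select_queue, which both receive the subordinate device as an argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct net_device_model;	/* hypothetical stand-in for struct net_device */

struct skb_model {
	bool fwd_offload;	/* hypothetical flag: hardware should forward this skb */
};

struct net_device_model {
	const char *name;
	const struct net_device_model *bridge;	/* upper bridge, NULL if standalone */
	/* Modelled on .ndo_select_queue, which is passed the subordinate device. */
	unsigned int (*select_queue)(const struct net_device_model *dev,
				     struct skb_model *skb,
				     const struct net_device_model *sb_dev);
};

/* Driver side: the bridge showing up as subordinate requests the offload. */
static unsigned int port_select_queue(const struct net_device_model *dev,
				      struct skb_model *skb,
				      const struct net_device_model *sb_dev)
{
	skb->fwd_offload = sb_dev != NULL && sb_dev == dev->bridge;
	return 0;	/* this model has a single queue */
}

/* Bridge side: modelled on dev_queue_xmit_accel(skb, sb_dev). */
static unsigned int xmit_accel(const struct net_device_model *dev,
			       struct skb_model *skb,
			       const struct net_device_model *sb_dev)
{
	assert(dev != NULL && dev->select_queue != NULL);
	return dev->select_queue(dev, skb, sb_dev);
}
```

In this model, a transmit through swp0 with br0 as the subordinate sets the flag, while a plain transmit (NULL subordinate) leaves it clear.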

Hardware domains to which a particular skb has been forwarded are
recorded so that duplicates are avoided.
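As a rough illustration of this bookkeeping, the following userspace sketch counts how many skbs a flood hands to drivers. The structures, the fwd_hwdoms bitmask and flood() are assumptions made for the example and do not mirror the bridge's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a bridge port. */
struct port {
	const char *name;
	unsigned int hwdom;	/* 0 means no hardware domain (foreign port) */
	bool fwd_offload;	/* driver accepted forward offloading */
};

/* Hypothetical per-frame state. */
struct frame {
	unsigned int src_hwdom;	/* 0 if locally originated or from a foreign port */
	uint32_t fwd_hwdoms;	/* bitmask of hwdoms that already got the frame */
};

/*
 * Flood a frame to the destination ports in ports[] (the caller is
 * expected to exclude the source port) and return the number of skbs
 * handed to drivers.
 */
static unsigned int flood(struct frame *f, const struct port *ports, size_t n)
{
	unsigned int tx = 0;

	for (size_t i = 0; i < n; i++) {
		const struct port *p = &ports[i];

		assert(p->hwdom < 32);

		/* The hardware already forwarded within the ingress domain. */
		if (p->hwdom && p->hwdom == f->src_hwdom)
			continue;

		if (p->hwdom && p->fwd_offload) {
			uint32_t bit = UINT32_C(1) << p->hwdom;

			/* This domain already received its single copy. */
			if (f->fwd_hwdoms & bit)
				continue;
			f->fwd_hwdoms |= bit;
		}
		tx++;
	}
	return tx;
}
```

With swp{0,1,2} in hwdom 1 and eth0 as a foreign port, a locally originated flood costs two transmissions in this model (one for hwdom 1, one for eth0) instead of four without the offload.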

The main performance benefit is thus seen on multicast flows. Imagine
for example that:

- An IP camera is connected to swp0 (VLAN 1)

- The CPU is acting as a multicast router, routing the group from VLAN
  1 to VLAN 2.

- There are subscribers for the group in question behind both swp1 and
  swp2 (VLAN 2).

With this offloading in place, the bridge need only send a single skb
to the driver, which will send it to the hardware marked in such a way
that the switch will perform the multicast replication according to
the MDB configuration. Naturally, the number of saved skb_clones
increases linearly with the number of subscribed ports.

As an extra benefit, on mv88e6xxx, this also allows the switch to
perform source address learning on these flows, which avoids having to
sync dynamic FDB entries over slow configuration interfaces like MDIO
to avoid flows directed towards the CPU being flooded as unknown
unicast by the switch.


## RFC

- In general, what do you think about this idea?

- hwdom. What do you think about this terminology? Personally I feel
  that we had too many things called offload_fwd_mark, and that as the
  use of the bridge internal ID (nbp->offload_fwd_mark) expands, it
  might be useful to have a separate term for it.

- .dfwd_{add,del}_station. Am I stretching this abstraction too far,
  and if so do you have any suggestion/preference on how to signal the
  offloading from the bridge down to the switchdev driver?

- The way that flooding is implemented in br_forward.c (lazily cloning
  skbs) means that you have to mark the forwarding as completed very
  early (right after should_deliver in maybe_deliver) in order to
  avoid duplicates. Is there some way to move this decision point to a
  later stage that I am missing?

- BR_MULTICAST_TO_UNICAST. Right now, I expect that this series is not
  compatible with multicast-to-unicast being used on a port. Then
  again, I think that this would also be broken for regular switchdev
  bridge offloading as this flag is not offloaded to the switchdev
  port, so there is no way for the driver to refuse it. Any ideas on
  how to handle this?


## mv88e6xxx Specifics

Since we are now only receiving a single skb for both unicast and
multicast flows, we can tag the packets with the FORWARD command
instead of FROM_CPU. The switch(es) will then forward the packet in
accordance with its ATU, VTU, STU, and PVT configuration - just like
for packets ingressing on user ports.

Crucially, FROM_CPU is still used for:

- Ports in standalone mode.

- Flows that are trapped to the CPU and software-forwarded by a
  bridge. Note that these flows match neither of the classes discussed
  in the overview.

- Packets that are sent directly to a port netdev without going
  through the bridge, e.g. lldpd sending out PDUs via an AF_PACKET
  socket.

We thus have a pretty clean separation where the data plane uses
FORWARDs and the control plane uses TO_/FROM_CPU.
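The split between the two commands can be summarized with a minimal sketch. The enum, struct tx_ctx and pick_cmd() are hypothetical names for this example; the real tagger in net/dsa/tag_dsa.c is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the two DSA tag commands discussed above. */
enum dsa_cmd_model { CMD_FROM_CPU, CMD_FORWARD };

/* Hypothetical per-transmit context. */
struct tx_ctx {
	bool port_bridged;	/* false for ports in standalone mode */
	bool fwd_offload;	/* bridge asked for forward offloading of this skb */
};

/*
 * FORWARD only when the bridge requested offloading on a bridged port.
 * Standalone ports, software-forwarded flows and packets sent straight
 * to a port netdev (no subordinate device) all keep FROM_CPU.
 */
static enum dsa_cmd_model pick_cmd(const struct tx_ctx *c)
{
	assert(c != NULL);

	if (c->port_bridged && c->fwd_offload)
		return CMD_FORWARD;

	return CMD_FROM_CPU;
}
```

The three FROM_CPU cases listed above all collapse to "no offload request reached the driver" in this model, which is why a single flag is enough to pick the command.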

The barrier between different bridges is enforced by port based VLANs
on mv88e6xxx, which in essence is a mapping from a source device/port
pair to an allowed set of egress ports. In order to have a FORWARD
frame (which carries a _source_ device/port) correctly mapped by the
PVT, we must use a unique pair for each bridge.

Fortunately, there is typically lots of unused address space in most
switch trees. When was the last time you saw an mv88e6xxx product
using more than 4 chips? Even if you found one with 16 (!) devices,
you would still have room to allocate 16*16 virtual ports to software
bridges.

Therefore, the mv88e6xxx driver will allocate a virtual device/port
pair to each bridge that it offloads. All members of the same bridge
are then configured to allow packets from this virtual port in their
PVTs.
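A minimal sketch of such an allocation scheme follows. It assumes an address space of 32 devices with 16 ports each, which matches the 16*16 arithmetic above, and models the PVT of a single chip only. All names (struct tree_model, alloc_virt(), bridge_join()) are hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_DEVS	32	/* assumed size of the device address space */
#define MAX_PORTS	16	/* assumed number of ports per device */

/* Hypothetical model of a switch tree, seen from one member chip. */
struct tree_model {
	unsigned int num_chips;	/* physical devices use addresses 0..num_chips-1 */
	unsigned int next_virt;	/* virtual pairs handed out so far */
	/* Allowed egress port mask, indexed by source device/port. */
	uint16_t pvt[MAX_DEVS][MAX_PORTS];
};

struct dev_port {
	unsigned int dev;
	unsigned int port;
};

/* Hand out the next unused device/port pair above the physical chips. */
static bool alloc_virt(struct tree_model *t, struct dev_port *out)
{
	unsigned int free_devs;

	assert(t->num_chips <= MAX_DEVS);
	free_devs = MAX_DEVS - t->num_chips;

	if (t->next_virt >= free_devs * MAX_PORTS)
		return false;	/* address space exhausted */

	out->dev = t->num_chips + t->next_virt / MAX_PORTS;
	out->port = t->next_virt % MAX_PORTS;
	t->next_virt++;
	return true;
}

/* Allow frames sourced from the bridge's virtual pair to egress local_port. */
static void bridge_join(struct tree_model *t, struct dev_port virt,
			unsigned int local_port)
{
	assert(virt.dev < MAX_DEVS && virt.port < MAX_PORTS);
	assert(local_port < MAX_PORTS);
	t->pvt[virt.dev][virt.port] |= (uint16_t)(1u << local_port);
}
```

With 16 physical chips, this model hands out 256 pairs, starting at device 16 port 0, before allocation fails.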

Tobias Waldekranz (9):
  net: dfwd: Constrain existing users to macvlan subordinates
  net: bridge: Disambiguate offload_fwd_mark
  net: bridge: switchdev: Recycle unused hwdoms
  net: bridge: switchdev: Forward offloading
  net: dsa: Track port PVIDs
  net: dsa: Forward offloading
  net: dsa: mv88e6xxx: Allocate a virtual DSA port for each bridge
  net: dsa: mv88e6xxx: Map virtual bridge port in PVT
  net: dsa: mv88e6xxx: Forward offloading

 MAINTAINERS                                   |   1 +
 drivers/net/dsa/mv88e6xxx/Makefile            |   1 +
 drivers/net/dsa/mv88e6xxx/chip.c              |  61 ++++++-
 drivers/net/dsa/mv88e6xxx/dst.c               | 160 ++++++++++++++++++
 drivers/net/dsa/mv88e6xxx/dst.h               |  14 ++
 .../net/ethernet/intel/fm10k/fm10k_netdev.c   |   3 +
 drivers/net/ethernet/intel/i40e/i40e_main.c   |   3 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   3 +
 include/linux/dsa/mv88e6xxx.h                 |  13 ++
 include/net/dsa.h                             |  13 ++
 net/bridge/br_forward.c                       |  11 +-
 net/bridge/br_if.c                            |   4 +-
 net/bridge/br_private.h                       |  54 +++++-
 net/bridge/br_switchdev.c                     | 141 +++++++++++----
 net/dsa/port.c                                |  16 +-
 net/dsa/slave.c                               |  36 +++-
 net/dsa/tag_dsa.c                             |  33 +++-
 17 files changed, 510 insertions(+), 57 deletions(-)
 create mode 100644 drivers/net/dsa/mv88e6xxx/dst.c
 create mode 100644 drivers/net/dsa/mv88e6xxx/dst.h
 create mode 100644 include/linux/dsa/mv88e6xxx.h

-- 
2.25.1


