netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vladimir Oltean <olteanv@gmail.com>
To: Tobias Waldekranz <tobias@waldekranz.com>
Cc: davem@davemloft.net, kuba@kernel.org, andrew@lunn.ch,
	vivien.didelot@gmail.com, f.fainelli@gmail.com, roopa@nvidia.com,
	nikolay@nvidia.com, netdev@vger.kernel.org, jiri@resnulli.us,
	idosch@idosch.org, stephen@networkplumber.org
Subject: Re: [RFC net-next 2/7] net: bridge: switchdev: Include local flag in FDB notifications
Date: Sun, 17 Jan 2021 21:30:09 +0200	[thread overview]
Message-ID: <20210117193009.io3nungdwuzmo5f7@skbuf> (raw)
In-Reply-To: <20210116012515.3152-3-tobias@waldekranz.com>

Hi Tobias,

On Sat, Jan 16, 2021 at 02:25:10AM +0100, Tobias Waldekranz wrote:
> Some switchdev drivers, notably DSA, ignore all dynamically learned
> address notifications (!added_by_user) as these are autonomously added
> by the switch. Previously, such a notification was indistinguishable
> from a local address notification. Include a local bit in the
> notification so that the two classes can be discriminated.
>
> This allows DSA-like devices to add local addresses to the hardware
> FDB (with the CPU as the destination), thereby avoiding flows towards
> the CPU being flooded by the switch as unknown unicast.
>
> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
> ---

In an ideal world, the BR_FDB_LOCAL bit of an FDB entry is what you
would probably want to use as an indication that the packet must be
delivered upstream by the hardware, considering that this is what the
software data path does:

br_handle_frame_finish:
		if (test_bit(BR_FDB_LOCAL, &dst->flags))
			return br_pass_frame_up(skb);

However, we are not in an ideal world, but in a cacophony of nonsensical
flags that must be passed to the 'bridge fdb add' command. For example,
I noticed this usage pattern in your patch 6/7:

|    br0
|    / \
| swp0 dummy0
|
| $ bridge fdb add 02:00:de:ad:00:01 dev dummy0 vlan 1 master

Do you know that this command doesn't do what you think it does
(assuming that 02:00:de:ad:00:01 is not the MAC address of dummy0)?

The command you wrote will add a _local_ FDB entry on dummy0.
I tried it on a DSA switch interface (swp0):

$ bridge fdb add 02:00:de:ad:00:01 dev swp0 vlan 1 master
[ 3162.165561] rtnl_fdb_add: addr 02:00:de:ad:00:01 vid 1 ndm_state NUD_NOARP|NUD_PERMANENT
[ 3162.172487] fdb_add_entry: fdb 02:00:de:ad:00:01 state NUD_NOARP|NUD_PERMANENT, fdb_to_nud NUD_REACHABLE, flags 0x0
[ 3162.180515] mscc_felix 0000:00:00.5 swp0: local fdb: 02:00:de:ad:00:01 vid 1

So, after your patches, this command syntax will stop adding a
front-facing FDB entry on swp0. It will add a CPU-facing FDB entry
instead.

You know why the bridge neighbour state is NUD_NOARP|NUD_PERMANENT in
rtnl_fdb_add? Well, because iproute2 set it that way:

	/* Assume permanent */
	if (!(req.ndm.ndm_state&(NUD_PERMANENT|NUD_REACHABLE)))
		req.ndm.ndm_state |= NUD_PERMANENT;

See iproute2 commit 0849e60a10d0 ("bridge: manage VXLAN based forwarding
tables"). I know so little about VXLAN's use of the bridge command, that
I cannot tell why it was decided to "assume permanent" (which seems to
have changed the default behavior for everybody).

Otherwise said, even if not mentioned in the man page, the default FDB
entry type is NUD_PERMANENT (which, in short, means a "local" entry, see
a more detailed explanation at the end).

The man page just says:

   bridge fdb add - add a new fdb entry
       This command creates a new fdb entry.

       LLADDR the Ethernet MAC address.

       dev DEV
              the interface to which this address is associated.

              local - is a local permanent fdb entry

              static - is a static (no arp) fdb entry

which is utterly misleading and useless. It does not say:
(a) what a local FDB entry is
(b) that if neither "local" or "static"|"dynamic" is specified,
    "local" is default

This already creates pretty bad premises. You would have needed to
explicitly add "static" to your command. Not only you, but in fact also
thousands of other people who already have switchdev deployments using
the 'bridge fdb' command.

So, in short, if everybody with switchdev hardware used the 'bridge fdb'
command correctly so far, your series would have been great. But in
fact, nobody did (me included). So we need to be more creative.

For example, there's that annoying "self" flag.
As far as I understand, there is zero reason for a DSA driver to use the
"self" flag, since that means "bypass the bridge FDB and just call the
.ndo_fdb_add of the device driver, which in the case of DSA is
dsa_legacy_fdb_add". Instead you would just use the "master" flag, which
makes the operation be propagated through br_fdb_add and the software
FDB has a chance to be updated.

Only that there's no one preventing you from using the 'self' and
'master' flags together. Which means that the FDB would be offloaded to
the device twice: once through the SWITCHDEV_FDB_ADD_TO_DEVICE
notification emitted by br_fdb_add, and once through dsa_legacy_fdb_add.
Contradict me if I'm wrong, but I'm thinking that you may have missed
this detail that bridge fdb addresses are implicitly 'local' because you
also used some 'self master' commands, and the "to-CPU" address
installed through switchdev was immediately overwritten by the correct
one installed through dsa_legacy_fdb_add.

So I think there is a very real issue in that the FDB entries with the
is_local bit was never specified to switchdev thus far, and now suddenly
is. I'm sorry, but what you're saying in the commit message, that
"!added_by_user has so far been indistinguishable from is_local" is
simply false.

What I'm saying is that some of the is_local addresses should have been
rejected by DSA from day one, like this one:

bridge fdb add dev swp0 00:01:02:03:04:05 master

but nonetheless DSA is happy to accept it anyway, because switchdev
doesn't tell it it's local. Yay.

It looks like Mellanox have been telling their users to explicitly use
the "static" keyword when they mean a static FDB entry:
https://github.com/Mellanox/mlxsw/wiki/Bridge#forwarding-database-configuration
which, I mean, is great for them, but pretty much sucks for everybody
else, because Documentation/networking/switchdev.rst just says:

The switchdev driver should implement ndo_fdb_add, ndo_fdb_del and ndo_fdb_dump
to support static FDB entries installed to the device.  Static bridge FDB
entries are installed, for example, using iproute2 bridge cmd::

	bridge fdb add ADDR dev DEV [vlan VID] [self]

which of course is completely bogus anyway (it indicates 'self', it
doesn't indicate 'master' at all, it doesn't say anything about 'static').

Look, I'd be more than happy to accept that I'm the only idiot who
misread the existing documentation on 'bridge fdb', but I fear that this
is far from true. If we want to make progress with this patch, some user
space breakage will be necessary - and I think I'm in favour of doing
that.


Appendix:

The bridge FDB netlink UAPI is modeled as a neighbouring protocol for
PF_BRIDGE, similar to what ARP is for IPv4 and ND is for IPv6. There is
this forced correlation between generic neighbour states (NUD ==
"Network Unreachability Detection") and types of FDB entries:
- NUD_NOARP:
Normally this state is used to denote a neighbour that doesn't need any
protocol to resolve the mapping between L3 address and L2 address.
It seems to be mostly used when the neigh->dev does not implement
header_ops (net device has no L2 header), and therefore no neighbor
resolution is needed, because there is no L2 addressing on that device.
For PF_BRIDGE, all FDB entries are NUD_NOARP, except for 'dynamic' ones
(see below).

- NUD_PERMANENT:
Normally this state means that the L2 address of the neighbor has been
statically configured by the user and therefore there is no need for a
neighbour resolution.
For PF_BRIDGE though, it means that an FDB entry is 'local', i.e. the L2
address belongs to a local interface.

- NUD_REACHABLE:
Normally this state means that the L2 address has been resolved by the
neighbouring protocol.
For PF_BRIDGE, this flag must be interpreted together with NUD_NOARP for
full meaning.
NUD_REACHABLE=true, NUD_NOARP=true: static FDB entry
NUD_REACHABLE=true, NUD_NOARP=false: dynamic FDB entry

  reply	other threads:[~2021-01-17 19:32 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-16  1:25 [RFC net-next 0/7] net: dsa: Sync local bridge FDB addresses to hardware Tobias Waldekranz
2021-01-16  1:25 ` [RFC net-next 1/7] net: bridge: switchdev: Refactor br_switchdev_fdb_notify Tobias Waldekranz
2021-01-17 17:24   ` Vladimir Oltean
2021-01-16  1:25 ` [RFC net-next 2/7] net: bridge: switchdev: Include local flag in FDB notifications Tobias Waldekranz
2021-01-17 19:30   ` Vladimir Oltean [this message]
2021-01-18 18:58     ` Tobias Waldekranz
2021-01-18 19:27       ` Vladimir Oltean
2021-01-18 20:19         ` Tobias Waldekranz
2021-01-18 21:03           ` Vladimir Oltean
2021-01-18 21:17           ` Nikolay Aleksandrov
2021-01-18 21:22             ` Nikolay Aleksandrov
2021-01-18 21:39               ` Nikolay Aleksandrov
2021-01-18 21:50                 ` Vladimir Oltean
2021-01-18 21:53                   ` Nikolay Aleksandrov
2021-01-18 22:06                     ` Vladimir Oltean
2021-01-18 22:09                       ` Vladimir Oltean
2021-01-18 22:42                       ` Nikolay Aleksandrov
2021-01-19  0:42                         ` Vladimir Oltean
2021-01-19 10:14                           ` Nikolay Aleksandrov
2021-01-18 19:28     ` Ido Schimmel
2021-01-16  1:25 ` [RFC net-next 3/7] net: bridge: switchdev: Send FDB notifications for host addresses Tobias Waldekranz
2021-01-18 11:28   ` Vladimir Oltean
2021-01-16  1:25 ` [RFC net-next 4/7] net: dsa: Include local addresses in assisted CPU port learning Tobias Waldekranz
2021-01-16  1:25 ` [RFC net-next 5/7] net: dsa: Include bridge " Tobias Waldekranz
2021-01-16  1:25 ` [RFC net-next 6/7] net: dsa: Sync static FDB entries on foreign interfaces to hardware Tobias Waldekranz
2021-01-16  1:25 ` [RFC net-next 7/7] net: dsa: mv88e6xxx: Request assisted learning on CPU port Tobias Waldekranz
2021-02-01  6:24 ` DENG Qingfang
2021-02-03  9:27   ` Tobias Waldekranz
2021-02-03 10:14     ` Vladimir Oltean
2021-02-03 10:42       ` Tobias Waldekranz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210117193009.io3nungdwuzmo5f7@skbuf \
    --to=olteanv@gmail.com \
    --cc=andrew@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=f.fainelli@gmail.com \
    --cc=idosch@idosch.org \
    --cc=jiri@resnulli.us \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=nikolay@nvidia.com \
    --cc=roopa@nvidia.com \
    --cc=stephen@networkplumber.org \
    --cc=tobias@waldekranz.com \
    --cc=vivien.didelot@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).