From: Tobias Waldekranz <tobias@waldekranz.com>
To: Ido Schimmel <idosch@idosch.org>, Vladimir Oltean <olteanv@gmail.com>
Cc: davem@davemloft.net, kuba@kernel.org, andrew@lunn.ch,
	vivien.didelot@gmail.com, f.fainelli@gmail.com,
	j.vosburgh@gmail.com, vfalico@gmail.com, andy@greyhouse.net,
	netdev@vger.kernel.org
Subject: Re: [PATCH v3 net-next 2/4] net: dsa: Link aggregation support
Date: Wed, 16 Dec 2020 16:15:03 +0100
Message-ID: <87y2hxbx54.fsf@waldekranz.com>
In-Reply-To: <20201214114237.GA2789489@shredder.lan>

On Mon, Dec 14, 2020 at 13:42, Ido Schimmel <idosch@idosch.org> wrote:
> On Mon, Dec 14, 2020 at 02:12:31AM +0200, Vladimir Oltean wrote:
>> On Sun, Dec 13, 2020 at 10:18:27PM +0100, Tobias Waldekranz wrote:
>> > On Sat, Dec 12, 2020 at 16:26, Vladimir Oltean <olteanv@gmail.com> wrote:
>> > > On Fri, Dec 11, 2020 at 09:50:24PM +0100, Tobias Waldekranz wrote:
>> > >> 2. The issue Vladimir mentioned above. This is also a straightforward
>> > >>    fix; I have a patch for tag_dsa, making sure that offload_fwd_mark is
>> > >>    never set for ports in standalone mode.
>> > >>
>> > >>    I am not sure if I should solve it like that or if we should just
>> > >>    clear the mark in dsa_switch_rcv if the dp does not have a
>> > >>    bridge_dev. I know both Vladimir and I were leaning towards each
>> > >>    tagger solving it internally. But looking at the code, I get the
>> > >>    feeling that all taggers will end up copying the same block of code
>> > >>    anyway. What do you think?
>> > >> As for this series, my intention is to make sure that (A) works as
>> > >> intended, leaving (B) for another day. Does that seem reasonable?
>> > >>
>> > >> NOTE: In the offloaded case, (B) will of course also be supported.
>> > >
>> > > Yeah, ok, one can already tell that the way I've tested this setup was
>> > > by commenting out skb->offload_fwd_mark = 1 altogether. It seems ok to
>> > > postpone this a bit.
>> > >
>> > > For what it's worth, in the giant "RX filtering for DSA switches" fiasco
>> > > https://patchwork.ozlabs.org/project/netdev/patch/20200521211036.668624-11-olteanv@gmail.com/
>> > > we seemed to reach the conclusion that it would be ok to add a new NDO
>> > > answering the question "can this interface do forwarding in hardware
>> > > towards this other interface". We can probably start with the question
>> > > being asked for L2 forwarding only.
>> >
>> > Very interesting, though I did not completely understand the VXLAN
>> > scenario laid out in that thread. I understand that OFM can not be 0,
>> > because you might have successfully forwarded to some destinations. But
>> > setting it to 1 does not smell right either. OFM=1 means "this has
>> > already been forwarded according to your current configuration" which is
>> > not completely true in this case. This is something in the middle, more
>> > like skb->offload_fwd_mark = its_complicated;
>> 
>> Very pertinent question. Given your observation that nbp_switchdev_mark_set()
>> calls dev_get_port_parent_id() with recurse=true, this means that a vxlan
>> upper should have the same parent ID as the real interface. At least the
>> theory coincides with the little practice I applied to my setup where
>> felix does not support vxlan offload:
>> 
>> I printed the p->offload_fwd_mark assigned by nbp_switchdev_mark_set:
>> ip link add br0 type bridge
>> ip link set swp1 master br0
>> [   15.887217] mscc_felix 0000:00:00.5 swp1: offload_fwd_mark 1
>> ip link add vxlan10 type vxlan id 10 group 224.10.10.10 dstport 4789 ttl 10 dev swp0
>> ip link set vxlan10 master br0
>> [  102.734390] vxlan10: offload_fwd_mark 1
>> 
>> So a clearer explanation needs to be found for how Ido's exception
>> traffic due to missing neighbor in the vxlan underlay gets re-forwarded
>> by the software bridge to the software vxlan interface. It cannot be due
>> to a mismatch of bridge port offload_fwd_mark values unless there is
>> some different logic applied for Mellanox hardware that I am not seeing.
>> So after all, it must be due to skb->offload_fwd_mark being unset?
>> 
>> To be honest, I almost expect that the Mellanox switches are "all or
>> nothing" in terms of forwarding. So if the vxlan interface (which is
>> only one of the bridge ports) could not deliver the packet, it would
>> seem cleaner to me that none of the other interfaces deliver the packet
>> either. Then the driver picks up this exception packet on the original
>> ingress interface, and the software bridge + software vxlan do the job.
>> And this means that skb->offload_fwd_mark = it_isnt_complicated.
>> 
>> But this is clearly at odds with what Ido said, that "swp0 and vxlan0 do
>> not have the same parent ID", and which was the center of his entire
>> argument. It's my fault really, I should have checked. Let's hope that
>> Ido can explain again.
>
> Problem is here:
>
> ip link add vxlan10 type vxlan id 10 group 224.10.10.10 dstport 4789 ttl 10 dev swp0
>
> We don't configure VXLAN with a bound device. In fact, we forbid it:
> https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c#L46
> https://elixir.bootlin.com/linux/latest/source/tools/testing/selftests/drivers/net/mlxsw/vxlan.sh#L182
>
> Even if we were to support a bound device, it is unlikely to be a switch
> port, but some dummy interface that we would enslave to a VRF in which
> we would like the underlay lookup to be performed. We use this with GRE
> tunnels:
> https://github.com/Mellanox/mlxsw/wiki/L3-Tunneling#general-gre-configuration
>
> Currently, underlay lookup always happens in the default VRF.
>
> VXLAN recently got support for this as well. See this series:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=79dfab43a976b76713c40222987c48e32510ebc1

How do you handle multiple VXLAN interfaces?  I.e. in this setup:

         br0
   .--' .' '. '----.
  /    /     \      \
swp0 swp1  vxlan0 vxlan1

Say that both VXLANs are offloaded, the nexthop of vxlan0 is in the
hardware ARP cache, but vxlan1's is not.

If a broadcast is received on swp0, hardware will forward to swp1 and
vxlan0, then trap the original frame to the CPU with offload_fwd_mark=1.

What prevents duplicates from being sent out through vxlan0 in that
case?
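
To make the concern concrete, here is a rough userspace sketch of how I
understand the bridge's egress filtering. It is a model under
assumptions, not kernel code: struct port, struct frame, assign_marks()
and flood() are hypothetical names made up for illustration. The
comparison in allowed_egress() is modeled on
nbp_switchdev_allowed_egress(), and the mark assignment on my reading of
nbp_switchdev_mark_set(): ports sharing a parent ID share a mark, and a
port without a parent ID keeps mark 0. That the VXLAN devices have no
parent ID is an assumption based on them having no bound device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a bridge port; not the kernel's struct. */
struct port {
	const char *name;
	const char *parent_id;	/* NULL: no switchdev parent (assumed for VXLAN) */
	int offload_fwd_mark;	/* 0: port is not part of any hardware domain */
};

/* Hypothetical model of a received frame. */
struct frame {
	const struct port *ingress;
	bool offload_fwd_mark;	/* set by the driver: "hardware already forwarded" */
};

/* Ports with the same parent ID share a mark; ports without one keep 0. */
static void assign_marks(struct port *ports, int n)
{
	int next = 1;

	for (int i = 0; i < n; i++) {
		ports[i].offload_fwd_mark = 0;
		if (!ports[i].parent_id)
			continue;
		for (int j = 0; j < i; j++) {
			if (ports[j].parent_id &&
			    !strcmp(ports[j].parent_id, ports[i].parent_id)) {
				ports[i].offload_fwd_mark = ports[j].offload_fwd_mark;
				break;
			}
		}
		if (!ports[i].offload_fwd_mark)
			ports[i].offload_fwd_mark = next++;
	}
}

/* Egress is suppressed only if the frame is marked AND the marks match. */
static bool allowed_egress(const struct port *p, const struct frame *f)
{
	return !f->offload_fwd_mark ||
	       f->ingress->offload_fwd_mark != p->offload_fwd_mark;
}

/* Software flood: collect the names of all ports the frame is sent to. */
static int flood(const struct port *ports, int n, const struct frame *f,
		 const char **out)
{
	int count = 0;

	assert(out != NULL);
	for (int i = 0; i < n; i++) {
		if (&ports[i] == f->ingress)
			continue;
		if (allowed_egress(&ports[i], f))
			out[count++] = ports[i].name;
	}
	return count;
}
```

With swp0 and swp1 sharing one parent ID and both VXLAN devices having
none, a marked broadcast from swp0 is suppressed towards swp1 but
software-flooded to both vxlan0 and vxlan1 in this model. The mark
cannot tell vxlan0 (already served by hardware) apart from vxlan1 (not
served), which is exactly the duplicate I am asking about.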

>> 
>> > Anyway, so we are essentially talking about replacing the question "do
>> > you share a parent with this netdev?" with "do you share the same
>> > hardware bridging domain as this netdev?" when choosing the port's OFM
>> > in a bridge, correct? If so, great, that would also solve the software
>> > LAG case. This would also get us one step closer to selectively
>> > disabling bridge offloading on a switchdev port.
>> 
>> Well, I cannot answer this until I fully understand the other issue
>> above - basically how is it that Mellanox switches do software
>> forwarding for exception traffic today.
>> 
>> Ido, for background, here's the relevant portion of the thread. We're
>> talking about software fallback for a bridge-over-bonding-over-DSA
>> scenario:
>> https://lore.kernel.org/netdev/87a6uk5apb.fsf@waldekranz.com/
