netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vladimir Oltean <olteanv@gmail.com>
To: Tobias Waldekranz <tobias@waldekranz.com>
Cc: andrew@lunn.ch, vivien.didelot@gmail.com, f.fainelli@gmail.com,
	netdev@vger.kernel.org, Ido Schimmel <idosch@idosch.org>
Subject: Re: [RFC PATCH 4/4] net: dsa: tag_edsa: support reception of packets from lag devices
Date: Wed, 28 Oct 2020 20:18:24 +0200	[thread overview]
Message-ID: <20201028181824.3dccguch7d5iij2r@skbuf> (raw)
In-Reply-To: <C6OMPK3XEMGG.1243CP066VN7O@wkz-x280>

Adding Ido here, he has some more experience with the do's and dont's
here, and therefore might have one or two ideas to share.

On Wed, Oct 28, 2020 at 04:28:23PM +0100, Tobias Waldekranz wrote:
> On Wed Oct 28, 2020 at 3:05 PM CET, Vladimir Oltean wrote:
> > On one hand, I feel pretty yucky about this change.
> > On the other hand, I wonder if it might be useful under some conditions
> > for drivers with DSA_TAG_PROTO_NONE? For example, once the user bridges
> > all slave interfaces, then that bridge will start being able to send and
> > receive traffic, despite none of the individual switch ports being able
> > to do that. Then, you could even go off and bridge a "foreign"
> > interface,
> > and that would again work properly. That use case is not supported
> > today,
> > but is very useful.
> >
> > Thoughts?
> 
> In a scenario like the one you describe, are you saying that you would
> set skb->dev to the bridge's netdev in the rcv op?
> 
> On ingress I think that would work. On egress I guess you would be
> getting duplicates for all multi-destination packets as there is no
> way for the none-tagger to limit it, right? (Unless you have the
> awesome tx-offloading we talked about yesterday of course :))
> 
> I think bridging to "foreign" interfaces still won't work, because on
> ingress the packet would never be caught by the bridge's rx
> handler. In order for something to be received by br_input.c, it has
> to pass through an interface that is attached to it, no?  Everything
> above the bridge (like VLAN interfaces) should still work though.

Yes, I expect that the bridge input would need to have one more entry
path into it than just br_handle_frame.

I'm a bit confused and undecided right now, so let's look at it from a
different perspective. Let's imagine a switchdev driver (DSA or not)
which is able to offload IP forwarding. There are some interfaces that
are bridged and one that is standalone. The setup looks as below.

 IP interfaces
                +---------------------------------------------------------+
                |                           br0                           |
                +---------------------------------------------------------+

 +------------+ +------------+ +------------+ +------------+ +------------+
 |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    eth0    |
 +------------+ +------------+ +------------+ +------------+ +------------+

 Hardware interfaces

 +------------+ +------------+ +------------+ +------------+ +------------+
 | DSA port 0 | | DSA port 1 | | DSA port 2 | | DSA port 3 | |   e1000    |
 +------------+ +------------+ +------------+ +------------+ +------------+

Let's say you receive a packet on the standalone swp0, and you need to
perform IP routing towards the bridged domain br0. Some switchdev/DSA
ports are bridged and some aren't.

The switchdev/DSA switch will attempt to do the IP routing step first,
and it _can_ do that because it is aware of the br0 interface, so it
will decrement the TTL and replace the L2 header.

At this stage we have a modified IP packet, which corresponds with what
should be injected into the hardware's view of the br0 interface. The
packet is still in the switchdev/DSA hardware data path.

But then, the switchdev/DSA hardware will look up the FDB in the name of
br0, in an attempt of finding the destination port for the packet. But
the packet should be delivered to a station connected to eth0 (e1000,
foreign interface). So that's part of the exception path, the packet
should be delivered to the CPU.

But the packet was already modified by the hardware data path (IP
forwarding has already taken place)! So how should the DSA/switchdev
hardware deliver the packet to the CPU? It has 2 options:

(a) unwind the entire packet modification, cancel the IP forwarding and
    deliver the unmodified packet to the CPU on behalf of swp0, the
    ingress port. Then let software IP forwarding plus software bridging
    deal with it, so that it can reach the e1000.
(b) deliver the packet to the CPU in the middle of the hardware
    forwarding data path, where the exception/miss occurred, aka deliver
    it on behalf of br0. Modified by IP forwarding. This is where we'd
    have to manually inject skb->dev into br0 somehow.

Maybe this sounds a bit crazy, considering that we don't have IP
forwarding hardware with DSA today, and I am not exactly sure how other
switchdev drivers deal with this exception path today. But nonetheless,
it's almost impossible for DSA switches with IP forwarding abilities to
never come up some day, so we ought to have our mind set about how the
RX data path should like, and whether injecting directly into an upper
is icky or a fact of life.

Things get even more interesting when this is a cascaded DSA setup, and
the bridging and routing are cross-chip. There, the FIB/FDB of 2 there
isn't really any working around the problem that the packet might need
to be delivered to the CPU somewhere in the middle of the data path, and
it would need to be injected into the RX path of an upper interface in
that case.

What do you think?

  reply	other threads:[~2020-10-28 22:23 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-27 10:51 [RFC PATCH 0/4] net: dsa: link aggregation support Tobias Waldekranz
2020-10-27 10:51 ` [RFC PATCH 1/4] net: dsa: mv88e6xxx: use ethertyped dsa for 6390/6390X Tobias Waldekranz
2020-10-27 14:52   ` Marek Behun
2020-10-27 14:54     ` Marek Behun
2020-10-27 14:58       ` Marek Behun
2020-10-27 10:51 ` [RFC PATCH 2/4] net: dsa: link aggregation support Tobias Waldekranz
2020-10-28  0:58   ` Vladimir Oltean
2020-10-28 14:03     ` Tobias Waldekranz
2020-10-27 10:51 ` [RFC PATCH 3/4] net: dsa: mv88e6xxx: " Tobias Waldekranz
2020-10-27 10:51 ` [RFC PATCH 4/4] net: dsa: tag_edsa: support reception of packets from lag devices Tobias Waldekranz
2020-10-28 12:05   ` Vladimir Oltean
2020-10-28 15:28     ` Tobias Waldekranz
2020-10-28 18:18       ` Vladimir Oltean [this message]
2020-10-28 22:31         ` Tobias Waldekranz
2020-10-28 23:08           ` Vladimir Oltean
2020-10-29  7:47             ` Tobias Waldekranz
2020-10-30  9:21               ` Vladimir Oltean
2020-11-01 11:31         ` Ido Schimmel
2020-10-27 12:27 ` [RFC PATCH 0/4] net: dsa: link aggregation support Vladimir Oltean
2020-10-27 14:29   ` Andrew Lunn
2020-10-27 14:59   ` Tobias Waldekranz
2020-10-27 14:00 ` Andrew Lunn
2020-10-27 15:09   ` Tobias Waldekranz
2020-10-27 15:05 ` Marek Behun
2020-10-27 15:23   ` Andrew Lunn
2020-10-27 18:25     ` Tobias Waldekranz
2020-10-27 18:33       ` Marek Behun
2020-10-27 19:04         ` Vladimir Oltean
2020-10-27 19:21           ` Tobias Waldekranz
2020-10-27 19:00       ` Vladimir Oltean
2020-10-27 19:37         ` Tobias Waldekranz
2020-10-27 20:02           ` Vladimir Oltean
2020-10-27 20:53             ` Tobias Waldekranz
2020-10-27 22:32               ` Vladimir Oltean
2020-10-28  0:27                 ` Tobias Waldekranz
2020-10-28 22:35       ` Marek Behun
2020-10-27 22:36 ` Andrew Lunn
2020-10-28  0:45   ` Tobias Waldekranz
2020-10-28  1:03     ` Andrew Lunn
2020-11-11  4:28 ` Florian Fainelli
2020-11-19 10:51 ` Vladimir Oltean
2020-11-19 11:52   ` Tobias Waldekranz
2020-11-19 18:12     ` Vladimir Oltean

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201028181824.3dccguch7d5iij2r@skbuf \
    --to=olteanv@gmail.com \
    --cc=andrew@lunn.ch \
    --cc=f.fainelli@gmail.com \
    --cc=idosch@idosch.org \
    --cc=netdev@vger.kernel.org \
    --cc=tobias@waldekranz.com \
    --cc=vivien.didelot@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).