From: Vladimir Oltean <olteanv@gmail.com>
To: Ido Schimmel <idosch@idosch.org>
Cc: Andrew Lunn <andrew@lunn.ch>,
Florian Fainelli <f.fainelli@gmail.com>,
Vivien Didelot <vivien.didelot@gmail.com>,
"David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <jakub.kicinski@netronome.com>,
murali.policharla@broadcom.com,
Stephen Hemminger <stephen@networkplumber.org>,
Jiri Pirko <jiri@resnulli.us>, Jakub Kicinski <kuba@kernel.org>,
Nikolay Aleksandrov <nikolay@cumulusnetworks.com>,
netdev <netdev@vger.kernel.org>
Subject: Re: [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath
Date: Thu, 26 Mar 2020 13:44:51 +0200
Message-ID: <CA+h21hqSWKSc-AD0fTA0XXsqmPdF_LCvKrksWEe8DGhdLm=AWQ@mail.gmail.com>
In-Reply-To: <20200326113542.GA1383155@splinter>

On Thu, 26 Mar 2020 at 13:35, Ido Schimmel <idosch@idosch.org> wrote:
>
> On Thu, Mar 26, 2020 at 12:25:20PM +0200, Vladimir Oltean wrote:
> > Hi Ido,
> >
> > On Thu, 26 Mar 2020 at 12:17, Ido Schimmel <idosch@idosch.org> wrote:
> > >
> > > Hi Vladimir,
> > >
> > > On Wed, Mar 25, 2020 at 05:22:09PM +0200, Vladimir Oltean wrote:
> > > > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> > > >
> > > > In the initial attempt to add MTU configuration for DSA:
> > > >
> > > > https://patchwork.ozlabs.org/cover/1199868/
> > > >
> > > > Florian raised a concern about the bridge MTU normalization logic (when
> > > > you bridge an interface with MTU 9000 and one with MTU 1500). His
> > > > expectation was that the bridge would automatically change the MTU of
> > > > all its slave ports to the minimum MTU, if those slaves are part of the
> > > > same hardware bridge. However, it doesn't do that, and for good reason,
> > > > I think. What br_mtu_auto_adjust() does is it adjusts the MTU of the
> > > > bridge net device itself, and not that of any slave port. If it were to
> > > > modify the MTU of the slave ports, the effect would be that the user
> > > > wouldn't be able to increase the MTU of any bridge slave port as long as
> > > > it was part of the bridge, which would be a bit annoying to say the
> > > > least.
> > > >
> > > > The idea behind this behavior is that normal termination from Linux over
> > > > the L2 forwarding domain described by DSA should happen over the bridge
> > > > net device, which _is_ properly limited by the minimum MTU. And
> > > > termination over individual slave devices is possible even if those are
> > > > bridged. But that is not "forwarding", so there's no reason to do
> > > > normalization there, since only a single interface sees that packet.
> > > >
> > > > The real problem is with the offloaded data path, where of course, the
> > > > bridge net device MTU is ignored. So a packet received on an interface
> > > > with MTU 9000 would still be forwarded to an interface with MTU 1500.
> > > > And that is exactly what this patch is trying to prevent from happening.
> > >
> > > How is that different from the software data path where the CPU needs to
> > > forward the packet between port A with MTU X and port B with MTU X/2 ?
> > >
> > > I don't really understand what problem you are trying to solve here. It
> > > seems like the user did some misconfiguration and now you're introducing
> > > a policy to mitigate it? If so, it should be something the user can
> > > disable. It also seems like something that can be easily handled by a
> > > user space application. You get netlink notifications for all these
> > > operations.
> > >
> >
> > Actually I think the problem can be better understood if I explain
> > what the switches I'm dealing with look like.
> > None of them really has an 'MTU' register. They perform length-based
> > admission control on RX.
>
> IIUC, by that you mean that these switches only perform length-based
> filtering on RX, but not on TX?
>

Yes.

> > At this moment in time I don't think anybody wants to introduce an MRU
> > knob in iproute2, so we're adjusting that maximum ingress length
> > through the MTU. But it becomes an inverted problem, since the 'MTU'
> > needs to be controlled for all possible sources of traffic that are
> > going to egress on this port, in order for the real MTU on the port
> > itself to be observed.
>
> Looking at your example from the changelog:
>
> ip link set dev sw0p0 master br0
> ip link set dev sw0p1 mtu 1400
> ip link set dev sw0p1 master br0
>
> Without your patch, after these commands sw0p0 has an MTU of 1500 and
> sw0p1 has an MTU of 1400. Are you saying that a frame with a length of
> 1450 bytes received on sw0p0 will be able to egress sw0p1 (assuming it
> should be forwarded there)?
>
Yes.
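
To make the scenario concrete, here is a minimal userspace sketch (a toy model in Python, not kernel or driver code) of a switch that checks frame length on RX only, as described above. The `Port` class and `hw_forward` function are hypothetical names invented for illustration, and the model ignores L2 header overhead by comparing the frame length directly against the MTU.

```python
from dataclasses import dataclass


@dataclass
class Port:
    name: str
    mtu: int  # assumption: programmed into hardware as the max accepted ingress length


def hw_forward(ingress: Port, egress: Port, frame_len: int) -> bool:
    """Toy model of an RX-only length check.

    The frame is admitted if it fits the ingress port's limit.
    The egress port's MTU is deliberately never consulted, which is
    the behavior confirmed in this thread for these switches.
    """
    return frame_len <= ingress.mtu


# The example from the changelog: sw0p0 keeps MTU 1500, sw0p1 is set to 1400.
sw0p0 = Port("sw0p0", 1500)
sw0p1 = Port("sw0p1", 1400)

print(hw_forward(sw0p0, sw0p1, 1450))  # True: the frame egresses sw0p1 despite MTU 1400
print(hw_forward(sw0p1, sw0p0, 1450))  # False: dropped on ingress at sw0p1
```

In this model the only way to keep a 1450-byte frame from leaving sw0p1 is to lower the ingress limit on every port that can forward to it, which is why the MTU ends up being treated per bridge rather than per port.
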
> If so, then I think I understand the problem. However, I don't think
> such code belongs in the bridge driver as this restriction does not
> apply to all switches.

How do Mellanox switches deal with this?

> Also, I think that having the kernel change MTU
> of port A following MTU change of port B is a bit surprising and not
> intuitive.
>

It already changes the MTU of br0, this just goes along the same path.

> I think you should be more explicit about it. Did you consider listening
> to 'NETDEV_PRECHANGEMTU' notifications in relevant drivers and vetoing
> unsupported configurations with an appropriate extack message? If you
> can't veto (in order not to break user space), you can still emit an
> extack message.

I suppose that is an alternative approach. This would be done from the
DSA core then? But instead of veto, just do the normalization thing.

Thanks,
-Vladimir
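
The two policies discussed in this message (vetoing an MTU change versus normalizing the MTU across the bridge) can be contrasted with a small userspace sketch. This is a toy model in Python, not the DSA core or a real notifier; `Port`, `change_mtu_veto` and `change_mtu_normalize` are hypothetical names. It assumes that an "unsupported configuration" means bridged ports ending up with different MTUs, and that normalization means lowering every bridged port to the smallest MTU among them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Port:
    name: str
    mtu: int


def change_mtu_veto(bridge_ports: List[Port], port: Port, new_mtu: int) -> Optional[str]:
    """Veto policy: refuse a change that would leave bridged ports with different MTUs.

    Returns an error string (standing in for an extack message) on refusal,
    or None if the change was applied.
    """
    for other in bridge_ports:
        if other is not port and other.mtu != new_mtu:
            return (f"{port.name}: MTU {new_mtu} differs from "
                    f"{other.name} (MTU {other.mtu}) in the same bridge")
    port.mtu = new_mtu
    return None


def change_mtu_normalize(bridge_ports: List[Port], port: Port, new_mtu: int) -> None:
    """Normalization policy: apply the change, then lower all ports to the minimum."""
    port.mtu = new_mtu
    lowest = min(p.mtu for p in bridge_ports)
    for p in bridge_ports:
        p.mtu = lowest


ports = [Port("sw0p0", 1500), Port("sw0p1", 1500)]

# Veto: the request is rejected and nothing changes.
print(change_mtu_veto(ports, ports[1], 1400))
print([p.mtu for p in ports])  # [1500, 1500]

# Normalize: the request is applied and the other port follows.
change_mtu_normalize(ports, ports[1], 1400)
print([p.mtu for p in ports])  # [1400, 1400]
```

Under the normalization policy, a later attempt to raise only one port (for example to 9000) is pulled back to the bridge minimum, which is the usability concern raised in the quoted commit message. Under the veto policy the user has to change the ports in a compatible order, but the kernel never alters a port the user did not touch.
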