From: Vladimir Oltean <olteanv@gmail.com>
To: Ido Schimmel <idosch@idosch.org>
Cc: Andrew Lunn <andrew@lunn.ch>,
	Florian Fainelli <f.fainelli@gmail.com>,
	Vivien Didelot <vivien.didelot@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <jakub.kicinski@netronome.com>,
	murali.policharla@broadcom.com,
	Stephen Hemminger <stephen@networkplumber.org>,
	Jiri Pirko <jiri@resnulli.us>, Jakub Kicinski <kuba@kernel.org>,
	Nikolay Aleksandrov <nikolay@cumulusnetworks.com>,
	netdev <netdev@vger.kernel.org>
Subject: Re: [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath
Date: Thu, 26 Mar 2020 13:44:51 +0200
Message-ID: <CA+h21hqSWKSc-AD0fTA0XXsqmPdF_LCvKrksWEe8DGhdLm=AWQ@mail.gmail.com>
In-Reply-To: <20200326113542.GA1383155@splinter>

On Thu, 26 Mar 2020 at 13:35, Ido Schimmel <idosch@idosch.org> wrote:
>
> On Thu, Mar 26, 2020 at 12:25:20PM +0200, Vladimir Oltean wrote:
> > Hi Ido,
> >
> > On Thu, 26 Mar 2020 at 12:17, Ido Schimmel <idosch@idosch.org> wrote:
> > >
> > > Hi Vladimir,
> > >
> > > On Wed, Mar 25, 2020 at 05:22:09PM +0200, Vladimir Oltean wrote:
> > > > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> > > >
> > > > In the initial attempt to add MTU configuration for DSA:
> > > >
> > > > https://patchwork.ozlabs.org/cover/1199868/
> > > >
> > > > Florian raised a concern about the bridge MTU normalization logic (when
> > > > you bridge an interface with MTU 9000 and one with MTU 1500). His
> > > > expectation was that the bridge would automatically change the MTU of
> > > > all its slave ports to the minimum MTU, if those slaves are part of the
> > > > same hardware bridge. However, it doesn't do that, and for good reason,
> > > > I think. What br_mtu_auto_adjust() does is it adjusts the MTU of the
> > > > bridge net device itself, and not that of any slave port.  If it were to
> > > > modify the MTU of the slave ports, the effect would be that the user
> > > > wouldn't be able to increase the MTU of any bridge slave port as long as
> > > > it was part of the bridge, which would be a bit annoying to say the
> > > > least.
> > > >
> > > > The idea behind this behavior is that normal termination from Linux over
> > > > the L2 forwarding domain described by DSA should happen over the bridge
> > > > net device, which _is_ properly limited by the minimum MTU. And
> > > > termination over individual slave devices is possible even if those are
> > > > bridged. But that is not "forwarding", so there's no reason to do
> > > > normalization there, since only a single interface sees that packet.
> > > >
> > > > The real problem is with the offloaded data path, where of course, the
> > > > bridge net device MTU is ignored. So a packet received on an interface
> > > > with MTU 9000 would still be forwarded to an interface with MTU 1500.
> > > > And that is exactly what this patch is trying to prevent from happening.
> > >
> > > How is that different from the software data path where the CPU needs to
> > > forward the packet between port A with MTU X and port B with MTU X/2 ?
> > >
> > > I don't really understand what problem you are trying to solve here. It
> > > seems like the user did some misconfiguration and now you're introducing
> > > a policy to mitigate it? If so, it should be something the user can
> > > disable. It also seems like something that can be easily handled by a
> > > user space application. You get netlink notifications for all these
> > > operations.
> > >
> >
> > Actually I think the problem can be better understood if I explain
> > what the switches I'm dealing with look like.
> > None of them really has an 'MTU' register. They perform length-based
> > admission control on RX.
>
> IIUC, by that you mean that these switches only perform length-based
> filtering on RX, but not on TX?
>

Yes.

> > At this moment in time I don't think anybody wants to introduce an MRU
> > knob in iproute2, so we're adjusting that maximum ingress length
> > through the MTU. But it becomes an inverted problem, since the 'MTU'
> > needs to be controlled for all possible sources of traffic that are
> > going to egress on this port, in order for the real MTU on the port
> > itself to be observed.
>
> Looking at your example from the changelog:
>
> ip link set dev sw0p0 master br0
> ip link set dev sw0p1 mtu 1400
> ip link set dev sw0p1 master br0
>
> Without your patch, after these commands sw0p0 has an MTU of 1500 and
> sw0p1 has an MTU of 1400. Are you saying that a frame with a length of
> 1450 bytes received on sw0p0 will be able to egress sw0p1 (assuming it
> should be forwarded there)?
>

Yes.
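
To make that concrete, here is a rough Python model of the behavior being
confirmed. It is only a sketch, not driver code: the Port class and the
hw_forward() helper are hypothetical names, and it assumes the configured
"MTU" is programmed as the maximum accepted RX frame length, ignoring L2
header overhead.

```python
from dataclasses import dataclass


@dataclass
class Port:
    name: str
    mtu: int  # assumed to be programmed as the maximum accepted RX length


def hw_forward(ingress: Port, egress: Port, frame_len: int) -> bool:
    """Model a switch that filters on frame length at ingress only.

    The egress port is deliberately not consulted: in this model there is
    no length check on TX, which is the behavior described above.
    """
    return frame_len <= ingress.mtu


sw0p0 = Port("sw0p0", 1500)
sw0p1 = Port("sw0p1", 1400)

# A 1450-byte frame received on sw0p0 passes the RX check and leaves
# through sw0p1, even though sw0p1 is configured with MTU 1400.
print(hw_forward(sw0p0, sw0p1, 1450))
# In the opposite direction the RX check on sw0p1 drops the frame.
print(hw_forward(sw0p1, sw0p0, 1450))
```

So in this model the only way to honor the 1400 limit on sw0p1 is to
restrict what every other port in the bridge accepts on RX.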

> If so, then I think I understand the problem. However, I don't think
> such code belongs in the bridge driver as this restriction does not
> apply to all switches.

How do Mellanox switches deal with this?

> Also, I think that having the kernel change MTU
> of port A following MTU change of port B is a bit surprising and not
> intuitive.
>

It already changes the MTU of br0; this just goes along the same path.

> I think you should be more explicit about it. Did you consider listening
> to 'NETDEV_PRECHANGEMTU' notifications in relevant drivers and vetoing
> unsupported configurations with an appropriate extack message? If you
> can't veto (in order not to break user space), you can still emit an
> extack message.

I suppose that is an alternative approach. Would this be done from the
DSA core, then? But instead of vetoing, it would just do the normalization.
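
As a rough illustration of the difference between the two policies, here
is a small Python model. It is a sketch under stated assumptions, not
kernel code: normalize() and veto() are hypothetical helpers, the ports
are assumed to belong to the same hardware bridge, and a real version
would live in a netdevice notifier reacting to NETDEV_PRECHANGEMTU.

```python
from typing import Dict, Optional


def normalize(port_mtus: Dict[str, int]) -> Dict[str, int]:
    """Normalization: clamp every bridged port to the smallest MTU present."""
    lowest = min(port_mtus.values())
    return {name: lowest for name in port_mtus}


def veto(port_mtus: Dict[str, int], port: str, new_mtu: int) -> Optional[str]:
    """Veto: refuse a change that would leave the ports with differing MTUs.

    Returns an extack-style message on refusal, or None if the change is
    acceptable. The message text is made up for this sketch.
    """
    others = {mtu for name, mtu in port_mtus.items() if name != port}
    if others and others != {new_mtu}:
        return f"{port}: MTU {new_mtu} does not match other bridge ports"
    return None


ports = {"sw0p0": 1500, "sw0p1": 1400}

# Normalization silently brings sw0p0 down to 1400.
print(normalize(ports))
# Veto accepts a change that makes the ports agree...
print(veto(ports, "sw0p1", 1500))
# ...and rejects one that would leave them mismatched.
print(veto(ports, "sw0p1", 9000))
```

With normalization the user's command succeeds and the other ports follow;
with a veto the command fails and the user has to fix the mismatch.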

Thanks,
-Vladimir

Thread overview: 37+ messages
2020-03-25 15:21 [PATCH v2 net-next 00/10] Configure the MTU on DSA switches Vladimir Oltean
2020-03-25 15:22 ` [PATCH v2 net-next 01/10] net: dsa: configure the MTU for switch ports Vladimir Oltean
2020-03-25 15:22 ` [PATCH v2 net-next 02/10] net: phy: bcm7xx: Add jumbo frame configuration to PHY Vladimir Oltean
2020-03-25 15:44   ` Heiner Kallweit
2020-03-25 22:45     ` Vladimir Oltean
2020-03-25 23:02       ` Florian Fainelli
2020-03-25 23:21       ` Heiner Kallweit
2020-03-25 15:22 ` [PATCH v2 net-next 03/10] bgmac: Add support for Jumbo frames Vladimir Oltean
2020-03-25 15:22 ` [PATCH v2 net-next 04/10] bgmac: Add MTU configuration support to the driver Vladimir Oltean
2020-03-25 15:22 ` [PATCH v2 net-next 05/10] bgmac: Add DMA support to handle frames beyond 8192 bytes Vladimir Oltean
2020-03-25 23:07   ` Florian Fainelli
2020-03-25 15:22 ` [PATCH v2 net-next 06/10] net: dsa: b53: Add MTU configuration support Vladimir Oltean
2020-03-25 23:21   ` Florian Fainelli
2020-03-26  0:48     ` Vladimir Oltean
2020-03-25 15:22 ` [PATCH v2 net-next 07/10] net: dsa: sja1105: Implement the port MTU callbacks Vladimir Oltean
2020-03-25 23:08   ` Florian Fainelli
2020-03-25 15:22 ` [PATCH v2 net-next 08/10] net: dsa: vsc73xx: Make the MTU configurable Vladimir Oltean
2020-03-25 23:09   ` Florian Fainelli
2020-03-25 15:22 ` [PATCH v2 net-next 09/10] net: dsa: felix: support changing the MTU Vladimir Oltean
2020-03-25 23:10   ` Florian Fainelli
2020-03-25 15:22 ` [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath Vladimir Oltean
2020-03-25 23:17   ` Florian Fainelli
2020-03-26  0:30     ` Vladimir Oltean
2020-03-26 10:17   ` Ido Schimmel
2020-03-26 10:25     ` Vladimir Oltean
2020-03-26 11:35       ` Ido Schimmel
2020-03-26 11:44         ` Vladimir Oltean [this message]
2020-03-26 11:54           ` Ido Schimmel
2020-03-26 12:34             ` Vladimir Oltean
2020-03-26 12:59               ` Ido Schimmel
2020-03-26 12:06         ` Nikolay Aleksandrov
2020-03-26 12:18           ` Vladimir Oltean
2020-03-26 12:19             ` Nikolay Aleksandrov
2020-03-26 12:25               ` Vladimir Oltean
2020-03-26 12:38                 ` Nikolay Aleksandrov
2020-03-26 18:49                   ` Jakub Kicinski
2020-03-26 19:41                     ` Nikolay Aleksandrov